Advanced information
To prepare the demo system shown in the video at the start of this guide, 3-4 minutes of video was recorded every day for three days, half filmed in dim lighting and half in bright lighting. Each of the five actions was repeated with several variations, and the resulting frames were classified and saved.
After training a ConvNet from scratch on the day1 dataset, it achieved 98% accuracy on the training data and 98% on the validation set. The high validation accuracy shows that the model has not simply memorized its input, but because the validation data was drawn from the same day, it does not tell us whether the model will generalize to different clothes, ambient daylight and so on.
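A small ConvNet of the kind described can be sketched in Keras as follows. The input size, layer counts, and filter sizes here are guesses for illustration, not the exact architecture used for the demo:

```python
import tensorflow as tf

def build_model(input_shape=(96, 96, 3), num_classes=5):
    """A minimal ConvNet for five gesture classes (sizes assumed)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Rescaling(1.0 / 255),          # uint8 -> [0, 1]
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Hypothetical training and save step, matching the day1/model.h5 naming:
# model.fit(train_ds, validation_data=val_ds, epochs=10)
# model.save("day1/model.h5")
```

Training from scratch on a dataset this small is feasible precisely because the network is tiny; a larger architecture would overfit a single day of frames even faster.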
The test set performance is good but not great - making a mistake 9% of the time leads to many misinterpreted gestures:
As more data is added from subsequent days, the performance improves on every dataset:
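A comparison like this comes from evaluating every saved model against every day's data. The helper below is a minimal sketch of that grid, assuming Keras-style models and datasets already loaded as `(x, y)` NumPy arrays; the model and day names mirror the `day1/model.h5` pattern used in this guide:

```python
import numpy as np

def accuracy_grid(models, datasets):
    """models: {name: keras_model}, datasets: {name: (x, y)}.
    Returns {model_name: {dataset_name: accuracy}}."""
    grid = {}
    for mname, model in models.items():
        grid[mname] = {}
        for dname, (x, y) in datasets.items():
            preds = np.argmax(model.predict(x, verbose=0), axis=1)
            grid[mname][dname] = float((preds == y).mean())
    return grid

# Hypothetical usage, with load helpers left to the reader:
# grid = accuracy_grid(
#     {p: tf.keras.models.load_model(p)
#      for p in ["day1/model.h5", "day1+2/model.h5", "day1+2+3/model.h5"]},
#     {d: load_day(d) for d in ["day1", "day2", "day3"]})
```

Printing this grid makes it obvious which datasets a given model has never seen, and whether adding a day of data actually moved the numbers.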
Here we can see that although neither day1/model.h5 nor day1+2/model.h5 was trained on the day3 data, having seen two days' worth of data meant that day1+2/model.h5 handled it significantly better. Adding the third day of data almost halved the error on the test set, and examining the remaining errors shows many pictures that were arguably mislabelled by hand during the classification process.
For our demo video this was sufficient for reliable predictions, but each use case will vary. To get the best performance, keep adding extra datasets until training on them brings no further benefit.
This process ends in a single model file, such as day1+2+3/model.h5, which performs well on all the data seen so far as well as on the test set. All that remains now is to deploy it.
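Deployment then amounts to loading that file and classifying incoming frames. A minimal sketch, assuming a 96x96 RGB input and placeholder label names (the guide's five actions are not listed in this excerpt):

```python
import numpy as np
import tensorflow as tf

# Placeholder names; substitute the five actions used in your dataset.
LABELS = ["action0", "action1", "action2", "action3", "action4"]

def classify(model, frame):
    """frame: HxWx3 uint8 array already resized to the model's input.
    Returns the predicted label and its softmax confidence."""
    batch = frame[np.newaxis].astype("float32")
    probs = model.predict(batch, verbose=0)[0]
    return LABELS[int(np.argmax(probs))], float(probs.max())

# Hypothetical usage with the final model from this guide:
# model = tf.keras.models.load_model("day1+2+3/model.h5")
# label, confidence = classify(model, frame)
```

In practice it is worth thresholding on the confidence value so that ambiguous frames are ignored rather than triggering the wrong action.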