Train a network on the data
The train.py script loads your data files into memory and trains a neural network to distinguish between them. Keras and TensorFlow have good support for training from disk without loading every example into RAM, but this script is a quick and easy way to get started. To run the script, enter the following command:
python3 train.py example/model.h5 example/yeah example/sitting example/random
The first argument is the name of the model file to save the neural network to, which is example/model.h5 in this example. You then list the different behavior video files that you have recorded. The first file must be the one that contains your cheering gesture. All of the other files contain gestures that the network should learn to ignore.
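The way the argument order maps to classes can be sketched in a few lines of Python. This is an illustration only, not the code in train.py: the helper name split_arguments is hypothetical, and the requirement of at least two data files is an assumption based on the network needing something to distinguish between.

```python
def split_arguments(args):
    """Split a train.py-style argument list into a model path and labeled inputs.

    Hypothetical helper for illustration; train.py may parse its arguments differently.
    """
    # Assumption: a classifier needs at least two classes to tell apart.
    if len(args) < 3:
        raise ValueError("expected a model file followed by at least two data files")
    model_path = args[0]
    # enumerate() gives each data file a 0-based class index, so the
    # first file (the cheering gesture) always becomes class 0.
    labeled_files = list(enumerate(args[1:]))
    return model_path, labeled_files


# The arguments from the example command above.
example_args = ["example/model.h5", "example/yeah", "example/sitting", "example/random"]
model_path, labeled_files = split_arguments(example_args)
print("model file:", model_path)
for label, path in labeled_files:
    print(f"class {label}: {path}")
```

Because the class index comes purely from position, swapping the order of the data files changes which gesture the network treats as the cheer.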
The slowest part of training is loading and converting the data. After that, the neural network should compute 20 iterations and then stop. In the example shown in this screenshot, the neural network converges to an accuracy of 98%.
What is going on in this process?
The script takes the training files in turn and loads all the frames from each file, which means that you need sufficient RAM to hold all the files. When the images are loaded, they are passed through a neural network, called MobileNet, that has been pre-trained on the ImageNet dataset.
MobileNet has already learned how to extract useful features, such as edges, textures, and shapes, from images. This step is the slowest part of training. Artificially adding noise and variation to the images, which is called domain randomization, would make this step even slower. This guide takes a shortcut and skips that extra processing.
After every frame has been converted into an array of features, the script uses these features to train a small classifier network that learns to predict the class each frame came from (in this case, the 0-based index of the input file).
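As a rough sketch of that labeling scheme, every frame inherits the index of the file it was loaded from. The helper name make_labels and the frame counts below are made up for illustration and are not taken from train.py.

```python
def make_labels(frames_per_file):
    """Return one class label per frame: the 0-based index of its source file."""
    labels = []
    for class_index, frame_count in enumerate(frames_per_file):
        # Every frame loaded from the same file gets the same label.
        labels.extend([class_index] * frame_count)
    return labels


# Made-up frame counts for three input files.
labels = make_labels([3, 2, 4])
print(labels)  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
```

The classifier is then trained to map each frame's feature array to the matching entry in this list.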
Each epoch is one cycle through every frame. To prevent the classifier from latching on to overly specific details of individual frames, the script adds some random noise to the features. Adding noise may prevent the neural network from reaching 100% accuracy during training, but it makes the neural network more accurate when run on previously unseen images.
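The idea of perturbing the features differently on every epoch can be sketched with NumPy. This is not the mechanism train.py uses internally; the helper name add_feature_noise, the standard deviation of 0.1, and the array sizes are all assumptions chosen for illustration.

```python
import numpy as np


def add_feature_noise(features, stddev, rng):
    """Return a noisy copy of the features; the input array is left unchanged."""
    noise = rng.normal(loc=0.0, scale=stddev, size=features.shape)
    return features + noise


rng = np.random.default_rng(seed=0)
# Stand-in for extracted features: 4 frames with 8 features each (made-up sizes).
features = np.ones((4, 8), dtype=np.float32)

for epoch in range(3):
    # Fresh noise is drawn on every epoch, so the classifier never sees
    # exactly the same feature values for a frame twice.
    noisy = add_feature_noise(features, stddev=0.1, rng=rng)
    print(epoch, float(np.abs(noisy - features).mean()))
```

The noise is applied only while training. At prediction time the clean features are used.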
MobileNet was introduced by Google in 2017. It is a convolutional neural network that has been optimized for size and speed at the expense of accuracy. MobileNet is trained on the well-known ImageNet dataset, in which more than one million images are split into 1000 different classes. In this guide, the final layer of MobileNet, which classifies images into those 1000 categories, has been removed. Instead, the train.py script uses Keras to build a new layer that classifies images into the number of categories that are passed to train.py.
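A minimal Keras sketch of this arrangement might look like the following. It is not the code from train.py: the function names, the 224x224 input size, the average pooling, the noise level, the Adam optimizer, and the single softmax layer are all assumptions made for illustration.

```python
from tensorflow import keras


def build_feature_extractor(weights="imagenet"):
    """MobileNet without its 1000-way ImageNet classification layer."""
    return keras.applications.MobileNet(
        input_shape=(224, 224, 3),  # assumed input size
        include_top=False,          # drop the final ImageNet classifier
        weights=weights,            # "imagenet" loads the pre-trained weights
        pooling="avg",              # average the last feature map into one vector per image
    )


def build_classifier(feature_dim, num_classes, noise_stddev=0.1):
    """Small classifier trained on the extracted features (assumed architecture)."""
    model = keras.Sequential([
        keras.Input(shape=(feature_dim,)),
        # GaussianNoise is only active during training, not at prediction time.
        keras.layers.GaussianNoise(noise_stddev),
        # One output per data file passed on the command line.
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",  # assumed optimizer, left at its default learning rate
        loss="sparse_categorical_crossentropy",  # labels are 0-based integer indices
        metrics=["accuracy"],
    )
    return model
```

With the three data files from the example command, you would call build_classifier with num_classes=3 and a feature_dim that matches the last dimension of the extractor's output. Calling build_feature_extractor() with the default weights downloads the pre-trained ImageNet weights the first time it runs.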
If you read the code in the train.py file, you will notice that the classifier model is simple. The data is augmented only with Gaussian noise, not with rotations, transforms, or occlusions. The learning rate and batch size are left at the Keras defaults. There is no validation split and no early stopping. Domain randomization is applied to the data at source, and is not varied across epochs during training. This example is deliberately clean and sparse to keep it easy to understand and quick to train, even on a Pi Zero, which leaves plenty of room for you to experiment and improve your results.