Train a network on the data

Running the train.py script loads your data files into memory and trains a neural network to distinguish between them. Keras and TensorFlow have good support for training from disk without loading every example into RAM, but loading everything into memory is a quick and easy way to get started.
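If your recordings ever outgrow the available memory, a streaming pipeline is straightforward to set up. Below is a minimal sketch of that idea using a standard tf.keras utility; the frames directory layout, image size, and batch size are illustrative assumptions rather than part of this project.

import tensorflow as tf

# Stream batches of images from disk instead of loading everything into RAM.
# "frames" is a hypothetical folder containing one sub-folder per class.
dataset = tf.keras.utils.image_dataset_from_directory(
    "frames",
    image_size=(128, 128),
    batch_size=32,
)
# The dataset yields (images, labels) batches lazily, so only one batch
# needs to be held in memory at a time.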

Enter the following command to run the script:

python train.py example/model.h5 example/yeah example/sitting example/random 

The first argument is the file the trained neural network will be saved to (here, example/model.h5). After that, you list the different behavior video files that you have recorded. It is important that the first one is the file that contains your cheering gesture; all of the others are gestures the network should learn to ignore.
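Inside the script, handling those arguments can be as simple as the hypothetical snippet below; the variable names are illustrative and the real train.py may organize things differently.

import sys

# Hypothetical argument handling matching the command above.
model_path = sys.argv[1]                  # e.g. example/model.h5
video_files = sys.argv[2:]                # first file is the cheering gesture
labels = list(range(len(video_files)))    # 0-based class index for each file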

The slowest part of training is loading and converting the data. After that, it should run for 20 iterations and stop. In the example shown in this screenshot, it converges to an accuracy of 98%.

command line showing the results of the command train.py

What is going on in this process?

Here, the script takes the training files in turn and loads all the frames from each. This means that you need enough RAM to hold them all. Once the images are loaded, they are passed through a neural network (MobileNet) that has been pre-trained on the ImageNet dataset.

This has already learned how to extract useful features from images, such as edges, textures, and shapes. This is the slowest part of training; if we were artificially adding noise and variation to the images themselves (domain randomization), it would take even longer, which is why we take a shortcut here.
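As a rough illustration of that step, here is a sketch of running a batch of frames through a pre-trained MobileNet with its classification layer removed; the input size and pooling choice are assumptions for illustration, not necessarily what train.py uses.

from tensorflow.keras.applications import MobileNet
from tensorflow.keras.applications.mobilenet import preprocess_input

# Pre-trained MobileNet, used purely as a frozen feature extractor.
extractor = MobileNet(weights="imagenet", include_top=False, pooling="avg",
                      input_shape=(128, 128, 3))

def frames_to_features(frames):
    # frames: a (num_frames, 128, 128, 3) uint8 array of video frames
    return extractor.predict(preprocess_input(frames.astype("float32")))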

Once every frame has been converted into an array of features, those features are used to train a small classifier network that learns to predict which class (in this case, the 0-based index of the input file) each frame came from.
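That classifier can be very small. A minimal sketch of one plausible shape for it is shown below, assuming the 1,024-length feature vectors that MobileNet produces and three input files; it is not the exact model in train.py.

from tensorflow.keras import layers, models

num_classes = 3   # one class per video file passed on the command line
classifier = models.Sequential([
    layers.Input(shape=(1024,)),          # length of a MobileNet feature vector
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),   # predicts the file index
])
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])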

Each epoch is one cycle through every frame. To prevent the network from latching on to overly specific details of individual frames, the script adds some random noise to the features. This may prevent it from reaching 100% accuracy during training, but makes it more accurate when run on previously unseen images, for example in real use.
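A minimal way to add that kind of noise to the feature arrays is sketched below; the noise level is an assumed value, not necessarily the one the script uses.

import numpy as np

# Add Gaussian noise so the classifier cannot rely on exact feature values
# from individual frames (0.1 is an illustrative standard deviation).
def add_noise(features, stddev=0.1):
    return features + np.random.normal(0.0, stddev, size=features.shape)

Because the noise is applied once to the stored features rather than regenerated each epoch, it is a one-off perturbation; refreshing it every epoch would be a further refinement.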

Advanced information

MobileNet was introduced by Google in 2017. It is a convolutional neural network that has been optimized for size and speed at the expense of some accuracy. MobileNet is trained on the well-known ImageNet dataset, in which 1 million images are split into 1,000 different classes. The MobileNet included here has had the final layer, which classifies into those 1,000 categories, removed. Instead, the train.py script uses Keras to build a new layer that classifies into however many categories are passed to train.py.

If you read the code in train.py, you will notice that the classifier model is as simple as it can be: the data is augmented only with Gaussian noise rather than rotations, transforms, and occlusions, and the learning rate and batch size are left at the Keras defaults. A validation split with early stopping is not used, and the quick-and-dirty domain randomization is applied to the data at source rather than varied across epochs during training. There is a lot of room for improvement here, so you can experiment to try to improve your results. This example is deliberately clean and sparse to keep it easy to understand and quick to train, even on a Pi Zero.
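For example, a validation split with early stopping is only a few lines in Keras. Here is a hedged sketch, reusing the classifier from the sketch above and assuming features and labels arrays prepared as described earlier; the split and patience values are illustrative.

from tensorflow.keras.callbacks import EarlyStopping

# Hold back 20% of the data for validation and stop once validation loss
# stops improving, keeping the best weights seen so far.
early_stop = EarlyStopping(monitor="val_loss", patience=3,
                           restore_best_weights=True)
classifier.fit(features, labels,
               epochs=20,
               validation_split=0.2,
               callbacks=[early_stop])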
