Train a neural network from scratch
Training a convolutional network is very compute-intensive and will take a long time on a Raspberry Pi 3. It is quicker to copy the files to a laptop or desktop and run the train.py script there. To do this, you will need to install TensorFlow on that machine by following this guide.
To train a neural network from scratch with the LeNet-like model using your training data and validation data, use this command:
python train.py day1 val_day1
The script prints regular progress updates while it trains the model.
Although the output suggests that the network will train for 100 epochs, that is only the maximum; in practice training usually finishes earlier. The day1/model.h5 file is updated whenever a better result is achieved, so you can leave training running, copy the best model.h5 produced so far to the Raspberry Pi, and try it out to decide whether it is already good enough for real-world use.
Note: A GPU will speed this up but is not necessary. With 2500 images, the models train in under an hour on a 2017 MacBook Pro.
What are we doing here?
Let's take a look at what's going on in train.py.
The simple LeNet architecture features blocks of convolution, activation, and max pooling followed by one or more dense layers. This architecture works well for a wide range of applications and is small enough to run at around 10 FPS on a Raspberry Pi 3.
The code in
train.py sets up a simple convolutional network following this pattern:
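A minimal sketch of a network following this pattern is shown below. The channel counts match those discussed next, but the input size and number of classes are assumptions rather than the exact values used in train.py:

```python
from tensorflow.keras import layers, models

# Number of gestures to recognize; an assumption for illustration.
num_classes = 2

# LeNet-like pattern: blocks of convolution + activation + max pooling,
# followed by dense layers. The 64x64 RGB input shape is an assumption;
# match it to whatever size your preprocessing produces.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```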
You can increase or decrease the capability of the network by raising or lowering the number of channels (the first argument to each Conv2D call) and the size of the dense layer. In the code shown, these are set to 32, 32, 64, and 64 respectively. To detect just one gesture (such as pointing at a light to turn it on or off), a network using 16, 16, 32, 16 trains and runs twice as fast with no loss of accuracy, so feel free to experiment with these values.
Once a good model has been found, it can be instructive to come back and explore the effect of different activation functions such as selu, the amount of dropout used, adding batch normalization or extra layers, and so on. In general this is not necessary: the defaults above already work well across multiple tasks.
The train.py script used in the previous tutorial loaded all the data into memory before training, which limited the number of images that could be used. This version uses Keras' ImageDataGenerator to stream the images from disk without loading them all at once:
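A sketch of how this streaming might be set up follows. The helper name, directory arguments, batch size, and image size here are illustrative assumptions, not the exact code in train.py:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def make_generators(train_dir, val_dir, image_size=(64, 64), batch_size=32):
    """Stream images from disk instead of loading them all into memory.

    train_dir and val_dir are directories with one subfolder per class,
    e.g. the day1/ and val_day1/ folders passed to train.py.
    """
    train_datagen = ImageDataGenerator(
        rescale=1.0 / 255,
        shear_range=0.2,  # randomly shear each image by up to 20%
        zoom_range=0.2,   # randomly zoom each image by up to 20%
    )
    # Validation images are only rescaled, never augmented.
    val_datagen = ImageDataGenerator(rescale=1.0 / 255)

    train_gen = train_datagen.flow_from_directory(
        train_dir, target_size=image_size,
        batch_size=batch_size, class_mode="categorical")
    val_gen = val_datagen.flow_from_directory(
        val_dir, target_size=image_size,
        batch_size=batch_size, class_mode="categorical")
    return train_gen, val_gen
```

The resulting generators are passed to model.fit in place of in-memory arrays, so only one batch of images is decoded at a time.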
This code also uses ImageDataGenerator's data augmentation to randomly shear and zoom each image in the training set by up to 20% each time it is seen by the neural network. This helps to make sure the neural network does not overfit to specific locations or sizes without having to move the camera between each recording.
When training a convolutional neural network from scratch instead of just fitting a classifier to features as in the previous tutorial, it helps to use a few extra tricks:
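One way these tricks might look in Keras is sketched below. The checkpoint path and patience values mirror the surrounding description, but the remaining arguments are assumptions:

```python
from tensorflow.keras.callbacks import (
    ModelCheckpoint, EarlyStopping, ReduceLROnPlateau)

callbacks = [
    # Overwrite day1/model.h5 only when the validation metric improves,
    # so the saved file is always the best model seen so far.
    # (Older Keras versions name this metric "val_acc".)
    ModelCheckpoint("day1/model.h5", monitor="val_accuracy",
                    save_best_only=True),
    # Stop once validation performance has not improved for 10 epochs.
    EarlyStopping(monitor="val_accuracy", patience=10),
    # Reduce the learning rate when training loss levels off;
    # the factor and patience here are illustrative.
    ReduceLROnPlateau(monitor="loss", factor=0.5, patience=3),
]
# These are then passed to model.fit(..., epochs=100, callbacks=callbacks).
```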
The three callbacks shown here each help improve training and generalization performance:
- ModelCheckpoint: this ensures that the final model.h5 file saved is the one with the best score on the validation dataset, even if the model overfit the training data in subsequent epochs.
- EarlyStopping: this stops training after validation performance does not improve for more than 10 epochs to help prevent overfitting the validation dataset itself.
- ReduceLROnPlateau: this decreases the learning rate when training performance levels off and improves the overall performance of the model without having to fine-tune the learning-rate parameter.
You can explore the source code of
train.py to see how these pieces fit together.