Test the network on held-out data

To evaluate whether a model will work in the real world, it is useful to build a test dataset that is kept separate from the training data.

To do this, modify record.py and initialize the camera with training_mode set to False instead of True, like this:

[Screenshot: setting training_mode to False in record.py]

Now use it to record a single set of all the gestures you want to be recognized:

python record.py test1 -1

The -1 tells record.py to keep recording until you stop it by pressing Ctrl-C. If you prefer, you can provide a fixed number of seconds instead.

There are two important features of this test data:

  1. The training process of the network should never see it, not even as validation data.
  2. It should be different from the training data in the same way that real-world usage might be different.

To test a network that should work whatever clothing you are wearing, record the test set wearing different clothes from those worn in the training set. Likewise, if you want the network to generalize to people it has not seen before, make sure the test set includes a person who does not appear in the training data.
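The first requirement, that training never sees the test data, can be partly checked automatically. The following is a minimal sketch, not part of the project's scripts. It assumes the training and test recordings are stored as files in two separate directories (for example `day1` and `test1`), and the helper names `file_digest` and `find_leaked_files` are hypothetical. It only detects byte-identical files, so it cannot tell you whether the test set is different enough in the real-world sense described above.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_leaked_files(train_dir, test_dir):
    """Return the test files whose contents also appear in train_dir.

    Hypothetical helper: the directory layout is an assumption,
    not something record.py is known to guarantee.
    """
    # Hash every file found anywhere under the training directory.
    train_digests = {
        file_digest(p) for p in Path(train_dir).rglob("*") if p.is_file()
    }
    # Report each test file whose hash matches a training file.
    return sorted(
        str(p)
        for p in Path(test_dir).rglob("*")
        if p.is_file() and file_digest(p) in train_digests
    )
```

Calling `find_leaked_files("day1", "test1")` should return an empty list if no recording was copied from one set into the other.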

Use the test.py script to load the model from the training directory and run each image in the test directory through it:

python test.py day1 test1

Incorrectly predicted images are listed along with their expected class and the probabilities the network predicted, allowing you to open the files and start to understand why the neural network did not recognize them. After testing on all images, the prediction accuracy for each class as well as overall is printed:

[Screenshot: terminal showing prediction accuracy in percent]
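To make the report concrete, here is a rough sketch of how such a summary could be computed. It is not the actual code of test.py. It assumes each result is available as a `(filename, expected_class, probabilities)` tuple with one probability per class, and the function name `summarize_predictions` is hypothetical.

```python
from collections import Counter


def summarize_predictions(results):
    """Summarize classification results.

    results: iterable of (filename, expected_class, probabilities),
    where probabilities holds one value per class (assumed format).
    Returns (errors, per_class_accuracy, overall_accuracy).
    """
    total = Counter()    # number of test images per expected class
    correct = Counter()  # number of correct predictions per class
    errors = []          # details of every misclassified image
    for filename, expected, probabilities in results:
        # The predicted class is the index of the highest probability.
        predicted = max(range(len(probabilities)),
                        key=lambda i: probabilities[i])
        total[expected] += 1
        if predicted == expected:
            correct[expected] += 1
        else:
            errors.append((filename, expected, predicted, list(probabilities)))
    per_class = {c: correct[c] / total[c] for c in sorted(total)}
    overall = sum(correct.values()) / sum(total.values()) if total else 0.0
    return errors, per_class, overall
```

Keeping the full probability vector for each error, rather than only the predicted class, makes it easier to see whether the network was confidently wrong or merely undecided between two classes.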

Note: Some terminals, such as iTerm on the Mac, interpret the filenames as links, so you can click on them to open the image directly.

For the model we tested, most of the errors were in predicting class 1 (opening and closing the door). Manual inspection of these images shows that in every case the door is very slightly open, so the classification label in the test set is arguably incorrect.
