Set up your camera and environment
Neural networks can learn to detect and generalize small gestural changes that only affect a few dozen pixels on the screen, but it takes much more data and training time than we want to spend.
To make it easier for the neural network to learn from limited data, place the camera so that:
- Your gestures are roughly in the center of the camera view.
- You are close enough to the camera that your gesture affects a significant number of pixels.
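As a rough way to judge whether a gesture affects "a significant number of pixels", you could compare a frame with your hands down against a frame with your hands up and count how many pixels change. The sketch below is illustrative only: `changed_fraction` and the threshold of 30 are assumptions and are not part of the scripts in this guide. Frames are represented as nested lists of grayscale values from 0 to 255.

```python
def changed_fraction(frame_a, frame_b, threshold=30):
    """Return the fraction of pixels that differ by more than `threshold`.

    Hypothetical helper: frames are equally sized nested lists of
    grayscale values (0-255). The threshold is an arbitrary assumption.
    """
    total = 0
    changed = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > threshold:
                changed += 1
    return changed / total if total else 0.0


# Toy 4x4 frames: a dark scene, then the same scene with a bright
# 2x2 patch standing in for raised hands.
hands_down = [[10] * 4 for _ in range(4)]
hands_up = [row[:] for row in hands_down]
for y in (0, 1):
    for x in (1, 2):
        hands_up[y][x] = 200

fraction = changed_fraction(hands_down, hands_up)
print(f"{fraction:.0%} of pixels changed")
```

If the fraction for your real frames is very small, moving closer to the camera is likely to help.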
A good rule is to place the camera in front of you so that your hands can reach the edges of the frame when you stretch them out. To preview what the record and run scripts will see, use the preview.py script with this command:
The setup in this example preview is acceptable:
It does not matter whether the camera is the right way up, as long as its position remains consistent. For best results, ensure that:
- There is not too much activity happening in the background.
- There is good lighting and the camera has a good view of the subject.
- Your arms are in the picture even when they are extended.
When you are happy with the positioning of the camera and the lighting, either close the preview window or press Ctrl+C at the console to exit the preview app.
Teaching by example
An AI is only as good as its data. When we train an AI, we teach it by example. In this case, you are giving the AI one set of pictures with your hands up and another set of pictures with your hands down. Each frame of video is one of these pictures.
Ideally, the AI learns that the principal difference between the two sets of pictures is the position of your hands. However, the AI might learn to recognize other things, for example:
- The angle at which you are leaning, because people often lean back slightly when they are seated and raise their hands.
- The overall brightness of the image, because your raised hands might cast a shadow on the camera.
- Changes in the background, for example a monitor whose display changes while someone behind you is working in Word or watching YouTube.
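One of these unintended cues, overall brightness, is easy to check for before training. The following is a minimal sketch, assuming each recording is available as a list of grayscale frames (nested lists of values from 0 to 255). The helper names are hypothetical and do not come from the scripts in this guide.

```python
from statistics import mean


def mean_brightness(frame):
    """Average grayscale value of a single frame."""
    return mean(pixel for row in frame for pixel in row)


def brightness_gap(frames_a, frames_b):
    """Difference in average brightness between two sets of frames."""
    return (mean(mean_brightness(f) for f in frames_a)
            - mean(mean_brightness(f) for f in frames_b))


# Toy 2x2 frames: the hands-up frames are darker overall, as if
# raised hands were casting a shadow on the camera.
hands_up_frames = [
    [[90, 90], [90, 90]],
    [[110, 110], [110, 110]],
]
hands_down_frames = [
    [[140, 140], [140, 140]],
    [[160, 160], [160, 160]],
]

gap = brightness_gap(hands_up_frames, hands_down_frames)
print(f"Brightness gap (up minus down): {gap}")
```

A large gap suggests that the network could separate the two sets by brightness alone, so it may be worth adjusting the lighting or recording more varied examples.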
To help you get good results, here are some tips:
- Record variations of the gesture. With your hands in the air, lean left and right, wave them higher and lower, and twist left and right. Recording variations helps the network to recognize more of the slight changes that it will see in the real world.
- Record the same variations without the gesture. Try to do the same actions but with your hands down. This teaches the neural network that the chief difference between cheering and not cheering is the position of your hands, and nothing else.
- Record random things that you want the neural network to ignore. These can include things like scratching your head, standing up and walking away, jumping up and down, hiding, and covering the camera.
The most important thing is to be consistent. If you are teaching the network to detect your hands in the air, keep them up in every frame of the hands-in-the-air recording. Vary the position and angle of your hands, but do not lower them all the way down and then put them up again. If you do, the recording will include frames that look just like the hands-down frames. The network might then assume that the difference between the two sets is something unrelated, for example how your clothing is folded or how lights reflect off your glasses.
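A crude way to spot inconsistent frames is to flag any frame in one recording that looks more like the average frame of the other recording than like the average of its own. This is only a heuristic sketch under simplifying assumptions (grayscale frames as nested lists, sum of absolute differences as the distance). The function names are hypothetical and are not part of the scripts in this guide.

```python
def mean_frame(frames):
    """Average a list of equally sized grayscale frames pixel by pixel."""
    count = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / count for x in range(width)]
            for y in range(height)]


def distance(frame, reference):
    """Sum of absolute pixel differences between a frame and a reference."""
    return sum(abs(p - r)
               for row, ref_row in zip(frame, reference)
               for p, r in zip(row, ref_row))


def suspicious_frames(recording, other_recording):
    """Indices of frames that are closer to the other recording's average."""
    own_mean = mean_frame(recording)
    other_mean = mean_frame(other_recording)
    return [i for i, frame in enumerate(recording)
            if distance(frame, other_mean) < distance(frame, own_mean)]


# Toy 1x2 frames: the first pixel is bright when the hands are up.
# The third hands-up frame was captured with the hands lowered.
hands_up = [[[200, 10]], [[210, 10]], [[10, 10]]]
hands_down = [[[10, 10]], [[12, 10]]]

flagged = suspicious_frames(hands_up, hands_down)
print("Frames to review:", flagged)
```

Flagged frames are only candidates for review. A real recording varies far more than this toy data, so inspect the frames yourself before discarding any.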