Set up your camera and environment

Neural networks can learn to detect and generalize small gestural changes that only affect a few dozen pixels on the screen, but it takes much more data and training time than we want to spend.

To make it easier for the neural network to learn from limited data, place the camera so that:

  • Your gestures are roughly in the center of the camera's view.
  • You are close enough that a significant number of pixels are affected by the gesture.
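
As a rough way to check the two placement conditions above, the sketch below compares a "hands down" frame with a "hands up" frame and reports what fraction of pixels changed and where the change is centered. It is only an illustration, not part of the tutorial's scripts: the function names, the threshold of 30, and the representation of a frame as a 2D list of grayscale values (0 to 255) are all assumptions.

```python
def changed_fraction(frame_a, frame_b, threshold=30):
    """Fraction of pixels whose brightness differs by more than `threshold`.

    Frames are assumed to be equal-sized 2D lists of grayscale values (0-255).
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same height")
    total = 0
    changed = 0
    for row_a, row_b in zip(frame_a, frame_b):
        if len(row_a) != len(row_b):
            raise ValueError("frames must have the same width")
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > threshold:
                changed += 1
    return changed / total if total else 0.0


def change_centroid(frame_a, frame_b, threshold=30):
    """Centroid (x, y) of the changed pixels, normalized to the range 0..1.

    Returns None when no pixel changed. A value near (0.5, 0.5) suggests
    the gesture sits roughly in the center of the camera's view.
    """
    xs, ys = [], []
    for y, (row_a, row_b) in enumerate(zip(frame_a, frame_b)):
        for x, (a, b) in enumerate(zip(row_a, row_b)):
            if abs(a - b) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    height = len(frame_a)
    width = len(frame_a[0])
    # Add 0.5 so each pixel is represented by its center, not its corner.
    return ((sum(xs) / len(xs) + 0.5) / width,
            (sum(ys) / len(ys) + 0.5) / height)


if __name__ == "__main__":
    # Hypothetical 4x4 frames: the gesture brightens the central 2x2 block.
    hands_down = [[0] * 4 for _ in range(4)]
    hands_up = [row[:] for row in hands_down]
    for y in (1, 2):
        for x in (1, 2):
            hands_up[y][x] = 255
    print(changed_fraction(hands_down, hands_up))  # 0.25
    print(change_centroid(hands_down, hands_up))   # (0.5, 0.5)
```

If the changed fraction is tiny, or the centroid sits near an edge of the frame, moving the camera closer or re-aiming it is likely to help.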

A good rule of thumb is to place the camera in front of you so that your hands can reach the edges of the frame when they are outstretched. To preview what the record and run scripts will see, use the script with this command:


The setup in this example preview is acceptable:

It does not matter whether the camera is the right way up, as long as its orientation stays the same between recording and running. For best results, ensure that:

  • There is not too much activity happening in the background.
  • There is good lighting and a good view of the subject.
  • Your arms are in the picture even when they are extended.

When you are happy with the positioning of the camera and the lighting, either close the window or press Ctrl-C at the console to exit the preview app.

Teaching by example

An AI is only as good as its data! When we train an AI, we teach it by example. In this case, you are giving it one set of pictures with your hands up and another set with your hands down. Each frame of video is one of these pictures.

Ideally, the AI learns that the principal difference between these sets of pictures is the position of your hands. However, depending on the examples, it might learn to recognise coincidental things, such as:

  • The angle at which you are leaning, as people tend to lean back slightly when raising their hands while seated.
  • Whether the overall lighting is brighter or dimmer, if your raised hands cast a shadow on the camera.
  • Whether someone in the background is working in Word or watching YouTube, if their monitor is visible and its contents happened to change when you changed position.

Here are some tips to help you get good results:

  • Record variations of the gesture. With your hands in the air lean left and right, wave them higher and lower, twist left and right. Recording variations helps the network recognize more of the slight changes it will see in the real world.
  • Record the same variations without the gesture. Try to do the same actions but with your hands down. This teaches the neural network that the chief difference between cheering and not cheering is the position of your hands and nothing else.
  • Record random things you want ignored. These can include things like scratching your head, standing up and walking away, jumping up and down, hiding, and covering the camera.

Above all, be consistent. If you are teaching the network to detect your hands in the air, keep them up there in every frame of the "hands in the air" recording. Vary the position and angle, but do not take them all the way down and then put them up again, or you will include frames that look just like the "hands down" frames. This forces the network to assume that the difference in those cases is something incidental, for example, how your clothing is folded or the reflection of light on your glasses.
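
To make "teaching by example" concrete, the sketch below shows one way the recordings could be turned into labeled training examples, where every frame inherits the label of the recording it came from. This is an assumption-laden illustration rather than the tutorial's actual record or train scripts: the recording names ("hands_up", "hands_down", "random") and both function names are hypothetical.

```python
from collections import Counter


def label_frames(recordings, positive="hands_up"):
    """Flatten named recordings into a list of (frame, label) pairs.

    `recordings` maps a recording name to its list of frames. Every frame
    in the `positive` recording gets label 1; frames in all other
    recordings get label 0. A stray "hands down" frame inside the positive
    recording would therefore be mislabeled as 1.
    """
    if positive not in recordings:
        raise KeyError(f"no recording named {positive!r}")
    examples = []
    for name, frames in recordings.items():
        label = 1 if name == positive else 0
        examples.extend((frame, label) for frame in frames)
    return examples


def class_counts(examples):
    """Count how many examples carry each label."""
    return Counter(label for _, label in examples)


if __name__ == "__main__":
    # Strings stand in for real image frames in this toy example.
    recordings = {
        "hands_up": ["up_1", "up_2"],
        "hands_down": ["down_1"],
        "random": ["rand_1", "rand_2", "rand_3"],
    }
    examples = label_frames(recordings)
    print(class_counts(examples))
```

Because the label comes from the recording as a whole and not from what each frame shows, lowering your hands partway through the "hands in the air" recording produces frames that look like "hands down" but are labeled as the gesture.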

Train the AI

Record data