Overview

This guide is the second episode in our series on how to teach your Raspberry Pi gesture recognition using a neural network.

Whereas Episode 1 shows you how to train a neural network on a Raspberry Pi to recognize one simple gesture, this guide shows you how to implement a neural network on your Pi that recognizes multiple gestures.

First, you will record a sequence of gestures and manually classify them. The footage is saved as a series of image files directly to the filesystem, which avoids having to hold the whole dataset in memory. These images become the training and validation datasets you'll use to train a neural network from scratch, which you can then compare against previously trained models. We will introduce some techniques that improve a neural network's gesture-recognition performance with larger datasets and in different lighting conditions, and then explain how to deploy your new network and integrate it with speech, music, and light control.
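As a rough illustration of the recording step, the sketch below captures frames straight to disk, one folder per gesture label. It assumes the `picamera` library; the label name, folder layout, resolution, and frame count are all hypothetical placeholders, not the exact values used later in this guide.

```python
# Minimal sketch: record gesture frames directly to the filesystem,
# one directory per gesture label, so RAM usage stays flat no matter
# how many images you record. Assumes the picamera library.
import os
import time
from picamera import PiCamera

LABEL = "swipe_left"  # hypothetical gesture label
OUT_DIR = os.path.join("dataset", "train", LABEL)
os.makedirs(OUT_DIR, exist_ok=True)

camera = PiCamera(resolution=(128, 128), framerate=10)
time.sleep(2)  # let the sensor warm up and auto-expose

# capture_continuous streams numbered JPEGs straight to disk
pattern = os.path.join(OUT_DIR, "img{counter:04d}.jpg")
for i, filename in enumerate(camera.capture_continuous(pattern)):
    if i >= 99:  # stop after 100 frames
        break
camera.close()
```

You can then sort or prune the saved images by hand to build the manually classified training and validation sets.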

This demo video shows the resulting system. Here you can see a user's gestures controlling a lamp, music playback and volume, triggering an audio speech file, and locking/unlocking a PC:


The neural network demonstrated in the video distinguishes between multiple gestures and works across varied lighting conditions and clothing. It runs on a single Raspberry Pi 3 with a connected ZeroCam, positioned in the corner of the room and out of shot.

The limitations of the simple approach used in Episode 1 were:

  • Restricted dataset size - all the images needed to be loaded into RAM before training (a disk-streaming workaround is sketched after this list).
  • Limited sensitivity - the pretrained MobileNet was not well-suited to recognizing gestures that occupy only a few percent of the pixels or are not in the middle of the frame.
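
One common way to lift the RAM restriction is to stream batches from disk instead of loading every image up front. The sketch below assumes a TensorFlow/Keras setup and the `dataset/train` and `dataset/val` folder layout used above; the image size, batch size, and paths are illustrative, not the guide's exact configuration.

```python
# Sketch: stream training batches from the filesystem rather than
# loading the whole dataset into RAM. Assumes TensorFlow/Keras and
# one subdirectory per gesture class under dataset/train and dataset/val.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255)  # normalize pixels to [0, 1]

train_batches = datagen.flow_from_directory(
    "dataset/train",
    target_size=(128, 128),   # images are resized on the fly
    batch_size=32,
    class_mode="categorical", # one class per gesture folder
)
val_batches = datagen.flow_from_directory(
    "dataset/val",
    target_size=(128, 128),
    batch_size=32,
    class_mode="categorical",
)

# Training then consumes batches lazily, e.g.:
# model.fit(train_batches, validation_data=val_batches, epochs=10)
```

Because each batch is read and decoded only when needed, the dataset can grow far beyond available RAM.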

This episode addresses these limitations and outlines the process of designing and training neural networks for a specific task.
