Record data

In Episode 1, images are converted to a list of numpy arrays, which are written to disk at the end of the program. This enables a higher frame rate on low-end Pi Zero hardware, but it limits the number of images that can be stored to the amount of RAM available. In this guide, we are going to write out each frame as a PNG (Portable Network Graphics) file instead.
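The change from buffering to per-frame writes can be sketched as follows. This is a hypothetical, standard-library-only illustration (the real script presumably uses a camera library and an imaging library such as Pillow or OpenCV): each frame is encoded and flushed to disk as soon as it is captured, so memory use stays flat however long the recording runs.

```python
import os
import struct
import zlib

def write_png(path, rows, width, height):
    """Write 8-bit RGB pixel rows as a minimal valid PNG file."""
    def chunk(tag, data):
        body = tag + data
        return (struct.pack(">I", len(data)) + body
                + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF))

    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + row for row in rows)
    with open(path, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")                      # PNG signature
        f.write(chunk(b"IHDR", struct.pack(">IIBBBBB",
                width, height, 8, 2, 0, 0, 0)))            # 8-bit RGB
        f.write(chunk(b"IDAT", zlib.compress(raw)))
        f.write(chunk(b"IEND", b""))

def record(out_dir, frames):
    """Write each captured frame to disk immediately, not at program exit."""
    os.makedirs(out_dir, exist_ok=True)
    for i, (rows, w, h) in enumerate(frames):
        write_png(os.path.join(out_dir, "%06d.png" % i), rows, w, h)
```

Here `frames` is a stand-in iterable; on the Pi it would be the camera capture loop.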

Record video and classify actions

Start by setting up your filming environment. We recommend doing a test recording to ensure that your camera is positioned correctly to capture your actions.

Define your actions, or use the ones from our demo.

For our demo video, these classes were used:

Class  Action
0      None - moving around, sitting, putting on and taking off a coat
1      Door - coming in or leaving through the door
2      Light - pointing at the main light to turn it on or off
3      Music - holding both hands up to start the music playing
4      Stop - holding up one hand to dim the volume of the music

Rehearse a short sequence that encompasses all of the actions.

To film all of your gestures in one recording and write each frame out as a PNG file:

  1. Record all the data in one go:
    python day1 -1

    The -1 tells the script to keep running until you stop it with Ctrl-C.

  2. After the images have been saved in day1/*.png, run the classify script:
    python day1
    [Screenshot: image classification with progress shown in percent]
  3. Use these keyboard controls to classify the images:
  • 0-4: Classify the current image as '0' through '4'.
  • 9: Classify the next 10 images as '0' (useful as there are often a lot of "normal" images in a row).
  • Escape: Go back one image.
  • Space: Toggle viewer brightness to make human classification easier (does not modify the image file).
  • S: Move classified images to their new directories and quit.

    You can close the window without pressing S to exit without moving any files. This is not very user-friendly but provides just enough interface to perform the task.

    If you interrupted recording with Ctrl-C, the last image file is usually incomplete and cannot be loaded. In that case, the classify script shows an error message in the title bar for that image and does not move it to a classification directory. You can delete the image manually after confirming that it is unreadable.
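A quick way to spot such a truncated file without opening it in a viewer is to check for the PNG signature at the start and the IEND chunk at the end: a complete PNG always terminates with the 8-byte IEND chunk. A small sketch (the function names here are my own, not from the tutorial's scripts):

```python
import glob

# A well-formed PNG starts with this signature and ends with an IEND chunk.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PNG_END = b"\x00\x00\x00\x00IEND\xaeB`\x82"

def png_is_complete(path):
    """Return True if the file has both the PNG signature and a final IEND chunk."""
    with open(path, "rb") as f:
        data = f.read()
    return data.startswith(PNG_SIGNATURE) and data.endswith(PNG_END)

def find_truncated(directory):
    """List PNG files in a directory that look incomplete."""
    return [p for p in sorted(glob.glob(directory + "/*.png"))
            if not png_is_complete(p)]
```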

What are we doing here?

Writing out each frame as a PNG file is a widely used technique which is compatible with the Keras tools for streaming data from disk during training. It allows easy inspection and curation of the training data, runs at 11 FPS, and generates approximately 700 MB of data per hour. This combination of good performance and easy access to the data makes this suitable for training a network from scratch with moderate amounts of data.

The standard way to store this data is in the following format:

    DATA_DIR/CLASS/*.png

Here, DATA_DIR is a directory for this set of data (for example, the recordings from one particular day) and CLASS is the action class that the images represent. 
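This is the same layout that Keras's flow_from_directory expects, where each subdirectory name becomes a class label. A standard-library-only sketch of reading the layout back into (filename, class) pairs for training (the directory names are illustrative):

```python
from pathlib import Path

def list_labelled_images(data_dir):
    """Walk DATA_DIR/CLASS/*.png and return (path, class) pairs."""
    pairs = []
    for class_dir in sorted(Path(data_dir).iterdir()):
        if class_dir.is_dir():
            # The subdirectory name is the class label for every image inside it.
            for image in sorted(class_dir.glob("*.png")):
                pairs.append((str(image), class_dir.name))
    return pairs
```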

Recording the data is straightforward, as seen in the script:

[Code listing: the section of the script that records the data]

For our demo video, 5 classes were used - one for each action. 

You can also record classes individually, for example by running:

python day1/0 60

This will record 1 minute of images representing the None action (class 0). However, it is more efficient to record multiple actions together in a natural sequence, such as entering the room, pointing at the light, sitting down, getting up, and then leaving the room. When doing this, you need to label the images with their class after recording ends and split them into separate directories, as described in the classification steps above.
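The splitting step amounts to moving each file into the subdirectory named after its class. A minimal sketch of that move, assuming you have already decided a label for each file (the labels mapping here is hypothetical; the classify script's S key performs the same operation):

```python
import os
import shutil

def split_into_classes(data_dir, labels):
    """Move each image into DATA_DIR/<class>/ according to its label.

    labels maps a filename (relative to data_dir) to a class string, e.g. "0".
    """
    for name, cls in labels.items():
        class_dir = os.path.join(data_dir, cls)
        os.makedirs(class_dir, exist_ok=True)
        # Move the file into its class subdirectory.
        shutil.move(os.path.join(data_dir, name),
                    os.path.join(class_dir, name))
```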

Advanced information

There are several other ways of collecting and storing your data, each with their own strengths and weaknesses. Here are some of these alternatives, for future reference:

  • Run the MobileNet feature extractor during recording and only save its output features

    With a slightly modified feature extractor (an AvgPool layer at the end to reduce the output features to just 256 floats) this generates around 27 MB of data per hour of recorded video.

    This is very efficient and eliminates any concerns about memory usage. However, it reduces the frame rate to 7 FPS and as only the features are stored and not the images there is no way to examine the recorded images to understand problems that occur during training.

    This method forces you to stick with one feature extractor; to train a new neural network from scratch or try a different feature extractor you will have to recollect all the data.

  • Use HDF5 to read and write files

    The Hierarchical Data Format (HDF5) is designed to store and organize large amounts of data. Python's h5py module can be used to append frames to a file directly, rather than buffering them all in memory. This keeps access to the raw images whilst avoiding the need to hold many images in memory during recording or training.

    This reduces the frame rate to 8 FPS on a Raspberry Pi 3 and generates around 8 GB of data per hour of recorded video. This is storage-intensive and the images are not particularly convenient to inspect while debugging training problems.

  • Record video and post-process it to extract frames

    This is the most efficient option and allows a high frame rate to be captured and stored. When recording very long sequences, we recommend considering this option.

    However, there are a few issues to be aware of. Firstly, video compression performs worse if you randomly vary the camera's white balance to add robustness to different lighting conditions, because consecutive frames then differ more than the codec expects. Secondly, lossy compression can introduce artifacts into individual frames that will not be present in the "real" data the network sees when it is deployed. This can, for example, teach a network to identify objects by the compression artifacts their patterns cause. If possible, it is best to avoid such complexities.
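The storage figures quoted above are easy to sanity-check. For the feature-extractor option, 256 float32 values per frame at 7 FPS works out close to the roughly 27 MB per hour mentioned earlier (the exact number depends on file-format overhead):

```python
# Back-of-the-envelope check of the feature-extractor storage figure.
floats_per_frame = 256        # after the extra AvgPool layer
bytes_per_float = 4           # float32
fps = 7                       # frame rate with the extractor in the loop
seconds_per_hour = 3600

bytes_per_hour = floats_per_frame * bytes_per_float * fps * seconds_per_hour
print(bytes_per_hour / 1e6)   # ~25.8 MB/hour, close to the quoted ~27 MB
```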
