In Episode 1, images are converted to a list of numpy arrays, which are written to disk at the end of the program. This enables a higher frame rate on low-end Pi Zero hardware, but the number of images that can be stored is limited by the amount of available RAM. In this guide, we are going to write out each frame as a PNG (Portable Network Graphics) file instead.
Record video and classify actions
Start by setting up your filming environment. We recommend doing a test recording to ensure that your camera is positioned correctly to capture your actions.
Define your actions, or use the ones from our demo.
For our demo video, these classes were used:
| Class | Action |
|---|---|
| 0 | None - moving around, sitting, putting on and taking off coat |
| 1 | Door - coming in or leaving through the door |
| 2 | Light - pointing at the main light to turn it on/off |
| 3 | Music - holding both hands up to start the music playing |
| 4 | Stop - holding up one hand to dim the volume of the music |
Rehearse a short sequence that encompasses all of the actions.
To film all of your gestures in one recording and write out the output as PNG files:
- Record all the data in one go:
python record.py day1 -1
The -1 tells the record.py script to keep running until you kill it with Ctrl-C.
- After the images have been saved in day1/*.png, run the classify script:
python classify.py day1
- Use these keyboard controls to classify the images:
- 0-4: Classify the current image as '0' through '4'.
- 9: Classify the next 10 images as '0' (useful as there are often a lot of "normal" images in a row).
- Escape: Go back one image.
- Space: Toggle viewer brightness to make human classification easier (does not modify the image file).
- S: Move classified images to their new directories and quit.
You can close the window without pressing S to exit without moving any files. The interface is not very user-friendly, but it is just enough to perform the task.
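The keyboard controls above can be thought of as a small state update on a "current image" index and a table of labels. The following is a minimal sketch of that logic only, not the actual classify.py code. The function name `apply_key`, the key names, and the choices that 9 starts at the current image and that Escape keeps existing labels are all assumptions made for illustration.

```python
from typing import Dict


def apply_key(key: str, index: int, labels: Dict[int, int], total: int) -> int:
    """Update `labels` for one key press and return the new current index.

    Hypothetical helper: `labels` maps image index -> class number.
    """
    if key in {"0", "1", "2", "3", "4"} and index < total:
        # Classify the current image and advance to the next one.
        labels[index] = int(key)
        return index + 1
    if key == "9":
        # Assumption: "next 10 images" starts at the current image.
        end = min(index + 10, total)
        for i in range(index, end):
            labels[i] = 0
        return end
    if key == "escape":
        # Assumption: going back keeps the old label until it is overwritten.
        return max(index - 1, 0)
    # Any other key leaves the state unchanged.
    return index
```

Keeping the labels in memory and only moving files on S matches the behaviour described above, where closing the window leaves the files untouched.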
If you interrupted record.py with Ctrl-C, the last image file is usually incomplete and cannot be loaded. In that case, the classify script shows an error message in the title bar for that image and does not move it to a classification directory. You can delete this image manually after confirming that it is unreadable.
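One way to spot such a file before classifying is to check that it starts with the PNG signature and ends with the IEND chunk that closes every complete PNG. This is a rough heuristic sketch rather than a full validation, and the helper names `looks_complete` and `find_incomplete` are hypothetical; they are not part of record.py or classify.py.

```python
from pathlib import Path
from typing import List

# The fixed 8-byte signature at the start of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# A complete PNG ends with an empty IEND chunk: length 0, type "IEND", CRC.
PNG_TRAILER = bytes.fromhex("0000000049454e44ae426082")


def looks_complete(path: Path) -> bool:
    """Heuristic: True if the file has a PNG signature and an IEND trailer."""
    data = path.read_bytes()
    return data.startswith(PNG_SIGNATURE) and data.endswith(PNG_TRAILER)


def find_incomplete(data_dir: str) -> List[Path]:
    """Return the *.png files in data_dir that look truncated."""
    return sorted(p for p in Path(data_dir).glob("*.png") if not looks_complete(p))
```

For example, `find_incomplete("day1")` would list candidate files to inspect by hand. The sketch only reports them and deliberately does not delete anything.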
What are we doing here?
Writing out each frame as a PNG file is a widely used technique which is compatible with the Keras tools for streaming data from disk during training. It allows easy inspection and curation of the training data, runs at 11 FPS, and generates approximately 700 MB of data per hour. This combination of good performance and easy access to the data makes this suitable for training a network from scratch with moderate amounts of data.
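To get a feel for these figures, the short calculation below derives the number of frames per hour and the average file size per frame from the 11 FPS and 700 MB per hour quoted above. It assumes 1 MB = 1000 KB; the result is an average implied by those two figures, not a measured file size.

```python
FPS = 11            # frame rate quoted above
MB_PER_HOUR = 700   # approximate data rate quoted above

# Frames written in one hour of continuous recording.
frames_per_hour = FPS * 3600

# Average size of one PNG, assuming 1 MB = 1000 KB.
kb_per_frame = MB_PER_HOUR * 1000 / frames_per_hour

print(frames_per_hour)          # 39600
print(round(kb_per_frame, 1))   # 17.7
```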
The standard way to store this data is in the following format:
Here, DATA_DIR is a directory for this set of data (for example, the recordings from one particular day) and CLASS is the action class that the images represent.
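As a quick sanity check on a data set stored this way, the sketch below counts the PNG files for each class. It assumes one subdirectory per class directly inside DATA_DIR (for example day1/0, day1/1 and so on), and `count_images` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict


def count_images(data_dir: str) -> Dict[str, int]:
    """Return {class name: number of PNG files} for each class directory.

    Assumption: each class has its own subdirectory inside data_dir.
    """
    counts: Dict[str, int] = {}
    for class_dir in sorted(Path(data_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = len(list(class_dir.glob("*.png")))
    return counts
```

Calling `count_images("day1")` after classification gives a rough picture of how balanced the classes are, which is worth knowing before training.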
Recording the data is straightforward, as seen in the
For our demo video, 5 classes were used - one for each action.
You can record classes individually, for example by running:
python record.py day1/0 60
This will record 1 minute of images that should represent the None action (class 0). However, it is more efficient to record multiple actions together in a natural sequence, such as entering the room, pointing at the light, sitting down, getting up and then leaving the room. When doing this, you need to label the images with their class after recording ends and split them into separate directories, as described in the next section.
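The splitting step can be sketched as follows, assuming you already have a mapping from image file name to class number. This is an illustration of the idea rather than the code used by classify.py; the function name `split_by_class` and the shape of the `labels` mapping are assumptions.

```python
import shutil
from pathlib import Path
from typing import Dict


def split_by_class(data_dir: str, labels: Dict[str, int]) -> None:
    """Move each labelled image into data_dir/<class>/.

    Hypothetical helper: `labels` maps a file name such as "0001.png"
    to its class number.
    """
    root = Path(data_dir)
    for filename, cls in labels.items():
        src = root / filename
        if not src.is_file():
            # Skip files that are missing or have already been moved.
            continue
        dest_dir = root / str(cls)
        dest_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(dest_dir / filename))
```

Images without a label stay where they are, so an unreadable or unclassified frame is never moved into a class directory by accident.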
There are several other ways of collecting and storing your data, each with its own strengths and weaknesses. Here are some of these alternatives, for future reference: