Download and build the sample application
Install Arm toolchain and Mbed CLI
- Download the Arm cross-compilation toolchain. Select the correct toolchain for the OS that your computer is running. For Windows users, if you have already set up the Linux virtual environment, install the toolchain there.
- To build and deploy the application, we use the Mbed CLI. We recommend that you install Mbed CLI with our installer. If you need more customization, you can perform a manual install, although this is not recommended.
If you do not already have Mbed CLI installed, download the installer:
- After Mbed CLI is installed, tell Mbed where to find the Arm embedded toolchain using the following command:
mbed config -G GCC_ARM_PATH <path_to_your_arm_toolchain>/bin
We recommend running the following commands from inside the Mbed CLI terminal that is launched with the Mbed CLI application. This is much quicker to set up, because it resolves all your environment dependencies automatically.
Build and compile micro speech example
Navigate to the directory where you keep code projects. Run the following command to download the TensorFlow Lite source code:
git clone https://github.com/tensorflow/tensorflow.git
While you wait for the project to download, let us explore the project files on GitHub and learn how this TensorFlow Lite for Microcontrollers example works.
The code samples audio from the microphone on the STM32F7. The audio is run through a Fast Fourier transform to create a spectrogram. The spectrogram is then fed into a pre-trained machine learning model. The model uses a convolutional neural network to identify whether the sample represents either the command “yes” or “no”, silence, or an unknown input. We explore how this works in more detail later in the guide.
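The front end of this pipeline can be sketched on a desktop machine. The following Python sketch is not the code that runs on the board: it uses a naive discrete Fourier transform instead of an optimized FFT, and the frame size, hop size, and function names are assumptions chosen for illustration. It slices a signal into overlapping frames and computes the magnitude of each frequency bin, which is the core idea behind a spectrogram.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Return magnitudes of the first N/2 + 1 DFT bins of a real-valued frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, sample in enumerate(frame):
            acc += sample * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_size=32, hop=16):
    """Slice samples into overlapping frames; one row of magnitudes per frame."""
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        rows.append(dft_magnitudes(samples[start:start + frame_size]))
    return rows


def make_tone(bin_index, length=64, frame_size=32):
    """Hypothetical test signal: a sine that falls exactly on one DFT bin."""
    return [math.sin(2 * math.pi * bin_index * t / frame_size) for t in range(length)]


if __name__ == "__main__":
    for row in spectrogram(make_tone(4)):
        peak = max(range(len(row)), key=row.__getitem__)
        print("strongest bin:", peak)
```

Each row of the result corresponds to one time slice, and each column to one frequency bin. A model that consumes a spectrogram treats this grid much like a small image.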
The micro speech sample application is in the
Here are descriptions of some interesting source files:
- disco_f746ng/audio_provider.cc captures audio from the microphone on the device.
- micro_features/micro_features_generator.cc uses a Fast Fourier transform to create a spectrogram from audio.
- micro_features/tiny_conv_micro_features_model_data.cc is the machine learning model itself, represented by a large array of unsigned char values.
- command_responder.cc is called every time a potential command has been identified.
- main.cc is the entry point for the Mbed program, which runs the machine learning model using TensorFlow Lite for Microcontrollers.
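A model file becomes an array of unsigned char values through a simple byte-to-text conversion, similar to what `xxd -i` produces. The sketch below shows the idea in Python. The function name, the array name g_example_model, and the sample bytes are hypothetical and are not taken from the TensorFlow repository.

```python
def bytes_to_c_array(data, name="g_example_model", per_line=12):
    """Render raw bytes as C source: an unsigned char array plus its length."""
    lines = ["const unsigned char {}[] = {{".format(name)]
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("  " + ", ".join("0x{:02x}".format(b) for b in chunk) + ",")
    lines.append("};")
    lines.append("const int {}_len = {};".format(name, len(data)))
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder bytes standing in for the contents of a real model file.
    print(bytes_to_c_array(bytes([0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33])))
```

Storing the model as a constant array lets it be compiled directly into the firmware image, which suits devices that have no file system.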
After the project has downloaded, you can run the following commands to navigate into the project directory and build it:
cd tensorflow
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=mbed TAGS="CMSIS-NN disco_f746ng" generate_micro_speech_mbed_project
These commands create an Mbed project folder in
The micro speech source code of the generated Mbed project is in tensorflow/lite/micro/tools/make/gen/mbed_cortex-m4/prj/micro_speech/mbed/tensorflow/lite/micro/examples/micro_speech. If you must make further changes to the source code after generating the Mbed project, change the source code in the micro_speech folder.
If you encounter the error message "Tensorflow/lite/micro/tools/make/Makefile:2 *** Require make version 3.82 or later (current 3.81)", refer to the Troubleshooting section.
mbed config root .
TensorFlow requires C++11, so you must update your profiles to reflect this. Here is a short Python command that does that. Run it from the command line:
python -c 'from __future__ import print_function
import fileinput, glob
for filename in glob.glob("mbed-os/tools/profiles/*.json"):
    for line in fileinput.input(filename, inplace=True):
        print(line.replace("\"-std=gnu++98\"", "\"-std=c++11\", \"-fpermissive\""), end="")'
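Before editing the profiles in place, it can help to preview the change. The following sketch applies the same string replacement to text held in memory and reports whether anything changed. The function name and the sample profile fragment are assumptions for illustration; the real profile files may contain other flags.

```python
OLD_FLAG = '"-std=gnu++98"'
NEW_FLAGS = '"-std=c++11", "-fpermissive"'


def patch_profile_text(text):
    """Return (patched_text, changed) without touching any file."""
    patched = text.replace(OLD_FLAG, NEW_FLAGS)
    return patched, patched != text


if __name__ == "__main__":
    # Hypothetical fragment shaped like a compiler-flag list in a profile.
    sample = '"cxx": ["-std=gnu++98", "-fno-rtti"]'
    patched, changed = patch_profile_text(sample)
    print(patched)
    print("changed:", changed)
```

Because the patched text no longer contains the old flag, applying the replacement a second time leaves the text unchanged.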
After that setting is updated, you can compile:
mbed compile -m DISCO_F746NG -t GCC_ARM
In the example above, we generated our project with CMSIS-NN in the TAGS flag, which enables kernel optimization with the CMSIS-NN library. The following are some of the acceleration techniques that CMSIS-NN uses.
The CMSIS-NN library provides optimized neural network kernel implementations for all Arm Cortex-M processors, ranging from Cortex-M0 to Cortex-M55. The library utilizes the capabilities of the processor, such as the DSP extension and the M-Profile Vector Extension (MVE), to enable the best possible performance.
The STMicroelectronics F746NG Discovery board we use in the guide is powered by Arm Cortex-M7, which supports DSP extensions. That enables the optimized kernels to perform multiple operations in one cycle using SIMD (Single Instruction Multiple Data) instructions. Another optimization technique used by the CMSIS-NN library is loop unrolling. These techniques combined significantly accelerate kernel performance on Arm MCUs.
In the following example, we use the SIMD instruction SMLAD (Signed Multiply Accumulate Dual) together with loop unrolling to perform a matrix multiplication y = a * b, where
a = [1, 2]
b = [3, 5
     4, 6]
a, b are 8-bit values and y is a 32-bit value. With regular C, the code would look something like the following:
for (i = 0; i < 2; ++i)
    for (j = 0; j < 2; ++j)
        y[i] += a[j] * b[j][i];
However, using loop unrolling and SIMD instructions, the loop looks like the following code:
a_operand = a[0] | a[1] << 16;  // put a[0], a[1] into one variable
for (i = 0; i < 2; ++i) {
    b_operand = b[0][i] | b[1][i] << 16;  // vice versa for b
    y[i] = __SMLAD(a_operand, b_operand, y[i]);
}
This code saves cycles in two ways: loop unrolling removes the inner for-loop checks, and __SMLAD performs two multiply and accumulate operations in one cycle.
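To check the arithmetic on a desktop machine, the packed computation can be emulated without Arm hardware. The Python sketch below models the arithmetic of SMLAD (two signed 16-bit products added to an accumulator). The helper names are hypothetical, and the sketch ignores the overflow flag of the real instruction. It illustrates the result only, not the cycle savings.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack_halfwords(low, high):
    """Pack two signed values into one 32-bit word, low halfword first."""
    return (low & 0xFFFF) | ((high & 0xFFFF) << 16)


def smlad_emulated(x, y, acc):
    """Emulate SMLAD arithmetic: acc + x.low * y.low + x.high * y.high."""
    low = to_signed16(x) * to_signed16(y)
    high = to_signed16(x >> 16) * to_signed16(y >> 16)
    return acc + low + high


def multiply_packed(a, b):
    """Compute y = a * b for a 1x2 vector a and a 2x2 matrix b."""
    a_operand = pack_halfwords(a[0], a[1])
    y = [0, 0]
    for i in range(2):
        # Column i of b goes into one packed operand.
        b_operand = pack_halfwords(b[0][i], b[1][i])
        y[i] = smlad_emulated(a_operand, b_operand, y[i])
    return y


if __name__ == "__main__":
    print(multiply_packed([1, 2], [[3, 5], [4, 6]]))  # prints [11, 17]
```

For the values above, y[0] = 1 * 3 + 2 * 4 = 11 and y[1] = 1 * 5 + 2 * 6 = 17, each produced by a single emulated SMLAD call.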
With CMSIS-NN enabled, we observed a 16x performance uplift in the micro speech inference time.