Download and build the sample application

Install Arm toolchain and Mbed CLI

  1. Download the Arm cross-compilation toolchain. Select the toolchain that matches your computer's operating system. For Windows users, if you have already set up the Linux virtual environment, install the toolchain there.

  2. To build and deploy the application, we use the Mbed CLI. We recommend that you install Mbed CLI with our installer. If you need more customization, you can perform a manual install, although this is not recommended.

    If you do not already have Mbed CLI installed, download the installer:

    Mac installer

  3. After Mbed CLI is installed, tell Mbed where to find the Arm embedded toolchain using the following command:

    mbed config -G GCC_ARM_PATH <path_to_your_arm_toolchain>/bin


    We recommend running the following commands from inside the Mbed CLI terminal that is launched with the Mbed CLI application. This is much quicker to set up, because it resolves all your environment dependencies automatically.


Build and compile micro speech example

Navigate to the directory where you keep code projects. Run the following command to download TensorFlow Lite source code.

git clone https://github.com/tensorflow/tensorflow.git

While you wait for the project to download, let us explore the project files on GitHub and learn how this TensorFlow Lite for Microcontrollers example works.

The code samples audio from the microphone on the STM32F7. The audio is run through a Fast Fourier transform to create a spectrogram. The spectrogram is then fed into a pre-trained machine learning model. The model uses a convolutional neural network to identify whether the sample represents either the command “yes” or “no”, silence, or an unknown input. We explore how this works in more detail later in the guide.
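As a rough illustration of the first stage of that pipeline, the following Python sketch computes one spectrogram column from a short audio frame using a naive DFT. This is only a conceptual sketch: the frame values and the helper name are invented for illustration, and the on-device code uses an optimized FFT over real microphone samples.

```python
import math

def magnitude_spectrum(frame):
    """Naive DFT: return the magnitude of each frequency bin for one
    audio frame. One such vector per frame, stacked over time, forms
    the spectrogram that is fed to the model."""
    n = len(frame)
    mags = []
    for k in range(n):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

# A constant (DC) frame concentrates all its energy in bin 0.
print(magnitude_spectrum([1.0, 1.0, 1.0, 1.0]))
```

A real frontend also applies windowing and maps the bins onto a mel-frequency scale before quantizing, but the FFT-to-magnitudes step above is the core idea.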

The micro speech sample application is in the tensorflow/lite/micro/examples/micro_speech directory.

Here are descriptions of some interesting source files:

After the project has downloaded, you can run the following commands to navigate into the project directory and build it:

cd tensorflow

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=mbed TAGS="CMSIS-NN disco_f746ng" generate_micro_speech_mbed_project

These commands create an Mbed project folder in tensorflow/lite/micro/tools/make/gen/mbed_cortex-m4/prj/micro_speech/mbed.

The micro speech source code of the generated Mbed project is in tensorflow/lite/micro/tools/make/gen/mbed_cortex-m4/prj/micro_speech/mbed/tensorflow/lite/micro/examples/micro_speech. If you need to make further changes to the source code after generating the Mbed project, edit the source code in this micro_speech folder.

If you encounter the error message "tensorflow/lite/micro/tools/make/Makefile:2: *** Require make version 3.82 or later (current 3.81)", refer to the Troubleshooting section.

cd tensorflow/lite/micro/tools/make/gen/mbed_cortex-m4/prj/micro_speech/mbed
mbed config root .
mbed deploy

TensorFlow requires C++11, so you must update your compiler profiles to reflect this. The following short Python command does that. Run it from the command line:

python -c 'import fileinput, glob;
for filename in glob.glob("mbed-os/tools/profiles/*.json"):
  for line in fileinput.input(filename, inplace=True):
    print(line.replace("\"-std=gnu++98\"","\"-std=c++11\", \"-fpermissive\""), end="")'
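To see what that one-liner actually changes, here is an equivalent sketch using Python's json module. The sample profile fragment is invented for illustration (real Mbed profiles contain more toolchains and fields), but the transformation is the same: swap the gnu++98 flag for c++11 plus -fpermissive in each toolchain's C++ flag list.

```python
import json

def patch_profile(profile_text):
    """Replace the gnu++98 standard flag with c++11 (plus -fpermissive)
    in an Mbed compiler-profile JSON document, returning the new text."""
    profile = json.loads(profile_text)
    for toolchain in profile.values():
        flags = toolchain.get("cxx", [])
        if "-std=gnu++98" in flags:
            idx = flags.index("-std=gnu++98")
            flags[idx:idx + 1] = ["-std=c++11", "-fpermissive"]
    return json.dumps(profile, indent=4)

# Minimal invented example of a profile fragment:
sample = '{"GCC_ARM": {"cxx": ["-std=gnu++98", "-fno-rtti"]}}'
print(patch_profile(sample))
```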

After that setting is updated, you can compile:

mbed compile -m DISCO_F746NG -t GCC_ARM


In the example above, we generated our project with the TAGS="CMSIS-NN disco_f746ng" flag, which enables kernel optimization with the CMSIS-NN library. The following are some of the CMSIS-NN acceleration techniques.

The CMSIS-NN library provides optimized neural network kernel implementations for all Arm Cortex-M processors, ranging from Cortex-M0 to Cortex-M55. The library utilizes the capabilities of the processor, such as the DSP extension and the M-Profile Vector Extension (MVE), to enable the best possible performance.

The STMicroelectronics F746NG Discovery board we use in the guide is powered by Arm Cortex-M7, which supports DSP extensions. That enables the optimized kernels to perform multiple operations in one cycle using SIMD (Single Instruction Multiple Data) instructions. Another optimization technique used by the CMSIS-NN library is loop unrolling. These techniques combined significantly accelerate kernel performance on Arm MCUs.

In the following example, we use the SIMD instruction SMLAD (dual 16-bit signed multiply with accumulate), together with loop unrolling, to perform a matrix multiplication y = a * b, where

a = [1, 2]


b = [3, 5
     4, 6]

a, b are 8-bit values and y is a 32-bit value. With regular C, the code would look something like the following code:

for (i = 0; i < 2; ++i) {
    for (j = 0; j < 2; ++j) {
        y[i] += a[j] * b[j][i];
    }
}

However, using loop unrolling and SIMD instructions, the loop looks like the following code:

a_operand = a[0] | (a[1] << 16);  // pack a[0] (low half) and a[1] (high half) into one 32-bit operand
for (i = 0; i < 2; ++i) {
    b_operand = b[0][i] | (b[1][i] << 16);  // pack column i of b the same way
    y[i] = __SMLAD(a_operand, b_operand, y[i]);
}

This code saves cycles because there are fewer loop-iteration checks, and because __SMLAD performs two multiply-accumulate operations in a single cycle.
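The packed arithmetic above can be checked on a host machine with a small Python emulation of __SMLAD. The helper name smlad is ours; on the Cortex-M7 this is a single hardware instruction.

```python
def smlad(op1, op2, acc):
    """Emulate the Arm SMLAD instruction: multiply the signed 16-bit
    bottom halves and the signed 16-bit top halves of op1/op2, and
    add both products to the accumulator."""
    def low(x):   # signed bottom halfword
        v = x & 0xFFFF
        return v - 0x10000 if v & 0x8000 else v
    def high(x):  # signed top halfword
        return low(x >> 16)
    return acc + low(op1) * low(op2) + high(op1) * high(op2)

a = [1, 2]
b = [[3, 5],
     [4, 6]]

y = [0, 0]
a_operand = (a[0] & 0xFFFF) | ((a[1] & 0xFFFF) << 16)        # pack a[0], a[1]
for i in range(2):
    b_operand = (b[0][i] & 0xFFFF) | ((b[1][i] & 0xFFFF) << 16)  # pack column i of b
    y[i] = smlad(a_operand, b_operand, y[i])

print(y)  # same result as the plain C loop: [11, 17]
```

Each smlad call replaces two multiplies, two adds, and one loop iteration of the plain version, which is where the cycle savings come from.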

With CMSIS-NN enabled, we observed a 16x performance uplift in the micro speech inference time.
