Installing DeepSpeech 2 for Arm

Baidu's DeepSpeech network provides state-of-the-art speech-to-text capabilities. Its PaddlePaddle-based implementation comes with pretrained models that were trained on Baidu's internal English speech dataset of more than 8000 hours. Mandarin versions are also available.

Mozilla hosts a TensorFlow-based version of DeepSpeech, but the model files available for it are trained on small public datasets and offer significantly lower accuracy than Baidu's internally trained ones.

The remainder of this section provides a condensed guide to installing DeepSpeech 2, tested on Ubuntu 16.04 LTS running on a Type 2A server. To install it on another platform, follow Baidu's general installation guide.

Install dependencies

Once you have a working PaddlePaddle installation, install the additional DeepSpeech dependencies. These are mostly audio codecs:

apt-get install -y pkg-config libflac-dev libogg-dev libvorbis-dev libboost-dev libffi-dev
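Before moving on, it can help to confirm that every package in the list was actually installed. The following is a minimal sketch, assuming a Debian-based system where dpkg-query is available. The missing_packages function name is illustrative and is not part of DeepSpeech or PaddlePaddle.

```shell
#!/bin/sh
# Print the name of each package that dpkg does not report as fully installed.
# missing_packages is a hypothetical helper, not part of DeepSpeech.
missing_packages() {
    for pkg in "$@"; do
        # A fully installed package has the status "install ok installed";
        # anything else (unknown, removed, half-configured) is reported.
        dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null \
            | grep -q "install ok installed" || echo "$pkg"
    done
}

# Same package list as the apt-get command above.
missing_packages pkg-config libflac-dev libogg-dev libvorbis-dev libboost-dev libffi-dev
```

If the script prints nothing, all six packages are present. Any name it prints can be passed back to apt-get install.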

Build DeepSpeech

DeepSpeech's requirements.txt file specifies particular scipy and Cython versions, which will automatically be built from source. These builds can take longer than an hour, so while this is happening, download the models (see next section), which also takes a long time.

git clone
cd DeepSpeech
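To see which scipy and Cython entries will drive the long source builds, the requirements file can be inspected from inside the DeepSpeech directory. This is a small sketch only: pinned_build_deps is a hypothetical helper name, and the pattern assumes each requirement sits at the start of its own line, as is usual for pip requirements files.

```shell
#!/bin/sh
# List the scipy and Cython entries in a pip requirements file.
# pinned_build_deps is a hypothetical helper, not part of DeepSpeech.
pinned_build_deps() {
    # -i matches case-insensitively, because pip treats package names that way.
    # The trailing group stops names such as "scipy-extras" from matching.
    grep -iE '^(scipy|cython)([^[:alnum:]_.-]|$)' "$1"
}

# Run from the DeepSpeech checkout; nothing happens if the file is absent.
if [ -f requirements.txt ]; then
    pinned_build_deps requirements.txt || echo "no scipy or Cython entries found"
fi
```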

Download models while building

The two model files are large (400MB and 8GB), so it is useful to start downloading them while the previous build step is in progress. To do this, open a new command line, log in to the server using SSH as before, navigate to the Paddle/build/python/dist directory, and enter:

cd DeepSpeech/models/baidu_en8k
cd ../lm
cd ../../
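Because the downloads total more than 8GB, it may be worth checking free disk space before starting them. The sketch below assumes a POSIX df and uses a 9 GiB threshold, which is an assumed margin for the downloaded files alone; unpacking them may need more. The has_free_space name is hypothetical.

```shell
#!/bin/sh
# Succeed if the filesystem holding directory $1 has at least $2 KiB free.
# has_free_space is a hypothetical helper, not part of DeepSpeech.
has_free_space() {
    # df -Pk prints one POSIX-format line per filesystem with sizes in
    # 1024-byte blocks; field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$2" ]
}

# 9 GiB is an assumed margin covering the 400MB and 8GB downloads.
required_kb=$((9 * 1024 * 1024))
if has_free_space . "$required_kb"; then
    echo "enough free space for the model downloads"
else
    echo "less than 9 GiB free here; the model downloads may not fit"
fi
```

Run it from the DeepSpeech directory so that the check applies to the filesystem that will hold the models.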

Build speech manifest

The LibriSpeech manifest is used by the demo server to provide warm-up examples. Scripts to download it are provided:

cd data/librispeech
ln -s ../../data_utils data_utils
python --full_download=False
cd ../..
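If this step is repeated, note that the ln -s command is not safe to run twice: once data_utils exists, ln either reports an error or places a stray link inside the linked directory. A guarded version is sketched below. The ensure_link name is hypothetical and is not part of the DeepSpeech scripts.

```shell
#!/bin/sh
# Create symlink $2 pointing at $1 unless something already exists at $2.
# ensure_link is a hypothetical helper, not part of DeepSpeech.
ensure_link() {
    # -e follows symlinks, so also test -L to catch a dangling link.
    if [ -e "$2" ] || [ -L "$2" ]; then
        return 0
    fi
    ln -s "$1" "$2"
}

# Usage, from inside data/librispeech, in place of the plain ln -s line:
#   ensure_link ../../data_utils data_utils
```

The helper leaves an existing link or directory untouched, so it does not repair a link that points to the wrong place.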