Installing DeepSpeech 2 for Arm
Baidu's DeepSpeech network provides state-of-the-art speech-to-text capabilities. Their PaddlePaddle-based implementation comes with pretrained models that have been trained on Baidu's internal English speech dataset of more than 8000 hours. Mandarin versions are also available.
Mozilla host a TensorFlow-based version of DeepSpeech, but the model files available for it are trained on small public datasets and offer significantly lower accuracy than Baidu's internally-trained ones.
The remainder of this section provides a condensed guide on installing DeepSpeech2, tested on Ubuntu 16.04 LTS running on a packet.net Type 2A server. To install it on another platform, follow Baidu's general installation guide.
Once you have a working PaddlePaddle installation, install the additional DeepSpeech dependencies. These are mostly audio codecs:
apt-get install -y pkg-config libflac-dev libogg-dev libvorbis-dev libboost-dev libffi-dev
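To confirm that these packages are present before moving on, a check along the following lines may help. This is a minimal sketch, assuming a Debian-based system where dpkg-query is available; missing_packages is a hypothetical helper name and is not part of DeepSpeech.

```shell
# Hypothetical helper: print the name of every listed package that is not installed.
# Assumes a Debian/Ubuntu system; if dpkg-query is absent, every package is reported.
missing_packages() {
    for pkg in "$@"; do
        # dpkg-query prints "install ok installed" for a fully installed package.
        if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q "install ok installed"; then
            echo "$pkg"
        fi
    done
}

missing_packages pkg-config libflac-dev libogg-dev libvorbis-dev libboost-dev libffi-dev
```

If the command prints nothing, all six packages are installed; any names it prints can be passed to apt-get install again.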
DeepSpeech's requirements.txt file specifies particular scipy and Cython versions, which will automatically be built from source. These builds can take longer than an hour, so while this is happening, download the models (see next section), which also takes a long time.
git clone https://github.com/PaddlePaddle/DeepSpeech.git
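Because the model files described in the next section are large (400MB and 8GB), it may be worth confirming that the server has enough free disk space before starting the downloads. The following is a minimal sketch: has_free_space is a hypothetical helper, and the 9 GB threshold is an assumption obtained by rounding up the two file sizes, so unpacking archives may need more.

```shell
# Hypothetical helper: succeed if the file system holding DIR has at least NEED_GB GiB free.
has_free_space() {
    dir=$1
    need_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Assumption: 9 GB covers the 400MB and 8GB downloads with a small margin.
if has_free_space . 9; then
    echo "enough space for the model downloads"
else
    echo "less than 9 GB free in $(pwd)"
fi
```

Run it from the directory where the models will be saved, since df reports on the file system that contains the given path.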
Download models while building
The two model files are large (400MB and 8GB), so it is useful to start downloading them while the previous build step is in progress. To do this, open a new command line, log in to the server using SSH as before, navigate to the Paddle/build/python/dist directory and enter:
Build speech manifest
The LibriSpeech manifest is used by the demo server to provide warm-up examples. Scripts to download it are provided:
ln -s ../../data_utils data_utils
python librispeech.py --full_download=False
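Once the script finishes, a quick sanity check is to count the entries in the generated manifest files. The sketch below assumes that the script writes its manifests as files named manifest.* in the current directory, with one entry per line; both the file name pattern and the count_entries helper are assumptions for illustration, so adjust the pattern if your files are named differently.

```shell
# Hypothetical helper: print the number of non-empty lines in each file given.
count_entries() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # grep -c . counts lines that contain at least one character.
            printf '%s %s\n' "$(grep -c . "$f")" "$f"
        else
            printf 'missing %s\n' "$f"
        fi
    done
}

# Assumption: the manifests are written as manifest.* in the current directory.
count_entries manifest.*
```

A count of zero, or a "missing" line, suggests that the download did not complete and the script should be run again.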