Week 3

This week was spent on trying to get different open source speech recognition systems installed. I went through installing the HTK-toolkit, Kaldi, and CMU Sphinx. Because we first wanted to start with building a HMM-based speech recognition system I first attempted to install HTK. Of course, the installing process for HTK was not as straightforward as their website detailed. So, I will walk through installing HTK, Kaldi, and CMU Sphinx just in case anyone else gets lost (especially if you’re a mac user) as I did.

HTK

The Hidden Markov Model Toolkit (HTK) is an open source framework for building hidden-Markov models. HTK is used for a variety of research projects, but is most notable for speech recognition research. There are a lot of different libraries and modules that I’m really interested in playing with.

Installing HTK

Before installing HTK you have to register for an account on their website, this will allow you to download the HTK manual (433 pages) and source code.

Once all download prerequisites (X11 installed for mac users) have been met, download the latest stable release (3.4.1) to your machine. Navigate to the compressed file using a terminal and unzip:

One can also download HTK-3.5, this was released in 2015 and appears to supports the building of acoustics models also using different types of neural networks. I’ll probably utilize HTK-3.5 a bit later.

$ tar xzf HTK-3.4.1.tar.gz
$ cd HTK

If you’re using macOS or linux do not run the config file that is described on their installation page. For macOS when you try to make after running the configuration fileyou get fatal error: ‘X11/Xos.h’ file not found

Turns out HTK graphical tools has a couple of issues with Xcode 11. This can be avoided by disabling the graphical tools during configuration:

$ export CPPFLAGS=-UPHNALG
$ ./configure --without-x --disable-hslab
$ make all
$ sudo make install

You can test the installation by using the HTK-samples provided on the HTK website (underneath the HTK-3.4.1 download). Unzip and navigate to the HTKDemo folder:

$ cd Downloads/samples/HTKDemo
perl ./runDemo configs/monPlainM1S1.dcf

Kaldi

Kaldi is another toolkit for speech recognition, it’s a lot more recent than HTK. It was developed by Dan Povey and colleagues. Kaldi is written in C++ and has really nice documentation, as well as a more lively support community for getting started.

Installing Kaldi

The installation process for Kaldi was way simpler than htk. Simply pull the Kaldi package from their Github repository and cd into the directory:

$ git clone https://github.com/Kaldi-asr/Kaldi.git Kaldi --origin upstream
$ cd Kaldi

From there, follow the INSTALL instructions in the tools and src directories. The install instructions say to make use of of parallel processing. This speeds up the installation significantly… Actually, parallelism will be really useful for this will be really useful for training.

CMU Sphinx

The history of CMU Sphinx pre dates both HTK and Kaldi. It’s a very popular toolkit and was started at developed at Carnegie Mellon University. I’m really interested in CMU Sphinx for it’s mobile implementation (pocketsphix), I think that could be a lot of fun to work with. CMU Sphinx also gives you a lot of flexibility in terms of what programming language (python) you can interface with it.

Installing CMU Sphinx

I tried to install CMU Sphix a while back, and I was pretty confused, however now it’s all migrated to github. One can also quickly install the un-versioned with homebrew.

To be honest, I probably won’t use all of these different open source software, but it’s good to have options…

Best,
EO