Week 6

Quick recap: I’m learning to build an simple ASR system with Kaldi using VoxForge (free) data as Kaldi has a couple of pre-built recipes available. Again, this is just a good way to get comfortable with building ASRs with Kaldi.

Notes about Voxforge data

There is a slight problem with using the Voxforge data as the majority of it is anonymous. From Kaldi’s perspective makes it difficult to perform speaker-dependent transforms. So, I need to spend a bit of time organizing a subset of the Voxforge data by running: voxforge_format_data.sh

$ local/voxforge_format_data.sh
=== Preparing train and test data ...
cp: data/local/train_wav.scp: No such file or directory

Not entirely sure about where the lost train_wav.scp file, hopefully that won’t cause a problem.

Training and testing

After this was completed I attempted to do training and testing. The first step in this process is splitting the VoxForge data into a training and testing set. This was actually accomplished by running the script voxforge_data_prep.sh

$ local/voxforge_data_prep.sh
=== Building a language model ...
--- Preparing a corpus from test and train transcripts ...

Additionally, the actual number of speakers used for training is set in run.sh; although the speakers used for training and testing are chosen at random. Actually it has been pointed out that this basically means every time voxforge_data_prep.sh runs the speakers used for training will be different and the systems accuracy on the training set.

Building language model

Alright, now i’m ready to actually build the language model. It was suggested that the MIT Language Modeling Toolkit would be helpful, but the IRSTLM Toolkit is installed by default under the Kaldi tools within extras. For next week I will take time trying the install the different language toolkits.

Best,
EO