  • Week 26

    Cleaning up sound
    It may come as no surprise that when training a speech recognition system, it’s probably better to have high-quality audio. A high-quality, clean audio file should have few acoustic artifacts, which makes it easier to work with than an audio file recorded in a noisy environment with...

  • Week 25

    Notes on Processing
    I initially wanted to learn about the Processing programming language because it was advertised as a “programming language for visual artists”. Art is cool; sometimes I feel like an artist. I actually first learned about Processing from one of the researchers at the Santa Fe Institute who...

  • Week 24

    Notes on interest and learning
    I spent this week collecting the results of my LTASS experiments. I mainly focused on visualizing these results because presentation season is coming up soon. In fact, one of the things I wanted to do better this year was visualizing results. I want them to...

  • Week 23

    I spent this week outlining the LTASS method in MATLAB and taking a look at the data preparation scripts under the VoxForge Kaldi example. I thought it was going to be straightforward, but it appears there are some conflicts between the directory names of the CHILDES folder and VoxForge...
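
    The excerpt above mentions outlining an LTASS (long-term average speech spectrum) method in MATLAB, but the post is cut off here. Purely as an illustration, and not the author's MATLAB code, the sketch below shows one common way to estimate a long-term average spectrum in Python: average the power spectra of short overlapping frames across an entire recording. The file name and window settings are assumptions.

    ```python
    # Illustrative sketch only: estimate a long-term average speech spectrum
    # by averaging short-frame power spectra over a whole recording.
    # "utterance.wav" and the 25 ms / 50% overlap settings are placeholders.
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import welch

    rate, samples = wavfile.read("utterance.wav")   # assumes a mono wav file
    samples = samples.astype(np.float64)

    # Welch's method averages the periodograms of overlapping frames, which is
    # one standard way to get a long-term average spectrum estimate.
    freqs, psd = welch(samples, fs=rate,
                       nperseg=int(0.025 * rate),
                       noverlap=int(0.0125 * rate))

    ltass_db = 10 * np.log10(psd + 1e-12)           # long-term spectrum in dB
    print(freqs.shape, ltass_db.shape)
    ```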

  • Week 22

    Group meeting
    We could not make the CogSci 2018 deadline, so we spent this week talking about alternative conference venues. Additionally, we outlined some experiments we would like to try with the corpora. These are LTASS experiments that will hopefully allow us to see the acoustic structure of the two...

  • Week 21

    Formatting audio data for Kaldi
    This week I came back to Kaldi and began to format the audio data. This is the final step in getting our ASR to generate meaningful (or not meaningful) results. From the various tutorials I have read, there is still a lot of scripting involved....
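
    The Kaldi data preparation the post refers to typically revolves around a few plain-text maps in a data directory (wav.scp, text, utt2spk). Since the excerpt stops short of the details, here is a hedged sketch, with hypothetical utterance IDs and paths, of what generating those files might look like; the author's actual scripts may differ.

    ```python
    # Sketch of the standard Kaldi "data directory" layout described in the
    # tutorials: each line maps an utterance ID to a wav path, a transcript,
    # or a speaker. The IDs and paths below are hypothetical placeholders.
    from pathlib import Path

    utterances = [
        # (utterance_id, speaker_id, wav_path, transcript)
        ("spk1-utt001", "spk1", "/data/voxforge/spk1/utt001.wav", "hello world"),
        ("spk1-utt002", "spk1", "/data/voxforge/spk1/utt002.wav", "good morning"),
    ]

    data_dir = Path("data/train")
    data_dir.mkdir(parents=True, exist_ok=True)

    with open(data_dir / "wav.scp", "w") as wav_scp, \
         open(data_dir / "text", "w") as text, \
         open(data_dir / "utt2spk", "w") as utt2spk:
        # Kaldi expects these files to be sorted by utterance ID.
        for utt_id, spk_id, wav_path, transcript in sorted(utterances):
            wav_scp.write(f"{utt_id} {wav_path}\n")   # utt-id -> audio file
            text.write(f"{utt_id} {transcript}\n")    # utt-id -> transcript
            utt2spk.write(f"{utt_id} {spk_id}\n")     # utt-id -> speaker

    # spk2utt is normally generated afterwards with Kaldi's
    # utils/utt2spk_to_spk2utt.pl script.
    ```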

  • Week 20

    Race to the finish!
    There is still quite a lot of work to do. This week I spent my time calculating the signal-to-noise ratio (SNR) over all the segmented CHILDES files and the VoxForge data. I did this so I could get a better picture (mean, variance) of how rich the...
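
    The post does not show how the SNR was actually computed, so the snippet below is only one plausible sketch: estimate a noise floor from the quietest frames of each segmented file, compute a per-file SNR in decibels, then summarize with a mean and variance. The noise-floor heuristic and the glob pattern are assumptions, not the author's method.

    ```python
    # Hedged sketch: per-file SNR estimate plus summary statistics.
    # Assumes mono wav files; the quietest 10% of frames stand in for noise.
    import glob
    import numpy as np
    from scipy.io import wavfile

    def estimate_snr_db(path, frame_ms=20):
        rate, x = wavfile.read(path)
        x = x.astype(np.float64)
        frame_len = int(rate * frame_ms / 1000)
        n_frames = len(x) // frame_len
        frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
        power = frames.var(axis=1) + 1e-12        # per-frame power
        noise_power = np.percentile(power, 10)    # quietest frames ~ noise floor
        signal_power = power.mean()
        return 10 * np.log10(signal_power / noise_power)

    snr_values = [estimate_snr_db(f) for f in glob.glob("segments/*.wav")]
    print("mean SNR (dB):", np.mean(snr_values), "variance:", np.var(snr_values))
    ```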

  • Week 19

    Cleaning up data pt. 2
    So, instead of using R to go back and calculate summary statistics, Rebekah, my CREU partner, actually developed some C# code that scrapes all the transcripts, gets the lines with timestamps, and then determines things like the number of tokens, the number of unique tokens, the duration of...
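
    The scraper described above was written in C#; the sketch below is not that code, just a Python illustration of the same idea, under the assumption that the transcripts are CHAT-style files where timestamped lines carry a "start_end" pair in milliseconds. The regex, directory, and token-splitting details are all assumptions.

    ```python
    # Not the actual C# scraper: a rough Python sketch that keeps only
    # timestamped lines, then tallies tokens, unique tokens, and duration.
    # In practice the speaker-tier labels (e.g. "*MOT:") would also need
    # stripping before counting words.
    import glob
    import re

    STAMP = re.compile(r"(\d+)_(\d+)")   # assumed "start_end" stamp in ms

    for path in glob.glob("transcripts/*.cha"):
        tokens, unique, dur_ms = 0, set(), 0
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                m = STAMP.search(line)
                if not m:
                    continue                          # keep timestamped lines only
                dur_ms += int(m.group(2)) - int(m.group(1))
                words = STAMP.sub("", line).split()
                tokens += len(words)
                unique.update(words)
        print(path, tokens, len(unique), dur_ms / 1000.0)
    ```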