  • Week 26

    Cleaning up sound
    It may come as no surprise that when training a speech recognition system, it’s probably better to have high-quality audio. A high-quality, clean audio file should have few acoustic artifacts, which makes it easier to work with than an audio file recorded in a noisy environment with...

  • Week 25

    Notes on Processing
    I initially wanted to learn about the Processing programming language because it was advertised as a “programming language for visual artists”. Art is cool; sometimes I feel like an artist. I actually first learned about Processing from one of the researchers at the Santa Fe Institute who...

  • Week 24

    Notes on interest and learning
    I spent this week collecting the results of my LTASS experiments. I mainly focused on visualizing these results because presentation season is coming up soon. In fact, one of the things I wanted to do better this year was visualizing results. I want them to...

  • Week 23

    I spent this week outlining the LTASS method in MATLAB and taking a look at the data preparation scripts under the VoxForge Kaldi example. I thought it was going to be straightforward, but it appears there are some conflicts between the directory names of the CHILDES folder and VoxForge...
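
    The excerpt above mentions outlining an LTASS (long-term average speech spectrum) method in MATLAB, but the post is cut off here. Purely as an illustration, and not the author's MATLAB code, the sketch below shows one common way to estimate a long-term average spectrum in Python: average the power spectra of short overlapping frames across an entire recording. The file name and window settings are assumptions.

    ```python
    # Illustrative sketch only: estimate a long-term average speech spectrum
    # by averaging short-frame power spectra over a whole recording.
    # "utterance.wav" and the 25 ms / 50% overlap settings are placeholders.
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import welch

    rate, samples = wavfile.read("utterance.wav")   # assumes a mono wav file
    samples = samples.astype(np.float64)

    # Welch's method averages the periodograms of overlapping frames, which is
    # one standard way to get a long-term average spectrum estimate.
    freqs, psd = welch(samples, fs=rate,
                       nperseg=int(0.025 * rate),
                       noverlap=int(0.0125 * rate))

    ltass_db = 10 * np.log10(psd + 1e-12)           # long-term spectrum in dB
    print(freqs.shape, ltass_db.shape)
    ```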

  • Week 22

    Group meeting
    We could not make the CogSci 2018 deadline, so we spent this week talking about alternative conference venues. Additionally, we outlined some experiments we would like to try with the corpora. These are LTASS experiments that will hopefully allow us to see the acoustic structure of the two...

  • Week 21

    Formatting audio data for Kaldi
    This week I came back to Kaldi and began to format the audio data. This is the final step in getting our ASR to generate meaningful (or not meaningful) results. From the various tutorials I have read, there is still a lot of scripting involved....
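
    The Kaldi data preparation the post refers to typically revolves around a few plain-text maps in a data directory (wav.scp, text, utt2spk). Since the excerpt stops short of the details, here is a hedged sketch, with hypothetical utterance IDs and paths, of what generating those files might look like; the author's actual scripts may differ.

    ```python
    # Sketch of the standard Kaldi "data directory" layout described in the
    # tutorials: each line maps an utterance ID to a wav path, a transcript,
    # or a speaker. The IDs and paths below are hypothetical placeholders.
    from pathlib import Path

    utterances = [
        # (utterance_id, speaker_id, wav_path, transcript)
        ("spk1-utt001", "spk1", "/data/voxforge/spk1/utt001.wav", "hello world"),
        ("spk1-utt002", "spk1", "/data/voxforge/spk1/utt002.wav", "good morning"),
    ]

    data_dir = Path("data/train")
    data_dir.mkdir(parents=True, exist_ok=True)

    with open(data_dir / "wav.scp", "w") as wav_scp, \
         open(data_dir / "text", "w") as text, \
         open(data_dir / "utt2spk", "w") as utt2spk:
        # Kaldi expects these files to be sorted by utterance ID.
        for utt_id, spk_id, wav_path, transcript in sorted(utterances):
            wav_scp.write(f"{utt_id} {wav_path}\n")   # utt-id -> audio file
            text.write(f"{utt_id} {transcript}\n")    # utt-id -> transcript
            utt2spk.write(f"{utt_id} {spk_id}\n")     # utt-id -> speaker

    # spk2utt is normally generated afterwards with Kaldi's
    # utils/utt2spk_to_spk2utt.pl script.
    ```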

  • Week 20

    Race to the finish!
    There is still quite a lot of work to do. This week I spent my time calculating the signal-to-noise ratio (SNR) over all the segmented CHILDES files and the VoxForge data. I did this so I could get a better picture (mean, variance) of how rich the...
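
    The post does not show how the SNR was actually computed, so the snippet below is only one plausible sketch: estimate a noise floor from the quietest frames of each segmented file, compute a per-file SNR in decibels, then summarize with a mean and variance. The noise-floor heuristic and the glob pattern are assumptions, not the author's method.

    ```python
    # Hedged sketch: per-file SNR estimate plus summary statistics.
    # Assumes mono wav files; the quietest 10% of frames stand in for noise.
    import glob
    import numpy as np
    from scipy.io import wavfile

    def estimate_snr_db(path, frame_ms=20):
        rate, x = wavfile.read(path)
        x = x.astype(np.float64)
        frame_len = int(rate * frame_ms / 1000)
        n_frames = len(x) // frame_len
        frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
        power = frames.var(axis=1) + 1e-12        # per-frame power
        noise_power = np.percentile(power, 10)    # quietest frames ~ noise floor
        signal_power = power.mean()
        return 10 * np.log10(signal_power / noise_power)

    snr_values = [estimate_snr_db(f) for f in glob.glob("segments/*.wav")]
    print("mean SNR (dB):", np.mean(snr_values), "variance:", np.var(snr_values))
    ```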

  • Week 19

    Cleaning up data pt. 2
    So, instead of using R to go back and calculate summary statistics, Rebekah, my CREU partner, actually developed some C# code that scrapes all the transcripts, gets the lines with timestamps, and then determines things like the number of tokens, the number of unique tokens, the duration of...
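
    The scraper described above was written in C#; the sketch below is not that code, just a Python illustration of the same idea, under the assumption that the transcripts are CHAT-style files where timestamped lines carry a "start_end" pair in milliseconds. The regex, directory, and token-splitting details are all assumptions.

    ```python
    # Not the actual C# scraper: a rough Python sketch that keeps only
    # timestamped lines, then tallies tokens, unique tokens, and duration.
    # In practice the speaker-tier labels (e.g. "*MOT:") would also need
    # stripping before counting words.
    import glob
    import re

    STAMP = re.compile(r"(\d+)_(\d+)")   # assumed "start_end" stamp in ms

    for path in glob.glob("transcripts/*.cha"):
        tokens, unique, dur_ms = 0, set(), 0
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                m = STAMP.search(line)
                if not m:
                    continue                          # keep timestamped lines only
                dur_ms += int(m.group(2)) - int(m.group(1))
                words = STAMP.sub("", line).split()
                tokens += len(words)
                unique.update(words)
        print(path, tokens, len(unique), dur_ms / 1000.0)
    ```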