Week 24

Notes on interest and learning

I spent this week collecting the results of my LTASS experiments. I mainly focused on visualizing these results because presentation season is coming up soon. In fact, one of the things I wanted to do better this year was visualizing results: I want them to be transparent and interesting. So, I’m shooting for something simple but profound!

I’m currently brainstorming ideas about how to showcase speech recognition and highlight its relationship to language learning. Ultimately, I think this will do a few things for me:

  1. Reintroduce the goal of this project to myself
  2. Have fun learning about new ways to visualize my data and results
  3. Begin building the story of how and why I chose to model language learning (MLL) using these methods

Because, to me, the coolest part of research is coming up with creative ways to study an idea (or set of views) and present it to people. These are frameworks (stories?) that could potentially transform the way people think about the world. From my point of view this is super difficult, and often it doesn’t appear that glamorous. However, I know that’s where I want to go with this.

The role of a child’s interest

What does it take for a child to learn (acquire) a language? Part of it is surely the child’s own intuition or implicit knowledge about the world and their environment. For instance, while it is not ethical to run feral-child experiments, the case of Genie tells us a lot. So, assuming that children are in a social and linguistic environment that supports their native tongue, what contributions from other humans are required for them to acquire language?

There turns out to be a lot of interesting infancy and developmental psychology research on these questions. I won’t attempt to unpack decades of results from these fields here, nor pretend to understand them all. Instead, I want to point to Trevarthen (1979), who describes how the control people share during typical conversation reveals how an infant’s consciousness and intentionality can instruct the parent in establishing consistent communication. That is, because the child displays a certain level of intelligence that the parent recognizes, both are willing to share control appropriately during the interaction. From there, parents can more easily become aligned with the child’s interests and developmental needs. It is essentially a long-term training session between parent (personal trainer) and child (developing individual). So in the end, the child has an interest, the parent becomes interested in the child and their interest, and that allows the child to become (even more) interested in the parent and their interests. Quite the theory!

tl;dr It takes time to master a language (or anything), but perhaps interest is the motivation, or the force, that propels us through that progress.

LTASS

I don’t think I have been very clear about what exactly LTASS is trying to accomplish, partly because I hadn’t understood it very well until recently. So, this week I spent a bit more time explaining LTASS and its associated concepts. Though, I suppose it’s all in the name: again, LTASS stands for the long-term average speech spectrum, and it provides a way to analyze a voice. Among other things, LTASS can show us:

  1. Abnormal acoustical patterns
  2. Quality of voice in terms of pitch and volume

Hence, we’re mainly interested in the average frequency content (spectrum) of a particular voice (speech). We don’t yet know whether the average spectra of the VoxForge and CHILDES corpora differ, but if they are significantly different, then perhaps some of the variability in speech recognition performance can be attributed to that difference in spectrum (or to a combination of their spectra, SNR, and other acoustical measurements).
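
When it comes time to visualize that comparison, it could be as simple as overlaying the two average spectra. Here’s a rough sketch of what I have in mind (the variables ltass_voxforge, ltass_childes, and frqs are placeholders I’m making up, standing in for spectra that have already been computed on a shared frequency axis):

% Placeholder comparison of two averaged speech spectra on a shared frequency axis
figure;
semilogx(frqs, 10*log10(ltass_voxforge), 'b', ...
         frqs, 10*log10(ltass_childes), 'r');
xlabel('Frequency (Hz)'); ylabel('Power/frequency (dB/Hz)');
legend('VoxForge', 'CHILDES');
title('Average speech spectra: VoxForge vs. CHILDES');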

Conceptually, the LTASS method looks like this: collect all of the speech samples from some input signal recorded at some sampling rate (let’s say 16 kHz), then take the FFT/periodogram at some resolution. The resolution can be calculated beforehand, because it depends only on the FFT length and the sampling rate:

resolution = Fs / NFFT (Hz per bin)

So, let’s say in MATLAB we collected 88000 samples:

[input_file, Fs] = audioread('input_signal.wav'); % -> 88000 x 1 vector of samples; Fs = 16000
FFTBins = length(input_file)/2; % ->  44000

The number of FFT bins we would have is half of the number of samples (see the Nyquist-Shannon sampling theorem). So, if our input signal was sampled at 16 kHz and we collected 88000 samples, the resolution of each FFT bin would be:

resolution = Fs / NFFT = 16000 / 88000 ≈ 0.18 Hz

or, equivalently, counting only the bins up to the Nyquist frequency:

resolution = (Fs / 2) / FFTBins = 8000 / 44000 ≈ 0.18 Hz
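
As a quick continuation of the snippet above (purely illustrative; resolution and freqAxis are names I’m introducing here, not part of the analysis script):

resolution = Fs / length(input_file); % 16000 / 88000 ≈ 0.18 Hz per bin
freqAxis = (0:FFTBins) * resolution;  % 0 Hz up to the Nyquist frequency (8 kHz)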

Now, imagine some curious child listening to their parent’s voice, and after every 15 minutes (900 seconds) the child takes note of all the frequencies present in that 15-minute chunk of adult speech. The child is brilliant, so this note-taking happens almost instantaneously.

This is more or less how we’re using LTASS. After every 15 minutes we collect the frequency content in the adult speech and characterize the average spectral density (periodogram).

In MATLAB this looks like:

.
.
.
totalNumberOfFiles = length(childesDirectory);
totalDuration = 0;
AccumulatedSamples = [];
AccumulatedFreq = {};

if totalNumberOfFiles >= 1
  for k = 1 : totalNumberOfFiles
    % Go through all those files
    thisFolder = childesDirectory(k).folder;
    thisBaseFileName = childesDirectory(k).name;
    fullFileName = fullfile(thisFolder, thisBaseFileName);

    % Accumulate audio samples
    info = audioinfo(fullFileName);
    audioFileDuration = info.Duration;
    totalDuration = totalDuration + audioFileDuration;
    [input_file, Fs] = audioread(fullFileName);
    if(Fs ~= 16000)
      % Resample to 16 kHz (rather than just relabelling the sample rate,
      % which would change the playback speed) and cache a resampled copy
      input_file = resample(input_file, 16000, Fs);
      Fs = 16000;
      cd('Resampled');
      audiowrite(thisBaseFileName, input_file, Fs);
      [input_file, Fs] = audioread(thisBaseFileName);
      cd ..
    end
    AccumulatedSamples = cat(1,AccumulatedSamples,input_file); % append this file's samples to the running buffer

    % Get frequencies
    if(totalDuration >= 900 || k == totalNumberOfFiles)
      signal = AccumulatedSamples(:,1);
      ltass_psd = psd(spectrum.periodogram,signal,'Fs',Fs,'NFFT',length(signal));
      frq_data = ltass_psd.data;
      AccumulatedFreq = [AccumulatedFreq, {frq_data}];
      totalDuration = 0;
      AccumulatedSamples = [];
    end
  end
end
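
One thing the loop above doesn’t do yet is turn the per-chunk periodograms into a single long-term average. Here is a minimal sketch of that step (not part of the script above; the 1 Hz grid and linear interpolation are arbitrary choices I’m assuming, since each chunk ends up with a slightly different length and therefore a slightly different frequency resolution):

% Sketch: average the per-chunk periodograms in AccumulatedFreq on a common grid
commonFrqs = (1:Fs/2)';                            % 1 Hz to Nyquist, 1 Hz steps
psdSum = zeros(size(commonFrqs));
for c = 1:numel(AccumulatedFreq)
  chunkPsd  = AccumulatedFreq{c};
  chunkFrqs = linspace(0, Fs/2, numel(chunkPsd))'; % one-sided periodogram bins
  psdSum = psdSum + interp1(chunkFrqs, chunkPsd, commonFrqs, 'linear');
end
ltassAvg = psdSum / numel(AccumulatedFreq);        % average power spectral density

% Plot the long-term average speech spectrum in dB
figure;
semilogx(commonFrqs, 10*log10(ltassAvg));
xlabel('Frequency (Hz)'); ylabel('Power/frequency (dB/Hz)');
title('LTASS across all 15-minute chunks');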

While researching LTASS, I found the SoundZone_Tools repository by JacobD10. One of the tools is an LTASS function, so instead of building the periodogram ourselves we can use his method:

.
.
.
  % Get frequencies
  if(totalDuration >= 900 || k == totalNumberOfFiles)
      [ltass_SZ, frqs] = LTASS(AccumulatedSamples,(Fs/(length(AccumulatedSamples))),Fs);
      AccumulatedFreq = [AccumulatedFreq, {ltass_SZ}]; % keep each chunk's spectrum, as before
      totalDuration = 0;
      AccumulatedSamples = [];
  end
end

Cleaning up my mess

I believe it’s time to make functions out of all the data collection methods I’ve been using. I’ve been neglecting this, but I need to be more productive.
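
As a first pass at that cleanup, the accumulation loop above could be wrapped into something like the function below. This is only a sketch: the name collectChunkSpectra and its interface are placeholders, I’ve used MATLAB’s periodogram function in place of the older spectrum.periodogram object, and I’ve dropped the step that caches a resampled copy to disk.

function AccumulatedFreq = collectChunkSpectra(audioDirectory, chunkSeconds, targetFs)
% COLLECTCHUNKSPECTRA  Sketch of a wrapper around the accumulation loop above:
% walk a dir() listing, resample everything to targetFs, and return one
% periodogram per chunkSeconds of accumulated speech.
  AccumulatedFreq = {};
  AccumulatedSamples = [];
  totalDuration = 0;
  for k = 1 : length(audioDirectory)
    fullFileName = fullfile(audioDirectory(k).folder, audioDirectory(k).name);
    info = audioinfo(fullFileName);
    totalDuration = totalDuration + info.Duration;
    [samples, Fs] = audioread(fullFileName);
    if Fs ~= targetFs
      samples = resample(samples, targetFs, Fs); % resample instead of relabelling
    end
    AccumulatedSamples = cat(1, AccumulatedSamples, samples);
    if totalDuration >= chunkSeconds || k == length(audioDirectory)
      signal = AccumulatedSamples(:,1);
      [pxx, ~] = periodogram(signal, [], length(signal), targetFs);
      AccumulatedFreq = [AccumulatedFreq, {pxx}];
      totalDuration = 0;
      AccumulatedSamples = [];
    end
  end
end

Then each corpus could be processed with a single call, e.g. collectChunkSpectra(dir(fullfile('CHILDES', '*.wav')), 900, 16000), with the path adjusted to wherever the audio actually lives.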

Best,
EO