Week 41 (Extension)

Re-segmenting

A few weeks ago we noticed a discrepancy with the number of hours I calculated in each of the CHILDES age ranges.

childes

According to the table below Rebekah calculated, the number of hours in each of the age ranges should be:

0-12: 16.99 hrs
13-24: 109.37 hrs
25-36: 91.22 hrs

However, when I calculated the duration of each corpus when I resampled the audio segments down to 16 kHz I OBTAINED!!!:

0-12: 16.56 hrs
13-24: 260.91 hrs
25-36: 89.67 hrs

So, this prompted me to look at my segmentation method:

for index = 1:length(matches)
  num = textscan(string(strrep(matches(index),'_', ' ')),'%f', 'Delimiter',' ');
  num = num2cell(permute(num{1},[2,1]));
  num = cell2mat(num);
  st = num(1);
  et = num(2);
  if(st == 0)
    st = st + 1;           
  end

  % Get start and stop times from matches
  startTime = round((st/1000)*Fs);
  endTime = round((et/1000)*Fs);
  if(endTime > length(input_signal))
    endTime = length(input_signal);
  end
  startTime = max(startTime,1)
  if(startTime > length(input_signal))
    startTime = round(startTime - Fs);
  end
  segment = input_signal(startTime:endTime);
  if(isempty(segment))
    disp('audio file(s) does not exist');
    AccumulatedTranscripts = [AccumulatedTranscripts; {thisBaseFileName}];
  end
  % Format name for output file
  index = num2str(index);
  index = insertBefore(index,index,'_');
  outputFileName = strcat(index,'.wav');
  outputFileName = strrep(thisBaseFileName,'.txt', outputFileName);

  % Output audio file
  audiowrite(outputFileName,segment, Fs);   
end

Then, I saw the following lines:

% Get start and stop times from matches
  startTime = round((st/1000)*Fs);
  endTime = round((et/1000)*Fs);
  if(endTime > length(input_signal))
    endTime = length(input_signal);
  end
  startTime = max(startTime,1)
  if(startTime > length(input_signal))
    startTime = round(startTime - Fs);
  end
  segment = input_signal(startTime:endTime);

Dr. Brumberg gave me some advice on the code and, then I made the following changes:

startSample = (startTime/1000)*Fs;
startSample = max(startSample,1); % If start time is 0
endSample = (endTime/1000)*Fs;
segment = input_signal(floor(startSample):ceil(endSample));

And also added:

% Resample audio segment
[P,Q] = rat(16e3/Fs);
% Normalize audio
segment = segment/max(abs(segment(:)));
% Leave headroom
segment = segment * 0.9;
% Resample signal
resampled_signal = resample(segment,P,Q);

So, after the segments still make sure to resample back down to 16 kHz.

for index = 1:length(matches)
     % Convert the timestamps from transcripts to numbers
     % Note: the timestamps are in milliseconds
     num = textscan(string(strrep(matches(index),'_', ' ')),'%f', 'Delimiter',' ');
     num = num2cell(permute(num{1},[2,1]));
     num = cell2mat(num);
     startTime = num(1);
     endTime = num(2);

     % Use timestamps to segment audio file
     startSample = (startTime/1000)*Fs;
     startSample = max(startSample,1); % If start time is 0
     endSample = (endTime/1000)*Fs;
     segment = input_signal(floor(startSample):ceil(endSample));
     if(isempty(segment))
         disp('audio file(s) does not exist');
         AccumulatedTranscripts = [AccumulatedTranscripts; {thisBaseFileName}];
     end

     % Resample audio segment
     [P,Q] = rat(16e3/Fs);
     % Normalize audio
     segment = segment/max(abs(segment(:)));
     % Leave headroom
     segment = segment * 0.9;
     % Resample signal
     resampled_signal = resample(segment,P,Q);

     % Construct the name for output audio segment
     index = num2str(index);
     index = insertBefore(index,index,'_');
     outputFileName = strcat(index,'.wav');
     outputFileName = strrep(thisBaseFileName,'.txt', outputFileName);

     % Output audio file
     audiowrite(outputFileName,resampled_signal,16e3);   
 end

Then, I tried resegmenting and resampling all the CHILDES files. This process took a bit, but I it was an opportunity to check out if the problem was actually with my segmentation method or some of the transcripts (error coding the timestamps). I ended up finding that my segmentation method was probably collecting more audio that it needed:

if(endTime > length(input_signal))
  endTime = length(input_signal);
end

But also that some of the transcripts did have coding errors:

time_stamp_error

So, I ended up getting errors:

index exceeds matrix dimensions
Error in get_childes_segments (line 93)
segment = input_signal(floor(startSample):ceil(endSample))

When this error occurred it meant I had to get rid of the last two or last utterance/timestamps, which is fine.

Updating my segmentation method was actually helpful because I got to check if my segments really matched the length of time that the timestamps indicated. It also, helped me listen to some of the different sub copra again. Again, I noticed that they vary in sound quality. and noisiness, but also some of them are panned for the video. For example, listen to the first few seconds of the audio file from the Providence sub corpus. After resegmenting, resampling, and making the panning adjustments I can try recalculating the audio duration in the different CHILDES age ranges.

Voice activity detection

Trying to figure out a good way extract only the adult speech from audio segments. Because again, the recorded utterances are sometimes not only the parent’s utterance, but contain unnecessary silence, child speech, music, or speech is buried in background noise and so is unintelligible. But yeah, I need to create some threshold for the audio segments, which says if the audio segments contains the following: snr, some probability of speech, and child utterances, get rid of it. From there, I can actually see how many hours of speech I have left to train and test on.

Best,
EO