Speech Processing System
This project team will extend Naresh Trilok's dissertation work as follows:
You should use Matlab to convert the new speech samples into the 13 frequency bands
as a function of time (simply use the Matlab programs already provided by Naresh).
You should be able to use either Java or Matlab for the grey-scale display,
for the segmentation of "My name is", for further dividing the segmented samples into the 7 speech sounds,
and for obtaining the features (means and variances).
Matlab should again be used for the neural network classifier (again using the programs provided by Naresh).
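If you choose Java for the grey-scale display mentioned above, one possible approach is sketched below. It assumes the band values are held in an array bands[band][frame] and scales them linearly so that the smallest value is drawn black and the largest white. The class name GreyScalePlot, the array layout, and the linear scaling are illustrative assumptions, not part of Naresh's programs.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Hypothetical helper: renders a bands-by-frames matrix as a grey-scale image. */
final class GreyScalePlot {

    /** Linearly maps value from [min, max] to a grey level in 0..255. */
    static int toGrey(double value, double min, double max) {
        if (max <= min) {
            return 0; // flat input: there is no contrast to show
        }
        double scaled = (value - min) / (max - min);
        scaled = Math.max(0.0, Math.min(1.0, scaled));
        return (int) Math.round(255.0 * scaled);
    }

    /**
     * bands[b][t] is the value of frequency band b at time frame t
     * (assumed layout). Time runs along x; band 0 is the bottom row.
     */
    static BufferedImage render(double[][] bands) {
        int numBands = bands.length;
        int numFrames = bands[0].length;

        // Find the overall range so the whole plot shares one grey scale.
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double[] row : bands) {
            for (double v : row) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }

        BufferedImage image =
                new BufferedImage(numFrames, numBands, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = image.getRaster();
        for (int b = 0; b < numBands; b++) {
            for (int t = 0; t < numFrames; t++) {
                // Write the grey level directly into the single grey channel.
                raster.setSample(t, numBands - 1 - b, 0, toGrey(bands[b][t], min, max));
            }
        }
        return image;
    }
}
```

The returned image is one pixel per band and frame, so it would normally be enlarged before display. It could be saved with javax.imageio.ImageIO.write(image, "png", file) or drawn in a Swing component.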
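The feature step (means and variances) can be sketched in Java as follows. This is only an illustration: the class name BandFeatures, the bands[band][frame] layout, the firstBand parameter for skipping the first component, and the use of the population variance (dividing by the number of frames) are all assumptions. Check them against Naresh's Matlab programs before comparing results.

```java
/** Hypothetical helper: per-band means and variances over a range of frames. */
final class BandFeatures {

    /** Mean of one band over the frames [from, to). */
    static double mean(double[] band, int from, int to) {
        double sum = 0.0;
        for (int t = from; t < to; t++) {
            sum += band[t];
        }
        return sum / (to - from);
    }

    /** Population variance (divide by N, an assumption) over the frames [from, to). */
    static double variance(double[] band, int from, int to) {
        double m = mean(band, from, to);
        double sumSq = 0.0;
        for (int t = from; t < to; t++) {
            double d = band[t] - m;
            sumSq += d * d;
        }
        return sumSq / (to - from);
    }

    /**
     * Returns all means followed by all variances for the bands with
     * index >= firstBand, computed over the frames [from, to).
     * With 13 bands and firstBand = 1 this gives 12 + 12 = 24 values.
     */
    static double[] features(double[][] bands, int firstBand, int from, int to) {
        int n = bands.length - firstBand;
        double[] out = new double[2 * n];
        for (int i = 0; i < n; i++) {
            out[i] = mean(bands[firstBand + i], from, to);
            out[n + i] = variance(bands[firstBand + i], from, to);
        }
        return out;
    }
}
```

Calling features(bands, 1, start, end) on the segmented phrase gives the 24 values of the 24-feature experiment. Calling it once per speech sound, with that sound's frame range, is one way the larger feature sets could be assembled.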
1. Become familiar with Naresh's Matlab programs and rerun the experiments to understand the system.
2. Create a grey-scale plot of the 13 frequency bands as a function of time
   (this plot can be used to display the speech samples before and after segmentation,
   and also after dividing the segmented samples into the seven speech sounds).
3. Run a 24-feature experiment (means and variances of the 12 frequency bands that remain without the first cepstral component).
4. Add 40 speakers to the database of speech samples,
   increasing the number of speakers from 10 to 50, with 10 samples of "My name is ..." from each speaker
   (increasing the total number of utterances from 100 to 500).
   Each speech sample in the database should include the speaker's name,
   gender, age, and nationality.
5. Automate the segmentation of the "My name is" portion of the speech samples:
   - use the signal energy (first spectral component) to locate the start of "My name is"
   - use the ratio of the sum of the spectral components over 2 kHz
     to the sum of those under 2 kHz to locate the [z] sound and thus the end of "My name is"
     (the end should also fall approximately one second after the start, since this phrase is about one second in duration)
6. Automate the segmentation of the "My name is" phrase into its 7 speech sounds:
   - use the elastic matching (dynamic time warping) algorithm (see technical papers provided to team)
     to align with a pre-segmented speech sample
   - this requires the manual segmentation of at least one clearly pronounced "My name is" utterance
     to serve as a reference
7. Rerun the 24- and 84-feature experiments with the increased database of speech samples:
   - the 24-feature experiment requires steps 4 and 5 but not 6
   - the 84-feature experiment requires steps 4, 5, and 6
8. Run a 168-feature experiment (both means and variances of the 84 measures):
   - this is a new experiment and it is expected to yield the best results
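The automatic segmentation of the "My name is" portion described above could be prototyped along the following lines. This is a rough sketch under stated assumptions, not a tested segmenter: the class name PhraseSegmenter, the energy threshold, the bandHz array of band centre frequencies, and the choice of the frame with the largest high/low ratio as the [z] sound are all hypothetical. The band values are assumed to be non-negative spectral magnitudes.

```java
/** Hypothetical helper: locates the start and end of the "My name is" phrase. */
final class PhraseSegmenter {

    /** First frame whose energy exceeds threshold, or -1 if none does. */
    static int findStart(double[] energy, double threshold) {
        for (int t = 0; t < energy.length; t++) {
            if (energy[t] > threshold) {
                return t;
            }
        }
        return -1;
    }

    /**
     * Ratio, at frame t, of the summed band values above cutoffHz to the
     * summed band values at or below it. bandHz[b] is the assumed centre
     * frequency of band b.
     */
    static double highLowRatio(double[][] bands, double[] bandHz, double cutoffHz, int t) {
        double high = 0.0;
        double low = 0.0;
        for (int b = 0; b < bands.length; b++) {
            if (bandHz[b] > cutoffHz) {
                high += bands[b][t];
            } else {
                low += bands[b][t];
            }
        }
        return high / (low + 1e-12); // small constant avoids division by zero
    }

    /**
     * Frame in [from, to) with the largest high/low ratio, taken here as
     * the [z] sound. Returns -1 if the search window is empty.
     */
    static int findEnd(double[][] bands, double[] bandHz, double cutoffHz, int from, int to) {
        int best = -1;
        double bestRatio = Double.NEGATIVE_INFINITY;
        int last = Math.min(to, bands[0].length);
        for (int t = Math.max(from, 0); t < last; t++) {
            double r = highLowRatio(bands, bandHz, cutoffHz, t);
            if (r > bestRatio) {
                bestRatio = r;
                best = t;
            }
        }
        return best;
    }
}
```

Because the phrase lasts about one second, the search window passed to findEnd can be limited to frames near one second after the detected start. The width of that window is a tuning choice that the text does not specify.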
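For the division into the 7 speech sounds, the elastic matching step might look like the sketch below. It is a textbook form of dynamic time warping with Euclidean frame distance and the simple diagonal/vertical/horizontal step pattern; the technical papers provided to the team may use a different distance, step pattern, or path constraints. The class name DtwAligner and both method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper: aligns a new utterance with a pre-segmented reference. */
final class DtwAligner {

    /** Euclidean distance between two frame vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * ref[i] and test[j] are per-frame feature vectors (frame-major layout).
     * Returns map[i] = the earliest test frame aligned with reference frame i.
     */
    static int[] align(double[][] ref, double[][] test) {
        int n = ref.length;
        int m = test.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;

        // Accumulate the cheapest path cost to every (reference, test) pair.
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double best = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = distance(ref[i - 1], test[j - 1]) + best;
            }
        }

        // Backtrack from the end of both sequences to the beginning.
        int[] map = new int[n];
        int i = n;
        int j = m;
        while (i > 0 && j > 0) {
            map[i - 1] = j - 1; // overwritten as j decreases, so the earliest match wins
            double diag = cost[i - 1][j - 1];
            double up = cost[i - 1][j];
            double left = cost[i][j - 1];
            if (diag <= up && diag <= left) {
                i--;
                j--;
            } else if (up <= left) {
                i--;
            } else {
                j--;
            }
        }
        return map;
    }

    /** Maps boundary frames marked by hand on the reference onto the test utterance. */
    static int[] transferBoundaries(int[] refBoundaries, int[] map) {
        int[] out = new int[refBoundaries.length];
        for (int k = 0; k < refBoundaries.length; k++) {
            out[k] = map[refBoundaries[k]];
        }
        return out;
    }
}
```

Here the reference is the manually segmented utterance: its hand-marked boundary frames go into refBoundaries, and transferBoundaries returns the corresponding frames in the new sample.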
For a summary of Naresh's dissertation that we just submitted to a conference, see
that was submitted to