Speech Database/Tool System
This project team will create a database of speech samples and speech processing tools.
Examples of speech labs, speech databases, and speech processing software tools can be found at
The database of speech recordings (.wav files) will be accessible through
a Web interface so users can input new recordings or listen to selected existing recordings.
An example of a similar database for speech samples can be found at
George Mason University's Speech Database.
Several speech processing tools will also be developed for experimental speech research in such areas
as speech recognition, speaker authentication, and voice biometric studies.
These tools are listed below in order of importance.
A search will be made to find such tools on the Internet.
The tools will be incorporated in a directory on Utopia, possibly to be used interactively through the Web interface.
We anticipate finding an appropriate spectral analysis tool on the Internet.
For example, see the tool used in
"Establishing the Uniqueness of the Human Voice
for Security Applications."
However, we need a spectrographic tool that provides access to the actual numerical data
(e.g., the energy in a particular frequency band in a particular time interval)
that can be processed later in an application.
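As a rough illustration of the numerical access we have in mind, the following Python sketch computes a spectrogram and sums the energy in a chosen frequency band and time interval. It is only a sketch under stated assumptions: the function names (spectrogram, band_energy), the frame length, the hop size, and the Hann window are illustrative choices of ours, not features of any tool named above. It uses a plain DFT to avoid external dependencies, whereas a practical tool would use an FFT library.

```python
import cmath
import math

def spectrogram(signal, sample_rate, frame_len=128, hop=64):
    """Return (times, freqs, power); power[i][k] is |DFT|^2 of frame i at bin k.

    frame_len and hop are illustrative defaults, not values from the project.
    """
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    # Precompute the DFT factors once; a real tool would use an FFT instead.
    twiddle = [[cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)]
               for k in range(n_bins)]
    times, power = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        power.append([abs(sum(x * w for x, w in zip(frame, row))) ** 2
                      for row in twiddle])
        times.append((start + frame_len / 2) / sample_rate)  # frame centre, s
    freqs = [k * sample_rate / frame_len for k in range(n_bins)]
    return times, freqs, power

def band_energy(times, freqs, power, f_lo, f_hi, t_lo, t_hi):
    """Sum the power in [f_lo, f_hi] Hz over frames centred in [t_lo, t_hi] s."""
    return sum(p
               for t, row in zip(times, power) if t_lo <= t <= t_hi
               for f, p in zip(freqs, row) if f_lo <= f <= f_hi)

# Synthetic stand-in for a recording: a quarter second of a 1 kHz tone at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(2000)]
times, freqs, power = spectrogram(tone, sample_rate)
near_tone = band_energy(times, freqs, power, 900, 1100, 0.0, 0.25)
far_from_tone = band_energy(times, freqs, power, 2000, 3000, 0.0, 0.25)
```

Because the power values are returned as ordinary lists, an application can process them later, for example by comparing band energies across speakers. For the synthetic tone, the energy near 1 kHz should dwarf the energy in the 2 to 3 kHz band.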
A tool to separate speech from background noise can likely also be found on the Internet.
We may have to develop the alignment tool in-house, however,
which should not be difficult because the algorithm is rather concise.
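To give a sense of how concise the algorithm is, here is a minimal Python sketch of dynamic time warping. It is an illustration under assumptions of ours, not the team's implementation: the name dtw is hypothetical, the sequences are one-dimensional values compared by absolute difference, and the three allowed steps are the textbook ones. A speech version would compare per-frame feature vectors instead.

```python
import math

def dtw(a, b):
    """Align sequences a and b; return (total_cost, path).

    path is a list of (index_in_a, index_in_b) pairs from start to end.
    Assumes scalar samples and an absolute-difference local cost.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both
                                 cost[i - 1][j],      # advance a only
                                 cost[i][j - 1])      # advance b only
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

# A toy "pre-segmented" reference and a time-stretched version of it.
reference = [0, 1, 2, 3, 2, 1, 0]
stretched = [0, 0, 1, 2, 3, 3, 2, 1, 0]
total, path = dtw(reference, stretched)
# Indices in the stretched signal that align with reference index 3.
mapped = [j for i, j in path if i == 3]
```

The returned path is what would carry a segment boundary marked in the pre-segmented signal over to the new signal: each boundary index in the reference maps to the indices paired with it in the path.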
The procedure to recognize a person's nationality from speech samples has been provided by DPS student Arthur Phidd.
- A speech spectrogram tool to perform a spectral analysis of a speech signal
(this is a standard speech visualization tool
that typically gives a grey-scale plot of frequency bands as a function of time)
- A procedure to facilitate the recognition of a person's nationality from his/her speech patterns.
For example, an interactive, speech-data quiz might be created to measure the nationality recognition accuracy
of trained linguists or briefly trained non-linguist examiners.
- A tool to segment the speech portion of the signal from the background noise
by thresholding the signal's energy function
- A tool that uses the elastic matching (dynamic time warping) algorithm
to align a speech signal with a pre-segmented one
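The energy-thresholding item in the list above can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions: the function names, the frame length and hop, and the rule of setting the threshold at a fixed fraction of the peak frame energy are all illustrative, since the project description does not specify how the threshold is chosen.

```python
import math
import random

def short_time_energy(signal, frame_len=200, hop=100):
    """Energy (sum of squared samples) of each overlapping frame."""
    return [sum(x * x for x in signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, hop)]

def speech_segments(signal, frame_len=200, hop=100, ratio=0.1):
    """Return (start_sample, end_sample) runs whose frame energy exceeds
    ratio * peak energy. The ratio rule is an assumed, simple threshold."""
    energy = short_time_energy(signal, frame_len, hop)
    threshold = ratio * max(energy)
    segments, start = [], None
    for i, e in enumerate(energy):
        if e > threshold and start is None:
            start = i * hop                      # a run of speech begins
        elif e <= threshold and start is not None:
            segments.append((start, (i - 1) * hop + frame_len))
            start = None                         # the run has ended
    if start is not None:                        # run reaches the last frame
        segments.append((start, (len(energy) - 1) * hop + frame_len))
    return segments

# Synthetic stand-in for a recording: low-level noise for one second at 8 kHz,
# with a louder 200 Hz tone playing the role of speech in samples 3000-4999.
random.seed(0)
sample_rate = 8000
signal = [0.01 * random.gauss(0, 1) for _ in range(sample_rate)]
for n in range(3000, 5000):
    signal[n] += math.sin(2 * math.pi * 200 * n / sample_rate)
found = speech_segments(signal)
```

On this synthetic signal the sketch should report a single segment that brackets the tone to within one frame on each side. Real recordings would likely need a threshold tied to a noise-floor estimate and some smoothing of short gaps.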
This team will write a technical paper describing the system and any experimental results obtained.