Isolated digit recognition without time alignment
6 April 1995
Jeffrey M. Gay, Martin P. DeSimio
Abstract
This paper examines a method for isolated digit recognition without time alignment. Rather than supplying a classifier with feature vectors computed from frames of data (typically at rates near 100 frames per second) over the word's duration, the method uses only one feature vector per word. A baseline speaker-independent recognition accuracy of 98.1% is established on the male speaker/digit subset of a Texas Instruments database, using dynamic time warping (DTW) for intraword time alignment and 12 LPC cepstral coefficients as features. Without intraword time alignment, using 12 time-averaged LPC cepstral coefficients as the feature vector and a multilayer perceptron (MLP) classifier, the recognition accuracy is 78.4%. Augmenting the feature vectors with 9 time-averaged critical band energies and 10 time-averaged LPC coefficients raises the accuracy to 97.1%. The difference between this result and the 98.1% DTW baseline is not statistically significant at the 95% confidence level. Thus, time alignment is shown not to be a critical factor for the digit recognition task. The proposed method has two advantages: (1) intraword time alignment is not required, and (2) only a single feature vector is computed per utterance. These advantages come at the cost of requiring additional information in the feature vectors relative to a DTW-based classifier.
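The central idea of the single-vector method is to collapse a variable number of frame-level feature vectors into one fixed-length vector by averaging over the utterance. The Python sketch below illustrates that idea only and is not the authors' implementation. The Hamming window, the 240-sample frames with an 80-sample hop (30 ms and 10 ms at an assumed 8 kHz sampling rate, i.e. 100 frames per second), the autocorrelation/Levinson-Durbin LPC analysis of order 10, and the derivation of the 12 cepstral coefficients from those LPC coefficients via the standard LPC-to-cepstrum recursion are all assumptions, as are the function names. The 9 critical band energies are omitted because the abstract does not specify the band layout.

```python
import math
import random

# Illustrative sketch only; frame sizes, window, and LPC order are assumptions.

def autocorr(frame, order):
    """Autocorrelation r[0..order] of one windowed frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """LPC predictor coefficients a[1..order] (convention x[n] ~ sum a[k] x[n-k])."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:             # silent or perfectly predicted frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k           # prediction error shrinks at each order
    return a[1:]

def lpc_to_cepstrum(a, n_ceps):
    """LPC-derived cepstral coefficients c[1..n_ceps]; n_ceps may exceed len(a)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)         # c[0] (gain term) is not used here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def hamming(n):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def word_feature_vector(signal, frame_len=240, hop=80, lpc_order=10, n_ceps=12):
    """One fixed-length vector per utterance: the time average of the per-frame
    [cepstral coefficients, LPC coefficients], whatever the word's duration."""
    win = hamming(frame_len)
    sums = [0.0] * (n_ceps + lpc_order)
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], win)]
        a = levinson_durbin(autocorr(frame, lpc_order), lpc_order)
        feats = lpc_to_cepstrum(a, n_ceps) + a
        sums = [s + f for s, f in zip(sums, feats)]
        n_frames += 1
    if n_frames == 0:
        raise ValueError("utterance is shorter than one frame")
    return [s / n_frames for s in sums]

if __name__ == "__main__":
    # Synthetic stand-in for a digit utterance: first-order autoregressive noise.
    rng = random.Random(0)
    x, prev = [], 0.0
    for _ in range(4000):
        prev = 0.9 * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    v = word_feature_vector(x)
    print(len(v), [round(f, 3) for f in v[:3]])
```

Whatever the utterance duration, `word_feature_vector` returns `n_ceps + lpc_order` values (22 with these defaults), so the result can be fed directly to a fixed-input classifier such as an MLP with no intraword time alignment. A DTW baseline, by contrast, must compare the full frame sequences of the test and reference words.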
© (1995) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jeffrey M. Gay and Martin P. DeSimio "Isolated digit recognition without time alignment", Proc. SPIE 2492, Applications and Science of Artificial Neural Networks, (6 April 1995); https://doi.org/10.1117/12.205184