Title page for ETD etd-06062008-162619


Type of Document Dissertation
Author Huliehel, Fakhralden A.
URN etd-06062008-162619
Title An RBFN-based system for speaker-independent speech recognition
Degree PhD
Department Electrical Engineering
Advisory Committee
Advisor Name Title
VanLandingham, Hugh F. Committee Chair
Abbott, A. Lynn Committee Member
Bay, John S. Committee Member
Beex, A. A. Louis Committee Member
Palettas, Panickos N. Committee Member
Keywords
  • voice-driven menu systems
Date of Defense 1995-07-17
Availability restricted
Abstract
A speaker-independent, isolated-word, small-vocabulary system is developed for applications such as voice-driven menu systems. The design of a cascade of recognition layers is presented, and several feature sets are compared. Phone recognition is performed using a radial basis function network (RBFN), and dynamic time warping (DTW) is used for word recognition. The TIMIT database is used to design and test the automatic speech recognition (ASR) system.
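The abstract does not spell out the word-level matcher, so what follows is only a minimal sketch of textbook dynamic time warping. It assumes that each vocabulary word is stored as a template sequence of per-frame phone-score vectors (for example, RBFN outputs) and that frames are compared by Euclidean distance. The names `dtw_distance` and `recognize_word` and the toy two-dimensional templates are hypothetical, not taken from the dissertation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Cumulative DTW cost using the steps (i-1, j), (i, j-1) and (i-1, j-1)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize_word(observed, templates):
    """Return the word whose template has the smallest DTW distance to `observed`."""
    return min(templates, key=lambda word: dtw_distance(observed, templates[word]))


# Hypothetical templates: two-dimensional "phone score" vectors per frame.
templates = {
    "dark": [[1.0, 0.0], [0.0, 1.0]],
    "suit": [[0.0, 1.0], [1.0, 0.0]],
}
observed = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # "dark" with its first frame stretched
print(recognize_word(observed, templates))  # prints "dark"
```

This uses the simplest symmetric step pattern. It omits the slope constraints and path-length normalization that practical DTW recognizers often add.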

Several feature sets are extracted, based on the mel-scale filter bank (MSFB), the smoothed FFT, reflection coefficients (also called PARCORs), and cepstral coefficients. The MSFB features outperform the other features considered in this study.
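The abstract names MSFB features but gives no parameters. As a rough illustration only, the sketch below computes log energies of triangular filters spaced uniformly on the mel scale. It uses the common mel mapping 2595·log10(1 + f/700), a Hamming window, and a direct DFT so that no third-party libraries are needed. The mel formula variant, filter count, and frame length are assumptions. The 16 kHz rate matches TIMIT's sampling rate.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Common mel mapping (assumed; the dissertation's exact variant is not stated)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-windowed |DFT|^2 for bins 0..N/2, via a direct O(N^2) DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def mel_filterbank_energies(frame, sample_rate, num_filters):
    """Log energies of triangular filters equally spaced in mel from 0 Hz to Nyquist."""
    spec = power_spectrum(frame)
    n = len(frame)
    mel_max = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies; filter i spans edges[i-1]..edges[i+1], peaking at edges[i].
    edges = [mel_to_hz(i * mel_max / (num_filters + 1)) for i in range(num_filters + 2)]
    energies = []
    for i in range(1, num_filters + 1):
        left, center, right = edges[i - 1], edges[i], edges[i + 1]
        total = 0.0
        for k, power in enumerate(spec):
            f = k * sample_rate / n  # center frequency of DFT bin k
            if left < f <= center:
                total += power * (f - left) / (center - left)
            elif center < f < right:
                total += power * (right - f) / (right - center)
        energies.append(math.log(total + 1e-10))  # small floor avoids log(0)
    return energies


# Example: a 1 kHz tone in a 256-sample frame at 16 kHz.
frame = [math.sin(2 * math.pi * 1000.0 * t / 16000.0) for t in range(256)]
features = mel_filterbank_energies(frame, 16000, 20)
```

Because the filters are uniform in mel rather than in Hz, the low-frequency filters are narrow and the high-frequency filters are wide.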

Multilayer perceptrons (MLPs) and radial basis function networks (RBFNs) are considered for phoneme recognition. Because RBFNs are easier to train than MLPs, RBFNs were selected to perform phoneme classification.

Four RBFNs are compared. RBFN type-I is a single-layer RBFN. RBFN type-II is a two-layer net whose second layer consists of a vector of weights. RBFN type-III is a two-layer net whose second layer is a linear layer, and RBFN type-IV is a two-layer net whose second layer is an RBFN. RBFN type-II outperforms the others at the phone level, with a phone recognition rate of about 44%.
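To make the layer terminology concrete, here is a minimal forward-pass sketch of an RBFN with a Gaussian first layer followed by a linear second layer. This corresponds most closely to the type-III description. The Gaussian basis function, the per-center width parameter, and the toy phone labels are assumptions for illustration. The abstract gives neither the dissertation's exact basis function nor the details of the type-II weight vector.

```python
import math


def rbf_layer(x, centers, widths):
    """Gaussian activations exp(-||x - c||^2 / (2 * sigma^2)), one per center."""
    out = []
    for c, sigma in zip(centers, widths):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        out.append(math.exp(-d2 / (2.0 * sigma * sigma)))
    return out


def linear_layer(h, weights, biases):
    """Second layer: one weighted sum of the RBF activations per output class."""
    return [b + sum(w * hj for w, hj in zip(row, h)) for row, b in zip(weights, biases)]


def classify(x, centers, widths, weights, biases, labels):
    """Return the label with the highest output score, together with all scores."""
    scores = linear_layer(rbf_layer(x, centers, widths), weights, biases)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores


# Hypothetical toy network: two centers, two phone classes.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]
labels = ["aa", "iy"]
label, scores = classify([0.1, 0.0], centers, widths, weights, biases, labels)
```

Dropping `linear_layer` and taking the argmax of the Gaussian activations directly would give a single-layer classifier in the spirit of type-I.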

Using clustering techniques, a suboptimal, iterative, and interactive algorithm is developed to train the radial basis functions (RBFs). An algorithm is also developed to reduce segmentation errors in TIMIT, and the 60-phone TIMIT set is reduced to 33 phones by merging similar phones.
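The dissertation's training algorithm is described only as suboptimal, iterative, and interactive, so it is not reproduced here. As a stand-in, the sketch below uses standard k-means (Lloyd's algorithm) to place RBF centers. It then sets each width to the root-mean-square distance of a cluster's members from their center, a common heuristic. Deterministic initialization from the first k points, the iteration count, and the width floor are assumptions chosen to keep the example small and reproducible.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Lloyd's k-means; centers start at the first k points (assumed, for reproducibility)."""
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda j: sq_dist(p, centers[j]))
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lbl in zip(points, assign) if lbl == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign


def rbf_widths(points, centers, assign, floor=1e-3):
    """Per-center width: RMS distance of the cluster's members, never below `floor`."""
    widths = []
    for j, c in enumerate(centers):
        d2 = [sq_dist(p, c) for p, lbl in zip(points, assign) if lbl == j]
        rms = math.sqrt(sum(d2) / len(d2)) if d2 else floor
        widths.append(max(rms, floor))
    return widths


# Hypothetical two-dimensional feature vectors forming two clusters.
points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
centers, assign = kmeans(points, 2)
widths = rbf_widths(points, centers, assign)
```

k-means converges only to a local optimum that depends on the initial centers. That is one reason clustering-based RBF training tends to be iterative and may benefit from inspection between runs.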

For 168 test speakers, an 84% recognition rate is achieved on a vocabulary of 11 words taken from the TIMIT sentence SA1 ("she had your dark suit in greasy wash water all year"). For applications such as voice-driven menu systems, the vocabulary words can be chosen to be distinct and easily separable. Excluding the confusable words from the 11-word vocabulary leaves an 8-word vocabulary, on which a 95% recognition rate is achieved.

Real-time implementation of the proposed system can be achieved using a digital signal processor that performs a multiplication within 100 ns.
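A back-of-the-envelope check of this requirement helps. At 100 ns per multiplication, a processor performs 10^7 multiplications per second, or 10^5 per 10 ms of speech. The 10 ms frame shift and the cost model below are assumptions for illustration, not figures from the dissertation. The model counts one multiplication per feature per center for the squared distance, one per center for the Gaussian width scaling, and one per center per output in a linear second layer. It ignores the cost of evaluating the exponential.

```python
MULT_TIME_S = 100e-9   # 100 ns per multiplication, as stated in the abstract
FRAME_SHIFT_S = 10e-3  # assumed 10 ms frame shift (hypothetical)


def multiplies_per_frame(mult_time_s, frame_shift_s):
    """Number of multiplications that fit into one frame shift."""
    return round(frame_shift_s / mult_time_s)


def rbfn_multiplies(num_centers, feature_dim, num_outputs):
    """Rough per-frame multiplication count for a Gaussian RBF layer plus a linear layer."""
    distance_terms = num_centers * feature_dim  # (x_i - c_i)^2 for every center
    width_scaling = num_centers                 # multiply each distance by 1 / (2 sigma^2)
    output_layer = num_outputs * num_centers    # linear second layer
    return distance_terms + width_scaling + output_layer


budget = multiplies_per_frame(MULT_TIME_S, FRAME_SHIFT_S)
```

Any hypothetical network whose `rbfn_multiplies(...)` count stays well under `budget` leaves headroom for feature extraction and DTW. The actual network sizes are not stated in this abstract.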

Files
  LD5655.V856_1995.H855.pdf   5.35 MB   [VT]
[VT] indicates that the file is accessible from the Virginia Tech campus network only.
