Speech Recognition using MFCC & VQ
Authors: SHIKHA GUPTA, MOHD SUHEL
Abstract: Speech recognition is becoming more and more useful nowadays, and speech processing has been studied in various
fields of research. In this work, the Mel Frequency Cepstrum Coefficient (MFCC) and Vector Quantization (VQ) have been used to
build a text-independent speaker identification system. Several features are extracted from the speech signal of spoken words
using MFCC. The VQ-based methods are parametric approaches which use VQ codebooks consisting of a small number of
representative feature vectors. Speech recognition systems are efficient alternatives for devices where typing is
difficult.
Keywords: MATLAB, Mel Frequency Cepstral Coefficients (MFCC), Speaker Recognition, Vector Quantization (VQ).
INTRODUCTION
Speech is the most common and primary mode of
communication among human beings. The human voice
conveys much more information, such as the gender, emotion
and identity of the speaker. Speech recognition can be
defined as the process of converting a speech signal to a
sequence of words by means of an algorithm. The objective of
speaker recognition is to determine which speaker is present
based on the individual's characteristics [1]. The most
popular spectral-based parameter used in recognition
approaches is the Mel Frequency Cepstral Coefficient
(MFCC). MFCCs are coefficients that represent audio
based on the perception of the human auditory system. Using
a Hamming window, the speech signal is divided into a number of
blocks of short duration so that the Fourier transform can be
applied. In this work, the Mel Frequency Cepstrum
Coefficient (MFCC) feature has been used to design a
text-independent speaker identification system. The
extracted speech features (MFCCs) of a speaker are
quantized to a number of centroids using a vector quantization
algorithm. These centroids constitute the codebook of that
speaker. MFCCs are calculated in the training phase and again
in the testing phase. Each speaker uttered the same words once in a
training session and once in a later testing session. The
Euclidean distance between the MFCCs of each speaker in the
training phase and the centroids of the individual speaker in the
testing phase is measured, and the speaker is identified
according to the minimum Euclidean distance [11]. The code
is developed in the MATLAB environment and performs
the identification satisfactorily.
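
The codebook and minimum-distance steps described above can be illustrated with a small sketch. It is not the authors' MATLAB code: it is written in Python, it uses plain k-means as a stand-in for whichever vector quantization algorithm the authors used, and it assumes the MFCC vectors have already been extracted. The function names (train_codebook, distortion, identify), the speaker labels, and the toy 2-D "features" are all hypothetical and serve only to show the flow from feature vectors to centroids to a minimum Euclidean distance decision.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_codebook(vectors, k, iterations=20, seed=0):
    """Quantize feature vectors to k centroids (plain k-means, an assumption)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # start from k distinct training vectors
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: euclidean(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def distortion(vectors, codebook):
    """Average distance from each vector to its nearest centroid."""
    total = sum(min(euclidean(v, c) for c in codebook) for v in vectors)
    return total / len(vectors)


def identify(test_vectors, codebooks):
    """Pick the speaker whose codebook gives the minimum average distance."""
    return min(codebooks, key=lambda name: distortion(test_vectors, codebooks[name]))


def toy_features(center, n, seed):
    """Hypothetical stand-in for MFCC vectors: noisy 2-D points around a center."""
    rng = random.Random(seed)
    return [[c + rng.gauss(0, 0.3) for c in center] for _ in range(n)]


# Training session: one codebook of 4 centroids per (hypothetical) speaker.
training = {
    "speaker_a": toy_features((0.0, 0.0), 40, seed=1),
    "speaker_b": toy_features((5.0, 5.0), 40, seed=2),
}
codebooks = {name: train_codebook(vecs, k=4) for name, vecs in training.items()}

# Testing session: a later utterance from speaker_b.
test_utterance = toy_features((5.0, 5.0), 20, seed=3)
print(identify(test_utterance, codebooks))
```

In this sketch the test vectors are compared against codebooks built during training, which is one common arrangement; real MFCC vectors would typically have 12 to 20 dimensions rather than 2, and the codebook size would be tuned to the data.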