Speech Recognition using MFCC & VQ

Authors: SHIKHA GUPTA, MOHD SUHEL

Abstract: Speech recognition is becoming more and more useful nowadays, and research in speech processing has been carried out in various fields. In this work, Mel Frequency Cepstrum Coefficients (MFCC) and Vector Quantization (VQ) have been used to build a text-independent speaker identification system. Several features are extracted from the speech signal of spoken words using MFCC. The VQ-based methods are parametric approaches which use VQ codebooks consisting of a small number of representative feature vectors. Speech recognition systems are efficient alternatives for devices on which typing is difficult.

Keywords: MATLAB, Mel Frequency Cepstral Coefficients (MFCC), Speaker Recognition, Vector Quantization (VQ).

INTRODUCTION 
Speech is the most common and primary mode of communication among human beings. The human voice conveys much more than words, including information such as the gender, emotion and identity of the speaker. Speech recognition can be defined as the process of converting a speech signal to a sequence of words by means of an algorithm. The objective of speaker recognition, the task addressed in this work, is to determine which speaker is present based on the individual's characteristics [1]. The most popular spectral-based parameters used in recognition are the Mel Frequency Cepstral Coefficients (MFCCs). MFCCs are coefficients that represent audio based on the perception of the human auditory system. Using a Hamming window, the speech signal is divided into a number of blocks of short duration so that the Fourier transform can be applied. In this work, the MFCC feature has been used to design a text-independent speaker identification system. The extracted speech features (MFCCs) of a speaker are quantized to a number of centroids using a vector quantization algorithm. These centroids constitute the codebook of that speaker. MFCCs are calculated in the training phase and again in the testing phase. Each speaker uttered the same words once in a training session and once in a later testing session. The Euclidean distance between the MFCCs of each speaker in the training phase and the centroids of each individual speaker in the testing phase is measured, and the speaker is identified according to the minimum Euclidean distance [11]. The code is developed in the MATLAB environment and performs the identification satisfactorily.
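
The codebook and minimum-distance steps described above can be illustrated with a short sketch. This is not the authors' MATLAB code: it is a minimal Python example using only the standard library, and it rests on several assumptions. The vector quantization algorithm is assumed to be plain k-means (the source does not name one), the feature vectors are synthetic stand-ins generated by a hypothetical `fake_features` helper rather than real MFCCs, and test vectors are scored against codebooks built from training data, which is the common arrangement. The names `train_codebook`, `distortion` and `identify` are illustrative only.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train_codebook(vectors, size, iterations=20, seed=0):
    """Quantize feature vectors to `size` centroids (assumed: plain k-means)."""
    rng = random.Random(seed)
    # Start from `size` distinct training vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(size), key=lambda i: euclidean(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def distortion(vectors, codebook):
    """Average distance from each vector to its nearest centroid."""
    total = sum(min(euclidean(v, c) for c in codebook) for v in vectors)
    return total / len(vectors)

def identify(vectors, codebooks):
    """Return the speaker whose codebook gives the minimum average distance."""
    return min(codebooks, key=lambda name: distortion(vectors, codebooks[name]))

def fake_features(center, count, rng):
    """Hypothetical stand-in for MFCC vectors: Gaussian noise around a center."""
    return [[rng.gauss(c, 0.3) for c in center] for _ in range(count)]

rng = random.Random(1)
# Training phase: one codebook of 4 centroids per speaker.
train = {
    "speaker_a": fake_features([0.0, 1.0, 2.0], 50, rng),
    "speaker_b": fake_features([3.0, -1.0, 0.5], 50, rng),
}
codebooks = {name: train_codebook(feats, size=4) for name, feats in train.items()}

# Testing phase: a new utterance drawn from speaker_b's distribution.
test_utterance = fake_features([3.0, -1.0, 0.5], 30, rng)
print(identify(test_utterance, codebooks))
```

In a real system the synthetic vectors would be replaced by MFCC vectors computed per frame, and the codebook size would be chosen experimentally; the value of 4 centroids here is arbitrary.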
