Abstract:
The artificial neural network (ANN), dynamic time warping (DTW), and direct comparison (DC) speech recognition algorithms are compared on the task of speaker-independent syllable recognition. For the 10 isolated Chinese digits, the experimental results show that, when multiple sets of samples are used for training, the training time required for ANN is at least 100 times that for DTW, and the training time for DTW is 5 times that for DC; ANN recognizes 300 times faster than DC, and DC recognizes 5 times faster than DTW; ANN requires less memory than either DTW or DC; the correct recognition rate of ANN is 2.3% higher than that of DTW, and the rate of DTW is 6.7% higher than that of DC. The results also indicate that, when only a single set of samples is used, the recognition rate of DTW is 3.6% higher than that of DC, and the rate of DC is 8.1% higher than that of ANN. It can be concluded that, for a small vocabulary, the overall performance of ANN is superior to that of DTW, and the overall performance of DTW is superior to that of DC.