Overview
ABSTRACT
Speech recognition performance has recently improved considerably and now approaches that of humans, but the level of understanding achieved by present systems remains very low. Such systems are based on statistical modeling of speech: Hidden Markov Models (HMMs) for acoustics, and n-gram models that store the conditional probabilities of sequences of linguistic units. Recent progress has come from coupling classical HMMs with deep neural networks, which have a large number of hidden layers and are trained by deep learning algorithms on very large amounts of data. Applications mainly concern text dictation, transcription of broadcast media (radio, television) and, especially, voice telematics.
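As an illustration of what "storing the conditional probabilities of sequences of linguistic units" can mean in practice, here is a minimal Python sketch of a bigram (2-gram) model estimated by maximum likelihood. The function name `train_bigram`, the toy corpus and the sentence-boundary tokens `<s>` and `</s>` are assumptions made for this example; they are not taken from the article, and real systems add smoothing for unseen word pairs.

```python
from collections import Counter


def train_bigram(sentences):
    """Estimate P(current word | previous word) by relative frequency.

    `sentences` is an iterable of token lists. This is an illustrative
    sketch: no smoothing is applied, so unseen pairs have no entry.
    """
    history_counts = Counter()  # how often each word occurs as a history
    bigram_counts = Counter()   # how often each (previous, current) pair occurs
    for words in sentences:
        # Hypothetical boundary markers so the first and last words have context.
        tokens = ["<s>"] + list(words) + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Toy corpus (invented for illustration only).
corpus = [
    ["recognize", "speech"],
    ["recognize", "speech"],
    ["recognize", "words"],
]
bigram_probs = train_bigram(corpus)
```

With this toy corpus, `bigram_probs[("recognize", "speech")]` is the fraction of occurrences of "recognize" that are followed by "speech", i.e. two out of three.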
AUTHOR
Jean-Paul HATON: Professor at the University of Lorraine, LORIA/INRIA – Member of the Institut Universitaire de France
INTRODUCTION
The use of speech as a means of communication between humans and machines has been widely studied in recent decades. In this article, we focus on automatic speech recognition (ASR), i.e. the set of techniques that allow a person to communicate verbally with a machine. ASR is of undeniable practical interest under certain conditions of use (remote access, heavy workload, users with disabilities, etc.). Commercial products have been available for over thirty years, initially mainly for the recognition of isolated and concatenated words, and now for continuously spoken sentences. Most are based on dynamic programming algorithms and stochastic models (Markov sources). However, problems remain to be solved in order to increase the robustness of these systems and extend their dialog capabilities. Current research focuses on the recognition of noisy speech, the processing of incomplete or incorrect utterances, the definition of dialog procedures, etc.
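The introduction does not say which dynamic programming algorithm is meant. One well-known technique of this family for isolated-word recognition is dynamic time warping (DTW), which aligns two sequences spoken at different speeds. The sketch below is a minimal Python illustration under simplifying assumptions: each utterance is reduced to a one-dimensional list of numbers (real systems use vectors of acoustic features), and the names `dtw_distance`, `recognize` and the template values are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    cost[i][j] is the minimal accumulated distance for aligning the
    first i elements of `a` with the first j elements of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the reference template closest to `utterance`."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Toy reference templates (invented values, for illustration only).
templates = {
    "yes": [1.0, 2.0, 3.0],
    "no": [3.0, 1.0, 0.0],
}
# A slower rendition of the "yes" pattern: the middle value is held longer.
result = recognize([1.0, 2.0, 2.0, 3.0], templates)
```

Because DTW lets one sequence repeat an element while the other advances, the stretched input still matches the "yes" template with zero accumulated distance, whereas a frame-by-frame comparison would not even be defined for sequences of different lengths.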
KEYWORDS
Hidden Markov Models (HMM) | deep neural networks | deep learning
This article is included in: Traceability
Automatic speech recognition
Bibliography
Software tools
HTK (HMM ToolKit): open-source software for developing complete speech recognition applications based on HMMs. http://www.htk.eng.cam.ac.uk/
VISPER (Visual speech processing system): free software for visualizing the stages of dynamic programming and HMM-based recognition, developed by the Technical University of Liberec, Czech...
Directory
Manufacturers – Suppliers – Distributors (non-exhaustive list)
Companies specializing in automatic speech processing:
Vecsys http://www.vecsys.fr/presentation/index.htm
Loquendo http://www.loquendo.com/fr/
...
Documentation
Speech Communication journal (4 issues/year)
IEEE Transactions on Pattern Analysis and Machine Intelligence (6 issues/year)
International Journal of Pattern Recognition and Artificial Intelligence (4 issues/year)
Journal of the Acoustical Society of America (12 issues/year)
Traitement du Signal journal (4 issues/year)...