Article | REF: H3728 V3

Automatic speech recognition

Author: Jean-Paul HATON

Publication date: October 10, 2018


Overview

ABSTRACT

Great progress has recently been made in speech recognition performance (close to that of humans), but the level of understanding of present systems remains very low. Such systems are based on statistical modeling of speech: Hidden Markov Models (HMM) for acoustics, and n-gram models storing the conditional probabilities of sequences of linguistic units. Recent progress has been achieved by coupling classical HMMs with deep neural networks that are made up of a large number of hidden layers and trained by deep learning algorithms using very large amounts of training data. Applications concern mainly text dictation, transcription of media (radio, television) and especially vocal telematics.
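The n-gram models mentioned above can be illustrated with a minimal sketch. It estimates bigram conditional probabilities P(w_i | w_{i-1}) by relative frequency, with no smoothing. The three-sentence corpus, the function name `train_bigram` and the boundary markers `<s>` and `</s>` are assumptions made for this illustration, not taken from the article; practical systems use much larger corpora, higher orders and smoothing.

```python
from collections import Counter


def train_bigram(sentences):
    """Estimate P(current | previous) by relative frequency (no smoothing)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Hypothetical sentence-boundary markers, added for illustration.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    # Conditional probability = count(prev, cur) / count(prev as context).
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Toy corpus invented for this example.
corpus = ["recognize speech", "recognize speech now", "wreck a nice beach"]
probs = train_bigram(corpus)

for (prev, cur), p in sorted(probs.items()):
    print(f"P({cur} | {prev}) = {p:.3f}")
```

In this toy corpus "speech" always follows "recognize", so that bigram receives probability 1, while the sentence-initial context is split between "recognize" and "wreck".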


AUTHOR

Jean-Paul HATON: Professor at the University of Lorraine, LORIA/INRIA – Member of the Institut Universitaire de France

INTRODUCTION

The use of speech as a means of communication between man and machine has been widely studied in recent decades. In this article, we focus on automatic speech recognition (ASR), i.e. all the techniques used to communicate verbally with a machine. ASR is of undeniable practical interest under certain conditions of use (remote access, heavy workload, users with disabilities, etc.). Commercial products have been available for over thirty years, initially mainly for the recognition of isolated and concatenated words, and now for continuously spoken sentences. Most are based on dynamic programming algorithms and stochastic models (Markov sources). However, problems remain to be solved in order to make these systems more robust and to extend their dialog capabilities. Current research focuses on the recognition of noisy speech, the processing of incomplete or incorrect utterances, the definition of dialog procedures, etc.
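As a rough illustration of how dynamic programming and Markov models combine, the sketch below applies the Viterbi algorithm to a toy two-state hidden Markov model. The state names ("sil", "speech"), the quantized observations ("low", "high") and all probabilities are invented for this example and do not come from the article; real recognizers work with phone-level states and continuous acoustic features.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence and its log-probability."""
    # best[t][s]: log-probability of the best path ending in state s at time t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at time t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def to_log(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in table.items()}


# Hypothetical toy model: all values below are assumptions for illustration.
states = ["sil", "speech"]
log_start = to_log({"sil": 0.8, "speech": 0.2})
log_trans = {
    "sil": to_log({"sil": 0.7, "speech": 0.3}),
    "speech": to_log({"sil": 0.2, "speech": 0.8}),
}
log_emit = {
    "sil": to_log({"low": 0.9, "high": 0.1}),
    "speech": to_log({"low": 0.2, "high": 0.8}),
}

path, score = viterbi(["low", "high", "high", "low"], states,
                      log_start, log_trans, log_emit)
print(path, math.exp(score))
```

Working in log-probabilities avoids numerical underflow on long observation sequences, which is why the recursion adds scores instead of multiplying probabilities.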


KEYWORDS

Hidden Markov Models (HMM) | deep neural networks | deep learning




Software tools

HTK (HMM ToolKit): open-source software for the development of complete speech recognition applications based on HMMs http://www.htk.eng.cam.ac.uk/

VISPER (Visual speech processing system): free software for visualizing dynamic programming and HMM recognition stages, developed by the Technical University of Liberec, Czech...