Robust Automatic Speech Recognition: A Bridge to Practical Applications

About this ebook

Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise- and reverberation-robust techniques that have been developed over the past thirty years, with an emphasis on practical methods that have been proven successful and that are likely to be further developed for future applications. The strengths and weaknesses of robustness-enhancing speech recognition techniques are carefully analyzed. The book covers noise-robust techniques designed for acoustic models based on both Gaussian mixture models and deep neural networks. In addition, a guide to selecting the best methods for practical applications is provided. The reader will:

  • Gain a unified, deep and systematic understanding of the state-of-the-art technologies for robust speech recognition
  • Learn the links and relationship between alternative technologies for robust speech recognition
  • Be able to use the technology analysis and categorization detailed in the book to guide future technology development
  • Be able to develop new noise-robust methods in the current era of deep learning for acoustic modeling in speech recognition

Key features:

  • The first book to provide a comprehensive review of noise- and reverberation-robust speech recognition methods in the era of deep neural networks
  • Connects robust speech recognition techniques to machine learning paradigms with rigorous mathematical treatment
  • Provides elegant and structured ways to categorize and analyze noise-robust speech recognition techniques
  • Written by leading researchers who have been actively working on the subject matter in both industrial and academic organizations for many years

Language: English
Release date: Oct 30, 2015
ISBN: 9780128026168
Author

Jinyu Li

Jinyu Li received a Ph.D. degree from Georgia Institute of Technology, U.S. From 2000 to 2003, he was a Researcher at Intel China Research Center and a Research Manager at iFlytek, China. Currently, he is a Principal Applied Scientist at Microsoft, working as a technical lead to design and improve speech modeling algorithms and technologies that ensure industry state-of-the-art speech recognition accuracy for Microsoft products. His major research interests cover several topics in speech recognition and machine learning, including noise robustness, deep learning, discriminative training, and feature extraction. He has authored over 60 papers and has been awarded over 10 patents.

    Book preview

    Robust Automatic Speech Recognition - Jinyu Li

    Chapter 1

    Introduction

    Abstract

    Automatic speech recognition (ASR) by machine has been a field of research for more than 60 years. The industry has developed a broad range of commercial products in which ASR as a user interface has become ever more useful and pervasive. Consumer-centric applications increasingly require ASR to be robust to the full range of real-world noise and other acoustic distorting conditions. However, reliably recognizing spoken words in realistic acoustic environments is still a challenge.

    We introduce the distortion factors that operate at various stages of speech production, from thought to speech signal, and that give rise to the ASR robustness issues on which this book focuses. In this chapter, we provide an introductory summary of the book, covering the ASR robustness problem for acoustic models based on both Gaussian mixture models and deep neural networks. The book goes significantly beyond much of the existing survey literature, and illustrates the research and product development on ASR robustness to noisy acoustic environments that has been progressing for over 30 years.

    Finally, we define the mission, goal, and structure of the book in this chapter. We aim to establish a solid, consistent, and common mathematical foundation for robust ASR, emphasizing the methods proven to be successful and expected to sustain or expand their future applicability.

    Keywords

    Automatic speech recognition

    Noise robustness

    ASR applications

    Survey

    Gaussian mixture models

    Deep neural networks

    Chapter Outline

    1.1 Automatic Speech Recognition

    1.2 Robustness to Noisy Environments

    1.3 Existing Surveys in the Area

    1.4 Book Structure Overview

    References

    1.1 Automatic Speech Recognition

    Automatic speech recognition (ASR) is the process, and the related technology, for converting the speech signal into its corresponding sequence of words or other linguistic entities by means of algorithms implemented in a device, a computer, or computer clusters (Deng and O’Shaughnessy, 2003; Huang et al., 2001b). ASR by machine has been a field of research for more than 60 years (Baker et al., 2009a,b; Davis et al., 1952). The industry has developed a broad range of commercial products in which speech recognition as a user interface has become ever more useful and pervasive.
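
    In its standard probabilistic formulation (treated formally in Chapter 2; the notation here is generic rather than the book's own), the recognizer searches for the word sequence that best explains the observed acoustic features:

        \hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} p(X \mid W)\, P(W)

    where X denotes the sequence of acoustic feature vectors extracted from the speech signal, p(X | W) is the acoustic model, and P(W) is the language model. Acoustic environmental distortion corrupts X and thereby degrades the acoustic-model scores, which is the robustness problem addressed in this book.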

    Historically, ASR applications have included voice dialing, call routing, interactive voice response, data entry and dictation, voice command and control, gaming, structured document creation (e.g., medical and legal transcriptions), appliance control by voice, computer-aided language learning, content-based spoken audio search, and robotics. More recently, with the exponential growth of big data and computing power, ASR technology has advanced to the stage where more challenging applications are becoming a reality. Examples are voice search, digital assistants and interactions with mobile devices (e.g., Siri on the iPhone, Bing voice search and Cortana on Windows Phone and the Windows 10 OS, and Google Now on Android), voice control in home entertainment systems (e.g., Kinect on Xbox), machine translation, home automation, in-vehicle navigation and entertainment, and various speech-centric information processing applications capitalizing on downstream processing of ASR outputs (He and Deng, 2013).

    1.2 Robustness to Noisy Environments

    New waves of consumer-centric applications increasingly require ASR to be robust to the full range of real-world noise and other acoustic distorting conditions. However, reliably recognizing spoken words in realistic acoustic environments is still a challenge. For such large-scale, real-world applications, noise robustness is becoming an increasingly important core technology since ASR needs to work in much more difficult acoustic environments than in the past (Deng et al., 2002).

    Noise refers to any unwanted disturbance superposed upon the intended speech signal. Robustness is the ability of a system to maintain good performance under varying operating conditions, including those unforeseeable or unavailable at the time of system development.
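
    One common way to make this concrete (a generic formulation anticipating the distortion analysis later in the book; the symbols are chosen here purely for illustration) is to model the observed signal as clean speech passed through an acoustic channel and corrupted by additive noise:

        y[t] = x[t] \ast h[t] + n[t]

    where x[t] is the clean speech, h[t] is the impulse response of the acoustic channel (e.g., room reverberation), n[t] is the additive noise, and \ast denotes convolution. In the log-spectral and cepstral feature domains used by ASR front-ends, this relation becomes non-linear, which is a large part of what makes compensation challenging.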

    Speech as observed and digitized is generated by a complex process, from the thoughts to actual speech signals. This process can be described in five stages as shown in Figure 1.1, where a number of variables affect the outcome of each stage. Some major stages in this long chain have been analyzed and modeled mathematically in Deng (1999, 2006).

    Figure 1.1 From thoughts to speech.

    All of the above could lead to ASR robustness issues. This book addresses challenges mostly in the acoustic channel area where interfering signals lead to ASR performance degradation.

    In this area, robustness of ASR to noisy backgrounds can be approached from two complementary directions:

    • reducing the noise level in hardware, by exploiting spatial or directional information through microphone technology and transducer principles, such as noise-canceling microphones and microphone arrays;

    • software algorithmic processing that takes advantage of the spectral and temporal separation between speech and interfering signals, which is the major focus of this book; a minimal sketch of one such algorithm follows.
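
    The following is an illustrative sketch of one classic algorithmic method of the second kind, magnitude-domain spectral subtraction. It is not a method prescribed at this point in the book; the function name and parameter values are hypothetical, and a practical system would use a proper STFT library and more careful noise tracking.

    import numpy as np

    def spectral_subtraction(noisy, frame_len=400, hop=160, noise_frames=10,
                             over_subtraction=2.0, floor=0.02):
        """Enhance a 1-D noisy waveform by subtracting an estimated noise spectrum."""
        window = np.hanning(frame_len)
        n_frames = 1 + (len(noisy) - frame_len) // hop
        enhanced = np.zeros(len(noisy))

        # Estimate the noise magnitude spectrum from the first few frames,
        # assuming they contain no speech (a deliberately crude estimator).
        noise_mag = np.zeros(frame_len // 2 + 1)
        for i in range(noise_frames):
            frame = noisy[i * hop:i * hop + frame_len] * window
            noise_mag += np.abs(np.fft.rfft(frame))
        noise_mag /= noise_frames

        for i in range(n_frames):
            frame = noisy[i * hop:i * hop + frame_len] * window
            spec = np.fft.rfft(frame)
            mag, phase = np.abs(spec), np.angle(spec)
            # Over-subtract the noise estimate and apply a spectral floor to
            # limit musical-noise artifacts, then resynthesize with the noisy phase.
            clean_mag = np.maximum(mag - over_subtraction * noise_mag, floor * mag)
            enhanced[i * hop:i * hop + frame_len] += np.fft.irfft(
                clean_mag * np.exp(1j * phase), n=frame_len) * window
        return enhanced

    # Hypothetical usage on a 16 kHz waveform stored in a NumPy array:
    #   cleaned = spectral_subtraction(noisy_waveform)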

    1.3 Existing Surveys in the Area

    Researchers and practitioners have been trying to improve ASR robustness to operating conditions for many years (Huang et al., 2001a; Huang and Deng, 2010). A survey of 1970s speech recognition systems (Lea, 1980) identified that a primary difficulty with speech recognition is the tendency of the input device to pick up other sounds in the environment, which act as interfering noise. The term robust speech recognition emerged in the late 1980s. Survey papers in the 1990s include Gong (1995), Juang (1991), and Junqua and Haton (1995). By 2000, robust speech recognition had gained significant importance in the speech and language processing fields; in fact, it was the most popular area at the International Conference on Acoustics, Speech and Signal Processing, at least during 2001-2003 (Gong, 2004). Since 2010, robust ASR has remained one of the most popular areas in the speech processing community, and tremendous and steady progress in noisy speech recognition has been made.

    A large number of noise-robust ASR methods, on the order of hundreds, have been proposed and published over the past 30 years or so, and many of them have had a significant impact on research or commercial use. Such accumulated knowledge deserves thorough examination, not only to define the state of the art in this field from a fresh and unifying perspective, but also to point to potentially fruitful future directions. Nevertheless, a well-organized framework for relating and analyzing these methods is conspicuously missing. The existing survey papers in noise-robust ASR (Acero, 1993; Deng, 2011; Droppo and Acero, 2008; Gales, 2011; Gong, 1995; Haeb-Umbach, 2011; Huo and Lee, 2001; Juang, 1991; Kumatani et al., 2012; Lee, 1998) either do not cover all recent advances in the field or focus only on a specific sub-area. Although there are also a few recent books (Kolossa and Haeb-Umbach, 2011; Virtanen et al., 2012), they are collections of topics, with each chapter written by different authors, which makes it hard to provide a unified view across all topics. Given the importance of noise-robust ASR, the time is ripe to analyze and unify the solutions.

    The most recent overview paper (Li et al., 2014) elaborates on the basic concepts in noise-robust ASR and develops categorization criteria and unifying themes. Specifically, it hierarchically classifies the major and significant noise-robust ASR methods using a consistent and unifying mathematical language, establishes their interrelations, differentiates among important techniques, and discusses current technical challenges and future research directions. It also identifies relatively promising, short-term new research areas based on a careful analysis of successful methods, which can serve as a reference for future algorithm development in the field. Furthermore, in the literature spanning over 30 years on noise-robust ASR, there is inconsistent use of basic concepts and terminology among different researchers in the field. This inconsistency is confusing at times, especially for new researchers and students. It is, therefore, important to examine discrepancies in the current literature and re-define a consistent terminology. However, due to page-length restrictions, the overview paper (Li et al., 2014) did not discuss the technologies in depth. More importantly, all the aforementioned books and articles largely assume that the acoustic models for ASR are based on Gaussian mixture model hidden Markov models (GMM-HMMs).

    More recently, a new acoustic modeling technique, referred to as the context-dependent deep neural network hidden Markov model (CD-DNN-HMM) which employs deep learning, has been developed (Deng and Yu, 2014; Yu and Deng, 2011, 2014). This new DNN-based acoustic model has been shown, by many groups, to significantly outperform the conventional state-of-the-art GMM-HMMs in many ASR tasks (Dahl et al., 2012; Hinton et al., 2012). As of the writing of this book, DNN-based ASR has been widely adopted by almost all major speech recognition products and public tools worldwide.

    DNNs combine acoustic feature extraction and speech phonetic symbol classification into a single framework. By design, they ensure that both feature extraction and classification are jointly optimized under a discriminative criterion. With their complex non-linear mapping built from successive applications of simple non-linear mappings, DNNs force input features distorted by a variety of noise and channel conditions, as well as other factors, to be mapped to the same output vector of phonetic symbol classes. Such an ability provides the potential for substantial performance improvement in noisy speech recognition.
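
    As a concrete illustration of this structure, the following is a minimal NumPy sketch of a feed-forward DNN acoustic model: stacked affine transforms with simple non-linearities, ending in a softmax over phonetic (senone) classes. The layer sizes and names are hypothetical assumptions for illustration, not configurations taken from the book.

    import numpy as np

    rng = np.random.default_rng(0)

    def relu(x):
        return np.maximum(x, 0.0)

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    # Hypothetical dimensions: 440 inputs (e.g., 40 log filter-bank features
    # over an 11-frame context window) and 2,000 senone output classes.
    layer_dims = [440, 1024, 1024, 1024, 2000]
    weights = [rng.normal(0.0, 0.01, (m, n))
               for m, n in zip(layer_dims[:-1], layer_dims[1:])]
    biases = [np.zeros(n) for n in layer_dims[1:]]

    def dnn_posteriors(features):
        """Forward pass: successive simple non-linear mappings, softmax output."""
        h = features
        for W, b in zip(weights[:-1], biases[:-1]):
            h = relu(h @ W + b)                       # hidden layers
        return softmax(h @ weights[-1] + biases[-1])  # senone posteriors

    # Posteriors for a batch of five (random, stand-in) feature frames.
    posteriors = dnn_posteriors(rng.normal(size=(5, 440)))
    assert posteriors.shape == (5, 2000)

    In training, the weights would be optimized discriminatively (e.g., with a cross-entropy criterion against senone labels), which is what jointly tunes the implicit feature extraction in the lower layers and the classification in the upper layers.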

    However, while DNNs dramatically reduce the overall word error rate for speech recognition, many new questions arise: How much more robust are DNNs than GMMs? How should we introduce a physical model of speech, noise, and channel into a DNN so that a better DNN can be trained given the same data? Will feature cleaning add value to DNN modeling? Can we model speech with a DNN such that complete, expensive retraining can be avoided when the noise changes? To what extent can the noise-robustness methods developed for GMMs enhance the robustness of DNNs? More generally, what the future of noise-robust ASR technologies holds in the new era of DNNs for ASR is a question not addressed in the existing survey literature. One of the main goals of this book is to survey the recent noise-robust methods developed for DNNs as the acoustic models of speech, and to discuss future research directions.

    1.4 Book Structure Overview

    This book is devoted to providing a summary of the current, fast-expanding knowledge and approaches for solving a variety of problems in noise-robust ASR. A more specific purpose is to assist readers in acquiring a structured understanding of the state of the art and in continuing to enrich that knowledge.

    In this book, we aim to establish a solid, consistent, and common mathematical foundation for noise-robust ASR. We emphasize the methods that are proven to be effective and successful and that are likely to sustain or expand their future applicability. For the methods described in this book, we attempt to present the basic ideas, the assumptions, and the relationships with other methods. We categorize a wide range of noise-robust techniques using different criteria to equip the reader with the insight to choose among techniques and with the awareness of the performance-complexity tradeoffs. The pros and cons of using different noise-robust ASR techniques in practical application scenarios are provided as a guide to interested practitioners. The current challenges and future research directions especially in the era of DNNs and deep learning are carefully analyzed.

    This book is organized as follows. We provide the basic concepts and formulations of ASR in Chapter 2. In Chapter 3, we discuss the fundamentals of noise-robust ASR: the impact of noise and channel distortions on clean speech is examined, and we then build a general framework for noise-robust ASR and define five ways of categorizing and analyzing noise-robust ASR techniques. Chapter 4 is devoted to the first category, feature-domain vs. model-domain techniques. Various feature-domain processing methods are covered in detail, including noise-resistant features, feature moment normalization, and feature compensation, as well as a few of the most prominent model-domain methods. The second category, detailed in Chapter 5, comprises methods that exploit prior knowledge about the signal distortion; examples of such models are mapping functions between the clean and noisy speech features, and environment-specific models combined during online operation of the noise-robust algorithms. Methods that incorporate an explicit distortion model to predict the distorted speech from clean speech define the third category, covered in Chapter 6. The use of uncertainty constitutes the fourth way to categorize a wide range of noise-robust ASR algorithms and is covered in Chapter 7; uncertainty in either the model space or the feature space may be incorporated within the Bayesian framework to promote noise-robust ASR. The final, fifth way to categorize and analyze noise-robust ASR techniques exploits joint model training, described in Chapter 8; with joint model training, environmental variability in the training data is removed in order to generate canonical models. After this comprehensive discussion of noise-robust techniques for single-microphone, non-reverberant ASR, the book includes two chapters covering reverberant ASR and multi-channel processing for noise-robust ASR, respectively. We conclude this book in Chapter 11 with discussions on future directions for noise-robust ASR.
