
EE 225D

Audio Signal Processing in Humans and Machines

Prof. N. Morgan and friends

MW 4:00-5:30

http://www.icsi.berkeley.edu/eecs225d/spr14/overview.html
http://www.icsi.berkeley.edu/eecs225d/spr14/slides/
Textbook
Speech and Audio Signal Processing

Gold, Morgan, and Ellis

Wiley & Sons, 2nd edition, 2011


Prerequisites

EE123 or equivalent, and Stat 200A or equivalent;
or grad standing and consent of instructor
Speech and audio signal processing: why does this material matter?
Speech w/o visual vs visual w/o speech

Requires DSP, machine learning


Multidisciplinary tasks are good training

Many applications!
What should we be able to do (automatically)?
The human example suggests: plenty
What was said
Who said it
When they said it
What it meant
How to respond
Why is it hard?
Speaker variability (within and between speakers)

Noise, reverberation, channel


Confusable vocabulary
Meaning and tone
Course Philosophy I
People can do these tasks
effortlessly

Include psychoacoustics and physiology

Also some acoustics


But of course, also DSP and
machine learning
Course Philosophy II

First part of the course is basic stuff


The rest is applications
Much of the course grade based on an
original project

Some practice in oral presentation


Middle of the course has students
presenting the material (slides from
previous classes can help)
Section I: Broad background
Synthesis/vocoding history (chaps 2 and 3)

Recognition history (chap 4)

Machine recognition basics (chap 5)

Human recognition basics (chap 18)
Section II: Scientific background
Pattern classification (chaps 8 and 9)

Acoustics (chaps 10 and 13)

Linguistic sound categories (chap 23)

(Auditory neurophysiology late in the course)
Section IIIa: Engineering Apps
Speech recognition

Signal processing front end (chaps 19-22)


Deterministic sequence recognition (chap 24)
Statistical modeling and inference (chaps 25 and 26)
Discriminant methods and adaptation (chaps 27 and 28)

Speech recognition and understanding (chap 29)


Section IIIb: Engineering Apps
Other speech applications

Speech synthesis (chap 30)


Speaker verification (chap 41)
Section IIIc: Engineering Apps
Other audio applications

Perceptual audio coding (chap 35)


Music signal analysis (chap 37)
Source separation (chap 39)
Section IV: Hearing
[presented by Prof. Oded Ghitza, Boston University]

Auditory physiology (chap 14)


Psychoacoustics (chaps 15 and 16)
Section V: Student Projects

Project proposal: By spring break, iterate on proposed project

Last week of class, students present their projects, modeled after ICASSP or Interspeech

Finals week, submit written version of project, schedule demos

Any topic in speech/music/general audio potentially OK, including tutorial or original research
Course grading
Quizzes/homeworks (for first half): 20%
Student presentations/participation: 20%

Project proposal: 10%


Project oral presentation: 20%
Project write-up & results: 30%
Course location
After today, 6th floor, ICSI

1947 Center Street, between Milvia and MLK

Class will start at 4:15 instead of 4:10 (15 minute walk from Cory)

Office hour, one hour before each class