Scientific article on speech processing


Athira Aroon
Pune, India
athiraaroon3@gmail.com

S.B Dhonde
Pune, India
dhondesomnath@gmail.com

Abstract-

[...] the [...] been introduced. In this paper we have briefly reviewed the recent emerging techniques used in [...] Gaussian Process Regression (GPR), Neural Autoregressive Distribution Estimators (NADE) overcoming Restricted [...]. Have emphasized the [...] estimation algorithms proposed for speech synthesis like [...] high quality.

[...] take a procedural view: they describe the sequence of [...] in a simple 'pipeline' architecture [1][2]. We have undertaken this review in order to study the progress towards speech synthesis.

Index Terms-- Text-to-Speech, Statistical Parametric [...]

I.INTRODUCTION

[...] vocabularies. Spoken words [...] combination [...] speech [...] termed as [...] originated [...] limited sound [...] people. It comprises [...] vowels and consonant [...] phonetic description of [...] set [...] unites [...] quality.

The [...] rather than stored exemplars. It is statistical because it [...] speech synthesizer [1].

[...] coefficients are obtained from the speech database by mel-cepstral analysis. Mel-cepstral coefficients are used to train HMM [...] phoneme [3].

[Figure: text-to-speech block diagram. Recovered labels: Input Text, Analysis routines, Abstract Underlying Linguistic Description, Synthesis routines, Output Speech]

[...] phonemes [...] later stage text to be synthesized is transformed into phoneme sequence, [...] representing the whole text to be synthesized [...] constructed by concatenating phoneme [...] sequence is generated using the algorithm for speech [...] required for speech synthesizer. Intelligibility is the ease with [...]

IEEE Sponsored 9th International Conference on Intelligent Systems and Control (ISCO)2015

The autoregressive HMM as a probabilistic model. We then [...] distribution [8]. It supports existing high quality speech [...] Generally [...] parameter [...] causing over-smoothing problem. Global [...] global variance; [...] of kernel function used in GP. So hyperparameter [...] hyperparameter optimization [...] introduced. GV [...] outperformed the conventional HMM-based approach [...] and subjective evaluation [9]. [...] by producing speech that is as natural as that of the standard HMM synthesis framework with its conventional settings, but [...]

C.

The proposed NADE is inspired by Restricted Boltzmann Machines (RBM) [...] and voice conversion. However, RBM does not provide a [...] observation. Not knowing the exact value of the partition function makes it hard to evaluate how well the distribution estimated by the RBM fits the observations. NADE evolved to solve the difficulty of calculating the partition function by decomposing the joint distribution of observations into tractable conditional distributions. Therefore, NADE was adopted as the form of the state PDFs instead of RBM [10].
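The decomposition into tractable conditionals can be illustrated with a minimal sketch for binary observations. Everything named here is an assumption for illustration and is not taken from the reviewed papers: the function names, the parameter names `W`, `V`, `b`, `c`, and the tiny random weights. Each conditional p(v_i | v_1..v_{i-1}) is a Bernoulli whose parameter depends only on the preceding dimensions, so the joint probability is normalized by construction and no partition function has to be evaluated.

```python
import itertools
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def nade_log_prob(v, W, V, b, c):
    """Log-probability of a binary vector v under a NADE-style model.

    W: H x D input-to-hidden weights, V: D x H hidden-to-output weights,
    b: D output biases, c: H hidden biases (all hypothetical names).
    """
    hidden = len(c)
    a = list(c)  # hidden pre-activation; depends only on v[0..i-1]
    log_p = 0.0
    for i, v_i in enumerate(v):
        h = [sigmoid(a_j) for a_j in a]
        # Conditional p(v_i = 1 | v_0..v_{i-1})
        p_on = sigmoid(b[i] + sum(V[i][j] * h[j] for j in range(hidden)))
        log_p += math.log(p_on if v_i == 1 else 1.0 - p_on)
        # Fold v_i into the running pre-activation for the next conditional
        for j in range(hidden):
            a[j] += W[j][i] * v_i
    return log_p


def total_probability(D, W, V, b, c):
    """Sum the joint probability over every binary vector of length D."""
    return sum(math.exp(nade_log_prob(list(v), W, V, b, c))
               for v in itertools.product([0, 1], repeat=D))


random.seed(0)
D, H = 4, 3  # toy sizes, chosen only to keep the exhaustive sum cheap
W = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(H)]
V = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(D)]
b = [random.uniform(-1, 1) for _ in range(D)]
c = [random.uniform(-1, 1) for _ in range(H)]
total = total_probability(D, W, V, b, c)  # close to 1 without any normalizer
```

The exhaustive sum is only feasible for toy dimensions. It is included to show that the chain of conditionals already defines a valid distribution, which is the property that makes the likelihood directly computable. Spectral features are real-valued, so a speech model would need a real-valued variant of these conditionals, which this sketch does not cover.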

NADE has been shown to be an efficient multivariate binary distribution estimator and performs similarly to large (but intractable) RBMs on several datasets. [...] comparing the [...] the experimental results show that NADEs demonstrate better performance than RBMs due to the accurate calculation of gradients at training time. NADE can also be understood as a special kind of autoencoder whose output assigns valid [...] model. Results have also shown the superiority of NADEs over Gaussian mixture models in describing the distribution of spectral envelopes as a density model and in alleviating the over-smoothing effect at synthesis time. Incorporating the dynamic features of mel-cepstra and spectral envelopes into NADE modeling and extending the spectral features from the spectral envelopes to the FFT spectrum [...] [10][11].

Fig 2. HMM based TTS system[4]

B. Gaussian Process Regression

They [...] cost feasible [...] partially independent conditional (PIC) approximation was adopted and showed that GP based [...] HMM-based system. Contributing [...]


DNN-based acoustic models offer an efficient and distributed [...] Mixture Density [...] limitations [...] naturally-sounding synthesized speech. [...] distribution estimators (NADE) [...] Gaussian mixture models in describing the distribution of spectral envelopes as a density model and in alleviating the over-smoothing effect at the synthesis time [...] statistical parametric speech synthesis. In order to alleviate the over-smoothing effect on the generated spectral structures [...] and improved the naturalness of the synthesized speech significantly [7][10].

TABLE I.

| No | Authors | Proposed Work | Contribution |
|---|---|---|---|
| 1 | Matt Shannon et al. (2013) [6] | [...] autoregressive hidden Markov [...] synthesis. The autoregressive HMM uses the same model for parameter estimation and synthesis in a consistent way, in contrast to the standard approach to statistical parametric speech synthesis. | Compared to the standard HMM synthesis framework, the [...] has much better mean trajectories and covariances, and a higher naturalness score. [...] the trajectory HMM [...] better mean trajectory [...] |
| 2 | Zhen-Hua Ling et al. (2013) [10] | Adopted graphical models with multiple hidden variables, including restricted Boltzmann machines (RBM) and deep belief networks (DBN), to represent the distribution of the low-level spectral envelopes at each HMM state. | In particular, the superiority of RBM and DBN over Gaussian mixture model in describing the distribution of spectral envelopes as density models and in mitigating the over-smoothing effect of the synthesized speech. |
| 3 | Xiang Yin et al. (2014) [11] | [...] new approach [...] which neural autoregressive [...] NADEs [...] utilized [...] acoustic modeling [...] | Experimental results show [...] superiority of NADEs over [...] generating spectral features by sampling, [...] speech less monotonic and boring. |
| 4 | Masanori Morise (2014) [14] | A spectral envelope estimation algorithm is presented to achieve high-quality speech synthesis. The algorithm obtains an accurate and temporally stable spectral envelope, using fundamental frequency (F0). | [...] and subjective evaluations demonstrated that CheapTrick was superior to conventional algorithms. [...] CheapTrick [...] more appropriate [...] F0-manipulated synthetic speech. Compared to the other algorithms [...] |
| 5 | Tomoki Koriyama et al. (2014) [9] | [...] issues of a statistical speech synthesis approach based [...] Gaussian process (GP) regression. Although GP-based speech synthesis [...] performance in generating [...] than [...] HMM-based one. [...] method uses GV [...] | [...] and hyperparameter optimization outperformed conventional HMM-based approach in subjective evaluation. |


III.VOCODER

[...] not exactly model the real speech waveform, and the problem of over-smoothing of the HMM-generated parameter [...] spectral estimations. For HMM-[...]

A. STRAIGHT

STRAIGHT (Speech Transformation and Representation [...]) [...] the time and frequency domains, even if the system and the [...] designed to address. [...] processing. The main feature of the STRAIGHT analysis [...] three types of positive-valued parameters: an interference-free spectrogram, an aperiodicity map, and the fundamental [...]. STRAIGHT uses an F0-adaptive triangular smoothing function [...] account. [...] smoothing function h2(ω) [...] h1(ω) is obtained by the convolution of [...] rectangular function [...] 2ω0 [...] due to periodicity. [...] of the STRAIGHT model [12].

B. TANDEM-STRAIGHT

[...] periodic signals that does not have a temporally varying [...] the temporal variation of the logarithmic power spectra. [...] refilled [...] perfectly [13].
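The F0-adaptive triangular smoothing mentioned for STRAIGHT can be illustrated with a small sketch. The text only names a triangular smoothing function and a convolution involving a rectangular function, so the rest is an assumption made for illustration: the triangle is built by convolving a rectangle one F0 wide with itself (giving a base of about 2·F0), the spectrum is a discrete list with F0 expressed in bins, and all function names are hypothetical. This is not the STRAIGHT implementation.

```python
def rectangular(width):
    """Unit-area rectangular function spanning `width` frequency bins."""
    return [1.0 / width] * width


def convolve(x, y):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, x_i in enumerate(x):
        for j, y_j in enumerate(y):
            out[i + j] += x_i * y_j
    return out


def triangular_kernel(f0_bins):
    """Triangle obtained by convolving an F0-wide rectangle with itself."""
    rect = rectangular(f0_bins)
    return convolve(rect, rect)


def smooth(spectrum, kernel):
    """Smooth along frequency; indices are clamped at the two edges."""
    half = len(kernel) // 2
    last = len(spectrum) - 1
    smoothed = []
    for k in range(len(spectrum)):
        acc = 0.0
        for m, weight in enumerate(kernel):
            idx = min(max(k + m - half, 0), last)
            acc += weight * spectrum[idx]
        smoothed.append(acc)
    return smoothed


f0_bins = 4  # assumed harmonic spacing, in bins
kernel = triangular_kernel(f0_bins)
# Toy "spectrum": one unit peak at every harmonic, zero in between
harmonics = [1.0 if k % f0_bins == 0 else 0.0 for k in range(32)]
envelope = smooth(harmonics, kernel)
```

Away from the edges, the smoothed values are constant even though the input consists of isolated harmonic peaks. That is the point of matching the triangle width to F0: the ripple caused by periodicity is averaged out while the underlying envelope is kept.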

C. CHEAPTRICK

A simple algorithm for high-quality speech synthesis is introduced that is superior to conventional ones both objectively and subjectively.

CheapTrick consists of power spectrum estimation with [...], [...] spectrum, and spectral recovery in the quefrency domain. The algorithm can obtain an accurate and temporally stable spectral envelope [...] than other algorithms by objective evaluations. Conventional [...] STRAIGHT and TANDEM [...] performance and remove the time-varying component. [...]

[Fig 3. Recovered label: Spectrum extraction with time frequency]

[...] The name CheapTrick comes from its cheap and tricky [...]


[...] regardless of gender. The results include the sound quality of not only the re-synthesized speech but also the F0 [...] robust against F0 manipulation. The difference in sound quality in female speech was smaller than that in male speech, and this difference is associated with the objective evaluation results, in which the error at higher F0 was smaller than that at lower F0 [14].
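The quefrency-domain step named in the CheapTrick description can be sketched as liftering of a log power spectrum. The text above only says that spectral recovery happens in the quefrency domain. The lifter shapes, the constants `q0` and `q1`, the naive DFT, and all function names are assumptions of this sketch, and it is not a reproduction of the published algorithm.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT; adequate for the short toy spectra used here."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def lifter_log_spectrum(log_spec, f0_bins, q0=1.18, q1=-0.09):
    """Smooth and compensate a full (symmetric) log power spectrum.

    f0_bins is the harmonic spacing in bins; q0 and q1 are assumed values.
    """
    n = len(log_spec)
    cepstrum = dft(log_spec, inverse=True)
    liftered = []
    for i, c in enumerate(cepstrum):
        q = min(i, n - i)       # symmetric quefrency index
        x = f0_bins * q / n     # F0 times quefrency (dimensionless)
        # Smoothing lifter: zero at multiples of the pitch period
        smoothing = 1.0 if q == 0 else math.sin(math.pi * x) / (math.pi * x)
        # Recovery lifter: compensates the loss caused by smoothing
        recovery = q0 + 2.0 * q1 * math.cos(2.0 * math.pi * x)
        liftered.append(c * smoothing * recovery)
    return [value.real for value in dft(liftered)]


n, f0_bins = 64, 8
# Flat envelope (2.0) plus a ripple repeating every f0_bins bins
rippled = [2.0 + 0.5 * math.cos(2.0 * math.pi * k / f0_bins) for k in range(n)]
recovered = lifter_log_spectrum(rippled, f0_bins)
```

In this toy case the ripple sits exactly at the quefrency of one pitch period, where the smoothing lifter is zero, so the output is flat. A slowly varying envelope passes through with only a small gain change from the recovery lifter.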

IV.CONCLUSION

[...] Statistical Parametric Speech Synthesis, viz. [...] global variance; and supports a simple and exact time recursive form, Gaussian Process Regression (GPR) [...] conventional HMM-based approach, Neural Autoregressive Distribution Estimators [...] NADE on problems other than distribution estimation, in particular on problems for which RBMs and autoencoders are often considered, Deep Neural Networks (DNNs), describing the distribution of spectral envelopes, making the synthetic speech less monotonic and improving the naturalness of the synthesized speech.

Vocoder quality is the major drawback of SPSS, so the recent evolving vocoder algorithms like STRAIGHT, TANDEM-STRAIGHT [...] CheapTrick was comparatively [...] and synthesize speech with higher sound quality than [...]

V.REFERENCES

[...] Simon King, "An introduction to statistical parametric speech [...]
[...] Heiga Zen, Keiichi Tokuda, Alan W. Black, "Statistical [...], April 6 2009.
[...] Jokinen, Speech [...]
[6] [...] William Byrne, Senior Member, "Autoregressive Models for Statistical Parametric Speech Synthesis", IEEE Transactions on [...] Processing, 2013.
[7] Heiga Zen, Andrew Senior, "Deep Mixture Density Network for acoustic modelling in statistical parametric speech synthesis", [...]
[...] "[...] Speech Synthesis based on Gaussian Process [...]
[...] "[...] Speech Synthesis using local and global variance", 24th IEEE International Workshop on Machine Learning and Signal Processing, 2014.
[10] Zhen-Hua Ling, Li Deng, and Dong Yu, "Modeling Spectral [...] Networks for Statistical Parametric Speech Synthesis", IEEE [...]
[11] [...] parametric [...]
[12] Ning XU, Yuan GAO, Changping ZHU, Yibin TANG, Qingbang HAN, "[...] Computational [...] methods and [...]
[13] Hideki Kawahara and Masanori Morise, "Technical foundations of TANDEM-STRAIGHT, a speech analysis, [...]
[14] Masanori Morise, "CheapTrick, a spectral envelope estimator [...]

Unplaced fragments: International conference [...]; Vol. 36, Part 5; September 2014; [...] and Audio Processing, 2013.
