
Thesis

to obtain the degree of

Doctor of the Université de Grenoble

Speciality: Statistics
Ministerial decree: 7 August 2006

Presented and publicly defended by

Fethi Madani

on (//2012)

Theoretical and practical aspects of the nonparametric estimation
of the conditional density for functional data

JURY
Jacques Demongeot, Univ. Joseph Fourier, Grenoble (President)
Pascal Sarda, Université Le Mirail, Toulouse (Reviewer)
Elias Ould-Saïd, Univ. du Littoral Côte d'Opale, France (Reviewer)
Mustapha Rachdi, Univ. P. Mendès France, Grenoble (Thesis supervisor)
Ali Laksaci, Univ. D. Liabès, Sidi Bel Abbès, Algeria (Examiner)
Idir Ouassou, ENSA, Marrakech, Morocco (Examiner)
Sophie Lambert-Lacroix, Univ. P. Mendès France, Grenoble (Examiner)

Thesis prepared at the AGe Imagerie et Modélisation (AGIM) laboratory, within the École Doctorale Mathématiques, Sciences et Technologies de l'Information, Informatique.

Table of contents

General introduction
0.1 Description and contribution of this thesis
0.2 Bibliographic context

1 Introduction to functional data and to the estimation of the conditional density
1.1 Functional data
1.2 Functional data vs semi-metrics
1.2.1 Small-ball probabilities
1.2.2 Fields of application of functional data
1.3 Some results on nonparametric estimation for functional models
1.3.1 Notation and assumptions
1.3.2 Estimation of the conditional distribution
1.3.3 Kernel estimator of the conditional density
1.3.4 Estimation of the conditional mode

2 Kernel conditional density estimation when the regressor is valued in a semi-metric space
2.1 Introduction
2.2 Global and local bandwidth selection rules
2.3 Main results
2.3.1 Assumptions
2.3.2 Some interpretations and examples on our hypotheses
2.3.3 Two theorems on global and local criteria
2.4 Discussion and applications
2.4.1 On the applicability of the method
2.4.2 On the finite-sample performance of the method
2.4.3 A real data application
2.5 Proofs
2.6 Appendix: Proofs of technical lemmas
Bibliography

3 Functional data: Local linear estimation of the conditional density and its application
3.1 Introduction
3.2 Model
3.3 Pointwise almost complete convergence
3.4 Uniform almost complete convergence
3.5 Application: Conditional mode estimation
3.6 Appendix
Bibliography

4 A fast functional locally modeled of the conditional density and mode in functional time series
4.1 Introduction
4.2 Main results
4.3 Concluding remarks
4.4 Appendix
Bibliography

5 On the quadratic error of the functional local linear estimate of the conditional density
5.1 Introduction
5.2 The model
5.3 Main results
5.4 Some comments and discussion
5.5 Proofs
Bibliography

6 Local linear estimation of conditional parameters for functional data: Application to simulated and real data
6.1 Illustration of the conditional mode
6.2 Illustration of the conditional density
6.3 Application to real data

7 Conclusion and perspectives
7.1 Conclusion
7.2 Perspectives

8 General bibliography

Abstract

In this thesis, we are interested in the nonparametric estimation of the conditional density of a real response variable given a functional explanatory variable of possibly infinite dimension.
First, we consider the estimation of this model by the double kernel method. We propose a selection method for choosing the smoothing parameters (global or local), and we show its asymptotic optimality when the observations are independent and identically distributed. The adopted criterion is based on the cross-validation principle. In this part, we also compare the two types of choice (local and global).
In the second part, we estimate the conditional density by the local polynomial method. Under certain conditions, we establish asymptotic properties of this estimator, such as almost complete convergence and mean-square convergence, when the observations are independent and identically distributed. We also treat the case of α-mixing observations, for which we prove the almost complete convergence (with rate) of the proposed estimator. The results obtained are also illustrated by examples on simulated data, showing that this estimation method can be applied quickly and easily in the functional setting.

Summary

In this thesis, we consider the problem of the nonparametric estimation of the conditional
density when the response variable is real and the regressor is valued in a functional space.
In the first part, we use the double kernel method as an estimation method, focusing on the choice of the smoothing parameters. We construct a data-driven method to select
the bandwidth parameters optimally. As main results, we study the asymptotic optimality
of this selection method when the observations are independent and identically
distributed. Our selection rule is based on the classical cross-validation procedure and
deals with both the global and the local choice. The finite-sample performance of our approach
is illustrated by some simulation results, in which we compare the two types
of choice (local or global).
In the second part, we estimate the conditional density by the local linear method.
Under some general conditions, we establish the almost complete convergence (with rate) of the proposed
estimator in both the i.i.d. case and the α-mixing case. As an application,
we use the conditional density estimator to estimate the conditional mode, and
we derive the same asymptotic properties.
Further, we study the quadratic error of this estimator by giving the asymptotic expansion of the exact expressions involved in the leading bias and variance terms.

List of works

Publications in peer-reviewed journals

1. J. Demongeot, A. Laksaci, F. Madani and M. Rachdi. Local linear estimation of the
conditional density for functional data. C. R. Math. Acad. Sci. Paris, 348, Issues
15-16, pages 931-934 (2010).
2. J. Demongeot, A. Laksaci, F. Madani and M. Rachdi. Functional data: local linear estimation of the density and its application. Statistics, DOI: 10.1080/02331888.2011.568117
(to appear in 2012).
3. J. Demongeot, A. Laksaci, F. Madani and M. Rachdi (2011). A fast functional locally
modeled conditional density and mode for functional time-series. Recent Advances in
Functional Data Analysis and Related Topics, Contributions to Statistics, Physica-Verlag/Springer, 2011, 85-90, DOI: 10.1007/978-3-7908-2736-1_13
4. A. Laksaci, F. Madani and M. Rachdi. Kernel conditional density estimation when the
regressor is valued in a semi-metric space. Accepted for publication in: Communications in Statistics - Theory and Methods, 2012.

Conference communications

1. Local bandwidth selection for kernel conditional density estimation when the regressor
is valued in a semi-metric space. Colloque international de Statistique des processus
et Applications, CISPA 2008, Constantine, 18-19 October 2008.
2. Local bandwidth selection for kernel conditional density estimation when the regressor
is valued in a semi-metric space. Journées de Statistique, Modélisation et Application,
JSMA'08, Algiers, 22-24 November 2008.
3. Some asymptotics for conditional parameters when the data are curves. International
Conference on Statistics, Theory and Practice, Sidi Bel-Abbès, 10-12 April 2010.

General introduction

0.1

Description and contribution of this thesis

Nonparametric statistics is growing rapidly, attracting many authors and spreading across
various fields. Indeed, it has a very broad range of applications, which makes it possible
to explain phenomena that have so far been poorly modeled, such as time series, and to
predict future realizations.
It should also be mentioned that progress in data collection procedures has increasingly given statisticians access
to observations of so-called functional variables, that is, curves. These data are
modeled as realizations of a random variable taking its values in
an abstract space of possibly infinite dimension. In this thesis, we are interested in the nonparametric estimation of the conditional density, and of the parameters derived
from it such as the conditional mode, for functional random variables.
To present the work carried out during this thesis, the manuscript is organized as follows.
The remainder of this introductory chapter presents a bibliographic study of the
problems related to the statistical analysis of functional variables and to the nonparametric
estimation of conditional parameters, in both finite and
infinite dimension. Then, in Chapter 1, we review the state of the art on functional variables
and their fields of application. Moreover, to make this thesis easier to read, we
present the results available in the literature on the estimation of the conditional density and
of the conditional mode, while stating and discussing the assumptions under which these results were obtained.
In Chapter 2, we begin by constructing the kernel estimator of the conditional density when the explanatory variable is valued in a semi-metric space, and by studying its asymptotic
properties. Next, we propose two criteria (the first global and
the second local) for the automatic choice of the smoothing parameter, in order to make our
estimation efficient. Finally, we establish theoretical and practical results on the asymptotic optimality
of the selected parameter.
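To make the Chapter 2 estimator concrete, here is a minimal sketch in Python (NumPy). It implements the double-kernel estimator
$\hat f^x(y) = \sum_i K(h_K^{-1} d(x,X_i))\, h_H^{-1} H(h_H^{-1}(y-Y_i)) \big/ \sum_i K(h_K^{-1} d(x,X_i))$,
with a global $h_K$ chosen by leave-one-out least-squares cross-validation. It is an illustration under our own assumptions, not the code used in this thesis:

- curves are discretized on a common grid;
- $d$ is the $L^2$ distance computed by the trapezoidal rule;
- $K$ is a quadratic kernel on $[0,1]$ and $H$ is the Epanechnikov kernel;
- $h_H$ is held fixed;
- the cross-validation score omits any weight function the exact criterion may contain.

All function names and the simulated data are hypothetical.

```python
import numpy as np

def trapezoid(values, grid):
    """Trapezoidal rule along the last axis of `values`."""
    return np.sum(0.5 * (values[..., 1:] + values[..., :-1]) * np.diff(grid), axis=-1)

def l2_semimetric(curves, x, grid):
    """L2 distance between each discretized curve (rows of `curves`) and `x`."""
    return np.sqrt(trapezoid((curves - x) ** 2, grid))

def kernel_K(u):
    # Quadratic kernel supported on [0, 1], applied to non-negative distances.
    return np.where((u >= 0) & (u <= 1), 1.5 * (1 - u**2), 0.0)

def kernel_H(u):
    # Epanechnikov kernel supported on [-1, 1], applied to the response.
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def cond_density(x, y, curves, responses, grid, h_K, h_H):
    """Double-kernel estimate of f^x(y) at every value of the array-like `y`."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    w = kernel_K(l2_semimetric(curves, x, grid) / h_K)
    if w.sum() == 0:
        return np.zeros_like(y)  # no curve in the ball B(x, h_K): return 0 by convention
    H = kernel_H((y[:, None] - responses[None, :]) / h_H) / h_H
    return H @ w / w.sum()

def cv_score(h_K, h_H, curves, responses, grid, y_grid):
    """Leave-one-out least-squares cross-validation score (ISE up to a constant)."""
    n = len(responses)
    score = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        f_grid = cond_density(curves[i], y_grid, curves[keep], responses[keep], grid, h_K, h_H)
        f_obs = cond_density(curves[i], responses[i], curves[keep], responses[keep], grid, h_K, h_H)[0]
        score += trapezoid(f_grid**2, y_grid) - 2.0 * f_obs
    return score / n

def select_global_bandwidth(candidates, h_H, curves, responses, grid, y_grid):
    """Global choice: the candidate h_K minimizing the cross-validation score."""
    scores = [cv_score(h, h_H, curves, responses, grid, y_grid) for h in candidates]
    return candidates[int(np.argmin(scores))]

# Hypothetical simulated data: X_i(t) = a_i sin(2 pi t), Y_i = a_i + noise.
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 50)
a = rng.uniform(0, 1, 100)
curves = a[:, None] * np.sin(2 * np.pi * grid)[None, :]
responses = a + 0.1 * rng.standard_normal(100)
y_grid = np.linspace(-0.5, 1.5, 201)
h_K_cv = select_global_bandwidth([0.05, 0.1, 0.2, 0.4], 0.1, curves, responses, grid, y_grid)
```

A local criterion would repeat the same minimization separately for each point $x$ of interest instead of averaging over the sample.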
A natural continuation of this chapter is to improve the results obtained. This is why Chapter 3 is devoted to the study of a nonparametric estimation method for
the conditional density of a scalar variable Y given a functional variable X, i.e.,
a variable valued in a semi-metric space. This method is based on local polynomial estimation. Once our estimator has been constructed, in the same way as
in finite dimension, we establish, under certain
conditions, its pointwise and uniform almost complete convergence, together with its rates
of convergence. We then use these results to
determine the asymptotic properties of the local linear estimator of the conditional mode.
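One common way to write the functional local linear estimator of Chapter 3 is as follows. $\hat f^x(y)$ is the intercept $a$ minimizing
$\sum_i \big(h_H^{-1}H(h_H^{-1}(y-Y_i)) - a - b\,\beta(X_i,x)\big)^2 K(h_K^{-1}\delta(x,X_i))$.
The conditional mode is then estimated as the maximizer of $\hat f^x$. Solving this weighted least-squares problem in closed form gives the weights $W_j = w_j(S_2 - \beta_j S_1)$.

The sketch below is hypothetical and rests on assumptions of our own:

- $\delta$ is the $L^2$ distance;
- the locating function is $\beta(X_i,x)=\int (X_i(t)-x(t))\,dt$;
- the kernels are the same quadratic and Epanechnikov choices as above;
- the maximization runs over a grid of $y$ values.

```python
import numpy as np

def trapezoid(values, grid):
    """Trapezoidal rule along the last axis of `values`."""
    return np.sum(0.5 * (values[..., 1:] + values[..., :-1]) * np.diff(grid), axis=-1)

def kernel_K(u):
    # Quadratic kernel supported on [0, 1].
    return np.where((u >= 0) & (u <= 1), 1.5 * (1 - u**2), 0.0)

def kernel_H(u):
    # Epanechnikov kernel supported on [-1, 1].
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def local_linear_cond_density(x, y, curves, responses, grid, h_K, h_H):
    """Intercept a of the weighted least-squares fit of H((y - Y_i)/h_H)/h_H on
    beta(X_i, x), with weights K(delta(x, X_i)/h_K)."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    delta = np.sqrt(trapezoid((curves - x) ** 2, grid))  # assumed semi-metric (L2)
    beta = trapezoid(curves - x, grid)                    # assumed locating function
    w = kernel_K(delta / h_K)
    s1, s2 = np.sum(w * beta), np.sum(w * beta**2)
    W = w * (s2 - beta * s1)       # local linear weights; some may be negative
    denom = W.sum()                # equals S0*S2 - S1^2 >= 0
    if denom <= 0:
        raise ValueError("degenerate local design: increase h_K")
    H = kernel_H((y[:, None] - responses[None, :]) / h_H) / h_H
    return H @ W / denom

def conditional_mode(x, y_grid, curves, responses, grid, h_K, h_H):
    """Conditional mode: maximizer of the estimated density over `y_grid`."""
    f = local_linear_cond_density(x, y_grid, curves, responses, grid, h_K, h_H)
    return y_grid[int(np.argmax(f))]

# Hypothetical data: X_i(t) = a_i (1 + t), Y_i = 2 a_i + noise, so Y | X = x0 peaks at 1.
rng = np.random.default_rng(1)
grid = np.linspace(0, 1, 60)
a = rng.uniform(-1, 1, 1000)
curves = a[:, None] * (1 + grid)[None, :]
responses = 2 * a + 0.1 * rng.standard_normal(1000)
x0 = 0.5 * (1 + grid)
y_grid = np.linspace(-3, 3, 601)
mode = conditional_mode(x0, y_grid, curves, responses, grid, h_K=0.15, h_H=0.2)
```

Because some local linear weights can be negative, the estimate may dip slightly below zero in the tails. It still integrates to one over a wide enough $y$ grid.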
Chapter 4, for its part, studies the strong convergence of the estimator of the previous chapter under a weak
dependence condition (strong mixing), as well as
the prediction of a time series through the estimation of the conditional mode.
In Chapter 5, we establish the mean-square rates of convergence of the estimator studied in the two previous chapters. Chapter 6
is devoted to applying these results to simulated data
and then to real data.
Finally, in Chapter 7, we present research perspectives for extending,
and sometimes generalizing, the results of this thesis.

0.2

Bibliographic context

The statistical analysis of functional variables has grown considerably in recent
years. This research area currently enjoys great
success in the statistical community. Evidence of this interest is the large number
of scientific publications on the subject, as well as the many practical
applications to which such data lend themselves. This is notably the case for estimation techniques for functional data (cf. Kneip and Gasser
(1992), Ramsay and Li (1996), Rice and Silverman (1991)). There are in fact two main
reasons for the enthusiasm for the statistical treatment of functional variables: (1)
it allows powerful theoretical tools to be used and developed, and (2) it offers
enormous potential for applications, notably in imaging, the agri-food industry,
pattern recognition, geophysics, econometrics, environmental science, etc. Moreover,
this research topic covers every area of concern to the statistical community, from the most applied to the most theoretical, with neither predominating over the other.
First, let us point out the considerable efforts made to generalize the
results known and established in finite dimension, notably through the book of Ferraty and Vieu (2006),
which has become a reference in nonparametric statistics for functional data. Note that the statistical analysis of data always brings the
dimension into the asymptotic behavior of the estimators, all the more so since
convergence rates are known to deteriorate as the dimension
increases. Recall here that methods based on discretizing functional data were first adopted so that the results of nonparametric statistics for
multivariate data could be applied to them.
Given the progress of computing in the way data are collected, other
alternatives have become necessary to overcome this difficulty and to study the data in their own dimension.
In fact, treating data as curves dates back to the 1960s,
when studies in several disciplines were confronted with observations
in the form of trajectories (see, among others, Holmström (1961) in climatology, Deville (1974)
in demography, Molenaar and Boomsma (1987) and then Kirkpatrick (1989) in genetics, ...).
It is well known that, in statistics, the regression model (parametric or nonparametric)
in finite dimension is a very important field of research and application. We
refer here to the works of Collomb (1981, 1985), which, as early as the beginning of the 1980s,
already reported many varied developments on this topic. One should also
refer to the books of Härdle (1990), Bosq and Lecoutre (1987) and Schimek (2000), which
give an almost exhaustive survey of the various techniques in this area. These fields
of statistical research remain promising, both for
theoretical developments and for their many possible applications.
Furthermore, applications of the regression model play a very important role
in forecasting time series from various disciplines such as communications, control systems, climatology and econometrics. These are
forecasting areas for which the first substantial results were obtained
by Collomb (1981) and Robinson (1983). This area of statistics is still developing, as evidenced by numerous works (cf. Györfi et al. (1989),
Yoshihara (1994), Härdle et al. (1997) and Bosq (1991), ...).
Let us first note that the estimation of the probability distribution or of the distribution
function plays an important role in the estimation of other functional parameters. The
first works on the estimation of the probability distribution of functional variables
were carried out by Geffroy (1974) and Gasser et al. (1998). Note also that Cadre (2001)
studied the median of the distribution of a functional variable valued
in a Banach space.


Conditional parameters, such as the conditional distribution, the conditional density, the conditional mode, the conditional quantile and the conditional hazard
function, have been widely studied in finite dimension. Through these parameters, prediction in nonparametric models offers a genuine alternative
to nonparametric regression. Indeed, in finite dimension, there is an abundant literature
on these conditional parameters. Roussas (1968) was the first to establish
asymptotic properties of the kernel estimator of the conditional distribution for
Markovian data, for which he proved convergence in probability. Youndjé
(1993), for his part, studied the conditional density for dependent
or independent data. One can also cite the work of Laksaci and
Yousfate (2002), in which they established, for a stationary Markov process, the
convergence in $L^p$ norm of the kernel estimator of the conditional density.
Given the importance of estimating the mode and the conditional mode for
prediction, several authors have studied them. For example, Parzen
(1962) was one of the first to consider the problem of estimating the mode of a
univariate probability density. He showed that, under certain conditions, the mode
estimator obtained by maximizing a kernel estimator is consistent and asymptotically
normal when the data are independent and identically distributed (i.i.d.). The basic techniques he developed for this study were taken up by many authors,
for the probability density as well as for regression. We mention here only
the main contributions, with asymptotic normality essentially in view.
Nadaraya (1965) and Van Ryzin (1969) proved the strong consistency of
Parzen's mode estimator, while Samanta (1973) and Konakov (1974)
studied multivariate versions of this estimator. The works of Eddy (1980 and 1982)
weakened the sufficient conditions for asymptotic normality that had
initially been given. Moreover, using local conditions, Romano (1980)
weakened the previous assumptions. Note also that Vieu (1996) compared two kernel estimators of the mode. The first is defined as the maximum of an estimator of the
probability density, and the second as the zero of an estimator of its derivative.
This work was taken up by Rachdi and Sabre (2000) to estimate the mode of the
probability density when the data are contaminated by additive errors (deconvolution problems). There is also, among others, Louani (1998), who established asymptotic normality for
the density and its derivatives, with an application to the mode.
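The two mode estimators compared by Vieu (1996) can be illustrated with a small sketch. The first is the maximizer of a kernel density estimate (Parzen's approach). The second is the zero of the estimated derivative.

This is a minimal Python version under assumptions of our own: a Gaussian kernel, a fixed bandwidth, and a search over a grid, with linear interpolation to locate the sign change of the derivative. Function names and data are hypothetical.

```python
import numpy as np

SQRT_2PI = np.sqrt(2 * np.pi)

def kde(points, data, h):
    """Gaussian kernel density estimate evaluated at `points`."""
    u = (points[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * SQRT_2PI)

def kde_derivative(points, data, h):
    """Derivative of the Gaussian kernel density estimate."""
    u = (points[:, None] - data[None, :]) / h
    return (-u * np.exp(-0.5 * u**2)).sum(axis=1) / (len(data) * h**2 * SQRT_2PI)

def mode_by_maximum(data, h, grid):
    """Parzen-type estimator: maximizer of the density estimate over a grid."""
    return grid[int(np.argmax(kde(grid, data, h)))]

def mode_by_derivative_zero(data, h, grid):
    """Zero of the estimated derivative (sign change from + to -), refined by linear
    interpolation; among several zeros, keep the one with the highest density."""
    d = kde_derivative(grid, data, h)
    idx = np.where((d[:-1] > 0) & (d[1:] <= 0))[0]
    if idx.size == 0:
        return None
    roots = grid[idx] + d[idx] * (grid[idx + 1] - grid[idx]) / (d[idx] - d[idx + 1])
    return roots[int(np.argmax(kde(roots, data, h)))]

# Hypothetical i.i.d. sample from N(2, 1), whose true mode is 2.
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.0, size=1000)
grid = np.linspace(-2, 6, 2001)
m_max = mode_by_maximum(sample, 0.4, grid)
m_zero = mode_by_derivative_zero(sample, 0.4, grid)
```

For a smooth unimodal estimate, both definitions point to the same location. They differ mainly in how the search is carried out and in how their asymptotic behavior is analyzed.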
Concerning the conditional mode, consistency and asymptotic normality were
established by Samanta and Thavaneswaran (1990) for independent and identically
distributed data. Convergence conditions were established for mixing data by Collomb et al. (1987) and, under another mixing condition, by Ould-Saïd (1993). For ergodic data, they were established by
Rosa (1993) and Ould-Saïd (1997). Quintela and Vieu (1997) estimated the conditional
mode as the point at which the first derivative of the conditional density estimator vanishes.
They established the almost complete convergence of this estimator under
the α-mixing condition. Berlinet et al. (1998) presented results on the asymptotic normality of consistent estimators of the conditional mode,
independently of the dependence structure of the data, with an application to
a stationary mixing process. Louani and Ould-Saïd (1999) established asymptotic normality for strongly mixing data and for
censored data. Ould-Saïd and Cai (2005), for their part, established uniform
convergence over a compact set.
Moreover, for data valued in a space of possibly infinite dimension,
the works of Ramsay and Silverman (2002 and 2005) constitute an important collection of
statistical methods, mainly from a practical point of view. Theoretical developments
can be found in Bosq (2000) and Ferraty and Vieu (2006).
An important contribution to the construction of the parameter estimator
in the linear regression model is due to Cardot et al. (1999). It consists
in constructing an estimator of the regression operator from the spectral
properties of the empirical estimator of the covariance operator of the functional explanatory
variable. They also established the convergence in probability and the almost sure convergence of
the resulting estimator. This work was revisited by Cuevas et al. (2002), who
studied the asymptotic properties of the estimator of the linear regression operator
when the explanatory variable is functional and deterministic and the response is functional
and random. Cardot et al. (2004a, 2004b and 2005) proposed and studied
linear estimation methods for the regression operator through conditional quantiles.
Another method for estimating conditional quantiles, based on the kernel estimate of
the conditional distribution function, was also proposed and studied by Ferraty et
al. (2005), Ferraty et al. (2006), Ferraty and Vieu (2006a) and Ezzahrioui (2007). Other methods have been proposed to estimate the regression through the conditional mode. These are
based on estimating the conditional density with kernel estimators (cf. Ferraty
et al. (2005), Ferraty and Vieu (2006a), Ferraty et al. (2006), Dabo-Niang and Laksaci (2006)
and Ezzahrioui (2007)).
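The spectral construction attributed above to Cardot et al. (1999) builds the slope function $\theta$ of $Y = \langle \theta, X\rangle + \varepsilon$ from the leading eigenelements $(\lambda_j, v_j)$ of the empirical covariance operator:
$\hat\theta = \sum_{j\le k} \lambda_j^{-1}\langle \Delta_n, v_j\rangle v_j$,
where $\Delta_n$ is the empirical cross-covariance.

The Python sketch below shows only this idea. It rests on assumptions of our own: curves on a common equispaced grid, integrals approximated by a rectangle rule, and a fixed number $k$ of components. It does not reproduce the authors' regularization or truncation choices, and the function names and simulated data are hypothetical.

```python
import numpy as np

def fpca_linear_regression(curves, responses, grid, n_components):
    """Spectral (functional PCA) estimate of the slope function theta."""
    dt = grid[1] - grid[0]
    x_mean, y_mean = curves.mean(axis=0), responses.mean()
    Xc, yc = curves - x_mean, responses - y_mean
    n = len(responses)
    cov = (Xc.T @ Xc / n) * dt                 # discretized empirical covariance operator
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    lambdas = eigvals[order]
    v = eigvecs[:, order] / np.sqrt(dt)        # eigenfunctions with unit L2 norm
    cross = Xc.T @ yc / n                      # empirical cross-covariance Delta_n(t)
    coefs = (v.T @ cross) * dt / lambdas       # <Delta_n, v_j> / lambda_j
    return v @ coefs, x_mean, y_mean

def predict(theta, x_mean, y_mean, new_curves, grid):
    """Predicted responses y_mean + <theta, X - x_mean>."""
    dt = grid[1] - grid[0]
    return y_mean + (new_curves - x_mean) @ theta * dt

# Hypothetical data: curves built from three orthonormal sine functions.
rng = np.random.default_rng(2)
grid = np.linspace(0, 1, 101)
basis = np.array([np.sqrt(2) * np.sin(k * np.pi * grid) for k in (1, 2, 3)])
xi = rng.standard_normal((300, 3)) / np.array([1.0, 2.0, 3.0])
curves = xi @ basis
true_theta = basis[0] + 0.5 * basis[1]
responses = curves @ true_theta * (grid[1] - grid[0]) + 0.05 * rng.standard_normal(300)
theta_hat, x_mean, y_mean = fpca_linear_regression(curves, responses, grid, 3)
```

The truncation level $k$ plays the role of the smoothing parameter here. Dividing by small eigenvalues $\lambda_j$ is what makes the choice of $k$ delicate.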
Thus, the estimation of the conditional density in possibly infinite dimension has attracted
great interest in statistics. This functional parameter is involved in the estimation of
quantiles, of the mode and of the hazard function.
Note that, in infinite dimension, the conditional mode has very recently attracted growing interest,
despite the few results available in the literature. In this context, the
first works were carried out by Ferraty et al. (2006). Under regularity conditions
on the conditional density, they showed the almost complete convergence of the kernel
estimators of the conditional density and of the conditional mode, and established their rates of
convergence. Note also that they presented an application of their results to data from the agri-food industry. In the same context, Dabo-Niang et al. (2004)
studied a nonparametric estimator of the mode of the density of an explanatory variable
valued in a semi-normed vector space of possibly infinite dimension. They
established almost sure convergence and applied this result to the case where the probability measure
of the explanatory variable satisfies a concentration condition. Dabo-Niang and Laksaci (2007) studied a kernel estimator of the mode of the
distribution of a real variable Y conditioned by an explanatory variable X valued
in a semi-metric space. They established the convergence in $L^p$ norm of the estimator, and
showed that the asymptotic results obtained depend on the small-ball
probabilities of the distribution of the explanatory variable as well as on the regularity of the conditional density.
Note also that two other functional parameters are of great importance, namely the quantile and the conditional quantile. These parameters offer a major
alternative for prediction, thanks to their robustness (see, for example, the
works of Cardot et al. (2004a, 2004b, 2005 and 2006), Ferraty et al. (2005b) and (2006)).
To conclude this brief and non-exhaustive overview, let us stress that, from a
theoretical point of view, the use of functional random variables introduces an additional
difficulty. One can no longer manipulate the probability density
function as easily as in the real or vector case. One is
therefore led to adopt a probabilistic formulation, which leads to assumptions acting
directly on the distribution of the functional random variable rather than on its density,
as in the finite-dimensional case.

Chapter 1
Introduction to functional data and to the estimation of the conditional density

In this chapter, we first present some notions of functional data analysis and its fields of application, and then the existing results in the literature on
the estimation of the conditional density.

1.1

Functional data

In recent years, the branch of statistics devoted to functional data analysis has grown considerably, both in its theoretical
and methodological developments and in the diversity of its fields of application. This is due to the
progress of computing, in particular of storage capacity, which makes it possible
to record ever larger data sets. Thus, a very large number of variables can be observed for the study of a single phenomenon.
Once the existence of functional variables has been established, we turn to their
modeling. To this end, we give some definitions to
fix the vocabulary. Recall first that a functional random variable is
simply a random variable valued in a space of possibly infinite dimension, which we denote by $\mathcal{F}$. For example, this space $\mathcal{F}$ may be a space of functions
or of linear operators. Following the terminology in use in the literature, one speaks
of functional random variables as well as of functional data, which notably covers everything concerning the statistical analysis of curves.

Definition 1.1.1. We call a functional model any model that involves at least
one functional random variable (f.r.v.).

Definition 1.1.2. A functional model is said to be parametric if the set of functions it allows, which is only a subset of $\mathbb{F}^{\mathcal{F}}$ (the set of functions defined on the functional space $\mathcal{F}$ and valued in the space $\mathbb{F}$), can be indexed by a finite number of parameters belonging to $\mathcal{F}$.

A functional model is said to be nonparametric otherwise.

Many works have been devoted to the study of models involving multivariate random variables. This area of statistics still enjoys sustained research
activity. However, recent innovations in measuring devices and acquisition methods, together with advanced computing resources, often make it possible
to collect data discretized on ever finer grids, which
makes them fundamentally functional. This is the case, for example, in meteorology, medicine, satellite imaging and many other fields. This is one of the
reasons why a new field of statistics devoted to the study of functional data emerged as a major challenge in the early 1980s, driven by the
works of Grenander (1981), Dauxois et al. (1982) and Ramsay (1982). This field
was popularized by Ramsay and Silverman (1997), and then by the books of Bosq
(2000), Ramsay and Silverman (2002, 2005) and Ferraty and Vieu (2006). It is one of the
fastest-growing areas of statistics, as evidenced by the works published
and/or cited in leading journals (…, etc.).
Moreover, even if the data available to the statistician are not functional in nature,
he or she may be led to study functional variables constructed from the
initial sample. A classic example is when several samples of
independent real data are observed and one then needs to compare the densities of these
samples, or to consider models in which they appear (cf. Ramsay and
Silverman, 2002). In the particular context of time series, the approach
introduced by Bosq (1991) produces a sequence of dependent functional data
that model the observed time series. This approach consists in first considering the process not through its discretized form but as a
continuous-time process, and then cutting it into a sample of successive curves.
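In practice, the segmentation step of this approach reduces to reshaping a regularly sampled path into consecutive pieces. Each piece is treated as one curve, for example one day of hourly measurements.

The Python sketch below assumes a fixed sampling rate and equal-length curves, and discards incomplete trailing data. The forecasting pairs it builds, where the response is a value of the next curve, are one hypothetical choice among many. The function names and the toy series are illustrative only.

```python
import numpy as np

def cut_into_curves(series, points_per_curve):
    """Cut a regularly sampled path into consecutive curves of equal length
    (e.g. one curve per day). Trailing points that do not fill a curve are dropped."""
    series = np.asarray(series, dtype=float)
    n_curves = len(series) // points_per_curve
    return series[: n_curves * points_per_curve].reshape(n_curves, points_per_curve)

def prediction_pairs(curves, position=0):
    """Hypothetical forecasting setup: X_i is curve i and Y_i is the value observed at
    index `position` of curve i + 1, so (X_i, Y_i) is a functional regression pair."""
    return curves[:-1], curves[1:, position]

hours = np.arange(24 * 30 + 5)                     # 30 full days plus 5 extra hours
series = np.sin(2 * np.pi * hours / 24) + 0.01 * hours
curves = cut_into_curves(series, 24)
X, Y = prediction_pairs(curves, position=0)
```

The resulting pairs $(X_i, Y_i)$ are dependent, which is why results under mixing conditions are needed for this kind of data.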


Note that the main source of difficulty, from both a theoretical and a practical
point of view, comes from the fact that observations of this type of variable are assumed to
belong to an infinite-dimensional space.
The very first works in which the idea of considering functional data appears
are relatively old. Rao (1958) and Tucker (1958) considered principal component analysis
and factor analysis for functional data, explicitly treating functional data as a particular type of data.
Later, Ramsay (1982) brought out the notion of functional data and raised the question
of adapting the methods of multivariate (finite-dimensional) statistical
analysis to the functional setting.
From then on, works on the statistics of functional data began
to multiply, eventually producing books that have become references
on the subject. For example, the monographs of Ramsay and Silverman (2002 and 2005) and Ferraty
and Vieu (2006) present a large collection of statistical methods specific
to functional variables in both linear and nonlinear settings. Similarly, Bosq (1991)
contributed to the development of statistical methods for the analysis of dependent functional
random variables (Hilbertian autoregressive processes). Let us also cite the
work of Cuevas et al. (2002), who studied the linear regression
of a functional variable on a set of deterministic functional data (fixed
functional design). Furthermore, Benhenni et al. (2010) considered the problem of estimating the regression operator when the functional data are deterministic and the
errors are correlated. Cardot et al. (2005), for their part, proposed a nonparametric
estimator of the regression operator when the predictor is real and the response
variable is a curve.
The study of the nonlinear regression model is much more recent than that of the linear case. Ferraty and Vieu (2000) established the first results on the nonparametric estimation of the nonlinear regression operator. Ferraty et al. (2002) then extended these results to dependent data and established strong convergence of the kernel estimator of the regression.
In turn, Niang and Rhomari (2003) studied the $L^p$-norm convergence of the estimator of the regression operator and applied their results to curve discrimination and classification. Rachdi et al. (2008) treated the nonparametric estimation of the regression operator when the errors have long-memory properties. They also established the pointwise and then uniform convergence in probability of the kernel estimator of the operator. Rachdi and Vieu (2005, 2007) constructed an automatic and optimal criterion for choosing the smoothing parameter of the regression estimator when the regressor is functional. El Methni and Rachdi (2011) studied the local estimation of a weighted mean of the regression operator for deterministic functional data. Ouassou and Rachdi (2010) then improved this estimate by means of a Stein estimator.
Recall that the curse of dimensionality makes convergence rates very slow. One way to remedy this is to look for a topology that reflects the proximities between the data in a relevant way. This can be done, for example, with a projection semi-metric. Such a semi-metric can be based on functional principal components, or on expansions in a Fourier, wavelet or spline basis.

When the explanatory variable takes values in a separable Hilbert space, Ferraty and Vieu (2006a, Lemma 13.6) showed that a projection semi-metric can be defined in general. This semi-metric leads to fractal-type small ball probabilities, i.e. $\exists\, C_x, \tau > 0$ such that $F_x(h) \sim C_x h^{\tau}$ as $h \to 0$. The data are thus condensed by reducing their dimension, and the curse of dimensionality is circumvented: convergence rates become powers of $n$ again.

In other situations, one may face very smooth data, such as the spectrometric curves shown in Figure 1.2. In that case it may be better to use semi-metrics based on derivatives (cf. Ferraty and Vieu, 2006a). These semi-metrics are also useful when the data show an artificial vertical shift, i.e. one that carries no information about the responses. They then remove these vertical shifts, which degrade the quality of prediction. Finally, other phenomena can be handled as well, for instance horizontal shifts (cf. Dabo-Niang et al., 2006).

Given the wide variety of semi-metrics that can be built, one may ask how to choose the semi-metric best suited to the data. This motivates the study of how to construct a semi-norm on $\mathcal{F}$.

1.2 Functional data vs semi-metrics

In general, analysing any type of data requires a notion of distance between the data. In a finite-dimensional vector space all norms are equivalent, but this is no longer true when the observation space is infinite-dimensional. The choice of the metric, and hence of the associated topology, is therefore crucial when studying functional random variables.

Many authors define or study functional variables as square-integrable random variables, that is, with values in $L^2(0,1)$ (cf. in particular Crambes et al., 2007). Others work more generally in a Hilbert space (cf. for example Preda, 2007), a Banach space (cf. Cuevas and Fraiman, 2004) or a metric space (cf. Dabo-Niang and Rhomari, 2003). Bosq (2000) considered samples of dependent functional variables with values in a Hilbert or Banach space. These functional observations were obtained by cutting a single continuous-time process into pieces. Moreover, among the semi-metrics available in the literature, it is often preferable to choose those that allow a wider range of topologies. The topology can then be chosen according to the nature of the data and of the problem at hand.
The benefit of using a semi-metric rather than a metric is that it offers a way to deal with the high dimensionality of the data. One can define the semi-metric from a projection of the functional data onto a lower-dimensional space, in one of two ways:
(1) by performing a functional principal component analysis of the data (cf. Dauxois et al. (1982), Besse and Ramsay (1986), Hall and Hosseini-Nasab (2006) and Yao and Lee (2006));
(2) by projecting the data onto a finite basis (wavelets, splines, ...).

This reduces the dimension of the data, and so increases the convergence rate of the methods used, while preserving the functional nature of the data. The projection basis can also be chosen from prior knowledge about the functional data. For example, one might choose the Fourier basis if the observed functional variable is assumed to be periodic. For a more complete discussion of approximating functional data by projection, see Ramsay and Silverman (1997 and 2005) or Rossi et al. (2005). The interest of using different types of semi-metrics is discussed in more depth in the book by Ferraty and Vieu (2006), Sections 3 and 4, and in the work of Benhenni et al. (2007).
For these reasons, we present here a few ways of constructing a semi-metric (cf. Ferraty and Vieu, 2006). We only present two families of semi-metrics below, although many others can of course be built. The first family is well suited to noisy and irregular curves. The second is rather used for very smooth (regular) curves.
We start by considering a sample of $n$ independent and identically distributed curves $X_1, \dots, X_n$ from the functional random variable $X = \{X(t),\ t \in [0,1]\}$.
Classical principal component analysis (PCA) is a very useful tool for describing and visualizing data in a lower-dimensional space. It has been extended to functional data and, more recently, used for various statistical purposes. Functional Principal Components Analysis (FPCA) has become a good tool for computing proximities between curves in a space of reduced dimension. Starting from the classical $L^2$ semi-metric, we can therefore build a parametric class of semi-norms, denoted SMPCA (semi-metric based on PCA), as follows:
$$\|x\|_q^{PCA} = \sqrt{\sum_{k=1}^{q} \left( \int x(t)\, v_k(t)\, dt \right)^2} \quad \text{for all } x \in \mathcal{F},$$
where $v_1, \dots, v_q$ are the orthonormal eigenfunctions of the covariance operator
$$\Gamma_X(s,t) = E\big(X(t)X(s)\big)$$
associated with the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_q$.
The integer $q$ is not a smoothing parameter. It is a tuning parameter that sets the resolution level at which the problem is considered.
This yields a family of semi-metrics:
$$d_q^{PCA}(X_i, x) = \sqrt{\sum_{k=1}^{q} \left( \int \big(X_i(t) - x(t)\big)\, v_k(t)\, dt \right)^2} \qquad (1)$$

The integral in formula (1) can be approximated as follows (cf. Castro et al., 1986):
$$\int \big(X_i(t) - x(t)\big)\, v_k(t)\, dt \approx \sum_{j=1}^{J} w_j \big(X_i(t_j) - x(t_j)\big)\, v_k(t_j),$$
where the weights are $w_j = t_j - t_{j-1}$ and the grid $(t_1, \dots, t_J)$ consists of $J$ equispaced points in $[0,1]$.
If two curves $x_i$ and $x_{i'}$ are discretized, the quantity $d_q^{PCA}(x_i, x_{i'})$ is approximated by its empirical version:
$$\widehat{d}_q^{PCA}(x_i, x_{i'}) = \sqrt{\sum_{k=1}^{q} \left( \sum_{j=1}^{J} w_j \big(x_i(t_j) - x_{i'}(t_j)\big)\, v_k(t_j) \right)^2},$$
where $\{x_i = (x_i(t_1), \dots, x_i(t_J))^t\}_{i=1,\dots,n}$ and $\{x_{i'} = (x_{i'}(t_1), \dots, x_{i'}(t_J))^t\}_{i'=1,\dots,n}$.

This family of semi-metrics can only be used if the data are balanced, i.e. if all curves are observed at the same points. This might look like a drawback, but its main advantage is that it can be used even when the curves are irregular. Consider the prediction of the maximal ozone concentration at the North Pole during a day, given the concentration curve of the previous day, over four successive years (from 2000 to 2004; cf. Figure 1.4). For this problem we chose the $L^2_{1,24}$ norm, computed using this kind of semi-metric.
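The empirical PCA semi-metric can be sketched with NumPy under the following assumptions:
- the curves are balanced and observed on a common grid of $[0,1]$;
- the eigenfunctions $v_k$ are estimated from the sample's second-moment matrix, i.e. the uncentred $\Gamma_X(s,t) = E(X(t)X(s))$ of the text;
- the Riemann weights are $w_j = t_j - t_{j-1}$, with the first weight, which the formula leaves undefined, set to the grid step.

The function name `pca_semimetric` is hypothetical, and this is not presented as the implementation used in the thesis.

```python
import numpy as np


def pca_semimetric(curves, t, q):
    """Pairwise semi-distances d_q^PCA between curves observed on a common grid t.

    curves : (n, J) array of discretized curves (balanced design).
    t      : (J,) increasing grid in [0, 1].
    q      : number of retained eigenfunctions.
    """
    curves = np.asarray(curves, dtype=float)
    t = np.asarray(t, dtype=float)
    # Quadrature weights w_j = t_j - t_{j-1}; for j = 1 we reuse the first grid step.
    w = np.diff(t, prepend=t[0] - (t[1] - t[0]))
    sw = np.sqrt(w)
    # Empirical version of Gamma_X(s, t) = E(X(t) X(s)) (uncentred, as in the text).
    gamma = curves.T @ curves / curves.shape[0]
    # Symmetrised discretized operator W^{1/2} Gamma W^{1/2}.
    eigval, eigvec = np.linalg.eigh(sw[:, None] * gamma * sw[None, :])
    top = np.argsort(eigval)[::-1][:q]
    # v_k(t_j), orthonormal for the inner product sum_j w_j f(t_j) g(t_j).
    v = eigvec[:, top] / sw[:, None]
    # Projections sum_j w_j x_i(t_j) v_k(t_j); d_q^PCA is the Euclidean distance between them.
    scores = (curves * w) @ v
    diff = scores[:, None, :] - scores[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


# Hypothetical sample: random combinations of sin and cos on a grid of 101 points.
t = np.linspace(0, 1, 101)
rng = np.random.default_rng(1)
a, b = rng.normal(size=(2, 30, 1))
X = a * np.sin(2 * np.pi * t) + b * np.cos(2 * np.pi * t)
D = pca_semimetric(X, t, q=2)
print(D.shape)  # (30, 30)
```

When $q$ equals the grid size $J$, every direction is kept and the semi-metric reduces to the discretized $L^2$ distance. A smaller $q$ discards the directions of least variance, which is what turns the $L^2$ distance into a genuine semi-metric.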
Another family of semi-metrics is based on derivatives; we denote it by SMD (semi-metric based on derivatives). It is defined as follows:
$$d_q^{SMD}(x_i, x_{i'}) = \sqrt{\int_0^1 \big(x_i^{(q)}(t) - x_{i'}^{(q)}(t)\big)^2\, dt} \qquad (2)$$
for two observed curves $x_i$ and $x_{i'}$, where $x^{(q)}$ denotes the derivative of order $q$ of $x$.


Note also that $d_0^{SMD}(x, 0)$ coincides with the classical $L^2$ norm of $x$. In practice, each curve can be approximated by B-splines (cf. De Boor (1978) or Schumaker (1981)). The successive derivatives are then computed directly by repeatedly differentiating the analytical forms of these approximations. The integral in (2) can then be computed by Gaussian quadrature (cf. Lanczos, 1956). This class of semi-metrics is well suited to smooth curves, such as the spectrometric data of Figure 1.2, and is used for them in practice.
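The sketch below computes (2) for curves sampled on a common grid. It is a simplification under stated assumptions rather than the B-spline/Gauss procedure described above: finite differences replace spline differentiation, and the trapezoidal rule replaces Gaussian quadrature. It also illustrates the point about artificial vertical shifts: with $q \ge 1$, adding a constant to a curve leaves the semi-distance unchanged. The name `deriv_semimetric` is hypothetical.

```python
import numpy as np


def deriv_semimetric(xi, xj, t, q):
    """Discretized d_q^SMD(x_i, x_j) for two curves observed on the same grid t.

    Assumption: derivatives are taken by repeated finite differences
    (np.gradient) rather than by differentiating B-spline fits, and the
    integral over [0, 1] uses the trapezoidal rule instead of Gaussian quadrature.
    """
    t = np.asarray(t, dtype=float)
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    for _ in range(q):
        diff = np.gradient(diff, t)  # after the loop: q-th derivative of x_i - x_j
    sq = diff ** 2
    integral = np.sum((sq[1:] + sq[:-1]) / 2 * np.diff(t))
    return float(np.sqrt(integral))


t = np.linspace(0, 1, 1001)
x = np.sin(2 * np.pi * t)
print(deriv_semimetric(x, x + 3.0, t, q=0))  # about 3: the vertical shift counts in L2
print(deriv_semimetric(x, x + 3.0, t, q=1))  # about 0: differentiation removes it
```

Each finite-difference step amplifies measurement noise. This is one reason to reserve this family for smooth curves, as recommended in the text.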
At this stage, we believe that the dataset itself should guide the choice of the semi-metric to be used.
In conclusion, each of the two families discussed above suits a certain kind of data: SMPCA is intended for irregular data, whereas SMD suits smooth data.
The choice of the semi-metric therefore makes it possible both to handle a wider variety of situations and to circumvent the curse of dimensionality. This choice should not be made lightly. It must take into account not only the nature of the data but also the nature of the problem under study.

1.2.1 Small ball probabilities


The curse of dimensionality is a well-known phenomenon in nonparametric multivariate regression models. It makes the convergence rates of nonparametric estimators deteriorate quickly as the dimension grows. For a $p$-times differentiable regression function of $d$ variables, the optimal rate is $n^{-p/(2p+d)}$ (cf. Stone, 1982). It is therefore legitimate to expect nonparametric methods for models with functional variables to converge very slowly.

When the explanatory variable is multivariate, i.e. takes values in a space $(\mathcal{F}, d)$ of finite dimension $d$, the convergence rates of the kernel estimator involve a term of the form $h_n^d$. This term comes from the probability that the explanatory variable belongs to the ball of centre $x$ and radius $h_n$. For a functional explanatory variable, the asymptotic results are expressed through more general quantities called small ball probabilities, defined by:
$$F_x(h_n) := P\big(d(X, x) \le h_n\big), \quad \text{where } h_n \to 0.$$
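The Monte Carlo sketch below makes $F_x(h_n)$ concrete. It assumes that $d$ is the $L^2$ distance and that the sample consists of simulated Brownian paths, with centre $x = 0$. The proportion of curves that fall in the ball estimates the small ball probability. For such rough processes this proportion drops quickly as $h$ shrinks. The function name `empirical_small_ball` is hypothetical.

```python
import numpy as np


def empirical_small_ball(curves, x, h, t):
    """Empirical estimate of F_x(h) = P(d(X, x) <= h) from a sample of curves.

    d is taken to be the L2 distance on [0, 1], computed with the trapezoidal rule.
    """
    sq = (np.asarray(curves, dtype=float) - np.asarray(x, dtype=float)) ** 2
    dist = np.sqrt(np.sum((sq[:, 1:] + sq[:, :-1]) / 2 * np.diff(t), axis=1))
    return float(np.mean(dist <= h))


# Illustrative assumption: Brownian motion paths simulated on a grid of [0, 1], centre x = 0.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 201)
increments = rng.normal(scale=np.sqrt(np.diff(t)), size=(5000, 200))
paths = np.hstack([np.zeros((5000, 1)), np.cumsum(increments, axis=1)])
probs = {h: empirical_small_ball(paths, np.zeros_like(t), h, t) for h in (0.8, 0.4, 0.2)}
print(probs)
```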


In the various convergence results for the estimator studied in this thesis (of Nadaraya-Watson and/or local linear type), the convergence rate depends on how fast these small ball probabilities decrease. The probabilistic literature contains a fairly large number of results on how small ball probabilities tend to 0 when $d$ is a norm (cf. for example Li and Shao (2001), Lifshits et al. (2006) and Gao and Li (2007)). Dereich (2003, Chapter 7) is devoted to the behaviour of small ball probabilities with random centres. These works show, for example, that for non-smooth processes such as Brownian motion or the Ornstein-Uhlenbeck process, small ball probabilities are exponentially small in $1/h_n$. Consequently, the convergence rate of our estimators is a power of $\ln(n)$. For a more detailed discussion, see Ferraty et al. (2006), Section 5, and Ferraty and Vieu (2006a), Section 13.3.2.
We now give an overview of the usefulness of functional data analysis in applications.

1.2.2 Fields of application of functional data


For several decades, many statisticians have developed applications for processing functional random variables. On the one hand, this makes it possible to use or develop powerful theoretical tools. On the other hand, it offers enormous potential for applications (imaging, agri-food industry, geology, econometrics, ...). Some concrete examples are given below.
In linguistics: speech recognition is an active research topic. The aim is to transcribe phonetically the words and sentences spoken by an individual. The data are curves corresponding to recordings of phonemes spoken by different individuals. Several works have addressed this problem; see, for example, Hastie et al. (1995), Berlinet et al. (2005) or Ferraty and Vieu (2003).

Study of the El Niño phenomenon: this dataset comes from the study of an important climatic phenomenon commonly called El Niño. El Niño is a large ocean current that occurs exceptionally (on average once or twice per decade) along the Peruvian coast at the end of winter. It causes climatic disturbances on a planetary scale. The dataset consists of monthly sea-surface temperature records taken since 1950 off northern Peru (coordinates 0°-10° South, 80°-90° West), in the area where the El Niño current may appear. These data and their description are available on the website of the US Climate Prediction Center: http://www.cpc.ncep.noaa.gov/data/indices/. The evolution of temperature over time is genuinely a continuous phenomenon. The number of measurements makes it possible to take the functional nature of the data into account (cf. Figure 1.1). From these data, one may want to predict the evolution of the phenomenon from the data collected in previous years.

Figure 1.1: Curves corresponding to the El Niño current
In the food industry: Ferraty and Vieu (2002, 2003) studied spectrometric data arising from a quality-control problem in the food industry. They studied the fat content of pieces of meat given the absorption curves of these pieces (cf. Figure 1.2). These real data were used in the setting where the variables are independent.

Figure 1.2: Spectrometric curves (axes: Index, CURVES[1, ])

Electricity consumption in the USA: for dependent data, one can consider a time series of electricity consumption in the USA by the residential and commercial sectors, from January 1973 to February 2001 (338 months). The aim is to predict the electricity consumption of the following year given the consumption over the whole previous year. The sample consists of 28 curves, as shown in Figure 1.3. The time series can be regarded as a set of dependent functional data, that is, a population of 28 curves, one per year.
Pollution data: the pollution problem gives another example of dependent functional random variables, this time related to environmental phenomena. The task is to study the ozone concentration curve at the North Pole over four successive years (from 2000 to 2004). The objective is to predict the ozone concentration on a given day from the ozone concentration curve of the previous day. Cutting the annual ozone concentration curve into daily pieces gives the curves shown in Figure 1.4. Several authors have studied environmental phenomena, among others Damon and Guillas (2002), Aneiros-Perez et al. (2004), Cardot et al. (2004, 2006) and Meiring (2005).

Many other fields of application involving functional data exist or are emerging. There are far too many examples to give an exhaustive list in this thesis, so the remainder of this section simply takes a quick tour of these fields of application.
In biology: functional data are used to study variations in growth curves (cf. Rao, 1958 and Figure 1.5). More recently, they have been used to study variations in knee angle during walking (cf. Ramsay and Silverman, 2002). An enormous amount of functional data is being produced and only awaits a suitable methodology to be processed. Mass spectrometry data are a notable example (cf. Figure 1.6 for cancer).

Figure 1.3: Annual electricity consumption curves in the USA (axes: Index, electricityconsumption[1, ])

Figure 1.4: Pollution curves at the North Pole, 2000-2004 (axes: Heure [hour], pollution)

Figure 1.5: Growth curves

In animal biology: studies of egg laying by Mediterranean fruit flies were carried out. They were summarized by curves giving, for each fly, the number of eggs laid as a function of time (cf. Figure 1.7).

In econometrics: many phenomena can be modelled by functional variables. Examples include financial market volatility (cf. Müller et al., 2007), company returns (cf. Kawasaki and Ando, 2004), e-commerce (cf. Jank and Shmueli, 2006) and the intensity of financial transactions (cf. Laukaitis and Rackauskas, 2002). See Kneip and Utikal (2001), Benko (2006) and Benko et al. (2006) for further references. Another example is the observation of the fluctuations of a stock index over time. This is typically a time series that is cut into successive time sub-intervals (cf. Bosq, 2002).

In graphology: functional statistics techniques have also been applied to graphology, for example in the works of Hastie et al. (1995) and Ramsay (2000). Ramsay modelled the position of the pen (x and y coordinates as functions of time) with a system of differential equations with functional parameters.

Figure 1.6: Mass spectrometry curves on cancer cells

Figure 1.7: A curve of the daily number of eggs laid by a fly

Measurements, and in particular images collected by satellites, can also be studied with the methodologies of functional statistics. Examples are the works of Vidakovic (2001) in meteorology and those of Dabo-Niang et al. (2004b, 2007) in geophysics. The latter classify the curves collected by the satellite at different locations in the Amazon, which would make it possible to identify the nature of the soil. Finally, Cardot et al. (2003) and Cardot and Sarda (2006) studied the evolution of vegetation from satellite data.

1.3 Some results on nonparametric estimation for functional models

In this section, we first recall some assumptions and notation that are important for the rest of this thesis. We then recall the results obtained by Ferraty et al. (2006) on the estimation of some conditional parameters, and briefly those of Laksaci (2005) and Ezzahrioui (2007).

1.3.1 Notation and assumptions


Consider the pair of random variables $(X, Y)$, where $Y$ takes values in $\mathbb{R}$ and $X$ takes values in a semi-metric space $(\mathcal{F}, d)$, possibly of infinite dimension. For $x \in \mathcal{F}$, the conditional distribution of $Y$ given $X = x$ is defined by:
$$\forall y \in \mathbb{R}, \quad F^x(y) = P(Y \le y \mid X = x).$$
This distribution is assumed to be absolutely continuous with respect to Lebesgue measure on $\mathbb{R}$. We denote by $f^x$ (respectively $f^{x(j)}$) the conditional density (respectively its derivative of order $j$) of $Y$ given $X = x$. In what follows, $x$ is a fixed point of $\mathcal{F}$, $V_x$ is a neighbourhood of $x$ and $S \subset \mathbb{R}$ is a compact subset. We also write $B(x, h) = \{x' \in \mathcal{F} \mid d(x', x) < h\}$ for the ball of centre $x$ and radius $h$.
The statements of the preliminary results require the following assumptions.
(H1) $P(X \in B(x, h)) = \phi_x(h) > 0$.
The conditional distribution function is assumed to satisfy:
(H2) $\forall (y_1, y_2) \in S \times S$, $\forall (x_1, x_2) \in V_x \times V_x$, $|F^{x_1}(y_1) - F^{x_2}(y_2)| \le C_x \big( d(x_1, x_2)^{b_1} + |y_1 - y_2|^{b_2} \big)$.
The conditional density $f^x$ is assumed to be of class $C^j$ for some $j \ge 0$, and such that:
(H3) $\forall (y_1, y_2) \in S \times S$, $\forall (x_1, x_2) \in V_x \times V_x$, $|f^{x_1(j)}(y_1) - f^{x_2(j)}(y_2)| \le C_x \big( d(x_1, x_2)^{b_1} + |y_1 - y_2|^{b_2} \big)$.
The concentration condition (H1) plays an important role. It is tied to the semi-metric $d$, and it quantifies and controls the small ball probabilities.
(H4) $\forall (y_1, y_2) \in \mathbb{R}^2$, $|H(y_1) - H(y_2)| \le C |y_1 - y_2|$, and $\int_{\mathbb{R}} |t|^{b_2} H^{(1)}(t)\, dt < +\infty$.
(H5) The kernel $K$ is supported on $(0, 1)$ and satisfies $0 < C_1 < K(t) < C_2$ for $t \in (0, 1)$, where $C_1$ and $C_2$ are two strictly positive constants.
(H6) $\lim_{n \to \infty} h_K = 0$ and $\lim_{n \to \infty} \frac{\log n}{n\, \phi_x(h_K)} = 0$.
(H7) $\lim_{n \to \infty} h_H = 0$ and $\lim_{n \to \infty} n^{\alpha} h_H = \infty$ for some real $\alpha > 0$.
Here $H$ is a kernel, and $h_K = h_{K,n}$ (respectively $h_H = h_{H,n}$) is a sequence of positive real numbers tending to 0 as $n$ tends to infinity.

1.3.2 Estimation of the conditional distribution


This section gives a convergence result for the kernel estimator of the conditional distribution. Let $x$ be an element of $\mathcal{F}$ and let $(X_i, Y_i)_{i=1,\dots,n}$ be a sample of independent pairs of random variables with values in $\mathcal{F} \times \mathbb{R}$. The kernel estimator of the conditional distribution $F^x(\cdot)$ is defined by:
$$\widehat{F}^x(y) = \frac{\sum_{i=1}^{n} K\big(h_K^{-1} d(x, X_i)\big)\, H\big(h_H^{-1}(y - Y_i)\big)}{\sum_{i=1}^{n} K\big(h_K^{-1} d(x, X_i)\big)}, \quad \forall y \in \mathbb{R}.$$
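A minimal sketch of this estimator follows. The following choices are illustrative assumptions, not the thesis's:
- $K$ is the indicator of $[0,1)$, which is bounded above and below on its support as in (H5);
- $H$ is the standard normal distribution function;
- the semi-metric distances $d(x, X_i)$ are computed beforehand.

The names `K_box`, `H_gauss` and `cond_cdf` are hypothetical, as is the simulated model used in the example.

```python
import math

import numpy as np


def K_box(u):
    """Assumed kernel K = 1 on [0, 1), 0 elsewhere (bounded above and below on its support, cf. (H5))."""
    u = np.asarray(u, dtype=float)
    return ((u >= 0) & (u < 1)).astype(float)


def H_gauss(u):
    """Assumed H: the standard normal distribution function."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(u, dtype=float) / math.sqrt(2.0)))


def cond_cdf(dist, Y, y, hK, hH):
    """Kernel estimate of F^x(y), given dist[i] = d(x, X_i) and the responses Y[i]."""
    wK = K_box(np.asarray(dist, dtype=float) / hK)
    if wK.sum() == 0:
        return float("nan")  # no observation X_i in the ball B(x, hK)
    return float(np.sum(wK * H_gauss((y - np.asarray(Y, dtype=float)) / hH)) / wK.sum())


# Hypothetical data: X_i = a_i sin(pi t) and Y_i = a_i + noise, so Y | X = x ~ N(a, 0.1^2).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
a = rng.uniform(-1, 1, size=500)
X = a[:, None] * np.sin(np.pi * t)
Y = a + 0.1 * rng.normal(size=500)
x = 0.5 * np.sin(np.pi * t)
dist = np.sqrt(np.mean((X - x) ** 2, axis=1))  # grid average approximating the L2 distance
print(cond_cdf(dist, Y, 0.5, hK=0.05, hH=0.05))  # close to F^x(0.5) = 0.5
```

$H$ is a distribution function and the weights are nonnegative. Hence $y \mapsto \widehat{F}^x(y)$ is non-decreasing, with limits 0 and 1 at $-\infty$ and $+\infty$. The estimate is therefore a genuine distribution function whenever the ball $B(x, h_K)$ contains at least one $X_i$.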

The following theorem gives the almost complete (a.co.) convergence¹ of the estimator $\widehat{F}^x(y)$.

¹ Let $(z_n)_{n \in \mathbb{N}}$ be a sequence of random variables. We say that $z_n$ converges almost completely (a.co.) to 0 if and only if $\forall \epsilon > 0$, $\sum_{n=1}^{\infty} P(|z_n| > \epsilon) < \infty$. Let $(u_n)_{n \in \mathbb{N}}$ be a sequence of positive real numbers. We say that $z_n = O(u_n)$ a.co. if and only if $\exists \epsilon > 0$, $\sum_{n=1}^{\infty} P(|z_n| > \epsilon u_n) < \infty$. This type of convergence implies both almost sure convergence and convergence in probability (cf. [13] for more details).

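Theorem 1.3.1 (Ferraty et al., 2006). Under assumptions (H1)-(H6), we have:
$$\sup_{y \in S} \big|\widehat{F}^x(y) - F^x(y)\big| = O\big(h_K^{b_1}\big) + O\big(h_H^{b_2}\big) + O\left(\sqrt{\frac{\log n}{n\, \phi_x(h_K)}}\right), \quad \text{a.co.}$$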

1.3.3 Kernel estimator of the conditional density


This section presents a kernel estimator of the derivative of order $j$ of the conditional density, together with a result on its asymptotic behaviour. The estimator $\widehat{f}^{x(j)}$ of $f^{x(j)}$ is given by:
$$\widehat{f}^{x(j)}(y) = \frac{h_H^{-j-1} \sum_{i=1}^{n} K\big(h_K^{-1} d(x, X_i)\big)\, H^{(j+1)}\big(h_H^{-1}(y - Y_i)\big)}{\sum_{i=1}^{n} K\big(h_K^{-1} d(x, X_i)\big)}, \quad \forall y \in \mathbb{R}.$$
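The estimator can be sketched for $j = 0$ (the density itself) and $j = 1$ (its first derivative) under the following assumptions, which are ours and not the thesis's:
- $K$ is the indicator of $[0,1)$;
- $H$ is the standard normal distribution function, so that $H^{(1)} = \varphi$ and $H^{(2)}(u) = -u\varphi(u)$.

The function names are hypothetical.

```python
import numpy as np


def phi(u):
    """Standard normal density, i.e. H^(1) when H is the standard normal cdf (assumption)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def H_deriv(u, order):
    """H^(order) for H = standard normal cdf: H' = phi and H'' = -u * phi."""
    if order == 1:
        return phi(u)
    if order == 2:
        return -u * phi(u)
    raise ValueError("this sketch only implements j = 0 and j = 1")


def cond_density_deriv(dist, Y, y, hK, hH, j=0):
    """Kernel estimate of f^{x(j)}(y), with K = 1 on [0, 1) (assumption)."""
    u_K = np.asarray(dist, dtype=float) / hK
    wK = ((u_K >= 0) & (u_K < 1)).astype(float)
    if wK.sum() == 0:
        return float("nan")  # empty ball B(x, hK)
    u_H = (y - np.asarray(Y, dtype=float)) / hH
    return float(hH ** (-j - 1) * np.sum(wK * H_deriv(u_H, j + 1)) / wK.sum())


# With every X_i in the ball, the j = 0 estimate is a Gaussian kernel density of the Y_i.
Y = np.random.default_rng(0).normal(size=200)
grid = np.linspace(-8, 8, 1601)
f_hat = np.array([cond_density_deriv(np.zeros(200), Y, y, hK=1.0, hH=0.3) for y in grid])
print(np.sum((f_hat[1:] + f_hat[:-1]) / 2 * np.diff(grid)))  # integrates to about 1
```

For $j = 1$, the estimate is exactly the $y$-derivative of the $j = 0$ estimate. Differentiating $H^{(1)}\big(h_H^{-1}(y - Y_i)\big)$ with respect to $y$ produces an extra factor $h_H^{-1}$, which explains the power $h_H^{-j-1}$ in the formula.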

This estimator is analogous to the one introduced by Rosenblatt (1969) for a real random variable $X$, and it has been widely studied since then (cf. Youndjé, 1996). The following assumptions are needed to establish some convergence results:
(H8) $\forall (y_1, y_2) \in \mathbb{R}^2$, $|H^{(j+1)}(y_1) - H^{(j+1)}(y_2)| \le C |y_1 - y_2|$; $\exists \nu > 0$, $\forall j' \le j + 1$, $\lim_{|y| \to \infty} |y|^{1+\nu} |H^{(j')}(y)| = 0$; and $H^{(j+1)}$ is bounded.
(H9) $\lim_{n \to \infty} h_K = 0$ and $\lim_{n \to \infty} \frac{\log n}{n\, h_H^{2j+1}\, \phi_x(h_K)} = 0$.

The following theorem concerns the asymptotic behaviour of the functional kernel estimator $\widehat{f}^{x(j)}$.

Theorem 1.3.2 (Ferraty et al., 2006). Under assumptions (H1), (H3), (H4) and (H6)-(H9), we have:
$$\sup_{y \in S} \big|\widehat{f}^{x(j)}(y) - f^{x(j)}(y)\big| = O\big(h_K^{b_1}\big) + O\big(h_H^{b_2}\big) + O\left(\sqrt{\frac{\log n}{n\, h_H^{2j+1}\, \phi_x(h_K)}}\right), \quad \text{a.co.},$$
where $S$ is a compact subset of $\mathbb{R}$.

1.3.4 Estimation of the conditional mode


Case of i.i.d. data
This section presents an estimator of the conditional mode, denoted by $\theta$. The compact set $S$ is chosen so that it contains a unique mode $\theta$. The estimator is based on the functional estimate of the conditional density given above. In the remainder of this section, the compact set is $S = [\theta - \xi, \theta + \xi]$. The conditional mode estimator $\widehat{\theta}$ is defined by:
$$\widehat{f}^x(\widehat{\theta}) = \sup_{y \in S} \widehat{f}^x(y).$$
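Once an estimate of $f^x$ is available on $S$, the estimator $\widehat{\theta}$ can be approximated numerically by maximizing the estimate over a fine grid of $S$. The sketch below does this by grid search. It assumes that the grid resolution is negligible compared with the statistical error. `cond_mode` and the example density are hypothetical.

```python
import numpy as np


def cond_mode(density_fn, lo, hi, n_grid=401):
    """Approximate theta_hat = argmax of an estimated conditional density over S = [lo, hi].

    The sup over S is replaced by a maximum over an equispaced grid (assumption);
    np.argmax returns the first maximiser, one way of resolving non-uniqueness.
    """
    grid = np.linspace(lo, hi, n_grid)
    values = np.array([density_fn(y) for y in grid])
    return float(grid[np.argmax(values)])


# Usage with a hypothetical estimate: a Gaussian kernel density of responses centred near 1.
rng = np.random.default_rng(0)
Y = 1.0 + 0.2 * rng.normal(size=300)
h = 0.1


def f_hat(y):
    return np.mean(np.exp(-0.5 * ((y - Y) / h) ** 2)) / (h * np.sqrt(2 * np.pi))


print(cond_mode(f_hat, 0.0, 2.0))  # close to 1
```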

The estimator $\widehat{\theta}$ is not necessarily unique. To ensure the uniqueness of the mode $\theta$ and the convergence of $\widehat{\theta}$, we assume:
(H10) $\exists \xi > 0$ such that $f^x$ is increasing on $[\theta - \xi, \theta]$ and decreasing on $[\theta, \theta + \xi]$.
(H11) $f^x$ is $j$ times continuously differentiable with respect to $y$ on $[\theta - \xi, \theta + \xi]$.
(H12) $f^{x(l)}(\theta) = 0$ if $1 \le l < j$, and $|f^{x(j)}(\theta)| > 0$.
These conditions strongly influence the convergence rate of the estimator (cf. the theorem below). The convergence of this estimator can also be obtained from assumption (H10) (cf. Laksaci (2005), Lemma 2.4.1).

Theorem 1.3.3 (Laksaci, 2005). If the assumptions of Theorem 1.3.2 and (H10)-(H12) are satisfied, then:
$$\widehat{\theta} - \theta = O\big(h_K^{b_1/j}\big) + O\big(h_H^{b_2/j}\big) + O\left(\left(\frac{\log n}{n\, h_H\, \phi_x(h_K)}\right)^{\frac{1}{2j}}\right), \quad \text{a.co.}$$

Case of $\alpha$-mixing data
The results obtained for independent and identically distributed (i.i.d.) functional random variables have been extended to strongly mixing variables. Theorem 1.3.4 states a result in this setting. It keeps the assumptions of the i.i.d. case and strengthens them with concentration conditions on the joint distribution of pairs of observations and with assumptions on the mixing coefficients.

Two application examples are studied (cf. Laksaci, 2005 for more details). The first corresponds to the i.i.d. case and concerns the agri-food industry (spectrometric curves). The second corresponds to the dependent case and concerns a pollution problem (ozone concentration curves at the North Pole).
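The statement of Theorem 1.3.4 requires the following assumptions:
(H13) $\sup_{i \ne j} P\big((X_i, X_j) \in B(x, r) \times B(x, r)\big) = \psi_x(r) > 0$.
(H14) The $\alpha$-mixing coefficients of the sequence $(X_i, Y_i)$ satisfy: $\exists a > (5 + \sqrt{17})/2$, $\exists c > 0$ such that $\forall n$, $\alpha(n) \le c\, n^{-a}$.
(H15) $\lim_{n \to \infty} h_H = 0$ and $\exists \beta_1 < \frac{4}{(a+1)(a-2)}$ such that $\lim_{n \to \infty} n^{\beta_1} h_H = \infty$.
(H16) $\lim_{n \to \infty} h_K = 0$, $\lim_{n \to \infty} \frac{\log n}{n\, h_H\, \chi_x(h_K)} = 0$, and $\exists \eta_2 > 0$, $\exists c_1 > 0$, $\exists c_2 > 0$ such that $c_2\, n^{\frac{3-a}{a+1} + \eta_2} \le \chi_x(h_K) \le c_1\, n^{\frac{1}{1-a}}$.
Here $\chi_x(h_K) = \max\big(\phi_x(h_K), \psi_x(h_K)\big)$ is the larger of the concentration of the marginal law and the concentrations of the joint laws of pairs of functional observations, in the ball of centre $x$ and radius $h_K$.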

Theorem 1.3.4 (Laksaci, 2005). If assumptions (H1), (H3)-(H5) and (H10)-(H16) are satisfied, then:
$$\widehat{\theta} - \theta = O\big(h_K^{b_1/j}\big) + O\big(h_H^{b_2/j}\big) + O\left(\left(\frac{\log n}{n\, h_H\, \chi_x(h_K)}\right)^{\frac{1}{2j}}\right), \quad \text{a.co.},$$
where $b_1$ and $b_2$ are two strictly positive real numbers.


Chapter 2
Kernel conditional density estimation when the regressor is valued in a semi-metric space

Ali Laksaci¹, Fethi Madani² and Mustapha Rachdi²,³


(To appear in Communications in Statistics - Theory and Methods, 2012)
Abstract. This paper deals with conditional density estimation when the explanatory variable is functional. A nonparametric kernel-type estimator of the conditional density was recently introduced for the case where the regressor is valued in a semi-metric space. This estimator depends on a smoothing parameter that controls its behavior. We construct a data-driven criterion for choosing this smoothing parameter automatically and optimally, and we study its asymptotic properties. The criterion can be formulated as a functional version of cross-validation ideas. Under mild assumptions on the unknown conditional density, we prove that this rule is asymptotically optimal. A simulation study and an application to real data illustrate the behavior of our method for finite samples. Our results are also new in the finite-dimensional setting, and several other open questions are raised in this article.

Keywords. Cross-validation, functional data, kernel estimator, nonparametric model, bandwidth selection, small ball probability

1. Université Djillali Liabès, BP. 89, Sidi Bel-Abbès 22000, Algeria. E-mail: alilak@yahoo.fr
2. Laboratoire AGIM FRE 3405 CNRS, Equipe TIMB, Université P. Mendès France (Grenoble 2), UFR SHS, BP. 47, 38040 Grenoble Cedex 09, France. E-mails: Mustapha.Rachdi@upmf-grenoble.fr and Fethi.Madani@imag.fr
3. Corresponding author

2. Choice of the bandwidth

AMS Subject Classification. Primary: 62G05; Secondary: 62G07, 62G08, 62G35, 62G20.

2.1 Introduction

Conditional density estimation is a statistical technique that describes the relationship between a response variable and a set of covariates more completely than usual regression methods. It is therefore of great importance in the many scientific fields where conditional means, obtained by regression methods, are not enough to draw valuable conclusions about the problem at hand. Conditional density functions arise in a variety of areas. One of the most useful applications is density forecasting. There, the probability density of the forecast of a time series, such as the rate of inflation, is used to make probability statements about the future course of that series. However, the probability density and its interpretation are conditional on the hypothesis that the model used to produce the forecasts is correctly specified.
Recall that, if $g(x,y)$ denotes the joint density of $(X,Y)$ and $h(x)$ denotes the marginal density of $X$, then the conditional density of $Y$ given $X=x$ is $f(x,y)=g(x,y)/h(x)$. Standard nonparametric regression does not allow the analysis of changes in modality, and standard density estimation does not allow conditioning on an explanatory variable. Conditional density estimation is, in some ways, a generalization of both nonparametric regression and standard univariate density estimation. Kernel conditional density estimation was first considered by Rosenblatt (1969), who studied the problem of estimating the density of $Y$ given $X=x$ where $X$ is a univariate random variable.
On the other hand, estimators of the conditional mode, the conditional distribution and the conditional median can be derived directly from estimators of $f(x,y)$. For instance, Collomb et al. (1987) show how one can obtain an estimator of the conditional mode and how such an estimator can be used for forecasting problems (cf., to cite a few, Härdle (1990), Gannoun (1990), Youndjé (1993 and 1996) and the references therein). Moreover, it is important to mention that estimators of conditional modes are of particular interest for prediction (cf. Collomb et al. (1987) and Ferraty et al. (2005)).
Furthermore, the problem of conditional density estimation appears to have received little scrutiny until it was revisited and some improved estimators were proposed (cf. Hyndman et al. (1996) and the references therein for some developments). Indeed, the following modified form of Rosenblatt's estimator was considered:

$$\hat f_{(a,b)}(x,y)=\frac{b^{-1}\sum_{j=1}^{n}K\big(a^{-1}\|x-X_j\|_x\big)\,K\big(b^{-1}\|y-Y_j\|_y\big)}{\sum_{j=1}^{n}K\big(a^{-1}\|x-X_j\|_x\big)}\qquad(1)$$

where $(X_1,Y_1),\dots,(X_n,Y_n)$ is a sample of independent observations from the distribution of $(X,Y)$, and $\|\cdot\|_x$ and $\|\cdot\|_y$ are metrics on the spaces in which $X$ and $Y$ take their values, respectively.


The kernel function $K(u)$ is assumed to satisfy some specific conditions. Popular choices of $K(u)$ are defined in terms of univariate and unimodal probability density functions. Youndjé (1993 and 1996), Hyndman et al. (1996) and others give the bias, variance, mean squared error (MSE) and convergence properties of the estimator (1), and also propose an alternative kernel estimator with a smaller MSE than the standard estimator in some commonly occurring situations. We must also mention the work of Fan et al. (1996), who proposed an alternative conditional density estimator by generalizing Rosenblatt's estimator with local polynomial techniques. Then, Hyndman and Yao (1998) introduced two further local parametric estimators which improve on the estimators given by Fan et al. (1996). Stone (1994), meanwhile, followed a different path by using tensor products of polynomial splines to obtain conditional log-density estimators. For other studies on the nonparametric estimation of the conditional density, we also refer to Gannoun (1990), Youndjé (1993 and 1996), Hall et al. (1999), Härdle et al. (1991), Bashtannyk and Hyndman (2001), Gannoun et al. (2003), El Ghouch and Genton (2009) and the references therein.
In this paper, we are interested in the efficient estimation of the conditional probability density when the explanatory variables are of functional type. These questions in the infinite-dimensional framework are particularly interesting, both for the fundamental problems they raise and for the many applications they may allow (cf. Bosq (2000), Ramsay and Silverman (2005), Ferraty and Vieu (2006) and references therein). In this conditional context, the first results were obtained by Ferraty and Vieu (2005) and Ferraty et al. (2006). They established the almost complete consistency, for both i.i.d. and strongly mixing data, of the kernel estimators of the conditional distribution function and of the conditional probability density. Moreover, they presented some applications of their results to both the conditional mode and conditional quantiles. Among the many papers concerned with the nonparametric modelling of the conditional distribution of a real variable given a random variable taking values in an infinite-dimensional space, we refer only to Dabo-Niang and Laksaci (2007) for conditional mode estimation, and to Laksaci (2007) for the asymptotic expression of the leading terms in the quadratic error of conditional density kernel estimators.
On the other hand, it is well known that kernel estimators have some nice asymptotic properties when the curse of dimensionality is controlled by means of suitable considerations on the small ball probabilities of the functional variable (cf. Ferraty and Vieu (2006) and references therein). However, it is also well known that, as in the standard finite-dimensional framework, the smoothing parameter has to be selected suitably to ensure good practical performance (cf. Laksaci, 2007). Some papers (cf. for instance Youndjé et al., 1993) have treated the problem of smoothing parameter selection in the nonparametric estimation of the conditional density, using techniques quite different from ours, but only in the finite-dimensional setup. The selection of the smoothing parameter in the infinite-dimensional setting is much more complicated. In particular, the scatterplot, which is the usual graphical tool for exploring the relationship between the explanatory variables and the scalar response, is not available, and hence it becomes very hard to obtain information on the shape of the relationship between the functional variable and the


scalar response. Therefore, areas with different (low or high) concentrations can appear in such a relationship even though they do not appear in the functional data sample (cf. for instance the simulated curves in Section 2.4.2). It is also clear, in the infinite-dimensional setup, that the concentration of the distribution of the functional explanatory variable influences the value of an appropriate bandwidth: the variance of the estimator increases when the concentration of the distribution of the functional covariate decreases, which is the case when the bandwidth decreases (cf. conditions (17) and (14)). Consequently, in areas where the functional covariates have low concentration, the bandwidth has to be taken sufficiently large to include enough data curves, while a smaller bandwidth can be used in areas where the functional covariates have high concentration. Rachdi and Vieu (2007) (respectively Benhenni et al., 2007) proposed a global (respectively a local adaptive) cross-validation procedure for the estimation of the regression operator for functional data, which has inspired this work.
The main aim of this paper is the construction of both global and local functional cross-validation procedures. We remark that, in the functional setting, a local bandwidth choice can significantly improve the precision of the prediction compared with the global one. In Section 2.2, the data-driven methods are defined. The main hypotheses and results are stated in Section 2.3. In Section 2.4, we propose a simulation study showing how an optimal local bandwidth choice improves the usual global selection rule for some irregular functional covariates. Finally, asymptotic theoretical support is given in Section 2.5, and the proofs of the auxiliary results are relegated to the Appendix.

2.2 Global and local bandwidth selection rules

Let us introduce a sample of independent pairs $(X_i,Y_i)_{1\le i\le n}$, identically distributed as $(X,Y)$ and valued in $\mathcal F\times\mathbb R$, where $(\mathcal F,d)$ is a semi-metric space equipped with a semi-metric $d$. Assume that there exists a regular version of the conditional probability of $Y$ given $X$, which is absolutely continuous with respect to the Lebesgue measure on $\mathbb R$. Let $f(x,\cdot)$ denote the conditional probability density of $Y$ given $X=x\in\mathcal F$, which we have to estimate. To this end, we define the kernel estimator $\hat f_{(a,b)}$ of $f$ as in (1), but with two different kernel functions:

$$\forall x\in\mathcal F,\ \forall y\in\mathbb R,\qquad\hat f_{(a,b)}(x,y)=\frac{b^{-1}\sum_{i=1}^{n}K\big(a^{-1}d(x,X_i)\big)\,H\big(b^{-1}(y-Y_i)\big)}{\sum_{i=1}^{n}K\big(a^{-1}d(x,X_i)\big)}\qquad(2)$$

where $K$ and $H$ are kernels and $a=a_{K,n}$ (respectively $b=b_{H,n}$) is a sequence of positive real numbers. The estimator (2) has been used by Roussas (1968) in the real case and by Ferraty et al. (2006) in the functional case.
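As an illustration, estimator (2) can be sketched in Python as below. This is a minimal sketch under our own assumptions, not the authors' code: curves are lists of values on a common grid, the default semi-metric `l2_semimetric` is a hypothetical choice (any semi-metric $d$ can be passed), $K$ is the quadratic kernel on $[0,1]$ and $H$ the Epanechnikov kernel on $[-1,1]$. The optional argument `skip=i` removes the $i$-th pair, which gives the leave-one-curve-out estimator $\hat f^{-i}_{(a,b)}$ of (8); returning 0 when no curve lies within distance $a$ of $x$ is our convention.

```python
import math
from typing import Callable, List, Optional, Sequence

Curve = Sequence[float]


def kernel_K(u: float) -> float:
    """Quadratic kernel supported on [0, 1], applied to a^{-1} d(x, X_i)."""
    return 1.5 * (1.0 - u * u) if 0.0 <= u <= 1.0 else 0.0


def kernel_H(u: float) -> float:
    """Epanechnikov kernel supported on [-1, 1], applied to b^{-1} (y - Y_i)."""
    return 0.75 * (1.0 - u * u) if -1.0 <= u <= 1.0 else 0.0


def l2_semimetric(x1: Curve, x2: Curve) -> float:
    """Discretized L2 distance between two curves observed on the same grid (illustrative)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(x1, x2)) / len(x1))


def cond_density(x: Curve, y: float, X: List[Curve], Y: List[float], a: float, b: float,
                 d: Callable[[Curve, Curve], float] = l2_semimetric,
                 skip: Optional[int] = None) -> float:
    """Kernel estimate of f(x, y) as in (2); skip=i gives the leave-one-out estimate."""
    num = den = 0.0
    for i, (xi, yi) in enumerate(zip(X, Y)):
        if i == skip:
            continue
        w = kernel_K(d(x, xi) / a)            # functional weight K(a^{-1} d(x, X_i))
        num += w * kernel_H((y - yi) / b)     # response smoothing H(b^{-1} (y - Y_i))
        den += w
    return num / (b * den) if den > 0.0 else 0.0  # convention: 0 when no curve is close to x
```

Since $H$ integrates to one, $y\mapsto\hat f_{(a,b)}(x,y)$ integrates to one as soon as at least one curve $X_i$ satisfies $d(x,X_i)<a$.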
The main goal of this paper is to construct a data-driven method which optimally selects the smoothing parameters $(a,b)$, and to study its asymptotic behavior. To do that, we propose a rule based on the classical leave-one-curve-out cross-validation procedure and study its asymptotic behavior in the mean squared sense. Indeed, as in the


majority of earlier works on bandwidth selection, our rule is based on the minimization of the integrated squared error, weighted by the probability measure $dP_X(x)$ of the functional variable $X$ and by some nonnegative weight functions $W_1$ and $W_2$:
$$d_1(\hat f_{(a,b)},f)=\int\!\!\int\big(\hat f_{(a,b)}(x,y)-f(x,y)\big)^2\,W_1(x)W_2(y)\,dP_X(x)\,dy\qquad(3)$$
A discrete approximation of (3) is the averaged squared error:

$$d_2(\hat f_{(a,b)},f)=\frac1n\sum_{i=1}^{n}\big(\hat f_{(a,b)}(X_i,Y_i)-f(X_i,Y_i)\big)^2\,\frac{W_1(X_i)W_2(Y_i)}{f(X_i,Y_i)}\qquad(4)$$

or the mean integrated squared error:

$$d_3(\hat f_{(a,b)},f)=\int\!\!\int\mathbb E\big(\hat f_{(a,b)}(x,y)-f(x,y)\big)^2\,W_1(x)W_2(y)\,dP_X(x)\,dy\qquad(5)$$

However, these loss functions depend on the unknown conditional density $f$, so the smoothing parameters that minimize them cannot be computed in practice. Thus, we must find another loss function which is asymptotically equivalent to the quadratic distances (3), (4) and (5). Following the same ideas as Youndjé (1996) in the real case, we can write:

$$d_1(\hat f_{(a,b)},f)=A+B-2C$$

where

$$A=\int\!\!\int\hat f^{\,2}_{(a,b)}(x,y)\,W_1(x)W_2(y)\,dP_X(x)\,dy,\qquad B=\int\!\!\int f^2(x,y)\,W_1(x)W_2(y)\,dP_X(x)\,dy,$$

$$C=\int\!\!\int\hat f_{(a,b)}(x,y)\,f(x,y)\,W_1(x)W_2(y)\,dP_X(x)\,dy.$$

Since the second term $B$ does not depend on $(a,b)$, minimizing $d_1$ is equivalent to minimizing $A-2C$. A straightforward way to construct a computational procedure selecting the optimal bandwidths $(a,b)$ with respect to the error measure $d_1$ is to estimate both quantities $A$ and $C$. To this end, as mentioned above, we adopt the standard leave-one-curve-out technique, as in Rudemo (1982) for probability density estimation and Rachdi and Vieu (2007) for regression operator estimation, by considering the following criteria:

$$GCV(a,b)=\frac1n\sum_{i=1}^{n}W_1(X_i)\int\big(\hat f^{-i}_{(a,b)}(X_i,y)\big)^2W_2(y)\,dy-\frac2n\sum_{i=1}^{n}\hat f^{-i}_{(a,b)}(X_i,Y_i)\,W_1(X_i)W_2(Y_i)\qquad(6)$$
and, respectively, for fixed $x\in\mathcal F$ and $y\in\mathbb R$:

$$LCV_{x,y}(a,b)=\frac1n\sum_{i=1}^{n}W_{1,x}(X_i)\int\big(\hat f^{-i}_{(a,b)}(X_i,z)\big)^2W_{2,y}(z)\,dz-\frac2n\sum_{i=1}^{n}\hat f^{-i}_{(a,b)}(X_i,Y_i)\,W_{1,x}(X_i)W_{2,y}(Y_i)\qquad(7)$$


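where $W_{1,x}$ (respectively $W_{2,y}$) is some positive local weight function around $x$ (respectively $y$), and, for any $i=1,\dots,n$, $\hat f^{-i}_{(a,b)}$ is the leave-one-curve-out estimator

$$\hat f^{-i}_{(a,b)}(x,y)=\frac{b^{-1}\sum_{j\ne i}K\big(a^{-1}d(x,X_j)\big)\,H\big(b^{-1}(y-Y_j)\big)}{\sum_{j\ne i}K\big(a^{-1}d(x,X_j)\big)}.\qquad(8)$$

These criteria are obtained by using the fact that

$$C=\int\!\!\int\hat f_{(a,b)}(x,y)f(x,y)W_1(x)W_2(y)\,dP_X(x)\,dy=\int\!\!\int\hat f_{(a,b)}(x,y)W_1(x)W_2(y)\,dP_{Y|X=x}(y)\,dP_X(x)$$

$$=\int\hat f_{(a,b)}(x,y)W_1(x)W_2(y)\,dP_{(X,Y)}(x,y)=\mathbb E_{(X,Y)}\big(\hat f_{(a,b)}(X,Y)W_1(X)W_2(Y)\big)$$

and

$$A=\mathbb E_X\Big(\int\hat f^{\,2}_{(a,b)}(X,y)\,W_1(X)W_2(y)\,dy\Big),$$

where $\mathbb E_Z$ denotes the expectation with respect to the distribution of the random variable $Z$ (the estimator $\hat f_{(a,b)}$ being held fixed). Replacing these expectations by empirical means over the sample, and $\hat f_{(a,b)}$ by $\hat f^{-i}_{(a,b)}$ so that the pair $(X_i,Y_i)$ is not used twice, gives the estimate (6) of $A-2C$; using the local weights instead of $W_1$ and $W_2$ gives (7).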
Finally, our global (respectively local) cross-validation procedure consists in choosing the bandwidths $(a,b)$ which minimize $GCV(a,b)$ (respectively $LCV_{x,y}(a,b)$) over a given set $H_n\subset\mathbb R_+^2$ (respectively $H_n(x,y)\subset\mathbb R_+^2$).
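A minimal sketch of the global rule follows: it evaluates criterion (6) for each pair of a finite set $H_n$ and returns the minimizer $(a_1,b_1)$. Assumptions (ours, for illustration): the semi-metric enters only through a precomputed matrix `D` with `D[i][j]` $=d(X_i,X_j)$, `w1[i]` holds $W_1(X_i)$, the $dy$-integral is a trapezoidal sum on a user-chosen grid `y_grid`, the kernels are quadratic, and the toy data (scalar "curves" with $d(u,v)=|u-v|$) are hypothetical. The local criterion (7) is obtained in the same way by replacing $W_1$, $W_2$ with $W_{1,x}$, $W_{2,y}$.

```python
import math
from itertools import product


def kernel_K(u):
    """Quadratic kernel supported on [0, 1]."""
    return 1.5 * (1.0 - u * u) if 0.0 <= u <= 1.0 else 0.0


def kernel_H(u):
    """Epanechnikov kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if -1.0 <= u <= 1.0 else 0.0


def loo_density(i, y, D, Y, a, b):
    """Leave-one-curve-out estimate (8) at (X_i, y), with D[i][j] = d(X_i, X_j)."""
    num = den = 0.0
    for j, yj in enumerate(Y):
        if j == i:
            continue
        w = kernel_K(D[i][j] / a)
        num += w * kernel_H((y - yj) / b)
        den += w
    return num / (b * den) if den > 0.0 else 0.0


def trapezoid(values, grid):
    """Trapezoidal rule for sampled values on a (possibly non-uniform) grid."""
    return sum(0.5 * (grid[k + 1] - grid[k]) * (values[k] + values[k + 1])
               for k in range(len(grid) - 1))


def gcv(a, b, D, Y, w1, w2, y_grid):
    """Criterion (6): w1[i] = W_1(X_i), w2 is the function W_2, y_grid discretizes dy."""
    n = len(Y)
    first = second = 0.0
    for i in range(n):
        if w1[i] == 0.0:
            continue                                  # both terms of (6) vanish for this i
        sq = [loo_density(i, z, D, Y, a, b) ** 2 * w2(z) for z in y_grid]
        first += w1[i] * trapezoid(sq, y_grid)
        second += loo_density(i, Y[i], D, Y, a, b) * w1[i] * w2(Y[i])
    return first / n - 2.0 * second / n


def select_gcv(H_n, D, Y, w1, w2, y_grid):
    """Return the pair (a_1, b_1) of the finite set H_n that minimizes GCV."""
    return min(H_n, key=lambda ab: gcv(ab[0], ab[1], D, Y, w1, w2, y_grid))


# Toy usage (hypothetical data): scalar "curves" with d(u, v) = |u - v|.
xs = [k / 10 for k in range(20)]
Y = [math.sin(3.0 * u) + 0.1 * (-1) ** k for k, u in enumerate(xs)]
D = [[abs(u - v) for v in xs] for u in xs]
w1 = [1.0] * len(xs)


def w2(z):
    return 1.0 if -2.0 <= z <= 2.0 else 0.0


y_grid = [-2.0 + 4.0 * k / 100 for k in range(101)]
H_n = list(product([0.15, 0.3, 0.6], [0.2, 0.5, 1.0]))
a1, b1 = select_gcv(H_n, D, Y, w1, w2, y_grid)
```

Precomputing `D` once keeps the cost of each criterion evaluation independent of the dimension of the discretized curves.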

2.3 Main Results

2.3.1 Assumptions
In order to establish the asymptotic optimality of the bandwidths selected by the rule $GCV$ (respectively $LCV_{x,y}$), we assume that the weight function $W_1$ (respectively $W_2$) is bounded with support in some subset $S_X$ of $\mathcal F$ (respectively in a compact subset $S_Y$ of $\mathbb R$) and that the conditional density $f(\cdot,\cdot)$ is bounded on $S_X\times S_Y$. In the sequel, when no confusion is possible, we denote by $C$ and $C'$ some strictly positive generic constants, and we make the following assumptions:
The weight functions are taken, for each curve x, such that for some positive real w :

w = a for 0 < < 1 and W1,x is bounded and supported in B(x, w)

(9)

where $B(x,h)$ denotes the closed ball with center $x$ and radius $h>0$;

$$\forall x\in S_X,\qquad0<C\,\phi(h)\le\mathbb P\big(X\in B(x,h)\big)\le C'\,\phi(h)\qquad(10)$$


where $\phi(h)$ is a positive real function such that $\lim_{h\to0}\phi(h)=0$.

There exist some strictly positive constants $b_1$, $b_2$ and $\delta$ such that, for all $(x_0,y_0)\in S_X\times S_Y$, $(x_1,x_2)\in S_X\times S_X$ and $(y_1,y_2)\in S_Y\times S_Y$:

$$f(x_0,y_0)>\delta\quad\text{and}\quad|f(x_1,y_1)-f(x_2,y_2)|\le C\big(d^{\,b_1}(x_1,x_2)+|y_1-y_2|^{b_2}\big)\qquad(11)$$
The kernel $K$ is bounded and Lipschitz on its support $(0,1)$, and there exist some positive constants $C$ and $C'$ such that, for all $t\in(0,1)$:

$$0<C<K(t)<C'<\infty\qquad(12)$$

and, if $K(1)=0$, the kernel $K$ has to fulfill the additional condition $-\infty<C<K'(t)<C'<0$, where $K'$ is the first derivative of $K$.
The kernel $H$ is a bounded and Lipschitz continuous function such that:

$$\int|t|^{b_2}H(t)\,dt<\infty\quad\text{and}\quad\int H^2(t)\,dt<\infty\qquad(13)$$
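The function $\phi$ is such that:

$$\exists C>0,\ \exists\eta_0>0,\ \forall\eta<\eta_0,\qquad\phi'(\eta)<C$$

and, if $K(1)=0$, the function $\phi(\cdot)$ has to fulfill the additional condition:

$$\exists C>0,\ \exists\eta_0>0,\ \forall\,0<\eta<\eta_0,\qquad\int_0^{\eta}\phi(u)\,du>C\,\eta\,\phi(\eta)\qquad(14)$$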

For $n$ large enough, the Kolmogorov $\varepsilon$-entropy of $S_X$, denoted by $\psi_{S_X}$ (cf. for instance Kolmogorov and Tikhomirov (1959) and Theodoros and Yannis (1997)), satisfies, for some
(0, 1) :
(
{
)}

log n
n(3+1)/2 exp (1 )SX
< for some > 1
(15)
n
n=1

and for all (a, b) Hn we have :

lim n b =

n+

and

(a) Cn for some (0, 2 2)

(16)

2.3.2 Some interpretations and examples on our hypotheses


It is worth observing that these conditions are not very restrictive. The hypotheses (10)-(14) are very standard in the functional nonparametric setting. More precisely:
• The hypothesis (10) is a simple uniformization of the concentration property of the probability measure on small balls. This assumption is satisfied by a large family of functional random variables. Indeed, in many examples, the small ball probability function $\mathbb P(X\in B(x,h))$ can be written approximately as the product of two independent functions $g(x)$ and $\phi(h)$, as in the following examples, which can be found in Ferraty et al. (2007):


(i) $\mathbb P(X\in B(x,h))=g(x)\,h^{\gamma}$ for some $\gamma>0$;
(ii) $\mathbb P(X\in B(x,h))=g(x)\,h^{\gamma}\exp\big(-C/h^{p}\big)$ for some $\gamma>0$ and $p>0$;
(iii) $\mathbb P(X\in B(x,h))=g(x)/|\log(h)|$.
Thus, condition (10) is automatically verified if the function $g$ satisfies:

$$0<C<\inf_{x\in S_X}g(x)\le\sup_{x\in S_X}g(x)<C'<\infty.$$

Moreover, the function $g$ can be specified for several well-known continuous-time processes by using the Onsager-Machlup function (cf. Ferraty et al., 2010). For instance, it is shown in Corollary 4.7.8 of Bogachev (1999, page 186) that, for Gaussian measures on a semi-normed space $(\mathcal F,\|\cdot\|)$, the Onsager-Machlup function of the couple $(x,z)$ is given by:

$$F(x,z)=\log\Big(\lim_{h\to0}\frac{\mathbb P(X\in B(x,h))}{\mathbb P(X\in B(z,h))}\Big)=\frac12\|\pi(z)\|_H^2-\frac12\|\pi(x)\|_H^2,$$

where $\|\cdot\|_H$ is the Hilbert norm on the Cameron-Martin space of $\mathcal F$ associated with the Gaussian measure, denoted by $H$, and $\pi(\cdot)$ is the orthogonal projection onto the orthogonal complement of the set $\{a\in H:\|a\|=0\}$. So, in this case, $g(x)=\exp\big(-\frac12\|\pi(x)\|_H^2\big)$; therefore condition (10) is verified for subsets such as:

$$S_X=\{x\in\mathcal F:\|\pi(x)\|_H\le r\}.$$

• The assumption (14), which essentially amounts to the boundedness of the derivative of $\phi$ around zero, is not very restrictive. Indeed, simple analytic arguments show that this condition is fulfilled in all the usual cases, such as exponential-type processes, fractal-type processes, and also the finite-dimensional case. On the other hand, for more flexibility in the choice of the kernel, we add a further condition, specifically on the continuity of the kernel at 1. This additional condition acting on the behavior of $\phi$ around zero was introduced by Ferraty and Vieu (2006). It is obviously satisfied for fractal-type processes and, if $\mathcal F$ is a separable Hilbert space, it is always possible to choose a semi-metric $d$ for which this condition is fulfilled (cf. Lemma 13.6 in Ferraty and Vieu (2006), page 213).
• Assumption (15) is verified for a large class of functional spaces. We quote the following examples (cf. Ferraty et al., 2010):
1. The unit ball of the Cameron-Martin space associated with a Gaussian process viewed as a map in $C(0,1)$, with spectral measure $\mu$ satisfying

$$\int\exp(\kappa|\lambda|)\,\mu(d\lambda)<\infty\quad\text{for some }\kappa>0,$$

for which

$$\psi_{S_X}\Big(\frac{\log n}{n}\Big)=O\big((\log n)^2\big).$$


2. The unit ball of the Cameron-Martin space associated with the standard stationary Ornstein-Uhlenbeck process viewed as a map in the Sobolev space $W_2^1(0,1)$, with covariance operator

$$C(s,t)=\exp(-a|s-t|),\quad a>0.$$

For this subset, we have

$$\psi_{S_X}\Big(\frac{\log n}{n}\Big)=O(\log n).$$

3. The closed ball $B(0,r)$ in the Sobolev space defined by the class of functions $x(t)$ on $T=[0,2\pi)$ such that

$$\frac{1}{2\pi}\int_0^{2\pi}x^2(t)\,dt+\frac{1}{2\pi}\int_0^{2\pi}\big(x^{(m)}(t)\big)^2\,dt\le r,$$

where $x^{(m)}(\cdot)$ denotes the $m$-th derivative of $x$. In this case,

$$\psi_{S_X}\Big(\frac{\log n}{n}\Big)=O\big(n^{1/m}\big).$$

4. The compact subsets of finite-dimensional spaces, or of Hilbert spaces equipped with a projection semi-metric, where

$$\psi_{S_X}\Big(\frac{\log n}{n}\Big)=O(\log n).$$

• Notice that inequality (H5b) in Ferraty et al. (2010) is not needed here, because that assumption is used to specify the convergence rate in the uniform consistency result. In other words, the uniform consistency of the kernel estimator of the conditional density (without any precision on the convergence rate) is sufficient to prove our results.
• Conditions (9) and (16) are equivalent to those used by Rachdi and Vieu (2007) and Benhenni et al. (2007) for the global and local cross-validation procedures in operatorial regression estimation. In fact, these hypotheses are the functional versions of those used by Härdle and Marron (1985) and Youndjé (1996) in the usual real case. The condition (9) on the weight function is similar to that in Vieu (1991), and allows more importance to be given to observations around the curve $x$.
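The fractal-type case (i) of the first point above can be checked numerically. In the sketch below, which is only an illustration under our own assumptions, $X$ is uniform on the unit square of $\mathbb R^2$ with the Euclidean distance, so that for balls inside the square $\mathbb P(X\in B(x,h))=\pi h^2$, i.e. $g(x)=\pi$ and $\gamma=2$. The empirical small ball probabilities and the least-squares slope of $\log\mathbb P$ against $\log h$ should recover these values approximately.

```python
import math
import random


def small_ball_prob(center, sample, h):
    """Empirical P(X in B(center, h)) for planar points with the Euclidean distance."""
    cx, cy = center
    inside = sum(1 for (px, py) in sample if math.hypot(px - cx, py - cy) <= h)
    return inside / len(sample)


def loglog_slope(hs, probs):
    """Least-squares slope of log P(X in B(x, h)) against log h (estimates gamma in (i))."""
    lx = [math.log(h) for h in hs]
    ly = [math.log(p) for p in probs]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    den = sum((u - mx) ** 2 for u in lx)
    return num / den


rng = random.Random(0)                                       # fixed seed for reproducibility
sample = [(rng.random(), rng.random()) for _ in range(20000)]
hs = [0.05, 0.1, 0.2]
probs = [small_ball_prob((0.5, 0.5), sample, h) for h in hs]
ratios = [p / (math.pi * h * h) for p, h in zip(probs, hs)]  # estimates of g(x)/pi, near 1
slope = loglog_slope(hs, probs)                              # estimate of gamma, near 2
```

For genuinely functional data, the same empirical proportions $\frac1n\#\{i:d(x,X_i)\le h\}$ can be computed with any semi-metric $d$, which gives a rough diagnostic of the concentration of the curves.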

2.3.3 Two theorems on global and local criteria


Theorem 2.3.1. Under hypotheses (10)-(16), if the set $H_n$ of bandwidths $(a,b)$ is finite with

$$\#(H_n)=O(n^{\nu})\quad\text{for some }\nu>0,\qquad(17)$$

where $\#$ denotes the cardinality, then, for $k=1,2,3$, we have

$$\frac{d_k(\hat f_{(a_1,b_1)},f)}{d_k(\hat f_{(a_0,b_0)},f)}\longrightarrow1\quad\text{almost surely (a.s.), as }n\to+\infty,\qquad(18)$$

where

$$(a_0,b_0)=(a_{0K,n},b_{0H,n})=\arg\inf_{(a,b)\in H_n}d_k(\hat f_{(a,b)},f)\quad\text{and}\quad(a_1,b_1)=(a_{1K,n},b_{1H,n})=\arg\inf_{(a,b)\in H_n}GCV(a,b).$$

In the local framework, we suppose that (15) is verified for $S_X=B(x,w)$, and we deduce the same optimality results for the local criterion.

Theorem 2.3.2. Under hypotheses (9)-(16), if the set $H_n(x,y)$ of bandwidths $(a,b)$ is finite with

$$\#(H_n(x,y))=O\big(n^{\nu(x,y)}\big)\quad\text{for some }\nu(x,y)>0,\qquad(19)$$

then, for $k=1,2,3$, we have

$$\frac{d_k(\hat f_{(a_1,b_1)},f)}{d_k(\hat f_{(a_0,b_0)},f)}\longrightarrow1\quad\text{a.s., as }n\to+\infty,\qquad(20)$$

where

$$(a_0,b_0)=(a_{0K,n},b_{0H,n})=\arg\inf_{(a,b)\in H_n(x,y)}d_k(\hat f_{(a,b)},f)\quad\text{and}\quad(a_1,b_1)=(a_{1K,n},b_{1H,n})=\arg\inf_{(a,b)\in H_n(x,y)}LCV_{x,y}(a,b).$$

2.4 Discussion and applications

2.4.1 On the applicability of the method


It is well known that the estimation of the conditional probability density is an important tool for analyzing the input-output relation in nonparametric statistics. Such a nonparametric model provides a broad range of relevant information on the covariation between two random variables. Moreover, if a conditional density estimator is available, it is easy to make predictions via the conditional mode estimator, to derive prediction intervals, or to determine the probabilities of extreme values. Hence, the optimality of all these statistical procedures is closely linked to the construction of an optimal estimator of the conditional density. In order to emphasize the practical aspects of our study, we discuss in the rest of this section the applicability of our bandwidth selection approach to some nonparametric models, frequently used in practice, for which bandwidth selection is essential to obtain their best properties.


The conditional mode estimation: Often, the prediction of the response variable given an explanatory one is obtained by estimating the conditional expectation. However, the latter may not be sufficiently informative when the conditional distribution is multimodal or has a highly skewed profile with heteroscedastic noise. In such cases, it is more informative to estimate the conditional distribution itself, and a pertinent predictor can be obtained by estimating the conditional mode or the conditional median. In this paragraph, we focus on the problem of bandwidth selection in the kernel estimation of the conditional mode. First, we recall that a natural kernel estimator $\hat\theta$ of the conditional mode is derived from the conditional density estimator as follows:

$$\hat\theta(a,b,x)=\arg\sup_{y\in\mathbb R}\hat f_{(a,b)}(x,y)\qquad(21)$$

where $\hat f_{(a,b)}(x,y)$ is given in (2). Clearly, the behavior of the conditional mode estimator depends on the choice of the two smoothing parameters $a$ and $b$. In this prediction context, a naive $L^2$-criterion is given by:

$$(a_{opt},b_{opt})=\arg\min_{a,b}\Big\{\frac1n\sum_{i=1}^{n}\big(Y_i-\hat\theta^{-i}(a,b,X_i)\big)^2\Big\},$$

where

$$\hat\theta^{-i}(a,b,X_i)=\arg\sup_{y\in\mathbb R}\hat f^{-i}_{(a,b)}(X_i,y),$$

with $\hat f^{-i}_{(a,b)}(X_i,y)$ the leave-one-curve-out estimator defined by (8). This selection method has been used by De Gooijer and Gannoun (2000) in the multivariate case and by Ferraty and Vieu (2006) in the functional setting. Although this selection method is well suited to several practical situations, to the best of our knowledge its asymptotic optimality has not been addressed so far. Moreover, this selection procedure runs into serious problems if the estimator $\hat\theta$ is not unique. A reasonable way to overcome this problem is to use our bandwidth selection procedure, computing the conditional mode estimator as follows:

$$\hat\theta(a_1,b_1,x)=\arg\sup_{y\in\mathbb R}\hat f_{(a_1,b_1)}(x,y),$$

where $(a_1,b_1)$ is defined in Theorem 2.3.1. As with the previous criterion, the present procedure performs well in practice (cf. Section 2.4.3), but its asymptotic optimality remains an open question. The choice of the smoothing parameter in conditional mode estimation is thus one of the natural prospects of the present work.
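On a grid of $y$ values, the arg sup in (21) reduces to a search over grid points, which also gives one simple (and arbitrary) way of making the estimate unique: keep the first maximizer. The sketch below assumes quadratic kernels, a user-supplied semi-metric `d`, and a toy data set with scalar "curves"; the bandwidths passed in stand for $(a_1,b_1)$ selected by GCV and are hypothetical here.

```python
def kernel_K(u):
    """Quadratic kernel supported on [0, 1]."""
    return 1.5 * (1.0 - u * u) if 0.0 <= u <= 1.0 else 0.0


def kernel_H(u):
    """Epanechnikov kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if -1.0 <= u <= 1.0 else 0.0


def cond_density(x, y, X, Y, a, b, d):
    """Kernel estimator (2) of f(x, y) with semi-metric d."""
    num = den = 0.0
    for xi, yi in zip(X, Y):
        w = kernel_K(d(x, xi) / a)
        num += w * kernel_H((y - yi) / b)
        den += w
    return num / (b * den) if den > 0.0 else 0.0


def conditional_mode(x, X, Y, a, b, d, y_grid):
    """Grid version of (21): max() returns the first grid point where the estimate is maximal."""
    return max(y_grid, key=lambda y: cond_density(x, y, X, Y, a, b, d))


# Illustrative call: scalar "curves", d(u, v) = |u - v|, (a, b) standing in for (a_1, b_1).
X = [0.0, 0.1, 0.2, 0.3]
Y = [1.0, 1.1, 0.9, 3.0]
y_grid = [k / 100 for k in range(401)]
mode = conditional_mode(0.0, X, Y, 0.5, 0.5, lambda u, v: abs(u - v), y_grid)
```

The grid resolution bounds the precision of the predicted mode; a finer grid, or a local refinement around the selected point, trades computation for accuracy.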
The maximum conditional density predictive region: In many situations, the user may also be interested in the construction of a predictive interval (or region), since the latter is often more informative than a pointwise prediction. There are several ways to determine such regions (cf. for instance De Gooijer and Gannoun, 2000). In this paragraph we focus on the maximum conditional density predictive region (MCDR), or


highest conditional density region (HCDR), introduced by Hyndman (1995). This region is defined, for any given $\alpha\in(0,1)$, by:

$$R_\alpha=\{y:|y|<\infty,\ f(x,y)\ge l_\alpha(x)\}$$

where

$$l_\alpha(x)=\max\Big\{l>0:\int_{f(x,y)\ge l}f(x,y)\,dy\ge\alpha\Big\}.$$

Recall that the MCDR has the smallest Lebesgue measure among all the predictive regions with the same coverage probability (cf. De Gooijer and Gannoun, 2000). In the unconditional case, the estimation of the maximum density predictive region has been widely studied (cf. Samworth and Wand (2010) and the references therein). In our functional conditional context, we use the kernel estimator $\hat f_{(a,b)}$ of the conditional density $f$ to obtain a plug-in estimator of $R_\alpha$. However, as for all kernel estimators, the performance of this estimation depends heavily on the choice of the bandwidth parameters $(a,b)$. As mentioned before, many data-driven bandwidth selection methods have been proposed in the multivariate case. For instance, De Gooijer and Gannoun (2000) compare four selection methods based on the classical leave-one-out cross-validation procedure associated with the cumulative conditional distribution, the conditional mode, the conditional mean and the conditional median. At this stage, it seems more reasonable to use a cross-validation criterion on the conditional density instead of one on the predictors (the conditional mode, mean or median). In other words, the best approximation of the MCDR can be obtained by computing:
$$\hat R_\alpha=\Big\{y:|y|<\infty,\ \hat f_{(a_1,b_1)}(x,y)\ge\hat l_\alpha(x)\Big\}$$

where

$$\hat l_\alpha(x)=\max\Big\{l>0:\int\mathbb 1_{\{\hat f_{(a_1,b_1)}(x,y)\ge l\}}\,\hat f_{(a_1,b_1)}(x,y)\,dy\ge\alpha\Big\}.$$

As for conditional mode estimation, the asymptotic optimality of this selection procedure is also an important prospect of this work. Finally, let us note that, in the real unconditional framework, Samworth and Wand (2010) show the asymptotic optimality of a selection method based on the minimization of the probability of the symmetric difference between $R_\alpha$ and its estimator $\hat R_\alpha$. The adaptation of these ideas to the functional conditional case is another important prospect of this work.
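A plug-in version of $\hat R_\alpha$ can be sketched on a uniform grid of $y$ values. Given the values `dens` of $\hat f_{(a_1,b_1)}(x,\cdot)$ on the grid, $\hat l_\alpha(x)$ is approximated by sorting these values in decreasing order and accumulating mass until it reaches $\alpha$; the region is returned as a set of grid points (a union of intervals for multimodal densities). The grid, the Riemann approximation and the $N(0,1)$ check at the end are our own illustrative choices; for the standard normal density the 95% region is approximately $[-1.96,1.96]$.

```python
import math


def hdr_threshold(dens, step, alpha):
    """Plug-in l_alpha: the largest density level whose super-level set carries mass >= alpha."""
    mass = 0.0
    for v in sorted(dens, reverse=True):
        mass += v * step              # Riemann approximation of the integral over {f >= v}
        if mass >= alpha:
            return v
    return 0.0                        # grid too narrow to reach alpha: keep the whole grid


def hdr_region(y_grid, dens, alpha):
    """Grid points y with estimated density >= l_alpha, together with the level itself."""
    step = y_grid[1] - y_grid[0]
    level = hdr_threshold(dens, step, alpha)
    return [y for y, v in zip(y_grid, dens) if v >= level], level


# Check on a known density: N(0, 1).
y_grid = [-6.0 + 0.001 * k for k in range(12001)]
dens = [math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi) for y in y_grid]
region, level = hdr_region(y_grid, dens, 0.95)
```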
The conditional expected shortfall estimation: The expected shortfall (ES) has recently become one of the most common risk measures in finance. This model was introduced by Acerbi (2002), for a given level $\alpha\in(0,1)$, by:

$$ES_\alpha=\frac{1}{\alpha}\int_{VaR_\alpha}^{+\infty}t\,f_Y(t)\,dt$$

where $VaR_\alpha=F_Y^{-1}(\alpha)$, and $f_Y$ (respectively $F_Y$) denotes the density (respectively the cumulative distribution function) of the random variable $Y$ representing the returns on a given portfolio,

stock, bond or market index. In many situations, we have to analyze the financial risk conditionally on an exogenous variable which is continuously observed. To do that, we use the conditional expected shortfall, for which the expectation is taken with respect to the conditional distribution of $Y$ given this exogenous variable $X$, and the VaR is the conditional quantile of order $\alpha$ of $Y$ given $X$. An accurate estimation of this conditional model depends crucially on the method used to estimate the conditional density. Thus, if the kernel method is used to estimate the conditional density function, the best approximation of the conditional expected shortfall (CES) is given by:

$$\widehat{CES}_\alpha(x)=\frac{1}{\alpha}\int_{\widehat{VaR}_\alpha(x)}^{+\infty}y\,\hat f_{(a_1,b_1)}(x,y)\,dy$$

where $\widehat{VaR}_\alpha(x)$ is the $\alpha$-th conditional quantile estimator. As in the previous case, the asymptotic optimality of this approximation is another important prospect of this work.
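Once $\hat f_{(a_1,b_1)}(x,\cdot)$ has been evaluated on a uniform grid, the conditional quantile and the tail integral can be approximated by the trapezoidal rule, as in the sketch below. Because conventions for the normalizing constant of the expected shortfall vary, the sketch returns the tail mean $\int_{q}^{+\infty}y\,\hat f\,dy\big/\int_{q}^{+\infty}\hat f\,dy$ beyond the estimated $\alpha$-quantile $q$; this normalization by the estimated tail mass, the grid and the $N(0,1)$ check are our own assumptions. For $N(0,1)$ and $\alpha=0.95$, the quantile is about 1.645 and the tail mean about 2.063.

```python
import math


def quantile_and_tail_mean(y_grid, dens, alpha):
    """Return (q, m): the alpha-quantile of the gridded density and the mean of Y beyond q."""
    step = y_grid[1] - y_grid[0]
    cdf = [0.0]
    for k in range(1, len(y_grid)):
        cdf.append(cdf[-1] + 0.5 * step * (dens[k - 1] + dens[k]))   # trapezoidal CDF
    total = cdf[-1]                                                  # corrects grid truncation
    q_idx = next(k for k, c in enumerate(cdf) if c >= alpha * total)
    tail_mass = total - cdf[q_idx]
    tail_moment = sum(0.5 * step * (y_grid[k - 1] * dens[k - 1] + y_grid[k] * dens[k])
                      for k in range(q_idx + 1, len(y_grid)))
    return y_grid[q_idx], tail_moment / tail_mass


# Check on a known density: N(0, 1) at level alpha = 0.95.
y_grid = [-8.0 + 0.001 * k for k in range(16001)]
dens = [math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi) for y in y_grid]
q, es = quantile_and_tail_mean(y_grid, dens, 0.95)
```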

2.4.2 On the finite-sample performance of the method

The main purpose of this section is to show how our approach can be implemented easily and rapidly on finite samples. Our first aim is to evaluate the performance of the global smoothing parameter selection method, and the second one is to compare its behavior with that of the local one. To this end, we consider the following functional nonparametric model:

$$Y_i=r(X_i)+\varepsilon_i,\qquad i=1,\dots,n\qquad(22)$$

where the $\varepsilon_i$'s are generated independently according to a $N(0,1)$ distribution. The functional explanatory variables $X_i$, $i=1,\dots,n$, which are assumed to be independent of the $\varepsilon_i$'s, are generated according to:

$$X_i(t)=a_i\sin\big(4(b_i-t)\big)+b_i+\varepsilon_{i,t},\qquad t\in[0,1],\ i=1,2,\dots,n,$$

where $b_i$ (respectively $\varepsilon_{i,t}$) is $N(0,3)$ (respectively $N(0,0.5)$), while the $n$ random variables $a_i$ are generated according to $N(4,3)$. All the curves $X_i$ are discretized on the same grid of 100 equispaced measurements in $(0,1)$ (cf. Figure 2.1).
On the other hand, the scalar response $Y_i$ defined by (22) is computed with the operator:

$$r(x)=\int_0^1\frac{dt}{1+|x(t)|}.$$

With this model definition, the conditional density of $Y$ given $X=x$ is explicitly given by:

$$f(x,y)=\frac{1}{\sqrt{2\pi}}\exp\Big(-\frac12\big(y-r(x)\big)^2\Big).$$
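The simulated model can be reproduced with the sketch below, under assumptions that are ours: the second parameter of each $N(\cdot,\cdot)$ above is read as a variance, the 100 grid points include the endpoints 0 and 1, the curves use the reading $\sin(4(b_i-t))$ of the generating formula, and $r$ is computed with the trapezoidal rule. The function `true_density` gives the exact $f(x,y)$ against which the errors $d_2$ and RSS can be computed.

```python
import math
import random

T = [k / 99 for k in range(100)]        # 100 equispaced points; including 0 and 1 is our choice


def r_operator(curve):
    """r(x) = int_0^1 dt / (1 + |x(t)|), approximated by the trapezoidal rule on T."""
    vals = [1.0 / (1.0 + abs(v)) for v in curve]
    step = T[1] - T[0]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def true_density(curve, y):
    """Exact conditional density f(x, y) of model (22) with N(0, 1) errors."""
    return math.exp(-0.5 * (y - r_operator(curve)) ** 2) / math.sqrt(2.0 * math.pi)


def simulate(n, seed=0):
    """Draw (X_i, Y_i); the second argument of N(., .) is read as a variance (assumption)."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        a = rng.gauss(4.0, math.sqrt(3.0))
        b = rng.gauss(0.0, math.sqrt(3.0))
        curve = [a * math.sin(4.0 * (b - t)) + b + rng.gauss(0.0, math.sqrt(0.5)) for t in T]
        X.append(curve)
        Y.append(r_operator(curve) + rng.gauss(0.0, 1.0))
    return X, Y


X, Y = simulate(200)
```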
In order to check the efficiency of this global selection method on finite samples, we compare the averaged squared error ($d_2$) of the global cross-validation bandwidths $(a_{GCV},b_{GCV})$ with that given by the bandwidths $(a_{Gd_2},b_{Gd_2})$ which minimize the global averaged squared error. The use of $d_2$ as the accuracy criterion is motivated by the fact that the averaged squared error is easier to handle computationally and is asymptotically equivalent to $d_1$ and $d_3$ (cf. the proof of Lemma 2.5.3).

Figure 2.1: A sample of 200 irregular curves (horizontal axis: Time).
For practical purposes, we select the parameters $(a_{GCV},b_{GCV})$ and $(a_{Gd_2},b_{Gd_2})$ over a finite set $H_n$ of pairs $(a_q,b_q)$, where $a_q$ (respectively $b_q$) is the quantile of order $q$ of the vector of all distances between the curves (respectively between the values of the response variable). The weight functions $W_1$ and $W_2$ are introduced to reduce boundary effects through their supports; their exact expressions are not very decisive in practice, as pointed out by Härdle and Marron (1985). In our simulation study, we take

$$W_1(t)=\begin{cases}1&\text{if }\min_{i=1,\dots,n}d(t,X_i)<a,\\0&\text{otherwise,}\end{cases}\qquad W_2(z)=\begin{cases}1&\text{if }z\in\big[0.9\min\{Y_i,\,i=1,\dots,n\},\ 1.1\max\{Y_i,\,i=1,\dots,n\}\big],\\0&\text{otherwise.}\end{cases}$$
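The finite grid $H_n$ and the weight functions of this simulation can be sketched as follows. The nearest-rank quantile convention and the list of orders $q$ are assumptions (the text does not specify them), and `w2_indicator` implements the interval $[0.9\min Y_i,\,1.1\max Y_i]$ literally; note that this interval widens the range of the responses only when they are positive.

```python
import math
from itertools import combinations


def nearest_rank_quantile(values, q):
    """Quantile of order q in (0, 1] by the nearest-rank rule (one convention among several)."""
    s = sorted(values)
    return s[max(0, math.ceil(q * len(s)) - 1)]


def bandwidth_grid(curve_dists, Y, orders):
    """H_n = {(a_q, b_q)}: order-q quantiles of the curve distances and of the response gaps."""
    y_gaps = [abs(u - v) for u, v in combinations(Y, 2)]
    return [(nearest_rank_quantile(curve_dists, q), nearest_rank_quantile(y_gaps, q))
            for q in orders]


def w1_indicator(dists_to_sample, a):
    """W_1(t) = 1 if min_i d(t, X_i) < a, else 0, given the distances d(t, X_i)."""
    return 1.0 if min(dists_to_sample) < a else 0.0


def w2_indicator(Y):
    """W_2 as the indicator of [0.9 min Y_i, 1.1 max Y_i], as written in the text."""
    lo, hi = 0.9 * min(Y), 1.1 * max(Y)
    return lambda z: 1.0 if lo <= z <= hi else 0.0
```

Pairing $a_q$ and $b_q$ with the same order $q$ follows the description of $H_n$ above; a product grid over two sets of orders would be an equally simple variant.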

The kernel $K$ (respectively $H$) is chosen to be quadratic on $(0,1)$ (respectively on $(-1,1)$). Another important point for ensuring a good behavior of the method is the use of a semi-metric well adapted to the kind of data at hand. Here, we use a semi-metric based on the $q$ first eigenfunctions of the empirical covariance operator associated with the $q$ greatest eigenvalues (cf. Ferraty and Vieu (2006) for more discussion). This choice is motivated by the shape of the curves $X$ (cf. Figure 2.1). Similarly to Benhenni et al. (2007), we take $q=3$. The results for the five selected sample sizes, $n\in\{50,100,150,200,250\}$, are gathered in Table 2.1 (second column: $d_2(\hat f_{(a_{GCV},b_{GCV})},f)$; third column: $d_2(\hat f_{(a_{Gd_2},b_{Gd_2})},f)$; fourth column: the ratio of the second to the third).

n      d2 (GCV)   d2 (Gd2)   ratio
50     0.1183     0.0593     1.9949
100    0.0850     0.04284    1.9859
150    0.0774     0.0392     1.9744
200    0.0395     0.03333    1.1851
250    0.0262     0.02361    1.1096

Table 2.1: Averaged squared errors according to various values of n.
Table 2.1 shows the good behavior of our functional procedure, in the sense that the $d_2$ errors computed with the smoothing parameters given by our bandwidth selector are close to the minimum of the true errors. This is illustrated in the last column by the ratio $d_2(\hat f_{(a_{GCV},b_{GCV})},f)/d_2(\hat f_{(a_{Gd_2},b_{Gd_2})},f)$, which decreases from about 1.99 for $n=50$ to about 1.11 for $n=250$: our method performs reasonably well even for a small sample size such as $n=50$, and its accuracy improves as $n$ grows. It should also be noted that our automatic selection method computes the results quickly, even for the larger sample sizes $n\in\{100,200,250\}$.
Now, we compare this global procedure to the local one. For this purpose, we consider a sample of 150 observations of the couple (X, Y), randomly split into two subsets (100 observations as the learning sample and 50 as the test sample), and we construct the kernel estimator of the conditional density for each curve in the test sample at the 100 equispaced points $y_j$, $j=1,\ldots,100$, in the interval $[0.9\times\min\{Y_i,\ i=1,\ldots,n\},\ 1.1\times\max\{Y_i,\ i=1,\ldots,n\}]$, using both smoothing parameter selectors (local and global). The local bandwidths $(a^{LCV},b^{LCV})$ are selected over $H_n(x,y)$, the set of pairs $(a(x),b(y))$ such that the ball centered at $x$ (respectively, the interval centered at $y$) with radius $a(x)$ (respectively, with radius $b(y)$) contains exactly $k$ neighbors of $x$ (respectively, of $y$). Further, we use the following local weight functions:

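$$W_{1,x}(t)=\begin{cases}1 & \text{if } d(t,x)<a(x)\\ 0 & \text{otherwise}\end{cases}\qquad\text{and}\qquad W_{2,y}(z)=\begin{cases}1 & \text{if } |z-y|<b(y)\\ 0 & \text{otherwise}\end{cases}$$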
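A minimal Python sketch of the k-nearest-neighbor bandwidths and of the indicator weights $W_{1,x}$ and $W_{2,y}$ follows. It assumes discretized curves and uses the Euclidean norm of the discretized difference as a stand-in for the PCA-based semi-metric of the text. The radius is placed halfway between the k-th and (k+1)-th nearest distances, so that exactly k neighbors satisfy the strict inequality. These choices and the function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def knn_radius(dist, k):
    """Radius r such that exactly k of the given distances satisfy dist < r.

    Assumes no tie at the k-th distance; the radius is taken halfway between
    the k-th and (k+1)-th smallest distances (an illustrative choice).
    """
    s = np.sort(np.asarray(dist, dtype=float))
    if not 1 <= k < s.size:
        raise ValueError("need 1 <= k < number of observations")
    return 0.5 * (s[k - 1] + s[k])

def indicator_weight(dist, radius):
    """W(t) = 1 if dist(t) < radius and 0 otherwise (vectorized)."""
    return (np.asarray(dist, dtype=float) < radius).astype(float)

rng = np.random.default_rng(0)
curves = rng.normal(size=(100, 50))        # toy discretized learning curves X_i
x = rng.normal(size=50)                    # one test curve
d_x = np.linalg.norm(curves - x, axis=1)   # stand-in for d(X_i, x)
a_x = knn_radius(d_x, k=10)
W1 = indicator_weight(d_x, a_x)            # W_{1,x}(X_i)

Y = rng.normal(size=100)
y = 0.0
b_y = knn_radius(np.abs(Y - y), k=10)
W2 = indicator_weight(np.abs(Y - y), b_y)  # W_{2,y}(Y_i)
```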

Figure 2.2: Comparison of RSS errors (box-plots of RSS(local) and RSS(global)).

We choose the global bandwidths over the same set $H_n$ defined above, and we use the same functions $W_1$ and $W_2$. Moreover, we consider the same semi-metric and the same kernels $K$ and $H$ as in the first illustration. Then, we examine the accuracy of our conditional density estimates, viewed as a collection of slices, by using the generalized sum of squared residuals (RSS) defined by:

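$$\mathrm{RSS(local)}=\frac{1}{500}\sum_{i=1}^{50}\sum_{j=1}^{100}\Big[f(x_i,y_j)-\widehat f_{(a^{LCV},b^{LCV})}(x_i,y_j)\Big]^2$$

and

$$\mathrm{RSS(global)}=\frac{1}{500}\sum_{i=1}^{50}\sum_{j=1}^{100}\Big[f(x_i,y_j)-\widehat f_{(a^{GCV},b^{GCV})}(x_i,y_j)\Big]^2$$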

We have carried out 25 tests by changing the observations between the learning and test samples. In Figure 2.2, we plot the box-plots of the resulting RSS errors in both cases. It clearly appears that the local bandwidth choice outperforms the global selection method.


2.4.3 A real data application


In this section, we test the efficiency of our procedure for conditional mode estimation, as discussed in Section 4.1. More precisely, we compare our local bandwidth selection procedure with the local one used by Ferraty and Vieu (2006). For this purpose, we consider the prediction problem in which we are interested in the logarithm of the total precipitation given the monthly maximum temperatures, via conditional mode estimation, using both selection methods.
The data used for this study come from the US National Climatic Data Center, are available on the Web⁴, and were collected at 98 climatic stations in the USA from 2004 until 2010. With the notation of the previous section, the functional predictor $X_i$ is the monthly maximum temperature at the $i$th climatic station from 2004 until 2010 (in tenths of degrees F), and $Y_i$ is the logarithm of the total precipitation (in hundredths of inches) at the same station over the same period. The functional predictors $X_i$, $i=1,\ldots,98$, are plotted in Figure 2.3.
The practical use of our selection method for conditional mode estimation has been discussed in the previous section, and for this study we keep the same arguments. In other words, we keep the same weight functions $W_{1,x}$ and $W_{2,y}$, the same subsets $H_n(x,y)$ and the same kernels $K$ and $H$. According to the shape of our curves, we use the semi-metric based on functional principal component analysis (cf. also Besse et al., 1997) with $q=8$. We split our data into two subsets. The first sample, of size $n=80$, is the learning sample, and it is used to compute our conditional mode estimators at the 18 remaining curves (the test sample). The conditional mode estimation based on the LCV rule (LCDE-Method, say) is then defined as follows: for each curve $x_i$ in the test sample,

$$\widehat\theta(a^{LCV},b^{LCV},x_i)=\arg\sup_{y\in\mathbb R}\widehat f_{(a^{LCV},b^{LCV})}(x_i,y).$$

For Ferraty and Vieu's method (F-V-Method, say), we use the R routine named funopare.mode.lcv⁵. Recall that the parameters $(a,b)$ in this R routine are locally chosen over the same type of set $H_n(x,y)$, as follows: for each curve $x_i$ in the test sample,

$$\widehat\theta(a_i^{FV},b_i^{FV},x_i)=\arg\sup_{y\in\mathbb R}\widehat f_{(a_i^{FV},b_i^{FV})}(x_i,y),$$

where

$$x_{i^*}=\arg\min_{x_j\in\text{learning sample}}d(x_i,x_j)\qquad\text{and}\qquad(a_i^{FV},b_i^{FV})=\arg\min_{(a,b)\in H_n(x_{i^*},y)}\big|Y_{i^*}-\widehat\theta(a,b,x_{i^*})\big|.$$

4. At the ftp address: ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2/monthly
5. Available at the website www.lsp.ups-tlse.fr/staph/npfda
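The arg sup defining the conditional mode can be approximated on a grid of $y$ values. The sketch below assumes the quotient form $\widehat f_{(a,b)}=\widehat f_N/\widehat f_D$ of (24), in which the normalizing expectations cancel. It uses hypothetical kernels: $K(u)=1-u^2$ on $[0,1]$ for distances and a Gaussian $H$. It takes the distances $d(x,X_i)$ as input, so that any semi-metric can be plugged in. Neither bandwidth selection loop (LCV or funopare.mode.lcv) is reproduced here.

```python
import numpy as np

def K(u):
    # Hypothetical kernel for distances: K(u) = 1 - u^2 on [0, 1], 0 elsewhere.
    u = np.asarray(u, dtype=float)
    return np.where((u >= 0) & (u <= 1), 1.0 - u**2, 0.0)

def H(t):
    # Hypothetical kernel for the response: standard Gaussian density.
    return np.exp(-0.5 * np.asarray(t, dtype=float) ** 2) / np.sqrt(2 * np.pi)

def conditional_density(dist_x, Y, a, b, y_grid):
    """f_hat(x, y) = sum_i K(d(x,X_i)/a) H((y-Y_i)/b) / (b sum_i K(d(x,X_i)/a))."""
    w = K(np.asarray(dist_x, dtype=float) / a)
    if w.sum() == 0:
        return np.zeros_like(y_grid, dtype=float)          # convention 0/0 = 0
    Hy = H((y_grid[:, None] - np.asarray(Y, dtype=float)[None, :]) / b)
    return Hy @ w / (b * w.sum())

def conditional_mode(dist_x, Y, a, b, y_grid):
    """Grid approximation of arg sup_y f_hat(x, y)."""
    dens = conditional_density(dist_x, Y, a, b, y_grid)
    return float(y_grid[np.argmax(dens)])

# Toy example: the three curves closest to x have responses near 1.
dist_x = np.array([0.1, 0.2, 0.3, 2.0, 3.0])
Y = np.array([1.0, 1.1, 0.9, 5.0, 5.0])
y_grid = np.linspace(0.0, 6.0, 6001)
dens = conditional_density(dist_x, Y, a=1.0, b=0.2, y_grid=y_grid)
mode = conditional_mode(dist_x, Y, a=1.0, b=0.2, y_grid=y_grid)
```

In a selection loop, one would call `conditional_mode` for each candidate $(a,b)$ in $H_n(x,y)$ and keep the pair that optimizes the chosen criterion.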

Figure 2.3: Monthly maximum temperatures in 98 climatic stations in the USA (curves plotted against Time).

Figure 2.4: Comparison of the prediction results between the F-V-Method and the LCDE-Method (scatterplots of Estimates against Observations; panels: LCDE-Method, MSE = 0.35, and FV-Method, MSE = 0.47).
The performance of both selection rules, in terms of prediction, is evaluated by computing the mean squared prediction errors (MSE), defined by:

$$\mathrm{MSE(LCDE)}=\frac{1}{18}\sum_{i\in I_1}\Big(Y_i-\widehat\theta(a^{LCV},b^{LCV},X_i)\Big)^2$$

and

$$\mathrm{MSE(FV)}=\frac{1}{18}\sum_{i\in I_1}\Big(Y_i-\widehat\theta(a_i^{FV},b_i^{FV},X_i)\Big)^2,$$

where $I_1$ is the set of indices of the test sample.
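Since $I_1$ contains the 18 test indices, the factor 1/18 makes each MSE the plain average of the squared prediction errors over the test sample. Here is a minimal Python sketch (the function name is ours):

```python
import numpy as np

def mse(y_obs, y_pred):
    """Mean squared prediction error over the test indices I_1."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_obs - y_pred) ** 2))
```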


Clearly, the comparison of both scatterplots (cf. Figure 2.4) indicates that the LCDE-Method gives better results than the F-V-Method. This is confirmed by the mean squared prediction errors MSE(LCDE) = 0.35 and MSE(FV) = 0.47. It is worth noting that the superiority of our local procedure over the local one used in Ferraty and Vieu (2006) can be explained as follows. For each curve $x_i$ in the test sample, our local procedure minimizes over all k-nearest-neighbor smoothing parameters. Ferraty and Vieu (2006), by contrast, consider only the k-nearest neighbors of the curve of the learning sample that is nearest to $x_i$.

2.5 Proofs

Recall that $C$ denotes a generic constant. As specified above, the technical parts that are similar to the finite-dimensional case, or to the techniques of regression operator estimation for functional data, are omitted. Thus, we encourage readers who are interested in these proofs to keep at hand the standard finite-dimensional literature (that is, the paper by Härdle and Marron, 1985) or the literature relative to the infinite-dimensional framework (that is, the papers by Rachdi and Vieu (2007) and Benhenni et al. (2007)).

Proof of Theorem 2.3.1.

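We consider the following decomposition, for all $(x,y)\in S_X\times S_Y$ and for all $(a,b)\in H_n$:

$$\big(\widehat f_{(a,b)}(x,y)-f(x,y)\big)^2=\big(\widehat f_N(x,y)-f(x,y)\widehat f_D(x)\big)^2+2\big(1-\widehat f_D(x)\big)\widehat f_D(x)\big(\widehat f_{(a,b)}(x,y)-f(x,y)\big)^2+\big(1-\widehat f_D(x)\big)^2\big(\widehat f_{(a,b)}(x,y)-f(x,y)\big)^2$$

where

$$\widehat f_D(x)=\frac{1}{n\,\mathbb E[K(a^{-1}d(x,X))]}\sum_{i=1}^nK(a^{-1}d(x,X_i))\tag{23}$$

and

$$\widehat f_N(x,y)=\frac{1}{nb\,\mathbb E[K(a^{-1}d(x,X))]}\sum_{i=1}^nK(a^{-1}d(x,X_i))H(b^{-1}(y-Y_i))=\widehat f_D(x)\widehat f_{(a,b)}(x,y)\tag{24}$$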
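The decomposition above is purely algebraic. Using $\widehat f_N=\widehat f_D\widehat f_{(a,b)}$ from (24), its right-hand side equals $\big(\widehat f_D+1-\widehat f_D\big)^2\big(\widehat f_{(a,b)}-f\big)^2$. The Python sketch below checks this numerically on arbitrary random values (the variable names are ours).

```python
import numpy as np

def decomposition_sides(f_hat, f, f_D):
    """Both sides of the decomposition of (f_hat - f)^2 used in the proof."""
    f_N = f_D * f_hat                        # identity (24): f_N = f_D * f_hat
    e2 = (f_hat - f) ** 2
    lhs = e2
    rhs = (f_N - f * f_D) ** 2 + 2 * (1 - f_D) * f_D * e2 + (1 - f_D) ** 2 * e2
    return lhs, rhs

rng = np.random.default_rng(1)
f_hat = rng.normal(size=1000)
f = rng.normal(size=1000)
f_D = rng.uniform(0.5, 1.5, size=1000)
lhs, rhs = decomposition_sides(f_hat, f, f_D)
```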

It follows from the uniform consistency⁶ of $\widehat f_D(x)$ to $\mathbb E[\widehat f_D(x)]=1$ (cf. Ferraty et al., 2010) that:

$$\forall k=1,2,3,\quad d_k\big(\widehat f_{(a,b)}(x,y),f(x,y)\big)=d_k\big(\widehat f_N(x,y),f(x,y)\widehat f_D(x)\big)+o_{a.s.}\Big(d_k\big(\widehat f_{(a,b)}(x,y),f(x,y)\big)\Big).$$

6. In Ferraty et al. (2010), the main aim is to state the rate of the uniform almost-complete convergence of the functional component. Such a result can easily be extended here (without specifying the convergence rate) to $\sup_{a\in H_n}$ by using the second part of assumption (15).

It then suffices to show the claimed result for $\tilde d_k$ defined by:

$$\tilde d_k(\widehat f_{(a,b)},f)=d_k\big(\widehat f_N(x,y),f(x,y)\widehat f_D(x)\big),\quad\text{for all }k=1,2,3.$$

Notice that, by following the same steps as those used for proving Lemmas 5, 6 and 7 in Rachdi and Vieu (2007), and by setting:

$$\widetilde K(x,X_i)=\frac{1}{b\,\mathbb E[K(a^{-1}d(x,X))]}\Big(K(a^{-1}d(x,X_i))H(b^{-1}(y-Y_i))-b\,f(x,y)K(a^{-1}d(x,X_i))\Big),$$

we obtain that:

$$\sup_{(a,b)\in H_n}\left|\frac{\tilde d_k(\widehat f_{(a,b)},f)-\tilde d_l(\widehat f_{(a,b)},f)}{\tilde d_3(\widehat f_{(a,b)},f)}\right|\longrightarrow0,\quad a.s.,\ \text{for all }k\neq l.$$

On the other hand, we introduce the error measure $d_5$ defined by:

$$d_5(\widehat f_{(a,b)},f)=\frac1n\sum_{i=1}^n\Big(\widehat f^{\,i}_{(a,b)}(X_i,Y_i)-f(X_i,Y_i)\Big)^2\frac{W_1(X_i)W_2(Y_i)}{f(X_i,Y_i)}$$

where

$$\widehat f^{\,i}_{(a,b)}(X_i,Y_i)=\frac{\widehat f^{\,i}_N(X_i,Y_i)}{\widehat f^{\,i}_D(X_i)},$$

with

$$\widehat f^{\,i}_N(x,y)=\frac{1}{(n-1)b\,\mathbb E[K(a^{-1}d(x,X))]}\sum_{j\neq i}K(a^{-1}d(x,X_j))H(b^{-1}(y-Y_j))$$

and

$$\widehat f^{\,i}_D(x)=\frac{1}{(n-1)\,\mathbb E[K(a^{-1}d(x,X))]}\sum_{j\neq i}K\big(a^{-1}d(x,X_j)\big).$$

Remark that:

$$GCV(a,b)-d_5(\widehat f_{(a,b)},f)=CT(a,b)+T$$

where

$$CT(a,b)=\frac1n\sum_{i=1}^nW_1(X_i)\int\big(\widehat f^{\,i}_{(a,b)}(X_i,y)\big)^2W_2(y)\,dy-\frac1n\sum_{i=1}^n\big(\widehat f^{\,i}_{(a,b)}(X_i,Y_i)\big)^2\frac{W_1(X_i)W_2(Y_i)}{f(X_i,Y_i)}$$

and

$$T=-\frac1n\sum_{i=1}^nf(X_i,Y_i)W_1(X_i)W_2(Y_i).$$


Thus, the proof of this theorem is complete if we can prove that $d_5$ is asymptotically equivalent to $d_3$ and that:

$$\sup_{(a,b)\in H_n}\left|\frac{CT(a,b)}{d_3(\widehat f_{(a,b)},f)}\right|\longrightarrow0,\quad a.s.,\ \text{as }n\to+\infty.$$

This is why the proof of Theorem 2.3.1 is achieved through the following Lemmas 2.5.1, 2.5.2 and 2.5.3, whose proofs are given in the Appendix (cf. Section 2.6).

Lemma 2.5.1. Under hypotheses (10), (11), (12) and (14), we have that:

$$d_3(\widehat f_{(a,b)},f)\geq C\,\frac{1}{nb\,\phi(a)}.$$

Lemma 2.5.2. Under hypotheses (10), (11), (12), (14) and (17), we obtain that:

$$\sup_{(a,b)\in H_n}\big|nb\,\phi(a)\,CT(a,b)\big|\longrightarrow0,\quad a.s.,\ \text{as }n\to+\infty.$$

Lemma 2.5.3. Under hypotheses (10), (11), (12), (14) and (17), we have that:

$$\sup_{(a,b)\in H_n}\left|\frac{d_5(\widehat f_{(a,b)},f)-d_3(\widehat f_{(a,b)},f)}{d_3(\widehat f_{(a,b)},f)}\right|\longrightarrow0,\quad a.s.,\ \text{as }n\to+\infty.$$

Proof of Theorem 2.3.2. The main ideas of the proof are essentially contained in Rachdi and Vieu (2007) and, in the same way, in the proof of Theorem 2.3.1 above. The computations here are more complicated, however, since we must also deal with non-constant weight functions and functional data. The reader should have the above-mentioned papers at hand in order to follow all the details of this proof. To make things easier, note that, by hypothesis (9), the weight function is bounded and has a compact support with nonempty interior (the closure of B(x, w)). Thus, hypothesis (5) in Rachdi and Vieu (2007) is satisfied, and the proof of this theorem follows the same steps as the proof of Theorem 2.3.1; it is therefore omitted here. We just mention that the first step of the proof consists in showing the result over a finite subset of $H_n$. In the second step, the result is extended to the continuous set of bandwidths $H_n$ by using the Hölder continuity property of the functions $K$ and $f$ (cf. Vieu (1991) and Youndjé et al. (1993)).

2.6 Appendix: Proofs of technical lemmas

In what follows, we denote, for all $i=1,\ldots,n$:

$$K_i(x)=K(a^{-1}d(x,X_i))\qquad\text{and}\qquad H_i(y)=H(b^{-1}(y-Y_i)).$$

Proof of Lemma 2.5.1. It is clear that:

$$d_3(\widehat f_{(a,b)},f)\geq\int\!\!\int\mathrm{Var}\big[\widehat f_{(a,b)}(x,y)\big]W_1(x)W_2(y)\,dP_X(x)\,dy.$$

Therefore, it is sufficient to evaluate the variance term $\mathrm{Var}[\widehat f_{(a,b)}(x,y)]$. To do that, we use computational techniques similar to those in Laksaci (2007) and take into account the fact that $\mathbb E[\widehat f_D(x)]=1$. We get that:

$$\mathrm{Var}\big[\widehat f_{(a,b)}(x,y)\big]=\mathrm{Var}\big[\widehat f_N(x,y)\big]-2\,\mathbb E\big(\widehat f_N(x,y)\big)\,\mathrm{Cov}\big(\widehat f_N(x,y),\widehat f_D(x)\big)+\big(\mathbb E(\widehat f_N(x,y))\big)^2\,\mathrm{Var}\big(\widehat f_D(x)\big)+o\Big(\frac{1}{nb\,\phi(a)}\Big)\tag{25}$$
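Expansion (25) is the usual first-order (delta-method) variance of a ratio whose denominator has mean one. The Monte Carlo sketch below uses hypothetical i.i.d. terms standing in for $K_i/\mathbb E[K]$ and $K_iH_i/(b\,\mathbb E[K])$. It compares the simulated variance of $N/D$ with $\mathrm{Var}(N)-2\,\mathbb E(N)\,\mathrm{Cov}(N,D)+(\mathbb E N)^2\mathrm{Var}(D)$. The distributions and sizes are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 10000
V = rng.uniform(0.5, 1.5, size=(reps, n))       # stands in for K_i / E[K]; mean 1
U = V * rng.normal(2.0, 1.0, size=(reps, n))    # stands in for K_i H_i / (b E[K])
N = U.mean(axis=1)                              # analogue of f_N
D = V.mean(axis=1)                              # analogue of f_D, with E[D] = 1
mc_var = np.var(N / D)                          # simulated Var(N / D)

# First-order terms of (25), built from the moments of the single terms.
u, v = U.ravel(), V.ravel()
cov_uv = np.mean((u - u.mean()) * (v - v.mean()))
var_N, var_D, cov_ND = u.var() / n, v.var() / n, cov_uv / n
delta_var = var_N - 2 * u.mean() * cov_ND + u.mean() ** 2 * var_D
```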

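According to the definitions (24) and (23) of the estimators $\widehat f_N$ and $\widehat f_D$, we obtain that:

$$\mathrm{Var}\big(\widehat f_N(x,y)\big)=\frac{1}{n\big(b\,\mathbb E[K(a^{-1}d(x,X))]\big)^2}\,\mathrm{Var}\big(K_1(x)H_1(y)\big),$$

$$\mathrm{Cov}\big(\widehat f_N(x,y),\widehat f_D(x)\big)=\frac{1}{nb\big(\mathbb E[K(a^{-1}d(x,X))]\big)^2}\,\mathrm{Cov}\big(K_1(x)H_1(y),K_1(x)\big)$$

and

$$\mathrm{Var}\big(\widehat f_D(x)\big)=\frac{1}{n\big(\mathbb E[K(a^{-1}d(x,X))]\big)^2}\,\mathrm{Var}\big(K_1(x)\big).$$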

Moreover, under (12) and after some simple calculations, we can show, for all $i,j=1,2$, that:

$$\mathbb E\big[K_1^i(x)H_1^j(y)\big]=b\,\mathbb E\big[K_1^i(x)f(X_1,y)\big]\int H^j(t)\,dt+o\big(b\,\mathbb E[K_1(x)]\big)\tag{26}$$

and

$$0<C\,\phi(a)\leq\mathbb E\big[K_1^i(x)\big]\leq C'\,\phi(a).\tag{27}$$

By comparing asymptotically the three quantities in (25), we can see that the first term is the leading one. Hence, from (26) and (27) we obtain that:

$$\mathrm{Var}\big[\widehat f_{(a,b)}(x,y)\big]=\frac{1}{nb\big(\mathbb E[K(a^{-1}d(x,X))]\big)^2}\,\mathbb E\big[K_1^2(x)f(X_1,y)\big]\int H^2(t)\,dt+o\Big(\frac{1}{nb\,\phi(a)}\Big).$$

Furthermore, by using the fact that the conditional density does not vanish on a neighborhood of $S_X\times S_Y$, we obtain that:

$$d_3(\widehat f_{(a,b)},f)\geq C\,\frac{1}{nb\,\phi(a)},$$

which completes the proof of this lemma.
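Proof of Lemma 2.5.2. From the definition of $CT(a,b)$, we have, for all $(a,b)\in H_n$:

$$|CT(a,b)|=\left|\frac1n\sum_{i=1}^n\int\big(\widehat f^{\,i}_{(a,b)}(X_i,y)\big)^2W_2(y)W_1(X_i)\,dy-\frac1n\sum_{i=1}^n\big(\widehat f^{\,i}_{(a,b)}(X_i,Y_i)\big)^2\frac{W_1(X_i)W_2(Y_i)}{f(X_i,Y_i)}\right|$$

$$=\left|\sum_{i=1}^n\frac{1}{n\big(\widehat f^{\,i}_D(X_i)\big)^2}\Big[\int\big(\widehat f^{\,i}_N(X_i,y)\big)^2W_2(y)W_1(X_i)\,dy-\big(\widehat f^{\,i}_N(X_i,Y_i)\big)^2\frac{W_1(X_i)W_2(Y_i)}{f(X_i,Y_i)}\Big]\right|.$$

Combined with the uniform consistency of $\widehat f^{\,i}_D(x)$ to 1 (cf. Ferraty et al., 2008), this shows that it is enough to prove that, as $n\to+\infty$:

$$\sup_{(a,b)\in H_n}b\,\phi(a)\left|\sum_{i=1}^n\Big[\int\big(\widehat f^{\,i}_N(X_i,y)\big)^2W_2(y)W_1(X_i)\,dy-\big(\widehat f^{\,i}_N(X_i,Y_i)\big)^2\frac{W_1(X_i)W_2(Y_i)}{f(X_i,Y_i)}\Big]\right|\longrightarrow0,\quad a.s.$$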

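To do this, one can use arguments similar to those in the real case (cf. Youndjé, 1996). Indeed, let, for all $1\leq i,j,k\leq n$:

$$a_{ij}=K_j^2(X_i)W_1(X_i),\qquad b_{ij}=\int H_j^2(y)W_2(y)\,dy-H_j^2(Y_i)\frac{W_2(Y_i)}{f(X_i,Y_i)},\qquad U_{ij}=a_{ij}b_{ij},$$

$$c_{ijk}=K_j(X_i)K_k(X_i)W_1(X_i),\qquad d_{ijk}=\int H_j(y)H_k(y)W_2(y)\,dy-H_j(Y_i)H_k(Y_i)\frac{W_2(Y_i)}{f(X_i,Y_i)},\qquad V_{ijk}=c_{ijk}d_{ijk}.$$

Now, we have to examine the following limits:

$$\sup_{(a,b)\in H_n}\frac{b\,\phi(a)}{(n-1)^2}\Big|\sum_{i\neq j}U_{ij}\Big|\longrightarrow0,\quad a.s.$$

and

$$\sup_{(a,b)\in H_n}\frac{b\,\phi(a)}{(n-1)^2}\Big|\sum_{i\neq j\neq k\neq i}V_{ijk}\Big|\longrightarrow0,\quad a.s.$$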
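The quantities $U_{ij}$ and $V_{ijk}$ split, for a fixed $i$, the bracket of the previous display into a diagonal part ($j=k$) and an off-diagonal part ($j\neq k$). The split comes from expanding $\big(\sum_{j\neq i}K_j(X_i)H_j(\cdot)\big)^2$, with the normalizing constants of $\widehat f^{\,i}_N$ dropped. The Python sketch below checks this identity for one fixed $i$ with arbitrary numbers, replacing the integral over $y$ by a Riemann sum on a grid; the array names are ours.

```python
import numpy as np

def bracket_and_split(K, Hgrid, HYi, W1, W2grid, W2Yi, fXiYi, dy):
    """For a fixed i: K[j] = K_j(X_i), Hgrid[j, g] = H_j(y_g), HYi[j] = H_j(Y_i), j != i."""
    S = K @ Hgrid                                  # sum_j K_j(X_i) H_j(y) on the grid
    bracket = W1 * np.sum(S**2 * W2grid) * dy - (K @ HYi) ** 2 * W1 * W2Yi / fXiYi
    G = (Hgrid * W2grid) @ Hgrid.T * dy            # G[j, k] = int H_j H_k W2 dy
    P = np.outer(HYi, HYi) * W2Yi / fXiYi          # P[j, k] = H_j(Y_i) H_k(Y_i) W2(Y_i) / f
    Dm = G - P                                     # d_ijk off the diagonal, b_ij on it
    Cm = np.outer(K, K) * W1                       # c_ijk off the diagonal, a_ij on it
    U_sum = np.sum(np.diag(Cm) * np.diag(Dm))      # sum_j U_ij
    V_sum = np.sum(Cm * Dm) - U_sum                # sum_{j != k} V_ijk
    return bracket, U_sum, V_sum

rng = np.random.default_rng(6)
m, n_grid = 6, 400
bracket, U_sum, V_sum = bracket_and_split(
    K=rng.uniform(0, 1, m), Hgrid=rng.uniform(0, 1, (m, n_grid)),
    HYi=rng.uniform(0, 1, m), W1=0.7, W2grid=rng.uniform(0, 1, n_grid),
    W2Yi=0.8, fXiYi=1.3, dy=0.01)
```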

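The proof of the above two limits follows by adopting the same steps as in the proof of Lemma 3 in Rachdi and Vieu (2007). Notice that, by the Borel-Cantelli lemma, it is enough to show that there exist $\eta,\eta'>0$ such that, for all $p\in\mathbb N$, there are constants $C$, $C'$ so that:

$$\mathbb E\Big|n^{-2}b\,\phi(a)\sum_{i\neq j}U_{ij}\Big|^{2p}\leq C\,n^{-\eta p}\tag{28}$$

and

$$\mathbb E\Big|n^{-2}b\,\phi(a)\sum_{i\neq j\neq k\neq i}V_{ijk}\Big|^{2p}\leq C'\,n^{-\eta' p}.\tag{29}$$

To prove (28), we write:

$$\mathbb E\Big|n^{-2}b\,\phi(a)\sum_{i\neq j}U_{ij}\Big|^{2p}=\big(n^{-2}b\,\phi(a)\big)^{2p}\sum_{i_1\neq j_1}\cdots\sum_{i_{2p}\neq j_{2p}}\mathbb E\big(U_{i_1j_1}\cdots U_{i_{2p}j_{2p}}\big).$$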


To compute these quantities, we have to show that:

$$\mathbb E\big(U_{i_1j_1}\cdots U_{i_{2p}j_{2p}}\big)=0,\quad\text{if }m>2p,\tag{30}$$

where $m$ denotes the cardinality of the set $\{i_1,j_1,\ldots,i_{2p},j_{2p}\}$.

Now, observe that, for all $i\neq j$:

$$\mathbb E[b_{ij}\mid X_1,\ldots,X_n]=\int\!\!\int H^2(b^{-1}(y-z))f(X_j,z)W_2(y)\,dy\,dz-\int\!\!\int f(X_i,y)f(X_j,z)\frac{H^2(b^{-1}(y-z))}{f(X_i,y)}W_2(y)\,dy\,dz=0,$$

and thus we may write:

$$\mathbb E[U_{ij}\mid X_1,\ldots,X_n]=a_{ij}\,\mathbb E[b_{ij}\mid X_1,\ldots,X_n]=0$$

and

$$\mathbb E[U_{ij}\mid X_i]=\mathbb E\big[\mathbb E[U_{ij}\mid X_1,\ldots,X_n]\mid X_i\big]=0.$$

So, if $m>2p$, there exists an $a\in\{1,\ldots,2p\}$ such that $i_a$ (or $j_a$) appears only once in $\{i_1,j_1,\ldots,i_{2p},j_{2p}\}$. Therefore, it suffices to compute the expectation by conditioning with respect to $X_{i_a}$ (or $X_{j_a}$) to show (30).
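The combinatorial step behind (30) is a pigeonhole argument. The product involves $4p$ index slots, so if every index value appeared at least twice, there would be at most $2p$ distinct values. The brute-force Python check below is our own illustration and is feasible only for tiny $p$ and index ranges. It confirms the statement, and also its analogue used later for the $V_{ijk}$ (with $6p$ slots and the threshold $3p$).

```python
from collections import Counter
from itertools import product

def singleton_property_holds(p, t, n_idx):
    """Among 2p groups of t pairwise distinct indices, more than t*p distinct
    values force some value to appear exactly once (pigeonhole)."""
    for idx in product(range(n_idx), repeat=2 * p * t):
        groups = [idx[g * t:(g + 1) * t] for g in range(2 * p)]
        if any(len(set(g)) < t for g in groups):
            continue  # the sums only run over pairwise distinct indices in a group
        counts = Counter(idx)
        if len(counts) > t * p and 1 not in counts.values():
            return False
    return True
```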
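On the other hand, for $2\leq m\leq2p$, we deduce from (10) and (12) that:

$$\big|\mathbb E\big(U_{i_1j_1}\cdots U_{i_{2p}j_{2p}}\big)\big|\leq\frac{C}{b^{4p}\phi(a)^{4p}}\,\mathbb E\Big(\prod_{i,j=1}^mK^{\alpha_{ij}}\big(a^{-1}d(X_i,X_j)\big)\prod_{i=1}^mW_1^{\beta_i}(X_i)\Big)$$

where $\beta_i\in\{0,1\}$, $\sum_{i,j}\alpha_{ij}=2p$ and, for all $i=1,\ldots,m$, there is a $j$ such that $\alpha_{ij}\neq0$, and $\beta_i=1$ or $\beta_j=1$.
Arguments similar to those invoked for proving Lemma 9 in Rachdi and Vieu (2007) can be used, which allow us to obtain that:

$$\text{for all }m=2,\ldots,2p,\quad\big|\mathbb E\big(U_{i_1j_1}\cdots U_{i_{2p}j_{2p}}\big)\big|\leq\frac{C\,\phi(a)^{m/2}}{b^{4p}\big(\phi(a)\big)^{4p}}.$$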

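Therefore, by (14), we have:

$$\begin{aligned}\mathbb E\Big|n^{-2}b\,\phi(a)\sum_{i\neq j}U_{ij}\Big|^{2p}&\leq C\sum_{m=2}^{2p}n^{m-4p}\,b^{-2p}\big(\phi(a)\big)^{m/2-2p}\\&=C\,\frac{(\phi(a))^{p}}{(nb\,\phi(a))^{2p}}\sum_{m=2}^{2p}\big(n\sqrt{\phi(a)}\big)^{m-2p}\\&\leq C\,\frac{(\phi(a))^{p}}{(nb\,\phi(a))^{2p}}=\frac{C}{(nb)^{2p}(\phi(a))^{p}}.\end{aligned}\tag{31}$$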


It now suffices to combine (31) with assumption (16) to get (28).
Concerning (29), we use arguments analogous to those used for showing (28). Let $m'$ denote the cardinality of the set $\{i_1,j_1,k_1,\ldots,i_{2p},j_{2p},k_{2p}\}$. Because $\mathbb E[d_{ijk}\mid X_1,\ldots,X_n]=0$, we deduce that, if $m'>3p$:

$$\mathbb E\big(V_{i_1j_1k_1}\cdots V_{i_{2p}j_{2p}k_{2p}}\big)=0.$$

Moreover, when $3\leq m'\leq3p$, a straightforward modification of the proof of Lemma 5.4.2 in Youndjé (1996) gives:

$$\Big|\mathbb E\big[V_{i_1j_1k_1}\cdots V_{i_{2p}j_{2p}k_{2p}}\big]\Big|\leq\frac{C}{b^{4p}\phi(a)^{4p}}\big(b\,\phi(a)\big)^{m'/2}.$$

This leads directly, under (14), to:

$$\begin{aligned}\mathbb E\Big|n^{-2}b\,\phi(a)\sum_{i\neq j\neq k\neq i}V_{ijk}\Big|^{2p}&\leq C\sum_{m'=3}^{3p}\frac{n^{m'}\big(b\,\phi(a)\big)^{m'/2}}{b^{4p}\phi(a)^{4p}}\big(n^{-2}b\,\phi(a)\big)^{2p}\\&=C\sum_{m'=3}^{3p}\frac{\big(n(b\,\phi(a))^{1/2}\big)^{m'-3p}}{\big(n(b\,\phi(a))^{1/2}\big)^{p}}\leq\frac{C}{\big(n(b\,\phi(a))^{1/2}\big)^{p}}.\end{aligned}$$

Once again, we use (16) to complete the proof of this lemma.


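Proof of Lemma 2.5.3. We use the same arguments as at the beginning of the proof of Theorem 2.3.1, and we introduce:

$$\tilde d_5(\widehat f_{(a,b)},f)=\frac1n\sum_{i=1}^n\Big(\widehat f^{\,i}_N(X_i,Y_i)-f(X_i,Y_i)\widehat f^{\,i}_D(X_i)\Big)^2\frac{W_1(X_i)W_2(Y_i)}{f(X_i,Y_i)}.$$

We can then write

$$d_5(\widehat f_{(a,b)},f)=\tilde d_5(\widehat f_{(a,b)},f)+o\big(d_5(\widehat f_{(a,b)},f)\big).$$

Therefore, because of the asymptotic equivalence between $d_3$ and $d_2$, the proof of this lemma will be completed as soon as we show that:

$$\sup_{(a,b)\in H_n}\left|\frac{\tilde d_5(\widehat f_{(a,b)},f)-\tilde d_2(\widehat f_{(a,b)},f)}{d_3(\widehat f_{(a,b)},f)}\right|\longrightarrow0,\quad a.s.$$

To do that, we consider the following decomposition:

$$\big(\widehat f^{\,i}_N-f\,\widehat f^{\,i}_D\big)^2=\big(\widehat f_N-f\,\widehat f_D\big)^2+2\big(\widehat f^{\,i}_N-\widehat f_N+f(\widehat f_D-\widehat f^{\,i}_D)\big)\big(\widehat f_N-f\,\widehat f_D\big)+\big(\widehat f^{\,i}_N-\widehat f_N+f(\widehat f_D-\widehat f^{\,i}_D)\big)^2.$$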


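Furthermore, observe that, for all $i=1,\ldots,n$, we have:

$$\widehat f^{\,i}_N(x,y)-\widehat f_N(x,y)=\frac{1}{n-1}\widehat f_N(x,y)-\frac{1}{(n-1)b\,\mathbb E[K(a^{-1}d(x,X))]}K\Big(\frac{d(x,X_i)}{a}\Big)H\Big(\frac{y-Y_i}{b}\Big)$$

and

$$\widehat f^{\,i}_D(x)-\widehat f_D(x)=\frac{1}{n-1}\widehat f_D(x)-\frac{1}{(n-1)\,\mathbb E[K(a^{-1}d(x,X))]}K\Big(\frac{d(x,X_i)}{a}\Big).$$

Since $K$ and $H$ are bounded functions, we deduce from the uniform consistency of $\widehat f_D$ and $\widehat f_N$ that

$$\big|\widehat f^{\,i}_N(x,y)-\widehat f_N(x,y)\big|\leq\frac{C}{(n-1)b\,\mathbb E[K(a^{-1}d(x,X))]}\leq\frac{C'}{(n-1)b\,\phi(a)},\quad a.s.$$

and

$$\big|\widehat f^{\,i}_D(x)-\widehat f_D(x)\big|\leq\frac{C}{(n-1)b\,\mathbb E[K(a^{-1}d(x,X))]}\leq\frac{C'}{(n-1)b\,\phi(a)},\quad a.s.$$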
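The two leave-one-out identities above follow from removing the $i$-th term from the sums in (23) and (24) and changing the normalization from $n$ to $n-1$. The Python sketch below verifies them numerically for arbitrary kernel values at a fixed $(x,y)$. It treats $\mathbb E[K(a^{-1}d(x,X))]$ as a given positive constant, and the names are ours.

```python
import numpy as np

def loo_differences(Kx, Hy, b, EK):
    """Rows: (f_N^i - f_N direct, by formula, f_D^i - f_D direct, by formula)."""
    Kx, Hy = np.asarray(Kx, dtype=float), np.asarray(Hy, dtype=float)
    n = Kx.size
    fN = np.sum(Kx * Hy) / (n * b * EK)        # (24) at a fixed (x, y)
    fD = np.sum(Kx) / (n * EK)                 # (23) at a fixed x
    rows = []
    for i in range(n):
        keep = np.arange(n) != i
        fN_i = np.sum(Kx[keep] * Hy[keep]) / ((n - 1) * b * EK)
        fD_i = np.sum(Kx[keep]) / ((n - 1) * EK)
        rows.append((fN_i - fN, fN / (n - 1) - Kx[i] * Hy[i] / ((n - 1) * b * EK),
                     fD_i - fD, fD / (n - 1) - Kx[i] / ((n - 1) * EK)))
    return np.array(rows)

rng = np.random.default_rng(3)
res = loo_differences(rng.uniform(0, 1, 30), rng.uniform(0, 1, 30), b=0.2, EK=0.4)
```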

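Hence, we get:

$$\big|\tilde d_5(\widehat f_{(a,b)},f)-\tilde d_2(\widehat f_{(a,b)},f)\big|\leq\frac{C}{(n-1)b\,\phi(a)}\sup_{(x,y)\in S_X\times S_Y}\big|\widehat f_N(x,y)-f(x,y)\widehat f_D(x)\big|+\frac{C}{(n-1)^2b^2(\phi(a))^2}.$$

By combining this last result with Lemma 2.5.1, we obtain the claimed result.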


Bibliography

[1] Acerbi, C. (2002). Spectral measures of risk: a coherent representation of subjective risk aversion. J. Bank. Financ., 26, Pages 1505-1518.
[2] Bashtannyk, D.M. and Hyndman, R.J. (2001). Bandwidth selection for kernel conditional density estimation. Comput. Statist. Data Anal., 36, Pages 279-298.
[3] Benhenni, K., Ferraty, F., Rachdi, M. and Vieu, P. (2007). Local smoothing regression with functional data. Comput. Statist., 22, No. 3, Pages 353-369.
[4] Besse, P., Cardot, H. and Ferraty, F. (1997). Simultaneous nonparametric regressions of unbalanced longitudinal data. Comput. Statist. Data Anal., 24, No. 3, Pages 255-270.
[5] Bogachev, V.I. (1999). Gaussian measures. Math. Surveys and Monographs, 62, Amer. Math. Soc.
[6] Bosq, D. (2000). Linear Processes in Function Spaces: Theory and Applications. Lecture Notes in Statistics, 149, Springer.
[7] Collomb, G., Härdle, W. and Hassani, S. (1987). A note on prediction via estimation of conditional mode function. J. of Statist. Plan. and Inf., 15, Pages 227-236.
[8] Dabo-Niang, S. and Laksaci, A. (2007). Estimation non paramétrique du mode conditionnel pour variable explicative fonctionnelle. Pub. Inst. Stat. Univ. Paris, 3, Pages 27-42.
[9] De Gooijer, J. and Gannoun, A. (2000). Nonparametric conditional predictive regions for time series. Comput. Statist. Data Anal., 33, Pages 259-275.
[10] El Ghouch, A. and Genton, M. (2009). Local polynomial quantile regression with parametric features. J. Amer. Statist. Assoc., 104, No. 488, Pages 1416-1429.
[11] Fan, J., Yao, Q. and Tong, H. (1996). Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83, Pages 189-206.
[12] Ferraty, F., Mas, A. and Vieu, P. (2007). Advances in nonparametric regression for functional variables. Aust. and New Zeal. J. of Statist., 49, Pages 1-20.
[13] Ferraty, F., Tadj, A., Laksaci, A. and Vieu, P. (2010). Rate of uniform consistency for nonparametric estimates with functional variables. J. of Statist. Plan. and Inf., 140, Pages 335-352.
[14] Ferraty, F., Laksaci, A. and Vieu, P. (2005). Functional time series prediction via conditional mode. C. R. Acad. Sci. Maths. Paris, 340, Pages 389-392.

[15] Ferraty, F., Laksaci, A. and Vieu, P. (2006). Estimating some characteristics of the conditional distribution in nonparametric functional models. Statist. Inf. for Stoch. Proc., 9, Pages 47-76.
[16] Ferraty, F. and Vieu, P. (2006). Nonparametric functional data analysis. Theory and Practice. Springer-Verlag.
[17] Gannoun, A. (1990). Estimation non paramétrique de la médiane conditionnelle. Pub. Inst. Stat. Univ. Paris, 35, No. 1, Pages 11-22.
[18] Gannoun, A., Saracco, J. and Yu, K. (2003). Nonparametric prediction by conditional median and quantiles. J. of Statist. Plan. and Inf., 117, No. 2, Pages 207-223.
[19] Hall, P., Wolff, R.C. and Yao, Q. (1999). Methods for estimating a conditional distribution function. J. Amer. Statist. Assoc., 94, Pages 154-163.
[20] Härdle, W. (1991). Smoothing Techniques with Implementation in S. Springer, New York.
[21] Härdle, W., Janssen, P. and Serfling, R. (1991). Strong consistency rates for estimators of conditional functionals. Ann. Statist., 16, No. 4, Pages 1428-1449.
[22] Härdle, W. and Marron, J.S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist., 13, No. 4, Pages 1465-1481.
[23] Hyndman, R.J. (1995). Highest-density forecast regions for non-linear and non-normal time series models. J. Forecast., 14, Pages 431-441.
[24] Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996). Estimating and visualizing conditional densities. J. Comput. Graph. Statist., 5, Pages 315-336.
[25] Hyndman, R.J. and Yao, Q. (1998). Nonparametric estimation and symmetry tests for conditional density functions. Working paper 17/98, Department of Econometrics and Business Statistics, Monash University.
[26] Kolmogorov, A.N. and Tikhomirov, V.M. (1959). ε-entropy and ε-capacity. Uspekhi Mat. Nauk., 14, Pages 3-86. (Eng. Transl. Amer. Math. Soc. Transl. Ser., 2, Pages 277-364, (1964)).
[27] Laksaci, A. (2007). Convergence en moyenne quadratique de l'estimateur à noyau de la densité conditionnelle avec variable explicative fonctionnelle. Pub. Inst. Stat. Univ. Paris, 3, Pages 69-80.
[28] Ouassou, I. and Rachdi, M. (2009). Stein type estimation of the regression operator for functional data. Advances and Applications in Statistical Sciences, 1, No. 2, Pages 233-250.
[29] Rachdi, M. and Vieu, P. (2007). Nonparametric regression for functional data: automatic smoothing parameter selection. J. of Statist. Plan. and Inf., 137, Pages 2784-2801.
[30] Ramsay, J.O. and Silverman, B.W. (2005). Functional data analysis. Second Edition, Springer-Verlag, New York.
[31] Rosenblatt, M. (1969). Conditional probability density and regression estimators. In Multivariate Analysis II, Ed. P.R. Krishnaiah. Academic Press, New York and London.


[32] Roussas, G.G. (1968). On some properties of nonparametric estimates of probability density functions. Bull. Soc. Math. Greece (N.S.), 9, Pages 29-43.
[33] Samworth, R.J. and Wand, M.P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. Ann. Statist., 38, Pages 1767-1792.
[34] Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scand. J. Statist., 9, Pages 65-78.
[35] Stone, C.J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. Ann. Statist., 22, No. 1, Pages 118-184.
[36] Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing. Chapman & Hall, London.
[37] Theodoros, N. and Yannis, G.Y. (1997). Rates of convergence of estimate, Kolmogorov entropy and the dimensionality reduction principle in regression. Ann. Statist., 25, No. 6, Pages 2493-2511.
[38] Vieu, P. (1991). Nonparametric regression: optimal local bandwidth choice. J. R. Stat. Soc., 53, No. 2, Pages 453-464.
[39] Youndjé, E. (1993). Estimation non paramétrique de la densité conditionnelle par la méthode du noyau. PhD Thesis, Rouen University (in French).
[40] Youndjé, E. (1996). Propriétés de convergence de l'estimateur à noyau de la densité conditionnelle. Rev. Roumaine Math. Pures Appl., 41, Pages 535-566.
[41] Youndjé, E., Sarda, P. and Vieu, P. (1993). Kernel estimator of conditional density bandwidth selection for dependent data. C. R. Acad. Sci. Math. Paris, 316, No. 9, Pages 935-938.


Chapter 3
Functional data: Local linear estimation of the conditional density and its application

Jacques Demongeot¹, Ali Laksaci², Fethi Madani¹ and Mustapha Rachdi³,⁴

C. R., Math., Acad. Sci. Paris, 348, Issues 15-16, Pages 931-934, (2010).
Statistics, DOI: 10.1080/02331888.2011.568117 (to appear in 2012)
Abstract. In this paper, we introduce a new nonparametric estimator of the conditional density of a scalar response variable given a random variable taking values in a semi-metric space. Under some general conditions, we establish the pointwise and uniform almost complete consistencies, with rates, of this estimator. Moreover, as an application, we use the obtained results to derive some asymptotic properties of the local linear estimator of the conditional mode.
Keywords. Functional data, Local linear estimator, Conditional density, Conditional mode, Nonparametric model, Small balls probability
AMS Subject Classification. Primary: 62G05, Secondary: 62G07, 62G08, 62G35, 62G20.

3.1 Introduction

This paper deals with local polynomial modeling of the conditional density function when the explanatory variable is of functional type. It is well known that local polynomial smoothing has various advantages over the kernel method; in particular, it has better bias properties (cf. [4] and [6] for an extensive discussion on the comparison between these two methods). Notice that these questions in infinite-dimensional spaces are particularly interesting, both for the fundamental problems they raise and for the many applications they may allow (cf. [5, 11, 29, 30]). Moreover, the kernel method is known to be a particular case of the local polynomial method.

1. Laboratoire AGIM FRE 3405 CNRS, Equipe TIMB, Faculté de Médecine de Grenoble, Université J. Fourier, 38700 La Tronche, France. E-mail: Jacques.Demongeot@imag.fr
2. Université Djillali Liabès, BP. 89, Sidi Bel-Abbès 22000, Algeria. E-mail: alilak@yahoo.fr
3. Laboratoire AGIM FRE 3405 CNRS, Equipe TIMB, Université P. Mendès France (Grenoble 2), UFR SHS, BP. 47, 38040 Grenoble Cedex 09, France. E-mails: Mustapha.Rachdi@upmf-grenoble.fr and Fethi.Madani@imag.fr
4. Corresponding author
Besides the fact that the conditional density plays an important role in nonparametric prediction, several tools in nonparametric statistics, such as the conditional mode, the conditional median or the conditional quantiles, are based on a preliminary estimator of the functional parameter studied in this paper. In nonparametric functional statistics, the first results about almost complete consistency were obtained in [10], for the estimation of conditional density/distribution functions when the data are independent and identically distributed. The strong mixing case has been studied by [16]. On the other hand, the convergence in $L^p$-norm of the kernel estimator of the conditional mode was stated in [8], and some asymptotics of the conditional quantile and mode estimators were given in [10]. In [12], the asymptotic expansion of the exact expression involved in the leading terms of the quadratic error of the kernel estimators of the conditional density is established, while in [9] the uniform almost complete convergence of some nonparametric conditional models is shown. For some more recent advances in nonparametric statistics for functional data, we refer to [3, 5, 25] and the references therein.
In this work, we introduce a new nonparametric estimator of the conditional density for functional data. Our estimator is based on the local linear approach. Notice that the local linear estimator of the conditional density has been widely studied when the explanatory variable lies in a finite-dimensional space, and there are many references on this topic (cf. for instance [12, 19]). For a general treatment of local polynomial estimation, we refer to [13, 24]. Thus, in this paper, we are concerned with proving, under some general conditions, the almost complete convergence, with rates, of the constructed estimator. More precisely, in Section 3.3, we show the pointwise consistency. The uniform version of this asymptotic result is given in Section 3.4. The interest of uniform consistency comes mainly from the fact that the pointwise performance of an estimator is not sufficient to quantify its efficiency. Some stability is also needed, in the sense that this performance should be uniform over a neighborhood. Notice that, in functional statistics, uniform convergence is not a direct extension of the pointwise results; it requires some additional tools and conditions. In Section 3.5, we emphasize the consequences of the previous results for the estimation of the conditional mode.

3.2 Model

Let us introduce $n$ pairs of random variables $(X_i,Y_i)$, $i=1,\ldots,n$, which we assume are drawn from the pair $(X,Y)$ valued in $F\times\mathbb R$, where $F$ is a semi-metric space equipped with a semi-metric $d$.
Furthermore, we assume that there exists a regular version of the conditional probability of $Y$ given $X$, which is absolutely continuous with respect to the Lebesgue measure on $\mathbb R$ and has a bounded density, denoted by $f^x$. Local polynomial smoothing is based on the assumption that the functional parameter is smooth enough to be locally well approximated by a polynomial. In functional statistics, there are several ways of extending the local linear ideas (cf. [1, 2, 5]).


Here we adopt the fast functional local modeling, that is, we estimate the conditional density $f^x$ by $\widehat a$, which is obtained by minimizing the following quantity:

$$\min_{(a,b)\in\mathbb R^2}\sum_{i=1}^n\Big(h_H^{-1}H\big(h_H^{-1}(y-Y_i)\big)-a-b\,\beta(X_i,x)\Big)^2K\big(h_K^{-1}\delta(x,X_i)\big)\tag{1}$$

where $\beta(\cdot,\cdot)$ is a known function from $F^2$ into $\mathbb R$ such that $\forall\xi\in F$, $\beta(\xi,\xi)=0$; $K$ and $H$ are kernels; $h_K=h_{K,n}$ (resp. $h_H=h_{H,n}$) is a sequence of positive real numbers; and $\delta(\cdot,\cdot)$ is a function on $F\times F$ such that $d(\cdot,\cdot)=|\delta(\cdot,\cdot)|$. Clearly, by simple algebra, we get the following explicit definition of $\widehat f^x$:

$$\widehat f^x(y)=\frac{\sum_{i,j=1}^nW_{ij}(x)H\big(h_H^{-1}(y-Y_j)\big)}{h_H\sum_{i,j=1}^nW_{ij}(x)}\tag{2}$$

where

$$W_{ij}(x)=\beta(X_i,x)\big(\beta(X_i,x)-\beta(X_j,x)\big)K\big(h_K^{-1}\delta(x,X_i)\big)K\big(h_K^{-1}\delta(x,X_j)\big),$$

with the convention 0/0 = 0.
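To illustrate (1) and (2), the Python sketch below computes $\widehat a$ by solving the weighted least-squares problem (1) through its $2\times2$ normal equations. It then compares the result with the double-sum form built from $W_{ij}(x)$. Solving (1) directly shows that, with $W_{ij}(x)$ as defined, the kernel $H$ is evaluated at $Y_j$ in the double sum. The following choices are hypothetical and serve only the example: $\beta(u,x)=\langle u-x,v\rangle$ and $\delta(x,u)=\langle x-u,v\rangle$ for a fixed unit vector $v$ (so that $d=|\delta|$ is a projection-type semi-metric), an Epanechnikov kernel $K$, and a Gaussian kernel $H$.

```python
import numpy as np

def epanechnikov(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def gauss(t):
    return np.exp(-0.5 * np.asarray(t, dtype=float) ** 2) / np.sqrt(2 * np.pi)

def local_linear_cond_density(beta, delta, Y, y, hK, hH):
    """a_hat minimizing sum_i (H((y-Y_i)/hH)/hH - a - b beta_i)^2 K(delta_i/hK)."""
    beta = np.asarray(beta, dtype=float)
    w = epanechnikov(np.asarray(delta, dtype=float) / hK)
    z = gauss((y - np.asarray(Y, dtype=float)) / hH) / hH
    s0, s1, s2 = w.sum(), (w * beta).sum(), (w * beta**2).sum()
    t0, t1 = (w * z).sum(), (w * beta * z).sum()
    det = s0 * s2 - s1**2                      # determinant of the normal equations
    return 0.0 if det == 0 else (s2 * t0 - s1 * t1) / det  # convention 0/0 = 0

def double_sum_form(beta, delta, Y, y, hK, hH):
    """Same estimate via W_ij = beta_i (beta_i - beta_j) K_i K_j, paired with Y_j."""
    beta = np.asarray(beta, dtype=float)
    k = epanechnikov(np.asarray(delta, dtype=float) / hK)
    W = (beta * k)[:, None] * (beta[:, None] - beta[None, :]) * k[None, :]
    Hj = gauss((y - np.asarray(Y, dtype=float)) / hH)
    den = hH * W.sum()
    return 0.0 if den == 0 else (W.sum(axis=0) @ Hj) / den

rng = np.random.default_rng(4)
curves = rng.normal(size=(200, 50))        # toy discretized curves X_i
x = np.zeros(50)                           # point of estimation
v = np.ones(50) / np.sqrt(50)              # hypothetical projection direction
beta = (curves - x) @ v                    # beta(X_i, x)
delta = (x - curves) @ v                   # delta(x, X_i), so that d = |delta|
Y = 2.0 * beta + rng.normal(scale=0.3, size=200)

a_hat = local_linear_cond_density(beta, delta, Y, y=0.0, hK=1.0, hH=0.3)
a_sum = double_sum_form(beta, delta, Y, y=0.0, hK=1.0, hH=0.3)
```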

Remark 3.2.1.
- Obviously, if $b=0$, then we obtain from (1) the Nadaraya-Watson estimator studied, in the functional case, in [9, 10, 28] and the references therein.
- The minimization of (1) may be achieved by a wiggly $\widehat b$ that forces $\widehat f^x$ to adapt to all the data points in a neighborhood of $x$ (see Chapter 15 in [29] for a similar reasoning in the context of linear regression). The same idea is expressed in [6], which states that optimizing in $b$ is an infinite-dimensional problem.

3.3 Pointwise almost complete convergence

In what follows, $x$ denotes a fixed point in $F$, $N_x$ denotes a fixed neighborhood of $x$, $S_{\mathbb R}$ is a fixed compact subset of $\mathbb R$, and $\phi_x(r_1,r_2)=\mathbb P\big(r_2\leq\delta(X,x)\leq r_1\big)$.
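For a sample of curves, $\phi_x(r_1,r_2)$ can be estimated by the proportion of observations with $r_2\leq\delta(X_i,x)\leq r_1$. Below is a minimal Python sketch (function names ours) that takes precomputed values of $\delta(X_i,x)$ as input. It also includes the ball version $\phi_x(r):=\phi_x(r,-r)=\mathbb P(d(X,x)\leq r)$ used in (H1).

```python
import numpy as np

def phi_x(delta_vals, r1, r2):
    """Empirical version of phi_x(r1, r2) = P(r2 <= delta(X, x) <= r1)."""
    d = np.asarray(delta_vals, dtype=float)
    return float(np.mean((r2 <= d) & (d <= r1)))

def phi_x_ball(delta_vals, r):
    """phi_x(r) := phi_x(r, -r), i.e. the proportion of |delta(X_i, x)| <= r."""
    return phi_x(delta_vals, r, -r)
```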
Notice that our nonparametric model will be quite general, in the sense that we will just need the following assumptions:
(H1) For any $r>0$, $\phi_x(r):=\phi_x(r,-r)>0$.
(H2) The conditional density $f^x$ is such that there exist $b_1>0$ and $b_2>0$ such that, $\forall(y_1,y_2)\in S_{\mathbb R}^2$ and $\forall(x_1,x_2)\in N_x\times N_x$,

$$|f^{x_1}(y_1)-f^{x_2}(y_2)|\leq C_x\big(d^{b_1}(x_1,x_2)+|y_1-y_2|^{b_2}\big),$$

where $C_x$ is a positive constant depending on $x$.
(H3) The function $\beta(\cdot,\cdot)$ is such that:

$$\forall x'\in F,\quad C_1\,d(x,x')\leq|\beta(x,x')|\leq C_2\,d(x,x'),\quad\text{where }C_1>0,\ C_2>0.$$

(H4) $K$ is a positive, differentiable function with support $[-1,1]$.
(H5) $H$ is a positive, bounded, Lipschitz continuous function such that:

$$\int|t|^{b_2}H(t)\,dt<\infty\quad\text{and}\quad\int H^2(t)\,dt<\infty.$$


(H6) The bandwidth $h_K$ satisfies: there exists an integer $n_0$ such that

$$\forall n>n_0,\quad-\frac{1}{\phi_x(h_K)}\int_{-1}^{1}\phi_x(zh_K,h_K)\,\frac{d}{dz}\big(z^2K(z)\big)\,dz>C_3>0$$

and

$$h_K\int_{B(x,h_K)}\beta(u,x)\,dP(u)=o\Big(\int_{B(x,h_K)}\beta^2(u,x)\,dP(u)\Big),$$

where $B(x,r)=\{x'\in F\,/\,d(x',x)\leq r\}$ denotes the closed ball of center $x$ and radius $r$, and $dP(x)$ is the cumulative distribution of $X$.
(H7) The bandwidth $h_H$ satisfies:

$$\lim_{n\to\infty}n^{\gamma}h_H=\infty\ \text{for some }\gamma>0\quad\text{and}\quad\lim_{n\to\infty}\frac{\ln n}{n\,h_H\,\phi_x(h_K)}=0.$$

Observe that these conditions are very standard in this context. Conditions (H1), (H3) and (H6) are the same as those used in [1]. Assumption (H2) is a regularity condition which characterizes the functional space of our model and is needed to evaluate the bias term in the asymptotic results of this paper. Hypotheses (H5) and (H7) are technical conditions, also similar to those considered in [10].
The following theorem gives the almost complete convergence⁵ (a.co.) of $\widehat f^x$.

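Theorem 3.3.1. Under assumptions (H1), (H2), (H3), (H4), (H5), (H6) and (H7), we have that
$$\sup_{y \in S_{\mathbb{R}}} |\widehat{f}^x(y) - f^x(y)| = O\left( h_K^{b_1} + h_H^{b_2} \right) + O\left( \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right), \quad \text{a.co.}$$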

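Remark that the proof of Theorem 3.3.1 is a direct consequence of the decomposition
$$\forall y \in S_{\mathbb{R}}, \quad \widehat{f}^x(y) - f^x(y) = \frac{1}{\widehat{f}^x_D} \left\{ \left( \widehat{f}^x_N(y) - \mathbb{E}[\widehat{f}^x_N(y)] \right) - \left( f^x(y) - \mathbb{E}[\widehat{f}^x_N(y)] \right) \right\} + \frac{f^x(y)}{\widehat{f}^x_D} \left( 1 - \widehat{f}^x_D \right), \tag{3}$$
where
$$\widehat{f}^x_N(y) = \frac{1}{n(n-1)\, h_H\, \mathbb{E}[W_{12}(x)]} \sum_{i \ne j} W_{ij}(x)\, H\left( h_H^{-1}(y - Y_j) \right) \quad \text{and} \quad \widehat{f}^x_D = \frac{1}{n(n-1)\, \mathbb{E}[W_{12}(x)]} \sum_{i \ne j} W_{ij}(x),$$
and of Lemmas 3.3.1, 3.3.2 and 3.3.3 below, whose proofs are given in the Appendix.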
5. Let $(z_n)_{n \in \mathbb{N}}$ be a sequence of real r.v.'s; we say that $z_n$ converges almost completely (a.co.) to zero if, and only if, $\forall \epsilon > 0$, $\sum_{n=1}^{\infty} \mathbb{P}(|z_n| > \epsilon) < \infty$. Moreover, let $(u_n)_{n \in \mathbb{N}}$ be a sequence of positive real numbers; we say that $z_n = O(u_n)$ a.co. if, and only if, $\exists \epsilon > 0$, $\sum_{n=1}^{\infty} \mathbb{P}(|z_n| > \epsilon u_n) < \infty$. This kind of convergence implies both almost sure convergence and convergence in probability (cf. [13] for details).


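Lemma 3.3.1 (cf. [1]). Under assumptions (H1), (H3), (H4) and (H6), we have that
$$1 - \widehat{f}^x_D = O\left( \sqrt{\frac{\ln n}{n\, \phi_x(h_K)}} \right), \quad \text{a.co.},$$
and
$$\exists \eta > 0 \ \text{such that} \ \sum_{n=1}^{\infty} \mathbb{P}\left( \widehat{f}^x_D < \eta \right) < \infty.$$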

Lemma 3.3.2. Under assumptions (H1), (H2) and (H4), we obtain
$$\sup_{y \in S_{\mathbb{R}}} \left| f^x(y) - \mathbb{E}[\widehat{f}^x_N(y)] \right| = O\left( h_K^{b_1} + h_H^{b_2} \right).$$

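Lemma 3.3.3. Under the assumptions of Theorem 3.3.1, we get
$$\sup_{y \in S_{\mathbb{R}}} \left| \widehat{f}^x_N(y) - \mathbb{E}[\widehat{f}^x_N(y)] \right| = O\left( \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right), \quad \text{a.co.}$$

3.4 Uniform almost complete convergence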

This section is devoted to the uniform version of Theorem 3.3.1. More precisely, our purpose is to establish the uniform almost complete convergence of $\widehat{f}^x$ on some subset $S_{\mathcal{F}}$ of $\mathcal{F}$ such that
$$S_{\mathcal{F}} \subset \bigcup_{k=1}^{d_n} B(x_k, r_n),$$
where $x_k \in \mathcal{F}$ and $r_n$ (resp. $d_n$) is a sequence of positive real numbers.
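The covering of $S_{\mathcal{F}}$ by $d_n$ balls, together with the index $j(x)$ of the closest centre used later in the Appendix, can be illustrated on a finite sample of discretized curves. The Python sketch below is only an illustration under assumptions that are ours, not the paper's: the curves are sampled on a common grid, the semi-metric $d$ is taken to be the discretized $L^2$ distance, and `l2_semimetric`, `greedy_cover` and `nearest_center` are hypothetical helper names. The greedy rule turns any still-uncovered curve into a new centre, so the centres are pairwise more than `radius` apart and their number plays the role of $d_n$ for this sample.

```python
import numpy as np


def l2_semimetric(u, v, dt):
    """Discretized L2 distance between two curves on the same grid (hypothetical choice of d)."""
    return float(np.sqrt(np.sum((u - v) ** 2) * dt))


def greedy_cover(curves, radius, dt):
    """Indices of centres x_k such that every curve lies in some closed ball B(x_k, radius)."""
    uncovered = np.ones(curves.shape[0], dtype=bool)
    centers = []
    while uncovered.any():
        k = int(np.flatnonzero(uncovered)[0])  # first uncovered curve becomes a new centre
        centers.append(k)
        dists = np.array([l2_semimetric(curves[k], c, dt) for c in curves])
        uncovered &= dists > radius  # curves within `radius` of x_k are now covered
    return centers


def nearest_center(x, curves, centers, dt):
    """j(x): position in `centers` of the centre closest to the curve x."""
    dists = [l2_semimetric(x, curves[k], dt) for k in centers]
    return int(np.argmin(dists))


# Toy sample: X_i(t) = a_i sin(2 pi t) on a regular grid of [0, 1]
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
rng = np.random.default_rng(0)
curves = rng.uniform(0.0, 1.0, 200)[:, None] * np.sin(2 * np.pi * t)[None, :]
centers = greedy_cover(curves, radius=0.1, dt=dt)
d_n = len(centers)  # number of balls needed for this sample
```

Shrinking `radius` (the analogue of $r_n$) increases `d_n`; condition (U5) is what keeps this trade-off under control in the theory.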


In practice, uniform consistency is of great importance because it allows us to make predictions even if the data are not perfectly observed. Moreover, uniform convergence results are indispensable tools for some nonparametric functional data problems, such as data-driven bandwidth choice (cf. [4, 28]) or bootstrapping (cf. [16]). It is worth noting that, in the multivariate case, uniform consistency is a standard extension of pointwise consistency; however, in our functional case, some additional tools and topological conditions are required. Thus, in addition to the conditions introduced in the previous section, we need the following ones.
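(U1) There exists a differentiable function $\phi(\cdot)$ such that
$$\forall x \in S_{\mathcal{F}}, \quad 0 < C\, \phi(h) \le \phi_x(h) \le C'\, \phi(h) < \infty \quad \text{and} \quad \exists \eta_0 > 0,\ \forall \eta < \eta_0,\ \phi'(\eta) < C,$$
where $C$ and $C'$ are strictly positive constants and $\phi'$ denotes the first derivative of $\phi$.
(U2) The conditional density $f^x$ satisfies, for some strictly positive constant $C$,
$$\forall (y_1, y_2) \in S_{\mathbb{R}} \times S_{\mathbb{R}},\ \forall (x_1, x_2) \in S_{\mathcal{F}} \times S_{\mathcal{F}}: \quad |f^{x_1}(y_1) - f^{x_2}(y_2)| \le C \left( d^{b_1}(x_1, x_2) + |y_1 - y_2|^{b_2} \right).$$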


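(U3) The function $\beta(\cdot, \cdot)$ satisfies (H3) and, for some strictly positive constant $C$, the following Lipschitz condition:
$$\forall (x_1, x_2) \in S_{\mathcal{F}} \times S_{\mathcal{F}}, \quad |\beta(x_1, x') - \beta(x_2, x')| \le C\, d(x_1, x_2).$$
(U4) The kernel $K$ satisfies (H4) and, for some strictly positive constant $C$, the following Lipschitz condition:
$$|K(x) - K(y)| \le C\, \big| |x| - |y| \big|.$$
(U5) For some $\gamma \in (0, 1)$, $\lim_{n \to +\infty} n^{\gamma} h_H = \infty$, and, for $r_n = O\left( \frac{\ln n}{n} \right)$, the sequence $d_n$ satisfies
$$\frac{(\ln n)^2}{n^{1-\gamma}\, \phi(h_K)} < \ln d_n < \frac{n^{1-\gamma}\, \phi(h_K)}{\ln n} \quad \text{and} \quad \sum_{n=1}^{\infty} n^{(3\gamma+1)/2}\, d_n^{1-\beta} < \infty \ \text{for some } \beta > 1.$$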

Notice that conditions (U1) and (U2) are, respectively, the uniform versions of (H1) and (H2). Moreover, conditions (U1) and (U5) are linked with the topological structure of the functional variable. Therefore, as in the pointwise case, the choice of the topological structure, controlled here by means of the function $\delta(\cdot, \cdot)$, plays a crucial role, and a judicious choice of this function improves the convergence rate of the estimator. More precisely, we will see thereafter that a good semi-metric is one that increases the concentration of the probability measure of the functional variable $X$ while keeping $d_n$ small. It should be noticed that both conditions (U1) and (U2) are verified for several continuous-time processes (cf. for instance [9] for some examples).

Theorem 3.4.1. Under assumptions (U1), (U2), (U3), (U4), (U5), (H5) and (H6), we have that
$$\sup_{x \in S_{\mathcal{F}}} \sup_{y \in S_{\mathbb{R}}} |\widehat{f}^x(y) - f^x(y)| = O(h_K^{b_1}) + O(h_H^{b_2}) + O_{a.co.}\left( \sqrt{\frac{\ln d_n}{n^{1-\gamma}\, \phi(h_K)}} \right). \tag{4}$$
It is clear that, as for Theorem 3.3.1, the proof of Theorem 3.4.1 can be deduced directly from decomposition (3) and from the following intermediate results, which correspond to the uniform versions of Lemmas 3.3.1, 3.3.2 and 3.3.3 and whose proofs are also given in the Appendix.

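Lemma 3.4.1. Under assumptions (U1), (U3), (U4), (U5) and (H6), we obtain that
$$\sup_{x \in S_{\mathcal{F}}} |\widehat{f}^x_D - 1| = O_{a.co.}\left( \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right).$$
Corollary 3.4.1. Under the assumptions of Lemma 3.4.1, we have that
$$\sum_{n=1}^{\infty} \mathbb{P}\left( \inf_{x \in S_{\mathcal{F}}} \widehat{f}^x_D < \frac{1}{2} \right) < \infty.$$
Lemma 3.4.2. Under hypotheses (U1), (U2) and (H5), we obtain that
$$\sup_{x \in S_{\mathcal{F}}} \sup_{y \in S_{\mathbb{R}}} \left| f^x(y) - \mathbb{E}[\widehat{f}^x_N(y)] \right| = O(h_K^{b_1}) + O(h_H^{b_2}).$$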


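Lemma 3.4.3. Under the assumptions of Theorem 3.4.1, we obtain that
$$\sup_{x \in S_{\mathcal{F}}} \sup_{y \in S_{\mathbb{R}}} \left| \widehat{f}^x_N(y) - \mathbb{E}[\widehat{f}^x_N(y)] \right| = O_{a.co.}\left( \sqrt{\frac{\ln d_n}{n^{1-\gamma}\, \phi(h_K)}} \right).$$

3.5 Application: Conditional mode estimation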

Let us now study the almost complete convergence of the kernel estimator of the conditional mode of $Y$ given $X = x$, denoted by $\theta(x)$, uniformly on a fixed compact subset $S_{\mathcal{F}}$ of $\mathcal{F}$. For this aim, we assume that $\theta(x)$ satisfies, on $S_{\mathcal{F}}$, the following uniform uniqueness property (cf. [27, 34] for the univariate case and [26] for the multivariate case).
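(U6) $\forall \epsilon_0 > 0$, $\exists \eta > 0$, $\forall r : S_{\mathcal{F}} \to S_{\mathbb{R}}$, we have that
$$\sup_{x \in S_{\mathcal{F}}} |\theta(x) - r(x)| \ge \epsilon_0 \ \Longrightarrow\ \sup_{x \in S_{\mathcal{F}}} |f^x(r(x)) - f^x(\theta(x))| \ge \eta.$$
Moreover, we also suppose that there exists some integer $j > 1$ such that, for all $x \in S_{\mathcal{F}}$, the function $f^x$ is $j$ times continuously differentiable on the interior of $S_{\mathbb{R}}$ with respect to $y$, and that:
(U7) $f^{x(l)}(\theta(x)) = 0$ if $1 \le l < j$, and $f^{x(j)}(\cdot)$ is uniformly continuous on $S_{\mathbb{R}}$ and such that $|f^{x(j)}(\theta(x))| > C > 0$,
where $f^{x(j)}$ denotes the $j$-th order derivative of the conditional density $f^x$.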

We estimate the conditional mode $\theta(x)$ by the random variable $\widehat{\theta}(x)$ defined by
$$\widehat{\theta}(x) = \arg\sup_{y \in S_{\mathbb{R}}} \widehat{f}^x(y).$$
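As an illustration of $\widehat{\theta}(x)$, the following Python sketch evaluates a local linear conditional density estimate on a grid of $S_{\mathbb{R}}$ and returns its maximizer. It assumes that $\widehat{f}^x$ is built from the weights $W_{ij}(x) = \beta(X_i, x)(\beta(X_i, x) - \beta(X_j, x)) K(h_K^{-1}\delta(x, X_i)) K(h_K^{-1}\delta(x, X_j))$ (the form written out in Chapter 4), paired with $H(h_H^{-1}(y - Y_j))$. The Epanechnikov kernel for $K$, the Gaussian density for $H$, the simulated curves, the projection used as $\beta$, the $L^2$ distance used as $\delta$ and the bandwidths are all hypothetical choices made for the example; the arg sup is approximated by a grid maximum, so its accuracy is limited by the grid step.

```python
import numpy as np


def kernel_K(u):
    """Epanechnikov kernel, supported on [-1, 1] (one admissible choice under (H4))."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def kernel_H(u):
    """Standard Gaussian density: bounded and Lipschitz, as required by (H5)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def cond_density(y_grid, beta, delta, Y, h_K, h_H):
    """Local linear estimate of f^x on y_grid, from beta_i = beta(X_i, x) and delta_i = delta(x, X_i)."""
    K = kernel_K(delta / h_K)
    s0, s1, s2 = K.sum(), (K * beta).sum(), (K * beta ** 2).sum()
    denom = h_H * (s0 * s2 - s1 ** 2)  # h_H * sum_{i,j} W_ij
    if denom == 0.0:
        return np.zeros_like(y_grid)  # convention 0/0 = 0
    w = K * (s2 - beta * s1)  # w_j = sum_i W_ij
    H = kernel_H((y_grid[:, None] - Y[None, :]) / h_H)
    return H @ w / denom


def cond_mode(y_grid, f_values):
    """Grid approximation of theta_hat(x) = arg sup_y f_hat^x(y)."""
    return float(y_grid[np.argmax(f_values)])


# Toy data: X_i(t) = a_i sin(2 pi t), Y_i = a_i + noise, so f^x has its mode at a = 0.5
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
a = rng.uniform(0.0, 1.0, 1000)
X = a[:, None] * np.sin(2 * np.pi * t)[None, :]
Y = a + 0.1 * rng.standard_normal(a.size)
x = 0.5 * np.sin(2 * np.pi * t)
beta = (X - x) @ np.sin(2 * np.pi * t) * dt  # hypothetical beta: projection on sin(2 pi t)
delta = np.sqrt(((X - x) ** 2).sum(axis=1) * dt)  # discretized L2 distance
y_grid = np.linspace(0.0, 1.0, 501)
f_hat = cond_density(y_grid, beta, delta, Y, h_K=0.1, h_H=0.05)
theta_hat = cond_mode(y_grid, f_hat)
```

Because $\sum_i W_{ij}(x) = K_j \left( \sum_i K_i \beta_i^2 - \beta_j \sum_i K_i \beta_i \right)$, the double sum over $(i, j)$ reduces to $O(n)$ operations per value of $y$, which is the design choice made in `cond_density`.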

Thus, from Theorem 3.4.1, we derive the following corollary.

Corollary 3.5.1. Under the hypotheses of Theorem 3.4.1, and if the conditional density $f^x$ satisfies assumptions (U6) and (U7), then we get
$$\sup_{x \in S_{\mathcal{F}}} |\widehat{\theta}(x) - \theta(x)|^j = O(h_K^{b_1}) + O(h_H^{b_2}) + O_{a.co.}\left( \sqrt{\frac{\ln d_n}{n^{1-\gamma}\, \phi(h_K)}} \right).$$

3.6 Appendix

In what follows, when no confusion is possible, we will denote by $C$ and $C'$ some strictly positive generic constants. Moreover, we put, for any $x \in \mathcal{F}$ and for all $i = 1, \dots, n$:
$$K_i(x) = K\left( h_K^{-1} \delta(x, X_i) \right), \quad \beta_i(x) = \beta(X_i, x) \quad \text{and} \quad H_i(y) = H\left( h_H^{-1}(y - Y_i) \right).$$

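Proof of Lemma 3.3.2. Since the pairs $(X_i, Y_i)$ are identically distributed, from assumption (H4) we obtain
$$\forall y \in S_{\mathbb{R}}, \quad \mathbb{E}[\widehat{f}^x_N(y)] = \frac{1}{\mathbb{E}[W_{12}(x)]}\, \mathbb{E}\left[ W_{12}(x)\, \mathbb{E}\left[ h_H^{-1} H_2(y) \mid X_2 \right] \right].$$
By the classical change of variables $t = (y - z)/h_H$, we obtain, for a generic pair $(X, Y)$ distributed as $(X_1, Y_1)$,
$$h_H^{-1}\, \mathbb{E}[H_1(y) \mid X] = \int_{\mathbb{R}} H(t)\, f^X(y - h_H t)\, dt,$$
therefore
$$\left| h_H^{-1}\, \mathbb{E}[H_1(y) \mid X] - f^x(y) \right| \le \int_{\mathbb{R}} H(t)\, |f^X(y - h_H t) - f^x(y)|\, dt.$$
Thus, by assumption (H2), we get
$$\forall y \in S_{\mathbb{R}}, \quad \mathbf{1}_{B(x, h_K)}(X) \left| h_H^{-1}\, \mathbb{E}[H_1(y) \mid X] - f^x(y) \right| \le C_x \int_{\mathbb{R}} H(t) \left( h_K^{b_1} + |t|^{b_2} h_H^{b_2} \right) dt.$$
Since $H$ is a probability density function, the claimed result is a direct consequence of (H5).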

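Proof of Lemma 3.3.3. The compactness of $S_{\mathbb{R}}$ allows us to write that there exists a sequence of real numbers $(t_k)_{k=1,\dots,s_n}$ such that
$$S_{\mathbb{R}} \subset \bigcup_{k=1}^{s_n} (t_k - l_n, t_k + l_n), \quad \text{with } l_n = n^{-\frac{3}{2}\gamma - \frac{1}{2}} \text{ and } s_n = O(l_n^{-1}).$$
Let $t_y = \arg\min_{t \in \{t_1, \dots, t_{s_n}\}} |y - t|$ and consider the following decomposition:
$$\sup_{y \in S_{\mathbb{R}}} \left| \widehat{f}^x_N(y) - \mathbb{E}[\widehat{f}^x_N(y)] \right| \le \underbrace{\sup_{y \in S_{\mathbb{R}}} \left| \widehat{f}^x_N(y) - \widehat{f}^x_N(t_y) \right|}_{A_1} + \underbrace{\sup_{y \in S_{\mathbb{R}}} \left| \widehat{f}^x_N(t_y) - \mathbb{E}[\widehat{f}^x_N(t_y)] \right|}_{A_2} + \underbrace{\sup_{y \in S_{\mathbb{R}}} \left| \mathbb{E}[\widehat{f}^x_N(t_y)] - \mathbb{E}[\widehat{f}^x_N(y)] \right|}_{A_3}. \tag{5}$$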
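The grid $\{t_1, \dots, t_{s_n}\}$ and the projection $y \mapsto t_y$ used in decomposition (5) can be built explicitly when $S_{\mathbb{R}}$ is an interval. The Python sketch below assumes $S_{\mathbb{R}} = [a, b]$ and uses a uniform grid of step $l_n$ (rather than $2 l_n$), so that every $y$ lies strictly inside some open interval $(t_k - l_n, t_k + l_n)$; the function names are hypothetical. It checks that $|y - t_y| < l_n$ and that $s_n$ grows like $l_n^{-1}$.

```python
import numpy as np


def covering_grid(a, b, n, gamma):
    """Centres t_1, ..., t_{s_n} of the intervals (t_k - l_n, t_k + l_n) covering S_R = [a, b]."""
    l_n = n ** (-1.5 * gamma - 0.5)
    s_n = int(np.ceil((b - a) / l_n)) + 1  # s_n = O(1 / l_n)
    return a + l_n * np.arange(s_n), l_n


def nearest_grid_point(y, t, l_n, a):
    """t_y = arg min_k |y - t_k|; the grid is uniform with step l_n, so rounding suffices."""
    idx = np.clip(np.rint((np.asarray(y) - a) / l_n).astype(int), 0, t.size - 1)
    return t[idx]


t, l_n = covering_grid(0.0, 1.0, n=1000, gamma=0.2)
y = np.linspace(0.0, 1.0, 10001)
t_y = nearest_grid_point(y, t, l_n, 0.0)
```

With $n = 1000$ and $\gamma = 0.2$ one gets $l_n = n^{-0.8}$, about $0.004$, and $s_n = 253$ grid points; a smaller $l_n$ makes $A_1$ and $A_3$ negligible but increases the factor $s_n$ in the bound for $A_2$.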

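Firstly, for the terms $A_1$ and $A_3$, we use the Lipschitz condition on the kernel $H$ to show that
$$\sup_{y \in S_{\mathbb{R}}} \left| \widehat{f}^x_N(y) - \widehat{f}^x_N(t_y) \right| \le \sup_{y \in S_{\mathbb{R}}} \frac{1}{n(n-1)\, h_H\, \mathbb{E}[W_{12}(x)]} \sum_{i \ne j} |H_j(y) - H_j(t_y)|\, W_{ij}(x) \le \sup_{y \in S_{\mathbb{R}}} \frac{C |y - t_y|}{h_H}\, \frac{1}{n(n-1)\, h_H\, \mathbb{E}[W_{12}(x)]} \sum_{i \ne j} W_{ij}(x) \le C\, \frac{l_n}{h_H^2}\, \widehat{f}^x_D. \tag{6}$$
The almost complete consistency of $\widehat{f}^x_D$ (cf. Lemma 3.3.1) and (6) allow us to write that
$$\sup_{y \in S_{\mathbb{R}}} \left| \widehat{f}^x_N(y) - \widehat{f}^x_N(t_y) \right| \le C\, \frac{l_n}{h_H^2}, \quad \text{a.co.}$$
Since $l_n = n^{-\frac{3}{2}\gamma - \frac{1}{2}}$, then
$$\sup_{y \in S_{\mathbb{R}}} \left| \widehat{f}^x_N(y) - \widehat{f}^x_N(t_y) \right| = o\left( \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right). \tag{7}$$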


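Moreover, the term $A_3$ can be treated as a direct consequence of the following well-known inequality:
$$\sup_{y \in S_{\mathbb{R}}} \left| \mathbb{E}[\widehat{f}^x_N(y)] - \mathbb{E}[\widehat{f}^x_N(t_y)] \right| \le \mathbb{E}\left[ \sup_{y \in S_{\mathbb{R}}} \left| \widehat{f}^x_N(y) - \widehat{f}^x_N(t_y) \right| \right]. \tag{8}$$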

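Secondly, concerning the term $A_2$, we can write, for all $\epsilon > 0$,
$$\begin{aligned} \mathbb{P}\left( \sup_{y \in S_{\mathbb{R}}} \left| \widehat{f}^x_N(t_y) - \mathbb{E}[\widehat{f}^x_N(t_y)] \right| > \epsilon \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right) &= \mathbb{P}\left( \max_{t_y \in \{t_1, \dots, t_{s_n}\}} \left| \widehat{f}^x_N(t_y) - \mathbb{E}[\widehat{f}^x_N(t_y)] \right| > \epsilon \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right) \\ &\le s_n \max_{t_y \in \{t_1, \dots, t_{s_n}\}} \mathbb{P}\left( \left| \widehat{f}^x_N(t_y) - \mathbb{E}[\widehat{f}^x_N(t_y)] \right| > \epsilon \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right). \end{aligned}$$
So all that remains is to evaluate the following quantity:
$$\mathbb{P}\left( \left| \widehat{f}^x_N(t_y) - \mathbb{E}[\widehat{f}^x_N(t_y)] \right| > \epsilon \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right), \quad \text{for all } t_y \in \{t_1, \dots, t_{s_n}\}.$$
The value of this latter quantity is given by a straightforward adaptation of the proof of Lemma 2 in [1]. To do so, we consider the following decomposition: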

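$$\widehat{f}^x_N(t_y) = \underbrace{\frac{n^2 h_K^2\, \phi_x^2(h_K)}{n(n-1)\, \mathbb{E}[W_{12}(x)]}}_{T_1} \Bigg( \underbrace{\frac{1}{n} \sum_{j=1}^{n} \frac{K_j(x) H_j(t_y)}{h_H\, \phi_x(h_K)}}_{T_2}\ \underbrace{\frac{1}{n} \sum_{i=1}^{n} \frac{K_i(x) \beta_i^2(x)}{h_K^2\, \phi_x(h_K)}}_{T_3} - \underbrace{\frac{1}{n} \sum_{j=1}^{n} \frac{K_j(x) \beta_j(x) H_j(t_y)}{h_H\, h_K\, \phi_x(h_K)}}_{T_4}\ \underbrace{\frac{1}{n} \sum_{i=1}^{n} \frac{K_i(x) \beta_i(x)}{h_K\, \phi_x(h_K)}}_{T_5} \Bigg).$$
It follows that
$$\widehat{f}^x_N(t_y) - \mathbb{E}[\widehat{f}^x_N(t_y)] = T_1 \Big( \big( T_2 T_3 - \mathbb{E}[T_2 T_3] \big) - \big( T_4 T_5 - \mathbb{E}[T_4 T_5] \big) \Big).$$
Moreover, observe that
$$T_2 T_3 - \mathbb{E}[T_2 T_3] = (T_2 - \mathbb{E}[T_2])(T_3 - \mathbb{E}[T_3]) + (T_3 - \mathbb{E}[T_3])\, \mathbb{E}[T_2] + (T_2 - \mathbb{E}[T_2])\, \mathbb{E}[T_3] + \mathbb{E}[T_2]\mathbb{E}[T_3] - \mathbb{E}[T_2 T_3]$$
and, in the same way,
$$T_4 T_5 - \mathbb{E}[T_4 T_5] = (T_4 - \mathbb{E}[T_4])(T_5 - \mathbb{E}[T_5]) + (T_5 - \mathbb{E}[T_5])\, \mathbb{E}[T_4] + (T_4 - \mathbb{E}[T_4])\, \mathbb{E}[T_5] + \mathbb{E}[T_4]\mathbb{E}[T_5] - \mathbb{E}[T_4 T_5].$$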


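So, the claimed result will be obtained as soon as the following assertions have been checked:
$$\sum_{n} s_n\, \mathbb{P}\left\{ |T_i - \mathbb{E}[T_i]| > \epsilon \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right\} < \infty, \quad \text{for } i = 2, 3, 4, 5, \tag{9}$$
$$T_1 = O(1) \quad \text{and} \quad \mathbb{E}[T_i] = O(1) \quad \text{for } i = 2, 3, 4, 5, \tag{10}$$
and, almost completely,
$$\left| \mathbb{E}[T_2]\mathbb{E}[T_3] - \mathbb{E}[T_2 T_3] - \mathbb{E}[T_4]\mathbb{E}[T_5] + \mathbb{E}[T_4 T_5] \right| = o\left( \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right). \tag{11}$$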

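Proof of (9). For this aim, we use Bernstein's exponential inequality, for which the main point is to evaluate asymptotically the $m$-th order moment of
$$Z_i^{l,k} = \frac{1}{h_K^l\, h_H^k\, \phi_x(h_K)} \left( K_i(x) H_i^k(t_y) \beta_i^l(x) - \mathbb{E}\left[ K_i(x) H_i^k(t_y) \beta_i^l(x) \right] \right), \quad \text{for } l = 0, 1, 2 \text{ and } k = 0, 1.$$
Notice that, by Newton's binomial expansion, we obtain
$$\begin{aligned} \mathbb{E}\left[ \left( K_i(x) H_i^k(t_y) \beta_i^l(x) - \mathbb{E}\left[ K_i(x) H_i^k(t_y) \beta_i^l(x) \right] \right)^m \right] &= \mathbb{E}\left[ \sum_{d=0}^{m} C_m^d \left( K_i(x) H_i^k(t_y) \beta_i^l(x) \right)^d \left( \mathbb{E}\left[ K_i(x) H_i^k(t_y) \beta_i^l(x) \right] \right)^{m-d} (-1)^{m-d} \right] \\ &\le \sum_{d=0}^{m} C_m^d \left| \mathbb{E}\left[ \left( K_i(x) H_i^k(t_y) \beta_i^l(x) \right)^d \right] \right| \left| \mathbb{E}\left[ K_i(x) H_i^k(t_y) \beta_i^l(x) \right] \right|^{m-d} \\ &\le \sum_{d=0}^{m} C_m^d\, \mathbb{E}\left[ K_1^d(x) |\beta_1(x)|^{dl}\, \mathbb{E}\left[ H_1^{dk}(t_y) \mid X_1 \right] \right] \left| \mathbb{E}\left[ K_1(x) \beta_1^l(x)\, \mathbb{E}\left[ H_1^k(t_y) \mid X_1 \right] \right] \right|^{m-d}, \end{aligned}$$
where $C_m^d = m!/(d!(m-d)!)$.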


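Using the same arguments as those invoked in the proof of Lemma 3.3.2, and replacing $H$ by $H^d$, we show that, for all $d \le m$,
$$\mathbb{E}\left[ H_1^d(t_y) \mid X \right] = h_H \int_{\mathbb{R}} H^d(t)\, f^X(t_y - h_H t)\, dt.$$
So, conditions (H2) and (H5) allow us to write that
$$\mathbb{E}\left[ H_1^{dk}(t_y) \mid X \right] = O(h_H^k), \quad \text{for all } d \le m \text{ and } k = 0, 1.$$
Moreover, it is shown in [1] that
$$h_K^{-lm}\, \phi_x^{-m}(h_K) \sum_{d=0}^{m} C_m^d\, \mathbb{E}\left[ K_1^d(x) |\beta_1(x)|^{dl} \right] \left| \mathbb{E}\left[ K_1(x) \beta_1^l(x) \right] \right|^{m-d} \le C\, \phi_x(h_K)^{-m+1}.$$
Therefore, for $l = 0, 1, 2$ and $k = 0, 1$, we obtain that
$$\mathbb{E}\left[ \left| Z_i^{l,k} \right|^m \right] = O\left( \left( h_H^k\, \phi_x(h_K) \right)^{-m+1} \right).$$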


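Thus, to complete this proof, it suffices to use the classical Bernstein inequality (see Corollary A.8 in [11], page 234), first with $a_n = (h_H\, \phi_x(h_K))^{-1/2}$ to treat the terms $T_2$ and $T_4$, and second with $a_n = (\phi_x(h_K))^{-1/2}$ for the terms $T_3$ and $T_5$. In conclusion, we obtain, for all $\epsilon > 0$,
$$\mathbb{P}\left\{ |T_2 - \mathbb{E}[T_2]| > \epsilon \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right\} \le C' n^{-C\epsilon^2}, \qquad \mathbb{P}\left\{ |T_4 - \mathbb{E}[T_4]| > \epsilon \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right\} \le C' n^{-C\epsilon^2},$$
and, for $i = 3, 5$,
$$\mathbb{P}\left\{ |T_i - \mathbb{E}[T_i]| > \epsilon \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right\} \le \mathbb{P}\left\{ |T_i - \mathbb{E}[T_i]| > \epsilon \sqrt{\frac{\ln n}{n\, \phi_x(h_K)}} \right\} \le C' n^{-C\epsilon^2}.$$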

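Therefore, an appropriate choice of $\epsilon$ allows us to deduce that
$$s_n\, \mathbb{P}\left\{ |T_i - \mathbb{E}[T_i]| > \epsilon \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right\} \le C' n^{-1-\nu}, \quad \text{for some } \nu > 0 \text{ and } i = 2, 3, 4, 5,$$
which gives (9).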
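To give a feel for the exponential bounds used for (9), the Python sketch below compares Monte Carlo tail frequencies of a centred sum of bounded variables with a Bernstein bound. It uses the textbook Bernstein inequality for independent centred variables with $|Z_i| \le M$, namely $\mathbb{P}(|\sum_{i=1}^n Z_i| \ge t) \le 2 \exp\{-t^2 / (2(\sum_i \operatorname{var}(Z_i) + M t / 3))\}$, which is a closely related substitute for, not a transcription of, the moment-based corollary cited above. The Bernoulli variables with a small success probability $p$ are a hypothetical stand-in for indicators of small balls, whose variance is of the order of $\phi_x(h_K)$; the sample sizes and thresholds are arbitrary.

```python
import numpy as np


def bernstein_bound(t, n, var, M):
    """Classical Bernstein bound on P(|Z_1 + ... + Z_n| >= t) for independent centred |Z_i| <= M."""
    return 2.0 * np.exp(-(t ** 2) / (2.0 * (n * var + M * t / 3.0)))


# Z_i = B_i - p with B_i ~ Bernoulli(p): a small p mimics the indicator of a small ball
rng = np.random.default_rng(1)
n, reps, p = 200, 20000, 0.05
B = rng.random((reps, n)) < p
S = B.sum(axis=1) - n * p  # centred sums, one per replication
results = []
for t in (5.0, 10.0, 15.0):
    empirical = float(np.mean(np.abs(S) >= t))
    bound = float(bernstein_bound(t, n, p * (1.0 - p), 1.0))
    results.append((t, empirical, bound))
```

In this example the bound is conservative, but it decays exponentially in $t$, which is the feature exploited in the proof.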
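Proofs of (10) and (11). Notice that the first part of (10) has been treated in [1]. We now proceed to prove the second part of (10) and (11). For this aim, since the pairs $(X_i, Y_i)$, $i = 1, \dots, n$, are identically distributed, we obtain that
$$\mathbb{E}[T_2] = \frac{\mathbb{E}[K_1(x) H_1(t_y)]}{h_H\, \phi_x(h_K)}, \quad \mathbb{E}[T_3] = \frac{\mathbb{E}[K_1(x) \beta_1^2(x)]}{h_K^2\, \phi_x(h_K)}, \quad \mathbb{E}[T_4] = \frac{\mathbb{E}[K_1(x) H_1(t_y) \beta_1(x)]}{h_K\, h_H\, \phi_x(h_K)}, \quad \mathbb{E}[T_5] = \frac{\mathbb{E}[K_1(x) \beta_1(x)]}{h_K\, \phi_x(h_K)},$$
and
$$\mathbb{E}[T_2]\mathbb{E}[T_3] - \mathbb{E}[T_2 T_3] - \mathbb{E}[T_4]\mathbb{E}[T_5] + \mathbb{E}[T_4 T_5] = \left( 1 - \frac{n(n-1)}{n^2} \right) \frac{\mathbb{E}[K_1(x) \beta_1^2(x)]\, \mathbb{E}[K_1(x) H_1(t_y)] - \mathbb{E}[K_1(x) \beta_1(x)]\, \mathbb{E}[K_1(x) \beta_1(x) H_1(t_y)]}{h_H\, h_K^2\, \phi_x^2(h_K)}.$$
Thus, for both (10) and (11), we have to evaluate
$$\mathbb{E}\left[ K_i(x) H_i^k(t_y) \beta_i^l(x) \right], \quad \text{for } l = 0, 1, 2 \text{ and } k = 0, 1.$$
As previously, we condition on $X_1$ to show that, for all $l = 0, 1, 2$ and $k = 0, 1$,
$$\mathbb{E}\left[ K_i(x) H_i^k(t_y) \beta_i^l(x) \right] = O\left( h_H^k\, \mathbb{E}\left[ K_i(x) \beta_i^l(x) \right] \right),$$
and, by Lemma 3 in [1], we obtain that
$$\mathbb{E}\left[ K_i(x) H_i^k(t_y) \beta_i^l(x) \right] = O\left( h_H^k\, h_K^l\, \phi_x(h_K) \right). \tag{12}$$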


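Equality (12) leads directly to
$$\mathbb{E}[T_i] = O(1), \ \text{for } i = 2, 3, 4, 5, \quad \text{and} \quad \left| \mathbb{E}[T_2]\mathbb{E}[T_3] - \mathbb{E}[T_2 T_3] - \mathbb{E}[T_4]\mathbb{E}[T_5] + \mathbb{E}[T_4 T_5] \right| = O(n^{-1}),$$
which implies that
$$\mathbb{E}[T_2]\mathbb{E}[T_3] - \mathbb{E}[T_2 T_3] - \mathbb{E}[T_4]\mathbb{E}[T_5] + \mathbb{E}[T_4 T_5] = o\left( \sqrt{\frac{\ln n}{n\, h_H\, \phi_x(h_K)}} \right).$$
Finally, Lemma 3.3.3 is a direct consequence of assertions (7), (8), (9), (10) and (11).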

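Proof of Lemma 3.4.1. The proof of this lemma is based on the same kind of decomposition as the one used to prove Lemma 3.3.3. Indeed,
$$\widehat{f}^x_D = T_1 \Bigg( \underbrace{\frac{1}{n} \sum_{j=1}^{n} \frac{K_j(x)}{\phi_x(h_K)}}_{S_2(x)}\ \underbrace{\frac{1}{n} \sum_{i=1}^{n} \frac{K_i(x) \beta_i^2(x)}{h_K^2\, \phi_x(h_K)}}_{S_4(x)} - \underbrace{\frac{1}{n} \sum_{j=1}^{n} \frac{K_j(x) \beta_j(x)}{h_K\, \phi_x(h_K)}}_{S_3(x)}\ \underbrace{\frac{1}{n} \sum_{i=1}^{n} \frac{K_i(x) \beta_i(x)}{h_K\, \phi_x(h_K)}}_{S_3(x)} \Bigg),$$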
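and, in the same fashion, all that remains to show are the following uniform convergences:
$$\sup_{x \in S_{\mathcal{F}}} |S_k(x) - \mathbb{E}[S_k(x)]| = O\left( \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right), \quad \text{a.co., for } k = 2, 3, 4, \tag{13}$$
and
$$\sup_{x \in S_{\mathcal{F}}} \left| \mathbb{E}[S_2(x)]\mathbb{E}[S_4(x)] - \mathbb{E}[S_2(x) S_4(x)] + \operatorname{var}[S_3(x)] \right| = o\left( \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right), \quad \text{a.co.},$$
and also that, uniformly in $x \in S_{\mathcal{F}}$,
$$T_1 = O(1) \quad \text{and} \quad |\mathbb{E}[S_k(x)]| = O(1), \quad \text{for } k = 2, 3, 4.$$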
Clearly, the last two equations are direct consequences of assumption (U1) and of Lemma 3 in [1], while the proof of (13) follows the same ideas as in [9]. Indeed, denoting $j(x) = \arg\min_{j \in \{1, 2, \dots, d_n\}} |\delta(x, x_j)|$, we consider the following decomposition:
$$\sup_{x \in S_{\mathcal{F}}} |S_k(x) - \mathbb{E}[S_k(x)]| \le \underbrace{\sup_{x \in S_{\mathcal{F}}} |S_k(x) - S_k(x_{j(x)})|}_{F_1^k} + \underbrace{\sup_{x \in S_{\mathcal{F}}} |S_k(x_{j(x)}) - \mathbb{E}[S_k(x_{j(x)})]|}_{F_2^k} + \underbrace{\sup_{x \in S_{\mathcal{F}}} |\mathbb{E}[S_k(x_{j(x)})] - \mathbb{E}[S_k(x)]|}_{F_3^k}.$$

We then have to evaluate each term $F_j^k$, $j = 1, 2, 3$. Since $F_1^k$ and $F_3^k$ receive almost the same treatment, we consider the following two items.


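Treatment of the terms $F_1^k$ and $F_3^k$. Firstly, let us analyze the first term $F_1^k$ for $k = 2, 3, 4$. Since $K$ is supported in $[-1, 1]$, we can write, for all $k = 2, 3, 4$,
$$\begin{aligned} F_1^k &\le \sup_{x \in S_{\mathcal{F}}} \frac{1}{n\, h_K^{k-2}\, \phi(h_K)} \sum_{i=1}^{n} \left| K_i(x) \beta_i^{k-2}(x) \mathbf{1}_{B(x, h_K)}(X_i) - K_i(x_{j(x)}) \beta_i^{k-2}(x_{j(x)}) \mathbf{1}_{B(x_{j(x)}, h_K)}(X_i) \right| \\ &\le \frac{C(k-2)}{n\, h_K^{k-2}\, \phi(h_K)} \sup_{x \in S_{\mathcal{F}}} \sum_{i=1}^{n} K_i(x) \mathbf{1}_{B(x, h_K)}(X_i) \left| \beta_i^{k-2}(x) - \beta_i^{k-2}(x_{j(x)}) \mathbf{1}_{B(x_{j(x)}, h_K)}(X_i) \right| \\ &\quad + \frac{1}{n\, h_K^{k-2}\, \phi(h_K)} \sup_{x \in S_{\mathcal{F}}} \sum_{i=1}^{n} \left| \beta_i^{k-2}(x_{j(x)}) \right| \mathbf{1}_{B(x_{j(x)}, h_K)}(X_i) \left| K_i(x) \mathbf{1}_{B(x, h_K)}(X_i) - K_i(x_{j(x)}) \right|. \end{aligned}$$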

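The Lipschitz condition on $K$ allows us to write directly
$$\mathbf{1}_{B(x_{j(x)}, h_K)}(X_i) \left| K_i(x) \mathbf{1}_{B(x, h_K)}(X_i) - K_i(x_{j(x)}) \right| \le C\, \frac{r_n}{h_K}\, \mathbf{1}_{B(x, h_K) \cap B(x_{j(x)}, h_K)}(X_i) + C\, \mathbf{1}_{B(x_{j(x)}, h_K) \setminus B(x, h_K)}(X_i).$$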

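In a similar way, the Lipschitz condition on $\beta$ gives
$$\mathbf{1}_{B(x, h_K)}(X_i) \left| \beta_i(x) - \beta_i(x_{j(x)}) \mathbf{1}_{B(x_{j(x)}, h_K)}(X_i) \right| \le C\, r_n\, \mathbf{1}_{B(x, h_K) \cap B(x_{j(x)}, h_K)}(X_i) + C\, h_K\, \mathbf{1}_{B(x, h_K) \setminus B(x_{j(x)}, h_K)}(X_i)$$
and
$$\mathbf{1}_{B(x, h_K)}(X_i) \left| \beta_i^2(x) - \beta_i^2(x_{j(x)}) \mathbf{1}_{B(x_{j(x)}, h_K)}(X_i) \right| \le C\, r_n\, h_K\, \mathbf{1}_{B(x, h_K) \cap B(x_{j(x)}, h_K)}(X_i) + C\, h_K^2\, \mathbf{1}_{B(x, h_K) \setminus B(x_{j(x)}, h_K)}(X_i),$$
which implies that, for $k = 3, 4$,
$$\mathbf{1}_{B(x, h_K)}(X_i) \left| \beta_i^{k-2}(x) - \beta_i^{k-2}(x_{j(x)}) \mathbf{1}_{B(x_{j(x)}, h_K)}(X_i) \right| \le C\, r_n\, h_K^{k-3}\, \mathbf{1}_{B(x, h_K) \cap B(x_{j(x)}, h_K)}(X_i) + C\, h_K^{k-2}\, \mathbf{1}_{B(x, h_K) \setminus B(x_{j(x)}, h_K)}(X_i).$$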

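Thus,
$$F_1^k \le C \sup_{x \in S_{\mathcal{F}}} \left( F_{11}^k + F_{12} + F_{13}^k + F_{14} \right),$$
where
$$F_{11}^k = \frac{C(k-2)}{n\, \phi(h_K)} \sum_{i=1}^{n} \mathbf{1}_{B(x, h_K) \setminus B(x_{j(x)}, h_K)}(X_i), \qquad F_{12} = \frac{C\, r_n}{n\, h_K\, \phi(h_K)} \sum_{i=1}^{n} \mathbf{1}_{B(x, h_K) \cap B(x_{j(x)}, h_K)}(X_i),$$
$$F_{13}^k = \frac{C(k-2)\, r_n}{n\, h_K\, \phi(h_K)} \sum_{i=1}^{n} \mathbf{1}_{B(x, h_K) \cap B(x_{j(x)}, h_K)}(X_i), \qquad F_{14} = \frac{C}{n\, \phi(h_K)} \sum_{i=1}^{n} \mathbf{1}_{B(x_{j(x)}, h_K) \setminus B(x, h_K)}(X_i).$$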


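Now, to evaluate the terms $F_{11}^k$, $F_{12}$, $F_{13}^k$ and $F_{14}$, we apply a standard inequality for sums of bounded random variables (cf. Corollary A.9 in [11]), with $Z_i$ defined by
$$Z_i = \begin{cases} \dfrac{1}{\phi(h_K)} \sup_{x \in S_{\mathcal{F}}} \mathbf{1}_{B(x, h_K) \setminus B(x_{j(x)}, h_K)}(X_i) & \text{for } F_{11}^k, \\[2ex] \dfrac{r_n}{h_K\, \phi(h_K)} \sup_{x \in S_{\mathcal{F}}} \mathbf{1}_{B(x, h_K) \cap B(x_{j(x)}, h_K)}(X_i) & \text{for } F_{12} \text{ and } F_{13}^k, \\[2ex] \dfrac{1}{\phi(h_K)} \sup_{x \in S_{\mathcal{F}}} \mathbf{1}_{B(x_{j(x)}, h_K) \setminus B(x, h_K)}(X_i) & \text{for } F_{14}. \end{cases}$$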

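Clearly, under the second part of (U1), we have, for the first and the last cases,
$$Z_1 = O\left( \frac{1}{\phi(h_K)} \right), \quad \mathbb{E}[Z_1] = O\left( \frac{r_n}{\phi(h_K)} \right) \quad \text{and} \quad \operatorname{var}(Z_1) = O\left( \frac{r_n}{(\phi(h_K))^2} \right).$$
So we get
$$F_{11}^k = O\left( \frac{r_n}{\phi(h_K)} \right) + O_{a.co.}\left( \sqrt{\frac{r_n \ln n}{n\, (\phi(h_K))^2}} \right).$$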

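In the same way, assumption (U5) allows us to get, for the $F_{12}$ or $F_{13}^k$ case,
$$Z_1 = O\left( \frac{r_n}{h_K\, \phi(h_K)} \right), \quad \mathbb{E}[Z_1] = O\left( \frac{r_n}{h_K} \right) \quad \text{and} \quad \operatorname{var}(Z_1) = O\left( \frac{r_n^2}{h_K^2\, \phi(h_K)} \right),$$
which implies that
$$F_{12} = O_{a.co.}\left( \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right).$$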

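To complete the study of the term $F_1^k$, it suffices to put together all the intermediate results and to use (U5) to obtain
$$F_1^k = O_{a.co.}\left( \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right). \tag{14}$$
Furthermore, since
$$F_3^k \le \mathbb{E}\left[ \sup_{x \in S_{\mathcal{F}}} |S_k(x) - S_k(x_{j(x)})| \right],$$
we also have
$$F_3^k = O\left( \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right).$$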

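Treatment of the term $F_2^k$. For all $\eta > 0$, we have that
$$\begin{aligned} \mathbb{P}\left( F_2^k > \eta \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right) &= \mathbb{P}\left( \max_{j \in \{1, \dots, d_n\}} \left| S_k(x_j) - \mathbb{E}[S_k(x_j)] \right| > \eta \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right) \\ &\le d_n \max_{j \in \{1, \dots, d_n\}} \mathbb{P}\left( \left| S_k(x_j) - \mathbb{E}[S_k(x_j)] \right| > \eta \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right). \end{aligned}$$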


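Set, for $k = 2, 3, 4$,
$$\ell_{ki} = \frac{1}{h_K^{k-2}\, \phi(h_K)} \left( K_i(x_j) \beta_i^{k-2}(x_j) - \mathbb{E}\left[ K_i(x_j) \beta_i^{k-2}(x_j) \right] \right).$$
By using a similar proof as for Lemma 2 of [1], we get, for all $j = 1, \dots, d_n$ and $i = 1, \dots, n$,
$$\mathbb{E}|\ell_{ki}|^m = O\left( \phi(h_K)^{-m+1} \right), \quad \text{for } k = 2, 3, 4.$$
So one can apply a Bernstein-type inequality (cf. Corollary A.8 in [11]) to obtain directly
$$\mathbb{P}\left( \left| S_k(x_j) - \mathbb{E}[S_k(x_j)] \right| > \eta \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right) = \mathbb{P}\left( \frac{1}{n} \left| \sum_{i=1}^{n} \ell_{ki} \right| > \eta \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right) \le 2 \exp\{-C \eta^2 \ln d_n\}.$$
Thus, by choosing $\eta$ such that $C\eta^2 = \beta$, we get
$$d_n \max_{j \in \{1, \dots, d_n\}} \mathbb{P}\left( \left| S_k(x_j) - \mathbb{E}[S_k(x_j)] \right| > \eta \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right) \le C' d_n^{1-\beta}. \tag{15}$$
Since $\sum_{n=1}^{\infty} d_n^{1-\beta} < \infty$, we obtain that
$$F_2^k = O_{a.co.}\left( \sqrt{\frac{\ln d_n}{n\, \phi(h_K)}} \right).$$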

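Proof of Corollary 3.4.1. Clearly, we have that
$$\inf_{x \in S_{\mathcal{F}}} \widehat{f}^x_D < \frac{1}{2} \ \Longrightarrow\ \exists x \in S_{\mathcal{F}} \text{ such that } 1 - \widehat{f}^x_D > \frac{1}{2} \ \Longrightarrow\ \sup_{x \in S_{\mathcal{F}}} |1 - \widehat{f}^x_D| > \frac{1}{2}.$$
Hence
$$\mathbb{P}\left( \inf_{x \in S_{\mathcal{F}}} \widehat{f}^x_D < \frac{1}{2} \right) \le \mathbb{P}\left( \sup_{x \in S_{\mathcal{F}}} |1 - \widehat{f}^x_D| > \frac{1}{2} \right),$$
and, according to Lemma 3.4.1, the right-hand side is summable in $n$. Consequently,
$$\sum_{n=1}^{\infty} \mathbb{P}\left( \inf_{x \in S_{\mathcal{F}}} \widehat{f}^x_D < \frac{1}{2} \right) < \infty.$$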

Proof of Lemma 3.4.2. It suffices to combine the proofs of the previous lemmas and to use the Lipschitz condition uniformly in $(x, y) \in S_{\mathcal{F}} \times S_{\mathbb{R}}$.


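Proof of Lemma 3.4.3. The proof of this lemma follows the same steps as the proof of Lemma 3.4.1, where the terms $S_k(x)$ are replaced by
$$T_2^x(y) = \frac{1}{n} \sum_{j=1}^{n} \frac{K_j(x) H_j(y)}{h_H\, \phi_x(h_K)}, \quad T_3^x(y) = \frac{1}{n} \sum_{j=1}^{n} \frac{K_j(x) \beta_j(x) H_j(y)}{h_K\, h_H\, \phi_x(h_K)}, \quad T_4^x(y) = \frac{1}{n} \sum_{j=1}^{n} \frac{K_j(x) \beta_j^2(x) H_j(y)}{h_K^2\, h_H\, \phi_x(h_K)}.$$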

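To do so, we keep the notation used previously, namely the definitions of $j(x)$, $t_y$ and $l_n$. The proof is based on the following decomposition, which will be used for the three terms:
$$\begin{aligned} \sup_{x \in S_{\mathcal{F}}} \sup_{y \in S_{\mathbb{R}}} |T_i^x(y) - \mathbb{E}[T_i^x(y)]| &\le \underbrace{\sup_{x \in S_{\mathcal{F}}} \sup_{y \in S_{\mathbb{R}}} \left| T_i^x(y) - T_i^{x_{j(x)}}(y) \right|}_{E_1} + \underbrace{\sup_{x \in S_{\mathcal{F}}} \sup_{y \in S_{\mathbb{R}}} \left| T_i^{x_{j(x)}}(y) - T_i^{x_{j(x)}}(t_y) \right|}_{E_2} \\ &\quad + \underbrace{\sup_{x \in S_{\mathcal{F}}} \sup_{y \in S_{\mathbb{R}}} \left| T_i^{x_{j(x)}}(t_y) - \mathbb{E}[T_i^{x_{j(x)}}(t_y)] \right|}_{E_3} + \underbrace{\sup_{x \in S_{\mathcal{F}}} \sup_{y \in S_{\mathbb{R}}} \left| \mathbb{E}[T_i^{x_{j(x)}}(t_y)] - \mathbb{E}[T_i^{x_{j(x)}}(y)] \right|}_{E_4} \\ &\quad + \underbrace{\sup_{x \in S_{\mathcal{F}}} \sup_{y \in S_{\mathbb{R}}} \left| \mathbb{E}[T_i^{x_{j(x)}}(y)] - \mathbb{E}[T_i^x(y)] \right|}_{E_5}. \end{aligned}$$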

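Similarly to the study of the term $F_1^k$, we obtain
$$E_1 = O_{a.co.}\left( \sqrt{\frac{\ln d_n}{n^{1-\gamma}\, \phi(h_K)}} \right) \quad \text{and} \quad E_5 = O\left( \sqrt{\frac{\ln d_n}{n^{1-\gamma}\, \phi(h_K)}} \right). \tag{16}$$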

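Concerning the term $E_2$, by using the Lipschitz condition on the kernel $H$, one can write, with $l = i - 2$,
$$\left| T_i^{x_{j(x)}}(y) - T_i^{x_{j(x)}}(t_y) \right| \le \frac{C}{n\, h_K^l\, h_H\, \phi(h_K)} \sum_{i'=1}^{n} K_{i'}(x_{j(x)}) \left| \beta_{i'}^l(x_{j(x)}) \right| \left| H_{i'}(y) - H_{i'}(t_y) \right| \le C\, \frac{l_n}{h_H^2}\, S_i(x_{j(x)}),$$
where the $S_i(\cdot)$, $i = 2, 3, 4$, are defined and treated in the proof of Lemma 3.4.1. Thus, by using the facts that $\lim_{n \to +\infty} n^{\gamma} h_H = \infty$ and $l_n = n^{-\frac{3}{2}\gamma - \frac{1}{2}}$, we obtain
$$E_2 = O_{a.co.}\left( \sqrt{\frac{\ln d_n}{n^{1-\gamma}\, \phi(h_K)}} \right) \quad \text{and} \quad E_4 = O\left( \sqrt{\frac{\ln d_n}{n^{1-\gamma}\, \phi(h_K)}} \right). \tag{17}$$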

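Finally, for the term $E_3$, we use the same arguments as for Lemma 3.4.1 to show that, for all $\eta > 0$,
$$\begin{aligned} \mathbb{P}\left( E_3 > \eta \sqrt{\frac{\ln d_n}{n\, h_H\, \phi(h_K)}} \right) &= \mathbb{P}\left( \max_{j \in \{1, \dots, s_n\}} \max_{k \in \{1, \dots, d_n\}} \left| T_i^{x_k}(t_j) - \mathbb{E}[T_i^{x_k}(t_j)] \right| > \eta \sqrt{\frac{\ln d_n}{n\, h_H\, \phi(h_K)}} \right) \\ &\le s_n\, d_n \max_{j \in \{1, \dots, s_n\}} \max_{k \in \{1, \dots, d_n\}} \mathbb{P}\left( \left| T_i^{x_k}(t_j) - \mathbb{E}[T_i^{x_k}(t_j)] \right| > \eta \sqrt{\frac{\ln d_n}{n\, h_H\, \phi(h_K)}} \right). \end{aligned}$$
This last probability can be treated by using the classical Bernstein inequality with $a_n = (h_H\, \phi_x(h_K))^{-1/2}$. Recall that the choice of $a_n$ is motivated by the $m$-th order moment of $Z_i^{l,k}$ computed in the proof of Lemma 3.3.3. This finally allows us to write
$$\forall j \le s_n, \quad \mathbb{P}\left( \left| T_i^{x_k}(t_j) - \mathbb{E}[T_i^{x_k}(t_j)] \right| > \eta \sqrt{\frac{\ln d_n}{n\, h_H\, \phi(h_K)}} \right) \le 2 \exp\{-C \eta^2 \ln d_n\}.$$
Therefore, since $s_n = O(l_n^{-1}) = O\left( n^{\frac{3}{2}\gamma + \frac{1}{2}} \right)$, and by choosing $\eta$ such that $C\eta^2 = \beta$, one has
$$s_n\, d_n \max_{j \in \{1, \dots, s_n\}} \max_{k \in \{1, \dots, d_n\}} \mathbb{P}\left( \left| T_i^{x_k}(t_j) - \mathbb{E}[T_i^{x_k}(t_j)] \right| > \eta \sqrt{\frac{\ln d_n}{n\, h_H\, \phi(h_K)}} \right) \le C' s_n\, d_n^{1-\beta}.$$
By using the fact that $\lim_{n \to +\infty} n^{\gamma} h_H = \infty$ and the second part of condition (U5), one obtains
$$E_3 = O_{a.co.}\left( \sqrt{\frac{\ln d_n}{n^{1-\gamma}\, \phi(h_K)}} \right). \tag{18}$$
Thus, the result of Lemma 3.4.3 can easily be deduced from (16), (17) and (18).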

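Proof of Corollary 3.5.1. By a simple manipulation, we show that
$$\left| f^x(\widehat{\theta}(x)) - f^x(\theta(x)) \right| \le 2 \sup_{y \in S_{\mathbb{R}}} |\widehat{f}^x(y) - f^x(y)|. \tag{19}$$
We use the following Taylor expansion of the function $f^x$:
$$f^x(\widehat{\theta}(x)) = f^x(\theta(x)) + \frac{1}{j!}\, f^{x(j)}(\theta^*(x)) \left( \widehat{\theta}(x) - \theta(x) \right)^j,$$
for some $\theta^*(x)$ between $\theta(x)$ and $\widehat{\theta}(x)$. It is clear that, from condition (U6), (19) and Theorem 3.4.1, we obtain that
$$\sup_{x \in S_{\mathcal{F}}} |\widehat{\theta}(x) - \theta(x)| \to 0, \quad \text{a.co.}$$
Next, under condition (U7), we obtain that
$$\sup_{x \in S_{\mathcal{F}}} \left| f^{x(j)}(\theta^*(x)) - f^{x(j)}(\theta(x)) \right| \to 0, \quad \text{a.co.}$$
Consequently, we can find $c_0 > 0$ such that
$$\sum_{n=1}^{\infty} \mathbb{P}\left( \inf_{x \in S_{\mathcal{F}}} \left| f^{x(j)}(\theta^*(x)) \right| < c_0 \right) < \infty,$$
and we have
$$\sup_{x \in S_{\mathcal{F}}} |\widehat{\theta}(x) - \theta(x)|^j \le C \sup_{x \in S_{\mathcal{F}}} \sup_{y \in S_{\mathbb{R}}} |\widehat{f}^x(y) - f^x(y)|, \quad \text{a.co.}$$
So the claimed result is a direct consequence of this last inequality together with Theorem 3.4.1.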


Acknowledgment: The authors would like to thank the Editor, an Associate Editor and an anonymous reviewer for their valuable comments and suggestions, which substantially improved the quality of an earlier version of this paper.

Bibliography

[1] Barrientos-Marin, J. (2007). Some Practical Problems of Recent Nonparametric Procedures: Testing, Estimation, and Application. PhD thesis, University of Alicante (Spain).
[2] Barrientos-Marin, J., Ferraty, F. and Vieu, P. (2010). Locally Modelled Regression and Functional Data. J. of Nonparametric Statistics, 22, No. 5, pages 617-632.
[3] Benhenni, K., Ferraty, F., Rachdi, M. and Vieu, P. (2007). Local smoothing regression with functional data. Computational Statistics, 22, No. 3, pages 353-369.
[4] Bosq, D. (2000). Linear Processes in Function Spaces: Theory and Applications. Lecture Notes in Statistics, 149, Springer.
[5] Baíllo, A. and Grané, A. (2009). Local linear regression for functional predictor and scalar response. Journal of Multivariate Analysis, 100, pages 102-111.
[6] Cai, T.-T. and Hall, P. (2006). Prediction in functional linear regression. Annals of Statistics, 34, pages 2159-2179.
[7] Chu, C.-K. and Marron, J.-S. (1991). Choosing a kernel regression estimator. With comments and a rejoinder by the authors. Statist. Sci., 6, pages 404-436.
[8] Ezzahrioui, M. and Ould-Saïd, E. (2008). Asymptotic normality of a nonparametric estimator of the conditional mode function for functional data. J. Nonparametr. Stat., 20, pages 3-18.
[9] Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc., 87, pages 998-1004.
[10] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and its Applications. London, Chapman & Hall.
[11] Fan, J. and Yim, T.-H. (2004). A cross-validation method for estimating conditional densities. Biometrika, 91, pages 819-834.
[12] Ferraty, F., Laksaci, A., Tadj, A. and Vieu, P. (2010). Rate of uniform consistency for nonparametric estimates with functional variables. Journal of Statistical Planning and Inference, 140, pages 335-352.
[13] Ferraty, F., Laksaci, A. and Vieu, P. (2005). Functional time series prediction via conditional mode. C. R. Math. Acad. Sci. Paris, 340, pages 389-392.
[14] Ferraty, F., Laksaci, A. and Vieu, P. (2006). Estimating some characteristics of the conditional distribution in nonparametric functional models. Stat. Inference Stoch. Process., 9, pages 47-76.
[15] Ferraty, F. and Vieu, P. (2006). Nonparametric functional data analysis. Theory and Practice. Springer Series in Statistics, New York.
[16] Ferraty, F., Van Keilegom, I. and Vieu, P. (2008). On the validity of the bootstrap in nonparametric functional regression. Scandinavian J. of Statist., 37, No. 2, pages 286-306.
[17] Hyndman, R. and Yao, Q. (2002). Nonparametric estimation and symmetry tests for conditional density functions. J. Nonparametr. Stat., 14, pages 259-278.
[18] Müller, H.-G. and Stadtmüller, U. (2005). Generalized functional linear models. Ann. Stat., 33, No. 2, pages 774-805.
[19] Ould-Saïd, E. and Cai, Z. (2005). Strong uniform consistency of nonparametric estimation of the censored conditional mode function. J. Nonparametr. Stat., 17, pages 797-806.
[20] Rachdi, M. and Vieu, P. (2007). Nonparametric regression for functional data: automatic smoothing parameter selection. Journal of Statistical Planning and Inference, 137, pages 2784-2801.
[21] Ramsay, J.-O. and Silverman, B.-W. (1997). Functional data analysis. Springer Series in Statistics, New York.
[22] Ramsay, J. O. and Silverman, B. W. (2002). Applied functional data analysis. Methods and case studies. Springer Series in Statistics, New York.
[23] Sarda, P. and Vieu, P. (2000). Kernel Regression. Pages 43-70, Wiley, New York.
[24] Vieu, P. (1996). A note on density mode estimation. Statist. Probab. Lett., 26, pages 297-307.
[25] Youndjé, E. (1993). Estimation non paramétrique de la densité conditionnelle par la méthode du noyau. PhD thesis (in French), Rouen University.

Chapter 4
A fast functional locally modeled of the conditional density and mode in functional time series

Jacques Demongeot¹, Ali Laksaci², Fethi Madani¹,³ and Mustapha Rachdi⁴

Recent Advances in Functional Data Analysis and Related Topics, Contributions to Statistics, Physica-Verlag/Springer, 2011, 85-90, DOI: 10.1007/978-3-7908-2736-1_13

Abstract. In this paper, we study the asymptotic behavior of the nonparametric local linear estimation of the conditional density of a scalar response variable given a random variable taking values in a semi-metric space. Under some general conditions on the mixing property of the data, we establish the pointwise almost complete convergence, with rates, of this estimator. This approach can be applied in time series analysis to the prediction problem via conditional mode estimation.

Keywords and Phrases. Functional data, Local linear estimator, Conditional density, Conditional mode, Nonparametric model, Small balls probability, Mixing data

AMS Subject Classification. Primary: 62G05, Secondary: 62G07, 62G08, 62G35, 62G20.

1. Laboratoire TIMC-IMAG, UMR CNRS 5525, Equipe AGIM, Faculté de Médecine de Grenoble, Université J. Fourier, 38700 La Tronche, France. E-mail: Jacques.Demongeot@imag.fr
2. Université Djillali Liabès, BP. 89, Sidi Bel-Abbès 22000, Algeria. E-mail: alilak@yahoo.fr
3. Corresponding author: fethi_madani@yahoo.fr and Fethi.Madani@imag.fr
4. Laboratoire TIMC-IMAG, UMR CNRS 5525, Equipe AGIM, Université de Grenoble 2, UFR SHS, BP. 47, 38040 Grenoble Cedex 09, France. E-mail: Mustapha.Rachdi@upmf-grenoble.fr

4.1 Introduction

Let $(X_i, Y_i)$, for $i = 1, \dots, n$, be $n$ pairs of random variables that we assume drawn from the pair $(X, Y)$, which is valued in $\mathcal{F} \times \mathbb{R}$, where $\mathcal{F}$ is a semi-metric space equipped with a semi-metric $d$. Assume that there exists a regular version of the conditional probability of $Y$ given $X = x$, for a fixed $x \in \mathcal{F}$, which is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$ and has a bounded density, denoted by $f^x$. In this paper, we consider the problem of conditional density estimation by a local modeling approach when the explanatory variable $X$ is functional and when the observations $(Y_i, X_i)_{i \in \mathbb{N}}$ are strongly mixing. In functional statistics, there are several ways of extending the local linear ideas. Here, we adopt the fast functional local modeling introduced by [2] for regression analysis; that is, we estimate the conditional density by $\widehat{a}$, which is obtained by minimizing the following quantity:
$$\min_{(a, b) \in \mathbb{R}^2} \sum_{i=1}^{n} \left( h_H^{-1} H\left( h_H^{-1}(y - Y_i) \right) - a - b\, \beta(X_i, x) \right)^2 K\left( h_K^{-1} \delta(x, X_i) \right),$$
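where $\beta(\cdot, \cdot)$ is a known function from $\mathcal{F}^2$ into $\mathbb{R}$ such that $\forall \xi \in \mathcal{F}$, $\beta(\xi, \xi) = 0$, $K$ and $H$ are kernels, $h_K = h_{K,n}$ (resp. $h_H = h_{H,n}$) is a sequence of positive real numbers, and $\delta(\cdot, \cdot)$ is a function of $\mathcal{F} \times \mathcal{F}$. Clearly, by simple algebra, we explicitly get the following definition of $\widehat{f}^x$:
$$\widehat{f}^x(y) = \frac{\sum_{i,j=1}^{n} W_{ij}(x)\, H\left( h_H^{-1}(y - Y_j) \right)}{h_H \sum_{i,j=1}^{n} W_{ij}(x)},$$
where
$$W_{ij}(x) = \beta(X_i, x) \left( \beta(X_i, x) - \beta(X_j, x) \right) K\left( h_K^{-1} \delta(x, X_i) \right) K\left( h_K^{-1} \delta(x, X_j) \right),$$
with the convention $0/0 = 0$.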
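The "simple algebra" leading from the minimization problem to the explicit form of $\widehat{f}^x$ can be checked numerically. The Python sketch below solves the weighted least-squares problem for $(a, b)$ with `numpy.linalg.lstsq` and compares $\widehat{a}$ with the double-sum formula; random numbers stand in for $\beta(X_i, x)$ and $K(h_K^{-1}\delta(x, X_i))$, a Gaussian $H$ is assumed, and the function names are hypothetical. Both computations pair the response $H(h_H^{-1}(y - Y_j))$ with the index $j$ of $W_{ij}(x)$, which is the pairing that the least-squares solution produces.

```python
import numpy as np


def ll_closed_form(beta, K, Z):
    """a_hat = sum_{i,j} W_ij Z_j / sum_{i,j} W_ij, with W_ij = beta_i (beta_i - beta_j) K_i K_j."""
    W = beta[:, None] * (beta[:, None] - beta[None, :]) * K[:, None] * K[None, :]
    den = W.sum()
    return 0.0 if den == 0.0 else float((W @ Z).sum() / den)  # convention 0/0 = 0


def ll_weighted_lstsq(beta, K, Z):
    """a_hat from min_{a,b} sum_i (Z_i - a - b beta_i)^2 K_i, solved as weighted least squares."""
    design = np.column_stack([np.ones_like(beta), beta])
    sw = np.sqrt(K)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], Z * sw, rcond=None)
    return float(coef[0])


rng = np.random.default_rng(2)
n, h_H, y = 50, 0.3, 0.2
beta = rng.normal(size=n)  # stands in for beta(X_i, x)
K = rng.uniform(0.0, 1.0, size=n)  # stands in for K(h_K^{-1} delta(x, X_i))
Y = rng.normal(size=n)
# Z_i = h_H^{-1} H(h_H^{-1}(y - Y_i)) with a Gaussian kernel H
Z = np.exp(-0.5 * ((y - Y) / h_H) ** 2) / np.sqrt(2.0 * np.pi) / h_H
a_closed = ll_closed_form(beta, K, Z)
a_lstsq = ll_weighted_lstsq(beta, K, Z)
```

Expanding the double sum gives the familiar weighted least-squares intercept $\widehat{a} = \left( \sum_i K_i \beta_i^2 \sum_j K_j Z_j - \sum_i K_i \beta_i \sum_j K_j \beta_j Z_j \right) / \left( \sum_i K_i \sum_i K_i \beta_i^2 - (\sum_i K_i \beta_i)^2 \right)$, with $Z_j = h_H^{-1} H(h_H^{-1}(y - Y_j))$, which is why the two functions agree.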


Notice that the local linear estimator of the conditional density has been widely studied when the explanatory variable lies in a finite-dimensional space (cf. for instance [12, 13, 19, 24]). This estimation method has various advantages over the kernel method; namely, it has better bias properties than the kernel method (cf. [4, 6] for an extensive discussion comparing both methods). Moreover, it is known that the kernel method can be viewed as a particular case of this method.
Nowadays, the progress of computing tools permits the collection of increasingly bulky data. These large data sets are available essentially through real-time monitoring, and computers can manage such databases. The object of statistical study can then be curves (consecutive discrete recordings are aggregated and viewed as sampled values of a random curve) rather than numbers or vectors. Functional data analysis (FDA) (cf. [5], [11] and [30]) can help to analyze such high-dimensional data sets. In this area, the conditional density is an important model for studying the association between functional covariates and scalar responses. In particular, several prediction tools in nonparametric statistics, such as the conditional mode, the conditional median or the conditional quantiles, are based on a preliminary estimate of this functional parameter. Since the contribution of [10], this subject has been widely studied in functional statistics (see [8, 10, 9, 12] for some results on the estimation of the conditional density and its mode in the functional framework). All these works give asymptotic results on the Nadaraya-Watson type estimator of this model. As a direct extension of this method, we focus here on estimation by the local polynomial method. Such an estimation procedure has recently been investigated for functional data. For example, we quote the works [1, 2, 2, 5, 21, 22], which are concerned with local linear type estimation of the regression operator for independent and identically distributed functional data.


In this work, we introduce the local linear nonparametric estimation of the conditional density and its mode for strongly mixing functional data. To the best of our knowledge, local linear nonparametric estimation in functional time series has not been addressed so far, and the current work is the first contribution on this topic. As an asymptotic result, we establish, under some general conditions, the almost complete convergence rate of our estimate. The interest of this work comes mainly from the fact that the main fields of application of functional statistical methods relate to the analysis of continuous-time stochastic processes. Our study, for instance, can be applied to predict future values of some process by cutting the whole past of this process into continuous paths.
The organization of the remainder of the paper is as follows : The following Section is
dedicated to xing notations, hypotheses and the presentation of the main results. Section
3 is concentrated on some conclusions, discussions and applications of our study. The proofs
of the auxiliary results are relegated to the Appendix.

4.2

Main results

We begin by recalling the definition of the strong mixing property. For this, we introduce the following notation. Let $\mathcal{F}_i^k(Z)$ denote the $\sigma$-algebra generated by $\{Z_j,\ i\le j\le k\}$.

Definition 4.2.1. Let $\{Z_i,\ i=1,2,\dots\}$ be a strictly stationary sequence of random variables. Given a positive integer $n$, set
$$\alpha(n)=\sup\Big\{|\mathbb{P}(A\cap B)-\mathbb{P}(A)\mathbb{P}(B)| : A\in\mathcal{F}_1^k(Z)\ \text{and}\ B\in\mathcal{F}_{k+n}^{\infty}(Z),\ k\in\mathbb{N}\Big\}.$$
The sequence is said to be $\alpha$-mixing (strong mixing) if the mixing coefficient $\alpha(n)\to0$ as $n\to\infty$.

There exist many processes fulfilling the strong mixing property. For instance, the usual ARMA processes (with innovations satisfying some existing moment conditions) are geometrically strongly mixing, i.e., there exist $\rho\in(0,1)$ and $C>0$ such that, for any $n\ge1$, $\alpha(n)\le C\rho^n$ (see, e.g., Jones (1978)). The threshold models, the EXPAR models (see Ozaki (1979)), the simple ARCH models (see Engle (1982)), their GARCH extension (see Bollerslev (1986)) and the bilinear Markovian models are geometrically strongly mixing under some general ergodicity conditions. For more details, we refer the reader to the monographs of Bradley (2007) or Dedecker et al. (2007).
Throughout the chapter, $x$ denotes a fixed point in $\mathcal{F}$, $N_x$ denotes a fixed neighborhood of $x$, and $f^{x(j)}$ denotes the $j$-th order derivative of the conditional density $f^x$. $S$ will be a fixed compact subset of $\mathbb{R}$, $B(x,r)=\{x'\in\mathcal{F} : |\delta(x',x)|\le r\}$ and $\phi_x(r_1,r_2)=\mathbb{P}(r_2\le\delta(X,x)\le r_1)$.
Notice that our nonparametric model will be quite general, in the sense that we will only need the following assumptions:
(H1) For any $r>0$, $\phi_x(r):=\phi_x(r,-r)>0$.
(H2) The conditional density $f^x$ is such that there exist $b_1>0$ and $b_2>0$ such that, for all $(y_1,y_2)\in S^2$ and $(x_1,x_2)\in N_x\times N_x$,
$$|f^{x_1}(y_1)-f^{x_2}(y_2)|\le C_x\big(|\delta(x_1,x_2)|^{b_1}+|y_1-y_2|^{b_2}\big),$$
where $C_x$ is a positive constant depending on $x$.

90 4.

A fast functional locally modeled of the conditional density and mode in functional time series

(H3) The function $\beta(\cdot,\cdot)$ is such that
$$\forall x'\in\mathcal{F},\quad C_1|\delta(x,x')|\le|\beta(x,x')|\le C_2|\delta(x,x')|,\quad\text{where } C_1>0,\ C_2>0.$$
(H4) The sequence $(X_i,Y_i)_{i\in\mathbb{N}}$ satisfies: $\exists a>0$, $\exists c>0$ such that $\forall n\in\mathbb{N}$, $\alpha(n)\le c\,n^{-a}$, and
$$\max_{i\ne j}\mathbb{P}\big((X_i,X_j)\in B(x,h)\times B(x,h)\big)=\psi_x(h)>0.$$
(H5) The conditional density of $(Y_i,Y_j)$ given $(X_i,X_j)$ exists and is bounded.
(H6) $K$ is a positive, differentiable function with support $[-1,1]$.
(H7) $H$ is a positive, bounded, Lipschitz continuous function such that
$$\int|t|^{b_2}H(t)\,dt<\infty\quad\text{and}\quad\int H^2(t)\,dt<\infty.$$
(H8) The bandwidth $h_K$ satisfies: there exists an integer $n_0$ such that
$$\forall n>n_0,\quad-\frac{1}{\phi_x(h_K)}\int_{-1}^{1}\phi_x(zh_K,h_K)\,\frac{d}{dz}\big(z^2K(z)\big)\,dz>C_3>0$$
and
$$h_K\int_{B(x,h_K)}\beta(u,x)\,dP(u)=o\Big(\int_{B(x,h_K)}\beta^2(u,x)\,dP(u)\Big),$$
where $dP(x)$ is the cumulative distribution.


(H9) $\lim_{n\to\infty}h_H=0$ and $\exists\beta_1>0$ such that $\lim_{n\to\infty}n^{\beta_1}h_H=\infty$,
$$\lim_{n\to\infty}h_K=0,\qquad\lim_{n\to\infty}\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}=0.$$
(3a)
(H10)
+
1/2
1 +1
and 0 > 3a+1
, Cn (a+1) 0 hH x (hK )

where $\chi_x(h)=\max\big(\phi_x^2(h),\psi_x(h)\big)$.


The following theorem gives the almost complete convergence (a.co., see footnote 5) of $\widehat f^x$.

Theorem 4.2.1. Under assumptions (H1)-(H10), we have:
$$\sup_{y\in S}|\widehat f^x(y)-f^x(y)|=O\big(h_K^{b_1}+h_H^{b_2}\big)+O\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right),\quad a.co.$$

The proof of Theorem 4.2.1 is a direct consequence of the decomposition
$$\forall y\in S,\quad\widehat f^x(y)-f^x(y)=\frac{1}{\widehat f_D^x}\Big\{\big(\widehat f_N^x(y)-\mathbb{E}[\widehat f_N^x(y)]\big)-\big(f^x(y)-\mathbb{E}[\widehat f_N^x(y)]\big)\Big\}+\frac{f^x(y)}{\widehat f_D^x}\big(1-\widehat f_D^x\big),\qquad(1)$$
where
$$\widehat f_N^x(y)=\frac{1}{n(n-1)h_H\,\mathbb{E}[W_{12}(x)]}\sum_{i\ne j}W_{ij}(x)\,H\big(h_H^{-1}(y-Y_j)\big)$$
and
$$\widehat f_D^x=\frac{1}{n(n-1)\,\mathbb{E}[W_{12}(x)]}\sum_{i\ne j}W_{ij}(x),$$
and of Lemmas 4.2.1, 4.2.2 and 4.2.3 below, whose proofs are given in the Appendix.

5. Let $(z_n)_{n\in\mathbb{N}}$ be a sequence of real r.v.'s; we say that $z_n$ converges almost completely (a.co.) to zero if, and only if, $\forall\epsilon>0$, $\sum_{n=1}^{\infty}\mathbb{P}(|z_n|>\epsilon)<\infty$. Moreover, let $(u_n)_{n\in\mathbb{N}}$ be a sequence of positive real numbers; we say that $z_n=O(u_n)$ a.co. if, and only if, $\exists\epsilon>0$, $\sum_{n=1}^{\infty}\mathbb{P}(|z_n|>\epsilon u_n)<\infty$. This kind of convergence implies both almost sure convergence and convergence in probability (cf. Sarda and Vieu (2000) for details).
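A minimal numerical sketch of the estimator $\widehat f^x(y)=\widehat f_N^x(y)/\widehat f_D^x$ may help fix ideas; note that $\mathbb{E}[W_{12}(x)]$ cancels in the ratio. It assumes the local linear weights $W_{ij}(x)=\beta_i(x)\big(\beta_i(x)-\beta_j(x)\big)K_i(x)K_j(x)$, with $\beta_i(x)=\beta(X_i,x)$ and $K_i(x)=K(h_K^{-1}\delta(x,X_i))$, and applies $H$ to $Y_j$. The Epanechnikov kernel for $K$, the Gaussian kernel for $H$, the simulated curves and the simplification $\beta=\delta$ (a signed projection onto $\sin(2\pi t)$) are hypothetical illustrations, not choices made in the text.

```python
import numpy as np

def epanechnikov(u):
    # Hypothetical K: positive on (-1, 1), zero outside, support [-1, 1] as in (H6)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def gaussian(t):
    # Hypothetical H: positive, bounded and Lipschitz, as required by (H7)
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def local_linear_cond_density(y, delta, beta, Y, h_K, h_H):
    """f_hat^x(y) = sum_{i,j} W_ij H((y - Y_j)/h_H) / (h_H * sum_{i,j} W_ij).

    delta[i] = delta(x, X_i) and beta[i] = beta(X_i, x) are precomputed for the
    fixed curve x (hypothetical inputs; their construction is left to the user).
    """
    K = epanechnikov(delta / h_K)
    # W_ij = beta_i (beta_i - beta_j) K_i K_j; the diagonal i = j vanishes.
    W = (beta * K)[:, None] * (beta[:, None] - beta[None, :]) * K[None, :]
    w = W.sum(axis=0)                      # total weight attached to Y_j
    if w.sum() <= 0:
        raise ValueError("no usable curves in the h_K-neighbourhood of x")
    y = np.atleast_1d(y)
    Hmat = gaussian((y[:, None] - Y[None, :]) / h_H)
    return Hmat @ w / (h_H * w.sum())

# Illustration on simulated curves X_i(t) = A_i sin(2 pi t), Y_i = A_i + noise.
rng = np.random.default_rng(0)
n = 400
t = np.linspace(0.0, 1.0, 101)
basis = np.sin(2.0 * np.pi * t)
A = rng.uniform(-2.0, 2.0, n)
X = A[:, None] * basis[None, :]
Y = A + 0.3 * rng.standard_normal(n)
x = 0.5 * basis                            # the fixed curve x
proj = (X - x[None, :]) @ basis / t.size   # hypothetical signed delta(x, X_i) = beta(X_i, x)

ygrid = np.linspace(-6.0, 7.0, 2001)
dens = local_linear_cond_density(ygrid, proj, proj, Y, h_K=0.3, h_H=0.3)
mass = dens.sum() * (ygrid[1] - ygrid[0])  # Riemann sum of f_hat^x over the grid
mode = ygrid[np.argmax(dens)]              # grid arg sup, i.e. an estimated conditional mode
print(f"total mass ~ {mass:.4f}, estimated conditional mode ~ {mode:.2f}")
```

Because the weights are normalized by their sum and $H$ integrates to one, the estimate always has total mass one, even though individual local linear weights may be negative.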

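Lemma 4.2.1. Under assumptions (H1), (H3), (H4), (H6), (H8) and (H10), we have
$$1-\widehat f_D^x=O\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,\phi_x^2(h_K)}}\right),\quad a.co.,$$
and
$$\exists\tau>0\ \text{such that}\ \sum_{n=1}^{\infty}\mathbb{P}\big(\widehat f_D^x<\tau\big)<\infty.$$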

Lemma 4.2.2. Under assumptions (H1), (H2) and (H7), we obtain
$$\sup_{y\in S}\big|f^x(y)-\mathbb{E}[\widehat f_N^x(y)]\big|=O\big(h_K^{b_1}+h_H^{b_2}\big).$$

Lemma 4.2.3. Under the assumptions of Theorem 4.2.1, we get
$$\sup_{y\in S}\big|\widehat f_N^x(y)-\mathbb{E}[\widehat f_N^x(y)]\big|=O\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right),\quad a.co.$$

4.3

Concluding remarks

On the assumptions
The hypotheses used in this work are not unduly restrictive, and they are rather classical in the setting of nonparametric functional statistics. Indeed, the conditions (H1), (H3), (H6) and (H8) are the same as those used by [?]. Specifically, (H1) is needed to deal with the functional nonparametric nature of our model by controlling the concentration properties of the probability measure of the variable $X$. The latter is quantified here with respect to the bi-functional operator $\delta$, which can be related to the topological structure on the functional space $\mathcal{F}$ by taking $d=|\delta|$. (H3) is a mild regularity condition that permits control of the shape of the locating function $\beta$. Such a condition is verified, for instance, if we take $\beta=\delta$. However, as pointed out in [?], the choice $\beta=\delta$ is not very adequate in practice, because these bi-functional operators do not play similar roles. We refer to [?] for more discussion of these conditions and some examples of $\beta$ and $\delta$. As usual in nonparametric problems, the infinite dimension of the model is controlled by means of a smoothness condition (H2). This condition is needed to evaluate the bias component of the rates of convergence. The first part of (H4) is a standard choice for the mixing coefficient in time series, while the second part of this condition measures the local dependence of the observations. Let us note that this last quantity appears in the expression of the convergence rate. Assumptions (H7) and (H9) are standard technical conditions in nonparametric estimation. They are imposed for the sake of simplicity and brevity of the proofs.

Some particular cases

The Nadaraya-Watson case
As a first step, we look at what happens if $b=0$. It is clear that, in this particular case, the conditions (H3) and (H8) are not necessary to get our result, and Theorem 4.2.1 can be reformulated in the following way.


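Corollary 4.3.1. Under assumptions (H1), (H2), (H4)-(H7) and (H9)-(H10), we have
$$\sup_{y\in S}\big|\widehat f^x_{NW}(y)-f^x(y)\big|=O\big(h_K^{b_1}+h_H^{b_2}\big)+O\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right),\quad a.co.,$$
where $\widehat f^x_{NW}(y)$ denotes the Nadaraya-Watson (kernel) estimator of $f^x(y)$.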

The multivariate case
In the vectorial case, when $\mathcal{F}=\mathbb{R}^p$, $p\ge1$, if the probability density of the random variable $X$ (resp. the joint density of $(X_i,X_j)$), denoted by $f$ (resp. by $f_{i,j}$), is of class $C^1$, then $\phi_x(h)=O(h^p)$ and $\psi_x(h)=O(h^{2p})$, which implies that $\chi_x(h)=O(h^{2p})$. Then our theorem leads straightforwardly to the next corollary.

Corollary 4.3.2. Under assumptions (H2), (H3) and (H5)-(H10), we have
$$\sup_{y\in S}\big|\widehat f^x(y)-f^x(y)\big|=O\big(h_K^{b_1}+h_H^{b_2}\big)+O\left(\sqrt{\frac{\log n}{n\,h_H\,h_K^p}}\right),\quad a.co.$$
We point out that, in the special case when $\mathcal{F}=\mathbb{R}$, our estimate is identified with the estimator of [12] by taking $\beta(x,X)=X-x$ and $\delta(x,X)=x-X$.

The independent case
In this situation, the conditions (H4), (H5) and the last part of (H10) are automatically verified and $\psi_x(h)=\chi_x(h)=\phi_x^2(h)$. So, we obtain the following result.

Corollary 4.3.3. Under assumptions (H1)-(H3) and (H6)-(H9), we have
$$\sup_{y\in S}\big|\widehat f^x(y)-f^x(y)\big|=O\big(h_K^{b_1}+h_H^{b_2}\big)+O\left(\sqrt{\frac{\log n}{n\,h_H\,\phi_x(h_K)}}\right),\quad a.co.$$

Application to functional time series prediction
The most important application of our study, when the observations are dependent and of functional nature, is the prediction of future values of some continuous-time process by using the conditional mode $\theta(x)=\arg\sup_{y\in S}f^x(y)$ as a prediction tool. The latter is estimated by the random variable $\widehat\theta(x)$ such that
$$\widehat\theta(x)=\arg\sup_{y\in S}\widehat f^x(y).$$
In practice, we proceed as follows: let $(Z_t)_{t\in[0,b[}$ be a continuous-time real-valued random process. From $Z_t$ we may construct $N$ functional random variables $(X_i)_{i=1,\dots,N}$ defined by
$$\forall t\in[0,b[,\quad X_i(t)=Z_{N^{-1}((i-1)b+t)},$$
and a real characteristic $Y_i=G(X_{i+1})$. So, we can predict the characteristic $Y_N$ by the conditional mode estimate $\widehat Y=\widehat\theta(X_N)$, computed from the $N-1$ pairs of r.v. $(X_i,Y_i)_{i=1,\dots,N-1}$. Such a prediction is motivated by the following consistency result.

Corollary 4.3.4. Under the hypotheses of Theorem 4.2.1, if the function $f^x$ is $j$-times continuously differentiable on $S$ with respect to $y$, and if
$$f^{x(l)}(\theta(x))=0\ \text{ for } 1\le l<j,\qquad f^{x(j)}(\cdot)\ \text{is uniformly continuous on } S,\qquad|f^{x(j)}(\theta(x))|>C>0,\qquad(2)$$
then we get
$$|\widehat\theta(x)-\theta(x)|^j=O\big(h_K^{b_1}\big)+O\big(h_H^{b_2}\big)+O_{a.co.}\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right).$$

4.4

Appendix

In what follows, when no confusion is possible, we denote by $C$ and $C'$ some strictly positive generic constants. Moreover, we put, for any $x\in\mathcal{F}$ and for all $i=1,\dots,n$:
$$K_i(x)=K\big(h_K^{-1}\delta(x,X_i)\big),\qquad\beta_i(x)=\beta(X_i,x)\qquad\text{and}\qquad H_i(y)=H\big(h_H^{-1}(y-Y_i)\big).$$

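Proof of Lemma 4.2.1. The proof is based on the same decomposition as the one used by Barrientos et al. (2007):
$$\widehat f_D^x=\underbrace{\frac{n^2h_K^2\phi_x^2(h_K)}{n(n-1)\mathbb{E}[W_{12}]}}_{T_1}\Bigg(\underbrace{\frac1n\sum_{j=1}^n\frac{K_j(x)\beta_j^2(x)}{h_K^2\phi_x(h_K)}}_{T_4}\ \underbrace{\frac1n\sum_{i=1}^n\frac{K_i(x)}{\phi_x(h_K)}}_{T_2}-\Bigg(\underbrace{\frac1n\sum_{j=1}^n\frac{K_j(x)\beta_j(x)}{h_K\phi_x(h_K)}}_{T_3}\Bigg)^2\Bigg).$$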

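Thus, we can write
$$\widehat f_D^x-\mathbb{E}[\widehat f_D^x]=T_1\Big(\big(T_2T_4-\mathbb{E}[T_2T_4]\big)-\big(T_3^2-\mathbb{E}[T_3^2]\big)\Big).$$
Moreover, we have
$$T_2T_4-\mathbb{E}[T_2T_4]=(T_2-\mathbb{E}[T_2])(T_4-\mathbb{E}[T_4])+(T_4-\mathbb{E}[T_4])\,\mathbb{E}[T_2]+(T_2-\mathbb{E}[T_2])\,\mathbb{E}[T_4]+\mathbb{E}[T_2]\mathbb{E}[T_4]-\mathbb{E}[T_2T_4]$$
and, in the same way,
$$T_3^2-\mathbb{E}[T_3^2]=(T_3-\mathbb{E}[T_3])^2+2(T_3-\mathbb{E}[T_3])\,\mathbb{E}[T_3]+\mathbb{E}^2[T_3]-\mathbb{E}[T_3^2].$$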
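The algebra behind this decomposition is the identity $\sum_{i\ne j}W_{ij}(x)=\big(\sum_jK_j\beta_j^2\big)\big(\sum_iK_i\big)-\big(\sum_jK_j\beta_j\big)^2=n^2h_K^2\phi_x^2(h_K)\,(T_2T_4-T_3^2)$, assuming $W_{ij}(x)=\beta_i(\beta_i-\beta_j)K_iK_j$ and $T_2=\frac1n\sum_iK_i/\phi_x(h_K)$, $T_3=\frac1n\sum_jK_j\beta_j/(h_K\phi_x(h_K))$, $T_4=\frac1n\sum_jK_j\beta_j^2/(h_K^2\phi_x(h_K))$. The short check below confirms it numerically on arbitrary (hypothetical) values; the non-negativity of the sum is the Cauchy-Schwarz inequality, since $K_i\ge0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h_K, phi = 50, 0.4, 0.2            # phi stands for phi_x(h_K); any positive value works
beta = rng.uniform(-0.4, 0.4, n)      # hypothetical values of beta_i(x) = beta(X_i, x)
K = rng.uniform(0.0, 1.0, n)          # hypothetical values of K_i(x)

# Direct double sum over i != j of W_ij = beta_i (beta_i - beta_j) K_i K_j
W = (beta * K)[:, None] * (beta[:, None] - beta[None, :]) * K[None, :]
direct = W.sum() - np.trace(W)        # the diagonal is zero anyway

# Factored form through the empirical means T_2, T_3, T_4
T2 = K.sum() / (n * phi)
T3 = (K * beta).sum() / (n * h_K * phi)
T4 = (K * beta**2).sum() / (n * h_K**2 * phi)
factored = n**2 * h_K**2 * phi**2 * (T2 * T4 - T3**2)

print(np.isclose(direct, factored), direct >= 0)  # True True
```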
So, the claimed result can be derived from the following assertions: for some $\epsilon>0$,
$$\sum_{n\ge1}\mathbb{P}\left(|T_l-\mathbb{E}[T_l]|>\epsilon\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,\phi_x^2(h_K)}}\right)<\infty\quad\text{for } l=2,3,4,\qquad(3)$$
$$T_1=O(1)\quad\text{and}\quad\mathbb{E}[T_l]=O(1)\quad\text{for } l=2,3,4,\qquad(4)$$
$$\mathrm{Cov}(T_2,T_4)=o\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,\phi_x^2(h_K)}}\right)\qquad(5)$$
and
$$\mathrm{Var}[T_3]=o\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,\phi_x^2(h_K)}}\right).\qquad(6)$$
It is shown in Barrientos et al. (2007) that $T_1=O(1)$ and $\mathbb{E}[T_l]=O(1)$ for $l=2,3,4$.


These results are not affected by the dependence structure of the data. It suffices to prove (3), (5) and (6) to finish the proof of the claimed result. For (3), we set
$$\Delta_{ki}=\frac{1}{h_K^k}\Big(K_i(x)\beta_i^k(x)-\mathbb{E}\big[K_i(x)\beta_i^k(x)\big]\Big)\quad\text{for } k=0,1,2.$$
Then, it can be seen that
$$T_{k+2}-\mathbb{E}[T_{k+2}]=\frac{1}{n\,\phi_x(h_K)}\sum_{i=1}^n\Delta_{ki}\quad\text{for } k=0,1,2.$$

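Moreover, under (H1), (H3) and (H6), we have
$$\frac{1}{h_K^k}K_i(x)|\beta_i^k(x)|\le\frac{C}{h_K^k}K_i(x)|\delta(X_i,x)|^k\le\frac{C}{h_K^k}K\big(h_K^{-1}\delta(x,X_i)\big)|\delta(X_i,x)|^k\,\mathbb{1}_{]-1,1[}\big(h_K^{-1}\delta(x,X_i)\big)\le C\,K\big(h_K^{-1}\delta(x,X_i)\big)\le C'.\qquad(7)$$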

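So, we can apply the Fuk-Nagaev exponential inequality to get, for all $r>0$ and $\epsilon>0$,
$$\mathbb{P}\{|T_{k+2}-\mathbb{E}[T_{k+2}]|>\epsilon\}=\mathbb{P}\left\{\left|\frac{1}{n\phi_x(h_K)}\sum_{i=1}^n\Delta_{ki}\right|>\epsilon\right\}=\mathbb{P}\left\{\left|\sum_{i=1}^n\Delta_{ki}\right|>\epsilon\,n\,\phi_x(h_K)\right\}\le C(A_1+A_2),\qquad(8)$$
where
$$A_1=\left(1+\frac{\epsilon^2n^2\phi_x^2(h_K)}{S_n^2\,r}\right)^{-r/2}\quad\text{and}\quad A_2=n\,r^{-1}\left(\frac{r}{\epsilon\,n\,\phi_x(h_K)}\right)^{a+1},$$
and
$$S_n^2=\sum_{i=1}^n\sum_{j=1}^n|\mathrm{Cov}(\Delta_{ki},\Delta_{kj})|=S_n'^2+n\,\mathrm{Var}[\Delta_{k1}],\quad\text{with}\quad S_n'^2=\sum_{i=1}^n\sum_{j\ne i}|\mathrm{Cov}(\Delta_{ki},\Delta_{kj})|.$$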


Next, we evaluate the asymptotic behavior of $S_n'^2$. For this, we use the technique of Masry (1986). We define the sets
$$S_1=\{(i,j)\ \text{such that}\ 1\le|i-j|\le m_n\}\quad\text{and}\quad S_2=\{(i,j)\ \text{such that}\ m_n+1\le|i-j|\le n-1\},$$
where $m_n\to\infty$ as $n\to\infty$. Let $J_{1,n}$ and $J_{2,n}$ be the sums of covariances over $S_1$ and $S_2$, respectively. Because of (H1), (H4) and (7), we can write
$$J_{1,n}\le C\,n\,m_n\Big[\mathbb{P}\big((X_i,X_j)\in B(x,h_K)\times B(x,h_K)\big)+\mathbb{P}\big(X_i\in B(x,h_K)\big)\,\mathbb{P}\big(X_j\in B(x,h_K)\big)\Big]$$
$$\le C\,n\,m_n\big[\psi_x(h_K)+\phi_x^2(h_K)\big]\le C\,n\,m_n\,\chi_x(h_K).$$
Concerning the summation over $S_2$, we use Davydov-Rio's inequality for bounded mixing processes. This leads, for all $i\ne j$, to
$$|\mathrm{Cov}(\Delta_{ki},\Delta_{kj})|\le C\,\alpha(|i-j|).$$
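Therefore, using
$$\sum_{j\ge x+1}j^{-a}\le\int_x^{\infty}u^{-a}\,du=\big[(a-1)x^{a-1}\big]^{-1},$$
we get, under the first part of (H4),
$$|J_{2,n}|=\sum_{(i,j)\in S_2}\big|\mathrm{Cov}(\Delta_{ki},\Delta_{kj})\big|\le C\,\frac{n\,m_n^{-a+1}}{a-1}.\qquad(9)$$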
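The tail bound used here is the usual comparison of a decreasing series with an integral: each term $j^{-a}$ is at most $\int_{j-1}^{j}u^{-a}\,du$. Combined with $\alpha(k)\le c\,k^{-a}$ from (H4), summing over the $O(n)$ pairs at each lag $k>m_n$ gives the order $n\,m_n^{1-a}/(a-1)$ in (9). A quick numerical check, with arbitrary values of $a$ and $x$:

```python
def tail_sum(x, a, terms=100_000):
    # Partial sum of j^(-a) over j = x+1, ..., x+terms (a lower bound for the full tail)
    return sum(j ** (-a) for j in range(x + 1, x + terms + 1))

def integral_bound(x, a):
    # int_x^infinity u^(-a) du = 1 / ((a - 1) x^(a - 1)), valid for a > 1
    return 1.0 / ((a - 1) * x ** (a - 1))

checks = {(a, x): tail_sum(x, a) <= integral_bound(x, a)
          for a in (2.5, 4.0) for x in (1, 10, 100)}
print(all(checks.values()))  # True
```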

The choice $m_n=(\chi_x(h_K))^{-1/a}$ permits us to get
$$\sum_{i\ne j}\mathrm{Cov}(\Delta_{ki},\Delta_{kj})=O\big(n\,\chi_x^{(a-1)/a}(h_K)\big)\quad\text{for } k=0,1,2.$$
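The role of this choice can be seen by combining the two bounds obtained above. With $m_n=\chi_x(h_K)^{-1/a}$ (which tends to infinity when $\chi_x(h_K)\to0$), the contributions of $S_1$ and $S_2$ turn out to be of the same order, so neither dominates:
$$J_{1,n}+|J_{2,n}|\le C\Big(n\,m_n\,\chi_x(h_K)+\frac{n\,m_n^{1-a}}{a-1}\Big),\qquad n\,m_n\,\chi_x(h_K)=n\,\chi_x^{1-\frac1a}(h_K)=n\,\chi_x^{\frac{a-1}{a}}(h_K)=n\,m_n^{1-a}.$$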

Concerning the variance term, we deduce from (H1) that
$$\mathrm{Var}[\Delta_{k1}]\le C\big(\phi_x(h_K)+\phi_x^2(h_K)\big)\le C\,\chi_x^{1/2}(h_K).$$
Finally, as $a>2$,
$$S_n^2=O\big(n\,\chi_x^{1/2}(h_K)\big).\qquad(10)$$
Now, we apply (8) with
$$\epsilon=\eta\,\frac{\sqrt{S_n^2\log n}}{n\,\phi_x(h_K)}$$
and $r=C(\log n)^2$. It follows that
$$A_2\le C\,n^{1-(a+1)/2}\,\chi_x(h_K)^{-(a+1)/4}(\log n)^{(3a-1)/2}.$$
Next, using the left side of (H10), we obtain
$$A_2\le C\,n^{1-(a+1)/2}(\log n)^{(3a-1)/2}.$$
So, there exists some real $\nu>0$ such that
$$A_2\le C\,n^{-1-\nu}.$$
By means of (10), we show that
$$A_1\le C\left(1+\frac{\eta^2\log n}{r}\right)^{-r/2}=C\exp\left(-\frac r2\log\left(1+\frac{\eta^2\log n}{r}\right)\right);\qquad(11)$$
because $r=C(\log n)^2$, we get
$$A_1\le C\exp\left(-\frac{\eta^2\log n}{2}\right)=C\,n^{-\eta^2/2}.$$
Thus, for $\eta$ large enough,
$$\exists\nu>0,\quad A_1\le C\,n^{-\eta^2/2}\le C\,n^{-1-\nu}.\qquad(12)$$
Hence
$$T_l-\mathbb{E}[T_l]=O_{a.co.}\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,\phi_x^2(h_K)}}\right)\quad\text{for } l=2,3,4.$$
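The passage from (11) to (12) can be made explicit with $\log(1+u)\ge u-u^2/2$ for $u\ge0$: with $r=C(\log n)^2$ it gives $A_1\le e^{\eta^4/(4C)}\,n^{-\eta^2/2}$, so the factor lost with respect to $n^{-\eta^2/2}$ is a constant absorbed in $C$. The sketch below evaluates $A_1=(1+\eta^2\log n/r)^{-r/2}$ directly for a few arbitrary values of $n$, with the hypothetical choices $\eta=2$ and $C=1$.

```python
import math

def A1(n, eta, C=1.0):
    # A_1 = (1 + eta^2 log n / r)^(-r/2) with r = C (log n)^2
    r = C * math.log(n) ** 2
    return (1.0 + eta**2 * math.log(n) / r) ** (-r / 2.0)

eta, C = 2.0, 1.0
bound = math.exp(eta**4 / (4.0 * C))   # constant from log(1 + u) >= u - u^2 / 2
sizes = (1e2, 1e4, 1e8, 1e16)
ratios = [A1(n, eta, C) / n ** (-eta**2 / 2.0) for n in sizes]
for n, q in zip(sizes, ratios):
    print(f"n = {n:.0e}: A1 / n^(-eta^2/2) = {q:.3f}  (constant bound {bound:.3f})")
```

The ratio grows with $n$ but stays below the constant $e^{\eta^4/(4C)}$, which is what the bound $A_1\le C\,n^{-\eta^2/2}$ requires.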

Finally, by following arguments similar to those used to prove (10), we get
$$\mathrm{Cov}(T_2,T_4)=O\left(\frac{\chi_x^{1/2}(h_K)}{n\,\phi_x^2(h_K)}\right)\quad\text{and}\quad\mathrm{Var}[T_3]=O\left(\frac{\chi_x^{1/2}(h_K)}{n\,\phi_x^2(h_K)}\right),$$
which are negligible with respect to $\sqrt{\chi_x^{1/2}(h_K)\log n/(n\,\phi_x^2(h_K))}$. The proof of our lemma is now finished.

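Proof of Lemma 4.2.2. The bias term is standard and is not affected by the dependence condition on $(X_i,Y_i)$. So, by the equidistribution of the couples $(X_i,Y_i)$, we have
$$\forall y\in S,\quad\mathbb{E}[\widehat f_N^x(y)]=\frac{1}{\mathbb{E}[W_{12}]}\,\mathbb{E}\Big[W_{12}(x)\,\mathbb{E}\big[h_H^{-1}H_2(y)\,\big|\,X_2\big]\Big].$$
By the classical change of variables $t=(y-z)/h_H$, we obtain
$$h_H^{-1}\,\mathbb{E}\big[H_2(y)\,\big|\,X_2\big]=\int_{\mathbb{R}}H(t)\,f^{X_2}(y-h_Ht)\,dt,$$
therefore
$$\big|h_H^{-1}\mathbb{E}[H_2(y)\,|\,X_2]-f^x(y)\big|\le\int_{\mathbb{R}}H(t)\,\big|f^{X_2}(y-h_Ht)-f^x(y)\big|\,dt.$$
Thus, by assumption (H2), we get
$$\forall y\in S,\quad\mathbb{1}_{B(x,h_K)}(X_2)\,\big|h_H^{-1}\mathbb{E}[H_2(y)\,|\,X_2]-f^x(y)\big|\le C\int_{\mathbb{R}}H(t)\big(h_K^{b_1}+|t|^{b_2}h_H^{b_2}\big)\,dt.$$
Hence,
$$\forall y\in S,\quad\big|\mathbb{E}[\widehat f_N^x(y)]-f^x(y)\big|\le\frac{1}{\mathbb{E}[W_{12}]}\,\mathbb{E}\Big[W_{12}(x)\,\big|h_H^{-1}\mathbb{E}[H_2(y)\,|\,X_2]-f^x(y)\big|\Big]\le C\big(h_K^{b_1}+h_H^{b_2}\big).$$

Proof of Lemma 4.2.3. Using the compactness of $S$, we can write $S\subset\bigcup_{k=1}^{z_n}S_k$, where $S_k=(t_k-l_n,t_k+l_n)$ and $z_n\le C\,l_n^{-1}$. Taking $t_y=\arg\min_{t\in\{t_1,\dots,t_{z_n}\}}|y-t|$, we have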








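$$\sup_{y\in S}\big|\widehat f_N^x(y)-\mathbb{E}[\widehat f_N^x(y)]\big|\le\underbrace{\sup_{y\in S}\big|\widehat f_N^x(y)-\widehat f_N^x(t_y)\big|}_{A_1}+\underbrace{\sup_{y\in S}\big|\widehat f_N^x(t_y)-\mathbb{E}[\widehat f_N^x(t_y)]\big|}_{A_2}+\underbrace{\sup_{y\in S}\big|\mathbb{E}[\widehat f_N^x(t_y)]-\mathbb{E}[\widehat f_N^x(y)]\big|}_{A_3}.\qquad(13)$$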
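The covering of the compact set $S$ used in this proof can be sketched as follows, for an interval $S=[a,b]$ (an assumption made for simplicity). The centres $t_k$ are spaced $2l_n$ apart, so that $z_n=\lceil(b-a)/(2l_n)\rceil=O(l_n^{-1})$ and every $y\in S$ satisfies $|y-t_y|\le l_n$. The function names and numerical values are hypothetical.

```python
import numpy as np

def cover(a, b, l_n):
    """Centres t_1 < ... < t_{z_n} such that [a, b] lies in the union of [t_k - l_n, t_k + l_n]."""
    z_n = int(np.ceil((b - a) / (2.0 * l_n)))   # number of balls, of order 1 / l_n
    return a + l_n * (2.0 * np.arange(z_n) + 1.0)

def nearest_centre(y, centres):
    # t_y = argmin over {t_1, ..., t_{z_n}} of |y - t|
    y = np.atleast_1d(y)
    idx = np.abs(y[:, None] - centres[None, :]).argmin(axis=1)
    return centres[idx]

l_n = 1.0 / 64.0                      # exactly representable, so z_n is exact
centres = cover(-1.0, 2.0, l_n)
y = np.linspace(-1.0, 2.0, 10001)
t_y = nearest_centre(y, centres)
print(len(centres), np.max(np.abs(y - t_y)) <= l_n + 1e-12)   # 96 True
```

The small tolerance only absorbs floating-point rounding of the grid points; in exact arithmetic $|y-t_y|\le l_n$.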

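Concerning $(A_1)$:
$$\sup_{y\in S}\big|\widehat f_N^x(y)-\widehat f_N^x(t_y)\big|\le\sup_{y\in S}\frac{1}{n(n-1)h_H\,\mathbb{E}[W_{12}(x)]}\sum_{i\ne j}\big|H_j(y)-H_j(t_y)\big|\,W_{ij}(x)$$
$$\le\sup_{y\in S}\frac{C|y-t_y|}{h_H}\,\frac{1}{n(n-1)h_H\,\mathbb{E}[W_{12}(x)]}\sum_{i\ne j}W_{ij}(x)\le C\,\frac{l_n}{h_H^2}\,\widehat f_D^x\le C\,\frac{l_n}{h_H^2}.\qquad(14)$$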

The second inequality is obtained by a Lipschitz argument, whereas the last one comes from the almost complete consistency of $\widehat f_D^x$ (see Lemma 4.2.1). Now take
$$l_n=n^{-\frac32\beta_1-\frac12},\qquad(15)$$
and note that, because of (H9), we have
$$\frac{l_n}{h_H^2}=o\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right).$$
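One way to check this last claim, assuming (15) reads $l_n=n^{-\frac32\beta_1-\frac12}$ with $\beta_1$ as in (H9): since $\chi_x(h_K)\ge\phi_x^2(h_K)$ and $\phi_x(h_K)\le1$, we have $\chi_x^{1/2}(h_K)/\phi_x^2(h_K)\ge1/\phi_x(h_K)\ge1$, so it suffices to compare $l_n/h_H^2$ with $\sqrt{\log n/(n\,h_H)}$. With $g_n=n^{\beta_1}h_H$, which tends to infinity by (H9),
$$\frac{l_n/h_H^2}{\sqrt{\log n/(n\,h_H)}}=\frac{n^{-\frac32\beta_1-\frac12}\,n^{\frac12}\,h_H^{-\frac32}}{\sqrt{\log n}}=\frac{\big(n^{\beta_1}h_H\big)^{-3/2}}{\sqrt{\log n}}=\frac{g_n^{-3/2}}{\sqrt{\log n}}\longrightarrow0.$$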

Thus, for $n$ large enough, we can write
$$\sup_{y\in S}\big|\widehat f_N^x(y)-\widehat f_N^x(t_y)\big|=o_{a.co.}\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right).\qquad(16)$$

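Concerning $(A_2)$: we can write, for all $\epsilon>0$,
$$\mathbb{P}\left(\sup_{y\in S}\big|\widehat f_N^x(t_y)-\mathbb{E}[\widehat f_N^x(t_y)]\big|>\epsilon\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right)=\mathbb{P}\left(\max_{t_y\in\{t_1,\dots,t_{z_n}\}}\big|\widehat f_N^x(t_y)-\mathbb{E}[\widehat f_N^x(t_y)]\big|>\epsilon\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right)$$
$$\le z_n\max_{t_y\in\{t_1,\dots,t_{z_n}\}}\mathbb{P}\left(\big|\widehat f_N^x(t_y)-\mathbb{E}[\widehat f_N^x(t_y)]\big|>\epsilon\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right).$$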


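All that remains is to evaluate the following quantity:
$$\mathbb{P}\left(\big|\widehat f_N^x(t_y)-\mathbb{E}[\widehat f_N^x(t_y)]\big|>\epsilon\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right),\quad\text{for all } t_y\in\{t_1,\dots,t_{z_n}\}.$$
The latter is obtained by a straightforward adaptation of the proof of Lemma 4.2.1. To do that, we consider the following decomposition:
$$\widehat f_N^x(t_y)=\underbrace{\frac{n^2h_K^2\phi_x^2(h_K)}{n(n-1)\mathbb{E}[W_{12}]}}_{S_1}\Bigg(\underbrace{\frac1n\sum_{j=1}^n\frac{K_j(x)H_j(t_y)}{h_H\phi_x(h_K)}}_{S_2}\ \underbrace{\frac1n\sum_{i=1}^n\frac{\beta_i^2(x)K_i(x)}{h_K^2\phi_x(h_K)}}_{S_3}-\underbrace{\frac1n\sum_{j=1}^n\frac{K_j(x)\beta_j(x)H_j(t_y)}{h_Hh_K\phi_x(h_K)}}_{S_4}\ \underbrace{\frac1n\sum_{i=1}^n\frac{K_i(x)\beta_i(x)}{h_K\phi_x(h_K)}}_{S_5}\Bigg),$$
which implies that
$$\widehat f_N^x(t_y)-\mathbb{E}[\widehat f_N^x(t_y)]=S_1\Big(\big(S_2S_3-\mathbb{E}[S_2S_3]\big)-\big(S_4S_5-\mathbb{E}[S_4S_5]\big)\Big).$$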

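Clearly,
$$S_2S_3-\mathbb{E}[S_2S_3]=(S_2-\mathbb{E}[S_2])(S_3-\mathbb{E}[S_3])+(S_3-\mathbb{E}[S_3])\,\mathbb{E}[S_2]+(S_2-\mathbb{E}[S_2])\,\mathbb{E}[S_3]+\mathbb{E}[S_2]\mathbb{E}[S_3]-\mathbb{E}[S_2S_3]$$
and
$$S_4S_5-\mathbb{E}[S_4S_5]=(S_4-\mathbb{E}[S_4])(S_5-\mathbb{E}[S_5])+(S_5-\mathbb{E}[S_5])\,\mathbb{E}[S_4]+(S_4-\mathbb{E}[S_4])\,\mathbb{E}[S_5]+\mathbb{E}[S_4]\mathbb{E}[S_5]-\mathbb{E}[S_4S_5].$$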
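So, our claimed result is a direct consequence of the following assertions: for some $\epsilon>0$,
$$\sum_{n\ge1}z_n\,\mathbb{P}\left(|S_k-\mathbb{E}[S_k]|>\epsilon\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right)<\infty,\quad\text{for } k=2,3,4,5,\qquad(17)$$
$$S_1=O(1)\quad\text{and}\quad\mathbb{E}[S_k]=O(1)\quad\text{for } k=2,3,4,5,\qquad(18)$$
$$\mathrm{Cov}(S_2,S_3)=o\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,\phi_x^2(h_K)}}\right)\qquad(19)$$
and
$$\mathrm{Cov}(S_4,S_5)=o\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,\phi_x^2(h_K)}}\right).\qquad(20)$$
Observe that the cases $k=3,5$ have already been obtained in Lemma 4.2.1. Thus, we focus only on the cases $k=2,4$.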


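Firstly, for (17), we use the same arguments as those invoked in the proof of Lemma 4.2.1 and we compute asymptotically
$$s_n'^2=\sum_{\substack{i,j=1\\ i\ne j}}^n|\mathrm{Cov}(\Delta_{ki},\Delta_{kj})|,$$
where
$$\Delta_{ki}=\frac{1}{h_K^k}\Big(K_i(x)H_i(t_y)\beta_i^k(x)-\mathbb{E}\big[K_i(x)H_i(t_y)\beta_i^k(x)\big]\Big)\quad\text{for } k=0,1.$$
We split the sum into two sets defined by
$$S_1=\{(i,j)\ \text{such that}\ 1\le|i-j|\le m_n\}\quad\text{and}\quad S_2=\{(i,j)\ \text{such that}\ m_n+1\le|i-j|\le n-1\},$$
where $m_n\to\infty$. Then
$$s_n'^2\le\sum_{S_1}|\mathrm{Cov}(\Delta_{ki},\Delta_{kj})|+\sum_{S_2}|\mathrm{Cov}(\Delta_{ki},\Delta_{kj})|.$$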

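On the one hand, by (H1) and (H4)-(H7), we have
$$\mathrm{Cov}(\Delta_{ki},\Delta_{kj})=h_K^{-2k}\Big(\mathbb{E}\big[K_i(x)\beta_i^k(x)K_j(x)\beta_j^k(x)\,\mathbb{E}[H_i(t_y)H_j(t_y)\,|\,X_i,X_j]\big]-\mathbb{E}^2\big[K_1(x)\beta_1^k(x)\,\mathbb{E}[H_1(t_y)\,|\,X_1]\big]\Big)$$
$$\le C\,h_H^2\Big(\mathbb{P}\big((X_i,X_j)\in B(x,h_K)\times B(x,h_K)\big)+\mathbb{P}^2\big(X_1\in B(x,h_K)\big)\Big)\le C\,h_H^2\big(\psi_x(h_K)+\phi_x^2(h_K)\big)\le C\,h_H^2\,\chi_x(h_K).$$
On the other hand, this covariance can be controlled by means of Davydov-Rio's inequality for bounded mixing processes:
$$\forall i\ne j,\quad|\mathrm{Cov}(\Delta_{ki},\Delta_{kj})|\le C\,\alpha(|i-j|).$$
Then, by using arguments similar to those invoked in the proof of Lemma 4.2.1,
$$s_n'^2\le C\big(n\,m_n\,h_H^2\,\chi_x(h_K)+n\,m_n^{1-a}\big).$$
We can take
$$m_n=\left(\frac{1}{h_H^2\,\chi_x(h_K)}\right)^{1/a},$$
and we conclude, as for the proof of Lemma 4.2.1, that
$$s_n'^2=O\Big(n\big(h_H^2\,\chi_x(h_K)\big)^{(a-1)/a}\Big).\qquad(21)$$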


The computation of the variance term can be done by following the same ideas as for the bias term in Lemma 4.2.2, and is based on the fact that
$$\mathbb{E}[H_1^2(t_y)\,|\,X]=O(h_H)\quad\text{and}\quad\mathbb{E}[H_1(t_y)\,|\,X]=O(h_H).$$
Therefore,
$$\mathrm{Var}[\Delta_{ki}]=O\big(h_H\,\chi_x^{1/2}(h_K)\big).$$
This last result, together with (21), shows that
$$s_n^2=\sum_{i,j=1}^n|\mathrm{Cov}(\Delta_{ki},\Delta_{kj})|=O\big(n\,h_H\,\chi_x^{1/2}(h_K)\big).\qquad(22)$$

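Once again, arguments similar to those invoked for proving Lemma 4.2.1 can be used, and we obtain successively, for all $r>1$ and $k=2,4$,
$$z_n\,\mathbb{P}\{|S_k-\mathbb{E}[S_k]|>\epsilon\}\le C\,l_n^{-1}\big(A_1+A_2\big),$$
where
$$A_1=4\left(1+\frac{\epsilon^2n^2h_H^2\phi_x^2(h_K)}{r\,s_n^2}\right)^{-r/2}\quad\text{and}\quad A_2=4c\,n\,r^{-1}\left(\frac{r}{\epsilon\,n\,h_H\,\phi_x(h_K)}\right)^{a+1}.$$
To state the desired result, we take
$$\epsilon=\eta\,\frac{\sqrt{s_n^2\log n}}{n\,h_H\,\phi_x(h_K)}\quad\text{and}\quad r=C(\log n)^2,$$
and we use the fact that $s_n^2=O\big(n\,h_H\,\chi_x^{1/2}(h_K)\big)$. We note that $A_1$ and $A_2$ are then exactly the same as the terms $A_1$ and $A_2$ appearing in the proof of Lemma 4.2.1. We also note that the choice of $l_n$ made in (15) and our conditions on $h_K$ (resp. on $\phi_x$) immediately ensure the existence, for $\eta$ large enough, of some $\nu>0$ such that
$$l_n^{-1}(A_1+A_2)\le C\,n^{-1-\nu}.$$
Finally, we arrive at
$$\sum_{n\ge1}z_n\,\mathbb{P}\left(|S_k-\mathbb{E}[S_k]|>\eta\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right)<\infty.\qquad(23)$$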

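Let us now prove the results (19) and (20). Their proof follows exactly along the same lines as the proof of (22):
$$\mathrm{Cov}(S_2,S_3)=O\left(\frac{\chi_x^{1/2}(h_K)}{n\,\phi_x^2(h_K)}\right)=o\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right)\qquad(24)$$
and
$$\mathrm{Cov}(S_4,S_5)=O\left(\frac{\chi_x^{1/2}(h_K)}{n\,\phi_x^2(h_K)}\right)=o\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right).\qquad(25)$$
The proof of (18), on the other hand, is based on the fact that
$$\mathbb{E}[S_2]=\frac{\mathbb{E}[K_1(x)H_1(t_y)]}{h_H\,\phi_x(h_K)}\quad\text{and}\quad\mathbb{E}[S_4]=\frac{\mathbb{E}[K_1(x)H_1(t_y)\beta_1(x)]}{h_K\,h_H\,\phi_x(h_K)}.$$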


So it remains to evaluate the quantities
$$\mathbb{E}\big[K_i(x)H_i(t_y)\beta_i^k(x)\big],\quad\text{for } k=0,1.$$
By conditioning on $X_1$ and by applying Lemma 3 in Barrientos (2007), we obtain
$$\mathbb{E}\big[K_i(x)H_i(t_y)\beta_i^k(x)\big]=O\big(h_H\,h_K^k\,\phi_x(h_K)\big),\qquad(26)$$
leading to
$$\mathbb{E}[S_k]=O(1),\quad\text{for } k=2,4.$$
Finally, we arrive at
$$\sup_{y\in S}\big|\widehat f_N^x(t_y)-\mathbb{E}[\widehat f_N^x(t_y)]\big|=O_{a.co.}\left(\sqrt{\frac{\chi_x^{1/2}(h_K)\log n}{n\,h_H\,\phi_x^2(h_K)}}\right).\qquad(27)$$

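Concerning $(A_3)$: because of (H8) and (H9), we have
$$\sup_{y\in S}\big|\mathbb{E}[\widehat f_N^x(y)]-\mathbb{E}[\widehat f_N^x(t_y)]\big|\le C\,\frac{l_n}{h_H^2}.$$
Using arguments analogous to those used for $A_1$, we can show, for $n$ large enough, that
$$\mathbb{P}\left(\sup_{y\in S}\big|\mathbb{E}[\widehat f_N^x(y)]-\mathbb{E}[\widehat f_N^x(t_y)]\big|>\frac{\epsilon}{3}\sqrt{\frac{\log n}{n\,h_H\,\phi_x(h_K)}}\right)=0.\qquad(28)$$
Now, our lemma can be easily deduced from (16), (27) and (28).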

Proof of Corollary 4.3.4. By a simple manipulation, we show that
$$\big|f^x(\widehat\theta(x))-f^x(\theta(x))\big|\le2\sup_{y\in S}\big|\widehat f^x(y)-f^x(y)\big|.\qquad(29)$$
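The manipulation behind (29) can be spelled out as follows, assuming both $\theta(x)$ and $\widehat\theta(x)$ lie in $S$. Since $\widehat\theta(x)$ maximizes $\widehat f^x$ over $S$, the middle bracket below is non-positive, and since $\theta(x)$ maximizes $f^x$, the left-hand side is non-negative:
$$0\le f^x(\theta(x))-f^x(\widehat\theta(x))=\big[f^x(\theta(x))-\widehat f^x(\theta(x))\big]+\big[\widehat f^x(\theta(x))-\widehat f^x(\widehat\theta(x))\big]+\big[\widehat f^x(\widehat\theta(x))-f^x(\widehat\theta(x))\big]\le2\sup_{y\in S}\big|\widehat f^x(y)-f^x(y)\big|.$$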

We use the following Taylor expansion of the function $f^x$:
$$f^x(\widehat\theta(x))=f^x(\theta(x))+\frac{1}{j!}f^{x(j)}(\theta^*(x))\big(\widehat\theta(x)-\theta(x)\big)^j,$$
for some $\theta^*(x)$ between $\theta(x)$ and $\widehat\theta(x)$. It is clear that, from the assumptions of Corollary 4.3.4, (29) and Theorem 4.2.1, we obtain
$$|\widehat\theta(x)-\theta(x)|\to0,\quad a.co.$$
Next, by the uniform continuity of $f^{x(j)}$, we obtain
$$\big|f^{x(j)}(\theta^*(x))-f^{x(j)}(\theta(x))\big|\to0,\quad a.co.$$
Consequently, we can find $\tau>0$ such that
$$\sum_{n=1}^{\infty}\mathbb{P}\big(|f^{x(j)}(\theta^*(x))|<\tau\big)<\infty,$$
and we have
$$|\widehat\theta(x)-\theta(x)|^j\le C\sup_{y\in S}\big|\widehat f^x(y)-f^x(y)\big|,\quad a.co.$$
So, the claimed result is a direct consequence of this last inequality together with Theorem 4.2.1.
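A compact sketch of the prediction scheme of Section 4.3 may help in practice: one observed path is cut into $N$ curves, the pairs $(X_i,Y_i)$ with $Y_i=G(X_{i+1})$ are formed, and $Y_N$ is predicted by $\widehat\theta(X_N)$ computed from the $N-1$ pairs. Everything specific here is a hypothetical choice, not taken from the text: the simulated AR(1) path (a geometrically mixing example), the characteristic $G$ (the maximum of a curve), the operators $\beta(X_i,x)=\delta(x,X_i)=$ mean of $X_i-x$, the kernels, the bandwidths, and the grid that approximates the arg sup over $S$.

```python
import numpy as np

def cut_into_curves(z, N):
    """Split a path sampled on a regular grid into N consecutive curves (one per row)."""
    m = len(z) // N
    return z[: N * m].reshape(N, m)

def conditional_mode(x, X, Y, h_K, h_H, grid):
    """theta_hat(x) = arg sup over `grid` of the local linear estimate f_hat^x."""
    d = (X - x[None, :]).mean(axis=1)      # hypothetical beta(X_i, x) = delta(x, X_i)
    u = d / h_K
    K = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)   # support [-1, 1]
    W = (d * K)[:, None] * (d[:, None] - d[None, :]) * K[None, :]
    w = W.sum(axis=0)                      # weight attached to Y_j
    if w.sum() <= 0:
        raise ValueError("bandwidth h_K too small: no usable curves near x")
    H = np.exp(-0.5 * ((grid[:, None] - Y[None, :]) / h_H) ** 2) / np.sqrt(2.0 * np.pi)
    f_hat = H @ w / (h_H * w.sum())
    return grid[np.argmax(f_hat)]

rng = np.random.default_rng(2)
N, m = 300, 50                             # N curves, m sampling points per curve
z = np.zeros(N * m)
for k in range(1, N * m):                  # simulated AR(1) path (hypothetical data)
    z[k] = 0.99 * z[k - 1] + 0.1 * rng.standard_normal()

X = cut_into_curves(z, N)                  # X_1, ..., X_N
Y = X[1:].max(axis=1)                      # Y_i = G(X_{i+1}), i = 1, ..., N-1, with G = max
grid = np.linspace(Y.min() - 1.0, Y.max() + 1.0, 801)
Y_hat = conditional_mode(X[-1], X[:-1], Y, h_K=1.0, h_H=0.2, grid=grid)
print("prediction of Y_N = G(X_{N+1}):", Y_hat)
```

Restricting the arg sup to a finite grid is a practical stand-in for the supremum over the compact set $S$; a finer grid trades computation for resolution.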


Bibliography

[1] Barrientos-Marin, J. (2007). Some Practical Problems of Recent Nonparametric Procedures: Testing, Estimation, and Application. PhD thesis (in French), Paul Sabatier University (Toulouse).
[2] Barrientos-Marin, J., Ferraty, F. and Vieu, P. (2010). Locally Modelled Regression and Functional Data. J. of Nonparametric Statistics, 22, No. 5, Pages 617-632.
[3] Benhenni, K., Griche-Hedli, S. and Rachdi, M. (2010). Estimation of the regression operator from functional fixed-design with correlated errors. Journal of Multivariate Analysis, 101, Pages 476-490.
[4] Benhenni, K., Ferraty, F., Rachdi, M. and Vieu, P. (2007). Local smoothing regression with functional data. Computational Statistics, 22, No. 3, Pages 353-369.
[5] Bosq, D. (2000). Linear Processes in Function Spaces: Theory and Applications. Lecture Notes in Statistics, 149, Springer.
[6] Baíllo, A. and Grané, A. (2009). Local linear regression for functional predictor and scalar response. Journal of Multivariate Analysis, 100, Pages 102-111.
[7] Chu, C.-K. and Marron, J.-S. (1991). Choosing a kernel regression estimator. With comments and a rejoinder by the authors. Statist. Sci., 6, Pages 404-436.
[8] Dabo-Niang, S. and Laksaci, A. (2007). Estimation non paramétrique du mode conditionnel pour variable explicative fonctionnelle. Pub. Inst. Stat. Univ. Paris, 3, Pages 27-42.
[9] El Methni, M. and Rachdi, M. (2010). Local weighted average estimation of the regression operator for functional data. Commun. Stat., Theory and Methods, to appear.
[10] Ezzahrioui, M. and Ould-Saïd, E. (2008). Asymptotic normality of a nonparametric estimator of the conditional mode function for functional data. J. Nonparametr. Stat., 20, Pages 3-18.
[11] Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc., 87, Pages 998-1004.
[12] Fan, J. and Yim, T.-H. (2004). A cross-validation method for estimating conditional densities. Biometrika, 91, Pages 819-834.
[13] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and its Applications. London, Chapman & Hall.
[14] Ferraty, F., Goia, A. and Vieu, P. (2002). Functional nonparametric model for time series: a fractal approach to dimension reduction. TEST, 11, No. 2, Pages 317-344.
[15] Ferraty, F., Laksaci, A., Tadj, A. and Vieu, Ph. (2010). Rate of uniform consistency for nonparametric estimates with functional variables. Journal of Statistical Planning and Inference, 140, Pages 335-352.

[16] Ferraty, F., Laksaci, A. and Vieu, P. (2005). Functional time series prediction via conditional mode. C. R., Math., Acad. Sci. Paris, 340, Pages 389-392.
[17] Ferraty, F., Laksaci, A. and Vieu, P. (2006). Estimating some characteristics of the conditional distribution in nonparametric functional models. Stat. Inference Stoch. Process., 9, Pages 47-76.
[18] Ferraty, F. and Vieu, P. (2006). Nonparametric functional data analysis. Theory and Practice. Springer Series in Statistics. New York.
[19] Hyndman, R. and Yao, Q. (2002). Nonparametric estimation and symmetry tests for conditional density functions. J. Nonparametr. Stat., 14, Pages 259-278.
[20] Laksaci, A. (2007). Convergence en moyenne quadratique de l'estimateur à noyau de la densité conditionnelle avec variable explicative fonctionnelle. Pub. Inst. Stat. Univ. Paris, 3, Pages 69-80.
[21] Laksaci, A., Madani, F. and Rachdi, M. (2012). Kernel conditional density estimation when the regressor is valued in a semi-metric space. Communications in Statistics - Theory and Methods (to appear).
[22] Demongeot, J., Laksaci, A., Madani, F. and Rachdi, M. (2011). Functional data: Local linear estimation of the conditional density and its application. Statistics, Volume 00, Pages 00-00, DOI: 10.1080/02331888.2011.568117.
[23] Demongeot, J., Laksaci, A., Madani, F. and Rachdi, M. (2010). Local linear estimation of the conditional density for functional data. C. R., Math., Acad. Sci. Paris, 348, Pages 931-934.
[24] Müller, H.-G. and Stadtmüller, U. (2005). Generalized functional linear models. Ann. Stat., 33, No. 2, Pages 774-805.
[25] Ouassou, I. and Rachdi, M. (2010). Stein type estimation of the regression operator for functional data. Advances and Applications in Statistical Sciences, 1, No. 2, Pages 233-250.
[26] Ould-Saïd, E. and Cai, Z. (2005). Strong uniform consistency of nonparametric estimation of the censored conditional mode function. J. Nonparametr. Stat., 17, Pages 797-806.
[27] Rachdi, M. and Sabre, R. (2000). Consistent estimates of the mode of the probability density function in nonparametric deconvolution problems. Statist. Probab. Lett., 47, Pages 105-114.
[28] Rachdi, M. and Vieu, P. (2007). Nonparametric regression for functional data: automatic smoothing parameter selection. Journal of Statistical Planning and Inference, 137, Pages 2784-2801.
[29] Ramsay, J.-O. and Silverman, B.-W. (1997). Functional data analysis. Springer Series in Statistics. New York.
[30] Ramsay, J.-O. and Silverman, B.-W. (2002). Applied functional data analysis. Methods and case studies. Springer Series in Statistics. New York.
[31] Rio, E. (1990). Exponential inequalities ... Springer-Verlag, New York.
[32] Rio, E. (2000). Théorie asymptotique des processus aléatoires faiblement dépendants. Springer, ESAIM, Collection Mathématiques et Applications.
[33] Sarda, P. and Vieu, P. (2000). Kernel Regression. Pages 43-70, Wiley, New York.
[34] Vieu, P. (1996). A note on density mode estimation. Statist. Probab. Lett., 26, Pages 297-307.
[35] Youndjé, E. (1993). Estimation non paramétrique de la densité conditionnelle par la méthode du noyau. PhD thesis (in French), University of Rouen.


Chapter 5
On the quadratic error of the functional local linear estimate of the conditional density

5.1

Introduction

The observation of functional variables has become usual due, for instance, to the development of measuring instruments that allow one to observe variables at finer and finer resolutions. As technology progresses, we are able to handle larger and larger datasets. At the same time, monitoring devices such as electronic equipment and sensors (for registering images, temperature, etc.) have become more and more sophisticated. This high-tech revolution offers the opportunity to observe phenomena in an increasingly accurate way by producing statistical units sampled over a finer and finer grid, with the measurement points so close that the data can be considered as observations varying over a continuum. Such continuous (or functional) data may occur in biomechanics (e.g. human movements), chemometrics (e.g. spectrometric curves), econometrics (e.g. the stock market index), geophysics (e.g. spatio-temporal events such as El Niño or time series of satellite images), or medicine (electro-cardiograms/electro-encephalograms). It is well known that standard multivariate statistical analyses fail with functional data. However, the great potential for applications has encouraged new methodologies able to extract relevant information from functional datasets. This Handbook aims to present a state-of-the-art exploration of this high-tech field by gathering together most of the major advances in this area. The main statistical topics (classification, inference, factor-based analysis, regression modelling, resampling methods, time series, random processes) are covered in the setting of functional data. The twin challenges of the subject are the practical issues of implementing new methodologies and the theoretical techniques needed to expand the mathematical foundations and toolboxes. This chapter and the following one therefore mix practical, methodological and theoretical aspects of the subject, sometimes within the same chapter (cf. Chapter 2 before). As a consequence, these results should appeal to a wide audience of engineers, practitioners and graduate students, as well as academic researchers, not only in statistics and probability but also in numerous related application areas.

108

Chapitre 5. Convergence en moyenne quadratique

It seems, then, natural to assume that the data are actually observations from a random
variable taking values in a functional space. In this chapter, we are interested in the local
polynomial modeling of the conditional density function when the explanatory variable is of
functional type. Such study is motivated by the fact that the local polynomial smoothing
has various advantages over the kernel method, namely this method has superior bias properties to the previous one (see, for example Chu and Marron (1991) and Fan (1992) for an
extensive discussion on the comparison between both these methods). Moreover, as noticed
by Fan and Yao (2003) that the conditional density provides a very informative summary
of response variable that allows us to examine the overall shape of the conditional distribution. In the nonparametric functional statistics, the rst results about the conditional
distribution were obtained in Ferraty et al. (2006). In this last, it is established the almost
complete convergence of the kernel estimator of the conditional density and its derivatives.
the quadratic error of this estimate has been studied by Laksaci (2007). The latter gave the
asymptotic expansion of the exact expression involved in the leading terms of the quadratic
error of the considered estimate. Recently, Ferraty et al. (2010) stated the uniform almost
complete convergence of the kernel estimate of some nonparametric conditional models, in
particular, of the conditional density model.
Since the open question stated by Ferraty and Vieu (2006), "How can the local polynomial ideas be adapted to infinite dimensional settings?", local linear smoothing in the functional data setting has been considered by many authors. We cite, for instance, Barrientos et al. (2010), Baíllo and Grané (2009) and El Methni and Rachdi (2011), who are concerned with local linear estimation of the regression operator for independent and identically distributed functional data. The first contribution on the local polynomial modeling of the conditional density function when the explanatory variable is functional is due to Demongeot et al. (2010). The authors established the almost complete consistency (pointwise and uniform) of a fast functional local linear estimator of the conditional density when the observations are i.i.d. Their study was extended to the dependent case by Demongeot et al. (2011).
In this chapter, we give the mean square convergence rate of the fast functional local linear estimator considered by Demongeot et al. (2010). The expression of this convergence rate shows the superiority of this method over the kernel method, namely in the bias terms. Note that the precision of our asymptotic results opens interesting perspectives from a practical point of view; in particular, minimizing the mean squared error can guide automatic bandwidth selection procedures.
We present our model in Section 5.2. In Section 5.3 we give some notation and hypotheses and present the main results. Section 5.4 is devoted to some discussion and comments on the results. The proofs are relegated to the last section of this chapter.

5.2 The model

Let us introduce $n$ pairs of random variables $(X_i, Y_i)$, $i = 1, \dots, n$, that we assume drawn from the pair $(X, Y)$ valued in $\mathcal{F} \times \mathbb{R}$, where $\mathcal{F}$ is a semi-metric space equipped with a semi-metric $d$.


Furthermore, we assume that there exists a regular version of the conditional probability of $Y$ given $X$ which is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$ and has a twice continuously differentiable density, denoted by $f^x$. Local polynomial smoothing is based on the assumption that the functional parameter is smooth enough to be locally well approximated by a polynomial. In functional statistics, there are several ways of extending the local linear ideas (cf. Barrientos et al. (2010), Baíllo and Grané (2009)). Here we adopt the fast functional local modeling approach, that is, we estimate the conditional density $f^x$ by $\widehat{a}$, where $(\widehat{a}, \widehat{b})$ is obtained by minimizing the following quantity:

$$\min_{(a,b) \in \mathbb{R}^2} \sum_{i=1}^{n} \Big( h_H^{-1} H\big(h_H^{-1}(y - Y_i)\big) - a - b\,\beta(X_i, x) \Big)^2 K\big(h_K^{-1}\delta(x, X_i)\big) \qquad (1)$$

where $\beta(\cdot,\cdot)$ (resp. $\delta(\cdot,\cdot)$) is a known operator from $\mathcal{F}^2$ into $\mathbb{R}$ such that $\beta(\xi, \xi) = 0$ (resp. $\delta(\xi, \xi) = 0$) for all $\xi \in \mathcal{F}$, $K$ and $H$ are kernels, and $h_K = h_{K,n}$ (resp. $h_H = h_{H,n}$) is a sequence of positive real numbers. Simple algebra gives the following explicit expression of $\widehat{f}^x$:

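$$\widehat{f}^x(y) = \frac{\sum_{i,j=1}^{n} W_{ij}(x)\, H\big(h_H^{-1}(y - Y_j)\big)}{h_H \sum_{i,j=1}^{n} W_{ij}(x)} \qquad (2)$$

where

$$W_{ij}(x) = \beta(X_i, x)\big(\beta(X_i, x) - \beta(X_j, x)\big) K\big(h_K^{-1}\delta(x, X_i)\big) K\big(h_K^{-1}\delta(x, X_j)\big),$$

with the convention $0/0 = 0$.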
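As a numerical sanity check of the algebra behind (2), the following Python/NumPy sketch evaluates the double-sum formula and compares it with a direct weighted least-squares minimization of (1). The kernels (a Gaussian $H$ and $K(t) = (3 - t)/4$ on $[-1, 1]$, which satisfies (H4)) and the toy arrays `beta[i]` $= \beta(X_i, x)$ and `delta[i]` $= \delta(x, X_i)$ are assumptions made for illustration only.

```python
import numpy as np

def H(t):
    """Gaussian kernel, used here as the response kernel H (assumption)."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def K(t):
    """Hypothetical kernel K(t) = (3 - t)/4 on [-1, 1]: positive, K' < 0, K(1) > 0."""
    return np.where(np.abs(t) <= 1.0, (3.0 - t) / 4.0, 0.0)

def fll_density(y, beta, delta, Y, hK, hH):
    """Double-sum form (2); beta[i] = beta(X_i, x), delta[i] = delta(x, X_i)."""
    Kv = K(delta / hK)
    # W[i, j] = beta_i (beta_i - beta_j) K_i K_j
    W = (beta * Kv)[:, None] * (beta[:, None] - beta[None, :]) * Kv[None, :]
    den = hH * W.sum()
    if den == 0.0:                      # convention 0/0 = 0
        return 0.0
    return float((W * H((y - Y) / hH)[None, :]).sum() / den)

def wls_intercept(y, beta, delta, Y, hK, hH):
    """Intercept a-hat of criterion (1), solved as a weighted least-squares problem."""
    sw = np.sqrt(K(delta / hK))                 # square roots of the weights K_i
    z = H((y - Y) / hH) / hH                    # h_H^{-1} H(h_H^{-1}(y - Y_i))
    A = np.column_stack([np.ones_like(beta), beta])
    coef, *_ = np.linalg.lstsq(A * sw[:, None], z * sw, rcond=None)
    return float(coef[0])

rng = np.random.default_rng(0)
beta = rng.uniform(-1.0, 1.0, 40)               # toy values of beta(X_i, x)
delta = beta.copy()                             # toy choice: delta(x, X_i) = beta(X_i, x)
Y = 2.0 + beta + 0.1 * rng.standard_normal(40)
a_sum = fll_density(2.0, beta, delta, Y, hK=0.8, hH=0.3)
a_wls = wls_intercept(2.0, beta, delta, Y, hK=0.8, hH=0.3)
print(a_sum, a_wls)                             # the two values coincide up to rounding
```

The agreement holds for any weights: writing the normal equations of (1) and expanding the determinant gives exactly the weights $W_{ij}(x)$ on $Y_j$.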

5.3 Main results

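In the remainder of the chapter, we set:

$$\phi_x(r_1, r_2) = \mathbb{P}\big(r_2 \le \delta(x, X) \le r_1\big)$$

and

$$\Phi_l(\cdot, y) = \frac{\partial^l f^{\cdot}(y)}{\partial y^l}, \qquad \Psi_l(s) = \mathbb{E}\big[\Phi_l(X, y) - \Phi_l(x, y) \,\big|\, \beta(x, X) = s\big], \quad \text{for } l \in \{0, 2\}.$$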

We will assume the following hypotheses:

(H1) For any $r > 0$, $\phi_x(r) := \phi_x(r, -r) > 0$ and there exists a function $\chi_x(\cdot)$ such that:
$$\forall t \in [-1, 1], \quad \lim_{h \to 0} \frac{\phi_x(h, th)}{\phi_x(h)} = \chi_x(t).$$

(H2) For $l \in \{0, 2\}$, the quantities $\Psi_l'(0)$ exist, where $\Psi_l'$ denotes the first derivative of $\Psi_l$.
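Assumption (H1) can be explored empirically. The Python/NumPy sketch below estimates $\phi_x(r_1, r_2)$ by the proportion of sample values of $\delta(x, X_i)$ falling in $[r_2, r_1]$ and forms the ratio appearing in (H1). The toy input, where $\delta(x, X)$ is spread uniformly on $[-1, 1]$ so that the ratio equals $(1 - t)/2$, is an assumption chosen only because the limit $\chi_x$ is then known in closed form; the function names are ours.

```python
import numpy as np

def phi_emp(delta, r1, r2):
    """Empirical phi_x(r1, r2) = P(r2 <= delta(x, X) <= r1)."""
    delta = np.asarray(delta)
    return float(np.mean((delta >= r2) & (delta <= r1)))

def chi_ratio(delta, h, t):
    """phi_x(h, t h) / phi_x(h), with phi_x(h) = phi_x(h, -h); its limit is chi_x(t) in (H1)."""
    base = phi_emp(delta, h, -h)
    return phi_emp(delta, h, t * h) / base if base > 0 else float("nan")

# Toy "sample" of delta(x, X_i): a fine uniform grid on [-1, 1] (assumption).
delta = np.linspace(-1.0, 1.0, 200001)
for t in (-0.5, 0.0, 0.5):
    print(t, chi_ratio(delta, 0.1, t), (1 - t) / 2)
```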


Chapter 5. Mean square convergence

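(H3) The function $\beta(\cdot,\cdot)$ is such that:
$$\forall z \in \mathcal{F}, \quad C_1|\delta(x, z)| \le |\beta(x, z)| \le C_2|\delta(x, z)|, \quad \text{where } C_1 > 0,\ C_2 > 0,$$
$$\sup_{u \in B(x, r)} |\beta(u, x) - \delta(x, u)| = o(r)$$
and
$$h_K \int_{B(x, h_K)} \beta(u, x)\, dP(u) = o\left(\int_{B(x, h_K)} \beta^2(u, x)\, dP(u)\right),$$
where $B(x, r) = \{z \in \mathcal{F} : |\delta(x, z)| \le r\}$ and $dP(x)$ is the probability distribution of $X$.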
(H4) $K$ is a positive, differentiable function supported within $[-1, 1]$. Its derivative $K'$ satisfies $K'(t) < 0$ for $-1 \le t < 1$, and $K(1) > 0$.

(H5) $H$ is a positive, integrable, bounded and symmetric function such that
$$\int H(t)\, dt = 1 \quad \text{and} \quad \int t^2 H(t)\, dt < \infty.$$

(H6) The bandwidths $h_K$ and $h_H$ satisfy $\lim_{n \to \infty} h_K = 0$, $\lim_{n \to \infty} h_H = 0$ and $\lim_{n \to \infty} n h_H \phi_x(h_K) = \infty$.
Notice that (H1) and (H2) are simple adaptations of conditions H1 and H3 in Ferraty et al. (2007), where the metric is replaced by some bi-functional operator $\delta$. The second part of condition (H3) is not restrictive: it is verified, for instance, if $\beta(u, x) = \delta(x, u)$, or, more generally, if
$$\lim_{\delta(x, u) \to 0} \left|\frac{\beta(u, x)}{\delta(x, u)} - 1\right| = 0,$$
because, for all $u \in B(x, r)$, we have
$$\frac{|\beta(u, x) - \delta(x, u)|}{r} \le \left|\frac{\beta(u, x)}{\delta(x, u)} - 1\right|.$$
The rest of (H3) was introduced and discussed by Barrientos et al. (2010), who give several examples of bi-functional operators $\beta$ satisfying this condition. Conditions (H4)-(H6) are standard and classically used in the context of quadratic errors in functional statistics.

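Theorem 5.3.1. Under assumptions (H1)-(H6), we have:

$$\mathbb{E}\left[\widehat{f}^x(y) - f^x(y)\right]^2 = B_H^2(x, y)\, h_H^4 + B_K^2(x, y)\, h_K^2 + \frac{V_{HK}(x, y)}{n h_H \phi_x(h_K)} + o(h_H^4) + o(h_K^2) + o\left(\frac{1}{n h_H \phi_x(h_K)}\right)$$

with

$$B_H(x, y) = \frac{1}{2} \frac{\partial^2 f_Y^X(x, y)}{\partial y^2} \int t^2 H(t)\, dt,$$

$$B_K(x, y) = \Psi_0'(0)\, \frac{K(1) - \int_{-1}^{1} \left(u^3 K(u)\right)' \chi_x(u)\, du}{K(1) - \int_{-1}^{1} \left(u^2 K(u)\right)' \chi_x(u)\, du} \qquad (3)$$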


and

$$V_{HK}(x, y) = f_Y^X(x, y) \int H^2(t)\, dt \; \frac{K^2(1) - \int_{-1}^{1} \left(u^4 K^2(u)\right)' \chi_x(u)\, du}{\left(K(1) - \int_{-1}^{1} \left(u^2 K(u)\right)' \chi_x(u)\, du\right)^2}.$$
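The constants of Theorem 5.3.1 all involve integrals of the form $K^a(1) - \int_{-1}^{1} (u^b K^a(u))' \chi_x(u)\,du$. The Python/NumPy sketch below evaluates them by numerical integration and assembles the leading terms of the mean squared error, the quantity one would minimize for a plug-in bandwidth choice. The kernel $K(u) = (3 - u)/4$, the function $\chi_x(u) = (1 - u)/2$, the small-ball function $\phi_x(h) = h$, the Gaussian $H$ and the values of $\Psi_0'(0)$, $f_Y^X(x, y)$ and $\partial^2 f_Y^X/\partial y^2$ are all hypothetical inputs, not quantities from the text.

```python
import numpy as np

u = np.linspace(-1.0, 1.0, 20001)

def trapezoid(f, x):
    """Trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

def K(v):
    return (3.0 - v) / 4.0          # hypothetical kernel on [-1, 1]

def chi(v):
    return (1.0 - v) / 2.0          # hypothetical limit function chi_x

def M(a, b):
    """K^a(1) - int_{-1}^{1} (u^b K^a(u))' chi_x(u) du, derivative by finite differences."""
    g = u**b * K(u) ** a
    return K(1.0) ** a - trapezoid(np.gradient(g, u) * chi(u), u)

def amse(hK, hH, n, psi0_prime=1.0, f=1.0, d2f=1.0, phi=lambda h: h):
    """Leading terms B_H^2 h_H^4 + B_K^2 h_K^2 + V_HK / (n h_H phi_x(h_K))."""
    int_t2H, int_H2 = 1.0, 1.0 / (2.0 * np.sqrt(np.pi))   # values for a Gaussian H
    BH = 0.5 * d2f * int_t2H
    BK = psi0_prime * M(1, 3) / M(1, 2)
    VHK = f * int_H2 * M(2, 4) / M(1, 2) ** 2
    return BH**2 * hH**4 + BK**2 * hK**2 + VHK / (n * hH * phi(hK))

grid = np.linspace(0.05, 1.0, 20)
scores = [[amse(hk, hh, n=500) for hh in grid] for hk in grid]
i, j = np.unravel_index(np.argmin(scores), (len(grid), len(grid)))
print("M(1,2) =", M(1, 2), " best (hK, hH) on the grid:", grid[i], grid[j])
```

With these toy inputs one can check by integration by parts that $K(1) - \int_{-1}^{1}(u^2K(u))'\chi_x(u)\,du = 1.25$, which the numerical value reproduces.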

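If we set

$$\widehat{f}_N(x, y) = \frac{1}{n(n-1)\,h_H\,\mathbb{E}[W_{12}]} \sum_{i \ne j = 1}^{n} W_{ji} H_i(y) \quad \text{and} \quad \widehat{f}_D(x) = \frac{1}{n(n-1)\,\mathbb{E}[W_{12}]} \sum_{i \ne j = 1}^{n} W_{ij},$$

where $K_i = K\big(h_K^{-1}\delta(x, X_i)\big)$, $\beta_i = \beta(X_i, x)$ and $H_i(y) = H\big(h_H^{-1}(y - Y_i)\big)$ for all $i = 1, \dots, n$ (so that $\widehat{f}^x(y) = \widehat{f}_N(x, y)/\widehat{f}_D(x)$, since $W_{ii} = 0$), then we obtain the following lemmas, which will be useful for the proof of Theorem 5.3.1.

Lemma 5.3.1. Under the hypotheses of Theorem 5.3.1, we have:
$$\mathbb{E}\left[\widehat{f}_N(x, y)\right] - f_Y^X(x, y) = B_H(x, y)\, h_H^2 + B_K(x, y)\, h_K + o(h_H^2) + o(h_K).$$

Lemma 5.3.2. Under the hypotheses of Theorem 5.3.1, we have:
$$\mathrm{Var}\left[\widehat{f}_N(x, y)\right] = \frac{V_{HK}(x, y)}{n h_H \phi_x(h_K)} + o\left(\frac{1}{n h_H \phi_x(h_K)}\right).$$

Lemma 5.3.3. Under the hypotheses of Theorem 5.3.1, we have:
$$\mathrm{Cov}\left(\widehat{f}_N(x, y), \widehat{f}_D(x)\right) = O\left(\frac{1}{n \phi_x(h_K)}\right).$$

Lemma 5.3.4. Under the hypotheses of Theorem 5.3.1, we have:
$$\mathrm{Var}\left[\widehat{f}_D(x)\right] = O\left(\frac{1}{n \phi_x(h_K)}\right).$$

5.4 Some comments and discussion

1. Remarks on the model: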

In the present work, the functional space of our model is characterized by the regularity condition (H2). Of course, this condition is closely related to the differentiability of the operators $\frac{\partial^l f_Y^X(\cdot, y)}{\partial y^l}$ and $f_Y^X(\cdot, y)$ (cf. Ferraty et al. (2007) for more discussion of the link between the existence of the derivatives of $\Psi_l$ and $\Phi_l$). Note that this condition is used in order to keep the usual form of the quadratic error (cf. Vieu, 1991). However, if we replace (H2) by a Lipschitz condition such as

$$\forall (y_1, y_2) \in N_y \times N_y,\ \forall (x_1, x_2) \in N_x \times N_x, \quad \left|f_Y^X(x_1, y_1) - f_Y^X(x_2, y_2)\right| \le C\left(|\delta(x_1, x_2)|^2 + |y_1 - y_2|^2\right),$$

which is less restrictive than condition (H2), we obtain a result of the form

$$\mathbb{E}\left[\widehat{f}_Y^X(x, y) - f_Y^X(x, y)\right]^2 = O\left(h_H^4 + h_K^2\right) + O\left(\frac{1}{n h_H \phi_x(h_K)}\right).$$

However, such an expression of the convergence rate is not exact and cannot be used to determine the smoothing parameters. In other words, the differentiability condition is a good compromise for obtaining an asymptotically exact expression of the convergence rate.

5.5 Proofs

Proof of Theorem 5.3.1. Since

$$\mathbb{E}\left[\widehat{f}_Y^X(x, y) - f_Y^X(x, y)\right]^2 = \left[\mathbb{E}\left(\widehat{f}_Y^X(x, y)\right) - f_Y^X(x, y)\right]^2 + \mathrm{Var}\left[\widehat{f}_Y^X(x, y)\right],$$

the proof of this theorem is based on the separate computation of two parts: the bias term and the variance term. We therefore distinguish two stages in this proof. First, for the bias term, recall that for all $z \ne 0$ and $p \in \mathbb{N}$ we can write:

$$\frac{1}{z} = 1 - (z - 1) + \dots + (-1)^p (z - 1)^p + (-1)^{p+1} \frac{(z - 1)^{p+1}}{z}.$$
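This algebraic identity is easy to check numerically. The short Python sketch below (the function name is ours) evaluates its right-hand side and returns the remainder term separately; for $p = 1$ the remainder is $(z - 1)^2/z$, which is of second order in $z - 1$ and explains why $\mathrm{Var}(\widehat{f}_D(x))$ enters the bias.

```python
def inverse_expansion(z, p):
    """sum_{k=0}^{p} (-1)^k (z-1)^k + (-1)^(p+1) (z-1)^(p+1) / z, which equals 1/z for z != 0."""
    partial = sum((-1) ** k * (z - 1) ** k for k in range(p + 1))
    remainder = (-1) ** (p + 1) * (z - 1) ** (p + 1) / z
    return partial + remainder, remainder

for z in (0.5, 0.9, 1.1, 3.0):
    value, rem = inverse_expansion(z, p=1)
    print(z, value, 1 / z, rem)      # value matches 1/z; rem = (z - 1)^2 / z when p = 1
```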
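Using this decomposition with $z = \widehat{f}_D(x)$ and $p = 1$, we obtain

$$\widehat{f}_Y^X(x, y) - f_Y^X(x, y) = \left(\widehat{f}_N(x, y) - f_Y^X(x, y)\right) - \left(\widehat{f}_N(x, y) - \mathbb{E}\widehat{f}_N(x, y)\right)\left(\widehat{f}_D(x) - 1\right) - \left(\mathbb{E}\widehat{f}_N(x, y)\right)\left(\widehat{f}_D(x) - 1\right) + \left(\widehat{f}_D(x) - 1\right)^2 \widehat{f}_Y^X(x, y) \qquad (4)$$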

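which implies that:

$$\mathbb{E}\left[\widehat{f}_Y^X(x, y)\right] - f_Y^X(x, y) = \left(\mathbb{E}\widehat{f}_N(x, y) - f_Y^X(x, y)\right) - \mathrm{Cov}\left(\widehat{f}_N(x, y), \widehat{f}_D(x)\right) + \mathbb{E}\left[\left(\widehat{f}_D(x) - \mathbb{E}\widehat{f}_D(x)\right)^2 \widehat{f}_Y^X(x, y)\right].$$

As the kernel $H$ is bounded, we can find a constant $C > 0$ such that $\widehat{f}_Y^X(x, y) \le C h_H^{-1}$. Hence,

$$\mathbb{E}\left[\widehat{f}_Y^X(x, y)\right] - f_Y^X(x, y) = \left(\mathbb{E}\widehat{f}_N(x, y) - f_Y^X(x, y)\right) - \mathrm{Cov}\left(\widehat{f}_N(x, y), \widehat{f}_D(x)\right) + \mathrm{Var}\left(\widehat{f}_D(x)\right) O(h_H^{-1}).$$

Secondly, concerning the variance term, we use ideas similar to those of Sarda and Vieu (2000) and Bosq and Lecoutre (1987) to deduce that:

$$\mathrm{Var}\left[\widehat{f}_Y^X(x, y)\right] = \mathrm{Var}\left[\widehat{f}_N(x, y)\right] - 2\left(\mathbb{E}\widehat{f}_N(x, y)\right)\mathrm{Cov}\left(\widehat{f}_N(x, y), \widehat{f}_D(x)\right) + \left(\mathbb{E}\widehat{f}_N(x, y)\right)^2 \mathrm{Var}\left(\widehat{f}_D(x)\right) + o\left(\frac{1}{n h_H \phi_x(h_K)}\right).$$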


Finally, Theorem 5.3.1 is a direct consequence of Lemmas 5.3.1, 5.3.2, 5.3.3 and 5.3.4.

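Proof of Lemma 5.3.1. We start by writing:

$$\mathbb{E}\left[\widehat{f}_N(x, y)\right] = \mathbb{E}\left[\frac{1}{n(n-1)\mathbb{E}[W_{12}]} \sum_{i \ne j = 1}^{n} W_{ij} H_i\right] = \frac{\mathbb{E}[W_{12} H_1]}{\mathbb{E}[W_{12}]} = \frac{1}{\mathbb{E}[W_{12}]}\, \mathbb{E}\left[W_{12}\, \mathbb{E}[H_1 \mid X]\right] \qquad (5)$$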

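To evaluate the quantity $\mathbb{E}[H_1 \mid X]$, we use the usual change of variable $t = h_H^{-1}(y - z)$. Thus,

$$\mathbb{E}[H_1 \mid X] = \frac{1}{h_H} \int H\left(\frac{y - z}{h_H}\right) f_Y^X(X, z)\, dz = \int H(t)\, f_Y^X(X, y - h_H t)\, dt.$$

As $f_Y^X(X, \cdot)$ is of class $C^2$ in $y$, we can use the following second-order Taylor expansion:

$$f_Y^X(X, y - h_H t) = f_Y^X(X, y) - h_H t\, \frac{\partial f_Y^X(X, y)}{\partial y} + \frac{h_H^2 t^2}{2}\, \frac{\partial^2 f_Y^X(X, y)}{\partial y^2} + o(h_H^2).$$

Moreover, under (H5), we get:

$$\mathbb{E}[H_1 \mid X] = f_Y^X(X, y) + \frac{h_H^2}{2}\, \frac{\partial^2 f_Y^X(X, y)}{\partial y^2} \int t^2 H(t)\, dt + o(h_H^2),$$

which can be rewritten as

$$\mathbb{E}[H_1 \mid X] = \Phi_0(X, y) + \frac{h_H^2}{2} \left(\int t^2 H(t)\, dt\right) \Phi_2(X, y) + o(h_H^2).$$

It follows from (5) that:

$$\mathbb{E}\left[\widehat{f}_N(x, y)\right] = \frac{1}{\mathbb{E}[W_{12}]}\left(\mathbb{E}[W_{12}\, \Phi_0(X, y)] + \frac{h_H^2}{2}\left(\int t^2 H(t)\, dt\right) \mathbb{E}[W_{12}\, \Phi_2(X, y)]\right) + o(h_H^2).$$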
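The second-order expansion of $\mathbb{E}[H_1 \mid X]$ can be checked numerically. In the Python/NumPy sketch below, the conditional density $f_Y^X(X, \cdot)$ is taken to be the standard normal density and $H$ the Gaussian kernel (so that $\int t^2 H(t)\,dt = 1$); both choices are assumptions made because the smoothed density is then smooth and well behaved. The sketch compares $h_H^{-1}\int H((y - z)/h_H) f(z)\,dz$ with $f(y) + \frac{h_H^2}{2} f''(y)$ and shows that the gap shrinks like $h_H^4$, consistent with the $o(h_H^2)$ remainder.

```python
import numpy as np

def phi(z):
    """Standard normal density: plays the role of f_Y^X(X, .) and of the kernel H (assumption)."""
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def trapezoid(f, x):
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

z = np.linspace(-12.0, 12.0, 240001)

def smoothed(y, h):
    """(1/h) int H((y - z)/h) f(z) dz, i.e. E[H_1 | X] with the 1/h_H factor included."""
    return trapezoid(phi((y - z) / h) * phi(z), z) / h

def taylor(y, h):
    """f(y) + h^2/2 f''(y) int t^2 H(t) dt, using f''(y) = (y^2 - 1) phi(y)."""
    return phi(y) + 0.5 * h**2 * (y**2 - 1.0) * phi(y)

y = 0.3
gaps = {h: abs(smoothed(y, h) - taylor(y, h)) for h in (0.05, 0.1, 0.2)}
print(gaps)            # each doubling of h multiplies the gap by roughly 2^4 = 16
```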

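Now, by the same arguments as those used by Barrientos et al. (2010) for the regression function, we show that:

$$\begin{aligned}
\mathbb{E}[W_{12}\, \Phi_l(X, y)] &= \Phi_l(x, y)\, \mathbb{E}[W_{12}] + \mathbb{E}\left[W_{12}\left(\Phi_l(X, y) - \Phi_l(x, y)\right)\right] \\
&= \Phi_l(x, y)\, \mathbb{E}[W_{12}] + \mathbb{E}\left[W_{12}\, \mathbb{E}\left[\Phi_l(X, y) - \Phi_l(x, y) \mid \beta(x, X)\right]\right] \\
&= \Phi_l(x, y)\, \mathbb{E}[W_{12}] + \mathbb{E}\left[W_{12}\, \Psi_l(\beta(x, X))\right].
\end{aligned}$$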

Moreover, since $\Psi_l(0) = 0$ for $l \in \{0, 2\}$, we have:

$$\mathbb{E}\left[W_{12}\, \Psi_l(\beta(x, X))\right] = \Psi_l'(0)\, \mathbb{E}[\beta(x, X)\, W_{12}] + o\left(\mathbb{E}[\beta(x, X)\, W_{12}]\right).$$
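Hence,

$$\mathbb{E}\left[\widehat{f}_N(x, y)\right] = f_Y^X(x, y) + \frac{h_H^2}{2}\, \frac{\partial^2 f_Y^X(x, y)}{\partial y^2} \int t^2 H(t)\, dt + o(h_H^2) + \Psi_0'(0)\, \frac{\mathbb{E}[\beta(x, X)\, W_{12}]}{\mathbb{E}[W_{12}]} + o\left(\frac{\mathbb{E}[\beta(x, X)\, W_{12}]}{\mathbb{E}[W_{12}]}\right).$$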

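It is clear that

$$\mathbb{E}[\beta(x, X)\, W_{12}] = \mathbb{E}[K_1 \beta_1^3]\, \mathbb{E}[K_1] - \mathbb{E}[K_1 \beta_1]\, \mathbb{E}[K_1 \beta_1^2]$$

and

$$\mathbb{E}[W_{12}] = \mathbb{E}[K_1 \beta_1^2]\, \mathbb{E}[K_1] - \left(\mathbb{E}[K_1 \beta_1]\right)^2.$$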


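Observe that, under (H4), for all $a > 0$,

$$\mathbb{E}[K_1^a \beta_1] \le C \int_{B(x, h_K)} \beta(u, x)\, dP(u).$$

Using the last part of (H3), we get:

$$h_K\, \mathbb{E}[K_1^a \beta_1] = o\left(\int_{B(x, h_K)} \beta^2(u, x)\, dP(u)\right) = o\left(h_K^2\, \phi_x(h_K)\right).$$

So we can see that:

$$\mathbb{E}[K_1^a \beta_1] = o\left(h_K\, \phi_x(h_K)\right). \qquad (6)$$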

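Moreover, for all $b > 1$, we can write

$$\mathbb{E}[K_1^a \beta_1^b] = \mathbb{E}[K_1^a\, \delta^b(x, X)] + \mathbb{E}\left[K_1^a\left(\beta^b(X, x) - \delta^b(x, X)\right)\right],$$

and the second part of (H3) implies that

$$\begin{aligned}
\left|\mathbb{E}\left[K_1^a\left(\beta^b(X, x) - \delta^b(x, X)\right)\right]\right| &= \left|\mathbb{E}\left[K_1^a\, \mathbb{1}_{B(x, h_K)}(X)\left(\beta(X, x) - \delta(x, X)\right) \sum_{l=1}^{b} \beta^{b-l}(X, x)\, \delta^{l-1}(x, X)\right]\right| \\
&\le \sup_{u \in B(x, h_K)} |\beta(u, x) - \delta(x, u)| \sum_{l=1}^{b} \mathbb{E}\left[K_1^a\, \mathbb{1}_{B(x, h_K)}(X)\, |\beta(X, x)|^{b-l}\, |\delta(x, X)|^{l-1}\right],
\end{aligned}$$

whereas the first part gives

$$\mathbb{1}_{B(x, h_K)}(X)\, |\beta(X, x)| \le C h_K \quad \text{and} \quad \mathbb{1}_{B(x, h_K)}(X)\, |\delta(x, X)| \le h_K.$$

Thus,

$$\begin{aligned}
\left|\mathbb{E}\left[K_1^a\left(\beta^b(X, x) - \delta^b(x, X)\right)\right]\right| &\le b\, C \sup_{u \in B(x, h_K)} |\beta(u, x) - \delta(x, u)|\; h_K^{b-1}\, \mathbb{E}[K_1^a] \\
&\le b\, C \sup_{u \in B(x, h_K)} |\beta(u, x) - \delta(x, u)|\; h_K^{b-1}\, \phi_x(h_K),
\end{aligned}$$

and then

$$\mathbb{E}[K_1^a \beta_1^b] = \mathbb{E}[K_1^a\, \delta^b(x, X)] + o\left(h_K^b\, \phi_x(h_K)\right).$$

Concerning the first term, we write:

$$\begin{aligned}
h_K^{-b}\, \mathbb{E}[K_1^a\, \delta^b(x, X)] &= \int_{-1}^{1} v^b K^a(v)\, dP^{h_K^{-1}\delta(x, X)}(v) \\
&= \int_{-1}^{1} \left[K^a(1) - \int_v^1 \left(u^b K^a(u)\right)' du\right] dP^{h_K^{-1}\delta(x, X)}(v) \\
&= K^a(1)\, \phi_x(h_K) - \int_{-1}^{1} \left(u^b K^a(u)\right)' \phi_x(h_K, u h_K)\, du \\
&= \phi_x(h_K)\left(K^a(1) - \int_{-1}^{1} \left(u^b K^a(u)\right)' \frac{\phi_x(h_K, u h_K)}{\phi_x(h_K)}\, du\right).
\end{aligned}$$

Finally, under (H1), we obtain:

$$\mathbb{E}[K_1^a \beta_1^b] = h_K^b\, \phi_x(h_K)\left(K^a(1) - \int_{-1}^{1} \left(u^b K^a(u)\right)' \chi_x(u)\, du\right) + o\left(h_K^b\, \phi_x(h_K)\right). \qquad (7)$$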

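It follows that:

$$\mathbb{E}[W_{12}] = h_K^2\, \phi_x^2(h_K)\left(K(1) - \int_{-1}^{1} \left(u^2 K(u)\right)' \chi_x(u)\, du\right)\left(K(1) - \int_{-1}^{1} K'(u)\, \chi_x(u)\, du\right) + o\left(h_K^2\, \phi_x^2(h_K)\right)$$

and

$$\mathbb{E}[\beta(x, X)\, W_{12}] = h_K^3\, \phi_x^2(h_K)\left(K(1) - \int_{-1}^{1} \left(u^3 K(u)\right)' \chi_x(u)\, du\right)\left(K(1) - \int_{-1}^{1} K'(u)\, \chi_x(u)\, du\right) + o\left(h_K^3\, \phi_x^2(h_K)\right).$$

Consequently,

$$\mathbb{E}\left[\widehat{f}_N(x, y)\right] = f_Y^X(x, y) + \frac{h_H^2}{2}\, \frac{\partial^2 f_Y^X(x, y)}{\partial y^2} \int t^2 H(t)\, dt + o(h_H^2) + h_K\, \Psi_0'(0)\, \frac{K(1) - \int_{-1}^{1} \left(u^3 K(u)\right)' \chi_x(u)\, du}{K(1) - \int_{-1}^{1} \left(u^2 K(u)\right)' \chi_x(u)\, du} + o(h_K).$$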

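Proof of Lemma 5.3.2. It is clear that

$$\mathrm{Var}\left(\widehat{f}_N(x, y)\right) = \frac{1}{\left(n(n-1)\, h_H\, \mathbb{E}[W_{12}]\right)^2}\, \mathrm{Var}\left(\sum_{i \ne j = 1}^{n} W_{ij} H_i\right) \qquad (8)$$

$$\begin{aligned}
= \frac{1}{\left(n(n-1)\, h_H\, \mathbb{E}[W_{12}]\right)^2} \Big( & n(n-1)\, \mathbb{E}[W_{12}^2 H_1^2] + n(n-1)\, \mathbb{E}[W_{12} W_{21} H_1 H_2] && (9) \\
& + n(n-1)(n-2)\, \mathbb{E}[W_{12} W_{13} H_1^2] + n(n-1)(n-2)\, \mathbb{E}[W_{12} W_{23} H_1 H_2] \\
& + n(n-1)(n-2)\, \mathbb{E}[W_{12} W_{31} H_1 H_3] + n(n-1)(n-2)\, \mathbb{E}[W_{12} W_{32} H_1 H_3] && (10) \\
& - n(n-1)(4n-6) \left(\mathbb{E}[W_{12} H_1]\right)^2 \Big) && (11)
\end{aligned}$$

Observe that the previous lemma gives $\dfrac{\mathbb{E}[W_{12} H_1]}{\mathbb{E}[W_{12}]} = O(1)$. Furthermore, by simple manipulations and by using (6) and (7), we arrive at:

$$\begin{aligned}
\mathbb{E}[W_{12}^2 H_1^2] &= O\left(h_K^4\, h_H\, \phi_x^2(h_K)\right), \\
\mathbb{E}[W_{12} W_{13} H_1^2] &= \mathbb{E}[\beta_1^4 K_1^2 H_1^2] \left(\mathbb{E}[K_1]\right)^2 + o\left(h_K^4\, h_H\, \phi_x^3(h_K)\right), \\
\mathbb{E}[W_{12} W_{21} H_1 H_2] &= O\left(h_K^4\, h_H^2\, \phi_x^2(h_K)\right), \\
\mathbb{E}[W_{12} W_{23} H_1 H_2] &= O\left(h_K^4\, h_H^2\, \phi_x^3(h_K)\right), \\
\mathbb{E}[W_{12} W_{31} H_1 H_3] &= O\left(h_K^4\, h_H^2\, \phi_x^3(h_K)\right), \\
\mathbb{E}[W_{12} W_{32} H_1 H_3] &= O\left(h_K^4\, h_H^2\, \phi_x^3(h_K)\right).
\end{aligned}$$

Therefore, the second quantity is the leading term in $\mathrm{Var}\left(\widehat{f}_N(x, y)\right)$. This term can be evaluated by the same arguments as those used in the previous proof. Indeed:

$$\begin{aligned}
\mathbb{E}[\beta_1^4 K_1^2 H_1^2] &= \mathbb{E}\left[\beta_1^4 K_1^2\, \mathbb{E}(H_1^2 \mid X)\right] \\
&= \mathbb{E}\left[\beta_1^4 K_1^2 \int H^2\left(\frac{y - z}{h_H}\right) f_Y^X(X, z)\, dz\right] \\
&= h_H\, \mathbb{E}\left[\beta_1^4 K_1^2 \int H^2(t)\, f_Y^X(X, y - h_H t)\, dt\right].
\end{aligned}$$

From the first-order Taylor expansion, we have

$$f_Y^X(X, y - h_H t) = f_Y^X(X, y) + O(h_H) = f_Y^X(X, y) + o(1).$$

Next,

$$\mathbb{E}[\beta_1^4 K_1^2 H_1^2] = h_H \int H^2(t)\, dt\; \mathbb{E}\left[\beta_1^4 K_1^2\, f_Y^X(X, y)\right] + o\left(h_H\, \mathbb{E}[\beta_1^4 K_1^2]\right).$$

Once again, following the same steps as in the previous lemma, we write

$$\mathbb{E}\left[\beta_1^4 K_1^2\, f_Y^X(X, y)\right] = f_Y^X(x, y)\, \mathbb{E}[\beta_1^4 K_1^2] + o\left(\mathbb{E}[\beta_1^4 K_1^2]\right),$$

which implies that

$$\mathbb{E}[\beta_1^4 K_1^2 H_1^2] = h_H\, f_Y^X(x, y) \left(\int H^2(t)\, dt\right) \mathbb{E}[\beta_1^4 K_1^2] + o\left(h_H\, \mathbb{E}[\beta_1^4 K_1^2]\right).$$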


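Combining the last equation with (11), we obtain

$$\mathrm{Var}\left(\widehat{f}_N(x, y)\right) = \frac{f_Y^X(x, y)}{n h_H \phi_x(h_K)} \left(\int H^2(t)\, dt\right) \frac{\phi_x(h_K)\, \mathbb{E}[\beta_1^4 K_1^2]}{\left(\mathbb{E}[\beta_1^2 K_1]\right)^2} + o\left(\frac{1}{n h_H \phi_x(h_K)}\right). \qquad (12)$$

So, by (7), we obtain:

$$\mathrm{Var}\left(\widehat{f}_N(x, y)\right) = \frac{f_Y^X(x, y)}{n h_H \phi_x(h_K)} \left(\int H^2(t)\, dt\right) \frac{K^2(1) - \int_{-1}^{1} \left(u^4 K^2(u)\right)' \chi_x(u)\, du}{\left(K(1) - \int_{-1}^{1} \left(u^2 K(u)\right)' \chi_x(u)\, du\right)^2} + o\left(\frac{1}{n h_H \phi_x(h_K)}\right).$$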

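Proof of Lemma 5.3.3. The proof of this lemma is very similar to that of Lemma 5.3.2. It suffices to write:

$$\begin{aligned}
\mathrm{Cov}\left(\widehat{f}_N(x, y), \widehat{f}_D(x)\right) &= \frac{1}{h_H \left(n(n-1)\, \mathbb{E}[W_{12}]\right)^2}\, \mathrm{Cov}\left(\sum_{i \ne j = 1}^{n} W_{ij} H_i,\ \sum_{i' \ne j' = 1}^{n} W_{i'j'}\right) \\
&= \frac{1}{h_H \left(n(n-1)\, \mathbb{E}[W_{12}]\right)^2} \Big( n(n-1)\, \mathbb{E}[W_{12}^2 H_1] + n(n-1)\, \mathbb{E}[W_{12} W_{21} H_1] \\
&\qquad + n(n-1)(n-2)\, \mathbb{E}[W_{12} W_{13} H_1] + n(n-1)(n-2)\, \mathbb{E}[W_{12} W_{23} H_1] \\
&\qquad + n(n-1)(n-2)\, \mathbb{E}[W_{12} W_{31} H_1] + n(n-1)(n-2)\, \mathbb{E}[W_{12} W_{32} H_1] \\
&\qquad - n(n-1)(4n-6)\, \mathbb{E}[W_{12} H_1]\, \mathbb{E}[W_{12}] \Big).
\end{aligned}$$

By simple algebra, we deduce that:

$$\begin{aligned}
&\mathbb{E}[W_{12}^2 H_1] = O\left(h_K^4\, h_H\, \phi_x^2(h_K)\right), && \mathbb{E}[W_{12} W_{13} H_1] = O\left(h_K^4\, h_H\, \phi_x^2(h_K)\right), \\
&\mathbb{E}[W_{12} W_{23} H_1] = O\left(h_K^4\, h_H\, \phi_x^2(h_K)\right), && \mathbb{E}[W_{12} W_{21} H_1] = O\left(h_K^4\, h_H\, \phi_x^2(h_K)\right), \\
&\mathbb{E}[W_{12} W_{31} H_1] = O\left(h_K^4\, h_H\, \phi_x^2(h_K)\right), && \mathbb{E}[W_{12} W_{32} H_1] = O\left(h_K^4\, h_H\, \phi_x^2(h_K)\right).
\end{aligned}$$

Furthermore, as $\mathbb{E}[W_{12}] = O\left(h_K^2\, \phi_x^2(h_K)\right)$, we have:

$$\mathrm{Cov}\left(\widehat{f}_N(x, y), \widehat{f}_D(x)\right) = O\left(\frac{1}{n \phi_x(h_K)}\right). \qquad (13)$$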


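Proof of Lemma 5.3.4. Similarly to Lemma 5.3.2, we write:

$$\begin{aligned}
\mathrm{Var}\left(\widehat{f}_D(x)\right) &= \frac{1}{\left(n(n-1)\, \mathbb{E}[W_{12}]\right)^2}\, \mathrm{Var}\left(\sum_{i \ne j = 1}^{n} W_{ij}\right) \\
&= \frac{1}{\left(n(n-1)\, \mathbb{E}[W_{12}]\right)^2} \Big( n(n-1)\, \mathbb{E}[W_{12}^2] + n(n-1)\, \mathbb{E}[W_{12} W_{21}] \\
&\qquad + n(n-1)(n-2)\, \mathbb{E}[W_{12} W_{13}] + n(n-1)(n-2)\, \mathbb{E}[W_{12} W_{23}] \\
&\qquad + n(n-1)(n-2)\, \mathbb{E}[W_{12} W_{31}] + n(n-1)(n-2)\, \mathbb{E}[W_{12} W_{32}] \\
&\qquad - n(n-1)(4n-6) \left(\mathbb{E}[W_{12}]\right)^2 \Big).
\end{aligned}$$

Because

$$\begin{aligned}
&\mathbb{E}[W_{12}^2] = O\left(h_K^4\, \phi_x^2(h_K)\right), && \mathbb{E}[W_{12} W_{13}] = O\left(h_K^4\, \phi_x^2(h_K)\right), && \mathbb{E}[W_{12} W_{21}] = O\left(h_K^4\, \phi_x^2(h_K)\right), \\
&\mathbb{E}[W_{12} W_{23}] = O\left(h_K^4\, \phi_x^2(h_K)\right), && \mathbb{E}[W_{12} W_{31}] = O\left(h_K^4\, \phi_x^2(h_K)\right), && \mathbb{E}[W_{12} W_{32}] = O\left(h_K^4\, \phi_x^2(h_K)\right),
\end{aligned}$$

we have that:

$$\mathrm{Var}\left(\widehat{f}_D(x)\right) = O\left(\frac{1}{n \phi_x(h_K)}\right),$$

which completes the proof.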

Bibliography

[1] Barrientos-Marín, J., Ferraty, F. and Vieu, P. (2010). Locally modelled regression and functional data. Journal of Nonparametric Statistics, 22, No. 5, Pages 617–632.
[2] Baíllo, A. and Grané, A. (2009). Local linear regression for functional predictor and scalar response. Journal of Multivariate Analysis, 100, Pages 102–111.
[3] Bosq, D. and Lecoutre, J. P. (1987). Théorie de l'estimation fonctionnelle. Ed. Economica.
[4] Chu, C.-K. and Marron, J.-S. (1991). Choosing a kernel regression estimator. With comments and a rejoinder by the authors. Statist. Sci., 6, Pages 404–436.
[5] El Methni, M. and Rachdi, M. (2010). Local weighted average estimation of the regression operator for functional data. Commun. Stat., Theory and Methods, Volume 00, Pages 00–00.
[6] Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc., 87, Pages 998–1004.
[7] Fan, J., Yao, Q. and Tong, H. (1996). Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83, Pages 189–206.
[8] Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, New York.
[9] Ferraty, F., Laksaci, A., Tadj, A. and Vieu, P. (2010). Rate of uniform consistency for nonparametric estimates with functional variables. Journal of Statistical Planning and Inference, 140, Pages 335–352.
[10] Ferraty, F., Laksaci, A. and Vieu, P. (2006). Estimating some characteristics of the conditional distribution in nonparametric functional models. Stat. Inference Stoch. Process., 9, Pages 47–76.
[11] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics, New York.
[12] Laksaci, A. (2007). Convergence en moyenne quadratique de l'estimateur à noyau de la densité conditionnelle avec variable explicative fonctionnelle. Pub. Inst. Stat. Univ. Paris, 3, Pages 69–80.
[13] Sarda, P. and Vieu, P. (2000). Kernel regression. In Schimek, Michael G. (ed.), Smoothing and Regression: Approaches, Computation and Application. Wiley Series in Probability and Statistics. Chichester: Wiley, Pages 43–70.
[14] Vieu, P. (1991). Quadratic errors for nonparametric estimates under dependence. J. Multivariate Anal., 39, Pages 324–347.

Chapter 6
Local linear estimation of conditional parameters for functional data: application to simulated and real data

The conditional density is a fundamental tool for describing the relationship between two random variables. In this chapter, we estimate this relationship by the local linear nonparametric estimation method. The main objective is to show, using simulated and then real data, the applicability of this method in the functional setting. First, we illustrate the conditional mode, a prediction tool closely related to the estimation of the conditional density. Next, we propose an easy and fast implementation of the conditional density estimator proposed in Chapter 2. Finally, an application to real data illustrates the superiority of the local polynomial estimation method over the kernel method. In this study, we also address many practical questions, such as the choice of the smoothing parameters and of the two operators $\beta$ and $\delta$ (cf. the previous chapters). For this reason, we list different methods that allow an optimal selection of these elements.

6.1 Illustration of the conditional mode

We collect functional observations generated by the following process:

$$X_i(t) = \cos\left((1 - W_i)t\right) + \sin(W_i t) + b_i, \quad t \in [0, \pi], \quad i = 1, 2, \dots, n,$$

where $W_i$ (respectively $b_i$) follows the uniform distribution $\mathcal{U}(0, 1)$ (respectively the normal distribution $\mathcal{N}(1, 0.4)$). We assume that these curves are observed on a grid formed by a discretization of the interval $(0, \pi)$ into 100 points. These functional variables are shown in Figure 6.1.
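A minimal Python/NumPy sketch of this data-generating process follows. We read $\mathcal{N}(1, 0.4)$ as a normal law with mean 1 and variance 0.4 and use 100 equispaced points on $[0, \pi]$; both readings, the seed and the function name are assumptions, not details taken from the text.

```python
import numpy as np

def simulate_curves(n, n_points=100, seed=0):
    """X_i(t) = cos((1 - W_i) t) + sin(W_i t) + b_i on an equispaced grid of [0, pi]."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, np.pi, n_points)
    W = rng.uniform(0.0, 1.0, size=n)                 # W_i ~ U(0, 1)
    b = rng.normal(1.0, np.sqrt(0.4), size=n)         # b_i ~ N(1, 0.4), 0.4 read as a variance
    X = np.cos(np.outer(1.0 - W, t)) + np.sin(np.outer(W, t)) + b[:, None]
    return t, X

t, X = simulate_curves(550)
print(X.shape)   # (550, 100): 500 training curves followed by 50 test curves
```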


Chapter 6. Application to simulated and real data

Figure 6.1: The curves $X_i$ (horizontal axis: Time).


We generate the responses $Y_i$ according to the following regression model:

$$Y = r(X) + \varepsilon, \quad \text{where} \quad r(x) = 4 \exp\left(\frac{1}{1 + \int_0^{\pi} |x(t)|^2\, dt}\right)$$

and $\varepsilon$ is a random variable following the normal distribution $\mathcal{N}(0, 0.3)$. The objective of this illustration is to show the usefulness of the conditional density in a prediction context. To this end, we split our sample into two sets of observations: a training sample $(X_i, Y_i)_{i=1,\dots,500}$ and a test sample $(X_i, Y_i)_{i=501,\dots,550}$. For the latter, we assume that the responses are unknown and we approximate them by $\theta(X_i)$, where

$$\theta(x) = \arg\max_y f^x(y),$$

which we estimate by

$$\widehat{\theta}(x) = \arg\max_y \widehat{f}_{(h_K, h_H)}(x, y)$$

with

$$\widehat{f}_{(h_K, h_H)}(x, y) = \frac{\sum_{i,j=1}^{n} W_{ij}(x)\, H\big(h_H^{-1}(y - Y_j)\big)}{h_H \sum_{i,j=1}^{n} W_{ij}(x)},$$

$$W_{ij}(x) = \beta(X_i, x)\big(\beta(X_i, x) - \beta(X_j, x)\big) K\big(h_K^{-1}\delta(x, X_i)\big) K\big(h_K^{-1}\delta(x, X_j)\big),$$

using the convention $0/0 = 0$.
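The conditional mode estimator can be sketched as a grid search over $y$. The Python/NumPy code below assumes that the values $\beta(X_i, x)$ and $\delta(x, X_i)$ have already been computed for the curve $x$ of interest, uses a Gaussian $H$ and the hypothetical kernel $K(t) = (3 - t)/4$ on $[-1, 1]$, and exploits the fact that the estimator is a combination of kernels centred at the $Y_j$ with weights $\sum_i W_{ij}(x)$. The toy data and the grid are illustrative assumptions.

```python
import numpy as np

def H(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)          # Gaussian H (assumption)

def K(t):
    return np.where(np.abs(t) <= 1.0, (3.0 - t) / 4.0, 0.0)    # hypothetical K on [-1, 1]

def density_on_grid(ygrid, beta, delta, Y, hK, hH):
    """Evaluate the local linear conditional density at every point of ygrid for one curve x."""
    Kv = K(delta / hK)
    W = (beta * Kv)[:, None] * (beta[:, None] - beta[None, :]) * Kv[None, :]
    c = W.sum(axis=0)                      # c_j = sum_i W_ij(x), the weight of Y_j
    s = c.sum()
    if s == 0.0:                           # convention 0/0 = 0
        return np.zeros_like(ygrid)
    return H((ygrid[:, None] - Y[None, :]) / hH) @ c / (hH * s)

def conditional_mode(ygrid, beta, delta, Y, hK, hH):
    """theta-hat(x) = argmax over the grid of the estimated conditional density."""
    return float(ygrid[np.argmax(density_on_grid(ygrid, beta, delta, Y, hK, hH))])

rng = np.random.default_rng(1)
beta = rng.uniform(-1.0, 1.0, 200)         # toy values of beta(X_i, x)
delta = np.abs(beta)                       # toy values of delta(x, X_i)
Y = 2.0 + 0.02 * rng.standard_normal(200)  # responses concentrated around 2
ygrid = np.linspace(0.0, 4.0, 801)
print(conditional_mode(ygrid, beta, delta, Y, hK=0.5, hH=0.2))
```

Because the weights $c_j$ are normalized by their sum, the estimated density integrates to one over $y$ whenever the grid covers the data, which is a convenient check of an implementation.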


To check the efficiency of this model in this predictive analysis, we compare the quantities $(Y_i)_{i=501,\dots,550}$ with the quantities $(\widehat{\theta}(X_i))_{i=501,\dots,550}$. As mentioned above, the practical computation of this estimator depends on the choice of the parameters $h_K$, $h_H$, $\beta$ and $\delta$. Concerning the choice of the smoothing parameters $h_K$ and $h_H$, recall that it relies on the selection criterion adopted by Ferraty and Vieu (2006) for the kernel estimation method. Indeed, the parameters $h_K$ and $h_H$ are obtained by minimizing the following criterion:

for each curve $X_i$ in the test sample, $\quad \mathrm{err}(h_K, h_H) = |Y_{i^*} - \widehat{\theta}(X_{i^*})| \qquad (1)$

where $i^*$ denotes the index of the curve closest to $X_i$ among all the curves of the training sample. More precisely, these smoothing parameters are selected by minimizing criterion (1) over the set of nearest neighbours. Note that this selection method is well suited to our data sets. However, to our knowledge, there is no theoretical study proving the optimality of this method, even for the kernel method. On the other hand, the choice of the operators $\beta$ and $\delta$ plays an essential role. Concretely, our theoretical results offer, under certain conditions, great flexibility in the choice of these two operators (cf. Chapters 2-4). In our case, we focus on operators $\beta$ and $\delta$ derived from the functional index metric. More precisely, we consider the operators:

$$\beta(x, x') = \langle \theta_1, x - x' \rangle$$


Clearly, with this choice one can control the approximation of signed curves. Note that when the curves are very smooth (i.e., regular), the inner product $\langle x, x' \rangle$ can be replaced by $\langle x^{(q)}, x'^{(q)} \rangle$, where the integer $q$ is chosen according to the degree of smoothness of the curves. Thus, the operator $\beta$ is chosen as a function of the parameters $\theta_1$ and $q$. Recall that the ideas of Ait-Saidi et al. (2007) could be adapted to obtain a practical selection method for $\theta_1$. However, in the case of conditional density estimation, this adaptation requires additional tools and preliminary results (cf. the discussion in Attaoui et al. (2010)).
In this illustration, we select the functional index $\theta_1$ among the eigenvectors of the empirical covariance operator:

$$\frac{1}{100} \sum_{i=1}^{100} (X_i - \bar{X})^t (X_i - \bar{X}).$$
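The functional-index operator can be sketched as follows in Python/NumPy: the curves are discretized on a common grid, the eigenvectors of the empirical covariance matrix are the candidates for $\theta_1$, and $\beta(x, x') = \langle \theta_{q_1}, x - x' \rangle$ is approximated by a Riemann sum. The grid step used in the inner product, the simulated curves and the function names are our assumptions; derivatives of order $q$ would be applied to the curves before this step.

```python
import numpy as np

def functional_index(X, q1=1):
    """Eigenvector of the empirical covariance associated with the q1-th largest eigenvalue."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / X.shape[0]            # (1/n) sum_i (X_i - Xbar)^t (X_i - Xbar)
    eigval, eigvec = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]         # reorder in decreasing order
    return eigvec[:, order[q1 - 1]]

def beta_op(x, x_prime, theta, dt):
    """beta(x, x') = <theta, x - x'>, L2 inner product approximated by a Riemann sum."""
    return float(np.sum(theta * (x - x_prime)) * dt)

rng = np.random.default_rng(2)
t = np.linspace(0.0, np.pi, 100)
W = rng.uniform(size=(100, 1))
X = np.cos((1.0 - W) * t) + np.sin(W * t) + rng.normal(1.0, np.sqrt(0.4), size=(100, 1))
theta1 = functional_index(X, q1=1)
dt = t[1] - t[0]
print(beta_op(X[0], X[1], theta1, dt), beta_op(X[1], X[0], theta1, dt))
```

The two printed values are opposite: this operator is antisymmetric and vanishes on the diagonal, as required by $\beta(\xi, \xi) = 0$.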

We refer to Barrientos et al. (2010) for further discussion of the importance and motivation of this choice. Finally, the operator $\beta$ is chosen according to two parameters:
- $q$: the order of the derivative of the functional data;
- $q_1$: the order of the eigenvalue associated with the functional eigenvector $\theta_1$.

Similarly, for the operator $\delta$ we can also take:

$$\delta(x, x') = \langle \theta_2, x - x' \rangle$$

and apply the same procedure as for $\beta$. Alternatively, we can identify $\delta$ with the metric $d$, i.e.,

$$\delta(x, x') = d(x, x') = \sqrt{\langle x - x', x - x' \rangle}.$$

To highlight the many possible choices of the parameters involved in our estimators, we ran our simulations considering both cases and several values of $q$, $q_1$ and $q_2$ (the order of the eigenvalue associated with the functional eigenvector $\theta_2$). At this stage, we note that the results are slightly better for $q = 0$, $q_1 = 1$ and $q_2 = 2$ (cf. Figure 6.2).

Figure 6.2: Results for $\delta(\cdot, \cdot) = d(\cdot, \cdot)$ and $q_1 = 1$ (responses versus predicted values).

Figure 6.3: Results for $\delta(x, x') = \langle \theta_2, x - x' \rangle$ with $(q_1, q_2) = (2, 1)$ (responses versus predicted values).

In conclusion, we can state that conditional mode estimation by the local polynomial method is very effective as a prediction model when the data are functional, even though we do not have theoretical results for automatically determining the different parameters involved in the estimator. Moreover, this model is easy to handle and our programs produce results quickly. On the other hand, the second choice of $\delta$ is slightly better than the first one, i.e., the one where $\delta$ is identified with the metric. Indeed, we obtain

$$\mathrm{MSE} = \frac{1}{50} \sum_{i=501}^{550} \left(Y_i - \widehat{\theta}(X_i)\right)^2 = 0.27$$

for the first case and 0.19 for the second.

6.2 Illustration of the conditional density

In this section, we keep the same data set as in the previous section and compare the estimator

$$\widehat{f}_{(h_K, h_H)}(x, y) = \frac{\sum_{i,j=1}^{n} W_{ij}(x)\, H\big(h_H^{-1}(y - Y_j)\big)}{h_H \sum_{i,j=1}^{n} W_{ij}(x)}$$

with the true conditional density, which is given by

$$f^x(y) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(y - r(x)\right)^2\right).$$

This illustration is motivated by the fact that the conditional density can be used for other purposes (testing the multimodality of the data, estimating the hazard function, ...) and not only as a preliminary step for estimating the conditional mode. It is therefore of interest to show the applicability of this method without the conditional mode as the main objective. The main challenge is to find selection criteria different from those proposed in the previous case. For this purpose, we propose to use ideas similar to those of Chapter 2 for the kernel method. In other words, the natural selection criterion for the parameters $h_K$ and $h_H$ is based on minimizing the errors:

$$d_1(\widehat{f}_{(h_K, h_H)}, f) = \iint \left(\widehat{f}_{(h_K, h_H)}(x, y) - f(x, y)\right)^2 W_1(x)\, W_2(y)\, dP_X(x)\, dy,$$

$$d_2(\widehat{f}_{(h_K, h_H)}, f) = \frac{1}{n} \sum_{i=1}^{n} \left(\widehat{f}_{(h_K, h_H)}(X_i, Y_i) - f(X_i, Y_i)\right)^2 \frac{W_1(X_i)\, W_2(Y_i)}{f(X_i, Y_i)},$$

$$d_3(\widehat{f}_{(h_K, h_H)}, f) = \iint \mathbb{E}\left(\widehat{f}_{(h_K, h_H)}(x, y) - f(x, y)\right)^2 W_1(x)\, W_2(y)\, dP_X(x)\, dy.$$


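However, for the practical determination of our parameters, we consider the following cross-validation criterion:

$$CV(h_K, h_H) = \frac{1}{n} \sum_{i=1}^{n} W_1(X_i) \int \left(\widehat{f}^{-i}_{(h_K, h_H)}(X_i, y)\right)^2 W_2(y)\, dy - \frac{2}{n} \sum_{i=1}^{n} \widehat{f}^{-i}_{(h_K, h_H)}(X_i, Y_i)\, W_1(X_i)\, W_2(Y_i),$$

where the estimator computed without the $k$-th observation is

$$\widehat{f}^{-k}_{(h_K, h_H)}(X_k, y) = \frac{\sum_{i,j=1;\ i,j \ne k}^{n} W_{ij}(X_k)\, H\big(h_H^{-1}(y - Y_j)\big)}{h_H \sum_{i,j=1;\ i,j \ne k}^{n} W_{ij}(X_k)}.$$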
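A direct Python/NumPy sketch of this cross-validation criterion with $W_1 = W_2 \equiv 1$ is given below (quadratic cost per left-out observation, so only suited to moderate $n$). It assumes precomputed matrices `B[k, i]` $= \beta(X_i, X_k)$ and `D[k, i]` $= \delta(X_k, X_i)$, approximates the integral in $y$ on a finite grid, and uses a Gaussian $H$ and the hypothetical kernel $K(t) = (3 - t)/4$; the scalar toy data standing in for curves, and the bandwidth grid, are illustrative only.

```python
import numpy as np

def H(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)          # Gaussian H (assumption)

def K(t):
    return np.where(np.abs(t) <= 1.0, (3.0 - t) / 4.0, 0.0)    # hypothetical K on [-1, 1]

def trapezoid(f, x):
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

def cv_criterion(B, D, Y, hK, hH, ygrid):
    """(1/n) sum_k int (f^{-k}(X_k, y))^2 dy - (2/n) sum_k f^{-k}(X_k, Y_k), W1 = W2 = 1."""
    n = len(Y)
    sq_term, pt_term = 0.0, 0.0
    for k in range(n):
        keep = np.arange(n) != k                       # leave observation k out
        b, Kv, Yk = B[k, keep], K(D[k, keep] / hK), Y[keep]
        W = (b * Kv)[:, None] * (b[:, None] - b[None, :]) * Kv[None, :]
        c = W.sum(axis=0)                              # weight of each remaining Y_j
        s = c.sum()
        if s == 0.0:                                   # convention 0/0 = 0: estimate is 0
            continue
        f_grid = H((ygrid[:, None] - Yk[None, :]) / hH) @ c / (hH * s)
        f_obs = H((Y[k] - Yk) / hH) @ c / (hH * s)
        sq_term += trapezoid(f_grid**2, ygrid)
        pt_term += float(f_obs)
    return sq_term / n - 2.0 * pt_term / n

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 80)                          # scalar stand-ins for curves
Y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(80)
B = x[None, :] - x[:, None]                            # B[k, i] = x_i - x_k (toy beta)
D = np.abs(B)                                          # D[k, i] = |x_k - x_i| (toy delta)
ygrid = np.linspace(Y.min() - 1.0, Y.max() + 1.0, 400)
pairs = [(hk, hh) for hk in (0.1, 0.2, 0.4) for hh in (0.1, 0.2, 0.4)]
scores = [cv_criterion(B, D, Y, hk, hh, ygrid) for hk, hh in pairs]
print("selected (hK, hH):", pairs[int(np.argmin(scores))])
```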

This is in fact an adaptation of the study carried out in Chapter 2 for the kernel estimator of the conditional density. With this technique, we obtain satisfactory simulation results, but the asymptotic optimality of the method remains to be proved (it is therefore an open question). In what follows, we keep the same operators $\beta$ and $\delta$, and we assume that $W_1 = W_2 \equiv 1$. Finally, we use the 119 observations $(X_i, Y_i)$ to compute the estimator of $f(y \mid X_{120})$ when $y$ belongs to the interval $\left[0.9 \min_{i=1,\dots,119} Y_i,\ 1.1 \max_{i=1,\dots,119} Y_i\right]$. The results of our investigations are shown in Figure 6.4.

To highlight the crucial role of the smoothing parameters in this estimation, we perturb this choice by considering two arbitrary pairs of values: (1) a pair of values smaller than the optimal parameters, $(h_K, h_H) = (0.29, 1.40)$, and (2) a pair of values larger than those provided by our criterion, $(h_K, h_H) = (0.66, 3.40)$. We observe that the pair of parameters that is optimal with respect to our criterion gives clearly better estimation results: it leads to a mean squared error MSE = 0.002, whereas case (1) gives MSE = 0.01 and case (2) gives MSE = 0.006.

6.3 Application to real data

In this section, we use a real data set to achieve two goals. The first is to illustrate the practical implementation of our technique, and the second is to compare conditional mode estimation by the kernel method with that obtained by the local polynomial estimation method. To this end, we consider the spectrometric curves of 197 pieces of meat, and our objective is to predict the fat content $Y$ of a piece of meat given its spectrometric curve $X$. These curves $X$ are shown in Figure 6.5.

More precisely, in what follows, we compare prediction via the conditional mode using the two quantities $\widehat{\theta}_{FLM}$ (mode estimated by the functional locally linear method) and $\widehat{\theta}_{KM}$ (mode estimated by the kernel method). To do so, we split the 197 observations, available on the website of the STAPH working group of Université Paul Sabatier (Toulouse 3)¹, as follows: we consider the first 170 observations as the training sample and the last 17 as the test sample.

¹ http://www.math.univ-toulouse.fr/staph/npfda/

Figure 6.4: The conditional density $f^x$ (dotted line) and its estimator $\widehat{f}_{(h_K, h_H)}$ (solid line). Panels: smoothing parameters chosen by cross-validation, $(h_K, h_H) = (0.44, 2.40)$; smoothing parameters fixed at $(h_K, h_H) = (0.66, 3.40)$; smoothing parameters fixed at $(h_K, h_H) = (0.29, 1.40)$.

Figure 6.5: Spectrometric data (absorbances versus wavelengths, from 850 to 1050).

To compute $\widehat{\theta}_{FLM}$, we use the same procedure as in the first section for the choice of the parameters $(h_K, h_H)$ and of the operator $\beta$. Concerning the operator $\delta$, we take:

$$\delta(x, x') = d(x, x') = \sqrt{\int \left(x^{(q)}(t) - x'^{(q)}(t)\right)^2 dt}.$$
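The derivative-based semi-metric can be approximated on the discretization grid, for instance with finite differences for $x^{(q)}$ and the trapezoidal rule for the integral, as in the Python/NumPy sketch below. This is only one possible discretization (the program of Ferraty and Vieu (2006) may compute derivatives differently, for example after a spline approximation of the curves); the function name and the toy curves on a 100-point grid over 850 to 1050 are our assumptions. The example illustrates why $d$ is only a semi-metric for $q \ge 1$: two curves that differ by a constant are at distance zero.

```python
import numpy as np

def trapezoid(f, x):
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

def semimetric_deriv(x, x_prime, t, q):
    """d(x, x') = sqrt(int (x^(q)(t) - x'^(q)(t))^2 dt), derivatives by finite differences."""
    dx, dxp = np.asarray(x, float), np.asarray(x_prime, float)
    for _ in range(q):
        dx, dxp = np.gradient(dx, t), np.gradient(dxp, t)
    return np.sqrt(trapezoid((dx - dxp) ** 2, t))

t = np.linspace(850.0, 1050.0, 100)                  # toy wavelength grid
x1 = np.exp(-((t - 930.0) / 40.0) ** 2)
x2 = x1 + 3.0                                         # same shape, shifted level
x3 = np.exp(-((t - 950.0) / 40.0) ** 2)
print(semimetric_deriv(x1, x2, t, q=0), semimetric_deriv(x1, x2, t, q=2))
```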

Recall that this semi-metric with $q = 2$ can be considered the most suitable for this type of data. We also observe here that the operator $\beta$ associated with the same values (as previously), i.e., $q = 2$ and $q_1 = 69$, gives optimal prediction results with respect to the error

$$\mathrm{MSE(FLM)} = \frac{1}{17} \sum_{i=171}^{197} \left(Y_i - \widehat{\theta}_{FLM}(X_i)\right)^2.$$

Moreover, to compute $\widehat{\theta}_{KM}$, we use the program of Ferraty and Vieu (2006), which can be downloaded from the STAPH website http://www.math.univ-toulouse.fr/staph/npfda/, with the same semi-metric $d$. Using a quadratic kernel, we obtain the results shown in Figure 6.6.

From these results, we conclude that, on this test sample, the local polynomial estimation method clearly outperforms the kernel estimation method: we obtain MSE(FLM) = 3.84 against MSE(KM) = 5.42. Moreover, our results are also comparable to those of other prediction tools such as the regression and the conditional median, for which the prediction error is 3.5 for the regression and 3.44 for the conditional median (cf. Ferraty and Vieu, 2006).

Figure 6.6: Prediction results (responses versus predicted values) for the two estimation methods: kernel method, MSE = 5.42; local polynomial method, MSE = 3.84.

Chapter 7
Conclusion and perspectives

7.1 Conclusion

In this thesis, we have carried out a comprehensive study of the estimation of the conditional density when the explanatory variable is functional. Two estimation methods were considered: the first is the kernel method, and the second is the local polynomial estimation method.
For the first method, we studied a crucial question in nonparametric estimation by the kernel method: the choice of the smoothing parameter. On this subject, we proposed an automatic selection method for the two smoothing parameters. The asymptotic optimality of our method is obtained under standard conditions in functional statistics. In practice, this method is efficient, very easy to implement, and runs fast. Moreover, our selection criterion can be used for other nonparametric models related to the conditional density. Thus, our contribution gives a relevant answer to the question of Ferraty and Vieu (2006), and it also opens perspectives on many research questions (cf. Chapter 2).
In the second part, we considered another approach to estimating the conditional density when the data are functional. The proposed estimator is a generalization to the functional case of the local linear estimator introduced by Fan and Gijbels (1996). As asymptotic results, we established the almost complete convergence rate (pointwise and uniform) and gave the asymptotically exact expression of the quadratic error of this estimator. The expressions of the convergence rates obtained have the same form as for the kernel estimator, and both functional structures are well exploited. More precisely, the dimensionality of the model appears in the bias part, while the dimensionality of the functional space of the explanatory variable is made explicit in the dispersion part. As in the previous case, these asymptotic results are obtained under very classical conditions in nonparametric functional statistics, and the estimators are easy to use in practice. Moreover, the importance of this second part is also reflected in the large number of research perspectives it offers. In the next section, we list some examples of these perspectives.

7.2 Perspectives

To conclude this thesis, we present below some possible future developments aimed at improving and extending our results.

On the choice of smoothing parameters for local linear estimation


Choosing the smoothing parameters is a natural question for the practical use of this estimator. In the simulation part, we used the following cross-validation criterion:

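$$\mathrm{CV}(h_K,h_H)=\frac{1}{n}\sum_{i=1}^{n}W_1(X_i)\int\Big(\widehat{f}^{\,-i}_{(h_K,h_H)}(X_i,y)\Big)^2W_2(y)\,dy-\frac{2}{n}\sum_{i=1}^{n}\widehat{f}^{\,-i}_{(h_K,h_H)}(X_i,Y_i)\,W_1(X_i)\,W_2(Y_i),$$

where

$$\widehat{f}^{\,-k}_{(h_K,h_H)}(X_k,y)=\frac{\sum_{i,j=1,\;i,j\neq k}^{n}W_{ij}(X_k)\,H\big(h_H^{-1}(y-Y_i)\big)}{h_H\sum_{i,j=1,\;i,j\neq k}^{n}W_{ij}(X_k)}.$$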
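The criterion above can be illustrated with a small numerical sketch. It assumes that the curves are discretized on a common grid, and it makes several hypothetical choices that this section does not specify:

- the metric δ is the discretized L2 distance;
- the locating function β(X_i, x) is the grid mean of X_i − x;
- K is a quadratic kernel supported on [0, 1];
- H is the Gaussian density;
- W_1 ≡ 1, and W_2 is the indicator of the integration grid's range.

The definition of W_ij is not repeated in this section, so the sketch uses an assumed convention: W_ij(x) = β(X_j,x)(β(X_j,x) − β(X_i,x))K(h_K^{-1}δ(x,X_i))K(h_K^{-1}δ(x,X_j)). Under this convention, Σ_j W_ij(x) is the usual local linear weight attached to Y_i. All function names are hypothetical.

```python
import numpy as np


def quad_kernel(u):
    """Quadratic kernel supported on [0, 1] (hypothetical choice for K)."""
    return np.where((u >= 0.0) & (u <= 1.0), 1.5 * (1.0 - u ** 2), 0.0)


def gauss_kernel(u):
    """Standard Gaussian density (hypothetical choice for H)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def trapezoid(values, dy):
    """Trapezoidal rule on a uniform grid with step dy."""
    return dy * (values.sum() - 0.5 * (values[0] + values[-1]))


def loo_density(X, Y, k, y, hK, hH):
    """Leave-one-out local linear estimate of f^{-k}(X_k, y) at the points y.

    X: (n, p) array of curves on a common grid; Y: (n,) responses.
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    x = X[k]
    delta = np.sqrt(np.mean((X - x) ** 2, axis=1))  # assumed metric (L2)
    beta = np.mean(X - x, axis=1)                   # assumed locating function
    Kw = quad_kernel(delta / hK)
    Kw[k] = 0.0  # removes every pair (i, j) with i == k or j == k
    # W[i, j] = beta_j (beta_j - beta_i) K_i K_j, so W[i, :].sum() is the local
    # linear weight attached to Y_i (assumed convention for W_ij).
    W = np.outer(Kw, Kw) * beta[None, :] * (beta[None, :] - beta[:, None])
    w = W.sum(axis=1)
    denom = hH * w.sum()
    if abs(denom) < 1e-12:  # too few curves in the window: no estimate
        return np.zeros_like(y)
    H = gauss_kernel((y[:, None] - Y[None, :]) / hH)
    return H @ w / denom


def cv_criterion(X, Y, hK, hH, y_grid):
    """CV(h_K, h_H) with W_1 = 1 and W_2 = indicator of [y_grid[0], y_grid[-1]]."""
    dy = y_grid[1] - y_grid[0]  # y_grid is assumed uniform
    total = 0.0
    for i in range(len(Y)):
        # One call gives f^{-i}(X_i, .) on the grid and at Y_i.
        vals = loo_density(X, Y, i, np.append(y_grid, Y[i]), hK, hH)
        f_grid, f_obs = vals[:-1], vals[-1]
        w2 = 1.0 if y_grid[0] <= Y[i] <= y_grid[-1] else 0.0
        total += trapezoid(f_grid ** 2, dy) - 2.0 * f_obs * w2
    return total / len(Y)


def select_bandwidths(X, Y, hK_grid, hH_grid, y_grid):
    """Return the (h_K, h_H) pair minimising CV over the given grids."""
    scores = {(hK, hH): cv_criterion(X, Y, hK, hH, y_grid)
              for hK in hK_grid for hH in hH_grid}
    return min(scores, key=scores.get)


# Illustrative simulated data (not the thesis data set).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
a = rng.uniform(0.0, 1.0, 60)
X = a[:, None] + np.sin(2.0 * np.pi * t)[None, :]
Y = 2.0 * a + 0.1 * rng.standard_normal(60)
y_grid = np.linspace(Y.min() - 2.0, Y.max() + 2.0, 801)
best = select_bandwidths(X, Y, [0.2, 0.3, 0.5], [0.1, 0.2, 0.4], y_grid)
print("selected (h_K, h_H):", best)
```

Setting K to zero for curve k removes every pair (i, j) with i = k or j = k, which is exactly the leave-one-out double sum. The bandwidth pair is then chosen by minimizing CV over a finite grid. Each leave-one-out estimate integrates to one in y, because the same weights appear in the numerator and the denominator.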

The asymptotic optimality of this method has not yet been studied; it constitutes a short-term research perspective.

On the asymptotic normality of the local linear estimator


Asymptotic normality is important in practice, since it allows one to construct confidence intervals or to carry out statistical tests. In the near future, we will try to extend the results of Ezzahrioui and Ould-Saïd (2008), obtained for the kernel method in the functional case, to our local linear estimator.

The local linear estimator in the spatial setting


Generalizing our results to the spatial setting is a natural next step. We believe that this generalization can be achieved by adapting the results of Laksaci (2010) and of Rachdi, Niang and Yao (2011).
Other perspectives could be addressed in the longer term, such as the case where the variables are contaminated by observation errors, or the case where the error process has long memory.
Another problem also deserves attention: extending our results to local polynomial estimation of order strictly greater than 1.

Chapter 8
General Bibliography

Acerbi, C. (2002). Spectral measures of risk: a coherent representation of subjective risk aversion. J. Bank. Financ., 26, 1505-1518.
Aneiros-Pérez, G. and Vieu, P. (2006). Semi-functional partial linear regression. Statistics & Probability Letters, 76 (11), 1102-1110.
Antoniadis, A. and Sapatinas, T. (2007). Estimation and inference in functional mixed-effects models. Comput. Statist. & Data Anal., 51 (10), 4793-4813.
Araki, Y., Konishi, S. and Imoto, S. (2004). Functional discriminant analysis for microarray gene expression data via radial basis function networks. COMPSTAT 2004, Proceedings in Computational Statistics, Physica, Heidelberg, 613-620.
Barrientos-Marin, J. (2007). Some Practical Problems of Recent Nonparametric Procedures: Testing, Estimation, and Application. PhD thesis, University of Alicante (Spain).
Barrientos-Marin, J., Ferraty, F. and Vieu, P. (2010). Locally modelled regression and functional data. J. of Nonparametric Statistics, 22 (5), 617-632.
Ballo, A. and Grané, A. (2009). Local linear regression for functional predictor and scalar response. Journal of Multivariate Analysis, 100, 102-111.
Bashtannyk, D.M. and Hyndman, R.J. (2001). Bandwidth selection for kernel conditional density estimation. Comput. Statist. Data Anal., 36, 279-298.
Benhenni, K., Ferraty, F. and Rachdi, M. (2007). Local smoothing regression with functional data. Computational Statistics, 22 (3), 353-369.
Benhenni, K., Hedli-Griche, S. and Rachdi, M. (2010). Estimation of the regression operator from functional fixed-design with correlated errors. Journal of Multivariate Analysis, 101 (2), 476-490.
Benhenni, K., Hedli-Griche, S., Rachdi, M. and Vieu, P. (2008). Consistency of the regression estimator with functional data under long memory conditions. Statistics & Probability Letters, 78 (8), 1043-1049.

Benko, M. (2006). Functional Data Analysis with Applications in Finance. Thesis, University of Berlin.
Benko, M., Härdle, W. and Kneip, A. (2006). Common functional principal components. SFB 649 Discussion Papers SFB649DP2006-010, Humboldt University, Berlin, Germany.
Bogachev, V.I. (1999). Gaussian Measures. Mathematical Surveys and Monographs, 62, Amer. Math. Soc.
Bosq, D. (1991). Modelization, nonparametric estimation and prediction for continuous time processes. In: Nonparametric Functional Estimation and Related Topics (Spetses, 1990), 509-529, NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci. 19, Kluwer Acad. Publ., Dordrecht.
Bosq, D. (2000). Linear Processes in Function Spaces: Theory and Applications. Lecture Notes in Statistics, 149, Springer-Verlag, New York.
Bosq, D. and Delecroix, M. (1985). Nonparametric prediction of a Hilbert space valued random variable. Stochastic Processes Appl., 19, 271-280.
Bosq, D. and Lecoutre, J.P. (1987). Théorie de l'estimation fonctionnelle. Economica, Paris.
Berlinet, A., Gannoun, A. and Matzner-Lober, E. (1998). Asymptotic normality of convergent estimators of the conditional mode. Canadian J. Statistics, 26 (2), 365-380.
Berlinet, A., Gannoun, A. and Matzner-Lober, E. (2001). Asymptotic normality of convergent estimates of the conditional quantiles. Statistics, 35, 139-169.
Berlinet, A., Biau, G. and Rouvière, L. (2005). Functional supervised classification with wavelets. Annales de l'ISUP, 2, 61-80.
Besse, P., Cardot, H. and Ferraty, F. (1997). Simultaneous nonparametric regressions of unbalanced longitudinal data. Comput. Statist. Data Anal., 24 (3), 255-270.
Besse, P. and Ramsay, J.O. (1986). Principal components analysis of sampled functions. Psychometrika, 51 (2), 285-311.
Cadre, B. (2001). Convergent estimators for the L1-median of Banach valued random variable. Statistics, 35, 509-521.
Cai, T.T. and Hall, P. (2006). Prediction in functional linear regression. Annals of Statistics, 34, 2159-2179.
Cardot, H., Ferraty, F. and Sarda, P. (1999). Functional linear model. Statist. & Probability Letters, 45 (1), 11-22.
Cardot, H., Ferraty, F. and Sarda, P. (2003). Spline estimators for the functional linear model. Statistica Sinica, 13 (3), 571-591.
Cardot, H., Crambes, C. and Sarda, P. (2004). Spline estimation of conditional quantities for functional covariates. C. R. Acad. Sci., Paris, 339 (2), 141-144.

Cardot, H., Crambes, C. and Sarda, P. (2004a). Conditional quantiles with functional covariates: an application to ozone pollution forecasting. Contributed paper in Compstat Prague 2004 Proceedings, 769-776.
Cardot, H. and Sarda, P. (2005). Quantile regression when the covariates are functions. J. Nonparametr. Stat., 17 (7), 841-856.
Cardot, H. and Sarda, P. (2005a). Estimation in generalized linear models for functional data via penalized likelihood. J. Multivariate Anal., 92 (1), 24-41.
Cardot, H., Crambes, C. and Sarda, P. (2006). Ozone pollution forecasting using conditional mean and conditional quantiles with functional covariates. In: Statistical Methods for Biostatistics and Related Fields, Härdle, Mori and Vieu (Eds.), Springer.
Cardot, H. and Sarda, P. (2006). Linear regression models for functional data. In: The Art of Semiparametrics, 49-66, Contrib. Statist., Physica-Verlag/Springer, Heidelberg.
Cardot, H., Crambes, C. and Sarda, P. (2006a). Ozone pollution forecasting using conditional mean and conditional quantiles with functional covariates. In: Statistical Methods for Biostatistics and Related Fields, Härdle, Mori and Vieu (Eds.), Springer.
Cardot, H., Mas, A. and Sarda, P. (2007). CLT in functional linear regression models. Probab. Theory Related Fields, 138 (3-4), 325-361.
Castro, P.E., Lawton, W.H. and Sylvestre, E.A. (1986). Principal modes of variation for processes with continuous sample curves. Technometrics, 28, 329-337.
Chu, C.-K. and Marron, J.S. (1991). Choosing a kernel regression estimator (with comments and a rejoinder by the authors). Statist. Sci., 6, 404-436.
Crambes, C., Kneip, A. and Sarda, P. (2007). Smoothing splines estimators for functional linear regression. Ann. Statist., 37 (1), 35-72.
Collomb, G. (1981). Estimation non-paramétrique de la régression : revue bibliographique. Inter. Statist. Review, 49, 75-93.
Collomb, G. (1985). Nonparametric regression: an up to date bibliography. Statistics, 2, 297-307.
Collomb, G., Härdle, W. and Hassani, S. (1987). A note on prediction via conditional mode estimation. J. Statist. Plann. and Inf., 15, 227-236.
Cuevas, A., Febrero, M. and Fraiman, R. (2002). Linear functional regression: the case of fixed design and functional response. Canad. J. of Statistics, 30, 285-300.
Cuevas, A. and Fraiman, R. (2004). On the bootstrap methodology for functional data. COMPSTAT 2004, Proceedings in Computational Statistics, 127-135, Physica, Heidelberg.
Dabo-Niang, S. (2002). Estimation de la densité dans un espace de dimension infinie : application aux diffusions. C. R. Acad. Sci., Paris, 334 (3), 213-216.

Dabo-Niang, S. and Rhomari, N. (2003). Estimation non paramétrique de la régression avec variable explicative dans un espace métrique [Kernel regression estimation when the regressor takes values in metric space]. C. R. Math. Acad. Sci. Paris, 336 (1), 75-80.
Dabo-Niang, S., Ferraty, F. and Vieu, P. (2004a). Estimation du mode dans un espace vectoriel semi-normé [Mode estimation in a semi-normed vector space]. C. R. Math. Acad. Sci. Paris, 339 (9), 659-662.
Dabo-Niang, S. (2004b). Kernel density estimator in an infinite-dimensional space with a rate of convergence in the case of diffusion process. Appl. Math. Lett., 17 (4), 381-386.
Dabo-Niang, S., Ferraty, F. and Vieu, P. (2006). Mode estimation for functional random variable and its application for curves classification. Far East J. Theor. Stat., 18 (1), 93-119.
Dabo-Niang, S. and Laksaci, A. (2007). Estimation non paramétrique de mode conditionnel pour variable explicative fonctionnelle [Nonparametric estimation of the conditional mode when the regressor is functional]. C. R. Math. Acad. Sci. Paris, 344 (1), 49-52.
Damon, J. and Guillas, S. (2002). The inclusion of exogenous variables in functional autoregressive ozone forecasting. Environmetrics, 13, 759-774.
Dauxois, J., Pousse, A. and Romain, Y. (1982). Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference. J. Multivariate Anal., 12 (1), 136-154.
De Boor, C. (1978). A Practical Guide to Splines. Springer, New York.
De Gooijer, J. and Gannoun, A. (2000). Nonparametric conditional predictive regions for time series. Comput. Statist. Data Anal., 33, 259-257.
Delsol, L. (2007). Régression sur variable fonctionnelle : estimation, tests de structure et applications. PhD thesis, Université de Toulouse.
Demongeot, J., Laksaci, A., Madani, F. and Rachdi, M. (2010). Local linear estimation of the conditional density for functional data. C. R. Math. Acad. Sci. Paris, 348, 931-934.
Demongeot, J., Laksaci, A., Madani, F. and Rachdi, M. (2012, to appear). Functional data: local linear estimation of the conditional density and its application. Statistics, DOI: 10.1080/02331888.2011.568117.
Demongeot, J., Laksaci, A., Madani, F. and Rachdi, M. (2011). A fast functional locally modeled conditional density and mode for functional time-series. In: Recent Advances in Functional Data Analysis and Related Topics, Contributions to Statistics, Physica-Verlag/Springer, 85-90, DOI: 10.1007/978-3-7908-2736-1_13.
Dereich, S. (2003). High resolution coding of stochastic processes and small ball probabilities. PhD thesis.
Deville, J.C. (1974). Méthodes statistiques et numériques de l'analyse harmonique. Ann. Insee, 15, 3-101.

Eddy, W.F. (1982). The asymptotic distribution of kernel estimators of the mode. Z. Wahrsch. Verw. Gebiete, 59, 279-290.
El Ghouch, A. and Genton, M. (2009). Local polynomial quantile regression with parametric features. J. Amer. Statist. Assoc., 104 (488), 1416-1429.
Ezzahrioui, M. (2007). Prévision dans les modèles conditionnels en dimension infinie. PhD thesis, Université du Littoral Côte d'Opale.
Ezzahrioui, M. and Ould-Saïd, E. (2008). Asymptotic normality of a nonparametric estimator of the conditional mode function for functional data. J. Nonparametr. Stat., 20, 3-18.
Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc., 87, 998-1004.
Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and its Applications. Chapman & Hall, London.
Fan, J., Yao, Q. and Tong, H. (1996). Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83, 189-206.
Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, New York.
Fan, J. and Yim, T.-H. (2004). A cross-validation method for estimating conditional densities. Biometrika, 91, 819-834.
Ferraty, F., Goia, A. and Vieu, P. (2002). Régression non-paramétrique pour des variables aléatoires fonctionnelles mélangeantes [Nonparametric regression for mixing functional random variables]. C. R. Math. Acad. Sci. Paris, 334 (3), 217-220.
Ferraty, F., Goia, A. and Vieu, P. (2002b). Functional nonparametric model for time series: a fractal approach for dimension reduction. Test, 11 (2), 317-344.
Ferraty, F., Mas, A. and Vieu, P. (2007). Advances in nonparametric regression for functional data. Aust. and New Zeal. J. of Statist., 49, 1-20.
Ferraty, F., Rabhi, A. and Vieu, P. (2008). Estimation non-paramétrique de la fonction de hasard avec variable explicative fonctionnelle. Rom. J. Pure & Applied Math., 53 (1), 1-18.
Ferraty, F., Tadj, A., Laksaci, A. and Vieu, P. (2010). Rate of uniform consistency for nonparametric estimates with functional variables. J. of Statist. Plan. and Inf., 140, 335-352.
Ferraty, F. and Vieu, P. (2000). Dimension fractale et estimation de la régression dans des espaces vectoriels semi-normés. C. R. Math. Acad. Sci. Paris, 330, 403-406.
Ferraty, F. and Vieu, P. (2002). The functional nonparametric model and application to spectrometric data. Comput. Statist., 17 (4), 545-564.

Ferraty, F. and Vieu, P. (2003). Curves discrimination: a nonparametric functional approach. Comput. Statist. and Data Anal., 44, 161-173.
Ferraty, F. and Vieu, P. (2004). Nonparametric models for functional data, with application in regression, time series prediction and curves discrimination. J. Nonparametric Statist., 16, 111-127.
Ferraty, F., Laksaci, A. and Vieu, P. (2005). Functional time series prediction via conditional mode estimation. C. R. Math. Acad. Sci. Paris, 340 (5), 389-392.
Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer Verlag.
Ferraty, F., Laksaci, A. and Vieu, P. (2006). Estimating some characteristics of the conditional distribution in nonparametric functional models. Stat. Inference Stoch. Process., 9 (1), 47-76.
Ferraty, F. and Vieu, P. (2006a). Nonparametric Modelling for Functional Data. Springer-Verlag, New York.
Ferraty, F., Van Keilegom, I. and Vieu, P. (2008). On the validity of the bootstrap in nonparametric functional regression. Scandinavian J. of Statist., 37 (2), 286-306.
Gannoun, A. (1990). Estimation non paramétrique de la médiane conditionnelle. Pub. Inst. Stat. Univ. Paris, 35 (1), 11-22.
Gannoun, A., Saracco, J. and Yu, K. (2003). Nonparametric prediction by conditional median and quantiles. J. of Statist. Plan. and Inf., 117 (2), 207-223.
Gao, F. and Li, W.V. (2007). Small ball probabilities for the Slepian Gaussian fields. Trans. Amer. Math. Soc., 359 (3), 1339-1350.
Gasser, T., Hall, P. and Presnell, B. (1998). Nonparametric estimation of the mode of a distribution of random curves. J. R. Stat. Soc. Ser. B Stat. Methodol., 60 (4), 681-691.
Geffroy, J. (1974). Sur l'estimation d'une densité dans un espace métrique. C. R. Acad. Sci. Paris, 278, 1449-1452.
Grenander, U. (1981). Abstract Inference. Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, New York, ix+526 pp.
Györfi, L., Härdle, W., Sarda, P. and Vieu, P. (1989). Nonparametric Curve Estimation for Time Series. Lecture Notes in Statistics, 60, Springer-Verlag.
Hall, P. and Hosseini-Nasab, M. (2006). On properties of functional principal components analysis. J. R. Stat. Soc. B, 68 (1), 109-126.
Hall, P., Poskitt, P. and Presnell, D. (2001). A functional data-analytic approach to signal discrimination. Technometrics, 43, 1-9.

Hall, P., Wolff, R.C. and Yao, Q. (1999). Methods for estimating a conditional distribution function. J. Amer. Statist. Assoc., 94, 154-163.
Härdle, W. (1990). Applied Nonparametric Regression. Cambridge Univ. Press, Cambridge, UK.
Härdle, W. (1991). Smoothing Techniques with Implementation in S. Springer, New York.
Härdle, W., Janssen, P. and Serfling, R. (1991). Strong consistency rates for estimators of conditional functionals. Ann. Statist., 16 (4), 1428-1449.
Härdle, W., Lütkepohl, H. and Chen, R. (1997). A review of nonparametric time series analysis. Inter. Statist. Rev., 65, 73-85.
Härdle, W. and Marron, J.S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist., 13 (4), 1465-1481.
Hassani, S., Sarda, P. and Vieu, P. (1995). Approche non paramétrique en théorie de fiabilité : revue bibliographique. Rev. Statist. Appli., 35, 27-41.
Hastie, T., Buja, A. and Tibshirani, R. (1995). Penalized discriminant analysis. Ann. Statist., 13, 435-475.
Hedli-Griche, S. (2008). Estimation de l'opérateur de régression pour des données fonctionnelles et des erreurs corrélées. PhD thesis.
Holmström, I. (1961). On a method for parametric representation of the state of the atmosphere. Tellus, 15, 127-149.
Hyndman, R.J. (1995). Highest-density forecast regions for non-linear and non-normal time series models. J. Forecast., 14, 431-441.
Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996). Estimating and visualizing conditional densities. J. Comput. Graph. Statist., 5, 315-336.
Hyndman, R.J. and Yao, Q. (1998). Nonparametric estimation and symmetry tests for conditional density functions. Working paper 17/98, Department of Econometrics and Business Statistics, Monash University.
Hyndman, R. and Yao, Q. (2002). Nonparametric estimation and symmetry tests for conditional density functions. J. Nonparametr. Stat., 14, 259-278.
Jank, W. and Shmueli, G. (2006). Functional data analysis in electronic commerce research. Statistical Science, 21 (2), 155-166.
Kawasaki, Y. and Ando, T. (2004). Functional data analysis of the dynamics of yield curves. COMPSTAT 2004, Proceedings in Computational Statistics, 1309-1316, Physica, Heidelberg.
Kirkpatrick, M. and Heckman, N. (1989). A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. J. Math. Biol., 27 (4), 429-450.

Kneip, A. and Gasser, T. (1992). Statistical tools to analyze data representing a sample of curves. Ann. Statist., 20 (3), 1266-1305.
Kneip, A. and Utikal, K.J. (2001). Inference for density families using functional principal component analysis (with comments and a rejoinder by the authors). J. Amer. Statist. Assoc., 96 (454), 519-542.
Kolmogorov, A.N. and Tikhomirov, V.M. (1959). ε-entropy and ε-capacity. Uspekhi Mat. Nauk, 14, 3-86. (Engl. transl.: Amer. Math. Soc. Transl. Ser. 2, 277-364, 1964.)
Konakov, V.D. (1974). On the asymptotic of the mode of multidimensional distributions. Theory Probab. Appl., 19, 794-799.
Laksaci, A. (2005). Contribution aux modèles non paramétriques conditionnels pour variables explicatives fonctionnelles. PhD thesis, Université de Toulouse.
Laksaci, A. (2007). Convergence en moyenne quadratique de l'estimateur à noyau de la densité conditionnelle avec variable explicative fonctionnelle. Pub. Inst. Stat. Univ. Paris, 3, 69-80.
Laksaci, A. and Yousfate, A. (2002). Estimation fonctionnelle de la densité de l'opérateur de transition d'un processus de Markov à temps discret. C. R. Acad. Sci., Paris, 334 (11), 1035-1038.
Laksaci, A., Madani, F. and Rachdi, M. (2012). Kernel conditional density estimation when the regressor is valued in a semi-metric space. Communications in Statistics - Theory and Methods (to appear).
Lanczos, C. (1956). Applied Analysis. Prentice Hall.
Lecoutre, J.P. and Ould-Saïd, E. (1995). Hazard rate estimation for mixing and censored processes. J. Nonparametric Statist., 5, 83-89.
Laukaitis, A. and Rackauskas, A. (2002). Functional data analysis of payment systems. Nonlinear Analysis: Modelling and Control, 7 (2), 53-68.
Li, W.V. and Shao, Q.M. (2001). Gaussian processes: inequalities, small ball probabilities and applications. In: C.R. Rao and D. Shanbhag (eds.), Stochastic Processes: Theory and Methods, Handbook of Statistics, 19, North-Holland, Amsterdam.
Lifshits, M.A., Linde, W. and Shi, Z. (2006). Small deviations of Riemann-Liouville processes in Lq-spaces with respect to fractal measures. Proc. London Math. Soc. (3), 92 (1), 224-250.
Louani, D. (1998). On the asymptotic normality of the function and its derivatives under censoring. Comm. Statist. Theory and Methods, 27, 2909-2924.
Louani, D. and Ould-Saïd, E. (1999). Asymptotic normality of kernel estimators of the conditional mode under strong mixing hypothesis. J. Nonparametric Statist., 11 (4), 413-442.

Meiring, W. (2005). Mid-latitude stratospheric ozone variability: a functional data analysis study of evidence of the quasi-biennial oscillation, time trends and solar cycle in ozonesonde observations. Technical report 403, Statistics and Applied Probability, University of California, Santa Barbara.
Molenaar, P. and Boomsma, D. (1987). The genetic analysis of repeated measures: the Karhunen-Loève expansion. Behavior Genetics, 17, 229-242.
Müller, H.-G. and Stadtmüller, U. (2005). Generalized functional linear models. Ann. Stat., 33 (2), 774-805.
Müller, H.G., Sen, R. and Stadtmüller, U. (2007). Functional data analysis for volatility process. Submitted.
Nadaraya, E.A. (1965). On estimation of density functions and regression curves. Theory Prob. Appl., 10, 186-190.
Ouassou, I. and Rachdi, M. (2009). Stein type estimation of the regression operator for functional data. Advances and Applications in Statistical Sciences, 1 (2), 233-250.
Ould-Saïd, E. (1993). Estimation non paramétrique du mode conditionnel. Application à la prévision. C. R. Acad. Sci., Paris, I36, 943-947.
Ould-Saïd, E. (1997). A note on ergodic processes prediction via estimation of the conditional mode function. Scand. J. Statist., 24, 231-239.
Ould-Saïd, E. and Cai, Z. (2005). Strong uniform consistency of nonparametric estimation of the censored conditional mode function. J. Nonparametric Statist., 17, 797-806.
Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Stat., 33, 1065-1076.
Preda, C. (2007). Regression models for functional data by reproducing kernel Hilbert spaces methods. J. Statist. Plann. Inference, 137 (3), 829-840.
Quintela-del-Río, A. and Vieu, P. (1997). A nonparametric conditional mode estimate. J. Nonparametric Statist., 8, 253-266.
Rachdi, M. and El Methni, M. (2011). Local weighted average estimation of the regression operator for functional data. Comm. Statist. Theory and Methods (in press).
Rachdi, M. and Sabre, R. (2000). Consistent estimates of the mode of the probability density function in nonparametric deconvolution problems. Statistics & Probability Letters, 47, 105-114.
Rachdi, M. and Vieu, P. (2005). Sélection automatique du paramètre de lissage pour l'estimation non-paramétrique de la régression pour des données fonctionnelles. C. R. Math. Acad. Sci. Paris, 341 (6), 365-368.
Rachdi, M. and Vieu, P. (2007). Nonparametric regression for functional data: automatic smoothing parameter selection. J. Statist. Plann. and Inf., 137, 2784-2801.

Ramsay, J.O. (1982). When the data are functions. Psychometrika, 47 (4), 379-396.
Ramsay, J.O. (2000a). Differential equation models for statistical functions. Canad. J. Statist., 28 (2), 225-240.
Ramsay, J.O. (2000b). Functional components of variation in handwriting. Journal of the American Statistical Association, 95, 9-15.
Ramsay, J.O., Bock, R. and Gasser, T. (1995). Comparison of height acceleration curves in the Fels, Zurich, and Berkeley growth data. Annals of Human Biology, 22, 413-426.
Ramsay, J. and Li, X. (1996). Curve registration. J. R. Statist. Soc. B, 60, 351-363.
Ramsay, J.O., Munhall, K.G., Gracco, V.L. and Ostry, D.J. (1996). Functional data analysis of lip motion. J. Acoust. Soc. Am., 99, 3178-3727.
Ramsay, J. and Silverman, B. (1997). Functional Data Analysis. Springer-Verlag, New York.
Ramsay, J. and Silverman, B. (2002). Applied Functional Data Analysis: Methods and Case Studies. Springer-Verlag, New York.
Ramsay, J. and Silverman, B. (2005). Functional Data Analysis (Second Edition). Springer-Verlag, New York.
Rao, C.R. (1958). Some statistical methods for comparison of growth curves. Biometrics, 14, 1-17.
Rice, J. and Silverman, B. (1991). Estimating the mean and the covariance structure nonparametrically when the data are curves. J. R. Statist. Soc. B, 53, 233-243.
Rio, E. (1990). Exponential inequalities ... Springer-Verlag, New York.
Rio, E. (2000). Théorie asymptotique des processus aléatoires faiblement dépendants. Springer, ESAIM, Collection Mathématiques et Applications.
Robinson, P. (1983). Nonparametric estimators for time series. J. of Time Ser. Anal., 4, 185-207.
Romano, J.P. (1988). On weak convergence and optimality of kernel density estimates of the mode. Ann. Statist., 16, 916, 629-647.
Rosenblatt, M. (1969). Conditional probability density and regression estimators. In: Multivariate Analysis II, Ed. P.R. Krishnaiah, Academic Press, New York and London.
Rossi, F., Delannay, N., Conan-Guez, B. and Verleysen, M. (2005c). Representation of functional data in neural networks. Neurocomputing, 64, 183-210.
Roussas, G. (1968). On some properties of nonparametric estimates of probability density functions. Bull. Soc. Math. Grèce (N.S.), 9 (1), 29-43.
Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scand. J. Statist., 9, 65-78.

Samanta, M. (1973). Nonparametric estimation of the mode of a multivariate density. South African Statist. J., 7, 109-117.
Samanta, M. (1989). Non-parametric estimation of conditional quantiles. Statistics & Probability Letters, 7, 407-412.
Samanta, M. and Thavaneswaran, A. (1990). Non-parametric estimation of conditional mode. Comm. Statist. Theory and Meth., 16, 4515-4524.
Samworth, R.J. and Wand, M.P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. Ann. Statist., 38, 1767-1792.
Sarda, P. and Vieu, P. (2000). Kernel regression. Wiley, New York, 43-70.
Schimek, M. (2000). Smoothing and Regression: Approaches, Computation and Application. Ed. M.G. Schimek, Wiley Series in Probability and Statistics.
Schumaker, M. (1981). Spline Functions: Basic Theory. Wiley.
Stone, C.J. (1984). An asymptotically optimal window selection rule for kernel density estimates. Ann. Statist., 12, 1285-1297.
Stone, C.J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. Ann. Statist., 22 (1), 118-184.
Stute, W. (1985). Conditional empirical processes. Ann. Statist., 14, 638-647.
Theodoros, N. and Yannis, G.Y. (1997). Rates of convergence of estimate, Kolmogorov entropy and the dimensionality reduction principle in regression. Ann. Statist., 25 (6), 2493-2511.
Tucker, L.R. (1958). Determination of parameters of functional equations by factor analysis. Psychometrika, 23, 19-23.
Valderrama, M.J., Ocaña, F.A. and Aguilera, A.M. (2002). Forecasting PC-ARIMA models for functional data. COMPSTAT (Berlin), 25-36.
Van Ryzin, J. (1969). On strong consistency of density estimation. The Annals of Mathematical Statistics, 40 (5), 1765-1772.
Vidakovic, B. (2001). Wavelet-based functional data analysis: theory, applications and ramifications. Proceedings PSFVIP-3 F3399.
Vieu, P. (1991). Quadratic errors for nonparametric estimates under dependence. J. Multivariate Anal., 39, 324-347.
Vieu, P. (1996). A note on density mode function. Statistics & Probability Letters, 26 (4), 297-307.
Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing. Chapman & Hall, London.

Wertz, W. (1981). Nonparametric estimators in abstract and homogeneous spaces. Lecture Notes in Mathematics, University of Technology, Wien.
Yao, F. and Lee, T.C.M. (2006). Penalised spline models for functional principal component analysis. J. R. Stat. Soc. B, 68 (1), 3-25.
Yoshihara, K.I. (1994). Weakly Dependent Stochastic Sequences and their Applications, vol. IV: Curve Estimation Based on Weakly Dependent Data. Sanseido, Tokyo.
Youndjé, E. (1993). Estimation non paramétrique de la densité conditionnelle par la méthode du noyau. PhD thesis, University of Rouen (in French).
Youndjé, E. (1996). Propriétés de convergence de l'estimateur à noyau de la densité conditionnelle. Rev. Roumaine Math. Pures Appl., 41, 535-566.
Youndjé, E., Sarda, P. and Vieu, P. (1993). Kernel estimator of conditional density bandwidth selection for dependent data. C. R. Acad. Sci. Math. Paris, 316 (9), 935-938.
