Académique Documents
Professionnel Documents
Culture Documents
Statistical Techniques
--Nonparametric Density Estimation
Xi Chen
Nov 6
Multivariate Analysis
Classical Analysis:
Poor results for huge and complex data sets
The questions become more different
Computational cost of storing and processing data goes down
Modern Data
Exploratory data analysis (EDA) 1977 Data mining
From simple dirty techs to big data.
Internet traffic data are described as ferocious
Human Genome Project has to deal with gigabytes (230 ( 109)
bytes) of genetic information
earth sciences have terabytes (240 ( 1012) bytes) and soon,
petabytes (250 ( 1015) bytes), of data for processing
Etc.
The Histogram
Rule-of-Thumb Method
We take p to be and K to be a standard Gaussian
kernel.
The optimal (ROT) window width for the above density
would be
Cross-Validation
In
the univariate case, the basic algorithm removes a single value, say xi, from
the sample, computes the appropriate density estimate at that xi from the
remaining n1 sample values,
and then chooses h to optimize some given criterion involving all values of i =
1,2,...,n.
The unbiased cross-validation choice of window width is the h that minimizes
The End