www.math.yale.edu/~mmm82
Outline of the talk
Motivations
Analyze large amounts of data, and functions on these data, that are intrinsically low-dimensional but embedded in high dimensions.
Paradigm: we have a large number of documents (e.g. web pages, gene array data, (hyper)spectral data, molecular dynamics data, etc.) and a way of measuring the similarity between pairs of them. Model: a weighted graph (G, E, W).
In important practical cases the vertices are points in a high-dimensional Euclidean space, and the weights may be a function of the Euclidean distance.
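A minimal sketch of this model, assuming the vertices are the rows of a NumPy array and the weight is a Gaussian of the Euclidean distance. The helper names, the scale `a`, the synthetic data and the row normalization T = D^{-1} W (a random walk on the graph) are illustrative assumptions, not necessarily the construction used in the talk.

```python
import numpy as np

def weight_matrix(X: np.ndarray, phi) -> np.ndarray:
    """Hypothetical helper: W[i, j] = phi(||x_i - x_j||) for the rows x_i of X."""
    # Pairwise Euclidean distances via broadcasting, shape (n, n)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return phi(dist)

def random_walk(W: np.ndarray) -> np.ndarray:
    """Assumed normalization T = D^{-1} W, with D the diagonal matrix of degrees."""
    return W / W.sum(axis=1, keepdims=True)

# Synthetic data: 200 points in R^50 (illustration only)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
a = 10.0
W = weight_matrix(X, lambda d: np.exp(-(d / a) ** 2))
T = random_walk(W)
```

The dense n-by-n matrix is only practical for small n; for large data sets one would typically sparsify, e.g. by keeping only nearest neighbors.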
Difficulties
Data sets in high dimensions are complicated, and so are the classes of interesting functions on them.
We want to approximate (“learn”) such functions: parametrize low-dimensional subspaces of smooth functions on sets embedded in high dimension, with “fast” algorithms.
High-dimensional data: examples
• Documents, web searching
• Customer databases
• Satellite imagery
• Transaction logs
• Social networks
• Gene arrays, proteomics data
• Art transactions data
• Traffic (automobile, network) statistics
Examples:
If X lies in n-dimensional Euclidean space, k(x,y) could be:
- exponentially weighted distance: exp(-(||x-y||/a)^2)
- angle: <x,y>/(||x|| ||y||)
- harmonic potential: 1/(a+||x-y||), or powers of it
- “feature distances”: any of the above applied to the image f(X), with f nonlinear, possibly mapping X to a higher dimension
- if a model for the data is available (probabilistic, or from a (stochastic)
dynamical system), k may be chosen to be consistent with that model
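The kernels listed above can be written as plain functions of two vectors. A minimal sketch; the function names, the exponent parameter `p` for the powers of the harmonic potential, and the feature map `lift` are hypothetical choices for illustration.

```python
import numpy as np

def gaussian(x, y, a):
    # exponentially weighted distance: exp(-(||x - y|| / a)^2)
    return np.exp(-(np.linalg.norm(x - y) / a) ** 2)

def angle(x, y):
    # cosine of the angle between x and y: <x, y> / (||x|| ||y||)
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def harmonic(x, y, a, p=1):
    # harmonic potential 1 / (a + ||x - y||), raised to the power p
    return (1.0 / (a + np.linalg.norm(x - y))) ** p

def feature_kernel(x, y, f, base, **params):
    # "feature distance": evaluate a base kernel on the images f(x), f(y)
    return base(f(x), f(y), **params)

def lift(x):
    # hypothetical nonlinear feature map R^2 -> R^3 (to a higher dimension)
    return np.array([x[0], x[1], x[0] * x[1]])

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(gaussian(x, y, a=1.0), angle(x, y), harmonic(x, y, a=1.0))
print(feature_kernel(x, y, lift, gaussian, a=1.0))
```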
PROBLEM:
We obtain either very local or very global information, but in many problems the intermediate scales are very interesting (think of web pages)! We would like MULTISCALE information!
[Figure: spectra σ(T^2), σ(T^4), σ(T^8), σ(T^16) (vertical axis 0 to 0.9) against eigenvalue index (0 to 30), with precision threshold ε; subspaces ..., V_3, V_2, V_1, V_0 marked along the index axis.]
... to Multiresolution Diffusion
[Coifman,MM]
The decay in the spectrum of T says that powers of T are low-rank, hence compressible.
Multiscale compression scheme:
- Random walk for 1 = 2^0 step
- ... representatives
- ... steps...
[Figure: the spectra σ(T^4), σ(T^8), σ(T^16) against index 0 to 30, with threshold ε and subspaces ..., V_3, V_2, V_1, V_0.]
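The low-rank claim can be checked numerically. A minimal sketch, assuming a synthetic graph (100 equispaced points on a circle with Gaussian weights of width 0.1) and a precision ε = 1e-3, all hypothetical choices: it counts the singular values of T^(2^j) above ε. Singular values are used so that no eigenvectors are needed.

```python
import numpy as np

def numerical_rank(M: np.ndarray, eps: float) -> int:
    # Number of singular values of M larger than the precision eps
    return int(np.sum(np.linalg.svd(M, compute_uv=False) > eps))

def dyadic_ranks(T: np.ndarray, levels: int, eps: float) -> list:
    # Numerical ranks of T, T^2, T^4, ..., T^(2^(levels-1)), by repeated squaring
    ranks, P = [], T.copy()
    for _ in range(levels):
        ranks.append(numerical_rank(P, eps))
        P = P @ P
    return ranks

# Synthetic example: 100 equispaced points on the unit circle, Gaussian weights
n = 100
theta = 2 * np.pi * np.arange(n) / n
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq_dist / 0.1 ** 2)
T = W / W.sum(axis=1, keepdims=True)   # random walk T = D^{-1} W

ranks = dyadic_ranks(T, levels=6, eps=1e-3)
print(ranks)   # non-increasing: each squaring shrinks the small singular values
```

On this example the rank of T itself is essentially full, while the rank of T^32 is a small fraction of n, which is what makes the higher powers cheap to represent.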
Multiscale Random Walkers “=” Dilations, translations, downsampling
We have frequencies: the eigenvalues of the diffusion T.
What about dilations, translations and downsampling?
We may have only minimal information about the geometry, and only locally. Let's think in terms of functions on the set X.
Dilations:
Use the diffusion operator T and its dyadic powers as dilations.
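Dyadic powers act as dilations on functions on X. A minimal sketch on a toy example that is not from the talk (a lazy random walk on a path graph with 101 vertices; all names are hypothetical): applying T^(2^j) to a function concentrated at one vertex spreads it over 2·2^j + 1 vertices, so each squaring doubles the scale.

```python
import numpy as np

def path_random_walk(n: int) -> np.ndarray:
    # Lazy random walk on a path graph: stay with probability 1/2, move to each
    # neighbor with probability 1/4; at an endpoint the missing move stays put.
    T = np.zeros((n, n))
    for i in range(n):
        T[i, i] = 0.5
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                T[i, j] = 0.25
            else:
                T[i, i] += 0.25
    return T

def dyadic_dilations(T: np.ndarray, f: np.ndarray, levels: int) -> list:
    # [T f, T^2 f, T^4 f, ...]: one squaring of the operator per level
    out, P = [], T.copy()
    for _ in range(levels):
        out.append(P @ f)
        P = P @ P
    return out

n = 101
T = path_random_walk(n)
f = np.zeros(n)
f[n // 2] = 1.0                      # delta function at the middle vertex
supports = [int(np.count_nonzero(g)) for g in dyadic_dilations(T, f, levels=5)]
print(supports)                      # [3, 5, 9, 17, 33]: width 2 * 2^j + 1
```

Computing T^(2^j) by j squarings rather than 2^j multiplications is what makes the dyadic scales cheap to reach.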
[Figure panels labeled: Physics; Paleontology; Astronomy, planets; Climate; Comets, asteroids.]
Examples of multiscale scaling functions on documents
Documents:
- By Jessica Gorman--"Pan scrapings!" announced G. Kenneth Sams. "When you come down to it, that is what has brought us all here tonight-pan scrapings."-- It was probably not the most appetizing introduction for a $150-per-plate museum fundraiser. Some 150 guests, many in black tie, had just sat down to enjoy their first course at lavish banquet tables in the Upper Egyptian Gallery of the University of Pennsylvania Museum of Archaeology and Anthropology in Philadelphia. Did they r
Words: 'wine' 'region' 'chemical' 'food' 'ancient'
Documents:
- by J. Raloff-- Acid rain and agricultural pollution both spew nitrogen into the air. Though plants need nitrogen to grow, a new study finds that even small additions of this fertilizing pollutant can perturb the landscape.-- In plots of Minnesota prairie to which ecologists applied nitrogen for 12 years, native grasses showed a dramatically impaired ability to compete against weeds that had immigrated from Europe centuries ago.-- The nitrogen treatment triggered the terrestrial equ
Words: 'nitrogen' 'plant' 'ecologist' 'carbon' 'global' 'specy' 'native' 'change' 'pollution' 'university' 'growth' 'soil' 'human' 'activity' 'nutrient' 'dioxide' 'fuel' 'acid' 'cycle' 'natural'
Documents:
- New technologies convert the motion of waves into watts-- Peter Weiss--
- R. Monastersky-- The underwater volcano growing off Hawaii's south coastline collapsed -
- The Mush Zone A slurpy layer lurks deep inside the planet--
- by I. Peterson-- In late 1942, carrying 15,000 U.S. soldiers bound for England, the Queen Mary hit a storm about 700 miles off the coast of Scotland. Without warning amid the tumult, a single, mountainous wave struck the ocean liner, rolling it over and washing water across its upper decks. Luckily, the ship managed to right itself and continue on its voyage.-- Gargantuan waves, which appear unexpectedly even under calm conditions in the open ocean, have damaged and sunk numerous s
- Racing the Waves Seismologists try to catch quake tremors quickly enough to save lives--
- geologists have been snapping a very different picture of the lake lately. Far beneath Lake Tahoe's gentle surface, they say, several hidden earthquake faults snake across the lake's flat bottom.
- How a middling quake made a giant tsunami
- Waves of Death-- Why the New Guinea tsunami
Words: 'earthquake' 'wave' 'fault' 'quake' 'tsunami' 'california' 'southern' 'rock' 'seismologist' 'ocean' 'lake' 'coast' 'tsunamis' 'north' 'region' 'island' 'seismic' 'plate' 'crust' 'power'
Comments, Applications, etc...
● This is a wavelet analysis on manifolds, graphs, Markov chains and certain fractals, while Laplacian eigenfunctions do the corresponding Fourier analysis; both are adapted to a given diffusion operator.
● We are “compressing” powers of the operator, functions of the operator, and the subspaces of functions on which its powers act. This yields sampling formulas and quadrature formulas.
● The construction does not require the diffusion to be self-adjoint, nor does it need the eigenvectors.
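A rough illustration of the last point, under stated assumptions: the diffusion below is a hypothetical random walk with a drift around a cycle, so T is not self-adjoint, and the compression step is a greedy column-pivoted Gram-Schmidt, a simple stand-in that is not claimed to be the algorithm of [Coifman, MM]. It picks a few columns of T^2 ("representatives") whose span reproduces every column of T^2 to precision ε, using only matrix products and orthogonalization, never eigenvectors.

```python
import numpy as np

def pivoted_gram_schmidt(A: np.ndarray, eps: float):
    """Greedy column-pivoted Gram-Schmidt (hypothetical helper).

    Repeatedly picks the column of A with the largest residual norm until every
    residual norm is <= eps. Returns the chosen column indices ("representatives")
    and an orthonormal basis Q for their span.
    """
    R = np.array(A, dtype=float)                 # residuals of all columns
    Q = np.zeros((A.shape[0], 0))
    chosen = []
    while Q.shape[1] < min(A.shape):
        norms = np.linalg.norm(R, axis=0)
        k = int(np.argmax(norms))
        if norms[k] <= eps:
            break
        q = R[:, k] / norms[k]
        q -= Q @ (Q.T @ q)                       # re-orthogonalize for stability
        q /= np.linalg.norm(q)
        Q = np.column_stack([Q, q])
        chosen.append(k)
        R -= np.outer(q, q @ R)                  # remove the q-component of every column
    return chosen, Q

# Hypothetical non-symmetric diffusion: Gaussian steps with a drift around a cycle
n, shift, sigma, eps = 100, 3, 6.0, 1e-6
i = np.arange(n)
offset = (i[None, :] - i[:, None] - shift + n // 2) % n - n // 2   # wrapped j - i - shift
W = np.exp(-(offset / sigma) ** 2)
T = W / W.sum(axis=1, keepdims=True)             # row-stochastic, T != T.T
T2 = T @ T

reps, Q = pivoted_gram_schmidt(T2, eps)
err = np.linalg.norm(T2 - Q @ (Q.T @ T2), axis=0).max()
print(len(reps), err)                            # few representatives, err about eps or less
```

Greedy pivoting on residual norms is only one way to choose representatives; rank-revealing QR factorizations serve the same purpose with stronger guarantees.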