
Diffusion Wavelets

on Graphs and Manifolds

R.R. Coifman, MM,


J.C. Bremer Jr., A.D. Szlam

Other collaborators on related projects: P.W. Jones, S. Lafon, R. Schul

www.math.yale.edu/~mmm82
Outline of the talk

1) Relate learning and approximation under smoothness constraints; motivate the
need for good “building blocks” for approximation
2) Some very classical “building blocks” in Euclidean spaces: Fourier and
wavelets (a comparison)
3) New multiscale “building blocks” for manifolds: for this we need to talk
about diffusion geometries:
3a) Global diffusion geometries, eigenfunctions of the Laplacian
3b) Multiscale diffusion wavelets
4) Stylized applications
5) Future work
Learning & Function Approximation on graphs and
manifolds
In many cases learning algorithms (or learning itself?) can be viewed as an
approximation problem, under smoothness constraints, on a set of points/state
space represented as a manifold or graph. [Think of SVMs as an example]

Approximation: fitting the function of interest.
Smoothness constraint: want generalization power, complexity control, avoid overfitting.
Manifold/graph: belief that the structure of the state space has to do with the learning task.
Goal
Develop tools for multiscale analysis of functions on manifolds, varifolds,
graphs, “datasets”.

Motivations
Analyze large amounts of data, and functions on this data, that are intrinsically
rather low-dimensional but embedded in high dimensions.
Paradigm: we have a large number of documents (e.g.: web pages, gene
array data, (hyper)spectral data, molecular dynamics data etc...) and a way
of measuring similarity between pairs. Model: a graph (G,E,W).
In important practical cases: vertices are points in high-dimensional
Euclidean space, weights may be a function of Euclidean distance.

Difficulties
Data sets in high-dimensions are complicated, as are classes of interesting
functions on them.
We want to do approximation (“learning”) of such functions. Parametrize
low-dimensional subspaces of smooth functions on sets embedded in high
dimension. “Fast” algorithms.
High dimensional data: examples
• Documents, web searching
• Customer databases
• Satellite imagery
• Transaction logs
• Social networks
• Gene arrays, proteomics data
• Art transactions data
• Traffic (automobile, network) statistics


[Figure: a function with a discontinuity, and the rate of decay of its coefficients onto Fourier (red) and wavelets (blue).]
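The Fourier versus wavelet comparison can be reproduced in a few lines. This is a minimal sketch under assumptions that are not on the slides: the test signal is a single step (the simplest function with a discontinuity), the wavelet is the Haar wavelet, and `haar_transform` is a hypothetical helper written here for illustration.

```python
import numpy as np

def haar_transform(signal):
    """Orthonormal Haar wavelet transform of a signal of length 2^J."""
    approx = np.asarray(signal, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # wavelet (detail) coefficients
        approx = (even + odd) / np.sqrt(2.0)         # coarser averages
    # Coarsest average first, then details from coarse to fine.
    return np.concatenate([approx] + details[::-1])

n = 256
f = np.where(np.arange(n) < 100, 0.0, 1.0)  # assumed test signal: one jump at sample 100

fourier_coeffs = np.fft.fft(f, norm="ortho")  # orthonormal discrete Fourier transform
haar_coeffs = haar_transform(f)

tol = 1e-8
n_fourier = int(np.sum(np.abs(fourier_coeffs) > tol))  # almost all frequencies are active
n_haar = int(np.sum(np.abs(haar_coeffs) > tol))        # at most log2(n) + 1 = 9 coefficients
print("nonzero Fourier coefficients:", n_fourier)
print("nonzero Haar coefficients:", n_haar)
```

The jump spreads over nearly every Fourier coefficient, while only the Haar wavelets whose support straddles the jump respond to it. For a smooth function with a jump the picture is less extreme, but the same localization argument applies.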
Laplacian, diffusion geometries
[RR Coifman, S. Lafon]

Related previous and current work by M. Belkin, P. Niyogi on eigenmaps and
applications to semi-supervised learning on manifolds and graphs.
Part of the material in the next few slides is courtesy of Stephane Lafon.

From local to global: diffusion distances


Motto: diffusion distance measures and averages connections of all lengths. It is
more stable and “finer” than geodesic distance, and uses a “preponderance of evidence”.
A local “similarity” operator on the set

$X = \{x_1, x_2, \ldots, x_N\}$: the data set.

$k(x, y)$: a kernel defined on the data, which is
- symmetric: $k(x, y) = k(y, x)$
- positivity-preserving: $k(x, y) \ge 0$
- positive semi-definite: $\sum_{x \in X} \sum_{y \in X} \alpha(x)\,\alpha(y)\,k(x, y) \ge 0$ for every function $\alpha$ on $X$

The kernel k describes the geometry of X by defining the relationship
between the data points.
Examples of similarity kernels

Examples:
If X lies in n-dimensional Euclidean space, k(x,y) could be:
- exponentially weighted distance: exp(-(||x-y||/a)^2)
- angle: <x,y>/(||x|| ||y||)
- harmonic potential: 1/(a+||x-y||), or powers of it
- “feature distances”: any of the above applied in the range f(X), f nonlinear,
possibly mapping X to higher dimension
- if a model for the data is available (probabilistic, or from a (stochastic)
dynamical system), k may be chosen to be consistent with that model
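The first three kernels in the list can be written down directly. The sketch below is illustrative only: the function names, the random data and the scale `a = 1.0` are assumptions, not taken from the talk.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances ||x - y|| between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def gaussian_kernel(X, a):
    """Exponentially weighted distance: exp(-(||x - y|| / a)^2)."""
    return np.exp(-(pairwise_distances(X) / a) ** 2)

def angle_kernel(X):
    """Angle: <x, y> / (||x|| ||y||).
    Note: this can be negative unless the data lie in a cone such as the
    positive orthant, so it is not positivity-preserving in general."""
    norms = np.linalg.norm(X, axis=1)
    return (X @ X.T) / np.outer(norms, norms)

def harmonic_kernel(X, a):
    """Harmonic potential: 1 / (a + ||x - y||)."""
    return 1.0 / (a + pairwise_distances(X))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))      # hypothetical data: 50 points in 3 dimensions
K = gaussian_kernel(X, a=1.0)

# Check the three properties required of a similarity kernel.
is_symmetric = np.allclose(K, K.T)
is_nonnegative = bool(np.all(K >= 0))
min_eigenvalue = np.linalg.eigvalsh(K).min()   # >= 0 up to rounding for a PSD kernel
print(is_symmetric, is_nonnegative, min_eigenvalue > -1e-10)
```

The scale a controls what “local” means: a small a makes k nearly zero except between very close points.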

If X is an abstract graph: we need to be given some weights on edges,
measuring similarity.
These could vary widely: from the graph obtained by discretizing a PDE, to
ways of measuring similarity between two web pages, or two protein chains...
Diffusion Distances

Diffusion embedding: mapping X with diffusion distance into Euclidean
space with Euclidean distance.
[Figure: embedding coordinates Phi1, Phi2, Phi3.]
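The slides do not spell out the formulas, so the sketch below assumes the standard Coifman-Lafon definitions: the diffusion distance at time t is the distance between the rows of P^t weighted by 1/degree, and the embedding uses the right eigenvectors of P scaled by the t-th power of the eigenvalues. The function name `diffusion_map`, the Gaussian weights and the data are all illustrative assumptions.

```python
import numpy as np

def diffusion_map(W, t):
    """Embed the vertices so that Euclidean distance equals diffusion distance at time t."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))          # symmetric conjugate D^{-1/2} W D^{-1/2}
    eigvals, V = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals, V = eigvals[order], V[:, order]
    psi = V / np.sqrt(d)[:, None]            # right eigenvectors of P = D^{-1} W
    return psi * eigvals ** t                # coordinate l is lambda_l^t * psi_l(x)

rng = np.random.default_rng(0)
points = rng.normal(size=(30, 3))            # hypothetical data set
sq_dist = np.sum((points[:, None] - points[None, :]) ** 2, axis=-1)
W = np.exp(-sq_dist)                         # assumed Gaussian similarity weights

t = 3
embedding = diffusion_map(W, t)

# Compare with the diffusion distance computed directly from the random walk.
d = W.sum(axis=1)
P_t = np.linalg.matrix_power(W / d[:, None], t)
i, j = 0, 1
direct = np.sqrt(np.sum((P_t[i] - P_t[j]) ** 2 / d))
embedded = np.linalg.norm(embedding[i] - embedding[j])
print(direct, embedded)
```

The first coordinate is constant across points and contributes nothing to distances. In practice only the leading few coordinates are kept, since the eigenvalue powers make the rest negligible.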
Link with other kernel methods

Recent kernel methods: LLE (Roweis, Saul 2000),
Laplacian Eigenmaps (Belkin, Niyogi 2002),
Hessian Eigenmaps (Donoho, Grimes 2003),
LTSA (Zhang, Zha 2002) ...
all based on the following paradigm: minimize $Q(f)$, where
$Q(f) = \sum_{x \in X} Q_x(f)$
and $Q_x(f)$ is a quadratic form measuring the local variation of f in a neighborhood of x.

Solution: compute the eigenfunctions $\{\varphi_l\}$ of Q and map data points via
$x \mapsto (\varphi_0(x), \varphi_1(x), \ldots, \varphi_p(x))^T$
In our case we minimize $\sum_{x \in X} \sum_{y \in X} k(x, y)\,(f(x) - f(y))^2$
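Reading the quadratic form as a sum over all pairs (x, y), which is an assumption about the slide's notation, it equals twice the graph Laplacian form f^T (D - K) f, where D is the diagonal matrix of row sums of K. That is why minimizing it leads to eigenvectors of a Laplacian. A quick numerical check with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
K = rng.random((n, n))
K = (K + K.T) / 2                      # hypothetical symmetric, nonnegative kernel
f = rng.normal(size=n)                 # an arbitrary function on the n points

# Left-hand side: sum over all pairs of k(x, y) (f(x) - f(y))^2.
lhs = sum(K[i, j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n))

# Right-hand side: 2 f^T L f with the graph Laplacian L = D - K.
L = np.diag(K.sum(axis=1)) - K
rhs = 2.0 * f @ L @ f

print(lhs, rhs)
```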
Applications

- Classifiers in the semi-supervised learning context (M. Belkin, P. Niyogi)
- fMRI data (F. Meyer, X. Shen)
- Art data (W Goetzmann, PW Jones, MM, J Walden)
- Hyperspectral imaging in pathology (MM, GL Davis, F Warner, F Geshwind, A Coppi, R DeVerse, RR Coifman)
- Molecular dynamics simulations (RR Coifman, G Hummer, I Kevrekidis, MM)
- Text document classification (RR Coifman, MM)
Art data
[W Goetzmann, PW Jones, MM, J Walden]
Given the sales of art works (paintings and drawings) of 58 artists over 9
years.
View them as 58 points in 9 dimensions, try to find clusters that indicate
how the works of groups of artists are “traded” together. Does this correlate
to style?
[Figure: projection of the data on the top 3 principal components; image under the top 3 eigenfunctions of the Laplacian + K-means.]
Hyper-spectral Pathology Data
[MM, GL Davis, F Warner, F. Geshwind, A Coppi, R. DeVerse, RR Coifman]
So far...
...we have seen that:
- it seems useful to consider a framework in which, for a given data set, local
similarities are given only between very similar points,
- it is possible to organize this local information by diffusion into global
parametrizations,
- these parametrizations can be found by looking at the eigenvectors of a diffusion
operator,
- these eigenvectors in turn yield a nonlinear embedding into low-dimensional
Euclidean space,
- the eigenvectors can be used for global Fourier analysis on the set/manifold.

PROBLEM:
Either very local information or very global information: in many problems the
intermediate scales are very interesting (think of web pages)! We would like
MULTISCALE information!

Solution 1: proceed bottom-up: repeatedly cluster together in a multiscale fashion, in
a way that is “faithful” to the operator: diffusion wavelets.
Solution 2: proceed top-down: cut greedily according to global information, and
repeat the procedure on the pieces: recursive partitioning, local cosines...
Solution 3: do both!
From global Diffusion Geometries...
We are given a graph X with weights W. There is a natural random walk
P on X induced by these weights.
P maps probability distributions on X to probability distributions on X.
The reversibility condition on P implies that P is conjugate to a self-
adjoint operator T. We assume T is renormalized to have 2-norm 1.
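A small numerical sketch of this setup, assuming symmetric Gaussian weights on a hypothetical point cloud (the slides leave W general). Here P = D^{-1} W is the random walk and T = D^{1/2} P D^{-1/2} = D^{-1/2} W D^{-1/2} is its symmetric conjugate.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(40, 2))                    # hypothetical data set
sq_dist = np.sum((points[:, None] - points[None, :]) ** 2, axis=-1)
W = np.exp(-sq_dist)                                 # assumed symmetric weights

d = W.sum(axis=1)                                    # vertex degrees
P = W / d[:, None]                                   # random walk: each row sums to 1
T = W / np.sqrt(np.outer(d, d))                      # T = D^{1/2} P D^{-1/2}, symmetric

# P maps probability distributions (row vectors) to probability distributions.
p0 = np.zeros(40)
p0[0] = 1.0                                          # walker starts at vertex 0
p1 = p0 @ P                                          # distribution after one step

# Conjugate operators share their spectrum.
eig_P = np.sort(np.linalg.eigvals(P).real)
eig_T = np.linalg.eigvalsh(T)                        # returned in ascending order
norm_T = np.linalg.norm(T, 2)                        # largest singular value

print(p1.sum(), norm_T, np.max(np.abs(eig_P - eig_T)))
```

With this normalization the top eigenvalue of T is 1, with eigenvector proportional to the square root of the degrees, so T already has 2-norm 1 in this example.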
The spectra of T and its powers look like:
[Figure: spectra of dyadic powers of T (labeled up to σ(T^16)) plotted against index 0 to 30, with a threshold ε and subspaces labeled ... V3, V2, V1, V0.]
... to Multiresolution Diffusion
[Coifman,MM]
The decay in the spectrum of T says powers of T are low-rank, hence compressible.
Multiscale compression scheme:
- Random walk for 1=2^0 step
- Collect together random walkers into representatives
- Write a random walk on the representatives
- Let the representatives random walk 2=2^1 steps...
[Figure: spectra of dyadic powers of T, with threshold ε and subspaces labeled ... V3, V2, V1, V0.]
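The compressibility claim can be checked by counting, for each dyadic power, how many eigenvalues of T^(2^j) exceed a precision ε. This is a sketch with assumed Gaussian weights, an assumed kernel scale and an assumed ε = 1e-6; none of these values come from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(60, 2))                    # hypothetical data set
sq_dist = np.sum((points[:, None] - points[None, :]) ** 2, axis=-1)
W = np.exp(-sq_dist / 0.25)                          # assumed Gaussian weights
d = W.sum(axis=1)
T = W / np.sqrt(np.outer(d, d))                      # symmetric diffusion operator

eigvals = np.linalg.eigvalsh(T)

eps = 1e-6                                           # assumed precision
ranks = []
for j in range(6):
    power = 2 ** j
    # Eigenvalues of T^(2^j) are the eigenvalues of T raised to the power 2^j.
    ranks.append(int(np.sum(np.abs(eigvals) ** power > eps)))

print(ranks)   # numerical rank of T, T^2, T^4, ..., T^32
```

The ranks can only shrink as j grows, which is what lets each level of the multiscale scheme work on fewer “representatives” than the one before.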
Multiscale Random Walkers

Dilations, translations, downsampling
We have frequencies: the eigenvalues of the diffusion T.
What about dilations, translations, downsampling?
We may have minimal information about the geometry, and only locally. Let's
think in terms of functions on the set X.

Dilations:
Use the diffusion operator T and its dyadic powers as dilations.

Translations and downsampling:


Idea: diffusing a basis of “scaling functions” at a certain scale by a power of T
should yield a redundant set of coarser “scaling functions” at the next coarser
scale: reduce this set to a Riesz (i.e. well-conditioned) or even orthonormal
basis. This is downsampling in the function space, and corresponds to finding a
well-conditioned subset of “translates”.
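The idea above can be sketched for a symmetric T. The assumptions are loud ones: the downsampling step is a plain pivoted modified Gram-Schmidt written here for illustration, the precision ε = 1e-6 is arbitrary, only the scaling functions are built (no wavelets), and the function names are hypothetical. It is a toy version of the idea, not the authors' implementation.

```python
import numpy as np

def pivoted_gram_schmidt(A, eps):
    """Orthonormal columns spanning the range of A up to precision eps."""
    R = np.array(A, dtype=float)                 # residual columns
    basis = []
    while len(basis) < min(A.shape):
        norms = np.linalg.norm(R, axis=0)
        k = int(np.argmax(norms))                # pivot: largest remaining column
        if norms[k] <= eps:
            break                                # everything left is numerically zero
        q = R[:, k] / norms[k]
        basis.append(q)
        R -= np.outer(q, q @ R)                  # remove the q-component of every column
    return np.column_stack(basis)

def diffusion_scaling_tree(T, levels, eps):
    """Scaling-function bases Phi_0, Phi_1, ... in the original coordinates."""
    Tj = T.copy()                                # T^(2^j) written in the current basis
    Phi = np.eye(T.shape[0])                     # Phi_0: delta functions on the vertices
    tree = [Phi]
    for _ in range(levels):
        Q = pivoted_gram_schmidt(Tj, eps)        # downsample the diffused basis
        Phi = Phi @ Q                            # coarser scaling functions
        Tj = Q.T @ (Tj @ Tj) @ Q                 # next dyadic power, compressed
        tree.append(Phi)
    return tree

rng = np.random.default_rng(0)
points = rng.normal(size=(64, 2))                # hypothetical data set
sq_dist = np.sum((points[:, None] - points[None, :]) ** 2, axis=-1)
W = np.exp(-sq_dist / 0.25)                      # assumed Gaussian weights
d = W.sum(axis=1)
T = W / np.sqrt(np.outer(d, d))

tree = diffusion_scaling_tree(T, levels=5, eps=1e-6)
dims = [Phi.shape[1] for Phi in tree]
print(dims)    # number of scaling functions at each scale
```

Each level applies the current operator (dilation), keeps a well-conditioned subset of directions (downsampling), and squares the compressed operator for the next scale. The matrices being orthogonalized shrink from level to level, which is where the computational savings come from.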
Potential Theory, Green's function
Diffusion Wavelets on the sphere
Diffusion Wavelets on a dumbbell
Connections...
Wavelets:
- Lifting (W. Sweldens, I. Daubechies,...)
- Continuous wavelets from group actions (Grossmann, Morlet, Ali, Antoine,
Gazeau, many physicists...)
Classical Harmonic Analysis:
- Littlewood-Paley on semigroups (Stein), Markov diffusion semigroups
(large literature)
- Martingales associated to the above, Brownian motion
- Heat kernel estimates, generalized Heisenberg principles (A. Nahmod)
- Harmonic Analysis of eigenfunctions of the Laplacian on domains,
manifolds and graphs
- Atomic decompositions (Coifman, Weiss, Rochberg...)
Numerics:
- Algebraic Multigrid (Brandt, large literature since the 80's...)
- Kind of “inverse” FMM (Rokhlin, Beylkin-Coifman-Rokhlin, ...)
- Multiscale matrix compression techniques (Gu-Eisenstat, Cheng-Gimbutas-Martinsson-Rokhlin,...)
- Randomizable
- FFTs?
Diffusion Wavelet Packets
[JC Bremer, RR Coifman, MM, AD Szlam]
We can split the wavelet subspaces further, in a hierarchical dyadic fashion,
very much like in the classical case. The splittings are generated by
“numerical kernel” and “numerical range” operations.
Compression example I
Denoising
[a la Coifman-Wickerhauser & Donoho-Johnstone]
Analysis of a document corpus
Given 1,000 documents, each of which is associated with 1,000 words, with a
value indicating the relevance of each word in that document.
View this as a set of 1,000 points in 1,000 dimensions, construct a graph with 1,000
vertices, each with a few selected edges to very close-by documents.
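One way to build such a graph is a symmetrized k-nearest-neighbor graph with Gaussian weights. The slides do not say which distance, how many neighbors, or which weights were used, so the Euclidean distance, `k = 5`, the scale `a` and the random stand-in for the document-word matrix (smaller than the 1,000 by 1,000 one in the talk) are all assumptions.

```python
import numpy as np

def knn_graph(X, k, a):
    """Symmetric weight matrix linking each row of X to its k nearest rows."""
    n = X.shape[0]
    sq_norms = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(dist2, np.inf)                    # exclude self-loops
    nearest = np.argsort(dist2, axis=1)[:, :k]         # indices of the k closest documents
    rows = np.repeat(np.arange(n), k)
    cols = nearest.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-dist2[rows, cols] / a ** 2)
    return np.maximum(W, W.T)                          # keep an edge if either end chose it

rng = np.random.default_rng(0)
X = rng.random((200, 50))       # stand-in for a document-by-word relevance matrix
W = knn_graph(X, k=5, a=1.0)

edges_per_vertex = (W > 0).sum(axis=1)
print(edges_per_vertex.min(), edges_per_vertex.max())
```

Symmetrizing with the maximum means a vertex can end up with more than k edges; this keeps W symmetric, which the construction of the self-adjoint operator T relies on.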

[Figure: three-dimensional embeddings of the documents, with regions labeled Physics, Paleontology, Astronomy/planets, Climate, and Comets/asteroids.]
Examples of multiscale scaling functions on documents

Words: 'wine' 'region' 'chemical' 'food' 'ancient'

Documents:
• By Jessica Gorman--"Pan scrapings!" announced G. Kenneth Sams. "When you come down to it, that is -what has brought us all here tonight-pan scrapings."-- It was probably not the most appetizing introduction for a $150-per-plate -museum fundraiser. Some 150 guests, many in black tie, had just sat down to -enjoy their first course at lavish banquet tables in the Upper Egyptian Gallery -of the University of Pennsylvania Museum of Archaeology and Anthropology in -Philadelphia. Did they r
• Can chemical analysis confirm a wine's authenticity?-- Damaris Christensen-- There's a romance to wine unmatched by most agricultural products. Making -wine, with all its complex flavors, remains as much an art as a science.-- Hints of citrus, peach, raspberry, pear, oak, grass, or flowers may show up in -the taste or smell of wines. The catalog of factors that determine the flavor -and bouquet is almost as long as the list of adjectives that connoisseurs use -to describe them.--
Examples of multiscale scaling functions on documents

Words: 'nitrogen' 'plant' 'ecologist' 'carbon' 'global' 'specy' 'native' 'change' 'pollution' 'university' 'growth' 'soil' 'human' 'activity' 'nutrient' 'dioxide' 'fuel' 'acid' 'cycle' 'natural'

Documents:
• by J. Raloff-- Acid rain and agricultural pollution both spew nitrogen into the air. Though -plants need nitrogen to grow, a new study finds that even small additions of -this fertilizing pollutant can perturb the landscape.-- In plots of Minnesota prairie to which ecologists applied nitrogen for 12 -years, native grasses showed a dramatically impaired ability to compete against -weeds that had immigrated from Europe centuries ago.-- The nitrogen treatment triggered the terrestrial equ
• by C. Mlot-- For most of agricultural history, nitrogen has been a precious commodity. Only -specialized bacteria and lightning could convert atmospheric nitrogen into -biologically usable forms. Today, however, fertilizers and fossil fuels have -made nitrogen so freely available that it has become too much of a good thing.-- In a review of nitrogen's effects across the environmental spectrum, a team of -ecologists headed by Peter M. Vitousek of Stanford University has concluded in -no
Examples of multiscale scaling functions on documents

Words: 'earthquake' 'wave' 'fault' 'quake' 'tsunami' 'california' 'southern' 'rock' 'seismologist' 'ocean' 'lake' 'coast' 'tsunamis' 'north' 'region' 'island' 'seismic' 'plate' 'crust' 'power'

Documents:
• New technologies convert the motion of waves into watts-- Peter Weiss--
• R. Monastersky-- The underwater volcano growing off Hawaii's south coastline collapsed -
• - The Mush Zone A slurpy layer lurks deep inside the planet--
• by I. Peterson-- In late 1942, carrying 15,000 U.S. soldiers bound for England, the Queen Mary -hit a storm about 700 miles off the coast of Scotland. Without warning amid the -tumult, a single, mountainous wave struck the ocean liner, rolling it over and -washing water across its upper decks. Luckily, the ship managed to right itself -and continue on its voyage.-- Gargantuan waves, which appear unexpectedly even under calm conditions in the -open ocean, have damaged and sunk numerous s
• - Racing the Waves Seismologists try to catch quake tremors quickly enough to -save lives--
• - geologists have been snapping a very different picture of -the lake lately. Far beneath Lake Tahoe's gentle surface, they say, several -hidden earthquake faults snake across the lake's flat bottom.
• How a middling quake made a giant tsunami
• Waves of Death-- Why the New Guinea tsunami
Comments, Applications, etc...
● This is a wavelet analysis on manifolds, graphs, Markov chains, and certain fractals,
while Laplacian eigenfunctions do the corresponding Fourier analysis, adapted to a
given diffusion operator.
● We are “compressing”: powers of the operator, functions of the operator, subspaces
of the function subspaces on which its powers act. This yields sampling formulas and
quadrature formulas.
● Does not require the diffusion to be self-adjoint, nor does it need the eigenvectors.
● We are constructing a biorthogonal version of the transform (better adapted to
studying Markov chains): this will allow efficient denoising, compression, and
discrimination on all the spaces mentioned above. Preliminary: n log(n) with good
constants for symmetric problems.
● The multiscale spaces are a natural scale of complexity spaces for learning empirical
functions on the data set.
● Diffusion scaling functions extend outside the set, in a natural multiscale fashion.
● Exploring ties with measure-geometric considerations used for embedding metric
spaces in Euclidean spaces with small distortion.
● Study and compression of dynamical systems.
Current & Future Work

- Use for regularization and learning on graphs and manifolds: semi-supervised
and plain supervised.
- Robustness to noise, perturbations, extension of the functions to test points.
- Biorthogonal construction (better localization, better constants)
- Multiscale embeddings for graphs, measure-geometric implications
- Martingale aspects and fast Brownian motion simulation in
nonhomogeneous media
- Compression of data sets
- Going nonlinear

This talk, papers, and Matlab code are available at:
www.math.yale.edu/~mmm82
(or Google for “Mauro Maggioni”)
Thank you!
