
Clustering

Gilad Lerman
Math Department, UMN
Slides/figures stolen from M.-A. Dillies, E. Keogh, A. Moore
What is Clustering?
Partitioning data into classes with
high intra-class similarity
low inter-class similarity
Is it well-defined?

What is Similarity?
Clearly a subjective, problem-dependent measure
How Similar Are Clusters?
Ex. 1: Two clusters or one cluster?
How Similar Are Clusters?
Ex. 2: Cluster or outliers?
Sum-Squares Intra-class Similarity
Given cluster $S_1 = \{x_1, \ldots, x_{N_1}\}$

Mean: $c_1 = \frac{1}{N_1} \sum_{x_i \in S_1} x_i$

Within Cluster Sum of Squares:
$\mathrm{WCSS}(S_1) = \sum_{x_i \in S_1} \|x_i - c_1\|_2^2$, where $\|y\|_2^2 = \sum_{j=1}^{D} (y_j)^2$

Note that $c_1 = \operatorname*{argmin}_{c} \sum_{x_i \in S_1} \|x_i - c\|_2^2$
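To make the definition concrete, here is a minimal NumPy sketch of WCSS for a single cluster (the function name and array layout are our own, not from the slides):

import numpy as np

def wcss(X):
    # X: (N, D) array whose rows are the points x_i of one cluster.
    c = X.mean(axis=0)                   # cluster mean c
    return float(np.sum((X - c) ** 2))   # sum of squared l2 distances to c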
Within Cluster Sum of Squares
For a set of clusters $S = \{S_1, \ldots, S_K\}$:

$\mathrm{WCSS}(S) = \sum_{j=1}^{K} \sum_{x_i \in S_j} \|x_i - c_j\|_2^2$

Can use $\|y\|_1 = \sum_{j=1}^{D} |y_j|$ instead of $\|y\|_2^2 = \sum_{j=1}^{D} (y_j)^2$

So get the Within Clusters Manhattan Distance:
$\mathrm{WCMD}(S) = \sum_{j=1}^{K} \sum_{x_i \in S_j} \|x_i - m_j\|_1$, where $m_j = \operatorname*{argmin}_{c} \sum_{x_i \in S_j} \|x_i - c\|_1$

Question: how to compute/estimate the centers $m_j$?
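Since the entrywise $\ell_1$ sum decouples across coordinates and the 1-D median minimizes the sum of absolute deviations, the minimizer $m_j$ is the coordinate-wise median. A minimal NumPy sketch (names are our own):

import numpy as np

def wcmd(clusters):
    # clusters: list of (N_j, D) arrays, one array per cluster S_j.
    total = 0.0
    for X in clusters:
        m = np.median(X, axis=0)         # coordinate-wise median m_j
        total += np.abs(X - m).sum()     # sum of l1 distances to m_j
    return total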
Minimizing WCSS
Precise minimization is NP-hard
Approximate minimization for WCSS by
K-means
Approximate minimization for WCMD by
K-medians
The K-means Algorithm
Input: data & the number of clusters (K)
Randomly guess locations of the K cluster centers
Assign each point to its nearest center
Move each center to the mean of its assigned points
Repeat till convergence.
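A minimal NumPy sketch of these steps (Lloyd's algorithm; the initialization and stopping rule below are one common choice, not the only one):

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    # X: (N, D) data array; K: number of clusters.
    rng = np.random.default_rng(seed)
    # Randomly guess initial centers by sampling K distinct data points.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        new_centers = np.array(
            [X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
             for k in range(K)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels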



Demonstration: K-means/medians
Applet

K-means: Pros and Cons
Pros
Often fast
Often terminates at a local minimum
Cons
May not obtain the global minimum
Depends on initialization
Need to specify K
Sensitive to outliers
Sensitive to variations in sizes and densities of clusters
Not suitable for non-convex shapes
Does not apply directly to categorical data
Spectral Clustering
Idea: embed data for easy clustering
Construct weights based on proximity:
$W_{ij} = e^{-\|x_i - x_j\|^2/\sigma^2}$ if $i \neq j$, and $W_{ij} = 0$ otherwise
(Normalize W)
Embed using eigenvectors of W
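A minimal NumPy sketch of this pipeline; the symmetric normalization $D^{-1/2} W D^{-1/2}$ is one common choice, since the slide does not pin down which normalization to use:

import numpy as np

def spectral_embedding(X, sigma=1.0, n_vec=2):
    # Gaussian affinity W_ij = exp(-||x_i - x_j||^2 / sigma^2), W_ii = 0.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / sigma**2)
    np.fill_diagonal(W, 0.0)
    # One common normalization: D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    W_norm = W / np.sqrt(np.outer(d, d))
    # Rows built from the top eigenvectors are the embedded points;
    # run K-means on these rows to recover the clusters.
    vals, vecs = np.linalg.eigh(W_norm)      # eigenvalues in ascending order
    return vecs[:, -n_vec:]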
Clustering vs. Classification
Clustering: finds classes in an unsupervised way (often K is given, though)
Classification: labels of clusters are given for some data points (supervised learning)
Data 1: Face images
Facial images (e.g., of persons 5,8,10) live on different
planes in the image space
They are often well-separated so that simple clustering
can apply to them (but not always)
Question: What is the high-dimensional image space?
Question: How can we present high-dim. data in 3D?
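One standard answer to the second question, sketched with NumPy, is to project onto the top three principal components (here assuming each image is flattened into a row of X, so a $p$-pixel image is a point in $\mathbb{R}^p$):

import numpy as np

def project_3d(X):
    # Project rows of X (high-dimensional points) onto the best-fit
    # 3D subspace found by PCA.
    Xc = X - X.mean(axis=0)                       # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:3].T                          # (N, 3) coordinates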



Data 2: Iris Data Set
50 samples from each of 3 species
4 features per sample:
length & width of sepal and petal
Setosa Versicolor Virginica
Data 2: Iris Data Set
Setosa is clearly separated from 2 others
Can't separate Virginica and Versicolor
(need a training set, as done by Fisher in 1936)
Question: What are other ways to visualize?
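A quick way to reproduce this observation, assuming scikit-learn is available:

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

iris = load_iris()
X, y = iris.data, iris.target                  # 150 samples, 4 features

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
# Setosa typically gets a pure cluster, while Versicolor and
# Virginica end up partially mixed, matching the slide's point.
print(list(zip(km.labels_[:10], y[:10])))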
Data 3: Color-based Compression of Images
Applet
Question: What are the actual data points?
Question: What does the error mean?
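One reading of the two questions: the data points are the pixels viewed as RGB triples in $\mathbb{R}^3$, and the error is the WCSS, i.e. the total squared color distortion. A sketch assuming scikit-learn and Pillow (the file name is hypothetical):

import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("photo.jpg"))      # hypothetical RGB image
pixels = img.reshape(-1, 3).astype(float)      # pixels as points in R^3

K = 16                                         # palette size
km = KMeans(n_clusters=K, n_init=4, random_state=0).fit(pixels)
# Replace every pixel by its cluster center: a K-color image.
quantized = km.cluster_centers_[km.labels_].reshape(img.shape).astype(np.uint8)
# The reported "error" is the WCSS, i.e. total squared color distortion.
print("distortion:", km.inertia_)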
Some methods for # of Clusters
(with online codes)
Gap statistics (see the sketch after this list)
Model-based clustering
G-means
X-means
Data-spectroscopic clustering
Self-tuning clustering
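As one example from this list, here is a simplified sketch of the gap statistic (Tibshirani, Walther & Hastie, 2001): it compares $\log \mathrm{WCSS}_k$ on the data against its average over uniform reference draws from the data's bounding box, omitting the original standard-error rule; assumes scikit-learn:

import numpy as np
from sklearn.cluster import KMeans

def gap_statistic(X, k_max=10, n_ref=10, seed=0):
    # Gap(k) = E*[log WCSS_k] - log WCSS_k, expectation taken over
    # uniform reference data drawn from the bounding box of X.
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps = []
    for k in range(1, k_max + 1):
        wk = KMeans(n_clusters=k, n_init=5, random_state=seed).fit(X).inertia_
        ref = [KMeans(n_clusters=k, n_init=5, random_state=seed)
               .fit(rng.uniform(lo, hi, size=X.shape)).inertia_
               for _ in range(n_ref)]
        gaps.append(np.mean(np.log(ref)) - np.log(wk))
    return 1 + int(np.argmax(gaps))   # crude rule: k with the largest gap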
Your mission
Learn about clustering (theoretical results,
algorithms, codes)
Focus: methods for determining # of clusters
Understand details
Compare using artificial and real data
Conclude good/bad scenarios for each (prove?)
Come up with new/improved methods
Summarize info: literature survey and possibly
new/improved demos/applets
We can suggest additional questions tailored to
your interest
