
Spatial Data Mining

Presented by:
Rajkumar Jain
M.Tech (CSE)
1st year (2nd semester)
Overview
• What is spatial data?
• What makes spatial data mining different?
• Spatial data mining tasks
• Spatial data properties
• Cluster analysis
• Trend analysis
• Future scope
What is Spatial Data?
• Objects of types:
– points
– lines
– polygons
– etc.

Used in/for:
– GIS (Geographic Information Systems)
– GPS (Global Positioning System)
– Environmental studies
– etc.
Introduction
• Spatial data mining is the process of discovering interesting, useful, non-trivial patterns from large spatial datasets.
– E.g. determining hotspots: unusual locations.
• Spatial Data Mining Tasks
– Characteristic rules.
– Discriminant rules. E.g. comparison of price ranges across different geographical areas.
– Association rules: associate a non-spatial attribute with a spatial attribute, or a spatial attribute with another spatial attribute.
– Clustering: groups similar spatial objects and helps detect outliers, which can reveal suspicious patterns. E.g. grouping crime locations.
– Classification: determines whether a spatial entity belongs to a particular class, or how many classes the entities fall into. E.g. classifying remotely sensed images based on spectrum and GIS data.
– Trend detection: a trend is a temporal pattern in time series data. A spatial trend is a regular change of a non-spatial attribute as one moves away from a spatial object through its neighbours.
• Properties of Spatial Data
– Spatial autocorrelation
– Spatial heterogeneity
– Implicit spatial relations

Heterogeneity of Spatial Data
• Autocorrelation.
• Patterns usually have to be defined in the spatial attribute subspace and not in the complete attribute space.
• Longitude and latitude (or other coordinate systems) are the glue that links different data collections together.
• People are used to maps in GIS; therefore, data mining results have to be summarized on top of maps.
• Patterns not only refer to points, but can also refer to lines, polygons, or other higher-order geometrical objects.

Autocorrelation
• Items in traditional data are assumed to be independent of each other,
– whereas properties of locations in a map are often “auto-correlated”.
• First law of geography [Tobler]:
– Everything is related to everything else, but near things are more related than distant things.
– People with similar backgrounds tend to live in the same area.
– Economies of nearby regions tend to be similar.
– Changes in temperature occur gradually over space.

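One common way to quantify autocorrelation is Moran's I, which the slides do not name; it is brought in here only as an illustration. The sketch below assumes a hypothetical chain of five locations in which only adjacent locations are neighbours, and the attribute values are made up.

```python
def morans_i(values, weights):
    """Global Moran's I for attribute values x_i and spatial weights w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between neighbouring locations
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical neighbourhood: five locations in a row, adjacent ones are neighbours
chain = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]

smooth = [10, 11, 12, 13, 14]        # values change gradually over space
alternating = [10, 14, 10, 14, 10]   # neighbours are as different as possible

print(morans_i(smooth, chain))       # positive: nearby values are similar
print(morans_i(alternating, chain))  # negative: nearby values are dissimilar
```

A value near zero would suggest that location carries little information about the attribute.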
Spatial Relations
• Spatial databases do not store spatial
relations explicitly
– Additional functionality required to compute them
• Three types of spatial relations specified by
the OGC reference model
– Distance relations
• Euclidean distance between two spatial features
– Direction relations
• Ordering of spatial features in space
– Topological relations
• Characterise the type of intersection between spatial
features

Distance relations
• If dist is a distance function and c is some real number, two features A and B satisfy one of:
1. dist(A,B) > c,
2. dist(A,B) < c, or
3. dist(A,B) = c

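The three distance relations can be sketched as a small function. This assumes point features and Euclidean distance; the name `distance_relation` and the use of a floating-point tolerance for the equality case are illustrative choices, not part of the slides.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def distance_relation(a, b, c):
    """Return which of the relations dist(A,B) > c, < c or = c holds."""
    d = dist(a, b)
    if math.isclose(d, c):
        return "="
    return ">" if d > c else "<"


print(distance_relation((0, 0), (3, 4), 5))  # the distance is exactly 5
print(distance_relation((0, 0), (3, 4), 4))
print(distance_relation((0, 0), (3, 4), 6))
```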
Direction relations
• If directions of B and C are required with respect to A:
• Define a representative point, rep(A)
• rep(A) defines the origin of a virtual coordinate system
• The quadrants and half planes define the direction relations
• B can have two values {northeast, east}
• Exact direction relation is northeast
[Figure: C lies north of A (C north A) and B lies northeast of A (B northeast A); rep(A) is the origin of the coordinate system.]
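A minimal sketch of the idea, under a simplifying assumption: the other feature is also reduced to a single representative point, so exactly one direction is returned. The function name `direction` and the sample coordinates are hypothetical.

```python
def direction(rep_a, p):
    """Direction of point p as seen from rep(A), the origin of the virtual axes."""
    dx = p[0] - rep_a[0]
    dy = p[1] - rep_a[1]
    # Half planes: north/south from the sign of dy, east/west from the sign of dx
    ns = "north" if dy > 0 else "south" if dy < 0 else ""
    ew = "east" if dx > 0 else "west" if dx < 0 else ""
    # A quadrant is the combination of two half planes, e.g. "northeast"
    return (ns + ew) or "same position"


rep_a = (0, 0)
print(direction(rep_a, (2, 3)))  # a point in the upper-right quadrant
print(direction(rep_a, (0, 5)))  # a point straight above rep(A)
```

For extended features such as polygons, several relations can hold at once, which is why B in the slide has two candidate values before the exact relation is chosen.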
Topological Relations
• Topological relations describe how geometries
intersect spatially.
• Simple geometry types
– Point, 0-dimension
– Line, 1-dimension
– Polygon, 2-dimension
• Each geometry represented in terms of
– boundary (B) – geometry of the lower dimension
– interior (I) – points of the geometry when boundary is
removed
– exterior (E) – points not in the interior or boundary

DE-9IM
• Topological relations are defined using any one of the following models
– 4IM, four intersection model (only B and I considered)
– 9IM, nine intersection model (B, I, and E)
– DE-9IM, dimensionally extended nine intersection model
• Dim is the dimension function

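For reference, the DE-9IM matrix is usually written as below, with I, B, E and dim as defined in these slides. The convention that an empty intersection has dimension -1 is an assumption taken from the common formulation of the model.

```latex
\mathrm{DE9IM}(A, B) =
\begin{bmatrix}
\dim\bigl(I(A) \cap I(B)\bigr) & \dim\bigl(I(A) \cap B(B)\bigr) & \dim\bigl(I(A) \cap E(B)\bigr) \\
\dim\bigl(B(A) \cap I(B)\bigr) & \dim\bigl(B(A) \cap B(B)\bigr) & \dim\bigl(B(A) \cap E(B)\bigr) \\
\dim\bigl(E(A) \cap I(B)\bigr) & \dim\bigl(E(A) \cap B(B)\bigr) & \dim\bigl(E(A) \cap E(B)\bigr)
\end{bmatrix},
\qquad
\dim(\emptyset) = -1
```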
Example
• Consider two polygons
– A - POLYGON ((10 10, 15 0, 25 0, 30 10, 25 20, 15 20, 10 10))
– B - POLYGON ((20 10, 30 0, 40 10, 30 20, 20 10))

9-Intersection Matrix of example geometries
[Figure: 3 × 3 matrix with rows I(A), B(A), E(A) and columns I(B), B(B), E(B).]

DE-9IM for the example geometries
        I(B)  B(B)  E(B)
I(A)     2     1     2
B(A)     1     0     1
E(A)     2     1     2

Relationships using DE-9IM
• Different geometries may give rise to different numbers in the DE-9IM
• For a specific type of relationship we are only interested in certain values in certain positions
– That is, we are interested in patterns in the matrix rather than actual values
• Actual values are replaced by wild cards
– T: value is "true" - non empty - any dimension >= 0
– F: value is "false" - empty - dimension < 0
– *: Don't care what the value is
– 0: value is exactly zero
– 1: value is exactly one
– 2: value is exactly two
• Pattern for "A overlaps B":
        I(B)  B(B)  E(B)
I(A)     T     *     T
B(A)     *     *     *
E(A)     T     *     *
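Matching a DE-9IM matrix against a wild-card pattern can be sketched in a few lines. The sketch assumes both are written row by row as nine-character strings (for instance "212101212" for the example polygons), with empty cells written as F; the function name `matches` is hypothetical.

```python
def matches(matrix, pattern):
    """Check a DE-9IM matrix string against a pattern of T, F, *, 0, 1, 2."""
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("matrix and pattern must both have nine cells")
    for value, wanted in zip(matrix, pattern):
        if wanted == "*":
            continue                      # don't care
        if wanted == "T":
            if value not in "012":        # any non-empty intersection
                return False
        elif wanted != value:             # F, 0, 1 or 2 must match exactly
            return False
    return True


example = "212101212"        # DE-9IM of the example polygons A and B
overlaps = "T*T***T**"       # pattern shown above for "A overlaps B"
disjoint = "FF*FF****"       # commonly used pattern for disjoint geometries

print(matches(example, overlaps))
print(matches(example, disjoint))
```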
Cluster analysis
• Cluster analysis divides data into meaningful or useful groups (clusters). It is very useful in spatial databases. For example, grouping feature vectors into clusters can be used to create thematic maps, which are useful in geographic information systems.
• CLUSTERING METHODS FOR SPATIAL DATA MINING
1. Partitioning Around Medoids (PAM): PAM is similar to the k-means algorithm. Like k-means, PAM divides a data set into groups, but it is based on medoids (actual objects of the data set), whereas k-means is based on centroids. PAM chooses the medoids so that the dissimilarity of the objects within each cluster is small. It first selects the medoids, then assigns each object to its nearest medoid, which forms the clusters.
• Let i be an object and v_i the cluster it belongs to. Then i is at least as near to its own medoid m_vi as to any other medoid m_w: d(i, m_vi) ≤ d(i, m_w) for all w = 1, 2, ..., k.
The k representative objects should minimize the objective function, which is the sum of the dissimilarities of all objects to their nearest medoid: Objective function = Σ_i d(i, m_vi)
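The procedure above can be sketched as follows. This is a minimal, unoptimized version that assumes distinct 2-D points and Euclidean distance as the dissimilarity d; the greedy initial selection and the function names are illustrative, not a reference implementation.

```python
import math


def total_cost(points, medoids):
    """Objective function: sum of distances of all objects to their nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(points, k):
    # Initial selection: greedily add the object that lowers the cost the most
    medoids = []
    for _ in range(k):
        candidates = [p for p in points if p not in medoids]
        best = min(candidates, key=lambda c: total_cost(points, medoids + [c]))
        medoids.append(best)
    # Swap a medoid with a non-medoid as long as that lowers the cost
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in points:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                if total_cost(points, trial) < total_cost(points, medoids):
                    medoids, improved = trial, True
    # Assign every object to its nearest medoid
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: math.dist(p, m))].append(p)
    return medoids, clusters


# Hypothetical data: two small groups of locations
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, clusters = pam(points, k=2)
print(medoids)
print(clusters)
```

Because every cost evaluation scans all objects, this sketch is only practical for small data sets.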
2. Clustering Large Applications (CLARA)
• Compared to PAM, CLARA can deal with much larger data sets. Like PAM, CLARA finds objects that are centrally located in the clusters. The main problem with PAM is that it needs the entire dissimilarity matrix at once, so for n objects the space complexity of PAM is O(n^2). CLARA avoids this problem: it accepts only the actual measurements (i.e., the n × p data matrix) and works on a sample of the objects.
• CLARA assigns objects to clusters in the following way:
• BUILD-step: Select k "centrally located" objects of the sample to be used as initial medoids, chosen so that the average distance between the objects and their medoids is as small as possible.
• SWAP-step: Try to decrease the average distance between the objects and the medoids by replacing representative objects. Finally, each object that does not belong to the sample is assigned to its nearest medoid.

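A rough sketch of the sampling idea follows. It assumes distinct 2-D points and Euclidean distance, and it replaces the BUILD-step with a simpler start (the first k objects of the sample) followed by swaps; the number of samples, the seed and all names are hypothetical parameters chosen for illustration.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances of all objects to their nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def swap_search(sample, k):
    """Simplified medoid search on a sample: start anywhere, then swap."""
    medoids = list(sample[:k])
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in sample:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                if total_cost(sample, trial) < total_cost(sample, medoids):
                    medoids, improved = trial, True
    return medoids


def clara(points, k, sample_size, n_samples=5, seed=0):
    rng = random.Random(seed)
    best, best_cost = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, sample_size)
        medoids = swap_search(sample, k)
        cost = total_cost(points, medoids)   # judged on the full data set
        if cost < best_cost:
            best, best_cost = medoids, cost
    # Objects outside the sample are simply assigned to their nearest medoid
    labels = [min(range(k), key=lambda j: math.dist(p, best[j])) for p in points]
    return best, labels


# Hypothetical data: two well-separated groups of ten locations each
points = [(i * 0.1, 0.0) for i in range(10)] + [(100 + i * 0.1, 0.0) for i in range(10)]
medoids, labels = clara(points, k=2, sample_size=8)
print(medoids, labels)
```

Only the sample is searched for medoids, so no n × n dissimilarity matrix is ever built.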
Trend analysis
• Spatial trend: a regular change of one or more non-spatial attributes when moving away from a given object.
E.g. when we move eastward away from the cyber tower, the rent of residential houses decreases at a rate of approximately 5% per km.
• A trend is identified along a neighborhood path starting from a location O: regression analysis is performed on the respective attribute values of the objects on the neighborhood path to describe the regularity of change.
• There are two algorithms, one to determine global trends and one to determine local trends.
• Global trend:
Considering all the objects on all paths starting from O, the values of the specified attribute in general tend to increase or decrease with increasing distance from O.
• Local trend:
Considers each single path starting from an object O and detects whether it has a certain trend. E.g. some paths may show a positive trend while others show a negative one.

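The regression step can be illustrated with an ordinary least-squares line fitted to attribute values against distance from O. The rents below are made-up numbers that fall linearly with distance, which is a simplification of the percentage example above.

```python
def fit_line(distances, values):
    """Least-squares fit of value = intercept + slope * distance."""
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical objects on one neighborhood path: distance from O in km, monthly rent
distances = [0, 1, 2, 3, 4]
rents = [1000, 950, 900, 850, 800]

slope, intercept = fit_line(distances, rents)
print(slope, intercept)  # a negative slope means rent falls when moving away from O
```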
Spatial trend detection
• Let g be a neighborhood graph, O an object in g, and a a non-spatial attribute for which we detect changes while moving away from O in the neighborhood graph.
• Let a filter indicate the subset of neighbors to be taken into consideration.
• Let min_conf be a real number, the minimum confidence a detected trend must reach.
• Let min_length and max_length be natural numbers; the length of the paths considered must lie between these two bounds.

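The parameters above can be tied together in a small sketch of local trend detection. Several things here are assumptions: the graph is an adjacency dictionary, path length is counted in edges, confidence is taken to be the absolute correlation between distance and attribute value, and the filter is a predicate `keep` that accepts every neighbor by default.

```python
import math


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def paths_from(graph, start, max_length, keep=lambda current, nxt: True):
    """Yield all simple paths from start with 1..max_length edges."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            yield path
        if len(path) - 1 < max_length:
            for nxt in graph[path[-1]]:
                if nxt not in path and keep(path[-1], nxt):
                    stack.append(path + [nxt])


def local_trends(graph, coords, attr, start, min_conf, min_length, max_length):
    trends = []
    for path in paths_from(graph, start, max_length):
        if len(path) - 1 < min_length:
            continue
        xs = [math.dist(coords[start], coords[o]) for o in path]
        ys = [attr[o] for o in path]
        r = correlation(xs, ys)
        if abs(r) >= min_conf:
            trends.append((tuple(path), r))
    return trends


# Hypothetical neighborhood graph with two branches leaving O
graph = {"O": ["A", "D"], "A": ["O", "B"], "B": ["A", "C"],
         "C": ["B"], "D": ["O", "E"], "E": ["D"]}
coords = {"O": (0, 0), "A": (1, 0), "B": (2, 0), "C": (3, 0), "D": (0, 1), "E": (0, 2)}
attr = {"O": 100, "A": 90, "B": 80, "C": 70, "D": 100, "E": 130}

trends = local_trends(graph, coords, attr, "O", min_conf=0.9, min_length=2, max_length=3)
print(trends)
```

A global trend could be checked in the same spirit by pooling the objects of all paths into one regression.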
Architecture of Spatial Data mining
[Figure: architecture diagram with the following components]
• Human computer interaction system
• Spatial data mining system
• Discoverable knowledge
• Data related to problem
• Knowledge base management system
• Spatial database management system
• Domain knowledge database
• Spatial database
Examples of Spatial Patterns
• 1854 Asiatic cholera outbreak in London.
– A water pump identified as the source.
• Crime hotspots for planning police patrol routes.
• Effects on weather in the US caused by unusual warming of the Pacific Ocean (El Niño).
Future scope

• Data mining in spatial object-oriented databases: How can the object-oriented approach be used to design a spatial database? An object-oriented database may be a better choice for handling spatial data than traditional relational or extended relational models. For example, rectangles, polygons, and more complex spatial objects can be modeled naturally in an object-oriented database.

• Parallel data mining can be used, because processing spatial data takes much computational time.

Thank you

