Outline
- Introduction
- Why Geographic Knowledge Discovery?
- Where we need it?
- Geographic Knowledge Discovery GKD
- Geographic Information Science GIS
- Geographic Research
- Geographic applications
- Geographic Data Warehousing
- Spatial Data Warehousing
- Differences between SDW & DW
- Types of Spatial Data Mining
A) - (SDM) Spatial Classification
B) - Spatial association
GKD
- Why Geographic Knowledge Discovery?
GKD is based on a belief that there is novel and useful geographic
knowledge hidden in the unprecedented amounts of data.
Large databases contain interesting patterns:
non-random properties and relationships that are:
- Valid (general enough to apply to new data)
- Novel (non-trivial and unexpected)
- Useful (leading to effective action: decision making or investigation)
- Ultimately understandable (simple, and interpretable by humans)
Where we need it?
- Climate
- Access
- Visualisation
- Mapping
Data Mining in Knowledge Discovery
Data mining is only one step of the knowledge discovery from databases (KDD)
process. Data mining involves the application of techniques for distilling data
into information or facts implied by the data. KDD is the higher-level process of
obtaining facts through data mining and distilling this information into knowledge,
that is, ideas and beliefs about the mini-world described by the data. This generally
requires human-level intelligence to guide the process and interpret the results
based on pre-existing knowledge (Miller and Han 2001).
Geographic knowledge discovery (GKD)
Geographic knowledge discovery (GKD) is the process of extracting information
and knowledge from massive georeferenced databases (Shekhar et al. 2003).
Why GKD always deals with huge, complex data:
- Because of the nature of geographic space and the complexity of spatial objects and the relationships between them.
- Because of the heterogeneous and sometimes ill-structured nature of georeferenced data, and the nature of geographic knowledge (SMB'05).
Geographic Information Systems (GIS)
A geographic information system is a system designed to capture, store,
manipulate, analyze, manage, and present all types of geographical data (Sun 2010).
Exploring geo-spaces, or levels for representing geographic data, is a form of
data preprocessing that could substantially enhance the GKD process
(Miller and Wentz 2003).
*GKD: Geographic knowledge discovery
Geographic Research
Mapping and searching are the most important spatial analyses in geographic data
mining.
For example, in Google Maps research:
- Moving from displaying accurate and comprehensive images, which supercomputers
manage, build, and make accessible as 3D imagery, to ...
The result:
- Images are created by storing and analysing each pixel of the model to generate
a textured 3D mesh for each building or item on the map.
- A picture can be used to generate 3D "overfly" pictures by analysing or mining
each pixel and colour of the original picture.
Geographic data mining (GDM) applications
- MI: Map interpretation (interpreting terrain features on a topographic map)
- RSI: Remote sensing interpretation (interpreting and measuring objects on aerial photographs)
- EMS: Environmental mapping (soil type, water borders, oil level)
- ESTPC: Extracting spatiotemporal patterns for cyclones and crimes
- SPIS: Spatial interaction (movement or flow of people, capital, and goods)
Geographic Data Warehousing (GDW)
A GDW is a database used for reporting and analysis.
A geographic data warehouse uses staging, integration, and
access layers to house its key functions.
A spatial data cube is the GDW analog of the data cube, the OLAP
(Online Analytical Processing) tool for computing and storing all
possible aggregations of some measure.
The spatial data cube must include standard attribute summaries
as well as pointers to spatial objects at varying levels of
aggregation.
Aggregating spatial objects is nontrivial and often requires
background domain knowledge in the form of a geographic
concept hierarchy. Strategies for pre-computing measures in the
spatial data cube include no pre-computation, pre-computing
rough approximations, and selective pre-computation
(Han, Stefanovic and Koperski 2000).
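The spatial data cube idea can be sketched in a few lines of Python. This is a minimal illustration and not the method of Han, Stefanovic and Koperski: the fact table, the county-to-state concept hierarchy, and the names `FACTS`, `COUNTY_TO_STATE` and `build_spatial_cube` are all hypothetical. Each cell stores a standard attribute summary (a population total) together with "pointers" to the spatial objects it aggregates (here simply county names).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (county, year, land_use, population in thousands)
FACTS = [
    ("Salt Lake", 2000, "urban", 900),
    ("Salt Lake", 2001, "urban", 950),
    ("Utah", 2000, "rural", 370),
    ("Clark", 2000, "urban", 1400),
]

# Hypothetical geographic concept hierarchy: county -> state
COUNTY_TO_STATE = {"Salt Lake": "UT", "Utah": "UT", "Clark": "NV"}

DIMENSIONS = ("state", "year", "land_use")


def build_spatial_cube(facts):
    """Compute every group-by over DIMENSIONS (2**3 = 8 cuboids).

    Returns {dims: {key: {"total": summed measure,
                          "regions": set of contributing counties}}}.
    """
    cube = {}
    for r in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, r):
            cells = defaultdict(lambda: {"total": 0, "regions": set()})
            for county, year, land_use, population in facts:
                row = {
                    "state": COUNTY_TO_STATE[county],  # roll county up to state
                    "year": year,
                    "land_use": land_use,
                }
                key = tuple(row[d] for d in dims)
                cells[key]["total"] += population      # attribute summary
                cells[key]["regions"].add(county)      # pointer to spatial objects
            cube[dims] = dict(cells)
    return cube


cube = build_spatial_cube(FACTS)
utah_cell = cube[("state",)][("UT",)]
```

This sketch pre-computes everything, which is only practical for tiny inputs. The strategies listed above (no pre-computation, rough approximations, selective pre-computation) exist because materialising every cell, and especially every merged geometry, becomes expensive at scale.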
Why spatial data warehousing (SDW)?
1- The nature of geographic space => various distances and spatial relationships
- Spatial objects are embedded in a multidimensional space, with implicit
distance, directional, and topological relationships.
- Many physical and human geographic processes exhibit spatial properties
(e.g. migration, disease propagation, travel times in congested urban areas).
2- The complexity of spatial objects and relationships
- Spatial dependency: everything is related to everything else, but near things
are more related than distant things (Tobler 1979).
- Spatial location: the size and shape of geographic units have nontrivial
influences on geographic processes and their measurement.
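Spatial dependency can be made concrete with a small calculation. The slides do not name a statistic; the sketch below uses Moran's I, a common measure of spatial autocorrelation, as an assumed illustration. The helper names `morans_i` and `chain_weights` and the two toy value series are hypothetical. Neighbouring units are taken to be adjacent positions in a row.

```python
def morans_i(values, weights):
    """Moran's I: (n / sum of weights) * (weighted cross-products / sum of squares)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def chain_weights(n):
    """Binary contiguity weights for n units in a row (adjacent units are neighbours)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


W = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]      # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]    # neighbours are always dissimilar

i_clustered = morans_i(clustered, W)      # positive: near things are alike
i_alternating = morans_i(alternating, W)  # negative: near things differ
```

A positive value indicates that nearby units tend to carry similar values, which is the pattern the dependency statement above describes. A value near zero would indicate no spatial structure.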
Differences between SDW & DW
The main functional difference between an SDW and a standard data
warehouse (DW) is the capability for visualization and spatial aggregation.
That is, the question of how to manipulate pictures (maps and images).
The differences:
* Conventional OLAP methods such as the data cube generate summary crosstabs
in tables.
** Spatial data requires capabilities for data summaries in cartographic form.
* Conventional OLAP tools have clear standards for aggregation and
cross-tabulation, namely, the one-dimensional attributes associated with each
data object.
** Spatial aggregation is more complex, and standards for aggregation operators
on geometric data types have not yet emerged (Shekhar and Chawla 2003).
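The contrast between the two kinds of aggregation can be sketched as follows. This is an illustrative assumption, not a standard operator: the `Region` type, the planar bounding boxes and the choice of a bounding-box merge as the spatial aggregate are all hypothetical. Summing a one-dimensional attribute has one obvious answer, while the geometric aggregate here is only one of several possible choices and is a rough approximation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    sales: float
    bbox: tuple  # (min_x, min_y, max_x, max_y) in hypothetical planar units


def aggregate_attribute(regions):
    """Conventional OLAP-style aggregation of a one-dimensional attribute."""
    return sum(r.sales for r in regions)


def aggregate_bbox(regions):
    """One possible spatial aggregate: the box enclosing all region boxes."""
    return (
        min(r.bbox[0] for r in regions),
        min(r.bbox[1] for r in regions),
        max(r.bbox[2] for r in regions),
        max(r.bbox[3] for r in regions),
    )


def bbox_area(bbox):
    min_x, min_y, max_x, max_y = bbox
    return (max_x - min_x) * (max_y - min_y)


regions = [
    Region("A", 10.0, (0, 0, 2, 2)),
    Region("B", 5.0, (3, 1, 5, 4)),
]

total_sales = aggregate_attribute(regions)
merged = aggregate_bbox(regions)
# The merged box covers more area than the two regions together, so it
# includes space that belongs to neither region.
overshoot = bbox_area(merged) - sum(bbox_area(r.bbox) for r in regions)
```

Other choices (exact polygon union, convex hull, centroid) give different results and costs, which is one way to read the remark that standards for aggregation operators on geometric types have not yet emerged.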
Discovering Co-location
Patterns from Spatial
F) - Geovisualization
Geographic visualization (GVis), the integration of scientific visualization
with traditional cartography, is highly complementary to the GKD process and
can be exploited at all stages, including data pre-processing, data mining and
knowledge construction (Gahegan et al. 2001).
The problem is how to preserve the richness of a high-dimensional information
space when restricted to the low-dimensional information spaces that the user
can easily relate to geographic space (two or three spatial dimensions, and
possibly time through animation).
For example, dense and complex symbols and colors within low-dimensional
spaces can create visual interaction effects that are poorly understood and
can confound knowledge discovery and communication (Gahegan 1999).
Georeferenced multimedia
A geo-multimedia database system stores and manages large collections of
georeferenced multimedia objects such as audio, image, and video, together with
their most basic features and structures. This includes metadata about where
and when the media was collected, or the locations and times described by the
media (Han and Kamber 2001).
Structure of Geographic Information
A georeference should be unambiguous, unique and shared (agreed).
- Metric (measures of distance from fixed places)
  - Latitude, longitude (distance from the Equator or from the Greenwich Meridian)
  - XY map coordinates
- Ordinal
  - Street addresses, which in most parts of the world order houses along streets
- Nominal (placenames)
  - Administrative units, municipalities, regions, cities
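The three kinds of georeference can be sketched as simple Python types. The class names, the sample addresses and the helper `haversine_km` are hypothetical, and the distance function assumes a spherical Earth with a mean radius of 6371 km, which is an approximation. Only metric references support distance calculations; ordinal references support ordering, and nominal references only support naming.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRef:
    lat: float  # degrees north of the Equator
    lon: float  # degrees east of the Greenwich Meridian


@dataclass(frozen=True)
class OrdinalRef:
    street: str
    house_number: int  # orders houses along the street


@dataclass(frozen=True)
class NominalRef:
    placename: str  # e.g. a municipality or region name


EARTH_RADIUS_KM = 6371.0  # assumed mean radius (spherical approximation)


def haversine_km(a, b):
    """Great-circle distance between two metric georeferences, in km."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


origin = MetricRef(0.0, 0.0)          # Equator meets the Greenwich Meridian
quarter_turn = MetricRef(0.0, 90.0)   # a quarter of the way around the Equator
distance = haversine_km(origin, quarter_turn)

# Ordinal references can be sorted, but the gap between numbers is not a distance.
houses = sorted(
    [OrdinalRef("Main St", 12), OrdinalRef("Main St", 4)],
    key=lambda ref: ref.house_number,
)
```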
Structure of Geographic Information
- Control points: points which provide the connection to the national geodetic coordinate system
- Land cover: buildings, roads, hydrology
- Single objects: walls, masts, bridges, etc.
- Heights: digital terrain model
- Local names: place names, locality names
- Ownership: land parcels
- Pipelines: oil and gas
- Territorial boundaries: municipal, district
- Landslips: ground movement
- Building addresses: geographical locations (road or street name, house number, postal code, locality name)
Structure of Geographic Information
Geographical object models
Geographic Information Systems (GIS)
A data warehouse + specific geographic operations
References (1)
1- Barnett, V. and Lewis, T. (1994) Outliers in Statistical Data, John Wiley.
2- Bédard, Y., Merrett, T. and Han, J. (2001) Fundamentals of spatial data
warehousing for geographic knowledge discovery, in H. J. Miller and J. Han
(eds.) Geographic Data Mining and Knowledge Discovery, London: Taylor and
Francis, 53-73
3- Câmara, A. S. and Raper, J. (1999) (eds.) Spatial Multimedia and Virtual
Reality, London: Taylor and Francis.
4- Ester, M., Kriegel, H.-P. and Sander, J. (1997) "Spatial data mining: A
database approach," in M. Scholl and A. Voisard (eds.) Advances in Spatial
Databases, Lecture Notes in Computer Science 1262, Berlin: Springer, 47-66.
5- Ester, M., Kriegel, H.-P. and Sander, J. (2001) Algorithms and applications
for spatial data mining, in H. J. Miller and J. Han (eds.) Geographic Data
Mining and Knowledge Discovery, London: Taylor and Francis, 160-187.
6- Geographic Data Mining and Knowledge Discovery, in J. P. Wilson and
A. S. Fotheringham (eds.) Handbook of Geographic Information Science, in press.
References (2)
7- Fayyad, U. M., Piatetsky-Shapiro, G. and Smyth, P. (1996) From data
mining to knowledge discovery: An overview, in U. M. Fayyad, G. Piatetsky-Shapiro,
P. Smyth and R. Uthurusamy (eds.) Advances in Knowledge Discovery and
Data Mining, Cambridge, MA: MIT Press, 1-34.
9- Fotheringham, A. S. and Wong, D. W. S. (1991) "The modifiable areal unit
problem in multivariate statistical analysis," Environment and Planning A, 23,
1025-1044.
10- Frank, A. (2001) Socio-economic units: Their life and motion, in A. Frank,
J. Raper and J.-P. Cheylan (eds.) Life and Motion of Socio-economic Units,
London: Taylor and Francis, GISDATA 8, 21-34.
11- Gahegan, M. (1999) Four barriers to the development of effective
exploratory visualisation tools for the geosciences, International Journal of
Geographical Information Science 13 289-309
References (3)
12- Gahegan, M. (2000) "On the application of inductive machine learning
tools to geographical analysis," Geographical Analysis, 32, 113-139
13- Gahegan, M., Wachowicz, M., Harrower, M. and Rhyne, T.-M. (2001) The
integration of geographic visualization with knowledge discovery in databases
and geocomputation, Cartography and Geographic Information Systems, 28,
29-44.
14- Gopal, S., Liu, W. and Woodcock, X. (2001) Visualization based on fuzzy
ARTMAP neural network for mining remotely sensed data, in H. J. Miller and
J. Han (eds.) Geographic Data Mining and Knowledge Discovery, London: Taylor and
Francis, 315-336.
16- Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao,
M., Pellow, F. and Pirahesh, H. (1997) "Data cube: A relational aggregation
operator generalizing group-by, cross-tab and sub-totals," Data Mining and
Knowledge Discovery, 1, 29-53.
17- Han, J. and Kamber, M. (2003) Data Mining: Concepts and Techniques,
San Francisco: Morgan Kaufmann.
References (4)
18- Han, J., Kamber, M. and Tung, A. K. H. (2001) Spatial clustering methods
in data mining: A survey, in H. J. Miller and J. Han (eds.) Geographic Data
Mining and Knowledge Discovery, London: Taylor and Francis, 188-217.
19- Healy, R., Dowers, S., Gittings, B. and Mineter, M. (1998) (eds.) Parallel
Processing Algorithms for GIS, London: Taylor and Francis.
20- Hipp, J., Güntzer, U. and Nakhaeizadeh, G. (2000) "Algorithms for
association rule mining: A general survey and comparison," SIGKDD
Explorations, 2, 58-64.
21- Huang, Y., Shekhar, S. and Xiong, H. (2000) Discovering co-location
patterns from spatial datasets: A general approach, Draft manuscript available
at http://www.cs.umn.edu/research/shashi-group/
22- Keim, D. A. and Kriegel, H.-P. (1994) "Using visualization to support data
mining of large existing databases," in J. P. Lee and G. G. Grinstein (eds.)
Database Issues for Data Visualization, Lecture Notes in Computer Science
871, 210-229.
References (5)
23- Koperski, K., Han, J. and Stefanovic, N. (1998) An efficient two-step
method for classification of spatial data, Proceedings of the Spatial Data
Handling Conference, Vancouver, Canada.
24- Mesrobian, E., Muntz, R., Santos, J. R., Shek, E., Mechoso, C. R., Farrara, J.
D. and Stolorz, P. (1994) "Extracting spatio-temporal patterns from geoscience
datasets," IEEE Workshop on Visualization and Machine Vision, Los
Alamitos, CA: IEEE Computer Society Press, 92-103.
25- Mesrobian, E., Muntz, R., Shek, E., Nittel, S., La Rouche, M., Kriguer, M.,
Mechoso, C., Farrara, J., Stolorz, P. and Nakamura, H. (1996) "Mining
geophysical data for knowledge," IEEE Expert, 11(5), 34-44.
26- Miller, H. J. and Han, J. (2001) Geographic data mining and knowledge
discovery: An overview, in H. J. Miller and J. Han (eds.) Geographic Data
Mining and Knowledge Discovery, London: Taylor and Francis, 3-32.
27- Miller, H. J. and Wentz, E. A. (2003) Representation and spatial analysis
in geographic information systems, Annals of the Association of American
Geographers, 93, 574-594.
References (6)
28- Ng, R. (2001) Detecting outliers from large datasets, in H. J. Miller and J.
Han (eds.) Geographic Data Mining and Knowledge Discovery, London: Taylor
and Francis, 218-235.
29- Qi, F. and Zhu, A.-X. (2003) Knowledge discovery from soil maps using
inductive learning, International Journal of Geographical Information Science,
17, 771-795.
30- Roddick, J. F. and Lees, B. (2001) Paradigms for spatial and
spatio-temporal data mining, in H. J. Miller and J. Han (eds.) Geographic Data Mining
and Knowledge Discovery, London: Taylor and Francis, 33-49.
31- Shekhar, S. and Chawla, S. (2003) Spatial Databases: A Tour, Upper Saddle
River, N. J.: Prentice-Hall.
32- Shekhar, S., Lu, C. T., Tan, X., Chawla, S. and Vatsavai, R. R. (2001) Map
cube: A visualization tool for spatial data warehouses, in H. J. Miller and J.
Han (eds.) Geographic Data Mining and Knowledge Discovery, London: Taylor
and Francis, 74-109.
References (7)
33- Shekhar, S., Zhang, P., Huang, Y. and Vatsavai, R. (2003) Trends in spatial
data mining, in H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), Data
Mining: Next Generation: Challenges and Future Directions, AAAI/MIT Press,
in press.
Shekhar, S., Lu, C. T. and Zhang, P. (2003) A unified approach to
detecting spatial outliers, GeoInformatica, 7, 139-166.
34- Shekhar, S., Lu, C. T., Zhang, P. and Liu, R. (2002) Data mining for selective
visualization of large spatial datasets, Proceedings of 14th IEEE International
Conference on Tools with Artificial Intelligence (ICTAI '02), IEEE Press.
35- Smyth, C. S. (2001) Mining mobile trajectories, in H. J. Miller and J. Han
(eds.) Geographic Data Mining and Knowledge Discovery, London: Taylor and
Francis, 337-361.
36- Wang, S. and Armstrong, M.P. (2003) A quadtree approach to domain
decomposition for spatial interpolation in grid computing environments,
Parallel Computing, 29, 1481-1504.
37- Wolfson, O. (2002) Moving objects information management: The database
challenge, Proceedings, 5th Workshop on Next Generation Information
Technologies and Systems, Caesarea, Israel, June 25-26.
Books
- Geographic Data Mining and Knowledge Discovery
By Harvey Miller and Jiawei Han
- Knowledge Discovery in Spatial Data By Yee Leung
Questions?!
Thank you for listening