
A PROJECT REPORT ON CLUSTER ANALYSIS IN SAGA GIS

Submitted by Dinesh Kumar Baghel (Reg. No. 2011gi03) (Course code CE351) (Specialization: GIS & Remote Sensing) in partial fulfillment of the mini project for the semester, under the guidance of Dr. Varun Singh (Asst. Prof., CED Dept., MNNIT, Allahabad)

MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY & MANAGEMENT, ALLAHABAD, INDIA-211004

Table of Contents

S.No. Title
1. Introduction
2. Software Overview
3. Theory
4. Software flow chart
5. Conclusion
6. List of Deployed Real world application
7. References

Chapter 1

INTRODUCTION

Cluster analysis encompasses many diverse techniques for discovering structure within complex bodies of data. In a typical example, one has a sample of data units (subjects, persons, cases), each described by scores on selected variables (attributes, characteristics, measurements). The objective is to group either the data units or the variables into clusters such that elements within a cluster have a high degree of natural association among themselves, while the clusters are relatively distinct from one another. Searching the data for a structure of natural groupings is an important exploratory technique.

The most important techniques for data classification are (1) Cluster Analysis and (2) Discriminant Analysis. Although both cluster and discriminant analysis classify objects into categories, discriminant analysis requires one to know group membership for the cases used to derive the classification rule, whereas in cluster analysis group membership for all cases is unknown. In addition to membership, the number of groups is also generally unknown. Cluster analysis is the more primitive technique in that no assumptions are made concerning the number of groups or the group structure. Grouping is done on the basis of similarities or distances (dissimilarities). Thus the inputs to cluster analysis are similarity measures, or the data from which these can be computed. The definition of similarity or homogeneity varies from analysis to analysis and depends on the objective of the study.
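As a small illustration of grouping on the basis of distances, the sketch below computes pairwise Euclidean distances between data units. The choice of Euclidean distance and the four sample units are assumptions made for illustration only; the report does not prescribe a particular measure.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(units):
    """Symmetric matrix of pairwise distances between data units."""
    n = len(units)
    return [[euclidean(units[i], units[j]) for j in range(n)] for i in range(n)]


# Hypothetical data: four data units, each scored on two variables.
units = [(1.0, 2.0), (1.0, 3.0), (8.0, 8.0), (9.0, 8.0)]
dist = distance_matrix(units)

for row in dist:
    print(["%.2f" % d for d in row])
```

Units 0 and 1 are close to each other and far from units 2 and 3, so a distance-based method would tend to place them in separate clusters.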

Chapter 2

SOFTWARE OVERVIEW

2.1

What is SAGA?
SAGA stands for System for Automated Geoscientific Analyses. It is free and open source (FOSS) Geographic Information System software, designed for an easy and effective implementation of spatial algorithms through an easily approachable user interface with many visualization options. It runs under the Windows and Linux operating systems and is written in the C++ programming language. The program code is released under the GNU Lesser General Public License (SAGA API) and the GNU General Public License (SAGA GUI, CMD and most of the modules). With a deep knowledge of C++ you can even modify the SAGA API to create a tailored version that fulfills all of your requirements.

2.2

License Issues
SAGA is free open source software, which generally means that you have the freedom:
* to run the program, for any purpose,
* to study how the program works and to modify it,
* to redistribute copies,
* to improve the program and release the improvements to the public.
Except for the SAGA API, which uses the LGPL, most of the SAGA source code is licensed under the GPL.

2.3 A short history of the SAGA development


The idea of SAGA evolved in the late 1990s during work on several research projects with a focus on raster analysis of digital elevation models (DEMs) at the Dept. of Physical Geography, Göttingen. The core group responsible for the development of methods, namely J. Böhner, O. Conrad, R. Köthe and A. Ringeler, was very heterogeneous with regard to preferred operating systems, programming languages, development environments, data formats and so on, which led to the development of SAGA as a common platform.

2.4

Milestones
In 2001, SAGA development begins. In 2002-2003, it becomes a common tool of the team around J. Böhner. In February 2004, SAGA 1.0 is published as Open Source Software, and in July SAGA 2 development begins. In March 2005, SAGA 2 runs under Linux.

2.5

Software
SAGA is coded in the widespread and powerful C++ programming language and has an object-oriented system design. Due to the use of the cross-platform GUI library wxWidgets for user interface functionality, you can run SAGA on MS Windows as well as on Linux.

2.6

System Architecture

SAGA's system architecture is modular. Its foundation is its API, which provides data object models, basic definitions for the programming of scientific modules, and numerous helpful classes and functions. Module libraries are containers for the scientific methods, in the form of modules. The API and the module libraries are not independently running executables but Dynamic Link Libraries (DLLs), and they have to be accessed through a front-end program. The GUI is one of the two SAGA front ends. It allows the user to control the system and is responsible for module and data management as well as for data visualization. Alternatively, modules can be executed using the second front end, the SAGA command line interpreter.

SAGA GUI

SAGA Command Line Interpreter(CMD)

2.7

SAGA Limitations
SAGA is a hybrid GIS used for the analysis of raster and vector layers, but it was developed mainly to process raster data, so spatial analysis of vector data is not as efficient as for raster data.

Chapter 3

THEORY AND ANALYSIS FUNCTIONS

3.1

Theory
Image classification is the process of converting continuous reflectance values into categories representing land cover or surface condition. This process is commonly associated with satellite digital imagery: the multiple-band reflectance values for a specific small area of the earth's surface are analyzed in order to identify the land cover or condition that the area represents.

There are two general image classification approaches used to classify satellite digital imagery: supervised and unsupervised. Supervised classification starts with a set of training sites. The training sites represent ground truth; these are areas that have a known land cover or condition. The spectral return values for these cells, i.e., the digital values representing the reflectance for each spectral band of the sensor, provide the basic data for establishing each land cover class mean and variance. The procedure then evaluates the rest of the image cells to determine which of the known land cover classes each cell fits best.

The unsupervised classification procedure uses a cluster analysis algorithm to identify clusters of values that seem distinct based on the spectral reflectance values for each spectral band of the sensor. Mathematically, the clustering is performed using an iterative series of passes that maximize the within-cluster homogeneity and the between-cluster heterogeneity.
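The supervised procedure described above can be sketched roughly as follows. This is a simplified minimum-distance-to-mean classifier: it uses only the class means and ignores the class variances mentioned in the text. The class names, band values and function names are hypothetical.

```python
def class_means(training):
    """Mean band vector for each known land cover class.

    `training` maps a class name to a list of training cells,
    each cell being a tuple of band values.
    """
    means = {}
    for name, cells in training.items():
        n_bands = len(cells[0])
        means[name] = tuple(
            sum(cell[b] for cell in cells) / len(cells) for b in range(n_bands)
        )
    return means


def classify(cell, means):
    """Return the class whose mean is nearest to the cell (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(means, key=lambda name: sq_dist(cell, means[name]))


# Hypothetical training sites with two spectral bands per cell.
training = {
    "water": [(10.0, 5.0), (12.0, 7.0)],
    "forest": [(40.0, 60.0), (44.0, 64.0)],
}
means = class_means(training)

# Classify cells that were not part of the training sites.
print(classify((15.0, 10.0), means))
print(classify((38.0, 55.0), means))
```

A classifier that also used the class variances would weight each band by how tightly the training cells cluster around the class mean; that refinement is left out here to keep the sketch short.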

3.2

Unsupervised classification
The Imagery - Classification / Cluster Analysis for Grids module in SAGA supports the unsupervised process.

Unsupervised classification relies on the software to statistically analyze the multiple input bands (or multiple grid data layers when non-image files are used) without the benefit of knowing what the different spectral or numeric combinations represent. Using the SAGA Cluster Analysis for Grids module, the user specifies the number of classes to generate (the default is 10). The classification proceeds using statistical rules that are applied to the multi-band pixel values or multi-layer grid data values. When module execution finishes, the resulting classes must be interpreted and related to ground features or to the grid layer theme in order to assign class names.
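To give a feel for what such an unsupervised pass does, here is a minimal sketch of an iterative minimum-distance clustering in plain Python. It is not SAGA's implementation; the seeding rule (first k cells), the stopping rule and the sample cells are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two band vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(cells):
    """Component-wise mean of a non-empty list of band vectors."""
    n = len(cells)
    return tuple(sum(c[b] for c in cells) / n for b in range(len(cells[0])))


def cluster(cells, k, max_passes=100):
    """Assign each cell to one of k clusters by repeated minimum-distance passes."""
    # Assumption: the first k cells serve as the initial cluster centres.
    centres = [tuple(c) for c in cells[:k]]
    labels = [None] * len(cells)
    for _ in range(max_passes):
        # Assign every cell to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(c, centres[j])) for c in cells
        ]
        if new_labels == labels:
            break  # no cell changed cluster, so the passes have converged
        labels = new_labels
        # Move each centre to the mean of its current members.
        for j in range(k):
            members = [c for c, lab in zip(cells, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = mean(members)
    return labels, centres


# Hypothetical two-band cell values forming two obvious groups.
cells = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
labels, centres = cluster(cells, k=2)
print(labels)
print(centres)
```

The returned labels are just class numbers; as the paragraph above notes, relating them to ground features is still up to the analyst.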

There are various steps that you can follow to perform cluster analysis in SAGA. These steps are listed as follows:
(1) Load the grid data layer on which you want to perform cluster analysis.
(2) After loading, double-click the layer to show it in the workspace area and perform the necessary functions.
(3) Go to the Modules tab in the workspace window, or to Modules in the menu bar.
(4) Go to the Image Classification module category of the loaded module library.
(5) Click on the small triangle beneath it to expand it.
(6) From the drop-down list, double-click on Cluster Analysis for Grids.
(7) This opens the window named Cluster Analysis for Grids, shown above.
(8) The window shows the Grids, Tables and Options data sets, with their attributes on the left side and the associated value space on the right side.
8.1 The grid system parameter is used to choose the grid system when there are multiple grid systems.
8.2 The '>>grid' parameter is the set of grid data layers representing spectral reflectance in various bandwidth areas. You can choose one or several of them.
8.3 The '<<create' is the name of the output clustered grid data layer.
8.4 You can reorder the list of data layers, but the order of the selected layers does not affect how this module executes. After selecting, click on OK.
8.5 The default for the '<<clusters' label is [create]. You can replace it if you want to overwrite the content of an existing grid data layer.
The tabular output of the module is a statistical table providing descriptive statistics for each cluster or class created by the module.
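The kind of per-cluster descriptive statistics such a table holds can be sketched as below. The columns chosen here (count, mean, standard deviation) and the sample values are assumptions; the exact columns of the SAGA output table are not listed in this report.

```python
import statistics
from collections import defaultdict


def cluster_statistics(values, labels):
    """Per-cluster count, mean and standard deviation for one grid layer."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    table = []
    for label in sorted(groups):
        members = groups[label]
        table.append({
            "cluster": label,
            "count": len(members),
            "mean": statistics.mean(members),
            "stddev": statistics.pstdev(members),  # population standard deviation
        })
    return table


# Hypothetical cell values of one layer and the cluster assigned to each cell.
values = [2.0, 4.0, 10.0, 12.0, 14.0]
labels = [0, 0, 1, 1, 1]
table = cluster_statistics(values, labels)

for row in table:
    print(row)
```

With several input layers, the same summary would be repeated once per layer for every cluster.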

Statistics output from the Cluster Analysis for Grids module

(9) There are four parameters listed in the 'Options' section of the settings: 'Method', 'Clusters', 'Normalise' and 'Update View'.
9.1 Method
There are three procedures for the cluster analysis:
(a) Iterative Minimum Distance (Forgy 1965)
(b) Hill-Climbing (Rubin 1967)
(c) Combined Minimum Distance/Hill-Climbing
The default method is Hill-Climbing.
9.2 Clusters
Its value indicates the number of classes you want the module to generate; the default is 10. The appropriate value depends on the characteristics of the features and on the environment in which you are applying the module.
9.3 Normalise
The 'Normalise' and 'Update View' parameters are controlled by check boxes. When 'Normalise' is checked, the data are automatically normalized using their standard deviations before the clustering process begins, but this increases the execution time of the module.

The 'Update View' option opens a map window in the work area that displays the output grid data layer and is refreshed as the module execution progresses through its passes.
(10) Object Properties Window
10.1 Color Coding Scheme
The following figure shows the color coding scheme that is used for cluster analysis, with a description in front of each color; you can also specify your own descriptions.

10.2 Description
It shows the various properties of the data layer selected in the Data tab of the workspace window, such as Name, Description, Projection, Cells, Memory etc.

10.3 History
For the output layer, it shows the module used, the method and the various parameter values from the options part of the module execution, the input data and the location used. The other tab available is Attributes, which shows values that depend on the analysis used.
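The effect of the 'Normalise' option from step 9.3 can be pictured with the sketch below, which rescales each input layer by its standard deviation before clustering. Subtracting the mean as well is an assumption made here; the report only states that the standard deviation is used. The layer values are hypothetical.

```python
import statistics


def normalise_layer(values):
    """Rescale one grid layer to zero mean and unit standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # A constant layer carries no information for clustering.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical layers measured on very different numeric scales.
band_a = [0.0, 10.0, 20.0]
band_b = [100.0, 200.0, 300.0]

norm_a = normalise_layer(band_a)
norm_b = normalise_layer(band_b)
print(norm_a)
print(norm_b)
```

After rescaling, both layers span the same range, so neither dominates the distance calculation merely because of its units.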

Chapter 4

INPUT/OUTPUT
The flow chart for cluster analysis in SAGA is shown below:

START (START SAGA GUI)

INPUT THE GRID LAYER ON WHICH YOU WANT TO PROCESS CLUSTER ANALYSIS

SET THE PARAMETERS IN THE CLUSTER ANALYSIS FOR GRIDS MODULE OF THE IMAGE CLASSIFICATION CATEGORY OF MODULE LIBRARIES

NOW CLICK OKAY TO PROCESS THE INPUT GRID DATA LAYER USING THE SPECIFIED METHOD

OUTPUT GENERATED?

No

Yes DISPLAY OUTPUT

END

INPUT GRID IMAGE:-

OUTPUT PART-1 OUTPUT CLUSTERED IMAGE WITH NUMBER OF CLUSTERS=10 AND THE LEGEND OF COLORS SHOWN:

OUTPUT PART-2 THIS PART IS SHOWING THE OUTPUT TABLE GENERATED.

Chapter 5

CONCLUSION

This report shows how to use SAGA GIS modules to obtain the clusters of a grid image. The results show the clusters of the catchment area with the number of clusters set to 10, which can be changed depending on the number of features present. There is some confusion in interpreting the colors, but this can be resolved by choosing an appropriate color scheme, as I have done in my project. The algorithm that I have used is hill-climbing, which is best suited for finding clusters of nearby pixels. Finally, the results achieved with the set parameters show that SAGA GIS is well suited for cluster analysis.

Chapter 6

LIST OF DEPLOYED REAL WORLD APPLICATION

ETC/ACC (2009) has applied the concept in Europe's offshore and onshore wind energy potential.
Tony H. Grubesic has applied the concept in Detecting Hot Spot Using Cluster Analysis and GIS.
Ece Aksoy, Turkey (2006) has applied the concept in Clustering With GIS: An Attempt To Classify Turkish District Data.
ACSM/ALTA (2005), Minimum Standard Detail Requirements for ALTA/ACSM Land Title Surveys.

Chapter 7

REFERENCES

CLUSTER ANALYSIS, AMRENDER KUMAR, I.A.S.R.I, LIBRARY AVENUE
SAGA USER GUIDE, VOL. 1, CIMMERY, VERSION 2.0.5.20100823
SAGA USER GUIDE, VOL. 2, CIMMERY, VERSION 2.0.5.20100823
ALDENDERFER, M. & R. BLASHFIELD, 1984, CLUSTER ANALYSIS, BEVERLY HILLS, SAGE PUBLICATIONS
WWW.SAGA-GIS.ORG
WWW.SOURCEFORGE.NET
