A Visual Framework for Spatial Data Mining

https://doi.org/10.32913/MIC-ICT-RESEARCH.V3.N12.319

Abstract

The unique properties of spatial data provide challenges and opportunities for researching new methods in spatial data mining. In this article, we propose an interoperable framework that integrates a Geographic Information System (GIS) with the spatial data mining process to facilitate spatial data preparation, to extract spatial relationships that can take advantage of traditional data mining toolkits such as Weka, and to reveal significant spatial patterns. With this approach, it is straightforward to adopt spatial access methods and spatial query processing algorithms for efficient data mining. Moreover, our framework visually supports the complete spatial data mining process.

Research, Development on Information and Communications Technology, Volume E-3 No. 8 (12)

Nguyen Vinh Nam (1), Le Hoai Bac (2)
(1) Vietnam Informatics and Mapping Corporation, Ho Chi Minh City, Vietnam
(2) Faculty of Information Technology, University of Science, VNU - HCMC, Vietnam
Email: [email protected], [email protected]

Keywords - Spatial data mining, Visual data mining, GIS

1. INTRODUCTION

With advanced data collection techniques such as remote sensing, census data acquisition, and weather and climate monitoring, contemporary geographic datasets contain an enormous amount of data of various types and attributes. These data are stored in Relational Database Management Systems (RDBMS) (e.g. Oracle Spatial, IBM DB2 Spatial Extender, MySQL 5, SQL Server 2008 and later, PostGIS), or in specific formats of commercial GIS applications (e.g. MapInfo, ArcGIS, Intergraph). Spatial Data Mining (SDM) can be an appropriate technique for detecting potentially interesting patterns in geographic datasets. SDM is a knowledge discovery process of extracting implicit interesting knowledge, spatial relations, or other patterns not explicitly stored in databases [2, 3]. The ever-increasing and diverse nature of spatial data has created new challenges for data mining: interaction with spatial databases, spatial data processing, and visualization and modeling of discovered knowledge.

Data mining has been widely treated as a synonym of Knowledge Discovery in Databases (KDD). The KDD process consists of an iterative sequence of five major steps [4]: selection, preprocessing, transformation, data mining, and evaluation/interpretation. Data are prepared for mining in steps 1 through 3. For non-spatial databases, data preparation requires between 60 and 80 percent of the time and effort of the whole KDD process [5]. For spatial databases, this problem increases significantly because of the unique properties of spatial data such as spatial dependency, spatial heterogeneity, and data type variety.

A Geographic Information System (GIS) has a long history of being used as a tool to capture, store, check, integrate, manipulate, analyze and visualize spatial attribute and statistical data [1]. The availability of features such as spatial and non-spatial data query and selection, topology (defining and enforcing data integrity rules), classification, map overlays, network analysis, and thematic map creation makes GIS a useful tool for spatial data mining. Visualization by GIS gives the user the ability to spot spatial errors that are often missed when analyzing raw data, and aids visual analysis and detection of spatial distributions and their patterns. Visualization is a powerful strategy for integrating high-level human intelligence and knowledge into the KDD process. Therefore, the integration of GIS and data mining techniques promises a solution to many challenges in spatial data mining.

In this paper, we propose a framework that integrates GIS into the steps of the data mining process to improve the efficiency of knowledge discovery in spatial databases.

The rest of the paper is organized as follows. Section 2 describes the main modules of our framework architecture. Section 3 presents the experimental results. Finally, we discuss conclusions and future work in Section 4.

2. FRAMEWORK ARCHITECTURE

Fig. 1. The framework architecture

The framework is designed to integrate GIS into the spatial data mining process with an open, highly extensible architecture (e.g. a plugin model). It has five major modules, shown in Fig. 1.

2.1. Spatial Database Connectivity

A huge amount of spatial data is stored in various formats. These formats may be closed or open, normalized or not. Consequently, preparation and/or analysis may be difficult or time-consuming due to the heterogeneity of formats. To avoid these drawbacks and to facilitate access to and use of these data, we use the FDO (Feature Data Object) Data Access Technique as a major module in our framework for storing, retrieving, updating, and analyzing GIS data. FDO uses a provider-based model for supporting a variety of geospatial data sources, where each provider typically supports a particular data format or data store [6]. FDO Geometry is based on the OpenGIS Simple Features Implementation Specification for SQL [7], which is implemented by most spatial databases. The FDO XML format for schema is based on the Open GIS Consortium Geography Markup Language [8]. FDO is free, open source software. With this open-standard approach, we can manipulate most available spatial databases and can expand as needed. In its current release, FDO works with popular geospatial data formats such as SDF, SHP, ArcSDE, WFS, WMS, ODBC, MySQL, GDAL, OGR, SQL Server Spatial, and SQLite Spatial.

2.2. GIS

Spatial data mining deals with spatial data that includes numbers, categories, and extended objects such as lines, points and polygons. Moreover, it works with implicit spatial predicates (e.g. touch, contain) and high autocorrelation among nearby features. Embedding GIS in spatial data mining therefore helps facilitate spatial data preparation, spatial data visualization and interesting pattern discovery by creating thematic maps (see Fig. 2). GeoMiner, a spatial data mining system prototype, is one such system; it is implemented on top of the MapInfo Professional 4.1 commercial GIS software [9]. However, it is no longer available for practical reasons.

In our framework, the GIS module is DotSpatial, an open source geographic information system library that is used to incorporate spatial data analysis and mapping into the spatial data mining process [10]. This GIS library supports a plugin architecture, so we can develop GIS extensions as well as specialized spatial data mining algorithms.

Fig. 2. GIS visualizes spatial data and interesting patterns by thematic map

Each feature class (layer) consists of a spatial attribute (geometry column) and one or more non-spatial attributes. The latter can be properties describing the spatial object, results of spatial predicates, or interesting patterns found. They are stored in table format and are linked with spatial records by a feature identifier (Fid), as shown in Fig. 3. These attributes are then used as controlling parameters to create thematic maps for data analysis.

Fig. 3. Linking spatial data with non-spatial data

In our framework, GIS is used in the three stages of the spatial data mining process:

a. Data Preparation
• Spatial data preprocessing
  o Data selection
  o Removal of noise, outliers and dependences
  o Data transformation: convert into the right format for the spatial data mining process
  o Data projection
  o Data aggregation (e.g. buffering)
• Spatial relationships extraction (e.g. touch, nearby, overlap, contain)

b. Data Mining
• Use GIS functionalities for an efficient implementation of spatial data mining algorithms (e.g. spatial index, spatial feature extraction)

c. Data & Patterns Analysis
• Use visual techniques for data analysis and pattern visualization:
  o Map based techniques
  o Chart based techniques
  o Projection techniques
  o Pixel techniques
  o Iconographic techniques
  o Network methods

2.3. Spatial Relationships Extractor

Spatial relationships are usually not explicitly stored in spatial data repositories, so they must be computed and materialized with spatial operations. Spatial relationship extraction poses the challenge of revealing meaningful information about spatial objects, with a particular interest in their relationships. There are three basic types of spatial relations:

a. Topological relations are invariant under topological transformations, i.e. they are preserved if both objects are translated, scaled or rotated simultaneously. The topological relations are derived from the nine-intersection model of the interiors (denoted by $A^{\circ}$), the boundaries (denoted by $\partial A$) and the complements (denoted by $A^{-}$) of two objects. In particular, given two geometries A and B, the nine possible intersections defining the relation between these geometries are represented by the 9-intersection matrix:

$$
\begin{pmatrix}
A^{\circ} \cap B^{\circ} & A^{\circ} \cap \partial B & A^{\circ} \cap B^{-} \\
\partial A \cap B^{\circ} & \partial A \cap \partial B & \partial A \cap B^{-} \\
A^{-} \cap B^{\circ} & A^{-} \cap \partial B & A^{-} \cap B^{-}
\end{pmatrix}
$$

These relationships can be classified into Equals, Contains, Within, Crosses, Disjoint, Overlaps and Touches. In general, the spatial operations for relationship extraction are computationally expensive, so they should be done prior to data mining. Fortunately, these relations can be extracted into the attribute table by spatial queries supported by both the GIS module and the SDBMS. Topological relations are often used as spatial predicates in spatial association rules.

b. Distance relations are based on the distance between two spatial objects. The distance function depends on the spatial data type used (e.g. geography, geometry).

c. Direction relations deal with where spatial objects are located in space relative to one another. In the literature, each spatial object is assigned a representative point, and these points are then used to determine the direction relation between two objects [11]; an example is the relation "B northeast A". With this approach, the direction relation depends on the choice of representative point for each spatial object, so it is often not unique.

We propose a new method for calculating the exact direction relation between any two shaped objects. Our algorithm is as follows:
• Find the two external tangents of the two geometries
• Find the intersection of these tangents
• Construct the bisector of the angle between these tangents
• Determine the direction for each geometry object corresponding to its position along this bisector.

Fig. 4 shows some pairs of geometries and the constructed bisector for each case: point and point, point and polygon, point and line, line and line, line and polygon, and polygon and polygon.

Fig. 4. Calculating direction relations

2.4. Data Conversion

This module converts non-spatial data stored in attribute tables to the input format of available traditional data mining toolkits (e.g. Weka) and vice versa.

2.5. Data Mining

In our framework, spatial data mining can be performed in two ways:

• Extract and materialize spatial relationships as non-spatial data, then apply traditional data mining toolkits. We construct models to store interesting spatial relationships; these relationships are then extracted and converted to the input format (e.g. table) of traditional data mining toolkits. Currently, we support building some spatial relationship models, such as VPF (Vector Product Format) to store adjacency relationships for polygon objects, and TIN (Triangulated Irregular Network) and CDT (Constrained Delaunay Triangulation) to store neighbor relationships for point objects. This approach allows reusing standard data mining algorithms on the extracted features. Researchers may also create new models as needed using our plugin architecture.

• Develop specialized spatial data mining algorithms that can be applied directly to spatial and non-spatial data. This approach can dynamically exploit the spatial relationships during the discovery process and use spatial knowledge flexibly. These algorithms can be developed directly in our GIS environment.

3. EXPERIMENTS

In this section, we consider some examples of spatial data mining using our framework. The input and output data of specialized spatial data mining algorithms are stored as vector layers or as attribute tables linked to given vector layers, so they can be visualized to analyse the revealed patterns.

3.1. Analysis with Touch relation (Topological relation)

In this example, the user works with a layer of province boundaries of Vietnam and wants to create thematic maps of adjacent provinces. We implemented an algorithm [14] as a plugin to extract adjacency relations and store them in the LeftFace and RightFace fields of the VPF model (see Fig. 5).

Fig. 5. Adjacent relationships among provinces are stored in a table

The user then uses the thematic map engine to display each province with a color corresponding to the number of provinces touching it (see Fig. 6). Fig. 7 (left) shows provinces that have more than three adjacent provinces. Fig. 7 (right) shows provinces that have more than five adjacent provinces.

Fig. 6. Thematic map of adjacent provinces

Fig. 7. Provinces adjacent to more than 3 (left) and 5 (right) provinces

3.2. Analysis with Clustering

We developed some spatial clustering algorithms that can directly access spatial data [12, 13]. These algorithms use spatial indexing techniques such as R-tree and Quad-tree to make clustering more efficient. In this example, the input data is a bitmap of the circuit diagram of a computer mainboard; our clustering algorithms were used to detect defects. In Fig. 8, clusters are colorized using thematic map tools.

Fig. 8. Clusters found by our algorithm

To analyse defects, the user defines a defect threshold to distinguish real circuits from defects in the mainboard, then uses the thematic map tool to highlight defects, as shown in Fig. 9 and Fig. 10.

Fig. 9. Defects corresponding to a threshold of 200 pixels

Fig. 10. Defects corresponding to a threshold of 50 pixels

4. CONCLUSION AND FUTURE WORK

This paper proposes a framework for integrating GIS with the spatial data mining process. Our approach offers the advantages of:
• Visualizing spatial data mining steps
• Visualizing interesting patterns with a thematic map toolkit
• Dynamically exploiting the spatial features in the discovery process
• Reusing traditional data mining algorithms
• Extracting and analyzing interesting patterns from large spatial databases with our visual framework.

In the future, we will study how to integrate available methods for pattern visualization in spatial data mining and integrate workflow into our framework to automate the discovery process.

REFERENCES

[1] Shahab Fazal, GIS Basics, New Age, 2008.
[2] Koperski K., Adhikary J., Han J., "Knowledge discovery in spatial databases: Progress and Challenges", In Proceedings of the SIGMOD workshop on research issues in data mining and knowledge discovery, Technical report 96-08, University of British Columbia, Vancouver, Canada, 1996.
[3] Koperski K., Han J., "Discovery of Spatial Association Rules in Geographic Information Databases", Proc. 4th Int. Symp. on Large Spatial Databases, 47-66. Berlin: Springer, 1995.
[4] Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, Elsevier, 2012.
[5] Adriaans P., Zantinge D., Data Mining, Addison Wesley Longman, Harlow, England, 1996.
[6] http://fdo.osgeo.org/
[7] http://portal.opengeospatial.org/files/?artifact_id=829
[8] http://portal.opengeospatial.org/files/?artifact_id=1034
[9] Han J., Koperski K., Stefanovic N., "GeoMiner: a system prototype for spatial data mining", In Proceedings of the ACM-SIGMOD international conference on Management of Data (SIGMOD'97). ACM Press, Tucson, AZ, 553-556, 1997.
[10] http://dotspatial.codeplex.com
[11] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Algorithms and Applications for Spatial Data Mining, Research Monographs in GIS, Taylor and Francis, 2001.
[12] Nam Nguyen Vinh, Bac Le, "Incremental Spatial Clustering in Data Mining using Genetic Algorithm and R-tree", SEAL 2012, LNCS 7673, pp. 270-279, 2012.
[13] Nam Nguyen Vinh, Bac Le, "Simple Spatial Clustering Algorithm based on R-tree", MIWAI 2012.
[14] Nam Nguyen Vinh, Bac Le, "Constructing and modeling parcel boundaries from a set of lines for querying adjacent spatial relationships", ICCSA 2013.

AUTHORS' BIOGRAPHIES

Nguyen Vinh Nam received a Master of Computer Science degree from the University of Natural Sciences in 2005. He now works at Vietnam Informatics and Mapping Corporation (vietbando). His research interests are GIS and Spatial Data Mining.

Le Hoai Bac received a Doctor of Computer Science degree from the University of Natural Sciences in 1999. He now works at the Faculty of Information Technology, University of Science, VNU - HCMC, Vietnam. His research interests are Artificial Intelligence, Soft Computing, Data Mining and Knowledge Discovery.
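As a supplementary illustration of the adjacency analysis in Section 3.1, the following Python sketch shows one way the number of touching provinces could be derived from an edge table with LeftFace and RightFace fields, and how provinces above a threshold could be selected for a thematic map. It is a minimal sketch under stated assumptions, not the plugin described in [14]: the edge table layout as a list of dictionaries, the function names `adjacency`, `touch_counts` and `more_than`, and the identifiers P1 to P5 are all hypothetical and do not come from the paper or its data.

```python
from collections import defaultdict

# Hypothetical edge table: each boundary edge records the face (province)
# on its left and right side, mirroring the LeftFace/RightFace fields
# mentioned in Section 3.1. P1..P5 are toy identifiers, not real data.
EDGES = [
    {"LeftFace": "P1", "RightFace": "P2"},
    {"LeftFace": "P1", "RightFace": "P3"},
    {"LeftFace": "P2", "RightFace": "P3"},
    {"LeftFace": "P3", "RightFace": "P4"},
    {"LeftFace": "P3", "RightFace": "P5"},
    {"LeftFace": "P2", "RightFace": "P1"},  # second edge on an already seen shared boundary
    {"LeftFace": "P4", "RightFace": None},  # outer boundary: no face on one side
]


def adjacency(edges):
    """Build a symmetric neighbour map from the edge table."""
    neighbours = defaultdict(set)
    for edge in edges:
        left, right = edge["LeftFace"], edge["RightFace"]
        # Skip outer boundaries and degenerate edges inside a single face.
        if left is None or right is None or left == right:
            continue
        neighbours[left].add(right)
        neighbours[right].add(left)
    return neighbours


def touch_counts(edges):
    """Number of distinct faces touching each face (the thematic-map value)."""
    return {face: len(adj) for face, adj in adjacency(edges).items()}


def more_than(counts, threshold):
    """Faces with strictly more than `threshold` adjacent faces, sorted by name."""
    return sorted(face for face, n in counts.items() if n > threshold)


counts = touch_counts(EDGES)
print(counts)
print(more_than(counts, 3))
```

Storing neighbours in sets means that several edges along the same shared boundary count only once, which is the behaviour a per-province adjacency count needs. The resulting dictionary plays the role of the non-spatial attribute that would be linked back to the layer by feature identifier and used as the controlling parameter of the thematic map.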
