Geographic/Spatial Data Modelling
Define data model
A data model is a set of concepts or scientific
theories for describing and representing parts of the
real world in a digital computer system.
Longley et al. (2005)
Recommended readings: Geographical Information Systems and Science, 2nd
Edition, Chapter 8, pages 177-196.
Introduction to Geographic Information Systems, Chapters 3 & 4,
pages 46-73.
Importance of geographic data models
Data models are the heart of GIS: they are a core component of any GIS.
They enable a computer to represent real geographical features as
graphical elements.
They show the spatial relationships that exist amongst phenomena or
geographic features. E.g. a geographical data model can show the
relationship between elevation and temperature in a forest in
Chipinge.
Levels of data model abstraction
Geographic features in the real world are represented abstractly in map
form (abstraction means generalization or simplification). There are four
main levels of data model abstraction:
Reality - made up of real-world phenomena (buildings, streets, lakes)
Conceptual model - a human-orientated, partially structured model of
selected objects and phenomena relevant to a particular problem domain.
Logical model - an implementation-orientated representation of reality. It
is often represented as a diagram showing the selected objects and the
relationships between them (spatial data model).
Physical model - describes the exact files or database tables used to
store the data, etc. It is specific to a particular implementation (Database
model).
Geographic data models - Vector
Vector data models use points and their associated X, Y coordinate pairs
to represent the vertices of spatial features, much as if they were being
drawn on a map by hand (Aronoff 1989).
A representation of the world using points, lines, and polygons:
Points - zero-dimensional objects that contain only a single coordinate
pair. Typically used to model singular, discrete features such as buildings,
wells and power poles. Other types of point features include the node (the
end point of a line, or where two or more lines join) and the vertex (an
intermediate point along a line between nodes). A line running between two
nodes is called an arc; each arc is made up of straight line segments
running between adjoining points (or vertices).
Lines - one-dimensional features composed of multiple,
explicitly connected points. Lines are used to represent
linear features such as roads, streams, faults and boundaries.
Polygons - two-dimensional features created by multiple
lines that loop back to create a "closed" feature. Used to
represent features such as city boundaries, geologic
formations and lakes.
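The three vector primitives can be sketched with plain X, Y coordinate tuples. This is a minimal illustration, not any GIS library's API: the feature names and coordinates are hypothetical, and the shoelace formula is used here simply as one standard way to compute polygon area.

```python
import math

# Hypothetical features stored as plain X, Y coordinate tuples.
Point = tuple[float, float]

well: Point = (3.0, 4.0)                                # point: one coordinate pair (0-D)
stream: list[Point] = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]          # line: connected vertices (1-D)
lake: list[Point] = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # polygon ring (2-D)


def line_length(vertices: list[Point]) -> float:
    """Sum the straight segments between adjoining vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(ring: list[Point]) -> float:
    """Shoelace formula; the ring is closed implicitly (last vertex joins the first)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(stream))  # 5.0 + 3.0 = 8.0
print(polygon_area(lake))   # 4 x 3 = 12.0
```

Length is meaningful only for lines and area only for polygons, which mirrors the one- and two-dimensional nature of these features.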
Arc/Node Structures
Polygons can be defined as a series of arcs.
Arcs can be defined as a series of segments.
Arcs/lines have starting and ending nodes
The different types of data can be stored in separate files, linked
together by pointers.
Example: https://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf
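An arc/node structure can be sketched as separate tables linked by IDs, with the IDs standing in for the pointers between files. The table layout, IDs and coordinates below are hypothetical, and the sketch assumes each polygon lists its arcs in traversal order with matching directions.

```python
# Hypothetical arc/node tables, linked by IDs instead of file pointers.
nodes = {"N1": (0.0, 0.0), "N2": (4.0, 3.0)}

arcs = {
    # each arc: from-node, to-node, and intermediate vertices between them
    "A1": {"from": "N1", "to": "N2", "vertices": [(4.0, 0.0)]},
    "A2": {"from": "N2", "to": "N1", "vertices": [(0.0, 3.0)]},
}

polygons = {"P1": ["A1", "A2"]}  # a polygon is a series of arcs


def arc_coords(arc_id):
    """Expand an arc into its full coordinate list: from-node, vertices, to-node."""
    arc = arcs[arc_id]
    return [nodes[arc["from"]], *arc["vertices"], nodes[arc["to"]]]


def polygon_ring(poly_id):
    """Chain the polygon's arcs, dropping the repeated node where two arcs meet.

    Assumes the arcs are listed in order and each starts where the previous ends.
    """
    ring = []
    for arc_id in polygons[poly_id]:
        coords = arc_coords(arc_id)
        ring.extend(coords if not ring else coords[1:])
    return ring


print(polygon_ring("P1"))
# [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]
```

Node coordinates are stored once in the node table; arcs refer to them by ID rather than copying them.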
Attribute Data
• Attribute data is also more complex for lines and polygons.
• Attributes could be recorded for each coordinate pair, but this
would create a lot of data redundancy.
• It would also be very difficult to edit.
• A common solution is to store the attribute data in a
separate file and link it to the locational data using a
relational join.
• Two notable approaches are the georelational and geodatabase models.
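A relational join between locational data and a separate attribute table can be sketched with two Python structures sharing a feature ID. The field names and values are hypothetical, and this is an inner join (only IDs present in both tables are kept), not the behaviour of any particular GIS package.

```python
# Hypothetical geometry and attribute tables, linked by a shared feature ID.
geometry = {
    101: [(0.0, 0.0), (3.0, 4.0)],
    102: [(3.0, 4.0), (6.0, 4.0)],
}
attributes = [
    {"id": 101, "name": "Main Road", "lanes": 2},
    {"id": 102, "name": "Market Street", "lanes": 1},
]

# Inner join on "id": attributes are stored once per feature,
# not repeated for every coordinate pair.
attr_by_id = {row["id"]: row for row in attributes}
joined = [
    {"coords": coords, **attr_by_id[fid]}
    for fid, coords in geometry.items()
    if fid in attr_by_id
]

for feature in joined:
    print(feature["id"], feature["name"], len(feature["coords"]), "vertices")
```

Editing a road's name now touches one attribute row, regardless of how many vertices the road has.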
Vector Data Model Structures
Two of the most common vector data structures are:
Spaghetti data model - each point, line, and/or polygon feature is
represented as a string of X, Y coordinate pairs (or as a single X, Y
coordinate pair in the case of a point feature) with no
inherent structure.
Topological data model - characterized by the inclusion of topological
information within the dataset. Topology is a set of rules that model the
relationships between neighbouring points, lines, and polygons and
determines how they share geometry.
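The practical difference in shared geometry can be sketched with two adjacent squares, P1 and P2, that share the boundary x = 1. The coordinates and arc IDs are hypothetical. In the spaghetti version each polygon is an independent closed string, so the shared edge is stored twice; in the topological version each boundary arc is stored once (a full topological model would also record from/to nodes and left/right polygons for each arc, omitted here).

```python
# Two hypothetical adjacent squares, P1 (left) and P2 (right), sharing x = 1.

# Spaghetti: each polygon is an independent closed string of X, Y pairs.
spaghetti = {
    "P1": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "P2": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
}

# Topological: every boundary arc is stored exactly once.
topological_arcs = {
    "a": [(1, 0), (1, 1)],                    # shared boundary
    "b": [(1, 1), (0, 1), (0, 0), (1, 0)],    # rest of P1's boundary
    "c": [(1, 0), (2, 0), (2, 1), (1, 1)],    # rest of P2's boundary
}


def segments(coords):
    """Undirected segments between adjoining vertices."""
    return [frozenset(pair) for pair in zip(coords, coords[1:])]


shared = frozenset({(1, 0), (1, 1)})
spaghetti_copies = sum(segments(r).count(shared) for r in spaghetti.values())
topo_copies = sum(segments(c).count(shared) for c in topological_arcs.values())
print(spaghetti_copies, topo_copies)  # 2 1
```

With the spaghetti model, moving the shared boundary means editing both copies consistently; otherwise gaps or slivers appear between the polygons.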
Spaghetti vs Topological
https://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf
Topological precepts
Three basic topological precepts that are necessary to understand the
topological data model are:
Connectivity - arcs connect to each other at nodes; this is described by
the arc-node topology of the feature dataset.
Area definition - arcs that connect to surround an area define a polygon,
also called polygon-arc topology.
Contiguity - based on the concept that polygons that share a boundary
are deemed adjacent. Specifically, polygon topology requires that all arcs
in a polygon have a direction (a from-node and a to-node), so that the
polygons on the left and right of each arc, and hence adjacency, can be
determined.
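The three precepts can be derived from a single directed arc table, sketched below for two hypothetical adjacent squares P1 and P2 and the outside area. The table columns (arc ID, from-node, to-node, left polygon, right polygon) are an illustrative assumption, not a specific file format.

```python
from collections import defaultdict

# Hypothetical directed arc table: N1 = (1, 0), N2 = (1, 1) on the shared boundary.
arcs = [
    # (arc_id, from_node, to_node, left_polygon, right_polygon)
    ("a", "N1", "N2", "P1", "P2"),       # shared boundary between the squares
    ("b", "N2", "N1", "P1", "outside"),  # rest of P1's boundary
    ("c", "N1", "N2", "P2", "outside"),  # rest of P2's boundary
]

# Connectivity: which arcs meet at each node.
arcs_at_node = defaultdict(set)
for arc_id, frm, to, _, _ in arcs:
    arcs_at_node[frm].add(arc_id)
    arcs_at_node[to].add(arc_id)

# Area definition: the arcs that bound each polygon.
arcs_of_polygon = defaultdict(set)
for arc_id, _, _, left, right in arcs:
    arcs_of_polygon[left].add(arc_id)
    arcs_of_polygon[right].add(arc_id)

# Contiguity: polygons on opposite sides of the same arc are adjacent.
adjacent = defaultdict(set)
for _, _, _, left, right in arcs:
    adjacent[left].add(right)
    adjacent[right].add(left)

print(sorted(arcs_at_node["N1"]))     # ['a', 'b', 'c']
print(sorted(arcs_of_polygon["P1"]))  # ['a', 'b']
print(sorted(adjacent["P1"]))         # ['P2', 'outside']
```

Adjacency falls out of the left/right columns without any geometric computation, which is only possible because every arc has a direction.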
Advantages of vector model
Vector data models represent discrete features accurately: points, lines,
and polygons precisely define the location and size of topographic
features.
Compact data structure, therefore requiring less disk storage space
Accurate geographic location of data is maintained
Efficient for network analysis
Not scale dependent
Disadvantages of vector model
Overlay operations are difficult to perform.
Spatial variability is not well represented.
Not directly compatible with remotely sensed data, which is in raster
form. Continuous data, such as elevation data, is not effectively
represented in vector form; substantial data generalization or
interpolation is usually required for these data layers.
Complex data structure
Geographic data models – Raster
Raster data models
Raster data models present information through a grid of cells.
Because they rely on a uniform series of cells (pixels), raster data
models are referred to as grid-based systems.
Raster grids are usually made up of square or rectangular cells, and
each cell holds a single value.
Raster data is especially effective at representing data that is
continuous, such as elevation, precipitation, aspect and slope.
Digital images, such as satellite imagery and scanned maps, are in raster
format.
Raster data model structure
The raster logical model represents a single geographic phenomenon
(usually, but not always, a field) as a two-dimensional array of
samples, usually at regular spacing in both the x and y directions.
In a grid, each value represents a summary of the values within a
square, such as mean temperature.
Each horizontal array of values (having the same y value) is called
a row, and each vertical array (having the same x value) is called
a column. Each sample location (whether point or square) is called
a cell, or pixel if the raster is an image.
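The row/column structure described above can be sketched as a list of rows, together with the conversion between a cell's row and column and map coordinates. The origin, cell size and values are hypothetical, and the sketch assumes a common convention (origin at the upper-left corner, rows counted downward, square cells); no bounds checking is done.

```python
# Hypothetical raster: a list of rows; each row shares a y value, each column an x value.
# Assumed convention: origin at the upper-left corner, rows counted downward.
ORIGIN_X, ORIGIN_Y = 0.0, 60.0  # upper-left corner (map units)
CELL_SIZE = 30.0                # square cells (map units)

grid = [
    [12.1, 12.4, 12.9],
    [11.8, 12.0, 12.6],
]  # e.g. mean temperature summarised per cell


def cell_center(row, col):
    """Map coordinates of the centre of cell (row, col)."""
    x = ORIGIN_X + (col + 0.5) * CELL_SIZE
    y = ORIGIN_Y - (row + 0.5) * CELL_SIZE
    return x, y


def cell_at(x, y):
    """Row and column of the cell containing point (x, y); assumes it lies inside the grid."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((ORIGIN_Y - y) // CELL_SIZE)
    return row, col


print(len(grid), len(grid[0]))  # 2 rows, 3 columns
print(cell_center(1, 2))        # (75.0, 15.0)
r, c = cell_at(40.0, 50.0)
print(r, c, grid[r][c])         # 0 1 12.4
```

Because the spacing is regular, only the origin and cell size need to be stored; each cell's location is implied by its position in the array.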
The raster model can be used to represent continuous (elevation) or
discrete fields (land cover).
Discrete raster data model
Representing land cover - four distinct land cover classes are
represented. Each class has a unique value, and each cell holds the value
of the class it belongs to. For example, cells with value 3 belong to the
cropland land cover class, whilst cells with value 1 belong to forest land.
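A discrete land cover raster can be sketched as a grid of class codes, from which cell counts and areas per class follow directly. The grid is hypothetical; the text names only class 1 (forest) and class 3 (cropland), so the labels for classes 2 and 4 are placeholders, and the 30 m cell size is an assumption.

```python
from collections import Counter

# Hypothetical 4 x 4 land cover grid with four classes.
# Only 1 (forest) and 3 (cropland) are named in the text; 2 and 4 are placeholders.
CLASS_NAMES = {1: "forest", 2: "class 2", 3: "cropland", 4: "class 4"}
CELL_AREA_HA = 0.09  # assumes 30 m x 30 m cells = 900 m^2 = 0.09 ha

land_cover = [
    [1, 1, 3, 3],
    [1, 1, 3, 3],
    [2, 1, 3, 4],
    [2, 2, 4, 4],
]

# Count cells per class, then convert counts to area.
counts = Counter(value for row in land_cover for value in row)
for value in sorted(counts):
    area = counts[value] * CELL_AREA_HA
    print(f"{value} {CLASS_NAMES[value]}: {counts[value]} cells, {area:.2f} ha")
```

Each cell belongs to exactly one class, so the class areas always sum to the total grid area.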
Continuous raster data model
Representing elevation - there are no distinct classes, only ranges of
values.
Continuous data has no clearly defined boundaries.
The transition between possible values on a continuous surface is
without abrupt or well-defined breaks.
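Because a continuous surface has no built-in classes, any ranges have to be imposed afterwards by the analyst. This sketch reclassifies a hypothetical elevation grid into three ranges; the break values and labels are arbitrary assumptions chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical elevation grid (metres): values change gradually between cells.
elevation = [
    [812.0, 818.5, 826.0],
    [809.2, 815.7, 822.3],
    [805.0, 811.4, 819.8],
]

# Assumed break values: below 810, 810 up to (not including) 820, and 820 or above.
BREAKS = [810.0, 820.0]
LABELS = ["low", "middle", "high"]


def elevation_range(value):
    """Return the label of the range that a continuous value falls into."""
    return LABELS[bisect_right(BREAKS, value)]


ranges = [[elevation_range(v) for v in row] for row in elevation]
for row in ranges:
    print(row)
```

Choosing different breaks would produce different "classes" from the same surface, which is exactly why continuous data has no clearly defined boundaries of its own.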
Advantages of raster model
Due to the nature of the data storage technique, data analysis is usually
easy to program and quick to perform.
Discrete data, e.g. forestry stands, is accommodated equally well as
continuous data, e.g. elevation data, which facilitates the integration of
the two data types.
The inherent nature of raster maps, e.g. single-attribute maps, is ideally
suited for mathematical modeling and quantitative analysis.
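The ease of raster analysis, and the integration of discrete and continuous layers, can be sketched as a cell-by-cell overlay of two grids that share the same layout. The grids and the 815 m threshold are hypothetical; land cover codes follow the discrete raster section (1 = forest, 3 = cropland), with 2 standing for some other class.

```python
# Hypothetical co-registered rasters with identical rows and columns.
land_cover = [
    [1, 1, 3],
    [1, 3, 3],
    [2, 1, 3],
]  # 1 = forest, 3 = cropland, 2 = another class
elevation = [
    [812.0, 818.5, 826.0],
    [809.2, 815.7, 822.3],
    [805.0, 817.4, 819.8],
]


def overlay(a, b, func):
    """Apply func to each pair of corresponding cells; assumes both grids share one layout."""
    return [[func(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Query combining a discrete and a continuous layer: forest cells above 815 m (1 = true).
high_forest = overlay(land_cover, elevation, lambda lc, z: int(lc == 1 and z > 815.0))
for row in high_forest:
    print(row)
```

Since cells in both grids line up one-to-one, the overlay is a simple nested loop, with none of the line-intersection work a vector overlay requires.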
Disadvantages of raster model
The cell size determines the resolution at which the data is
represented; features smaller than a cell cannot be resolved.
Linear features are especially difficult to represent adequately,
depending on the cell resolution. Accordingly, network linkages are
difficult to establish.
Since most input data is in vector form, data must undergo vector-to-raster
conversion. Besides increased processing requirements, this may
introduce data integrity concerns due to generalization and the choice of
an inappropriate cell size.
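The generalization introduced by vector-to-raster conversion can be sketched by rasterizing one polygon at several cell sizes. This uses one simple rule among several (a cell is "inside" if its centre falls inside the polygon, tested by ray casting); the triangle, extent and cell sizes are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a ray running to the right crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, width, height, cell_size):
    """0/1 grid by cell-centre sampling; origin at (0, 0), row 0 is the top of the extent."""
    n_cols, n_rows = int(width // cell_size), int(height // cell_size)
    grid = []
    for row in range(n_rows):
        y = height - (row + 0.5) * cell_size
        grid.append([int(point_in_polygon((col + 0.5) * cell_size, y, ring))
                     for col in range(n_cols)])
    return grid


triangle = [(0.0, 0.0), (8.0, 0.0), (0.0, 6.0)]  # true area = 24 square units

areas = {}
for size in (4.0, 2.0, 1.0):
    grid = rasterize(triangle, 8.0, 8.0, size)
    areas[size] = sum(map(sum, grid)) * size * size
    print(f"cell size {size}: estimated area {areas[size]}")
```

At the coarsest cell size only one cell centre falls inside the triangle, so the estimated area is well below the true value; finer cells track the sloping edge more closely at the cost of more cells to store and process.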