iirs
Data Models
&
Conceptual Model of Spatial Information
Dr. Sameer Saran
Geoinformatics Division
iirs What is a GIS ?
“A GIS is a computer-based system that
provides the following four sets of
capabilities to handle geo-referenced data:
1. Input
2. Data management (storage and retrieval)
3. Manipulation and analysis
4. Output.”
(Aronoff, 1989)
iirs What does a GIS do?
A GIS works with objects, their attributes,
and the relationships among the objects.
The objects are stored in a database using
geometric primitives (volumes, areas, lines,
points), their attributes and the
relationships between them (topology).
iirs Characteristics of Geographic Data
• Spatial data: orientation, shape, size
& structure of features
• Non-spatial data: information about various
attributes like area, length &
population
iirs Characteristics of Spatial Data
• spatial reference • where?
• attributes • what?
• spatial relationships • how?
• temporal component • when?
iirs Spatial data models
Spatial data models are high-level
data structures that focus
on “formalization of the
concepts humans use to
conceptualize space”
iirs Spatial Data Model
It represents the linkages between the real world
domain of geographic data and the computer or GIS
representation of these features. It helps (Marble, 1982)
• To organize a systematic file structure
• Abstracts the real world into properties which are perceived
by a specific application
iirs GIS structures as representations of reality
Two approaches have been widely adopted
for representing the spatial & attribute
information within a GIS
• A composite model (raster)
• Geo-relational model (vector)
iirs spatial data models
• Two fundamental approaches:
–raster model
–vector model
iirs Spatial data types
• Regular tessellations
• Irregular tessellations
• Point data
• Line data
• Area data
iirs
Vector Data Concept
iirs Vector Data Structure
Point (node): 0-dimension
• single x,y coordinate pair
• zero area
• tree, oil well, label location
Line (arc): 1-dimension
• two (or more) connected x,y
coordinates
• road, stream
Polygon : 2-dimensions
• four or more ordered and
connected x,y coordinates
• first and last x,y pairs are the
same
• encloses an area
• census tracts, county, lake
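The three primitives above can be sketched as plain coordinate lists. This is a minimal illustration, not any particular GIS format: the names `is_closed_ring`, `line_length`, `ring_area` and the sample coordinates are assumptions made for the example.

```python
from typing import List, Tuple

Coord = Tuple[float, float]


def is_closed_ring(coords: List[Coord]) -> bool:
    """A polygon ring needs four or more x,y pairs, with first == last."""
    return len(coords) >= 4 and coords[0] == coords[-1]


def line_length(coords: List[Coord]) -> float:
    """Sum of the straight segment lengths between consecutive x,y pairs."""
    return sum(
        ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )


def ring_area(coords: List[Coord]) -> float:
    """Enclosed area by the shoelace formula (assumes a simple closed ring)."""
    if not is_closed_ring(coords):
        raise ValueError("ring is not closed")
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )
    return abs(twice_area) / 2.0


# Hypothetical sample features
oil_well: Coord = (3.0, 4.0)                      # point: one x,y pair, zero area
road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]      # line: connected x,y pairs
lake = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]  # polygon

print(line_length(road))   # length of the road
print(ring_area(lake))     # area enclosed by the lake boundary
```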
iirs Vector model
• In a vector-based GIS data are handled as:
– Points X,Y coordinate pair + label
– Lines series of points
– Areas line(s) forming their boundary
(series of polygons)
[Figure: a point feature, a line feature and an area feature]
iirs Vector model
iirs Line Types
Line segment: with two end points
Line string: a sequence of line segments
LinearRing (Ring): a sequence of segments with
closure
Curves (Arc)
* Sometimes, “Arcs” refer to lines (in
the ArcGIS/ArcView case)
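As a rough sketch, the first three line types can be told apart from the coordinate list alone. The function name `line_type` is hypothetical, and curves are left out because they need more than a list of vertices.

```python
def line_type(coords):
    """Classify a list of (x, y) vertices by the line types listed above."""
    if len(coords) < 2:
        raise ValueError("a line needs at least two points")
    if len(coords) == 2:
        return "line segment"            # exactly two end points
    if len(coords) >= 4 and coords[0] == coords[-1]:
        return "linear ring"             # sequence of segments with closure
    return "line string"                 # open sequence of segments


print(line_type([(0, 0), (1, 1)]))
print(line_type([(0, 0), (1, 1), (2, 0)]))
print(line_type([(0, 0), (1, 1), (2, 0), (0, 0)]))
```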
iirs Vector Structures
How to organize vectors in a computer?
Spaghetti Structure
Whole Polygon Structure
Points and Polygons Structure
Topological Structure
iirs Spaghetti Vector Structure
Spaghetti structure is usually derived
from manual digitizing
Crossing lines (no crossing nodes)
The common boundary between adjacent
polygons is recorded twice
No neighbourhood information
Unlinked data require a large amount of
storage memory
iirs Whole Polygon Structure
(A Kind of Spaghetti)
Whole Polygon (boundary structure): polygons
described by listing coordinates of points in order as
you ‘walk around’ the outside boundary of the
polygon
• coordinates/borders for adjacent polygons stored twice
– may not be the same, resulting in slivers (gaps), or overlap
• all lines are ‘double’ (except for those on the outside
periphery)
• no topological information about polygons
– which are adjacent and have a common boundary?
– how to relate different geographies? e.g. zip codes and tracts?
iirs Whole Polygon: illustration
iirs Points & Polygons Structure
Points and Polygons: polygons described by
listing ID numbers of points in order as you ‘walk
around the outside boundary’; a second file lists
all points and their coordinates
• solves the duplicate coordinate/double border problem
• lines can be handled similar to polygons (list of IDs) ?
• still no topological information
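The difference from the whole polygon structure can be sketched by converting one into the other. The two adjacent squares and the helper name `to_points_and_polygons` are assumptions for illustration only.

```python
# Whole polygon storage: every polygon lists its own coordinates, so the
# shared border between A and B, (2, 0)-(2, 2), is stored twice.
whole = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)],
}


def to_points_and_polygons(whole_polygons):
    """Split whole polygons into a point file and polygons as lists of point IDs."""
    point_ids = {}   # coordinate -> point ID
    polygons = {}    # polygon name -> ordered list of point IDs
    for name, ring in whole_polygons.items():
        ids = []
        for xy in ring:
            if xy not in point_ids:
                point_ids[xy] = len(point_ids) + 1
            ids.append(point_ids[xy])
        polygons[name] = ids
    points = {pid: xy for xy, pid in point_ids.items()}
    return points, polygons


points, polygons = to_points_and_polygons(whole)
print(points)     # each coordinate stored once
print(polygons)   # polygons refer to points by ID
```

Each coordinate is now stored once, so the two copies of a shared border cannot drift apart. Nothing in either file says that A and B are neighbours, which is the missing topological information.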
iirs Points and Polygons: illustration
iirs Topology
Topology is a branch of mathematics that deals
with properties of space that remain invariant
under certain transformations.
Properties: Three spatial relationships
Area: Polygons can be defined by the set of lines that enclose them
Contiguity: Identification of polygons which touch each other, i.e. the
contiguous polygons on the left and right of each line
Connectivity: Identification of interconnected arcs, with their starting point
& end point, for network analysis
iirs Rubber Sheet Transformation
[Figure: the same map, with elements labelled A to E and 1 to 7, shown before and after a rubber sheet transformation]
iirs Topology-1
Connections & relationships between objects are
independent of their coordinates
Topological properties of an object are preserved
when the object is stretched, distorted and bent
Overcomes major weakness of spaghetti model –
allowing for GIS analysis (Overlaying, Network)
Requires all lines be connected, polygons closed,
loose ends removed
iirs Topology-2
It describes spatial relationships
• Connectivity: relationships between
the arcs in the network
• Contiguity (adjacency): relationships
between the polygons
– For example, with respect to line 1, left and
right polygons are A and B respectively
• Containment: this refers to what is
within a polygon
– For example, Polygon B is within Polygon A
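Connectivity and contiguity can both be read off an arc table that records, for every arc, its from-node, to-node, left polygon and right polygon. The sketch below assumes a hypothetical layout of two polygons A and B sharing arc 1, with "outside" standing for the area beyond the map; containment is not covered.

```python
from collections import defaultdict

# arc ID -> (from_node, to_node, left_polygon, right_polygon)
# Hypothetical example: arc 1 is the border shared by A (left) and B (right).
arcs = {
    1: ("n1", "n2", "A", "B"),
    2: ("n2", "n1", "A", "outside"),   # remaining boundary of A
    3: ("n1", "n2", "B", "outside"),   # remaining boundary of B
}


def contiguity(arc_table):
    """Polygons are neighbours when they lie on either side of the same arc."""
    neighbours = defaultdict(set)
    for _from, _to, left, right in arc_table.values():
        neighbours[left].add(right)
        neighbours[right].add(left)
    return neighbours


def connectivity(arc_table):
    """Arcs are connected when they meet at a common node."""
    arcs_at_node = defaultdict(set)
    for arc_id, (from_node, to_node, _left, _right) in arc_table.items():
        arcs_at_node[from_node].add(arc_id)
        arcs_at_node[to_node].add(arc_id)
    return arcs_at_node


print(dict(contiguity(arcs)))
print(dict(connectivity(arcs)))
```

No coordinates appear in the table, which is why these relationships survive stretching or distortion of the map.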
iirs Building topology
iirs Topological data model
iirs Popular File Formats
DIME – Dual Independent Map Encoding
TIGER – Topologically Integrated
Geographic Encoding and Referencing
DLG – Digital Line Graph
Shapefile (ESRI)
• Software or data specific
Geodatabases
iirs Vector Data Structures
Advantages
• Good modeling of objects (object-view)
• Compact data structure
• Topology can be described explicitly – therefore good
for analysis
• Coordinate transformation & rubber sheeting are easy
• Accurate graphic representation at all scales
• Retrieval, updating and generalization of graphics &
attributes are possible
iirs Vector Data Structures
Disadvantages
• Complex data structures
• Combining several polygon networks by
intersection & overlay is difficult; uses
considerable computer power
• Display & plotting often time consuming
and expensive; especially high quality
drawings, coloring, and shading
iirs
Raster Data Concept
iirs Raster Data Structure
Area is covered by grid with
(usually) equal-sized cells
Cells often called pixels
(picture elements); raster data
often called image data
Attributes are recorded by
assigning each cell a single
value based on the majority
feature (attribute) in the cell,
such as land use type
Typically 8 bits assigned to
values, therefore 256 possible
values (0-255)
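The majority rule can be sketched by aggregating a fine grid of land use codes into larger cells. The grid, the codes 1 to 3 and the function names are assumptions for the example; ties are broken by whichever value is met first, which is only one possible convention.

```python
from collections import Counter


def majority_value(samples):
    """Most frequent value among the samples falling inside one cell."""
    return Counter(samples).most_common(1)[0][0]


def aggregate(grid, factor):
    """Merge factor x factor blocks of a grid into single cells by majority."""
    rows, cols = len(grid) // factor, len(grid[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [
                grid[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(majority_value(block))
        out.append(out_row)
    return out


# Hypothetical land use codes; each fits in 8 bits (0-255)
fine = [
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [3, 3, 2, 1],
    [3, 3, 1, 1],
]
coarse = aggregate(fine, 2)
print(coarse)
print(2 ** 8)   # number of distinct values an 8-bit cell can hold
```

The single cells of type 3 (upper left block) and type 2 (lower right block) disappear in the coarse grid, which shows the information lost when cells get larger.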
iirs Raster Data Structures: Tessellation
iirs Raster based data structures
To increase data processing performance
and reduce the demand for data storage,
two issues are involved in raster
data structures:
• Compression methods: how to store
the data more efficiently, and
• Scan order: how to scan the data in an
array, which affects data processing
performance
iirs Run-length Coding
Describes the interior of
an area by run-lengths,
instead of the boundary
Run-Length Codes:
• Row 9: 2,3; 6,6; 8,10
• Row 10: 1,10
• Row 11: 1,9
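A minimal sketch of this row-wise coding follows, assuming each pair is read as "start column, end column" of a run of cells inside the area, with columns counted from 1. The sample rows are made up so that they reproduce the codes listed above; `row_runs` is a hypothetical name.

```python
def row_runs(row, inside=1):
    """Return (start_col, end_col) pairs, 1-based, for runs of `inside` cells."""
    runs = []
    start = None
    for col, value in enumerate(row, start=1):
        if value == inside and start is None:
            start = col                      # a run begins
        elif value != inside and start is not None:
            runs.append((start, col - 1))    # the run ended on the previous cell
            start = None
    if start is not None:
        runs.append((start, len(row)))       # run reaches the end of the row
    return runs


# Hypothetical rows, 10 cells wide (1 = inside the area, 0 = outside)
row_9 = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
row_10 = [1] * 10
row_11 = [1] * 9 + [0]

for name, row in (("Row 9", row_9), ("Row 10", row_10), ("Row 11", row_11)):
    print(name, row_runs(row))
```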
iirs Raster Compression
Run Length Compression
• One of the widely used raster data representation
and compression techniques
• E.g.: Code the raster (shown in the example
image) using run-length coding in row
order
• Run-length Codes: 14,3; 2,7; 4,3; 4,7; 4,3; 3,7;
9,3; 2,7; 6,3; 4,7; 5,3; 3,7; 4,3
• Original image size (assume that each pixel is
coded using 1 byte) is 8x8 = 64 bytes
• The run-length code file needs 13x2 = 26 bytes
• The compression ratio is 64:26 = 2.46:1
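The arithmetic above can be checked with a short sketch. It assumes each pair is read as "run length, cell value" and that both numbers take one byte; the example image is rebuilt from the listed codes, since the picture itself is not part of the text.

```python
from itertools import groupby

# Run-length codes from the slide, read as (run length, cell value)
codes = [(14, 3), (2, 7), (4, 3), (4, 7), (4, 3), (3, 7), (9, 3),
         (2, 7), (6, 3), (4, 7), (5, 3), (3, 7), (4, 3)]


def rle_decode(pairs):
    """Expand (length, value) pairs into the flat, row-ordered cell list."""
    cells = []
    for length, value in pairs:
        cells.extend([value] * length)
    return cells


def rle_encode(cells):
    """Collapse consecutive equal cells into (length, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(cells)]


cells = rle_decode(codes)
original_bytes = len(cells)               # 1 byte per pixel
encoded_bytes = 2 * len(rle_encode(cells))  # 2 bytes per (length, value) pair
ratio = original_bytes / encoded_bytes

print(original_bytes, encoded_bytes, round(ratio, 2))
```

Decoding and encoding again returns the same 13 pairs, so no information is lost.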
iirs Quad-tree Coding
iirs Raster Compression Cont’d
Quad-tree Compression
• Used widely for spatial data indexing
• Quadtree codes with N-order (Peano key–based)
iirs Raster Ordering
Raster is two-dimensional
2-D ordering is developed to create a 1-D
representation of the 2-D raster, in order to improve
the efficiency of raster access.
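One common way to build such a 1-D key is to interleave the bits of the row and column numbers (often called a Morton or Peano key). The sketch below is an assumption about how such an ordering might be coded, not the scheme used on these slides; whether the path looks like an N or a Z depends on which axis gets the lower bit and on the direction of the row axis.

```python
def morton_key(row, col, bits=8):
    """Interleave the bits of row and col into a single 1-D key.

    Column bits go to the even positions and row bits to the odd positions
    (an arbitrary choice for this sketch).
    """
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)
        key |= ((row >> i) & 1) << (2 * i + 1)
    return key


# Visit order of a 4 x 4 raster when cells are sorted by their key
cells = [(r, c) for r in range(4) for c in range(4)]
ordered = sorted(cells, key=lambda rc: morton_key(*rc))
print(ordered)
```

Cells that share a 2 x 2 block receive consecutive keys, which is what links this ordering to quadtree quadrants.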
iirs Raster Compression Cont’d
“lossless” compression vs “lossy”
compression
• Can you reproduce exactly the original data
from the compressed data?
Zip (2-5:1)
GIF (2-4:1), JPEG (10-40:1), MPEG (50:1).
ECW, MrSID etc.
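The question on this slide can be tested directly. The sketch below uses Python's standard `zlib` module as a stand-in for a lossless method, and a crude quantization (dropping the low 4 bits of each value) as a stand-in for a lossy one; neither is the algorithm behind the formats named above, and the raster values are made up.

```python
import zlib

# Hypothetical 8 x 8 raster, one byte per cell
raster = bytes([10, 11, 12, 13, 200, 201, 202, 203] * 8)

# Lossless: decompressing gives back exactly the original bytes
packed = zlib.compress(raster)
restored = zlib.decompress(packed)
print(restored == raster)

# Lossy stand-in: keep only the high 4 bits of every cell value.
# Nearby values collapse together and the originals cannot be recovered.
quantized = bytes(v & 0xF0 for v in raster)
print(quantized == raster)
print(sorted(set(raster)), sorted(set(quantized)))
```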
iirs Raster Data Structures
Advantages
• Simple data structures
• Location-specific manipulation of attribute
data is easy
• Many kinds of spatial analysis and filtering
may be used
• Mathematical modeling is easy because all
spatial entities have a simple, regular shape
iirs Raster Data Structures
Disadvantages
• Large data volumes
• Using large grid cells to reduce data volumes
reduces spatial resolution; loss of information
& inability to recognize phenomena that have
logically defined structures
• Crude raster maps are inelegant, though graphic
elegance is becoming less of a problem
• Coordinate transformations are difficult and time
consuming unless special algorithms are employed
iirs Distortion of shapes in raster data
iirs GIS Data Models: Raster vs. Vector
iirs Choices: Raster vs. Vector
iirs
THANK YOU