Tybsc It Sem 6 Pgis Unit 1 To 5 Theshikshak Com LMR Point To Point
In conclusion, studying topic-wise is more beneficial for students as it enables them to develop a better
understanding of a subject, retain information better, utilize their time more efficiently, and be well-prepared
for exams.
1. Comprehensive course material: TheShikshak Edu App offers comprehensive course material for
BScIT and BScCS students, covering all topics and concepts required in these programs.
2. Track their progress: an analytics program helps students see which topics remain and which lectures are least watched.
3. Expert guidance: TheShikshak Edu App has a team of experienced instructors who provide expert
guidance and support to students. Students can get their doubts clarified and receive personalized
feedback on their performance.
UNIT 1
Chapter 1: A Gentle Introduction to GIS
Defining GIS:
A GIS is a computer-based system that provides the following four sets of capabilities to
handle georeferenced data:
1. Data capture and preparation
2. Data management, including storage and maintenance
3. Data manipulation and analysis
4. Data presentation
• This implies that a GIS user can expect support from the system to enter (georeferenced) data, to analyse it in various ways, and to produce presentations (including maps and other types) from the data.
• This would include support for various kinds of coordinate systems and transformations between them, and options for analysis of the georeferenced data.
GISystem
• Geographic Information System (GISystem) is the most widely used sense of the term GIS.
• A GISystem is a computerized system designed to deal with the collection, storage, manipulation, analysis, visualization and display of geographic information.
• A GISystem is a tool for performing spatial analysis, providing insight into the activities and phenomena taking place around us every day.
• A GISystem includes different components:-
1. Hardware
2. Software
3. Data/Information
4. Users/People
5. Procedures/Methods and Network
GIScience
• Geographic Information Science (GIScience) addresses a set of intellectual and scientific questions which go well beyond the technical capabilities of GISystems.
• GIScience treats GIS as a scientific discipline of study in academia. It is the science behind the technology, aimed at enhancing knowledge of geospatial concepts and their computational implementations. The major contributing disciplines are:-
1. Computer science
2. Mathematics/Statistics
3. Geomatics (Land Surveying, Photogrammetry, Remote Sensing, Geodesy, GPS, Drone
mapping)
4. Geography and
5. Cartography
GIS Application
• A Geographic Information Application is a service dealing with geographic information, such as the design and development of a GIS, geographic information retrieval, analysis, etc. For example, MapQuest (www.mapquest.com) provides a routing service for people to find the best driving route between two points.
• A GIService allows GIS users to access specific functions that are provided by remote sites through the internet.
• Some examples are: MapQuest, Google maps, Bing Maps, Yahoo Maps, Apple Maps,
Yandex Maps, OpenStreetMap and WikiMapia Maps.
• Other agencies such as geological survey companies, energy supply companies, local
government departments, and many others, all collect and maintain spatial data for their
own particular purposes.
Maps
• Maps are perhaps the best-known (conventional) models of the real world.
• Maps have been used for thousands of years to represent information about the real world, and
continue to be extremely useful for many applications in various domains.
• A disadvantage of the traditional paper map is that it is generally restricted to two-dimensional, static representations, and that it is always displayed at a fixed scale. The map scale determines the spatial resolution of the graphic feature representation.
• A map is always a graphic representation at a certain level of detail, which is determined by the
scale.
• Map sheets have physical boundaries, and features spanning two map sheets have to be cut into
pieces.
• Cartography, as the science and art of map making, functions as an interpreter, translating real
world phenomena (primary data) into correct, clear and understandable representations for our
use.
• Maps also become a data source for other applications, including the development of other
maps.
Databases
• A database is a repository for storing large amounts of data. It comes with a number of useful
functions:
• A database can be used by multiple users at the same time—i.e. it allows concurrent use
• A database offers a number of techniques for storing data and allows the use of the most efficient
one—i.e. it supports storage optimization
• A database allows the imposition of rules on the stored data; rules that will be automatically
checked after each update to the data—i.e. it supports data integrity
• A database offers an easy-to-use data manipulation language, which allows the execution of all sorts of data extraction and data updates—i.e. it has a query facility.
• A database will try to execute each query in the data manipulation language in the most efficient way—i.e. it offers query optimization.
Databases can store almost any kind of data. Modern database systems store their data in tables, such as the buoy measurement table shown below.
DAYMEASUREMENTS

Buoy   Date         SST      WS       Humid  Temp10   ...
B0749  1997/12/03   28.2 °C  NNW 4.2  72%    22.2 °C  ...
B9204  1997/12/03   26.5 °C  NW 4.6   63%    20.8 °C  ...
B1686  1997/12/03   27.8 °C  NNW 3.8  78%    22.8 °C  ...
B0988  1997/12/03   27.4 °C  N 1.6    82%    23.8 °C  ...
B3821  1997/12/03   27.5 °C  W 3.2    51%    20.8 °C  ...
B6202  1997/12/03   26.5 °C  SW 4.3   67%    20.5 °C  ...
B1536  1997/12/03   27.7 °C  SSW 4.8  58%    21.4 °C  ...
B0138  1997/12/03   26.2 °C  W 1.9    62%    21.8 °C  ...
B6823  1997/12/03   23.2 °C  S 3.6    61%    22.2 °C  ...
...    ...          ...      ...      ...    ...      ...
Table 1.2: A stored table (in part) of daily buoy measurements. Illustrated are only measurements for
December 3rd, 1997, though measurements for other dates are in the table as well. Humid is the air
humidity just above the sea, Temp10 is the measured water temperature at 10 metres depth. Other
measurements are not shown.
Elevation in the Falset study area, Tarragona province, Spain. The area is approximately 25 × 20 km. The illustration has been aesthetically improved by a technique known as 'hillshading'. In this case, it is as if the sun shines from the north-west, giving a shadow effect towards the south-east. Thus, colour alone is not a good indicator of elevation; observe that elevation is a continuous function over the space.
Figure 2.2: A continuous field example, namely the elevation in the study area of Falset, Spain.
Data source: Department of Earth Systems Analysis (ESA, ITC)
Geographic fields
• A field is a geographic phenomenon that has a value ‘everywhere’ in the study area. We can
therefore think of a field as a mathematical function f that associates a specific value with any
position in the study area.
• Hence if (x, y) is a position in the study area, then f(x, y) stands for the value of the field f at location (x, y).
• Fields can be discrete or continuous. In a continuous field, the underlying function is assumed to
be ‘mathematically smooth’, meaning that the field values along any path through the study area
do not change abruptly, but only gradually.
• Good examples of continuous fields are air temperature, barometric pressure, soil salinity and
elevation. Continuity means that all changes in field values are gradual.
• Discrete fields divide the study space into mutually exclusive, bounded parts, with all locations in one part having the same field value.
• Typical examples are land classifications, for instance, using either geological classes, soil type,
land use type, crop type or natural vegetation type.
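To make the field idea concrete, here is a minimal Python sketch (with made-up functions and values, not from the text): a continuous field is modelled as a function returning a value for any (x, y), while a discrete field returns one class label per bounded part of the study space.

```python
import math

def elevation(x, y):
    # continuous field: a 'mathematically smooth' value defined everywhere
    return 400 + 50 * math.sin(x / 1000) + 30 * math.cos(y / 1000)

def land_use(x, y):
    # discrete field: every location falls in exactly one bounded part/class
    return "forest" if x < 5000 else "agriculture"

print(round(elevation(1200.0, 800.0), 1))  # an elevation value in metres
print(land_use(1200.0, 800.0))             # 'forest'
```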
Geographic objects
• When a geographic phenomenon is not present everywhere in the study area, but somehow
‘sparsely’ populates it, we look at it as a collection of geographic objects. Such objects are usually
easily distinguished and named, and their position in space is determined by a combination of one or more coordinate pairs.
Fig: A number of geological faults in the same study area as in Figure 2.2. Faults are indicated in blue; the study area, with the main geological eras, is set in grey in the background only as a reference. Data source: Department of Earth Systems Analysis (ITC).
Boundaries
• Where shape and/or size of contiguous areas matter, the notion of boundary comes into play.
This is true for geographic objects, but also for the constituents of a discrete geographic field. Location, shape and size are fully determined if we know an area's boundary, so the boundary is
a good candidate for representing it.
• This is especially true for areas that have naturally crisp boundaries.
• Fuzzy boundaries contrast with crisp boundaries in that the boundary is not a precise line, but
rather itself an area of transition.
• Elevation, for instance, can be measured at many locations, even within one’s own backyard,
and each location may give a different value.
Regular tessellations
• A tessellation (or tiling) is a partitioning of space into mutually exclusive cells that together make
up the complete study space. With each cell, some (thematic) value is associated to characterize
that part of space. Three regular tessellation types are illustrated in Figure 2.5.
• In a regular tessellation, the cells are the same shape and size. The simplest example is a
rectangular raster of unit squares, represented in a computer in the 2D case as an array of n × m elements (see Figure 2.5, left).
Figure 2.5: The three most common regular tessellation types: square cells, hexagonal cells, and triangular
cells.
• In all regular tessellations, the cells are of the same shape and size, and the field attribute value
assigned to a cell is associated with the entire area occupied by the cell. The square cell
tessellation is by far the most commonly used, mainly because georeferencing a cell is so
straightforward. These tessellations are known under various names in different GIS packages,
but most frequently as rasters.
• A raster is a set of regularly spaced (and contiguous) cells with associated (field) values. The
associated values represent cell values, not point values. This means that the value for a cell is
assumed to be valid for all locations within the cell.
Irregular tessellations
• Irregular tessellations are more complex than the regular ones, but they are also more adaptive,
which typically leads to a reduction in the amount of memory used to store the data.
• A well-known data structure in this family—upon which many more variations have been
based—is the region quadtree. It is based on a regular tessellation of square cells, but takes
advantage of cases where neighbouring cells have the same field value, so that they can together
be represented as one bigger cell.
Figure 2.7: An 8 × 8, three-valued raster (here: colours) and its representation as a region quadtree. To
construct the quadtree, the field is successively split into four quadrants until parts have only a single
field value. After the first split, the southeast quadrant is entirely green, and this is indicated by a green
square at level two of the tree. Other quadrants had to be split further.
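As an illustration of the region quadtree idea described above, here is a small Python sketch (the raster values are hypothetical, not the book's figure): a 2^k × 2^k raster is recursively split into four quadrants until a quadrant holds only a single field value.

```python
def build_quadtree(raster, x=0, y=0, size=None):
    """Build a region quadtree: a leaf is a single field value,
    an internal node is a list [NW, NE, SW, SE] of sub-quadrants."""
    if size is None:
        size = len(raster)
    first = raster[y][x]
    if all(raster[y + j][x + i] == first
           for j in range(size) for i in range(size)):
        return first                      # homogeneous quadrant -> one big cell
    h = size // 2
    return [build_quadtree(raster, x,     y,     h),   # NW
            build_quadtree(raster, x + h, y,     h),   # NE
            build_quadtree(raster, x,     y + h, h),   # SW
            build_quadtree(raster, x + h, y + h, h)]   # SE

raster = [['R', 'R', 'G', 'G'],
          ['R', 'B', 'G', 'G'],
          ['B', 'B', 'G', 'G'],
          ['B', 'B', 'G', 'G']]
print(build_quadtree(raster))   # homogeneous quadrants become single leaves
```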
Vector representations
• Tessellations do not explicitly store georeferences of the phenomena they represent. Instead, they provide a georeference of the lower left corner of the raster, for instance, plus an indicator of the raster's resolution, thereby implicitly providing georeferences for all cells in the raster. In vector representations, an attempt is made to explicitly associate georeferences with the geographic phenomena. A georeference is a coordinate pair from some geographic space, and is also known as a vector. This explains the name.
• Below, we discuss various vector representations. We start our discussion with the TIN, a representation for geographic fields that can be considered a hybrid between tessellations and vector representations.
Figure 2.8: Input locations and their (elevation) values for a TIN construction. The location P is an arbitrary
location that has no associated elevation measurement.
• The TIN (Triangulated Irregular Network) is one of the standard implementation techniques for digital terrain models, but it can be used to represent any continuous field. The principles behind a TIN are simple.
• It is built from a set of locations for which we have a measurement, for instance an elevation. The
locations can be arbitrarily scattered in space, and are usually not on a nice regular grid. Any location
together with its elevation value can be viewed as a point in three-dimensional space.
Two tessellations are illustrated in Figure 2.9.
• In three-dimensional space, three points uniquely determine a plane, as long as they are not
collinear, i.e. they must not be positioned on the same line. A plane fitted through these points has
a fixed aspect and gradient, and can be used to compute an approximation of elevation at other locations. Since we can pick many triples of points, we can construct many such planes, and therefore we can have many elevation approximations for a single location, such as P (Figure 2.8). So, it is wise to restrict the use of a plane to the triangular area 'between' the three points.
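The claim that three non-collinear points determine a plane, which can then approximate elevation at an unmeasured location like P, can be sketched in a few lines of Python (coordinates are hypothetical; numpy is used for the linear solve):

```python
import numpy as np

def plane_elevation(p1, p2, p3, x, y):
    """Fit z = a*x + b*y + c through three non-collinear (x, y, z) points
    and return the approximated elevation at (x, y)."""
    A = np.array([[p[0], p[1], 1.0] for p in (p1, p2, p3)])
    z = np.array([p[2] for p in (p1, p2, p3)])
    a, b, c = np.linalg.solve(A, z)   # raises LinAlgError if points are collinear
    return a * x + b * y + c

# three hypothetical measurement points and an unmeasured location P = (4, 3)
print(plane_elevation((0, 0, 100), (10, 0, 120), (0, 10, 110), 4, 3))  # 111.0
```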
Point representations
• Points are defined as single coordinate pairs (x, y) when we work in 2D, or coordinate triplets (x, y, z) when we work in 3D.
• Points are used to represent objects that are best described as shape- and size-less, zero-dimensional features. Whether this is the case really depends on the purposes of the spatial
application and also on the spatial extent of the objects compared to the scale applied in the
application. For a tourist city map, a park will not usually be considered a point feature, but perhaps
a museum will, and certainly a public phone booth might be represented as a point.
Line representations
• Line data are used to represent one-dimensional objects such as roads, railroads, canals, rivers and
power lines. Again, there is an issue of relevance for the application and the scale that the application
requires. For the example application of mapping tourist information, bus, subway and streetcar
routes are likely to be relevant line features. Some cadastral systems, on the other hand, may consider roads to be two-dimensional features, i.e. having a width as well.
• Above, we discussed the notion that arbitrary, continuous curvilinear features are just as difficult to represent as continuous fields. GISs therefore approximate such features (finitely!) as lists of
nodes. The two end nodes and zero or more internal nodes or vertices define a line. Other terms for
’line’ that are commonly used in some GISs are polyline, arc or edge. A node or vertex is like a point
(as discussed above) but it only serves to define the line, and provide shape in order to obtain a
better approximation of the actual feature.
• The straight parts of a line between two consecutive vertices or end nodes are called line segments.
Figure 2.10: A line is defined by its two end nodes and zero or more internal nodes, also known
as vertices. This line representation has three vertices, and therefore four line segments.
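A line can thus be stored simply as an ordered list of coordinate pairs. The sketch below (hypothetical coordinates) mirrors Figure 2.10: two end nodes plus three internal vertices give four line segments, and the line's length is the sum of the segment lengths.

```python
from math import hypot

# end node, three internal vertices, end node -> four line segments
line = [(0, 0), (2, 1), (4, 1), (6, 3), (9, 3)]

n_segments = len(line) - 1
length = sum(hypot(x2 - x1, y2 - y1)
             for (x1, y1), (x2, y2) in zip(line, line[1:]))
print(n_segments, round(length, 2))   # 4 segments and the total length
```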
Area representations
• When area objects are stored using a vector approach, the usual technique is to apply a boundary
model. This means that each area feature is represented by some arc/node structure that
determines a polygon as the area’s boundary. Common sense dictates that area features of the same
kind are best stored in a single data layer, represented by mutually non-overlapping polygons. In
essence, what we then get is an application-determined (i.e. adaptive) partition of space.
• In the case that the object can be perceived as having a fuzzy boundary, a polygon is an even worse approximation, though potentially the only one possible. An example is provided in Figure 2.11.
• The areas are still bounded by the same boundaries, only the shapes and lengths of their
perimeters have changed.
• Topological relationships are built from simple elements into more complex elements: nodes define
line segments, and line segments connect to define lines, which in turn define polygons.
Topological relationships
The mathematical properties of the geometric space used for spatial data can be described as follows:
• The space is a three-dimensional Euclidean space where for every point we can determine its
three-dimensional coordinates as a triple (x, y, z) of real numbers. In this space, we can define
features like points, lines, polygons, and volumes as geometric primitives of the respective
dimension. A point is zero-dimensional, a line one-dimensional, a polygon two-dimensional, and a volume a three-dimensional primitive.
• The space is a metric space, which means that we can always compute the distance between two
points according to a given distance function. Such a function is also known as a metric.
• The space is a topological space, of which the definition is a bit complicated. In essence, for every
point in the space we can find a neighbourhood around it that fully belongs to that space as well.
• Interior and boundary are properties of spatial features that remain invariant under topological mappings. This means that, under any topological mapping, the interior and the boundary of a feature remain unbroken and intact.
Figure 2.15 shows all eight spatial relationships: disjoint, meets, equals, inside, covered by,
contains, covers, and overlaps. These relationships can be used in queries against a spatial
database, and represent the ‘building blocks’ of more complex spatial queries.
Thus, nodes also have an elevation value associated with them. Essentially, this allows the GIS user to represent 1- and 2-simplices that are non-horizontal, and therefore a piecewise planar, 'wrinkled surface' can be constructed as well, much like a TIN. Note, however, that one cannot have two different nodes with identical x- and y-coordinates but different z-values. Such nodes
would constitute a perfectly vertical feature, and this is not allowed. Consequently, true solids cannot be represented in a 2½D GIS.
• A geographic field can be represented through a tessellation, through a TIN or through a vector
representation. The choice between them is determined by the requirements of the application
at hand.
• It is more common to use tessellations, notably rasters, for field representation, but vector
representations are in use too. We have already looked at TINs. We provide an example of the
other two below
in each cell. This would not have made the figure very legible, however
• A raster can be thought of as a long list of field values: actually, there should be m × n such values. The list is preceded with some extra information, like a single georeference as the
origin of the whole raster, a cell size indicator, the integer values for m and n, and a data
type indicator for interpreting cell values. Rasters and quadtrees do not store the
georeference of each cell, but infer it from the above information about the raster.
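A minimal Python sketch of this storage scheme (all header values hypothetical): the raster is a flat list of m × n cell values, and the georeference of any cell is inferred from the origin and cell size rather than stored per cell.

```python
origin_x, origin_y = 500000.0, 4550000.0   # georeference of the raster origin
cell_size = 25.0                           # cell size indicator, metres
n_cols, n_rows = 4, 3                      # the integers n and m
values = [5, 5, 7, 7,
          5, 6, 7, 8,
          6, 6, 8, 8]                      # m * n field values, row by row

def cell_value(x, y):
    # infer which cell (x, y) falls in; the value holds for the whole cell
    col = int((x - origin_x) // cell_size)
    row = int((y - origin_y) // cell_size)
    return values[row * n_cols + col]

print(cell_value(500060.0, 4550030.0))     # -> 7
```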
2. Vector representation of a field
• We briefly mention a final representation for fields like elevation, but using a vector
representation. This technique uses isolines of the field. An isoline is a linear feature that
connects the points with equal field value. When the field is elevation, we also speak of
contour lines. The elevation of the Falset study area is represented with contour lines in Figure 2.18. Both TINs and isoline representations use vectors.
• Isolines as a representation mechanism are not very common, however. They are in use as a geoinformation visualization technique (in mapping, for instance), but a TIN is commonly the better choice for representing this type of field. Many GIS packages provide functions
to generate an isoline visualization from a TIN.
• The more natural way to represent geographic objects is by vector representations. We have discussed most issues already in Section 2.3.3, and a small example suffices at this stage.
• In Figure 2.22, a number of geographic objects in the vicinity of the ITC building have been
depicted. These objects are represented as area representations in a boundary model. Nodes
and vertices of the polylines that make up the object’s boundaries are not illustrated, though
they obviously are stored.
• In the previous sections, we have discussed various types of geographic information and
ways of representing them. We have looked at case-by-case examples, however, we have
purposefully avoided looking at how various sorts of spatial data are combined in a single
system.
• The main principle of data organization applied in GIS systems is that of a spatial data layer.
A spatial data layer is either a representation of a continuous or discrete field, or a collection
of objects of the same kind. Usually, the data is organized so that similar elements are in a
single data layer. For example, all telephone booth point objects would be in one layer, and
all road line objects in another. A data layer contains spatial data—of any of the types
discussed above— as well as attribute (or: thematic) data, which further describes the field or
objects in the layer.
• Attribute data is quite often arranged in tabular form, maintained in some kind of geodatabase, as we will see in Chapter 3. An example of two field data layers is provided in Figure 2.23.
• Data layers can be overlaid with each other, inside the GIS package, so as to study
combinations of geographic phenomena. We shall see later that a GIS can be used to study
the spatial relationships between different phenomena, requiring computations which
overlay one data layer with another. This is schematically depicted in Figure 2.24 for two
different object layers.
• Besides having geometric, thematic and topological properties, geographic phenomena are
also dynamic; they change over time. For an increasing number of applications, these changes
themselves are the key aspect of the phenomenon to study. Examples include identifying the
owners of a land parcel in 1972, or how land cover in a certain area changed from native
forest to pastures over a specific time period. We can note that some features or phenomena
change slowly, such as geological features, or as in the example of land cover given above.
Other phenomena change very rapidly, such as the movement of people or atmospheric
conditions. For different applications, different scales of measurement will apply.
• Examples of the kinds of questions involving time include:
o Where and when did something happen?
o How fast did this change occur?
o In which order did the changes happen?
• The way we represent relevant components of the real world in our models can influence the kinds of questions we can or cannot answer. This chapter has already discussed representation issues for spatial features, but has so far ignored the problematic issues of incorporating time. The main reason lies in the fact that GISs still offer limited support for the representation of time. As a result, most studies require substantial efforts from the GIS user in data preparation and data manipulation.
UNIT 2
Data management and processing systems
GIS software
• GIS can be considered to be a data store (i.e. a system that stores spatial data), a toolbox, a
technology, an information source or a field of science. The main characteristics of a GIS
software package are its analytical functions that provide means for deriving new
geoinformation from existing spatial and attribute data.
• The use of tools for problem solving is one thing, but the production of these tools is
something quite different. Not all tools are equally well-suited for a particular application, and
they can be improved and perfected to better serve a particular need or application.
• The discipline of geographic information science is driven by the use of our GIS tools, and
these are in turn improved by new insights and information gained through their application
in various scientific fields.
• All GIS packages available on the market have their strengths and weaknesses, typically
resulting from the development history and/or intended application domain(s) of the
package.
• Well-known, full-fledged GIS packages include ILWIS, Intergraph's GeoMedia, ESRI's ArcGIS, and MapInfo from MapInfo Corp.
Fig: The four functional components of a GIS: data capture and preparation; data storage and maintenance; data manipulation and analysis; data presentation.
• GIS software packages provide support for both spatial and attribute data, i.e. they accommodate
spatial data storage using a vector approach, and attribute data using tables. Historically,
however, database management systems (DBMSs) have been based on the notion of tables for
data storage.
• For some time, substantial GIS applications have been able to link to an external database to
store attribute data and make use of its superior data management functions.
• Currently, all major GIS packages provide facilities to link with a DBMS and exchange attribute data with it.
Table 3.4 lists several different methods and devices used for the presentation of spatial data. Cartography and scientific visualization make use of these methods and devices to produce their products.
• A DBMS supports the storage and manipulation of very large data sets.
• A DBMS can be instructed to guard over data correctness.
• A DBMS supports the concurrent use of the same data set by many users.
• A DBMS provides a high-level, declarative query language.
• A DBMS supports the use of a data model. A data model is a language with which one can
define a database structure and manipulate the data stored in it.
• A DBMS includes data backup and recovery functions to ensure data availability at all times.
• A DBMS allows the control of data redundancy.
• When a relation is created, we need to indicate what type of tuples it will store (a small sketch follows this list). This means that we must:
1. Provide a name for the relation,
2. Indicate which attributes it will have, and
3. Set the domain of each attribute
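These three steps can be tried out with Python's built-in sqlite3 module; the table below is a hypothetical, cut-down version of the buoy measurements relation from Chapter 1 (names and domains chosen for illustration only).

```python
import sqlite3

con = sqlite3.connect(":memory:")
# 1. name the relation, 2. list its attributes, 3. set each attribute's domain
con.execute("""
    CREATE TABLE DayMeasurements (
        Buoy   TEXT,
        Date   TEXT,
        SST    REAL,   -- sea surface temperature, degrees C
        Humid  REAL,   -- air humidity, per cent
        Temp10 REAL    -- water temperature at 10 m depth
    )""")
con.execute("INSERT INTO DayMeasurements VALUES ('B0749','1997/12/03',28.2,72,22.2)")
for row in con.execute("SELECT Buoy, Temp10 FROM DayMeasurements WHERE SST > 27"):
    print(row)   # ('B0749', 22.2)
```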
The designer of a GIS application can choose whether to store the application data in the GIS or in the DBMS. Spatial databases, also known as geodatabases, are implemented directly on an existing DBMS, using extension software that allows them to handle spatial objects.
• A spatial database allows users to store, query and manipulate collections of spatial data.
• There are several advantages in doing this: spatial data can be stored in a special database column, known as the geometry column (or feature or shape, depending on the specific software package). This means GISs can rely fully on DBMS support for spatial data, making use of the DBMS for data query and storage (and multi-user support), and of the GIS for spatial functionality. Small-scale GIS applications may not require multi-user capability, and can be supported by spatial data support from a personal database.
• A geodatabase allows a wide variety of users to access large data sets (both geographic and
alphanumeric), and the management of their relations, guaranteeing their integrity. The Open
Geospatial Consortium (OGC) has released a series of standards relating to geodatabases that
(amongst other things), define :
o Which tables must be present in a spatial database (i.e. geometry columns table and
spatial reference system table)
o The data formats, called 'Simple Features' (i.e. point, line, polygon, etc.)
o A set of SQL-like instructions for geographic analysis.
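The Simple Features idea can be sketched with the shapely Python package, which implements OGC-style geometries and topological predicates (the parcel and point below are hypothetical):

```python
from shapely.geometry import Point, Polygon

parcel = Polygon([(0, 0), (40, 0), (40, 30), (0, 30)])  # a 'Simple Feature'
booth = Point(12, 9)

print(parcel.contains(booth))        # topological query: True
print(parcel.area, parcel.length)    # geometric analysis of the feature
```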
UNIT 3
• The Geoid is used to describe heights. In order to establish the Geoid as reference for heights, the
ocean's water level is registered at coastal places over several years using tide gauges (mareographs).
• Averaging the registrations largely eliminates variations of the sea level with time. The resulting water
level represents an approximation to the Geoid and is called the mean sea level.
The ellipsoid
• The physical surface called the Geoid is used as a reference surface for heights. A reference surface for the description of the horizontal coordinates of points of interest is also required.
• This will later be used to project these horizontal coordinates onto a mapping plane; the reference surface for horizontal coordinates requires a mathematical definition and description. The most convenient geometric reference is the oblate ellipsoid.
• It provides a relatively simple figure which fits the Geoid to a first order approximation, though for
small scale mapping purposes a sphere may be used. An ellipsoid is formed when an ellipse is rotated
about its minor axis. This ellipse which defines an ellipsoid or spheroid is called a meridian ellipse.
• The shape of an ellipsoid may be defined in a number of ways, but in geodetic practice the definition is usually by its semi-major axis and flattening. The flattening f depends on both the semi-major axis a and the semi-minor axis b: f = (a − b) / a.
• We can easily transform ITRF coordinates (X, Y and Z in metres) into geographic coordinates (φ, λ, h) with respect to the GRS80 ellipsoid without loss of accuracy. However, the ellipsoidal height h, obtained through this straightforward transformation, has no physical meaning and does not correspond to the intuitive human perception of height. We therefore use the height H, above the Geoid (see Figure 4.8).
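A sketch of this transformation in Python, using the standard iterative method on the GRS80 ellipsoid (the ellipsoid parameters are the published GRS80 values; the test point is hypothetical):

```python
import math

a = 6378137.0                 # GRS80 semi-major axis (m)
f = 1 / 298.257222101         # GRS80 flattening
e2 = f * (2 - f)              # first eccentricity squared

def ecef_to_geodetic(X, Y, Z, iterations=10):
    """Convert geocentric (X, Y, Z) to geographic (lat, lon, ellipsoidal h)."""
    lon = math.atan2(Y, X)
    p = math.hypot(X, Y)
    lat = math.atan2(Z, p * (1 - e2))            # initial guess
    h = 0.0
    for _ in range(iterations):                  # iterate until stable
        N = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - N
        lat = math.atan2(Z, p * (1 - e2 * N / (N + h)))
    return math.degrees(lat), math.degrees(lon), h

print(ecef_to_geodetic(3889000.0, 470000.0, 5027000.0))  # hypothetical point
```

Note that h here is the ellipsoidal height; to obtain the Geoid-related height H one still subtracts the local Geoid undulation N_geoid: H = h − N_geoid.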
Coordinate systems
• Different kinds of coordinate systems are used to position data in space. Spatial (or global)
coordinate systems are used to locate data either on the Earth's surface in a 3D space, or on the
Earth's reference surface in a 2D space. Examples are the geographic coordinate system in 2D and 3D space, and the geocentric coordinate system, also known as the 3D Cartesian coordinate system. Planar coordinate systems, on the other hand, are used to locate data on the flat surface of the map in a
2D space.
• The latitude (φ) of a point P is the angle between the ellipsoidal normal through P′ and the equatorial plane. Latitude is zero on the equator (φ = 0°) and increases towards the two poles to maximum values of φ = +90° (N 90°) at the North Pole and φ = −90° (S 90°) at the South Pole.
• The longitude (λ) is the angle between the meridian ellipse which passes through Greenwich and the meridian ellipse containing the point in question. It is measured in the equatorial plane from the meridian of Greenwich (λ = 0°), either eastwards through λ = +180° (E 180°) or westwards through λ = −180° (W 180°).
• It should be noted that the rotational axis of the earth changes its position over time (referred to as
polar motion). To compensate for this, the mean position of the pole in the year 1903 (based on
observations between 1900 and 1905) has been used to define the so-called Conventional
International Origin (CIO).
• In principle, this reference line can be chosen freely. However, in practice three different directions
are widely used: True North, Grid North and Magnetic North. The corresponding bearings are called:
true (or geodetic) bearing, grid bearing and magnetic (or compass) bearing.
Map projections
• A map projection is a mathematically described technique of how to represent the Earth's curved
surface on a flat map. To represent parts of the surface of the Earth on a flat paper map or on a
computer screen, the curved horizontal reference surface must be mapped onto the 2D mapping
plane. The reference surface for large-scale mapping is usually an oblate ellipsoid, and for small-scale
mapping, a sphere.
• Mapping onto a 2D mapping plane means transforming each point on the reference surface with geographic coordinates (φ, λ) to a set of Cartesian coordinates (x, y) representing positions on the map plane.
• The actual mapping cannot usually be visualized as a true geometric projection directly onto the mapping plane; instead, it is achieved through mapping equations.
• A forward mapping equation transforms the geographic coordinates (φ, λ) of a point on the curved reference surface to a set of planar Cartesian coordinates (x, y), representing the position of the same point on the map plane: (x, y) = f(φ, λ). The corresponding inverse mapping equation transforms the planar Cartesian coordinates (x, y) of a point on the map plane back to a set of geographic coordinates (φ, λ) on the curved reference surface: (φ, λ) = f⁻¹(x, y).
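As a toy illustration of forward and inverse mapping equations, the sketch below uses the simple equidistant cylindrical (Plate Carrée) projection on a sphere; R and the test coordinates are hypothetical choices, not the book's example:

```python
import math

R = 6371000.0   # sphere radius in metres (suitable for small-scale mapping)

def forward(phi_deg, lam_deg):
    """(x, y) = f(phi, lambda) for the equidistant cylindrical projection."""
    return R * math.radians(lam_deg), R * math.radians(phi_deg)

def inverse(x, y):
    """(phi, lambda) = f^-1(x, y), the corresponding inverse equation."""
    return math.degrees(y / R), math.degrees(x / R)

x, y = forward(52.0, 6.9)
print(inverse(x, y))   # recovers (52.0, 6.9)
```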
Coordinate transformations
• Map and GIS users are mostly confronted in their work with transformations from one two-dimensional coordinate system to another. This includes the transformation of polar coordinates delivered by the surveyor into Cartesian map coordinates, or the transformation from one 2D Cartesian (x, y) system of a specific map projection into another 2D Cartesian (x′, y′) system of a defined map projection.
• Datum transformations are transformations from a 3D coordinate system (i.e. horizontal datum) into another 3D coordinate system. These kinds of transformations are also important for map and GIS users, who usually collect spatial data in the field using satellite navigation technology and need to represent this data on a published map based on a local horizontal datum.
x = d sin(a)
y = d cos(a)
The inverse equations are: a = tan⁻¹(x / y) and d² = x² + y².
• A more realistic case makes use of a translation and a rotation to transform one system to the other.
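A short Python sketch of both steps mentioned above (all parameters hypothetical): first the polar-to-Cartesian conversion using the equations just given (bearing a measured from north), then a translation plus rotation into a target Cartesian system.

```python
import math

def polar_to_cartesian(d, a_deg):
    a = math.radians(a_deg)
    return d * math.sin(a), d * math.cos(a)   # x = d sin(a), y = d cos(a)

def to_target_system(x, y, tx, ty, alpha_deg):
    """Rotate by alpha and translate by (tx, ty) into the target system."""
    r = math.radians(alpha_deg)
    return (x * math.cos(r) - y * math.sin(r) + tx,
            x * math.sin(r) + y * math.cos(r) + ty)

x, y = polar_to_cartesian(100.0, 30.0)        # a surveyor's observation
print(to_target_system(x, y, 5000.0, 2000.0, 1.5))
```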
Datum transformations
• A change of map projection may also include a change of the horizontal datum. This is the
case when the source projection is based upon a different horizontal datum than the target
projection. If the difference in horizontal datums is ignored, there will not be a perfect match
between adjacent maps of neighboring countries or between overlaid maps originating from
different projections.
• It may result in up to several hundred meters difference in the resulting coordinates.
Therefore, spatial data with different underlying horizontal datums may need a so-called
datum transformation.
• Suppose we wish to transform spatial data from the UTM projection to the Dutch RD
system, and that the data in the UTM system are related to the European Datum 1950
(ED50), while the Dutch RD system is based on the Amersfoort datum.
• In this example the change of map projection should be combined with a datum transformation step for a perfect match. This is illustrated in Figure 4.23.
A satellite-based positioning system comprises three segments:
1. The space segment, i.e. the satellites that orbit the Earth, and the radio signals that they
emit,
2. The control segment, i.e. the ground stations that monitor and maintain the space segment
components, and
3. The user segment, i.e. the users with their hard- and software to conduct positioning
Absolute positioning
The working principles of absolute, satellite-based positioning are fairly simple:
1. A satellite, equipped with a clock, at a specific moment sends a radio message that includes:
a) The satellite identifier,
b) Its position in orbit, and
c) Its clock reading.
2. A receiver on or above the planet, also equipped with a clock, receives the message slightly later, and
reads its own clock.
3. From the time delay observed between the two clock readings, and knowing the speed of radio
transmission through the medium between (satellite) sender and receiver, the receiver can compute the distance to the sender, also known as the satellite's pseudorange.
• The pseudorange of a satellite with respect to a receiver, is its apparent distance to the receiver,
computed from the time delay with which its radio signal is received.
• Such a computation determines the position of the receiver to be on a sphere of radius equal to
the computed pseudorange (refer to Figure 4.24(a)).
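The distance computation in step 3 is simply propagation speed times time delay; the sketch below shows it with hypothetical clock readings (real receivers must also correct for clock and atmospheric errors, which is why the result is only a pseudorange):

```python
C = 299792458.0   # propagation speed of the radio signal, m/s (vacuum value)

def pseudorange(t_sent, t_received):
    """Apparent satellite-receiver distance from the observed time delay."""
    return C * (t_received - t_sent)

# a delay of 73 milliseconds corresponds to roughly 21,900 km
print(pseudorange(0.000, 0.073))
```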
Relative positioning
• One technique to remove errors from positioning computations is to perform many position
computations, and to determine the average over the solutions. Many receivers allow the user
to do so.
• It should however be clear from the above that averaging may address random errors like signal
noise, selective availability (SA) and multi-path to some extent, but not systematic sources of
error, like incorrect satellite data, atmospheric delays, and GDOP effects. These sources should
be removed before averaging is applied. It has been shown that averaging over 60 minutes in
absolute, single-point positioning based on code measurements, before systematic error
removal, leads only to a 10-20% improvement of accuracy.
• In such cases, receiver averaging is therefore of limited value, and requires long periods under
near-optimal conditions.
• In relative positioning, also known as differential positioning, one tries to remove some of the
systematic error sources by taking into account measurements of these errors in a nearby
stationary reference receiver with an accurately known position.
• By using these systematic error findings at the reference, the position of the target receiver of
interest will become known much more precisely.
Network positioning
• Network positioning is an integrated, systematic network of reference receivers covering a large
area like a continent or even the whole globe. The organization of such a network can take
different shapes, augmenting an already existing satellite-based system.
• A general architecture consists of a network of reference stations, strategically positioned in the
area to be covered, each of which is constantly monitoring signals and their errors for all
positioning satellites in view. One or more control centres receive the reference station data,
verify this for correctness, and relay (uplink) this information to a geostationary satellite.
• The satellite will retransmit the correctional data to the area that it covers, so that target
receivers, using their own approximate position, can determine how to correct for satellite signal
error, and consequently obtain much more accurate position fixes.
Positioning technology
• At present, two satellite-based positioning systems are operational (GPS and
GLONASS), and a third is in the implementation phase (Galileo). Respectively, these
are American, Russian and European systems. Any of these, but especially GPS and
Galileo, will be improved over time, and will be augmented with new techniques.
GPS
• The NAVSTAR Global Positioning System (GPS) was declared operational in 1994, providing
Precise Positioning Services (PPS) to US and allied military forces as well as US government
agencies, and Standard Positioning Services (SPS) to civilians throughout the world.
• Its space segment nominally consists of 24 satellites, each of which orbits the Earth in 11 h 58 min at an altitude of 20,200 km. There can be any number of satellites active, typically between 21 and 27.
• The satellites are organized in six orbital planes, somewhat irregularly spaced, with an angle of
inclination of 55-63° with the equatorial plane, nominally having four satellites each (see Figure
4.28).
• This means that a receiver on Earth will have between five and eight (sometimes up to twelve)
satellites in view at any point in time. Software packages exist to help in planning GPS surveys,
identifying expected satellite set-up for any location and time.
GLONASS
• What GPS is to the US military, is GLONASS to the Russian military, specifically the Russian Space
Forces. Both systems were primarily designed on the basis of military requirements.
• The big difference between the two is that GPS generated a major interest in civil applications, thus
having an important economic impact. This cannot be said of GLONASS.
• The GLONASS space segment consists of nominally 24 satellites, organized in three orbital planes,
with an inclination of 64.8◦ with the equator. Orbiting altitude is 19,130 km, with a period of
revolution of 11 hours 16 min. GLONASS uses the PZ–90 as its reference system, and like GPS uses
UTC as time reference, though with an offset for Russian daylight.
Galileo
• In the 1990’s, the European Union (EU) judged that it needed to have its own satellite-based
positioning system, to become independent of the GPS monopoly and to support its own economic
growth by providing services of high reliability under civilian control.
• Galileo is the name of this EU system. The vision is that satellite-based positioning will become
even bigger due to the emergence of mobile phones equipped with receivers, perhaps with some
400 million users by the year 2015.
• Development of the system has experienced substantial delays, and at the time of writing
European ministers insist that Galileo should be up and running by the end of 2013. The
completed system will have 27 satellites, with three in reserve, orbiting in one of three, equally
spaced, circular orbits at an elevation of 23,222 km, inclined 56◦ with the equator. This higher
inclination, when compared to that of GPS, has been chosen to provide better positioning
coverage at high latitudes, such as northern Scandinavia where GPS performs rather poorly.
Spatial data can be obtained from various sources. It can be collected from scratch, using direct spatial
data acquisition techniques, or indirectly, by making use of existing spatial data collected by others.
Digitizing
• A traditional method of obtaining spatial data is through digitizing existing paper maps. This
can be done using various techniques. Before adopting this approach, one must be aware that
positional errors already in the paper map will further accumulate, and one must be willing
to accept these errors.
• There are two forms of digitizing: on-tablet and on-screen manual digitizing. In on-tablet
digitizing, the original map is fitted on a special surface (the tablet), while in on-screen
digitizing, a scanned image of the map (or some other image) is shown on the computer screen.
• In both of these forms, an operator follows the map's features with a mouse device, thereby
tracing the lines, and storing location coordinates relative to a number of previously defined
control points.
• The function of these points is to 'lock' a coordinate system onto the digitized data: the control
points on the map have known coordinates, and by digitizing them we tell the system
implicitly where all other digitized locations are. At least three control points are needed, but
preferably more should be digitized to allow a check on the positional errors made.
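The role of control points can be sketched as fitting an affine transformation from digitizer coordinates to map coordinates by least squares (all coordinates below are hypothetical; with more than three control points, the residuals give a check on the positional errors made):

```python
import numpy as np

# digitizer (tablet) coordinates of four control points
src = np.array([(10.1, 20.3), (80.4, 21.0), (79.8, 95.2), (11.0, 94.7)])
# known map coordinates of the same control points
dst = np.array([(1000.0, 2000.0), (8000.0, 2000.0),
                (8000.0, 9500.0), (1000.0, 9500.0)])

# solve x' = a*x + b*y + c and y' = d*x + e*y + f by least squares
A = np.column_stack([src[:, 0], src[:, 1], np.ones(len(src))])
abc, res_x, *_ = np.linalg.lstsq(A, dst[:, 0], rcond=None)
def_, res_y, *_ = np.linalg.lstsq(A, dst[:, 1], rcond=None)
print(abc, def_)   # residuals res_x, res_y indicate the digitizing error
```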
Scanning
• A scanner is an input device that illuminates a document and measures the intensity of the
reflected light with a CCD array. The result is an image as a matrix of pixels, each of which
holds an intensity value.
• Office scanners have a fixed maximum resolution, expressed as the highest number of pixels
they can identify per inch; the unit is dots-per-inch (dpi). For manual on-screen digitizing of a
paper map, a resolution of 200-300 dpi is usually sufficient, depending on the thickness of the
thinnest lines. For manual on-screen digitizing of aerial photographs, higher resolutions are
recommended — typically, at least 800 dpi.
• After scanning, the resulting image can be improved with various image processing
techniques. It is important to understand that scanning does not result in a structured data
set of classified and coded objects. Additional work is required to recognize features and to
associate categories and other thematic attributes with them.
Vectorization
• The process of distilling points, lines and polygons from a scanned image is called vectorization. As scanned
lines may be several pixels wide, they are often first thinned to retain only the centreline. The remaining
centreline pixels are converted to series of (x, y) coordinate pairs, defining a polyline.
• Subsequently, features are formed and attributes are attached to them. This process may be entirely
automated or performed semi- automatically, with the assistance of an operator. Pattern recognition
methods—like Optical Character Recognition (OCR) for text—can be used for the automatic detection of
graphic symbols and text.
• Vectorization causes errors such as small spikes along lines, rounded corners, errors in T- & X-junctions,
displaced lines or jagged curves. These errors are corrected in an automatic or interactive post-processing
phase. The phases of the vectorization process are illustrated in Figure below.
Access to existing spatial data is often provided by a web portal which categorizes all available data and provides a local search engine and links to data documentation, also called metadata.
Metadata
• Metadata is defined as background information that describes all necessary information about
the data itself. More generally, it is known as 'data about data'.
• This includes:
o Identification information: data source(s), time of acquisition, etc.
o Data quality information: positional, attribute and temporal accuracy, lineage, etc.
o Entity and attribute information: related attributes, units of measure, etc.
Positional accuracy
• The surveying and mapping profession has a long tradition of determining and minimizing errors. This
applies particularly to land surveying and photogrammetry, both of which tend to regard positional
and height errors as undesirable.
• Cartographers also strive to reduce geometric and attribute errors in their products, and, in addition,
define quality in specifically cartographic terms, for example quality of linework, layout, and clarity of
text. It must be stressed that all measurements made with surveying and photogrammetric instruments
are subject to error.
• These include:
1. Human errors in measurement (e.g. reading errors), generally referred to as gross errors or blunders. These are usually large errors resulting from carelessness which could be avoided through careful observation, although it is never absolutely certain that all blunders have been avoided or eliminated.
2. Instrumental or systematic errors (e.g. due to misadjustment of instruments). This leads to errors that vary systematically in sign and/or magnitude, and can go undetected even when the measurement is repeated with the same instrument. Systematic errors are particularly dangerous because they tend to accumulate.
3. Random errors caused by natural variations in the quantity being measured. These are effectively the errors that remain after blunders and systematic errors have been removed. They are usually small, and are dealt with by least-squares adjustment; a more general way of quantifying positional accuracy is the root mean square error (RMSE).
• Measurement errors are generally described in terms of accuracy. In the case of spatial data, accuracy
may relate not only to the determination of coordinates (positional error) but also to the measurement
of quantitative attribute data. The accuracy of a single measurement can be defined as:
• "The closeness of observations, computations or estimates to the true values or the values perceived
to be true".
Accuracy tolerances
• Many kinds of measurement can be naturally represented by a bell-shaped probability density function p. This function is known as the normal (or Gaussian) distribution of a continuous, random variable, in the figure indicated as Y. Its shape is determined by two parameters: μ, which is the mean expected value for Y, and σ, which is the standard deviation of Y. A small σ leads to a more attenuated bell shape.
• Any probability density function p has the characteristic that the area between its curve and the horizontal axis has size 1. Probabilities P can be inferred from p as the size of an area under p's curve. The figure above, for instance, depicts P(μ − σ < Y < μ + σ), i.e. the probability that the value for Y is within distance σ from μ. In a normal distribution this specific probability for Y is always 0.6826.
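The quoted probability can be reproduced from the error function; a minimal Python sketch:

```python
import math

def prob_within(k):
    """P(mu - k*sigma < Y < mu + k*sigma) for a normally distributed Y."""
    return math.erf(k / math.sqrt(2))

print(round(prob_within(1), 4))   # 0.6827 (the 0.6826 quoted above, rounded)
print(round(prob_within(2), 4))   # 0.9545
```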
Attribute Accuracy
• Two types of attribute accuracies, related to the type of data it is dealing with:
o For nominal or categorical data, the accuracy of labeling (for example the type of land
cover, road surface, etc).
o For numerical data, numerical accuracy (such as the concentration of pollutants in the
soil, height of trees in forests, etc).
• It follows that depending on the data type, assessment of attribute accuracy may range from a
simple check on the labelling of features—for example, is a road classified as a metalled road
actually surfaced or not?—to complex statistical procedures for assessing the accuracy of
numerical data, such as the percentage of pollutants present in the soil.
Temporal Accuracy
• The volume of spatial data captured through remote sensing has increased enormously over the last decade. These data can provide useful temporal information, such as changes in land ownership and the monitoring of environmental processes such as deforestation.
• Analogous to its positional and attribute components, the quality of spatial data may also be
assessed in terms of its temporal accuracy. For a static feature this refers to the difference in the
values of its coordinates at two different times.
• This includes not only the accuracy and precision of time measurements but also the temporal
consistency of different data sets. Because the positional and attribute components of spatial
data may change together or independently, it is also necessary to consider their temporal
validity. For example, the boundaries of a land parcel may remain fixed over a period of many
years whereas the ownership attribute may change more frequently.
Lineage
• Lineage describes the history of a data set. In the case of published maps, some lineage
information may be provided as part of the metadata, in the form of a note on the data sources
and procedures used in the compilation of the data.
• Examples include the date and scale of aerial photography, and the date of field verification. For
digital data sets, however, lineage may be defined as: "that part of the data quality statement
that contains information that describes the source of observations or materials, data acquisition
and compilation methods, conversions, transformations, analyses and derivations that the data
has been subjected to, and the assumptions and criteria applied at any stage of its life."
Completeness
• Completeness refers to whether there are data lacking in the database compared to what exists
in the real world. Essentially, it is important to be able to assess what does and what does not
belong to a complete dataset as intended by its producer.
• It might be incomplete (i.e. it is 'missing' features which exist in the real world), or overcomplete (i.e. it contains 'extra' features which do not belong within the scope of the data set as it is defined). Completeness can relate to spatial, temporal, or thematic aspects of a data set.
• For example, a data set of property boundaries might be spatially incomplete because it contains
only 10 out of 12 suburbs; it might be temporally incomplete because it does not include recently
subdivided properties; and it might be thematically overcomplete because it also includes
building footprints.
Logical consistency
• For any particular application, (predefined) logical rules concern:
o The compatibility of data with other data in a data set (e.g. in terms of data format),
o The absence of any contradictions within a data set,
o The topological consistency of the data set, and
o The allowed attribute value ranges, as well as combinations of attributes. For example,
attribute values for population, area, and population density must agree for all entities in
the database. The absence of any inconsistencies does not necessarily imply that the data
are accurate.
Rasterization or vectorization
• Vectorization produces a vector data set from a raster. We have looked at this in some sense
already: namely in the production of a vector set from a scanned image. Another form of
vectorization takes place when we want to identify features or patterns in remotely sensed
imagery. The keywords here are feature extraction and pattern recognition, which are dealt with
in Principles of Remote Sensing.
• If much or all of the subsequent spatial data analysis is to be carried out on raster data, one may
want to convert vector data sets to raster data. This process is known as rasterization.
• It involves assigning point, line and polygon attribute values to raster cells that overlap with the
respective point, line or polygon. To avoid information loss, the raster resolution should be
carefully chosen on the basis of the geometric resolution.
• A cell size which is too large may result in cells that cover parts of multiple vector features, and
then ambiguity arises as to what value to assign to the cell. If, on the other hand, the cell size is
too small, the file size of the raster may increase significantly.
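A bare-bones Python sketch of point rasterization (hypothetical grid and points): each point's attribute value is assigned to the raster cell it overlaps, and a too-coarse cell size would make two points collide in one cell.

```python
origin_x, origin_y, cell_size = 0.0, 0.0, 10.0
n_cols, n_rows = 5, 4
raster = [[None] * n_cols for _ in range(n_rows)]   # empty raster

points = [((12.0, 31.0), 'well'), ((47.5, 5.0), 'booth')]
for (x, y), value in points:
    col = int((x - origin_x) // cell_size)
    row = int((y - origin_y) // cell_size)
    if 0 <= col < n_cols and 0 <= row < n_rows:
        raster[row][col] = value    # ambiguity arises if a cell gets two values

for row in raster:
    print(row)
```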
Topology generation
• Topological relations may sometimes be needed, for instance in networks, e.g. the questions of
line connectivity, flow direction, and which lines have over- and underpasses. For polygons,
questions that may arise involve polygon inclusion: Is a polygon inside another one, or is the outer
polygon simply around the inner polygon? Many of these questions are mostly questions of data
semantics, and can therefore usually only be answered by a human operator.
When spatial data sets from different sources are combined, several kinds of differences may have to be resolved:
1. They may be about the same area, but differ in accuracy,
2. They may be about the same area, but differ in choice of representation,
3. They may be about adjacent areas, and have to be merged into a single data set.
4. They may be about the same or adjacent areas, but referenced in different coordinate
systems.
Differences in accuracy
• These are clearly relevant in any combination of data sets which may themselves have varying
levels of accuracy. Images come at a certain resolution, and paper maps at a certain scale. This
typically results in differences of resolution of acquired data sets, all the more since map features
are sometimes intentionally displaced to improve readability of the map.
• For instance, the course of a river will only be approximated roughly on a small-scale map, and a
village on its northern bank should be depicted north of the river, even if this means it has to be
displaced on the map a little bit.
• The small scale causes an accuracy error. If we want to combine a digitized version of that map,
with a digitized version of a large-scale map, we must be aware that features may not be where
they seem to be. Analogous examples can be given for images at different resolutions.
• There can be good reasons for having data sets at different scales. A good example is found in
mapping organizations; European organizations maintain a single source database that contains
the base data.
Differences in representation
• Some advanced GIS applications require the possibility of representing the same geographic phenomenon in different ways. These are called multi-representation systems. The production of maps at various scales is an example, but there are numerous others.
• The commonality is that phenomena must sometimes be viewed as points, and at other times as
polygons. For example, a small-scale national road network analysis may represent villages as
point objects, but a nation-wide urban population density study should regard all municipalities
as represented by polygons.
• The links between the various representations for the same object, maintained by the system, allow switching between them, and many applications of their use seem possible. A comparison is illustrated in Figure 5.11.
• Examples include defining homogeneous areas (polygons) from our point data, or deriving
contour lines. This is generally referred to as interpolation, i.e. the calculation of a value from
'surrounding' observations. The principle of spatial autocorrelation plays a central part in the
process of interpolation.
• In order to predict the value of a point for a given (x, y) location, we could simply find the
'nearest' known value to the point, and assign that value. This is the simplest form of
interpolation, known as nearest-neighbour interpolation. We might instead choose to use the
distance that points are away from (x, y) to weight their importance in our calculation.
• A simple example is given in Figure 5.13. Our field survey has taken only two measurements,
one at P and one at Q. The values obtained in these two locations are represented by a dark
and light green tint, respectively. If we are dealing with qualitative data, and we have no
further knowledge, the only assumption we can make for other locations is that those nearer
to P probably have P ’s value, whereas those nearer to Q have Q’s value. This is illustrated in
part (a)
• In Figure 5.15, we have used the same set of point measurements with four different
approximation functions. Part (a) has been determined under the assumption that the field can
be approximated by a tilted plane, in this case with a downward slope to the southeast. The values
found by regression techniques were c1 = -1.83934, c2 = 1.61645 and c3 = 70.8782, giving
f(x, y) = -1.83934 * x + 1.61645 * y + 70.8782.
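As a hedged illustration (not taken from the book), the sketch below shows how such a tilted-plane fit can be computed with ordinary least squares. The sample points here are hypothetical, so the coefficients will differ from the figure's values:

```python
import numpy as np

# Hypothetical sample points: (x, y, measured field value)
samples = np.array([
    (10.0, 20.0, 95.0),
    (30.0, 15.0, 40.0),
    (25.0, 40.0, 90.0),
    (50.0, 30.0, 25.0),
    (40.0, 50.0, 75.0),
])
x, y, v = samples[:, 0], samples[:, 1], samples[:, 2]

# Design matrix for the tilted-plane model f(x, y) = c1*x + c2*y + c3
A = np.column_stack([x, y, np.ones_like(x)])

# Ordinary least-squares regression for the three coefficients
(c1, c2, c3), *_ = np.linalg.lstsq(A, v, rcond=None)

def f(px, py):
    """Approximated field value at location (px, py)."""
    return c1 * px + c2 * py + c3

print(f"f(x, y) = {c1:.5f}*x + {c2:.5f}*y + {c3:.5f}")
print("prediction at (20, 25):", round(f(20.0, 25.0), 2))
```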
Triangulation
• Another way of interpolating point measurements is by triangulation. Triangulated Irregular
Networks (TINs) technique constructs a triangulation of the study area from the known
measurement points. Preferably, the triangulation should be a Delaunay triangulation.
• After having obtained it, we may define for which values of the field we want to construct isolines.
For instance, for elevation, we might want to have the 100 m- isoline, the 200 m-isoline, and so
on.
• For each edge of a triangle, a geometric computation can be performed that indicates which
isolines intersect it, and at what positions they do so. A list of computed locations, all at the same
field value, is used by the GIS to construct the isoline. This is illustrated in Figure below
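A minimal sketch of that per-edge geometric computation, assuming the field value changes linearly along a triangle edge (coordinates and values are hypothetical):

```python
def isoline_crossing(p1, v1, p2, v2, level):
    """Return the (x, y) point where the isoline at 'level' crosses the
    triangle edge p1-p2, assuming the field varies linearly along the edge;
    returns None when the isoline does not intersect this edge."""
    if (v1 - level) * (v2 - level) > 0:   # both endpoints on the same side
        return None
    if v1 == v2:                          # flat edge: no single crossing point
        return None
    t = (level - v1) / (v2 - v1)          # relative position along the edge
    return (p1[0] + t * (p2[0] - p1[0]),
            p1[1] + t * (p2[1] - p1[1]))

# Example: the 100 m isoline crossing an edge from 80 m to 120 m elevation
print(isoline_crossing((0.0, 0.0), 80.0, (10.0, 0.0), 120.0, 100.0))  # (5.0, 0.0)
```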
Moving window averaging
• Moving window averaging attempts to derive a raster directly from a set of sample points: a
window is moved over the raster cell by cell, and the value of each cell is computed by averaging
the measurements that fall inside the window.
• In part (b) of the figure, the 295th cell value out of the 418 in total is being computed. This
computation is based on eleven measurements, while that of the first cell had no measurements
available. Where this is the case, the cell should be assigned a value that signals this 'non-
availability of measurements'.
• The principle of spatial autocorrelation suggests that measurements closer to the cell centre
should have greater influence on the predicted value than those further away. In order to account
for this, a distance factor can be brought into the averaging function. Functions that do this are
called inverse distance weighting functions (IDW). This is one of the most commonly used
functions in interpolating spatial data.
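A minimal IDW sketch with hypothetical sample points; the weighting power (commonly 2) is a user choice, and real GIS packages offer further options such as search radii:

```python
import numpy as np

def idw(known_xy, known_values, target_xy, power=2.0):
    """Inverse distance weighted prediction at target_xy: nearer sample
    points get larger weights (1 / distance**power)."""
    d = np.linalg.norm(np.asarray(known_xy, float) - np.asarray(target_xy, float), axis=1)
    if np.any(d == 0.0):              # target coincides with a sample point
        return float(np.asarray(known_values)[np.argmin(d)])
    w = 1.0 / d ** power              # nearer points weigh more
    return float(np.sum(w * np.asarray(known_values)) / np.sum(w))

samples = [(10.0, 20.0), (30.0, 15.0), (25.0, 40.0)]
values = [95.0, 40.0, 90.0]
print(round(idw(samples, values, (20.0, 25.0)), 2))
```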
Kriging
• Kriging was originally developed by mining geologists attempting to derive accurate estimates of
mineral deposits in a given area from limited sample measurements. It is an advanced
interpolation technique belonging to the field of geostatistics, which can deliver good results if
applied properly and with enough sample points.
• Kriging is usually used when the variation of an attribute and/or the density of sample points is
such that simple methods of interpolation may give unreliable predictions.
• The first step in the kriging procedure is to compare successive pairs of point measurements to
generate a semi-variogram.
• In the second step, the semi-variogram is used to calculate the weights used in interpolation.
Although kriging is a powerful technique, it should not be applied without a good understanding
of geostatistics, including the principle of spatial autocorrelation. It should be noted that there is
no single best interpolation method, since each method has advantages and disadvantages in
particular contexts.
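As a rough sketch of the first step only (hypothetical data, assuming simple distance-bin grouping of point pairs), an experimental semi-variogram averages half the squared value differences per lag distance:

```python
import numpy as np

def experimental_semivariogram(xy, v, lags, tol):
    """Semi-variance per lag: half the mean squared value difference over
    all point pairs whose separation is within 'tol' of that lag."""
    xy = np.asarray(xy, dtype=float)
    v = np.asarray(v, dtype=float)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)  # pair distances
    sq = (v[:, None] - v[None, :]) ** 2                           # squared differences
    i, j = np.triu_indices(len(v), k=1)                           # each pair once
    d, sq = d[i, j], sq[i, j]
    return np.array([0.5 * sq[np.abs(d - h) <= tol].mean()
                     if np.any(np.abs(d - h) <= tol) else np.nan
                     for h in lags])

# Hypothetical points along a transect and their measured values
xy = [(0, 0), (1, 0), (2, 0), (3, 0)]
v = [10.0, 12.0, 11.0, 15.0]
print(experimental_semivariogram(xy, v, lags=[1, 2, 3], tol=0.5))
```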
• As a general guide, the following questions should be considered in selecting an appropriate
method of interpolation:
o For what type of application will the results be used?
UNIT 4
2) Overlay functions
These belong to the most frequently used functions in a GIS application. They allow the combination
of two (or more) spatial data layers, comparing them position by position, and treating areas of
overlap and of non-overlap in distinct ways (a raster sketch follows this list). In this way, we can find
• The cotton fields on black soils (select the 'cotton' cover in the crop data layer and the 'black'
cover in the soil data layer and perform an intersection),
• The fields where cotton or jowar is the crop (select both areas of 'cotton' and 'jowar' cover in
the crop data layer and take their union),
• The cotton fields not on red soils (perform a difference operator of areas with 'cotton' cover
with the areas having red soil),
• The fields that do not have wheat as crop (take the complement of the wheat areas).
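On raster layers that share the same grid, these overlays reduce to cell-wise boolean operations. A minimal numpy sketch with hypothetical crop and soil codes:

```python
import numpy as np

# Hypothetical coded rasters on the same grid
crop = np.array([[1, 1, 2],
                 [2, 1, 3],
                 [3, 2, 1]])     # 1 = cotton, 2 = jowar, 3 = wheat
soil = np.array([[1, 2, 2],
                 [1, 1, 2],
                 [2, 1, 1]])     # 1 = black, 2 = red

cotton_on_black = (crop == 1) & (soil == 1)   # intersection
cotton_or_jowar = (crop == 1) | (crop == 2)   # union
cotton_not_red  = (crop == 1) & ~(soil == 2)  # difference
not_wheat       = ~(crop == 3)                # complement
print(cotton_on_black.astype(int))
```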
3) Neighborhood functions
Whereas overlays combine features at the same location, neighborhood functions evaluate the
characteristics of an area surrounding a feature's location. A neighborhood function 'scans' the
neighborhood of the given feature(s), and performs a computation on it.
• Search functions allow the retrieval of features that fall within a given search window. This
window may be a rectangle, circle, or polygon.
• Buffer zone generation (or buffering) is one of the best known neighborhood functions. It
determines a spatial envelope (buffer) around (a) given feature(s). The created buffer may
have a fixed width, or a variable width that depends on characteristics of the area (see the
sketch after this list).
• Interpolation functions predict unknown values using the known values at nearby locations.
This typically occurs for continuous fields, like elevation, when the data actually stored does
not provide the direct answer for the location(s) of interest.
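As an illustration of buffering on vector data, the sketch below uses the shapely library; the coordinates and buffer widths are hypothetical:

```python
from shapely.geometry import Point, LineString

# Hypothetical features: a well (point) and a road (polyline)
well = Point(155000, 463000)
road = LineString([(154900, 462900), (155200, 463100), (155400, 463000)])

# Fixed-width spatial envelopes around each feature
well_buffer = well.buffer(100.0)   # 100 m circle around the well
road_buffer = road.buffer(50.0)    # 50 m corridor along the road

# A simple search: does the road corridor reach the well's buffer?
print(well_buffer.intersects(road_buffer))
```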
4) Connectivity functions
These functions work on the basis of networks, including road networks, water courses in coastal
zones, and communication lines in mobile telephony. These networks represent spatial linkages
between features. Main functions of this type include:
• Contiguity functions evaluate a characteristic of a set of connected spatial units. One can think
of the search for a contiguous area of forest of certain size and shape in a satellite image.
• Network analytic functions are used to compute over connected line features that make up a
network. The network may consist of roads, public transport routes, high voltage lines or
other forms of transportation infrastructure. Analysis of such networks may entail shortest
path computations (in terms of distance or travel time) between two points in a network for
routing purposes. Other forms are to find all points reachable within a given distance or
duration from a start point for allocation purposes, or determination of the capacity of the
network for transportation between an indicated source location and sink location.
• Visibility functions also fit in this list as they are used to compute the points visible from a
given location (viewshed modelling or viewshed mapping) using a digital terrain model.
Measurement
• Geometric measurement on spatial features includes counting, distance and area size
computations. For the sake of simplicity, this section discusses such measurements in a planar
spatial reference system.
• We limit ourselves to geometric measurements, and do not include attribute data measurement.
Measurements on vector data are more advanced, and thus also more complex, than those on
raster data.
• Area size is associated with polygon features. Again, it can be computed, but usually is stored with
the polygon as an extra attribute value. This speeds up the computation of other functions that
require area size values.
• Another geometric measurement used by the GIS is the minimal bounding box computation. It
applies to polylines and polygons, and determines the minimal rectangle, with sides parallel to the
axes of the spatial reference system, that covers the feature. This is illustrated in Figure 6.1.
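A minimal plain-Python sketch of both measurements on a polygon given as an ordered vertex list (coordinates are hypothetical):

```python
def polygon_area(vertices):
    """Area size via the signed-area (shoelace) formula; vertices are
    (x, y) pairs in order, first vertex not repeated at the end."""
    s = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def bounding_box(vertices):
    """Minimal rectangle with sides parallel to the axes."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys)), (max(xs), max(ys))

square = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(polygon_area(square))    # 12.0
print(bounding_box(square))    # ((0, 0), (4, 3))
```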
• Figure 6.3 shows an example of selection by attribute condition. The query expression is Area <
400000, which can be interpreted as "select all the land use areas of which the size is less than
400,000." The polygons in red are the selected areas; their associated records are also highlighted
in red.
• Atomic conditions can be combined into composite conditions using logical connectives. The most
important ones are AND, OR, NOT and the bracket pair ( ). If we write a composite condition
like Area < 400000 AND LandUse = 80, we select only those areas that satisfy both atomic
conditions.
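Such atomic and composite conditions translate directly into table selections. A small sketch using pandas on a hypothetical attribute table:

```python
import pandas as pd

# Hypothetical attribute table of land use polygons
parcels = pd.DataFrame({
    "Area":    [120000, 520000, 380000, 640000],
    "LandUse": [80,     80,     30,     80],
})

# Atomic condition
small = parcels[parcels["Area"] < 400000]

# Composite condition with a logical connective (AND)
selected = parcels[(parcels["Area"] < 400000) & (parcels["LandUse"] == 80)]
print(selected)
```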
Classification
• Classification is a technique of purposefully removing detail from an input data set, in the hope of
revealing important patterns (of spatial distribution). In the process, we produce an output data
set, so that the input set can be left intact.
• We do so by assigning a characteristic value to each element in the input set, which is usually a
collection of spatial features that can be raster cells or points, lines or polygons. If the number of
characteristic values is small in comparison to the size of the input set, we have classified the input
set.
• The pattern that we look for may be the distribution of household income in a city. Household
income is then called the classification parameter. If we know for each ward in the city the
associated average income, we will have many different values.
• These values can be grouped into three categories (or: classes): 'low', 'moderate' and 'high', with
value ranges provided for each category. If these three categories are mapped in a sensible colour
scheme, this may reveal interesting information. This has been done for Dar es Salaam in
Figure 6.9 in two ways.
User-controlled classification
• In user-controlled classification, a user selects the attribute(s) that will be used as the classification
parameter(s) and defines the classification method. The latter involves declaring the number of
classes as well as the correspondence between the old attribute values and the new classes. This
is usually done via a classification table.
• Another case exists when the classification parameter is nominal or at least discrete. Such an
example is given in Figure 6.10.
Automatic classification
User-controlled classifications require a classification table or user interaction. GIS software can also
perform automatic classification, in which a user only specifies the number of classes in the output data
set. The system automatically determines the class break points. Two main techniques of determining
break points are in use.
1. Equal interval technique
• The minimum and maximum values vmin and vmax of the classification parameter are
determined, and the (constant) interval size for each category is calculated as (vmax - vmin)/n,
where n is the number of classes chosen by the user. This classification is useful in revealing
the distribution patterns, as it determines the number of features in each category.
2. Equal frequency technique
• This technique is also known as quantile classification. The objective is to create categories
with roughly equal numbers of features per category. The total number of features is
determined first and by the required number of categories, the number of features per
category is calculated. The class break points are then determined by counting off the features
in order of classification parameter value.
• Both techniques are illustrated on a small 5 × 5 raster in Figure 6.11.
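Both break-point techniques can be sketched in a few lines of numpy; the values are hypothetical and n = 3 classes (the equal interval version assumes vmax > vmin):

```python
import numpy as np

values = np.array([3, 7, 8, 1, 5, 9, 2, 6, 4, 8])  # hypothetical parameter values
n = 3                                               # classes chosen by the user

# 1. Equal interval: constant class width (vmax - vmin) / n
width = (values.max() - values.min()) / n
equal_interval = np.minimum((values - values.min()) // width, n - 1).astype(int)

# 2. Equal frequency (quantile): roughly the same feature count per class
breaks = np.quantile(values, [1 / 3, 2 / 3])        # class break points
equal_frequency = np.searchsorted(breaks, values, side="right")

print("equal interval: ", equal_interval)
print("equal frequency:", equal_frequency)
```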
Vector overlay operators
• The standard overlay operator for two layers of polygons is the polygon intersection operator.
It is fundamental, as many other overlay operators proposed in the literature or implemented
in systems can be defined in terms of it.
• The result of this operator is the collection of all possible polygon intersections; the attribute
table of the result is a join (in the relational database sense) of the two input attribute tables.
This output attribute table contains one tuple for each intersection polygon found, and this
explains why we call this operator a spatial join.
Raster overlay operators
• Raster overlays are written in a raster calculus, as assignments of the form output := expression.
The expression on the right is evaluated by the GIS, and the raster in which it results is then stored
under the name on the left. The expression may contain references to existing rasters, operators
and functions; the format is made clear below. The raster names and constants that are used in
the expression are called its operands.
Arithmetic operators
• Various arithmetic operators are supported. The standard ones are multiplication (*), division (/),
subtraction (-) and addition (+). Other arithmetic operators may include modulo division (MOD)
and integer division (DIV). Modulo division returns the remainder of division: for example, 11
MOD 5 will return 1, as 11 - 5 * 2 = 1. Similarly, 10 DIV 2 will return 5.
Conditional expressions
• The above comparison and logical operators produce rasters with the truth value true and false.
In practice, we often need a conditional expression with them that allows us to test whether a
condition is fulfilled. The general format is:
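The format itself is missing from these notes. As an illustrative stand-in only (numpy rather than the book's map algebra syntax), a conditional expression takes a condition raster, a value where the condition holds, and a value where it does not:

```python
import numpy as np

elevation = np.array([[ 95, 180, 310],
                      [205, 420, 150],
                      [ 60, 270, 380]])   # hypothetical DEM values (m)

# Comparison operators yield true/false rasters...
is_high = elevation > 200

# ...and a conditional expression picks a value per cell depending on
# whether the condition holds (analogous to a map algebra conditional)
highland = np.where(is_high, 1, 0)
print(highland)
```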
To perform neighborhood analysis, we must state which target locations are of interest and define
how their neighborhood is determined. For instance, our target might be a nearby ATM. Its
neighborhood could be defined as:
o An area within 100 m walking distance of a State Bank ATM, or
o An area within 2 km travel distance, or
o All roads within 500 m travel distance, or
o All other Bank ATM within 5 minutes travel time, or
o All Banks, for which the ATM is the closest.
Having determined the neighborhood, we may want to discover the phenomena that exist or occur
within it, e.g. its spatial extent. We may also require statistical information like:
o The total population of the area, or
o Average household income.
Proximity Computations
• In proximity computations, we use geometric distance to define the neighborhood of one or more
target locations. The most common and useful technique is buffer zone generation.
• In vector-based buffer generation, the buffers themselves become polygon features, usually in a
separate data layer, that can be used in further spatial analysis.
• Buffer generation on rasters is a fairly simple function. The target location or locations are always
represented by a selection of the raster's cells, and geometric distance is defined, using cell
resolution as the unit.
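On rasters, buffer generation amounts to thresholding a distance raster. A sketch using scipy's Euclidean distance transform, on a hypothetical 7x7 grid with distances measured in cell widths:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

target = np.zeros((7, 7), dtype=bool)
target[3, 3] = True                 # the target location's cell

# Euclidean distance of every cell to the nearest target cell,
# measured in multiples of the cell resolution
dist = distance_transform_edt(~target)

buffer_zone = dist <= 2.0           # buffer of two cell widths
print(buffer_zone.astype(int))
```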
• The figure below repeats the Delaunay triangulation discussed earlier; the Thiessen polygon
partition constructed from it is shown on the right.
Computation of Diffusion
• The determination of neighborhood of one or more target locations may depend not only on
distance—cases which we discussed above—but also on direction and differences in the terrain
in different directions. This typically is the case when the target location contains a 'source
material' that spreads over time, referred to as diffusion.
• This 'source material' may be air, water or soil pollution, commuters exiting a train station, people
from an opened-up refugee camp, a water spring uphill, or the radio waves emitted from a radio
relay station. In all these cases, one will not expect the spread to occur evenly in all directions.
There will be local terrain factors that influence the spread, making it easier or more difficult.
• Diffusion computation involves one or more target locations, which are better called source
locations in this context. They are the locations of the source of whatever spreads. The
computation also involves a local resistance raster, which for each cell provides a value that
indicates how difficult it is for the 'source material' to pass by that cell.
• The value in the cell must be normalized: i.e. valid for a standardized length (usually the cell's
width) of spread path. From the source location(s) and the local resistance raster, the GIS will be
able to compute a new raster that indicates how much minimal total resistance the spread has
witnessed for reaching a raster cell. This process is illustrated in Figure below.
• While computing total resistances, the GIS takes proper care of the path lengths. Obviously, the
diffusion from a cell to its neighbor cell to the east follows a shorter path than the diffusion to its
northeast neighbor, which is longer by a factor of √2.
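A minimal sketch of such a diffusion computation, using a Dijkstra-style search over a hypothetical resistance raster; each step costs the average resistance of the two cells times the step length (1 for rook steps, √2 for diagonal steps), matching the normalization described above:

```python
import heapq
import math

def spread_cost(resistance, sources):
    """Minimal total resistance from any source cell to every raster cell."""
    rows, cols = len(resistance), len(resistance[0])
    cost = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:                       # source location(s)
        cost[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > cost[r][c]:                     # stale queue entry
            continue
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    step = math.hypot(dr, dc)  # 1 or sqrt(2)
                    nd = d + step * (resistance[r][c] + resistance[nr][nc]) / 2
                    if nd < cost[nr][nc]:
                        cost[nr][nc] = nd
                        heapq.heappush(heap, (nd, nr, nc))
    return cost

resist = [[1, 1, 5],
          [1, 3, 5],
          [1, 1, 1]]                           # hypothetical local resistances
for row in spread_cost(resist, [(0, 0)]):
    print([round(v, 2) for v in row])
```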
Flow Computation
• Flow computations determine how a phenomenon spreads over the area, in principle in all
directions, though with varying difficulty or resistance. There are also cases where a phenomenon
does not spread in all directions, but moves or 'flows' along a given, least- cost path, determined
again by local terrain characteristics.
• The typical case arises when we want to determine the drainage patterns in a catchment: the
rainfall water 'chooses' a way to leave the area.
• Cells with a high accumulated flow count represent areas of concentrated flow, and thus may
belong to a stream. By using some appropriately chosen threshold value in a map algebra
expression, we may decide whether they do. Cells with an accumulated flow count of zero are
local topographic highs, and can be used to identify ridges.
Applications
There are numerous examples where more advanced computations on continuous field representations
are needed. A short list is provided below.
• Slope angle calculation
o The calculation of the slope steepness, expressed as an angle in degrees or percentages,
for any or all locations (a minimal sketch follows this list).
• Slope aspect calculation
o The calculation of the aspect (or orientation) of the slope in degrees (between 0 and 360
degrees), for any or all locations.
• Slope length calculation
o With the use of neighborhood operations, it is possible to calculate for each cell the
nearest distance to a watershed boundary (the upslope length) and to the nearest stream
(the downslope length).
• Three-dimensional map display
o With GIS software, three-dimensional views of a DEM can be constructed, in which the
location of the viewer, the angle under which s/he is looking, the zoom angle, and the
amplification factor of relief exaggeration can be specified.
• Determination of change in elevation through time
o The cut-and-fill volume of soil to be removed or to be brought in to make a site ready for
construction can be computed by overlaying the DEM of the site before the work begins
with the DEM of the expected modified topography.
• Automatic catchment delineation
o Catchment boundaries or drainage lines can be automatically generated from a good
quality DEM with the use of neighborhood functions
• Dynamic modeling
o DEMs are increasingly used in GIS-based dynamic modeling, such as the computation of
surface run-off and erosion, groundwater flow, the delineation of areas affected by
pollution, the computation of areas that will be covered by processes such as debris flows
and lava flows.
• Visibility analysis
o A viewshed is the area that can be 'seen', i.e. in the direct line-of-sight from a specified
target location.
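A minimal slope and aspect sketch from a hypothetical DEM using numpy gradients. Note that aspect conventions differ between packages, so the formula below is just one common choice, not a definitive implementation:

```python
import numpy as np

dem = np.array([[100., 101., 103.],
                [ 99., 100., 102.],
                [ 97.,  99., 101.]])   # hypothetical elevations (m)
cell = 10.0                            # cell size (m)

# Elevation change rates along rows (axis 0) and columns (axis 1)
dz_dy, dz_dx = np.gradient(dem, cell)

# Slope steepness, as an angle in degrees and as a percentage
slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
slope_pct = 100.0 * np.hypot(dz_dx, dz_dy)

# Slope aspect in degrees (0-360); this variant measures the downslope
# direction clockwise from grid north (conventions vary)
aspect_deg = (np.degrees(np.arctan2(-dz_dx, dz_dy)) + 360.0) % 360.0

print(slope_deg.round(1))
print(slope_pct.round(1))
print(aspect_deg.round(0))
```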
Filtering
• The principle of filtering is quite similar to that of moving window averaging. We define a window
and let the GIS move it over the raster cell by cell. For each cell, the system performs some
computation, and assigns the result of this computation to the cell in the output raster.
• The difference with moving window averaging is that the moving window in filtering is itself a
small raster, which contains cell values that are used in the computation for the output cell value.
• This small raster is a filter, also known as a kernel, which may be square (such as a 3x3 kernel),
but it does not have to be. The values in the filter are used as weight factors.
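A filtering sketch with a 3x3 smoothing kernel, using scipy's convolution; the raster is hypothetical and the kernel values are the weight factors:

```python
import numpy as np
from scipy.ndimage import convolve

raster = np.arange(25, dtype=float).reshape(5, 5)  # hypothetical input raster

# A 3x3 smoothing kernel: each cell value is a weight factor
kernel = np.full((3, 3), 1.0 / 9.0)

smoothed = convolve(raster, kernel, mode="nearest")
print(smoothed.round(2))
```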
Chapter 6: Analysis
NETWORK ANALYSIS
• A completely different set of analytical functions in GIS consists of computations on networks. A
network is a connected set of lines, representing some geographic phenomenon, typically of the
transportation type.
• The 'goods' transported can be almost anything: people, cars and other vehicles along a road
network, commercial goods along a logistic network, phone calls along a telephone network, or
water pollution along a stream/river network.
• Network analysis can be performed on either raster or vector data layers, but it is more
commonly done on the latter, as line features can be associated with a network and can hence be
assigned typical transportation characteristics such as capacity and cost per unit. A fundamental
characteristic of any network is whether its lines are considered directed or not.
• Additional application-specific rules are usually required to define what can and cannot happen
in the network. Most GISs provide rule-based tools that allow the definition of these extra
application rules. Various classical spatial analysis functions on networks are supported by GIS
software packages. The most important ones are:
1. Optimal path finding which generates a least cost-path on a network between a pair of
predefined locations using both geometric and attribute data.
2. Network partitioning which assigns network elements (nodes or line segments) to different
locations using predefined criteria.
• Notice that it is possible to travel on line b in the figure above, then take a U-turn at node N, and
return along a to where one came from. The question is whether doing this makes sense in
optimal path finding.
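An optimal path finding sketch on a small hypothetical directed network, using the networkx library's Dijkstra-based shortest path; the edge attribute 'cost' may stand for distance or travel time:

```python
import networkx as nx

# Hypothetical directed road network with per-line costs
G = nx.DiGraph()
G.add_edge("A", "B", cost=4.0)
G.add_edge("B", "C", cost=3.0)
G.add_edge("A", "C", cost=9.0)
G.add_edge("C", "D", cost=2.0)
G.add_edge("B", "D", cost=7.0)

path = nx.shortest_path(G, "A", "D", weight="cost")            # least-cost path
length = nx.shortest_path_length(G, "A", "D", weight="cost")
print(path, length)   # ['A', 'B', 'C', 'D'] 9.0
```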
Network partitioning
• In network partitioning, the purpose is to assign lines and/or nodes of the network, in a
mutually exclusive way, to a number of target locations. Typically, the target locations play
the role of service centre for the network. This may be any type of service: medical treatment,
education, water supply. This type of network partitioning is known as a network allocation
problem.
• Another problem is trace analysis. Here, one wants to determine that part of the network that
is upstream (or downstream) from a given target location. Such problems exist in pollution
tracing along river/stream systems, but also in network failure chasing in energy distribution
networks.
Network allocation
• In network allocation, we have a number of target locations that function as resource centres,
and the problem is which part of the network to exclusively assign to which service centre.
• This may sound like a simple allocation problem, in which a service centre is assigned those
lines (segments) to which it is nearest, but usually the problem statement is more
complicated. These further complications stem from the requirements to take into account
• The capacity with which a centre can produce the resources (whether they are medical
operations, school pupil positions, kilowatts, or bottles of milk), and
• The consumption of the resources, which may vary amongst lines or line segments. After all,
some streets have more accidents, more children who live there, more industry in high
demand of electricity or just more thirsty workers.
Trace analysis
• Trace analysis is performed when we want to understand which part of a network is
'conditionally connected' to a chosen node on the network, known as the trace origin. For a
node or line to be conditionally connected, it means that a path exists from the node/line to
the trace origin, and that the connecting path fulfills the conditions set.
• What these conditions are depends on the application, and they may involve direction of the
path, capacity, length, or resource consumption along it. The condition typically is a logical
expression, as we have seen before, for example:
• The path must be directed from the node/line to the trace origin,
• Its capacity (defined as the minimum capacity of the lines that constitute the path) must be
above a given threshold, and
• The path's length must not exceed a given maximum length.
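A trace analysis sketch: on a directed network whose edges point downstream, everything upstream of the trace origin is the set of nodes with a directed path to it (networkx, hypothetical stream network; real conditions on capacity or length would filter this set further):

```python
import networkx as nx

# Hypothetical directed stream network: edges point downstream
G = nx.DiGraph([("a", "c"), ("b", "c"), ("c", "e"), ("d", "e"), ("e", "f")])

trace_origin = "e"
# All nodes from which a directed path to the trace origin exists
upstream = nx.ancestors(G, trace_origin)
print(upstream)   # {'a', 'b', 'c', 'd'}
```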
GIS and application models
• The solution to a (spatial) problem usually depends on a large number of parameters. Since
these parameters are often interrelated, their interaction is made more precise in an
application model.
• The nature of application models varies enormously. GIS applications for famine relief
programs, for instance, are very different from earthquake risk assessment applications,
though both can make use of GIS to derive a solution. Many kinds of application models exist,
and they can be classified in many different ways.
• Purpose of the model refers to whether the model is descriptive, prescriptive or predictive in
nature. Descriptive models attempt to answer the "what is" question. Prescriptive models
usually answer the "what should be" question by determining the best solution from a given
set of conditions, while predictive models focus on the "what will be" question.
• Methodology refers to the operational components of the model. Stochastic models use
statistical or probability functions to represent random or semi-random behaviour of
phenomena. In contrast, deterministic models are based upon a well-defined cause and effect
relationship. Examples of deterministic models include hydrological flow and pollution
models, where the 'effect' can often be described by numerical methods and differential
equations.
• Scale refers to whether the components of the model are individual or aggregate in nature.
Essentially this refers to the 'level' at which the model operates. Individual-based models are
based on individual entities, such as the agent-based models described above, whereas
aggregate models deal with 'grouped' data, such as population census data.
• Dimensionality is the term chosen to refer to whether a model is static or dynamic, and spatial
or aspatial. Some models are explicitly spatial, meaning they operate in some geographically
defined space. Some models are aspatial, meaning they have no direct spatial reference.
• Implementation logic refers to how the model uses existing theory or knowledge to create
new knowledge. Deductive approaches use knowledge of the overall situation in order to
predict outcome conditions. This includes models that have some kind of formalized set of
criteria, often with known weightings for the inputs, and existing algorithms are used to derive
outcomes.
Error propagation in spatial data processing
• One of the most commonly applied operations in geographic information systems is analysis
by overlaying two or more spatial data layers. Each such layer will contain errors, due to both
inherent inaccuracies in the source data and errors arising from some form of computer
processing, for example, rasterization. During the process of spatial overlay, all the errors in
the individual data layers contribute to the final error of the output.
• Modeling of error propagation has been defined by Veregin as: "the application of formal
mathematical models that describe the mechanisms whereby errors in source data layers are
modified by particular data transformation operations."
UNIT 5
Data visualization
• During the visualization process, cartographic methods and techniques are applied. These can be
considered to form a kind of grammar that allows for the optimal design and production of maps,
depending on the application (see Figure 7.8).
• The producer of these visual products may be a professional cartographer, but may also be a
discipline expert, for instance, mapping vegetation stands using remote sensing images, or health
statistics in the slums of a city. To enable the translation from spatial data into graphics, we
assume that the data are available and that the spatial database is well structured.
• The visualization process can vary greatly depending on where in the spatial data handling process
it takes place and the purpose for which it is needed. Visualizations can be created during any
phase of the spatial data handling process. They can be simple or complex, and the production
time can be short or long.
• Some examples are the creation of a full, traditional topographic map sheet, a newspaper map, a
sketch map, a map from an electronic atlas, an animation showing the growth of a city, a three-
dimensional view of a building or a mountain, or even a real-time map display of traffic conditions.
• The visualization process is always influenced by several factors. Some of these questions can be
answered by just looking at the content of the spatial database:
o What will be the scale of the map: large, small, other? This introduces the problem of
generalization. Generalization addresses the meaningful reduction of the map content
during scale reduction.
o Are we dealing with topographic or thematic data? These two categories traditionally
resulted in different design approaches as was explained in the previous section.
o More important for the design is the question of whether the data to be represented are
of a quantitative or qualitative nature.
• The map's role has changed: it is not only a communication tool, but has also become an aid in
the user's visual thinking process.
• This thinking process is accelerated by the continued developments in hardware and software.
Media like DVD-ROMs and the WWW allow dynamic presentation and also user interaction. These
went along with changing scientific and societal needs for georeferenced data and maps. Users
now expect immediate and real-time access to the data; data that have become abundant in many
sectors of the geoinformation world. This abundance of data, seen as a 'paradise' by some sectors,
is a major problem in other sectors.
• We lack the tools for user-friendly queries and retrieval when studying the massive amount of
spatial data produced by sensors, which is now available via the WWW. A new branch of science
is currently evolving to deal with this problem of abundance. In the geo-disciplines, it is called
visual data mining.
• These developments have given the term visualization an enhanced meaning. According to the
dictionary, it means 'to make visible' or 'to represent in graphical form'.
• Developments in scientific visualization stimulated DiBiase [18] to define a model for map-based
scientific visualization, also known as geovisualization. It covers both the presentation and
exploration functions of the map (see Figure7.9). Presentation is described as ‘public visual
communication’ since it concerns maps aimed at a wide audience. Exploration is defined as
‘private visual thinking’ because it is often an individual playing with the spatial data to determine
its significance.
• For instance, if all data are related to land use, collected in 2015, the title could be Landuse of . .
. 2015. Secondly, the individual component(s), such as landuse, and probably relief, should be
analyzed and their nature described. Later, these components should be visible in the map legend.
• Different types of data differ in how one might map or display them.
• These visual variables can be used to make one symbol different from another. In doing this, map
makers in principle have free choice, provided they do not violate the rules of cartographic
grammar. They do not have that choice when deciding where to locate the symbol in the map.
The symbol should be located where features belong. Visual variables influence the map user’s
perception in different ways. What is perceived depends on the human capacity to see or
perceive:
o What is of equal importance (e.g. all red symbols represent danger),
o Order (e.g. the population density varies from low to high—represented by light and
dark color tints, respectively),
o Quantities (e.g. symbols changing in size, with small symbols for small amounts), or
o An instant overview of the mapped theme.
• The application of colour would be the best solution, since it has characteristics that allow one to
quickly differentiate between different geographic units. However, since none of the watersheds
is more important than the others, the colours used have to be of equal visual weight or brightness.
Figure 7.12 gives an example of a correct map.
• The fact that it is easy to make errors can be seen in Figure 7.15. In 7.15(a), different tints of green
(the visual variable 'value') have been used to represent absolute population numbers. The reader
might get a reasonable impression of the individual amounts, but not of the actual geographic
distribution of the population, as the size of the geographic units will influence the perceptional
properties too much. Imagine a small and a large unit having the same number of inhabitants.
• The large unit would visually attract more attention, giving the impression there are more people
than in the small unit. Another issue is that the population is not necessarily homogeneously
distributed within the geographic units. Colour has also been misused in Figure 7.15(b).
• Socio-economic data can also be viewed in three dimensions. This may result in dramatic images,
which will be long remembered by the map user. Figure 7.19 shows the absolute population
figures of Overijssel in three dimensions.
MAP DISSEMINATION
• The map design will not only be influenced by the nature of the data to be mapped or the intended
audience (the 'what' and 'whom' from "How do I say What to Whom, and is it Effective"), the
output medium also plays a role. Traditionally, maps were produced on paper, and many still are.
Currently, most maps are presented on screen, for a quick view, for an internal presentation or
for presentation on the WWW.
• Compared to maps on paper, on-screen maps have to be smaller, and therefore their contents
should be carefully selected. This might seem a disadvantage, but presenting maps on-screen
offers very interesting alternatives. A mouse click could also open the link to a database, and
reveal much more information than a paper map could ever offer. Links to other than tabular or
map data could also be made available.
• Maps and multimedia (photography, sound, video or animation) can be integrated. Some of
today's electronic atlases, such as the Encarta World Atlas, are good examples of how multimedia
elements can be integrated with the map. Pointing to a country on a world map starts the national
anthem of the country or shows its flag. It can be used to explore a country's languages; moving
the mouse would play a short sentence in the region's dialects.
• The World Wide Web is a popular medium used to present and disseminate spatial data. Here,
maps can play their traditional role, for instance to show the location of objects, or provide insight
into spatial patterns, but because of the nature of the internet, the map can also function as an
interface to additional information. Geographic locations on the map can be linked to
photographs, text, sound or other maps, perhaps even functions such as on-line booking services.
Maps can also be used as 'previews' of spatial data products to be acquired through a spatial data
clearinghouse that is part of a Spatial Data Infrastructure. For that purpose, we can make use of
geo-webservices, which provide interactive map views as an intermediary between the data and
the web browser.