UNIT II
Creating visual representations: visualization reference model, visual
mapping, visual analytics, Design of visualization applications
A Reference Model
• Let’s take raw data as our starting point, or rather abstract data provided by the world around us.
• We speak of abstract data when these data don’t necessarily have a specific connection with physical space.
• For example, they may deal with people’s names, the prices of consumer products, voting results, and so on.
These data are rarely available in a format suitable for processing with automatic tools and,
in particular, with visualization software.
• Therefore, they must be processed appropriately, before being represented graphically.
The creation of a visual artifact is a process that we can model through a sequence of successive stages:
1. preprocessing and data transformations
2. visual mapping
3. view creation.
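The three stages above can be sketched as a small pipeline, where each stage hands its output to the next. The helper names and the sample records are illustrative, not part of any real library.

```python
# A minimal sketch of the three-stage reference model; the function
# names and the toy data are assumptions made for illustration.

def preprocess(raw_rows):
    """Stage 1: give raw records a tabular structure (list of dicts)."""
    return [{"name": n, "value": v} for n, v in raw_rows]

def visual_mapping(table):
    """Stage 2: map each record to a graphical element with properties."""
    return [{"element": "point", "x": i, "y": row["value"]}
            for i, row in enumerate(table)]

def create_view(marks):
    """Stage 3: render the marks (here, a crude textual view)."""
    return [f"point at ({m['x']}, {m['y']})" for m in marks]

raw = [("A", 3), ("B", 5)]
view = create_view(visual_mapping(preprocess(raw)))
```

The point of the sketch is the data flow: raw data never reach the view directly; they pass through a tabular structure and a mapping to visual structures first.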
Pre-processing and Data Transformations
• We use “raw” to describe data supplied by the world around us. They can be:
data generated by tools, such as the values of some polluting agents taken from a monitoring station during pollution testing;
data calculated by appropriate software, such as weather forecast data;
data linked to measurable events and entities that we find in nature or the social world, like the number of inhabitants
or birth rates of the cities in a specific state.
• In each case, these collections of data (known as datasets) are very rarely supplied to us with a precise logical structure.
To be able to process these data using software, we have to give them an organized logical structure.
• The structure usually used for this type of data is tabular—the arranging of data in a table—in a format appropriate for the
software that must receive and process them.
• Sometimes the input data are contained in one or more databases and are, therefore, already available in electronic format
and with a well-defined structure.
• In this case, the raw data correspond to the data stored in the databases, and the preprocessing phase
involves extracting these data from the database and converting them into the structured format used by the visualization
software.
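When the raw data already sit in a database, preprocessing amounts to querying them out and reshaping them into the tabular structure the visualization software expects. A minimal sketch, using an in-memory SQLite database with a hypothetical pollution table:

```python
import sqlite3

# Sketch: the raw data live in a database; preprocessing extracts
# them and converts them to a tabular structure (a list of dicts).
# The table name, columns, and values are hypothetical.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pollution (station TEXT, no2 REAL)")
conn.executemany("INSERT INTO pollution VALUES (?, ?)",
                 [("S1", 41.5), ("S2", 38.2)])

rows = conn.execute("SELECT station, no2 FROM pollution").fetchall()
table = [{"station": s, "no2": v} for s, v in rows]
conn.close()
```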
• The data structures can also be enriched with additional information or preliminary processing.
• In particular, filtering operations can eliminate unnecessary data, and calculations can derive new data,
such as statistics to be represented in the visual version; furthermore, we can add
attributes to the data (also called metadata) that may be used to logically organize the tabular data.
• Let us assume we want to study how people communicate in a discussion forum, an Internet-based communication tool that allows
users to converse through an exchange of messages. A user can write a message on the forum, which all other users of the service
can read.
• Anyone can reply to the message, thus creating an environment of interactive discussion.
• Imagine having to carry out an analysis on data relative to the number of messages read and written in a discussion forum.
• Suppose, for instance, that we wish to quickly single out both the most active users (or, rather, those who read and write a high
number of messages in the forum), as well as the users who silently read all of the messages and don’t take an active part in the
discussion.
• The tools that offer this type of service usually record every action carried out by the system’s users in an appropriate file: the log file.
• This file will be the source of raw data in our system. The preprocessing phase should convert these data into a tabular format.
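For the forum example, the preprocessing step could be sketched as follows. The log format shown (one user and one action per line) is an assumption; real forum log files would need their own parser. The output is the tabular structure: one row per user, with counts of messages read and written.

```python
from collections import defaultdict

# Sketch of the preprocessing phase for the forum example: turn a
# hypothetical log file into a table of per-user read/write counts.

log_lines = [
    "anna write",
    "anna read",
    "bruno read",
    "anna read",
]

counts = defaultdict(lambda: {"read": 0, "write": 0})
for line in log_lines:
    user, action = line.split()
    counts[user][action] += 1

# tabular format: one record per user
table = [{"user": u, **c} for u, c in sorted(counts.items())]
```

From a table like this, the most active users and the silent readers can both be singled out by comparing the two counts.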
VISUAL MAPPING
• The key problems of visual mapping lie in defining which visual structures to use to map the data, and where to locate them in the display area.
• Abstract data don’t necessarily have a real location in physical space, but some types of abstract data, by their very nature,
can easily find a spatial location.
• For example, the data taken from a monitoring station for atmospheric pollution can easily find a position on a geographic map,
given that the monitoring stations that take the measurements are situated in a precise point in the territory.
• The same can be said for data taken from entities that have a topological structure, such as the traffic data of a computer network.
However, there are several types of data that belong to entities that have no natural geographic or topological positioning.
• Consider the bibliographic references in scientific texts, the fuel consumption of cars, or the salaries of various professional figures
within a company. This type of data has no immediate correspondence with the dimensions of the physical space that
surrounds it.
• We must therefore define the visual structures that correspond to the data we want to represent visually. This process is called
visual mapping. Three structures must be defined:
1. Spatial substrate
2. Graphical elements
3. Graphical properties
• The spatial substrate defines the dimensions in physical space where the visual representation is created. The spatial
substrate can be defined in terms of axes.
• In Cartesian space, the spatial substrate corresponds to x- and y-axes. Each axis can be of different types, depending on the
type of data that we want to map on it.
• In particular, an axis can be quantitative, when there is a metric associated with the values reported on the axis; ordinal, when
the values are reported on the axis in an order that corresponds to the order of the data; and nominal, when the region of an
axis is divided into a collection of subregions without any intrinsic order.
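The three axis types can be sketched as position functions over a pixel range. The scale formulas are standard; the 0–100 pixel width and the sample values are illustrative.

```python
# Sketch of the three axis types as functions mapping a data value
# to a position on a 0-100 pixel axis.

def quantitative(value, lo, hi, width=100):
    """Metric axis: position is proportional to the value."""
    return (value - lo) / (hi - lo) * width

def ordinal(value, ordered_values, width=100):
    """Ordered axis: equally spaced positions that preserve order."""
    step = width / (len(ordered_values) - 1)
    return ordered_values.index(value) * step

def nominal(value, categories, width=100):
    """Nominal axis: each category owns a subregion (a band) with no
    intrinsic order; position is the centre of its band."""
    band = width / len(categories)
    return categories.index(value) * band + band / 2
```

Note the difference: only the quantitative axis uses the value's magnitude; the ordinal axis uses only its rank, and the nominal axis only its band membership.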
• The graphical elements are everything visible that appears in the space.
• There are four possible types of visual elements: points, lines, surfaces, and volumes
• The graphical properties are properties of the graphical elements to which the retina of the human eye is
very sensitive (for this reason, they are also called retinal variables). They are independent of the position
occupied by a visual element in spatial substrate.
• The most common graphical properties are size, orientation, color, texture, and shape.
• These are applied to the graphical elements and determine the properties of the visual layout that will be
presented in the view.
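Applying retinal variables to graphical elements can be sketched as follows. The mapping rules here (size proportional to a value, color chosen by category) are illustrative choices, not the only possible ones.

```python
# Sketch: apply two retinal variables, size and color, to point
# elements. The palette and scaling factor are assumptions.

palette = {"low": "blue", "high": "red"}

def to_mark(x, y, value, category):
    return {
        "element": "point",
        "x": x, "y": y,              # position: spatial substrate
        "size": 2 * value,           # retinal variable: size
        "color": palette[category],  # retinal variable: color
    }

marks = [to_mark(0, 1, 5, "low"), to_mark(1, 3, 10, "high")]
```

The position fields come from the spatial substrate, while size and color are independent of position, which is exactly what makes them retinal variables.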
• Color requires particular attention. It is the only graphical property whose perception can depend on
cultural, linguistic, and physiological factors. Some populations, for example, use a limited number of terms to describe the
entire color spectrum (in some populations, there are only two words for colors: black and white).
VIEWS
• The views are the final result of the generation process. They are the result of the mapping of data
structures to the visual structures, generating a visual representation in the physical space represented by
the computer.
• They are what we see displayed on the computer screen.
• The views face a difficult inherent problem: the quantity of data to be represented is often
too large for the available space. We come across this problem rather frequently, given that
real situations very often involve very large amounts of data (at times, even millions of items).
• In these cases, when the display area is too small to visibly support all the elements of a visual
representation, certain techniques are used, including zooming, panning, scrolling, and focus + context.
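Zooming and panning can both be understood as operations on a viewport: only the items that fall inside the current window are drawn. A minimal sketch, with an illustrative one-dimensional item list:

```python
# Sketch: panning and zooming as viewport operations over a dataset
# too large for the screen. Item list and window sizes are
# illustrative.

items = [{"x": i, "label": f"item{i}"} for i in range(100)]

def visible(items, x_min, x_max):
    """Return only the items inside the current viewport."""
    return [it for it in items if x_min <= it["x"] <= x_max]

shown = visible(items, 10, 19)    # 10 of 100 items fit on screen
panned = visible(items, 20, 29)   # panning shifts the window
zoomed = visible(items, 10, 14)   # zooming in narrows it
```

Focus + context techniques go one step further: instead of discarding the items outside the window, they keep them visible at reduced size or detail.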
Designing a Visual Application
• Define the problem by spending a certain amount of time with potential users of the visual
representation.
• Identify their actual needs and how they work.
• Why is a representation needed?
• Is it needed to communicate something?
• Is it needed for finding new information? Or is it needed to prove hypotheses?
• It is necessary to bear in mind the human factors specific to the target audience that the application
will address and, in particular, their cognitive and perceptive abilities.
• This will influence the choice of which visual models to use, to allow users to understand the
information.
• Examine the nature of the data to represent.
• The data can be quantitative (e.g., a list of integers or real numbers),
• ordinal (data of a non-numeric nature, but which have their own intrinsic order, such as the days of the week),
• categorical (data that have no intrinsic order, such as the names of people or cities).
• A different mapping may be appropriate, according to the data type.
• Number of dimensions (also called attributes)
• The attributes can be independent or dependent.
• The dependent attributes are those that vary and whose behavior we are interested in analyzing with respect to the
independent attributes.
• According to the number of dependent attributes, we have a collection of data that is called univariate (one
dimension varies with respect to another), bivariate (there are two dependent dimensions), trivariate (three
dependent dimensions), or multivariate (four or more dimensions that vary compared to the independent ones).
• For example, the heights of ten students in a class can be recorded: this is univariate data, since there is only one
variable (the height), with no relationship to other attributes being studied.
• Consider instead studying the relationship between systolic blood pressure and age. Here you take a sample of people,
say ten workers, and record two values (age and blood pressure) for each of them.
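The naming scheme above can be sketched as a small classification by the number of dependent attributes; the sample records mirror the two examples just given and are illustrative.

```python
# Sketch: name a dataset by its number of dependent attributes,
# following the univariate/bivariate/trivariate/multivariate scheme.

def classify(n_dependent):
    names = {1: "univariate", 2: "bivariate", 3: "trivariate"}
    return names.get(n_dependent, "multivariate")

# univariate: one recorded variable (height) per student
heights = [162, 175, 158, 181]

# two values recorded per person: blood pressure observed against age
pressure_by_age = [(25, 118), (40, 127), (60, 141)]

kind_a = classify(1)
kind_b = classify(4)
```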
• Data structures
• Linear (the data are codified in linear data structures like vectors, tables, collections, etc.),
• Temporal (data that change in time),
• Spatial or geographical (data that have a correspondence with something physical, such as a map, floorplan,
etc.),
• Hierarchical (data relative to entities organized in a hierarchy, for example, genealogies, flowcharts, files on a
disk, etc.),
• Network (data that describe relationships between entities).
• Type of interaction
• This determines if the visual representation is static (e.g., an image printed on paper or an image represented on
a computer screen but not modifiable by the user),
• transformable (when the user can control the process of modification and transformation of data, such as
varying some parameters of data entry, varying the extremes of the values of some attributes, or choosing a
different mapping for view creation), or
• manipulable (the user can control and modify some parameters that regulate the generation of the views, like
zooming on a detail or rotating an image represented in 3D).
A collection of data is defined as univariate when one of its
attributes varies with respect to one or more independent
attributes. Let’s suppose that we have to analyze the gross
national product (GNP) realized by some nations in 2000: the GNP is
the only dependent attribute, so the data are univariate.
Suppose instead that we need to examine the total value of the overseas imports and exports of goods of these nations. With two
dependent attributes (imports and exports) for each nation, this is bivariate data; in this case also, the values can be reported in a table.