
Undergraduate Topics in Computer

Science

Series Editor
Ian Mackie
University of Sussex, Brighton, UK

Advisory Editors
Samson Abramsky
Department of Computer Science, University of Oxford, Oxford, UK

Chris Hankin
Department of Computing, Imperial College London, London, UK

Mike Hinchey
Lero – The Irish Software Research Centre, University of Limerick,
Limerick, Ireland

Dexter C. Kozen
Department of Computer Science, Cornell University, Ithaca, NY, USA

Andrew Pitts
Department of Computer Science and Technology, University of
Cambridge, Cambridge, UK

Hanne Riis Nielson


Department of Applied Mathematics and Computer Science, Technical
University of Denmark, Kongens Lyngby, Denmark

Steven S. Skiena
Department of Computer Science, Stony Brook University, Stony Brook, NY,
USA
Iain Stewart
Department of Computer Science, Durham University, Durham, UK

Joseph Migga Kizza


College of Engineering and Computer Science, The University of
Tennessee-Chattanooga, Chattanooga, TN, USA

‘Undergraduate Topics in Computer Science’ (UTiCS) delivers high-quality


instructional content for undergraduates studying in all areas of computing
and information science. From core foundational and theoretical material to
final-year topics and applications, UTiCS books take a fresh, concise, and
modern approach and are ideal for self-study or for a one- or two-semester
course. The texts are all authored by established experts in their fields,
reviewed by an international advisory board, and contain numerous
examples and problems, many of which include fully worked solutions.
The UTiCS concept relies on high-quality, concise books in softback
format, and generally a maximum of 275–300 pages. For undergraduate
textbooks that are likely to be longer, more expository, Springer continues
to offer the highly regarded Texts in Computer Science series, to which we
refer potential authors.
Karsten Lehn, Merijam Gotzes and Frank Klawonn

Introduction to Computer Graphics


Using OpenGL and Java
3rd ed. 2023
Karsten Lehn
Faculty of Information Technology, Fachhochschule Dortmund, University
of Applied Sciences and Arts, Dortmund, Germany

Merijam Gotzes
Hamm-Lippstadt University of Applied Sciences, Hamm, Germany

Frank Klawonn
Data Analysis and Pattern Recognition Laboratory, Ostfalia University of
Applied Sciences, Braunschweig, Germany

ISSN 1863-7310 e-ISSN 2197-1781


Undergraduate Topics in Computer Science
ISBN 978-3-031-28134-1 e-ISBN 978-3-031-28135-8
https://doi.org/10.1007/978-3-031-28135-8

Translation from the German language edition: “Grundkurs Computergrafik


mit Java ” by Karsten Lehn et al., © Springer Fachmedien Wiesbaden
GmbH, ein Teil von Springer Nature 2022. Published by Springer Vieweg
Wiesbaden. All Rights Reserved.

© Springer Nature Switzerland AG 2008, 2012, 2023

Translation from the German language edition 'Grundkurs Computergrafik


mit Java' by Frank Klawonn, Karsten Lehn, Merijam Gotzes © Springer
Fachmedien Wiesbaden 2022. Published by Springer Vieweg Wiesbaden.
All Rights Reserved.
1st & 2nd edition: © Springer-Verlag London Limited 2008, 2012

This work is subject to copyright. All rights are reserved by the Publisher,
whether the whole or part of the material is concerned, specifically the
rights of reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter
developed.

The use of general descriptive names, registered names, trademarks, service


marks, etc. in this publication does not imply, even in the absence of a
specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice
and information in this book are believed to be true and accurate at the date
of publication. Neither the publisher nor the authors or the editors give a
warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The
publisher remains neutral with regard to jurisdictional claims in published
maps and institutional affiliations.

This Springer imprint is published by the registered company Springer


Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham,
Switzerland
Preface to the Third Edition
Since the publication of the first edition of this book, computer graphics has
evolved considerably. New applications have emerged in many diverse
fields in which computer graphics is used. This is true for mobile
devices, such as smartphones, as well as for classic desktop computers and
notebooks. Computer graphics have always been of great importance in the
computer games and film industries. These areas of application have grown
significantly in recent years. In many other areas, such as medicine, medical
technology or education, computer-generated graphics have become
established. This is particularly evident in the emerging fields of augmented
reality (AR) and virtual reality (VR), which are based on computer
graphics.
Hand in hand with the wider dissemination and use of computer
graphics, the performance of modern graphics processors has been
significantly and steadily increased. In parallel to this, the software
interfaces to the graphics processors and the available software tools for the
creation of computer graphics have been significantly further developed.
An essential development is the tendency to have more and more
computing tasks performed by graphics processors in order to relieve the
central processing unit for other tasks. This requires methods for
programming the graphics pipeline of the graphics processors flexibly and
efficiently. Because of this need, programs that can be executed directly on
the graphics card—the so-called shaders—have gained immense
importance. The use of shaders has increased accordingly and the shaders
themselves have become more and more powerful, so that in the meantime
almost any computation, even independent of graphics processing, can be
performed by graphics processors using the compute shader.
In its previous editions, this book contained examples for the Java 2D and
Java 3D graphics systems. Although it is still possible to learn many
principles of computer graphics using Java 2D and Java 3D, it seems
timely, given the above background, to also introduce the basics of shader
programming. Since Java 2D and Java 3D do not allow this, this edition
contains only examples for the graphics programming interface Open
Graphics Library (OpenGL).
The OpenGL is a graphics programming interface that has become very
widespread over the last decades due to its open concept and platform
independence. It is now supported by the drivers of all major graphics
processors and graphics cards for the major operating systems. The web
variant WebGL (Web Graphics Library) is now supported by all major web
browsers, enabling shader-based graphics programming for web
applications. Despite the very widespread use of the OpenGL, it serves in
this book only as an example of the concepts of modern graphics
programming that can also be found in other programming interfaces, such
as Direct3D, Metal or Vulkan.
Vulkan is a graphics programming interface which, like the OpenGL,
was specified by the Khronos Group and was regarded as the successor to
the OpenGL. One objective in the development of Vulkan was to reduce the
complexity of the drivers that have to be supplied with the graphics
hardware by its manufacturer. Likewise, more detailed control over the
processing in the graphics pipeline should be made possible. This led to the
development of Vulkan, a programming interface that has a lower level of
abstraction than the OpenGL and thus leads to a higher programming effort
for the graphics application and to a higher effort for initial training. In the
meantime, it has become apparent that the OpenGL will coexist with
Vulkan for even longer than originally thought. The Khronos Group
continues to further develop the OpenGL, and leading GPU manufacturers
announced they will continue the support for the OpenGL.
In 2011, the Khronos Group developed the first specification of
WebGL, which enables hardware-supported and shader-based graphics
programming for web applications. Currently, WebGL 2.0 is supported by
all major web browsers. Its successor, WebGPU, is currently under
development.
In addition, the graphics programming interface OpenGL has been
successfully used in teaching at many colleges and universities for years.
Because of this widespread use of the OpenGL and its application in
industry and education, the OpenGL was chosen for this book for a basic
introduction to modern graphics programming using shaders.
Java is one of the most popular and well-liked object-oriented high-
level programming languages today. It is very widespread in teaching at
colleges and universities, as its features make the introduction to
programming very easy. Furthermore, it is easy to switch to other
programming languages. This is confirmed again and again by feedback
from students after a practical semester or from young professionals.
Although the OpenGL specification is independent of a programming
language, many implementations and development tools exist for the
programming languages C and C++. For this reason, many solutions to
graphics programming problems, especially in web forums, can be found
for these programming languages.
The aim of this edition of this book is to combine the advantages of the
easy-to-learn Java programming language with modern graphics
programming using the OpenGL for a simple introduction to graphics
programming. The aim is to make it possible to enter this field even with
minimal knowledge of Java. Thus, this book can be used for the teaching of
computer graphics at an early stage of study. For this purpose, the Java
binding Java OpenGL (JOGL) was chosen, which is very close in its use to
OpenGL bindings to the programming language C. This makes it easy to
port solutions implemented in C to Java. The limitation to a single Java
binding (JOGL) was deliberate in order to support the learner in
concentrating on the core concepts of graphics programming. All in all, the
combination of Java and JOGL appears to be an ideal basis for beginners in
graphics programming, with the potential to transfer the acquired
knowledge to other software development environments or to other
graphics programming interfaces with a reasonable amount of effort.
With the Java Platform Standard Edition (Java SE), the two-dimensional
programming interface Java 2D is still available, which was used for
examples in previous editions of this book. Since the concepts of OpenGL
differ fundamentally from the Java 2D concepts, all examples in this book
are OpenGL examples with the Java binding JOGL. This simplifies learning
for the reader and eliminates the need for rethinking between Java 2D, Java
3D or OpenGL.
With this decision, the explicit separation in this book between two-
dimensional and three-dimensional representations was also removed,
resulting in a corresponding reorganisation of the chapter contents.
Nevertheless, the first chapters still contain programming examples that can
be understood with a two-dimensional imagination. It is only in later
chapters that the expansion to more complex three-dimensional scenes takes
place.
Due to the use of the OpenGL, the contents for this edition had to be
fundamentally revised and expanded. In particular, adjustments were made
to account for the terminology used in the OpenGL specification and
community. This concerns, for example, the representation of the viewing
pipeline in the chapter on geometry processing. Likewise, the chapter on
rasterisation was significantly expanded. Since the OpenGL is very
complex, a new chapter was created with an introduction to the OpenGL, in
which very basic OpenGL examples are explained in detail, in order to
reduce the undoubtedly high barrier to entry in this type of programming. In
particular, there is a detailed explanation of the OpenGL graphics pipelines,
the understanding of which is essential for shader programming.
Furthermore, a fully revised and expanded chapter explains the basic
drawing techniques of the OpenGL, which can be applied to other graphics
systems. In the other chapters, the aforementioned adaptations and
extensions have been made and simple OpenGL examples for the Java
binding JOGL have been added.
The proven way of imparting knowledge through the direct connection
between theoretical content and practical examples has not changed in this
edition. Emphasis has been placed on developing and presenting examples
that are as simple and comprehensible as possible in order to enable the
reader to create his or her own (complex) graphics applications from this
construction kit. Furthermore, this edition follows the minimalist
presentation of the course content of the previous editions in order not to
overwhelm beginners. On the other hand, care has been taken to achieve as
complete an overview as possible of modern graphics programming in
order to enable the professional use of the acquired knowledge. Since both
goals are contradictory and OpenGL concepts are complex, this was not
possible without increasing the number of pages in this book.
The theoretical and fundamental contents of the book are still
comprehensible even without a complete understanding of the
programming examples. With such a reading method, the sections in which
“OpenGL” or “JOGL” occurs can be skipped. It is nevertheless advisable to
read parts of the introduction chapter to the OpenGL and the chapter on the
basic geometric objects and drawing techniques of the OpenGL, since these
basics are also used in other graphics systems. In these two chapters, the
abstraction of the theoretical contents and thus a strict separation from an
application with the OpenGL were dispensed with in favour of a simple
introduction to OpenGL programming and a reduction in the number of
pages. All these changes and adaptations have been made against the
background of many years of experience with lectures, exercises and
practical courses in computer graphics with Java 2D, Java 3D and the
OpenGL under Java and thus include optimisations based on feedback from
many students.
Additional material for this edition is also available for download; see
https://link.springer.com. A web reference to supplementary material
(supplementary material online) can be found in the footer of the first page
of a chapter.
The aim of this edition is to provide an easy-to-understand introduction
to modern graphics programming with shaders under the very successful
Java programming language by continuing the proven didactic concept of
the previous editions. We hope that we have succeeded in this and that the
book will continue the success of the previous editions.
Karsten Lehn
Merijam Gotzes
Frank Klawonn
Dortmund, Germany
Lippstadt, Germany
Wolfenbüttel, Germany
July 2022
Acknowledgement
Through years of work with our students in lectures, exercises and practical
courses, we were able to collect suggestions for improvement and use them
profitably for the revision of this textbook. Therefore, we would like to
thank all students who have directly or indirectly contributed to this book.
Our very special thanks go to Mr. Zellmann, Mr. Stark, Ms. Percan and
Mr. Pörschmann, whose constructive feedback contributed decisively to the
improvement of the manuscript.
Furthermore, we would like to thank Springer International Publishing,
who has made a fundamental revision and the participation of two new
authors possible with this edition.
On a very personal note, we would like to thank our families for their
support and understanding during the preparation of the manuscript and the
supplementary materials for the book.
Dortmund, Germany
Lippstadt, Germany
Wolfenbüttel, Germany
July 2022

Karsten Lehn

Merijam Gotzes

Frank Klawonn
Contents
1 Introduction
1.1 Application Fields
1.2 From the Real Scene to the Computer Generated Image
1.3 Rendering and Rendering Pipeline
1.4 Objectives of This Book and Recommended Reading Order for
the Sections
1.5 Structure of This Book
1.6 Exercises
References
2 The Open Graphics Library (OpenGL)
2.1 Graphics Programming Interfaces
2.2 General About the OpenGL
2.3 The OpenGL and Java
2.4 Profiles
2.5 OpenGL Graphics Pipelines
2.5.1 Vertex Processing
2.5.2 Vertex Post-Processing
2.5.3 Primitive Assembly
2.5.4 Rasterisation
2.5.5 Fragment Processing
2.5.6 Per-Fragment Operations
2.5.7 Framebuffer
2.6 Shaders
2.7 OpenGL Programming with JOGL
2.8 Example of a JOGL Program Without Shaders
2.9 Programming Shaders
2.9.1 Data Flow in the Programmable Pipeline
2.9.2 OpenGL and GLSL Versions
2.9.3 OpenGL Extensions
2.9.4 Functions of the GLSL
2.9.5 Building a GLSL Shader Program
2.10 Example of a JOGL Program Using GLSL Shaders
2.11 Efficiency of Different Drawing Methods
2.12 Exercises
References
3 Basic Geometric Objects
3.1 Surface Modelling
3.2 Basic Geometric Objects in the OpenGL
3.2.1 Points
3.2.2 Lines
3.2.3 Triangles
3.2.4 Polygon Orientation and Filling
3.2.5 Polygons
3.2.6 Quadrilaterals
3.3 OpenGL Drawing Commands
3.3.1 Indexed Draw
3.3.2 Triangle Strips
3.3.3 Primitive Restart
3.3.4 Base Vertex and Instanced Rendering
3.3.5 Indirect Draw
3.3.6 More Drawing Commands and Example Project
3.4 Exercises
References
4 Modelling Three-Dimensional Objects
4.1 From the Real World to the Model
4.2 Three-Dimensional Objects and Their Surfaces
4.3 Modelling Techniques
4.4 Modelling the Surface of a Cube in the OpenGL
4.5 Surfaces as Functions in Two Variables
4.5.1 Representation of Landscapes
4.6 Parametric Curves and Freeform Surfaces
4.6.1 Parametric Curves
4.6.2 Efficient Computation of Polynomials
4.6.3 Freeform Surfaces
4.7 Normal Vectors for Surfaces
4.8 Exercises
References
5 Geometry Processing
5.1 Geometric Transformations in 2D
5.1.1 Homogeneous Coordinates
5.1.2 Applications of Transformations
5.1.3 Animation and Movements Using Transformations
5.1.4 Interpolators for Continuous Changes
5.2 Geometrical Transformations in 3D
5.2.1 Translations
5.2.2 Scalings
5.2.3 Rotations Around x-, y- and z-Axis
5.2.4 Calculation of a Transformation Matrix with a Linear
System of Equations
5.3 Switch Between Two Coordinate Systems
5.4 Scene Graphs
5.4.1 Modelling
5.4.2 Animation and Movement
5.4.3 Matrix Stacks and Their Application in the OpenGL
5.5 Arbitrary Rotations in 3D: Euler Angles, Gimbal Lock, and
Quaternions
5.5.1 Rotation Around Any Axis
5.6 Eulerian Angles and Gimbal Lock
5.6.1 Quaternions
5.7 Clipping Volume
5.8 Orthogonal and Perspective Projections
5.9 Perspective Projection and Clipping Volume in the OpenGL
5.10 Viewing Pipeline: Coordinate System Change of the Graphical
Pipeline
5.11 Transformations of the Normal Vectors
5.12 Transformations of the Viewing Pipeline in the OpenGL
5.13 Exercises
References
6 Greyscale and Colour Representation
6.1 Greyscale Representation and Intensities
6.2 Colour Models and Colour Spaces
6.3 Colours in the OpenGL
6.4 Colour Interpolation
6.5 Exercises
References
7 Rasterisation
7.1 Vector Graphics and Raster Graphics
7.2 Rasterisation in the Graphics Pipeline and Fragments
7.3 Rasterisation of Lines
7.3.1 Lines and Raster Graphics
7.3.2 Midpoint Algorithm for Lines According to Bresenham
7.3.3 Structural Algorithm for Lines According to Brons
7.3.4 Midpoint Algorithm for Circles
7.3.5 Drawing Arbitrary Curves
7.4 Parameters for Drawing Lines
7.4.1 Fragment Density and Line Style
7.4.2 Line Styles in the OpenGL
7.4.3 Drawing Thick Lines
7.4.4 Line Thickness in the OpenGL
7.5 Rasterisation and Filling of Areas
7.5.1 Odd Parity Rule
7.5.2 Scan Line Technique
7.5.3 Polygon Rasterisation Algorithm According to Pineda
7.5.4 Interpolation of Associated Data
7.5.5 Rasterising and Filling Polygons in the OpenGL
7.6 Aliasing Effect and Antialiasing
7.6.1 Examples of the Aliasing Effect
7.6.2 Antialiasing
7.6.3 Pre-Filtering
7.6.4 Pre-Filtering in the OpenGL
7.6.5 Post-Filtering
7.6.6 Post-Filtering Algorithms
7.6.7 Sample Arrangements for Post-Filtering
7.6.8 Post-Filtering in the OpenGL
7.7 Exercises
References
8 Visibility Considerations
8.1 Line Clipping in 2D
8.1.1 Cohen–Sutherland Clipping Algorithm
8.1.2 Cyrus–Beck Clipping Algorithm
8.2 Image-Space and Object-Space Methods
8.2.1 Backface Culling
8.2.2 Partitioning Methods
8.2.3 The Depth Buffer Algorithm
8.2.4 Scan-Line Algorithms
8.2.5 Priority Algorithms
8.3 Exercises
References
9 Lighting Models
9.1 Light Sources of Local Illumination
9.2 Reflections by Phong
9.3 The Lighting Model According to Phong in the OpenGL
9.4 Shading
9.5 Shading in the OpenGL
9.6 Shadows
9.7 Opacity and Transparency
9.8 Radiosity Model
9.9 Raycasting and Raytracing
9.10 Exercises
References
10 Textures
10.1 Texturing Process
10.1.1 Mipmap and Level of Detail: Variety in Miniature
10.1.2 Applications of Textures: Approximation of Light,
Reflection, Shadow, Opacity and Geometry
10.2 Textures in the OpenGL
10.3 Exercises
References
11 Special Effects and Virtual Reality
11.1 Factors for Good Virtual Reality Applications
11.2 Fog
11.3 Fog in the OpenGL
11.4 Particle Systems
11.5 A Particle System in the OpenGL
11.6 Dynamic Surfaces
11.7 Interaction and Object Selection
11.8 Object Selection in the OpenGL
11.9 Collision Detection
11.10 Collision Detection in the OpenGL
11.11 Auralisation of Acoustic Scenes
11.11.1 Acoustic Scenes
11.11.2 Localisability
11.11.3 Simulation
11.11.4 Reproduction Systems
11.11.5 Ambisonics
11.11.6 Interfaces and Standards
11.12 Spatial Vision and Stereoscopy
11.12.1 Perceptual Aspects of Spatial Vision
11.12.2 Stereoscopy Output Techniques
11.12.3 Stereoscopic Projections
11.12.4 Stereoscopy in the OpenGL
11.13 Exercises
References
Appendix A: Web References
Index
© Springer Nature Switzerland AG 2023
K. Lehn et al., Introduction to Computer Graphics, Undergraduate Topics in Computer Science
https://doi.org/10.1007/978-3-031-28135-8_1

1. Introduction
Karsten Lehn1 , Merijam Gotzes2 and Frank Klawonn3
(1)
Faculty of Information Technology, Fachhochschule Dortmund, University
of Applied Sciences and Arts, Dortmund, Germany
(2)
Hamm-Lippstadt University of Applied Sciences, Hamm, Germany
(3)
Data Analysis and Pattern Recognition Laboratory, Ostfalia University of
Applied Sciences, Braunschweig, Germany

Karsten Lehn (Corresponding author)


Email: [email protected]

Merijam Gotzes
Email: [email protected]

Frank Klawonn
Email: [email protected]

Computer graphics is the process of creating images using a computer. This


process is often referred to as graphical data processing. In this book, an
image is understood in an abstract sense. An image not only can represent a
realistic scene from the everyday world, but it can also be graphics such as
histograms or pie charts, or the graphical user interface of the software. In
the following section, some application fields of computer graphics are
presented as examples to give an impression of the range of tasks in this
discipline. This is followed by explanations of the main steps in computer
graphics and an overview of how a rendering pipeline works using the
graphics pipeline of the Open Graphics Library (OpenGL).

Computer graphics belongs to the field of visual computing. Visual


computing, also known as image informatics, deals with both image
analysis (acquisition, processing and analysis of image data) and image
synthesis (production of images from data). Visual computing is an
amalgamation of individual merging fields such as image processing,
computer graphics, computer vision, human–machine interaction and
machine learning. Computer graphics is an essential part of image
synthesis, just as image processing is an essential part of image analysis.
Therefore, in basic introductions to visual computing, the two disciplines of
computer graphics and image processing are taught together. This book
also integrates solutions to image processing problems, such as the
reduction of aliasing in rasterisation or the representation of object surfaces
using textures.

Further links to computer graphics exist with neighbouring disciplines such


as computer-aided design/manufacturing (CAD/CAM), information
visualisation, scientific visualisation and augmented reality (AR) and
virtual reality (VR) (see Chap. 11).

1.1 Application Fields


Although graphical user interfaces (GUIs) are fundamentally an application
area of computer graphics, the basics of computer graphics now play a
rather subordinate role in this field. On the one hand, there are standardised
tools for the design and programming of graphical user interfaces that
abstract the technical processes, and on the other hand, the focus is
primarily on usability and user experience and thus in the area of human–
computer interaction.

In advertising and certain art forms, many images are now produced
entirely by computer. Artists rework or even alienate photographs using
computer graphics techniques.
Besides the generation and display of these rather abstract graphics, the
main application of computer graphics is in the field of displaying realistic
—not necessarily real—images and image sequences. One such application
is animated films, which are created entirely using computer graphics
methods. The realistic rendering of water, hair or fur is a challenge that can
be achieved already in a reasonable amount of time using sophisticated
rendering systems.

For the very large amounts of data that are recorded in many areas of
industry, business and science, not only suitable methods for automatic
analysis and evaluation are required but also techniques for the
visualisation of graphic presentations. The field of interactive analysis
using visualisations is called visual analytics, which belongs to the field of
information visualisation. These visualisations go far beyond simple
representations of function graphs, histograms or pie charts, which are
already accomplished by spreadsheet programs today. Such simple types of
visualisations are assigned to data visualisation. This involves the pure
visualisation of raw data without taking into account an additional
relationship between the data. Data becomes information only through a
relationship. Information visualisations include two- or three-dimensional
visualisations of high-dimensional data, problem-adapted representations of
the data [3, 7, 12, 13] or special animations that show temporal
progressions of, for example, currents or weather phenomena. The
interactive visualisation of this information and the subsequent analysis by
the user generates knowledge. The last step is assigned to visual analytics.
Artificial intelligence techniques can be applied in this context as well.

Classic applications come from the fields of computer-aided design (CAD)


and computer-aided manufacturing (CAM), which involve the design and
construction of objects such as vehicles or housings. The objects to be
represented are designed on the computer, just as in computer games or
architectural design programs for visualising real or planned buildings and
landscapes. In these cases, the objects to be designed, for example an
extension to a house, are combined with already existing objects in the
computer. In the case of driving or flight simulators, real cities or
landscapes have to be modelled with the computer.
Not only does the ability to model and visualise objects play an important
role in computer graphics, but so does the generation of realistic
representations from measured data. To generate such data, scanners are
used that measure the depth information of the surfaces of objects. Several
calibrated cameras can also be used for this purpose, from which three-
dimensional information can be reconstructed. A very important field of
application for these technologies is medical informatics [6]. In this area,
for example, three-dimensional visualisations of organs or bone structures
are generated, which are based on data that was partly obtained by various
imaging procedures, such as ultrasound, computer tomography and
magnetic resonance imaging.

The coupling of real data and images with computer graphics techniques is
expected to increase in the future. Computer games allow navigating
through scenes and viewing a scene from different angles. Furthermore,
360-degree videos allow for an all-around view of a scene. Such videos can
be recorded using special camera arrangements, such as omnidirectional
cameras. These are spherical objects equipped with several cameras that
take pictures in different directions.

In most films in the entertainment industry, the viewer can only watch a
scene from a fixed angle. The basic techniques for self-selection of the
viewer position are already in place [4]. However, this requires scenes to be
recorded simultaneously from multiple perspectives and intelligent
playback devices. In this case, the viewer must not be bound to the
positions of the cameras, but can move to any viewpoint. The
corresponding representation of the scene is calculated from the
information provided by the individual cameras. For this purpose,
techniques of computer graphics, which serve the synthesis of images, have
to be combined with procedures from image processing, which deal with
the analysis of the recorded images [5].

Other important fields of application of computer graphics are virtual


reality (VR) and augmented reality (AR). In virtual reality, the aim is to
create an immersion for the user through the appropriate stimulation of as
many senses as possible, which leads to the perception of really being in
the artificially generated world. This desired psychological state is
called presence. This results in a wide range of possible applications, of
which only a few are mentioned as examples here. Virtual realities can be
used in architectural and interior simulations, for telepresence or learning
applications in a wide variety of fields. Psychotherapeutic applications to
control phobia (confrontation therapy) or applications to create empathy by
a user putting himself in the situation of another person are also possible
and promising applications.

Through the use of augmented reality technologies, real perception is


enriched by additional information. Here, too, there are many possible
applications, of which only a few are mentioned. Before a purchase,
products can be selected that fit perfectly into their intended environment,
for example, the new chest of drawers in the living room. Likewise, clothes
can be adapted to individual styles in augmented reality. Learning
applications, for example a virtual city or museum guide, are just as
possible as assembly aids in factories or additional information for the
maintenance of complex installations or devices, for example elevators or
vehicles. Conferences can be supplemented by virtual avatars of
participants in remote locations. In medicine, for the planning of an
operation or during an operation, images can be projected precisely onto
the body to be operated on as supplementary information, which was or is
generated before or during the operation by imaging techniques.

As these examples show, computer graphics applications can be


distinguished according to whether interaction with the application is
required or not. For the production of advertising images or classic
animated films, for example, no interaction is intended, so elaborate
computer graphics processes can be used for the most realistic
representation possible. In the film industry, entire computer farms are used
to calculate realistic computer graphics for a feature-length animated film,
sometimes over a period of months.

However, many computer graphics applications are interactive, such as


applications from the fields of virtual or augmented reality. This results in
requirements for the response time to generate a computer graphic. For
flight or vehicle simulators, which are used for the training of pilots or train
drivers, these response times must be strictly adhered to in order to be able
to represent scenarios that create a realistic behaviour of the vehicle.
Therefore, in computer graphics it is important to have methods that
generate realistic representations as efficiently as possible and do not
necessarily calculate all the details of a scene. For professional flight
simulators or virtual reality installations, high-performance computer
clusters or entire computer farms are potentially available. However, this is
not the case for computer games or augmented reality applications for
portable devices, even though in these cases the available computing power
is constantly growing. For this reason, simple, efficient models and
methods of computer graphics, which have been known for a long time, are
in use today to create interactive computer graphics applications.

1.2 From the Real Scene to the Computer


Generated Image
From the example applications presented in Sect. 1.1, the different tasks to
be solved in computer graphics can be derived. Figure 1.1 shows the rough
steps needed to get from a real or virtual scene to a perspective image.
First, the objects of the scene must be replicated as computer models. This
is usually done with a modelling tool. This replication is generally only an
approximation of the objects in the scene. Depending on the effort to be
expended and the modelling types available, the objects can be
approximated more or less well. This is illustrated in Fig. 1.1a, where a
house was modelled by two simple geometric shapes. The deciduous trees
consist of a cylinder and a sphere. The coniferous trees are composed of a
cylinder and three intersecting cones.

Fig. 1.1

Main steps of computer graphics to get from a real scene to a


computer generated perspective image

One of the first steps in graphics processing is to determine the viewer


location, also known as the camera position or eye point. Figure 1.1b shows
the modelled scene as seen by a viewer standing at some distance in front
of the house.

The modelled objects usually cover a much larger area than what is visible
to the viewer. For example, several building complexes with surrounding
gardens could be modelled, through which the viewer can move. If the
viewer is standing in a specific room of the building, or is at a specific
location in the scene looking in a specific direction, then most of the
modelled objects are not in the viewer’s field of view and can be
disregarded in the rendering of the scene. The example scene in Fig. 1.1
consists of a house and four trees, which should not be visible at the same
time. This is represented in Fig. 1.1c by the three-dimensional area marked
by lines. If a perspective projection is assumed, then this area has the shape
of a frustum (truncated pyramid) with a rectangular base. The (invisible)
top of the pyramid is the viewer’s location. The smaller boundary surface at
the front (the clipped top of the pyramid) is called the near clipping plane
(near plane). The area between the viewer and the near clipping plane is
not visible to the viewer in a computer generated scene. The larger
boundary surface at the rear (the base surface of the pyramid) is called far
clipping plane (far plane). The area behind the far plane is also not visible
to the viewer. The other four boundary surfaces limit the visible area
upwards, downwards and to the sides. Figure 1.1d shows a view of the
scene from above, revealing the positions of the objects inside and outside
the frustum.

The process of determining which objects are within the frustum and which
are not is called (three-dimensional) clipping. This is of great interest in
computer graphics in order to avoid unnecessary computing time for parts
of the scene that are not visible. However, an object that lies in this
perceptual area is not necessarily visible to the viewer, as it may be
obscured by other objects in the perceptual area. The process of
determining which parts of objects are hidden is called hidden face culling.
Clipping and culling are processes to determine the visibility of the objects
in a scene.

It is relatively easy to remove objects from the scene that are completely
outside of the frustum, such as the deciduous tree behind the far clipping
plane and the conifer to the right of the frustum (see Fig. 1.1c–d). Clipping
objects that are only partially in the clipping area, such as the conifer on the
left in the illustration, is more difficult. This may require splitting
geometrical objects or polygons and closing the resulting shape. Figure
1.1e–f shows the scene after clipping. Only the objects and object parts that
are perceptible to the viewer and thus lie within the clipping area need to be
processed further.
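
To make the clipping test more concrete, the following Java sketch checks whether a single point, given in homogeneous clip coordinates after the projection has been applied, lies inside the clipping volume. The class and method names are chosen for this illustration only, and the test corresponds to the convention used by the OpenGL; the underlying coordinate systems are introduced in Chap. 5 and clipping algorithms in Chap. 8.

/**
 * Minimal sketch of the clipping test for a single point. After the
 * projection transformation, a point (x, y, z, w) in homogeneous clip
 * coordinates lies inside the clipping volume (the frustum) exactly if
 * -w <= x <= w, -w <= y <= w and -w <= z <= w (the OpenGL convention).
 * Class and method names are illustrative only.
 */
public class ClipTest {

    public static boolean isInsideClippingVolume(float x, float y, float z, float w) {
        return -w <= x && x <= w
            && -w <= y && y <= w
            && -w <= z && z <= w;
    }

    public static void main(String[] args) {
        // A point well inside the volume and one behind the far clipping plane.
        System.out.println(isInsideClippingVolume(0.1f, 0.2f, 0.5f, 1.0f)); // true
        System.out.println(isInsideClippingVolume(0.0f, 0.0f, 2.0f, 1.0f)); // false
    }
}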

The objects that lie within this area must then be projected onto a two-
dimensional surface, resulting in an image that can be displayed on the
monitor or printer as shown in Fig. 1.1g. During or after this projection,
visibility considerations must be made to determine which objects or which
parts of objects are visible or obscured by other objects.

Furthermore, the illumination and light effects play an essential role in this
process for the representation of visible objects. In addition, further
processing steps, such as rasterisation, are required to produce an
efficiently rendered image of pixels for displaying on an output device,
such as a smartphone screen or a projector.

1.3 Rendering and Rendering Pipeline


The entire process of generating a two-dimensional image from a three-
dimensional scene is called rendering. The connection in the series of the
individual processing units roughly described in Fig. 1.1 from part (b)
onwards is called computer graphics pipeline, graphics pipeline
or rendering pipeline.

The processing of data through a pipeline takes place with the help of
successively arranged stages in which individual (complex) commands are
executed. This type of processing has the advantage that new data can
already be processed in the first stage of the pipeline as soon as the old data
has been passed on to the second stage. There is no need to wait for the
complete computation of the data before new data can be processed,
resulting in a strong parallelisation of data processing. Data in graphical
applications is particularly well suited for pipeline processing due to its
nature and the processing steps to be applied to it. In this case,
parallelisation and thus a high processing speed can easily be achieved. In
addition, special graphics hardware can accelerate the execution of certain
processing steps.

Fig. 1.2

Abstracted rendering pipeline of the Open Graphics Library


(OpenGL). The basic structure is similar to the graphics pipeline
of other graphics systems

Figure 1.2 shows an abstract representation of the graphics pipeline of


the Open Graphics Library (OpenGL) on the right. Since the stages shown
in the figure are also used in this or similar form in other graphical systems,
this pipeline serves to explain the basic structure of such systems. More
details on OpenGL and the OpenGL graphics pipeline can be found in
Chap. 2.

On the left side in the figure is the graphics application, which is usually
controlled by the main processor, the central processing unit (CPU), of the
computer. This application sends the three-dimensional scene (3D scene)
consisting of modelled objects to the graphics processing unit (GPU).
Graphics processors are integrated into modern computer systems in
different ways. A graphics processor can be located in the same housing as
the CPU, in a separate microchip on the motherboard or on a separate
graphics card plugged into the computer. Graphics cards or computers can
also be equipped with several graphics processors. In this book, the
word graphics processor or the abbreviation GPU is used to represent all
these variants.

In most cases, the surfaces of the individual (3D) objects in a 3D scene are
modelled using polygons (for example, triangles) represented by vertices.
A vertex (plural: vertices) is a corner of a polygon. In addition, in computer
graphics it is a data structure that stores, for example, the position
coordinates in 3D-space and colour values (or other data) at the respective
corner of the polygon (see Sect. 2.5.1). A set of vertices (vertex data) is
used to transfer the 3D scene from the application to the GPU.
Furthermore, additional information must be passed on how the surfaces of
the objects are to be drawn from this data. In Chap. 3, the basic geometric
objects available in OpenGL and their application for drawing objects or
object parts are explained. Chapter 4 contains basics for modelling three-
dimensional objects.
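
As a small illustration of such vertex data, the following Java sketch stores four vertices of a square, each consisting of a position in 3D space and an RGB colour value, together with indices describing which vertices form the two triangles of the square. The interleaved layout and all names are assumptions made for this example; how such arrays are actually transferred to the GPU with JOGL is shown in Chaps. 2 and 3.

// Minimal sketch: vertex data for a square built from two triangles.
// The interleaved layout (x, y, z, r, g, b per vertex) and all names
// are illustrative only.
public class SquareVertexData {

    // Four vertices, each with a 3D position followed by an RGB colour.
    public static final float[] VERTICES = {
        // x,     y,    z,     r,    g,    b
        -0.5f, -0.5f, 0.0f,  1.0f, 0.0f, 0.0f,  // bottom left, red
         0.5f, -0.5f, 0.0f,  0.0f, 1.0f, 0.0f,  // bottom right, green
         0.5f,  0.5f, 0.0f,  0.0f, 0.0f, 1.0f,  // top right, blue
        -0.5f,  0.5f, 0.0f,  1.0f, 1.0f, 0.0f   // top left, yellow
    };

    // Indices of the vertices that form the two triangles of the square.
    public static final int[] INDICES = {
        0, 1, 2,   // first triangle
        0, 2, 3    // second triangle
    };
}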

Furthermore, the graphics application transmits the position of the viewer,


also called camera position or viewer location, (mostly) in the form of
matrices for geometric transformations to the GPU. With this information,
vertex processing takes place on the GPU, which consists of the essential
step of geometry processing. This stage calculates the changed position
coordinates of the vertices due to geometric transformations, for example,
due to the change of the position of objects in the scene or a changed
camera position. As part of these transformations, a central projection (also
called perspective projection) or a parallel projection can be applied to
obtain the spatial impression of the 3D scene. This maps the scene
onto a two-dimensional image. Details on geometry processing
can be found in Chap. 5.
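
As an illustration of such a transformation matrix, the following Java sketch builds the standard perspective projection matrix in the column-major order used by the OpenGL. The method name and the parameterisation via the field of view, the aspect ratio and the distances of the near and far planes are assumptions for this example; the underlying mathematics is derived in Chap. 5.

/**
 * Minimal sketch: building a perspective projection matrix as used in
 * the OpenGL (column-major order). Names and parameters are illustrative.
 */
public final class PerspectiveMatrix {

    /**
     * @param fovyDegrees vertical field of view in degrees
     * @param aspect      width / height of the viewport
     * @param near        distance to the near clipping plane (> 0)
     * @param far         distance to the far clipping plane (> near)
     * @return a 4x4 matrix as a float[16] in column-major order
     */
    public static float[] perspective(float fovyDegrees, float aspect, float near, float far) {
        float f = (float) (1.0 / Math.tan(Math.toRadians(fovyDegrees) / 2.0));
        float[] m = new float[16];               // all other entries remain 0
        m[0]  = f / aspect;                      // scales x
        m[5]  = f;                               // scales y
        m[10] = (far + near) / (near - far);     // maps z into the clip range
        m[11] = -1.0f;                           // causes the perspective divide by -z
        m[14] = (2.0f * far * near) / (near - far);
        return m;
    }
}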

Furthermore, in vertex processing, an appropriate tessellation (see Sect.
4.2) can be used to refine or coarsen the geometry into suitable polygons
(mostly triangles). After vertex processing (more precisely in vertex post-
processing; see Sect. 2.5.2), the scene is reduced to the area visible to the
viewer, which is called clipping (see Sect. 8.1).

Until after vertex processing, the 3D scene exists as a vector graphic due to
the representation of the objects by vertices (see Sect. 7.1). Rasterisation
(also called scan conversion) converts this representation into a raster
graphic.1 Rasterisation algorithms are presented in Chap. 7. The chapter
also contains explanations of the mostly undesirable aliasing effect and ways
to reduce it. The conversion into raster graphics usually takes place for all
polygons in the visible frustum of the scene. Since at this point in the
graphics pipeline it has not yet been taken into account which objects are
transparent or hidden by other objects, the individual image points of the
rasterised polygons are called fragments. The rasterisation stage thus
provides a set of fragments.
Within the fragment processing, the illumination calculation usually takes
place. Taking into account the lighting data, such as the type and position of
the light sources and material properties of the objects, a colour value is
determined for each of the fragments by special algorithms (see Chap. 9).
To speed up the graphics processing or to achieve certain effects, textures
can be applied to the objects. Often, the objects are covered by two-
dimensional images (two-dimensional textures) to achieve a realistic
representation of the scene. However, three-dimensional textures and
geometry-changing textures are also possible (see Chap. 10).

The calculation of illumination during fragment processing for each


fragment is a typical procedure in modern graphics pipelines. In principle,
it is possible to perform the illumination calculation before rasterisation for
the vertices instead of for the fragments. This approach can be found, for
example, in the OpenGL fixed-function pipeline (see also Sect. 2.5).

Within the fragment processing, the combination of the fragments into


pixels also takes place through visibility considerations (see Sect. 8.1). This
calculation determines the final colour values of the pixels to be displayed
on the screen, especially taking into account the mutual occlusion of the
objects and transparent objects. The pixels with final colour values are
written into the frame buffer.
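
As a strongly simplified illustration of this visibility decision, the following Java sketch shows the core of the depth buffer test discussed in Chap. 8: a fragment only replaces the stored pixel colour if it lies closer to the viewer than everything drawn at that pixel so far. The data structures and names are simplified assumptions for this example.

/**
 * Minimal sketch of the depth buffer (z-buffer) test with deliberately
 * simplified data structures; names and conventions are illustrative only.
 */
public class DepthBufferSketch {

    // One depth value and one colour value per pixel.
    // Smaller depth values are assumed to be closer to the viewer.
    private final float[][] depthBuffer;
    private final int[][] colourBuffer;

    public DepthBufferSketch(int width, int height) {
        depthBuffer = new float[height][width];
        colourBuffer = new int[height][width];
        for (float[] row : depthBuffer) {
            java.util.Arrays.fill(row, Float.POSITIVE_INFINITY); // "infinitely far away"
        }
    }

    /** Processes one fragment at pixel (x, y) with the given depth and colour. */
    public void processFragment(int x, int y, float depth, int colour) {
        if (depth < depthBuffer[y][x]) {   // closer than anything drawn at this pixel so far
            depthBuffer[y][x] = depth;     // keep the new, smaller depth
            colourBuffer[y][x] = colour;   // this fragment becomes the visible pixel colour
        }                                  // otherwise the fragment is hidden and discarded
    }
}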

In the OpenGL graphics pipeline, the polygon meshes after vertex


processing or completely rendered images (as pixel data) can be returned to
the graphics application. This enables further (iterative) calculations and
thus the realisation of complex graphics applications.

The abstract OpenGL graphics pipeline shown in Fig. 1.2 is representative


of the structure of a typical graphics pipeline, which can deviate from a
concrete system. In addition, it is common in modern systems to execute as
many parts as possible in a flexible and programmable way. In a
programmable pipeline, programs—so-called shaders—are loaded onto
the GPU and executed there. Explanations on shaders are in Sects. 2.6, 2.9 and
2.10.
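
To make the notion of a shader more concrete, the following sketch shows a minimal vertex and fragment shader pair written in the OpenGL Shading Language (GLSL) and stored as Java strings, as is common when working with JOGL. The variable names and the GLSL version are assumptions for this example; shader programming is explained in detail in Sects. 2.9 and 2.10.

// Minimal sketch: a GLSL vertex and fragment shader pair kept as Java strings.
// Variable names and the GLSL version (#version 430) are illustrative only.
public class MinimalShaders {

    // The vertex shader transforms each vertex position with a
    // model-view-projection matrix and passes the vertex colour on.
    public static final String VERTEX_SHADER =
            "#version 430 core\n" +
            "layout (location = 0) in vec3 position;\n" +
            "layout (location = 1) in vec3 color;\n" +
            "uniform mat4 mvpMatrix;\n" +
            "out vec3 vColor;\n" +
            "void main() {\n" +
            "    gl_Position = mvpMatrix * vec4(position, 1.0);\n" +
            "    vColor = color;\n" +
            "}\n";

    // The fragment shader determines the colour of each fragment,
    // here simply the interpolated vertex colour.
    public static final String FRAGMENT_SHADER =
            "#version 430 core\n" +
            "in vec3 vColor;\n" +
            "out vec4 fragColor;\n" +
            "void main() {\n" +
            "    fragColor = vec4(vColor, 1.0);\n" +
            "}\n";
}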
1.4 Objectives of This Book and Recommended
Reading Order for the Sections
This book aims to teach the basics of computer graphics, supported by
practical examples from a modern graphics programming environment. The
programming of the rendering pipeline with the help of shaders will play a
decisive role. To achieve easy access to graphics programming, the
graphics programming interface Open Graphics Library (OpenGL) with the
Java binding Java OpenGL (JOGL) to the Java programming language was
selected (see Sect. 2.1).

OpenGL is a graphics programming interface that evolved greatly over the


past decades due to its platform independence, the availability of
increasingly powerful and cost-effective graphics hardware and its
widespread use. In order to adapt this programming interface to the needs
of the users, ever larger parts of the rendering pipeline became flexibly
programmable directly on the GPU. On the other hand, there was and still
is the need to ensure that old graphics applications remain compatible with
new versions when extending OpenGL. Reconciling these two opposing
design goals is not always easy and usually leads to compromises. In
addition, there are a large number of extensions to the OpenGL command
and feature set, some of which are specific to certain manufacturers of
graphics hardware. Against this background, a very powerful and extensive
graphics programming interface has emerged with OpenGL, which is
widely used and supported by drivers of all common graphics processors
for the major operating systems.

Due to this complexity, familiarisation with OpenGL is not always easy for
a novice in graphics programming and it requires a certain amount of time.
The quick achievement of seemingly simple objectives may fail to
materialise as the solution is far more complex than anticipated. This is
quite normal and should be met with a healthy amount of perseverance.

Different readers have different prior knowledge, different interests and are
different types of learners. This book is written for a reader2 with no prior
knowledge of computer graphics or OpenGL, who first wants to understand
the minimal theoretical basics of OpenGL before starting practical graphics
programming and deepening the knowledge of computer graphics. This
type of reader is advised to read this book in the order of the chapters and
sections, skipping over sections from Chap. 2 that seem too theoretical if
necessary. Later, when more practical computer graphics experience is
available, this chapter can serve as a reference section.

The practically oriented reader who wants to learn by programming


examples is advised to read this chapter (Chapter 1). For the first steps with
the OpenGL fixed-function pipeline, Sect. 2.7 can be read and the example
in Sect. 2.8 can be used. The corresponding source code is available in the
supplementary material to the online version of Chap. 2. Building on the
understanding of the fixed-function pipeline, it is useful to read Sect. 2.9
and the explanations of the example in Sect. 2.10 for an introduction to the
programmable pipeline. The source code of this example is available in the
supplementary material to the online version of Chap. 2. Afterwards,
selected sections of the subsequent chapters can be read and the referenced
examples used according to interest. The OpenGL basics are available in
Chap. 2 if required.

If a reader is only interested in the more theoretical basics of computer


graphics and less in actual graphics programming, he should read the book
in the order of the chapters and sections. In this case, the detailed
explanations of the OpenGL examples can be skipped.

A reader with prior knowledge of computer graphics will read this book in
the order of the chapters and sections, skipping the sections that cover
known prior knowledge. If OpenGL is already known, Chap. 2 can be
skipped entirely and used as a reference if needed.

Chapter 2 was deliberately designed to be a minimal introduction to


OpenGL. However, care has been taken to cover all the necessary elements
of the graphics pipeline in order to give the reader a comprehensive
understanding without overwhelming him. For comprehensive and more
extensive descriptions of this programming interface, please refer to the
OpenGL specifications [9, 10], the GLSL specification [1] and relevant
books, such as the OpenGL SuperBible [11], the OpenGL Programming
Guide [2] and the OpenGL Shading Language book [8].
1.5 Structure of This Book
The structure of this book is based on the abstract OpenGL graphics
pipeline shown in Fig. 1.2. Therefore, the contents of almost all chapters of
this book are already referred to in Sect. 1.3. This section contains a
complete overview of the contents in the order of the following chapters.
Section 1.4 gives recommendations for a reading order of the chapters and
sections depending on the reading type.

Chapter 2 contains detailed explanations of the OpenGL programming


interface and the individual processing steps of the two OpenGL graphics
pipelines, which are necessary to develop your own OpenGL applications
with the Java programming language. Simple examples are used to
introduce modern graphics programming with shaders and the shader
programming language OpenGL Shading Language (GLSL).

In Chap. 3, the basic geometric objects used in OpenGL for modelling


surfaces of three-dimensional objects are explained. In addition, this
chapter contains a presentation of the different OpenGL methods for
drawing these primitive basic shapes. The underlying concepts used in
OpenGL are also used in other graphics systems.

Chapter 4 contains explanations of various modelling approaches for three-


dimensional objects. The focus here is on modelling the surfaces of three-
dimensional bodies.

In Chap. 5, the processing of the geometric data is explained, which mainly


takes place during vertex processing. This includes in particular the
geometric transformations in the viewing pipeline. In this step, among other
operations, the position coordinates and normal vectors stored in the
vertices are transformed.

Models for the representation of colours are used at various steps in the
graphics pipeline. Chapter 6 contains the basics of the most important
colour models as well as the basics of greyscale representation.
An important step in the graphics pipeline is the conversion of the vertex
data, which is in a vector graphics representation, into a raster graphics
representation (images of pixels). Procedures for this rasterisation are
explained in Chap. 7. This chapter also covers the mostly undesirable
aliasing effects caused by rasterisation and measures to reduce these
effects.

A description of procedures for efficiently determining which parts of the


scene are visible (visibility considerations) can be found in Chap. 8. This
includes methods for clipping and culling.

During fragment processing, lighting models are usually used to create


realistic lighting situations in a scene. The standard lighting model of
computer graphics is explained together with models for the so-called
global illumination in Chap. 9. Effects such as shading, shadows and
reflections are also discussed there.

Chapter 10 contains descriptions of methods for texturing the surfaces of


three-dimensional objects. This allows objects to be covered with images
(textures) or the application of special effects that change their appearance.

In the concluding Chap. 11, selected advanced topics are covered that go
beyond the basics of computer graphics and lead to the popular fields of
virtual reality (VR) and augmented reality (AR). Important for the
realisation of interactive computer graphics applications are methods for
object selection and the detection and handling of object collisions.
Furthermore, methods for the realisation of fog effects or particle systems
are presented, whereby realistic effects can be created. The immersion for
virtual reality applications can be increased by auralising three-dimensional
acoustic scenes. Therefore, a rather detailed section from this area has been
included. Stereoscopic viewing of three-dimensional scenes, colloquially
called “seeing in 3D”, is an important factor in achieving immersion in a
virtual world and thus facilitating the grasp of complex relationships.

1.6 Exercises
Exercise 1.1
Background of computer graphics: Please research the answers to the
following questions:

(a) What is the purpose of computer graphics?

(b) On which computer was the first computer graphic created?

(c) Which film is considered the first fully computer-animated film?

(d) What is meant by the “uncanny valley” in the context of computer
graphics? Explain this term.

Exercise 1.2
Main steps in computer graphics

(a) Name the main (abstract) steps in creating a computer graphic.

(b) What are the main processing steps in a graphics pipeline and which
computations take place there?
Exercise 1.3
Requirements in computer graphics: Please research the answers to the
following questions:

(a) Explain the difference between non-interactive and interactive
computer graphics.

(b) From a computer science point of view, which requirements for
computer graphics software systems are particularly important in order
to be able to create computer graphics that are as realistic as possible?
Differentiate your answers according to systems for generating non-
interactive and interactive computer graphics.

(c) Explain the difference between real-time and non-real-time
computer graphics.

(d) In computer graphics, what is meant by “hard real time” and “soft real
time”?

References

1. J. Kessenich, D. Baldwin and R. Rost. The OpenGL Shading Language, Version 4.60.6. 12 Dec 2018. Specification. Retrieved 2 May 2019. The Khronos Group Inc, 2018. URL: https://www.khronos.org/registry/OpenGL/specs/gl/GLSLangSpec.4.60.pdf.
2. J. Kessenich, G. Sellers and D. Shreiner. OpenGL Programming Guide. 9th edition. Boston [u. a.]: Addison-Wesley, 2017.
3. F. Klawonn, V. Chekhtman and E. Janz. “Visual Inspection of Fuzzy Clustering Results”. In: Advances in Soft Computing. Ed. by Benítez J.M., Cordón O., Hoffmann F. and Roy R. London: Springer, 2003, pp. 65–76.
4. M. Magnor. “3D-TV: Computergraphik zwischen virtueller und realer Welt”. In: Informatik Spektrum 27 (2004), pp. 497–503.
5. A. Nischwitz, M. Fischer, P. Haberäcker and G. Socher. Computergrafik. 4th edition. Computergrafik und Bildverarbeitung. Wiesbaden: Springer Vieweg, 2019.
6. D. P. Pretschner. “Medizinische Informatik - Virtuelle Medizin auf dem Vormarsch”. In: Carolo-Wilhelmina Forschungsmagazin der Technischen Universität Braunschweig 1 (2001). Jahrgang XXXVI, pp. 14–22.
7. F. Rehm, F. Klawonn and R. Kruse. “POLARMAP - Efficient Visualization of High Dimensional Data”. In: Information Visualization. Ed. by E. Banissi, R.A. Burkhard, A. Ursyn, J.J. Zhang, M. Bannatyne, C. Maple, A.J. Cowell, G.Y. Tian and M. Hou. London: IEEE, 2006, pp. 731–740.
8. R. J. Rost and B. Licea-Kane. OpenGL Shading Language. 3rd edition. Upper Saddle River, NJ [u. a.]: Addison-Wesley, 2010.
9. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Compatibility Profile) - October 22, 2019). Retrieved 8 Feb 2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.compatibility.pdf.
10. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Core Profile) - October 22, 2019). Retrieved 8 Feb 2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf.
11. G. Sellers, S. Wright and N. Haemel. OpenGL SuperBible. 7th edition. New York: Addison-Wesley, 2016.
12. T. Soukup and I. Davidson. Visual Data Mining. New York: Wiley, 2002.
13. K. Tschumitschew, F. Klawonn, F. Höppner and V. Kolodyazhniy. “Landscape Multidimensional Scaling”. In: Advances in Intelligent Data Analysis VII. Ed. by R. Berthold, J. Shawe-Taylor and N. Lavrač. Berlin: Springer, 2007, pp. 263–273.

Footnotes
1 The primitive assembly step executed in the OpenGL graphics pipeline before rasterisation is described in more detail in Sect. 2.5.4.

2 Persons of all genders are always meant equally; to improve readability, the masculine form is used in this book.

2. The Open Graphics Library (OpenGL)


Karsten Lehn1 , Merijam Gotzes2 and Frank Klawonn3
(1) Faculty of Information Technology, Fachhochschule Dortmund,
University of Applied Sciences and Arts, Dortmund, Germany
(2) Hamm-Lippstadt University of Applied Sciences, Hamm, Germany
(3) Data Analysis and Pattern Recognition Laboratory, Ostfalia University
of Applied Sciences, Braunschweig, Germany

Karsten Lehn (Corresponding author)


Email: [email protected]

Merijam Gotzes
Email: [email protected]

Frank Klawonn
Email: [email protected]

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-28135-8_2.

The Open Graphics Library (OpenGL) is a graphics programming interface that has become very widespread in recent decades due to its open concept
and platform independence. Drivers of common graphics processors and
graphics cards for the major operating systems support the OpenGL. After a
brief overview of existing programming interfaces for graphics
applications, this chapter explains the basics of the OpenGL in detail. The
functionality is presented in enough detail to enable an understanding and classification of the basic concepts of computer graphics contained in the following chapters. At the same time, this chapter can serve as a concise reference for the OpenGL. The coverage is limited to the OpenGL variant for desktop operating systems. Furthermore, the programming of OpenGL
applications with the Java programming language using the OpenGL
binding Java OpenGL (JOGL) with and without shaders is introduced.

2.1 Graphics Programming Interfaces


An application accesses the GPU through special application programming
interfaces (APIs). As shown in Fig. 2.1, these programming interfaces can
be distinguished according to their degree of abstraction from the GPU
hardware. The highest level of abstraction is provided by user interface
frameworks, which offer relatively simple (mostly) pre-built windows, dialogues, buttons and other user interface elements. Examples of such frameworks are Windows Presentation Foundation (WPF), Java Swing or
JavaFX.
At the abstraction level below is scene graph middleware, which
manages individual objects in a scene graph. A scene graph is a special data
structure to store a scene in such a way that it can be efficiently manipulated
and rendered (see Sect. 5.4). Furthermore, it can be used to define and
control the properties, the structure of the graphic objects and their
animations. Examples of this are OpenSceneGraph, which is implemented
in C++, and the Java-based Java 3D.
The lowest level of abstraction in the figure is found in the graphics
programming interfaces, through which the drawing of objects and the
functioning of the graphics pipeline can be very precisely controlled and
programmed. An example of this is Direct3D by Microsoft. Other examples
are the programming interfaces Open Graphics Library (OpenGL)
and Vulkan, both specified by the industry consortium Khronos Group.
Here, Direct3D and Vulkan are at a lower level of abstraction than
OpenGL. A graphics programming interface provides uniform access to the
GPU, so that the functionality of each GPU of each individual manufacturer
does not have to be considered separately when programming graphics
applications.

Fig. 2.1 Degree of abstraction of graphics programming interfaces


The advantage of a low degree of abstraction is the possibility of very
precise control of rendering by the GPU and thus a very efficient
implementation and execution of drawing processes. In contrast to the
higher abstraction levels, the programming effort is significantly higher,
which can potentially lead to more errors. Programming interfaces with a
high level of abstraction can be used much more easily without having to
worry about the specifics of graphics programming or a specific GPU. On
the other hand, the control and flexibility of rendering are limited. For some
use cases, the execution speed in particular will be slower, as the libraries
used have to cover a large number of use cases in order to be flexible and
thus cannot be optimised for every application.
The use of the programming interface OpenGL was chosen for this
book to teach the basics of computer graphics. At this degree of abstraction,
the principles of the processing steps in the individual stages of graphics
pipelines can be taught clearly and with a direct reference to practical
examples. In contrast to the programming interfaces Direct3D and Vulkan,
the abstraction is high enough to achieve initial programming results
quickly. In addition, OpenGL (like Direct3D and Vulkan) offers the up-to-
date possibility of flexible programming of the graphics pipeline on the
GPU using shaders. OpenSceneGraph also supports shader programming,
but this middleware is only available for integration into C++ programs.
Java 3D was developed for the development of Java programs, but offers no
support for programming shaders. JavaFX provides an interface to a
window system at a high level of abstraction and allows working with a
scene graph for two- and three-dimensional graphics. Support for shaders
was planned, but is not yet available in the latest version at the time of
writing this book.
In the following sections of this chapter, the OpenGL programming
interface is introduced to the extent necessary to explain the basic
mechanisms of computer graphics. Comprehensive descriptions are
available in the OpenGL specifications [9, 10], the GLSL specification [4]
or the comprehensive books [5, 12].

2.2 General About the OpenGL


The programming interface Open Graphics Library (OpenGL) was
specified by the industry consortium Khronos Group and is maintained as
an open standard. Many companies belong to the Khronos Group, which
has made OpenGL very widespread. Current specifications, additional
materials and information can be found at www.opengl.org.
OpenGL was designed by the Khronos Group as a platform-independent
interface specification for the hardware-supported rendering of high-quality
three-dimensional graphics. The hardware support can be provided by a
Graphics Processing Unit (GPU) that is integrated into the central
processing unit (CPU) and is located as a separate processor on the
motherboard of the computer or on a separate graphics card. Support by
several GPUs or several graphics cards is also possible. Although OpenGL
is designed as an interface for accessing graphics hardware, the interface
can in principle be implemented by software that is largely executed on the
CPU. This, of course, eliminates the execution speed advantage of hardware
support.
OpenGL provides a uniform interface so that graphics applications can
use different hardware support (for example, by different graphics cards on
different devices) without having to take this into account significantly in
the programming.1
OpenGL specifies only an interface. This interface is implemented by
the manufacturer of the graphics hardware and is supplied by the graphics
driver together with the hardware. This allows the connection of a specific
hardware to the general programming interface, which gives the
manufacturer freedom for optimisations in graphics processing depending
on the features and performance of the GPU. This concerns hardware
parameters such as the size of the graphics memory or the number of
processor cores that allow parallel processing. Furthermore, algorithms can
be implemented that optimise, for example, the timing and the amount of
data transfer between the CPU and the GPU. Since the OpenGL
specifications only define the behaviour at the interface level and certain
variations in the implementation are permitted, a graphics application with
the same input parameters on different graphics processors will not produce
bitwise identical outputs in all cases. However, the results will be the same,
except for details. In order to use this abstraction for learning graphics
programming using OpenGL, this book only describes the conceptual
behaviour of the GPU from the perspective of a graphics application using
OpenGL. The actual behaviour below the OpenGL interface level may
differ from this conceptual behaviour and is of little relevance to
understanding the fundamentals of graphics programming. Therefore, this
book often describes data “being passed to the GPU” or processing “on the
GPU”, although this actually occurs at different times on different GPUs
due to the aforementioned optimisations of the graphics driver. The
graphics driver also determines the supported OpenGL version, so this
driver should be updated regularly.
OpenGL is also specified independently of a computer platform.
Nevertheless, OpenGL drivers are available for the common operating
systems Windows, macOS and Linux. OpenGL for Embedded Systems
(OpenGL ES) is a subset of (desktop) OpenGL and provides specific
support for mobile phones and other embedded systems, such as game
consoles and automotive multimedia devices. In particular, these
specifications take into account the need to reduce the power consumption
of mobile and embedded systems. The Android mobile operating system,
for example, offers OpenGL ES support. Furthermore, modern browsers
support WebGL (Web Graphics Library), which is a special specification for
the hardware-supported display of three-dimensional graphics for web
applications. WebGL is shader-based and has very similar functionalities to
the OpenGL ES. Separate specifications exist for OpenGL ES and WebGL
with their own version counts (see www.khronos.org). The basic
mechanisms and available shaders (in OpenGL ES from version 2.0) are—
depending on the version—identical to the desktop variant of OpenGL.
Since starting to develop applications for mobile phones or other embedded
systems requires further training and the mechanisms of OpenGL form a
basis for understanding the other variants, only OpenGL applications for
desktop operating systems are considered in this book.
OpenGL is specified independently of any programming language. On
the reference pages www.opengl.org, the commands are explained in a
syntax that is similar to the syntax of the C programming language. Many
graphics applications with OpenGL support are implemented in the C or
C++ programming language, for which an OpenGL binding exists. Since
OpenGL is independent of specific operating systems and computer
platforms (see above), such bindings usually do not include support for
specific window systems or input devices (mouse, keyboard or game
controllers). For this, further external libraries are needed. A short overview
of the necessary building blocks for development under C/C++ can be
found in [6, p. 46f]. Detailed instructions for setting up the development
environment for OpenGL graphics programming under C/C++ are available
on the relevant websites and forums. There are also bindings for a wide
range of other programming and scripting languages, such as C#, Delphi,
Haskell, Java, Lua, Perl, Python, Ruby and Visual Basic.

2.3 The OpenGL and Java


Two commonly used OpenGL language bindings to the Java programming
language are Java OpenGL (JOGL) and Lightweight Java Game Library
(LWJGL). With the very widespread programming language Java and the
associated tools, such as the Java Development Kit (JDK), an environment
exists that makes it easy for beginners to get started with programming.
Furthermore, the programming syntax of the OpenGL language binding
JOGL is close to the syntax of the OpenGL documentation. In addition to
the possibility of displaying computer graphics generated with hardware
support via the JOGL window system (GLWindow), the Abstract Window
Toolkit (AWT) or Swing window systems included in current Java versions
can also be used. This makes it possible to combine the teaching of the
basics of computer graphics with an easy introduction to practical graphics
programming.
Knowledge of Java and OpenGL form a solid foundation for
specialisations such as the use of OpenGL for Embedded Systems (OpenGL
ES) for the realisation of computer graphics on mobile phones or the Web
Graphics Library (WebGL) for the generation of GPU-supported computer
graphics in browsers. These basics can be used for familiarisation
with Vulkan as a further development of a graphics programming interface
at a lower level of abstraction. Furthermore, this basis allows an easy
transition to other programming languages, such as C++, C#, to other
window systems or to future developments in other areas. For these reasons,
programming examples in Java with the OpenGL binding JOGL are used in
this book to motivate and support the core objective—teaching the basics of
computer graphics—through practical examples.

2.4 Profiles
In 2004, with the OpenGL specification 2.0, the so-called programmable
pipeline (see Fig. 2.2, right) was introduced, which allows for more flexible
programming of the graphics pipeline. Before this time, only parameters of
fixed-function blocks in the so-called fixed-function pipeline (see Fig. 2.2,
left) could be modified. By a blockwise transmission of data into buffers on
the GPU, much more efficient applications can be implemented using the
programmable pipeline (see Sect. 2.10). The flexible programming in this
pipeline is mainly achieved by programming shaders. (cf. [6, pp. 44–45])
Since more complex and efficient graphics applications can be
implemented using the programmable pipeline and the language scope of
this pipeline is significantly smaller in contrast to the fixed-function
pipeline, an attempt was made in 2008 and 2009 with the OpenGL
specifications 3.0 and 3.1 to remove the old fixed-function pipeline
commands from the OpenGL specifications. However, this met with heavy
criticism from industrial OpenGL users, so that with version 3.2 the so-
called compatibility profile was introduced. This profile, with its
approximately 700 commands, covers the language range of the fixed-
function pipeline and the programmable pipeline. In the core profile, a
reduced language scope with approximately 250 commands of the
programmable pipeline is available. (cf. [6, p. 46]) The relationship
between the two profiles and the availability of the two pipelines is
summarised in Table 2.1. The core profile is specified in [10] and the
compatibility profile in [9].

Fig. 2.2 Fixed-function pipeline (left) and programmable pipeline (right)

Table 2.1 Differences between the compatibility profile and the core profile in OpenGL

                          Compatibility profile   Core profile
Number of commands        Approximately 700       Approximately 250
Fixed-function pipeline   Available               Not available
Programmable pipeline     Available               Available

2.5 OpenGL Graphics Pipelines


Figure 2.2 shows the basic processing stages of the graphics pipelines in the
OpenGL. Both pipelines in the figure are detailed representations of the
abstract OpenGL graphics pipeline introduced in Sect. 1.1. To simplify
matters, the presentation of special data paths and feedback possibilities to
the graphics application was omitted. As already explained, the essential
further development of the fixed-function pipeline to the programmable
pipeline was the replacement of fixed-function blocks with programmable
function blocks, each of which is colour-coded in Fig. 2.2. The vertex
shader of the programmable pipeline can realise the function blocks
geometric transformations, lighting and colouring in the fixed-function
pipeline. The function block texturing, colour sum and fog in the fixed-
function pipeline can be implemented in the programmable pipeline by
the fragment shader. Shader-supported programming naturally results in
additional freedom in graphics processing. For example, in modern
applications, the lighting is realised by the fragment shader instead of the
vertex shader. In addition, a programmable tessellation stage and a
geometry shader are optionally available in the programmable pipeline (see
Sect. 2.6). The non-coloured fixed-function blocks are identical in both
graphics pipelines and can be configured and manipulated through a variety
of parameter settings.
Through the compatibility profile (see Sect. 2.4), both graphics
pipelines are available in modern graphics processors. It is obvious to
simulate the functions of the fixed-function pipeline by (shader) programs
using the programmable pipeline. This possibility is implemented by the
graphics driver and is not visible to users of the interface to the fixed-
function pipeline. In the following sections, the function blocks of both
graphics pipelines are explained in more detail.

2.5.1 Vertex Processing


The vertex processing stage receives the scene to be drawn from the
graphics application. In the fixed-function pipeline, the scene is supplied to
the geometric transformations stage and in the programmable pipeline to
the vertex shader. The objects of the scene are described by a set of vertices.
A vertex (plural: vertices) is a corner of a geometric primitive (or primitive
for short), for example, an endpoint of a line segment or a corner of a
triangle. Furthermore, additional information is stored in these vertices in
order to be able to draw the respective geometric primitive. Typically, the
vertices contain position coordinates in three-dimensional space (3D
coordinates), colour values, normal vectors or texture coordinates. The
colour values can exist as RGBA colour values, whose fourth component is
the so-called alpha value, by which the transparency of an object is
described. Strictly speaking, the alpha value represents the opacity of an
object. A low alpha value means high transparency and a high value means
low transparency (see Sect. 6.3 for details).
The normal vectors are displacement vectors oriented perpendicular to
the surface to be drawn. They are used to calculate and display the
illumination of the objects and the scene (see Chap. 9). The texture
coordinates are used to modify objects with textures. Often objects are
covered by two-dimensional images (see Chap. 10). In the fixed-function
pipeline, fog coordinates can be used to achieve fog effects in the scene. In
the programmable pipeline, in principle, any content can be stored in the
vertices for processing in the correspondingly programmed vertex shader.
The available data in addition to the three-dimensional position coordinates
are called associated data.
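
For illustration, such vertex data is typically prepared on the application side as a plain array of floating point values before it is transferred to the GPU. The following sketch shows one possible layout, with three position components followed by three colour components per vertex; the concrete values and the layout are illustrative only:

```java
// Three vertices of a triangle: x, y, z position followed by an RGB colour value.
float[] vertexData = {
    //  x      y     z     r     g     b
     0.0f,  0.5f, 0.0f, 1.0f, 0.0f, 0.0f,
    -0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 0.0f,
     0.5f, -0.5f, 0.0f, 0.0f, 0.0f, 1.0f
};
```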
The three-dimensional position coordinates of the vertices are available
in so-called object coordinates (also called model coordinates) before the
execution of the geometric transformations function block. Through the
application of the so-called model matrix, the position coordinates of the
individual vertices are first transformed into world coordinates and then,
through the application of the so-called view matrix, into camera
coordinates or eye coordinates. The latter transformation step is also
called view transformation, which takes into account the position of the
camera (or the position of the observer) through which the scene is viewed.
The two transformations can be combined by a single transformation step
using the model view matrix, which results from a matrix multiplication of
the model matrix and the view matrix (see Chap. 5). The model view matrix
is usually passed from the graphics application to the GPU.
In the next step of geometry processing, the projection matrix is applied
to the position coordinates of the vertices, transforming the vertices into
clip coordinates (projection transformation). This often results in a
perspective distortion of the scene, so that the representation gets a realistic
three-dimensional impression. Depending on the structure of the projection
matrix, for example, perspective projection or parallel projection can be
used. These two types of projection are frequently used. The described use
of the model view matrix and the projection matrix is part of the viewing
pipeline, which is described in more detail in Sect. 5.10. Besides the
transformation of the position coordinates, the transformation of the normal
vectors of the vertices is also necessary (see Sect. 5.11).
In the programmable pipeline, the transformations using the model view
matrix and the projection matrix are implemented in the vertex shader.
Since this must be individually programmed, variations of the calculation
are possible according to the needs of the application. The second part of
the viewing pipeline is carried out in the vertex post-processing stage (see
Sect. 2.5.2).
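
As a sketch of what such a vertex shader typically computes, the following GLSL source (embedded here as a Java string, as it could be passed to the OpenGL via JOGL) multiplies each vertex position by the model view matrix and the projection matrix; the uniform and attribute names are illustrative only:

```java
// Minimal GLSL vertex shader: transforms each vertex position from object
// coordinates into clip coordinates using the model view and projection matrices.
private static final String VERTEX_SHADER_SOURCE =
        "#version 430 core\n"
      + "layout (location = 0) in vec3 vPosition;\n"
      + "uniform mat4 modelViewMatrix;\n"   // model and view transformation combined
      + "uniform mat4 projectionMatrix;\n"  // projection transformation
      + "void main() {\n"
      + "    gl_Position = projectionMatrix * modelViewMatrix * vec4(vPosition, 1.0);\n"
      + "}\n";
```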
In the function block lighting and colouring of the fixed-function
pipeline, the computation of colour values for the vertices takes place. This
block allows to specify light sources and material parameters for the
surfaces of individual object faces, which are taken into account in the
lighting calculation. In particular, the normal vectors of the individual
vertices can be taken into account. The use of the Blinn–Phong illumination
model (see [1]), which is often used in computer graphics, is envisaged.
This model builds on Phong’s illumination model [7] and modifies it to
increase efficiency. Details are given in Sect. 9.3. Due to the high
performance of modern graphics processors, the illumination values in
current computer graphics applications are no longer calculated for vertices
but for fragments. Normally, more fragments than vertices are processed in
a scene. In this case, the corresponding calculations are performed in the
programmable pipeline in the fragment shader (see Sect. 2.5.5).

2.5.2 Vertex Post-Processing


The function block vertex post-processing allows numerous post-processing
steps of the vertex data after the previous steps. The main steps are
explained below. The optional transform feedback makes it possible to save
the result of the geometry processing or processing by the preceding shader
in buffer objects and either return it to the application or use it multiple
times for further processing by the graphics pipeline.
Using flat shading, all vertices of a graphic primitive can be assigned
the same vertex data. Here, the vertex data of a previously defined vertex,
the provoking vertex, is used within the primitive. This allows, for example,
a triangle to be coloured with only one colour, which is called flat shading
(or constant shading) (see Sect. 9.4). Clipping removes invisible objects or
parts of objects from a scene by checking whether they are in the visible
area—the view volume. In this process, non-visible vertices are removed or
new visible vertices are created to represent visible object parts. Efficient
clipping methods are described in Sect. 8.1.
Furthermore, perspective division converts the position coordinates of
the vertices from clip coordinates into a normalised representation,
the normalised device coordinates. Based on this normalised representation,
the scene can be displayed very easily and efficiently to any two-
dimensional output device. Subsequently, the viewport transformation
transforms the normalised device coordinates into window coordinates
(also device coordinates) so that the scene is ready for output to a two-
dimensional window, the viewport. These two transformations for the
position coordinates of the vertices represent the second part of the viewing
pipeline, which is explained in detail in Sect. 5.10. Explanations of the
viewport are available in Sect. 5.1.2.
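
As a brief illustration of these two steps (a sketch using the standard OpenGL conventions, with clip coordinates $(x_c, y_c, z_c, w_c)$, a viewport with origin $(x_0, y_0)$, width $w$ and height $h$, and the depth range values $n$ and $f$): the perspective division yields

$$x_{ndc} = \frac{x_c}{w_c}, \qquad y_{ndc} = \frac{y_c}{w_c}, \qquad z_{ndc} = \frac{z_c}{w_c},$$

and the viewport transformation maps these normalised device coordinates to window coordinates via

$$x_w = x_0 + \frac{w}{2}(x_{ndc} + 1), \qquad y_w = y_0 + \frac{h}{2}(y_{ndc} + 1), \qquad z_w = \frac{f - n}{2}\,z_{ndc} + \frac{n + f}{2}.$$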

2.5.3 Primitive Assembly


An OpenGL drawing command invoked by the graphics application allows
to specify for a number of vertices (a vertex stream) which geometric shape
to draw. The available basic shapes are referred to in computer graphics as geometric primitives, or primitives for short. In the fixed-function
pipeline, points, line segments, polygons, triangles and quadrilaterals
(quads) are available. In the programmable pipeline, only points, line
segments and triangles can be drawn. See Sect. 3.2 for the available basic
objects in the OpenGL.
To increase efficiency, drawing commands exist that allow a large
number of these primitives to be drawn by reusing shared vertices. For
example, complete object surfaces can be rendered by long sequences of
triangles (GL_TRIANGLE_STRIP) or triangle fans
(GL_TRIANGLE_FAN) with only one call to a drawing command. Various
possibilities for drawing primitives in the OpenGL can be found in Sect. 3.3.
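
In JOGL, such a drawing command corresponds to a single method call on the gl object. A sketch (assuming that the vertex data has already been provided to the pipeline, as described in Sect. 2.10; the vertex count is an example value):

```java
// Inside display(GLAutoDrawable drawable):
GL2 gl = drawable.getGL().getGL2();
int n = 7;  // number of vertices in the strip (example value)

// Draw a connected strip of triangles: n vertices yield n - 2 triangles
// with a single drawing command.
gl.glDrawArrays(GL.GL_TRIANGLE_STRIP, 0, n);
```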
The essential task of the function block primitive assembly is the
decomposition of these vertex streams into individual primitives, also
called base primitives. The three base primitives in the OpenGL are points,
line segments and polygons. In the programmable pipeline, the polygon
base primitive is a triangle. Through this function block, a line sequence
consisting of seven vertices, for example, is converted into six individual
base line segments. If a tessellation evaluation shader or a geometry shader (see Sect. 2.6) is used, part of the primitive assembly takes place before these shaders are executed.
During the decomposition of a vertex stream into individual base primitives, the order in which the vertices of each primitive are specified (enumerated) in the vertex stream relative to one another becomes apparent. This allows the front and back of the primitives to be identified and defined. According to a common convention in computer graphics, the front side of a primitive is the viewed side whose vertices appear (are enumerated) counter-clockwise in the vertex stream. For the back sides, this orientation, which is also called the winding order, is exactly the opposite (see Sect. 3.2.4). With this
information, it is possible to cull (suppress) front or backsides of the
polygons by frontface culling or backface culling before the
(computationally intensive) rasterisation stage. For example, for an object
viewed from the outside, only the outer sides of the triangles that make up
the surface of the object need to be drawn. This will usually be the front
sides of the triangles. If the viewer is inside an object, for example, in a
cube, then it is usually sufficient to draw the backsides of the polygons. If
the objects in the scene are well modelled, i.e., all the front sides actually
face outwards, then this mechanism contributes to a significant increase in
drawing efficiency without reducing the quality of the rendering result.
Details on visibility considerations can be found in Sect. 8.2.1.
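
In a JOGL renderer, this mechanism is typically switched on with a few calls on the gl object. A sketch (the winding order and the culled side shown here correspond to the defaults described above):

```java
// Inside a renderer method, for example init(GLAutoDrawable drawable):
GL2 gl = drawable.getGL().getGL2();
gl.glFrontFace(GL.GL_CCW);        // counter-clockwise vertex order marks the front side
gl.glEnable(GL.GL_CULL_FACE);     // activate face culling
gl.glCullFace(GL.GL_BACK);        // suppress the back sides before rasterisation
```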

2.5.4 Rasterisation
Up to this point of processing in the OpenGL graphics pipeline, all objects
of the scene to be rendered are present as a set of vertices—i.e., as a vector
graphic. The function block rasterisation converts the scene from this
vector format into a raster graphic (see Sect. 7.1 for an explanation of these
two types of representations) by mapping the individual base primitives
onto a regular (uniform) grid.2 Some algorithms for the efficient rasterisation (also called scan conversion) of lines are presented in Sect. 7.3. Methods for rasterising and filling areas can be found in Sect. 7.5.
The results of rasterisation are fragments, which are potential pixels for
display on an output device. For a clear linguistic delimitation, we will
explicitly speak of fragments, as is common in computer graphics, and not
of pixels, since at this point in the pipeline it is not yet clear which of the
potentially overlapping fragments will be used to display which colour
value as a pixel of the output device. Only in the following steps of the
pipeline is it taken into account which objects are hidden by other objects,
how the transparency of objects affects them and how the colour values of
the possibly overlapping fragments are used to compute a final colour value
of a pixel.
After rasterisation—with the standard use of the pipeline—for each
fragment at least the position coordinates in two-dimensional window
coordinates (2D coordinates), a depth value (z-value) and possibly a colour
value are available. The three-dimensional position coordinates of the
vertices are used to determine which raster points belong to a geometric
primitive as a fragment. In addition, the assignment of additional data
stored in each vertex to the generated fragments takes place. These are, for
example, colour values, normal vectors, texture coordinates or fog
coordinates. This additional data stored in vertices and fragments is
called associated data (see above). Since the conversion of a vector graphic
into a raster graphic usually results in many fragments from a few vertices,
the additional intermediate values required are calculated by interpolating
the vertex data. This is done by using the two-dimensional position
coordinates of the fragment within the primitive in question. The
interpolation is usually done using barycentric coordinates within a triangle
(see Sect. 7.5.4). In Chap. 10, this is explained using texture coordinates.
Such an interpolation is used in both pipeline variants for the colour values.
This, for example, can be used to create colour gradients within a triangle.
If this interpolation, which has a smoothing effect, is not desired, it can be
disabled by activating flat shading in the function block vertex post-
processing (see Sect. 2.5.2). Since in this case all vertices of a primitive
already have the same values before rasterisation, all affected fragments
also receive the same values. This mechanism of rasterisation can be used
to realise flat shading (constant shading), Gouraud shading or Phong shading (see Sect. 9.4).
When converting a scene from a vector graphic to a raster
graphic, aliasing effects (also called aliasing for short) appear, which can be
reduced or removed by antialiasing methods (see Sect. 7.6). Already the
linear interpolation during the conversion of vertex data into fragment data
(see last paragraph) reduces aliasing, as this has a smoothing effect. The
OpenGL specification provides the multisample antialiasing (MSAA)
method for antialiasing. With this method, colour values are sampled at
slightly different coordinates in relation to the fragment coordinate.
These samples (also called subpixels) are combined at the end of the
graphics pipeline into an improved (smoothed) sample per fragment (see
Table 2.2 in Sect. 2.5.6). Furthermore, a so-called coverage value is
determined for each fragment, which indicates how large the proportion of
the samples is that lie within the sampled geometric primitive and
contribute to the colour of the respective fragment (see Sect. 7.6.3).
For the following pipeline steps, the computations can be performed for
all samples (in parallel) per fragment. In this case, true oversampling is
applied. This method is called supersampling antialiasing (SSAA) (see Sect.
7.6.5). The pipeline can be configured in such a way that the subsequent
stages use only a smaller proportion of samples or only exactly one sample
(usually the middle sample) per fragment for the computations. This option
of the multisampling antialiasing method allows optimisations to achieve a
large gain in efficiency.
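
With JOGL, a multisampled default framebuffer is typically requested when the capabilities for the drawing surface are defined. A sketch (four samples per fragment is an arbitrary example value):

```java
import com.jogamp.opengl.GLCapabilities;
import com.jogamp.opengl.GLProfile;

// When assembling the capabilities for the output window: request a default
// framebuffer with multisample buffers and four samples per fragment. These
// capabilities are then passed on when the window is created.
GLCapabilities capabilities = new GLCapabilities(GLProfile.getDefault());
capabilities.setSampleBuffers(true);
capabilities.setNumSamples(4);
```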

2.5.5 Fragment Processing


In the pipeline stage texturing, colour sum and fog of the fixed-function pipeline, textures can be applied to objects. Often the objects are covered by
two-dimensional images (2D textures) to achieve a more realistic rendering
of the scene (see Chap. 10). The texturing operation uses the (interpolated)
texture coordinates that were determined for individual fragments. The
colour values of the texture are combined with the colour values of the
fragments when applying the texture. Due to the multiplicative combination
of the colour values, specular reflections (see Sect. 9.1) may be undesirably
strongly reduced. Therefore, with the function colour sum, a second colour
can be added after this colour combination to appropriately emphasise the
specular reflections. This second colour (secondary colour) must be
contained in the vertices accordingly. Furthermore, at this stage, it is
possible to create fog effects in the scene (see Sect. 11.3).
In the programmable pipeline, the operations of the fixed-function
pipeline can be realised by the fragment shader. In modern graphics
applications, the computations for the illumination per fragment take place
in the fragment shader. In contrast to the lighting computation per vertex in
the fixed-function pipeline (see Sect. 2.5.1), more realistic lighting effects
can be achieved. Such methods are more complex, but with modern
graphics hardware, they can easily be implemented for real-time
applications. Methods for local and global illumination are explained in
Chap. 9.
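
As an illustration of such a shader, the following sketch shows a minimal GLSL fragment shader (embedded as a Java string) that simply looks up the interpolated texture coordinate in a two-dimensional texture; lighting calculations are omitted for brevity, and the variable names are illustrative only:

```java
// Minimal GLSL fragment shader: samples a 2D texture at the interpolated
// texture coordinate and outputs the resulting colour value.
private static final String FRAGMENT_SHADER_SOURCE =
        "#version 430 core\n"
      + "in vec2 vTexCoord;\n"                // interpolated per fragment
      + "uniform sampler2D textureSampler;\n"
      + "out vec4 fragColor;\n"
      + "void main() {\n"
      + "    fragColor = texture(textureSampler, vTexCoord);\n"
      + "}\n";
```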
Table 2.2 Per-fragment operations in the OpenGL graphics pipelines (execution before or after fragment processing; possible applications or examples, partly in combination with other per-fragment operations)

Pixel ownership test (before): identifying the occlusion of the framebuffer by windows outside of the OpenGL, preventing drawing in the hidden area
Scissor test (before): preventing drawing outside the viewport, support of drawing rectangular elements, for example, buttons or input fields
Multisample fragment operations (before): antialiasing in conjunction with transparency, if multisampling is activated
Alpha to coverage (after): antialiasing in conjunction with transparency, for example, for the representation of fences, foliage or leaves, if multisampling is active
Alpha test (after) (1): representation of transparency
Stencil test (after, optionally before (2)): cutting out non-rectangular areas, for example, for rendering shadows from point light sources or mirrors or for masking the cockpit in flight or driving simulations
Depth buffer test (after, optionally before (2)): occlusion of surfaces, culling
Occlusion queries (after, optionally before (2)): rendering of aperture spots or lens flare, occlusion of bounding volumes, collision control
Blending (after): mixing/cross-fading (of images), transparency
sRGB conversion (after): conversion to the standard RGB colour model (sRGB colour model)
Dithering (after): reduction of interferences when using a low colour depth
Logical operations (after): mixing/cross-fading (of images), transparency
Additional multisample fragment operations (after): determination whether alpha test, stencil test, depth buffer test, blending, dithering and logical operations are performed per sample; combination of the colour values of the samples to one colour value per fragment, if multisampling is active

(1) Available in the compatibility profile only
(2) In the programmable pipeline, optional execution before the fragment shader instead of after the fragment shader (see explanations in the text)

2.5.6 Per-Fragment Operations


Before and after fragment processing, a variety of operations are applicable
to modify fragments or eliminate irrelevant or undesirable fragments. By
excluding further processing of fragments, efficiency can be significantly
increased. In addition, various effects can be realised by modifying
fragments. The possible per-fragment operations are listed in Table 2.2
along with possible applications or examples. The order of listing in the
table corresponds to the processing order in the OpenGL graphics pipelines.
As can be seen from the table, the pixel ownership test, the scissor test and
the multisample fragment operations are always executed before fragment
processing. From the point of view of the user of the fixed-function
pipeline, all other operations are only available after this function block. In
the programmable pipeline, the operations stencil test, depth buffer test and
occlusion queries can optionally be executed before the fragment shader.
This is triggered by specifying a special layout qualifier for input variables
of the fragment shader.3 If these operations are performed before the
fragment shader, they are called early fragment tests and are not performed
again after the fragment shader.
For advanced readers, it should be noted that in addition to this explicit
specification of whether early fragment tests are performed before or after
fragment processing, optimisations of the GPU hardware and the OpenGL
driver implementation exist that include early execution independently of
the specification in the fragment shader. Since in modern GPUs the fixed-
function pipeline is usually realised by shader implementations, these
optimisations are in principle also possible or available for the fixed-
function pipeline. In order to remain compatible with the OpenGL
specification, the behaviour of these optimisations in both pipelines must be
as if these operations were performed after fragment processing, although
this actually takes place (partially) beforehand. Of course, this is only true if
early fragment tests are not explicitly enabled. Apart from an increase in efficiency, the user of OpenGL must not notice any difference in the rendering results due to this optimisation.
The functionalities of the individual operations are partially explained in
the following chapters of this book. Detailed descriptions of the
mechanisms can be found in the OpenGL specifications (see [9, 10]) and in
books that comprehensively describe the OpenGL programming interface
(see, for example, [5, 12]). The following are examples of some
applications of the per-fragment operations.
With the help of the scissor test, a rectangular area can be defined in
which to draw. Fragments outside this rectangle can be excluded from
further processing. Due to the rectangular shape of this area, it is possible to
check very efficiently whether a fragment lies inside or outside this area.
To check whether a fragment lies in an area with a more complex shape
than a rectangle, the stencil test is applicable. For this, a stencil buffer must
be created that contains values for each two-dimensional coordinate. With
this test, for example, only the fragments from the area defined by the
stencil buffer can be drawn—as with a template—and those outside this
area can be excluded from drawing. In addition, more complex comparison
operations, modifications of the stencil buffer and a link with the depth
buffer test (see next paragraph) are possible. In principle, it is possible to
render shadows or mirrors with this test. However, today these effects are
mostly realised by drawing in textures (rendering to texture) and then
applying these textures to the corresponding objects.
In a depth buffer, also called z-buffer, the z-values (depth values) of the
fragments can be stored so that the depth buffer test can be used to check
whether certain fragments overlap each other. In this way, the mutual
occlusion of objects (called culling) can be implemented. In Sect. 8.2.5, the
depth buffer algorithm for performing occlusion calculation is explained.
By blending or using logical operations, colours of fragments (in
buffers) can be mixed for rendering transparent objects.
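
In a JOGL renderer, several of these per-fragment operations are simply enabled and parameterised on the gl object. A sketch (the scissor rectangle and the blend function are example values):

```java
// Inside a renderer method, for example init(GLAutoDrawable drawable):
GL2 gl = drawable.getGL().getGL2();

gl.glEnable(GL.GL_SCISSOR_TEST);            // restrict drawing to a rectangle
gl.glScissor(0, 0, 200, 150);               // lower left corner, 200 x 150 pixels

gl.glEnable(GL.GL_DEPTH_TEST);              // resolve occlusions via the depth buffer

gl.glEnable(GL.GL_BLEND);                   // blend transparent fragments
gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE_MINUS_SRC_ALPHA);
```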
Furthermore, Table 2.2 lists operations for performing antialiasing using
the multisampling antialiasing method. The multisample fragment
operations can modify or mask the coverage value for a fragment to
improve the appearance of transparency. In the additional multisample
fragment operation, it can be determined whether alpha test, stencil test,
depth buffer test, blending, dithering and logical operations are performed
per sample or only per fragment. This operation is the last step of the
multisampling process, in which the colour values of the individual samples
are combined into one colour value per fragment. Section 7.6.6 contains
details on multisample antialiasing.
The alpha to coverage operation derives a temporary coverage value
from the alpha components of the colour values of the samples of a
fragment. This value is then combined with the coverage value of the fragment by a logical AND operation and used as the new coverage value.
makes it possible to combine good transparency rendering and good
antialiasing with efficient processing in complex scenes (for example, to
render foliage, leaves or fences).
If the reduction of the colour depth is desired, the dithering operation
can be used to minimise possible interference. The basic approach of such a
procedure is presented in Sect. 6.1.

2.5.7 Framebuffer
The result of the previous processing steps can be written into
the framebuffer to display the result on a screen. As the default framebuffer, the OpenGL usually uses a buffer provided by the windowing system, so that the rendering result is efficiently output in a window. Alternatively,
framebuffer objects can be used whose content is not directly visible. These
objects can be used for computations in a variety of applications, for
example, to prepare the output via the visible framebuffer before the actual
content is displayed.

2.6 Shaders
In 2004, with the introduction of the programmable pipeline, the vertex
shader and the fragment shader were introduced as programmable function
blocks in the OpenGL graphics pipeline (see Sects. 2.4 and 2.5). Although
these shaders are very freely programmable, they typically perform certain
main tasks. The vertex shader usually performs geometric transformations
of the vertices until they are available in clip coordinates. Furthermore, it
often performs preparatory computations for the subsequent processing
stages. The task of the fragment shader is mostly to calculate the lighting
for the fragments depending on different light sources and the material
properties of objects. In addition, the application of textures is usually taken
over by this shader. In the compatibility profile, the vertex and fragment
shaders are optional. In the core profile, these two shaders must be present;
otherwise, the rendering result is undefined (see [10]).
For advanced readers, it should be noted that only the colour values of
the fragment processing result are undefined if there is no fragment shader
in the core profile. Depth values and stencil values have the same values as
on the input side, so they are passed through this stage without change.4
This may be useful, for example, for the application of shadow maps (see
Sect. 10.1.2).

Fig. 2.3 Vertex processing in the programmable pipeline with all function blocks: The Tessellation
Control Shader, the Tessellation Evaluation Shader and the Geometry Shader are optional

As of OpenGL version 3.x in 2008, the geometry shader was introduced as a further programmable element. This shader is executed after the vertex
shader or after the optional tessellation stage (see Fig. 2.3). With the help of
this shader, the geometry can be changed by destroying or creating new
geometric primitives such as points, lines or polygons. For this purpose, this
shader also has access to the vertices of neighbouring primitives
(adjacencies). Before the geometry shader is executed, the decomposition
into base primitives takes place through the function block primitive
assembly (see Sect. 2.5.3). Thus, this decomposition step is carried out earlier when the geometry shader is used.
The described functionality can be used for the dynamic generation of
different geometries. For example, a differently detailed geometry can be
generated depending on the distance to the camera. This type of distance-dependent rendering is called level of detail (LOD) (see Sect. 4.5.1). Furthermore, the geometry can be duplicated and rendered from
different angles using the graphics pipeline. This is useful, for example, to
create shadow textures from different angles in order to simulate realistic
shadows. Similarly, so-called environment mapping (see Sect. 10.1.2) can
be used to simulate reflective objects by projecting the object’s
surroundings (from different angles) onto the inside of a cube and then
applying this as a texture to the reflective object. Together with
the transform feedback, the geometry shader can also be used for the
realisation of particle systems. Transform feedback allows the result of
vertex processing to be buffered in buffer objects and returned to the
graphics application or to the first stage of the graphics pipeline for further
processing. This allows the (slightly) modified graphics data to be
processed iteratively several times. In this case, the geometry shader takes
over the iterative creation, modification and destruction of the particles,
depending on the parameters (such as point of origin, initial velocity,
direction of motion, and lifetime) of each individual particle. Such particle
systems (see also Sect. 11.4) can be used to create effects such as fire,
explosions, fog, flowing water or snow. The use of the geometry shader is
optional in both OpenGL profiles.
The OpenGL version 4.0 in 2010 introduced the tessellation unit, which
allows the geometry to be divided into primitives dynamically during
runtime (for an explanation of tessellation, see Sect. 4.2). As shown in Fig.
2.3, the tessellation unit consists of three parts, the tessellation control
shader, the configurable tessellation primitive generation and
the tessellation evaluation shader. The two shaders are thus the
programmable elements of the tessellation unit.
With the tessellation unit, a new primitive, the so-called patch primitive, was introduced. A patch primitive is a universal primitive consisting of
individual patches, which in turn consist of a given number of vertices. This
can be used, for example, to define parametric surfaces that are broken
down into triangles by the subsequent tessellation steps. The tessellation
control shader controls the division of the patches into smaller primitives by
defining parameters for the subsequent stage per patch. Care must be taken
that the edges of adjacent patches have the same tessellation level, i.e., are
divided into the same number of primitives, in order to avoid discontinuities or gaps in a polygon mesh that is intended to be continuous. In addition, further
attributes can be added per patch for the subsequent shader. The tessellation
control shader is optional. If it is not available, tessellation is performed
with preset values.
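
A sketch of a very simple tessellation control shader is shown below (embedded as a Java string; it emits patches of three vertices, sets constant example tessellation levels once per patch and passes the vertex positions through unchanged):

```java
private static final String TESS_CONTROL_SHADER_SOURCE =
        "#version 430 core\n"
      + "layout (vertices = 3) out;\n"
      + "void main() {\n"
      + "    if (gl_InvocationID == 0) {\n"      // set the levels only once per patch
      + "        gl_TessLevelOuter[0] = 4.0;\n"
      + "        gl_TessLevelOuter[1] = 4.0;\n"
      + "        gl_TessLevelOuter[2] = 4.0;\n"
      + "        gl_TessLevelInner[0] = 4.0;\n"
      + "    }\n"
      + "    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;\n"
      + "}\n";
```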
The tessellation primitive generation function block performs the
division of the patches into a set of points, line segments and triangles
based on the parameters defined by the tessellation control shader. The
tessellation evaluation shader is used to calculate the final vertex data for
each of the newly generated primitives. For example, the position
coordinates, the normal vectors or the texture coordinates for the (newly
generated) vertices have to be determined. If the tessellation evaluation
shader is not present, no tessellation takes place through the rendering
pipeline. As indicated earlier, parametric surfaces can be processed by the
tessellation unit. Such surfaces can be used to describe the surfaces of
objects by a few control points. The graphics pipeline generates the
(detailed) triangle meshes to be displayed on this basis. This means that
these tessellated triangle meshes no longer have to be provided by the
graphics application, possibly using external tools, but can be determined
by the GPU. On the other hand, this means a higher computing effort for the
GPU. With this mechanism, it is possible to pass the parameters of the
freeform surfaces often used for modelling surfaces, such as Bézier surfaces
or NURBS (non-uniform rational B-splines) (see Sect. 4.6), to the graphics
pipeline and only then perform the calculation of the triangle meshes.
Furthermore, through a dynamic degree of tessellation, the level of
detail of the triangle meshes that make up the 3D objects can be scaled
depending on the distance to the camera. This can save computational time
for more distant objects without losing detail when the camera is very close
to the object. Such a distance-dependent rendering is called level of detail
(see Sect. 4.5.1) and is often used for landscapes or objects that may be
temporarily in the background. As shown above, a similar realisation is
possible using the geometry shader.
Table 2.3 Shaders in the OpenGL with their typical tasks or with application examples for the shaders (must/optional refers to the core profile)

Vertex shader (must): geometric transformations into clip coordinates: model, view and projection transformations, preparation of subsequent calculations
Tessellation control shader (optional): tessellation control per patch
Tessellation evaluation shader (optional): calculation of the final vertex data after tessellation; if this shader is missing, no tessellation takes place
Geometry shader (optional): dynamic creation and destruction of geometry; application examples: special shadow calculation, reflecting objects, particle systems
Fragment shader (must, see notes in the text): realistic lighting, application of textures, fog calculation
Compute shader (optional): universal computations on the GPU, not directly included in the graphics pipeline

Another application for the tessellation unit is the realisation of displacement mapping. This involves applying a fine texture (the
displacement map) to an object modelled by a coarse polygon mesh,
resulting in a refined geometry (see Sect. 10.2). Furthermore, an iterative
algorithm can be implemented by the tessellation unit together with the
transform feedback, which realises the technique of subdivision surfaces.
Here, a coarse polygon mesh undergoes a refinement process several times
until the desired high resolution is achieved. Details on this technique can
be found, for example, in [3, p. 607ff].
As of OpenGL version 4.3 from 2012, the compute shader can be used.
This shader is intended for performing general-purpose tasks that do not
need to be directly related to computing and rendering computer graphics. It
is not directly integrated into the OpenGL graphics pipelines and therefore
does not appear in the figures of this chapter. Typical applications for
graphics applications using this shader are, for example, calculations
upstream of the graphics pipeline for the animation of water surfaces or
particle systems. Likewise, the compute shader can be used for ray tracing
algorithms to calculate global illumination (see Sect. 9.9). Filter operations
for images or the scaling of images are also possible applications for the
compute shader.
Table 2.3 shows an overview of the shaders that can be used in the
OpenGL with their typical functions or application examples for the
shaders. The shaders are listed in the order in which they are processed in
the OpenGL graphics pipeline. As can be seen from the explanations for
advanced readers at the beginning of this section, the fragment shader is
mandatory according to the OpenGL specification. However, it is optional
from a technical point of view. If this shader is missing, the colour values of
the result of the fragment processing are undefined in any case.
Due to the increasing possibility of flexible programming of GPUs
through shaders, the use of graphics cards for tasks outside computer
graphics has become very interesting in recent years. The speed advantages
of GPUs over standard processors used as central processing units (CPUs)
essentially result from the higher degree of parallelism for the execution of
certain uniform operations in the graphics processing domain (for example,
matrix operations). This area is known as general-purpose computing on
graphics processing units (GPGPU). One way to use the different
processors in computers is the Open Computing Language (OpenCL),
which is specified as an open standard like the OpenGL by the Khronos
Group. However, an in-depth discussion of this topic would go beyond the
scope of this book.

2.7 OpenGL Programming with JOGL


This section explains the basic mechanisms for graphics programming
using the OpenGL binding Java OpenGL (JOGL). Sections 2.8 and 2.10
contain simple examples that illustrate how a JOGL renderer works. The
URL https://jogamp.org contains the Java archives for installing JOGL and
further information, such as tutorials and programming examples.

Fig. 2.4 Integrating a graphics application renderer into the JOGL architecture

The main task in graphics programming using the OpenGL binding JOGL is to program a renderer that forwards the drawing commands to the
OpenGL for drawing on the GPU. The integration of such a renderer into
the JOGL architecture and the connection to a JOGL window (GLWindow)
is shown in Fig. 2.4 using a UML class diagram.5 To be able to react to
OpenGL events, the renderer must implement the methods
init, display, reshape and dispose defined by the interface
GLEventListener. The interface GLAutoDrawable calls these
methods when certain events occur. To receive these calls, an object of the
renderer must be registered as an observer of an object that implements the
GLAutoDrawable interface and that is to be observed accordingly. In the
figure, the class GLWindow implements this interface.6
By using an object of the JOGL class GLWindow, a window (of the
JOGL system) is created that can be used with Java on a Windows
operating system, for example. See https://jogamp.org for instructions on how to integrate a GLWindow object into other windowing systems, such as the Abstract Window Toolkit (AWT) or Swing. This allows the connection to a Frame object or a JFrame object as an alternative output
window.
Figure 2.4 shows that in the main method of the Main class a
GLWindow object and a Renderer object are created. Subsequently, the
addGLEventListener method is used to register the Renderer
object as an observer of the GLWindow object.
To realise a graphic animation, the display method of the
GLWindow object is usually called regularly by an animation object. This
is done, for example, 60 times per second in order to achieve a smooth
rendering of the (moving) graphic content. After a call of this method,
through the GLAutoDrawable and GLEventListener interfaces, the
display methods of all registered Renderer objects are called, to
trigger drawing of the OpenGL content by the Renderer objects. In the
figure, there is only one Renderer object, which represents the simplest
case.
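
With the JOGL utility class FPSAnimator, such an animation object can be set up in a few lines. A sketch (assuming a GLWindow object named window has already been created):

```java
import com.jogamp.opengl.util.FPSAnimator;

// Calls window.display() about 60 times per second until stop() is invoked.
FPSAnimator animator = new FPSAnimator(window, 60);
animator.start();
```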
Calling the other methods of the interface GLEventListener works
according to the same mechanism as for the display method. The init
method is called when the OpenGL context is created. This is the case after
starting the program. The method reshape is called when the size of the
window to which the Renderer object is connected changes. This occurs,
for example, when the user enlarges the window with the mouse. The
method dispose is called just before the OpenGL context is destroyed,
for example, by the user exiting the program. This method should contain
source code that deallocates resources that were reserved on the GPU.
Figure 2.5 shows the basic structure of a JOGL renderer with
implemented but empty methods of the GLEventListener interface.

Fig. 2.5 Basic structure of a JOGL renderer (Java)
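A minimal skeleton of such a renderer could look as follows (a sketch; only the empty methods of the GLEventListener interface are shown):

    import com.jogamp.opengl.GLAutoDrawable;
    import com.jogamp.opengl.GLEventListener;

    public class Renderer implements GLEventListener {
        @Override
        public void init(GLAutoDrawable drawable) { }    // called once after the OpenGL context is created

        @Override
        public void display(GLAutoDrawable drawable) { } // called for every frame to be rendered

        @Override
        public void reshape(GLAutoDrawable drawable, int x, int y,
                            int width, int height) { }   // called when the window size changes

        @Override
        public void dispose(GLAutoDrawable drawable) { } // called before the OpenGL context is destroyed
    }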


As explained in this section, the methods of the renderer are invoked
through the GLAutoDrawable interface, which is implemented and used
by a GLWindow object. The creation of objects of class GLWindow takes
place using a parameter of type GLCapabilities, by which the
OpenGL profile used (see Sect. 2.4) and the OpenGL version are specified.
Figure 2.6 shows the source code lines to set the OpenGL profile and the
OpenGL version and to create a GLWindow object. A Renderer object is
then created and registered as the observer of the GLWindow object. In the
example, the JOGL profile GL2 is selected, which activates the OpenGL
compatibility profile and supports all methods of OpenGL versions 1.0 to
3.0.
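Put together, the corresponding source code lines might look roughly as follows (a minimal sketch; the window title, the window size and the name of the renderer class are assumptions):

    GLProfile profile = GLProfile.get(GLProfile.GL2);
    GLCapabilities capabilities = new GLCapabilities(profile);
    GLWindow window = GLWindow.create(capabilities);    // JOGL (NEWT) window
    window.setTitle("JOGL example");                    // illustrative title
    Renderer renderer = new Renderer();                 // the application's GLEventListener
    window.addGLEventListener(renderer);                // register the renderer as observer
    window.setSize(640, 480);
    window.setVisible(true);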
Table 2.4 provides an overview of selected profiles for applications for
desktop computers and notebooks. There are also profiles for OpenGL ES
support and profiles containing subsets of frequently used OpenGL methods
(not shown in the table). In order for the graphics application to run on
many graphics cards, a JOGL profile with a version number as low as
possible should be selected. It should be noted that very efficient graphics
applications can be created with the help of the core profile.

Fig. 2.6 JOGL source code to set an OpenGL profile and to create a window and a renderer (Java)

Table 2.4 JOGL profiles to select the OpenGL profile and OpenGL version: A selection of profiles
for desktop applications is shown

JOGL profile name   Compatibility profile   Core profile   OpenGL versions
GL2                 X                       –              1.0–3.0
GL3bc               X                       –              3.1–3.3
GL4bc               X                       –              4.0–4.5
GL3                 –                       X              3.1–3.3
GL4                 –                       X              4.0–4.5

2.8 Example of a JOGL Program Without Shaders
In this section, the basic principles of programming a renderer with the
OpenGL binding JOGL are illustrated by means of an example. To simplify
matters, this example uses functions of the fixed-function pipeline and not
yet shaders. Building on this, the following sections of this chapter explain
how to use the programmable pipeline, including shaders.
For this purpose, consider the Java source code of the init method in
Fig. 2.7 and the display method in Fig. 2.8, through which the triangle
shown in Fig. 2.9 is drawn. The first line of the init method or the
display method stores in the variable gl the OpenGL object passed by
the parameter object drawable. The object in this variable gl is used to
draw the graphics content using OpenGL. This gl object plays a central
role in JOGL, as all subsequent OpenGL commands are executed using this
object (calls of methods of the gl object). These OpenGL commands and
their parameters are very similar to the OpenGL commands used in
OpenGL programming interfaces for the C programming language.
Therefore, OpenGL source code for the C programming interface can be
easily translated into JOGL source code from relevant OpenGL literature or
web sources. Furthermore, the OpenGL version and the OpenGL profile are
selected in the first line of the renderer methods by choosing the type for
the gl object (see Table 2.4). In this case, it is the compatibility profile up
to OpenGL version 3.0. The selected OpenGL profile must match the
selected OpenGL profile when creating the output window (GLWindow;
see Fig. 2.6). It is worth remembering that the init method is executed
once when the OpenGL program is created, typically shortly after the
application is started. The display method is called for each frame,
typically 60 times per second.

Fig. 2.7 Init method of a JOGL renderer (Java)

Fig. 2.8 Display method of a JOGL renderer (Java)

Fig. 2.9 A triangle drawn with the OpenGL binding JOGL


In the init method (Fig. 2.7), after saving the gl object, an object is
created for accessing the OpenGL Utility Library (GLU), which is later
used in the display method. This library provides useful functions that
are not directly in the OpenGL language scope. The command
glClearColor defines the colour of the background of the drawing area.
The background colour is specified as an RGB colour triple with a
transparency value (alpha value) as the last component. Valid values for the
individual components lie in the interval [0, 1] (see Chap. 6 for
explanations on colour representations). In this example, the non-
transparent colour white is set. Since OpenGL is a state machine, once set,
values and settings are retained until they are overwritten or deleted. In this
example, the background colour is not to be changed by the animation, so it
is sufficient to set this colour once during initialisation.
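A sketch of such an init method could look as follows (the field glu is assumed to be declared in the renderer class):

    @Override
    public void init(GLAutoDrawable drawable) {
        GL2 gl = drawable.getGL().getGL2();  // compatibility profile up to OpenGL 3.0
        glu = new GLU();                     // access to the OpenGL Utility Library
        gl.glClearColor(1f, 1f, 1f, 1f);     // opaque white background colour
    }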
In the display method (Fig. 2.8), after the gl object has been saved,
the glClear command initialises the colour buffer and the depth buffer to
build up the display for a new frame. The previously defined background
colour for the colour buffer is used.
The following two commands compute the model view matrix (see
Sect. 2.5.1). First, the model view matrix stored in the gl object is reset to
the unit matrix to delete the calculation results from the last frame. This is
followed by the use of the glu object already created in the init method.
By gluLookAt, the view matrix is computed and multiplied by the
existing matrix in the gl object. Here, the first three parameter values
represent the three-dimensional coordinates of the viewer location (eye
point, camera position). The next three parameter values are the three-
dimensional coordinates of the point at which the viewer (or camera) is
looking at. In this example, this is the origin of the coordinate system. The
last three parameter values represent the so-called up-vector, which
indicates the direction upwards at the viewer position. This allows the tilt of
the viewer (or camera) to be determined. In this case, the vector is aligned
along the positive y-axis.
After calculating the view transformation, commands can be executed
to perform a model transformation, such as moving, rotating or scaling the
triangle. In this case, the glRotatef command rotates the triangle by 10
degrees around the z-axis. It should be noted that OpenGL uses a right-
handed coordinate system. If no transformations have taken place, the x-
axis points to the right and the y-axis points upwards in the image plane.
The z-axis points in the direction of the viewer, i.e., it points out of the
image plane. After initialisation (and without any transformation having
been applied), the origin of the coordinate system is displayed in the centre
of the viewport. At the edges of the viewport are the coordinates
$$-1$$ (left and lower edge) and 1 (right and upper edge),
respectively. The axis around which to rotate is specified by the last three
arguments in the glRotatef call. In this case, it is the z-axis. The amount
of rotation in degrees around this axis is determined by the first argument of
this command. This operation multiplies this model transformation with the
model view matrix in the gl object by matrix multiplication and stores the
result in the gl object. Details on geometric transformations are in Chap. 5.
The glColor4f method sets the (foreground) colour for subsequent
drawing commands. Since there are different variants of this command, the
number at the end of the command name indicates how many arguments are
expected. In this case, it is an RGBA colour value. The last letter of the
command name indicates the expected type of arguments. The f in this case
sets the argument type to float so that the colour components can be
specified as floating point values in the interval [0, 1]. In the OpenGL
command syntax, there are a number of command groups that enable
different parameter formats. These are distinguished by the number–letter
combination mentioned above. The glRotate command, for example, is
also available as a double variant and is then called glRotated.
The actual drawing of the triangle is triggered by the glBegin/glEnd
block. The glBegin command specifies the geometric primitives to be
drawn, which are defined by the vertices specified within the block. These
are (separate) triangles in this example. The glVertex3f commands each
define the three-dimensional position coordinates of the individual
vertices. In this case, these are the vertices of the triangle to be drawn.
Section 3.2 provides explanations of the geometric primitives in the
OpenGL. Section 3.3 contains details on the possible drawing commands in
the OpenGL.
Since the geometric transformations involve matrices and matrix
operations in which the new matrix (in the program code) is the right matrix
in the matrix multiplication operation,7 the order of execution of the
transformations is logically reversed (see Sect. 5.1.1). In the above
example, the triangle is logically drawn first and then the model
transformation and the view transformation are applied. In order to
understand these steps, the program code must be read from bottom to top.
The full project JoglBasicStartCodeFFP for this example can be
found in the supplementary material to the online version of this chapter.
The source code includes implementations of the reshape and dispose
methods and the control of the renderer by an animation object, which are
not shown here. If fixed-function pipeline commands are used, then the
dispose method can often be left empty. Furthermore, the supplementary
material to the online version of this chapter contains the project
JoglStartCodeFFP, which enables the interactive positioning of the
camera using the mouse and keyboard and is well suited as a starting point
for own JOGL developments.
It should be noted that the example in this section uses commands of the
fixed-function pipeline that are only available in the compatibility profile.
These include in particular the drawing commands using the
glBegin/glEnd block and the GLU library. Both are no longer available
in the core profile.

2.9 Programming Shaders


Special high-level programming languages have been developed for
programming shaders, which are very similar to high-level programming
languages for developing CPU programs, but take into account the specifics
of a graphics pipeline. For example, Apple and Microsoft have developed
the Metal Shading Language (MSL) and the High-Level Shading Language
(HLSL) for their 3D graphics or multimedia programming interfaces Metal
and DirectX, respectively. The high-level shader programming language C
for Graphics (Cg) was designed by Nvidia for the creation of shader
programs for DirectX or OpenGL. Since 2012, however, Cg has no longer
been developed. For programming OpenGL shaders described in Sect. 2.6,
the Khronos Group developed the OpenGL Shading Language (GLSL) (see
[4]), whose syntax is based on the syntax of the C programming language.

Fig. 2.10 Data flow in the OpenGL programmable graphics pipeline: The geometry and tessellation
shaders and the feedbacks to the application are not shown
2.9.1 Data Flow in the Programmable Pipeline
Figure 2.10 shows in an abstract representation the data flow from the
graphics application (left side) to the programmable OpenGL graphics
pipeline (right side). For simplification, the possible data feedback from the
graphics pipeline to the application and the optional shaders (geometry and
the two tessellation shaders) are not shown. Furthermore, the non-
programmable blocks of the graphics pipeline between the two shaders
have been combined into one function block.
During a typical graphics animation, a set of vertices, usually storing
position coordinates in three-dimensional space (3D coordinates), colour
values, normal vectors and texture coordinates, is continuously passed to
the vertex shader. This vertex data is passed via user-defined variables,
which are defined in the vertex shader (user-defined in variables). These
variables are also referred to as attributes in the case of the vertex shader.
This logical view of data transfer is intended to increase the understanding
of how the graphics pipeline works. In fact, in a concrete OpenGL
implementation, the vertex data is usually buffered and transferred in one
block to increase efficiency (see Sects. 2.10 and 2.11).
Furthermore, both the vertex and fragment shaders receive data via so-
called uniforms. These are another type of variable that can be defined in
the respective shader and are independent of a single vertex or fragment.
Typically, transformation matrices and illumination data are passed to
shaders via uniforms. A third data path can be used to pass textures (texture
data, texel data) to a texture storage from which the shaders can read. The
texture storage is not shown in Fig. 2.10 for simplicity.
Conceptually, a vertex shader is called exactly once for each input
vertex. However, optimisations are possible in the respective
implementation of the pipeline, which can lead to fewer calls. In addition,
parallelising the execution of the vertex shader is allowed for processing
multiple vertices at the same time. The result of processing a vertex by the
vertex shader can be passed on to the next non-programmable function
block of the graphics pipeline via user-defined out variables or special out
variables. An important predefined out variable is, for example,
gl_Position, through which the newly calculated position coordinate of
the vertex is passed on.
An essential non-programmable function block between the vertex and
fragment shaders is rasterisation. In rasterisation, fragments are created
from vertices by converting the vector graphic into a raster graphic.
Usually, several fragments are created from a few vertices (see Sect. 2.5.4
and Chap. 7). For this conversion, it can be specified whether the values of
the output variables of the vertex shader (out variables) are taken as values
for the input variables (in variables) of the fragment shader and copied into
several fragments by flat shading (see Sect. 2.5.2) or whether the values for
the newly created fragments are determined by linear interpolation.
The names of the user-defined input variables of the fragment shader
must be the same as the user-defined output variables of the vertex shader
in order to enable a mapping of the rasterised fragment data to the vertex
data. For example, a predefined input variable for the fragment shader is
gl_FragCoord, which contains the position coordinate of the fragment
in question and was determined by rasterisation. Based on the input
variables, the uniforms and the texture data, the fragment shader typically
performs the calculations for the illumination of the scene or the mapping
of textures onto objects (texture mapping). The calculation results are
passed on to the next stage of the graphics pipeline via user-defined and
predefined output variables. For example, gl_FragColor determines the
colour for further processing. However, this predefined variable is only
available in the compatibility profile. User-defined variables can be used to
pass the colour values to the subsequent stages in the core profile, which are
mapped to the input of a specific output channel via a special mechanism
(via output layout qualifiers). In case only one output variable is defined, it
is mapped by default to channel 0 (layout (location = 0)), which
in turn allows output to the framebuffer. Another predefined output variable
of the fragment shader, which is also defined in the core profile, is
gl_FragDepth. This allows the depth value of a fragment in the
fragment shader to be changed and passed on to the next level.
Predefined and user-defined input variables (in variables) can vary per
vertex or fragment. In the vertex shader, these variables are used to pass
geometry data. In contrast, uniforms are usually constant for each geometric
primitive. A typical use for uniforms is to pass transformation matrices and
illumination data to the shaders. The third type of data are textures (texture
data), which are passed to the shaders via a texture storage. The texture
storage can also be used to record the results of the fragment calculation
and to reuse them accordingly. In programming terms, textures are special
uniforms. For example, sampler2D denotes the type of a two-dimensional
texture.

2.9.2 OpenGL and GLSL Versions


Since OpenGL is only a specification of a graphics interface and is
constantly being extended, not every GPU and every graphics card driver
will support the latest complete range of functions. This will be particularly
noticeable when using older GPUs. If a professional graphics application is
to function efficiently on a large number of computers with GPU support, it
must react to this situation and, if possible, also be able to get by with a
smaller OpenGL and GLSL feature set. For this purpose, the JOGL
command glGetString reads out properties of the GPU, especially the
version number of the supported OpenGL version. Depending on the range
of functions found, alternative implementations of the graphics application
or alternative shader code may have to be used. The specifications of the
GLSL are subject to version numbering just like those of the OpenGL.
However, it is only since OpenGL version 3.3 that the numbers of the
OpenGL versions match those of the GLSL versions.
The language range to be used in a GLSL shader program can be
specified by a so-called preprocessor directive in the first non-commented
line of a shader program. For example, the following directive specifies
GLSL version 4.5.
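    #version 450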

Such a shader program cannot be executed on older graphics cards without
the support of this version. Without specifying the profile, the core profile
is set. However, the profile can be specified explicitly, as the following
example shows.
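    #version 450 core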

The following directive switches to the compatibility profile.
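    #version 450 compatibility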

2.9.3 OpenGL Extensions


The OpenGL offers a special mechanism for extending the range of
functions. This enables, for example, manufacturers of graphics processors
to implement so-called OpenGL Extensions. In order for these extensions to
be widely used, the Khronos Group provides web pages with an Extensions
Registry,8 which contains a list of known OpenGL Extensions. Whether a
certain extension is available can be queried in the OpenGL application.
Only if an extension is available, the corresponding OpenGL commands
can be used.9
For example, the following JOGL command checks whether the
extension for vertex buffer objects is supported by the graphics processor
driver currently in use.
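Such a check can be written, for instance, with the JOGL method isExtensionAvailable (a sketch; the variable name is an assumption):

    boolean vboSupported = gl.isExtensionAvailable("GL_ARB_vertex_buffer_object");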

The functionality of vertex buffer objects is explained in Sect. 2.10. Vertex buffer objects were only available as an extension in the past, but were
included as part of the interface in the OpenGL specification as of version
1.5. In older versions, the extension can still be used. This procedure allows
enhancements to the scope of the OpenGL graphics interface to be tested
extensively before they are included in the specification.
In the shading language GLSL, extensions must be explicitly enabled.
For example, the following preprocessor directive enables an extension for
clipping.
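The general form of such a directive is shown below; the extension named here is only chosen for illustration:

    #extension GL_ARB_cull_distance : enable   // extension name chosen for illustration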

The following directive activates all available extensions.
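    #extension all : warn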

If all extensions are activated in this way, a warning is given each time an
extension is used. Directives for the activation of extensions must follow
the version directive in the shader program, but must be placed before the
shader source code.

2.9.4 Functions of the GLSL


The shader language GLSL is based on the syntax of the programming
language C and has similar data types. In addition, useful data types for
graphics programming are available, such as two-, three- and four-
dimensional vectors. Furthermore, matrices with different numbers of
columns and rows from two to four are available. Thus, the smallest matrix
provided is a $$2\times 2$$ matrix and the largest matrix provided is a
$$4\times 4$$ matrix. In addition, the corresponding vector–matrix and
matrix–matrix operations are supported, which can be executed efficiently
with hardware support depending on the GPU and driver implementation.
These include operations for transposition, inversion and calculation of the
determinant of matrices. Furthermore, special data types are available for
accessing one-, two- and three-dimensional textures. The values of
trigonometric, exponential and logarithmic functions can also be calculated.
In the fragment shader, partial derivatives of functions can be determined
for certain special cases. In contrast to the programming language C, no
special libraries have to be integrated for these functionalities. Furthermore,
the concept of overloading is present, whereby correspondingly suitable
functions can exist under the same function name for different numbers and
types of parameters. Overloading has been used for many predefined
functions and can also be used for self-defined functions.
A useful function of the GLSL language is the so-called swizzling,
which allows to read individual vector components. The following three
swizzling sets are available for addressing vector components.
Components of location coordinates: x, y, z, w.
Colour components: r, g, b, a.
Components of texture coordinates: s, t, p, q.
The following example shows the effect of accessing vector elements.
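A sketch of such an example (the vector and its values are assumptions) could be:

    vec4 pos = vec4(1.0, 2.0, 3.0, 4.0);
    float x  = pos.x;      // x  = 1.0
    vec3  p1 = pos.xyz;    // p1 = (1.0, 2.0, 3.0)
    vec2  p2 = pos.wy;     // p2 = (4.0, 2.0); components may be reordered or repeated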

With the help of the so-called write masking, the vector components can be
changed individually. The following example shows the effect of this
operation.
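A corresponding sketch for write masking (again with assumed values):

    vec4 col = vec4(0.1, 0.2, 0.3, 1.0);
    col.r  = 0.5;               // col = (0.5, 0.2, 0.3, 1.0)
    col.ba = vec2(0.9, 0.8);    // col = (0.5, 0.2, 0.9, 0.8)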

The GLSL specification [4] and the OpenGL Programming Guide [8]
provide a comprehensive definition and description of the OpenGL Shading
Language. This includes a listing of the possible predefined variables
(special input and output variables). A very condensed introduction is
available in [6, pp. 59–71]. See [3, pp. 927–944] for examples of GLSL
shaders. There is additional material on the web for the OpenGL
SuperBible [12], including examples of GLSL shaders.

2.9.5 Building a GLSL Shader Program


The OpenGL Shading Language (GLSL) is a high-level programming
language whose source code must be compiled and bound in a manner
similar to the C and C++ programming languages. The tools for this are in
the driver for the graphics processor. In Fig. 2.11, the typical sequence of
commands for creating a shader program consisting of a vertex shader and a
fragment shader is shown as a UML sequence diagram10 from the
perspective of a Java application.11 Here, (the references to) the shaders and
the shader program are understood as objects. The parameter lists of the
messages (which correspond to method calls in Java) are greatly simplified
or not specified for the sake of a clear presentation. The methods shown
must be called by the graphics application, for example, by the init
method of the OpenGL renderer object (:Renderer).

Fig. 2.11 UML sequence diagram for the generation of a shader program

For each shader to be compiled, a shader object is first created. This object can be referenced by an integer value (type int). Afterwards, the
command glShaderSource assigns the shader source code string to a
shader object, which is then compiled by glCompileShader. When the
source code for all shaders has been compiled, a shader program object is
created, which can again be referenced by an integer value. After the shader
objects have been assigned to the program by glAttachShader, the
translated shaders are linked to an executable shader program by
glLinkProgram.
After this last step, the assignments of the shader objects to the shader
program object can be removed again and the memory occupied by this can
be released. This is done with the commands glDetachShader and
glDeleteShader. If changes to the executable shader program are
planned in the further course and rebinding becomes necessary, then these
objects can be retained, modified if necessary and used for the renewed
creation of an executable program. To use the executable shader program, it
must be activated by glUseProgram. Figure 2.11 shows an example of
the integration of two shaders. With the help of this mechanism, shader
programs can be created that consist of all available shader types.
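In JOGL, this command sequence might be written roughly as follows (a sketch; the strings vertexShaderSource and fragmentShaderSource are assumed to contain the GLSL source code, and error checking is omitted):

    int vertexShader = gl.glCreateShader(GL3.GL_VERTEX_SHADER);
    gl.glShaderSource(vertexShader, 1, new String[] { vertexShaderSource },
                      new int[] { vertexShaderSource.length() }, 0);
    gl.glCompileShader(vertexShader);

    int fragmentShader = gl.glCreateShader(GL3.GL_FRAGMENT_SHADER);
    gl.glShaderSource(fragmentShader, 1, new String[] { fragmentShaderSource },
                      new int[] { fragmentShaderSource.length() }, 0);
    gl.glCompileShader(fragmentShader);

    int shaderProgram = gl.glCreateProgram();
    gl.glAttachShader(shaderProgram, vertexShader);
    gl.glAttachShader(shaderProgram, fragmentShader);
    gl.glLinkProgram(shaderProgram);

    // after linking, the shader objects are no longer needed
    gl.glDetachShader(shaderProgram, vertexShader);
    gl.glDeleteShader(vertexShader);
    gl.glDetachShader(shaderProgram, fragmentShader);
    gl.glDeleteShader(fragmentShader);

    gl.glUseProgram(shaderProgram);   // activate the executable shader program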
Conceptually, the generation of the shader objects and the shader
program takes place on the GPU. Due to the possibility of different
OpenGL implementations, it is not determined or predictable whether, for
example, the shader compilation and binding to an executable shader
program takes place entirely on the CPU and only the result is transferred to
the GPU or whether a transfer to the GPU already takes place earlier.
Since different objects of the 3D scene are usually rendered using
different shaders, several shader programs can be created, each of which is
activated by glUseProgram before the object in question is rendered.12 If
different objects use different shaders, glUseProgram is called multiple
times per frame (in the display method). The remaining commands
shown in Fig. 2.11 only need to be called once in the init method if no
shader programs need to be changed or created during the rendering
process.
Another way of including shaders is to use the Standard Portable
Intermediate Representation (SPIR). SPIR was originally developed for
the Open Computing Language (OpenCL), which enables parallel
programming of the different processors of a computer system. The SPIR-V
version is part of the newer OpenCL specifications and part of the Vulkan
graphics programming interface. Through an OpenGL 4.5 extension, SPIR-
V can also be used in OpenGL programs.13
With suitable tools, shaders can in principle be developed in any
shading language and then compiled into the intermediate language SPIR-V.
The SPIR-V binary code is not easily readable by humans. This allows
shaders to be compiled into SPIR-V binary code, which can then be shared
with other developers without having to expose development concepts or
programming details through open-source code.
The integration of shaders into SPIR-V binary code is very similar to
the process shown in Fig. 2.11. Instead of assigning a GLSL string to a
shader object and then compiling it, the pre-compiled SPIR-V binary code,
which can also contain multiple shaders, can be assigned to shader objects
with glShaderBinary. Subsequently, the entry points into the binary
code are defined by the glSpecializeShader command. The direct
mixing of GLSL shaders and SPIR-V shaders in a shader program is only
possible if GLSL shaders have previously been converted to SPIR-V binary
code using external tools.

2.10 Example of a JOGL Program Using GLSL Shaders
Building on the previous sections and the example from Sect. 2.8, which
uses the fixed-function pipeline, the use of the programmable pipeline is
explained below. Only commands of the core profile are used and shaders
are integrated that are programmed in the shading language GLSL. To
illustrate the basic mechanisms, this program again draws a simple triangle
(see Fig. 2.16). The full Java program JoglBasicStartCodePP is
available as supplementary material to the online version of this chapter.

Fig. 2.12 Init method of a JOGL renderer using only the core profile (Java)

Figure 2.12 shows the init method of the renderer. First, the OpenGL
object is stored in the variable gl. Since this is of type GL3, only
commands of the core profile are available (see Table 2.4). A vertex shader
and a fragment shader must be included when using this profile; otherwise,
the result of the program execution is undefined. The next step is the
creation of an object for the use of a JOGL class for the calculation of
transformation and projection matrices. This class PMVMatrix can be
used very well together with the core profile. The class GLU is not available
for the core profile. The source code of the vertex and fragment shaders is
loaded by the following two commands and used to create an executable
shader program. The class ShaderProgram14 is used, which applies the
command sequence shown in Fig. 2.11.
By default, the drawing of 3D objects of the scene in the core profile
takes place via buffers (more precisely via vertex buffer objects (VBO)),
which are created in the init method on the GPU and filled with object
data. Drawing from such a buffer object takes place independently on the
GPU when a draw command is called (usually) in the display method.
Since a large part of the object data of a 3D scene often changes little or not
at all, little or no data needs to be retransmitted to the GPU. This makes this
approach very efficient for drawing. For this reason, the drawing method of
the compatibility profile used in the example from Sect. 2.8 is no longer
available in the core profile.
In the source code in Fig. 2.12, a Java array is stored in the variable
vertices, which contains the vertex data for the triangle to be drawn.
The three-dimensional position coordinates and the three-dimensional
colour values in the RGB colour space are arranged alternately. The data for
a vertex thus consist of six consecutive floating point values, which usually
come from the interval [0, 1] for the position coordinates and colour values.
This interleaved arrangement of the data means that only one buffer is
needed for all vertices. Alternatively, two separate buffers can be used for
the position coordinates and the colour data. The processing of the vertex
data, in this case called vertex attributes, takes place during drawing by the
vertex shader.
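Such an interleaved array could, for instance, be laid out as follows (the concrete coordinates and colour values are assumptions):

    float[] vertices = {
        //  x      y     z      r     g     b
         0.0f,  0.5f, 0.0f,  0.0f, 0.6f, 0.8f,   // vertex 0
        -0.5f, -0.5f, 0.0f,  0.0f, 0.6f, 0.8f,   // vertex 1
         0.5f, -0.5f, 0.0f,  0.0f, 0.6f, 0.8f    // vertex 2
    };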
Since the use of buffer objects on the GPU leads to very efficient
graphics programs, OpenGL has a large number of such buffer objects that
are suitable for different purposes. By creating a buffer object,
(conceptually) memory is reserved on the GPU that can be filled for the
corresponding purpose. In this example, vertex buffer objects are used for
drawing. To use this buffer type, a vertex array object (VAO) must also be
created and used.
Vertex array objects (VAOs) are buffers on the GPU that store all the
states for the definition of vertex data. Both the format of the data and the
references to the necessary buffers (for example, to—possibly several—
vertex buffer objects) are stored.15 Here, the data is not copied, but only
references to the relevant buffers are stored. Using the concept of vertex
array objects, a single glBindVertexArray command can thus switch
the VAO, making all (previously prepared) necessary buffers for a particular
3D object to be drawn accessible and drawable by a drawing command.
This is particularly helpful when many different 3D objects need to be
drawn in a scene, which is almost always the case. The concept of vertex
array objects makes it very clear that OpenGL is designed as a state
machine. Once a state is set, it is maintained until it is explicitly changed.
When a VAO is activated, the associated context is active until another
VAO is activated.
In the example considered here, the use of vertex array objects does not
seem to be necessary, since only a single triangle is drawn and thus
switching between several objects is not required. In order to be able to use
vertex buffer objects, however, at least one vertex array object16 should
always be present. As can be seen in the example in Fig. 2.12, the variable
vaoHandle is defined to store the names (references) of the vertex array
objects to be reserved on the GPU. The names of the buffers on the GPU
are integers, so the data type for the Java array is int[]. Then
glGenVertexArrays reserves exactly one vertex array object on the
GPU. If the returned name (integer value) is less than one, an error occurred
during creation. For example, there might not have been enough memory
available on the GPU. Subsequently, glBindVertexArray activates
this (newly created) VAO.
As can be seen from Fig. 2.12, glGenBuffers reserves exactly one
more buffer object on the GPU in which the vertex data for the triangle to
be drawn is to be stored. After successful creation, glBindBuffer and
the argument GL_ARRAY_BUFFER set this buffer to a vertex buffer object
in which the vertex attributes to be processed by the vertex shader can be
stored. Subsequently, the data from the Java array vertices prepared
above is transferred to the GPU via the glBufferData command into the
previously reserved vertex buffer object. The argument
GL_STATIC_DRAW indicates that the vertex data is expected to be
transferred to the GPU only once and read frequently (for rendering). The
specification of other types of usage is possible. For example,
GL_DYNAMIC_DRAW specifies that the data is frequently modified in the
buffer and frequently read. These arguments serve as a hint for the OpenGL
implementation to apply optimised algorithms for efficient drawing, but are
not mandatory determinations of the use of the data.
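The buffer setup described so far might look roughly like this in JOGL (a sketch; variable names are assumptions, and the helper class Buffers from com.jogamp.common.nio is used to wrap the Java array):

    // create and activate one vertex array object (VAO)
    int[] vaoHandle = new int[1];
    gl.glGenVertexArrays(1, vaoHandle, 0);
    gl.glBindVertexArray(vaoHandle[0]);

    // create one vertex buffer object (VBO) and transfer the vertex data
    int[] vboHandle = new int[1];
    gl.glGenBuffers(1, vboHandle, 0);
    gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vboHandle[0]);
    FloatBuffer vertexBuffer = Buffers.newDirectFloatBuffer(vertices);
    gl.glBufferData(GL.GL_ARRAY_BUFFER, (long) vertices.length * Float.BYTES,
                    vertexBuffer, GL.GL_STATIC_DRAW);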

Fig. 2.13 Source code of a simple vertex shader (GLSL)

After the data has been transferred to the GPU, the following four
OpenGL commands establish the connection between the (currently active)
vertex buffer object and the vertex shader so that the latter can (later)
process the correct vertex data from the correct buffer on the GPU. By
glEnableVertexAttribArray(0), the so-called vertex attribute
array with the name 0 is activated. This corresponds to the layout position 0
(layout (location = 0) ...) defined in the vertex shader (see
Fig. 2.13). This layout position is linked in the shader to the input variable
vPosition, which holds the position coordinates for the respective
vertex. The OpenGL command glVertexAttribPointer (see Fig.
2.12) maps this layout position to the vertex attributes in the VBO. In this
example, the arguments have the following meaning:
0: Link the position data in the VBO to the vertex attribute array 0
(layout position 0) of the vertex shader.
3: A vertex vector element consists of three components, namely the
three-dimensional position coordinates of the currently active VBO.
GL_FLOAT: The data is stored in float format.
false: No normalisation of data is required.
6 * Float.BYTES: The distance to the next record of
position coordinates in bytes is six times the number of bytes occupied
by a float value. Since three colour values are stored between each of
the three position coordinates (see the variable vertices above), the
distance between the first components of the position coordinates is six.
This value is to be multiplied by the number of bytes for a floating point
value. This is four in Java and is supplied by Float.BYTES.
0: The offset from the first entry of the buffer under which the first
component of the position coordinate is stored is 0. In this case, the first
entry (index 0) of the buffer contains the first component of the position
coordinates.
Subsequently, glEnableVertexAttribArray(1) activates the
vertex attribute array with the name 1. This corresponds to layout location 1
(layout (location = 1) ...) defined in the vertex shader (see
Fig. 2.13). This layout position is connected to the input variable vColor
in the shader and takes the colour values of the respective vertex. The
OpenGL command glVertexAttribPointer (see Fig. 2.12) maps
the layout position 1 to the colour data in the VBO. In this example, the
arguments have the same values as for the position coordinates, except for
the last argument. Since the first component of the colour values is stored in
the VBO only after the first three position coordinates (see above the
variable vertices), an offset of three (3) must be specified as the last
argument for the glVertexAttribPointer call. This is again
multiplied by the number of bytes for a float value in Java, since the
distance between the buffer entries must be specified in bytes.
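These connections between the buffer layout and the two vertex attribute arrays could be established as follows (sketch):

    // layout position 0: three position coordinates per vertex
    gl.glEnableVertexAttribArray(0);
    gl.glVertexAttribPointer(0, 3, GL.GL_FLOAT, false, 6 * Float.BYTES, 0);
    // layout position 1: three colour components per vertex, stored after the position data
    gl.glEnableVertexAttribArray(1);
    gl.glVertexAttribPointer(1, 3, GL.GL_FLOAT, false, 6 * Float.BYTES, 3 * Float.BYTES);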
At this point, it should be noted that the input variables vPosition
and vColor of the vertex shader (see Fig. 2.13) are defined in the GLSL
as three-dimensional vector types consisting of float components. This
format must naturally match the format specification in the OpenGL
application, which is the case in this example.
In the GLSL, vec*, dvec*, ivec* and uvec* vector types with
float, double, integer and unsigned integer components are
available. Here, the values 2, 3 or 4 can be used for the * character, so that
vectors with two, three or four components can be used. As the last step of
the init method, a white background colour is defined by
glClearColor.
Figure 2.14 shows the display method of the OpenGL renderer for
this example. The first five commands are quite similar to the commands in
the example in Fig. 2.8, where the fixed-function pipeline is used. First, the
OpenGL object used for drawing is stored in the local variable gl. Then the
command glClear clears the colour buffer and the depth buffer.

Fig. 2.14 Display method of a JOGL renderer using only the core profile (Java)

Using the following two Java methods of the JOGL class PMVMatrix,
the model view matrix (see Sect. 2.5.1) is calculated. In contrast to the
example in Sect. 2.8, no transformation matrix is stored in the gl object.
The calculation takes place completely separately from the OpenGL by the
CPU. Only after a final transformation matrix has been determined, it is
passed on to the GPU. First, the model view matrix is reset to the unit
matrix in order to delete the calculation results from the last frame. By
gluLookAt, the matrix for the view matrix is calculated and multiplied by
the existing matrix in the pmvMatrix object. As explained above, the first
three parameter values represent the three-dimensional coordinates of the
viewer location (eye point, camera position). The next three parameter
values are the three-dimensional coordinates of the point at which the
viewer is looking at. In this example, this is the origin of the coordinate
system. The last three parameter values represent the up-vector, which
indicates the direction upwards at the viewer position. In this case, the
vector is aligned along the positive y-axis.
After calculating the view transformation, commands can be executed
to perform a model transformation, for example, the scene could be moved
(translated), rotated or scaled. In this case, the glRotatef command is
used to rotate the triangle 15 degrees around the z-axis.
The glBindVertexArray command switches to the vertex array
object (VAO) that was specified by the name passed. As already explained,
this establishes the necessary state for drawing a specific 3D object. In this
case, the vertex buffer object (VBO) transferred in the init method to the
GPU becomes accessible, which contains the vertices with the position
coordinates and colour values for the triangle to be drawn.
The command glUseProgram activates the corresponding shader
program containing the compiled vertex and fragment shaders. For this
simple example, it would have been sufficient to call this command in the
init method, since no other shader program exists. As a rule, however,
different 3D objects are displayed by different shaders, so that it is usually
necessary to switch between these shader programs within a frame and thus
within the display method.
The two glUniformMatrix4fv commands pass the projection
matrix and the model view matrix previously calculated using the
pmvMatrix object as uniform variables to the OpenGL so that they can be
used in the shaders. For the transfer of uniform variables, a large number of
command variants exist for scalars, vectors and matrices of different
dimensions and different primitive data types (such as int, float or
double). For a complete list of these commands, distinguished by the last
characters (number and letters) of the command name, refer to the GLSL
specification [4] or the OpenGL Programming Guide [5, p. 48]. The
arguments in the first call of glUniformMatrix4fv have the following
meaning in this example:
0: The data is passed via the uniform variable of layout position 0. The
numbering of the layout positions of the uniforms is separate from the
numbering of the layout positions for the vertex attribute arrays (the input
variables of the vertex shader).
1: Exactly one data item (one matrix) is passed.
false: No transposition (swapping of rows and columns) of the matrix
is required.
pmvMatrix.glGetPMatrixf(): The projection matrix read from
the pmvMatrix object is passed.
In the second glUniformMatrix4fv call, the model view matrix is
passed in a similar way. The transformation matrices passed are four-
dimensional, as this has advantages for the calculation of transformations
(see Sect. 5.2).
In this example, the two matrices should (sensibly) only be processed
by the vertex shader. As can be seen in Fig. 2.13, two variables have been
defined for reading the matrices in the vertex shader. The corresponding
layout positions must match the layout positions used in the
glUniformMatrix4fv commands to allow the correct data flow.
For advanced readers, the integer value (reference) of a layout position
defined in the vertex shader by location can be retrieved by the graphics
application using the OpenGL command glGetAttribLocation and
the variable name defined in the shader as argument. Thus, the variable
name must be known in the graphics application instead of the layout
position whose assignment is defined in the shader. A similar mechanism
exists with glGetUniformLocation for the identification of the layout
positions of uniforms. Both commands can be used since OpenGL version
2.0.
The last command in the display method is a drawing command by
which the triangle—from the perspective of the graphics application—is
drawn. For this purpose, the vertex data is read from the vertex buffer
object (VBO). Conceptually, the VBO including the data is already on the
GPU. When exactly the actual transfer of data from the CPU to the GPU
takes place depends on the optimisation decisions made for the respective
driver implementation of the GPU.
The argument GL_TRIANGLES specifies that triangles are to be drawn
from the vertices in the VBO. The second argument specifies the index of
the first vertex in the VBO that is used to draw the first triangle (offset).
This allows multiple objects to be stored in a VBO if required. These
different objects can be referenced accordingly by this parameter. The last
argument determines the number of vertices to be used. Since a triangle is
drawn in this example, three vertices must be used. For example, if there
are six vertices in the buffer and the last argument of the drawing command
is also six, two separate triangles would be drawn. Explanations of drawing
commands in the OpenGL are available in Sect. 3.3.
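Put together, a display method along these lines might look roughly as follows (a sketch; the fields pmvMatrix, vaoHandle and shaderProgram are assumed to have been set up in the init method, and the eye point is an illustrative value):

    @Override
    public void display(GLAutoDrawable drawable) {
        GL3 gl = drawable.getGL().getGL3();
        gl.glClear(GL.GL_COLOR_BUFFER_BIT | GL.GL_DEPTH_BUFFER_BIT);

        // view and model transformation, computed on the CPU;
        // the projection matrix is assumed to have been set in the reshape method
        pmvMatrix.glMatrixMode(PMVMatrix.GL_MODELVIEW);
        pmvMatrix.glLoadIdentity();
        pmvMatrix.gluLookAt(0f, 0f, 2f, 0f, 0f, 0f, 0f, 1f, 0f); // eye point, look-at point, up-vector
        pmvMatrix.glRotatef(15f, 0f, 0f, 1f);                    // 15 degrees around the z-axis

        gl.glBindVertexArray(vaoHandle[0]);   // switch to the VAO of the triangle
        gl.glUseProgram(shaderProgram);       // activate the shader program

        // pass the projection and model view matrices as uniforms (layout positions 0 and 1)
        gl.glUniformMatrix4fv(0, 1, false, pmvMatrix.glGetPMatrixf());
        gl.glUniformMatrix4fv(1, 1, false, pmvMatrix.glGetMvMatrixf());

        gl.glDrawArrays(GL.GL_TRIANGLES, 0, 3);   // draw three vertices as one triangle
    }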
After calling a drawing command, each vertex is processed by the
vertex shader (see Fig. 2.13). As explained above in connection with the
init and display method, the user-defined input variables and the
uniforms are defined at the beginning of the vertex shader source code. The
user-defined output variable is the four-dimensional vector color. The
GLSL source code to be executed in the main method defines the
calculations per vertex. In this case, the multiplication of the projection
matrix with the model view matrix and the position coordinates of the input
vertex takes place. Since the vector for the position coordinates has only
three components, but the matrices are 4x4 matrices, the value 1 is added as
the fourth component of the homogeneous position coordinate. For an
explanation of homogeneous coordinates, see Sect. 5.1.1.
This computation in the shader results in a coordinate transformation
from the world coordinates to the clip coordinates (see Sect. 5.41 for the
detailed calculation steps performed). The result is stored in the predefined
output variable gl_Position for further processing in the subsequent
step of the graphics pipeline. The three-dimensional colour values are also
extended by one component. The extension here is a transparency value
(alpha value), which in this case specifies a non-transparent (opaque)
colour. The four-dimensional result vector is stored in the user-defined
output variable color.
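A vertex shader along these lines could read as follows (a sketch; the GLSL version directive and the uniform variable names are assumptions, and explicit uniform locations require GLSL 4.3 or the corresponding extension):

    #version 430 core

    layout (location = 0) in vec3 vPosition;   // position coordinates from the VBO
    layout (location = 1) in vec3 vColor;      // colour values from the VBO

    layout (location = 0) uniform mat4 pMatrix;   // projection matrix
    layout (location = 1) uniform mat4 mvMatrix;  // model view matrix

    out vec4 color;

    void main() {
        gl_Position = pMatrix * mvMatrix * vec4(vPosition, 1.0);
        color = vec4(vColor, 1.0);   // add the alpha component for an opaque colour
    }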

Fig. 2.15 Source code of a simple fragment shader (GLSL)


As explained in Sect. 2.5, the output data of the vertex shader is
processed by the non-programmable function blocks vertex post-
processing, primitive assembly, rasterisation and early-per-fragment
operations; see also Fig. 2.10. The fragment shader shown in Fig. 2.15 can
access the result of this processing. In particular, it should be noted here that
a conversion of the vector graphic into a raster graphic has taken place
through rasterisation. In this process, fragments were created from the
vertices. Position coordinates and colour values are now assigned to each
fragment. In this example, the colour values were interpolated linearly,
which is not visible in the result due to the identical colour values for all
input vertices. In the fragment shader, which is called for each fragment, the
user-defined input variable color is defined. This must have the same
name as the output variable for the colour values of the vertex shader, so
that the colour values of the fragments created from the vertices can be
processed correctly in the fragment shader. In complex shader programs,
several output variables of the vertex shader can be used, which makes a
correct assignment to the input variables of the fragment shader necessary.
The output variable of the fragment shader is FragColor. In this
simple example, the colour values generated by the fixed-function blocks of
the graphics pipeline before the fragment shader are passed on to the
function blocks after the fragment shader without any processing. In
principle, such a shader can contain very complex calculations, usually for
the illumination of objects. The colour value in FragColor is passed
through a special mapping mechanism to the colour buffer, which contains
the result to be displayed on the screen. Therefore, this output variable can
be given (almost) any user-defined name. It should be noted that the
fragment shader does not use the position coordinates of the fragments.
This could be done by using the predefined input variable
gl_FragCoord, but this is not necessary in this simple example.
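A matching pass-through fragment shader might look like this (sketch):

    #version 430 core

    in vec4 color;        // interpolated colour produced by rasterisation
    out vec4 FragColor;   // mapped by default to colour channel 0

    void main() {
        FragColor = color;
    }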
The calculation results of the fragment shader are passed on to the per-
fragment operations stage so that the drawing result can be written into the
framebuffer as the last step. Figure 2.16 shows the rendering result for
output to the screen via the framebuffer.

Fig. 2.16 A triangle drawn with a JOGL renderer and the programmable pipeline
Fig. 2.17 Comparison of data transfer to the OpenGL when using different drawing methods in
immediate mode and in the core profile

2.11 Efficiency of Different Drawing Methods


Figure 2.17 shows a pseudocode representation of the timing and overhead
of data transfers to the OpenGL (for processing by the GPU) for the
different drawing methods used in the examples from Sects. 2.8 and 2.10. In
the example from Sect. 2.8 (left side of the figure), the vertex data is
transferred by the glBegin/glEnd block in the so-called immediate mode
to the GPU. The data is available to the OpenGL exactly when the data is
needed. Since this takes place in the display method, the transfer of data
in this example is carried out for each rendered image (in this case 60 times
per second17). In the example from Sect. 2.10 (right side in the figure), the
drawing is prepared in the init method by creating and binding buffers on
the GPU. The transfer of the vertices to the OpenGL takes place exactly
once using this method. The actual drawing from the buffer only needs to
be triggered in the display method by a draw command (here by
glDrawArrays). This allows the data once transferred to be reused in
many frames. The figure shows the transfer to the OpenGL interface from
the logical point of view of a Java application. At which point in time the
data is actually transferred to the GPU depends on the specific
implementation of the OpenGL driver and the specific GPU. Even if
optimisations are implemented in the driver for the immediate mode, for
example, by transferring data to the GPU in blocks, the (final) data is only
available when it is needed. Little is known about the structure of the data
and its planned use by the application. In contrast, when drawing in the core
profile, the data is already available very early in buffer objects, ideally
already during the one-time initialisation.
Since typical 3D scenes consist of thousands or millions of vertices and
the largest parts of a scene undergo only a few or only slow changes, this
comparison makes it very clear how efficient the drawing method of the
core profile is. The effort required to manage the buffer objects on the
application side is thus worthwhile.
For the sake of completeness, it should be noted that display lists
and vertex arrays are available in the compatibility profile to draw more
efficiently than with glBegin/glEnd blocks. However, these methods are
no longer available in the core profile, as the functionality is efficiently
covered by the use of buffer objects.

2.12 Exercises
Exercise 2.1 General information about the OpenGL
(a) What is the difference between the OpenGL programming interface specification and an OpenGL implementation?
(b) What are the benefits of an open and standardised interface specification?
(c) What makes an OpenGL implementation work on a computer?
(d) In which programming language is OpenGL specified?
(e) Please explain what JOGL is and what LWJGL is.

Exercise 2.2 OpenGL profiles


Explain the difference between the compatibility profile and the core
profile of the OpenGL. How are the fixed-function pipeline and the
programmable pipeline related to these profiles?

Exercise 2.3 OpenGL pipeline


(a) What is the purpose of vertex processing?
(b) What is the purpose of rasterisation?
(c) What is the purpose of fragment processing?

Exercise 2.4 Programmable OpenGL pipeline


(a) Which parts of the fixed-function pipeline have become programmable in the programmable pipeline?
(b) What exactly is a shader?
(c) Which shaders do you know and which functions can they perform?
(d) Which shaders must always be present in the compatibility profile?
(e) Which shaders must always be present in the core profile?

Exercise 2.5 Explain the difference between a vertex, a fragment and a pixel.

Exercise 2.6 Swizzling


Given the following two vectors defined using GLSL:

Furthermore, the following result variables are given:


    float v1;   vec2 v2;   vec3 v3;   vec4 v4;
Specify the value that is in the variable to the left of the assignment after the
respective GLSL instruction:

(a) v1 = pos.x; (b) v1 = pos.y; (c) v1 = pos.z;
(d) v1 = pos.w; (e) v1 = col.r; (f) v1 = col.g;
(g) v1 = col.b; (h) v1 = col.a; (i) v3 = pos.xyz;
(j) v2 = pos.wx; (k) v3 = pos.xwy; (l) v3 = pos.wyz;
(m) v3 = col.rgr; (n) v3 = col.bgr; (o) v2 = col.ra;
(p) v3 = col.bbr; (q) v3 = pos.stp; (r) v4 = pos.qtst;
(s) v4 = pos.qtsp; (t) v4 = pos.qspw; (u) v3 = pos.xys;
(v) v3 = pos.rbg; (w) v3 = pos.bgq; (x) v3 = pos.zwr;

Exercise 2.7 Write masking


Given the following vector defined using GLSL:
Specify the value that is in the variable pos after the respective GLSL
instruction:

(a) pos.x = 0.5;
(b) pos.y = 0.8;
(c) pos.xw = vec2 (0.1, 0.2);
(d) pos.zw = vec2 (0.1, 0.2);
(e) pos.stp = vec3 (0.1, 0.2, 0.3);
(f) pos.rga = vec3 (0.1, 0.2, 0.3);
(g) pos.xys = vec3 (0.1, 0.2, 0.3);
(h) pos.abb = vec3 (0.1, 0.2, 0.3);

References
1. J. F. Blinn. “Models of light reflection for computer synthesized pictures”. In: Proceedings of the
4th annual conference on Computer graphics and interactive techniques. SIGGRAPH ’77.
ACM, 1977, pp. 192–198.

2. E. Gamma, R. Helm, R. Johnson and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Pearson Education India, 2015.

3. J. F. Hughes, A. van Dam, M. McGuire, D. F. Sklar, J. D. Foley, S. K. Feiner and K. Akeley. Computer Graphics. 3rd edition. Upper Saddle River, NJ [u. a.]: Addison-Wesley, 2014.

4. J. Kessenich, D. Baldwin and R. Rost. The OpenGL Shading Language, Version 4.60.6. 12 Dec
2018. Specification. Retrieved 2 May 2019. The Khronos Group Inc, 2018. URL: https://www.khronos.org/registry/OpenGL/specs/gl/GLSLangSpec.4.60.pdf.

5. J. Kessenich, G. Sellers and D. Shreiner. OpenGL Programming Guide. 9th edition. Boston [u.
a.]: Addison-Wesley, 2017.

6. A. Nischwitz, M. Fischer, P. Haberäcker and G. Socher. Computergrafik. 4th edition. Computergrafik und Bildverarbeitung. Wiesbaden: Springer Vieweg, 2019.

7. B. T. Phong. “Illumination for Computer Generated Pictures”. In: Commun. ACM 18.6 (1975),
pp. 311–317.

8. R. J. Rost and B. Licea-Kane. OpenGL Shading Language. 3rd edition. Upper Saddle River, NJ
[u. a.]: Addison-Wesley, 2010.
9. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Compatibility Profile) - October 22, 2019). Retrieved 8 February 2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.compatibility.pdf.

10. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Core
Profile) - October 22, 2019). Retrieved 8 February 2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf.

11. M. Seidl, M. Scholz, C. Huemer and Gerti Kappel. UML @ Classroom: An Introduction to
Object-Oriented Modeling. Heidelberg: Springer, 2015.

12. G. Sellers, S. Wright and N. Haemel. OpenGL SuperBible. 7th edition. New York: Addison-
Wesley, 2016.

Footnotes
1 Since drivers for graphics processors only support OpenGL up to a certain version or a limited
number of OpenGL extensions, a proportion of hardware-dependent programming code may still be
necessary. Depending on the specific application, a large part of the graphics application will still be
programmable independently of GPUs using OpenGL.

2 In the OpenGL specification, a regular (uniform) grid consisting of square elements is assumed for
simplicity. In special OpenGL implementations, the elements may have other shapes.

3 For details, see, for example, https://www.khronos.org/opengl/wiki/Early_Fragment_Test.

4 See https://www.khronos.org/opengl/wiki/Fragment_Shader for details.

5 Introductions to Unified Modelling Language (UML) class diagrams can be found, for example, in
[11].

6 This mechanism is based on the observer design pattern, also called listener pattern. Details of this
design pattern are, for example, in [2].

7 Please remember that matrix multiplication is not commutative, i.e., the order of the operands must not be reversed.
8 References to the OpenGL Extensions can be found at the web address https://www.khronos.org/registry/OpenGL/.

9 So-called extension viewers show which OpenGL versions and which extensions are available on
a particular computer system. The company Realtech VR, for example, provides such software at
https://realtech-vr.com.

10 For introductions to Unified Modelling Language (UML) sequence diagrams, see, for example,
[11].

11 An alternative representation of the command syntax can be found in [5, p. 73].

12 Since switching shader programs means a certain workload for the GPU, the number of switching
operations can be minimised by using identical shader programs and sorting the 3D objects
appropriately before drawing.

13 In the JOGL version 2.3.2 from 2015, SPIR-V is not supported.

14 The class ShaderProgram is not to be confused with the class of the same name of the JOGL
binding. The class used here, together with the entire example program, can be found in the
supplementary material to the online version of this chapter.

15 In fact, glBindBuffer(GL.GL_ARRAY_BUFFER, ...) does not change the state of a vertex array object (VAO). However, an assignment to a vertex array buffer takes place indirectly through the command glVertexAttribPointer.

16 Implementations of some graphics card drivers may deliver the desired result in such a case even
without a VAO. For maximum compatibility, however, a VAO should always be used.

17 The maximum frame rate can be set to values other than 60 frames per second in the JOGL
examples JoglBasicStartCodeFFP and JoglBasicStartCodePP in the class for the main
windows.
© Springer Nature Switzerland AG 2023
K. Lehn et al., Introduction to Computer Graphics, Undergraduate Topics in Computer Science
https://doi.org/10.1007/978-3-031-28135-8_3

3. Basic Geometric Objects


Karsten Lehn1 , Merijam Gotzes2 and Frank Klawonn3
(1) Faculty of Information Technology, Fachhochschule Dortmund,
University of Applied Sciences and Arts, Dortmund, Germany
(2) Hamm-Lippstadt University of Applied Sciences, Hamm, Germany
(3) Data Analysis and Pattern Recognition Laboratory, Ostfalia University
of Applied Sciences, Braunschweig, Germany

Karsten Lehn (Corresponding author)


Email: [email protected]

Merijam Gotzes
Email: [email protected]

Frank Klawonn
Email: [email protected]

Supplementary Information
The online version contains supplementary material available at https://doi.
org/10.1007/978-3-031-28135-8_3.

This chapter describes basic geometric objects that are used in computer
graphics for surface modelling. Furthermore, the planar basic objects used
in the OpenGL and their use through OpenGL drawing commands are
explained. The graphic primitive points, lines, triangles, polygons and
quadrilaterals (quads) used in the OpenGL are considered in more detail.
The OpenGL has graphic primitives to draw sequences of basic objects,
allowing surfaces of objects to be represented efficiently. In addition, there
are OpenGL drawing commands, such as indexed drawing, primitive restart
and indirect drawing, to enable drawing by a graphics processor
independent of the graphics application. Many of the concepts used in the
OpenGL are also used in a similar form in other graphics systems. The
content of this chapter provides the basis for understanding the modelling of
surfaces of three-dimensional objects, which is presented in Chap. 4.

3.1 Surface Modelling


A commonly used approach in computer graphics for the representation of
(complex) objects is the modelling of their surfaces using basic geometric
objects. Further possibilities for modelling three-dimensional objects are
described in Chap. 4. The basic geometric objects in computer graphics are
usually called graphics output primitives, geometric primitives
or primitives. Three main types of primitives can be distinguished.

Points are uniquely specified by their coordinates. Points are mainly used to define other basic objects, for example, a line by specifying the starting point and the endpoint.

Lines, polylines or curves can be lines defined by two points, but also polylines that are connected sequences of lines. Curves require more than two control points.

Areas are usually bound by closed polylines or defined by polygons. An area can be filled with a colour or a texture.

The simplest curve is a line segment or a line characterised by a starting point and an endpoint. If several lines are joined together, the result is a
polyline. A closed polyline is a polyline whose last line ends where the first
begins. The region enclosed by a closed polyline defines a polygon. Thus, a
polygon is a plane figure described by a closed polyline. Depending on the
graphics program or graphics programming interface, only certain types of
polygons are permitted. Important properties are that the polygon should
not overlap with itself or its convexity. A polygon is convex if the line
segment between two points of the polygon is contained in the union of the
interior and the boundary of the polygon. The notion of convexity can be
generalised to three-dimensional bodies. Figure 3.1 shows a self-
overlapping, a non-convex and a convex polygon. For the non-convex
polygon in the middle, a connecting line between two points of the polygon
is indicated by a dashed line, which does not completely lie inside the
polygon.

Fig. 3.1 A self-overlapping (left), a non-convex (middle) and a convex (right) polygon

Besides lines or piecewise linear polylines, curves are also common in computer graphics. In most cases, curves are defined as parametric
polynomials that can also be attached to each other like line segments in a
polyline. How these curves are exactly defined and calculated is described
in Sect. 4.6.1. At this point, it will suffice to understand the principle of
how the parameters of a curve influence its shape. Besides a starting point
and an endpoint, additional control points are defined. In the context of
parametric curves, these additional control points are called inner control
points. Usually, one additional control point is used for the definition of a
quadratic curve and two additional control points for the definition of a
cubic curve. The curve begins at the start point and ends at the endpoint. It
generally does not pass through inner control points. The inner control
points define the direction of the curve in the two endpoints.
In the case of a quadratic curve with one inner control point, consider
the connecting lines from the inner control point to the starting point and to
the endpoint. These imaginary lines form the tangents to the curve at the
start and endpoints. Figure 3.2 shows on the left a quadratic curve defined
by a start and endpoint and one inner control point. The tangents at the start
and endpoints are drawn dashed. In the case of a cubic curve, as shown in
the right part of the figure, the two tangents can be defined independently of
each other by two inner control points.

Fig. 3.2 Definitions of quadratic and cubic curves using control points
When curves are joined together to form a longer, more complicated
curve, it is generally not sufficient for the endpoint of a curve to coincide
with the start point of the respective following curve. The resulting joint
curve would be continuous, but not smooth, and could therefore contain
sharp bends. To avoid such sharp bends, the tangents at the endpoint of the
previous curve and at the start point of the following curve must point in the
same direction. This means the endpoint, which is equal to the start point of
the next curve, and the two inner control points defining the two tangents
must be collinear, that is, they must lie on the same straight line. This
can be achieved by the appropriate choice of control points. Therefore, the
first inner control point of a succeeding curve must be on the straight line
defined by the last inner control point and endpoint of the previous curve.
Similarly, a curve can be fitted to a line without creating sharp bends by
choosing the first inner control point to lie on the extension of the line.
Figure 3.3 illustrates this principle.
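Written as a small formula (with symbols introduced here only for illustration): if $$q$$ denotes the shared endpoint and $$p$$ the last inner control point of the preceding curve, then the first inner control point $$p'$$ of the following curve must have the form $$p' = q + \lambda (q - p)$$ with $$\lambda > 0$$, so that $$p$$, $$q$$ and $$p'$$ lie on one straight line and both tangents point in the same direction.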

Fig. 3.3 Smooth fitting of a cubic curve to a straight line

Other curves frequently used in computer graphics are circles and ellipses or parts thereof in the form of arcs of circles or ellipses. Circles and
ellipses, like polygons, define areas. Areas are bounded by closed curves. If
only the edge of an area is to be drawn, this is no different from drawing
curves in general. Areas, unlike simple lines, can be filled with colours or
textures. Algorithmically, filling an area is very different from drawing lines
(see Sect. 7.5).
Axes-parallel rectangles whose sides are parallel to the coordinate axes
play an important role in computer graphics. Although they can be
understood as special cases of polygons, they are simpler to handle since it
is already sufficient to specify two diagonally opposite corners, i.e., two
points. These rectangles can also be used as axis-aligned bounding boxes
(AABB) (see Sect. 11.7).
It can be very cumbersome to define complex areas by directly
specifying the boundary curve. One way to construct complex areas from
existing areas is to apply set-theoretic operations to the areas. The most
important of these operations are union, intersection, difference and
symmetric difference. The union joins two sets, while the intersection
consists of the common part of both areas. The difference is obtained by
removing from the first area all parts that also belong to the second area.
The symmetric difference is the pointwise exclusive-OR operation
applied to the two areas. In other terms, the symmetric difference is the
union of the two areas without their intersection. Figure 3.4 shows the result
of applying these operations to two areas in the form of a circle and a
rectangle.
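In set notation, the symmetric difference of two areas $$A$$ and $$B$$ can be written as $$A \,\triangle\, B = (A \cup B) \setminus (A \cap B) = (A \setminus B) \cup (B \setminus A)$$.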

Fig. 3.4 Union, intersection, difference and symmetrical difference of a circle and a rectangle

Another way to create new areas from already constructed areas is to apply geometric transformations, such as scaling, rotation or translation.
These transformations are illustrated in Sect. 5.1.

3.2 Basic Geometric Objects in the OpenGL


In the OpenGL it is common to represent (complex) objects by modelling
the object surface with planar polygons. As a rule, this leads to polygon
meshes, i.e., sets of interconnected polygons. Using these planar pieces of area, curved surfaces must be approximated, possibly by a large number of polygons, until the desired representation quality is achieved. The use of
polygon meshes offers the advantage that the further computations of the
graphics pipeline can be performed very easily and efficiently. Therefore,
this approach has been very common for a long time.
With the availability of tessellation units in modern GPUs, it is now
possible to work with curved surface pieces that can be processed on the
GPU. The curved surfaces can be described precisely by a few control
points, so that less complex data structures have to be transferred to the
GPU than when modelling with planar polygons. Since modelling by
polygons is very common in computer graphics and not all GPUs and
existing graphics applications support tessellation on the GPU, the basics
for this approach are presented below.
The OpenGL provides geometric primitives for drawing points, lines
and planar triangles. In the compatibility profile, convex polygons and
quadrilaterals can also be used. Since complex objects require a large
number of these primitives, the OpenGL allows several connected
primitives to be drawn by reusing vertices from the preceding primitive to
increase efficiency. Table 3.1 shows the types of geometric primitives
available in the OpenGL. The entry contained in the left column is one of
the arguments passed to a drawing command (see Sect. 3.3) and specifies
how a geometric primitive is drawn from a sequence of n vertices
$$v_0, v_1,..., v_{n-1}$$. This sequence is called a vertex stream.
Table 3.1 Types of geometric primitives in the OpenGL

Points
GL_POINTS: A point is drawn for each vertex.

Lines
GL_LINE_STRIP: Two consecutive vertices are connected by a line, whereby the end vertex is always used as the start vertex of the subsequent line.
GL_LINE_LOOP: A GL_LINE_STRIP is drawn and in addition a line from the last vertex to the first vertex.
GL_LINES: Each two consecutive vertices that belong together define the start vertex and the end vertex between which a line is drawn. The lines are not connected to each other.

Triangles
GL_TRIANGLE_STRIP: A sequence of connected filled triangles is drawn, with the subsequent triangle reusing an edge of the previous triangle. A triangle is drawn from the first three vertices. Each subsequent triangle consists of the last two vertices of the previous triangle and one new vertex. The drawing order of the vertices defines the front and back side of a triangle (see Sects. 3.2.3 and 3.2.4).
GL_TRIANGLE_FAN: A sequence of connected filled triangles is drawn, with the subsequent triangle reusing an edge of the previous triangle. A triangle is drawn from the first three vertices. Each subsequent triangle consists of the very first vertex, the last vertex of the previous triangle and one new vertex. The drawing order of the vertices defines the front and back side of a triangle (see Sects. 3.2.3 and 3.2.4).
GL_TRIANGLES: A filled triangle is drawn from each of three consecutive vertices that belong together. The respective triangles are not connected to each other. The drawing order of the vertices defines the front and back side of a triangle (see Sects. 3.2.3 and 3.2.4).

Polygon (only available in the compatibility profile)
GL_POLYGON: A convex filled polygon is drawn from all vertices. The drawing order of the vertices defines the front and back side of a polygon (see Sects. 3.2.4 and 3.2.5).

Quadrilaterals (only available in the compatibility profile)
GL_QUAD_STRIP: A sequence of connected, filled quadrilaterals is drawn, with the subsequent quadrilateral reusing an edge of the previous quadrilateral. A quadrilateral is drawn from the first four vertices. Each subsequent quadrilateral consists of the last two vertices of the previous quadrilateral and two new vertices. The drawing order of the vertices defines the front and back side of a quadrilateral (see Sects. 3.2.4 and 3.2.6).
GL_QUADS: A filled quadrilateral is drawn from each of four consecutive vertices that belong together. The respective quadrilaterals are not connected to each other. The drawing order of the vertices defines the front and back side of a quadrilateral (see Sects. 3.2.4 and 3.2.6).

In the following sections, source code extracts or commands are given to explain the essential procedure of drawing in the OpenGL. This source
code and these commands can be used to modify and extend the basic
examples for creating graphical objects with the compatibility profile (see
Sect. 2.8) and the core profile (see Sect. 2.10). Furthermore, the
supplementary material to the online version of this chapter contains
complete JOGL projects that can be used to reproduce the contents of the
following sections.

3.2.1 Points
Figure 3.5 shows an example of points that are drawn using the
GL_POINTS geometric primitive. For this purpose, the Java source code of
the display method of a JOGL renderer shown in Fig. 3.6 can be used.
Besides the definition of the vertex positions, the glBegin command
specifies that the vertices are to be used to draw points. Furthermore, the
size of the points to be displayed was set to a value of 10 using
glPointSize. The points shown in the figure are square. In order to
render round points, antialiasing and transparency can be enabled using the
following command sequence.
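A minimal sketch of such a sequence for the compatibility profile (gl of type GL2 assumed; the exact listing may differ):

gl.glEnable(GL2.GL_POINT_SMOOTH);  // antialiasing for points
gl.glEnable(GL.GL_BLEND);          // blending for the semi-transparent edge pixels
gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE_MINUS_SRC_ALPHA);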

Antialiasing smoothes the edge pixels of each point, reducing the intensity of the grey values towards the edge and creating a rounded
impression. This function is only available in the compatibility profile and
has some disadvantages (see Sect. 7.6.2). As an alternative, a point can be
modelled by a (small) circle to be drawn by the primitive
GL_TRIANGLE_FAN (see below).

Fig. 3.5 Example of an OpenGL geometric primitive for drawing points

Fig. 3.6 Example for drawing points (GL_POINTS) in the compatibility profile: Part of the source
code of the display method of a JOGL renderer (Java)

Fig. 3.7 Parts of the source code of the init method of a JOGL renderer (Java) for drawing points
(GL_POINTS) in the core profile

Fig. 3.8 Parts of the source code of the display method of a JOGL renderer (Java) for drawing
points (GL_POINTS) in the core profile

Figures 3.7 and 3.8 show the relevant Java source code to draw points in
the core profile. To do this, the vertex positions and their colour are defined
in the init method. The definition of the colour per vertex is required
when working with the shaders from Sect. 2.10. Alternatively, the colour in
the shaders can be determined using the illumination calculation (see Chap.
9) or—for testing purposes—set to a fixed value in one of the shaders. In
the display method, only the draw command glDrawArrays with the
first parameter GL.GL_POINTS must be called. The transfer of the vertex
data into a vertex buffer object takes place in the same way as in the
example described in Sect. 2.10. The same shaders are used for rendering as
in this example.
The glPointSize command sets the size of the points in the same
way as in the compatibility profile. Alternatively, this value can be set in the
vertex or geometry shader. To do this, this function must be activated by the
following command in the JOGL program.
Afterwards, the point size can be set to 10 in the vertex shader (GLSL
source code), for example.
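A sketch of both steps, with the constant and interface names as assumptions (gl of type GL3):

gl.glEnable(GL3.GL_PROGRAM_POINT_SIZE);  // allow the vertex/geometry shader to set the point size
// In the vertex shader (GLSL) the size is then assigned, e.g.: gl_PointSize = 10.0;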

The variable used, gl_PointSize, is a predefined output variable of the vertex or geometry shader.
In order to be able to reproduce the examples in this section, the two
complete projects JoglPointsFFP and JoglPointsPP are available
in the supplementary material to the online version of this chapter.

3.2.2 Lines
Figure 3.9 shows examples of the rendering result using the three different
geometric primitives for drawing lines in the OpenGL. Below each of the
examples is an indication of the order in which the vertices are used to
represent a line segment.

Fig. 3.9 Examples of geometric primitives in the OpenGL for drawing lines

Fig. 3.10 Example of drawing a line strip (GL_LINE_STRIP) in the compatibility profile: Part of
the source code of the display method of a JOGL renderer (Java) is shown

Figure 3.10 shows the relevant Java source code of the display
method of a JOGL renderer for the compatibility profile to draw a polyline
similar to the left part of Fig. 3.9. By replacing the argument in the
glBegin command with GL.GL_LINE_LOOP or GL.GL_LINES, a
closed polyline or separate lines can be drawn, as shown in the middle and
right parts of Fig. 3.9.

Fig. 3.11 Parts of the source code of the init method of a JOGL renderer (Java) for drawing a
polyline (GL_LINE_STRIP) in the core profile

Fig. 3.12 Parts of the source code of the display method of a JOGL renderer (Java) for drawing a
polyline (GL_LINE_STRIP) in the core profile
Figures 3.11 and 3.12 show the relevant Java source code to draw a
polyline in the core profile (left part of Fig. 3.9). For this purpose, the
vertex positions and their colours are defined in the init method. In the
display method, only the drawing command glDrawArrays must be
called. The transfer of the vertex data into a vertex buffer object takes place
in the same way as in the example described in Sect. 2.10. Likewise, the
same shaders are used for rendering as in the example. By replacing the
first argument in the glDrawArrays command with
GL.GL_LINE_LOOP or GL.GL_LINES, a closed polyline or separate
lines can be drawn, as shown in the middle and right parts of Fig. 3.9.
The width of the lines can be set with the glLineWidth command.
After the additional activation of antialiasing by the following JOGL
command, smooth lines are drawn.
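A sketch of this activation together with the blending it relies on (gl of type GL2 assumed; the exact listing may differ):

gl.glLineWidth(5.0f);             // line width in pixels
gl.glEnable(GL2.GL_LINE_SMOOTH);  // antialiased (smooth) lines
gl.glEnable(GL.GL_BLEND);         // blending for the smoothed edge pixels
gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE_MINUS_SRC_ALPHA);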

For explanations on line widths, line styles and antialiasing, see Sect. 7.4.
The full example projects JoglLinesFFP and JoglLinesPP used in
this section can be found in the supplementary material to the online
version of this chapter.

3.2.3 Triangles
The top three images in Fig. 3.13 show rendering examples of the three
geometric primitives in the OpenGL that are available for drawing triangles.
In each case, triangles are shown that lie in a plane (planar triangles).
Basically, a shape with three vertices can be curved in space. However, in
order to define the curvature precisely, additional information—besides the
three position coordinates of the vertices—would have to be available,
which is not the case. Therefore, the three respective vertices are connected
by a planar triangle. Thus, each individual triangle lies in a plane, but
adjoining planar triangles can (in principle) be oriented arbitrarily in space
and have (in principle) any size. This allows curved structures to be
approximated. The size and number of triangles determine the accuracy of
this approximation to a curved structure. In this book, triangles are always
understood as plane/planar triangles.
Fig. 3.13 Examples of OpenGL geometric primitives for drawing triangles: In the lower part of the
figure, the individual triangles are visible because the edges have been highlighted

As already explained in Sects. 3.2.1 and 3.2.2 and shown in the respective source code, for the use of a specific geometric primitive, only
the first argument of the drawing command has to be chosen accordingly.
Thus, for drawing a sequence of connected triangles, a triangle fan or
individual triangles, the first argument is GL.GL_TRIANGLE_STRIP,
GL.GL_TRIANGLE_FAN or GL.GL_TRIANGLES, respectively.
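As a minimal illustration for the core profile (assuming the six vertices of Fig. 3.13 are already stored in the bound vertex buffer object):

gl.glDrawArrays(GL.GL_TRIANGLE_STRIP, 0, 6);  // four connected triangles from six vertices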
The full example source code for drawing triangles can be found in the
projects JoglTrianglesFFP and JoglTrianglesPP, which are in
the supplementary material to the online version of this chapter.
In the lower images of Fig. 3.13, the edges have been highlighted to
reveal the drawn individual triangles. Below each of the images, it is further
indicated in which order the vertices are used by a geometric primitive to
represent one triangle at a time. The vertices were passed to the GPU in
ascending order of their indices, i.e., in the order
$$v_0, v_1, v_2, v_3, v_4, v_5$$. The change of order for rendering
takes place through the geometric primitive. By changing this order for a
GL_TRIANGLE_STRIP or a GL_TRIANGLE_FAN, all rendered adjacent
triangles have the same orientation. This can be used for modelling so that
adjacent areas represent either the outer or inner surfaces of a modelled
object (see Sect. 3.2.4).
Using the geometric primitive GL_TRIANGLE_STRIP, adjacent
triangles are drawn so that each subsequent triangle consists of an edge of
the previous triangle. After the first triangle is drawn, only a single vertex is
required for the next triangle. This makes this geometric primitive very
efficient, both for storing and rendering objects. For long triangle
sequences, this advantage over drawing individual triangles converges to a
factor of three. For this reason, this primitive is very often used for
modelling complex surfaces of objects.
The geometric primitive GL_TRIANGLE_FAN also draws adjacent
triangles where each subsequent triangle consists of an edge of the previous
triangle. Since the first vertex is used for each of the triangles, a fan-shaped
structure is created, a triangle fan. This makes this primitive well suited for
approximating round structures. For example, if the first vertex is chosen to
be the centre and each subsequent vertex is chosen to be a point on the
edge, then a circle can be rendered. The use of the primitive
GL_TRIANGLE_FAN is as efficient as the use of the primitive
GL_TRIANGLE_STRIP. Also in this primitive, from the second triangle
onwards, only one vertex is needed for each further triangle.
Since any surface can be approximated with arbitrary accuracy by
sequences of triangles, triangles are the most commonly used polygons in
computer graphics. Another advantage of triangles is that a planar triangle
can always be drawn from three vertices. Triangles cannot overlap
themselves. Furthermore, triangles are always convex, which is not
necessarily true for shapes with more than three corners. In graphics
programming interfaces, further restrictions are therefore defined for
quadrilaterals or pentagons, for example, so that these primitives can be
drawn efficiently. Furthermore, the geometric primitives for drawing
multiple adjacent planar triangles available on GPUs are implemented in a
very memory and runtime efficient way, mainly due to the reuse of the
adjacent edges (see above). The algorithms for rasterisation and occlusion
computation for planar triangles are relatively simple and therefore very
efficient to implement. These properties make planar triangles the preferred
polygon in computer graphics.
By using two or three identical vertices, lines or points can be drawn
from triangle primitives. Such degenerated triangles are useful when a
single long triangle sequence is to be drawn, but the geometry of the
modelled object does not permit the drawing of complete triangles within
the sequence at all points of the object. Such a triangle is not visible if the
triangle degenerated to a line is identical to an already drawn edge or if the
triangle degenerated to a point coincides with an already drawn point (see
Sect. 3.3.2). Another application for triangles degenerated into lines or
points is to use them instead of the special geometric primitives for lines
and points. This is sensible if the line or point primitives are less efficiently
implemented as triangle sequences. As an example, a circle can be drawn as
a substitute for a point with an adjustable size by using a
GL_TRIANGLE_FAN.
Points, lines and triangles can be used to approximate all the necessary
objects in computer graphics. Any convex planar polygon can be
represented by a GL_TRIANGLE_STRIP or GL_TRIANGLE_FAN (see
Sect. 3.2.5). For example, it is possible to draw a planar quadrilateral with
two triangles. If a GL_TRIANGLE_STRIP is used for this, even the
number of vertices to be displayed is identical (see Sect. 3.2.6). For these
reasons, only geometric primitives for drawing points, lines and triangles
are present in the OpenGL core profile.

3.2.4 Polygon Orientation and Filling


By a convention often used in computer graphics, the front face (front side)
of a polygon is defined by the counter-clockwise order of the vertices (see
Sect. 4.2) when looking at the respective face (side). Similarly, the back
face of a polygon is defined by the clockwise order of the vertices. This
order definition of naming (enumeration) of the vertices within a polygon is
also called winding order of the respective polygon. Based on the vertex
order given in Fig. 3.13, it can be seen that, according to this convention, all
triangles are oriented with their front face towards the viewer. In the
OpenGL, this convention is set as the default winding order. However, the
command glFrontFace(GL_CW) reverses this order so that afterwards
front faces are defined by the clockwise order of vertices. The
glFrontFace(GL.GL_CCW) command restores the default state. CW
and CCW are abbreviations for clockwise and counter-clockwise.
The definition of front and back surfaces is so important because it
reduces the effort for rendering. Polygon (and especially triangle) meshes
are often used to model the surfaces of (complex) objects. These three-
dimensional surfaces are in many cases closed and the interior of the
objects is not visible from the outside. If all polygons for surface modelling
have their fronts facing outwards and the observer is outside an object, then
the inward facing backsides do not need to be drawn. If the observer is
inside an object, then only the insides need to be shown and the outsides
can be omitted from the rendering process. In other applications, both sides
of a polygon are visible, for example, with partially transparent objects. If
the drawing of the front or backsides is switched off (culling) in the cases
where front or backsides are not visible, then the rendering effort decreases.
The underlying mechanisms for determining the visibility of surfaces are
presented in Sect. 8.2.1. In addition, the orientation of the surfaces is used
for the illumination calculation (see Chap. 9).
In the OpenGL, both sides of a polygon are drawn by default. The
command gl.glEnable(GL.GL_CULL_FACE) suppresses the drawing
of front or back faces or both as follows.
Frontface culling: gl.glCullFace(GL.GL_FRONT)
Backface culling: gl.glCullFace(GL.GL_BACK)
Frontface and backface culling: gl.glCullFace(GL.GL_FRONT_AND_BACK)
These JOGL commands work in both the compatibility profile and the
core profile.
By default, polygons are displayed as filled polygons in the OpenGL.
The command glPolygonMode changes this drawing mode. The edges
of the polygons can only be rendered as lines or the vertices of the polygons
(positions of the vertices) as points. The first parameter of this command
uses GL_FRONT, GL_BACK or GL_FRONT_AND_BACK to determine the
sides of the polygons that are to be affected. The second argument
determines whether polygons are to be drawn as points (GL_POINT) or
lines (GL_LINE) or whether the polygon is to be filled (GL_FILL). The
following Java command sequence, for example, indicates that the front
sides are to be filled and only the edges of the backsides are to be drawn:
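A sketch of such a sequence (compatibility profile, gl of type GL2 assumed; in the core profile only GL_FRONT_AND_BACK is accepted as the first argument):

gl.glPolygonMode(GL.GL_FRONT, GL2GL3.GL_FILL);  // front faces: filled
gl.glPolygonMode(GL.GL_BACK, GL2GL3.GL_LINE);   // back faces: edges only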

The mentioned commands for the polygon mode are available in the
compatibility profile and core profile. Section 7.5 presents algorithms for
filling polygons.

Fig. 3.14 Examples of drawn front faces of a sequence of connected triangles (GL_TRIANGLE_STRIP) when using different polygon modes

Figure 3.14 shows the effects of different polygon modes using a sequence of connected triangles. The commands described in Sects. 3.2.1
and 3.2.2 for changing the rendering of points and lines, for example, the
point size or line width, also affect the representation of the polygons in the
changed mode for drawing polygons.
Using the two example projects JoglTrianglesFFP and
JoglTrianglesPP for drawing triangles, which can be found in the
supplementary material to the online version of this chapter, the settings for
polygon orientation and drawing filled polygons explained in this section
can be reproduced.
3.2.5 Polygons
Figure 3.15 shows a rendering example of an OpenGL geometric primitive
for polygons. This primitive is only available in the compatibility profile.
The vertices are used for drawing according to the order of the vertex
stream transmitted to the GPU. As explained in Sects. 3.2.1 and 3.2.2 and
shown in the source code, to use a particular geometric primitive, only the
first parameter of the drawing command needs to be chosen accordingly.
For example, to draw a polygon in a glBegin command, the first
argument is GL.GL_POLYGON. The full source code for drawing polygons
can be found in the project JoglPolygonFFP, which is in the
supplementary material to the online version of this chapter.

Fig. 3.15 Example of an OpenGL geometric primitive for drawing polygons: In the right part of the
figure, the edges have been highlighted

The OpenGL specification [2] requires that convex polygons are guaranteed to be displayed correctly. In a convex polygon, all points on a
connecting line between two points of the polygon also belong to the
polygon (see Sect. 3.1). In convex polygons, all vertices point outwards.
The polygon in Fig. 3.15 is convex. Polygons that are not convex have
inward indentations. Figure 3.16 shows a non-convex polygon. Not all
points on the red connecting line between the two points $$P_1$$ and
$$P_2$$ (that belong to the polygon) are inside the polygon. The
inward indentation near vertex $$v_1$$ is clearly visible. The top
three images in Fig. 3.17 show the output of an OpenGL renderer for the
non-convex polygon in Fig. 3.16. As soon as backface culling is switched
on or if the polygon is filled, the rendering is no longer correct (image in
the middle and on the right).1

Fig. 3.16 Example of a non-convex polygon: The edges have been highlighted

Fig. 3.17 Example of rendering results for a non-convex polygon: The upper part of the figure
shows the drawing results using the geometric primitive GL_POLYGON. The lower part of the figure
shows the result after triangulation and rendering by a triangle fan (GL_TRIANGLE_FAN)
Fig. 3.18 Examples of OpenGL geometric primitives for drawing quadrilaterals: In the lower part of
the figure, the individual quadrilaterals are visible because the edges have been highlighted

In general, a non-convex polygon can be represented by a set of adjacent convex polygons. Since triangles are always convex, the same figure can be drawn by a triangle fan after decomposing this polygon into triangles (triangulation). The bottom three images in Fig. 3.17 now show correct rendering results in all polygon modes. Since every convex and planar polygon can be represented by a sequence of connected triangles (GL_TRIANGLE_STRIP) or a triangle fan (GL_TRIANGLE_FAN), the
geometric primitive for polygons is now only available in the compatibility
profile.
An OpenGL implementation will most likely triangulate each polygon
and afterwards draw the resulting triangles. This will be the case in
particular because the polygon primitive no longer exists in the core profile
and the deprecated functions of the fixed-function pipeline will most likely
be realised internally (within the graphics card driver or the GPU) through
functions of the core profile.

3.2.6 Quadrilaterals
The top two images in Fig. 3.18 show examples of the two OpenGL
geometric primitives for drawing quadrilaterals. As explained in Sects.
3.2.1 and 3.2.2 and shown in the source code, the use of a specific
geometric primitive only requires the first argument of the drawing
command to be chosen accordingly. These primitives are only available in
the compatibility profile. Therefore, to draw a sequence of quadrilaterals or
individual quadrilaterals, the first argument must be
GL.GL_QUAD_STRIP or GL.GL_QUADS. Quad is the abbreviation for
quadrilateral. The project JoglQuadsFFP, which is included in the
supplementary material for the online version of this chapter, contains the
full source code for this example.
In the lower images of Fig. 3.18, the edges have been highlighted to
reveal the individual quadrilaterals drawn. Below each of the images, it is
further indicated in which order the vertices are used by a geometric
primitive to represent a quadrilateral in each case. The vertices were passed
to the GPU in ascending order of their indices, i.e., in the order
$$v_0, v_1, v_2, v_3, v_4, v_5$$. The order in which the vertices are
used for rendering in GL_QUAD_STRIP ensures that all adjacent
quadrilaterals have the same orientation (see Sect. 3.2.4). Thus, adjacent
faces are either outer or inner faces. For the primitive GL_QUADS, the
vertices are used in the order in which they are specified. If the default
OpenGL settings have not been changed, then all the quadrilaterals shown
in Fig. 3.18 have their faces oriented towards the viewer (see Sect. 3.2.4).
The geometric primitive GL_QUAD_STRIP draws sequences of
adjacent quadrilaterals. Each subsequent quadrilateral consists of an edge of
the previous quadrilateral. Therefore, after drawing the first quadrilateral,
only two vertices are required for the next quadrilateral. This makes it very
efficient to store and draw objects that are modelled by sequences of
quadrilaterals.
Since a quadrilateral can be represented by two triangles, any object
constructed from a quadrilateral sequence can be represented by a triangle
sequence (GL_TRIANGLE_STRIP). Figure 3.19 contrasts two approaches
for drawing a rectangular figure. In both cases, the same number of vertices
is necessary. Since this is true for the general case, both geometric
primitives can be stored and drawn equally efficiently. However, modern
graphics processors are optimised for drawing triangle sequences, which is
also due to the fact that they are supported in both the compatibility profile
and the core profile of the OpenGL.

Fig. 3.19 Comparison of a quadrilateral sequence (GL_QUAD_STRIP) with a triangle sequence (GL_TRIANGLE_STRIP) using the same rectangular figure: The individual quadrilaterals and triangles are visible because the edges have been highlighted

Quadrilaterals are often used in computer graphics (rendering and modelling software) as they are well suited for modelling surfaces.
However, triangle sequences offer more flexibility than quadrilateral
sequences. For example, a monochrome non-illuminated cuboid with
uniformly oriented faces can be represented by a single triangle sequence
(GL_TRIANGLE_STRIP). When using quadrilaterals, at least two
quadrilateral sequences (GL_QUAD_STRIP) are required for the same
figure (see Sect. 4.4).
3.3 OpenGL Drawing Commands
Using the introductory example for the compatibility profile, Sect. 2.8
explains how to draw a triangle using a glBegin/glEnd block. A
parameter to the glBegin command specifies which geometric primitive
(see Sect. 3.2) to draw. Since this drawing method transfers vertex data one
by one from the graphics application to the GPU for each frame, this
method is very inefficient. Therefore, so-called display lists and vertex
arrays were introduced in the OpenGL, which help to transfer vertex data
as blocks once to the GPU in order to use this data several times by drawing
commands (of the application) for drawing objects.
As a further development of this idea, with the core profile vertex
buffer objects (VBO) were introduced. Vertex buffers are reserved memory
areas on the GPU to hold the vertex data. After filling these buffers on the
GPU, they can be used again and again for rendering by a drawing
command of the application. These memory areas can be changed when
objects are modified, which, however, does not or rarely occur with many
objects in a scene. The use of this technique is explained in detail with the
help of an example in Sect. 2.10 for drawing a triangle. This example uses
the drawing command glDrawArrays. The use of vertex buffer objects
is also the standard drawing technique in the core profile. Drawing with
glBegin/glEnd blocks, display lists or vertex arrays is no longer
available in the core profile.
In order to speed up the drawing of objects and save memory on the
GPU, the OpenGL pursues the following basic approaches:
1. Transferring the data in a block to the GPU, if possible before drawing.
2. Minimising the data (to be transferred) on the GPU by reusing existing data.
3. Reduction of the number of calls to drawing commands by the application.
The first item is realised through the use of vertex buffer objects. The
second item can be achieved by the use of primitives, by which a series of
connected lines or triangles are drawn by one call to a drawing command.
In the core profile, these are the GL_LINE_STRIP, GL_LINE_LOOP,
GL_TRIANGLE_STRIP and GL_TRIANGLE_FAN primitives. All
OpenGL primitives allow multiple points, lines or triangles to be created
with one call to the glDrawArrays drawing command, so that the third
item in the enumeration above is also supported.
In addition to the glDrawArrays command (see the example from
Sect. 2.10), there are other drawing commands and methods in OpenGL
through which further efficiency improvements are possible according to
the list presented above. The following sections present the main underlying
ideas of some of these commands.

3.3.1 Indexed Draw


Figure 3.20 shows a simple surface consisting of nine vertices drawn by an
OpenGL renderer. This could be a small piece of the ground surface of a
terrain. Figure 3.21 shows the top view of this surface. The vertices are
marked by $$v_j (j = 0, 1,... 8)$$. To draw this surface with separate
triangles (GL_TRIANGLES) and glDrawArrays, 24 vertices have to be
transferred into a vertex buffer on the GPU. A GL_TRIANGLE_STRIP
primitive still needs 13 vertices (see below). In both cases, however, the
vertices $$v_3, v_4$$ and $$v_5$$ must be entered several times in the
buffer, since they belong to more than one triangle. This can be avoided by
using the drawing command glDrawElements. This uses an index
buffer (an additional buffer) whose elements point to the vertices in the
vertex buffer. For this, similar to a vertex buffer object, an index buffer
object (IBO) is reserved on the GPU and filled with the corresponding
indices. Table 3.2 shows possible contents for the index and vertex buffer
objects for drawing the example mesh in Fig. 3.20. The vertex index j in the
index buffer specifies which vertex from the vertex buffer is used for
drawing and in which order. Note that the data in the vertex buffer object is
stored in floating point format (float) and the indices in the index buffer
object are integers (int).

Fig. 3.20 Example of a surface of nine vertices drawn using an OpenGL renderer: In the left part of
the figure, the triangles are filled. In the right part of the figure, the edges of the triangles are shown
as lines
Fig. 3.21 Top view of a surface consisting of the nine vertices $$v_0, v_1, ..., v_8$$ : The large
numbers represent numbers of the triangles

Table 3.2 Contents of the vertex and index buffers for drawing the example surface in Fig. 3.20

Assuming the use of floating point numbers and integers, each of which
occupies four bytes in the GPU memory, there is already a memory
advantage compared to drawing only with the VBO. Drawing with
GL_TRIANGLES and glDrawArrays requires 24 vertices consisting of
six components, each of which occupies four bytes. Thus, the vertex buffer
requires 576 bytes of data. For drawing with GL_TRIANGLES and
glDrawElements, only nine vertices are needed. This means that the
vertex buffer object contains 216 bytes of data. In addition, there is the
index buffer object with 24 indices and thus a data volume of 96 bytes. In
total, this results in a memory space requirement of only 312 bytes. This
reuse of vertex data supports item 2 of the list in Sect. 3.3.
If we also consider that each vertex usually stores other data, at least
normal vectors and texture coordinates, and that complex objects consist of
many more vertices and contain many more shared vertices than in this
simple example, the savings become even more obvious. On the other hand,
the use of an index buffer by glDrawElements results in one more
access each time a vertex is read, which slows down the drawing operation.
This access, however, takes place entirely on the GPU and can be
accelerated accordingly—through hardware support.

Fig. 3.22 Parts of the source code of the init method of a JOGL renderer (Java) for drawing the
example surface with glDrawElements from separate triangles (GL_TRIANGLES)

Figures 3.22 and 3.23 show the essential Java source code for rendering
the example surface by the index buffer object and the vertex buffer object
from separate triangles.
Fig. 3.23 Parts of the source code of the display method of a JOGL renderer (Java) for drawing
the example surface with glDrawElements from separate triangles (GL_TRIANGLES)
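In outline, the relevant steps can look as follows (the variable names iboName and indexData, the concrete triangle order in the index array and the imports of java.nio.IntBuffer and com.jogamp.common.nio.Buffers are assumptions for illustration; the figures contain the complete listing):

// init(): create and fill an index buffer object (IBO); iboName is an int[1]
int[] indices = { 0, 3, 1,  1, 3, 4,  1, 4, 2,  2, 4, 5,    // upper four triangles (assumed order)
                  3, 6, 4,  4, 6, 7,  4, 7, 5,  5, 7, 8 };  // lower four triangles (assumed order)
IntBuffer indexData = Buffers.newDirectIntBuffer(indices);
gl.glGenBuffers(1, iboName, 0);
gl.glBindBuffer(GL.GL_ELEMENT_ARRAY_BUFFER, iboName[0]);
gl.glBufferData(GL.GL_ELEMENT_ARRAY_BUFFER, (long) indices.length * 4,
                indexData, GL.GL_STATIC_DRAW);

// display(): one draw call renders all 24 indices as separate triangles
gl.glDrawElements(GL.GL_TRIANGLES, indices.length, GL.GL_UNSIGNED_INT, 0);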

3.3.2 Triangle Strips


Since the example surface consists of contiguous triangles, it is obvious to
use the geometric primitive for drawing a sequence of connected triangles
(GL_TRIANGLE_STRIP) to increase efficiency.

Fig. 3.24 Example of a surface of nine vertices drawn with a short GL_TRIANGLE_STRIP: The
left part of the figure shows the top view of the surface with the vertices $$v_j$$ , the indices
$$i_k$$ and the drawing order of the triangles (large numbers). The right part of the figure shows
the output of an OpenGL renderer

The top view of the example surface in the left part of Fig. 3.24 shows
the vertices $$v_j$$ for the vertex buffer and the indices
$$i_k$$ for the index buffer. The content of the index buffer and the
drawing sequence of the triangles are derived from this.

Fig. 3.25 Parts of the source code of the init method of a JOGL renderer (Java) for drawing the
example surface with glDrawElements from a short sequence of connected triangles
(GL_TRIANGLE_STRIP)

Fig. 3.26 Parts of the source code of the display method of a JOGL renderer (Java) for drawing
the example surface with glDrawElements from a short sequence of connected triangles
(GL_TRIANGLE_STRIP)

Figures 3.25 and 3.26 show the relevant parts of the source code for
drawing with glDrawElements. To avoid multiple calls of this drawing
command, a single continuous sequence of triangles was defined. Due to
the geometry, the renderer draws a triangle degenerated to a line on the right
side. This line is created by index $$i_6$$ (see the left part of Fig.
3.24). Triangle 5 is only represented by index $$i_7$$. If the
geometry were flat, this degenerated triangle would not be visible. In this
example, however, the surface is bent, so the additional triangle on the right
edge of the surface is visible as an additional line (see right side of Fig.
3.24). If such a surface is only viewed from above and is bent downwards,
this may not be a problem. However, it can happen that such a geometry is
bent upwards and such degenerated triangles become visible above the
surface and thus disturb the representation of the object.
One solution to this problem is to use more degenerate triangles that
follow the geometry. The left side of Fig. 3.27 again shows in a top view
the assignment of indices $$i_k$$ to vertices $$v_j$$. After
triangle 4 has been drawn, index $$i_6$$ draws a triangle
degenerated to a line from vertex $$v_2$$ to vertex $$v_5$$.
Subsequently, index $$i_7$$ creates a triangle degenerated to a point
at the position of vertex $$v_5$$. Then $$i_8$$ draws a
triangle degenerated to a line from $$v_5$$ to $$v_8$$ until
$$i_9$$ represents triangle 5. The degenerated triangles now follow
the geometry and are not visible, as can be seen in the right part of Fig.
3.27.

Fig. 3.27 Example of surface of nine vertices drawn with a long GL_TRIANGLE_STRIP: The left
part of the figure shows the top view of the surface with the vertices $$v_j$$ , the indices
$$i_k$$ and the drawing order of the triangles (large numbers). The right part of the figure shows
the output of an OpenGL renderer

Figures 3.28 and 3.29 show the relevant source code for this longer and
improved GL_TRIANGLE_STRIP. It should be noted at this point that all
considerations in this section about drawing a sequence of connected
triangles (GL_TRIANGLE_STRIP) using index buffers through
glDrawElements can also be transferred to drawing without index
buffers using glDrawArrays. However, for more complex objects, it is
easier to first consider the vertices and then separately define the drawing
order of the triangles using indices. In this respect, indexed drawing not
only saves memory, but also facilitates the modelling of objects.
Furthermore, changes to objects that become necessary in an interactive
application are simplified, since the structure of the object is stored in the
vertex buffer independently from the rendering order. For example, changes
in the position of vertices can be achieved by changing the vertex buffer
alone. The use of connected triangles for drawing supports the items 1 and
3 from the list in Sect. 3.3.
Fig. 3.28 Parts of the source code of the init method of a JOGL renderer (Java) for drawing the
example surface with glDrawElements from a long sequence of connected triangles
(GL_TRIANGLE_STRIP)

Looking closely at the right part of Fig. 3.27, it can be seen that three
indices are needed for the first triangle in the lower triangle sequence
(triangle 5). For each further triangle, only one index is needed. This is the
same situation as at the beginning of the entire triangle sequence. Triangle 1
needs three indices and the subsequent triangles only one index each. By
using the three degenerated triangles at the transition from triangle 4 to
triangle 5, a new triangle sequence (a new GL_TRIANGLE_STRIP) has
effectively been started in the lower part of the surface.

Fig. 3.29 Parts of the source code of the display method of a JOGL renderer (Java) for drawing
the example surface with glDrawElements from a long sequence of connected triangles
(GL_TRIANGLE_STRIP)

3.3.3 Primitive Restart


Instead of restarting the sequence indirectly within a triangle sequence by
degenerated triangles (see Sect. 3.3.2), the OpenGL provides the explicit
mechanism primitive restart. For this purpose, a special index is defined
that can be used in the index buffer so that when drawing with
glDrawElements the geometric primitive is restarted. The subsequent
indices in the buffer after this special index are used as if the drawing
command had been called again. Since this is a special index, primitive
restart is not available for the glDrawArrays drawing command.
However, this mechanism can be used for indexed drawing with all
geometric primitives. Thus, primitive restart supports the items 1 and 3 of
the list from Sect. 3.3.
Figures 3.30 and 3.31 show the relevant source code parts to draw the
example surface using primitive restart. This mechanism is enabled by
glEnable(gl.GL_PRIMITIVE_RESTART). The
glPrimitiveRestartIndex method can be used to define the
specific index that will cause the restart. In this example, the value 99 was
chosen. In order to avoid conflicts with indices that refer to an entry in the
vertex buffer, it makes sense to choose this special index as the largest
possible value of the index set. The figure shows that the indices listed after
index 99 were chosen as if the drawing command had been invoked anew.
The glDrawElements call (see Fig. 3.31) is no different from a call
without this mechanism. Only the correct number of indices to use,
including the special restart indices, has to be taken into account.

Fig. 3.30 Parts of the source code of the init method of a JOGL renderer (Java) for drawing the
example surface with glDrawElements from a sequence of connected triangles
(GL_TRIANGLE_STRIP) using primitive restart

Fig. 3.31 Parts of the source code of the display method of a JOGL renderer (Java) for drawing
the example surface with glDrawElements from a sequence of connected triangles
(GL_TRIANGLE_STRIP) using primitive restart
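Condensed to the essential calls (the vertex numbering in the index array and the interface names are assumptions; the figures contain the complete listing):

// init(): enable primitive restart and declare 99 as the restart index
gl.glEnable(GL3.GL_PRIMITIVE_RESTART);
gl.glPrimitiveRestartIndex(99);
// index buffer: upper triangle strip, restart index, lower triangle strip
int[] indices = { 0, 3, 1, 4, 2, 5,   99,   3, 6, 4, 7, 5, 8 };

// display(): one draw call; the index count passed on includes the restart index
gl.glDrawElements(GL.GL_TRIANGLE_STRIP, indices.length, GL.GL_UNSIGNED_INT, 0);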

In this example for drawing the surface, the index buffer with primitive
restart is exactly as long as in the last example from Sect. 3.3.2 without
primitive restart. However, this mechanism prevents the drawing of the
three (degenerated) triangles. Furthermore, modelling is simpler, since the
drawing of degenerated triangles does not have to be considered. The use of
degenerated triangles can be error-prone or not so simple, as awkward
modelling can unintentionally turn backsides into front sides, or the
degenerated triangles can become visible after all.
As explained in Sect. 3.2.3, triangles and in particular sequences of
connected triangles (GL_TRIANGLE_STRIP) have many advantages for
the representation of object surfaces. Therefore, much effort has been
invested in developing algorithms and tools to transform objects from
simple triangles into objects consisting of as few and short sequences of
connected triangles as possible. Together with the hardware support of
graphics processors, such objects can be drawn extremely quickly. Thus, the
GL_TRIANGLE_STRIP has become the most important geometric
primitive in computer graphics.

3.3.4 Base Vertex and Instanced Rendering


In the OpenGL, further commands exist to draw even more efficiently and
flexibly than shown in the previous sections. For example,
glDrawElementsBaseVertex can be used to add a constant (base
vertex) to the index value after accessing the index buffer. Afterwards, the
resulting index is used to access the vertex buffer. This makes it possible to
store different parts of a complex geometry in a vertex buffer and to display
these parts separately. This drawing command supports item 1 of the list
from Sect. 3.3.
Item 2 from the list in Sect. 3.3 is supported by the following two
drawing commands. These commands allow multiple instances (copies) of
the same object to be drawn with a single call.
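These are glDrawArraysInstanced and glDrawElementsInstanced, which take the number of instances to be drawn as an additional last argument; a sketch of the non-indexed variant (vertexCount and instanceCount are assumed variables):

gl.glDrawArraysInstanced(GL.GL_TRIANGLE_STRIP, 0, vertexCount, instanceCount);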

For this purpose, the vertex shader has the predefined variable gl_InstanceID, which is incremented by one with each instance, starting from zero. The value of this counter can be used in the shader to vary each instance, for example, by changing the position, the colour or even the geometry of the object instance. This mechanism can be used effectively to represent, for example, a starry sky or a meadow of many individual plants.

3.3.5 Indirect Draw


Indirect drawing provides further flexibility and thus further support of the
items of the list from Sect. 3.3. With this mechanism, the parameters for the
drawing command are not passed with the call of the drawing command,
but are also stored in a buffer (on the GPU). The following drawing
commands use indirect drawing.
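These are glDrawArraysIndirect for non-indexed drawing and glDrawElementsIndirect for indexed drawing; the latter is used in the following example.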

Fig. 3.32 Parts of the source code of the init method of a JOGL renderer (Java) for drawing the
example surface with glDrawElementsIndirect from a sequence of connected triangles
(GL_TRIANGLE_STRIP) using primitive restart and an indirect draw buffer

Figures 3.32 and 3.33 show the relevant source code for rendering the
example surface using the indirect draw command
glDrawElementsIndirect. A GL_TRIANGLE_STRIP with
primitive restart is used. In addition to the vertex and index buffers, another
buffer object (GL_DRAW_INDIRECT_BUFFER) is reserved on the GPU
and filled with the parameter values for the drawing command. The
parameter values for indexed drawing are expected in the buffer in the
following order:

count: Number of indices that refer to vertices and are necessary for drawing the object.
instanceCount: Number of object instances to be drawn.
firstIndex: First index into the index buffer. This allows multiple index sequences to be kept in the same buffer.
baseVertex: Value of the base vertex; see Sect. 3.3.4.
baseInstance: Value to access other vertex data for individual object instances (for details, see, for example, [4]).

Buffer creation works as with the other buffers via the glGenBuffers command (not included in the source code parts shown).
The glBindBuffer command binds this buffer to the target
GL_DRAW_INDIRECT_BUFFER, defining it as a buffer containing the
parameter values for an indirect draw command. As with other buffers,
glBufferData transfers the data into the buffer on the GPU. With the
drawing command glDrawElementsIndirect, only the geometric
primitive to be drawn, the type of the index buffer and the index (offset in
bytes) at which the drawing parameters are to be found in this buffer need
to be specified (see Fig. 3.33).

Fig. 3.33 Parts of the source code of the display method of a JOGL renderer (Java) for drawing
the example surface with glDrawElementsIndirect from a sequence of connected triangles
(GL_TRIANGLE_STRIP) using primitive restart and an indirect draw buffer
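In outline, filling the parameter buffer and issuing the draw call can look like this (indexCount and indirectBufferName are assumed variables, gl is assumed to be of type GL4, and the exact listing of the project may differ):

// init(): parameter record {count, instanceCount, firstIndex, baseVertex, baseInstance}
int[] indirectParams = { indexCount, 1, 0, 0, 0 };
IntBuffer indirectData = Buffers.newDirectIntBuffer(indirectParams);
gl.glBindBuffer(GL4.GL_DRAW_INDIRECT_BUFFER, indirectBufferName);
gl.glBufferData(GL4.GL_DRAW_INDIRECT_BUFFER, (long) indirectParams.length * 4,
                indirectData, GL.GL_STATIC_DRAW);

// display(): geometric primitive, index type and byte offset of the parameter record
gl.glDrawElementsIndirect(GL.GL_TRIANGLE_STRIP, GL.GL_UNSIGNED_INT, 0);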

The following extensions of these indirect drawing commands allow drawing with several successive parameter sets located in the indirect buffer:
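In the core profile, these are glMultiDrawArraysIndirect and glMultiDrawElementsIndirect, which additionally receive the number of parameter records (drawcount) and the spacing in bytes (stride) between the records in the buffer.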

These commands are very powerful, and the number of usable objects and triangles is in fact only limited by the available memory. Thus, millions of
objects can be autonomously drawn on the GPU by a single call from the
graphics application running on the CPU.

3.3.6 More Drawing Commands and Example Project


In the OpenGL there are further variants and combinations of the drawing
commands described in the previous sections. However, this presentation
would go beyond the scope of this book. Therefore, for details on the
commands described here and for further drawing commands, please refer
to the OpenGL Super Bible [4, p. 249ff], the OpenGL Programming Guide
[1, p. 124ff] and the OpenGL specification [3].
The supplementary material to the online version of this chapter
includes the complete project JoglSimpleMeshPP, which can be used to
reproduce the examples of the drawing commands explained in the previous
sections.

3.4 Exercises
Exercise 3.1 Which three main types of basic geometric objects in
computer graphics do you know? Explain their properties. What are the
properties of a convex polygon?

Exercise 3.2 Which three geometric primitives for drawing triangles are
provided in the OpenGL? Explain their differences.

Exercise 3.3 Given the vertex position coordinates $$(-0.5, 1.0, -0.5)$$,
$$(0.5, 1.0, -1.0)$$, $$(-0.5, 0, -0.5)$$ and $$(0.5, 0, -1.0)$$ of a
square area in space. The last two position coordinates, together with the
additional position coordinates $$(-0.5, -1.0, -0.5)$$ and
$$(0.5, -1.0, -1.0)$$, span another square area that is directly adjacent to
the first area.
(a) Draw the entire area using a glBegin/glEnd command sequence
using the geometric primitives for drawing triangles. Alternatively,
use the GL_TRIANGLES, GL_TRIANGLE_STRIP and
GL_TRIANGLE_FAN primitives so that the same area always results.
How many commands must be specified within each
glBegin/glEnd block?
(b)
Draw the entire area with a single glDrawArrays command and a
vertex buffer object. Alternatively, use the GL_TRIANGLES,
GL_TRIANGLE_STRIP and GL_TRIANGLE_FAN primitives so
that the same area always results. How many drawing steps must be
specified in the command glDrawArrays (third argument) for each
alternative?

Exercise 3.4 Given the two-dimensional vertex position coordinates $$(-0.5, 0.5)$$, $$(0.5, 0.5)$$, $$(-0.5, -0.5)$$ and $$(0.5, -0.5)$$ of a
square area.
(a)
Draw this area using triangles and the glDrawArrays command (in
three-dimensional space). Make sure that the faces of the triangles are
aligned in a common direction.
(b)
Using the glCullFace command, alternately suppress the back
faces (backface culling), the front faces (frontface culling) and the
front and back faces of the polygons. Observe the effects on the
rendered surface.
(c)
Deactivate the suppression of the drawing of front or back faces. Use the
glPolygonMode command to change the rendering of the front and
back faces of the triangles so that only the vertices are displayed as
points. Vary the size of the rendered points and observe the effects.
(d)
Deactivate the suppression of the drawing of front and back faces. Use
the glPolygonMode command to change the rendering of the front
and back faces of the triangles so that only the edges of the triangles
are displayed as lines. Vary the line width and observe the effects.
(e) Deactivate the suppression of the drawing of front and back faces. Use
the glPolygonMode command to change the rendering of the front
and back faces of the triangles so that the front faces are displayed
differently from the back faces.

Exercise 3.5 In this exercise, a circle with centre (0, 0, 0) and radius 0.5 is
to be approximated by eight triangles.
(a)
Draw the circular area using a single glDrawArrays command and
the geometric primitive GL_TRIANGLE_FAN. Make sure that the
front sides of the triangles are oriented in a common direction. How
many drawing steps must be specified in the glDrawArrays
command (third argument)?
(b)
Draw the circular area using a single glDrawArrays command and
the geometric primitive GL_TRIANGLE_STRIP. Make sure that the
front sides of the triangles are oriented in a common direction.
Degenerated triangles must be used for this purpose. How many
drawing steps must be specified in the glDrawArrays command
(third argument)?
(c)
Draw the circular area using a single glDrawElements command
using an index buffer object and the geometric primitive
GL_TRIANGLE_STRIP. Make sure that the front sides of the
triangles are oriented in a common direction. Degenerated triangles
must be used for this purpose. How many drawing steps must be
specified in the glDrawElements command (third argument)?
(d)
Draw the circular area with a single glDrawElements command
and the use of primitive restart. The geometric primitive
GL_TRIANGLE_STRIP should still be used. Make sure that the front
sides of the triangles are oriented in a common direction. How many
drawing steps must be specified in the glDrawElements command
(third argument)?
(e)
Extend your program so that the circular area is approximated with an
(almost) arbitrary number res of triangles. Vary the value of the
variable res and observe the quality of the approximation.

Exercise 3.6 Drawing instances


(a) Draw a rectangle using a glDrawArrays command and the
geometric primitive GL_TRIANGLE_STRIP.
(b)
Use the glDrawArraysInstanced command to draw multiple
instances of the same rectangle. In order for these instances not to be
drawn on top of each other but to become visible, the vertex shader
must be modified. Use the predefined variable gl_InstanceID
in the vertex shader to slightly shift the position of each instance of
the rectangle relative to the previous instances.
(c)
Change the vertex shader so that, depending on the instance number
gl_InstanceID, the geometry of the rectangle changes
(slightly) in addition to the position.

References
1. J. Kessenich, G. Sellers and D. Shreiner. OpenGL Programming Guide. 9th edition. Boston:
Addison-Wesley, 2017.

2. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6
(Compatibility Profile) - October 22, 2019). Retrieved 8 February 2021. The Khronos Group Inc, 2019.
URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.compatibility.pdf.

3. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Core
Profile) - October 22, 2019). Retrieved 8 February 2021. The Khronos Group Inc, 2019. URL: https://
www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf.

4. G. Sellers, S. Wright and N. Haemel. OpenGL SuperBible. 7th edition. New York: Addison-
Wesley, 2016.

Footnotes
1 The figure shows the output of a particular implementation of the OpenGL. Since the OpenGL
specification only guarantees the correct rendering of convex polygons, the output of another
implementation (using a different GPU) may look different.
© Springer Nature Switzerland AG 2023
K. Lehn et al., Introduction to Computer Graphics, Undergraduate Topics in Computer Science
https://doi.org/10.1007/978-3-031-28135-8_4

4. Modelling Three-Dimensional Objects


Karsten Lehn1 , Merijam Gotzes2 and Frank Klawonn3
(1) Faculty of Information Technology, Fachhochschule Dortmund, University of
Applied Sciences and Arts, Dortmund, Germany
(2) Hamm-Lippstadt University of Applied Sciences, Hamm, Germany
(3) Data Analysis and Pattern Recognition Laboratory, Ostfalia University of Applied
Sciences, Braunschweig, Germany

Karsten Lehn (Corresponding author)


Email: [email protected]

Merijam Gotzes
Email: [email protected]

Frank Klawonn
Email: [email protected]

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1007/
978-3-031-28135-8_4.

This chapter contains an overview of the basic approaches for modelling three-
dimensional objects. Since modelling of surfaces of objects is of great importance in
computer graphics, special attention is given to this. Often the (curved) surfaces of objects
are approximated by planar polygons (see Chap. 3). Triangles are particularly well suited
for this. In modern graphics processors, the tessellation unit can decompose a curved
surface into planar polygons independently of the central processing unit. Freeform
surfaces are well suited for modelling curved surfaces of three-dimensional objects and
can be used as a starting point for this decomposition. Therefore, this chapter presents the
basics of freeform surface modelling. Special attention is paid to the normal vectors of the
surfaces, as these are crucial for the calculation of illumination effects of the surfaces of
objects.

4.1 From the Real World to the Model


Before anything can be rendered on the screen, a three-dimensional virtual world must
first be created in the computer. In most cases, this virtual world contains more than just
the objects of the small section of the world that is ultimately to be displayed. For
example, an entire city or a park landscape could be modelled in the computer, of which
the viewer sees only a small part at a time.
The first thing to do is to provide techniques for modelling three-dimensional objects.
To describe a three-dimensional object, its geometric shape must be defined, but also the
properties of its surface. This includes how it is coloured and whether it is more matt or
glossy.
There are two possible approaches for the geometric modelling of objects. In many
applications, the objects do not represent real objects that already exist in reality. This is
true for fantasy worlds often seen in computer games, as well as prototypes of vehicles or
planned buildings that have not yet been built and may never be built. In these cases, the
developer must have suitable methods for modelling three-dimensional objects. Even if
real objects are to be represented, modelling might be necessary. In the case of existing
buildings or furniture, the essential dimensions are known, but they are far from
sufficient for an approximately realistic representation of the geometric shapes, especially
if, for example, rounded edges are present.
In other cases, very detailed measurement data about the geometric structure of objects
is available. 3D scanners allow an extremely precise measurement of surfaces. However,
these raw data are not suitable to be used directly for geometric modelling of the measured
objects. They are usually converted automatically, with possible manual correction, into
simpler surface models. The same applies to techniques for measuring inner geometric
structures. Such techniques allow, for example, the study of steel beams in bridges.
Another very important and rapidly developing field of application is medicine. X-ray,
ultrasound or tomography techniques provide information about different skeletal and
tissue structures, so that three-dimensional models of bones and organs can be computed.
The first step in computer graphics is thus to model a virtual world in the computer,
either manually by a developer or automatically derived from measurement data. To
represent a concrete section of this world, the viewer’s position and direction of view must
be defined. This includes the position of the viewer in the virtual world, in which direction
he is looking, how large his viewing angle (field of view) is and how far he can see. In this
way, a three-dimensional clipping area is defined so that only the objects in this area need
to be considered for rendering.
However, the viewer will not yet be able to see anything, as there is no light in the
virtual world yet. Therefore, information about the illumination of the virtual world must
be available. Only then can the exact representation of the surfaces of the objects be
calculated, i.e., how intensively they are illuminated with which light and where shadows
are located (see Chap. 9). Another problem to be solved is to determine which objects are
actually visible in the clipping area and which objects or parts of objects are hidden by
others (see Chap. 8). In addition, there are possible special effects such as fog, smoke or
reflections (see Chap. 11).
In the following sections of this chapter, important aspects of modelling three-
dimensional objects are explained.
4.2 Three-Dimensional Objects and Their Surfaces
In computer graphics, all objects are typically understood as three-dimensional objects in
the three-dimensional space $${\text {I}\!\text {R}}^3$$. If points and lines are to be
represented, then these must also be modelled as three-dimensional objects. This
corresponds to the observation of the real world. For example, in a naive modelling
approach, a sheet of paper could be considered a two-dimensional surface. In fact, the
sheet is a three-dimensional object, albeit with a very small height.

Fig. 4.1 Isolated and dangling edges and faces (to be avoided)

Figure 4.1 shows examples of how three-dimensional objects should not look. Isolated
or dangling edges and surfaces, as seen in the illustration, should be avoided.
The representation of an object in computer graphics is usually determined by its
surface and not by the set of points it consists of. Exceptions to this can be transparent
objects. In computer graphics, therefore, it is mostly surfaces and not sets of points in
$${\text {I}\!\text {R}}^3$$ that are modelled. In applications where objects are
measured with 3D scanners or tomography data, no explicit description of the surface of
the objects is provided. In these cases, the object is therefore often first described as a
three-dimensional set of points (sometimes called a point cloud) and then, if necessary, the
surface of the object is derived from this set.
There are various sophisticated techniques for modelling complex surfaces with round
and curved shapes (see Sects. 4.5 and 4.6). However, for the representation of a scene in
computer graphics, these curved surfaces are usually approximated by (many) planar
polygons—usually triangles—to simplify the computation of lighting effects. For arbitrary
surfaces, it might even be impossible to find an analytical expression for the representation
of the projection. Efficient and fast calculations of projections would become impossible.
However, the situation is much easier for polygons. The intersection of a straight line
representing the direction of projection with a flat surface can be determined easily and
quickly. The approximation of curved surfaces by polygons is called tessellation. The fact
that only triangles are often used for the planar polygons is not a real limitation. Any
polygon can be triangulated, that is, divided into triangles. Figure 4.2 shows a
triangulation of a polygon using dashed lines. The advantage of triangles is that efficient
algorithms are available for the calculations to be performed in computer graphics, most of
which are implemented directly on the GPU. A disadvantage of polygons with more than
three edges is that it must be assured that all vertices lie in the same plane.

Fig. 4.2 Example of a triangulation of a polygon

The individual triangles or polygons used to model a surface are usually oriented so
that the side of the triangle on the surface of the object faces outwards. The orientation is
given by choosing the order of the vertices of the polygon so that they are traversed in a
counter-clockwise direction when the surface is viewed from the outside. Section 3.2.4
describes the polygon orientation in the OpenGL.

Fig. 4.3 According to the convention typically used in computer graphics, the front of the left polygon is oriented
towards the viewer. The front of the right polygon is oriented away from the viewer

In Fig. 4.3, this means that the triangle with the vertex order 0, 1, 2 points to the
viewer, i.e., the viewer looks at the corresponding surface from the front. In contrast, the
same triangle with the reverse vertex order 0, 2, 1 would remain invisible to the viewer
from this perspective, since he is looking at the surface from the back. The surface would
be invisible since it is impossible to see the surface of a solid three-dimensional object
from the inside. When polygons have an orientation, rendering can be accelerated
considerably, as surfaces that are not pointing at the viewer can be ignored for the entire
rendering process. This can be achieved by backface culling .
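In a JOGL renderer, backface culling is typically activated with a few calls. The following minimal sketch is not taken from the book's example projects; it assumes a GL3 object gl as provided by the rendering callbacks.

import com.jogamp.opengl.GL;
import com.jogamp.opengl.GL3;

// Hedged sketch: activates backface culling so that polygons facing away from the viewer are discarded.
class CullingSetup {
    static void enableBackfaceCulling(GL3 gl) {
        gl.glEnable(GL.GL_CULL_FACE);   // switch face culling on
        gl.glCullFace(GL.GL_BACK);      // discard back faces; GL.GL_FRONT would discard front faces instead
        gl.glFrontFace(GL.GL_CCW);      // counter-clockwise vertex order marks the front face (the OpenGL default)
    }
}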
Isolated and dangling faces should be avoided since they can lead to unrealistic effects.
For example, they can be seen from one side, but become invisible from the other if
backface culling is activated.

Fig. 4.4 A tetrahedron with its vertices

Figure 4.4 shows a tetrahedron with the vertices


$$\boldsymbol{P}_0, \boldsymbol{P}_1, \boldsymbol{P}_2, \boldsymbol{P}_3$$. The four
triangular faces can be defined by the following groups of vertices:
$$\boldsymbol{P}_0, \boldsymbol{P}_3, \boldsymbol{P}_1$$
$$\boldsymbol{P}_0, \boldsymbol{P}_2, \boldsymbol{P}_3$$
$$\boldsymbol{P}_0, \boldsymbol{P}_1, \boldsymbol{P}_2$$
$$\boldsymbol{P}_1, \boldsymbol{P}_3, \boldsymbol{P}_2$$
In the specification of the triangles, the order of the vertices was chosen in such a way
that the vertices are listed in a counter-clockwise direction when looking at the
corresponding outer surface of the tetrahedron.
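As an illustration, the four faces can be written down directly as vertex and index arrays for indexed drawing. The coordinates below are an assumption chosen only for this sketch (they are not taken from the figure); the index order corresponds exactly to the four vertex groups listed above and is counter-clockwise when each face is viewed from outside for these coordinates.

// Hypothetical coordinates for P0, P1, P2, P3 (an assumption for this sketch).
float[] vertices = {
     0.0f,  1.0f,  0.0f,   // P0 (apex)
    -1.0f, -1.0f,  1.0f,   // P1
     1.0f, -1.0f,  1.0f,   // P2
     0.0f, -1.0f, -1.0f    // P3
};

// One triangle per line, vertices in counter-clockwise order as seen from outside.
int[] indices = {
    0, 3, 1,   // P0, P3, P1
    0, 2, 3,   // P0, P2, P3
    0, 1, 2,   // P0, P1, P2
    1, 3, 2    // P1, P3, P2
};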

4.3 Modelling Techniques


A very simple technique for modelling three-dimensional objects is offered by voxels. The
three-dimensional space is divided into a grid of small, equally sized cubes (voxels). An
object is defined by the set of voxels that lie inside of the object. Voxels are suitable, for
example, for modelling objects in medical engineering. For example, tomography data can
be used to obtain information about tissue densities inside the measured body or object. If,
for example, bones are to be explicitly modelled and represented, voxels can be assigned
to those areas where measurement points are located that indicate the corresponding bone
tissue density.
Figure 4.5 illustrates the representation of a three-dimensional object based on voxels.
The storage and computational effort for voxels is very large compared to pixels. The
division of the three-dimensional space into voxels is in a way analogous to the division of
a two-dimensional plane into pixels, if the pixels are interpreted as small squares. The
storage and computational effort increases exponentially with an additional dimension. At
a resolution of 4,000 by 2,000 pixels in two dimensions, eight million pixels need to be
managed and computed, which is roughly the resolution of today’s computer monitors.
Extending this resolution by 2,000 pixels into the third dimension requires
$$4000\times 2000\times 2000$$, or sixteen billion voxels.

Fig. 4.5 Example of modelling a three-dimensional object with voxels

An efficient form of voxel modelling is provided by octrees, which work with voxels
of different sizes. The three-dimensional object to be represented is first enclosed in a
sufficiently large cube. This cube is divided into eight equally sized smaller subcubes.
Subcubes that are completely inside or completely outside the object are marked with in or
off respectively and are not further refined. All other subcubes, that is, those that intersect
the surface of the object, are marked as on and broken down again into eight equally sized
subcubes. The same procedure is followed with these smaller subcubes. It is determined
which smaller subcubes are in, off or on. This subdivision is continued until no new
subcubes with the mark on result or until a maximum resolution is reached, i.e., until the
increasingly smaller subcubes have fallen below a minimum size.
To illustrate the concept of octrees, consider their two-dimensional counterpart,
the quadtrees. Instead of approximating a three-dimensional object with cubes, an area is
approximated with squares.

Fig. 4.6 Recursive decomposition of an area into squares

Figure 4.6 shows an area enclosed in a square that has been recursively divided into
smaller and smaller squares. A smaller square is only divided further if it intersects the
boundary of the area. The process ends when the squares have reached a predefined
minimum size.

Fig. 4.7 The quadtree for the area from Fig. 4.6

Figure 4.7 shows the corresponding quadtree. Octrees are similar to quadtrees, but
their inner nodes have eight child nodes instead of four, because the cubes are divided into
eight subcubes.
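The recursive classification into in, off and on cells can be sketched in Java for the two-dimensional quadtree case; the octree case works analogously with eight subcubes. The membership test contains and the sampling-based classification are simplifying assumptions made only for this illustration.

import java.util.function.BiPredicate;

// Minimal quadtree sketch: classifies axis-aligned squares as IN, OFF or ON.
class QuadTreeNode {
    enum Mark { IN, OFF, ON }

    final double x, y, size;     // lower-left corner and edge length of the square
    Mark mark;
    QuadTreeNode[] children;     // four children, only created for ON squares

    QuadTreeNode(double x, double y, double size,
                 BiPredicate<Double, Double> contains, double minSize) {
        this.x = x; this.y = y; this.size = size;
        mark = classify(contains);
        if (mark == Mark.ON && size > minSize) {
            double h = size / 2;  // subdivide into four equally sized squares
            children = new QuadTreeNode[] {
                new QuadTreeNode(x,     y,     h, contains, minSize),
                new QuadTreeNode(x + h, y,     h, contains, minSize),
                new QuadTreeNode(x,     y + h, h, contains, minSize),
                new QuadTreeNode(x + h, y + h, h, contains, minSize)
            };
        }
    }

    // Samples the square; if all samples agree, the square is IN or OFF, otherwise ON.
    // Sampling is a simplification; an exact test would use the geometry of the area.
    private Mark classify(BiPredicate<Double, Double> contains) {
        int n = 8, inside = 0, total = 0;
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= n; j++, total++)
                if (contains.test(x + size * i / n, y + size * j / n)) inside++;
        if (inside == 0)     return Mark.OFF;
        if (inside == total) return Mark.IN;
        return Mark.ON;
    }
}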
The voxel model and the octrees offer a way to approximate objects captured with
certain measurement techniques. The realistic representation of interactive scenes is
already possible with these models (see for example [1]), but not yet developed to the
point where widespread use in computer graphics would make sense.
For illumination effects such as reflections on the object surface, the inclination of the
surface plays an essential role. However, the cubes used in the voxel model and the octrees
do not have inclined surfaces, since surfaces of the cubes always point in the direction of
the coordinate axes. If the voxels are small enough, the gradient between the voxels could
be used as a normal vector. It is useful to subsequently approximate the surfaces of objects
modelled with voxels or octrees by freeform surfaces, which are presented in Sect. 4.6.
The voxel model and octrees are more tailored to creating models based on measured
data. Other techniques are used as direct design and modelling tools. Among these
techniques is the CSG scheme, where CSG stands for constructive solid geometry. The
CSG scheme is based on a collection of basic geometric objects from which new objects
can be created using transformations and regularised set-theoretic operations.

Fig. 4.8 An object created by the CSG scheme with the basic geometric objects and set-theoretic operations shown on
the right

Figure 4.8 shows on the left an object composed of the basic objects cuboid and
cylinder. The right part of the figure specifies how the corresponding basic geometric
objects were combined with set-theoretic operations to obtain the object on the left. The
necessary transformations are not indicated. For example, the middle section of the object
shown is formed from the difference between the cuboid and the cylinder, so that a
semicircular bulge is created.
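The set-theoretic part of the CSG scheme can be illustrated with simple point membership tests. The interface, the basic objects and the test points below are assumptions for this sketch and not the book's implementation; the regularisation of the set operations is also omitted here.

// Minimal CSG sketch: solids as point membership tests combined by set operations.
interface Solid {
    boolean contains(double x, double y, double z);

    default Solid union(Solid other)        { return (x, y, z) -> contains(x, y, z) || other.contains(x, y, z); }
    default Solid intersection(Solid other) { return (x, y, z) -> contains(x, y, z) && other.contains(x, y, z); }
    default Solid difference(Solid other)   { return (x, y, z) -> contains(x, y, z) && !other.contains(x, y, z); }
}

class CsgExample {
    public static void main(String[] args) {
        // Axis-aligned cuboid [0,2] x [0,1] x [0,1].
        Solid cuboid = (x, y, z) -> x >= 0 && x <= 2 && y >= 0 && y <= 1 && z >= 0 && z <= 1;
        // Cylinder of radius 0.4 around the axis x = 1, y = 1, extending in the z-direction.
        Solid cylinder = (x, y, z) -> (x - 1) * (x - 1) + (y - 1) * (y - 1) <= 0.4 * 0.4;
        // Difference cuboid minus cylinder: a cuboid with a semicircular groove at the top.
        Solid grooved = cuboid.difference(cylinder);
        System.out.println(grooved.contains(1.0, 0.95, 0.5)); // false: point lies inside the groove
        System.out.println(grooved.contains(0.2, 0.5, 0.5));  // true: solid material
    }
}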
Another modelling technique is the sweep representation. This technique creates three-
dimensional objects from surfaces that are moved along a path.

Fig. 4.9 Two objects created using sweep representation

For example, the horseshoe shape on the left in Fig. 4.9 is created by shifting a
rectangle along an arc that lies in depth. Correspondingly, the tent shape on the right in the
figure results from shifting a triangle along a line in the (negative) z-direction, i.e., into the
image plane.
It is known from Fourier analysis that any signal can be uniquely represented by a
superposition of sine and cosine oscillations of different frequencies, which in this context
are called basis functions. Since curves and surfaces from computer graphics can be
understood as such signals, it is in principle always possible to represent curves and
surfaces by functions. However, an exact representation may require an infinite number of
these basis functions (see Sect. 7.6). For some shapes or freeform surfaces, simple
functional equations exist that are relatively easy to represent analytically. For
considerations on the representation of surfaces by function equations in two variables, see
Sect. 4.5.
Probably the most important modelling technique is based on freeform surfaces
computed by parametric curves. A description of the analytical representation of freeform
surfaces and curves by polynomials, splines, B-splines and NURBS and their properties
can be found in Sect. 4.6.
For representing objects in computer graphics, as described earlier, the surfaces of
objects are often approximated with planar polygons (usually triangles). Describing the
surface of an object with polygons requires a list of points, a list of planar polygons
composed of these points, colour or texture information, and possibly the specification of
normal vectors of the surface. The normal vectors are needed to calculate light reflections.
When approximating a curved surface with planar polygons, not only the position
coordinates of the polygons to be formed are stored in the corresponding vertices, but
usually also the normal vectors of the (original) curved surface.

Fig. 4.10 Tessellation of the helicopter scene from Fig. 5.15

Fig. 4.11 Representation of a sphere with different tessellations

The basic geometric shapes (cone, cylinder, cuboid and sphere) used for the
representation of the helicopter scene in Fig. 4.10 are also mostly tessellated for
rendering and thus approximated by triangles. The larger the number of triangles
chosen to approximate a curved surface, the more accurately the surface can be
approximated. Figure 4.11 shows a sphere for which the tessellation was refined from left
to right. The computational effort increases with the growing number of triangles. This
does not only apply to the determination of the triangles, which can be done before
rendering the scene. But also the calculations for light reflections on the surface, the
determination of which objects or polygons are hidden by others, and the collision
detection, i.e., whether moving objects collide, become more complex as the number of
triangles increases. As a rule, the computational effort increases quadratically with the
level of detail, since, for example, a doubling of the level of detail for the representation
(of a surface) results in a doubling in two dimensions and thus a quadrupling of the
number of triangles.
In some cases, an object in a scene is therefore stored in the scene graph in different
levels of detail. If, for example, a forest is viewed in the distance, it is sufficient to
approximate the trees only by a rough polygon model. The approximation of each
individual tree with thousands of triangles would increase the computational effort
enormously. When rendering the scene in this form, a single triangle in the rendered image
might not even fill a pixel. However, when the viewer enters the forest, the surrounding
trees must be rendered in much more detail, possibly even down to individual leaf
structures if the viewer is standing directly in front of a tree branch. This technique is
called level of detail (LOD) (see Sect. 4.5.1).
4.4 Modelling the Surface of a Cube in the OpenGL
Based on the basic OpenGL geometric objects introduced in Sect. 3.2 and the OpenGL
drawing commands from Sect. 3.3, below are some considerations for modelling the
surface of a cube. These considerations can be extended to other polyhedra.

Fig. 4.12 A cube with vertex labels at the corners

Figure 4.12 shows a cube in which the corners are marked as vertices
$$v_i (i = 0, 1, \dots , 7)$$. Indexed drawing using the drawing command
glDrawElements (see Sect. 3.3.1) will be used, as this method has efficiency
advantages for storing objects in memory. This also makes the considerations of this
section easier to understand. The eight vertices can be stored in a vertex buffer object.
Much of the content of this section is transferable to drawing without an index buffer
using glDrawArrays.
In the core profile, only triangles are available for drawing the surfaces, so that one
face of the cube must be divided into two triangles. This results in twelve triangles. If the
geometric primitive GL_TRIANGLES is used to draw individual triangles, then the index
buffer must contain a total of $$6 \cdot 2 \cdot 3 = 36$$ indices. For the front and
right faces, part of this buffer could look like this:
$$\begin{aligned} 0, 1, 2, 1, 3, 2, 2, 3, 6, 3, 7, 6. \end{aligned}$$
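In Java, this part of the index buffer can be written directly as an array (a sketch; the remaining faces are omitted here):

// Indices of the twelve triangles; only the front and right faces are shown.
int[] indices = {
    0, 1, 2,   1, 3, 2,   // front face: two triangles
    2, 3, 6,   3, 7, 6    // right face: two triangles
    // ... 24 further indices for the remaining four faces
};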
Since the cube faces consist of adjacent triangles, each with a common edge, the
geometric primitive GL_TRIANGLE_STRIP can be used to draw one face. This reduces
the number of indices per side to four and for the cube to a total of
$$6 \cdot 4 = 24$$ indices. However, the drawing command must be called again for
rendering each face.
After a brief consideration, it can be seen that adjacent faces, each consisting of two
triangles, can be represented with only one sequence of connected triangles
(GL_TRIANGLE_STRIP). However, at least two triangle sequences become necessary
with this approach. For example, the front, right and back faces can be combined into one
sequence and the top, left and the bottom faces into another sequence. Each
GL_TRIANGLE_STRIP consists of eight indices, requiring a total of 16 indices.
However, this makes it necessary to call the drawing command glDrawElements
twice, which is undesirable. This can be remedied by restarting the second triangle
sequence through the primitive restart mechanism (see Sect. 3.3.3). The special start index
required for this is needed once, which increases the number of indices in the index buffer
to 17.
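A minimal JOGL sketch of how this mechanism is typically switched on; it is not taken from the book's example project, gl is assumed to be a GL3 object, and the restart index is a freely chosen value that must not occur as a regular vertex index.

import com.jogamp.opengl.GL3;

// Hedged sketch: switches on primitive restart for subsequent indexed drawing calls.
class PrimitiveRestartSetup {
    // 0x8F9D is the standard enum value of GL_PRIMITIVE_RESTART (desktop OpenGL 3.1 and later);
    // which JOGL interface exposes the named constant may vary, so the raw value is used here.
    static final int GL_PRIMITIVE_RESTART = 0x8F9D;

    static void enablePrimitiveRestart(GL3 gl, int restartIndex) {
        gl.glEnable(GL_PRIMITIVE_RESTART);         // activate the mechanism
        gl.glPrimitiveRestartIndex(restartIndex);  // index value that starts a new primitive, e.g. 0xFFFF
    }
}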
If one face of the cube is no longer regarded as a unit consisting of two triangles, then
there is a further possibility for optimisation. Figure 4.13 shows the unrolled cube with
vertex labels at the corners. In contrast to the previous considerations, the triangles divide
the cube faces in a certain irregular way, which is irrelevant for the rendering result as
long as the surface polygons are filled. The large numbers within the triangles in the figure
indicate the order in which the triangles are drawn (render order). It can be seen from
these numbers that after drawing the front face the lower triangle of the right face is
drawn, then the bottom, left and top faces are drawn. Afterwards the upper part of the right
face is drawn and finally the back face is drawn. In this way, only one
GL_TRIANGLE_STRIP is required. Figure 4.13 also shows the order of the indices
$$i_j$$ that refer to the respective vertices in the index buffer. It can be seen that
this clever modelling approach only needs 14 indices.

Fig. 4.13 Mesh of a cube with vertices $$v_i$$ : The render order for a triangle sequence (GL_TRIANGLE_STRIP)
is indicated by large numbers. The order of the indices $$i_j$$ determines the render order

Fig. 4.14 Part of the source code of the init method of a JOGL renderer (Java) for drawing the cube with eight
vertices with glDrawElements from a sequence of connected triangles (GL_TRIANGLE_STRIP)

Fig. 4.15 Part of the source code of the display method of a JOGL renderer (Java) for drawing the cube from eight
vertices with glDrawElements from a sequence of connected triangles (GL_TRIANGLE_STRIP)

Fig. 4.16 Cube drawn with an OpenGL renderer

Figure 4.14 shows the definition of the index buffer in the Java source code.
Figure 4.15 shows the corresponding call of the JOGL drawing command. Figure 4.16
shows the cube drawn by the JOGL renderer, which uses the source code parts from
Figs. 4.14 and 4.15. The right side of the figure shows the drawn triangles that are not
visible with filled polygons, as shown on the left side.
The cube in Fig. 4.16 is very efficient to store in memory and to draw because only
one vertex is used per cube corner. This modelling is useful for rendering of wireframe
models and monochrome objects that are not illuminated. In realistic scenes, objects with
certain material properties are illuminated by different types of light sources, which causes
reflections and other special effects. Even for an object where all surfaces are the same
colour without illumination or are made of the same homogeneous material, illumination
effects change the rendered colours of (and even within) the surfaces to display on the
screen.
As explained in Chap. 9, in the standard illumination model of computer graphics the
colour values of the illuminated surfaces are influenced by the normal vectors of the
surfaces of the objects. Normal vectors are displacement vectors that are perpendicular to
the surface in question and are stored in the OpenGL—like in other computer graphics
libraries—exclusively in the vertices. For the cube in Fig. 4.12, the normal vectors are
$$\boldsymbol{n}_l$$ for the left face, $$\boldsymbol{n}_f$$ for the front face,
$$\boldsymbol{n}_u$$ for the top face, $$\boldsymbol{n}_r$$ for the right face,
$$\boldsymbol{n}_b$$ for the back face and $$\boldsymbol{n}_d$$ for the bottom face,
as follows:
$$\begin{aligned} \boldsymbol{n}_l = \left( \begin{array}{c} -1 \\ 0 \\ 0 \end{array}
\right) ; \boldsymbol{n}_f = \left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) ;
\boldsymbol{n}_u = \left( \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right) ;
\boldsymbol{n}_r = \left( \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right) ;
\boldsymbol{n}_b = \left( \begin{array}{c} 0 \\ 0 \\ -1 \end{array} \right) ;
\boldsymbol{n}_d = \left( \begin{array}{c} 0 \\ -1 \\ 0 \end{array} \right) .
\end{aligned}$$
Similar considerations apply to textures that can be applied to surfaces of objects. In the
OpenGL, texture coordinates are also stored exclusively in the vertices.
For the following explanations in this section, it is sufficient to consider a cube with
differently coloured but homogeneous faces. This means that the detailed illumination
calculation can be dispensed with for these modelling considerations. If the correct normal
vectors are later added to the vertices of such a cube, then a realistic illumination of this
object is possible. Due to the illumination calculation and the respective processing in the
graphics pipeline, the colour values also vary within the illuminated faces. This can also
be neglected for these modelling considerations at this point.
For the cube, this simplification means that each cube face can be coloured differently.
In the model used so far in this section, three faces share one vertex containing the colour
information. For example, vertex $$v_0$$ determines the colour at the corner of the
front, the left and the top faces of the cube. But these faces must now be able to have
different colours. This means that three vertices are needed for each corner point instead of
just one, so that three different colours can be displayed and rendered for each adjacent
face.

Fig. 4.17 Mesh of a cube with vertex labels at each corner of each face

Figure 4.17 shows a mesh of an unrolled cube with three vertices per corner. This
gives each corner of a cube face its own vertex, so that different faces can be assigned
separate colours. As a result, these individual faces must be drawn separately.
Similar to the considerations for the cube with eight vertices (see above), a total of
twelve triangles must be drawn to represent the cube with differently coloured faces.
Using the geometric primitive GL_TRIANGLES for indexed drawing, the index buffer
contains a total of $$6 \cdot 2 \cdot 3 = 36$$ indices. For the front face (vertices
$$v_0$$ to $$v_3$$) and the right face, this buffer could look like this
$$\begin{aligned} 0, 1, 2, 1, 3, 2, 12, 13, 14, 13, 15, 14. \end{aligned}$$
For optimisation, each face of the cube can be rendered by four vertices using a triangle
sequence (GL_TRIANGLE_STRIP). This reduces the number of indices per side to four
and for the cube to a total of $$6 \cdot 4 = 24$$ indices. Since six triangle sequences
must be drawn, the drawing command must be called six times.
Fig. 4.18 Mesh of a cube with vertices $$v_i$$ : The render order for a triangle sequence (GL_TRIANGLE_STRIP)
is identical to the render order in Fig. 4.13. The order of the indices $$i_j$$ determines the render order

Using the render order of the triangles from Fig. 4.13, a single GL_TRIANGLE_STRIP
can also be derived from this cube mesh. Figure 4.18 shows the division of the
faces into triangles and the order of the indices. In this case, 26 indices are to be processed
in a single call to the drawing command glDrawElements. Since the faces consist of
different vertices, degenerate triangles must be drawn at the transition to an adjacent side.
For example, the first four indices draw the front face with the vertex order
$$v_0, v_1, v_2, v_3$$. After that, the index $$i_4$$ creates a triangle
degenerated to a line consisting of the vertices $$v_2, v_3, v_{12}$$. Subsequently,
the index $$i_5$$ creates another triangle degenerated to a line from the vertices
$$v_3, v_{12}, v_{13}$$. Only with the following index $$i_6$$ a (non-
degenerated) triangle is rendered again. In total, twelve triangles degenerated to a line
must be generated with this approach. The degenerated triangles are not visible in this
example, but they generate a certain computational effort for drawing, which is not
significant for a single cube. However, if a large number of cubes or cuboids are drawn,
this effort can be considerable.
To avoid drawing the degenerate triangles, an index buffer can be created using the
primitive restart mechanism (see Sect. 3.3.3). This mechanism renders six triangle
sequences (GL_TRIANGLE_STRIP) when the drawing command is called. Since each
face requires four indices and five triangle sequences must be restarted (the first sequence
is started by calling the drawing command), $$6 \cdot 4 + 5 = 29$$ indices are
required.
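Assuming, purely for illustration, that the 24 vertices of the mesh in Fig. 4.17 are numbered with four consecutive vertices per face (the front and right faces as given above, placeholder numbering for the remaining faces), such an index buffer could look like the following sketch; the actual numbering in the example project may differ.

// Assumed restart index; it must match the value passed to glPrimitiveRestartIndex
// and must not occur as a regular vertex index.
final int RESTART = 0xFFFF;

// 6 faces x 4 indices + 5 restart markers = 29 indices.
int[] indices = {
     0,  1,  2,  3,  RESTART,   // front face (v0..v3, as given above)
    12, 13, 14, 15,  RESTART,   // right face (v12..v15, as given above)
     4,  5,  6,  7,  RESTART,   // remaining faces with placeholder numbering
     8,  9, 10, 11,  RESTART,
    16, 17, 18, 19,  RESTART,
    20, 21, 22, 23
};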

Fig. 4.19 Part of the source code of the init method of a JOGL renderer (Java) for drawing the cube by 24 vertices
with glDrawElements, from a sequence of connected triangles (GL_TRIANGLE_STRIP) with the primitive restart
mechanism

Fig. 4.20 Part of the source code of the display method of a JOGL renderer (Java) for drawing the cube by 24
vertices with glDrawElements, from sequences of connected triangles (GL_TRIANGLE_STRIP) with the primitive
restart mechanism

Fig. 4.21 Different views of a cube rendered by an OpenGL renderer

Figures 4.19 and 4.20 show the relevant parts of the source code and the corresponding
index buffer to enable drawing with the primitive restart mechanism. Figure 4.21 shows
the modelled cube from different viewer positions, rendered with the primitive restart
mechanism explained above.
As noted at the beginning of this section, almost all of the explanations about
modelling for indexed drawing (with glDrawElements) of this cube can be transferred
to drawing without an index buffer (with the drawing command glDrawArrays). For
this, the sequences of the indices in the index buffer can be used to build the
corresponding vertex buffer. Only the primitive restart mechanism is not applicable to this
type of drawing, as it is only intended for drawing with index buffers.
Furthermore, these considerations can also be applied to the geometric primitives,
which are only available in the compatibility profile. One cube face can be represented by
a quadrilateral (GL_QUADS), which requires four vertices. Such a quadrilateral can be
drawn by two connected triangles (GL_TRIANGLE_STRIP), which also requires four
vertices. Therefore, to draw the cube from separate cube faces, $$6 \cdot 4 = 24$$
vertices are needed.
The cube with eight vertices can also be composed of (at least) two sequences of connected
quadrilaterals (GL_QUAD_STRIP). This primitive is also only available in the
compatibility profile. For example, the first quadrilateral sequence consists of the left,
front and the right faces. The top, back and bottom faces form the second sequence. This
makes it possible to reduce the number of required vertices to
$$2 \cdot (4 + 2 + 2) = 16$$. However, the drawing of two quadrilateral sequences
must be initiated. Using degenerate quadrilaterals to realise the drawing sequence as in
Fig. 4.13 does not provide any efficiency advantage.

4.5 Surfaces as Functions in Two Variables


One principal way of modelling surfaces in three-dimensional space is to model the
surfaces by an implicit functional equation
$$\begin{aligned} F(x,y,z) \; = \; 0 \end{aligned}$$ (4.1)
or by an explicit functional equation of the form
$$\begin{aligned} z \; = \; f(x,y). \end{aligned}$$ (4.2)
The explicit form can always be transformed into an implicit form. However, choosing a
simple implicit or explicit equation that yields the desired shape as a surface is usually
extremely difficult. Nevertheless, this possibility is considered in the following, since the
explicit form is suitable, for example, for the representation of landscapes (see
Sect. 4.5.1).
The representation of a surface by an implicit equation has the advantage that more
functions can be described than by an explicit form. For the calculation of the surface
itself, the solutions of Eq. (4.1) must be determined, which in most cases is only
possible numerically and not analytically, since an analytical solution, for example
for the z-coordinate, does not always exist. If a representation of the desired
surface can be found as an explicit functional equation, then the solution is directly available.
Numerical evaluation can be applied, for example, in computer graphics pipelines that
use ray tracing techniques (see Sect. 9.9). Here, however, problems of convergence or
numerical stability may arise when calculating the solution of the functional equation.
Using the explicit form, the surfaces can be tessellated (see Sects. 4.2 and 4.3). No
convergence or numerical stability problems arise, but the level of detail of the tessellation
must be chosen appropriately. If the level of detail is too low, the function will be poorly
approximated and the rendering quality of the surface may be poor too. Too high a level of
detail leads to a better approximation, but may result in an unacceptable computational
effort. To find a suitable level of detail depending on the situation, for example, the level of
detail (LOD) method can be used (see Sect. 4.5.1).

Fig. 4.22 The surface generated by the function $$z = x\sin (7x)\cos (4y)$$

The remainder of this section contains considerations of the representation of surfaces


using explicit functional equations in two variables. Figure 4.22 shows an example of the
following function:
$$ z \; = \; f(x,y) \; = \; x\sin (7\,x)\cos (4y) \qquad \quad (-1 \le x,y \le 1). $$
A solid closed shape cannot be modelled by a single function. If necessary, however, such
a closed shape can be composed of several individual functions.
The surface defined by function (4.2) must also be approximated by triangles for
rendering. The simplest approach is to divide the desired area in the xy-plane by a
rectangular grid. For the grid points $$(x_i,y_j)$$ the function values
$$z_{ij} = f(x_i,y_j)$$ are calculated. The points $$(x_i,y_j,z_{ij})$$ are then
used to define the triangles to approximate the surface. Two triangles are defined over
each rectangle of the grid in the xy-plane. Above the rectangle defined by the two points
$$(x_i,y_j)$$ and $$(x_{i+1},y_{j+1})$$ in the xy-plane, the triangles
$$(x_i,y_j,f(x_i,y_j))$$, $$(x_{i+1},y_j,f(x_{i+1},y_j))$$,
$$(x_i,y_{j+1},f(x_i,y_{j+1}))$$ and
$$(x_{i+1},y_j,f(x_{i+1},y_j))$$, $$(x_{i+1},y_{j+1},f(x_{i+1},y_{j+1}))$$,
$$(x_i,y_{j+1},f(x_i,y_{j+1}))$$
are defined. Figure 4.23 illustrates this procedure.
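A compact Java sketch of this uniform grid tessellation; the functional interface for f and the flat list of triangle vertices are assumptions chosen only for this illustration.

import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleBinaryOperator;

// Approximates z = f(x,y) over [xMin,xMax] x [yMin,yMax] by 2*n*n triangles.
class GridTessellation {
    // Returns a flat list of triangle vertices, three (x, y, z) points per triangle.
    static List<double[]> tessellate(DoubleBinaryOperator f,
                                     double xMin, double xMax,
                                     double yMin, double yMax, int n) {
        List<double[]> triangles = new ArrayList<>();
        double dx = (xMax - xMin) / n, dy = (yMax - yMin) / n;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double x0 = xMin + i * dx, x1 = x0 + dx;
                double y0 = yMin + j * dy, y1 = y0 + dy;
                // two triangles above the grid rectangle [x0,x1] x [y0,y1]
                triangles.add(point(f, x0, y0));
                triangles.add(point(f, x1, y0));
                triangles.add(point(f, x0, y1));
                triangles.add(point(f, x1, y0));
                triangles.add(point(f, x1, y1));
                triangles.add(point(f, x0, y1));
            }
        }
        return triangles;
    }

    private static double[] point(DoubleBinaryOperator f, double x, double y) {
        return new double[] { x, y, f.applyAsDouble(x, y) };
    }
}

For example, tessellate((x, y) -> x * Math.sin(7 * x) * Math.cos(4 * y), -1, 1, -1, 1, 50) produces a triangle approximation of the surface shown in Fig. 4.22.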

Fig. 4.23 Approximation of a surface defined by a function by triangles

How well the function or the surface is approximated by the triangles depends on the
curvature of the surface and the resolution of the grid. A very high-resolution grid leads to
a better approximation quality, but results in a very large number of triangles and thus in a
high computational effort during rendering. It is therefore recommended to use a technique
similar to quadtrees already presented in Sect. 4.3.
If the function is defined over a rectangular area, it is initially approximated by only
two triangles resulting from the subdivision of the rectangle. If the maximum error of the
approximation by the triangles (in the simplest case, the absolute error in the z-direction
can be determined) is sufficiently small, i.e., if it falls below a given value
$$\varepsilon >0$$, the rectangle is not divided further into smaller rectangles.
Otherwise, the rectangle is divided into four smaller rectangles of equal size. Each of these
smaller rectangles is then treated in the same way as the original rectangle. If the
approximating error of the function on a smaller rectangle by two triangles is sufficiently
small, the corresponding triangles on the smaller rectangle are used to approximate the
function, otherwise the smaller rectangle is further divided recursively into four smaller
rectangles. In addition, a maximum recursion depth or a minimum size of the rectangles
should be specified so that the division algorithm terminates. The latter criterion, limiting
the number of steps, guarantees that the splitting will terminate even if the function is
discontinuous or has a pole at the boundary of the region on which it is defined.
For arbitrary functions, the calculation of the maximum error on a given
(sub-)rectangle is not necessarily analytically possible. It is therefore advisable to use a
sufficiently fine grid—for example, the grid that would result from the smallest
permissible rectangles—and to consider the error only on the grid points within the
respective (sub-)rectangle.
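The recursive subdivision described above can be sketched as follows. As suggested, the error is measured only at sample points inside the current rectangle; the names, the sampling density and the termination by a minimum edge length are assumptions made for this illustration.

import java.util.List;
import java.util.function.DoubleBinaryOperator;

// Sketch of the recursive subdivision: rectangles are split until two triangles
// approximate f sufficiently well or a minimum edge length is reached.
class AdaptiveSubdivision {

    static void subdivide(DoubleBinaryOperator f,
                          double x0, double y0, double x1, double y1,
                          double eps, double minSize, List<double[]> triangles) {
        if (maxError(f, x0, y0, x1, y1) < eps || (x1 - x0) <= minSize) {
            emit(f, x0, y0, x1, y1, triangles);   // error small enough or minimum size reached
            return;
        }
        double xm = (x0 + x1) / 2, ym = (y0 + y1) / 2;  // split into four equal rectangles
        subdivide(f, x0, y0, xm, ym, eps, minSize, triangles);
        subdivide(f, xm, y0, x1, ym, eps, minSize, triangles);
        subdivide(f, x0, ym, xm, y1, eps, minSize, triangles);
        subdivide(f, xm, ym, x1, y1, eps, minSize, triangles);
    }

    // Emits the two triangles over the rectangle, as in the uniform grid case.
    private static void emit(DoubleBinaryOperator f, double x0, double y0,
                             double x1, double y1, List<double[]> triangles) {
        triangles.add(new double[] { x0, y0, f.applyAsDouble(x0, y0) });
        triangles.add(new double[] { x1, y0, f.applyAsDouble(x1, y0) });
        triangles.add(new double[] { x0, y1, f.applyAsDouble(x0, y1) });
        triangles.add(new double[] { x1, y0, f.applyAsDouble(x1, y0) });
        triangles.add(new double[] { x1, y1, f.applyAsDouble(x1, y1) });
        triangles.add(new double[] { x0, y1, f.applyAsDouble(x0, y1) });
    }

    // Absolute error in the z-direction between f and the two triangles,
    // measured on a coarse grid of sample points inside the rectangle.
    private static double maxError(DoubleBinaryOperator f, double x0, double y0,
                                   double x1, double y1) {
        double z00 = f.applyAsDouble(x0, y0), z10 = f.applyAsDouble(x1, y0);
        double z01 = f.applyAsDouble(x0, y1), z11 = f.applyAsDouble(x1, y1);
        int n = 8;                 // assumed sampling density per rectangle
        double max = 0;
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= n; j++) {
                double u = (double) i / n, v = (double) j / n;
                double planar = (u + v <= 1)
                        ? z00 + u * (z10 - z00) + v * (z01 - z00)                // lower-left triangle
                        : z11 + (1 - u) * (z01 - z11) + (1 - v) * (z10 - z11);   // upper-right triangle
                double x = x0 + u * (x1 - x0), y = y0 + v * (y1 - y0);
                max = Math.max(max, Math.abs(f.applyAsDouble(x, y) - planar));
            }
        }
        return max;
    }
}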

4.5.1 Representation of Landscapes


Most of the objects with curved surfaces are usually not described by arbitrary functions
of the form $$z = f(x,y)$$, but are composed of low-degree polynomials or rational
functions, which make modelling more intuitive and easier to understand than with
arbitrary functions. This approach to modelling is explained in more detail in Sect. 4.6.
However, the technique of representing functions of the form $$z = f(x,y)$$, as
introduced here, can be used for modelling landscapes, for example in flight simulators.
Artificial landscapes with hills, bumps and similar features can be generated relatively
easily by suitable functions. Further, real landscapes can be rendered using the above
technique for function approximation based on triangles. This section only considers the
pure geometry or topology of the landscape and not the concrete appearance in detail, such
as grassy areas, tarmac roads, stone walls or the like. This would be a matter of textures
that are discussed in Chap. 10. For real landscapes, altitude information is usually
available over a sufficiently fine grid or lattice. These elevations indicate for each point of
the grid how high the point is in relation to a certain reference plane, for instance in
relation to sea level. This means, for example, that in a file the geographical height is
stored for each intersection point of a grid that is laid over the landscape. Instead of
evaluating the function $$z = f(x,y)$$ at the corresponding grid points
$$(x_i,y_j)$$, the value $$z_{ij} = f(x_i,y_j)$$ is replaced by the altitude
value $$h_{ij}$$ at the corresponding point of the landscape to be modelled.
For a larger landscape, a uniform division into triangles or a division depending on the
landscape structure would result in a large number of triangles having to be considered in
the rendering process. On the one hand, a high resolution of the landscape is not necessary
for parts of the landscape that are far away from the viewer. On the other hand, for the
parts of the landscape that are close to the viewer, a higher resolution is necessary for a
realistic rendering. It is therefore advantageous if the number of triangles used for
rendering depends on the distance to the viewer. Very close parts of the landscape are
divided very finely and the further away the parts of the landscape are from the viewer, the
coarser the division into triangles is made (see [3, Chap. 2]). Figure 4.24 illustrates this
principle of clipmaps. The altitude information is not shown in the figure, only the
resolution of the grid and the triangles.

Fig. 4.24 Level of Detail (LOD) partitioning of a landscape using clipmaps

Such techniques, where the number of triangles to represent a surface depends on the
distance of the viewer, are used in computer graphics not only for landscapes but also for
other objects or surfaces that consist of many triangles. If the viewer is close to the object,
a high resolution with a high number of triangles is used to represent the surface. At
greater distances, a coarse resolution with only a few triangles is usually sufficient. This
technique is also called level of detail (LOD) (see Sect. 10.1.1).
The methods described in this section refer to a purely geometric or topological
modelling of landscapes. For achieving realistic rendering results, additional textures have
to be applied to the landscape to reproduce the different ground conditions such as grass,
sand or tarmac. Explanations on textures can be found in Chap. 10.

4.6 Parametric Curves and Freeform Surfaces


For the representation of a scene, the surfaces of the individual objects are approximated
by triangles, but this representation is not suitable for modelling objects. Freeform
surfaces are much better suited for this purpose. They are the three-dimensional
counterpart of curves in the plane as described in Sect. 3.1. Like these curves, a freeform
surface is defined by a finite set of points which it approximates. Storing geometric
objects in memory based on freeform surfaces allows working with different resolutions
when rendering these objects. The number of triangles used to approximate the surface can
be varied depending on the desired accuracy of the representation. In addition, the normal
vectors used for the illumination effects are not calculated from the triangles, but directly
from the original curved surface, which creates more realistic effects.
The modelling of curved surfaces is based on parametric curves. When a surface is
scanned parallel to one of the coordinate axes, a curve in three-dimensional space is
obtained.

Fig. 4.25 Two curves that result when the curved surface is scanned along a coordinate axis

Figure 4.25 illustrates this fact. The fundamentals of parametric curves in three-
dimensional space are therefore essential for understanding curved surfaces, which is why
the modelling of curves is presented in the following section.

4.6.1 Parametric Curves


If a curve in space or in the plane is to be specified by a series of points—so-called control
points—the following properties are desirable to allow easy modelling and adjusting of the
curve.

Controllability: The influence of the parameters on the curve is intuitively


understandable. If a curve is to be changed, it must be easy for the user to see which
parameters he should change and how.
Locality principle: It must be possible to make local changes to the curve. For example,
if one control point of the curve is changed, it should only affect the vicinity of the control
point and not change the curve completely.
Smoothness: The curve should satisfy certain smoothness properties. This means that
not only the curve itself should be continuous, that is, have no gaps or jumps, but also its
first derivative, so that the curve has no bends. The latter means that the curve must be
continuously differentiable at least once. In some cases, it may additionally be required
that higher derivatives exist. Furthermore, the curve should be of some limited variation.
This means that it not only passes close to the control points, but also does not move
arbitrarily far between the control points.

When the curve passes exactly through the control points, this is called interpolation,
while an approximation only requires that the curve approximate the points as closely as
possible. Through $$(n+1)$$ control points, an interpolation polynomial of degree n
or less can always be found that passes exactly through the control points. Nevertheless,
interpolation polynomials are not suitable for modelling in computer graphics. Besides the
problem that with a large number of control points the degree of the polynomial and thus
the computational effort becomes very large, interpolation polynomials do not satisfy the
locality principle. If a control point is changed, this usually affects all coefficients of the
polynomial and thus the entire curve. Clipping for such polynomials is also not
straightforward, since a polynomial interpolating a given set of control points can deviate
arbitrarily from the region around the control points. Therefore, it is not sufficient to
consider only the control points for clipping of such interpolation polynomials. The curve
must be rendered to check whether it passes through the clipping region. In addition, high-
degree interpolation polynomials tend to oscillate. This means that they sometimes
fluctuate strongly between the control points.

Fig. 4.26 An interpolation polynomial of degree five defined by the control points (0, 0), (1, 0), (2, 0), (3, 0), (4, 1) and
(5, 0)

Figure 4.26 shows an interpolation polynomial of degree five defined by six control
points, all but one of which lie on the x-axis. The polynomial oscillates around the control
points and has a clear overshoot above the highest control point. It does not remain within
the convex hull of the control points, which is represented by the red triangle.
The undesirable properties of interpolation polynomials can be avoided by dropping
the strict requirement that the polynomial must pass through all the control points. Instead,
it is sufficient to approximate only some of the control points. This leads to Bernstein
polynomials of degree n, which have better properties. The ith Bernstein polynomial of
degree n ( $$i \in \{0,\ldots ,n\}$$) is given by the following equation.
$$ B_i^{(n)} (t) \; = \; {n \atopwithdelims ()i} \cdot (1-t)^{n-i} \cdot t^i \qquad \quad
(t\in [0,1]) $$
The Bernstein polynomials satisfy two important properties.
$$ B_i^{(n)}(t) \in [0,1] \qquad \quad \text{ for } \text{ all } t\in [0,1] $$
That is, the evaluation of the Bernstein polynomials within the range of the unit interval
[0, 1] only yields values between zero and one. This and the second property below are
needed to construct curves that stay within the convex hull of their control points.
$$\begin{aligned} \sum _{i=0}^n B_i^{(n)}(t) \; = \; 1 \qquad \quad \text{ for } \text{
all } t \in [0,1] \end{aligned}$$
At each point in the unit interval, the Bernstein polynomials add up to one.
Bézier curves use Bernstein polynomials of degree n to approximate $$(n+1)$$
control points
$$\boldsymbol{b}_0, \ldots , \boldsymbol{b}_n \in {\text {I}\!\text {R}}^p$$. For
computer graphics, only the cases of the plane with $$p=2$$ and the three-
dimensional space with $$p=3$$ are relevant. The control points are also
called Bézier points. The curve defined by these points
$$\begin{aligned} \boldsymbol{x}(t) \; = \; \sum _{i=0}^n \boldsymbol{b}_i \cdot
(4.3)
B_i^{(n)}(t) \qquad \quad (t \in [0,1]) \end{aligned}$$
is called Bézier curve of degree n.
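A direct Java evaluation of Eq. (4.3) using the Bernstein polynomials can serve as an illustration (a sketch; in practice, the numerically more stable de Casteljau algorithm is often preferred for evaluating Bézier curves).

// Evaluates a Bézier curve x(t) = sum_i b_i * B_i^n(t) for control points in R^3.
class BezierCurve {
    // controlPoints[i] holds (x, y, z) of the Bézier point b_i; t must lie in [0, 1].
    static double[] evaluate(double[][] controlPoints, double t) {
        int n = controlPoints.length - 1;
        double[] result = new double[3];
        for (int i = 0; i <= n; i++) {
            double b = bernstein(n, i, t);
            for (int d = 0; d < 3; d++) {
                result[d] += b * controlPoints[i][d];
            }
        }
        return result;
    }

    // i-th Bernstein polynomial of degree n: C(n,i) * (1-t)^(n-i) * t^i.
    static double bernstein(int n, int i, double t) {
        return binomial(n, i) * Math.pow(1 - t, n - i) * Math.pow(t, i);
    }

    // Binomial coefficient C(n,i) computed iteratively.
    static double binomial(int n, int i) {
        double c = 1;
        for (int k = 1; k <= i; k++) {
            c = c * (n - i + k) / k;
        }
        return c;
    }
}

Evaluating at t = 0 and t = 1 reproduces the control points $$\boldsymbol{b}_0$$ and $$\boldsymbol{b}_n$$, in line with the interpolation property described next.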
The Bézier curve interpolates the start and end points, that is,
$$\boldsymbol{x}(0) = \boldsymbol{b}_0$$ and
$$\boldsymbol{x}(1) = \boldsymbol{b}_n$$ hold. The other control points are
generally not on the curve. The tangent vectors to the Bézier curve at the initial and final
points can be calculated as follows.
$$\begin{aligned} \boldsymbol{\dot{x}}(0)= & {} n \cdot (\boldsymbol{b}_1 -
\boldsymbol{b}_0) \\ \boldsymbol{\dot{x}}(1)= & {} n \cdot (\boldsymbol{b}_n -
\boldsymbol{b}_{n-1}) \end{aligned}$$
This means that the tangent at the starting point $$\boldsymbol{b}_0$$ points in the
direction of the point $$\boldsymbol{b}_1$$. The tangent at the end point
$$\boldsymbol{b}_n$$ points in the direction of the point
$$\boldsymbol{b}_{n-1}$$. This principle is also the basis of the definition of cubic
curves, as shown as an example in Fig. 3.2.
For every fixed t, the properties of the Bernstein polynomials imply that
$$\boldsymbol{x}(t)$$ in Eq. (4.3) is a convex combination of the control points
$$\boldsymbol{b}_0, \ldots , \boldsymbol{b}_n$$, since the weights
$$B_i^{(n)}(t)$$ are non-negative and add up to one for every t. Thus, the Bézier curve stays
within the convex hull of its control points (cf. [2, Sect. 4.3]).
If an affine transformation is applied to all control points, the Bézier curve of the
transformed points matches the transformation of the original Bézier curve. Bézier curves
are thus invariant under affine transformations such as rotation, translation or scaling.
Bézier curves are also symmetric in the control points, that is, the control points
$$\boldsymbol{b}_0, \ldots , \boldsymbol{b}_n$$ and
$$\boldsymbol{b}_n, \ldots , \boldsymbol{b}_0$$ lead to the same curve. The
curve is only passed through in the opposite direction.
If a convex combination of two sets of control points is used to define a new set of
control points, the convex combination of the corresponding Bézier curves again results in
a Bézier curve. This can be expressed mathematically as follows.
If the control points
$$\tilde{\boldsymbol{b}}_0, \ldots , \tilde{\boldsymbol{b}}_n$$ define the Bézier
curve $$\tilde{\boldsymbol{x}}(t)$$ and
the control points $$\hat{\boldsymbol{b}}_0, \ldots , \hat{\boldsymbol{b}}_n$$
define the Bézier curve $$\hat{\boldsymbol{x}}(t)$$,
then the control points
$$\alpha \, \tilde{\boldsymbol{b}}_0 + \beta \, \hat{\boldsymbol{b}}_0 ,
\ldots , \alpha \, \tilde{\boldsymbol{b}}_n + \beta \, \hat{\boldsymbol{b}}_n$$ define
the Bézier curve
$${\boldsymbol{x}}(t) = \alpha \, \tilde{\boldsymbol{x}}(t) + \beta \,
\hat{\boldsymbol{x}}(t)$$, if
$$\alpha +\beta =1$$ and $$\alpha ,\beta \ge 0$$ hold.
If all control points lie on a line or a parabola, then the Bézier curve also results in the
corresponding line or parabola. Bézier curves also preserve certain shape properties such
as monotonicity or convexity of the control points.
Despite the many favourable properties of Bézier curves, they are unsuitable for larger
numbers of control points, as this would lead to a too high polynomial degree. For
$$(n+1)$$ control points, the Bézier curve is usually a polynomial of degree n.
Therefore, instead of Bézier curves, B-splines are more commonly used to define a curve
for a given set of control points. B-splines are composed of several Bézier curves of lower
degree, usually of degree three or four. For this purpose, a Bézier curve is calculated for n
control points (for instance $$n = 4$$), and the last control point of the previous
Bézier curve forms the first control point of the following Bézier curve. In this way, B-
splines interpolate the control points at which the individual Bézier curves are joined. These
connecting points are called knots. The other control points are called inner Bézier points.

Fig. 4.27 B-Spline with knots $$P_1,P_4, P_7$$ and inner Bézier points $$P_2, P_3, P_5, P_6$$
Figure 4.27 shows a B-spline composed of two Bézier curves of degree three.
To avoid sharp bends at the connection of the Bézier curves, which is equivalent to the
non-differentiability of the curve, the respective knot and its two neighbouring inner
Bézier points must be collinear. This method for avoiding sharp bends is illustrated in
Fig. 3.3. By choosing the inner Bézier points properly, a B-spline of degree n can be
differentiated $$(n - 1)$$ times. Cubic B-splines are based on polynomials of degree
three and can therefore be differentiated twice when the inner Bézier points are chosen
correctly. In addition to the requirement of collinearity, another constraint must apply to
the neighbouring inner Bézier points.

Fig. 4.28 Representation of the condition for the inner Bézier points for a twice continuously differentiable cubic B-
spline

The B-spline shown in Fig. 4.28 is composed of two Bézier curves of degree three and
is defined by the knots $$P_1, P_4, P_7$$ and the inner Bézier points
$$P_2, P_3, P_5, P_6$$. In order to ensure twice differentiability, the segments of
the tangents must be in the same ratio to each other as indicated in the figure.
B-splines inherit the positive properties of Bézier curves. They stay within the convex
hull of the control points, are invariant under affine transformations, symmetric in the
control points, interpolate start and end points of the control points and satisfy the locality
principle.
A B-spline is composed piecewise of Bézier curves. These can be expressed in
homogeneous coordinates in the following form.
$$ \left( \begin{array}{c} P_x(t)\\ P_y(t)\\ P_z(t)\\ 1 \end{array} \right) $$
Here $$P_x(t), P_y(t), P_z(t)$$ are polynomials in t. Applying a perspective projection
in the form of the matrix from Eq. (5.11) to this representation yields
$$ \left( \begin{array}{cccc} z_0 &{} 0 &{} 0 &{} 0\\ 0 &{} z_0
&{} 0 &{} 0\\ 0 &{} 0 &{} z_0 &{} 0\\ 0 &{} 0 &{}
-1 &{} 0\\ \end{array} \right) \cdot \left( \begin{array}{c} P_x(t)\\ P_y(t)\\ P_z(t)\\ 1
\end{array} \right) \; = \; \left( \begin{array}{c} P_x(t) \cdot z_0\\ \ P_y(t) \cdot z_0\\ \
P_z(t) \cdot z_0\\ \ - P_z(t) \end{array} \right) . $$
In Cartesian coordinates, the projection of a Bézier curve thus results in a parametric curve
that has rational functions in the individual coordinates:
$$ \left( \begin{array}{c} - \frac{P_x(t)}{P_z(t)} \cdot {z_0}\\[2mm] - \frac{P_y(t)}
{P_z(t)} \cdot {z_0}\\ - z_0 \end{array} \right) . $$
Since the perspective projection of B-splines or Bézier curves results in rational
functions anyway, one can just as well work with rational functions already when modelling in
three-dimensional space. The perspective projection of a rational function is again a
rational function. Instead of B-splines, it is very common to use the more
general NURBS (non-uniform rational B-splines). NURBS are generalisations of B-splines
based on extensions of Bézier curves to rational functions in the following form.
$$ \boldsymbol{x}(t) \; = \; \frac{\sum _{i=0}^n w_i \cdot \boldsymbol{b}_i \cdot
B_i^{(n)}(t)}{\sum _{i=0}^n w_i \cdot B_i^{(n)}(t)} $$
The freely selectable weights $$w_i$$ are called form parameters. A larger weight
$$w_i$$ increases the influence of the control point $$\boldsymbol{b}_i$$ on
the curve. In the sense of this interpretation and in order to avoid singularities, it is usually
required that the weights $$w_i$$ are positive.
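As an illustration, the following Java sketch evaluates a point on such a rational curve directly from this formula; the class and method names, the array layout (one row of coordinates per control point) and the console output are illustrative assumptions and not part of any particular API.

// Sketch: evaluate a rational Bézier curve point x(t) for control points b[i]
// (rows of coordinates) and positive weights w[i], directly from the formula above.
class RationalBezier {

    // Bernstein polynomial B_i^(n)(t) = (n over i) * t^i * (1-t)^(n-i)
    static double bernstein(int n, int i, double t) {
        return binomial(n, i) * Math.pow(t, i) * Math.pow(1.0 - t, n - i);
    }

    static double binomial(int n, int i) {
        double result = 1.0;
        for (int k = 1; k <= i; k++) {
            result *= (n - i + k) / (double) k;
        }
        return result;
    }

    static double[] evaluate(double[][] b, double[] w, double t) {
        int n = b.length - 1;
        int dim = b[0].length;
        double[] numerator = new double[dim];
        double denominator = 0.0;
        for (int i = 0; i <= n; i++) {
            double factor = w[i] * bernstein(n, i, t);
            denominator += factor;
            for (int d = 0; d < dim; d++) {
                numerator[d] += factor * b[i][d];
            }
        }
        for (int d = 0; d < dim; d++) {
            numerator[d] /= denominator;   // divide by the weighted sum of Bernstein polynomials
        }
        return numerator;
    }

    public static void main(String[] args) {
        double[][] b = { {0, 0}, {1, 2}, {3, 2}, {4, 0} };   // cubic example curve
        double[] w = { 1, 2, 2, 1 };                         // larger weights pull the curve towards the inner points
        System.out.println(java.util.Arrays.toString(evaluate(b, w, 0.5)));
    }
}

In practice, evaluation schemes such as the de Casteljau algorithm are numerically more robust than this direct use of powers of t; the sketch is only meant to mirror the formula.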

4.6.2 Efficient Computation of Polynomials


In order to draw a parametric curve, polynomials must be evaluated. This also applies to
freeform surfaces. In most cases, polynomials of degree three are used for this purpose. In
this section, an efficient evaluation scheme for polynomials is presented, which is based
on a similar principle as the incremental calculations introduced in the context of the
Bresenham algorithm in Sect. 7.3.2. Although floating point arithmetic cannot be avoided
for polynomials in this way, it is at least possible to reduce the repeated calculations to
additions only.
To draw a cubic curve, the parametric curve is usually evaluated at equidistant values
of the parameter t. The corresponding points are computed and connected by line
segments. The same applies to freeform surfaces, which are also modelled by parametric
curves or surfaces in the form of polynomials. In order to evaluate a polynomial f(t) at the
points $$t_0$$, $$t_1= t_0 + \delta $$, $$t_2 = t_0 + 2 \delta , \ldots $$
with the step size $$\delta >0$$, a scheme of forward differences is used. For this
purpose, the polynomial f(t) has to be evaluated once by explicitly calculating the initial
value $$ f_0 = f(t_0)$$ at the point $$t_0$$ and then the changes
$$ \Delta f(t) \; = \; f(t+\delta ) - f(t) $$
are added in an incremental way as
$$ f(t+\delta ) \; = \; f(t) + \Delta f(t) $$
or
$$ f_{n+1} \; = \; f_n + \Delta f_n. $$
For a polynomial $$f(t) = at^3+bt^2+ct+d$$ of degree three this leads to
$$ \Delta f(t) \; = \; 3at^2\delta + t(3a\delta ^2+2b\delta ) + a \delta ^3 + b\delta ^2 +
c\delta . $$
In this way, the evaluation of a polynomial of degree three is reduced to the addition of
$$\Delta $$-values, which in turn require the evaluation of a polynomial of degree two. For
this polynomial of degree two, forward differences can also be used:
$$\begin{aligned} \Delta ^2f(t)= & {} \Delta (\Delta f(t)) \; = \; \Delta f(t+ \delta ) -
\Delta f(t)\\[2mm]= & {} 6a\delta ^2 t + 6a\delta ^3 + 2b\delta ^2. \end{aligned}$$
The $$\Delta $$-values for the original polynomial of degree three are thus given by
the following formula.
$$ \Delta f_n \; = \; \Delta f_{n-1} + \Delta ^2 f_{n-1} $$
For the calculation of the $$\Delta ^2$$-values, multiplications still have to be carried
out. Applying the scheme of forward differences one last time, we get
$$ \Delta ^3 f(t) \; = \; \Delta ^2 f(t+\delta ) - \Delta ^2 f(t) \; = \; 6a\delta ^3. $$
Thus, multiplications are only required for the calculation of the initial value at
$$t_0 = 0$$:
$$\begin{aligned} f_0= & {} d \\ \Delta f_0= & {} a \delta ^3 + b\delta ^2 +
c\delta \\ \Delta ^2 f_0= & {} 6a \delta ^3 + 2b\delta ^2 \\ \Delta ^3 f_0= & {}
6a \delta ^3. \end{aligned}$$
All other values can be determined by additions alone. Table 4.1 illustrates this principle
of the difference scheme. Table 4.2 contains the evaluation of the difference scheme for
the example polynomial $$f(t) = t^3+2t+3$$, that is $$a=1$$, $$b=0$$,
$$c=2$$, $$d=3$$, with step size $$\delta =1$$.
Table 4.1 Difference scheme for the efficient evaluation of a polynomial of degree three. The columns correspond to the parameter values $$t_0, t_0+\delta , t_0+2\delta , t_0+3\delta , \ldots $$; each entry results from adding to its left neighbour the entry below that left neighbour:

$$f$$: $$f_0$$, $$f_1 = f_0 + \Delta f_0$$, $$f_2 = f_1 + \Delta f_1$$, $$f_3 = f_2 + \Delta f_2$$, $$\ldots $$
$$\Delta f$$: $$\Delta f_0$$, $$\Delta f_1 = \Delta f_0 + \Delta ^2 f_0$$, $$\Delta f_2 = \Delta f_1 + \Delta ^2 f_1$$, $$\ldots $$
$$\Delta ^2 f$$: $$\Delta ^2 f_0$$, $$\Delta ^2 f_1 = \Delta ^2 f_0 + \Delta ^3 f_0$$, $$\Delta ^2 f_2 = \Delta ^2 f_1 + \Delta ^3 f_0$$, $$\ldots $$
$$\Delta ^3 f$$: $$\Delta ^3 f_0$$, $$\Delta ^3 f_0$$, $$\Delta ^3 f_0$$, $$\ldots $$ (constant)
Table 4.2 Difference scheme for the polynomial $$f(t) = t^3+2t+3$$ with step size $$\delta =1$$. The columns correspond to $$t = 0, 1, 2, 3, 4, \ldots $$:

$$f$$: 3, 6, 15, 36, 75, $$\ldots $$
$$\Delta f$$: 3, 9, 21, 39, 63, $$\ldots $$
$$\Delta ^2 f$$: 6, 12, 18, 24, 30, $$\ldots $$
$$\Delta ^3 f$$: 6, 6, 6, 6, 6, $$\ldots $$
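The following Java sketch implements this scheme for the example polynomial of Table 4.2; the variable names and the console output are chosen for illustration only, and the initial values are the ones derived above for $$t_0 = 0$$.

// Sketch: evaluate f(t) = a*t^3 + b*t^2 + c*t + d at t0, t0+delta, t0+2*delta, ...
// with forward differences; after the initialisation only additions are needed.
class ForwardDifferences {
    public static void main(String[] args) {
        double a = 1, b = 0, c = 2, d = 3;   // example polynomial t^3 + 2t + 3
        double delta = 1.0;
        double t0 = 0.0;

        // Initial values (the only place where multiplications occur); valid for t0 = 0.
        double f   = d;
        double df  = a * delta * delta * delta + b * delta * delta + c * delta;
        double d2f = 6 * a * delta * delta * delta + 2 * b * delta * delta;
        double d3f = 6 * a * delta * delta * delta;

        for (int step = 0; step <= 4; step++) {
            System.out.println("f(" + (t0 + step * delta) + ") = " + f);
            // Incremental update: three additions per evaluated point.
            f   += df;
            df  += d2f;
            d2f += d3f;
        }
    }
}

Running this sketch reproduces the first row of Table 4.2, i.e., the values 3, 6, 15, 36 and 75.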

4.6.3 Freeform Surfaces


As explained at the beginning of Sect. 4.6, freeform surfaces are closely related to
parametric curves. For the representation of curves, one parameter t is needed, whereas for
the representation of surfaces, two parameters are required.

Fig. 4.29 Example of a parametric freeform surface

If one of these two parameters is held fixed, the result is a curve on the surface, as
shown in Fig. 4.29. Bézier surfaces are composed of Bézier curves in parameters s and t.
$$ \boldsymbol{x}(s,t) \; = \; \sum _{i=0}^n \sum _{j=0}^m \boldsymbol{b}_{ij} \cdot
B_i^{(n)}(s) \cdot B_j^{(m)}(t) \qquad \quad (s,t \in [0,1]) $$
Usually, Bézier curves of degree three are used, so that $$m = n = 3$$ is chosen. To
define such a Bézier surface, $$(m+1)\cdot (n+1)$$ Bézier points
$$\boldsymbol{b}_{ij}$$, i.e., 16 in the case of cubic Bézier curves, must be
specified.

Fig. 4.30 A network of Bézier points defining a Bézier surface

Figure 4.30 illustrates how a network of Bézier points defines a Bézier surface.
Bézier surfaces have similar favourable properties as Bézier curves. The four vertices
$$\boldsymbol{b}_{00}, \boldsymbol{b}_{0m}, \boldsymbol{b}_{n0}, \boldsymbol{b}_{nm}$$ lie on the
surface. This is usually not the case for the other control points. The surface remains
within the convex hull of the control points. The curves with constant value
$$s = s_0$$ are Bézier curves with respect to the control points
$$ \boldsymbol{b}_j \; = \; \sum _{i=0}^n \boldsymbol{b}_{ij} \cdot B_i^{(n)}(s_0).
$$
The same applies to the curves with the constant parameter $$t = t_0$$.
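A point on such a tensor product Bézier surface can be evaluated by summing over the control net with products of Bernstein polynomials, as in the following sketch; it reuses the Bernstein helper from the earlier curve sketch (assumed to be accessible) and all names are illustrative.

// Sketch: evaluate a point x(s,t) on a Bézier surface from its control net b[i][j],
// where each entry holds the x-, y- and z-coordinate of a control point.
class BezierSurfaceSketch {
    static double[] surfacePoint(double[][][] b, double s, double t) {
        int n = b.length - 1;      // polynomial degree in the s-direction
        int m = b[0].length - 1;   // polynomial degree in the t-direction
        double[] point = new double[3];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                double weight = RationalBezier.bernstein(n, i, s) * RationalBezier.bernstein(m, j, t);
                for (int d = 0; d < 3; d++) {
                    point[d] += weight * b[i][j][d];
                }
            }
        }
        return point;
    }
}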
Since tessellation in computer graphics usually involves an approximation of surfaces
by triangles rather than quadrilaterals, Bézier surfaces of degree n, usually $$n = 3$$
, are sometimes defined over a grid of triangles as follows:
$$ \boldsymbol{x}(t_1,t_2,t_3) \; = \; \sum _{i,j,k\ge 0: \; i+j+k=n}
\boldsymbol{b}_{ijk} \cdot B_{ijk}^{(n)}(t_1,t_2,t_3). $$
The corresponding Bernstein polynomials are given by
$$ B_{ijk}^{(n)}(t_1,t_2,t_3) \; = \; \frac{n!}{i!j!k!} \cdot t_1^i \cdot t_2^j \cdot t_3^k
$$
where $$t_1 + t_2 + t_3 = 1$$, $$t_1, t_2, t_3 \ge 0$$ and $$i+j+k=n$$
(for $$i,j,k \in {\text {I}\!\text {N}}$$ ). The triangular grid is shown in Fig. 4.31.

Fig. 4.31 A triangular grid for the definition of a Bézier surface

4.7 Normal Vectors for Surfaces


To render a scene realistically, illumination effects such as reflections must be taken into
account. Reflections depend on the angles at which light rays hit a surface. Normal vectors
of the surface are required to calculate these angles. Illumination effects and reflections
are explained in detail in Chap. 9. This section presents the determination of the normal
vectors of a surface.
For a triangle lying in a plane, the normal vectors all point in the same
direction, namely that of the normal vector of the plane. If the plane induced by a
triangle is given by the equation
$$\begin{aligned} Ax + By + Cz + D \; = \; 0, \end{aligned}$$ (4.4)
then the vector $$(A,B,C)^\top $$ is the non-normalised1 normal vector to this
plane. This can be easily seen from the following consideration. If
$$\boldsymbol{n} = (n_x, n_y, n_z)^\top $$ is a not necessarily normalised normal
vector to the plane and $$\boldsymbol{v} = (v_x, v_y, v_z)^\top $$ is a point in the
plane, then the point $$(x, y, z)^\top $$ lies also in the plane if and only if the
connecting vector between $$\boldsymbol{v}$$ and $$(x,y,z)^\top $$ lies in
the plane. This means the connecting vector must be orthogonal to the normal vector. The
following must therefore apply.
$$ 0 \; = \; \boldsymbol{n}^\top \cdot \left( (x,y,z)^\top - \boldsymbol{v}\right) \; = \;
n_x\cdot x + n_y \cdot y + n_z \cdot z - \boldsymbol{n}^\top \cdot \boldsymbol{v} $$
If $$A = n_x$$, $$B = n_y$$, $$C = n_z$$ and
$$D = -\boldsymbol{n}^\top \cdot \boldsymbol{v}$$, the result is exactly the plane
Eq. (4.4).
If a triangle is given by the three non-collinear points
$$\boldsymbol{P}_1, \boldsymbol{P}_2, \boldsymbol{P}_3$$, then the normal
vector to this triangle can be calculated using the cross product as follows.
$$ \boldsymbol{n} \; = \; (\boldsymbol{P}_2 - \boldsymbol{P}_1) \; \times \;
(\boldsymbol{P}_3 - \boldsymbol{P}_1) $$
The cross product of two vectors $$(x_1,y_1,z_1)^\top $$ and
$$(x_2,y_2,z_2)^\top $$ is defined by the vector
$$ \left( \begin{array}{c} x_1\\ y_1\\ z_1 \end{array} \right) \times \left( \begin{array}
{c} x_2\\ y_2\\ z_2 \end{array} \right) \; = \; \left( \begin{array}{c} y_1 \cdot z_2 - y_2
\cdot z_1\\ z_1 \cdot x_2 - z_2 \cdot x_1\\ x_1 \cdot y_2 - x_2 \cdot y_1\\ \end{array}
\right) . $$
The cross product is the zero vector if the two vectors are collinear.
In this way, the non-normalised normal vector of the plane can be determined from
Eq. (4.4). The value D in this equation can then be determined by inserting one of the
points of the triangle, i.e., one point in the plane, into this equation:
$$ D \; = \; -\boldsymbol{n}^\top \cdot \boldsymbol{P}_1. $$
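A minimal Java sketch of this computation might look as follows; the points are plain double arrays of length three and the class and method names are chosen freely for illustration.

// Sketch: non-normalised normal vector (A, B, C) of the triangle P1, P2, P3 via the
// cross product, together with the coefficient D of the plane equation Ax + By + Cz + D = 0.
class TriangleNormalSketch {
    static double[] cross(double[] u, double[] v) {
        return new double[] {
            u[1] * v[2] - v[1] * u[2],
            u[2] * v[0] - v[2] * u[0],
            u[0] * v[1] - v[0] * u[1]
        };
    }

    static double[] planeCoefficients(double[] p1, double[] p2, double[] p3) {
        double[] u = { p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2] };   // P2 - P1
        double[] v = { p3[0] - p1[0], p3[1] - p1[1], p3[2] - p1[2] };   // P3 - P1
        double[] n = cross(u, v);                                       // (A, B, C)
        double dCoefficient = -(n[0] * p1[0] + n[1] * p1[1] + n[2] * p1[2]);   // D = -n^T * P1
        return new double[] { n[0], n[1], n[2], dCoefficient };
    }
}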
If a surface is described by a freeform surface, the normal vector at a point
$$\boldsymbol{x}(s_0,t_0)$$ on the surface can be determined as the normal vector
to the tangent plane at that point. The tangent plane is given by the two tangent vectors to
the parametric curves $$\boldsymbol{p}(s) = \boldsymbol{x}(s,t_0)$$ and
$$\boldsymbol{q}(t) = \boldsymbol{x}(s_0,t)$$ at the point $$x(s_0,t_0)$$.
$$\begin{aligned} \left( \frac{\partial }{\partial s} \boldsymbol{x}(s,t_0)\right)
_{s=s_0}= & {} \left( \frac{\partial }{\partial s} \sum _{i=0}^n \sum _{j=0}^m
\boldsymbol{b}_{ij} \cdot B_i^{(n)}(s) \cdot B_j^{(m)}(t_0) \right) _{s=s_0}\\[1mm]=
& {} \sum _{j=0}^m B_j^{(m)}(t_0) \cdot \sum _{i=0}^n \boldsymbol{b}_{ij}
\cdot \left( \frac{\partial B_i^{(n)}(s)}{\partial s}\right) _{s=s_0}\\[5mm] \left(
\frac{\partial }{\partial t} \boldsymbol{x}(s_0,t)\right) _{t=t_0}= & {} \left(
\frac{\partial }{\partial t} \sum _{i=0}^n \sum _{j=0}^m \boldsymbol{b}_{ij} \cdot
B_i^{(n)}(s_0) \cdot B_j^{(m)}(t) \right) _{t=t_0}\\[1mm]= & {} \sum _{i=0}^n
B_i^{(n)}(s_0) \cdot \sum _{j=0}^m \boldsymbol{b}_{ij} \cdot \left( \frac{\partial
B_j^{(m)}(t)}{\partial t}\right) _{t=t_0} \end{aligned}$$
These two tangent vectors are parallel to the surface at the point $$(s_0, t_0)$$ and
thus span the tangent plane of the surface at this point. The cross product of the two
tangent vectors thus yields the normal vector at the surface at the point
$$\boldsymbol{x}(s_0,t_0)$$.
When a curved surface in the form of a freeform surface is approximated by triangles,
the normal vectors for the triangles should not be determined after the approximation by
the triangles, but directly from the normal vectors of the surface. Of course, it is not
possible to store normal vectors at every point on the surface. At least for the points used
to define the triangles, the normal vectors of the curved surface should be calculated and
stored. Usually these will be the vertices of the triangle. In this way, a triangle can have
three different normal vectors associated with it, all of which do not coincide with the
normal vector of the plane defined by the triangle, as can be seen in Fig. 4.32. This
procedure improves the rendering result for illuminated surfaces.

Fig. 4.32 Normal vectors to the original surface in the vertices of an approximating triangle

4.8 Exercises
Exercise 4.1 The surface of the object in the figure on the right is to be modelled with
triangles. Give appropriate coordinates for the six nodes and corresponding triangles to be
formed from these nodes. Make sure that the triangles are oriented in such a way that the
counter-clockwise oriented faces point outwards. The object is two units high, one unit
deep and five units wide. Write an OpenGL program to render the object.

Exercise 4.2 Sketch the quad tree for the solid triangle in the figure on the right. Stop
this process at level two (inclusive). The root (level 0) of the quad tree corresponds to the
dashed square.

Exercise 4.3 Let a sphere with radius one and centre at the coordinate origin be
tessellated as follows: Let $$n\in \mathbb {N}-\{0\}$$ be even. Let the grid of the
tessellation be spanned by $$\frac{n}{2}$$ equally sized longitude circles and
$$\frac{n}{2} - 1$$ different-sized latitude circles. Assume that the centres of the
latitude circles lie on the z-axis.
(a)
Construct a parameterised procedure that can be used to determine coordinates for
the vertices of the tessellation. Sketch the situation.
(b)
Determine the corresponding normal vectors of the corner points from the results of
task (a).
(c)
Write an OpenGL program that draws a sphere with an arbitrary number of latitude
and longitude circles. Approximate the surface of the sphere with triangles.
(d)
Check if all surfaces are oriented outwards by turning on backface culling. Correct
your program if necessary.
(e) Optimise your program so that the complete object is drawn with only one OpenGL
drawing command. If necessary, use the primitive restart mechanism.
Exercise 4.4 Let there be a cylinder whose centre of gravity lies at the origin of the
coordinates. The two circular cover faces have a radius of one. The cylindrical surface has
a length of two and is parallel to the z-axis. The surface of the cylinder is tessellated as
follows ( $$n\in \mathbb {N}-\{0\}$$): The cylindrical surface is divided into n
rectangles ranging from one cover surface to the other cover surface. The two circular
faces become two polygons having n corners. The angle between two successive vertices
as seen from the centre of the cover is $$\alpha = \frac{2\pi }{n}$$. The corner points
of the tessellation vary with the parameter $$i \cdot \alpha $$; $$i = 0,\dots ,n - 1$$.
(a)
Sketch the cylinder and its tessellation in a right-handed coordinate system.
(b)
What are the coordinates of the vertices in homogeneous coordinates for each
rectangle as a function of $$\alpha $$ and i?
(c)
What are the corresponding normal vectors at the vertices in homogeneous
coordinates?
(d)
Write an OpenGL program that draws the cylindrical surface for any value of n.
Approximate the rectangles by two triangles each.
(e)
Complete your program with two approximated circles (polygons with n corners) to
represent the cover surfaces for any values for n. Approximate the polygons with n
corners with triangles. For this purpose the geometric primitive
GL_TRIANGLE_FAN is well suited.
(f)
Check if all surfaces are oriented outwards by turning on backface culling. Correct
your program if necessary.
(g)
Optimise your program so that the complete object is rendered with only one
OpenGL drawing command. Use the primitive restart mechanism if necessary.
(h)

Tailor your program so that the correct normal vectors are stored in the vertices.
Note that the normal vectors for the cylindrical surface are oriented differently from
those for the cover surfaces.
(i)
Extend your program so that a frustum (truncated cone) is drawn instead of a
cylinder. The only difference is that the two cover surfaces may now have (arbitrarily chosen)
different radii.

References
1. C. Crassin, F. Neyret, M. Sainz, S. Green and E. Eisemann. “Interactive Indirect Illumination Using Voxel Cone
Tracing”. In: Computer Graphics Forum 30.7 (2011), pp. 1921–1930.

2. G. Farin. Curves and Surfaces for CAGD: A Practical Guide. 5th edition. Morgan Kaufmann, 2001.

3. M. Pharr, ed. GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose
Computation. Boston: Addison-Wesley, 2005.

Footnotes
1 For a normalised vector $$\boldsymbol{v}, \Vert \boldsymbol{v}\Vert = 1$$ must hold.

5. Geometry Processing
This chapter presents the individual steps of the geometric transformations,


which are necessary to be able to represent the geometry of real-world
objects on the screen. After introducing the concepts in the two-dimensional
plane, they are applied in three-dimensional space. The first step is the
geometric modelling of each object as a model of the virtual world to be
represented, as described in Sect. 4.1. This modelling is done for each
object in its own coordinate system, the model coordinate system.
Afterwards, all models are transferred into the coordinate system of the
virtual scene, the world coordinate system. From this step on, the
perspective comes into play, which is created by a view through the camera
from the viewer’s location. Afterwards, all areas not visible through the
camera are cut off, creating the so-called visible (clipping) area. The
clipping area is then mapped onto the output device. The mathematics of
this chapter can be deepened with the help of [1, 2].

5.1 Geometric Transformations in 2D


Besides the geometric objects, which are modelled in Chap. 4, geometric
transformations play an essential role in computer graphics. Geometric
transformations are used to position objects, i.e., to move or rotate them, to
deform them, for example, to stretch or compress them in one direction.
Geometric transformations are also used to realise movements or changes
of objects in animated graphics. Before the geometric transformations are
discussed in more detail, a few conventions should be agreed upon. In
computer graphics, both points and vectors are used, both of which are
formally represented as elements of the $${\text{I}{\!}\text{R} }^n$$.1
In the context of this book and from the perspective of computer graphics,
there is frequent switching between the two interpretations as point and
vector, so that these terms are dealt with very flexibly here to some extent.
A tuple $$(x_1,\ldots ,x_n)\in {\text{I}{\!}\text{R} }^n$$ can be
interpreted as a point in one equation and as a vector in the next.2 Column
vectors are generally used in equations in this book. In the text, points are
sometimes written as row vectors to avoid unnecessarily tall lines. In cases
where a point is explicitly used as a column vector, the transposition symbol
is usually added, i.e., it is written in the form
$$(x,y)^\top \in {\text{I}{\!}\text{R} }^2$$ and
$$(x,y,z)^\top \in {\text{I}{\!}\text{R} }^3$$, respectively.
The scalar product of two vectors $$\textbf{u}$$ and
$$\textbf{v}$$ is noted as follows:
$$ \textbf{u}^\top \cdot \textbf{v} \; = \; (u_1,\ldots , u_n) \cdot \left(
\begin{array}{c} v_1 \\ \vdots \\ v_n \end{array} \right) \; = \; \sum
_{i=1}^n u_i \cdot v_i. $$
The most important geometric transformations in computer graphics are
scaling, rotation, shear and translation. Scaling causes stretching or
compression in the x- and y-direction. The point (x, y) is mapped to the
point $$(x',y')$$ as follows for a scaling $$S(s_x,s_y)$$:
$$ \left( \begin{array}{l} x' \\ y' \end{array} \right) \; = \; \left(
\begin{array}{l} s_x \cdot x \\ s_y \cdot y \end{array} \right) \; = \; \left(
\begin{array}{cc} s_x &{} 0\\ 0 &{} s_y \end{array} \right)
\cdot \left( \begin{array}{l} x \\ y \end{array} \right) . $$
$$s_x$$ is the scaling factor in the x-direction. There is stretching in
the x-direction exactly when $$|s_x| > 1$$ holds. If
$$|s_x| < 1$$, there is compression. If the value $$s_x$$ is
negative, in addition to the stretching or compression in the x-direction
another mirroring is operated on the y-axis. Correspondingly, $$s_y$$
causes stretching or compression in the direction of the y-axis and, if the
value is negative, additionally a reflection on the x-axis.

Fig. 5.1 Scaling using the example of a rectangle

Like all other geometric transformations, a scaling is always to be
understood pointwise, even if it is applied to objects. As an example,
consider the scaling with $$s_x=2$$ and $$s_y=0.5$$, which
stretches objects to twice their width in the direction of the x-axis and
compresses them to half their height in the direction of the y-axis. If this scaling is applied to
the rectangle whose lower-left corner is at point (80, 120) and whose upper-
right corner is at point (180, 180), the resulting rectangle is not only twice
as wide and half as high as the original one but also shifted to the right and
down. Figure 5.1 shows the original rectangle and, dashed, the scaled
rectangle. The scaling always refers to the coordinate origin. If an object is
not centred in the origin, scaling always causes an additional displacement
of the object.
Another important geometric transformation is rotation, which is
defined by the specification of an angle. The rotation is counter-clockwise
around the coordinate origin or clockwise if the angle is negative. The
rotation $$R(\theta )$$ around the angle $$\theta $$ maps the
point (x, y) on the following point $$(x',y')$$:
$$ \left( \begin{array}{l} x' \\ y' \end{array} \right) \; = \; \left(
\begin{array}{l} x \cdot \cos (\theta ) - y\cdot \sin (\theta ) \\ x\cdot \sin
(\theta ) + y \cdot \cos (\theta ) \end{array} \right) \; = \; \left( \begin{array}
{cc} \cos (\theta ) &{} -\sin (\theta ) \\ \sin (\theta ) &{} \cos
(\theta ) \end{array} \right) \cdot \left( \begin{array}{l} x \\ y \end{array}
\right) . $$
Since the rotation always refers to the coordinate origin, a rotation of an
object not centred in the coordinate origin causes an additional
displacement of the object, just like scaling. In Fig. 5.2, a rotation of
$$45^\circ $$ was performed, mapping the output rectangle to the
dashed rectangle.

Fig. 5.2 Rotation using the example of a rectangle

As the penultimate elementary geometric transformation, shear is
introduced, which causes a distortion of an object. Like a scaling, a shear is
defined by two parameters, with the difference that these parameters appear
on the secondary diagonal and not on the principal diagonal of the
corresponding matrix. If a shear $$Sh(s_x,s_y)$$ is applied to
the point (x, y), it results in the following point $$(x',y')$$:
$$ \left( \begin{array}{l} x' \\ y' \end{array} \right) \; = \; \left(
\begin{array}{c} x + s_x \cdot y\\ y + s_y \cdot x \end{array} \right) \; = \;
\left( \begin{array}{cc} 1 &{} s_x \\ s_y &{} 1 \end{array}
\right) \cdot \left( \begin{array}{l} x \\ y \end{array} \right) . $$
Analogous to scaling and rotation, the shear refers to the coordinate origin.
An object not centred in the coordinate origin therefore also undergoes an
additional displacement under a shear. In computer graphics, shear is of rather minor importance
compared to the other geometric transformations. In Fig. 5.3, the dashed
rectangle is obtained by applying the shear with the parameters
$$s_x=1$$ and $$s_y=0$$ to the output rectangle.

Fig. 5.3 Shear using the example of a rectangle


Since $$s_y=0$$ is valid here, we speak of shear in the x-
direction. For shear in the y-direction, $$s_x=0$$ must apply.
The last elementary geometric transformation still to be considered is
relatively simple but differs substantially from the three transformations
introduced so far. The translation $$T(d_x,d_y)$$ causes a shift by
the vector $$\textbf{d} = (d_x,d_y)^\top $$, i.e., the point (x, y) is
mapped to the point
$$ \left( \begin{array}{l} x' \\ y' \end{array} \right) \; = \; \left(
\begin{array}{c} x + d_x\\ y + d_y \end{array} \right) \; = \; \left(
\begin{array}{l} x \\ y \end{array} \right) + \left( \begin{array}{l} d_x \\
d_y \end{array} \right) . $$
Figure 5.4 shows a translation of a rectangle around the vector
$$\textbf{d} = (140,80)^\top $$.

Fig. 5.4 Translation using the example of a rectangle

In contrast to the transformations discussed earlier, which are all linear


mappings,4 a translation cannot be represented by matrix multiplication in
Cartesian coordinates. With matrix multiplication in Cartesian coordinates,
the zero vector, i.e., the origin of the coordinate system, is always mapped
to itself. A translation, however, shifts all points, including the coordinate origin.
Translations belong to the affine, but not to the linear mappings.
In computer graphics, more complex transformations are often created
by stringing together elementary geometric transformations. A
transformation, which results from a concatenation of different scales,
rotations and shears, is described by the matrix, which is obtained by
multiplying the matrices belonging to the corresponding elementary
transformations in reverse order. If translations are also used, the composite
transformation can no longer be calculated and represented in this simple
way. This is due to the fact that a translation adds a constant
displacement vector to every point. This
vector addition $$x+dx$$ in Cartesian coordinates cannot be
expressed as a matrix product Ax. It would be advantageous
both from the point of view of memory and computational effort if all
operations connected with geometric transformations could be traced back
to matrix multiplications. To make this possible, a different coordinate
representation of the points is used, where the translation can also be
described in the form of a matrix multiplication. This alternative coordinate
representation is called homogeneous coordinates, which are explained in
more detail in the following Sect. 5.1.1.

5.1.1 Homogeneous Coordinates


At this point, homogeneous coordinates for points in the plane are
introduced. The same technique is used for three-dimensional space.
Homogeneous coordinates use an additional dimension to represent points
and vectors. The point (x, y, z) in homogeneous coordinates represents
the point $$\displaystyle \left( \frac{x}{z}, \frac{y}{z}\right) $$ in
Cartesian coordinates. The z-component of a point in homogeneous
coordinates must never be zero. For (directional) vectors and normals, the z-
component is set to zero in the homogeneous representation, i.e., in this
case, the first two components are identical to the Cartesian ones. If the
point $$(x_0,y_0)$$ is to be represented in homogeneous coordinates, the
representation $$(x_0,y_0,1)$$ can be used as so-called normalised
homogeneous coordinates. Although this representation occurs frequently, it
is not the only representation. Any representation
$$(z\cdot x_0, z \cdot y_0,z)$$ with $$z\ne 0$$ is also correct. The
points
$$\{(x,y,z) \in {\text{I}{\!}\text{R} }^3 \mid (x,y,z) = (z\cdot x_0, z
\cdot y_0,z)\}$$
all lie on the straight line in $${\text{I}{\!}\text{R} }^3$$, which is
defined by the equation system
$$\begin{aligned} x -x_0 \cdot z= & {} 0\\ y -y_0 \cdot z= &
{} 0 \end{aligned}$$
and which runs through the coordinate origin. Each point on this line,
except the coordinate origin, represents the point $$(x_0,y_0)$$ in
homogeneous coordinates. If a fixed z-value is selected for the
representation in homogeneous coordinates, e.g., $$z = 1$$, all points in
the plane parallel to the xy-plane are represented by the corresponding z-
value. Figure 5.5 illustrates these relationships. All points on the displayed
straight line represent the same point in $${\text{I}{\!}\text{R} }^2$$. If
one fixes a z-value, e.g., one of the planes drawn, the points of the
$${\text{I}{\!}\text{R} }^2$$ can be represented in the corresponding
plane.

Fig. 5.5 Homogeneous coordinates

The coordinate origin in Cartesian coordinates corresponds in


homogeneous coordinates to a point of the form (0, 0, z). A linear map with
respect to homogeneous coordinates, i.e., a linear map in
$${\text{I}{\!}\text{R} }^3$$, does not necessarily map this point to
itself. A linear map can map this point to another point in homogeneous
coordinates. A translation can be represented in homogeneous coordinates
as matrix–vector multiplication:
$$ \left( \begin{array}{c} x'\\ y'\\ 1 \end{array} \right) \; = \; \left(
\begin{array}{c} x + d_x \\ y + d_y \\ 1 \end{array} \right) \; = \; \left(
\begin{array}{ccc} 1 &{} 0 &{} d_x\\ 0 &{} 1 &{}
d_y\\ 0 &{} 0 &{} 1 \end{array} \right) \cdot \left( \begin{array}
{c} x\\ y\\ 1 \end{array} \right) . $$
The other elementary geometric transformations can easily be extended to
homogeneous coordinates, resulting in the following transformation
matrices:

Transformation, abbreviation and matrix:

Translation $$T(d_x,d_y)$$: $$ \left( \begin{array}{ccc} 1 &{} 0 &{} d_x\\ 0 &{} 1 &{} d_y\\ 0 &{} 0 &{} 1 \end{array} \right) $$

Scaling $$S(s_x,s_y)$$: $$ \left( \begin{array}{ccc} s_x &{} 0 &{} 0\\ 0 &{} s_y &{} 0\\ 0 &{} 0 &{} 1 \end{array} \right) $$

Rotation $$R(\theta )$$: $$ \left( \begin{array}{ccc} \cos (\theta ) &{} -\sin (\theta ) &{} 0\\ \sin (\theta ) &{} \cos (\theta ) &{} 0\\ 0 &{} 0 &{} 1 \end{array} \right) $$

Shear $$Sh(s_x,s_y)$$: $$ \left( \begin{array}{ccc} 1 &{} s_x &{} 0\\ s_y &{} 1 &{} 0\\ 0 &{} 0 &{} 1 \end{array} \right) $$
Rotations and translations preserve lengths and angles when they are applied.
Scalings and shears generally change lengths or angles,
but at least the parallelism of lines is preserved.
The series-connected execution of geometric transformations in
homogeneous coordinates can, therefore, be realised by matrix
multiplication. The introduced matrices for the elementary transformations
all have the form
$$\begin{aligned} \left( \begin{array}{ccc} a &{} c &{} e
\\ b &{} d &{} f\\ 0 &{} 0 &{} 1\\ \end{array} (5.1)
\right) . \end{aligned}$$
The product of two such matrices results in a matrix of this form, as can be
easily calculated. Geometric transformations in computer graphics are
therefore usually represented and stored in this form. In particular,
OpenGL uses homogeneous coordinates. This holds not only for
transformations that operate in the plane but in a similar way also for
spatial transformations, which are treated starting from Sect. 5.8. Therefore,
a graphics card of a computer must be able to execute vector and matrix
operations efficiently.

Fig. 5.6 Different results when the order of translation and rotation is reversed

When connecting transformations in series, it must be taken into


account that the order in which the transformations are carried out plays a
role. Matrix multiplication is a non-commutative operation. Figure 5.6
shows in the right-hand part the different results obtained by applying in
one case first a translation around the vector $$(40,20)^\top $$ and
then a rotation of $$45^\circ $$ and in the other case in the reverse
order to the rectangle in the left-hand part of the figure. This effect occurs
only if different transformations are linked together. When connecting
transformations of the same type in series, i.e., only rotations, only
translations, only scalings or only shearings, the order is not important.
Exchanging transformations of different types is only possible in a few cases. It should
also be noted that when matrix notation or notation of transformations as
mapping is used, the transformations are performed from right to left. This
is a common mathematical convention. The transformation
$$ (T(d_x,d_y)\circ R(\theta ))(\textbf{v}) $$
or in matrix notation
$$ \left( \begin{array}{ccc} 1 &{} 0 &{} d_x\\ 0 &{} 1
&{} d_y\\ 0 &{} 0 &{} 1 \end{array} \right) \cdot \left(
\begin{array}{ccc} \cos (\theta ) &{} -\sin (\theta ) &{} 0\\ \sin
(\theta ) &{} \cos (\theta ) &{} 0\\ 0 &{} 0 &{} 1
\end{array} \right) \cdot \textbf{v} $$
means that first the rotation $$R(\theta )$$ and then the translation
$$T(d_x,d_y)$$ are applied to the point $$\textbf{v}$$.
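The following Java sketch shows how such 3×3 matrices in homogeneous coordinates could be built, chained by matrix multiplication and applied to points; the class and method names are illustrative and not part of any particular library.

// Sketch: elementary 2D transformations as 3x3 matrices in homogeneous coordinates.
class Transform2D {

    static double[][] translation(double dx, double dy) {
        return new double[][] { {1, 0, dx}, {0, 1, dy}, {0, 0, 1} };
    }

    static double[][] rotation(double theta) {
        double c = Math.cos(theta), s = Math.sin(theta);
        return new double[][] { {c, -s, 0}, {s, c, 0}, {0, 0, 1} };
    }

    static double[][] scaling(double sx, double sy) {
        return new double[][] { {sx, 0, 0}, {0, sy, 0}, {0, 0, 1} };
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] result = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    result[i][j] += a[i][k] * b[k][j];
        return result;
    }

    static double[] apply(double[][] m, double x, double y) {
        double[] p = { x, y, 1 };                 // point in homogeneous coordinates
        double[] q = new double[3];
        for (int i = 0; i < 3; i++)
            for (int k = 0; k < 3; k++)
                q[i] += m[i][k] * p[k];
        return new double[] { q[0] / q[2], q[1] / q[2] };   // back to Cartesian coordinates
    }

    public static void main(String[] args) {
        // First rotate by 45 degrees, then translate by (40, 20), as in the product above:
        double[][] m = multiply(translation(40, 20), rotation(Math.toRadians(45)));
        System.out.println(java.util.Arrays.toString(apply(m, 100, 100)));
    }
}

The example in main builds the matrix product in the same right-to-left order as the composition discussed above.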

5.1.2 Applications of Transformations


In this section, some example applications and problems are explained that
can be solved with the help of geometric transformations.
In computer graphics, it is common practice to specify objects in
arbitrary coordinates in floating point arithmetic, the so-called world
coordinates. For the generation of a concrete graphics, a rectangular
window, the viewport, must be specified, which defines the area of the
“object world” visible on the screen or other output device. Therefore, a
mapping from the world coordinates to the device or screen coordinates
must be calculated. The viewport transformation into the device coordinates
from three-dimensional to two-dimensional space is dealt with in detail in
Sect. 5.38.
At this point, this transformation is greatly simplified for illustrative
purposes. Only one observation in two-dimensional space is carried out, in
which a two-dimensional object section that is converted into (simplified)
two-dimensional world coordinates transforms into a section of the screen,
the screen or window coordinates. Figure 5.7 illustrates this situation.

Fig. 5.7 From world to window coordinates (highly simplified two-dimensional view)

The rectangle with the lower-left corner


$$(x_{\text{ min }},y_{\text{ min }})$$ and the upper-right corner
$$(x_{\text{ max }},y_{\text{ max }})$$ at the top left of the image
specifies the section of the object world to be displayed, the window in
world coordinates. This world section must be drawn in the window with
the screen coordinates $$(u_{\text{ min }},v_{\text{ min }})$$ and
$$(u_{\text{ max }},v_{\text{ max }})$$ as the lower-left and upper-
right corner of the window on the screen. The two rectangles in world
coordinates and window coordinates do not have to be the same size or
have the same shape.
The mapping to the viewport can be realised by the following
transformations in series. First, the window in world coordinates is
translated into the coordinate origin. Afterwards, this window in the
origin is scaled to the size of the window on the screen, which is then
positioned by a translation at the correct
position on the screen. This results in the following transformation, where
$$\circ $$ is the series connection of these transformations.
$$\begin{aligned} T(u_{\text{ min }},v_{\text{ min }}) \circ S \left(
\frac{u_{\text{ max }} - u_{\text{ min }}}{x_{\text{ max }} -
x_{\text{ min }}}, \frac{v_{\text{ max }} - v_{\text{ min }}} (5.2)
{y_{\text{ max }} - y_{\text{ min }}} \right) \circ T(-x_{\text{ min
}},-y_{\text{ min }}). \end{aligned}$$
Here too, the transformations are to be carried out from right to left.
Rotations always refer to the coordinate origin. To perform a rotation
around any point $$(x_0,y_0)$$, this point must first be moved to the
coordinate origin by means of a translation, then the rotation must be
performed and finally, the translation must be undone. A rotation around the
angle $$\theta $$ around the point $$(x_0,y_0)$$ is realised by
the following series of transformations:
$$\begin{aligned} R(\theta ,x_0,y_0) \; = \; T(x_0,y_0)\circ R(\theta
(5.3)
) \circ T(-x_0,-y_0). \end{aligned}$$
If the rotation in this equation is replaced by a scale, it results in a scale
related to the point $$(x_0,y_0)$$. Depending on which device or
image format this window section is subsequently drawn on, further steps
must be taken to adapt it to the conditions. Individual image formats (e.g.,
PNG) have the coordinate origin in the upper-left corner. Further
transformations are needed here. This results in the so-called pixel
coordinates. Pixel coordinates of a window on the screen are usually
specified so that the first coordinate defines the pixel column and the
second coordinate defines the pixel line. For example, the x-axis, i.e., the
first coordinate axis, would run as usual from left to right, while the y-axis
of the window coordinates would point down instead of up. Through a
suitable geometric transformation, this effect can be avoided. Before a
drawing starts, one must first mirror on the x-axis. Mirroring causes the y-
axis in the window to point upwards, but still be at the top edge of the
window. After mirroring, a translation in the y-direction by the height
of the window must be performed so that the geometric transformation
$$\begin{aligned} T(0,h) \circ S(1,-1) \end{aligned}$$ (5.4)
is obtained, where h is the height of the window in pixels.
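Based on the matrix sketch from Sect. 5.1.1 (assumed to be accessible), the transformations (5.2) and (5.4) could be composed as follows; method and parameter names are illustrative.

// Sketch: window-to-viewport mapping according to (5.2), combined with the mirroring
// and shift (5.4) for pixel coordinates whose y-axis points downwards.
class ViewportMapping {
    static double[][] windowToViewport(double xMin, double yMin, double xMax, double yMax,
                                       double uMin, double vMin, double uMax, double vMax,
                                       double windowHeightInPixels) {
        // Applied from right to left: translate to the origin, scale, translate to the viewport.
        double[][] m = Transform2D.multiply(
                Transform2D.translation(uMin, vMin),
                Transform2D.multiply(
                        Transform2D.scaling((uMax - uMin) / (xMax - xMin),
                                            (vMax - vMin) / (yMax - yMin)),
                        Transform2D.translation(-xMin, -yMin)));
        // Flip the y-axis and shift by the window height, Eq. (5.4):
        double[][] pixel = Transform2D.multiply(
                Transform2D.translation(0, windowHeightInPixels),
                Transform2D.scaling(1, -1));
        return Transform2D.multiply(pixel, m);
    }
}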

5.1.3 Animation and Movements Using Transformations


So far, the geometric transformations were only used statically here to
describe the mappings of one coordinate system into another, and
positioning or deformation of objects. Geometric transformations are also
suitable for modelling movements such as the movement of the second
hand of a clock in the form of a stepwise rotation of $$6^\circ $$ per
second. Continuous movements must be broken down into small partial
movements, each of which is described by a transformation. In order to
avoid a jerky representation of the continuous motion, the partial
movements must be sufficiently small or the distance between two
consecutive images must be sufficiently short.
If the movement of an object is modelled by suitable transformations,
the object must first be drawn, the transformed object must be
calculated, the old object must be erased and the transformed object
must be redrawn (in OpenGL, this is handled efficiently
when using VBOs and VAOs; see Chap. 2). But it is by no means clear
what the pixels that belong to the old object should be overwritten with.
For this purpose, a unique background must be defined. In addition, the old
object must be completely rendered again to determine which pixels it
occupies. For this reason, the entire image buffer is usually rewritten.
However, you do not write directly to the screen buffer, but into a virtual
screen buffer, which is then transferred to the actual screen buffer.

Fig. 5.8 A moving clock with second hand


As a simple example, a clock with a second hand is considered here,
which is to run from the bottom left to the top right across a screen window.
The clock itself consists of a square frame and has only a second
hand. Minute and hour hands could be treated accordingly, but are not
considered further for reasons of simplification. The square frame of the
clock must be moved step by step from the bottom left to the top right by a
translation. This translation must also act on the second
hand, which must additionally be rotated. Figure 5.8 shows several
intermediate positions of the clock.
The clock could move two units to the right and one unit upwards at
each step, which could be achieved by a translation
$$ T_{\text{ clock,step }} \; = \; T(2,1). $$
Correspondingly, the second hand would have to have a rotation of the form
$$ T_{\text{ hand,step }} \; = \; R(-\pi /30). $$
This should be used if the hand is to continue turning clockwise by
$$-\pi /30$$, i.e., by $$6^\circ $$ in each step. The hand starts
at the centre of the clock so that the hand must be rotated around this point.
One could position the clock at the beginning in such a way that the hand
starts at the coordinate origin. However, at the latest after one movement
step of the clock, the hand leaves this position and the centre of rotation
would have to be changed accordingly.
There are two strategies for describing such compound movements. In
the first strategy, one keeps a record of where the object is, in our example
the second hand, and moves the centre of rotation accordingly. In general,
it is not enough to save only the displacement of the object under
consideration. For example, if the object is to expand along an axis using a
scale, the orientation of the object must be known. For example, if you want
the second hand to become longer or shorter in the course of one revolution
without changing its width, it is not sufficient to scale the reference point
moved with the object, as the hand would then also become thicker or
thinner. In principle, this strategy can be used to model movements, but the
following second strategy is usually easier to implement. The principle is to
always start from the initial position of the object, accumulate the geometric
transformations to be applied and apply them to the initial object before
drawing it. In the example of the clock, one could use the two
transformations mentioned above and three others:
$$\begin{aligned} T_{\text{ clock,total }}^{\text{(new) }}= & {}
T_{\text{ clock,step }} \circ T_{\text{ clock,total }}^{\text{(old) }} \\
T_{\text{ hand,total } \text{ rotations }}^{\text{(new) }}= & {}
T_{\text{ hand,step }} \circ T_{\text{ hand,total } \text{ rotations
}}^{\text{(old) }} \\ T_{\text{ hand,total }}= & {} T_{\text{
clock,total }} \circ T_{\text{ hand,total } \text{ rotations }}.
\end{aligned}$$
$$T_{\text{ clock,total }}$$ and
$$T_{\text{ hand,total } \text{ rotations }}$$ are initialised with the
identity at the beginning and then updated according to these equations in
each step. $$T_{\text{ clock,total }}$$ describes the (total)
translation that must be performed to move the clock from the starting
position to the current position. $$T_{\text{ clock,total }}$$ is
applied to the frame of the clock centred in the coordinate origin.
$$T_{\text{ hand,total } \text{ rotations }}$$ indicates the (total)
rotation around the coordinate origin that the hand has performed up to the
current time. In addition, the hand must perform the shift together with the
frame of the clock. The transformation $$T_{\text{ hand,total }}$$ is
therefore applied to the pointer positioned in the coordinate origin. It is
important that first the (total) rotation of the pointer and then the (total)
displacement are executed.
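A minimal sketch of this second strategy, again based on the Transform2D sketch from Sect. 5.1.1, could accumulate the transformations per animation step as follows; the drawing itself is only indicated by comments, since it depends on the rendering code used.

// Sketch: accumulate the clock and hand transformations per animation step (second strategy).
class ClockAnimation {
    double[][] clockTotal = Transform2D.scaling(1, 1);                 // identity
    double[][] handRotations = Transform2D.scaling(1, 1);              // identity
    final double[][] clockStep = Transform2D.translation(2, 1);
    final double[][] handStep = Transform2D.rotation(-Math.PI / 30);   // 6 degrees clockwise

    void nextFrame() {
        clockTotal = Transform2D.multiply(clockStep, clockTotal);
        handRotations = Transform2D.multiply(handStep, handRotations);
        double[][] handTotal = Transform2D.multiply(clockTotal, handRotations);
        // draw the square frame (modelled around the origin) transformed by clockTotal
        // draw the second hand (modelled around the origin) transformed by handTotal
    }
}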
An alternative to this relatively complex modelling of movements is the
scene graph presented in Sect. 5.4.

5.1.4 Interpolators for Continuous Changes


Another way of describing movements or changes is to define an
interpolation from an initial state to an end state. An object should
continuously change from the initial to the final state. In the example of the
clock from the previous two sections, one would not specify the transformation
$$T_{\text{ clock,step }} \; = \; T(2,1)$$, which is to be executed from one
image to the next in a sequence of, e.g., 100 images after the original image,
but rather the start and end position, for instance
$$\textbf{p}_0 = (0,0)^\top $$ and $$\textbf{p}_1 =(200,100)^\top $$.
The points $$\textbf{p}_\alpha $$ on the connecting line between
the points $$\textbf{p}_0$$ and $$\textbf{p}_1$$ result from
the convex combination of these two points with
$$ \textbf{p}_\alpha \; = \; (1-\alpha ) \cdot \textbf{p}_0 + \alpha \cdot
\textbf{p}_1, \alpha \in [0,1]. $$
For $$\alpha = 0$$, one gets the starting point $$\textbf{p}_0$$
, for $$\alpha = 1$$, the endpoint $$\textbf{p}_1$$ and for
$$\alpha = 0.5$$, the point in the middle of the connecting line
between $$\textbf{p}_0$$ and $$\textbf{p}_1$$.
The principle of convex combination can be applied not only to points
or vectors but also to matrices, i.e., transformations. Later it will be shown
how continuous colour changes can also be created in this way (further
explanations of colours can be found in Chap. 6).
If two affine transformations are given by the matrices
$$M_{0}$$ and $$M_{1}$$ in homogeneous coordinates, their
convex combinations $$M_\alpha $$ are defined accordingly by
$$ M_\alpha \; = \; (1-\alpha ) \cdot M_0 + \alpha \cdot M_1, \alpha \in
[0,1]. $$

Fig. 5.9 Transformation of one ellipse into another using a convex combination of transformations

In this way, two objects created from the same basic object by applying
two different transformations can be continuously transformed into each
other. Figure 5.9 illustrates this process by means of two ellipses, both of
which originated from a basic ellipse by using different scales and
translations. In the upper left corner of the figure, the initial ellipse is
shown, which was created by the first transformation from the base ellipse.
In the lower right corner, the end ellipse is shown, which was created with
the second transformation from the base ellipse. The ellipses in between are
created by applying convex combinations of the two transformations to the
base ellipse.
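A convex combination of two transformation matrices can be computed entrywise; the following method sketch, which could be added to the Transform2D sketch from Sect. 5.1.1, illustrates this.

// Sketch: entrywise convex combination M_alpha = (1 - alpha) * M0 + alpha * M1
// of two transformation matrices in homogeneous coordinates.
static double[][] convexCombination(double[][] m0, double[][] m1, double alpha) {
    double[][] result = new double[3][3];
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            result[i][j] = (1 - alpha) * m0[i][j] + alpha * m1[i][j];
    return result;
}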
One has to be careful with rotations. Of course, two rotation matrices
can be transformed into each other in the same way as in the ellipse
transformation just discussed. However, if the rotation is to be interpolated
linearly in the angle, it makes more sense to
interpolate between the angles of the two rotations and then insert
the interpolated angle into the respective rotation matrix.
Another technique of continuous interpolation between two objects S
and $$S'$$ assumes that the two objects are defined by n control
points $$P_1 = (x_1,y_1), \ldots ,$$ $$P_n = (x_n,y_n)$$ or
$$P_1' = (x_1',y_1'), \ldots , P_n' = (x_n',y_n')$$ and connecting
elements (straight lines, quadratic or cubic curves) which are defined by these
points. Corresponding connecting elements appear in both objects, i.e., if object
S contains the quadratic curve (see Chap. 4) defined by points $$P_1$$,
$$P_3$$ and $$P_8$$, then object $$S'$$ contains the
quadratic curve defined by points $$P'_1$$, $$P'_3$$ and
$$P'_8$$.
Figure 5.10 shows two simple examples of two objects in the form of
the letters D and C, for the definition of which five control points
$$P1,\ldots ,P5$$ or $$P1',\ldots ,P5'$$, respectively, are used. Both
letters are described by two quadratic curves:
A curve that starts at the first point, ends at the second and uses the third
as a control point. For the letter D, these are points P1, P2 and P3,
respectively, and for C, points $$P1'$$, $$P2'$$ and $$P3'$$,
respectively.
The second quadratic curve uses the first point as the start point, the fourth
as the endpoint and the fifth as the control point.
If the two objects, in this case, the letters D and C, are to be transformed
into each other by a continuous movement, convex combinations can be
used for this purpose. Instead of the convex combination of
transformations, here the convex combinations of the corresponding pairs
of points $$P_i$$ and $$P_i'$$ are calculated, that is,
$$ P_i^{(\alpha )} \; = \; (1-\alpha )\cdot P_i + \alpha \cdot P_i'. $$
To display the intermediate image for $$\alpha \in [0,1]$$, the
corresponding curve segments are drawn using the points
$$P_i^{(\alpha )}$$. In the example of the transformation of the letter
D into the letter C, the two quadratic curves are drawn which are defined by
the points $$P_1^{(\alpha )}$$, $$P_2^{(\alpha )}$$ and
$$P_3^{(\alpha )}$$ or $$P_4^{(\alpha )}$$,
$$P_5^{(\alpha )}$$ and $$P_3^{(\alpha )}$$, respectively.

Fig. 5.10 Two letters, each defined by five control points and curves of the same type

Fig. 5.11 Stepwise transformation of two letters into each other

Figure 5.11 shows the intermediate results for the convex combinations
with $$\alpha = 0,0.2,0.4,0.6,0.8,1$$, using the points from Fig. 5.10
and drawing the corresponding quadratic curves.
In Sect. 6.4, further application possibilities of interpolators in
connection with colours and raster graphics are presented.

5.2 Geometrical Transformations in 3D


As in two-dimensional computer graphics, geometric transformations also
play an important role in three dimensions.
All three-dimensional coordinates in this book refer to a right-handed
coordinate system. Using the thumb of the right hand as the x-axis, the
index finger as the y-axis and the middle finger as the z-axis results in the
corresponding orientation of the coordinate system. In a right-handed
coordinate system, the x-axis is transformed into the y-axis by a
$$90^\circ $$ rotation, i.e., counter-clockwise, around the z-axis, the
y-axis is transformed into the z-axis by a $$90^\circ $$ rotation
around the x-axis, and the z-axis is transformed by a $$90^\circ $$
rotation around the y-axis into the x-axis. Figure 5.12 shows a right-handed
coordinate system.

Fig. 5.12 A right-handed coordinate system

In Sect. 5.1.1, homogeneous coordinates were introduced to be able to


represent all affine transformations of the plane by matrix multiplication.
The same principle of extension by one dimension is used for affine
transformations of the three-dimensional space. A point of three-
dimensional space is represented in homogeneous coordinates by four
coordinates $$(\tilde{x},\tilde{y},\tilde{z},w)$$ with
$$w\ne 0$$. Here
$$(\tilde{x},\tilde{y},\tilde{z},w)$$ encodes the point
$$\left( \frac{\tilde{x}}{w},\frac{\tilde{y}}{w},\frac{\tilde{z}}
{w}\right) \in {\text{I}{\!}\text{R} }^3$$
. The point $$(x,y,z) \in {\text{I}{\!}\text{R} }^3$$ can be represented
in homogeneous coordinates in the form (x, y, z, 1).
However, this is not the only possibility. Every representation of the form
$$(x\cdot w, \, y\cdot w,\, z\cdot w,\, w)$$ with $$w\ne 0$$ also
represents this point.

5.2.1 Translations
In homogeneous coordinates, a translation by the vector
$$(d_x,d_y,d_z)^\top $$ can be written as a matrix multiplication in the form
$$ \left( \begin{array}{l} x'\\ y'\\ z'\\ 1 \end{array} \right) \; = \; \left(
\begin{array}{cccc} 1 &{} 0 &{} 0 &{} d_x\\ 0 &{} 1
&{} 0 &{} d_y\\ 0 &{} 0 &{} 1 &{} d_z\\ 0
&{} 0 &{} 0 &{} 1 \end{array} \right) \cdot \left(
\begin{array}{c} x\\ y\\ z\\ 1 \end{array} \right) \; = \; \left( \begin{array}
{c} x+d_x\\ y+d_y\\ z+d_z\\ 1 \end{array} \right) $$
with the translation matrix:
$$ T(d_x,d_y,d_z) \; = \; \left( \begin{array}{cccc} 1 &{} 0 &
{} 0 &{} d_x\\ 0 &{} 1 &{} 0 &{} d_y\\ 0 &{} 0
&{} 1 &{} d_z\\ 0 &{} 0 &{} 0 &{} 1
\end{array} \right) . $$

5.2.2 Scalings
Scaling by the factors $$s_x,s_y,s_z$$ is given by
$$ \left( \begin{array}{c} x'\\ y'\\ z'\\ 1 \end{array} \right) \; = \; \left(
\begin{array}{cccc} s_x &{} 0 &{} 0 &{} 0\\ 0 &{}
s_y &{} 0 &{} 0\\ 0 &{} 0 &{} s_z &{} 0\\ 0
&{} 0 &{} 0 &{} 1 \end{array} \right) \cdot \left(
\begin{array}{c} x\\ y\\ z\\ 1 \end{array} \right) \; = \; \left( \begin{array}
{c} s_x\cdot x\\ s_y\cdot y\\ s_z\cdot z\\ 1 \end{array} \right) $$
with the scaling matrix:
$$ S(s_x,s_y,s_z) \; = \; \left( \begin{array}{cccc} s_x &{} 0 &
{} 0 &{} 0\\ 0 &{} s_y &{} 0 &{} 0\\ 0 &{} 0
&{} s_z &{} 0\\ 0 &{} 0 &{} 0 &{} 1
\end{array} \right) . $$

5.2.3 Rotations Around x-, y- and z-Axis


In two dimensions, it was sufficient to look at rotations around the
origin of the coordinates. In combination with translations, any rotation
around any point can be represented. In three dimensions, a rotation axis must
be specified instead of a point around which to rotate. The three elementary
rotations in three dimensions are the rotations around the coordinate
axes. A rotation by a positive angle around a directed axis in three
dimensions means that the rotation is counter-clockwise while the axis
points towards the viewer. In this context, the right-hand rule comes into
play. Suppose the thumb of the right-hand points in the direction of the
oriented rotation axis, and the remaining fingers are clenched into a fist.
The fingers indicate the direction of positive rotation.
A rotation around the z-axis by the angle $$\theta $$ can be
described in homogeneous coordinates as follows:
$$ \left( \begin{array}{c} x'\\ y'\\ z'\\ 1 \end{array} \right) \; = \;
R_z(\theta ) \cdot \left( \begin{array}{c} x\\ y\\ z\\ 1 \end{array} \right) $$
with the rotation matrix:
$$ R_z(\theta ) \; = \; \left( \begin{array}{cccc} \cos \theta &{} -\sin
\theta &{} 0 &{} 0\\ \sin \theta &{} \cos \theta &{} 0
&{} 0\\ 0 &{} 0 &{} 1 &{} 0\\ 0 &{} 0 &{}
0 &{} 1 \end{array} \right) . $$
This rotation matrix corresponds to the rotation matrix around the
coordinate origin already known from the two dimensions, which was only
extended by the z-dimension. With a rotation around the z-axis, the z-
coordinate does not change. The matrices for rotations around the x- and y-
axis are obtained from the above matrix by swapping the roles of the axes
accordingly so that a rotation around the x-axis by the angle
$$\theta $$ is given by the matrix
$$ R_x(\theta ) \; = \; \left( \begin{array}{cccc} 1 &{} 0 &{} 0
&{} 0\\ 0 &{} \cos \theta &{} -\sin \theta &{} 0\\ 0
&{} \sin \theta &{} \cos \theta &{} 0\\ 0 &{} 0 &
{} 0 &{} 1 \end{array} \right) $$
and a rotation around the y-axis by the angle $$\theta $$ is realised by
the matrix
$$ R_y(\theta ) \; = \; \left( \begin{array}{cccc} \cos \theta &{} 0
&{} \sin \theta &{} 0\\ 0 &{} 1 &{} 0 &{} 0\\ -
\sin \theta &{} 0 &{} \cos \theta &{} 0\\ 0 &{} 0
&{} 0 &{} 1 \end{array} \right) . $$
Note that the rotation matrices are orthonormal, i.e., their column
vectors have length one and are pairwise orthogonal. Thus,
the inverse of a rotation matrix is equal to its transposed matrix,
as for all orthonormal matrices.
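The following Java sketch sets up such 4×4 matrices for a rotation about the z-axis and a translation, together with a 4×4 matrix product; as before, the names are illustrative and a real application would typically rely on a matrix library.

// Sketch: 4x4 transformation matrices in homogeneous coordinates (row-major double arrays).
class Transform3D {
    static double[][] rotationZ(double theta) {
        double c = Math.cos(theta), s = Math.sin(theta);
        return new double[][] {
            { c, -s, 0, 0 },
            { s,  c, 0, 0 },
            { 0,  0, 1, 0 },
            { 0,  0, 0, 1 }
        };
    }

    static double[][] translation(double dx, double dy, double dz) {
        return new double[][] {
            { 1, 0, 0, dx },
            { 0, 1, 0, dy },
            { 0, 0, 1, dz },
            { 0, 0, 0, 1 }
        };
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] result = new double[4][4];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                for (int k = 0; k < 4; k++)
                    result[i][j] += a[i][k] * b[k][j];
        return result;
    }
}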
In this context, the question remains about how to approach a rotation
around an arbitrary axis. It would be desirable that this is derived purely
from the geometric transformations treated so far. This is possible and will
be discussed in Sect. 5.5. However, further fundamental consideration of
coordinate system transformations and scene graphing is required to
understand the concepts and methods behind them better. These are
explained in Sects. 5.3 to 5.4.3.

5.2.4 Calculation of a Transformation Matrix with a Linear


System of Equations
For all the transformation matrices considered so far, the last row is (0, 0, 0, 1).
This property is also retained during matrix multiplication.
The following properties can be determined for transformation matrices:
In the two-dimensional case, there is exactly one transformation matrix,
which maps three non-collinear points to three other non-collinear points.
Correspondingly, in the three-dimensional case, there is exactly one
transformation matrix, which maps four non-coplanar points to four other
non-coplanar image points. If four points
$$\textbf{p}_1,\textbf{p}_2,\textbf{p}_3,\textbf{p}_4$$, which do
not lie in a plane, are given in $${\text{I}{\!}\text{R} }^3$$ and their
new coordinates
$$\textbf{p}_1',\textbf{p}_2',\textbf{p}_3',\textbf{p}_4'$$, the
transformation matrix can be calculated by solving a linear system of equations:
$$\begin{aligned} \textbf{p}_i' \; = \; M\cdot \textbf{p}_i
(5.5)
(i=1,2,3,4). \end{aligned}$$
The matrix
$$ M \; = \; \left( \begin{array}{cccc} a & b & c & d\\ e & f & g & h\\ i & j & k & l\\ 0 & 0 & 0 & 1\\ \end{array} \right) $$
in homogeneous coordinates must therefore be determined from the
four vector equations (5.5), each consisting of three equations, for the total
of twelve parameters of the matrix M.
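One compact way to organise this computation, which is merely an equivalent reformulation of the linear system above, is to collect the four homogeneous points as columns of the matrices $$P = (\textbf{p}_1 \; \textbf{p}_2 \; \textbf{p}_3 \; \textbf{p}_4)$$ and $$P' = (\textbf{p}_1' \; \textbf{p}_2' \; \textbf{p}_3' \; \textbf{p}_4')$$. The four vector equations (5.5) then read $$M \cdot P = P'$$, and since the four points are not coplanar, P is invertible, so that $$M = P' \cdot P^{-1}$$.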
In this sense, a transformation can also be understood as a change of the
coordinate system. This property is used later in the viewing pipeline, for
example, to describe scenes with respect to different coordinate systems
(model, world and camera coordinates) (see Sect. 5.10).

5.3 Switch Between Two Coordinate Systems


Each coordinate system is described by a basis with the corresponding basis
vectors and a coordinate origin. The common three-dimensional right-
handed (Cartesian) coordinate system K has the three basis vectors
$$e_{1}=(1, 0, 0)^{T}$$, $$e_{2}=(0, 1, 0)^{T}$$ and
$$e_{3}=(0, 0, 1)^{T}$$. The coordinate system origin is located at
$$(0, 0, 0)^{T}$$. This results in the following matrix, which spans this
three-dimensional space
$$\begin{aligned} M=\left( \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0\\ 0 & 0 & 1\end{array}\right) , \end{aligned}$$
the so-called unit matrix or identity matrix. Interestingly, this matrix results
from the following matrix:
$$\begin{aligned} M=\left( \begin{array}{rrrr} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\end{array}\right) , \end{aligned}$$
which contains the basis vectors as columns, followed by the coordinate
origin as a location vector. Since the origin is the zero vector here, this last column
is redundant and could be omitted. In homogeneous coordinates, this description becomes the following matrix:
$$\begin{aligned} M=\left( \begin{array}{rrrr} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{array}\right) . \end{aligned}$$
Let there be another Cartesian coordinate system A. Let $$b_{x}$$,
$$b_{y}$$ and $$b_{z}$$ be the associated basis vectors and b the
coordinate origin of A, represented with coordinates in relation to K.
Then the representation of points can be described from both K and A.
Different coordinates result for this description of a point P, depending on
whether it is viewed from K or from A. The transition from A to K (i.e., the
representation with respect to K of coordinates given relative to A) can be
realised by multiplying the coordinate vectors relative to A with the matrix whose columns
are the vectors $$b_{x}$$, $$b_{y}$$, $$b_{z}$$ and b. The
inverse of this matrix represents the transition from K to A.
This connection is illustrated by the following example. For the sake of
simplicity, the two-dimensional case is considered. Figure 5.13 shows the
Cartesian coordinate system and another Cartesian coordinate system A.

Fig. 5.13 Example of the change between two coordinate systems

In relation to the Cartesian coordinate system K, the basis vectors of the
Cartesian coordinate system A are $$(3 , 4)^{T}$$, in normalised
homogeneous coordinates $$(\frac{3}{5}, \frac{4}{5}, 0)^{T}$$,
and the corresponding orthogonal vector $$( -4 , 3)^{T}$$, in
normalised homogeneous coordinates
$$(-\frac{4}{5} , \frac{3}{5} , 0)^{T}$$.
Since these vectors are not location vectors (with a fixed starting point),
they have a 0 as a third component. The origin of the coordinate system is
$$( 6 , 3)^{T}$$, in homogeneous coordinates
$$(6 ,3 , 1)^{T}$$.
In Fig. 5.13, it is easy to see that the Cartesian coordinate system can be
transformed into the coordinate system A by rotation through the angle
$$\alpha $$ (the angle between the vectors
$$(\frac{3}{5} , \frac{4}{5} , 0)^{T}$$ and $$(1, 0 , 0)^{T}$$,
the basis vector of the x-axis of K in homogeneous coordinates) and a shift
by the homogeneous vector $$(6 , 3 , 1)^{T}$$.
For the calculation of the angle $$\alpha $$, the scalar product is
used; it yields
$$ \cos \alpha = \frac{ (\frac{3}{5}~ \frac{4}{5}~ 0)\cdot \left(
\begin{array}{c} 1 \\ 0 \\ 0\end{array}\right) }{\left| \left( \begin{array}{c}
\frac{3}{5} \\ \frac{4}{5} \\ 0\end{array}\right) \right| \cdot \left| \left(
\begin{array}{c} 1 \\ 0 \\ 0\end{array}\right) \right| } = \frac{\frac{3}{5}}
{1\cdot 1} = \frac{3}{5}$$
and according to Pythagoras
$$\begin{aligned} \sin \alpha = \sqrt{1 -\cos ^{2} \alpha } = \frac{4}{5}.
\end{aligned}$$
For the series-connected execution of the transformations for the transfer,
the following matrix multiplication is performed:
$$ M = \left( \begin{array}{rrr} 1 & 0 & dx \\ 0 & 1 & dy \\ 0 & 0 & 1\end{array}\right) \cdot \left( \begin{array}{rrr} \cos \alpha & -\sin \alpha & 0\\ \sin \alpha & \cos \alpha & 0\\ 0 & 0 & 1\end{array}\right) = \left( \begin{array}{rrr} 1 & 0 & 6 \\ 0 & 1 & 3 \\ 0 & 0 & 1\end{array}\right) \cdot \left( \begin{array}{rrr} \frac{3}{5} & -\frac{4}{5} & 0\\ \frac{4}{5} & \frac{3}{5} & 0\\ 0 & 0 & 1\end{array}\right) = \left( \begin{array}{rrr} \frac{3}{5} & -\frac{4}{5} & 6\\ \frac{4}{5} & \frac{3}{5} & 3\\ 0 & 0 & 1\end{array}\right) $$
where dx describes the displacement in x- and dy the displacement in the y-
direction.
On closer examination of the result matrix M, it becomes clear that the
column vectors are identical to the basis vectors of the coordinate system A,
including the corresponding coordinate origin.
The inverse matrix $$M^{-1}$$ of the result matrix is
$$\left( \begin{array}{rrr} \frac{3}{5} & \frac{4}{5} & -6\\ -\frac{4}{5} & \frac{3}{5} & 3\\ 0 & 0 & 1\end{array}\right) $$
which reverses this transition.
The question arises with which matrix, i.e., M or $$M^{-1}$$, the
homogeneous vectors of the Cartesian points have to be multiplied in order
to calculate them from the point of view of the coordinate system A. At
first, it is astonishing that the multiplication with $$M^{-1}$$ must
be done. The reason is obvious on further consideration: The rotation of a
point with the angle $$\beta $$ in the fixed coordinate system has the
same effect as the rotation of the coordinate system by the angle
$$-\beta $$ with a fixed point. The same is true for displacement.
Since the points are treated as fixed objects and the coordinate systems are
transformed into each other, the matrix $$M^{-1}$$ must be applied
to the Cartesian points in order to express them in the coordinate
system A. For the transfer of, e.g., the point (4, 8) from A to
K, this results in
$$\left( \begin{array}{rrr} \frac{3}{5} & -\frac{4}{5} & 6\\ \frac{4}{5} & \frac{3}{5} & 3\\ 0 & 0 & 1\end{array}\right) \cdot \left( \begin{array}{c} 4 \\ 8 \\ 1 \end{array}\right) = \left( \begin{array}{c} 2 \\ 11 \\ 1 \end{array}\right) $$
and thus in Cartesian coordinates (2, 11). The way back over the inverse
matrix results in
$$\left( \begin{array}{rrr} \frac{3}{5} & \frac{4}{5} & -6\\ -\frac{4}{5} & \frac{3}{5} & 3\\ 0 & 0 & 1\end{array}\right) \cdot \left( \begin{array}{c} 2 \\ 11 \\ 1 \end{array}\right) = \left( \begin{array}{c} 4 \\ 8 \\ 1 \end{array}\right) $$
and thus (4, 8) from the point of view of coordinate system A.
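The numbers of this example can be verified with a few lines of Java. The following stand-alone sketch (class and method names are chosen freely here and do not come from the book's example programs) builds M from the basis vectors and the origin of A as columns and applies it to the homogeneous point (4, 8, 1):

public class CoordinateChangeDemo {

    // Multiplies a 3x3 matrix with a homogeneous 2D point.
    static double[] transform(double[][] m, double[] p) {
        double[] result = new double[3];
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++) {
                result[row] += m[row][col] * p[col];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Columns: first basis vector of A, second basis vector of A, origin of A.
        double[][] m = {
            { 3.0 / 5.0, -4.0 / 5.0, 6.0 },
            { 4.0 / 5.0,  3.0 / 5.0, 3.0 },
            { 0.0,        0.0,       1.0 }
        };
        // The point (4, 8) given in coordinates of A, expressed in coordinates of K.
        double[] p = transform(m, new double[] { 4, 8, 1 });
        System.out.printf("(%.1f, %.1f)%n", p[0], p[1]);   // prints (2.0, 11.0)
    }
}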
These regularities are used in the so-called viewing pipeline (see
Sect. 5.10), which in particular describes the geometric transformations of
the vertices of an object from model coordinates via world coordinates and
camera coordinates to device coordinates.

5.4 Scene Graphs


5.4.1 Modelling
To model a three-dimensional scene, geometric objects must be defined and
positioned in the scene. Possibilities for modelling individual geometric
objects are presented in Chap. 4. Beside elementary basic objects like
cuboids, spheres, cylinders or cones, usually more complex techniques for
object modelling are available. As a rule, complex objects are composed of
individual subobjects. Figure 5.14 shows a chair that was constructed with
elementary geometric objects. The legs and the seat are cuboids, and the
backrest consists of a cylinder.

Fig. 5.14 A chair composed of elementary geometric objects

In order to model the chair, the elementary geometric objects must be
created with the appropriate dimensions and positioned correctly. The
positioning of the individual objects, i.e., the four legs, the seat and the
backrest, is carried out by means of suitable geometric transformations,
which are applied individually to each object. If one wants to place the
chair at a different position in the scene, e.g., move it further back, one
would have to define an appropriate translation and apply it additionally to
all partial objects of the chair. This would be very complex, especially for
objects that are much more complicated than the chair, if these moves
would have to be applied explicitly to each object component. When
modelling three-dimensional scenes, it is therefore common to use a scene
graph, in which objects can be hierarchically combined into transformation
groups. In the case of the chair, the chair forms its own transformation
group to which the legs, seat and backrest are assigned. A transformation
that is to be applied to the transformation group chair automatically affects
all elements that belong to this transformation group. In this way, the entire
chair can be positioned in the scene without having to explicitly apply the
transformation (required for positioning) to all sub-objects. The algorithm
that traverses this scene graph data structure visits the
transformation groups implicitly as soon as their ancestors (nodes that lie
on the path from the node to the root of the tree) have been visited.
This means that actions do not only affect the visited node but all its
descendants. This is used for animations.
In the following, the concept of the scene graph is illustrated in detail
by means of an example, which also includes animations. In the scene, there
is a very simplified helicopter standing on a cube-shaped platform. In
addition, a tree, which is also modelled in a very abstract way, belongs to the
scene shown in Fig. 5.15.

Fig. 5.15 A scene composed of several elementary objects

Figure 5.16 shows a possible scene graph for this scene. The root node
of the overall scene above has two child nodes. Both are transformation
groups. Elementary geometric objects, further transformation groups and
geometric transformations can be assigned to a transformation group as
child nodes. The upper two transformation groups, tgHeliPlat and
tgTree, represent the helicopter with the platform and the tree in the
scene, respectively. Each of these two transformation groups is assigned, as
a direct child node, a transformation tfHeliPlat and tfTree,
respectively, which are used to position the helicopter together with the
platform or the whole tree in the scene.

Fig. 5.16 The scene graph for the scene from Fig. 5.15

The transformation group tgTree also has two transformation groups
as child nodes, tgTrunk for the trunk and tgLeaves for the crown. The
transformation group tgTrunk contains only an elementary geometric object in
the form of a cylinder, which is generated in the coordinate origin like all
elementary objects. The transformation group tgLeaves consists of the
elementary geometric object leaves in the shape of a cone and a
transformation tfLeaves. This transformation is used to move the tree
crown created in the coordinate origin to the tree trunk.
The same applies to the transformation group tgHeliPlat, which
consists of the helicopter and the platform. Besides a transformation
tfHeliPlat for positioning in the scene, two transformation groups are
assigned to it: tgHelicopter and tgPlatform. The helicopter
consists of a transformation tfHelicopter for positioning the helicopter
on the platform and three further transformation groups to which
elementary geometric objects are assigned. The cabin of the
helicopter consists of a sphere; the tail and the rotor were each created as a
cuboid. The transformations tfTail and tfRotor are used to position
the tail at the end of the cabin and the rotor on top of the cabin, respectively.
Let $$M_{\text {tfHeliPlat}}$$ be the transformation matrix to
place the platform into the scene, $$M_{\text {tfHelicopter}}$$ the
transformation matrix to place the helicopter from the position of the
platform onto it and $$M_{\text {tfRotor}}$$ the transformation
matrix to place the rotor from the position of the helicopter onto it. Then,
for example, $$T_{\text {Rotor}}$$ is the overall transformation
matrix to position the rotor from its model coordinates to the world
coordinates (on the helicopter):
$$ T_{\text {Rotor}} = M_{\text {tfHeliPlat}}\cdot M_{\text
{tfHelicopter}} \cdot M_{\text {tfRotor}}. $$
Note that the order of this matrix multiplication corresponds to the
order of the transformations along the path in the scene graph from node
tgHeliPlat to node tgRotor. The transformations of this matrix
multiplication do not influence the transformations of other objects if the
corresponding matrices of the current object are processed individually
using a matrix stack (matrices stored in a stack). Afterwards, the matrix
stack is emptied. The transformations of other objects can be calculated
analogously in the matrix stack.
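To make the order of the matrix multiplication concrete, the following stand-alone Java sketch composes three matrices, here purely translational ones with invented values, in exactly the order of the path from tgHeliPlat to tgRotor and applies the result to the rotor modelled in the coordinate origin; it only illustrates the principle and is not code from the book's examples:

public class SceneGraphComposition {

    // 4x4 homogeneous translation matrix T(dx, dy, dz).
    static double[][] translation(double dx, double dy, double dz) {
        return new double[][] {
            { 1, 0, 0, dx },
            { 0, 1, 0, dy },
            { 0, 0, 1, dz },
            { 0, 0, 0, 1 }
        };
    }

    // Standard 4x4 matrix multiplication a * b.
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[4][4];
        for (int i = 0; i < 4; i++)
            for (int k = 0; k < 4; k++)
                for (int j = 0; j < 4; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    public static void main(String[] args) {
        // Invented example values: platform in the scene, helicopter on the platform,
        // rotor on top of the cabin.
        double[][] tfHeliPlat   = translation(10, 0, 5);
        double[][] tfHelicopter = translation(0, 1, 0);
        double[][] tfRotor      = translation(0, 0.5, 0);

        // T_Rotor = M_tfHeliPlat * M_tfHelicopter * M_tfRotor (order of the scene graph path).
        double[][] tRotor = multiply(multiply(tfHeliPlat, tfHelicopter), tfRotor);

        // The rotor is modelled at the origin; its world position is the last column.
        System.out.printf("Rotor in world coordinates: (%.1f, %.1f, %.1f)%n",
                tRotor[0][3], tRotor[1][3], tRotor[2][3]);   // (10.0, 1.5, 5.0)
    }
}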
The following section explains how to add animations to these scene
graphs.

5.4.2 Animation and Movement


Only static three-dimensional worlds have been considered so far. To
describe dynamically changing scenes, similar techniques are used as they
were considered for the two-dimensional case in Sect. 5.1.3. Movements
can be realised as stepwise interpolation or convex combination between
positions or states. The same applies to many other dynamic changes, e.g., a
slow change of colour or brightness. In the two-dimensional case,
composite or complex movements, as in the example of the second hand of
a linearly moving clock, are generated by the explicit series connection of
the individual transformations. The use of scene graphs allows a much
easier handling of the modelling of movements. Each transformation group
can be assigned a motion that is applied to all objects assigned to it. For a
better illustration, the scene with the helicopter shown in Fig. 5.15 shall be
used again. The helicopter should start its rotor and then lift off the platform
diagonally upwards. Due to the linear motion of the helicopter together with
its own rotation, the rotor performs a more complex spiral motion. In the
scene graph, however, this can be realised very easily. The rotor is created
at the coordinate origin and there its rotation around the y-axis is assigned
to the corresponding transformation group. Only then is the rotor positioned
at the top of the cabin together with its own movement. If the entire
helicopter is placed on the platform and the platform is placed together with
the helicopter in the scene, the rotation is also transformed and still takes
place in the right place. A linear movement is assigned to the
transformation group of the helicopter, which realises the take-off from the
platform.

Fig. 5.17 Section of the scene graph with dynamic transformations

A transformation group can now be assigned objects, other
transformation groups, transformations for positioning or interpolators. The
interpolators are used to describe movements. Figure 5.17 shows a section
of the extended scene graph for the helicopter, with which the rotation and
lift-off of the helicopter can be described. The names of transformation
groups containing interpolators, i.e., movements, begin with the letters tgm
(for TransformationGroup with movement).
The transformation group tgmRotor contains the rotor as a
geometrical object centred in the coordinate origin and two rotational
movements br and brStart around the y-axis, which are executed one
after the other. The first rotational movement lets the rotor rotate slowly at
first. The second rotational movement describes a faster rotation, which
should finally lead to the take-off of the helicopter. The take-off of the
helicopter is realised in the transformation group tgmHelicopter. The
movements and the positioning should be cleanly separated in the
transformation groups. The rotor remains in the transformation group
tgmRotor in the coordinate origin, where its rotation is described. This
transformation group is assigned to another transformation group
tgRotor, which positions the rotor on the helicopter cabin. This
translation is also applied to the rotational movement, since the movement
is located in a child node of tgRotor. The same applies to the movement
of the helicopter. The ascent of the helicopter is described in the
transformation group tgmHelicopter relative to the coordinate origin.
Only then is the helicopter, together with its flight movement, transformed by
the parent transformation group tgHelicopter, which places it on the
platform so that it takes off from there. It is also conceivable to swap the order
of the two transformation groups. The representation chosen here
corresponds to the modelling of the following movement: The helicopter
should take off from a platform until it finally reaches a given height h,
measured from the platform. The height of the platform is not important for
the helicopter in this movement. It must always cover a distance of h units.
However, if the helicopter is to take off from the platform and fly up to a
height h above the ground, the distance to be covered depends on the height
of the platform. In this case, the two transformation groups would be
swapped. However, in the transformation group tgmHelicopter, one
would have to define another movement starting from a different point.
Thus, the movement is calculated as the interpolation of transformations in
the scene graph.
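Such a movement between two positions can be sketched as a convex combination in a few lines of Java (again a free-standing illustration with invented values, not code from the book's examples):

public class LinearMotion {

    // Convex combination (1 - t) * start + t * end for t in [0, 1].
    static double[] interpolate(double[] start, double[] end, double t) {
        double[] p = new double[start.length];
        for (int i = 0; i < p.length; i++) {
            p[i] = (1 - t) * start[i] + t * end[i];
        }
        return p;
    }

    public static void main(String[] args) {
        double[] onPlatform = { 10, 1, 5 };    // invented start position
        double[] inTheAir   = { 13, 7, 5 };    // invented end position (diagonal ascent)
        for (double t = 0; t <= 1.0; t += 0.25) {
            double[] p = interpolate(onPlatform, inTheAir, t);
            System.out.printf("t=%.2f: (%.2f, %.2f, %.2f)%n", t, p[0], p[1], p[2]);
        }
    }
}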

5.4.3 Matrix Stacks and Their Application in the OpenGL


Sections 5.4.1 and 5.4.2 describe how individual objects in a scene can be
realised using a scene graph. In this context, it becomes clear that the
transformations acting on a particular object are those located on the path from the
object to the root of the scene graph.
Since the corresponding total matrix is created by matrix multiplication of
the corresponding matrices in reverse order, this entire matrix should only
affect one object in this scene and not influence the other objects. This is
realised with the help of matrix stacks. A stack is a data structure that stores
data on top of each other, as in a stack of books, and then empties it in the
reverse order. In a matrix stack, matrices are stored as data.

Fig. 5.18 Matrix stack in the fixed-function pipeline in display method (Java)

The last object placed on the stack is taken first again. For example, to
place two objects in a scene, the corresponding transformations of the
objects must be performed one after the other. This means, for example, that
after placing the first object in the scene, the corresponding transformations
must be undone (by applying the opposite transformations in relation to the
transformations performed, in reverse order) so that they do not affect the
second object. Afterwards, the same procedure is carried out with the
second object. This procedure would be too cumbersome. Another solution
is to use matrix stacks. With the first object, the matrices of the
transformations that are present on the path from the object to the root of
the scene graph are pushed into the matrix stack one after the other. Note
that the removal of these matrices in the matrix stack is done in reverse
order, which is identical to the order of matrix multiplication of these
matrices. With this matrix multiplication, the correct total matrix is
obtained, corresponding to the successive execution
of the transformations of this first object. Then, before processing the
second object, the matrix stack is emptied. The procedure is repeated using
the matrix stack with the second object and its corresponding matrices.
Thus, the second object is placed in the scene independently of the first one.
The same procedure can be used to display all objects of a scene graph in
the scene. Movements and all transformations of the viewing pipeline (see
Sect. 5.10) are implemented similarly. This can be studied in depth in [5].
In the OpenGL fixed-function pipeline, matrix stacks are realisable via
the gl.glPushMatrix() and gl.glPopMatrix() functions. For
example, for two objects, the matrix stack is filled and emptied again as
shown in Fig. 5.18 (in the display method).
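In the spirit of Fig. 5.18, the pattern for two objects could look roughly as follows; drawFirstObject and drawSecondObject stand for arbitrary drawing routines and, like the concrete transformation values, are assumptions of this sketch rather than code from the book:

// Called from the display method of a GL2 (fixed-function) renderer.
private void drawTwoObjects(GL2 gl) {
    gl.glMatrixMode(GL2.GL_MODELVIEW);

    gl.glPushMatrix();                      // save the current model-view matrix
    gl.glTranslatef(2f, 0f, 0f);            // transformations for the first object only
    gl.glRotatef(45f, 0f, 0f, 1f);
    drawFirstObject(gl);                    // hypothetical drawing routine
    gl.glPopMatrix();                       // restore the saved matrix

    gl.glPushMatrix();
    gl.glTranslatef(-2f, 0f, 0f);           // transformations for the second object only
    drawSecondObject(gl);                   // hypothetical drawing routine
    gl.glPopMatrix();
}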
These commands are not present in the programmable pipeline in the
core profile of OpenGL. They either must be implemented, or a library
must be imported to implement this functionality. For example, the class
PMVMatrix (JOGL) works well with the core profile as shown in Fig.
5.19.

Fig. 5.19 Matrix stack in the core profile of the programmable pipeline with class PMVMatrix in
the display method (Java)
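In the spirit of Fig. 5.19, the same pattern with the class PMVMatrix could look roughly as follows; the fields pmvMatrix and mvMatrixLocation (the uniform location of the model-view matrix in the shader program) as well as the drawing routines are assumptions of this sketch and depend on the concrete renderer and shader:

// Called from the display method of a core profile (GL3) renderer.
private void drawTwoObjects(GL3 gl) {
    float[] mv = new float[16];
    pmvMatrix.glMatrixMode(GLMatrixFunc.GL_MODELVIEW);

    pmvMatrix.glPushMatrix();                         // save the current model-view matrix
    pmvMatrix.glTranslatef(2f, 0f, 0f);               // transformations for the first object only
    pmvMatrix.glRotatef(45f, 0f, 0f, 1f);
    pmvMatrix.glGetFloatv(GLMatrixFunc.GL_MODELVIEW_MATRIX, mv, 0);
    gl.glUniformMatrix4fv(mvMatrixLocation, 1, false, mv, 0);   // hand the matrix to the shader
    drawFirstObject(gl);                              // hypothetical drawing routine
    pmvMatrix.glPopMatrix();                          // restore the saved matrix

    pmvMatrix.glPushMatrix();
    pmvMatrix.glTranslatef(-2f, 0f, 0f);              // transformations for the second object only
    pmvMatrix.glGetFloatv(GLMatrixFunc.GL_MODELVIEW_MATRIX, mv, 0);
    gl.glUniformMatrix4fv(mvMatrixLocation, 1, false, mv, 0);
    drawSecondObject(gl);                             // hypothetical drawing routine
    pmvMatrix.glPopMatrix();
}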

5.5 Arbitrary Rotations in 3D: Euler Angles, Gimbal Lock, and Quaternions
From the point of view of translations and rotations for transferring an
already scaled object from model coordinates to world coordinates, the
following six degrees of freedom exist in three-dimensional space:
The position (x, y, z) (and thus three degrees of freedom) and
the orientation (with one degree of freedom each):
– rotation around the x-axis,
– rotation around the y-axis and
– rotation around the z-axis.
With these six degrees of freedom, any (already scaled) object,
including its model coordinate system, can be brought into any position and
orientation of the world coordinate system. Each scaled object is modelled
in the model coordinate system. The model coordinate system is congruent
at the beginning with the world coordinate system, the coordinate system of
the 3D scene. By the application of these six degrees of freedom also, a
coordinate system transformation of the model coordinate system (together
with the associated object) takes place each time, which results in the final
position and orientation (of the model coordinate system together with the
object) in the world coordinate system. Orientation about the x-, y- and z-
axes takes place by applying the rotations about the x-axis (with angle
$$\theta _x$$), y-axis (with angle $$\theta _y$$) and z-axis
(with angle $$\theta _z$$). The angles $$\theta _x$$,
$$\theta _y$$ and $$\theta _z$$ are the so-called Eulerian
angles.

5.5.1 Rotation Around Any Axis


By a suitable series connection of these three rotations around the
coordinate axes x, y and z (the orientation) in combination with suitable
translations (for the final position), a rotation around any axis can be
described by any angle $$\theta $$.
This arbitrary rotation axis is mathematically represented by a vector
$$v = (x, y, z)^T$$. After translating the starting position of the
vector v into the coordinate origin, the following steps will bring the vector
v collinearly onto, for example, the z-axis with suitable standard rotations
around the x-, y- and z-axis. Afterwards, a rotation around the z-axis takes
place with the angle $$\theta $$. The transformations carried out must
be reversed in the reverse order. The corresponding matrices are multiplied
inversely in this order. The resulting total matrix then describes a matrix for
the rotation around this rotation axis with the angle $$\theta $$. In the
following, this procedure is applied to an example.
First, a translation $$T(d_x,d_y,d_z)$$ must be applied, which
shifts the rotation axis so that it passes through the coordinate origin. Then
a rotation around the z-axis is performed so that the rotation axis lies in the
yz-plane. A subsequent rotation around the x-axis allows the rotation axis to
be mapped onto the z-axis. Now the actual rotation around the angle
$$\theta $$ is executed as rotation around the z-axis. Afterwards, all
auxiliary transformations must be undone, so that a transformation of the
following form results:
$$ T(-d_x,-d_y,-d_z) \cdot R_z(-\theta _z) \cdot R_x(-\theta _x) \cdot
R_z(\theta ) \cdot R_x(\theta _x) \cdot R_z(\theta _z) \cdot T(d_x,d_y,d_z).
$$
Note that the transformations are carried out from right to left, as with
matrix multiplication.
Between the translation $$T(d_x,d_y,d_z)$$ and the
back translation at the end, the selected rotations are not the only possible choice for
bringing the rotation axis congruently onto the z-axis.
Moreover, not all rotations about the three main axes x, y and z must
necessarily be present. In the present example, an initial rotation around the
z-axis would also have brought the rotation axis into the yz-plane. Other
rotations around the x-, y- and z-axes are possible. Furthermore, a collinear
mapping of the rotation axis onto the x- or y-axis with the corresponding
matrices and rotation with the angle $$\theta $$ instead of the z-axis is
possible. Therefore, depending on the procedure, other rotation sequences
arise within this sequence.
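If the decomposition into standard rotations is not needed explicitly, OpenGL's glRotatef (likewise available on the class PMVMatrix) already implements a rotation about an arbitrary axis through the coordinate origin, so that only the two translations remain to be stated. A rough sketch with PMVMatrix, in which the point (px, py, pz) on the axis, the axis direction (ax, ay, az) and the angle in degrees are freely chosen placeholders:

// Rotation by 'angle' degrees around an axis with direction (ax, ay, az)
// through the point (px, py, pz); all variables are placeholders.
pmvMatrix.glTranslatef(px, py, pz);        // applied last: move the axis back to its place
pmvMatrix.glRotatef(angle, ax, ay, az);    // rotation about the axis direction through the origin
pmvMatrix.glTranslatef(-px, -py, -pz);     // applied first: move a point of the axis into the origin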

5.6 Eulerian Angles and Gimbal Lock


Let’s assume that each copy of an object exists at two different positions
and different orientations. There is always an axis of rotation and an angle
with which these object variants can be transferred into each other. This
means that for each orientation of an object in space, there is an axis and a
rotation angle to the original placement of the object, assuming that its size
is not changed. This can also be interpolated. Therefore, a rotation requires
only the specification of a vector describing the axis of rotation and the
corresponding angle, which specifies the rotation around this axis of
rotation. The challenge is the efficient calculation of this axis and the
corresponding angle. As explained at the end of Sect. 5.5.1, suitable
geometric transformations can execute a rotation around any rotation axis in
three-dimensional space.
An interpolation is considered between the two positions and
orientations, e.g., for an animation. The Eulerian angles are used for
interpolation in the following. Eulerian angles indicate the orientation of an
object with respect to an arbitrary but fixed orthonormal base in three-
dimensional space. Rotations with Euler angles are defined by three
rotations around the three main Cartesian axes x with base (1, 0, 0), y with
base (0, 1, 0) and z with base (0, 0, 1). This corresponds to the approach of
the series-connected execution of the corresponding homogeneous rotation
matrices of the x-, y- and z-axis. This allows that any orientation and
position of the object can be realised by the series connection of the
rotations around the x-, y- and z-axis. This is also the case if the order of the
x-, y- and z-axes is reversed. Interestingly, this property continues as long as
the first axis is different from the second, and the second from the third.
This means that a rotation, e.g., first around the x-, then around the y-axis
and then around the x-axis can create any orientation and position of the
object.
The application of the Eulerian angle poses some challenges, which is
why today’s computer graphics systems use quaternions, which are
introduced later in this chapter. In order to better understand these
challenges, the change of rotations is further investigated. The following is
an analysis of how the orientation of the object behaves when, after
performing a rotation around the x-, y- and z-axis, e.g., an x-axis rotation is
applied. A rotational sequence is not commutative. This means that the
order of the rotations is essential, and the previous application influences the
rotation of the following one. If the order is reversed, it usually results in a
different orientation or position of the object. Let $$\theta _x$$,
$$\theta _y$$ and $$\theta _z$$ be angles. For example, first,
rotate about the y-axis ( $$R_y(\theta _y)$$), then about the x- (
$$R_x(\theta _x)$$) and then about the z-axis (
$$R_z(\theta _z)$$). In the following, this rotation sequence yxz with
the orthonormal total matrix
$$R_z(\theta _z)\cdot R_x(\theta _x) \cdot R_y(\theta _y)$$ is
considered in more detail. Strictly speaking, this means that only the
rotation around the y-axis with angle $$\theta _y$$ does not affect the
other rotations of the object. After applying angle $$\theta _y$$, the
object is rotated around the x-axis with angle $$\theta _x$$. After
that, the rotation around the z-axis with angle $$\theta _z$$ is
performed on the resulting orientation. These rotations are themselves
interpreted as actions on objects. Then the order of rotations to be
performed is a hierarchy in the form of a scene graph, as shown in
Fig. 5.20.

Fig. 5.20 Hierarchy and thus mutual influence of rotations around the axes x, y and z, here rotation
order $$y$$ $$x$$ $$z$$ with rotation total matrix
$$R_z(\theta _z) \cdot R_x(\theta _x) \cdot R_y(\theta _y)$$

Since the successively executed transformations require a change of
coordinate system each time, the corresponding coordinate system
transformation is started from the model coordinate system of the scaled
object. In the beginning, the model coordinate system is congruent with the
axis of the world coordinate system. After performing the transformations,
the model coordinate system, including the object, is in the correct position
concerning the world coordinate system. This results in two views:
The intrinsic view describes the transformations done from the
perspective of an observer coupled to the object, for example, a pilot in
an aircraft. Here, the model coordinate system changes as a reference
through coordinate system transformations.
The extrinsic view represents the transformations done from the point of
view of an observer in the world coordinates. This point of view is
independent and outside of the object. The world coordinate system
remains fixed as a reference.
These two different views differ only in the reverse order of their
transformations. The view of the rotation order yxz of the example in the
chosen rotation order is thus intrinsic. In this context, it makes sense to
consider the three coordinate system changes from the model to the world
coordinate system intrinsically with the rotation order yxz. Initially, both
coordinate systems are identical. The model coordinate system is fixed with
the object throughout. If a rotation around the y-axis takes place, the object
rotates around the y-axis of the model coordinate system. Thus, the x- and z-
axes of the model coordinate system rotate with it, while the x- and z-axes
of the world coordinate system remain in their original position. However,
the x- and z-axes continue to span the same xz-plane. If the object is
subsequently rotated around the x-axis of the model coordinate system, both
the object and the y- and z-axes of the model coordinate system rotate with
it. Thus, the result of the first rotation $$R_y(\theta _y)$$ is rotated as well. When
rotating the object around the z-axis, the x- and y-axes of the model
coordinate system rotate along with the object. Therefore, this rotation also
affects the result of the first two previous rotations. For this reason, Eulerian
angles are often thought of as rotating x-, y- and z-axes, with a hierarchy
like in a scene graph, as shown in Fig. 5.20.
The first rotation influences only itself, while each subsequent rotation
influences all previous ones, according to the rotation order yxz with the
total matrix
$$R_z(\theta _z) \cdot R_x(\theta _x) \cdot R_y(\theta _y)$$. At the
top of the hierarchy is the z-axis, then the x-axis, and at the lowest level is
the y-axis. If, after rotation around these three axes, the y-axis (of the model
coordinate system) is rotated again, the result of interpolation between these
steps is no longer intuitively apparent at the latest. This is the first
disadvantage of the Eulerian angles. Another disadvantage of Eulerian
angles in interpolation is the risk of a so-called gimbal lock. As long as all
three degrees of freedom of the orientation exist, the interpolation occurs in
a straight line. However, as soon as one of the degrees of freedom
disappears, at least two axes are involved in an interpolation since they
must compensate for the lost degree of freedom. In the animation of this
interpolation, it is performed curve-like, which can lead to challenges,
especially if this behaviour is not intuitively expected.
A degree of freedom can disappear if a rotation around the middle axis
in the hierarchy is carried out by $$90^{\circ }$$, and thus the lower
hierarchy level coincides with the upper one. Rotations around the upper
and lower axes then give the same result. Therefore, one degree of freedom
is missing, which must be compensated for by the middle and upper or
lower axes. The rotation around the middle axis with $$90^{\circ }$$
thus creates a gimbal lock. A gimbal lock can be created if, in the above
example, the y-axis of the model coordinate system maps onto the z-axis of
the model coordinate system by rotating around the x-axis (which is on the
middle hierarchical level) and thus restricts the freedom of movement by
one. The first rotation dimension of the y-axis disappears as it becomes
identical to rotations around the z-axis. Strictly speaking, in this
constellation, a further rotation around the y-axis does the same as a rotation
around the z-axis. Only the x-axis extends the degree of freedom. If the z-
axis is rotated, this rotation influences all previous rotations since the z-axis
is on the highest level of the hierarchy.
Of course, this lost degree of freedom can still be realised by rotating
the axes, but one needs all three axes to compensate for it (and not just one
without gimbal lock). However, if these three rotations are executed
simultaneously as animation, the model is not animated in a straight line in
the direction of the missing degree of freedom, but experiences a
curvilinear movement to the final position, since the interaction of at least
two axes is involved.
To prevent this, the axis on the middle level of the hierarchy is assigned
the frequently occurring rotations, while no rotations of
$$90^{\circ }$$ and more on the x-axis or the z-axis may be allowed.
This means that a suitable change in the hierarchy already prevents a
gimbal lock for the application example in question. As a result, the
animation, in this case, will be running straight. Nevertheless, a gimbal lock
cannot be avoided completely. It always occurs when the axis in the middle
hierarchy level is rotated by $$90^{\circ }$$ and coincides
with the axis of the outer hierarchy level. Additionally, to make things
worse, the corresponding Euler angles cannot be calculated unambiguously
from a given rotation matrix since a rotation sequence that results in
orientation can also be realised by different Euler angles. In summary, it can
be said that the Euler angles can challenge the animator. In their pure form,
they are intuitively understandable, but the interaction increases the
complexity to such an extent that predicting the actual result is anything but
trivial.
This fact is demonstrated in the following by the example of an
animation of the camera movement. It is assumed in the following that the
hierarchy order xyz runs from low to high hierarchy level. In this case, the
rotation around the y-axis should not include a $$90^{\circ }$$
rotation; otherwise, a gimbal lock would result. Assume that the camera is
oriented in the direction of the z-axis and the y-axis runs vertically upwards
from the camera. Since the camera has to pan sideways to the left or right
very often, i.e., a yaw motion as a rotation around the y-axis, it could
reach the $$90^{\circ }$$ rotation and thus the gimbal lock.
If a new rotation is then applied, for example, around the z-axis, then the
other axes will inevitably rotate as well since the z-axis is at the highest
level of the hierarchy. An animation from the initial position of the camera
to the current one will cause the camera to pan unfavourably because in this
animation the values of two axes are actually changed. Therefore, a
different hierarchical order is chosen. A camera rarely pans up or down, so
it is best to place the x-axis in the middle of the hierarchy. In addition, the
camera rarely rolls around, so that the rotation around the z-axis should be
placed in the lowest hierarchy level. This results in the rotation order zxy
with which the Euler angles for the animated camera movement can be
worked with more easily. In aviation, automotive engineering and shipping,
the hierarchical order is zyx and axis selection with the calculation of Euler
angles is precisely controlled (see DIN 70000, DIN 9300). When designing
in CAD, a car points in the direction of the x-axis, i.e., a roll movement
causes the rotation around the x-axis. This is simulated when, for example,
one drives over a pothole and the car has to sway a little. The axis that
protrudes vertically upwards from the vehicle is the z-axis, which triggers a
yaw movement as a rotation, if, for example, the vehicle is steered in a
different direction. The y-axis causes a pitching movement as rotation,
which is used in the animation to create a braking action. With a car, or
more clearly with an aeroplane, a gimbal lock is created when the vehicle
flies steeply up or down, which would be very unrealistic. In computer
games, a nodding motion of at least $$90^{\circ }$$ is not allowed.
At the beginning of the section, it is explained that an arbitrary rotation
only requires the specification of a vector (which describes the axis of
rotation) and the corresponding angle (which specifies the rotation around
this axis of rotation). This arbitrary rotation has thereafter been realised
with a rotation sequence of Eulerian angles, for example, with the rotation
sequence yxz and the total matrix
$$M = R_z(\theta _z) \cdot R_x(\theta _x) \cdot R_y(\theta _y)$$.
When M is further examined, it turns out that this orthonormal total matrix
has an eigenvalue of one. Consequently, there is guaranteed to be an
associated eigenvector that is mapped to itself when multiplied by
M. Since M describes the arbitrary rotation, there exists a vector that this
arbitrary rotation does not influence. However, since only the rotation
axis remains in its position, this eigenvector must be the rotation axis. The
rotation angle results from trigonometric considerations, for example, the x-
axis before and after the total rotation in relation to this rotation axis (see,
for example, [2] or the Rodrigues rotation formula for vectors). Thus, only
one rotation axis and the corresponding rotation angle are needed to obtain
the same result without a gimbal lock. Another advantage is that less
memory is required because instead of a total rotation matrix, only one
vector for the rotation axis and one angle must be stored. In large scene
graphs with many rotations, it is noticeable.

5.6.1 Quaternions
Quaternions work indirectly with the axis of rotation and the associated
angle. The application of quaternions, therefore, does not pose the same
challenges as the application of Eulerian angles. The theory behind them is
not trivial, but the application is intuitive. In addition, they also require less
memory to calculate the necessary data. Consequently, quaternions are used
in today’s computer graphics for rotations.
Quaternions are an extension of the theory of complex numbers in the
plane and are associated with geometric algebra. In order to understand
them, first a shift and rotation of the points in the complex plane are
considered. It is easy to see that a multiplication of two complex numbers
of length one causes a rotation in the complex plane, while an addition
causes a shift in the complex plane.
Consider the following example. Two complex numbers $$2+i$$
and $$3 +4i$$ are given. The multiplication of the two numbers gives
$$(2+i)(3+4i) = 2(3 + 4i) + i(3 + 4i) = 2(3 + 4i) + (3i - 4) = 2 + 11i$$.
The resulting vector (2, 11) of the complex plane results from the addition
of the double vector (3, 4) with its orthogonal vector $$(- 4, 3)$$ in
the complex plane. In Fig. 5.21, the resulting vector is visualised by this
addition.

Fig. 5.21 Multiplication of two complex numbers results in a rotation


It can be seen that the multiplication of the two complex numbers
$$2 + i$$ and $$3 +4i$$ corresponds to a rotation by the sum of the
angles $$\alpha $$ and $$\beta $$ that the
corresponding vectors (2, 1) and (3, 4) enclose with the x-axis. The length of the resulting vector
(2, 11) is identical to the product of the lengths (absolute values) of the
corresponding vectors (2, 1) and (3, 4).
If we consider the addition of the two complex numbers
$$3 +4i$$ and $$2 + i$$, the complex number $$5 +5i$$
or (5, 5) is obtained as a vector in the complex plane, i.e., from the point of
view of vector notation, you get a real vector addition
$$(3,4) + (2 , 1) = (5,5)$$, which means that a displacement in the
form of the second vector (2, 1) is applied to the first (3, 4). This is shown
in Fig. 5.22.

Fig. 5.22 Addition of two complex numbers results in a translation

These two properties, which become apparent in multiplication and
addition, are generally valid for complex numbers. William Rowan
Hamilton extended these properties with quaternions in 1843 with the only
difference being that commutativity is lost in quaternions.
This means that although $$a\cdot b=b\cdot a$$ is true for two
complex numbers a and b, this is not true for two quaternions a and b. The
quaternions are defined below. While the complex numbers are defined as
$$a=x_{0}+ix_{1}$$ with real $$x_{0}$$, $$x_{1}$$ and
$$i^{2}=-1$$, quaternions are defined as
$$a=x_{0}+ix_{1}+jx_{2}+kx_{3}$$ with real
$$x_{0},\dots , x_{3}$$ and
$$\begin{aligned} i^{2}=j^{2}=k^{2}=ijk=-1. \end{aligned}$$
It can be proved that
$$\begin{aligned} ij=k, jk=i, ki=j \end{aligned}$$
and that $$i(-i)=1$$ (analogous for j and k) and thus $$-i$$
represents the reciprocal of i (analogous for j and k). From this, it follows
directly that i, j and k are not commutative but negative commutative, i.e.,
$$\begin{aligned} ij = - ji \end{aligned}$$
(analogous for all unequal couples). If the respective number of real values
is compared as dimensions, then it is noticeable that the complex numbers
are two-dimensional ( $$x_{0},x_{1}\in \mathbb {R}$$) and the
quaternions are four-dimensional
$$x_{0},x_{1},x_{2},x_{3}\in \mathbb {R}$$ (in geometric algebra
scalar, vector, bivector and trivector). While the space of complex numbers
is called $$\mathbb {C}$$, the space of quaternions is called
$$\mathbb {H}$$.
With the above rules, quaternions can be formed again by any
multiplication and addition of quaternions, whereby the properties
regarding translation and rotation are inherited according to the complex
numbers. Similar to the complex numbers, the calculations are performed,
but this time based on the corresponding rules. For clarification, an example
of use is considered below. The multiplication of the two quaternions
$$1+2i-j+3k$$ and $$-1-2i+j+k$$ is considered.
By simple multiplication and observance of the order i, j and k (not
commutative),
$$\begin{aligned} (1+2i-j+3k)(-1-2i+j+k) \end{aligned}$$
$$\begin{aligned} =-1-2i+j-3k-2i-4i^{2}+2ji-6ki+j+2ij-
j^{2}+3kj+k+2ik-jk+3k^{2} \end{aligned}$$
$$\begin{aligned} =-1-2i+j-3k-2i+4-2ij-6j+j+2k+1-3i+k-2j-i-3
\end{aligned}$$
$$\begin{aligned} =-1-2i+j-3k-2i+4-2k-6j+j+2k+1-3i+k-2j-i-3
\end{aligned}$$
$$ =1-8i-6j-2k. $$
Now the question arises of what exactly these quaternions mean
geometrically. Based on the representations in geometric algebra, the
quaternions are represented as vectors of four components. The example
therefore contains the two initial vectors $$(1,2,-1,3)^{T}$$ and
$$(-1,-2,1,1)^{T}$$ and the result vector $$(1,-8,-6,-2)^{T}$$.
Note the analogy to the complex numbers, which can also be represented as
vectors of two components. The length of a quaternion
$$(x_{0},x_{1},x_{2},x_{3})^{T}$$ is defined as
$$\sqrt{x_{0}^{2}+x_{1}^{2}+x_{2}^{2}+x_{3}^{2}}$$. For the
vectors in the example, this means that $$(1,2,-1,3)^{T}$$ has the
length $$\sqrt{15}$$, $$(-1,-2,1,1)^{T}$$ has the length
$$\sqrt{7}$$ and $$(1,-8,-6,-2)^{T}$$ has the length
$$\sqrt{105}$$. Multiplication of two quaternions of length one
causes a rotation. An addition results in a translation. It is astonishing that
this rotation can be read directly from the result vector, as explained in the
following. The vector $$(x_{0},x_{1},x_{2},x_{3})^{T}$$ can be
thought of as a two-part vector. Under the constraint that it has the length
one, roughly speaking $$x_{0}$$ represents
the cosine of half the rotation angle and $$(x_{1},x_{2},x_{3})^{T}$$
the direction of the corresponding rotation axis.
This means that what initially proved to be very difficult to calculate
under the usual analytical tools—to calculate an axis of rotation with the
angle of rotation—and therefore the detour via the Euler angles was
accepted, now turns out to be very elegant when using the quaternions.
If we assume that the length of the quaternion q is one, then the
following relationships apply to rotations about an axis. The rotation angle
is $$\delta =2\arccos (x_{0})$$ and the rotation axis a of length one
gives
$$ \left( \begin{array}{c} a_{x}\\ a_{y}\\ a_{z} \end{array}\right) =
\left( \begin{array}{c}\frac{x_1}{\sin (\frac{\delta }{2})} \\ \frac{x_2}
{\sin (\frac{\delta }{2})} \\ \frac{x_{3}}{\sin (\frac{\delta }{2})}
\end{array}\right) .$$
Thus, for the corresponding quaternion q,
$$\left( \begin{array}{c} x_{0}\\ x_{1}\\ x_{2}\\
x_{3}\end{array}\right) = \left( \begin{array}{c}\cos (\frac{\delta }{2})\\
a_{x}\cdot \sin (\frac{\delta }{2})\\ a_{y}\cdot \sin (\frac{\delta }{2})\\
a_{z}\cdot \sin (\frac{\delta }{2})\end{array}\right) . $$
In the example, the normalised result vector is
$$\frac{1}{\sqrt{105}}(1,-8,-6,-2)^{T}$$. Therefore,
$$x_{0} = \frac{1}{\sqrt{105}} = \cos (\frac{\delta }{2})$$. This results in
approximately $$84.4^\circ $$ for $$\frac{\delta }{2}$$, i.e., a rotation angle of
$$\delta \approx 168.8^\circ $$. For the axis of rotation, this yields
$$\frac{1}{\sqrt{105}}(-8,-6,-2)^{T}=-\frac{2}{\sqrt{105}}(4,3,1)^{T}=(a_x\cdot \sin (84.4^{\circ }),a_y\cdot \sin (84.4^{\circ }),a_z\cdot \sin (84.4^{\circ }))^{T}$$
and thus
$$(a_x,a_y,a_z)^{T}=(-0.79,-0.59,-0.2)^{T}$$.
The rotation matrix R resulting from a quaternion q looks like this
$$R \; = \; \left( \begin{array}{ccc} (x_{0}^2+x_{1}^2-x_{2}^2-x_{3}^2) & 2(x_{1}x_{2}-x_{0}x_{3}) & 2(x_{1}x_{3}+x_{0}x_{2}) \\ 2(x_{1}x_{2}+x_{0}x_{3}) & (x_{0}^2-x_{1}^2+x_{2}^2-x_{3}^2) & 2(x_{2}x_{3}-x_{0}x_{1}) \\ 2(x_{1}x_{3}-x_{0}x_{2}) & 2(x_{2}x_{3}+x_{0}x_{1}) & (x_{0}^2-x_{1}^2-x_{2}^2+x_{3}^2)\\ \end{array} \right) . $$
If we assume that the quaternion q has the length one and thus
$$x_{0}^{2}+x_{1}^{2}+x_{2}^{2}+x_{3}^{2}=1$$, this matrix
can be simplified to
$$R \; = \; \left( \begin{array}{ccc} (1-2x_{2}^2-2x_{3}^2) & 2(x_{1}x_{2}-x_{0}x_{3}) & 2(x_{1}x_{3}+x_{0}x_{2}) \\ 2(x_{1}x_{2}+x_{0}x_{3}) & (1-2x_{1}^2-2x_{3}^2) & 2(x_{2}x_{3}-x_{0}x_{1}) \\ 2(x_{1}x_{3}-x_{0}x_{2}) & 2(x_{2}x_{3}+x_{0}x_{1}) & (1-2x_{1}^2-2x_{2}^2)\\ \end{array} \right) . $$
If we look at the multiplication of two quaternions
$$q=(x_{0},x_{1},x_{2},x_{3})^{T}$$ and
$$q'=(x'_{0},x'_{1},x'_{2},x'_{3})^{T}$$ from this point of view, then
$$\begin{aligned} q\cdot q' = \left( \begin{array}{c} x_{0}x'_{0} -
x_{1}x'_{1} - x_{2}x'_{2} - x_{3}x'_{3} \\ x_{0}x'_{1} + x'_{0}x_{1}
+x_{2}x'_{3} - x_{3}x'_{2}\\ x_{0}x'_{2} + x_{2}x'_{0} + x_{3}x'_{1} -
x_{1}x'_{3}\\ x_{0}x'_{3} + x_{3}x'_{0} + x_{1}x'_{2} - x_{2}x'_{1}
\end{array}\right) . \end{aligned}$$
Obviously
$$\begin{aligned} q\cdot q' = \left( \begin{array}{cc} x_{0}x'_{0} -
(x_{1},x_{2},x_{3})\left( \begin{array}{c} x'_{1}\\ x'_{2}\\
x'_{3}\end{array}\right) \\ x_{0}\left( \begin{array}{c} x'_{1}\\ x'_{2}\\
x'_{3}\end{array}\right) + x'_{0}\left( \begin{array}{c} x_{1}\\ x_{2}\\
x_{3}\end{array}\right) + \left( \begin{array}{c} x_{1}\\ x_{2}\\
x_{3}\end{array}\right) \times \left( \begin{array}{c} x'_{1}\\ x'_{2}\\
x'_{3}\end{array}\right) \end{array}\right) \end{aligned}$$
which is again a quaternion, i.e., the first component represents the angle part and the
second component the (three-dimensional) axis part of the total rotation.
Note the cross product in the calculation. Quaternions can be used in the
same way as homogeneous rotation matrices to concatenate rotations
connected in series. The inverse of a quaternion q of length one is
$$q^{-1}=(x_{0}, -x_{1},-x_{2},-x_{3})^{T}$$, where the
quaternion $$1 = (1, 0, 0, 0)^{T}$$ forms the neutral element.
If one wants to rotate a point $$P=(x,y,z)$$ with a given
quaternion q, this point must first be transformed into a quaternion
$$(0, x, y, z)^{T}$$. Then the rotation is performed in the following
way, where the resulting point $$P'=(x',y',z')$$ follows
$$\begin{aligned} (0, x', y', z')^{T} = q (0, x, y, z)^{T} q^{-1}.
\end{aligned}$$
Therefore, if the rotation axis is a three-dimensional vector
$$\boldsymbol{a}$$ and the rotation angle $$\delta $$ is given,
then
$$q = \left( \begin{array}{c} \cos (\frac{\delta }{2})\\ \sin
(\frac{\delta }{2})\boldsymbol{a}\end{array}\right) $$
. The last three components of the resulting quaternion $$(0, x', y', z')^{T}$$ correspond to
the point P’ in Cartesian coordinates.
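The quaternion operations used here fit into a few lines of Java. The following stand-alone sketch (all names freely chosen, not code from the book's examples) implements the multiplication rule given above, the conjugate as the inverse of a unit quaternion and the rotation of a point according to $$q (0, x, y, z)^{T} q^{-1}$$:

public class QuaternionDemo {

    // Quaternion as (x0, x1, x2, x3) = (scalar part, vector part).
    static double[] multiply(double[] q, double[] r) {
        return new double[] {
            q[0]*r[0] - q[1]*r[1] - q[2]*r[2] - q[3]*r[3],
            q[0]*r[1] + r[0]*q[1] + q[2]*r[3] - q[3]*r[2],
            q[0]*r[2] + r[0]*q[2] + q[3]*r[1] - q[1]*r[3],
            q[0]*r[3] + r[0]*q[3] + q[1]*r[2] - q[2]*r[1]
        };
    }

    // Conjugate; for unit quaternions this is the inverse.
    static double[] conjugate(double[] q) {
        return new double[] { q[0], -q[1], -q[2], -q[3] };
    }

    // Unit quaternion for a rotation by angle delta (radians) around the unit axis (ax, ay, az).
    static double[] fromAxisAngle(double delta, double ax, double ay, double az) {
        double s = Math.sin(delta / 2);
        return new double[] { Math.cos(delta / 2), ax * s, ay * s, az * s };
    }

    // Rotates the point (x, y, z) by the unit quaternion q via q * (0, x, y, z) * q^{-1}.
    static double[] rotate(double[] q, double x, double y, double z) {
        double[] result = multiply(multiply(q, new double[] { 0, x, y, z }), conjugate(q));
        return new double[] { result[1], result[2], result[3] };
    }

    public static void main(String[] args) {
        // Rotate (1, 0, 0) by 90 degrees around the z-axis; the result is (0, 1, 0).
        double[] q = fromAxisAngle(Math.toRadians(90), 0, 0, 1);
        double[] p = rotate(q, 1, 0, 0);
        System.out.printf("(%.2f, %.2f, %.2f)%n", p[0], p[1], p[2]);
    }
}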
If, on the other hand, there is no axis of rotation, but one object must be
transferred to another, the axis of rotation and the angle must be calculated.
Assume that a vector $$\boldsymbol{v}$$ is to be transferred into
another vector $$\boldsymbol{v'}$$. Then the axis of rotation must
be perpendicular to both vectors, which is calculated using the cross
product of the two vectors:
$$\boldsymbol{n}=\boldsymbol{v}\times \boldsymbol{v'}$$. If
applicable, the resulting axis must then be normalised if the length is not
equal to one. The angle of rotation corresponds to the angle between the
two vectors and can, therefore, be determined by the scalar product of the
two vectors, since
$$\boldsymbol{v}^{T}\boldsymbol{v'} = \cos (\delta )$$ applies.
Again the quaternion
$$q= \left( \begin{array}{c} \cos (\frac{\delta }{2})\\ \sin
(\frac{\delta }{2})\boldsymbol{n}\end{array}\right) $$
performs the rotation.
To sum up, it can be said that quaternions contain the same information
as homogeneous matrices. With quaternions, there is no gimbal lock. In
addition, memory space is saved because a quaternion takes up only four
memory locations, whereas $$4\cdot 4 = 16$$ memory locations are
required for each homogeneous matrix.
The following relations hold between the Euler angles $$\alpha $$,
$$\beta $$ and $$\gamma $$ and a quaternion
$$q=(x_{0},x_{1},x_{2},x_{3})^{T}$$, where q again has the length
one, i.e., $$x_{0}^{2}+x_{1}^{2}+x_{2}^{2}+x_{3}^{2}=1$$. The
Euler angles $$\alpha $$, $$\beta $$ and $$\gamma $$
can be calculated from a quaternion as follows:
$$\begin{aligned} \alpha = \arctan \left(
\frac{2(x_{1}x_{2}+x_{0}x_{3})}{x_{0}^2+x_{1}^2-x_{2}^2-
x_{3}^2}\right) \end{aligned}$$
$$\begin{aligned} \beta = \arcsin (2(x_{0}x_{2}-x_{1}x_{3}))
\end{aligned}$$
$$\begin{aligned} \gamma = -\arctan \left(
\frac{2(x_{2}x_{3}+x_{0}x_{1})}{-(x_{0}^2-x_{1}^2-
x_{2}^2+x_{3}^2)}\right) . \end{aligned}$$
Note that it is also possible to calculate the quaternion from the Eulerian
angles. For this purpose, we refer, for example, to [2].

5.7 Clipping Volume


The representation of the section of a three-dimensional model world
that a viewer can see requires a number of details about how the viewer’s
view into the model world is to be specified. The coordinates of the point in
which the observer is located and the direction in which he looks must be
specified. However, this information is not yet sufficient. The projection
plane must be specified. It corresponds to the plane of the display medium,
usually the plane of the screen with which the viewer sees the scene as if
through a window. The screen or the display window on the screen can only
show a finite section of the projection plane. Usually, this is a rectangular
cutout. Instead of specifying a corresponding rectangle on the projection
plane, an angle is often specified that defines the viewer’s field of view.
This angle defines how far the observer’s field of view opens to the left and
right. This results in a width on the projection plane that corresponds to the
width of the display window on the screen. The height of the area on the
projection plane is selected in proportion to the height of the display
window. Figure 5.23 shows a top view of the observer’s field of view.

Fig. 5.23 The field of view angle determines the area on the projection plane that corresponds to the
window width

In principle, this information is sufficient for clipping calculation. The
three-dimensional clipping area—the clipping volume—corresponds to a
pyramid of infinite height in the case of perspective projection or to a
cuboid extending infinitely in one direction in the case of parallel
projection. The visibility of a human being is theoretically almost
unlimited. One can see stars light-years away as well as the frame of a pair
of glasses right in front of the eyes. However, when seeing, the eyes are
focussed at a distance, so that it is not possible to see a very close and a
very distant object in focus at the same time. If, for example, one fixates on
a distant object and then holds a finger relatively close in front of one eye,
one hardly sees this finger. Conversely, when reading a book, one does not
notice how a bird flies past in the distance. Therefore, the range of vision
that can be seen sharply at one time usually extends from a certain minimum to a
certain maximum distance. This property is modelled in computer graphics
by defining a front (near) and a rear (far) clipping plane. In a perspective
projection, the clipping volume thus takes the shape of a truncated pyramid,
while a parallel projection provides a cuboid as clipping volume. The
projection plane is at the distance at which the eyes of the viewer are
optimally focussed. The near and far clipping planes correspond to the
smallest and largest distance at which the viewer can still perceive objects
when focussing on the projection plane. The projection plane is usually
between the near and far clipping planes. For the sake of simplicity, the
projection plane is assumed to be identical to the near clipping plane in the
considerations of this book. Objects that lie in front of the projection plane
should give the viewer the impression that they are in front of the screen.
However, this effect can only be achieved with techniques that support the
stereoscopic vision, which is discussed in Sect. 11.12. The relationship
between clipping volume, near and far clipping plane, projection plane and
projection type is shown in Fig. 5.24.

Fig. 5.24 The clipping volume for the parallel projection (top) and perspective projection (bottom)

In Sect. 5.8, it is explained how each projection can be
split into a transformation and a subsequent projection onto an image plane
parallel to the xy-plane. Therefore, the three-dimensional clipping can be
easily and efficiently implemented. First, the transformation T is applied to
all objects. If the objects transformed in this way are mapped to the xy-
plane by means of parallel projection, the transformed clipping volume
corresponds to a cuboid whose edges are parallel to the coordinate axes.
This cuboid can be defined by two diagonally opposite corners
$$(x_{\text{ left }},y_{\text{ bottom }},-z_{\text{ near }})$$ and
$$(x_{\text{ right }},y_{\text{ top }},-z_{\text{ far }})$$. To
determine whether an object lies within this clipping volume, it is only
necessary to check whether at least one point $$(x',y',z')$$ of the
object lies within the box. This is exactly the case if
$$ x_{\text{ left }} \le x' \le x_{\text{ right }} \;\text{ and }\; y_{\text{ bottom }} \le y' \le y_{\text{ top }} \;\text{ and }\; -z_{\text{ far }} \le z' \le -z_{\text{ near }} $$
applies. Therefore, for a plane polygon, these comparisons only need to be
performed for all corners to determine whether the polygon lies within the
clipping volume. If the objects transformed in this way are mapped by
means of perspective projection onto the xy-plane, the transformed clipping
volume corresponds to the truncated pyramid. Again, there is a near
clipping plane and a far clipping plane, and the frustum in between.
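Written out as code, the inside test for a single point against the transformed, axis-parallel clipping volume is just a chain of comparisons. The following Java sketch uses freely chosen parameter names and assumes that near and far are given as positive distances with zNear smaller than zFar:

public class ClippingTest {

    // Tests whether the transformed point (x, y, z) lies inside the axis-parallel
    // clipping volume with x in [xLeft, xRight], y in [yBottom, yTop] and
    // z between -zFar and -zNear (the viewer looks along the negative z-axis).
    static boolean insideClippingVolume(double x, double y, double z,
                                        double xLeft, double xRight,
                                        double yBottom, double yTop,
                                        double zNear, double zFar) {
        return xLeft <= x && x <= xRight
                && yBottom <= y && y <= yTop
                && -zFar <= z && z <= -zNear;
    }
}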
5.8 Orthogonal and Perspective Projections
In Sect. 5.2, transformations have been used to transform geometric objects
into position or move a scene. To display a three-dimensional scene, a
projection is required on a two-dimensional plane, which corresponds to the
computer monitor, for example. These projections can also be realised by
means of geometric transformations.
When displaying a three-dimensional scene on a common output
device, such as a 2D screen, a viewer’s point of view and a projection plane
must be defined. The viewer looks in the direction of the projection plane,
which has a window with a view of the three-dimensional world. The
projection of an object onto this plane is obtained by casting rays from a projection centre, which corresponds to the observer's point of view, through the points of the object to be projected and calculating the points where these rays hit the projection plane. Only the points are
projected (no surfaces). This procedure, illustrated in Fig. 5.25 on the left, is
called perspective projection.

Fig. 5.25 Perspective (left) and parallel projection (right)

If the centre of projection is moved further and further away from the plane of projection, perpendicular to it and in the opposite direction, the projection rays become parallel in the limit. If such a projection direction is given instead of a projection centre, it is called a parallel projection. In this case, an object is projected by letting rays parallel to the direction of projection emanate from the points of the object and calculating the points of intersection with the projection plane. Usually, the projection direction is perpendicular to the projection plane. The parallel projection is shown on the right in Fig. 5.25.
At first, the parallel projection with the plane $$z=z_0$$ as the projection plane is considered, that is, the projection plane is parallel to the xy-plane. With this parallel projection, the point (x, y, z) is mapped to the point $$( x,y,z_0)$$. In homogeneous coordinates, this mapping can be represented as a matrix multiplication as follows:
(5.6)
$$\begin{aligned} \left( \begin{array}{c} x\\ y\\ z_0\\ 1 \end{array} \right) \; = \; \left( \begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & z_0\\ 0 & 0 & 0 & 1 \end{array} \right) \cdot \left( \begin{array}{c} x\\ y\\ z\\ 1 \end{array} \right) . \end{aligned}$$
This allows any parallel projection to be described in the form of a matrix multiplication in homogeneous coordinates. If the projection is to be made onto a plane that is not parallel to the xy-plane, only a corresponding transformation has to be applied before the projection matrix in Eq. (5.6), which maps the given projection plane onto a plane parallel to the xy-plane. This can always be achieved by a rotation around the y-axis and a
subsequent rotation around the x-axis, as shown in Fig. 5.26.

Fig. 5.26 Mapping of any plane to a plane parallel to the xy-plane

To understand parallel projection, it is therefore sufficient to consider only projection planes parallel to the xy-plane. In the case of another projection plane, one can instead imagine a correspondingly transformed scene, which in turn is then projected onto a plane parallel to the xy-plane.
The parallel projection in the OpenGL maps the view volume onto a unit cube whose length, width and height are in the interval $$[-1,1]$$ (see Fig. 5.27). In this mapping, there is a change from the right-handed to the left-handed coordinate system, i.e., the z-values are negated.

Fig. 5.27 Calculation of the orthogonal projection in the OpenGL

To achieve this, the values $$a_{1}$$, $$a_{2}$$, $$a_{3}$$, $$b_{1}$$, $$b_{2}$$ and $$b_{3}$$ must be calculated in the following matrix.
(5.7)
$$\begin{aligned} \left( \begin{array}{cccc} a_{1} & 0 & 0 & b_{1}\\ 0 & a_{2} & 0 & b_{2}\\ 0 & 0 & -a_{3} & b_{3}\\ 0 & 0 & 0 & 1 \end{array} \right) \cdot \left( \begin{array}{c} x\\ y\\ z\\ 1 \end{array} \right) = \left( \begin{array}{c} a_{1}\cdot x + b_{1}\\ a_{2}\cdot y + b_{2}\\ -a_{3}\cdot z + b_{3}\\ 1 \end{array} \right) = \left( \begin{array}{c} x_{\text {c}}\\ y_{\text {c}}\\ z_{\text {c}}\\ 1 \end{array} \right) \end{aligned}$$
Each component is calculated individually in the following.
The following dependencies apply to the x-coordinate:
1. $$x\mapsto x_{\text {left}}$$, $$x_{\text {c}}\mapsto -1$$: $$a_{1}\cdot x_{\text {left}} + b_{1} = -1$$,
2. $$x\mapsto x_{\text {right}}$$, $$x_{\text {c}}\mapsto 1$$: $$a_{1}\cdot x_{\text {right}} + b_{1} = 1$$.
The first equation is solved for $$b_{1}$$, resulting in
$$b_{1}=-1-a_{1}\cdot x_{\text {left}}$$ and inserted into the
second equation as follows:
$$a_{1}\cdot x_{\text {right}} + (-1-a_{1}\cdot x_{\text {left}}) = 1
\Leftrightarrow a_{1}\cdot x_{\text {right}} -a_{1}\cdot x_{\text {left}} =
2 \Leftrightarrow a_{1}\cdot ( x_{\text {right}} -x_{\text {left}}) = 2$$
$$\begin{aligned} \Leftrightarrow a_{1} = \frac{2}{x_{\text {right}} -
x_{\text {left}}}. \end{aligned}$$
For $$b_{1}$$, this results in
$$\begin{aligned} -1-(\frac{2}{x_{\text {right}} -x_{\text {left}}}) \cdot
x_{\text {left}}=\frac{x_\text {left} -x_{\text {right}}-2\cdot x_{\text
{left}}}{x_{\text {right}} -x_{\text {left}}}=-\frac{x_{\text {right}}+
x_{\text {left}}}{x_{\text {right}} -x_{\text {left}}}. \end{aligned}$$
For the y-component, analogously,
$$a_{2}=\frac{2}{y_{\text {top}} -y_{\text {bottom}}}$$ and
$$ b_{2}=-\frac{y_{\text {top}}+ y_{\text {bottom}}}{y_{\text {top}} -y_{\text {bottom}}}$$
apply.
The following dependencies apply to the z-component:
1. $$z\mapsto -z_{\text {near}}$$, $$z_{\text {c}}\mapsto -1$$: $$(-a_{3}\cdot z + b_{3}=z_{\text {c}}) \mapsto (a_{3}\cdot z_{\text {near}} + b_{3}=-1)$$,
2. $$z\mapsto -z_{\text {far}}$$, $$z_{\text {c}}\mapsto 1$$: $$(-a_{3}\cdot z + b_{3}=z_{\text {c}}) \mapsto (a_{3}\cdot z_{\text {far}} + b_{3} = 1)$$.
It should be noted that $$z_{\text {near}}$$ and
$$z_{\text {far}}$$ are both positive.
Analogously, it follows that
$$a_{3} = \frac{2}{z_{\text {far}} - z_{\text {near}}}$$ and
$$b_{3} = - \frac{z_{\text {far}} + z_{\text {near}}}{z_{\text {far}} - z_{\text {near}}}.$$
The final result for the orthogonal projection in the OpenGL is as follows:
(5.8)
$$\begin{aligned} \left( \begin{array}{cccc} \frac{2}{x_{\text {right}} -x_{\text {left}}} & 0 & 0 & -\frac{x_{\text {right}}+ x_{\text {left}}}{x_{\text {right}} -x_{\text {left}}}\\ 0 & \frac{2}{y_{\text {top}} -y_{\text {bottom}}} & 0 & -\frac{y_{\text {top}}+ y_{\text {bottom}}}{y_{\text {top}} -y_{\text {bottom}}}\\ 0 & 0 & -\frac{2}{z_{\text {far}} - z_{\text {near}}} & -\frac{z_{\text {far}} + z_{\text {near}}}{z_{\text {far}} - z_{\text {near}}}\\ 0 & 0 & 0 & 1 \end{array} \right) \cdot \left( \begin{array}{c} x\\ y\\ z\\ 1 \end{array} \right) . \end{aligned}$$
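As an illustration of Eq. (5.8), the following minimal Java sketch builds this matrix as a column-major array of 16 float values, the storage layout used by OpenGL (see Sect. 5.12). The class and method names are chosen freely here and are not part of the OpenGL or JOGL API; in the fixed-function pipeline, glOrtho sets up a corresponding matrix.

    class OrthographicProjection {
        // Builds the orthographic projection matrix of Eq. (5.8), stored column by column.
        static float[] orthographicMatrix(float left, float right, float bottom, float top,
                                          float zNear, float zFar) {
            float[] m = new float[16];                   // all other entries remain 0
            m[0]  = 2.0f / (right - left);               // scaling in x
            m[5]  = 2.0f / (top - bottom);               // scaling in y
            m[10] = -2.0f / (zFar - zNear);              // scaling in z with change of handedness
            m[12] = -(right + left) / (right - left);    // translation part (fourth column)
            m[13] = -(top + bottom) / (top - bottom);
            m[14] = -(zFar + zNear) / (zFar - zNear);
            m[15] = 1.0f;
            return m;
        }
    }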
Perspective projection can also be considered in homogeneous coordinates as a matrix–vector multiplication. For this purpose, a perspective projection with a projection centre at the coordinate origin and a projection plane parallel to the xy-plane at $$z=z_0$$ is considered, as shown in Fig. 5.28. The point (x, y, z) is projected onto a point $$(x',y',z_{0})$$ on this projection plane as shown below.
Fig. 5.28 Calculation of the perspective projection

The ray theorem results in
$$ \frac{x'}{x} \; = \; \frac{z_0}{z} \text{ and } \frac{y'}{y} \; = \;
\frac{z_0}{z} $$
respectively
$$ x' \; = \;\frac{z_0}{z} \cdot x \text{ and } y' \; = \;\frac{z_0}{z}\cdot
y. $$
This perspective projection thus maps the point (x, y, z) to the point
(5.9)
$$\begin{aligned} (x',y',z_0) \; = \; \left( \frac{z_0}{z} \cdot x, \frac{z_0}{z}\cdot y, z_0\right) . \end{aligned}$$
In the homogeneous coordinates, this mapping can be written as follows:
(5.10)
$$\begin{aligned} \left( \begin{array}{c} x'\cdot z\\ y'\cdot z\\ z_{0}\cdot z\\ z \end{array} \right) \; = \; \left( \begin{array}{c} x\cdot z_{0}\\ y\cdot z_{0}\\ z\cdot z_{0}\\ z \end{array} \right) \; = \; \left( \begin{array}{cccc} z_{0} & 0 & 0 & 0\\ 0 & z_{0} & 0 & 0\\ 0 & 0 & z_{0} & 0\\ 0 & 0 & 1 & 0 \end{array} \right) \cdot \left( \begin{array}{c} x\\ y\\ z\\ 1 \end{array} \right) . \end{aligned}$$
If the resulting point is represented in Cartesian coordinates, i.e., if the first three components are divided by the fourth component, the desired point (5.9) is obtained.
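A brief numerical example (with freely chosen values) illustrates this step: for $$z_0=1$$ and the point $$(x,y,z)=(4,2,2)$$, the matrix in Eq. (5.10) yields
$$ \left( \begin{array}{cccc} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0 \end{array} \right) \cdot \left( \begin{array}{c} 4\\ 2\\ 2\\ 1 \end{array} \right) \; = \; \left( \begin{array}{c} 4\\ 2\\ 2\\ 2 \end{array} \right) , $$
and dividing the first three components by the fourth gives the Cartesian point (2, 1, 1), in agreement with Eq. (5.9): $$\left( \frac{1}{2}\cdot 4, \frac{1}{2}\cdot 2, 1\right) = (2,1,1)$$.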
Unlike all other matrices discussed so far, the matrix for the perspective projection in Eq. (5.10) does not have (0, 0, 0, 1) as its last row. Therefore, the result, in this case, is not in normalised homogeneous coordinates.
Analogous to parallel projection, the special choice of the coordinate origin as the projection centre with a projection plane parallel to the xy-plane does not represent a real limitation in the case of perspective projection. Every perspective projection can be reduced to this special perspective projection. To do this, one moves the projection centre to the coordinate origin with the help of a transformation. Then, just as in parallel projection, the projection plane can be mapped to a plane parallel to the xy-plane by two rotations.
If the perspective projection is mirrored at the xy-plane, i.e., the projection centre (the eyepoint) remains at the coordinate origin and the projection plane moves from the positive $$z_{0}$$- to the negative $$-z_{0}$$-coordinate, this case can easily be derived from the previous one and the corresponding matrix can be set up:
(5.11)
$$\begin{aligned} \left( \begin{array}{c} x'\cdot (-z)\\ y'\cdot (-z)\\ z_{0}\cdot z\\ -z \end{array} \right) \; = \; \left( \begin{array}{c} x\cdot z_{0}\\ y\cdot z_{0}\\ z\cdot z_{0}\\ -z \end{array} \right) \; = \; \left( \begin{array}{cccc} z_{0} & 0 & 0 & 0\\ 0 & z_{0} & 0 & 0\\ 0 & 0 & z_{0} & 0\\ 0 & 0 & -1 & 0 \end{array} \right) \cdot \left( \begin{array}{c} x\\ y\\ z\\ 1 \end{array} \right) . \end{aligned}$$
It has already been mentioned that every perspective projection can be
traced back to a perspective projection with a projection centre at the
coordinate origin with a projection plane parallel to the xy-plane. To
understand perspective transformations, it is therefore sufficient to examine
the perspective projection in Eq. (5.11). All other perspective projections
can be understood as this perspective projection with an additional
preceding transformation of the object world.
A point of the form $$\left( 0,0,\frac{1}{w}\right) $$ with $$w\in {\text{I}{\!}\text{R} }$$, $$w\ne 0$$ is considered. In homogeneous coordinates, this point can be written in the form $$\left( 0,0,\frac{1}{w},1\right) $$. This point is mapped by the matrix multiplication (5.11) to the point $$\left( 0,0,-\frac{z_{0}}{w},\frac{1}{w}\right) $$ in homogeneous coordinates. In Cartesian coordinates, this means
$$ \left( 0,0,\frac{1}{w}\right) \; \mapsto \; \left( 0,0,-z_{0} \right) . $$
If one lets the parameter w go towards zero, the starting point $$\left( 0,0,\frac{1}{w}\right) $$ moves towards infinity on the z-axis, while the image point converges towards the finite point $$(0,0,-z_{0})$$. The infinitely distant object point on the negative z-axis is thus mapped to a concrete, finite image point that is displayed.
If one considers all lines that run through $$\left( 0,0,\frac{1}{w}\right) $$, the lines transformed by the matrix multiplication (5.11) intersect in the image in the homogeneous point $$\left( 0,0,-\frac{z_{0}}{w},\frac{1}{w}\right) $$. If one lets w again go towards zero, the set of lines turns into the set of lines parallel to the z-axis, which do not intersect. However, the transformed lines in the image all intersect at the point $$(0,0,-z_{0})$$. This point is also called the vanishing point.
These theoretical calculations prove the well-known effect that, in a perspective projection, parallel lines running away from the viewer intersect at one point, the vanishing point. Figure 5.29 shows a typical example of a vanishing point: railway rails that are parallel in reality but appear to converge.

Fig. 5.29 Vanishing point for perspective projection

Parallelism is not maintained in this perspective projection only for the lines running to the rear. Lines that are parallel to the x- or y-axis remain parallel after the projection. If a projection plane is selected that intersects more than one coordinate axis, multiple vanishing points are obtained. If the projection plane intersects two or even all three coordinate axes, two or three vanishing points are obtained. These are then referred to as two- or three-point perspectives. Figure 5.30 illustrates the effects of displaying a cube with one, two or three vanishing points.

Fig. 5.30 One-, two- and three-point perspective

It has been shown that any projection of a scene can always be represented with the standard transformations followed by the projection according to Eq. (5.11). A change of the observer's point of view corresponds to a change of the transformations that are carried out before this projection. In this sense, a movement of the observer can also be interpreted in such a way that the observer remains unchanged, infinitely far away, since a parallel projection is used. Instead of the observer's movement, the entire scene is subjected to a transformation that corresponds exactly to the inverse of the observer's movement. To model the viewer's movements, therefore, only the transformation group of the entire scene needs to be extended by a corresponding transformation. For example, a rotation of the observer to the left is realised by a rotation of the entire scene to the right.
Again, the camera is placed as an eyepoint at the coordinate origin with the direction of view along the negative z-axis. In the following, a further special case of a perspective projection is considered, which is used in the OpenGL fixed-function pipeline. This projection additionally determines in one step the visible volume, the so-called clipping volume, which is explained in more detail in Sect. 5.7. This is the area to be displayed in which the visible objects are located. Rendering must only be carried out for these objects. In the following Sect. 5.9, this case is explained in more detail according to the OpenGL situation.

5.9 Perspective Projection and Clipping Volume in the OpenGL
In OpenGL, the clipping volume is limited by means of perspective
projection between the near and far clipping plane. For simplicity, the near
clipping plane corresponds to the image plane in the following. Figures 5.31
and 5.32 visualise this situation. In [3], the perspective projection with an image plane that is not identical to the near clipping plane is treated in more detail and is recommended for further reading. Note the similarity of the procedures for determining the corresponding transformation matrices.

Fig. 5.31 Calculating the perspective projection in the OpenGL

Fig. 5.32 Calculating the perspective projection: Camera coordinate system in the OpenGL with U,
V and N base vectors

The lower-left corner of the near clipping plane has the coordinates $$(x_{\text {left}},y_{\text {bottom}},-z_{\text {near}})$$ and the upper-right corner $$(x_{\text {right}},y_{\text {top}}, -z_{\text {near}})$$. The far clipping plane has the z-coordinate $$-z_{\text {far}}$$. All objects that are inside the frustum should be mapped into the visible area of the screen. OpenGL makes use of an intermediate step by mapping the vertices of the objects from the frustum, the visible volume of the pyramid, into a unit cube. This unit cube extends over the interval $$[-1,1]$$ in the x-, y- and z-direction. This means that the frustum has to be transferred into this unit cube by a suitable matrix.
The following applies to object points (x, y, z) and their image points $$(x',y',-z_{\text {near}})$$:
$$x_{\text {left}}\le x'\le x_{\text {right}} \Leftrightarrow 0\le x'-
x_{\text {left}}\le x_{\text {right}}-x_{\text {left}} \Leftrightarrow 0\le
\frac{x'-x_{\text {left}}}{x_{\text {right}}-x_{\text {left}}}\le 1
\Leftrightarrow $$
$$ 0\le 2\frac{x'-x_{\text {left}}}{x_{\text {right}}-x_{\text {left}}}\le
2 \Leftrightarrow -1\le 2\frac{x'-x_{\text {left}}}{x_{\text {right}}-
x_{\text {left}}}-1\le 1 \Leftrightarrow $$
$$ -1 \le 2\frac{x'-x_{\text {left}}}{x_{\text {right}}-x_{\text {left}}}
{-}\frac{x_{\text {right}}-x_{\text {left}}}{x_{\text {right}}-x_{\text
{left}}}{\le } 1\Leftrightarrow -1\le \frac{2x'}{x_{\text {right}}-x_{\text
{left}}}-\frac{x_{\text {right}}+x_{\text {left}}}{x_{\text {right}}-
x_{\text {left}}}\le 1 \Leftrightarrow $$
$$-1\le \frac{2\cdot z_{\text {near}}\cdot x}{-z(x_{\text {right}}-x_{\text {left}})}-\frac{x_{\text {right}}+x_{\text {left}}}{x_{\text {right}}-x_{\text {left}}}\le 1 \text { with } \frac{x'}{x} = \frac{-z_{\text {near}}}{z} \text { from the ray theorem} $$
$$\begin{aligned} \Leftrightarrow -1 \le \left( \frac{2\cdot z_{\text
{near}}}{x_{\text {right}}-x_{\text {left}}}\cdot x+\frac{x_{\text
{right}}+x_{\text {left}}}{x_{\text {right}}-x_{\text {left}}}\cdot z\right)
\cdot \frac{1}{-z}\le 1. \end{aligned}$$
Analogously, one can conclude from $$y_{\text {bottom}}\le y'\le y_{\text {top}}$$ that
$$\begin{aligned} -1 \le \left( \frac{2\cdot z_{\text {near}}}{y_{\text
{top}}-y_{\text {bottom}}}\cdot y+\frac{y_{\text {top}}+y_{\text
{bottom}}}{y_{\text {top}}-y_{\text {bottom}}}\cdot z\right) \cdot
\frac{1}{-z}\le 1. \end{aligned}$$
Accordingly, the following applies to the standardised clip coordinates in
the Cartesian representation:
$$\begin{aligned} x_{\text {c}}= \left( \frac{2\cdot z_{\text {near}}}
{x_{\text {right}}-x_{\text {left}}}\cdot x+\frac{x_{\text
{right}}+x_{\text {left}}}{x_{\text {right}}-x_{\text {left}}}\cdot z\right)
\cdot \frac{1}{-z} \end{aligned}$$
and
$$\begin{aligned} y_{\text {c}} = \left( \frac{2\cdot z_{\text {near}}}
{y_{\text {top}}-y_{\text {bottom}}}\cdot y+\frac{y_{\text
{top}}+y_{\text {bottom}}}{y_{\text {top}}-y_{\text {bottom}}}\cdot
z\right) \cdot \frac{1}{-z}. \end{aligned}$$
What remains is the calculation of the normalised clip coordinates in the z-
direction and thus the entries $$a_{1}$$ and $$a_{2}$$ in the
following matrix with homogeneous coordinates
(5.12)
$$\begin{aligned} \left( \begin{array}{c} x_{\text {c}}\cdot (-z)\\ y_{\text {c}}\cdot (-z)\\ z_{\text {c}}\cdot (-z)\\ -z \end{array} \right) \; = \; \left( \begin{array}{cccc} \frac{2\cdot z_{\text {near}}}{x_{\text {right}}-x_{\text {left}}} & 0 & \frac{x_{\text {right}}+x_{\text {left}}}{x_{\text {right}}-x_{\text {left}}} & 0\\ 0 & \frac{2\cdot z_{\text {near}}}{y_{\text {top}}-y_{\text {bottom}}} & \frac{y_{\text {top}}+y_{\text {bottom}}}{y_{\text {top}}-y_{\text {bottom}}} & 0\\ 0 & 0 & a_{1} & a_{2}\\ 0 & 0 & -1 & 0 \end{array} \right) \cdot \left( \begin{array}{c} x\\ y\\ z\\ 1 \end{array} \right) \end{aligned}$$
where $$z_{\text {c}}\cdot (-z)= a_{1}\cdot z+a_{2}$$ applies.
There are two cases: first, the mapping of the points from the near clipping plane to $$-1$$, and second, the mapping of the points from the far clipping plane to 1. From these considerations, two equations with the two unknowns $$a_{1}$$ and $$a_{2}$$ result in the following manner.
1. For $$(z, z_{\text {c}})=(-z_{\text {near}}, -1)$$, $$-z_{\text {near}} = -a_{1}\cdot z_{\text {near}} + a_{2}$$ applies, and
2. for $$(z, z_{\text {c}})=(-z_{\text {far}}, 1)$$, $$z_{\text {far}} = -a_{1}\cdot z_{\text {far}} + a_{2}$$ applies.

Note the nonlinear relation between z and $$z_{\text {c}}$$. For $$z_{\text {c}}$$, there is high precision near the near clipping plane, while near the far clipping plane the precision is very low. This means that small changes of the object corner points near the camera are clearly displayed, while in the background they have little or no influence on the clipping coordinates. If the near and far clipping planes are far apart, precision errors can, therefore, occur when investigating, for example, the visibility of objects near the far clipping plane. Objects near the far clipping plane that obscure one another receive the same z-coordinate due to these precision errors and compete with each other, and therefore become alternately visible and invisible. This leads to an unwanted flickering, which is called z-fighting. It is therefore important that the scene to be displayed is enclosed as closely as possible between the near and far clipping planes to keep the distance between these two planes as small as possible. Alternatively, a 64-bit floating point representation instead of 32-bit helps to double the number of numbers that can be displayed.
After substituting the two equations into each other, the following results:
1. The first equation is solved for $$a_{2}$$, giving $$a_{2} = -z_{\text {near}} + a_{1}\cdot z_{\text {near}}$$, which is used in
2. the second equation $$z_{\text {far}} = -a_{1}\cdot z_{\text {far}} + a_{2}$$ and converted into
$$a_{1} = -\frac{z_{\text {far}} + z_{\text {near}}}{z_{\text {far}} - z_{\text {near}}}.$$
$$a_{1}$$ is used in the first equation, $$-z_{\text {near}} = -a_{1}\cdot z_{\text {near}} + a_{2}$$, and converted into
$$a_{2} = -\frac{2\cdot z_{\text {far}}\cdot z_{\text {near}}}{z_{\text {far}} - z_{\text {near}}}.$$
This results in the projection matrix for a general frustum for the conversion into the normalised device coordinates (NDC), which is the projection matrix (GL_PROJECTION) of the fixed-function pipeline in the OpenGL:
(5.13)
$$\begin{aligned} \left( \begin{array}{cccc} \frac{2\cdot z_{\text {near}}}{x_{\text {right}}-x_{\text {left}}} & 0 & \frac{x_{\text {right}}+x_{\text {left}}}{x_{\text {right}}-x_{\text {left}}} & 0\\ 0 & \frac{2\cdot z_{\text {near}}}{y_{\text {top}}-y_{\text {bottom}}} & \frac{y_{\text {top}}+y_{\text {bottom}}}{y_{\text {top}}-y_{\text {bottom}}} & 0\\ 0 & 0 & -\frac{z_{\text {far}}+z_{\text {near}}}{z_{\text {far}}-z_{\text {near}}} & -\frac{2\cdot z_{\text {far}}\cdot z_{\text {near}}}{z_{\text {far}}-z_{\text {near}}}\\ 0 & 0 & -1 & 0 \end{array} \right) \end{aligned}$$
The frustum can take on special forms that further simplify the projection matrix. If the visible volume of the frustum is symmetrical, i.e., $$x_{\text {left}}=-x_{\text {right}}$$ and $$y_{\text {bottom}}=-y_{\text {top}}$$, then the projection matrix is reduced to
(5.14)
$$\begin{aligned} \left( \begin{array}{cccc} \frac{z_{\text {near}}}{x_{\text {right}}} & 0 & 0 & 0\\ 0 & \frac{z_{\text {near}}}{y_{\text {top}}} & 0 & 0\\ 0 & 0 & -\frac{z_{\text {far}}+z_{\text {near}}}{z_{\text {far}}-z_{\text {near}}} & -\frac{2\cdot z_{\text {far}}\cdot z_{\text {near}}}{z_{\text {far}}-z_{\text {near}}}\\ 0 & 0 & -1 & 0 \end{array} \right) . \end{aligned}$$
Interestingly, this matrix can be derived if only the angle of view, the aspect ratio of the image area, $$z_{\text {near}}$$ and $$z_{\text {far}}$$ are available as parameters. The question arises how the coordinates $$x_{\text {left}}$$, $$x_{\text {right}}$$, $$y_{\text {bottom}}$$ and $$y_{\text {top}}$$ can be derived from this. First, the simple case is considered that the aspect ratio of the image area is 1:1, i.e., the image area is square. For this purpose, the angles of view $$\theta _{x}$$ in the x- and $$\theta _{y}$$ in the y-direction are introduced in the following (see Figs. 5.33 and 5.34). In most cases, the angle of view $$\theta _{y}$$ is given, from which $$\theta _{x}$$ can be calculated with trigonometric considerations.
Fig. 5.33 Calculation of the perspective projection depending on the angle of view in the
OpenGL

Fig. 5.34 Calculation of the perspective projection depending on the angle of view in the
OpenGL

According to trigonometric considerations,
$$\tan \left( \frac{\theta _{x}}{2}\right) =\frac{x_{\text {right}}}{z_{\text {near}}}$$ and $$\tan \left( \frac{\theta _{y}}{2}\right) =\frac{y_{\text {top}}}{z_{\text {near}}}.$$
Accordingly, with the reciprocal value and due to the symmetry,
$$\frac{z_{\text {near}}}{x_{\text {right}}}=\cot \left( \frac{\theta _{x}}{2}\right) $$ and $$\frac{z_{\text {near}}}{y_{\text {top}}}=\cot \left( \frac{\theta _{y}}{2}\right) .$$
The distance between the eyepoint of the camera and the image area is called the focal length f. With an image area normalised to the interval $$[-1,1]$$ in the x- and y-direction, enlarging the angle of view in the symmetrical case corresponds to a zoom out, and reducing the angle of view corresponds to a zoom in. The ratios $$\frac{x_{\text {right}}}{z_{\text {near}}}=\tan (\frac{\theta _{x}}{2})$$ and $$\frac{y_{\text {top}}}{z_{\text {near}}}=\tan (\frac{\theta _{y}}{2})$$ continue to hold for varying f according to the ray theorem. Hence, $$f=\cot \left( \frac{\theta _{y}}{2}\right) =\frac{1}{\tan \left( \frac{\theta _{y}}{2}\right) }$$ applies and thus the following matrix:
(5.15)
$$\begin{aligned} \left( \begin{array}{cccc} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & -\frac{z_{\text {far}}+z_{\text {near}}}{z_{\text {far}}-z_{\text {near}}} & -\frac{2\cdot z_{\text {far}}\cdot z_{\text {near}}}{z_{\text {far}}-z_{\text {near}}}\\ 0 & 0 & -1 & 0 \end{array} \right) \end{aligned}$$
Additionally, $$\theta _{x}=\theta _{y}$$ applies in this symmetrical case with the aspect ratio 1:1.
In this connection, the nonlinear relation between z and $$z_{\text {c}}$$, whose equation can now be specified, is as follows:
$$\begin{aligned} z_{\text {c}} = \frac{a_{1}\cdot z + a_{2}}{-z} = \frac{z_{\text {far}}+z_{\text {near}}}{z_{\text {far}}-z_{\text {near}}} + \frac{2\cdot z_{\text {far}}\cdot z_{\text {near}}}{(z_{\text {far}}-z_{\text {near}})\cdot z}. \end{aligned}$$
For example, if $$z_{\text {near}}=10$$ and $$z_{\text {far}}=30$$, the formula $$z_{\text {c}} = 2 + \frac{30}{z}$$ results. This results in a mapping between the z-values $$-10$$ and $$-30$$ as shown in Fig. 5.35.
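As a brief check of the nonlinearity (assuming the values $$z_{\text {near}}=10$$ and $$z_{\text {far}}=30$$ used above), the geometric midpoint of the depth range, $$z=-20$$, is not mapped to the middle of the interval $$[-1,1]$$ but to
$$z_{\text {c}} = 2 + \frac{30}{-20} = 0.5,$$
and the value $$z_{\text {c}}=0$$ is already reached at $$z=-15$$, i.e., half of the $$z_{\text {c}}$$ range is used up by the front quarter of the depth range.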

Fig. 5.35 Nonlinear relation between z and $$z_{\text {c}}$$

For $$z_c$$, high precision exists near the near clipping plane,
while near the far clipping plane, the precision is very low. This means that
small changes in the object corner points are displayed near the camera
while they are hardly or not perceptible in the background. Therefore, if the
two near and far clipping planes are far apart, precision errors may occur
when investigating, for example, the visibility of objects near the far
clipping plane. Objects near the far clipping plane that obscure one another receive the same z-coordinate due to these precision errors and compete with each other. As a result, they become alternately visible and invisible. This leads to
unwanted flickering, which is called z-fighting. It is therefore essential to
frame the scene to be displayed as closely as possible between the near and
far clipping planes to keep the distance between these two planes as small
as possible. Alternatively, a 64-bit floating point representation instead of
32-bit also helps double the number of numbers displayed.
This means that the further away the objects are, the more they “crowd”
into the far clipping plane due to the nonlinear mapping. However, since the
order is not changed from the near clipping plane to the far clipping plane,
this distortion is not relevant for this order, and depth calculations can still
be performed. It should be noted that there is a difference in precision in
that more bits are relatively devoted to values further ahead than to values
further back.
In OpenGL, the image area is internally oriented in x- and y-direction in
the interval $$[-1,1]$$, i.e., symmetrically in the aspect ratio 1:1.
What remains is to consider the case where the aspect ratio is not equal to 1:1. Assume that the ratio is $$\frac{w}{h}$$ with unequal image width w and image height h, i.e., the image area is rectangular. Then
the properties of the symmetrical case remain, i.e.,
$$x_{\text {left}}=-x_{\text {right}}$$ and
$$y_{\text {bottom}}=-y_{\text {top}}$$. However, the aspect ratio
$$\frac{w}{h}$$ influences the deviation between
$$x_{\text {right}}$$ and $$y_{\text {top}}$$ and thus also
the relationship between $$x_{\text {left}}$$ and
$$y_{\text {bottom}}$$.
$$x_{\text {right}}=y_{\text {top}}\cdot \frac{w}{h}$$ applies. In
addition, with trigonometric considerations,
$$y_{\text {top}}= \tan (\frac{\theta _{y}}{2})\cdot z_{\text
{near}}$$
is true.
Therefore, the matrix changes to
(5.16)
$$\begin{aligned} \left( \begin{array}{c} x_{\text {c}}\cdot (-z)\\ y_{\text {c}}\cdot (-z)\\ z_{\text {c}}\cdot (-z)\\ -z \end{array} \right) \; = \; \left( \begin{array}{cccc} \frac{f}{\text {aspect ratio}} & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & -\frac{z_{\text {far}}+z_{\text {near}}}{z_{\text {far}}-z_{\text {near}}} & -\frac{2\cdot z_{\text {far}}\cdot z_{\text {near}}}{z_{\text {far}}-z_{\text {near}}}\\ 0 & 0 & -1 & 0 \end{array} \right) \cdot \left( \begin{array}{c} x\\ y\\ z\\ 1 \end{array} \right) . \end{aligned}$$
An asymmetric frustum is used in VR. After performing the perspective division (from homogeneous to Cartesian coordinates by division by the fourth component $$-z$$), one obtains the coordinates of the vertices $$(x_{\text {c}}, y_{\text {c}}, z_{\text {c}})$$ in the unit cube, the so-called normalised device coordinates (NDC). The separation between projection and perspective division is essential in the OpenGL since clipping can be performed without the perspective division. Without the perspective division, the clipping limits are $$[-\vert z\vert , \vert z\vert ]$$ for each of the dimensions x, y and z.
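To summarise the derivation, the following minimal Java sketch assembles the matrix of Eq. (5.16) from the angle of view, the aspect ratio and the two clipping distances as a column-major float array. The class and method names are chosen for illustration and are not part of the JOGL API; in the fixed-function pipeline, gluPerspective sets up a corresponding matrix.

    class PerspectiveProjection {
        // Builds the perspective projection matrix of Eq. (5.16), stored column by column.
        static float[] perspectiveMatrix(float fovyDegrees, float aspect, float zNear, float zFar) {
            float f = (float) (1.0 / Math.tan(Math.toRadians(fovyDegrees) / 2.0)); // f = cot(fovy/2)
            float[] m = new float[16];                          // all other entries remain 0
            m[0]  = f / aspect;                                 // x entry depends on the aspect ratio
            m[5]  = f;                                          // y entry
            m[10] = -(zFar + zNear) / (zFar - zNear);           // a1 of the derivation above
            m[11] = -1.0f;                                      // copies -z into the w component
            m[14] = -(2.0f * zFar * zNear) / (zFar - zNear);    // a2 of the derivation above
            return m;
        }
    }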

5.10 Viewing Pipeline: Coordinate System Change of the Graphical Pipeline
This section summarises the geometric transformations discussed in the previous sections in the correct order in which they pass through the graphical pipeline. This part of the pipeline is also called the viewing pipeline and is part of the vertex processing stage (see Fig. 5.36).

Fig. 5.36 Transformations of the viewing pipeline with the associated coordinate systems

Initially, the objects are created in the model coordinate system. Then
they are arranged in the scene by means of scaling, rotation, translation or
distortion and thus placed in the world coordinate system. Afterwards, they
are displayed from the camera’s point of view.
The previous calculations of Sect. 5.8 assumed that the camera is located at the coordinate origin of the world coordinate system and looks in the direction of the negative z-axis of the world coordinate system. Since this is usually not the case, the transformation from the world coordinate system to the camera coordinate system must be considered in the following. First of all, the transformation from the model coordinates to the world coordinates of an object must be considered, because the model view matrix of OpenGL contains the overall transformation from the model coordinate system to the camera coordinate system. The camera coordinate system is also called the view reference coordinate system (VRC).
Following the results of Sect. 5.3, the following matrices can be set up
very easily (see Figs. 5.32 and 5.37).

Fig. 5.37 Transformations from Model to World to Camera Coordinate System

Matrix $$M_{\text {MC}\rightarrow \text {WC}}$$ from the model coordinate system MC to the world coordinate system WC and
Matrix $$M_{\text {VRC}\rightarrow \text {WC}}$$ from the camera coordinate system VRC to the world coordinate system WC.
Assuming that the axes of the model coordinate system are spanned by the three-dimensional column vectors $$\boldsymbol{m_{x}}$$, $$\boldsymbol{m_{y}}$$ and $$\boldsymbol{m_{z}}$$, and that its coordinate origin is $$\boldsymbol{P_{\text {model}}}$$ with reference to the world coordinate system, then in homogeneous coordinates
(5.17)
$$\begin{aligned} M_{\text {MC}\rightarrow \text {WC}}= \left( \begin{array}{cccc} \boldsymbol{m_{x}} & \boldsymbol{m_{y}} & \boldsymbol{m_{z}} & \boldsymbol{P_{\text {model}}}\\ 0 & 0 & 0 & 1 \end{array} \right) \end{aligned}$$
applies.
Analogously, if the axes of the camera coordinate system are spanned by the three-dimensional column vectors $$\boldsymbol{U}$$, $$\boldsymbol{V}$$ and $$\boldsymbol{N}$$, and the coordinate origin is $$\boldsymbol{P_{\text {eye}}}$$ from the point of view of the world coordinate system, then in homogeneous coordinates
(5.18)
$$\begin{aligned} M_{\text {VRC}\rightarrow \text {WC}}= \left( \begin{array}{cccc} \boldsymbol{U} & \boldsymbol{V} & \boldsymbol{N} & \boldsymbol{P_{\text {eye}}}\\ 0 & 0 & 0 & 1 \end{array} \right) \end{aligned}$$
applies. However, since the world coordinate system is to be mapped into the camera coordinate system, the inverse $$M_{\text {VRC}\rightarrow \text {WC}}^{-1}$$ must be applied. The overall matrix GL_MODELVIEW in the OpenGL thus results from the matrix multiplication
$$M_{\text {VRC}\rightarrow \text {WC}}^{-1}\cdot M_{\text {MC}\rightarrow \text {WC}}.$$
The input parameters in the OpenGL for the function gluLookAt for the definition of the camera coordinate system are not directly the axes $$\boldsymbol{U}$$, $$\boldsymbol{V}$$ and $$\boldsymbol{N}$$ and its coordinate origin, but the eyepoint $$P_{\text {eye}}$$, the up-vector $$V_{\text {up}}$$ and the reference point $$P_{\text {obj}}$$. The up-vector $$V_{\text {up}}$$ is a vector from the point of view of the world coordinate system which points upwards; mostly $$(0,1,0)^{T}$$, the vector of the y-axis, is used. In order to calculate the axes of the camera coordinate system, and thus the matrix $$M_{\text {VRC}\rightarrow \text {WC}}$$, the following calculations are performed. Let $$P_{\text {obj}}$$ be a vertex of the object.
First, the normalised connection vector
$$\boldsymbol{F}=\frac{P_{\text {obj}}-P_{\text {eye}}}{\vert P_{\text {obj}}-P_{\text {eye}}\vert }$$
is calculated, which specifies the direction of view of the camera. The normalised vector $$\boldsymbol{U}$$ is perpendicular to the direction of view of the camera and to the up-vector and thus results from the cross product $$\boldsymbol{U}=\boldsymbol{F}\times \boldsymbol{V_{\text {up}}}$$. The normalised vector $$\boldsymbol{V}$$ is, of course, orthogonal to the two vectors $$\boldsymbol{U}$$ and $$\boldsymbol{F}$$ and is calculated by $$\boldsymbol{V}=\boldsymbol{U}\times \boldsymbol{F}$$. Finally, the normalised vector $$\boldsymbol{N}$$ is identical to $$-\boldsymbol{F}$$. Then a projection, which is usually the perspective projection, is made into the clipping coordinates. Then clipping (see Sect. 8.1) is applied.
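The following minimal Java sketch (illustrative names, not the JOGL API) carries out exactly these calculations: the axes $$\boldsymbol{U}$$, $$\boldsymbol{V}$$ and $$\boldsymbol{N}$$ are derived from the eyepoint, the reference point and the up-vector, and the inverse of $$M_{\text {VRC}\rightarrow \text {WC}}$$ is assembled as a column-major float array, analogous to what gluLookAt does in the fixed-function pipeline.

    class CameraMatrix {
        // Computes the view matrix (the inverse of M_VRC->WC) from eyepoint, reference point and up-vector.
        static float[] viewMatrix(float[] eye, float[] ref, float[] up) {
            float[] f = normalize(sub(ref, eye));   // F: direction of view
            float[] u = normalize(cross(f, up));    // U = F x Vup
            float[] v = cross(u, f);                // V = U x F, already normalised
            float[] n = { -f[0], -f[1], -f[2] };    // N = -F
            // Inverse of the rigid transform (U V N | Peye): rotation part transposed,
            // translation part -(U.Peye, V.Peye, N.Peye); stored column by column.
            float[] m = new float[16];
            m[0] = u[0]; m[4] = u[1]; m[8]  = u[2]; m[12] = -dot(u, eye);
            m[1] = v[0]; m[5] = v[1]; m[9]  = v[2]; m[13] = -dot(v, eye);
            m[2] = n[0]; m[6] = n[1]; m[10] = n[2]; m[14] = -dot(n, eye);
            m[15] = 1.0f;
            return m;
        }
        static float[] sub(float[] a, float[] b) { return new float[] { a[0]-b[0], a[1]-b[1], a[2]-b[2] }; }
        static float dot(float[] a, float[] b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
        static float[] cross(float[] a, float[] b) {
            return new float[] { a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0] };
        }
        static float[] normalize(float[] a) {
            float length = (float) Math.sqrt(dot(a, a));
            return new float[] { a[0]/length, a[1]/length, a[2]/length };
        }
    }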
Let w be the fourth component of the homogeneous coordinates (see Sect. 5.1.1), i.e., the component by which one must divide in the perspective division to obtain the corresponding Cartesian coordinates. The advantage of performing the clipping calculations already now, and not only in the next step after the perspective division, is that clipping can be applied to the interval $$[-w,w]$$. The comparisons are just as efficient, and the special case $$w = 0$$ does not need to be treated separately. In addition, the perspective division then does not have to be carried out for all transformed vertices, which leads to an increase in efficiency. Then, by applying the perspective division, the frustum is mapped into the normalised device coordinate system (NDC), which is a unit cube extending over $$[-1, 1]$$ in all three components. The corresponding matrix in the OpenGL is the GL_PROJECTION matrix. Finally, the mapping is done to the window of the respective device, which is represented in the window space coordinate system (DC).
In the last step, the clip coordinates are mapped to the respective window of the end device. Assume that, for the DC coordinate system of the window, the x-axis is directed to the right, the y-axis points to the top and the coordinate origin is at (x, y). The mapping of the unit cube from the normalised device coordinate system (NDC) into this two-dimensional coordinate system then has to be carried out using the following rules:
1.
$$-1 \rightarrow x$$, $$1 \rightarrow (x + \text {width})$$;
2.
$$-1 \rightarrow y$$, $$1 \rightarrow (y + \text {height})$$;
3. $$-1 \rightarrow z+\text {near}$$,
$$1 \rightarrow (z+\text {far})$$.
A visualisation of these relationships is shown in Fig. 5.38.

Fig. 5.38 Transformations from the normalised device coordinate system to the window space
coordinate system (z-component neglected)

The mapping from the NDC to the DC is a linear mapping


1.
from interval $$[- 1,1]$$ to interval $$[x,x+\text {width}]$$,
2.
from interval $$[- 1,1]$$ to interval $$[y,y+\text {height}]$$ and
3.
from interval $$[-1,1]$$ to interval
$$[z+\text {near},z+\text {far}]$$
with (x, y) coordinate origin, window width and window height and is
calculated as follows:
1.
$$a_{1}\cdot (-1)+b_{1}=x$$,
$$a_{1}\cdot 1 + b_{1}=x + \text {width}$$;
2.
$$a_{2}\cdot (-1)+b_{2}=y$$,
$$a_{2}\cdot 1 + b_{2}=y + \text {height}$$;
3.
$$a_{3}\cdot (-1)+b_{3}=z+\text {near}$$,
$$a_{3}\cdot 1 + b_{3}=z+\text {far}$$.
For the first two equations applies
$$\begin{aligned} a_{1}\cdot (-1)+b_{1}=x \Leftrightarrow b_{1}=x +
a_{1} \end{aligned}$$
and
$$\begin{aligned} a_{1}+b_{1}=x + \text {width} \Leftrightarrow
a_{1}+x + a_{1} =x + \text {width} \Leftrightarrow a_{1}=\frac{\text
{width}}{2} \end{aligned}$$
and thus
$$\begin{aligned} -\frac{\text {width}}{2} + b_{1} = x \Leftrightarrow
b_{1} = x + \frac{\text {width}}{2}. \end{aligned}$$
For the next two equations, it follows analogously that
$$\begin{aligned} a_{2}=\frac{\text {height}}{2} \text { and } b_{2} = y + \frac{\text {height}}{2}. \end{aligned}$$
The last two equations remain to be examined. It is necessary that
$$\begin{aligned} a_{3}\cdot (-1)+b_{3}=z+\text {near} \Leftrightarrow
b_{3}=z+\text {near}+a_{3} \end{aligned}$$
and therefore
$$ a_{3} \cdot 1 +b_{3}=z+\text {far} \Leftrightarrow a_{3}=z+\text
{far}-z-\text {near}-a_{3} \Leftrightarrow 2\cdot a_{3}=\text {far}-\text
{near} \Leftrightarrow a_{3}=\frac{\text {far}-\text {near}}{2}. $$
In addition,
$$\begin{aligned} b_{3}=z+\text {near}+\frac{\text {far}-\text {near}}
{2}=z+\frac{\text {far}+\text {near}}{2}. \end{aligned}$$
This results in the total mapping (in Cartesian coordinates)
(5.19)
$$\begin{aligned} \left( \begin{array}{c} x_{\text {window}}\\ y_{\text {window}}\\ z_{\text {window}} \end{array} \right) \; = \; \left( \begin{array}{c} \frac{\text {width}}{2}\cdot x_{\text {c}} + \left( x+\frac{\text {width}}{2}\right) \\ \frac{\text {height}}{2}\cdot y_{\text {c}} + \left( y +\frac{\text {height}}{2}\right) \\ \frac{\text {far}-\text {near}}{2}\cdot z_{\text {c}} + \left( z + \frac{\text {far}+\text {near}}{2}\right) \end{array} \right) \end{aligned}$$
and in the homogeneous representation by means of matrix multiplication
(5.20)
$$\begin{aligned} \left( \begin{array}{c} x_{\text {window}}\\ y_{\text {window}}\\ z_{\text {window}}\\ 1 \end{array} \right) \; = \; \left( \begin{array}{cccc} \frac{\text {width}}{2} & 0 & 0 & x+\frac{\text {width}}{2}\\ 0 & \frac{\text {height}}{2} & 0 & y +\frac{\text {height}}{2}\\ 0 & 0 & \frac{\text {far}-\text {near}}{2} & z + \frac{\text {far}+\text {near}}{2}\\ 0 & 0 & 0 & 1 \end{array} \right) \cdot \left( \begin{array}{c} x_{\text {c}}\\ y_{\text {c}}\\ z_{\text {c}}\\ 1 \end{array} \right) . \end{aligned}$$
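As a small illustration of Eq. (5.19), the following Java sketch (illustrative names, not an OpenGL call) maps normalised device coordinates to window coordinates for a viewport with origin (x, y), a depth origin z and the given width, height and depth range:

    class ViewportMapping {
        // Applies Eq. (5.19): maps NDC coordinates (xc, yc, zc) in [-1,1] to window coordinates.
        static float[] toWindowCoordinates(float xc, float yc, float zc,
                                           float x, float y, float z,
                                           float width, float height, float near, float far) {
            float xWindow = 0.5f * width * xc + x + 0.5f * width;
            float yWindow = 0.5f * height * yc + y + 0.5f * height;
            float zWindow = 0.5f * (far - near) * zc + z + 0.5f * (far + near);
            return new float[] { xWindow, yWindow, zWindow };
        }
    }

With, for example, z = 0, near = 0 and far = 65535, this corresponds to the 16-bit depth buffer mapping mentioned below.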

Fig. 5.39 Transformations from the window space coordinate system with y-axis up to the
coordinate system with y-axis down [4]

The mapping to the size of the screen with an arbitrary coordinate origin and positive x- and y-coordinates seems obvious, but the question arises why the z-coordinates are transformed again. Of course, these remain between $$-1$$ and 1, but due to the z-fighting phenomenon, they are transferred to a larger range, especially if the storage only allows a certain degree of accuracy. The depth buffer in typical OpenGL
implementations (depending on the graphics processor), for example, only
allows certain accuracies, typically integer values of 16, 24 or 32 bits. This
means that the numerical interval $$[-1,1]$$ is first mapped to
$$[0,1]$$ and then, for example, with 16-bit accuracy, to the interval
$$[0,2^{16}-1]= [0,65535]$$. $$z_{\text {window}}$$ is
calculated with the matrix calculation rule by setting
$$\text {near}=0$$ and $$\text {far}=65535$$.
For many image formats, however, the coordinate origin is expected at the top left, for example, for which further transformations are necessary as a post-processing step. To transfer this output to the two-dimensional coordinate system in which the y-axis points down, the following steps must be carried out (see Fig. 5.39):
1.
Move by $$-y_{\text {height}}$$.
2. Mirroring on the x-axis (i.e., multiplying the y-values by $$-1$$).
5.11 Transformations of the Normal Vectors
For the calculations of the illumination (see Chap. 9), the normals of the
corner points are needed. Since the corner points are transformed in the
viewing pipeline, these transformations must also be applied to the normal
vectors. In the fixed-function pipeline, this adjustment is automated, but in
the programmable pipeline, it is left to the programmer.
One could assume that it is sufficient to multiply each normal vector by
the total matrix of the transformations of the viewing pipeline. The
following consideration refutes this hypothesis. A normal vector is strictly
speaking a (directional) vector that only needs to be rotated during the
transformation.
If the normal vectors are present and orthogonal to the surfaces at the corresponding vertices, it would be desirable that this property is also preserved by a transformation. In this case, the transformed normal vectors must again be orthogonal to the transformed surfaces at the corresponding vertices.
normal vector $$\boldsymbol{n}^{T}$$, and for the total matrix M of
the transformation in the viewing pipeline, the following applies:
$$\begin{aligned} \boldsymbol{n}^{T}\cdot ((x,y,z)^{T}-
\boldsymbol{v})=0 \Leftrightarrow \boldsymbol{n}^{T}\cdot
\boldsymbol{r} = 0 \Leftrightarrow \boldsymbol{n}^{T}\cdot M^{-1}
\cdot M \cdot \boldsymbol{r} = 0 \end{aligned}$$
$$\begin{aligned} \Leftrightarrow ((M^{-1})^{T}\cdot
\boldsymbol{n})^{T} \cdot M \cdot \boldsymbol{r} = 0 \Leftrightarrow
\boldsymbol{\bar{n}}^{T}\cdot \boldsymbol{r'} = 0. \end{aligned}$$
Thus, the transformed normal vector is
$$\boldsymbol{\bar{n}}^{T}=((M^{-1})^{T}\cdot \boldsymbol{n})^{T}$$
and the tangential vector (direction vector of the plane) is
$$\boldsymbol{r'}= M \cdot \boldsymbol{r} = M \cdot ((x,y,z)^{T}-\boldsymbol{v}),$$
where $$\boldsymbol{v}$$ is the position vector of a point of the spanned plane.
Obviously, tangents are transformed with the total matrix M of the
viewing pipeline, like the vertices. The normal vectors in this respect must
be transformed with the transposed inverse $$(M^{-1})^{T}$$ in
order to remain orthogonal to the object. Note the special case that M is orthonormal, in which case $$(M^{-1})^{T}$$ is equal to M; therefore, an explicit matrix inversion is not necessary in this case.
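A small numerical example with a freely chosen matrix (restricted to the linear $$3\times 3$$ part for simplicity) illustrates why the total matrix itself is not sufficient. Let M be a non-uniform scaling and consider a tangent $$\boldsymbol{r}$$ and a normal $$\boldsymbol{n}$$ with $$\boldsymbol{n}^{T}\cdot \boldsymbol{r}=0$$:
$$ M=\left( \begin{array}{ccc} 2 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{array} \right) , \qquad \boldsymbol{r}=\left( \begin{array}{c} 1\\ -1\\ 0 \end{array} \right) , \qquad \boldsymbol{n}=\left( \begin{array}{c} 1\\ 1\\ 0 \end{array} \right) . $$
The transformed tangent is $$\boldsymbol{r'}=M\cdot \boldsymbol{r}=(2,-1,0)^{T}$$. Transforming the normal with M as well would give $$(2,1,0)^{T}$$ with $$(2,1,0)\cdot \boldsymbol{r'}=3\ne 0$$, so it would no longer be orthogonal. With the transposed inverse, $$(M^{-1})^{T}\cdot \boldsymbol{n}=\left( \frac{1}{2},1,0\right) ^{T}$$, and indeed $$\left( \frac{1}{2},1,0\right) \cdot \boldsymbol{r'}=0$$.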

5.12 Transformations of the Viewing Pipeline in the OpenGL
The $$4\times 4$$ transformation matrices for homogeneous coordinates are represented in the OpenGL by an array of 16 elements of type float. The matrix entries are stored column by column.
In the following, the fixed-function pipeline is considered first (see Figs. 5.40 and 5.41). The GL_MODELVIEW matrix realises the transformations of objects: the corresponding transformation matrices are multiplied in reverse order and stored as an overall matrix in GL_MODELVIEW. The GL_MODELVIEW matrix is activated via the command glMatrixMode(GL_MODELVIEW) and is manipulated, among other things, using the functions glLoadIdentity, glLoadMatrix, glMultMatrix, glPushMatrix, glPopMatrix, glRotate, glScale and glTranslate. Since OpenGL works like a state machine, these transformations of an object are only applied when drawing.

Fig. 5.40 The transformations in the context of vertex processing in the OpenGL

Fig. 5.41 Transformations of the viewing pipeline in the fixed-function pipeline of OpenGL

First, the GL_MODELVIEW matrix is overwritten with the $$4\times 4$$ identity matrix. The function glLoadIdentity does this. Then the necessary transformations are applied by multiplying their matrices in reverse order.
Assume that first a rotation and then a translation is to take place; then the functions glRotate and glTranslate are called with the corresponding parameters in reverse order. The functions glMultMatrix, glRotate, glScale and glTranslate each generate a $$4\times 4$$ matrix, which is multiplied from the right onto the GL_MODELVIEW matrix, also a $$4\times 4$$ matrix.
Afterwards, the transformation into the camera coordinate system takes place, which is considered in Sect. 5.8. The corresponding $$4\times 4$$ matrix is then also multiplied from the right onto the GL_MODELVIEW matrix. Thus, the resulting GL_MODELVIEW matrix consists of the multiplication of the matrix resulting from the transformation into the camera coordinate system, $$M_{\text {view}}$$, with the total matrix of the geometric transformations, $$M_{\text {model}}$$, i.e., $$M_{\text {view}}\cdot M_{\text {model}}$$. The resulting total matrix is applied to each vertex in the form of parallel processing. In OpenGL, all transformations from the model coordinate system to the camera coordinate system are summarised in the GL_MODELVIEW result matrix. For the projection from the camera coordinate system into the normalised device coordinate system, the GL_PROJECTION matrix is activated with glMatrixMode(GL_PROJECTION). All known matrix operations and additionally gluPerspective can be executed on the GL_PROJECTION matrix. Finally, glViewport is used to transform the result to the window of the output device. With the programmable pipeline (core profiles), the programmer has to implement most of the functions previously explained for the fixed-function pipeline in the vertex shader himself (except the perspective division and the viewport transformation), which in turn offers complete design freedom. It should be mentioned that the class PMVTool provides the corresponding functions and can be used well with the core profile. However, all calculations are performed on the CPU. In Chap. 2, this programmable pipeline is described. Since OpenGL is a state machine, the current states of the matrices only influence the corresponding transformations of the objects when drawing. Corresponding source code snippets, marked with the comment “display code”, can be found in Sect. 2.5.
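The following sketch illustrates the call order described above in a display method of a JOGL program using the fixed-function pipeline (assuming the com.jogamp.opengl package names of current JOGL versions); the concrete camera and object values are arbitrary example values.

    import com.jogamp.opengl.GL2;
    import com.jogamp.opengl.glu.GLU;

    public class ModelViewOrderSketch {
        private final GLU glu = new GLU();

        void setModelView(GL2 gl) {
            gl.glMatrixMode(GL2.GL_MODELVIEW);
            gl.glLoadIdentity();
            // Transformation into the camera coordinate system (M_view).
            glu.gluLookAt(0f, 0f, 5f,    // eyepoint
                          0f, 0f, 0f,    // reference point
                          0f, 1f, 0f);   // up-vector
            // Model transformations (M_model): each call multiplies its matrix from the
            // right, so with the call order translate, rotate the object is effectively
            // rotated first and then translated.
            gl.glTranslatef(1f, 0f, 0f);
            gl.glRotatef(45f, 0f, 0f, 1f);
            // ... draw the object here ...
        }
    }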

5.13 Exercises
Exercise 5.1 Let there be a two-dimensional sun–earth orbit. The sun is at
the centre (i.e. its centre is at the coordinate origin). The earth is visualised
as a sphere with a radius of 10 length units. The earth (i.e., its centre) moves
evenly on its circular orbit counter-clockwise around the sun. The radius of
the orbit is 200 units of length. The initial position of the earth (i.e., its
centre) is (200, 0). During the orbit of the sun, the earth rotates 365 times
evenly around itself (counter-clockwise). Let P be the point on the earth’s
surface that has the least distance from the sun in the initial position. Using geometric transformations, find out where P is located after the earth has completed one-third of its orbit.
Proceed as follows.
(a)
Consider which elementary transformations are necessary to solve the
task and arrange them in the correct order.
(b)
Specify the calculation rule to transform the point P using the transformations from (a). Make sure that the multiplications are in the correct order.
(c)
Calculate the matrix for the total transformation in homogeneous
coordinates and determine the coordinates of the transformed point P.

Exercise 5.2 Select the constant c in the matrix
$$ \left( \begin{array}{ccc} c & 0 & 6\\ 0 & c & 4\\ 0 & 0 & c \end{array} \right) $$
so that the matrix in homogeneous coordinates represents a translation by the vector $$(3,2)^\top $$.

Exercise 5.3 Program a small animation showing the movement from task
Exercise 5.1.

Exercise 5.4 Program a small animation in which a beating heart moves across a screen window.

Exercise 5.5 Apply the technique for converting two letters shown in
Fig. 5.10 for D and C to two other letters, e.g., your initials.
Exercise 5.6 The following chair is to be positioned on the xz-plane
centrally above the coordinate origin so that it stands on the xz-plane. The
backrest should be oriented to the rear (direction of the negative z-axis).
The chair has the following dimensions:
The four chair legs have a height of 1.0 with a square base of side length
0.1.
The square seat has the side length 0.8 and the thickness 0.2.
The cylindrical backrest has the radius 0.4 and the thickness 0.2.
Construct the chair using the elementary objects
$$\text {box}(x,y,z)$$, which creates a box of width 2x, height 2y and
depth 2z with the centre at the coordinate origin, and
$$\text {cylinder}(r,h)$$, which creates a cylinder of radius r and height h with the centre at the origin.
(a)
Draw a scene graph for the chair in which each node is assigned a
suitable transformation.
(b)
Specify the geometric elements and geometric transformations of the
scene graph by selecting appropriate types of geometric
transformations and specifying precise parameter values for the
geometric elements and transformations.
(c)
Write an OpenGL program to visualise this chair.

Exercise 5.7 Write a program in which the individual parts of the chair
shown in Fig. 5.14 are first positioned next to each other and the chair is
then assembled as an animation.

Exercise 5.8 The perspective projection onto a plane is to be reduced to a parallel projection onto the xy-plane. For this purpose, the projection plane is considered which is defined by the normal vector $$\frac{1}{\sqrt{3}} (1,1,1)^{T}$$ and the point $$P=(2,4,3)$$ lying in the plane. The projection centre is located at the coordinate origin. Calculate suitable geometric transformations for the reduction to a parallel projection. Specify the projection as a series-connected execution of the determined transformations. Proceed as follows:
(a) Specify the plane in the Hessian normal form. What is the distance of the plane from the origin?
(b)
Specify the projection reference point (PRP) and the view reference
point (VRP). When calculating the VRP, keep in mind that the
connection between the PRP and VRP is parallel to the normal vector
of the plane.
(c)
Specify the transformation to move the VRP to the origin.
(d)
Specify the transformations in order to rotate the plane into the xy-
plane based on the transformation from task part (c).
(e)
Specify the transformation in order to build on the transformations
from task parts (c) and (d) and move the transformed projection
centre to the z-axis (to the point $$(0,0,-z_0 )$$) if necessary.
Determine $$z_0$$.
(f)
Specify the transformations that are necessary to transform the result
of (e) into the standard projection, PRP in the coordinate origin and
VRP on the negative z-axis.
(g)
Enter the calculation rule for determining the total transformation,
which consists of the individual transformation steps from the
previous task parts.

Exercise 5.9 The following perspective projection is given from the camera's point of view in the viewing pipeline. The projection reference point is at (0, 0, 24). The view reference point is at the coordinate
origin. The projection surface is the xy-plane with the limitation
$$-x_{\text {min}}=x_{\text {max}}=3$$ and
$$-y_{\text {min}}=y_{\text {max}}=2$$. The near plane has the
distance $$d_{\text {min}}=10$$ and the far plane has the distance
$$d_{\text {max}}=50$$ to the eye point. Explain the necessary
transformations until shortly before mapping to the device mapping
coordinate system and specify the corresponding matrix multiplication.
Calculate the total matrix.
Exercise 5.10 In this task, the individual steps of the viewing pipeline
from “modelling” to “device mapping” are to be traced. For the sake of
simplicity, it is assumed that the scene, which consists of individual objects,
has already been modelled. The following information is available in the
coordinate system after the modelling is complete. The eyepoint (synthetic
camera, projection reference point) is at $$(4,10,10)^{T}$$, and the view
reference point is at $$(0,0,0)^{T}$$. The camera has a coordinate
system which is spanned by the vectors $$U ^{T}= (0.928,0.0,-0.371)^{T}$$, $$V^{T}=(-0.253,0.733,-0.632)^{T}$$ and $$N^{T}=(0.272,0.68,0.68)^{T}$$. The projection surface
has the dimensions $$x_{\text {min}}= -x_{\text {max}}= -2.923$$ and
$$y_{\text {min}}= -y_{\text {max}}= -2.193$$. Furthermore, let
$$d_{\text {min}} = 0.1$$ and $$d_{\text {max}} = 100$$. The output
device has a resolution of 640 $$\times $$ 480 pixels.
(a)
Enter the names of the involved steps of the viewing pipeline and
coordinate systems to move the objects from the model coordinate
system to the coordinates of the output device.
(b)
Specify the matrices that describe the steps of the viewing pipeline
mentioned under task part (a). Assume here that the modelling of the
scene has already been completed.
(c)
How is the matrix of the overall transformation calculated from the
results of subtask (b) in order to transform the scene into the
coordinates of the output device?

Exercise 5.11 In the fixed-function pipeline of OpenGL, the matrix of the transfer to the camera coordinate system VRC is created with the call
gluLookAt(eyeX, eyeY, eyeZ, centerX, centerY, centerZ, upX, upY, upZ).
The subsequent perspective projection is created with the call
gluPerspective(fovy, aspect, zNEAR, zFAR).

In this task, it shall be understood how the steps of the viewing pipeline
can be calculated from these specifications alone. For the sake of simplicity,
it is assumed that the scene, which consists of individual objects, is already
modelled. The following information is available in the coordinate system
after the modelling is completed.
The projection reference point
$$(\text {eyeX},\text {eyeY},\text {eyeZ})^{T}=(0,0,0)^{T}$$ and
the view reference point at
$$(\text {centerX},\text {centerY},\text {centerZ})^{T}=
(-4,-10,-10)^{T}$$
are given. The up-vector $$V_{\text {up}}$$ is $$(\text {upX},\text {upY},\text {upZ})^{T} =(4,11,10)^{T}$$. The
angle of view (field of view, fovy) of the camera is $$22.5^\circ $$.
Furthermore, $$z_{\text {far}} = 100$$ (zFAR) and
$$z_{\text {near}}$$ is identical to the distance of the image area
from the eyepoint in the z-direction (zNEAR). The output device has a
resolution of 640 $$\times $$ 480 pixels (aspect). Assume the
symmetrical case.
(a)
Calculate the base vectors $$\boldsymbol{U}$$,
$$\boldsymbol{V}$$ and $$\boldsymbol{N}$$ of the camera
coordinate system VRC.
(b)
Calculate $$z_{\text {near}}$$, $$x_{\text {left}}$$,
$$x_{\text {right}}$$, $$y_{\text {bottom}}$$ and
$$y_{\text {top}}$$ of the image area of the camera coordinate
system PRC.
The results can be used to determine the steps of the viewing pipeline as
in task Exercise 5.10.

References
1. E. Lengyel. Mathematics. Vol. 1. Foundations of Game Engine Development. Lincoln: Terathon Software LLC, 2016.
2. E. Lengyel. Mathematics for 3D Game Programming and Computer Graphics. Boston: Course Technology, 2012.
3. E. Lengyel. Rendering. Vol. 2. Foundations of Game Engine Development. Lincoln: Terathon Software LLC, 2019.
4. NASA. “Taken Under the ‘Wing’ of the Small Magellanic Cloud”. Retrieved 8 February 2021, 19:21. URL: https://www.nasa.gov/image-feature/taken-under-the-wing-of-the-small-magellanic-cloud
5. A. Nischwitz, M. Fischer, P. Haberäcker and G. Socher. Computergrafik. 4th edition. Computergrafik und Bildverarbeitung. Wiesbaden: Springer Vieweg, 2019.

Footnotes
1 Especially in physics, a clear distinction between these two concepts is required.

2 Physicists may forgive us for this carelessness.

3 The coordinate system (with the origin in the lower-left corner) of the figure corresponds to the
window coordinate system of OpenGL.

4 Any mapping that is given by a matrix A mapping a vector x to another vector b in the form $$Ax = b$$ is a linear mapping.

5 Due to the homogeneous coordinates, each of the vector equations actually consists of four
equations. However, the last line or equation always has the form
$$0\cdot p_x + 0 \cdot p_y + 0 \cdot p_z + 1\cdot 1 = 1$$.

6. Greyscale and Colour Representation



This chapter contains the basics for the representation of computer graphics
objects using grey values and colours. As an example of how grey values
can be represented even though the output device can only produce
completely black or completely white pixels, the halftone method is
described. Furthermore, this chapter contains an overview of colour models
that are used in computer graphics and related areas such as image
processing and digital television. For professional applications, the exact
reproducibility of colours on different output devices is required. For example, exactly the same colour should be output on a printer as it appears on the monitor. For this purpose, the so-called calibrated colour
spaces, also called colorimetric colour spaces, can be used. Furthermore,
the basic colour models used in the OpenGL are presented and the basic
principles of colour interpolation are explained.

6.1 Greyscale Representation and Intensities


In black-and-white images, the individual pixels of an image can not only
be black or white, but can also take on grey values of different intensity.
Such images are colloquially called black-and-white images. In the context
of computer graphics, the term greyscale image is more appropriate for
such images in order to distinguish them from binary images, which only contain black or white pixels and no grey values. In colour images, analogously, the colour components of a coloured pixel are not simply set or not set, but can also take on graded values. Therefore, the following
explanations of this section on greyscale representations can also be applied
to colour images.
When indicating the brightness of an LED light source, the equivalent
power in watts is often given, which is what a light bulb with a
conventional filament needs to shine as brightly as the LED light source.
Since human perception of brightness is relatively oriented, a 60-watt bulb
appears much brighter relative to a 20-watt bulb than a 100-watt bulb
relative to a 60-watt bulb, even though the difference is 40 watts in both
cases. The 60-watt bulb appears about three times as bright as the 20-watt
bulb ( $$60/20 = 3$$), while the 100-watt bulb appears less than twice
as bright as the 60-watt bulb ( $$100/60 \approx 1.6$$).
For the greyscale or colour representation in images, there are usually
only a finite number of intensity levels available. These intensity levels
should be scaled logarithmically rather than linearly according to the human
perception of brightness. Starting from the lowest intensity (which is black
for grey values) $$I_0$$, the intensity levels should be of the form
$$ I_0 = I_0, \qquad I_1 = rI_0, \qquad I_2 = rI_1 = r^2 I_0, \qquad \ldots
$$
up to a maximum intensity $$I_n = r^n I_0$$ with a constant
$$r>1$$. Humans can generally only distinguish between adjacent
grey values when $$r>1.01$$, i.e., when the difference is at least
$$1 \%$$ (see [6]). Assuming a maximum intensity $$I_n = 1$$
and a minimum intensity $$I_0$$ depending on the output device, it
follows from $$1.01^n I_0 \le 1$$ that
$$ n \; \le \; \frac{\ln \left( \frac{1}{I_0}\right) }{\ln (1.01)} $$
grey levels are sufficient for resolution on the corresponding output device.
A finer representation would not make any difference to human visual
perception.
Table 6.1 Intensity levels for different output media (according to [1])

Medium $$I_0$$ (ca.) Max. number of grey levels


Monitor 0.005–0.025 372–533
Newspaper 0.1 232
Photo 0.01 464
Slide 0.001 695

Based on these considerations, Table 6.1 shows the maximum useful number of intensity levels for different output media.
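The following small Java sketch (illustrative names) evaluates this bound for a given minimum intensity; due to rounding, the results may differ slightly from the values in Table 6.1.

    class GreyLevelBound {
        // Maximum useful number of grey levels for minimum intensity i0, i.e. ln(1/i0)/ln(1.01).
        static int maxUsefulGreyLevels(double i0) {
            return (int) Math.floor(Math.log(1.0 / i0) / Math.log(1.01));
        }

        public static void main(String[] args) {
            // Example: a monitor with I0 = 0.005 (cf. Table 6.1).
            System.out.println(maxUsefulGreyLevels(0.005));
        }
    }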
If an image with multiple intensity levels is to be displayed on an output
medium that has only binary pixel values (values can really only be black or
white), for example, a black-and-white laser printer, an approximation of
the different intensity levels can be achieved by reducing the resolution.
This technique is called halftoning, which combines binary pixels to larger
pixels in a pixel matrix. For instance, if 2 $$\times $$2 pixels are
combined into one larger pixel, five intensity levels can be displayed. For 3 $$\times $$3 pixels, ten intensity levels are possible, and for n $$\times $$n pixels, $$n^2+1$$. The reduction of the
resolution may only be carried out to such an extent that the visual system
does not perceive the visible raster as disturbing from the corresponding
viewing distance. The individual pixels to be placed in the n
$$\times $$n large pixel should be chosen as adjacent as possible and
should not lie on a common straight line. Otherwise, striped patterns may
be visible instead of a grey area.

Fig. 6.1 Grey level representation based on halftoning for a 2 $$\times $$ 2 pixel matrix (top line)
and a 3 $$\times $$ 3 pixel matrix
Figure 6.1 shows in the first line a representation of the five possible
grey levels by a matrix of 2 $$\times $$2 pixels and below the
representation of ten possible grey levels based on a 3 $$\times $$3
pixel matrix.
For the definition of these pixel matrices, dither matrices are suitable.
These matrices define which pixels must be set to represent an intensity
value by the n $$\times $$n large pixel. The five pixel matrices of the
first row in Fig. 6.1 are encoded by the dither matrix $$D_2$$. The
ten pixel matrices for the 3 $$\times $$3 grey level representation are
encoded by the dither matrix $$D_3$$ as follows:
$$ D_2 \; = \; \left( \begin{array}{cc} 0 & 2 \\ 3 & 1 \end{array} \right) , \qquad D_3 \; = \; \left( \begin{array}{ccc} 6 & 8 & 4 \\ 1 & 0 & 3 \\ 5 & 2 & 7 \end{array} \right) $$
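The use of such a dither matrix can be sketched in a few lines: to display intensity level k (with $$0 \le k \le n^2$$), exactly those pixels of the n $$\times $$n block are set whose dither matrix entry is smaller than k. The following Java sketch (names freely chosen) generates the five 2 $$\times $$2 patterns encoded by $$D_2$$ in this way.

public class Halftoning {

    // Dither matrix D2 from the text.
    static final int[][] D2 = { { 0, 2 }, { 3, 1 } };

    // Returns the 2x2 binary pattern for intensity level k (0..4):
    // a pixel is set if its dither matrix entry is smaller than k.
    static boolean[][] pattern(int k) {
        boolean[][] block = new boolean[2][2];
        for (int row = 0; row < 2; row++) {
            for (int col = 0; col < 2; col++) {
                block[row][col] = D2[row][col] < k;
            }
        }
        return block;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 4; k++) {
            System.out.println("level " + k + ": "
                    + java.util.Arrays.deepToString(pattern(k)));
        }
    }
}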
Halftoning can also be applied to non-binary intensity levels in order to
refine the intensity levels further. For example, using 2 $$\times $$2
matrices with four different intensity levels for each pixel results in 13
intensity levels. This can be represented by the following dither matrices:
$$ \begin{array}{ccccc}
\left( \begin{array}{cc} 0 & 0 \\ 0 & 0 \end{array} \right) , &
\left( \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right) , &
\left( \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right) , &
\left( \begin{array}{cc} 1 & 1 \\ 0 & 1 \end{array} \right) , &
\left( \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right) , \\[3mm]
\left( \begin{array}{cc} 2 & 1 \\ 1 & 1 \end{array} \right) , &
\left( \begin{array}{cc} 2 & 1 \\ 1 & 2 \end{array} \right) , &
\left( \begin{array}{cc} 2 & 2 \\ 1 & 2 \end{array} \right) , &
\left( \begin{array}{cc} 2 & 2 \\ 2 & 2 \end{array} \right) , &
\left( \begin{array}{cc} 3 & 2 \\ 2 & 2 \end{array} \right) , \\[3mm]
\left( \begin{array}{cc} 3 & 2 \\ 2 & 3 \end{array} \right) , &
\left( \begin{array}{cc} 3 & 3 \\ 2 & 3 \end{array} \right) , &
\left( \begin{array}{cc} 3 & 3 \\ 3 & 3 \end{array} \right) . &
\end{array} $$
In this encoding, for example, the first matrix in the second row represents
the fifth intensity level (counting starts at 0), which is obtained by colouring
one of the four finer pixels with grey level 2 and the other three with grey
level 1. As mentioned at the beginning of this section, the halftoning
method can also be applied to colour images.

6.2 Colour Models and Colour Spaces


The human visual system can perceive wavelengths in a range with a lower
limit of approximately 300–400 nm (violet) and an upper limit of
approximately 700–800 nm (red). Theoretically, a colour is described by the
distribution of the intensities of the wavelengths in the frequency spectrum.
The human eye has three types of receptors called cones for colour
perception. The sensitivity of the cones is specialised to different
wavelength ranges. The L-type receptor is sensitive to long wavelengths,
covering the red portion of the visible spectrum. The M-type and S-type
receptors detect light in the middle and short wavelength ranges of the
visible light spectrum, respectively. This makes them sensitive to the
colours green and blue. We usually speak of red, green and blue receptors.
Blue receptors have a much lower sensitivity than the other two receptors.
The three receptor types mentioned have been identified in the human eye through physiological research, thus confirming the previously established trichromatic theory of Young and Helmholtz (see [4, p. 207]).
The following characteristics are essential for the human intuitive
perception of colour:
The hue determined by the dominant wavelength.
The saturation or colour purity, which is very high when the frequency
spectrum is very focused on the dominant wavelength. When there is a
wide spread of frequencies, the saturation is low.
The intensity or lightness, which depends on the energy with which the
frequency components are present.

Fig. 6.2 Distribution of the energies over the wavelength range at high (top) and low (bottom)
saturation
Figure 6.2 shows the distribution of the energies for the wavelengths at
high and low saturation. Since the wavelength $$\lambda $$ is
inversely related to the frequency f via the speed of light c1 and the
relationship $$c = \lambda \cdot f$$, the frequency spectrum can be
derived from the wavelength spectrum. The perceived intensity is
approximately determined by the mean height of the spectrum.
Based on physiological and psychophysical findings, various colour
models have been developed. A distinction is made between additive and
subtractive colour models. In additive colour models, the colour is
determined in the same way as when different coloured light is
superimposed. On a black or dark background, light of different colours is
mixed additively. The addition of all colours results in a white colour.
Examples of this are computer monitors or projectors. Subtractive colour
models require a white background onto which pigment colours are applied.
The mixture of all colours results in a black colour. Such subtractive colour
models are used for colour printers for example.
The most commonly used colour model in computer graphics is the
RGB colour model. Most computer monitors and projectors work with this
model. Each colour is composed additively of the three primary colours red,
green and blue. Thus, a colour can be uniquely defined by three values
$$r, g, b \in [0,1]$$. If 0 is the minimum and 1 the maximum
intensity, the intensity vector (0, 0, 0) corresponds to black and (1, 1, 1) to
white. The intensity vector (x, x, x) with $$x \in [0,1]$$ corresponds
to shades of grey, depending on the choice of x. The intensity vectors
(1, 0, 0), (0, 1, 0) and (0, 0, 1) correspond to red, green and blue,
respectively. In computer graphics, one byte is usually used to encode one
of the three colours, resulting in 256 possible intensity levels for each of the
three primary colours. Therefore, instead of using three floating point
values between 0 and 1, three integer values between 0 and 255 are often
used to specify a colour. This encoding of a colour by three bytes results in
the so-called 24-bit colour depth, which is referred to as true colour and is
widely used for computer and smartphone displays. Higher colour depths of
30, 36, 42 or 48 bits per pixel are common for high quality output or
recording devices.
The subtractive CMY colour model is the complementary model to the
RGB colour model and is used for printers and plotters. The primary
colours are cyan, magenta and yellow. The conversion of an RGB colour to
its CMY representation is given by the following simple formula:
$$ \left( \begin{array}{c} C\\ M\\ Y \end{array} \right) \; = \; 1 - \left(
\begin{array}{c} R\\ G\\ B \end{array} \right) $$

Fig. 6.3 RGB and the CMY colour model

Figure 6.3 shows a colour cube for the RGB and the CMY colour
model. The corners of the cube are labelled with their corresponding
colours. In the RGB colour model, the origin of the coordinate system is in
the back, lower left corner representing the colour black. In the CMY
model, the origin is located in the front, upper right corner representing the
colour white. The grey tones are located on the diagonal line between the
black-and-white corners.
In practice, printers do not exclusively use the colours cyan, magenta
and yellow from the CMY model, but instead use a four-colour printing
process corresponding to the CMYK colour model with the fourth colour
black2 (K). In this way, the colour black can be better reproduced than by
mixing the three other colours. The colour values of the CMY colour model can be converted to the CMYK colour model according to the following equations:
$$\begin{aligned} K&:= \min \{C,M,Y\}\\ C&:= C-K\\ M&:= M - K\\ Y&:= Y - K. \end{aligned}$$
Based on these equations, at least one of the four values C, Y, M, K is
always equal to zero.
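Both conversion steps can be combined directly. The following Java sketch (class and method names freely chosen) maps an RGB tuple first to CMY and then to CMYK according to the equations above.

public class CmykConversion {

    // Converts RGB components (each in [0,1]) into CMYK components:
    // first C = 1 - R, M = 1 - G, Y = 1 - B, then the black component
    // K = min(C, M, Y) is extracted as in the equations above.
    static double[] rgbToCmyk(double r, double g, double b) {
        double c = 1.0 - r;
        double m = 1.0 - g;
        double y = 1.0 - b;
        double k = Math.min(c, Math.min(m, y));
        return new double[] { c - k, m - k, y - k, k };
    }

    public static void main(String[] args) {
        double[] cmyk = rgbToCmyk(0.2, 0.5, 0.7);
        System.out.printf("C=%.2f M=%.2f Y=%.2f K=%.2f%n",
                cmyk[0], cmyk[1], cmyk[2], cmyk[3]);
    }
}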
The YIQ colour model, YUV colour model and the
$$YC_{b}C_{r}$$ colour model originate from the field of television
signal processing and transmission. In these three colour models, the
luminance Y, which is a measure of lightness, is separated from two
measures of chrominance (colourfulness), which are used to represent
colour. This provides backward compatibility with the older black-and-
white (greyscale) television systems. Furthermore, the colour signals can be
transmitted with a lower bandwidth than the luminance signal, as the human
visual system is less tolerant of blurring in the luminance signal than of
blurring in the colour signals. A lower bandwidth means that a data type
with a reduced number of bits can be used for the digital encoding of such
signals, so that less data needs to be transmitted. This property of the human
visual system is also used for the compression of digital images, for
example, the JPEG compression method, which provides the conversion of
RGB images into the
$$\text{ YC}_{\text{ b }}\text{ C}_{\text{ r }}$$ colour model. The
YIQ colour model is a variant of the YUV colour model and was originally
intended for the American NTSC television standard for analogue
television, but was replaced early on by the YUV colour model. The YUV
colour model is thus the basis for colour coding in the PAL and NTCS
television standards for analogue television. The
$$\text{ YC}_{\text{ b }}\text{ C}_{\text{ r }}$$ colour model is a
variant of the YUV colour model and is part of the international standard
for digital television [2, p. 317–320].
The conversion from the RGB colour model to the
$$\text{ YC}_{\text{ b }}\text{ C}_{\text{ r }}$$ colour model is
given by the following equation (see [2, p. 319]):
$$ \left( \begin{array}{c} Y\\ C_b\\ C_r \end{array} \right) \; = \; \left( \begin{array}{rrr} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.500 \\ 0.500 & -0.419 & -0.081 \end{array} \right) \cdot \left( \begin{array}{c} R\\ G\\ B \end{array} \right) . $$
The luminance value Y represents the greyscale intensity of the image. This
is the luminance value that a greyscale image must have in order to produce
the same impression of lightness as the original colour image. From the top
row of the matrix in the equation, the weighting factors for the RGB colour
components can be read to determine Y. These values are based on the ITU
recommendation BT.601.3 According to the ITU Recommendation for high-
definition television (HDTV) BT.7094 the weighting factors for the
luminance calculation written as line vector are (0.2126, 0.7152, 0.0722).
The green colour component has the highest influence and the blue colour
component the least influence on the luminance, which is consistent with
the findings of physiological research on the sensitivity of the colour
receptors of the human eye (see above). For example, the Y value can be
used to adjust the brightness of different monitors very well or to calculate a
greyscale image with the same brightness distribution as in the colour
image. Attempting to individually adjust the RGB values to change the
brightness easily leads to colour distortions.
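The matrix multiplication above can be implemented directly. The following Java sketch (names freely chosen) converts RGB components into $$\text{ YC}_{\text{ b }}\text{ C}_{\text{ r }}$$ values; the Y component can be used, for example, as the grey value of a greyscale version of the image.

public class YCbCrConversion {

    // BT.601 conversion matrix as given in the text.
    static final double[][] M = {
            {  0.299,  0.587,  0.114 },
            { -0.169, -0.331,  0.500 },
            {  0.500, -0.419, -0.081 }
    };

    // Converts RGB components (each in [0,1]) into (Y, Cb, Cr).
    static double[] rgbToYCbCr(double r, double g, double b) {
        double[] rgb = { r, g, b };
        double[] ycbcr = new double[3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                ycbcr[i] += M[i][j] * rgb[j];
            }
        }
        return ycbcr;
    }

    public static void main(String[] args) {
        double[] ycbcr = rgbToYCbCr(0.2, 0.6, 0.3);
        System.out.printf("Y=%.3f Cb=%.3f Cr=%.3f%n",
                ycbcr[0], ycbcr[1], ycbcr[2]);
    }
}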
Similar to the colour models from the television sector, the HSV colour
model is also not based on primary colours, but instead on the three
parameters hue, saturation and value (intensity). Since these parameters
correspond to the intuitive perception of colour (see above), they can be
used to make a simple selection of a colour, for example, in a raster
graphics editing programme.

Fig. 6.4 The HSV colour model

The HSV model can be represented in the form of an upside-down pyramid (see Fig. 6.4). The tip of the pyramid corresponds to the colour
black. The hue H is specified by the angle around the vertical axis. The
saturation S of a colour is zero on the central axis of the pyramid (V-axis)
and is one at the sides. The value V encodes lightness and increases from
the bottom to the top. The higher V is, the lighter the colour. Strictly
speaking, the HSV colour model leads to a cylindrical geometry. Since the
colour black is intuitively always considered to be completely unsaturated,
the bottom face of the cylinder, which represents black, is often collapsed into a single point.
This results in the pyramid shown in the figure.
The HLS colour model is based on a similar principle to the HSV colour
model. The hue is defined by an angle between $$0^\circ $$ and
$$360^\circ $$ around the vertical axis with red ( $$0^\circ $$),
yellow ( $$60^\circ $$), green ( $$120^\circ $$), blue (
$$240^\circ $$) and purple ( $$300^\circ $$). A value between
zero and one defines the lightness. Saturation results from the distance of a
colour from the central axis (grey value axis) and is also characterised by a
value between zero and one.

Fig. 6.5 The HLS colour model

Figure 6.5 shows the HLS colour model. Sometimes an HLS colour
model is also used in the form of a double cone, as shown on the right in the
figure, instead of a cylinder. The double cone reflects the fact that it does
not make sense to speak of saturation for the colours black and white, when
grey values are already characterised by the lightness value.
The equations for converting from the HSV and HLS colour model to
the RGB colour model and vice versa can be found in [3] or [2, p. 306–
317]. Setting a desired colour or interpreting a combination of values is
very difficult with colour models such as RGB or CMY, at least when all
three colour components interact. Therefore, colour tables or colour
calculators exist from which assignments of a colour to the RGB values can
be taken. Colour adjustment using the HSV and HLS colour models is
much easier, as these colour models are relatively close to an intuitive
colour perception. For this reason, HSV and HLS are referred to
as perception-oriented colour models. A further simplification of the colour
selection can be achieved by using colour palettes. This involves first
selecting a set of colours using an appropriate colour model and then only
using colours from this palette.
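For the closely related HSB model (hue, saturation, brightness), which corresponds to the HSV model, the Java standard library already provides such conversions in the class java.awt.Color; note that Java encodes the hue as a value in [0, 1) instead of an angle in degrees. A small usage sketch:

import java.awt.Color;

public class HsbDemo {

    public static void main(String[] args) {
        // Convert an RGB colour (8-bit components) into HSB (= HSV) values.
        float[] hsb = Color.RGBtoHSB(204, 102, 51, null);
        System.out.printf("H=%.3f S=%.3f B=%.3f%n", hsb[0], hsb[1], hsb[2]);

        // Convert HSB values back into a packed RGB value.
        int rgb = Color.HSBtoRGB(hsb[0], hsb[1], hsb[2]);
        Color c = new Color(rgb);
        System.out.println(c.getRed() + " " + c.getGreen() + " " + c.getBlue());
    }
}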
The colour naming system (CNS) is another perception-oriented colour
system that, like the HSV and HLS colour models, specifies a colour based
on hue, saturation and lightness. For this purpose, the CNS uses linguistic
expressions (words) instead of numbers. For the colour type, the values
purple, red, orange, brown, yellow, green, blue with further subdivisions
yellowish green, green-yellow or greenish yellow are available. The
lightness can be defined by the values very dark, dark, medium, light and
very light. The saturation is defined by greyish, moderate, strong and vivid.
Even though this greatly limits the number of possible colours, the
description of a colour is possible in a very intuitive way.
The reproduction of colours depends on the characteristics of output devices, such as the LCD panel of a monitor or the printing process of a colour printer. Since the colours used in these devices have a very large variance,
it takes some effort, for example, to reproduce the exact same colour on a
printout that was previously displayed on the monitor. Colour deviations
exist not only across device boundaries, but also within device classes (for
example, for different monitors). These deviations are caused, for example,
by the use of different reproduction processes (e.g., laser printing or inkjet
printing) and different components (e.g., printer ink of different chemical
composition).
Colour models, as presented in this chapter so far, can be seen as only a
definition of how colours can be represented by numbers, mainly in the
form of triples or quadruples of components. These components could take
—in principle—arbitrary values. In order to accurately reproduce colours
on a range of output devices, a defined colour space is needed.
The so-called colorimetric colour spaces or calibrated colour spaces
are required for such an exactly reproducible and device-independent
representation of colours. The basis for almost all colorimetric colour
spaces used today is the CIEXYZ colour space standardised by the
Commission Internationale de l’Éclairage (CIE, engl. International
Commission on Illumination). This colour space was developed by
measurements with defined test subjects (standard observers) under strictly
defined conditions. It is based on the three artificial colours X, Y and Z, with
which all perceptible colours can be represented additively. Values from
common colour models, such as the RGB colour model, can be mapped to
the CIEXYZ colour space by a linear transformation to calibrate the colour
space [2, p. 341–342]. The illumination is important for the precise measurement and reproduction of colour in physical reality. Therefore, the CIEXYZ colour space uses specified standard illuminants [2, p. 344–345].
Since the CIEXYZ colour space encompasses all perceptible colours, it
can be used to represent all colours encompassed by another colour space. It can also be used to describe which range of colours certain output devices, such as monitors or printers, can reproduce. The set of reproducible colours of a
device or particular colour space is called gamut or colour gamut. The
gamut is often plotted on the CIE chromaticity diagram, which can be
derived from the CIEXYZ colour space. In this two-dimensional diagram,
the range of perceptible colours has a horseshoe-shaped form. [2, p. 345]
The main disadvantage of the CIEXYZ colour space is the nonlinear
mapping between human perception and the colour distances in the model.
Due to this disadvantage, the CIELab colour space was developed, which is
also referred to as CIE L*a*b* colour space. In this colour space, the
distances between colours correspond to the colour spacing according to
human perception. [2, p. 346–348]
For computer graphics, the direct use of the CIEXYZ colour space is
too unwieldy and inefficient, especially for real-time representations.
Therefore, the standard RGB colour space (sRGB colour space) is used,
which was developed for computer-based and display-oriented applications.
The sRGB colour space is a precisely defined colour space, with a
standardised mapping to the CIEXYZ colour space. The three primary
colours, the white reference point, the ambient lighting conditions and the
gamma values are specified for this purpose. The exact values for these
parameters can be found, for example, in [2, p. 350–352]. For the
calculation of the colour values in the sRGB colour space, a linear
transformation of the values from the CIEXYZ colour space into the linear
RGB colour space takes place first. This is followed by the application of a
modified gamma correction, which introduces a nonlinearity. Let
$$(r_l, g_l, b_l)$$ be a colour tuple from the RGB colour model with
$$r_l, g_l, b_l \in [0, 1]$$ after the linear transformation from the
CIEXYZ colour space. The conversion of each of these three colour
components $$c_l$$ is done by the function $$f_s$$ according
to the following equation (see [2, p. 352]):
$$\begin{aligned} f_s(c_l) \; = \; \left\{ \begin{array}{ll} 0 & \text{ if } c_l \le 0\\ 12.92 \cdot c_l & \text{ if } 0< c_l < 0.0031308 \\ 1.055 \cdot {c_l}^{0.41666} - 0.055 & \text{ if } 0.0031308 \le c_l < 1 \\ 1 & \text{ if } 1 \le c_l \end{array} \right. \end{aligned}$$ (6.1)
Let the result of the application of the function $$f_s$$ be the colour
tuple $$(r_s, g_s, b_s)$$, whose colour components are restricted to
the value range [0, 1] due to this equation. For the exponent (gamma value),
the approximate value $$\gamma = 1 / 2.4 \approx 0.4166$$ is used.
For display on an output device and for storing in memory or in a file, these
colour components are scaled to the interval [0, 255] and discretised to an
eight-bit representation each.
The inverse transformation of the colour tuple $$(r_s, g_s, b_s)$$
with the nonlinear colour components $$r_s, g_s, b_s \in [0, 1]$$ into
linear RGB colour components is done for each of the three colour
components $$c_s$$ by the function $$f_{sr}$$ according to
the following equation (see [2, p. 353]):
$$\begin{aligned} f_{sr}(c_s) \; = \; \left\{ \begin{array}{ll} 0 & \text{ if } c_s \le 0\\ \frac{c_s}{12.92} & \text{ if } 0< c_s < 0.04045 \\ \Bigl (\frac{c_s + 0.055}{1.055}\Bigr )^{2.4} & \text{ if } 0.04045 \le c_s < 1 \\ 1 & \text{ if } 1 \le c_s \end{array} \right. \end{aligned}$$ (6.2)
The values of the resulting colour tuple $$(r_l, g_l, b_l)$$ (with the
three linear colour components) are again limited to the value range [0, 1]
and can be converted back into the CIEXYZ colour space by applying the
inverse linear transformation.
The gamma value for the function $$f_s$$ according to Eq. (6.1)
is $$\gamma \approx 1 / 2.4$$. For the inverse function
$$f_{sr}$$, according to Eq. (6.2), it is $$\gamma = 2.4$$.
Since these are modified gamma corrections due to the linear ranges for
small colour component values (see the second case in each of the above
formulae), the effective gamma values under the hypothetical assumption of
non-modified (pure) gamma correction are
$$\gamma \approx 1 / 2.2$$ and $$\gamma \approx 2.2$$.
The sRGB colour space was developed for CRT monitors so that the 8-
bit values of the nonlinear colour components $$(r_s, g_s, b_s)$$ can
be output on these devices without further processing or measures. Even
though these monitors no longer play a role today, many current devices on
the mass market, such as LCD monitors, simulate this behaviour so that
sRGB values can still be displayed without modifications. Since the sRGB
colour space is very widespread in many areas, it can be assumed that
practically every image file (for example, textures) with 8-bit values per
colour channel contains colour values according to the sRGB colour space.
The sRGB colour space is also the standard colour space in Java. Strictly
speaking, for any calculations with these colour values, the individual
colour components must be converted into linear RGB values using the
function $$f_{sr}$$ according to Eq. (6.2). After the desired
calculations, the resulting linear RGB values must be converted back into
nonlinear sRGB colour values using the $$f_s$$ function according
to Eq. (6.1). However, these steps are often not applied in practice. [2, p.
353]
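The two functions $$f_s$$ and $$f_{sr}$$ from Eqs. (6.1) and (6.2) can be implemented directly. The following Java sketch (method names freely chosen) converts a single colour component in each direction.

public class SrgbGamma {

    // Nonlinear sRGB encoding of a linear component according to Eq. (6.1).
    static double linearToSrgb(double cl) {
        if (cl <= 0.0) return 0.0;
        if (cl < 0.0031308) return 12.92 * cl;
        if (cl < 1.0) return 1.055 * Math.pow(cl, 1.0 / 2.4) - 0.055;
        return 1.0;
    }

    // Inverse transformation into a linear component according to Eq. (6.2).
    static double srgbToLinear(double cs) {
        if (cs <= 0.0) return 0.0;
        if (cs < 0.04045) return cs / 12.92;
        if (cs < 1.0) return Math.pow((cs + 0.055) / 1.055, 2.4);
        return 1.0;
    }

    public static void main(String[] args) {
        double linear = 0.3;
        double srgb = linearToSrgb(linear);
        System.out.printf("linear %.4f -> sRGB %.4f -> linear %.4f%n",
                linear, srgb, srgbToLinear(srgb));
    }
}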
In contrast to the CIELab colour space, the sRGB colour space has a
relatively small gamut, which is sufficient for many (computer graphics)
applications, but can cause problems for printing applications. Therefore,
the Adobe RGB colour space was developed, which has a much larger
gamut than the sRGB colour space. [2, p. 354–355] For detailed
explanations, figures and transformation formulas for colorimetric colour
spaces, please refer to [2, p. 338–365].

6.3 Colours in the OpenGL


In the OpenGL, the RGB colour model (see Sect. 6.2) is used. The values of
the individual components of a colour tuple are specified as floating point
values from the interval from zero to one. Thus, for a colour tuple (r, g, b)
$$r, g, b \in [0, 1]$$ applies. As in many other technical applications, a
fourth component, the so-called alpha value, is added in the OpenGL for the
representation of transparency—or rather opacity (non-transparency). For a
detailed description of the handling of transparency and translucency in
connection with the illumination of scenes, see Sect. 9.7. The alpha
component is also a floating point value within the interval [0, 1]. An alpha
value of one represents the highest opacity and thus the least transparency.
An object with such a colour is completely opaque. An alpha value of zero
represents the lowest opacity and thus the highest transparency. An object
with such a colour is completely transparent. Thus, the colour of a vertex,
fragment or pixel including an opacity value (transparency value) can be
expressed as a quadruple of the form (r, g, b, a) with
$$r, g, b, a \in [0, 1]$$, which is called RGBA colour value.
In the core profile, the RGB values are mostly used for colour
representation and the alpha value for the representation of opacity. But
their use is not limited to this application. For example, in shaders, the
RGBA values can also be used for any purpose other than colour
representation.
In the compatibility profile, the colour index mode is available. This
makes it possible to define a limited number of colours that serve as a
colour palette for the colours to be displayed. The colours are stored in a
colour index table. A colour is accessed by the number of the entry (the
index) of the table. This index can be used in the framebuffer and elsewhere
instead of a colour with the full colour resolution (for example 24 bits). For
a colour index table with 256 values, only an integer data type of eight bits
is needed for an index. Such a value can, therefore, be stored in one byte.
This type of representation allows for more efficient storage of image
information than using full resolution colour values. This can be used, for
example, for a smaller framebuffer, faster interchangeability of colour
information and faster access to colours. However, these advantages are
also countered by disadvantages, especially in the flexible use of colour
information in shaders. For example, blending between colour indices is
only defined meaningfully in exceptional cases (for example, with a
suitable sorting of the colour values in the colour index table). Furthermore,
in modern GPUs there is usually enough memory and computing power
available so that it is possible to calculate with RGBA values of the full
resolution per vertex, fragment or pixel without any problems. For these
reasons, the colour index mode is not available in the core profile. In some
window systems and in JOGL, such a mode does not exist either.
In the OpenGL, the conversion to the nonlinear sRGB colour space is provided. The conversion of the colour components from the linear RGB colour components takes place according to formula (6.1) if this conversion has been activated by enabling the OpenGL capability GL_FRAMEBUFFER_SRGB. Deactivating the sRGB conversion is done by disabling this capability again.
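The corresponding calls can be sketched as follows in JOGL. This is only an illustration under the assumption that gl is the current GL3 object of the render loop; depending on the JOGL version, the constant GL_FRAMEBUFFER_SRGB may be declared in a different GL interface than assumed here.

import com.jogamp.opengl.GL3;

public class SrgbSwitch {

    // Sketch: activates or deactivates the sRGB conversion of the
    // framebuffer. The GL3 object is assumed to come from the current
    // GLAutoDrawable; the constant GL_FRAMEBUFFER_SRGB may be declared
    // in a different GL interface depending on the JOGL version.
    static void setSrgbConversion(GL3 gl, boolean enable) {
        if (enable) {
            gl.glEnable(GL3.GL_FRAMEBUFFER_SRGB);
        } else {
            gl.glDisable(GL3.GL_FRAMEBUFFER_SRGB);
        }
    }
}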

6.4 Colour Interpolation


Most of the colour models, which are presented in this chapter, use three
colour components to characterise a colour. For example, a colour in the
RGB colour model corresponds to a vector $$(r,g,b) \in [0,1]^3$$. This
interpretation can be used to calculate convex combinations of colours. One
possible application of such convex combinations are colour gradients. If a
surface is not to be coloured homogeneously but with a continuous change
from one colour to another colour, two colours $$(r_0,g_0,b_0)$$ and
$$(r_1,g_1,b_1)$$ can be defined for the two points
$$\boldsymbol{p}_0$$ and $$\boldsymbol{p}_1$$. At point
$$\boldsymbol{p}_0$$, colour $$(r_0,g_0,b_0)$$ is used, at point
$$\boldsymbol{p}_1$$, colour $$(r_1,g_1,b_1)$$ is used. For points
on the connecting line between the two points, the corresponding convex
combination of the two colours is used to determine the respective colour.
For the point $$\boldsymbol{p} = (1 - \alpha )\,\boldsymbol{p}_0 + \alpha \,\boldsymbol{p}_1$$ (with $$\alpha \in [0,1]$$), the colour $$(1-\alpha )\cdot (r_0,g_0,b_0) + \alpha \cdot (r_1,g_1,b_1)$$ is used.
This corresponds to an interpolation of the colour values on the connecting
line from the given colours at the end points $$\boldsymbol{p}_0$$ and
$$\boldsymbol{p}_1$$. The same colour gradient is used parallel to the
connecting line.
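A sketch of this component-wise convex combination in Java (names freely chosen):

public class ColourLerp {

    // Linear interpolation between two RGB colours c0 and c1
    // (components in [0,1]) with weight alpha in [0,1]:
    // result = (1 - alpha) * c0 + alpha * c1.
    static double[] lerp(double[] c0, double[] c1, double alpha) {
        double[] result = new double[3];
        for (int i = 0; i < 3; i++) {
            result[i] = (1.0 - alpha) * c0[i] + alpha * c1[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] red = { 1.0, 0.0, 0.0 };
        double[] blue = { 0.0, 0.0, 1.0 };
        // Colour halfway along the connecting line from p0 to p1.
        double[] mid = lerp(red, blue, 0.5);
        System.out.printf("(%.2f, %.2f, %.2f)%n", mid[0], mid[1], mid[2]);
    }
}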
When filling surfaces with textures (see Chap. 10), colour interpolation
can also be useful in some cases. If the texture has to be drawn several
times horizontally or vertically to fill the area, visible borders usually
appear at the edges between the individual texture tiles. A kind of tile
pattern becomes visible. In digital image processing, smoothing
operators [5] are used to make the edges appear less sharp. Simple
smoothing operators can be characterised by a weight matrix that is used to
modify the colour values of the pixels (or fragments). For example, the
weight matrix
$$ \left( \begin{array}{ccc} 0.1 & 0.1 & 0.1 \\ 0.1 & 0.2 & 0.1 \\ 0.1 & 0.1 & 0.1 \end{array} \right) $$
means that the smoothed colour intensity of a pixel is the weighted sum of
its own intensity and the intensities of the neighbouring pixels. The
smoothed colour intensities are calculated individually for each of the
colours red, green and blue. In this weight matrix, the pixel’s own intensity
is given a weight of 0.2 and each of its eight immediate neighbouring pixels
is given a weight of 0.1. Depending on how strong the smoothing effect is
to be, the weights can be changed. For example, all pixels are set to a value
of 1/9 to obtain a strong smoothing effect. Furthermore, the weight matrix
can be extended so that not only the immediate neighbours of the pixel are
considered for smoothing, but also pixels from a larger region. To smooth
the transitions between texture tiles, the smoothing must be done at the
edges. For this purpose, the pixels on the right edge should be considered as
left neighbours of the pixels at the left edge and vice versa. The upper
neighbouring pixels of a pixel on the upper edge are found at the
corresponding position on the lower edge.
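A sketch of such a smoothing step for a single colour channel of a texture tile is shown below (names freely chosen); neighbours outside the tile are taken from the opposite border so that the transitions between repeated tiles are smoothed as well.

public class TileSmoothing {

    // The weight matrix given above.
    static final double[][] WEIGHTS = {
            { 0.1, 0.1, 0.1 },
            { 0.1, 0.2, 0.1 },
            { 0.1, 0.1, 0.1 }
    };

    // Smooths one colour channel of a texture tile. Neighbours outside
    // the tile are taken from the opposite border (wrap-around), so that
    // the transitions between repeated tiles are also smoothed.
    static double[][] smooth(double[][] channel) {
        int h = channel.length;
        int w = channel[0].length;
        double[][] result = new double[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double sum = 0.0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int ny = Math.floorMod(y + dy, h); // wrap-around
                        int nx = Math.floorMod(x + dx, w);
                        sum += WEIGHTS[dy + 1][dx + 1] * channel[ny][nx];
                    }
                }
                result[y][x] = sum;
            }
        }
        return result;
    }
}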
This application of image smoothing is effectively a low-pass filtering
of an image (tile). Section 7.6.5 provides more details on this process and
various weighting matrices for low-pass filtering in the context of
antialiasing as part of the rasterisation stage of the graphics pipeline.
In Sect. 5.1.4, some possible applications of interpolators to model
continuous changes are presented. The interpolators considered there are all
based on geometric transformations. One application for such animated
graphics was the transformation of one geometric object into another one by
convex combinations of transformations. In order to smoothly transform
one arbitrary image into another, additional colour interpolators are
necessary. The simplest way to transform one image into another image of
the same format is the pixel-by-pixel convex combination of the intensity
values for the colours red, green and blue. However, this only achieves a
continuous blending of the two images. In this case, a new image appears
while the old image is faded out. More realistic effects can be achieved when the geometric shapes in the two images are also transformed properly
into each other. In this case, geometric transformations are required in
addition to colour interpolation. A common technique that does more than
just blend the two images is based on a triangulation of the two images. A
triangulation is a division using triangles. For this application, the two
triangulations of the two images must use the same number of triangle
vertices and the triangles must correspond to each other, i.e., if in the first
image the points $$\boldsymbol{p}_i$$, $$\boldsymbol{p}_j$$
and $$\boldsymbol{p}_k$$ form a triangle, then the corresponding
points in the second image must also represent a triangle within the
triangulation. The coordinates of the corresponding points in the two
images do not have to match.
Each triangle of the triangulation describes a section of the image that
has a counterpart in the corresponding triangle of the other image. Such
corresponding sections may be different in size and shape.
Fig. 6.6 Example of compatible triangulations of two images

Figure 6.6 illustrates this situation. On the left are two faces that are to
be transformed into each other. The right part of the figure shows
compatible triangulations of the two images. For example, the respective
upper triangle in the two triangulations stands for the forehead area, the
lower triangle for the chin area. It should be noted that the number of points
for the triangulations is identical for both images, but the points do not have
the same positions.
In order to transform one image into the other, step by step, a
triangulation is first determined for each intermediate image. The points of
the triangulation are determined as convex combinations of the
corresponding points in the two images. The triangles are formed according
to the same scheme as in the other two images.
If the points $$\boldsymbol{p}_i$$, $$\boldsymbol{p}_j$$
and $$\boldsymbol{p}_k$$ form a triangle in the first image and the
associated points $${\boldsymbol{p}'_i}$$,
$$\boldsymbol{p}'_j$$ and $$\boldsymbol{p}'_k$$ form a
triangle in the second image, then the points
$$\begin{aligned} (1 - \alpha )\,\boldsymbol{p}_l + \alpha \,\boldsymbol{p}'_l, \qquad l \in \{i, j, k\}, \end{aligned}$$
form a triangle in the intermediate image.
This means that if three points in the first image define a triangle, the
triangle of the corresponding points belongs to the triangulation in the
second image. The convex combination of the corresponding pairs of points
describes a triangle of the triangulation of the intermediate image.
Within each triangle, the pixel colours are determined by colour
interpolation. To colour a pixel in the intermediate image, it must first be
determined to which triangle it belongs. If it belongs to several triangles,
i.e., if it lies on an edge or a vertex of a triangle, then any of the adjacent
triangles can be chosen. It is necessary to find out whether a pixel
$$\boldsymbol{q}$$ belongs to a triangle defined by the points
$$\boldsymbol{p}_1, \boldsymbol{p}_2, \boldsymbol{p}_3$$.
Unless the triangle is degenerated into a line or a point, there is exactly one
representation of $$\boldsymbol{q}$$ in the form
$$\begin{aligned} \boldsymbol{q} \; = \; \alpha _1 \cdot \boldsymbol{p}_1 + \alpha _2 \cdot \boldsymbol{p}_2 + \alpha _3 \cdot \boldsymbol{p}_3 \end{aligned}$$ (6.3)
where
$$\begin{aligned} \alpha _1 + \alpha _2 + \alpha _3 = 1. \end{aligned}$$ (6.4)
This is a system of linear equations with three equations and three variables
$$\alpha _1, \alpha _2, \alpha _3$$. The vector equation (6.3) gives
two equations, one for the x-coordinate and one for the y-coordinate. The
third equation is the constraint (6.4). The point $$\boldsymbol{q}$$
lies inside the triangle spanned by the points
$$\boldsymbol{p}_1, \boldsymbol{p}_2, \boldsymbol{p}_3$$ if and
only if $$0 \le \alpha _1, \alpha _2, \alpha _3 \le 1$$ holds, that is, if
$$\boldsymbol{q}$$ can be represented as a convex combination of
$$\boldsymbol{p}_1, \boldsymbol{p}_2, \boldsymbol{p}_3$$.
After the triangle in which the pixel to be transformed lies and the
corresponding values $$\alpha _1, \alpha _2, \alpha _3$$ have been
determined, the colour of the pixel is determined as a convex combination
of the colours of the corresponding pixels in the two images to be
transformed into each other. The triangle in the intermediate image in which
the pixel is located corresponds to a triangle in each of the two images to be
transformed into each other. For each of these two triangles, the convex
combination of its vertices with weights
$$\alpha _1, \alpha _2, \alpha _3$$ specifies the point corresponding
to the considered pixel in the intermediate image. Rounding might be
required to obtain a pixel from the point coordinates. The colour of the
pixel in the intermediate image is a convex combination of the colours of
the two pixels in the two images to be transformed into each other.
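The calculation of the barycentric coordinates and the inside test can be sketched as follows for two-dimensional points (names freely chosen); the closed-form solution of the linear system given by Eqs. (6.3) and (6.4) is used. The returned coefficients can be reused directly as the weights $$\alpha _1, \alpha _2, \alpha _3$$ described above.

public class Barycentric {

    // Computes the barycentric coordinates (alpha1, alpha2, alpha3) of the
    // point q with respect to the triangle (p1, p2, p3); all points are
    // given as {x, y}. Assumes that the triangle is not degenerated.
    static double[] barycentric(double[] q, double[] p1, double[] p2, double[] p3) {
        double d = (p2[1] - p3[1]) * (p1[0] - p3[0])
                 + (p3[0] - p2[0]) * (p1[1] - p3[1]);
        double a1 = ((p2[1] - p3[1]) * (q[0] - p3[0])
                   + (p3[0] - p2[0]) * (q[1] - p3[1])) / d;
        double a2 = ((p3[1] - p1[1]) * (q[0] - p3[0])
                   + (p1[0] - p3[0]) * (q[1] - p3[1])) / d;
        return new double[] { a1, a2, 1.0 - a1 - a2 };
    }

    // q lies inside (or on the border of) the triangle if all three
    // barycentric coordinates are in [0, 1].
    static boolean insideTriangle(double[] alphas) {
        return alphas[0] >= 0 && alphas[1] >= 0 && alphas[2] >= 0
            && alphas[0] <= 1 && alphas[1] <= 1 && alphas[2] <= 1;
    }

    public static void main(String[] args) {
        double[] p1 = { 0, 0 }, p2 = { 4, 0 }, p3 = { 0, 4 };
        double[] alphas = barycentric(new double[] { 1, 1 }, p1, p2, p3);
        System.out.println(java.util.Arrays.toString(alphas)
                + " inside: " + insideTriangle(alphas));
    }
}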

Fig. 6.7 Computation of the interpolated colour of a pixel

Figure 6.7 illustrates this procedure. The triangle in the intermediate image in which the pixel to be coloured is located is shown in the centre.
On the left and right are the corresponding triangles in the two images to be
transformed into each other. The three pixels have the same representation
as a convex combination with regard to the corner points in the respective
triangle.
The procedure described in this section is an example of the use of
barycentric coordinates for the interpolation of colour values for points
within a triangle. Values for the tuple
$$(\alpha _1, \alpha _2, \alpha _3)$$ represent the barycentric
coordinates of these points. Colour values are a type of the so-called
associated data for vertices and fragments in terms of the computer graphics
pipeline. Vertices or fragments can, for example, be assigned colour values,
texture coordinates, normal vectors or fog coordinates. The interpolation of
the associated data for the individual fragments within a triangle, i.e., the
filling of a triangle on the basis of the data at the three corner points stored
in the vertices, is an important part of the rasterisation of triangles. Section
7.5.4 describes barycentric coordinates and their use for interpolating
associated data for fragments within a triangle.

6.5 Exercises
Exercise 6.1 Let the colour value (0.6, 0.4, 0.2) in the RGB colour model
be given. Determine the colour value in the CMY colour model and in the
CMYK colour model. Calculate the colour value for the same colour for the
representation in the $$\text{ YC}_{\text{ b }}\text{ C}_{\text{ r }}$$
colour model.

Exercise 6.2 Research the formulas for calculating the Hue, Saturation
and Value components for the HSV colour model based on RGB values.

Exercise 6.3 Given are the following colour values in the RGB colour
model:
(0.8, 0.05, 0.05)
(0.05, 0.8, 0.05)
(0.05, 0.05, 0.8)
(0.4, 0.4, 0.1)
(0.1, 0.4, 0.4)
(0.4, 0.1, 0.4)
(0.3, 0.3, 0.3).
Display these colours in an OpenGL program, for example, by filling a
triangle completely with one of the colour values. Then, for each of these
colour tuples, specify a linguistic expression (name of the colour) that best
describes the colour. Convert the individual colour values into the HSV
colour model. What do you notice about the HSV values?

Exercise 6.4 Draw the curves from the Eqs. (6.1) and (6.2) for the
conversion between RGB and sRGB colour values. Note that the domain
and codomain of the functions are both the interval [0, 1].

Exercise 6.5 Given are the following colour values in the RGB colour
model:
(0.8, 0.05, 0.05)
(0.4, 0.1, 0.4)
(0.3, 0.3, 0.3).
Assume these are linear RGB colour components. Convert these
individual colour values into sRGB values using the equations from this
chapter. Using an OpenGL program, draw three adjacent triangles, each
filled with one of the specified linear RGB colour values. Extend your
OpenGL program by three adjacent triangles, each of which is filled with
one of the calculated sRGB colour values. Extend your OpenGL program
by three more adjacent triangles, each of which is filled with one of the
sRGB colour values that are determined by the OpenGL internal functions.
Check whether these sRGB colour values calculated by the OpenGL match
the sRGB colour values you calculated.

Exercise 6.6 A circular ring is to be filled with a colour gradient that is parallel to the circular arc. Write the equation of the circle in parametric
form and use this representation to calculate an interpolation function of the
colours depending on the angle $$\alpha $$. Assume that two colours are
given at $$\alpha _s = 0^\circ $$ and $$\alpha _e = 180^\circ $$,
between which interpolation is to take place.

Exercise 6.7 Let there be a circular ring whose outer arc corresponds to a
circle of radius one. Let the inner arc be parallel to this circle. Fill this
circular ring with a colour gradient, where at 0 degrees the colour X is given
and at 270 degrees the (other) colour Y. A (linear) colour interpolation is to
take place between the colours X and Y.
(a) Sketch the shape described.
(b) Give a formula to calculate the interpolated colour FI for each pixel (x, y) within the gradient depending on the angle $$\alpha $$.
(c) Then implement this technique of circular colour interpolation in an OpenGL program.

References
1. H.-J. Bungartz, M. Griebel and C. Zenger. Einführung in die Computergraphik. 2nd edition.
Wiesbaden: Vieweg, 2002.

2. W. Burger and M. J. Burge. Digital Image Processing: An Algorithmic Introduction Using Java.
2nd edition. London: Springer, 2016.

3. J. D. Foley, A. van Dam, S. K. Feiner and J. F. Hughes. Computer Graphics: Principles and
Practice. 2nd edition. Boston: Addison Wesley, 1996.

4. E. B. Goldstein. Sensation and Perception. 8th edition. Belmont, CA: Wadsworth, 2010.

5. S. E. Umbaugh. Computer Imaging: Digital Image Analysis and Processing. Boca Raton: CRC
Press, 2005.

6. G. Wyszecki and W. Stiles. Color Science: Concepts and Methods, Quantitative Data and
Formulae. 2nd edition. New York: Wiley, 1982.

Footnotes
1 The speed of light in vacuum is $$c_{\text{ light }} = 299{,}792{,}458$$ m/s.

2 Since the letter B (black) is already assigned to blue in the RGB colour model, K (key) is used for
black as an abbreviation for key colour.
3 International Telecommunication Union. ITU Recommendation BT.601-7 (03/2011). https://www.itu.int/rec/R-REC-BT.601-7-201103-I/en, retrieved 22.4.2019, 21:50h.

4 International Telecommunication Union. ITU Recommendation BT.709-6 (06/2015). https://www.itu.int/rec/R-REC-BT.709-6-201506-I/en, retrieved 22.4.2019, 21:45h.

7. Rasterisation

Chapters 3 and 4 explain the computer graphical representations of basic geometric objects, and Chap. 5 explains transformations for these objects.
The descriptions of these objects are based on the principles of vector
graphics. This allows a lossless and efficient transformation of objects and
scenes of objects. However, for display on common output devices such as
computer monitors or smartphone devices, the conversion of this vector-
based representation into a raster graphic is required. This conversion
process is referred to as rasterisation. This chapter compares the advantages
and disadvantages of these representation types. It also explains the basic
problems and important solutions that arise with rasterisation.
Traditionally, a line is an important geometric primitive of computer
graphics. Lines can be used, for example, to represent wireframe models or
the edges of polygons. Therefore, algorithms for efficient rasterisation were
developed early on, some of which are presented in this chapter.
Meanwhile, planar triangles are the most important geometric primitive in
computer graphics. They are very commonly used to approximate surfaces
of objects (with arbitrary accuracy). Not only more complex polygons but
also lines and points can be constructed from triangles. Therefore,
algorithms for rasterising and filling polygons, with a focus on planar
triangles, are presented in this chapter. When converting a vector graphic
into a raster graphic, undesirable disturbances usually occur. These
disturbances are called aliasing effects, which can be reduced
by antialiasing methods. A detailed section on this topic completes this
chapter.

7.1 Vector Graphics and Raster Graphics


An object to be drawn, if it is not already available as a finished image,
must first be described or modelled. This is usually done by means of
vector graphics, also called vector-oriented graphics. In such a
representation, the object to be modelled is described by the combination of
basic geometric objects, such as lines, rectangles, arcs of circles and
ellipses. Each of these basic objects can be uniquely defined by specifying a
few coordinates, which determine the position, and some parameters, such
as the radius in the case of a circle.
As explained in Sect. 3.1, these basic objects are also called geometric
primitives. In OpenGL, the essential geometric primitives available are
points, lines and triangles. These primitives are sometimes called base
primitives in OpenGL. All other available geometric primitives for drawing
and any other three-dimensional geometric object can be composed of base
primitives or approximated by them with arbitrary accuracy. Depending on the desired quality of
approximation, a large number of base primitives may be necessary.

Fig. 7.1 Representation of a house (a) as a vector graphic (b) and as a raster graphic (c)

Figure 7.1b shows a very simple vector graphic description of the house from Fig. 7.1a. The house can be described as a sequence of point
coordinates or vectors. In addition, it must be specified whether two
consecutive points are to be connected with each other by a line or not. In
the figure, two points that are not to be connected are indicated by a dashed line.
In OpenGL, the description of a vector graphic is done by specifying a
sequence of vertices containing, among other data, the three-dimensional
position coordinates of each vertex. In addition, a parameter of the drawing
command specifies the geometric primitive to be drawn and thus
determines the use of the current vertex stream (see Sect. 3.3).
However, this form of an object description is not directly suitable for
display on a purely pixel-oriented output device such as a flat screen,
projector and printer. One of the few output devices that can directly
process a vector graphic is a plotter. With a pen plotter, pens are moved
over the medium to be printed, for example paper, according to the vectorial
description and lowered accordingly for writing. With a cathode ray tube,
commonly used in computer monitors and televisions in the past, a vector-
oriented graphic could theoretically be drawn directly by guiding the
cathode ray—or three cathode rays in a colour display—along the lines to
be drawn. Depending on whether a line is to be drawn or not, the cathode
ray is faded in or out. However, this can lead to the problem of a flickering
screen, since the fluorescence of the screen dots diminishes if they are not
hit again by the cathode ray. A flicker-free display requires a refresh rate of about 60
Hz. If the cathode ray has to scan the individual lines of the vector graphic
over and over again, the refresh rate depends on how many lines the image
contains. Therefore, for more complex images, a sufficiently high refresh
rate for flicker-free operation cannot be guaranteed. To avoid this, the
cathode ray tube scans the screen line by line so that the refresh rate is
independent of the graphic (content) to be displayed. However, this method
is based on pixel-oriented graphics.
Not only for common monitors, projectors and printers but also for various image storage formats such as JPEG,1 raster graphics (raster-oriented graphics) are used. A raster graphic is based on a fixed matrix of
image points (a raster). Each individual point of the raster is assigned a
colour value. In the simplest case of a black and white image, the pixel is
either coloured black or white.
If a vector graphic is to be displayed as a raster graphic, all lines and
filled polygons must be converted to points on a raster. This conversion is
called rasterisation or, in computer graphics, scan conversion. This process
requires a high computing effort. A common standard monitor has about
two million pixels, for each of which a colour decision has to be made for
every frame to be displayed, usually 60 times per second. Moreover, this
conversion leads to undesirable effects. The frayed or stepped lines
observed after rasterising are sometimes counted as aliasing effects. The
term aliasing effect originates from signal processing and describes
artefacts, i.e., artificial, undesirable effects, caused by the discrete sampling
of a continuous signal. Rasterisation can therefore be seen as a sampling of
a signal (the vector graphic).

Fig. 7.2 An arrowhead (left) shown in two resolutions (centre and right)

Even though an image must ultimately be displayed as a raster graphic, it is still advantageous to model it in vector-oriented form, process it in the
graphics pipeline and save it. A raster graphic is bound to a specific
resolution. If a specific raster is specified, there are considerable
disadvantages in the display if an output device works with a different
resolution. Figure 7.2 shows the head of an arrow and its representation as a
raster graphic under two different resolutions of the raster. If only the
representation in the coarse resolution in the middle is known (saved), the
desired finer raster on the right can no longer be reconstructed without
further information. At best, the identical coarse representation in the centre
could be adopted for the finer resolution by mapping one coarse pixel onto
four finer pixels. If the ratio of the resolutions of the two rasters is not
an integer, the conversion from one resolution to another becomes more
complex. From the point of view of signal processing, the conversion of a
raster graphic into another resolution basically corresponds to a resampling,
whereby (new) aliasing effects can occur.

7.2 Rasterisation in the Graphics Pipeline and Fragments
As shown in Sect. 7.1, vector graphics and raster graphics each have
advantages and disadvantages. In order to use the advantages and at the
same time avoid the disadvantages, vector graphics are used in the front
stages of graphics pipelines, for example, to model objects, to build up
scenes, to define the viewer’s perspective and to limit the scene to the scope
to be further processed by clipping. Raster graphics are processed in the
rear stages of graphics pipelines, since the raster resolution can be fixed in
these stages and the necessary computations can thus be carried out
efficiently, such as for the illumination of surfaces or the check for the
visibility of objects (see Sect. 2.5).
Therefore, within a graphics pipeline, rasterisation must be used to
convert a scene to be drawn into a two-dimensional image. A scene is
composed of objects that consist of geometric primitives, which in turn are
described by vertices (corner points of a geometric primitive). Thus, for
each different geometric primitive, a procedure for converting the vertices
into a set of pixels on a grid must be defined. As explained in Sect. 7.5.3,
any geometric primitive commonly used in computer graphics can be traced
back to a planar triangle or a set of planar triangles. This means that, in principle, the consideration of only this geometric form is sufficient. All in all, a
rasterisation method for planar triangles can be used to convert a complete
scene from a vector graphic to a raster graphic.
If a nontrivial scene is viewed from a certain camera perspective, some
objects and thus the geometric primitives of the scene usually obscure each
other. Therefore, when rasterising a single geometric primitive, it cannot be
decided immediately which candidate image point must be displayed on the
screen for correct representation. In order to be able to make this decision
later, when all necessary information is available, the generated image point
candidates are assigned a depth value (z-value) in addition to the two-
dimensional position coordinate (window coordinate) and the colour
information. Such a candidate image point is called a fragment in computer
graphics. A fragment is distinct from a pixel, which represents an image
point actually to be displayed on the screen. A pixel results from visibility
considerations (see Chap. 8) of the overlapping fragments at the relevant
two-dimensional position coordinate. If all fragments are opaque (i.e., not
transparent), the colour value of the visible fragment is assigned to the
pixel. If there are transparent fragments at the position in question, the
colour value of the pixel may have to be calculated from a weighting of the
colour values of the fragments concerned.
In addition to the raster coordinates, the fragments are usually assigned
additional data that are required for visibility considerations, special effects
and ultimately for determining the final colour. These are, for example,
colour values, depth values, normal vectors, texture coordinates and fog
coordinates. This data, which is assigned to a vertex or fragment in addition
to the position coordinates, is called associated data. Depending on the
viewer’s position, the generation of many fragments from a few vertices
usually takes place through the rasterisation of a geometric primitive. The
calculation of the resulting intermediate values (additional required data)
for the fragments is done by linear interpolation of the vertex data based on
the two-dimensional position coordinates of the fragment within the
respective primitive. For the interpolation within a triangle, barycentric
coordinates are usually used (see Sect. 7.5.4).

Fig. 7.3 Representations of a fragment with coordinate (3, 3): In the left graph, the fragment is
represented as a circle on a grid intersection. The right graph shows the representation from the
OpenGL specification

To illustrate rasterisation methods, this book mostly uses the representation as in Fig. 7.2, in which a fragment is represented by a unit
square between the grid lines. In other cases, the representation as in the left
part of Fig. 7.3 is more favourable, where the fragments correspond to
circles on the grid intersections. In the figure, a fragment is drawn at
location (3, 3). The right side of the figure shows the definition of a
fragment as in the OpenGL specification (see [20, 21]). In this
specification, the fragment is represented as a filled square. Its integer
coordinate is located at the lower left corner of the square, which is marked
by the small cross at the position (3, 3). The centre of the fragment is
marked by a small circle with the exact distance
$$(\frac{1}{2}, \frac{1}{2})$$ (offset) from the fragment coordinate.
According to the OpenGL specification, a fragment does not have to be
square, but can have other dimensions. If non-square fragments are used,
for example, lines can be created as raster graphics that are thicker in one
coordinate dimension than in the other coordinate dimension.
When describing methods and algorithms for rasterisation, the literature
often does not distinguish between fragments and pixels, as this difference
is not relevant in isolated considerations and visibility considerations only
take place in later stages of the graphics pipeline. Mostly, the term pixel is
used. In order to achieve a uniform presentation and to facilitate
classification in the stages of the graphics pipeline, the term fragment is
used uniformly in this book when this makes sense in the context of the
consideration of rasterisation by the graphics pipeline.

7.3 Rasterisation of Lines


This section explains procedures for rasterising lines. Section 7.5 presents
methods and procedures for rasterising or filling areas and in particular for
rasterising triangles. The algorithm according to Pineda for the rasterisation
of triangles can be parallelised very easily and is therefore particularly
suitable for use on graphics cards, since these usually work with parallel
hardware.

7.3.1 Lines and Raster Graphics


In the following, the task of drawing a line from point $$(x_0,y_0)$$ to
point $$(x_1,y_1)$$ on a grid is considered. In order to keep the problem
as simple as possible at first, it shall be assumed that the two given points
lie on the available grid, i.e., their coordinates are given as integer values.
Without loss of generality, it is further assumed that the first point is not
located to the right of the second point, i.e., that $$x_0 \le x_1$$ holds.

Fig. 7.4 Pseudocode of a naive algorithm for drawing a line from point $$(x_0, y_0)$$ to point
$$(x_1, y_1)$$

A naive approach to drawing the line between these two points on the
raster would step incrementally through the x-coordinates starting at
$$x_0$$ up to the value $$x_1$$ and calculate the
corresponding y-value for the line in each case. Since this y-value is
generally not an integer, this y-value must be rounded to the nearest integer
value. The fragment with the corresponding x- and the rounded y-coordinate
is drawn. Figure 7.4 describes this algorithm in the form of pseudocode.
First, it should be noted that for vertical lines, i.e., where
$$x_0 = x_1$$ holds, this algorithm performs a division by zero when
calculating the slope m and therefore fails. Even if this situation is treated as
a special case, the algorithm remains ineffective, as Fig. 7.5 shows. The
upper, horizontal line is drawn perfectly as can be expected for a raster
graphic. The slightly sloping straight line below is also drawn correctly
with the restrictions that are unavoidable due to the raster graphics. The
fragments marked in black are set by the algorithm. The ideal line, which is
to be approximated by these fragments of the raster graphic, is also drawn
for clarification.
However, the line with the strongly negative slope at the bottom left is
not even correctly reproduced in the sense of a raster graphic by the drawn
fragments. Not all fragments are set that would be expected when
representing the line as a raster graphic. This error is caused by the fact that
the x-values of the fragments are incremented by one in each step, and only the fragment obtained by rounding the corresponding y-value on the line to be drawn is set. Since this line has an absolute value of the slope
greater than one, grid positions in the y-direction are skipped when drawing.
This effect occurs with all lines whose absolute slope is greater than one.
The effect becomes stronger the larger the absolute slope is.

Fig. 7.5 Line drawn with the naive line algorithm

The solution to this problem is to swap the roles of the x- and y-axes for
lines with an absolute slope greater than one, i.e., in this case the y-axis is
incremented in steps of one instead of the x-axis. This simultaneously
solves the aforementioned problem of division by zero for a vertical line. A
vertical line with an infinite slope becomes a horizontal line with zero slope
if the two coordinate axes are swapped.
Traversal and drawing of lines are tasks that must be performed
extremely frequently in computer graphics when displaying images. For
this reason, the algorithm for drawing lines should be as efficient as
possible. The naive line drawing algorithm from Fig. 7.4 could be used,
taking into account a possible swap of the roles of the coordinate axes for
lines with an absolute slope greater than one. In the last line of the
pseudocode, however, there are still two possibilities. The first formula
requires only a single addition and is thus faster than the second formula,
which is commented out. The second formula needs two additions, i.e., an
addition and a subtraction, and a multiplication. In general, multiplication
requires more computing time than addition. Therefore, the first version for
the computation of the y-value should be preferred. With this variant, there
is at most the danger that rounding errors accumulate due to the repeated
addition within the loop. Due to the integer rounding that has to be done
anyway during drawing, the accumulated error would have to be extremely
large to become visible. A few hundred or thousand loop iterations, as
would be necessary for display on standard computer monitors, would not
produce nearly such large rounding errors.
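
The naive algorithm can be sketched in Java as follows (this is a sketch, not the original pseudocode of Fig. 7.4; integer endpoints with $$x_0 < x_1$$ are assumed, and drawFragment is a hypothetical output method). The incremental update of the y-value is active, while the direct evaluation is shown as a comment:

void naiveLine(int x0, int y0, int x1, int y1) {
    double m = (double) (y1 - y0) / (x1 - x0); // fails for vertical lines (x0 == x1)
    double y = y0;
    for (int x = x0; x <= x1; x++) {
        drawFragment(x, (int) Math.round(y));
        y = y + m;                    // incremental variant: one addition per step
        // y = m * (x + 1 - x0) + y0; // alternative: multiplication and additions
    }
}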

7.3.2 Midpoint Algorithm for Lines According to Bresenham


Drawing of lines can be done much more efficiently by a technique that is
very important in computer graphics than the method presented in Sect.
7.3.1. The naive line drawing algorithm requires floating point operations to
finally determine the coordinates of the fragments rounded to integer
values. If floating point arithmetic could be completely eliminated and all
computations could be performed in the usually much faster integer
arithmetic, a great gain in efficiency would result. Such an algorithm for
drawing lines, based solely on integer operations, was developed by J. E.
Bresenham [5]. This algorithm is explained in more detail below. Since
modern graphics hardware has very efficient and optimised floating point
operations, it must be verified in each individual case which type of
calculation and algorithm is actually more efficient.
When considering the naive algorithm for drawing line segments, it has
already been established that it must be ensured that the absolute value of
the slope of the line to be drawn does not exceed the value of one. For a line
to be drawn with an absolute slope greater than one, the roles of the x- and
y-axes must be swapped. Before starting to draw the line segment, it should
therefore be determined which coordinate axis is to be considered the x-axis
on the basis of the absolute slope of the line. After this decision, only a line
must be drawn whose absolute slope—related to the possibly swapped
coordinate axes—does not exceed the value one. In the following, only the
case is considered in which the slope lies between 0 and 1. The
considerations for the case of a slope between 0 and $$-1$$ are
completely analogous. The coordinate axis whose values are iteratively
incremented by one to draw the line is to be called the x-axis, even though
the roles of the coordinate axes were possibly swapped.
If the fragment $$(x_p,y_p)$$ was drawn last when drawing a
line, only two fragments are possible as the next fragment to be drawn.
Since it was assumed that the slope of the line to be drawn is positive, the y-
value of the line at the point $$(x_p+1)$$ must be at least as large as
at the point $$x_p$$. Thus, the rounded value $$y_{p+1}$$
cannot be smaller than $$y_p$$. Since it was also assumed that the
slope of the line is at most one, the y-value of the line at the point
$$(x_p+1)$$ can also be at most by one greater than the y-value at the
point $$x_p$$. The same applies to the rounded value
$$y_{p+1}$$. In total, therefore,
$$y_p \le y_{p+1} \le y_p +1$$ must hold. This means that after
drawing the fragment $$(x_p,y_p)$$, the next step is to draw either
the fragment $$(x_{p+1}, y_{p+1}) = (x_p+1,y_p)$$ or the fragment
$$(x_{p+1}, y_{p+1}) = (x_p+1,y_p+1)$$. Therefore, when drawing,
only one decision has to be made in each step between the immediate right
neighbouring fragment and the diagonally right upper neighbouring
fragment.

Fig. 7.6 Representation of the two fragment candidates when drawing a line (based on [11])

Figure 7.6 illustrates this situation. The neighbouring fragment on the right (to the east) is labelled E, and the neighbouring fragment on the top
right (to the north-east) is labelled NE.
In order to decide which of the two fragments is to be drawn, i.e., which
of the two fragments is closer to the point Q on the ideal line, it must be
verified as to whether the midpoint M between these two fragments lies
below or above the ideal line. If M lies below the ideal line, the upper
fragment, i.e., NE, must be drawn. If M lies above the ideal line, the lower
fragment E must be drawn. If M is exactly on the ideal line, this
corresponds to the problem of how to round with a decimal fraction of 0.5.
The decision for fragment NE corresponds to rounding 0.5 to one, and the
decision for fragment E corresponds to rounding 0.5 to zero. However, a
clear decision must be made, for example for the fragment NE, which is
consistently maintained throughout the line drawing process.
The previous considerations reduce the choice of which fragment to set
in each step of the line drawing to two possible fragments. In the following,
this reduction to two alternatives will be further exploited to avoid floating
point arithmetic and use only integer computations in order to decide
whether the midpoint M is above or below the ideal line, inducing which
fragment should be drawn.
If one considers a line as a function in the mathematical sense, the line
can be expressed by the following very frequently used straight line
equation:
$$\begin{aligned} y \; = \; f(x) \; = \; m \cdot x + b. \end{aligned}$$ (7.1)
This notation reflects the view of an explicit computational procedure. Each
x-value is assigned exactly one corresponding y-value with the help of this
calculation rule. For drawing lines, this type of consideration is not the most
favourable. On the one hand, vertical straight lines cannot be written in the
form (7.1). On the other hand, this representation is of little help in the
above considerations of deciding which fragment to draw on the basis of
the midpoint. Of course, Eq. (7.1) could be used to calculate the y-value on
the line. However, the comparison with the midpoint M can be omitted,
since the fragment to be drawn results directly from rounding. Therefore, a
form of representation other than (7.1) is chosen in the following.
Any function $$y=f(x)$$, especially a simple function like a line,
can be expressed in the implicit form as follows:
$$\begin{aligned} F(x,y) \; = \; 0. \end{aligned}$$ (7.2)
This form no longer considers the function as an explicit computational
procedure, but in the sense of the function graph, which consists of the set
of all points. This graph of a function consists of all points belonging to the
following set:
$$ \left\{ (x,y) \in \mathrm{I\!R}^2 \mid F(x,y) = 0 \right\} . $$
This is the set of points (x, y) that satisfy the implicit equation (7.2) defining
the function. A straight line can always be expressed in the following form:
$$\begin{aligned} F(x,y) \; = \; A \cdot x + B \cdot y + C = 0. \end{aligned}$$ (7.3)
For example, the straight line function $$y=m\cdot x + b$$ can be
rewritten in the following form:
$$\begin{aligned} F(x,y) \; = \; m \cdot x - y + b = 0. \end{aligned}$$ (7.4)
This representation is much better suited for determining whether a point,
especially the midpoint M, is above, below or on the given line. If the point
$$(x_M,y_M)$$ is inserted into Eq. (7.4), the following three
possibilities arise, where $$m \ge 0$$ is assumed:
$$F(x_M,y_M) = 0$$: The point $$(x_M,y_M)$$ is on the line
under consideration.
$$F(x_M,y_M)>0$$: If the point were on the line, the value
$$y_M$$ would have to be greater. This means that the point
$$(x_M,y_M)$$ lies below the line.
$$F(x_M,y_M)<0$$: If the point were on the line, the value
$$y_M$$ would have to be smaller. This means that the point
$$(x_M,y_M)$$ lies above the line.
The sign of the value $$F(x_M,y_M)$$ can thus be used to decide
where the point $$(x_M,y_M)$$ is located relative to the line under
consideration. If the implicit representation of the straight line in the form
of Eq. (7.4) is used for this decision, calculations in floating point
arithmetic cannot be avoided. Since a line connecting two given grid points
$$(x_0,y_0)$$ and $$(x_1,y_1)$$ is to be drawn, Eq. (7.4) can
be further transformed so that only integer arithmetic operations are
required. The line given by these two points can be expressed by the
following equation:
$$ \frac{y - y_0}{x - x_0} \; = \; \frac{y_1 - y_0}{x_1 - x_0}. $$
By solving this equation for y and with the definition of the integer values2
$$dx = x_1 - x_0$$ and $$dy = y_1 - y_0$$, the following
explicit form is obtained:
$$ y \; = \; \frac{dy}{dx} x + y_0 - \frac{dy}{dx}x_0. $$
From this, the implicit form can be derived as follows:
$$ 0 \; = \; \frac{dy}{dx} x - y + y_0 - \frac{dy}{dx}x_0. $$
After multiplying this equation by dx, we get the implicit form
$$\begin{aligned} F(x,y) \; = \; dy \cdot x - dx \cdot y + C \; = \; 0 \end{aligned}$$ (7.5)
with
$$ C \; = \; dx \cdot y_0 - dy \cdot x_0. $$
The aim of these considerations is to enable the drawing of a line based
only on integer arithmetic. From the assumption that the line segment to be
drawn has a slope between zero and one, it follows that in each step it is
only necessary to decide which of two possible fragments is to be drawn.
Either the eastern (E) or the north-eastern (NE) fragment must be set. To
make this decision between the two fragment candidates, it must be verified
whether the line is above or below the midpoint M as illustrated in Fig. 7.6.
For this purpose, the representation of the line in implicit form is very
useful. If the midpoint is inserted into the implicit equation of the line, the
sign indicates how the midpoint lies in relation to the line. The midpoint
$$M = (x_M,y_M)$$ lies on the grid in the x-direction and the y-
direction between two grid points. The x-coordinate $$x_M$$ is
therefore an integer, and the y-coordinate $$y_M$$ has the form
$$ y_M \; = \; y_M^{(0)} + \frac{1}{2} $$
with an integer value $$y_M^{(0)}$$. Using the implicit form (7.5)
of the line and the correct value $$y_M$$, floating point number
operations are always necessary to compute its value. However, multiplying
Eq. (7.5) by the factor 2 yields the implicit form
$$\begin{aligned} \tilde{F}(x,y) \; = \; 2 \cdot dy \cdot x - 2 \cdot dx \cdot y + 2 \cdot C \; = \; 0 \end{aligned}$$ (7.6)
for the line under consideration. If the midpoint $$M = (x_M,y_M)$$
is inserted into this equation, the computations can be reduced to integer
arithmetic. Instead of directly inserting the floating point value
$$y_M$$, which has the decimal part 0.5, only the integer
$$2 \cdot y_M^{(0)} + 1$$ value can be used for the term
$$2\cdot y_M$$.
In this way, the calculations for drawing lines in raster graphics can be
completely reduced to integer arithmetic. However, Eq. (7.6) is not used
directly for drawing lines, as incremental calculations can even avoid the
somewhat computationally expensive (integer) multiplications. Instead of
directly calculating the value (7.6) for the midpoint M in each step, only the
first value and the change that results in each drawing step are determined.
To determine the formulas for this incremental calculation, the implicit
form (7.5) is used instead of the implicit form (7.6). In each step, the
decision variable
$$ d \; = \; F(x_M,y_M) \; = \; dy \cdot x_M - dx \cdot y_M + C $$
indicates whether the fragment is to be drawn above or below the midpoint
$$M = (x_M,y_M)$$. For $$d > 0$$, the upper fragment NE
is to be set, and for $$d < 0$$ the lower fragment E. It is calculated
how the value of d changes in each step. This is done by starting with the
fragment $$(x_p, y_p)$$ that was drawn after rounding. After
drawing the fragment $$(x_{p+1},y_{p+1})$$, it is determined how
d changes by considering the midpoint to determine the fragment
$$(x_{p+2},y_{p+2})$$. For this purpose, two cases have to be
distinguished, which are illustrated in Fig. 7.7.

Fig. 7.7 Representation of the new midpoint depending on whether the fragment E or NE was drawn
in the previous step (based on [11])

Case 1: E, i.e., $$(x_{p+1},y_{p+1}) = (x_p+1,y_p)$$ was the fragment to be drawn after $$(x_p,y_p)$$. The left part of Fig. 7.7 shows
this case. Therefore, the midpoint
$$M_{\text { new}} = \left( x_p +2, y_p+\frac{1}{2}\right) $$ must be
considered for drawing fragment $$(x_{p+2},y_{p+2})$$. Inserting this
point into Eq. (7.5) yields the following decision variable:
$$ d_{\text {new}} \; = \; F\left( x_p + 2, y_p + \frac{1}{2}\right) \; = \;
dy\cdot (x_p + 2) - dx \cdot \left( y_p + \frac{1}{2}\right) + C. $$
In the previous step to determine fragment $$(x_{p+1},y_{p+1})$$, the
midpoint $$\left( x_p +1, y_p+\frac{1}{2}\right) $$
had to be inserted into Eq. (7.5) so that the decision variable has the
following value:
$$ d_{\text {old}} \; = \; F\left( x_p + 1, y_p + \frac{1}{2}\right) \; = \;
dy\cdot (x_p + 1) - dx \cdot \left( y_p + \frac{1}{2}\right) + C. $$
The change in the decision variable in this case is therefore the value
$$ \varDelta _E \; = \;d_{\text {new}} - d_{\text {old}} \; = \; dy. $$
Case 2: NE, i.e., $$(x_{p+1},y_{p+1}) = (x_p+1,y_p+1)$$ was the
fragment to be drawn after $$(x_p,y_p)$$. The right part of Fig. 7.7
shows this case. Therefore, the midpoint
$$M_{\text {new}} = \left( x_p +2, y_p+\frac{3}{2}\right) $$ must be
considered for drawing fragment $$(x_{p+2},y_{p+2})$$. The value for
the decision variable results as follows:
$$ d_{\text {new}} \; = \; F\left( x_p + 2, y_p + \frac{3}{2}\right) \; = \;
dy\cdot (x_p + 2) - dx \cdot \left( y_p + \frac{3}{2}\right) + C. $$
The previous value of the decision variable d is the same as in the first case
of the eastern fragment E, so that the change in the decision variable is
$$ \varDelta _{NE} \; = \; d_{\text {new}} - d_{\text {old}} = \; dy - dx.
$$

In summarised form, this results in a change of the decision variable
$$ \varDelta \; = \; \left\{ \begin{array}{ll} dy &{} \text{ if } E
\text{ was } \text{ chosen } \\ dy - dx &{} \text{ if } NE \text{ was }
\text{ chosen } \end{array} \right. $$
that is
$$ \varDelta \; = \; \left\{ \begin{array}{ll} dy &{} \text{ if }
d_{\text {old}} < 0 \\ dy - dx &{} \text{ if } d_{\text {old}} > 0.
\end{array} \right. $$
The value of $$\varDelta $$ is always an integer, which means that
the decision variable d only changes by integer values.
In order to compute the value of the decision variable d in each step, the
initial value of d is needed in addition to the change of d. This is obtained
by inserting the first midpoint into Eq. (7.5). The first fragment on the line
to be drawn has the coordinates $$(x_0,y_0)$$. The first midpoint to
be considered is therefore $$\left( x_0+1, y_0 + \frac{1}{2}\right) $$,
so the first value of the decision variable can be calculated as follows:
$$\begin{aligned} d_{\text {init}}= & {} F\left( x_0+1, y_0 +
\frac{1}{2}\right) \\= & {} dy \cdot (x_0+1) - dx \cdot \left( y_0 +
\frac{1}{2}\right) + C\\= & {} dy \cdot x_0 - dx \cdot y_0 + C + dy -
\frac{dx}{2}\\= & {} F(x_0,y_0) + dy - \frac{dx}{2}\\= & {} dy
- \frac{dx}{2}. \end{aligned}$$
The value $$F(x_0,y_0)$$ is zero, since the point
$$(x_0,y_0)$$ lies by definition on the line to be drawn.
Unfortunately, the initial value $$d_{\text {init}}$$ is generally
not an integer, except when dx is even. Since the change $$\varDelta $$ of d is always
an integer, this problem can be circumvented by considering the decision
variable $$D = 2\cdot d$$ instead of the decision variable d. This
corresponds to replacing the implicit form (7.5) of the line to be drawn by
the following implicit form:
$$ D \; = \; \hat{F}(x,y) \; = \; 2\cdot F(x,y) \; = \; 2 \cdot dy \cdot x - 2
\cdot dx \cdot y + 2 \cdot C \; = \; 0. $$
For the determination of the fragment to be drawn, it does not matter which
of the two decision variables d or $$D = 2\cdot d$$ is used, since only
the sign of the decision variables is relevant.
For the initialisation and the change of D, the result is as follows:
$$\begin{aligned} D_{\text {init}} \; = \; 2\cdot dy - dx, \qquad D_{\text {new}} \; = \; D_{\text {old}} + \varDelta \quad \text{ where } \end{aligned}$$ (7.7)

$$\begin{aligned} \varDelta \; = \; \left\{ \begin{array}{ll} 2 \cdot dy & \text{ if } D_{\text {old}}<0\\ 2\cdot (dy - dx) & \text{ if } D_{\text {old}}>0. \end{array} \right. \end{aligned}$$ (7.8)
The decision variable D takes only integer values. For the initialisation of
D, multiplication and subtraction are required. In addition, the two values
for $$\varDelta $$ should be precalculated once at the beginning,
which requires two more multiplications and one subtraction. When
iteratively updating $$\varDelta $$, only one addition is needed in
each step. It should be noted that multiplying a number in binary
representation by a factor of two can be done very efficiently by shifting the
bit pattern towards the most significant bit and adding a zero as the least
significant bit, much like multiplying a number by a factor of ten in the
decimal system.
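
A Java sketch of the resulting integer-only algorithm for a line with a slope between zero and one could look as follows (drawFragment is a hypothetical output method; when the decision variable is zero, the eastern fragment is chosen, as in the example discussed next):

void midpointLine(int x0, int y0, int x1, int y1) {
    int dx = x1 - x0;
    int dy = y1 - y0;             // 0 <= dy <= dx is assumed
    int deltaE = 2 * dy;          // change of D after an E step
    int deltaNE = 2 * (dy - dx);  // change of D after an NE step
    int d = 2 * dy - dx;          // D_init
    int y = y0;
    drawFragment(x0, y);
    for (int x = x0 + 1; x <= x1; x++) {
        if (d > 0) {              // NE: advance diagonally
            y = y + 1;
            d = d + deltaNE;
        } else {                  // E: advance horizontally (also chosen for d == 0)
            d = d + deltaE;
        }
        drawFragment(x, y);
    }
}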
An example will illustrate the operation of this algorithm, which is
called midpoint algorithm or after its inventor Bresenham algorithm. Table
7.1 shows the resulting values for the initialisation and for the decision
variable for drawing a line from point (2, 3) to point (10, 6).
After the start fragment (2, 3) is set, the negative value $$-2$$ is
obtained for $$D_{\text {init}}$$, so that the eastern fragment is to
be drawn next and the decision variable changes by
$$\varDelta _E = 6$$. Thus, the decision variable becomes positive,
the north-eastern fragment is to be drawn and the decision variable is
changed by $$\varDelta _{NE}$$. In this step, the value of the
decision variable is zero, which means that the line is exactly halfway
between the next two fragment candidates. For this example, it should be
specified that in this case the eastern fragment is always drawn. Specifying
that the north-eastern fragment is always drawn in this case would also be
possible, but must be maintained throughout the line drawing process. The
remaining values are calculated accordingly. Figure 7.8 shows the resulting
line of fragments.
Table 7.1 Calculation steps of the midpoint algorithm using the example of the line from point (2, 3)
to point (10, 6)

$$\begin{array}{lclcrl} dx &=& 10 - 2 &=& 8 & \\ dy &=& 6 - 3 &=& 3 & \\
\varDelta _E &=& 2\cdot dy &=& 6 & \\ \varDelta _{NE} &=& 2\cdot (dy - dx) &=& -10 & \\
D_{\text {init}} &=& 2\cdot dy - dx &=& -2 & (E)\\
D_{\text {init}+1} &=& D_{\text {init}} + \varDelta _E &=& 4 & (NE)\\
D_{\text {init}+2} &=& D_{\text {init}+1} + \varDelta _{NE} &=& -6 & (E)\\
D_{\text {init}+3} &=& D_{\text {init}+2} + \varDelta _E &=& 0 & (E?)\\
D_{\text {init}+4} &=& D_{\text {init}+3} + \varDelta _E &=& 6 & (NE)\\
D_{\text {init}+5} &=& D_{\text {init}+4} + \varDelta _{NE} &=& -4 & (E)\\
D_{\text {init}+6} &=& D_{\text {init}+5} + \varDelta _E &=& 2 & (NE)\\
D_{\text {init}+7} &=& D_{\text {init}+6} + \varDelta _{NE} &=& -8 & (E)\\
D_{\text {init}+8} &=& D_{\text {init}+7} + \varDelta _E &=& -2 & \end{array}$$

Fig. 7.8 Example of a line from point (2, 3) to point (10, 6) drawn with the Bresenham algorithm
The precondition for the application of this algorithm is that the slope of
the line to be drawn lies between zero and one. As described above, if the
absolute values of the slope exceed the value of one, the roles of the x- and
y-axes must be swapped in the calculations, i.e., the y-values are
incremented iteratively by one instead of the x-values. This always results
in lines whose slope lies between $$-1$$ and 1. For lines with a slope
between $$-1$$ and 0, completely analogous considerations can be
made as for lines with a slope between 0 and 1. Instead of the north-eastern
fragment, only the south-eastern fragment has to be considered.
An essential prerequisite for the midpoint algorithm is the assumption
that the line to be drawn is bounded by two points that have integer
coordinates, i.e., they lie on the grid. These points can be taken directly into
the grid as fragments. When modelling the line as a vector graphic, this
requirement does not have to be met at all. In this case, the line is drawn
that results from the connection of the rounded start- and endpoints of the
line. This might lead to a small deviation of the line of fragments obtained
from rounding the y-coordinates compared to the ideal line. However, the
deviations amount to at most one integer grid coordinate and are therefore
tolerable.

7.3.3 Structural Algorithm for Lines According to Brons


The midpoint algorithm (see Sect. 7.3.2) requires n integer operations to
draw a line consisting of n fragments, in addition to the one-time
initialisation at the beginning. This means that the computational time
complexity is linear to the number of fragments. Structural algorithms for
lines further reduce this complexity by exploiting repeating patterns that
occur when a line is drawn on a fragment raster. Figure 7.9 shows such a
pattern, which has a total length of five fragments. To better recognise the
repeating pattern, the drawn fragments were marked differently. The basic
pattern in Fig. 7.9 consists of one fragment (filled circle), two adjacent
fragments diagonally above the first fragment (non-filled circles), followed
by two more adjacent fragments diagonally above (non-filled double
circles). If D denotes a diagonal step (drawing the “north-eastern”
fragment) and H a horizontal step (drawing the “eastern” fragment), one
repeating pattern of the line can be described by the sequence DHDHD.
Fig. 7.9 A repeating pattern of fragments when drawing a line on a grid

If the line from Fig. 7.8 did not end at the point (10, 6) but continued to
be drawn, the pattern HDHHDHDHD would repeat again. This can also be
seen from the calculations of the midpoint algorithm in Table 7.1. The
initial value of the decision variable $$D_{\text {init}}$$ is identical
to the last value $$D_{\text {init}+8}$$ in the table. Continuing the
calculations of the midpoint algorithm would therefore lead to the same
results again as shown in the table.
Since up to now it has always been assumed that a line to be drawn is
defined by a starting point $$(x_0, y_0)$$ and an endpoint
$$(x_1, y_1)$$ on a grid, the values $$x_0, y_0,x_ 1 ,y_1$$
must be integers and therefore also the values $$dx = x_1 - x_0$$ and
$$dy = y_1 - y_0$$. The line to be drawn therefore has a rational
slope of $$\frac{dy}{dx}$$. For drawing the line, the y-values
$$\begin{aligned} \frac{dy}{dx}\cdot x + b \end{aligned}$$ (7.9)
with a rational constant b and integer values x must be rounded, regardless
of whether this is done explicitly as in the naive straight line algorithm or
implicitly with the midpoint algorithm. It is obvious that this can only result
in a finite number of different remainders when calculating the y-values
(7.9). For this reason, every line connecting two endpoints to be drawn on a
fragment raster must be based on a repeating pattern, even if it may be very
long. In the worst case, the repetition of the pattern would only start again
when the endpoint of the line is reached.
Structural algorithms for drawing lines exploit this fact and determine
the underlying basic pattern that describes the line. In contrast to the linear
time complexity in the number of fragments for the midpoint algorithm, the
structural algorithms can reduce this effort for drawing lines to a
logarithmic time complexity, but at the price of more complex operations
than simple integer additions.
Following the same line of argumentation as in the context of the
midpoint algorithm, the considerations of structural algorithms in this
section are limited to the case of a line with a slope between zero and one.
A structural algorithm constructs the repeated pattern of fragments for
drawing a line as a sequence of horizontal steps (H) and diagonal steps (D).
The basic principle is outlined below.
Given the starting point $$(x_0,y_0)$$ and the endpoint
$$(x_1,y_1)$$ of a line with a slope between zero and one, the values
$$dx = x_1 - x_0$$ and $$dy = y_1 - y_0$$ are computed.
After drawing the start fragment, a total of dx fragments must be drawn. On
this path, the line must rise by dy fragments. For these dx fragments, this
requires dy diagonal steps. The remaining $$(dx-dy)$$ drawing steps
must be horizontal steps.
The problem to be solved is to find the right sequence of diagonal and
horizontal steps. As a first, usually a very poor approximation of the line,
i.e., the sequence describing the line, the sequence
$$H^{dx-dy}D^{dy}$$ is chosen. This notation means that the
sequence consists of $$(dx-dy)$$ H steps followed by dy D steps. For
example, $$H^2 D^3$$ defines the sequence HHDDD. The initial
approximation $$H^{dx-dy}D^{dy}$$ (see above) contains the
correct number of horizontal and diagonal steps, but in the wrong order. By
appropriately permuting this initial sequence, the desired sequence of
drawing steps is finally obtained.
The algorithm of Brons constructs the correct permutation from the
initial sequence $$H^{dx-dy}D^{dy}$$ as follows [6, 7]:
If dx and dy (and therefore also $$(dx-dy)$$) have a greatest common
divisor greater than one, i.e., $$g = \text{ gcd }(dx,dy) > 1$$, then
the drawing of the line can be realised by g repeating sequences of length
dx/g.
Therefore, only the repeated pattern is considered, and it can be assumed
without loss of generality that dx and dy have no common divisor greater
than one.
Let P and Q be any two words (sequences) over the alphabet
$$\{D, H\}$$.
From an initial sequence $$P^p Q^q$$ with frequencies p and q having
no common divisor greater than one and assuming without loss of
generality $$1<q<p$$, the integer division
$$ p \; = \; k \cdot q +r, \quad 0< r < q $$
leads to the permuted sequence
$$ \begin{array}{ll} (P^k Q)^{q-r} (P^{k+1}Q)^r, &{} \text{ if }
(q-r)>r, \\ (P^{k+1}Q)^r (P^kQ)^{q-r}, &{} \text{ if } r > (q-
r). \end{array} $$
In these formulae, k is the integer result and r is the integer remainder of
the integer division.
Continue recursively with the two subsequences of lengths r or
$$(q-r)$$ until $$r = 1$$ or $$(q-r) = 1$$ holds.
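A recursive Java sketch of this permutation scheme is given below (it assumes $$\text{ gcd }(dx,dy)=1$$ has already been factored out and that $$1<q<p$$ holds, as stated above; the step patterns are represented as strings over the alphabet {H, D}, and Java 11's String.repeat is used for brevity):

String brons(String P, String Q, int p, int q) {
    int k = p / q;                   // integer quotient
    int r = p % q;                   // integer remainder, 0 < r < q
    String A = P.repeat(k) + Q;      // P^k Q
    String B = P.repeat(k + 1) + Q;  // P^{k+1} Q
    if (r == 1) {
        return A.repeat(q - 1) + B;  // (P^k Q)^{q-r} (P^{k+1} Q)^r with r = 1
    }
    if (q - r == 1) {
        return B.repeat(r) + A;      // (P^{k+1} Q)^r (P^k Q)^{q-r} with q - r = 1
    }
    return (q - r > r) ? brons(A, B, q - r, r)   // continue with (P^k Q)^{q-r} (P^{k+1} Q)^r
                       : brons(B, A, r, q - r);  // continue with (P^{k+1} Q)^r (P^k Q)^{q-r}
}

For the example that follows, the call brons("H", "D", 24, 17) produces the drawing sequence for one half of the line.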
How this algorithm works will be illustrated by the example of the line
from the point $$(x_0,y_0) = (0,0)$$ to the point
$$(x_1,y_1) = (82,34)$$. Obviously, $$dx = 82$$,
$$dy = 34$$ and thus $$\text{ gcd }(dx,dy) = 2$$ holds. The
line has a slope of $$dy/dx = 17/41$$. Thus, starting from the
fragment $$(x_0,y_0)$$ that lies on the ideal line, after 41 fragments
another fragment is reached that lies on the ideal line. It is therefore
sufficient to construct a sequence for drawing the first half of the line up to
the fragment (41, 17) and then to repeat this sequence once to draw the
remaining fragments. Therefore, the values
$$\widetilde{dx} = dx/2 = 41$$ and
$$\widetilde{dy} = dy/2 = 17$$ are considered. Thus, the initial
sequence is $$H^{24} D^{17}$$ and the corresponding integer
division with $$p=24$$ and $$q=17$$ yields
$$24 = 1\cdot 17 + 7$$. This leads to the sequence
$$(HD)^{10} (H^2D)^7$$ with $$p=10$$ and $$q=7$$.
The integer division $$10 = 1 \cdot 7 +3$$ results in the sequence
$$(HDH^2D)^4((HD)^2H^2D)^3$$. Here, $$p=4$$ and
$$q=3$$ hold, and the final integer division is
$$4 = 1\cdot 3 +1$$. Since the remainder of this division is
$$r = 1$$, the termination condition of the algorithm is satisfied and
the correct sequence of drawing steps results in
$$ (HDH^2D(HD)^2H^2D)^2((HDH^2D)^2(HD)^2H^2D)^1.
$$
This sequence must be applied twice to draw the complete line from point
(0, 0) to point (82, 34).
In contrast to the midpoint algorithm (see Sect. 7.3.2), the algorithm
described in this section can have a logarithmic time complexity depending
on the number of fragments to be drawn. Depending on the actual
implementation, however, the manipulation of strings or other complex
operations may be required, which in turn leads to disadvantages in runtime
behaviour.

7.3.4 Midpoint Algorithm for Circles


Section 7.3.2 presents an efficient algorithm for drawing lines on a grid
whose calculations are based solely on integer arithmetic. This midpoint
algorithm can be generalised for drawing circles and other curves under
certain conditions. The essential condition is that the centre
$$(x_m,y_m)$$ of the circle to be drawn lies on a grid point, i.e., has
integer coordinates $$x_m$$ and $$y_m$$. In this case, it is sufficient
to develop a method for drawing a circle with its centre at the origin of the
coordinate system. To obtain a circle with centre $$(x_m,y_m)$$, the
algorithm for a circle with centre (0, 0) has to be applied and the calculated
fragments have to be drawn with an offset of $$(x_m,y_m)$$.
In order to determine the coordinates of the fragments to be drawn for a
circle around the origin of the coordinate system, the calculations are
explicitly carried out for only one-eighth of the circle. The remaining
fragments result from symmetry considerations, as shown in Fig. 7.10. If
the fragment (x, y) has to be drawn in the hatched octant, the corresponding
fragments $$(\pm x,\pm y)$$ and $$(\pm y, \pm x)$$ must also
be drawn in the other parts of the circle.

Fig. 7.10 Exploiting the symmetry of the circle so that only the fragments for one-eighth of the
circle have to be calculated

To transfer the midpoint or Bresenham algorithm to circles [4], assume


that the radius R of the circle is an integer. In the octant circle under
consideration, the slope of the circular arc lies between 0 and $$-1$$.
Analogous to the considerations for the midpoint algorithm for lines with a
slope between zero and one, only two fragments are available for selection
as subsequent fragments of a drawn fragment. If the fragment
$$(x_p,y_p)$$ has been drawn in one step, then—as shown in Fig.
7.11—only one of the two fragments E with the coordinates
$$(x_p + 1, y_p)$$ or SE with the coordinates
$$(x_p + 1, y_p -1)$$ is the next fragment to be drawn.

Fig. 7.11 Representation of the fragment to be drawn next after a fragment has been drawn with the
midpoint algorithm for circles

As with the midpoint algorithm for lines, the decision as to which of the
two fragments is to be drawn is to be made with the help of a decision
variable. To do this, the circle equation $$x^2 + y^2 = R^2$$ is
rewritten in the following form:
$$\begin{aligned} d \; = \; F(x,y) \; = \; x^2 + y^2 - R^2 = 0. \end{aligned}$$ (7.10)
For this implicit equation and a point (x, y), the following statements hold:
$$F(x,y) = 0 \; \Leftrightarrow \; (x,y)$$ lies on the circular arc.
$$F(x,y) > 0 \; \Leftrightarrow \; (x,y)$$ lies outside the circle.
$$F(x,y) < 0 \; \Leftrightarrow \; (x,y)$$ lies inside the circle.
In order to decide whether the fragment E or SE is to be drawn next, the
midpoint M is inserted into Eq. (7.10) leading to the following cases for
decision:
If $$d > 0$$ holds, SE must be drawn.
If $$d < 0$$ holds, E must be drawn.
As with the midpoint algorithm for lines, the value of the decision
variable d is not calculated in each step by directly inserting the midpoint
M. Instead, only the change in d is calculated at each step. Starting with
fragment $$(x_p,y_p)$$ which is assumed to be drawn correctly, the
change of d is calculated for the transition from fragment
$$(x_{p+1},y_{p+1})$$ to fragment $$(x_{p+2},y_{p+2})$$.
The following two cases must be distinguished.

Case 1: E, i.e., $$(x_{p+1},y_{p+1}) = (x_p+1,y_p)$$ was the fragment drawn after $$(x_p,y_p)$$. This corresponds to the case shown
in Fig. 7.11. The midpoint $$M_E$$ under consideration for drawing the
fragment $$(x_{p+2},y_{p+2})$$ has the coordinates
$$\left( x_p +2, y_p-\frac{1}{2}\right) $$. Inserting this midpoint into
Eq. (7.10) yields the following value for the decision variable d:
$$ d_{\text {new}} \; = \; F\left( x_p + 2, y_p - \frac{1}{2}\right) \; = \;
(x_p + 2)^2 + \left( y_p - \frac{1}{2}\right) ^2 - R^2. $$
In the previous step to determine the fragment $$(x_{p+1},y_{p+1})$$,
the midpoint $$\left( x_p +1, y_p-\frac{1}{2}\right) $$ was considered.
Inserting this midpoint into Eq. (7.10) gives the prior value of the decision
variable as follows:
$$ d_{\text {old}} \; = \; F\left( x_p + 1, y_p - \frac{1}{2}\right) \; = \;
(x_p + 1)^2 + \left( y_p - \frac{1}{2}\right) ^2 - R^2. $$
The change in the decision variable in this case is thus the following value:
$$ \varDelta _E \; = \;d_{\text {new}} - d_{\text {old}} \; = \; 2x_p + 3.
$$
Case 2: SE, i.e., $$(x_{p+1},y_{p+1}) = (x_p+1,y_p-1)$$ was the
fragment drawn after $$(x_p,y_p)$$. In this case next, the midpoint to be
considered is $$M_{SE} = \left( x_p +2, y_p-\frac{3}{2}\right) $$ (see
Fig. 7.11). This results in the following value for the decision variable:
$$ d_{\text {new}} \; = \; F\left( x_p + 2, y_p - \frac{3}{2}\right) \; = \;
(x_p + 2)^2 + \left( y_p - \frac{3}{2}\right) ^2 - R^2. $$
The previous value of the decision variable d is the same as in the first case
of the eastern fragment E, so the change of the decision variable is given by
the following equation:
$$ \varDelta _{SE} \; = \; d_{\text {new}} - d_{\text {old}} \; = \; 2x_p -
2y_p + 5. $$

In summary, the change in the decision variable for both cases is as follows:
$$ \varDelta \; = \; \left\{ \begin{array}{ll} 2x_p + 3 &{} \text{ if }
E \text{ was } \text{ chosen }\\ 2x_p - 2y_p + 5 &{} \text{ if } SE
\text{ was } \text{ chosen }. \end{array} \right. $$
This means
$$ \varDelta \; = \; \left\{ \begin{array}{ll} 2x_p + 3 &{} \text{ if }
d_{\text {old}} < 0, \\ 2x_p - 2y_p + 5 &{} \text{ if } d_{\text
{old}} > 0, \end{array} \right. $$
so that the change $$\varDelta $$ of the decision variable d is always
an integer.
In order to compute the decision variable d in each step, the initial value
must also be determined. The first fragment to be drawn has the coordinates
(0, R), so that $$\left( 1,R - \frac{1}{2}\right) $$ is the first midpoint to
be considered. The initial value of d is therefore
$$\begin{aligned} F\left( 1, R - \frac{1}{2}\right) \; = \; \frac{5}{4} - R. \end{aligned}$$ (7.11)
As in the case of lines, the decision variable changes only by integer values,
but the initial value is not necessarily an integer. Similar to the algorithm
for lines, the decision variable $$D = 4\cdot d$$ could be used to
achieve an initialisation with an integer value. A simpler solution is,
however, to ignore the resulting digits after the decimal dot in Eq. (7.11).
This is possible for the following reason. In order to decide which fragment
to draw in each case, it is only necessary to determine in each step whether
the decision variable d has a positive or negative value. Since d only
changes by integer values in each step, the decimal part of the initialisation
value cannot influence the sign of d.
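
A Java sketch of the complete midpoint algorithm for circles could look as follows (an integer centre and radius are assumed; drawFragment is a hypothetical output method, and for a decision variable of exactly zero the south-eastern fragment is chosen here):

void midpointCircle(int xm, int ym, int radius) {
    int x = 0;
    int y = radius;
    int d = 1 - radius;               // 5/4 - R with the fractional part dropped
    drawCirclePoints(xm, ym, x, y);
    while (x < y) {                   // one octant; the rest follows by symmetry
        if (d < 0) {                  // E: keep the current y
            d = d + 2 * x + 3;
        } else {                      // SE: decrease y by one
            d = d + 2 * x - 2 * y + 5;
            y = y - 1;
        }
        x = x + 1;
        drawCirclePoints(xm, ym, x, y);
    }
}

void drawCirclePoints(int xm, int ym, int x, int y) {
    // the eight symmetric fragments of Fig. 7.10, shifted by the centre (xm, ym)
    drawFragment(xm + x, ym + y);  drawFragment(xm - x, ym + y);
    drawFragment(xm + x, ym - y);  drawFragment(xm - x, ym - y);
    drawFragment(xm + y, ym + x);  drawFragment(xm - y, ym + x);
    drawFragment(xm + y, ym - x);  drawFragment(xm - y, ym - x);
}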
In deriving the midpoint algorithm for drawing circles, it was assumed
that the centre of the circle is the coordinate origin or at least a grid point.
Furthermore, it was assumed that the radius is also an integer. The midpoint
algorithm can easily be extended to circles with arbitrary, not necessarily
integer radius. Since the radius has no influence on the change of the
decision variable d, the (non-integer) radius only needs to be taken into
account when initialising d. For a floating point radius R,
$$(0,\text{ round }(R))$$ is the first fragment to be drawn and thus
$$\left( 1,\text{ round }(R) - \frac{1}{2}\right) $$ is the first
midpoint to be considered. Accordingly, d must be initialised by the
following value
$$ F\left( 1, \text{ round }(R) - \frac{1}{2}\right) \; = \; \frac{5}{4} -
\text{ round }(R). $$
For the same reasons as for circles with an integer radius, the digits after the
decimal dots can be ignored. This makes the initialisation of d an integer
and the change in d remains an integer independent of the radius.
Both the midpoint algorithm for lines and for circles can be parallelised
(see [24]). This means that these algorithms can be used efficiently by
GPUs with parallel processors.

7.3.5 Drawing Arbitrary Curves


The midpoint algorithm can be generalised not only to circles but also to
other curves, for example for ellipses [1, 15, 18]. An essential, very
restrictive condition for the midpoint algorithm is that the slope of the curve
must remain between 0 and 1 or between 0 and $$-1$$ in the interval to
be drawn. For drawing arbitrary or at least continuous curves or for the
representation of function graphs, a piecewise approximation by lines is
therefore made. For drawing the continuous function $$y = f(x)$$, it is
not sufficient to iterate stepwise through the x-values for each grid position
and draw the corresponding fragments with the rounded y-values. In areas
where the function has an absolute value of the slope greater than one, the
same gaps in drawing the function graph would result as produced by the
naive line drawing algorithm in Fig. 7.5. Lines have a constant slope that
can be easily determined. Therefore, this problem for lines is solved by
swapping the roles of the x- and y-axes when drawing a line with an
absolute value of the slope greater than one. This means that in this case the
inverse function is drawn along the y-axis. Arbitrary functions do not have a
constant slope, and both the slope and the inverse function are often
difficult or impossible to calculate.
For this reason, drawing arbitrary curves is carried out by stepwise
iterating through the desired range on the x-axis and computing the
corresponding rounded y-values. However, not only these fragments are
drawn but also the connecting line between fragments with neighbouring x-
values is drawn based on the midpoint algorithm for lines.
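A Java sketch of this approach is given below (f and drawLine are hypothetical names: f is the continuous function to be plotted, and drawLine is a line rasteriser that handles arbitrary slopes, for example the midpoint algorithm with swapped coordinate axes where necessary):

void drawFunctionGraph(int x0, int x1) {
    int yPrev = (int) Math.round(f(x0));
    for (int x = x0 + 1; x <= x1; x++) {
        int y = (int) Math.round(f(x));
        drawLine(x - 1, yPrev, x, y);  // connect neighbouring sample fragments
        yPrev = y;
    }
}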
Fig. 7.12 Drawing an arbitrary curve

Figure 7.12 illustrates this principle for drawing a continuous function


$$y=f(x)$$ in the interval $$[x_0,x_1]$$ with
$$x_0,x_1 \in \mathrm{I\!N}$$. The filled circles are the fragments
of the form $$(x,\text{ round }(f(x)))$$ for integer x-values. The
unfilled circles correspond to the fragments that are set when drawing the
connecting lines of fragments with neighbouring x-coordinates. When a
curve is drawn in this way, fragments are generally not placed at the same
positions as when the grid position closest to the curve is chosen for the
fragment. However, the deviation is at most one integer grid coordinate.

7.4 Parameters for Drawing Lines


Section 7.3 explains procedures for converting geometric line primitives
described as vector graphics into raster graphics. It is assumed that the
generated lines or the circular arc are drawn solid and are exactly one
fragment wide. This section presents methods for drawing lines with
different line styles, such as dashed and dotted lines, and lines of different
widths. It also explains how to compensate for the different densities of
fragments with different slopes of lines.

Fig. 7.13 An example of different fragment densities depending on the slope of a line

7.4.1 Fragment Density and Line Style


In most cases, rasterisation creates a staircase-like structure when drawing a
continuous line. The undesirable effects that occur during rasterisation are
explained in more detail in Sects. 7.6 and 7.6.1. Section 7.6.2 contains
suitable countermeasures to avoid or reduce these effects. When drawing
lines, it was assumed for the algorithms presented in this chapter that the
lines have the width of one fragment. Thin lines, where one fragment per
single step is set in x- or y-direction, have different densities depending on
their slope. Figure 7.13 shows this effect. The horizontal line has the
highest fragment density. As the slope increases, the fragment density
decreases until slope one, where the lowest density is reached. Since lines
with a slope greater than one are drawn by swapping the roles of the
coordinate axes, the fragment density increases again with steeper lines.
The same applies to lines with negative slopes.
To see in more detail how the slope of a line affects fragment density in
raster graphics, consider a line from point (0, 0) to point (n, m) with
$$m \le n$$ (and thus with $$m/n \le 1$$). The line always
consists of n fragments, regardless of its slope. In Table 7.2, the fragment
densities are given as a function of the value m. The last row of the table
contains the general formula for arbitrary values $$m\le n$$.
Table 7.2 Fragment densities of lines with different slopes

$$\begin{array}{c|c|c|c} m & \text{Slope} & \text{Length of the line} & \text{Fragment density} \\ \hline
0 & 0 & n & 1 \\
\frac{n}{4} & \frac{1}{4} & n \cdot \sqrt{1+\frac{1}{16}} & \frac{1}{\sqrt{1+\frac{1}{16}}} \\
n & 1 & n \cdot \sqrt{2} & \frac{1}{\sqrt{2}} \\
m & \frac{m}{n} & n \cdot \sqrt{1+\left( \frac{m}{n}\right) ^2} & \frac{1}{\sqrt{1+\left( \frac{m}{n}\right) ^2}} \end{array}$$

The horizontal line, in which a total of n fragments are drawn over a length of n fragments, has the greatest fragment density of one. A diagonal line of slope one has a fragment density of only $$1/\sqrt{2} \approx 0.7$$, about 70% of the density of a horizontal line. If not only a pure black-and-white representation of fragments is available, but grey fragments can be drawn,
this effect can be compensated by using different intensities when drawing
lines. Horizontal and vertical lines are drawn with the lowest intensity
(approx. 70% of the maximum intensity), and diagonal lines with the
maximum intensity. The compensation of the fragment density occurs at the
expense of the overall intensity, since the lines with the lowest fragment
density define the maximum intensity, so that all other lines are drawn
(slightly) weaker.
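A possible compensation factor follows directly from the fragment densities in Table 7.2; the following is a sketch, normalised so that diagonal lines keep the full intensity and assuming $$\vert dy \vert \le \vert dx \vert $$:

double intensityFactor(double dx, double dy) {
    double slope = dy / dx;                                  // |slope| <= 1 assumed
    return Math.sqrt(1.0 + slope * slope) / Math.sqrt(2.0);  // 1/sqrt(2) for slope 0, 1 for slope 1
}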
The effect that different slopes of lines lead to optical differences when
drawing occurs in the same way in the realisation of line styles. A line
style is the way a line is drawn. Up to now, it has been assumed that lines
are drawn solid, which corresponds to the standard line style. Other
common line styles include dashed or dotted lines.
A simple way to define different line styles is to use bitmasks. A
bitmask is a finite sequence of zeros and ones. When drawing a line, this
bitmask is mapped to the fragments of the line. If there is a one in the
bitmask, the corresponding fragment is actually drawn; if there is a zero, it
is not drawn.

Fig. 7.14 Bitmasks for different line styles

Figure 7.14 illustrates the relationship between the bitmask, the fragments to be set and the appearance of the line style. The figure shows
four different line styles. For each line, the underlying bitmask is shown
once at the beginning in slightly enlarged boldface digits, afterwards the
repetitions of the bitmask are shown in the normal font. Below each
bitmask is the sequence of fragments to be drawn. Below this, the resulting
line style is shown. The top bitmask for drawing a solid line consists only of
a single one. For a dashed line, for example, the bitmask 111000 could be
used.
Since a bitmask determines the fragments to be drawn based on the
direction the line is traversed on the corresponding coordinate axis,
analogous effects result as when drawing solid lines when the slope of the
line changes. This makes lines appear differently dense depending on their
slope. For example, for a dashed line with a bitmask of the form
$$1^n0^n$$, that is, n fragments are alternately drawn or n fragments
are omitted, the length of the dash depends on the slope of the line. For
horizontal and vertical lines, a single dash is n fragments long, while for a
line with $$45^\circ $$ slope, the length of the dash is
$$n\cdot \sqrt{2}$$ fragments long, that is, more than 40% longer.
Figure 7.15 illustrates this relationship between the length of the dashes and
the slope of the dashed line using a simple bitmask.

Fig. 7.15 Different dash length of a dashed line when using the same bitmask

7.4.2 Line Styles in the OpenGL


In the OpenGL fixed-function pipeline, different line styles can be defined
by the command glLineStipple. Such a predefined function is not
available in the core profile. In this profile, a different line style can be
created by using short lines or dots if needed. In the compatibility profile,
this functionality has to be activated by
glEnable(gl.GL_LINE_STIPPLE) and can be deactivated later by
glDisable(gl.GL_LINE_STIPPLE). The line style is defined by a
bitmask as described in Sect. 7.4.1. A typical JOGL command sequence for
the definition of a dashed line looks as follows:
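(Sketch; gl is assumed to be a GL2 object, and the mask value is one possible choice whose lower eight bits are set, matching the dashed line in Fig. 7.16.)

gl.glEnable(gl.GL_LINE_STIPPLE);
gl.glLineStipple(1, (short) 0b0000000011111111);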

Fig. 7.16 Examples of line styles in OpenGL: The left side shows the output of an OpenGL
renderer. The right side shows a magnification of the lines. The lines were drawn with a width of two
fragments

If a corresponding drawing command is subsequently executed, it draws a dashed line as shown in Fig. 7.16 (second line from the top). The bitmask
is always 16 bits long, which is why in the code example above the bitmask
is specified as a primitive Java type short. When drawing, the bitmask is
repeatedly evaluated from the least significant bit to the most significant bit
(reading direction in the source code from right to left) to decide whether a
fragment should be set (one in the bitmask) or not set (zero in the bitmask).
If a fragment is not set, then no further processing of this fragment takes
place in the graphics pipeline. Since the lines in Fig. 7.16 were drawn from
left to right, the dashed line therefore starts with eight fragments set. The
following two commands were used to define the line styles of the bottom
two lines in the figure:
Dotted line:

gl.glLineStipple(1, (short) 0b0001000100010001);


Line with dashes and dots:

gl.glLineStipple(1, (short) 0b0000100001111111);


The top line in the figure is a solid line for comparison.
Fig. 7.17 Dashed lines of different slopes with a width of two fragments drawn with an OpenGL
renderer: For the left side, the same bitmask was used for each slope. On the right side, a bitmask
with a slope-dependent correction factor was applied

As explained in Sect. 7.4.1 and shown in Fig. 7.15, when simply applying the same bitmask, the length of the dashes changes at different
slopes of the line. This problem is also visible in the output of an OpenGL
renderer shown on the left in Fig. 7.17. The two oblique lines have longer
dashes than the horizontal line.
To solve this problem, the first parameter in the glLineStipple
command can be used. This specifies how many times a bit in the bitmask
will be used repeatedly to mask drawing a fragment until the next bit is
used. Valid values for this parameter are integer values from the interval
[1, 256]. This can be used, for example, to stretch the dashes of a line with a
small slope compared to a line with a larger slope. As explained in Sect.
7.4.1, for a basic consideration it is sufficient to consider only slopes in the
interval [0, 1].
Table 7.3 Examples of correction factors for drawing non-solid lines as a function of the line slope:
The general formula for this factor is given in the last row of the table

$$\begin{array}{c|c|c|c} \text{Slope} & \text{Fragment density} & \sqrt{2} \cdot \text{fragment density} & \text{Correction factor} \\ \hline
0 & 1 & \sqrt{2} & 14 \\
\frac{1}{2} & \frac{1}{\sqrt{1+\frac{1}{4}}} & \frac{\sqrt{2}}{\sqrt{1+\frac{1}{4}}} & 13 \\
1 & \frac{1}{\sqrt{2}} & 1 & 10 \\
\frac{m}{n} & \frac{1}{\sqrt{1+\left( \frac{m}{n}\right) ^2}} & \frac{\sqrt{2}}{\sqrt{1+\left( \frac{m}{n}\right) ^2}} & \text{round}\left( \frac{10 \cdot \sqrt{2}}{\sqrt{1+\left( \frac{m}{n}\right) ^2}}\right) \end{array}$$

The correction value to be specified in the glLineStipple command is to be calculated as a function of the fragment density, which in turn is determined by the slope of the line (see Table 7.2). In Table 7.3, this
relationship is given for three slope values and for the general case. In this
example, a line from the point (0, 0) to the point (n, m) with
$$m \le n$$ is considered. The line always consists of n fragments
regardless of its slope. The last two columns of the table show how the correction factors can be determined. Since lines with a slope of
one have the lowest fragment density in the drawing direction and thus the
dashes become the longest, no correction needs to be applied in this case.
This can be achieved by normalising the fragment density to the value one
by multiplying it by the factor $$\sqrt{2}$$. If this normalisation is
also carried out for lines with smaller slopes, then values greater than one
result for these lines, whereby the dashes of the lines are stretched as
desired. Since the first parameter of the glLineStipple command must
be an integer value from the interval [1, 256], multiplication by the factor
10 with subsequent rounding to an integer value was carried out in the last
step (last column of the table). The use of another factor is also possible.
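The last column of Table 7.3 can be computed, for example, with the following small Java sketch (slope denotes the absolute slope m/n of the line with $$m \le n$$):

int stippleFactor(double slope) {
    return (int) Math.round(10.0 * Math.sqrt(2.0) / Math.sqrt(1.0 + slope * slope));
}

For the slopes 0, 1/2 and 1, this yields the factors 14, 13 and 10 used below.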
By applying the correction factors, each individual bit of the bitmask is
repeated in turn. Therefore, the bitmask must also be adjusted. For drawing
dashed lines with similar line lengths, the following definitions of line
styles can be used:
Dashed line with slope 0:

gl.glLineStipple(14, (short) 0b0101010101010101);


Dashed line with slope $$\vert \frac{1}{2} \vert $$:

gl.glLineStipple(13, (short) 0b0101010101010101);


Dashed line with slope $$\vert 1 \vert $$:

gl.glLineStipple(10, (short) 0b0101010101010101);


These commands were used before the dashed lines on the right-hand
side in Fig. 7.17 were drawn. The dashes of the lines within the right part of
the figure are similarly long and shorter compared to the oblique lines in the
uncorrected version on the left side of the figure.

7.4.3 Drawing Thick Lines


With today’s high-resolution output devices, lines that are only one
fragment wide generally appear very thin. There are a number of techniques
for drawing lines wider than one fragment. The simplest technique for
increasing the line thickness is to replicate fragments or pixels (called pixel
replication). For drawing a curve with a slope between $$-1$$ and 1, n
fragments are additionally drawn above and below each fragment to be
drawn, so that the curve is $$2n+1$$ grid points wide in the y-direction.
As with drawing of curves with a width of 1, a curve appears thinner with
increasing slope (up to a maximum of 1). The same effect that lines with a
larger slope look thinner than those with zero slope, as discussed in Sect.
7.4.1, also occurs here. The reason why only curves with a slope between
$$-1$$ and 1 are considered is that lines are always drawn in such a way
that their absolute value of the slope does not exceed 1. This is discussed in
detail in Sect. 7.3.1. As explained in Sect. 7.3.5, arbitrary curves are also
approximated by lines when drawn, so that also in this case the restriction
to a slope between $$-1$$ and 1 does not cause any difficulties.
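A sketch of pixel replication for a single fragment of an x-major line (drawFragment is a hypothetical output method; n additional fragments are drawn above and below each line fragment):

void drawReplicatedFragment(int x, int y, int n) {
    for (int i = -n; i <= n; i++) {
        drawFragment(x, y + i);  // 2n+1 fragments in the y-direction
    }
}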

Fig. 7.18 Example of drawing thick lines: On the left side, fragments were replicated (pixel
replication). The moving pen technique was used in the illustration on the right side

Figure 7.18 illustrates this method of fragment replication on the left. The right side of the figure shows the moving pen technique. In this
example, a thick drawing pen with a tip consisting of 5 $$\times $$5
fragments was used. In addition to the fragment in the centre of the tip, all
other fragments belonging to the tip are also drawn.
The replication of fragments (pixel replication) can be interpreted as a
special case of the moving pen technique with a width of 1. Also with this
method, a curve with a square drawing pen tip appears thinner the steeper
the curve is, at least up to the maximum slope of 1.
To avoid multiple writing of fragments, the movement and shape of the
tip should be taken into account. For the tip consisting of 5
$$\times $$ 5 fragments in Fig. 7.18 and assuming that the line to be
drawn has a slope between 0 and 1, only in the first step all 25 fragments
have to be drawn. After that, it is sufficient to set only the five fragments at
the right edge if the next line fragment to be drawn is the right (eastern)
fragment. A total of nine fragments have to be drawn if the north-eastern
line fragment is the next fragment of the line.
Another way to draw thick lines is to think of the lines as rectangles or,
more generally, as polygons that need to be filled. The rasterisation of
polygons is covered in Sect. 7.5.
When drawing thick lines, it must be considered and sensibly defined
how the ends of lines as well as the joins of polylines should look. With a
simple replication of fragments (pixel replication), the ends of lines would
always be parallel to one of the coordinate axes. This problem can be
solved by considering lines as rectangles to be filled. However, problems
then arise at the joins of polylines. Figure 7.19 shows various possibilities
of how the ends and joins of thick lines can look. The lines in the
illustration have been drawn extremely wide to make the effects more
visible. On the far left, two lines have been interpreted as rectangles to be
filled, which results in an unsightly transition between the two lines. For the
other three line segments, the transitions were drawn in different ways.
Extending the outer edges of the two rectangles to their intersection gives a
distinct peak as seen in the second line from the left. Next to it on the right,
the transition has been cut off straight, while at the far right of the figure a
segment of a circle has been placed to connect the lines. The ends of the
lines can also be cut off straight or provided with semicircles as in the
penultimate polyline. Filling areas is used not only as a technique for
drawing thick lines but also as a mode when drawing filled polygons.

Fig. 7.19 Examples of different line endings and joins between lines

7.4.4 Line Thickness in the OpenGL


In the core and compatibility profile, the glLineWidth command sets the
width of lines. For example, a line with a width of five fragments can be
defined by the following command:
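In JOGL, assuming gl references the current GL object as in the other examples of this book, such a call can look as follows:

gl.glLineWidth(5.0f);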

Subsequent drawing commands draw lines with this specification. If the argument to this command is a floating point number, then the line width
will be rounded to the nearest integer. If a 0 is given as an argument or the
rounding result is 0, the line width is set to 1.
If a line is drawn by iterating the values along the x-axis, this line is
called x-major according to the OpenGL specification. As explained several
times in this chapter, this makes sense if the slope of the line is between
$$-1$$ and 1. If the slope is outside this range, then the values along
the y-axis are iterated to draw the line. In this case, the line is called y-
major.
If the integer value of the line width b is determined and greater than 1,
then for x-major lines the y-coordinates of the start- and endpoints of the
line are shifted by half the line width in the negative direction. A line is
then drawn between these shifted coordinates, with a number b of
fragments drawn in the y-direction instead of drawing one fragment. This
means the replicated fragments expand in the positive y-direction, creating a
packet of fragments. Effectively, this is the realisation of fragment
replication (pixel replication) as described in Sect. 7.4.3. For y-major lines,
a shift takes place in the negative x-direction, and the fragment replication
takes place in the positive x-direction.

Fig. 7.20 On the left are lines of widths 1, 2, 5 and 10 drawn by an OpenGL renderer. A dashed
white line of width 1 was drawn over the bottom two lines to illustrate the replication of fragments.
The right side shows magnifications of the drawn lines on the left side

Figure 7.20 shows examples of horizontal (x-major) lines of different thicknesses drawn by an OpenGL renderer. To visualise the fragment
replication, dashed white lines with the width of one fragment were drawn
on each of the two lower lines. It is clearly visible that the white lines are in
the middle of the thick lines.
The available widths for rendering lines depend on the implementation
of the OpenGL interface and thus on the graphics card and its driver. The
available width range of the currently used GPU can be queried as follows:
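One possible JOGL sketch for this query (the array names are illustrative; gl is assumed to be an object whose type provides both constants, for example GL3):

float[] noAntialiasingRange = new float[2];
float[] antialiasingRange = new float[2];
// Width range for lines drawn without antialiasing
gl.glGetFloatv(gl.GL_ALIASED_LINE_WIDTH_RANGE, noAntialiasingRange, 0);
// Width range for antialiased (smooth) lines
gl.glGetFloatv(gl.GL_SMOOTH_LINE_WIDTH_RANGE, antialiasingRange, 0);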

After executing this JOGL source code, the arrays contain the minimum
and maximum possible line widths for lines that can be drawn without and
with antialiasing measures. Section 7.6 presents aliasing effects and
methods for reducing aliasing.
7.5 Rasterisation and Filling of Areas
An area is usually defined by polygons or closed curves. The rasterisation of the edges of such an area is discussed in Sects. 7.3 and 7.4. In order to fill a closed curve, it is necessary to define which grid points
are inside and which are outside the area thus defined. This process,
sometimes referred to as the rasterisation of areas or polygons, is the subject
of this section. When filling an area, it is important to note that the
rasterisation process is not only about determining the coordinates or the
colour of the fragment to be rasterised in the grid. Added to this is the
calculation of associated data for each fragment (see Sect. 7.2). The
associated data per fragment, which can be determined by interpolation (see
Sect. 7.5.4) based on the associated data per vertex, is often used in the
subsequent pipeline stages to compute illumination effects for the fragments
and subsequently the final colours of the pixels to be displayed on the
screen.

7.5.1 Odd Parity Rule


While it is obvious which points are inside and which are outside for areas
defined by closed curves without intersections, arbitrary curves that may
intersect themselves require an unambiguous definition of the inner points.
For this purpose, the odd parity rule (also called even–odd rule) is used,
which is based on the following consideration.

Fig. 7.21 Application of the odd parity rule (even–odd rule) to points inside and outside of a closed
polygonal chain

If one starts to move along a half-line (a ray) from an inner point of a closed curve in one direction, then a bounding edge of the area must be
reached at some point on the line and thus the curve must be intersected. If
the defined area is not convex, it is possible that there are several
intersections between the curve and the line starting from the point. After
the first intersection, the area appears to have been exited; after a second intersection, if there is one, the area is re-entered, in which case at least one further intersection with the curve must follow. With each intersection of the half-line with the curve, there is a change from inside to outside or from outside to inside. Since the area must finally have been left after the last intersection, the number of intersections of a half-line starting from an inner point with the closed curve must be odd. The odd-parity rule
therefore states that a point lies within a closed curve exactly when any
half-line starting from this point has an odd number of intersections with
the curve. Figure 7.21 shows the application of the odd-parity rule to
different points. In each case, the number of intersections of the half-line
with the curve is indicated. Points for which the number of intersections is odd are interior points (labelled “int”). For exterior points (labelled “exte”), the number of intersections is even.
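A minimal Java sketch of this test for a closed polygonal chain given by vertex coordinate arrays; the method name and the use of a horizontal half-line to the right of the point are assumptions of the sketch, and special cases such as the half-line passing exactly through a vertex are not treated:

static boolean insideEvenOdd(double px, double py, double[] xs, double[] ys) {
    boolean inside = false;
    int n = xs.length;
    for (int i = 0, j = n - 1; i < n; j = i++) {
        // Does the edge from vertex j to vertex i cross the horizontal half-line?
        if ((ys[i] > py) != (ys[j] > py)) {
            // x-coordinate of the intersection of the edge with the half-line
            double xCross = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
            if (xCross > px) {
                inside = !inside; // every crossing toggles between inside and outside
            }
        }
    }
    return inside;
}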

7.5.2 Scan Line Technique


To fill an area defined by a closed curve, it would be too computationally
expensive to consider a half-line for each grid point and determine the
intersections of these half-lines with the curve. Instead, a scan line
technique is used. Scan line techniques are very common in computer
graphics. They carry out computations along a line, the scan line, usually
along lines parallel to one of the coordinate axes. The first scan line
methods were described in [3, 26], especially in the context of visibility
considerations (see Chap. 8). Without extensions, they are not well suited
for parallelisation on GPUs.
For the rasterisation, starting from an already rasterised edge of the
polygon, one runs, for example, along the horizontal raster lines (scan lines)
and determines for each line which points lie inside or outside the area
defined by a polygonal chain. First, all intersections of the straight line
corresponding to the raster line with the curve are determined and sorted in
ascending order according to the x-coordinates. If
$$x_1< \ldots < x_n$$ are the x-coordinates of these
intersections, exactly the grid points between $$x_1$$ and
$$x_2$$, between $$x_3$$ and $$x_4$$, etc., and
between $$x_{n-1}$$ and $$x_n$$ must be drawn to fill the
area. The number of intersections is always even, although it can also be
zero. Figure 7.22 illustrates this principle for one scan line.
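The following Java sketch illustrates the processing of a single scan line for a polygon given by vertex arrays; the names and the callback setFragment are assumptions, the scan line is sampled at the centre of the raster row, and the special cases discussed below (vertices or horizontal edges on the scan line, clipping) are ignored:

static void fillScanLine(int y, double[] xs, double[] ys,
                         java.util.function.BiConsumer<Integer, Integer> setFragment) {
    double scanY = y + 0.5;                          // sample at the centre of the raster row
    java.util.List<Double> crossings = new java.util.ArrayList<>();
    int n = xs.length;
    for (int i = 0, j = n - 1; i < n; j = i++) {
        if ((ys[i] > scanY) != (ys[j] > scanY)) {    // edge crosses the scan line
            crossings.add(xs[j] + (scanY - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]));
        }
    }
    java.util.Collections.sort(crossings);           // x_1 < x_2 < ... (even number of values)
    for (int k = 0; k + 1 < crossings.size(); k += 2) {
        int xStart = (int) Math.ceil(crossings.get(k));
        int xEnd = (int) Math.floor(crossings.get(k + 1));
        for (int x = xStart; x <= xEnd; x++) {
            setFragment.accept(x, y);                // fill between pairs of intersections
        }
    }
}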

Fig. 7.22 Scan line technique for filling a polygon


Fig. 7.23 A scan line intersecting two vertices of a polygon

There are some special cases to take into account. This includes
clipping (see Chap. 8). All intersections of the scan line with the curve must
be calculated. However, the filling only has to be done for fragments within
the clipping area. Another special case occurs when the scan line intersects
the vertices of the polygon. Such cases need special treatment. In Fig. 7.23,
the scan line intersects two vertices of the polygon. At the first vertex, a
change from the exterior to the interior part of the polygon takes place. At
the second vertex, the scan line remains outside the polygon. The treatment
of vertices as intersection points requires considering the angles between
the scan line and the edges that are attached to the corresponding vertices.
Horizontal edges also need special treatment.
If a scan line technique is also used for visibility considerations, i.e., for
calculating which polygons are not occluded by others and can be seen by
the viewer and thus need to be displayed (see Sect. 8.2.4), then filling using
a scan line method can be combined with these calculations. Furthermore, it
is possible to incrementally calculate the associated data for the fragments
to be scanned from the respective previous values while passing through the
scan line. The incremental calculation for depth values (z-values) is
presented in Sect. 8.2.3. Furthermore, this method does not depend on the
presence of a frame buffer.
In its simple variant, a scan line technique is well suited for processing
on a single processor, for example, for realisation by a software renderer
running on a CPU. For the efficient use of this technique on multiprocessor
GPUs, however, (partly complex) extensions for parallel processing are
required. Approaches to this and runtime estimates can be found in [10, 16].
A simple rasterisation method for triangles that allows parallelisation for
multiple processors on the GPU is described in Sect. 7.5.3.
Drawing and filling a closed curve are usually considered two
independent operations in a scan line technique. Due to the limited
resolution of the grid, grid points can belong to the interior of the area as
well as to the edge, i.e., to the curve, especially if the polygon is to be
drawn by a thick line. Using antialiasing (see Sect. 7.6) also results in
overlaps between the edge and the interior of an area. To avoid gaps at the
edge when a curve is to be both drawn and filled, the antialiasing may start
at the edge at the earliest when filling the surface. When drawing very long
or thin curves, additional aliasing effects may occur during the filling
process. For example, a contiguous area may be filled by a non-contiguous
set of fragments, as shown in Fig. 7.24.

Fig. 7.24 Aliasing effect when filling a surface

7.5.3 Polygon Rasterisation Algorithm According to Pineda


An effective algorithm for filling or rasterising convex polygons was
proposed by Pineda [17]. Because this algorithm is easy to parallelise, it is
well suited for use in graphics processors that allow parallel processing with
multiple processors or processor cores. Moreover, the edge of the polygon
does not have to be rasterised beforehand by another method, such as a line
drawing method.
Figure 7.25 shows a triangle as an example of a polygon to be rasterised
in two-dimensional window coordinates. The dark-coloured fragments
represent the desired result of the rasterisation of the triangle contour. The
principle of Pineda’s algorithm is based on simple linear equations for the
edges of the polygon (edge function), based on which it is possible to
efficiently decide which fragments are inside or outside the polygon. To do
this, the parameters of these equations must be determined. Afterwards, the
grid points potentially belonging to the polygon must be traversed in order
to examine whether they belong to the polygon to be rasterised or not. In
principle, any traversing algorithm can be used for this purpose, as long as
it covers all the grid points potentially to be rasterised.

Fig. 7.25 A triangle to be rasterised with vertices $$P_0$$ , $$P_1$$ and $$P_2$$ in two-
dimensional window coordinates on a grid. The centres of the grid elements are marked by thick
dots. The dark-coloured fragments are the result of rasterising the interior of the contour. The light-
coloured fragments represent auxiliary fragments resulting from considering raster blocks of size
$$2 \times 2$$ . The normal vector (red coloured) of the edge from $$P_0$$ to $$P_1$$ is
shown nonnormalised and altered in length for clarity

For the representation of an edge of a polygon, the Hesse normal form can be used. Let $$\boldsymbol{p}_i$$ be a location vector to a point
$$P_i$$ on the line i and $$\boldsymbol{n}_{n, i}^T$$ be the
normalised normal vector to this line in $$\mathbb {R}^2$$. Then all
points $$Q_j$$ with location vector
$$\boldsymbol{q}_j = (x_j, y_j)^T$$ lie on this straight line if they
satisfy the following equation:
$$\begin{aligned} \boldsymbol{n}_{n, i}^T\, (\boldsymbol{q}_j - \boldsymbol{p}_i) = 0 \; \Leftrightarrow \; \boldsymbol{n}_{n,i}^T\, \boldsymbol{q}_j = \boldsymbol{n}_{n,i}^T\, \boldsymbol{p}_i. \end{aligned}$$ (7.12)
It should be noted that the scalar product is used to multiply the vectors.
Figure 7.25 shows the nonnormalised, truncated normal vector for the
straight line $$i = 0$$ represented by the vertices $$P_0$$ and
$$P_1$$. The normal vector used in the algorithm is calculated
according to Eq. (7.14) (see below).
Equation (7.12) has the useful property that
$$(\boldsymbol{n}_{n, i}^T\, \boldsymbol{p}_i)$$ represents the
distance of the straight line from the coordinate origin. From this, it follows
that for the points $$\boldsymbol{q}_j = (x_j, y_j)^T$$, which do not
lie on this straight line and therefore do not satisfy the equation, the
following can be established:
$$(\boldsymbol{n}_{n, i}^T\, \boldsymbol{q}_j) > (\boldsymbol{n}_{n, i}^T\, \boldsymbol{p}_i) \; \Leftrightarrow \; \boldsymbol{q}_j$$ lies on the side of the straight line in the direction of the normal vector $$\boldsymbol{n}_{n, i}$$.
$$(\boldsymbol{n}_{n, i}^T\, \boldsymbol{q}_j) < (\boldsymbol{n}_{n, i}^T\, \boldsymbol{p}_i) \; \Leftrightarrow \; \boldsymbol{q}_j$$ lies on the side of the straight line against the direction of the normal vector $$\boldsymbol{n}_{n, i}$$.
If the normal vectors of all edges of the polygon under consideration are
set to point inwards, as for the edge from $$P_0$$ to $$P_1$$
in Fig. 7.25, then Eq. (7.12) can be extended to the following decision
functions:
$$\begin{aligned} e_i (\boldsymbol{q}_j) \; = \; \boldsymbol{n}_i^T (\boldsymbol{q}_j - \boldsymbol{p}_i) \left\{ \begin{array}{ll} = 0 &{} \text{ if } \boldsymbol{q}_j \text{ lies on the edge } i \\ > 0 &{} \text{ if } \boldsymbol{q}_j \text{ lies inside relative to edge } i \\ < 0 &{} \text{ if } \boldsymbol{q}_j \text{ lies outside relative to edge } i. \\ \end{array} \right. \end{aligned}$$ (7.13)
If, according to the decision functions $$e_i$$, the considered grid point $$\boldsymbol{q}_j$$ lies inside relative to all edges i with $$i = 0, 1, ..., (N-1)$$ of an N-sided polygon, then this point is a fragment of the rasterised polygon. Since only a sign test is needed when
using the decision functions, the length of the normal vectors is irrelevant.
Therefore, Eq. (7.13) uses the nonnormalised normal vectors
$$\boldsymbol{n}_i $$. Omitting the normalisation saves computing
time.
In computer graphics, it is a common convention to arrange the vertices
on the visible side of a polygon in a counter-clockwise order. If this order is
applied to the convex N-sided polygon to be rasterised, then the
nonnormalised normal vectors can be determined by rotating the
displacement vectors between two consecutive vertices by
$$90^{\circ }$$ counter-clockwise. The nonnormalised normal vector
for the edge i ( $$i = 0, 1, ..., (N-1) $$) through points $$P_i$$
and $$P_{i+1}$$ can be determined using the matrix for a rotation by
$$90^{\circ }$$ as follows:
$$\begin{aligned} \boldsymbol{n}_i = \left( \begin{array}{cc} 0 &{} -1 \\ 1 &{} 0 \end{array}\right) (\boldsymbol{p}_{i+1} - \boldsymbol{p}_i). \end{aligned}$$ (7.14)
To obtain a closed polygon,
$$\boldsymbol{p}_{N} = \boldsymbol{p}_0$$ is set. This captures
all N edges of the N-sided polygon and all normal vectors point in the
direction of the interior of this polygon. Alternatively, it is possible to work
with a clockwise orientation of the vertices. In this case, the calculations
would change accordingly.
Besides the simplicity of the test if a fragment belongs to the polygon to
be rasterised, Eq. (7.13) has a useful locality property. For this, consider the
edge $$i = 0$$ of a polygon through the points $$P_0$$ and
$$P_1$$ with their position vectors
$$\boldsymbol{p}_0 = (p_{0x}, p_{0y})^T$$ and
$$\boldsymbol{p}_1 = (p_{1x}, p_{1y})^T$$. Using Eq. (7.14), the
associated nonnormalised normal vector yields as follows:
$$\begin{aligned} \boldsymbol{n}_0 = \left( \begin{array}{cc} 0 &{} -1 \\ 1 &{} 0 \end{array} \right) \left( \begin{array}{c} p_{1x} - p_{0x} \\ p_{1y} - p_{0y} \\ \end{array} \right) \; = \; \left( \begin{array}{c} -(p_{1y} - p_{0y}) \\ p_{1x} - p_{0x} \\ \end{array} \right) . \end{aligned}$$ (7.15)
For a location vector $$\boldsymbol{q}_j = (x_j, y_j)^T$$ to a grid
point $$Q_j$$ for which it is to be decided whether it lies within the
polygon, inserting Eq. (7.15) into Eq. (7.13) yields the following decision
function:
$$\begin{aligned} e_0 (x_j, y_j) \; = \; e_0 (\boldsymbol{q}_j) \; = \; - (p_{1y} - p_{0y}) (x_j - p_{0x}) + (p_{1x} - p_{0x}) (y_j - p_{0y}) \end{aligned}$$
$$\begin{aligned} \; = \; - (p_{1y} - p_{0y}) x_j + (p_{1x} - p_{0x}) y_j + (p_{1y} - p_{0y}) p_{0x} - (p_{1x} - p_{0x}) p_{0y} \end{aligned}$$

$$\begin{aligned} \; = \; a_0 x_j + b_0 y_j + c_0. \end{aligned}$$ (7.16)


Let the grid for the rasterisation and thus a location vector
$$\boldsymbol{q}_j = (x_j, y_j)^T$$ be described in integer
coordinates. Furthermore, the decision function
$$ e_0 (\boldsymbol{q}_j) = e_0 (x_j, y_j) $$ has already been
evaluated for the grid point in question. Then the value of the decision
function $$e_0$$ for the grid point $$(x_j + 1, y_j)$$, which
lies one (integer) grid coordinate further along the x-coordinate next to the
already evaluated grid point, results from Eq. (7.16) as follows:
$$\begin{aligned} e_0 (x_j + 1, y_j) \; = \; a_0 (x_j + 1) + b_0 y_j + c_0 \end{aligned}$$
$$\begin{aligned} \; = \; a_0 x_j + b_0 y_j + c_0 + a_0 = e_0 (x_j, y_j) + a_0. \end{aligned}$$ (7.17)
This means that the value of the decision function for the next grid point in
the x-direction differs only by adding a constant value (during the
rasterisation process of a polygon). Similar considerations can be made for
the negative x-direction and the y-directions and applied to all decision
functions. This locality property makes it very efficient to incrementally
determine the values of the decision functions from the previous values
when traversing an area to be rasterised.

Fig. 7.26 Traversing the bounding box (thick frame) of a triangle to be rasterised on a zigzag path
(arrows). The starting point is at the top left

In Fig. 7.26, a zigzag path can be seen through the bounding box of a
triangle to be rasterised. This is a simple approach to ensure that all points
of the grid potentially belonging to the triangle are covered. At the same
time, the locality property from Eq. (7.17) can be used for the efficient
incremental calculation of the decision functions. Depending on the shape
and position of the triangle, the bounding box is very large and many grid
points outside the triangle have to be examined. Optimisations for the
traversal algorithm are already available from the original work by Pineda
[17].
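As an illustration, the following Java sketch rasterises a single triangle by evaluating the three edge functions at each grid centre within the bounding box. The vertices are expected in counter-clockwise order; the vertex arrays and setFragment are assumptions of the sketch, and the incremental update from Eq. (7.17), the top-left rule and the calculation of associated data are omitted:

static void rasteriseTriangle(double[] px, double[] py,
                              java.util.function.BiConsumer<Integer, Integer> setFragment) {
    int xMin = (int) Math.floor(Math.min(px[0], Math.min(px[1], px[2])));
    int xMax = (int) Math.ceil(Math.max(px[0], Math.max(px[1], px[2])));
    int yMin = (int) Math.floor(Math.min(py[0], Math.min(py[1], py[2])));
    int yMax = (int) Math.ceil(Math.max(py[0], Math.max(py[1], py[2])));
    for (int y = yMin; y <= yMax; y++) {
        for (int x = xMin; x <= xMax; x++) {
            double qx = x + 0.5;                     // grid centre to be tested
            double qy = y + 0.5;
            boolean inside = true;
            for (int i = 0; i < 3 && inside; i++) {
                int k = (i + 1) % 3;
                // Inward normal of edge i according to Eq. (7.14)
                double a = -(py[k] - py[i]);
                double b = px[k] - px[i];
                // Edge function e_i(q) = n_i . (q - p_i) according to Eq. (7.13)
                double e = a * (qx - px[i]) + b * (qy - py[i]);
                if (e < 0) {
                    inside = false;                  // outside relative to edge i
                }
            }
            if (inside) {
                setFragment.accept(x, y);
            }
        }
    }
}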
Besides this obvious approach, a hierarchical traversal of areas to be
rasterised can be used. Here, the raster to be examined is divided into tiles,
which consist of $$16 \times 16$$ raster points, for example. Using
an efficient algorithm (see [2, pp. 970–971]), a test of a single grid point of
a tile can be used with the help of the decision functions to decide whether
the tile must be examined further or whether the entire tile lies outside the
triangle to be rasterised. If the tile must be examined, it can again be
divided into smaller tiles of, for example, size $$4 \times 4$$, which
are examined according to the same scheme. Only for the tiles of the most
detailed hierarchy level must an examination take place for all grid points,
for example on a zigzag path. Traversing through the tiles and examining
them at each of the hierarchy levels can take place either sequentially on a
zigzag path or by processors of a GPU working in parallel. At each level of
the hierarchy, the locality property of the decision functions (Eq. (7.17)) for
traversing from one tile to the next by choosing the step size accordingly
can also be used. For example, on a stage with tiles of
$$16 \times 16$$ grid points in x-direction, the value
$$(16 \cdot a_0)$$ has to be added to the value of the decision
function $$e_0$$ for one tile to get the value for the following tile.
This enables the very effective use of the parallel hardware of GPUs.
In Fig. 7.25, next to the fragments belonging to the rasterised triangle, further grid points are coloured brightly; these are called auxiliary fragments. The
additional use of these auxiliary fragments results in tiles of size
$$2 \times 2$$, each containing at least one fragment inside the
triangle under consideration. The data from these tiles can be used for the
approximation of derivatives, which are used for the calculation of textures.
Furthermore, the processing using hierarchical tiles is useful for the use
of mipmaps. Chapter 10 presents details on the processing of textures and
mipmaps.
In this context, it is important to note that in addition to deciding which
raster point is a fragment of the triangle to be rasterised, further data must
be calculated (by interpolation) for each of these fragments. This includes,
for example, depth values, colour values, fog parameters and texture values
(see Sect. 7.5.4). This data must be cached if it is derived from the data of
the environment of a fragment, which is often the case. Even if the tiles are
traversed sequentially, hierarchical traversal has the advantage that locally
adjacent data is potentially still in a memory buffer for fast access, i.e., a
cache, which can be (repeatedly) accessed efficiently. If the entire raster is
processed line by line according to a scan line technique, then the individual
scan lines might be so long in x-direction, for example, that required
neighbouring data are no longer in the cache when the next scan line is
processed for the next y-step. In this case, the capacity of the cache memory
might have been exhausted in the meantime. If this happens, the data must
either be recalculated or loaded from the main memory, both of which have
a negative effect on processing performance.
To solve the special case where a grid centre lies exactly on the edges of
two directly adjacent triangles, the top-left rule is often used. This rule is
applied to make an efficient decision for the assignment to only one
triangle. Here, the grid point is assigned exactly to the triangle where the
grid centre lies on an upper or a left edge. According to this rule, an upper
edge is defined as an edge that is exactly horizontal and located on the grid
above the other two edges. A left edge is defined as a nonhorizontal edge
located on the left side of the triangle. Thus, either one left edge or two left
edges belong to the triangle under consideration. These edge properties can
be efficiently determined from the coefficients $$a_0$$ and $$b_0$$ of the edge function in Eq. (7.16). According to this, an edge is an upper edge if $$(a_0 = 0)$$ and $$(b_0 < 0)$$ holds, or a left edge if $$(a_0 \ne 0)$$ and $$(b_0 > 0)$$ holds.
The complete test of whether a grid point is a fragment of the triangle to
be rasterised is sometimes referred to in the literature as inside test ([2, pp.
996–997]).
The Pineda algorithm can be used for the rasterisation of convex
polygons with N edges. The edges of a clipping area can be taken into
account and thus clipping can be performed simultaneously with the
rasterisation of polygons (cf. [17]). Furthermore, by using edge functions of
higher order, an extension to curved edges of areas is possible (cf. [17]). On
the other hand, any curved shape can be approximated by a—possibly large
—number of lines or triangles, which is often used in computer graphics.
Since every polygon can be represented by a decomposition of triangles
and a triangle is always convex, a triangle can be used as the preferred
geometric primitive. Processing on GPUs can be and is often optimised for
triangles. For the rasterisation of a triangle, the edge equations (7.13) for
$$N = 3$$ are to be used. A line can be considered as a long
rectangle, which—in the narrowest case—is only one fragment wide. Such
a rectangle can either be composed of two triangles or rasterised using the
decision functions (7.13) for $$N = 4$$. A point can also be
represented in this way as a small square, which in the smallest case
consists of a single fragment. This makes it possible to use the same
(parallelised) graphics hardware for all three necessary types of graphical
primitives.

7.5.4 Interpolation of Associated Data


In addition to deciding which raster point represents a fragment of the
triangle to be rasterised, whereby each fragment is assigned a position in
window coordinates, further data must be calculated for each of these
fragments. These data are, for example, depth values, colour values, normal
vectors, texture coordinates and parameters for the generation of fog effects.
This additional data is called associated data. Each vertex from the
previous pipeline stages is (potentially) assigned associated data in addition
to the position coordinates. In the course of rasterisation, the associated
fragment data must be obtained by interpolating this associated vertex data.

Fig. 7.27 Visualisation of the definition of barycentric coordinates in a triangle with vertices
$$P_0$$ , $$P_1$$ and $$P_2$$ . The areas $$A_0$$ , $$A_1$$ and $$A_2$$ ,
which are defined by subtriangles, each lie opposite a corresponding vertex. The dashed line h
represents the height of the subtriangle with the vertices $$P_0$$ , $$P_1$$ and $$Q_j$$

For such an interpolation, the barycentric coordinates (u, v, w) within a triangle can be used. Figure 7.27 shows a triangle with vertices
$$P_0$$, $$P_1$$ and $$P_2$$. Based on the location
vectors (position coordinates)
$$\boldsymbol{p}_0 = (p_{0x}, p_{0y})^T$$,
$$\boldsymbol{p}_1 = (p_{1x}, p_{1y})^T$$ and
$$\boldsymbol{p}_2 = (p_{2x}, p_{2y})^T$$ to these points, a
location vector (position coordinates)
$$\boldsymbol{q}_j = (q_{jx}, q_{jy})^T$$ can be determined for a
point $$Q_j$$ within the triangle or on its edge as follows:
$$\begin{aligned} \boldsymbol{q}_j \; = \; u \; \boldsymbol{p}_0 + v \;
\boldsymbol{p}_1 + w \; \boldsymbol{p}_2. \end{aligned}$$
The real numbers of the triple $$(u, v, w) \in \mathbb {R}^3$$ are
the barycentric coordinates of the point $$Q_j$$. If
$$u + v + w = 1$$ holds, then $$u, v, w \in [0, 1]$$ and (u, v, w)
are the normalised barycentric coordinates of the point $$Q_j$$. In
the following considerations of this section, normalised barycentric
coordinates are assumed. As shown below, (u, v, w) can be used to calculate
the interpolated associated data for a fragment.
The barycentric coordinates of a triangle are defined by the areas
$$A_0$$, $$A_1$$ and $$A_2$$ of the triangles, which
are respectively opposite the corresponding points $$P_0$$,
$$P_1$$ and $$P_2$$ (see Fig. 7.27) as follows (see also [2, p.
998]):
$$\begin{aligned} u = \frac{A_0}{A_g} \quad v = \frac{A_1}{A_g} \quad w = \frac{A_2}{A_g} \quad \text{ with } A_g = A_0 + A_1 + A_2. \end{aligned}$$ (7.18)
As will be shown in the following, the surface area of the triangles can be
determined from the edge functions, which can also be used for Pineda’s
algorithm for the rasterisation of polygons (see Sect. 7.5.3). For this
purpose, consider the edge function (7.13) for $$i = 0$$, i.e., for the
line from the point $$P_0$$ to the point $$P_1$$:
$$\begin{aligned} e_0 (\boldsymbol{q}_j) \; = e_0 (x_j, y_j) \; =
\boldsymbol{n}_0^T (\boldsymbol{q}_j - \boldsymbol{p}_0).
\end{aligned}$$
The definition of the scalar product between two vectors gives the
following equation:
$$\begin{aligned} e_0 (\boldsymbol{q}_j) \; = \Vert \boldsymbol{n}_0^T\Vert \cdot \Vert (\boldsymbol{q}_j - \boldsymbol{p}_0)\Vert \cdot \cos (\alpha ). \end{aligned}$$ (7.19)
The angle $$\alpha $$ is the angle between the two vectors of the
scalar product. The first term $$\Vert \boldsymbol{n}_0^T\Vert $$ in
the formula is the length (magnitude) of the nonnormalised normal vector.
This vector results from a rotation by $$90^{\circ }$$ of the vector
from $$P_0$$ to $$P_1$$ (see Eq. (7.14)). Since the lengths of
the non-rotated and the rotated vectors are identical, the following holds for
the base b of the triangle $$A_2$$:
$$\begin{aligned} b = \Vert \boldsymbol{n}_0^T\Vert = \Vert \boldsymbol{p}_1 - \boldsymbol{p}_0 \Vert . \end{aligned}$$
The remaining term from Eq. (7.19) can be interpreted as the projection of
the vector from $$P_0$$ to $$Q_j$$ onto the normal vector.
This projection represents exactly the height h of the triangle
$$A_2$$, so it holds the following:
$$\begin{aligned} h = \Vert (\boldsymbol{q}_j - \boldsymbol{p}_0)\Vert
\cdot \cos (\alpha ). \end{aligned}$$
Since the area of a triangle is half of the base side multiplied by the height
of the triangle, the area can be determined from the edge function as
follows:
$$\begin{aligned} A_2 = \frac{1}{2} \; b \; h = \frac{1}{2} \; e_0
(\boldsymbol{q_j}). \end{aligned}$$
From the definition of the barycentric coordinates for triangles (Eq. (7.18)),
the following calculation rule for (u, v, w) results:
$$\begin{aligned} u = \frac{1}{e_g} \; e_1 (x_j, y_j) \quad v = \frac{1}{e_g} \; e_2 (x_j, y_j) \quad w = \frac{1}{e_g} \; e_0 (x_j, y_j) \end{aligned}$$ (7.20)

$$\begin{aligned} \text{ with } e_g = e_0 (x_j, y_j) + e_1 (x_j, y_j) + e_2 (x_j, y_j). \end{aligned}$$
The factor 1/2 cancels out. Alternatively, the
barycentric w-coordinate can be determined by $$w = 1 - u - v$$. To
increase efficiency, $$e_g = 2\, A_g$$ can be precomputed and
reused for a triangle to be rasterised, since the area of the triangle does not
change. The barycentric coordinates for a fragment can thus be determined
very quickly if Pineda’s algorithm is used for the rasterisation of a triangle
and the above-described edge functions are applied for the inside test.
The barycentric coordinates according to Eq. (7.20) can be used for
interpolation for depth values and in the case of a parallel projection.
However, they do not always yield the correct interpolation results for
another type of projection or other associated data, if the position
coordinates of the underlying vertices of the points $$P_0$$,
$$P_1$$, $$P_2$$ in four-dimensional homogeneous
coordinates do not have 1 as the fourth coordinate after all transformations.
The normalisation of the homogeneous coordinates to the value 1 in the
fourth component of the position coordinates of a vertex takes place by the
perspective division, whereby a division of the first three position
coordinates by the fourth component takes place (see Sect. 5.41).
If this normalisation has not taken place, it can be done for each
fragment. For further explanations and references to the derivation, please
refer to [2, pp. 999–1001]. In the following, only the results of this
derivation are presented. With $$w_0$$, $$w_1$$ and
$$w_2$$ as the fourth components (homogeneous coordinates) of the
vertices of the points $$P_0$$, $$P_1$$ and $$P_2$$
after all geometric transformation steps, the perspective correct barycentric
coordinates $$(\tilde{u}, \tilde{v}, \tilde{w})$$ result as follows:

$$\begin{aligned} \tilde{u} = \frac{u/w_0}{u/w_0 + v/w_1 + w/w_2} \quad \tilde{v} = \frac{v/w_1}{u/w_0 + v/w_1 + w/w_2} \quad \tilde{w} = \frac{w/w_2}{u/w_0 + v/w_1 + w/w_2}. \end{aligned}$$ (7.21)

As can be seen from Eq. (7.21), the common denominator $$u/w_0 + v/w_1 + w/w_2$$ must be determined anew for each fragment, while $$e_g$$ from Eq. (7.20) is constant for all fragments per triangle.
Let $$f_0$$, $$f_1$$ and $$f_2$$ be associated data at the points $$P_0$$, $$P_1$$ and $$P_2$$ assigned to the respective vertices at these points. Then the interpolated value $$f_j$$ of the associated datum at the point $$Q_j$$ can be determined using the perspective correct barycentric coordinates as follows:
$$\begin{aligned} f_j \; = \; \tilde{u} \; f_0 + \tilde{v} \; f_1 + \tilde{w} \; f_2. \end{aligned}$$ (7.22)
These data are, for example, colour values, normal vectors, texture coordinates and parameters for fog effects.
As mentioned above, the depth values $$z_0$$, $$z_1$$ and $$z_2$$ at the respective points $$P_0$$, $$P_1$$ and $$P_2$$ can be interpolated using the barycentric coordinates (u, v, w) from Eq. (7.20). The division by the fourth (homogeneous) component should already have taken place per vertex (perspective division). Then the following calculation rule for the interpolated depth value $$z_j$$ can be used:
$$\begin{aligned} z_j \; = \; u \; z_0 + v \; z_1 + w \; z_2. \end{aligned}$$ (7.23)
For simple rasterisation of polygons, the OpenGL specification [21] defines
the interpolation of depth values by Eq. (7.23) and for all other associated
data the interpolation by Eq. (7.22).
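The interpolation rules described above can be summarised in a small Java sketch. It assumes that the screen space barycentric coordinates (u, v, w) of a fragment and the fourth (homogeneous) components $$w_0$$, $$w_1$$, $$w_2$$ of the three vertices are already known; the method and parameter names are illustrative:

static double interpolatePerspective(double u, double v, double w,
                                     double w0, double w1, double w2,
                                     double f0, double f1, double f2) {
    // Common denominator, which has to be recomputed for every fragment
    double d = u / w0 + v / w1 + w / w2;
    // Perspective correct barycentric coordinates, cf. Eq. (7.21)
    double uT = (u / w0) / d;
    double vT = (v / w1) / d;
    double wT = (w / w2) / d;
    // Interpolated associated datum, cf. Eq. (7.22)
    return uT * f0 + vT * f1 + wT * f2;
}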

7.5.5 Rasterising and Filling Polygons in the OpenGL


The rasterisation and filling of polygons in the OpenGL are possible, for
example, with the help of a scan line technique (see Sect. 7.5.2) or the
Pineda algorithm (see Sect. 7.5.3). The OpenGL specification does not
prescribe a special procedure for this.
In the core profile, all polygons to be rendered must be composed of
planar triangles, because in this profile only separate planar triangles
(GL_TRIANGLES) or sequences of triangles (GL_TRIANGLE_STRIP or
GL_TRIANGLE_FAN) are supported as polygon shapes (see Sect. 3.2). In
the compatibility profile, there is a special geometric primitive for
polygons. It must be possible to calculate the data of the fragments
belonging to a polygon from a convex combination (a linear combination)
of the data of the vertices at the polygon corners. This is possible, for
example, by triangulating the polygon into triangles without adding or
removing vertices, which are then rasterised (cf. [20]). In OpenGL
implementations for the compatibility profile that rely on the functions of
the core profile, triangle fans (GL_TRIANGLE_FAN) are preferred for this
purpose.
In the OpenGL specification, point sampling is specified as a rule for
determining which fragment is generated for a polygon in the rasterisation
process. For this, a two-dimensional projection of the vertex position
coordinates is performed by selecting the x- and y-values of the position
coordinates of the polygon vertices. In the case that two edges of a polygon
are exactly adjacent to each other and a grid centre lies exactly on the edge,
then one and only one fragment may be generated from this [21].
For the interpolation of associated data (to the vertices) within a
triangle, i.e., for the filling of the triangle, the OpenGL specification defines
the use of barycentric coordinates (see Sect. 7.5.4). If an interpolation is to
take place, Eq. (7.22) is to be used for most data to achieve a perspective
correct interpolation. Only for depth values Eq. (7.23) must be used. It has
to be taken into account that the barycentric coordinates (u, v, w) or $$(\tilde{u}, \tilde{v}, \tilde{w})$$ used in these equations have to be computed exactly for the coordinates of
the grid centres. The data for the fragment to be generated must therefore be
obtained by sampling the data at the centre of the raster element [21].
By default, the values of the shader output variables are interpolated by
the rasterisation stage according to Eq. (7.22). By using the keyword
noperspective before the declaration of a fragment shader input
variable, Eq. (7.23) is set to be used for the interpolation. This equation
shall be used for interpolating the depth values. If the keyword flat is
specified for a fragment shader input variable, then no interpolation takes
place. In this case, the values of the associated data from a defined vertex of
the polygon—the provoking vertex—is taken without any further processing
(and without an interpolation) as the associated data of all fragments of the
polygon in question (see Sect. 9). Using the keyword smooth for a
fragment shader input variable results in an interpolation of the values as in
the default setting (see above).3
The default setting option (smooth shading) allows the colour values of
a polygon to be interpolated using barycentric coordinates within a triangle,
which is required for Gouraud shading. Using this interpolation for normal
vectors (as associated data) realises Phong shading. By switching off
interpolation, flat shading can be achieved. Even without the use of
shaders, flat shading can be set in the compatibility profile by the command
glShadeModel(gl.GL_FLAT). The default setting is interpolation,
which can be activated by glShadeModel(gl.GL_SMOOTH) and
results in Gouraud shading. This allows the rasterisation stage of the
OpenGL pipeline to be used effectively for the realisation of the shading
methods frequently used in computer graphics. Section 9.4 explains these
shading methods in more detail. Section 3.2.4 describes how the command
glPolygonMode can be used to determine whether a polygon or a
triangle is filled, or whether only the edges or only points at the positions of
the vertices are drawn.

7.6 Aliasing Effect and Antialiasing


The term aliasing effect or aliasing describes signal processing phenomena
that occur when sampling a (continuous) signal if the sampling frequency $$f_s$$, also called sampling rate, is not at least twice the highest frequency $$f_{max}$$ occurring in the signal being sampled. This relationship can be described as a formula in the sampling theorem as follows:
$$\begin{aligned} f_s \; \ge \; 2 \, f_{max}. \end{aligned}$$ (7.24)
Here, the frequency $$f_s/2$$ is called Nyquist frequency or folding frequency.
Sampling usually converts a continuous signal (of the natural world) into a
discrete signal. It is also possible to sample a discrete signal, creating a new
discrete signal.
In computer graphics, such a conversion takes place during
rasterisation, in which a vector graphic is converted into a raster graphic.
For objects described as a vector graphic, colour values (or grey values) can
be determined for any continuous location coordinate. However, this only
applies within the framework of a vectorial representation. If, for example,
a sphere has been approximated by a certain number of vertices of planar
triangles, then colour values at arbitrary coordinates can only be determined
for these approximating triangles and not for the original sphere. Strictly
speaking, in this example sampling of the sphere surface has already taken
place when modelling the sphere through the triangles. In this case, the
signal is already discrete. If textures, which are mostly available as raster
graphics, have to be (under- or over-) sampled, there is also no continuous
signal available that can be sampled. In these cases, a discrete signal is
sampled, resulting in another discrete signal. Due to the necessary
approximations, (further) undesirable effects may arise.
The rasterisation process samples the colour values of the available
signals at the discrete points of the raster (usually with integer coordinates)
onto which the scene is mapped for output to the frame buffer and later on a
screen. In the scene (in the source and target signal), there are periodic
signal components that are recognisable, for example, by repeating patterns.
The frequencies at which these signal components repeat are called spatial
frequencies. If a measurable dimension of the image is available, a spatial
frequency can be specified in units of $$1/\text{m}$$ (one divided by metres) or dpi
(dots per inch). The sampling (spatial) frequency results from the distances
between the points (or lines) of the raster. The shorter the distance between
the dots, the higher the sampling frequency.
A special feature of signals in computer graphics is their artificial origin
and the resulting ideal sharp edges, which do not occur in signals of the
natural world. From Fourier analysis, it is known that every signal can be
represented uniquely from a superposition of sine and cosine functions of
different frequencies, which in this context are called basis functions. Under
certain circumstances, an infinite number of these basis functions are
necessary for an exact representation. An ideal sharp edge in a computer
graphics scene represents an infinitely fast transition, for example, from
black to white. In other words, an infinitely high spatial frequency occurs at
the edge. Since the sampling frequency for an error-free sampling of this
spatial frequency must be at least twice as large (see Eq. (7.24)), the grid for
sampling would have to be infinitely dense. It is therefore impossible in
principle (except for special cases) to convert the ideal sharp edges of a
computer graphic into a raster graphic without error. There will always be
interferences due to approximations, which must be minimised as much as
possible.
In this book, the theory and relationships in signal and image processing
are presented in a highly simplified manner and are intended solely for an
easy understanding of the basic problems and solution approaches in
computer graphics. Detailed insights into signal processing and Fourier
analysis in image processing can be found for example in [8].

Fig. 7.28 Aliasing effect when rasterising dashed lines with different spatial frequencies

7.6.1 Examples of the Aliasing Effect


In Fig. 7.28, two dashed thin lines are shown greatly magnified, which are
discretely sampled by the drawn grid. For this example, the ideal sharp
edges through the broken lines shall be neglected and only the spatial
frequencies due to the dashing shall be considered. In the middle line, the
dashes follow each other twice as often as in the upper line. Due to this
dashing, the upper line contains a spatial frequency half as high as that of the middle line. The filling of the squares in the grid indicates in the figure
which fragments are created by rasterising these lines. For the upper line,
the dashing is largely correctly reproduced in the raster graphic. For the
middle line, however, the resolution of the raster is not sufficient to
reproduce the spatial frequency of the dashing in the raster graphic. With
this aliasing effect, fragments are generated at the same raster positions as
with the rasterisation of a continuous line, which is drawn in the figure
below for comparison. In a continuous line, there are no periodic changes
(dash present–dash absent), so the spatial frequency contained in this line is
zero with respect to the dashing.
For the example of the upper line and by choosing this grid resolution,
which determines the sampling frequency, the sampling theorem is fulfilled.
The sampling frequency is exactly twice as large as the spatial frequency
due to the dashing. The sampling theorem is not fulfilled for the middle
line. The sampling frequency is equal to the spatial frequency through the
dashing, which causes the described aliasing effect. Furthermore, it can be
seen in this example that the result of the rasterisation only approximates
the upper line, since the lengths of the dashes of the dashing do not
correspond to the size of a raster element (fragment).
Fig. 7.29 Moiré pattern as an example of an aliasing effect

Figure 7.29 shows another example of an aliasing effect. For this graphic, two identical grids were placed exactly on top of each other and
then one of these grids was rotated by 4.5 degrees (counter-clockwise).
When viewing the image, a (visual) sampling of the rear grid by the front
grid takes place. Since both rasters are identical, they have identical
resolutions and thus contain identical maximum spatial frequencies. As a
consequence, the sampling theorem is violated because the maximum
spatial frequency of the signal to be sampled (the rear grid) is equal to the
sampling frequency (determined by the front grid). In this case, the
sampling frequency is not twice as high, i.e., the sampling raster is not
twice as narrow. This causes an aliasing effect, which in this example is
expressed by the perceptible rectangular patterns that are not present in the
original rasters. Such patterns are referred to as Moiré patterns. They occur
on output devices in which a raster is used whenever the sampling theorem
is violated. The rectangular patterns observed in the Moiré pattern in this
example follow each other more slowly than the lines of the raster when
these sequences of patterns and lines are considered in the direction of a
coordinate axis. The new patterns thus have a lower spatial frequency than
the spatial frequency generated by the periodic succession of the grid lines.
This aliasing effect creates a low spatial frequency component in the
sampled signal (the result of the sampling process) that is not present in the
original signal.

Fig. 7.30 Examples of stairs in lines drawn by an OpenGL renderer: On the right is a cropped
magnification of the lines on the left

Figure 7.30 shows lines drawn by an OpenGL renderer.4 In the magnification, the staircase-like progression is clearly visible, which is due
to the finite resolution of the grid for rasterisation. Such a shape for the
oblique lines results from the fact that a fragment can only be assigned one
grid position out of a finite given set. In each of the right oblique lines, a
regular pattern with a lower spatial frequency compared to the spatial
frequency of the raster can be seen, which is not provided by the line in
vector graphic representation. In the literature, the staircase effect is often
counted among the effects due to aliasing. Watt distinguishes this effect
from the classic aliasing effects, where high spatial frequencies that are too
high for sampling at the given sampling frequency erroneously appear as
low spatial frequencies in the sampled signal (see [23, p. 440f]), as can be
seen in the example of the Moiré pattern (see above).
Staircases like the one in Fig. 7.30, sometimes referred to as frayed
representations, are also found in rasterised points and at the edges of
polygons. When objects, usually represented by geometric primitives, are
animated, the resulting movement adds temporal sampling at discrete points
in time, which can lead to temporal aliasing effects. Together with the
spatial aliasing effects at the edges of the objects, these effects can be very
disturbing in computer graphics.

7.6.2 Antialiasing
As is known from signal processing and can be derived from the sampling
theorem (7.24), there are two ways to avoid or reduce aliasing effects.
Either the sampling frequency is increased or the (high) spatial frequencies contained in the signal are reduced, so that the sampling frequency is at least twice as high as the maximum spatial frequency occurring in the signal and thus
the sampling theorem holds. To reduce the maximum spatial frequency, the
signal must be filtered with a filter that allows only low spatial frequencies
to pass and suppresses high frequencies. A filter with these properties is
called a low-pass filter.
In computer graphics, the resolution of the raster determines the
sampling frequency (see Sect. 7.6). Using a fine raster instead of a coarse
raster for rasterisation results in an increase of the sampling frequency,
which is called upsampling. The resolution of the frame buffer determines
the target resolution for the output of the graphics pipeline. However, the
grid resolution for the steps in the graphics pipeline from rasterisation to the
frame buffer can be increased. Before the signal (the rendered scene) is
output to the frame buffer, the increased grid resolution must be reduced
again to the resolution of the frame buffer. In signal processing, this step of
reducing the resolution is called downsampling and usually consists of low-
pass filtering5 with a subsequent decrease of the resolution (decimation).
Since this filtering takes place as the last step after processing the
fragments, this antialiasing method is called post-filtering.
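As a simple illustration of the downsampling step, the following Java sketch averages blocks of $$k \times k$$ samples of a supersampled greyscale image down to the frame buffer resolution (a box filter as a very simple low-pass filter). The array layout and the names are assumptions of the sketch and do not describe the OpenGL-internal procedure:

static float[] downsample(float[] samples, int widthPx, int heightPx, int k) {
    float[] pixels = new float[widthPx * heightPx];
    int sampleWidth = widthPx * k;                   // width of the supersampled image
    for (int y = 0; y < heightPx; y++) {
        for (int x = 0; x < widthPx; x++) {
            float sum = 0.0f;
            for (int sy = 0; sy < k; sy++) {
                for (int sx = 0; sx < k; sx++) {
                    sum += samples[(y * k + sy) * sampleWidth + (x * k + sx)];
                }
            }
            pixels[y * widthPx + x] = sum / (k * k); // average of the k*k samples
        }
    }
    return pixels;
}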
Fig. 7.31 Pre-filtering (left) and post-filtering (right) in the OpenGL graphics pipeline: The part of
the graphics pipeline after vertex processing is shown. This representation is valid for both the fixed-
function pipeline and the programmable pipeline

The right side of Fig. 7.31 shows the steps for post-filtering within the
OpenGL graphics pipeline. As part of the rasterisation, an increase in the
resolution of the grid (compared to the resolution of the frame buffer) takes
place through a so-called multisampling of the scene. In OpenGL, the
special procedure multisample antialiasing is used for post-filtering, which
is explained in more detail in Sects. 7.6.6 and 7.6.8. An essential step of this
process is to increase the sampling rate of the scene by using K samples per
pixel of the frame buffer resolution. The reduction is the final step in
processing the fragments by the per-fragment operations, in which the scene
is converted back to the frame buffer resolution by downsampling.
Section 7.6.5 describes the general post-filtering procedure. Section 7.6.6
contains detailed explanations of some frequently used algorithms,
especially the special procedure multisample antialiasing.
In the pre-filtering antialiasing method, low-pass filtering takes place
during the generation of the fragments by rasterisation (see the left side in
Fig. 7.31). If the original signal—for example, a line to be drawn—is
described analytically or as a vector graphic, then the signal—from the
point of view of signal processing—is present in an infinitely high
resolution. Low-pass filtering as part of pre-filtering suppresses high spatial
frequencies in order to reduce the resolution of the signal to the resolution
of the frame buffer. All further steps of the graphics pipeline are performed
with the (non-increased) resolution of the frame buffer. Section 7.6.3
contains more details of this method.
Since in the pre-filtering method the signals can be analytically
described and thus there is an infinitely high resolution, pre-filtering is also
called area sampling. In contrast, post-filtering is also called point sampling
to express that signals can also be processed that have already been sampled
(discrete signals).
The procedures for drawing lines, shown in Sect. 7.3, determine which
fragments are to be placed at which points of the grid. Thus, only whether a
fragment is present or not is considered. If the associated data are
disregarded for a moment, then a fragment is coloured either black or white.
In the general case, the lines thus generated will have a staircase shape or a
frayed appearance. Furthermore, they will have sharp edges at the
transitions between a set and an unset fragment, and thus infinitely high
spatial frequencies.
Fig. 7.32 Smoothed stair steps in lines drawn by an OpenGL renderer using post-filtering: On the
right is a cropped magnification of the lines on the left

The scan line technique for drawing polygons described in Sect. 7.5.2
fills a polygon whose edge has already been drawn by lines. The algorithm
according to Pineda (see Sect. 7.5.3) allows polygons to be filled without
drawing an edge beforehand. However, the same effects occur at the edges
of the polygons as when drawing lines. Thus, these methods for drawing
polygons also result in staircase-like edges.
If geometric primitives for (large) points are understood as special
polygons or special lines and drawn with the methods from Sects. 7.3 or
7.5, then it is obvious that even for rasterised points staircase-like edges
occur.
This staircase effect, described in more detail in Sect. 7.6.1, is a major problem in computer graphics. It can be reduced by antialiasing using the grey levels (or colour gradations) of a fragment. The low-pass
filtering of the antialiasing methods described below softens or blurs the
hard edges of the stair steps, producing a smoother or softer rendering
result. At the same time, this reduces high spatial frequencies in the scene.
Figure 7.32 shows lines drawn by an OpenGL renderer using post-filtering
for antialiasing. The smoothing effect of using greyscales is clearly visible,
in contrast to the lines in Fig. 7.30, which were drawn without antialiasing
measures. Comparing the two images, it is also noticeable that the
smoothed lines are blurrier. This is very evident when comparing the
magnifications on the right side of the figures. In typical computer graphics
scenes, where a suitable camera position and resolution are used for
rasterisation, this blurriness is not very disturbing, as can be seen on the left
side of Fig. 7.32. The advantage of the smooth and less staircase-like lines
prevails. Nevertheless, it can be stated in principle that due to the necessary
low-pass filtering for antialiasing, a perceived smooth shape is exchanged
for the sharpness of the image.

7.6.3 Pre-Filtering
In the following, two approaches for pre-filtering are presented. In the
unweighted area sampling, a line is considered as a long, narrow rectangle.
Each fragment is assigned a square from the pixel grid. The intensity with
which the fragment is coloured is chosen in proportion to how much of the
square is covered by the line segment. Figure 7.33 illustrates this concept.

Fig. 7.33 An example of a line drawn with unweighted area sampling

A major disadvantage of this method is that for each fragment, the intersection with the line must be determined and then its area must be
computed. This calculation is very computationally intensive. A simple
heuristic for approximating the proportion covered by the rectangle (the
line) is to cover each square of the grid (fragment candidate) with a finer
sample grid. Determining the proportion of samples inside the rectangle
versus the total number of samples gives an approximation of the
proportion of the square area covered by the line and thus an approximation
of the intensity for the fragment.
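A Java sketch of this heuristic; insidePrimitive is a placeholder for the test whether a sample point lies inside the primitive (for example inside the thick line interpreted as a rectangle), and k is the assumed number of samples per pixel edge:

static double approximateCoverage(int x, int y, int k,
                                  java.util.function.BiPredicate<Double, Double> insidePrimitive) {
    int covered = 0;
    for (int sy = 0; sy < k; sy++) {
        for (int sx = 0; sx < k; sx++) {
            // Sample positions distributed over the pixel square with lower left corner (x, y)
            double sampleX = x + (sx + 0.5) / k;
            double sampleY = y + (sy + 0.5) / k;
            if (insidePrimitive.test(sampleX, sampleY)) {
                covered++;
            }
        }
    }
    return covered / (double) (k * k);               // approximated coverage value
}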
Fig. 7.34 Example representation of a line segment on a raster: Distinction between the terms
fragment, pixel (raster element) and sample (based on [14, p. 1056])

Figure 7.34 shows a line segment on a raster as an example. The finer sample raster consists of four samples per pixel. In OpenGL, the position of
a raster element is indicated by its bottom left corner, which lies on integer
coordinates of the raster. The centres of each raster element (indicated by a dot in the figure) are offset by the vector $$(0.5, 0.5)^T$$ from these integer coordinates. A raster element becomes a fragment by being identified as belonging to a geometric primitive through a rasterisation process. The fragment marked with the cross thus has the coordinate (3, 2) and an approximate value of the area covered by the line of 1. The fragments (1, 1) and (3, 3) are only partially covered by the line and therefore receive correspondingly smaller fractional values. These values are called coverage values. This concept of using a coverage value for each
fragment can also be applied to other geometric primitives.
At this point, it is important to note a slightly different usage of the
term fragment. In Fig. 7.34, a fragment refers to a generally non-square
portion of a geometric primitive that covers a pixel (raster element). In
OpenGL, a fragment is represented by a two-dimensional coordinate (the
lower left corner), so that it can always be assigned to exactly one whole
pixel of the pixel raster and it can only be square.6 Due to the finite
resolution of the grid, the incomplete coverage of a pixel in OpenGL is
taken into account by the change of the colour or grey value of the fragment
by the coverage value.

Fig. 7.35 Schematic representation of weighted area sampling

In the weighted area sampling approach, not only the covered area is
calculated but also a weighting function w(x, y) is used. This achieves a
better fit to the geometric primitive and an improvement of antialiasing.
The weighting function w(x, y) has the highest value in the centre of the
pixel on the pixel raster and decreases with increasing distance. A typical
weighting function is shown on the right of Fig. 7.35. This function is
defined for a circular area A around a pixel P, as shown on the left of the
figure. The intensity of the fragment is calculated as follows:
$$\begin{aligned} I_P \; = \; \frac{\int _{A \cap S} w(x, y) \, dx \, dy}{\int _{A} w(x, y) \, dx \, dy}. \end{aligned}$$
In this term, $$A \cap S$$ is the intersection of the circle area A with the rectangle S representing the line.
Although the formula may seem complicated at first due to the integrals, the
intensity of a fragment in this case depends only on its distance from the
line. The intensity can therefore be written as a function $$F(D)$$, where $$D$$ is the distance of the centre of the fragment P from the line. The number of intensity values that can be displayed is limited, on screens mostly by the value 256. For antialiasing, however, it is sufficient to use only a few intensity levels. Instead of the real-valued function $$F(D)$$, a discretised variant $$F_d(D)$$ with finitely many values is sufficient if only a small number of intensity levels are to be used. For this purpose, the grid must be traversed and the resulting value $$F_d(D)$$ must be determined for each fragment. This problem is obviously similar to the task of drawing a curve $$y = f(x)$$ on a grid. Curve drawing also involves traversing a grid (in this case only along one axis) and calculating the rounded function value round(f(x)) in each step. The differences to antialiasing are that the raster is scanned not only along one coordinate axis but also in the neighbourhood of the fragment and that the calculated discrete value is not the y-coordinate of a fragment to be set, but one of the finitely many discrete intensity values.
Based on this analogy, efficient algorithms for antialiasing can be
developed, which are based on the principle of the midpoint algorithm.
These include Gupta–Sproull antialiasing [12] and the algorithm of
Pitteway and Watkinson [19]. The algorithm of Wu [25] uses an improved
distance measure (error measure) between the ideal line and possible
fragments compared to the midpoint algorithm, which leads to a very fast
algorithm for drawing lines while taking antialiasing into account. For a
detailed description of these methods, please refer to the original literature
and to [11, 13].
Through the weighted integration in the formula above or—in the
discrete variant—through the weighted summation, an averaging and thus
the desired low-pass filtering effect is achieved. To optimise this low-pass
filtering, other weighting functions can be used as an alternative to the
function shown in Fig. 7.35.
Fig. 7.36 Dots, lines and a triangle drawn without antialiasing by an OpenGL renderer: The lower
part shows a cropped magnification of the upper part of the image

7.6.4 Pre-Filtering in the OpenGL


Figure 7.36 shows points (GL_POINTS), lines (GL_LINE_STRIP) and a
triangle (GL_TRIANGLE_STRIP) drawn by an OpenGL renderer without
antialiasing measures. The dot size is eight and the line width is three
fragments. In the highly magnified parts of the figure, the staircase effect
described in Sect. 7.6.1 is clearly visible in the drawn lines and at the edges
of the triangle. The dots are displayed as squares due to the square
geometry of the pixels of the output device (an LCD screen in this case).
This can also be interpreted as a staircase effect, since the target geometry
of a point is a (small) circular disc.
In OpenGL, antialiasing can be enabled using the pre-filtering method
for drawing geometric primitives by a glEnable command, and it can be
disabled by a glDisable command. Column target of Table 7.4 shows
available arguments for these commands. For example, antialiasing is
enabled by pre-filtering with the command
glEnable(gl.GL_LINE_SMOOTH) for drawing lines. This affects the
drawing of contiguous lines (GL_LINE_STRIP, GL_LINE_LOOP) and
separate lines (GL_LINES). Antialiasing by pre-filtering for points is only
available in the compatibility profile, while pre-filtering for the other
primitives can also be activated in the core profile.
Table 7.4 Arguments (see column target) of the glEnable command for activating antialiasing by
pre-filtering for geometric primitives in the compatibility and core profile

Geometric primitive | Target            | Availability
Point               | GL_POINT_SMOOTH   | Compatibility profile
Line                | GL_LINE_SMOOTH    | Compatibility and core profile
Polygon             | GL_POLYGON_SMOOTH | Compatibility and core profile
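A minimal JOGL sketch for activating pre-filtering, assuming a GL2 object named gl so that all constants of the compatibility profile are available, could look as follows; the glHint calls only request the highest quality and may be ignored by the implementation:

gl.glEnable(gl.GL_POINT_SMOOTH);   // points, compatibility profile only
gl.glEnable(gl.GL_LINE_SMOOTH);    // lines
gl.glHint(gl.GL_LINE_SMOOTH_HINT, gl.GL_NICEST);
gl.glEnable(gl.GL_POLYGON_SMOOTH); // polygons
gl.glHint(gl.GL_POLYGON_SMOOTH_HINT, gl.GL_NICEST);

As described below, the computed coverage values only become visible when blending is activated as well.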

In OpenGL, colours can be represented by additive mixing of the three


primary colours red, green and blue, using the RGB colour model (see Sect.
6.2). Using the fourth alpha component transparency and translucency can
be taken into account (see Sect. 6.3). Thus, the colour of a fragment
including a transparency value can be expressed as a quadruple of the form
$$(R, G, B, A)$$ with $$R, G, B, A \in [0, 1]$$, which is referred to
as an RGBA colour value. In the core profile and in JOGL, this is the only way
to represent colours.
For antialiasing using pre-filtering, the coverage value is determined,
which indicates the proportion of a pixel that is covered by a fragment of a
geometric primitive (see Sect. 7.6.3 and Fig. 7.34). This coverage value is
multiplied by the alpha value of the colour value of the fragment. If the
colour value including the alpha value of the fragment is $$(R_f, G_f, B_f, A_f)$$,
then after taking $$\rho $$ into account, the quadruple
$$(R_s, G_s, B_s, A_s) = (R_f, G_f, B_f, \rho \cdot A_f)$$ is
obtained. The modified colour value of this fragment is then mixed with the
existing colour value $$(R_d, G_d, B_d, A_d)$$ of the pixel at the
respective position in the frame buffer. To enable this operation, blending
must be activated and the factors for the blending function must be
specified. A commonly used blending function can be activated and set by
the following JOGL commands:
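A typical form of these commands, assuming a GL object named gl as in the other JOGL examples of this book, is the standard alpha-blending setup:

gl.glEnable(gl.GL_BLEND);
gl.glBlendFunc(gl.GL_SRC_ALPHA, gl.GL_ONE_MINUS_SRC_ALPHA);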
This defines the following mixing function:
$$\begin{aligned} C_d = A_s \cdot C_s + (1 - A_s) \cdot C_d. \end{aligned}$$ (7.25)
$$C_s$$ in this formula represents each component of the quadruple
of the source $$(R_s, G_s, B_s, A_s)$$, i.e., the colour values of the
fragment modified by $$\rho $$. $$C_d$$ represents each
component of the destination $$(R_d, G_d, B_d, A_d)$$, i.e., the
colour values of the pixel to be modified in the frame buffer. This is the
classic and frequently used blending function, which is a simple convex
combination (see Sect. 5.1.4).

Fig. 7.37 Dots, lines and a triangle drawn with pre-filtering for antialiasing by an OpenGL renderer:
The lower part shows a cropped magnification of the upper part of the image

Figure 7.37 shows the output result of an OpenGL renderer in which the
respective commands GL_POINT_SMOOTH, GL_LINE_SMOOTH,
GL_POLYGON_SMOOTH and antialiasing with the blend function (7.25)
were used. The same primitives with the same parameters as in Fig. 7.36
were drawn. Comparing the images with and without antialiasing, the
smoother rendering with antialiasing is noticeable, significantly improving
the quality of the rendering. The magnifications show a round shape of the
dots. At the edges of the objects, the use of graduated grey values is
noticeable, resulting in a smoothing of the staircase-shaped edges.
Furthermore, it can be seen that antialiasing has created blurs that are not
very disturbing in the non-magnified illustration (see Sect. 7.6.2). The
magnifications of the rendering results in this and the following figures are
only shown to illustrate the antialiasing effect. These effects are usually not
or only hardly visible in an output of an OpenGL renderer without special
measures.
Fig. 7.38 Rendering of a polygon of two triangles drawn by an OpenGL renderer using pre-filtering
for antialiasing: The right side shows a cropped magnification of the left part of the figure. The upper
part of the image was created with the blend function (7.25). For the lower part of the image, the
blend function (7.26) was used

As explained in Chap. 4, in order to represent a three-dimensional


object in computer graphics very often only its surface is modelled, which
usually consists of a sequence of connected planar triangles. The left part of
Fig. 7.38 shows a white planar surface against a black background
composed of two adjacent triangles. The surface was drawn by a
TRIANGLE_STRIP. For antialiasing with POLYGON_SMOOTH in the
upper part of the image, the blending function (7.25) was used. The edge
between the two triangles is clearly visible, which is normally undesirable
for a planar surface. This effect is due to the blending function used.
Assume that a fragment at the edge between the triangles has a coverage
value of $$\rho = 0.5$$. Then the colour value of this white fragment
is (1.0, 1.0, 1.0, 0.5). Before drawing, let the background in the frame buffer
be black (0, 0, 0, 1.0). While drawing the first triangle, the formula (7.25) is
applied and creates the RGB colour components
$$C_d = 0.5 \cdot 1.0 + (1- 0.5) \cdot 0 = 0.5$$ for the respective
edge pixel in the frame buffer. If the second triangle is drawn, such an edge
fragment again meets an edge pixel already in the frame buffer and by the
same formula the RGB colour components
$$C_d = 0.5 \cdot 1.0 + (1- 0.5) \cdot 0.5 = 0.75$$ are calculated.
Since the resulting RGB colour values are not all equal to one and thus do
not produce the colour white, the undesired grey line between the triangles
is visible.
In this case, the grey line separating the two triangles can be suppressed
by choosing the following blend function:
$$\begin{aligned} C_d = A_s \cdot C_s + C_d. \end{aligned}$$ (7.26)
The following JOGL commands activate blending and define the blend
function (7.26):
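A typical form of these commands (again assuming a GL object named gl) replaces the destination factor of the blending function by one:

gl.glEnable(gl.GL_BLEND);
gl.glBlendFunc(gl.GL_SRC_ALPHA, gl.GL_ONE);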

Using the blend function (7.26), the colour components of the overlapping
edge fragments of the triangles add up exactly to the value one. This results
in white fragments also at the transition of the two triangles and thus the
grey line disappears. The rendering result for the planar surface with these
settings is illustrated in the lower part of Fig. 7.38.

Fig. 7.39 Illustration of two coloured polygons, each consisting of two triangles, drawn by an
OpenGL renderer using pre-filtering for antialiasing: The middle part of the figure shows a cropped
magnification of one polygon. The right part of the figure contains the cropped magnification of a
polygon with contrast enhancement

Unfortunately, choosing the blending function does not solve all the
problems of drawing polygons with antialiasing by pre-filtering. Figure
7.39 shows two coloured overlapping polygons, each consisting of two
triangles, drawn by an OpenGL renderer using the primitive
TRIANGLE_STRIP. For antialiasing, POLYGON_SMOOTH with the blend
function (7.26) was used. In the polygons, edges between the triangles are
again visible. For clarity, in the magnification on the right side of the figure,
a very strong contrast enhancement has been applied.7
Furthermore, the blending function results in an undesired colour
mixture in the overlapping area of the polygons. This colour mixing can be
avoided by using the depth buffer test, but in this case the edges of the
triangles become visible as black lines. An algorithm for the depth buffer
test (depth buffer algorithm) can be found in Sect. 8.2.5. Section 2.5.6
shows the integration into the OpenGL graphics pipelines. Detailed
explanations of the depth buffer test in OpenGL are available in [22, pp.
376–380].
The problems presented in this section can be avoided by using post-
filtering methods for antialiasing, which are presented in the following
sections. Section 7.6.8 contains corresponding OpenGL examples.

7.6.5 Post-Filtering
As explained in the introduction to Sect. 7.6, computer-generated graphics
usually contain ideal edges that have infinitely high spatial frequencies.
This means that in principle the sampling theorem cannot hold and aliasing
effects occur. Although such graphics are not band-limited, the resulting
disturbances can nevertheless be reduced by antialiasing measures.
In post-filtering, an increase of the sampling rate by upsampling takes
place during rasterisation (see Sect. 7.6.2 and Fig. 7.31). This uses K
samples per pixel, resulting in an overall finer grid for the rasterisation of
the scene.

Fig. 7.40 Examples of sample positions (small rectangles) within a pixel in a regular grid-like
arrangement

Figure 7.40 shows some examples of regular, grid-like arrangements of


samples within a pixel. The small squares represent the sample positions
within a pixel. The round dot represents the centre of a pixel. This position
is only sampled in the middle example.
If shading—that is, the execution of all steps of the graphics pipeline
from rasterisation to reduction (see the right part of Fig. 7.31)—is applied
for each of these samples, then this process is called supersampling
antialiasing (SSAA). The advantage of this method lies in the processing of
all data, including the geometry data, the materials of the objects and the
lighting through the entire graphics pipeline with the increased sampling
resolution. This effectively counteracts aliasing effects from different
sources. Another advantage is the ease of implementation of this method,
without having to take into account the properties of individual geometric
primitives.
One disadvantage of supersampling antialiasing is the computational
effort, which is in principle K times as large as the computation without a
sampling rate increase. However, this procedure can be parallelised well
and thus executed well by parallel hardware. Since the calculation of the
entire shading for each sample within a pixel is not necessary in normal
cases, efficient algorithms have been developed whose approaches are
presented in Sect. 7.6.6.
If a shading value (colour value) was determined for more than one
sample per pixel, then the rendered image with the increased sampling rate
must be reduced to the resolution of the frame buffer by downsampling so
that the rendering result can be written to the frame buffer for output to the
screen (see Fig. 7.31).
If a pixel is represented by K samples at the end of the processing in the
graphics pipeline, then in the simplest case exactly one sample per pixel can
be selected. However, this would make the processing in the graphics
pipeline for the other $$(K-1)$$ samples per pixel redundant. This
variant is also detrimental from the point of view of signal and image
processing, since potentially high spatial frequencies are not reduced and
aliasing effects can arise from this simple form of downsampling.
As explained in the introduction of Sect. 7.6.2, low-pass filtering is
required to reduce this problem. The simplest form of low-pass filtering is
the use of an averaging filter (boxcar filter). This can be realised by taking
the arithmetic mean or arithmetic average (per colour component) of the
colour values of all samples per pixel and using it as the colour value for the
downsampled pixel. This averaging can be represented graphically as
a convolution matrix, also called filter kernel or kernel. In the top row of
Fig. 7.41, averaging filter kernels of different sizes are shown. The value of
each cell of the filter kernel is multiplied by the colour value of a sample at
the corresponding position within the pixel. Subsequently, these partial
results are added, and the total result is divided by the number of cells of
the filter kernel. The last operation is shown as a factor to the left of each filter
matrix in the figure. The application of these factors is necessary to avoid
an undesired increase in contrast in the resulting image.8
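Independently of OpenGL, this averaging can be sketched in Java as follows; the two-dimensional array samples is assumed to contain the values of all samples of one pixel for one colour channel (grey values or one of the RGB components):

// Box-filter reduction: returns the arithmetic mean of the K = n * n
// sample values belonging to one pixel.
static float averageSamples(float[][] samples) {
    float sum = 0.0f;
    int count = 0;
    for (float[] row : samples) {
        for (float value : row) {
            sum += value;
            count++;
        }
    }
    return sum / count; // corresponds to the factor 1/K in front of the kernel
}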

Fig. 7.41 Examples of low-pass filter kernels of different sizes: The upper part shows kernels of
averaging filters (boxcar filters). The lower part shows kernels of binomial filters

This simple averaging can cause visible disturbances (called artefacts in


image processing) (see, for example, [8, p. 98]). Therefore, it makes sense
to use filter cores with coefficients that slowly drop off at the edges of the
filter kernel. Well-suited filter kernels are binomial filters, which can be
derived from the binomial coefficients. For high filter orders (for large
dimensions of the kernel), this filter approximates the so-called Gaussian
filter, which has the characteristic symmetric “bell-shape” curve. In Fig.
7.41, filter kernels of the binomial filter of different sizes are shown in the
bottom row. When using these filter kernels, too, the value of each cell of the
filter kernel is multiplied by the colour value of the sample at the
corresponding position in the pixel. The partial results are summed and then
multiplied by the factor shown to the left of the filter kernels in the figure to obtain a
total filter weight of one.9 The value resulting from this calculation represents
the result of downsampling for one pixel.
In the downsampling approach described so far, the number K of
samples per pixel (see Fig. 7.40) must exactly match the dimensions of the
low-pass filter kernel (see Fig. 7.41) in both dimensions. In principle,
downsampling including low-pass filtering can also include samples of
surrounding pixels by choosing larger filter kernels, with a number of
elements larger than K. This results in a smoothing effect that goes beyond
pixel boundaries. However, this results in greater image blur.
Furthermore, in the downsampling procedure described so far, it has
been assumed that the downsampling is performed in one step and per
pixel. Formally, this operation of reduction by downsampling can be broken
down into two steps: (1) low-pass filtering and (2) selection of one sample
per pixel (decimation). In general, the first step represents the linear
filtering of an image $$I(m, n);\, m = 0, 1,... , M;\, n = 0, 1,..., N$$
with a filter kernel $$H(u, v);\, u = 0, 1,... , U;\, v = 0, 1,..., V$$. Let
I(m, n) be the intensity function of a greyscale image or one colour channel
in a colour image. M and N are the dimensions of the image and U and V
are the dimensions of the filter kernel in horizontal and vertical directions,
respectively. For example, for low-pass filtering, a filter kernel H(u, v) from
Fig. 7.41 can be used. The linear filter operation can be described as a two-
dimensional discrete convolution such that the filtered image
$$I' (m, n)$$ is given by the following formula:

$$\begin{aligned} I'(m, n) = \sum _{u} \sum _{v} H(u, v) \cdot I(m - u,\, n - v) \end{aligned}$$ (7.27)

In this operation, it should be noted that the filter kernel H(u, v) used should
have odd dimensions in both dimensions so that a unique value for the
sample in the middle can be determined. The dimension of the filter kernel
in the horizontal or vertical direction is the finite number of non-zero
elements in the central region of the filter kernel (see Fig. 7.41 for
examples). Such a kernel can be extended with zeros in both dimensions in
the negative and positive directions, which has no effect on the result of Eq.
(7.27). Since the resulting image has the same dimension and the
same resolution as the original image I(m, n), the second step is to select a
sample for each pixel (decimation) according to the downsampling factor
(in this case K) to complete downsampling.
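These two steps can be sketched in Java as follows. This is a simplified illustration and not the book's example code: image holds one colour channel, kernel is a normalised low-pass kernel of odd size such as one of those shown in Fig. 7.41, k denotes the decimation factor per direction, and the image border is handled by clamping the indices.

// Step 1: discrete low-pass filtering of one colour channel according to (7.27);
// mirroring of the kernel is omitted because the kernels used here are symmetric.
static float[][] convolve(float[][] image, float[][] kernel) {
    int m = image.length, n = image[0].length;
    int u = kernel.length, v = kernel[0].length; // odd kernel dimensions
    float[][] filtered = new float[m][n];
    for (int y = 0; y < m; y++) {
        for (int x = 0; x < n; x++) {
            float sum = 0.0f;
            for (int i = 0; i < u; i++) {
                for (int j = 0; j < v; j++) {
                    int yy = Math.min(m - 1, Math.max(0, y + i - u / 2));
                    int xx = Math.min(n - 1, Math.max(0, x + j - v / 2));
                    sum += kernel[i][j] * image[yy][xx];
                }
            }
            filtered[y][x] = sum;
        }
    }
    return filtered;
}

// Step 2: decimation, keeping only every k-th sample in both directions.
static float[][] decimate(float[][] image, int k) {
    float[][] reduced = new float[image.length / k][image[0].length / k];
    for (int y = 0; y < reduced.length; y++) {
        for (int x = 0; x < reduced[0].length; x++) {
            reduced[y][x] = image[y * k][x * k];
        }
    }
    return reduced;
}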

7.6.6 Post-Filtering Algorithms


In Sect. 7.6.5, supersampling antialiasing (SSAA) was introduced as an
easy-to-implement but computationally expensive procedure for reducing
aliasing effects as a post-filtering method. Some alternative algorithms are
outlined below.
If the graphics hardware does not support supersampling antialiasing,
the calculations for the individual K sample positions can also be carried
out one after the other and the (intermediate) results stored in a so-called
accumulation buffer. At the beginning of this calculation, the buffer is
empty. Subsequently, rasterisation and shading of the whole image are
performed successively at the K different sample positions. These different
sample positions can also be understood as different viewpoints on the
scene. The intermediate results for a sample position are each multiplied by
a corresponding weighting for the low-pass filtering and added to the
content of the accumulation buffer. At the end of the calculation, the low-
pass filtering factor is multiplied to avoid a contrast increase in the result
image. In the case of an averaging filter, this is a division by the number of
sample positions, and in the case of a binomial filter, a factor that can be
taken from Fig. 7.41, for example. In order to record the intermediate
results without loss, the accumulation buffer usually has a higher resolution
for the colour values.
The described calculation steps for this method are mathematically
equivalent to supersampling antialiasing (see Sect. 7.6.5) including low-
pass filtering. This method does not require any memory besides the
accumulation buffer, which has the same resolution as the frame buffer. In
particular, no frame buffer with an increased resolution by the factor K
needs to be provided. The disadvantage, however, is that the possible frame
rate decreases or the hardware speed requirements increase due to the
temporally sequential computations. Ultimately, the same computations are
carried out as with supersampling antialiasing, which still leads to an
increased effort by a factor of K compared to a computation without a
sampling rate increase.
With both supersampling antialiasing and the use of an accumulation
buffer, shading must be performed K times more often (with K as the
number of samples per pixel) than without sampling rate increase.
According to Fig. 7.31, shading in OpenGL includes early-per-fragment
operations, fragment processing and per-fragment operations. In the case of
a programmable graphics pipeline, fragment processing takes place in the
fragment shader, so this shader is called K times more often.
However, observations of typical computer graphics scenes show that
shading results vary more slowly within a geometric primitive than between
geometric primitives due to materials of the objects and lighting. If these
primitives are large enough relative to the grid so that they often completely
cover pixels, then often all samples within a pixel will have the same
(colour) values when these values arise from a fragment of one geometric
primitive. Thus, shading with a fully increased resolution is rarely
necessary. The methods presented below avoid this overshading by
performing fewer shading passes per pixel.
The A-buffer algorithm, which is described for example in [9] and [14,
p. 1057], separates the calculation of coverage and shading. In this
algorithm, the coverage is calculated with the increased resolution and the
shading without increasing the resolution (i.e., it remains at the resolution
of the frame buffer). Furthermore, the depth calculation (see Sect. 8.2.5) is
integrated with this algorithm to take into account the mutual occlusion of
fragments. This occlusion calculation is also performed for each sample and
thus with the increased resolution. If at least one sample of a pixel is
covered by a fragment, then this fragment is added to a list for the
respective pixel. If this fragment covers another fragment already in the list,
then the existing fragment is removed from the list for this pixel. After all
fragments have been rasterised, the shading calculation is performed for all
fragments that remain in the lists. Only one shading pass is performed per
fragment and the result is assigned to all samples that are covered by the
respective fragment. Subsequently, the reduction to the resolution of the
frame buffer takes place as described in Sect. 7.6.5.
The coverage calculation causes a known overhead. In contrast, the
general computational overhead for the shading results (in programmable
graphics pipelines) is not predictable for all cases due to the free
programmability of the fragment shader. In general, the A-buffer algorithm
is more efficient than supersampling antialiasing. Furthermore, this
algorithm is particularly well suited to take into account transparent and
translucent fragments. For an explanation of transparent objects, see Sect. 9.7.
The main disadvantage is the management of the lists of fragments per
pixel and the associated need for dynamic memory space. Since the
complexity of the scenes to be rendered is not known in advance, this
memory requirement is theoretically unlimited. Furthermore, this algorithm
requires integration with the rasterisation algorithm and the coverage
calculation. For this reason, the A-buffer algorithm is preferably used for
the pre-computation of scenes (offline rendering) and less for (interactive)
real-time applications.
Another algorithm for post-filtering is multisample antialiasing
(MSAA). This term is not used consistently in computer graphics literature.
Usually, multisample antialiasing refers to all methods that use more
samples than pixels—especially for coverage calculation—but which, in
contrast to supersampling antialiasing, do not perform a shading
computation for each sample. Thus, the A-buffer algorithm and coverage
sample antialiasing (CSAA) (see below) belong to this group of methods.
According to this understanding, supersampling antialiasing does not
belong to this group. In the OpenGL specifications [20, 21], multisample
antialiasing is specified with an extended meaning, since this method can be
extended to supersampling antialiasing by parameter settings (see Sect.
7.6.8).
In the following, multisample antialiasing (MSAA) is understood in a
narrower sense and is presented as a special procedure as described in [14,
pp. 1057–1058]. This procedure works similar to the A-buffer algorithm,
but no lists of fragments are managed for the individual pixels. A depth test
(see Sect. 8.2.5) is performed. The coverage calculation takes place with the
increased resolution (for each sample), which is directly followed by the
shading computation per fragment.

Fig. 7.42 A triangle primitive covers samples (small rectangles) within a pixel (left side). The small
filled circle represents the centre of the pixel. The right side of the figure shows the corresponding
binary coverage mask

First, a coverage mask is calculated for each fragment, indicating by its


binary values which samples within that pixel are covered by the fragment.
Figure 7.42 shows an example of such a binary coverage mask for a pixel
containing a regular grid-like sample pattern. The covered samples marked
on the left side are entered as ones in the coverage mask on the right side. A
zero in the mask indicates that the sample is not covered by the primitive.
Meaningful operations between these masks are possible through Boolean
operations. For example, from two coverage masks derived from different
fragments, a coverage mask can be determined by an element-wise
exclusive or operation (XOR), which indicates the coverage of the samples
of the pixel by both fragments together.
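As a small illustration (hypothetical values, not OpenGL code), such masks can be stored as bit patterns in an int, one bit per sample, and combined with the bitwise operators of Java:

// Binary coverage masks of two fragments for K = 16 samples of a pixel.
int maskA = 0b0000_1111_0011_0001;
int maskB = 0b1111_0000_1100_0010;

int maskBoth       = maskA & maskB; // samples covered by both fragments
int maskEither     = maskA | maskB; // samples covered by at least one fragment
int maskExactlyOne = maskA ^ maskB; // XOR: samples covered by exactly one fragment
int coveredSamples = Integer.bitCount(maskEither); // basis for a coverage value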
In most implementations of multisample antialiasing, the coverage mask
is obtained by rasterisation and a depth test with an increased resolution (for
K samples). If at least one sample is visible due to the depth test, a shading
value is calculated for the entire pixel. This calculation takes place with the
non-increased resolution (with the resolution of the frame buffer). The
position within the pixel used for shading depends on the specific OpenGL
implementation. For example, the sample closest to the centre of the pixel
can be used, or the first visible sample can be used. It does not matter whether
this sample is visible or covered by a fragment. If supported by the OpenGL
implementation, the position of the sample used for shading can be moved
so that it is guaranteed to be covered by the fragment. This procedure is
called centroid sampling or centroid interpolation and should only be used
deliberately, as such a shift can cause errors in the (later) calculation of
gradients of the rasterised image. Such gradients are used for example
for mipmaps (see Sect. 10.1.1). The resulting (single) shading value is used
to approximate the shading values of all samples of the pixel. After all
shading values for all samples of all pixels have been determined, the
reduction to the resolution of the frame buffer takes place as described in
Sect. 7.6.5.
Since only one shading calculation takes place per pixel, this method is
more efficient than supersampling antialiasing. However, this advantage
also leads to the disadvantage that the variations of the shading function
within a pixel are not taken into account and thus the approximation of a
suitable shading value for a pixel is left to the shader program. Essentially,
this method reduces disturbances that occur at geometric edges. This
approximation is usually useful for flat, matt surfaces, but can lead to
visible disturbances on uneven surfaces with strong specular reflections.
With multisample antialiasing, shading only needs to be calculated once per
pixel, but the resulting shading values are stored per sample, resulting in the
same memory requirements as for supersampling antialiasing. Furthermore,
multisample antialiasing works well for geometric primitives that are
significantly larger than one pixel and when the number of samples per
pixel is large. Otherwise, the savings from the per-pixel shading
computation are small compared to supersampling antialiasing. Another
disadvantage of multisample antialiasing is the difficulty of integration with
shading calculations in deferred shading algorithms. Section 9.2 provides
explanations of these algorithms.
Furthermore, since the multisample antialiasing method described
above has not only efficiency advantages but also a number of
disadvantages, there are more advanced methods that aim to maximise the
advantages of the multisample approach while minimising its
disadvantages. For reasons of space, the coverage sample antialiasing
(CSAA) is mentioned here only as one such representative where the
resolutions (number of samples used per pixel) for the coverage, depth and
shading calculations can be different. A description of this algorithm can be
found, for example, in [14, pp. 1058–1059].
In contrast to pre-filtering (see Sect. 7.6.3), the post-filtering methods
are not context-dependent and have a global effect. In particular, the post-
filtering methods are not dependent on specific geometric primitives. The
basic strategy for reducing aliasing effects is the same for all image areas
within a method. This is usually an advantage. However, it can also be a
disadvantage, for example, if the scene contains only a few objects and the
entire method is applied to empty areas. It must be added that some
methods, such as the A-buffer algorithm, depend on the number of
fragments and thus the computational effort is reduced for areas of the
scene with few objects. Furthermore, when using regular grid-like
arrangements of samples with a fixed number of pixels, very small objects
may fall through the fine sample grid and may not be detected. This
problem can be reduced by using an irregular or random sample
arrangement (see Sect. 7.6.7).
Since most of the presented methods for post-filtering have advantages
and disadvantages, the appropriate combination of methods and parameter
values, as far as they are selectable or changeable, has to be chosen for a
specific (real-time) application.

7.6.7 Sample Arrangements for Post-Filtering


In contrast to regular arrangements of samples on a grid, as shown in Fig.
7.40, alternative arrangements of samples within a pixel are possible. This
can further reduce aliasing effects and improve the quality of the rendering
result. It is important to ensure that the samples cover as many different
locations within the pixel as possible in order to represent the fragment to
be sampled as well as possible. Figure 7.43 shows some alternative sample
arrangements. At the top left of the figure is an arrangement that uses only
two samples per pixel, with the samples still arranged in a regular grid. The
slightly rotated arrangement of the samples in the Rotated Grid Super
Sampling (RGSS) pattern in the centre top of the figure improves
antialiasing for horizontal or vertical lines or nearly horizontal or nearly
vertical lines, which is often necessary. By moving the samples to the edge
of the pixel in the FLIPQUAD pattern at the top right of the figure, the
sampling points for several neighbouring pixels can be shared. For this, the
pattern of the surrounding pixels must be mirrored accordingly. The
computational effort of using this pattern is reduced to an average of two
samples per pixel. At the bottom left of the figure is a regular pattern with
eight samples arranged like on a chessboard.

Fig. 7.43 Examples of sample positions (small rectangles) within a pixel: Top left: grid-like; top
centre: Rotated Grid Super Sampling (RGSS); top right: FLIPQUAD; bottom left: chessboard-like
(checker); bottom right: arrangement in a grid with random elements (jitter)

In addition to these examples of deterministic sample arrangements,


stochastic approaches use a random component which arranges the samples
differently within each pixel. As with deterministic approaches, care must
be taken to avoid clustering of samples in certain areas of the pixel in order
to represent well the fragment to be sampled. This can further reduce
aliasing effects. Furthermore, very small objects in a scene that would fall
through a regular grid can be better captured. A random component
introduces noise into the image, but this is usually perceived by a viewer as
less disturbing than regular disturbing patterns due to aliasing. The bottom
right of Fig. 7.43 shows an example of such a sample arrangement with
nine samples for one pixel, created by so-called jittering (or jittered or
jitter). In this approach, the pixel to be sampled by K samples is evenly
divided into K areas. In the example in the figure, $$K = 9$$. Within each of
these areas, a sample is placed at a random position. The sample pattern
looks different for each pixel within the image to be sampled, but the
random variation is limited. This allows for image enhancement in many
cases with relatively little extra effort.
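A possible Java sketch for generating such a jittered pattern with K = n * n samples per pixel (using java.util.Random; all names are chosen here only for illustration) is:

import java.util.Random;

// Generates K = n * n jittered sample positions inside the unit pixel
// [0,1) x [0,1): the pixel is divided into n x n equal cells and one
// sample is placed at a random position within each cell.
static double[][] jitteredSamples(int n, Random random) {
    double[][] samples = new double[n * n][2];
    double cellSize = 1.0 / n;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int index = i * n + j;
            samples[index][0] = (i + random.nextDouble()) * cellSize;
            samples[index][1] = (j + random.nextDouble()) * cellSize;
        }
    }
    return samples;
}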
Further patterns for the arrangement of samples within a pixel can be
found in [2, p. 140 and p. 144]. In general, a higher number of samples per
pixel not only leads to better image quality but also increases the computing
effort and memory requirements. Typical values for the number of samples
K per pixel are in the range of two to 16 for current graphics hardware.

7.6.8 Post-Filtering in the OpenGL


The accumulation buffer algorithm (see Sect. 7.6.6) is only available in the
OpenGL compatibility profile. This algorithm can be used, for example, for
the representation of motion blur or the targeted use of the depth of field for
certain areas of the scene.
The frequently used post-filtering method specified for both OpenGL
profiles is multisample antialiasing (MSAA). As explained in Sect. 7.6.6,
according to the OpenGL specifications [20, 21], this method includes
supersampling antialiasing. These specifications do not define which
algorithms, which sample arrangement within a pixel or which type of low-
pass filtering must be used. The definition of these variations is left to the
OpenGL implementation, which means that the implementation of the
rasterisation procedure can be different for different graphics processors and
OpenGL graphics drivers. This means that rendering results at the pixel
level may differ depending on which hardware and software is used. The
rendering results usually differ only in detail.
Fig. 7.44 Part of the source code of the JOGL method (Java) for generating the output window:
Included is the initialisation of multisample antialiasing for eight samples per pixel

Multisample antialiasing is very closely linked to the frame buffer and


the output on the screen. Therefore, the initialisation on the software side
usually must take place when creating the window for the output of the
rendering result. Figure 7.44 shows how the initialisation of multisample
antialiasing can look in the JOGL source code for setting up an output
window. The following two commands activate the buffer needed to hold
the additional data for the samples per pixel and set the number K of
samples (in this case $$K = 8$$) per pixel:
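These lines typically operate on the GLCapabilities object that is later passed to the GLWindow or GLCanvas; a sketch (with variable names chosen for illustration and the GL3 profile as an example) is:

GLProfile profile = GLProfile.get(GLProfile.GL3);
GLCapabilities capabilities = new GLCapabilities(profile);
capabilities.setSampleBuffers(true); // allocate a multisample buffer
capabilities.setNumSamples(8);       // K = 8 samples per pixel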

If multisample antialiasing is initialised and thus the corresponding


buffers are prepared, it is activated by default. It can be deactivated and
reactivated by the following commands:
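In JOGL, assuming a gl object of a desktop profile on which the constant GL_MULTISAMPLE is available, these commands are typically:

gl.glDisable(gl.GL_MULTISAMPLE); // deactivate multisample antialiasing
gl.glEnable(gl.GL_MULTISAMPLE);  // reactivate it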

If multisample antialiasing is activated, then antialiasing using pre-filtering


(see Sect. 7.6.4) is automatically deactivated, so that both methods can
never be active at the same time. The commands for antialiasing using pre-
filtering are ignored in this case.
As explained above, the exact arrangement of the samples within a pixel
cannot be specified via the OpenGL interface. However, the following
source code lines in the JOGL renderer can be used to specify the minimum
fraction of samples to be computed separately by the fragment shader
(proportion of separately shaded samples per pixel):
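A sketch of these lines, assuming a gl object of an OpenGL 4 core profile that provides sample shading, is:

gl.glEnable(gl.GL_SAMPLE_SHADING);  // activate per-sample shading
gl.glMinSampleShading(1.0f);        // minimum fraction of separately shaded samples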

The floating point value of this fraction must lie in the interval [0, 1]. How
this value affects the shading of the individual samples within a pixel and
which samples exactly are calculated by the fragment shader is left to the
respective OpenGL implementation. If the value 1.0f is used as an
argument, as in the example above, then shading is to be performed for all
samples. In this case, supersampling antialiasing is applied.

Fig. 7.45 Dots, lines and a triangle drawn with multisample antialiasing by an OpenGL renderer:
Eight samples per pixel and sample shading with glMinSampleShading(1.0f) were set. The
lower part shows a cropped magnification of the upper part of the figure

Figure 7.45 shows dots, lines and a triangle created by an OpenGL


renderer using multisample antialiasing. Eight samples per pixel were used
and sample shading was activated for all samples. The drawn objects are the
same objects shown in Fig. 7.37 as an example of rendering results using pre-
filtering. The smoothing by the multisample antialiasing is clearly visible at
the edges of all objects. It is noticeable that, in contrast to the pre-filtering
example (see Fig. 7.37), the dot is rectangular and does not have a circular
shape. As explained earlier, multisample antialiasing is largely independent
of the content of the scene, making it impossible to adapt the antialiasing
algorithm to specific geometric primitives. For most applications, this
rectangular representation of the dot will be sufficient, especially for small
dots. However, if round points are needed, a triangle fan
(TRIANGLE_FAN) can be used to draw a circular disc to which antialiasing
measures can be applied.
Fig. 7.46 Polygons drawn with multisample antialiasing by an OpenGL renderer against a black
background: Eight samples per pixel and sample shading with glMinSampleShading(1.0f)
were set. The top right and the bottom centre of the figure show cropped magnifications of the left
parts of the figure. The bottom right shows a cropped magnification with contrast enhancement of the
left part of the figure

The upper part of Fig. 7.46 shows the output of an OpenGL renderer
using multisample antialiasing for a white planar surface against a black
background. The same object as in Fig. 7.38 was drawn. The lower part of
the figure represents the output of an OpenGL renderer using multisample
antialiasing for the two coloured overlapping polygons shown in Fig. 7.39
using pre-filtering. Eight samples per pixel were used and sample shading
for all samples per pixel using glMinSampleShading(1.0f) was
activated. In contrast to the output using pre-filtering, no edges are visible
between the triangles that make up the surfaces.10 Furthermore, there is no
unwanted mixing of colours in the overlapping area of the coloured
polygons, as can be seen in Fig. 7.39. The use of a (special) blending
function as for pre-filtering is not necessary. At the edges of the surfaces,
the effect of the low-pass filtering is clearly visible, whereby the quality of
the display is significantly improved compared to the rendering result
without antialiasing. This is noticeable in the non-magnified objects. The
magnifications only serve to illustrate the antialiasing effect, which is
usually not or only hardly visible in the output of a renderer without special
measures.
Fig. 7.47 Polygons drawn with multisample antialiasing by an OpenGL renderer against a white
background: Eight samples per pixel and sample shading with glMinSampleShading(1.0f)
were set. The top right and the bottom centre of the figure show cropped magnifications of the left
parts of the figure. The bottom right shows a cropped magnification with contrast enhancement of the
left part of the figure

In Fig. 7.47, the same objects as in Fig. 7.46 are shown against a white
background. The colour of the white polygon was changed to black. The
same parameters were used for the rendering as for the example in Fig.
7.46. In this case, too, an error-free rendering with functioning and quality-
enhancing antialiasing is evident (see left part of the figure).11
As explained in Sect. 7.6.6, a crucial efficiency advantage of
multisample antialiasing is that shading only needs to be performed for one
sample of a pixel and thus only once per pixel. To achieve this, sample
shading can be disabled, for example, by the command
glDisable(gl.GL_SAMPLE_SHADING). The position of the sample
within a pixel at which the shading calculation takes place is not specified
by the OpenGL specification and thus depends on the concrete OpenGL
implementation used. For this reason, shading may be performed for a
sample position that lies outside of the fragment that (partially) covers the pixel.
Normally, this situation is not problematic. For high-quality rendering
results, centroid sampling or centroid interpolation can be used, which
guarantees that the sample to be shaded is covered by the fragment that
(partially) covers the pixel. In OpenGL, this is achieved by the keyword
centroid before an input variable of the fragment shader. According to
the OpenGL specification, this keyword is an auxiliary storage qualifier.
For this purpose, the following GLSL example source code line of the
fragment shader can be used:
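Such a declaration could, for example, look as follows; the variable name interpolatedColor is chosen here only for illustration:

centroid in vec4 interpolatedColor;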

The examples used in this section were created without a depth test.
However, the use of the depth test together with multisample antialiasing is
possible without any problems. Section 8.2.5 presents an algorithm for the
depth buffer test (depth buffer algorithm). The integration into the OpenGL
graphics pipelines is shown in Sect. 2.5.6. Detailed explanations of the
depth buffer test in OpenGL can be found in [22, pp. 376–380].
A comparison of pre-filtering (Sect. 7.6.4) and post-filtering methods
specified in the OpenGL shows that for most applications multisample
antialiasing is the recommended choice.

7.7 Exercises
Exercise 7.1 Explain the difference between vector and raster graphics.
What are the advantages and disadvantages of a vector graphic and a raster
graphic for use in computer graphics?

Exercise 7.2 Derive the midpoint algorithm for drawing a line with slopes
between $$-1$$ and 0.

Exercise 7.3 Given a line from point (1, 1) to point (7, 6), apply the
midpoint algorithm to this line.

Exercise 7.4 Let the starting point (2, 3) and the endpoint (64, 39) of the
line be given.
(a)
Apply the structural algorithm of Brons to draw this line. Give the
complete drawing sequence.
(b)
Plot a sequence of the repeating pattern in a coordinate system. In the
same coordinate system, plot the drawing sequences for each iteration
of the algorithm (for this sequence of the pattern).

Exercise 7.5 Use the structural algorithm from Sect. 7.3.3 to draw the line
from Fig. 7.9.

Exercise 7.6 A part of the graph of the function $$y = -a\sqrt{x}+b$$ (


$$a,b \in \mathrm{I\!N}^+$$) is to be drawn using the midpoint
algorithm.
(a)
For which x-values is the slope between $$-1$$ and 0?
(b)
Write the equation of the function in suitable implicit form
$$F(x,y) = 0$$. Use $$d = F(x,y)$$ as the decision variable to
develop the midpoint algorithm for this function. How must d be
changed depending on whether the eastern (E) or the south-eastern
(SE) point is drawn by the midpoint algorithm?
(c)
How must the initial value $$d_{\text {init}}$$ be chosen for d
when the first point to be drawn is
$$(x_0,y_0) = \left( a^2,-a^2+b\right) $$?
(d)
How can the fractional rational values in the decision variable—at the
initial value and each recalculation—be avoided?

Exercise 7.7 Hatch the inside of the polygon shown below according to
the odd parity rule (even–odd rule).
References
1. J. R. Van Aken. “An Efficient Ellipse-Drawing Algorithm”. In: IEEE Computer Graphics and
Applications 4.9 (1984), pp. 24–35.

2. T. Akenine-Möller, E. Haines, N. Hoffman, A. Pesce, M. Iwanicki and S. Hillaire. Real-Time


Rendering. 4th edition. Boca Raton, FL: CRC Press, 2018.

3. W. J. Bouknight. “A procedure for generation of three-dimensional half-toned computer graphics


presentations”. In: Commun. ACM 13.9 (Sept. 1970), pp. 527–536.

4. J. E. Bresenham. “A Linear Algorithm for Incremental Digital Display of Circular Arcs”. In:
Communications of the ACM 20.2 (1977), pp. 100–106.

5. J. E. Bresenham. “Algorithm for Computer Control of a Digital Plotter”. In: IBM Systems
Journal 4.1 (1965), pp. 25–30.

6. R. Brons. “Linguistic Methods for the Description of a Straight Line on a Grid”. In: Computer
Graphics and Image Processing 3.1 (1974), pp. 48–62.

7. R. Brons. “Theoretical and Linguistic Methods for the Describing Straight Lines”. In:
Fundamental Algorithms for Computer Graphics. Ed. by Earnshaw R.A. Berlin, Heidelberg:
Springer, 1985, pp. 19–57. URL: https://doi.org/10.1007/978-3-642-84574-1_1.

8. W. Burger and M. J. Burge. Digital Image Processing: An Algorithmic Introduction Using Java.
2nd edition. London: Springer, 2016.

9. L. Carpenter. “The A-Buffer, an Antialiased Hidden Surface Method”. In: SIGGRAPH Comput.
Graph. 18.3 (1984), pp. 103–108.

10. F. Dévai. “Scan-Line Methods for Parallel Rendering”. In: High Performance Computing for
Computer Graphics and Visualisation. Ed. by M. Chen, P. Townsend and J. A. Vince. London:
Springer, 1996, pp. 88–98.

11. J. D. Foley, A. van Dam, S. K. Feiner and J. F. Hughes. Computer Graphics: Principles and
Practice. 2nd edition. Boston: Addison-Wesley, 1996.

12. S. Gupta and R. E. Sproull. “Filtering Edges for Gray-Scale Displays”. In: SIGGRAPH Comput.
Graph. 15.3 (1981), pp. 1–5.

13. D. Hearn and M. P. Baker. Computer Graphics with OpenGL. 3rd edition. Upper Saddle River,
NJ: Pearson Prentice Hall, 2004.

14. J. F. Hughes, A. van Dam, M. MaGuire, D. F. Sklar, J. D. Foley, S. K. Feiner and K. Akeley.
Computer Graphics. 3rd edition. Upper Saddle River, NJ [u. a.]: Addison-Wesley, 2014.

15. M. R. Kappel. “An Ellipse-Drawing Algorithm for Raster Displays”. In: Fundamental
Algorithms for Computer Graphics. Ed. by Earnshaw R. A. NATO ASI Series (Series F:
Computer and Systems Sciences). Berlin, Heidelberg: Springer, 1985, pp. 257–280.
16. R. Li, Q. Hou and K. Zhou. “Efficient GPU Path Rendering Using Scan-line Rasterization”. In:
ACM Trans. Graph. 35.6 (Nov. 2016), Article no 228.

17. J. Pineda. “A Parallel Algorithm for Polygon Rasterization”. In: SIGGRAPH Comput. Graph.
22.4 (1988), pp. 17–20.

18. M. L. V. Pitteway. “Algorithm for drawing ellipses or hyperbolae with a digital plotter”. In: The
Computer Journal 10.3 (1967), pp. 282–289.

19. M. L. V. Pitteway and D. J. Watkinson. “Bresenham’s Algorithm with Gray Scale”. In: Commun.
ACM 23.11 (1980), pp. 625–626.

20. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6
(Compatibility Profile) - October 22, 2019). Retrieved 8 Feb 2021. The Khronos Group Inc, 2019.
URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.compatibility.pdf.

21. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Core
Profile) - October 22, 2019). Retrieved 8 Feb 2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf.

22. G. Sellers, S. Wright and N. Haemel. OpenGL SuperBible. 7th edition. New York: Addison-
Wesley, 2016.

23. A. Watt. 3D-Computergrafik. 3rd edition. München: Pearson Studium, 2002.

24. W. E. Wright. “Parallelization of Bresenham’s line and circle algorithms”. In: IEEE Computer
Graphics and Applications 10.5 (Sept. 1990), pp. 60–67.

25. X. Wu. “An efficient antialiasing technique”. In: SIGGRAPH Comput. Graph. 25.4 (1991), pp.
143–152.

26. C. Wylie, G. Romney, D. Evans and A. Erdahl. “Half-tone perspective drawings by computer”.
In: Proceedings of the November 14-16, 1967, Fall Joint Computer Conference. New York, NY,
USA: Association for Computing Machinery, 1967, pp. 49–58.

Footnotes
1 JPEG stands for Joint Photographic Experts Group. This body developed the standard for the
JPEG format, which allows the lossy, compressed storage of images.

2 It should be noted that the points $$(x_0,y_0)$$ and $$(x_1,y_1)$$ are coordinates of
fragments lying on a grid, so that $$x_0$$, $$y_0$$, $$x_1$$ and $$y_1$$ can only be integers.

3 In the OpenGL specifications [20, 21], it is described that the interpolation qualifiers (flat,
noperspective and smooth) are to be specified for the output variables of the vertex processing
stage. However, https://www.khronos.org/opengl/wiki/Type_Qualifier_(GLSL) clarifies that such a
change only has an effect if the corresponding keywords are specified for input variables of the
fragment shader. This can be confirmed by programming examples.

4 To illustrate the staircase effect, all antialiasing measures were disabled when drawing these lines.

5 In signal processing, the use of the term downsampling is inconsistent. Sometimes, low-pass
filtering with subsequent reduction of the number of sampling points is meant and sometimes only
the selection of (a smaller number of) sampling points from the image to be sampled is meant,
without prior low-pass filtering. In this book, downsampling is understood as consisting of low-pass
filtering followed by a decimation of the number of sampling points (samples).

6 According to the OpenGL specification, non-square pixels are also allowed. However, a fragment
in the OpenGL specification is always assumed to be rectangular.

7 The soft blending of the edges has been lost in the magnification on the right side of the figure due
to the contrast enhancement. This desired antialiasing effect is visible in the magnification without
contrast enhancement (in the middle of the figure).

8 The explained filter operation corresponds to the illustrative description of the linear convolution
according to Eq. (7.27) (see below). Strictly speaking, the filter kernel (or the image) must be
mirrored in the horizontal and vertical directions before executing the explained operation, so that
this operation corresponds to the formula. Since the filter kernels are symmetrical in both
dimensions, this is not necessary in this case.

9 See information in previous footnote.

10 The soft blending of the edges has been lost in the magnification on the right side of the figure
due to the contrast enhancement. However, the desired antialiasing effect is visible in the
magnification without contrast enhancement (in the middle of the figure).

11 The sharp cut edges in the enlarged renderings are due to manual cropping of the fully rendered
and magnified object. These cropped magnifications were created for the presentation in this book
and therefore do not represent a loss of quality due to antialiasing measures.
© Springer Nature Switzerland AG 2023
K. Lehn et al., Introduction to Computer Graphics, Undergraduate Topics in Computer Science
https://doi.org/10.1007/978-3-031-28135-8_8

8. Visibility Considerations
Karsten Lehn1 , Merijam Gotzes2 and Frank Klawonn3
(1) Faculty of Information Technology, Fachhochschule Dortmund,
University of Applied Sciences and Arts, Dortmund, Germany
(2) Hamm-Lippstadt University of Applied Sciences, Hamm, Germany
(3) Data Analysis and Pattern Recognition Laboratory, Ostfalia University
of Applied Sciences, Braunschweig, Germany


For the representation of a section of a three-dimensional model world, it


must be determined (just as in two dimensions) which objects are actually
located in the area to be displayed. In addition to these clipping
calculations, only for the objects located in the visible area, the problem of
obscuring objects or parts of objects by other objects must be solved. This
chapter introduces three-dimensional clipping, the spatial reduction of the
entire scene to the visible space to be displayed. The procedure is explained
in two-dimensional space, which can easily be extended to the three-
dimensional space. Straight-line segments are considered, since the edges of
polygons can be understood as such. The concept of clipping volume is
explained. In addition, this chapter describes procedures for determining the
visible objects in a scene. These include backside removal, which removes
polygons that are not visible in the scene and lie on the backside, so that
further calculations in the graphics pipeline only have to be applied to the
front sides. This saves computing time. Techniques are presented that
identify the visible areas of objects that are to be displayed in the case of
concealed objects. These techniques are divided into object and image-
space procedures, depending on whether they are applied to the objects or
only later in the graphics pipeline to the image section.

8.1 Line Clipping in 2D


When drawing a section of an image of a more complex “world” that is
modelled using vector graphics, the first step is to determine which objects
are located wholly or at least partially in the area to be displayed. The
process of deciding which objects can be omitted entirely when drawing or
which parts of an object should be considered when the drawing is called
clipping. The area from which objects are to be displayed is known as the
clipping area. In this section, algorithms for clipping of straight lines are
discussed in more detail. Since primitives are usually triangles made up of
three line segments, the generality is not limited when clipping is examined
below in terms of line segments. Later in this chapter, the methods
discussed here will be applied to the three-dimensional case.
With straight-line clipping, four main cases can occur, which are
illustrated in Fig. 8.1 as an example:
Fig. 8.1 Different cases of straight-line clipping

The start and endpoints of the line segment lie in the clipping area, i.e.,
the line segment lies entirely within the area to be drawn. The entire line
segment must be drawn.
The start point is inside the clipping area, and the endpoint is outside the
clipping area or vice versa. The calculation of an intersection of the
straight-line segment with the clipping area is required. Only a part of the
line segment is drawn.
The start and endpoints are both outside the clipping area, and the
straight segment intersects the clipping area. The calculation of two
intersections of the line segment with the selected clipping area is
necessary. The line segment between these two intersection points must
be drawn.
The start and endpoints are both outside the clipping area, and the
straight segment does not intersect the clipping area. The straight-line
segment, therefore, lies entirely outside the clipping area and does not
need to be drawn.
An obvious but very computationally intensive method for clipping
straight-line segments is to calculate all intersections of the straight-line
segment with the edge of the rectangular clipping area to be displayed.
Since it is necessary to determine the intersection points of the line segment
with the four-line segments that limit the clipping range, it is not sufficient
to know only the intersection points of the corresponding lines. If an
intersection point is outside a line segment, it does not matter. To calculate
the intersection points, the line segment with start point and
endpoint can be displayed as a convex combination of start and
endpoint:

(8.1)

The rectangle defines the clipping area left corner and the
upper right corner . As an example, the determination of a
possibly existing intersection point of the straight-line segment with the
lower edge of the rectangle is considered here. For this purpose, the line
segment (8.1) must be equated with the line segment for the lower side

(8.2)

Equating the x- and y-components of (8.1) and (8.2) results in a system of two
linear equations in the variables $$t_1$$ and $$t_2$$. If this linear system of equations has no unique
solution, the two straight lines run parallel, so that the lower edge of the
clipping area does not cause any problems during clipping. If the system of
equations has a unique solution, the following cases must be distinguished:
1. $$t_1 < 0$$ and $$t_2 < 0$$: The intersection point lies outside the line segment and in front of $$(x_{\min }, y_{\min })$$.
2. $$0 \le t_1 \le 1$$ and $$t_2 < 0$$: The line segment intersects the line defined by the lower edge before $$(x_{\min }, y_{\min })$$.
3. $$t_1 > 1$$ and $$t_2 < 0$$: The intersection point lies outside the line segment and in front of $$(x_{\min }, y_{\min })$$.
4. $$t_1 < 0$$ and $$0 \le t_2 \le 1$$: The straight line through the segment cuts the lower edge in front of $$(x_0, y_0)$$.
5. $$0 \le t_1 \le 1$$ and $$0 \le t_2 \le 1$$: The line segment cuts the bottom edge.
6. $$t_1 > 1$$ and $$0 \le t_2 \le 1$$: The straight line through the segment cuts the lower edge behind $$(x_1, y_1)$$.
7. $$t_1 < 0$$ and $$t_2 > 1$$: The intersection point lies outside the line segment and behind $$(x_{\max }, y_{\min })$$.
8. $$0 \le t_1 \le 1$$ and $$t_2 > 1$$: The line segment intersects the line defined by the lower edge behind $$(x_{\max }, y_{\min })$$.
9. $$t_1 > 1$$ and $$t_2 > 1$$: The intersection point lies outside the line segment and behind $$(x_{\max }, y_{\min })$$.
The same considerations can be made for the remaining rectangular
edges, and from this, it can be concluded whether or which part of the
straight-line is to be drawn. In the following, two methods for two-
dimensional line clipping are introduced, that is the Cohen–Sutherland
clipping algorithm and the Cyrus–Beck clipping algorithm.

8.1.1 Cohen–Sutherland Clipping Algorithm


The Cohen–Sutherland clipping algorithm (see, e.g. [2]) avoids the time-
consuming calculations of the line intersections. For this purpose, the plane
is divided into nine subareas, which are described by a 4-bit code. One
point $$P = (x, y)$$ is assigned the four-digit binary code $$b(P) = b_1 b_2 b_3 b_4$$
according to the following pattern (otherwise the respective bit is 0):
$$b_1 = 1$$ if $$x < x_{\min }$$ (the point lies to the left of the clipping area),
$$b_2 = 1$$ if $$x > x_{\max }$$ (the point lies to the right of the clipping area),
$$b_3 = 1$$ if $$y < y_{\min }$$ (the point lies below the clipping area),
$$b_4 = 1$$ if $$y > y_{\max }$$ (the point lies above the clipping area).

Figure 8.2 shows the nine regions and the binary codes assigned to them.
Fig. 8.2 Binary code for the nine areas of the Cohen–Sutherland Clipping

A line segment to be drawn with start point P and endpoint Q is


considered. The binary codes $$b(P)$$ and $$b(Q)$$ assigned to these two points can
be determined by simple numerical comparisons. Three cases can be
distinguished when drawing the line segment. Drawing the straight-line
segment is done piece by piece. The first two cases represent a termination
condition. In the third case, the straight-line segment is divided into pieces.

1st case: The bitwise combination of the binary codes assigned to the two points by the logical OR operator results in 0000. Then obviously, both points lie within the clipping area, the entire line segment is drawn, and the drawing of the segment is finished.
2nd case: The bitwise combination of the binary codes assigned to the two points by the logical AND operator does not result in 0000. This means that the two binary codes have a one in common in at least one position. The binary code is read from left to right. If there is a common one in the first position, the entire segment lies to the left of the clipping area; if there is a one in the second position, the entire segment lies to the right of the clipping area. Accordingly, a one in the third or fourth position means that the segment lies below or above the clipping area. In this case, the line segment does not need to be drawn, and the drawing of the line segment is finished.
3rd case: Neither the first nor the second case applies.
Then $$b(P)\,\text{OR}\,b(Q) \ne 0000$$ and $$b(P)\,\text{AND}\,b(Q) = 0000$$ must hold. Without loss of generality, it can be assumed that $$b(P) \ne 0000$$; otherwise, the points P and Q are swapped. In this case, one calculates the intersection point or points of the line segment with those boundary lines of the rectangle to which the ones in $$b(P)$$ belong. There exists either one or two intersection points. An example of this is shown using the region with the binary code 1000. The straight-line segment cannot intersect the lower and the upper bounding line of the clipping area at the same time. The lower line can only be cut if $$b(Q)$$ has a one in the third position, whereas the upper line can only be cut if $$b(Q)$$ has a one in the fourth position. But the third and fourth position of $$b(Q)$$ can never have the value one at the same time.
If there is only one intersection point of the line segment with the
boundary lines of the clipping area, the point P is replaced by this
intersection point. If there are two intersection points, one of them is
determined, and this intersection point replaces P.
The straight-line segment shortened in this way is treated accordingly
until one of the first two cases applies.

Fig. 8.3 Cohen–Sutherland Clipping


Figure 8.3 illustrates this procedure. The straight-line segment PQ is shortened in several intermediate steps to a segment that lies completely within the clipping area and is finally drawn.
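The region codes and the iterative shortening of the line segment can be implemented directly with bitwise operations. The following Java fragment is only a sketch illustrating the procedure under the bit convention described above (first bit: left, second: right, third: below, fourth: above); it is not taken from the example programs accompanying this book, and all names are chosen freely.

public class CohenSutherland {
    // Region code bits: b1 = LEFT, b2 = RIGHT, b3 = BELOW, b4 = ABOVE.
    static final int INSIDE = 0, LEFT = 8, RIGHT = 4, BELOW = 2, ABOVE = 1;
    final double xMin, yMin, xMax, yMax; // clipping rectangle

    CohenSutherland(double xMin, double yMin, double xMax, double yMax) {
        this.xMin = xMin; this.yMin = yMin; this.xMax = xMax; this.yMax = yMax;
    }

    int code(double x, double y) {
        int c = INSIDE;
        if (x < xMin) c |= LEFT;  else if (x > xMax) c |= RIGHT;
        if (y < yMin) c |= BELOW; else if (y > yMax) c |= ABOVE;
        return c;
    }

    /** Returns the clipped segment {x0, y0, x1, y1} or null if nothing is visible. */
    double[] clip(double x0, double y0, double x1, double y1) {
        int c0 = code(x0, y0), c1 = code(x1, y1);
        while (true) {
            if ((c0 | c1) == 0) return new double[]{x0, y0, x1, y1}; // case 1: completely inside
            if ((c0 & c1) != 0) return null;                         // case 2: completely outside
            // Case 3: shorten the segment at a boundary line belonging to an outside point.
            int cOut = (c0 != 0) ? c0 : c1;
            double x, y;
            if ((cOut & ABOVE) != 0)      { x = x0 + (x1 - x0) * (yMax - y0) / (y1 - y0); y = yMax; }
            else if ((cOut & BELOW) != 0) { x = x0 + (x1 - x0) * (yMin - y0) / (y1 - y0); y = yMin; }
            else if ((cOut & RIGHT) != 0) { y = y0 + (y1 - y0) * (xMax - x0) / (x1 - x0); x = xMax; }
            else                          { y = y0 + (y1 - y0) * (xMin - x0) / (x1 - x0); x = xMin; }
            if (cOut == c0) { x0 = x; y0 = y; c0 = code(x0, y0); }
            else            { x1 = x; y1 = y; c1 = code(x1, y1); }
        }
    }
}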

8.1.2 Cyrus–Beck Clipping Algorithm


The Cyrus–Beck clipping algorithm [1] determines the piece of a straight-line segment to be drawn using normal vectors of the clipping area and a parametric representation of the straight-line segment, which is given as a convex combination of start and endpoint as in Eq. (8.1). If $$\textbf{p}_0$$ is the start point and $$\textbf{p}_1$$ is the endpoint of the line segment, then the line segment corresponds exactly to the points
$$ \textbf{p}(t) \; = \; \textbf{p}_0 + (\textbf{p}_1-\textbf{p}_0)\cdot t \qquad (t \in [0,1]). $$
A normal vector is determined for each of the four edges bounding the clipping area. The normal vector is chosen so that it points outwards. For the left edge, therefore, the normal vector $$(-1,0)^\top $$ is used. For the lower edge $$(0,-1)^\top $$, for the upper edge $$(0,1)^\top $$ and for the right edge $$(1,0)^\top $$ are used.
Figure 8.4 shows an example of the Cyrus–Beck clipping procedure for the left boundary edge of the clipping area. In addition to the respective normal vector $$\textbf{n}$$, a point $$\textbf{p}_E$$ on the corresponding edge of the clipping area is also selected. For the left edge, this point is marked $$\textbf{p}_E$$ (E for “edge”) in Fig. 8.4. This point can be chosen anywhere on the edge; for example, one of the corner points of the clipping area located on it could be selected.
Fig. 8.4 Cyrus–Beck clipping

The connection vector of the point $$\textbf{p}_E$$ with a point $$\textbf{p}(t)$$ on the line defined by the points $$\textbf{p}_0$$ and $$\textbf{p}_1$$ can be expressed in the form
$$ \textbf{p}(t) - \textbf{p}_E \; = \; \textbf{p}_0 + (\textbf{p}_1-\textbf{p}_0)\cdot t - \textbf{p}_E. $$
To calculate the intersection of the straight line with the edge under consideration, the following must apply:
$$ \textbf{n}^\top \cdot \left( \textbf{p}_0 + (\textbf{p}_1-\textbf{p}_0)\cdot t - \textbf{p}_E \right) \; = \; 0. $$
This equation states that the connection vector of $$\textbf{p}_E$$ with the intersection point must be orthogonal to the normal vector $$\textbf{n}$$, since it is parallel to the left edge of the clipping area. Thus, the following applies to the parameter t from which the intersection point results:
$$ t \; = \; \frac{\textbf{n}^\top \cdot (\textbf{p}_E - \textbf{p}_0)}{\textbf{n}^\top \cdot (\textbf{p}_1 - \textbf{p}_0)}. $$
(8.3)
The denominator can only become 0 if either $$\textbf{p}_0 = \textbf{p}_1$$ applies, i.e., the straight line consists of only one point, or if the straight line is perpendicular to the normal vector $$\textbf{n}$$, i.e., parallel to the edge under consideration; in this case, there can be no point of intersection.
The value t is calculated for all four edges of the clipping rectangle to determine whether the straight-line segment intersects the corresponding edges. If a value t lies outside the interval [0, 1], the line segment does not intersect the corresponding rectangle edge. Note that, due to the simple form of the normal vectors, the calculation of the scalar products in the numerator and denominator of Eq. (8.3) reduces to selecting the x- or y-component of the respective difference vector, if necessary with a change of sign.
The remaining potential intersection points with the rectangle edges of the clipping area are characterised as “potentially exiting” (PA) and “potentially entering” (PE). These potential intersection points all lie within the line segment; however, they need not lie on a rectangle edge itself but may also lie on its extension outside the clipping area. Figure 8.5 illustrates this situation.

Fig. 8.5 Potential intersection points with the clipping rectangle between points and

Mathematically, the decision between PA and PE can be made based on the angle between the line segment $$\overline{\textbf{p}_0\textbf{p}_1}$$ and the normal vector $$\textbf{n}$$ belonging to the corresponding rectangle edge:
If the angle is greater than $$90^\circ $$, the case is PE.
If the angle is smaller than $$90^\circ $$, the case is PA.
For this decision, only the sign of the scalar product
$$ \textbf{n}^\top \cdot (\textbf{p}_1-\textbf{p}_0)=\parallel \textbf{n} \parallel \cdot \parallel \textbf{p}_1-\textbf{p}_0\parallel \cdot \cos (\alpha ) $$
must be determined (where $$\alpha $$ is the angle between the two vectors $$\textbf{n}$$ and $$\textbf{p}_1-\textbf{p}_0$$). The right side of the equation becomes negative only when the cosine becomes negative, because the lengths of the vectors are always positive. A negative sign implies the case PE, and a positive sign implies the case PA. Since, in each case, one component of the normal vectors of the rectangle edges is zero, the sign of the scalar product is obtained by considering the sign of the corresponding component of the vector $$(\textbf{p}_1-\textbf{p}_0)$$, if necessary with a change of sign.
To determine the straight section lying in the clipping rectangle, the
largest value $$t_E$$ must be calculated that belongs to a PE point
and the smallest value $$t_A$$ that belongs to a PA point. If
$$t_E \le t_A$$ is valid, exactly the part of the line between points
$$\textbf{p}_0 + (\textbf{p}_1-\textbf{p}_0)t_E$$ and
$$\textbf{p}_0 + (\textbf{p}_1-\textbf{p}_0)t_A$$ must be drawn in
the rectangle. In the case $$t_E > t_A$$, the line segment lies
outside the rectangle.
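The parameter computation of the Cyrus–Beck algorithm for an axis-aligned clipping rectangle can be sketched in Java as follows. This is only a compact illustration of Eq. (8.3) and the PE/PA classification, not code from the example programs accompanying this book; all names are chosen freely. The visible part of the segment runs from p0 + (p1 − p0)·tE to p0 + (p1 − p0)·tA.

/** Cyrus–Beck clipping of the segment p0 -> p1 against the rectangle
 *  [xMin, xMax] x [yMin, yMax]. Returns {tE, tA} or null if nothing is visible. */
static double[] clipCyrusBeck(double[] p0, double[] p1,
                              double xMin, double yMin, double xMax, double yMax) {
    // Outward-pointing normals and one point pE per edge (left, right, bottom, top).
    double[][] n  = { {-1, 0}, {1, 0}, {0, -1}, {0, 1} };
    double[][] pE = { {xMin, yMin}, {xMax, yMin}, {xMin, yMin}, {xMin, yMax} };
    double[] d = { p1[0] - p0[0], p1[1] - p0[1] };   // direction p1 - p0
    double tE = 0.0, tA = 1.0;                       // largest entering, smallest exiting parameter
    for (int i = 0; i < 4; i++) {
        double denom = n[i][0] * d[0] + n[i][1] * d[1];                                // n^T (p1 - p0)
        double num = n[i][0] * (pE[i][0] - p0[0]) + n[i][1] * (pE[i][1] - p0[1]);      // n^T (pE - p0)
        if (denom == 0.0) {                 // segment parallel to this edge
            if (num < 0.0) return null;     // and completely on its outer side
            continue;
        }
        double t = num / denom;             // Eq. (8.3)
        if (denom < 0.0) tE = Math.max(tE, t);  // angle > 90 degrees: potentially entering (PE)
        else             tA = Math.min(tA, t);  // angle < 90 degrees: potentially exiting (PA)
    }
    return (tE <= tA) ? new double[]{ tE, tA } : null;
}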

8.2 Image-Space and Object-Space Methods


Clipping determines which objects of a three-dimensional world modelled
in the computer are at least partially within the clipping volume. These
objects are candidates for the objects to be displayed in the scene. Usually,
not all objects will be visible, since, for example, objects further back will
be covered by those further forward. Also, the backsides of the objects may
not be visible. A simple algorithm for drawing the visible objects in a scene
could be as follows. Think of the pixel grid drawn on the projection plane
and place a beam through each pixel in the projection direction, i.e., parallel
to the z-axis. A pixel is to be coloured in the same way as the object on
which its corresponding ray first hits. This technique is called the image-
space method because it uses the pixel structure of the image to be
constructed. For p pixels and n objects, a calculation effort of
$$n \cdot p$$ steps is needed. At a typical screen resolution, one can assume about one million pixels. The number of objects can vary greatly depending on the scene. The objects are the triangles that are used to approximate the surfaces of the more complex objects (see Chap. 7). For this reason, hundreds of millions of objects, i.e., triangles, or more are not uncommon in complex scenes.
In contrast to most image-space methods, object-space methods do not
start from the fragments but determine directly for the objects, i.e., the
triangles, which objects at least partially obscure others. Only the visible
parts of an object are then projected. Object-space methods must test the objects pairwise for mutual occlusion, so that for n objects a computational effort of about $$n^2$$ steps, i.e., quadratic effort, is required. In general, the number of objects is smaller than the number of pixels, so that $$n^2 \ll n\cdot p$$ follows. This means that the object-space procedures must perform far fewer steps than the image-space procedures; however, the individual steps of the object-space procedures are substantially more complex. One advantage of the object-space methods is that they can be carried out independently of the image resolution, because they do not access the pixels for the visibility considerations. Only during the subsequent projection (see Chap. 4) of the visible objects does the image resolution play a role.
In the following, two object-space methods are presented: backface culling and partitioning methods. Afterwards, three image-space procedures are explained. The depth buffer algorithm and its variants play the most important role in practice. Additionally, the scan-line method and the priority algorithms are discussed. Ray tracing, a further image-space method, is presented in Sect. 9.9.

8.2.1 Backface Culling


Regardless of which visibility method is used, the number of objects, i.e., triangles or, more generally, plane polygons, should first be reduced by backface culling. Back faces are surfaces that point away from the viewer and therefore cannot be seen. These surfaces can be ignored in the visibility considerations and are, therefore, omitted from all further calculations.
The orientation of the surfaces described in Sect. 4.2 allows the normal vectors of the surfaces to be aligned in such a way that they always point to the outside of an object. If a normal vector oriented in this way points away from the observer, the observer looks at the back of the surface, and it does not have to be taken into account in the visibility considerations. With the parallel projection onto the xy-plane considered here, the projection direction is parallel to the z-axis. In this case, the viewer looks at the back of a surface when the normal vector of the surface forms an obtuse angle with the z-axis, i.e., when it points approximately in the opposite direction of the z-axis.

Fig. 8.6 A front surface whose normal vector forms an acute angle with the z-axis and a back surface whose normal vector forms an obtuse angle, so that this surface can be neglected in the representation

Figure 8.6 illustrates this situation using two sides of a tetrahedron. The
two parallel vectors correspond to the projection direction and thus point in
the direction of the z-axis. The other two vectors are the normal vectors to
the front or back of the tetrahedron. One can see that the normal vector of
the visible front side is at an acute angle to the projection direction. In
contrast, the normal vector of the invisible backside forms an obtuse angle
with the projection direction.
If there is an obtuse angle between the normal vector of a surface and the projection direction, this means that the angle is greater than $$90^\circ $$. If one calculates the scalar product of the normal vector $$\textbf{n} = (n_x,n_y,n_z)^\top $$ with the projection direction vector, i.e., with the unit vector $$\textbf{e}_z = (0,0,1)^\top $$ in the z-direction, the result is
$$\begin{aligned} \textbf{e}_z^\top \cdot \textbf{n} \; = \; \cos (\varphi ) \cdot \parallel \textbf{e}_z \parallel \cdot \parallel \textbf{n} \parallel \end{aligned}$$
(8.4)
where $$\varphi $$ is the angle between the two vectors, and $$\parallel \textbf{v} \parallel $$ is the length of the vector $$\textbf{v}$$. The lengths of the two vectors on the right side of the equation are always positive. Thus, the right side becomes negative exactly when $$\cos (\varphi )<0$$, i.e., $$\varphi >90^\circ $$ applies. The sign of the scalar product (8.4) thus indicates whether the side with the normal vector $$\textbf{n}$$ is to be considered in the visibility considerations. All faces for which the scalar product is negative can be neglected. Since the scalar product is calculated here with a unit vector, it reduces to
$$ \textbf{e}_z^\top \cdot \textbf{n} \; = \; (0,0,1) \cdot \left(
\begin{array}{c} n_x \\ n_y \\ n_z \end{array} \right) \; = \; n_z. $$
To determine the sign of the scalar product, therefore, neither
multiplications nor additions have to be carried out; only the sign of the z-
component of the normal vector has to be considered.
Thus, in the case of parallel projection onto the xy-plane, backface culling consists of ignoring all plane polygons that have a normal vector with a negative z-component. One can easily see that a good place for culling is in the normalised device coordinates (NDC) of the viewing pipeline. In NDC, a parallel projection of the x- and y-values into the window space coordinate system is performed, while the corresponding z-coordinates are used for further considerations in the rasteriser (see Chap. 7). The results of Sect. 5.11 show that each normal vector
$$\boldsymbol{n}$$ must be multiplied by the transposed inverse of
the total matrix M from model- to NDC, i.e.,
$$\begin{aligned} (M^{-1})^{T}\cdot \boldsymbol{n} \end{aligned}$$
to obtain the corresponding normal vector
$$\boldsymbol{n}_{\text {NDC}}$$ in NDC. According to the
results of this section, it is sufficient to consider the z-component of
$$\boldsymbol{n}_{\text {NDC}}$$ concerning the sign. If this is
negative, the corresponding side of the primitive (usually triangle) is
removed, because it is not visible.
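For the parallel projection considered here, the test therefore reduces to a sign check of the z-component of the normal vector, computed in the coordinate system in which the projection is performed (e.g., in NDC). The following self-contained Java sketch, which is not taken from the example programs of this book, determines the normal of a triangle with counterclockwise vertices via the cross product and performs this sign test; all names are chosen freely.

/** Normal of a triangle with counterclockwise vertices a, b, c (right-hand rule). */
static double[] faceNormal(double[] a, double[] b, double[] c) {
    double[] u = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
    double[] v = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
    return new double[] {               // cross product u x v
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0]
    };
}

/** Following Sect. 8.2.1: for the parallel projection onto the xy-plane, a face is
 *  neglected if the z-component of its normal vector, i.e. e_z^T n = n_z, is negative. */
static boolean isBackFace(double[] normal) {
    return normal[2] < 0.0;
}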

8.2.2 Partitioning Methods


Backface culling reduces the computational effort for the visibility considerations and all subsequent calculations. Spatial partitioning also serves to reduce the effort further. For this purpose, the clipping volume is divided into disjoint subregions, e.g., into eight sub-cuboids of equal size. The objects are assigned to the sub-cuboids with which they have a non-empty intersection. In this way, an object that extends beyond the boundary of a sub-cuboid is assigned to several sub-cuboids.
When using an object-space method (which is independent of the resolution of the image area), only the objects within a sub-cuboid need to be compared to decide which objects are visible. The objects of the rear sub-cuboids are projected first and then those of the front ones, so that front objects overwrite rear objects where necessary. If the clipping volume is divided into k sub-cuboids, the computational effort for n objects is reduced from $$n^2$$ to $$k\cdot \left( \frac{n}{k}\right) ^2 = \frac{n^2}{k}$$. However, this only applies if each object lies in exactly one sub-cuboid and the objects are distributed evenly over the sub-cuboids, so that $$\frac{n}{k}$$ objects need to be considered in each sub-cuboid. For a large number of sub-cuboids, this assumption will not even come close to being true.

Fig. 8.7 Split of clipping volume for image-space (left) and object-space methods

For image-space methods (which depend on the resolution of the image area), it may be advantageous to partition the clipping volume as shown in Fig. 8.7 on the left. For this purpose, the projection plane is divided into rectangles, and each of these rectangles induces a cuboid within the clipping volume. Thus, for a pixel, only those objects that are located in the same sub-cuboid as the pixel need to be considered for the visibility considerations.
With recursive division algorithms, the examined area is recursively broken down into smaller and smaller regions until the visibility decision can be made in the subregions. The image resolution limits the maximum recursion depth. With the area subdivision methods, the projection surface is divided; they are, therefore, image-space methods. Octree methods (see Chap. 4) partition the clipping volume and belong to the object-space methods.
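As an illustration of the assignment of objects to sub-cuboids described above, the following Java sketch uses a uniform grid over the clipping volume and assigns each object, represented by its axis-aligned bounding box, to every grid cell it intersects. This is a simplified sketch with freely chosen names, not code from the example programs of this book.

import java.util.ArrayList;
import java.util.List;

/** Uniform grid over the clipping volume [min, max]^3 with k cells per axis. */
class UniformGrid {
    final double[] min, max;
    final int k;
    final List<List<Integer>> cells; // per cell: indices of the assigned objects

    UniformGrid(double[] min, double[] max, int k) {
        this.min = min; this.max = max; this.k = k;
        this.cells = new ArrayList<>();
        for (int i = 0; i < k * k * k; i++) cells.add(new ArrayList<>());
    }

    private int cellIndex(double value, int axis) {
        double rel = (value - min[axis]) / (max[axis] - min[axis]);
        return Math.min(k - 1, Math.max(0, (int) (rel * k)));
    }

    /** Assign object 'id' with bounding box [lo, hi] to all cells it intersects. */
    void insert(int id, double[] lo, double[] hi) {
        for (int x = cellIndex(lo[0], 0); x <= cellIndex(hi[0], 0); x++)
            for (int y = cellIndex(lo[1], 1); y <= cellIndex(hi[1], 1); y++)
                for (int z = cellIndex(lo[2], 2); z <= cellIndex(hi[2], 2); z++)
                    cells.get((x * k + y) * k + z).add(id);
    }
}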

8.2.3 The Depth Buffer Algorithm


The depth buffer or z-buffer algorithm is the most widely used method for determining which objects are visible in a scene. It involves the conversion of a set of fragments into a pixel to be displayed (see Chap. 7). The depth buffer algorithm is, therefore, an image-space method that works according to the following principle: A colour buffer is used in which the colour for each pixel is stored. The objects are projected in an arbitrary order and entered into the colour buffer. With this strategy alone, the object that happened to be projected last would be visible, but it would usually not be the one furthest to the front. For this reason, before entering an object into the colour buffer, one checks whether there is already an object at this position that is closer to the viewer. In that case, this object must not be overwritten by the object further back.

Fig. 8.8 How the depth buffer algorithm works

To determine whether an object further to the front has already been entered in the colour buffer, a second buffer is used, the depth buffer or z-buffer. This buffer is initialised with the z-coordinate of the rear clipping plane. Since again only a parallel projection onto the xy-plane is considered, no object lying in the clipping volume (see Sect. 5.7) can have a larger z-coordinate. When an object is projected, not only the colour values of the pixels are entered in the colour buffer, but also the corresponding z-values in the depth buffer. Before entering a value in the colour buffer and the depth buffer, it is checked whether the depth buffer already contains a smaller value. If this is the case, an object further to the front has already been entered at the corresponding position, so that the object currently being considered must not overwrite the previous values in the colour and depth buffer.
The operation of the depth buffer algorithm is illustrated in Fig. 8.8.
There are two objects to be projected, a rectangle, and an ellipse. In reality,
both objects are tessellated, and the procedure is carried out with the
fragments of the corresponding triangles. For the sake of clarity, the process
will be explained below using the two objects as examples. The viewer
looks at the scene from below. The colour buffer is initialised with the background colour, the depth buffer with the z-value of the rear clipping plane. For better clarity, these entries in the depth buffer have been left empty in the figure; normally, they contain the largest value. The rectangle is projected first. Since it lies within the clipping volume, its z-values are smaller than the z-value of the rear clipping plane, which has been stored in the depth buffer everywhere up to now, so the corresponding colour values of the projection are written into the colour buffer and the z-values into the depth buffer. During the subsequent projection of the ellipse, it is found that all z-values already stored at the corresponding positions are larger than those of the ellipse, so that the ellipse is entered into the colour and depth buffer and thus partially overwrites the already projected rectangle.
If the ellipse had been projected first and then the rectangle, one would notice that two of the pixels of the rectangle to be projected already have a smaller z-value in the depth buffer, so they must not be overwritten by the rectangle. With this projection sequence, the same correct colour buffer would result in the end, which is shown in the lower part of Fig. 8.8.
Note that the z-values of an object are usually not the same and that for
each projected fragment of an object, it must be decided individually
whether it should be entered into the colour and depth buffer.
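The core of the depth buffer algorithm is this per-fragment comparison. The following Java sketch stores the colour and depth buffer as simple arrays and only accepts a fragment if its z-value is smaller than the value stored so far. In a real OpenGL pipeline, this test is performed by the graphics hardware; the sketch only illustrates the principle, and all names are chosen freely.

class DepthBuffer {
    final int width, height;
    final int[] colourBuffer;     // one packed RGB value per pixel
    final double[] depthBuffer;   // one z-value per pixel

    DepthBuffer(int width, int height, int backgroundColour, double zFar) {
        this.width = width; this.height = height;
        colourBuffer = new int[width * height];
        depthBuffer = new double[width * height];
        for (int i = 0; i < width * height; i++) {
            colourBuffer[i] = backgroundColour; // initialise with the background colour
            depthBuffer[i] = zFar;              // initialise with z of the rear clipping plane
        }
    }

    /** Enter a fragment (x, y, z, colour); smaller z means closer to the viewer. */
    void writeFragment(int x, int y, double z, int colour) {
        int index = y * width + x;
        if (z < depthBuffer[index]) {   // fragment lies in front of what is stored so far
            depthBuffer[index] = z;
            colourBuffer[index] = colour;
        }
    }
}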
The depth buffer algorithm can also be used very efficiently for
animations where the viewer’s point of view does not change. The objects
that do not move in the animation form the background of the scene. They
only need to be entered once in the colour and depth buffer, which are then
reinitialised with these values for the calculation of each image. Only the
moving objects must be entered again. If a moving object moves behind a
static object in the scene, this is determined by the z-value in the depth
buffer, and the static object correctly hides the moving object. Also, the
depth buffer algorithm can be applied to transparent objects (transparency is
explained in Chap. 6).
To enter a plane polygon into the colour and depth buffer, a scan-line method is used in which the pixels in the projection plane are scanned line by line. The plane induced by the polygon is described by the implicit equation
$$\begin{aligned} A\cdot x + B\cdot y + C \cdot z + D \; = \; 0. \end{aligned}$$
(8.5)
The z-value within a scan line can be expressed in the form
$$ z_{\tiny \text{ new }} \; = \; z_{\tiny \text{ old }} + \varDelta z $$
because the z-values over a plane change linearly along the scan line. For the pixel (x, y), let the corresponding z-coordinate of the polygon be $$z_{\text{ old }}$$. For the z-coordinate $$z_{\text{ new }}$$ of the pixel $$(x+1,y)$$, the plane Eq. (8.5) and the assumption that the point $$(x,y,z_{\text{ old }})$$ lies on the plane yield
$$\begin{aligned} 0= & {} A \cdot (x+1) + B \cdot y + C \cdot z_{\text{ new }} + D\\[1mm]= & {} A \cdot (x+1) + B \cdot y + C \cdot (z_{\text{ old }} + \varDelta z) + D\\[1mm]= & {} \underbrace{A \cdot x + B \cdot y + C \cdot z_{\text{ old }} + D}_{=\, 0} + A + C \cdot \varDelta z \\[1mm]= & {} A + C \cdot \varDelta z. \end{aligned}$$
From this, one obtains the required change of the z-coordinate along the scan line
$$ \varDelta z \; = \; - \frac{A}{C}. $$
This means that not every z-coordinate of the fragments has to be calculated from the plane equation. Within each scan line, the next z-coordinate is determined by adding $$\varDelta z$$ to the current z-value. Only the first z-coordinate of the scan line has to be interpolated, which reduces the overall calculation effort. However, this introduces data dependencies, which reduce the potential for parallelism.
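A minimal Java sketch of this incremental computation, assuming the plane coefficients A, B, C, D with C ≠ 0 are known (names are chosen freely, not taken from the example programs):

/** z-values of one scan line y for the pixels xStart..xEnd of a polygon
 *  lying in the plane A*x + B*y + C*z + D = 0 (with C != 0). */
static double[] scanLineDepths(double a, double b, double c, double d,
                               int xStart, int xEnd, int y) {
    double[] z = new double[xEnd - xStart + 1];
    double deltaZ = -a / c;                          // change of z per step in x
    z[0] = -(a * xStart + b * y + d) / c;            // only the first z-value is computed exactly
    for (int i = 1; i < z.length; i++) {
        z[i] = z[i - 1] + deltaZ;                    // all further values incrementally
    }
    return z;
}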

8.2.4 Scan-Line Algorithms


With the depth buffer algorithm, the projection of individual polygons is
carried out using scan-line methods. Alternatively, the edges of all objects
can also be projected to determine which objects are to be displayed using a
scan-line method. With this scan-line method for determining visibility, it is
assumed that the objects to be projected are plane polygons. The coordinate
axes of the display window—or the corresponding rectangular area in the
projection plane—are denoted by u and v (not to be confused with the
texture coordinates).
The scan-line method works with three tables. The edge table contains all edges that do not run horizontally and has the following structure:

$$v_{\text{ min }}$$   $$u(v_{\text{ min }})$$   $$v_{\text{ max }}$$   $$\varDelta u$$   Polygon numbers

$$v_{\text{ min }}$$ is the smallest v-value of the edge, $$u(v_{\text{ min }})$$ the corresponding u-value at the $$v_{\text{ min }}$$-value of the edge. Accordingly, $$v_{\text{ max }}$$ is the largest v-value of the edge. $$\varDelta u$$ is the slope of the edge. The list of polygons to which the edge belongs is entered in the column Polygon numbers. The edges are sorted in ascending order according to the values $$v_{\text{ min }}$$ and, in case of equal values, in ascending order according to $$u(v_{\text{ min }})$$.
The second table in the scan-line method is the polygon table. It
contains information about the polygons in the following form:

Polygon number A B C D Colour In-flag

The polygon number is used to identify the polygon. The coefficients A, B, C and D define the implicit plane equation belonging to the polygon
$$ Ax+By+Cz+D \; = \; 0. $$
The colour information for the polygon is entered in the Colour column. The requirement that a unique colour can be assigned to the polygon considerably limits the applicability of the scan-line method. It would be better, but more complex, to read the colour information point by point directly from the polygon. The in-flag indicates whether the currently considered position on the scan line is inside or outside the polygon.
The last table contains the list of all active edges. The active edges are those that intersect the currently considered scan line. They are sorted in ascending order by the u-components of the intersection points. The length of the table of active edges changes from scan line to scan line during the calculation. The lengths and the entries of the other two tables remain unchanged during the entire calculation, except for the in-flags.
In the example shown in Fig. 8.9, the following active edges result for
the scan lines $$v_1, v_2, v_3, v_4$$:

Fig. 8.9 Determination of the active edges for the scan lines $$v_1, v_2, v_3, v_4$$

$$v_1$$: $$P_3 P_1$$, $$P_1 P_2$$


$$v_2$$ : $$P_3 P_1$$, $$P_1 P_2$$, $$P_6 P_4$$, $$P_5 P_4$$
$$v_3$$ : $$P_3 P_1$$, $$P_6 P_5$$, $$P_3 P_2$$, $$P_5 P_4$$
$$v_4$$ : $$P_6 P_5$$, $$P_5 P_4$$.

When considering a scan line, after determining the active edges, all in-flags are first set to zero. Afterwards, the scan line is traversed using the odd-parity rule (see Sect. 7.5). If an active edge is crossed, the in-flags of the polygons to which the edge belongs are updated, since the scan line enters or exits such a polygon when crossing the edge. At each pixel, the visible polygon must be determined among the active polygons with in-flag = 1. For this purpose, only the z-value of each such polygon must be determined from the corresponding plane equation. The polygon with the smallest z-value is visible at this position. The corresponding z-value of a polygon can be computed incrementally as with the depth buffer algorithm.
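The three tables can be represented, for example, by simple data structures such as the following Java sketch (freely chosen names, not classes from the example programs of this book):

import java.util.List;

/** Entry of the edge table: edges that do not run horizontally. */
record EdgeEntry(double vMin, double uAtVMin, double vMax, double deltaU,
                 List<Integer> polygonNumbers) { }

/** Entry of the polygon table: plane coefficients, colour and in-flag. */
class PolygonEntry {
    int polygonNumber;
    double a, b, c, d;      // plane equation A*x + B*y + C*z + D = 0
    int colour;             // packed RGB colour of the polygon
    boolean inFlag;         // is the current scan-line position inside the polygon?
}

// The table of active edges is simply the list of edges intersecting the current
// scan line, sorted by the u-coordinates of the intersection points.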

8.2.5 Priority Algorithms


With the depth buffer algorithm, the order in which the objects are projected
is not essential. By exploiting the information in the z-buffer, the
overwriting of objects further ahead by objects further back is prevented.
The goal of priority algorithms is to select the projection order of the
objects so that objects further forward are projected only after all objects
behind them have already been projected. This projection sequence means
that the z-buffer can be dispensed with. Also, the determination of the order
is independent of the resolution, so that priority algorithms belong to the
class of object-space methods.

Fig. 8.10 No overlapping of x- or y-coordinates

In the following, it is to be decided in which order two objects, i.e., two plane polygons P and Q, are to be projected, so that a polygon located further back does not overwrite one in front of it. If the z-coordinates of the two polygons do not overlap, the polygon with the larger z-coordinates lies further back and is projected first. If the z-coordinates overlap, further investigations must be made. If the x- or y-coordinates of P and Q do not overlap, the order in which they are projected does not matter because, for the viewer, they lie next to each other or on top of each other, as shown in Fig. 8.10.
To check whether the polygons overlap in a coordinate, only the values
of the corresponding vertex coordinates of one polygon need to be
compared with those of the other. If all values of one polygon are smaller
than those of the other or vice versa, an overlap in the corresponding
coordinate can be excluded.
If overlaps occur in all coordinates, the question is whether one polygon
is completely behind or in front of the plane of the other. These two
possibilities are shown in Fig. 8.11. On the left side of the figure, the
polygon that lies behind the plane of the other one should be projected first.
On the right, the polygon that lies entirely in front of the plane of the other
would have to be drawn last.

Fig. 8.11 Is one polygon completely behind or in front of the plane of the other?

These two cases can be handled by using a suitably oriented normal vector to the plane. An example is shown here for the situation on the right in Fig. 8.11. The normal vector to the plane induced by the one polygon is used. The normal vector should be oriented towards the observer; otherwise, the polygon would already have been deleted from the list of polygons to be considered during backface culling (see Sect. 8.2.1). If any point in this plane is selected, the connection vectors from this point to the corner points of the other polygon must all form an angle smaller than $$90^\circ $$ with the normal vector. This is equivalent to the scalar product of each of these vectors with the normal vector being positive. If this is true, the one polygon lies entirely in front of the plane induced by the other polygon. Figure 8.12 illustrates this situation: The angles that the normal vector makes with the vectors to the three corners of the polygon are all less than $$90^\circ $$.

Fig. 8.12 Determining whether one polygon is entirely in front of the plane of the other

Fig. 8.13 A case where there is no correct projection order of the objects
Unfortunately, these criteria are not sufficient to construct a correct
projection order of the objects. There are cases like in Fig. 8.13 where it is
impossible to project the objects one after the other so that in the end, a
correct representation is obtained. If none of the above criteria apply,
priority algorithms require a finer subdivision of the polygons involved to
allow a correct projection order. Even if these cases are rarely encountered,
the determination of a suitable finer subdivision of the polygons can be very
complicated.
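The test of whether a polygon lies entirely in front of the plane of the other polygon thus reduces to sign checks of scalar products, as the following Java sketch illustrates (freely chosen names, not code from the example programs of this book):

/** Tests whether all vertices of 'polygon' lie strictly in front of the plane through
 *  'pointOnPlane' with the (observer-oriented) normal vector 'normal'.
 *  Each vertex is a double[3]. */
static boolean entirelyInFrontOfPlane(double[][] polygon,
                                      double[] pointOnPlane, double[] normal) {
    for (double[] vertex : polygon) {
        // connection vector from the point in the plane to the vertex
        double[] v = { vertex[0] - pointOnPlane[0],
                       vertex[1] - pointOnPlane[1],
                       vertex[2] - pointOnPlane[2] };
        double dot = normal[0] * v[0] + normal[1] * v[1] + normal[2] * v[2];
        if (dot <= 0.0) {
            return false; // angle of at least 90 degrees: vertex not strictly in front of the plane
        }
    }
    return true;
}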

8.3 Exercises
Exercise 8.1 In Fig. 8.13, if we assume triangles rather than arbitrary
plane polygons, the projection and application of the odd-parity rule is not
necessary to determine whether the intersection is lying within the triangle.
Specify a procedure to make the decision for triangles easier.

Exercise 8.2 Specify a procedure to test whether the left case in Fig. 8.11
is present. Proceed in a similar way as in Fig. 8.12.

Exercise 8.3 Let the following clipping range be given with


$$(x_{\text {min}}, y_{\text {min}}) = (1, 2)$$ and
$$(x_{\text {max}}, y_{\text {max}}) = (6, 4)$$. In addition, let a
straight-line segment be given, which leads from the point
$$P_0 = (0, 0)$$ to the point $$P_1 = (5, 3)$$. Sketch the described
situation and perform a clipping of the straight-line segment using
(a)
the Cohen–Sutherland algorithm and
(b)
the Cyrus–Beck algorithm.

Exercise 8.4 A backface culling is to be carried out in a scene:


(a) Consider the triangle defined by the homogeneous vertices $$(1, 0, 1, 1)^{T}$$, $$(2, 1, 0, 1)^{T}$$, $$(2, 2, 1, 1)^{T}$$. To perform culling, you need to calculate the surface's normal vector (assuming the surface is flat). What is the normal of the surface, taking into account the constraint that the vertices are arranged counterclockwise to allow a clear meaning of front and back? The viewer's position should be on the negative z-axis, and the surface should be visible from there.
(b)
For another scene area, the normal $$(-3, 1,-2, 0)^{T}$$ was
calculated. The eye point is located at $$(0, 10, 1, 1)^{T}$$.
Determine by calculation whether the corner point of the surface
$$(-1, 1,-1, 1)^{T}$$ is visible or facing the viewer, taking into
account the position of the eye point.

References
1. M. Cyrus and J. Beck. “Generalized Two- and Three-Dimensional Clipping”. In: Computers and
Graphics 3.1 (1978), pp. 23–28.

2. J. D. Foley, A. van Dam, S. K. Feiner and J. F. Hughes. Computer Graphics: Principles and
Practice. 2nd edition. Boston: Addison-Wesley, 1996.

9. Lighting Models

Section 5.8 discusses projections that are needed to represent a three-


dimensional scene. Projections are considered special mappings of three-
dimensional space onto a plane. In this sense, projections only describe
where a point or object is to be drawn on the projection plane. The visibility
considerations in Chap. 8 are only about ensuring that front objects are not
covered by rear ones when projecting objects. Just the information where an
object is to be drawn on the projection plane, i.e., which fragment covers
this object, is not sufficient for a realistic representation. Figure 9.1 shows
the projections of a grey sphere and a grey cube in two variants each. In the
first representation, the projection of the sphere or cube is coloured in the
grey basic colour of the objects. This results in uniformly coloured flat
structures without three-dimensional information. The projection of the
sphere is a uniformly grey circle. The projection of the cube results in a
grey filled hexagon.
Only the consideration of illumination and reflection, which leads to
different shades of the object surfaces and thus also of the projections,
results in the three-dimensional impression of the second sphere and the
cube on the right in the figure. Figure 9.2 shows this situation using the
Utah teapot as an example.
The colour or brightness modification of an object surface due to
lighting effects is called shading. This chapter presents the basic principles
and techniques required for shading. It should be noted in this context that
shaders (small programmable units that run on the GPU, see Chap. 2) are
different from shading.
According to theory, the calculations for shading explained in the
following sections would have to be carried out separately for each
wavelength of light. Computer graphics algorithms are usually limited to
the determination of the three RGB values for red, green and blue (colour
representations are discussed in Chap. 6). In the meantime, much more
complex illumination models exist that are physically-based and even
simulate illumination at the photon level. A very good overview is given in
[6, p. 215ff]. In this chapter, the basics are considered.

9.1 Light Sources of Local Illumination


In addition to the objects and the viewer, a three-dimensional scene also includes information about the lighting, which can come from different light sources. Each type of lighting has a light colour. It should be noted that colour is not a physical property of light; rather, the wavelength-dependent energy distribution is mapped onto colours, and linear colours are used for the calculations. Usually, white light is used, but deviations are possible, for example, with coloured lamps or the reddish light of a sunset. The colour of the light from a light source is defined by an RGB value.
Fig. 9.1 Objects without and with lighting effects

Fig. 9.2 Utah Teapot without and with lighting effects

The aim is to create a very efficient and yet realistic image of the lighting conditions. In the following, therefore, not the physically exact illumination is modelled, but a very good and efficient approximation. In addition, the reflections of light between adjacent surfaces are neglected, so that only light emitted by light sources is distributed. This means that only the intensity of the light incident on an object and the reflection behaviour of the object in question towards the observer are evaluated. Thus, indirect illumination of the objects by each other is neglected, as are the visibility of the light source and the generation of corresponding shadows.
This is called local lighting and is the standard lighting model for
computer graphics. More computationally intensive techniques, which take
into account the influence of neighbouring surfaces and thus act globally,
are dealt with in Sect. 9.8 (radiosity) and Sect. 9.9 (ray tracing). This means
that the sum of all reflections, direct and indirect, is perceived as lighting.
In addition, these computationally intensive global lighting techniques can
be calculated in advance and stored in a raster image similar to a texture. It
should be noted that this applies to diffusely reflecting surfaces. This is not
possible for specularly reflecting surfaces since the illumination depends on
the observer’s position. Similarly, with static light sources, the shadow cast
can be calculated in advance and saved in the so-called shadow maps, again
similar to a texture. With dynamic light sources, the shadow maps are
recalculated. In the local calculation process, these textures can then be
accessed in order to incorporate them directly as a factor of illumination.
Indirect light that can be calculated by global illumination is, therefore, not
integrated into the local illumination model without the detour via textures.
Therefore, an additional simulation of indirect light is necessary to keep
objects that are not directly illuminated displayable. This simulation is
achieved by a constant approximation of the diffuse indirect illumination.
For this purpose, the simplest form of light is used, i.e., ambient light. It is
not based on a special light source but represents the light that should be
present almost everywhere due to multiple reflections on different surfaces.
It is, therefore, represented by the constant light intensity in the scene. In a
room with a lamp above a table, there will not be complete darkness under
the table, even if the light from the lamp cannot shine directly into it. The
light is reflected from the table’s surface, the ceiling and the walls, and thus
also gets under the table, albeit at a lower intensity. In the model, the
scattered light does not come from a certain direction, but from all
directions with the same intensity, and therefore, has no position. Parallel
incident or directional light has a colour and a direction from which it
comes. Directional light is produced by a light source such as the sun,
which is almost infinitely far away. The light rays run parallel and have no
position for the simulation. Therefore, they have a constant direction of
light and constant intensity. A lamp is modelled by a point light source. In
addition to the colour, a point light source must be assigned a unique
position in space. The light rays spread out from this position in all
directions. The intensity of the light decreases with the distance from the
light source. This effect is called attenuation. The following consideration
shows that the intensity of the light decreases quadratically with distance.
To do this, imagine a point source of light that is located in the centre of a
sphere with radius r. The entire energy of the light is thus distributed evenly
on the inside of the sphere’s surface. If the same situation is considered with
a sphere with a larger radius R, the same light energy is distributed over a
larger area. The ratio of the two spherical surfaces is
$$ \frac{4\pi r^2}{4\pi R^2} \; = \; \left( \frac{r}{R}\right) ^2. $$
With a ratio of $$R = 2r$$, each point on the inside of the larger sphere, therefore, receives only a quarter of the energy of a point on the smaller sphere. Theoretically, therefore, the intensity of the light from a point light source would have to be multiplied by a factor of $$1/d^2$$ if it strikes an object at a distance d. In this way, the intensity of the light source falls
very quickly to a value close to zero, so that from a certain distance on, one
can hardly perceive any differences in light intensity. On the other hand,
very close to the light source, there are extreme variations in the intensity.
The intensity can become arbitrarily large and reaches the value infinity at
the point of the light source. To avoid these effects, an attenuation factor of the form
$$ f_{att} \; = \; \min \left\{ \frac{1}{c_1 + c_2\cdot d + c_3\cdot d^2},\; 1 \right\} $$
(9.1)
is used, with constants $$c_1, c_2, c_3 \ge 0$$ to be selected by the user, where d is the distance to the light source. First of all, this formula prevents intensities greater than one from occurring. The quadratic polynomial in the denominator can be used to define a parabola that produces moderate attenuation effects. In addition, the linear term with the coefficient $$c_2$$ is used to model
atmospheric attenuation. The quadratic decrease of the light intensity simulates the distribution of the light energy over a larger area as the distance increases. In addition, the light is attenuated by the atmosphere, i.e., by air turbidity. Fine dust in the air absorbs a part of the light. In this model, this leads to a linear decrease of the light intensity with distance: When a light beam travels twice the distance, it must pass twice as many dust particles on its path, each of which absorbs a certain amount of its energy. It should be noted that, in reality, absorption in homogeneous media such as mist and dust behaves approximately according to Beer's law: The intensity decreases exponentially with distance. Additionally, outward scattering occurs, so that further light intensity is lost. Modelling this faithfully is much more complicated and is not realised in this model.
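A minimal Java sketch of an attenuation factor of this form, i.e., of Eq. (9.1), could look as follows (the parameter names are chosen freely):

/** Distance-dependent attenuation factor according to Eq. (9.1). */
static double attenuation(double c1, double c2, double c3, double distance) {
    // The minimum with 1 prevents intensities greater than one close to the light source.
    return Math.min(1.0, 1.0 / (c1 + c2 * distance + c3 * distance * distance));
}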
Fig. 9.3 Light beam of a headlamp

Another commonly used type of light source is the spotlight (headlamp). A spotlight differs from a point light source in that it emits light only in one direction, within a cone. A spotlight is characterised by its light colour, its unique position, the direction in which it radiates and the aperture angle of its light cone (the beam angle). As in the case of a point light source, a spotlight is subject to attenuation in accordance with Eq. (9.1). The quadratic decrease of the intensity results from considerations similar to those for point light sources. Figure 9.3 shows that the energy of the headlamp is distributed over the area of a circle. The radius of the circle grows proportionally with the distance, so that the circular area increases quadratically and the light energy per area decreases quadratically.

Fig. 9.4 The Warn model for headlamps


To obtain a more realistic model of a headlamp, it should be taken into account that the intensity of the light decreases towards the edge of the cone. The Warn model [9] can be used for this purpose. The decrease of the light intensity towards the edge of the cone is controlled in a similar way as for specular reflection (see Sect. 9.2). If $$\textbf{l}$$ is the vector pointing from a point on a surface in the direction of the headlamp and $$\textbf{l}_a$$ is the direction of the axis of the cone, the light intensity at the point under consideration is
$$ I \; = \; f_{att} \cdot I_L \cdot (\cos (\gamma ))^p $$
(9.2)
where $$I_L$$ is the intensity of the headlamp, $$f_{att}$$ is the distance-dependent attenuation from Eq. (9.1) and $$\gamma $$ is the angle between the axis of the cone and the line connecting the point on the surface to the headlamp. The value p controls how much the headlamp focuses the light. In this way, it also controls the intensity depending on the direction. For $$p=0$$, the headlamp behaves like a point light source. With increasing values of p, the intensity of the headlamp is focused on an increasingly narrow area around the axis of the cone. The cosine term in Eq. (9.2) can again be calculated as a scalar product of the corresponding normalised vectors. Figure 9.4 illustrates the calculation for the Warn model.
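The directional factor of the Warn model, Eq. (9.2), can be sketched in Java as follows. The vector conventions and names are assumptions made only for this illustration: toLight points from the surface point to the headlamp, spotDirection points from the headlamp along the cone axis, and both vectors are normalised.

/** Intensity of a spotlight at a surface point according to the Warn model, Eq. (9.2). */
static double warnIntensity(double lightIntensity, double attenuation,
                            double[] toLight, double[] spotDirection, double p) {
    // cos(gamma): angle between the cone axis and the direction headlamp -> surface point,
    // i.e. between spotDirection and -toLight (all vectors normalised).
    double cosGamma = -(toLight[0] * spotDirection[0]
                      + toLight[1] * spotDirection[1]
                      + toLight[2] * spotDirection[2]);
    if (cosGamma <= 0.0) {
        return 0.0; // point lies outside the half-space illuminated by the cone
    }
    return lightIntensity * attenuation * Math.pow(cosGamma, p);
}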

9.2 Reflections by Phong


In order to achieve lighting effects such as those shown in Figs. 9.1 and 9.2,
it is necessary to specify how the surfaces of the objects reflect the light
present in the scene. The light that an object reflects is not explicitly
determined when calculating the reflections of other objects but is
considered as general scattered light in the scene. The more complex
radiosity model presented in Sect. 9.8 explicitly calculates the mutual
illumination of the objects.
For the model of light reflections from object surfaces described in this
section, a point on a surface is considered that should be correctly coloured
in consideration of the light in the scene. The individual effects presented in
the following usually occur in combination. For each individual effect, an RGB value is determined, which models the reflection contribution of this effect. To determine the colouration of the point on the surface, the RGB values of the individual effects are added together. It must be taken into account that the total intensity for the R, G and B components can each have a maximum value of one. Figure 9.5 visualises this procedure using the example of the Utah Teapot. The underlying model is the illumination model according to Phong [8]. The Phong lighting model is not based on physical calculations, but on purely heuristic considerations. Nevertheless, it provides quite realistic effects for specular reflection. Sometimes a modified Phong model, the Blinn–Phong lighting model, is used.

Fig. 9.5 The Utah teapot with effects of the Phong lighting model: Additive superposition of
emissive, ambient, diffuse and specular lighting or reflection

A very simple contribution to the colouration of a pixel is made by an


object that has its own luminosity independent of other light sources, which
is characterised by an RGB value. This means that objects with their own
luminosity are not interpreted as a light source, and therefore, are not
included in indirect lighting. Objects also rarely have their own luminosity.
The illumination equation for the intensity I of a fragment by the (emissive)
intrinsic luminosity of the object is, therefore
$$ I \; = \; I_{e} \; = \; k_e. $$
This equation should be considered separately for each of the three colours
red, green and blue. Correctly, it should, therefore, read
$$ I^{\text{(red) }} \; = \; I_e^{\text{(red) }} \; = \; k_e^{\text{(red) }}
\qquad \quad I^{\text{(green) }} \; = \; I_e^{\text{(green) }} \; = \;
k_e^{\text{(green) }} \qquad \quad I^{\text{(blue) }} \; = \;
I_e^{\text{(blue) }} \; = \;k_e^{\text{(blue) }}. $$
Since the structure of the illumination equation is always the same for all
types of reflection for each of the three colours, only one intensity equation
is given in the following, even if the calculation must be carried out
separately for red, green and blue.
A self-luminous object is not considered a light source in the scene. It
shines only for the viewer but does not illuminate other objects. Self-
luminous objects can be used sensibly in combination with light sources. A
luminous lamp is modelled by a point light source to illuminate other
objects in the scene. In order to make the lamp itself recognisable as a
luminous object, a self-luminous object is additionally generated, for
example, in the form of an incandescent lamp. In this way, a self-luminous
object does not create a three-dimensional impression of its surface. The
projection results in uniformly coloured surfaces, as can be seen in Fig. 9.1
on the far left and secondly from the right. Figure 9.5 also shows that the
Utah Teapot does not contain any intrinsic luminosity. $$k_{e}$$ is,
therefore, set to (0, 0, 0) in this figure, considering all RGB colour
channels, which is shown as black colour in the illustration.
All subsequent lighting effects result from reflections of light present in
the scene. The lighting equation has the form:
$$ I \; = \; k_{\text{ pixel }} \cdot I_{\text{ lighting }}. $$
$$I_{\text{ lighting }}$$ is the intensity of the light coming from the
light source under consideration. $$k_{\text{ pixel }}$$ is a factor
that depends on various parameters: The colour and type of surface, the
distance of the light source in case of attenuation and the angle at which the
light in the fragment under consideration strikes the surface.
In case of ambient or scattered light, one obtains the illumination
equation:
$$ I \; = \; k_a\cdot I_a. $$
$$I_a$$ is the intensity of the scattered light and $$k_a$$ the
reflection coefficient of the surface for scattered light. As with self-
luminous objects, scattered light results in uniformly coloured surfaces
during projection without creating a three-dimensional impression. The
scattered light is, therefore, responsible for the basic brightness of an object.
$$k_{a}$$ should be rather small compared to the other material
parameters. In Fig. 9.5, the Utah Teapot has received the values
(0.1745, 0.01175, 0.01175) for the individual colour channels for
$$k_{a}$$ as ambient part.1 $$I_a$$ in this visualisation is set
to (1, 1, 1), again concerning all RGB colour channels.
Only when the light can be assigned a direction (in contrast to scattered light, which comes from all directions) does a non-homogeneous shading of the object surface occur, which creates a three-dimensional impression.
On (ideally) matt surfaces, an incident light beam is reflected uniformly in
all directions. How much light is reflected depends on the intensity of the
incoming light, the reflection coefficient of the surface and the angle of
incidence of the light beam. The influence of the angle of incidence is
shown in Fig. 9.6. The same light energy that hits the circle perpendicular to
the axis of the cone of light is distributed over a larger area when the circle
is inclined to the axis. The more oblique the circle is, i.e., the flatter the
angle of incidence of the light, the less light energy is incident at a point.

Fig. 9.6 Light intensity depending on the angle of incidence

This effect is also responsible for the fact that, on earth, it is warmer at
the equator, where the sun’s rays strike vertically at least in spring and
autumn than at the poles. Also, summer and winter are caused by this effect,
because the axis of the earth is inclined differently in relation to the sun
during a year. In the period from mid-March to mid-September, the northern
hemisphere is inclined towards the sun. In the rest of the year, the southern
hemisphere is inclined towards the sun. According to Lambert’s reflection
law for such diffuse reflection, the intensity I of the reflected light is
calculated according to the following illumination equation:
$$\begin{aligned} I \; = \; I_d \cdot k_d \cdot \cos \theta , \end{aligned}$$
(9.3)
Where $$I_d$$ is the intensity of the incident light,
$$0\le k_d \le 1$$ is the material-dependent reflection coefficient of
the surface and $$\theta $$ is the angle between the normal vector
$$\textbf{n}$$ to the surface at the point under consideration and the
vector $$\textbf{l}$$ pointing in the direction from which the light
comes. Figure 9.7 illustrates this situation. Note that the diffuse reflection is
independent of the viewing direction of the observer.

Fig. 9.7 Diffuse Reflection

The illumination equation for diffuse reflection applies only to angles


$$\theta $$ between $$0^\circ $$ and $$90^\circ $$.
Otherwise, the light beam hits the surface from behind, so that no reflection
occurs. The intensity of the incident light depends on the type of light source. With directional light, which is emitted from an infinitely distant light source, $$I_d$$ has the same value everywhere, corresponding to the intensity specified for the directional light. In the case of a point light source, $$I_d$$ is the intensity of the light source multiplied by the distance-dependent attenuation factor given in Eq. (9.1). In the case of a headlamp, the light is attenuated further depending on how far the beam deviates from the central axis of the light cone. The ambient and diffuse material colours are usually chosen to be identical, since ambient light also accounts for diffuse reflection, but as a highly simplified model of indirect reflection scattered by the atmosphere. It can only be described indirectly by a constant irradiance of the surfaces. However, the ambient part is usually smaller than the diffuse one.
Since the reflection calculations are carried out for each fragment, the
cosine in Eq. (9.3) can be evaluated very often. If the normal vector
$$\textbf{n}$$ of the surface and the vector in the direction of the
light source $$\textbf{l}$$ are normalised, i.e., normalised to the
length of one, the cosine of the angle between the two vectors can be
determined by their scalar product, so that Eq. (9.3) becomes simplified
$$ I \; = \; I_d \cdot k_d \cdot (\textbf{n}^\top \cdot \textbf{l}) $$
With directional light and a flat surface, the vectors $$\textbf{l}$$
and $$\textbf{n}$$ do not change. In this case, a surface is shaded
evenly. One can see this with the right cube in Fig. 9.1, which was
illuminated with directional light. Because of their different inclination, the
individual surfaces of the cube are shaded differently, but each individual
surface is given a constant shading. In Fig. 9.5, $$k_d$$ is set to
(0.61424, 0.04136, 0.04136) as the diffuse component of the Utah Teapot.2
$$I_d$$ receives the values (1, 1, 1).
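A minimal Java sketch of Lambert's diffuse reflection, Eq. (9.3), for one colour channel (names are chosen freely; the vectors are assumed to be normalised):

/** Diffuse (Lambert) reflection for one colour channel: I = I_d * k_d * (n . l). */
static double diffuse(double lightIntensity, double kd,
                      double[] normal, double[] toLight) {
    double nDotL = normal[0] * toLight[0] + normal[1] * toLight[1] + normal[2] * toLight[2];
    if (nDotL <= 0.0) {
        return 0.0; // light arrives from behind the surface: no diffuse reflection
    }
    return lightIntensity * kd * nDotL;
}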
In addition to diffuse reflection, which occurs on matt surfaces and
reflects light uniformly in all directions, the Phong lighting model also
includes specular reflection (also known as mirror reflection). With
(ideally) smooth or shiny surfaces, part of the light is reflected only in a
direction which is determined according to the law “angle of incidence =
angle of reflection”. The incident light beam and the reflected beam thus
form the same angle with the normal vector to the surface. The difference
between diffuse reflection and specular reflection is illustrated in Fig. 9.8.
Note that specular reflection, unlike diffuse reflection, is dependent on the
viewing direction of the observer.

Fig. 9.8 Diffuse reflection and specular reflection

Real smooth surfaces usually have a thin, transparent surface layer, such
as painted surfaces. The light that penetrates the lacquer layer is reflected
diffusely in the colour of the object. In contrast, specular reflection occurs
directly at the lacquer layer, so that the light does not change its colour on
reflection. This effect can also be seen in Fig. 9.2 on the second object from
the left. The bright spot on the surface of the sphere is caused by specular
reflection. Although the sphere itself is grey, the point in the centre of the
bright spot appears white. The same effect can be observed for the Utah
Teapot in Fig. 9.5.
Shading caused by diffuse reflection is only determined by the angle of
incidence of the light and the reflection coefficient. The point of view of the
observer is not important. Where an observer sees a specular reflection
depends on the observer’s point of view. In the case of a flat surface on
which a light source appears, a mirror reflection is ideally only visible in
one point. However, this only applies to perfect mirrors. On more or less
shiny surfaces, mirror reflection is governed by the attenuated law “angle of
incidence approximately equal to the angle of reflection”. In this way, an
extended bright spot is created instead of a single point on the above-
mentioned grey sphere. Before presenting approaches to modelling this
effect of non-ideal specular reflection, the following first explains how the
direction of specular reflections is calculated.
In Fig. 9.9, $$\textbf{l}$$ denotes the vector pointing in the
direction from which light is incident on the considered point of the surface.
$$\textbf{n}$$ is the vector normal to the surface at the considered
point. The vector $$\textbf{v}$$ points in the direction of the
observer, i.e., in the direction of projection. $$\textbf{r}$$ is the
direction in which the mirror reflection can be seen. The normal vector
$$\textbf{n}$$ forms the same angle with $$\textbf{l}$$ and
$$\textbf{r}$$.
Fig. 9.9 Calculation of the specular reflection

To determine the specular reflection direction $$\textbf{r}$$ for the given normalised vectors $$\textbf{n}$$ and $$\textbf{l}$$, the auxiliary vector $$\textbf{s}$$ is introduced as shown in Fig. 9.9 on the right. The projection of $$\textbf{l}$$ onto $$\textbf{n}$$ corresponds to the correspondingly shortened vector $$\textbf{n}$$ in the figure. Because $$\textbf{n}$$ and $$\textbf{l}$$ are normalised, this projection is the vector $$\textbf{n}\cdot \cos \theta $$. For the sought vector $$\textbf{r}$$, the following applies:
$$\begin{aligned} \textbf{r} \; = \; \textbf{n} \cdot \cos (\theta ) + \textbf{s}. \end{aligned}$$ (9.4)
The auxiliary vector $$\textbf{s}$$ can be determined from
$$\textbf{l}$$ and the projection of $$\textbf{l}$$ on
$$\textbf{n}$$
$$ \textbf{s} \; = \; \textbf{n}\cdot \cos (\theta ) - \textbf{l}. $$
Inserting $$\textbf{s}$$ into Eq. (9.4) yields
$$ \textbf{r} \; = \; 2 \cdot \textbf{n}\cdot \cos (\theta ) - \textbf{l}. $$
As in the case of diffuse reflection, the cosine of the angle between
$$\textbf{n}$$ and $$\textbf{l}$$ can be calculated via the
scalar product: $$\textbf{n}^\top \cdot \textbf{l} = \cos \theta $$.
This results in the direction of the specular reflection
$$ \textbf{r} \; = \; 2\cdot \textbf{n}(\textbf{n}^\top \cdot \textbf{l}) -
\textbf{l}. $$
Again, it is assumed that the surface is not illuminated from behind, i.e.,
that $$ 0^\circ \le \theta < 90^\circ $$ applies, which is exactly
fulfilled when the scalar product $$\textbf{n}^\top \cdot \textbf{l}$$
is positive.
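The computation of the reflection direction amounts to a few lines of vector arithmetic. The following minimal Java sketch (class and method names are chosen freely for this illustration and are not part of the book's listings) assumes that n and l are already normalised and that the surface is not lit from behind:

/**
 * Minimal sketch (illustrative, not from the book's listings):
 * computes the specular reflection direction r = 2 * n * (n.l) - l
 * for normalised vectors n (surface normal) and l (direction to the light).
 */
public class ReflectionVector {

    public static float[] reflect(float[] n, float[] l) {
        // scalar product n.l = cos(theta)
        float nDotL = n[0] * l[0] + n[1] * l[1] + n[2] * l[2];
        return new float[] {
            2f * n[0] * nDotL - l[0],
            2f * n[1] * nDotL - l[1],
            2f * n[2] * nDotL - l[2]
        };
    }
}

If n⊤·l is not positive, the surface is lit from behind and the specular term is simply omitted.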
Only with an ideal mirror does the specular reflection occur solely in
the direction of the vector $$\textbf{r}$$. As a rule, real mirrors are
not ideally smooth, and so specular reflection with decreasing intensity can
be seen at angles other than the angle of reflection. The illumination model according to Phong takes this fact into account: the intensity that the observer sees due to the specular reflection depends on the viewing angle $$\alpha $$ and decreases with increasing $$\alpha $$

Fig. 9.10 The functions $$(\cos \alpha )^{64}$$ , $$(\cos \alpha )^8$$ ,
$$(\cos \alpha )^2$$ , $$\cos \alpha $$

$$\begin{aligned} I \; = \; I_s \cdot W(\theta )\cdot (\cos (\alpha ))^n. \end{aligned}$$ (9.5)
$$I_s$$ is the intensity of the incident light beam, which, if necessary, already includes the distance-dependent attenuation in the case of a point light source and, in addition, the attenuation due to the deviation from the beam axis in the case of a spotlight. The value $$0\le W(\theta ) \le 1$$ is the proportion of light subject to specular reflection at an angle of incidence of $$\theta $$. As a general rule, $$W(\theta ) = k_{s}$$ is taken
as a constant specular reflection coefficient of the surface. n is the specular
reflection exponent of the surface. For a perfect mirror, $$n = \infty $$
would apply. The smaller n is chosen, the more the specular reflection
scatters. The exponent n thus simulates the more or less existing roughness
of the object surface. The rougher the material, the smaller n. Figure 9.10
shows the value of the function3 $$f(\alpha ) = (\cos (\alpha ))^n$$ for
different specular reflection exponents n. For $$n=64$$, the function already falls to a value close to zero at very small angles $$\alpha $$, so that specular reflection is visible almost only in the direction of the angle of reflection. In contrast, the pure cosine function with the specular
reflection exponent $$n = 1$$ drops relatively slowly and produces a
visible specular reflection in a larger range. The shinier and smoother a
surface is, the larger the specular reflection exponent should be.
The calculation of the cosine function in Eq. (9.5) can again be traced back to the scalar product of the normalised vectors $${\textbf{r}}$$ and $${\textbf{v}}$$: $$\cos \alpha = {\textbf{r}}^\top \cdot {\textbf{v}}$$. Thus, the following applies for the specular component:
$$\begin{aligned} I \; = \; I_s \cdot k_s\cdot ({\textbf{r}}^{\top }\cdot {\textbf{v}})^{n}. \end{aligned}$$ (9.6)
The vector $${\textbf{r}}$$ is automatically normalised if the vectors
$${\textbf{l}}$$ and $${\textbf{n}}$$ are normalised.
In Fig. 9.5, $$k_s$$ is set to (0.727811, 0.626959, 0.626959) as the specular component of the Utah Teapot. $$I_s$$ receives the values (1, 1, 1). For the exponent n, the value $$0.6\cdot 128=76.8$$ is selected. Figure 9.12 shows the rendering results with different specular reflection exponents n.

Fig. 9.11 The angle bisector $$\textbf{h}$$ in Phong’s model

The angle $$\alpha $$ in Phong's model indicates the deviation of the viewing direction from the ideal specular reflection direction. Another way to determine this deviation is the following consideration. If the observer is located exactly in the direction of the specular reflection, the normal vector $$\textbf{n}$$ forms the bisector between $$\textbf{l}$$ and $$\textbf{v}$$. The angle $$\beta $$ between the normal vector $$\textbf{n}$$ and the bisector $$\textbf{h}$$ of $$\textbf{l}$$ and $$\textbf{v}$$, as shown in Fig. 9.11, is therefore also used as a measure of the deviation from the ideal specular reflection. However, this calculation is faster because the computation of the reflection vector is avoided (Fig. 9.12).

Fig. 9.12 Rendering results with different specular reflection exponents n

In the modified Phong model, the term $$\cos \beta $$ is therefore used instead of the term $$\cos \alpha $$ in Eq. (9.5), which in turn can be determined using a scalar product:
$$ \cos \beta \; = \; \textbf{n}^\top \cdot \textbf{h}. $$
The normalised angle bisector $$\textbf{h}$$ is given by the formula
$$ \textbf{h} \; = \; \frac{\textbf{l} + \textbf{v}}{\parallel \textbf{l} +
\textbf{v} \parallel }. $$
With directional light and an infinitely distant observer, i.e., with parallel projection, $$\textbf{h}$$ does not change, in contrast to the vector $$\textbf{r}$$ in Phong's original model. In this case, this leads to an increase in the efficiency of the calculation. The specular part of the modified Phong model is, therefore,
$$\begin{aligned} I \; = \; I_s \cdot k_s\cdot (\textbf{n}^\top \cdot \textbf{h})^n. \end{aligned}$$ (9.7)
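As a small illustration of the modified (Blinn–Phong) specular term, the following Java sketch computes $$I_s \cdot k_s \cdot (\textbf{n}^\top \cdot \textbf{h})^n$$ for one colour channel. The method and variable names are chosen freely for this example, and the vectors l and v are assumed to be normalised:

// Illustrative sketch of the Blinn-Phong specular term for one colour channel.
// Assumes n, l and v are normalised; names are not taken from the book's listings.
public class BlinnPhongSpecular {

    static float dot(float[] a, float[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    static float[] halfway(float[] l, float[] v) {
        float[] h = { l[0] + v[0], l[1] + v[1], l[2] + v[2] };
        float len = (float) Math.sqrt(dot(h, h));
        return new float[] { h[0] / len, h[1] / len, h[2] / len };
    }

    static float specular(float is, float ks, float[] n, float[] l, float[] v, float shininess) {
        float[] h = halfway(l, v);                 // normalised angle bisector of l and v
        float nDotH = Math.max(dot(n, h), 0f);     // cos(beta), clamped for surfaces lit from behind
        return is * ks * (float) Math.pow(nDotH, shininess);
    }
}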
Note that the diffuse reflection is stored in a kind of texture, the irradiance
map. Non-uniform light from anywhere, e.g., specular reflection, is stored
in the environment map (see Chap. 10). The texture of the diffuse
visualisation in the irradiance map looks as if it was created by a strong low
pass filtering. This is due to the fact that the scattering behaviour of the
surface is also stored.
The considerations in this section have always referred to a light source,
a point on a surface and one of the three basic colours. If several light
sources, as well as scattered light (ambient light) $$I_{a}$$ and
possible (emissive) intrinsic light $$I_{e}$$ of the surface, are to be
taken into account, the individually calculated intensity values must be added up, which yields the lighting equation
$$\begin{aligned} I \; = \; I_{e} + I_{a} \cdot k_a + \sum _j I_j \cdot f_{\text{ att }} \cdot g_{\text{ cone }} \cdot \left( k_d \cdot (\textbf{n}^\top \cdot \textbf{l}_j) + k_{s}\cdot (\textbf{r}_j^\top \cdot \textbf{v})^n\right) \end{aligned}$$ (9.8)
where $$k_a$$ is the reflection coefficient for scattered light, which
is usually identical to the reflection coefficient $$k_d$$ for diffuse
reflection. $$I_j$$ is the basic intensity of the j-th light source, which
in the above formula has the same proportion of diffuse and specular
reflection. Of course, if the proportions $$I_d$$ and $$I_s$$ are
unequal, this can be separated into diffuse and specular reflection, as
described in this chapter. In the case of directional light, the two factors $$f_{\text{ att }}$$ and $$g_{\text{ cone }}$$ are equal to one. In the case of a point light source, the first factor models the distance-dependent attenuation of the light, and in the case of a spotlight, the second factor accounts for the additional decrease in light intensity towards the edge of the light cone. $$\textbf{n}$$, $$\textbf{l}$$, $$\textbf{v}$$ and $$\textbf{r}$$ are the vectors already used in Figs. 9.7 and 9.9, respectively. In this equation, the original Phong model for the specular reflection was used. The reflection coefficient $$k_s$$ for the specular reflection is usually different from $$k_d$$ or $$k_a$$.
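The structure of Eq. (9.8) can also be written down directly as code. The following Java sketch (the names and the Light record are assumptions of this example, not the book's implementation) sums the contributions of several light sources for one colour channel; the attenuation factors f_att and g_cone are passed in as precomputed values:

// Sketch of Eq. (9.8) for one colour channel; all vectors are assumed normalised.
// Names and the Light record are illustrative, not from the book's source code.
public class PhongLightingEquation {

    record Light(float intensity, float fAtt, float gCone, float[] l, float[] r) {}

    static float dot(float[] a, float[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    static float illuminate(float ie, float ia, float ka, float kd, float ks, float shininess,
                            float[] n, float[] v, Light[] lights) {
        float intensity = ie + ia * ka;                       // emissive and ambient part
        for (Light light : lights) {
            // negative scalar products are clamped to zero: surfaces lit from behind contribute nothing
            float diffuse = kd * Math.max(dot(n, light.l()), 0f);
            float specular = ks * (float) Math.pow(Math.max(dot(light.r(), v), 0f), shininess);
            intensity += light.intensity() * light.fAtt() * light.gCone() * (diffuse + specular);
        }
        return intensity;
    }
}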
The following laws apply to material properties. No more radiation may
be emitted than is received. Thus, $$0\le k_a,k_d,k_s\le 1$$ with
$$k_a + k_d + k_s \le 1$$ must apply. If an object is to be represented as strongly reflecting, $$k_s$$ should be much higher than $$k_d$$. For matt surfaces, however, $$k_d$$ can be set much larger than $$k_s$$. For a low-contrast display, it is recommended to set $$k_a$$ much higher than $$k_d$$ and $$k_s$$. Plastic-like materials have a white specular material colour $$k_s$$, i.e., the RGB colour channels are almost identical for $$k_s$$. Metal-like materials tend to have similar diffuse and specular material colours, i.e., for each RGB colour channel, $$k_d$$ is almost identical to $$k_s$$. Nevertheless, it should be noted that a realistic representation of shiny metals is difficult to achieve with the local Phong lighting model. Realistic effects can be obtained here by using, e.g., environment maps and projecting them onto the object (see Chap. 10). A good list of specific material properties and their implementation with the parameters $$k_a$$, $$k_d$$, $$k_s$$ and the specular reflection exponent n can be found in [5]. The visualisation of the Utah Teapot with the material ruby in this chapter is based on this.
The application of Eq. (9.8) to the three primary colours red, green and blue does not mean a great deal of additional computational effort, since only the coefficients $$I_{e}$$, $$I_{a}$$, $$I_j$$, $$k_a$$, $$k_d$$ and $$k_s$$ change with the different colours, and these are constants that do not have to be calculated. The other factors, on the other hand, have to be calculated laboriously for each point, i.e., for each vertex or fragment, and for each light source, but they are the same for all three colours.
One technique that can lead to acceleration when several light sources
are used is deferred shading, which is based on the depth buffer algorithm
(see Sect. 8.2.3). The disadvantage of the depth buffer algorithm is that the
complex calculations are also performed for those objects that are not
visible and are overwritten by other objects further ahead in the scene in the
later course of the algorithm. Therefore, in the case of deferred shading, the depth buffer algorithm is first run through once, in which only the depth values are entered into the z-buffer. In addition to the depth value, the normals and the material for each pixel are stored in textures. The z-buffer is also saved in a texture to have access to the original depth values. Only in the second pass, with the z-buffer already filled, are the colour entries in the frame buffer updated. Since both the normals and the material are stored in textures, the illumination can be calculated without having to rasterise the geometry again. It should be noted that this pass has to be repeated depending on the number of light sources. This means that the complex calculations for illumination only need to be carried out for the visible objects.

9.3 The Lighting Model According to Phong in the OpenGL
As explained in the previous Sects. 9.1 and 9.2, the standard lighting model
in the OpenGL is composed of the components ambient (ambient light),
diffuse (diffuse reflections), specular (specular reflections) and emissive
(self-luminous). For the lighting, the following light sources are available: directional lights, point lights and spotlights. Furthermore, material properties must be defined for an object, which determine the effects of the illumination. A simple illuminated scene is shown in Fig. 9.13.

Fig. 9.13 An illuminated scene in the OpenGL

The Blinn–Phong lighting model is implemented in the OpenGL as follows. In the following, the discussion is separated into the fixed function and the programmable pipeline. First, the fixed function pipeline is considered, in which Blinn–Phong is the standardised procedure. The parameters for a light source are specified in the init method. For this, one can use the method “setLight” from Fig. 9.14, which sets the parameters of a light source.

Fig. 9.14 Illumination according to Blinn-Phong in the fixed function pipeline (Java): This is part of
the init method of the OpenGL renderer

The source code in Fig. 9.15 defines a white light source at position
(0, 2, 6). Ambient, diffuse and specular illumination or reflection are fixed to the colour white. Note that the colours are specified as float values in the range between 0 and 1 instead of integers (instead of 0 to 255 with 8 bits, see Sect. 6.3).

Fig. 9.15 Definition of a white light source at position (0, 2, 6) (Java): Part of the init method of
the OpenGL renderer

Thus, the parameters of the Blinn–Phong lighting model are assigned as follows. The value $$I_a$$ from $$I \; = \; k_a\cdot I_a$$ is set for each colour channel separately with lightAmbientCol, $$I_d$$ from $$I \; = \; I_d \cdot k_d \cdot (\textbf{n}^\top \cdot \textbf{l})$$ with lightDiffuseCol and $$I_s$$ from $$I \; = \; I_s \cdot k_s\cdot (\textbf{r}^\top \cdot \textbf{v})^n$$ with lightSpecularCol. The parameters $$k_a$$,
$$k_d$$ and $$k_s$$ determine the material properties and are
also set individually for each colour channel with matAmbient,
matDiffuse and matSpecular. The parameter matShininess is
the exponent n of the specular light component. Figure 9.16 shows the
source code examples for the material properties of the treetop from
Fig. 9.12.

Fig. 9.16 Definition of a material for lighting (Java): This is a part of init method of the OpenGL
renderer
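As a rough orientation, the following sketch indicates how such a light and material definition can look with JOGL's fixed function pipeline (GL2 profile). The variable names follow the ones mentioned above (lightAmbientCol, matAmbient, matShininess, ...), but the concrete values and the structure of the method are assumptions of this example and are not taken from Figs. 9.14–9.16:

import com.jogamp.opengl.GL2;   // depending on the JOGL version, the package may differ

// Hedged sketch of a fixed function light and material setup in JOGL (GL2 profile).
// Values and names are illustrative; they are not copied from the book's figures.
public class FixedFunctionLightingSketch {

    public static void setLightAndMaterial(GL2 gl) {
        float[] lightPosition    = {0f, 2f, 6f, 1f};        // positional light source
        float[] lightAmbientCol  = {1f, 1f, 1f, 1f};
        float[] lightDiffuseCol  = {1f, 1f, 1f, 1f};
        float[] lightSpecularCol = {1f, 1f, 1f, 1f};

        gl.glEnable(GL2.GL_LIGHTING);
        gl.glEnable(GL2.GL_LIGHT0);
        gl.glLightfv(GL2.GL_LIGHT0, GL2.GL_POSITION, lightPosition, 0);
        gl.glLightfv(GL2.GL_LIGHT0, GL2.GL_AMBIENT,  lightAmbientCol, 0);
        gl.glLightfv(GL2.GL_LIGHT0, GL2.GL_DIFFUSE,  lightDiffuseCol, 0);
        gl.glLightfv(GL2.GL_LIGHT0, GL2.GL_SPECULAR, lightSpecularCol, 0);

        float[] matAmbient   = {0.2f, 0.2f, 0.2f, 1f};      // k_a per colour channel
        float[] matDiffuse   = {0.6f, 0.2f, 0.2f, 1f};      // k_d per colour channel
        float[] matSpecular  = {0.7f, 0.7f, 0.7f, 1f};      // k_s per colour channel
        float   matShininess = 76.8f;                       // specular reflection exponent n

        gl.glMaterialfv(GL2.GL_FRONT, GL2.GL_AMBIENT,  matAmbient, 0);
        gl.glMaterialfv(GL2.GL_FRONT, GL2.GL_DIFFUSE,  matDiffuse, 0);
        gl.glMaterialfv(GL2.GL_FRONT, GL2.GL_SPECULAR, matSpecular, 0);
        gl.glMaterialf(GL2.GL_FRONT, GL2.GL_SHININESS, matShininess);
    }
}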
In the programmable pipeline (core profile), the commands of the fixed
function pipeline do not exist, and therefore, the formulas of the light
components must be implemented. This represents a certain amount of
programming effort, but in contrast, allows for extensive design options
when implementing the lighting. In the fixed function pipeline, the lighting
calculations are carried out before the rasterization. This has the advantage
that these calculations can be made very efficiently. On closer inspection,
however, one notices distortions, especially in the area of the specular reflection, which are caused by the interpolation of the colour values during rasterisation. The reason for this is that the colour values for the corner points are computed according to the lighting before rasterisation, and these colour values are then interpolated during rasterisation. The flat and Gouraud shading procedures presented in Sect. 9.4 work according to this principle. To reduce such distortions, it is better to interpolate the normal vectors instead of the colour values and then use the interpolated normals to calculate the illumination (and from this the colour values). For this reason, the more complex calculation is implemented in the fragment shader of the programmable pipeline.
This means that the normals of the vertices are transformed by the viewing pipeline (note that the transposed inverse of the overall matrix of the viewing pipeline has to be used, see Sect. 5.11). These geometric calculations, including the calculation of the positions of the vertices with the overall matrix of the viewing pipeline, are performed in the vertex shader. The transformed normals are then interpolated during rasterization and must be normalised again in the fragment shader after this phase. For each fragment, the lighting calculations, and from these the colour value calculations, are carried out in the fragment shader based on the interpolated normals. This principle corresponds to the Phong shading from Sect. 9.4. Of course, Gouraud shading can also be implemented with shaders, but nowadays the hardware is no longer a performance limitation for Phong shading.
For this purpose, the parameters for a light source are specified in the
init method. A class “ LightSource” can be created for this purpose
(see Figs. 9.22 and 9.23). For example, the following source code of
Fig. 9.17 defines a white light source at the position (0, 3, 3).
Fig. 9.17 Definition of a white light source for illumination (Java)

Furthermore, material properties are defined in the init method for each object, as shown in Fig. 9.18.

Fig. 9.18 Definition of a red material for the illumination in the programmable pipeline (Java)

In the display method, for each object, the light source and material parameters are transferred to the vertex and fragment shaders. It is best to do this directly after passing the model-view matrix (“MvMatrix”) to the shaders. Figure 9.19 shows this situation.

Fig. 9.19 Transfer of the parameters for the illumination of the programmable pipeline to the shader
program (Java)

Since the normals are interpolated in the rasterizer for each fragment, the transposed inverse of the overall matrix of the viewing pipeline must be determined; this can be done with the call glGetMvitMatrixf of the PMVMatrix class (PMVTool). To avoid having to recalculate this matrix for each fragment, the resulting matrix is transferred to the vertex shader as a uniform variable.

In the vertex shader, this matrix is provided as follows:

In the vertex shader, the updated transformed normals are calculated and
passed to the rasterizer as out variables.

Then, after the normals are normalised again in the fragment shader, the
illumination calculations (according to Blinn–Phong) are performed.
In the case of additional texture incorporation (techniques for this are
explained in Chap. 10), the final colour values can be calculated as
follows:

Figures 9.20 and 9.21 show the implementation in the vertex and
fragment shaders. If one wants to display several light sources of different
types, then it is desirable to send the parameters of the light sources bundled
per light source to the shaders. This means that a procedure is needed that
passes array structures to the vertex or fragment shaders. For this, the
LightSource class is modified to allow access via uniform array
structure as a struct. Essentially, only the structures defined in the class
LightSource arrays must be accessible from outside, in addition to the
get and set methods. In addition, the parameters for a headlamp
illumination are supplemented. Figures 9.22 and 9.23 show the extended
LightSource class.

Fig. 9.20 Implementation of the illumination model according to Blinn–Phong in the vertex shader
(GLSL)

Fig. 9.21 Implementation of the illumination model according to Blinn–Phong in the Fragment
Shader (GLSL)

Fig. 9.22 LightSource class part 1 (Java)

Fig. 9.23 LightSource Class part 2 (Java)

Fig. 9.24 Method to initialise an object, in this case, Utah teapot part 1 (Java)

Fig. 9.25 Method to initialise an object, in this case, Utah teapot part 2 (Java)
Fig. 9.26 Method to initialise an object, in this case, Utah teapot part 3 (Java)

It should be noted that the shaders do not perform attenuation inversely proportional to the distance, although the light source is a point light source. For the sake of clarity, this attenuation is neglected in the implementation.
In the following, the Utah Teapot is shown as an object with several
light sources. The method initObject1 from Figs. 9.24, 9.25 and 9.26
initialises the Utah teapot. This method is called in the init method, as
soon as all shaders and the program are activated. In contrast to the previous implementation of a point light source, where the locations of the in variables and uniforms in the shader are set in the init method, a further variant is presented below. If the locations are not set in advance, they are automatically assigned by the shader. This will be explained in the following for the uniforms. With the method gl.glGetUniformLocation, these automatically assigned locations can be queried. This must be done after linking the program. Since these locations are also required in the display method for setting the CPU- and GPU-side uniforms, they should be stored in an array, here, e.g., lightsUnits for the parameters of a light source i. For example, lightsUnits[i].ambient then returns the location of the parameter of the ambient light component of a light source i from the shader's point of view. Then the Utah teapot is loaded, and its vertices, normals and texture coordinates are transferred to the GPU via a vertex buffer object (VBO) and a vertex array object (VAO). Finally, the material properties are initialised and a texture is loaded and assigned (see Chap. 10).
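The following short Java sketch illustrates the described querying of uniform locations after linking, in the spirit of the lightsUnits array mentioned above. The uniform names ("lights[0].ambient", ...) are assumptions of this example and must match the struct array actually declared in the shaders:

import com.jogamp.opengl.GL3;

// Hedged sketch: querying and using uniform locations of a light source struct array.
// Uniform names are illustrative; they are not taken from the book's listings.
public class UniformLocationSketch {

    public static int[] queryLightLocations(GL3 gl, int shaderProgramId) {
        // must be called after the program has been linked
        int ambientLoc  = gl.glGetUniformLocation(shaderProgramId, "lights[0].ambient");
        int diffuseLoc  = gl.glGetUniformLocation(shaderProgramId, "lights[0].diffuse");
        int specularLoc = gl.glGetUniformLocation(shaderProgramId, "lights[0].specular");
        return new int[] { ambientLoc, diffuseLoc, specularLoc };
    }

    public static void passLightColours(GL3 gl, int[] locations,
                                         float[] ambient, float[] diffuse, float[] specular) {
        // called in the display method, after the shader program has been activated
        gl.glUniform4fv(locations[0], 1, ambient, 0);
        gl.glUniform4fv(locations[1], 1, diffuse, 0);
        gl.glUniform4fv(locations[2], 1, specular, 0);
    }
}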
The displayObject1 method, shown in Figs. 9.27 and 9.28, is
called in the display method. It prepares the Utah teapot for drawing.
With glUniform, the uniforms with the previously assigned locations are
passed to the shaders. Then with glDrawElements, the call for drawing
the Utah teapot is made.

Fig. 9.27 Method displayObject1 for the preparatory visualisation of the Utah teapot part 1
(Java)
Fig. 9.28 Method displayObject1 for the preparatory visualisation of the Utah teapot part 2
(Java)

Fig. 9.29 Vertex shader for multiple light sources (GLSL)

The vertex shader is almost identical to that of the point light source in
Fig. 9.20. This is shown in Fig. 9.29.
The Blinn–Phong illumination model according to Eq. (9.8) is
implemented in the fragment shader of Figs. 9.30–9.32.
$$f_{\text{ att }}$$ is identical to the variable atten, and $$g_{\text{ cone }}$$ corresponds to the variable spotatten.
More details about the illumination model in the OpenGL can be found
in [6].

Fig. 9.30 Fragment shader for multiple light sources part 1 (GLSL)

Fig. 9.31 Fragment shader for multiple light sources part 2 (GLSL)

Fig. 9.32 Fragment shader for multiple light sources part 3 (GLSL)

9.4 Shading
In the calculations for the reflection of light from a surface in Sect. 9.2, it
was assumed that the normal vector is known at every point on the surface.
To correctly colour a fragment in the projection plane, it is necessary to
determine not only which object surfaces are visible there, but also where
the beam passing through the corresponding pixel hits the surface. For
surfaces modelled with cubic freeform surfaces, this would mean that one
would have to solve a system of equations whose unknowns occur to the
third power, which would require enormous computing effort. Therefore,
when calculating the reflection, one does not use the freeform surfaces
themselves, but their approximation by plane polygons. In the simplest
case, one ignores that the normal vectors should actually be calculated for
the original curved surface, and instead determines the normal vector of the
respective polygon.
With constant shading (flat shading), the colour of a polygon is
calculated at only one point, i.e., one normal vector and one point for
shading are used per polygon. The polygon is shaded evenly with the colour
thus determined. This procedure would be correct if the following
conditions were met:
The light source is infinitely far away so that
$$\textbf{n}^\top \cdot \textbf{l}$$ is constant.
The observer is infinitely far away, so that
$$\textbf{n}^\top \cdot \textbf{v}$$ is constant.
The polygon represents the actual surface of the object and is not just an
approximation of a curved surface.
Under these assumptions, shading can be calculated relatively easily
and quickly, but it leads to unrealistic representations. Figure 9.33 shows
the same sphere as in Fig. 4.11, also with different tessellations. In
Fig. 9.33, however, a constant shading was used. Even with the very fine
approximation by triangles on the far right of the figure, the facet structure
on the surface of the sphere is still clearly visible, whereas in Fig. 4.11 the
facet structure is almost no longer optically perceptible even at medium
resolution. The same can be observed in Fig. 9.34. In order to make the
facets disappear with constant shading, an enormous increase in resolution
is required. This is partly due to the fact that human vision automatically performs a pre-processing step that enhances edges, i.e., contrasts, so that even small kinks are detected.

Fig. 9.33 A sphere, shown with constant shading and different tessellation

Fig. 9.34 The Utah teapot, shown with constant shading and different tessellation

Instead of constant shading, interpolation is therefore used in shading. The normal vectors of the vertices of the polygon are needed for this. In this way, a triangle approximating a part of a curved surface is assigned three normal vectors, none of which has to coincide with the normal vector belonging to the triangle plane. If the triangles are not automatically
calculated to approximate a curved surface but were created individually,
normal vectors can be generated at the vertices of a triangle by averaging
the normal vectors of all triangular surfaces that meet at this vertex,
component by component. This technique has already been described in
Sect. 4.7.
Assuming that triangles approximate the surface of an object, the
Gouraud shading [4] determines the shading at each corner of the triangle
using the different normal vectors in the vertices. Shading of the points
inside the triangle is done by colour interpolation using the colours of the
three vertices. This results in a linear progression of the colour intensity
over the triangle. Figure 9.35 shows the colour intensity as a function over
the triangle area. Figure 9.37 shows the implementation with different
degrees of tessellation for the Utah teapot. Figure 9.38 shows the same
model with flat and Gouraud shading with specular reflection.

Fig. 9.35 The colour intensity as a function over the triangular area in Gouraud shading

Fig. 9.36 Scan-line method for calculating the Gouraud shading

Fig. 9.37 The Utah Teapot with Gouraud shading of different tessellations

Fig. 9.38 The Utah Teapot with flat and Gouraud shading for specular reflection

The efficient calculation of the intensities within the triangle is done by a scan-line method. For this purpose, for a scan line $$y_s$$, first the intensities $$I_a$$ and $$I_b$$ are determined on the two edges intersecting the scan line. This is done by interpolating the intensities of the corresponding corner points accordingly. The intensity changes linearly along the
scan line from the start value $$I_a$$ to the end value $$I_b$$.
Figure 9.36 illustrates this procedure. The intensities are determined with
the following formulas:
$$\begin{aligned} I_a= & {} I_1 -(I_1 - I_2)\frac{y_1 - y_s}{y_1 -
y_2}\\[1mm] I_b= & {} I_1 -(I_1 - I_3)\frac{y_1 - y_s}{y_1 - y_3}\\
[1mm] I_p= & {} I_b -(I_b - I_a)\frac{x_b - x_p}{x_b - x_a}.
\end{aligned}$$
The intensities to be calculated are integer colour values between 0 and 255.
As a rule, the difference in intensity on the triangle will not be very large,
so that the slope of the linear intensity curve on the scan line will almost
always be less than one. In this case, the midpoint algorithm can be used to
determine the discrete intensity values along the scan line.
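The interpolation formulas can be written down compactly in code. The following Java sketch (names chosen freely for this illustration) computes the intensity $$I_p$$ of a point on a scan line from the corner intensities of the triangle, exactly following the three formulas above:

// Sketch of the scan-line interpolation used in Gouraud shading (names illustrative).
// i1, i2, i3 are the intensities at the projected triangle corners with y-coordinates y1, y2, y3.
public class GouraudScanLine {

    public static int interpolate(float i1, float i2, float i3,
                                  float y1, float y2, float y3, float ys,
                                  float xa, float xb, float xp) {
        // intensities on the two edges intersected by the scan line ys
        float ia = i1 - (i1 - i2) * (y1 - ys) / (y1 - y2);
        float ib = i1 - (i1 - i3) * (y1 - ys) / (y1 - y3);
        // linear interpolation along the scan line between xa and xb
        float ip = ib - (ib - ia) * (xb - xp) / (xb - xa);
        return Math.round(ip);   // integer colour value, e.g. in the range 0..255
    }
}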
The facet effect is significantly reduced by the Gouraud shading.
Nevertheless, with Gouraud shading, the maximum intensity within a triangle can only be attained at the corners due to the linear interpolation, so that it can still happen that the corners of the triangles stand out a bit
(Fig. 9.38).
Phong shading [8] is similar to Gouraud shading, but instead of
intensities, the normal vectors are interpolated. Thus, the maximum
intensity can also be obtained inside the triangle, depending on the angle at
which light falls on the triangle. Figure 9.39 shows a curved area and a
triangle that approximates a part of the area. The normal vectors at the
corners of the triangle are the normal vectors to the curved surface at these
three points. Within the triangle, the normal vectors are calculated as a
convex combination of these three normal vectors.

Fig. 9.39 Interpolated normal vectors in Phong shading
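The convex combination of the three vertex normals can be expressed with barycentric coordinates. The following Java sketch (illustrative, not from the book's listings) interpolates and renormalises a normal inside a triangle; the barycentric weights u, v, w are assumed to be non-negative and to sum to one:

// Sketch of the normal interpolation used in Phong shading (names illustrative).
// n1, n2, n3 are the normalised vertex normals, (u, v, w) barycentric weights with u + v + w = 1.
public class PhongNormalInterpolation {

    public static float[] interpolateNormal(float[] n1, float[] n2, float[] n3,
                                            float u, float v, float w) {
        float[] n = {
            u * n1[0] + v * n2[0] + w * n3[0],
            u * n1[1] + v * n2[1] + w * n3[1],
            u * n1[2] + v * n2[2] + w * n3[2]
        };
        // the convex combination is generally no longer of length one, so renormalise
        float len = (float) Math.sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
        return new float[] { n[0] / len, n[1] / len, n[2] / len };
    }
}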

Gouraud and Phong shading provide only approximations, albeit very good ones, of the actual shading of a curved surface. It would be correct to determine for
each triangle point the corresponding normal vector on the original curved
surface and thus determine the intensity in the point. However, this would
mean that when displaying a scene, the information about the triangles
including normal vectors in the corner points would not be sufficient and
that the curved surfaces would always have to be accessed directly when
calculating the scene, which would lead to a very high calculation effort.

9.5 Shading in the OpenGL


Flat and Gouraud shading are implemented before rasterisation, whereas Phong shading is implemented after rasterisation. In OpenGL, rasterisation always interpolates.
In the case of flat shading, the same colour is assigned per primitive to each
vertex of the primitive. The source vertex for this colour is called provoking
vertex (see Sect. 2.5.2). This directly implies that flat and Gouraud shading
can be performed both in the fixed function and in the programmable
pipeline. Phong shading, on the other hand, can only be implemented in the
programmable pipeline. The source code of the lighting calculation in the
fragment shader shown in Sect. 9.3 already corresponds to the
implementation of Phong shading. In the fixed function pipeline, flat or Gouraud shading can be selected with the calls $$\mathtt{gl.glShadeModel(gl.GL\_FLAT)}$$ or $$\mathtt{gl.glShadeModel(gl.GL\_SMOOTH)}$$. In the programmable pipeline, the correct choice of an interpolation qualifier (flat for flat shading and smooth for Gouraud shading) defines flat or Gouraud shading. All that is required is that the respective qualifier is assigned before the in or out variable of the corresponding normal (or colour).
In the vertex shader, using the colour as an example, this concretely means:

and in the fragment shader

9.6 Shadows
An important aspect that has not yet been considered in shading in the
previous sections is shadows. Chapter 8 describes methods to determine
which objects are visible to the viewer of a scene. With the previous
considerations of shading visible surfaces, the angle between the normal
vector and the vector pointing in the direction of the light source or its
scalar product can be used to decide whether the light source illuminates the
object from behind. In this case, it has no effect on the representation of the
surface. It was not taken into account whether a light source whose light
points in the direction of an object actually illuminates this object or
whether another object might be an obstacle on the way from the light
source to the object so that the object is in shadow with respect to the light
source. Figure 9.40 shows how a shadow is created on an object when it is
illuminated from above by a light source and another object is located in
between.

Fig. 9.40 Shadow on an object

Shadow does not mean blackening of the surface, but only that the light
of the corresponding light source does not—or at most in the form of the
separately modelled scattered light—contribute to the illumination of the
object. In terms of the illumination model according to Phong, these would
be the diffuse and specular light components. Determining whether the light
of a light source reaches the surface of an object is equivalent to
determining whether the object is visible from the light source. The
question of the visibility of an object has already been dealt with in the
Sect. 8.2.3, but from the point of view of the observer. This relationship
between visibility and shadow is exploited by the two-pass depth buffer
algorithm (z-buffer algorithm).
In the first pass, the depth buffer algorithm (see Sect. 8.2.3) is executed
with the light source as an observer’s point of view. With directional light,
this means a parallel projection in the opposite direction to the light beams.
With a point light source or a spotlight a perspective projection with the
light source in the projection centre is applied. For the depth buffer
algorithm, this projection must be transformed to a parallel projection that
can be traced back to the xy-plane. In this first pass of the depth buffer
algorithm, only the z-buffer $$Z_L$$ and no colour buffer is used.
The second pass corresponds to the normal depth buffer algorithm from the
observer’s point of view with the following modification.
A transformation $$T_V$$ is necessary, which reduces the perspective projection with the viewer in the projection centre to a parallel projection onto the xy-plane. If during this second pass an object or a point on its surface is entered into the z-buffer $$Z_V$$ for the viewer, it is checked before calculating the value for the colour buffer whether the point is in the shadow of another object. If the point has the coordinates $$(x_V,\, y_V,\, z_V)$$ in the parallel projection onto the xy-plane, the transformation
$$ \left( \begin{array}{c} x_L\\ y_L\\ z_L \end{array} \right) \; = \; T_L \cdot T_V^{-1} \cdot \left( \begin{array}{c} x_V\\ y_V\\ z_V \end{array} \right) $$
yields the coordinates of the point as seen from the light source. Here
$$T_V^{-1}$$ is the inverse transformation, i.e., the inverse matrix,
to $$T_V$$. The value $$z_L$$ is then compared with the
entry in the z-buffer $$Z_L$$ of the light source at the point
$$(x_L,\, y_L)$$. If there is a smaller value in the z-buffer
$$Z_L$$ at this point, there is obviously an object that is closer to the
light source than the point under consideration. Thus the point is in the
shadow of the light source. Otherwise, the light of the light source reaches
the point. When entering the point in the colour buffer $$F_V$$ for
the viewer, it can be taken into account whether the light source contributes
to the illumination of the point or not. In the case of a light source, if the
point is in shadow, only the scattered light would be included in its
colouration. If there are several light sources, the first run of the two-pass
depth buffer algorithm must be calculated separately for each light source.
On the second pass, after entering a value in the z-buffer $$Z_V$$ of
the observer, a check is made for each light source to see if it illuminates
the point. This is then taken into account accordingly when determining the
value for the colour buffer. The following algorithm describes this
procedure for calculating shadows using the z-buffer algorithm from
Sect. 8.2.3:

Step 1: Calculate the z-buffer (smallest z-value) $$\mathtt{z\_depth}$$ for each pixel as seen from the light source L.
Step 2: Render the image from the viewer's perspective using the modified z-buffer algorithm. For each visible pixel P(x, y, z), a transformation into the point of view of the light source L is performed, which yields $$(x',y',z')$$.
If $$z' > \mathtt{z\_depth}(x',y')$$: P is in the shadow of L.
If $$z' = \mathtt{z\_depth}(x',y')$$: illuminate P with L.
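The shadow test of the second pass can be sketched as follows in Java. The names are chosen freely for this illustration, toLightCoordinates stands for the mapping $$T_L \cdot T_V^{-1}$$, and a small bias term (not part of the algorithm above) is added only to reduce self-shadowing artefacts caused by limited depth precision:

// Sketch of the shadow test in the second pass of the two-pass depth buffer algorithm.
// Names are illustrative; zBufferLight corresponds to the z-buffer Z_L of the light source.
public class ShadowTestSketch {

    interface LightTransform {
        // maps viewer coordinates (xV, yV, zV) to light coordinates (xL, yL, zL)
        float[] toLightCoordinates(float xV, float yV, float zV);
    }

    public static boolean isInShadow(float xV, float yV, float zV,
                                     LightTransform transform, float[][] zBufferLight,
                                     float bias) {
        float[] p = transform.toLightCoordinates(xV, yV, zV);
        int xL = Math.round(p[0]);
        int yL = Math.round(p[1]);
        // the point is in shadow if an object closer to the light source was recorded there
        return p[2] > zBufferLight[yL][xL] + bias;
    }
}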

9.7 Opacity and Transparency


Transparency means that one can partially see through a fragment of an object or surface, such as a tinted glass pane (see Chap. 6 on colour and the alpha channel in RGBA). Transparency of a surface means here that only a part of the light from the fragments of an object behind it is transmitted, but no further distortion takes place. This is in contrast to, for example, a frosted glass pane, which allows light to pass through but behind which no contours can be seen. This frosted glass effect is called translucency and is not considered here, nor is transparency in combination with refraction. Opacity is the opposite of transparency. Strictly speaking, opacity is defined with the same alpha channel: alpha equal to 0 means transparent, whereas alpha equal to 1 means opaque. OpenGL uses the term opacity in this context.
To explain how opacity can be modelled, a surface $$F_2$$ is
considered, which lies behind the transparent surface $$F_1$$. For
the model of interpolated or filtered transparency , an opacity coefficient
$$k_{\text{ opaque }} \in [0,1]$$ must be selected, which indicates
the proportion of light that passes through the surface $$F_1$$. For
$$k_{\text{ opaque }}=1$$, the surface $$F_1$$ is completely
opaque. For $$k_{\text{ opaque }}= 0$$, the surface $$F_1$$
is transparent. The colour intensity $$I_P$$ of a point P on the
transparent area $$F_1$$ is then calculated as follows:
$$\begin{aligned} I_P \; = \; k_{\text{ opaque }}\cdot I_1 + (1 - k_{\text{ opaque }}) \cdot I_2. \end{aligned}$$ (9.9)
Here, $$I_1$$ is the intensity of the point, which would result from
the observation of the surface $$F_1$$ alone if it were opaque.
$$I_2$$ is the intensity of the point behind it, which would result
from the observation of the surface $$F_2$$ alone if $$F_1$$
were not present. The transparent surface is assigned a colour with the
intensities $$I_1$$ for red, green and blue. It should be noted that k is
normally used for the absorption coefficient. This means that in this
simplified model the scattering is neglected. Transmission, strictly
speaking, considers both absorption and scattering. Transparent surfaces
make the calculations more difficult for visibility considerations.
Here the depth buffer algorithm shall be considered as an example. If a
transparent surface is to be entered into the z- and the colour buffer, the
following problems arise:
Which z-value should be stored in the z-buffer? If the old value of the object O behind the transparent surface is kept, an object that lies between O and the transparent surface and is entered later would completely overwrite the colour buffer due to its smaller z-coordinate, even though it lies behind the transparent surface. If instead one uses the z-value of the transparent surface, such an object would not be entered into the colour buffer at all, although it should at least be visible behind the transparent surface.
Which value should be transferred to the colour buffer? If you use the
interpolated transparency according to Eq. (9.9), the information about
the value $$I_1$$ is lost for later objects directly behind the
transparent area. The value $$I_1$$ alone would not be sufficient
either. A not quite optimal possibility is to use alpha blending. As the usual storage of RGB colours occupies three bytes, and blocks of two or four bytes are easier to manage in the computer, RGB values are often provided with a fourth value, called alpha. This alpha value contains the opacity coefficient $$k_{\text{ opaque }}$$. But even
when using this alpha value, it is not clear to which object behind the
transparent surface the alpha blending, i.e., Eq. (9.9), should be applied.
To display transparency correctly, all opaque surfaces should therefore be entered first in the depth buffer algorithm. Only then are the transparent surfaces added with alpha blending. Again, this can cause problems if several transparent surfaces lie behind each other. It must be ensured that the transparent surfaces are entered in the correct order, i.e., from back to front. Usually, they are sorted according to their z-coordinates.
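In the OpenGL, interpolated transparency according to Eq. (9.9) is typically realised with blending. The following hedged JOGL sketch shows one common way of doing this (it is a sketch of the procedure just described, not the book's implementation); drawOpaque and drawTransparentSortedBackToFront are placeholders for scene code, and disabling the depth writes for the transparent pass is one possible design choice:

import com.jogamp.opengl.GL;

// Hedged sketch: rendering transparent surfaces with alpha blending in JOGL.
public class AlphaBlendingSketch {

    public static void renderScene(GL gl, Runnable drawOpaque, Runnable drawTransparentSortedBackToFront) {
        // 1. Render all opaque surfaces with the normal depth buffer algorithm.
        drawOpaque.run();

        // 2. Render transparent surfaces from back to front with blending enabled.
        //    The blend equation corresponds to Eq. (9.9) with alpha = k_opaque.
        gl.glEnable(GL.GL_BLEND);
        gl.glBlendFunc(GL.GL_SRC_ALPHA, GL.GL_ONE_MINUS_SRC_ALPHA);
        gl.glDepthMask(false);          // keep the depth test, but do not overwrite depth values
        drawTransparentSortedBackToFront.run();
        gl.glDepthMask(true);
        gl.glDisable(GL.GL_BLEND);
    }
}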

Fig. 9.41 50% (left) and 25% (right) screen door

An alternative solution to this problem is screen door transparency, which is based on a similar principle as the halftone method introduced in Sect. 6.1. The mixture of the intensity of the transparent surface with that of the object behind it is simulated by the proportionate colouring of the fragments. With an opacity coefficient of $$k_{\text{ opaque }}=0.75$$, every fourth pixel would be coloured with the colour of the object behind it, all other pixels with the colour of the transparent surface. Figure 9.41 illustrates this principle using greatly enlarged fragments. The red colour is assigned to the transparent surface, the green colour to the object behind it. On the left of the figure, $$k_{\text{ opaque }}= 0.5$$ was used; on the right, $$k_{\text{ opaque }} = 0.75$$.
Screen door transparency could be implemented with the depth buffer algorithm. For each pixel, the z-value of the surface from which its colour originates would be entered in the z-buffer. So with $$k_{\text{ opaque }} = 0.75$$, three-quarters of the points would receive the z-value of the front, transparent surface and a quarter the z-value of the surface behind it. An object in front of the transparent surface that is processed later would correctly overwrite everything. An object behind the already entered opaque object would not be entered at all. An object between the transparent surface and the previously entered object would overwrite exactly those pixels that were assigned to the object behind, i.e., the new object would get the same proportion of pixels behind the transparent surface.
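A very simple way to realise the described screen-door decision per fragment is a fixed pixel mask. The following Java sketch (illustrative only) decides from the window coordinates whether a fragment shows the transparent surface or the object behind it, here for $$k_{\text{ opaque }} = 0.75$$:

// Sketch of a screen-door decision for k_opaque = 0.75 (illustrative).
// Returns true if the pixel (x, y) should show the transparent surface,
// false if it should show the object behind it (every fourth pixel).
public class ScreenDoorSketch {

    public static boolean showTransparentSurface(int x, int y) {
        // a regular 2x2 pattern: exactly one of the four positions shows the object behind
        return !((x % 2 == 0) && (y % 2 == 0));
    }
}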
Screen door transparency in combination with the depth buffer
algorithm, like the halftone methods, only leads to satisfactory results if the
resolution is sufficiently high. With opacity coefficients close to 50%, the
results can hardly be distinguished from the interpolated transparency. With
very weak or very strong opacity, the screen door transparency tends to
form dot patterns.

9.8 Radiosity Model


When dealing with reflections from surfaces in Sect. 9.2, it is explained that
objects that have their own luminosity are not considered a light source in
the scene unless you add an additional light source at the position of the
object. This simplification is a special case of the basic assumption in the
reflection calculation that the light emitted by an object by its own
luminosity or by reflection does not contribute to the illumination of other
objects. By adding constant scattered light to the scene, one tries to model the effect that objects are not only illuminated by light coming directly from light sources. The result of this simplifying modelling is partly very sharp edges and contrasts. For example, if a light and a dark wall form a corner, the light wall will radiate light onto the dark wall. This effect can be seen especially where the light wall meets the dark one, that is, at the edge between the two walls. With the illumination models for light
sources from Sect. 9.1 and for reflection from Sect. 9.2, the interaction of
light between individual objects is neglected. This results in a
representation of the two walls as shown in Fig. 9.42 on the left. This
results in a very sharp edge between the two walls. In the right part of the
figure, it was taken into account that the bright wall radiates light onto the
dark one. This makes the dark wall appear slightly brighter near the corner
and the edge between the two walls is less sharp.

Fig. 9.42 Light emission of an object onto another

A simple approach to take this effect into account is to use irradiance mapping. However, irradiance maps are not used as textures in the classical sense (i.e., as an image on an object surface), but to calculate the light that other objects radiate onto a given object. To do this, the shading of the scene is first calculated as in Sect. 9.2, without considering the interaction between the objects. Afterwards, the corresponding irradiance map is determined for each object. The colour value resulting from this in each fragment is treated as if it originated from an additional light source and is added to the previously calculated intensities according to the reflection properties of the object surface. The radiosity model [3, 7]
bypasses these recursive calculations. The radiosity $$B_i$$ is the
rate of energy that the surface $$O_i$$ emits in the form of light. This
rate is composed of the following individual values when only diffuse
reflection is considered:
The inherent luminosity $$E_i$$ of the surface. This term is non-zero only for the light sources within the scene.
The reflected light that is emitted from other surfaces onto the surface
$$O_i$$. For example, if $$O_i$$ is a surface section of the dark
wall in Fig. 9.42 and $$O_j$$ is a surface portion of the light-coloured
wall, the energy of the light radiated by $$O_j$$ and reflected by $$O_i$$ is calculated as follows:
$$ \varrho _i \cdot B_j\cdot F_{ji} $$
$$\varrho _i$$ is the reflection coefficient of the surface $$O_i$$, and $$B_j$$ is the energy, still to be determined, which $$O_j$$ radiates. $$F_{ji}$$ is an (also dimensionless) shape or configuration factor
specifying the proportion of light emitted from the surface piece
$$O_j$$ that strikes $$O_i$$. In $$F_{ji}$$ the shape, the size
and the relative orientation of the two surface pieces to each other are
taken into account. For example, if the two pieces of surface are at a
perpendicular angle to each other, less light will pass from one piece of
the surface to the other than when they are opposite. The exact
determination of the form factors is explained below.
In the case of transparent surfaces, the energy of light shining through the
surface from behind must also be taken into account. For reasons of
simplification, transparency is not taken into account here.
The total energy of light emitted by the surface $$O_i$$ is the
sum of these individual energies. For n surfaces—including the light
sources—in the scene, one gets
$$\begin{aligned} B_i \; = \; E_i + \varrho _i \cdot \sum _{j=1}^n B_j\cdot F_{ji}. \end{aligned}$$ (9.10)
This results in the following linear system of equations with the unknowns
$$B_i$$
$$\begin{aligned} \left( \begin{array}{cccc} 1-\varrho _1 F_{1,1} & -\varrho _1 F_{1,2} & \ldots & -\varrho _1 F_{1,n} \\ -\varrho _2 F_{2,1} & 1-\varrho _2 F_{2,2} & \ldots & -\varrho _2 F_{2,n} \\ \vdots & \vdots & & \vdots \\ -\varrho _n F_{n,1} & -\varrho _n F_{n,2} & \ldots & 1-\varrho _n F_{n,n} \end{array} \right) \cdot \left( \begin{array}{c} B_1\\ B_2\\ \vdots \\ B_n \end{array} \right) \; = \; \left( \begin{array}{c} E_1\\ E_2\\ \vdots \\ E_n \end{array} \right) . \end{aligned}$$ (9.11)
This system of equations must be solved for each of the three primary colours red, green and blue. The number of equations is equal to the number of surface pieces considered, i.e., usually the number of triangles in the scene plus the light sources. If the triangles are very large, they have to be tessellated in advance, otherwise inaccurate estimates will be made; this increases the number of surface pieces. However, the number of light sources will be negligible in relation to the triangles. It is, therefore, a system of equations with hundreds or thousands of equations and unknowns. For typical scenes, it will be a sparse system of equations. This means that for most pairs of surfaces the form factor $$F_{ij}= 0$$, because they are either oriented at an unfavourable angle to each other or are too far apart.

Fig. 9.43 Determination of the form factor with the normals $$n_{i}$$ and $$n_{j}$$ of the
two infinitesimal small areas $$dA_i$$ and $$dA_j$$ which are at a distance r from each
other

To calculate the form factor of the surface $$A_i$$ to the surface $$A_j$$, the two surfaces are divided into infinitesimally small surface pieces $$dA_i$$ and $$dA_j$$ with distance r, and these surface pieces are summed over, i.e., integrated. If the surface piece $$dA_j$$ is visible from $$dA_i$$, one gets the differential form factor with the designations from Fig. 9.43
$$ dF_{d_i,d_j} \; = \; \frac{\cos (\theta _i)\cdot \cos (\theta _j)}{\pi \cdot
r^2} \cdot dA_j. $$
It decreases quadratically with the distance between the two surface pieces, corresponding to the attenuation. Also, the angle between the two surfaces plays an important role. If the surface pieces are facing each other, the form factor is largest, because then the normal vectors of the surface pieces are parallel to the connecting axis of their centres. As the angle increases, the form factor decreases until it reaches zero at an angle of $$90^{\circ }$$. For angles greater than $$90^{\circ }$$, the surface pieces do not illuminate each other; the cosine would take negative values. To exclude this case, the factor
$$ H_{ij} \; = \; \left\{ \begin{array}{cc} 1 &{} \text{ if } dA_j
\text{ is } \text{ visible } \text{ from } dA_i\\ 0 &{} \text{ otherwise }
\end{array} \right. $$
is introduced so that
$$ dF_{d_i,d_j} \; = \; \frac{\cos (\theta _i)\cdot \cos (\theta _j)}{\pi \cdot
r^2} \cdot H_{ij} \cdot dA_j $$
applies for the differential form factor. Through integration, one gets the form factor from the differential area $$dA_i$$ to the area $$A_j$$
$$ dF_{d_i,j} \; = \; \int _{A_j} \frac{\cos (\theta _i)\cdot \cos (\theta _j)}
{\pi \cdot r^2} \cdot H_{ij} \, dA_j. $$
By repeated integration, one finally gets the form factor of the area $$A_i$$ to the area $$A_j$$
$$ F_{i,j} \; = \; \frac{1}{A_i} \int _{A_i} \int _{A_j} \frac{\cos (\theta
_i)\cdot \cos (\theta _j)}{\pi \cdot r^2} \cdot H_{ij} \, dA_j \, dA_i. $$
For small areas, the form factor can be calculated approximately by placing
a hemisphere around the centre of gravity of the area. The form factor for
determining how much a surface $$A_i$$ contributes by its light to
the illumination of the considered surface piece results from the projection
of $$A_i$$ onto the hemisphere and the subsequent projection onto
the circular surface around the centre of the hemisphere. The proportion of
the projected area in the circle corresponds to the form factor. This
procedure is shown in Fig. 9.44. The proportion of the dark grey area on the
circular surface gives the form factor. A simpler, but coarser approximation
for the form factor was proposed by Cohen and Greenberg [2]. The
hemisphere is replaced by a half of a cube.

Fig. 9.44 Determination of the form factor according to Nusselt

For the solution of the system of equations (9.11) to determine the radiosity values $$B_i$$, techniques are also used that quickly arrive at good approximate solutions. The stepwise refinement [1] iteratively updates the values $$B_i$$ with the help of Eq. (9.10) by evaluating the equations. To do this, all values are first set to
$$B_i= E_i$$, i.e., all values except for the light sources are
initialised with zero. In the beginning, $$\varDelta B_i = E_i$$ is also
defined for the changes. Then the surface $$O_{i_0}$$ is selected for
which the value $$\varDelta B_{i_0}$$ is the highest. In the first
step, this is the brightest light source. This means that all $$B_i$$ are
recalculated by
$$\begin{aligned} B_i^{\text{(new) }} \; = \; B_i^{\text{(old) }} + \varrho _i\cdot F_{i_0i}\cdot \varDelta B_{i_0}. \end{aligned}$$ (9.12)
All $$\varDelta B_i$$ are also updated using
$$\begin{aligned} \varDelta B_i^{\text{(new) }} \; = \; \left\{ \begin{array}{ll} \varDelta B_i^{\text{(old) }} + \varrho _i\cdot F_{i_0i}\cdot \varDelta B_{i_0} & \text{ if } i\ne i_0\\ 0 & \text{ if } i = i_0. \end{array} \right. \end{aligned}$$ (9.13)
The light from the surface or light source $$O_{i_0}$$ was thus
distributed to the other surfaces. Then the surface with the highest value
$$\varDelta B_{i_0}$$ is selected again, and the calculations
according to (9.12) and (9.13) are carried out. This procedure is repeated
until there are only minimal changes or until a maximum number of
iteration steps has been reached. The advantage of stepwise refinement is
that it can be stopped at any time and—depending on the number of
iterations—provides a more or less good approximate solution.
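The stepwise refinement can be sketched compactly in code. The following Java sketch follows Eqs. (9.12) and (9.13); the form factor matrix, the reflection coefficients and all names are assumptions of this example and are not taken from the book:

// Sketch of stepwise refinement for the radiosity values according to Eqs. (9.12) and (9.13).
// f[i0][i] is the form factor F_{i0,i}, rho[i] the reflection coefficient, e[i] the self-emission E_i.
public class ProgressiveRefinementSketch {

    public static double[] solve(double[][] f, double[] rho, double[] e, int maxIterations) {
        int n = e.length;
        double[] b = e.clone();        // B_i = E_i
        double[] deltaB = e.clone();   // Delta B_i = E_i

        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // select the surface i0 with the largest not yet distributed radiosity
            int i0 = 0;
            for (int i = 1; i < n; i++) {
                if (deltaB[i] > deltaB[i0]) {
                    i0 = i;
                }
            }
            double shoot = deltaB[i0];
            // distribute the light of surface i0 to all surfaces
            for (int i = 0; i < n; i++) {
                double received = rho[i] * f[i0][i] * shoot;
                b[i] += received;                                        // Eq. (9.12)
                deltaB[i] = (i == i0) ? 0.0 : deltaB[i] + received;      // Eq. (9.13)
            }
        }
        return b;
    }
}

Instead of a fixed number of iterations, the loop can of course also be stopped as soon as the remaining changes fall below a threshold, as described above.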
The radiosity model leads to much more realistic results in lighting. The
radiosity models presented here, which only take diffuse reflection into
account, can be computed independently of the position of the observer.
For fixed light sources and no or only a few moving objects, the radiosity
values can be precalculated and stored as textures in lightmaps (see
Chap. 10), which are applied to the objects. The real-time calculations are
then limited to the display of the textures, specular reflection, and shading
of the moving objects.

9.9 Raycasting and Raytracing


Ray tracing is another image-space method for determining visibility. A distinction is made between raycasting, which is used for simple ray tracing (visibility only), and raytracing for lighting effects, which means ray tracing with reflections. As a rule, raytracing is implemented with refraction and with shadows (Whitted raytracing). For each pixel of the window to be displayed on the projection plane, a ray is calculated, and it is determined which object the ray intersects first; this determines the colour of the pixel. Ray tracing is
suitable for parallel projections, but can also be used in the case of
perspective projection without transforming the perspective projection into
a parallel projection. In a parallel projection, the rays run parallel to the
projection direction through the pixels. In a perspective projection, the rays
follow the connecting lines between the projection centre and the pixels.
Figure 9.45 illustrates the raytracing technique. In the picture, the pixels
correspond to the centres of the squares on the projection plane.

Fig. 9.45 Beam tracing

For a perspective projection with projection centre in the point $$(x_0,y_0,z_0)$$, the ray to the pixel with the coordinates $$(x_1,y_1,z_1)$$ can be parameterised as a straight-line equation as follows:
$$\begin{aligned} x \; = \; x_0 + t\cdot \varDelta x, \quad y \; = \; y_0 + t\cdot \varDelta y, \quad z \; = \; z_0 + t\cdot \varDelta z \end{aligned}$$ (9.14)
with
$$ \varDelta x \; = \; x_1-x_0, \varDelta y \; = \; y_1-y_0, \varDelta z \; =
\; z_1-z_0. $$
For values $$t < 0$$, the beam (interpreted as a straight line) is
located behind the projection centre, for $$t \in [0,1]$$ between the
projection centre and the projection plane, for $$t > 1$$ behind the
projection plane.
To determine whether and, if so, where the ray intersects a plane polygon, first the intersection point of the ray with the plane $$Ax+By+Cz+D = 0$$ in which the polygon lies is determined. Then it is checked whether the point of intersection with the plane lies within the polygon. If the straight-line equation of the ray (9.14) is inserted into the plane equation, the value for t is obtained:
$$ t \; = \; -\frac{Ax_0+By_0+Cz_0+D}{A\varDelta x + B\varDelta y +
C\varDelta z}. $$
If the equation $$Ax+By+Cz+D = 0$$ describes a plane, the denominator can only become zero if the ray is parallel to the plane. In this case, the plane is irrelevant for the considered pixel through which the ray passes. To clarify whether the intersection point lies within the polygon, the polygon and the intersection point are projected into one of the coordinate planes by omitting one of the three coordinates. Usually, the coordinate plane most parallel to the polygon is selected for this projection, i.e., the coordinate belonging to the coefficient of the normal vector (A, B, C) with the largest absolute value is omitted.
After the projection, using the odd-parity rule (see Sect. 7.5), it can be
determined whether the point is within the polygon. Figure 9.46 illustrates
this procedure.

Fig. 9.46 Projection of a polygon to determine whether a point lies within the polygon
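The calculation of the intersection parameter t and the choice of the coordinate to drop can be sketched as follows in Java (names chosen freely; the point-in-polygon test with the odd-parity rule is only indicated):

// Sketch of the ray-plane intersection from Eq. (9.14) and the choice of the
// coordinate to omit for the 2D point-in-polygon test (names illustrative).
public class RayPlaneIntersection {

    /** Returns the ray parameter t, or Double.NaN if the ray is parallel to the plane. */
    public static double intersect(double[] origin, double[] delta,
                                   double a, double b, double c, double d) {
        double denominator = a * delta[0] + b * delta[1] + c * delta[2];
        if (denominator == 0.0) {
            return Double.NaN;   // ray parallel to the plane
        }
        return -(a * origin[0] + b * origin[1] + c * origin[2] + d) / denominator;
    }

    /** Index (0, 1 or 2) of the coordinate to omit: the one with the largest |normal| component. */
    public static int coordinateToDrop(double a, double b, double c) {
        double absA = Math.abs(a), absB = Math.abs(b), absC = Math.abs(c);
        if (absA >= absB && absA >= absC) {
            return 0;
        }
        return (absB >= absC) ? 1 : 2;
    }
}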

The rays are usually calculated in parallel for efficiency reasons. Therefore, no dependencies between neighbouring pixels should be created in the calculations. Due to the ray tracking, aliasing effects can occur (see Chap. 7) if the far clipping plane is very far away. The reason is that very distant objects are hit more or less randomly by the rays, so that it can no longer be assumed that neighbouring pixels get the same colour. Supersampling can be used to avoid this effect. This is done by
calculating several rays for a pixel, as shown in Fig. 9.47, and using the—
possibly weighted—average of the corresponding colours. However,
supersampling is very complex in practice because each pixel is
individually oversampled.

Fig. 9.47 Supersampling

The presented raycasting method for visibility calculations is a simple ray tracing method whose principle can also be used for illumination. For this purpose, rays emitted by the observer or, in case of parallel projection, parallel rays are sent through the pixel grid. A ray is tracked until it hits the nearest object. Every object in the three-dimensional scene must be tested for this, which is very costly. At the hit point, the usual calculations for the illumination are first carried out. In Fig. 9.48, the ray first hits a pyramid. At this point, the light source present in the scene is taken into account, which contributes to a diffuse reflection. The ray is then followed further in the direction of the specular reflection, but in the reverse direction, i.e., starting from the observer and not from the light source. If it hits another object, the specular reflection is again tracked “backward” until a maximum recursion depth is reached or until a light source or no other object is hit. It should be noted that this procedure assumes area light sources, i.e., light sources with a spatial extent, since a ray will usually not hit a single point in space. Finally, the light components caused by specular reflections on the other objects along the further traced ray must be taken into account in the illumination of the object hit first. This standard procedure was developed by Turner Whitted and uses the light sources exclusively for local illumination. This type of ray tracing is called raytracing, as opposed to simple raycasting.

Fig. 9.48 Recursive Raytracing
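A heavily simplified sketch of this recursive scheme in Java is given below. All types and helper methods (Ray, Hit, Scene, shadeLocal, reflect) are illustrative stubs and only indicate where scene intersection, local illumination and the reflected ray would be computed in a real raytracer.

```java
/** Minimal sketch of Whitted-style recursive raytracing; all types are illustrative stubs. */
public final class RecursiveRaytracerSketch {

    static final int MAX_DEPTH = 3;              // maximum recursion depth for reflected rays

    /** Illustrative ray: origin o and (normalised) direction d. */
    static final class Ray { double[] o = new double[3]; double[] d = new double[3]; }

    /** Illustrative hit record: hit point, surface normal, local colour, reflectivity. */
    static final class Hit {
        double[] point = new double[3];
        double[] normal = new double[3];
        float[] localColour = {0.5f, 0.5f, 0.5f};
        float reflectivity = 0.3f;
    }

    /** Illustrative scene stub: a real implementation tests all objects and returns the nearest hit. */
    interface Scene { Hit intersect(Ray ray); }

    float[] trace(Scene scene, Ray ray, int depth) {
        Hit hit = scene.intersect(ray);          // nearest intersection along the ray
        if (hit == null) {
            return new float[] {0f, 0f, 0f};     // no object hit: background colour
        }
        float[] colour = shadeLocal(hit);        // local illumination at the hit point
        if (depth < MAX_DEPTH && hit.reflectivity > 0f) {
            Ray reflected = reflect(ray, hit);   // follow the specular reflection "backwards"
            float[] indirect = trace(scene, reflected, depth + 1);
            for (int k = 0; k < 3; k++) {
                colour[k] += hit.reflectivity * indirect[k];
            }
        }
        return colour;
    }

    /** Placeholder for a local illumination model (e.g. Phong); here simply the stored colour. */
    float[] shadeLocal(Hit hit) {
        return hit.localColour.clone();
    }

    /** Reflects the ray direction at the surface normal: r = d - 2 (d . n) n. */
    Ray reflect(Ray ray, Hit hit) {
        double dot = 0;
        for (int k = 0; k < 3; k++) dot += ray.d[k] * hit.normal[k];
        Ray r = new Ray();
        for (int k = 0; k < 3; k++) {
            r.o[k] = hit.point[k];
            r.d[k] = ray.d[k] - 2 * dot * hit.normal[k];
        }
        return r;
    }
}
```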

9.10 Exercises
Exercise 9.1 In OpenGL, define one spotlight each in the colours red,
green and blue. Aim the three headlights at the front side of a white cube in
such a way that a white colouration appears in the middle of the surface and
the different colours of the headlights can be seen at the edges.

Exercise 9.2 Illuminate an area of a three-dimensional scene. Calculate


the specular portion of illumination for one point of the surface. The point
to be illuminated lies at the Cartesian coordinates (1, 1, 3). Assume that you
have already mirrored the vector from the point to the light source at the
normal and obtained the vector (0, 1, 0). The eyepoint is located at the
coordinates (4, 5, 3). Use the formula discussed in this chapter to determine
the specular light component I
$$\begin{aligned} I = I_L\cdot k_{\text {sr}} \cdot \cos \alpha ^{n}.
\end{aligned}$$
Assume that $$k_{\text {sr}} = 1$$, $$I_L = \frac{1}{2}$$, and
$$n= 2$$ applies. Make a sketch of the present situation. Calculate I
using the information provided.

Exercise 9.3 A viewer is located at the point (2, 1, 4) and looks at the
(y, z) plane, which is illuminated by a point light source at infinity. The light
source lies in the direction of the vector (1, 2, 3). At which point of the
plane does the observer see specular reflection?

Exercise 9.4 Model a lamp as an object in the OpenGL and assign a light
source to it at the same time. Move the lamp with the light source through
the scene.

Fig. 9.49 Square for the application Flat- or Gouraud-Shadings

Exercise 9.5 The square of Fig. 9.49 should be coloured with integer grey
values. Remember that grey values have identical RGB entries, so the
figure simply contains a single number per pixel. For the grey value representation
of the individual pixels, 8 bits are available. For the corner points of the
square, the grey values given in the figure have already been calculated
by the illumination equation.
a)
Triangulate the square and colour it using the flat shading method.
Write the calculated colour values in the boxes representing each
fragment. Note: You can decide whether to colour the fragments on the
diagonal between two triangles with the colour value of one or the
other triangle.
b)
Triangulate the square and colour it using the Gouraud shading
method. Write the calculated colour values in the boxes representing
each fragment. Round off non-integer values.

Exercise 9.6 In the radiosity model, should back-face removal be applied before setting up the illumination equation (9.11)?
Exercise 9.7 Let there be the eyepoint $$(x_{0},y_{0},z_{0})^{T}$$
with $$x_{0},y_{0},z_{0}\in \mathbb {R}$$ and a pixel
$$(x_{1},y_{1},z_{1})^{T}$$ with
$$x_{1},y_{1},z_{1}\in \mathbb {N}$$ on the image area. Furthermore,
the raycasting ray as a straight line
$$\begin{aligned} \left( \begin{array}{c} x\\ y\\ z\end{array}\right)
=\left( \begin{array}{c} x_{0}\\ y_{0}\\ z_{0}\end{array}\right) +t\cdot
\left( \begin{array}{c} x_{1}-x_{0}\\ y_{1}-y_{0}\\ z_{1}-
z_{0}\end{array}\right) \end{aligned}$$
and a sphere
$$\begin{aligned} (x - a)^{2} + (y - b)^{2} + (z - c)^{2} = r^{2}
\end{aligned}$$
with the centre $$(a,b,c)\in \mathbb {R}^{3}$$ and the radius
$$r\in \mathbb {R}$$.
a)
Develop the raycasting calculation rule, which can be used to calculate
the intersection of the raycasting beam with the sphere surface. Sketch
the situation and give the necessary steps for the general case.
b)
How is the associated normal vector to the point of intersection
obtained?
Let there be the eyepoint in the coordinate origin and a billiard ball with
the centre point $$(\sqrt{3},\sqrt{3},\sqrt{3})$$ and the radius
$$r= \sqrt{3}$$.
Determine the visible intersection, if any, of the billiard ball surface
with the raycasting beam, each passing through the pixel
$$(x_1,y_1,z_1)^{T}$$ defined below. Determine the normal vector
for each intersection point visible from the eyepoint.
c)
Specify the visible intersection and the normal vector for the pixel
$$(x_1,y_1,z_1)^{T}=(1,1,1)^{T}$$.
d)
Specify the visible intersection and the normal vector for the pixel
$$(x_1,y_1,z_1)^{T}=(0,1,1)^{T}$$.
References
1. M. F. Cohen, S. E. Chen, J. R. Wallace and D. P. Greenberg. “A Progressive Refinement Approach to Fast Radiosity Image Generation”. In: SIGGRAPH Comput. Graph. 22.4 (1988), pp. 75–84.
2. M. F. Cohen and D. P. Greenberg. “The Hemi-Cube: A Radiosity Solution for Complex Environments”. In: SIGGRAPH Comput. Graph. 19.3 (1985), pp. 31–40.
3. C. M. Goral, K. E. Torrance, D. P. Greenberg and B. Battaile. “Modeling the Interaction of Light between Diffuse Surfaces”. In: SIGGRAPH Comput. Graph. 18.3 (1984), pp. 213–222.
4. H. Gouraud. “Continuous Shading of Curved Surfaces”. In: IEEE Transactions on Computers C-20 (1971), pp. 623–629.
5. Mark J. Kilgard. OpenGL/VRML Materials. Tech. rep. Retrieved 26.8.2021. Silicon Graphics Inc, 1994. URL: http://devernay.free.fr/cours/opengl/materials.html.
6. A. Nischwitz, M. Fischer, P. Haberäcker and G. Socher. Computergrafik. 3rd edition. Vol. 1. Computergrafik und Bildverarbeitung. Wiesbaden: Springer Fachmedien, 2011.
7. T. Nishita and E. Nakamae. “Continuous Tone Representation of Three-Dimensional Objects Taking Account of Shadows and Interreflection”. In: SIGGRAPH Comput. Graph. 19.3 (1985), pp. 23–30.
8. B. T. Phong. “Illumination for Computer Generated Pictures”. In: Commun. ACM 18.6 (1975), pp. 311–317.
9. D. R. Warn. “Lighting Controls for Synthetic Images”. In: SIGGRAPH Comput. Graph. 17.3 (1983), pp. 13–21.

Footnotes
1 See [5] material property ruby.

2 See [5] material property ruby.

3 $$\alpha $$ is given in radians.

4 See [5] material property ruby.


5 See [5] material property ruby.
© Springer Nature Switzerland AG 2023
K. Lehn et al., Introduction to Computer Graphics, Undergraduate Topics in Computer Science
https://doi.org/10.1007/978-3-031-28135-8_10

10. Textures

Textures are images or graphics that are applied to the surface of an object.
They are used to define or change the appearance of the object's surface.
Textures can influence both the colour design and the geometry of an
object. The surfaces to which they are applied can be two-dimensional,
most commonly flat and rectangular, or three-dimensional and curved. In
OpenGL, such a curvature of the surface is approximated by a tessellation
of planar triangles; see Chap. 4. The aim is to represent scenes
photorealistically and efficiently with a very high level of detail without
increasing the complexity of the surface geometry. In addition, this
technique allows the same geometry to be visualised with different
textures, and the same texture to be applied to different geometries. In this
chapter, only two-dimensional textures are considered, i.e., two-dimensional
raster graphics mapped onto a three-dimensional geometry. In the simplest
case, a texture serves to use a colour gradient or pattern instead of a single
colour. For example, a patterned wallpaper would use a corresponding
texture that is placed on the walls; in this case, the texture would be applied
several times to fill the walls. A single picture hanging on a wall could also
be realised by a texture, but applied only once.

10.1 Texturing Process


Textures are often used to represent grains or structures. A woodchip
wallpaper gets its appearance from the fact that, in contrast to smooth
wallpaper, it has a three-dimensional structure that is fine but can be felt
and seen. The same applies to the bark of a tree, a brick-built wall or the
drape of a garment. Theoretically, such tiny three-dimensional structures
could also be approximated by even smaller, more detailed polygons.
However, the modelling and computational effort for representing the
scene would be unacceptable.
Textures can be any images. Usually, they are images of patterns or
grains, for example, a wood grain. If a surface is to be filled with a texture
in the form of a rectangular image, the position where the texture is to be
placed must be defined. The area to be filled need not be rectangular. When
drawing, clipping must be performed on the area to be textured. In many
cases, a single rectangular texture is not sufficient to fill a larger area. In
this case, the area should be filled by using the same texture several times
as with a tile pattern. Also, in this case, clipping must be performed on the
area to be textured. The positioning of the texture is specified in the form of
an anchor. The anchor defines a point from which the rectangular texture is
laid in all directions, like tiles. Figure 10.1 shows texture in the form of a
brick pattern on the left. On the right, in the figure, a rectangular area has
been filled with this texture. The black circle marks the anchor. Note that
interpolation is used to efficiently transfer the correct colour values of the
texture to the inside of the polygon.
When a texture is used several times to fill a surface, there are
sometimes clearly visible transitions at the edges where the textures are
joined together. This effect is also shown in Fig. 10.1, especially at the
horizontal seams. To avoid such problems, either a texture must be used
where such effects do not occur, or the texture to be used must be modified
so that the seams fit together better. How this is achieved depends on the
type of texture. For highly geometric textures, such as the brick pattern in
Fig. 10.1, the geometric structure may need to be modified. For somewhat
unstructured grains, for example, marble surfaces, simple techniques such
as colour interpolation at the edges, are sufficient as described in Sect. 6.4.

Fig. 10.1 Filling an area with a texture

A texture is, therefore, a mostly rectangular raster graphic image (see


Chap. 7), which is applied to a (three-dimensional) object surface, as shown
in Fig. 10.2.

Fig. 10.2 Using a texture

The steps of this texturing process are performed as follows for a point
on the surface of the object. The procedure is then explained:
1.
The object surface coordinates $$(x',y',z')$$ of the point are
calculated first.
2.
The (polygon) area relevant for $$(x',y',z')$$ is determined. From
this, surface coordinates (x, y) result relative to this parameterised two-
dimensional surface.
3.
From now on, the so-called texture mapping is applied:
a.
A projection is made into the parameter space to calculate u and v.
b.
The texture coordinates s and t are derived from u and v, with
consideration of correspondence functions $$k_{x}$$ and
$$k_{y}$$, if necessary.
4. If (s, t) does not fall precisely on a texel, the texture value is taken from
the nearest texel or obtained by bilinear interpolation of the four adjacent
texels.
5.
The appearance is modified with the extracted texture values of the
texel (s, t).
The parameterisation and the subsequent texture mapping will be
considered below using the example of a globe, which is a special case. The
globe is tessellated along the lines of latitude and longitude. This results in
triangular areas around the two poles (two-dimensional), as shown in
Fig. 10.3, whereas the remaining polygons are (two-dimensional)
rectangles. It should be noted that other types of tessellations of a sphere
also exist. The radius r of the sphere is constant, and the centre lies in the
coordinate origin. It should be the texture of a globe that can be mapped
onto this sphere; see Figs. 10.3 and 10.5. The resulting vertices on the
tessellated surface of the sphere are described continuously and iteratively
as parameterised two-dimensional coordinates so that they do not have to be
specified individually. For this purpose, the angles $$\alpha $$ and
$$\beta $$ of the latitude and longitude circles of the tessellation are
used as follows (see Fig. 10.4): Each point on the surface can be described
with suitable angles $$0\le \alpha \le \pi $$ and
$$0\le \beta \le 2\pi $$ in the form
$$(x',y',z')=(r\cdot \sin (\alpha )\cdot \cos (\beta ), r\cdot \sin (\alpha )\cdot \sin (\beta ),r\cdot \cos (\alpha ))$$
. This means that each corner point (resulting from the tessellation) is
parameterised in two-dimensional coordinates
$$(x,y)=(\beta ,\alpha )$$ (see also polar or spherical coordinates). In
order to map the texture onto this sphere, certain texels can be assigned to
the vertices of the sphere. The texture is not repeated in either the x- or y-
direction ( $$k_{x}=1$$ and $$k_{y}=1$$ for the
correspondence functions). The following applies:
$$u=\frac{x}{x_{\text {width}}}\cdot k_{x}=\frac{\beta }{2\pi }\cdot 1 = \frac{\beta }{2\pi }$$
and, since $$u\in [0,1]$$, also $$s=\frac{\beta }{2\pi }$$.
Analogously,
$$v=\frac{y}{y_{\text {height}}}\cdot k_{y}=\frac{\alpha }{\pi }\cdot 1 = \frac{\alpha }{\pi } = t$$
.
Fig. 10.3 Tessellation of a sphere

Fig. 10.4 Illustration of the angles $$\alpha $$ and $$\beta $$ that are used in the coordinates
of the surface points of the sphere

Fig. 10.5 Texture NASA’s Blue Marble (land surface, ocean colour, sea ice and clouds) [5]:
visualised with angles $$\alpha $$ and $$\beta $$
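As a small illustration, the texture coordinates of a vertex on such a sphere can be computed from its Cartesian coordinates, for example as follows. This is a minimal Java sketch under the conventions above (radius r, centre in the origin); the class and method names are chosen for illustration.

```java
/** Minimal sketch: texture coordinates (s, t) for a point on a sphere of radius r centred in the origin. */
public class SphereTexCoords {

    public static double[] texCoords(double x, double y, double z, double r) {
        double alpha = Math.acos(z / r);          // latitude angle, 0 <= alpha <= pi
        double beta = Math.atan2(y, x);           // longitude angle, -pi < beta <= pi
        if (beta < 0) {
            beta += 2 * Math.PI;                  // shift to 0 <= beta < 2*pi
        }
        double s = beta / (2 * Math.PI);          // s = beta / (2*pi)
        double t = alpha / Math.PI;               // t = alpha / pi
        return new double[] {s, t};
    }

    public static void main(String[] args) {
        // Example: a point on the equator of a unit sphere
        double[] st = texCoords(0, 1, 0, 1);
        System.out.println("s = " + st[0] + ", t = " + st[1]); // s = 0.25, t = 0.5
    }
}
```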

This results in the texture coordinates (s, t) of the vertices. To get the
texture coordinates of the points within a triangle or a square of the sphere,
they are interpolated in relation to the texture coordinates (s, t) of the
vertices. The procedure is identical to the other properties of the vertices
during rasterising. As discussed in Chap. 7, vertices (besides the texture
coordinates) are assigned specific additional properties, such as position
coordinates, colour and normals. In the rasterising process, the interpolated
texture coordinates (s, t) (the same as for interpolated position coordinates
and, if necessary, interpolated colour and normals) are determined for all
fragments whose centre is located within such squares or triangles. From
this, the corresponding texel, the texture’s pixel, is calculated and utilised.
For squares, this interpolation is bilinear, and for triangles barycentric. The
interpolation can also be adjusted to consider the perspective projection.
u and v do not necessarily have to lie between 0 and 1 as in this
example (for another example, see Fig. 10.12). In the following, therefore,
the cases in which u and v lie outside the interval $$[0,1]$$ are also considered.
Also, the calculated texture coordinates s and t do not have to correspond
precisely to the coordinates of a texel. This means that in order to obtain a
colour value or material properties of (s, t), they must either be taken from
the closest texel or interpolated between the colour values or material
properties of the four directly adjacent texels. This will be explained later in
this section.
If the values for u and v are outside the interval between 0 and 1, there
are possibilities to continue the textures beyond this range of numbers. The
texture can be applied in the x- or y-direction according to the following
choices:
Repeat, see example in Fig. 10.6,
Mirrored repeat, see example in Fig. 10.7,
Continuation of the values at $$s=1$$ or $$t=1$$ for values outside
these (clamp), see example in Fig. 10.8 or
Any assignment of value so that a border is created, see example in
Fig. 10.9.
In Fig. 10.8, the y-direction is repeated and the continuation clamp is
used in the x-direction, while in Fig. 10.9 a colour is used as a border in the
x-direction.

Fig. 10.6 Texture mapping: Repeated mapping of the texture on the (parameterised two-
dimensional) surface, twice in the x- and the y-direction
(Image source [4])

Fig. 10.7 Texture mapping: Repeated mirror image of the texture on the (parameterised two-
dimensional) surface, two times in the x- and y-directions
(Image source [4])

Fig. 10.8 Texture mapping: Repeated mapping of the texture on the (parameterised two-
dimensional) surface in the y-direction, then in the remaining x-direction, application of the texture
values at $$s = 1$$ is performed
(Image source [4])

Fig. 10.9 Texture mapping: Repeated mapping of the texture on the (parameterised two-
dimensional) surface in the y-direction, followed by the application of any colour value as a border in
conjunction with the x-direction
(Image source [4])

The exact calculation of the responsible texel per fragment is achieved


using so-called correspondence functions. In the following, this is
illustrated through the repeated application of the texture (repeat). As
shown in Fig. 10.10, the texel is calculated to the (parameterised two-
dimensional) point (x, y), which is projected into the parameter space with u
and v. As shown in Fig. 10.6, the texture should be repeated in the x- and y-
directions. In this example, the texture is repeated four times in the x-
direction and twice in the y-direction. Thus, the correspondence functions
are derived in the x-direction with $$k_{x}=4$$ and in the y-direction
with $$k_{y}=2$$. In the following, the correct s, t-values for (x, y)
are calculated. A detour via the parameters u and v in the so-called area
space is done as follows:

Fig. 10.10 Texture mapping: Mapping the texture coordinates (s, t) to the (parameterised two-
dimensional) surface coordinates (x, y)
(Image source [4])

$$\begin{aligned} u \; =\; \frac{x}{x_{\text {width}}}\cdot k_{x} \end{aligned}$$
(10.1)
and
$$\begin{aligned} v \; =\; \frac{y}{y_{\text {height}}}\cdot k_{y}. \end{aligned}$$
(10.2)
If the (parameterised two-dimensional) surface has a width of
$$x_{\text {width}}=400$$ and a height of
$$y_{\text {height}}=200$$, and $$(x,y)=(320,160)$$ is the
(parameterised two-dimensional) point, formulas 10.1 and 10.2 yield
$$u=3.2$$ and $$v=1.6$$. s and t can easily be determined from
u and v; one only needs to look at the decimal places. Thus
$$s=u \mod 1 = 0.2$$ and $$t= v \mod 1 = 0.6$$.
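A minimal Java sketch of this computation (parameter names chosen here for illustration):

```java
/** Minimal sketch: mapping a surface point (x, y) to texture coordinates (s, t) with repetition. */
public class TexelLookupSketch {

    public static double[] surfaceToTexture(double x, double y,
                                            double width, double height,
                                            double kx, double ky) {
        double u = x / width * kx;    // Eq. (10.1)
        double v = y / height * ky;   // Eq. (10.2)
        double s = u % 1.0;           // repeat: keep only the fractional part
        double t = v % 1.0;
        return new double[] {s, t};
    }

    public static void main(String[] args) {
        // Example from the text: 400 x 200 surface, point (320, 160), kx = 4, ky = 2
        double[] st = surfaceToTexture(320, 160, 400, 200, 4, 2);
        System.out.println("s = " + st[0] + ", t = " + st[1]); // s = 0.2, t = 0.6 (up to rounding)
    }
}
```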
If (s, t) corresponds to the texture coordinates of a texel, its colour or
material properties for the (parameterised two-dimensional) point (x, y) are
taken over. Identifying the corresponding texel is easy as long as (s, t)
coincides exactly with a texel. However, this is generally not the case.
For texture mapping, it is essential to define which procedure should be
performed as soon as the calculated (s, t) is not mapped to a texel but lies
somewhere between four texels. There are two procedures in question.
Either one chooses the colour value of the texel closest to (s, t) (nearest
neighbour) or one interpolates between the colour values of the four texels
directly adjacent to (s, t) with bilinear interpolation. The calculation of the
nearest neighbour is very efficient. In the bilinear interpolation method, the
colour values of all four neighbouring texels of (s, t) are interpolated (see
Fig. 10.11).

Fig. 10.11 Bilinear interpolation

Let $$(s_{i}, t_{j})$$, $$(s_{i},t_{j+1})$$,
$$(s_{i+1},t_{j})$$ and
$$(s_{i+1},t_{j+1})$$ with $$i,j\in \mathbb {N}$$ be these
four neighbouring texels. In addition, let $$C(s_{k},t_{l})$$ be the
colour value belonging to texel $$(s_{k},t_{l})$$ with
$$k,l\in \mathbb {N}$$. Then C(s, t) is determined by
interpolation first in the s- and then in the t-direction, i.e., bilinearly, as follows. In
the s-direction, the result is
$$\begin{aligned} C(p_{1}) = \frac{s_{i+1}-s}{s_{i+1}-s_{i}}\cdot C(s_{i},t_{j}) + \frac{s-s_{i}}{s_{i+1}-s_{i}}\cdot C(s_{i+1},t_{j}) \end{aligned}$$
$$\begin{aligned} C(p_{2}) = \frac{s_{i+1}-s}{s_{i+1}-s_{i}}\cdot C(s_{i},t_{j+1}) + \frac{s-s_{i}}{s_{i+1}-s_{i}}\cdot C(s_{i+1},t_{j+1}) \end{aligned}$$
and in the t-direction
$$\begin{aligned} C(s,t) = \frac{t_{j+1}-t}{t_{j+1}-t_{j}}\cdot C(p_{1}) + \frac{t-t_{j}}{t_{j+1}-t_{j}}\cdot C(p_{2}). \end{aligned}$$
The second procedure is more complicated than the first, but the advantage
is not to be underestimated. While the first method produces hard edges, the
second method works like a blur filter because of the interpolation, blurring
the transitions and therefore drawing them smoothly into each other. The
second procedure, interpolation, thus reduces the staircase effect at edges.
In this way, the primary colour of the surface induced by the texture is
obtained at the point under consideration. Figure 10.10 visualises this
relationship. This value must then be added to the lighting (see Chap. 9).
Here, one can additionally consider whether the texture should be glossy or
matt. It should be added that in addition to the individual assignment of
texture coordinates, there are also procedures for the automatically
calculated transfer of a texture. In this case, the object is wrapped with
simple primitive geometry, for example, a cuboid, a sphere or a cylinder.
Texture mapping is carried out on this primitive geometry to map it suitably
onto the object. It is also possible to calculate the texture with a
mathematical function.
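A minimal Java sketch of the bilinear interpolation described above, for a single colour channel (names chosen for illustration):

```java
/** Minimal sketch: bilinear interpolation of one colour channel between four neighbouring texels. */
public class BilinearInterpolation {

    /**
     * c00 = C(s_i, t_j), c10 = C(s_i+1, t_j), c01 = C(s_i, t_j+1), c11 = C(s_i+1, t_j+1);
     * fs and ft are the fractional positions of (s, t) between the texels, each in [0, 1].
     */
    public static double interpolate(double c00, double c10, double c01, double c11,
                                     double fs, double ft) {
        double p1 = (1 - fs) * c00 + fs * c10; // interpolation in the s-direction at t_j
        double p2 = (1 - fs) * c01 + fs * c11; // interpolation in the s-direction at t_j+1
        return (1 - ft) * p1 + ft * p2;        // interpolation in the t-direction
    }

    public static void main(String[] args) {
        // Example: grey values 0, 100, 50, 150 at the four texels, sample point in the middle
        System.out.println(interpolate(0, 100, 50, 150, 0.5, 0.5)); // 75.0
    }
}
```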

Fig. 10.12 Texture mapping: Unique size-mapping of the texture onto the (parameterised two-
dimensional) surface
(Image source [4])

10.1.1 Mipmap and Level of Detail: Variety in Miniature


Perspective projection makes nearby objects appear larger, while more
distant objects appear smaller. The same applies to the corresponding
polygons. If one wants to apply textures, one obviously needs textures with
a higher resolution for close polygons, while a high resolution is of little
use for polygons further away. Note that with texture mapping on distant polygons,
aliasing can still occur despite the use of bilinear interpolation. The
challenge that arises with more distant polygons is that many texels are
mapped to the surroundings of a fragment by perspective projection, and
thus the choice of only four colour values of directly adjacent pixels is not
sufficient for the calculation of the colour value. In this case, in order to
calculate the correct colour value, not only these four but all the texels in
the vicinity would have to be taken into account in the interpolation. If, on
the other hand, precisely one texture in a given resolution was chosen,
many pixels would meet a few texels for close objects (so-called
magnification), and many texels would meet a single pixel for more distant
objects (so-called minification). This would be a very complex calculation.
Instead of calculating it precisely, one chooses the approximation
method. It can be seen that at the optimal resolution, one pixel should fall
precisely on one texel or, in the case of bilinear interpolation, only the
colour values of the directly adjacent pixels should be of importance. This
is the reason for the so-called mipmapping, where mip is derived from
Multum in parvo (variety in miniature). There is a mipmap in the form of a
Gaussian pyramid, in which different resolutions of texture are stored. For a
polygon, the one that contains the most useful resolution is chosen.
Meaningful in this context means that both procedures (colour value
determined according to a nearest neighbour or bilinear interpolation)
provide consistent results. Figure 10.13 describes such a (here simplified)
representation of different resolutions for the realisation of the level of
detail.

Fig. 10.13 Simple example of the different resolutions for carrying out a Mipmapping: On the left is
the highest resolution level visualised from which, looking from left to right, the three further
hierarchical levels are created by interpolation

There are different types of implementations. For example, if you start


with the highest resolution and delete every second row and column in
every step, you save the resulting resolution to the next hierarchy level of
the pyramid. The conversion in Fig. 10.13 is done by dividing the fragments
into contiguous groups of four (delimited by broad lines), so-called quads,
whose interpolation, here averaging the RGB values, is stored in the
corresponding pixel of the next hierarchical level, also called mipmap level.
This corresponds to low-pass filtering and reduces aliasing. A mipmap is,
therefore, a pyramid of different resolutions, whose resolution of the next
hierarchy level (mipmap level) differs from the previous resolution by
exactly half. With this procedure, one can now use a lower resolution for
texture mapping of distant polygons. This means that the calculation of the
colour values must be carried out on a hierarchy level (mipmap level),
whose resolution is just coarse enough that the nearest neighbour or bilinear
interpolation delivers meaningful results. This is achieved by the
approximation of the most significant gradient increase in the x- or y-
direction, i.e., the approximation of the first (partial) derivative in the x- and
y-directions. This requires only a simple colour value difference in the
quads in the x- or y-direction. The higher the difference is, the lower the
resolution level (mipmap level) that must be selected.
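The construction of one mipmap level from the previous one, as described above, can be sketched in Java roughly as follows (one colour or grey-value channel; even image dimensions and the names are simplifying assumptions):

```java
/** Minimal sketch: building the next (coarser) mipmap level by averaging 2x2 quads. */
public class MipmapSketch {

    public static float[][] nextLevel(float[][] level) {
        int w = level.length / 2;
        int h = level[0].length / 2;
        float[][] next = new float[w][h];
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                // average of the four texels of the quad (low-pass filtering)
                next[x][y] = (level[2 * x][2 * y] + level[2 * x + 1][2 * y]
                        + level[2 * x][2 * y + 1] + level[2 * x + 1][2 * y + 1]) / 4f;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        float[][] base = {{0, 0}, {4, 8}};          // a 2x2 "texture"
        System.out.println(nextLevel(base)[0][0]);  // 3.0: average of 0, 0, 4, 8
    }
}
```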

10.1.2 Applications of Textures: Approximation of Light, Reflection, Shadow, Opacity and Geometry
Textures have many applications. In the following, various applications are
briefly presented. One can define a background texture, for example, to
create a skydome. More complex, more realistic lighting techniques, like
the radiosity model presented in Sect. 9.8, are expensive to compute in real
time. Under certain circumstances, however, at least the diffuse reflection
can be precalculated and applied to the surfaces as a texture, a so-called
lightmap, so that afterwards only the specular reflection must be taken into
account. A similar approach can be implemented for specular reflections
and stored as a gloss map.
Another similar method can be used for the calculation of shadows, in
the so-called shadow map. Here, the modified z-buffer algorithm for
calculating shadows in relation to a light source is performed in advance.
The distances to the light source are written as depth values into the depth
buffer. These values correspond to the fragments that are directly
illuminated by the light source. All other fragments of the same pixel,
which are further away from the light source, must therefore lie in its
shadow.
Textures can have an alpha channel. When they are drawn, the visibility
order must, therefore, be respected. Alpha mapping takes opacity into
account. The following values are assigned: 0 for completely transparent,
255 for opaque and values in between depending on their intensity for
partially transparent fragments. When using alpha mapping, the order in
which the fragments of a pixel are drawn is of utmost importance to achieve
the desired result. Opacity is discussed in detail in the colour chapter (see
Chap. 6).
Environment or reflection mapping is a method for displaying reflective
surfaces, for example, smooth water surfaces or wall mirrors. The viewer is
mirrored on the plane defined by the reflecting surface. This point is used as
a new projection centre for a projection onto the mirror plane. The resulting
image is then applied to the mirror as a texture when the scene is rendered
with the original observer position as the projection centre.
Figure 10.14 illustrates this procedure. Figure 10.15 shows an example.

Fig. 10.14 Displaying a mirror using reflection mapping

Fig. 10.15 Textured Utah teapot: environment mapping

When textures are used to represent relief-like structures, the reliefs


appear flat, especially when illuminated by a distinct light source, because
the texture contains no real three-dimensional information. If this
texture is applied to a cuboid, for example, the lighting calculation cannot
create a convincing three-dimensional illusion by shading. This is because
only one normal vector per side surface of the box can be included in the
illumination calculation. In fact, there are four normals, which are located
in the corners of the cuboid surface (all four are equally oriented). Fine
surface structures of the texture fall additionally within a surface triangle of
the rough tessellation of a cuboid side.

Fig. 10.16 Bump mapping

Fig. 10.17 A box without (top) and with (bottom) bump mapping

Note that up to now, the illumination calculation has been based on the
interpolated normals of the corresponding normal vectors of the vertices of
a polygon (usually a triangle). Also, note that potential normals adapted to a
three-dimensional relief structure (which is visualised by the texture)
usually have a different orientation than the interpolated normals of the
vertices. The challenge here is that the lighting calculation takes place on a
flat (polygon) surface, thus suggesting a flat surface while the texture
inconsistently visualises a high level of detail of the surface structure. This
leads to a striking discrepancy.
To create a better three-dimensional effect of such a texture on the (flat)
surface, bump mapping [1] is used. The surface on which the texture is
applied remains flat. In addition to the colour information of the texture, the
bump map stores information about the deviations of the normal vectors in
order to calculate the perturbed normal vectors belonging to the surface of
the relief structure. In the bump map, a perturbation value B(s, t) is stored
for each texture point (s, t), by which the corresponding point P on the
surface to which the texture is applied is displaced in the direction of the
surface normal vector. If the surface is given in parameterised form and the
point to be modified is $$P = (x ,y)$$ with corresponding texture
coordinates (s, t), the non-normalised normal vector at (s, t) for the
parameterised point P results from the cross product of the partial
derivatives of P with respect to s and t (note that the partial derivatives can
be approximated via the colour value differences in the s- and the t-direction):
$$ \textbf{n} \; = \; \frac{\partial P}{\partial s} \times \frac{\partial P}
{\partial t}. $$
If B(s, t) is the corresponding bump value at (s, t) for P, the new (with the
disturbance value B(s, t) shifted in direction $$\textbf{n}$$)
parameterised point $$P'$$ is obtained
$$ P' \; = \; P + B(s,t)\cdot \frac{\textbf{n}}{\parallel \textbf{n} \parallel
} $$
on the (parameterised) surface. A good approximation for the new
perturbed normal vector $$\mathbf{n'}$$ in this point $$P'$$
then provides
$$ \mathbf{n'} \; = \; \frac{\textbf{n} + \textbf{d}}{\parallel \textbf{n} + \textbf{d} \parallel } \quad \text{ with } \quad \textbf{d} \; = \; \frac{ \frac{\partial B}{\partial s} \cdot \left( \textbf{n} \times \frac{\partial P}{\partial t} \right) - \frac{\partial B}{\partial t} \cdot \left( \textbf{n} \times \frac{\partial P}{\partial s} \right) }{ \parallel \textbf{n} \parallel }. $$
Bump mapping allows varying normal vectors to be transferred to an
otherwise flat or smooth plane. The bump mapping, therefore, changes the
(surface) normals. Figure 10.16 illustrates how the perturbed normal vectors
belonging to a small trough are mapped to a flat surface. Note, however,
that no height differences are visible from the side view. Also, shadows are
not shown correctly due to the fine structures. Figure 10.17 shows a box
with and without bump mapping.
The new perturbed normals are used for the calculations in the Blinn–
Phong lighting model, as already mentioned. This changes the diffuse and
specular light components. Figure 10.18 clearly shows this change in the
diffuse and specular light portion. The superposition (which is shown as an
addition in this figure) results in the boxes in Fig. 10.17.

Fig. 10.18 A box and its light components of the Blinn–Phong lighting model without (top) and
with (bottom) bump mapping with the result shown in Fig. 10.17
The illusion of the three-dimensional relief structure is visible on each
side of the block. Such effects work well on flat surfaces.
A further optimisation of bump mapping is the so-called normal
mapping. The approach is identical to bump mapping, with the difference
that in normal mapping, the perturbed normals are saved in a texture, the so-
called normal map, and used in the subsequent illumination calculation.
This makes access to the new perturbed normals efficient. The procedure is
explained in the following. A normal N is usually normalised, which means
that $$-1\le x,y,z \le 1$$ for the normal coordinates (with a length of
normal equal to one) is valid. The normalised normals are stored per pixel
in a texture with RGB colour channels in the following form:
$$r=(x + 1)/2$$, $$g=(y+1)/2$$ and $$b=(z+1)/2$$.
When reading out from the normal map, this mapping is undone (i.e.,
$$x=2r-1$$, $$y=2g-1$$ and $$z=2b-1$$), and these
normals are then used in the lighting calculation.
The normal map can be easily created from a colour texture with an
image processing program such as Photoshop or GIMP, and can also be
used as a texture in OpenGL. The colour texture to be used is first
converted into a two-dimensional gradient field. Gradients can be
interpreted by approximating the first (partial) derivative in the x- and y-
directions, in the form of a height difference of neighbouring pixels in the
corresponding digital brightness or grey value image. Approximately, this
brightness or grey-value image can thus be understood as a heightmap. The
local differences of the brightness or grey values of near fragments in the x-
or y-direction in this heightmap are thus approximately calculated as
gradients in the x- or y-direction. These can be interpreted as tangent vectors in the
x- and y-directions, and the respective normal is obtained from their cross
product (see Sect. 4.7). After normalisation, these normals are stored
as RGB colour values in a normal map, as described above. Algorithms
for creating a normal map are explained in [3] and [6]. Figure 10.20 shows
the height and normal map of the texture visualised in Fig. 10.19.
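The construction of a normal map from a heightmap can be sketched in Java roughly as follows (central differences for the gradients; array bounds, scaling and names are simplifying assumptions):

```java
/** Minimal sketch: deriving packed RGB normals from a grey-value heightmap. */
public class NormalMapSketch {

    /** Returns the normal at (x, y) packed as RGB values in [0, 1]. */
    public static float[] packedNormal(float[][] height, int x, int y) {
        // approximate the partial derivatives with central differences
        float dx = (height[x + 1][y] - height[x - 1][y]) / 2f;
        float dy = (height[x][y + 1] - height[x][y - 1]) / 2f;
        // cross product of the tangents (1, 0, dx) and (0, 1, dy) gives (-dx, -dy, 1)
        float nx = -dx, ny = -dy, nz = 1f;
        float len = (float) Math.sqrt(nx * nx + ny * ny + nz * nz);
        nx /= len; ny /= len; nz /= len;
        // pack the normalised components from [-1, 1] into RGB values in [0, 1]
        return new float[] {(nx + 1) / 2f, (ny + 1) / 2f, (nz + 1) / 2f};
    }

    public static void main(String[] args) {
        float[][] flat = new float[3][3];               // constant height: flat surface
        float[] rgb = packedNormal(flat, 1, 1);
        System.out.printf("%.2f %.2f %.2f%n", rgb[0], rgb[1], rgb[2]); // 0.50 0.50 1.00
    }
}
```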

Fig. 10.19 Old stone brick wall as texture

Fig. 10.20 Heightmap (top) and normal map (bottom) of the texture from Fig. 10.19
However, the lighting conditions do not appear consistent, no matter
whether bump or normal mapping is used, if the surface to be simulated is
curved, as in the case of the Utah teapot. The reason is that the normal
vectors at the corner points of a triangle differ, and the normals used for the
per-fragment calculation are interpolated from them, whereas the perturbed
normal vectors computed from the bump or normal map refer to a flat
surface. This assumption does not hold in general. The
curvature behaviour must, therefore, be additionally mapped in order to
create the right illusion on curved surfaces. In order to map the curvature
behaviour correctly to the three vectors of the illumination calculation per
pixel (light vector L, the direction of view vector V and normal vector N;
see Chap. 9), a separate tangent coordinate system is installed for each
fragment. For each fragment, a new coordinate system is created, which
consists of two tangent vectors and the normal vector, the so-called shading
normal, which is interpolated from the normals of the corresponding
vertices of the (polygon) surface. Both tangent vectors can be
approximated in any fragment from the previously discussed two gradients
in the x- and the y-direction of the heightmap. Note that these coordinate
systems do not necessarily have to be orthogonal due to the curvature of the
surface. However, one can approximately assume that they are “almost”
orthogonal. For this reason, orthogonality is often heuristically assumed.
This simplifies some considerations, especially the change of coordinate
system, because the inverse of an orthonormal matrix is its transpose.
Nevertheless, care must be taken, because any
approximation can also falsify the result, if one expects accuracy. These
calculations are carried out either in the model coordinate system or in the
tangential coordinate system, and for this purpose, either the (interpolated)
surface normal or the perturbed normals are transformed. The two vectors L
and V of the illumination calculation must then be transformed by
coordinate system transformation (see Sect. 5.13) into the coordinate
system in which the perturbed normal vector N is present. These are thus
included in the lighting calculation, including the curvature behaviour.
The tangential coordinate system is based on the assumption that the
non-perturbed (i.e., interpolated) normal vector n is mapped to (0, 0, 1), i.e.,
the z-axis of the tangential coordinate system. Each perturbation applied
causes a deviation of this vector (0, 0, 1) in the x- or the y-direction. Since it
is assumed that the normal vectors are normalised, a deviation can only
occur in the hemisphere with $$0\le z\le 1$$. Apparently, it is
sufficient to consider the difference in the x- and the y-direction. This
difference can be calculated exactly from values read from the bump map or
normal map. Since in a normal map, the RGB values range from 0 to 255,
i.e., normalised from 0 to 1, the normalised RGB colour value 0
corresponds to the normal coordinate $$-1$$. The normalised RGB
colour value 1 corresponds to the normal coordinate 1. The normal
coordinates are linearly interpolated as RGB colour values between 0 and 1.
The normal vector (0, 0, 1), therefore, is displayed as RGB colour value
(0.5, 0.5, 1.0). If the illumination calculation takes place in this tangential
coordinate system with the transformed three vectors, it represents the
illumination conditions as intensities between 0 and 1. The transformation
from the tangential coordinate system into the model coordinate system can
be performed by coordinate system transformation (see Sect. 5.13) and thus
by matrix multiplication with the following matrix. In homogeneous
coordinates, the three basis vectors T, B and N are used as columns, and
(0, 0, 0) as the coordinate origin:
$$\left( \begin{array}{cccc} T_x & B_x & N_x & 0\\ T_y & B_y & N_y & 0\\ T_z & B_z & N_z & 0\\ 0 & 0 & 0 & 1 \end{array} \right) $$
with $$N=(N_x,N_y,N_z)^{T}$$ the normal vector,
$$T=(T_x,T_y,T_z)^{T}$$ the corresponding tangent vector and
$$B=(B_x,B_y,B_z)^{T}$$ the corresponding bitangent vector. This
matrix is called TBN matrix. The mathematical considerations necessary for
this can be found in [3]. Notice that these three basis vectors T, B and N
usually differ per fragment. This means that for each fragment, this matrix
has to be set up anew and the three vectors in question have to be
transformed accordingly before the lighting calculation can be carried out.
Note also that the inverse of the TBN matrix is required to invert
the mapping. Since it is an orthonormal matrix, the TBN matrix has to be
simply transposed. Based on this fact, besides the model coordinate system,
it can also be converted mathematically into the world coordinate system,
as it can also be transformed into the camera coordinate system. From
which of the three coordinate systems (model, world and camera) one
transfers into these tangential coordinate systems depends on the coordinate
system in which the three vectors V, L and N are present; see Figs. 10.21
and 10.22.

Fig. 10.21 Possibilities to transform into the tangent space using the TBN matrix

Fig. 10.22 Tangential coordinate systems on a curved surface in three points corresponding to three
pixels
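A small Java sketch of setting up such a TBN matrix in homogeneous coordinates is given below (column-major order, as commonly used when passing matrices to OpenGL; the vector names are chosen for illustration):

```java
/** Minimal sketch: TBN matrix (4x4, homogeneous, column-major) from tangent, bitangent and normal. */
public class TbnMatrixSketch {

    public static float[] tbnMatrix(float[] t, float[] b, float[] n) {
        return new float[] {
            t[0], t[1], t[2], 0f,   // first column:  tangent T
            b[0], b[1], b[2], 0f,   // second column: bitangent B
            n[0], n[1], n[2], 0f,   // third column:  normal N
            0f,   0f,   0f,   1f    // fourth column: origin (0, 0, 0, 1)
        };
    }

    public static void main(String[] args) {
        // Example: axis-aligned tangent space; the inverse of this orthonormal matrix is its transpose
        float[] m = tbnMatrix(new float[] {1, 0, 0}, new float[] {0, 1, 0}, new float[] {0, 0, 1});
        System.out.println(java.util.Arrays.toString(m));
    }
}
```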

While the bump or normal mapping changes the normals and not the
geometry (but creates this illusion), the displacement mapping manipulates
the geometry. This can be exploited by giving an object only a rough
tessellation to refine the surface in combination with a displacement map.
For example, the coarse tessellation of the Utah teapot in Fig. 10.23,
which amounts to only four per cent of the fine tessellation, could be
sufficient to create the same fine geometry when combined with an
appropriate displacement map.

Fig. 10.23 Different coarse tessellations of the Utah teapot

This displacement map thus contains information about the change of


the geometry. With this technique, one only needs different displacement
maps to create a differently finely structured appearance of such a coarsely
tessellated object. This means that applying multiple displacement maps to
the same rough geometry requires less memory than if one had to create
multiple geometries.
In conclusion, it can be said that more realism can be achieved by
combining texture mapping techniques. Also, several textures can be linked
per surface. It has to be checked if the order of linking and the kind of
mathematical operations used play an important role.
Besides two-dimensional textures, there are also three-dimensional
ones. Texture coordinates are extended by a third dimension r besides s and
t. Three-dimensional textures are used, for example, in medical imaging
procedures.
10.2 Textures in the OpenGL
This section provides an introductory look at texture mapping. The OpenGL
specifications are available at [7]. In the fixed-function pipeline, the texture
coordinates are set per corner point with the command glTexCoord2f.
These are applied per corner points in the following way:
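A minimal JOGL sketch of such a per-vertex assignment might look as follows, assuming a GL2 object named gl inside a display callback; the quad and the concrete coordinates are chosen only for illustration:

```java
// Fixed-function pipeline (JOGL, GL2 profile): one texture coordinate per vertex of a quad
gl.glBegin(GL2.GL_QUADS);
gl.glTexCoord2f(0.0f, 0.0f); gl.glVertex3f(-1.0f, -1.0f, 0.0f);
gl.glTexCoord2f(1.0f, 0.0f); gl.glVertex3f( 1.0f, -1.0f, 0.0f);
gl.glTexCoord2f(1.0f, 1.0f); gl.glVertex3f( 1.0f,  1.0f, 0.0f);
gl.glTexCoord2f(0.0f, 1.0f); gl.glVertex3f(-1.0f,  1.0f, 0.0f);
gl.glEnd();
```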

In the programmable pipeline, the positions of the vertices, their


normals and texture coordinates of the object must be stored in the buffer of
the graphics processor. In the following, the texture coordinates are
prepared for the vertex shader during the initialisation of the object with an
ID, similar to the other properties of the vertices. In the following example,
ID 3 is chosen because the other properties (positions, normals and colours)
already occupy the IDs 0 to 2. In the vertex shader, for each vertex, this ID
is accessed with location = 3 to obtain the corresponding
texture coordinates:

Also, the texture must be loaded from a file and transferred to the
buffer. The background is that the normal and texture coordinates are first
written into the main memory of the CPU so that the graphics processor can
use them for later calculations. The driver decides when to copy from the
main memory to the graphics memory.
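A minimal sketch of loading a texture with the JOGL utility classes might look as follows; the file name is a placeholder:

```java
import java.io.File;
import java.io.IOException;
import com.jogamp.opengl.util.texture.Texture;
import com.jogamp.opengl.util.texture.TextureIO;

// ...
Texture texture = null;
try {
    // second argument true: create the mipmap levels while loading
    texture = TextureIO.newTexture(new File("myTexture.jpg"), true);
} catch (IOException e) {
    e.printStackTrace();
}
```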

The determination of the colour values according to nearest neighbour


or bilinear interpolation can be realised with the parameters
$$\mathtt{GL\_NEAREST}$$ or $$\mathtt{GL\_LINEAR}$$.
In OpenGL, the $$\mathtt{GL\_TEXTURE\_MIN\_FILTER}$$ or
$$\mathtt{GL\_TEXTURE\_MAG\_FILTER}$$ is set to either
$$\mathtt{GL\_NEAREST}$$ or $$\mathtt{GL\_LINEAR}$$.
$$\mathtt{GL\_TEXTURE\_MAG\_FILTER}$$ sets the filter for
magnification and $$\mathtt{GL\_TEXTURE\_MIN\_FILTER}$$ for
minification. For $$\mathtt{GL\_TEXTURE\_MIN\_FILTER}$$, the
following additional parameters are available since mipmapping is only
useful in this case:
$$\mathtt{GL\_NEAREST\_MIPMAP\_NEAREST}$$:
$$\mathtt{GL\_NEAREST}$$ is applied, and the mipmap level is
used, which covers approximately the size of the pixel that needs to be
textured.
$$\mathtt{GL\_NEAREST\_MIPMAP\_LINEAR}$$:
$$\mathtt{GL\_NEAREST}$$ is applied, and the two mipmap levels
are used, which cover approximately the size of the pixel that needs to be
textured. This results in one texture value per the mipmap level. The final
texture value is then a weighted average between the two values.
$$\mathtt{GL\_LINEAR\_MIPMAP\_NEAREST}$$:
$$\mathtt{GL\_LINEAR}$$ is applied, and the mipmap level is used,
which covers approximately the size of the pixel that needs to be
textured.
$$\mathtt{GL\_LINEAR\_MIPMAP\_LINEAR}$$:
$$\mathtt{GL\_LINEAR}$$ is applied and both mipmap levels are
used, which cover approximately the size of the pixel that needs to be
textured. This results in one texture value per mipmap level. The final
texture value is a weighted average between the two values.
To activate and calculate mipmapping, the parameters must be set as
follows:
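A sketch of such a parameter setting with a JOGL Texture object (here with trilinear filtering for minification; the concrete choice is only an example, and gl is assumed to be a GL2 object):

```java
// mipmapping for minification, bilinear filtering for magnification;
// the mipmap levels themselves can be created when loading the texture,
// e.g. with TextureIO.newTexture(file, true)
texture.setTexParameteri(gl, GL2.GL_TEXTURE_MIN_FILTER, GL2.GL_LINEAR_MIPMAP_LINEAR);
texture.setTexParameteri(gl, GL2.GL_TEXTURE_MAG_FILTER, GL2.GL_LINEAR);
```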

In addition, $$\mathtt{GL\_NEAREST}$$ or
$$\mathtt{GL\_LINEAR}$$ can be set as parameters, i.e., the
selection of the colour value of the nearest texel or the bilinear interpolation of
the colour values of the directly adjacent texels (usually the four directly
adjacent neighbouring texels that share a side). After that,
rules are defined, which deal with the well-known problem of texture
mapping as soon as the texture coordinates are outside the values 0 and 1
(see Sect. 10.1). The behaviour can be modified using
texture.setTexParameteri in the following way. Among others,
the following parameters are available:
$$\mathtt{GL\_REPEAT}$$: Repeat,
$$\mathtt{GL\_MIRRORED\_REPEAT}$$: Repetitive mirroring,
$$\mathtt{GL\_CLAMP}$$: Continuation of the values at $$s=1$$
or $$t=1$$ for values outside these or
$$\mathtt{GL\_CLAMP\_TO\_BORDER}$$: Any assignment of
value so that a border is created.
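A call consistent with the description below might look like this in JOGL (a minimal sketch, assuming the Texture object texture and the GL2 object gl from above):

```java
// simple repetition of the texture in the s-direction
texture.setTexParameteri(gl, GL2.GL_TEXTURE_WRAP_S, GL2.GL_REPEAT);
```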
It configures the texture mapping (
$$\mathtt{GL\_TEXTURE\_2D}$$) with simple repetition (
$$\mathtt{GL\_REPEAT}$$) in the s-direction (
$$\mathtt{GL\_TEXTURE\_WRAP\_S}$$). The t-direction is
configured analogously.

Thus, the behaviour of the texture in the s- and t-directions can be
configured independently of each other. The following source code shows
another option of repetition, here in the t-direction.
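For example, a mirrored repetition could be set in the t-direction; the choice of GL_MIRRORED_REPEAT here is only an illustrative assumption:

```java
// mirrored repetition of the texture in the t-direction
texture.setTexParameteri(gl, GL2.GL_TEXTURE_WRAP_T, GL2.GL_MIRRORED_REPEAT);
```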


Furthermore, one can define how the texture should be mixed with other
textures or colour values. One can also define how, for example, two
textures should be merged into each other or how the alpha value should be
used for opacity with the colour value to be used. This is done by the
function glTexEnvf, which can be used in the fixed-function pipeline as
well as in the programmable pipeline.
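A typical call might look like the following sketch; the choice of GL_MODULATE as mode is just an example:

```java
// combine the texture colour with the fragment colour by modulation
gl.glTexEnvf(GL2.GL_TEXTURE_ENV, GL2.GL_TEXTURE_ENV_MODE, GL2.GL_MODULATE);
```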

The parameters shown in Table 10.1 can be selected. Besides


$$\mathtt{GL\_REPLACE}$$, $$\mathtt{GL\_ADD}$$,
$$\mathtt{GL\_BLEND}$$, $$\mathtt{GL\_MODULATE}$$,
$$\mathtt{GL\_DECAL}$$ or $$\mathtt{GL\_COMBINE}$$
can be used. As explained on the OpenGL reference pages of Khronos [2],
the formulae shown in Table 10.1 emerge. In the following,
$$C_{p}$$ is the current pixel colour, $$A_{p}$$ the alpha
value of the current fragment, $$C_{s}$$ the texture colour,
$$C_{c}$$ the texture environment colour and $$A_{s}$$ the
texture alpha value. Furthermore, $$C_{v}$$ is the resulting texture
colour, and $$A_{v}$$ is the resulting texture alpha value. The
texture can be in RGB format (the alpha value is then set to 1) or combined
with the alpha value, which lies between 0 and 1, the so-called RGBA
format (see Chap. 6).
Table 10.1 Calculation of the parameters $$\mathtt{GL\_REPLACE}$$, $$\mathtt{GL\_ADD}$$, $$\mathtt{GL\_BLEND}$$, $$\mathtt{GL\_MODULATE}$$, $$\mathtt{GL\_DECAL}$$ and $$\mathtt{GL\_COMBINE}$$

$$\mathtt{GL\_REPLACE}$$: GL_RGB: $$C_{v}=C_{s}$$, $$A_{v}=A_{p}$$; GL_RGBA: $$C_{v}=C_{s}$$, $$A_{v}=A_{s}$$
$$\mathtt{GL\_ADD}$$: GL_RGB: $$C_{v}=C_{p} + C_{s}$$, $$A_{v}=A_{p}$$; GL_RGBA: $$C_{v}=C_{p} + C_{s}$$, $$A_{v}=A_{p}\cdot A_{s}$$
$$\mathtt{GL\_BLEND}$$: GL_RGB: $$C_{v}=C_{p}\cdot (1-C_{s}) + C_{c}\cdot C_{s}$$, $$A_{v}=A_{p}$$; GL_RGBA: $$C_{v}=C_{p}\cdot (1-C_{s}) + C_{c}\cdot C_{s}$$, $$A_{v}=A_{p}\cdot A_{s}$$
$$\mathtt{GL\_MODULATE}$$: GL_RGB: $$C_{v}=C_{p}\cdot C_{s}$$, $$A_{v}=A_{p}$$; GL_RGBA: $$C_{v}=C_{p}\cdot C_{s}$$, $$A_{v}=A_{p}\cdot A_{s}$$
$$\mathtt{GL\_DECAL}$$: GL_RGB: $$C_{v}=C_{s}$$, $$A_{v}=A_{p}$$; GL_RGBA: $$C_{v}=C_{p}\cdot (1-A_{s}) + C_{s}\cdot A_{s}$$, $$A_{v}=A_{p}$$

Finally, with texture.enable(gl) and gl.glActiveTexture
$$\mathtt{(GL\_TEXTURE0)}$$, texture mapping of type
$$\mathtt{GL\_TEXTURE\_2D}$$ (further options are
$$\mathtt{GL\_TEXTURE\_1D}$$,
$$\mathtt{GL\_TEXTURE\_3D}$$ and
$$\mathtt{GL\_TEXTURE\_CUBE\_MAP}$$) is activated for texture unit 0.
The texture is bound to this texture unit with $$\mathtt{gl.glBindTexture}$$
$$\mathtt{(GL\_TEXTURE\_2D}$$,
texture.getTextureObject(gl)) and loaded into the texture
buffer with calls like glTexImage2D. A texture unit is a functional unit
(hardware) on the GPU.
This concept describes the access to the texture image. One selects such a
unit with gl.glActiveTexture
$$\mathtt{(GL\_TEXTURE0)}$$ and makes it the active unit for
future operations. For further effects, one can also use
$$\mathtt{GL\_TEXTURE1}$$,
$$\mathtt{GL\_TEXTURE2}$$, etc. With samplers, one can bind
several texture units at the same time and then, for example, apply
multi-texturing (the possibility of drawing several textures on one primitive).
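Put together, the activation and binding described here might look like this in JOGL (a sketch using the texture object and the GL2 object gl from above):

```java
// activate texture unit 0 and bind the texture object to it
texture.enable(gl);
gl.glActiveTexture(GL2.GL_TEXTURE0);
gl.glBindTexture(GL2.GL_TEXTURE_2D, texture.getTextureObject(gl));
```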

In the vertex shader, as mentioned at the beginning of this section, one


can access the corresponding texture coordinates per vertex via the variable
declared with location = 3.
The variable vInUV enters the vertex shader as in variable and leaves
it as out variable to be able to be included in the fragment shader again as
in variable. Variables, intended from the vertex shader for further
processing in the fragment shader, can be defined as out variables. The
naming of the out variable can be different from the in variables.
Furthermore, the fragment shader should have access to this texture. This is
done by the predefined type sampler2D for two-dimensional texture use:

In fragment shaders, textures are accessed via so-called samplers.
Depending on the dimension, there are different variants: sampler1D,
sampler2D, sampler3D, samplerCube, etc. Samplers are used as uniform
variables.
Figures 10.24–10.26 show the corresponding source code.
Fig. 10.24 Initialisation of a texture calculation (Java)

Fig. 10.25 Implementation of a texture calculation in the vertex shader (GLSL)

In the following, the implementation of bump mapping with normal


maps (normal mapping) in OpenGL is explained. On flat surfaces, the
perturbed normal vectors can be extracted from the normal map with the
following source code in GLSL:

These normal vectors only have to be converted from the RGB colour
values of the normal map into the range $$[- 1 ,1]$$ (see
Sect. 10.1.2). If the other vectors for Blinn–Phong illumination calculations
are available in the camera coordinates, the perturbed normal vectors must
also be transformed from the world coordinate system into the camera
coordinate system. For this, the perturbed normal vectors (instead of the
interpolated normal vectors) must be mapped with the transposed inverse
of the total matrix that transforms from the world coordinate system into
the camera coordinate system. The transformed perturbed normal vectors
can then be included with the other vectors in the illumination calculation.
To texture the Utah teapot, the tangent coordinate systems must be
created to achieve the illusion of a curved surface. Since both the viewing
vector and the light direction vector are present in the camera coordinate
system, the disturbed normal vectors per fragment are transformed from
their local tangent coordinate system into the camera coordinate system. For
this purpose, the TBN matrix from Sect. 10.1.2 is applied per fragment.
This is shown in the source code of Fig. 10.28, where the lighting calculation
is also performed. The method for creating the TBN matrix is shown in
Fig. 10.27.
The following call is used to pass the modified normals that are
included in the lighting calculation; its parameters are, in this order, the
projection (viewing) direction V, the normal vector N and the texture coordinates vUV.
Fig. 10.26 Implementation of a texture calculation in the fragment shader (GLSL)

Fig. 10.27 Creating the TBN matrix for transformation into tangent space in the fragment shader
(GLSL)

Fig. 10.28 The calculation of the disturbed normals from the normal map and their transformation
into the tangent space in the fragment shader (GLSL): These normal vectors are further used in the
fragment shader in the illumination calculation

Since displacement mapping changes the geometry in the form of a


more complex tessellation, it is not used in the vertex or fragment shader
but in the so-called tessellation unit. The tessellation unit contains the
tessellation control shader and the tessellation evaluation shader (see
Chap. 2).
The corresponding commands in OpenGL for three-dimensional
textures are similar to the two-dimensional ones. Usually, only 2D is
replaced by 3D in the function names.

10.3 Exercises
Exercise 10.1 Use a custom image in jpg format as a texture for several
cylinders of different sizes. Write an OpenGL program that implements
this.

Exercise 10.2 An image used as a background remains unchanged even if


the viewer moves. Place a background image on the front surface of a
distant, large cube so that the viewer experiences a change in the
background when he moves, at least within specified limits. Write an
OpenGL program that implements this.

References
1. J. F. Blinn. “Simulation of Wrinkled Surfaces”. In: SIGGRAPH Computer Graphics 12.3 (1978), pp. 286–292.
2. The Khronos Group Inc. “OpenGL 2.1 Reference Pages”. Retrieved 03.04.2019. 2018. URL: https://www.khronos.org/registry/OpenGL-Refpages/gl2.1/xhtml
3. E. Lengyel. Mathematics for 3D Game Programming and Computer Graphics. Boston: Course Technology, 2012.
4. NASA. “Taken Under the ‘Wing’ of the Small Magellanic Cloud”. Retrieved 08.02.2021, 19:21h. URL: https://www.nasa.gov/image-feature/taken-under-the-wing-of-the-small-magellanic-cloud.
5. NASA. “The Blue Marble: Land Surface, Ocean Color, Sea Ice and Clouds (Texture)”. Retrieved 07.03.2019, 22:51h. URL: https://visibleearth.nasa.gov/view.php?id=57735.
6. A. Nischwitz, M. Fischer, P. Haberäcker and G. Socher. Computergrafik. 3rd edition. Vol. 1. Computergrafik und Bildverarbeitung. Wiesbaden: Springer Fachmedien, 2011.
7. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6 (Core Profile)–October 22, 2019). Retrieved 8.2.2021. The Khronos Group Inc, 2019. URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf
© Springer Nature Switzerland AG 2023
K. Lehn et al., Introduction to Computer Graphics, Undergraduate Topics in Computer Science
https://doi.org/10.1007/978-3-031-28135-8_11

11. Special Effects and Virtual Reality



Supplementary Information
The online version contains supplementary material available at https://doi.
org/10.1007/978-3-031-28135-8_11.

This chapter contains selected special topics in computer graphics. Since


virtual reality (VR) applications are an important application area of
computer graphics, factors are explained that can create a high degree of
immersion in such applications so that a user feels truly present in a virtual
environment. Simulations of fog, particle systems or dynamic surfaces can
create realistic effects in computer graphics scenes. For interactive
computer graphics, the selection of objects and the detection and handling
of collisions are important. This allows users to explore and manipulate
three-dimensional virtual worlds. For these topics, this chapter contains the
technical basics, supported by OpenGL examples for most topics. Since the
sense of hearing contributes greatly to immersion and thus presence, this
chapter presents some important findings and technical principles for
auralising acoustic scenes. The last part of this chapter contains a summary
of important factors for the creation of a visual impression of depth in a
scene. The focus is on seeing with both eyes (binocular vision) and the
technical reproduction through stereoscopic output techniques. These
techniques are used in 3D television and virtual reality headsets.

11.1 Factors for Good Virtual Reality Applications
For the simulation of a virtual environment, the stimulation of the sense of
sight through images and image sequences is crucial, as this sense is the
dominant human sense. (1) The more the observer feels surrounded by the
stimulation, (2) the clearer the stimulation is and (3) the more interactions
with the scene and the objects are possible, the more the observer gets
involved in the virtual world. These points are three factors of the so-
called immersion. Two other factors of immersion are (4) the number of
senses involved, such as sight, hearing, touch, smell, taste, balance, body
sensation, temperature sensation and pain sensation and (5) the degree of
correspondence between these sensory modalities. Another important factor
for immersion is (6) the consistency of the possible actions enabled by the
virtual representation. In other words: “Immersion is the objective degree to
which a VR system and application projects stimuli onto the sensory
receptors of users in a way that is extensive, matching, surrounding, vivid,
interactive, and plot conforming.” [19, p. 45 after Slater and Wilbur 1997].
If there is a high enough degree of immersion, the user can feel present
in the virtual world, which is the perception of actually “being there” in the
simulated world. In order to experience presence, the user himself plays a
very important role. Only when he or she willingly engages with the
situation in the simulation can true presence arise. Immersion is therefore
the important technical prerequisite for experiencing presence, so that the
user feels present [19, p. 46–47].
For the sake of completeness, it should be noted that the described
monotonic correlation between the degree of immersion and presence
applies in principle, but partially breaks down for renderings and
animations of human characters. This breakdown is described by the so-
called uncanny valley. When artificial human characters or robots in the
real world become very similar to real humans, but not yet similar enough,
a sudden rejection occurs in the perceiver. This effect can be seen, for
example, when viewing or coming into contact with prosthetic limbs or
cadavers. Only when the similarity increases again and the simulation comes
very close to the human (behaviour) does the acceptance increase again [22].
For this reason, the first animated films in which real people were animated
were not very successful, because the simulation was close to the human
(behaviour) but not close enough. Solutions are, for example, the introduction of a
higher degree of abstraction, as is the case in cartoons, or very detailed and
natural renderings and animations.
Immersion can be increased, for example, by enabling the user to
interact with the virtual scene as fully as possible. This includes free
movement possibilities in the virtual world. This requires position and
orientation sensors (tracking systems) and possibly eye-tracking systems,
through which the position, orientation and viewing direction of the user
can be recorded. The free navigation in a scene ultimately requires the
change of the viewer position, which has already been explained in the
context of projections in Sect. 5.8 and already made possible in most
sample programs offered with this book. In these programs, the mouse is used
instead of free physical movement. However, the main computation techniques do not
differ between virtual reality applications and standard computer graphics.
The same applies to interactions with objects in the scene (see Sects. 11.7
and 11.8). Even without special input devices, such as flysticks or data
gloves, this interaction can be simulated using the mouse.
Interaction also means that objects in the scene can be influenced, for
example, moved or scaled. The underlying principles can be understood
using two-dimensional displays, like computer monitors. Although
commercial applications for designing virtual worlds directly in the virtual
three-dimensional world have been available for the last few years, the
design and development of virtual reality applications is still largely done
using two-dimensional displays and mouse and keyboard interaction.
Furthermore, the immersion of virtual reality applications can be
significantly increased by the possibility of stereoscopic viewing by the
user (see Sect. 11.12), called stereoscopy. For this purpose, special output
devices such as stereoscopic displays using special glasses (colloquially
“3D glasses”) or head-mounted displays (colloquially “VR glasses”) are
available.
For today’s virtual reality applications, the presentation of visual and
auditory stimuli is the most advanced compared to the stimulation of other
human senses. Moreover, interaction with the scene is almost always
possible. Since the sense of hearing is the second most important human
sense after the sense of sight, the foundation of auralisation of virtual
acoustic scenes is given in Sect. 11.11. Haptic stimulation occurs rarely or only
in simple form in such applications, for example, as vibration in a hand
control (controller). With certain data gloves, tactile stimulation is possible.
Some of these devices can be used as input devices for user interaction. In
some more advanced applications, the sense of balance is addressed by
treadmills or by a moving surface, called motion platform. All other sensory
modalities are very rarely used.
When human senses conflict with each other at a high degree of
immersion, the perceptual system can become irritated and react with
nausea, dizziness, sweating or even vomiting. This occurs, for example,
when flying virtually (and only visually) through a scene when the user is at
the same time standing or sitting still in the real physical world. In this case,
the information sent to the brain by the senses of sight and balance does not
reflect the real-world situation. This situation can lead to the so-
called motion sickness, which causes the symptoms mentioned above. A
more general term for these types of effects is VR sickness, which covers
other situations in virtual reality applications that can trigger symptoms of
illness [19, p. 160]. These include, for example, the delay (latency) between
the user’s input and the realisation of the effect or change in the virtual
world. This latency in interaction is one of the most important triggers of
sickness in virtual reality applications. Therefore, it is important that virtual
reality applications are realised by powerful and efficient computer graphics
systems (software and hardware) that minimise latency.

11.2 Fog
When light penetrates a medium, such as smoke, haze, fog or clouds, an
interaction of light with the medium occurs through physical effects such as
absorption, scattering, reflection, refraction, birefringence, optical activity
or photoeffects. Fog consists of fine water droplets formed by condensation
of humid and supersaturated air. Unlike clouds, fog is always in contact
with the ground. The light is refracted at the surface of the water droplets
and reflected by these particles. Since the water droplets have a similar size
to the wavelength of visible light, so-called Mie scattering occurs, which
triggers the Tyndall effect, in which bundles of light are scattered out of the
fog medium. Only through this do the water droplets in the air become
visible in white colour and the light is attenuated by the fog. [25, p. 431]
In the context of volume rendering, the path tracing approach is used to
visualise such effects, which is an extension of ray tracing (see Sect. 9.9). If
the light modelled by rays hits a particle of the medium along a path, then
the following four effects can be distinguished according to this approach.
Through absorption, incident photons of light are absorbed and
converted into another form of energy, such as heat.
By emission, photons emanate from the particle when the medium
reaches a certain temperature. This is the case, for example, with a
burning fire.
By scattering away from the particle (out-scattering) scattered
photons emanate from the particle.
By scattering to the particle (in-scattering), photons arrive at a particle
that were previously scattered away from another particle.
Extinction is a measure of attenuation and according to this model is
composed of absorption and scattering away from the particle. These effects
are in principle wavelength dependent, which is already taken into account
in complex modelling approaches, see, for example, [20]. In practical
interactive applications, however, a simplification is usually made by
performing a separate calculation only for the three colours of the RGB
colour model [1, p. 589–591]. More detailed information on this topic can
be found, for example, in [21] and [26, Chap. 11].
The simulation of all these described effects, for example, in the context
of a path tracing method, requires complex calculations. The sufficiently
detailed consideration of the dependence on the wavelengths also increases
the complexity. These types of detailed representations are more suitable for
non-interactive computer graphics. Therefore, two very simple models for
the representation of fog are described below, which have been used
successfully in interactive computer graphics for a long time (see also [1, p.
600–602]). These models are available in the OpenGL fixed-function
pipeline (see Sect. 11.3).
To represent fog, a monotonically decreasing function
$$f_b: \mathbb{R}^+_0 \rightarrow [0,1]$$
with $$f_b(0) = 1$$ and
$$\displaystyle \lim _{d\rightarrow \infty } f_b(d) = 0$$ is needed to
determine the fog factor. If d with $$d \ge 0$$ is the distance of the
object from the observer, $$C_{\text{ object }}$$ is the colour
intensity of the object and $$C_{\text{ fog }}$$ is the colour intensity
of the fog, the colour intensity of the object C lying in the fog is given by
the following equation. This equation is analogous to the interpolated
transparency.
$$\begin{aligned} C = f_b(d)\cdot C_{\text{object}} + (1 - f_b(d))\cdot C_{\text{fog}} \end{aligned}$$ (11.1)
Since $$f_b(d)$$ approaches zero with increasing distance d, the
colour of the fog dominates at greater distances. This formula can be
applied to any colour channel of the RGB colour model.
For the fog factor, a function with a linear or exponential decay is
usually used. For linear fog the following formula can be used.
$$\begin{aligned} f(d) = \frac{d_1 - d}{d_1 - d_0} \end{aligned}$$ (11.2)
$$d_0$$ denotes the distance at which a fog effect should occur.
$$d_1$$ is the distance at which the fog dominates the object. In
order to maintain this visual restriction outside these limits and to limit it to
the interval [0, 1], the following formula can be applied.
$$\begin{aligned} f_b (x) = \min (\max (x, 0), 1) \end{aligned}$$ (11.3)
After calculating f(d) according to Eq. (11.2), the result is inserted into Eq.
(11.3) for x. This limitation is also called clamping in computer graphics. It
follows that the linear fog factor as a function of distance d is given by the
following equation.
$$\begin{aligned} f_b(d) \; = \; \left\{ \begin{array}{ll} 1 & \text{ if } d \le d_0\\ \frac{d_1 - d}{d_1 - d_0} & \text{ if } d_0< d < d_1\\ 0 & \text{ if } d_1 \le d \end{array} \right. \end{aligned}$$ (11.4)
This means that there is no visual restriction up to the distance
$$d=d_0$$. From distance $$d=d_1$$ the fog completely
dominates, so that from this distance no more objects are visible. Between
$$d_0$$ and $$d_1$$, the fog effect increases linearly.
This simple fog model can be used to generate so-called depth fog
or distance fog to highlight within the scene which objects have a greater
distance from the observer than other objects. Since distant objects can be
obscured by fog, depending on the parameter settings, the sudden
appearance and disappearance of background objects can be masked by fog.
The more realistic exponential fog is based on an exponential increase
in fog by a factor of $$\alpha >0$$ and uses the following
function.
$$\begin{aligned} f(d) \; = \; e^{-\alpha \cdot d } \end{aligned}$$ (11.5)
The greater the value $$\alpha $$, the denser the fog. Therefore,
$$\alpha $$ is called the fog density. A stronger exponential decay
can be achieved with the following function where the exponent is squared.
$$\begin{aligned} f(d) \; = \; e^{-(\alpha \cdot d)^2} \end{aligned}$$ (11.6)
Together with the blending function, the fog increases more strongly with
increasing distance d. Also when using Eqs. (11.5) and (11.6) for the
realisation of exponential fog, the result is limited to the interval [0, 1]
by applying Eq. (11.3). Only this clamped result is used in Eq. (11.1) as
the fog factor for computing the colour values per colour channel
of the object lying in the fog.
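The fog factors according to Eqs. (11.2) to (11.6) and the blending according to Eq. (11.1) can be sketched in Java as follows. This is only an illustrative sketch with freely chosen names, not part of the book's sample programs; in an OpenGL renderer, these computations are usually carried out per fragment in a shader, as shown in the next section.

// Sketch: fog factor computation according to Eqs. (11.1)-(11.6);
// the parameter names (d0, d1, alpha) follow the notation of Sect. 11.2
public final class FogFactors {

    // Clamping to the interval [0, 1] according to Eq. (11.3)
    private static double clamp01(double x) {
        return Math.min(Math.max(x, 0.0), 1.0);
    }

    // Linear fog according to Eqs. (11.2) and (11.4)
    public static double linearFog(double d, double d0, double d1) {
        return clamp01((d1 - d) / (d1 - d0));
    }

    // Exponential fog according to Eq. (11.5)
    public static double exponentialFog(double d, double alpha) {
        return clamp01(Math.exp(-alpha * d));
    }

    // Squared exponential fog according to Eq. (11.6)
    public static double exponentialSquaredFog(double d, double alpha) {
        double x = alpha * d;
        return clamp01(Math.exp(-x * x));
    }

    // Blending with the fog colour according to Eq. (11.1), per colour channel
    public static double blend(double fb, double objectChannel, double fogChannel) {
        return fb * objectChannel + (1.0 - fb) * fogChannel;
    }
}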

11.3 Fog in the OpenGL


In the OpenGL fixed-function pipeline, the realisation of fog is
implemented according to the formulae from Sect. 11.2. The core profile
does not contain any special functions for the representation of fog.
Therefore, this section first illustrates the realisation of fog using shaders
(see also [31]). Based on this, the parameter settings for fog rendering in the
compatibility profile are explained at the end of this section.

Fig. 11.1 Vertex shader for the realisation of fog (GLSL)

Figures 11.1 and 11.2 show a vertex shader and a fragment shader for
the representation of fog. The user-defined input variables and uniforms
specified in these shaders must be defined by the associated OpenGL
application (for example, by a JOGL program) and the corresponding data
passed through it (see Sects. 2.9 and 2.10 for the basic approach).

Fig. 11.2 Fragment shader for the realisation of fog (GLSL)

The majority of the fog calculation in this shader pair takes place in the
fragment shader. The crucial part of the fog calculation in the vertex shader
is the determination of the vertex position in camera coordinates
eyecoord_Position and forwarding this information to the next stage
of the pipeline. Through rasterisation, this position is calculated for each
fragment and is available under the same name in the fragment shader (see
Fig. 11.2). Since the origin of the camera coordinate system is at the
position of the camera, the fragment position eyecoord_Position
corresponds to the vector between the viewer and the fragment for which (a
fraction of) fog is to be calculated. The distance between the observer and
the fragment is thus the length of this vector, which is stored in the variable
FogFragCoord. This variable corresponds to d according to Sect. 11.2.
The determination of the vector length through

can be simplified by using the absolute value of the z coordinate as follows:

If the viewer is far enough away from the fragment and the viewer position
deviates little from a perpendicular line of sight, the resulting error is hardly
noticeable. Using this simplification increases the efficiency for the
calculation.
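For illustration, the two variants could look as follows in GLSL. This is only a sketch that assumes eyecoord_Position is available as a vec4 in camera coordinates, as in Figs. 11.1 and 11.2; the exact statements may differ from the listings in the figures.

// Variant 1: exact distance between the camera (origin of the camera
// coordinate system) and the fragment
FogFragCoord = length(eyecoord_Position.xyz);

// Variant 2: simplification using only the absolute value of the z coordinate
// FogFragCoord = abs(eyecoord_Position.z);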
A further optimisation can be made by assuming that the distance
between the viewer and the fragment is always greater than or equal to zero.
In this case

can be exchanged by

The next step in the fragment shader is to determine the fog factor fog
for mixing the fragment colour with the fog colour. The parameter
FogMode is used to select one of the three calculation modes for the fog
factor given in Sect. 11.2 (called f there). The necessary parameter values
are passed from the OpenGL application to the fragment shader using
uniform variables. When applying the linear formula for the fog factor, the
variable FogScale is used. The value for this variable can be determined
as follows:

Since this value does not change after the choice of parameters, it is pre-
calculated in the OpenGL application and passed to the fragment shader.
The formula for FogScale corresponds to the term
$$1 / (d_1 - d_0)$$ according to Sect. 11.2.
In a further step, the function clamp is applied to restrict the fog factor
to the interval [0, 1], analogous to the calculation of the value $$f_b$$
according to Eq. (11.3). Finally, the colour of the fragment FragColor is
determined by mixing the original fragment colour color with the colour
of the fog FogColor by linear interpolation using the function mix. The
weighting factor for the fog is the fog factor fog. This operation applies
Eq. (11.1) (Sect. 11.2) to each of the three RGB colour components. There
is no change in the alpha value of the fragment.
This procedure first determines the opaque colour of the fragment of an
object, which is followed by blending with the colour of the fog. This is
therefore volume rendering with a homogeneous medium. In principle, this
operation can also be realised by using the alpha channel and the blending
functions, which can be executed as per-fragment operations after the
fragment shader.
The shader pair specified in Figs. 11.1 and 11.2 performs an accurate
but at the same time complex fog calculation. An increase in efficiency
can be achieved by performing the calculation of the distance
FogFragCoord and the calculation of the fog factor fog in the vertex
shader. This reduces the computational effort if there are fewer vertices than
fragments. The rasterisation then generates a fog factor for each fragment, if
necessary by interpolation, so that only the blending between the fog colour
and the fragment colour has to take place in the fragment shader (see last
command in the fragment shader in Fig. 11.2). This separation of the
calculation is useful to increase efficiency for scenes with many small
polygons. However, if the scene consists of large polygons, such as in a
large landscape, then calculating the fog factor per fragment makes sense.

Fig. 11.3 Example scenes with fog created by a renderer in the core profile with the shader pair from
Figs. 11.1 and 11.2: Linear fog ( $$\texttt {FogMode} = 0$$ ) was used in the upper part.
Exponential fog ( $$\texttt {FogMode} = 1$$ ) was used in the lower part

Figure 11.3 shows the result of fog calculation with the shader pair
specified in this section. For this example, the bright fog colour
$$\texttt {FogColor} = (0.9, 0.9, 0.9)$$ was chosen, which was also
set for the background. For the application of the linear formula (at the top
of the figure), the values $$\texttt {FogStart} = 0$$ and
$$\texttt {FogEnd} = 18$$ were set. The exponential fog was
calculated using Eq. (11.5) with
$$\alpha = \texttt {FogDensity} = 0.12$$. The more realistic fog
representation by the exponential formula is clearly visible in the figure.
As explained at the beginning of this section, calculation and rendering
of fog is part of the fixed-function pipeline and thus available in the
compatibility profile. In this profile, fog can be enabled by the JOGL
command glEnable(gl.GL_FOG). The default values are the fog colour
black (0, 0, 0), the exponential calculation according to Eq. (11.5)
(GL_EXP) as the fog type and the value 1 as the fog density
(GL_FOG_DENSITY). The type of fog calculation according
to Eq. (11.6) and a different fog density can be set, for example, as follows
in the JOGL source code.
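A sketch of such a call sequence is given below, assuming a GL2 object gl in the compatibility profile; the density value 0.2 is an arbitrary example and not taken from the book's programs.

gl.glFogi(GL2.GL_FOG_MODE, GL2.GL_EXP2);   // fog calculation according to Eq. (11.6)
gl.glFogf(GL2.GL_FOG_DENSITY, 0.2f);       // fog density alpha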

Below is a JOGL example to set the linear fog calculation according to Eq.
(11.2) and to set the fog colour and the two fog parameters for the linear
calculation.
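A sketch of such a call sequence, again assuming a GL2 object gl; it is not the book's original listing, and the parameter values correspond to the example in Fig. 11.3.

float[] fogColor = {0.9f, 0.9f, 0.9f, 1.0f};   // bright fog colour
gl.glFogi(GL2.GL_FOG_MODE, GL2.GL_LINEAR);      // linear fog according to Eq. (11.2)
gl.glFogfv(GL2.GL_FOG_COLOR, fogColor, 0);      // fog colour
gl.glFogf(GL2.GL_FOG_START, 0.0f);              // d0: start distance of the fog effect
gl.glFogf(GL2.GL_FOG_END, 18.0f);               // d1: distance at which the fog dominates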

The default value for the start distance of the fog effect is 0 and for the end
distance it is 1.
Whether the distance between the fragment and the camera is calculated
by the exact determination of the vector length or whether it is
approximated by the absolute value of the z component of the difference
vector (see above for the discussion of the advantages and disadvantages) is
left to the concrete OpenGL implementation for the GPU. The fog
calculation in the compatibility profile does not work if a fragment shader is
active. Further details on fog calculation and possible parameter settings
can be found in the OpenGL specification of the compatibility profile [32].

Fig. 11.4 The sparks of a sparkler as an example of a particle system

11.4 Particle Systems


Linear and exponential fog models as presented in Sect. 11.2 model a
homogeneous fog of constant density. Individual clouds of fog hanging in
the air are not covered by these simple models. Such effects and related ones,
such as smoke, can be reproduced by particle systems [28, 29]. A particle
system consists of very many small objects (particles) that are not
controlled individually but by a random mechanism with a few parameters.
The individual particles are assigned a basic behavioural pattern that varies
individually by the random mechanism. A particle system is often defined
by the characteristics outlined below. As an example, consider a sparkler
whose sparks are modelled by a particle system (see Fig. 11.4).
The point of origin of a particle: For the sparkler, this would be a random
point on the surface.
The initial velocity of a particle.
The direction in which a particle moves: For the sparkler, a random
direction away from the surface could be chosen.
The lifetime of a particle: For the sparkler, this corresponds to how long a
spark glows.
The intensity of particle emission: The time between the generation of
two successive particles.
The appearance of the particles.
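For illustration, these characteristics could be collected in a simple data class. This is only a sketch with freely chosen names and is not part of the book's sample code.

// Sketch: properties of a single particle in a simple particle system
public class Particle {
    float[] origin = new float[3];    // point of origin of the particle
    float[] velocity = new float[3];  // initial velocity and direction of movement
    float birthTime;                  // point in time at which the particle is created
    float lifetime;                   // how long the particle exists (e.g., glow time of a spark)
    float[] color = new float[4];     // appearance of the particle (RGBA)
}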
In many cases, particles are very small elements, such as the sparks of
the sparkler. This does not imply that each particle must necessarily be
modelled individually as an object. For example, for a sandstorm, if each
grain of sand were represented as a single particle or object, the number of
particles would be so large that the computations of the movements and the
entire rendering would take far too long. Instead, such particles are often
grouped together to form a larger abstract particle. Often a larger particle in
the form of a simple surface—such as a rectangle or triangle—is used, on
which a matching, possibly semi-transparent texture is placed [14]. For the
sparkler, a transparent texture could be used with individual bright points
representing sparks. For a sandstorm, a semi-transparent sand pattern could
be used as a texture. In this way, the number of particles, which are now
larger, and thus the animation effort can be significantly reduced.
Other aspects can be included in a particle system. For example,
particles may have the ability to split into particles of smaller or equal size.
Often a force is taken into account that influences the trajectory of the
particles. In the simplest case, this can be gravity, such as in a fountain
modelled as a particle system of water droplets. In this case, the trajectory
of a particle as it is created could be calculated from its initial velocity and
direction. Applying a hypothetical weight acted upon by the gravitational
acceleration would result in a parabolic trajectory for the particles. If
dynamic forces, such as non-constant wind, are to be taken into account, the
trajectories of the particles must be constantly updated during the
animation. For such calculations, physics engines [24] are used, which
specifically handle the calculations for physical simulations, especially
movements and deformations of objects.
Swarm behaviour [30], as used for modelling swarms of birds or fish, is
related to particle systems. In a swarm, however, interaction and
coordination of the individuals with each other play a major role. The
individuals are not perfectly synchronised, but they move at approximately
the same speed in approximately the same direction. Collisions between
individuals are avoided. Therefore, techniques from the field of artificial
intelligence are often used to model swarm behaviour.
For clouds, a quite realistic animation model is described in [12]. The
sky is first divided into voxels, with each voxel representing a cell of a
cellular automaton. Depending on how the state transitions of this
automaton are defined, different types of clouds can be simulated.
Depending on their current state, each individual cell is assigned to a sphere
of a certain colour. The resulting structure is mapped onto a sky texture.

11.5 A Particle System in the OpenGL


In this section, the realisation of a simple particle system with the help of
shaders in OpenGL under Java is explained using the example of a confetti
cannon. An implementation of the corresponding OpenGL application in
the programming language C is described in [31, p. 470–475].
The confetti cannon is to eject individual particles (confetti) of random
colour, which are rendered as points (GL_POINTS). The particles move at
a constant speed in a certain direction and are pulled downwards by the
Earth’s gravitational field over time. For this simple system, the following
properties must be modelled for each individual particle.
The starting position $$s_0$$ at which the particle is created (three-
dimensional coordinate).
The colour of the particle (RGB colour value and an alpha value).
The constant velocity v with which a particle moves within the three
spatial dimensions (three-dimensional velocity vector).
The start time $$t_0$$ at which the particle is created.
The gravitational acceleration (acceleration due to gravity), by which a
particle is pulled downwards, is to be taken into account with the
approximate value of $$g = 9.81 \text {m/s}^2$$. The lifetime t of a
particle is the current simulation time minus the start time of a particle, i.e.,
$$t = t_{\text{ current }} - t_0$$. This allows the new position of a
particle to be determined using the following formula.
$$\begin{aligned} s = s_0 + v \cdot t - \frac{1}{2} \cdot g \cdot t^2 \end{aligned}$$ (11.7)
This relationship is in accordance with the laws of motion of mechanics.
This formula must be applied to each component of the position vector of a
particle. The gravitational acceleration should only act in the y-direction.
Therefore, g is zero for the x- and z-components. The downward direction
of this acceleration (in the negative y-direction) is already taken into
account by the minus sign in the formula above.

Fig. 11.5 Vertex shader for the realisation of a confetti cannon as a particle system (GLSL)

Fig. 11.6 Fragment shader for the realisation of a confetti cannon as a particle system (GLSL)

Figures 11.5 and 11.6 show a vertex shader and a fragment shader for
the realisation of the particle animation for the confetti cannon. The input
variables defined in the vertex shader correspond to the properties of a
particle given in the list above. Since a particle is to be represented by a
point (see above), these properties can be defined by exactly one vertex.
This vertex data must be generated only once in the OpenGL application
(for example, by the init method of a JOGL program) with random
values and passed to the OpenGL. The transformation matrices (pMatrix
and mvMatrix) are passed as uniform variables from the OpenGL
program once per frame as usual (for example, by the display method of
a JOGL program). Sections 2.9 and 2.10 show the basic approach of
passing data from a JOGL program to shaders. The generation of the
random vertex data is left to the reader as a programming exercise.
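A possible starting point for this exercise is sketched below. The interleaved array layout and the helper name are freely chosen assumptions and not the book's solution; the properties per particle (start position, RGBA colour, velocity, start time) match the list given above, and the numerical bounds correspond to the example values mentioned later in this section.

// Sketch: random generation of interleaved particle vertex data
// (start position, RGBA colour, velocity, start time per particle)
private float[] createParticleData(int numberOfParticles) {
    java.util.Random random = new java.util.Random();
    int floatsPerParticle = 3 + 4 + 3 + 1;
    float[] data = new float[numberOfParticles * floatsPerParticle];
    for (int i = 0; i < numberOfParticles; i++) {
        int offset = i * floatsPerParticle;
        // start position s0: all particles start at the muzzle of the cannon
        data[offset] = 0f; data[offset + 1] = 0f; data[offset + 2] = 0f;
        // random RGBA colour
        data[offset + 3] = random.nextFloat();
        data[offset + 4] = random.nextFloat();
        data[offset + 5] = random.nextFloat();
        data[offset + 6] = 1f;
        // random velocity v between the bounds (4, 5, 0) and (6, 12, 5) m/s
        data[offset + 7] = 4f + 2f * random.nextFloat();
        data[offset + 8] = 5f + 7f * random.nextFloat();
        data[offset + 9] = 5f * random.nextFloat();
        // random start time t0 between 0 and 10 seconds
        data[offset + 10] = 10f * random.nextFloat();
    }
    return data;
}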

Fig. 11.7 Part of the display method for calculating and passing the system time (Java)
In addition, the current simulation time currentTime is required for
this particle animation in the vertex shader. Figure 11.7 shows part of the
display method of a JOGL renderer to update the simulation time and pass
it to the vertex shader via the layout position 2 by a uniform. The variable
currentTime must be declared in the renderer and initialised with zero
in the init method. The last command in the source code section triggers
the drawing of the particles as points whose data are already in an OpenGL
buffer. For each of these points (on the GPU), an instance of the vertex
shader is called.
In the vertex shader (see Fig. 11.5), it is first checked whether the
lifetime of the particle in question is greater than or equal to zero, i.e.,
whether the particle has already been created (born). If it has already been
created, the distance the particle has travelled since its creation is
determined. The new particle position can be determined according to Eq.
(11.7). As stated earlier in this section, the calculation is done component-
wise for each dimension of the particle position. The gravitational
acceleration is only taken into account for the y-component, since it should
only act in this direction. Furthermore, the colour of the particle is passed
on to the next stage of the graphics pipeline without any change. If the
particle has not yet been created (lifetime less than zero), the particle
remains at its initial position. This newly calculated position of the particle
is transformed using the transformation matrices and passed on to the next
stage of the graphics pipeline. The lifetime of the particle is also passed on.

Fig. 11.8 Some frames of the animation of the confetti cannon as a particle system

In the fragment shader (see Fig. 11.6), it is checked whether the particle
has not yet been created (lifetime less than zero). If it has not yet been
created, the fragment in question is discarded by discard and it is not
forwarded to the next stage of the graphics pipeline and thus not included in
the framebuffer. This means that this fragment is not displayed. The colour
of the fragment is passed on to the next stage of the graphics pipeline
without any change.
Figure 11.8 shows some frames of the animation of the confetti cannon
created by the described particle system. The animation was performed with
30,000 points, all generated at the same starting position with random
colours. The vectors $$(4, 5, 0)^T \text {m/s}$$ and
$$(6, 12, 5)^T \text {m/s}$$ served as lower and upper bounds for the
randomly generated velocity vectors of the particles. The start times for
each particle were randomly chosen from the range of 0 to 10 seconds.
This example shows how easy it is to realise a particle system using
shaders. The data for the 30,000 particles are generated exactly once and
transferred to the OpenGL. The main computations of the animation are
done entirely by the GPU. The OpenGL application only determines and
transfers some varying data per frame, such as the simulation time and the
transformation matrices. Thus, GPU support for particle systems can be
realised very efficiently with the help of shaders. By changing the
transformation matrices, the scene itself can be manipulated during the
particle animation. For example, a rotation or a displacement of the scene is
possible.
The particle system described in this section can be extended in many
ways. The spectrum of colours can be narrowed down to, for example, only
yellow-red colours, allowing for a spark simulation. The colours could
change from red to orange and yellow to white depending on the lifetime of
the particles, simulating a glow of the particles. If the lifetime of the
particles is also limited, the impression of annealing of the particles is
created. The alpha components can also be varied to slowly fade out a
particle. The random locations of particle formation can be extended to a
surface, for example, the surface of the sparkler considered in Sect. 11.4. As
a result, particles appear to emanate from this sparkler.
Another possible variation is to give the y-component of the position
coordinate a lower limit, so that the particles gather at a certain place on the
ground. In addition, the particles could be represented by lines instead of
points, which would allow the representation of particles with a tail.
Likewise, the particles could consist of small polygonal nets that are
provided with (partly transparent) textures in order to simulate more
complex particles. Furthermore, in addition to gravitational acceleration,
other acceleration or velocity components can be added, which would
enable rendering of (non-constant) wind effects, for example.

11.6 Dynamic Surfaces


In this book, motion of objects is modelled by applying transformations to
the objects. These transformations usually describe only the motion of the
objects, but not their deformation. For rigid objects such as a vehicle or a
crane, this approach is sufficient to describe motion appropriately. If
persons or animals move, the skin and the muscles should also move in a
suitable way, otherwise the movements will lead to a robot-like impression.
Therefore, for surfaces that move in a more flexible manner, more complex
models are used. On the other hand, it would be too inefficient and complex
to model the movement of the surface by individual descriptions for the
movements of its surface polygons.
One approach to animate dynamic surfaces is to model the surface in an
initial and an end position and, if necessary, in several intermediate
positions. The animation is then carried out by an interpolation between the
different states of the surface. For this purpose, the method presented in
Sect. 5.1.4 for converting the letter D into the letter C and the triangulation-
based method from Sect. 6.4 can be adapted to three-dimensional structures.
These techniques are based on the principle that two different geometric
objects are defined by corresponding sets of points and structures—usually
triangles—defined by associated points. Then a step-by-step interpolation
by convex combinations between corresponding points is carried out,
defining intermediate structures based on the interpolated points. In Sect. 5.
1.4, the structures formed based on the points are lines, quadratic or cubic
curves. In Sect. 6.4, these structures are triangles. In any case, it is
important that there is a one-to-one correspondence between the points of
the two objects and that the associated structures in the two objects are
defined by corresponding points.

Fig. 11.9 Intermediate steps for the interpolation between surfaces that are defined by points and
triangles with a one-to-one correspondence

In the three-dimensional space, two surfaces can be modelled by
triangles. It is important to ensure that the number of points in both objects
is the same and there must be a one-to-one correspondence between the two
groups of points establishing a one-to-one correspondence between the
triangles that define the two surfaces. Figure 11.9 shows the intermediate
steps (convex combinations for $$\alpha {=}$$ 0, 0.2, 0.4, 0.6, 0.8, 1)
in the interpolation of two surfaces that are defined by points and triangles
with a one-to-one correspondence.
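The underlying interpolation can be sketched in Java as follows, assuming that both surfaces are stored as vertex coordinate arrays of equal length with corresponding order; this is only a sketch and not the book's sample code.

// Sketch: convex combination of two corresponding vertex sets for alpha in [0, 1]
static float[] interpolateVertices(float[] start, float[] end, float alpha) {
    float[] result = new float[start.length];
    for (int i = 0; i < start.length; i++) {
        result[i] = (1f - alpha) * start[i] + alpha * end[i];
    }
    return result;
}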
Instead of defining several intermediate states for the dynamic surface,
motion can be described in terms of a few important points. For example, in
a human arm, the motion is essentially constrained by the bones and the
joints. The upper arm bone can rotate more or less freely in the shoulder
joint, the forearm can only bend at the elbow joint, but not rotate. These
observations alone provide important information for modelling arm
motion. If only the movements of the bones are considered, a simplified
model with only one bone for the upper and one for the lower arm is
sufficient. When the hand of the arm carries out a movement, the bones
simply follow the hand’s movement under the restrictions that are imposed
by the joints. A swaying movement must automatically be carried out in the
shoulder joint of the upper arm, as the elbow joint does not allow for
rotations. The bones themselves, however, are not visible, so their
movement must still be transferred to the surface, in this example to the
skin. The position of the arm bones can be clearly determined by three
skeletal points—the shoulder, elbow and wrist.
If the surface of the arm is modelled by freeform curves or
approximated by triangles, each control point or vertex of the freeform
curves or triangles can be assigned a weight relative to the three skeleton
points. The weights indicate how much a point (vertex) on the surface is
influenced by the skeleton points. This weight will usually be greatest for
the skeletal point closest to the vertex. Vertices that are approximately in the
middle between two skeletal points will each receive a weight of 50% for
each of the neighbouring skeletal points. Figure 11.10 shows such
a skeleton as it could be used for modelling an arm.

Fig. 11.10 Representation of a skeleton with a flexible surface (skinning)

The skeleton is shown dashed, the three skeleton points are marked by
squares. The greyscale of the vertices of the very coarse triangle grid that is
supposed to model the skin indicates how the weights of the vertices are
chosen. The vertex at the bottom right is assigned to the right skeleton point
with a weight of one. The vertices next to it have already positive weights
for the right and the middle skeleton point.
When the skeleton is moved, the skeleton points will undergo different
transformations. With three skeleton points
$$\boldsymbol{s}_1,\, \boldsymbol{s}_2,\, \boldsymbol{s}_3$$
moved by the transformations $$T_1,\, T_2$$ and $$T_3$$, a
vertex point $$\boldsymbol{p}$$ of the surface with weights
$$w_1^{(\boldsymbol{p})}$$, $$w_2^{(\boldsymbol{p})}$$,
$$w_3^{(\boldsymbol{p})}$$ to the skeleton points would be transformed
according to the following transformation.
$$ T_{\boldsymbol{p}} \; = \; w_1^{(\boldsymbol{p})} \cdot T_1 +
w_2^{(\boldsymbol{p})} \cdot T_2 + w_3^{(\boldsymbol{p})} \cdot T_3.
$$
It is assumed that the weights form a convex combination, i.e.,
$$w_1^{(\boldsymbol{p})}, w_2^{(\boldsymbol{p})},
w_3^{(\boldsymbol{p})} \in [0,1]$$
and
$$w_1^{(\boldsymbol{p})} + w_2^{(\boldsymbol{p})} + w_3^{(\boldsymbol{p})} = 1$$.
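A minimal sketch of this weighted transformation of a single vertex is given below. It assumes that the skeleton transformations are available as 4x4 matrices in column-major float arrays and that the weights form a convex combination; the names and the data layout are freely chosen and not taken from the book's code.

// Sketch: transform a vertex p (homogeneous coordinates) with weights w[i]
// and 4x4 skeleton transformations T[i] (column-major order);
// result = (sum_i w[i] * T[i]) * p
static float[] skinVertex(float[] p, float[][] T, float[] w) {
    float[] result = new float[4];
    for (int i = 0; i < T.length; i++) {              // for each skeleton point
        for (int row = 0; row < 4; row++) {
            float sum = 0f;
            for (int col = 0; col < 4; col++) {
                sum += T[i][col * 4 + row] * p[col];  // matrix-vector product T[i] * p
            }
            result[row] += w[i] * sum;                // weighted combination
        }
    }
    return result;
}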
This approach implies that the surface essentially follows the skeleton
like a flexible hull. If a distinct muscle movement is to be modelled, the
tensing and relaxing of the muscles results in an additional inherent
movement of the surface. In this case, it is recommended to describe the
surface of the object in different elementary positions—for example, with
the arm stretched and bent—and then apply convex combinations of
elementary positions for the movement of the skin.
Instead of such heuristics, mathematical models for the surface
movements can also be specified. For example, [37] shows cloth modelling
for virtual try-on based on finite element methods.

11.7 Interaction and Object Selection


A simple way for the viewer to interact with the three-dimensional world
modelled in the computer is to navigate through the virtual world. When the
viewer moves in a virtual reality application, special sensors are used to
determine the viewer’s position and orientation. From the position of the
viewer, which is determined by these sensors or manually by mouse
control, a projection of the scene must be calculated using the viewer
position as the projection centre. This topic is covered in Sect. 5.8.
If the viewer should be able to interact with objects in the scene,
suitable techniques are required to choose and select objects for the
interaction. In a scene graph, it should be made clear which objects can be
selected on which level and what happens when an object is selected. As a
simple example, the three-dimensional model in the computer can serve as
a learning environment in which the user explores a complex technical
object, such as an aircraft. When an object is selected, the corresponding
name and function is displayed. If the viewer is to trigger dynamic changes
in the scene by selecting certain objects, for example, operating a lever, the
corresponding movements must be implemented and executed when the
user selects the corresponding object. The selection of an object in a scene
is called picking in computer graphics, especially when the scene is
rendered on a two-dimensional screen and the mouse is used for selection.
When object picking is carried out with the mouse, the problem of
finding out which object has been picked must be solved. The mouse can
only specify a two-dimensional point in device coordinates (window
coordinates) on the projection plane. This means, the object must be found
in the three-dimensional scene to which the projected point belongs. This
section describes some of the different selection methods that exist.
One method of object picking is colour picking. With this method, the
objects or even the individual polygons of the scene are coloured with a
unique colour and stored in an invisible framebuffer. If this coloured scene
is a copy of the visible scene, then the two-dimensional device coordinate
of the mouse click can be used to read the colour at the same location from
the invisible framebuffer. Since the colour is uniquely assigned to an object
or polygon, this solves the problem of assigning the mouse click to an
object. The main disadvantage of this method is the need to draw the scene
twice. However, this can be done with the support of the GPU.
Another method works with the back projection of the window
coordinates of the mouse click into the model or world coordinate system.
For this purpose, a back transformation of the viewing pipeline (see Sect. 5.
36) must be carried out. Since no unique three-dimensional point can be
determined from a two-dimensional coordinate, additional assumptions or
information must be added. For this purpose, the depth information of a
fragment can be read from the depth buffer (z-buffer). Equation (11.14)
contains the mathematical steps for such a back transformation, which are
explained in more detail with a source code example in Sect. 11.8. In the
world coordinate system, it can be determined whether the point is inside an
object or not. Alternatively, this comparison can also be made in the model
coordinate system. To do this, the point for each object must be transformed
again into the model coordinate system of the respective object before a
comparison can take place.
Instead of back-projecting a single point, the ray casting technique
presented in Sect. 9.9 can be used. This technique involves sending a ray
(or beam) into the scene to determine which object or objects are hit by the
ray. The object closest to the camera is usually selected. The required
comparison is usually made in the world coordinate system. For this
purpose, the ray must be created in this coordinate system or transferred to
this coordinate system, starting from the window coordinate of the mouse
click.
Determining whether a point lies within a scene object or is intersected
by a ray can become arbitrarily complex depending on the object geometry
and accuracy requirements. Determining the intersection points with all the
triangles that make up the objects in a scene is usually too complex for
realistic scenes. Therefore, in computer graphics, bounding volumes are
used to increase efficiency. These bounding volumes represent, or rather
enclose, (complex) objects for the purpose of object selection by much
simpler geometric objects. This allows an efficient intersection calculation or an
efficient test if a point lies within the bounding volume. As a further
prerequisite, the object must be enclosed as completely as possible so that
clicking on the outer points of the object also leads to the desired object
selection. On the other hand, the object should be wrapped as tightly as
possible so that a selection is only made when clicking on the object and
not next to it. However, if the geometry of the selected bounding volume is
very different from the geometry of the enclosed object, this is not always
satisfactorily possible and a compromise between complete and tight
enclosure may have to be found. For example, if a very thick and long
cuboid is completely enclosed by a bounding sphere, the bounding volume
must be significantly larger than the enclosed object. In this case, a wrong
selection may occur if the bounding volume intersects an adjacent object
that does not intersect the enclosed object. If the bounding sphere is made
smaller so that it does not completely enclose the cuboid, then the range in
which a wrong selection can occur when clicking next to the object can be
reduced. However, a selection by clicking on certain border areas of the
enclosed object is then no longer possible. For this example, choosing a
different geometry for the bounding volume may make more sense. Some
frequently used bounding volumes are explained below.
A bounding sphere can be simply represented by its radius and the
three-dimensional coordinate of its centre. Determining whether a point lies
within a sphere or whether it is intersected by a ray is thus relatively easy.
Furthermore, this volume is invariant to rotations. When translating the
enclosed object, the centre point can simply be moved with the object. For a
uniform scaling of all dimensions, only a simple adjustment of the radius
with the same factor is necessary. Only scaling with different scaling factors
for the different coordinate directions requires a more complex adjustment of
the size of the bounding volume.
Cuboids are often used as bounding volumes. If the edges of such a
cuboid are parallel to the coordinate axes, the bounding volume is called
an axis-aligned bounding box (AABB). Two vertices that are diagonally
opposite with respect to all axes are sufficient to represent such a cuboid.
The calculation of whether a point is inside the volume can be done by six
simple comparisons of the coordinates. Translation and scaling of an AABB
does not cause any difficulties. If the enclosed object is rotated, the AABB
cannot usually rotate with it due to its property of parallel alignment to the
coordinate axes. The object must rotate within the AABB, and the size of
this cuboid must be adjusted. The AABB and the enclosed object may be
more different from each other after rotation.
The solution to this problem of rotating objects within an AABB is an
oriented cuboid called oriented bounding box (OBB). An OBB can be
arbitrarily rotated so that more types of objects can be more tightly enclosed
than with an AABB. An OBB can be represented by its centre point, three
normal vectors describing the directions of the side faces, and the three
(half) side lengths of the cuboid. This representation is thus more complex
than for the AABB. Likewise, testing whether a ray intersects the OBB or if
a point lies within the OBB is more complex. On the other hand, this
complexity has the advantage that an OBB can be transformed arbitrarily
along with the enclosed object. [1, p. 959–962] provides methods for
intersection calculation between a ray and a cuboid.
A more complex bounding volume is the k-DOP (discrete oriented
polytope). Unlike an AABB or an OBB, this volume can have more than six
bounding surfaces, potentially enclosing an object more tightly than with an
AABB or OBB. Any two of these bounding surfaces of a k-DOP must be
oriented parallel to each other. These pairs of bounding surfaces are referred
to as slabs. The representation of a k-DOP is given by the normalised
normal vectors of the k/2 slabs and two scalar values per slab, indicating the
distance of the slabs from the origin and their extent. Thus k can only be
even. In contrast to cuboidal bounding volumes, the normal vectors of the
slabs do not have to be oriented perpendicular to each other. A rectangle in
two-dimensional space can be regarded as a special case of a 4-DOP. In the
three-dimensional space, a cuboid is a 6-DOP with pairwise normal vectors
oriented perpendicular to each other. Since a k-DOP is defined as the set of
k bounding surfaces that most closely encloses an object, it represents the
best hull—in the sense of being closely enclosed—for a given k. [1, p. 945–
946] contains a formal definition of a k-DOP.
The most closely enclosing convex bounding volume for a given object
is its (discrete) convex hull. In this case, the bounding surfaces are not
necessarily parallel to each other and their number is unlimited. This results
in a higher representational effort and a higher computational effort for
testing whether a ray intersects the bounding volume or a point lies within
the bounding volume as opposed to a k-DOP.
Besides the question of whether a bounding volume represents an object
better or worse in principle, the problem to be solved is how the parameters
of such a volume are concretely determined in order to enclose a given
object as tightly as possible. The easiest way to derive an AABB is from the
geometry of the object to be enclosed. For this purpose, the minima and
maxima of all vertex coordinates of the object can be used to determine the
corner points of the AABB.
Since a k-DOP can be seen as an extension of an AABB and as it is
defined as the bounding volume that, for a given k, most closely encloses
the object, its determination for a concrete object is more complex than for
an AABB, but nevertheless relatively simple.
Determining a suitable bounding sphere is more complex than it may
seem. A simple algorithm consists of first determining an AABB. Then, the
centre of this geometry is used as the centre of the sphere. The diagonal
from this centre to one of the vertices of the AABB can be used to
determine the radius. However, this often leads to a not very tightly
enclosed object. To improve this, the vertex furthest from the centre can be
used to determine the radius.
Determining the OBB that encloses the object in an optimal shape is the
most complex among the bounding volumes presented in this section.
Methods for generating well-fitting bounding volumes for concrete objects,
along with further reading, can be found in [1, p. 948–953].
As these considerations on the different types of bounding volumes
show, they each have specific advantages and disadvantages that have to be
taken into account when implementing a concrete application. The
following criteria are relevant for the selection of the optimal bounding
volume.
The complexity of representation.
The suitability for a tight enclosure of the target geometry (of the to be
enclosed object).
The effort required to calculate a specific bounding volume for an object.
The need and, if applicable, the effort for changing the bounding volume
due to object transformations, for example, for motion animations.
The complexity of determining whether a point lies within the bounding
volume.
The complexity of the intersection calculation between a ray and the
bounding volume.
For simple applications or for applications where a very fast selection of
objects is required, but at the same time precision is not the main concern,
simple bounding volumes that deviate more than slightly from the object
geometry are sufficient under certain circumstances. For the example of a
bounding sphere, this sphere can be chosen very small compared to its more
cuboid-shaped object, in order to minimise a wrong selection of
neighbouring objects. This will, in most cases, make the selection of the
outer parts of the object impossible, but this may be acceptable. However,
modern graphics systems are often capable of realising an accurate and
efficient selection of objects in complex hierarchies of bounding volumes.
The remainder of this section contains some considerations on bounding
spheres. It is shown how to verify whether a point lies within this kind of
bounding volume. It is also shown how to determine intersection points
between a sphere and a ray.
A sphere as a bounding volume can be represented by the position
vector to its centre $$\boldsymbol{c} = (c_x,c_y,c_z)^\top $$ and its
radius $$r \in \mathbb {R}$$. Furthermore, for any real vector
$$\boldsymbol{q} = (q_x,q_y,q_z)^\top $$, let the Euclidean norm
$$\Vert \boldsymbol{q} \Vert = \sqrt{{q_x}^2 + {q_y}^2 +
{q_z}^2}$$
be given. As known from mathematics, this norm can be used to find the
length of a vector or the distance between two points (between the start and
end points of the vector). A point
$$\boldsymbol{p} = (p_x,p_y,p_z)^\top $$ lies inside the sphere or
on its surface if its distance from the centre of the sphere is less than or
equal to its radius r. This can be expressed as an inequality as follows:
$$\begin{aligned} \Vert \boldsymbol{c} - \boldsymbol{p} \Vert \le r. \end{aligned}$$ (11.8)
Using the equation for the Euclidean norm, the following inequality results.
$$\begin{aligned} (c_x - p_x)^2 + (c_y - p_y)^2 + (c_z - p_z)^2 \le r^2 \end{aligned}$$ (11.9)
For reasons of efficiency, the square root can be omitted when determining
whether a point is inside the sphere or not. In general terms, Eqs. (11.8) and
(11.9) for the sphere with centre $$\boldsymbol{c}$$ and radius r
define all points within a sphere or on its surface.
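This containment test can be sketched in Java as follows, following Eq. (11.9) and omitting the square root; the method and variable names are chosen freely and are not part of the book's sample code.

// Sketch: test whether point p lies inside (or on) the bounding sphere
// with centre c and radius r, according to Eq. (11.9)
static boolean isInsideSphere(float[] p, float[] c, float r) {
    float dx = c[0] - p[0];
    float dy = c[1] - p[1];
    float dz = c[2] - p[2];
    return dx * dx + dy * dy + dz * dz <= r * r;
}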
If the ray casting method is used, then the sphere must be intersected
with a ray. To determine the intersection points with a sphere, let the ray be
represented in the following parametrised form.
$$\begin{aligned} \boldsymbol{s} = \boldsymbol{o} + t\ \boldsymbol{d} \text{ with } t \in \mathbb{R}, t > 0 \end{aligned}$$ (11.10)
The point $$\boldsymbol{o}$$ is the starting point of the ray and the
vector $$\boldsymbol{d}$$ is the displacement vector that
determines the direction of propagation of the ray. The parameter t
expresses how far a point on the ray is from its starting point in the
direction of vector $$\boldsymbol{d}$$. Let the sphere be given as
above by its centre $$\boldsymbol{c} = (c_x,c_y,c_z)^\top $$ and its
radius $$r \in \mathbb {R}$$. Using Eq. (11.8), the following
approach for calculating the intersection points is obtained.
$$\begin{aligned} \Vert \boldsymbol{c} - \boldsymbol{s} \Vert = r \end{aligned}$$ (11.11)
Since it is sufficient to determine the intersection points on the surface of
the sphere, this equation contains an equal sign.
Inserting Eq. (11.10) for the ray into Eq. (11.11) and using the definition
$$\boldsymbol{h} := \boldsymbol{c} - \boldsymbol{o}$$ gives the
following equation.
$$\begin{aligned} \Vert \boldsymbol{h} - t\ \boldsymbol{d} \Vert = r
\end{aligned}$$
Using the Euclidean norm (see above) results in a quadratic equation.
$$\begin{aligned} (h_x - t\ d_x)^2 + (h_y - t\ d_y)^2 + (h_z - t\ d_z)^2 = r^2 \end{aligned}$$ (11.12)
The maximum number of two possible solutions to this quadratic equation
is consistent with the observation that a ray can intersect the surface of a
sphere at most twice. In this case, the ray passes through the sphere. If only
one solution exists, then the sphere is touched by the ray at one point. If no
(real-valued) solution exists, then the sphere is not hit by the ray.
After solving the quadratic equation (11.12), assuming that the
displacement vector $$\boldsymbol{d}$$ of the ray is normalised to
length one, i.e., $$\Vert \boldsymbol{d} \Vert = 1$$, the parameter
values of the two intersection points are given by the following equations.
$$\begin{aligned} t_1 = a + \sqrt{b} \qquad t_2 = a - \sqrt{b} \end{aligned}$$ (11.13)

$$\begin{aligned} \text{with } a := h_x d_x + h_y d_y + h_z d_z \text{ and } b := a^2 - ({h_x}^2 + {h_y}^2 + {h_z}^2) + r^2 \end{aligned}$$
If b becomes negative, then the root terms yield imaginary parts and the
results for $$t_1$$ and $$t_2$$ are complex numbers. Since in
this case there is no real-valued solution to the quadratic equation, the ray
does not intersect the sphere at all. Therefore, if only the existence of an
intersection point has to be verified, the calculation of b is sufficient.
Subsequently, it is only necessary to verify whether b is greater than or
equal to zero.
If real-valued solutions ( $$b \ge 0$$) exist, then the intersection
point with the smaller of the two parameter values $$t_1$$ and
$$t_2$$ is closest to the starting point $$\boldsymbol{o}$$ of
the ray. According to the definition in Eq. (11.10), the t values must not
become smaller than zero. In this case, the intersections lie behind the
observer. Therefore, it has to be verified whether $$t_1$$ and
$$t_2$$ are positive. For t values equal to zero, the observer position
(starting point of the ray) is exactly on the surface of the sphere and in the
intersection point. [1, p. 957–959] presents a more efficient method for
determining the intersection points of a ray with a sphere.
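The calculation according to Eq. (11.13) could be sketched in Java as follows. This is a sketch with freely chosen names, assuming the direction vector d is normalised; it returns the smallest positive parameter t, or a negative value if the ray does not hit the sphere in front of the observer.

// Sketch: intersection of a ray (origin o, normalised direction d) with a
// sphere (centre c, radius r) according to Eq. (11.13);
// returns the smallest positive t, or -1 if there is no such intersection
static float intersectRaySphere(float[] o, float[] d, float[] c, float r) {
    float hx = c[0] - o[0], hy = c[1] - o[1], hz = c[2] - o[2];
    float a = hx * d[0] + hy * d[1] + hz * d[2];
    float b = a * a - (hx * hx + hy * hy + hz * hz) + r * r;
    if (b < 0f) {
        return -1f;                  // no real-valued solution: ray misses the sphere
    }
    float sqrtB = (float) Math.sqrt(b);
    float t2 = a - sqrtB;            // closer intersection point
    if (t2 > 0f) {
        return t2;
    }
    float t1 = a + sqrtB;            // farther intersection point
    return (t1 > 0f) ? t1 : -1f;     // ray starts inside the sphere or sphere lies behind the ray
}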
In a scene with several selectable objects, if only the object with the
smallest distance to the starting point of the ray is to be selected, the
parameter values t for all intersections with the bounding spheres of the
objects must be determined. The object hit by the ray with the smallest t is
closest to the starting point $$\boldsymbol{o}$$ of the ray.
If the length of the ray is very long, objects may be incorrectly selected
if they are very close to each other and if the geometries of the bounding
volumes do not exactly match those of the objects (which is the rule). In
this case, the bounding volumes can overlap even though the objects are
displayed separately. The more the object geometries deviate from the
geometries of the bounding volumes, the more likely this is (see above).
To increase the precision of selection, the value of the depth buffer at
the point of the mouse click can be used. This value usually varies between
zero and one. Fragments with large depth values are further away from the
viewer than fragments with small depth values. If the value read is equal to
one, then there is no fragment at the position viewed, but the background.
In this case, object selection should be prevented in order to avoid an
incorrect selection. Furthermore, if the depth buffer value is less than one, it
can be used to shorten the ray and thus potentially create fewer wrong
intersections with neighbouring objects. [18, Chap. 7] contains more robust
intersection tests.

11.8 Object Selection in the OpenGL


This section presents a JOGL program sketch with the essential elements
for the realisation of object selection (picking) in the OpenGL. For this
example, let each selectable object in the scene be enclosed by a bounding
volume in the form of a sphere. These bounding spheres are represented by
their centres and radii, as described in Sect. 11.7. Figure 11.11 shows two
frames of an example scene containing cuboid objects, all enclosed by
invisible bounding spheres. In the frame on the right, two yellow objects are
selected.

Fig. 11.11 A scene with selectable objects: In the image on the right, some objects are selected
(highlighted in yellow). The scene was created by a JOGL renderer

If the mouse pointer is moved to the scene and a mouse button is pressed, the mouse coordinates within the window must be determined. For
this purpose, the JOGL interface MouseListener can be used.1 The
following method of this interface is called when a mouse button is pressed.
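In a minimal form, assuming the MouseListener interface of the JOGL NEWT package, this method can look as follows; the method body is left to the application.

// Sketch: method of the JOGL (NEWT) MouseListener interface,
// called by JOGL when a mouse button is pressed
@Override
public void mousePressed(MouseEvent e) {
    // store the mouse click information for later processing
}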

To be able to retrieve the mouse click information the JOGL renderer must
implement this interface and this method and register itself as an observer
(listener) for these events. When the mouse is clicked, this method is called
and the argument of type MouseEvent2 supplies the two-dimensional
coordinate of the position of the mouse pointer when clicked. This position
can be queried as follows:
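(A sketch, assuming the getX and getY accessors of the MouseEvent class.)

int xWin = e.getX(); // x-coordinate of the mouse pointer in window coordinates
int yWin = e.getY(); // y-coordinate of the mouse pointer in window coordinates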

Since the JOGL renderer outlined in Sect. 2.7 already implements the
interface GLEventListener and the display method is called for
each frame, it makes sense to process the mouse clicks by a separate object,
which can also store the last clicked mouse coordinate. For this example,
this object is called interactionHandler. Since rendering a frame and
clicking the mouse takes place asynchronously, the
interactionHandler object must also save whether a mouse click has
taken place. Under these preconditions, the display method of the
renderer, which is called for each frame, can query from this object whether
the mouse has been clicked since the last frame. Figure 11.12 shows the
relevant source code for this process. If a mouse click has taken place, the
(last) coordinate can be read. The last call in this source code section
initiates the process of identifying the object in the scene to be selected by
the mouse click.

Fig. 11.12 Parts of the source code of the display method of a JOGL renderer (Java) for reading
the mouse click coordinates if the mouse has been clicked since the last frame
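As an illustration of this process, a minimal sketch of such a check inside the display method could look as follows; the interactionHandler accessors and the pickObject method are assumptions chosen for this example.

// Sketch: excerpt of the display method, executed once per frame
if (interactionHandler.isMouseClicked()) {
    // read the (last) window coordinates of the mouse click
    int xWin = interactionHandler.getLastClickedX();
    int yWin = interactionHandler.getLastClickedY();
    interactionHandler.resetMouseClicked();
    // initiate the identification of the selected object
    pickObject(gl, xWin, yWin);
}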

Fig. 11.13 Submethod of the display method of a JOGL renderer (Java) for transforming a point
back from the window coordinate system to the world coordinate system using a z-value from the
depth buffer

In the next step, the transformation of the window coordinates back into
the world coordinate system along the viewing pipeline, as mentioned in
Sect. 11.7, must take place. Figure 11.13 shows a method that performs this
reverse transformation. First, the position and dimensions of the currently
used projection plane (viewport) are read. Section 5.1.2 contains
explanations about the view window. After this operation, the array
viewport contains—with ascending indices—the x-coordinate, the y-
coordinate, the width and the height of the viewport. The x- and y-
coordinates represent the position of the lower left corner of the viewport.
For the back transformation, the depth value (z-value) of the clicked
fragment must be read from the depth buffer. First, however, the origin of
the y-coordinates must be transformed from the upper edge of the window
to its lower edge. This changes the y-coordinate representation for the Java
window to the representation for the OpenGL. The command
glReadPixels reads the content of the depth buffer. Since the result is
delivered in the efficiently implemented FloatBuffer data structure, the
subsequent conversion into a Java array and the Java variable
depthValue is required. With the mouse click coordinates, the z-value
from the depth buffer and the properties of the viewport, the reverse
transformation can be comfortably performed using the method
gluUnProject of the JOGL class PMVMatrix. If the transformation
was successful, the result is available in world coordinates in the array
objectPosition. The determined three-dimensional coordinate refers
to the fragment whose z-value was read from the depth buffer and the result
variable is named objectPosition. The back transformation of a two-
dimensional coordinate—along the viewing pipeline—into a three-
dimensional world coordinate is naturally not possible without additional
information such as this z-component.
The calculation rule for the method gluUnProject is given by the
following equation.3
$$\begin{aligned} \left( \begin{array}{c} xObj \\ yObj \\ zObj \\ w \end{array} \right) = (mvMatrix)^{-1} \cdot (pMatrix)^{-1} \cdot \left( \begin{array}{c} \frac{2 (xWin - viewport[0])}{viewport[2]} - 1 \\ \frac{2 (yWin - viewport[1])}{viewport[3]} - 1 \\ 2 (depthValue) - 1 \\ 1 \end{array} \right) \end{aligned}$$ (11.14)
In this equation the variable names from Fig. 11.13 are used. The vector on
the right describes the transformation back from the window coordinates to
the normalised projection coordinate system (NPC). This vector is
multiplied by the inverse of the projection matrix and the inverse of the
model-view matrix. Section 5.10 provides more details on these matrices.
Usually, the model-view matrix will not contain the transformation of the
model coordinates of an object into the world coordinates. Thus, the result
vector from Fig. 11.13 contains the point of the mouse click in world
coordinates, related to the depth value of the fragment read from the depth
buffer. If the model coordinates are required, the inverse of the
transformation matrix from model coordinates to world coordinates must
then be applied to each object.
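A condensed sketch of this back transformation is given below. The glReadPixels call and the constants follow the OpenGL/JOGL API, whereas the exact signature of gluUnProject in the PMVMatrix class may differ between JOGL versions and is only indicated here; the variable names largely follow the description above.

// Sketch: back transformation of a clicked window coordinate (xWin, yWin)
// into world coordinates, using the z-value from the depth buffer.
// Imports assumed: com.jogamp.common.nio.Buffers, java.nio.FloatBuffer
int[] viewport = new int[4];
gl.glGetIntegerv(GL.GL_VIEWPORT, viewport, 0);

// transform the y-origin from the upper window edge (Java) to the lower edge (OpenGL)
int yWinGL = viewport[3] - yWin;

// read the depth value of the clicked fragment from the depth buffer
FloatBuffer depthBuffer = Buffers.newDirectFloatBuffer(1);
gl.glReadPixels(xWin, yWinGL, 1, 1, GL2ES2.GL_DEPTH_COMPONENT, GL.GL_FLOAT, depthBuffer);
float depthValue = depthBuffer.get(0);

// reverse transformation along the viewing pipeline (cf. Eq. (11.14));
// the gluUnProject signature is assumed and may vary between JOGL versions
float[] objectPosition = new float[3];
pmvMatrix.gluUnProject(xWin, yWinGL, depthValue, viewport, 0, objectPosition, 0);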

Fig. 11.14 Method of class BoundingSphere (Java) to test whether a point lies within the
bounding sphere

If the point of the mouse click is available in world coordinates, it must be tested for each clickable object whether this coordinate lies within the
bounding volume of the respective object. Figure 11.14 shows the Java
source code of the method contains of the class BoundingSphere.
By calling this method with a three-dimensional coordinate as argument, it
can be determined whether this coordinate of a point lies within the
bounding sphere represented by its centre and radius. This method applies
Eq. (11.9).
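A minimal sketch of such a contains test is shown below; the field names center and radius are assumptions for this example.

// Sketch: test whether a point lies within the bounding sphere (Eq. (11.9));
// comparing squared distances avoids computing a square root
public boolean contains(float[] point) {
    float dx = point[0] - center[0];
    float dy = point[1] - center[1];
    float dz = point[2] - center[2];
    return dx * dx + dy * dy + dz * dz <= radius * radius;
}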

Fig. 11.15 Submethod of the display method of a JOGL renderer (Java) to generate a ray in the
world coordinate system for ray casting, based on the window coordinates of a point and a z-value of
the depth buffer

Instead of applying a back transformation, the ray casting method can be used. In this case a ray is generated in the world coordinate system on
the basis of the clicked window coordinate in order to subsequently test it
for intersections with selectable objects in the scene. Figure 11.15 shows
the Java source code of a method with this functionality. Except for the call
to the last method before the return expression, this method is identical
to the method in Fig. 11.13. The gluUnProjectRay command creates a
ray in the world coordinate system as an object of the JOGL class Ray,
which, as described in Sect. 11.7, is represented by a starting point
$$\boldsymbol{o}$$ and a displacement vector
$$\boldsymbol{d}$$. This ray passes through the points (xWin,
yWin, zWin0) and (xWin, yWin, zWin1), where (xWin,
yWin) is the two-dimensional window coordinate of the mouse click and
zWin0 and zWin1 are two depth values to be specified. If zWin0 = 0 and zWin1 = 1 are set, then the ray starts directly on the projection plane and
extends to the background of the scene. In this case the depth value
depthValue from the depth buffer is not needed. With this parameter
choice, however, the objects may be selected even if the mouse click only
hits a bounding volume and not a fragment of the object itself, since usually
the geometry of the bounding volume differs from the geometry of the
enclosed object (see Sect. 11.7). If the parameter zWin0 = depthValue
is chosen as shown in the source code, the precision of the selection
increases. If there is no fragment at the point of the mouse click, then the z-
value from the depth buffer is equal to one, which means that the start and
end points of the ray are at the background of the scene. This makes it
unlikely that a bounding volume of an object will be hit by the ray. This is
intentional in this case, as no object was clicked. If a fragment is hit by the
mouse click, then the depth value in the depth buffer is less than one and the
starting point of the ray is closer to the camera, allowing the ray to hit
visible objects and thus select them.
After calculating the ray, it must be intersected with all clickable
objects. According to Eq. (11.13) both intersection points with a bounding
sphere can be determined. As a rule, the object located in the foreground is
to be selected. For this purpose, the smallest t-value for the intersection
points with each object must be determined. Subsequently, the object that
has the smallest t-value of all the objects intersected by the ray—and is thus
in the foreground—must be determined.
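A sketch of this selection loop is given below; the scene object type, its accessors, the ray representation and the intersectRaySphere method from the sketch in Sect. 11.7 are assumptions for this example.

// Sketch: select the object whose bounding sphere is hit by the ray
// with the smallest non-negative parameter value t
SelectableObject picked = null;
double tMin = Double.MAX_VALUE;
for (SelectableObject obj : selectableObjects) {
    BoundingSphere s = obj.getBoundingSphere();
    double[] ts = intersectRaySphere(rayOrigin, rayDirection, s.getCenter(), s.getRadius());
    if (ts == null) {
        continue; // the ray misses this bounding sphere
    }
    double t = Math.min(ts[0], ts[1]);
    if (t < 0) {
        t = Math.max(ts[0], ts[1]); // ray origin lies inside the sphere
    }
    if (t >= 0 && t < tMin) { // ignore intersections behind the observer
        tMin = t;
        picked = obj;
    }
}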
In the scene shown in Fig. 11.11 displayed by an OpenGL renderer, the
objects are selectable by both methods described in this section. For this
example, cuboids were deliberately chosen, which have quite different
shapes and deviate strongly in their geometry from the bounding spheres
used. This is especially true for the two flat objects on the left and for the
elongated objects in the middle of the scene. In some cases, this leads to a
wrong object selection or to a selection of two objects at the same time.
However, by slightly rotating the scene, a satisfactory selection can be
achieved. Overall, the object selection is reliable.
A suitable optimisation measure is to adapt the geometry of the
bounding volumes to that of the objects. For this scene, bounding cuboids,
i.e., AABB or OBB (see Sect. 11.7) would be more suitable. However, with
the bounding spheres used here, an efficient implementation is available
that provides good results for simple use cases.
The approach presented in this section and its optimisation are kept very
simple in order to gain an understanding of the basic principles. As
explained in Sect. 11.7, modern graphics systems are often able to
accurately and efficiently realise the selection of objects in complex
hierarchies of bounding volumes. In this case, the algorithms applied are
much more elaborated and complex than the methods presented in this
section.

11.9 Collision Detection


Collision detection can be used to determine whether moving objects will
collide with each other, or in other words, whether they (will soon)
intersect. Without collision detection and collision handling, objects may
interpenetrate when they are rendered. For example, a vehicle could drive
head-on into a wall in the virtual world and pass through it without
obstruction.
Collision detection for complex-shaped objects requires an enormous
computational effort when it is carried out on the level of the polygons that
model the surfaces of the objects. Even for only two objects, each
consisting of 100 triangles, unless additional information about the
distribution of the triangles is used, $$100 \cdot 100 = 10,000$$
overlap tests for the triangles (per frame) would have to be performed for
collision detection.
For this reason, bounding volumes are used to enclose objects that may
collide. Section 11.7 contains details about commonly used bounding
volumes in the context of object selection. Since the bounding volumes
usually have simpler geometric shapes than the enclosed objects, the
collision decision becomes easier.
If an object is given by a finite set of vertices and lies within the convex
hull of these vertices, an enclosing cuboid can be defined by determining
the smallest and the largest x-, y- and z-coordinate of all points. In this way,
two points $$(x_{\min },\, y_{\min },\, z_{\min })$$ and
$$(x_{\max },\, y_{\max },\, z_{\max })$$ are obtained. These two
points define the position of two diagonally opposite vertices of the
enclosing axis-aligned bounding box (AABB) as introduced in Sect. 11.7.
Deciding whether two AABBs overlap is relatively easy. Therefore, this
bounding volume is often used for objects moving on the ground, such as a
vehicle. It is easy to detect and avoid collisions with the flat ground the
object is moving on. Furthermore, the translation of the AABB and the
detection of collisions with obstacles is simple. However, if an object can
rotate, using an AABB as bounding volume may not be the best choice. In
this case, an oriented bounding box (OBB) might be more suitable (see
Sect. 11.7). However, collision detection between OBBs is more difficult
because these bounding volumes can be rotated arbitrarily and their
representation by only two corner points is no longer possible.
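The following sketch illustrates both steps for AABBs: constructing the box from a set of vertices and testing two boxes for overlap. The array-based representation (minimum and maximum corner) is chosen freely for this example.

// Sketch: axis-aligned bounding box (AABB) as {min corner, max corner}
// computed from a vertex list with layout x, y, z per vertex
static float[][] computeAabb(float[][] vertices) {
    float[] min = { Float.MAX_VALUE, Float.MAX_VALUE, Float.MAX_VALUE };
    float[] max = { -Float.MAX_VALUE, -Float.MAX_VALUE, -Float.MAX_VALUE };
    for (float[] v : vertices) {
        for (int i = 0; i < 3; i++) {
            min[i] = Math.min(min[i], v[i]);
            max[i] = Math.max(max[i], v[i]);
        }
    }
    return new float[][] { min, max };
}

// Two AABBs overlap if and only if their intervals overlap on all three axes
static boolean aabbOverlap(float[][] a, float[][] b) {
    for (int i = 0; i < 3; i++) {
        if (a[1][i] < b[0][i] || b[1][i] < a[0][i]) {
            return false;
        }
    }
    return true;
}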

Fig. 11.16 Bounding volumes in the form of a cuboid and a sphere

A small bounding sphere enclosing a symmetrical object is easy to find and the collision test for spheres is simple. For an asymmetric object, the
definition of a bounding sphere is somewhat more complex than the
definition of an AABB (see Sect. 11.7). Figure 11.16 shows an object
consisting of a cylinder with an attached cone as its top. The bounding
cuboid is very easy to determine, while it is not immediately obvious how
to calculate the centre and radius of the smallest enclosing sphere.
As an example, consider a cuboid object of width w, height h and depth
d. In world coordinates the centre of this cuboid lies at the origin, so that an
exactly fitting AABB can be defined by the vertices
$$(-w/2,\, -h/2,\, -d/2)$$ and $$(w/2,\, h/2,\, d/2)$$ (see above). For bounding
spheres, the centre $$\boldsymbol{c}$$ and the radius r of a sphere
that completely encloses the cuboid can be defined as follows (see also
Sect. 11.7).
$$\begin{aligned} \boldsymbol{c} = (0, 0, 0)^T \qquad r = \frac{1}{2} \sqrt{w^2 + h^2 + d^2}. \end{aligned}$$ (11.15)
Since the geometry of the sphere is not very well suited for a cuboid, the
bounding volume contains large areas outside the object. This can lead to
false collision detection when another object moves past the cuboid without
actually intersecting it. To reduce this problem, a constant factor for the
radius can be used to shrink the sphere. However, this leads to a sphere that
does not completely enclose the object. Collision detection is no longer
possible for the parts outside of this sphere.
The collision detection for two (bounding) spheres is relatively simple.
Two spheres collide (intersect) if the distance between the centres of the
spheres is less than or equal to the sum of their radii. For two bounding
spheres, let the centres
$$\boldsymbol{c}_1 = (c_{1,x}, c_{1,y}, c_{1,z})^T$$ and
$$\boldsymbol{c}_2 = (c_{2,x}, c_{2,y}, c_{2,z})^T$$ and the radii
$$r_1$$ and $$r_2$$ be given. Then the collision test using the
Euclidean distance can be expressed by the following inequality.
$$\begin{aligned} (c_{1,x} - c_{2,x})^2 + (c_{1,y} - c_{2,y})^2 + (c_{1,z} - c_{2,z})^2 \le (r_1 + r_2)^2. \end{aligned}$$ (11.16)
Since the exact distance between the centres of the spheres is not required for the collision test, the square root on the left side of the inequality does not have to be evaluated. Instead, the squared distance on the left side is compared directly with the square of the sum of the radii on the right side. This increases efficiency, because computing a square root is more time-consuming than a multiplication.
Section 11.7 summarises some properties of bounding volumes
commonly used in computer graphics. Almost all criteria mentioned there
for the selection of bounding volumes for efficient object selection apply to
the selection of bounding volumes for efficient collision detection. The
complexity of testing if a point lies within a bounding volume or the
computation of the intersections with a ray do not play a role in this case.
Instead, the effort required to detect an overlap of bounding volumes of the
same or different geometry must be taken into account. [1, p. 976–981]
provides methods for such tests.
In computer graphics, bounding volumes are usually used in collision
tests. For dynamic surfaces, as described in Sect. 11.6, this collision
detection strategy is more complex to apply. Such a collision test can be
realised, for example, by dynamically recalculating bounding volumes
depending on the (dynamically changing) geometric object to be enclosed.
On powerful graphics processors, however, this is already feasible.
Furthermore, dynamic surfaces may penetrate themselves if no
countermeasures are defined. In [37], a method for simulating cloth is
proposed in which complex calculations are carried out to avoid the fabric
penetrating itself.

Fig. 11.17 Frames from an animation of a scene in which the small cube moves from right to left:
Collisions occur with the cube in the middle (frames 2 and 3) and the cube on the left (frame 6). The
collisions are indicated by a colour change to yellow of the two colliding cubes. The scene was
animated by an OpenGL renderer

11.10 Collision Detection in the OpenGL


Figure 11.17 shows some frames of an animated scene of different coloured
cubes. In the animation, the smallest of the cubes moves from right to left
through the scene, intersecting first the cube in the middle and then the cube
at the bottom left. If a collision of two cubes is detected, then the colours of
both colliding objects change. In the scene, the small cube can move
through the other cubes. Apart from the colour change, there is no reaction
to a collision. This reaction is called collision handling. The application, for
example, could stop the movement of the small cube as soon as it collides
with another cube.

Fig. 11.18 Method of the class Cuboid (Java) to create a bounding sphere in model coordinates for
a cuboid: The calculation is done according to Eq. (11.15)

Fig. 11.19 Method of the class boundingSphere (Java) for detecting the collision between two
bounding spheres: The calculation is done according to formula (11.16)

As a prerequisite for collision detection, each cube is represented by a bounding sphere. Figure 11.18 shows a method of the class Cuboid. The
objects of this class are used to represent the cubes in the scene. Besides the
cube properties and the properties of the bounding volume, such an object
contains the corresponding data to trigger drawing by the GPU via the
OpenGL interface (not shown in the figure). The class also contains the
method for generating a bounding sphere as shown in the figure. Based on
the dimensions of the cube, the centre and the radius of the sphere are
determined according to Eq. (11.15). The subsequently created object of the
class BoundingSphere is stored as an instance variable of the object
(not shown in the figure). In the calculation for the radius, a factor of 0.85
was applied to reduce the size of the spheres so that, on the one hand, a
collision is only detected when the objects actually intersect and, on the
other hand, the objects do not have to overlap too much before a collision is
detected. The factor was determined based on test runs. This choice delivers
good results for the cubes involved in this example, as shown in Fig. 11.17.
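A minimal sketch of such a method is given below; the BoundingSphere constructor and the field layout are assumptions, and the factor 0.85 corresponds to the empirically chosen value described above.

// Sketch: bounding sphere for a cuboid of the given width, height and depth,
// centred at the origin of its model coordinate system (Eq. (11.15)),
// shrunk by the empirically chosen factor 0.85
private BoundingSphere createBoundingSphere(float width, float height, float depth) {
    float[] center = { 0f, 0f, 0f };
    float radius = 0.85f * 0.5f
            * (float) Math.sqrt(width * width + height * height + depth * depth);
    return new BoundingSphere(center, radius);
}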
Figure 11.19 shows the method collide for detecting a collision
between two bounding spheres. This method belongs to the class
BoundingSphere, whose objects represent bounding spheres for a cube.
When the method is called, a bounding sphere of another object is passed as
an argument to test for a collision. The return value is the result of the
evaluation of the inequality (11.16) as a boolean value.
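A minimal sketch of such a collide method is shown below; the field names center and radius are assumptions for this example.

// Sketch: collision test between two bounding spheres according to
// inequality (11.16); squared values avoid computing a square root
public boolean collide(BoundingSphere other) {
    float dx = center[0] - other.center[0];
    float dy = center[1] - other.center[1];
    float dz = center[2] - other.center[2];
    float radiusSum = radius + other.radius;
    return dx * dx + dy * dy + dz * dz <= radiusSum * radiusSum;
}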
Overall, this simple and efficient implementation yields satisfactory
results, although the geometry of the bounding spheres is not ideal for
enclosing cubes. On the other hand, the collision test is very simple and
thus efficient. For a more precise collision calculation between cubes, for
example, an OBB or a k-DOP can be used as a tighter enclosing volume
(see Sects. 11.9 and 11.7). However, the effort would increase compared to
the example in this section.

11.11 Auralisation of Acoustic Scenes


Since the sense of hearing can highly contribute to immersion and thus to
presence (see Sect. 11.1), some important insights into the auralisation of
acoustic scenes are presented below. Auralisation refers to making acoustic
situations audible. This section refers to virtually simulated acoustic scenes
that are completely artificial or simulate real scenes. This can be, for
example, a replica of an actual room, a future room (designed by an
architect) or a completely fictional scene.
The following sections provide an introduction to the core of modelling
and simulating virtual acoustic scenes in order to develop a basic
understanding. For further reading on spatial hearing and its psychoacoustic
effects, the book by Blauert [6] is recommended. Vorländer [36] contains
further elaborations on the auralisation of auditory virtual realities.
Section 11.11.1 presents (physical) basics of acoustic scenes and
comparisons with visual scenes. The localisability of auditory events (see
Sect. 11.11.2) plays a crucial role in achieving a good spatial auditory
impression of a simulated acoustic scene (see Sect. 11.11.3). Section
11.11.4 summarises important properties of acoustic playback systems,
whereby wave field synthesis is described in more detail, as it offers the
possibility of reproducing acoustic scenes very accurately. The increasingly
important Ambisonics method is described in Sect. 11.11.5. Finally, some
important standards in the context of auralisation of acoustic scenes are
mentioned in Sect. 11.11.6.

11.11.1 Acoustic Scenes


This book describes many effects in visual scenes using the methods
of geometrical optics—also called ray optics—which is a common
approach in computer graphics. This is especially important when fast
computation is required and little computing power is available, for
example, for real-time interactive applications on simple computers or
mobile phones. In this case, light is modelled as a ray that propagates in a
purely geometric way along lines. The ray tracing method (without
extensions) described in Sect. 9.9 is a good example for this approach. With
this model, many problems in computer graphics can be solved efficiently.
However, geometric optics reaches its limits when diffraction, interference
or absorption effects are to be modelled.
Quite analogous to geometrical optics, geometrical acoustics—also
called ray acoustics—can be used for the simplified but efficient
modelling of auditory scenes. In this case, sound waves are modelled as
rays, which in essence behave according to the same rules as in geometrical
optics. For the auralisation of rooms, it is usually sufficient to simulate the
direct sound from a sound source to the listener and some early reflections
using geometrical acoustics. The early reflections are those sound
components that are reflected with high intensity from the room boundaries
(walls, ceiling and floor) and objects in the room and arrive at the listener
only with a short time delay. This is similar to the simulation of visual
scenes by the ray tracing method. In this case, too, the computation can be
terminated after a certain number of light reflections (early light reflections)
and still achieve a good quality. For the improvement of the spatial auditory
impression, diffuse sound is also important, which can be simulated using
reverberation, taking into account the parameters of the room. In this case,
the decay curve of a broadband impulse emitted during an acoustic
measurement of the room is of particular importance. The characteristics of
the room are recorded as a so-called impulse response. Details on this topic
can be found, for example, in [34].
The simulation results achievable with geometrical acoustics, together
with a reverberation simulation if necessary, are a good approximation as
long as the involved wavelengths are small compared to the modelled
geometric structures. If $$\lambda $$ is the wavelength, f is the
frequency and c is the speed,4 then the following relationship generally
applies.
$$\begin{aligned} c = f \cdot \lambda \end{aligned}$$
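For example, at a speed of sound in air of approximately 343 m/s, a frequency of 100 Hz corresponds to a wavelength of about 3.4 m, whereas 10 kHz corresponds to about 3.4 cm.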
Therefore, geometric acoustics is more suitable for signals with high rather
than low frequency components, as long as the surface irregularities or
structures are small compared to the wavelength. Like pure geometrical
optics, pure geometrical acoustics is also limited. For example, diffraction
effects cannot be modelled without extensions.
Under these conditions, significant acoustic effects can be modelled.
One type of acoustic effect is background noise, e.g., from wind, rain or
traffic in a large city. This type of background noise is caused by the
superposition of direct sound from a large number of spatially distributed
sound sources or their diffuse reflections, e.g., from building walls or other
objects, so that a listener cannot determine the spatial position of the sound
sources. Other background noises, such as background music, the ticking of
a clock or the clacking of a finished toast in a toaster, can be localised, but
their exact spatial position is usually of little importance to the listener.
Thus, both types of ambient noise are characterised by the fact that their
exact spatial position is not essential for modelling an acoustic scene.
Therefore, their reproduction can be achieved by simple means, such as the
additive mixing of wind noise or music. Background noise is similar to
ambient light, which is present uniformly throughout a visual scene and
does not emanate from a particular light source or direction.
Many acoustic effects emanate from a concrete sound source whose
spatial position can be localised (see Sect. 11.11.2) and is significant for the
listener, for example, the voice of a conversation partner, the sound of a
passing animal or the closing sound of a certain door. For a realistic
reproduction of a sound source in an acoustic scene, in addition to its
position and orientation in space, its radiation characteristic is important.
Due to this characteristic, sound is emitted with different strengths in
different directions. This is similar to the light sources in a visual scene. For
example, a spotlight emits a cone of light in a specific direction, while a
point light emits light in all directions (see Sect. 9.1).

11.11.2 Localisability
As explained in the previous section, many acoustic effects emanate from
localisable (locatable) sound sources. This creates an essential part of the
auditory spatial impression. Another contribution to an auditory spatial
perception is provided by (diffuse) reflections.
The localisation of sound by humans is essentially based on the
characteristics of the two individually shaped outer ears and the position of
the ears at the side of the head. This provides the auditory system with
depth cues, similar to spatial vision (see Sect. 11.12.1), to determine the
position of a sound source. As in visual perception, monaural depth cues,
which can be perceived with only one ear, can be distinguished from
binaural depth cues, which require both ears.
(Spatial) hearing with both ears is called binaural hearing (see [6]).
Consider a sound source located in the horizontal plane to the side of the
listener. Due to the finite speed of sound, the sound from this source reaches
the ear facing away from the sound source with a time delay compared to
the ear facing the sound source. This depth cue is called interaural time
difference (ITD). Due to the acoustic shadowing by the head, sound arrives
at the ear facing away from the sound source with a lower sound level
compared to the ear facing the sound source. This depth cue is
called interaural level difference (ILD). The number of sound sources that a
person can distinguish in the horizontal plane with the help of these
binaural depth cues depends strongly on the concrete acoustic situation.
To determine the height of a sound source above the head or whether
the sound source is located in front of or behind the head, the hearing
system evaluates the spectral distortions of the incoming sound signal,
which arise due to the filter effect caused by the shapes of the outer ears, the
head and the torso. However, the shapes of the pinnae as part of the outer
ears are the decisive components. With these monaural depth cues, the
positions of sound sources can be distinguished from each other in the so-
called median plane. These depth cues are called “monaural” because they
can also only be perceived with one ear.
For the perception of distance in air, the level of the sound signal plays an essential role. The sound intensity decreases quadratically with increasing distance, while the sound pressure amplitude decreases in inverse proportion to the distance. This is mainly due to the uniform propagation of sound in all directions (distribution of the available energy) and the friction with
other molecules in the air (air absorption). Due to the attenuation caused by
this air absorption on the propagation path, high-frequency components are
attenuated more than low-frequency components, so that at greater
distances such spectral distortions are clearly perceptible. For example, the
thunder of a thunderstorm is perceived as a low-frequency rumble at a great
distance, while a more broadband crash is heard close to the thunderstorm.
In addition, the hearing system uses echoes from surrounding objects,
which are perceived as a separate auditory event. For example, echoes from
room walls are used to determine the distance in a room.
In the so-called acoustic free field, there are no acoustic obstacles. In
this case, the direct sound reaches the listener only with a distance-
dependent attenuation. In a room, reflections occur off the walls, ceiling and
floor, causing parts of the sound to reach the listener via a longer path than
the direct path. Reflections arrive at the listener later than direct sound. Up
to a certain time delay, a reflection is not perceived as an independent
auditory event, only a change in the perceived position of the sound source
or a sound distortion occurs. A longer time delay leads to the reflection
being perceived as an independent echo. If a large number of reflections
with a lower amplitude than the direct sound arrive late at the listener
(many late reflections), then the listener perceives a reverberation.
Besides propagation through the air, sound is transmitted through
objects. For example, sound is mainly reflected by walls. Part of the
remaining energy is absorbed by the wall. The remaining part is radiated
again through the wall on the other side. Since the degree of this
transmission is frequency dependent, this creates a sound distortion. For
example, the deep bass of a piece of music, if played loud enough, is still
clearly audible on the other side of a wall, while the higher and middle
frequency components are not perceptible or only very quietly perceptible.
Because in this example the entire wall is excited to vibrate, the spatial
position of the music can no longer be clearly localised behind the wall. Of
course, this effect also depends on the properties of the wall.

11.11.3 Simulation
In order to use the findings on the localisability of auditory events (see Sect.
11.11.2) for the simulation of a realistic, spatial audio reproduction, the
individual head related transfer functions (HRTF) of a human can be
measured. For the measurement, small microphones can be inserted into the
ear canal and measurement signals from all directions of incidence of
interest in the horizontal and median plane can be presented to the listener.
After evaluating the recorded signals, a catalogue of head-related transfer
functions is obtained for each measured direction of incidence and for each
ear. The direction of incidence is typically given as a tuple of an angle in
the horizontal and median plane. For example, $$(0^\circ , 0^\circ )$$
represents the direction of incidence directly in front of the listener and
$$(-30^\circ , 30^\circ )$$ represents a direction of incidence laterally to
the upper right. Through this type of measurement, the transfer functions
include the filtering effect of the head and torso in addition to the positional
differences of the ears and the characteristics of the pinnae of the outer ears.
In more recent approaches, head-related transfer functions are determined
on the basis of the positions and dimensions of the outer ears and the
dimensions of the head and torso. These anthropometric data can be
obtained, for example, from photographs or 3D scans. Measured head-
related transfer functions and the corresponding anthropometric data can be
found, for example, in freely available databases, such as [9].
With the information about the position, orientation and properties of
the sound sources, the listener and the acoustically relevant objects in a
scene, the amplitudes of the signal components arriving at the listener from
the sound sources and their direction of incidence at the listener position
can be determined using geometrical acoustics. Similar to visual scenes, a
ray tracing method can be used for this purpose. If the acoustic
representation of reflecting obstacles is required in this simulation, the
resulting reflections can be modelled as so-called mirror sound sources. In
this approach, it is assumed that behind the acoustic obstacle there is
another sound source at a suitable distance and position which emits this
reflection. This is a proven and well-applicable method, which can be used
as an alternative to or in combination with a ray tracing method.
In these calculations, movements of the sound sources and the listener
can be included by simple changes in position. If the acoustic scene is
auralised together with a visual scene, then in most cases the current
position and orientation information of the (head of the) listener is
available. Modern head-mounted displays already have corresponding
sensors used for the visual representation that can be accessed for this
purpose. While traditional auralisation methods usually only use head
orientation (rotation) and thus three degrees of freedom, newer approaches
additionally take into account the displacement of the listener and the head
in space (translation) (see, for example, [23] or [27]). Such methods use all
six degrees of freedom (rotation around three axes and translation in three
directions), which increases the plausibility of the acoustic simulation.
If a sound source or the listener moves very fast, the Doppler effect must also be taken
into account. In reality, if the listener and the sound source are moving
rapidly towards each other, the signal is compressed in time due to the
properties of sound waves, which leads to an increase in the perceived
frequency. If the listener and the sound source move away from each other,
then a temporal stretching and thus a reduction of the perceived frequency
occurs. This effect can be heard in passing police or fire brigade vehicles.
When such a vehicle approaches quickly with its siren on, the pitch appears
higher than when it moves away. Such compression and stretching of the
signal can be simulated by increasing or decreasing the playback speed of
the audio data.
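As a rough guide, for movement along the line connecting source and listener, the perceived frequency $$f'$$ follows the classical Doppler relation, where f is the emitted frequency, c is the speed of sound and $$v_L$$ and $$v_S$$ are the velocities of the listener and the source towards each other (a simplification that neglects any movement of the air itself).
$$\begin{aligned} f' = f \cdot \frac{c + v_L}{c - v_S} \end{aligned}$$
For the approaching emergency vehicle mentioned above, $$v_S > 0$$ holds, so the perceived frequency is higher than the emitted one; when the vehicle moves away, the sign reverses and the perceived frequency drops.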

11.11.4 Reproduction Systems


To present the simulation of an acoustic scene to a listener, a suitable
playback system must output the calculated simulation results. Basically, a
distinction can be made between playback via headphones and playback via
loudspeakers.
In a classical headphone-based reproduction of stereo signals, for
example, from a music CD, the spatial impression is limited. The signals
are usually processed in such a way that the auditory events are located at
different positions within the head and always between the ears.
The perception of auditory events outside of the head and thus a better
spatial perception can be achieved by using head-related transfer functions
(see Sect. 11.11.3). To auralise the calculated signal components for the
listener, the corresponding head-related transfer function must be selected
from the catalogue of head-related transfer functions for the direction of
incidence of each incident signal component at the listener position. These
functions are applied as filters to the respective signal of the sound source.
The attenuation of the signal due to the distance from the listener and any
sound distortion that may occur must be taken into account according to the
simulation computation. Furthermore, care must be taken that the signal
used from the sound source does not (yet) contain any spatial information,
for example, due to reflections in a room when the signal was recorded.
Therefore, such signals should only be recorded in a special anechoic room
or at least in a low-reflection studio room. After filtering as described
above, for each signal component from a certain direction of incidence, a
signal is available for the left and right ear. These signals are to be
additively superimposed for each ear, so that exactly one signal is available
for the left and right ear, which can be presented to the listener via a
specially equalised headphone. The use of a headphone eliminates the effect
of the pinnae of the outer ears, the head and the torso. These characteristics
are already included in the head-related transfer functions (see above).
Therefore, this method provides a very realistic and accurate reproduction
of auditory events outside of the head. Head-related transfer functions are
individual, as each person has a slightly different shape of the outer ears,
head and torso. Therefore, the spatial impression is best when using the
individually measured head-related transfer functions of one’s own ears.
However, head-related transfer functions of other listeners often already
provide a plausible spatial perception.
If the auralisation takes place together with a (three-dimensional) visual
presentation, the position and orientation information of the (head of the)
listener is available, for example, through corresponding sensors of a head-
mounted display. The position and orientation values can be used to
dynamically determine the ear signals as well. This allows the auditory
event to be perceived as remaining at the simulated position in space when
the head moves. In this case, the auditory event does not move with a head
movement.
This type of headphone-based audio reproduction is suitable for many
applications, for example, to achieve a realistic, spatial audio representation
in computer games. The disadvantage of this type of playback is the
calculation of the ear signals for only one listener and the necessity of
playback through a headphone.
An alternative is the output via loudspeakers. If classical stereo signals,
for example, from a music CD, are reproduced via two stereo loudspeakers,
then, similar to the reproduction of these signals via stereo headphones, the
spatial auditory impression is limited. In this case, the best sound
impression is achieved when the listener forms an equilateral triangle with
the two speakers. This results in the so-called stereo triangle. The resulting
position of the listener is the optimal listening zone (sweet spot), which is
only optimal for one listener. Through the so-called summing localisation,
auditory events are perceived at different positions between the
loudspeakers. Classical audio recordings are usually optimised for such a
listening situation.
Special audio signal processing methods (see, for example, Sect.
11.11.5) can be used to improve the spatial impression via stereo speakers.
For further enhancement, methods with more than just two speakers are
available, for example, as a 5.1 or 7.1 surround system. However, these
surround productions usually reproduce the modelled acoustic scene less
precisely than it is possible via headphone reproduction with (individual)
head-related transfer functions. Furthermore, this type of reproduction, just
like classical stereo reproduction, depends on the characteristics of the
speakers used and the acoustic environment in which the speakers and the
listener are located. Furthermore, even with these methods, a listener can
only perceive a good spatial sound within a narrow optimal listening zone.
A more complex loudspeaker-based reproduction method for a three-
dimensional sound field is the wave field synthesis [4]. This method allows
a realistic virtual sound field to be reproduced in a real room, which is
independent of the number of listeners and has a large listening zone (sweet
spot). This method is based on Huygens’ principle, which describes how
any wavefront can be generated from a superposition of elementary waves.
This allows several listeners at different positions with different head
orientations to listen to the sound field simultaneously. However, the
reproduction of a virtual sound field in a real room requires the
consideration of many parameters, especially the acoustic effect of the room
through its shape, dimensions and physical properties of the boundary
surfaces and the resulting reflections from walls, ceiling and floor.
Furthermore, the reproduction requires a larger number of loudspeakers,
whose number, acoustic properties and positions in the room also have an
influence. The appropriate number of loudspeakers depends on the size of
the area in which the sound field is reproduced and on the expected quality
of the reproduction. As a rule, however, considerably more than six or eight
loudspeakers will be required, as is the case with conventional surround
systems. The influencing factors mentioned must be determined once by
measurement before putting a wave field synthesis system into operation
and must be included in the subsequent computations for generating the
virtual sound field.

11.11.5 Ambisonics
Ambisonics is a method for recording, transmitting and reproducing a
spatial sound field based on a closed mathematical approach (see, for
example, [38]). With this method, a real sound field can be recorded or
virtual sound sources can be placed in three-dimensional space. The source
for the virtual sound sources can be signals that were recorded in an
anechoic room and thus contain few reflections and reverberations of the
recording room. The sound field is broken down into spatial components,
which are recorded or represented by microphones with different directional
characteristics. The basic version of this method, called first-order
Ambisonics, has four channels that can be represented by an encoder in the
so-called (first-order Ambisonics) B-format and stored in a file.
The sound field can be reproduced via loudspeakers or headphones.
Before playback, a decoder must be used to convert the B-format signals
into the desired number of playback channels. This decoding allows the use
of simple and often already existing speaker arrangements, such as a 5.1 or
7.1 layout. Within limits, any non-standard speaker layout can be used. This
layout must be known in order to calculate the correct signals for the
individual speaker positions (see [38, p. 5ff]). In principle, with each
additional decoded channel, an improvement in the spatial stability of the
sound field is achievable. In order to increase the playback quality in
practice, the decoder can be optimised based on psychoacoustic findings
[2]. Instead of using the four-channel B-format directly for transmission and
storage, it can be converted to one of the so-called UHJ formats (also called
C format), which allow backward compatibility to a stereo or mono
playback system.
The use of the first-order B-format is suitable for the distribution of
music material with (moderate) surround sound over standard Blu-ray,
DVD or CD media and playback over stereo or surround systems. A higher
spatial reproduction quality and the extension of the listening zone (sweet
spot) can be achieved by higher-order Ambisonics. This is particularly
interesting for the realisation of acoustic virtual worlds. In this case, the
number of directional components increases and with it the number of
transmission channels and the required computing power. For very good
spatial imaging and reconstruction and a large listening zone, the theoretical
method quickly requires a high order and thus a large number of channels to
be calculated and—if no headphone reproduction is used—an equally large
number of loudspeakers. In order to reduce this effort and to get by with a
moderate order, psychoacoustic knowledge of the human auditory system is
applied, whereby good localisability of auditory events can be achieved
even in the case of erroneous sound field reproduction [3, 10]. Although
Ambisonics was developed and used in academia as early as the 1970s, it is
not yet widely used. Only recently has there been a stronger interest from
standardisation institutions in the media sector and commercial interest
from internet companies. [38, p. IV–V].

11.11.6 Interfaces and Standards


Since the OpenGL does not support modelling of acoustic scenes,
alternative specifications or methods must be used. With the Open Audio
Library (OpenAL) an interface specification is available that is similar to
the OpenGL in many respects (see https://www.openal.org). For the
OpenAL, the Java binding Java OpenAL (JOAL) exists, which is available
from the same source as the Java binding JOGL to the OpenGL (see https://
jogamp.org). The OpenAL can be used by audio hardware manufacturers as
an open standard to implement platform-independent, hardware-supported
audio playback.
In addition to this interface specification, a number of other standards
and possibilities exist for auralising acoustic scenes. In this context, the
international standard MPEG-H 3D Audio (specified as ISO/IEC 23008-3)
should be mentioned, which supports the coding of audio channels, audio
objects or higher-order Ambisonics signals. Dolby Atmos is a surround audio
format developed by Dolby Laboratories and used for the hardware of this
company. This format allows for a theoretically unlimited number of
channels and the encoding of moving audio objects, and it is backward compatible with 5.1 and 7.1 systems. In addition, there are open interfaces
from internet and game industry companies for the realisation of acoustic
virtual realities. Two examples are Google Resonance and Steam Audio by
Valve.

11.12 Spatial Vision and Stereoscopy


Seeing with both eyes enables spatial vision, also called stereo vision or
stereopsis. When viewing a real scene, the observer gets a real and true
impression of depth. This is the highest form of human vision. Section
11.12.1 describes depth cues that the human visual system can evaluate and
some aspects of the human visual perception.
Three-dimensional computer-generated stereoscopic scenes provide two
different two-dimensional images corresponding to the slightly different
viewpoints of the left and right eye. The stereoscopic output methods
presented in Sect. 11.12.2 allow these images to be presented separately to
the viewer’s left and right eyes. This type of presentation is
called stereoscopy. To reproduce the different viewpoints and perspectives
of the two eyes, a suitable perspective distortion must be applied. Section
11.12.3 provides two methods for this perspective distortion. Section
11.12.4 contains implementation examples for stereoscopic rendering using
the OpenGL with JOGL.

11.12.1 Perceptual Aspects of Spatial Vision


The impression of depth in a viewed scene can be created by various depth
cues. These cues are processed by the human visual system to create a
uniform perceptual impression. There are two types of visual depth cues.
Monocular depth cues can be perceived with only one eye. Binocular depth
cues require both eyes. The impression of spatial depth is thus not solely
bound to seeing with both eyes and can already be perceived with one eye
or in a two-dimensional representation (an image).
The strongest monocular depth cue is occlusion. When one object is
partially or fully occluded by another, the visual system infers that the fully
visible object is in front of the other.
The linear perspective or point-projection perspective is also a very
strong depth cue. Parallel lines converging towards a vanishing point
indicate depth. This effect can be observed well on (long) straight roads or
railway lines.
The relative size of an object compared to its surroundings is also used
by the visual system. Everyday experience shows that closer objects are
larger than more distant objects. Knowledge of object sizes help to estimate
distance. By combining the familiar size (known size) with the perceived
size of an object, the visual system can infer the distance of this object.
If the shadow of an object is cast on another object, for example, on the
ground, it can be assumed that the object producing the shadow is standing
on the floor. If the shadow is cast on another object and deformed
accordingly, the illuminated object must be in front of the other. Likewise,
the shape of the shadow can be used to infer the shape of the object and
thus its position in depth.
If a horizon is visible in a scene, the relative height of an object above
the horizon indicates its distance in depth from the horizon. Objects below
the horizon appear closer the further they are from the horizon. Objects
above the horizon appear more distant the further they are from the horizon.
Another cue of depth is the texture gradient. The increase in texture
gradient with distance is observed, for example, in surfaces with a regular
pattern, such as uniform tiles or paving stones. The even spacing between
patterns appears smaller with increasing distance. Smaller perceived objects
are interpreted as being further away than larger perceived objects,
especially if these objects are in the same scene.
Objects that are far away are obscured by layers of air and particles such
as dust, water droplets and dirt, making the objects appear dulled and with
less contrast than close objects. This effect is called aerial perspective or
atmospheric perspective.
The depth cues considered so far can also be produced by creative
means in a non-animated two-dimensional image. If a real scene is
observed, the observer receives images from different perspectives through
head movement, from which depth information can be obtained. Similarly,
depth information can be obtained through movements of the observer
herself. In both cases, this is possible even when the observer is viewing the
scene with only one eye.
A motion parallax occurs when objects move relative to each other or
relative to the viewer. The position and size of the objects play a role. The
objects visible to the observer appear to move at different speeds depending on
their distance. This effect can be observed when looking out of the side
window of a vehicle. When the observer fixates an object at a certain
distance, the objects that are closer to the observer than the fixation point
appear to move against the direction of the vehicle’s motion. Objects further
away from the observer than the fixation point appear to move with the
direction of vehicle motion. This observed movement is slower than the
movement of the closer objects.
When an object is viewed up close, the muscle tension in the lens of the
eye (ciliary muscles) changes to bring the object into focus on the retina.
This effect is called accommodation. The resulting focus on one object
potentially blurs other objects in the foreground and background. The visual
system creates a depth perception in this situation.
All depth stimuli described so far can be perceived with only one eye.
For the perception of binocular depth cues both eyes are required. The
visual system fuses the two images of the eyes into a unified spatial
perception.
The eyes have the ability to change their direction of gaze by rotating
the eyeball. The closer an object is to the face, the more the eyeballs must
rotate inward to fixate the object. From the strength of this eye
convergence, i.e., from the muscle tension of the eyes, the spatial distance
of the fixated object to the face can be inferred.
Since the muscular tension of the eyes is processed by the visual system
for depth perception through accommodation or convergence of the eyes,
these two depth cues are called oculomotor depth cues. The viewer can
literally feel the spatial depth of an object. In a self-experiment,
convergence can be felt by first holding a finger far in front of the face and
slowly bringing it up to the nose. When the finger is close to the nose, the
muscle tension can be clearly perceived.
Another binocular depth cue is the binocular disparity. This disparity
occurs because the two eyes are at different positions. For German women,
the median pupil distance is 60 mm and for German men 61 mm. Overall,
this distance varies from 55 mm to 69 mm (5th and 95th percentiles) for German women and men, respectively (see [11]). Thus, each
of the eyes looks at the three-dimensional scene from a slightly different
perspective, resulting in slightly different images projected onto the retinae.
When the eyes fixate on an object, the images for the left and right eyes fall
on the fovea, the most sensitive part of the retina that provides the highest
visual acuity. Other objects located in front of or behind the fixated object
are projected onto the retina of the left and right eyes at non-corresponding
points. Corresponding retinal points are the points that are located at the
same position on a retina when the two retinae are thought to overlap. The
discrepancy between the positions of the non-corresponding points of an
object on the retinae is the binocular disparity. By comparing the images of
the left and right eye, the visual system creates a spatial impression.
However, it is not fully understood how exactly the visual system
superimposes the two images to identify corresponding points of object
images [17, p. 238–240].
Table 11.1 Classification of depth cues and technical simulation

Depth cue (classification)                      Technical simulation

Monocular
  Occlusion                                     Two-dimensional image
  Linear perspective                            Two-dimensional image
  Relative size                                 Two-dimensional image
  Familiar size                                 Two-dimensional image
  Shadow                                        Two-dimensional image
  Relative height to the horizon                Two-dimensional image
  Texture gradient                              Two-dimensional image
  Aerial perspective (atmospheric perspective)  Two-dimensional image
  Motion parallax (motion induced)              Two-dimensional animation
  Head movement (motion induced)                Rendering controlled by head tracking
  Accommodation (oculomotor)                    Multiple image layers, hologram
Binocular
  Convergence (oculomotor)                      Stereoscopic rendering
  Binocular disparity                           Stereoscopic rendering

Table 11.1 provides an overview of the visual depth cues presented in this section. As can be seen in the table, the classification into monocular
and binocular stimuli overlaps with the characterisation of depth cues as
oculomotor and non-oculomotor. The last column of the table indicates by
which technical reproduction the respective depth cue can be presented to
the viewer with minimal effort. Even two-dimensional paintings can contain
the monocular depth cues from occlusion to atmospheric perspective.
Painters have used these means for centuries as a stylistic device to provide
paintings with effective depth perception. In some cases, these depth cues
are used competitively to create deliberately contradictory representations.
In order to create a motion parallax, a still image is naturally not
sufficient. The scene must be animated, for example, by a two-dimensional
video or a computer graphics animation. The consideration of head
movements requires information about the position of the viewer’s head,
which can be determined by a sensor (head tracker). Depending on this
position information, the viewer is presented with different perspectives of
the scene. This makes the presentation interactive in the sense of computer
graphics. In principle, this technique can already be applied to two-
dimensional images, but it is usually used in stereoscopic displays.
To stimulate accommodation of the eye, stimuli must actually be
presented at different distances from the eye. This can be realised by
different image planes or by holograms. Displays with different image
planes are, for example, so-called light field displays. For this monocular
depth cue, too, the reproduction technique can in principle be applied to
two-dimensional images, but it is usually used in stereoscopic displays.
11.12.2 Stereoscopy Output Techniques
The depth cues of convergence and binocular disparity need both eyes.
Therefore, they can only be reproduced by stereoscopic rendering. In this
technique, the viewer is presented with different two-dimensional images
for the left and right eye, corresponding to the slightly different viewpoints.
This type of presentation is called stereoscopy, stereoscopics or stereo
imaging.
To realise stereoscopic rendering using computer graphics, the same
scene must be computed with two slightly different projection centres at
interpupillary distance. Section 11.12.3 presents two such projections. After
determining the projections and projection matrices for the left and right
eyes, the simplest approach is to render the full scene twice. However, since
only the eye viewpoints differ, optimisations can be made so that much of
the computational effort does not have to be duplicated (see [33, p. 409–
412]). The main problem a stereoscopic display solves is the presentation of
two different images to the two eyes.
The simplest form of a stereoscopic display is to place the images for
the left and right eye side-by-side and to use a special viewing technique.
The images should be small enough to maintain the physiological
interpupillary distance (see above). In the parallel viewing technique, the viewer looks at the two adjacent images with parallel viewing axes, so that the left eye sees the left image and the right eye the right image. To do so, the face is brought quite close to the images. When the eyes are relaxed (accommodation) as if looking into the distance, the two images fuse into a single spatial percept, creating the sensation of looking through the images. Concentrating on only one of the images per eye can be supported
by a visual barrier, for example, a small piece of paper, placed between the
images and the viewer’s nose. This viewing technique can be tested with
the illustrations in Figs. 11.27 or 11.28. To further support this viewing
method, special optics can be placed in front of the viewer’s eyes. Such
optics were already used in early stereoscopes and are used today in head-
mounted displays (HMD) (see below).
Another stereoscopic viewing technique is cross-eyed viewing, where
the images are reversed compared to the parallel viewing arrangement. The
image for the left eye is now on the right side and the image for the right
eye is on the left side. For a spatial perception, the viewer's eyes must converge strongly so that each eye again receives the correct image. With this method, the eyes accommodate to a near distance.
Since two-dimensional displays, such as computer monitors or
projection screens, are common output methods, stereoscopy methods were
developed in which the two images for the left and right eye are
superimposed on such displays. On the viewer’s side, special glasses must
separate the left from the right image to present the correct image to the
respective eye.
The colour anaglyph technique encodes the images for the left and right
eye using two different colours—often red and cyan5—and then
superimposes the images for display on a two-dimensional screen. For
viewing, coloured glasses are required that separate the superimposed
images for the left and right eyes. By using the colours to separate the
channels between the eyes, colour information of the scene is lost. Thus,
this technique can only be used for colour scenes to a limited extent. High-
quality colour representations are not possible with this technique.
Sometimes this technique is simply referred to as anaglyph technique.
However, an anaglyph technique in its original meaning can be any kind of
stereo projection, such as the techniques described below.
To avoid colour distortion, the polarisation technique can be used. Light waves normally oscillate in all directions perpendicular to the direction of propagation. Special polarisation filters can be used to polarise these waves
so that they only oscillate in one direction. Since this does not change the
information content of the images, the images for the two eyes can be
created with a different polarisation direction—for example, horizontally
and vertically polarised—and then superimposed. For projection onto a
screen, the screen must be specially coated, for example, using silver, so
that the reflection of the light from the screen does not change the direction
of polarisation. To separate the images, the viewer must wear glasses with
polarising filters that separate the images for the left and right eyes. When
using simple linear polarisation with one horizontal and one vertical
polarisation direction, the channel separation depends on the head position.
This can cause crosstalk between the channels if the viewer tilts her head to
one side. For this reason, polarising filters that produce circular polarisation
are often used. With this technique, the light waves for the left and right
eyes are polarised counter-clockwise or clockwise respectively, resulting in
invariance to lateral head tilts.
The two differently polarised images can be superimposed using two
projectors. A single projector can also be used by displaying the two images in different areas, e.g., top and bottom, of the projector's output chip and then passing them through differently polarising filters and superimposing them. A time-shifted presentation of the images can also be
realised, similar to the shutter technique explained below. In this case, a
different polarisation is achieved by a (quickly) switchable polarisation
filter. Unlike the shutter technique, the glasses do not require any time-
dependent, active switching. In the case of LCD screens, the two images
can be output interleaved line by line. For example, the even lines can be
polarised for the left image and the odd lines for the right image.
A general advantage of the polarisation technique is the use of relatively
inexpensive glasses, which are completely passive and thus do not depend
on a power source or a synchronisation signal. Due to these properties, this
technique is preferably used for stereoscopic displays in (3D) cinemas.
Furthermore, the light intensity is less limited by filtering than by the
shutter technique (see below). However, the channel separation of this
method is not optimal, so that crosstalk between the channels, so-called
ghosting, can occur. When two projectors, a projector with top and bottom sub-images or a line-by-line interlaced display are used, the technically available spatial resolution is effectively halved for the stereoscopically presented image (e.g., a 4k device delivers two 2k images for the left and right eye). In the case of a temporally interlaced presentation, the temporal resolution is halved instead, which either places higher demands on the projector and the rendering system or results in a lower refresh rate of the final image.
The shutter technique is used more often for monitors than for
projectors. In this technique, the images for the two eyes are shown
alternately on the display at a high frequency. A pair of shutter glasses
darkens the eye whose image is not currently being displayed. For this
purpose, an LCD layer on the glasses can be used, which darkens one eye
depending on an electrical signal. In order to show the correct image to the correct eye, the shutter glasses must be synchronised with the rendering system, for example, wirelessly via infrared or radio. To do this, such glasses require electronics and a power supply, which is why this technique is often called the active shutter technique. By alternately darkening the eyes,
the refresh rate for each eye is reduced to half. To avoid distracting
flickering, displays with a refresh rate of at least 100 Hz should be used. In
contrast to the polarisation technique, shutter glasses offer better channel
separation (less ghosting). However, they are significantly more expensive
and reduce the light intensity of the display by half due to the alternating
dimming. Active shutter glasses can fail due to a lost synchronisation signal
or an empty battery. Especially during presentations in large groups or for
important decision-makers, this can lead to a problem if no impression of
depth is created.
A head-mounted display is a helmet construction that positions one or
two small displays in front of the user’s eyes so that different images can be
presented to each eye. For each of the eyes, there is an optical system that
ensures that the user can view the presented scene with a relaxed gaze. This
corresponds to the stereoscopic parallel viewing technique (see above).
Head-mounted displays usually have a sensor that records the movement of
the head (head tracking), so that the scene can be changed accordingly in
real time. This gives the viewer the impression of moving freely in a virtual
world. The first head-mounted display was invented as early as 1968, but
only recently has it been developed into a device suitable for mass use.
These devices are known as virtual reality headsets (VR headsets) and often consist of an HMD, a stereo headset and sensors for head tracking. The
most advanced of these headsets have a battery-based power supply and an
integrated graphics computer that renders the two images for the eyes based
on the detected head posture. This eliminates the need for a cable
connection to the rendering computer and increases freedom of movement.
Despite current developments and lighter constructions, the remaining weight is the main disadvantage of head-mounted displays; it becomes uncomfortable when the device is worn for long periods.
In recent years, stereoscopy techniques have been developed for
displays that do not require additional aids for the viewer in the form of
special glasses [5, 16, 35]. In lenticular displays, a lens or prism mask is
used in front of the screen so that the two eyes of a viewer receive different
images. The parallax barrier works on a similar principle. It is based on an
opaque mask with a series of precisely arranged slits that block the light
coming from the screen so that every second column of pixels is visible
only to the left or right eye. With both methods, however, depth perception
is limited to a relatively small area in front of the display.
An improvement is provided by holographic techniques. With special
displays, depth perception is possible from almost all viewing angles, as a
separate image is generated for each viewing angle. Static holograms are
currently used in the arts sector or as security features on identity documents and payment cards.
With multi-layer LCD displays, different viewing angles of a scene can
be shown through the different layers. Furthermore, virtual retinal displays
(retinal scan displays, retinal projectors) project images directly onto the
retina of the viewer. These two techniques allow light fields to be displayed
that enable the viewer to focus on different levels of depth.

Fig. 11.20 Accommodation and convergence (parallax) in a natural viewing situation and in
artificial stereoscopic viewing

A fundamental problem with all stereoscopy techniques with only one focal plane is that the eyes must constantly focus on the display, regardless
of how far away an object is in the virtual scene. Figure 11.20 illustrates
this problem. During the natural viewing of an object, the oculomotor depth
cues accommodation and convergence match. In an artificial stereoscopic
viewing situation, these depth cues diverge. The eyes are focused on the
display (accommodation) while the line of sight of the eyes is directed to
the virtual object in front (or behind) the display plane (convergence). This
contradiction is called accommodation-vergence conflict. For some viewers,
this contradiction between focus and vergence causes nausea or headaches
when using stereoscopy techniques for extended periods of time. In some
cases, no spatial perception is possible at all.

Fig. 11.21 Parallax for stereoscopic viewing

Figure 11.21 shows the parallax for different positions of an object when viewing an image plane using a stereoscopy technique. If an object
lies exactly on the display screen, i.e., in the projection plane, it appears in
the images for the left and right eye at the same position on the screen
without parallax (zero parallax). An object behind the projection plane
appears further to the left on the screen in the image for the left eye and
further to the right on the screen in the image for the right eye. This is
called positive parallax. For objects in front of the display, the opposite
situation results, which is called negative parallax.
A computer-generated scene can also be used to create a divergent
parallax, where an object is shown to the left of the left eye and to the right
of the right eye. Since this does not occur in natural vision, it should be
avoided in artificial scenes.
Another problem of artificial displays, such as monitors or projection
surfaces, is their limited extent. Objects with positive parallax, i.e., objects that lie behind the projection plane, are perceived as if seen through a window whose frame is the border of the display. Conversely, objects with
negative parallax are cut off by this frame, although they should actually be
in front of the display. Due to this limitation, it can happen that only the left
or the right image is cut off, which leads to erroneous and unpleasant
perceptions.

Fig. 11.22 Left: xz-plane of the projection volumes of two inward-facing cameras (toe-in method),
right: xz-plane of the projection volumes of two parallel-facing cameras (off-axis method)

11.12.3 Stereoscopic Projections


For stereoscopic rendering, the same scene must be calculated with two
different projection centres. Figure 11.22 shows the xz-plane of the
projection volumes created by two common methods. The left part of the
figure shows the result for the toe-in method, in which the cameras for the
left eye (L) and the right eye (R) are turned inwards and point to a common
focal point on the projection plane (top of the figure). This simulates the
convergence of the eyes when looking at this point. The triangles in the
figure each represent a section in the xz-plane through the pyramid resulting
from the projections. The characteristic shape of the frustum only arises
through the introduction of the near clipping plane (see below and Sect. 5.9). It can be seen that these projections and thus the frustums are
symmetrical. Therefore their dimensions can easily be calculated, for
example, using gluPerspective in the OpenGL (see Sect. 11.12.4).
With this method, strong stereo effects can be achieved. However, due to
the inward-facing cameras, the projection planes only match at the common
focal point. At the other points, there is a vertical parallax, which increases
towards the outside of the projection plane. This parallax leads to visible
disturbances during stereoscopic viewing and increased discomfort. Since
the projection volumes are symmetrical, they can usually be easily created
by any graphics package. This ease of realisation presumably explains the frequent use of the toe-in method in 3D television and cinema.
The right part of the Fig. 11.22 shows the xz-plane of the projection
volumes resulting from applying the off-axis method. In this method, the
cameras for the left eye (L) and the right eye (R) are aligned parallel in the
same direction. In contrast to the toe-in method, the projection planes for
the left eye match, so there is no interference at the edges and stereoscopic
viewing is much more relaxed. However, the projection volumes are
asymmetric, making their generation more complex and not supported by
every graphics package. In the OpenGL, the generation of such projection
volumes is possible through the command glFrustum (see Sect. 11.12.4).
In principle, scenes can also be rendered using the off-axis method from
two symmetrical volumes with the same projection centres. The
symmetrical volumes are constructed in such a way that they extend beyond
the projection plane for the left eye at the left edge and for the right eye at
the right edge. This extension must then be removed in the calculated
scenes by post-processing [8].
Due to this post-processing effort and the fact that parallel cameras of
the off-axis method are not common, the toe-in method is used for simple
stereoscopic film productions (simple 3D films). In the scientific literature, the toe-in method is considered incorrect and the off-axis method correct for creating stereoscopic projections. Further details on these two
methods can be found in [7] and [13, p. 70–71].

Fig. 11.23 xz-plane of the projection volume for the left camera of the off-axis method

To calculate an asymmetric frustum for the off-axis method, the near clipping plane must be taken into account. Figure 11.23 shows the xz-plane
of such a projection volume for the left camera.
Let the two parallel cameras for the left and right eye have the distance
s to each other. This is also called pupillary distance or interpupillary
distance. The camera coordinate system has its origin at the position of the
left camera. Let the plane in which the scene is displayed with zero parallax (top of the figure) be at a distance $$d > 0$$ from the left camera, and let it have width w and height h. The near clipping plane shall be at a distance $$z_n > 0$$ from the left camera.
For use in the OpenGL, the coordinates of the near clipping plane must
be determined. Since this plane is aligned parallel to the axes of the camera
coordinate system (axis-aligned), the determination of two opposite corner
points is sufficient. For the left camera, the x-coordinates
$$x_{L, L}$$ (left) and $$x_{L, R}$$ (right) and the y-
coordinates $$y_{L, B}$$ (bottom) and $$y_{L, T}$$ (top)
must be determined. These values can be used as parameters in the OpenGL
command glFrustum (see Sect. 11.12.4).
From Fig. 11.23, the following relationship can be read off and solved for l.
$$\begin{aligned} \tan \varphi = \frac{l}{z_n} = \frac{(w/2) - (s/2)}{d}
\Leftrightarrow l = \left( \frac{w}{2} - \frac{s}{2}\right) \frac{z_n}{d}
\end{aligned}$$
Since l is only the distance from the z-axis to the searched coordinate and
the x-axis is oriented to the right in the figure, the left coordinate of the near
clipping plane $$x_{L, L}$$ results as follows:
$$\begin{aligned} x_{L, L} = - l = \left( - \frac{w}{2} + \frac{s}
{2}\right) \frac{z_n}{d}. \end{aligned}$$
By a similar observation, the right coordinate $$x_{L, R}$$ can be
determined as follows:
$$\begin{aligned} x_{L, R} = r = \left( \frac{w}{2} + \frac{s}{2}\right)
\frac{z_n}{d}. \end{aligned}$$
Since the camera positions for the left and right eyes differ only in the
horizontal direction (x-coordinates), the pupillary distance s is not used to
determine the y-coordinates of the near clipping plane. With this
simplification, these coordinates can be calculated in a similar way as the x-
coordinates and are given as follows:
$$\begin{aligned} y_{L, B} = - \left( \frac{h}{2} \right) \frac{z_n}{d}, \qquad y_{L, T} = \left( \frac{h}{2} \right) \frac{z_n}{d}. \end{aligned}$$
By similar considerations, the x-coordinates and the y-coordinates of the
near clipping plane for the right camera can be determined as follows:
$$\begin{aligned} x_{R, L} = - \left( \frac{w}{2} + \frac{s}{2}\right) \frac{z_n}{d}, \qquad x_{R, R} = \left( \frac{w}{2} - \frac{s}{2}\right) \frac{z_n}{d}, \end{aligned}$$
$$\begin{aligned} y_{R, B} = - \left( \frac{h}{2} \right) \frac{z_n}{d}, \qquad y_{R, T} = \left( \frac{h}{2} \right) \frac{z_n}{d}. \end{aligned}$$
In summary, the pupillary distance s and the distance to the plane for zero
parallax d can be used to set the depth effect to be achieved for a
stereoscopic presentation using the off-axis method. The width w and the
height h of the plane in which the zero parallax occurs are also freely
selectable. However, an aspect ratio that differs from the aspect ratio of the
viewport would lead to a distorted presentation of the scene.
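As a brief numerical illustration (using the example values from Sect. 11.12.4, which are otherwise arbitrary), let $$s = 0.02$$, $$d = 1$$, $$w = 1$$, $$h = 0.75$$ and $$z_n = 0.1$$. The near clipping plane coordinates of the left camera then become
$$\begin{aligned} x_{L, L} = \left( -\frac{1}{2} + \frac{0.02}{2}\right) \frac{0.1}{1} = -0.049, \qquad x_{L, R} = \left( \frac{1}{2} + \frac{0.02}{2}\right) \frac{0.1}{1} = 0.051, \end{aligned}$$
$$\begin{aligned} y_{L, B} = -\frac{0.75}{2} \cdot \frac{0.1}{1} = -0.0375, \qquad y_{L, T} = 0.0375. \end{aligned}$$
The slight horizontal asymmetry (0.049 versus 0.051) shifts the frustum of the left camera towards the scene centre, which is exactly what makes the projection planes of both cameras coincide.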

11.12.4 Stereoscopy in the OpenGL


Stereoscopic rendering requires the scene to be rendered from two different
camera positions (see explanations in Sects. 11.12.2 and 11.12.3). Figure
11.24 shows a Java method with parameters cameraPosXOffset and
centerPosXOffset that cause the camera position to be displaced
horizontally and the camera to be pointed to a horizontally displaced point.
Calling the gluLookAt command sets the viewer position and the viewing direction by setting up the corresponding viewing transformation as part of the model-view matrix (see Sect. 2.8).

Fig. 11.24 Submethod of the display method (Java) for rendering the scene from a horizontally
shifted camera position and different viewing angle
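A minimal sketch of such a method, assuming the JOGL classes GL2 and GLU (fixed-function matrix stack), an eye position two units in front of the origin and a hypothetical helper drawScene for the scene content (none of these concrete values are taken from the book's listing), might look as follows:

    // Renders the scene from a horizontally shifted camera position.
    // Equal offsets for camera and centre keep the viewing direction
    // parallel (off-axis); a centre offset of 0 lets the camera converge
    // on the origin (toe-in).
    private void displaySceneWithOffset(GL2 gl, float cameraPosXOffset,
                                        float centerPosXOffset) {
        GLU glu = new GLU();
        gl.glMatrixMode(GL2.GL_MODELVIEW);
        gl.glLoadIdentity();
        glu.gluLookAt(cameraPosXOffset, 0f, 2f,   // eye position (values assumed)
                      centerPosXOffset, 0f, 0f,   // look-at point (values assumed)
                      0f, 1f, 0f);                // up vector
        drawScene(gl);  // hypothetical helper that draws all scene objects
    }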

For the toe-in method (see Sect. 11.12.3), centerPosXOffset = 0f must be set. This places the point to which the camera is pointed
(specified by the centre vector) to the origin of the world coordinate system.
In this case, only the argument for the horizontal shift of the viewer position
(cameraPosXOffset) has to be varied in a call of this submethod. If the
left camera is shifted to the left (negative value) and the right camera to the
right (positive value), the result is an inward alignment of both cameras,
which corresponds to the toe-in method.
For the off-axis method, centerPosXOffset = cameraPosXOffset must be chosen. A negative value moves a camera parallel to the left and a positive value moves it parallel to the right. This shifts the cameras horizontally, aligns the two cameras in parallel and points them at the projection plane with zero parallax, according to the off-axis method (see Sect. 11.12.3).
A horizontal shift of the camera corresponds to a straight head position
with the eyes on a horizontal straight line. If information about the head
orientation is available, for example, from a head tracker, then positions of
the eyes that are not on a horizontal straight line can be taken into account.
In this way, for example, a laterally tilted head can be simulated. To realise
this, the specified method must be modified accordingly. The objects in the
scene are then rendered just as for monoscopic (non-stereoscopic) output. The method from the figure can be used for various stereoscopic rendering and output techniques.
The simplest way of stereoscopic output when using a two-dimensional
screen is to display the image for the left and right eye on the left and right
side of the screen respectively, i.e., displaying the images side-by-side (see
Sect. 11.12.2).
Figure 11.25 shows a Java method which renders the scene into two
adjacent viewports according to the toe-in method. Explanations of the
viewport can be found in Sect. 5.1.2. By calling the method
displaySceneWithOffset (see above), the camera for the left eye is
moved (horizontalCamDistance / 2f) to the left and the camera
for the right eye is moved the same amount to the right. Since the second
argument in this call is zero, both cameras are facing inwards. Before these
two render passes, the contents of the colour buffer and the depth buffer are
cleared by the command glClear. Since there is depth information in the
depth buffer after the calculation of the scene for the left eye, the content of
the depth buffer must be cleared before rendering the scene for the right
eye. The content of the colour buffer must not be deleted because the colour
information of both scenes is written to different positions of the same
colour buffer. Deleting the content of the colour buffer at this position in the
source code would delete the output for the left eye.

Fig. 11.25 Submethod of the display method (Java) for rendering the scene twice into side-by-
side viewports using the toe-in method
Fig. 11.26 Submethod of the display method (Java) for creating a projection matrix that results in
a projection volume in the form of a symmetric frustum for the toe-in method. This method can be
used in identical form for the left and the right camera

The parameters viewportWidth and viewportHeight contain the width and height of the available area of the current viewport. If the variable drawable is of type GLAutoDrawable, these values can be retrieved by the commands drawable.getSurfaceWidth() and
drawable.getSurfaceHeight() (call not shown). The width for the
parts of the viewport for the image for the respective eye is set to half of the
available width. The height remains unchanged. The command
glViewport transmits these sizes to the GPU and thus determines the
current viewport, so that the images for the left eye are displayed on the left
half and the images for the right eye on the right half of the entire viewport.
After the respective switching of the viewport area and before the render
call for the scene content, the projection volume is set by calling the method
setFrustum. Since the projection volumes for the left and right cameras
are identical in the toe-in method, only this one method is required.
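A sketch of how such a submethod can be structured is given below; it assumes that the parameters listed further below for Fig. 11.25 are passed in and that setFrustum (discussed next) and displaySceneWithOffset (see above) are available. It is an illustration of the described structure, not the book's exact listing:

    // Renders the scene twice into side-by-side viewports (toe-in method).
    private void displaySideBySide(GL2 gl, float horizontalCamDistance,
                                   float fovy, int viewportWidth,
                                   int viewportHeight, float zNear, float zFar) {
        // Clear colour and depth buffer once for the whole frame.
        gl.glClear(GL2.GL_COLOR_BUFFER_BIT | GL2.GL_DEPTH_BUFFER_BIT);
        int halfWidth = viewportWidth / 2;

        // Left eye: left half of the window; camera shifted to the left,
        // centre offset 0 -> both cameras converge (toe-in).
        gl.glViewport(0, 0, halfWidth, viewportHeight);
        setFrustum(gl, fovy, halfWidth, viewportHeight, zNear, zFar);
        displaySceneWithOffset(gl, -horizontalCamDistance / 2f, 0f);

        // Right eye: only the depth buffer is cleared; the colour buffer
        // already holds the left image in the other half of the window.
        gl.glClear(GL2.GL_DEPTH_BUFFER_BIT);
        gl.glViewport(halfWidth, 0, halfWidth, viewportHeight);
        setFrustum(gl, fovy, halfWidth, viewportHeight, zNear, zFar);
        displaySceneWithOffset(gl, horizontalCamDistance / 2f, 0f);
    }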
Figure 11.26 shows the method setFrustum for setting the
symmetrical frustum for the left and right cameras of the toe-in method. In
this method, the corresponding projection matrix is essentially created by gluPerspective so that a correct perspective is generated for the scene.
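A corresponding sketch of setFrustum, assuming the compatibility profile and a GLU instance (again an illustration rather than the book's listing), might be:

    // Sets a symmetric perspective frustum, identical for the left and the
    // right camera of the toe-in method.
    private void setFrustum(GL2 gl, float fovy, int viewportWidth,
                            int viewportHeight, float zNear, float zFar) {
        GLU glu = new GLU();
        gl.glMatrixMode(GL2.GL_PROJECTION);
        gl.glLoadIdentity();
        glu.gluPerspective(fovy,
                (float) viewportWidth / (float) viewportHeight,
                zNear, zFar);
        gl.glMatrixMode(GL2.GL_MODELVIEW);
    }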
As can be seen from the parameter list of the method in Fig. 11.25,
values for the following parameters are required for rendering using the toe-
in method.
horizontalCamDistance (s): Horizontal distance between the left
and right camera (interpupillary distance).
fovy: Field of view (opening angle of the projection volume) in y-
direction in degrees.
viewportWidth: Width of the current viewport in pixels.
viewportHeight: Height of the current viewport in pixels.
zNear ( $$z_n$$): Distance of the near clipping plane to the camera.
zFar: Distance of the far clipping plane to the camera.
In brackets are the corresponding mathematical symbols as used in Sect.
11.12.3.
Figure 11.27 shows a frame of an example scene in which the image for
the left eye was output on the left side and the image for the right eye on the
right side of the window using the toe-in method. The values
horizontalCamDistance = 0.1f, fovy = 45,
viewportWidth = 640, viewportHeight = 480, zNear =
0.1f and zFar = 200f were used. Using the parallel viewing technique
(see Sect. 11.12.2), a spatial perception is possible. By swapping the
images, an output for cross-eyed viewing can be created.

Fig. 11.27 Stereoscopic output generated by an OpenGL renderer in side-by-side viewports using
the toe-in method: The image for the left eye is on the left side and the image for the right eye is on
the right side. Using the parallel viewing technique (see Sect. 11.12.2) creates a spatial visual
impression
Fig. 11.28 Stereoscopic output generated by an OpenGL renderer in side-by-side viewports using
the off-axis method: The image for the left eye is on the left side and the image for the right eye is on
the right side. Using the parallel viewing technique (see Sect. 11.12.2) creates a spatial visual
impression

Figure 11.28 shows a frame of the same example scene as in Fig. 11.27,
which in this case was created using the off-axis method. The image for the
left eye is displayed on the left side of the window and the image for the
right eye on the right side. The source code for rendering a scene using the
off-axis method can be derived by adapting the mechanism for switching
the partial viewports (see Fig. 11.25) for rendering according to the off-axis
method for anaglyph images, which is shown in Fig. 11.29 (see below). The
values horizontalCamDistance = 0.02f, zeroParallaxDistance = 1f, surfaceWidth = 1f, surfaceHeight = 0.75f, zNear = 0.1f and zFar = 200f were used.
These examples show how a stereoscopic presentation can work in the
case of simple head-mounted displays that have a continuous screen or in
which a smartphone is mounted in landscape format in front of both eyes by
a simple construction. A corresponding (smartphone) application must
essentially ensure that the images for the left and right eyes are correctly
displayed side-by-side on the screen in landscape format and at the correct
distance from each other. For relaxed viewing, such simple head-mounted
displays usually contain (simple) optics. Another application of side-by-side
rendering is the transmission of frames of stereoscopic scenes to an output
device when only one image can be transmitted over the transmission
channel. The receiving device must ensure that the images are separated
again and displayed correctly to the viewer.
Another possibility of stereoscopic output is the colour anaglyph
technique (see Sect. 11.12), which works with a two-dimensional colour
display and special glasses (colour anaglyph glasses). Figure 11.29 shows a
Java method for generating a two-dimensional colour anaglyph image. The
method is similar to the method for rendering side-by-side images (see Fig.
11.25). Instead of rendering the images for the left and right eye into
different viewports, the colour anaglyph technique renders the images into
different colour channels. Since the OpenGL uses RGBA colour values, it is
very easy to create a colour anaglyph image for the colours red and cyan.
The command glColorMask determines into which RGBA colour
channels of the colour buffer the respective image is written. In the method
in the figure, this colour mask is used in such a way that the image for the
left eye is coloured cyan6 and the image for the right eye is coloured red.
After rendering the two partial images, the colour mask is reset. In this
method, too, the content of the depth buffer must be deleted between the
rendering passes for the left and right eye. The deletion of the colour buffer
content is not necessary at this point, since the output is rendered into
different colour channels and thus no existing values of the left channel can
be overwritten by values of the right channel.

Fig. 11.29 Submethod of the display method (Java) for generating a colour anaglyph image by
rendering the scene twice using the off-axis method: The image from the parallel horizontally shifted
viewer position to the left is output in cyan (green and blue colour channels). The image from the
parallel horizontally shifted viewer position to the right is output in red

The other difference between the method shown in Fig. 11.29 and the
method in Fig. 11.25 is the use of the off-axis method instead of the toe-in
method. For this, the method displaySceneWithOffset is called
with the argument (horizontalCamDistance / 2f) so that the
position and viewing direction of the left camera is shifted to the left by half
of the distance between the cameras. The position and viewing direction of
the right camera is shifted to the right by half the distance between the
cameras with the second call of this method. This causes the desired parallel
alignment of the cameras to the plane for zero parallax and the shift of the
camera positions in horizontal directions. In addition, different and
asymmetrical projection volumes must be determined and applied for the
left and right cameras. This is done by calling the methods
setLeftFrustum and setRightFrustum.
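A sketch of this structure is shown below; the sign convention for the left/right shift and the method and parameter names are assumptions based on the description above, not the book's exact listing:

    // Renders a red-cyan colour anaglyph image using the off-axis method.
    private void displayAnaglyph(GL2 gl, float horizontalCamDistance) {
        gl.glClear(GL2.GL_COLOR_BUFFER_BIT | GL2.GL_DEPTH_BUFFER_BIT);

        // Left eye: write only into the green and blue channels (cyan).
        gl.glColorMask(false, true, true, true);
        setLeftFrustum(gl);
        displaySceneWithOffset(gl, -horizontalCamDistance / 2f,
                                   -horizontalCamDistance / 2f);

        // Right eye: write only into the red channel; clear only the depth
        // buffer so that the cyan channels of the left image are preserved.
        gl.glClear(GL2.GL_DEPTH_BUFFER_BIT);
        gl.glColorMask(true, false, false, true);
        setRightFrustum(gl);
        displaySceneWithOffset(gl, horizontalCamDistance / 2f,
                                   horizontalCamDistance / 2f);

        // Reset the colour mask for subsequent rendering.
        gl.glColorMask(true, true, true, true);
    }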

Fig. 11.30 Submethod of the display method (Java) for creating a projection matrix leading to a
projection volume for the left camera in the form of an asymmetric frustum for the off-axis method

Figure 11.30 shows the method setLeftFrustum to set the asymmetric frustum for the left camera. For this purpose, the command
glFrustumf is used, which requires the coordinates of two vertices of the
near clipping plane left ( $$x_{L, L}$$), right (
$$x_{L, R}$$), bottom ( $$y_{L, B}$$) and top (
$$y_{L, T}$$). In addition, the distances from the camera position to
the near clipping plane zNear ( $$z_n$$) and to the far clipping
plane zFar are needed as arguments. In brackets behind these variables are
the mathematical symbols used in Sect. 11.12.3. The section also contains
the explanations of the formulae for calculating the coordinates of the near
clipping plane, which are implemented in the method setLeftFrustum.
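Assuming that the parameters summarised below (horizontalCamDistance, zeroParallaxDistance, surfaceWidth, surfaceHeight, zNear, zFar) are available as fields, setLeftFrustum could be sketched as follows:

    // Asymmetric frustum for the left camera (off-axis method). The fields
    // correspond to the symbols of Sect. 11.12.3: horizontalCamDistance = s,
    // zeroParallaxDistance = d, surfaceWidth = w, surfaceHeight = h, zNear = z_n.
    private void setLeftFrustum(GL2 gl) {
        float xLL = (-surfaceWidth / 2f + horizontalCamDistance / 2f)
                    * zNear / zeroParallaxDistance;
        float xLR = ( surfaceWidth / 2f + horizontalCamDistance / 2f)
                    * zNear / zeroParallaxDistance;
        float yLB = -(surfaceHeight / 2f) * zNear / zeroParallaxDistance;
        float yLT =  (surfaceHeight / 2f) * zNear / zeroParallaxDistance;

        gl.glMatrixMode(GL2.GL_PROJECTION);
        gl.glLoadIdentity();
        gl.glFrustumf(xLL, xLR, yLB, yLT, zNear, zFar);
        gl.glMatrixMode(GL2.GL_MODELVIEW);
    }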

Fig. 11.31 Submethod of the display method (Java) for creating a projection matrix leading to a
projection volume for the right camera in the form of an asymmetric frustum for the off-axis method

Figure 11.31 shows the method setRightFrustum, which sets the asymmetric frustum for the right camera. In this case, the command
glFrustumf requires the coordinates of the near clipping plane of the
right camera left ( $$x_{R, L}$$), right ( $$x_{R, R}$$),
bottom ( $$y_{R, B}$$) and top ( $$y_{R, T}$$).
Furthermore, the distances from the camera position to the near clipping
plane zNear ( $$z_n$$) and to the far clipping plane zFar are
needed as arguments. The correspondences to the mathematical symbols
from Sect. 11.12.3 are given in brackets behind the names of the variables.
Section 11.12.3 also contains the explanations of the formulae used in this
method for calculating the coordinates of the near clipping plane for the
right camera.
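Analogously, a sketch of setRightFrustum only changes the sign of the horizontal offset term (same assumptions as for the setLeftFrustum sketch above):

    // Asymmetric frustum for the right camera (off-axis method).
    private void setRightFrustum(GL2 gl) {
        float xRL = (-surfaceWidth / 2f - horizontalCamDistance / 2f)
                    * zNear / zeroParallaxDistance;
        float xRR = ( surfaceWidth / 2f - horizontalCamDistance / 2f)
                    * zNear / zeroParallaxDistance;
        float yRB = -(surfaceHeight / 2f) * zNear / zeroParallaxDistance;
        float yRT =  (surfaceHeight / 2f) * zNear / zeroParallaxDistance;

        gl.glMatrixMode(GL2.GL_PROJECTION);
        gl.glLoadIdentity();
        gl.glFrustumf(xRL, xRR, yRB, yRT, zNear, zFar);
        gl.glMatrixMode(GL2.GL_MODELVIEW);
    }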
The parameters of the method from Fig. 11.29 can be summarised as
follows:
horizontalCamDistance (s): Horizontal distance between the left
and right camera (interpupillary distance).
zeroParallaxDistance (d): Distance of the plane in which the
zero parallax (no parallax) occurs.
surfaceWidth (w): Width of the plane in which the zero parallax
occurs.
surfaceHeight (h): Height of the plane in which the zero parallax
occurs.
zNear ( $$z_n$$): Distance of the near clipping plane from the
camera.
zFar: Distance of the far clipping plane from the camera.
In brackets are the corresponding mathematical symbols as given in
Sect. 11.12.3.

Fig. 11.32 Stereoscopic output generated by an OpenGL renderer as a colour anaglyph image using
the off-axis method. Viewing with colour anaglyph glasses with a red filter on the left side and a cyan
filter on the right side creates a spatial visual impression

Figure 11.32 shows an example scene rendered as a colour anaglyph image using the off-axis method. For this, horizontalCamDistance = 0.02f, zeroParallaxDistance = 1f, surfaceWidth = 1f, surfaceHeight = 0.75f, zNear = 0.1f and zFar = 200f were used. The red and cyan colouring of the
two partial images is clearly visible. By using colour anaglyph glasses with
a red filter on the left and a cyan filter on the right, a spatial visual
impression is created when viewing this image.

Fig. 11.33 Stereoscopic output generated by an OpenGL renderer as a colour anaglyph image using
the toe-in method. Viewing with colour anaglyph glasses with a red filter on the left side and a cyan
filter on the right side creates a spatial visual impression

Figure 11.33 shows an example scene rendered as a colour anaglyph image using the toe-in method. The values horizontalCamDistance
= 0.1f, fovy = 45, viewportWidth = 640, viewportHeight
= 480, zNear = 0.1f and zFar = 200f were used. The source
code for the calculation of a colour anaglyph image according to the toe-in
method can be obtained by applying the calculation for the toe-in method
for the output to side-by-side viewports (see Fig. 11.25) to the generation of
colour anaglyph images (see Fig. 11.29).
In principle, rendering of coloured scenes is possible with this
technique. However, if colour information differs greatly due to the
perspective shift between the left and the right partial image, then colour
distortions may become visible. Therefore, this method is not very suitable
for professional applications. On the other hand, it is an easy-to-implement
and robust technique for stereoscopic presentations.
A stereoscopic output without colour distortions is possible, for
example, using the polarisation technique, the shutter technique or via a
head-mounted display (see Sect. 11.12). The transmission of stereoscopic
images to output devices that work with one of these techniques can in
principle take place via only one image per frame, which contains side-by-
side sub-images for the left and right channels. For this purpose, the method
can be used for output into two side-by-side viewports (see Fig. 11.25). If
the parallel transmission of two images per frame is possible, then the
partial images for the left and right channels can be transmitted separately.
Some very high-quality graphics cards support this function through a
stereoscopic output mode using several OpenGL colour buffers. In the
OpenGL, up to four colour buffers can be assigned to the default frame
buffer (quad buffer), which are referred to by GL_FRONT_LEFT,
GL_BACK_LEFT, GL_FRONT_RIGHT and GL_BACK_RIGHT.7 The left
colour buffers are used for standard output (monoscopic output). For
stereoscopic output, the right colour buffers are added.
The front and back colour buffers are intended for the so-called double
buffering mechanism, in which the content of the front colour buffer
(GL_FRONT) is displayed on the output device. The new frame is built-up
in the back colour buffer (GL_BACK) while the current frame in the front
colour buffer is displayed. Only when the image build-up in the back buffer is complete is the content of the back colour buffer transferred to the front colour buffer for display. The OpenGL specification does not require a true buffer swap; only the contents of the back colour buffer must be transferred to the front colour buffer. This mechanism prevents the image build-up process from becoming visible and causing visible artefacts. This buffer switching is automated in the JOGL system and
does not have to be explicitly triggered. For stereoscopic rendering, the two
back colour buffers should be written to and their contents transferred to the
respective front colour buffers after buffer switching. In this case, all four
colour buffers are used.
If four colour buffers are available for the default frame buffer in the
GPU used, the stereoscopic output mode with all these colour buffers can
be activated in a JOGL system with connection to the windowing system
via an object of the type GLCapabilities. This object is passed to the
constructor when the GLWindow object (the output window) is created (see
Sect. 2.7). The following JOGL source code part enables these colour
buffers, if available.
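A minimal sketch of such a code part, assuming the GL2 profile and JOGL's NEWT window class GLWindow (the chosen profile is an assumption, not taken from the book's listing), is:

    // Request a quad-buffered (stereo) default framebuffer, if available.
    GLProfile profile = GLProfile.get(GLProfile.GL2);        // profile assumed
    GLCapabilities capabilities = new GLCapabilities(profile);
    capabilities.setStereo(true);   // requests the *_LEFT and *_RIGHT colour buffers
    GLWindow window = GLWindow.create(capabilities);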

Fig. 11.34 Submethod of the display method (Java) for stereoscopic rendering to different colour
buffers of the default framebuffer using the toe-in method

Fig. 11.35 Stereoscopic output generated by an OpenGL renderer using the toe-in method: A
photograph of the output of a projector using the shutter technique is shown

Figure 11.34 shows a method of a JOGL renderer that writes the images for the left and the right eye into the left and the right back colour buffer respectively, according to the toe-in method. The switch to the respective
colour buffers takes place using the command glDrawBuffer. Before
each render pass of the scene for each of the eyes, the contents of the colour
buffer and the depth buffer are deleted.
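A sketch of such a method, with the same parameter assumptions as in the side-by-side sketch above, could look as follows:

    // Renders the scene into the left and right back colour buffers
    // (quad-buffer stereo) using the toe-in method.
    private void displayQuadBufferStereo(GL2 gl, float horizontalCamDistance,
                                         float fovy, int viewportWidth,
                                         int viewportHeight,
                                         float zNear, float zFar) {
        // Left eye into the left back colour buffer.
        gl.glDrawBuffer(GL2.GL_BACK_LEFT);
        gl.glClear(GL2.GL_COLOR_BUFFER_BIT | GL2.GL_DEPTH_BUFFER_BIT);
        setFrustum(gl, fovy, viewportWidth, viewportHeight, zNear, zFar);
        displaySceneWithOffset(gl, -horizontalCamDistance / 2f, 0f);

        // Right eye into the right back colour buffer.
        gl.glDrawBuffer(GL2.GL_BACK_RIGHT);
        gl.glClear(GL2.GL_COLOR_BUFFER_BIT | GL2.GL_DEPTH_BUFFER_BIT);
        setFrustum(gl, fovy, viewportWidth, viewportHeight, zNear, zFar);
        displaySceneWithOffset(gl, horizontalCamDistance / 2f, 0f);
    }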
Figure 11.35 shows the output of a stereoscopic projector using the
shutter technique on a standard projection screen. The stereoscopic images
were generated using the toe-in method. The values
horizontalCamDistance = 0.1f, fovy = 45,
viewportWidth = 640, viewportHeight = 480, zNear =
0.1f and zFar = 200f were used. Using the shutter technique, the
images for the left and right eye are displayed alternately at different times.
The photograph in the figure was taken with an exposure time longer than
the presentation time for the left and right eye images. Therefore, these
images were superimposed in the photograph. Since these partial images are
rendered for different viewer positions, they are not identical and the
superimposed image appears blurry. By viewing the stereo projection on a
screen with suitable special glasses (active shutter glasses), the images for
the left and right eye can be recovered and presented to the correct eye (see
Sect. 11.12), so that a spatial visual impression is created.
The source code for rendering this scene according to the off-axis
method can be derived by transferring the mechanism for switching the
buffers (see Fig. 11.34) to the method for rendering according to the off-
axis method for anaglyph images (see Fig. 11.29).

11.13 Exercises
Exercise 11.1 Generate a scene with fog and let an object move into the
fog and disappear in the fog.

Exercise 11.2 Model a light switch in a scene that, when pressed, i.e., clicked with the mouse, turns an additional light source in the scene on and off.

Exercise 11.3 Give a bounding volume in the form of a cuboid and in the
form of a sphere for the object in Fig. 11.16. Let the cylinder have a radius
of one unit and a height of three units. The cone on top of it is two units
high.

Exercise 11.4 Write a program in which two cuboids become transparent when colliding. Vary the shape of the bounding volume by alternatively
using a sphere, an axis-aligned bounding box (AABB) and an oriented
bounding box (OBB). Vary the orientation of the cuboids and contrast the
advantages and disadvantages of the different bounding volumes for this
application.

Exercise 11.5
(a) The command gluPerspective takes the parameters fovy,
aspect, zNear and zFar. Explain the meaning of these parameters
and research www.khronos.org/opengl to find the projection matrix
resulting from these parameters for creating a symmetrical frustum.
(b)
The command glFrustum takes the parameters left, right,
bottom, top, nearVal and farVal. Explain the meaning of these
parameters and research www.khronos.org/opengl to find the
projection matrix resulting from these parameters for the generation of
an asymmetric frustum.
(c)
Compare the projection matrices from (a) and (b) with each other.
Then show how the arguments for the call to glFrustum can be
derived from the parameters for gluPerspective. What is the
significance of these formulae?

Exercise 11.6 Section 11.12.4 shows parts of the source code for three
stereoscopic output methods using the toe-in method or the off-axis method.
(a)
Write an OpenGL program that renders a simple scene
stereoscopically using side-by-side viewports using the toe-in method
(see Fig. 11.25).
(b)
Modify the program from (a) so that the output is rendered using the
off-axis method.
(c)
Figure 11.29 shows part of the source code for the stereoscopic
rendering of a colour anaglyph image using the off-axis method. Write
an OpenGL program that replicates this rendering. Then modify the
program so that the stereoscopic rendering uses the toe-in method.
(d)
Figure 11.34 shows part of the source code for stereoscopic rendering
to different colour buffers of the default framebuffer according to the
toe-in method. Derive a method in pseudo code which renders such
stereoscopic output according to the off-axis method.

Exercise 11.7 Write an OpenGL program which renders a cube rotating around a sphere. The cube should also rotate itself around its centre of
gravity. Render this scene stereoscopically in side-by-side viewports for the
left and right eyes. Test the spatial visual impression using the parallel
viewing technique. Extend your stereoscopic rendering program for using
the colour anaglyph technique and test the spatial visual impression using
appropriate colour anaglyph glasses.

References
1. T. Akenine-Möller, E. Haines, N. Hoffman, A. Pesce, M. Iwanicki and S. Hillaire. Real-Time
Rendering. 4th edition. Boca Raton, FL: CRC Press, 2018.
[Crossref]

2. E. Benjamin, R. Lee and A. Heller. “Is My Decoder Ambisonic?” In: 125th AES Convention.
San Francisco, CA, 2008.

3. E. Benjamin, R. Lee and A. Heller. “Localization in Horizontal-Only Ambisonic Systems”. In:


121st AES Convention. San Francisco, 2006.

4. A. J. Berkhout. “A holographic approach to acoustic control”. In: Journal of the Audio


Engineering Society 36.12 (Dec. 1988), pp. 977–995.

5. A. Beuthner. “Displays erobern die dritte Dimension”. In: Computer Zeitung 30 (2004), pp. 14–
14.

6. J. Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization. Revised.


Cambridge: MIT Press, 1997.

7. P. Bourke. “Calculating Stereo Pairs”. retrieved 26.8.2020, 11:30h. url: http://paulbourke.net/


stereographics/stereorender

8. P. Bourke. “Creating correct stereo pairs from any raytracer”. retrieved 26.8.2020, 11:30h. 2001.
URL: http://paulbourke.net/stereographics/stereorender.

9. F. Brinkmann, M. Dinakaran, R. Pelzer, J. J. Wohlgemuth, F. Seipel, D. Voss, P. Grosche and S.


Weinzierl. The HUTUBS head-related transfer function (HRTF) database. Tech. rep. Retrieved 18.10.2020, 21:50h. TU-Berlin, 2019. URL: http://dx.doi.org/10.14279/depositonce-8487.

10. J. Daniel. “Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding
Filters and a Viable, New Ambisonic Format”. In: 23rd International Conference: Signal
Processing in Audio Recording and Reproduction. Copenhagen, 2003.

11. DIN 33402-2:2005-12: Ergonomie - Körpermaße des Menschen - Teil 2: Werte. Berlin: Beuth,
2005.

12. Y. Dobashi, K. Kaneda, H. Yamashita, T. Okita and T. Nishita. “A Simple Efficient Method for
Realistic Animation of Clouds”. In: Proceedings of the 27th Annual Conference on Computer
Graphics and Interactive Techniques. USA: ACM Press/Addison-Wesley Publishing Co., 2000,
pp. 19–28.
13.
R. Dörner, W. Broll, P. Grimm and B. Jung.Virtual und Augmented Reality (VR/AR): Grundlagen
und Methoden der Virtuellen und Augmentierten Realität. 2. Auflage. Berlin: Springer, 2019.
[Crossref]

14. D. S. Ebert, F. K. Musgrave, D. Peachey, K. Perlin and S. Worley. Texturing and Modeling: A
Procedural Approach. 3rd edition. San Francisco: Elsevier, 2003.

15. L. O. Figura. “Lebensmittelphysik”. In: Taschenbuch für Lebensmittel-chemiker. Ed. by Frede


W. Berlin, Heidelberg: Springer, 2004.

16. C. Geiger. “Helft mir, Obi-Wan Kenobi”. In: iX - Magazin für professionelle Informationstechnik
(May 2004), pp. 97–102.

17. E. B. Goldstein. Sensation and Perception. 8th edition. Belmont, CA: Wadsworth, 2010.

18. E. Haines and T. Akenine-Möller, eds. Ray Tracing Gems. Berkeley, CA: Apress, 2019.

19. J. Jerald. The VR-Book: Human-Centered Design for Virtual Reality. Morgan & Claypool
Publishers-ACM, 2016.

20. P. Kutz, R. Habel, Y. K. Li and J. Novák. “Spectral and Decomposition Tracking for Rendering
Heterogeneous Volumes”. In: ACM Trans. Graph. 36.4 (July 2017), Article 111.

21. N. Max. “Optical models for direct volume rendering”. In: IEEE Transactions on Visualization
and Computer Graphics 1.2 (June 1995), pp. 99–108.

22. M. Mori, K. F. MacDorman and N. Kageki. “The Uncanny Valley [From the Field]”. In: IEEE
Robotics & Automation Magazine 19.2 (June 2012), pp. 98–100.

23. A. Neidhardt and N. Knoop. “Binaural walk-through scenarios with actual self-walking using an
HTC Vive”. In: 43. Jahrestagung für Akustik (DAGA). Kiel, 2017, pp. 283–286.

24. G. Palmer. Physics for Game Programmers. Berkeley: Apress, 2005.

25. F. L. Pedrotti, L. S. Pedrotti, W. Bausch and H. Schmidt. Optik für Ingenieure. 3. Auflage.
Berlin, Heidelberg: Springer, 2005.

26. M. Pharr, W. Jakob and G. Humphreys. Physically Based Rendering: From Theory To
Implementation. 3rd edition. Cambridge, MA: Morgan Kaufmann, 2017.

27. A. Plinge, S. Schlecht, O. Thiergart, T. Robotham, O. Rummukainen and E. Habets. “Six-


degrees-of-freedom binaural audio reproduction of first-order ambisonics with distance
information”. In: 2018 AES International Conference on Audio for Virtual and Augmented
Reality. 2018.

28. W. T. Reeves. “Particle Systems - A Technique for Modelling a Class of Fuzzy Objects”. In:
ACM Trans. Graph. 2.2 (1983), pp. 91–108.

29. W. T. Reeves and R. Blau. “Approximate and Probabilistic Algorithms for Shading and
Rendering Particle Systems”. In: SIGGRAPH Comput Graph. 19.3 (1985), pp. 313–322.

30. C. W. Reynolds. “Flocks, Herds and Schools: A Distributed Behavioral Model”. In: SIGGRAPH
Comput. Graph. 21.4 (1987), pp. 25–34.
31.
R. J. Rost and B. Licea-Kane. OpenGL Shading Language. 3rd edition. Upper Saddle River, NJ
[u. a.]: Addison-Wesley, 2010.

32. M. Segal and K. Akeley. The OpenGL Graphics System: A Specification (Version 4.6
(Compatibility Profile) - October 22, 2019. Retrieved 8.2.2021. The Khronos Group Inc, 2019.
URL: https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.compatibility.pdf.

33. G. Sellers, S. Wright and N. Haemel. OpenGL SuperBible. 7th edition. New York: Addison-
Wesley, 2016.

34. P. Stade. Perzeptiv motivierte parametrische Synthese binauraler Raumimpulsantworten.


Dissertation. Retrieved 17.10.2020. TU-Berlin, 2018. URL: https://depositonce.tu-berlin.de/
handle/11303/8119.

35. A. Sullivan. “3-Deep: new displays render images you can almost reach out and touch”. In:
IEEE Spectrum 42.4 (2005), pp. 30–35.

36. M. Vorländer. Auralization: Fundamentals of Acoustics and Modelling, Simulation, Algorithms


and Acoustic Virtual Reality. Berlin: Springer, 2008.

37. M. Wacker, M. Keckeisen, S. Kimmerle, W. Straßer, V. Luckas, C.Groß, A. Fuhrmann, M.


Sattler, R. Sarlette and R. Klein. “Virtual Try-On: Virtuelle Textilien in der Graphischen
Datenverarbeitung”. In: Informatik Spektrum 27 (2004), pp. 504–511.

38. F. Zotter and M. Frank. Ambisonics: A Practical 3D Audio Theory for Recording, Stereo
Production, Sound Reinforcement and Virtual Reality. Cham, CH: Springer Open, 2018.

Footnotes
1 In this example, the JOGL interface MouseListener from the JOGL package
com.jogamp.newt.event is used to identify mouse clicks in the GLWindow. A similar
interface with the same name is available for other window systems, such as AWT or Swing.

2 For this example, the JOGL class MouseEvent from the JOGL package
com.jogamp.newt.event is used.

3 See https://www.khronos.org/registry/OpenGL-Refpages/gl2.1/xhtml/gluUnProject.xml, retrieved


16.3.2019, 13:20h.

4 The speed of light in vacuum is approximately $$3 \times 10^8$$ m/s, whereas the speed of sound in dry air at room temperature is only about 343 m/s. The speed of sound depends mainly on temperature. According to [15, p. 356], it changes by 3.7–3.9% for a temperature change of about 20 °C.
5 Using the RGB colour model, the colour vector for the colour red is $$(1, 0, 0)^T$$ and the
colour vector for the colour cyan is $$(0, 1, 1)^T$$.

6 Cyan is the colour that results from additive mixing of the colours green and blue.

7 Alternative names (aliases) exist for the front buffers and the left buffers, see https://www.
khronos.org/opengl/wiki/Default_Framebuffer, retrieved 11.4.2022, 21:10h.
Web References
Supplementary material online: A web reference to supplementary
material can be found in the footer of the first page of a chapter in this book
(Supplementary Information). The supplementary material is also available
via SpringerLink: https://link.springer.com
OpenGL: Further information on OpenGL is available on the official
OpenGL web pages:

Khronos Group: www.khronos.org


OpenGL homepage: www.opengl.org
OpenGL specifications: www.khronos.org/registry/OpenGL/index_gl.
php
Here you will also find a web reference to known OpenGL extensions.

Java bindings: The two commonly used Java bindings to the OpenGL
can be found at the following web references:

Java OpenGL (JOGL): www.jogamp.org


Lightweight Java Game Library (LWJGL): www.lwjgl.org

3D modelling: In addition to numerous professional CAD modelling


tools in the design field, the following two programs for modelling and
animating three-dimensional objects are very widespread in the areas of
computer games development, animation and film creation:

Autodesk Maya (Commercial License): www.autodesk.com


Blender (Free and Open Source License): www.blender.org
Index
A
Abstract Windowing Toolkit (AWT)
A-buffer algorithm
Accommodation-vergence conflict
Accumulation buffer
Active shutter technique
Adobe RGB colour space
Algorithm of Brons
Aliasing effect
Moiré pattern
Alpha blending
Alpha mapping
Ambient light
Ambisonics
Anaglyph technique
Anchor
Animation
Antialiasing
A-buffer algorithm
accumulation buffer
coverage sample
multisample
supersampling
Approximation
Area sampling
Area subdivision method
Associated data
Atmospheric attenuation
Attribute
Augmented Reality (AR)
Auralisation
Averaging filter
Axis-aligned bounding box
B
Backface culling
Barycentric coordinates
Base primitives
Basic geometric object
area
closed polyline
curve
line
point
polygon
polyline
Bernstein polynomial
Bézier curve
Bézier point
inner
Bézier surface
Binaural hearing
Binocular cues
Binocular depth cues
Binomial
Bitmask
Black-and-white image
Bounding sphere
Bounding volume
Boxcar filter
Bresenham algorithm
for circles
for lines
Brons, algorithm of
B-spline
Buffer object
frame-
index
vertex
Bump mapping
C
Calibrated colour space
Camera coordinate system
Central Processing Unit (CPU)
Central projection
Centroid interpolation
Centroid sampling
C for Graphics (Cg)
CIE chromaticity diagram
Clipmap
Clipping
three-dimensional
Clipping plane
far
near
Clipping volume
Closed polyline
CMY colour model
CMYK colour model
Cohen–Sutherland clipping algorithm
Collision handling
Collison detection
Colorimetric colour space
Colour anaglyph technique
Colour gamut
Colour index mode
Colour model
additive
CMY
CMYK
HLS
HSV
perception-oriented
RGB
sRGB
standard RGB
subtractive
YCbCr
YIQ
YUV
Colour Naming System (CNS)
Colour picking
Colour space
Adobe RGB
calibrated
CIE L*a*b*
CIELab
CIEXYZ
colorimetric
sRGB
standard RGB
Colour sum
Compatibility profile
Compute shader
Computer graphics pipeline
Computer-Aided Design (CAD)
Computer-Aided Manufacturing (CAM)
Constant shading
Constant specular reflection coefficient
Control points
Controllability (of the curve)
Convex
Convex combination
Convolution matrix
Core profile
Coverage mask
Coverage Sample Antialiasing (CSAA)
Coverage value
Cross-eyed viewing
Cross product
CSG scheme
Culling
backface
frontface
hidden face
Cyrus–Beck clipping algorithm
D
Data visualisation
Default framebuffer
Deferred shading
Depth algorithm
Depth buffer
Depth buffer algorithm
Depth fog
Device coordinates
Difference of sets
Diffuse reflection
Discrete oriented polytope
Displacement mapping
Display list
Distance fog
Dither matrix
Dolby Atmos
Double buffering
Downsampling
E
Early fragment test
Environment mapping
Eulerian angles
Even–odd rule
Exponential fog
Extinction
Extrinsic view
F
Far clipping plane
Far plane
Field of view
Filter kernel
Fixed-function pipeline
Flat shading
Fog
exponential
linear
Folding frequency
Form parameter
Fragment
Fragment shader
Framebuffer
Framebuffer object
Freeform surface
Frontface culling
Frustum
G
Gamut
Geometric primitive
Geometric transformation (2D)
Geometrical acoustics
Geometrical optics
Geometry shader
Gloss map
Gouraud shading
Graphics output primitive
Graphics pipeline
Graphics Processing Unit (GPU)
Graphics processor
Greyscale image
H
Halftoning
Head-mounted display
Head related transfer functions
Heightmap
Hidden face culling
High-Level Shading Language (HLSL)
Holographic techniques
Homogeneous coordinate
Hue
I
Image-space method
Immediate mode
Immersion
Index Buffer Object (IBO)
Information visualisation
Inside test
Intensity (colour)
Interaural Level Difference (ILD)
Interaural Time Difference (ITD)
Interpolation
Interpupillary distance
Intersection of sets
Intrinsic view
Irradiance mapping
J
Java OpenAL (JOAL)
Java OpenGL (JOGL)
K
k-DOP
Kernel
Knot
L
Lenticular display
Level Of Detail (LOD)
Lighting
attenuation
directional
parallel incident
phong
Lightmaps
Lightness
Lightweight Java Game Library (LWJGL)
Linear fog
Line style
Line thickness
Local lighting
Locality principle (of curves)
Low-pass filter
M
Magnification
Matrix stack
Midpoint algorithm
for circles
for lines
Minification
Mipmap
Mipmap level
Mirror reflection
Model coordinate system
Model view matrix
Moiré pattern
Monocular cues
Monocular depth cues
Motion platform
Motion sickness
Moving pen technique
Multisample Antialiasing (MSAA)
N
Near clipping plane
Near plane
Non-Uniform Rational B-Splines (NURBS)
Normal mapping
Normal vector
Normalised device coordinates
Nyquist frequency
O
Object-space method
Octree
Odd parity rule
Off-axis method
Opacity
Opacity coefficient
Open Audio Library (OpenAL)
Open Computing Language (OpenCL)
Open Graphics Library (OpenGL)
OpenGL Extension
OpenGL for Embedded Systems (OpenGL ES)
OpenGL graphics pipeline
fixed-function
programmable
OpenGL Shading Language (GLSL)
OpenGL Utility Library (GLU)
Oriented bounding box
Overshading
P
Parallax
divergent
negative
positive
Parallax barrier
Parallel projection
Parallel viewing
Particle systems
Patch primitive
Path tracing
Perspective division
Perspective projection
Phong shading
Physics engine
Picking
Pixel
Pixel replication
Point
Point sampling
Polarisation technique
Polygon
Polygon mesh
Polyline
Post-filtering
Pre-filtering
Presence
Primitive
Primitive restart
Priority algorithms
Profile
compatibility
core
Programmable pipeline
Projection
Projection centre
Projection plane
Provoking vertex
Pupillary distance
Q
Quad
Quadtree
Quaternion
R
Radiosity model
Raster graphics
Rasterisation
Raster-oriented graphics
Ray acoustics
Ray casting
Ray optics
Raytracing
Recursive division algorithms
Reflection mapping
Rendering
Rendering pipeline
Retinal projector
Retinal scan display
RGBA colour value
RGB colour model
Right-handed coordinate system
Rotation
S
Sample
Sampling theorem
Saturation (colour)
Scalar product
Scaling
Scan conversion
Scan line technique
Scene graph
Shader
Shading
constant
flat
Gouraud
normal
Phong
smooth
Shadow
Shadow map
Shear
Shutter technique
Side-by-side
Skeleton
Slab
Smooth shading
Smoothing operator
Smoothness (of curves)
Spatial frequency
Spatial partitions
Specular reflection
Specular reflection exponent
SPIR-V
Spotlighting
Standard Portable Intermediate Representation (SPIR)
Standard RGB colour model (sRGB colour model)
Standard RGB colour space (sRGB colour space)
Stepwise refinement
Stereo imaging
Stereoscope
Stereoscopics
Stereoscopy
Structural algorithm
Subdivision surfaces
Supersampling
Supersampling Antialiasing (SSAA)
Swarm behaviour
Sweep representation
Swing
Swizzling
Symmetric difference of sets
T
TBN matrix
Tessellation
Tessellation control shader
Tessellation evaluation shader
Tessellation unit
Texture
Texture data (texel data)
Texture mapping
displacement
environment
Textures, applications
shadow map
Toe-in method
Top-left rule
Transform feedback
Transformation group
Transformation matrix
Translation
Translucency
Transparency
filtered
interpolated
Triangulation
True colour
Two-pass depth buffer algorithm
U
Uncanny valley
Uniforms
Union of sets
Unweighted area sampling
Upsampling
Up vector
V
Vanishing point
Vector
Vector graphics
Vector-oriented graphics
Vertex
Vertex array
Vertex Array Object (VAO)
Vertex attribute
Vertex Buffer Object (VBO)
Vertex shader
Vertex stream
Vertices
Viewing pipeline
Viewport
Viewport transformation
View Reference Coordinate system (VRC)
View transformation
Virtual Reality (VR)
Virtual retinal display
Visibility
Visual analytics
Visual computing
Volume rendering
Voxels
VR sickness
Vulkan
W
Warn model
Wave field synthesis
Web Graphics Library (WebGL)
Weighted area sampling
Winding order
Window coordinates
World coordinates
Write masking
Z
$$z$$-buffer
Z-buffer algorithm
Zero parallax
Z-fighting