Lecture 4
Data Types
Copyright © 2017 Pearson Education, Ltd. All rights reserved. 1-1
Lecture 4 Topics
• Introduction
• Primitive Data Types
• Character String Types
• User-Defined Ordinal Types
• Array Types
• Associative Arrays
• Record Types
• Union Types
• Pointer and Reference Types
Copyright © 2009 Addison-Wesley. All rights reserved. 1-2
Introduction
• A data type defines a collection of data
objects and a set of predefined operations
on those objects
• A descriptor is the collection of the
attributes of a variable
• An object represents an instance of a
user-defined data type
• One design issue for all data types: What
operations are defined and how are they
specified?
Primitive Data Types
• Almost all programming languages provide
a set of primitive data types
• Primitive data types: Those not defined in
terms of other data types
• Some primitive data types are merely
reflections of the hardware
• Others require only a little non-hardware
support for their implementation
Primitive Data Types: Integer
• Almost always an exact reflection of the
hardware so the mapping is trivial
• There may be as many as eight different
integer types in a language
• Java’s signed integer sizes: byte, short,
int, long
Primitive Data Types: Floating Point
• Model real numbers, but only as
approximations
• Languages for scientific use support at
least two floating-point types (e.g., float
and double), sometimes more
• Defined by range
and precision
• IEEE Floating-Point
Standard 754
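The approximation can be seen directly in Python, whose float is normally an IEEE 754 double. This is a minimal sketch; the variable names are illustrative only, and the reported precision assumes an IEEE 754 platform.

```python
import math
import sys

# 0.1 and 0.2 have no exact binary representation,
# so their sum is only close to 0.3.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)
close_enough = math.isclose(total, 0.3)

# Range and precision of float on this platform.
largest = sys.float_info.max
decimal_digits = sys.float_info.dig

print(total)           # 0.30000000000000004
print(exactly_equal)   # False
print(close_enough)    # True
print(largest, decimal_digits)
```

Comparing floating-point values with a tolerance, as math.isclose does, is usually safer than comparing with ==.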
Primitive Data Types: Complex
• Some languages support a complex type,
e.g., Fortran and Python
• Each value consists of two floats, the real
part and the imaginary part
• Imaginary part is specified by j (in Python):
(7 + 3j), where 7 is the real part and 3 is
the imaginary part
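A short Python sketch of the complex literal described above. The variable names are illustrative only.

```python
z = 7 + 3j

real_part = z.real        # 7.0, stored as a float
imaginary_part = z.imag   # 3.0, stored as a float

# Arithmetic follows the usual rules for complex numbers:
# (7 + 3j)(1 - 2j) = 7 - 14j + 3j + 6 = 13 - 11j
product = z * (1 - 2j)

print(real_part, imaginary_part, product)
```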
Primitive Data Types: Decimal
• For business applications (money)
– Essential to COBOL
– C# offers a decimal data type
• Store a fixed number of decimal digits, in
coded form (BCD)
• Advantage: accuracy
• Disadvantages: limited range, wastes memory
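The accuracy advantage can be illustrated with Python's decimal module. Note the assumption: this module is a software decimal type and is not necessarily stored as BCD, so the sketch shows the behavior of decimal arithmetic, not the storage format.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 or 0.20 exactly.
float_total = 0.10 + 0.20
float_is_exact = (float_total == 0.30)

# Decimal values built from strings keep every decimal digit.
decimal_total = Decimal("0.10") + Decimal("0.20")
decimal_is_exact = (decimal_total == Decimal("0.30"))

print(float_is_exact)    # False
print(decimal_is_exact)  # True
```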
Primitive Data Types: Boolean
• Simplest of all
• Range of values: two elements, one for
“true” and one for “false”
• Could be implemented as bits, but often as
bytes
– Advantage: readability
Primitive Data Types: Character
• Stored as numeric codings
• Most commonly used coding: ASCII
• An alternative, 16-bit coding: Unicode
(UCS-2)
– Includes characters from most natural
languages
– Java, C# and JavaScript support Unicode
• 32-bit Unicode (UCS-4)
– Supported by Fortran, starting with 2003
Character String Types
• Values are sequences of characters
• Design issues:
– Is it a primitive type or just a special kind of
array?
– Should the length of strings be static or
dynamic?
Character String Types Operations
• Typical operations:
– Assignment and copying
– Comparison (=, >, etc.)
– Catenation
– Substring reference
– Pattern matching
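The typical operations listed above can be sketched in Python, where strings are a built-in type and pattern matching comes from the standard re module. The variable names are illustrative only.

```python
import re

first = "apple"
second = "pie"

copy = first                       # assignment
is_before = first < second         # comparison (lexicographic order)
joined = first + " " + second      # catenation
piece = joined[0:5]                # substring reference

# Pattern matching: find the first run of one or more p characters.
match = re.search(r"p+", joined)
matched_text = match.group()

print(is_before, joined, piece, matched_text)
```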
Character String Type in Certain
Languages
• C and C++
– Not primitive
– Use char arrays and a library of functions that provide
operations
• Fortran and Python
– Primitive type with assignment and several operations
• Java
– Object via the String class
• Perl, JavaScript, Ruby, and PHP
– Provide built-in pattern matching
Character String Length Options
• Static: COBOL, Java’s String class
• Limited Dynamic Length: C and C++
– String variables can store any number of chars
between zero and the maximum. C marks the end
of a string with a null character instead of
storing its length
char *str = "apples"; // str points at the chars apples\0
• Dynamic (no maximum): Perl, JavaScript
• Ada supports all three string length options
Character String Type Evaluation
• Aid to writability
• As a primitive type with static length, they
are inexpensive to provide--why not have
them?
• Dynamic length is nice, but is it worth the
expense?
Character String Implementation
• Static length: compile-time descriptor
• Limited dynamic length: may need a run-
time descriptor for length (but not in C and
C++)
• Dynamic length: need run-time descriptor;
allocation/de-allocation is the biggest
implementation problem
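A run-time descriptor for a limited dynamic length string might be sketched as follows. The class name LimitedDynamicString and its fields are hypothetical, chosen only to show what such a descriptor has to track: a fixed maximum length, the current length, and the storage.

```python
class LimitedDynamicString:
    """Hypothetical run-time descriptor for a limited dynamic length string."""

    def __init__(self, max_length):
        self.max_length = max_length      # fixed when the variable is created
        self.current_length = 0           # updated on every assignment
        self.chars = [""] * max_length    # storage, stands in for an address field

    def assign(self, text):
        # The maximum length is checked at run time.
        if len(text) > self.max_length:
            raise ValueError("string exceeds maximum length")
        self.chars[:len(text)] = list(text)
        self.current_length = len(text)

    def value(self):
        return "".join(self.chars[:self.current_length])


name = LimitedDynamicString(10)
name.assign("apples")
print(name.value(), name.current_length)   # apples 6
name.assign("fig")
print(name.value(), name.current_length)   # fig 3
```

The storage never grows or shrinks, which is why this category avoids the allocation and de-allocation cost of fully dynamic strings.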
Compile- and Run-Time Descriptors
[Figure: a compile-time descriptor for static strings and a run-time descriptor for limited dynamic strings]
User-Defined Ordinal Types
• An ordinal type is one in which the range of
possible values can be easily associated
with the set of positive integers
• Examples of primitive ordinal types in Java
– integer
– char
– boolean
• There are two user-defined ordinal types:
enumeration and subrange
Enumeration Types
• All possible values, which are named
constants, are provided in the definition
• C# example
enum days {mon, tue, wed, thu, fri, sat, sun};
• Design issues
– Is an enumeration constant allowed to appear in
more than one type definition, and if so, how is
the type of an occurrence of that constant
checked?
– Are enumeration values coerced to integer?
– Is any other type coerced to an enumeration type?
Evaluation of Enumerated Type
• Aid to readability, e.g., no need to code a
color as a number
• Aid to reliability, e.g., compiler can check:
– operations (don’t allow colors to be added)
– No enumeration variable can be assigned a
value outside its defined range
– Ada, C#, and Java 5.0 provide better support for
enumeration than C++ because enumeration
type variables in these languages are not
coerced into integer types
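The reliability points above can be tried out with Python's enum module, used here only as an illustration (Python is not one of the languages named on the slide). The names Color and Level are hypothetical. Enum members are not coerced to integers, while IntEnum members are, which mirrors the difference the slide describes.

```python
from enum import Enum, IntEnum


class Color(Enum):
    RED = 1
    GREEN = 2


class Level(IntEnum):
    LOW = 1
    HIGH = 2


# Plain Enum members are not integers, so adding them is rejected.
try:
    Color.RED + Color.GREEN
    colors_can_be_added = True
except TypeError:
    colors_can_be_added = False

# A value outside the defined set cannot become a Color.
try:
    Color(5)
    out_of_range_accepted = True
except ValueError:
    out_of_range_accepted = False

# IntEnum members are coerced to int, so arithmetic is allowed.
level_sum = Level.LOW + Level.HIGH
color_equals_int = (Color.RED == 1)

print(colors_can_be_added, out_of_range_accepted, level_sum, color_equals_int)
```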
Subrange Types
• An ordered contiguous subsequence of an
ordinal type
– Example: 12..18 is a subrange of integer type
• Ada’s design
type Days is (mon, tue, wed, thu, fri, sat, sun);
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
Day1: Days;
Day2: Weekdays;
Day2 := Day1;
Subrange Evaluation
• Aid to readability
– Makes it clear to readers that variables of a
subrange type can store only a certain range of values
• Reliability
– Assigning a value to a subrange variable that is
outside the specified range is detected as an
error
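Python has no subrange types, so the run-time check has to be written by hand. The sketch below uses a hypothetical Subrange class to show the kind of check that a subrange type implies on every assignment.

```python
class Subrange:
    """Hypothetical integer variable restricted to low..high."""

    def __init__(self, low, high, value):
        self.low = low
        self.high = high
        self.value = value   # goes through the range check below

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        # Every assignment is checked against the declared range.
        if not self.low <= new_value <= self.high:
            raise ValueError(f"{new_value} is outside {self.low}..{self.high}")
        self._value = new_value


index = Subrange(1, 100, 1)   # like: subtype Index is Integer range 1..100
index.value = 42              # accepted
try:
    index.value = 101         # rejected at run time
except ValueError as error:
    print(error)              # 101 is outside 1..100
```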
Array Types
• An array is an aggregate of homogeneous
data elements in which an individual
element is identified by its position in the
aggregate, relative to the first element.
Array Indexing
• Indexing (or subscripting) is a mapping
from indices to elements
array_name(index_value_list) → an element
• The syntax of array references
– The array name is followed by the list of
subscripts, which is surrounded by either
parentheses or brackets
– FORTRAN, PL/I, Ada use parentheses
– Most other languages use brackets
Arrays Index (Subscript) Types
• FORTRAN, C: integer only
• Ada: allows other types such as Boolean and char
• Java: integer types only
• Index range checking
- C, C++, Perl, and Fortran do not specify
range checking
- Java, ML, C# specify range checking
- In Ada, the default is to require range
checking, but it can be turned off
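Python is not in the lists above, but its lists are range checked, so it can illustrate what specified range checking looks like. A minimal sketch with illustrative names:

```python
values = [10, 20, 30]

# Valid subscripts for a three-element list are 0, 1, and 2.
try:
    values[3]
    out_of_range_detected = False
except IndexError:
    out_of_range_detected = True

# Negative subscripts are legal in Python and count from the end.
last = values[-1]

print(out_of_range_detected, last)   # True 30
```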
Subscript Binding and Array Categories
• Static: subscript ranges are statically bound
and storage allocation is static (before run-
time)
– Advantage: efficiency (no dynamic allocation)
• Fixed stack-dynamic: subscript ranges are
statically bound, but the allocation is done
during execution time
– Advantage: space efficiency
Subscript Binding and Array Categories
(continued)
• Stack-dynamic: subscript ranges are
dynamically bound and the storage
allocation is dynamic (done at run-time)
– Advantage: flexibility (the size of an array need
not be known until the array is to be used)
• Fixed heap-dynamic: similar to fixed stack-
dynamic: storage binding is dynamic but
fixed after allocation (i.e., binding is done
when requested and storage is allocated
from heap, not stack)
Subscript Binding and Array Categories
(continued)
• Heap-dynamic: binding of subscript ranges
and storage allocation is dynamic and can
change any number of times
– Advantage: flexibility (arrays can grow or shrink
during program execution)
Subscript Binding and Array Categories
(continued)
• C and C++ arrays that include the static
modifier are static
• C and C++ arrays without the static modifier
are fixed stack-dynamic
• C and C++ provide fixed heap-dynamic
arrays
• C# provides fixed heap-dynamic arrays
• Perl, JavaScript, Python, and Ruby support
heap-dynamic arrays
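A Python list shows heap-dynamic behavior: its length can change any number of times during execution. The names below are illustrative only.

```python
scores = []               # starts empty

# Grow: append allocates more storage as needed.
scores.append(90)
scores.append(75)
scores.append(60)
length_after_growth = len(scores)

# Shrink: elements can be removed at any time.
scores.pop()              # removes 60
del scores[0]             # removes 90
length_after_shrink = len(scores)

print(length_after_growth, length_after_shrink, scores)   # 3 1 [75]
```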
Array Initialization
• Some languages allow initialization at the
time of storage allocation
– C, C++, Java, C# example
int list [] = {4, 5, 7, 83};
– Character strings in C and C++
char name [] = "freddie";
– Arrays of strings in C and C++
char *names [] = {"Bob", "Jake", "Joe"};
– Java initialization of String objects
String[] names = {"Bob", "Jake", "Joe"};
Heterogeneous Arrays
• A heterogeneous array is one in which the
elements need not be of the same type
• Supported by Perl, Python, PHP, JavaScript,
and Ruby
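For instance, a single Python list can hold elements of several types. The variable names are illustrative only.

```python
mixed = [42, "forty-two", 4.2, [4, 2]]

# Each element keeps its own type.
type_names = [type(item).__name__ for item in mixed]

print(type_names)   # ['int', 'str', 'float', 'list']
```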
Arrays Operations
• The most common array operations are
– Assignment
– Catenation
– Comparison for equality and inequality
– Slices
• Ada and Python allow array assignment and
catenation
• Ruby also provides array catenation
• Fortran provides elemental operations, so called
because they operate on pairs of array elements
– For example, + operator between two arrays results in an
array of the sums of the element pairs of the two arrays
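These operations can be sketched with Python lists. One caveat: in Python the + operator on lists means catenation, so the Fortran-style elemental sum is written out by hand here with zip. The names are illustrative only.

```python
a = [1, 2, 3]
b = [10, 20, 30]

joined = a + b                                   # catenation
are_equal = (a == [1, 2, 3])                     # comparison for equality
elementwise = [x + y for x, y in zip(a, b)]      # elemental sum, done by hand

# Assignment copies the reference, not the elements.
alias = a
alias[0] = 99
first_of_a = a[0]                                # 99, a and alias are one list

print(joined, are_equal, elementwise, first_of_a)
```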
Rectangular and Jagged Arrays
• A rectangular array is a multi-dimensioned
array in which all of the rows have the same
number of elements and all columns have
the same number of elements
• A jagged matrix has rows with varying
numbers of elements
• C, C++, and Java support jagged arrays
• Fortran, Ada, and C# support rectangular
arrays (C# also supports jagged arrays)
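Python lists of lists behave like jagged arrays, since each row is a separate list. A small sketch with illustrative names:

```python
rectangular = [[1, 2, 3],
               [4, 5, 6]]

jagged = [[1],
          [2, 3],
          [4, 5, 6]]

row_lengths = [len(row) for row in jagged]

# A matrix is rectangular when every row has the same length.
is_rectangular = len({len(row) for row in rectangular}) == 1
is_jagged_rectangular = len({len(row) for row in jagged}) == 1

print(row_lengths, is_rectangular, is_jagged_rectangular)   # [1, 2, 3] True False
```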
Slices
• A slice is some substructure of an array;
nothing more than a referencing
mechanism
• Slices are only useful in languages that
have array operations
Slice Examples
• Consider the following Python declarations:
vector = [2, 4, 6, 8, 10, 12, 14, 16]
mat = [[1, 2, 3],[4, 5, 6],[7, 8, 9]]
• vector[3:6] is a three-element array
• mat[1] refers to the second row of mat
• mat[0][0:2] refers to the first and second
element of the first row of mat
• vector[0:7:2] references every other
element of vector
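The slices above can be run as written. The result names are illustrative only.

```python
vector = [2, 4, 6, 8, 10, 12, 14, 16]
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

three_elements = vector[3:6]    # [8, 10, 12]
second_row = mat[1]             # [4, 5, 6]
first_two = mat[0][0:2]         # [1, 2]
every_other = vector[0:7:2]     # [2, 6, 10, 14]

print(three_elements, second_row, first_two, every_other)
```

The upper bound of a Python slice is exclusive, so vector[3:6] covers subscripts 3, 4, and 5.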
Multi-dimensioned Arrays to Single-
dimension Array
• Two common ways:
– Row major order (by rows) – used in most languages
– Column major order (by columns) – used in Fortran
Example: consider the matrix below
3 4 7
6 2 5
1 3 8
• It would be stored in row major order as 3, 4, 7, 6, 2, 5, 1, 3, 8
• It would be stored in column major order as 3, 6, 1, 4, 2, 3, 7, 5, 8
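Both orderings can be reproduced by flattening the example matrix. This is only an illustration of the mapping: Python does not store a list of lists contiguously, and the helper functions row_major_offset and column_major_offset are hypothetical names.

```python
matrix = [[3, 4, 7],
          [6, 2, 5],
          [1, 3, 8]]
n_rows = len(matrix)
n_cols = len(matrix[0])

# Row major: all of row 0, then row 1, and so on.
row_major = [matrix[r][c] for r in range(n_rows) for c in range(n_cols)]

# Column major: all of column 0, then column 1, and so on.
column_major = [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]


def row_major_offset(row, col):
    """Position of matrix[row][col] in the row major sequence."""
    return row * n_cols + col


def column_major_offset(row, col):
    """Position of matrix[row][col] in the column major sequence."""
    return col * n_rows + row


print(row_major)      # [3, 4, 7, 6, 2, 5, 1, 3, 8]
print(column_major)   # [3, 6, 1, 4, 2, 3, 7, 5, 8]
```

With these offsets, matrix[1][2] (the value 5) sits at position 5 in row major order and position 7 in column major order.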