0% found this document useful (0 votes)
2 views

Data Types

Copyright
© © All Rights Reserved
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

Data Types

Copyright
© © All Rights Reserved
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 103

CSCI 3210: Theory of

Programming Language

Data Types

Based on Textbook Slides

Dr. Zhijiang Dong @ Middle Tennessee State U 1


niversity
Outline
 DataTypes
 Elementary Data Types
 Scalar Data Types
 Composite Data Types
 Structured Data Types
 Array
 Record
 List
 Set

Dr. Zhijiang Dong @ Middle Tennessee State U 2


niversity
Data Types
 Each data object has a type:
 Values: for objects of that type
 Operations: for objects of that type
 Implementation: (Storage representation) for objects of that type
 Attributes: (e.g., name) for objects of that type
 Signature: (of operation f): f: type x type  type

Dr. Zhijiang Dong @ Middle Tennessee State U 3


niversity
Data Type
A data type is a class of data objects together
with a set of operations for creating and
manipulating them.
 Every language has a set of primitive data

types that are built into the language.


 In addition, a language may provide facilities

to allow the programmer to define new data


types.

Dr. Zhijiang Dong @ Middle Tennessee State U 4


niversity
Data Type Specification
 Thebasic elements of a specification of a
data type are as follows:
 The attributes that distinguish data objects of that
type,
 The values that data objects of that type may
have, and
 The operations that define the possible
manipulations of data objects of that type.

Dr. Zhijiang Dong @ Middle Tennessee State U 5


niversity
Data Type Specification
 Specification of an array data type
 Attributes
 the number of dimensions
 the subscript range for each dimension
 the data type of the components
 Values
 The set of numbers that form valid values for array
components
 Operations
 Subscripting to select individual array components
 Create arrays
 Access attributes such as upper and lower bounds of
subscripts

Dr. Zhijiang Dong @ Middle Tennessee State U 6


niversity
Data Type Implementation
 Thebasic elements of the implementation of a
data type are as follows:
 The storage representation that is used to
represent the data objects of the data type in the
storage of the computer during program execution,
and
 The manner in which the operations defined for the
data type are represented in terms of particular
algorithms or procedures that manipulate the
chosen storage representation of the data objects.
Dr. Zhijiang Dong @ Middle Tennessee State U 7
niversity
Declaration
 Both specification and implementation are largely independent of
the particular syntactic forms used in the language.
 Attributes of data objects are often represented syntactically by
declarations or type definitions.
 A declaration is a program statement that serves to communicate
to the language translator information about the name and type
of data objects needed during program execution.
 Explicit declaration
 Like variable declaration in C
 Implicit/default declaration
 Declaration that hold when no explicit declaration is given.
 In FORTRAN, a simple variable INDEX may be used without
explicit declaration. By default, it is assumed as an integer
variable.

Dr. Zhijiang Dong @ Middle Tennessee State U 8


niversity
Declaration
 Declarations serve several important
purposes
 Choice of storage representations
 Storage management
 Polymorphic operations
 Type checking

Dr. Zhijiang Dong @ Middle Tennessee State U 9


niversity
Outline
 DataTypes
 Elementary Data Types
 Scalar Data Types
 Composite Data Types
 Structured Data Types
 Array
 Record
 List
 Set

Dr. Zhijiang Dong @ Middle Tennessee State U 10


niversity
Elementary Data Types
 An elementary data object contains a single data
value.
 A class of such data objects over which various
operations are defined is termed an elementary data
type.
 Integer
 Real
 Character
 Boolean
 Enumeration
 pointer

Dr. Zhijiang Dong @ Middle Tennessee State U 11


niversity
Elementary Data Types:
Specification
 Attributes
 Basic attributes of any data object, such as data type and
name, are usually invariant during its lifetime.
 Some of the attributes may be stored in a descriptor as part
of the data object during program execution.
 Values
 The type of a data object determines the set of possible
values that it may contain.
 The set of values defined by an elementary data type is
usually an ordered set with a least value and a greatest
value; for any pair of distinct values, one is greater than the
other.

Dr. Zhijiang Dong @ Middle Tennessee State U 12


niversity
Elementary Data Types:
Specification
 Operations
 The operations may be primitive operations,
which means they are specified as part of the
language definition, or
 They may be programmer-defined operations, in
the form of subprograms or method declarations
as part of class definitions.

Dr. Zhijiang Dong @ Middle Tennessee State U 13


niversity
Elementary Data Types:
Implementation
 Storage representation
 Strongly influenced by the underlying computer that will execute
the program
 The attributes of elementary data objects
 For efficiency, data attributes can be determined by the compiler. (like
C)
 For flexibility, data attributes can be stored in a descriptor as part of
the data object at run time. (like LISP, Prolog)
 Operations
 Each operation may be implemented in one of three main ways:
 Directly as a hardware operation
 As a procedure or function subprogram
 As an inline code sequence

Dr. Zhijiang Dong @ Middle Tennessee State U 14


niversity
Outline
 DataTypes
 Elementary Data Types
 Scalar Data Types
 Composite Data Types
 Structured Data Types
 Array
 Record
 List
 Set

Dr. Zhijiang Dong @ Middle Tennessee State U 15


niversity
Scalar Data Types
 Scalardata objects: objects that have a
single attribute for its data object
 Numeric Data Types
 Integers
 Subranges
 Floating-point real numbers
 Enumerations
 Boolean
 Character
Dr. Zhijiang Dong @ Middle Tennessee State U 16
niversity
Numeric Data Types: Integers
 Specification
 The set of integer values defined for the type forms an
ordered subset, within some finite bounds, of the infinite set
of integers studied in mathematics.
 The range is typically defined as constants
 May have different specification: short, int, long, and char
 Operations
 Arithmetic operations: +, -, *, /, %, negation (-), identify (+)

 Relational operations: >, >=, <, <=, ==, !=

 Assignment operations: =

 Bit operations: &, |, ^, <<, >>

Dr. Zhijiang Dong @ Middle Tennessee State U 17


niversity
Numeric Data Types: Integers
 Implementation
 Using hardware-defined integer storage representation and
set of hardware arithmetic and relational primitive
operations on integers
 Three possible approaches as indicated in the next
slide
 First one requires declarations and static type checking for
integer data objects
 Second one requires more space (LISP use this approach)
 Third one cannot use arithmetic operations cannot be used
directly
Dr. Zhijiang Dong @ Middle Tennessee State U 18
niversity
Numeric Data Types: Integers

Dr. Zhijiang Dong @ Middle Tennessee State U 19


niversity
Numeric Data Types:
Subranges
A subrange of an integer data type is a
subtype of the integer data type and consists
of a sequence of integer values within some
restricted range.
 Specification
 Declaration:
A : 1 .. 10 (Pascal)
A : integer range 1..10 (Ada)
 Operations: the same set of operations for
ordinary integer types can be applied
Dr. Zhijiang Dong @ Middle Tennessee State U 20
niversity
Numeric Data Types:
Subranges
 Implementation
 Usually stored in fewer bits than a general integer value
 However, subrange values are often represented as the
smallest number of bits for which the hardware implements
arithmetic operations. This can generally be 8 or 16 bits.
 Better type checking
 MonthVar := 0 is illegal if Month: 1 .. 12. This can
be performed at compile time
 If check involves computed values, run-time checking is
needed to determine whether the new value is still iwthin
the bounds declared. For example:
MonthVar := MonthVar + 1

Dr. Zhijiang Dong @ Middle Tennessee State U 21


niversity
Numeric Data Types: Floating-
Point Real Numbers
 Specification
 The values form an ordered sequence from some hardware-
determined minimum negative value to a maximum value, but the
values are not distributed evenly across this range
 Often specified with only the single data type attribute real
(FORTRAN), or float (C)
 Alternatively, the precision can be specified in terms of the
number of digits used in the decimal representation. (Ada)
 Operations
 The same arithmetic, relational and assignment operations described
for integers are also provided for reals.
 Due to roundoff issues, boolean operations , especially equality are
sometimes restricted.
 Language designers may probibit equality between two real numbers

Dr. Zhijiang Dong @ Middle Tennessee State U 22


niversity
Numeric Data Types: Floating-
Point Real Numbers
 Implementations
 Using scientific notation: any number N can be expressed
as: N = m x 2k for m between 0 and 1 & for some integer k
 Based on an underlying hardware representation in which
a storage location is divided into a mantissa (i.e. the
significant digits of the number) and an exponent.

Dr. Zhijiang Dong @ Middle Tennessee State U 23


niversity
Numeric Data Types: Floating-
Point Real Numbers
 IEEE standard 754 specifies both a 32- and 64-bit
standard.
 Numbers consist of three fields:

 S: a one-bit sign field. 0 is positive.


 E: an exponent in excess-127 notation. Values (8 bits)
range from 0 to 255, corresponding to exponents of 2 that
range from -127 to 128.
 M: a mantissa of 23 bits. Since the first bit of the mantissa
in a normalized number is always 1, it can be omitted and
inserted automatically by the hardware, yielding an extra
24th bit of precision.
Dr. Zhijiang Dong @ Middle Tennessee State U 24
niversity
Numeric Data Types: Floating-
Point Real Numbers
 Decoding IEEE format

 Given E, and M, the value of the representation is:

 Parameters Value
 E=255 and M  0 An invalid number

E=255 and M = 0 
 0<E<255 2{E-127}(1.M)
 E=0 and M  0 2 {-126}.M
 E=0 and M=0 0

Dr. Zhijiang Dong @ Middle Tennessee State U 25


niversity
Numeric Data Types: Floating-
Point Real Numbers
+1= 20*1= 2{127-127}*(1).0 (binary) 0 01111111 000000...

+1.5= 20*1.5= 2{127-127}*(1).1 (binary) 0 01111111 100000...

-5= -22*1.25= 2{129-127}*(1).01 (binary)1 10000001 010000...

 This gives a range from 10-38 to 1038.

 In 64-bit format,the exponent is extended to 11 bits giving a


range from -1022 to +1023, yielding numbers in the range 10 -
308
to 10308.

Dr. Zhijiang Dong @ Middle Tennessee State U 26


niversity
Numeric Data Types: Floating-
Point Real Numbers
 Converting Decimal Fractions to Binary (by converting the decimal value .625 to a
binary representation)
 Step 1: Begin with the decimal fraction and multiply by 2. The whole number part of the
result is the first binary digit to the right of the point.
Because .625 x 2 = 1.25, the first binary digit to the right of the point is a 1.
So far, we have .625 = .1??? . . . (base 2) .
 Step 2: Disregard the whole number part of the previous result (the 1 in this case) and
multiply by 2 once again. The whole number part of this new result is the second binary
digit to the right of the point.
Because .25 x 2 = 0.50, the second binary digit to the right of the point is a 0.
So far, we have .625 = .10?? . . . (base 2) .
 Step 3: Disregarding the whole number part of the previous result (this result was .50 so
there actually is no whole number part to disregard in this case), we multiply by 2 once
again. The whole number part of the result is now the next binary digit to the right of the
point.
Because .50 x 2 = 1.00, the third binary digit to the right of the point is a 1.
So now we have .625 = .101?? . . . (base 2) .
 Step 4: In fact, we do not need a Step 4. We are finished in Step 3, because we had 0 as
the fractional part of our result there.
Hence the representation of .625 = .101 (base 2) .

Dr. Zhijiang Dong @ Middle Tennessee State U 27


niversity
Numeric Data Types:
Enumerations
 Specification
 An enumeration is an ordered set of distinct symbolic values
 Need to defines both the literal names to be used for the values
and their ordering
 C:
Enum StudentClass {Fresh, Soph, Junior, Senior};
 Pascal
type StudentClass = (Fresh, Soph, Junior, Senior);
 The type definition introduces the type name StudentClass,
which may be used wherever a primitive type name might be
used.
 It also introduces the literals of Fresh, Soph, Junior, and Senior.
 Operations
 Full set of relational operations
 Assignment, and operations like successor and predecessor

Dr. Zhijiang Dong @ Middle Tennessee State U 28


niversity
Numeric Data Types:
Enumerations
 Implementation
 Each value in the enumeration sequence is represented at
run time by one of the integers 0, 1, 2….

 The successor and predecessor operations involve simply


adding or subtracting one from the integer representing the
value and checking to see that the result is within the
proper range.

 Hardware-provided operations on integers can be used to


implement the basic operations on enumerations.

Dr. Zhijiang Dong @ Middle Tennessee State U 29


niversity
Numeric Data Types: Booleans
 Specification
 The boolean data type consists of data objects having one
of two values – true or false.

 In Pascal and Ada, the boolean data type is considered


simply a language-defined enumeration

 Operations
 AND, OR, NOT,

Dr. Zhijiang Dong @ Middle Tennessee State U 30


niversity
Numeric Data Types: Booleans
 Implementation
 Theoretically, a single bit of storage is enough
 However, the storage representation is often extended
to be a single addressable unit such as a byte,
because single bits may not be separately
addressable in memory
 true and false might be represented in two ways within
this storage unit:
 A particular bit is used for the value, and the rest is ignored;
or
 A zero value in the entire storage unit represents false, and
any other nonzero value represents true.

Dr. Zhijiang Dong @ Middle Tennessee State U 31


niversity
Numeric Data Types:
Characters
 Specification
 A character data type provides data objects that have
a single character as their value.
 The set of possible character values is usually taken
to be a language-defined enumeration corresponding
to the standard character set supported by the
underlying hardware and operating systems.
 The ordering of the characters in this character set is
called collating sequence for the character set.
 Operations
 Relational operations, assignment

Dr. Zhijiang Dong @ Middle Tennessee State U 32


niversity
Numeric Data Types:
Characters
 Implementation
 Character data values are almost always directly
supported by the underlying hardware and operating
system because of their use in input-output.

Dr. Zhijiang Dong @ Middle Tennessee State U 33


niversity
Outline
 DataTypes
 Elementary Data Types
 Scalar Data Types
 Composite Data Types
 Structured Data Types
 Array
 Record
 List
 Set

Dr. Zhijiang Dong @ Middle Tennessee State U 34


niversity
Composite Data Types
 Composite Data Types
 Multiple attributes are often given for each such data type
 Their implementation usually involves a complex data
structure organization by the compiler
 The following composite data types are usually
considered elementary data objects
 Character strings
 Pointers
 User-defined Data types
 Files

Dr. Zhijiang Dong @ Middle Tennessee State U 35


niversity
Composite Data Types:
Character String
 Character String
 A data object composed of a sequence of characters
 Different types of character string
 Fixed declared length
 Variable length to a declared bound
 Unbounded length
 Examples
 Fixed length:
char A(10) - C
DCL B CHAR(10) - PL/I
var C packed array [1..10] of char - Pascal
 Variable length:
DCL D CHAR(20) VARYING - PL/I - 0 to 20 characters
E = “ABC” - SNOBOL4 - any size, dynamic
F = `ABCDEFG\0' - C - any size, programmer defined

Dr. Zhijiang Dong @ Middle Tennessee State U 36


niversity
Composite Data Types:
Character String
 The string in C is actually a bit more complicated.
 Strings are arrays of characters, but C has no string
declaration.

 If an array of characters need to be treated as a C string, it


is programmer’s responsibility to make sure the array
include the null character

 A string constant written as “string liter” will have the null


character appended to the string constant by the C
translator.

Dr. Zhijiang Dong @ Middle Tennessee State U 37


niversity
Composite Data Types:
Character String
 Operations
 Concatenation

 Relational operations on strings based on lexicographic ordering

 Substring selection using positioning subscripts


 Providing the positions of its first and last characters; or
 Providing the first character position and length of the substring

 Substring selection using pattern matching


 The pattern specifies the form of the substring desired

Dr. Zhijiang Dong @ Middle Tennessee State U 38


niversity
Composite Data Types:
Character String
 Operations
 Input and output formatting

 Dynamic strings
 In Perl, string can be static and dynamic
print ‘$ABC’ will output $ABC

print “$ABC” will cause the string to be evaluated, and the


value of the Perl variable $ABC is printed instead

Dr. Zhijiang Dong @ Middle Tennessee State U 39


niversity
Composite Data Types:
Character String
Storage representation for strings

Dr. Zhijiang Dong @ Middle Tennessee State U 40


niversity
Composite Data Types:
Pointers
 Pointer
 A data object whose value is a reference to some
object
 First introduced in PL/I

 Pointers serve two purposes:


 efficient (and sometimes intuitive) access to
elaborated objects (as in C)

 dynamic creation of linked data structures, in


conjunction with a heap storage manager
Dr. Zhijiang Dong @ Middle Tennessee State U 41
niversity
Composite Data Types:
Pointers
 In some languages (Pascal, Ada 83, and Modula-3), pointers are
restricted to point only to objects in the heap.
 In other languages (PL/I, Algol 68, C, C++, and Ada 95), one can
create a pointer to a nonheap object by using an “address of”
operator.
 How and when is storage reclaimed for objects that are no longer
needed?
 Programmer needs to reclaim space explicitly. (C, C++, Pascal,
Modula-2)
 May cause memory leak and/or dangling reference problem
 Language implementation to reclaim unused objects automatically,
as known garbage collection. (Modula-3, Java, C#, all
functional and scripting languages)
 Impose certain run-time costs, and
 Raises the question of how the language implementation is to
distinguish garbage from active objects.
Dr. Zhijiang Dong @ Middle Tennessee State U 42
niversity
Composite Data Types:
Pointers
 Operations:
 Allocation
 Deallocation
 Dereferencing of pointers to access the objects to which
they point
 Assignment of one pointer to another

 The behavior of these operations depends heavily


on
 Whether the language is functional or imperative, and
 Whether it employs a reference or value model for
variables/names

Dr. Zhijiang Dong @ Middle Tennessee State U 43


niversity
Composite Data Types:
Pointers
 Value model for variables/names
 A variable contains a value.
 Reference model for variables/names
 A variable refers to an object with a value.
 Variables have to be dereferenced to get their r-value; however, the
dereferencing is implicit in most languages, so the distinction is subtle.
 Example:
b := 2;
c := b;
a := b + c;
 Value model: put the value 2 in b and then copy it into c. We then read these
values, add them together, and place the resulting 4 in a.

 Reference Model: let b refer to 2 and then let c refer to it also. We then pass
these reference to the + operator, and let a refer to the result, namely 4.

Dr. Zhijiang Dong @ Middle Tennessee State U 44


niversity
Composite Data Types:
Pointers
 Functional languages generally employ a reference model for
names
 A pure functional language has no variables or assignments

 Variables in an imperative language may use either a value or a


reference mode, or some combination of the two.
 C, Pascal, and Ada employ a value model
 Clu and SmallTalk employe a reference model
 Java charts an intermediate, in which the usual implementation of
the reference model is made explicit in the language semantics
 Variables of build-in Java types (integers, floating-point numbers,
characters, and Booleans) employ a value model
 Variable of user-defined types (strings, arrays, and other objects in
the object-oriented sense of the word) employ a reference model

Dr. Zhijiang Dong @ Middle Tennessee State U 45


niversity
Composite Data Types:
Pointers
 Implementation
 Two major storage representations are used for
pointer values
 Absolute addresses: a pointer value may be
represented as the actual memory address of the
storage block the foe data object
 Relative addresses: a pointer value may be
represented as an offset from the base address of
some larger heap storage block within which the data
object is allocated.

Dr. Zhijiang Dong @ Middle Tennessee State U 46


niversity
Composite Data Types:
Pointers
 Pointers and arrays are closely linked in C.
 In most contexts, an unsubscripted array name in C is
automatically converted to a pointer to the array’ first
element (the one with index 0)
 Pointer arithmetic: Given a pointer to an element of
an array, the addition of an integer k produces a pointer to
the element k positions later in the array (earlier if k is
negative)
 Remarkably, the subscript operator [ ] in C is actually
defined in terms of pointer arithmetic.
 Correctness requires only that one operand of [ ] have an
array or pointer type and the other have an integral type.
 Thus, A[3] is equivalent to 3[A].

Dr. Zhijiang Dong @ Middle Tennessee State U 47


niversity
Composite Data Types:
Pointers
 C pointers and arrays
int *a == int a[]
int **a == int *a[]

 BUT equivalences don't always hold


 Specifically, a declaration allocates an array if it
specifies a size for the first dimension
 otherwise it allocates a pointer
int **a, int *a[] pointer to pointer to int
int *a[n], n-element array of row pointers
int a[n][m], 2-d array

Dr. Zhijiang Dong @ Middle Tennessee State U 48


niversity
Composite Data Types:
Pointers
 Compiler has to be able to tell the size of the things to
which you point
 So the following aren't valid:
int a[][]; bad
int (*a)[]; bad

 C declaration Right Left rule: read right as far as you


can (subject to parentheses), then left, then out a level
and repeat
int *a[n]; n-element array of pointers to
integer
int (*a)[n]; pointer to n-element array of
integers

Dr. Zhijiang Dong @ Middle Tennessee State U 49


niversity
Composite Data Types: Files
 A file is a data structure with two special properties:
 It ordinarily is represented on a secondary storage
device such a a disk or tape and thus may be much
larger than most data structures of other types
 Its lifetime may encompass a greater span of time
than that of the program creating it.
 Two general use for files
 For input and output of data to an external operating
environment
 As temporary scratch storage for data when not
enough high-speed memory is available

Dr. Zhijiang Dong @ Middle Tennessee State U 50


niversity
Composite Data Types: Files
 Different types of files
 Sequential files: a data structure composed of a
linear sequence of components of the same type

 Direct-Access files: any single component may be


accessed at random, just as in an array or a
record

 Indexed Sequential files: Similar to a direct-


access file, with the additional facility to access
components in sequence, beginning from a
component selected at random.
Dr. Zhijiang Dong @ Middle Tennessee State U 51
niversity
Composite Data Types: Files
 Declaration
Master: file of EmployeeRec; (Pascal)
 Defines a file variable named Master, whose
components are of type EmployeeRec.

 Tyically,
the file may be accessed in either
read mode or write mode.
 In either mode, there is a file-position
pointer that designates a position before the
first file component, between two components, or
after the last component.
Dr. Zhijiang Dong @ Middle Tennessee State U 52
niversity
Composite Data Types: Files
 Operations
 Open
 Read

 Write

 End-of-file test

 Close

Dr. Zhijiang Dong @ Middle Tennessee State U 53


niversity
Composite Data Types: Files
 Implementation
 The underlying operating system has the primary
responsibility for the implementation of files because
files are created and manipulated by various
programming language processors and utilities.

 File operations are primarily implemented by calls on


primitives provided by the operating system.
 From the language viewpoint, the primary implementation
problem comes from the need to provide storage for system
data and buffers required by the operating system primitives.

Dr. Zhijiang Dong @ Middle Tennessee State U 54


niversity
Outline
 DataTypes
 Elementary Data Types
 Scalar Data Types
 Composite Data Types
 Structured Data Types
 Array
 Record
 List
 Set

Dr. Zhijiang Dong @ Middle Tennessee State U 55


niversity
Structured Data Types
 The current goal in programming language design is
to make the distinctions in the various forms of data
transparent to the programmer who uses the data
types.
 Four basic mechanisms exist to provide the
programmer with the ability to create new data types
and operations on that type.
 Structured data
 Subprograms
 Type declarations
 Inheritance

Dr. Zhijiang Dong @ Middle Tennessee State U 56


niversity
Structured Data Types
 A data object that is constructed as an aggregate of other data
objects, called component, is termed a structured data object or
data structure.
 A component may be elementary or another data structure

 The bindings of data structures to values, names, and locations


are important and somewhat more complex comparing to
elementary data types
 How to indicate the component data objects of a data structure
and their relationships in such a way that selection of a
component is straightforward?

 Many operations on data structures bring up storage


management issues that are not present for elementary data
objects.

Dr. Zhijiang Dong @ Middle Tennessee State U 57


niversity
Structured Data Types
 Structured Data Types
 Vectors
 Arrays
 Records
 Lists
 Sets

Dr. Zhijiang Dong @ Middle Tennessee State U 58


niversity
Structured Data Types:
Specification
 The major attributes for specifying data structures
include the following
 Number of components
 A data structure maybe fixed size such as array, record, or
variable size such as lists, stacks, sets
 Variable-size data structure types usually provide insert and
delete operations
 Type of each component
 Homogeneous: all components of the same type
 Heterogeneous: components of different types
 Mechanism to be used for selecting components
 Maximum number of components
 Organization of the components

Dr. Zhijiang Dong @ Middle Tennessee State U 59


niversity
Structured Data Types:
Specification
 Specification of the domain and range of operations on data
structure types may be given in much the same manner as for
elementary types.
 Some new classes of operations are of particular importance:
 Component selection operations
 Random selection: an arbitrary component can be accessed
 Sequential selection: components are selected in a predetermined
order
 Whole data structure operations
 Operations take entire data structures as arguments and produce
new data structures as results
 Examples: addition of two arrays, copy record, union of two sets
 Insertion/deletion of components
 Creation/destruction of data structures
Dr. Zhijiang Dong @ Middle Tennessee State U 60
niversity
Structured Data Types:
Implementation
 Implementation consideration for data structure
types include the same issues as for elementary
types.

 In addition, two new issues develop that strongly


affect the choice of storage representation
 Efficient selection of components from a data structure,
and
 Efficient overall storage management for the language
implementation.

Dr. Zhijiang Dong @ Middle Tennessee State U 61


niversity
Implementation:
Storage Representation
 Storage representations for a data structure include
 Storage for the components of the structure, and
 An optional descriptor that stores some or all of the attributes of
the structure
 Two basic storage representations
 Sequential representation
 Stored in a single contiguous block of storage that includes both
description and components
 Typically used for fixed-size structures and sometimes for
homogeneous variable-size structures such as strings or stacks
 Linked representation
 Stored in several noncontiguous blocks of storage, with the blocks
linked together through pointers.
 Commonly used for variable-size structures such as lists

Dr. Zhijiang Dong @ Middle Tennessee State U 62


niversity
Implementation:
Storage Representation

Dr. Zhijiang Dong @ Middle Tennessee State U 63


niversity
Implementation: Operations
 Implementation of operations on data structures
 Component selection is of primary importance

 Efficiency of both random-selection and sequential-


selection operations is needed

 Sequential-selection should more efficiently than simply as


a sequence of random selections

 These two basic types of component selection are


implemented differently for sequential and linked storage
representations
Dr. Zhijiang Dong @ Middle Tennessee State U 64
niversity
Operation Implementation:
Sequential Representation
 Random selection involves
a base-address-plus-
offset calculation using
an accessing formula.

 The relative location of the


selected component within
the sequential block is
called its offset.

Dr. Zhijiang Dong @ Middle Tennessee State U 65


niversity
Operation Implementation:
Sequential Representation
• Selection of a sequence of components from the
structure is possible by the following steps, for a
homogeneous structure such as an array
• To select the first component of the sequence, use the
base-address-plus-offset calculation

• To advance to the next component in the sequence,


add the size of the current component to the location
of the current component

Dr. Zhijiang Dong @ Middle Tennessee State U 66


niversity
Operation Implementation:
Linked Representation
• Random selection of a component from a linked
structure involves following a chain of pointer from
the first block of storage in the structure to the
desired component.

• Selection of a sequence of components proceeds by


selecting the first component and then following the
link pointer from the current component to the next
component for each subsequence selection

Dr. Zhijiang Dong @ Middle Tennessee State U 67


niversity
Type Checking on Structured
Data Types
• Type checking on structured data types is more complex
because component selection operations must be taken into
account
• Existence of a selected component
• The arguments to a selection operation may be of the right types,
but the component designated may not exist in the data structure

• Type of a selected component


• A selection sequence may define a complex path through a data
structure to the desired component. For example:
A[2][3].link->item
• To perform static type checking, the translator should be able to
determine the type of component selected by any valid
composite selector

Dr. Zhijiang Dong @ Middle Tennessee State U 68


niversity
Outline
 DataTypes
 Elementary Data Types
 Scalar Data Types
 Composite Data Types
 Structured Data Types
 Array
 Record
 List
 Set

Dr. Zhijiang Dong @ Middle Tennessee State U 69


niversity
Array
 Arrays are the most common and important composite
data types
 arrays are usually homogeneous
 Semantically, they can be thought of as a mapping
from an index type to a component or element type
 Array declaration typically specify the following
attributes
 Number of components
 Data type of each component
 Subscript to be used to select each component
 Maybe a range of values as -5 .. 5, or
 An upper bound with an implied lower bound

Dr. Zhijiang Dong @ Middle Tennessee State U 70


niversity
Array Declaration
• Pascal
V: array [-5 .. 5] of real;
• C
Float a[10];
• FORTRAN
character, dimension (1:26) :: upper
character (26) upper !shorthand notation
• Ada
upper: array (character range ‘a’ .. ‘z’) of
character;

Dr. Zhijiang Dong @ Middle Tennessee State U 71


niversity
Array Storage
 Array in most language implementations are stored in contiguous
locations in memory
 One dimensional array
 ith element is stored immediately after the (i-1) th element. (subject
to alignment constraints)
 Two dimensional array
 Row major order (Most programming languages)
 A[i, j] is followed by A[I, j+1] immediately
 Column major order (FORTHAN)
 A[I, j] is followed by A[i+1, j] immediately
 Row major order is more popular
 The difference between row- and column-major layout can be
important for programs that use nested loops to access all the
elements of a large, multi-dimensional array. (efficiency because
of cache)
Dr. Zhijiang Dong @ Middle Tennessee State U 72
niversity
Array Storage
Row-major order vs. Column-major order

Dr. Zhijiang Dong @ Middle Tennessee State U 73


niversity
Array Storage: Alternatives
 Row-pointer layout is an alternative to contiguous allocation.
 Rather than require the row of an array to be adjacent, they allow
them to lie anywhere in memory, and create an auxiliary array of
pointers to the rows.
 Advantages of row-pointer layout
 Individual elements can be access more quickly, especially on
CISC machines with slow multiplication instruction
 Rows may have different lengths, without devoting space to
holes at the ends of the rows. (called ragged array)
 An array can be constructed from preexisting rows without
copying.
 Disadvantages: Requires more space
 C, C++, and C# provide both contiguous and row-pointer
organizations for multidimensional arrays.
 Java uses the row-pointer layout for all arrays
Dr. Zhijiang Dong @ Middle Tennessee State U 74
niversity
Array Storage Comparison

Contiguous array allocation v. row pointers in C. The declaration on the left is a tr ue two-dimensional array. The slashed
boxes are NUL bytes; the shaded areas are holes. The declaration on the right is a ragged array of pointers to arrays of
character s. In both cases, we have omitted bounds in the declaration that can be deduced from the size of the
initializer (aggregate). Both data structures permit individual characters to be accessed using double subscripts, but the
memory layout (and corresponding address arithmetic) is quite different.

Dr. Zhijiang Dong @ Middle Tennessee State U 75


niversity
Accessing Array Elements

Dr. Zhijiang Dong @ Middle Tennessee State U 76


niversity
Accessing Array Elements:
Virtual Origin
 Rewriting access equation:
L-value(A[I,J]) =  - d1*L1 - d2*L2 +I*d1 + J*d2
 Set I = 0; J= 0;
L-value(A[0,0]) =  - d1*L1 - d2*L2 +0*d1 + 0*d2
=  - d1*L1 - d2*L2, which is a constant.

 Call this constant the virtual origin (VO); It


represents the address of the 0th element of the
array.
L-value(A[I,J]) = VO +I*d1 + J*d2
Dr. Zhijiang Dong @ Middle Tennessee State U 77
niversity
Dope Vectors
 Dope vector is used to store array shape information
such as
 Lower bound of each dimension
 The size of each dimension
 Upper bound of each dimension (optional, can be included
to avoid computing repeatedly at runtime)
 To perform semantic checking as well as address
calculation.
 If these information are statically known, the translator can
look them up in the dope vector
 If these information are not statically known, the translator
must generate code to look them up in a dope vector at run
time.

Dr. Zhijiang Dong @ Middle Tennessee State U 78


niversity
Array Storage Representation

One dimensional array

Right figure: Two dimensional array


Dr. Zhijiang Dong @ Middle Tennessee State U 79
niversity
Slices
A slice or section is a rectangular portion of
an array.

 Fortran 90 and provides extensive facilities


for slicing, as do many scripting languages,
including Perl, Python, Ruby, and R.

 Adaprovides more limited support: a lice is


simply a contiguous range of elements in a
one-dimensional array.
Dr. Zhijiang Dong @ Middle Tennessee State U 80
niversity
Slices

Dr. Zhijiang Dong @ Middle Tennessee State U 81


niversity
Array Slices in Fortran 90

The slashes in the second


subscript delimit an explicit
list of positions.
 a: b: c in a subscript indicates positions a, a+c, a+2c, ...through b.
 If a or b is omitted, the corresponding bound of the array is assumed.
 If c is omitted, 1 is assumed.
 It is even possible to use negative values of c in order to select positions in reverse order.

Dr. Zhijiang Dong @ Middle Tennessee State U 82


niversity
Array Operations
 Most programming languages permit selection of an
element and assignment operations on arrays

 A few languages (Ada, Fortran 90) allow arrays to


be compared for equality
 Ada even allows one-dimensional arrays whose elements
are discrete to be compared for lexicographic ordering.

 Fortran 90 has a very rich set of array operations


that take entire arrays as arguments such as A + B.

Dr. Zhijiang Dong @ Middle Tennessee State U 83


niversity
Associative Arrays
 In some applications, there is a desire to
access information by name without having a
predefined ordering or enumeration of the
appropriate subscripts.
 An associative array (map, table, or

dictionary) is an abstract data type composed


of a collection of (key, value) pairs, such that
each possible key appears at most once in
the collection.
Dr. Zhijiang Dong @ Middle Tennessee State U 84
niversity
Associative Arrays
 In Perl, an associative array is created with
the % operators
%ClassList = (
“Michelle”, ‘A’,
“Doris”, ‘B’,
“Michael”, ‘D’ );
 It creates a three-element associative array:
Student name as the key, and grade as the
value.
Dr. Zhijiang Dong @ Middle Tennessee State U 85
niversity
Outline
 DataTypes
 Elementary Data Types
 Scalar Data Types
 Composite Data Types
 Structured Data Types
 Array
 Record
 List
 Set

Dr. Zhijiang Dong @ Middle Tennessee State U 86


niversity
Record
 Record/Structure: A data structure composed of a
fixed number of components of different types.
 usually laid out contiguously
 possible holes for alignment reasons
 smart compilers may re-arrange fields to minimize
holes (C compilers promise not to)
 Record specification
 Type of each component (called field)
 Name of each component (called field name)
 The number of components (implied)
 Name of the structure
Dr. Zhijiang Dong @ Middle Tennessee State U 87
niversity
Record Declaration

struct element { type element = record


char name[2]; name : two_chars;
int atomic_number; atomic_number : integer;
double atomic_weight; atomic_weight : double;
_Bool metallic; metallic : Boolean;
}; /*C Version*/ }; /*Pascal Version*/

Dr. Zhijiang Dong @ Middle Tennessee State U 88


niversity
Record:
Storage Representation

• Likely layout in memory for objects of type element


on a 32-bit machine. Alignment restrictions lead to
the shaded “holes.”
Dr. Zhijiang Dong @ Middle Tennessee State U 89
niversity
Array of Record
• Array element
may be a
record

Dr. Zhijiang Dong @ Middle Tennessee State U 90


niversity
Record
• A record may also have
components that are arrays or
other records, leading to
records that have a
hierarchical structure
consisting of a top level of
components, some of which
may be arrays and records

Dr. Zhijiang Dong @ Middle Tennessee State U 91


niversity
Nested Record
• Pay attention to Nested record. They behave differently in
value mode and reference model of variables
• The program below prints 0 for value model of variables, 7 for
reference model of variables.
type
T = record
j : integer;
end;
S = record
I : integer;
n : T;
end;
var s1, s2 : S;

s1.n.j := 0;
s2 := s1;
s2.n.j := 7;
writeln(s1.n.j);

Dr. Zhijiang Dong @ Middle Tennessee State U 92


niversity
Record Operations
 Component selection is the basic operation on a
record.
 Typically, using dot operator and the filed name
 Can be implemented easily
 Field name is known during translation
 The size of each component and its position within the
storage block can be determined during translation
 Most commonly assignment of records of identical
structure is provided
 Can be implemented as a simple copy of the contents of
the storage block representing the first record into the
storage block representing the second record
Dr. Zhijiang Dong @ Middle Tennessee State U 93
niversity
Variant Records
typedef union {
int X;
float Y;
char Z[4];} B;
B P;

 Similar to records, except all have overlapping (same) L-value.

 But problems can occur. What happens below?


P.X = 142;
printf(“%O\n”, P.Z[3])

 All 3 data objects have same L-value and occupy same storage. No
enforcement of type checking.
 Poor language design

Dr. Zhijiang Dong @ Middle Tennessee State U 94


niversity
Variant Records
type PayType=(Salaried, Hourly);
var Employee:record
ID: integer;
Dept: array[1..3] of char;
Age: integer;
case PayClass: PayType of
Salaried:(MonthlyRate:real;

StartDate:integer);
Hourly:(HourRate:real;
Reg:integer;

Overtime:integer)
end

Dr. Zhijiang Dong @ Middle Tennessee State U 95


niversity
Outline
 DataTypes
 Elementary Data Types
 Scalar Data Types
 Composite Data Types
 Structured Data Types
 Array
 Record
 List
 Set

Dr. Zhijiang Dong @ Middle Tennessee State U 96


niversity
Lists
 List: A data structure composed of an ordered sequence of data
structures
 Lists are rarely of fixed length
 Lists are rarely homogeneous
 Languages that use lists typically declare such data implicitly
without explicit attributes for list members.
 LISP list are heterogeneous: any object may be placed in a list
syntax: (a b c)
 MP list are homogeneous
syntax: [a, b, c]
 Lists are primitive data objects in languages like ML, LISP,
Prolog, Python, but do not appear in the usual compiled
languages like C, Pascal, or Ada.
Dr. Zhijiang Dong @ Middle Tennessee State U 97
niversity
List Operations
 Test a list to see if it is empty
 Return the length of a list
 Return the nth element of a list
 Return a list consisting of all but the first n
elements
 Reverse the order of the elements of a list
 Search a list for elements matching some
predicate
 Apply a function to every element of a list

Dr. Zhijiang Dong @ Middle Tennessee State U 98


niversity
List:
Storage Representation for LISP

Dr. Zhijiang Dong @ Middle Tennessee State U 99


niversity
Outline
 DataTypes
 Elementary Data Types
 Scalar Data Types
 Composite Data Types
 Structured Data Types
 Array
 Record
 List
 Set

Dr. Zhijiang Dong @ Middle Tennessee State U 100


niversity
Sets
 Set : a data object containing an unordered
collection of distinct values.

 Basic Operations
 Membership: Is data value X a member of set S?
 Insertion and deletion of single value
 Union, intersection, and difference of sets.

 Note that accessing of components of a set by


subscript or relative position plays no part in set
processing.
Dr. Zhijiang Dong @ Middle Tennessee State U 101
niversity
Sets
 Pascal supports sets of any discrete type, and provides union,
intersection, and difference operations:
var A, B, C : set of char
D, E : set of weekday;

A := B + C; //union
A := B * C; //intersection
A := B – C; //difference
 Python supports sets of arbitrary type

 Sets appear in the standard libraries of many object-oriented


languages, including C++, Java, and C#.

Dr. Zhijiang Dong @ Middle Tennessee State U 102


niversity
Set Implementation
 There are many ways to implement sets
 Array
 Hash tables
 Various forms of trees

Dr. Zhijiang Dong @ Middle Tennessee State U 103


niversity

You might also like