Slides 09 Programming Languages - UET CS - Talha Waheed - Data Types

Data types provide structures for representing and manipulating data in programs. Primitive data types include integers, decimals, Booleans, characters, and strings. User-defined types like enumerations and sub-ranges allow modeling domain-specific concepts. Arrays provide a way to aggregate homogeneous data elements that can be accessed via indices. Key design decisions for data types include syntax, supported operations, representation, and static versus dynamic aspects.

CS445 Programming Languages, Slides 9: Data Types

By Talha Waheed, UET Lahore


PL Data Types Background
• Programs produce results by manipulating data.
• Which data types and structures should a language include?
• How well do these data types model the real world?

Abstract Data Type


The use of a type is separated from its representation and from the set of
operations on values of that type.

Design Issues for All Data Types


1. What is the syntax of references to variables?
2. What operations are defined and how are they specified?
Evolution of Data Types

• FORTRAN I (1956)
– Limited to INTEGER, REAL, and arrays
– Linked lists and binary trees had to be modeled with arrays

• Ada (1983)
– Users can create a unique type for every category of variables in the
problem space
– The system enforces the uniqueness of types (the basis for ADTs)
Syntax / Declaration

• int a; // C++, Java


• a : integer ; // Ada, Pascal
Primitive Data Types
(those not defined in terms of other data types)
Integer
• Almost always an exact reflection of the hardware, so the mapping is trivial
• An integer value is represented in the computer by a string of bits
– one of the bits, typically the leftmost, represents the sign
• Sign-magnitude notation vs. one's and two's complement for integer storage
• A language may offer many different integer types (according to size
and sign)
Decimal
- For business applications (like currency)
- Store a fixed number of decimal digits (coded)
- BCD (with hardware or software support) vs. binary storage
- Advantage: accuracy (e.g. the digits 3 and 4 of 34.50 are stored in BCD as 0011 0100)
- Disadvantages: limited range, wastes memory
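
As a rough illustration of the coded-decimal idea above, the sketch below packs two decimal digits into one byte, four bits per digit. The names `pack_bcd` and `unpack_bcd` are hypothetical, chosen only for this example; real BCD support (in hardware or in a library) also handles longer digit strings and a sign.

```cpp
#include <cstdint>

// Pack a value in 0..99 into one byte:
// high four bits = tens digit, low four bits = ones digit.
std::uint8_t pack_bcd(unsigned value) {
    return static_cast<std::uint8_t>(((value / 10) << 4) | (value % 10));
}

// Recover the decimal value from a packed BCD byte.
unsigned unpack_bcd(std::uint8_t bcd) {
    return (bcd >> 4) * 10u + (bcd & 0x0Fu);
}
```

With this encoding, `pack_bcd(34)` yields the bit pattern 0011 0100. Every byte holds only 100 of its 256 possible patterns, which is the wasted memory the slide mentions.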

Boolean
- Simplest of all types
- ALGOL 60 introduced it.
- Could be implemented as single bits, but often stored as bytes (e.g. true as 00000001).
- C89 has no Boolean type, so an integer is used instead, e.g. int flag = 0; (C99 added _Bool, and C++ has bool)
- Advantage: readability. Disadvantage: ??
Floating Point
- Model real numbers, but only as approximations
- (PI can’t be stored exactly)
- Even 0.1 in decimal can’t be represented by a finite number of binary digits
- Scientific languages support at least two floating-point types
- Values are stored as fractions and exponents
- On newer computers, storage follows the IEEE 754 floating-point standard
- Usually exactly like the hardware; software simulation is slow
- Some languages allow accuracy specifications in code, e.g. Ada:

type SPEED is digits 7 range 0.0 .. 1000.0;


type VOLTAGE is delta 0.1 range -12.0 .. 24.0;
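
The point that 0.1 has no finite binary representation can be checked with a small sketch. It assumes a platform where `double` is an IEEE 754 binary64 value, which is typical but not guaranteed by C++. The helper names `sum_of_tenths` and `nearly_equal`, and the tolerance `1e-9`, are assumptions made for this example.

```cpp
#include <cmath>

// Add 0.1 to itself n times using double arithmetic.
double sum_of_tenths(int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += 0.1;  // adds the nearest double to 0.1, not 0.1 itself
    }
    return total;
}

// Compare two doubles with a tolerance instead of ==.
bool nearly_equal(double a, double b, double eps = 1e-9) {
    return std::fabs(a - b) <= eps;
}
```

On such a platform `sum_of_tenths(10) == 1.0` is false, while `nearly_equal(sum_of_tenths(10), 1.0)` is true. This is one reason exact equality tests on floating-point values are usually avoided.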
Character and String Types
- Numeric coding (ASCII and Unicode)
- A string is a sequence of characters

Design issues
1. Is it just a special kind of array, or a primitive type
(with no array-style subscript operations)?
2. Is the length of string objects static or dynamic?

What Operations to Provide?


- Assignment
- Comparison (=, >, etc.)
- Concatenation
- Substring reference
- Pattern matching
Strings Examples
Pascal
- Not primitive;
- Operations: assignment, comparison only (of packed arrays)

Name: Array [1..5] of Char;


Name: PACKED Array [1..5] of Char;

Ada, FORTRAN 77, FORTRAN 90 and BASIC

- Somewhat primitive
- Operations: Assignment, comparison, catenation, substring
reference
- FORTRAN has intrinsic support for pattern matching
Ada
N := N1 & N2 (concatenation)
N(2..4) (substring reference)
C / C++
- Not primitive
- Use char arrays and a library of functions that provide the operations

SNOBOL4
- String manipulation language
- Primitive
- Many operations, including elaborate pattern matching

Java
- Strings are not a primitive type but are built in through the String class (constant strings) and the StringBuffer class (changeable strings)
String Length Options
1. Static
- FORTRAN 77, Ada, COBOL
- e.g. in FORTRAN 90 CHARACTER (LEN = 15) NAME;

2. Limited Dynamic Length


- C and C++: the length can vary up to a declared maximum; the actual length is indicated by the null character '\0'.

3. Dynamic
- flexible but overhead of allocation/deallocation
- in SNOBOL4, PERL
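
A minimal sketch of the limited dynamic length option as C and C++ handle it: the capacity comes from the declaration, and the current length is found by scanning for the terminator. `c_length` is a hypothetical stand-in for the standard `strlen`, written out only to make the scan visible, and `demo_length` is likewise an illustrative name.

```cpp
#include <cstddef>

// Count characters up to (not including) the terminating '\0'.
std::size_t c_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// The capacity is fixed by the declaration; the current length is not stored.
std::size_t demo_length() {
    char name[16] = "Ada";  // 16 bytes reserved, 3 used plus the terminator
    return c_length(name);
}
```

Because no length is stored, finding it costs time proportional to the string's length, and nothing stops a write past the 16 reserved bytes.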
Evaluation of strings
- Strings aid writability.

- Pattern matching and concatenation are essential operations.

- As a primitive type with static length, strings are inexpensive to provide
compared to char arrays, so why not have them?
- (C++ provides library functions for strings to complement its char arrays)
Implementation of strings
Static length
- compile-time descriptor
(descriptor is collection of attributes of variable)

Limited dynamic length


- may need a run-time descriptor for length
(but not in C/C++, which mark the end of the string with the special character '\0')

Dynamic length
- need run-time descriptor
- allocation/deallocation is biggest implementation problem
Ordinal Types (User Defined)
Ordinal type: a type whose range of possible values can be associated with the set of
positive integers

1. Enumeration Types
User enumerates all possible values, which are symbolic constants.
e.g. in C++
typedef enum {North = 1, East = 2, West = 3, South = 4} Directions;

Design Issue:
Should a symbolic constant be allowed to appear in more than one type
definition?
typedef enum {Right = 0, Left = 1} Directions;
typedef enum {Right = 1, Wrong = 2} Answer;
Examples
Pascal
- cannot reuse constants
- can be used for array subscripts, e.g. Array[LEFT], as for-loop variables,
and as case selectors, e.g. case RIGHT:
- can be compared in conditions, but cannot be input or output

Ada
- constants can be reused (overloaded literals)
- disambiguate with context or type_name (one of them)
- can be used as in Pascal; CAN be input and output

C/C++
- like Pascal, except they can be input and output as integers

Java
- had no enumeration type originally; Java 5 added enum types
Evaluation of enumeration
a. Enhances Readability
e.g. no need to code a color as a number,
use Colors as Enumerations
Color = 0; vs. Color = BLACK;

b. Enhances Reliability
e.g. compiler can check operations and ranges of
values
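
The readability and reliability points, and the earlier design issue about reusing a constant name, can be sketched with C++11 scoped enumerations. This is a later language feature, not the plain `enum` shown in the slides, and all type and function names below are hypothetical.

```cpp
enum class Color { Black, White, Red };
enum class Answer { Right, Wrong };
enum class Direction { Right, Left };  // "Right" again: no clash, each enum has its own scope

// Calls such as is_dark(0) or is_dark(Answer::Right) do not compile,
// so the compiler catches values of the wrong type.
bool is_dark(Color c) {
    return c == Color::Black;
}

// Conversion to the underlying integer has to be requested explicitly.
int as_int(Direction d) {
    return static_cast<int>(d);
}
```

A call site such as `is_dark(Color::Black)` reads better than a comparison against 0, and the type check is done at compile time.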
2. Sub-range Type
An ordered contiguous subsequence of an ordinal type
Design Issue
How can they be used?

Examples
Pascal
- Subrange types behave as their parent types
- can be used as for-loop variables and array indices
e.g. type position = 0 .. MAXINT;
Ada
- Subtypes are not new types, just constrained existing types
(so they are already compatible);
can be used as in Pascal, plus in case constants
e.g.
subtype POS_TYPE is
INTEGER range 0 ..INTEGER'LAST;
Evaluation of sub-range types

- Enhances Readability

- Reliability - restricted ranges helps in error detection

Implementation of sub-range types

- Enumeration types are implemented as integers

- Sub-range types are the parent types with code inserted (by the compiler)
to restrict assignments to sub-range variables
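
C++ has no sub-range types, so the inserted range check can only be imitated. The sketch below does so with a small class template; `Subrange` and `fits_digit` are hypothetical names, and throwing `std::out_of_range` is one possible reaction to a violation, not what any particular Pascal or Ada compiler does.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a sub-range type Lo..Hi of int.
template <int Lo, int Hi>
class Subrange {
public:
    explicit Subrange(int v) : value_(checked(v)) {}
    Subrange& operator=(int v) { value_ = checked(v); return *this; }
    int get() const { return value_; }
private:
    // The kind of check a compiler would insert before each assignment.
    static int checked(int v) {
        if (v < Lo || v > Hi) throw std::out_of_range("value outside sub-range");
        return v;
    }
    int value_;
};

// Returns true if assigning v to a variable of sub-range 0..9 is accepted.
bool fits_digit(int v) {
    try {
        Subrange<0, 9> d(0);
        d = v;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

The check runs on every assignment, which is the run-time cost paid for the error detection noted under the evaluation.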


Arrays
An array is an aggregate of homogeneous data elements in which an individual
element is identified by its position in the aggregate, relative to the first element
(random access is possible).

Design Issues

1. What data types are legal for subscripts?


2. When are subscript ranges bound?
3. When does memory allocation take place?
4. What is the maximum number of subscripts in a multi-dimensional
array?
5. Can array objects be initialized?
6. Are any kinds of slices allowed?
Arrays
Indexing is a mapping from indices to elements

map(array_name, index_value_list) → an element

Syntax
- FORTRAN, PL/I, and Ada use parentheses (), e.g. name(1)

- Most other languages use square brackets []


Array Subscript Types
• FORTRAN, C/C++ - integer only, e.g. int index = 1; array[index]

• Java - integer types only

• Pascal - any ordinal type (int, boolean, char, enum)

• Ada – typically int or enum (includes boolean and char)


Four Categories of Arrays
(based on subscript binding and storage binding to memory)

1. Static
- range of subscripts and storage bindings are static
e.g. FORTRAN 77, some arrays in Ada

Advantage: execution efficiency (no allocation/deallocation)

2. Fixed stack dynamic


- range of subscripts is statically bound
- but storage is bound at elaboration time (when the declaration is first seen during execution)
e.g. Pascal locals and C locals that are not static
void A() {
    int B[10];   // subscripts 0 to 9
}

Advantage: space efficiency


3. Stack-dynamic
- Subscript range and storage are dynamic
- but fixed from then on for variable’s lifetime

- e.g. Ada declare blocks


declare
STUFF : array (1..N) of FLOAT;
begin
...
end;

Advantage: flexibility
- size need not be known until array is about to be used
4. Heap-dynamic

- subscript range and storage bindings are dynamic and not fixed

e.g. (FORTRAN 90)

INTEGER, ALLOCATABLE, DIMENSION (:,:) :: MAT

(Declares MAT to be a dynamic 2-d array)

ALLOCATE (MAT (10, NUMBER_OF_COLS))

(Allocates MAT with 10 rows and NUMBER_OF_COLS columns)

DEALLOCATE (MAT)

(Deallocates MAT’s storage)

- In APL & Perl, arrays grow and shrink as needed


- In Java, all arrays are objects (heap-dynamic, auto garbage
collection)
Number of subscripts
- FORTRAN I allowed up to three

- FORTRAN 77 allows up to seven

- C, C++, and Java allow just one, but its elements can themselves be arrays,
e.g. int Array[2][3];

- Others - no limit
Array Initialization
A list of values that are put in the array in the order in which the array elements are
stored in memory.

Is there any mechanism, other than a loop, for assigning values in bulk to an
array after initialization?

Examples
1. FORTRAN - uses the DATA statement,
or puts the values between / ... / in the declaration.

2. C and C++ - put values in braces; let compiler count them


e.g. int stuff [] = {2, 4, 6, 8};

3. Pascal and Modula-2 do not allow array initialization


Array Operations

1. Ada
- Assignment; RHS can be an aggregate constant or array name

- Concatenation; for all single-dimensioned arrays

- Relational operators (= and /= only)

2. FORTRAN 90

- Intrinsics (subprograms) for a wide variety of array operations

(e.g., matrix multiplication, vector dot product)


Slices
A slice is some substructure of an array;
nothing more than a referencing mechanism

Examples
1. FORTRAN 90
INTEGER MAT (1 : 4, 1 : 4)
MAT(1 : 4, 1) - the first column
MAT(2, 1 : 4) - the second row

2. Ada – One-D arrays only


LIST(4..10)
Implementation of Arrays
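- An access function maps subscript expressions to an address in the array.
e.g. for ch[i][j] with 3 columns, stored by rows from base address 100 with one-byte elements:
offset = i * (number of columns) + j, so ch[2][1] has offset 2*3 + 1 = 7 and address 100 + 7 = 107
(stored by columns, the offset would be j * (number of rows) + i)

Two Approaches
- Row major order (storage by rows) vs.
column major order (storage by columns)

What is affected if you access a row major array with a column major approach,
or vice versa?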

- Subscript Checking
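
The access function for a two-dimensional array can be written out as a short sketch. The function names are hypothetical, subscripts are taken to start at 0 as in C, and addresses are modelled as plain numbers rather than real pointers.

```cpp
#include <cstddef>

// Offset (in elements) of element [i][j] when rows are stored one after another.
std::size_t row_major_offset(std::size_t i, std::size_t j, std::size_t cols) {
    return i * cols + j;
}

// Offset of the same element when columns are stored one after another.
std::size_t col_major_offset(std::size_t i, std::size_t j, std::size_t rows) {
    return j * rows + i;
}

// Address = base address + offset * element size.
std::size_t element_address(std::size_t base, std::size_t offset, std::size_t elem_size) {
    return base + offset * elem_size;
}
```

For an array with 3 columns, `row_major_offset(2, 1, 3)` gives 7, and with base address 100 and one-byte elements `element_address(100, 7, 1)` gives 107. Walking a row-major array column by column jumps `cols` elements at every step, which tends to use the cache poorly. That is one answer to the question of what is affected.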
Associative Arrays (Maps)
Associative array: an unordered collection of data elements that are indexed by an equal
number of values called keys (each element is a key-value pair).

Something like a 2-d array in which one column holds the data and the other holds the keys used for indexing.

Typically found in modern scripting languages like Python, Perl, and Ruby


Example:
{ "Pride and Prejudice": "Alice",
"Wuthering Heights": "Alice",
"Great Expectations": "John"}

Design Issues

1. What is form of references to elements?


2. Is size of array static or dynamic?
Associative Arrays
Structure and Operations in Perl
- Names begin with %
- Literals are delimited by parentheses e.g.,
%hi_temps = ("Monday" => 77, "Tuesday" => 79, …);
- Subscripting done using braces and keys e.g.,
$hi_temps{"Wednesday"} = 83;

- Elements can be removed with delete e.g.,


delete $hi_temps{"Tuesday"};
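
For comparison, here is a rough C++ counterpart of the Perl operations above, using the standard `std::map`. The function name `build_hi_temps` is hypothetical, and only the two days that the Perl literal spells out are used as initial data.

```cpp
#include <map>
#include <string>

// Build a small key-value table, then add one element and remove another.
std::map<std::string, int> build_hi_temps() {
    std::map<std::string, int> hi_temps = {{"Monday", 77}, {"Tuesday", 79}};
    hi_temps["Wednesday"] = 83;  // subscripting with a key inserts or updates
    hi_temps.erase("Tuesday");   // counterpart of Perl's delete
    return hi_temps;
}
```

`std::map` keeps its keys sorted. `std::unordered_map` is the hash-based variant and is closer to the unordered collection described in the definition. In both, the size is dynamic.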
Records
Record: a possibly heterogeneous aggregate of data
elements in which individual elements are identified by
names.
struct student { char name[5]; float age; int session;
float marks; };

Design Issues:

1. What is form of references?


2. What unit operations are defined?
Record Definition Syntax
- COBOL uses level numbers to show nested records; others use recursive definitions

Record Field References


1. COBOL
field_name OF record_name_1 OF ... OF record_name_n

2. Others (dot notation)

record_name_1.record_name_2. ... .record_name_n.field_name


Fully qualified references must include all record names.

Elliptical references allow leaving out record names as long as the
reference is unambiguous.

Pascal and Modula-2 provide a with clause to abbreviate references.


Record Operations
1. Assignment
- Pascal, Ada, and C allow it if types are identical
- In Ada, RHS can be aggregate constant

2. Initialization
- Allowed in Ada, using aggregate constant

3. Comparison
- In Ada, = and /=; one operand can be aggregate constant

4. MOVE CORRESPONDING
- In COBOL, it moves all fields in the source record to fields
with the same names in the destination record
Comparing records and arrays
1. Access to array elements is much slower than access to
record fields, because subscripts are dynamic (field
names are static)
2. Dynamic subscripts could be used with record field
access, but it would disallow type checking and it would be
much slower
Unions

A union is a type whose variables are allowed to store values of different
types at different times during execution.

Design Issues:

1. What kind of type checking, if any, must be done?

2. Should unions be integrated with records?


Discriminated Versus Free Unions
• C and C++ provide union constructs with no language support for type checking
• Free unions: programmers are allowed complete freedom from type checking
• For example, consider the following C union:
union flexType {
    int intEl;
    float floatEl;
};
union flexType el1;
float x;
...
el1.intEl = 27;
x = el1.floatEl;   /* not type checked: the bits stored for the int 27 are read back as a float */
• Type checking of unions requires that the union construct include a type indicator,
called a tag or discriminant
• A union with a discriminant is called a discriminated union
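
A discriminated union can be imitated by hand in C++ by storing a tag next to a free union and checking it on every access. The sketch below is one such arrangement under assumed names (`FlexValue`, `make_int`, `make_float`, `get_float`, `float_read_allowed`). Nothing in the language forces callers to use the checked accessor, so it is only as safe as the discipline around it.

```cpp
#include <stdexcept>

// A free union plus an explicit tag (the discriminant).
struct FlexValue {
    enum class Kind { Int, Float };
    Kind kind;
    union {
        int intEl;
        float floatEl;
    };
};

// The constructors set the tag and the value together.
FlexValue make_int(int v) {
    FlexValue f;
    f.kind = FlexValue::Kind::Int;
    f.intEl = v;
    return f;
}

FlexValue make_float(float v) {
    FlexValue f;
    f.kind = FlexValue::Kind::Float;
    f.floatEl = v;
    return f;
}

// Checked access: refuse to read floatEl unless the tag says a float is stored.
float get_float(const FlexValue& f) {
    if (f.kind != FlexValue::Kind::Float) {
        throw std::logic_error("union does not currently hold a float");
    }
    return f.floatEl;
}

// Convenience wrapper: true if a float read would be allowed.
bool float_read_allowed(const FlexValue& f) {
    try {
        get_float(f);
        return true;
    } catch (const std::logic_error&) {
        return false;
    }
}
```

With these helpers, the mistake in the C example (store an int, read a float) is caught at run time instead of silently reinterpreting bits. Since C++17 the standard library offers `std::variant`, which packages the same idea.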
Union – Examples from Different Languages
1. Algol 68 - discriminated unions
- Use a hidden tag to maintain the current type
- Tag is implicitly set by assignment
- References are legal only in conformity clauses
union (int, real) ir1, ir2;
int count; real sum;
ir1 := 33;       (assignment: the tag is set to int)
count := ir1;    (illegal statement)
Conformity clause:
case ir1 in
    (int intval): count := intval,
    (real realval): sum := realval
esac
- Run-time type selection is a safe method of
accessing union objects
Union – Examples from Different Languages
3. Pascal - both discriminated and non-discriminated unions e.g.
type intreal = record
    case tagg : Boolean of
        true : ( boolInt : integer);
        false : ( boolReal : real);
end;

Reasons Why Pascal’s Type Checking is Ineffective

a. User can create inconsistent unions (tag can be individually assigned)

var boolUrb : intreal;
var x : real;

boolUrb.tagg := true; { it is an integer }
boolUrb.boolInt := 47; { ok }
boolUrb.tagg := false; { it is a real }
x := boolUrb.boolReal; { assigns an integer to a real }

b. tag is optional! Now, only declaration and second and last assignments are
Union – Examples from Different Languages
4. Ada - discriminated unions
- Reasons they are safer than Pascal & Modula-2:
a. Tag must be present
b. It is impossible for user to create inconsistent union
(because tag cannot be assigned by itself,
all assignments to union must include tag value)

5. C and C++ - free unions (no tags)


- Not part of their records
- No type checking of references

6. Java has no unions; it also had no record types until record classes were added in Java 16


Evaluation of Unions
- potentially unsafe in most languages (not Ada).
Sets
A set is a type whose variables can store unordered collections
of distinct values from some ordinal type.
Design Issue:
What is the maximum number of elements in any set base type?

Evaluation
- If language does not have sets, they must be simulated,
either with enumerated types or arrays
- Arrays are more flexible than sets, but their set operations are much slower

Implementation
- Usually stored as bit strings and use logical operations for
set operations
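
The bit-string implementation can be sketched for a base type with at most 32 ordinal values, one bit per possible member. `SmallSet` and the function names are hypothetical; the 32-element limit comes from the choice of `std::uint32_t` and is an assumption of this example.

```cpp
#include <cstdint>

// A set over the ordinal values 0..31, one bit per possible member.
using SmallSet = std::uint32_t;

SmallSet singleton(unsigned element) { return SmallSet{1} << element; }

// Each set operation is a single logical operation on the bit strings.
SmallSet set_union(SmallSet a, SmallSet b) { return a | b; }
SmallSet set_intersection(SmallSet a, SmallSet b) { return a & b; }
SmallSet set_difference(SmallSet a, SmallSet b) { return a & ~b; }

// Membership test: is the element's bit set?
bool contains(SmallSet s, unsigned element) {
    return (s & singleton(element)) != 0;
}
```

Because each operation maps to one machine instruction on a word, sets stored this way are fast. It also explains why the maximum number of elements in the base type is a design issue.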
Set Examples from Different Languages
1. Pascal
- No maximum size in language definition
- Operations: union (+), intersection (*), difference (-),
equal (=), not equal (<>), superset (>=), subset (<=), in

2. Modula-2 and Modula-3


- Additional operations: INCL, EXCL, / (symmetric set
difference: elements in one but not both operands)

3. Ada - does not include sets, but defines in as a set
membership operator for all enumeration types
4. Java includes a class for set operations
Pointers
A pointer type is a type in which the range of values consists of memory
addresses and a special value, nil (or null)
Uses:
1. Addressing flexibility 2. Dynamic storage management
Design Issues
1. What is scope and lifetime of pointer variables?
2. What is lifetime of heap-dynamic variables?
3. Are pointers restricted to pointing at a particular type?
4. Are pointers used for dynamic storage management, indirect
addressing, or both?
5. Should a language support pointer types, reference types, or both?

Fundamental Pointer Operations:

1. Assignment of an address to a pointer.
2. References (explicit versus implicit dereferencing).
Pointer Operations
• Assignment
– int x, *ptr; ptr = &x;
• Dereferencing
– int *ptr = new int;
– j = *ptr;
• Pointer to Structure
– pointer p to a structure
student with field age
– (*p).age
– p->age
Problems with pointers
1. Dangling Pointers (dangerous)
- pointer points to heap-dynamic variable that has been deallocated

Creating a Dangling Pointer:


a. Allocate a heap-dynamic variable and set pointer to point at it
b. Set second pointer to value of first pointer
c. Deallocate heap-dynamic variable, using first pointer.
d. Second pointer is still pointing to address of deallocated memory.
e.g., in C++
int * arrayPtr1;
int * arrayPtr2 = new int[100];
arrayPtr1 = arrayPtr2;
delete [] arrayPtr2;
// Now, arrayPtr1 is dangling, because the heap storage
// to which it was pointing has been deallocated.
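
Modern C++ offers library types that make this situation detectable. The sketch below repeats steps a to d with `std::shared_ptr` and `std::weak_ptr` from `<memory>`. These are not covered in the slides, and the function name is hypothetical.

```cpp
#include <memory>

// Returns true if the second handle can tell that the storage is gone.
bool second_handle_detects_release() {
    std::shared_ptr<int> owner = std::make_shared<int>(42);  // a. allocate
    std::weak_ptr<int> observer = owner;                     // b. second handle to the same variable
    owner.reset();                                           // c. deallocate through the first
    return observer.expired();                               // d. the second handle can be checked
}
```

The raw-pointer version has no equivalent of `expired()`: nothing in `arrayPtr1` records that its target was deleted, which is what makes dangling pointers dangerous.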
Pointer Problems
2. Lost heap-dynamic variables
- a heap-dynamic variable that is no longer referenced by any program
pointer

Creating Lost Heap-Dynamic Variable:


a. Pointer p1 is set to point to newly created heap-dynamic variable.
b. p1 is later set to point to another newly created heap-dynamic variable.
int *p1 = new int;
p1= new int ;

- process of losing heap-dynamic variables is called memory leakage
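
One way C++ programs avoid this kind of leak is to let an owning smart pointer free the old variable when it is re-pointed. The sketch below repeats steps a and b with `std::unique_ptr`; the `Tracked` type and its `live` counter are hypothetical helpers that exist only to make the effect observable.

```cpp
#include <memory>

// Hypothetical helper type that counts how many instances are alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Same two steps as above, but with an owning smart pointer.
int live_after_reassignment() {
    std::unique_ptr<Tracked> p1(new Tracked);  // a. first heap-dynamic variable
    p1.reset(new Tracked);                     // b. second one; the first is destroyed here
    return Tracked::live;                      // counts the objects alive at this point
}
```

With raw pointers, the same two steps would leave the first object allocated and unreachable. Here the count is 1 inside the function and drops back to 0 once `p1` goes out of scope.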


Problems with Pointers –
Examples from Different Languages
1. Pascal: used for dynamic storage management only
- Explicit dereferencing
- Dangling pointers are possible (dispose)
- Dangling objects are also possible

2. Ada: little better than Pascal and Modula-2


- Some dangling pointers disallowed
- dynamic objects can be automatically deallocated at end of
pointer's scope
- All pointers are initialized to null
- Similar dangling object problem (but rarely happens)
Problems with Pointers –
Examples from Different Languages
3. C and C++
- Used for dynamic storage management, addressing
- Explicit dereferencing, address-of operator (&)

- Can do address arithmetic in restricted forms e.g.


float stuff[100]; // Array of float
float *p;
p = stuff;

* ( p + 5) is equivalent to stuff[5] and p[5]


* ( p + i) is equivalent to stuff[i] and p[i]

- Domain type need not be fixed


void *, can point to any type,
can be type checked, cannot be dereferenced
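
The equivalence between pointer arithmetic and subscripting can be checked directly. In this small sketch the function name is hypothetical, and the caller is assumed to pass an index in 0..99.

```cpp
// True if *(p + i), stuff[i] and p[i] all name the same element.
// Assumes 0 <= i < 100.
bool same_element(int i) {
    float stuff[100] = {};  // array of float, zero-initialised
    float* p = stuff;       // p points at stuff[0]
    stuff[i] = 1.5f;
    return (p + i) == &stuff[i] && *(p + i) == 1.5f && p[i] == 1.5f;
}
```

The addition `p + i` is scaled by the size of `float`, so it advances by whole elements rather than by bytes.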
Problems with Pointers –
Examples from Different Languages
5. C++ Reference Types
• A pointer refers to an address in memory; a reference refers to an
object or value in memory, e.g.
int result = 0; int &result_ref = result;
- Constant pointers that are implicitly dereferenced
- Used for parameters
- Advantages of both pass-by-reference and pass-by-value.
6. Java - Only references
- No pointer arithmetic
- e.g. String str1; str1 = "this is a string";
- Can only point at objects (which are all on heap)
- No explicit deallocator (garbage collection is used)
Evaluation of pointers
1. Dangling pointers and dangling objects are problems, as is
heap management.

2. Pointers are like goto's - they widen the range of locations that
can be accessed by a variable.

3. Pointers are necessary - so we can't design a language
without them.
