Simple Data Types in C
Alan L. Cox
alc@rice.edu
Objectives
Be able to explain to others what a data type
is
Be able to use basic data types in C programs
Be able to see the inaccuracies and limitations
introduced by machine representations of
numbers
Cox / Fagan Simple Data Types 2
What do you see?
Cox / Fagan Simple Data Types 3
What do you see?
Cox / Fagan Simple Data Types 4
Last one
Cox / Fagan Simple Data Types 5
Everything is Just a Bunch of Bits
Bits can represent many different things
Depends on interpretation
You and your program must keep track of
what kind of data is at each location in the
computer’s memory
E.g., program data types
Cox / Fagan Simple Data Types 6
Big Picture
Processor works with finite-sized data
All data implemented as a sequence of bits
Bit = 0 or 1
Represents the level of an electrical charge
Byte = 8 bits
Word = largest data size handled by processor
32 bits on most older computers
64 bits on most new computers
Cox / Fagan Simple Data Types 7
Data types in C
Only really four basic types:
char
int (short, long, long long, unsigned)
float
double
Size of these types on Type Size (bytes)
CLEAR machines: char 1
int 4
short 2
long 8
Sizes of these types
long long 8
vary from one machine
float 4
to another! double 8
Cox / Fagan Simple Data Types 8
Characters (char)
Roman alphabet, punctuation, digits, and
other symbols:
Encoded within one byte (256 possible symbols)
ASCII encoding (man ascii for details)
In C:
char a_char = ’a’;
char newline_char = ’\n’;
char tab_char = ’\t’;
char backslash_char = ’\\’;
Cox / Fagan Simple Data Types 9
ASCII
Special
control
characters From “man ascii”:
| 0 NUL| 1 SOH| 2 STX| 3 ETX| 4 EOT| 5 ENQ| 6 ACK| 7 BEL|
| 8 BS | 9 HT | 10 NL | 11 VT | 12 NP | 13 CR | 14 SO | 15 SI |
| 16 DLE| 17 DC1| 18 DC2| 19 DC3| 20 DC4| 21 NAK| 22 SYN| 23 ETB|
| 24 CAN| 25 EM | 26 SUB| 27 ESC| 28 FS | 29 GS | 30 RS | 31 US |
| 32 SP | 33 ! | 34 " | 35 # | 36 $ | 37 % | 38 & | 39 ' |
| 40 ( | 41 ) | 42 * | 43 + | 44 , | 45 - | 46 . | 47 / |
| 48 0 | 49 1 | 50 2 | 51 3 | 52 4 | 53 5 | 54 6 | 55 7 |
| 56 8 | 57 9 | 58 : | 59 ; | 60 < | 61 = | 62 > | 63 ? |
| 64 @ | 65 A | 66 B | 67 C | 68 D | 69 E | 70 F | 71 G |
| 72 H | 73 I | 74 J | 75 K | 76 L | 77 M | 78 N | 79 O |
| 80 P | 81 Q | 82 R | 83 S | 84 T | 85 U | 86 V | 87 W |
| 88 X | 89 Y | 90 Z | 91 [ | 92 \ | 93 ] | 94 ^ | 95 _ |
| 96 ` | 97 a | 98 b | 99 c |100 d |101 e |102 f |103 g |
|104 h |105 i |106 j |107 k |108 l |109 m |110 n |111 o |
|112 p |113 q |114 r |115 s |116 t |117 u |118 v |119 w |
|120 x |121 y |122 z |123 { |124 | |125 } |126 ~ |127 DEL|
Cox / Fagan Simple Data Types 10
Characters are just numbers
What does this function do?
return type argument type
char and name
procedure fun(char c)
name {
local variable char new_c;
comparisons
type and name
with characters!
if ((c >= ’A’) && (c <= ’Z’))
new_c = c - ’A’ + ’a’; Math on
characters!
else
new_c = c;
return (new_c);
}
Cox / Fagan Simple Data Types 11
Integers
Fundamental problem:
Fixed-size representation can’t encode all numbers
Standard low-level solution:
Limit number range and precision
• Usually sufficient
• Potential source of bugs
Signed and unsigned variants
unsigned modifier can be used with any sized
integer (short, long, or long long)
Cox / Fagan Simple Data Types 12
Signed Integer
Cox / Fagan Simple Data Types 13
Integer Representations
1111 0000 0001
Base 2 Base 16 Unsigned 2’s Comp.
0000 0 0 0 1110 0 0010
F 1
0001 1 1 1
0010 2 2 2 E 15 0 1 2
0011 3 3 3
1101 14 -1 0 1 2 0011
0100 4 4 4 D 3
0101 5 5 5 13 -2 2 3
0110 6 6 6 -3 3
0111 7 7 7 1100 C 12 -4 4 4 4 0100
1000 8 8 -8 -5 5
1001 9 9 -7 11 5
B -6 6 5
1010 A 10 -6 -7 -8 7
1011 10 6 0101
1011 B 11 -5
1100 C 12 -4 A 9 8 7 6
1101 D 13 -3
1110 E 14 -2 1010 9 8 7 0110
1111 F 15 -1
1001 1000 0111
Why one more negative than positive?
Cox / Fagan Simple Data Types 14
Integer Representations
Base 2 Base 16 Unsigned 2’s Comp. Math for n bits:
0000 0 0 0
0001 1 1 1
0010 2 2 2 Define x xn 1 x0
0011 3 3 3
0100 4 4 4
n 1
B 2U ( x ) 2i xi
0101 5 5 5
0110 6 6 6
0111 7 7 7 i 0
1000 8 8 -8
n2
B 2T ( x ) 2 xn 1 2i xi
1001 9 9 -7 n 1
1010 A 10 -6
1011 B 11 -5 i 0
1100 C 12 -4
1101 D 13 -3
sign bit
1110 E 14 -2
0=non-negative
1111 F 15 -1
1=negative
Cox / Fagan Simple Data Types 15
Integer Ranges
Unsigned
UMinn … UMaxn = 0 … 2n-1:
32 bits: 0 ... 4,294,967,295 unsigned int
64 bits: 0 ... 18,446,744,073,709,551,615 unsigned long int
2’s Complement
TMinn … TMaxn = -2n-1 … 2n-1-1:
32 bits: -2,147,483,648 ... 2,147,483,647 int
64 bits: -9,223,372,036,854,775,808 … 9,223,372,036,854,775,807 long int
Note: C numeric ranges are platform dependent!
#include <limits.h> to define ULONG_MAX, UINT_MIN, INT_MAX, …
Cox / Fagan Simple Data Types 16
Detecting Overflow in Programs
Some high-level languages (ML, Ada, …):
Overflow causes exception that can be handled
C:
Overflow causes no special event
Programmer must check, if desired
E.g., given a, b, and c=UAddn(a,b) – overflow?
Claim: Overflow iff c < a (Or similarly, iff c < b)
Proof: Know 0 b < 2n
If no overflow, c = (a + b) mod 2n = a + b a + 0 = a
If overflow, c = (a + b) mod 2n = a + b – 2n < a
Cox / Fagan Simple Data Types 17
Overflow
unsigned int x = 2123456789U; z is 951,946,282 not
unsigned int y = 3123456789U; 5,246,913,578 as
unsigned int z; expected
z = x + y;
int x = 2123456789; y is not a valid positive
int y = 3123456789; number (sign bit is set)!
int z; It’s -1,171,510,507
z = x + y; z is still 951,946,282
Cox / Fagan Simple Data Types 18
However, …
#include <assert.h>
Should this assertion
void ever fail?
procedure(int x)
{ Do a web search for “GCC
… bug 30475”.
assert(x + 10 > x);
The C language definition does not assume 2’s
complement as the underlying implementation.
The behavior of signed integer overflow is undefined.
Cox / Fagan Simple Data Types 19
You say, “What should I do?”
#include <limits.h>
/* Safe, signed add. Won’t overflow. */
int
safe_add(int x, int y)
{
if (y < 0)
return (safe_sub(x, -y));
if (INT_MAX – y < x)
return (INT_MAX); /* Don’t overflow! */
return (x + y);
}
Cox / Fagan Simple Data Types 20
Bit Shifting as Multiplication
Shift left (x << 1) multiplies by 2:
0 0 1 1 =3 1 1 0 1 = -3
0 1 1 0 =6 1 0 1 0 = -6
Works for unsigned, 2’s complement
Can overflow
In decimal, same idea multiplies by 10: e.g., 42 420
Cox / Fagan Simple Data Types 21
Bit Shifting as Division
Logical shift right (x >> 1) divides by 2 for unsigned:
0 1 1 1 =7 1 0 0 1 =9
0 0 1 1 =3 0 1 0 0 =4
Always rounds down!
Arithmetic shift right (x >> 1) divides by 2 for 2’s complement:
0 1 1 1 =7 1 0 0 1 = -7
0 0 1 1 =3 1 1 0 0 = -4
Always rounds down!
Cox / Fagan Simple Data Types 22
Bit Shifting for Multiplication/Division
Why useful?
Simpler, thus faster, than general multiplication &
division
Standard compiler optimization
Can shift multiple positions at once:
Multiplies or divides by corresponding power-of-2
a << 5 a >> 5
Cox / Fagan Simple Data Types 23
A Sampling of Integer Properties
For both unsigned & 2’s complement:
Mostly as usual, e.g.: Some surprises, e.g.:
0 is identity for +, - ÷ doesn’t distribute over +, -
1 is identity for ×, ÷ (a,b > 0 a + b > a)
+, -, × are associative
+, × are commutative
× distributes over +, -
Why should you care?
– Programmer should be aware of behavior of their programs
– Compiler uses such properties in optimizations
Cox / Fagan Simple Data Types 24
Beware of Sign Conversions in C
Beware implicit or explicit conversions between
unsigned and signed representations!
One of many common mistakes:
?
unsigned int u;
… ? What’s wrong?
if (u > -1) …
Always false(!) because -1 is converted to unsigned, yielding UMaxn
Cox / Fagan Simple Data Types 25
Non-Integral Numbers: How?
Fixed-size representations
Rational numbers (i.e., pairs of integers)
Fixed-point (use integer, remember where point is)
Floating-point (scientific notation)
Variable-size representations
Sums of fractions (e.g., Taylor-series)
Unbounded-length series of digits/bits
Cox / Fagan Simple Data Types 26
Floating-point
Binary version of scientific notation
1.001101110 × 25 = 100110.1110
= 32 + 4 + 2 + 1/2 + 1/4 + 1/8
= 38.875
-1.011 × 2-3 = -.001011
= - (1/8 + 1/32 + 1/64)
= -.171875
binary point
Cox / Fagan Simple Data Types 27
FP Overflow & Underflow
Fixed-sized representation leads to limitations
Large positive exponent.
Unlike integer arithmetic, overflow
imprecise result (), not inaccurate result
Round Round
to - Zero to +
Negative Expressible Negative Positive Expressible Positive
overflow negative values underflow underflow positive values overflow
Large negative exponent
Round to zero
Cox / Fagan Simple Data Types 28
FP Representation
1.001101110 × 25
significand exponent
Fixed-size representation
Using more significand bits increased precision
Using more exponent bits increased range
Typically, fixed # of bits for each part, for
simplicity
Cox / Fagan Simple Data Types 29
FP Representation: IEEE 754
Current standard version of floating-point
Single-precision (float)
One word: 1 sign bit, 23 bit fraction, 8 bit exponent
Positive range: 1.17549435 × 10-38 … 3.40282347 × 10+38
Double-precision (double)
Two words: 1 sign bit, 52 bit fraction, 11 bit exponent
Positive range: 2.2250738585072014 × 10-308 … 1.7976931348623157
× 10+308
Lots of details in B&O Chapter 2.4
Cox / Fagan Simple Data Types 30
IEEE 754 Special Numbers
+0.0, -0.0
+, -
NaN: “Not a number”
(+1.0 × 10+38)2 = + +0.0 +0.0 = NaN
+1.0 +0.0 = + + - + = NaN
+1.0 -0.0 = - 1 = NaN
Cox / Fagan Simple Data Types 31
FP vs. Integer Results
int i = 20 / 3;
float f = 20.0 / 3.0;
True mathematical answer: 20 3 = 6 2/3
i= ? 6 Integer division ignores remainder
f= ? 6.666667 FP arithmetic rounds result
Cox / Fagan Simple Data Types 32
FP vs. Integer Results
int i = 1000 / 6;
float f = 1000.0 / 6.0;
True mathematical answer: 1000 6 = 166 2/3
i= ? 166 Integer division ignores remainder
f= ? 166.666672 FP arithmetic rounds result
Surprise!
Arithmetic in binary, printing in decimal –
doesn’t always give expected result
Cox / Fagan Simple Data Types 33
FP Integer Conversions in C
#include <limits.h>
#include <stdio.h>
int
main(void)
{
unsigned int ui = UINT_MAX;
float f = ui;
printf(“ui: %u\nf: %f\n”, ui, (double)f);
}
Surprisingly, this program print the following. Why?
ui: 4294967295
f: 4294967296.000000
Cox / Fagan Simple Data Types 34
FP Integer Conversions in C
int i = 3.3 * 5;
float f = i;
True mathematical answer: 3.3 5 = 16 ½
i= ? 16 Converts 5 5.0 – Truncates result 16 ½ 16
f= ? 16.0
integer FP: FP integer:
Can lose precision Truncate fraction
Rounds, if necessary If out of range,
undefined – not error
32-bit int fits in
double-precision FP
Cox / Fagan Simple Data Types 35
FP Behavior
Programmer must be aware of accuracy limitations!
Dealing with this is a subject of classes like CAAM 453
(1010 + 1030) + –1030 =? 1010 + (1030 + –1030)
1030 – 1030 =? 1010 + 0
0 1010
Operations not associative!
(1.0 + 6.0) ÷ 640.0 =? (1.0 ÷ 640.0) + (6.0 ÷ 640.0)
7.0 ÷ 640.0 =? .001563 + .009375
.010937 .010938
×,÷ not distributive across +,-
Cox / Fagan Simple Data Types 36
What about other types?
Booleans
A late addition to C
Strings
We’ll cover these in a later class
Enumerated types
A restricted set of integers
Complex Numbers
Not covered
Cox / Fagan Simple Data Types 37
Booleans
One bit representation
0 is false
1 is true
One byte or word representation
Inconvenient to manipulate only one bit
Two common encodings:
0000…0000 is false 0000…0000 is false
0000…0001 is true all other words are true
all other words are garbage
Wastes space, but space is usually cheap
Cox / Fagan Simple Data Types 38
Booleans in C
#include <stdbool.h>
Important!
bool bool1 = true; Compiler needs this or it
won't know about "bool"!
bool bool2 = false;
bool added to C in 1999
Many programmers had already defined their
own Boolean type
To avoid conflict bool is disabled by default
Cox / Fagan Simple Data Types 39
C’s Common Boolean Operations
C extends definitions to integers
Booleans are encoded as integers
• 0 == false
• non-0 == true
Logical AND: 0 && 4 == 0 3 && 4 == 1 3 && 0 == 0
Logical OR: 0 || 4 == 1 3 || 4 == 1 3 || 0 == 1
Logical NOT: ! 4 == 0 ! 0 == 1
&& and || short-circuit
Evaluate 2nd argument only if necessary
E.g., 0 && error-producing-code == 0
Cox / Fagan Simple Data Types 40
Enumerated Types
E.g., a Color = red, blue, black, or yellow
Small (finite) number of choices
Booleans & characters are common special cases
Pick arbitrary bit patterns for each
Not enforced in C
Actually just integers
Can assign values outside of the enumeration
Could cause bugs
Cox / Fagan Simple Data Types 41
Enumerated Types in C
enum Color { RED, WHITE, BLACK, YELLOW };
enum Color my_color = RED;
The new type name is
Alternative style: “enum Color”
enum AColor { COLOR_RED, COLOR_WHITE,
COLOR_BLACK, COLOR_YELLOW };
typedef enum AColor color_t;
color_t my_color = COLOR_RED;
Pre-C99 Boolean definition:
enum Bool { false = 0, true = 1 };
typedef enum Bool bool;
bool my_bool = true;
Cox / Fagan Simple Data Types 42
First Assignment
Carefully read the assignments web page
Honor code policy
Slip day policy
All assignments will be posted on that web
page
Assignments are due at 11:55PM on the due
date, unless otherwise specified
Assignments must be done on CLEAR servers
Cox / Fagan Simple Data Types 43
Next Time
Arrays and Pointers in C
Cox / Fagan Simple Data Types 44