Bits, Bytes, and Integers: Computer Architecture and Organization
Encoding Byte Values
Byte = 8 bits
Binary: 00000000₂ to 11111111₂
Decimal: 0₁₀ to 255₁₀
Hexadecimal: 00₁₆ to FF₁₆
Base 16 number representation
Use characters ‘0’ to ‘9’ and ‘A’ to ‘F’
Write FA1D37B₁₆ in C as 0xFA1D37B or 0xfa1d37b

Hex  Decimal  Binary
0    0        0000
1    1        0001
2    2        0010
3    3        0011
4    4        0100
5    5        0101
6    6        0110
7    7        0111
8    8        1000
9    9        1001
A    10       1010
B    11       1011
C    12       1100
D    13       1101
E    14       1110
F    15       1111
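The hex/decimal/binary table can be reproduced mechanically. The following is a minimal sketch; the helper names byte_to_binary and print_byte_row are illustrative and not part of the slides.

```c
#include <stdio.h>

/* Write the 8 bits of b into out as '0'/'1' characters, MSB first.
   out must have room for 9 chars (8 digits plus the terminating NUL). */
void byte_to_binary(unsigned char b, char out[9]) {
    for (int i = 0; i < 8; i++)
        out[i] = ((b >> (7 - i)) & 1) ? '1' : '0';
    out[8] = '\0';
}

/* Print one byte in hex, decimal, and binary, like a row of the table. */
void print_byte_row(unsigned char b) {
    char bits[9];
    byte_to_binary(b, bits);
    printf("0x%02X  %3u  %s\n", (unsigned)b, (unsigned)b, bits);
}
```

Calling print_byte_row(0xA) prints the byte in all three notations; the low four binary digits match the table row for A.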
Boolean Algebra
Developed by George Boole in 19th Century
Algebraic representation of logic
Encode “True” as 1 and “False” as 0
And: A&B = 1 when both A=1 and B=1
Or: A|B = 1 when either A=1 or B=1
General Boolean Algebras
Operate on Bit Vectors
Operations applied bitwise
  01101001      01101001      01101001
& 01010101    | 01010101    ^ 01010101    ~ 01010101
  01000001      01111101      00111100      10101010
Bit-Level Operations in C
Operations &, |, ~, ^ Available in C
Apply to any “integral” data type
long, int, short, char, unsigned
View arguments as bit vectors
Arguments applied bit-wise
Examples (Char data type [1 byte])
In gdb, p/t 0xE prints 1110
~0x41 → 0xBE
~01000001₂ → 10111110₂
~0x00 → 0xFF
~00000000₂ → 11111111₂
0x69 & 0x55 → 0x41
01101001₂ & 01010101₂ → 01000001₂
0x69 | 0x55 → 0x7D
01101001₂ | 01010101₂ → 01111101₂
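The one-byte examples above can be checked in C, with one caveat: C promotes char operands to int before applying ~, &, |, and ^, so the result has to be cast back to one byte. A minimal sketch, with illustrative helper names (not8, and8, or8, xor8):

```c
/* Bitwise operations on single bytes.
   The operands are promoted to int first, so each result is cast back
   to unsigned char to discard the upper bits. */
unsigned char not8(unsigned char a) { return (unsigned char)~a; }
unsigned char and8(unsigned char a, unsigned char b) { return (unsigned char)(a & b); }
unsigned char or8(unsigned char a, unsigned char b) { return (unsigned char)(a | b); }
unsigned char xor8(unsigned char a, unsigned char b) { return (unsigned char)(a ^ b); }
```

Without the cast, ~0x41 evaluated as an int is 0xFFFFFFBE on a machine with 32-bit int, not 0xBE.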
Representing & Manipulating Sets
Representation
Width w bit vector represents subsets of {0, …, w–1}
a_j = 1 if j ∈ A
01101001   { 0, 3, 5, 6 }
76543210   (bit positions: MSB on the left, LSB on the right)
01010101   { 0, 2, 4, 6 }
76543210
Operations
&  Intersection          01000001  { 0, 6 }
|  Union                 01111101  { 0, 2, 3, 4, 5, 6 }
^  Symmetric difference  00111100  { 2, 3, 4, 5 }
~  Complement            10101010  { 1, 3, 5, 7 }
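The set interpretation maps directly onto C operators. The sketch below uses an 8-bit vector for subsets of {0, …, 7}; the type name set8 and the function names are assumptions made for illustration.

```c
#include <stdint.h>

typedef uint8_t set8;   /* bit j set  <=>  j is a member, for j in 0..7 */

set8 set_add(set8 s, unsigned j)      { return (set8)(s | (1u << j)); }
int  set_contains(set8 s, unsigned j) { return (s >> j) & 1; }
set8 set_intersection(set8 a, set8 b) { return (set8)(a & b); }
set8 set_union(set8 a, set8 b)        { return (set8)(a | b); }
set8 set_symdiff(set8 a, set8 b)      { return (set8)(a ^ b); }
set8 set_complement(set8 a)           { return (set8)~a; }
```

Adding 0, 3, 5, and 6 to the empty set yields 01101001 (0x69), the first example vector above.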
Contrast: Logic Operations in C
Contrast to Logical Operators
&&, ||, !
View 0 as “False”
Anything nonzero as “True”
Always return 0 or 1
Short circuit
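A short sketch of the contrast with the bitwise operators, using the same operand values as the bit-level examples. The function names are illustrative only.

```c
/* Logical operators treat any nonzero operand as true and yield 0 or 1. */
int logical_not(int a)        { return !a; }
int logical_and(int a, int b) { return a && b; }
int logical_or(int a, int b)  { return a || b; }

/* Short circuit: *p is evaluated only when p is non-null,
   so this is safe to call with a null pointer. */
int points_to_nonzero(const int *p) { return p && *p; }
```

For example, 0x69 && 0x55 is 1, whereas 0x69 & 0x55 is 0x41.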
Shift Operations
Left Shift: x << y
Shift bit-vector x left y positions
Throw away extra bits on left
Fill with 0’s on right
Right Shift: x >> y
Shift bit-vector x right y positions
Throw away extra bits on right
Logical shift
Fill with 0’s on left
Arithmetic shift
Replicate most significant bit on left
Undefined Behavior
Shift amount < 0 or ≥ word size

Argument x    01100010    10100010
<< 3          00010000    00010000
Log. >> 2     00011000    00101000
Arith. >> 2   00011000    11101000
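The two example columns can be reproduced on 8-bit values. In C, >> on a negative signed value is implementation-defined, so this sketch builds the arithmetic shift explicitly from unsigned operations. The function names are illustrative, and the shift amount y is assumed to be in 0..7.

```c
#include <stdint.h>

/* Left shift: bits shifted past bit 7 are discarded by the cast. */
uint8_t shl8(uint8_t x, unsigned y) { return (uint8_t)(x << y); }

/* Logical right shift: zeros enter on the left. */
uint8_t shr8_logical(uint8_t x, unsigned y) { return (uint8_t)(x >> y); }

/* Arithmetic right shift: the most significant bit is replicated. */
uint8_t shr8_arith(uint8_t x, unsigned y) {
    uint8_t shifted = (uint8_t)(x >> y);
    if (x & 0x80)                              /* MSB set: fill top y bits with 1s */
        shifted |= (uint8_t)~(0xFFu >> y);
    return shifted;
}
```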
Today: Bits, Bytes, and Integers
Representing information as bits
Bit-level manipulations
Integers
Representation: unsigned and signed
Conversion, casting
Expanding, truncating
Addition, negation, multiplication, shifting
Making ints from bytes
Summary
Data Representations
Sizes in bytes:

C Data Type   Typical 32-bit   Intel IA32   x86-64
char          1                1            1
short         2                2            2
int           4                4            4
long          4                4            8
long long     8                8            8
float         4                4            4
double        8                8            8
pointer       4                4            8
How to encode unsigned integers?
How to encode signed integers?
Unsigned & Signed Numeric Values
X     B2U(X)  B2T(X)
0000  0       0
0001  1       1
0010  2       2
0011  3       3
0100  4       4
0101  5       5
0110  6       6
0111  7       7
1000  8       –8
1001  9       –7
1010  10      –6
1011  11      –5
1100  12      –4
1101  13      –3
1110  14      –2
1111  15      –1

Equivalence
Same encodings for nonnegative values
Uniqueness
Every bit pattern represents unique integer value
Each representable integer has unique bit encoding
Can Invert Mappings
U2B(x) = B2U⁻¹(x): bit pattern for unsigned integer
T2B(x) = B2T⁻¹(x): bit pattern for two’s comp integer
Encoding Integers
Unsigned: $B2U(X) = \sum_{i=0}^{w-1} x_i \cdot 2^i$
Two’s Complement: $B2T(X) = -x_{w-1} \cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i \cdot 2^i$
Sign Bit
For 2’s complement, most significant bit indicates sign
0 for nonnegative
1 for negative
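The two encoding formulas can be evaluated bit by bit. This is a minimal sketch, assuming the pattern sits in the low w bits of a 32-bit word with 1 ≤ w ≤ 32; the function names b2u and b2t simply mirror the slide notation.

```c
#include <stdint.h>

/* B2U: sum of x_i * 2^i for i = 0 .. w-1. */
int64_t b2u(uint32_t bits, int w) {
    int64_t sum = 0;
    for (int i = 0; i < w; i++)
        sum += (int64_t)((bits >> i) & 1) << i;
    return sum;
}

/* B2T: same sum for i = 0 .. w-2, minus x_(w-1) * 2^(w-1) for the sign bit. */
int64_t b2t(uint32_t bits, int w) {
    int64_t sum = 0;
    for (int i = 0; i < w - 1; i++)
        sum += (int64_t)((bits >> i) & 1) << i;
    sum -= (int64_t)((bits >> (w - 1)) & 1) << (w - 1);
    return sum;
}
```

With w = 4, the pattern 1011 gives b2u = 11 and b2t = –5, matching the table of 4-bit values.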
Encoding Example (Cont.)
x = 15213: 00111011 01101101
y = -15213: 11000100 10010011
Values for W = 16
       Decimal  Hex    Binary
UMax   65535    FF FF  11111111 11111111
TMax   32767    7F FF  01111111 11111111
TMin   -32768   80 00  10000000 00000000
-1     -1       FF FF  11111111 11111111
0      0        00 00  00000000 00000000
Values for Different Word Sizes
W      8      16        32               64
UMax   255    65,535    4,294,967,295    18,446,744,073,709,551,615
TMax   127    32,767    2,147,483,647    9,223,372,036,854,775,807
TMin   -128   -32,768   -2,147,483,648   -9,223,372,036,854,775,808

Observations
|TMin| = TMax + 1
Asymmetric range
UMax = 2 * TMax + 1
C Programming
#include <limits.h>
Declares constants, e.g., ULONG_MAX, LONG_MAX, LONG_MIN
Values platform specific
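The two observations can be checked against the constants in limits.h. A minimal sketch with illustrative function names; it assumes a two's complement long with no padding bits, which holds on common platforms.

```c
#include <limits.h>
#include <stdio.h>

/* |TMin| = TMax + 1, written without overflow as TMin + TMax == -1. */
int range_is_asymmetric(void) {
    return LONG_MIN + LONG_MAX == -1;
}

/* UMax = 2 * TMax + 1, computed in unsigned arithmetic. */
int umax_is_twice_tmax_plus_one(void) {
    return ULONG_MAX == 2UL * (unsigned long)LONG_MAX + 1UL;
}

/* Print the platform-specific values. */
void print_long_limits(void) {
    printf("ULONG_MAX = %lu\n", ULONG_MAX);
    printf("LONG_MAX  = %ld\n", LONG_MAX);
    printf("LONG_MIN  = %ld\n", LONG_MIN);
}
```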
Mapping Between Signed & Unsigned
Conversion Visualized
[Figure: 2’s complement range, marked at 0, –1, –2, and TMin]
Negation: Complement & Increment
Adding x and ~x gives all ones, which is -1, so ~x + 1 == -x:
    x   10011101
+  ~x   01100010
   -1   11111111
Complement & Increment Examples
x = 15213
Decimal Hex Binary
x 15213 3B 6D 00111011 01101101
~x -15214 C4 92 11000100 10010010
~x+1 -15213 C4 93 11000100 10010011
y -15213 C4 93 11000100 10010011
x=0
Decimal Hex Binary
0 0 00 00 00000000 00000000
~0 -1 FF FF 11111111 11111111
~0+1 0 00 00 00000000 00000000
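The tables above can be reproduced on 16-bit patterns. This sketch works on the bit pattern as an unsigned value so that no signed overflow occurs; the name negate16 is illustrative.

```c
#include <stdint.h>

/* Two's complement negation of a 16-bit pattern: complement, then add 1.
   The cast truncates the promoted result back to 16 bits. */
uint16_t negate16(uint16_t x) {
    return (uint16_t)(~x + 1u);
}
```

Applied to 0x3B6D (15213) this yields 0xC493, the pattern shown for -15213. Applied to 0x8000 (TMin) it returns 0x8000 again, since +32768 is not representable in 16 bits.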
Signed vs. Unsigned in C
Constants
By default, constants are considered to be signed integers
Unsigned if they have “U” as a suffix
0U, 4294967259U
Casting
Explicit casting between signed & unsigned is the same as U2T and T2U
int tx, ty;
unsigned ux, uy;
tx = (int) ux;
uy = (unsigned) ty;
void getstuff() {
    char mybuf[MSIZE];
    copy_from_kernel(mybuf, MSIZE);
    printf("%s\n", mybuf);
}
Malicious Usage
/* Declaration of library function memcpy */
void *memcpy(void *dest, void *src, size_t n);

void getstuff() {
    char mybuf[MSIZE];
    copy_from_kernel(mybuf, -MSIZE);
    . . .
}
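The danger in passing a negative length is the implicit conversion to size_t, the unsigned type of memcpy's n parameter. The sketch below shows only that conversion. The value of MSIZE and the helper name are assumptions made for illustration; the slides do not give them here.

```c
#include <stddef.h>
#include <stdint.h>

#define MSIZE 528   /* assumed buffer size, for illustration only */

/* What a signed length becomes once it is converted to size_t,
   as happens when it is passed as memcpy's third argument. */
size_t length_seen_by_memcpy(int len) {
    return (size_t)len;
}
```

With a 64-bit size_t, length_seen_by_memcpy(-MSIZE) is 2^64 - 528, far larger than any buffer.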
Summary
Casting Signed ↔ Unsigned: Basic Rules
Sign Extension
Task:
Given w-bit signed integer x
Convert it to w+k-bit integer with same value
Rule:
Make k copies of sign bit:
X′ = x_{w–1}, …, x_{w–1}, x_{w–1}, x_{w–2}, …, x_0 (the first k entries are copies of the MSB x_{w–1})
[Figure: w-bit X widened to (k + w)-bit X′; the top k bits are copies of the MSB]
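The rule can be written out with masks. This is a minimal sketch that extends a w-bit pattern to 32 bits, assuming 1 ≤ w ≤ 31; in C the compiler performs the same step when a short is converted to int. The function name sign_extend is illustrative.

```c
#include <stdint.h>

/* Extend the w-bit two's complement pattern in the low bits of x to 32 bits
   by copying the sign bit x_(w-1) into all higher positions. */
uint32_t sign_extend(uint32_t x, int w) {
    uint32_t low_mask = (1u << w) - 1u;        /* w ones in the low bits */
    uint32_t low      = x & low_mask;
    uint32_t sign     = (x >> (w - 1)) & 1u;
    return sign ? (low | ~low_mask) : low;
}
```

For the 16-bit patterns of 15213 and -15213, this gives 0x00003B6D and 0xFFFFC493.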
Sign Extension Example
Summary:
Expanding, Truncating: Basic Rules
Unsigned Addition
Operands: w bits                u, v
True Sum: w+1 bits              u + v
Discard Carry: w bits           UAdd_w(u, v)

$$UAdd_w(u, v) = \begin{cases} u + v, & u + v < 2^w \\ u + v - 2^w, & u + v \ge 2^w \end{cases}$$
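A small sketch of UAdd with w = 8. Overflow can be detected after the fact because a wrapped sum is always smaller than either operand. The function names are illustrative.

```c
#include <stdint.h>

/* UAdd_8: the true sum is computed in int, then the carry is discarded. */
uint8_t uadd8(uint8_t u, uint8_t v) {
    return (uint8_t)(u + v);
}

/* Overflow occurred exactly when the truncated sum is less than u. */
int uadd8_overflows(uint8_t u, uint8_t v) {
    return uadd8(u, v) < u;
}
```

For example, 200 + 100 has true sum 300; subtracting 2^8 = 256 leaves 44.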
Visualizing (Mathematical) Integer Addition
[Figure: surface plot of the true sum u + v; values increase with u and v]
Visualizing Unsigned Addition
[Figure: surface plot of unsigned addition over axes u and v; true sums between 2^w and 2^(w+1) are labeled Overflow, and the wrapped result is labeled Modular Sum]
Mathematical Properties
Two’s Complement Addition
Operands: w bits                u, v
True Sum: w+1 bits              u + v
Discard Carry: w bits           TAdd_w(u, v)
TAdd Overflow
Visualizing 2’s Complement Addition
Values
4-bit two’s comp.
Range from -8 to +7
Wraps Around
If sum ≥ 2^(w–1)
Becomes negative
At most once
If sum < –2^(w–1)
Becomes positive
At most once
[Figure: surface plot of TAdd4(u, v) over axes u and v, each running from -8 to 6, with regions labeled NegOver and PosOver]
Characterizing TAdd
Functionality
True sum requires w+1 bits
Drop off MSB
Treat remaining bits as 2’s comp. integer
[Figure: TAdd(u, v) over the u–v plane, divided by u < 0 / u > 0 and v < 0 / v > 0, with regions labeled Positive Overflow and Negative Overflow]
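The wraparound cases can be sketched for w = 4, the width used in the visualization. The true sum is computed in an ordinary int, where it cannot overflow, and then adjusted by 2^4 = 16 when it leaves the range -8 to +7. The function names are illustrative; inputs are assumed to lie in [-8, 7].

```c
/* Positive overflow: true sum >= 2^(w-1) = 8. */
int tadd4_pos_over(int u, int v) { return u + v >= 8; }

/* Negative overflow: true sum < -2^(w-1) = -8. */
int tadd4_neg_over(int u, int v) { return u + v < -8; }

/* TAdd_4: wrap the true sum back into [-8, 7]; at most one adjustment is needed. */
int tadd4(int u, int v) {
    int sum = u + v;
    if (tadd4_pos_over(u, v)) return sum - 16;   /* becomes negative */
    if (tadd4_neg_over(u, v)) return sum + 16;   /* becomes positive */
    return sum;
}
```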
Multiplication
Operands: w bits                u, v
True Product: 2*w bits          u · v
Discard w bits: w bits          UMult_w(u, v)
Code Security Example #2
SUN XDR library
Widely used library for transferring data between machines
void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size);
[Figure: array ele_src pointing to ele_cnt elements, copied into a buffer obtained from malloc(ele_cnt * ele_size)]
XDR Code
void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) {
    /*
     * Allocate buffer for ele_cnt objects, each of ele_size bytes
     * and copy from locations designated by ele_src
     */
    void *result = malloc(ele_cnt * ele_size);
    if (result == NULL)
        /* malloc failed */
        return NULL;
    void *next = result;
    int i;
    for (i = 0; i < ele_cnt; i++) {
        /* Copy object i to destination */
        memcpy(next, ele_src[i], ele_size);
        /* Move pointer to next memory region */
        next += ele_size;
    }
    return result;
}
XDR Vulnerability
malloc(ele_cnt * ele_size)
What if:
ele_cnt = 2^20 + 1
ele_size = 4096 = 2^12
Allocation = ??
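The question can be answered by carrying out the multiplication at both widths. The sketch below assumes a machine where size_t is 32 bits wide and models it with uint32_t; the function names are illustrative.

```c
#include <stdint.h>

/* The product as a 32-bit size_t would hold it: truncated to 32 bits. */
uint32_t alloc_size_32(int ele_cnt, uint32_t ele_size) {
    return (uint32_t)ele_cnt * ele_size;
}

/* The true product, computed in 64 bits for comparison. */
uint64_t alloc_size_true(int ele_cnt, uint32_t ele_size) {
    return (uint64_t)ele_cnt * (uint64_t)ele_size;
}
```

With ele_cnt = 2^20 + 1 and ele_size = 2^12, the true product is 2^32 + 4096, which truncates to 4096. Under this assumption malloc would return a 4096-byte buffer, and the copy loop would then write far past its end.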
Signed Multiplication in C
Operands: w bits                u, v
True Product: 2*w bits          u · v
Discard w bits: w bits          TMult_w(u, v)
Power-of-2 Multiply with Shift
Operation
u << k gives u * 2^k
Both signed and unsigned
Operands: w bits                u, 2^k
True Product: w+k bits          u · 2^k
Discard k bits: w bits          UMult_w(u, 2^k), TMult_w(u, 2^k)
Examples
u << 3 == u * 8
(u << 5) - (u << 3) == u * 24
Most machines shift and add faster than multiply
Compiler generates this code automatically
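The two examples can be written as C functions. Note that in C the binary minus binds more tightly than <<, so the parentheses in the second example are required. The function names are illustrative.

```c
/* u * 8 as a single shift. */
unsigned times8(unsigned u) {
    return u << 3;
}

/* u * 24 = u * 32 - u * 8, as two shifts and a subtraction. */
unsigned times24(unsigned u) {
    return (u << 5) - (u << 3);
}
```

Because unsigned arithmetic is modular, the identities hold even when the intermediate shifts discard high-order bits.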
Compiled Multiplication Code
C Function
int mul12(int x)
{
    return x*12;
}
Compiled Unsigned Division Code
C Function
unsigned udiv8(unsigned x)
{
    return x/8;
}
Operands: x, 2^k
Division: x / 2^k (binary point k bits from the right)
Result: RoundDown(x / 2^k)

          Division      Computed  Hex    Binary
y         -15213        -15213    C4 93  11000100 10010011
y >> 1    -7606.5       -7607     E2 49  11100010 01001001
y >> 4    -950.8125     -951      FC 49  11111100 01001001
y >> 8    -59.4257813   -60       FF C4  11111111 11000100
Arithmetic: Basic Rules
Addition:
Unsigned/signed: Normal addition followed by truncate, same operation on bit level
Unsigned: addition mod 2^w
Mathematical addition + possible subtraction of 2^w
Signed: modified addition mod 2^w (result in proper range)
Mathematical addition + possible addition or subtraction of 2^w
Multiplication:
Unsigned/signed: Normal multiplication followed by truncate, same operation on bit level
Unsigned: multiplication mod 2^w
Signed: modified multiplication mod 2^w (result in proper range)
Arithmetic: Basic Rules
Unsigned ints, 2’s complement ints are isomorphic rings: isomorphism = casting
Left shift
Unsigned/signed: multiplication by 2^k
Always logical shift
Right shift
Unsigned: logical shift, div (division + round to zero) by 2^k
Signed: arithmetic shift
Positive numbers: div (division + round to zero) by 2^k
Negative numbers: div (division + round away from zero) by 2^k
Use biasing to fix
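Biasing adds 2^k - 1 to a negative dividend before shifting, so that the arithmetic shift rounds toward zero as C's / operator does. The sketch below assumes that >> on a negative int is an arithmetic shift; this is implementation-defined in C, though it is the behavior of gcc and clang. The function names are illustrative, and k is assumed to be in 0..30.

```c
/* Plain arithmetic shift: rounds toward negative infinity.
   ASSUMPTION: >> on a negative int replicates the sign bit. */
int shift_only(int x, int k) {
    return x >> k;
}

/* Division by 2^k that rounds toward zero, using a bias for negative x. */
int div_pow2(int x, int k) {
    int bias = (x < 0) ? (1 << k) - 1 : 0;
    return (x + bias) >> k;
}
```

For x = -15213 and k = 4, the plain shift gives -951 while the biased version gives -950, which equals -15213 / 16 in C.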
Properties of Unsigned Arithmetic
Properties of Two’s Comp. Arithmetic
Isomorphic Algebras
Unsigned multiplication and addition
Truncating to w bits
Two’s complement multiplication and addition
Truncating to w bits
Both Form Rings
Isomorphic to ring of integers mod 2^w
Comparison to (Mathematical) Integer Arithmetic
Both are rings
Integers obey ordering properties, e.g.,
u > 0 ⇒ u + v > v
u > 0, v > 0 ⇒ u · v > 0
These properties are not obeyed by two’s comp. arithmetic
TMax + 1 == TMin
15213 * 30426 == -10030 (16-bit words)
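Both counterexamples can be reproduced for 16-bit words. Signed overflow is undefined behavior in C, so this sketch does the arithmetic on unsigned values, truncates to 16 bits, and then reinterprets the pattern by hand. It assumes int is at least 32 bits wide; the function names are illustrative.

```c
#include <stdint.h>

/* Interpret a 16-bit pattern as a two's complement value. */
int as_signed16(uint16_t bits) {
    return bits >= 0x8000 ? (int)bits - 0x10000 : (int)bits;
}

/* 16-bit two's complement addition: add, truncate, reinterpret. */
int tadd16(int u, int v) {
    return as_signed16((uint16_t)((unsigned)u + (unsigned)v));
}

/* 16-bit two's complement multiplication: multiply, truncate, reinterpret. */
int tmult16(int u, int v) {
    return as_signed16((uint16_t)((unsigned)u * (unsigned)v));
}
```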
Why Should I Use Unsigned?
Byte-Oriented Memory Organization
[Figure: memory drawn as an array of bytes, with addresses running from 00•••0 to FF•••F]
Word-Oriented Memory Organization
Addresses Specify Byte Locations
Address of first byte in word
Addresses of successive words differ by 4 (32-bit) or 8 (64-bit)

Bytes Addr.   32-bit Words   64-bit Words
0000–0003     Addr = 0000    Addr = 0000
0004–0007     Addr = 0004
0008–0011     Addr = 0008    Addr = 0008
0012–0015     Addr = 0012
Where do addresses come from?
The compilation pipeline
[Figure: program P, which calls foo(), shown at successive stages of the compilation pipeline together with the library routines; the instruction jmp _foo becomes jmp 75, then jmp 175, then jmp 1175 as addresses are assigned]
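#include <stdio.h>

int A[10];

int main() {
    int j = 10;
    /* Pointer differences have type ptrdiff_t and sizeof yields size_t,
       so both are cast to long to match %ld; %p expects a void *. */
    printf("Location and difference %p %ld(1-0) %ld(1-0)\n",
           (void *)&A[0],
           (long)(&A[1] - &A[0]),
           (long)(&A[1] - A));
    printf(" Int differences %ld(sizeof) %ld(1-0) %ld(2-0) %ld(3-0)\n",
           (long)sizeof(A[0]),
           (long)(&A[1] - &A[0]),
           (long)(&A[2] - &A[0]),
           (long)(&A[3] - &A[0]));
    printf(" Byte differences %ld(sizeof) %ld(1-0) %ld(2-0) %ld(3-0)\n",
           (long)sizeof(A[0]),
           (long)((char*)&A[1] - (char*)&A[0]),
           (long)((char*)&A[2] - (char*)&A[0]),
           (long)((char*)&A[3] - (char*)&A[0]));
    printf(" j Value %d pointer %p\n", j, (void *)&j);
    return 0;
}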
Output
    printf("Location and difference %p %ld(1-0) %ld(1-0)\n",
           (void *)&A[0],
           (long)(&A[1] - &A[0]),
           (long)(&A[1] - A));
Location and difference 0x601040 1(1-0) 1(1-0)
    printf(" Int differences %ld(sizeof) %ld(1-0) %ld(2-0) %ld(3-0)\n",
           (long)sizeof(A[0]),
           (long)(&A[1] - &A[0]),
           (long)(&A[2] - &A[0]),
           (long)(&A[3] - &A[0]));
Int differences 4(sizeof) 1(1-0) 2(2-0) 3(3-0)
    printf(" Byte differences %ld(sizeof) %ld(1-0) %ld(2-0) %ld(3-0)\n",
           (long)sizeof(A[0]),
           (long)((char*)&A[1] - (char*)&A[0]),
           (long)((char*)&A[2] - (char*)&A[0]),
           (long)((char*)&A[3] - (char*)&A[0]));
Byte differences 4(sizeof) 4(1-0) 8(2-0) 12(3-0)
    printf(" j Value %d pointer %p\n", j, (void *)&j);
j Value 10 pointer 0x7fff860787ec
Byte Ordering
Byte Ordering Example
Big Endian
Least significant byte has highest address
Little Endian
Least significant byte has lowest address
Example
Variable x has 4-byte representation 0x01234567
Address given by &x is 0x100
                0x100  0x101  0x102  0x103
Big Endian        01     23     45     67
Little Endian     67     45     23     01
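The byte order of the machine running a program can be observed by copying an integer's bytes out of memory in address order. A minimal sketch with illustrative function names:

```c
#include <stdint.h>
#include <string.h>

/* Copy the four bytes of x as they sit in memory, lowest address first. */
void bytes_in_memory(uint32_t x, unsigned char out[4]) {
    memcpy(out, &x, 4);
}

/* Little endian means the least significant byte has the lowest address. */
int is_little_endian(void) {
    unsigned char b[4];
    bytes_in_memory(0x01234567u, b);
    return b[0] == 0x67;
}
```

On a little-endian machine such as x86, the bytes of 0x01234567 come out as 67 45 23 01; on a big-endian machine they come out as 01 23 45 67.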
Reading Byte-Reversed Listings
Disassembly
Text representation of binary machine code
Generated by program that reads the machine code
Example Fragment
Address Instruction Code Assembly Rendition
8048365: 5b pop %ebx
8048366: 81 c3 ab 12 00 00 add $0x12ab,%ebx
804836c: 83 bb 28 00 00 00 00 cmpl $0x0,0x28(%ebx)
Deciphering Numbers
Value: 0x12ab
Pad to 32 bits: 0x000012ab
Split into bytes: 00 00 12 ab
Reverse: ab 12 00 00
Examining Data Representations
Code to Print Byte Representation of Data
Casting pointer to unsigned char * creates byte array
typedef unsigned char *pointer;

void show_bytes(pointer start, int len) {
    int i;
    /* Print the address and value of each byte, lowest address first */
    for (i = 0; i < len; i++)
        printf("%p\t0x%.2x\n", (void *)(start + i), start[i]);
    printf("\n");
}
Printf directives:
%p: Print pointer
%x: Print Hexadecimal
show_bytes Execution Example
int a = 15213;
printf("int a = 15213;\n");
show_bytes((pointer) &a, sizeof(int));
Result (Linux):
int a = 15213;
0x11ffffcb8 0x6d
0x11ffffcb9 0x3b
0x11ffffcba 0x00
0x11ffffcbb 0x00
Data alignment
A memory address a is said to be n-byte aligned when a is a multiple of n.
n is a power of two in all interesting cases
Every address is 1-byte aligned
A 4-byte quantity is aligned at addresses 0, 4, 8, …
Some architectures require alignment (e.g., MIPS)
Some architectures tolerate misalignment at a performance penalty (e.g., x86)
Data alignment in C structs
Struct members are never reordered in C & C++
Compiler adds padding so each member is aligned
struct {char a; char b;} no padding
struct {char a; short b;} one byte pad after a
Last member is padded so the total size of the
structure is a multiple of the largest alignment of
any structure member (so struct can go in array)
struct containing int requires 4-byte alignment
struct containing long requires 8-byte (on 64-bit arch)
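The padding rules can be observed with sizeof and offsetof. The struct names below are illustrative, and the sizes in the usage note assume a typical platform where short is 2 bytes with 2-byte alignment and int is 4 bytes with 4-byte alignment.

```c
#include <stddef.h>
#include <stdio.h>

struct two_chars  { char a; char b;  };   /* no padding expected */
struct char_short { char a; short b; };   /* one pad byte expected after a */
struct char_int   { char a; int b;   };   /* three pad bytes expected after a */

/* Print each struct's size and the offset of its second member. */
void print_layouts(void) {
    printf("two_chars:  size %zu, b at %zu\n",
           sizeof(struct two_chars), offsetof(struct two_chars, b));
    printf("char_short: size %zu, b at %zu\n",
           sizeof(struct char_short), offsetof(struct char_short, b));
    printf("char_int:   size %zu, b at %zu\n",
           sizeof(struct char_int), offsetof(struct char_int, b));
}
```

Under those assumptions the sizes are 2, 4, and 8 bytes, and b sits at offsets 1, 2, and 4.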
Data alignment malloc
malloc(1)
16-byte aligned results on 32-bit
32-byte aligned results on 64-bit
int posix_memalign(void **memptr, size_t alignment, size_t size);
Allocates size bytes
Places the address of the allocated memory in *memptr
Address will be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *)
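A minimal usage sketch for posix_memalign on a POSIX system. The wrapper alloc_aligned and the check is_aligned are illustrative helpers, not part of any library.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>

/* Allocate size bytes at an address that is a multiple of alignment.
   Returns NULL on failure; the caller releases the block with free(). */
void *alloc_aligned(size_t alignment, size_t size) {
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
}

/* True when address p is a multiple of n. */
int is_aligned(const void *p, size_t n) {
    return ((uintptr_t)p % n) == 0;
}
```

A call such as alloc_aligned(64, 100) returns a block suitable for 64-byte cache-line alignment. An alignment that is not a power of two, such as 3, makes posix_memalign fail, so the wrapper returns NULL.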
Representing Integers
Decimal: 15213
Binary: 0011 1011 0110 1101
Hex: 3 B 6 D
Integer C Puzzles
Initialization
int x = foo();
int y = bar();
unsigned ux = x;
unsigned uy = y;

• x < 0 ⇒ ((x*2) < 0)
• ux >= 0
• x & 7 == 7 ⇒ (x<<30) < 0
• ux > -1
• x > y ⇒ -x < -y
• x * x >= 0
• x > 0 && y > 0 ⇒ x + y > 0
• x >= 0 ⇒ -x <= 0
• x <= 0 ⇒ -x >= 0
• (x|-x)>>31 == -1
• ux >> 3 == ux/8
• x >> 3 == x/8
• x & (x-1) != 0