Lecture 2 - Bits, Bytes and Integers
Carnegie Mellon
Binary Representations
[Figure: a signal carrying the bits 0, 1, 0, with voltage levels marked at 0.0V, 0.5V, 2.8V, and 3.3V]
Machine Words
Machine Has “Word Size”
▪ Nominal size of integer-valued data
▪ Including addresses
▪ Most current machines use 32-bit (4-byte) words
▪ Limits addresses to 4GB
▪ Becoming too small for memory-intensive applications
▪ High-end systems use 64-bit (8-byte) words
▪ Potential address space ≈ 1.8 × 10^{19} bytes
▪ x86-64 machines support 48-bit addresses: 256 Terabytes
▪ Machines support multiple data formats
▪ Fractions or multiples of word size
▪ Always integral number of bytes
Data Representations
Sizes in bytes:
C Data Type    Typical 32-bit    Intel IA32    x86-64
char           1                 1             1
short          2                 2             2
int            4                 4             4
long           4                 4             8
long long      8                 8             8
float          4                 4             4
double         8                 8             8
pointer        4                 4             8
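The sizes in the table can be checked on any given machine with sizeof. This is a minimal sketch; the helper name print_sizes is hypothetical, and the numbers it prints depend on the compiler and target.

```c
#include <stdio.h>

/* Print the size in bytes of each C type listed in the table.
   print_sizes is a hypothetical helper; results vary by platform. */
void print_sizes(void) {
    printf("char       %zu\n", sizeof(char));
    printf("short      %zu\n", sizeof(short));
    printf("int        %zu\n", sizeof(int));
    printf("long       %zu\n", sizeof(long));
    printf("long long  %zu\n", sizeof(long long));
    printf("float      %zu\n", sizeof(float));
    printf("double     %zu\n", sizeof(double));
    printf("pointer    %zu\n", sizeof(void *));
}
```

Calling print_sizes() on an x86-64 Linux build would be expected to match the last column; other platforms may differ.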
Byte Ordering
How should bytes within a multi-byte word be ordered in
memory?
Conventions
▪ Big Endian: Sun, PPC Mac, Internet
▪ Least significant byte has highest address
▪ Little Endian: x86
▪ Least significant byte has lowest address
Deciphering Numbers
▪ Value: 0x12ab
▪ Pad to 32 bits: 0x000012ab
▪ Split into bytes: 00 00 12 ab
▪ Reverse: ab 12 00 00
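The byte order of the running machine can be observed by copying a value's bytes out of memory. The sketch below uses the same value, 0x12ab; the helper names get_bytes and is_little_endian are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a 32-bit value into out[0..3],
   lowest address first. */
void get_bytes(uint32_t value, unsigned char out[4]) {
    memcpy(out, &value, sizeof value);
}

/* Return 1 if the least significant byte sits at the lowest address
   (little endian), 0 otherwise. */
int is_little_endian(void) {
    unsigned char b[4];
    get_bytes(0x000012abu, b);
    return b[0] == 0xab;
}
```

On a little-endian machine get_bytes yields ab 12 00 00, the reversed order shown above; on a big-endian machine it yields 00 00 12 ab.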
Representing Integers
Decimal: 15213
Binary: 0011 1011 0110 1101
Hex: 3 B 6 D
Representing Strings
char S[6] = "18243";
Strings in C
▪ Represented by array of characters
▪ Each character encoded in ASCII format
▪ Standard 7-bit encoding of character set
▪ Character “0” has code 0x30
– Digit i has code 0x30+i
▪ String should be null-terminated
▪ Final character = 0
Compatibility
▪ Byte ordering not an issue
Bytes of S in memory (hex, lowest address first):
Linux/Alpha    Sun
31             31
38             38
32             32
34             34
33             33
00             00
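The byte values listed for S can be checked directly. This is a small sketch that assumes an ASCII execution character set; digit_code and check_string_bytes are hypothetical helper names.

```c
#include <string.h>

/* ASCII code of decimal digit i (0 through 9): 0x30 + i. */
int digit_code(int i) {
    return 0x30 + i;
}

/* Return 1 if the six bytes of "18243", including the terminating
   null byte, are 31 38 32 34 33 00 in hex. */
int check_string_bytes(void) {
    char S[6] = "18243";
    static const unsigned char expected[6] =
        {0x31, 0x38, 0x32, 0x34, 0x33, 0x00};
    return memcmp(S, expected, sizeof S) == 0;
}
```

Because each character occupies one byte, the result does not depend on the machine's byte ordering.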
Boolean Algebra
Developed by George Boole in 19th Century
▪ Algebraic representation of logic
▪ Encode “True” as 1 and “False” as 0
And
▪ A&B = 1 when both A=1 and B=1
Or
▪ A|B = 1 when either A=1 or B=1
▪ 01101001 { 0, 3, 5, 6 }
▪ 76543210
▪ 01010101 { 0, 2, 4, 6 }
▪ 76543210
Operations
▪ & Intersection 01000001 { 0, 6 }
▪ | Union 01111101 { 0, 2, 3, 4, 5, 6 }
▪ ^ Symmetric difference 00111100 { 2, 3, 4, 5 }
▪ ~ Complement 10101010 { 1, 3, 5, 7 }
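The set operations above map directly onto C's bitwise operators. A minimal sketch for subsets of {0, ..., 7} stored in one byte follows; the function names are hypothetical.

```c
#include <stdint.h>

/* Return 1 if element j (0..7) belongs to the set encoded by bits. */
int set_contains(uint8_t bits, int j) {
    return (bits >> j) & 1;
}

uint8_t set_intersection(uint8_t a, uint8_t b) { return a & b; }
uint8_t set_union(uint8_t a, uint8_t b)        { return a | b; }
uint8_t set_symdiff(uint8_t a, uint8_t b)      { return a ^ b; }

/* ~ promotes its operand to int, so cast back to 8 bits. */
uint8_t set_complement(uint8_t a)              { return (uint8_t)~a; }
```

With a = 0x69 ({0, 3, 5, 6}) and b = 0x55 ({0, 2, 4, 6}), these return the bit patterns listed under Operations.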
Bit-Level Operations in C
Operations &, |, ~, ^ Available in C
▪ Apply to any “integral” data type
▪ long, int, short, char, unsigned
▪ View arguments as bit vectors
▪ Arguments applied bit-wise
Examples (Char data type)
▪ ~0x41 = 0xBE
▪ ~01000001₂ = 10111110₂
▪ ~0x00 = 0xFF
▪ ~00000000₂ = 11111111₂
▪ 0x69 & 0x55 = 0x41
▪ 01101001₂ & 01010101₂ = 01000001₂
▪ 0x69 | 0x55 = 0x7D
▪ 01101001₂ | 01010101₂ = 01111101₂
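One detail is worth seeing in code: C promotes char operands to int before applying ~, so the 8-bit results above appear only after converting back to a char type. A sketch assuming 8-bit chars, with hypothetical helper names:

```c
/* Complement of an 8-bit value, truncated back to 8 bits. */
unsigned char not8(unsigned char x) {
    return (unsigned char)~x;
}

/* The same complement left at int width: the upper bits are all 1s,
   so the result is negative. */
int not_promoted(unsigned char x) {
    return ~x;
}
```

For example, not8(0x41) gives 0xBE, while not_promoted(0x41) gives -66, the int whose low byte is 0xBE.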
Shift Operations
Left Shift: x << y
▪ Shift bit-vector x left y positions
– Throw away extra bits on left
▪ Fill with 0’s on right
Right Shift: x >> y
▪ Shift bit-vector x right y positions
▪ Throw away extra bits on right
▪ Logical shift: fill with 0’s on left
▪ Arithmetic shift: replicate most significant bit on left
Undefined Behavior
▪ Shift amount < 0 or ≥ word size
Examples
Argument x     01100010    10100010
<< 3           00010000    00010000
Log. >> 2      00011000    00101000
Arith. >> 2    00011000    11101000
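The three shifts can be sketched on 8-bit values as follows. The function names are hypothetical, and the arithmetic shift is built by hand because C leaves >> on negative signed values implementation-defined. All three assume 0 <= k <= 7.

```c
#include <stdint.h>

/* Left shift: drop bits shifted past bit 7, fill with 0s on the right. */
uint8_t shl8(uint8_t x, unsigned k) {
    return (uint8_t)(x << k);
}

/* Logical right shift: fill with 0s on the left. */
uint8_t lshr8(uint8_t x, unsigned k) {
    return (uint8_t)(x >> k);
}

/* Arithmetic right shift: fill with copies of the original bit 7. */
uint8_t ashr8(uint8_t x, unsigned k) {
    uint8_t r = (uint8_t)(x >> k);
    if (x & 0x80u)
        r |= (uint8_t)(0xFFu << (8 - k));  /* set the top k bits */
    return r;
}
```

With x = 0xA2 (10100010), lshr8(x, 2) gives 00101000 and ashr8(x, 2) gives 11101000; with x = 0x62 (01100010) both give 00011000.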
Encoding Integers
Unsigned
B2U(X) = \sum_{i=0}^{w-1} x_i \cdot 2^i
Two’s Complement
B2T(X) = -x_{w-1} \cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i \cdot 2^i
Sign Bit
▪ For 2’s complement, most significant bit indicates sign
▪ 0 for nonnegative
▪ 1 for negative
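The two encodings can be evaluated bit by bit. This is a sketch for word sizes 1 <= w <= 32; b2u and b2t are hypothetical names mirroring B2U and B2T, and bit i of the argument plays the role of x_i.

```c
#include <stdint.h>

/* B2U: sum of x_i * 2^i over i = 0 .. w-1. */
uint64_t b2u(uint32_t bits, int w) {
    uint64_t sum = 0;
    for (int i = 0; i < w; i++)
        sum += (uint64_t)((bits >> i) & 1u) << i;
    return sum;
}

/* B2T: same sum over i = 0 .. w-2, minus x_{w-1} * 2^(w-1). */
int64_t b2t(uint32_t bits, int w) {
    int64_t sum = 0;
    for (int i = 0; i < w - 1; i++)
        sum += (int64_t)((bits >> i) & 1u) << i;
    sum -= (int64_t)((bits >> (w - 1)) & 1u) << (w - 1);
    return sum;
}
```

For the 16-bit pattern 0xC493, b2u gives 50323 and b2t gives -15213; for 0x3B6D both give 15213, since the sign bit is 0.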
Observations
▪ |TMin| = TMax + 1
▪ Asymmetric range
▪ UMax = 2 * TMax + 1
C Programming
▪ #include <limits.h>
▪ Declares constants, e.g.,
▪ ULONG_MAX
▪ LONG_MAX
▪ LONG_MIN
▪ Values platform specific
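Both observations can be checked against the constants in limits.h. The sketch below uses the int constants and assumes a two's complement machine whose unsigned int has no padding bits; the function names are hypothetical.

```c
#include <limits.h>

/* |TMin| = TMax + 1, written as -(TMin + 1) == TMax so that no
   intermediate value overflows. */
int tmin_is_asymmetric(void) {
    return -(INT_MIN + 1) == INT_MAX;
}

/* UMax = 2 * TMax + 1, computed in unsigned arithmetic. */
int umax_matches(void) {
    return UINT_MAX == 2u * (unsigned)INT_MAX + 1u;
}
```

Both functions would be expected to return 1 on the machines discussed here.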
Conversion Visualized
2’s Comp. → Unsigned
▪ Ordering Inversion
▪ Negative → Big Positive
[Figure: the 2’s complement range (TMin, ..., -2, -1, 0, ..., TMax) mapped onto the unsigned range (0, ..., TMax, TMax + 1, ..., UMax - 1, UMax)]
Casting Surprises
Expression Evaluation
▪ If there is a mix of unsigned and signed in single expression,
signed values implicitly cast to unsigned
▪ Including comparison operations <, >, ==, <=, >=
▪ Examples for W = 32: TMIN = -2,147,483,648, TMAX = 2,147,483,647
Constant1       Constant2            Relation    Evaluation
0               0U                   ==          unsigned
-1              0                    <           signed
-1              0U                   >           unsigned
2147483647      -2147483647-1        >           signed
2147483647U     -2147483647-1        <           unsigned
-1              -2                   >           signed
(unsigned)-1    -2                   >           unsigned
2147483647      2147483648U          <           unsigned
2147483647      (int)2147483648U     >           signed
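A few of these rows can be reproduced with two small comparison helpers. This is a sketch assuming 32-bit int; less_signed and less_mixed are hypothetical names.

```c
/* Both operands signed: ordinary signed comparison. */
int less_signed(int a, int b) {
    return a < b;
}

/* One operand unsigned: a is implicitly converted to unsigned
   before the comparison is made. */
int less_mixed(int a, unsigned b) {
    return a < b;
}
```

less_signed(-1, 0) returns 1, but less_mixed(-1, 0U) returns 0, because -1 is converted to UMax before the comparison.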
Summary
Casting Signed ↔ Unsigned: Basic Rules
Bit pattern is maintained
But reinterpreted
Can have unexpected effects: adding or subtracting 2^w
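The "adding 2^w" effect can be seen with w = 16. In this sketch t2u16 performs the cast and t2u16_formula applies the arithmetic rule; both names are hypothetical.

```c
#include <stdint.h>

/* Cast: the bit pattern is kept and reinterpreted as unsigned. */
uint16_t t2u16(int16_t x) {
    return (uint16_t)x;
}

/* The same conversion as arithmetic: add 2^16 to negative values. */
int32_t t2u16_formula(int16_t x) {
    return x < 0 ? (int32_t)x + 65536 : (int32_t)x;
}
```

For x = -15213 both give 50323; for nonnegative x the value is unchanged.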
Sign Extension
Task:
▪ Given w-bit signed integer x
▪ Convert it to w+k-bit integer with same value
Rule:
▪ Make k copies of sign bit:
▪ X′ = x_{w-1}, ..., x_{w-1}, x_{w-1}, x_{w-2}, ..., x_0 (the first k entries are copies of the MSB)
[Figure: w-bit X extended to (k + w)-bit X′ by prepending k copies of the MSB]
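Sign extension from w = 16 to w + k = 32 bits can be written out by hand and compared with what a C conversion does. A minimal sketch with hypothetical function names:

```c
#include <stdint.h>

/* Manual rule: copy bit 15 into the 16 new high-order positions. */
uint32_t sign_extend16(uint16_t x) {
    uint32_t result = x;
    if (x & 0x8000u)
        result |= 0xFFFF0000u;
    return result;
}

/* Converting a signed 16-bit value to a 32-bit int keeps its value;
   returned as raw bits for comparison with the manual rule. */
uint32_t cast_extend16(int16_t x) {
    return (uint32_t)(int32_t)x;
}
```

For -15213 (0xC493) both give 0xFFFFC493; for 15213 (0x3B6D) both give 0x00003B6D.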
Summary:
Expanding, Truncating: Basic Rules
Expanding (e.g., short int to int)
▪ Unsigned: zeros added
▪ Signed: sign extension
▪ Both yield expected result
x = 0
Expression    Decimal    Hex      Binary
0             0          00 00    00000000 00000000
~0            -1         FF FF    11111111 11111111
~0+1          0          00 00    00000000 00000000
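The table follows the complement-and-increment rule, ~x + 1 == -x. A one-line sketch assuming two's complement int; negate is a hypothetical name, and the addition overflows when x is INT_MIN.

```c
/* Negate by complementing every bit and adding 1.
   Not valid for x == INT_MIN, where the addition overflows. */
int negate(int x) {
    return ~x + 1;
}
```

negate(0) is 0, matching the last row of the table, and negate(15213) is -15213.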
Unsigned Addition
Operands: w bits (u, v)
True Sum: w+1 bits (u + v)
Discard Carry: w bits (UAdd_w(u, v))
UAdd_w(u, v) = \begin{cases} u + v, & u + v < 2^w \\ u + v - 2^w, & u + v \ge 2^w \end{cases}
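With w = 8, unsigned addition and a carry check can be sketched as follows. The names uadd8 and uadd8_overflows are hypothetical.

```c
#include <stdint.h>

/* UAdd_8: compute the true sum, then keep only the low 8 bits. */
uint8_t uadd8(uint8_t u, uint8_t v) {
    return (uint8_t)(u + v);
}

/* The carry was discarded exactly when the result is smaller
   than either operand. */
int uadd8_overflows(uint8_t u, uint8_t v) {
    return uadd8(u, v) < u;
}
```

uadd8(200, 100) returns 44, which is 300 - 2^8, while uadd8(100, 100) returns the true sum 200.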
TAdd Overflow
Functionality
▪ True sum requires w+1 bits
▪ Drop off MSB
▪ Treat remaining bits as 2’s comp. integer
[Figure: true sums from -2^w to 2^w - 1 mapped onto the TAdd result; the top of the true-sum range is labeled PosOver and the bottom NegOver]
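Two's complement addition with w = 8 can be sketched by computing the true sum at int width and wrapping it by 2^8. The names tadd8 and tadd8_overflow are hypothetical.

```c
#include <stdint.h>

/* TAdd_8: the true sum wrapped into the range -128 .. 127. */
int8_t tadd8(int8_t u, int8_t v) {
    int sum = u + v;              /* exact: operands are promoted to int */
    if (sum > 127)
        sum -= 256;               /* positive overflow wraps negative */
    else if (sum < -128)
        sum += 256;               /* negative overflow wraps positive */
    return (int8_t)sum;
}

/* 1 for positive overflow, -1 for negative overflow, 0 otherwise. */
int tadd8_overflow(int8_t u, int8_t v) {
    int sum = u + v;
    return sum > 127 ? 1 : (sum < -128 ? -1 : 0);
}
```

tadd8(100, 100) returns -56 (positive overflow) and tadd8(-100, -100) returns 56 (negative overflow).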
Multiplication
Computing Exact Product of w-bit numbers x, y
▪ Either signed or unsigned
Ranges
▪ Unsigned: 0 ≤ x * y ≤ (2^w - 1)^2 = 2^{2w} - 2^{w+1} + 1
▪ Up to 2w bits
▪ Two’s complement min: x * y ≥ (-2^{w-1}) * (2^{w-1} - 1) = -2^{2w-2} + 2^{w-1}
▪ Up to 2w-1 bits
▪ Two’s complement max: x * y ≤ (-2^{w-1})^2 = 2^{2w-2}
▪ Up to 2w bits, but only for (TMin_w)^2
Unsigned Multiplication in C
Operands: w bits (u, v)
True Product: 2*w bits (u · v)
Discard w bits: w bits (UMult_w(u, v))
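With w = 8 the true product fits in 16 bits, and discarding the high 8 bits gives the result. A minimal sketch; umult8 is a hypothetical name.

```c
#include <stdint.h>

/* UMult_8: form the true product, then keep only the low 8 bits. */
uint8_t umult8(uint8_t u, uint8_t v) {
    unsigned product = (unsigned)u * (unsigned)v;  /* at most 255 * 255 */
    return (uint8_t)product;
}
```

umult8(200, 3) returns 88, since the true product 600 is 0x258 and only 0x58 is kept.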
Signed Multiplication in C
Operands: w bits (u, v)
True Product: 2*w bits (u · v)
Discard w bits: w bits (TMult_w(u, v))
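The signed case with w = 8 can be sketched the same way: keep the low 8 bits of the true product, then read them as a two's complement value. tmult8 is a hypothetical name.

```c
#include <stdint.h>

/* TMult_8: low 8 bits of the true product, reinterpreted as signed. */
int8_t tmult8(int8_t u, int8_t v) {
    int product = u * v;               /* exact at int width */
    uint8_t low = (uint8_t)product;    /* reduces modulo 2^8 */
    return (int8_t)(low >= 128 ? low - 256 : low);
}
```

tmult8(100, 2) returns -56, because the true product 200 does not fit in 8 signed bits. tmult8(-56, 3) returns 88, the same low-order bits as the unsigned product 200 * 3, since -56 and 200 share the bit pattern 0xC8.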
Case 1: No rounding
[Figure: dividend u with the bias 2^k - 1 added, then divided by 2^k; bit patterns are shown relative to the binary point, with the result marked "Incremented by 1"]
Biasing adds 1 to final result
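The bias shown in the figure can be sketched as follows. This assumes that >> on a negative int is an arithmetic shift, which C leaves implementation-defined, and that 0 <= k < 31; div_pow2 and shift_only are hypothetical names.

```c
/* Plain right shift: rounds toward minus infinity for negative x
   (assuming an arithmetic shift). */
int shift_only(int x, int k) {
    return x >> k;
}

/* Add the bias 2^k - 1 to negative dividends before shifting, so the
   result rounds toward zero like C's / operator. */
int div_pow2(int x, int k) {
    int bias = (x < 0) ? (1 << k) - 1 : 0;
    return (x + bias) >> k;
}
```

Under that assumption, shift_only(-15213, 4) gives -951 while div_pow2(-15213, 4) gives -950, matching -15213 / 16. When x is already a multiple of 2^k, the bias has no effect on the result.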
Today: Integers
Representation: unsigned and signed
Conversion, casting
Expanding, truncating
Addition, negation, multiplication, shifting
Summary