Random Number Generation
Dr. John Mellor-Crummey
Department of Computer Science
Rice University
[email protected]
COMP 528 Lecture 21, 5 April 2005
Topics for Today
Understand
Motivation
Desired properties of a good generator
Linear congruential generators
multiplicative and mixed
Tausworthe generators
Combined generators
Seed selection
Myths about random number generation
What's used today: MATLAB, R, Linux
Why Random Number Generation?
Simulation must generate random values for variables in a
specified random distribution
examples: normal, exponential, ...
How? Two steps
random number generation: generate a sequence of uniform FP
random numbers in [0,1]
random variate generation: transform a uniform random
sequence to produce a sequence with the desired distribution
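The two steps can be sketched as follows. This is a minimal illustration, assuming inverse-transform sampling as the variate-generation method (the slides do not name one) and Python's `random.Random` as the uniform source; `exponential_variate` and `rate` are illustrative names.

```python
import math
import random


def exponential_variate(u, rate):
    """Map a uniform u in [0, 1) to an exponential variate with the given rate.

    Inverts the exponential CDF F(x) = 1 - exp(-rate * x).
    """
    return -math.log(1.0 - u) / rate


rng = random.Random(12345)
uniforms = [rng.random() for _ in range(100_000)]               # step 1: uniform in [0, 1)
samples = [exponential_variate(u, rate=2.0) for u in uniforms]  # step 2: transform
sample_mean = sum(samples) / len(samples)                       # should be near 1 / rate
print(sample_mean)
```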
How Random Number Generators Work
Most commonly use a recurrence relation
$x_n = f(x_{n-1}, x_{n-2}, \ldots)$
recurrence is a function of the last one (or a few) numbers
Example:
$x_n = (5x_{n-1} + 1) \bmod 16$
For $x_0 = 5$, first 32 numbers are 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9,
14, 7, 4, 5, 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7, 4, 5
the $x_n$ are integers in [0, 15]
dividing by 16, get random numbers in the interval [0, 1)
Properties of pseudo-random number sequences
from seed value, can determine entire sequence
they pass statistical tests for randomness
reproducibility (often desirable)
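The example recurrence $x_n = (5x_{n-1} + 1) \bmod 16$ with $x_0 = 5$ can be reproduced with a few lines of Python. The function name `lcg` and its parameter names are illustrative.

```python
def lcg(a, b, m, seed):
    """Yield x_1, x_2, ... for x_n = (a * x_{n-1} + b) mod m."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x


gen = lcg(a=5, b=1, m=16, seed=5)
first_32 = [next(gen) for _ in range(32)]
uniforms = [x / 16 for x in first_32]   # scaled into [0, 1)
print(first_32)
```

The same seed always regenerates the same list, which is the reproducibility property noted above.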
Random Number Sequences
Some generators do not repeat the initial part of a sequence
[Figure: a random number sequence with its tail, cycle length, and period labeled]
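A rough way to measure the tail and the cycle length of a one-step generator is to record when each value first appears. This sketch assumes the generator state is a single integer; the squaring recurrence used to show a tail is a made-up example, not one from the lecture.

```python
def tail_and_cycle(step, seed):
    """Return (tail_length, cycle_length) of the sequence seed, step(seed), ..."""
    first_seen = {}          # value -> index of its first occurrence
    x, n = seed, 0
    while x not in first_seen:
        first_seen[x] = n
        x = step(x)
        n += 1
    tail = first_seen[x]     # number of values before the cycle starts
    return tail, n - tail


# Hypothetical generator with a tail: x_n = (x_{n-1}^2 + 1) mod 16, x_0 = 0
# produces 0, 1, 2, 5, 10, 5, 10, ...
print(tail_and_cycle(lambda x: (x * x + 1) % 16, 0))   # (3, 2)
# The LCG x_n = (5 x_{n-1} + 1) mod 16 has no tail.
print(tail_and_cycle(lambda x: (5 * x + 1) % 16, 5))   # (0, 16)
```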
Desired Properties of a Good Generator
Efficiently computable
Period should be large
don't want random numbers in a simulation to recycle
Successive values should be
independent
uniformly distributed
Linear-Congruential Generators
1951: D.H. Lehmer found that residues of successive powers
of a number have good randomness
$x_n = a^n \bmod m$
after computing $x_{n-1}$: $x_n = a x_{n-1} \bmod m$
a is the multiplier, m is the modulus
Lehmer's generator: multiplicative LCG
Modern generalization: mixed LCG
$x_n = (a x_{n-1} + b) \bmod m$
a, b, m > 0
Result: the $x_n$ are integers in [0, m-1]
Popular because
analyzed easily
certain guarantees can be made about their properties
Properties of LCGs
Choice of a, b, m affects
period
autocorrelation
Observations about LCGs
period can never be more than m, so modulus m should be large
$m = 2^k$ yields efficient implementation by truncation
if b is non-zero, obtain period of m iff
m and b are relatively prime
every prime that is a factor of m is also a factor of a - 1
if m is a multiple of 4, a - 1 must be too
all of these conditions are met for $x_n = (a x_{n-1} + b) \bmod m$ if
$m = 2^k$, for some integer k
a = 4c + 1, for some integer c
b is an odd integer
Full-period generator = one with period m
not all are equally good
lower autocorrelation between adjacent elements = better
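The three full-period conditions can be checked mechanically and compared against a brute-force period count for a small modulus. This is only a sketch; `has_full_period`, `measured_period`, and `prime_factors` are illustrative helper names.

```python
from math import gcd


def prime_factors(n):
    """Return the set of prime factors of n (trial division)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(a, b, m):
    """Check the three conditions for x_n = (a x_{n-1} + b) mod m with b != 0."""
    if b % m == 0:
        return False
    coprime = gcd(b, m) == 1
    primes_ok = all((a - 1) % p == 0 for p in prime_factors(m))
    four_ok = (m % 4 != 0) or ((a - 1) % 4 == 0)
    return coprime and primes_ok and four_ok


def measured_period(a, b, m, seed=0):
    """Steps until the seed recurs, or None if it does not recur within m steps."""
    x = seed
    for n in range(1, m + 1):
        x = (a * x + b) % m
        if x == seed:
            return n
    return None


print(has_full_period(5, 1, 16), measured_period(5, 1, 16))   # True 16
print(has_full_period(3, 1, 16), measured_period(3, 1, 16))   # False 8
```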
Example: Two Candidate LCGs
Which is better?
$x_n = ((2^{34} + 1)x_{n-1} + 1) \bmod 2^{35}$
$x_n = ((2^{18} + 1)x_{n-1} + 1) \bmod 2^{35}$
Both must be full-period generators: for $x_n = (a x_{n-1} + b) \bmod m$
$m = 2^k$, for some integer k
a = 4c + 1, for some integer c
b is an odd integer
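One way to compare two full-period candidates is their lag-1 autocorrelation over a whole period. A period of $2^{35}$ is too long to enumerate comfortably in Python, so the sketch below uses a scaled-down analog, which is an assumption of this example: modulus $2^{16}$ with multipliers $2^{15}+1$ and $2^{8}+1$ standing in for $2^{34}+1$ and $2^{18}+1$.

```python
import math


def lag1_autocorrelation(a, b, m, seed=0):
    """Lag-1 autocorrelation of x_n = (a x_{n-1} + b) mod m over m consecutive pairs."""
    xs = [seed]
    for _ in range(m):
        xs.append((a * xs[-1] + b) % m)
    u, v = xs[:-1], xs[1:]
    mean_u, mean_v = sum(u) / m, sum(v) / m
    cov = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    var_u = sum((p - mean_u) ** 2 for p in u)
    var_v = sum((q - mean_v) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)


K = 16                                                          # scaled-down exponent (assumed)
r_large = lag1_autocorrelation(2 ** (K - 1) + 1, 1, 2 ** K)     # analog of a = 2**34 + 1
r_small = lag1_autocorrelation(2 ** (K // 2) + 1, 1, 2 ** K)    # analog of a = 2**18 + 1
print(round(r_large, 3), round(r_small, 3))
```

In this scaled-down setting the larger multiplier shows a much stronger correlation between adjacent elements, even though both generators have full period.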
Multiplicative LCGs
More efficient than mixed LCGs: no addition
Two classes: $m = 2^k$, $m \neq 2^k$
Multiplicative LCG with $m = 2^k$
$x_n = a x_{n-1} \bmod 2^k$
Most efficient LCG: mod = truncation
Not full-period: maximum possible period for $m = 2^k$ is $2^{k-2}$
only possible if multiplier $a = 8i \pm 3$ and $x_0$ is odd
consider
$x_n = 5x_{n-1} \bmod 2^5$ (lcg_m2k_good)
$x_n = 7x_{n-1} \bmod 2^5$ (lcg_m2k_bad)
If a $2^{k-2}$ period suffices, may use multiplicative LCG for efficiency
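The periods of the two example generators can be counted directly. This is a small sketch; `multiplicative_period` is an illustrative name, and the seed values are chosen only for the demonstration.

```python
def multiplicative_period(a, m, seed):
    """Number of steps until x_n = a x_{n-1} mod m returns to the seed.

    Assumes gcd(a, m) == 1, so multiplication by a is invertible and the seed recurs.
    """
    x, n = (a * seed) % m, 1
    while x != seed:
        x = (a * x) % m
        n += 1
    return n


k = 5
print(multiplicative_period(5, 2 ** k, seed=1))   # 8 = 2**(k - 2): lcg_m2k_good
print(multiplicative_period(7, 2 ** k, seed=1))   # 4: lcg_m2k_bad
print(multiplicative_period(5, 2 ** k, seed=2))   # 4: an even seed shortens the period
```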
Multiplicative LCG with $m \neq 2^k$
$x_n = a x_{n-1} \bmod m$, $m \neq 2^k$
Avoid small period of LCG when $m = 2^k$: use prime modulus
Full-period generator with proper choice of a
when a is a primitive root of m
i.e. $a^n \bmod m \neq 1$ for $n = 1, 2, \ldots, m-2$
Consider
$x_n = 3x_{n-1} \bmod 31$ (lcg_mprime_good)
$x_n = 5x_{n-1} \bmod 31$ (lcg_mprime_bad)
Note: $5^3 \bmod 31 = 125 \bmod 31 = 1$
Observations
unlike mixed LCG, $x_n$ can never be 0 when m is prime
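The primitive-root condition can be tested literally as stated, by checking every power from 1 to m-2. This brute-force sketch is only practical for small prime m; `is_primitive_root` is an illustrative name.

```python
def is_primitive_root(a, m):
    """Check a^n mod m != 1 for every n = 1, 2, ..., m - 2 (m assumed prime)."""
    if a % m == 0:
        return False
    return all(pow(a, n, m) != 1 for n in range(1, m - 1))


def orbit(a, m, seed=1):
    """Values visited by x_n = a x_{n-1} mod m before the seed recurs."""
    values, x = [], seed
    while True:
        x = (a * x) % m
        values.append(x)
        if x == seed:
            return values


print(is_primitive_root(3, 31), len(orbit(3, 31)))   # True 30
print(is_primitive_root(5, 31), orbit(5, 31))        # False [5, 25, 1]
```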
Examining Bits of a Multiplicative LCG
testgenerator(@r1,1,20)
 n  decimal  binary
--  -------  -----------------
 1    25173  01100010 01010101
 2    12345  00110000 00111001
 3    54509  11010100 11101101
 4    27825  01101100 10110001
 5    55493  11011000 11000101
 6    25449  01100011 01101001
 7    13277  00110011 11011101
 8    53857  11010010 01100001
 9    64565  11111100 00110101
10     1945  00000111 10011001
11     6093  00010111 11001101
12    24849  01100001 00010001
13    48293  10111100 10100101
14    52425  11001100 11001001
15    61629  11110000 10111101
16    18625  01001000 11000001
17     2581  00001010 00010101
18    25337  01100010 11111001
19    11949  00101110 10101101
20    47473  10111001 01110001
$x_n = 25{,}173\,x_{n-1} \bmod 2^{16}$
bit 1 (least significant): always 1
bit 2: always 0
bit 3: cycle (10) of length 2
bit 4: cycle (0110) of length 4
In general: kth bit follows a cycle of length $2^{k-2}$, $k \geq 2$
Typical of multiplicative LCG with modulus $2^k$
Examining Bits of a Mixed LCG
testgenerator(@r2,1,20)
 n  decimal  binary
--  -------  -----------------
 1    39022  10011000 01101110
 2    61087  11101110 10011111
 3    20196  01001110 11100100
 4    45005  10101111 11001101
 5     3882  00001111 00101010
 6    21259  01010011 00001011
 7    65216  11111110 11000000
 8    19417  01001011 11011001
 9    30502  01110111 00100110
10    20919  01010001 10110111
11    26076  01100101 11011100
12    16421  01000000 00100101
13    44130  10101100 01100010
14    63139  11110110 10100011
15    32824  10000000 00111000
16    14513  00111000 10110001
17    51934  11001010 11011110
18    36303  10001101 11001111
19    35284  10001001 11010100
20     8573  00100001 01111101
$x_n = (25{,}173\,x_{n-1} + 13{,}849) \bmod 2^{16}$
bit 1 (least significant): cycle (10) of length 2
bit 2: cycle (1100) of length 4
bit 3: cycle (11110000) of length 8
In general: kth bit follows a cycle of length $2^k$
Typical of mixed LCG with modulus $2^k$
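The per-bit cycle lengths of both 16-bit generators (the multiplicative one from the previous slide and the mixed one here) can be measured over one full cycle. This sketch assumes seed 1 for both, matching the first table rows; `one_cycle` and `bit_period` are illustrative names, and the MATLAB `testgenerator` routine is not reproduced.

```python
def one_cycle(a, b, m, seed):
    """Values x_1, x_2, ... up to and including the first return to the seed."""
    xs, x = [], seed
    while True:
        x = (a * x + b) % m
        xs.append(x)
        if x == seed:
            return xs


def bit_period(xs, k):
    """Smallest period of bit k (1 = least significant) over the cyclic list xs."""
    bits = [(x >> (k - 1)) & 1 for x in xs]
    n = len(bits)
    for p in range(1, n + 1):
        if n % p == 0 and all(bits[i] == bits[(i + p) % n] for i in range(n)):
            return p


M = 2 ** 16
mult = one_cycle(25173, 0, M, seed=1)         # multiplicative LCG
mixed = one_cycle(25173, 13849, M, seed=1)    # mixed LCG

mult_periods = [bit_period(mult, k) for k in range(1, 17)]
mixed_periods = [bit_period(mixed, k) for k in range(1, 17)]
print(mult_periods[:4])    # [1, 1, 2, 4]
print(mixed_periods[:3])   # [2, 4, 8]
```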
LCG Cautions
Properties guaranteed only if
computations are exact: no roundoff
use integer arithmetic without overflow
Low-order bits not very random, high-order bits better
if one wants k bits and k < machine word length,
better to choose the high-order k bits than the low-order k bits
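The difference between low-order and high-order bits shows up even in a short run of the 16-bit mixed LCG used earlier. In this sketch, `high_bits` and `low_bits` are illustrative helpers, and the 16-bit word size is taken from that generator.

```python
def mixed_lcg16(seed, count):
    """First `count` values of x_n = (25173 x_{n-1} + 13849) mod 2**16."""
    xs, x = [], seed
    for _ in range(count):
        x = (25173 * x + 13849) % 2 ** 16
        xs.append(x)
    return xs


def high_bits(x, k, word_bits=16):
    """Keep the k most significant bits of a word_bits-wide value."""
    return x >> (word_bits - k)


def low_bits(x, k):
    """Keep the k least significant bits."""
    return x & ((1 << k) - 1)


xs = mixed_lcg16(seed=1, count=16)
low = [low_bits(x, 3) for x in xs]
high = [high_bits(x, 3) for x in xs]
print(low)    # second half repeats the first half: period 8
print(high)   # no such short repetition
```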
Tausworthe Generators
Significant interest in huge random numbers
cryptographic applications want many-bit random numbers
produce k-bit numbers by
produce random sequence of bits
chunk bit stream into k-bit quantities
1965: Tausworthe generator
$b_n = c_{q-1} b_{n-1} \oplus c_{q-2} b_{n-2} \oplus c_{q-3} b_{n-3} \oplus \cdots \oplus c_0 b_{n-q}$
$c_i$ and $b_i$ are binary variables
$\oplus$ is the xor operation (mod 2 addition)
uses last q bits of bit stream to compute next bit
autoregressive, order q: AR(q)
AR(q) generator maximum period = $2^q - 1$
Tausworthe Generator Notation
Characteristic polynomial notation
characteristic polynomial: $x^7 + x^3 + 1$
$b_{n+7} \oplus b_{n+3} \oplus b_n = 0, \quad n = 0, 1, 2, \ldots$
$b_{n+7} = b_{n+3} \oplus b_n, \quad n = 0, 1, 2, \ldots$
$b_n = b_{n-4} \oplus b_{n-7}, \quad n = 7, 8, 9, \ldots$
Most polynomials for Tausworthe generators are trinomials
Period depends on characteristic polynomial
if period = $2^q - 1$, characteristic polynomial is a primitive polynomial
Implementing Tausworthe Generators
Linear feedback shift registers
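$x^7 + x^3 + 1$
$b_{n+7} \oplus b_{n+3} \oplus b_n = 0, \quad n = 0, 1, 2, \ldots$
$b_{n+7} = b_{n+3} \oplus b_n, \quad n = 0, 1, 2, \ldots$
$b_n = b_{n-4} \oplus b_{n-7}, \quad n = 7, 8, 9, \ldots$
[Figure: shift register with stages $b_n, b_{n-1}, \ldots, b_{n-7}$ and an output labeled "out"]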
Disadvantage of Tausworthe generators
while the sequence is good overall, its local behavior may not be
known to perform poorly on the runs-up-and-down test
first-order serial correlation almost 0
suspected that some polynomials may give poor high-order correlation
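The recurrence $b_n = b_{n-4} \oplus b_{n-7}$ for $x^7 + x^3 + 1$ can be simulated in software. This is a minimal sketch, assuming a seed of seven 1 bits; `tausworthe_bits`, `lfsr_period`, and `chunk` are illustrative names.

```python
def tausworthe_bits(seed_bits, count):
    """Return b_0 .. b_{count-1} with b_n = b_{n-4} XOR b_{n-7}.

    seed_bits supplies b_0 .. b_6 and must not be all zero.
    """
    bits = list(seed_bits)
    while len(bits) < count:
        bits.append(bits[-4] ^ bits[-7])
    return bits[:count]


def lfsr_period(seed_bits):
    """Number of shifts until the 7-bit register returns to its initial contents."""
    start = tuple(seed_bits)
    state, n = start, 0
    while True:
        state = state[1:] + (state[-4] ^ state[-7],)   # shift in the new bit
        n += 1
        if state == start:
            return n


def chunk(bits, k):
    """Group a bit stream into k-bit integers, most significant bit first."""
    return [int("".join(map(str, bits[i:i + k])), 2)
            for i in range(0, len(bits) - k + 1, k)]


stream = tausworthe_bits([1] * 7, 21)
print("".join(map(str, stream)))     # 111111100001110111100
print(lfsr_period([1] * 7))          # 127 = 2**7 - 1
print(chunk(stream[:8], 4))          # [15, 14]
```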
Generating k-bit Random Numbers
k-bit random numbers $x_n$ from binary sequence $b_n$
Generalized feedback shift register method (Lewis & Payne '73)
$x_n = 0.b_n b_{n+s} b_{n+2s} \cdots b_{n+(k-1)s}$
s is a carefully selected delay
$s \geq k$: $x_n$ and $x_j$ have no bits in common for $n \neq j$
s relatively prime to $2^q - 1$: guarantees full period for $x_n$
Advantage
$x_n$ can be generated very efficiently with wide-word shift and
exclusive-or operations
Requires
storing an array of seed numbers
careful initialization of seed array
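As a sketch of why wide-word operations suffice: if each bit position of a word is a delayed copy of the same bit stream, whole words satisfy the same recurrence as the bits. The word width `K = 8`, delay `S = 9`, and the underlying stream $b_n = b_{n-4} \oplus b_{n-7}$ are assumptions chosen for illustration.

```python
def bit_stream(count):
    """b_n = b_{n-4} XOR b_{n-7}, seeded with seven 1 bits."""
    bits = [1] * 7
    while len(bits) < count:
        bits.append(bits[-4] ^ bits[-7])
    return bits


def gfsr_word(bits, n, k, s):
    """k-bit integer whose binary digits are b_n, b_{n+s}, ..., b_{n+(k-1)s}."""
    word = 0
    for i in range(k):
        word = (word << 1) | bits[n + i * s]
    return word


K, S = 8, 9                                    # illustrative width and delay
bits = bit_stream(200)
words = [gfsr_word(bits, n, K, S) for n in range(100)]
fractions = [w / 2 ** K for w in words]        # x_n read as a binary fraction

# Every bit column obeys the bit recurrence, so one word-wide XOR of two
# earlier words yields the next word.
recurrence_holds = all(words[n] == words[n - 4] ^ words[n - 7]
                       for n in range(7, 100))
print(recurrence_holds)   # True
```

Storing the first seven words plays the role of the seed array mentioned above.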
Extended Fibonacci Generators
Fibonacci sequence: $x_n = x_{n-1} + x_{n-2}$
Fibonacci RNG: $x_n = (x_{n-1} + x_{n-2}) \bmod m$
Properties
not very good randomness
high serial correlation
Extended Fibonacci generator (Marsaglia 1983)
$x_n = (x_{n-5} + x_{n-17}) \bmod 2^k$
state: ring buffer B with 17 values
initialization
save integers in the 17 values (not all integers even)
initialize cursors j = 16, k = 4 for the buffer
generate
x = B[j] + B[k] (reduced mod the modulus)
B[j] = x
j = (j - 1) mod 17; k = (k - 1) mod 17
return x
Properties
passes most statistical tests
period = $2^k(2^{17} - 1)$ (much longer than LCGs)
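The ring-buffer procedure can be written out as a small class. This is a sketch under stated assumptions: the word size is passed as `bits` (to avoid clashing with the cursor named k), seeds are given oldest first, and the class and method names are illustrative.

```python
class ExtendedFibonacci:
    """x_n = (x_{n-5} + x_{n-17}) mod 2**bits using a 17-entry ring buffer."""

    def __init__(self, seeds, bits=32):
        if len(seeds) != 17:
            raise ValueError("need exactly 17 seed values")
        if all(s % 2 == 0 for s in seeds):
            raise ValueError("seed values must not all be even")
        self.mask = (1 << bits) - 1
        # buffer[i] holds x_{n-1-i}; seeds arrive oldest first, so reverse them
        self.buffer = [s & self.mask for s in reversed(seeds)]
        self.j = 16   # cursor at x_{n-17}
        self.k = 4    # cursor at x_{n-5}

    def next(self):
        x = (self.buffer[self.j] + self.buffer[self.k]) & self.mask
        self.buffer[self.j] = x            # overwrite the oldest value
        self.j = (self.j - 1) % 17
        self.k = (self.k - 1) % 17
        return x


gen = ExtendedFibonacci(list(range(1, 18)), bits=16)
print([gen.next() for _ in range(5)])
```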
Some Combined Generators
Can combine 2 or more generators to produce a better one
Adding random numbers from 2 or more generators
if $x_n$ and $y_n$ are random sequences in [0, m-1], then
$w_n = (x_n + y_n) \bmod m$
can be used as a random number
why do this?
can increase period and randomness if two generators have different periods
Exclusive-or random numbers from 2 or more generators
Santha & Vazirani (1984)
xor of 2 random n-bit streams generates a more random sequence
Shuffle
use sequence a to pick which recent element in sequence b to return
Marsaglia & Bray (1964)
keep 100 items of sequence b
use sequence a to select which to return next and replace
claim: better k-distributivity than LFSR methods
problem: not easy to skip long sequence for multi-stream simulations
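The shuffle scheme can be sketched with two generators and a 100-entry table. The selector here is the 16-bit mixed LCG from earlier slides; the parameters of the second LCG are illustrative assumptions, and `shuffle_combine` is a hypothetical name.

```python
def lcg(a, b, m, seed):
    """Yield x_1, x_2, ... for x_n = (a * x_{n-1} + b) mod m."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x


def shuffle_combine(selector, m_selector, source, table_size=100):
    """Use `selector` to pick which stored element of `source` to emit next."""
    table = [next(source) for _ in range(table_size)]
    while True:
        slot = next(selector) * table_size // m_selector   # scale to a table index
        yield table[slot]
        table[slot] = next(source)                         # refill the used slot


seq_a = lcg(25173, 13849, 2 ** 16, seed=1)            # selector (sequence a)
seq_b = lcg(1664525, 1013904223, 2 ** 32, seed=1)     # source (sequence b), assumed parameters
combined = shuffle_combine(seq_a, 2 ** 16, seq_b)
outputs = [next(combined) for _ in range(50)]
print(outputs[:5])
```

Scaling the selector value to a slot uses its high-order bits, which fits the earlier caution about low-order bits of an LCG.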
Seed Selection Issues
Wrong combination of seed and RNG can hurt
especially if RNG is flawed
e.g. seed might be RNG fixed point
Cases
one stream needed
if RNG has full period, then any seed as good as another
multiple streams needed
e.g. queue simulation requires
interarrival time stream
service time stream
requires special care!
Seed Selection Guidelines I
Don't use 0
multiplicative LCGs and Tausworthe generators would stick at 0
Avoid even values
seed should be odd for multiplicative LCG with $m = 2^k$
for full period generators, all non-zero values equally good
Don't subdivide one stream
don't use a single stream for all random variables
might be a strong correlation between items in same stream
Use non-overlapping streams
each stream requires separate seed
don't use the same seed for 2 or more streams!
if seeds are bad, streams will overlap and not be independent
right way: select seeds so streams don't overlap at all
example: need 3 streams of 20,000 numbers
pick $u_0$ as seed for first stream
pick $u_{20{,}000}$ as seed for second stream
pick $u_{40{,}000}$ as seed for third stream
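For an LCG, the seeds $u_{20{,}000}$ and $u_{40{,}000}$ do not require stepping through every intermediate value: composing the affine update with itself by repeated squaring jumps ahead in O(log n) steps. This sketch uses the 16-bit mixed LCG from earlier slides as a stand-in generator; `lcg_jump` is an illustrative name.

```python
def lcg_jump(a, b, m, x, n):
    """Return the value n steps after x under x -> (a x + b) mod m."""
    big_a, big_b = 1, 0                     # accumulated map: x -> big_a * x + big_b
    while n > 0:
        if n & 1:                           # apply the current power of the step map
            big_a, big_b = (a * big_a) % m, (a * big_b + b) % m
        a, b = (a * a) % m, (a * b + b) % m  # square the step map
        n >>= 1
    return (big_a * x + big_b) % m


A, B, M = 25173, 13849, 2 ** 16
u0 = 1
seeds = [lcg_jump(A, B, M, u0, i * 20_000) for i in range(3)]
print(seeds)    # seeds for three non-overlapping streams of 20,000 numbers
```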
Seed Selection Guidelines II
Reuse seeds in successive replications
if simulation experiment is replicated several times
can use seeds from end of previous replication in next one
Don't use random seeds
simulation can't be reproduced
impossible to guarantee multiple streams won't overlap
Myths I
A complex set of operations leads to random results
complicated code does not guarantee a random sequence of numbers that will pass
tests of uniformity and independence
A single test of goodness suffices
sequence 0, 1, ..., m-1
not random but passes chi-square test
will fail run test
use as many tests as possible
Pseudo-random numbers are unpredictable
e.g. can identify LCG parameters with a few numbers and predict the rest of the sequence
LCG unsuitable for cryptographic applications where
unpredictability is desired
Some seeds are better than others
e.g. odd vs. even, avoid particular seeds, etc.
$x_n = (9806\,x_{n-1} + 1) \bmod (2^{17} - 1)$
37,911 is a fixed point!
may be true for some generators, but these should be avoided!
any non-zero seed should produce equally valid results
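Two of the claims above are easy to check numerically: the ramp 0, 1, ..., m-1 gives a perfect chi-square score while consisting of a single ascending run, and 37,911 is a fixed point of the quoted generator. The bin count of 10 and m = 1000 are arbitrary choices for this sketch.

```python
def chi_square_uniform(values, m, bins):
    """Chi-square statistic for integer values in [0, m) against a uniform law."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // m] += 1
    expected = len(values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


m = 1000
ramp = list(range(m))
chi2 = chi_square_uniform(ramp, m, bins=10)
ascending_runs = 1 + sum(1 for prev, cur in zip(ramp, ramp[1:]) if cur < prev)
print(chi2, ascending_runs)          # 0.0 1

modulus = 2 ** 17 - 1
next_value = (9806 * 37911 + 1) % modulus
print(next_value)                    # 37911: the generator never leaves this seed
```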
Myths II
Accurate implementation is not important
period and randomness are guaranteed only if formula is
implemented without overflow or truncation
overflows and truncations can
change the path of a generator
reduce the period
Bits of successive words are equally-randomly distributed
if an algorithm produces a k-bit wide number, randomness is
only guaranteed when all k bits are used
unless specified otherwise, assume any particular bit position
(or sequence thereof) will not be equally random
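The effect of inexact arithmetic can be seen by running one of the earlier candidate generators, $x_n = ((2^{34}+1)x_{n-1} + 1) \bmod 2^{35}$, once with exact integers and once with double-precision floats. Using floats as the "inaccurate implementation" is an assumption of this sketch; the function names are illustrative.

```python
import math

A, B, M = 2 ** 34 + 1, 1, 2 ** 35


def exact_sequence(seed, count):
    """Exact integer arithmetic: Python integers never overflow."""
    xs, x = [], seed
    for _ in range(count):
        x = (A * x + B) % M
        xs.append(x)
    return xs


def float_sequence(seed, count):
    """Same formula in doubles: products beyond 2**53 are rounded."""
    xs, x = [], float(seed)
    for _ in range(count):
        x = math.fmod(A * x + B, M)
        xs.append(int(x))
    return xs


exact = exact_sequence(1, 5)
approx = float_sequence(1, 5)
print(exact)
print(approx)    # agrees on the first value, then follows a different path
```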
What's Used Today: MATLAB
rand function
lagged Fibonacci generator
seed
cache of 32 floating point numbers
combined with a shift register random integer generator
core: j ^= (j<<13); j ^= (j>>17); j ^= (j<<5)
properties:
period: > $2^{1492}$
fairly sure all FP numbers in [e/2, 1-e/2] are generated
$e = 2^{-52}$
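The shift-register core quoted above can be tried in isolation. This sketch assumes a 32-bit unsigned state (the slide does not state the word size), so the left shifts are masked to 32 bits; it covers only that core, not the full `rand` function.

```python
MASK32 = 0xFFFFFFFF


def xorshift_step(j):
    """One 13/17/5 shift-register update on a 32-bit state."""
    j ^= (j << 13) & MASK32
    j ^= j >> 17
    j ^= (j << 5) & MASK32
    return j


j = 1
values = []
for _ in range(3):
    j = xorshift_step(j)
    values.append(j)
print(values[:2])            # [270369, 67634689]
print(xorshift_step(0))      # 0: a zero state never changes
```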
What's Used Today: R
Mersenne-Twister (Matsumoto and Nishimura,1998) [default]
twisted GFSR based on Mersenne primes
seed: 624-dimensional set of 32-bit integers + a cursor
period: $2^{19937} - 1$
equi-distribution in 623 consecutive dimensions (whole period)
[note: variant of MT for independent parallel streams exists too]
Knuth-TAOCP (Knuth, 1997)
GFSR using lagged Fibonacci sequences with subtraction
X[j] = (X[j-100] - X[j-37]) mod $2^{30}$
seed: the set of the 100 last numbers + cyclic shift of buffer
period: about $2^{129}$
Knuth-TAOCP-2002
initialization of GFSR from seed was altered
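The shape of a Mersenne Twister state can be inspected from Python, whose standard `random` module is also an MT19937 implementation. This is offered only as an illustration in a different environment from R; the seed value 2005 is arbitrary.

```python
import random

rng = random.Random(2005)
version, internal, gauss_next = rng.getstate()
state_words, position = internal[:-1], internal[-1]
print(len(state_words), position)    # number of 32-bit state words, then the cursor

# Reproducibility: the same seed regenerates the same sequence.
first = random.Random(2005)
second = random.Random(2005)
same = [first.random() for _ in range(5)] == [second.random() for _ in range(5)]
print(same)    # True
```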
What's Used Today: R (continued)
Wichmann-Hill
seed: integer vector of length 3
seed[i] is in 1:(p[i] - 1)
p is the length 3 vector of primes, p = (30269, 30307, 30323)
cycle length: 6.9536e12 = prod(p-1)/4
reference: Applied Statistics (1984) 33, 123
Marsaglia-Multicarry multiply-with-carry RNG (Marsaglia)
seed: two integers, all values allowed
period: > $2^{60}$
has passed all tests (according to Marsaglia)
Super-Duper (Marsaglia)
doesn't pass the MTUPLE test of the Diehard battery
period: about 4.6*10^18 for most initial seeds
seed: 2 integers (first: all values allowed; second: odd value).
default seeds are the Tausworthe and congruence long integers
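A sketch of the Wichmann-Hill combination: three small multiplicative LCGs with the prime moduli listed above, summed modulo 1. The multipliers 171, 172, and 170 are those of the published Wichmann-Hill algorithm and are not given on the slide; the function name is illustrative, and this is not R's own source code.

```python
PRIMES = (30269, 30307, 30323)       # moduli p from the slide
MULTIPLIERS = (171, 172, 170)        # from the published algorithm, not from the slide


def wichmann_hill_step(seed):
    """Advance the three-component seed and return (new_seed, u) with u in [0, 1)."""
    new_seed = tuple((a * s) % p for a, s, p in zip(MULTIPLIERS, seed, PRIMES))
    u = sum(s / p for s, p in zip(new_seed, PRIMES)) % 1.0
    return new_seed, u


seed = (1, 2, 3)                     # each seed[i] must lie in 1 .. p[i] - 1
samples = []
for _ in range(1000):
    seed, u = wichmann_hill_step(seed)
    samples.append(u)
print(samples[:3])
```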
What's Used Today: Linux
random function
non-linear additive feedback-based generator
state: 8, 32, 64, 128, or 256 bytes
all bits considered random
rand function
bottom 12 bits go through cyclic pattern
higher-order bits more random