### Table of Contents

### Introduction

This article describes typical ways of generating uniform random numbers, identifies common flaws, and provides improved alternatives. It does not attempt to describe the underlying algorithms for generating random bits (e.g. Mersenne Twister, etc.) but rather ways to use those underlying algorithms in practical programs. Nonetheless, fast and high-quality underlying algorithms are crucial, so I'll recommend looking at the modern algorithms developed by George Marsaglia, particularly the KISS variants and the Xorshift variants. The code given here will assume the existence of the following function:

uint NextUInt32() { /* generate a random 32-bit unsigned integer */ }

### Poor algorithms and sources of bias

Here are some of the typical algorithms in widespread use and the sources of bias in those algorithms.

#### Generating integers in a range

There are two common functions for generating numbers in a range, with signatures like the following:

// Returns a random number from 0 to exclusiveMax-1. exclusiveMax must be > 0. uint Next(uint exclusiveMax); // Returns a random number from min to max (inclusive). min must be less than or equal to max. uint Next(uint min, uint max);

We will focus on the former function, since the latter can be implemented in terms of the former as follows:

uint Next(uint min, uint max) { if(min > max) throw new ArgumentException("The minimum must be less than or equal to the maximum."); uint diff = max - min + 1; return diff != 0 ? Next(diff) + min : NextUInt32(); // if diff == 0 then it's the full 32-bit range }

Similarly, we will focus on unsigned integers since signed integers can be implemented in terms of unsigned:

int Next(int exclusiveMax) { if(exclusiveMax <= 0) throw new ArgumentOutOfRangeException(); return (int)Next((uint)exclusiveMax); } int Next(int min, int max) { if(min > max) throw new ArgumentException("The minimum must be less than or equal to the maximum."); uint diff = (uint)(max - min + 1); return diff != 0 ? (int)Next(diff) + min : (int)NextUInt32(); // if diff == 0 then it's the full 32-bit range }

A typical poor implementation of Next(uint) looks like this:

uint Next(uint exclusiveMax) { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); return NextUInt32() % exclusiveMax; }

#### Sources of bias

There are two main sources of bias in code like the above: modulus bias and range bias.

##### Modulus bias

The code above is a prime example of modulus bias. The modulus operation maps values from the range of NextUInt32 (0
to 2^{32}–1, i.e. 2^{32} different values) onto the output range (0 to
*exclusiveMax*–1, i.e. *exclusiveMax* different values). If the range of NextUInt32 is not evenly
divisible by *exclusiveMax*, then some values will be more likely to be returned than others.
(Imagine the underlying generator returned values from 0 to 9, and *exclusiveMax* is 4. Then 0, 4, and 8, taken
modulo 4, map to 0; 1, 5, and 9 map to 1; 2 and 6 map to 2; and 3 and 7 map to 3. Out of the 10 possibilities, three map
onto each of 0 and 1, and only two map onto each of 2 and 3. Thus, 0 or 1 are 50% more likely to be chosen than 2 or
3.) Although the function may be no less biased than other, "better" methods of generating integers, the distribution of
the bias is important. The simple modulus approach concentrates the bias in the lower portion of the output range. This
is important since a very common use of the random numbers produced by such a function is to compare them against a
threshold. For example:

if(Next(100) < 30) { /* do this 30% of the time */ } if(Next(5) == 0) { /* do this 20% of the time */ }

The concentration of bias in the lower numbers makes these kinds of comparisons particularly biased. The bias may be small in this case, but it exists.

##### Range bias

Range bias is less commonly discussed than modulus bias, but it plagues nearly all random number generators. To "eliminate bias", people generally suggest replacing code like the Next method above with something like this:

uint Next(uint exclusiveMax) { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); return (uint)(NextDouble()*exclusiveMax); // look ma, no bias! or is there? }

But in a sense this is no less biased than the modulus approach if NextDouble is implemented as follows (as it all too often is):

// Returns a random floating point number in the range [0,1). double NextDouble() { return NextUInt32() * (1.0 / 4294967296); // divide by 2^32 }

In both cases, 2^32 values are being mapped directly onto *exclusiveMax* output values. The only difference is
the distribution of the bias. While the modulus approach concentrates the bias in the low numbers, the "multiply by a
floating point number" spreads the bias out. This *is* an improvement, since threshold comparisons will be much
less affected, but for some other uses of random numbers it's no improvement at all.

In fact, the bias of the floating-point method is often worse than that, even in widely used software. The implementation of System.Random.NextDouble in the Microsoft .NET framework does this:

double NextDouble() { return NextInt() * (1.0 / int.MaxValue); // divide by 2^31-1 }

We can do much better when it comes to generating floating-point numbers.

Another example of range bias occurs when the output range is larger than the input range. Consider the following 64-bit integer generator.

ulong Next(ulong exclusiveMax) { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); return (ulong)(NextDouble()*exclusiveMax); }

Even if NextDouble is implemented properly, the double will have at most 52 bits of randomness, whereas
*exclusiveMax* is a 64-bit value. If *exclusiveMax* is larger than 2^{52}, some values can never be
generated. In that case, you might be forced to choose: should you use the modulus approach with a 64-bit generator and
suffer the modulus bias, or use the floating-point approach to eliminate modulus bias and suffer range bias that's
about 2^{64–52} = 4096 times worse than the alternative? Perhaps we don't have to choose.

#### Improved algorithms

Here are some ways to improve upon the typical algorithms to eliminate modulus bias, reduce and bound range bias, and finally eliminate range bias, plus some other tips as well.

##### Smearing the bias

An example of how to smear the bias across the output space (and thus avoid modulus bias) was given above — multiply by a random floating point number in [0,1) — and it is the standard approach. While a good implementation of NextDouble (i.e. one that has the full 52 bits of randomness rather than a mere 31 or 32) will provide acceptably low bias — about one part in a million — when used to generate 32-bit integers, it's slow. There's just no good reason to use floating-point math when you can use much faster fixed-point math instead.

Fixed-point math is where you treat part of an integer as a whole number and the other part as a fractional number.
For example, a 32-bit unsigned integer is normally considered to hold integer values up to 2^{32} (exclusive),
but if you split it into a 20-bit whole number portion and a 12-bit fractional portion, say, it can instead store values
up to 2^{20} = 1048576 (exclusive) with each whole number being divided into 2^{12} = 4096 fractional
parts. (We'll call this a 20:12 fixed-point scheme.) For example, to compute 2.25 + 3.125 (i.e. 2 1/4 + 3 1/8) and
2.25 * 3.125 with such a fixed-point scheme, you could do:

uint a = (2<<12) | (4096/4); // 2 and 1/4th. note that 4096 == 2^12 uint b = (3<<12) | (4096/8); // 3 and 1/8th uint c = a + b; // c == 5.375. addition of fixed-point numbers proceeds normally uint wholePart = c >> 12; // wholePart == 5 uint fraction = c & 4095; // fraction == 1536 and 1536/4096 == 0.375 // multiplication is similar c = (a * b) >> 12; // when multiplying, you have to shift the result back into place wholePart = c >> 12; // wholePart == 7 fraction = c & 4095; // fraction == 128 and 128/4096 == 0.03125. 2.25 * 3.125 == 7.03125, so it's correct

This is not a very practical example, but it demonstrates one way to store non-integer values in integer variables.
For our purposes, we want to duplicate the approach of multiplying a fractional value from [0,1) by *exclusiveMax*.
We already have NextUInt32 which returns a number from 0 to 2^{32}–1. If interpreted as a 64-bit
fixed-point number with a 32-bit whole portion and a 32-bit fractional portion, then NextUInt32 produces a value that
fits snugly in the fractional portion, resulting in a number from [0,1). We can multiply that number by
*exclusiveMax*. Multiplying a fixed-point number by a "regular" number doesn't need any shift, so
(ulong)NextUInt32()**exclusiveMax* is exactly equivalent to NextDouble()**exclusiveMax*. Then we just need to
truncate the result back to an integer. We just extract the whole portion as above, by right-shifting. Putting it all
together, we get the following code. Note that this example makes no attempt to reduce the amount of bias, only to smear
it. Reducing the bias is discussed in the next section.

uint Next(uint exclusiveMax) { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); // NextUInt32 is a fractional number in [0,1) when interpreted as 32:32 fixed-point. multiply it by exclusiveMax and // truncate the fixed-point result to an integer by right-shifting return (uint)(((ulong)exclusiveMax * NextUInt32()) >> 32); // this is about 5x as fast as floating-point }

For generating 64-bit numbers, we can use a similar approach, but it's much more complicated because we'd want an integer type larger than 64 bits to hold the fixed-point result.

ulong Next(ulong exclusiveMax) { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); // NextUInt64 is a fractional number in [0,1) when interpreted as 64:64 fixed-point, but... return (ulong)(((ulong128)exclusiveMax * NextUInt64()) >> 64); // ulong128 doesn't exist! what can we do? }

By the way, NextUInt64 can be implemented as follows (although some underlying generators natively produce 64-bit values):

ulong NextUInt64() { // it's slightly faster to shift and OR than to write the individual halves with 'unsafe' code, at least in C# return ((ulong)NextUInt32() << 32) | NextUInt32(); }

Now to actually perform that 128-bit multiplication. Some languages (like many implementations of C++) provide access to
hardware instructions for natively multiplying 64-bit integers and obtaining the 128-bit result. C# does not, and doing
that in C++ is not portable anyway, but we can emulate it. To multiply two arbitrary-precision integers *a* and
*b*, we first split them into 32-bit chunks *a*_{0} (containing the low 32 bits of *a*),
*a*_{1} (containing the next 32 bits), etc. and the same for *b*. Then *a* =
(*a*_{0}**s*^{0} + *a*_{1}**s*^{1} + ... +
*a*_{N}**s*^{N}) where *s* is two to the power of the word size, or 2^{32} in our
case. The result of the multiplication needs a number of bits equal to the sum of the numbers of bits needed to represent
*a* and *b*.

In the specific case of multiplying two 64-bit numbers, we have *a*_{0}, *a*_{1},
*b*_{0}, and *b*_{1}, and the result could be up to 128 bits, which we'll place in the result
values *r*_{0} through *r*_{3}. We have *r* = *a***b* =
(*a*_{0}**s*^{0} + *a*_{1}**s*^{1}) *
(*b*_{0}**s*^{0} + *b*_{1}**s*^{1}) =
(*a*_{0}**s*^{0} * *b*_{0}**s*^{0}) +
(*a*_{0}**s*^{0} * *b*_{1}**s*^{1}) +
(*a*_{1}**s*^{1} * *b*_{0}**s*^{0}) +
(*a*_{1}**s*^{1} * *b*_{1}**s*^{1}) =
(*a*_{0} * *b*_{0}) * *s*^{0} +
(*a*_{0} * *b*_{1}) * *s*^{1} +
(*a*_{1} * *b*_{0}) * *s*^{1} +
(*a*_{1} * *b*_{1}) * *s*^{2}. Essentially, each part of *a* is multiplied
by each part of *b* and the results are combined. The result of each multiplication of parts of
*a* and *b* is a 32-bit multiplication producing a 64-bit product (call it *t*_{0} and
*t*_{1}). Now, since *s* represents our word size, and
the values are split into words, multiplying by *s* simply shifts a value higher in the result. That is to say,
(*a*_{0} * *b*_{0}) * *s*^{0} produces *t*_{0} and
*t*_{1} which get added into the result at *r*_{0} and *r*_{1} respectively,
whereas (*a*_{0} * *b*_{1}) * *s*^{1} produces *t*_{0} and
*t*_{1} which get added into the result at *r*_{1} and *r*_{2}. (The *r*
subscripts are increased by one because the result is multiplied by *s*^{1}.) The only complication is
carry between parts of *r*. Putting it all together into code looks something like this:

ulong t = (ulong)a0*b0; // multiply a0 r0 = (uint)t; t = (t >> 32) + (ulong)a0*b1; r1 = (uint)t; r2 = (uint)(t >> 32); t = r1 + (ulong)a1*b0; // multiply a1 r1 = (uint)t; t = r2 + (t >> 32) + (ulong)a1*b1; r2 = (uint)t; r3 = (uint)(t >> 32);

(For truly arbitrary precision, you could use arrays and put code like this in a loop, although there are fancier
algorithms.) Now in our case, *r* is a fixed-point value that we're going to right-shift by 64, which effectively
means throwing away *r0* and *r1*. So we can eliminate those variables and simplify it to this:

ulong t = (ulong)a0*b0; t = (t >> 32) + (ulong)a0*b1; r2 = (uint)(t >> 32); t = (uint)t + (ulong)a1*b0; t = r2 + (t >> 32) + (ulong)a1*b1; r2 = (uint)t; r3 = (uint)(t >> 32);

Furthermore, what we're going to do with *r2* and *r3* is just return them as a 64-bit number.
That means there's no point in unpacking the 64-bit *t* into *r2* and *r3* when we could just return the
64-bit *t* directly. So we can eliminate *r2* and *r3* as well, and the final function looks like this:

ulong Next(ulong exclusiveMax) { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); uint a0 = NextUInt32(), a1 = NextUInt32(), b0 = (uint)exclusiveMax, b1 = (uint)(exclusiveMax >> 32); ulong t = ((ulong)a0*b0 >> 32) + (ulong)a0*b1; return (t >> 32) + (((uint)t + (ulong)a1*b0) >> 32) + (ulong)a1*b1; }

Even this complicated code is twice as fast as using NextDouble, while providing 64 bits of randomness rather than at 52 (at most) from the double. But as I mentioned before, this method still has too much range bias to be acceptable, and reducing the bias will be addressed in the next section.

##### Bounding the bias

So far, we've seen how to avoid modulus bias while continuing to use fast integer math by means of fixed-point
calculations. But those methods still have significant range bias. There are various ways to compute bias, but a simple
estimation is to take the range of the underlying generator, *r* and divide by the range of the output, *n*:
*w* = *r* / *n*. Then take *bias* = max(abs(floor(*w*) / *w* – 1), ceiling(*w*)
/ *w* – 1). This is based on the intuition that the average/ideal weight given to each result is *r* /
*n* = *w*, but since our weights can only be integers, some values will have a weight of floor(*w*) and
others will have a weight of ceiling(*w*). Dividing those by *w* and subtracting 1 gives the proportional
difference of the actual weights from the ideal weights, and taking the one with the largest magnitude gives the maximum
single-point bias of the function. Another estimation is *bias* = ceiling(*w*) / floor(*w*) – 1, which
describes how much more likely the most likely values are compared to the least likely values. In all cases, the bias is
inversely proportional to *w*, so to reduce bias we must increase *w*. Since *w* = *r* / *n*
and we can't change *n*, we have to increase *r*. In short, to reduce the bias we must map a wider range of
values onto the output (assuming *r* isn't divisible by *n*, in which case there's no bias).

It's useful for a random number generator to have a guaranteed upper bound on bias. For example, let's say we want
our generator to have a bias of at most one in a million. We could simply ensure that *r* / *n* is at least
2^{20} (which is about 1.05 million), or in other words that we generate at least 20 more random bits than the
number of bits needed to represent *n*. This leads to the idea of an adaptive generator function that uses more
random bits as *n* gets larger. Let's return to our previous 32-bit generator function, which used fixed-point
math to smear the bias across the output space.

uint Next(uint exclusiveMax) { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); return (uint)(((ulong)exclusiveMax * NextUInt32()) >> 32); }

The problem with this function is that it generates 32 random bits and the output space can also be up to 32 bits.
This can lead to significant bias as *exclusiveMax* gets large. So lets say we want to apply our bias guarantee.
That means the 32 random bits can only be used for values of *exclusiveMax* up to 32 – 20 = 12 bits. Then we
need a strategy for larger values. For example, our fixed-point approach can be used for products up to 64 bits, i.e. up
to the point where the number of random bits plus the bit size of *exclusiveMax* equals 64. Solving 2*x* + 20
= 64, we get *x* = 22. So for values of *exclusiveMax* up to 2^{22}, we can generate 42 random bits
and use nearly the same fixed-point formula.

But then what do we do with the case of even larger values of *exclusiveMax*? Remember our strategy for
computing the 128-bit product above. In the case of 32-bit values of *exclusiveMax*, we'd need 52 random bits,
resulting in an 84-bit product. Let's start with the same 128-bit product code from above:

ulong t = (ulong)a0*b0; r0 = (uint)t; t = (t >> 32) + (ulong)a0*b1; r1 = (uint)t; r2 = (uint)(t >> 32); t = r1 + (ulong)a1*b0; r1 = (uint)t; t = r2 + (t >> 32) + (ulong)a1*b1; r2 = (uint)t; r3 = (uint)(t >> 32);

But in this case, we know that *b1* = 0 (because *b* is *exclusiveMax*, which fits in a single 32-bit
word). So we can simplify the code with that assumption.

ulong t = (ulong)a0*b0; r0 = (uint)t; r1 = (uint)(t >> 32); t = r1 + (ulong)a1*b0; r1 = (uint)t; r2 = (uint)(t >> 32);

And, since we're generating 52 random bits, we're going to right-shift the product by 52, which means throwing away
all of *r0* and 20 bits of *r1*. Also as before, we don't need to unpack *t* into *r1* and
*r2* at the end, since we're just going to return it (albeit shifted 20 bits). So, we can simplify it into a
single expression:

(((ulong)a0*b0 >> 32) + (ulong)a1*b0) >> 20

You'd plug in NextUInt32() for *a0*, NextBits(20) for *a1*, and *exclusiveMax* for *b0*. That
said, with most underlying generators, it's faster to consume 64 random bits and compute a 96-bit product than to try to
consume only 52 random bits (which requires saving the extra bits in a bit buffer). So in most cases you'll want to use
the code below and use NextUInt32() for *a1* as well, but if your underlying generator is slow, the code above may
be better.

(((ulong)a0*b0 >> 32) + (ulong)a1*b0) >> 32

The resulting function is:

uint Next(uint exclusiveMax) { if(exclusiveMax <= 1u<<12) // if the range is no more than 2^12 and we need up to 32 random bits... { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); // NextUInt32 has the range [0,1) considered as 32:32 fixed-point return (uint)(((ulong)NextUInt32() * exclusiveMax) >> 32); } else if(exclusiveMax <= 1u<<22) // otherwise, if we need up to 42 random bits { // use a 22:42 fixed-point scheme, whose product still fits in 64 bits return (uint)(((((ulong)NextBits(10)<<32) | NextUInt32()) * exclusiveMax) >> 42); } else // otherwise, we need up to 52 random bits { // we could use NextDouble() * exclusiveMax since it provides 52 bits of randomness, but fixed-point math is // still 30-100% faster. compute the multiplication of 64 random bits and the 32-bit range, producing a 96-bit // fixed-point result. we don't compute the bottom 32 bits because we'd right-shift it away anyway return (uint)((((ulong)NextUInt32()*exclusiveMax >> 32) + (ulong)NextUInt32()*exclusiveMax) >> 32); } }

So there we have an adaptive generation function that handles ranges up to 2^{12} using 32 random bits, up to
2^{22} using 42 random bits, and up to 2^{32} using 64 random bits, in each case ensuring that we use at
least 20 more random bits than the size of *exclusiveMax* and thereby keeping the bias below one part in a
million. For simplicity you can drop the first and/or second cases if you like, although it may be a bit slower. In any
case, this is not the fastest approach. Later we'll see a method that is both faster than complex fixed-point math
and is completely unbiased.

A similar approach can be taken for the 64-bit generator function, with the only difference being that larger fixed-point products must be handled. The code would look something like this:

ulong Next(ulong exclusiveMax) { if(exclusiveMax <= uint.MaxValue) // if the range is less than 2^32... { return Next((uint)exclusiveMax); // just use the 32-bit method } else if(exclusiveMax <= 1UL<<38) // otherwise, if we need up to 58 random bits... { // compute the multiplication of 58 random bits and the 38-bit range, producing a 96-bit fixed-point result. // we actually consume 64 random bits because it's faster than going through the bit buffer return ((NextUInt32()*exclusiveMax >> 32) + (NextUInt32()&((1u<<26)-1))*exclusiveMax) >> 26; } else // otherwise, if we need up to 84 random bits... { // compute the multiplication of 96 random bits and the 64-bit range, producing a 160-bit fixed-point result. this is complicated but // still 4x faster than creating a random 84-bit FP107 value and doing the floating-point multiplication uint a0 = NextUInt32(), a1 = NextUInt32(), a2 = NextUInt32(), b0 = (uint)exclusiveMax, b1 = (uint)(exclusiveMax >> 32); ulong t = (((ulong)a0*b0) >> 32) + (ulong)a0*b1; t = (uint)(t >> 32) + (((uint)t + (ulong)a1*b0) >> 32) + (ulong)a1*b1; return (uint)(t >> 32) + (((uint)t + (ulong)a2*b0) >> 32) + (ulong)a2*b1; } }

You can also try to optimize the number generators to generate exactly log2(*exclusiveMax*) bits when
*exclusiveMax* is a power of two, rather than blindly generating an extra 20 bits, because there's no bias in that
case. It's often not worth it, but more details will be given below.

By the way, the function NextBits, which consumes exactly the given number of bits from the underlying generator (on average), works by saving the extra bits in a bit buffer to be used next time. It can be implemented like this:

uint bitBuffer; int bitsInBuffer; // could be a byte but that may be slightly slower // Returns a random number from 0 to 2^bits-1. uint NextBits(int bits) { uint value; if((uint)bits <= (uint)bitsInBuffer) // if we have enough bits in the buffer... { value = bitBuffer & ((1u<<bits)-1); // extract them bitBuffer >>= bits; bitsInBuffer -= bits; } else if(bits < 32) // otherwise, if we need more bits (but not a full word)... { bits -= bitsInBuffer; // get as many bits as we can from the buffer value = bitBuffer << bits; // leave space for the remaining bits bitBuffer = NextUInt32(); // then refill the buffer value |= bitBuffer & ((1u<<bits)-1); // and grab the remaining bits from it bitBuffer >>= bits; bitsInBuffer = 32 - bits; } else if(bits == 32) // otherwise, if we need a full word, just call NextUInt32() { value = NextUInt32(); } else // bits < 0 || bits > 32 { throw new ArgumentOutOfRangeException(); } return value; } ulong NextBits64(int bits) { if(bits <= 32) return NextBits(bits); else return ((ulong)NextBits(bits-32)<<32) | NextUInt32(); }

In practice and roughly speaking, most underlying generators are so fast that the overhead of managing the bit buffer outweighs the time saved by avoiding extra calls to the underlying generator unless you're saving at least half the calls (e.g. unless you're calling NextBits with 16 or less for a 32-bit underlying generator). But the bit buffer still has great utility, especially for generating booleans and numbers in ranges that are powers of two. For example, NextBits(8) is equivalent to Next(256), but faster.

bool NextBoolean() { if(--bitsInBuffer < 0) // if the bit buffer is empty... { bitsInBuffer = 31; // refill it (minus the one bit we're about to consume) bitBuffer = NextUInt32(); } bool result = (bitBuffer & 1) != 0; bitBuffer >>= 1; return result; }

So now we have random number generators that are free from modulus bias and also have guaranteed upper bounds on range bias, while being faster and less biased than the standard technique of multiplying by a random floating-point number. In the next section we'll see how we can generate numbers even faster and with no bias at all.

P.S. You may want to special-case powers of two. You can detect powers of two easily; *n* is a power of two (or zero)
if *n* & (*n*–1) == 0. Computing log2(*exclusiveMax*) and related values is another story. If you're using a
language that gives access to hardware instructions for counting leading zero bits (CLZ), the base-2 logarithm of a
32-bit power of two is 31 – CLZ(*n*). But if you must write portable code, the expense of computing the
logarithm is usually not worth it. Nonetheless, if you want to attempt it, you can use code like the following.

uint Next(uint exclusiveMax) { if((exclusiveMax & (exclusiveMax-1)) == 0) // if exclusiveMax is a power of two or zero... { return NextBits(Log2(exclusiveMax)); // Log2 will check for exclusiveMax == 0 } ... } // Returns the number of bits needed to represent the value. If the value is zero, zero is returned. static int BitsNeeded(uint value) { int count = 0; if(value > 0xFFFF) { count = 16; value >>= 16; } if(value > 0x00FF) { count += 8; value >>= 8; } if(value > 0x000F) { count += 4; value >>= 4; } if(value > 0x0003) { count += 2; value >>= 2; } if(value > 0x0001) { count += 1; value >>= 1; } return count + (int)value; } // Returns the base-2 logarithm of the value, rounded down. static int Log2(uint value) { if(value == 0) throw new ArgumentOutOfRangeException(); return BitsNeeded(value) - 1; // for faster code, BitsNeeded should be inlined } static int BitsNeeded(ulong value) { int count = 0; uint lo = (uint)(value >> 32); if(lo != 0) count = 32; else lo = (uint)value; if(lo > 0xFFFF) { count += 16; lo >>= 16; } if(lo > 0x00FF) { count += 8; lo >>= 8; } if(lo > 0x000F) { count += 4; lo >>= 4; } if(lo > 0x0003) { count += 2; lo >>= 2; } if(lo > 0x0001) { count += 1; lo >>= 1; } return count + (int)lo; } static int Log2(ulong value) { if(value == 0) throw new ArgumentOutOfRangeException(); return BitsNeeded(value) - 1; // for faster code, BitsNeeded should be inlined }

The following optimization using a look-up table is slightly faster but still rarely fast enough to be worth it.

struct ByteTable { public unsafe fixed byte Entries[256]; } unsafe static int BitsNeeded(uint value) { int count = 0; if(value > 0xFFFF) { count = 16; value >>= 16; } if(value > 0x00FF) { count += 8; value >>= 8; } fixed(byte* p = bitsTable.Entries) return count + p[value]; } unsafe static int BitsNeeded(ulong value) { int count = 0; uint lo = (uint)(value >> 32); if(lo != 0) count = 32; else lo = (uint)value; if(lo > 0xFFFF) { count += 16; lo >>= 16; } if(lo > 0x00FF) { count += 8; lo >>= 8; } fixed(byte* p = bitsTable.Entries) return count + p[lo]; } static unsafe ByteTable CreateBitsNeededTable() { var table = new ByteTable(); for(int i = 1, n = 0; i < 256; i++) { if((i & (i-1)) == 0) n++; // when we see a power of two, increase the bits needed table.Entries[i] = (byte)n; } return table; } static ByteTable bitsTable = CreateBitsNeededTable();

##### Eliminating bias

Perhaps low bias isn't good enough. How can we eliminate bias entirely? In general, to eliminate bias we may have to reject some values from the underlying generator, looping until we find one that's suitable. The number of bits we consume is therefore unbounded, but a good method should terminate quickly. There are several methods but they tend to fall into two general groups — various modulo methods and the power-of-two method — and I'm only going to recommend two of them.

The first method is the modulo method. There are actually many modulo methods, but let's start with a simple one. I
don't recommend this method, but it gets the idea across. The basic idea is that the range of the random numbers must be
divisible by the range of the output. For example, if we want a random number from 0 to *N*–1 (i.e. *N*
possible values), we would like an underlying generator that generates numbers within a range that is a multiple of
*N*. In that case, there would be no bias of any kind if we just did Next() % *N*. But underlying generators
almost always return numbers in a range that's a power of two. So we would find the largest multiple of *N* within
the underlying generator's range — call it *M* — and accept values from the underlying generator if
they're less than *M*, and reject them otherwise. This yields a random number from 0 to *M*–1. Having
reduced the range to a multiple of *N*, we can take that value modulo *N* to get a result without bias.

uint Unbiased(uint exclusiveMax) { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); const ulong Max = 1UL<<32; // 2^32 // take N = exclusiveMax and compute the largest multiple of N (call it M) less than or equal to 2^32. then // limit = M - 1 (so it fits in a uint). there are better ways to compute the limit. (see below.) uint limit = (uint)(Max - Max % exclusiveMax) - 1, r; do r = NextUInt32(); while(r > limit); // r is drawn from a random range that's a multiple of exclusiveMax return r % exclusiveMax; // therefore r % exclusiveMax has no bias }

The method above works by rejecting the last 2^{32} % *exclusiveMax* values from the range of the underlying
generator. A slightly simpler method is to reject the first values from the range instead of the last.

uint Unbiased(uint exclusiveMax) { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); // compute the first value we can accept, which is the same as the number of values we must reject. we would like to // take N = exclusiveMax and find 2^32 % N, but to avoid 64-bit division, we'll instead take (2^32 - N) % N. the // result is the same, since adding or subtracting N doesn't change a value taken modulo N. that is, x % N = // (x + N) % N = (x - N) % N (assuming no overflow or underflow). we also use the trick that (uint)-x = 2^32 - x uint limit = (uint)-(int)exclusiveMax % exclusiveMax, r; // limit == 2^32 % N do r = NextUInt32(); while(r < limit); // r is drawn from a random range that's a multiple of exclusiveMax return r % exclusiveMax; // therefore r % exclusiveMax has no bias }

This also works, but in both cases we need to perform at least two slow modulo operations. There exists a method to get away with a minimum of one modulo operation, and this is the method I recommend.

uint Unbiased(uint exclusiveMax) { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); // let's say N = exclusiveMax = 10. then NextUInt32() % N would be biased because 2^32 is not divisible by 10. we need // to reject 2^32 % 10 = 6 values, leaving a usable range of 2^32 - 6 which IS divisible by 10. two common ways to do // this are to reject random values less than 6 or greater than 2^32-1 - 6 and then return value % N if it wasn't // rejected. the problem with this approach is that it requires performing at least two (slow) modulus operations. so // we use a trick, which is to instead take limit = 2^32 - 10 (not -6) and for each random value r compare r - r%N to // that limit. for values 0 to N-1, r - r%N is zero. (0-0%10 = 0, 1-1%10 = 0, ..., 9-9%10 = 0.) then for N to 2N-1, // r - r%N == N. (10 - 10%10 = 10 - 0 = 10, 11 - 11%10 = 11 - 1 = 10, etc.) thus r - r%N is always a multiple of N. // since limit = 2^32 - N, limit is the largest legal value of r - r%N. (beyond that limit, there isn't enough space // in the range for another full set of N values.) in our example, limit = 2^32 - 10 but it's not until r = 2^32 - 6 // that r - r%N will exceed the limit. when r = 2^32 - 7, r - r%N = 2^32 - 16 and when r = 2^32 - 6, r - r%N = // 2^32 - 6. thus, this approach also excludes six values from the range but requires a minimum of one modulus // operation instead of two uint r, v, limit = (uint)-(int)exclusiveMax; // limit = 2^32 - exclusiveMax do { r = NextUInt32(); v = r % exclusiveMax; } while(r-v > limit); return v; }

How many times might these methods need to loop? You might be afraid that if *exclusiveMax* is near
2^{32} that we'd reject almost the entire range, but that doesn't happen. Let's say it's 2^{32}–1.
Then 2^{32} % *exclusiveMax* is 1, so we'd only exclude a single value from the range. The worst case
occurs when *exclusiveMax* is 2^{31}±1, when we exclude 2^{31}–1 values from the range,
rejecting them nearly half the time. That means there's still a 1/2 chance of returning after one iteration, 1/4 chance
after two, 1/8 after three, etc. Since 1/2 + 2/4 + 3/8 + 4/16 + ... = 2, the average number
of iterations is two. In general, the method is so fast that it beats the more complex fixed-point multiplication
method. Thus I'd recommend the biased methods look like this:

uint Next(uint exclusiveMax) { if(exclusiveMax <= 1u<<12) // if the range is no more than 2^12 and we need up to 32 random bits... { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); // NextUInt32 has the range [0,1) considered as 32:32 fixed-point return (uint)(((ulong)NextUInt32() * exclusiveMax) >> 32); } else // otherwise, the unbiased method is actually faster than larger fixed-point products { uint r, v, limit = (uint)-(int)exclusiveMax; do { r = NextUInt32(); v = r % exclusiveMax; } while(r-v > limit); return v; } } ulong Next(ulong exclusiveMax) { if(exclusiveMax <= uint.MaxValue) // if the range is no more than 2^32... { return Next((uint)exclusiveMax); // just use the 32-bit method } else if(exclusiveMax <= 1UL<<38) // otherwise, if we need up to 58 random bits... { // compute the multiplication of 58 random bits and the 38-bit range, producing a 96-bit fixed-point result. // we actually consume 64 random bits because it's faster than going through the bit buffer return ((NextUInt32()*exclusiveMax >> 32) + (NextUInt32()&((1u<<26)-1))*exclusiveMax) >> 26; } else // otherwise, if we need up to 84 random bits, use the unbiased method, which is faster than a large fixed-point product { ulong r, v, limit = (ulong)-(long)exclusiveMax; do { r = NextUInt64(); v = r % exclusiveMax; } while(r-v > limit); return v; } }

Those are in fact the methods I use in my own code. But there's another way to generate unbiased random numbers,
and it's even faster in some cases. The idea is to generate ceiling(log2(*exclusiveMax*)) random bits and loop until the
result is less than *exclusiveMax*. The worst cases occur when *exclusiveMax* is one greater than a power of
two, when the method takes slightly less than two iterations on average — the exact same worst-case behavior as
the modulo technique. Bad cases are somewhat more common with this method, but it makes up for it by consuming fewer
bits from the underlying generator and using cheaper operations, except for the calculation of
log2(*exclusiveMax*). The expense of calculating the logarithm slows the method down enough that I can't recommend
it unless you have access to a hardware instruction to count leading zero bits. That said, for 64-bit values on a 32-bit
architecture, this method is faster than the modulo method for small to medium values, even without a fast way to
compute the logarithm. Here's an implementation.

uint Unbiased(uint exclusiveMax) { if(exclusiveMax == 0) throw new ArgumentOutOfRangeException(); int bits = BitsNeeded(exclusiveMax-1); uint v; do v = NextBits(bits); while(v >= exclusiveMax); return v; }

In the case of unbiased number generation as well, you might consider special-casing powers of two, although I personally haven't found it to be worthwhile. (Normally when I need a power-of-two range I know it and just call NextBits directly.)

##### Generating better floating-point numbers

This section is short and simple. Above I gave a simple and poor method of generating random floating-point values.

// Returns a random floating point number in the range [0,1). double NextDouble() { return NextUInt32() * (1.0 / 4294967296); // divide by 2^32 }

The main problem with this method is that a double has a 52-bit mantissa but the method only generates 2^{32}
different values. This could be fixed by using more random bits.

// Returns a random floating point number in the range [0,1). double NextDouble() { return NextBits64(52) * (1.0 / (1UL<<52)); // divide by 2^52 }

The second problem is that the method must divide the number to get it in the right range, and division is slow.
Although the division has been optimized to a multiplication by the reciprocal, it's still a bit slower than it has to be.
Instead, we'd like to be able to simply fill the mantissa with 52 random bits using integer operations. To do that, we
must choose an appropriate exponent. Our goal is to return a value within [0,1), so we need an exponent that makes all
the values of the mantissa lie within a range of size 1 (exclusive). Remember that floating-point numbers are basically
numbers of the form *sign* * *mantissa* * 2^{exponent}, where the mantissa is a number within
[1,2) (except for denormalized values which I won't consider here). That is, a mantissa of zero represents 1.0 *
2^{exponent}, while the maximum mantissa represents 1.999... * 2^{exponent}. (We'll ignore
*sign* since we only generate non-negative values.) The mantissa, then, is already in a range of size 1 (exclusive),
exactly what we need. So we want to choose an exponent of zero so the value becomes [1,2) * 2^{0} = [1,2).
Then we just need to subtract 1 to shift it into the range [0,1). Exponents in IEEE754 floating point are "biased",
which just means offset by a fixed value and is their way to allow negative exponents to be represented. So instead of
putting zero in the exponent field, we have to put 0 + *bias*. Putting it all together, we get this.

// Returns a random floating point number in the range [0,1). unsafe double NextDouble() { // generate 52 random mantissa bits and an (unbiased) exponent of 0. it turns out to be significantly faster to use // an integer variable and reinterpret as a double than to use a double variable and reinterpret as an integer ulong n = ((ulong)(NextBits(20) | (1023u<<20)) << 32) | NextUInt32(); return *(double*)&n - 1; // 1.xxxxx * 2^0 lies within [1,2). subtract 1 to get it in [0,1) } unsafe float NextFloat() { uint n = NextBits(23) | (127u<<23); // generate 23 random mantissa bits and an (unbiased) exponent of 0 return *(float*)&n - 1; // 1.xxxxx * 2^0 lies within [1,2). subtract 1 to get it in [0,1) }

##### Gaussian, exponential, and other non-uniform distributions

While this article has primarily focused on uniform distributions, it is also of interest to efficiently generate values from nonuniform distributions. The best algorithm for this that I know of is the Ziggarat algorithm, which works for many unimodal distributions. While I haven't implemented every possible optimization, I do have a decent implementation, which I use for generating numbers from the normal and exponential distributions. See Random.cs in my mathematics library.

## Comments

No comments yet.