## Saturday, January 19, 2013

### An unbiased random number generator: references / classic solutions and a 2nd try

In a previous post I discussed my first attempt to create "an efficient unbiased" random number generator.  Efficient may not have been the right word to use;  friends have subsequently pointed me towards the classic solution to this problem:
http://stackoverflow.com/questions/1986859/unbiased-random-number-generator-using-a-biased-one
The events (p)(1-p) and (1-p)(p) are equiprobable. Taking them as 0 and 1 respectively and discarding the other two pairs of results you get an unbiased random generator.
Basically, treat a 0 followed by a 1 as a 1; treat a 1 followed by a 0 as  0.  If a 0-0 or a 1-1 occur, re-roll.  This is efficient and easy to implement.  One problem however is that it has an unbounded upper limit on its run time.  Related, as the fraction p gets closer to 1 or 0 the number of re-rolls required increases.  Note that  the above StackOverflow answer links out to a solution which solves these problems!
http://www.eecs.harvard.edu/~michaelm/coinflipext.pdf

I modified the algorithm / code from my previous post to attempt to remove the bias that was still present.  The previous algorithm and the new one both share the benefit of having well-defined upper bound on the run time.  The new algorithm calls BIASED N times, and compares the fraction of 1's returned to the previously obtained distribution of 1's.  If the ratio of the current fraction to the previous fraction is greater than 1.0, then it returns a 1, otherwise a zero.  After each run, it updates the number of calls N to make sure that it is high enough such that the increments in fractions is of the result is close enough to the running fraction of measured of 1's and 0's.

The result is an algorithm that is still biased, and that is much less efficient than the Von Neuman algorithm!  The code is available at github anyway:
https://github.com/dllahr/sandbox/tree/master/intro_to_alg/5.1-3/second_attempt

Update:  A thought occurred to me that my algorithm is basically treating this like a control problem.  I'm using a feedback loop to take the observed bias of BIASED (the error) and compensate accordingly as needed.  The reason it does not work perfectly is that basic control theory says that response should be proportional to error, derivative of the error and the integral of the error:
R = a*E + b*(dE / dt) + c*(integral(E dt))

In both of the algorithms I've written I'm just using the integral of the error.

Why does the above algorithm require many more calls?  I think the short answer is provided from the StackOverflow entry linked to above (1st link), as it describes an enhanced Von Neuman solution to the problem (2nd link):
Further on, the paper develops this into generating multiple unbiased bits from the biased source, essentially using two different ways of generating bits from the bit-pairs, and giving a sketch that this is optimal in the sense that it produces exactly the number of bits that the original sequence had entropy in it.
I am not sure exactly how to apply the above to the algorithm I wrote; at best I would say that converting UNBIASED results to numbers / doubles is wasted information.