Saturday, January 19, 2013

An unbiased random number generator: references / classic solutions and a 2nd try

In a previous post I discussed my first attempt to create an "efficient, unbiased" random number generator. Efficient may not have been the right word to use; friends have since pointed me towards the classic solution to this problem:
http://stackoverflow.com/questions/1986859/unbiased-random-number-generator-using-a-biased-one
The events (p)(1-p) and (1-p)(p) are equiprobable. Taking them as 0 and 1 respectively and discarding the other two pairs of results you get an unbiased random generator.
Basically, treat a 0 followed by a 1 as a 1; treat a 1 followed by a 0 as a 0. If a 0-0 or a 1-1 occurs, re-roll. This is efficient and easy to implement (a minimal sketch follows below). One problem, however, is that there is no upper bound on its run time. Relatedly, as the fraction p gets closer to 1 or 0, the number of re-rolls required increases. Note that the above StackOverflow answer links out to a solution which solves these problems!
http://www.eecs.harvard.edu/~michaelm/coinflipext.pdf
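
Here's a minimal sketch of the trick in C++. The function biased() is my stand-in for BIASED, with p hard-coded to 0.7 purely so the sketch runs:

#include <cstdlib>

// Stand-in for BIASED: returns 1 with some probability p
// (hard-coded to 0.7 here purely for demonstration).
int biased() {
    return (std::rand() / (double)RAND_MAX) < 0.7 ? 1 : 0;
}

// von Neumann extractor: draw bits in pairs, map 0-1 to 1 and 1-0 to 0,
// and re-roll on 0-0 or 1-1.  Unbiased, but with no fixed upper bound
// on the number of calls to biased().
int unbiased() {
    while (true) {
        int a = biased();
        int b = biased();
        if (a != b) {
            return b;  // 0 then 1 yields 1; 1 then 0 yields 0
        }
        // a == b (0-0 or 1-1): discard the pair and try again
    }
}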

I modified the algorithm / code from my previous post to attempt to remove the bias that was still present. The previous algorithm and the new one both share the benefit of having a well-defined upper bound on the run time. The new algorithm calls BIASED N times and compares the fraction of 1's returned to the previously observed distribution of 1's. If the ratio of the current fraction to the previous fraction is greater than 1.0, it returns a 1; otherwise, a 0. After each run, it updates the number of calls N to make sure N is high enough that the increments in the measured fraction of the result (steps of 1/N) are small relative to the running measured fraction of 1's and 0's.

The result is an algorithm that is still biased, and that is much less efficient than the von Neumann algorithm! The code is available on GitHub anyway:
https://github.com/dllahr/sandbox/tree/master/intro_to_alg/5.1-3/second_attempt
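
For readers who don't want to dig through the repo, here is a rough paraphrase of the idea in C++. The variable names, the fallback used when there is no history yet, and the rule for growing N are my own simplifications, not the repo code:

#include <algorithm>
#include <cstdlib>

int biased();  // the stand-in biased source from the earlier sketch

// Rough paraphrase of the second attempt: call biased() N times,
// compare this run's fraction of 1's to the running fraction so far,
// and return 1 if the current fraction exceeds the previous one.
int unbiasedSecondAttempt(long &totalOnes, long &totalCalls, long &n) {
    long onesThisRun = 0;
    for (long i = 0; i < n; ++i) {
        onesThisRun += biased();
    }

    double previousFraction = (totalOnes > 0 && totalCalls > 0)
        ? (double)totalOnes / totalCalls
        : 0.5;  // no usable history yet: assume an even split
    double currentFraction = (double)onesThisRun / n;

    totalOnes += onesThisRun;
    totalCalls += n;

    // grow N so the step size 1/N stays small relative to the
    // running fraction (one possible update rule, not the repo's)
    double f = std::min(previousFraction, 1.0 - previousFraction);
    if (f > 0.0) {
        n = std::max(n, (long)(10.0 / f));
    }

    return (currentFraction / previousFraction > 1.0) ? 1 : 0;
}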

Update: A thought occurred to me that my algorithm is basically treating this like a control problem. I'm using a feedback loop to take the observed bias of BIASED (the error) and compensate accordingly. The reason it does not work perfectly is that basic control theory says the response should be proportional to the error, the derivative of the error, and the integral of the error:
R = a*E + b*(dE / dt) + c*(integral(E dt))

In both of the algorithms I've written I'm just using the integral of the error.
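
For concreteness, here is the controller above in discrete form (a textbook PID update, shown only to make the equation concrete; it is not code from my repo). In these terms, my algorithms effectively set a = b = 0 and keep only the c term:

// Discrete version of R = a*E + b*(dE/dt) + c*(integral(E dt)):
// proportional, derivative, and integral terms on the error.
struct PidController {
    double a, b, c;          // proportional, derivative, integral gains
    double previousError;    // E from the previous step
    double integral;         // running integral of E

    double update(double error, double dt) {
        integral += error * dt;
        double derivative = (error - previousError) / dt;
        previousError = error;
        return a * error + b * derivative + c * integral;
    }
};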

Why does the above algorithm require many more calls? I think the short answer is provided by the StackOverflow entry linked above (1st link), as it describes an enhanced von Neumann solution to the problem (2nd link):
Further on, the paper develops this into generating multiple unbiased bits from the biased source, essentially using two different ways of generating bits from the bit-pairs, and giving a sketch that this is optimal in the sense that it produces exactly the number of bits that the original sequence had entropy in it.
I am not sure exactly how to apply the above to the algorithm I wrote; at best I would say that converting UNBIASED results to numbers / doubles wastes information.
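
To make the entropy comparison concrete, the sketch below uses two standard formulas (not code from the paper or my repo): a biased bit carries H(p) = -p*log2(p) - (1-p)*log2(1-p) bits of entropy, while the plain von Neumann trick extracts on average only p*(1-p) unbiased bits per biased bit, since each pair of calls succeeds with probability 2p(1-p) and consumes two bits. The iterated scheme in the paper is what closes that gap:

#include <cmath>
#include <cstdio>

// Shannon entropy of a single biased bit, in bits.
double entropy(double p) {
    return -p * std::log2(p) - (1.0 - p) * std::log2(1.0 - p);
}

// Expected unbiased output bits per biased input bit for the plain
// von Neumann extractor: a pair succeeds with probability 2p(1-p)
// and consumes two input bits, giving p(1-p).
double vonNeumannYield(double p) {
    return p * (1.0 - p);
}

int main() {
    for (double p = 0.1; p <= 0.91; p += 0.2) {
        std::printf("p=%.1f  entropy=%.3f bits  plain von Neumann yield=%.3f bits\n",
                    p, entropy(p), vonNeumannYield(p));
    }
    return 0;
}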


Wednesday, January 16, 2013

Attempting to create an Unbiased random number generator (with defined upper bound runtime) from a Biased random number generator

This is based on a problem in Chapter 5 of Introduction to Algorithms.  Basically, the problem asks:
Given a random number generator (BIASED) that generates 1's and 0's and is biased such that it generates 1's p fraction of the time and 0's (1-p) fraction of the time, write an algorithm that generates 1's and 0's with an even distribution. You do not know p (p is not an input to the algorithm).
Summary:  I discuss how I attempted to solve the problem, statistical & performance results of an implementation of the solution, and a problem with the implementation.

Update:  from comments and friends' help, a new post summarizing classic / efficient solutions and a second attempt:  http://dllahr.blogspot.com/2013/01/2nd-try-unbiased-random-number.html

I assumed that I couldn't just measure p as a start-up / initialization step of the algorithm. Here's how I attempted to solve this problem:

  1. create an array with 2 entries
    • The first entry is the cumulative number of 0's returned by BIASED
    • The second entry is the cumulative number of 1's returned by BIASED
  2. when generating an unbiased number:
    1. determine which array entry is smaller - the smaller indicates the value that we must favor in order to balance the distribution - call this the "target value"
      • if the first entry is larger, then BIASED currently appears to be biased towards 0's
      • if the second entry is larger, then BIASED currently appears to be biased towards 1's
    2. take the ratio of the array entries such that the ratio > 1
    3. call BIASED repeatedly until either the target value is returned or the number of calls equals the ratio
      • during each call store the results in the array above
      • NOTE:  do not update the ratio
    4. if the target value was returned from BIASED, return this result
    5. if instead BIASED was called the ratio number of times without returning the target value, return the opposite of the target value
      • i.e. if target value was 1, the opposite value is 0
I implemented the above in C++ (Cygwin g++ to compile), available at GitHub here
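
For reference, here is a condensed paraphrase of the steps above (the actual implementation is in the repo; details such as how the ratio is rounded to a call count, and the stand-in biased source, are my own guesses):

#include <cstdlib>

// Stand-in for BIASED: returns 1 with some unknown probability p
// (hard-coded here only so the sketch compiles and runs).
int biased() {
    return (std::rand() / (double)RAND_MAX) < 0.7 ? 1 : 0;
}

// Condensed paraphrase of the steps above.  counts[0] and counts[1]
// hold the cumulative number of 0's and 1's returned by biased().
int unbiasedFirstAttempt(long counts[2]) {
    // step 1: the smaller cumulative count marks the "target value"
    int target = (counts[0] <= counts[1]) ? 0 : 1;

    // step 2: ratio of the entries, taken so that ratio > 1; this
    // bounds the number of calls for this round (and, per the NOTE,
    // is not updated during the round)
    long larger  = (counts[0] > counts[1]) ? counts[0] : counts[1];
    long smaller = (counts[0] > counts[1]) ? counts[1] : counts[0];
    long maxCalls = (smaller > 0) ? larger / smaller : 1;
    if (maxCalls < 1) maxCalls = 1;

    // step 3: call biased() until the target appears or the bound is hit
    for (long i = 0; i < maxCalls; ++i) {
        int result = biased();
        ++counts[result];  // keep the cumulative tallies current
        if (result == target) {
            return target;  // step 4: the favored value came up
        }
    }
    // step 5: bound reached without seeing the target; return its opposite
    return 1 - target;
}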