Wednesday, January 16, 2013

Attempting to create an Unbiased random number generator (with defined upper bound runtime) from a Biased random number generator

This is based on a problem in Chapter 5 of Introduction to Algorithms.  Basically, the problem asks:
Given a random number generator that generates 1's and 0's that is biased (BIASED) such that it generates 1's p fraction of time and 0's (1-p) fraction of the time, write an algorithm that generates 1's and 0's with an even distribution.  You do not know p (p is not an input to the algorithm).
Summary:  I discuss how I attempted to solve the problem, statistical & performance results of an implementation of the solution, and a problem with the implementation.

Update:  from comments and friends' help, a new post summarizing classic / efficient solutions and a second attempt:

I assumed that I couldn't just measure p as the start up / initialization of the algorithm.  Here's how I attempted to solve this problem:

  1. create an array with 2 entries
    • The first entry is the cumulative number of 0's returned by BIASED
    • The second entry is the cumulative number of 1's returned by BIASED
  2. when generating an unbiased number:
    1. determine which array entry is smaller - the smaller indicates the value that we must favor in order to balance the distribution - call this the "target value"
      • if the first entry is larger than BIASED currently appears to be biased towards 0's
      • if the second entry is larger than BIASED currently appears to be biased towards 1's
    2. take the ratio of the array entries such that the ratio > 1
    3. call BIASED repeatedly until either the target value is returned or the number of calls equals the ratio
      • during each call store the results in the array above
      • NOTE:  do not update the ratio
    4. if the target value was returned from BIASED, return this result
    5. if the instead BIASED was called ratio number of times without returning the target value, return the opposite of the target value
      • i.e. if target value was 1, the opposite value is 0
I implemented the above in c++ (Cygwin g++ to compile), available at github here

Algorithm results and performance

The source code for the key class is listed below, but first I'd like to present some data from running this.  The biased source had a ratio of 3:1 more 0's than 1's.  A single test of the calculation consisted of running the unbiased algorithm 10,000,000 times.  The statistics of 200 of these tests are presented in the left columns.  For the average, the percentage is the percent different from the expected value of 5,000,000.  For the standard deviation, the percentage is the ratio of the standard deviation to the average.  For comparison, the right column shows the same calculation using the direct stlib random number generator.
average 5063910 1.278% 5000029 0.001%
stdev 490099.8 10% 1528.178 0.03%

What is most striking to me is the difference between standard deviations of the unbiased algorithm and the standard random number generator.  This difference in the standard deviations can easily account for the difference in the averages.  The standard deviation from the standard random number generator is inline with the predicted value from a binomial distribution.  The unbiased algorithm is not.  I can only provide hand-waving rationalizations for this difference - basically, my hunch is that the array that tracks the results of the calls to BIASED is acting like a badly tuned control / feedback loop and it continually overshoots the "correct" ratio that would create unbiased output.  This may be because of the digital nature of the problem.

A bigger question is how does the algorithm behave when the bias of BIASED is made worse.  This is summarized these charts and the table of data.  Within each chart, the total number of loop executions is plotted vs. the ratio of the bias in BIASED.  For the first chart the test consisted of 1000 runs, the second chart 10,000 runs, the third 100,000 runs.  The "loop execution" that is being measured is the heart of the algorithm - this is the loop that repeatedly calls the BIASED algorithm to attempt to achieve the balanced result.

num_runs  ratio  true_ratio  count  num_loops  buffer0  buffer1 measured_ratio
1000 1 1 490 1002 513 491 1.04
1000 2 3 529 2079 1551 530 2.93
1000 3 7 584 4842 4259 585 7.28
1000 4 15 594 9271 8678 595 14.58
1000 5 31 605 18351 17747 606 29.29
1000 6 63 613 38941 38329 614 62.43
10000 1 1 4964 10002 5039 4965 1.01
10000 2 3 4499 18039 13541 4500 3.01
10000 3 7 5795 46557 40763 5796 7.03
10000 4 15 6006 95736 89731 6007 14.94
10000 5 31 6170 200481 194312 6171 31.49
10000 6 63 6206 395288 389083 6207 62.68
100000 1 1 50046 100000 49955 50047 1.00
100000 2 3 53252 212561 159310 53253 2.99
100000 3 7 56838 454377 397540 56839 6.99
100000 4 15 61591 984469 922879 61592 14.98
100000 5 31 62229 1987277 1925049 62230 30.93
100000 6 63 62658 4006560 3943903 62659 62.94

What really jumps out from the data is the problem that the unbiased algorithm is actually failing to be truly unbiased.  The trend is clear that as the bias ratio increases, the bias of unbiased increases - although not as much and in the opposite direction!  So, for example, although BIASED is putting out 63 times more 0's than 1's, unbiased in putting out a ratio of 63:37 1's to 0's.

Source code

Most of the above is captured in this class:
#include "biased_rand.h"
class UnbiasedFromBiased {
  unsigned long buffer[2];
  BiasedRand * biased_rand;
  UnbiasedFromBiased(BiasedRand * biased_rand);
  ~UnbiasedFromBiased() {}
  unsigned short int generate();
  BiasedRand * get_biased_rand() {return biased_rand;}
  void reset();
  //each time the loop executes the value in one of the buffer entries is
  //incremented. But each buffer starts with a value of 1.
  unsigned long get_num_loop_runs() {return buffer[0] + buffer[1] - 2;}
  unsigned long get_buffer_entry(unsigned short int index) {return buffer[index];}

#include "unbiased_from_biased.h"
UnbiasedFromBiased::UnbiasedFromBiased(BiasedRand * biased_rand) {
this->biased_rand = biased_rand;
void UnbiasedFromBiased::reset() {
buffer[0] = 1;
buffer[1] = 1;
unsigned short int UnbiasedFromBiased::generate() {
unsigned short int target_value;
unsigned int ratio;
if (buffer[0] > buffer[1]) {
target_value = 1;
  ratio = buffer[0] / buffer[1];
} else {
target_value = 0;
ratio = buffer[1] / buffer[0];
unsigned short int i = 0;
bool notDone = true;
unsigned short int result;
while (i < ratio && notDone) {
  result = biased_rand->generate();
if (target_value == result) {
  notDone = false;
return result;


  1. Do you know the classical method of unbiasing an RNG?

    1. no! I started googling it after I saw your comment. We've been working through introduction to algorithms in journal club and so in general I'm not trying to look up the answer, I'm attempting to solve it myself first. Although this one clearly got away from me.

    2. ps. What is the classical method? My google skills are failing me.

    3. I did find this:

      Which makes think that instead of the integer version of the ratio above (which is probably a source of problems - standard deviation and bias) I should calculate a double fraction which represents the fraction of times the biased towards value is generated by BIASED. Then, I call BIASED to get enough bits to generate a float/double (and these results are stored in the buffer). I then compare this generated float/double to the fraction, if it is greater than the fraction, return the biased-towards value, otherwise return the biased-against value.