Distractions: Attempting to create an Unbiased random number generator (with defined upper bound runtime) from a Biased random number generator

This is based on a problem in Chapter 5 of Introduction to Algorithms. Basically, the problem asks:

Given a random number generator that generates 1's and 0's that is biased (BIASED) such that it generates 1's p fraction of time and 0's (1-p) fraction of the time, write an algorithm that generates 1's and 0's with an even distribution. You do not know p (p is not an input to the algorithm).

Summary: I discuss how I attempted to solve the problem, statistical & performance results of an implementation of the solution, and a problem with the implementation.

Update: from comments and friends' help, a new post summarizing classic / efficient solutions and a second attempt: http://dllahr.blogspot.com/2013/01/2nd-try-unbiased-random-number.html

I assumed that I couldn't just measure p as the start up / initialization of the algorithm. Here's how I attempted to solve this problem:

create an array with 2 entries

The first entry is the cumulative number of 0's returned by BIASED
The second entry is the cumulative number of 1's returned by BIASED

when generating an unbiased number:

determine which array entry is smaller - the smaller indicates the value that we must favor in order to balance the distribution - call this the "target value"

if the first entry is larger than BIASED currently appears to be biased towards 0's
if the second entry is larger than BIASED currently appears to be biased towards 1's

take the ratio of the array entries such that the ratio > 1
call BIASED repeatedly until either the target value is returned or the number of calls equals the ratio

during each call store the results in the array above
NOTE: do not update the ratio

if the target value was returned from BIASED, return this result
if the instead BIASED was called ratio number of times without returning the target value, return the opposite of the target value

i.e. if target value was 1, the opposite value is 0

I implemented the above in c++ (Cygwin g++ to compile), available at github here

https://github.com/dllahr/sandbox/tree/master/intro_to_alg

Algorithm results and performance

The source code for the key class is listed below, but first I'd like to present some data from running this. The biased source had a ratio of 3:1 more 0's than 1's. A single test of the calculation consisted of running the unbiased algorithm 10,000,000 times. The statistics of 200 of these tests are presented in the left columns. For the average, the percentage is the percent different from the expected value of 5,000,000. For the standard deviation, the percentage is the ratio of the standard deviation to the average. For comparison, the right column shows the same calculation using the direct stlib random number generator.

average	5063910	1.278%		5000029	0.001%
stdev	490099.8	10%		1528.178	0.03%

What is most striking to me is the difference between standard deviations of the unbiased algorithm and the standard random number generator. This difference in the standard deviations can easily account for the difference in the averages. The standard deviation from the standard random number generator is inline with the predicted value from a binomial distribution. The unbiased algorithm is not. I can only provide hand-waving rationalizations for this difference - basically, my hunch is that the array that tracks the results of the calls to BIASED is acting like a badly tuned control / feedback loop and it continually overshoots the "correct" ratio that would create unbiased output. This may be because of the digital nature of the problem.

A bigger question is how does the algorithm behave when the bias of BIASED is made worse. This is summarized these charts and the table of data. Within each chart, the total number of loop executions is plotted vs. the ratio of the bias in BIASED. For the first chart the test consisted of 1000 runs, the second chart 10,000 runs, the third 100,000 runs. The "loop execution" that is being measured is the heart of the algorithm - this is the loop that repeatedly calls the BIASED algorithm to attempt to achieve the balanced result.

num_runs	ratio	true_ratio	count	num_loops	buffer0	buffer1	measured_ratio
1000	1	1	490	1002	513	491	1.04
1000	2	3	529	2079	1551	530	2.93
1000	3	7	584	4842	4259	585	7.28
1000	4	15	594	9271	8678	595	14.58
1000	5	31	605	18351	17747	606	29.29
1000	6	63	613	38941	38329	614	62.43
10000	1	1	4964	10002	5039	4965	1.01
10000	2	3	4499	18039	13541	4500	3.01
10000	3	7	5795	46557	40763	5796	7.03
10000	4	15	6006	95736	89731	6007	14.94
10000	5	31	6170	200481	194312	6171	31.49
10000	6	63	6206	395288	389083	6207	62.68
100000	1	1	50046	100000	49955	50047	1.00
100000	2	3	53252	212561	159310	53253	2.99
100000	3	7	56838	454377	397540	56839	6.99
100000	4	15	61591	984469	922879	61592	14.98
100000	5	31	62229	1987277	1925049	62230	30.93
100000	6	63	62658	4006560	3943903	62659	62.94

What really jumps out from the data is the problem that the unbiased algorithm is actually failing to be truly unbiased. The trend is clear that as the bias ratio increases, the bias of unbiased increases - although not as much and in the opposite direction! So, for example, although BIASED is putting out 63 times more 0's than 1's, unbiased in putting out a ratio of 63:37 1's to 0's.

Source code

Most of the above is captured in this class:


#include "biased_rand.h"



class UnbiasedFromBiased {

private:

  unsigned long buffer[2];

  

  BiasedRand * biased_rand;



public:

  UnbiasedFromBiased(BiasedRand * biased_rand);

  ~UnbiasedFromBiased() {}

  

  unsigned short int generate();

  

  BiasedRand * get_biased_rand() {return biased_rand;}

  

  void reset();

  

  //each time the loop executes the value in one of the buffer entries is

  //incremented.  But each buffer starts with a value of 1.

  unsigned long get_num_loop_runs() {return buffer[0] + buffer[1] - 2;}

  

  unsigned long get_buffer_entry(unsigned short int index) {return buffer[index];}

};


#include "unbiased_from_biased.h"



UnbiasedFromBiased::UnbiasedFromBiased(BiasedRand * biased_rand) {

    this->biased_rand = biased_rand;



    reset();

}





void UnbiasedFromBiased::reset() {

    buffer[0] = 1;

    buffer[1] = 1;

}





unsigned short int UnbiasedFromBiased::generate() {



    unsigned short int target_value;

    unsigned int ratio;

    if (buffer[0] > buffer[1]) {

        target_value = 1;

        ratio = buffer[0] / buffer[1];

    } else {

        target_value = 0;

        ratio = buffer[1] / buffer[0];

    }





    unsigned short int i = 0;

    bool notDone = true;

    unsigned short int result;

    while (i < ratio && notDone) {

        result = biased_rand->generate();





        if (target_value == result) {

            notDone = false;

        }



        buffer[result]++;



        i++;

    }



    return result;

}

Distractions

Wednesday, January 16, 2013

Attempting to create an Unbiased random number generator (with defined upper bound runtime) from a Biased random number generator

Algorithm results and performance

Source code

5 comments:

Contributors

Blog Archive

Subscribe To Distractions

Wednesday, January 16, 2013

Attempting to create an Unbiased random number generator (with defined upper bound runtime) from a Biased random number generator

Algorithm results and performance

Source code

5 comments:

Contributors

Blog Archive