Given a random number generator that generates 1's and 0's that is biased (BIASED) such that it generates 1's p fraction of time and 0's (1-p) fraction of the time, write an algorithm that generates 1's and 0's with an even distribution. You do not know p (p is not an input to the algorithm).

**Summary**: I discuss how I attempted to solve the problem, statistical & performance results of an implementation of the solution, and a problem with the implementation.

**Update:**from comments and friends' help, a new post summarizing classic / efficient solutions and a second attempt: http://dllahr.blogspot.com/2013/01/2nd-try-unbiased-random-number.html

I assumed that I couldn't just measure p as the start up / initialization of the algorithm. Here's how I attempted to solve this problem:

- create an array with 2 entries
- The first entry is the cumulative number of 0's returned by BIASED
- The second entry is the cumulative number of 1's returned by BIASED
- when generating an unbiased number:
- determine which array entry is smaller - the smaller indicates the value that we must favor in order to balance the distribution - call this the "target value"
- if the first entry is larger than BIASED currently appears to be biased towards 0's
- if the second entry is larger than BIASED currently appears to be biased towards 1's
- take the ratio of the array entries such that the ratio > 1
- call BIASED repeatedly until either the target value is returned or the number of calls equals the ratio
- during each call store the results in the array above
- NOTE: do not update the ratio
- if the target value was returned from BIASED, return this result
- if the instead BIASED was called ratio number of times without returning the target value, return the opposite of the target value
- i.e. if target value was 1, the opposite value is 0

I implemented the above in c++ (Cygwin g++ to compile), available at github here

#### Algorithm results and performance

The source code for the key class is listed below, but first I'd like to present some data from running this. The biased source had a ratio of 3:1 more 0's than 1's. A single test of the calculation consisted of running the unbiased algorithm 10,000,000 times. The statistics of 200 of these tests are presented in the left columns. For the average, the percentage is the percent different from the expected value of 5,000,000. For the standard deviation, the percentage is the ratio of the standard deviation to the average. For comparison, the right column shows the same calculation using the direct stlib random number generator.

average | 5063910 | 1.278% | 5000029 | 0.001% | |

stdev | 490099.8 | 10% | 1528.178 | 0.03% |

What is most striking to me is the difference between standard deviations of the unbiased algorithm and the standard random number generator. This difference in the standard deviations can easily account for the difference in the averages. The standard deviation from the standard random number generator is inline with the predicted value from a binomial distribution. The unbiased algorithm is not. I can only provide hand-waving rationalizations for this difference - basically, my hunch is that the array that tracks the results of the calls to BIASED is acting like a badly tuned control / feedback loop and it continually overshoots the "correct" ratio that would create unbiased output. This may be because of the digital nature of the problem.

A bigger question is how does the algorithm behave when the bias of BIASED is made worse. This is summarized these charts and the table of data. Within each chart, the total number of loop executions is plotted vs. the ratio of the bias in BIASED. For the first chart the test consisted of 1000 runs, the second chart 10,000 runs, the third 100,000 runs. The "loop execution" that is being measured is the heart of the algorithm - this is the loop that repeatedly calls the BIASED algorithm to attempt to achieve the balanced result.

num_runs | ratio | true_ratio | count | num_loops | buffer0 | buffer1 | measured_ratio |

1000 | 1 | 1 | 490 | 1002 | 513 | 491 | 1.04 |

1000 | 2 | 3 | 529 | 2079 | 1551 | 530 | 2.93 |

1000 | 3 | 7 | 584 | 4842 | 4259 | 585 | 7.28 |

1000 | 4 | 15 | 594 | 9271 | 8678 | 595 | 14.58 |

1000 | 5 | 31 | 605 | 18351 | 17747 | 606 | 29.29 |

1000 | 6 | 63 | 613 | 38941 | 38329 | 614 | 62.43 |

10000 | 1 | 1 | 4964 | 10002 | 5039 | 4965 | 1.01 |

10000 | 2 | 3 | 4499 | 18039 | 13541 | 4500 | 3.01 |

10000 | 3 | 7 | 5795 | 46557 | 40763 | 5796 | 7.03 |

10000 | 4 | 15 | 6006 | 95736 | 89731 | 6007 | 14.94 |

10000 | 5 | 31 | 6170 | 200481 | 194312 | 6171 | 31.49 |

10000 | 6 | 63 | 6206 | 395288 | 389083 | 6207 | 62.68 |

100000 | 1 | 1 | 50046 | 100000 | 49955 | 50047 | 1.00 |

100000 | 2 | 3 | 53252 | 212561 | 159310 | 53253 | 2.99 |

100000 | 3 | 7 | 56838 | 454377 | 397540 | 56839 | 6.99 |

100000 | 4 | 15 | 61591 | 984469 | 922879 | 61592 | 14.98 |

100000 | 5 | 31 | 62229 | 1987277 | 1925049 | 62230 | 30.93 |

100000 | 6 | 63 | 62658 | 4006560 | 3943903 | 62659 | 62.94 |

What really jumps out from the data is the problem that the unbiased algorithm is actually failing to be truly unbiased. The trend is clear that as the bias ratio increases, the bias of unbiased increases - although not as much and in the opposite direction! So, for example, although BIASED is putting out 63 times more 0's than 1's, unbiased in putting out a ratio of 63:37 1's to 0's.

#### Source code

Most of the above is captured in this class:

#include "biased_rand.h"class UnbiasedFromBiased {private:unsigned long buffer[2];BiasedRand * biased_rand;public:UnbiasedFromBiased(BiasedRand * biased_rand);~UnbiasedFromBiased() {}unsigned short int generate();BiasedRand * get_biased_rand() {return biased_rand;}void reset();//each time the loop executes the value in one of the buffer entries is//incremented. But each buffer starts with a value of 1.unsigned long get_num_loop_runs() {return buffer[0] + buffer[1] - 2;}unsigned long get_buffer_entry(unsigned short int index) {return buffer[index];}};

#include "unbiased_from_biased.h"UnbiasedFromBiased::UnbiasedFromBiased(BiasedRand * biased_rand) {this->biased_rand = biased_rand;reset();}void UnbiasedFromBiased::reset() {buffer[0] = 1;buffer[1] = 1;}unsigned short int UnbiasedFromBiased::generate() {unsigned short int target_value;unsigned int ratio;if (buffer[0] > buffer[1]) {target_value = 1;ratio = buffer[0] / buffer[1];} else {target_value = 0;ratio = buffer[1] / buffer[0];}unsigned short int i = 0;bool notDone = true;unsigned short int result;while (i < ratio && notDone) {result = biased_rand->generate();if (target_value == result) {notDone = false;}buffer[result]++;i++;}return result;}

Do you know the classical method of unbiasing an RNG?

ReplyDeleteno! I started googling it after I saw your comment. We've been working through introduction to algorithms in journal club and so in general I'm not trying to look up the answer, I'm attempting to solve it myself first. Although this one clearly got away from me.

Deleteps. What is the classical method? My google skills are failing me.

DeleteI did find this:

Deletehttp://stackoverflow.com/questions/8157340/unbiased-random-number-generator

Which makes think that instead of the integer version of the ratio above (which is probably a source of problems - standard deviation and bias) I should calculate a double fraction which represents the fraction of times the biased towards value is generated by BIASED. Then, I call BIASED to get enough bits to generate a float/double (and these results are stored in the buffer). I then compare this generated float/double to the fraction, if it is greater than the fraction, return the biased-towards value, otherwise return the biased-against value.

http://stackoverflow.com/questions/8157340/unbiased-random-number-generator

Delete