I keep promising to write up things about my life on here, like my astonishingly fabulous weekend with [livejournal.com profile] ergotia, [livejournal.com profile] lilithmagna, and [livejournal.com profile] calar, but the thing that usually inspires me to write is some geek nonsense, so here we go again.

Everyone knows that radioactive decay is truly random: whether or not a given nucleus will decay in the next second is entirely down to chance. If it doesn't decay this second, the chances that it'll do so in the next are exactly the same. If it survives another million years, it still has the same chance of decaying in the second following that; it doesn't "age". For the nucleus, every second is a game of Russian Roulette, and there are always the same number of bullets in the barrel. That's why you can't measure the "life" of a radioactive source - instead, we measure the "half-life", which is the time it takes before the chances it'll have decayed are 1/2 - or to put it another way, the time before half the nuclei in a given sample have decayed.
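To make that concrete: if each second is an independent trial with decay probability p, the chance of surviving n seconds is (1-p)^n, and the half-life is just the n at which that hits 1/2. A throwaway sketch, with a made-up p:

import math

p = 1e-9  # made-up per-second decay probability, purely for illustration
# Survival after n seconds is (1 - p)**n; solve (1 - p)**n == 0.5 for n.
half_life = math.log(0.5) / math.log(1 - p)
print(half_life)  # ~6.9e8 seconds, or roughly 22 years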

So it's an appealing source of random numbers - stick a Geiger counter next to a radioactive source, and the "clicks" are randomly distributed. If the source is large enough and has a long enough half-life, we can ignore its slow decay and treat each "click" as an independent event - the probability of a click in the next millisecond is the same no matter what the history is.
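With such an idealised counter, the gaps between clicks would be exponentially distributed and independent of each other, which is easy to simulate (the rate here is a made-up figure):

import random

rate = 100.0  # assumed mean clicks per second, purely illustrative
intervals = [random.expovariate(rate) for _ in range(8192)]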

Only it isn't. The problem is that the Geiger counter's sensitivity is impaired after it emits a "click", and it takes a while to return to full sensitivity. So the chances of a "click" in the next millisecond aren't always the same - they depend on how long ago the last click was, in a complex way that depends on the exact hardware of the particle detector you're using.
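The crudest possible model of this is one where the counter is completely blind for a fixed period after each click, so every observed gap is the dead time plus a fresh exponential gap. Real detectors recover gradually, so the true distribution is messier, but the gaps remain independent of one another - which is the one property the schemes below rely on. A toy sketch, with made-up numbers:

import random

rate, dead_time = 100.0, 1e-3  # both figures made up for illustration
# Each observed gap is at least the dead time; by memorylessness, what's
# left over is a fresh exponential gap. The gaps are no longer
# exponentially distributed, but they're still independent.
observed = [dead_time + random.expovariate(rate) for _ in range(8192)]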

If this weren't so, getting random numbers from a Geiger counter would be easy. For each millisecond, record whether there was a "click"; call it "heads" if there was and "tails" if there wasn't. Each millisecond would be independent of the others, so we could treat it as the toss of a coin. It's a biased coin, more likely to come down on one side than the other, but that's OK, because there is a very effective way to turn a sequence of biased but independent coin flips into a sequence of unbiased, independent ones: the Advanced Multi-Level Strategy, or AMLS for short, developed by Yuval Peres. It doesn't even need to know the bias, and given enough data it makes very efficient use of the available randomness.
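Since the code further down calls it, here's one way amls might look. This is a sketch of Peres's procedure: von Neumann's classic trick (take the flips in pairs; a pair that differs yields one fair bit) applied recursively to the two streams of leftover information, the XOR of each pair and the shared value of each equal pair:

def amls(flips):
    if len(flips) < 2:
        return []
    fair, xors, sames = [], [], []
    for a, b in zip(flips[0::2], flips[1::2]):
        if a != b:
            fair.append(a)     # von Neumann: unequal pairs give fair bits
        else:
            sames.append(a)    # equal pairs: remember the shared value
        xors.append(a != b)    # the pair's XOR, recursed on below
    return fair + amls(xors) + amls(sames)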

However, we can't apply this strategy directly because of the "recovery time" of the Geiger counter; a different approach is needed, one that makes no assumptions about how the intervals between clicks are distributed except that they are independent.



HotBits is a website that lets you download random bits generated from Geiger counter clicks. HotBits simply takes pairs of successive intervals and compares them: if the first is bigger, emit 0; if the second is bigger, emit 1; if they're equal, skip the pair and move on. This produces slightly under 0.5 bits per click.
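A sketch of that pairing scheme (the function name is mine, not HotBits'):

def hotbits_bits(intervals):
    bits = []
    for a, b in zip(intervals[0::2], intervals[1::2]):
        # Emit 0 if the first gap is longer, 1 if the second is, and
        # skip the pair entirely on a tie.
        if a > b:
            bits.append(0)
        elif a < b:
            bits.append(1)
    return bits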

A better strategy would be to estimate the median interval length, then emit Heads for an interval that falls under it and Tails for one that falls over. If you estimate the median wrong there will be a bias, but feeding the stream of coin flips to AMLS still yields a guaranteed unbiased bit stream. This produces slightly under 1 bit/click.
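A sketch of that, using the amls() above and taking the median estimate as a parameter (in practice it might come from earlier calibration data):

def threshold_bits(intervals, threshold):
    # A wrong threshold only biases the flips; AMLS strips the bias out.
    flips = [x < threshold for x in intervals]
    return amls(flips)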

Today, at the cost of rather more complexity, I developed the following strategy, which does rather better: against data such as HotBits uses, taking intervals in blocks of 8192, it produces around 5.72 bits/click. It's inspired by QuickSort and should be pretty easy to implement in C; the following code is in Python, but should be readable by any programmer:
def extract_juice(data):
    # One pivot plus at least two samples are needed before any
    # comparison can yield a bit.
    if len(data) < 3:
        return []
    # Take the first sample as the pivot instead of estimating a median.
    pivot, rest = data[0], data[1:]
    # Compare the rest against the pivot, then unbias the flips with AMLS.
    bits = amls([x >= pivot for x in rest])
    # QuickSort-style: partition around the pivot and recurse into each
    # side to extract the randomness this pass couldn't see.
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return bits + extract_juice(left) + extract_juice(right)
Essentially, instead of estimating the median, it takes the very first sample in the block as the pivot and produces coin flips based on that, which are run through AMLS for unbiasing. It then divides the remaining samples into two sub-blocks, one for those less than the pivot and one for those greater than or equal to it, and re-runs the algorithm over both sub-blocks. This produces provably unbiased bits, assuming only that the input intervals are shuffled fairly.
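For completeness, a hypothetical driver that carves the interval stream into blocks of 8192, as in the figure quoted above:

def random_bits(intervals, block_size=8192):
    bits = []
    for i in range(0, len(intervals), block_size):
        bits.extend(extract_juice(intervals[i:i + block_size]))
    return bits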

I've tried quite a few variants on this, but this one seems to be the winner for both simplicity and efficiency.

Re:

Date: 2002-11-28 05:44 am (UTC)
From: [identity profile] mortice.livejournal.com
..and how many heads?

Must have got a bit messy by the end if there was only one...

Date: 2002-11-28 05:53 am (UTC)
From: [identity profile] giolla.livejournal.com
Just a hypothetical one. Oddly, even using blank-firing rounds, volunteers were not forthcoming.
