ciphergoth | More Geiger counter stuff

I keep promising to write up things about my life on here, like my astonishingly fabulous weekend with

calar, but the thing that usually inspires me to write is some geek nonsense, so here we go again.

Everyone knows that radioactive decay is truly random: whether or not a given nucleus will decay in the next second is entirely down to chance. If it doesn't decay this second, the chances that it'll do so in the next are exactly the same. If it survives another million years, it still has the same chance of decaying in the second following that; it doesn't "age". For the nucleus, every second is a game of Russian Roulette, and there are always the same number of bullets in the barrel. That's why you can't measure the "life" of a radioactive source - instead, we measure the "half-life", which is the time it takes before the chances it'll have decayed are 1/2 - or to put it another way, the time before half the nuclei in a given sample have decayed.

So it's an appealing source of random numbers - stick a Geiger counter next to a radioactive source, and the "clicks" are randomly distributed. If the source is large enough and has a long enough half life, we can ignore its slow decay, and treat each "click" as an independent event - the probability of a click in the next millisecond is the same no matter what the history is.

Only it isn't. The problem is that the Geiger counter takes a while to recover after emitting a "click" - its sensitivity is impaired, and it takes a while to return to full sensitivity. So the chances of there being a "click" in the next millisecond aren't always the same - they depend on how long ago the last click was, in a complex way to do with the exact hardware of the kind of particle detector you're using.

If this wasn't so, getting random numbers from a Geiger counter would be easy. For each millisecond, record whether there was a "click". Call it "heads" if there was, and "tails" if there wasn't; each millisecond would be independent of the others, so we can treat it as the toss of a coin. It's a biased coin, more likely to come down on one side than the other, but that's OK, because there is a very effective way to turn a sequence of biased but independent coin flips into a sequence of unbiased, independent coin flips: the Advanced Multi-Level Strategy or AMLS for short, developed by Yuval Peres. It doesn't even need to know the bias, and it makes very efficient use of the randomness available given enough data.

However, we can't apply this strategy directly because of the "recovery time" of the Geiger counter; a different approach is needed, one that makes no assumptions about how the intervals between clicks are distributed except that they are independent.

HotBits is a website which allows you to download random bits generated by Geiger counter clicks. HotBits simply takes pairs of successive intervals, and compares them. If the first is bigger, emit 0, if the second, emit 1, otherwise skip them and move on. This produces slightly under 0.5 bits per click.

A better strategy would be to estimate the median interval length, and then emit Heads for an interval which is under, and Tails if it's over; if you estimate the median wrong, then there will be a bias, but you feed the stream of coin flips to AMLS to get a guaranteed unbiased bit stream. This will produce slightly under 1 bit/click.

For rather more complexity, I developed today the following strategy which does rather better; against data such as HotBits uses and taking intervals in blocks of 8192, it produces around 5.72 bits/click. It's inspired by QuickSort and should be pretty easy to implement in C: the following code is in Python, but should be readable by any programmer:

def extract_juice(data):
    if len(data) < 3:
        return []
    pivot = data[0]
    data[0:1] = []
    flips = [x >= pivot for x in data]
    bits = amls(flips)
    left = [x for x in data if x < pivot]
    right = [x for x in data if x >= pivot]
    return (bits + extract_juice(left) + extract_juice(right))

Essentially, instead of estimating the median it takes the very first sample in the block to be the pivot, and produces coin flips based on that which are run through AMLS for unbiasing. However, it then divides the block into two sub-blocks, one for those less than the pivot and one for those greater, and re-runs the algorithm over both sub-blocks. This produces provably unbiased bits, assuming only that the input intervals are shuffled fairly.

I've tried quite a few variants on this, but this one seems to be the winner for both simplicity and efficiency.

Flat | Top-Level Comments Only

From:

ciphergoth.livejournal.com

I'm assuming we're measuring the intervals with a high-resolution timer. However, it will not have an inverse-exponential distribution, because of the recovery time; it'll have some weird distribution. If you discard all intervals shorter than the recovery time, and subtract the recovery time from the rest, you get an inverse-exponential distribution, to which you can apply my first algorithm.

However, this leaves you throwing away a lot of data. Interval lengths detected by HotBits are not inverse-exponential at all, but an excellent fit to a normal distribution; this indicates that the source is so hot that the detector pretty much never fully recovers before detecting its next particle. In this situation, your strategies would be very unproductive.

So you need a strategy that doesn't care what the distribution of the intervals is, only that they are all independent. That's what the strategy above does.

I've written a C implementation now...

pavlos.livejournal.com

Hmmm... Maybe this is a naive question. The part I don't understand is why comparing intervals is the thing to do. If you simply measure the length of each tick with a high-resolution timer, why can't you treat each bit of the timer value as a biased coin? Is it because there is no good strategy to remove the correlation between the bits?

If you attempt to rank the intervals by length, as you seem to be doing, the number of bits that you can get per tick before any statistical correction seems limited by the number of ticks (n) in the block. You will get a particular ranking out of n!. Allegedly this can be approximated as (here is how I embarass myself by making some elementary algebra error):

ln(n!) ~= n ln(n) - n

ln(n!) ~= n (ln(n) - 1)
lg(n!) ~= n (ln(n) - 1) / ln(2)

So you should get roughly 94k bits per block, or a yield of 11 bits per sample, and the yield should grow roughly by one bit per tick every time you double the block. That, of course, assumes an infinitely fine timer. You then have to compensate for the fact that, with a finite-resolution timer, some intervals would have the same length and the ranking would be partial.

I've no idea how to begin to analyze the tradeoff between better yield, more computation to extract the random bits out of the ranking, and a better (higher resolution) timer that would result in fewer colliding intervals. All I'm trying to say is that I vaguely expect you need to do something like use the largest practical block in order to make efficient use of the resolution of your timer.

Pavlos

Is it because there is no good strategy to remove the correlation between the bits?

Yes, it's for just this reason. I proposed this very strategy myself on sci.crypt before I realised this problem!

One way to remove the correlation is to have separate streams for the n+1th most significant bit based on the n-bit prefix - but if you start on that road you end up on this strategy.

Basically comparing intervals is the most effective way to assume nothing about their distribution!

At least for HotBits data, the tradeoff isn't as sharp as you might think, probably because the timer isn't high resolution enough. The s.d. of the integers is only 26.4, so colliding interval lengths are common. I've tried block sizes of 10 million with my C implementation, and I still can't get 7 bits/click, so I think I'm approaching a natural limit.

Well, if there are lots of equal intervals then the number of distinguishable rankings is much less than n! It's probably

n! / Product {m! : for every set S of equal intervals m = population (S)}

Please excuse the crappy notation. Whould that be adequate explanation for the limited yield of the HotBits data? If not, I can't see what else might be the problem.

Yes, I think that's why the yield on HotBits isn't higher. Increasing the s.d. by a factor of four increases the yield by 1.27 bits on a block size of 8192 - I expect the effect would be stronger at a larger block size.

S	M	T	W	T	F	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Paul Crowley

More Geiger counter stuff

More Geiger counter stuff

Re: Timing the intervals?

Re: Timing the intervals?

Re: Timing the intervals?

Re: Timing the intervals?

Re: Timing the intervals?

Profile

January 2025

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags