More Geiger counter stuff
Nov. 25th, 2002 04:43 pm
I keep promising to write up things about my life on here, like my astonishingly fabulous weekend with ergotia, lilithmagna, and calar, but the thing that usually inspires me to write is some geek nonsense, so here we go again.
Everyone knows that radioactive decay is truly random: whether or not a given nucleus will decay in the next second is entirely down to chance. If it doesn't decay this second, the chances that it'll do so in the next are exactly the same. If it survives another million years, it still has the same chance of decaying in the second following that; it doesn't "age". For the nucleus, every second is a game of Russian Roulette, and there are always the same number of bullets in the barrel. That's why you can't measure the "life" of a radioactive source - instead, we measure the "half-life", which is the time it takes before the chances it'll have decayed are 1/2 - or to put it another way, the time before half the nuclei in a given sample have decayed.
So it's an appealing source of random numbers - stick a Geiger counter next to a radioactive source, and the "clicks" are randomly distributed. If the source is large enough and has a long enough half life, we can ignore its slow decay, and treat each "click" as an independent event - the probability of a click in the next millisecond is the same no matter what the history is.
Only it isn't. The problem is that the Geiger counter takes a while to recover after emitting a "click": its sensitivity is impaired, and only gradually returns to normal. So the chances of there being a "click" in the next millisecond aren't always the same - they depend on how long ago the last click was, in a complicated way that comes down to the exact hardware of the particle detector you're using.
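To see why this matters, here's a toy simulation of the effect (the rate, dead time and names are made up for illustration, not measured from anything):

import random

def simulate_clicks(rate_hz=10000.0, dead_time_s=1e-4, duration_s=10.0, seed=1):
    # Poisson decays at rate_hz, but the detector ignores any decay arriving
    # within dead_time_s of the last *registered* click.
    rng = random.Random(seed)
    t, last, clicks = 0.0, float("-inf"), []
    while True:
        t += rng.expovariate(rate_hz)    # memoryless gap to the next decay
        if t > duration_s:
            break
        if t - last >= dead_time_s:      # detector has recovered
            clicks.append(t)
            last = t
    return clicks

With the decay rate pushed well above 1/dead_time, the registered intervals bunch up just past the dead time instead of following the memoryless exponential law - which, as it turns out below, is pretty much what real data looks like.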
If this wasn't so, getting random numbers from a Geiger counter would be easy. For each millisecond, record whether there was a "click". Call it "heads" if there was, and "tails" if there wasn't; each millisecond would be independent of the others, so we can treat it as the toss of a coin. It's a biased coin, more likely to come down on one side than the other, but that's OK, because there is a very effective way to turn a sequence of biased but independent coin flips into a sequence of unbiased, independent coin flips: the Advanced Multi-Level Strategy or AMLS for short, developed by Yuval Peres. It doesn't even need to know the bias, and it makes very efficient use of the randomness available given enough data.
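For concreteness, here's a small Python sketch of Peres's iterated von Neumann construction. I'm not promising it matches the exact AMLS formulation, but it has the two properties that matter here: it doesn't need to know the bias, and it recycles the information von Neumann's trick would otherwise throw away.

def peres(bits):
    # Iterated von Neumann extractor (a sketch): input is a list of
    # independent, identically biased 0/1 flips; output is unbiased bits.
    if len(bits) < 2:
        return []
    out, xors, agrees = [], [], []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)        # classic von Neumann: 10 -> 1, 01 -> 0
        else:
            agrees.append(a)     # the pair agreed; keep one bit for the recursion
        xors.append(a ^ b)       # the parity stream still carries entropy
    return out + peres(xors) + peres(agrees)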
However, we can't apply this strategy directly because of the "recovery time" of the Geiger counter; a different approach is needed, one that makes no assumptions about how the intervals between clicks are distributed except that they are independent.

HotBits is a website which allows you to download random bits generated by Geiger counter clicks. HotBits simply takes pairs of successive intervals and compares them: if the first is bigger, emit 0; if the second is bigger, emit 1; if they're equal, skip the pair and move on. This produces slightly under 0.5 bits per click.
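In Python, that scheme is only a few lines (a sketch of the idea as described, not HotBits' actual code):

def hotbits_style(intervals):
    out = []
    for a, b in zip(intervals[0::2], intervals[1::2]):
        if a > b:
            out.append(0)
        elif a < b:
            out.append(1)
        # equal intervals yield nothing and the pair is skipped
    return out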
A better strategy would be to estimate the median interval length, and then emit Heads for an interval which is under it, and Tails if it's over; if you estimate the median wrong there will be a bias, but you can feed the stream of coin flips to AMLS to get a guaranteed unbiased bit stream. This will produce slightly under 1 bit/click.
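Something like this, in other words (a sketch; median_guess is whatever estimate you have to hand, and the peres() sketch above stands in for AMLS):

def threshold_bits(intervals, median_guess):
    # One coin flip per interval: 1 if it's shorter than the guess, 0 otherwise.
    flips = [1 if t < median_guess else 0 for t in intervals]
    # A bad guess just biases the coin; the AMLS stage removes the bias.
    return peres(flips)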
For rather more complexity, today I developed the following strategy, which does rather better; against data such as HotBits uses, and taking intervals in blocks of 8192, it produces around 5.72 bits/click. It's inspired by QuickSort and should be pretty easy to implement in C: the following code is in Python, but should be readable by any programmer:
def extract_juice(data):
    if len(data) < 3:
        return []
    pivot = data[0]
    data[0:1] = []
    flips = [x >= pivot for x in data]
    bits = amls(flips)
    left = [x for x in data if x < pivot]
    right = [x for x in data if x >= pivot]
    return (bits + extract_juice(left) + extract_juice(right))

Essentially, instead of estimating the median it takes the very first sample in the block to be the pivot, and produces coin flips based on that which are run through AMLS for unbiasing. However, it then divides the block into two sub-blocks, one for those less than the pivot and one for those greater, and re-runs the algorithm over both sub-blocks. This produces provably unbiased bits, assuming only that the input intervals are shuffled fairly.
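To see it run end to end, here's one possible wiring (using the peres() sketch above in place of the unspecified amls(), and the 8192-interval blocks mentioned earlier):

amls = peres   # stand-in for the AMLS unbiasing step

def bits_from_intervals(intervals, block=8192):
    out = []
    for i in range(0, len(intervals), block):
        # copy each slice, since extract_juice modifies its argument in place
        out.extend(extract_juice(list(intervals[i:i + block])))
    return out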
I've tried quite a few variants on this, but this one seems to be the winner for both simplicity and efficiency.
no subject
Date: 2002-11-25 09:25 am (UTC)
no subject
Date: 2002-11-25 10:25 am (UTC)
(Just out of interest, how do we know that radio-active decay is truly random?)
no subject
Date: 2002-11-25 10:33 am (UTC)
Further, this greatly reduces the total randomness available. I discussed discarding intervals smaller than the recovery time of the Geiger counter and subtracting that time from the rest, but basically that throws away more randomness than it preserves.
no subject
Date: 2002-11-25 10:58 am (UTC)
However, if there is a significant time interval between clicks you could get even more bits/click (by treating it as a frequency-modulated wave). But then there might be a problem due to there being no upper bound on the wavelength. But how about forcing a click every x ms in that case? Not sure it would work...
Incidentally, couldn't discarding intervals smaller than the recovery time still cause problems because (and I'm thinking in terms of Geiger counters with high voltage across a gas that is ionized and conducts type things here) a second particle entering the detector during the recovery period would cause the recovery time to be extended, i.e. a constant stream of particles might cause the counter never to register anything past the first click?
(Even though it's not solid state I thought a spinning lead disc might be quite cute.)
no subject
Date: 2002-11-26 02:02 am (UTC)
Yes, this would certainly be the case for the HotBits data - basically there are *no* intervals longer than the recovery time!
Timing the intervals?
Date: 2002-11-25 11:40 am (UTC)
If your goal is to extract random bits efficiently then, given that a tick is quite short compared to the quiet interval, you could time the interval with a high-resolution timer. Instead of a sequence of coin flips, you would then have a sequence of random numbers with an inverse-exponential distribution: there is a fixed probability that NONE of the nuclei in your sample decays in the next microsecond, and hence that the interval lasts another microsecond. Your randomness-collection rate would then be limited only by the time resolution of your electronics.
Pavlos
PS. Haven't read your program yet. Need food.
PPS. It just occurred to me that this use makes a Geiger counter doubly suspicious. Imagine trying to travel with one.
Re: Timing the intervals?
Date: 2002-11-25 12:31 pm (UTC)
However, this leaves you throwing away a lot of data. Interval lengths detected by HotBits are not inverse-exponential at all, but an excellent fit to a normal distribution; this indicates that the source is so hot that the detector pretty much never fully recovers before detecting its next particle. In this situation, your strategies would be very unproductive.
So you need a strategy that doesn't care what the distribution of the intervals is, only that they are all independent. That's what the strategy above does.
I've written a C implementation now...
Re: Timing the intervals?
Date: 2002-11-25 09:58 pm (UTC)
If you attempt to rank the intervals by length, as you seem to be doing, the number of bits that you can get per tick before any statistical correction seems limited by the number of ticks (n) in the block. You will get a particular ranking out of n!. Allegedly this can be approximated as (here is how I embarrass myself by making some elementary algebra error):
log2(n!) ≈ n log2(n) - n log2(e)
so
log2(8192!) ≈ 8192 × (13 - 1.44) ≈ 94,700
So you should get roughly 94k bits per block, or a yield of 11 bits per sample, and the yield should grow roughly by one bit per tick every time you double the block. That, of course, assumes an infinitely fine timer. You then have to compensate for the fact that, with a finite-resolution timer, some intervals would have the same length and the ranking would be partial.
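The exact figure is easy to check in Python (a quick sketch):

import math

n = 8192
bits = math.lgamma(n + 1) / math.log(2)   # log2(n!)
print(bits, bits / n)                     # about 94,700 and 11.6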
I've no idea how to begin to analyze the tradeoff between better yield, more computation to extract the random bits out of the ranking, and a better (higher resolution) timer that would result in fewer colliding intervals. All I'm trying to say is that I vaguely expect you need to do something like use the largest practical block in order to make efficient use of the resolution of your timer.
Pavlos
Re: Timing the intervals?
Date: 2002-11-26 01:59 am (UTC)
Yes, it's for just this reason. I proposed this very strategy myself on sci.crypt before I realised this problem!
One way to remove the correlation is to have separate streams for the (n+1)th most significant bit based on the n-bit prefix - but if you start down that road you end up at this strategy.
Basically comparing intervals is the most effective way to assume nothing about their distribution!
At least for HotBits data, the tradeoff isn't as sharp as you might think, probably because the timer isn't high resolution enough. The s.d. of the integers is only 26.4, so colliding interval lengths are common. I've tried block sizes of 10 million with my C implementation, and I still can't get 7 bits/click, so I think I'm approaching a natural limit.
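As a back-of-envelope check: if the intervals really are roughly normal with an s.d. of 26.4 timer units, the entropy of each sample is about 0.5 × log2(2πe × 26.4²) ≈ 6.8 bits, so a ceiling just short of 7 bits/click is about what you'd expect.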
Re: Timing the intervals?
Date: 2002-11-26 10:55 am (UTC)
Please excuse the crappy notation. Would that be adequate explanation for the limited yield of the HotBits data? If not, I can't see what else might be the problem.
Pavlos
Re: Timing the intervals?
Date: 2002-11-26 11:16 am (UTC)
no subject
Date: 2002-11-25 01:47 pm (UTC)
If I understand it right, then if the half-life of a source is a year, the average time between clicks of your Geiger counter will be double what it was a year earlier.
This must mean that statistically the clicks will get further apart in time, even though they occur randomly.
So, hardware-recovery-time aside, is that a factor that influences the randomness, or is it absorbed in the AMLS algorithm, or is it just too small a factor to worry about?
no subject
Date: 2002-11-25 02:59 pm (UTC)
It only matters within each "block" of intervals (perhaps a few thousand) that the algorithm processes, and each "block" is far too short a period of time for detectable change in rate.
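To put rough numbers on it (the click rate here is purely illustrative): at 1,000 clicks a second, a block of 8192 intervals spans about 8 seconds; with a half-life of a year, the decay rate falls by a factor of 2^(8/31,557,600) over that block - under two parts in ten million, hopelessly below anything you could detect against ordinary counting noise.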
no subject
Date: 2002-11-26 04:18 am (UTC)
Not a terribly good example, that, as Russian Roulette isn't really terribly random and it's quite easy to manipulate the chances of being shot in any given game.
no subject
Date: 2002-11-26 05:02 am (UTC)
no subject
Date: 2002-11-26 06:03 am (UTC)
For the nucleus, every second is a game of darts.
or
For the nucleus, every second is a game of dominoes.
etc.
no subject
Date: 2002-11-26 10:54 am (UTC)
You been on the training course?
:-)
no subject
Date: 2002-11-27 02:26 am (UTC)
wasn't so much a training course as an analysis
Re:
Date: 2002-11-27 05:42 am (UTC)
Hmm.. just imagining a room full of thousands of people with guns to their heads, and some scientists with clipboards...
no subject
Date: 2002-11-28 01:43 am (UTC)
Re:
Date: 2002-11-28 05:44 am (UTC)
Must have got a bit messy by the end if there was only one...
no subject
Date: 2002-11-28 05:53 am (UTC)
no subject
Date: 2002-11-26 06:29 am (UTC)
myfun(t1/t2), myfun(t2/t3), myfun(t3/t4) ...

fun myfun(x) {
    x := int(x)
    if (x > 0) { return 1 } else { return 0 }
}
So the problem is that if you have a short interval the next interval is more likely to be longer, and if you have a long interval the next is more likely to be shorter, so in the sequence a 1 is more likely to follow a 0 and vice versa (but not guaranteed to, hence the randomness)? Or to put an extreme case, if you had a sequence of 100 1s, the odds would be very high that the next element would be 0 (because all the t_ns would be getting longer and longer, and therefore t_99 would be considerably higher than the median). Or am I totally wrong here...
(NB, I know this isn't dealing with the recovery period for the counter, but I'm still trying to get to grips with the basics.)
no subject
Date: 2002-11-26 07:03 am (UTC)
myfun(t_1 - t_0), myfun(t_3 - t_2), myfun(t_5 - t_4)...
That's why it only produces 0.5 bits per sample. This strategy copes very well with the detector recovery stuff - it just doesn't produce a lot of data...
no subject
Date: 2002-11-26 08:18 am (UTC)
And note that I'm not comparing (t_0, t_1) and then (t_1, t_2) - I'm doing (t_0, t_1), (t_2, t_3). So it doesn't have the problem you state, but it only produces a bit from every other interval.
no subject
Date: 2002-11-26 08:55 am (UTC)
I've got it now (god, my brain must be slowing down). I'm overlooking the small details in programming, which probably suggests I've been in management too long.
So the image below has a top line with the actual particle emissions (and times marked in red), and a bottom line with what the Geiger counter registers (and times marked in blue).
And because of this the t's are longer than they should be (just as you could get two particles being emitted simultaneously). But that doesn't matter if you compare consecutive pairs. Apologies for being so slow today.
Can you keep running the algorithm over sub-blocks of sub-blocks and so on?
no subject
Date: 2002-11-26 09:10 am (UTC)
no subject
Date: 2002-11-27 11:34 am (UTC)
It's also unnecessary to get your bits directly from the source - it's much more efficient to get a seed of 'truly' random bits, then use either AES in counter mode or the Mersenne Twister (depending on your cryptographic needs, or lack thereof) to generate later random numbers.
I'm very bitter about /dev/random blocking unnecessarily. It's a good example of something really basic which the crypto community as a whole can't get its act together on.
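As a sketch of that seed-and-stretch idea (with SHA-256 in a counter construction standing in for AES purely to keep the example dependency-free - this is an illustration, not a vetted generator):

import hashlib
import os

def stretch(seed, nbytes):
    # Expand a small truly-random seed into a long pseudorandom stream.
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])

seed = os.urandom(32)            # or 32 bytes from the Geiger-counter extractor
stream = stretch(seed, 1 << 20)  # a megabyte of output from a 256-bit seed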
no subject
Date: 2002-11-27 12:16 pm (UTC)
However, while in practice I have great faith in SHA-x and AES, their security is unproven. In fact, not only can we not prove that SHA-x is secure for this purpose - we don't even know how to state in precise formal terms what we're assuming about it.
The discussion arose out of how best to gather random bits for applications with unconditional provable security, such as the One Time Pad. I would nearly always recommend against using the OTP, which in most practical situations is less secure than more usual approaches, but I was intrigued by the engineering challenge of how one might gather bits for use in one while preserving the unconditional security.
In practice there are better sources of randomness than Geiger counters full stop, but if you want the strongest theoretical guarantees then an approach like this is interesting. It might also be useful as a conservative entropy estimator.
On your other point, /dev/random wasn't written by a cryptographer. I don't know if I see the point of /dev/random blocking myself - but then there's always /dev/urandom. AFAIK the Yarrow paper is still the best reference we have on practical design of an entropy-gathering CPRNG. You can't hold the community responsible for something that really isn't our responsibility - it's Ted Ts'o's.
no subject
Date: 2002-11-27 07:28 pm (UTC)
Pavlos
/dev/random
Date: 2009-08-04 11:56 pm (UTC)
I remember big arguments over how to estimate the entropy in the pool. I maintained that it didn't really matter because, if you use a good hash function (which may not be as easy as it once sounded), the generator degrades from "true" randomness to "practical" randomness -- an attacker can't predict the future sequence given reasonable CPU, which is all you can really ask for in any cipher.
It wasn't my idea to block /dev/random when the entropy falls too far; but as you say we also have /dev/urandom.
So yeah, the right way to use a Geiger counter to generate random bits is to strobe a fast clock with it, then hash the clock. And hash every other external event you can get your hands on: disk interrupts, keyboard interrupts, network interrupts, etc. Hashing any source of randomness "distills" its entropy regardless of its distribution, and if it's not really random then you've only lost a little CPU time.
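In outline, that "hash everything" approach looks something like this (a toy sketch, nothing like the real /dev/random pool internals):

import hashlib
import time

class Pool:
    def __init__(self):
        self._state = hashlib.sha256()

    def stir(self, label):
        # Fold in the event label plus a high-resolution timestamp.
        self._state.update(label + time.perf_counter_ns().to_bytes(8, "big"))

    def read(self):
        digest = self._state.copy().digest()
        self._state.update(digest)   # keep the pool moving forward
        return digest

# pool.stir(b"disk"), pool.stir(b"key"), ... then pool.read() for 32 bytes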
just can't resist to comment, even 3+ years late ;)
Date: 2006-02-27 02:46 pm (UTC)
Re: just can't resist to comment, even 3+ years late ;)
Date: 2010-04-03 01:50 pm (UTC)
Re: just can't resist to comment, even 3+ years late ;)
Date: 2010-04-03 05:11 pm (UTC)