Date: Sat, 6 Oct 2001 16:49:15 -0700
Reply-To: "Karsten M. Self" <kmself@IX.NETCOM.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Karsten M. Self" <kmself@IX.NETCOM.COM>
Subject: Re: Friday Fishwrap: 1 Million Random Digits
In-Reply-To: <5.1.0.14.2.20011006234453.0260c320@pop3.powernet.co.uk>; from
John.W@mediscience.co.uk on Sat, Oct 06, 2001 at 11:48:47PM +0100
Content-Type: multipart/signed; micalg=pgp-sha1;
protocol="application/pgp-signature";
on Sat, Oct 06, 2001 at 11:48:47PM +0100, John Whittington (John.W@mediscience.co.uk) wrote:
> At 00:38 05/10/01 -0700, Karsten M. Self wrote (in part):
>
> >A recent discussion brought up the topic of a book published in the
> >1950s consisting of random digits. It was the RAND Corporation's _A
> >Million Random Digits with 100,000 Normal Deviates_, published in 1955,
> >the result of several years' exhaustive work, and the zenith of
> >technological capability at the time..... [snip]
> >
> >By contrast, I just created a file of 1000 random bytes in 17 seconds on a
> >last-year's model laptop.
>
> Karsten, I'm trying to work out how you can have taken so long, evening
> assuming you meant to say 100,000 rather than 1,000 !! With my
> steam-powered 1995/96 PC, I can't make it take more than 4 or 5 seconds to
> write a file of 100,000 random digits (each on a separate line). Were you
> perhaps creating this file on a floppy disk? :-)
What's the quality of your random stream? What are you using to
generate the random data?
Under GNU/Linux, there are two sources of random data, /dev/random, and
/dev/urandom. They differ in their treatment of the entropy pool, but
essentially, it's required that there be sufficient entropy in the
system to create high-quality random data, and will wait for the entropy
pool to regenerate if there isn't enough randomness available.
/dev/urandom will continue returning bytes regardless of available
entropy. This may lead to theoretical attacks in encryption
applications.
Though I haven't run tests on the data, my point was that a high-quality
random data stream was available in a matter of seconds, rather than
years. The time to generate high-quality random data is a function of
the size of the entropy pool and rate of addition of entropy, not of the
processing overhead or disk speed.
Repeating the experiment again today, I find:
# High-quality random data:
$ time dd if=/dev/random of=random count=1000 bs=1000
~48 seconds
# Low-quality random data:
$ time dd if=/dev/urandom of=urandom count=1000 bs=1000
~4 seconds
From man (4) random:
The character special files /dev/random and /dev/urandom (present
since Linux 1.3.30) provide an interface to the kernel's random
number generator. File /dev/random has major device number 1 and
minor device number 8. File /dev/urandom has major device number 1
and minor device number 9.
The random number generator gathers environmental noise from device
drivers and other sources into an entropy pool. The generator also
keeps an estimate of the number of bit of the noise in the entropy
pool. From this entropy pool random numbers are created.
When read, the /dev/random device will only return random bytes
within the estimated number of bits of noise in the entropy pool.
/dev/random should be suitable for uses that need very high quality
randomness such as one-time pad or key generation. When the entropy
pool is empty, reads to /dev/random will block until additional
environmental noise is gathered.
When read, /dev/urandom device will return as many bytes as are
requested. As a result, if there is not sufficient entropy in the
entropy pool, the returned values are theoretically vulnerable to
a cryptographic attack on the algorithms used by the driver.
Knowledge of how to do this is not available in the current
non-classified literature, but it is theoretically possible that
such an attack may exist. If this is a concern in your applica
tion, use /dev/random instead.
From the Linux kernel sources /usr/src/linux/drivers/char:
Sources of randomness from the environment include inter-keyboard
timings, inter-interrupt timings from some interrupts, and other
events which are both (a) non-deterministic and (b) hard for an
outside observer to measure. Randomness from these sources are
added to an "entropy pool", which is mixed using a CRC-like
function. This is not cryptographically strong, but it is adequate
assuming the randomness is not chosen maliciously, and it is fast
enough that the overhead of doing it on every interrupt is very
reasonable. As random bytes are mixed into the entropy pool, the
routines keep an *estimate* of how many bits of randomness have been
stored into the random number generator's internal state.
When random bytes are desired, they are obtained by taking the SHA
hash of the contents of the "entropy pool". The SHA hash avoids
exposing the internal state of the entropy pool. It is believed to
be computationally infeasible to derive any useful information about
the input of SHA from its output. Even if it is possible to analyze
SHA in some clever way, as long as the amount of data returned from
the generator is less than the inherent entropy in the pool, the
output data is totally unpredictable. For this reason, the routine
decreases its internal estimate of how many bits of "true
randomness" are contained in the entropy pool as it outputs random
numbers.
If this estimate goes to zero, the routine can still generate random
numbers; however, an attacker may (at least in theory) be able to
infer the future output of the generator from prior outputs. This
requires successful cryptanalysis of SHA, which is not believed to
be feasible, but there is a remote possibility. Nonetheless, these
numbers should be useful for the vast majority of purposes.
Peace.
--
Karsten M. Self <kmself@ix.netcom.com> http://kmself.home.netcom.com/
What part of "Gestalt" don't you understand? Home of the brave
http://gestalt-system.sourceforge.net/ Land of the free
Free Dmitry! Boycott Adobe! Repeal the DMCA! http://www.freesklyarov.org
Geek for Hire http://kmself.home.netcom.com/resume.html
[application/pgp-signature]