dieharder(1)		    General Commands Manual		  dieharder(1)

NAME
       dieharder  -  A testing and benchmarking	tool for random	number genera-
       tors.

SYNOPSIS
       dieharder [-a] [-d dieharder test number] [-f filename] [-B]
		 [-D output flag [-D output flag] ... ]	[-F] [-c separator]
		 [-g generator number or -1] [-h] [-k ks_flag] [-l]
		 [-L overlap] [-m multiply_p] [-n ntuple]
		 [-p number of p samples] [-P Xoff]
		 [-o filename] [-s seed	strategy] [-S random number seed]
		 [-t number of test samples] [-v verbose flag]
		 [-W weak] [-X fail] [-Y Xtrategy]
		 [-x xvalue] [-y yvalue] [-z zvalue]

dieharder OPTIONS
       -a runs all the tests with standard/default options to create a
	      user-controllable	report.	 To control the	formatting of the  re-
	      port,  see  -D  below.   To control the power of the test	(which
	      uses default values for tsamples that cannot generally be	varied
	      and psamples which generally can)	see -m below as	a "multiplier"
	      of the default number of psamples	(used only in a	-a run).

       -d test number - selects a specific dieharder test by number (or name).

       -f filename - generators	201 or 202 permit either raw binary or
	      formatted	ASCII numbers to be read in from a file	 for  testing.
	      Generator 200 reads in raw binary numbers from stdin.  Note
	      well: many tests with default parameters require a lot of	rands!
	      To see a sample of the (required)	header for ASCII formatted in-
	      put, run

		       dieharder -o -f example.input -t	10

	      and then examine the contents of example.input.  Raw binary  in-
	      put  reads  32  bit  increments  of  the	specified data stream.
	      stdin_input_raw accepts a	pipe from a raw	binary stream.

       -B binary mode (used with -o below) causes output rands to be written
       in raw binary, not formatted ascii.

       -D output flag -	permits	fields to be selected for inclusion in
	      dieharder	output.	 Each flag can be entered as a	binary	number
	      that turns on a specific output field or header or by flag name;
	      flags  are aggregated.  To see all currently known flags use the
	      -F command.

       -F - lists all known flags by name and number.

       -c table	separator - where separator is e.g. ','	(CSV) or ' ' (white-
       space).

       -g generator number - selects a specific	generator for testing.	Using
	      -g -1 causes all known generators	to be printed out to the  dis-
	      play.

       -h prints context-sensitive help	-- usually Usage (this message)	or a
	      test synopsis if entered as e.g. dieharder -d 3 -h.

       -k ks_flag - ks_flag

	      0	is fast	but slightly sloppy for	psamples > 4999	(default).

	      1	 is  MUCH slower but more accurate for larger numbers of psam-
	      ples.

	      2	is slower still, but (we hope) accurate	to  machine  precision
	      for  any	number of psamples up to some as yet unknown numerical
	      upper limit (it has been tested out  to  at  least  hundreds  of
	      thousands).

	      3	is kuiper ks, fast, quite inaccurate for small samples,	depre-
	      cated.

       -l list all known tests.

       -L overlap

	      1	(use overlap, default)

	      0	(don't use overlap)

	      in  operm5 or other tests	that support overlapping and non-over-
	      lapping sample modes.

       -m multiply_p - multiply default # of psamples in -a(ll) runs to crank
	      up the resolution of failure.

       -n ntuple - set ntuple length for tests on short bit strings that per-
	      mit the length to be varied (e.g. rgb bitdist).

       -o filename - output -t count random numbers from current generator to
       file.

       -p count	- sets the number of p-value samples per test (default 100).

       -P Xoff - sets the number of psamples that will cumulate	before decid-
       ing
	      that  a generator	is "good" and really, truly passes even	a -Y 2
	      T2D run.	Currently the default is 100000; eventually it will be
	      set from AES-derived T2D test failure thresholds for fully auto-
	      mated reliable operation,	but for	now it	is  more  a  "boredom"
	      threshold	 set  by how long one might reasonably want to wait on
	      any given	test run.

       -S seed - where seed is a uint.	Overrides the default random seed
	      selection.  Ignored for file or stdin input.

       -s strategy - if	strategy is the	(default) 0, dieharder reseeds (or
	      rewinds) once at the beginning when the random number  generator
	      is  selected  and	then never again.  If strategy is nonzero, the
	      generator	is reseeded or rewound at the beginning	of EACH	 TEST.
	      If  -S  seed  was	specified, or a	file is	used, this means every
	      test is applied to the same sequence (which is useful for	 vali-
	      dation  and  testing  of	dieharder,  but	not a good way to test
	      rngs).  Otherwise	a new random seed is selected for each test.

       -t count	- sets the number of random entities used in each test,	where
	      possible.	 Be warned -- some tests have fixed sample sizes; oth-
	      ers are variable but have	practical minimum sizes.  It  is  sug-
	      gested you begin with the	values used in -a and experiment care-
	      fully on a test by test basis.

       -W weak - sets the "weak" threshold to make the test(s) more or less
	      forgiving	 during	 e.g.  a  test-to-destruction run.  Default is
	      currently	0.005.

       -X fail - sets the "fail" threshold to make the test(s) more or less
	      forgiving	during e.g. a  test-to-destruction  run.   Default  is
	      currently	 0.000001,  which is basically "certain	failure	of the
	      null hypothesis",	the desired  mode  of  reproducible  generator
	      failure.

       -Y Xtrategy - the Xtrategy flag controls	the new	"test to failure"
       (T2F)
	      modes.  These flags and their modes act as follows:

		0  -  just run dieharder with the specified number of tsamples
	      and psamples, do not dynamically modify a	run based on  results.
	      This is the way it has always run, and is	the default.

		1  - "resolve ambiguity" (RA) mode.  If	a test returns "weak",
	      this is an undesired result.  What does that  mean,  after  all?
	      If you run a long test series, you will see occasional weak re-
	      turns for a perfect generator because p is uniformly distributed
	      and will appear in any finite interval from time to time.
	      Even if a	test run returns more than one weak result, you	cannot
	      be certain that the generator is failing.	 RA mode adds psamples
	      (usually in blocks of 100) until the test	result ends up solidly
	      not weak or proceeds to unambiguous failure.   This  is  morally
	      equivalent  to  running  the test	several	times to see if	a weak
	      result is	reproducible, but  eliminates  the  bias  of  personal
	      judgement	 in the	process	since the default failure threshold is
	      very small and very unlikely to be reached by random chance even
	      in many runs.

	      This option should only be used with -k 2.

		2 - "test to destruction" mode.	 Sometimes you	just  want  to
	      know  where  or if a generator will .I ever fail a test (or test
	      series).	-Y 2 causes psamples to	be added 100 at	a time until a
	      test returns an overall pvalue lower than	the failure  threshold
	      or a specified maximum number of psamples	(see -P) is reached.

	      Note  well!  In this mode	one may	well fail due to the alternate
	      null hypothesis -- the test itself is  a	bad  test  and	fails!
	      Many  dieharder tests, despite our best efforts, are numerically
	      unstable or have only approximately known	target	statistics  or
	      are straight up asymptotic results, and will eventually return a
	      failing result even for a	gold-standard generator	(such as AES),
	      or  for the hypercautious	the XOR	generator with AES, threefish,
	      kiss, all loaded at once and xor'd together.  It is therefore
	      safest to use this mode comparatively, executing a T2D run on
	      AES to get an idea of the test failure threshold(s) (something I
	      will  eventually	do and publish on the web so everybody doesn't
	      have to do it independently) and then running it on your	target
	      generator.   Failure with	numbers	of psamples within an order of
	      magnitude	of the AES thresholds should  probably	be  considered
	      possible	test  failures,	 not  generator	failures.  Failures at
	      levels significantly less	than the known gold standard generator
	      failure thresholds are, of course, probably failures of the gen-
	      erator.

	      This option should only be used with -k 2.

       -v verbose flag -- controls the verbosity of the	output for debugging
	      only.  Probably of little	use to non-developers, and  developers
	      can  read	the enum(s) in dieharder.h and the test	sources	to see
	      which flag values turn on output on which routines.  A value of
	      1 results in a highly detailed trace of program activity.

       -x,-y,-z number - Some tests have parameters that can safely be varied
	      from their default value.  For example, in the diehard birthdays
	      test, one can vary the number of birthdays drawn and the number
	      of bits in the interval from which they are drawn.  -x 2048 -y
	      30 alters these two values but should still run fine.  These pa-
	      rameters should be documented internally (where they exist) in
	      the e.g. -d 0 -h visible notes.

	      NOTE WELL: The assessment(s) for the rngs	may, in	fact, be  com-
	      pletely incorrect	or misleading.	There are still	"bad tests" in
	      dieharder,  although we are working to fix and improve them (and
	      try to document them in the test descriptions visible with -d
	      testnumber -h).  In particular, 'Weak' pvalues should occur one
	      test in two hundred, and 'Failed'	pvalues	should occur one  test
	      in  a million with the default thresholds	- that's what p	MEANS.
	      Use them at your Own Risk!  Be Warned!

	      Or better	yet, use the new -Y 1 and -Y 2	resolve	 ambiguity  or
	      test  to	destruction  modes above, comparing to similar runs on
	      one of the as-good-as-it-gets cryptographic generators,  AES  or
	      threefish.

DESCRIPTION
       dieharder

       Welcome	to the current snapshot	of the dieharder random	number tester.
       It encapsulates all of the Gnu Scientific Library (GSL)	random	number
       generators  (rngs) as well as a number of generators from the R statis-
       tical library, hardware sources such as /dev/*random,  "gold  standard"
       cryptographic  quality generators (useful for testing dieharder and for
       purposes	of comparison to new generators) as well  as  generators  con-
       tributed	by users or found in the literature into a single harness that
       can  time them and subject them to various tests	for randomness.	 These
       tests are variously drawn from George Marsaglia's "Diehard  battery  of
       random  number  tests", the NIST	Statistical Test Suite,	and again from
       other sources such as  personal	invention,  user  contribution,	 other
       (open source) test suites, or the literature.

       The  primary  point  of	dieharder  is to make it easy to time and test
       (pseudo)random number generators, including both	software and  hardware
       rngs,  with  a  fully  open source tool.	 In addition to	providing "in-
       stant" access to	testing	of all built-in	generators, users  can	choose
       one  of	three  ways  to	 test  their  own  random number generators or
       sources:	 a unix	pipe of	a raw binary (presumed	random)	 bitstream;  a
       file  containing	 a (presumed random) raw binary	bitstream or formatted
       ascii uints or floats; and embedding your generator in dieharder's GSL-
       compatible rng harness and adding it to the list	 of  built-in  genera-
       tors.   The  stdin  and file input methods are described	below in their
       own section, as is suggested "best practice" for	newbies	to random num-
       ber generator testing.

       An important motivation for using dieharder is  that  the  entire  test
       suite  is  fully	 Gnu  Public  License (GPL) open source	code and hence
       rather than being prohibited from "looking  underneath  the  hood"  all
       users  are  openly  encouraged to critically examine the	dieharder code
       for errors, add new tests or generators or user interfaces, or  use  it
       freely  as is to	test their own favorite	candidate rngs subject only to
       the constraints of the GPL.  As a result	 of  its  openness,  literally
       hundreds	 of  improvements and bug fixes	have been contributed by users
       to date,	resulting in a far stronger and	more reliable test suite  than
       would  have  been  possible with	closed and locked down sources or even
       open sources (such as STS) that lack the	dynamical  feedback  mechanism
       permitting corrections to be shared.

       Even  small  errors  in test statistics permit the alternative (usually
       unstated) null hypothesis to become an important	factor in rng  testing
       -- the unwelcome	possibility that your generator	is just	fine but it is
       the test	that is	failing.  One extremely	useful feature of dieharder is
       that  it	is at least moderately self validating.	 Using the "gold stan-
       dard" aes and threefish cryptographic generators, you can  observe  how
       these  generators  perform on dieharder runs to the same	general	degree
       of accuracy that	you wish to use	on the generators you are testing.  In
       general,	dieharder tests	that consistently fail at any given  level  of
       precision  (selected  with  e.g.	-a -m 10) on both of the gold standard
       rngs (and/or the	better GSL generators, mt19937,	gfsr4, taus) are prob-
       ably unreliable at that precision and it	would hardly be	surprising  if
       they failed your	generator as well.

       Experts	in  statistics are encouraged to give the suite	a try, perhaps
       using any of the	example	calls below at first and then using it	freely
       on  their  own  generators  or as a harness for adding their own	tests.
       Novices (to either statistics or	random number generator	 testing)  are
       strongly	 encouraged  to	read the next section on p-values and the null
       hypothesis and to run the test suite a few times with a more verbose
       output report to	learn how the whole thing works.

QUICK START EXAMPLES
       Examples	 for  how  to set up pipe or file input	are given below.  How-
       ever, it	is recommended that a user play	with some of the built in gen-
       erators to gain familiarity with	dieharder  reports  and	 tests	before
       tackling	 their	own favorite generator or file full of possibly	random
       numbers.

       To see dieharder's default standard test	report for its default genera-
       tor (mt19937) simply run:

	  dieharder -a

       To increase the resolution of possible failures of the standard	-a(ll)
       test, use the -m "multiplier" for the default numbers of pvalues per
       test (which are selected more to make a full test run take an hour or
       so instead of days than because it is truly an exhaustive test se-
       quence), and run:

	  dieharder -a -m 10

       To  test	 a  different generator	(say the gold standard AES_OFB)	simply
       specify the generator on	the command line with a	flag:

	  dieharder -g 205 -a -m 10

       Arguments can be	in any order.  The generator can also be  selected  by
       name:

	  dieharder -g AES_OFB -a

       To  apply  only the diehard opso	test to	the AES_OFB generator, specify
       the test	by name	or number:

	  dieharder -g 205 -d 5

       or

	  dieharder -g 205 -d diehard_opso

       Nearly every aspect or field in dieharder's  output  report  format  is
       user-selectable	by  means  of  display option flags.  In addition, the
       field separator character can be	selected by the	user to	make the  out-
       put  particularly  easy	for  them  to  parse (-c ' ') or import	into a
       spreadsheet (-c ',').  Try:

	  dieharder -g 205 -d diehard_opso -c ',' -D test_name -D pvalues

       to see an extremely terse, easy to import report	or

	  dieharder -g 205 -d diehard_opso -c ' ' -D default -D histogram -D description

       to  see a verbose report	good for a "beginner" that includes a full de-
       scription of each test itself.

       Finally,	the dieharder binary is	remarkably autodocumenting even	if the
       man page	is not available. All users should try the following  commands
       to see what they	do:

	  dieharder -h

       (prints the command synopsis like the one above).

	  dieharder -a -h
	  dieharder -d 6 -h

       (prints the test	descriptions only for -a(ll) tests or for the specific
       test indicated).

	  dieharder -l

       (lists all known	tests, including how reliable rgb thinks that they are
       as things stand).

	  dieharder -g -1

       (lists all known	rngs).

	  dieharder -F

       (lists  all  the	currently known	display/output control flags used with
       -D).

       Both beginners and experts should be aware that the assessment provided
       by dieharder in its standard report should be regarded with great  sus-
       picion.	It is entirely possible	for a generator	to "pass" all tests as
       far  as their individual	p-values are concerned and yet to fail utterly
       when considering	them all together.  Similarly, it is probable  that  a
       rng  will  at  the very least show up as	"weak" on 0, 1 or 2 tests in a
       typical -a(ll) run, and may even "fail" 1 test in one such run in 10 or
       so.   To	understand why this is so, it is necessary to understand some-
       thing of	rng testing, p-values, and the null hypothesis!

P-VALUES AND THE NULL HYPOTHESIS
       dieharder returns "p-values".  To understand what a p-value is and  how
       to use it, it is	essential to understand	the null hypothesis, H0.

       The null	hypothesis for random number generator testing is "This	gener-
       ator  is	 a perfect random number generator, and	for any	choice of seed
       produces an infinitely long, unique sequence of numbers that have all
       the  expected statistical properties of random numbers, to all orders".
       Note well that we know that this	hypothesis is  technically  false  for
       all  software  generators as they are periodic and do not have the cor-
       rect entropy content for	this statement to ever be true.	 However, many
       hardware	generators fail	a priori as well, as they contain subtle  bias
       or  correlations	 due to	the deterministic physics that underlies them.
       Nature is often unpredictable but it is rarely random and the two words
       don't (quite) mean the same thing!

       The null	hypothesis can be practically true,  however.	Both  software
       and  hardware  generators  can  be "random" enough that their sequences
       cannot be distinguished from random ones, at least not easily  or  with
       the available tools (including dieharder!).  Hence the null hypothesis is
       a practical, not	a theoretically	pure, statement.

       To test H0, one uses the rng in question to generate a sequence of
       presumably random numbers.  Using these numbers one can generate any
       one of a wide range of test statistics -- empirically computed numbers
       that, under H0, are random samples drawn from a known distribution
       (samples that may or may not be covariant, depending on whether over-
       lapping sequences of random numbers are used to generate successive
       samples while computing the statistic(s)).  From a knowledge of the
       target  distribution  of	the statistic(s) and the associated cumulative
       distribution function (CDF) and the empirical  value  of	 the  randomly
       generated  statistic(s),	 one can read off the probability of obtaining
       the empirical result if the sequence was	truly random, that is, if  the
       null  hypothesis	is true	and the	generator in question is a "good" ran-
       dom number generator!  This probability is the "p-value"	for  the  par-
       ticular test run.

       For  example,  to  test	a coin (or a sequence of bits) we might	simply
       count the number	of heads and tails in a	very long string of flips.  If
       we assume that the coin is a "perfect coin", we expect  the  number  of
       heads and tails to be binomially	distributed and	can easily compute the
       probability of getting any particular number of heads and tails.	 If we
       compare	our recorded number of heads and tails from the	test series to
       this distribution and find that the probability of getting the count we
       obtained	is very	low with, say, way more	heads than tails we'd  suspect
       the coin	wasn't a perfect coin.	dieharder applies this very test (made
       mathematically precise) and many	others that operate on this same prin-
       ciple  to the string of random bits produced by the rng being tested to
       provide a picture of how	"random" the rng is.
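
       As a concrete illustration of the idea (not code from dieharder it-
       self), the following minimal C sketch computes an approximate two-
       sided p-value for an observed count of heads in n flips of a presumed
       fair coin, using the normal approximation to the binomial; the func-
       tion name and the sample numbers are invented for the example:

          /* coin_pvalue.c -- approximate p-value for a coin-flip count.
           * Build with e.g.:  cc -o coin_pvalue coin_pvalue.c -lm          */
          #include <stdio.h>
          #include <math.h>

          /* Two-sided p-value for observing "heads" heads in n flips of a
           * fair coin, via the normal approximation to the binomial
           * (mean n/2, variance n/4).                                      */
          static double coin_pvalue(double heads, double n)
          {
              double z = fabs(heads - 0.5 * n) / sqrt(0.25 * n);
              return erfc(z / sqrt(2.0));  /* P(|Z| >= z), Z standard normal */
          }

          int main(void)
          {
              /* 5217 heads in 10000 flips is over 4 sigma from the mean, so
               * the p-value is tiny and we suspect the "coin"...           */
              printf("p = %g\n", coin_pvalue(5217, 10000));
              /* ...while 5040 heads in 10000 flips is unremarkable.        */
              printf("p = %g\n", coin_pvalue(5040, 10000));
              return 0;
          }

       dieharder's actual tests are more elaborate, but each one reduces in
       the same way to a statistic with a known target distribution from
       whose CDF a p-value is read off.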

       Note that the usual dogma is that if the	p-value	is  low	 --  typically
       less  than 0.05 -- one "rejects"	the null hypothesis.  In a word, it is
       improbable that one would get the result	obtained if the	generator is a
       good one.  If it	is any other value, one	does not "accept" the  genera-
       tor  as	good, one "fails to reject" the	generator as bad for this par-
       ticular test.  A	"good random number generator" is hence	 one  that  we
       haven't been able to make fail yet!

       This  criterion	is, of course, naive in	the extreme and	cannot be used
       with dieharder!	It makes just as much sense to reject a	generator that
       has p-values of 0.95 or more!  Both of these p-value ranges are equally
       unlikely	on any given test run, and should be returned for (on average)
       5% of all test runs by a	perfect	random number generator.  A  generator
       that  fails  to	produce	 p-values  less	than 0.05 5% of	the time it is
       tested with different seeds is a	bad random number generator, one  that
       fails  the  test	 of the	null hypothesis.  Since	dieharder returns over
       100 pvalues by default per test,	one would expect  any  perfectly  good
       rng  to "fail" such a naive test	around five times by this criterion in
       a single	dieharder run!

       The p-values themselves,	as it turns  out,  are	test  statistics!   By
       their  nature,  p-values	 should	 be uniformly distributed on the range
       0-1.  In	100+ test runs with independent	seeds, one should not be  sur-
       prised  to  obtain 0, 1,	2, or even (rarely) 3 p-values less than 0.01.
       On the other hand obtaining 7 p-values in the range 0.24-0.25, or  see-
       ing that	70 of the p-values are greater than 0.5	should make the	gener-
       ator highly suspect!  How can a user determine when a test is producing
       "too many" of any particular value range	for p?	Or too few?

       Dieharder  does	it  for	you, automatically.  One can in	fact convert a
       set of p-values into a p-value by comparing their distribution  to  the
       expected	one, using a Kolmogorov-Smirnov	test against the expected uni-
       form distribution of p.
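
       For the curious, here is a minimal C sketch of that aggregation step.
       It is not dieharder's own kstest code; it is a textbook Kolmogorov-
       Smirnov test of a sample of p-values against the uniform distribution,
       using the usual asymptotic series for the KS tail probability:

          /* ks_uniform.c -- KS test of a set of p-values against U(0,1).
           * Build with e.g.:  cc -o ks_uniform ks_uniform.c -lm            */
          #include <stdio.h>
          #include <stdlib.h>
          #include <math.h>

          static int cmp_double(const void *a, const void *b)
          {
              double x = *(const double *) a, y = *(const double *) b;
              return (x > y) - (x < y);
          }

          /* Asymptotic KS tail probability
           * Q(lambda) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lambda^2).  */
          static double ks_q(double lambda)
          {
              double sum = 0.0, term;
              int j;
              for (j = 1; j <= 100; j++) {
                  term = 2.0 * ((j & 1) ? 1.0 : -1.0)
                             * exp(-2.0 * j * j * lambda * lambda);
                  sum += term;
                  if (fabs(term) < 1.0e-12) break;
              }
              if (sum < 0.0) sum = 0.0;
              if (sum > 1.0) sum = 1.0;
              return sum;
          }

          /* p-value for the hypothesis that p[0..n-1] are uniform on [0,1). */
          static double kstest_uniform(double *p, int n)
          {
              double d = 0.0, en = (double) n;
              int i;
              qsort(p, (size_t) n, sizeof(double), cmp_double);
              for (i = 0; i < n; i++) {
                  double lo = p[i] - (double) i / en;       /* gap below step */
                  double hi = (double) (i + 1) / en - p[i]; /* gap above step */
                  if (lo > d) d = lo;
                  if (hi > d) d = hi;
              }
              return ks_q((sqrt(en) + 0.12 + 0.11 / sqrt(en)) * d);
          }

          int main(void)
          {
              double p[100];
              int i;
              for (i = 0; i < 100; i++)  /* crude stand-ins for test p-values */
                  p[i] = (rand() + 1.0) / ((double) RAND_MAX + 2.0);
              printf("KS p-value = %g\n", kstest_uniform(p, 100));
              return 0;
          }

       In dieharder itself this aggregation is performed automatically (see
       also the -k option above); the sketch is only meant to show what the
       reported per-test p-value summarizes.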

       These  p-values	obtained  from looking at the distribution of p-values
       should in turn be uniformly distributed and could in principle be  sub-
       jected to still more KS tests in	aggregate.  The	distribution of	p-val-
       ues  for	 a  good generator should be idempotent, even across different
       test statistics and multiple runs.

       A failure of the	distribution of	p-values at any	level  of  aggregation
       signals	trouble.   In fact, if the p-values of any given test are sub-
       jected to a KS test, and	those p-values are  then  subjected  to	 a  KS
       test,  as  we  add more p-values	to either level	we will	either observe
       idempotence of the resulting distribution of p  to  uniformity,	or  we
       will  observe idempotence to a single p-value of	zero!  That is,	a good
       generator will produce a	roughly	uniform	distribution of	 p-values,  in
       the  specific  sense that the p-values of the distributions of p-values
       are themselves roughly uniform and so on	ad infinitum, while a bad gen-
       erator will produce a non-uniform distribution of p-values, and as more
       p-values	drawn from the non-uniform distribution	are added  to  its  KS
       test, at	some point the failure will be absolutely unmistakeable	as the
       resulting p-value approaches 0 in the limit.  Trouble indeed!

       The question is,	trouble	with what?  Random number tests	are themselves
       complex	computational  objects,	 and there is a	probability that their
       code is incorrectly framed or that roundoff or other numerical  --  not
       methodical  -- errors are contributing to a distortion of the distribu-
       tion of some of the p-values obtained.  This is not  an	idle  observa-
       tion;  when  one	 works on writing random number	generator testing pro-
       grams, one is always testing the	tests themselves with "good" (we hope)
       random number generators	so that	egregious failures of the null hypoth-
       esis signal not a bad generator but an error in	the  test  code.   The
       null  hypothesis	 above is correctly framed from	a theoretical point of
       view, but from a	real and practical point of view it should read: "This
       generator is a perfect random number generator, and for any  choice  of
       seed produces an infinitely long, unique sequence of numbers that have
       all the expected	statistical properties of random numbers, to  all  or-
       ders  and  this test is a perfect test and returns precisely correct p-
       values from the test computation."  Observed "failure"  of  this	 joint
       null  hypothesis	 H0'  can come from failure of either or both of these
       disjoint	components, and	comes from the second as often or  more	 often
       than the	first during the test development process.  When one cranks up
       the  "resolution"  of  the  test	 (discussed next) to where a generator
       starts to fail some test	one realizes, or should	realize, that develop-
       ment never ends and that	new test regimes will always reveal new	 fail-
       ures not	only of	the generators but of the code.

       With  that  said, one of	dieharder's most significant advantages	is the
       control that it gives you over a	critical test parameter.  From the re-
       marks above, we can see that we should feel  very  uncomfortable	 about
       "failing"  any  given  random number generator on the basis of a	5%, or
       even a 1%, criterion, especially	 when  we  apply  a  test  suite  like
       dieharder  that	returns	over 100 (and climbing)	distinct test p-values
       as of the last snapshot.	 We want failure to be unambiguous and	repro-
       ducible!

       To  accomplish this, one	can simply crank up its	resolution.  If	we ran
       any given test against a	random number generator	and it returned	 a  p-
       value of	(say) 0.007328,	we'd be	perfectly justified in wondering if it
       is  really  a good generator.  However, the probability of getting this
       result isn't really all that small -- when one uses dieharder for hours
       at a time numbers like this will	definitely happen quite	frequently and
       mean nothing.  If one runs the same test	again (with a  different  seed
       or  part	 of the	random sequence) and gets a p-value of 0.009122, and a
       third time and gets 0.002669 -- well, that's three 1% (or  less)	 shots
       in  a  row and that should happen only one in a million times.  One way
       to clearly resolve failures, then, is to	increase the number of	p-val-
       ues generated in	a test run.  If	the actual distribution	of p being re-
       turned  by  the test is not uniform, a KS test will eventually return a
       p-value that is not some	ambiguous 0.035517 but	is  instead  0.000000,
       with the	latter produced	time after time	as we rerun.

       For  this  reason, dieharder is extremely conservative about announcing
       rng "weakness" or "failure" relative to any given test.	It's  internal
       criterion for these things are currently	p < 0.5% or p >	99.5% weakness
       (at the 1% level	total) and a considerably more stringent criterion for
       failure:	 p  < 0.05% or p > 99.95%.  Note well that the ranges are sym-
       metric -- too high a value of p is just as bad (and  unlikely)  as  too
       low,  and it is critical	to flag	it, because it is quite	possible for a
       rng to be too good, on average, and not to produce enough low  p-values
       on  the	full  spectrum	of  dieharder  tests.  This is where the final
       kstest is of paramount importance, and where the	"histogram" option can
       be very useful to help you visualize the	failure	in the distribution of
       p -- run	e.g.:

	 dieharder [whatever] -D default -D histogram

       and you will see	a crude	ascii histogram	of the pvalues that failed (or
       passed) any given level of test.

       Scattered reports of weakness or	 marginal  failure  in	a  preliminary
       -a(ll)  run should therefore not	be immediate cause for alarm.  Rather,
       they are	tests to repeat, to watch out for, to push the rng  harder  on
       using  the -m option to -a or simply increasing -p for a	specific test.
       Dieharder permits one to	increase the number of p-values	generated  for
       any  test,  subject  only  to the availability of enough	random numbers
       (for file based tests) and time,	to make	failures unambiguous.  A  test
       that  is	 truly	weak  at -p 100	will almost always fail	egregiously at
       some larger value of psamples, be it -p 1000 or	-p  100000.   However,
       because dieharder is a research tool and	is under perpetual development
       and  testing, it	is strongly suggested that one always consider the al-
       ternative null hypothesis -- that the failure is	a failure of the  test
       code  in	dieharder itself in some limit of large	numbers	-- and take at
       least some steps	(such as running the same test at the same  resolution
       on  a  "gold  standard" generator) to ensure that the failure is	indeed
       probably	in the rng and not the dieharder code.

       Lacking a source	of perfect random numbers to use as a reference, vali-
       dating the tests	themselves is not easy and always leaves one with some
       ambiguity (even aes or threefish).  During development the best one can
       usually do is to	rely heavily on	these "presumed	 good"	random	number
       generators.   There are a number	of generators that we have theoretical
       reasons to expect to be extraordinarily good and	to  lack  correlations
       out to some known underlying dimensionality, and	that also test out ex-
       tremely	well quite consistently.  By using several such	generators and
       not just	one, one can hope that those  generators  have	(at  the  very
       least)  different correlations and should not all uniformly fail	a test
       in the same way and with	the same number	 of  p-values.	 When  all  of
       these  generators  consistently fail a test at a	given level, I tend to
       suspect that the	problem	is in the test code, not the  generators,  al-
       though  it  is  very  difficult	to  be	certain,  and  many  errors in
       dieharder's code	have been discovered and ultimately fixed in just this
       way by myself or	others.

       One advantage of	dieharder is that it has a number of these "good  gen-
       erators"	immediately available for comparison runs, courtesy of the Gnu
       Scientific  Library  and	 user  contribution  (notably David Bauer, who
       kindly encapsulated aes and threefish).	I use AES_OFB,	Threefish_OFB,
       mt19937_1999, gfsr4, ranlxd2 and taus2 (as well as "true random" num-
       bers from random.org) for this  purpose,	 and  I	 try  to  ensure  that
       dieharder  will	"pass" in particular the -g 205	-S 1 -s	1 generator at
       any reasonable p-value resolution out to	-p 1000	or farther.

       Tests (such as the diehard operm5 and sums test)	that consistently fail
       at these	high resolutions are flagged as	being  "suspect"  --  possible
       failures	 of  the  alternative null hypothesis -- and they are strongly
       deprecated!  Their results should not be	used  to  test	random	number
       generators pending agreement in the statistics and random number	commu-
       nity  that  those  tests	are in fact valid and correct so that observed
       failures	can indeed safely be attributed	to a failure of	 the  intended
       null hypothesis.

       As  I  keep  emphasizing	(for good reason!) dieharder is	community sup-
       ported.  I therefore openly ask the users of dieharder who are expert
       in statistics to help me fix the code or algorithms being imple-
       mented.	I would	like to	see this test suite ultimately be validated by
       the  general  statistics	 community in hard use in an open environment,
       where every possible failure of the testing mechanism itself is subject
       to scrutiny and eventual	correction.  In	this way  we  will  eventually
       achieve	a very powerful	suite of tools indeed, ones that may well give
       us very specific	information not	just about failure but of the mode  of
       failure as well,	just how the sequence tested deviates from randomness.

       Thus  far,  dieharder  has  benefitted tremendously from	the community.
       Individuals have	openly contributed tests, new generators to be tested,
       and fixes for existing tests that were revealed by their	own work  with
       the  testing  instrument.   Efforts are underway	to make	dieharder more
       portable	so that	it will	build on more platforms	 and  faster  so  that
       more thorough testing can be done.  Please feel free to participate.

FILE INPUT
       The  simplest way to use	dieharder with an external generator that pro-
       duces raw binary	(presumed random) bits is to pipe the raw binary  out-
       put  from  this generator (presumed to be a binary stream of 32 bit un-
       signed integers)	directly into dieharder, e.g.:

	 cat /dev/urandom | ./dieharder	-a -g 200

       Go ahead	and try	this example.  It will run the entire dieharder	 suite
       of  tests  on  the  stream  produced  by	 the  linux built-in generator
       /dev/urandom (using /dev/random is not recommended as it	is too slow to
       test in a reasonable amount of time).

       Alternatively, dieharder	can be used to test files of numbers  produced
       by a candidate random number generator:

	 dieharder -a -g 201 -f	random.org_bin

       for raw binary input or

	 dieharder -a -g 202 -f	random.org.txt

       for formatted ascii input.

       A  formatted  ascii input file can accept either	uints (integers	in the
       range 0 to 2^31-1, one per line)	or decimal uniform  deviates  with  at
       least ten significant digits (so that they can be multiplied by
       UINT_MAX = 2^32 - 1 to produce a uint without dropping precision),
       also one per line.
       Floats with fewer digits	will almost certainly fail bitlevel tests, al-
       though they may pass some of the	tests that act on uniform deviates.
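
       As a concrete illustration of the formatted ascii layout, here is a
       minimal C sketch that writes such a file using the header fields shown
       in the EXAMPLES section below.  The rand() call is only a stand-in for
       your own source of numbers and the file name is arbitrary:

          /* make_ascii_input.c -- write uints in the ascii input format.
           * Build with e.g.:  cc -o make_ascii_input make_ascii_input.c
           * Test the result with:  dieharder -a -g 202 -f testrands.txt    */
          #include <stdio.h>
          #include <stdlib.h>

          int main(void)
          {
              const unsigned long count = 100000;  /* rands to write */
              unsigned long i;
              FILE *f = fopen("testrands.txt", "w");
              if (!f) { perror("fopen"); return 1; }

              /* Header: comment lines, then type/count/numbit fields as in
               * the sample header under EXAMPLES.                          */
              fprintf(f, "#========================================\n");
              fprintf(f, "# example generator (stand-in for your rng)\n");
              fprintf(f, "#========================================\n");
              fprintf(f, "type: d\n");
              fprintf(f, "count: %lu\n", count);
              fprintf(f, "numbit: 32\n");

              for (i = 0; i < count; i++) {
                  /* one 32 bit unsigned integer in decimal per line; rand()
                   * is a poor generator, used here only to show the format */
                  unsigned long r = ((unsigned long) (rand() & 0xffff) << 16)
                                  | (unsigned long) (rand() & 0xffff);
                  fprintf(f, "%lu\n", r);
              }
              fclose(f);
              return 0;
          }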

       Finally,	one can	fairly easily wrap any generator  in  the  same	 (GSL)
       random  number  harness used internally by dieharder and	simply test it
       the same	way one	would  any  other  internal  generator	recognized  by
       dieharder.   This is strongly recommended where it is possible, because
       dieharder needs to use a	lot of random numbers  to  thoroughly  test  a
       generator.  A built in generator	can simply let dieharder determine how
       many it needs and generate them on demand, whereas a file that is too
       small will be "rewound", rendering the results of any test in which a
       rewind occurs suspect.

       Note  well  that	file input rands are delivered to the tests on demand,
       but if the test needs more than are available  it  simply  rewinds  the
       file  and cycles	through	it again, and again, and again as needed.  Ob-
       viously this significantly reduces the sample space  and	 can  lead  to
       completely  incorrect  results  for the p-value histograms unless there
       are enough rands	to run EACH test without repetition (it	is harmless to
       reuse the sequence for different	tests).	 Let the user beware!

BEST PRACTICE
       A frequently asked question from	new users wishing to test a  generator
       they  are  working  on for fun or profit	(or both) is "How should I get
       its  output  into  dieharder?"	This  is  a  nontrivial	 question,  as
       dieharder  consumes  enormous  numbers of random	numbers	in a full test
       cycle, and then there are features like -m 10 or	-m 100	that  let  one
       effortlessly  demand  10	or 100 times as	many to	stress a new generator
       even more.

       Even with large file support in dieharder, it is	difficult  to  provide
       enough  random numbers in a file	to really make dieharder happy.	 It is
       therefore strongly suggested that you either:

       a) Edit the output stage	of your	random number generator	and get	it  to
       write its production to stdout as a random bit stream --	basically cre-
       ate  32	bit unsigned random integers and write them directly to	stdout
       as e.g. char data or raw	binary.	 Note that this	is  not	 the  same  as
       writing raw floating point numbers (that	will not be random at all as a
       bitstream) and that "endianness"	of the uints should not	matter for the
       null  hypothesis	 of  a "good" generator, as random bytes are random in
       any order.  Crank the generator and feed this stream to dieharder in
       a pipe as described above.  (A minimal sketch of such an output stage
       appears below.)

       b) Use the samples of GSL-wrapped dieharder rngs	to similarly wrap your
       generator  (or  calls  to your generator's hardware interface).	Follow
       the examples in the ./dieharder source directory	to add it as a	"user"
       generator in the	command	line interface,	rebuild, and invoke the	gener-
       ator  as	 a  "native" dieharder generator (it should appear in the list
       produced	by -g -1 when done correctly).	The advantage of doing it this
       way is that you can then	(if your new generator is  highly  successful)
       contribute  it  back to the dieharder project if	you wish!  Not to men-
       tion the	fact that it makes testing it very easy.
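
       For option a), the output stage might look something like the follow-
       ing minimal C sketch, which writes 32 bit unsigned integers to stdout
       as raw binary so that they can be piped into dieharder -g 200.  The
       xorshift step is only a stand-in for your own generator:

          /* rawstream.c -- stream raw 32 bit uints to stdout for dieharder.
           * Build with e.g.:  cc -O2 -o rawstream rawstream.c
           * Then:             ./rawstream | dieharder -g 200 -a            */
          #include <stdio.h>
          #include <stdint.h>

          int main(void)
          {
              uint32_t x = 2463534242u;  /* seed; replace with your own state */
              uint32_t buf[1024];
              size_t i;

              for (;;) {
                  for (i = 0; i < 1024; i++) {
                      /* stand-in generator (32 bit xorshift); replace this
                       * block with a call into your own rng                */
                      x ^= x << 13;
                      x ^= x >> 17;
                      x ^= x << 5;
                      buf[i] = x;
                  }
                  /* raw binary words, not formatted ascii */
                  if (fwrite(buf, sizeof(uint32_t), 1024, stdout) != 1024)
                      return 0;  /* stop if the write fails (pipe closed)   */
              }
          }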

       Most users will probably	go with	option a) at least initially,  but  be
       aware  that  b) is probably easier than you think.  The dieharder main-
       tainers may be able to give you a hand with it if you get into trouble,
       but no promises.
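
       For option b), the GSL side of the wrapper is small.  A minimal sketch
       is shown below; the mygen names are invented, and the remaining
       dieharder-specific step (registering the new type so that it shows up
       in the -g -1 list) is done by following the existing examples in the
       ./dieharder source directory, not by anything shown here:

          /* mygen.c -- a user rng wrapped in the GSL gsl_rng_type interface.
           * Compile with e.g.:  cc -c mygen.c  (needs the GSL headers)     */
          #include <stdint.h>
          #include <gsl/gsl_rng.h>

          typedef struct { uint32_t s; } mygen_state_t;

          static void mygen_set(void *vstate, unsigned long int seed)
          {
              mygen_state_t *st = (mygen_state_t *) vstate;
              st->s = seed ? (uint32_t) seed : 1u;  /* avoid all-zero state */
          }

          static unsigned long int mygen_get(void *vstate)
          {
              mygen_state_t *st = (mygen_state_t *) vstate;
              /* stand-in generator (32 bit xorshift); call your own rng or
               * hardware interface here instead                            */
              st->s ^= st->s << 13;
              st->s ^= st->s >> 17;
              st->s ^= st->s << 5;
              return st->s;
          }

          static double mygen_get_double(void *vstate)
          {
              return mygen_get(vstate) / 4294967296.0;  /* uniform in [0,1) */
          }

          static const gsl_rng_type mygen_type = {
              "mygen",                /* name (as listed by dieharder -g -1) */
              0xffffffffUL,           /* largest value get() returns */
              0,                      /* smallest value get() returns */
              sizeof(mygen_state_t),
              &mygen_set,
              &mygen_get,
              &mygen_get_double
          };

          const gsl_rng_type *gsl_rng_mygen = &mygen_type;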

WARNING!
       A warning for those who are testing files of random numbers.  dieharder
       is a tool that tests random number generators, not files	of random num-
       bers!  It is extremely inappropriate to try to "certify"	a file of ran-
       dom numbers as being random just	because	it fails to "fail" any of  the
       dieharder  tests	in e.g.	a dieharder -a run.  To	put it bluntly,	if one
       rejects all such	files that fail	any test at the	 0.05  level  (or  any
       other),	the one	thing one can be certain of is that the	files in ques-
       tion are	not random, as a truly random sequence would  fail  any	 given
       test at the 0.05	level 5% of the	time!

       To put it another way, any file of numbers produced by a	generator that
       "fails to fail" the dieharder suite should be considered	"random", even
       if  it contains sequences that might well "fail"	any given test at some
       specific cutoff.  One has to presume that, in passing the broader
       tests of the generator itself, it was determined that the p-values for
       the test involved were globally correctly distributed, so that e.g.
       failure at
       the 0.01	level occurs neither more nor less than	1% of the time,	on av-
       erage,  over many many tests.  If one particular	file generates a fail-
       ure at this level, one can therefore safely presume that	it is a	random
       file pulled from	many thousands of similar files	 the  generator	 might
       create  that have the correct distribution of p-values at all levels of
       testing and aggregation.

       To sum up, use dieharder	to validate your  generator  (via  input  from
       files  or an embedded stream).  Then by all means use your generator to
       produce files or	streams	of random numbers.  Do not use dieharder as an
       accept/reject tool to validate the files	themselves!

EXAMPLES
       To demonstrate all tests, run on	the default GSL	rng, enter:

	 dieharder -a

       To demonstrate a	test of	an external generator of a raw	binary	stream
       of bits,	use the	stdin (raw) interface:

	 cat /dev/urandom | dieharder -g 200 -a

       To use it with an ascii formatted file:

	 dieharder -g 202 -f testrands.txt -a

       (testrands.txt should consist of	a header such as:

	#==================================================================
	# generator mt19937_1999  seed = 1274511046
	#==================================================================
	type: d
	count: 100000
	numbit:	32
	3129711816
	  85411969
	2545911541

       etc.).

       To use it with a	binary file

	 dieharder -g 201 -f testrands.bin -a

       or

	 cat testrands.bin | dieharder -g 200 -a

       For an example that demonstrates the use of "prefixes" on the output
       lines, which make it relatively easy to filter off the different parts
       of the output report and chop them up into numbers that can be used in
       other programs or in spreadsheets, try:

	 dieharder -a -c ',' -D	default	-D prefix

DISPLAY	OPTIONS
       As of version 3.x.x, dieharder has a single output interface that  pro-
       duces  tabular  data per	test, with common information in headers.  The
       display control options and flags can be	used to	customize  the	output
       to your individual specific needs.

       The  options are	controlled by binary flags.  The flags,	and their text
       versions, are displayed if you enter:

	 dieharder -F

       by itself on a line.

       The flags can be	entered	all at once by adding up all the  desired  op-
       tion  flags.   For  example,  a very sparse output could	be selected by
       adding the flags	for the	test_name (8) and the associated pvalues (128)
       to get 136:

	 dieharder -a -D 136

       Since the flags are cumulated from zero (unless no flag is entered  and
       the default is used) you	could accomplish the same display via:

	 dieharder -a -D 8 -D pvalues

       Note  that you can enter	flags by value or by name, in any combination.
       Because people use dieharder to obtain values and then wish to export
       them into spreadsheets (comma separated values) or into filter
       scripts, you can change the field separator character.  For example:

	 dieharder -a -c ',' -D	default	-D -1 -D -2

       produces	 output	 that  is ideal	for importing into a spreadsheet (note
       that one	can subtract field values from the base	set of fields provided
       by the default option as	long as	it is given first).

       An interesting option is	the -D prefix flag, which  turns  on  a	 field
       identifier  prefix  to  make  it	easy to	filter out particular kinds of
       data.  However, it is equally easy to turn on any  particular  kind  of
       output to the exclusion of others directly by means of the flags.

       Two other flags of interest to novices to random	number generator test-
       ing  are	the -D histogram (turns	on a histogram of the underlying pval-
       ues, per	test) and -D description (turns	on a  complete	test  descrip-
       tion,  per test).  These	flags turn the output table into more of a se-
       ries of "reports" of each test.

PUBLICATION RULES
       dieharder is entirely original code and can be  modified	 and  used  at
       will by any user, provided that:

	 a) The	original copyright notices are maintained and that the source,
       including all modifications, is made publicly available at the time
       of any derived publication.  This is open source	software according  to
       the  precepts and spirit	of the Gnu Public License.  See	the accompany-
       ing file	COPYING, which also must accompany any redistribution.

	 b) The	primary	author of the code (Robert G. Brown) is	 appropriately
       acknowledged and	referenced in any derived publication.	It is strongly
       suggested  that	George Marsaglia and the Diehard suite and the various
       authors of the Statistical Test Suite be	 similarly  acknowledged,  al-
       though  this  suite shares no actual code with these random number test
       suites.

	 c) Full responsibility	for the	accuracy, suitability, and  effective-
       ness  of	 the  program  rests  with  the	users and/or modifiers.	 As is
       clearly stated in the accompanying copyright.h:

       THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFT-
       WARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
       IN  NO EVENT SHALL THE COPYRIGHT	HOLDERS	BE LIABLE FOR ANY SPECIAL, IN-
       DIRECT OR CONSEQUENTIAL DAMAGES OR  ANY	DAMAGES	 WHATSOEVER  RESULTING
       FROM  LOSS  OF  USE, DATA OR PROFITS, WHETHER IN	AN ACTION OF CONTRACT,
       NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT	OF  OR	IN  CONNECTION
       WITH THE	USE OR PERFORMANCE OF THIS SOFTWARE.

ACKNOWLEDGEMENTS
       The  author of this suite gratefully acknowledges George	Marsaglia (the
       author of the diehard test suite) and the various authors of NIST  Spe-
       cial Publication	800-22 (which describes	the Statistical	Test Suite for
       testing pseudorandom number generators for cryptographic	applications),
       for  excellent  descriptions  of	the tests therein.  These descriptions
       enabled this suite to be developed and released under the GPL.

       The author also wishes to reiterate that	the academic  correctness  and
       accuracy	 of the	implementation of these	tests is his sole responsibil-
       ity and not that	of the authors of the Diehard or STS suites.  This  is
       especially  true	where he has seen fit to modify	those tests from their
       strict original descriptions.

COPYRIGHT
       GPL 2b; see the file COPYING that accompanies the source	of  this  pro-
       gram.   This  is	 the "standard Gnu General Public License version 2 or
       any later version", with	the one	minor (humorous) "Beverage"  modifica-
       tion listed below.  Note	that this modification is probably not legally
       defensible  and	can  be	 followed  really pretty much according	to the
       honor rule.

       As to my	personal preferences in	beverages, red wine is great, beer  is
       delightful,  and	 Coca Cola or coffee or	tea or even milk acceptable to
       those who for religious or personal reasons wish	to avoid stressing  my
       liver.

       The Beverage Modification to the	GPL:

       Any satisfied user of this software shall, upon meeting the primary au-
       thor(s)	of this	software for the first time under the appropriate cir-
       cumstances, offer to buy	him or her or them a beverage.	This  beverage
       may  or	may  not  be  alcoholic, depending on the personal ethical and
       moral views of the offerer.  The	beverage cost need not exceed one U.S.
       dollar (although	it certainly may at the	whim of	the offerer:-) and may
       be accepted or declined with no further obligation on the part  of  the
       offerer.	 It is not necessary to	repeat the offer after the first meet-
       ing, but	it can't hurt...

dieharder		Copyright 2003 Robert G. Brown		  dieharder(1)
