# Applications for randomness: random numbers have been shown to be valuable in sampling, simulations, modeling, data encryption, gambling and even musical composition.

The mathematician Robert R. Coveyou said: "The generation of random numbers is too important to be left to chance." Random numbers are used in sampling, simulations, modeling, data encryption, gambling and even musical composition. A random number is one selected from a set of equally probable values. Each number in a sequence of random numbers must be statistically independent of the others.

There are two major methods of random number generation, each with its own strengths and applications: Pseudo-Random Number Generators (PRNGs) and True Random Number Generators (TRNGs). These can be compared by three characteristics: efficiency, determinism and periodicity. Efficiency means that many numbers can be produced quickly. Determinism means that the sequence can be reproduced, provided that the starting point is known. Periodicity means that the sequence eventually repeats itself. The methods are compared in Figure 1.

These characteristics make PRNGs more suitable for sampling, simulations, modeling and musical composition, whereas TRNGs are more suitable for data encryption and gambling.

Statistical test suites are available to evaluate randomness. Three of the more common ones are Diehard, Crypt-XS and NIST. The NIST tests are built on hypothesis testing: each test asks whether a specific sequence of zeroes and ones is random or not. The battery of 15 tests evaluates frequencies, cumulative sums, runs, ranks and periodicity. After the tests have been applied, how well the results match their theoretical distribution can be assessed with a goodness-of-fit test of the p-values against a uniform distribution. One evaluation method is to compare the mean and variance of the p-values to those of a uniform distribution. Another is to compute a chi-square statistic based on the frequency counts of p-values among bins.

PRNGs often produce their numbers with a computational method such as the equation in Figure 2.
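Before turning to that equation, the p-value evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the NIST reference implementation: it applies only the frequency (monobit) test, and the seed, the number of sequences (200), the sequence length (1,000 bits) and the 10 bins are all assumptions chosen for the example.

```python
import math
import random

def monobit_p_value(bits):
    """Frequency (monobit) test: p-value under the hypothesis that the bits are random."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)   # +1 for each one, -1 for each zero
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

def p_value_uniformity_chi2(p_values, bins=10):
    """Chi-square statistic comparing binned p-value counts with a uniform expectation."""
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1   # p == 1.0 goes in the last bin
    expected = len(p_values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(12345)   # arbitrary seed (assumption) so the run is repeatable
p_values = []
for _ in range(200):
    bits = [rng.getrandbits(1) for _ in range(1000)]
    p_values.append(monobit_p_value(bits))

mean_p = sum(p_values) / len(p_values)
var_p = sum((p - mean_p) ** 2 for p in p_values) / (len(p_values) - 1)
chi2 = p_value_uniformity_chi2(p_values)

# A uniform distribution on [0, 1] has mean 1/2 and variance 1/12.
print(f"mean of p-values     = {mean_p:.3f} (uniform: 0.500)")
print(f"variance of p-values = {var_p:.3f} (uniform: {1 / 12:.3f})")
print(f"chi-square over 10 bins = {chi2:.2f} (9 degrees of freedom)")
```

A large chi-square value, or a mean and variance far from 1/2 and 1/12, would suggest that the p-values are not uniformly distributed and that the generator deserves a closer look.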

The starting value $x_0$ is the seed and determines the sequence of numbers generated, whereas the multiplier $P_1$, the increment $P_2$ and the modulus $N$ determine the characteristics of the PRNG. The "mod N" signifies that the preceding portion of the equation is divided by $N$ and the remainder is taken to produce the first random number, $x_1$. The first random number then becomes the input $x_n$ for the second iteration of the equation to produce the second random number, and so on.
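The recurrence in Figure 2 can be sketched as a short Python generator. The parameter values below (multiplier 5, increment 3, modulus 16, seed 7) are toy assumptions chosen so that the whole cycle is short enough to print; they are not taken from the article and are far too small for real use.

```python
def lcg(seed, p1, p2, n):
    """Yield x_{k+1} = (p1 * x_k + p2) mod n indefinitely, starting from the seed x_0."""
    x = seed
    while True:
        x = (p1 * x + p2) % n
        yield x

def take(generator, count):
    """Collect the next `count` values from a generator."""
    return [next(generator) for _ in range(count)]

P1, P2, N = 5, 3, 16   # toy parameters (assumptions)
SEED = 7               # toy seed (assumption)

first_run = take(lcg(SEED, P1, P2, N), 20)
second_run = take(lcg(SEED, P1, P2, N), 20)

print(first_run)
print("same seed reproduces the sequence:", first_run == second_run)
print("values repeat after 16 steps:", first_run[16:] == first_run[:4])
```

The two runs illustrate determinism (the same seed gives the same sequence) and periodicity (with a modulus of 16 the sequence cannot run longer than 16 values before repeating).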

Other computational methods draw pseudo-random numbers from probability distributions. For example, SAS can sample from the standard normal distribution with a seed. The program and output are shown in Figure 3.
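The same idea can be sketched outside SAS. The Python example below is not the SAS program behind Figure 3 and will not reproduce its numbers; the seed value and the helper name `normal_draws` are assumptions made for illustration.

```python
import random

def normal_draws(seed, count):
    """Return `count` standard normal draws from a generator started at `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]

draws = normal_draws(seed=2014, count=10)   # seed is an arbitrary assumption
for i, value in enumerate(draws, start=1):
    print(i, round(value, 5))
```

Calling `normal_draws` again with the same seed returns the same ten values, which is what makes a seeded simulation repeatable.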

For sampling from multivariate distributions, functions such as randnormal, randmvt and randmultinomial can be used to generate samples from multivariate normal, multivariate Student's t and multinomial distributions, respectively.
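As a rough illustration of what such functions do, the sketch below draws from a bivariate normal and a multinomial distribution using only the Python standard library. It does not call or imitate the SAS functions named above; the function names, means, standard deviations, correlation and category probabilities are all assumptions.

```python
import math
import random

def bivariate_normal(rng, mean, sd, rho):
    """Draw one (x, y) pair from a bivariate normal with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x = mean[0] + sd[0] * z1
    # Mixing z1 into y induces the requested correlation between x and y.
    y = mean[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return x, y

def multinomial(rng, trials, probabilities):
    """Draw category counts for `trials` independent trials."""
    counts = [0] * len(probabilities)
    categories = range(len(probabilities))
    for category in rng.choices(categories, weights=probabilities, k=trials):
        counts[category] += 1
    return counts

rng = random.Random(42)   # arbitrary seed (assumption)
pairs = [bivariate_normal(rng, mean=(10.0, 20.0), sd=(2.0, 3.0), rho=0.7)
         for _ in range(20000)]
counts = multinomial(rng, trials=100, probabilities=[0.2, 0.3, 0.5])

print(pairs[:3])
print(counts)
```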

Random sampling from a finite data set is used to determine conformance to specifications. A program and its output using PROC SURVEYSELECT in SAS are shown in Figure 4 for the selection of five random samples from a set of 10.
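Simple random sampling without replacement can be sketched in Python as follows. The labels a to j are an assumption (Figure 4 shows only the five selected labels), and although the seed is the one printed in Figure 4, Python's generator differs from SAS's, so the selection will generally not match the figure.

```python
import random

labels = list("abcdefghij")       # ten hypothetical sample labels (assumption)
rng = random.Random(870897001)    # seed value from Figure 4; the result will not match SAS

sample = sorted(rng.sample(labels, k=5))   # five labels drawn without replacement

selection_probability = len(sample) / len(labels)
sampling_weight = 1.0 / selection_probability

print("selected:", sample)
print("selection probability:", selection_probability)
print("sampling weight:", sampling_weight)
```

The selection probability (0.5) and sampling weight (2) follow directly from choosing 5 of 10 units and agree with the values reported in Figure 4.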

Simulations can be used in software such as JMP to evaluate or model the outputs of a process as a function of randomness in the factors and noise in the model. Once a model of the output (y) as a function of the inputs (xs) has been created using Fit Model, and the Prediction Profiler and Simulator have been selected, factor levels can be set. Each factor can be fixed at a specific value, given a random value from a specified distribution with specified parameters, given a value based on an expression that lets users create their own distributions, or drawn from a multivariate normal distribution when factors are correlated. If only the response from the model needs to be evaluated, no noise needs to be added to the response. Otherwise, either normal random noise or multivariate random noise can be added. Figure 5 shows a setup to generate 500 runs using a response y with two factors, x1 and x2, with random noise from a normal distribution for each factor and random noise in the response.
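The simulation logic itself is simple enough to sketch in Python. The example below is not the JMP Simulator: the model coefficients, the factor distributions, the noise level and the seed are all hypothetical, since the fitted model behind Figure 5 is not given in the text.

```python
import random
import statistics

def predict(x1, x2):
    """Hypothetical fitted model for the response y (coefficients are assumptions)."""
    return 100.0 + 2.0 * x1 + 3.0 * x2

def simulate(runs, seed):
    """Propagate random factor settings and response noise through the model."""
    rng = random.Random(seed)
    responses = []
    for _ in range(runs):
        x1 = rng.gauss(50.0, 5.0)      # factor x1: normal (assumed mean and sd)
        x2 = rng.gauss(70.0, 4.0)      # factor x2: normal (assumed mean and sd)
        noise = rng.gauss(0.0, 10.0)   # normal noise added to the response (assumed sd)
        responses.append(predict(x1, x2) + noise)
    return responses

responses = simulate(runs=500, seed=1)
print("mean of simulated response:", round(statistics.mean(responses), 1))
print("std dev of simulated response:", round(statistics.stdev(responses), 1))
```

The simulated responses can then be summarized or fit to a distribution in the same way as real process data.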

Using the simulated table output, a normal distribution can be fit to the response using the Distribution platform. Specifications can be entered and capability calculated to determine a defect rate, as shown in Figure 6.

The expected process mean is 421.5 with a standard deviation of 73.8. For a lower specification of 200 and an upper specification of 600, a capability index (Cpk) of 0.81 is calculated with a defect rate of 0.9%. This is only an initial estimate and needs to be confirmed with additional process data.
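These capability figures can be checked with a short calculation. The sketch below assumes a normal distribution for the response and the usual definitions Cpl = (mean - LSL) / 3s, Cpu = (USL - mean) / 3s and Cpk = min(Cpl, Cpu); the function name `capability` is an assumption, and the mean and standard deviation are the values reported in Figure 6.

```python
from statistics import NormalDist

def capability(mean, sd, lsl, usl):
    """Capability indices and expected out-of-spec fraction for a normal process."""
    cpl = (mean - lsl) / (3.0 * sd)
    cpu = (usl - mean) / (3.0 * sd)
    cp = (usl - lsl) / (6.0 * sd)
    dist = NormalDist(mean, sd)
    # Fraction below the lower limit plus fraction above the upper limit.
    defect_rate = dist.cdf(lsl) + (1.0 - dist.cdf(usl))
    return {"Cp": cp, "Cpl": cpl, "Cpu": cpu,
            "Cpk": min(cpl, cpu), "defect_rate": defect_rate}

result = capability(mean=421.53164, sd=73.808057, lsl=200.0, usl=600.0)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Most of the expected defects fall above the upper specification, because the mean sits closer to 600 than to 200; this is why Cpk equals Cpu here.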

TRNGs often measure a random physical occurrence, such as radioactive decay or atmospheric noise, although dice and coin flipping are still used. Lavarand used a technique of running a hash function against images from a number of lava lamps.
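The hashing step can be illustrated with a small Python sketch. This is not Lavarand's actual procedure, whose details are not given here: the choice of SHA-256, the function name `seed_from_bytes` and the use of `os.urandom` as a stand-in for a captured camera frame are all assumptions.

```python
import hashlib
import os

def seed_from_bytes(raw):
    """Condense raw measurement bytes into a 256-bit integer using SHA-256."""
    digest = hashlib.sha256(raw).digest()
    return int.from_bytes(digest, "big")

# Stand-in for a captured image (assumption): a real system would read camera pixels.
frame = os.urandom(1024)
seed = seed_from_bytes(frame)

print(f"{seed:064x}")
```

The hash itself is deterministic, so any unpredictability in the result comes entirely from the physical measurement that is fed into it.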

Random numbers have been shown to be valuable in sampling, simulations, modeling, data encryption, gambling and even musical composition. Either computational or physical methods are used depending on the application. A battery of statistical tests for randomness is recommended for evaluation.

Note: SAS version 9.3 and JMP version 11.2.0 were used to generate data in figures.

Mark Anawis is a Principal Scientist and ASQ Six Sigma Black Belt at Abbott. He may be reached at editor@ScientificComputing.com.

Figure 1: Comparison of two major methods of random number generation

| Pseudo-Random Number Generators | True Random Number Generators |
|---|---|
| Efficient | Inefficient |
| Deterministic | Nondeterministic |
| Periodic | Aperiodic |

Figure 2: Example of an equation to generate Pseudo-Random Number Generators

Random Number Generator Equation

$$x_{n+1} = P_1 x_n + P_2 \pmod{N}, \quad n = 0, 1, 2, 3, \ldots$$

Figure 3: SAS can use the standard normal distribution with a seed.

The SAS System

| Obs | i | randomlist |
|---|---|---|
| 1 | 1 | 0.89803 |
| 2 | 2 | 0.65453 |
| 3 | 3 | 0.30436 |
| 4 | 4 | 0.72410 |
| 5 | 5 | 0.69216 |
| 6 | 6 | 0.76219 |
| 7 | 7 | 0.01738 |
| 8 | 8 | 0.86332 |
| 9 | 9 | 0.44058 |
| 10 | 10 | 0.32769 |

Figure 4: Using Proc survey select in SAS

The SURVEYSELECT Procedure

| Setting | Value |
|---|---|
| Selection Method | Simple Random Sampling |
| Input Data Set | SAMPLES |
| Random Number Seed | 870897001 |
| Sample Size | 5 |
| Selection Probability | 0.5 |
| Sampling Weight | 2 |
| Output Data Set | SAMPLESRS |

| Obs | sample | data |
|---|---|---|
| 1 | b | 1637 |
| 2 | d | 1233 |
| 3 | e | 1731 |
| 4 | f | 1470 |
| 5 | h | 1347 |

Figure 5: A setup to generate 500 runs using a response y with 2 factors x1 and x2 and random noise from a normal distribution for each factor and random noise in the response.

JMP Simulator from Fit Model, Prediction Profiler option

Response y, Whole Model, Summary of Fit

| Statistic | Value |
|---|---|
| RSquare | 0.44652 |
| RSquare Adj | 0.288383 |
| Root Mean Square Error | 61.16517 |
| Mean of Response | 421.2 |
| Observations (or Sum Wgts) | 10 |

Figure 6: Specifications can be entered and capability calculated to determine a defect rate.

JMP Distribution and Capability Analysis of Simulator Output

Summary Statistics

| Statistic | Value |
|---|---|
| Mean | 421.53164 |
| Std Dev | 73.808057 |
| Std Err Mean | 1.0438035 |
| Upper 95% Mean | 423.57795 |
| Lower 95% Mean | 419.48532 |
| N | 5000 |

| Capability | Index | Lower CI | Upper CI |
|---|---|---|---|
| CP | 0.903 | 0.886 | 0.921 |
| CPK | 0.806 | 0.788 | 0.824 |
| CPM | 0.867 | 0.850 | 0.884 |
| CPL | 1.000 | 0.979 | 1.022 |
| CPU | 0.806 | 0.788 | 0.824 |


| Field | Value |
|---|---|
| Title Annotation | DATA ANALYSIS |
| Author | Anawis, Mark |
| Publication | Scientific Computing |
| Date | Nov 1, 2014 |
| Words | 1222 |
