
Applications for randomness: random numbers have been shown to be valuable in sampling, simulations, modeling, data encryption, gambling and even musical composition.

The mathematician Robert R. Coveyou said: "The generation of random numbers is too important to be left to chance." Random numbers are used in sampling, simulations, modeling, data encryption, gambling and even musical composition. A random number is one selected from a set of equally probable values, and the numbers in a random sequence must be statistically independent of one another.

There are two major methods of random number generation, each with its own strengths and applications: Pseudo-Random Number Generators (PRNGs) and True Random Number Generators (TRNGs). They can be compared on three characteristics: efficiency, determinism and periodicity. Efficiency means that many numbers can be produced quickly. Determinism means that the sequence can be reproduced if the starting point is known. Periodicity means that the sequence eventually repeats itself. The two methods are compared in Figure 1.

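These characteristics make PRNGs more suitable for sampling, simulations, modeling and musical composition, where speed and the ability to reproduce a run are valuable, whereas TRNGs are more suitable for data encryption and gambling, where the numbers must be unpredictable.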
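The split between the two kinds of use can be sketched in Python (an assumption: the article itself works in SAS and JMP, and the function names below are hypothetical). A seeded `random.Random` instance, a Mersenne Twister PRNG, returns the same draws on every run, which is what a simulation or sampling study needs. For key material, Python's documentation warns against the `random` module and points to `secrets`, which reads the operating system's cryptographically secure source; strictly speaking that source is a generator seeded from hardware and environmental entropy rather than a pure TRNG.

```python
import random
import secrets


def simulation_draws(seed: int, k: int) -> list[float]:
    """Reproducible uniform draws for a simulation: same seed, same numbers."""
    rng = random.Random(seed)  # Mersenne Twister, period 2**19937 - 1
    return [rng.random() for _ in range(k)]


def encryption_key(n_bytes: int = 32) -> bytes:
    """Unpredictable key material from the OS's cryptographically secure source."""
    return secrets.token_bytes(n_bytes)


if __name__ == "__main__":
    print(simulation_draws(2014, 3) == simulation_draws(2014, 3))  # True: deterministic
    print(encryption_key().hex())  # different on every run
```

Rerunning a simulation with the same seed reproduces it exactly, while nothing about a previous key helps predict the next one.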

There are statistical test suites to evaluate randomness; three of the more common ones are Diehard, Crypt-XS and NIST. The NIST tests are built on hypothesis testing of whether a specific sequence of zeros and ones is random, and each test reports a p-value. The battery of 15 tests evaluates frequencies, cumulative sums, runs, ranks and periodicity, among other properties. After the tests have been applied to many sequences, how well the results match their theoretical distribution can be judged with a goodness-of-fit test of the p-values against a uniform distribution. One evaluation method is to compare the mean and variance of the p-values with those of a uniform distribution on [0, 1], which are 1/2 and 1/12. Another is to compute a chi-square statistic from the frequency counts of p-values in bins. For generating pseudo-random numbers, a computational method such as the equation in Figure 2 is often used.
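To make the NIST approach concrete, here is a Python sketch with hypothetical function names that shows only 1 of the 15 tests. The frequency (monobit) test from NIST SP 800-22 maps each bit to -1 or +1, sums them, and converts the normalized sum to a p-value with the complementary error function. The second function applies the two evaluation methods described above to a collection of p-values: compare their mean and variance with 1/2 and 1/12, and compute a chi-square statistic over equal-width bins (NIST uses 10).

```python
import math
import random


def monobit_p_value(bits: list[int]) -> float:
    """NIST SP 800-22 frequency (monobit) test p-value for a sequence of 0s and 1s."""
    s_n = sum(2 * b - 1 for b in bits)        # 0 -> -1, 1 -> +1, then sum
    s_obs = abs(s_n) / math.sqrt(len(bits))   # normalized partial sum
    return math.erfc(s_obs / math.sqrt(2))    # small p-value => evidence of bias


def p_value_uniformity(p_values: list[float], bins: int = 10) -> tuple[float, float, float]:
    """Return (mean, sample variance, chi-square) of a collection of p-values.

    Under uniformity the mean should be near 1/2, the variance near 1/12,
    and the chi-square statistic (bins - 1 degrees of freedom) not unusually large.
    """
    m = len(p_values)
    mean = sum(p_values) / m
    variance = sum((p - mean) ** 2 for p in p_values) / (m - 1)
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1  # a p-value of exactly 1.0 goes in the last bin
    expected = m / bins
    chi_square = sum((c - expected) ** 2 / expected for c in counts)
    return mean, variance, chi_square


if __name__ == "__main__":
    rng = random.Random(1)
    p_values = [monobit_p_value([rng.getrandbits(1) for _ in range(1000)]) for _ in range(300)]
    print(p_value_uniformity(p_values))
```

For the 10-bit sequence 1011010101 used as a worked example in SP 800-22, `monobit_p_value` returns about 0.527089. Turning the chi-square statistic into its own p-value needs the chi-square distribution with bins - 1 degrees of freedom, which the standard library does not provide (SciPy's `scipy.stats.chi2.sf` is one option).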

In the equation in Figure 2, the starting value $x_0$ is the seed: it determines which sequence of numbers is generated, whereas the constants $P_1$, $P_2$ and $N$ determine the characteristics of the generator, such as how long its period can be. The "mod N" signifies that $P_1 x_n + P_2$ is divided by $N$ and only the remainder is kept, so the first random number is $x_1 = (P_1 x_0 + P_2) \bmod N$. That number becomes the $x_n$ value for the second iteration, which produces $x_2$, and so on.
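The recurrence can be sketched in Python as a linear congruential generator. The constants $P_1 = 5$, $P_2 = 3$ and $N = 16$ are illustrative assumptions, chosen only so the cycle is short enough to see; they are far too small for real use. The same seed always gives the same sequence (determinism), and because there are only $N$ possible values, the sequence must eventually repeat with a period of at most $N$ (periodicity).

```python
from itertools import islice
from typing import Iterator


def lcg(seed: int, p1: int, p2: int, n: int) -> Iterator[int]:
    """Yield x_1, x_2, ... where x_{k+1} = (p1 * x_k + p2) mod n and x_0 = seed."""
    x = seed
    while True:
        x = (p1 * x + p2) % n
        yield x


def period(seed: int, p1: int, p2: int, n: int) -> int:
    """Length of the cycle the sequence eventually falls into (at most n)."""
    first_seen: dict[int, int] = {}
    for step, x in enumerate(lcg(seed, p1, p2, n)):
        if x in first_seen:
            return step - first_seen[x]
        first_seen[x] = step
    raise RuntimeError("unreachable: the generator is infinite")


if __name__ == "__main__":
    print(list(islice(lcg(7, 5, 3, 16), 5)))  # [6, 1, 8, 11, 10]
    print(period(7, 5, 3, 16))                # 16: every value 0..15 appears once per cycle
    # Dividing by N maps the integers onto [0, 1), e.g. 6/16 = 0.375.
    print([x / 16 for x in islice(lcg(7, 5, 3, 16), 5)])
```

With $P_1 = 2$ and $P_2 = 0$ instead, the same $N$ and a seed of 1 give 2, 4, 8, 0, 0, ..., a period of 1, which shows why the constants rather than the seed govern the period.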

Other computational methods generate pseudo-random numbers from probability functions. For example, SAS can draw values from the standard normal distribution using a seed. The output is shown in Figure 3.
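One common way a probability function turns uniform pseudo-random numbers into draws from another distribution is the inverse-transform method: each uniform value u is passed through the distribution's inverse cumulative distribution function. The Python sketch below uses `statistics.NormalDist.inv_cdf` for the standard normal. It illustrates the idea only and is not the algorithm SAS uses internally, so its numbers will not match Figure 3.

```python
import random
from statistics import NormalDist


def seeded_normals(seed: int, k: int) -> list[float]:
    """Standard normal draws via inverse transform of seeded uniform values."""
    rng = random.Random(seed)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    draws = []
    for _ in range(k):
        u = rng.random()     # uniform on [0, 1)
        while u == 0.0:      # inv_cdf needs 0 < u < 1
            u = rng.random()
        draws.append(std_normal.inv_cdf(u))
    return draws


if __name__ == "__main__":
    for i, z in enumerate(seeded_normals(seed=12345, k=10), start=1):
        print(i, round(z, 5))
```

Swapping `NormalDist` for another distribution's inverse CDF changes the target distribution without touching the underlying uniform generator.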

For sampling from multivariate distributions, SAS/IML functions such as randnormal, randmvt and randmultinomial can be used to generate samples from the multivariate normal, multivariate Student's t and multinomial distributions, respectively.
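Outside SAS, a multivariate normal sample is often built from independent standard normals z and a Cholesky factor L of the covariance matrix (Σ = L Lᵀ, sample = μ + L z), and a multinomial count vector can be built by tallying categorical draws. The Python sketch below is a standard-library illustration with hypothetical function names; it does not reproduce the SAS/IML algorithms, and the multivariate t case is omitted.

```python
import math
import random


def cholesky(cov: list[list[float]]) -> list[list[float]]:
    """Lower-triangular L with L @ L^T == cov (cov must be symmetric positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def mvn_draw(rng: random.Random, mean: list[float], low: list[list[float]]) -> list[float]:
    """One multivariate normal draw: mean + L z with z independent N(0, 1)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(mean))]


def multinomial_draw(rng: random.Random, trials: int, probs: list[float]) -> list[int]:
    """Counts per category after `trials` independent categorical draws."""
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=trials):
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(2014)
    low = cholesky([[1.0, 0.8], [0.8, 1.0]])      # correlation 0.8 between two factors
    print(mvn_draw(rng, [0.0, 0.0], low))
    print(multinomial_draw(rng, 100, [0.2, 0.3, 0.5]))
```

Computing the Cholesky factor once and reusing it for every draw keeps the per-sample cost to a matrix-vector product.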

Random sampling from a finite data set can be used, for example, to select items for testing conformance to specifications. Figure 4 shows the output of PROC SURVEYSELECT in SAS selecting a simple random sample of five items from a set of 10.
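A comparable simple random sample without replacement can be sketched in Python with `random.Random.sample`. The item labels "a" through "j" are a hypothetical stand-in for the 10-item SAMPLES data set, and the seed is borrowed from Figure 4 only for illustration: Python's generator differs from SAS's, so the selected items will not match the figure. The selection probability n/N and sampling weight N/n do match the 0.5 and 2 reported there.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def simple_random_sample(items: Sequence[T], n: int, seed: int) -> tuple[list[T], float, float]:
    """Select n items without replacement; every subset of size n is equally likely.

    Returns the selected items in their original order, the selection
    probability n/N, and the sampling weight N/n.
    """
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(len(items)), n))  # indices, kept in input order
    total = len(items)
    return [items[i] for i in picked], n / total, total / n


if __name__ == "__main__":
    labels = list("abcdefghij")  # hypothetical 10-item population
    chosen, prob, weight = simple_random_sample(labels, 5, seed=870897001)
    print(chosen, prob, weight)  # five labels, 0.5, 2.0
```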

Simulations can be run in software such as JMP to evaluate or model the outputs of a process as a function of randomness in the factors and noise in the model. Once a model of the output (y) as a function of the inputs (x's) has been built with Fit Model, and the Prediction Profiler and its Simulator have been selected, the factor settings can be specified. Each factor can be fixed at a specific value, given a random value from a specified distribution and parameters, given a value from an expression that lets users create their own distributions, or drawn from a multivariate normal distribution when the factors are correlated. For the response, no noise needs to be added if only the model's prediction is of interest; otherwise, normal random noise or multivariate random noise can be added. Figure 5 shows a setup that generates 5,000 runs of a response y with two factors, x1 and x2, each drawn from a normal distribution, and random noise added to the response.
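The same kind of simulation can be sketched in plain Python. Every number here is hypothetical, because the fitted coefficients and factor distributions behind Figure 5 are not given: the sketch assumes a linear model y = b0 + b1·x1 + b2·x2, draws each factor from its own normal distribution, and adds normal noise to the response for a chosen number of runs.

```python
import random
import statistics


def simulate(n_runs: int, seed: int,
             coef: tuple[float, float, float],
             factors: list[tuple[float, float]],
             noise_sd: float) -> list[float]:
    """Monte Carlo runs of y = b0 + b1*x1 + b2*x2 + e.

    coef     -- (b0, b1, b2), hypothetical model coefficients
    factors  -- (mean, sd) of the normal distribution for each factor
    noise_sd -- standard deviation of the normal noise added to y
    """
    rng = random.Random(seed)
    b0, *slopes = coef
    ys = []
    for _ in range(n_runs):
        xs = [rng.gauss(mu, sd) for mu, sd in factors]   # random factor settings
        y = b0 + sum(b * x for b, x in zip(slopes, xs))  # model prediction
        ys.append(y + rng.gauss(0.0, noise_sd))          # add response noise
    return ys


if __name__ == "__main__":
    ys = simulate(5000, seed=11, coef=(100.0, 2.0, 3.0),
                  factors=[(50.0, 10.0), (60.0, 12.0)], noise_sd=61.0)
    print(round(statistics.fmean(ys), 1), round(statistics.stdev(ys), 1))
```

Under these assumed inputs the theoretical mean is 100 + 2·50 + 3·60 = 380 and the standard deviation is √(2²·10² + 3²·12² + 61²) ≈ 73.6, so the printed values should land close to those.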

Using the simulated table output, a normal distribution can be fit to the response with the Distribution platform. Specification limits can then be entered and process capability calculated to estimate a defect rate, as shown in Figure 6.

The expected process mean is 421.5, with a standard deviation of 73.8. For a lower specification limit of 200 and an upper specification limit of 600, the capability index Cpk is 0.81, the smaller of CPU = (600 - 421.5)/(3 × 73.8) = 0.81 and CPL = (421.5 - 200)/(3 × 73.8) = 1.00, and the estimated defect rate is 0.9%. This is only an initial estimate and needs to be confirmed with additional process data.
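The indices in Figure 6 can be checked from the mean, standard deviation and specification limits alone, assuming the response is normally distributed. The Python sketch below uses `statistics.NormalDist`; the function name is hypothetical, and taking the Cpm target as the midpoint of the specification limits (400) is an assumption, though it reproduces JMP's 0.867.

```python
from statistics import NormalDist
from typing import Optional


def capability(mean: float, sd: float, lsl: float, usl: float,
               target: Optional[float] = None) -> dict[str, float]:
    """Normal-theory capability indices and expected fraction out of specification."""
    if target is None:
        target = (lsl + usl) / 2  # assumption: midpoint of the specification limits
    cpl = (mean - lsl) / (3 * sd)
    cpu = (usl - mean) / (3 * sd)
    dist = NormalDist(mean, sd)
    return {
        "Cp": (usl - lsl) / (6 * sd),
        "Cpk": min(cpl, cpu),
        "Cpm": (usl - lsl) / (6 * (sd ** 2 + (mean - target) ** 2) ** 0.5),
        "CPL": cpl,
        "CPU": cpu,
        # probability below LSL plus probability above USL
        "defect_rate": dist.cdf(lsl) + (1 - dist.cdf(usl)),
    }


if __name__ == "__main__":
    for name, value in capability(421.53164, 73.808057, lsl=200, usl=600).items():
        print(f"{name:12s}{value:.4f}")
```

Rounded to three decimals, the results match Figure 6 (Cp 0.903, Cpk 0.806, Cpm 0.867, CPL 1.000, CPU 0.806), and the expected fraction outside the limits is about 0.009, the 0.9% quoted in the text. Most of it lies above the upper limit, which is why CPU sets Cpk.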

True random numbers are usually generated by measuring a random physical process, such as radioactive decay or atmospheric noise, although dice and coin flips are still used. Lavarand, for example, applied a hash function to images of a set of lava lamps.
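Raw physical samples are often biased or correlated, so they are usually post-processed before use. Two standard approaches are sketched in Python below, with hypothetical function names. The first condenses a raw sample, such as the bytes of a camera frame, with a cryptographic hash; this is the general idea behind Lavarand, whose actual pipeline is not reproduced here. The second is von Neumann's debiasing of coin flips, which turns independent but biased flips into unbiased bits. The simulated biased coin is an assumption standing in for a physical source.

```python
import hashlib
import random


def condense(raw_sample: bytes) -> bytes:
    """Hash a raw, noisy physical sample (e.g. image bytes) down to 32 bytes."""
    return hashlib.sha256(raw_sample).digest()


def von_neumann(flips: list[int]) -> list[int]:
    """Debias independent 0/1 flips: 01 -> 0, 10 -> 1, discard 00 and 11."""
    out = []
    for a, b in zip(flips[0::2], flips[1::2]):  # non-overlapping pairs
        if a != b:
            out.append(a)
    return out


if __name__ == "__main__":
    coin = random.Random(7)  # stand-in for a physical coin that lands 1 about 80% of the time
    biased = [1 if coin.random() < 0.8 else 0 for _ in range(100_000)]
    fair = von_neumann(biased)
    print(sum(biased) / len(biased), sum(fair) / len(fair), len(fair))
    print(condense(b"hypothetical frame bytes").hex())
```

The pairs 01 and 10 are equally likely whenever flips are independent, whatever the bias, so the output is unbiased. The price is throughput: with an 80/20 coin only about 32% of pairs survive, roughly 16,000 bits from 100,000 flips.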

Random numbers have been shown to be valuable in sampling, simulations, modeling, data encryption, gambling and even musical composition. Either computational or physical methods are used depending on the application. A battery of statistical tests for randomness is recommended for evaluation.

Note: SAS version 9.3 and JMP version 11.2.0 were used to generate the data in the figures.

Mark Anawis is a Principal Scientist and ASQ Six Sigma Black Belt at Abbott. He may be reached at editor@ScientificComputing.com.
Figure 1: Comparison of two major methods of
random number generation

Method Comparison

Characteristic   Pseudo-Random Number Generators   True Random Number Generators
Efficiency       Efficient                         Inefficient
Determinism      Deterministic                     Nondeterministic
Periodicity      Periodic                          Aperiodic

Figure 2: Example of an equation used to generate
pseudo-random numbers

Random Number Generator Equation

$$x_{n+1} = (P_1 x_n + P_2) \bmod N$$

where $n = 0, 1, 2, 3, \ldots$

Figure 3: SAS can use the standard normal
distribution with a seed.

The SAS System

Obs   i   randomlist

1    1    0.89803
2    2    0.65453
3    3    0.30436
4    4    0.72410
5    5    0.69216
6    6    0.76219
7    7    0.01738
8    8    0.86332
9    9    0.44058
10   10   0.32769

Figure 4: Using PROC SURVEYSELECT in SAS

The SURVEYSELECT Procedure

Selection Method        Simple Random Sampling

Input Data Set            SAMPLES
Random Number Seed      870897001
Sample Size                     5
Selection Probability         0.5
Sampling Weight                 2
Output Data Set         SAMPLESRS

Obs   sample   data

1     b        1637
2     d        1233
3     e        1731
4     f        1470
5     h        1347

Figure 5: A setup to generate 5,000 runs using a response y with 2
factors x1 and x2 and random noise from a normal distribution for
each factor and random noise in the response.

JMP Simulator from Fit Model, Prediction Profiler option

Response y
Whole Model
Summary of Fit

RSquare                       0.44652
RSquare Adj                  0.288383
Root Mean Square Error       61.16517
Mean of Response                421.2
Observations (or Sum Wgts)         10

Figure 6: Specifications can be entered and capability calculated to
determine a defect rate.

JMP Distribution and Capability Analysis of Simulator Output

Summary Statistics

Mean                 421.53164
Std Dev              73.808057
Std Err Mean         1.0438035
Upper 95% Mean       423.57795
Lower 95% Mean       419.48532
N                        5000

Capability   Index   Lower CI   Upper CI

CP           0.903      0.886      0.921
CPK          0.806      0.788      0.824
CPM          0.867      0.850      0.884
CPL          1.000      0.979      1.022
CPU          0.806      0.788      0.824
COPYRIGHT 2014 Advantage Business Media
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2014 Gale, Cengage Learning. All rights reserved.

 
Article Details
Title Annotation:DATA ANALYSIS
Author:Anawis, Mark
Publication:Scientific Computing
Date:Nov 1, 2014
Words:1222