# Chapter 4: Probability Distributions

Sometimes we are interested not only in the probability associated with a particular event, but also in the distribution of probabilities over the whole range of possible events. For example, let's consider a fair die with faces numbered 1, 2, ..., 6. If X represents the random variable, the number of dots that turn up on the face of the die, we can write down the probabilities associated with each value of X as in equation 4.1. All of the probabilities are the same because we assume the die is fair.

P(X = 1) = 1/6, P(X = 2) = 1/6, ..., P(X = 6) = 1/6 [4.1]

The listing of the possible values of X and their associated probabilities is called a probability distribution. We can avoid writing each separate probability by using a formula such as equation 4.2. We note that the probabilities summed over all the possible values of X add to 1.

P(X) = 1/6 for (X = 1, 2, ..., 6). [4.2]

This is a characteristic of all probability distributions, and it means that all possibilities are accounted for. Recall also that every probability must be non-negative.
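The two properties just described can be checked mechanically. The following is a minimal sketch, not part of the original text; the names `die` and `is_valid_distribution` are hypothetical, and exact fractions are used so the total comes out to exactly 1.

```python
from fractions import Fraction

# Probability distribution of a fair die (equation 4.2): each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def is_valid_distribution(dist):
    """Return True if no probability is negative and the probabilities sum to 1."""
    return all(p >= 0 for p in dist.values()) and sum(dist.values()) == 1

print(is_valid_distribution(die))  # True
```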

Probability distributions are somewhat related to frequency distributions. To review, a frequency distribution is a listing of all of the possible outcomes of a variable that has been divided into classes, along with the frequency associated with each class. A probability distribution lists all of the possible outcomes of a random variable along with the probability associated with each outcome, rather than frequencies.

The purpose of many statistical problems is to select the best probability distribution to portray the random variable, as we mentioned before. Prior to examining the distributions used most often in statistics, we look at probability density functions.

[FIGURE 4.1 OMITTED]

Probability Density Functions

A probability density function, abbreviated pdf, may be either discrete or continuous. A discrete pdf is a point function defined over a finite sample space; hence it takes on only a finite number of values. The pdf assigns a probability to each possible outcome of the discrete random variable. A continuous pdf is a set function that represents a distribution in which a probability is assigned to a range of values from a continuous random variable. For example, a continuous pdf assigns a probability to the range 3.15 to 3.25 for the continuous random variable X, the diameter of shafts produced by a farm machinery company (figure 4.1).

A pdf is set apart from a probability distribution in that a pdf is really a rule for assigning a probability to the events of an experiment, while a probability distribution is an orderly presentation or arrangement of the probabilities once they have been assigned. With this in mind, let's examine a few of the more common probability distributions.

Binomial Probability Distribution

A frequently encountered set of statistical problems involves random events for which there are only two possible outcomes. The outcomes occur without any fixed pattern, and the probability of either outcome remains fixed for each trial. These problems are called Bernoulli trials. Bernoulli systems have two outcomes--the outcome we want or are interested in, called a success, and the other outcome, called a failure. We sometimes know with certainty the proportion of successes obtained from a very large number of trials and call it a parameter.

The probability density function that comes from Bernoulli trials is the binomial pdf, which gives the probability of r successes in n trials of an experiment when the probability of a success on any one trial is p. Of course, if we know the number of successes, r, we also know the number of failures, n - r, since there are only two outcomes to the experiment. We can easily compute the probability of a failure as 1 - p, which we sometimes write as q. The formula for the binomial pdf is presented in equation 4.3.

$$P(r \mid n, p) = {}_{n}C_{r}\, p^{r} (1 - p)^{n - r} = {}_{n}C_{r}\, p^{r} q^{n - r} \qquad [4.3]$$

Consider the following example. Suppose we toss a fair die four times and want to know the probability of obtaining three 1s in those four tosses. First, we compute the number of ways of obtaining three 1s in four tosses by using combinations; thus

$${}_{4}C_{3} = \frac{4!}{(4 - 3)!\,3!} = 4 \text{ ways.}$$

We obtain the probability that any of the four ways will occur by the multiplication rule. For instance, the probability of obtaining the sequence of three 1s (1) and one value other than 1 (1') is

P(1111') = (1/6)(1/6)(1/6)(5/6) = 5/1296.

Thus, the probability of getting three 1s in four tosses of a die is (4)(5/1296) = 20/1296 or 5/324.

We can express the same problem using the binomial pdf as

$${}_{4}C_{3}\,pppq \;\text{ or }\; {}_{4}C_{3}\,p^{3}q^{1} \;\text{ or }\; (4)\left(\tfrac{1}{6}\right)^{3}\left(\tfrac{5}{6}\right)^{1} = \tfrac{5}{324}.$$

Let's consider another example and this time use equation 4.3 to solve it. Suppose a litter of ten pigs is born and we want to know the probability that it contains only one male. We can use the binomial pdf with r = 1, n = 10, and p = 0.5. Thus

$$P(r = 1 \mid n = 10, p = 0.5) = {}_{10}C_{1}\,(0.5)^{1}(0.5)^{9} = \frac{10!}{1!\,9!}\,(0.5)(0.00195) = 0.0098,$$

which makes it very unlikely.
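Both worked examples can be reproduced with a few lines of code. This is a minimal sketch of equation 4.3, assuming Python's standard `math.comb` for the combinations term; the function name `binomial_pmf` is a hypothetical label, and fractions are used so the die result stays exact.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """Probability of r successes in n trials when each trial succeeds with probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Three 1s in four tosses of a fair die.
three_ones = binomial_pmf(3, 4, Fraction(1, 6))
print(three_ones)  # 5/324

# Exactly one male in a litter of ten pigs.
one_male = binomial_pmf(1, 10, Fraction(1, 2))
print(round(float(one_male), 4))  # 0.0098
```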

The term binomial comes from the way the pdf is set up. Its value for given values of r, n, and p is equivalent to the corresponding term of the expansion for the binomial expression [(p + q).sup.n].

The formula for the binomial pdf provides a means of computing the probability for every possible number of successes in a given number of trials. An orderly arrangement of these probabilities is called a binomial probability distribution. Thus, the binomial pdf defines an entire family of distributions, one for every combination of values for n and p, the parameters of these distributions. While only distributions with p = 0.5 are exactly symmetric, distributions with other values of p become reasonably symmetric when n is large. Thus, we can use the normal curve, which is symmetric, as an approximation to the binomial, and vice versa.

There are cases when we want to find the probability of "r or more" successes or of "r or fewer" successes in n trials. In effect, what we want is the probability corresponding to the area under one of the tails of a binomial distribution. We obtain such probabilities by adding the binomial probabilities for the particular events. For experiments with large numbers of trials, this summation can be long and tedious, especially if the value of r is not near zero or near n. For this reason, tables of binomial probabilities are published that give the tail areas for a large range of values for n, p, and r. See Appendix table 2, page 221 for an example table.
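The summation behind such tables is easy to sketch in code. The helper names below are hypothetical, and the question asked (at most one male in a litter of ten) is only an illustration that reuses the litter's values of n and p; it is not a result quoted from the text or from Appendix table 2.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def prob_at_most(r, n, p):
    """Lower tail: probability of r or fewer successes."""
    return sum(binomial_pmf(k, n, p) for k in range(0, r + 1))

def prob_at_least(r, n, p):
    """Upper tail: probability of r or more successes."""
    return sum(binomial_pmf(k, n, p) for k in range(r, n + 1))

# Illustration: at most one male in a litter of ten when p = 0.5.
print(prob_at_most(1, 10, 0.5))
# The two tails around any cut point account for all outcomes.
print(prob_at_most(1, 10, 0.5) + prob_at_least(2, 10, 0.5))
```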

The Mean and Variance of a Bernoulli Process

It is fairly easy to derive the mean and variance of the binomial distribution because we can depict Bernoulli trials in several ways. Suppose we have an experiment in which two animals are drawn, with replacement, from a large pen in which half of the animals are male, M, and the other half are female, F. If the total number of animals is N and F of them are female, then the proportion of females is F/N = p. Thus, the probability of a success, selecting a female animal, F, is p = 1/2 and the probability of a failure, selecting a male animal, M, is (1 - p) = 1/2. Rather than run this through the binomial pdf, let's use a different approach. Assume we draw a random sample of n animals from the pen and we have the discrete random variable X that takes on the value 1 for a success (getting a female, F) and a value of 0 for a failure. In this manner, we define n independent random variables, $X_1, X_2, \ldots, X_n$, one for each of the individual trials. To make the example more concrete, suppose we randomly select seven animals from the pen and the outcomes are as listed in the following set:

{F, M, F, F, M, F, M}.

That is, we have the data presented in table 4.1. The expected value of X is stated in equation 4.4.

$$\mu = E(X) = \sum X \cdot P(X) = (1)(p) + (0)(1 - p) = p \qquad [4.4]$$

The formula for the variance of X, denoted Var(X) or $\sigma^{2}$, appears in equation 4.5.

$$\sigma^{2} = E[(X - \mu)^{2}] = E[(X - p)^{2}] = \sum (X - p)^{2} \cdot P(X) \qquad [4.5]$$

For our example, it is computed in equation 4.6.

$$\sigma^{2} = (1 - p)^{2} \cdot p + (0 - p)^{2} \cdot (1 - p) = p(1 - p) \qquad [4.6]$$

Now r, the number of successes in n trials, is expressed by the sum in equation 4.7, and it follows that for n = 7 the expected value of r is calculated in equation 4.8.

$$r = X_{1} + X_{2} + \cdots + X_{n} \qquad [4.7]$$

$$E(r) = E(X_{1}) + E(X_{2}) + \cdots + E(X_{7}) = p + p + \cdots + p = np \qquad [4.8]$$

Similarly, because the trials are independent, the variance of r is expressed in equation 4.9, and the standard deviation of r is the square root of the variance.

$$\mathrm{Var}(r) = p(1 - p) + p(1 - p) + \cdots + p(1 - p) = np(1 - p) = npq \qquad [4.9]$$

Consider the following example. A farm supply dealer has a bin of parts that we know contains 10 percent defectives. If we consider the selection of a defective part as a success, S, and the selection of a good part as a failure, F, then the expected value of the number of successes, r, per sample of two items is calculated from equation 4.8 as

E(r) = np = (2)(0.10) = 0.20,

and the variance of r for n = 2 is obtained from equation 4.9 as

Var (r) = npq = (2)(0.10)(0.90) = 0.18,

while the standard deviation is the square root of 0.18 or 0.42.
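The shortcut formulas can be cross-checked against the definitions of the mean and variance. The sketch below is illustrative only and the function names are hypothetical: the first function applies equations 4.8 and 4.9 directly, while the second sums over every possible value of r, which should give the same answer.

```python
from math import comb, sqrt

def binomial_moments(n, p):
    """Mean, variance, and standard deviation of r from equations 4.8 and 4.9."""
    mean = n * p
    var = n * p * (1 - p)
    return mean, var, sqrt(var)

def moments_by_enumeration(n, p):
    """Mean and variance of r computed from the definitions, summing over r = 0..n."""
    probs = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    var = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))
    return mean, var

# The bin of parts with 10 percent defectives, sampled two items at a time.
mean, var, sd = binomial_moments(2, 0.10)
print(round(mean, 2), round(var, 2), round(sd, 2))  # 0.2 0.18 0.42
print(moments_by_enumeration(2, 0.10))
```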

Poisson Probability Distribution

A second set of problems is described by the small probability of success for any one of several trials of an experiment. Such problems include, for example, applications in crop insurance such as the number of weather-related disasters in a given time period, the number of trucks arriving at a grain elevator during harvest and, hence, the kind of queue or waiting line that forms, the demand for items from a farm supply company's inventory, and the number of defects in a farm machinery manufacturer's product. The Poisson probability distribution is suitable for representing these kinds of problems. It can also be viewed as the limiting form of the binomial as well as a probability distribution in its own right.

As a limiting form of the binomial, it is most useful when the number of trials from a Bernoulli process, n, is large and the probability of a success, p, is small. In this case, the computations we make using the binomial pdf are long and tedious, but they are quite simple with the Poisson pdf. Actually, a Poisson distribution is not described by events with two possible outcomes and a constant probability of success as are Bernoulli trials. However, under certain conditions, we can use the Poisson distribution to solve Bernoulli problems. For example, cultivator frames are continuously manufactured by an extrusion process and the thickness of the wall of the tubing is occasionally below minimum acceptable standards. In a sample of 1,000 feet of such tubing, twenty defective places were found to randomly occur over the tubing, giving an average of two defects per 100 feet. If the probability of finding a specified number of defects per 100 feet of tubing remains constant, we can then consider this Poisson problem as a Bernoulli by viewing a very short length as an independent trial with an outcome of zero or more successes. As long as we keep the mean, np, equal to two defects per 100 feet, we can divide the 100-foot length into 10-foot segments, or 1-foot segments, or whatever is convenient. As the number of pieces, n, becomes large, the probability of more than one defect per piece becomes very small and the difference between the binomial pdf and the Poisson pdf becomes negligible.

Thus, we can say that in a Bernoulli problem, probabilities are expressed in terms of the probability, p, of a success on each of n independent trials and the number of successes, r. In a Poisson problem, we express probabilities in terms of the expected number of successes, l, per unit of space (such as a short length of tubing) and the number of units, m, in a given amount of space. The letters p and l play the same role, and so do n and m. Thus, the expected number of successes, np, for Bernoulli trials corresponds to lm, the expected number of successes in a given amount of space for a Poisson problem (equation 4.10).

$$\mu = lm \qquad [4.10]$$

Consider the following example. In a study of the arrivals of grain trucks at an elevator, the probability of an arrival during any given minute, a trial, is p = 0.0333, while the expected number of arrivals per half hour is

np = (30)(0.0333) ≈ 1.

By viewing a minute of time as a unit of space, with l = 0.0333 expected arrivals per minute, and a half hour as the given amount of space, m = 30, then

[mu] = lm = (0.0333)(30) ≈ 1.

We can compute the binomial probabilities for r successes in n trials by our pdf using equation 4.11.

P(r | n, p) = [sub.n][C.sub.r][p.sup.r][q.sup.n - r] [4.11]

We know that for small values of p, the resulting probability distribution is highly skewed to the right. For very small values of p and large values of n, as long as np remains constant, the limit of the binomial is the Poisson, which is written in equation 4.12

$$\lim_{\substack{n \to \infty \\ p \to 0}} P(r \mid n, p) = \frac{(np)^{r}}{r!}\, e^{-np} \qquad [4.12]$$

where e is the base of natural logarithms with the value 2.71828. Now if we use [mu] in place of np, then we can finally write the formula for the Poisson (equation 4.13).

$$P(r \mid \mu) = \frac{\mu^{r}}{r!}\, e^{-\mu} \qquad [4.13]$$

Thus, the Poisson distribution has one parameter, $\mu$, and we can determine the entire distribution once we know it: E(r) = $\mu$ and Var(r) = $\mu$. The standard deviation is the square root of $\mu$. These results come directly from those for the binomial distribution, for which E(r) = np and Var(r) = npq. The Poisson is the limit of the binomial distribution as n approaches infinity and p approaches zero while np = $\mu$ remains constant, so npq approaches np = $\mu$ as q approaches 1.
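The limiting behavior can be observed numerically. The sketch below holds the mean at 1 (the grain-truck value) while n grows and p shrinks; n = 30 matches the text, whereas n = 300 and n = 3000 are illustrative choices, and the function names are hypothetical. The largest difference between the two pdfs over r = 0, ..., 5 should shrink as n increases.

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    """Probability of r successes in n Bernoulli trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, mu):
    """Probability of r successes when the expected number of successes is mu."""
    return mu**r / factorial(r) * exp(-mu)

mu = 1.0   # one expected arrival per half hour
gaps = []
for n in (30, 300, 3000):   # 300 and 3000 are illustrative, not from the text
    p = mu / n              # shrink p so that np stays equal to mu
    gap = max(abs(binomial_pmf(r, n, p) - poisson_pmf(r, mu)) for r in range(6))
    gaps.append(gap)
    print(n, gap)
```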

[FIGURE 4.2 OMITTED]

Consider the following example. Inspectors in a farm machinery manufacturing plant check 300-foot lengths of hydraulic hose for defects and record the number of defects for each length of hose. From past experience, we know that the number of defects per length of hose follows a Poisson distribution. Suppose we inspect a sample of twenty lengths of hose and observe ten defects. Thus, our estimate of $\mu$ is 10/20 = 0.5 defects per length, which is the sample average and also our estimate of the variance. The standard deviation is the square root of 0.5, or 0.7.

The Normal Probability Distribution

While the binomial distribution is the most commonly used probability distribution for discrete variables, the normal is the most helpful continuous distribution. The graph of a general normal distribution looks like the symmetrical, bell-shaped distribution shown in figure 4.2. Its mean, [mu], falls at the midpoint of the distribution and its standard deviation, [sigma], along with [mu], completely define the distribution. Since this is the theoretical distribution or true distribution associated with all of the values of the population, we use the Greek letters to refer to the values of the mean and standard deviation rather than [bar.X] and S, which we will use later when we discuss the mean and standard deviation of samples drawn from this distribution.

Some of the characteristics of a normal curve are:

1. The area under the curve between $\mu - \sigma$ and $\mu + \sigma$ is approximately 68 percent of the total area.

2. The area under the curve between $\mu - 2\sigma$ and $\mu + 2\sigma$ is approximately 95 percent of the total area.

3. The area under the curve between $\mu - 3\sigma$ and $\mu + 3\sigma$ is approximately 99.7 percent of the total area.

Thus, while the random variable X theoretically ranges from minus infinity to plus infinity under the normal curve, almost all of the probability is contained in the range of the mean plus or minus three standard deviations.
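These three areas can be computed rather than memorized. The sketch below relies on a standard identity, not stated in the text, that the area within k standard deviations of the mean of any normal curve equals erf(k/√2); the function name `area_within` is hypothetical.

```python
from math import erf, sqrt

def area_within(k):
    """Area under a normal curve between mu - k*sigma and mu + k*sigma."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```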

The location and shape of the normal curve are completely determined by the values of its mean and standard deviation. The value of the mean locates the center of the distribution along the real number line, whereas the value of the standard deviation determines the extent of the curve's spread about the mean. Since all normal curves representing theoretical probability distributions have a total area of one, as the standard deviation increases, the curve must decrease in height and spread out. Because its shape is completely determined by its standard deviation, we may reduce all normal curves to a standard one by a simple change of variable. The easiest normal curve to work with is one with a mean of 0 and a standard deviation of 1 (the standard normal curve, figure 4.2). Thus, when we need to calculate probabilities for any normal curve, we first reduce it to this standard one. The normal pdf, written in equation 4.14, is not easily evaluated; therefore, we use Appendix table 7 to obtain probabilities for the standard normal distribution.

$$P(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(X - \mu)^{2}/(2\sigma^{2})} \qquad [4.14]$$

This is possible because a point on the x axis of the standard normal curve corresponds to a point on the x axis of any normal curve, and we can determine its value by stating how many standard deviations it is away from the mean. For example, suppose we have two curves that differ only by their standard deviations, i.e., the first has mean 0 and standard deviation 1, while the second has mean 0 and standard deviation 3. We can make the second conform to the first by changing the x axis and compressing the curve to one-third its length, i.e., by dividing every X value by $\sigma$ = 3. Thus, the point X = 6 on the second curve is equivalent to the point Z = 2 on the standard normal curve because 6 is two standard deviations to the right of the mean, 0. In general, if a point X on the axis of some normal curve with mean $\mu$ and standard deviation $\sigma$ corresponds to a point Z on the standard normal curve, then the point X is Z standard deviations to the right of $\mu$, or X = $\mu$ + Z$\sigma$, which we can rewrite as equation 4.15 by using algebra to solve for Z.

Z = (X - [mu])/[sigma] [4.15]

This statement for Z corresponds to the way we standardized individual X values earlier. By this practice of expressing all X values from any given normal curve in terms of corresponding Z values for the standard normal curve, we can reduce all normal curves to the standard one and may use Appendix table 7 for the standard normal for finding probabilities under any normal curve. For example, suppose X belongs to a normal distribution with mean 230 and standard deviation 20, i.e., X ~ N(230, 20). What is the probability of obtaining a value of X from this distribution that is as large or larger than 280?

Z = (280 - 230)/20 = 50/20 = 2.5

From Appendix table 7, the probability between the mean and Z is 0.4938. We get the answer by subtracting this value from 0.5, the probability under the curve to the right of the mean, or

0.5000 - 0.4938 = 0.0062.

Thus, the probability that a value of X from a normal distribution with mean 230 and standard deviation 20 is 280 or larger is 0.0062. Now consider a second question. What is the probability that a value of X from this distribution is smaller than 270?

Z = (270 - 230)/20 = 40/20 = 2.0

From the appendix table, the probability between the mean and Z is 0.4772 and the probability to the left of the mean is 0.5. Thus, we find the probability that a value of X from this distribution is smaller than 270 by

0.5000 + 0.4772 = 0.9772.

Notice that these probabilities correspond to areas under the curve, and for us to sum or subtract probabilities is equivalent to summing or subtracting areas (figure 4.3). However, the probability that X is equal to a single value is zero, since no area corresponds to a point. We always pose probability questions for a range of values for the random variable X when dealing with continuous distributions such as the normal. In the preceding example, it does not matter, therefore, whether we say "find the probability that X is less than 270" or "find the probability that X is less than or equal to 270." The answer is the same.
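The two table lookups above can be reproduced in code. This is a minimal sketch in which Python's standard `statistics.NormalDist` stands in for Appendix table 7; the helper name `z_score` is hypothetical. The cumulative function returns the whole area to the left of Z, so no separate 0.5 needs to be added or subtracted.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Equation 4.15: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

std_normal = NormalDist()  # mean 0, standard deviation 1

# Probability that X is 280 or larger when the mean is 230 and the standard deviation is 20.
z1 = z_score(280, 230, 20)
p_ge_280 = 1 - std_normal.cdf(z1)
print(z1, round(p_ge_280, 4))  # 2.5 0.0062

# Probability that X is smaller than 270 for the same distribution.
z2 = z_score(270, 230, 20)
p_lt_270 = std_normal.cdf(z2)
print(z2, round(p_lt_270, 4))  # 2.0 0.9772
```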

[FIGURE 4.3 OMITTED]

Normal Approximation to Binomial Probabilities

When we have Bernoulli trials and want to calculate the probability of r successes in n trials where n is large, sometimes we use the normal approximation to the binomial distribution. For example, if we select a sample of n = 20 items randomly from a manufacturing process in which p = 0.4, and we want to know the probability of obtaining exactly five defectives, i.e., the probability that r = 5, we use the binomial pdf. Thus, we get

$$P(r = 5 \mid n = 20, p = 0.4) = {}_{20}C_{5}\,(0.4)^{5}(0.6)^{15} = 0.0746.$$

We can compute the normal approximation of this binomial probability by solving for E(r) and $\sigma_r$ and using them to find the approximate normal area. We recall that

$$E(r) = np = 20 \times 0.4 = 8, \text{ and}$$

$$\sigma_r = (npq)^{1/2} = (20 \times 0.4 \times 0.6)^{1/2} = 2.19.$$

To find the probability for the r value, we use what at first glance might seem to be trickery, but actually it is not. As we discussed earlier, the normal curve is a continuous curve and we cannot find the probability for a point such as r = 5. In this case there is no problem, for if we construct a histogram of the binomial, we see that r = 5 is really the midpoint of the class represented by that bar of the histogram and the endpoints of the class, or the bar, are 4.5 and 5.5. Using these endpoints as the values for r, then we find the area under the normal distribution that approximates the area of the bar of the histogram, which is the probability we are seeking. So let's do that. First we standardize each value of r by calculating Z values; next we find the corresponding probabilities from Appendix table 7. Finally we subtract the probabilities to obtain the approximate probability for the area of the bar.

$$Z_1 = \frac{4.5 - 8}{2.19} = \frac{-3.5}{2.19} = -1.60$$

$$Z_2 = \frac{5.5 - 8}{2.19} = \frac{-2.5}{2.19} = -1.14$$

From the appendix table, the area between the mean and $Z_1$ = -1.60 is 0.4452, and the area between the mean and $Z_2$ = -1.14 is 0.3729. Thus,

$$P(4.5 \le r \le 5.5) = 0.4452 - 0.3729 = 0.0723.$$

This normal probability closely approximates the true binomial probability of 0.0746. The normal approximation therefore gives sufficiently close answers, and its calculations are generally easier than those with the binomial pdf, which justifies its use, especially when we want cumulative binomial probabilities. The normal pdf is the limit of the binomial as n becomes very large; consequently, the normal approximation improves as n becomes larger. As a rule of thumb, we use the normal approximation whenever E(r) = np > 5 when p ≤ 0.5, or nq > 5 when p > 0.5. The normal approximation is poorer in the tails of the binomial than nearer the mean.
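The comparison between the exact and approximate probabilities can be sketched as follows, with Python's `statistics.NormalDist` standing in for Appendix table 7 and hypothetical function names. Because the code does not round Z to two decimals as a table lookup does, its approximate value is close to, but not identical with, the 0.0723 obtained by hand.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_pmf(r, n, p):
    """Exact probability of r successes in n trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def normal_approx(r, n, p):
    """Approximate P(r) by the normal area over the histogram bar from r - 0.5 to r + 0.5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    curve = NormalDist(mu, sigma)
    return curve.cdf(r + 0.5) - curve.cdf(r - 0.5)

exact = binomial_pmf(5, 20, 0.4)
approx = normal_approx(5, 20, 0.4)
print(round(exact, 4))  # 0.0746
print(approx)           # near the hand-computed 0.0723
```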

Exercises

1. The Farm Co., a local farm supply business, gives customers who pay in ten days a 1 percent discount, with the full amount due after ten days. In the past, 20 percent of its customers have paid in ten days. In a recent month, The Farm Co. sent out seven invoices. What is the probability that:

a. Two customers take the discount?

b. At most two customers take the discount?

c. No customers take the discount?

d. All seven take the discount?

e. Five or more take the discount?

2. Find the mean and variance of the binomial probability distribution in question 1.

3. A produce buyer for Great-Value Supermarkets knows from past experience that 2 percent of the tomatoes received from Mexico are bad and must be discarded. The buyer inspects a shipment that has just arrived by taking a random sample of four tomatoes. What is the probability that:

c. At least two are bad?

d. At most one is bad?

4. A veterinarian knows from past experience that the probability is 30 percent that a calf with a certain disease will recover. The vet is called to a local ranch and diagnoses six calves with the disease. What is the probability that:

a. All six calves will recover?

b. At most two will recover?

c. None will recover?

d. Draw the probability distribution using a bar chart. Is it symmetrical or skewed? What are its mean and standard deviation?

5. The number of accidents per month in the All Natural Meat Packing Co. is distributed according to the Poisson distribution with a mean of 0.5. During the last month, what is the probability that there were:

a. No accidents?

b. One accident?

c. More than one accident? Hint: Use the complement rule.

6. The number of pickup calls to the Triple A Rendering Co. is distributed according to the Poisson distribution with a mean of three per hour. What is the probability that during the next hour the company receives:

a. At least two calls?

b. One call?

c. No calls?

7. The sales department of The Cotton House, a seed company, makes 500 phone calls a day during its busy season. The probability that any call results in a sale is 0.02. Use the Poisson approximation to the binomial to find the probability that the 500 calls result in:

a. Exactly ten sales.

b. More than fifteen sales.

8. For the standard normal distribution, find the probability that Z is:

a. Less than 1.

b. Between 1 and 1.5.

c. Greater than 2.17.

d. Between 0 and -0.83.

e. Less than -1.66.

f. Between -1.2 and 0.34.

9. Given the following probabilities from a standard normal distribution, find the Z value such that:

a. $P(Z < -z_0) = 0.05$

b. $P(Z > z_0) = 0.1628$

c. $P(-z_0 < Z < z_0) = 0.7540$

d. $P(Z > z_0) = 0.0485$

10. The weight of a fruit dessert packaged by Fruity Co-op is normally distributed with a mean of [mu] = 10 ounces and standard deviation [sigma] = 0.4 ounces. If we select a package at random, what is the probability that it weighs:

a. Less than 10 ounces?

b. Less than 10.7 ounces?

c. Between 9.6 and 10.7 ounces?

11. The birth weights of Hampshire pigs are normally distributed with a mean of 4 pounds and a standard deviation of 0.5 pounds. If a pig is randomly selected from a large group of Hampshire newborns, what is the probability that it weighs:

a. More than 3.5 pounds?

b. More than 5 pounds?

c. Between 4.5 and 5 pounds?

d. Between 3.5 and 4.5 pounds?

12. If the number of blooms on 6-inch pots of chrysanthemums is normally distributed with a mean of 16 and a standard deviation of 2, what is the probability that a pot selected at random from a large greenhouse full of these pots will contain:

a. Less than fourteen blooms?

b. Less than twenty blooms?

c. Between twelve and twenty blooms?

d. 90 percent of the pots will contain more than -- blooms.

TABLE 4.1 Outcomes of Seven Trials in Which Male and Female Animals Are Selected Randomly

```
Trial    Outcome    Value of X    Variable Defined

1        F          1             X_1
2        M          0             X_2
3        F          1             X_3
4        F          1             X_4
5        M          0             X_5
6        F          1             X_6
7        M          0             X_7
```