Bayesian approach to assessing uncertainty and calculating a reference value in Key Comparison experiments

International experiments called Key Comparisons pose an interesting statistical problem: the estimation of a quantity called a Reference Value. There are many possible forms that this estimator can take, and the topic has recently received much international attention. In this paper, it is argued that a fully Bayesian approach to this problem is compatible with the current practice of metrology and can easily be used to create statistical models that satisfy the varied properties and assumptions of these experiments.

Key words: Bayesian hierarchical models; linear opinion pool; Markov Chain Monte Carlo methods; synthesis of probability distributions
1. Introduction

In October of 1999, the directors of the national metrology institutes of 38 of the member states of the Metre Convention signed a document called the Mutual Recognition Agreement (MRA) [1] dealing with national measurement standards. The objectives of the MRA are to establish the degree of equivalence of the national measurement standards maintained by the national metrology institutes, to provide for the recognition of the calibration and measurement services offered by the institutes, and, consequently, to establish a secure technical foundation for wider agreements related to international trade. These objectives are pursued through international comparisons of measurements called Key Comparisons. The overall coordination of the Key Comparisons is carried out by the International Bureau of Weights and Measures (BIPM) under the authority of the International Committee for Weights and Measures (CIPM). Details of the agreements can be found on the BIPM website, http://www.bipm.fr.
There are general guidelines, given in Ref. [2], for the execution of Key Comparison experiments, dealing with the selection of what is to be measured, who is to participate, and how the results are to be disseminated. Key Comparison experiments are now being performed in a great variety of disciplines, for example temperature measurements, gas flow measurements, the responsivity of photodiodes at various wavelengths of light, sound measurements, and many more. Two classes of Key Comparisons have been defined in Refs. [3, 4]. A Class 1 Comparison is one where each participant measures a local standard, or possibly a traveling artifact. In such an experiment there may be a single quantity being measured (the measurand) if the artifact is stable; if not, there may be large systematic laboratory effects that need to be accounted for in the statistical model and analysis. Currently, most Key Comparisons belong to Class 1; an example is given in Sec. 3. A Class 2 Comparison is one where all laboratories measure a single physical state or property, so in a Class 2 experiment there is a clearly defined single measurand. In either type of comparison, the laboratories perform their measurements and report their results to a pilot laboratory.
The results from each laboratory consist of an average of a set of measurements (often with a small sample size) of the quantity being measured, together with the total uncertainty of the measurements. The calculation of the uncertainty is prescribed in the Guide to the Expression of Uncertainty in Measurement [5], a publication of the ISO. The pilot laboratory assembles all of the data and calculates various measures of agreement among the participating laboratories. The exact form of the analysis is not prescribed, and there is considerable debate in the metrology community about this issue. However, Ref. [2] states that the pilot laboratory writes a report on the Key Comparison that should include "proposals for a reference value," called the Key Comparison Reference Value (KCRV). The KCRV is said to be "usually a close approximation to the corresponding SI value." In a Class 2 Key Comparison, the KCRV is clearly an estimate of the measurand. Class 1 Key Comparisons may have multiple measurands, so the interpretation of the KCRV is not straightforward. To date, there have been numerous proposals for the calculation of the KCRV; Refs. [6-10] present some of the methods, and more can be found in their references.
The published methods have so far been based mostly on frequentist statistical models, even though the Guide uses a belief-based definition of probability and so can be said to be more compatible with Bayesian statistical methods. This paper proposes fully Bayesian models for Key Comparison data analysis and shows that their assumptions are compatible with the approach of the Guide. The models allow a unified approach to the analysis of Class 1 Key Comparisons, whether or not systematic laboratory effects are present. In Sec. 2, the Guide's definition of measurement uncertainty is fully described and a compatible Bayesian mathematical model
2. The Measurement Model

The approach to the quantification of uncertainty in measurement that is now widely used in the physical sciences is the one presented in the Guide to the Expression of Uncertainty in Measurement. The basic idea of the Guide is to approximate a measurement equation

$$ \mu = g(\theta_1, \dots, \theta_p), \tag{1} $$

where $g$ is a known function, $\mu$ denotes the measurand, and $\theta_1, \dots, \theta_p$ denote $p$ input quantities (random variables), by a first-order Taylor series about the expected values of the $\theta_i$. The uncertainty in the measurand, denoted $u(\mu)$, is then defined as the standard deviation of the probability distribution of $\mu$ based on this linear approximation, that is,

$$ u(\mu) = \sqrt{\sum_{i=1}^{p} c_i^2 \operatorname{var}(\theta_i) + 2 \sum_{i<j} c_i c_j \operatorname{cov}(\theta_i, \theta_j)}, \tag{2} $$

where the $c_i$ are the partial derivatives of $\mu$ with respect to the $\theta_i$, evaluated at the expected values of the inputs, var denotes the variance, and cov denotes the covariance. The Guide uses an interpretation of probability consistent with the Bayesian paradigm; that is, the probability distributions of the $\theta_i$ and $\mu$ summarize our knowledge about these quantities. The expected values and variances of the $\theta_i$ may be based on actual physical measurements or on other information such as expert opinion. The Guide defines two types of uncertainty evaluations. Type A is "by the statistical analysis of series of observations," and this has usually been interpreted as "using sample standard deviations."
Type B is by "other means," and this has usually meant using manufacturers' specifications, expert knowledge, or even data from additional experiments. An example of a Type B evaluated uncertainty is the uncertainty in the internal volume of a 100 ml flask used in a chemistry experiment. Here, the manufacturer may give a volume of 100 ml ± 0.1 ml. This could be interpreted as the volume having a rectangular distribution on the interval (99.9 ml, 100.1 ml), that is, having a standard deviation of $0.1/\sqrt{3} \approx 0.058$ ml. A key idea is that the data from the present experiment are not informative about sources of uncertainty evaluated by Type B methods. Such uncertainty is due to systematic effects that influence all of the observations in the experiment, for example a flask that does not really have a volume of 100 ml. The most common analysis of a metrology experiment estimates the expected value of $\mu$ by $y$, the output of the measurement equation

$$ y = g(x_1, \dots, x_r, \lambda_1, \dots, \lambda_s), \tag{3} $$

where $p = r + s$, $r$ of the input quantities having physical measurements and $s$ of the input quantities being based on other information. The $x_i$ are the sample means of the measurements used to estimate the corresponding $\theta_i$, and the $\lambda_i$ are the subjective evaluations of the means of the remaining input quantities. The uncertainty $u(\mu)$ is approximated by $u(y)$ as follows: the variances of the frequency-based distributions of the $r$ sample means are usually used for the corresponding variances in Eq. (2), and subjective evaluations are used for the variances of the remaining $\theta_i$.
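As a minimal sketch of Eq. (2) and the flask example, the Python code below writes the variance and covariance terms as the quadratic form $c^\top V c$, where $V$ is the covariance matrix of the inputs; for a symmetric $V$ the two forms are algebraically identical. It also converts a ± half-width into a rectangular-distribution standard deviation. The function names, the use of NumPy, and the mass-over-volume example with its mass values are illustrative assumptions, not part of the original analysis.

```python
import numpy as np


def rectangular_std(half_width):
    """Standard deviation of a rectangular distribution of half-width a: a / sqrt(3)."""
    return half_width / np.sqrt(3.0)


def combined_uncertainty(c, cov):
    """First-order combined standard uncertainty, Eq. (2).

    c:   sensitivity coefficients c_i (partial derivatives of g at the input estimates)
    cov: p x p covariance matrix of the inputs (variances on the diagonal)
    For symmetric cov, c^T V c = sum_i c_i^2 var_i + 2 * sum_{i<j} c_i c_j cov_ij.
    """
    c = np.asarray(c, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return float(np.sqrt(c @ cov @ c))


# Hypothetical example: concentration y = m / V, with a measured mass m
# (Type A, illustrative values) and the 100 ml +/- 0.1 ml flask (Type B).
m, u_m = 2.000, 0.002                  # g
V, u_V = 100.0, rectangular_std(0.1)   # ml; u_V is about 0.058 ml
y = m / V                              # g/ml
c = [1.0 / V, -m / V**2]               # dy/dm, dy/dV
cov = np.diag([u_m**2, u_V**2])        # inputs assumed uncorrelated
u_y = combined_uncertainty(c, cov)
print(f"y = {y:.5f} g/ml, u(y) = {u_y:.2e} g/ml")
```

Off-diagonal entries of `cov` would carry any correlation between inputs; the example assumes there is none.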
The usual interpretation of $y$ and $u(y)$ is as the mean and standard deviation of a probability distribution of the measurand. From a statistical perspective, this usage represents a methodology that is neither totally frequentist nor totally Bayesian, but can be viewed as an approximate solution to a Bayesian inference problem. For further discussion of this subject see Refs. [11, 12].

In the analysis of Key Comparison experiments, the pilot laboratory receives the values $y$ and the uncertainties $u(y)$ from all of the participants. Generally, the participants can also provide an estimate of the repeatability component of the uncertainty. Even though it is understood that $y$ and $u(y)$ are features of a probability distribution for a measurand, no information about the shape of this distribution is provided. For a single laboratory, the following Bayesian statistical model gives results consistent with the Guide's measurement model, and has the added benefit of being easily extended to accomplish the KCRV estimation:

$$ \begin{aligned} Y \mid \theta, \sigma^2 &\sim N(\theta, \sigma^2) \\ \theta \mid \mu, \tau^2 &\sim N(\mu, \tau^2) \\ \mu \mid m, \omega^2 &\sim N(m, \omega^2). \end{aligned} \tag{4} $$

The notation $Y \mid \theta, \sigma^2 \sim f(Y \mid \theta, \sigma^2) = N(\theta, \sigma^2)$ represents conditioning.
That is, the probability distribution of $Y$, given $\theta$ and $\sigma^2$, is $f(Y \mid \theta, \sigma^2)$, which is a Gaussian (normal) distribution with mean $\theta$ and variance $\sigma^2$. The participant's inputs are a sample mean $y$, a sample variance $s^2$ (an estimate of $\sigma^2$), and the remaining uncertainty $\tau^2$. In this model, stage one of the hierarchy quantifies the usual sampling variability of $y$, that is, the uncertainty component due to repeatability. Stage two represents the remaining uncertainty, both that evaluated by Type A and that evaluated by Type B methods. Stage three is a prior distribution for the measurand $\mu$. Normal distributions are used in this article, but other forms of probability distributions can easily be substituted when appropriate. Generally, a non-informative stage three prior distribution on $\mu$ would be used, that is, $\omega^2 \to \infty$. Application of Bayes' Theorem,

$$ f(\mu \mid y) = \frac{\int f(y \mid \theta, \sigma^2)\, f(\theta \mid \mu, \tau^2)\, f(\mu)\, d\theta}{\iint f(y \mid \theta, \sigma^2)\, f(\theta \mid \mu, \tau^2)\, f(\mu)\, d\theta\, d\mu}, $$

leads to a posterior distribution for $\mu$

$$ \mu \mid y, \tau^2, \sigma^2 \sim N(y, \tau^2 + \sigma^2). \tag{5} $$

In the above model the variance $\sigma^2$ is assumed to be a known quantity. As this is generally not true, $\sigma^2$ would be estimated by the sample variance $s^2$. When $\tau^2$ dominates $\sigma^2$, as is often the case in high-precision physical measurements, or when the sample size on which $s^2$ is based is large, the posterior distribution of $\mu$ can be well approximated by

$$ \mu \mid y, \tau^2 \sim N(y, \tau^2 + s^2). \tag{6} $$

(When the relationship between $\tau^2$ and $\sigma^2$ is less extreme, it is better to assign $\sigma^2$ a non-informative prior distribution and obtain the posterior distribution of $\mu$ by Markov Chain Monte Carlo methods. This will be shown in an example.) Thus the approximate posterior mean and posterior standard deviation arising from Eq. (4) are in fact the quantities recommended for estimation by the Guide. In the next section, an example of a Key Comparison experiment is described in order to motivate the proposed modeling and analysis.

3. Example of a Key Comparison Experiment: Vibration Acceleration (CCAUV.V-K1)

The aim of this experiment in the area of vibration and shock measurement was to compare measurements of sinusoidal linear accelerations in the frequency range from 40 Hz to 5 kHz. During the period from January 2000 to June 2001, 12 national metrology institutes used two different accelerometers, one of single-ended design and one of back-to-back design, to measure charge sensitivity at different frequencies. The charge sensitivity was given in picocoulombs per meter
per second squared [pC/(m/s²)]. All laboratories followed the
same measurement protocol, controlling temperature, relative humidity, and other external variables that could affect the measurements. The German institute Physikalisch-Technische Bundesanstalt (PTB) was the pilot laboratory, responsible for checking the long-term stability of the accelerometers. These were hand-carried from the pilot laboratory to the participating institutes in a closed box by representatives of the various institutes. The data from the Key Comparison were published in the Report on Key Comparison CCAUV.V-K1 [13] and are now available for further study. This is a Class 1 Key Comparison with a traveling artifact, which can be considered as having a single measurand. A subset of these data, the charge sensitivity of the single-ended accelerometer at 40 Hz, is given here in Table 1. The table contains the mean values ($y_i$), the repeatability component of the uncertainty ($s_i$), and the Type B evaluated uncertainties ($\tau_i$) for this measurement. The mean measurements are averages over five measurements for all but three laboratories.
Laboratory one took nine measurements, laboratory eight took three, and laboratory nine took four. The $s_i$ are the sample standard deviations of the means. Each laboratory calculated its uncertainty values by the usual error propagation techniques (see Ref. [5]) and included terms such as the uncertainty from possible voltage disturbances, phase noise, uncertainty in the vibration frequency measurement, and others. All of the possible sources of uncertainty that were to be considered are described in Ref. [14]. Each laboratory assessed its Type B uncertainty independently of the other laboratories and without knowledge of the other laboratories' measurements.

4. Two Models for Key Comparison Data

4.1 Multiple Means Model

A Key Comparison experiment is a multi-laboratory study. If we treat all laboratories' data independently of each other, that is, if we assume that there are no relationships between the measurands or the uncertainties of the various laboratories, we can extend the statistical model given in Eq. (4) as follows. For $i = 1, \dots, k$, where $k$ is the number of laboratories,

$$ \begin{aligned} Y_i \mid \theta_i, \sigma_i^2 &\sim N(\theta_i, \sigma_i^2) \\ \theta_i \mid \mu_i, \tau_i^2 &\sim N(\mu_i, \tau_i^2) \\ \mu_i \mid m_i, \omega^2 &\sim N(m_i, \omega^2). \end{aligned} \tag{7} $$

Letting $\omega^2 \to \infty$ as before, the posterior distributions of the $\mu_i$ can be approximated by

$$ \mu_i \mid y_i, \tau_i^2 \sim N(y_i, \tau_i^2 + s_i^2), \tag{8} $$

and so the standard uncertainty of each laboratory is approximately the quantity recommended by the Guide.

4.2 Single Mean Experiment

Suppose now that there is a common measurand, as would be true in all Class 2 Key Comparisons and in most Class 1 Key Comparisons.
Equation (7) can be modified to reflect this fact:

$$ \begin{aligned} Y_i \mid \theta_i, \sigma_i^2 &\sim N(\theta_i, \sigma_i^2) \\ \theta_i \mid \mu_i, \tau_i^2 &\sim N(\mu_i, \tau_i^2) \\ \mu_i \mid \mu, \gamma^2 &\sim N(\mu, \gamma^2) \\ \mu &\sim N(m, \omega^2) \\ \gamma &\sim U(0, c). \end{aligned} \tag{9} $$

The notation $U(0, c)$ represents a rectangular (uniform) distribution on the interval $(0, c)$, where $c$ is a constant. The prior distributions on the $\mu_i$ are now hierarchical, the common mean $\mu$ being the measurand of the entire experiment. Note that $\gamma^2$ represents variability due to systematic laboratory effects. When a prior distribution is specified for it, $\gamma^2$ can be estimated from the combined data of the participating laboratories. Recall that, for each individual laboratory, the contribution of such systematic laboratory effects to its uncertainty is not estimable from its own data, and so it is evaluated by Type B methods and is part of $\tau_i$. For example, in the CCAUV.V-K1 Key Comparison, the uncertainty in the vibration frequency measurement is of this type, as is essentially any uncertainty attributed to the individual laboratory's technique. Thus the uncertainty estimate of $\mu$ based on Eq. (9) is somewhat conservative, since such effects may be counted both in $\tau_i$ and in $\gamma^2$: without a complete list of all Type B evaluated uncertainties from each participant, it is impossible to separate the effects that are estimable from the pooled data from those that are not.
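Because $\theta_i$ and $\mu_i$ enter Eq. (9) only through normal distributions, they can be integrated out, giving $y_i \mid \mu, \gamma \sim N(\mu, \sigma_i^2 + \tau_i^2 + \gamma^2)$. The sketch below is a minimal random-walk Metropolis sampler for $(\mu, \gamma)$ under stated simplifications: $\sigma_i^2$ is replaced by $s_i^2$ rather than given its own prior, the prior on $\mu$ is flat ($\omega^2 \to \infty$), and the proposal step sizes are rough, untuned choices. The function names are hypothetical; this is not the program of Ref. [15].

```python
import numpy as np


def log_posterior(mu, gamma, y, v, c):
    """Unnormalized log posterior of (mu, gamma) under Eq. (9), theta_i and mu_i integrated out.

    v holds sigma_i^2 + tau_i^2 (here s_i^2 + tau_i^2); flat prior on mu, gamma ~ U(0, c).
    """
    if not 0.0 < gamma < c:
        return -np.inf  # outside the support of the uniform prior
    var = v + gamma ** 2
    return -0.5 * np.sum(np.log(var) + (y - mu) ** 2 / var)


def metropolis_eq9(y, s, tau, c, n_iter=20000, burn_in=2000, seed=0):
    """Random-walk Metropolis draws of (mu, gamma); returns shape (n_iter - burn_in, 2)."""
    y = np.asarray(y, dtype=float)
    v = np.asarray(s, dtype=float) ** 2 + np.asarray(tau, dtype=float) ** 2
    rng = np.random.default_rng(seed)
    w = 1.0 / v
    # Start at the gamma = 0 weighted mean and at the middle of the gamma prior.
    mu, gamma = np.sum(w * y) / np.sum(w), c / 2.0
    # Rough, untuned proposal scales.
    step_mu = np.sqrt(1.0 / np.sum(w) + np.var(y) / y.size)
    step_gamma = c / 5.0
    lp = log_posterior(mu, gamma, y, v, c)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        mu_new = mu + step_mu * rng.standard_normal()
        gamma_new = gamma + step_gamma * rng.standard_normal()
        lp_new = log_posterior(mu_new, gamma_new, y, v, c)
        # Symmetric proposal: accept with probability min(1, exp(lp_new - lp)).
        if np.log(rng.uniform()) < lp_new - lp:
            mu, gamma, lp = mu_new, gamma_new, lp_new
        draws[t] = mu, gamma
    return draws[burn_in:]
```

The KCRV and its standard uncertainty would then be estimated by the sample mean and standard deviation of the $\mu$ draws (first column). Convergence diagnostics and proposal tuning are omitted for brevity.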
When the $\sigma_i^2$ are considered unknown and are given prior distributions, analysis based on Eq. (9) must be done numerically, as closed-form solutions are not available. Markov Chain Monte Carlo methods can readily be used; see Ref. [15] for sample computer programs. A special case of Eq. (9) arises when $\gamma^2 = 0$, that is, when all laboratories are assumed to be truly measuring the same quantity. In such a case, allowing $\omega^2 \to \infty$ and following Ref. [16], it can be shown that, approximately, the posterior distribution of $\mu$ is normal with mean $\mu_p$ and variance $\omega_p$ (so that the standard deviation is $\sqrt{\omega_p}$), where

$$ \mu_p = \frac{\sum_{i=1}^{k} y_i (\tau_i^2 + s_i^2)^{-1}}{\sum_{i=1}^{k} (\tau_i^2 + s_i^2)^{-1}}, \tag{10} $$

$$ \omega_p = \frac{1}{\sum_{i=1}^{k} (\tau_i^2 + s_i^2)^{-1}}. \tag{11} $$

5. Analysis

5.1 Single Measurand Experiment

Both Eq. (7) and Eq. (9) can be used to construct a KCRV and its uncertainty for a single measurand experiment. First, consider using Eq. (9). In this case, the meaning of the KCRV is clear and so is its estimation: it is plainly an estimate of the common measurand $\mu$ and can be provided by the mean of the posterior distribution. The uncertainty is the posterior standard deviation of $\mu$. If $\gamma^2$ is set to 0, the solution [Eq. (10)] is the most commonly used KCRV estimate, described in Ref. [6]. Unlike in that publication, however, here it is derived from a Bayesian model. The underlying assumption of a common laboratory mean has been questioned in the literature; see for example Ref. [17].
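Equations (10) and (11) are straightforward to compute. In the minimal sketch below (the function name is hypothetical), note that Eq. (11) gives the reciprocal of the summed weights, which is the posterior variance; its square root is also returned as the standard uncertainty of the KCRV.

```python
import numpy as np


def weighted_mean_kcrv(y, s, tau):
    """Weighted-mean KCRV for gamma^2 = 0: returns (mu_p, omega_p, sqrt(omega_p)).

    Weights are (tau_i^2 + s_i^2)^-1; omega_p from Eq. (11) is the posterior variance.
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / (np.asarray(tau, dtype=float) ** 2 + np.asarray(s, dtype=float) ** 2)
    mu_p = np.sum(w * y) / np.sum(w)   # Eq. (10)
    omega_p = 1.0 / np.sum(w)          # Eq. (11)
    return mu_p, omega_p, np.sqrt(omega_p)
```

Laboratories with smaller combined uncertainty $\sqrt{\tau_i^2 + s_i^2}$ pull the estimate toward their values; with equal uncertainties the result is the ordinary arithmetic mean.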
Equation (9) with a prior distribution on $\gamma^2$ provides a sensible alternative: it allows for systematic laboratory differences, provides a more conservative estimate of the uncertainty of the KCRV, and also allows a degree of validation of the stated uncertainties. Ref. [17] takes another approach and directly models the differences between the measurand $\mu$ and the $\mu_i$; the resulting KCRV is somewhat related to that described in Sec. 5.2.

The participants of the Key Comparison may not wish to allow the mathematical model to pool their data automatically, but may instead want to determine the form of the KCRV more directly. An alternative approach is then to use the multiple means model, Eq. (7), and to construct a KCRV based on it. This requires a synthesis of the probability distributions of the $\mu_i$ into a single distribution for $\mu$. The literature on such methods is rich and has been reviewed in Refs. [18] and [19]. An approach that is sensible for the current application is the Supra-Bayesian technique given in Ref. [20]. It can be described as follows. A single person with vague prior knowledge of a parameter $\mu$ consults $k$ experts, who provide the means (in our notation $y_i$) and standard deviations (in our notation $\sqrt{\tau_i^2 + s_i^2}$) of their probability distributions for $\mu$. The person then combines the $k$ experts' distributions into a single probability distribution.
He does this by first specifying a normal likelihood function to express his opinion about the experts' knowledge and then using Bayes' Theorem. Namely, he specifies that the distribution

$$ p(y_1, \dots, y_k \mid \tau_1, \dots, \tau_k, s_1, \dots, s_k, \mu) \tag{12} $$

is multivariate normal with means $\alpha_i + \beta_i \mu$, standard deviations $\kappa_i \sqrt{\tau_i^2 + s_i^2}$, and correlations $\rho_{ij}$, for $i, j = 1, \dots, k$. In this way he can express his beliefs about the possible biases of the experts (through the $\alpha_i$ and $\beta_i$), about their precision (through the $\kappa_i$), and about the extent to which their assessments are correlated. In the case of no correlation between the laboratories, the resulting posterior distribution for $\mu$ is normal with mean

$$ \mu_B = \frac{\sum_{i=1}^{k} \beta_i (y_i - \alpha_i)\, \kappa_i^{-2} (\tau_i^2 + s_i^2)^{-1}}{\sum_{i=1}^{k} \beta_i^2 \kappa_i^{-2} (\tau_i^2 + s_i^2)^{-1}} \tag{13} $$

and variance

$$ \omega_B = \frac{1}{\sum_{i=1}^{k} \beta_i^2 \kappa_i^{-2} (\tau_i^2 + s_i^2)^{-1}}. \tag{14} $$

Note that when the $\alpha_i$ are set to 0 and the $\beta_i$ and $\kappa_i$ are set equal to one, these expressions become Eqs. (10) and (11); this is again the weighted mean estimator described in Ref. [6].
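The uncorrelated Supra-Bayesian combination in Eqs. (13) and (14) is a weighted mean of the debiased values $(y_i - \alpha_i)/\beta_i$ with weights $\beta_i^2 \kappa_i^{-2} (\tau_i^2 + s_i^2)^{-1}$. A minimal sketch follows, with hypothetical function and argument names; leaving $\alpha$, $\beta$, $\kappa$ at their defaults (0, 1, 1) reproduces Eqs. (10) and (11).

```python
import numpy as np


def supra_bayes_normal(y, s, tau, alpha=None, beta=None, kappa=None):
    """Posterior mean mu_B (Eq. 13) and variance omega_B (Eq. 14) for uncorrelated experts.

    alpha and beta encode assumed biases, kappa an assumed precision adjustment;
    the defaults (0, 1, 1) give the weighted mean of Eqs. (10)-(11).
    """
    y = np.asarray(y, dtype=float)
    v = np.asarray(tau, dtype=float) ** 2 + np.asarray(s, dtype=float) ** 2
    alpha = np.zeros_like(y) if alpha is None else np.asarray(alpha, dtype=float)
    beta = np.ones_like(y) if beta is None else np.asarray(beta, dtype=float)
    kappa = np.ones_like(y) if kappa is None else np.asarray(kappa, dtype=float)
    w = 1.0 / (kappa ** 2 * v)                        # kappa_i^-2 (tau_i^2 + s_i^2)^-1
    omega_B = 1.0 / np.sum(beta ** 2 * w)             # Eq. (14)
    mu_B = np.sum(beta * (y - alpha) * w) * omega_B   # Eq. (13)
    return mu_B, omega_B
```

Inflating a $\kappa_i$ above one down-weights laboratory $i$ without discarding it, which is how the combining person expresses doubt about a stated uncertainty.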
The most frequent criticism of this analysis rests on the belief that the values of the Type B uncertainties cannot be considered well-determined quantities, but only estimates of the underlying variability. This can be modeled by treating the $\kappa_i$ not as constants but as random variables with a probability distribution. For simplicity, it will be assumed here that the laboratories' results are independent, so that the $\rho_{ji}$ are equal to 0. This is generally a reasonable assumption in Key Comparisons but can be relaxed if necessary. Reference [20] shows that, without loss of generality, the $\alpha_i$ may be set to 0 and the $\beta_i$ to 1 when the following probability model for the $\kappa_i$ is employed:

$$\frac{\nu_i c_i^2}{\kappa_i^2} \sim \chi^2_{\nu_i}. \qquad (15)$$

Note that $E(\kappa_i^2) = c_i^2 \nu_i / (\nu_i - 2)$ and that the coefficient of variation of $\kappa_i^2$ is $(\nu_i/2 - 2)^{-1/2}$. The values of $c_i$ and of the degrees of freedom $\nu_i$ therefore specify the location and the spread of the distribution of $\kappa_i$. These values can be chosen more easily by noting that $\log \kappa_i$ is approximately normal with mean $\log c_i$ and variance $(2\nu_i)^{-1}$. Since $\sqrt{2/\nu_i}$ is two standard deviations of $\log \kappa_i$, this further implies that, with probability of roughly 0.95, $a_i^{-1} c_i < \kappa_i < a_i c_i$ for $a_i$ such that $\log a_i = \sqrt{2/\nu_i}$. Further discussion of these relationships appears in Ref. [20] and also in Ref. [21].

Combining the likelihood function [Eq. (12)] and the prior distribution [Eq. (15)] via Bayes' Theorem for a single laboratory $i$ results in

$$\frac{\mu - y_i}{c_i \sqrt{\tau_i^2 + s_i^2}} \qquad (16)$$

having a Student $t_{\nu_i}$ distribution. Because the laboratories are independent, applying Bayes' Theorem with Eqs. (12) and (15) to all $k$ laboratories results in a posterior distribution of $\mu$ whose density is proportional to the product of the $t_{\nu_i}$ densities ($i = 1, \ldots, k$). An interesting property of this distribution is that it can be multimodal when there is strong disagreement among the laboratories. This model is discussed in some detail, in a more general context, in Ref. [22] and is further generalized in Ref. [23]. The KCRV can be taken to be the mean or possibly the median of this distribution, and its uncertainty to be the standard deviation of this distribution. These quantities cannot be obtained in closed form but can easily be computed using standard Markov Chain Monte Carlo methods (Ref. [15]). Interestingly, the same distribution was derived by [24], using a different model and a different motivation, for the problem of combining data which appear to be in mutual disagreement.

Both approaches, that is, KCRV estimation based on Eq. (9) and the Supra-Bayesian KCRV based on Eq. (7), are reasonable. Arguably, the Supra-Bayesian method introduces fewer assumptions and allows a more direct modeling of possible inaccuracies in the individual Type B evaluations. On the other hand, the straightforward interpretation of Eq. (9) and the possibility of estimating the systematic laboratory effects from the data make that approach appealing as well.
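The product-of-$t$ posterior has no closed form. The text computes its mean and standard deviation by MCMC. For a one-dimensional $\mu$, a simpler stand-in is to evaluate the unnormalized density on a uniform grid and integrate numerically, as sketched below. The grid limits, grid size and function names are illustrative assumptions. The sketch assumes all $\nu_i > 2$, so that the mass beyond the grid is negligible. It uses $c_i = 1$ and the $\nu_i$ implied by $a_i = 2$ (Sec. 6).

```python
import math


def log_t_kernel(x, nu):
    # log of the Student-t density up to a constant: -(nu+1)/2 * log(1 + x^2/nu)
    return -0.5 * (nu + 1.0) * math.log1p(x * x / nu)


def t_product_posterior(y, sigma, c, nu, n_grid=20001, width=10.0):
    """Mean and std of p(mu) proportional to prod_i t_{nu_i}((mu - y_i)/(c_i sigma_i)).

    sigma_i = sqrt(tau_i^2 + s_i^2). Quadrature on a uniform grid that extends
    `width` times the largest scale beyond the extreme results (illustrative choice).
    """
    scales = [ci * si for ci, si in zip(c, sigma)]
    lo = min(y) - width * max(scales)
    hi = max(y) + width * max(scales)
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + j * step for j in range(n_grid)]
    logp = [sum(log_t_kernel((m - yi) / sc, n) for yi, sc, n in zip(y, scales, nu))
            for m in grid]
    top = max(logp)                           # subtract the max to avoid underflow
    dens = [math.exp(v - top) for v in logp]  # unnormalized; grid spacing cancels
    total = sum(dens)
    mean = sum(m * d for m, d in zip(grid, dens)) / total
    var = sum((m - mean) ** 2 * d for m, d in zip(grid, dens)) / total
    return mean, math.sqrt(var)


# Table 1 data (s_i, tau_i converted from 1e-5 units)
Y = [0.12901, 0.12914, 0.12924, 0.12874, 0.12960, 0.12890,
     0.12875, 0.12870, 0.12853, 0.12830, 0.12950, 0.12877]
S = [v * 1e-5 for v in [3.6, 5.5, 8.9, 6.1, 55.9, 3.9, 2.4, 9.6, 9.9, 4.4, 5.5, 11.6]]
TAU = [v * 1e-5 for v in [19, 73, 57, 66, 85, 72, 43, 39, 53, 54, 43, 132]]
SIGMA = [math.hypot(t, s) for t, s in zip(TAU, S)]
NU = 2.0 / math.log(2.0) ** 2  # from log a_i = sqrt(2 / nu_i) with a_i = 2
k = len(Y)

kcrv, u_kcrv = t_product_posterior(Y, SIGMA, [1.0] * k, [NU] * k)
print(f"KCRV = {kcrv:.5f}, u = {u_kcrv:.6f}")

# Two strongly disagreeing labs: bimodal posterior, symmetric about the midpoint
m2, sd2 = t_product_posterior([0.0, 10.0], [1.0, 1.0], [1.0, 1.0], [4.0, 4.0])
```

Comparing the first result with the Supra-Bayes column of Table 2 is a useful sanity check. Small differences are expected, because Table 1 is rounded and quadrature replaces MCMC. The two-laboratory case shows the multimodality mentioned above. The posterior mean falls between the two results, where the density has a local minimum, and the standard deviation is correspondingly large.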
Analysis of data from the CCAUV.V-K1 key comparison in Sec. 6 will illustrate the two approaches.

5.2 Multiple Measurand Experiments

In some Class 1 experiments there are clearly defined multiple measurands. In such a case, Eq. (7) would be used. The question then is again how to estimate the KCRV, since it has no natural interpretation. The Supra-Bayesian solution given in Sec. 5.1 is not applicable here, since there is no common measurand. One possible solution is the following method, based on the so-called linear opinion pool, which dates back to Laplace. In this method, $k$ probability distributions $p_i(\cdot)$ are combined as

$$p(\cdot) = \sum_{i=1}^{k} w_i\, p_i(\cdot), \qquad (17)$$

where the weights $w_i$ add up to one. In the present application, the $k$ laboratories' posterior distributions for the $\mu_i$ could be combined into the mixture distribution of a new random variable $\mu$, namely

$$p(\mu) = \sum_{i=1}^{k} w_i \frac{1}{\sqrt{2\pi(\tau_i^2 + s_i^2)}} \exp\left[-\frac{(\mu - y_i)^2}{2(\tau_i^2 + s_i^2)}\right]. \qquad (18)$$

This uses the Gaussian density for each $p_i$. In most cases, the weights $w_i$ would be taken to be $1/k$, representing a view that the laboratories' data are of equal quality. The mean of this distribution, that is, the average $\bar{y}$ of the $k$ laboratories' measurements, would be taken as the KCRV. The standard deviation of this distribution,

$$u(\bar{y}) = \sqrt{\frac{\sum_i (\tau_i^2 + s_i^2)}{k} + \frac{\sum_i (y_i - \bar{y})^2}{k}}, \qquad (19)$$

would be taken as the standard uncertainty of the KCRV. The linear opinion pool is an easily understood and easily performed method. It is intuitively pleasing because the weights can be thought to represent the relative quality of the laboratories' results. The estimator, with a frequentist interpretation, was used in Refs. [3] and [25].
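A minimal sketch of Eqs. (18) and (19) follows. The mean and standard deviation of the mixture follow from the law of total variance: average within-laboratory variance plus the spread of the $y_i$. No sampling is needed. Equal weights $w_i = 1/k$ are the default, as in the text. The function name `linear_pool` is hypothetical.

```python
import math

# Table 1 data (s_i, tau_i converted from 1e-5 units)
Y = [0.12901, 0.12914, 0.12924, 0.12874, 0.12960, 0.12890,
     0.12875, 0.12870, 0.12853, 0.12830, 0.12950, 0.12877]
S = [v * 1e-5 for v in [3.6, 5.5, 8.9, 6.1, 55.9, 3.9, 2.4, 9.6, 9.9, 4.4, 5.5, 11.6]]
TAU = [v * 1e-5 for v in [19, 73, 57, 66, 85, 72, 43, 39, 53, 54, 43, 132]]


def linear_pool(y, tau, s, w=None):
    """Mean and standard deviation of the Gaussian mixture of Eq. (18)."""
    k = len(y)
    w = [1.0 / k] * k if w is None else w  # weights are assumed to sum to one
    mean = sum(wi * yi for wi, yi in zip(w, y))
    # Law of total variance: weighted within-lab variance + spread of lab means
    within = sum(wi * (ti ** 2 + si ** 2) for wi, ti, si in zip(w, tau, s))
    between = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))
    return mean, math.sqrt(within + between)  # equals Eq. (19) when w_i = 1/k


kcrv, u = linear_pool(Y, TAU, S)
print(f"KCRV = {kcrv:.5f}, u = {u:.6f}")
```

With the rounded Table 1 values this gives about 0.12893 and 0.000785, close to the linear-pool column of Table 2 (0.12893 and 0.000791). Putting all the weight on one laboratory recovers that laboratory's result and its combined standard uncertainty.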
Its main appeal in the Bayesian context is that $u(\bar{y})$ can be thought to represent the total variability in the population of measurands of the Key Comparison. This can be viewed as the true measure of uncertainty in such a Key Comparison, because of the assumed equality of the laboratories in terms of their competence.

6. Example: Analysis of CCAUV.V-K1

The Key Comparison in Vibration Acceleration is an interesting example of a Class 1 Key Comparison with a traveling artifact. The data are given in Sec. 3.1 and in Table 1. Table 2 summarizes the results of the analysis. Using Eq. (9), various results can be obtained depending on the value of $\gamma^2$. In Table 2, the results for $\gamma = 0$ are labeled "common mean model." The results using a uniform prior distribution on $\gamma$ are labeled "lab-effects model." The third column contains the Supra-Bayes estimate, with $c_i = 1.0$ and $a_i = 2.0$, that is, $0.5 < \kappa_i < 2.0$ ($i = 1, \ldots, 12$). This choice of $c_i$ and $a_i$ gives a reasonable range of possible values for the standard uncertainties in this particular experiment. The declared standard uncertainties in Table 1 range from about one third of the average standard uncertainty to about twice the average. The fourth column contains the results based on the linear pool estimator. In this particular Key Comparison, the separate measurands model is possibly too extreme, as there truly was a single measurand. However, the separate measurands model can be thought of as a limiting case of a systematic lab effects model, and as such it can be used to obtain an upper bound on the uncertainty of the KCRV.
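The prior settings of Sec. 6 can be checked numerically. The sketch below relies on the relations stated with Eq. (15). It converts $a_i = 2$ into degrees of freedom through $\log a_i = \sqrt{2/\nu_i}$ and reports the implied moments of $\kappa_i^2$. It then compares the declared uncertainties in Table 1 with their average. Taking the declared standard uncertainty of laboratory $i$ to be $\sqrt{\tau_i^2 + s_i^2}$ is an assumption of this sketch, and the helper names are hypothetical.

```python
import math

# Table 1 uncertainties (converted from 1e-5 units)
S = [v * 1e-5 for v in [3.6, 5.5, 8.9, 6.1, 55.9, 3.9, 2.4, 9.6, 9.9, 4.4, 5.5, 11.6]]
TAU = [v * 1e-5 for v in [19, 73, 57, 66, 85, 72, 43, 39, 53, 54, 43, 132]]


def nu_from_a(a):
    # Invert log a = sqrt(2 / nu)
    return 2.0 / math.log(a) ** 2


def kappa_sq_moments(c, nu):
    # Eq. (15): E(kappa^2) = c^2 nu / (nu - 2) for nu > 2;
    # coefficient of variation of kappa^2 = (nu/2 - 2)^(-1/2), finite only for nu > 4
    mean = c ** 2 * nu / (nu - 2.0)
    cv = (nu / 2.0 - 2.0) ** -0.5 if nu > 4.0 else math.inf
    return mean, cv


a, c = 2.0, 1.0
nu = nu_from_a(a)
mean_k2, cv_k2 = kappa_sq_moments(c, nu)
print(f"nu = {nu:.3f}, kappa range = ({c / a}, {c * a}), "
      f"E(kappa^2) = {mean_k2:.3f}, CV(kappa^2) = {cv_k2:.2f}")

# Assumed declared standard uncertainty per lab: sqrt(tau_i^2 + s_i^2)
sigma = [math.hypot(t, s) for t, s in zip(TAU, S)]
avg = sum(sigma) / len(sigma)
r_min, r_max = min(sigma) / avg, max(sigma) / avg
print(f"smallest/average = {r_min:.2f}, largest/average = {r_max:.2f}")
```

The ratios, roughly 0.31 and 2.1, match the "one third" and "twice the average" quoted above, so $0.5 < \kappa_i < 2$ spans a comparable range. With $\nu_i \approx 4.16$, the coefficient of variation of $\kappa_i^2$ is about 3.5, which makes this prior quite diffuse.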
It is clear from Table 2 that, even though the Bayesian KCRV estimates under the various assumptions are very similar, their uncertainties are not. It is therefore important to examine the assumptions underlying the various analyses and to make sure that they are reasonable. Equation (9), without the restriction on $\gamma$, estimates the underlying variability due to systematic lab effects objectively, from the data. It allows the uncertainty to be as low as that given by Eq. (11) when the data support it. In this case, the data clearly indicate that the uncertainty is larger. The posterior mean of $\gamma$ is 1.9421E-4, with a posterior standard deviation of 1.5838E-4. This gives a 95 % Highest Posterior Density (HPD) interval of (8.2604E-6, 5.9033E-4), that is, the shortest interval of possible values for $\gamma$ which has probability 0.95. For comparison, the individual laboratories' Type B evaluated uncertainties ranged from 1.8974E-4 to 1.3153E-3. Note that these included terms based on random laboratory effects (estimated by $\gamma$) as well as terms based on other factors. On the whole, then, the stated Type B evaluated uncertainties are reasonable, and the lab-effects-model KCRV would make a good choice. The published analysis of CCAUV.V-K1 used Eqs. (10) and (11) for the KCRV and its uncertainty. The analysis (referred to above as the common mean model) was described as "correctly reflecting the declared uncertainties of the individual laboratories." The report discusses, from a frequentist perspective, statistical issues concerning the underlying assumptions of this analysis.
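Given posterior draws of $\gamma$ (for example from an MCMC run), a 95 % HPD interval can be approximated by the shortest window that contains 95 % of the sorted draws. This common empirical approximation assumes a unimodal posterior. The function `hpd_interval` is a hypothetical helper. The deterministic "draws" in the usage lines stand in for real MCMC output.

```python
import math
from statistics import NormalDist


def hpd_interval(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    m = math.ceil(prob * n)  # number of draws the interval must contain
    if m >= n:
        return xs[0], xs[-1]
    # Slide a window of m consecutive order statistics; keep the narrowest one
    j = min(range(n - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    return xs[j], xs[j + m - 1]


n = 10_000
# Stand-in "draws": evenly spaced quantiles of a symmetric and a right-skewed law
normal_draws = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
expo_draws = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
print(hpd_interval(normal_draws))  # symmetric law: close to the central interval
print(hpd_interval(expo_draws))    # skewed law: interval starts at the smallest draw
```

For a right-skewed posterior, the HPD interval is pulled toward the lower tail rather than centered on the mean. The interval reported above for $\gamma$ is likewise asymmetric about its posterior mean.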
Because of concerns about the validity of these assumptions, other frequentist estimates of the KCRV were computed, including the average $\bar{y}$, the median of the $y_i$, and a Maximum Likelihood Estimator (Ref. [23]). The report concluded that the values of these estimators were similar enough to justify the use of the common mean model for the KCRV in this key comparison. The report did not explicitly show the uncertainties associated with the various estimators.

7. Conclusions

Key Comparison experiments, performed in various sub-disciplines of physics and chemistry, pose numerous challenges to the analyst. Most importantly, Type B evaluated uncertainty must be included in the statistical model in a meaningful way, one that satisfies both the scientist and the statistician. Further, the scientific objectives of the experiment must be reflected in the statistical summaries, and the results must be compliant with the Guide to the Expression of Uncertainty in Measurement. This paper shows that the Bayesian paradigm allows flexible modeling of Type B evaluated uncertainty and that it can produce estimates that meet the needs of the scientists.

8. References

[1] Mutual Recognition of national measurement standards and of calibration and measurement certificates issued by national metrology institutes, Bureau International des Poids et Mesures, Paris, 14 October 1999.
[2] CIPM, Guidelines for CIPM key comparisons, BIPM publication (1999).
[3] P. Ciarlini, M. Cox, F. Pavese, and G.
Regoliosi, The Use of a Mixture of Probability Distributions in Temperature Interlaboratory Comparisons, Metrologia 41, 11-121 (2004).
[4] F. Pavese, Compound Modelling of Metrological Data Series, Advanced Mathematical and Computational Tools in Metrology, World Scientific Publishing Company (2004).
[5] ISO Technical Advisory Group, Working Group 3, Guide to the Expression of Uncertainty in Measurement, International Organization for Standardization, Geneva (1993).
[6] M. G. Cox, The Evaluation of Key Comparison Data, Metrologia 39, 589-595 (2002).
[7] C. Elster and A. Link, Analysis of key comparison data: Assessment of current methods for determining a reference value, Measurement Science and Technology 12, 1431-1438 (2000).
[8] C. Sutton, Analysis and linking of international measurement comparisons, Metrologia 41, 272-277 (2004).
[9] D. White, On the analysis of measurement comparisons, Metrologia 41, 122-131 (2004).
[10] N. F. Zhang, H-K. Liu, N. Sedransk, and W. Strawderman, Statistical analysis of key comparisons with linear trends, Metrologia 41, 231-237 (2004).
[11] R. Kacker and A.
Jones, On use of Bayesian statistics to make the Guide to the Expression of Uncertainty in Measurement consistent, Metrologia 40, 235-248 (2003).
[12] C. Wang and H. Iyer, Propagation of uncertainties in measurements using generalized inference, Metrologia 42, 145-153 (2005).
[13] H. Von Martens, C. Elster, A. Link, A. Taubner, W. Wabinski, and H. Will, Report on Key Comparison CCAUV.V-K1, PTB-1.22, Physikalisch-Technische Bundesanstalt, Braunschweig (2001).
[14] ISO, Methods for the calibration of vibration and shock transducers, Part 11, Reference number ISO 16063-11:1999(E), International Organization for Standardization, Geneva (1999).
[15] B. Toman, Linear statistical models in the presence of systematic effects requiring a Type B evaluation of uncertainty, Metrologia 43, 27-33 (2006).
[16] D. V. Lindley and A. F. M. Smith, Bayes estimates for the linear model, J. Roy. Statist. Soc. B 34, 1-41 (1972).
[17] R. Kacker, R. Datla, and A. Parr, Statistical analysis of CIPM key comparisons based on the ISO Guide, Metrologia 41, 340-352 (2004).
[18] C. Genest and J. V. Zidek, Combining Probability Distributions: A Critique and Annotated Bibliography, Statistical Sci. 1, 114-148 (1986).
[19] R. T. Clemen and R. L. Winkler
[20] D. V. Lindley, Reconciliation of Probability Distributions, Operations Res. 31, 866-879 (1983).
[21] H. Jeffreys, Theory of Probability, third Ed., Oxford University Press, New York (1963).
[22] G. Box and G. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley (1973).
[23] M. Vangel and A. Rukhin, Estimation of a Common Mean and Weighted Means Statistics, J. Amer. Statistical Assoc. 93, 303-309 (1998).
[24] G. D'Agostini, Skeptical Combination of Experimental Results: General Considerations and Application to $\epsilon'/\epsilon$, CERN-EP/99-139 (1999).
[25] D. L. Duewer, A Robust Approach for the Determination of CCQM Key Comparison Reference Values and Uncertainties, Working Document CCQM 04-15, BIPM (2004).

About the author: Blaza Toman is a mathematical statistician in the Statistical Engineering Division of the Information Technology Laboratory. The National Institute of Standards and Technology is an agency of the Technology Administration, U.S. Department of Commerce.

Blaza Toman
National Institute of Standards and Technology, Gaithersburg, MD 20899-8980
toman@nist.gov

Accepted: November 14, 2005.
Available online: http://www.nist.gov/jres
Table 1. Charge sensitivity and the associated uncertainties

Laboratory   $y_i$              $s_i$                        $\tau_i$
number       in pC/(m/s$^2$)    in $10^{-5}$ pC/(m/s$^2$)    in $10^{-5}$ pC/(m/s$^2$)
1            0.12901             3.6                          19
2            0.12914             5.5                          73
3            0.12924             8.9                          57
4            0.12874             6.1                          66
5            0.12960            55.9                          85
6            0.12890             3.9                          72
7            0.12875             2.4                          43
8            0.12870             9.6                          39
9            0.12853             9.9                          53
10           0.12830             4.4                          54
11           0.12950             5.5                          43
12           0.12877            11.6                         132
Table 2. Comparison of KCRV estimates (all values in pC/(m/s$^2$))

              Common mean model   Lab-effects model   Supra-Bayes   Linear pool
KCRV          0.12892             0.12894             0.12894       0.12893
Uncertainty   0.000128            0.000167            0.000253      0.000791