Comments on "common statistical errors and mistakes: valuation and reliability".
I have read Mr. Matthew Trimble's comments on this article and find his positions to be valid and theoretically sound. It would be counterproductive for me to plow the same ground. Instead, I focus my attention on Mr. Trimble's closing cautionary admonition.
I have devoted a good portion of the past several years to development of tools and learning opportunities for appraisers directed toward improvement of competency in the use of descriptive and inferential statistical methods in their valuation practices. Interactions with students and appraisers over the years continue to reveal a large number of difficult-to-dispel misconceptions regarding application of statistical methods. These run the gamut from simple misuse of statistical terminology to more complex ideas, such as the roles played by randomness, the normal distribution, and statements of confidence and/or margin of error. The article by Mr. Dell is rife with many of these misunderstandings and thereby perpetuates them.
For example, the article begins with a misstatement of the statistical meaning of reliability. In its first paragraph, it indicates the appraiser's goal is probabilistic reliability, saying to be reliable "the estimate of value must be both accurate and precise." While this statement sounds appealing and laudable, it is essentially incorrect. The meanings of words matter, and if one is going to write about reliability, there is an expectation that the writer knows the meaning of the term reliable in the context of the discipline being written about. Accuracy and precision have nothing to do with reliability. Reliability is an admirable goal, but it is entirely possible for a measure to be simultaneously reliable and inaccurate. An action or measurement is reliable if it is consistent and stable--the clearest example deals with target shooting. A rifle can be reliable if it misses the target by the same amount each time it is fired (consistent) and continues to miss by the same amount if it is stored away and fired again at another time (stable). The ultimate goal of appraisal is validity, which is accurately measuring what one intends to measure, and reliability, which is evidenced by consistent and stable validity. In the appraisal context, validity relies both on the quality of transaction data and how the data are analyzed. Contrary to Mr. Dell's expressed opinion, simple descriptive measures of all of the recent comparable transactions are usually insufficient for appraisal validity.
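The rifle example can be put in numbers. The following is only a minimal sketch under stated assumptions: the two rifles, their miss distances, and the helper names `shoot` and `summarize` are hypothetical and are not taken from the article or from any appraisal data. It shows that consistency (small spread of misses) and accuracy (average miss near zero) are separate properties.

```python
import random
import statistics


def shoot(bias, spread, n, seed):
    """Simulate n horizontal miss distances (hypothetical units) for one rifle."""
    rng = random.Random(seed)
    return [rng.gauss(bias, spread) for _ in range(n)]


def summarize(shots):
    """Return (average miss, standard deviation of misses)."""
    return statistics.fmean(shots), statistics.stdev(shots)


# Rifle A: misses by nearly the same amount every time (consistent, off target).
rifle_a = shoot(bias=5.0, spread=0.2, n=200, seed=1)
# Rifle B: centered on the target on average, but widely scattered.
rifle_b = shoot(bias=0.0, spread=3.0, n=200, seed=2)

bias_a, spread_a = summarize(rifle_a)
bias_b, spread_b = summarize(rifle_b)

print(f"Rifle A: average miss {bias_a:.2f}, spread {spread_a:.2f}")
print(f"Rifle B: average miss {bias_b:.2f}, spread {spread_b:.2f}")
```

Under these assumed settings, Rifle A shows the small spread that the statistical sense of reliability refers to, while only Rifle B is centered on the target.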
The article presents a confused view of populations, samples, and inference. The author takes the position that a data set representing a census of a market area's sold properties over a given time frame constitutes a population. While such a census may be viewed as a population, one must ask what population is it and how does it relate to development of an opinion of value. In my view, it could be viewed as a population of property sales, and as the citation in the article from Dr. Epley (p. 334) says, the sales can be mathematically treated as a population to develop parameters, such as mean or median price or price per unit of measure, standard deviations of prices or property characteristics, ranges, quartiles, and the like. While these sorts of sale-census parameters can be employed to describe the sales data, the parameters alone cannot and should not be used to infer market value in an appraisal assignment. Valid inferences are required to make a connection between a sold property census and development of an opinion of market value for a property that may or may not be an element of the census. When inferring statistically, it is necessary and appropriate to mathematically treat the sold property "population" as a "sample."
The article says, "Because complete data sets are available, it is not necessary to use samples (p. 334)." In actuality, it is inappropriate to view a "complete data set" as anything other than a sample when developing an opinion of market value for a subject property by use of statistical methods. Mr. Trimble in his comments addresses this when he discusses the concept of value being the expected value of "all possible [market] prices that could be paid for the appraised property." A simple principle underlying value theory deals with the difference between price and value. Data sets consist of observed prices that are analyzed to gain an understanding of value, which cannot be observed. Therefore, the only alternative is to infer value from price data. If an analyst opts to rely on statistical methods for inference, then an understanding of the inferential tool or tools employed to accomplish this is paramount. In contrast, Mr. Dell says,
There is no need for statistical sampling and all the tests and approximations. No confidence intervals. No standard errors. No hypothesis tests. No Type-1 errors. No Chi-squared. No p-values. Nothing to do with the probability sample required for inferential statistics (p. 338).
Nothing could be further from the truth. His view implies deterministic relationships, allowing for no uncertainty whatsoever in reaching a value opinion derived from price observations. Furthermore, the final sentence in this excerpt perpetuates the myth that inferential statistics require probability sampling. Valid opinions require representative samples (our principle of substitution derives from this line of reasoning), and random sampling is one, but not the only, way of collecting a representative sample.
Mr. Dell implies that data sets must be normal (p. 340). This is untrue for applications of linear regression modeling. What is true is that regression errors are assumed to be normal. In a practical sense, this means that some of the market prices in the data set exceed market value, some are beneath market value, and these differences are offsetting (unbiased). Also, it means that more observations are close to the central tendency than are far from it. This results in a mean price indication that is centered at or near the most probable price, with regression model estimation errors clustered around the mean. The spread of such regression errors can be relatively narrow or wide, leading to an understanding of the precision of a model's market value prediction. (I can't imagine why an analyst or client would not value an understanding of the relative precision of an inference.) Importantly, if the data are representative, unbiased, and sufficient in number, then a linear regression model will generate valid predictions of central tendency (in appraisal, these are usually mean price predictions--defined as market value estimates) and valid confidence intervals indicative of the precision of the prediction (e.g., margins of error).
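The distinction between non-normal data and normal regression errors can be illustrated with simulated figures. This is a minimal sketch, not a valuation model: the living areas, prices, coefficients, and error spread below are hypothetical assumptions, and the interval uses a large-sample normal approximation rather than the exact t distribution.

```python
import random
import statistics

rng = random.Random(42)
n = 500
TRUE_INTERCEPT = 50_000.0  # hypothetical base price
TRUE_SLOPE = 100.0         # hypothetical price per square foot

# Predictor that is clearly not normal: right-skewed living area (square feet).
area = [1000.0 + rng.expovariate(1 / 800.0) for _ in range(n)]
# Prices scatter around the regression line with normal, zero-mean errors.
price = [TRUE_INTERCEPT + TRUE_SLOPE * a + rng.gauss(0.0, 20_000.0) for a in area]

# Ordinary least squares for one predictor, computed from first principles.
x_bar = statistics.fmean(area)
y_bar = statistics.fmean(price)
sxx = sum((a - x_bar) ** 2 for a in area)
sxy = sum((a - x_bar) * (p - y_bar) for a, p in zip(area, price))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residuals and their standard error (n - 2 degrees of freedom).
residuals = [p - (intercept + slope * a) for a, p in zip(area, price)]
resid_se = (sum(r ** 2 for r in residuals) / (n - 2)) ** 0.5

# Approximate 95% confidence interval for the mean price at 2,000 square feet.
x0 = 2000.0
fit = intercept + slope * x0
se_fit = resid_se * (1 / n + (x0 - x_bar) ** 2 / sxx) ** 0.5
z = statistics.NormalDist().inv_cdf(0.975)
lower, upper = fit - z * se_fit, fit + z * se_fit

print(f"mean area {x_bar:.0f} vs. median area {statistics.median(area):.0f}")
print(f"estimated slope {slope:.2f}, residual standard error {resid_se:.0f}")
print(f"mean price at {x0:.0f} sq ft: {fit:.0f} ({lower:.0f} to {upper:.0f})")
```

In this sketch the predictor is strongly skewed, yet the estimated slope still lands near the assumed value, and the width of the interval reflects the spread of the regression errors, not the shape of the data.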
While it may be that inferential statistical methods are not appropriate for everyone, the choice to employ them or not is a personal one. I am the first to say that reliance solely on our traditional methods can produce highly valid professional results. However, if a professional elects to employ an additional analytical tool he or she is obligated to become proficient in its use: intuitive, challenging, and difficult or not.
Marvin L. Wolverton, PhD, MAI (Ret.)
I am pleased to have Dr. Wolverton comment on my article, as he has had so much to do with the current advanced analytical theory as taught by the Appraisal Institute. While I appreciate Dr. Wolverton crediting me with introducing a new valuation modeling paradigm, my article rests on established appraisal theory, while emphasizing the potential of today's data science tools. I agree, however, that my thesis--that inferential models using sample statistics are not necessary for appraisal work--requires thorough discussion and examination.
Dr. Wolverton states Mr. Trimble's positions are valid and theoretically sound. Their common fundamental belief appears to be that statistical inference is the correct data science tool necessary for appraisal work. My position is that the force-fitting of this solution onto appraisal work and appraisal education is problematic. It neglects numerous other data analysis tools now available. The basis of their belief in the inferential assumption seems to be that when you have a data set (comparables):
* It must be a sample.
* The sample can provide a mathematically sound inference.
* It may comprise some or all of the actual competitive market sales.
* The sample came from some kind of superpopulation.
* You can apply probability-based tests, such as Chi-squared, confidence intervals, and hypothesis tests.
Applying inferential statistics to appraisal, however, also requires the following beliefs:
1. You do not need sample randomness. A judgment sample is just the same.
2. The population is some kind of imaginary or hypothetical larger market data set.
3. The pretended scientific sampling mechanism has mysteriously already taken place.
Semantic issues are present in the discussions in both letters, with inattention to, and equivocation of, the critical term to infer and its different connotations. An inference in logical or scientific contexts is a conclusion based on evidence and reasoning. An inference in a statistical context is a specific way of characterizing a population from a random sample (random selection or random assignment). In the discussion, there is no clear concept of what population is analyzed. Dr. Wolverton seems to accept that the competitive market segment may be relevant but also accepts the soundness of Mr. Trimble's alternative characterizations of populations, which state "the statistical population is never obtainable when the parameter being estimated is the market value," and "a set of comparable sales is assumed to already be an unbiased sample, representative of the population to be modeled." These positions are incongruent and provide a perplexing basis for understanding the underlying beliefs.
I agree that there is much misuse of terminology on this topic. My article attempts to reduce such misunderstanding using plain language. Dr. Wolverton states that the writer is expected to know the meaning of the term reliable in the context of the discipline one is writing about. I agree. The discipline here is appraisal in The Appraisal Journal, not in the Inferential Statistics Journal. I use the word reliability in its common dictionary definition, as it is used in the Uniform Standards of Professional Appraisal Practice and in The Appraisal of Real Estate, where reliability means the ability to be relied on or depended on; that is, being accurate or providing a correct result.
In An Introduction to Statistics for Appraisers, Dr. Wolverton states, "A reliable model would produce results that can be thought of as consistent, dependable, and predictable," (2) but also that accuracy and precision have nothing to do with reliability. This creates a semantic problem. Clients and reviewers depend on accuracy (trueness) and precision (sureness). (Note that Figure 1 in my article is titled "Validity of an Estimate of Value.") The confusion is that there is a distinction (in data science) between the reliability of data/measurement and the reliability of an analysis/conclusion. Dr. Wolverton is correct in that (for example) a shrunken tape measure (or damaged rifle sight) may be consistent in overestimation, yet still reliable. Contrarily, an appraiser who is consistently too high is not considered reliable in the profession. The target analogy is interesting. Yes, the rifle is reliable, even if it shoots consistently too high to the right. The problem is we do not care about this reliable rifle. We care about where the bullet goes. It is this redefinition of reliable that causes credibility problems for appraisers and the profession as a whole. It creates issues in the courtroom--where, for example, the statistician/appraiser explains to the jury that his statistical inference is "highly reliable," but misses the target entirely. Dr. Wolverton states that simple descriptive measures of recent comparable transactions are usually insufficient for appraisal validity. I agree. But they are a core tool for market characterization, classification, and analysis.
Contrary to Dr. Wolverton's comment, the article does present a clear view of population, samples, and inferences. Population and sample are not hard to define. A population is the data set providing information. For appraisers, this consists of the sales in the competitive market segment. A sample is any subset of the population. Inference, however, is alternately used in two ways by Dr. Wolverton: (1) as a logical/analytical inference, and (2) as statistical inference. This distinction is important. For example, he states that a census (of a market area's sold properties) may be viewed as a population. But if you want to apply inferential statistics then you must treat that population as if it were a sample. (Presumably, a sample from some imaginary superpopulation.)
I agree that the descriptive parameters of a market segment alone are not sufficient to infer value (in the logical/analytical inference sense of the term). I also agree that when inferring statistically, it is necessary and appropriate to mathematically treat the sold property population as a sample. (3) Then the sample is the population, when we own the complete data set. Thus, there is no uncertainty due to sampling. The only uncertainty left is that of the original data measurement itself (see Figure 5, Error Sources).
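The statement that sampling uncertainty disappears when the sample is the whole population can be shown with the standard finite population correction. This is a minimal sketch under stated assumptions: the function name `se_of_mean_srs`, the price standard deviation, and the number of sales are hypothetical. It covers only design-based sampling error under simple random sampling without replacement, and it does not address whether observed prices should themselves be regarded as draws from a wider set of possible prices.

```python
import math


def se_of_mean_srs(s, n, N):
    """Standard error of a sample mean under simple random sampling
    without replacement: (s / sqrt(n)) * sqrt(1 - n / N)."""
    if not 0 < n <= N:
        raise ValueError("sample size n must satisfy 0 < n <= N")
    return (s / math.sqrt(n)) * math.sqrt(1.0 - n / N)


S = 40_000.0  # hypothetical standard deviation of sale prices
N = 120       # hypothetical number of sales in the competitive market segment

# The standard error shrinks as the sample approaches the full set of sales.
for n in (10, 30, 60, 120):
    print(f"n = {n:3d}: standard error of mean price = {se_of_mean_srs(S, n, N):,.2f}")
```

When n equals N, the correction factor is zero, which is the sense in which a complete data set carries no sampling error under this assumption.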
As Dr. Wolverton suggests, it is acceptable to view a population as a sample. Just remember, there is no remaining sampling error. It's all gone. Poof! He also states that it is a myth that inferential statistics require probability sampling. If so, this "myth" appears to be widespread. The following are representative of passages from among numerous statistics and econometrics texts on this issue.
The data sample, obtained from the population, should be randomly drawn ... Only by studying randomly drawn samples can one expect to arrive at legitimate conclusions, about the whole population, from the data analyses. (4)
Inferential statistics are based on taking a random sample from a larger population. (5)
Fundamentally, inferential statistics involves estimating a population parameter using sample data or reaching a conclusion concerning one or more populations based on sample data.... The measure of accuracy states the degree of uncertainty associated with the inference. ... Uncertainty cannot, however, be quantified when the sample is a nonprobability sample. (6)
Samples can be broadly divided into two categories--probability samples and nonprobability samples. Statistical inferences formed through the analysis of probability samples are preferred because inferences drawn from nonprobability samples may be unreliable and inaccurate. (7)
Nonprobability Samples: Information obtained from the sample data may not be applicable to the larger population because there is no guarantee that the sample data are representative of the population. (8)
The US General Accounting Office also comments on this issue:
Inferential Statistic: A statistic used to describe a population using information from observations on only a probability sample of cases from the population. (9)
A group of cases can also be treated as a batch, a group produced by a process about which we make no probabilistic assumptions. For example, the evaluators might use their judgment, not probability, to select ... cases for study.... As such, the techniques of descriptive statistics can be applied but not those of inferential statistics. Thus, conclusions about the population of which the batch is a part cannot be based on statistical rules of inference. (10)
Results from statistical samples are objective and defensible.... However, judgment samples cannot be defended by mathematical theory, not because the conclusions that are reached are wrong but because there is no way of objectively determining if they are right or wrong. (11)
Finally, the judicial system clearly is moving away from the "trust me" approach to scientifically based evidence:
It is randomness in the technical sense that provides assurance of unbiased estimates from a randomized controlled experiment or a probability sample. Randomness in the technical sense also justifies calculations of standard errors, confidence intervals, and p-values. (12)
If the data are collected on the basis of a probability sample or a randomized experiment, there will be statistical models that suit the occasion, and inferences based on these models will be secure. Otherwise, calculations are generally based on analogy. (13)
It appears that traditional comparable selection would probably be considered as analogy based--purposeful sampling constructed on the appraiser's experience and education. Purposeful sampling has its place, but not within inferential statistical theory. Today, we do not need to rely on analogous analysis. We can concentrate on complete-data selection. To this end, we must abandon the inferential assumption and concentrate on developing a more rigorous scientific definition of what is a competitive market, i.e., what is a directly or indirectly comparable sale. When appraisers are able to do this--rigorously define markets--the profession can take back a huge opportunity, and once again better serve the public good.
I cannot agree with Dr. Wolverton that statistical inference is a personal choice. It is a professional matter related to the scope/problem to be solved, professional competency, and best practices. The appraisal problem is simple but twofold: (1) identify the market, and (2) position the subject in that market. It simply makes no sense to force the statistical inferential assumption on a problem for which it is not well suited.
George Dell, MAI, SRA
San Diego, California
(2.) Marvin L. Wolverton, An Introduction to Statistics for Appraisers (Chicago: Appraisal Institute, 2009), 145.
(3.) A sample can be as large as the appraiser wants and can/should include the entire population, but be no larger. This eliminates statistical sampling variability and the need for statistical inference. This is the strongest (but not the only) argument to use all the relevant market competitive sold properties.
(4.) Joaquim P. Marques de Sá, Applied Statistics Using SPSS, STATISTICA, MATLAB and R, 2nd ed. (Berlin: Springer, 2007), 5-6.
(5.) Clint Ballinger, "Why Inferential Statistics Are Inappropriate for Development Studies and How the Same Data Can Be Better Used" (February 2011), 2; http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1775002.
(6.) Appraisal Institute, The Appraisal of Real Estate, 14th ed. (Chicago: Appraisal Institute, 2013), 279.
(7.) Wolverton, An Introduction to Statistics, 147.
(8.) Ibid., 152.
(9.) Eleanor Chelimsky, ed., "Glossary" in Quantitative Data Analysis: An Introduction (Washington, DC: US General Accounting Office, May 1992), 123; http://www.gao.gov/special.pubs/pe10111.pdf. I recommend this free online government guide, as it provides some of the clearest explanations available, and in a manner accessible to a layperson with minimal data science background.
(10.) Ibid., 27.
(11.) Eleanor Chelimsky, ed., Using Statistical Sampling (Gaithersburg, MD: US General Accounting Office, May 1992), 118; http://www.gao.gov/assets/80/76112.pdf.
(12.) Federal Judicial Center and National Research Council, "Reference Guide on Statistics" in Reference Manual on Scientific Evidence, 3rd ed. (Washington, DC: National Academies of Sciences, 2011), 230; http://www.fjc.gov/public/pdf.nsf/lookup/SciMan3D07.pdf/$file/SciMan3D07.pdf.
(13.) Ibid., 241.
|Author:||Wolverton, Marvin L.|
|Article Type:||Letter to the editor|
|Date:||Mar 22, 2014|