Using Multiple Regression Analysis in Real Estate Appraisal.
This paper illustrates, using a representative appraisal report, the pitfalls or mistakes that real estate appraisers can make when using multiple regression analysis (MRA) to estimate the value of real estate. The paper also presents techniques appraisers can use to avoid these pitfalls. Most of the pitfalls can be avoided by using a sufficient number of comparable sales. The study concludes with recommendations to appraisers regarding the application of MRA in the appraisal of real estate.
This study addresses the pitfalls of using multiple regression analysis (MRA) in real estate appraisal. It also addresses techniques for avoiding these pitfalls. The study begins with a brief overview of the use of MRA in the appraisal of real estate. Then the study presents a representative application of MRA to the appraisal of an industrial property. Next, the study lists and discusses the pitfalls of using MRA in real estate appraisal with illustrations of these pitfalls taken from the representative appraisal report. The study continues with a discussion of the techniques for avoiding these mistakes and problems. Finally, the study concludes with recommendations to appraisers regarding the application of MRA in the appraisal of real estate.
Overview of the Application of MRA in the Appraisal of Real Estate
The idea that one can estimate the value of a highly heterogeneous product, such as real estate, by using statistical techniques to compare its value to its characteristics has been around for a long time. Court is generally given credit for the development of the notion that the price or value of a heterogeneous good (automobiles in Court's 1939 article) can be modeled as a function of the good's characteristics (make, model, year, and accessories).  Court calls his modeling of automobile prices a "hedonic" price index because his model is not strictly derived from rigorous theory. Although Webster's Ninth New Collegiate Dictionary defines hedonic as "of, relating to, or characterized by pleasure,"  the term hedonic is now used by urban economists and real estate analysts to describe any regression of the price of a parcel of real estate upon the characteristics of the real estate. The technique is called "hedonic" because the relationship between the price and the characteristics of the real estate is whatever fits the data. Thus, the technique is driven by data, not theory.
Griliches popularized the technique by using it to develop an index of quality change in automobiles. Shenkel is one of the earliest appraisers to demonstrate that the technique can also be used to predict real estate values. In 1978 the American Institute of Real Estate Appraisers gave multiple regression analysis even more visibility by including the technique in an appendix of the seventh edition of The Appraisal of Real Estate. The current (12th) edition of The Appraisal of Real Estate includes an example of how to use multiple regression analysis to estimate the value of real estate. More recently, Ramsland and Markham demonstrated and discussed how to apply MRA to the appraisal of an industrial property. Also, MRA has become popular among tax assessors as a mass appraisal technique.
MRA has become even easier to use as more and more personal computer software becomes available that can perform the necessary calculations quickly and accurately. Today, both of the most popular spreadsheet packages (Lotus and Excel) contain built-in multiple regression functions. Access to these built-in multiple regression functions gives appraisers a powerful tool to use in the appraisal process. Unfortunately, powerful tools can produce powerful mistakes just as quickly as they can produce powerful results. As shown below, the complexities and data requirements of MRA make mistakes such as erroneous applications, erroneous conclusions, and unsupported interpretations of the results all too easy for the unsuspecting appraiser to commit.
A Representative Application of MRA in Real Estate Appraisal
The appraisal of a manufacturing plant near a small, Midwestern city is selected as representative of the application of MRA in real estate appraisal. This particular appraisal is selected for three reasons:
1. The appraisal is performed by a well-trained appraiser.
2. The appraiser follows and cites Ramsland and Markham  in the report.
3. The final estimate of value is based entirely upon MRA.
The subject property is a manufacturing plant site that consists of 24 buildings containing 2.2 million square feet of floor space on 178 acres of land. The site is connected by a rail siding to a major rail line, and it has several accesses to a nearby state highway. The site is zoned heavy industrial, and it is assessed for 1997 property tax purposes at $16.58 million.
The subject property is being appraised as part of the owner's efforts to reduce the assessed value of the property. The appraiser is state-certified, holds the MAI, SREA, and CRE designations, has a bachelor's degree from a major state university, and has considerable experience appraising industrial property. The appraiser applies the cost approach and market  approach (using MRA) but not the income capitalization approach to the subject property. The cost approach estimate is given no weight in the final estimate of value. Thus, the final estimate of value is based completely on the MRA performed by the appraiser.
The results of the appraiser's MRA are shown in Table 1. Table 1 is an exact replica of the page in the subject appraisal report that contains the results of the appraiser's MRA. The format of Table 1 (and the exhibit in the appraiser's report) is nearly identical to the format Ramsland and Markham use in their article on applying multiple regression analysis.
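As a rough check on how the pieces of Table 1 fit together, the fitted equation can be applied to the subject property's characteristics. The sketch below is an illustration, not the appraiser's worksheet. It assumes that the X coefficients in Table 1 are listed in the same order as the independent-variable columns (date variance, effective age, location, land-to-building ratio, log of building size), and the names `CONSTANT`, `COEFFICIENTS`, `SUBJECT`, and `predict_price_per_sq_ft` are hypothetical.

```python
# Values transcribed from Table 1. The coefficient-to-variable mapping is an
# assumption based on the column order of the independent variables.
CONSTANT = 29.704
COEFFICIENTS = {
    "date_variance": -0.03,
    "effective_age": -0.285,
    "location": 0.37122,
    "land_bldg_ratio": 0.1245317,
    "log_bldg_size": -2.716181,
}
SUBJECT = {
    "date_variance": 47,
    "effective_age": 25,
    "location": 1.0,
    "land_bldg_ratio": 3.5469215,
    "log_bldg_size": 6.3394854,
}
SUBJECT_SIZE_SQ_FT = 2_185_171


def predict_price_per_sq_ft(constant, coefficients, characteristics):
    """Return constant + sum(coefficient * characteristic)."""
    return constant + sum(
        coefficients[name] * characteristics[name] for name in coefficients
    )


price_per_sq_ft = predict_price_per_sq_ft(CONSTANT, COEFFICIENTS, SUBJECT)
implied_value = price_per_sq_ft * SUBJECT_SIZE_SQ_FT
print(f"Predicted price per sq. ft.: ${price_per_sq_ft:.2f}")
print(f"Implied value: ${implied_value:,.0f}")
```

Under these assumptions the prediction lands within a few cents of the roughly $4.75 per square foot implied by the subject property's row in Table 1 ($10,379,562 divided by 2,185,171 square feet). The small gap is consistent with rounding in the printed coefficients.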
In deposition, the appraiser reports that Lotus software was used to perform the calculations. In an effort to support and bolster the estimate of value, the appraiser reports that the R² of the regression model (0.8503) "means that 85% of the differences between the independent variables and dependent variable can be explained by the data." The appraiser also states that the standard error of the regression model (1.4078) indicates that "a range in per square foot values of $1.41 per square foot above and below the predicted value of the subject can be supported." The final estimate of value based upon the regression model is $4.45 per square foot or about $9.735 million (41.3% below its assessed value) as of January 1, 1997.
In any application of MRA to real estate, the major pitfalls can be found in two areas:
* The model specification
* The robustness (sensitivity to changes in underlying assumptions, data, and procedures) of the results of the regression
Although the terms model specification and robustness are well known in statistics, appraisers in general are not familiar with them. As a result, further discussion of these terms is warranted.
Model specification is concerned primarily with:
1. The choice of dependent and independent variables
2. The functional form of the relationship between these variables
3. The statistical significance of the independent variables
By examining these three aspects of an MRA, a judgment can be made regarding the trustworthiness of the model specification.
Robustness of the results of the regression analysis is concerned primarily with:
4. The multivariate normality of the data 
5. The sensitivity of the results to variations in the individual sales used in the analysis
6. Measurement errors in the data
Examination of these three aspects of an MRA gives insight into the robustness of the results.
Examination of these six critical aspects sheds light upon the extent that the results of any MRA support the resultant estimate of value. Appraisers who ignore these six critical aspects of MRA risk falling into the pitfalls found therein. Each of these six critical aspects is examined further below with illustrations of the pitfalls taken from the representative appraisal report.
Analysis of the specification of any multiple regression model focuses upon three primary issues:
1. The choice of dependent and independent variables
2. The functional form of the relationship between these variables
3. The statistical significance of the independent variables
The functional relationship between the dependent and independent variables in a well-specified model will be:
* Supported by theory to the fullest extent possible
* Free of built-in, spurious relationships
Moreover, the independent variables in a well-specified model will be statistically significant at some acceptable level of confidence. (Statistical inference and confidence levels are discussed further below.)
Pitfall Number One: The Choice of Dependent and Independent Variables
Most appraisers are very good at selecting relevant and important dependent and independent variables to include in an MRA model. Appraisal textbooks and literature are full of rich examples of what units of comparison to use in appraising almost any type of property. This same literature is also rich in examples of what property characteristics to include in the appraiser's analysis of comparable sales. Thus, appraisers usually do not fall into this particular pitfall. Indeed, in the representative appraisal report, the appraiser selects a defensible dependent variable and a defensible set of property characteristics for inclusion in the MRA model as independent variables.
Pitfall Number Two: The Functional Form of the Model
In the representative appraisal report, the appraiser's MRA model actually includes location, date of sale, effective age, land-to-building area ratio, and the log of building area as the independent variables with price per square foot of building area as the dependent variable. Upon first glance, the model appears to contain a logical set of independent variables. Certainly, factors such as location, date of sale, building size, and effective age have an influence on value. The problem with the specification of this model is that it uses building area three times:
1. As the denominator of the dependent variable (price per square foot of building area)
2. In the denominator of one of the independent variables (land-to-building area ratio)
3. In log form (log of building area) as another independent variable
This multiple use of building area has the potential of distorting the true relationship between value and building area as well as introducing a spurious relationship between value and building area in this model.
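The built-in relationship can be illustrated with a small simulation. The sketch below uses hypothetical data, not the appraiser's sales: prices are drawn independently of building area, so any correlation between price per square foot and the log of building area arises only because area appears on both sides of the comparison. The function name `pearson`, the seed, and the ranges chosen are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)  # fixed seed so the illustration is repeatable
n = 5000

# Hypothetical sales: price is drawn independently of building area.
areas = [10 ** random.uniform(5.3, 6.6) for _ in range(n)]          # sq. ft.
prices = [random.uniform(2_000_000, 20_000_000) for _ in range(n)]  # dollars

log_area = [math.log10(a) for a in areas]
price_per_sq_ft = [p / a for p, a in zip(prices, areas)]

corr_price = pearson(prices, log_area)
corr_ratio = pearson(price_per_sq_ft, log_area)
print(f"corr(price, log area)             = {corr_price:+.2f}")
print(f"corr(price per sq. ft., log area) = {corr_ratio:+.2f}")
```

In this simulation total price is essentially uncorrelated with the log of area, while price per square foot is strongly negatively correlated with it. Real markets may also show a genuine size effect, so the sketch shows only that part of any such relationship can be an artifact of the specification.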
Pitfall Number Three: Statistical Significance
The quintessential tests of model specification are the t- and f-tests of the statistical significance of the parameter estimates of the model. A 95% confidence level (maximum of a 5% error) is used in this study to distinguish between significant and insignificant results. That is, a parameter estimate is treated as significant only if the hypothesis that the coefficient is equal to zero can be rejected with at least 95% confidence (no more than a 5% chance of incorrectly rejecting the hypothesis that the coefficient is zero).
In Table 1, all of the parameter estimates (called X coefficients in Table 1) in the regression model as well as the f-value of the regression are statistically insignificant at the 95% confidence level. The fact that the X coefficients or parameter estimates are statistically insignificant means that the set of independent variables used by the appraiser contributes very little to the final estimate of value.
In addition, the appraiser uses an adjustment factor for date of sale (-6.3% per year) that cannot be supported by the regression model because the X coefficient for date variance (date of sale) is statistically insignificant (again, with 95% confidence). Indeed, the regression model implies that no adjustment be made for date of sale or time because the X coefficient for date of sale is statistically insignificant (no different than zero).
In short, the X coefficients in the appraiser's MRA model are suspect. Moreover, Table 1 reveals that the standard errors are so large and t- and f-statistics so low that anyone well-versed in MRA would not rely on these results for a supportable estimate of value. The appraiser has fallen deep into the model specification pit because the multiple regression results suffer from poor model specification.
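The significance screening described above can be sketched in a few lines. The t-statistics and f-statistic below are taken from Table 1, with the assumption that they are listed in the same order as the independent-variable columns. The critical values are the usual 5% table values for 4 degrees of freedom (two-tailed t) and for 5 and 4 degrees of freedom (F), hard-coded here to avoid any library dependency. The names used are hypothetical.

```python
# Critical values at the 5% level from standard statistical tables.
T_CRITICAL_4_DF = 2.776    # two-tailed t, 4 degrees of freedom
F_CRITICAL_5_4_DF = 6.26   # F, 5 (numerator) and 4 (denominator) degrees of freedom

# t-statistics from Table 1; the variable labels assume column order.
T_STATS = {
    "date_variance": -0.68,
    "effective_age": -1.574,
    "location": 0.37606,
    "land_bldg_ratio": 0.5987864,
    "log_bldg_size": -1.176814,
}
F_STAT = 4.54


def is_significant(statistic, critical_value):
    """A statistic is significant when it exceeds the critical value in size."""
    return abs(statistic) > critical_value


significant = {
    name: is_significant(t_stat, T_CRITICAL_4_DF)
    for name, t_stat in T_STATS.items()
}
model_significant = is_significant(F_STAT, F_CRITICAL_5_4_DF)

for name, flag in significant.items():
    print(f"{name:16s} t = {T_STATS[name]:+.3f}  significant: {flag}")
print(f"regression       f = {F_STAT:.2f}  significant: {model_significant}")
```

With only 4 degrees of freedom the hurdle is high: a coefficient must be nearly 2.8 standard errors away from zero before it can be called significant at this level.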
However, it is somewhat understandable why the appraiser succumbed to this particular pitfall. It is apparent that the appraiser followed the example MRA found in Ramsland and Markham. Unfortunately, Ramsland and Markham fall into this same pit, leading other appraisers to do likewise.
Robustness of the Results of the Regression
Another extremely important aspect of regression analysis is to examine the robustness of the results of the regression. The use of MRA to predict value is an application of statistical inference. Statistical inference is a type of reasoning that proceeds from the particular to the general or, in other words, from a few cases to the universal situation. That is, based upon the relationship between a dependent variable (such as value or price) and a set of independent variables (such as property characteristics), the appraiser infers an estimate of value for a subject property (the general situation) from the selling prices of a set of comparable properties (the particular situations). This inference works best when the results are insensitive to departures from the assumptions used in obtaining the results. The fundamental assumption underlying all MRA is that the data used to fit the regression model form a multivariate normal distribution. (See footnote 9 for further discussion of multivariate normal distributions.) When the results of an MRA are insensitive to slight departures from the assumption of multivariate normality, the results are said to be robust.
The appraiser does not report any examination of the robustness of the results. Therefore, it is necessary to replicate the MRA results in order to investigate their robustness. This is done using version 6.07 of TS301 SAS computer software. The raw data used by the appraiser are fitted to a model identical to that used by the appraiser. The findings of this analysis are reported below.
Pitfall Number Four: Multivariate Normality
First and foremost, robustness requires that the distribution of the data be multivariate normal or at least closely multivariate normal. Tests for multivariate normality are not well developed, but tests for univariate normality are available. Data that is multivariate normal is always univariate normal, but not vice versa. Thus, if a data set is not univariate normal, it cannot be multivariate normal.
The Shapiro-Wilk test for univariate normality is particularly useful in testing for multivariate normality. Unfortunately, the MRA features of the standard spreadsheet packages do not include this important test statistic. Therefore, it is calculated using the SAS software package cited above.
For any data set to be multivariate normal, it must also be univariate normal one variable at a time. The null hypothesis of the Shapiro-Wilk test is that the variable is normally distributed, so a statistically significant result is evidence against normality. Thus, if the Shapiro-Wilk test rejects normality for any one variable, then that variable is not distributed normally, and the set of independent variables cannot form a multivariate normal distribution. If the test does not reject normality for any of the variables, then the independent variables might form a multivariate normal distribution, but there is no guarantee of it and further testing is required to confirm the presence of multivariate normality.
Testing the appraiser's independent variables using the Shapiro-Wilk statistic reveals that the appraiser's comparable sales data set does not form a multivariate normal distribution. Only one of the independent variables is distributed normally, namely the log of building size. So, the log of building size should improve the chances of the data set being multivariate normal, but it does not achieve this end.
The lack of multivariate normality is not always a serious problem in MRA. If the sample size of the data set is large enough, the results often are asymptotically multivariate normal. That is, as more and more observations are added to the data set, the data becomes more and more multivariate normal. Unfortunately, the appraiser's set of comparable sales is far too small to ensure asymptotic multivariate normality. Hair, Anderson, Tatham, and Black suggest that a data set should have at least ten observations for each independent variable in the model, while others call for 30 observations for each independent variable. Thus, the results of the MRA lack robustness due to the lack of multivariate normality and the small number of comparable properties the appraiser uses to conduct the MRA.
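The arithmetic behind these rules of thumb is simple but worth making explicit for the appraiser's model, which has five independent variables and ten comparable sales. The short sketch below uses hypothetical names (`required_sales`, `NUM_INDEPENDENT_VARIABLES`, `AVAILABLE_SALES`) and only the counts given in the text.

```python
def required_sales(num_independent_variables, sales_per_variable):
    """Minimum number of comparable sales under a sales-per-variable rule."""
    return num_independent_variables * sales_per_variable


NUM_INDEPENDENT_VARIABLES = 5  # date, effective age, location, ratio, log size
AVAILABLE_SALES = 10           # comparable sales in the appraisal report

for sales_per_variable in (10, 30):
    needed = required_sales(NUM_INDEPENDENT_VARIABLES, sales_per_variable)
    shortfall = max(0, needed - AVAILABLE_SALES)
    print(
        f"{sales_per_variable} per variable: need {needed} sales, "
        f"have {AVAILABLE_SALES}, short by {shortfall}"
    )
```

Under the more lenient rule the model calls for 50 comparable sales, and under the stricter rule 150, against the 10 actually used.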
Pitfall Number Five: The Use of Ill-Conditioned Comparable Sales
Robustness of the results of an MRA also requires a data set that is well-conditioned. That is, the results of the regression analysis should not be sensitive to the deletion of one of the observations in the data set. A data set that fails this requirement is said to be ill-conditioned. Further examination of the 10 comparable sales using appropriate statistical techniques reveals that the appraiser's data are ill-conditioned. That is, the parameter estimates and subsequent estimate of value of the subject property are significantly affected by some of the comparable properties, specifically comparable sales numbers 1, 9, and 10.
The statistical tests for ill-conditioned data amount to the search for influential observations of comparable sales. One statistic that captures this effect is called the Dffits statistic. Unfortunately, standard spreadsheet packages do not calculate Dffits statistics. Therefore, the Dffits values associated with each comparable sale are calculated using the previously cited SAS computer software. The Dffits values for comparable sales 1, 9, and 10 are 4.124, -4.735, and 10.375, respectively. In cases involving small sample size, Dffits values greater than 1.0 in absolute value indicate influential sales, which in a regression analysis tend to distort the parameter estimates and estimates of value obtained from the model. Thus, the robustness of the results of the regression analysis is further weakened by the appraiser's use of ill-conditioned data.
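To show what the Dffits statistic measures, the sketch below computes it for a one-variable regression, where the leverage of each observation has a simple closed form. This is a simplified illustration on hypothetical data, not a replication of the SAS analysis of the appraiser's five-variable model. The function name `dffits_simple_regression` and the data values are assumptions.

```python
import math


def dffits_simple_regression(xs, ys):
    """Dffits for each observation in a one-variable least-squares fit."""
    n = len(xs)
    p = 2  # parameters estimated: intercept and slope
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)

    values = []
    for x, e in zip(xs, residuals):
        leverage = 1 / n + (x - mean_x) ** 2 / sxx
        # Error variance re-estimated with this observation deleted.
        deleted_var = (sse - e ** 2 / (1 - leverage)) / (n - p - 1)
        studentized = e / math.sqrt(deleted_var * (1 - leverage))
        values.append(studentized * math.sqrt(leverage / (1 - leverage)))
    return values


# Hypothetical data: nine points near the line y = 2x plus one distant point.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 10.0]

dffits_values = dffits_simple_regression(xs, ys)
for number, value in enumerate(dffits_values, start=1):
    flag = "influential" if abs(value) > 1.0 else ""
    print(f"observation {number:2d}: Dffits = {value:+.3f} {flag}")
```

The last observation combines high leverage with a large residual, so deleting it changes the fitted line markedly, and its Dffits value is far larger in absolute value than the 1.0 threshold cited above.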
Pitfall Number Six: Measurement Errors
Measurement errors in the data used in a regression analysis can also destroy the robustness of the results. It is important for any source of measurement error in the data to be carefully evaluated by the appraiser. Several of the property characteristics selected by the appraiser are worthy of this sort of evaluation. For example, the appraiser offers no objective data or information to support his measurement of two critically important property characteristics: location and age. Measurement errors are almost certainly present in these two property characteristics, due to the subjective nature of these measurements.
The appraiser measures location using an ordinal scale of 1.0 to 4.0, with 4.0 being the best location. Five of the comparable sales have a location score of 4.0, and the comparable sale in the poorest location has a score of 1.5. The appraiser gives the subject property a location score of 1.0, which lies outside the range of the location scores of the comparable properties. Thus, it is not possible for the appraiser to extract a defensible adjustment factor for location from the comparable properties. Moreover, half of the comparable properties have location scores four times that of the subject property. The appraiser should have found comparable properties with location scores both above and below the subject property's location score to properly adjust for location, and the appraiser should have used a more objective measure of location, such as proximity to nearby cities and major transit routes. In order to infer reasonably the adjustment factor for any property characteristic, it is necessary for the characteristics of the comparable sales to straddle the characteristic of the subject property. If this straddling is not achieved, the appraiser is extrapolating the adjustment factor beyond the range supported by the comparable sales data.  Thus, the location variable may contain some serious measurement errors.
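The straddling requirement can be checked mechanically. The sketch below uses the comparable-sale and subject-property values from Table 1 and flags any characteristic of the subject that falls outside the range observed among the comparable sales. The names `COMPARABLES`, `SUBJECT`, and `outside_range` are hypothetical.

```python
# Independent-variable values for the ten comparable sales, from Table 1.
COMPARABLES = {
    "date_variance": [62, 60, 43, 41, 37, 35, 29, 18, 16, 0],
    "effective_age": [30, 30, 28, 25, 26, 35, 28, 33, 28, 24],
    "location": [4.0, 2.0, 3.0, 1.5, 4.0, 4.0, 3.0, 4.0, 4.0, 2.0],
    "land_bldg_ratio": [
        2.0105234, 3.5425952, 3.6318432, 4.6532753, 1.7118025,
        2.2794142, 3.1225249, 4.7712567, 7.5689827, 17.190246,
    ],
    "log_bldg_size": [
        6.3357787, 5.8760203, 6.3842907, 5.3513575, 6.5930423,
        6.4573559, 6.2299789, 6.0196396, 6.0448642, 5.6076266,
    ],
}
# Subject property values, from the bottom row of Table 1.
SUBJECT = {
    "date_variance": 47,
    "effective_age": 25,
    "location": 1.0,
    "land_bldg_ratio": 3.5469215,
    "log_bldg_size": 6.3394854,
}


def outside_range(values, subject_value):
    """True when the subject's value is not straddled by the comparables."""
    return not (min(values) <= subject_value <= max(values))


extrapolated = [
    name for name, values in COMPARABLES.items()
    if outside_range(values, SUBJECT[name])
]
print("Characteristics requiring extrapolation:", extrapolated)
```

Of the five characteristics, only location fails the check: the subject's score of 1.0 lies below the lowest comparable score of 1.5.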
Effective age is another variable that may contain measurement errors. Chronological age may be a better measure, but there are some problems with using chronological age as an indicator of effective age. One way of dealing with this problem is to include chronological age squared as well as chronological age in the set of independent variables. 
But the extent of these measurement errors is impossible to evaluate without on-site inspections of the comparable and subject properties to gather more objective data regarding the location and effective age of each property. The appraiser should have included a discussion of objective data that supports the subjective measurements in the report. But because this discussion is not included in the report, the full effect of these measurement errors cannot be assessed.
The regression analysis uses data that is not multivariate normal, and the data contain influential observations as well as measurement errors. Thus, the lack of robust results implies that the MRA should not be used as a basis for estimating the value of the subject property.
How to Avoid the Pitfalls
The representative appraisal report illustrates exceptionally well the pitfalls that await the unsuspecting appraiser who attempts to use MRA as a tool for estimating the value of real estate. It is understandable why the author of the representative appraisal report fell into so many of these pitfalls; until recently, the appraisal literature did not warn appraisers about them.
Appraisers can avoid these pitfalls by doing two things. First and foremost, appraisers using MRA must use more comparable sales than they are accustomed to using. A good rule of thumb is to use at least ten, preferably more, comparable sales per independent variable included in the MRA model. However, appraisers are often required to work with very little data. When this is the case, it is a gross misrepresentation and standards violation to employ a data-hungry statistical technique like MRA and present it as being reliable and accurate.
Secondly, appraisers who employ MRA should conduct the same sort of statistical tests discussed in this paper. Of course, most appraisers have neither the training nor the computer software to perform the statistical tests necessary to avoid the pitfalls. Fortunately, Levine, Berenson, and Stephan contains both the information and spreadsheet software for performing these tests. This particular text should be easily understandable by most appraisers, because it is commonly used in undergraduate business statistics courses throughout the country.
Hans R. Isakson, PhD, is a co-recipient of the 1979 Arthur A. May Award and is a professor of economics in the department of finance at the University of Northern Iowa, Cedar Falls, Iowa.
Excerpt from Papers and Proceedings, published by Valuation 2000 in July 2000.
(1.) A.T. Court, "Hedonic Price Indexes with Automotive Examples," The Dynamics of Automobile Demand (New York: General Motors, 1939).
(2.) Webster's Ninth New Collegiate Dictionary (Springfield, Massachusetts: Merriam-Webster, Inc., 1990): 350.
(3.) Z. Griliches, "On an Index of Quality Change," Journal of the American Statistical Association (56: 1961): 535-548.
(4.) W.M. Shenkel, Modern Real Estate Appraisal (New York: McGraw-Hill, 1978).
(5.) Maxwell O. Ramsland and Daniel E. Markham, "Market-Supported Adjustments Using Multiple Regression Analysis," The Appraisal Journal (April 1998): 181-191.
(7.) The market approach to value is also known as the sales comparison approach to value.
(9.) In order to make reliable inferences (i.e., predict the dependent variable) from the data being analyzed, the data must be distributed multivariate normal. See Henri Theil, Principles of Econometrics (New York: John Wiley & Sons, Inc., 1971): 67 for a mathematical definition of the multivariate normal distribution.
(10.) See Appraisal Institute, The Appraisal of Real Estate, 11th ed. (Chicago: Appraisal Institute, 1996): 731-736 for a discussion of statistical inference and confidence levels.
(11.) SAS Institute, Inc., World Headquarters, SAS Campus Drive, Cary, North Carolina (1989).
(12.) Hair, Anderson, Tatham, and Black, Multivariate Data Analysis with Readings (New York: Macmillan, 1992).
(13.) See Belsley, Kuh, and Welsch for discussion of these techniques.
(14.) See Neter, et al., 84-85 for a discussion of making inferences outside the range of the data set.
(15.) Marvin L. Wolverton, "Empirical Analysis of the Breakdown Method of Estimating Physical Depreciation," The Appraisal Journal (April 1998): 163-171.
(16.) David M. Levine, Mark L. Berenson, and David Stephan, Statistics for Managers Using Microsoft Excel (New Jersey: Prentice Hall, 1989).
Appraisal Institute. The Appraisal of Real Estate, 7th and 11th editions. Chicago: Appraisal Institute, 1978 and 1996.
Belsley, D.A., Kuh, E. and Welsch, R.E. Regression Diagnostics. New York: John Wiley & Sons, 1980.
Dilmore, Gene. "Of Regression Analysis, Business Valuation, Lotus 1-2-3, Hewlett-Packard, and William of Ockham." Business Valuation Review (June 1995): 75-82.
Table 1 MRA Results Taken from Representative Appraisal
Base Date: 1/15/1992

No.  Location            Date of Sale  When Built  Sale Price    Size (Sq. Ft.)
1    Ypsilanti Twp., MI  3/15/1997     1942, 78    $10,600,000   2,166,600
2    Silvis, IL          1/31/1997     1966        $2,625,000    751,658
3    Davenport, IA       9/14/1995     1967, 78    $10,500,000   2,422,650
4    Spirit Lake, IA     7/14/1995     1968, 72    $1,850,000    224,573
5    Columbus, OH        3/7/1995      1971        $20,000,000   3,917,800
6    Framingham, MA      12/21/1994    1947, 87    $8,000,000    2,866,526
7    Springfield, MO     6/22/1994     1967, 81    $10,000,000   1,698,161
8    Romulus, MI         7/15/1993     1942, 78    $6,670,000    1,046,260
9    Mentor, OH          5/17/1993     1969        $5,825,000    1,108,828
10   Underwood, IA       1/15/1992     1973        $4,517,275    405,160

No.  Location            Actual Price/Sq. Ft.  Predicted Price/Sq. Ft.  Percent Variance  Date Variance
1    Ypsilanti Twp., MI  $4.89                 $3.81                    -22.18%           62
2    Silvis, IL          $3.49                 $4.56                    30.17%            60
3    Davenport, IA       $4.33                 $4.65                    7.30%             43
4    Spirit Lake, IA     $8.24                 $7.94                    -3.59%            41
5    Columbus, OH        $5.10                 $4.97                    -2.70%            37
6    Framingham, MA      $2.79                 $2.90                    4.00%             35
7    Springfield, MO     $5.89                 $5.43                    -7.79%            29
8    Romulus, MI         $6.38                 $5.49                    -13.95%           18
9    Mentor, OH          $5.25                 $7.25                    38.03%            16
10   Underwood, IA       $11.15                $10.52                   -5.66%            0

No.  Location            Eff. Age  Location  Land/Bldg. Ratio  Log of Bldg. Size
1    Ypsilanti Twp., MI  30        4.0       2.0105234         6.3357787
2    Silvis, IL          30        2.0       3.5425952         5.8760203
3    Davenport, IA       28        3.0       3.6318432         6.3842907
4    Spirit Lake, IA     25        1.5       4.6532753         5.3513575
5    Columbus, OH        26        4.0       1.7118025         6.5930423
6    Framingham, MA      35        4.0       2.2794142         6.4573559
7    Springfield, MO     28        3.0       3.1225249         6.2299789
8    Romulus, MI         33        4.0       4.7712567         6.0196396
9    Mentor, OH          28        4.0       7.5689827         6.0448642
10   Underwood, IA       24        2.0       17.190246         5.6076266

Regression Output
Constant             29.704
Std Err of Y Est     1.4087
R²                   0.8503
No. of Observations  10
Degrees of Freedom   4

               Date Variance  Eff. Age  Location  Land/Bldg. Ratio  Log of Bldg. Size
X Coefficient  -0.03          -0.285    0.37122   0.1245317         -2.716181
Std Error      0.0445         0.1811    0.98715   0.2079735         2.3080811
t-stat         -0.68          -1.574    0.37606   0.5987864         -1.176814
f-stat         4.54

Subject property
Date of Sale       1/1/1996
Sale Price         $10,379,562
Size (Sq. Ft.)     2,185,171
Price/Sq. Ft.      $4.75
Date Variance      47
Eff. Age           25
Location           1.0
Land/Bldg. Ratio   3.5469215
Log of Bldg. Size  6.3394854
Note: This Exhibit is presented in the same format as that found in Ramsland and Markham. The column labeled Actual Price/Sq. Ft. is the dependent variable. The last four columns from the left contain the independent variables. The output from the MRA is to the right. The line at the bottom of the spreadsheet contains data for the subject property, including the estimated value ($10,379,562) from the MRA.
Contact the author for a copy of this spreadsheet, which contains the spreadsheet functions used to generate the Regression Output.