# Linear regression analysis of economic variables in the sales comparison and income approaches.

In "The Use of an Economic Indicator in the Sales Comparison Approach," Andrew J. Moye illustrated the concept of a paired-comparison technique in which the subject's projected net operating income (NOI) per square foot is divided by the comparable sales' NOIs per square foot, and the resulting ratio is multiplied by the respective sale prices per square foot.(1) This technique is performed in an increasing number of appraisals; however, despite its intuitive appeal, there is a flaw in the method. As Marvin L. Wolverton noted in a letter to the editor in the October 1991 issue of The Appraisal Journal, the ratio technique is nothing more than the equivalent of a subject's projected NOI per square foot capitalized by the respective sales' overall rates ($R_O$). Wolverton added that Moye's further adjustment to the sales' overall rates in the Appendix to Moye's article involved circular logic.(2) Moye concurred in a response letter published in the January 1992 issue of The Appraisal Journal.(3)

Mark W. Galleshaw expanded on Wolverton's critique and emphasized the importance of applying a traditional sales comparison approach method. Galleshaw also illustrated the flaw in the net income/sale price ratio technique, showing that it merely disguises the capitalization of the subject's NOI per square foot by the comparable sales' overall rates.(4)
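The equivalence that Wolverton and Galleshaw describe can be seen in one line of algebra. In the sketch below the symbols are shorthand introduced here for illustration and do not come from the cited articles: $I_s$ is the subject's projected NOI per square foot, $I_c$ and $P_c$ are a comparable's NOI and sale price per square foot, and $R_{O,c} = I_c / P_c$ is that comparable's overall rate.

```latex
\[
\frac{I_s}{I_c} \times P_c
  \;=\; \frac{I_s}{I_c / P_c}
  \;=\; \frac{I_s}{R_{O,c}}
\]
```

Under these definitions, the "adjusted" price per square foot is simply the subject's NOI capitalized at the comparable's own overall rate.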

I concur with Galleshaw and Wolverton that the ratio technique, as such, is invalid. There is often a high correlation, however, between a property's NOI per square foot and sale price per square foot. Further, the intuitive appeal of a comparative technique is understandable. Despite the invalidity of the ratio technique, there is a method that circumvents this problem by providing a mathematical relationship between these two variables--a linear regression model. A linear regression analysis of the NOI-per-square-foot and sale-price-per-square-foot variables could provide another check and balance in the valuation process. This article demonstrates the application of a linear regression analysis of these two variables.(5)

CONFUSION IN USING THE HP-12C FOR LINEAR REGRESSION ANALYSIS

In preparing for the Appraisal Institute's comprehensive examination, I discovered that an identical linear regression problem was presented in two separate course handouts. The problem appeared on page 7-6A of the course material of Capitalization Theory and Techniques, Part B and elsewhere in course materials. The example is presented in Table 1.

As is illustrated in Table 1, while the problems are identical the solutions are different. The solutions are different precisely because the order in which the two sets of variables are input is different. Although the correlation coefficients (0.971863) are identical in both problems, the predicted rents per square foot are different. Therefore, only one technique can be correct.

When the correlation coefficient is fairly high, the predicted variables are usually fairly similar regardless of how the data are input. To enhance the credibility of the analysis, however, it is important to use the correct procedure. Based on my experience in reviewing appraisals, an increasing number of appraisals use a linear regression analysis to estimate certain variables. Evidently, many appraisers are unaware of the importance of the order in which data are input--even the Appraisal Institute course material is contradictory.

Indeed, Hewlett-Packard's HP-12C booklet dated September 1983 presents an example of linear regression using the hours per week that seven salespersons worked and their respective sales per month.(6) It is logical to expect that the sales per month are a function of the hours worked. Thus, the sales per month would be the dependent variable (y) and the hours worked per week would be the independent variable (x). This Hewlett-Packard booklet, however, actually illustrates an example in which the hours worked were entered as the dependent variable and the sales volume as the independent variable (i.e., the opposite of what the proper order should have been). It is thus easy to see why many users of the HP-12C calculator assume that the order in which data are input is irrelevant--the correlation coefficients are identical and the manufacturer's own booklet is in error.

I was not aware of this distinction until I took a Case Studies course in June 1987 from James J. Mason. During a discussion of a linear regression problem Mason almost offhandedly told the class that the data could be input only in one way. Those of us in the class who were somewhat familiar with the regression capabilities of the HP-12C were stunned. Nobody volunteered agreement with Mason and one student challenged him. During the next break this particular student worked an example both ways and discovered to his amazement that Mason was correct.(7)

LINEAR REGRESSION ANALYSIS

When two or more variables are analyzed, the focus is usually the degree to which the variables are related and the nature of this relationship. For example, a relationship often exists between sale comparables' NOIs per square foot and their sale prices per square foot, particularly if reliable data have been obtained. The statistical methods for analyzing these relationships are correlation and regression.

A relationship or correlation exists between two variables when there is a systematic pattern in the magnitude of data from each data set. For example, sale properties with relatively high NOIs per square foot often have relatively high sale prices per square foot. Correlation analysis focuses on the types of variables and the degree to which they are related. A scatter plot, in which two variables are plotted on graph paper, is a simple form of correlation analysis. A scatter plot, however, does not provide an objective measure of the correlation between two variables. There are several techniques for constructing this measure depending on the shape of the relationship. The most common measure for linear relationships is the correlation coefficient.

The correlation coefficient (R) provides a mathematical measure of the degree of correlation between two variables; R measures how close the relationship is to being a straight line. The formula for calculating the correlation coefficient is shown below:

[Mathematical expression omitted]

where

n = Number of observations (sample size) or data points

$x_i$ = Independent variable in data set

[Mathematical expression omitted]

$y_i$ = Dependent variable in data set

[Mathematical expression omitted]

The correlation coefficient ranges in value from -1.0 to 1.0. A -1.0 correlation coefficient indicates a perfect negative correlation between two variables; an R of 1.0 indicates a perfect positive correlation. Measures between the two indicate the relative degree of correlation, with zero indicating no correlation.

A measure related to the correlation coefficient is the coefficient of determination ($R^2$), which is simply the correlation coefficient squared. The coefficient of determination provides the percentage of variance in one variable related to or explained by variance in the other variable.
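The formula for R is omitted from this copy of the article. The sketch below assumes it is the standard product-moment (Pearson) definition, which is consistent with the symbols listed above (n, $x_i$, $y_i$, and the two means). The function name and the comparable data are hypothetical and serve only to illustrate the calculation of R and $R^2$.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient R for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squared deviations about the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical comparables (illustrative only, not taken from the article's tables)
noi_per_sf = [3.10, 3.45, 3.80, 4.20, 4.65]
price_per_sf = [31.0, 35.5, 37.0, 43.0, 46.5]

r = correlation_coefficient(noi_per_sf, price_per_sf)
r_squared = r ** 2  # coefficient of determination
print(f"R = {r:.4f}, R^2 = {r_squared:.4f}")
```

Note that swapping the two lists leaves R unchanged, which is why the correlation coefficient alone cannot reveal whether the data were entered in the proper order.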

The correlation coefficient measures the degree of the linear relationship between two variables; regression analysis defines the relationship mathematically. Once a formula is established, it can be used to predict the value of one variable given a particular value of another variable.

The objective of linear regression is to determine the formula for the line that best fits the pattern of the data. The formula for a line has the general form:

y = a + bx

where

y = Dependent variable

a = Intercept or y - bx

b = Slope or x coefficient

x = Independent variable
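As a minimal sketch of how a and b might be obtained, the following assumes the ordinary least-squares estimates (slope equal to the sum of cross-products divided by the sum of squared x deviations, and intercept equal to the mean of y less b times the mean of x). The function name, the comparable data, and the subject's NOI are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope, or x coefficient
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical comparables: x = NOI per square foot, y = sale price per square foot
noi_per_sf = [3.10, 3.45, 3.80, 4.20, 4.65]
price_per_sf = [31.0, 35.5, 37.0, 43.0, 46.5]

a, b = fit_line(noi_per_sf, price_per_sf)
subject_noi = 4.00  # hypothetical subject NOI per square foot
predicted_price = a + b * subject_noi
print(f"a = {a:.4f}, b = {b:.4f}, predicted price/SF = {predicted_price:.2f}")
```

The dependent variable (the quantity to be predicted) is passed as the second argument; reversing the arguments fits a different line.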

With this background, the linear regression problem outlined earlier in this article that was extracted from Appraisal Institute course material is performed using Lotus 1-2-3, release 2.3, software as well as the actual mathematical methodology. This demonstrates that once the dependent variable has been selected, there is only one correct way to input the data on the HP-12C and HP-17BII calculators. Incidentally, the regression results when release 2.3 is used are identical to those when a Hewlett-Packard calculator is used, because both are based in part on formulas that estimate the standard deviation of a sample (i.e., n - 1). Regression results derived through use of release 2.01 differ slightly from results derived through use of the Hewlett-Packard calculators in that the earlier release of Lotus bases calculations of standard deviations on the population (all possible observations, or n) instead of a sample. Upgrading to release 2.3 eliminates this discrepancy. This is an important improvement by Lotus, making the program compatible with the results of leading financial calculators and also making it more applicable to real estate appraising, which is typically based on samples. As is stated in The Appraisal of Real Estate, ninth edition:

The standard deviation is a way to describe a sample or a population that lends itself to further mathematical treatment. . . . One is subtracted from the number of observations in a sample to adjust for the one degree of freedom that is lost when the mean is calculated. . . . Samples are typically used in real estate appraising, so the second formula ($n - 1$ in the denominator of the standard deviation equation) is more applicable. . . . The standard deviation is used and understood in many disciplines and it can be calculated easily with an electronic calculator. It will undoubtedly be more widely used by appraisers in the future.(8)
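The difference between the sample (n - 1) and population (n) formulas can be illustrated with Python's standard `statistics` module, in which `stdev` uses the n - 1 divisor and `pstdev` uses n. The rent figures below are hypothetical.

```python
import math
import statistics

# Hypothetical rents per square foot (illustrative only)
rents_per_sf = [7.00, 7.40, 7.80, 8.10, 8.50, 8.80]
n = len(rents_per_sf)

sample_sd = statistics.stdev(rents_per_sf)       # divides by n - 1
population_sd = statistics.pstdev(rents_per_sf)  # divides by n

print(f"sample SD     = {sample_sd:.6f}")
print(f"population SD = {population_sd:.6f}")
# The two differ by a fixed factor that shrinks toward 1 as n grows
print(f"ratio = {sample_sd / population_sd:.6f}, "
      f"sqrt(n / (n - 1)) = {math.sqrt(n / (n - 1)):.6f}")
```

With only a handful of comparables, as is typical in an appraisal, the gap between the two figures is noticeable.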

The equations used to derive the statistics in the lower portion of Table 2 are presented below (these same equations are built into the HP financial calculators and Lotus 1-2-3).

[Mathematical expression omitted]

[Mathematical expression omitted]

[Mathematical expression omitted]

[Mathematical expression omitted]

[Mathematical expression omitted]

[Mathematical expression omitted]

where

[Mathematical expression omitted]

n = Number of observations in the sample

[Mathematical expression omitted]

cov(y, x) = Covariance between the dependent and independent variables

a, b = Intercept and slope estimates

The correctness of this analysis can be illustrated and verified by reversing the order in which data are input. The example presented, in which the predicted rents per square foot are \$11.05, \$11.94, and \$12.82, is incorrect, as the following analysis shows. First, what equation is being used to predict these figures? The figures have been derived on the HP-17BII, which can show up to eleven decimal places in conjunction with a single-digit number.

y = a + bx

[Mathematical expression omitted]

TABULAR DATA OMITTED

Year 1:

\$11.0460317460 = \$7.93333333333 - b(-2.5) + b(1)

= \$7.93333333333 + 3.5b

\$3.11269841267 = 3.5b

0.88934240362 = b

Year 2:

\$11.9353741497 = \$7.93333333333 - b(-2.5) + b(2)

= \$7.93333333333 + 4.5b

\$4.00204081637 = 4.5b

0.88934240364 = b

Year 3:

\$12.8247165533 = \$7.93333333333 - b(-2.5) + b(3)

= \$7.93333333333 + 5.5b

\$4.89138321997 = 5.5b

0.88934240363 = b
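The back-solving for b in Years 1 through 3 can be reproduced in a few lines. The sketch below uses only the figures printed above and reads 7.93333333333 as the mean of y and -2.5 as the mean of x, which is an interpretation of the worked equations rather than something the omitted table confirms. The variable and function names are hypothetical.

```python
# Figures as printed in the worked equations above
Y_BAR = 7.93333333333   # read here as the mean rent per square foot
X_BAR = -2.5            # read here as the mean of the year index

# Forecast year -> rent per square foot shown in the incorrect solution
forecasts = {1: 11.0460317460, 2: 11.9353741497, 3: 12.8247165533}

def implied_slope(x, y_hat):
    """Solve y_hat = Y_BAR + b * (x - X_BAR) for b."""
    return (y_hat - Y_BAR) / (x - X_BAR)

slopes = {year: implied_slope(year, y_hat) for year, y_hat in forecasts.items()}
for year, b in slopes.items():
    print(f"Year {year}: b = {b:.11f}")
```

All three years imply essentially the same slope, which confirms that a single straight line produced the three forecasts.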

Differences of one number in the eleventh decimal place are caused by a rounding error. It turns out that the slope (b) is being calculated based on the following equation.

[Mathematical expression omitted]

Because the correct formula for the slope (b), or x coefficient, does not include the coefficient of determination ($R^2$) in the denominator, it can be concluded that the predicted rents per square foot are erroneous and are based on an incorrect application of the linear regression model. This demonstrates that there is only one correct method for inputting the data.
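The effect of reversing the input order can be sketched numerically. When the variables are swapped, the fitted line is the regression of x on y; solving that line for y gives a slope equal to the sum of squared y deviations divided by the sum of cross-products, which is algebraically the correct slope divided by $R^2$. The data below are hypothetical and are not the figures from the omitted tables; the names are likewise illustrative.

```python
import math

def deviation_sums(xs, ys):
    """Return means and sums of squared deviations and cross-products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return x_bar, y_bar, s_xy, s_xx, s_yy

# Hypothetical history: year index (x) and rent per square foot (y)
years = [-5, -4, -3, -2, -1, 0]
rents = [6.00, 6.50, 7.60, 8.10, 9.30, 9.70]

x_bar, y_bar, s_xy, s_xx, s_yy = deviation_sums(years, rents)
r = s_xy / math.sqrt(s_xx * s_yy)   # identical whichever way the data are entered

b_correct = s_xy / s_xx     # rent regressed on year (proper order)
b_reversed = s_yy / s_xy    # year regressed on rent, then solved for rent

forecasts = {}
for year in (1, 2, 3):
    correct = y_bar + b_correct * (year - x_bar)
    reversed_order = y_bar + b_reversed * (year - x_bar)
    forecasts[year] = (correct, reversed_order)
    print(f"Year {year}: correct = {correct:.4f}, reversed = {reversed_order:.4f}")

print(f"R = {r:.6f}")
print(f"b_correct / R^2 = {b_correct / r**2:.6f}, b_reversed = {b_reversed:.6f}")
```

Because $R^2$ is less than 1 whenever the fit is imperfect, the reversed-order slope is always steeper in absolute value than the correct one, and the gap widens as the correlation weakens.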

LINEAR REGRESSION ANALYSIS USING THE HP-12C

Before entering data in the HP-12C, it is recommended that the statistics registers be cleared by pressing "f CLEAR Σ." To enter data pairs, the following steps should be repeated for each pair in the data set.

1. The y value (e.g., sale price per square foot or effective gross income multiplier [EGIM]) should be keyed into the display (this is the dependent variable that is to be predicted or estimated).

2. "Enter" should be pressed.

3. The related x value (e.g., NOI per square foot or operating expense ratio [OER]) should be keyed into the display (this is the independent variable that influences the dependent variable).

4. Σ+ should then be pressed (the display will show the number of data pairs entered; that is, the first pair entered will display 1.00, the second pair will display 2.00, etc.).

Once all of the data pairs have been inputted, an estimate of the subject's y variable can be obtained by the following keystrokes.

1. A new or the subject's x value should be keyed in.

2. The next step is to press g and then $\hat{y},r$.

The result is the predicted or estimated y value for the subject's x value. To display the correlation coefficient (R), the x≷y key should be pressed. To calculate the coefficient of determination ($R^2$), simply square the correlation coefficient.
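The keystroke sequence above can be mimicked in software to make the role of the input order explicit. The sketch below is not the calculator's firmware; it merely accumulates the running sums that such a routine needs and then evaluates the fitted line. The class name `StatRegisters`, its methods, and the data are hypothetical.

```python
import math

class StatRegisters:
    """Running sums of the kind a statistics-capable calculator accumulates."""

    def __init__(self):
        self.clear()

    def clear(self):
        """Counterpart of clearing the statistics registers."""
        self.n = 0
        self.sum_x = self.sum_x2 = 0.0
        self.sum_y = self.sum_y2 = 0.0
        self.sum_xy = 0.0

    def sigma_plus(self, y, x):
        """Add one pair: y first, then x, mirroring 'y ENTER x Σ+'."""
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x
        self.sum_y += y
        self.sum_y2 += y * y
        self.sum_xy += x * y
        return self.n  # number of pairs entered so far

    def y_hat_r(self, x):
        """Return (predicted y for x, correlation coefficient R)."""
        s_xx = self.sum_x2 - self.sum_x ** 2 / self.n
        s_yy = self.sum_y2 - self.sum_y ** 2 / self.n
        s_xy = self.sum_xy - self.sum_x * self.sum_y / self.n
        b = s_xy / s_xx
        a = self.sum_y / self.n - b * self.sum_x / self.n
        return a + b * x, s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical comparables as (price per SF, NOI per SF): y first, then x
comparables = [(31.0, 3.10), (35.5, 3.45), (37.0, 3.80), (43.0, 4.20), (46.5, 4.65)]

regs = StatRegisters()
for price, noi in comparables:
    count = regs.sigma_plus(price, noi)

predicted_price, r = regs.y_hat_r(4.00)  # hypothetical subject NOI per SF
print(f"pairs = {count}, predicted price/SF = {predicted_price:.2f}, R = {r:.4f}")
```

Passing the arguments to `sigma_plus` in the opposite order would fit NOI as a function of price, which is the mistake described earlier in this article.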

LINEAR REGRESSION ANALYSIS USING THE HP-17BII

The HP-17BII is a menu-driven calculator. To perform statistical analyses, the menu entitled "SUM" must be accessed. The y and x variables are inputted separately in the proper sequence. For example, the y values are inputted in the proper order of comparable sales and assigned a name, and the respective x values are inputted in the same order and assigned a name. To exit the named range, it is merely necessary to access the menu "GET" and "*NEW" and the inputting process can begin again. Each data list is given a name (an easy way to remember is to simply name the y range y and the x range x--or use more descriptive names such as NOI, EGIM). After the two data lists are entered and named, the "CALC" menu and the "FRCST" menu should be accessed. The calculator will respond by displaying "Select X Variable." The name of the list assigned to the x variables should be accessed, at which point the calculator will respond by displaying "Select Y Variable." Then the name of the list assigned to the y variables should be accessed. If the "CORR" menu is accessed next, the calculator will display the correlation coefficient (R). The subject's x variable should be entered in the menu in which the x variable range is located and the menu in which the y variable range is located should be accessed. The result is the predicted variable y for the subject property.

EXAMPLE OF NOI/SQUARE FOOT AND PRICE/SQUARE FOOT REGRESSION

The next two regression examples use the Lotus 1-2-3, release 2.3, software program. The HP calculators can be used as a preliminary tool to discover whether the linear regression results are worth using in the appraisal. The Lotus software can be used to verify the results from the HP calculators, to display more data in a printed format, and to make graphs. The example in Table 3 was taken from an apartment appraisal that I reviewed.

TABULAR DATA OMITTED

The results of the regression analysis indicate a high correlation among the data. The software program generates the statistics on the right side of the table, except for the correlation coefficient (R). The coefficient of determination ($R^2$), however, is provided automatically by the program. Therefore, to calculate the R value it is necessary to insert the following equation in Table 3:

@SQRT(Cell Reference containing the $R^2$ value)

This is because the correlation coefficient is the square root of the coefficient of determination. To establish the predicted prices per square foot for each comparable and the subject, an equation must be established as follows:

Price/SF = Constant + (X coefficient) x NOI/SF

The results of the regression analysis are portrayed in Figure 1. It should be noted that the predicted price per square foot was close to the appraised value conclusion.

EXAMPLE OF OER AND EGIM REGRESSION

Figure 2 depicts another regression example involving the OER as the independent variable (x) and the EGIM as the dependent variable (y). The data were extracted from an appraisal report that I reviewed. It should be noted that two OERs were presented for the subject. The higher OER is based on the subject's pro forma income statement, including revenue on units in which the landlord pays utilities and their related expenses, albeit at some profit spread. The lower OER is based on the landlord-paid utilities being removed from the revenue and expense statement.

The predicted EGIMs were close to the appraiser's estimated 6.0 EGIM. The correlation coefficient (R) is relatively high and is inverse, which is typical, as properties with a lower operating expense ratio generally command a higher EGIM because investors place a premium on higher net income margins. It is important to emphasize that the Lotus program does not provide the R value. As previously discussed, a Lotus @SQRT function can be inserted in the program. It is prudent, however, to insert a Lotus @IF statement because some correlations are negative. To merely take the square root of this $R^2$ value, which is automatically positive, would result in another positive value for the R statistic. This would be misleading and inconsistent with the results obtained from using the HP-12C or HP-17BII calculators. By establishing the following formula, the problem of inadvertently generating a positive correlation coefficient in a negative-sloped equation can be avoided.

@IF(Cell Reference containing X coefficient < 0, -@SQRT(Cell Reference containing $R^2$), @SQRT(Cell Reference containing $R^2$))
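The same sign logic can be expressed outside a spreadsheet. The following is a minimal sketch of the @IF formula above; the function name and the sample values are hypothetical.

```python
import math

def signed_r(r_squared, x_coefficient):
    """Recover R from R^2, giving it the sign of the regression slope."""
    root = math.sqrt(r_squared)
    return -root if x_coefficient < 0 else root

# Hypothetical regression output: a negative slope, as with OER versus EGIM
print(signed_r(0.81, -4.2))   # negative correlation
print(signed_r(0.81, 10.0))   # positive correlation
```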

APPROPRIATE USE OF REGRESSION ANALYSIS

Often linear regression is viewed disdainfully when used in a commercial appraisal report on a property for which relatively few comparable sales are available. The most common criticism is that there are not enough observations to be statistically valid or significant. This criticism automatically assumes that the data sampled are representative of the general population. The data selected, however, are by definition not randomly selected from the population universe; they are the most comparable, confirmable sales available. Second, rigorous adherence to statistical theory is not the goal of appraising. The purpose of appraising is to value real property, which is not an exact science. If reliable data are available and the results generally support the other approaches to value, a linear regression analysis can be a useful tool to supplement the other valuation techniques used. If the linear regression results are bizarre, which can result from the use of poor data, mistakes when inputting the data, or inefficiencies in the real estate market, the analysis should simply be omitted from the appraisal report.

The correlation coefficient is an easy benchmark with which to gauge the relative reliability of the predicted values. One general guideline to be aware of in a linear regression analysis, however, is whether the subject's independent parameter (e.g., x value such as NOI/square foot or OER) is within the range of the comparable data. Problems can occur if the subject's independent variable lies outside of the range of the comparable data. Obviously, it is always helpful to find comparables that bracket the subject property. The reason that problems can occur is that the linear relationship may not hold outside of the range of comparable data, particularly if the subject's data are far beyond the range of comparable data. A high correlation coefficient is certainly possible, but if the subject's data lie outside the range of comparables the results may be skewed (the results may also support other valuation estimates in the report). Thus, caution should be exercised when this phenomenon occurs. This concept is explained by Paul G. Hoel and Raymond J. Jessen in Basic Statistics for Business and Economics as follows:

The problem of predicting a y value for an x value beyond the range of observed x values is a considerably more difficult problem than that of predicting for a value of x inside the interval of observations. Prediction beyond the range of observations is called extrapolation, whereas prediction inside the range is called interpolation. The difficulty with extrapolation is that the assumptions necessary to justify it are seldom realized in real-life situations. . . . It will certainly not be if x is chosen sufficiently large. Furthermore, in most realistic regression problems it is unreasonable to expect the prediction errors to be normally distributed about the regression line with standard deviation of those errors remaining constant as x goes beyond the interval of observations. Extrapolation is a legitimate technique only when the experimenter has valid reasons for believing that his model holds beyond the range of the available observations. Stock market prices (y) as a function of time (x), for example, are an excellent illustration of a regression problem for which no satisfactory extrapolation model has yet been constructed.(9)
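A simple bracketing check along these lines can be run before any prediction is relied on. The sketch below merely tests whether the subject's x value falls within the range of the comparables; the function name and the OER figures are hypothetical.

```python
def classify_prediction(subject_x, comparable_xs):
    """Label a prediction as interpolation or extrapolation."""
    lowest, highest = min(comparable_xs), max(comparable_xs)
    if lowest <= subject_x <= highest:
        return "interpolation"
    return "extrapolation"

# Hypothetical operating expense ratios for the comparables
comparable_oers = [0.38, 0.41, 0.44, 0.47, 0.52]

for subject_oer in (0.45, 0.58):
    label = classify_prediction(subject_oer, comparable_oers)
    print(f"subject OER {subject_oer:.2f}: {label}")
```

A result of "extrapolation" does not invalidate the estimate, but it signals that the caution described above applies.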

Regression analysis should never be used in a strictly mechanical manner; judgment is of paramount importance in appraising real estate. If regression analysis is used prudently and cautiously, however, it can provide additional support to the final value estimate. UCLA real estate professor Fred Case describes the use of multiple regression(10) as follows:

Multiple regression analysis is an important tool which can be very useful in the appraisal process. It is not intended to replace other appraisal methods but is meant to be used in conjunction with other techniques. There are several advantages of multiple regression analysis which lead to its importance as an additional tool for the appraiser.

The conventional appraisal process includes obtaining a predicted sales price by applying adjustments to a limited number of comparable sales which are believed to be very similar to the subject property. This can lead to a distorted valuation if the comparable sales selected are, in fact, not very similar to the subject property. With regression analysis, the estimated regression equation specifies the relationship between variations in selected property characteristics and variations in sales price in a precise mathematical manner . . . Another benefit of using multiple regression analysis is that it provides a measure of reliability of the predicted sales price when it computes confidence intervals around the predicted price. Such a measure of accuracy is not available in the conventional appraisal approach.

Multiple regression analysis should never be applied in a completely automatic or mechanical manner. The results obtained should be analyzed for reasonableness and any warnings printed should be heeded. If multiple regression analysis is used properly, it should provide the appraiser with additional information which will be very useful in the whole appraisal process.(11)

Another criticism leveled at the analysis of NOI-per-square-foot and sale-price-per-square-foot variables is that the two variables obliterate the independence of the income capitalization and sales comparison approaches. The same criticism could be applied to the use of a GIM analysis, however, which applies the estimated GIM from the sales comparison approach to the projected gross income from the income approach, and to the use of direct capitalization analysis, which often applies an estimated overall rate from the comparable sales to the projected NOI from the income approach.

Although appraisers should attempt to perform each valuation approach as independently as possible as a checks-and-balance system, the fact remains that all three approaches are integrally related. Richard Sorenson, in describing typical appraisal deficiencies in "The Art of Reviewing Appraisals," noted that one deficiency is the "failure to recognize that the three approaches are completely interdependent in most cases and that an appraisal comprises a number of integrated, interrelated, and inseparable procedures that have the common objective of a convincing and reliable estimate of value."(12)

CONCLUSION

As this article illustrates, there is only one correct technique for inputting data on the HP-12C and HP-17BII calculators once the dependent variable has been selected when linear regression analyses are performed. Examples are presented illustrating that the order of inputting the data does make a difference. A traditional sales comparison approach should always be attempted, and linear regression analysis can be a useful tool for increasing the validity and accuracy of the results. While no approach is totally independent, it is a good idea to implement each approach as independently as possible when developing a value estimate. A regression analysis can provide additional insight into an appraisal problem if used appropriately. Regression analysis should never be used in a mechanical fashion. The results should be viewed for reasonableness in conjunction with the value estimates derived by the other approaches to value. With proper application and judgment, linear regression analysis can supplement the traditional approaches to value and provide a more convincing value estimate. Finally, the circular logic inherent in using the NOI/sale price ratio technique can be avoided by employing linear regression analysis. A strong correlation exists between free cash flow per share and share price in corporate finance; similarly, a strong correlation often exists between NOI per square foot and sale price per square foot in commercial real estate.

1. Andrew J. Moye, "The Use of an Economic Indicator in the Sales Comparison Approach," The Appraisal Journal (April 1991): 280-284.

2. Marvin L. Wolverton, "Why Disguise the Economic Indicator?" The Appraisal Journal (October 1991): 573-574.

3. Andrew J. Moye, "Why Disguise the Economic Indicator?--Author's Response," The Appraisal Journal (January 1992): 145.

4. Mark W. Galleshaw, "Appropriate Uses of Economic Characteristics in the Sales Comparison Approach," The Appraisal Journal (January 1992): 91-98.

5. I prefer the term "economic variables" rather than the term "economic indicators" because one of my clients confused the term economic indicators with that of the U.S. leading economic indicators.

6. Hewlett-Packard, HP-12C Owner's Handbook and Problem-Solving Guide (Hewlett-Packard Company, September 1983), 88-91.

7. It is not surprising that Mason was correct as he was a key consultant to Hewlett-Packard in the development of the HP-12C.

8. American Inst. of Real Estate Appraisers, The Appraisal of Real Estate, 9th ed. (Chicago: American Inst. of Real Estate Appraisers, 1987), 640-641.

9. Paul G. Hoel and Raymond J. Jessen, Basic Statistics for Business and Economics (New York: John Wiley & Sons, Inc., 1971), 249-250.

10. This is similar to linear regression except that multiple independent variables are being analyzed.

11. Fred E. Case, Professional Real Estate Investing (Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1983), 316.

12. Richard Sorenson, "The Art of Reviewing Appraisals," The Appraisal Journal (July 1991): 366.

Stephen C. Kincheloe, MAI, is an independent real estate consultant based in Dallas, Texas. He received a BS in business administration from Oregon State University and an MBA in corporate finance from the John E. Anderson Graduate School of Management, the University of California at Los Angeles. Mr. Kincheloe has previously published in The Appraisal Journal.