Predicting urban land prices: a comparison of four approaches / Žemės kainų miestuose prognozės: keturių metodų palyginimas

ABSTRACT. This paper investigates the forecasting accuracy of four different hedonic approaches when vacant urban land prices are predicted in local markets. The investigated hedonic approaches are: 1) ordinary least squares estimation, 2) robust MM-estimation, 3) structural time series estimation and 4) robust local regression. Post-sample predictive testing indicated that more accurate predictions are obtained if the unorthodox methods of this paper are used instead of conventional least squares estimation. In particular, predictive unbiasedness can be improved significantly when using the unconventional hedonic methods of the study. The paper also studied the structure of urban land prices. The most important attribute variables in explaining land prices were permitted building volume, house price index, northing and easting. The influence of the parcel size variable and of the different indicator variables on land prices was much weaker.
KEYWORDS: Land price; Hedonic model; Prediction; Robustness; Flexibility

SANTRAUKA (SUMMARY). The paper examines how accurately four different hedonic methods predict the prices of vacant land in local urban markets. The hedonic methods examined are: 1) the least squares method, 2) MM-estimation, 3) structural time series estimation and 4) local regression analysis. Post-sample predictive testing showed that more accurate predictions are obtained with the unconventional methods presented in this paper than with the conventional least squares method. Applying the unconventional hedonic methods of the study can
1. INTRODUCTION

Hedonic methods are often advocated in complex land valuation assignments in order to minimise the systematic valuation error objectively and to produce, validly and reliably, the necessary quality adjustments that stem from the differentiated nature of separate land parcels. However, the use of hedonic models is plagued by some fundamental problems that pose serious threats to their empirical adequacy. These fundamental dilemmas include: (1) the temporal variability of land prices, (2) the spatial variability of land prices, (3) the model specification dilemma and (4) outlying and influential observations.

When investigating the temporal dimension of land prices it is important to understand that the behaviour of land prices is generally nonstationary. This is a typical characteristic of many economic time series and means that the data-generating process that produces the observables is itself transient in time. The effect of time is also multidimensional: often we can legitimately separate the price trend, the price cycle, seasonal variation and random variation from each other. Traditionally, when modelling temporal land price movements, attempts have been made to reduce the effect of time to the variation of a cost-of-living index or a house price index, which has subsequently been used as an explanatory variable in a hedonic regression. Also the indicator variable technique (i.e.
by using yearly time dummy variables) has been a very popular approach when analysing the temporal dimension of land prices. These approaches are problematic mainly because they estimate the influence of time in a manner that is not very accurate in practice. Structural time series models, on the other hand, usually provide a more accurate description of temporal movements.

The spatial variation of land prices can be divided into spatial heterogeneity and spatial dependency. Spatial heterogeneity implies that functional forms and parameters vary with location and are not homogeneous throughout the data set, whereas spatial dependence implies that the variation is a function of distance. The spatial dependency problem can usually be solved by including location or some distance variables in a hedonic regression as explanatory variables. The spatial heterogeneity problem is usually more difficult: one natural solution would be to narrow the analyses to reasonably small submarkets, which homogenises the data. However, in practice this operation is typically not feasible because too few observations remain for hedonic modelling purposes. Adaptive modelling techniques, such as local regression, usually provide a better solution because they possess a spatial adaptation property and thus explicitly address the spatial heterogeneity problem.

The model specification dilemma can be solved in three different ways: (1) parametrically, (2) semiparametrically and (3) nonparametrically.
Parametric modelling is the classical approach in the hedonic modelling of land prices; it is theory-laden because pre-specified functional forms are used in the analysis. Nonparametric techniques, on the other hand, are data-driven and very flexible tools, and semiparametric techniques combine features of the parametric and nonparametric approaches. The exact research problem determines which approach should be used. Generally, nonparametric methods are useful when associations between variables are complex (i.e. highly nonlinear) and theoretically unknown. Parametric models apply well to a less complex setting where there is valid prior knowledge about the model's functional form. Irrespective of the chosen approach, the model specification dilemma comprises the choice of the hedonic model's functional form, the selection of relevant study variables and an error distribution assumption. It should also be noted that the result depends on the chosen scale, which is, however, often implicit.

Parametric models, which represent the data modelling culture (Breiman, 2001), have formed the conventional dogma of hedonic pricing methods in land price studies, where prespecified global models are estimated by means of ordinary least squares or some modification thereof. The benefits of parametric approaches undeniably include simplicity, interpretability, parsimony and comprehensive statistical theory.
The fundamental obstacle, however, underlying the general use of parametric models is their inflexibility, i.e. their inability to learn genuine structure about the hedonic relationship from the evidence in decision-making settings where theoretically unknown nonlinearity is expected. This is the typical case when the effects of variables representing location and time are considered (McMillen and Thorsnes, 2003). The conventional result is that even the best parametric model
Semiparametric and nonparametric approaches are representative of the algorithmic modelling culture (Breiman, 2001), which emphasises learning the complex structure from the available facts and adapting to the features underlying the data. Semiparametric estimators are, more precisely, an intermediate strategy between theory-laden and data-driven estimators and have restricted learning ability, i.e. semiparametric estimators can approximate functions only within some prespecified classes. Their practical relevance lies mainly in balancing the dual goals of low specification error and high efficiency (Pace, 1995; Anglin and Gencay, 1996) and in enhancing the interpretability of results. Nonparametric estimators are by their nature highly flexible and thus capable of approximating very general classes of functions (e.g. smooth functions, square integrable functions) without requiring any restrictive, unwarranted prespecification of the functional form of the mean response function (or any specific error distribution assumption). This makes nonparametric estimators powerful data-driven tools, albeit highly sensitive to the problem of undersmoothing or overfitting if local estimation is implemented unduly.

Outlying and influential observations are very common in land value studies. They may be genuine, faultless values, generated under
conditions of some untypical factors, or they can contain different errors (such as recording and measurement errors, wrong population, etc.). Traditional hedonic modelling techniques, especially the ordinary least squares technique, are sensitive to outlying observations; even a single outlier can drastically change the results and misguide the inferences. In fact, a single sufficiently deviating data point can cause the least squares estimator to break down and generate results that are utterly unreliable and uninformative. Robust methods such as
MM-estimation, on the contrary, are not sensitive to outliers or
influential observations and, therefore, can tolerate a certain amount
of bad observations without the fear that the estimator breaks down and
produces completely useless results.
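The contrast between least squares and a robust estimator can be illustrated with a small numerical sketch. The code below is only an illustration under stated assumptions, not the three-stage MM-estimator of the study: it uses synthetic data, starts from a plain OLS fit (a real MM-estimator would start from a high-breakdown estimate), holds a MAD-type scale fixed, and iterates reweighted least squares with Tukey bisquare weights. All function and variable names are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients via a least-squares solve."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def bisquare_weights(u, c=4.685):
    """Tukey bisquare weights: (1 - (u/c)^2)^2 inside (-c, c), zero outside."""
    w = (1.0 - (u / c) ** 2) ** 2
    return np.where(np.abs(u) < c, w, 0.0)

def m_estimate(X, y, n_iter=50, tol=1e-10):
    """Bisquare M-estimate by iteratively reweighted least squares.

    Assumptions: the starting value is plain OLS and the scale is a
    MAD-type estimate of the initial residuals that is then held fixed.
    """
    beta = ols(X, y)
    scale = np.median(np.abs(y - X @ beta)) / 0.6745
    for _ in range(n_iter):
        w = bisquare_weights((y - X @ beta) / scale)
        sw = np.sqrt(w)
        # Weighted least squares: scale rows of X and y by sqrt(weight).
        new_beta = ols(X * sw[:, None], y * sw)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Synthetic data: true intercept 2.0, true slope 0.5, small noise.
rng = np.random.default_rng(0)
x = np.arange(20.0)
X = np.column_stack([np.ones_like(x), x])
y_clean = 2.0 + 0.5 * x + rng.normal(0.0, 0.1, size=x.size)

# Contaminate a single observation with a gross error.
y_bad = y_clean.copy()
y_bad[-1] += 100.0

beta_ols_clean = ols(X, y_clean)
beta_ols_bad = ols(X, y_bad)
beta_robust = m_estimate(X, y_bad)

print("OLS slope, clean data:        ", beta_ols_clean[1])
print("OLS slope, one gross outlier: ", beta_ols_bad[1])
print("Robust slope, same outlier:   ", beta_robust[1])
```

In this sketch the single contaminated point pulls the least squares slope far from its true value, while the reweighted fit assigns that point zero weight and stays close to the slope of the clean data.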
2. THE RESEARCH PROBLEM

In this study four different hedonic modelling approaches are compared empirically when urban land prices are modelled in the local market of Espoo, Finland. The fits are analysed and post-sample predictions are calculated across the different modelling schemes. The main research question is: "Which approach produces the most accurate post-sample predictions with the given vacant urban land price data?" Forecasting accuracy is perhaps the most important operational criterion, and it determines much of the utility of the corresponding hedonic model. Generally, for any valuation method to have a sufficient degree of validity it must produce an accurate prediction of the most probable market price of a land parcel.

The four hedonic approaches investigated in this paper are:
1) Ordinary least squares estimation.
2) Robust MM-estimation.
3) Structural time series estimation.
4) Robust local regression.

Ordinary least squares estimation and robust MM-estimation represent parametric hedonic methods, structural time series estimation is a semiparametric hedonic method and robust local regression is a nonparametric hedonic method. Post-sample predictions are analysed using six different predictive accuracy indicators: 1) mean prediction error, 2) mean absolute percentage error, 3) mean absolute error, 4) root mean squared error, 5) correlation coefficient and 6) gravity.

3. PREVIOUS RELATED RESEARCH

Most of the hedonic modelling studies in land markets have been based on ordinary least squares estimation, yet some nonparametric and semiparametric estimation techniques have also been applied. However, none of these hedonic studies has focused on the issue of land price prediction, which is the main focus of this paper.

Shimizu and Nishimura (2007) used ordinary least squares to estimate hedonic price equations of commercial and residential land prices in Tokyo for a 25-year period (from 1975 to 1999) and investigated possible structural changes in these price equations. They found that the price structure differed significantly among locations, reflecting differences in supplier pricing and end-user preferences. They also found significant structural changes in the underlying price structure, identifying pre-bubble, bubble and post-bubble periods.

Colwell and Munneke (2003) examined urban land prices within a nonparametric framework using piecewise parabolic regression, with specific interest in the land price gradient with respect to distance from the inner city. They concluded that piecewise parabolic regression is an amazingly flexible technique, which can be used to represent very complex land value functions.

Clapp et al. (2001) estimated a hedonic price index equation to determine the value of land under residential structures in Fairfax County, Virginia, at various points in time over the 1975 to 1992 time frame.
A set of three simultaneous equations explained land value together with changes in population density and the percentage working at home. The method of estimation was ordinary least squares. They stressed the importance of dealing with the double simultaneity issue and found that the land-value surface has changed dramatically over time.

Lin and Evans (2000) investigated the relationship between the price of land and the size of the plot when plots were small. They used land price data from the city of Taipei, Taiwan, and found that the price of land per unit of area increases with lot size.

Colwell and Munneke (1999) investigated the spatial dimension of the concavity in the relationship between total price and parcel size, using a dataset of sales of vacant residential, commercial and industrial land in Cook County, Illinois, during the period 1986 to 1993. They used ordinary least squares and found that concavity is higher in the rest of Cook County than in the CBD for all three land-use types.

Thorsnes and McMillen (1998) used a semiparametric estimator to analyse the relationship between land values and parcel size in the Portland, Oregon, metropolitan area. The value-size relationship was estimated nonparametrically and a simple log-linear parametric relationship was assumed for the rest of the model.
They found that ordinary least squares and semiparametric estimates imply similar results: there was a concave value-size relationship, meaning that subdivision costs cause large parcels to trade at a discount.

Colwell (1998) investigated a nonparametric method, piecewise parabolic regression analysis, for estimating spatial land price functions in the Chicago CBD. The independent variables were barycentric coordinates
Atack and Margo (1998) used ordinary least squares to investigate a simple monocentric urban model of the price of vacant land in Manhattan in the time frame of 1835 to 1900. They also found that vacant land in Manhattan was price elastic with respect to distance from the CBD in 1845 but became price inelastic in the post-Civil War period.

Colwell and Munneke (1997) studied the structure of urban land prices in Chicago using data from the sales of commercial, residential and industrial land during the period 1986 to 1992. The method of estimation was ordinary least squares (and multinomial logit estimation to control for possible sample selection bias). They found evidence that land prices are non-linear in nature and that land prices are concave in parcel size.

McMillen (1996) analysed locally weighted regression in modelling land prices in Chicago using two different data sets from 1836 to 1990. Two parametric models were estimated: a simple monocentric model and a more flexible spatial expansion model. These fits were compared to local linear regression estimates, which estimated the spatial expansion model locally. McMillen also demonstrated that local regression is useful both for prediction and for testing hypotheses in land markets. McMillen summarised: "Locally weighted regression is a useful tool for spatial modelling. Nonlinearity is handled directly and simply".

4. HEDONIC METHODS OF THE STUDY

4.1. Ordinary least squares estimation

Ordinary least squares estimation is by far the most applied hedonic method in practice. It is a parametric estimator in which the form of the hedonic function is specified before seeing the data. The only aspects that are determined from the data are the hedonic prices of the different attribute variables. The conventional hedonic regression approach based on ordinary least squares is, strictly speaking, an appropriate modelling context if the interest focuses solely on the cross-sectional variation of the hedonic prices and if the problem due to spatial heterogeneity can be addressed adequately.

When temporal aspects are analysed with the ordinary least squares estimator, several problems are encountered. According to Schwann (1998) the core problem in local markets is the lack of sufficient degrees of freedom, since estimation involves an extensive set of time-indexed dummy variables, at least one for each time period, along with the other regressors. Even if the locality of the markets imposes no dilemma, the major weakness of these methods remains: parameter values in one period do not affect the values of parameters in other periods (Francke and Vos, 2004).

Some nonlinear features can be accounted for by the ordinary least squares estimator, e.g. by using the double-log model specification. However, many nonlinear features are in any case omitted if the orthodox least squares estimator is applied. As a result, ordinary least squares estimation produces only a coarse description of the actual dependencies between the regressand and the regressors.
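A double-log hedonic regression of the kind mentioned above, together with a post-sample accuracy check, can be sketched as follows. This is a minimal illustration on synthetic data, not the model or the data of the study: the variables (`volume` for permitted building volume, `hpi` for a house price index), the coefficient values and the train/post-sample split are all assumptions made for the example, and the sign convention for the prediction error (actual minus predicted) is likewise assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120

# Hypothetical attribute data (assumed for illustration only).
volume = rng.uniform(100.0, 1000.0, n)   # permitted building volume
hpi = np.linspace(100.0, 160.0, n)       # house price index, rising over time

# Synthetic double-log price process with assumed elasticities 0.8 and 1.2.
log_price = 1.0 + 0.8 * np.log(volume) + 1.2 * np.log(hpi) + rng.normal(0.0, 0.1, n)
price = np.exp(log_price)

# Design matrix of the double-log specification: ln(price) on ln(attributes).
X = np.column_stack([np.ones(n), np.log(volume), np.log(hpi)])

# Estimate on the first 100 sales, predict the last 20 (post-sample).
train, post = slice(0, 100), slice(100, n)
beta = np.linalg.lstsq(X[train], log_price[train], rcond=None)[0]

# Back-transform to price level; this naive exp() ignores retransformation bias.
pred = np.exp(X[post] @ beta)
err = price[post] - pred

metrics = {
    "mean_prediction_error": err.mean(),
    "mean_absolute_error": np.abs(err).mean(),
    "mape_percent": 100.0 * np.abs(err / price[post]).mean(),
    "rmse": np.sqrt((err ** 2).mean()),
}

print("estimated elasticities:", beta[1:])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a double-log specification the slope coefficients are elasticities, so the estimated hedonic price of building volume reads directly as the percentage change in price for a one percent change in volume.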
Whether this approximation is satisfactory in practice depends largely on the predictive accuracy of the estimated hedonic model.

In the OLS model the time element is estimated using a house price index measure. The main idea is that the temporal variability can be reduced to the variability of that index. Time-indexed dummy variables were not used, on the one hand because of the high collinearity between the different indicators and, on the other hand, because a house price index variable tends to produce a better approximation to temporal movements than time indicators alone.

4.2. Robust MM-estimation

The aim of robust statistics is to investigate the behaviour of estimators when the basic modelling assumptions (linearity, normality, independence, etc.) are not exactly valid but are at most approximations to reality. To put it slightly differently, the basic aims of robust statistics are (Hampel et al., 1986, p. 11):
* To describe a structure best fitting the bulk of the data.
* To identify deviating data points (outliers) or deviating substructures for further treatment.
* To identify and give a warning about highly influential data points (leverage points).
* To deal with unspecified serial correlations or, more generally, with deviations from the assumed correlation structures.

In practice the approximate nature of hedonic models is largely the result of the occurrence of gross errors, the empirical character of the models and the only partial validity of the theoretical modelling assumptions. In general, the hedonic model can be considered robust if
* It is reasonably unbiased and efficient.
* Small deviations from the hedonic model assumptions will not substantially impair the performance of the hedonic model.
* Somewhat larger deviations will not invalidate the hedonic model completely.

In this study a very fault-tolerant and computationally intensive method, the three-stage MM-estimation, is analysed in the hedonic modelling of land prices. This estimator is parametric in nature, i.e. the model structure is fixed in advance. In the first phase, a regression estimate is calculated that is consistent and has a high breakdown point but is not necessarily efficient. In the second phase, the scale of the errors is estimated from the residuals of the first phase. In the third phase, the M-estimate of the hedonic prices is calculated. The breakdown point of the MM-estimator is the highest possible, i.e. nearly 50% of the data can be corrupted before the estimator provides useless results.

The computational algorithm used to derive the hedonic prices is a variant of iterative re-weighted least squares, which is applied in the M-estimation. An iterative solution is needed because the weights depend on the residuals, the residuals depend on the estimated hedonic prices and the hedonic prices depend on the weights. Let us assume that we have an initial estimate of the hedonic prices, $\hat{\beta}_0$, and its deviation measure $s_2$. Let us define the weights:

$$w_i(\beta) = \frac{\psi_1\left(r_i(\beta)/s\right)}{r_i(\beta)/s}, \qquad (1)$$

where $\psi_1$ is the double-weighted objective function, $r_i(\beta)$ are the residuals and $s$ is a measure of scale. Then let us define:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (2)

and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (3)

where $-g(\beta)$ is the gradient of the residual sum of squares. The iterative formula applied to derive the hedonic prices can now be written:

$$\beta_{j+1} = \beta_j + \frac{1}{2^k}\,\Delta(\beta_j), \qquad (4)$$

where $\Delta(\beta) = M^{-1}(\beta)\,g(\beta)$. The integer $k$ is chosen so that the left side of the inequality

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (5)

is minimised and $0 < \delta < 1$.

In the MM-method the time element is estimated using a house price index measure.

4.3. Structural time series models

The variation of observed land prices is a combination of cross-sectional and time-series variations (Schulz, 2003, p. 58). Besides the spatial characteristics, the selling date is an important attribute in explaining the evolution of market prices through the flux of time, which itself is not directly observable, i.e. time is a latent variable. What we can observe are the different states that occur in a predefined submarket and the changes that they cause in prices in that market area (Francke and Vos, 2004).

The time-series or temporal variation is a result of changing market conditions, which are driven by, among others, changes in consumers' preferences, investors' expectations and technological advantages. The temporal variation can be understood as representing that part of the price variation that is more or less common to all parcels of land in the same submarket (Schulz and Werwatz, 2004). An empirical model of land prices has to recognise these two different, yet closely related, sources of variation.

Given the special characteristics of land markets, one natural solution to the dual problem of hedonic modelling caused by spatio-temporal variation is to combine the flexibility of a time series model with the interpretation of a regression. This is the underlying rationale of the structural time series approach: the observations are directly made up of trend, cycle, seasonal and regression components plus error. In essence, structural time series models can be thought of as regression models in which the explanatory variables are functions of time and the parameters are time-varying (Harvey, 1997).

Structural time series methods can also be understood as semiparametric estimators that combine many of the benefits of parametric and nonparametric estimators: the temporal variability of land prices is estimated in a nonparametric fashion, which permits the effect of time to be linear, convex and concave in different regions, whereas the hedonic prices of the attribute variables are estimated in a parametric manner.
When considering the determination of hedonic prices in land markets and, specifically, the temporal dimension, there are several benefits in using the structural time series approach and the associated state space form as compared to the Box-Jenkins ARIMA methodology. These include (Harvey and Shephard, 1993; Harvey, 1997; Durbin and Koopman, 2002, p. 51-53):
* Structural analysis of the problem. The different components that make up the series, including the regression elements, are modelled explicitly, whereas the Box-Jenkins approach is a sort of "black box". A structural model provides not only forecasts of the series but also a set of stylised facts. A structural model can also be handled within a unified statistical framework that produces optimal estimates with well-defined properties.
* Management of nonstationarity. In a structural model nonstationarity can be handled conveniently by unobserved components without the need to difference any variables. By comparison, in the Box-Jenkins approach stationarity is assumed, and nonstationary components of the series are usually eliminated by differencing the variables, which results in a potential loss of valuable long-term information. Furthermore, the standard unobserved component models are simple, yet effective, leading to parsimonious representations for the systems.
* Generality. Multivariate observations can easily be handled with structural models, which cover as special cases a wide range of econometric models (including all ARIMA models). Explanatory variables can be introduced into the model structure and the associated regression coefficients (hedonic prices) can be permitted to vary stochastically over time if needed. Different kinds of intervention variables can be specified, and lagged values of the dependent as well as the explanatory variables can be incorporated into a model. Missing observations and varying dimensionality of observations are straightforward to deal with in structural models.

In this study the local level model, or the random walk plus noise model, is used to capture the underlying trend in the series. The local level model is the simplest, yet effective, structural trend model. It regards an observation on land price $p_t$ at time $t$ as being made up of an underlying level $\mu_t$ and an irregular disturbance $\varepsilon_t$ (Koopman et al., 1999; Durbin and Koopman, 2002, p. 44-45):

$$p_t = \mu_t + \varepsilon_t, \quad \{\varepsilon_t\} \sim \mathrm{NID}(0, \sigma^2_\varepsilon), \qquad \mu_t = \mu_{t-1} + \eta_t, \quad \{\eta_t\} \sim \mathrm{NID}(0, \sigma^2_\eta). \qquad (6)$$

The underlying level $\mu_t$ is not directly observable.
It is generated by a random walk, i.e. the level term in the current period is equal to the level term in the previous period plus a level disturbance term $\eta_t$. The effect of $\eta_t$ is to allow the level of the trend to shift up and down. It is generally assumed that the level and irregular disturbances are mutually independent and independent of $\mu_0$. The signal-to-noise ratio $q = \sigma^2_{\eta} / \sigma^2_{\epsilon}$ plays a vital role in determining how observations should be weighted for prediction and smoothing. Basically, the higher $q$ is, the greater is the discounting of past observations. The reduced form of the local level model is ARIMA(0,1,1) with certain restrictions on the parameter space.
Cycles are characteristic of many economic time series, as the economy goes from boom to recession and back again. These can be modelled in different ways, but in this study cycles are presented as a mixture of sine and cosine waves with two parameters $\theta_1$ and $\theta_2$.
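To make the role of the signal-to-noise ratio in the local level model (6) concrete, the following minimal Python sketch runs a scalar Kalman filter over a simulated price series. It is an illustration only, not the estimation procedure of this study: the function name `local_level_filter`, the simulated data and the large initial variance `s0` (a crude stand-in for a diffuse prior) are assumptions introduced here.

```python
import random


def local_level_filter(prices, sigma2_eps, sigma2_eta, mu0=0.0, s0=1e7):
    """Scalar Kalman filter for p_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t.

    s0 is a large initial variance (assumption: approximates a diffuse prior).
    Returns the filtered levels and the weight given to each new observation.
    """
    mu, s = mu0, s0
    levels, gains = [], []
    for p in prices:
        s_pred = s + sigma2_eta      # variance of the predicted level
        f = s_pred + sigma2_eps      # variance of the one-step prediction error
        k = s_pred / f               # weight on the newest observation
        mu = mu + k * (p - mu)       # update the level with the prediction error
        s = s_pred * (1.0 - k)       # variance of the updated level
        levels.append(mu)
        gains.append(k)
    return levels, gains


# Simulated series (hypothetical data): a slowly drifting level plus noise.
rng = random.Random(0)
level, prices = 0.0, []
for _ in range(200):
    level += rng.gauss(0.0, 0.1)
    prices.append(level + rng.gauss(0.0, 0.5))

# With sigma2_eps fixed at 1, sigma2_eta equals the signal-to-noise ratio q.
for q in (0.01, 1.0):
    _, gains = local_level_filter(prices, sigma2_eps=1.0, sigma2_eta=q)
    print(f"q = {q}: weight on newest observation = {gains[-1]:.3f}")
```

In this sketch the weight on the newest observation settles at roughly 0.10 for q = 0.01 and roughly 0.62 for q = 1, so a higher q discounts past observations faster, in line with the statement above.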
If $\psi_t$ is a cyclical function of time with frequency $\lambda_c$, measured in radians, then (Harvey and Shephard, 1993):
$\psi_t = \theta_1 \cos \lambda_c t + \theta_2 \sin \lambda_c t,$ (7)
where the period of the cycle is $2\pi/\lambda_c$, $\sqrt{\theta_1^2 + \theta_2^2}$ is the amplitude and $\tan^{-1}(\theta_2/\theta_1)$ is the phase. A stochastic cycle can be constructed recursively:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (8)
where $\kappa_t$ and $\kappa'_t$ are mutually uncorrelated with a common variance $\sigma^2_{\kappa}$, and $\rho \in [0,1]$ is a damping factor. Stationary models correspond to situations where $\rho$ is strictly less than one. A first-order autoregressive process is an important limiting case of a stochastic cycle, obtained when the frequency $\lambda_c$ is equal to 0 or $\pi$.
The calculations of unobserved components and hedonic prices are done by using the Kalman filtering and smoothing recursions. These can be expressed as:
$\hat{\alpha}_{t|t-1} = \Pi_t \hat{\alpha}_{t-1} + W_t \beta,$ (9)
$S_{t|t-1} = \Pi_t S_{t-1} \Pi'_t + R_t Q_t R'_t,$ (10)
$v_t = p_t - Z_t \hat{\alpha}_{t|t-1} - X_t \beta,$ (11)
$F_t = Z_t S_{t|t-1} Z'_t + H_t,$ (12)
$\hat{\alpha}_t = \hat{\alpha}_{t|t-1} + S_{t|t-1} Z'_t F_t^{-1} (p_t - Z_t \hat{\alpha}_{t|t-1} - X_t \beta),$ (13)
$S_t = S_{t|t-1} - S_{t|t-1} Z'_t F_t^{-1} Z_t S_{t|t-1},$ (14)
where $p_t$ is an $N \times 1$ vector of observed land prices at time $t$, $\alpha_t$ is an $m \times 1$ state vector
where of the prediction error. The basic Kalman filtering and smoothing recursions described in formulas (9)-(14) are supplemented in this research by a set of complementary vector and matrix recursions, because non-stationary components and fixed regression effects are present. This is called an augmented Kalman filter (Koopman et al., 1999; Durbin and Koopman, 2002, p. 115-120), which is described by the equations:
$V_t = -Z_t A_{t|t-1} - X_t \beta,$ (15)
$A_{t+1|t} = \Pi_t A_{t|t-1} + W_t B + K_t V_t,$ (16)
$(m_t, M_t) = (m_{t-1}, M_{t-1}) + V'_t F_t^{-1} (v_t, V_t),$ (17)
with $A_{1|0} = W_0 B$, where $B = (B_x, B_i)$ is a square selection matrix of zeros and ones and the subscripts $x$ and $i$ relate to regression and initial effects, respectively. The number of columns of $V_t$ and $A_{t+1|t}$ is the same as in the matrix $B$. $K_t = \Pi_t S_{t|t-1} Z'_t F_t^{-1}$ is the so-called Kalman gain. The one-step ahead prediction of the state vector and the associated mean square error matrix are now given by:
$\hat{\alpha}^*_{t|t-1} = \hat{\alpha}_{t|t-1} + A_{t|t-1} M^{-1}_{t-1} m_{t-1},$ (18)
$S^*_{t|t-1} = S_{t|t-1} + A_{t|t-1} M^{-1}_{t-1} A'_{t|t-1}.$ (19)
The one-step ahead prediction errors and the associated mean square error matrix are given by:
$v^*_t = v_t + V_t M^{-1}_{t-1} m_{t-1},$ (20)
$F^*_t = F_t + V_t M^{-1}_{t-1} V'_t.$ (21)
The matrix inversions for $M_t$ can be evaluated in a manner similar to recursive regressions (de Jong
4.4. Robust local regression (4)
Much of the aim of applied hedonic analysis is to produce a reasonable approximation to the generally unknown mean response function. The primary implication of the theoretical literature concerning hedonic prices in real estate markets is that hedonic relationships are expected to be highly nonlinear, because locational uniqueness induces spatial heterogeneity of regression surfaces (Wallace, 1996; McMillen and Thorsnes, 2003) that cannot, in general, be specified a priori (Anglin and Gencay, 1996). Nonlinearity indicates locally changing degrees of curvature in the hedonic function with non-constant characteristic values. Nonlinearity is a fundamental feature that characterises processes in real estate markets, and it poses serious threats to the empirical validity of the hedonic models that are predominantly used in current practice. The complex question of validity underlying hedonic model specification can be divided into three subproblems that involve determining (Pace, 1993 and 1995; Wallace, 1996): (1) the relevant set of response and attribute variables; (2) the appropriate functional form between these variables; (3) the adequate error distribution for inference. Economic theory and past experience usually provide useful a priori information on which variables should enter the model structure, which substantially reduces the threat of omitted variable bias. Phenomena in real estate markets are, however, strongly dependent on the particular submarket, time period and property type and, as a consequence, the selection of a proper set of dependent and conditioning variables is partially an empirical question, too.
Economic theory or previous experience rarely provides any specific, valid guidance on the choice of an appropriate functional form of the hedonic model (Pace, 1993 and 1995; Anglin and Gencay, 1996; Gencay and Yang, 1996). A prespecified functional form is, however, the fundamental assumption underlying the use of theory-laden parametric models; a poor choice imposes artificial structure on the data and significantly invalidates the results of the subsequent analysis (5). In contrast, nonparametric techniques are data-driven, flexible approaches that can learn much of the genuine structure from the available facts and, therefore, greatly reduce the attention that must be paid to the question of which functional form ought to be used. Local regression techniques can significantly reduce the mis-specification error by letting the data determine the appropriate functional relationship between the response and a set of attributes. Locally weighted regression adapts locally to changing curvature in the hedonic surface by giving more weight to nearby observations (McMillen, 1996) and, therefore, can account for complex nonlinear patterns. The local adaptation property, which is achieved by parametric localization (Cleveland and Loader, 1996), makes it a highly attractive tool for estimating spatially non-homogeneous hedonic functions. Furthermore, any specific assumption underlying the error distribution can be relaxed and, in most cases, the distribution can be derived directly from the evidence, e.g. by resampling techniques. Data on land prices are imperfect, which generates difficult problems for conventional parametric approaches.
In particular, extreme points, influential observations and outliers, which might represent erroneous data or otherwise reflect unusual market conditions such as non-arm's length transactions, can seriously undermine the performance of a parametric estimator. The results of locally weighted regression can be robustified in a straightforward manner by a scheme that is a variant of M-estimation. This simple robustification typically offers enough protection against unusual or aberrant observations. The local regression problem can be formalized by using locally weighted least squares (e.g. Ruppert and Wand, 1994; Loader, 2004):
Minimize $\sum_{i=1}^{n} W_H(x_i - x) \, (p_i - \langle \theta, F(x_i - x) \rangle)^2,$ (22)
where $\theta$ is the $d + 1$ vector of unknown coefficients and $F(x)$ is a vector of basis polynomials. $W_H$ is a multivariate weight function and $H^{1/2}$ is a bandwidth matrix. The local least squares estimate of the unknown regression function $f(x)$ is then (6):
$\hat{f}(x) = e'_1 (X'WX)^{-1} X'Wp.$ (23)
For local cubic regression $e_1$ is a $\{1 + d + \frac{1}{2} d(d + 1) + \frac{1}{6} d(d + 1)(d + 2)\} \times 1$ vector having 1 in the first entry and all other entries 0. For the local quadratic and linear models the dimension of $e_1$ is, respectively, $\{1 + d + \frac{1}{2} d(d + 1)\} \times 1$ and $\{1 + d\} \times 1$.
$p = [p_1, ..., p_n]'$ is a vector of observed land prices and the data matrix $X$ for the local cubic model is:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], (24)
where $x_{(i)} = (x_i - x)$ and the vec-operator stacks the columns of $(x_i - x)(x_i - x)'$, each below the previous, but with the entries above the main diagonal omitted; $\otimes$ is the Kronecker (tensor) product. The local linear and quadratic models use only the first two and three columns, respectively, of the data matrix. The weight matrix is composed of $W = diag\{W_H(x_1 - x), ..., W_H(x_n - x)\} \equiv diag\{w_1(x), ..., w_n(x)\}$. The computational method for estimating the fit at the points of evaluation is based on a damped Newton-Raphson algorithm (Loader, 1999, p. 209-211; Loader, 2004):
$\hat{\theta}_{k+1} = \hat{\theta}_k + \frac{1}{2^j} J^{-1} X'Wq.$ (25)
$\hat{\theta}_{k+1}$ is the estimate of the parameter vector $\theta$ at iteration $k + 1$. Here $j$ is selected to be the smallest non-negative integer that results in an increase of the local log-likelihood at every step, and $q$ is the usual score function.
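The locally weighted least squares estimate in (23) and its robustification can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure applied in this study: it uses a single attribute (d = 1), a local linear fit, a tricube kernel with a scalar bandwidth `h`, and Cleveland-style bisquare robustness weights as one possible variant of M-estimation. All function names and the toy data are hypothetical.

```python
import statistics


def tricube(u):
    """Tricube kernel: positive for |u| < 1 and zero elsewhere."""
    a = abs(u)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0


def bisquare(u):
    """Bisquare robustness weight: down-weights large scaled residuals."""
    a = abs(u)
    return (1.0 - a ** 2) ** 2 if a < 1.0 else 0.0


def local_linear(x0, xs, ps, h, robust_w=None):
    """First element of (X'WX)^{-1} X'Wp for a local linear fit at x0 (d = 1)."""
    if robust_w is None:
        robust_w = [1.0] * len(xs)
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, p, r in zip(xs, ps, robust_w):
        u = x - x0
        w = r * tricube(u / h)       # kernel weight times robustness weight
        s0 += w
        s1 += w * u
        s2 += w * u * u
        t0 += w * p
        t1 += w * u * p
    # Closed-form intercept of the weighted regression of p on (1, x - x0).
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def robust_local_linear(xs, ps, h, iterations=3):
    """Fit at every data point, then down-weight large residuals and refit."""
    robust_w = [1.0] * len(xs)
    fits = []
    for _ in range(iterations + 1):
        fits = [local_linear(x, xs, ps, h, robust_w) for x in xs]
        resid = [p - f for p, f in zip(ps, fits)]
        scale = 6.0 * statistics.median(abs(e) for e in resid)
        if scale < 1e-12:            # residuals are numerically zero already
            break
        robust_w = [bisquare(e / scale) for e in resid]
    return fits


# Toy data (hypothetical): prices on an exact line with one aberrant sale.
xs = [i / 20 for i in range(21)]
ps = [2.0 + 3.0 * x for x in xs]
ps[10] += 5.0                        # outlier at x = 0.5, true value 3.5

plain = local_linear(0.5, xs, ps, h=0.3)
robust = robust_local_linear(xs, ps, h=0.3)[10]
print(f"plain fit at 0.5: {plain:.3f}, robust fit at 0.5: {robust:.3f}")
```

In this toy example the plain local fit at x = 0.5 is pulled upwards by the aberrant observation, whereas the robust fit returns to the underlying line because the outlier receives a zero robustness weight after the first iteration.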
Furthermore, the Jacobian matrix can be expressed as $J = D^{1/2} P' \Sigma P D^{1/2}$, where $D$ is the matrix of diagonal elements of $J = X'WVX$, with $P$ and $\Sigma$ representing, respectively, the eigenvectors and eigenvalues of $D^{-1/2} J D^{-1/2}$. $V$ is the diagonal observed information matrix. Since the sample sizes and the dimensions of the attribute spaces are small in this study, direct evaluation of the fit is potentially feasible (although not applied) using the recursion of (25), which would mean that a separate weighted least squares regression is estimated for each data point, with more influence given to nearby observations. Direct evaluation is, however, computationally infeasible for larger data sets (7) and for increased dimensions of the attribute space; a general algorithm is needed to select the evaluation points at which the fit is subsequently estimated. Tree-based structures are popular, in particular the k-d trees of Friedman, Bentley and Finkel (1977) or growing adaptive trees (see, inter alia, Loader, 1999, p. 212-218). Different interpolation schemes (e.g. blending functions) can be used to define the fit elsewhere. The time element is estimated by the local regression using the house price index measure.
5. SAMPLE DATA
The sample data of this study involve observations on urban residential land prices and the associated characteristics in the municipality of Espoo, a highly polycentric city (8) that lies inside the Helsinki metropolitan area. With circa 225 000 inhabitants, its population is the second largest of the cities in Finland, and it has experienced rapid growth in its recent history. The study period is from January 1985 to December 2007, with a total of 3149 observations that constitute a judgement sample and cover phases of upward and rapid downward movements of land prices. In that period the Finnish economy experienced a great depression, which also had a major influence on land prices. The observations from the last year (a total of 78) are held back for post-sample predictive testing; this choice is somewhat arbitrary and mainly dictated by practical valuation concerns. Table 1 documents some standard sample statistics for the study variables. The total sales price is chosen as the dependent variable (instead of the unit price) after some empirical experimentation: the goodness-of-fit statistics are much better when the total sales price is explained by the attribute variables. The unit of total sales price is the euro. Permitted building volume and parcel size are expressed in square meters. Northing and easting represent co-ordinates in the Finnish KKJ system. The house price index variable is a quality-adjusted measure of house prices in the Helsinki metropolitan area and it is unitless. There are also seven indicator variables in the data set that deserve mention.
The presence of a shore indicator receives a value of one if the land parcel borders a water system and zero otherwise. The presence of a shore NA indicator receives a value of one if it is not known whether the land parcel borders a water system and zero otherwise. There were 327 observations for which it was not known whether the parcel borders a water system. The housing block indicator receives a value of one if the intended use of the land in the detailed plan is a multistorey apartment block and zero otherwise (9). The single-family house indicator receives a value of one if the intended use of the land in the detailed plan is for single-family houses and zero otherwise. The row house indicator receives a value of one if the intended use of the land in the detailed plan is for row houses and zero otherwise. The municipality indicator receives a value of one if the buyer of the land is a municipality and zero otherwise. Finally, the private person indicator receives a value of one if the buyer of the land is a private person and zero otherwise.
6. RESULTS OF HEDONIC MODEL ESTIMATION
6.1. Ordinary least squares estimation
Table 2 summarises the results of the ordinary least squares estimation, in which a double-log model specification is used (i.e. all quantitative variables are expressed in logarithms). The standard goodness-of-fit statistics, the coefficient of determination and the standard error of regression, indicate that the fit is quite good. In particular, the coefficient of determination is over 0.70, which is a commonly used target in land valuation.
Furthermore, the standard error of regression is below 0.40, indicating that the internal precision is acceptable. Statistically, the four most significant attribute variables are, respectively: house price index (t-value is 55.53), permitted building volume (t-value is 42.01), easting (t-value is 27.53) and northing (t-value is -23.89). Furthermore, there are six statistically significant indicator variables and the parcel size variable in the hedonic model. However, these remaining seven attribute variables explain much less of the observed variability in land prices than the four most significant attribute variables. All explanatory variables are plausible in sign and magnitude. Overall, 74 outliers were dropped from the final hedonic model (10).
6.2. Robust MM-estimation
Table 3 summarises the results of the robust MM-estimation, in which a double-log model specification is used. The standard error of regression statistic indicates a slightly better fit than in the case of ordinary least squares (11). All explanatory variables are plausible in sign and magnitude. The standard error of regression statistic is below 0.40, indicating that the internal precision is acceptable. Statistically, the four most significant attribute variables are, respectively: easting (t-value is now increased to 143.62), house price index (t-value is 56.26), permitted building volume (t-value is 46.19) and northing (t-value is -32.89). Furthermore, there are six statistically significant indicator variables and the parcel size variable in the hedonic model. However, these remaining seven attribute variables explain much less of the observed variability in land prices than the four most significant attribute variables. No outliers were dropped from this hedonic model. Instead, the influence of aberrant observations was down-weighted by using a specific weight function.
6.3. Structural time series models
Table 4 summarises the results of the structural time series estimation, in which a double-log model specification is used for the regression effects. The standard goodness-of-fit statistics, the coefficient of determination and the standard error of regression, indicate that the fit is good. In particular, the coefficient of determination is 0.90 and the standard error of regression is below 0.34, indicating that the internal precision is acceptable (12). Statistically, the three most significant attribute variables are, respectively: permitted building volume (t-value is 36.96), easting (t-value is 28.07) and northing (t-value is -24.79). The statistical significance of the house price index variable is now considerably reduced (the t-value is only 6.61). The reason for this is that the unobserved components already reveal much of the same information as the house price index variable. Furthermore, there are six statistically significant indicator variables and the parcel size variable in the hedonic model. However, these remaining seven attribute variables explain much less of the observed variability in land prices than the three most significant attribute variables. All explanatory variables are plausible in sign and magnitude. The structural time series model also uses unobserved components to account for the temporal variability in the dependent variable. In Table 4 there are three different unobserved components in the model structure: the level term (which is the dynamic version of the constant variable), one cycle term (with two components) and a first-order autoregressive (AR(1)) process. The data analysed contained many outlying observations in terms of an unusually high value of the standardised residual (13).
Instead of removing an outlier, its effect was statistically measured by an impulse intervention variable, and the influence was subsequently included as part of the overall model specification, resulting in no loss of price information. In the final hedonic model there are 73 impulse intervention variables.
6.4. Robust local regression
Table 5 summarises the results of the robust local regression, in which a local double-log model specification is used. To avoid the curse of dimensionality, only the six most significant variables from the ordinary least squares estimation were included in the final local regression model (14). The overall in-sample fit is better than in the former approaches. The coefficient of determination is 0.93 and the standard error of regression is 0.29. Because local regression is a nonparametric method, there are no coefficient estimates that can be reported (15). No outliers were dropped from this hedonic model. Instead, the influence of aberrant observations was down-weighted by the use of M-estimation.
7. MEASURES OF PREDICTIVE ACCURACY
Predictive accuracy is perhaps the single most important operational criterion in evaluating the performance of a chosen hedonic model. The success of a hedonic model-based forecast depends on (see Hendry, 1997): (1) the existence of structure; (2) whether such structure is informative about the future; (3) the proposed method capturing the structure; (4) the exclusion of irregularities that swamp the structure. The aspects in (1)-(2) are characteristics of the economic system and the last two of the chosen forecasting method.
When structure is understood as a systematic relation between the entity to be forecast and the available information, the conditions in (1)-(4) are sufficient for forecastability. There are numerous indicators for the post-sample predictive assessment of hedonic models (e.g. Case et al., 2004), and the relative ranking of the various models varies according to the accuracy measure applied. The mean prediction error is evaluated in this study by the arithmetic average of the prediction errors, which measures the predictive unbiasedness of the hedonic model. Two measures of the strength of association between predictions and observed out-of-sample land prices are reported. First, the usual correlation coefficient is calculated, which is a useful measure of statistical relation in the case of normally distributed variables and when the focus is on the co-variation of variables. The major problem of using the classical correlation measure in land valuation studies lies in its strong dependency on the normality assumption, which is typically violated by aberrant error terms. Their effect is squared in the denominator, which, in turn, tends to lead to highly similar standard deviations between different model alternatives. Secondly, the gravity measure (see McMillen, 2001) is reported, which is not strongly dependent on any particular distributional assumptions. Generally, gravity seems to be a viable measure of the strength of association (16). Root mean squared error (RMSE) is the most commonly used measure of the success of numeric prediction; it captures the reliability or variability of predictions. This statistic is very sensitive to outlying observations and tends to exaggerate the variance of prediction errors for those models whose prediction errors are larger than the others (which is typical in land price studies).
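The standard accuracy measures of this section can be computed as in the following Python sketch, which covers the mean prediction error, RMSE, MAE, MAPE and the correlation coefficient. It is an illustration under assumptions: the function name `accuracy_measures` and the toy numbers are hypothetical, the prediction error is taken here as prediction minus observation (the sign convention is not stated in the text), and the gravity measure is left out because its formula is not given in this section.

```python
import math


def accuracy_measures(actual, predicted):
    """Post-sample accuracy statistics for paired observations and predictions."""
    n = len(actual)
    # Assumed sign convention: error = prediction - observation.
    errors = [f - a for a, f in zip(actual, predicted)]

    mean_error = sum(errors) / n                        # predictive bias
    rmse = math.sqrt(sum(e * e for e in errors) / n)    # sensitive to outliers
    mae = sum(abs(e) for e in errors) / n               # more robust than RMSE
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n

    # Pearson correlation between observations and predictions.
    mean_a = sum(actual) / n
    mean_f = sum(predicted) / n
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_f = sum((f - mean_f) ** 2 for f in predicted)
    corr = cov / math.sqrt(var_a * var_f)

    return {"mean_error": mean_error, "rmse": rmse, "mae": mae,
            "mape": mape, "correlation": corr}


# Toy numbers (hypothetical), e.g. log prices held back for testing.
actual = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 13.0]
for name, value in accuracy_measures(actual, predicted).items():
    print(f"{name}: {value:.4f}")
```

In the toy example the errors cancel, so the mean prediction error is zero although RMSE and MAE are not; this is why unbiasedness and variability are reported as separate criteria.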
Mean absolute error (MAE) is generally a more appropriate indicator of predictive variability, and is especially suitable in cases of outlying prediction errors. A widely used measure of predictive variability is the mean absolute percentage error (MAPE) (see e.g. Makridakis and Hibon, 2000), which, however, has some problems of asymmetry and instability when the data are small.
8. FORECASTING ACCURACY OF DIFFERENT HEDONIC APPROACHES
Table 6 summarises the post-sample prediction statistics for the four different approaches (ordinary least squares, MM-estimation, structural time series and local regression). Six different measures of predictive accuracy are reported. First of all, the mean prediction error, which measures the predictive unbiasedness, is significantly reduced when MM-estimation, structural time series or local regression is used instead of the orthodox ordinary least squares. The mean prediction error is 81% smaller when MM-estimation is used, 64% smaller when structural time series is used and 59% smaller when local regression is used, instead of ordinary least squares. It therefore seems that predictive validity can be significantly improved when the unorthodox methods (MM-estimation, structural time series, local regression) are applied. The mean prediction error is smallest when MM-estimation is used. MAPE is a useful predictive measure, and in practice it is usually the measure of primary interest.
Here all approaches produce an error that is only slightly over 2%. The unorthodox approaches produce a smaller MAPE than the orthodox approach: structural time series gives a 7% smaller MAPE, MM-estimation a 5.5% smaller MAPE and local regression a 4.3% smaller MAPE than ordinary least squares. MAPE is smallest when structural time series is applied. The unorthodox approaches all give the same RMSE of 0.18, whereas ordinary least squares produces an RMSE of 0.20. This means that the unorthodox approaches produce an RMSE that is 10% smaller than the one obtained by ordinary least squares. MAE is a robust version of RMSE and it is usually a more reliable indicator than RMSE. Again, MAE is highest when ordinary least squares is used: structural time series generates a 9.7% smaller MAE, and MM-estimation and local regression produce a 6.5% smaller MAE, when compared to ordinary least squares. Correlation coefficients are very similar between the approaches; the only exception is the value for structural time series, which is 1% higher than in the other cases. When gravity is used there are more differences between the approaches: the highest association is obtained when structural time series is used and the lowest when ordinary least squares is used. Specifically, structural time series produces 9.5% higher gravity, MM-estimation 8.2% higher gravity and local regression 6.5% higher gravity, when compared to ordinary least squares estimation.
9. CONCLUSIONS
This paper has investigated the structure of urban residential land prices and, specifically, the predictive accuracy of four different hedonic approaches when land prices are predicted in a local market.
The hedonic approaches applied in this study are: 1) ordinary least squares estimation, 2) robust MM-estimation, 3) structural time series estimation and 4) robust local regression. Ordinary least squares and robust MM-estimation are parametric methods, structural time series estimation is a semi-parametric method and robust local regression is a nonparametric method. Post-sample predictive assessment indicated that more precise predictions are obtained if the unorthodox methods of this study are used instead of the conventional least squares estimation. In particular, the predictive unbiasedness can be significantly improved by moving from orthodox least squares estimation to robust MM-estimation, structural time series estimation or robust local regression. All six forecasting indicators were better (or at least equal) in the case of the non-standard hedonic methods. Among the four hedonic approaches, the best overall post-sample predictions are produced by the structural time series estimation. The hedonic estimation revealed that there are four separate attribute variables that have an overriding effect on land prices. These independent variables are: permitted building volume, house price index, northing and easting. The influence of the parcel size variable and the different indicator variables on land prices was much weaker.
Received 23 May 2008; accepted 9 July 2008
REFERENCES
Anglin, P. M. and Gencay, R. (1996) Semiparametric estimation of a hedonic price function, Journal of Applied Econometrics, 11(6), pp. 633-648.
Atack, J. and Margo, R. A. (1998) Location, location, location! The price gradient for vacant urban land: New York, 1835 to 1900, Journal of Real Estate Finance and Economics, 16(2), pp. 151-172.
Breiman, L. (2001) Statistical modeling: The two cultures, Statistical Science, 16(3), pp. 199-231.
Case, B., Clapp, J., Dubin, R. and Rodriguez, M. (2004) Modelling spatial and temporal house price patterns: A comparison of four models, Journal of Real Estate Finance and Economics, 29(2), pp. 167-191.
Clapp, J. M., Rodriguez, M. and Pace, R. K. (2001) Residential land values and the decentralization of jobs, Journal of Real Estate Finance and Economics, 22(1), pp. 43-61.
Cleveland, W. S. and Loader, C. R. (1996) Smoothing by local regression: Principles and methods. In: Härdle, W. and Schimek, M. G. (eds.) Statistical Theory and Computational Aspects of Smoothing, Physica-Verlag.
Colwell, P. F. (1998) A primer on piecewise parabolic multiple regression analysis via estimations of Chicago CBD land prices, Journal of Real Estate Finance and Economics, 17(1), pp. 87-97.
Colwell, P. F. and Munneke, H. J. (1997) The structure of urban land prices, Journal of Urban Economics, 41(3), pp. 321-336.
Colwell, P. F. and Munneke, H. J. (1999) Land prices and land assembly in the CBD, Journal of Real Estate Finance and Economics, 18(2), pp. 163-180.
Colwell, P. F. and Munneke, H. J. (2003) Estimating a price surface for vacant land in urban area, Land Economics, 79(1), pp. 15-28.
Durbin, J. and Koopman, S. J. (2002) Time Series Analysis by State Space Methods.
Oxford Statistical Science Series #24, Oxford University Press. Francke, M. K. and Vos, G. A. (2004) The hierarchical trend model for property valuation and local price indices, Journal of Real Estate Finance and Economics, 28(2/3), pp. 179-208. Friedman, J. H., Bentley, J. L. and, Finkel, R. A. (1977) An algorithm for finding best matches in logarithmic logarithmic pertaining to logarithm. logarithmic relationship when the logs of two variables plotted against each other create a straight line. expected time, ACM (Association for Computing Machinery, New York, www.acm.org) A membership organization founded in 1947 dedicated to advancing the arts and sciences of information processing. In addition to awards and publications, ACM also maintains special interest groups (SIGs) in the computer field. Transactions on Mathematical Software, 3(3), pp. 209-226. Gencay, R. and Yang, X. (1996) A forecast comparison of residential housing prices by parametric versus semiparametric conditional mean estimators, Economic Letters, 52(2), pp. 129-135. Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986) Robust Statistics: The Approach Based on Influence Functions, Wiley Series in Probability and Mathematical Statistics. Hannonen, M. (2005) On the recursive estimation of hedonic prices of land, Nordic Journal of Surveying and Real Estate Research, 2(2), pp. 30-56. Harvey, A. C. (1997) Trends, cycles and auto-regressions, Economic Journal, 107 (January), pp. 192-201. Harvey, A. C. and Shephard, N. (1993) Structural Time Series Models. In: Maddala, G. S., Rao, C. R. and Vinod, H. D. (eds.) Handbook of Statistics, Vol. 11, Elsevier Science Publishers B. V. Hendry, D.F. (1997) The econometrics of macroeconomic mac·ro·ec·o·nom·ics n. (used with a sing. verb) The study of the overall aspects and workings of a national economy, such as income, output, and the interrelationship among diverse economic sectors. forecasting, Economic Journal, 107 (September), pp. 
1330-1357. de Jong, P. (1991) The diffusive dif·fu·sive adj. Characterized by diffusion. dif·fu sive·ly adv.dif·fu Kalman filter, Annals of Statistics, 19(2), pp. 1073-1083. Koopman, S. J., Harvey, A. C., Doornik, J. A. and Shephard, N. (1999) Structural Time Series Analysis, Modelling and Prediction using Stamp, Timberlake Consultants, London. Lin, T. and Evans, A. W. (2000) The relationship between the price of land and size of plot when plots are small, Land Economics, 76(3), pp. 386-394. Loader, C. (1999) Local Regression and Likelihood. Springer Series in Statistics and Computing, Springer-Verlag. Loader, C. (2004) Smoothing: Local Regression Techniques, Handbook of Computational Statistics, (eds.) Gentle, J., Hdrdle, W. and Mori, Y., Springer-Verlag. McMillen, D. P. (1996) One hundred fifty years of land values in Chicago: A nonparametric approach, Journal of Urban Economics, 40(1), pp. 100-124. McMillen, D. P. (2001) Nonparametric employment subcenter sub·cen·ter n. A secondary center, especially a commercial or shopping area located away from the main business sector of a city. sub·cen identification, Journal of Urban Economics, 50(3), pp. 448-473. McMillen, D. P. and Thorsnes, P. (2003) The aroma of Tahoma: Time-varying average derivates and the effect of a superfund site on house prices, Journal of Business and Economics Statistics, 21(2), pp. 237-246. Pace, R. K. (1993) Nonparametric methods with applications to hedonic models, Journal of Real Estate Finance and Economics, 7(3), pp. 185-204. Pace, R. K. (1995) Parametric, semiparametric, and nonparametric estimation of characteristic values within mass assessment and hedonic pricing models, Journal of Real Estate Finance and Economics, 11(3), pp. 195-217. Rousseeuw, P. J. and Yohai, V. J. (1984) Robust Regression Please [ improve this article] by rewriting this article or section in an . by Means of S-estimators. In: Franke J., Hardle, W. and Martin D. (eds.) 
Robust and Nonlinear Time Series, Lecture Notes in Statistics, Springer-Verlag, 26, pp. 256-272. Ruppert, D. and Wand, M. P (1994) Multivariate Locally Weighted Least Squares Regression, Annals of Statistics, 22(3), pp. 1346-1370. Shimizu, C. and Nishimura, G. N. (2007) Pricing structure in Tokyo metropolitan land markets and its structural changes: Pre-bubble, bubble, post-bubble, Journal of Real Estate Finance and Economics, 35(4), pp. 475-496. Schulz, M.A.R. (2003) Valuation of Properties and Economic Models of Real Estate Markets. Dissertation, Berlin: Humboldt University Berlin. Schulz, R. and Werwatz, A. (2004) A state space model for Berlin house prices: Estimation and economic interpretation, Journal of Real Estate Finance and Economics, 28(1), pp. 37-57. Schwann, G. M. (1998) A real estate price index for thin markets, Journal of Real Estate Finance and Economics, 16(3), pp. 269-287. Thorsnes, P. and McMillen, D. P. (1998) Land value and parcel size: A semiparametric analysis, Journal of Real Estate Finance and Economics, 17(3), pp. 233-244. Wallace, N. E. (1996) Hedonic-based price indexes for housing: Theory, estimation and index construction, FRBSF FRBSF Federal Reserve Bank of San Francisco (California) Economic Review, No. 3, pp. 34-48. Marko HANNONEN Institute of Real Estate Studies, Department of Surveying, Helsinki University of Technology TKK redirects here. For other uses, see TKK (disambiguation). Helsinki University of Technology is not to be confused with University of Helsinki. Helsinki University of Technology (TKK) (Finnish: Teknillinen korkeakoulu; Swedish: Tekniska högskolan , Espoo, P.O. Box 1200, FIN-02015 HUT, Finland E-mail: marko.hannonen@pp.inet.fi; Tel: +358 05 596 6065; Telefax. +358 9 465 077 Notes (1) This section reviews the hedonic price studies in land markets, which are presented in major scientific journals since 1995. Major findings, the data and the modelling methods are documented. 
(2) In the study these are obtained by using S-estimation (Rousseeuw and Yohai, 1984).
(3) Regression coefficients can be time-varying, but this is the representation used in the empirical section of the study. Here p and k denote the number of quantitative and qualitative explanatory variables, respectively.
(4) Local regression means locally weighted regression, in which local polynomial functions are used in estimating the regression surface.
(5) In the parametric modelling context, a common solution to the problem of selecting an appropriate functional form is to consider a set of parametric functions with the objective of finding a model structure that matches the evidence in most measurable respects. However, there is no clear evidence that this practice will be successful in avoiding functional form mis-specification (Anglin and Gencay, 1996; Hannonen, 2005). Specification searches can be highly time-consuming and the intrinsic power of these specification tests is somewhat questionable.
(6) Assuming, as usual, that X'WX is non-singular.
(7) Visualisation of regression surfaces, variance functions etc. also demands that a separate, reduced number of fitting points is selected.
(8) Because of this polycentric nature, numerous distance measures to various subcenters are needed. From the hedonic modelling viewpoint this creates problems of multicollinearity when several distance measures are used. As a solution, no distance measures are used; the co-ordinates describing location are used instead.
(9) The intended use of all sites in this study is housing, so there are no non-residential types of land use.
(10) Outliers are considered here to be those observations whose standardised residual is larger than 3.5. This is a typical value in Finnish practice when hedonic-based land analysis is conducted.
(11) The coefficient of determination statistic cannot be calculated in the standard manner and thus it is not reported in the case of MM-estimation.
(12) In fact, the standard error of regression statistic is now the same as in the case of MM-estimation.
(13) Intervention variables are used for those observations whose standardised residual is larger than 3.5.
(14) In other words, only those attribute variables whose t-values in absolute terms were higher than five in the ordinary least squares estimation with the global double-log specification were included in the local regression.
(15) To be more specific, nonparametric estimators are, in fact, "over-parametric" in the sense that they generate an infinite number of hedonic prices for each attribute, depending on the values of that attribute. In practice, a single representative hedonic price for a particular characteristic is often of direct interest and, consequently, some kind of average derivative is usually needed. However, there are problems in average derivative estimation, so in this study no average hedonic prices are calculated.
(16) Here gravity is calculated so that the weighted (by the area) inner product of predictions and realisations is divided by the L2-distance between predictions and realisations.
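The accuracy measures used in the post-sample comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: MAPE, RMSE and MAE follow their usual textbook definitions, the mean prediction error is taken as realisation minus prediction (the sign convention is not stated here), and gravity follows the wording of note (16) with an unweighted Euclidean distance in the denominator. The function names and the three sample observations are hypothetical.

```python
import math


def mean_prediction_error(actual, predicted):
    # Assumed sign convention: realisation minus prediction.
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    # Mean absolute percentage error, expressed in percent.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    # Root mean squared error.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    # Mean absolute error.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def gravity(actual, predicted, area):
    # Note (16): area-weighted inner product of predictions and realisations,
    # divided by the L2-distance between them (assumed unweighted here).
    inner = sum(w * a * p for a, p, w in zip(actual, predicted, area))
    distance = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)))
    if distance == 0.0:
        raise ValueError("gravity is undefined when predictions equal realisations")
    return inner / distance


# Hypothetical log prices and parcel areas, for illustration only.
actual = [12.0, 11.5, 13.0]
predicted = [12.2, 11.4, 12.7]
area = [500.0, 800.0, 1200.0]

print(mae(actual, predicted), rmse(actual, predicted), gravity(actual, predicted, area))
```

Unlike the error measures, gravity grows as predictions get closer to realisations, which is why a higher value indicates a better fit in Table 6.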
Table 1. Sample statistics of study variables

Variable                          Arithmetic mean   Minimum       Maximum       Std. deviation
Total sales price                 191738.49         631.00        19450000.00   647188.81
Permitted building volume         736.23            30.00         54322.40      1750.24
Parcel size                       2286.93           300.00        113000.00     4800.9
Northing                          6677724.65        6668110.00    6781565.00    5954.26
Easting                           2540238.28        2528820.00    3475399.47    17009.52
House price index                 204.2             116.40        350.10        57.78
Presence of a shore indicator     NA                0             1             NA
Presence of a shore NA indicator  NA                0             1             NA
Housing block indicator           NA                0             1             NA
Single-family house indicator     NA                0             1             NA
Row house indicator               NA                0             1             NA
Municipality indicator            NA                0             1             NA
Private person indicator          NA                0             1             NA
Table 2. Fit and hedonic prices from the ordinary least squares regression

Variable                          Coefficient   Standard error   t-value   p-value
Constant                          1658.300      152.110          10.90     0.0000
Permitted building volume         0.766         0.0182           42.01     0.0000
Parcel size                       0.115         0.0191           6.04      0.0000
Northing                          -260.586      10.909           -23.89    0.0000
Easting                           165.110       5.998            27.53     0.0000
House price index                 1.455         0.0262           55.53     0.0000
Presence of a shore indicator     -0.768        0.143            -5.38     0.0000
Single-family house indicator     0.0807        0.0269           3.00      0.0027
Row house indicator               0.138         0.0404           3.42      0.0006
Municipality indicator            -0.295        0.0671           -4.40     0.0000
Private person indicator          -0.0619       0.0181           -3.42     0.0006
Presence of a shore NA indicator  0.0848        0.0235           3.61      0.0003

Coefficient of determination: 0.88
Standard error of regression: 0.37
Number of outliers: 74
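Note (14) states that the ordinary least squares model uses a global double-log specification. Assuming that this means the price and the continuous attributes enter in logarithms while the indicators enter as 0/1 variables (an assumption, since the model equation is not repeated here), the coefficients in Table 2 can be read as elasticities and the indicator coefficients as proportional shifts. The helper names below are hypothetical; the sketch only shows the arithmetic of that reading.

```python
import math


def pct_change_from_elasticity(elasticity, pct_increase):
    # In a log-log model, price scales by (1 + g) ** elasticity
    # when the attribute grows by the fraction g.
    return 100.0 * ((1.0 + pct_increase / 100.0) ** elasticity - 1.0)


def pct_effect_of_indicator(coef):
    # With a log price and a 0/1 indicator, switching the indicator on
    # multiplies the price by exp(coef).
    return 100.0 * (math.exp(coef) - 1.0)


# Coefficients taken from Table 2.
volume_effect = pct_change_from_elasticity(0.766, 10.0)   # building volume +10%
row_house_effect = pct_effect_of_indicator(0.138)         # row house indicator

print(round(volume_effect, 2), round(row_house_effect, 2))
```

Under these assumptions, a 10% larger permitted building volume goes with roughly a 7.6% higher price, and the row house indicator with roughly a 14.8% higher price, other attributes held fixed.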
Table 3. Fit and hedonic prices from the MM-estimation

Variable                          Coefficient   Standard error   t-value   p-value
Constant                          1589.955      123.018          12.93     NA
Permitted building volume         0.774         0.0168           46.19     NA
Parcel size                       0.109         0.0174           6.25      NA
Northing                          -257.274      7.823            -32.89    NA
Easting                           166.216       1.157            143.62    NA
House price index                 1.453         0.0258           56.26     NA
Presence of a shore indicator     -0.867        0.133            -6.51     NA
Single-family house indicator     0.0803        0.0258           3.11      NA
Row house indicator               0.182         0.0395           4.61      NA
Municipality indicator            -0.313        0.0672           -4.66     NA
Private person indicator          -0.0666       0.0177           -3.77     NA
Presence of a shore NA indicator  0.0665        0.0233           2.85      NA

Coefficient of determination: NA
Standard error of regression: 0.34
Table 4. Fit and hedonic prices from the structural time series estimation

Variable                          Coefficient   Standard error   t-value   p-value
Level                             1704.400      144.250          11.82     0.0000
Cycle (comp. #1)                  0.0178        0.126            NA        NA
Cycle (comp. #2)                  -0.00430      0.141            NA        NA
AR(1)                             0.0339        0.143            NA        NA
Permitted building volume         0.7050        0.0190           36.96     0.0000
Parcel size                       0.1790        0.0195           9.19      0.0000
Northing                          -256.740      10.358           -24.79    0.0000
Easting                           158.11        5.632            28.07     0.0000
House price index                 0.927         0.140            6.61      0.0000
Presence of a shore indicator     -1.101        0.146            -7.54     0.0000
Housing block indicator           0.168         0.0378           4.45      0.0000
Single-family house indicator     0.123         0.0268           4.59      0.0000
Row house indicator               0.178         0.0398           4.46      0.0000
Municipality indicator            -0.358        0.0630           -5.68     0.0000
Private person indicator          -0.0811       0.0171           -4.74     0.0000

Coefficient of determination: 0.90
Standard error of regression: 0.34
Number of interventions: 72
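Note (13) says that intervention variables are used for observations whose standardised residual is larger than 3.5. A minimal sketch of that rule follows. It assumes that residuals are standardised by their sample mean and standard deviation, that the threshold applies to the absolute value, and that each intervention is a 0/1 pulse dummy for one observation. None of these details is confirmed here, and the function names and data are hypothetical.

```python
import math


def standardised_residuals(residuals):
    # Assumed standardisation: centre on the mean, divide by the sample std.
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    return [(r - mean) / sd for r in residuals]


def intervention_dummies(residuals, threshold=3.5):
    # Flag observations beyond the threshold and build one pulse dummy each.
    z = standardised_residuals(residuals)
    flagged = [i for i, value in enumerate(z) if abs(value) > threshold]
    n = len(residuals)
    dummies = [[1 if row == i else 0 for row in range(n)] for i in flagged]
    return flagged, dummies


# Hypothetical residuals: thirty small values with one gross outlier at index 7.
residuals = [0.1, -0.1] * 15
residuals[7] = 5.0

flagged, dummies = intervention_dummies(residuals)
print(flagged, len(dummies))
```

Each returned dummy column could then be appended to the regressor matrix, so that the flagged observation no longer influences the remaining coefficients.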
Table 5. Fit of the local regression

Variable                          Coefficient   Standard error   t-value   p-value
Constant                          NA            NA               NA        NA
Permitted building volume         NA            NA               NA        NA
Parcel size                       NA            NA               NA        NA
Northing                          NA            NA               NA        NA
Easting                           NA            NA               NA        NA
House price index                 NA            NA               NA        NA
Presence of a shore indicator     NA            NA               NA        NA

Coefficient of determination: 0.93
Standard error of regression: 0.29
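Table 5 reports no coefficients because, as note (15) explains, a nonparametric fit yields a different hedonic price at every point. The sketch below illustrates this with a deliberately simplified local linear regression: one explanatory variable, tricube weights, a nearest-neighbour bandwidth and no robustness iterations. It is not the robust multivariate estimator used in the study, and the function names and the `span` parameter are assumptions made for illustration.

```python
def tricube(u):
    """Tricube kernel: positive inside (-1, 1), zero outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(x, y, x0, span=0.5):
    """Fit y = a + b * (x - x0) by weighted least squares around x0.

    Returns (a, b): the fitted value at x0 and the local slope there.
    """
    n = len(x)
    k = max(2, int(round(span * n)))
    # Bandwidth: distance from x0 to its k-th nearest observation.
    h = sorted(abs(xi - x0) for xi in x)[k - 1]
    if h == 0.0:
        raise ValueError("bandwidth is zero; increase span")
    w = [tricube((xi - x0) / h) for xi in x]
    d = [xi - x0 for xi in x]
    # Weighted sums that make up X'WX and X'Wy for a local line.
    sw = sum(w)
    sd = sum(wi * di for wi, di in zip(w, d))
    sdd = sum(wi * di * di for wi, di in zip(w, d))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sdy = sum(wi * di * yi for wi, di, yi in zip(w, d, y))
    det = sw * sdd - sd * sd
    if abs(det) < 1e-12:
        # Corresponds to the non-singularity condition in note (6).
        raise ValueError("X'WX is singular at this fitting point")
    b = (sw * sdy - sd * sy) / det
    a = (sy - b * sd) / sw
    return a, b


# Hypothetical data on a curved relationship.
x = [float(i) for i in range(10)]
y = [xi ** 2 for xi in x]

_, slope_low = local_linear_fit(x, y, 2.0)
_, slope_high = local_linear_fit(x, y, 7.0)
print(slope_low, slope_high)
```

The two slopes differ because each fitting point gets its own weighted regression, which is why a single coefficient per variable cannot be tabulated.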
Table 6. Post-sample prediction statistics for different approaches

Measure of predictive accuracy  Ordinary least squares   MM-estimation   Structural time series   Local regression
Mean prediction error           0.16                     0.031           -0.057                   0.066
MAPE                            2.56                     2.42            2.38                     2.45
RMSE                            0.20                     0.18            0.18                     0.18
MAE                             0.31                     0.29            0.28                     0.29
Correlation                     0.86                     0.86            0.87                     0.86
Gravity                         2714                     2936            2971                     2891
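The percentage differences quoted in the discussion of the post-sample results can be reproduced directly from Table 6. The short sketch below takes ordinary least squares as the baseline and computes the relative change for each of the other approaches. The dictionary layout, the short column labels and the function name are illustrative choices, not part of the study.

```python
# Values copied from Table 6; OLS is the baseline column.
TABLE6 = {
    "MAPE": {"OLS": 2.56, "MM": 2.42, "STS": 2.38, "Local": 2.45},
    "RMSE": {"OLS": 0.20, "MM": 0.18, "STS": 0.18, "Local": 0.18},
    "MAE": {"OLS": 0.31, "MM": 0.29, "STS": 0.28, "Local": 0.29},
    "Gravity": {"OLS": 2714, "MM": 2936, "STS": 2971, "Local": 2891},
}


def change_vs_ols(measure):
    """Percentage change of each approach relative to OLS, rounded to 0.1."""
    row = TABLE6[measure]
    base = row["OLS"]
    return {
        name: round(100.0 * (value - base) / base, 1)
        for name, value in row.items()
        if name != "OLS"
    }


for measure in TABLE6:
    print(measure, change_vs_ols(measure))
```

Negative values for MAPE, RMSE and MAE indicate smaller errors than ordinary least squares, while positive values for gravity indicate a stronger association between predictions and realisations.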