# The Price of Residential Land for Counties, ZIP Codes, and Census Tracts in the United States.

1 Introduction

Researchers have taken to describing a single-family house as a physical structure occupying some land; see, for example, Bostic, Longhofer, and Redfearn (2007), Davis and Heathcote (2007), and Davis and Palumbo (2008). Because housing structures are infrequently renovated and construction costs change relatively slowly from year to year, rapid change in the value of housing typically occurs when the underlying land is appreciating or depreciating. For this reason, the housing boom and bust of 1998-2012 has been described as a land boom and bust (Davis, Oliner, Pinto, and Bokka, 2017).

Although the importance of studying and monitoring the price of land currently in residential use is now well understood, until recently few studies have produced data on land prices at a relatively fine level of geography. Broadly speaking, researchers have used one of two methods to estimate the price of land in current residential use. Both of these methods require data that have been, until recently, hard to acquire. The first method uses data from sales of vacant or near-vacant land. Three examples of the first method are Haughwout, Orr, and Bedoll (2008), Nichols, Oliner, and Mulhall (2013) and Albouy, Ehrlich, and Shin (2018). These authors all use data from the CoStar Group, Inc. Haughwout, Orr, and Bedoll (2008) estimate the price of land inside the New York metro area; Nichols, Oliner, and Mulhall (2013) produce price indexes for land for 23 metro areas; and Albouy, Ehrlich, and Shin (2018) estimate land values for all urban land in nearly all metropolitan areas in the United States.

The second method measures the price of land as the difference between house value and the replacement cost of the structure on the land. Davis and Palumbo (2008) apply this method to data from the American Housing Survey to generate the average price of land for 46 metro areas. Davis, Oliner, Pinto, and Bokka (2017) use proprietary data on house prices and construction costs from a number of sources to generate the level of land prices and changes in land prices at the ZIP code level for the Washington, DC metropolitan area. (2)

In this paper, we use a large database of home appraisals to produce annual panel data for the price of land in single-family residential use for 964 counties, 8,344 ZIP codes, and 11,494 census tracts over the 2012 through 2017 period. Land prices are estimated for areas representing more than 85% of the U.S. population and 83% of all single-family homes. (3) To our knowledge, ours is the first study to produce these estimates at a fine geography for nearly the entirety of the United States. (4) Our source data are the Uniform Residential Appraisal Report submissions to the Government Sponsored Enterprises (GSEs), Fannie Mae and Freddie Mac. These reports are required by the GSEs before they guarantee a mortgage against default. These data contain more than 16 million unique appraisals submitted between 2012 and 2018.

Our raw estimates of land values in this data set are based on "cost-approach" appraisals; we set land value equal to the appraised value of the house less an estimate of depreciated replacement cost of the housing structure. (5) A common concern about this residual method of estimating land values is that it assumes that the sum of the replacement cost of the housing structure and the value of the land (if it were vacant) is equal to the value of housing. This is not true when the market value of the structure is below its replacement cost, as would be the case when a housing structure has become functionally obsolete and is due to be torn down or extensively remodeled, or when housing demand has fallen dramatically and construction activity in the area has ceased (see Glaeser and Gyourko, 2005). To address this issue, we calibrate a simple option model for tearing down and rebuilding a house. Simulations of the calibrated model suggest that the value of housing is well approximated as the sum of the replacement cost of the structure and the market value of the land if vacant for at least the first 10 years of the life of the structure. We therefore limit our data to housing units with an "effective age" of no more than 10 years. This filter eliminates about half of the appraisal observations, preserving only the appraisals where we believe the implied land value is an unbiased estimate of the market value of the land if the land were vacant.

For each estimate of land value in these data, we use the procedure described in Davis, Oliner, Pinto, and Bokka (2017) to adjust for the effect of lot size on land prices (the so-called "plattage effect") and then compute the land price per acre. Then, we use a procedure called Kriging as described by Basu and Thibodeau (1998) to interpolate land price per acre for every single-family housing unit in a given geography (county, ZIP code, or census tract) that does not have a GSE appraisal report. (6) Using a 20% hold-out sample, we show in our data that Kriging offers a lower root mean square error in interpolating land prices than some other commonly used methods of spatial interpolation. In an Appendix, we analytically derive a land-price gradient from a simple, calibrated urban model and show that Kriging recovers the correct land-price gradient from data simulated from that model.

The primary goal of the paper is to generate land price indices covering most of the United States at a fine geography for use by researchers and policy-makers. To that end, the aggregate land price data presented in this paper are available for download at the web site of the Federal Housing Finance Agency, at https://www.fhfa.gov/papers/wp1901.aspx. The data include land price per acre by year, county, ZIP code, and census tract.

In addition, we use the data to develop and confirm important stylized facts about land prices in the United States. First, as shown by Albouy, Ehrlich, and Shin (2018) and others, the level of land prices at the center of a metro area varies greatly. For example, the price of land at the center of metro areas with more than 2.5 million single-family housing units is more than 20 times greater than the price of land at the center of metro areas with fewer than 500 thousand housing units. Second, the rate at which land prices decline from the city center also varies across metro areas; the rate of decline depends on the size of the metro area, the amount of regulation, and the nature of the area's topography. Third, the price of land covaries with certain variables in accordance with what standard urban models predict. In particular, measured at the average price per acre in a county, residential land prices are negatively correlated with lot size and are positively correlated with floor-area ratios, farm land prices, and other land uses in the area.

2 Option Model of Housing Teardowns

In this section, we build a simple model for when a land owner should optimally tear down his or her house and rebuild. We use the model to develop a rule-of-thumb to determine the oldest existing homes for which we can derive unbiased estimates of land value from a cost-approach value decomposition.

In the model, the land owner owns property with a building of size S on a lot of size L. The lot size is fixed in perpetuity, but the building size can be changed. This property earns rents of

$$q^{H} S^{1-\phi} L^{\phi} \tag{1}$$

where $q^{H}$ is the rental price per unit of housing service provided and $S^{1-\phi} L^{\phi}$ is the number of units of such housing service produced under a Cobb-Douglas production function.

Each period, the land owner must decide whether or not to demolish the building and rebuild on vacant land, or to let the building deteriorate further and revisit the choice next period. Since L is fixed in perpetuity, we can summarize the decision problem of the land owner as one over a choice of S only. Denote V (S) as the value of owning a property with a building of size S and similarly let V (0) denote the value of the property as vacant land.

When the land is not vacant and a structure of size S sits on the property, the land owner chooses either to let the property sit as is and collect rents, or to knock the property down and make the land vacant. This problem has the expression

$$V(S) = \max\left\{ q^{H} S^{1-\phi} L^{\phi} + \beta V\big(S(1-\delta)\big),\; V(0) \right\} \tag{2}$$

Here $\beta$ is the factor by which the land owner discounts the future and $\delta$ is the rate at which the housing structure depreciates. The first term in the max operator is the value of the property with the structure left intact. This term includes the discounted value of owning a property with a structure of size $S(1-\delta)$ next period. The second term is the value of the property when it is made vacant, assuming there are no demolition costs that must be paid to clear the land of the structure.

Denote $p^{S}$ as the price per unit of newly-built structure. When the land is vacant, the land owner chooses to build the optimally-sized structure to maximize the value of the land. The choice determines the rents earned this period, plus the discounted value of the property in the future after accounting for depreciation of the structure, less the cost of building the structure. This choice satisfies:

$$V(0) = \max_{S}\left\{ q^{H} S^{1-\phi} L^{\phi} + \beta V\big(S(1-\delta)\big) - p^{S} S \right\} \tag{3}$$

The solution to this model can be characterized by two variables: $\bar{S}$, the size of the structure that is built when the land is vacant, (7) and $\underline{S}$, the smallest structure that exists (i.e. a structure of any smaller size is demolished). The two are linked by $\underline{S} = \bar{S}(1-\delta)^{T}$, where T represents the maximum age of any housing structure, so that $T = (\log \underline{S} - \log \bar{S}) / \log(1-\delta)$.

We solve this model and calibrate it as follows. We set the discount factor to $\beta = 0.90$; the annual depreciation rate to $\delta = 0.023$; land's share of value to $\phi = 0.30$; the price per unit of structure, a normalization, to $p^{S} = 1$; and then we find the level of rent $q^{H}$ such that the value of housing for a newly built optimally-sized structure is $V(\bar{S}) = 100$. Under this calibration, the simulations imply that the oldest house is 80 years old; that the smallest (most depreciated) house size at the time of demolition is $\underline{S} = 10.8$, with the value at that house size equal to the value of the underlying land, $V(0) = V(\underline{S}) = 30$; and that the optimal house built on vacant land is $\bar{S} = 70$.
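The calibration can be illustrated with a short numerical sketch. This is not the authors' solution method. It assumes L = 1, annual periods, rents collected at the start of each year, and no demolition costs, and it relies on the observation that, in this deterministic setting, the size of the new structure and the teardown age can be chosen jointly. The function names `solve` and `calibrate` are hypothetical. Because the timing conventions are assumptions, the teardown age it returns may differ by a few years from the 80 years reported above, while the structure size and land value should come out close to 70 and 30.

```python
import math

BETA, DELTA, PHI, P_S = 0.90, 0.023, 0.30, 1.0   # values from the text
RHO = BETA * (1 - DELTA) ** (1 - PHI)            # per-year decay of discounted rent

def solve(q, max_age=300, tol=1e-9):
    """Return (land value V(0), new size, teardown age, V(new size)) for rent level q."""
    v0 = 0.0
    while True:
        best = None
        for age in range(1, max_age + 1):
            # Present value of rents per unit of S^(1-phi) over `age` years (L = 1).
            annuity = (1 - RHO ** age) / (1 - RHO)
            # First-order condition for structure size, given the teardown age.
            s_new = ((1 - PHI) * q * annuity / P_S) ** (1 / PHI)
            v_new = q * annuity * s_new ** (1 - PHI) + BETA ** age * v0
            land = v_new - P_S * s_new
            if best is None or land > best[0]:
                best = (land, s_new, age, v_new)
        # Stop once the implied land value reproduces the guess (fixed point).
        if abs(best[0] - v0) <= tol * max(1.0, abs(best[0])):
            return best
        v0 = best[0]

def calibrate(target=100.0):
    """Pick q so that a newly built, optimally sized house is worth `target`."""
    _, _, _, v_new = solve(1.0)
    # All values scale with q ** (1 / PHI), so one rescaling step suffices.
    q = (target / v_new) ** PHI
    return q, solve(q)

q, (land, s_new, age, v_new) = calibrate()
print(f"q={q:.4f}  V(0)={land:.1f}  S_new={s_new:.1f}  "
      f"teardown age={age}  V(S_new)={v_new:.1f}")

# Maximum age implied by the sizes reported in the text:
# S_min = S_new * (1 - delta)^T, so T = log(S_min / S_new) / log(1 - delta).
implied_T = math.log(10.8 / 70.0) / math.log(1 - DELTA)
print(round(implied_T, 1))
```

The last two lines apply the formula for T to the reported sizes of 10.8 and 70, which gives a maximum age of roughly 80 years, in line with the simulation result quoted in the text.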

Figure 1 shows how housing value (blue line) and the replacement cost of structures (red line) change with the building age in this model. The replacement cost of structures declines at a constant rate from 70 when newly built to 10.8 at 80 years, at which point the structure is torn down. The value of the vacant land is always 30 (green line). The value of housing declines gradually over time from 100 when newly built to 30 at 80 years. At the point at which the structure is torn down, the value of housing (30) is less than the sum of the value of land and the replacement cost of the structure (40.8 = 30 + 10.8).

So, according to this calibrated model, at what point is the house value no longer well approximated as the sum of the value of the vacant land and the cost of the depreciated housing structure? To answer this question, in Figure 2 we compute a "land share" of house value in two different ways and determine the age of the housing structure at which the two methods stop producing similar results. The first (correct) method, the blue line, computes land's share of value as the ratio of the value of vacant land to the value of housing, V (0) / V (S). This method shows that land's share of value increases monotonically from 30% for newly built homes to 100% for homes about to be torn down. The second method, the red line, computes land value residually as house value less the replacement cost of structures, V (S) - S. This is meant to approximate how land is measured residually when given an appraised value of housing, V (S), and an estimate of the depreciated reconstruction cost of the housing structure, S. Land's share of housing is this residually-measured land value divided by house value. Figure 2 shows that this measure of land's share of housing ranges from 30% for newly constructed homes to only about 60% for homes about to be torn down. The value of housing at the point of teardown is 30, entirely equal to land value. A residually-measured estimate of land value at the point of teardown would be biased down and only equal to 30 - 10.8 = 19.2. Figure 2 shows that the two methods produce nearly identical estimates of land share of value for structures that are younger than 20 years old. This result guides sample restrictions on the set of homes we use to estimate land values.
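The comparison of the two land-share measures can be approximated with a few lines of Python. The sketch below does not re-solve the model. It takes the calibrated values reported above (house value of 100 when new, land value of 30, new structure size of 70, teardown at 80 years) as given, imposes the teardown age rather than deriving it, and assumes rents are collected at the start of each year. The function names are hypothetical.

```python
BETA, DELTA, PHI = 0.90, 0.023, 0.30               # calibration from the text
V_NEW, V_LAND, S_NEW, T = 100.0, 30.0, 70.0, 80    # simulated values reported in the text
RHO = BETA * (1 - DELTA) ** (1 - PHI)

# Rent at age 0, chosen so that the value of a new house equals V_NEW.
RENT0 = (V_NEW - BETA ** T * V_LAND) * (1 - RHO) / (1 - RHO ** T)

def house_value(age):
    """Discounted rents until teardown at age T, plus the discounted land value."""
    years_left = T - age
    rent = RENT0 * (1 - DELTA) ** ((1 - PHI) * age)
    return rent * (1 - RHO ** years_left) / (1 - RHO) + BETA ** years_left * V_LAND

def land_shares(age):
    """Return (true share V(0)/V(S), residual share (V(S) - S)/V(S))."""
    v = house_value(age)
    s = S_NEW * (1 - DELTA) ** age   # depreciated replacement cost, with p^S = 1
    return V_LAND / v, (v - s) / v

for age in (0, 10, 20, 40, 80):
    true_share, residual_share = land_shares(age)
    print(age, round(true_share, 3), round(residual_share, 3))
```

Under these assumptions the two shares coincide at 30% for a new house and drift apart with age, with the residual measure ending well below 100% at teardown, which is the pattern described for Figure 2.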

Of course, different models will produce different results. The point of this section is not to write down the most realistic model of land ownership and teardowns. Rather, it is to gain intuition about the value of the option to tear down a house. Our general results should be robust to any model where an optimal teardown occurs decades after a house is built. The reason is as follows: When a house is relatively newly built, the expected date of a teardown is so far away in the future that the option of tearing down the house has little value. Since this option has little value, the value of housing is well approximated as the sum of the replacement cost of the structure and the value of the vacant land.

3 Data

In each mortgage appraisal, there are typically three separate approaches to estimating the value of the underlying property. The first is the sales comparison (or "comps") approach, by which an appraised value is generated based on recent comparable transaction prices. A second "income" approach uses the discounted flow of imputed rental income to arrive at an estimated value of the property. Finally, the "cost" approach attempts to estimate the cost of the components of the property, the land and the structure, giving the estimated value of the property as the sum. We use cost-approach appraisals in our analysis. In the cost approach, our understanding based on conversations with industry experts is that appraisers typically use the residual method to estimate the land component: land value is set equal to an a priori estimate of the value of the property less the depreciated replacement cost of the structure component.

Our data on cost-approach appraisals are from Uniform Residential Appraisal Report submissions as collected by the GSEs. After data cleaning, including the removal of duplicate and/or resubmitted appraisals, we have approximately 16.4 million unique cost-approach appraisal records submitted between 2012 and 2018. (8) Among the appraisal records, 52% specify a single source for the estimates of replacement costs, while 15% do not specify any source and the remaining 32% are associated with at least two sources. Marshall & Swift (57%) and "local information" (31%) are the two most commonly appearing sources, followed by R.S. Means (13%) and "internet" (10%), with others (e.g. tax records and new construction information) making up about 1%.

We restrict the sample of homes in our study based on reported effective age. A standard definition for effective age is

$$\text{Economic Life} \times \frac{\text{New Replacement Cost} - \text{Depreciated Structure Value}}{\text{New Replacement Cost}} \tag{4}$$

For example, for a structure with an assumed economic life of 80 years, a depreciated structure value of $100,000, and a replacement cost of the structure as new of $150,000, the effective age would be 80 x (50/150) = 26.7 years. (9)

The results from our calibrated model of section 2 suggest that the residual method produces accurate estimates of the value of land for structures less than 20 years old. Given an assumed constant depreciation rate of 2.3% per year and a maximum economic life of a given house of 80 years, the calibrated model implies the maximum effective age from equation (4) for reliable cost-based appraisals of land should be approximately

$$80 \times \left[1 - (1 - 0.023)^{20}\right] = 29.8 \text{ years} \tag{5}$$

In our analysis, we conservatively restrict our data to appraisals reporting an effective age of 10 years or less, corresponding to an age of 6 years in the calibrated model. Our data contain 7.8 million cost-approach appraisals with an appraised land value, lot size, and a structure with an effective age of no more than 10 years.
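The effective-age arithmetic in this section can be checked with a small sketch. It assumes the 80-year economic life and the constant 2.3% depreciation rate used in the text. The function names are hypothetical.

```python
import math

def effective_age(economic_life, new_cost, depreciated_value):
    """Equation (4): economic life times the depreciated share of replacement cost."""
    return economic_life * (new_cost - depreciated_value) / new_cost

def implied_effective_age(age, economic_life=80.0, delta=0.023):
    """Effective age of a structure that has depreciated at rate delta for `age` years."""
    return economic_life * (1.0 - (1.0 - delta) ** age)

def age_for_effective_age(eff_age, economic_life=80.0, delta=0.023):
    """Invert implied_effective_age: the age at which a given effective age is reached."""
    return math.log(1.0 - eff_age / economic_life) / math.log(1.0 - delta)

print(round(effective_age(80, 150_000, 100_000), 1))   # worked example in the text
print(round(implied_effective_age(20), 1))             # equation (5)
print(round(age_for_effective_age(10), 1))             # age matching a 10-year effective age
```

The three printed values are 26.7, 29.8, and 5.7. The last is the roughly 6 years of model age that corresponds to the 10-year effective-age cutoff.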

4 Standardization and Interpolation of Land Values

To create a value for land within a county or subaggregate, we use a two-step procedure. First, we standardize all land value estimates in the data to a per-acre basis. Then, we assign to all single-family parcels in the county a value of land using an interpolation algorithm. We report the average value of land per acre, averaged across all single-family parcels, for each desired geography within the county, including the county itself, ZIP codes, and census tracts. (10)

To standardize all estimates of land value to a per-acre basis, we first correct for plattage effects using a procedure similar to Davis, Oliner, Pinto, and Bokka (2017). For each county, we pool all data in all years and regress the log of the value of land on the log of lot size including ZIP code fixed effects and year dummies. Denote the coefficient on the log of lot size in this pooled regression as $\beta$. For each parcel in our sample, we then compute the predicted log price per acre as the observed log land value plus our estimate of $\beta$ times the difference between the log of one acre and the log of the size of the lot.
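A minimal sketch of this standardization follows. It is an illustration under simplifying assumptions, not the authors' code: it uses a handful of hypothetical parcels from a single year (so year dummies are omitted), and it absorbs ZIP code fixed effects by demeaning within ZIP code, which yields the same slope as including ZIP code dummies. The function names and data are hypothetical.

```python
import math
from collections import defaultdict

def plattage_slope(log_value, log_lot, zip_code):
    """Slope of log land value on log lot size after removing ZIP code means
    (the within transformation, equivalent to ZIP code fixed effects)."""
    groups = defaultdict(list)
    for i, z in enumerate(zip_code):
        groups[z].append(i)
    sxy = sxx = 0.0
    for idx in groups.values():
        mx = sum(log_lot[i] for i in idx) / len(idx)
        my = sum(log_value[i] for i in idx) / len(idx)
        sxy += sum((log_lot[i] - mx) * (log_value[i] - my) for i in idx)
        sxx += sum((log_lot[i] - mx) ** 2 for i in idx)
    return sxy / sxx

def standardize_to_one_acre(log_value, log_lot, beta):
    """Observed log value plus beta times (log of one acre minus log of lot size)."""
    return [v + beta * (math.log(1.0) - x) for v, x in zip(log_value, log_lot)]

# Hypothetical parcels: (ZIP code, lot size in acres); values built with elasticity 0.6.
parcels = [("A", 0.10), ("A", 0.25), ("A", 0.50), ("B", 0.20), ("B", 1.00), ("B", 2.00)]
zip_level = {"A": 13.0, "B": 12.0}   # hypothetical log value of a one-acre lot
zips = [z for z, _ in parcels]
log_lot = [math.log(a) for _, a in parcels]
log_value = [zip_level[z] + 0.6 * x for z, x in zip(zips, log_lot)]

beta_hat = plattage_slope(log_value, log_lot, zips)
per_acre = standardize_to_one_acre(log_value, log_lot, beta_hat)
print(round(beta_hat, 3), [round(p, 3) for p in per_acre])
```

Because the hypothetical data contain no noise, the estimated elasticity equals the one used to build them, and every parcel in a ZIP code is mapped to the same standardized log price per acre.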

We then assign a value of land to each single-family property in the county that lacks a GSE appraisal by using an interpolation procedure called Kriging. Before describing Kriging, note that we merge our data set of appraised properties to the universe of single-family parcels from the "Assessor Data" licensed from CoreLogic. The CoreLogic data contain a near-census of parcels in the counties for which it has acquired rights to the data.

Kriging is commonly used in the hard sciences. For our purposes, we need a method that can be used in urban areas with steep and varying gradients over short distances, and in rural areas with relatively flat gradients and geographically sparse transactions. We also need a method that is computationally manageable to loop over thousands of areas that include millions of parcels. Kriging satisfies these requirements. (11) To our knowledge, there have been no studies in the land-price literature that have evaluated the relative accuracy of different spatial imputation methods. Later in this section, we compare Kriging to a number of alternatives and show that Kriging produces more accurate estimates in a 20% holdout sample.

In Appendix A, we discuss the Kriging procedure in detail. Here we provide a brief summary. Before we do so, note that the bottom line is that Kriging, like other estimators, uses a weighted average of n nearest neighbors to generate predicted land prices. What makes Kriging different is the algorithm it uses to generate the weights.

Derivation of those weights proceeds in five steps. The first step involves calculating pairwise differences in values between each pair in the sample within a certain distance range, 5 miles in our case. (12) The next step establishes 15 bins of distances and computes the average "semivariance" (defined as half of the squared difference in land values) of all the points in each distance bin. The third step fits a 3-parameter curve that preserves monotonicity to this set of 15 binned averages. This curve is referred to as a "variogram"; we estimate one variogram per county per year. The fourth step applies this estimated curve to estimate covariances between values in an unsampled location and a number of nearby sampled locations (we choose 20 nearby neighbors). The fifth step uses these fitted covariances to construct the weights on the nearby sampled locations.
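The five steps can be sketched in plain Python as follows. This is a simplified illustration, not the procedure in Appendix A. The exponential form of the 3-parameter curve is an assumption (the text only says the curve is monotone with three parameters). The fitting step is skipped and the parameters in `PARAMS` are simply assumed. The weights are obtained from the semivariogram form of the ordinary-Kriging system rather than from covariances. The coordinates and log land values are hypothetical, with distances in miles on a plane.

```python
import math

def empirical_semivariogram(points, values, max_dist=5.0, n_bins=15):
    """Steps 1-2: average semivariance of pairs closer than max_dist, by distance bin."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d < max_dist:
                b = min(int(d / max_dist * n_bins), n_bins - 1)
                sums[b] += 0.5 * (values[i] - values[j]) ** 2
                counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

def variogram(h, nugget, sill, rng):
    """Step 3 (assumed form): exponential model, monotone increasing in h."""
    if h == 0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def krige(target, points, values, params, k=20):
    """Steps 4-5: ordinary-Kriging weights on the k nearest sampled points."""
    near = sorted(range(len(points)), key=lambda i: math.dist(target, points[i]))[:k]
    n = len(near)
    a = [[variogram(math.dist(points[i], points[j]), *params) for j in near] + [1.0]
         for i in near]
    a.append([1.0] * n + [0.0])          # constraint: weights sum to one
    b = [variogram(math.dist(target, points[i]), *params) for i in near] + [1.0]
    w = solve(a, b)[:n]                  # drop the Lagrange multiplier
    return sum(wi * values[i] for wi, i in zip(w, near)), w

points = [(0, 0), (1, 0), (0, 1), (2, 2), (3, 1), (1, 3)]   # hypothetical parcels
values = [13.0, 12.6, 12.7, 12.0, 11.8, 11.9]               # hypothetical log prices
PARAMS = (0.0, 0.5, 4.0)                                    # assumed nugget, sill, range

print(empirical_semivariogram(points, values))
prediction, weights = krige((1.0, 1.0), points, values, PARAMS)
print(round(prediction, 3), [round(w, 3) for w in weights])
```

Two properties are useful as sanity checks on any such implementation: the weights sum to one, and predicting at a sampled location returns that location's observed value.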

Figure 3 provides a look at the output of the Kriging procedure for the District of Columbia. The red dots show the location of observations in our sample located inside the city boundaries. The figure itself shows a heat map of spatially interpolated (log) land prices per acre. Figure 4 shows the mean of (log) land prices by ZIP code. The mean of each ZIP code is computed using the estimated land prices for properties with a GSE appraisal and the interpolated land prices for properties without an appraisal.

We evaluate the accuracy of the Kriging procedure by omitting a randomly selected 20% of our sample and determining the root mean square error (RMSE) of the Kriging interpolation procedure for that 20%. In every year, we compute the RMSE for each county in our sample. Across counties, the median RMSE is between 51 and 54 percent. This might seem high, but it is in fact well within our priors. Denote true land, structure and house value for a given parcel as L*, S* and H* such that

$$L^{*} = H^{*} - S^{*} \tag{6}$$

Suppose that structures are not measured with error but housing is measured with multiplicative error e such that observed house value $H^{o}$ is equal to $H^{*}(1 + e)$. We can then write an expression for the percentage deviation of observed land value $L^{o} = H^{o} - S^{*}$ from the truth as

$$\frac{L^{o} - L^{*}}{L^{*}} = \left(\frac{L^{*}}{H^{*}}\right)^{-1} e \tag{7}$$

Equation (7) says that the percentage measurement error in land values is equal to the inverse of land's share of house value times the percentage measurement error in house values. Case and Shiller (1989) estimate the residual standard deviation to be about 15% for housing in repeat-sales of individual properties. Given that land's share of house value is about 30%, this suggests the standard deviation of measurement error in land prices should be about 50%.
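For completeness, the step from the definitions to equation (7), and from there to the 50% figure, can be written out. The derivation uses only the definitions given above; the notation sd for the standard deviation is introduced here for illustration.

```latex
\begin{aligned}
\frac{L^{o} - L^{*}}{L^{*}}
  &= \frac{\left[H^{*}(1+e) - S^{*}\right] - \left[H^{*} - S^{*}\right]}{L^{*}}
   = \frac{H^{*}}{L^{*}}\, e
   = \left(\frac{L^{*}}{H^{*}}\right)^{-1} e, \\[4pt]
\operatorname{sd}\!\left(\frac{L^{o} - L^{*}}{L^{*}}\right)
  &\approx \frac{\operatorname{sd}(e)}{L^{*}/H^{*}}
   = \frac{0.15}{0.30} = 0.50 .
\end{aligned}
```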

We compare the median RMSEs from Kriging to three other commonly-used spatial interpolation procedures: Null, Nearest Neighbor, and Inverse-Distance Weights. Null sets the interpolated value of the target parcel equal to the unconditional average value in the county, ZIP, or tract, respectively; Nearest Neighbor sets the interpolated value equal to the average value of the 20 most proximate observations; and Inverse-Distance Weights computes the weighted-average value of the 20 most proximate observations with the weights inversely proportional to the squared distance between the neighbor and the target parcel. Table 1 reports median RMSEs across counties for Kriging and the other spatial interpolation methods we consider for our 20% hold-out sample. As the table shows, measured this way, Kriging provides the greatest interpolation accuracy in every sample year.
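The three benchmark interpolators and the hold-out evaluation can be sketched as follows. The data are simulated purely for illustration (a log land price that declines with distance from the origin, plus noise), so the RMSEs printed here say nothing about the values in Table 1. The function names are hypothetical, and the Null predictor uses the mean of the whole training sample as a stand-in for the county average.

```python
import math
import random

def nearest(target, pts, k):
    """Indices of the k sampled points closest to the target."""
    return sorted(range(len(pts)), key=lambda i: math.dist(target, pts[i]))[:k]

def null_pred(target, pts, vals, k=20):
    """Null: unconditional average of the sample."""
    return sum(vals) / len(vals)

def nn_pred(target, pts, vals, k=20):
    """Nearest Neighbor: simple average of the k most proximate observations."""
    idx = nearest(target, pts, k)
    return sum(vals[i] for i in idx) / len(idx)

def idw_pred(target, pts, vals, k=20):
    """Inverse-Distance Weights: weights proportional to 1 / distance squared."""
    idx = nearest(target, pts, k)
    w = [1.0 / max(math.dist(target, pts[i]), 1e-12) ** 2 for i in idx]
    return sum(wi * vals[i] for wi, i in zip(w, idx)) / sum(w)

def holdout_rmse(predict, pts, vals, share=0.2, seed=0):
    """Hold out a random `share` of observations and score predictions on them."""
    order = list(range(len(pts)))
    random.Random(seed).shuffle(order)
    n_test = int(share * len(pts))
    test, train = order[:n_test], order[n_test:]
    train_pts = [pts[i] for i in train]
    train_vals = [vals[i] for i in train]
    sq = [(predict(pts[i], train_pts, train_vals) - vals[i]) ** 2 for i in test]
    return math.sqrt(sum(sq) / len(sq))

# Simulated parcels: log land price falls with distance from the center.
rng = random.Random(1)
pts = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(500)]
vals = [14.0 - 0.2 * math.hypot(x, y) + rng.gauss(0.0, 0.1) for x, y in pts]

for name, fn in [("Null", null_pred), ("Nearest Neighbor", nn_pred),
                 ("Inverse-Distance Weights", idw_pred)]:
    print(name, round(holdout_rmse(fn, pts, vals), 3))
```

A Kriging predictor with the same `(target, pts, vals)` signature could be passed to `holdout_rmse` in the same way to reproduce the style of comparison reported in the table.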

We also evaluate the accuracy of the Kriging procedure for our application by seeing if it can recover land values that are the outcome of a simple rendition of the standard monocentric city model that we can compute analytically. (13) We simulate two data sets from this model, one in which land values are measured perfectly and one in which land values are measured with error. We ask if Kriging can exactly replicate the analytic gradient of land prices in the model. In the data set in which land values are perfectly measured, Kriging nearly exactly replicates the analytic gradient. In the data set in which land values are measured with error, Kriging produces relatively low average errors. For more details, see appendix B.

5 Results

Table 2 presents observation counts for the calculated land price series. We are able to calculate land prices for 964 counties, 8,344 ZIP codes, and 11,494 census tracts in our balanced panel between 2012 and 2017. Our pooled cross-section, calculated using data pooled between 2012 and 2018 and re-based to 2013 using the year dummy coefficients from the county-level regressions described above, includes data for 1,758 counties, 15,450 ZIP codes, and 38,539 census tracts. (14) The annual and pooled samples cover 85% and 94% of the U.S. population residing in the 50 states plus the District of Columbia, respectively, and 83% and 93% of the single-family housing units.

In this section, we present stylized facts related to the land-price data. For expositional purposes, we present pooled estimates first in order to validate our data in terms of known relations between land prices and other variables. We then proceed to the annual panel, where we present several new findings. Overall, there are five main categories of stylized facts that we present in turn: 1) the land price gradient; 2) spatial variation in housing-structure density; 3) spatial variation in land use; 4) dynamics of land prices; and 5) dynamics relating housing and land prices.

5.1. The Land Price Gradient

The traditional monocentric city model predicts land prices fall with distance to the central business district (CBD) because households are willing to pay less per unit of housing as commuting costs rise. Since the marginal cost of an additional unit of structures is roughly constant within the city, the solution to the zero-profit condition for housing producers requires variation in the price of land. Therefore, the negative house-price gradient translates to a negative gradient for land prices.

To illustrate the relation between land prices and proximity to the CBD, Figure 5 shows the land price per acre (pooled sample) for ZIP codes within 25 miles of the CBDs of two core-based statistical areas (CBSAs), Washington-Arlington-Alexandria, DC-VA-MD-WV and San Francisco-Oakland-Hayward, CA. The same land price data are presented as maps (top panel) and as plots of prices as a function of the radial distance to the center of the CBD (bottom panel). These figures show a clear, downward sloping land price gradient. In the most central ZIP codes, shown in grey on the maps, there are not enough single-family housing unit transactions to construct an index. Land prices where single-family units are plentiful enough to construct an index start at about $10 to $30 million per acre in the most expensive ZIP codes of both cities. By about 10 miles from the CBD, the gradients are mostly flat, though there is still some slope in the San Francisco CBSA.

These relations between land prices and proximity to the CBD are not unique to these two cities. Figure 6, panel (a) pools land prices and proximities for all ZIP codes within city population ranges and shows that average land prices fall with distance to the CBD. The radius at which the gradient flattens may be usable as a metric of what defines the effective boundaries of the city: When the gradient flattens, the value of the locations in the flat sections is no longer governed primarily by the forces described by the traditional monocentric city model and perhaps another model is necessary.

The monocentric city model also predicts that land-supply restrictions have a negative effect on land prices holding population and amenity constant. This occurs because land supply restrictions have a negative effect on the spatial efficiency of the city which becomes capitalized into land prices. But such restrictions also create amenities that may exceed the efficiency loss, with the net effect being a reduced-form positive correlation between land restrictions on the one hand, and house and land prices on the other (see Saks, 2008, Saiz, 2010, and Davidoff, 2016). Figure 6, panels (b) and (c) show land-price gradients by population and top and bottom halves of regulatory burden (Gyourko, Saiz, and Summers, 2008) and topographic interruptions (Saiz, 2010), respectively. These panels show supply restrictions to be positively correlated with the levels of the land price but not the slopes of the gradients.

5.2. Land Prices, Housing Structure Density, and the Reservation Use

The traditional production function for housing includes structure and land inputs. Because of the downward-sloping land price gradient, combined with constant marginal structure production costs, housing producers shift towards land-intensive housing production far from the CBD where the price of land is low. Our master assessor dataset contains information on both interior square feet and the lot size. We use these two variables to construct the ratio of the interior square footage to the lot size, known as the "floor-area ratio" (FAR), for single-family homes in each area covered by our land data.

Panel (a) of Figure 7 graphs FARs for single-family housing against land prices. At a low land price per acre, structure density is low, with the cheapest land containing FARs at about 0.05 (i.e. a house with about 2,200 interior square feet on a one-acre lot) and the most expensive land containing FARs near 1 (i.e. a 3-story, 2,400 square foot row-house on a 2,400 square foot lot). Holding structure size constant, lower structure density is associated with larger lots. Accordingly, we see in panel (b) that lot sizes decrease monotonically with land prices.
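The two FAR examples in the text can be reproduced directly. The only outside fact used is that one acre equals 43,560 square feet. The function name is hypothetical.

```python
SQFT_PER_ACRE = 43_560

def floor_area_ratio(interior_sqft, lot_sqft):
    """Interior square footage divided by lot size, both in square feet."""
    return interior_sqft / lot_sqft

low_density = floor_area_ratio(2_200, 1 * SQFT_PER_ACRE)   # house on a one-acre lot
high_density = floor_area_ratio(2_400, 2_400)              # row house covering its lot
print(round(low_density, 3), round(high_density, 3))
```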

As land prices fall and lot sizes increase, the land use approaches its reservation use. Agricultural production occurs at the edge of most cities. Because land use is governed by the bid-rent curves for various sectors, and low-density residential land use is typically the dominant developed use at the edge of the city, the single-family land price for ZIP codes at the edge of the city should be closely linked to the agricultural land price. Panel (c) shows the agricultural land price per acre, as measured by the U.S. Department of Agriculture, is highly correlated with single-family residential land prices.

5.3. Variation in Land Use

Land prices and land use are also correlated in the neoclassical model. Each economic sector has its own production function, and these production functions include land as an input directly (e.g. agriculture) and/or indirectly (interior space, e.g. offices, retail, manufacturing, and housing). Profit maximization results in a marginal willingness to pay for land based on its proximity to the CBD, commonly referred to in urban economics as a "bid-rent" curve. In part because different sectors have different bid-rent curves, sectors sort into regions of the city. Traditionally, agriculture is the most land intensive and thus occupies the cheapest land far from the CBD. Conversely, office space is the least land intensive and occupies the densest structures in the CBD to take advantage of agglomeration externalities. In practice, at each distance from the CBD, there is a mixture of land uses within a particular area, though land use tends to follow the theoretical predictions.

Our land prices are for single-family housing units, yet due to the spatial equilibrium condition, these should be highly correlated with land use types for other sectors. The panels in Figure 8 show the fraction of the land area in respective sectors as a function of the single-family land prices we have estimated. Land use classifications are generated by mapping the CoreLogic land use codes to create the aggregate categories presented in the figures.

Panel (a) shows the three residential land use types as a function of the single-family land price per acre. All three residential land use types increase in share with the land price up to about $5 million per acre, at which point, single-family detached land use declines while the other two continue increasing. Panel (b) shows industrial and agricultural land use as a function of the single-family land price per acre. Agricultural land use monotonically declines with the single-family land price, highlighting the reservation use of agricultural land in urban areas. Conversely, industrial land use almost monotonically increases with the land price. Panel (c) presents three different commercial land use types, retail, office, and other. Retail and office land use rise with the single-family land price until about $15 million per acre, at which point, office increases dramatically, mainly at the expense of retail. Panel (d) shows vacant structure and vacant land use. Both are declining as a function of the single-family land price, reflecting the increasing opportunity cost of leaving a potentially productive asset in an unproductive state.

5.4. Land Price Dynamics

Bogin, Doerner, and Larson (2018) document patterns of house price appreciation in different parts of the city and across cities of various sizes and types. Based on the insights of Davis and Palumbo (2008) and other papers mentioned earlier, this should reflect different rates of appreciation of land prices after controlling for the value of housing that is attributable to the replacement cost of structures.

Why does the price of land increase at different rates in different places? Table 3 shows the results of four models for land-price changes over 2012-2017 estimated using ordinary least squares. Model 1 contains CBSA-specific fixed effects and serves the purpose of focusing on within-city land price dynamics. Models 2 through 4 do not contain CBSA fixed effects and therefore are able to capture partial correlations across cities based on city-level attributes.

Model 1 shows the FAR and initial average house value to be positively associated with land price appreciation, whereas the distance to the CBD is negatively associated with land appreciation. Combined, these results indicate that areas near the CBD have land prices that have been appreciating faster than areas far from the CBD, holding city price levels constant.

Model 2 evaluates the effect of city size on land price appreciation and finds that appreciation is positively related to the number of housing units in the CBSA. Model 3 estimates the appreciation gradient as it relates to city size. The coefficient on the distance variable is 0.0036 and the coefficient on the distance x housing units (log) variable is -0.00028, giving a flat gradient in a city with 340,000 housing units, and negatively (positively) sloped appreciation gradients in larger (smaller) cities. Model 4 shows each of the previous two models' results to be robust, while also demonstrating housing market regulation to be positively correlated with land price appreciation. In sum, these results echo the findings of Bogin, Doerner, and Larson (2018). The recent data suggest land prices are appreciating in much the same manner as house prices--fastest near the centers of large cities.
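As a quick check on the break-even city size implied by Model 3, the appreciation gradient with respect to distance equals the distance coefficient plus the interaction coefficient times log housing units; setting that expression to zero gives the city size with a flat gradient. The sketch below uses the unrounded Model 3 point estimates from Table 3. The function and variable names are illustrative and do not come from the paper.

```python
import math

# Model 3 point estimates from Table 3 (variable names are illustrative).
BETA_DISTANCE = 0.00363        # coefficient on distance to the CBD (miles)
BETA_INTERACTION = -0.000285   # coefficient on distance x log(housing units)

def appreciation_gradient(housing_units: float) -> float:
    """Effect of one more mile from the CBD on 2012-2017 log land price growth."""
    return BETA_DISTANCE + BETA_INTERACTION * math.log(housing_units)

# The gradient is flat where BETA_DISTANCE + BETA_INTERACTION * ln(units) = 0.
flat_gradient_units = math.exp(-BETA_DISTANCE / BETA_INTERACTION)

print(round(flat_gradient_units))
print(appreciation_gradient(1_000_000))  # larger city: negative gradient
print(appreciation_gradient(100_000))    # smaller city: positive gradient
```

With the rounded coefficients quoted in the text (0.0036 and -0.00028) the implied break-even size is somewhat larger, so the 340,000 figure is best read as coming from the unrounded estimates.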

5.5. House and Land Price Dynamics

The stylized fact that land and house prices have been rising together should not come as a surprise, as land prices and house prices are linked through the housing production function. Previous studies have shown land prices to be more volatile than the price of housing, most notably in the land-leverage literature (e.g. Bostic, Longhofer, and Redfearn, 2007, and Davis and Heathcote, 2007).

This relationship holds in our data as well. Figure 9 plots the average annual growth rate of land prices (horizontal axis) against the average annual growth rate of house prices from Bogin, Doerner, and Larson (2018) (vertical axis) between 2012 and 2017, a period of recovery from the housing bust. The 45 degree line shows cases where growth rates are identical. The figure shows the range of changes to land prices is larger than that of house prices. A simple OLS regression relating house and land price appreciation gives an elasticity of 0.44 over this sample, indicating a 10% change in land prices is associated with a 4.4% change in house prices. (15)

6 Conclusion

Although it is widely recognized that booms and busts in house prices reflect, in part, booms and busts in underlying land prices, until recently little data was available to study land prices. We help fill this gap by using a very large data set of appraisals to generate annual panel data from 2012 through 2017 of the average price of land per acre used in single-family homes for 964 counties, 8,344 ZIP codes, and 11,494 census tracts. We also calculate pooled cross-sectional estimates of land price per acre for 1,758 counties, 15,450 ZIP codes, and 38,539 census tracts to facilitate analysis of gradients and long-run, location-specific spatial variation. In our work, we use a new model of the option to tear down and rebuild a house to guide the maximum effective age of the housing stock with which to calculate unbiased estimates of land prices. For properties without appraisal data, we interpolate land prices using a Kriging procedure, which provides more accurate estimates than other interpolation methods.

Overall, we document a number of properties of the level and growth rate of land prices that are generally consistent with predictions of traditional models of urban economics. We expect that future researchers will use the data we generate to build on our results, and current and future policy-makers will monitor these data to better understand emerging risks in housing markets.

References

ALBOUY, D., G. EHRLICH, AND M. SHIN (2018): "Metropolitan Land Values," Review of Economics and Statistics, 100(3), 101-120.

ALONSO, W. (1964): Location and land use: Toward a general theory of land rent. Harvard University Press.

BASU, S., AND T. G. THIBODEAU (1998): "Analysis of spatial autocorrelation in house prices," Journal of Real Estate Finance and Economics, 17(1), 61-85.

BOGIN, A., W. DOERNER, AND W. LARSON (2018): "Local House Price Dynamics: New Indices and Stylized Facts," Forthcoming, Real Estate Economics.

BOSTIC, R. W., S. D. LONGHOFER, AND C. L. REDFEARN (2007): "Land leverage: Decomposing home price dynamics," Real Estate Economics, 35(2), 183-208.

BRUECKNER, J. K. (1987): "The structure of urban equilibria: A unified treatment of the Muth - Mills model," Handbook of Regional and Urban Economics, 2, 821-845.

CASE, K. E., AND R. J. SHILLER (1989): "The Efficiency of the Market for Single-Family Homes," American Economic Review, 79(1), 125-137.

DAVIDOFF, T., ET AL. (2016): "Supply constraints are not valid instrumental variables for home prices because they are correlated with many demand factors," Critical Finance Review, 5(2), 177-206.

DAVIS, M. A., AND J. HEATHCOTE (2007): "The price and quantity of residential land in the United States," Journal of Monetary Economics, 54(8), 2595-2620.

DAVIS, M. A., S. D. OLINER, E. J. PINTO, AND S. BOKKA (2017): "Residential land values in the Washington, DC metro area: New insights from big data," Regional Science and Urban Economics, 66, 224-246.

DAVIS, M. A., AND F. ORTALO-MAGNE (2011): "Household Expenditures, Wages, Rents," Review of Economic Dynamics, 14(2), 248-261.

DAVIS, M. A., AND M. G. PALUMBO (2008): "The price of residential land in large U.S. cities," Journal of Urban Economics, 63(1), 352-384.

GLAESER, E. L., AND J. GYOURKO (2005): "Urban Decline and Durable Housing," Journal of Political Economy, 113(2), 345-375.

GYOURKO, J., A. SAIZ, AND A. SUMMERS (2008): "A new measure of the local regulatory environment for housing markets: The Wharton Residential Land Use Regulatory Index," Urban Studies, 45(3), 693-729.

HAUGHWOUT, A., J. ORR, AND D. BEDOLL (2008): "The price of land in the New York metropolitan area," Federal Reserve Bank of New York: Current Issues in Economics and Finance, 14(3).

HENGL, T. (2007): A Practical Guide to Geostatistical Mapping of Environmental Variables. Luxembourg: Office for Official Publications of the European Communities.

KATZ, A. J., AND S. W. HERMAN (1997): "Improved estimates of fixed reproducible tangible wealth," Survey of Current Business, 77(5), 69-92.

MILLS, E. S. (1967): "An Aggregative Model of Resource Allocation in a Metropolitan Area," American Economic Review, 57(2), 197-210.

MUTH, R. F. (1969): Cities and housing; the spatial pattern of urban residential land use. University of Chicago Press.

NICHOLS, J. B., S. D. OLINER, AND M. R. MULHALL (2013): "Swings in commercial and residential land prices in the United States," Journal of Urban Economics, 73(1), 57-76.

SAIZ, A. (2010): "The Geographic Determinants of Housing Supply," The Quarterly Journal of Economics, 125(3), 1253-1296.

SAKS, R. E. (2008): "Job Creation and Housing Construction: Constraints on Metropolitan Area Employment Growth," Journal of Urban Economics, 64(1), 178-195.

A Appendix: Spatial Interpolation Methods

This section describes various interpolation methods discussed in the paper. Our land price index approach begins by dividing the universe of parcels N within a county into those that are sampled [N.sup.s] and unsampled [N.sup.u]. All indices are calculated as the simple average of the price estimate (or actual value, when available) of each individual parcel within the county or county subaggregate (e.g. census tract). Accordingly, differences between indices within a geography arise due to the method used to estimate prices for unsampled parcels.

The price of an unsampled parcel is estimated as a weighted average of prices of a subset of observed parcels within the county. The estimated price for parcel i is calculated as the average of the n nearest (by proximity) neighbor parcels, indexed by j, with weights [[lambda].sub.i,j]. (16)

For a particular method, the estimated price for parcel i is:

$$\hat{p}_i = \sum_{j=1}^{n} \lambda_{i,j}\, p_{i,j} \tag{8}$$

The weights are assumed to sum to unity, or [summation][[lambda].sub.i,j] = 1.

1.1. Spatial Statistics

Spatial statistical methods do not consider spatial relations in outcomes, only proximity. Here, we discuss three spatial statistics that are commonly used to interpolate prices spatially. The general form of each of these statistics is below, where h is the distance between the location to be imputed i, and another nearby sampled location j that is one of the n nearest locations. The exponent c gives the degree of decay in the weight that is due to distance between the parcels.

$$\lambda_{i,j} = \frac{h_{i,j}^{-c}}{\sum_{k=1}^{n} h_{i,k}^{-c}} \tag{9}$$

Null Estimator

The null estimator ("Null") sets n=[N.sup.s] and c = 0, giving [[lambda].sub.i,j] = 1/n. This gives the estimate of an individual parcel as the sample average.

Nearest Neighbor

The nearest neighbor estimator ("NN") also sets c = 0, giving [[lambda].sub.i,j] = 1/n. But n is typically set to the 5 to 25 nearest observed prices. This gives the estimate of a parcel as the sample average of nearby parcels.

Inverse-distance weights

The inverse-distance weight estimator ("IDW") sets n generally within the range of 5 to 25 nearest observed prices as with the NN estimator. The calculation of [lambda] then involves assuming an exponent c. This exponent is commonly set equal to c = 2, giving a relation between points that declines with the square of the distance. In this case, $\lambda_{i,j} = h_{i,j}^{-2} / \sum_{k=1}^{n} h_{i,k}^{-2}$.
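The three spatial statistics differ only in the number of neighbors and the decay exponent, which a short sketch can make concrete. It assumes weights proportional to distance raised to the power -c and normalized to sum to one, which matches the special cases described above (c = 0 gives 1/n). The function names and the three-parcel example are hypothetical.

```python
def interpolation_weights(distances, c):
    """Weights proportional to distance**(-c), normalized to sum to one.

    c = 0 gives equal weights (Null or NN, depending on how many neighbors
    are passed in); c = 2 gives inverse-squared-distance weights.
    Distances are assumed to be strictly positive.
    """
    raw = [h ** (-c) for h in distances]
    total = sum(raw)
    return [w / total for w in raw]

def interpolate(neighbor_prices, distances, c):
    """Weighted average of neighbor prices for one unsampled parcel."""
    weights = interpolation_weights(distances, c)
    return sum(w * p for w, p in zip(weights, neighbor_prices))

# Hypothetical example: three sampled neighbors of an unsampled parcel.
prices = [12.0, 11.0, 10.0]   # e.g. log land price per acre
dists = [1.0, 2.0, 4.0]       # distance from each neighbor to the parcel

nn_estimate = interpolate(prices, dists, c=0)   # simple average of neighbors
idw_estimate = interpolate(prices, dists, c=2)  # closer parcels count more

print(nn_estimate, idw_estimate)
```

In this example the IDW estimate sits closer to the nearest neighbor's price than the NN estimate does, because the nearest parcel receives most of the weight.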

1.2. Geostatistics

In addition to spatial statistics, we include a single geostatistical estimator: ordinary kriging. As with the nearest neighbor estimator, n nearest neighbors are weighted and summed to generate predicted prices. [lambda] is calculated based on the strength of the observed relationship between observations of different proximities within the sample.

Derivation of the weights proceeds in five steps. The first step involves calculating pairwise differences in values between each pair in the sample within a certain distance range. The next step collapses and bins the semivariances (half of the squared differences) into averages by the distance between the points. The third step fits a curve, often referred to as a "variogram," to this set of binned averages. The fourth step applies this estimated curve to estimate covariances between values in an unsampled location and nearby sampled locations. The fifth step uses these fitted covariances to construct the weights. (17)

As an illustration of this procedure, we present kriging steps for land prices in Washington, DC, pooled between 2012 and 2017. There are about 122,000 parcels and 4,000 sampled standardized land prices. To start, differences and semivariances [gamma] are calculated for each pair of points in N, resulting in over 9 million pairwise combinations that are within 10 miles of each other. (18)

The results from the first two kriging steps are shown in figure A.1. The hollow circles represent semivariance averages within each of the 15 distance bins. Exact values are shown in the table, including the distance, the number of pairs, and the average semivariance. Distances are reported in terms of distance in latitude/longitude degrees. (19) The bin at a distance of 0.05 therefore corresponds to a distance of 3.11 miles, with an average semivariance of 1.33 over 992,298 pairwise semivariances. The average semivariance of 1.33 corresponds to an average variance of 2.66, or an average standard deviation of 1.63, in terms of the log land price per acre, for any hypothetical pair of points with the distance between them referenced by the bin.
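The first two kriging steps (pairwise semivariances, then averages by distance bin) can be sketched in a few lines. This is a minimal illustration on toy data, not the procedure used to produce figure A.1; the function name, the equal-width bins, and the four-point example are assumptions made for the sketch.

```python
import math
from itertools import combinations

def binned_semivariances(points, values, bin_width, max_distance):
    """Average semivariance (half the squared difference) by distance bin.

    points: list of (x, y) coordinates; values: list of observed values.
    Returns {bin_index: (n_pairs, mean_semivariance)} for pairs closer than
    max_distance. Bin k covers [k * bin_width, (k + 1) * bin_width).
    """
    sums, counts = {}, {}
    for (i, a), (j, b) in combinations(enumerate(points), 2):
        h = math.dist(a, b)           # Euclidean distance between the pair
        if h >= max_distance:
            continue
        k = int(h // bin_width)
        gamma = 0.5 * (values[i] - values[j]) ** 2
        sums[k] = sums.get(k, 0.0) + gamma
        counts[k] = counts.get(k, 0) + 1
    return {k: (counts[k], sums[k] / counts[k]) for k in sorted(sums)}

# Toy data: four points on a line with values rising one unit per step.
toy_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
toy_values = [0.0, 1.0, 2.0, 3.0]
toy_bins = binned_semivariances(toy_points, toy_values, 1.0, 2.5)
print(toy_bins)

# Conversion used in the text: semivariance 1.33 -> variance 2.66 -> std. dev.
implied_sd = math.sqrt(2 * 1.33)
print(round(implied_sd, 2))
```

The loop visits every pair, so its cost grows with the square of the sample size; that is manageable for the roughly 4,000 sampled prices in the Washington, DC example.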

In step three, a functional form for the relationship between the semivariances and the distances (the hollow circles) must then be assumed and fit to the data. The fitted curve, as shown by the blue line, is typically upward sloping, indicating the greater the distance the higher the variance. The spherical functional form has three parameters, [a.sub.0], [a.sub.1], and r, that we estimate:

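$$\gamma(h; a_0, a_1, r) = \begin{cases} a_0 + a_1\left[\dfrac{3h}{2r} - \dfrac{h^3}{2r^3}\right] & \text{if } 0 < h \le r \\ a_0 + a_1 & \text{if } h > r \end{cases} \tag{10}$$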

The three parameters combine to give the "sill" which is the value to which the variogram asymptotically approaches as the distance between points approaches infinity, or [a.sub.0]+[a.sub.1]; the "nugget" which is the value of the variogram when distance approaches zero, or [a.sub.0], and the "range," r, which is the value of h when the variogram reaches the sill. (20)

In addition to the binned semivariances, figure A.1 also shows a fitted spherical functional form for Washington, DC, with [a.sub.0] = 0.53, [a.sub.1] = 0.89 and r = 4.35 miles. This function is used to estimate the semivariance between any two hypothetical points, facilitating interpolation of prices in unsampled locations. (21)

The fourth step specifies the function transforming the fitted semivariances to covariances. For any two points j and k, a distance [h.sub.j,k] is calculated. Then, when combined with the fitted variogram parameters, the covariance, C, between the points is estimated as:

C([h.sub.j,k]; [a.sub.0], [a.sub.1], r) = [a.sub.0] + [a.sub.1] - [gamma] ([h.sub.j,k]; [a.sub.0], [a.sub.1], r) (11)
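Steps three and four can be illustrated with the fitted Washington, DC parameters reported above. The sketch assumes the standard spherical variogram (nugget [a.sub.0], partial sill [a.sub.1], range r) and the usual convention that the semivariance at a distance of exactly zero is zero; the function names are illustrative.

```python
def spherical_variogram(h, a0, a1, r):
    """Spherical semivariance with nugget a0, partial sill a1, and range r."""
    if h <= 0:
        return 0.0              # convention: a point has no semivariance with itself
    if h >= r:
        return a0 + a1          # the sill
    ratio = h / r
    return a0 + a1 * (1.5 * ratio - 0.5 * ratio ** 3)

def covariance(h, a0, a1, r):
    """Covariance implied by the variogram: sill minus semivariance."""
    return a0 + a1 - spherical_variogram(h, a0, a1, r)

# Fitted values for Washington, DC reported in the text (r in miles).
DC = dict(a0=0.53, a1=0.89, r=4.35)

for miles in (0.5, 1.0, 2.0, 4.35, 10.0):
    print(miles,
          round(spherical_variogram(miles, **DC), 3),
          round(covariance(miles, **DC), 3))
```

Under this form the covariance is zero beyond the range, so sampled parcels more than 4.35 miles away carry no information about an unsampled parcel beyond the unbiasedness constraint.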

The final step involves constructing the matrices and performing the operations necessary to arrive at the weights. First, construct [c.sub.i], an n x 1 vector of estimated covariances between an unsampled location i and its n nearest points. Then, we construct [C.sub.i], an n x n covariance matrix of the n nearest points. These matrices are augmented in the standard fashion with a Lagrange multiplier and column/row vectors of ones and a zero to normalize the weights to sum to one. These give the weights in [[lambda].sub.i], an n x 1 vector.

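$$\begin{bmatrix} \lambda_i \\ \mu_i \end{bmatrix} = \begin{bmatrix} C_i & \mathbf{1} \\ \mathbf{1}' & 0 \end{bmatrix}^{-1} \begin{bmatrix} c_i \\ 1 \end{bmatrix} \tag{12}$$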
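The final step amounts to solving one small linear system per unsampled location. The sketch below assumes the standard ordinary kriging system in covariance form (the neighbor covariance matrix bordered by ones and a zero, with a Lagrange multiplier) and a spherical covariance function; the small Gaussian elimination routine and all names are illustrative, written without external libraries.

```python
import math

def spherical_cov(h, a0, a1, r):
    """Covariance = sill - spherical semivariance (semivariance is 0 at h = 0)."""
    if h <= 0:
        return a0 + a1
    if h >= r:
        return 0.0
    ratio = h / r
    return a1 * (1.0 - 1.5 * ratio + 0.5 * ratio ** 3)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r_: abs(M[r_][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            f = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= f * M[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(M[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (M[row][n] - s) / M[row][row]
    return x

def kriging_weights(target, neighbors, a0, a1, r):
    """Ordinary kriging weights for `target` given its n nearest sampled points."""
    n = len(neighbors)
    # Augmented system: [[C, 1], [1', 0]] [lambda; mu] = [c; 1]
    A = [[spherical_cov(math.dist(p, q), a0, a1, r) for q in neighbors] + [1.0]
         for p in neighbors]
    A.append([1.0] * n + [0.0])
    b = [spherical_cov(math.dist(target, p), a0, a1, r) for p in neighbors] + [1.0]
    return solve(A, b)[:n]      # drop the Lagrange multiplier

# Four neighbors placed symmetrically around the target get equal weights.
symmetric_weights = kriging_weights(
    (0.0, 0.0), [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)],
    0.53, 0.89, 4.35)
# Two neighbors on a line: the closer one gets the larger weight.
line_weights = kriging_weights((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)],
                               0.53, 0.89, 4.35)
print(symmetric_weights, line_weights)
```

The interpolated price is then the weighted sum of the neighbors' observed prices, as with the spatial statistics above.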

1.3. Comparison

We compare the fit of each spatial interpolation method by comparing actual standardized land prices for a 20% hold-out sample to predicted land prices estimated using an 80% training sample. We consider a number of values for the number of nearest neighbors and the overall distance boundary considered.

These results are shown in table A.1 in terms of the mean, median, and standard deviation of RMSEs across the 1758 counties in the pooled sample. The mean RMSE for the entire sample of observations is 0.771, representing the average variation around the county mean. This is similar to the mean of the training sample, which gives a mean RMSE of 0.767 when compared to the 20% of the observations in the hold-out sample. Nearest neighbor and inverse-distance weights give increasing accuracy, with mean RMSEs falling to 0.569 and 0.522, respectively.

Each of the kriging estimates gives similar average fit, with the mean RMSE either 0.497 or 0.496. We interpret this finding as indicating that the boundary is rarely binding for finding nearest neighbors, and has little effect on the variogram. Overall, we interpret these findings as lending support to our decision to use the 20 nearest neighbors and a 5 mile boundary in our county-specific Kriging procedure.
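The hold-out comparison itself is simple to express. The sketch below splits a hypothetical set of 100 parcels on a line into a 20% hold-out sample and an 80% training sample, then compares the RMSE of the null estimator (training-sample mean) with a nearest-neighbor average. The data, the one-dimensional layout, and the choice of four neighbors are assumptions for illustration only.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def nn_predict(x0, train, n):
    """Average value of the n training points nearest to location x0."""
    nearest = sorted(train, key=lambda xv: abs(xv[0] - x0))[:n]
    return sum(v for _, v in nearest) / n

# Hypothetical data: 100 parcels on a line with a smooth price gradient.
parcels = [(float(x), 0.05 * x) for x in range(100)]
holdout = parcels[::5]                                     # 20% hold-out sample
train = [p for i, p in enumerate(parcels) if i % 5 != 0]   # 80% training sample

actual = [v for _, v in holdout]
null_mean = sum(v for _, v in train) / len(train)
rmse_null = rmse(actual, [null_mean] * len(holdout))
rmse_nn = rmse(actual, [nn_predict(x, train, n=4) for x, _ in holdout])

print(round(rmse_null, 3), round(rmse_nn, 3))
```

When prices vary smoothly over space, as in this toy example, any estimator that uses nearby observations will beat the sample mean, which is the pattern table A.1 reports for the actual data.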

B Appendix: Monte Carlo Simulation of Standard Urban Model

Assume that a city lies on a featureless plane with a region called the central business district (CBD) at its center. This district provides all employment and because commuting is costly in a way we specify precisely, households wish to live near the CBD. Spatial equilibrium requires identical households to have identical utility in all locations in the city. As we show, this implies that households consume less housing at a higher price near the CBD. Additionally, the housing production function implies housing is produced with high density and structure intensity near the CBD where land prices are high, and with low density near the edge of the city.

To be precise, assume a person consuming c units of consumption and h units of housing receives utility of

(1-[alpha])ln c+[alpha] ln h (13)

If a person lives distance d from the city center, their wage after commuting is w (1-td) where t is the percentage of income that must be paid to commute for each unit of distance d. Denote the rental cost per unit of housing d units from the city center as [q.sup.h.sub.d]. A person living d units from the city center faces the budget constraint of:

w (1-td) = c + [q.sup.h.sub.d]h (14)

A person choosing to live in location d units away from the CBD maximizes utility (13) subject to the budget constraint (14) by choosing optimal consumption [c.sub.d] and housing [h.sub.d] of

[c.sub.d] = (1-[alpha]) w (1-td) (15)

[q.sup.h.sub.d][h.sub.d] = [alpha]w (1-td) (16)

This means maximized utility at distance d from the center can be written as [U.sub.d]

[U.sub.d] = (1-[alpha]) ln ((1-[alpha]) w (1-td)) + [alpha] ln ([alpha] w (1-td) / [q.sup.h.sub.d]) = [[kappa].sub.u] + ln w + ln (1-td) - [alpha] ln [q.sup.h.sub.d] (17)

where [[kappa].sub.u] is a constant equal to [alpha] ln [alpha] + (1-[alpha])ln(1-[alpha]). In equilibrium, we assume all locations have to provide the same utility, for example location d and d' must satisfy

[U.sub.d] = [U.sub.d']

Then from equation (17) this implies

$$\ln q^h_d - \ln q^h_{d'} = \frac{1}{\alpha}\left[\ln(1-td) - \ln(1-td')\right] \tag{18}$$

Equation (18) governs the rate at which housing rental prices per unit change with distance from the CBD, roughly t/[alpha] percent per unit of d.

Note that we can also work out how the quantity of housing changes as a function of distance to the CBD. We start by using the definition of utility and substituting in optimal consumption but keeping housing

[U.sub.d] = (1-[alpha]) ln [(1-[alpha]) w (1-td)]+[alpha] ln [h.sub.d] (19)

Once we impose [U.sub.d] = [U.sub.d]', this gives us

$$\ln h_d - \ln h_{d'} = -\frac{1-\alpha}{\alpha}\left[\ln(1-td) - \ln(1-td')\right] \tag{20}$$

Now that we have worked out how housing quantities h and prices per unit [q.sup.h.sub.d] vary from the city center, we can also work out how the quantities and prices of land and structures change with distance. Temporarily suppressing the distance subscripts, assume competitive builders build housing using land l and structures s according to a CES production function

$$h = \left[\theta\, l^{\rho} + (1-\theta)\, s^{\rho}\right]^{1/\rho} \tag{21}$$

with [rho] [member of] (-[infinity],1]. Assume each unit of housing generates revenue of [q.sup.h]; further, assume each unit of land costs [q.sup.l] and each unit of structure costs 1. Builders maximize

$$\max_{l,\,s}\; q^h \left[\theta\, l^{\rho} + (1-\theta)\, s^{\rho}\right]^{1/\rho} - q^l l - s \tag{22}$$

The first-order conditions for optimal structures are

1 = [q.sup.h][h.sup.1-[rho]] (1 - [theta]) [s.sup.[rho]-1] (23)

$$s = h\left[q^h (1-\theta)\right]^{1/(1-\rho)} \tag{24}$$

This implies that once we know [q.sup.h] and h, we also know s. Note that because we know s, we also know [q.sup.l]l = [q.sup.h] h - s. Now consider the first-order condition for optimal land:

[q.sup.l]l = [q.sup.h] [h.sup.1-[rho]] [theta][l.sup.[rho]] (25)

and thus

$$l = \left[\frac{q^h h - s}{\theta\, q^h h^{1-\rho}}\right]^{1/\rho} \tag{26}$$

Given a set of parameters, we can compute how quantities and prices and expenditures on housing, structures and land change with distance from a CBD. For a rough calibration, we set [alpha] = 0.25 based on the median housing budget shares of renters as documented by Davis and Ortalo-Magne (2011). The other parameters we set to match some approximate features of a city. We set t = 0.02 such that people 10 miles from the CBD consume about double the housing of people at the CBD but spend 20% less. (22) We jointly set [theta] = 0.90 and [rho] = -2.0 such that land's share of value rises from about 15% at 10 miles from the CBD to about 55% at the CBD. We normalize the price per unit of housing to 1 at the CBD and normalize the quantity of housing consumed at the CBD such that the total expenditure at the CBD is for a $1 million house. As noted earlier, we assume the price per unit of housing structure is 1.0 everywhere in the metro area. Table A.2 shows prices, quantities and expenditures on housing, structures and land as well as land's share of house value, the quantity of land once we normalize the size of a single-family plot at the CBD to 0.25 acres, and land price per acre.
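The calibrated model can be evaluated in closed form. The sketch below is one way to do so, assuming the CES technology places weight [theta] on land and 1-[theta] on structures, that spatial equilibrium gives a housing price proportional to (1-td) raised to 1/[alpha], and that builders earn zero profit so land value is the residual of house value over structure cost. Function and variable names are illustrative; its output can be compared with the entries of table A.2.

```python
ALPHA, T = 0.25, 0.02       # housing budget share; commuting cost per mile
THETA, RHO = 0.90, -2.0     # CES weight on land; CES substitution parameter
H0 = 1_000_000.0            # housing quantity at the CBD (unit price is 1 there)
ACRES_AT_CBD = 0.25         # normalized lot size at the CBD

def model_row(d):
    """Prices, quantities, and expenditures d miles from the CBD."""
    net = 1.0 - T * d
    q_h = net ** (1.0 / ALPHA)                             # housing price per unit
    h = H0 * net ** (-(1.0 - ALPHA) / ALPHA)               # housing quantity
    s = h * (q_h * (1.0 - THETA)) ** (1.0 / (1.0 - RHO))   # structures
    land_value = q_h * h - s                               # zero-profit residual
    l = (land_value / (THETA * q_h * h ** (1.0 - RHO))) ** (1.0 / RHO)
    return {"q_h": q_h, "h": h, "house_value": q_h * h, "s": s,
            "land_value": land_value,
            "land_share": land_value / (q_h * h), "l": l}

def price_per_acre(d):
    """Land price per acre, scaling land so the CBD lot is ACRES_AT_CBD."""
    row = model_row(d)
    acres = ACRES_AT_CBD * row["l"] / model_row(0)["l"]
    return row["land_value"] / acres

for d in range(11):
    row = model_row(d)
    print(d, round(row["q_h"], 3), round(row["house_value"]), round(row["s"]),
          round(row["land_share"], 2), round(price_per_acre(d)))
```

Because the price of structures is fixed at 1 everywhere, all of the spatial variation in house value per unit is absorbed by the land price, which is why the land price per acre falls much faster with distance than house value does.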

We simulate two data sets based on the calibration of this model. In both data sets, we draw 100 observations for houses in neighborhoods uniformly between 0 and 3.5 miles from the CBD; 200 observations for houses in neighborhoods uniformly between 3.5 and 7.5 miles from the CBD; and 300 observations for houses in neighborhoods uniformly between 7.5 miles and 10 miles from the CBD. In the first data set we assume no quantities or prices are measured with error. This enables us to see the accuracy of the Kriging procedure with regards to this application in an ideal environment.

In the second data set, we allow for i.i.d. measurement error in both the value of housing and the value of structures. (23) This simulation gives us some intuition for how the Kriging procedure performs under conditions where land is imperfectly measured. Denote $\tilde{v}_d$ as observed housing value and $\tilde{s}_d$ as observed structures costs. $\tilde{v}_d$ and $\tilde{s}_d$ are determined as

$$\tilde{v}_d = q^h_d h_d \left(1 + e^h_d\right), \qquad e^h_d \sim U[-0.10, 0.10]$$

$$\tilde{s}_d = s_d \left(1 + e^s_d\right), \qquad e^s_d \sim U[-0.10, 0.10]$$

with [e.sup.h.sub.d] and [e.sup.s.sub.d] drawn independently. We then compute observed land value residually,

$$q^h_d h_d \left(1 + e^h_d\right) - s_d \left(1 + e^s_d\right) = q^l_d l_d + \left(q^h_d h_d\, e^h_d - s_d\, e^s_d\right)$$

Denote the term in parentheses as [e.sup.l.sub.d], measurement error in land value. Even though the standard deviations of [e.sup.h.sub.d] and [e.sup.s.sub.d] are relatively small (5.7 percent each), the standard deviation of measurement error as a percent of true land value, measured as [e.sup.l.sub.d]/ ([q.sup.l.sub.d][l.sub.d]), in this second data set is much larger, 27 percent. The measurement error for land value is magnified because land value is residually measured and accounts for a relatively small fraction of home value (as we discuss earlier in the paper).
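The magnification of measurement error can be seen in a small simulation. The sketch below follows the sampling design described above (100, 200, and 300 draws in the three distance bands) and applies independent uniform errors on [-0.10, 0.10] to house value and structure cost, whose standard deviation is 0.2 divided by the square root of 12. The closed-form model values, the random seed, and all names are assumptions made for the illustration, so the result will differ somewhat from the figure reported in the text.

```python
import random
import statistics

ALPHA, T, THETA, RHO = 0.25, 0.02, 0.90, -2.0

def true_values(d):
    """True house value and structure value d miles from the CBD."""
    net = 1.0 - T * d
    q_h = net ** (1.0 / ALPHA)
    h = 1_000_000.0 * net ** (-(1.0 - ALPHA) / ALPHA)
    s = h * (q_h * (1.0 - THETA)) ** (1.0 / (1.0 - RHO))
    return q_h * h, s

rng = random.Random(0)   # fixed seed so the sketch is reproducible
# Sampling design from the text: 100, 200, and 300 draws in three bands.
distances = ([rng.uniform(0.0, 3.5) for _ in range(100)]
             + [rng.uniform(3.5, 7.5) for _ in range(200)]
             + [rng.uniform(7.5, 10.0) for _ in range(300)])

house_errors, land_errors = [], []
for d in distances:
    house_value, s = true_values(d)
    e_h = rng.uniform(-0.10, 0.10)   # error in observed house value
    e_s = rng.uniform(-0.10, 0.10)   # error in observed structure cost
    true_land = house_value - s
    observed_land = house_value * (1 + e_h) - s * (1 + e_s)
    house_errors.append(e_h)
    land_errors.append((observed_land - true_land) / true_land)

sd_house = statistics.pstdev(house_errors)
sd_land = statistics.pstdev(land_errors)
print(round(sd_house, 3), round(sd_land, 3))
```

The relative error in land value is largest far from the CBD, where land is a small share of house value, and the sampling design places half of the observations in that outer band.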

Table A.3 compares the estimates from Kriging for land price per acre by distance to CBD to the numbers we compute analytically in table A.2 for the simulated data set measured without error (data set 1) and the simulated data set with measurement error (data set 2) for 0 to 9 miles to the CBD. When the data are measured without error, Kriging nearly perfectly replicates the analytic results. When the data are measured with error, the Kriging results are less accurate. The average error is 4.2%.

Morris A. Davis, William D. Larson (*), Stephen D. Oliner, Jessica Shui

January 2019

(*) Corresponding Author: William Larson (william.larson@fhfa.gov).

(1) These indices are works in progress and all data, tables, figures, and other results in this working paper are subject to change.

(2) See Nichols, Oliner, and Mulhall (2013) and Davis, Oliner, Pinto, and Bokka (2017) for additional references in this literature.

(3) We also create a pooled cross-section of land prices for many more localities than the panel. This database includes land prices for 1,758 counties, 15,450 ZIP codes, and 38,539 census tracts. The pooled cross section also includes partial-year data for 2018 that are excluded from the annual panel dataset.

(4) Albouy, Ehrlich, and Shin (2018) produce estimates for nearly all Primary Metropolitan Statistical Areas (PMSAs) in the United States, but they do not report on or make available data for any finer level of geography. On average, they observe only 212 direct land sales per PMSA.

(5) We exclude data on vacant land sales.

(6) We obtain the universe of single-family housing units in a given geography from the "Assessor Data" licensed from CoreLogic. The CoreLogic data contain the near-universe of parcels in the counties for which it has assembled data.

(7) This is the argmax of equation (3).

(8) We exclude appraisal records with 1) lots smaller than 500 square feet or larger than 2 acres; 2) property value missing or less than $10,000; 3) cost-approach-estimated site value missing or less than $200; 4) land price per acre smaller than $200; 5) site value greater than cost-approach-estimated property value; 6) missing depreciation information or depreciation at least three times greater than the contract price or the appraised value; 7) land share of property value less than 1% or greater than 99%; 8) structure-land area ratio less than 0.01 or greater than 10; 9) construction date before 1850 or after 2018.

(9) The U.S. Bureau of Economic Analysis uses a service life of 80 years for new 1-4 unit residential structures; see Katz and Herman (1997).

(10) When ZIP codes span multiple counties, the reported value is the average of values in each represented county, weighted by the single-family housing stock share.

(11) Basu and Thibodeau (1998) conduct an analysis of spatial autocorrelation in housing prices by comparing predictions from hedonic models to models with spatially autocorrelated errors. They find that traditional hedonic models are more accurate when unexplained price variation is spatially uncorrelated; otherwise, Kriging is more accurate.

(12) Our data are partitioned by county (by year). This implies that the Kriging procedure never considers any pairwise points where one of the points is not in the county. Restated, the 5-mile cutoff will not bind in any county where the maximum distance between two locations in that county is less than 5 miles.

(13) See Alonso (1964), Mills (1967), Muth (1969), and Brueckner (1987).

(14) As noted in the introduction, we had only partial data for 2018 when the empirical work for the paper was completed. 2018 data are therefore usable in a pooled context but not in a panel.

(15) The equation [DELTA]House [Prices.sub.i] = [alpha] + [beta][DELTA]Land [Prices.sub.i] + [e.sub.i] is estimated over N = 8,154 counties indexed by i, with changes denoting the log-difference between 2017 and 2012 values. Estimates give [alpha] = 0.026 (0.001), [beta] = 0.440 (0.014), and [R.sup.2] = 0.56, with heteroskedasticity-robust standard errors in parentheses.

(16) So [p.sub.i,1] is the closest observed price to location i; [p.sub.i,2] is the second closest observed price to location i; and so on.

(17) For a more in-depth overview of kriging, see Hengl (2007).

(18) The semivariance for prices at two points i and j is half of the squared difference, $\gamma_{i,j} = \frac{1}{2}(p_i - p_j)^2$. Semi-variances are used instead of variances to facilitate construction of weights using covariances in later steps, and this is the standard in the geospatial literature. Isotropy (i.e. the direction between the points does not affect the strength of the relationship) is a standard assumption, which we make here in order to express proximity using a single variable, h.

(19) In other words, we take the square root of the sum of the squared differences between two sets of coordinates. Since our distances are generally small, we use this simplified distance measure as a proxy for the Euclidean distance.

(20) The spatial statistics previously discussed are special cases of models that can be fit to a variogram. For instance, when [a.sub.1] = 0, the kriging estimator reduces to the nearest neighbor weights. When the function is an exponential with [a.sub.0] = 0 and an exponent of 2, then the kriging estimator reduces to inverse-distance weights. The advantage of the kriging estimator is that it does not place these parameter restrictions on the spatial relation, a priori.

(21) Other functional forms are common, especially the exponential function, [mathematical expression not reproducible]. We consider the exponential function as well in some exercises.

(22) This is referring to single-family homes.

(23) This measurement error can be thought of as a deviation from model-determined prices.

Table 1: Interpolation RMSE (20% hold-out sample) Kriging IDW NN Null-Tract Null-ZIP Code 2012 0.545 0.558 0.622 0.666 0.659 2013 0.534 0.549 0.609 0.673 0.663 2014 0.538 0.556 0.615 0.666 0.665 2015 0.522 0.540 0.605 0.664 0.661 2016 0.515 0.533 0.592 0.667 0.662 2017 0.530 0.546 0.607 0.682 0.667 Null-County Hold-Out Obs 2012 0.768 148,695 2013 0.764 207,337 2014 0.766 171,719 2015 0.767 224,380 2016 0.763 275,151 2017 0.766 225,927 Notes: Interpolation RMSE calculated as follows. 1) Estimate an interpolated estimate for each hold-out parcel for each year. 2) Calculate an RMSE for each county for each year. 3) Calculate the median RMSE across counties (reported in table). IDW = inverse-distance weights, NN = nearest neighbor. Table 2: Land Price Index Counts Counties ZIP Census Population S.F. Codes Tracts Housing Units Balanced Annual Panel 964 8,344 11,494 85.8% 83.2% Pooled Cross-Section 1,758 15,450 38,539 94.2% 92.6% Notes: The county sample include all counties with at least 50 standardized land price observations within the period. The ZIP code and Census Tract samples are the subsets of the relevant geography with at least 10 standardized land price observations that exist within counties with a calculated index. When ZIP codes cross county boundaries, the ZIP code index is based on the simple average of all parcels with estimated prices in counties with a calculated index. The population and single-family housing units percentage is the fraction represented by the county index coverage, according to 2012-2016 (5 -year) ACS estimates. 
Table 3: Land Price Change Correlates, 2012-2017

Dependent variable: log(land price 2017) - log(land price 2012)

Columns [1] and [2]

| Variable | Estimate(s) | Std. error(s) |
|---|---|---|
| Initial land value share | -0.194 (***) | [0.0361] |
| Floor area - land area ratio | 0.0412 (***) | [0.00927] |
| Initial average house value | 0.0169 (***) | [0.00283] |
| Distance to the CBD (miles) | -0.000496 (***) | [0.000126] |
| Housing units in CBSA (log) | 0.00778 (**) | [0.00351] |
| Units (log) x Distance | | |
| WRLURI | | |
| Constant | -0.111 (***) / -0.05 | [0.0341] / [0.0438] |
| CBSA Fixed Effects | Yes / No | |
| Observations | 7,922 / 8,338 | |
| R-squared | 0.647 / 0.038 | |

Columns [3] and [4]

| Variable | Estimate(s) | Std. error(s) |
|---|---|---|
| Initial land value share | | |
| Floor area - land area ratio | | |
| Initial average house value | | |
| Distance to the CBD (miles) | 0.00363 (***) / 0.00434 (***) | [0.000726] / [0.00105] |
| Housing units in CBSA (log) | 0.0146 (***) / 0.0147 (***) | [0.00324] / [0.00447] |
| Units (log) x Distance | -0.000285 (***) / -0.000335 (***) | [5.15e-05] / [7.18e-05] |
| WRLURI | 0.00961 (**) | [0.00478] |
| Constant | -0.136 (***) / -0.139 (**) | [0.0394] / [0.0566] |
| CBSA Fixed Effects | No / No | |
| Observations | 8,272 / 7,062 | |
| R-squared | 0.059 / 0.073 | |

Notes: Robust standard errors in brackets, clustered by CBSA. (***) p < 0.01, (**) p < 0.05, (*) p < 0.1. The WRLURI is the Wharton Residential Land Use Regulation Index.

Table A.1: Interpolation RMSE (20% hold-out sample), alternative parametrizations

| | Mean | Median | SD |
|---|---|---|---|
| Full-Sample RMSE | 0.771 | 0.805 | 0.222 |
| 20% Hold-out Sample RMSE | | | |
| Null - County Average | 0.767 | 0.799 | 0.231 |
| NN - 20 NN, 5 Mile Boundary | 0.569 | 0.613 | 0.197 |
| IDW - 20 NN, 5 Mile Boundary | 0.522 | 0.563 | 0.181 |
| Kriging - 15 NN, 5 Mile Boundary | 0.497 | 0.536 | 0.174 |
| Kriging - 20 NN, 5 Mile Boundary | 0.497 | 0.536 | 0.175 |
| Kriging - 25 NN, 5 Mile Boundary | 0.496 | 0.537 | 0.177 |
| Kriging - 20 NN, 2.5 Mile Boundary | 0.497 | 0.540 | 0.180 |
| Kriging - 20 NN, 10 Mile Boundary | 0.497 | 0.536 | 0.170 |

Notes: Sample is the pooled cross-section (1,758 counties). Interpolation RMSE calculated as follows. 1) Estimate an interpolated estimate for each hold-out parcel. 2) Calculate an RMSE for each county for each year. 3) Calculate the median/mean/SD RMSE across counties (reported in table).

Table A.2: Predictions of Calibrated Urban Model

| d | \(q_d^h\) | \(h_d\) | \(q_d^h h_d\) | s |
|---|---|---|---|---|
| 0 | 1.000 | 1,000,000 | $1,000,000 | $464,159 |
| 1 | 0.922 | 1,062,482 | $980,000 | $480,054 |
| 2 | 0.849 | 1,130,281 | $960,000 | $496,838 |
| 3 | 0.781 | 1,203,972 | $940,000 | $514,581 |
| 4 | 0.716 | 1,284,211 | $920,000 | $533,360 |
| 5 | 0.656 | 1,371,742 | $900,000 | $553,260 |
| 6 | 0.600 | 1,467,412 | $880,000 | $574,375 |
| 7 | 0.547 | 1,572,189 | $860,000 | $596,810 |
| 8 | 0.498 | 1,687,183 | $840,000 | $620,680 |
| 9 | 0.452 | 1,813,671 | $820,000 | $646,116 |
| 10 | 0.410 | 1,953,125 | $800,000 | $673,261 |

| d | \(q_d^l\) | \(l_d\) | \(q_d^l l_d\) | land share |
|---|---|---|---|---|
| 0 | 0.413 | 1,295,995 | $535,841 | 54% |
| 1 | 0.354 | 1,411,219 | $499,946 | 51% |
| 2 | 0.300 | 1,543,749 | $463,162 | 48% |
| 3 | 0.251 | 1,697,826 | $425,419 | 45% |
| 4 | 0.206 | 1,879,309 | $386,640 | 42% |
| 5 | 0.165 | 2,096,587 | $346,740 | 39% |
| 6 | 0.129 | 2,362,219 | $305,625 | 35% |
| 7 | 0.098 | 2,696,126 | $263,190 | 31% |
| 8 | 0.070 | 3,132,449 | $219,320 | 26% |
| 9 | 0.047 | 3,736,426 | $173,884 | 21% |
| 10 | 0.027 | 4,655,227 | $126,739 | 16% |

| d | \(l_d\) (acres) | \(q_d^l\) per acre |
|---|---|---|
| 0 | 0.25 | $2,143,364 |
| 1 | 0.27 | $1,836,505 |
| 2 | 0.30 | $1,555,320 |
| 3 | 0.33 | $1,298,934 |
| 4 | 0.36 | $1,066,527 |
| 5 | 0.40 | $857,343 |
| 6 | 0.46 | $670,706 |
| 7 | 0.52 | $506,049 |
| 8 | 0.60 | $362,959 |
| 9 | 0.72 | $241,250 |
| 10 | 0.90 | $141,134 |

Table A.3: Land price per acre, as predicted by the model and estimated via Kriging from two data sets

| Distance | From Model | Data Set 1: Predicted | Data Set 1: % Error | Data Set 2: Predicted | Data Set 2: % Error |
|---|---|---|---|---|---|
| 0 | $2,143,364 | $2,137,352 | 0.28% | $2,107,724 | 1.70% |
| 1 | $1,836,505 | $1,836,186 | 0.02% | $1,762,402 | 4.00% |
| 2 | $1,555,320 | $1,555,104 | 0.01% | $1,575,406 | -1.30% |
| 3 | $1,298,934 | $1,298,750 | 0.01% | $1,276,002 | 1.80% |
| 4 | $1,066,527 | $1,066,499 | 0.00% | $998,657 | 6.40% |
| 5 | $857,343 | $857,259 | 0.01% | $878,682 | -2.50% |
| 6 | $670,706 | $670,688 | 0.00% | $641,687 | 4.30% |
| 7 | $506,049 | $506,015 | 0.01% | $450,198 | 11.00% |
| 8 | $362,959 | $362,945 | 0.00% | $364,878 | -0.50% |
| 9 | $241,250 | $241,259 | 0.00% | $200,874 | 16.70% |
| Mean | | | 0.03% | | 4.16% |
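The columns of Table A.2 appear to be linked by simple arithmetic: the land value at each distance equals the house value less the structure cost s, the land share is that land value divided by the house value, and the per-acre price is the land value divided by lot acreage. The sketch below checks these relationships for three rows copied from the table. The function names are illustrative, and the unrounded acreage is an assumption made here, inferred by scaling the 0.25-acre lot at d = 0 in proportion to the land quantity column.

```python
# Rows copied from Table A.2:
# d -> (house value, structure cost s, land quantity l_d)
rows = {
    0: (1_000_000, 464_159, 1_295_995),
    5: (900_000, 553_260, 2_096_587),
    10: (800_000, 673_261, 4_655_227),
}

ACRES_AT_CENTER = 0.25  # lot size in acres at d = 0, as reported in Table A.2

def land_value(house_value, structure_cost):
    """Land value as house value less structure cost."""
    return house_value - structure_cost

def land_share(house_value, structure_cost):
    """Land value as a fraction of house value."""
    return land_value(house_value, structure_cost) / house_value

def price_per_acre(d):
    """Land value divided by lot acreage at distance d.

    Assumption: acreage scales with the land quantity column, anchored at
    0.25 acres for d = 0. The table itself shows acreage rounded to 2 digits.
    """
    house, cost, quantity = rows[d]
    acres = ACRES_AT_CENTER * quantity / rows[0][2]
    return land_value(house, cost) / acres

for d, (house, cost, _) in rows.items():
    share_pct = round(100 * land_share(house, cost))
    print(d, land_value(house, cost), share_pct, round(price_per_acre(d)))
```

Dividing by the rounded acreage shown in the table instead would give noticeably different per-acre prices at most distances, which is why the sketch reconstructs acreage from the land quantity column.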


Author: | Davis, Morris A.; Larson, William D.; Oliner, Stephen D.; Shui, Jessica |
---|---|

Publication: | AEI Paper & Studies |

Article Type: | Report |

Date: | Jan 1, 2019 |

Words: | 11413 |
