Validation and demonstration of the Prescott spatial growth model in metropolitan Atlanta, Georgia.
The Prescott Spatial Growth Model (PSGM), originally designed for commercial use, has been used to develop growth scenarios for evaluating the environmental impacts of urbanization. To facilitate further use of the PSGM in scientific research, a rigorous verification and validation of the model's capabilities is needed. Because the model had previously been used to perform growth projections for the Atlanta region, Atlanta was the logical study area for model verification and validation. This choice allowed the use of historical and current data for the Atlanta regional area (see Figure 1), which was modeled in previous work by the authors (Estes et al. 2006, 2007). The purpose of this project was to develop growth scenarios for the period 1980-2000 using historical population, employment, and land-use data. The intent was to validate the PSGM by comparing scenarios generated with "blind" growth projections against those generated using actual growth for the period. The drivers of growth are ever-changing because elected officials come and go, planning practices evolve, and current "hot-button" issues change. Exact agreement between projected and observed growth is not possible because of the complexity of decision drivers, previous development trends, and inherent political and social variability.
[FIGURE 1 OMITTED]
Numerous land-use and land-cover change (LULCC) models have been developed from various perspectives. Growth models may be spatial or nonspatial and typically are used for prediction and scenario generation in integrated assessments of LULCC. Such models usually are implemented at local scales and may not be scalable to continental or global scales. Growth models may be grouped into two broad categories: empirical models and dynamic process simulation models. Empirically fitted models are based on statistically matching temporal trends and/or spatial patterns with a set of predictor variables (Brown et al. 2004).
Dynamic process models seek to represent the most important interactions between agents, organisms, and their environment (Brown et al. 2004). Examples of process models are cellular automata (CA) (Clarke and Gaydos 1998) and agent-based models (ABMs) (White and Engelen 1993). In CA models, cells have fixed neighborhood relations and update rules. In some cases, the CA represents the state and dynamics of the environment. Cells can represent parcels of land with unique characteristics, each changing as a result of rules applied to the state of the cell and that of its neighbors. Challenges include establishing the rules that govern system behavior and incorporating heterogeneity and dynamism into these rules (Brown et al. 2004).
A widely used CA model is SLEUTH (Clarke et al. 1997), in which each grid cell is classified as either urbanized or non-urbanized. Such CA models are probabilistic, run quickly, and can be applied to any region with the necessary data. However, they cannot distinguish activity types because they operate on simple "urban" and "nonurban" designations. The SLEUTH model runs in the UNIX environment and requires a large amount of spatial data. The model also has neither a coherent economic theory nor a behavioral component to help interpret its results (EPA 1999).
Agent-based models (ABMs) are defined in terms of entities and dynamics at microlevels such as individuals (householders, farmers, developers) and/or institutions (industries, governments, etc.). Each agent requires a defined state, decision-making rules, and other mechanisms for carrying out particular behaviors. Agents' behaviors affect each other and the environment, and the environment changes both in response to agents and by following its own dynamics. This allows complex feedback relationships that lead to the nonlinear, path-dependent dynamics often observed in complex systems. ABMs are considered a promising topic for continued development (Brown et al. 2004). These models require detailed knowledge of the behavior of the agents being modeled. They also may require considerable coding expertise and computer time. In addition, because they are based on an underlying stochastic model, they typically require many simulations to evaluate any particular situation.
The PSGM, developed at Prescott College in Prescott, Arizona, in collaboration with NASA, is a dynamic process model with a raster-based structure that is compatible with agent-based features. This GIS-based model allows users to build a variety of future community growth scenarios based on current policy and development decisions. It is important to note that the PSGM is projective, not predictive. The validation process allowed us to see how well our rules captured past growth activity. Demonstrating that the PSGM does this reliably provides reasonable confidence that the rules used to replicate past growth can also project future growth under the same scenario assumptions. Scenarios may be created at the parcel level or on grid cells of any user-assigned size. The PSGM may be constructed as a set of "nested" models moving from the county to the community and potentially the neighborhood level. Because the PSGM is grid-based, the suitability or land-use allocation results can afterwards be transferred to parcels through an overlay process. One strength of the PSGM is that, once the input data are set up for a baseline, adjustments can be made to reflect faster or slower growth, different distributions of growth, and other factors that may affect the growth rate and dispersion of the population. This approach allows the user to realistically represent each area's particular rate and distribution of growth. The PSGM is less data-intensive than other GIS-based spatial growth models such as INDEX (EPA 1999) and the California Urban Futures (CUF) model (Landis 1995), which require detailed data on raw land prices, construction costs, site improvement costs, service costs, development fees, and other development costs.
THE PRESCOTT SPATIAL GROWTH MODEL AND METHODOLOGY
The PSGM is an ArcView GIS-compatible application that allocates future growth into available land based on user-defined parameters. Its purpose is to help users develop alternative future patterns of land use based on socioeconomic projections such as population, employment, and other controlling factors. When creating scenarios of future development, the PSGM requires several inputs. Developable land must be provided as an input grid that represents areas suitable for accepting future growth. Growth projections quantify the demand for land area to be developed for each time horizon and each land-use type. These projections are derived from socioeconomic drivers in a PSGM utility that determines the growth for each land-use category (industrial, high-density residential, etc.). Suitability rules for the location of future growth are specified through a PSGM table interface. When the PSGM runs, it allocates the new growth onto the developable land grid in order from most to least suitable land and in a user-defined order of land-use types (e.g., the user decides that, for each time step, land use x gets first choice of available land, then y, then z). The output of the PSGM is a series of growth grids, one for each time step and land-use type, showing the anticipated future growth pattern.
A set of growth rules provides the basis for allocating new types of development and for specifying land restricted from development. Each rule is assigned a priority weight relative to the other rules to reflect the assumptions of the scenario being developed, and the model output reflects the complex aggregation of these rules. A separate rule set is created for each land-use class being assessed in the model. The rule sets then are run consecutively in a comprehensive model simulation, each allocating land based on available area and priority. In each scenario, once land is used by one type of development, it becomes unavailable to any other land-use type. The model also notifies the user if there is insufficient land to meet the demand of a particular rule set. There is no limit on the number of rules in a rule set or the number of rule sets in a scenario. Model run time varies widely depending on the number of rules used, the size of the land bank, and the grid, lot, or parcel resolution used; any of these units can be used provided the data are in raster format.
OVERVIEW OF COMPOSITE SUITABILITY LOGIC
Each suitability rule created by the user belongs to a specific land-use type, and each land-use type can have 1 to N suitability rules. Each rule is used to create an individual suitability grid; if there is more than one rule, the weighted individual grids are added to create a composite suitability grid for that land-use category. This composite grid is used to allocate the growth assigned to that land-use type. The process is outlined as follows:
Determine the rule type and the associated parameters (e.g., distance, distance units, threshold, density) for each land use and user-selected growth scenario.
Apply the rule and normalize (slice) the resulting suitability values to a 1 to 10 scale using the equal interval option. For example, if a "distance from" rule is run using major roads as the reference theme with a distance of one mile, a threshold of "LT 1," and "More Is Better = False," then a one-mile buffer is generated around roads, and suitability within the buffer is scaled from 10 for cells closest to the road to 1 for those farthest away.
Multiply the suitability grid by its weight.
Add the weighted suitability grids for the current land-use type together to form the composite suitability grid.
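The four steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the PSGM's actual implementation: the function names are hypothetical, the "distance from" rule keeps only cells whose distance is below the threshold (the "LT" case), cells outside a rule's buffer are assumed to contribute zero to the composite, and the equal-interval slice is assumed to split the in-buffer value range into ten equal-width classes.

```python
import numpy as np

def slice_equal_interval(values, n_classes=10):
    """Rescale finite values to classes 1..n_classes using equal-width intervals.
    NaN cells (outside a rule's buffer) stay NaN. Hypothetical helper."""
    vmin, vmax = np.nanmin(values), np.nanmax(values)
    if vmax == vmin:
        # Degenerate case: every in-buffer cell gets the top class.
        return np.where(np.isnan(values), np.nan, float(n_classes))
    scaled = (values - vmin) / (vmax - vmin)      # 0.0 .. 1.0
    classes = np.floor(scaled * n_classes) + 1    # 1 .. n_classes + 1
    return np.clip(classes, 1, n_classes)         # fold the maximum into the top class

def distance_rule(distance, threshold, more_is_better=False, n_classes=10):
    """Hypothetical 'distance from' rule with an 'LT threshold' buffer."""
    in_buffer = np.where(distance < threshold, distance, np.nan)
    classes = slice_equal_interval(in_buffer, n_classes)
    if not more_is_better:
        classes = (n_classes + 1) - classes       # closest cells score n_classes
    return classes

def composite_suitability(rule_grids, weights):
    """Weighted sum of rule grids; NaN (outside a buffer) is assumed to count as 0."""
    total = np.zeros(np.shape(rule_grids[0]), dtype=float)
    for grid, weight in zip(rule_grids, weights):
        total += weight * np.nan_to_num(grid, nan=0.0)
    return total

# Distance to the nearest major road, in miles, for a 2 x 2 block of cells.
dist = np.array([[0.0, 0.5], [0.99, 2.0]])
roads = distance_rule(dist, threshold=1.0)           # values 10, 5, 1, and NaN outside the buffer
csg = composite_suitability([roads], weights=[2.0])  # values 20, 10, 2, 0
```

A real scenario would pass several rule grids and weights to `composite_suitability`; the single-rule call here only shows the weighting step.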
OVERVIEW OF GROWTH ALLOCATION
Each land-use type selected by the user results in a composite suitability grid (CSG), with higher values representing more suitable areas. The growth allocation portion of the PSGM assigns growth (in acres) for each land-use category and time step to the CSG. As land is allocated, it is removed from consideration in the allocation of other land-use needs. If there is not enough suitable land to accommodate the growth, the user is warned, and that land-use type is not allocated in later time steps.
[FIGURE 2 OMITTED]
Allocation proceeds by time step and, within each time step, in the user-defined order of land-use types: every land-use type is run for the current time step before the next time step is run. For example, if there were a set of four rules for Single Family Growth in ten-year intervals to 2050, the model would generate four grids representing the individual rules, one CSG, and grids representing Single Family Growth in 2010, 2020, 2030, 2040, and 2050. Such a set of grids is created for each rule set run in a given scenario. These grids then can be merged by land-use type, year of growth, etc., to display different scenario data for assessment. Figure 2 shows how separate rules are combined to create a CSG, in this case for commercial development. Each of the three rule grids shown has an assigned weight and encompasses all counties within the region. Within each county there also is a rule that attracts growth within the county. This rule, when combined with the others, results in the CSG shown for one county (Gwinnett) in the eastern part of the figure.
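The allocation sequence described above (most suitable available cells first, land removed once allocated, land-use types in user-defined order within each time step, and a warning that halts a land-use type when land runs out) can be sketched as follows. The function names, the dictionary layout of the inputs, and the conversion of acres to whole cells by rounding up are assumptions for illustration (for example, a 90-meter cell covers roughly 2 acres); ties in suitability are broken here by raster order, which the PSGM may handle differently.

```python
import math
import warnings
import numpy as np

def allocate_step(csg, available, acres, cell_acres):
    """Give `acres` of growth to the most suitable available cells (hypothetical helper).
    Returns (mask of newly allocated cells, True if the demand was fully met)."""
    n_needed = math.ceil(acres / cell_acres)
    candidates = np.flatnonzero(available)
    # Most suitable first; a stable sort keeps raster order among ties.
    ranked = candidates[np.argsort(-csg.ravel()[candidates], kind="stable")]
    chosen = ranked[:n_needed]
    mask = np.zeros(csg.shape, dtype=bool)
    mask.flat[chosen] = True
    return mask, len(chosen) >= n_needed

def run_scenario(developable, csgs, demand, land_use_order, years, cell_acres):
    """csgs: {land_use: CSG grid}; demand: {(land_use, year): acres}."""
    available = developable.copy()
    halted = set()
    growth = {}
    for year in years:                      # outer loop: time steps
        for land_use in land_use_order:     # inner loop: user-defined priority
            if land_use in halted:
                continue
            mask, met = allocate_step(csgs[land_use], available,
                                      demand[(land_use, year)], cell_acres)
            available &= ~mask              # allocated land is unavailable to all other uses
            growth[(land_use, year)] = mask
            if not met:
                warnings.warn(f"Insufficient land for {land_use} in {year}; "
                              "no allocation in later time steps.")
                halted.add(land_use)
    return growth

# Hypothetical 2 x 2 scenario with one-acre cells.
developable = np.ones((2, 2), dtype=bool)
csgs = {"commercial": np.array([[4, 3], [2, 1]]),
        "HDR": np.array([[1, 2], [3, 4]])}
demand = {("commercial", 1990): 2, ("HDR", 1990): 1}
growth = run_scenario(developable, csgs, demand, ["commercial", "HDR"], [1990], cell_acres=1)
```

Because commercial is listed first, it takes the two most suitable commercial cells, and HDR then chooses among the cells that remain.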
The projection period for this study is 1980 to 2000. Both blind and guided model simulations were performed to evaluate land-cover land-use (LCLU) changes in 1990 and 2000. Blind simulations use model inputs derived only from trend data that existed before 1980, whereas guided simulations use Census data for the projection period. Observed data in the form of LCLU classified from Landsat imagery were used to evaluate model accuracy or performance.
Within the context of growth modeling, performing or even defining validation is a difficult task. Projections of a future state cannot be validated, but the performance of the model can be evaluated in "hind-cast" mode, in which projections are made from some past starting point and the results are compared with observed LCLU at the simulation end time (Liu et al. 2007). White (2006) proposed two pattern-based techniques for comparing maps: a fuzzy polygon-based matching method and fractal analysis. Pontius and Schneider (2001) described using the "relative operating characteristic" as a quantitative measure of the performance of a land-cover change model.
In this research, we have taken a somewhat different approach in which we assume that in validating growth model results, it is important to isolate the effects of four sources of uncertainty in the modeling system: (1) errors in model inputs (population and employment projections, road network); (2) errors in model parameters (e.g., dwelling units per acre, persons per household); (3) errors in model formulation (growth rules); and (4) random errors. If model inputs and parameters were known exactly, the first two error sources would vanish and the growth model would develop exactly the correct amount of land for residential and commercial use in each county. However, the distribution of development still would be imperfectly modeled because of the latter two error sources, which control the spatial patterns but not the amount of growth. Growth rules control the proximity of growth to existing development and to the road network and the "clustering" of development. "Random errors" contribute to an inaccurate spatial distribution of development.
A projection made from a past starting point using only data available at that time, i.e., a "blind" simulation, will be affected by all four types of errors. Comparing a blind simulation made for the year 2000 from a 1980 starting point with observed land-use patterns in 2000 allows the combined effects of these errors to be quantified. Errors of the first type can be minimized by providing as model inputs the actual population and employment data available from U.S. Census reports or other sources throughout the simulation period. If population and housing data for the projection period are also used to calibrate model parameters, type 2 errors can be reduced (but not eliminated), leaving type 3 and type 4 errors as the primary sources of uncertainty. This "guided" simulation provides better estimates of parameters such as jobs per acre than the blind projection does. By comparing the guided forecast with the observed growth for the 1980-2000 simulation period, the effects of errors in growth rules can be estimated. Because the development of growth rules is a mixture of art and science, a trial-and-error process was followed to evaluate the impacts of errors that resulted from rule modifications.
[FIGURE 3 OMITTED]
DATA INPUTS FOR BLIND AND GUIDED SIMULATIONS
County-level data obtained from the Atlanta Regional Commission (ARC) and from U.S. Census reports were used to create two population tables, one for the blind simulation and one for the guided simulation. For the blind simulation, the population trend for each county, obtained from the 1970 and 1980 Census data, was used to extrapolate county populations through 2000. As shown in Figure 3, there are significant differences between the actual 2000 populations used in the guided simulation and those extrapolated from the 1970-1980 trends for the blind simulation. Overall, the population forecast used in the blind simulation overestimates the 2000 population by about 15 percent (see Figure 4), but it greatly overestimates or underestimates population in certain counties. To determine the required high-density residential (HDR) and low-density residential (LDR) populations by county for the blind run, we used 1980 Census data describing the percentage "urban" and percentage "rural" populations, as well as the urban and rural population changes from 1970 to 1980. Assuming that trends in urban population approximate trends in HDR population, we projected the urban population trend in five-year intervals to 2000 to define the percentage HDR and the total HDR and LDR populations for each county. These were used in the blind model simulations.
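The text does not say whether the 1970-1980 trend was extended linearly or at a constant growth rate. The sketch below assumes a constant (compound) annual growth rate and also shows how a percentage over- or underestimate relative to the actual 2000 population can be computed. The function names and county figures are hypothetical.

```python
def extrapolate_population(p1970, p1980, years):
    """Extend the 1970-1980 trend, assuming a constant annual growth rate."""
    annual_rate = (p1980 / p1970) ** (1 / 10) - 1
    return {year: p1980 * (1 + annual_rate) ** (year - 1980) for year in years}

def percent_error(projected, actual):
    """Positive values mean the projection overestimates the actual value."""
    return 100.0 * (projected - actual) / actual

# Hypothetical county whose population doubled between 1970 and 1980.
blind = extrapolate_population(50_000, 100_000, years=[1985, 1990, 1995, 2000])
# Doubling again each decade gives roughly 400,000 by 2000; against a
# hypothetical actual 2000 population of 320,000 that is about 25 percent too high.
overestimate = percent_error(blind[2000], actual=320_000)
```

A linear extension of the same trend would give 200,000 for 2000 instead, which is why the choice of extrapolation method matters for blind-run errors.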
[FIGURE 4 OMITTED]
For the guided run, actual county populations were used for Census years (1980, 1990, 2000), with interpolation used for the intervening years. Additionally, we realized that using the Census urban/rural population splits to directly define the HDR and LDR populations without considering urbanized area definitions would be inaccurate. In the Census reports, urban population was split into three parts: in places of 50,000 people or more (U1), in places of 10,000-50,000 (U2), and in places of 2,500-9,999 (U3). Because not all of this "urban" population resides in HDR areas, we assumed that the percentages of the population in these three types of communities that live in HDR areas are 70 percent, 20 percent, and 10 percent, respectively. This leads to the formula:
$$\text{HDR population} = 0.7\,U_1 + 0.2\,U_2 + 0.1\,U_3$$
The HDR populations were interpolated for 1985 and extrapolated for 1995 and 2000. LDR populations then were calculated as the difference between the total and HDR populations. Figure 5 shows the percentage of HDR population for the blind and guided simulations at five-year intervals from 1980 to 2000. The guided HDR percentage remained nearly constant over this period, while the blind forecast values, derived as discussed previously, increased from 57 percent to 71 percent.
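As a worked illustration of the HDR formula and the LDR residual, the sketch below weights the three urban-place populations, fills 1985 by linear interpolation between 1980 and 1990, extends the same line to 1995, and takes LDR as the remainder. Linear interpolation and extrapolation are assumptions; the text does not state the method. All input numbers are hypothetical.

```python
# Shares of each urban-place population assumed to live in HDR areas (from the text).
HDR_SHARE = (0.7, 0.2, 0.1)  # U1 (50,000+), U2 (10,000-50,000), U3 (2,500-9,999)

def hdr_population(u1, u2, u3):
    return HDR_SHARE[0] * u1 + HDR_SHARE[1] * u2 + HDR_SHARE[2] * u3

def linear(year, y0, v0, y1, v1):
    """Linear interpolation, or extrapolation outside [y0, y1]."""
    return v0 + (v1 - v0) * (year - y0) / (y1 - y0)

# Hypothetical county.
hdr_1980 = hdr_population(u1=100_000, u2=20_000, u3=10_000)  # 75,000
hdr_1990 = hdr_population(u1=120_000, u2=30_000, u3=10_000)  # 91,000
hdr_1985 = linear(1985, 1980, hdr_1980, 1990, hdr_1990)      # 83,000
hdr_1995 = linear(1995, 1980, hdr_1980, 1990, hdr_1990)      # 99,000

total_1985 = 150_000
ldr_1985 = total_1985 - hdr_1985                             # 67,000
pct_hdr_1985 = 100.0 * hdr_1985 / total_1985                 # about 55 percent
```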
[FIGURE 5 OMITTED]
Employment data from the U.S. Census exist in the form of the number of jobs per county, including total, net change, and percentage change. Using the 1980 land-use base map (described in the next section), the number of acres of each land-use type was determined for each county, and from these acreages the jobs/acre ratio was calculated for the blind simulation (see Table 1). Holding this ratio constant and continuing the 1970-1980 job growth rate, the number of jobs per county was determined for the 1980-2000 period. For the blind simulation, we did not include commercial land use for three counties (Coweta, Forsyth, and Paulding) for which we did not have employment data for this time frame.
For the guided simulation, an adjustment was made to better define the jobs/acre ratio for the projection period. Using the LandPro99 Commercial, Commercial/Industrial, and Industrial land-use classes, we calculated the jobs/acre ratio as the number of jobs in 1999 (from ARC) divided by the number of acres classified in any of these three LandPro99 classes. As a result of this modification, jobs/acre ratios for the guided simulation were much higher than those for the blind simulation (see Table 1).
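The blind and guided jobs/acre calculations, and the step from projected jobs to commercial acreage demand, reduce to a few ratios. The sketch assumes the 1970-1980 job growth is continued at a constant decadal rate and that new commercial acreage equals new jobs divided by the jobs/acre ratio; both are assumptions, the function names are hypothetical, and all figures are made up.

```python
def jobs_per_acre(jobs, acres):
    return jobs / acres

def project_jobs(jobs1970, jobs1980, year):
    """Continue the 1970-1980 growth rate (assumed constant per decade)."""
    decade_factor = jobs1980 / jobs1970
    return jobs1980 * decade_factor ** ((year - 1980) / 10)

def commercial_acres_needed(new_jobs, ratio):
    return new_jobs / ratio

# Blind: ratio from the urban (commercial) acres on the 1980 base map.
blind_ratio = jobs_per_acre(jobs=20_000, acres=2_000)                   # 10 jobs/acre
jobs_2000 = project_jobs(10_000, 20_000, 2000)                          # 80,000 jobs
blind_acres = commercial_acres_needed(jobs_2000 - 20_000, blind_ratio)  # 6,000 acres

# Guided: 1999 jobs over the three LandPro99 commercial/industrial classes.
landpro99_acres = {"Commercial": 1_000, "Commercial/Industrial": 500, "Industrial": 500}
guided_ratio = jobs_per_acre(30_000, sum(landpro99_acres.values()))    # 15 jobs/acre
```

A higher jobs/acre ratio means the same job growth consumes less land, which is consistent with the blind run overprojecting commercial acreage.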
[FIGURE 6 OMITTED]
[FIGURE 7 OMITTED]
The number of jobs for each county at each time step was obtained from Census data; these data were used directly in the guided simulation. For the blind simulation, the number of jobs in each county was determined from the jobs/acre ratio and the projection of the 1970-1980 job growth rate, as discussed previously. Figure 6 shows the number of jobs by county in 1980 and the blind and guided projections for 2000. The blind projections of jobs overestimate the actual number of jobs in some counties and underestimate in others. The blind simulation overestimates the total number of jobs in these ten counties by about 28 percent.
OBSERVED LAND-USE DATA
Classified Landsat data for the Atlanta region in 1980 were derived from imagery acquired by the Multispectral Scanner (MSS) at 75-meter spatial resolution. These 1980 observed data were the baseline from which the model simulations began; they were used as input data for existing LCLU and to develop employment projections for the blind simulations.
Classified Landsat data from the Thematic Mapper (TM) in 1990 and the Enhanced Thematic Mapper Plus (ETM+) in 2000, both at 30-meter spatial resolution, were used as ground-truth baselines against which model simulations were compared to evaluate accuracy. Image processing included rectification with Digital Line Graph data and atmospheric correction using a modified form of the dark object subtraction technique (Chavez 1988). Supervised training and classification of segments were performed to produce a 16-class land-cover land-use data set (Laymon 2004). Each land-use classification provides commercial, HDR, and LDR developed land-use types as well as undeveloped classes.
To use these data effectively, the land-use classes must be related to the information available in the population data, as discussed previously. Land-use data standardization is an important consideration; the 1980, 1990, and 2000 data cover the same geographic area and contain the same land-use types. However, differences in image resolution and spectral range contribute to some classification errors, which in turn lead to model simulation errors. For example, the coarser resolution of the 1980 data likely resulted in some low-density residential development being classified as forest or agriculture.
A road network is a required model input for the PSGM; its level of detail can vary depending on data availability and output requirements. For model validation of the blind simulation, we used the primary road network, including only the interstate, U.S., and major state highways that existed in 2000 (see Figure 7, left). These data were obtained from SMARTRAQ, a detailed land-use database for the 13-county metropolitan Atlanta region that was developed in 1997 by the Georgia Tech Research Institute and contains several classes of roads. This input is reasonable for the blind simulation given that the major road network changed little between 1980 and 2000. The secondary road network, by contrast, changed significantly. For the guided simulation, we used a network of primary and secondary roads from USGS data available at the Georgia Data Clearinghouse (see Figure 7, right).
MODEL ASSUMPTIONS AND PARAMETERS
The development of rule sets is critical to the successful function of the PSGM. In the Atlanta study area, each county had its own set of population predictions as described previously. Additionally, the distribution of population by residential land-use type (HDR, LDR) was calculated for each county. A standard set of rules was developed and used for all counties. For this study, the 1980 land-use base contains only one land-use type (urban) that can be used to attract and allocate commercial growth; so for each county, the net change in number of jobs was applied solely to this land-use type.
In running the model, we projected growth of the commercial, HDR, and LDR land-use types separately. Growth in the Atlanta area appears to follow a few basic rules: it typically occurs along transportation corridors, near existing commercial development, and near existing development of the same type. These tendencies form the basis for the rule sets shown in Table 2.
The Atlanta region has few barriers to expanding growth; there are no significant mountains or other geographic obstructions. This results in a large amount of land being equally attractive for new development. An additional rule was included directing each county's growth to remain in that county. However, should a county fill up with a given land-use type, development will overflow into adjacent counties, using the other rules set up for that land use.
In the blind simulation, dwelling units per acre (DUAC) and persons per household unit (PPHU) values were assigned using only general urban planning guidelines (Kindel 2006) and local knowledge. The assigned DUAC values for LDR ranged from 1 for rural counties to 2 for urban counties, while HDR DUACs ranged from 4 for rural counties to 7 for urban counties (see Table 3).
In the guided simulation, we first calculated DUAC values for each county based on the county populations and other Census data using the following procedure. First, PPHU for LDR and HDR were set to 2.54 and 2.41, respectively, based on 1990 Census data. Next, LDR and HDR acres for each county were obtained from the 2000 land-use map. The numbers of LDR and HDR units then were calculated from the LDR/HDR populations and PPHU values. Finally, the DUAC values for the HDR and LDR classes were determined from the numbers of LDR and HDR units and the LDR and HDR acreages. However, because of classification errors and other uncertainties, some county-level DUAC values fell outside the range deemed acceptable. Therefore, using the 13-county average DUAC values as guidance, we assigned DUACs in a manner that captured the urban, suburban, or rural nature of each county. Assigned LDR DUACs for the guided simulation were the same as for the blind simulation, ranging from 1 to 2, and HDR DUACs for the guided simulation were slightly lower than for the blind simulation, ranging from 3 to 6 (see Table 3).
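The guided DUAC derivation is a chain of ratios: units = population / PPHU, then DUAC = units / acres. The PPHU values below are the 1990 Census figures given above; the county numbers and function names are hypothetical. The last function shows one plausible way to turn a residential population increase into acreage demand, acres = new population / (PPHU × DUAC); that conversion is an assumption and is not spelled out in the text.

```python
PPHU = {"LDR": 2.54, "HDR": 2.41}  # persons per household unit (1990 Census)

def derive_duac(population, acres, pphu):
    units = population / pphu      # dwelling units
    return units / acres           # dwelling units per acre

def acres_for_growth(new_population, pphu, duac):
    """Acres needed to house new residents at a given density (assumed conversion)."""
    return new_population / (pphu * duac)

# Hypothetical county, acreages from the 2000 land-use map.
ldr_duac = derive_duac(population=50_800, acres=10_000, pphu=PPHU["LDR"])  # about 2.0
hdr_duac = derive_duac(population=96_400, acres=8_000, pphu=PPHU["HDR"])   # about 5.0

# Where a derived value is out of range, an assigned value is used instead.
assigned_hdr_duac = 5
hdr_acres = acres_for_growth(24_100, PPHU["HDR"], assigned_hdr_duac)       # about 2,000 acres
```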
[FIGURE 8 OMITTED]
[FIGURE 9 OMITTED]
[FIGURE 10 OMITTED]
Using the inputs and rule sets described in the previous sections, the blind and guided simulations were performed starting from the same 1980 Landsat observed data to predict land use in the year 2000. Figures 8 to 11 display the 1980 and 2000 observed data resampled to 90 meters for the three developed categories under analysis: commercial, HDR, and LDR. Also shown in these figures are the blind and guided simulations of each land-use category for the year 2000 at a 90-meter spatial resolution. Comparison of the blind and guided simulations for 2000 to the observed data indicates that the model is capturing development trends. This is most notable in the HDR and commercial land-use categories that tend to follow major transportation arteries. The guided simulation better represents the LDR spatial pattern than does the blind simulation, although both simulations underestimate LDR development. However, this error seems to be related in part to an apparent underestimation of LDR land use in the 1980 observed data (see the discussion at the end of this section).
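The text does not state how the classified rasters were resampled to 90 meters. One common choice for categorical data is majority (modal) aggregation; for the 30-meter 1990 and 2000 maps, that means taking the most frequent class in each 3-by-3 block. The sketch below assumes that method, drops any ragged edge that does not fill a full block, and breaks ties toward the smallest class code. It does not cover the 75-meter 1980 data, which would need a different resampling. The function name and class codes are hypothetical.

```python
import numpy as np

def majority_resample(grid, factor=3):
    """Aggregate a categorical raster by taking the most frequent class in each
    factor x factor block (e.g., 30 m to 90 m with factor=3)."""
    rows, cols = grid.shape
    r, c = rows - rows % factor, cols - cols % factor   # drop a ragged edge
    blocks = (grid[:r, :c]
              .reshape(r // factor, factor, c // factor, factor)
              .transpose(0, 2, 1, 3)
              .reshape(r // factor, c // factor, factor * factor))
    out = np.empty(blocks.shape[:2], dtype=grid.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            values, counts = np.unique(blocks[i, j], return_counts=True)
            out[i, j] = values[np.argmax(counts)]  # values are sorted, so ties go to the smallest code
    return out

# Hypothetical class codes: 0 = undeveloped, 1 = LDR, 2 = HDR.
fine = np.array([[1, 1, 1, 2, 2, 0],
                 [1, 1, 0, 2, 2, 2],
                 [1, 0, 0, 0, 2, 2]])
coarse = majority_resample(fine)   # one row of two 90 m cells: LDR, HDR
```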
As shown in Figure 8, the model captures the spatial commercial development trends in both the blind and guided simulations. The amount of commercial growth is overestimated in the blind simulation, particularly in areas more distant from the central business district. The total commercial land use in the guided simulation compares very favorably with the 2000 observed data.
Figure 9 shows that the model accurately captures spatial trends and the amount of HDR development in both simulations, with the guided simulation very highly correlated with the 2000 observed data. The blind simulation slightly overestimates HDR growth.
The model captures overall spatial trends for LDR development particularly in the guided simulation (see Figure 10), although both simulations underestimate the amount of LDR in 2000. The blind run clusters development too much along roadways in Cherokee County, along the Fulton-Forsyth boundary, and in Rockdale County. Overall, more dispersion and less clustering would improve the spatial output from the guided simulation. Figure 11 (a composite of Figures 8 through 10) shows the 1980 and 2000 maps of all land-use classes along with the blind and guided simulation results.
[FIGURE 11 OMITTED]
Kappa statistics and Moran's I were computed to evaluate the spatial accuracy of the model's projected 1990 and 2000 land use. The Kappa statistic compares observed agreement with the agreement that might be expected by chance. Kappa can be thought of as chance-corrected proportional agreement: a value of +1 indicates perfect agreement, 0 indicates no agreement beyond that expected by chance, and negative values (down to -1) indicate agreement worse than chance. Kappa statistics for each developed land-use class are shown in Table 4.
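Cohen's kappa can be computed directly from the observed and projected class rasters using the standard formula kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from the class marginals. Computing a per-class value by collapsing each map to "class versus everything else" is an assumption about how per-class figures such as those in Table 4 could be obtained, not a description of the authors' procedure. Function names and the example pixels are hypothetical.

```python
import numpy as np

def cohens_kappa(observed, predicted):
    obs = np.asarray(observed).ravel()
    pred = np.asarray(predicted).ravel()
    p_o = np.mean(obs == pred)                                  # observed agreement
    labels = np.union1d(obs, pred)
    # Chance agreement: sum over classes of the product of the two maps' class shares.
    p_e = sum(np.mean(obs == c) * np.mean(pred == c) for c in labels)
    if p_e == 1.0:                                              # both maps are one identical class
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)

def class_kappa(observed, predicted, cls):
    """Kappa for one class, treating every other class as 'not cls' (assumed approach)."""
    return cohens_kappa(np.asarray(observed) == cls, np.asarray(predicted) == cls)

obs = np.array(["LDR", "LDR", "HDR", "undeveloped"])
pred = np.array(["LDR", "undeveloped", "HDR", "undeveloped"])
overall = cohens_kappa(obs, pred)        # 7/11, about 0.64
ldr = class_kappa(obs, pred, "LDR")      # 0.5
```

Libraries such as scikit-learn provide an equivalent `cohen_kappa_score`; the hand-written version is shown only to make p_o and p_e explicit.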
The Kappa statistics in Table 4 are computed for each developed class predicted by the model and "entire urban" or an aggregate of all developed classes. Overall, the observed agreement is higher than the chance agreement, though the possibility of chance agreement is high for all classes. The Kappa statistic is higher for most developed classes in the guided simulation compared to the blind simulation; however, the differences are very small. The model's performance in predicting LDR was the lowest among the developed classes as indicated by the very low Kappa statistic for this class.
[FIGURE 12 OMITTED]
Given a set of features and an associated attribute, global Moran's I evaluates whether the pattern expressed is clustered, dispersed, or random. A Moran's I value near +1.0 indicates clustering, a value near -1.0 indicates dispersion, and a value near 0 suggests a random pattern. Table 5 provides Moran's I values for each developed class.
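For a raster, global Moran's I can be sketched with binary rook-contiguity weights (each cell's neighbors are the cells directly above, below, left, and right): I = (n / W) × Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², where z is the deviation from the mean and W is the sum of the weights. Rook weights and a 0/1 class-indicator raster as input are assumptions; the text does not say which neighborhood definition or tool was used. The function name is hypothetical.

```python
import numpy as np

def morans_i(grid):
    """Global Moran's I for a 2-D raster with binary rook-contiguity weights."""
    x = np.asarray(grid, dtype=float)
    z = x - x.mean()
    horiz = z[:, :-1] * z[:, 1:]          # left-right neighbor products
    vert = z[:-1, :] * z[1:, :]           # up-down neighbor products
    # Each neighbor pair appears twice in the symmetric weight matrix (i->j and j->i).
    numerator = 2.0 * (horiz.sum() + vert.sum())
    weight_sum = 2.0 * (horiz.size + vert.size)
    return (x.size / weight_sum) * numerator / (z ** 2).sum()

# Indicator rasters for one class (1 = class present).
checkerboard = np.array([[1, 0], [0, 1]])   # perfectly dispersed: I = -1
halves = np.array([[1, 1, 0, 0]] * 4)       # left half developed, clustered: I = 2/3
```

A constant raster (all 0 or all 1) has zero variance, so the statistic is undefined there and the function would divide by zero.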
The Moran's I results indicate the degree to which model outputs are spatially clustered or dispersed, for comparison with the year 2000 observed land-use targets. Both guided and blind projections have autocorrelation values comparable to those of the year 2000 observed land use. Spatial patterns for the urban commercial and HDR classes resemble the year 2000 observed land use more closely than those for the LDR class.
Table 6 quantifies model performance in terms of land-use percentages, total pixel changes, and land-use change from 1980 to 2000. Each land-use category in 2000 is compared with the blind and guided land-use projections, along with the percent differences relative to the 2000 base map. Overall, the percentage of total pixels correctly projected by the model is good. In the blind simulation, commercial land use is significantly overprojected, LDR is significantly underprojected, and HDR is slightly overprojected. The guided simulation shows the same biases but to a much lesser extent.
In addition to the analysis of overall model performance, subregions of the modeling domain were selected for an in-depth evaluation of model performance. Three subregions of comparable size were selected, as noted in Figure 12. The subregions depict three representative growth environments commonly found in the modeling domain. Urban zone 1 depicts an area of the central business district and surrounding midtown residential and commercial development; high-density residential development is more common than low-density residential development in this zone. Suburban zone 2 is an area of rapidly expanding commercial and residential growth along the Interstate 75 corridor; this zone is a dynamic mixture of expanding commercial, HDR, and LDR development. Rural zone 3 consisted of undeveloped land and LDR development in 1980 and has since been affected by urban sprawl, primarily in the form of increased LDR development over the simulation period.
Model performance in the subregions was evaluated based on a set of potential model errors as noted in Table 7. For example, error 1 denotes that the 2000 observed land use was urban and the model predicted HDR. A value of 0 indicates accurate model performance in projecting the correct land use for that pixel in the year 2000 given 1980 inputs.
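Tallying the Table 7 error categories amounts to cross-tabulating observed and projected classes pixel by pixel within a zone. Only the codes spelled out in the text are mapped below (0 for a correct pixel, plus 1, 2, -4, and -5); the remaining Table 7 codes are not reproduced here, so any other mismatch falls into an "other" bucket. The function names, class labels, and example pixels are hypothetical.

```python
from collections import Counter

# Codes spelled out in the text, keyed by (observed 2000 class, projected class).
# The remaining Table 7 codes are not reproduced here.
KNOWN_ERROR_CODES = {
    ("urban", "HDR"): 1,
    ("LDR", "HDR"): 2,
    ("LDR", "undeveloped"): -4,
    ("HDR", "undeveloped"): -5,
}

def error_code(observed, projected):
    if observed == projected:
        return 0                      # correctly projected pixel
    return KNOWN_ERROR_CODES.get((observed, projected), "other")

def error_percentages(observed, projected):
    """Percentage of pixels in each error category, e.g., for one subregion zone."""
    counts = Counter(error_code(o, p) for o, p in zip(observed, projected))
    total = sum(counts.values())
    return {code: 100.0 * n / total for code, n in counts.items()}

# Hypothetical four-pixel zone.
zone_obs = ["HDR", "LDR", "LDR", "urban"]
zone_proj = ["HDR", "undeveloped", "HDR", "HDR"]
summary = error_percentages(zone_obs, zone_proj)   # codes 0, -4, 2, and 1 at 25 percent each
```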
Tables 8 to 11 display model performance by error type in each subregion zone and in the total domain for both the guided and blind simulations. The model performs better in the urban and rural zones than in the suburban zone: correctly projected pixels range from 55 to 57 percent in the urban and rural zones, compared with only 30 to 31 percent in the suburban zone, for both the guided and blind simulations. Overall, the model projected the correct land use for 56 to 57 percent of pixels; however, the guided simulation correctly projected only slightly more pixels than the blind simulation.
The category of errors that impacted model performance varied significantly between the respective subregion zones. The urban zone was most influenced by error category -5, where the projected pixel should have been HDR and the model projected undeveloped land. For the rural zone, the most common error category was -4, which occurs when the model projects undeveloped land while the correct pixel was LDR. The suburban zone's largest error also was the -4 category error, and significant category 2 error also was found where the projected pixel should have been LDR while the model projected HDR.
DISCUSSION AND CONCLUSIONS
We have presented the results of an effort to validate the performance of a spatial growth model in forecasting land-use change over a 20-year period. The evaluation compared a "blind" and a "guided" forecast with a remotely sensed land-use map for the end of the 20-year projection period. The blind forecast was made with no foreknowledge of population and employment growth, using only trend estimates that would have been available at the beginning of the projection period. The guided forecast, by contrast, used actual population and employment data for the projection period. As previously discussed, inaccuracies in the blind forecast may be attributed to errors in (1) model inputs such as population projections, (2) model parameters, (3) model formulation (growth rules), and (4) random errors. The guided forecast nearly eliminates errors of the first type and reduces errors of the second type, leaving types 3 and 4 as the predominant sources of forecast uncertainty.
An inconsistency between the 1980 and 2000 land-use classifications likely affects the overall results. In 1980, the area of HDR land use was estimated to be more than three times that of LDR, whereas in 2000 the relationship was reversed, with the LDR area nearly 60 percent greater than the HDR area. Such a dramatic change in the HDR/LDR ratio seems highly unlikely and is a probable source of error in the HDR and LDR projections. Because the model uses the 1980 base map as its starting point and simply adds developed land to it, any errors in the initial state carry over into the projected land use. This inconsistency between the 1980 and 2000 base maps complicates validation and contributes to the model's weak performance in predicting LDR development.
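Two points in this paragraph can be checked mechanically: the 2000 LDR/HDR ratio and the way base-map errors persist. The ratio uses the observed pixel counts in Table 6. The carry-over behavior is illustrated with `add_growth`, a hypothetical toy overlay. It assumes only what the text states: new development is added to undeveloped pixels of the base map. It is not PSGM code.

```python
# 2000 observed pixel counts from Table 6.
HDR_2000 = 178_860
LDR_2000 = 280_871

ratio = LDR_2000 / HDR_2000
print(f"2000 LDR/HDR area ratio: {ratio:.2f}")  # 1.57, i.e., LDR about 57% larger


def add_growth(base_map, new_development):
    """Hypothetical overlay: new development only converts undeveloped base pixels.

    Pixels already developed in the base map keep their base class, so a base
    pixel labeled HDR that should have been LDR stays HDR in every projection.
    """
    return [
        new if old == "Undeveloped" and new is not None else old
        for old, new in zip(base_map, new_development)
    ]


print(add_growth(["HDR", "Undeveloped", "Undeveloped"], ["LDR", "LDR", None]))
```

In the usage line, the first pixel stays HDR even though growth would have placed LDR there. This is the carry-over effect the paragraph describes.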
The blind forecast overestimates development of urban (commercial and industrial) land use and dramatically underestimates low-density residential development. New LDR growth is confined too tightly to the road network. Growth of high-density residential land use is well estimated in the blind forecast. The guided forecast somewhat overestimates urban growth and underestimates LDR growth, but less severely than the blind forecast. HDR growth is very well simulated in the guided forecast. Spatially, the model captures the expected patterns of development for each of the developed classes. For the urban commercial and HDR classes, model results are good: Kappa statistics are well above 0 (0.26 to 0.42; Table 4), and Moran's I values are about 0.90 (Table 5). Results are weaker for the LDR class, although its Moran's I of 0.79 in the guided simulation (0.82 in the blind simulation) is quite acceptable. The low Kappa statistic for LDR (0.003 to 0.006) reflects quantity error: too little land was allocated to LDR. The Kappa statistic may also overestimate chance agreement and thereby understate model performance or output accuracy. Pontius (2000) and others have argued that the Kappa statistic does not appropriately reward the model output or classification for accurate quantity estimates for each class. Nevertheless, the Kappa statistic, in conjunction with Moran's I, confirms that the rule-based aspect of the model that drives the spatial distribution of growth is performing well.
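The two statistics discussed above can be sketched briefly. The high observed-agreement values in Table 4 suggest a per-class comparison of presence/absence maps. However, the text does not spell out the exact procedure for Kappa, nor the neighborhood used for Moran's I. The functions below are therefore a minimal sketch under two assumptions: Kappa is computed on per-class binary maps, and Moran's I uses binary rook (4-neighbor) weights on a grid. The function names are hypothetical.

```python
def kappa_from_agreement(p_observed, p_chance):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    if p_chance >= 1.0:
        raise ValueError("chance agreement must be below 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


def class_kappa(observed, projected, target):
    """Per-class Kappa, treating `target` vs. all other classes as a binary map."""
    n = len(observed)
    obs = [label == target for label in observed]
    prj = [label == target for label in projected]
    p_o = sum(a == b for a, b in zip(obs, prj)) / n
    share_obs = sum(obs) / n
    share_prj = sum(prj) / n
    # Expected agreement if the two binary maps were statistically independent.
    p_c = share_obs * share_prj + (1 - share_obs) * (1 - share_prj)
    return kappa_from_agreement(p_o, p_c)


def morans_i_rook(grid):
    """Global Moran's I for a 2-D grid using binary rook (4-neighbor) weights."""
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    z = [[v - mean for v in row] for row in grid]
    denom = sum(d * d for row in z for d in row)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")
    rows, cols = len(grid), len(grid[0])
    cross, pairs = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right-hand neighbor
                cross += z[r][c] * z[r][c + 1]
                pairs += 1
            if r + 1 < rows:  # neighbor below
                cross += z[r][c] * z[r + 1][c]
                pairs += 1
    # Symmetric weights count each neighbor pair twice, both in the cross-product
    # sum and in the weight total W, so the factors of 2 cancel.
    return (n / pairs) * cross / denom


print(kappa_from_agreement(0.96, 0.93))  # approximately 0.43 from rounded inputs
```

With the rounded two-decimal agreements from Table 4, the formula gives about 0.43 for 1990 guided urban commercial, against the reported 0.42. A small gap like this is expected from rounding.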
These results indicate that errors in population and employment forecasts have a substantial impact on the ability of a growth model to simulate urban land-use change. Model performance is also affected by uncertainty in model parameters such as dwelling units per acre and jobs per acre. Problems with forecasts and parameters are evident in error types -4 and -5 in the subregion analysis, which represent underestimation of LDR and HDR pixels, respectively. The major overall error in the simulation is the underprojection of the number of pixels needed for LDR, so the model typically projects too much undeveloped land; this result is captured in error category -4. The underforecasting of land needed for LDR is directly linked to the population forecasts and associated density assumptions in the model inputs and parameters. Given accurate projections of population and employment, the agreement between observed and forecasted land use is quite good, and the spatial patterns of development are realistic.
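The sensitivity described here can be illustrated with a simple land-demand calculation. The sketch assumes that land demand is the forecast growth increment divided by the density parameter; the PSGM's internal allocation may differ. The Cobb County densities are taken from Tables 1 and 3. The growth increments `NEW_JOBS` and `NEW_LDR_UNITS` are hypothetical round numbers, not study data.

```python
# Cobb County density parameters copied from Tables 1 and 3.
COBB_JOBS_PER_ACRE = {"blind": 6.7, "guided": 14.3}
COBB_LDR_DUAC = 2  # same value in both simulations


def acres_needed(new_units, units_per_acre):
    """Land demand implied by a growth increment and a density assumption."""
    if units_per_acre <= 0:
        raise ValueError("density must be positive")
    return new_units / units_per_acre


# Hypothetical increments, for illustration only (not from the study).
NEW_JOBS = 10_000
NEW_LDR_UNITS = 10_000

for run in ("blind", "guided"):
    commercial = acres_needed(NEW_JOBS, COBB_JOBS_PER_ACRE[run])
    print(f"{run}: {commercial:,.0f} commercial acres for {NEW_JOBS:,} new jobs")

# A dwelling-unit forecast that is 40 percent too low cuts LDR land demand by 40 percent.
print(acres_needed(NEW_LDR_UNITS, COBB_LDR_DUAC),
      acres_needed(0.6 * NEW_LDR_UNITS, COBB_LDR_DUAC))
```

Under this linear relation, forecast errors pass straight through to land demand, and density errors act inversely. The blind jobs-per-acre values in Table 1 are lower than the guided values in every county. Under this simple model, that would by itself imply more commercial land. This is consistent with, though not proof of, the larger blind overestimate of commercial land in Table 6.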
Finally, it is critical to have cost-effective, GIS-based dynamic models that allow the user to make adjustments reflecting faster or slower growth, different distributions of growth, and other factors that may affect the growth rate and dispersion of the population. In that regard, the PSGM has an advantage over cellular automata (CA) and agent-based models, which require a UNIX environment, a large number of simulations, long run times, or large amounts of detailed spatial data (for example, raw land prices and construction costs).
The authors would like to acknowledge funding support from the NASA Applied Sciences Program for this research. Assistance from Dr. Ashutosh Limaye in the development of an algorithm for computing Kappa statistics and Ms. Sue Estes for technical editing also is greatly appreciated.
Brown, D. G., R. Walker, S. Manson, and K. Seto. 2004. Modeling land use and land cover change. In G. Gutman, A. C. Janetos, C. O. Justice, E. F. Moran, J. F. Mustard, R. R. Rindfuss, D. Skole, B. L. Turner II, and M. A. Cochrane, Eds., Land change science: Observing, monitoring, and understanding trajectories of change on the earth's surface. Kluwer Academic Publishers, 399.
Chavez, P. 1988. An improved dark-object subtraction technique for atmospheric scattering correction of multispectral data. Remote Sensing of Environment 24: 459-79.
Clarke, K. C., S. Hoppen, and L. J. Gaydos. 1997. A self-modifying cellular automaton model of historical urbanization in the San Francisco Bay area. Environment and Planning B 24: 247-61.
Clarke, K. C., and L. J. Gaydos. 1998. Loose-coupling a cellular automaton model and GIS: Long-term urban growth prediction for San Francisco and Washington/Baltimore. International Journal of Geographical Information Science 12: 699-714.
EPA Urban and Economic Development Division. 1999. Transportation and environmental effects of infill versus greenfield development. EPA Publication Number 231-R-99-005.
Estes, Jr., M. G., H. Johnson, W. L. Crosson, A. Limaye, D. Quattrochi, D. Khan, and W. Lapenta. 2006. Projecting future urbanization with Prescott College's spatial growth model to promote environmental sustainability and smart growth, a case study in Atlanta, Georgia. National Association of Environmental Professionals, April 23 to 26, Albuquerque, New Mexico.
Estes, Jr., M. G., W. L. Crosson, H. Johnson, and D. Quattrochi. 2007. Validation and demonstration of the Prescott spatial growth model in the Atlanta, Georgia region. International Symposium on Remote Sensing and Environment, June 25 to 29, San Jose, Costa Rica.
Frenkel, A. 2004. A land-consumption model. Journal of the American Planning Association 70: 453-40.
Kindel, P. J. 2006. Building types. In Planning and urban design standards. New York: John Wiley and Sons, 185-95.
Kocabas, V., and S. Dragicevic. 2006. Coupling Bayesian networks with GIS-based cellular automata for modeling land use change. Lecture Notes in Computer Science 4197: 217-33.
Landis, J. D. 1995. Imagining land use futures: Applying the California futures model. Journal of the American Planning Association 61: 438-57.
Laymon, C. 2004. National consortium on remote sensing in transportation--environmental assessment. http://www.ghcc.msfc.nasa.gov/land/ncrst/raclassmeth.html.
Liu, W., K. Seto, Z. Sun, and Y. Tian. 2007. Urban land use prediction model with spatiotemporal data mining and GIS. In Q. Weng and D. Quattrochi, Eds., Urban remote sensing. CRC Press, Taylor and Francis, 165-78.
Pontius, Jr., R. G. 2000. Quantification error versus location error in comparison of categorical maps. Photogrammetric Engineering and Remote Sensing 66(8): 1,011-16.
Pontius, R. G., and L. C. Schneider. 2001. Land-cover change model validation by an ROC method for the Ipswich watershed, Massachusetts, USA. Agriculture, Ecosystems and Environment 85: 239-48.
White, R. 2006. Pattern based map comparisons. Journal of Geographical Systems 8: 145-64.
White, R., and G. Engelen. 1993. Cellular automata and fractal urban form: A cellular modeling approach to the evolution of urban land-use patterns. Environment and Planning A 25: 1,175-99.
Maurice G. Estes, Jr., William L. Crosson, Mohammad Z. Al-Hamdan, and Dale A. Quattrochi are located at the National Space Science and Technology Center in Huntsville, Alabama. All are employed by Universities Space Research Association except Dale Quattrochi, who is employed by the National Aeronautics and Space Administration. Hoyt Johnson III is with H3J Consulting.
Maurice G. Estes, Jr.
National Space Science and Technology Center
Universities Space Research Association
320 Sparkman Drive
Huntsville, AL 35805
Fax: (256) 961-7788
Hoyt Johnson III
331 S. Pleasant St.
Prescott, AZ 86303
Table 1. Jobs per acre used in the blind and guided simulations
County | Blind | Guided
Cherokee | 7.1 | 7.7
Clayton | 4.0 | 12.5
Cobb | 6.7 | 14.3
DeKalb | 12.5 | 16.7
Douglas | 2.2 | 7.1
Fayette | 1.9 | 7.7
Fulton | 16.7 | 20.0
Gwinnett | 5.3 | 11.1
Henry | 2.0 | 5.9
Rockdale | 3.4 | 9.1

Table 2. Rule sets for the blind and guided simulations. Rules 1 to 4 each use a distance and weight as shown for each land-use type. Rule 2 was not used for low-density residential in the guided simulation.
Rule type | Commercial distance (miles) | Commercial weight | High-density residential distance (miles) | High-density residential weight | Low-density residential distance (miles) | Low-density residential weight
1. In county--develop land in respective county first | 1.0 | 10 | 1.0 | 10 | 1.0 | 10
2. Near roads--within x miles of major roads | 0.5 | 4 | 2.0 | 3 | 6.0 | 2
3. Within x miles of existing development of same type | 0.75 | 4 | 2.0 | 3 | 2.0 | 1
4. Follow new growth--within x miles of new development of same type | 0.75 | 2 | 2.0 | 2 | 3.0 | 3
5. Random--allows spontaneous development | Not used | n/a | Used | 6 | Used | 8

Table 3. Dwelling units per acre (DUAC) assigned for each county for low-density residential (LDR) and high-density residential (HDR-guided and HDR-blind) land-use types
County | LDR (guided and blind) | HDR (guided) | HDR (blind)
Cherokee | 1 | 4 | 4
Clayton | 1.5 | 5 | 6
Cobb | 2 | 6 | 7
Coweta | 1 | 3 | 4
DeKalb | 2 | 6 | 7
Douglas | 1.25 | 4 | 5
Fayette | 1.25 | 4 | 5
Forsyth | 1.25 | 3 | 5
Fulton | 2 | 6 | 7
Gwinnett | 1.5 | 6 | 6
Henry | 1.25 | 4 | 5
Paulding | 1 | 3 | 4
Rockdale | 1.25 | 4 | 5

Table 4. Kappa statistics for the 1990 and 2000 projected land-use model outputs
Projection | Land-use class | Observed agreement | Chance agreement | Kappa statistic
1990 guided | Urban commercial | 0.96 | 0.93 | 0.42
1990 guided | High-density residential | 0.89 | 0.85 | 0.28
1990 guided | Low-density residential | 0.91 | 0.91 | 0.004
1990 guided | Entire urban | 0.84 | 0.72 | 0.41
1990 blind | Urban commercial | 0.95 | 0.92 | 0.40
1990 blind | High-density residential | 0.89 | 0.85 | 0.28
1990 blind | Low-density residential | 0.92 | 0.92 | 0.003
1990 blind | Entire urban | 0.83 | 0.72 | 0.40
2000 guided | Urban commercial | 0.95 | 0.92 | 0.35
2000 guided | High-density residential | 0.87 | 0.83 | 0.26
2000 guided | Low-density residential | 0.83 | 0.81 | 0.006
2000 guided | Entire urban | 0.78 | 0.64 | 0.40
2000 blind | Urban commercial | 0.94 | 0.91 | 0.32
2000 blind | High-density residential | 0.87 | 0.82 | 0.27
2000 blind | Low-density residential | 0.83 | 0.82 | 0.006
2000 blind | Entire urban | 0.78 | 0.62 | 0.43

Table 5. Moran's I for the year 2000 observed land use and projected land use for 2000 model simulations
Land-use class | Observed | Guided projected | Blind projected
Urban commercial | 0.89 | 0.94 | 0.93
High-density residential | 0.90 | 0.90 | 0.89
Low-density residential | 0.91 | 0.79 | 0.82

Table 6. Projected land use, in number of pixels and in percentage of study area, for blind and guided simulations compared to year 2000 base map
Land-use class | 2000 observed (pixels) | 2000 observed (%) | 2000 blind (pixels) | 2000 blind (%) | Blind % difference from 2000 observed | 2000 guided (pixels) | 2000 guided (%) | Guided % difference from 2000 observed
Undeveloped | 724,662 | 59.1 | 827,617 | 65.9 | 11.5 | 843,757 | 67.1 | 13.6
Commercial | 67,159 | 5.2 | 117,211 | 9.1 | 74.4 | 91,040 | 7.1 | 35.5
High-density residential | 178,860 | 13.9 | 215,295 | 16.7 | 20.3 | 184,358 | 14.3 | 3.0
Low-density residential | 280,871 | 21.8 | 107,189 | 8.3 | -61.9 | 148,158 | 11.5 | -47.3

Table 7. Definition of possible model errors
Error category | 2000 observed | Model projection
-6 | Urban | Undeveloped
-5 | HDR | Undeveloped
-4 | LDR | Undeveloped
-3 | Urban | LDR
-2 | HDR | LDR
-1 | Urban | HDR
0 | Urban, HDR, LDR, or Undeveloped | Correct projection
1 | HDR | Urban
2 | LDR | HDR
3 | LDR | Urban
4 | Undeveloped | LDR
5 | Undeveloped | HDR
6 | Undeveloped | Urban

Table 8. Model performance in urban zone 1 by type of error
Error category | Guided count | Guided % | Blind count | Blind %
-6 | 313 | 1.21% | 401 | 1.55%
-5 | 2288 | 8.88% | 2692 | 10.45%
-4 | 960 | 3.73% | 1074 | 4.17%
-3 | 8 | 0.03% | 8 | 0.03%
-2 | 243 | 0.94% | 235 | 0.91%
-1 | 1276 | 4.95% | 1293 | 5.02%
0 | 14328 | 55.61% | 14168 | 54.99%
1 | 2523 | 9.79% | 2308 | 8.96%
2 | 1397 | 5.42% | 1345 | 5.22%
3 | 381 | 1.48% | 319 | 1.24%
4 | 95 | 0.37% | 93 | 0.36%
5 | 1441 | 5.59% | 1367 | 5.31%
6 | 510 | 1.98% | 460 | 1.79%
Total | 25763 | | 25763 |

Table 9. Model performance in suburban zone 2 by type of error
Error category | Guided count | Guided % | Blind count | Blind %
-6 | 935 | 2.89% | 801 | 2.47%
-5 | 2667 | 8.24% | 2532 | 7.82%
-4 | 8230 | 25.42% | 7762 | 23.98%
-3 | 51 | 0.16% | 46 | 0.14%
-2 | 177 | 0.55% | 153 | 0.47%
-1 | 1054 | 3.26% | 1186 | 3.66%
0 | 9838 | 30.39% | 9941 | 30.71%
1 | 1388 | 4.29% | 1369 | 4.23%
2 | 4592 | 14.19% | 4831 | 14.92%
3 | 1379 | 4.26% | 1620 | 5.00%
4 | 231 | 0.71% | 187 | 0.58%
5 | 1194 | 3.69% | 1388 | 4.29%
6 | 636 | 1.96% | 556 | 1.72%
Total | 32372 | | 32372 |

Table 10. Model performance in rural zone 3 by type of error
Error category | Guided count | Guided % | Blind count | Blind %
-6 | 652 | 2.64% | 693 | 2.80%
-5 | 2765 | 11.18% | 2947 | 11.91%
-4 | 4401 | 17.79% | 4680 | 18.92%
-3 | 89 | 0.36% | 54 | 0.22%
-2 | 332 | 1.34% | 188 | 0.76%
-1 | 180 | 0.73% | 174 | 0.70%
0 | 13703 | 55.39% | 14196 | 57.39%
1 | 101 | 0.41% | 101 | 0.41%
2 | 194 | 0.78% | 155 | 0.63%
3 | 59 | 0.24% | 59 | 0.24%
4 | 1390 | 5.62% | 686 | 2.77%
5 | 804 | 3.25% | 737 | 2.98%
6 | 67 | 0.27% | 67 | 0.27%
Total | 24737 | | 24737 |

Table 11. Model performance in the total model domain by type of error
Error category | Guided count | Guided % | Blind count | Blind %
-6 | 18361 | 1.43% | 14949 | 1.16%
-5 | 81201 | 6.32% | 71443 | 5.56%
-4 | 178211 | 13.86% | 164775 | 12.82%
-3 | 3219 | 0.25% | 3039 | 0.24%
-2 | 10913 | 0.85% | 10469 | 0.81%
-1 | 15591 | 1.21% | 16815 | 1.31%
0 | 733156 | 57.04% | 720913 | 56.09%
1 | 24570 | 1.91% | 29937 | 2.33%
2 | 59447 | 4.63% | 65501 | 5.10%
3 | 16629 | 1.29% | 24215 | 1.88%
4 | 69830 | 5.43% | 67381 | 5.24%
5 | 54557 | 4.24% | 65608 | 5.10%
6 | 19651 | 1.53% | 30294 | 2.36%
Total | 1285336 | | 1285339 |
Author: Estes, Maurice G., Jr.; Crosson, William L.; Al-Hamdan, Mohammad Z.; Quattrochi, Dale A.; Johnson, H
Date: Jan 1, 2010