
Tree-based techniques to predict soil units.


Soil maps are part of the basic infrastructure used by nations to manage and protect their soil resources. Soil scientists face great challenges in meeting society's demand for information at a scale and level of detail appropriate to support responsible land use decisions (Lagacherie and McBratney 2007; Odeh et al. 2007). It is in this context that researchers such as Hengl and Heuvelink (2004), Minasny et al. (2003) and McBratney et al. (2003) have made advances in soil mapping, pedometrics, predictive algorithms, dynamic modelling, integration of geographic information systems and geostatistical tools, as well as the use of high-resolution images.

The greatest change in the way soils are mapped is the selection of landscape covariates, which are useful in predicting different soil units based on soil-landscape patterns. Nowadays, using modern tools and digital data, such as surface models and remote sensing data, it is possible to combine tacit knowledge and statistical models to process large amounts of data, creating more accurate maps than those created using the traditional methods of soil surveys.

Spatial analysis using terrain attributes represented by numerical surface models as covariates to predict soil units was exemplified by McBratney et al. (2003), Boettinger (2010) and Gallant and Austin (2015), among others. Some studies have presented techniques for the autoclassification and mapping of landform elements (Iwahashi and Pike 2007; Ehsani and Quiel 2008). Jasiewicz and Stepinski (2013) used the concept of 'geomorphons' to classify the 10 most commonly recognised surface types based on ternary patterns calculated from a central pixel and the relative elevation of neighbouring pixels.

According to Boettinger et al. (2008) and Ben-Dor et al. (2008), using remote sensing data as environmental covariates can be useful in digital soil mapping. Chagas et al. (2016) reported that orbital remote sensing data combined with random forest models showed satisfactory results in predicting particle size fractions from Brazilian soils, particularly sand and clay content.

However, the contribution of spectral indices as covariates to map soil units is indirect. The spectral indices help distinguish soil units based on the assumption that they are related to organic carbon content, clay content and the nature of the clay minerals, among other properties that are also used as taxonomic criteria to classify different soil types. Sabins (1997) noted that Landsat Enhanced Thematic Mapper (ETM) is an important data source for the detection of iron oxide and hydroxyl minerals.

In much of the world, the only maps available are coarse-scale polygon maps that do not provide the detail needed for field-scale management. In Brazil, over the past decade, studies such as those of Chagas et al. (2011), Carvalho Junior et al. (2011) and ten Caten (2011) used digital soil mapping techniques to predict soil classes, thus yielding more detailed soil maps than the legacy data available nowadays (1:250 000 or coarser). These studies are examples of digital soil mapping techniques applied to subtropical soils using a combination of classic concepts of soil-landscape relationships and modern tools for data analysis and modelling to improve the information provided by soil surveys in a quantitatively demonstrable way, which has inspired the present study.

Using modern computational techniques for the handling and analysis of spatial data yields a better cost-benefit relationship than traditional soil survey techniques. In this sense, digital soil mapping techniques represent an important strategy for soil mapping and extracting information from available datasets (Behrens et al. 2005) and include data-mining methods such as decision trees (DT; Breiman et al. 1984) and random forests (RF; Breiman 2001).

Several studies have successfully demonstrated the use of tree-based methods to predict soil classes, including Crivelenti et al. (2009), Moonjun et al. (2010), Giasson et al. (2011) and Lorenzetti et al. (2015), all of whom used DT models, and Stum et al. (2010), Barthold et al. (2013) and Rad et al. (2014), who used the RF models.

Although digital soil mapping techniques are commonly used now, most methods do not detect spatial relationships (Moran and Bui 2002). Thus, the present study investigated the relationships between covariates and soil classes using tacit knowledge to determine the best covariates to be used as input in tree-based methods to create a digital soil map of the Guapi-Macacu watershed.

DT models have the advantage of analysing soil-landscape relationships in a similar manner to pedologist reasoning (Crivelenti et al. 2009; Giasson et al. 2011), enabling an understanding of the correlation between soil data and terrain attributes (Bou Kheir et al. 2010). RF models, in turn, have several advantages over other statistical methods. Those highlighted by Breiman (2001) and Liaw and Wiener (2002) include the ability to model high-dimensional nonlinear relationships, the ability to use categorical and continuous covariates, resistance to overfitting, relative robustness with regard to noise in the data, high prediction accuracy, the need for only a few parameters to be set, an unbiased measure of error rate (out-of-bag error) and, finally, the ability to rank the importance of the predictive variables used. The main disadvantage of RF models is the restricted interpretation of the results, because the relationship between the predictors and the responses is not explicit for each tree in the forest, which defines this technique as a 'black box' approach (Grimm et al. 2008).

Although the use of digital soil mapping techniques in Brazil has grown in recent years, there is a need for more studies on the selection of predictor covariates, sampling techniques, predictive methods and methods of validation, which could address some of the issues identified above, particularly those related to the level of detail on the maps, making the products more useful for agricultural and territorial planning.

The aim of the present study was to evaluate the ability of different tree-based methods for digital soil mapping in tropical conditions to predict the distribution of soil units in relation to soil-forming factors. The methods tested were the RF and DT models, and statistical parameters were used to select the better soil map and the better model. The coherence between soil map units (legacy data) and the digital soil maps produced using the models was evaluated, with accuracy determined through field validation.

Material and methods

The study area lies within the hydrographic region of Guanabara Bay (Brazil) and corresponds to the Guapi-Macacu watershed (1250.78 km²), located between the coordinates 22°41'56"-22°21'12"S and 43°3'35"-42°33'4"W.

The Macacu River is the main drainage feature and flows in the north-east to south-west direction. The main headspring of the Macacu River is in the municipality of Cachoeiras de Macacu in Tres Picos State Park and Serra dos Orgaos National Park, whereas the mouth flows to Guanabara Bay. The rivers that make up the basin have been modified considerably over the years as a result of drainage processes and rectification of riverbeds, as well as because of urban occupation, which has been marked in recent decades (Dantas et al. 2008). These factors are directly responsible for the disappearance of marshes, wetlands and much of the mangroves. The transformation of these natural landscapes can have various harmful consequences on the ecosystem, including on the dynamics of estuarine waters, salinity and sedimentation, among others. Anthropogenic interference with the drainage of marginal lands through deforestation and consequent erosion tends to undermine the natural recharge of the aquifer system, reducing the flow of rivers, especially during drought periods (Villela and Mattos 1975).

The study area has a wide topographic range, varying from zero (sea level) up to 2000 m at the watershed boundary. Thus, the area has a wide range of landscape features, such as escarpments, massifs, hills and coastal plains. The variety of environments includes different ecosystems, such as forests, pastures, marshes and mangroves. The lithology types include gneisses of the 'Paraiba do Sul Complex'; migmatites and granites from the 'Rio Negro Complex', 'Suite Serra dos Orgaos' and 'Suite Rio de Janeiro'; intrusive bodies of the 'Rio Bonito Alkaline Complex' (syenite and nepheline syenite); and Quaternary sediments (continental and transitional to marine/coastal), represented by marsh sediments, marine sediments, coastal sediments and alluvial sediments (Silva and Cunha 2001). In the Soil Survey of Rio de Janeiro State (at a scale of 1:250 000), Carvalho Filho et al. (2003) described the typical classes that occur in the area as Planosols, Cambisols, Gleysols, Ferralsols, Fluvisols and Regosols according to the classification of the World Reference Base for Soil Resources (International Union of Soil Sciences (IUSS) Working Group 2014).

A hydrologically consistent digital elevation model (DEM) was generated through interpolation of the primary elevation data, which involved contours and points extracted from Brazilian official charts at a scale of 1:50 000 (Brazilian Institute of Geography and Statistics and Brazilian Geographic Services; IBGE 1974, 1979a, 1979b, 1979c, 1983, 2008, 2013). The elevation model was generated using 'Topo to Raster', which uses a specific interpolation method based on the ANUDEM algorithm developed by Hutchinson (1993) to obtain a hydrologically consistent model. Brazilian official charts with stream network features, also at a scale of 1:50 000, were obtained in order to support the hydrological consistency of the DEM. The procedures to obtain the DEM were performed in ArcGIS Desktop v.10 (Environmental Systems Research Institute (ESRI) 2010).

Terrain attributes were generated from the DEM, obtained in ArcGIS Desktop v.10, using the Spatial Analyst Toolbox (Environmental Systems Research Institute (ESRI) 2010) to develop a set of variables used as inputs for the predictive models. The attributes derived from the DEM were slope, curvature, compound topographic index (CTI) according to Moore et al. (1991), Euclidean distance from stream networks and 'geomorphons' landform maps (Jasiewicz and Stepinski 2013). Geomorphons were created using a flexible procedure, making it possible to recognise the same types of landforms at different scales. At the end of the autoclassification process, the 10 most commonly recognised surface types were identified. Previous studies identified adequate parameters to create the landform map of the area (Pinheiro et al. 2016).
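As an illustration of one of these terrain attributes, the CTI is commonly written as ln(As / tan β), where As is the specific catchment area and β the local slope angle. The following is a minimal sketch of that formula only, not the ArcGIS workflow used in the study; the function name, the example values and the small slope floor applied to flat cells are assumptions made for illustration.

```python
import math

def compound_topographic_index(specific_catchment_area, slope_degrees,
                               min_slope_degrees=0.01):
    """Return CTI = ln(As / tan(beta)).

    specific_catchment_area: upslope contributing area per unit contour width (m).
    slope_degrees: local slope angle in degrees.
    min_slope_degrees: assumed floor that avoids division by zero on flat cells.
    """
    beta = math.radians(max(slope_degrees, min_slope_degrees))
    return math.log(specific_catchment_area / math.tan(beta))

# Hypothetical cells: a gently sloping floodplain cell with a large
# contributing area, and a steep hillslope cell with a small one.
flat_wet = compound_topographic_index(5000.0, 1.0)
steep_dry = compound_topographic_index(50.0, 30.0)
```

Under these assumed values the floodplain cell receives the higher CTI, which is consistent with the use of CTI to flag positions where water tends to accumulate.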

The landform map (geomorphons) was created using the Geographic Resources Analysis Support System (GRASS) software (GRASS Development Team 2013) through the geomorphons algorithm (Jasiewicz and Stepinski 2013; the code can be downloaded from, accessed 16 April 2014). This algorithm classifies surfaces into the 10 most common landforms using the DEM as the input, with a predefined search radius (L) of 45 cells (pixels) and a flatness threshold (t) of 1.0° (Pinheiro et al. 2016). All layers were generated at a spatial resolution of 30 m and projected in UTM with the horizontal datum SIRGAS 2000.
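The ternary pattern behind a geomorphon can be sketched in a few lines. The code below is a simplified illustration, not the GRASS implementation: for each of the eight compass directions it compares the zenith and nadir angles of the lines of sight within the search radius and records whether the surroundings are higher (1), lower (-1) or level (0) relative to the flatness threshold. The function name, the nested-list DEM and the handling of grid edges are assumptions; the defaults mirror the 30-m cells, 45-cell search radius and 1.0° threshold reported above. The final lookup from pattern to one of the 10 landforms is omitted.

```python
import math

# Eight compass directions as (row step, column step).
DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def ternary_pattern(dem, row, col, cell_size=30.0, search_radius=45, flat_deg=1.0):
    """Return eight values in {-1, 0, 1}: lower, level or higher terrain per direction."""
    z0 = dem[row][col]
    pattern = []
    for dr, dc in DIRECTIONS:
        step_len = cell_size * math.hypot(dr, dc)
        angles = []
        for k in range(1, search_radius + 1):
            r, c = row + k * dr, col + k * dc
            if not (0 <= r < len(dem) and 0 <= c < len(dem[0])):
                break  # stop at the edge of the grid (assumed behaviour)
            angles.append(math.degrees(math.atan2(dem[r][c] - z0, k * step_len)))
        if not angles:
            pattern.append(0)  # no cells to compare in this direction
            continue
        zenith = 90.0 - max(angles)  # from straight up to the highest line of sight
        nadir = 90.0 + min(angles)   # from straight down to the lowest line of sight
        delta = nadir - zenith
        pattern.append(1 if delta > flat_deg else -1 if delta < -flat_deg else 0)
    return pattern

# Hypothetical 5 x 5 grids (elevations in metres).
pit = [[100.0] * 5 for _ in range(5)]
pit[2][2] = 0.0
peak = [[0.0] * 5 for _ in range(5)]
peak[2][2] = 100.0
flat = [[10.0] * 5 for _ in range(5)]

pit_pattern = ternary_pattern(pit, 2, 2)
peak_pattern = ternary_pattern(peak, 2, 2)
flat_pattern = ternary_pattern(flat, 2, 2)
```

In these toy grids the central cell of the pit sees higher terrain in every direction, the peak sees lower terrain in every direction and the flat grid yields only zeros, which correspond to the pit, peak and flat landform types.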

Remotely sensed spectral data were used to generate three indices by combining bands from Landsat 5 (image of July 2011), also at 30-m spatial resolution. The three indices were the normalised difference vegetation index (NDVI), the iron oxide index and the clay minerals index, as per Sabins (1997, 1999). The procedures used to obtain these indices were performed in ERDAS Imagine v.9.1 as follows:

NDVI = (Band 4 - Band 3)/(Band 4 + Band 3) (1)

Clay minerals index = Band 5/Band 7 (2)

Iron oxide index = Band 3/Band 1 (3)
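Eqns 1-3 can be expressed per pixel as follows. This is a minimal sketch rather than the ERDAS Imagine procedure used in the study; the function names, the reflectance values of the example pixel and the choice to return NaN when a denominator is zero are assumptions.

```python
def band_ratio(numerator, denominator):
    """Simple band ratio; NaN where the denominator is zero (assumed handling)."""
    return numerator / denominator if denominator != 0 else float("nan")

def spectral_indices(b1, b3, b4, b5, b7):
    """Compute the three indices of Eqns 1-3 from Landsat 5 band values."""
    total = b4 + b3
    ndvi = (b4 - b3) / total if total != 0 else float("nan")
    return {
        "ndvi": ndvi,                          # Eqn 1
        "clay_minerals": band_ratio(b5, b7),   # Eqn 2
        "iron_oxide": band_ratio(b3, b1),      # Eqn 3
    }

# Hypothetical reflectance values for one vegetated pixel.
pixel = spectral_indices(b1=0.05, b3=0.10, b4=0.40, b5=0.30, b7=0.15)
```

For a raster, the same arithmetic would be applied cell by cell to the co-registered 30-m bands.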

The clay minerals and iron oxide indices are used in remote sensing applied to geological studies to recognise hydrothermal alteration and unaltered rocks, and can also be used to distinguish soils with different mineralogical characteristics (Sabins 1999). The clay minerals index can be used to highlight the occurrence of minerals such as kaolinite, illite and smectite, which have high and low reflectance for Bands 5 and 7 respectively from Landsat 5 sensors. This relationship is also useful to distinguish areas of soil-unchanged rocks.

Iron minerals such as goethite and hematite have low reflectance in Band 1 and high reflectance in Band 3 from Landsat 5; therefore, soils tend to have higher ratios between these spectral bands depending on the mineralogical composition of the soil. The study area contains soils with different characteristics regarding clay type and the clay and iron oxide contents in surface horizons. Thus, these indices were used as additional variables to help distinguish different soil types.

The geology map was also used as a covariate input in the predictive models. This map was generated from digitised Brazilian official charts at a scale of 1:50 000 (Geological Survey of Brazil and Department of Mineral Resources; DRM 1979, 1980a, 1980b, 1980c, 1981a, 1981b, 1984). The charts were compiled in a mosaic and the area corresponding to the watershed limits was then selected. These procedures were performed in ArcGIS Desktop v.10.

The field survey included 100 sampling points defined by conditioned Latin hypercube sampling (cLHS). This method also took into account the feasibility of sampling (Minasny and McBratney 2006; Roudier et al. 2012), which was determined as lying within 100 m on each side of the roads based on previous studies (Carvalho Junior et al. 2014). The procedure to restrict the access area was performed in ArcGIS Desktop v.10.1 through the buffer tool by using the distance from the road vector file obtained from the official Brazilian database at a scale of 1:50 000 as a reference. The choice of 100 sample points was made on the basis of the size of the area, the scale of the input database (vector files used to create the DEM) and the level of detail of the soil survey, which corresponds to a semidetailed level based on density of observations (number of soil pits per area), in accordance with the Brazilian Manual of Soil Survey (IBGE 2013). These 100 points formed the basis for pixel sampling considering a buffer area around the typical profiles used to represent the soil classes. Through this process, 500 pixels were selected to represent each soil class to be mapped, totalling 4500 pixels used as inputs for model training. The collection of a greater sample set addresses statistical issues about the number of samples used to create a consistent spatial pattern that represents the occurrence of each soil class based on the variability of covariates for each soil map unit as observed on field expeditions. Chagas et al. (2011, 2013) and Pinheiro (2012) performed similar procedures using neural networks to predict soil classes in Brazil.

The covariates driving the selection of the 100 sampling points through cLHS were derived from the DEM with 30-m spatial resolution, which was converted to a vector file (points) through the 'Raster to Point' command. The points contained within the buffer around roads were then selected through the 'Select by Location' command and exported. Finally, the values of other covariates (slope, curvature and land use) were added. The land use map derived from Landsat 5 data, also at a spatial resolution of 30 m, was added primarily to remove the urban area and thus to prevent the selection of sample points located in urban areas.

The purposes of soil surveys are to detect soil-landscape relationships and to support pixel sampling under similar conditions to represent soil classes by using classification algorithms based on field observations. The classification was, in the first instance, according to the Brazilian System of Soil Classification (Santos et al. 2013) and was subsequently adapted to the World Reference Base for Soil Resources (IUSS Working Group 2014). The models tested were the DT and RF models, and the classification procedures were developed using R Software (R Development Core Team 2013). The prediction by the RF model was made using the 'randomForest' package (Liaw and Wiener 2002). The prediction by the DT model was developed using the 'rpart' package (R Development Core Team 2013). The equations and parameters for the RF and DT models are given in Eqns 4 and 5 respectively.

randomForest (soil map unit ~ geology + DEM + slope + curvature + Euclidean distance from streams + CTI + NDVI + clay minerals + iron oxide + geomorphons, data = dataset 1, importance = T, proximity = T, ntree = 250, mtry = 5) (4)

rpart (soil map unit ~ geology + DEM + slope + curvature + Euclidean distance from streams + CTI + NDVI + clay minerals + iron oxide + geomorphons, method = 'class', data = classes) (5)

The tree-based models are usually easy to implement and can work with large volumes of different types of data. The basic algorithm for the DT defines the architecture of the tree recursively (top to bottom). The algorithm is defined on the basis of a set of training samples, which are sequentially divided into smaller and more homogeneous subsets with regard to the dependent variable (Han and Kamber 2001), in this case soil units. The algorithm for the RF model can be understood as a combination of predictions from various trees, which defines the predicted class based on the number of votes considering all trees (Breiman 2001). The implementation of the RF model requires the definition of parameters for the number of trees, the minimum number of data per terminal node and the number of variables used by each tree (Liaw and Wiener 2002). In the present study, the number of trees was set at 250 and the number of data per terminal node was set to 5 (default settings for the program), with the number of variables corresponding to the number of input variables or predictors.
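The two mechanisms described above, recursive splitting into more homogeneous subsets and voting across trees, can be illustrated with a short sketch. It is not the rpart or randomForest code used in the study: the Gini impurity is assumed here as the homogeneity measure, only a single split on a single covariate is searched, and the function names, elevations and class labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels (0 means perfectly homogeneous)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Find the threshold on one covariate that minimises the weighted Gini impurity."""
    best = (None, gini(labels))  # (threshold, impurity); None means no useful split
    pairs = sorted(zip(values, labels))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

def majority_vote(tree_predictions):
    """Combine the class predicted by each tree into one forest prediction."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical training pixels: elevation (m) and soil unit.
elevation = [5, 12, 20, 300, 450, 800]
units = ["GL", "GL", "GL", "CM", "CM", "CM"]
threshold, impurity = best_split(elevation, units)

# Hypothetical predictions of three trees for one pixel.
forest_class = majority_vote(["GL", "GL", "FL"])
```

A full decision tree repeats the split search on each resulting subset and over all covariates, and a random forest grows each tree on a bootstrap sample with a random subset of covariates considered at each split.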

The output classes corresponded to nine soil units, and the characteristics observed in the field are presented in Table 1. In the present study, we opted for an approach based on the soil-landscape characteristics observed through the soil survey; for this reason, collinearity analysis was not performed among the covariates, so as not to remove any of those selected by the pedologists.

Evaluation of model performance was based on the kappa and overall indices (Congalton and Green 1999) and accuracy was based on the contrast with different sample sets of legacy data (Chagas et al. 2013). The confusion matrix to determine the kappa and overall indices was obtained by cross-validation. The overall index is obtained by dividing the number of points correctly classified by the total number of points, whereas the kappa index measures the concordance between predicted map units and reference map units, excluding the effect of chance (Congalton and Green 1999; Giasson et al. 2013).
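Both indices follow directly from the confusion matrix. The sketch below uses the standard definitions (observed agreement on the diagonal, and chance agreement from the row and column totals); the function name and the two-class matrix are hypothetical and do not reproduce the values in Table 3.

```python
def overall_and_kappa(matrix):
    """Return (overall accuracy, kappa) for a square confusion matrix.

    matrix[i][j] is the number of samples of reference class i predicted as class j.
    Kappa is undefined when chance agreement equals 1; that case is not handled here.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical two-class confusion matrix with 100 validation points.
example = [[40, 10],
           [5, 45]]
overall, kappa = overall_and_kappa(example)
```

In this example 85 of 100 points lie on the diagonal, so the overall index is 0.85, while half of that agreement would be expected by chance, giving a kappa of 0.70.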

In addition, model performance was evaluated based on the consistency of the generalisation in predicting the soil units in accordance with the legacy map and on a field validation with 109 control samples not used in the training phase. This last procedure measures the percentage of control samples correctly classified by the different models.

Results and discussion

Analysis of terrain covariates

The success of soil modelling is directly related to the quality of input data and the choice of predictive variables (Zhu 2001; Minasny et al. 2003). In order to choose a relevant set of covariates, it is necessary to perform a detailed analysis of terrain attributes and thus develop an understanding of soil-landscape relationships. Table 1 summarises the environmental conditions associated with the occurrence of the different soil classes.

The Guapi-Macacu watershed has, in the first instance, two relief systems: degradation forms (steep hills and massifs) and aggradation forms (plains and isolated hills with a small range of altitude variability). The degradation relief system predominantly comprises removal surfaces, where processes of leaching and material loss occur. In contrast, the aggradation system tends to receive the material removed from the higher parts and remote locations of the watershed. In this sense, some soils can be differentiated as formed in situ (autochthonous) or from sediments coming from other locations, whether aeolian, colluvial or fluvial. This distinction is primarily related to the location of these soils in the landscape (Oliveira and Moniz 1975; Figueiredo et al. 2004; Varajao et al. 2009).

The selection of covariates in the present study was supported by previous studies in the area (Pinheiro 2012) and an exploratory analysis of dataset variability based on descriptive statistical values (mean, median, minimum, maximum and s.d.). Fig. 1 presents the mean ([+ or -] s.d.) values of the predictive covariates of the nine different soil units. These analyses help define the range of variability of terrain attributes in each soil unit, which is important for coherent pixel sampling to use in training the algorithms. This approach also highlighted the importance of some terrain covariates related to the occurrence of certain soil types, thus justifying their use as inputs in the predictive models.
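The exploratory summary described above amounts to grouping the training pixels by soil unit and computing descriptive statistics for each covariate. A minimal sketch follows, using only the Python standard library; the record layout, the function name and the elevation values are hypothetical and are not taken from Fig. 1.

```python
import statistics
from collections import defaultdict

def summarise_by_unit(records, covariate):
    """Return mean, median, min, max and s.d. of one covariate for each soil unit."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["unit"]].append(rec[covariate])
    summary = {}
    for unit, values in groups.items():
        summary[unit] = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
            # The sample s.d. needs at least two values.
            "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary

# Hypothetical training pixels with elevation in metres.
pixels = [
    {"unit": "GL", "elevation": 5.0},
    {"unit": "GL", "elevation": 10.0},
    {"unit": "GL", "elevation": 15.0},
    {"unit": "CM", "elevation": 600.0},
    {"unit": "CM", "elevation": 800.0},
    {"unit": "CM", "elevation": 1000.0},
]
elevation_summary = summarise_by_unit(pixels, "elevation")
```

Units whose ranges barely overlap for a covariate, as in this toy example, are the ones that covariate is most likely to help separate.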

Fig. 1, supported by observations in situ, corroborated that Haplic Acrisols (Clayic) (ACce) occur on gentle slopes with low elevation; in contrast, Haplic Acrisols (Chromic) (ACcr) occur at higher locations under various slope conditions. Haplic Ferralsols (Dystric) (FRdy) predominantly occur on convex surfaces. A similar pattern was observed for ACcr, although these are predominantly associated with the alkaline rocks common within the geology of this particular region of the watershed. Haplic Cambisols (CM) occur predominantly on concave forms, on steep slopes and at high altitudes, and are associated with landscape units where Regosols and rock outcrops also occur. The Gleysols are located at low altitudes (<30 m elevation), on gentle slopes (<3%) with planar curvatures (between 0.01 and -0.01) and are subdivided into two main types: (1) Haplic Gleysols (GL), which have a wide distribution across the watershed, predominantly in flat areas on the flood plains, but also in valleys; and (2) Umbric Gleysols, which were observed in association with conditions favourable to an increased amount of organic material, such as at the shallowest water table levels. The presence of Endosalic Gleysols is further associated with the influence of estuarine and fluviomarine deposits, in this case due to the salic and sulfuric features in the subsurface horizons. The main differences between the Haplic and Endosalic Gleysols (GLszn) units are due primarily to differences in slope and compound topographic index; GLszn have higher CTI values and gentle slopes. The Fluvisols have higher CTI values related to the intrinsic conditions that form hydromorphic soils, are located a shorter distance from stream networks and are primarily associated with drainage features, as can be observed in the case of the Macacu and Guapi-Acu rivers.

Regarding remote sensing covariates, variations in the clay mineral index were small and values for this index were similar between the units. However, the iron oxides index showed more variability, particularly for the ACce, FRxa, GL, and CM soil units, as shown in Fig. 1g, h. Indices derived from remote sensing data show differences between land usages, aiding the characterisation of soil units. This is due primarily to the relationship between soil capability and the predominant types of land use as observed during field visits.

In areas covered by different vegetation types (agriculture, pasture and/or grazing lands, tropical forest, mangroves and highland fields), the indices derived from remote sensing data highlight different spectral behaviour. Although the soil is more or less covered by vegetation, the spectral response of the vegetation is related to soil conditions and can prove helpful in distinguishing, for example, shallow soils (such as Regosols) or hydromorphic soils (such as Gleysols). In this sense, we expected that the indices could represent soil variation through its effect on the vegetation.

Yang and Yang (1997) also showed that indices derived from remote sensing data can be useful in modelling climate regimes and morphological characteristics of the soil. Relationships between the covariates used and soil formation are summarised in Table 2.

The approach adopted in the present study regarding the selection of covariates is another way in which the expertise of the pedologist can be used to predict the occurrence of soil units in tropical landscapes through digital soil mapping techniques, in addition to the ways presented by Menezes et al. (2013) and Godinho Silva et al. (2014). In those studies, the authors used fuzzy logic to translate tacit knowledge determining the trend in covariate variability to predict the occurrence of soil units in Brazil.

In addition to the covariates, other numerical surface models could also be tested, such as the multiresolution index of ridge top flatness, the multiresolution index of valley bottom flatness and the mass balance index, among others, that could help explain the occurrence of soil units and should be considered for future research.

Of the remote sensing data, the clay minerals and iron oxides covariates were found to be less important in predicting soil units than the NDVI. This can be explained by the important relationship between those indices (clay minerals and iron oxides) and subsurface horizon conditions reported by Taghizadeh-Mehrjardi et al. (2016), who noted that topsoil properties relate better to remote sensing data than subsoil properties. Although collinearity tests in the present study were not performed between iron oxide and clay minerals indices, small differences were observed between them and the soil units considered, which could explain why they were dropped from the input covariates for both tree-based models in the present study. However, the DT model considered the NDVI in the second node to separate soil units. The findings suggest that, on the basis of the remote sensing data evaluated, land use is related to soil units and has an indirect effect on soil genesis. This corroborates the field observations that some deeper soils under flat terrains are preferred for agronomic purposes rather than shallow and wet soils.

Evaluation of tree-based methods

Both the RF and DT methods were used to create soil unit maps, and the performance of the different models was evaluated using a confusion matrix (Table 3).

The ACcr and GLszn classes show better classification results for each method (Table 3). This can be explained by the particular parent material from which these soils were formed, namely alkaline rocks and estuarine deposits respectively. The strong relationship illustrated by this example highlights the importance of a geology map as an input variable. In the DT model, ACce classes are likely to be confused with FRxa classes. Even an expert pedologist could easily confuse these two classes because they can occur in similar landscape conditions, differing only in the increase in clay in the subsurface horizons. In the DT model, the Haplic Acrisols class also lends itself to confusion with the GL class, because both classes can occur in footslopes, although with higher CTI values for GL. In addition, GL can be confused with Fluvisols, which are similarly influenced by hydromorphic processes. The CM class, in turn, can be confused with Regosols because both can occur on the tops of hills and on steep slopes. FRxa can be confused with FRdy, which have an extensive distribution within the watershed. Within both models, the FRdy units are liable to be confused with CM, which can also occur under a wide range of landscape conditions.

Evaluation of the performance of the algorithms was based on the values of the overall and kappa indices obtained from a confusion matrix. Comparisons between models show better values for the RF classifier (overall = 0.97; kappa = 0.96) than for the DT model (overall = 0.85; kappa = 0.83), although both values can be considered excellent (Landis and Koch 1977; Monserud and Leemans 1992).

Fig. 2 shows the inferred maps from both predictive models. In general, both maps show patterns similar to the predicted soil units, with the exception of the ACce unit (Haplic Acrisols Clayic + Haplic Planosol). The map obtained from the DT model illustrates the potential confusion in the classification of flood plains, assigning an extensive area of Acrisols and Planosols to an area that is characterised predominately by Gleysols.

Accuracy was determined by comparing control samples with the inferred maps. Comparison with the legacy soil map using control samples was not performed because the legacy soil map is far less detailed than the maps predicted in the present study and because the occurrence of some soil units was not even detected in the soil survey covering all of Rio de Janeiro State (scale 1:250 000). Regardless, quantitative comparisons between the maps created using the tree-based models and the legacy soil map revealed the similar occurrence of some units. The agreement between these maps is associated with the main soil class in the associations and soil complexes (two or more soil types), which was expected due to the different levels of detail.

The results obtained comparing the field data (109 sample points) with inferred maps showed a level of accuracy >50% for both models: 67.89% for the RF model and 54.13% for the DT model. The kappa values obtained from the confusion matrix comparing field data (control sample set) with the inferred units showed a better performance for the RF classifier (61.39%) than for the DT classifier (45.29%).

Observing the maps produced by the models, it is possible to infer that the main discrepancies are related to classes developed from sediments, particularly in flat areas at low elevations, such as the Gleysols and Fluvisols. In these areas Haplic Planosols were also detected, which can easily be confused with Haplic Acrisols in the footslope of small hills. In Fig. 2, DT classification resulted in large areas of Haplic Acrisols, whereas the RF model correctly classified a significant proportion of these areas as Gleysols.

The range in kappa values is similar to that of other studies in Brazil. Pinheiro (2012) obtained an accuracy of 59.2% in the same study area (the Guapi-Macacu watershed) using artificial neural networks (ANNs), whereas Chagas (2006) obtained 70.8% accuracy and Vaz de Melo (2009) obtained 66.6% accuracy using ANN classifiers with a sample density greater than that of the present study. Ten Caten (2011) reported kappa values between 69.7% and 71.1% following comparisons of field samples and inferred maps generated by decision trees with different-sized sample sets as input data.

The results of the present study demonstrate the potential of using tree-based models in digital soil mapping to promote a quantitative approach and greater overall accuracy of the final product, as evidenced by the error of the models compared with field validation samples. The present study determined that soil class maps developed using RF classifiers had better values in terms of the overall index, kappa index, accuracy and coherence with the legacy soil map than the DT model.


Attributes derived from a DEM, remote sensing data and categorical maps combined with field observations were used to identify strong relationships between landscape variables and the occurrence of different soil types.

Both the DT and RF approaches linked classic concepts of soil formation and landscape models with soil class predictions using terrain features and remote sensing data. The RF model performed relatively better than the DT model, and further research is needed regarding the introduction of additional input variables to improve the accuracy of predictions.

Received 5 March 2016, accepted 24 April 2017, published online 1 June 2017


Barthold FK, Wiesmeier M, Breuer L, Frede HG, Wu J, Blank FB (2013) Land use and climate control the spatial distribution of soil types in the grasslands of Inner Mongolia. Journal of Arid Environments 88, 194-205. doi: 10.1016/j.jaridenv.2012.08.004

Behrens T, Forster H, Scholten T, Steinrucken U, Spies ED, Goldschmitt M (2005) Digital soil mapping using artificial neural networks. Journal of Plant Nutrition and Soil Science 168, 21-33. doi: 10.1002/jpln.200421414

Ben-Dor E, Taylor RG, Hill J, Dematte JAM, Whiting ML, Chabrillat S, Sommer S, Donald LS (2008) Imaging spectrometry for soil applications. Advances in Agronomy 97, 321-392. doi: 10.1016/S0065-2113(07)00008-9

Boettinger JL (2010) Environmental covariates for digital soil mapping in the western USA. In 'Digital soil mapping. Bridging research, environmental application, and operation'. (Eds JL Boettinger, DW Howell, AC Moore, AE Hartemink, S Kienast-Brown) pp. 17-27. (Springer: Berlin)

Boettinger JL, Ramsey RD, Bodily JM, Cole NJ, Kienast-Brown S, Nield SJ, Saunders AM, Stum AK (2008) Landsat spectral data for digital soil mapping. In 'Digital soil mapping with limited data'. (Eds AE Hartemink, AB McBratney, ML Mendonca-Santos) pp. 192-202. (Springer-Verlag: New York, NY)

Bou Kheir R, Greve MH, Abdallah C, Dalgaard T (2010) Spatial soil zinc content distribution from terrain parameters: a GIS based decision-tree model in Lebanon. Environmental Pollution 158, 520-528. doi: 10.1016/j.envpol.2009.08.009

Breiman L (2001) Random forests. Machine Learning 45, 5-32. doi: 10.1023/A:1010933404324

Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) 'Classification and regression trees.' (Wadsworth & Brooks: Monterey, CA)

Carvalho Filho A, Lumbreras JF, Wittem KP, Lemos AL, Santos RD, Calderano Filho B, Mothci EP, Larach JOI, Conceicao M, Tavares NP, Santos HG, Gomes JBV, Calderano SB, Goncalves AO, Martorano LG, Santos LCO, Barreto WO, Claessen MEC, Paula JL, Souza JLR, Lima TC, Antonello LL, Lima PC, Oliveira RP, Aglio MLD (2003) Mapa de reconhecimento de baixa intensidade dos solos do estado do Rio de Janeiro. Scale 1:250.000. Boletim de Pesquisa e Desenvolvimento 1, 32. Embrapa Solos, Rio de Janeiro, RJ. [In Portuguese]

Carvalho Junior W, Chagas CS, Fernandes EI, Vieira CE, Schaefer CEG, Bhering SB, Francelino MR (2011) Digital soilscape mapping of tropical hillslope areas by neural networks. Scientia Agricola 68, 691-696. doi: 10.1590/S0103-90162011000600014

Carvalho Junior W, Silva Chagas C, Muselli A, Pinheiro HSK, Pereira NR, Bhering SB (2014) Metodo do hipercubo latino condicionado para a amostragem de solos na presenca de covariaveis ambientais visando o mapeamento digital de solos. Revista Brasileira de Ciencia do Solo 38, 386-396. doi: 10.1590/S0100-06832014000200003

Chagas CS (2006) Mapeamento digital de solos por correlacao ambiental e redes neurais em uma bacia hidrografica de dominio de mar de morros. PhD Thesis, Federal University of Vicosa. [In Portuguese with an English abstract]

Chagas CS, Carvalho Junior W, Bhering SB (2011) Integracao de dados do Quickbird e atributos do terreno no mapeamento digital de solos por redes neurais artificiais. Revista Brasileira de Ciencia do Solo 35, 693-704. [In Portuguese with an English abstract] doi: 10.1590/S0100-06832011000300004

Chagas CS, Vieira CAO, Fernandes Filho EI (2013) Comparison between artificial neural networks and maximum likelihood classification in digital soil mapping. Revista Brasileira de Ciencia do Solo 37, 339-351.

Chagas CS, Carvalho W Jr, Bhering SB, Calderano Filho B (2016) Spatial prediction of soil surface texture in a semiarid region using random forest and multiple linear regressions. Catena 139, 232-240.

Congalton RG, Green K (1999) 'Assessing the accuracy of remotely sensed data: principles and practices.' (Lewis Publishers: New York)

Crivelenti RC, Coelho RM, Adami SF, Oliveira SRM (2009) Mineracao de dados para a inferencia de relacoes solo-paisagem em mapeamentos digitais de solo. Pesquisa Agropecuaria Brasileira 44, 1707-1715. [In Portuguese with an English abstract] doi: 10.1590/S0100-204X2009001200021

Dantas JRC, Almeida JR, Lins GA (2008) Impactos ambientais na bacia hidrografica de Guapi Macacu e suas consequencias para o abastecimento de agua nos municipios do leste da Baia de Guanabara. Serie Gestao e Planejamento Ambiental, 10, Colecao Artigos Tecnicos no. 7, Centro de Tecnologia Mineral do Ministerio da Ciencia e Tecnologia (CETEM/MCT), Rio de Janeiro, Brazil.

Departamento De Recursos Minerais (DRM) (1979) Projeto Carta Geologica do Estado do Rio de Janeiro. Petropolis: folha SF-23-Z-B-IV-2. Rio de Janeiro. Escala 1:50.000. UFRJ, Rio de Janeiro.

DRM (1980a) Projeto Carta Geologica do Estado do Rio de Janeiro. Teresopolis: folha SF-23-Z-B-II-3. Rio de Janeiro. Escala 1:50.000. DRM-RJ/GEOSOL, 1v, Belo Horizonte.

DRM (1980b) Projeto Carta Geologica do Estado do Rio de Janeiro. Nova Friburgo: folha SF-23-Z-B-II-4. Rio de Janeiro. Escala 1:50.000. DRM-RJ/GEOSOL, 1v, Belo Horizonte.

DRM (1980c) Projeto Carta Geologica do Estado do Rio de Janeiro. Rio Bonito: folha SF-23-Z-B-IV-1. Rio de Janeiro. Escala 1:50.000. DRM-RJ, Niteroi.

DRM (1981a) Projeto Carta Geologica do Estado do Rio de Janeiro. Itaborai: folha SF-23-Z-B-V-1. Rio de Janeiro. Escala 1:50.000. GEOMITEC DRM/RJ, Niteroi.

DRM (1981b) Projeto Carta Geologica do Estado do Rio de Janeiro. Cava: folha SF-23-Z-B-IV-1. Rio de Janeiro. Escala 1:50.000. DRM-RJ/GEOSOL, 2v, Belo Horizonte.

DRM (1984) Projeto Carta Geologica do Estado do Rio de Janeiro. Itaipava: folha SF-23-Z-B-I-4. Rio de Janeiro. Escala 1:50.000. UFRJ, Rio de Janeiro.

Ehsani AH, Quiel F (2008) Geomorphometric feature analysis using morphometric parameterization and artificial neural networks. Geomorphology 99, 1-12. doi: 10.1016/j.geomorph.2007.10.002

Environmental Systems Research Institute (ESRI) (2010) 'ArcGIS and ArcINFO v.10.' [CD-ROM] (ESRI: Redlands, CA)

Figueiredo MA, Varajao AFDC, Fabris JD, Loutfi IS, Carvalho AP (2004) Alteracao superficial e pedogeomorfologia no sul do Complexo Bacao-Quadrilatero Ferrifero (MG). Revista Brasileira de Ciencia do Solo 28, 713-729.

Gallant JC, Austin JM (2015) Derivation of terrain covariates for digital soil mapping in Australia. Soil Research 53, 895-906.

Giasson E, Sarmento EC, Weber E, Flores CA, Hasenack H (2011) Decision trees for digital soil mapping on subtropical basaltic steeplands. Scientia Agricola 68, 167-174. doi: 10.1590/S0103-90162011000200006

Giasson E, Hartemink AE, Tornquist CG, Teske R, Bagatini T (2013) Evaluation of five algorithms of decision trees and three digital elevation models for digital soil mapping at semidetail level at the Lageado Grande watershed, RS, Brazil. Ciencia Rural 43, 1967-1973. doi: 10.1590/S0103-84782013001100008

Godinho Silva SH, Owens PR, Menezes MD, Santos R, Junior W, Curi N (2014) A technique for low cost soil mapping and validation using expert knowledge on a watershed in Minas Gerais, Brazil. Soil Science Society of America Journal 78, 1310-1319. doi: 10.2136/sssaj2013.09.0382

GRASS Development Team (2013) Geographic Resources Analysis Support System (GRASS v.7.0.3) GIS. Available at http://grass.osgeo.org/home/copyright [accessed 13 May 2014].

Grimm R, Behrens T, Marker M, Elsenbeer H (2008) Soil organic carbon concentrations and stocks on Barro Colorado Island--digital soil mapping using random forests analysis. Geoderma 146, 102-113. doi: 10.1016/j.geoderma.2008.05.008

Han J, Kamber M (2001) 'Data mining: concepts and techniques.' (Morgan Kaufmann: San Francisco, CA)

Hengl TE, Heuvelink GBM (2004) New challenges for predictive soil mapping. In 'Global Workshop on Digital Soil Mapping', 14-17 September 2004, Montpellier, France, pp. 1-9. (AGRO-M/INRA: Montpellier, France). Available at Heuvelink_DSM2004.pdf [accessed 15 June 2014].

Hutchinson MF (1993) Development of continent-wide DEM with applications to terrain and climate analysis. In 'Environmental modeling with GIS'. (Ed. MF Goodchild) pp. 392-399. (Oxford University Press: New York, NY)

Instituto Brasileiro de Geografia e Estatistica (IBGE) (1974) Escala 1:50.000. Carta topografica. Instituto Brasileiro de Geografia e Estatistica, Diretoria de Geociencias, Departamento de Cartografia. Nova Friburgo, folha SF-23-Z-B-II-4, Rio de Janeiro. Available at [accessed 20 April 2011].

IBGE (1979a) Dados digitais da carta topografica na escala 1:50.000. Instituto Brasileiro de Geografia e Estatistica, Diretoria de Geociencias, Departamento de Cartografia, Itaipava. Available at home/#sub_download [accessed 20 April 2011].

IBGE (1979b) Escala 1:50.000. Carta topografica. Instituto Brasileiro de Geografia e Estatistica, Diretoria de Geociencias, Departamento de Cartografia. Itaborai, folha SF-23-Z-B-V-1. 2. 135a, Rio de Janeiro. Available at [accessed 20 April 2011].

IBGE (1979c) Escala 1:50.000. Carta topografica. Instituto Brasileiro de Geografia e Estatistica, Diretoria de Geociencias, Departamento de Cartografia. Petropolis, folha SF-23-Z-B-IV-2. 2.135a, Rio de Janeiro. Available at [accessed 20 April 2011].

IBGE (1983) Escala 1:50.000. Carta topografica. Instituto Brasileiro de Geografia e Estatistica, Diretoria de Geociencias, Departamento de Cartografia. Teresopolis, folha SF-23-Z-B-II-3 MI-2716-3. 2. 135a, Rio de Janeiro. Available at [accessed 20 April 2011].

IBGE (2008) Modelo de Elevacao Projeto RJ-25. Metadados. Rio de Janeiro. Instituto Brasileiro de Geografia e Estatistica, Diretoria de Geociencias, Departamento de Cartografia, Pontado Fomo, Folha SF-24-Y-A-IV-3NE 2748-3-NE. Available at [accessed 20 April 2011].

IBGE (2013) 'Manual Tecnico de Pedologia.' 3rd edn. (Diretoria de Geociencias, Coordenacao de Recursos Naturais e Estudos Ambientais, IBGE).

International Union of Soil Sciences (IUSS) Working Group (2014) World reference base for soil resources. World Soil Resources Reports No. 106, FAO, Rome, Italy.

Iwahashi J, Pike RJ (2007) Automated classifications of topography from DEMs by an unsupervised nested-means algorithm and a three-part geometric signature. Geomorphology 86, 409-440. doi: 10.1016/j.geomorph.2006.09.012

Jasiewicz J, Stepinski TF (2013) Geomorphons: a pattern recognition approach to classification and mapping of landforms. Geomorphology 182, 147-156. doi: 10.1016/j.geomorph.2012.11.005

Lagacherie P, McBratney AB (2007) Spatial soil information systems and spatial soil inference systems: perspectives for digital soil mapping. In 'Digital soil mapping: an introductory perspective'. Developments in Soil Science 31. (Eds P Lagacherie, AB McBratney, M Voltz) pp. 389-399. (Elsevier: Amsterdam, The Netherlands)

Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33, 159-174. doi: 10.2307/2529310

Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2, 18-22.

Lorenzetti R, Barbetti R, Fantappie M, L'abate G, Costantini EA (2015) Comparing data mining and deterministic pedology to assess the frequency of WRB reference soil groups in the legend of small scale maps. Geoderma 237-238, 237-245. doi: 10.1016/j.geoderma.2014.09.006

McBratney AB, Mendonca-Santos ML, Minasny B (2003) On digital soil mapping. Geoderma 117, 3-52. doi: 10.1016/S0016-7061(03)00223-4

McKenzie NJ, Ryan PJ (1999) Spatial prediction of soil properties using environmental correlation. Geoderma 89, 67-94. doi: 10.1016/S0016-7061(98)00137-2

Menezes MD, Silva SHG, Owens PR, Curi N (2013) Digital soil mapping approach based on fuzzy logic and field expert knowledge. Ciencia e Agrotecnologia 37, 287-298. doi: 10.1590/S1413-70542013000400001

Minasny B, McBratney AB (2006) A conditioned Latin hypercube method for sampling in the presence of ancillary information. Computers & Geosciences 32, 1378-1388. doi: 10.1016/j.cageo.2005.12.009

Minasny B, McBratney AB, Santos ML, Santos HG (2003) Revisao sobre funcoes de pedotransferencia (PTFs) e novos metodos de predicao de classes de solos e atributos do solo. Documentos no. 45, Embrapa Solos, Rio de Janeiro, Brazil. [In Portuguese with an English abstract]

Monserud RA, Leemans R (1992) Comparing global vegetation maps with the kappa statistic. Ecological Modelling 62, 275-293. doi: 10.1016/0304-3800(92)90003-W

Moonjun R, Farshad A, Shrestha DP, Vaiphasa C (2010) Artificial neural network and decision tree in predictive soil mapping of Hoi Num Rin sub-watershed. In 'Digital soil mapping. Bridging research, environmental application, and operation'. (Eds JL Boettinger, DW Howell, AC Moore, AE Hartemink, S Kienast-Brown) pp. 151-163. (Springer: Berlin)

Moore ID, Grayson RB, Ladson AR (1991) Digital terrain modelling: a review of hydrological, geomorphological, and biological applications. Hydrological Processes 5, 3-30.

Moran C, Bui E (2002) Spatial data mining for enhanced soil map modelling. International Journal of Geographical Information Science 16, 533-549. doi: 10.1080/13658810210138715

Odeh IOA, Crawford M, McBratney AB (2007) Digital mapping of soil attributes for regional and catchment modelling, using ancillary covariates, statistical and geostatistical techniques. In 'Digital soil mapping: an introductory perspective'. Developments in Soil Science 31. (Eds P Lagacherie, AB McBratney, M Voltz) pp. 437-453. (Elsevier: Amsterdam, The Netherlands)

Oliveira JB, Moniz AC (1975) Levantamento pedologico detalhado da estacao experimental de Ribeirao Preto, SP. Bragantia 34, 1-55.

Pinheiro HSK (2012) Digital soil mapping by artificial neural network in Guapi-Macacu watershed, RJ. MSc Thesis, Federal Rural University of Rio de Janeiro. [In Portuguese with an English abstract]

Pinheiro HSK, Owens PR, Chagas CS, Junior WC, Anjos LHC (2016) Applying artificial neural networks utilizing geomorphons to predict soil classes in a Brazilian Watershed. In 'Digital Soil mapping across paradigms, scales and boundaries'. (Eds G Zhang, D Brus, F Liu, X Song, P Lagacherie) pp. 89-102. (Springer: Singapore)

R Development Core Team (2013) R: a language and environment for statistical computing. (R Foundation for Statistical Computing: Vienna, Austria). Available at [accessed 15 June 2014].

Rad MRP, Toomanian N, Khormali F, Brungard CW, Komaki CB, Bogaert P (2014) Updating soil survey maps using random forest and conditioned Latin hypercube sampling in the loess derived soils of northern Iran. Geoderma 232, 97-106.

Roudier P, Beaudette DE, Hewitt AE (2012) A conditioned Latin hypercube sampling algorithm incorporating operational constraints. In 'Digital soil assessments and beyond: proceedings of the 5th Global Workshop on Digital Soil Mapping', 10-13 April 2012, Sydney, NSW, Australia. (Eds B Minasny, BP Malone, AB McBratney) pp. 227-231. (CRC Press)

Sabins FF (1997) 'Remote sensing principles and interpretation.' 3rd edn. (W. H. Freeman and Co.: New York, NY)

Sabins FF (1999) Remote sensing for mineral exploration. Ore Geology Reviews 14, 157-183. doi: 10.1016/S0169-1368(99)00007-4

Santos HG, Jacomine PKT, Anjos LHC, Oliveira VA, Lumbreras JF, Coelho MR, Almeida JA, Cunha TJF, Oliveira JB (2013) 'Sistema Brasileiro de Classificacao de Solos.' 3rd edn. (Embrapa Solos: Rio de Janeiro, Brazil)

Silva LD, Cunha H (2001) 'Geologia do Estado do Rio de Janeiro: texto explicativo do mapa geologico do Estado do Rio de Janeiro.' (Companhia de Pesquisa de Recursos Minerais (CPRM): Brasilia, Brazil) [In Portuguese]

Stum AK, Boettinger JL, White MA, Ramsey RD (2010) Random forests applied as a soil spatial predictive model in arid Utah. In 'Digital soil mapping. Bridging research, environmental application, and operation'. (Eds JL Boettinger, DW Howell, AC Moore, AE Hartemink, S Kienast-Brown) pp. 179-190. (Springer: Berlin)

Taghizadeh-Mehrjardi R, Toomanian N, Khavaninzadeh AR, Jafari A, Triantafilis J (2016) Predicting and mapping of soil particle-size fractions with adaptive neuro-fuzzy inference and ant colony optimization in central Iran. European Journal of Soil Science 67, 707-725. doi: 10.1111/ejss.12382

ten Caten A (2011) Mapeamento digital de solos: metodologias para atender a demanda por informacao espacial em solos. PhD Thesis, Federal University of Santa Maria, Brazil. [In Portuguese with an English abstract]

Varajao CAC, Salgado AAR, Varajao AFDC, Braucher CF, Nalini Junior HA (2009) Estudo da evolucao da paisagem do quadrilatero ferrifero (Minas Gerais, Brasil) por meio da mensuracao das taxas de erosao (10Be) e da pedogenese. Revista Brasileira de Ciencia do Solo 33, 1409-1425.

Vaz de Melo L (2009) Uso de redes neurais artificiais no mapeamento de solos na Bacia do Rio Turvo Sujo--Vicosa MG. MSc Thesis, Federal University of Vicosa. [In Portuguese with an English abstract]

Villela SM, Mattos A (1975) 'Hidrologia aplicada.' (McGraw-Hill do Brasil: Sao Paulo, Brazil)

Yang WL, Yang JW (1997) An assessment of AVHRR/NDVI-ecoclimatological relations in Nebraska, USA. International Journal of Remote Sensing 18, 2161-2180. doi: 10.1080/014311697217819

Zhu AX (2001) Soil mapping using GIS, expert knowledge, and fuzzy logic. Soil Science Society of America Journal 65, 1463-1472. doi: 10.2136/sssaj2001.6551463x

H. S. K. Pinheiro (A,D), P. R. Owens (B), L. H. C. Anjos (A), W. Carvalho Junior (C), and C. S. Chagas (C)

(A) Agronomy Institute--Soil Department, Federal Rural University of Rio de Janeiro, Rodovia BR 465, Km 7, Campus Universitario, Zona Rural, 23897-000 Seropedica, RJ, Brazil.

(B) USDA Dale Bumpers Small Farms Research Center, 6883 S State Highway 23, Booneville, AR 72927, USA.

(C) Embrapa Solos (National Center of Soil Research), R. Jardim Botanico 1024, Rio de Janeiro, RJ, Brazil.

(D) Corresponding author. Email:

Caption: Fig. 1. Covariates derived from a digital elevation model (DEM) and remote sensing data relating to each soil class. ([]), s.d.; ([??]), mean. ACce, Haplic Acrisols (Clayic); ACcr, Haplic Acrisols (Chromic); CM, Haplic Cambisols; GL, Haplic Gleysols; GLszn, Endosalic Gleysols; FRxa, Haplic Ferralsols (Xanthic); FRdy, Haplic Ferralsols (Dystric); FL, Fluvisols; RG, Regosols.

Caption: Fig. 2. Soil maps generated by the predictive (a) random forest model and (b) decision tree model. ACce, Haplic Acrisols (Clayic); ACcr, Haplic Acrisols (Chromic); CM, Haplic Cambisols; GL, Haplic Gleysols; GLszn, Endosalic Gleysols; FRxa, Haplic Ferralsols (Xanthic); FRdy, Haplic Ferralsols (Dystric); FL, Fluvisols; RG, Regosols.

Table 1. Description of soil units, according to the World Reference
Base for Soil Resources (IUSS Working Group 2014), and respective
landscape characteristics

Soil     Description                 Landscape characteristics

ACce     Haplic Acrisols             <100 m, undulating to hilly areas,
         (Clayic) + Haplic Planosol  under crops and native vegetation

ACcr     Haplic Acrisols (Chromic)   >150 m, hilly areas, native
                                     vegetation

CM       Haplic Cambisols            Wide variation, but usually steep
                                     slopes, native vegetation

GL       Haplic Gleysols (Umbric     <20 m, nearly level to rolling,
         Gleysols)                   flat, pasture

GLszn    Endosalic Gleysols +        <10 m, nearly level to rolling,
         Thionic Gleysols            flat, native vegetation

FRxa     Haplic Ferralsols           >50 m, undulating to hilly areas,
         (Xanthic)                   pasture

FRdy     Haplic Ferralsols           >50 m, hilly areas,
         (Dystric) + Haplic          native vegetation
         Ferralsols (Xanthic)

FL       Fluvisols (FL)              <50 m, nearly level to rolling,
                                     flat, crops

RG       Regosols (RG) + Rocky       >300 m, very steep areas on
         Outcrop                     watershed divisors, native
                                     vegetation

Soil     Landforms               Parental material

ACce     Footslope and slope     Granite/gneiss, sedimentary
                                 rocks, Quaternary sediments

ACcr     Slope, footslope,       Alkaline rocks

CM       Slope, spur, shoulder   Granite/gneiss

GL       Valley, flat            Quaternary sediments

GLszn    Valley, flat            Quaternary sediments

FRxa     Spur, shoulder          Granite/gneiss, sedimentary
                                 rocks

FRdy     Slope, spur,            Granite/gneiss
         shoulder, summit

FL       Valley, flat            Quaternary sediments

RG       Slope, shoulder,        Granite/gneiss
         ridge, spur

Table 2. Soil-forming factors, terrain covariates and
pedogenetic importance

NDVI, normalised difference vegetation index; DEM,
digital elevation model

Soil-forming    Terrain covariates          Pedogenetic importance
factor

Organisms       NDVI                        Vegetation, water and
                                            organic contents

Relief          DEM, slope, curvature,      Weathering, microclimate
                compound topographic        characteristics,
                index, Euclidean            moisture and other
                distance from stream        soil properties
                networks, landform map

Parental        Geology map, clay           Mineralogy, erosion
material        minerals index, iron        susceptibility, soil
and time        oxide index                 fertility

Adapted from McKenzie and Ryan (1999) and Chagas (2006).

Table 3. Confusion matrix from random forest and decision tree models

RF, random forest; DT, decision tree; ACce, Haplic Acrisols (Clayic);
ACcr, Haplic Acrisols (Chromic); CM, Haplic Cambisols; GL, Haplic
Gleysols; GLszn, Endosalic Gleysols; FRxa, Haplic Ferralsols (Xanthic);
FRdy, Haplic Ferralsols (Dystric); FL, Fluvisols; RG, Regosols

            CM            GLszn       GL

         RF    DT     RF     DT    RF    DT

CM       468   393     0      0     0     0
GLszn     0     0     499    499    1     0
GL        0     0      0      0    481   383
FRxa      0     0      0      0     0     0
FRdy      2    23      0      0     0     0
ACce      1     0      0      0    13    72
ACcr      0     0      0      0     0     0
RG       10    58      0      0     0     0
FL        0     0      0      0     1     0

            FRxa          FRdy        ACce

         RF    DT     RF     DT    RF    DT

CM        6    10     13     60     0     0
GLszn     0     0      0      0     0     1
GL        0     0      0      0     4    41
FRxa     466   393    31     107    2     0
FRdy      3     0     495    477    0     0
ACce     19    57      4     40    454   278
ACcr      0     0      0      0     0     2
RG        0     0      0      0     0     0
FL        2    23      0      0     2     1

            ACcr          RG          FL

         RF    DT     RF     DT    RF    DT

CM        0     0     13     37     0     0
GLszn     0     0      0      0     0     0
GL        0     0      0      0    15    76
FRxa      0     0      0      0     1     0
FRdy      0     0      0      0     0     0
ACce      0     0      0      0     9    53
ACcr     500   498     0      0     0     0
RG        0     0     490    442    0     0
FL        0     0      0      0    495   476
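
The counts in Table 3 can be checked with a short script. The Python sketch below transcribes the RF and DT counts (rows and columns both in the order CM, GLszn, GL, FRxa, FRdy, ACce, ACcr, RG, FL) and reports the share of each row that falls on the diagonal. It assumes that the diagonal holds the agreements. The table does not state whether rows are reference or predicted classes, so the per-row values are labelled neutrally as row agreement. The function names are illustrative only.

```python
CLASSES = ["CM", "GLszn", "GL", "FRxa", "FRdy", "ACce", "ACcr", "RG", "FL"]

# Counts transcribed from Table 3 (random forest model).
RF = [
    [468, 0, 0, 6, 13, 0, 0, 13, 0],
    [0, 499, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 481, 0, 0, 4, 0, 0, 15],
    [0, 0, 0, 466, 31, 2, 0, 0, 1],
    [2, 0, 0, 3, 495, 0, 0, 0, 0],
    [1, 0, 13, 19, 4, 454, 0, 0, 9],
    [0, 0, 0, 0, 0, 0, 500, 0, 0],
    [10, 0, 0, 0, 0, 0, 0, 490, 0],
    [0, 0, 1, 2, 0, 2, 0, 0, 495],
]

# Counts transcribed from Table 3 (decision tree model).
DT = [
    [393, 0, 0, 10, 60, 0, 0, 37, 0],
    [0, 499, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 383, 0, 0, 41, 0, 0, 76],
    [0, 0, 0, 393, 107, 0, 0, 0, 0],
    [23, 0, 0, 0, 477, 0, 0, 0, 0],
    [0, 0, 72, 57, 40, 278, 0, 0, 53],
    [0, 0, 0, 0, 0, 2, 498, 0, 0],
    [58, 0, 0, 0, 0, 0, 0, 442, 0],
    [0, 0, 0, 23, 0, 1, 0, 0, 476],
]


def row_agreement(matrix):
    """Diagonal count divided by the row total, for each class."""
    return {name: matrix[i][i] / sum(matrix[i]) for i, name in enumerate(CLASSES)}


def overall_agreement(matrix):
    """Sum of the diagonal divided by the total count."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


for label, matrix in (("RF", RF), ("DT", DT)):
    per_class = row_agreement(matrix)
    weakest = min(per_class, key=per_class.get)
    print(f"{label}: overall agreement {overall_agreement(matrix):.4f}, "
          f"lowest row agreement {weakest} ({per_class[weakest]:.3f})")
```

With the counts as transcribed, every row sums to 500 for both models, and ACce has the lowest row agreement in each matrix.
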
COPYRIGHT 2017 CSIRO Publishing
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2017 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Pinheiro, H.S.K.; Owens, P.R.; Anjos, L.H.C.; Carvalho, W., Jr.; Chagas, C.S.
Publication:Soil Research
Article Type:Report
Geographic Code:1CANA
Date:Nov 1, 2017