Global Soil Wetness Project

4. Validation Results

The approaches to validation can be grouped into an assessment of the quality of the input data and comparisons at increasing spatial scales, followed by comparison with other model estimates. The discussion in this section follows that structure.

4.1 Quality of input data

The quality of the input data will to a large extent determine the quality of the final product. The input data from the ISLSCP CD-ROM have been independently assessed by a review group (Kerr et al. 1995); details may be found on the CD-ROM and in Sellers et al. (1996). The soil data used in the GSWP, however, present new problems in their application, as the soil hydraulic properties generally vary very non-linearly with soil moisture and the parameters show great spatial variability. To identify potential errors in the input data, the 1°×1° gridded parameter fields were compared with parameter maps derived from high-resolution soil maps over Europe (Dolman et al., 1997).

Figure 9 Saturated hydraulic conductivity of the soil derived from the ISLSCP CD-ROM (top) and the EU soil map (bottom). Units are m s-1.

The FAO soil map was used to create the 1°×1° data sets of soil hydraulic parameters. These parameters describe the relation between soil moisture and soil suction (the soil moisture retention curve) and between soil moisture and hydraulic conductivity. The precise form depends on the soil type and the type of equation chosen; the ISLSCP data set uses the forms of Clapp and Hornberger (1978). Since the parameters depend on soil type, more precisely on clay, sand and organic matter content, they show considerable sub-grid variability. The aim of this part of the GSWP was to assess qualitatively the potential error involved in using the ISLSCP data compared with aggregated descriptions based on high-resolution soil maps.
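For reference, the Clapp and Hornberger (1978) forms express both relations as power laws in the degree of saturation W = θ/θs: suction ψ = ψs W^(-b) and conductivity K = Ks W^(2b+3). The sketch below evaluates them; the function name and the parameter values in the example are illustrative assumptions, not values taken from the ISLSCP CD-ROM.

```python
def clapp_hornberger(theta, theta_s, psi_s, k_s, b):
    """Soil suction and hydraulic conductivity from the Clapp and Hornberger (1978) power laws.

    theta   : volumetric soil moisture (m3 m-3)
    theta_s : volumetric soil moisture at saturation (m3 m-3)
    psi_s   : suction at saturation, as a positive magnitude (m)
    k_s     : saturated hydraulic conductivity (m s-1)
    b       : empirical shape exponent (dimensionless)
    """
    if not 0.0 < theta <= theta_s:
        raise ValueError("theta must lie in (0, theta_s]")
    w = theta / theta_s               # degree of saturation
    psi = psi_s * w ** (-b)           # suction rises steeply as the soil dries
    k = k_s * w ** (2.0 * b + 3.0)    # conductivity falls even faster
    return psi, k


# Hypothetical parameter values for illustration only.
psi_wet, k_wet = clapp_hornberger(0.45, 0.45, 0.2, 5e-6, 5.0)
psi_half, k_half = clapp_hornberger(0.225, 0.45, 0.2, 5e-6, 5.0)
```

With b = 5, halving the moisture content from saturation raises the suction 32-fold and lowers the conductivity by a factor of 2^13 = 8192, which illustrates why errors in these parameters matter so much.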

Aggregation of soil hydraulic parameters is possible within certain limits (Kabat et al. 1997). In general, aggregation tends to conserve more of the spatial variability than taking the dominant class (Noilhan and Lacarrère 1992). This issue is contentious. Kabat et al. (1997) suggested that soil hydraulic parameters can be aggregated in a manner similar to that proposed for vegetation parameters (Dolman and Blyth 1996). This implies that the parameters are weighted by their fractional coverage in the domain and averaged according to the structure of the equations in which they appear, i.e. logarithmically, reciprocally, linearly, etc. Boone and Wetzel (1998) illustrate the potential impact of linear versus non-linear averaging of soil properties on the simulation of the surface water balance.
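A minimal sketch of this kind of aggregation, assuming each soil class in a grid box has a known fractional cover: parameters entering equations linearly are averaged arithmetically, those entering logarithmically geometrically, and those entering reciprocally harmonically. Which mean suits which parameter is a modelling choice; the function names and the three-class example are hypothetical.

```python
import math


def weighted_mean(values, fractions, kind="arithmetic"):
    """Aggregate sub-grid parameter values weighted by fractional coverage.

    kind = "arithmetic" for parameters entering equations linearly,
           "geometric"  for parameters entering logarithmically,
           "harmonic"   for parameters entering reciprocally.
    """
    total = sum(fractions)
    if total <= 0 or len(values) != len(fractions):
        raise ValueError("need matching values and positive total coverage")
    weights = [f / total for f in fractions]  # normalise coverage to 1
    if kind == "arithmetic":
        return sum(w * v for w, v in zip(weights, values))
    if kind == "geometric":
        return math.exp(sum(w * math.log(v) for w, v in zip(weights, values)))
    if kind == "harmonic":
        return 1.0 / sum(w / v for w, v in zip(weights, values))
    raise ValueError(f"unknown kind: {kind}")


def dominant_class(values, fractions):
    """Alternative approach: keep only the value of the class with the largest cover."""
    return max(zip(fractions, values))[1]


ksat = [1e-4, 1e-6, 1e-5]   # hypothetical Ksat of three soil classes (m s-1)
cover = [0.5, 0.3, 0.2]     # fractional coverage of each class
```

For these values the dominant class gives 1e-4 m s-1, whereas the arithmetic, geometric and harmonic means give about 5.2e-5, 1.6e-5 and 3.1e-6 m s-1, a spread of more than an order of magnitude.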

Figure 9 compares the parameter Ksat in the ISLSCP CD-ROM field with a field derived from the European Soil map (Lilly 1995). The two fields differ considerably. The effect of these differences on the water balance is the subject of ongoing research.

Figure 10 Winter available soil moisture content calculated from ISLSCP soils data (top) and the maximum available water content derived from the EU soils data set (bottom). Units are mm.

The input data from the EU map can also be used to assess the overall validity of some of the estimates. Dolman et al. (1997) used information on rooting depth and soil type to calculate physical limits of available water for the European domain. Figure 10 compares winter soil moisture (assumed generally to be at its maximum) with the maximum soil water availability according to soil type, substrate and slope. Rooting depth is the prime determinant of available water content, so the high-resolution data effectively provide an upper limit to the available moisture content. The input data could clearly be improved considerably: in high-lying regions with a rocky, sloping substrate, the rooting depth in the ISLSCP CD-ROM data appears to be too deep. This is visible in the region of the Alps and in Scandinavia.
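To illustrate why rooting depth dominates, the maximum plant-available water of a soil column is often approximated as rooting depth times the difference between volumetric moisture at field capacity and at wilting point. The sketch below assumes this common textbook definition; Dolman et al. (1997) may have used a different formulation (for instance one that also accounts for substrate and slope), and the function name and numbers are hypothetical.

```python
def max_available_water_mm(root_depth_m, theta_fc, theta_wp):
    """Upper limit of plant-available water (mm) for a uniform soil column.

    root_depth_m : rooting depth (m)
    theta_fc     : volumetric moisture at field capacity (m3 m-3)
    theta_wp     : volumetric moisture at wilting point (m3 m-3)
    """
    if root_depth_m < 0 or theta_fc < theta_wp:
        raise ValueError("need non-negative depth and theta_fc >= theta_wp")
    # 1 m of water over 1 m2 corresponds to 1000 mm
    return 1000.0 * root_depth_m * (theta_fc - theta_wp)


deep_soil = max_available_water_mm(1.5, 0.30, 0.15)     # deep rooting zone
shallow_soil = max_available_water_mm(0.3, 0.30, 0.15)  # shallow soil over rock
```

The same soil type holds about 225 mm with a 1.5 m rooting depth but only about 45 mm over rock at 0.3 m, so an overly deep rooting depth translates directly into an overestimate of available water.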

Zhang et al. (1998) found evidence that the prescribed surface air temperatures at grid points in Russia where soil moisture and meteorological observations were available are colder than observed during winter and spring. This may have contributed to errors in the simulation of snow accumulation and the timing of snowmelt in SiB2, and may also have affected the simulated annual cycle of soil moisture at these locations.
 

4.2 Local scale measurements

Off-line validation of LSPs against single-point datasets has been the norm rather than the exception in land surface modeling. The experience with PILPS has shown that considerable differences exist between models. It is not the intention of the GSWP to duplicate the effort of PILPS, but we nevertheless want to be sure that the predicted global fields of soil moisture are realistic and, moreover, realistic for the right physical reasons. Two studies in the GSWP validation framework used data from local point sources, one of which was a classical off-line study. We treat these studies together here, because they address similar questions. We also used point data from a field experiment in the Netherlands to check the range of predictions by the GSWP LSPs.

Figure 11 Time series of 6-hourly average observations from FIFE (top panel of each section), the JMA-SiB GSWP simulation at 39.5N, 96.5W (middle panel), and the difference (bottom panel): (a) 2 m temperature; (b) 2 m mixing ratio; (c) downward shortwave radiation; (d) wind speed.

Matsuyama and Nishimura (1999) used data from the First ISLSCP Field Experiment (FIFE; Sellers et al. 1992) to validate components of their JMA-SiB land surface model, which was run with forcing data from the ISLSCP CD-ROM. They first compared this forcing with the FIFE observations and concluded that the CD-ROM precipitation agrees well with the FIFE observations, except for heavy rainfall events. This should come as no surprise, given the way the CD-ROM rainfall was downscaled to 6-hourly estimates. Differences between the CD-ROM data and the observations are also found in temperature, mixing ratio, shortwave radiation and 5 m wind speed; in particular, downwelling shortwave radiation is underestimated, as was also noted in the CD-ROM review (Kerr et al., 1995). Figure 11 shows a composite plot of the differences.

These differences can be expected to carry over into the model predictions. Matsuyama and Nishimura (1999) conclude that their model dries out too quickly. They argue that the soil hydraulic parameters used in GSWP cannot be responsible for this, and suggest instead that root water uptake is too strong because the meteorological conditions in the CD-ROM data are drier than the actual conditions.

They further conclude that using CD-ROM forcing data together with in situ observations of soil moisture is questionable: soil moisture reacts strongly to precipitation input, so even small differences between observed and CD-ROM precipitation can lead to appreciable differences between modeled and observed soil moisture.

Entin et al. (1999) used four observed data sets, from Russia, Mongolia, Illinois and China, to compare plant-available soil moisture with the GSWP results. These data have previously been used successfully in PILPS and AMIP intercomparisons (Robock et al. 1995, Schlosser et al. 1997). They consist of soil moisture measured over a depth comparable to the rooting depth of agricultural plants, generally sampled manually every two weeks. Details on the data can be found in Entin et al. (1999). Figure 12 shows the geographical location of the stations and the averaging boxes, and Figure 13 shows the plant-available soil moisture for the 10 models used in the GSWP together with the observations. The spread is tremendous. The wetter models are consistently wetter than the observations, by about 100 mm, and the range from the driest to the wettest model is roughly 150 mm. This is not a simple, correctable offset: the Place model, for example, is among the wet models for the agricultural group but among the dry models for the forest group. In general, the so-called SiBling group (SiB-2, SiB and COLA-SSiB), together with Place, gives on average wetter than observed results. The drier group consists of BATS, Bucket, Mosaic, SSiB-H and the ISBA model.
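The quantity compared here can be sketched as the water held above wilting point, summed over the soil layers within the top meter. The layer structure, the wilting-point values and the function name below are assumptions for illustration; Entin et al. (1999) give the exact definitions used for the observations and models.

```python
def plant_available_top_meter(layers, depth_limit_m=1.0):
    """Plant-available soil moisture (mm) in the top `depth_limit_m` of soil.

    layers: list of (thickness_m, theta, theta_wp) from the surface downward,
            with volumetric moisture theta and wilting point theta_wp (m3 m-3).
    A layer straddling the depth limit is counted only down to the limit.
    """
    total_mm, top = 0.0, 0.0
    for thickness, theta, theta_wp in layers:
        if top >= depth_limit_m:
            break
        dz = min(thickness, depth_limit_m - top)               # part inside the limit
        total_mm += 1000.0 * dz * max(theta - theta_wp, 0.0)   # no negative water
        top += thickness
    return total_mm


# Hypothetical three-layer profile: (thickness m, moisture, wilting point)
profile = [(0.1, 0.30, 0.10), (0.4, 0.25, 0.10), (1.0, 0.20, 0.10)]
paw = plant_available_top_meter(profile)
```

Because only the upper 0.5 m of the deepest layer lies within the top meter, the profile above holds about 20 + 60 + 50 = 130 mm of plant-available water.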

Figure 12 Location of the soil moisture observing stations, and the boxes used for spatial averages in Fig. 13.

Entin et al. conclude that "no model did a good job of reproducing the amount of plant-available soil moisture in the top meter of the soil". The differences between the models are generally of similar or larger magnitude than the differences between the observations and any one model; in this, they echo the PILPS results. They could not determine whether the model physics or the input data are at fault.

Chen and Mitchell (1999) compare the results of the LSP used operationally in the Eta regional forecast model of the National Centers for Environmental Prediction (NCEP; Chen et al. 1996, 1997) with soil moisture observations from the Illinois soil moisture network (Hollinger and Isard 1994). The Illinois data come from a series of point stations spaced more closely than the 1°×1° grid of GSWP. They find that the phase and amplitude of the seasonal cycle of soil moisture simulated by their LSP, aggregated over the grid boxes covering Illinois, compare well with observations and fall within the range of variability of the point observations. Comparisons between single grid boxes and the nearest point observations do not match as well, but still appear skillful.
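One way to quantify the phase and amplitude of the seasonal cycle, after averaging the point observations that fall in a grid box, is to fit the first annual harmonic. The sketch below does this for monthly values; it is an illustrative method, not necessarily the one used by Chen and Mitchell (1999), and the function names are hypothetical.

```python
import math


def grid_box_mean(point_series):
    """Average several equal-length point time series into one grid-box series."""
    n = len(point_series)
    return [sum(values) / n for values in zip(*point_series)]


def annual_harmonic(monthly):
    """Amplitude and peak month (0 = first month) of the first annual harmonic.

    `monthly` must hold a whole number of years of monthly values.
    """
    n = len(monthly)
    if n == 0 or n % 12:
        raise ValueError("need a whole number of years of monthly data")
    omega = 2.0 * math.pi / 12.0
    a = 2.0 / n * sum(x * math.cos(omega * i) for i, x in enumerate(monthly))
    b = 2.0 / n * sum(x * math.sin(omega * i) for i, x in enumerate(monthly))
    amplitude = math.hypot(a, b)
    peak_month = (math.atan2(b, a) / omega) % 12.0
    return amplitude, peak_month


# Hypothetical two-year series: mean 300 mm, amplitude 50 mm, peak in month 3.
synthetic = [300.0 + 50.0 * math.cos(2.0 * math.pi * (i - 3) / 12.0) for i in range(24)]
amp, peak = annual_harmonic(synthetic)
```

Model and observed cycles can then be compared through the ratio of their amplitudes and the difference of their peak months.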

Figure 13 Two-year time series of plant-available soil moisture in the top meter from ten models for the areas in Fig. 12. Observations are also displayed, along with the one standard deviation confidence interval.

4.3 Regional scale measurements

Writing the regional water balance as:

dS = P - E - R

with P precipitation, E evaporation, R runoff and dS the change in water storage, it is possible to estimate P - E by the atmospheric water balance method (Oki et al., 1995). The change in storage can then be derived by comparison with observed runoff of large rivers such as the Amazon (Matsuyama, 1992; Matsuyama and Masuda, 1997) or the Congo (Matsuyama et al., 1994). Note that dS represents the total change in water storage, including snow, groundwater table changes, lakes and water on floodplains. The accuracy of this method restricts its use to large river basins with a relatively dense network of radiosondes. In practice, the analysis products of the large weather centers are often used.
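In its simplest form, the method combines a basin-average P - E with observed river runoff R to give the storage change. The atmospheric P - E itself is commonly written as minus the change in precipitable water W minus the divergence of the vertically integrated water vapour flux Q (P - E = -dW/dt - div Q); here it is taken as given. A minimal sketch in annual totals, with hypothetical function names and numbers:

```python
def runoff_depth_mm(discharge_m3s, basin_area_km2, seconds):
    """Convert mean river discharge (m3 s-1) over `seconds` to a basin depth (mm)."""
    volume_m3 = discharge_m3s * seconds
    return volume_m3 / (basin_area_km2 * 1e6) * 1000.0


def storage_change_mm(p_minus_e_mm, runoff_mm):
    """Storage change dS = (P - E) - R, all as basin-average depths (mm).

    dS includes soil moisture, snow, groundwater, lakes and floodplain water,
    so it cannot be attributed to soil moisture alone.
    """
    return p_minus_e_mm - runoff_mm


year_s = 365 * 86400
r = runoff_depth_mm(1000.0, 100000.0, year_s)   # hypothetical basin and discharge
ds = storage_change_mm(400.0, r)                # hypothetical atmospheric P - E
```

In this example a mean discharge of 1000 m3 s-1 from a 100 000 km2 basin corresponds to about 315 mm of runoff per year, so an atmospheric P - E of 400 mm would imply a storage gain of about 85 mm.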

In the GSWP, Oki et al. (1999) use the runoff of large rivers directly to assess the quality of the GSWP product. This is a logical step when evaluating Land Surface Parameterizations, as these models also produce runoff, by various mechanisms (saturation overland flow, etc.). Oki and Sud (1998) produced a 1°×1° global river channel network (TRIP; Total Runoff Integrating Pathways) that allowed the GSWP output to be compared with global runoff observations.

Figure 14 Mean annual runoff by 11 LSMs (top) and observed annual runoff (bottom) over each drainage area for 1988. Units are mm y-1.

One of the main stumbling blocks in comparing the GSWP with observations is that nature does not conform to a rectangular 1°×1° grid. Hence comparing the runoff estimates produced by the models with real observations of river runoff is complicated, to say the least. Oki and Sud (1998) have made a major improvement in our ability to carry out these comparisons by developing a global river channel network on a 1°×1° grid, which makes it possible to identify the grid boxes contributing to a particular real-world gauging station. To compare the essentially one-dimensional GSWP results with observations, a routing scheme must be applied; Oki et al. (1999) describe this simple scheme in some detail.

Figure 15 Comparison between the density of raingauges used in preparing the forcing precipitation data set, and the mean runoff error among the 11 LSPs.

In total, data from 467 gauging stations were available. This allowed validation of the runoff of 50 river basins, covering roughly 30% of the global land area excluding Antarctica. Figure 14 compares annual runoff from the GSWP results with the observations; in general the agreement is quite good. Negative values can occur in the observed data for sub-basins where evaporation or irrigation losses exceed the runoff generated within the sub-basin, so that less water leaves the sub-basin than enters it from upstream.
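The bookkeeping behind these basin comparisons can be sketched on a toy flow-direction network: each grid box points to its downstream neighbour, the boxes draining to a gauge are found by following these pointers, and sub-basin runoff is the gauged outflow minus the gauged inflow, divided by the sub-basin area. The network, the dictionary representation and the function names are hypothetical illustrations, not the TRIP data format.

```python
def contributing_cells(downstream, gauge):
    """Return the set of grid cells whose flow path passes through `gauge`.

    downstream: dict mapping each cell id to its downstream cell id
                (None at a river mouth or inland sink).
    """
    cells = set()
    for start in downstream:
        node, visited = start, set()
        while node is not None and node not in visited:  # stop on loops
            if node == gauge:
                cells.add(start)
                break
            visited.add(node)
            node = downstream.get(node)
    return cells


def basin_mean_runoff_mm(cells, runoff_mm, area_km2):
    """Area-weighted mean runoff depth (mm) over a set of grid cells."""
    total_area = sum(area_km2[c] for c in cells)
    return sum(runoff_mm[c] * area_km2[c] for c in cells) / total_area


def sub_basin_runoff_mm(q_out_m3s, q_in_m3s, area_km2, seconds):
    """Runoff depth (mm) generated between upstream gauges and a downstream gauge.

    Negative when losses inside the sub-basin (evaporation, irrigation)
    exceed the runoff generated there.
    """
    return (q_out_m3s - q_in_m3s) * seconds / (area_km2 * 1e6) * 1000.0


# Hypothetical toy network: A -> B -> C (mouth), and D -> C.
flow = {"A": "B", "B": "C", "C": None, "D": "C"}
runoff = {"A": 200.0, "B": 100.0, "C": 50.0, "D": 300.0}   # mm y-1
area = {"A": 1e4, "B": 1e4, "C": 1e4, "D": 2e4}            # km2
```

A sub-basin whose downstream gauge records 90 m3 s-1 while 100 m3 s-1 enters from upstream yields a negative runoff of about -32 mm y-1 over 10 000 km2, the situation described above.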

A crucial element in this comparison is the assessment of data quality. In Figure 15, the discrepancy between observed and modeled runoff is plotted as a function of rain gauge density. The scatter in the bias of the estimates is large for areas of low gauge density and decreases as the density increases. Unfortunately, areas with a high density of rain gauges are scarce over the globe. The minimum gauge density required for a meaningful comparison between model and data is 30-50 gauges per 10^6 km2. This graph dramatically illustrates the difficulty of comparing global models with observations: only in small parts of the world are enough data available to do this successfully, and in large parts of the world model predictions are our only source of information. Nevertheless, the relative RMS error of the 11 schemes participating in the GSWP is 40% for runoff and 18% for annual evaporation. This is comparable to PILPS results, but it should be borne in mind that the PILPS results refer to local comparisons only. Using the river routing model, monthly runoff was also compared with the observations; routing considerably improved the monthly comparison, and the system shows good promise for application in coupled models.
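For concreteness, the two quantities used in this paragraph can be computed as below. The text does not define how the relative RMS error is normalised; dividing by the mean observed value is an assumption here, and the function names are hypothetical.

```python
import math


def gauge_density(n_gauges, area_km2):
    """Rain-gauge density in gauges per 10^6 km2."""
    return n_gauges / area_km2 * 1e6


def relative_rms_error(model, observed):
    """RMS of model - observation differences divided by the mean observation.

    Assumed definition; other normalisations (e.g. by the observed
    standard deviation) are also in use.
    """
    if len(model) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    n = len(observed)
    rms = math.sqrt(sum((m - o) ** 2 for m, o in zip(model, observed)) / n)
    return rms / (sum(observed) / n)


density = gauge_density(40, 2e6)                       # hypothetical basin
rel_err = relative_rms_error([110.0, 90.0], [100.0, 100.0])
```

A basin of 2 × 10^6 km2 served by 40 gauges has a density of 20 gauges per 10^6 km2, below the 30-50 quoted above.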

Zhang et al. (1998) also examined basin-scale runoff as a means of validating the performance of their SiB2 model over large scales. Using the routing formulation of Miller et al. (1994) with the TRIP routing map, they examined on which spatial and temporal scales their simulation of the surface water balance compares well with observations. They find that the SiB2 discharge errors differ in sign between large and small river basins, with a net bias toward underestimation of about 10%.
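Routing schemes of this family typically treat the river water in each grid box as a linear reservoir whose outflow is proportional to storage, with the rate set by an effective flow velocity and a channel length. The sketch below applies that idea along a single chain of grid boxes; the velocity, lengths and function name are illustrative assumptions, and it is not the exact formulation of Miller et al. (1994) or Oki et al. (1999).

```python
import math


def route_chain(local_runoff, lengths_m, velocity_ms=0.5, dt_s=86400.0):
    """Route runoff through a chain of grid boxes ordered upstream to downstream.

    local_runoff : per time step, a list of runoff volumes (m3) entering each box
    lengths_m    : channel length of each box (m)
    velocity_ms  : effective flow velocity (m s-1); 0.5 is a hypothetical default
    Returns the volume (m3) leaving the last box in each time step.
    """
    n = len(lengths_m)
    storage = [0.0] * n
    # Fraction of storage released per step: exact solution of dS/dt = -(u/L) S,
    # with all inflow for the step added at its start.
    release = [1.0 - math.exp(-velocity_ms * dt_s / length) for length in lengths_m]
    outlet = []
    for step_runoff in local_runoff:
        inflow = 0.0  # volume arriving from the box upstream during this step
        for i in range(n):
            storage[i] += step_runoff[i] + inflow
            inflow = release[i] * storage[i]
            storage[i] -= inflow
        outlet.append(inflow)
    return outlet


# A single hypothetical runoff pulse of 1e6 m3 entering the upstream box.
pulse = [[1e6, 0.0]] + [[0.0, 0.0] for _ in range(199)]
hydrograph = route_chain(pulse, [1e5, 1e5])
```

The pulse leaves the chain spread over many steps with a delayed peak; this kind of delay and attenuation is what makes routed runoff more comparable with monthly gauge records than unrouted grid-box totals.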
 

4.4 Validation against other model estimates

One further option for assessing the usefulness of the GSWP is to compare it against crude but simple equations relating evaporation to the annual water and energy balance constraints. One such well-known equation, derived by Budyko, starts from:

E = min(P, Rn/λ)

which states that the annual evaporation E (the equation holds only for annual sums) is the minimum of the precipitation P available for evaporation and the available net radiation Rn, expressed as an evaporation equivalent through the latent heat of vaporization λ. The apparent problem with this equation is that at any one time a change in soil moisture storage would invalidate it. Nevertheless, at annual, climatological time scales the equation is attractive because it provides a physical bound on the estimate of evaporation. Koster et al. (1998) explore the use of this equation as a yardstick against which to test the performance of the LSPs involved in the GSWP.
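The energy and water limits described above (annual evaporation cannot exceed either precipitation or net radiation expressed as an evaporation depth), together with the smooth curve Budyko proposed to interpolate between the two regimes, can be sketched as follows. The latent heat value of about 2.45 MJ kg-1 is an assumption, and which form Koster et al. (1998) actually applied is not stated in this section.

```python
import math

LAMBDA_MJ_PER_KG = 2.45  # latent heat of vaporization, approximate value near 20 C


def radiation_as_mm(rn_mj_m2):
    """Express annual net radiation (MJ m-2) as an equivalent evaporation depth (mm)."""
    return rn_mj_m2 / LAMBDA_MJ_PER_KG  # 1 kg m-2 of water = 1 mm


def evaporation_limit(p_mm, rn_mj_m2):
    """Upper bound on annual evaporation: the smaller of P and Rn/lambda (mm)."""
    return min(p_mm, radiation_as_mm(rn_mj_m2))


def budyko_evaporation(p_mm, rn_mj_m2):
    """Budyko curve E/P = sqrt(phi * tanh(1/phi) * (1 - exp(-phi))),
    with dryness index phi = Rn / (lambda P)."""
    phi = radiation_as_mm(rn_mj_m2) / p_mm
    return p_mm * math.sqrt(phi * math.tanh(1.0 / phi) * (1.0 - math.exp(-phi)))


# Hypothetical catchment: P = 1000 mm y-1, Rn = 2450 MJ m-2 y-1 (dryness index 1).
e_bound = evaporation_limit(1000.0, 2450.0)
e_curve = budyko_evaporation(1000.0, 2450.0)
```

Annual runoff then follows as P - E, assuming negligible storage change over the year; for this example the smooth curve gives about 694 mm of evaporation, below the 1000 mm bound.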

Off-line comparison of land surface models with local data has been the traditional way of assessing the performance of LSPs. A critical question is how far the use of independent meteorological forcing and the absence of feedbacks with the atmosphere distort the results. Koster et al. (1998) approach this issue by comparing the outputs of the GSWP LSPs with a simple empirical relation between precipitation, available energy and evaporation, developed by Budyko.

Figure 16 Annual runoff for thirteen representative catchments as derived from observations (solid bar), GSWP models (open bars), and the Budyko equation (striped bar). The station name is listed below the catchment name.

Figure 16 compares the annual runoff simulated by some of the GSWP LSPs for 14 river basins of the world with the Budyko estimate and the observations. The annual standard error of prediction of the empirical Budyko equation is 80 mm year-1, which is of the same order as the errors of the various land surface models used in GSWP. The increased complexity of the current generation of LSPs apparently does not automatically lead to better performance or predictability at the annual scale. Alternatively, the performance of both the Budyko equation and the LSPs may be dominated to a large extent by the forcing from the ISLSCP CD-ROM. Arguably, improvement in an LSP can now be measured against this yardstick. Note that at time scales shorter than a year this comparison is not valid, because the model physics then takes over control from the meteorological forcing.

Chen and Mitchell (1998) used the NCEP reanalysis products for 1987 and 1988 to validate the performance of their LSP. They found that, compared with the reanalysis soil moisture, their GSWP run generally has a smaller annual cycle; at middle and high latitudes, however, the NCEP reanalysis has a wetter and less spatially variable distribution of soil moisture throughout the year. These differences exist despite the similarity of the GSWP atmospheric forcing fields to the reanalysis, and the similarity of the land surface models used in both. It should be noted that the NCEP reanalysis procedure includes a damping of soil moisture toward the climatology of Mintz and Serafini described by Mintz and Walker (1993), instituted to prevent the reanalysis model from drifting to a dry regime. This may contribute to the uniformity of soil moisture and the apparent lack of interannual variability in the reanalysis.
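Damping of this kind is commonly implemented as a relaxation (nudging) term that pulls model soil moisture toward a climatological value with a prescribed time scale. The sketch below shows the idea only; the 60-day time scale and the function name are hypothetical, and the actual NCEP reanalysis implementation may differ in form and parameters.

```python
import math


def nudge_toward_climatology(w, w_clim, dt_days, tau_days):
    """Relax soil moisture w toward climatology w_clim over one step of dt_days.

    Uses the exact solution of dw/dt = -(w - w_clim) / tau, so the departure
    from climatology shrinks by a factor exp(-dt/tau) each step.
    """
    return w_clim + (w - w_clim) * math.exp(-dt_days / tau_days)


# Hypothetical wet anomaly of 0.10 relaxed with daily steps for 60 days.
w_after = 0.30
for _ in range(60):
    w_after = nudge_toward_climatology(w_after, 0.20, 1.0, 60.0)
```

After one time scale the anomaly has shrunk to about 37% of its initial size; any anomaly, whether produced by the forcing or by model drift, is progressively erased, which is consistent with the reduced spatial and interannual variability noted above.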