Interactive comment on “ Process-level model evaluation : A Snow and Heat Transfer Metric ”

This is an interesting and concise paper that proposes a compact method to evaluate the capacity of land surface models to represent the effect of snow inflation on the underlying soil. I have no doubt that the proposed metric, with some little changes proposed in the following, will be widely used. The paper is yet another illustration why the first author’s recent passing away is a huge loss for the scientific community.


Introduction
The current generation of land models are typically complex in nature and simulate a vast array of processes (Clark et al., 2015;Prentice et al., 2015).Interdependencies within these models produce external and internal feedbacks that can operate on various temporal and spatial scales.It is therefore imperative that such models be rigorously evaluated in order to interpret their performance, as well as to guide future development.
Verifying a model result against observations using statistics such as root mean square error can provide useful information but alone does not necessarily indicate whether the model is getting the right answers for the right reason (Abramowitz et al., 2008;Gupta et al., 2008).For example, the impact of the forcing data can play an equal or greater role in results than some aspects of the model (Ménard et al., 2015).Most global-scale "observationally based" data sets contain substantial uncertainty, especially in the relatively sparsely observed high latitudes (Anisimov et al., 2007;Decker et al., 2012;Slater, 2016;Mudryk et al., 2015), and coupled Earth system models can produce biased surface meteorology, particularly in high-latitude winters (Slater and Lawrence, 2013).It therefore follows that if one is trying to assess land model structural error (i.e., deficiencies in model architecture and/or parameterizations) it is preferable to reduce ambiguity by minimizing other sources of uncertainty, e.g., uncertainties in model parameters, initial conditions, or forcing (in either land-only or coupled simulations).
In high-latitude regions an important process is snow and soil heat transfer.The temperature and state of the soil, whether frozen or thawed, play a pivotal role in the timing and magnitude of energy, mass, and biogeochemical fluxes between the land and atmosphere (Hobbie et al., 2000;Dutta et al., 2006;Monson et al., 2006;Lawrence et al., 2012Lawrence et al., , 2015)).Concerns about the potential release of vast carbon stores currently locked in permafrost soils and its feedback Published by Copernicus Publications on behalf of the European Geosciences Union.
on the climate have brought this topic to the fore (Schuur et al., 2015).Model estimates of Arctic carbon fluxes are highly variable (Fisher et al., 2014;Koven et al., 2015) in part due to differences in simulation of soil temperatures and permafrost conditions.Essentially, we need to ensure soil temperature and moisture are correctly simulated before confidence in projected biogeochemical fluxes can be achieved.Additionally, biases in the simulated temperature at the snow-soil interface can adversely affect the snowpack itself though the impact of these biases on snow metamorphism at the base of the snowpack.
In cold midlatitude and Arctic regions, snow forms an insulating barrier between the colder atmosphere and underlying soil during winter.The impact of snow on soil temperatures, particularly in permafrost regions, has been well documented in both observational (Sharratt et al., 1992;Zhang, 2005) and modeling studies (Luetschg et al., 2008;Lawrence and Slater, 2010;Ekici et al., 2015;Yi et al., 2015).A variety of structures have been used to represent snow in models (Slater et al., 2001), with varying levels of simulated insulation (Koven et al., 2013).The aim of this work is to define a metric that demonstrates processes-level model evaluation using the heat transfer mechanism from atmosphere to soil under the conditions of a seasonal snowpack.

Method
The theory of conductive heat flow under periodic forcing (e.g., the annual cycle) is demonstrated in many texts (Lunardini, 1981;Yershov, 1998).Taking an example of a semiinfinite medium forced at the surface with a periodic temperature wave with no phase change or mass exchange, we can compute the amplitude of this periodic wave at soil depth (z) as where A z is the amplitude of the temperature wave at depth (z); A o is the surface amplitude; and d is the damping depth, defined as the depth at which the 1/e of the surface amplitude occurs.The value of d is a function of the thermal diffusivity of the medium and the length of time that the forcing is applied.
The above theory is adapted to the cooling period of the year, defined here as October to March.A vast portion of the terrestrial Northern Hemisphere becomes snow covered during October and remains so beyond March.The actual land surface temperature is rarely observed in situ, but 2 m air temperature serves as a sufficient proxy as the two quantities tend to equilibrate towards each other, particularly in colder months of high-latitude regions with low solar input, though in situations with strong inversions the temperature difference between the snow-air interface and 2 m height can be substantial (this is an acknowledged yet unavoidable limitation).Observations (Sect.3) show that the coldest air tem-perature is typically in January or February, while the coldest soil temperature at 20 cm depth usually lags by a month.When data are confined to the annual cooling period, the amplitude of air temperature is taken as (2) The soil temperature amplitude (A soil ) can be calculated annually in the same way.To elucidate the process of heat transfer and remove climatically driven factors, for example the large seasonal cycle associated with deep continental regions as compared to more moderate coastal locations, a normalized temperature amplitude difference is computed as which ranges from 0 to 1.A norm values near 0.0 indicate minimal difference in the annual cycle of air and soil temperatures, while a value close to 1.0 suggests soil temperatures essentially do not change over the cooling period.If we take A soil to be A z and A air as A o , and then substitute Eq. (3) into Eq.( 1), we arrive at the form where Such theory pertains to an idealized case, but in reality the distance z would be affected by the quantity and temporal sequence of snow accumulation, and the damping depth d will be impacted by snow density, soil inhomogeneity, or phase change.Therefore, we propose a similar but more flexible approach: We introduce an offset, P , because, even if there were absolutely no snow, the thermal properties of the soil, along with phase change, and imperfect heat transfer between the surface and near-surface air mean that the amplitude of soil temperature at a depth (e.g., 20 cm) will likely be different to that of the air.A multiplier, Q, is applied to account for the temporal nature of snow accumulation.Our analysis pertains to locations with seasonal, rather than permanent, snow cover, so even if 3 m of snow accumulates by the end of March there is likely to be some cooling in soil temperature from atmospheric forcing in October.The R parameter is an effective damping depth of the snow and soil system.The density and morphology of the snow, along with soil moisture content and phase change, play a role in determining R. The R value also tends to govern the marginal influence of additional snow insulation.To account for seasonal variations in the linear distance between the soil and the atmosphere (i.e., the snow depth z, in Eq. 4), we define an effective snow depth (S depth,eff ).
S depth,eff describes the insulation impact of snow and is an integral value such that the mean snow depth (S) each month (m; 1-6) is weighted by its duration.The maximum duration (M = 6) is the total cooling period of 6 months (October-March).The first snow depth value (S 1 ) is the October mean snow depth.
As shown in Fig. 1, a season with an early snowfall will typically produce a higher effective snow depth than a linearly increasing snowpack with the same mean value.Similarly, if shallow snow persists for most of the winter but a large snowfall occurs in February, the effective snow depth will be lower than in the linear case.
Given inputs of A norm and S depth,eff (from observations or models), we can efficiently compute the values of the three parameters P , Q, and R using a nonlinear fitting method (e.g., the Levenberg-Marquardt (LM) algorithm; Press et al., 2007).The relationship between A norm and S depth,eff informs us about the heat transfer mechanism between the atmosphere and the subsurface.To alleviate the problem of overfitting, the curve is fit using data that are sampled evenly across a range of effective snow depths with up to 35 data points per 5 cm of S depth,eff up to 50 cm (with the final category being 45 cm or beyond).This sampling was performed 100 times, with the median of fit values taken; if the LM algorithm does not converge for a given sample, another sample is taken.Sampling also ensures that a wide variety of climatic regimes are used for characterizing the functional relationship seen in observations or a model.

Data
Assessing the relation between A norm and S depth,eff requires three pieces of data: screen level air temperature, 20 cm soil temperature (as this is the most commonly observed depth), and snow depth.An analysis constraint is the need to use monthly mean values, as this is the most common output from large-scale land models.
A large network of hydrometeorological stations in Russia (and the former Soviet Union) provide the three required variables as follows: air temperature was acquired from the National Climatic Data Center (NCDC) Global Summary of the Day (https://data.noaa.gov/dataset/global-surface-summary-of-the-day-gsod), provided the snow depths were measured at stations (as opposed to local transects), and information on soil temperatures is available in Sherstyukov and Sherstyukov (2015).Data in the USA are from the Natural Resources Conservation Service as part of the SNOTEL and SCAN networks (http://www.wcc.nrcs.usda.gov/).In Canada, Alberta's AgroClimate Information Service (ACIS; http://agriculture.alberta.ca/acis/)collates and distributes station data within the province.
For a given season, we use only sites with complete, quality-controlled records from October to March; 2049 observed site years are available.It is also recognized that the data have inherent limitations; for example it is unknown whether snow depths are measured at precisely the same location as the soil temperature measurements.Many measurements have been made on soils that have been historically altered; however our use of shallow (20 cm) soil temperatures and focus on the winter period aim to minimizes possible artifacts due to disturbance.
Snow heat transfer in 13 models that participated in the CMIP5 experiment (Taylor et al., 2012; http://cmip-pcmdi.llnl.gov/cmip5/) is evaluated using the historical (1850-2005) simulations.Soil columns within the various land models have different depths and layer thicknesses (Slater and Lawrence, 2013; their Fig. 2), so a spline interpolation of each monthly mean temperature profile was used to estimate a 20 cm value.Further details of the land models are available in Koven et al. (2013) and Slater and Lawrence (2013).
All data are restricted to those instances where, during the cooling period (October-March), mean T air is below −1 • C, mean T soil is below 2.5 • C, A air is greater than 10 • C, and S depth,eff is greater than 1 cm and less than 150 cm.Observed locations fitting the above data criteria are shown in Fig. 2. Model data meet the same criteria, but our intention is to test the ubiquitous physics of a process; thus model data are not restricted to the same locations as the observations, and an order of magnitude more (or greater) grid cells from the models are available for sampling.
Locations with observed air and 20 cm soil temperature plus snow depth

Results and discussion
The observations show the expected exponential shape and fit the underlying theory (5) well despite significant scatter in the data (Fig. 3).The scatter likely arises from several sources, including (1) the range of climate conditions and snow regimes (Sturm et al., 1995) that occur across the landscape, including the timing of snowfall and the pattern of snow metamorphism; (2) the properties and moisture content of the soil; and (3) uncertainties in the measurements themselves and the measurement locations of the observed data.It is not possible to distinguish which of these sources of uncertainty dominates the scatter seen in Fig. 3, and it remains possible that snowpack dynamics in regions outside the sampled data would generate slightly different relationships, though the shape of the curve would not likely change.Grey shading in Fig. 3 shows the span of median scatter of all 100 fits to the observations, with the black curve being the median fit value.
The observational results demonstrate that the marginal influence of snow insulation relative to the annual cycle of air temperature diminishes beyond a S depth,eff of about 25 cm.This phenomenon of insulation saturation has been previously noted in observations and models (Zhang, 2005;Lawrence and Slater, 2010).
Results from models are calculated the same as observations, with the median curve for each model shown on the same plot (Fig. 4).The difference amongst model curves is considerable.Several models produce an exponential-like curve (e.g., CCSM4, GISS, MRI, NorESM), suggesting that they generally reflect the character of the observed relationship.Many models, however, do not reproduce the observed relationship.Models such as CanESM, GFDL, Hadley models, MIROC, and MPI are more linear in their form.This group of more poorly performing models all show A norm values of less than 0.45 at S depth,eff of 20 cm (compared to a median observed value of 0.65 at 20 cm depth).A low A norm value is produced when soil temperatures more closely track the changes in air temperature rather than being modulated by snow cover.
Note that the impact of heat transfer from the soil surface to 20 cm depth can be inferred from the A norm values at S depth,eff = 0 cm (A norm typically between 0.05 and 0.30 at S depth,eff = 0 cm).Many models exhibit a lower normalized temperature difference value at S depth,eff = 0 cm, which suggests that these models are transferring heat through the upper soil too efficiently.The potential sources of a soil heat transfer bias are myriad and could be due to biases/errors in soil texture, soil moisture, soil water phase, soil thickness, and vegetation.Most of these models, apart from CLM (Lawrence and Slater, 2008), do not represent the highly insulative soil organic matter, so this is a potential explanation of the common biases at S depth,eff = 0 cm.Incorrect snow heat transfer curves are symptomatic of model deficiencies.As an example, the land scheme in the Hadley Centre models used here (MOSES2.2;Essery et al., 2001) applies a composite snow model where the top soil layer and snowpack share the same temperature (Slater et 5), showing large differences in how heat is transferred though the snowpack to the soil.Data from the CMIP5 comparison period 1986-2005 are used.The dashed black line is the observational fit, with grey shading representing the observational scatter as in Fig. 3. al., 2001); hence insulation is not properly accounted for and cold temperatures easily penetrate into the soil.Conversely, the better-performing models feature multi-layer snowpacks that are more apt at emulating the nonlinear temperature profile of the snowpack.Note that the Hadley Centre model developers have addressed this limitation by implementing a multi-layer snow model (Best et al., 2011;Chadburn et al., 2015).
Despite the application of a different coupled model component (e.g., atmosphere or ocean models) and different initial conditions, both of which will influence terrestrial surface climate, the two Hadley Centre climate models (HadGEM2-CC and HadGEM2-ES, both utilizing MOSES as their land model) produce essentially the same snow insulation curve.Similarly, CCSM4 and NorESM reproduce essentially the same curve, which is expected since both models utilize CLM4 as their land model.The fact that snow insulation curves are essentially the same for a particular land model, even when driven with different climatic forcings from their parent climate model, suggests that the form of the curve does indeed capture the functional land model behavior.
The steep gradient of A norm at shallower S depth,eff is an important feature to capture as over 85 % of seasonally snowcovered land is estimated to have a S depth,eff less than 30 cm.Reliable observed data of hemispheric-scale snow depths or mass is not yet available, but the prevalence of shallow snow can be seen in estimated climatological S depth,eff derived from numerous reanalysis (ERA-Interim (Dee et al., 2011), MERRA (Rienecker et al., 2011), CFSR (Saha et al., 2010), JRA-55 (Kobayashi et al., 2015), some of which assimilate snow depths) products such as GlobSnow (Takala et al., 2011) and station interpolations (Foster and Davy, 1988;Brown and Brasnett, 2010) (Fig. 5).
A model diagnostic: the snow and heat transfer metric (SHTM) A land model should be able to capture the exponential relationship between A norm and S depth,eff , and it is useful to summarize the ability of a given model to do so as a compact metric.Here, we develop a snow and heat transfer metric (SHTM) that could be used within a broader land model analysis system such as the International Land Model Benchmarking project (ILAMB; http://www.ilamb.org;Luo et al., 2012).The metric is designed to have a value from 0 to 1 and describes the departure of a model's snow insulation curve from the observed curve.As the marginal influence of snow insulation decreases after a S depth,eff value of about 25 cm and most of the seasonal snow regions have a S depth,eff below 30 cm (Fig. 5), the SHTM value is only calculated over the range of 0 to 30 cm.For each 1 cm of S depth,eff , up to 30 cm, twice the difference in A norm between the model and observational curves is obtained; a root mean square error of these values is then computed and subtracted from 1.The SHTM value is thus 2 . (7) The closer SHTM is to 1, the better the model is at reproducing the observed snow insulation curve; a lower limit of 0 is placed on the SHTM.An important feature of the SHTM is that it effectively isolates analysis of the snow insulation process and therefore should be robust across different climate forcing.We test that hypothesis here by calculating the SHTM for ten 15-year periods from 1850 to 2000 for each model.The mean 2 m air temperature over the terrestrial Northern Hemisphere north of 55 • N for the cooling period (October-March) was computed for each 15-year period for each model.As a measure of changing climates, the minimum and maximum of these 15-year averages was differenced per model, resulting in a span of 1.25 to 4.15 • C, with a mean of 2.15 • C. Despite these climatic changes, the SHTM maintains a fairly constant value for each land model.Over the 150 years, all models return a SHTM standard deviation of less than 0.021, which is substantially less than the typical SHTM difference between models and indicates a general invariance to climate variability (Fig. 6).Due to scatter in the data (both modeled and observed data), the SHTM can only be used as a broader indicator of model skill.Based on the standard deviation of scores and observational uncertainty, SHTM values within 0.05 of each other may not be significantly different, and incremental improvements to the snow insulation schemes may not necessarily result in meaningful changes in the SHTM score.Additionally, the SHTM does not evaluate the ability of models to simulate snow accumulation or ablation; additional data sets and metrics are needed to assess these processes.

Conclusions
Model evaluation is an important aspect of the overall modeling process: e.g., development, application, and/or prediction.Land, hydrology, and ecology models can range in complexity from simpler, more conceptually based, formulations through to highly explicit representations that incorporate a multitude of processes.Regardless of their complexity, it is important that the underlying physical processes are represented correctly.
Deriving the relationship between a normalized temperature amplitude difference between air and soil, A norm , and the seasonal effective snow depth, S depth,eff , we have shown that observations are consistent with heat transfer theory.Analysis of observed results suggests that the marginal impact of snow insulation diminishes beyond a S depth,eff of about 25 cm.Structural weaknesses in several models have been exposed by examining their ability to represent the atmosphere-land heat transfer process in the presence of snow.To quantitatively compare model performance, a snow and heat transfer metric was designed.The SHTM value per model changed little when calculated for different periods over 150 years of climate change, suggesting it can fairly unambiguously provide an indication of whether model struc- tural/parameter deficiencies exist by negating other areas of uncertainty, such as the model forcing data.The SHTM is a useful model diagnostic that can be added to existing land model analysis/benchmarking systems.

Figure 1 .
Figure 1.Effective snow depth (S depth,eff , dashed lines) of three different snow regimes that have equal mean values over the period October-March.Earlier snowfall receives greater weighting as it represents greater insulation; data in green thus have the greatest effective depth.

Figure 2 .
Figure 2. Locations with colocated observations of air temperature, 20 cm soil temperature, and snow depth.

Figure 3 .
Figure 3. Observed relation between A norm and effective mean snow depth (S depth,eff ) along with the resulting exponential fit (dashed line).The grey shading shows the median fit plus/minus the mean scatter (O err ) of all fits.

Figure 4 .
Figure 4. Model fits to Eq. (5), showing large differences in how heat is transferred though the snowpack to the soil.Data from the CMIP5 comparison period 1986-2005 are used.The dashed black line is the observational fit, with grey shading representing the observational scatter as in Fig.3.

Figure 5 .
Figure 5. Cumulative distribution curves for climatological effective snow depths (S depth,eff ) as a percentage of all seasonally snowcovered area in the Northern Hemisphere (NH).The observed and model climatologies span a minimum of 10 years in the last quarter of the 20th century.The bold lines show the median values of the curves for both observations and models; shading shows the total range of curves.Note that the absolute snow-covered area can be quite different amongst models and observational estimates.

Figure 6 .
Figure 6.Mean value of the snow and heat transfer metric (SHTM) for each model.Values closer to 1.0 indicate better agreement with observations.SHTM values are relatively stable over 150 years of climate for each model.