Evaluation of air-soil temperature relationships 1 simulated by land surface models during winter across 2 the permafrost region

. A realistic simulation of snow cover and its thermal properties are important for


Introduction
Present-day permafrost simulations by global climate models are limited and future projections contain high, modelinduced uncertainty (e.g.Slater and Lawrence, 2013;Koven et al., 2013).Most of the model biases and cross-model differences in simulating permafrost area are due to inaccurate atmospheric simulation, e.g. of air temperature and precipitation, deficient simulation of snow and soil temperature and the coupling between atmosphere and land surface.In winter, the snow insulation effect is a key process for air-soil temperature coupling.Its strength depends on the snow depth, areal coverage, snow density and conductivity (see overview by Zhang, 2005).Many individual model studies have shown the strong impact of snow parameterisations on soil temperature simulations (e.g.Langer et al., 2013;Dutra et al., 2012;Gouttevin et al., 2012;Essery et al., 2013;Wang et al., 2013;Jafarov et al., 2014).Most importantly, these studies showed that the consideration of wet snow metamorphism and snow compaction, improved snow thermal conductivity and multilayer snow schemes can improve the simulation of snow dynamics and soil temperature.Parameterisations that take into account snow compaction (e.g.related to overburden pressure, thermal metamorphism and liquid water) work better than simpler schemes such as an exponential increase of density with time (Dutra et al., 2010).The influence of snow thermal conductivity on soil temperature has been demonstrated by many model studies (e.g. Bartlett et al., 2006;Saha et al., 2006;Vavrus, 2007;Nicolsky et al., 2007;Dankers et al., 2011;Gouttevin et al., 2012).Winter soil temperature can change by up to 20 K simply by varying the snow thermal conductivity by 0.1-0.5 W m −1 K −1 (Cook et al., 2008).The snow insulation effect also plays an important role for the Arctic soil temperature response to climate change and therefore for future near-surface permafrost thawing and soil carbon vulnerability (e.g.Schuur et al., 2008).Shallower snow can reduce soil warming while shorter snow season can enhance soil warming (Lawrence and Slater, 2010).The model skill in atmosphere-soil coupling with the concomitant snow cover in the Arctic is an important factor in the assessment of limitations and uncertainty of carbon mobility estimates (Schaefer et al., 2011).
The Snow Models intercomparison project (SnowMIP; Essery et al., 2009) and the Project for Intercomparison of Land Surface Parameterization Schemes (PILPS) Phase 2e (Slater et al., 2001) examined the snow simulations of an ensemble of land surface schemes for the midlatitudes.However, until now there has been no attempt to evaluate the air-soil temperature relationship in the Northern Hemisphere permafrost region and the detailed role of snow depth therein across an ensemble of models.In such an investigation, a first suitable approach is the evaluation of standalone (offline) land surface models (LSMs).The retrospective  simulations from the model integration group of the Permafrost Carbon Network (PCN; http://www.permafrostcarbon.org)provide an opportunity to evaluate an ensemble of nine state-of-the-art LSMs.Here, the LSMs are run with observation-based atmospheric forcing, meaning that snow depth is not influenced by biases in the atmospheric forcing in a coupled model set-up.The evaluation of the offline modelled air temperature-snow depth-near-surface soil temperature relationship in winter is therefore important for revealing a model's skill in representing the effects of snow insulation.
Most of the LSMs participating in PCN are the land surface modules of Earth system models (ESMs) participating in the Coupled Model Intercomparison Project (CMIP5; http://cmip-pcmdi.llnl.gov/cmip5/)although in some cases different versions were used for PCN and CMIP5 simulations.Thus, the results we present can guide the corresponding evaluation of these ESMs, though analysis of coupled model results requires consideration of couplings between model components and is necessarily more complex.
The scope of the present study is to analyse the extent to which the ensemble of PCN models can reproduce the observed relationship between air and near-surface soil temperatures in the Northern Hemisphere permafrost region during winter, with a particular focus on the snow insulation effect.For the latter we analyse the impact of snow depth on the difference between near-surface soil and air temperatures.Our related key questions are the following: how well do the models represent the observed spatial pattern of the air-soil temperature difference in winter and how it is controlled by snow depth?What is the range of the simulated air-soil temperature relationship across the model ensemble?To the greatest extent possible, we try to relate the performance of the models to their snow schemes.With this aim in mind, a simultaneous analysis of simulated air and near-surface soil temperatures and snow depth is presented and compared with those from a novel data set of Russian station observations.We used this data set because it has been compiled within PCN, and it is hard to find other station data sets which provide simultaneous observations of both air and soil temperatures as well as snow depth over a long period.
In Sect.2, we describe the model simulations, the station observations used for evaluation and the analysis methods.In Sect.3, we present a detailed analysis of near-surface air temperature-snow depth-soil temperature relationships in winter.In Sect.4, we discuss the roles of atmospheric forcing and model processes.In Sect.5, we investigate the relationship between simulated snow insulation and permafrost area.We summarise our findings and present conclusions in Sect.6.

Models
We use data from nine LSMs participating in the PCN, including CLM4.5, CoLM, ISBA, JULES, LPJ-GUESS, MIROC-ESM, ORCHIDEE, UVic and UW-VIC.For detailed information about the models and simulations we refer to Rawlins et al. (2015), Peng et al. (2015) and McGuire et al. (2016).The total soil depth for soil thermal calculations ranges from 3 m (divided into 8 layers) in LPJ-GUESS to 250 m (divided into 14 layers) in UVic.The physical properties of the soil differ among the models as well, and four of them (CLM4.5,ISBA, UVic, UW-VIC) include organic horizons.Three models (ISBA, LPJ-GUESS, UW-VIC) do not archive soil subgrid results and provide only area-weighted ground temperature (i.e.averaged over wetlands and vegetated areas, and in some cases lake fractions).
Table 1 lists relevant snow model details.One model (UVic) uses an implicit snow scheme which replaces the upper soil column with snow-like properties, i.e. the nearsurface soil layer takes the temperature of the air-snow interface.The other models use separate snow layers on top of the ground, either a single bucket (LPJ-GUESS, UW-VIC) or multilayer snow schemes (CLM4.5,CoLM, ISBA, JULES, MIROC-ESM, ORCHIDEE).Snow insulation is explicitly considered in all models: increasing snow depth in-creases the insulation effect.Most models consider the effect of varying snow density on insulation (Table 1).This is parameterised by a snow conductivity-density relationship.Some of the models (LPJ-GUESS, MIROC-ESM, OR-CHIDEE, UVic) use a fixed snow density, consider only dry snow and no compaction effects, while others represent liquid water in snow and different processes for snow densification such as mechanical compaction and thermal and destructive metamorphism (Table 1).
The simulations were generally run for the period 1960-2009, although some simulations were stopped a few years earlier.Each model team was free to choose appropriate driving data sets for weather and climate, atmospheric CO 2 , nitrogen deposition, disturbance, land cover, soil texture, etc.However, the climate forcing data (surface pressure, surface incident longwave and shortwave radiation, near-surface air temperature, wind and specific humidity, rain and snowfall rates) are from gridded observational data sets (e.g.CRUN-CEP, WATC; Table S1 in the Supplement).The exception is MIROC-ESM, which was run as a fully coupled model, forced by its own simulated climate.Mean annual air temperature simulated by MIROC-ESM for the permafrost region was within the range (−7.2 to 2.2 • C) of the other forcing data sets used in this study and the trend in near-surface air temperature (+0.03 • C yr −1 ) was the same for all forcing data sets.However, MIROC-ESM had both the highest annual precipitation (range 433 to 686 mm) and the highest trend in annual precipitation (range −2.1 to +0.8 mm yr −1 ) among the forcing data sets.
The spatial domain of interest is the Northern Hemisphere permafrost land regions.Our analysis is based on the 0.5 • × 0.5 • -resolution gridded driving and modelled data for winter (DJF) 1980-2000.

Observations
A quality-checked data set of monthly near-surface air temperature, 20 cm soil temperatures and snow depth from Russian meteorological stations have been provided by the All-Russian Research Institute of Hydrometeorological Information -World Data Centre (RIHMI-WDC; http://meteo.ru/).Of the stations, 579 report snow depth and 268 provide simultaneous data on all three variables.Ground surface temperature data are not available.A detailed description of data set preparation is provided in Sherstiukov (2012a).Observing conditions at the Russian stations in all meteorological elements correspond with WMO standards.The observations presented have been included in other data sets, such as the Global Summary of the Day (GSOD) data set, HadSRUT4 etc., and are widely used in climate research (e.g.Anisimov and Sherstiukov, 2016;Decharme et al., 2016;Park et al., 2014;Brun et al., 2013;Pavlov and Malkova, 2009;PaiMazumder et al., 2008).The soil temperature data set was run through four independent methods of quality control (Sherstiukov, 2012b).However, some soil temperature www.the-cryosphere.net/10/1721/2016/The Cryosphere, 10, 1721-1737, 2016 observations could be disturbed by grass cutting during the warm season and the removal of organic materials, mainly at agricultural sites, which may affect the trend in the warm season (Park et al., 2014), but this does not affect our results on the air-upper soil temperature relationship in winter.Precipitation station data have been compiled from the GSOD data set produced by the National Climatic Data Center (NCDC; http://www.ncdc.noaa.gov)for all of the stations that are included in the RIHMI-WDC data set.In addition to the station's ground snow depth observations we use gridded snow water equivalent (SWE) data from the GlobSnow-2 product (http://www.globsnow.info/swe/),which have been produced using a combination of passive microwave radiometer and ground-based weather station data (Takala et al., 2011).Orographic complexity, vegetation cover and snow state (e.g.wet snow) affect the accuracy of this product.When compared with ground measurements in Eurasia, the GlobSnow product shows root mean square error (RMSE) values of 30 to 40 mm for SWE values below 150 mm, with retrieval uncertainty increases when SWE is above this threshold (e.g.Takala et al., 2011;Muskett, 2012;Klehemet et al., 2013).In order to be compared with station data, snow depth was then calculated from SWE using a snow density of 250 kg m −3 , which is a median observed value in winter.Zhong et al. (2014) report snow density values of 180-250 kg m −3 for tundra/taiga and 156-193 kg m −3 for alpine snow classes.Woo et al. (1983) report snow density values of 250-400 kg m −3 for various terrain types.Choice of density does not materially affect the results.
All these data have been compiled for winter (DJF) and the same time period of 1980-2000.This period was chosen because soil temperature data are sparse before 1980 and the JULES simulation stopped in the year 2000.Comparison of the simulations with the station data was done using a weighted bilinear interpolation from the four surrounding model grid points onto the station locations.

Analysis methods
Our analysis is focused on the common winter (DJF) condition, although snow can begin in November or even earlier and end at the beginning of May, but we checked that a different winter definition (NDJFMA) does not qualita- The Cryosphere, 10, 1721-1737, 2016 tively change any of the intervariable relationships found.The focus in our study is on the evaluation of the simulated air-soil temperature relationships, modulated by snow depth.For this, we analyse the winter mean as well as the interannual variability (expressed as the standard deviation) of four key variables: near-surface air temperature (T air ), nearsurface soil temperature (soil temperature at 20 cm depth; T soil ), snow depth (d snow ) and the difference between T soil and T air .This difference T ( T = T soil − T air ) is called the air-soil temperature difference.By limiting our analysis to winter only, we are able to attribute the cross-model and model-to-observation differences in T primarily to snow insulation effects.In winter, the effects of other factors (e.g.soil moisture, texture) on T are much smaller than that of snow.Ground surface temperatures were not recorded in the Russian data set, but 20 cm soil depth temperatures were.To test how sensitive results are using 20 cm depth temperatures instead of ground surface temperatures, we also analysed model-simulated temperature differences between ground surface and T air , and found no qualitative differences, hence justifying the use of 20 cm observations.
We use the Pearson product-moment correlation coefficient and its significance (von Storch and Zwiers, 1999) to investigate the covariability between T and d snow as well as between T soil and its two forcing factors (T air and d snow ).Before we compute the correlations, we detrended the data by removing a least squares regression line.The calculated correlation maps (i.e.spatial distributions of correlation coefficients) based on model and observation data, allow comparison of the spatial patterns of these relationships.
To further examine the functional behaviour between the key variables, we present relation diagrams between pairs of variables (e.g.variation of T with change of d snow ).To evaluate the performance of the individual LSMs we calculate the RMSE between the observed and modelled relationships.We illustrate the dependence of T vs. d snow and T soil vs. d snow relations for three T air ranges.To distinguish dry snow pack regimes from those where sporadic melt may occur even in winter, we split T air into three regimes: the coldest conditions (T air ≤ −25 • C, representing 24 % of observations), the intermediate temperature conditions (−25 • C < T air ≤ −15 • C, representing 42 % of the observations), and the warmest conditions (−15 • C < T air ≤ −5 • C, representing 34 % of observations).Hence it is an indirect separation of temperature-gradient metamorphosis regimes and density-gradient metamorphosis snow pack regimes.Additionally, we present conditional probability density functions (PDFs) of T for different snow depth and air temperature regimes and compare the simulated PDFs with those obtained from station observations.

Relationship between air-soil temperature difference and snow depth
The relationship between air-soil temperature difference ( T ) and snow depth (d snow ) in winter (Fig. 1) shows an increase of T with increasing d snow in the Russian station observations.The data exhibit a linear relation between T and d snow at relatively shallow snow depths with a trend towards asymptotic behaviour at thicker snow, which is in agreement with earlier findings (Zhang, 2005;Ge and Gong, 2010;Morse et al., 2011).There is also significant scatter in the observation-based relationship indicated by the interquartile range in T of 1.5-8.5 • C at specific snow depth and air temperature regimes, likely resulting from complicating factors such as snow pack density and moisture content variability over the winter, as well as observational errors.All models reproduce the observed relationship, i.e. increasing T with increasing d snow .However, Fig. 1 also shows a wide cross-model spread in the simulated relationships and shows that some of the models are not consistent with the behaviour in the observations.Only three models (CLM4.5,CoLM, JULES) reproduce the observed T vs. d snow relationship reasonably well using a benchmark of RMSE < 5 • C for all temperature regimes.In particular LPJ-GUESS, ORCHIDEE, UVic, UW-VIC, MIROC-ESM show large RMSE for cold air conditions.ISBA stands out overall, with a RMSE of 7-18 • C in all temperature ranges.We conclude that these models do not adequately represent the features of the observed T vs. d snow relationship.The scatter in the modelled relationships, indicated by the interquartile range, is of the same order as in the observations, except for ISBA and MIROC-ESM, which produce noticeably smaller variations.
Figure 2a views the T vs. d snow relationship in a complementary form using the PDFs of T for different snow depth regimes.This analysis allows for a detailed evaluation of the snow-regime-dependent T separation by quantify-ing and comparing the modal value and width of the different conditional PDFs.Since the Russian snow depths are clearly non-normal in distribution (Fig. S1 in the Supplement, with a median d snow of 30 cm), we divide the data into "shallow" (d snow ≤ 20 cm) and "thick" (d snow ≥ 45 cm) regimes to separate two snow depth regimes.The modal value of the station-based T PDF is 5 • C for shallow snow and 14 • C for thick snow; that is, thick snow is a better insulator than thin snow.Based on the T PDFs, five models (CoLM, CLM4.5, JULES, ORCHIDEE, MIROC-ESM) successfully separate the T regimes under different snow depth conditions.Their simulated T PDFs have a smaller modal value for thin snow than for thick snow, like in the observations.The other models clearly fail in separating the T PDFs for the two different snow depth regimes.However, even for the five successful models, both the shapes and the modal values of the simulated PDFs differ from the observed PDF.
Both Figs. 1 and 2b further indicate that T is related to T air conditions.This is expected due to the effects of T air on snow pack properties, particularly its density and moisture content that affect the thermal conductivity of the snow.For example, the density of fresh fallen snow tends to be much lower under cold T air than warm (Anderson, 1976), leading to increased insulation (larger T ).Snow densification is also a function of T air ; for example, depth hoar metamorphosis of the snow pack, which produces more insulation (loosely packed depth-hoar crystals have very low thermal conductivity), is promoted by strong thermal gradients in the snow pack and is typical of continental climates (e.g.Zhang et al., 1996).Therefore, we can expect that the same thickness of snow in colder climates will provide greater insulation than it would in warmer climates.
Our analysis of observations (Figs. 1 and 2b) confirms (i) a larger T for colder T air than for warmer T air (for a given snow depth), (ii) a greater sensitivity of T to changes in d snow in colder T air (Fig. 1) and (iii) a larger modal value of the T PDF for colder T air than for warmer T air (21 2b).These effects are consistent with colder climates having lower density snow packs, and the differences are in line with measurements of snow density variability (Zhong et al., 2014).Additionally, both the interquartile range in Fig. 1 and the width of the PDFs in Fig. 2b become larger as T air cool.This may be related to the formation of depth hoar, which is a very good insulator and its varying presence in the snow pack decouples T from d snow .Cold, thin snow packs tend to contain much more low-density depth hoar than warmer snow packs (e.g.Zhang et al., 1996;Singh et al., 2011).Continental regions have large annual temperature cycles, with greater interannual variability and thinner snow packs than maritime ones.This variability leads to greater scatter and greater sensitivity of the T vs. d snow relationship in the cold winter regions.An additional cause of scatter is that the density of fresh-fallen snow decreases with the decrease of temperature.Accordingly, in the cold T air regime (T air ≤ −25 • C) we find a larger T in early winter (November-December), when the snow pack is composed of thinner lower-density fresh snow (and depth hoar), than late winter (January-February; Fig. S2).Under warm conditions (−15 • C < T air ≤ −5 • C) such a separation is not observed.
If we evaluate the models with respect to this observed impact of T air on the T vs. d snow relationship, we demonstrate that some models (CLM4.5,CoLM, JULES) are better able to replicate the effect than others (LPJ-GUESS, MIROC-ESM, ORCHIDEE, UW-VIC; Fig. 1).The latter do not fully replicate the larger T under cold T air conditions.CLM4.5, CoLM and JULES capture a larger T for colder T air for a given d snow in agreement with the observations.However, for shallow snow, JULES simulates an increase of T with increasing d snow for all temperature ranges that is twice as large as observations.Two models (ISBA, UVic) clearly fail in this evaluation.Poor model performance in reflecting T air influence on the T vs. d snow also manifests itself in regime separation of the PDFs (Fig. 2b).Some models do not separate the T regimes under different T air conditions well or at all (ISBA, LPJ-GUESS, MIROC-ESM, UVic), while others  ORCHIDEE, UW-VIC).The three models with reasonable intervariable relations (CLM4.5,CoLM, JULES) also capture the regime separation in the PDFs.These three models, as well as LPJ-GUESS and ORCHIDEE, also represent the observed greater insulation of early winter snow packs under cold conditions (Fig. S2).
The maps of the T vs. d snow correlations in winter (Fig. 3) demonstrate a pronounced spatial variability in the T vs. d snow relationship.The highest positive correlation occurs in the region of eastern Siberia and the Siberian Highlands.In other regions, namely Scandinavia, the western Russian Arctic, West Siberian Plain and Central Siberian Plateau, the correlation is much weaker and often not statistically significant.These regions have snow (Sect.4.1.2)influenced by North Atlantic cyclonic activity which brings relatively warm moist air and heavy precipitation in winter (and a positive correlation between d snow and T air ), leading to relatively small mean T .Some models (CLM4.5,CoLM, ORCHIDEE, UW-VIC) show a reasonable spatial pattern of correlation coefficient (r ≥ 0.4) compared to those of the observations, while the others do not (Fig. 3).Obvious outliers are the LPJ-GUESS and UVic models, which do not reproduce the observed pattern of correlation.UVic calculates a reverse spatial pattern comparing to that of the observations (e.g.significant positive correlation in West Siberian Plain and Central Siberian Highlands).LPJ-GUESS produces very few statistically significant correlations.

Variability of soil temperature with air temperature and snow depth
Next we assess whether or not the models can correctly reproduce the interannual near-surface soil temperature (T soil ) variability in relation to snow depth (d snow ) and near-surface air temperature (T air ) variability.Previous studies have noted that the strength of the relationship between T soil and T air is modulated by d snow and the snow insulation effect increases only up to a limiting depth beyond which extra snow makes little difference to soil temperatures (Smith and Riseborough, 2002;Sokratov and Barry, 2002;Zhang, 2005;Lawrence and Slater, 2010).Zhang (2005) reported that the limiting snow depth is approximately 40 cm.
To inspect the difference in insulation capacity for shallow and thick snow, we investigate the T soil vs. T air relationship under shallow (d snow ≤ 20 cm) and thick (d snow ≥ 45 cm) snow conditions.Our Russian observation analysis (Fig. 4, Table 2) indicate a regression slope between T soil and T air (0.62 = 0.8) that is 3 times higher under shallow snow pack than thicker snow conditions (0.21 ).This is consistent with observations that the mean freezing n-factor (the ratio of freezing degree days at the ground surface to air freezing degree days) is high at sites where the snow cover is thin or absent and low at sites where the snow cover is thick (e.g. for Yukon in Canada; Karunaratne and Burn, 2003).
Figure 4 clearly shows that some models (CoLM, CLM45, JULES) can capture this difference well.Their regression slopes for thick and thin snow are well separated and in agreement with those from the observed relationship (Table 2).The RMSE of their modelled T soil vs. T air relationships from observations is smaller than 4 • C.These models better reproduce the observed T vs. d snow relationship.Other models (LPJ-GUESS, MIROC-ESM, ORCHIDEE) do not reproduce the much greater regression slope between T soil and T air for shallow snow than for thick snow as the www.the-cryosphere.net/10/1721/2016/The Cryosphere, 10, 1721-1737, 2016 observations show.They also produce a regression slope for thick snow which is more than twice as large as for the observations.Two models (ISBA, UVic) do not show any sensitivity in the T soil vs. T air relation to snow conditions (Fig. 4, Table 2).Another measure quantitatively confirms the same models behaviour: the observed average d snow in the shallow snow regime is 13.7 cm and that for the thick snow regime is 58.5 cm, so we would expect, if near-surface T air and conductivities were equal in both snow depth classes, that a ratio between the slopes for shallow and thick snow would be 4.3.CLM4.5, CoLM and JULES reproduce this observed variation in the T soil vs. T air relation better than others (Table 2).JULES and CoLM indicate a change of a factor of 4, while CLM4.5 indicates a change of factor of 2. Other models (LPJ-GUESS, MIROC-ESM, ORCHIDEE) underestimate the increase of the regression slope for decreasing snow depth; they simulate only a factor change of about 1.5.The two models with unrealistic T vs. d snow relationships (ISBA, UVic) also fail in this evaluation of their T soil vs. T air relationship.They simulate a too-strong sensitivity of T soil to T air (regression slopes larger than 0.9 Table 2) that is almost completely independent of the snow depth regimes, particularly in ISBA, which is not consistent with observations.These models' spatial correlation patterns between T soil and T air also differ greatly from the observations and the other models (Fig. S3) and show very high positive correlation (r > 0.8) in most regions, as may be expected from the large regression slope shown in Fig. 4. The RMSE of their modelled T soil vs. T air relationships from observations reaches ca. 10 • C .The T soil vs. d snow relationship (Fig. 5) displays the variation of T soil with changing snow depth and emphasises the reduced sensitivity of T soil to snow depth under thick snow conditions.With increasing d snow , T soil asymptotically converges towards a value of around 0 • C. Overall, the Russian observations indicate that snow depth above about 80-90 cm has very little additional insulation effect on T soil .Most models show consistent results with regard to this aspect, although the interquartile range of T soil for specific snow depths is quite large in some models (ISBA, ORCHIDEE, UVic, UW-VIC; Fig. 5).The figure further points to the air temperature dependency of the relation.On average, for a given d snow , a colder T soil is observed for colder near-surface air temperatures, compared with warmer air temperatures.Most models can replicate this effect of air temperature on the T soil vs. d snow relationship, though with differing accuracy.The RMSE between the observed and modelled relationships can reach ca. 10 • C or more (in ISBA, UVic, UW-VIC), particularly under cold conditions.
The spatial patterns of the correlation coefficients between T soil and T air (Fig. S3) and between T soil and d snow (Fig. S4) show a relatively large cross-model scatter in many regions.Obvious outliers in the T soil vs. T air correlation maps are ISBA and UVic, which strongly overestimate the correlation (r > 0.9) over most of the Arctic.This indicates an underes-timated snow insulation effect and confirms the weak insulation in both models, which we already discussed based on their underestimated T (Fig. 1) and weak correlation between T and d snow (Fig. 3).Other models (LPJ-GUESS, ORCHIDEE, UW-VIC) also overestimate the correlation in some regions (e.g.western Russian Arctic, r > 0.7).Most of the simulated maps of T soil vs. d snow correlation agree with the observations on a strong positive correlation in eastern Siberia.This is a region of relatively shallow snow (10-40 cm; Fig. 6) and there T soil is very sensitive to variations in snow depth (e.g.Romanovsky et al., 2007).Comparing both simulated correlation maps, it is obvious that in this region, T soil correlates more strongly with d snow than with T air , in agreement with the Russian data and earlier studies (Romanovsky et al., 2007;Sherstyukov, 2009).

Roles of atmospheric forcing and model processes
The cross-model differences in the snow insulation effect, presented by the air temperature-snow depth-soil temperature relationships described above, are partially due to the differences in the atmospheric forcing data and also due to differences in the snow and soil physics used in the LSMs.However, because the climate forcing data sets utilised with each model are observation-based (except for MIROC-ESM), obvious outliers in individual model performance likely indicate poor or deficient physical descriptions of the air-snow-soil relations in that specific LSM.

Air temperature and precipitation
Both near-surface air temperature (T air ) and precipitation are given by the climate forcing data sets (Table S1) for all models, except for MIROC-ESM, which simulates both.The cross-model differences in forcing T air are relatively small and the simulated spatial patterns of temperature are very similar (Fig. S5).All forcing data sets are somewhat colder than Russian station data in their grid cells.The biases of winter mean T air range from −0.8 to −4.7 • C (Table S2), reflecting biases in the climate forcing data used by the models.In contrast, MIROC-ESM has a positive (mean) T air bias of +2.7 • C.
The large-scale patterns of precipitation are similar across the models, but regional differences can be large (Fig. S6).The individual differences in winter precipitation range from −0.2 to +0.5 mm day −1 (Table S2) relative to the average of the Russian station data.Unfortunately, snowfall was archived in only a few models; however large-scale spatial patterns are similar across these models (Fig. S7).

Snow depth
The broad-scale spatial snow depth (d snow ) patterns are similar across the models and show general agreement with the observed patterns (Fig. 6).The well-pronounced areas of maximum winter d snow (50-100 cm) are in Scandinavia, the Urals, the West Siberian Plain, Central Siberian Highlands, the Far East, Alaska, Labrador Peninsula and Isle of Newfoundland.However, large regional cross-model variability is obvious.Some models (JULES, LPJ-GUESS, ORCHIDEE, UVic) underestimate d snow , while others (CLM4.5,CoLM, ISBA, UW-VIC) overestimate it (Fig. 6; Table 3).The model biases are quite similar with respect to station observations and GlobSnow data.It should be noted that the models do not account for snowdrift.However, redistribution of snow due to wind is an important aspect, which makes comparison between in situ measured and modelled snow depths difficult (e.g.Vionnet et al., 2013;Sturm and Stuefer, 2013;Gisnas et al., 2014).Precipitation/snowfall cross-model differences cannot be the primary explanation for these d snow differences since some models (JULES, MIROC-ESM, ORCHIDEE) have positive bias in precipitation (> 0.2 mm day −1 , Table S2) but simulate much lower d snow compared to other models (Figs. 6, S6, S7, Table 3).Cross-model differences in the interannual variability of winter precipitation do not translate simply to corresponding differences in the interannual d snow variability (not shown).For example, UVic calculates the (unrealistically) largest interannual d snow variability in the boreal European permafrost region, which is not reflected in the precipitation variability.These results indicate that the simulated snow depth is a function of both the prescribed winter precipitation and the model's snow energy and water balance.

Model processes
We have shown that the cross-model spread in the representation of snow insulation effects (Sects.3.1, 3.2) cannot predominantly be explained by differences in the forcing data (Sect.4.1), but to a large extent is due to the representation of snow processes in the models.By considering the relationship plots (Figs. 1, 4 and 5) and conditional PDFs (Fig. 2), we were able to categorise the models in terms of their snow insulation performance.In this section we discuss the influence of the different snow parameterisations in the models.
Models with better performance (CLM4.5,CoLM, JULES) apply multilayer snow schemes.This allows them to simulate more realistic (stronger) insulation because they consider the snowpack's vertical structure and variability.They calculate the energy and mass balance in each snow layer, are able to capture nonlinear profiles of snow temperature and can also account for thermal insulation within the snowpack such as when the upper layer thermally insulates the lower layers (e.g.Dutra et al., 2012).These models also incorporate storage and refreezing of liquid water within the snow and parameterise wet-snow metamorphism, snow compaction and snow thermal conductivity (Table 1), which have been found to be among the most important processes for good snow depth and surface soil temperature simulation (e.g.Wang et al., 2013).
An underestimated snow depth directly leads to insulation that is too weak in JULES, LPJ-GUESS, ORCHIDEE and UVic (Fig. 6, Table 3).However only in ORCHIDEE and UVic does this lead to a significant underestimation of T (Table 3, Fig. S8), indicating bias compensation in the two other models.Thus, compensating error effects occur due to snow density and conductivity (Fig. S9, Table 1), which impact snow thermal insulation.
Our analysis showed that two models (ISBA, UVic) have T soil vs. T air correlation that are too high, indicating that they do not represent the modulation of the T soil vs. T air relationship by snow depth (Fig. 4).This is consistent with their underestimation of T (Figs. 1, 2, S8, Table 3).In UVic, the snowpack is treated not as a separate layer but as an extension of the top soil layer and a combined surface-to-soil thermal conductivity is calculated (Table 1).Such a scheme largely negates or reduces the insulating capacity of snow (Slater et al., 2001).Koven et al. (2013) noted that such a scheme simulates very little warming of soil and sometimes even cooling.The slightly underestimated snow depth (Table 3, Fig. 6) contributes (but not as the primary factor) to reduced snow insulation, as reported for UVic (Avis, 2012).
ISBA strongly underestimates T , while strongly overestimating d snow , compared with the observations (Table 3, Fig. 6).However, ISBA uses the same atmospheric forcing data as JULES (accordingly the air temperature and precipitation are quite similar; Table S2).Also, the model's snow density (150-250 kg m −3 ) is similar to other models (CLM45, CoLM, JULES; Fig. S9) and in agreement with Zhong et al. (2014), who report snow density values of 180-250 kg m −3 for tundra/taiga and 156-193 kg m −3 for alpine snow classes in winter.This apparent contradiction comes from the parameterisation of snow cover fraction within each grid cell (SCF).The version of ISBA used here calculates a unique superficial soil temperature whether or not the soil is covered by snow and all the energy and radiative fluxes are area-weighted by SCF (Eqs.7 and 20 in Douville et al., 1995).In order to get reasonable albedos in snow-covered forests, as is necessary when ISBA is coupled to the CNRM-CM climate model, the parameterisation gives very low SCF in the boreal forest (between 0.2 and 0.5).Hence, snow insulates only 20 to 50 % of the grid cell, despite fairly high snow depths.The heat fluxes from the snow-covered fraction are averaged with the fluxes from the snow-free surface, strongly concealing the actual insulating effect of snow and underestimating it over the grid cell.Using the detailed snow model Crocus (Brun et al., 1992;Vionnet et al., 2012) with a SCF equal to 100 % leads to an almost perfect simulation of near-surface soil temperature over northern Eurasia (Brun et al., 2013).A similar experiment with ISBA and a SCF equal to 100 % (Decharme et al., 2016) leads to good performances showing that the low T in ISBA, despite high snow depth in the present study, is mostly due to this subgrid snow fraction.Decharme et al. (2016) still showed that the ISBA results are further improved by updating the snow albedo and snow densification parameterisation.
Interestingly, the ORCHIDEE performance in simulating snow depth and T is similar to UVic (underestimation of d snow and T ; Table 3).However, ORCHIDEE can better represent the observed T soil vs. T air relationship and its modulation due to the snow pack.ORCHIDEE employs, similarly to UVic, a fixed snow density and thermal conductivity.However, in contrast to UVic, ORCHIDEE applies a multilayer scheme and simulates heat diffusion in the snowpack in up to seven discrete layers (Table 1; Koven et al., 2009).This helps to resolve the snow thermal gradients between the top and the base of the snow cover and might explain how some of the snow insulation effects are reasonably represented in ORCHIDEE, despite the simpler treatment of temperature diffusion.

Permafrost area
Snow cover plays an important role in modulating the variations of soil thermodynamics, hence near-surface permafrost extent (e.g. Park et al., 2015).Here we evaluate whether there is a simple relationship between the simulated Northern Hemisphere permafrost area and the sophistication and ability of the snow insulation component in the LSM to match the observed snow packs.The simulated near-surface permafrost area varies greatly across the nine models in the hindcast simulation (1960-2009;  (20.86 million km 2 ).This is inconsistent with previous studies (e.g.Vavrus, 2007;Koven et al., 2013), which concluded that the first-order control on modelled near-surface permafrost distribution is the representation of the air-to-surface soil temperature difference.Table 4 shows that the situation is more complex and that snow insulation simulation is not the dominant factor in a good permafrost extent simulation.When the land surface models with poor snow models are eliminated, the remaining models' simulated permafrost area show little or no relationship with the performance of the snow insulation component, because several other factors such as differences in the treatment of soil organic matter, soil hydrology, surface energy calculations, model soil depth and vegetation also provide important controls on the simulated permafrost distribution (e.g.Marchenko and Etzelmüller, 2013).

Summary and conclusions
The aim of this work was to evaluate how state-of-the-art LSMs capture the observed relationship between winter nearsurface soil and air temperatures (T soil , T air ) and their modulation by snow depth (d snow ) and climate regime.We presented some benchmarks to evaluate model performance.
The presented relation diagrams of T soil and the difference between T soil and T air regarding snow depth allow for a much better assessment, used to reveal structural issues of the models, than a direct point-by-point comparison with station observations.The results are based on the comparison of LSMs with a comprehensive Russian station data set.We see large differences across the models in their mean air-soil temperature difference ( T ) of 3 to 14 • C, in the sensitivity of near-surface soil temperature to air temperature (T soil vs. T air ; 0.49 to 0.96 • C • C −1 for shallow snow, 0.13 to 0.93 • C • C −1 for thick snow) and in the increase of T with increasing snow depth (modal value of T PDF: 0 to 10 • C for shallow snow, 5 to 21 • C for thick snow).Most of the nine models compare to the observations reasonably well (observations: T = 12 • C, modal T values of 5 • C for shallow snow and of 14 • C for thick snow, T soil vs. T air = 0.62 • C • C −1 for shallow snow, T soil vs. T air = 0.21 • C • C −1 for thick snow).Several models also capture the modulation by air temperature condition (larger increase in T with increasing d snow under colder conditions) and display the control of snow depth on T soil (weaker T soil vs. T air relationship under thicker snow).However, while they generally capture these observed relationships, their strength can differ in the individual models.Two models (ISBA, UVic) show the largest deficits in snow insulation effects and cannot separate the T regimes neither for different snow depths nor for different air temperature conditions.
This study uses the ensemble of models to document model performance with respect to the T soil vs. T air relationship, and to identify those with better performance, rather than to quantify the best model.We were able to attribute performance strength/weakness to snow model features and complexity.Models with better performance apply multilayer snow schemes and consider complex snow processes (e.g.storage and refreezing of liquid water within the snow, wet snow metamorphism, snow compaction).Those models which show limited skill in snow insulation representation (underestimated T , very weak dependency of T on d snow , almost unity ratio of T soil vs. T air ) have some deficiencies or oversimplification in the simulation of heat transfer in snow and soil layer, particularly in the representation of snow depth and density (conductivity).We also emphasise that compensation of errors in snow depth and conductivity can occur.For example, an excessive correlation between T soil and T air can be attributed to excessively high thermal conductivity even when the snow depth is correctly (or over-) simulated.This finding underscores the need for detailed model evaluations using multiple independent performance metrics to ensure that the models get the right functionality for the right reason.It should be noted that the treatment of ground properties, particularly soil organic matter and soil moisture/ice content, also affect the simulated winter ground temperatures.The specific evaluation of these individual processes is more robustly investigated with experiments conducted for individual models (e.g.recently, Wang et al., 2013;Gubler et al., 2013;Decharme et al., 2016).
Snow and its insulation effects are critical for accurately simulating soil temperature and permafrost at high latitudes.The simulated near-surface permafrost area varies greatly across the nine models (from 7.62 to 20.86 million km 2 ).However, it is hard to find a clear relationship between the performance of the snow insulation in the models and the simulated area of permafrost, because several other factors (e.g.related to soil depth and properties and vegetation cover) also control the simulated permafrost distribution.

Data availability
The data will be made available through the National Snow and Ice Data Center (NSIDC; http://nsidc.org);the contact person is Kevin Schaefer (kevin.schaefer@nsidc.org).
The Supplement related to this article is available online at doi:10.5194/tc-10-1721-2016-supplement.

Figure 1 .
Figure 1.Variation of T ( • C), the difference between soil temperature at 20 cm depth and air temperature with snow depth (cm) for winter 1980-2000.The dots represent the medians of 5 cm snow depth bins and the upper and lower bars indicate the 25th and 75th percentiles, calculated from all Russian station grid points (n = 268) and 21 individual winters.The numbers in each model panel indicate the RMSE between the observed and modelled relationship.Colours represent different air temperature regimes.

Figure 2 .
Figure 2. Conditional probability density functions (PDFs) of T ( • C), the difference between soil temperature at 20 cm depth and air temperature for (a) different snow depth classes and (b) air temperature regimes for winter 1980-2000.

Figure 3 .
Figure 3. Spatial maps of the correlation coefficients between snow depth and T , the difference between soil temperature at 20 cm depth and air temperature for winter 1980-2000.Regions with greater than 95 % significance are hashed.

Figure 4 .
Figure 4. Variation of soil temperature at 20 cm depth ( • C) with air temperature ( • C) for winter 1980-2000.The dots represent the medians of 5 • C air temperature bins and the upper and lower bars indicate the 25th and 75th percentiles, calculated from all Russian station grid points (n = 268) and 21 individual winters.The numbers in each model panel indicate the RMSE between the observed and modelled relationship.Colours represent different snow depth regimes.

Figure 5 .
Figure 5. Variation of soil temperature at 20 cm depth ( • C; y axis) with snow depth (cm) for winter 1980-2000.The dots represent the medians of 5 cm snow depth bins and the upper and lower bars indicate the 25th and 75th percentiles, calculated from all Russian station grid points (n = 268) and 21 individual winters.The numbers in each model panel indicate the RMSE between the observed and modelled relationship.Colours represent different air temperature regimes.

Table 1 .
PCN snow model details.

Table 2 .
Sensitivity of near-surface soil temperature (T soil ) to air temperature (T air ) in winter (DJF) calculated by the slopes of the linear regression between T soil ( • C) and T air ( • C) for different regimes of snow depth (d snow ), using data from all Russian station grid points and 21 individual winter 1980-2000.All relationships are statistically significant at p ≤ 0.01.

Table 3 .
Russian station location averaged error statistics for snow depth (cm) and temperature difference between 20 cm soil and air temperature ( T ; • C) for winter 1980-2000.For each variable, the maximum available number of observations (n) is used.Mean St,GS and STD St,GS are the observed mean and interannual variability (standard deviation), while STD is the standard deviations of each model.Bias is the mean error "simulation minus observation" and RMSE is the root mean square error.The statistics for snow depth are given based on both station observation (St) and GlobSnow (GS) data.