Spring snow albedo feedback over northern Eurasia: Comparing in situ measurements with reanalysis products

This study uses daily observations and modern reanalyses to evaluate reanalysis products over northern Eurasia with respect to the spring snow albedo feedback (SAF) during the period from 2000 to 2013. We used the state-of-the-art reanalyses ERA-Interim/Land and the Modern-Era Retrospective Analysis for Research and Applications version 2 (MERRA-2), as well as an experimental set-up of ERA-Interim/Land with prescribed short grass as land cover, to enhance the comparability with the station data while underlining the caveats of comparing in situ observations with gridded data. Snow depth statistics derived from daily station data are well reproduced in all three reanalyses. However, day-to-day albedo variability is notably higher at the stations than in any reanalysis product. The ERA-Interim grass set-up shows improved performance when representing albedo variability and generates comparable estimates for the snow albedo in spring. We find that modern reanalyses show a physically consistent representation of SAF, with realistic spatial patterns and area-averaged


Introduction
Global warming is enhanced at high northern latitudes, where the Arctic near-surface air temperature has risen at twice the rate of the global average in recent decades, a feature called Arctic amplification (Serreze and Barry, 2011). Climate model experiments for the 21st and 22nd centuries show that Arctic warming will continue and intensify under all emission scenarios (Collins et al., 2013). Arctic amplification results from several interacting processes, such as the albedo feedback due to a reduction in snow and ice cover, enhanced poleward atmospheric and oceanic heat transport, and changes in humidity (Serreze and Barry, 2011; Pithan and Mauritsen, 2014).
As one of the critical factors of the Arctic amplification, the surface albedo feedback implies a decrease in reflected short-wave radiation at the top of the atmosphere in conjunction with decreasing surface albedo and increasing near-surface temperature (Thackeray and Fletcher, 2016). It is considered to be a positive feedback in the sense that an initial warming is strengthened over time, being quantified through the change in surface albedo per unit change in temperature (Robock, 1983; Cess and Potter, 1991; Qu and Hall, 2007). Snowmelt triggers this feedback via surface absorption of short-wave radiation followed by conversion to
Published by Copernicus Publications on behalf of the European Geosciences Union.
During 1979-2011, the Arctic snow cover extent in June decreased at a rate of −21 % per decade (Derksen and Brown, 2012). Climate model projections for the end of the 21st century show an even more reduced Arctic cryosphere, and thus the SAF will continue to modulate Arctic warming (Brutel-Vuilmet et al., 2013). The SAF is especially effective over the Northern Hemisphere (NH) since most of it is covered by snow during boreal wintertime (Groisman et al., 1994). Hall (2004) found that 50 % of the total NH extratropical SAF caused by global warming occurs during spring, while Qu and Hall (2014) estimated that the SAF variability between models accounts for 40-50 % of the spread in the warming signal over the continents of the NH extratropics.
Several studies investigated spring NH extratropical SAF based on satellite, reanalysis and model data sets (Fernandes et al., 2009; Fletcher et al., 2012, 2015; Qu and Hall, 2014). Satellite-based estimates of SAF vary within ±10 % depending on the analysed data set. Hall et al. (2008) used the International Satellite Cloud Climatology Project (ISCCP) data (Schiffer and Rossow, 1983) to calculate a SAF strength of −1.13 % K⁻¹, whereas Fernandes et al. (2009), using Advanced Very High Resolution Radiometer (AVHRR) data (Justice et al., 1985), found a slightly weaker SAF of −0.93 % K⁻¹. Qu and Hall (2014) determined the SAF using Moderate Resolution Imaging Spectroradiometer (MODIS) data (Hall et al., 2002) and found a value of −0.87 % K⁻¹ for springtime. Considering the different spatial and temporal domains as well as the variety of methods applied, the SAF estimates around −1 % K⁻¹ from satellite data can be considered quantitatively consistent.
Model- and reanalysis-based estimates are somewhat higher compared to those derived from satellite data. Fletcher et al. (2015) investigated Coupled Model Intercomparison Project 3 and 5 (CMIP3/CMIP5) ensembles to estimate the SAF for an assortment of global climate models (GCMs). The authors found a SAF ensemble model mean of −1.2 % K⁻¹ for the NH extratropics, which is in fair agreement with MODIS values but is higher compared to ISCCP- and AVHRR-based estimates. Within this comparison Fletcher et al. (2015) also investigated SAF computations based on ERA-Interim (Dee et al., 2011), Modern-Era Retrospective Analysis for Research and Applications (MERRA) (Rienecker et al., 2011) and NCEP-2 (Kanamitsu et al., 2002) reanalyses, thus providing the most up-to-date assessment of SAF in reanalysis data sets. While MERRA data resulted in a slightly weaker SAF of −1.17 % K⁻¹ compared to ERA-Interim (−1.23 % K⁻¹), both reanalyses show similar SAF values compared to MODIS. That said, most studies use satellite-derived albedo data in conjunction with temperature and snow cover data from reanalyses.
Although satellite products of snow cover and albedo cover large parts of the NH, they exhibit low temporal resolution and significant uncertainties for high solar zenith angles as well as complex terrain (e.g. Wang et al., 2014). Thackeray and Fletcher (2016) compared CMIP3/CMIP5 model families and found that the models represent the SAF process rather accurately. However, there are still inherent biases likely related to the use of outdated parameterizations. In this respect the use of in situ observations would provide an opportunity for evaluating SAF estimates in different gridded data sets and especially among reanalyses. However, estimating SAF in the Arctic using in situ data is challenging, mostly because of the lack of reliable, relevant observations, both in the temporal and spatial domains. Furthermore, the lack of in situ SAF estimates hampers the understanding of SAF in high-latitude climates (Graversen and Wang, 2009; Graversen et al., 2014).
In this study we use a unique data set of daily observations and modern reanalyses over northern Eurasia in order to (1) evaluate reanalysis products with respect to radiation and snow properties and (2) determine the SAF in spring between 2000 and 2013 based on in situ measurements. We compare different land-reanalysis products with modified vegetation settings. Specific questions to be addressed in this study are the following. How well do the modern reanalyses reproduce snow and radiation features on a daily resolution? What are realistic estimates of the SAF from the station data over northern Eurasia and how well do they compare to the gridded reanalysis data? What are the major characteristics of space-time variability of the SAF in station and reanalysis data?
The paper is organized as follows. After describing the different data sets and the methods in Sects. 2 and 3, we evaluate the daily output for snow, radiation fluxes and temperature within these data sets in Sect. 4.1. In Sect. 4.2 we assess the results of the SAF computations and the differences between products, including an analysis of the spatial and temporal variability. Section 5 discusses the results and considers potential implications for future studies.

Reanalysis data
To investigate the SAF processes in reanalyses, we evaluated two products: the ERA-Interim/Land (ERAI-L, Balsamo et al., 2015) and Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2) (Gelaro et al., 2017). ERAI-L is a land-surface-only simulation driven by the near-surface meteorology and fluxes from the ERA-Interim atmospheric reanalyses (Dee et al., 2011). The land-surface model in ERAI-L (HTESSEL) has several enhancements compared with the land-surface model used in ERA-Interim, including the snowpack representation (Dutra et al., 2010). ERAI-L considers the prognostic evolution of snow mass and density, and for exposed areas there is also a prognostic evolution of snow albedo. For shaded snow, i.e. snow under high vegetation, the albedo is considered constant and dependent on vegetation type (see Dutra et al., 2010 for more details). MERRA-2 also includes a dedicated land module for surface variables. Furthermore, it applies an updated Goddard Earth Observing System (GEOS) model and analysis scheme and assimilates more observations than its predecessor MERRA (Rienecker et al., 2011). Finally, MERRA-2 uses observation-based precipitation data to force its land-surface parameterizations (Reichle et al., 2017), similar to what formerly was known as MERRA-land. Unlike ERAI-L, MERRA-2 consists of a full land-atmosphere reanalysis. Its incremental analysis update (IAU) scheme improves upon 3D-Var by dampening the analysis increment. In IAU, a correction is applied to the forecast model gradually, limiting precipitation spin-up in particular.
The Cryosphere, 12, 1887-1898, 2018, www.the-cryosphere.net/12/1887/2018/
For near-surface temperature we use 2 m air temperature for both the reanalyses and observations. Moreover, we do not use albedo computed by the reanalysis but calculate it from the radiative flux components consistently with the observed albedo. For this purpose, we use upward and downward short-wave radiation at the surface as diagnosed by ERA-Interim and MERRA-2 as well as surface net and surface incoming radiation from the station observations. Snow depth is used as inferred by reanalyses and, if needed, converted to centimetres. More information about general characteristics of reanalysis products in the Arctic can be found in Lindsay et al. (2014), Dufour et al. (2016) and Wegmann et al. (2017).
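The flux-based albedo calculation described above can be sketched as follows. This is a minimal illustration, not the processing code used in the study: the function names, the `min_down` threshold and the sign convention for the net flux (positive downward) are assumptions introduced here.

```python
import math


def albedo_from_up_down(sw_up, sw_down, min_down=1.0):
    """Albedo from upward and downward short-wave fluxes (reanalysis-style input).

    min_down (W m-2) is an assumed threshold, not taken from the paper; it
    avoids dividing by near-zero incoming radiation.
    """
    if math.isnan(sw_up) or math.isnan(sw_down) or sw_down < min_down:
        return math.nan
    return sw_up / sw_down


def albedo_from_net_down(sw_net, sw_down, min_down=1.0):
    """Albedo from net and incoming short-wave fluxes (station-style input).

    Assumes net = down - up (positive downward), so albedo = 1 - net / down.
    """
    if math.isnan(sw_net) or math.isnan(sw_down) or sw_down < min_down:
        return math.nan
    return 1.0 - sw_net / sw_down


# Hypothetical daily mean fluxes in W m-2 over a snow-covered surface.
reanalysis_albedo = albedo_from_up_down(sw_up=60.0, sw_down=100.0)
station_albedo = albedo_from_net_down(sw_net=40.0, sw_down=100.0)
```

Both routes give the same albedo when the fluxes are consistent, which is what makes the station and reanalysis values comparable.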

Idealized reanalysis experiment
Since the in situ measurements in this study are observed over clear-cut vegetation, idealized simulations prescribing grassland everywhere were carried out with the ERAI-L configuration (hereafter ERA-Interim/Land grass only, ERAI-LG).The ERAI-LG simulation was carried out with the same model and set-up as ERAI-L, differing only in the land cover used.The land-surface model used in ERAI-L, HTESSEL, accounts for subgrid-scale land cover variability by representing several land tiles, namely low vegetation, high vegetation, bare ground, exposed snow (snow on top of bare ground or low vegetation), shaded snow (snow under high vegetation) and interception.The land cover is prescribed with four maps: low and high vegetation cover (cvl and cvh) and low and high vegetation types (tvl and tvh).The bare ground fraction is computed as cvb = 1 − cvl − cvh, the snow fraction is a function of the mean grid-box snow depth and the interception fraction is a function of the mean interception reservoir water content.For the ERAI-LG simulation, the high vegetation cover was set to zero (cvh = 0), the low vegetation cover to one (cvl = 1) and the low vegetation type to grassland.In this idealized simulation the entire globe was covered in grassland so that only the low vegetation and exposed snow (when snow is present) tiles were active.The main goal of this simulation is to evaluate the role of land cover when comparing point observations with gridded reanalysis and to evaluate pathways to improve reanalyses in representing albedo processes.
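The land cover change behind the grass-only experiment can be illustrated with a short sketch. The dictionary layout, the function names and the string labels for vegetation types are placeholders chosen here for readability; they do not reproduce the actual HTESSEL input format.

```python
def bare_ground_fraction(cvl, cvh):
    """Bare ground fraction as the residual of low and high vegetation cover."""
    cvb = 1.0 - cvl - cvh
    if cvb < -1e-9:
        raise ValueError("cvl + cvh must not exceed 1")
    return max(cvb, 0.0)


def to_grass_only(cell):
    """Return a copy of a land-cover record with the grass-only settings applied.

    The record layout and the "grassland" label are illustrative placeholders.
    """
    grass = dict(cell)
    grass.update(cvh=0.0, cvl=1.0, tvl="grassland")
    return grass


# Hypothetical mixed grid cell: 30 % low vegetation, 60 % forest, 10 % bare ground.
cell = {"cvl": 0.3, "cvh": 0.6, "tvl": "tundra", "tvh": "evergreen needleleaf"}
grass = to_grass_only(cell)

cvb_standard = bare_ground_fraction(cell["cvl"], cell["cvh"])
cvb_grass = bare_ground_fraction(grass["cvl"], grass["cvh"])
```

With `cvh = 0` and `cvl = 1` the bare ground fraction vanishes, so only the low vegetation tile and, when snow is present, the exposed snow tile remain active.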

Observational in situ data
To evaluate reanalysis performance, we used newly assembled in situ radiation observations from Russian meteorological stations. This data set includes 4 h solar radiation and radiation balance data from the World Meteorological Organization (WMO) World Radiation Network of the World Radiation Data Center (WRDC) at the Voeikov Main Geophysical Observatory, Saint Petersburg, Russia. The original WRDC data contain time series from 65 locations. We selected 47 stations for this study because they overlap with daily snow depth and 2 m temperature observations (see Supplement Table S1). Of these 47 stations, three coastal stations were attributed by ERAI-L to ocean grid points and were removed from the initial data set, so that the final data set consists of 44 stations. Temperature and snow depth observations were taken from the All-Russian Research Institute of Hydrometeorological Information World Data Centre (RIHMI-WDC), Obninsk, Russia. A detailed description of this data set is provided by Bulygina et al. (2010). This data set includes snow depth as well as snow cover fraction around meteorological stations. Snow cover information in this data set is not stored in percentages but rather on a scale of integers from 0 to 10 (for example, 50 % is assigned a value of 5 but so is 53 %). This makes these data hardly applicable for precise SAF calculations. Snow depth is measured in centimetres with a precision of 1 cm. This might lead to an underestimation of snow depth in the case of shallow snow (between 0 and 1 cm). All variables (temperature, snow depth and snow cover, surface long-wave (LW) radiation budget, surface short-wave (SW) radiation and the sum of the surface short-wave and long-wave radiation budgets) were represented as daily time series for the period 2000-2013, which is the time period available for the radiation observations by the Voeikov Main Geophysical Observatory.
Figure 1 shows the location of the stations together with the climatological 2000-2013 March-June (MAMJ) snow depth as computed by ERAI-L. The distribution of stations is quite heterogeneous, with very few stations located in eastern Siberia and in the Far East. Moreover, some stations have prolonged periods of missing values; six stations have more than 50 % missing values in the daily time series for MAMJ. For monthly means, the total number of missing values generally decreases from 2000 to 2013 (see Supplement Fig. S1). However, data for the year 2009 are missing at 44 out of 47 stations during the MAM period and at three stations in June.
Nevertheless, spatial and temporal coverage of this data set is exceptional for the analysis of albedo in this region.It is also important to note that neither snow nor radiation from these stations were assimilated in the reanalysis data sets and therefore our intercomparisons are completely independent.

Methods
To evaluate the climatic variables needed for the SAF computation, we first compared daily values of snow depth, albedo and 2 m temperature from the meteorological stations with those from the reanalyses. To co-locate observations with reanalyses, we extracted the values of the reanalysis grid cell in which the station is located. ERA-Interim/Land has a horizontal resolution of 0.75° × 0.75°, whereas MERRA-2 has a horizontal resolution of 0.5° × 0.625°. That said, the extracted grid-cell values are expected to show less variability and lower peak values, since they are integrated over a larger spatial domain, which dampens extreme values. We then derived long-term differences, performed a correlation analysis and compared the variability among the data sets for the MAMJ period.
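The co-location step can be sketched as a nearest-cell lookup on a regular latitude-longitude grid. The grid origins and cell counts used in the example (a 0.75° grid starting at 90° N, 0° E and a 0.5° × 0.625° grid starting at 90° S, 180° W) are assumed layouts for illustration, the station coordinates are hypothetical, and `nearest_cell` is a helper introduced here rather than part of any reanalysis toolkit.

```python
def nearest_cell(lat, lon, lat0, dlat, nlat, lon0, dlon, nlon):
    """Return (i, j) indices of the grid cell whose centre is closest to a station.

    lat0/lon0 are the centre coordinates of the first cell, dlat/dlon the
    signed grid spacing in degrees and nlat/nlon the number of cells.
    """
    i = round((lat - lat0) / dlat)
    i = min(max(i, 0), nlat - 1)                 # clamp at the poles
    j = round(((lon - lon0) % 360.0) / dlon) % nlon  # wrap around in longitude
    return i, j


# Hypothetical station near Saint Petersburg.
station_lat, station_lon = 59.95, 30.70

# Assumed 0.75 deg grid, latitudes running north to south from 90 N.
era_cell = nearest_cell(station_lat, station_lon,
                        lat0=90.0, dlat=-0.75, nlat=241,
                        lon0=0.0, dlon=0.75, nlon=480)

# Assumed 0.5 deg x 0.625 deg grid, latitudes running south to north from 90 S.
merra_cell = nearest_cell(station_lat, station_lon,
                          lat0=-90.0, dlat=0.5, nlat=361,
                          lon0=-180.0, dlon=0.625, nlon=576)
```

Because the two grids differ in spacing and orientation, the same station maps to cells of different size and position, which is one source of the point-versus-grid uncertainty discussed in the text.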
Since the SAF signals for the seasonal cycle and long-term climate change are highly correlated (Hall and Qu, 2006), we focus here on the evaluation of the seasonal cycle. Snow cover is converted from snow depth following a logarithmic equation according to which 2.5 cm of snow depth is defined as equivalent to 100 % snow cover (Fletcher et al., 2015). We split SAF into a snow cover component (SNC) and a temperature/metamorphosis component (TEM). SNC relates to the decrease in albedo linked to the earlier melting of snow. TEM concerns the reduction of snow albedo due to enhanced metamorphism and larger grain sizes at warmer temperatures. In this study we focus on these two components of the feedback process rather than the classic term for net SAF ($\Delta\alpha/\Delta T$), since our goal is to evaluate differences in the more intricate terms of SAF. In the following, we assume that SAF = SNC + TEM, which was shown to be true in nearly all cases for the NH (Fletcher et al., 2012, 2015). Therefore, we compute the two terms as

$$\mathrm{SNC} = \left(\overline{\alpha_\mathrm{snow}} - \alpha_\mathrm{land}\right) \frac{\Delta S_\mathrm{c}}{\Delta T_\mathrm{2\,m}} \qquad (1)$$

and

$$\mathrm{TEM} = \overline{S_\mathrm{c}} \, \frac{\Delta \alpha_\mathrm{snow}}{\Delta T_\mathrm{2\,m}}, \qquad (2)$$

where $\alpha_\mathrm{snow}$ is the snow-covered surface albedo, $\alpha_\mathrm{land}$ is the snow-free surface albedo, $S_\mathrm{c}$ is the snow cover fraction and $T_\mathrm{2\,m}$ is the 2 m temperature. The first term of SNC ($\overline{\alpha_\mathrm{snow}} - \alpha_\mathrm{land}$) is also known as the albedo contrast, whereas the second term ($\Delta S_\mathrm{c}/\Delta T_\mathrm{2\,m}$) will be referred to as the snowmelt sensitivity. In Eqs. (1) and (2) deltas indicate monthly changes and the overbars indicate means over the two adjacent months. Note that $\Delta T_\mathrm{2\,m}$ does not represent a hemispheric mean but rather the difference at an individual location. It was found that the contributions of SNC and TEM to the overall SAF are 60-70 % and 30-40 %, respectively, for the NH (Fletcher et al., 2015).
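As an illustration of the decomposition, the sketch below evaluates both components for one pair of adjacent months. It assumes that SNC is the albedo contrast (two-month mean snow albedo minus snow-free albedo) times the monthly change in snow cover per unit temperature change, and that TEM is the two-month mean snow cover times the monthly change in snow albedo per unit temperature change. The function name and all input values are hypothetical.

```python
def snc_tem(alpha_snow_1, alpha_snow_2, alpha_land, sc_1, sc_2, t2m_1, t2m_2):
    """SNC and TEM (albedo fraction per K) for two adjacent months.

    Snow cover is given as a fraction (0-1), albedo as a fraction (0-1)
    and 2 m temperature in K or deg C (only the difference matters).
    """
    d_t = t2m_2 - t2m_1
    if d_t == 0.0:
        raise ValueError("monthly temperature change must be non-zero")
    mean_alpha_snow = 0.5 * (alpha_snow_1 + alpha_snow_2)  # overbar in SNC
    mean_sc = 0.5 * (sc_1 + sc_2)                          # overbar in TEM
    albedo_contrast = mean_alpha_snow - alpha_land
    snowmelt_sensitivity = (sc_2 - sc_1) / d_t
    snc = albedo_contrast * snowmelt_sensitivity
    tem = mean_sc * (alpha_snow_2 - alpha_snow_1) / d_t
    return snc, tem


# Hypothetical April-to-May transition at a single location.
snc, tem = snc_tem(alpha_snow_1=0.60, alpha_snow_2=0.50, alpha_land=0.20,
                   sc_1=1.0, sc_2=0.5, t2m_1=-5.0, t2m_2=0.0)
saf = snc + tem  # assumes SAF = SNC + TEM

# Convert from albedo fraction per K to % per K for comparison with the text.
snc_pct, tem_pct, saf_pct = (100.0 * x for x in (snc, tem, saf))
```

For these made-up inputs the snow cover term dominates the total, in line with the qualitative split between SNC and TEM quoted for the NH.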
In our SAF assessment, we use 2 m temperature as a surrogate for near-surface air temperature, since the latter variable is not represented by the stations. Using 2 m temperature introduces some uncertainty to the results since atmospheric temperature advection can play a role in local temperature evolution. However, multiple studies (Fletcher et al., 2015; Xiao et al., 2017; Kevin et al., 2017) now use 2 m temperature in their SAF assessments, mainly due to the same comparability issues.
Since daily data are available, we define $\alpha_\mathrm{snow}$ as the monthly mean over all daily estimates during the specific month when $S_\mathrm{c} = 100$ %. Moreover, we define $\alpha_\mathrm{land}$ as the mean over all daily estimates during MAMJ (in some stations this might only occur in June) when $S_\mathrm{c} = 0$ %. This allows for a less artificial estimation of $\alpha_\mathrm{land}$ than the conventional use of summer (e.g. August) albedo.
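The two albedo definitions can be sketched with plain daily records. The record layout (dictionaries with `month`, `albedo` and `snow_cover` as a fraction) and the function names are assumptions made for this example, and the values are invented.

```python
import math
from statistics import mean


def alpha_snow_by_month(days):
    """Monthly mean albedo over days with complete snow cover (S_c = 100 %)."""
    result = {}
    for month in sorted({d["month"] for d in days}):
        values = [d["albedo"] for d in days
                  if d["month"] == month
                  and d["snow_cover"] >= 1.0
                  and not math.isnan(d["albedo"])]
        result[month] = mean(values) if values else math.nan
    return result


def alpha_land(days, months=(3, 4, 5, 6)):
    """Mean albedo over all snow-free days (S_c = 0 %) within MAMJ."""
    values = [d["albedo"] for d in days
              if d["month"] in months
              and d["snow_cover"] <= 0.0
              and not math.isnan(d["albedo"])]
    return mean(values) if values else math.nan


# Hypothetical daily records for one station and one year.
days = [
    {"month": 3, "albedo": 0.80, "snow_cover": 1.0},
    {"month": 3, "albedo": 0.70, "snow_cover": 1.0},
    {"month": 4, "albedo": 0.60, "snow_cover": 1.0},
    {"month": 4, "albedo": 0.40, "snow_cover": 0.5},  # partial cover: used by neither
    {"month": 5, "albedo": 0.20, "snow_cover": 0.0},
    {"month": 6, "albedo": 0.22, "snow_cover": 0.0},
    {"month": 8, "albedo": 0.18, "snow_cover": 0.0},  # outside MAMJ: ignored
]

snow_albedo = alpha_snow_by_month(days)
land_albedo = alpha_land(days)
```

Days with partial snow cover contribute to neither estimate, and the August record is excluded from the snow-free albedo because it falls outside MAMJ.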

Daily data evaluation
Since 2 m air temperature in reanalyses has been comprehensively evaluated in previous studies (e.g. Schubert et al., 2014; Lindsay et al., 2014), we only perform a general comparative assessment of the daily values of albedo and snow depth in the SAF computations. That said, Lindsay et al. (2014) found negative biases over Russia in winter for both ERA-Interim and MERRA-1, whereas in summer ERA-Interim shows essentially no bias and MERRA-1 shows slight positive biases. Improvements in this regard from MERRA-1 to MERRA-2 are to be expected.
Figure 2 shows an overall comparison between station data and reanalyses in terms of correlations, differences and magnitude of variability quantified by the standard deviation for the albedo and snow depths. On a day-to-day basis MERRA-2 and ERAI-L underestimate average albedo values compared to observations by about 0.1 during MAMJ (Fig. 2a). In contrast, ERAI-LG shows a much smaller average deviation from the station data, with differences close to zero. However, the overall range of the box plot for ERAI-LG is similar to the other two reanalyses, resulting in only slightly smaller absolute deviations from the observations.
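The three per-station statistics compared in Fig. 2 (mean difference, correlation and standard deviation of daily values) can be computed along the following lines. This is a sketch under the assumption that days with a missing value in either series are dropped pairwise; the function name and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def compare_daily(station, reanalysis):
    """Mean difference, Pearson correlation and standard deviations of two daily series."""
    pairs = [(s, r) for s, r in zip(station, reanalysis)
             if not (math.isnan(s) or math.isnan(r))]
    if len(pairs) < 2:
        raise ValueError("need at least two valid day pairs")
    s_vals, r_vals = zip(*pairs)
    s_mean, r_mean = mean(s_vals), mean(r_vals)
    cov = sum((s - s_mean) * (r - r_mean) for s, r in pairs)
    s_ss = sum((s - s_mean) ** 2 for s in s_vals)
    r_ss = sum((r - r_mean) ** 2 for r in r_vals)
    corr = cov / math.sqrt(s_ss * r_ss) if s_ss > 0 and r_ss > 0 else math.nan
    return {
        "mean_difference": r_mean - s_mean,  # reanalysis minus station
        "correlation": corr,
        "std_station": stdev(s_vals),
        "std_reanalysis": stdev(r_vals),
    }


# Hypothetical daily albedo at one station and in the co-located grid cell.
station_albedo = [0.8, 0.6, float("nan"), 0.4, 0.2]
grid_albedo = [0.7, 0.5, 0.9, 0.3, 0.1]
stats = compare_daily(station_albedo, grid_albedo)
```

In this constructed case the grid cell is biased low by 0.1 but perfectly correlated with the station, which shows why the three measures need to be read together.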
For snow depth (Fig. 2b), all three reanalysis data sets show an overestimation of daily values for MAMJ.Interestingly, ERAI-LG shows the largest deviations from observed values, although the grass better represents the conditions at the observational sites.This can be caused by biases in the observations due to surrounding higher vegetation creating a snowfall shadow or negative instrumental biases (Rasmussen et al., 2012).Moreover, positive biases in particular for precipitation can occur in reanalysis products (Brun et al., 2013).
The analysis of daily correlations (Fig. 2c and d) demonstrates that the correlations for the albedo are generally weak among all three experiments, although for some stations the correlation coefficients exceed 0.8. Surprisingly, the correlations between MERRA-2 and station data are highest for albedo and lowest for snow depth. The observed difference between MERRA-2 and the ECMWF experiments regarding the correlation for albedo can likely be explained by the introduction of aerosols (and their respective deposition) in MERRA-2 (see the Supplement for an initial investigation). For snow depth, the correlation values are dominated by snowfall and melting events. Also, in this case, the grass-only experiment shows no improved performance compared to the classic ERAI set-up.
All reanalyses severely underestimate the day-to-day variability of the albedo (Fig. 2e and f).MERRA-2 and ERAI-L show similar means but reach the overall station level only in specific grid cells.A clear improvement is observed in ERAI-LG, which shows the smallest deviation from station estimates.Nevertheless, all modern reanalyses fail to adequately reproduce daily variability in the observed albedo.In contrast, for snow depth the agreement is very good.The mean values of all four products are around 8 to 10 cm, with the grass-only experiment being the closest to the average station variability.
In summary, the box plot analysis (Fig. 2) reveals that there is a general improvement in the agreement between the stations and ERAI-L if vegetation is set to grass only.However, none of the reanalysis products can accurately reproduce day-to-day albedo variability.This is likely explained by the comparison of grid versus point observations, where small-scale variations are averaged out.

Analysis of feedback components
To assess regional patterns of key SAF components, we show their spatial distribution over Russia as revealed by the observations in Fig. 3 (see Supplement Figs. 2-4 for the respective distribution from the reanalysis data).
Strong SNC (Fig. 3a) responses in the station data are observed in southern European Russia and Western Siberia as well as over the Far East. The weaker responses are observed in south-eastern Siberia. TEM (Fig. 3b) follows a similar distribution but is more homogeneously distributed, with the most negative values in central Siberia and towards the Arctic coastline. Snowmelt sensitivity (Fig. 3c) is strongest in the midlatitudinal and subpolar regions north of 50° N, such as Finland to the south-east, west and north of Lake Baikal and along the Pacific coast. Here the temperatures react most strongly to seasonal snowmelt. While there is a broad agreement between the stations and ERAI-LG in this region, stations show a somewhat stronger snowmelt sensitivity (not shown). Snowmelt sensitivity is a key factor for the SNC calculations and thus shapes the spatial variability of SNC.
The other key factor in the SNC calculations is the contrast in albedo between snow-covered and snow-free periods (Fig. 3d). The observed albedo contrast is characterized by a relatively homogeneous pattern with somewhat smaller values in the southern regions, especially over southern Siberia, east of Lake Baikal. In general, a north-south gradient is visible, with similar patterns to the SNC. Mean albedo for spring (Fig. 3e) shows that the highest values are found closer to the Arctic coastline, in central Siberia and towards the western border. Lower mean albedo values are mostly located east of Lake Baikal. This distribution is in general agreement with the reanalysis data sets, especially for the lower values in the south-east.
Finally, since TEM closely follows the general MAMJ snow distribution, we show average snow depth in Fig. 3f. A clear north-south gradient is visible, with hotspots at the Pacific coast and towards the Barents and Kara seas. Moreover, snow depths from stations closely follow the ERAI-L snow depth distribution shown in Fig. 1.
To analyse the differences between the data sets and to put the station data in context, Fig. 4a shows the response for SAF computed for the entire period 2000-2013 and all 44 locations. Stations show much stronger SAF (−2.5 % K⁻¹) compared to MERRA-2 (−1.6 % K⁻¹) and ERAI-L (−1.8 % K⁻¹). At the same time ERAI-LG shows a SAF estimate close to that derived from the station data (−2.8 % K⁻¹). Thus, changing the vegetation to short grass adds an additional 1 % albedo decrease per degree of warming to the feedback process. Further analysis of the two components of SAF (SNC and TEM, Fig. 4b and c) shows that ERAI-LG successfully reproduces the SNC signal derived from the station data (−1.6 % K⁻¹ mean for stations and −1.7 % K⁻¹ mean for ERAI-LG), whereas the other two reanalyses show much weaker SNC values. The lowest value of −0.56 % K⁻¹ was obtained from the MERRA-2 data. In general, SNC responses largely explain differences in SAF (Fig. 4a).
For TEM values (Fig. 4c), all three reanalyses are in good agreement with the observations, with MERRA-2 showing the best agreement. Changing the vegetation to grass in ERA-Interim results in a TEM component that is 0.4-0.5 % K⁻¹ stronger compared to the standard version of ERA-Interim. Given that TEM represents the response to snow metamorphosis, good performance of MERRA-2 is in agreement with findings implied by Fig. 2. However, it is worth noting that for the station network as well as for the ECMWF experiments, locations with positive TEM are calculated. This is due to snow albedo changes being positive in some instances (Fig. 4c).
To further investigate the nature of the SNC and TEM responses, in Fig. 4d we show the results for snowmelt sensitivity, which is one of the two key components in the SNC response (Eq. 1). This component is barely influenced by the underlying vegetation. All three reanalysis data sets agree very well with the station network, with ERAI-LG showing the closest agreement for both mean and median values. This indicates an accurate representation of this relationship in both NASA and ECMWF land-surface modules. Figure 4d implies that the changes in the SNC should stem from the albedo contrast, the second key component expressed as the average difference between albedo values for a complete snow cover and snow-free conditions (Fig. 4e). Indeed, MERRA-2 shows the lowest albedo contrast among all data sets, resulting in very low SNC values. The albedo contrast in ERAI-L is higher than in MERRA-2 but is on average still lower than in the observations, which show average values around 0.35. ERAI-LG shows the strongest albedo contrast, which is twice as large as in the experiment with classic vegetation cover. These striking differences among the data sets mainly drive the SNC results.
Snow albedo is well captured by the grass-only experiment, showing the same average value, around 0.6, as determined from the observations (Fig. 4f). The standard vegetation schemes used in MERRA-2 and ERAI-L reduce the snow albedo in the analysed grid cells to 0.33 and 0.37. The differences in snow albedo between the products are the main driver for the differences in the albedo contrast, since the snow-free albedo values are remarkably similar for all reanalysis products (Fig. 5a). Nevertheless, they strongly deviate from the snow-free albedo determined from the observations, which is roughly twice as large as in the reanalyses, with a mean value of about 0.21, and is very close to albedo values for grass (see e.g. Betts and Ball, 1997; Wei et al., 2001).
To explore the impact of different factors on the TEM estimates, in Fig. 5 we show mean values of temperature, snow cover and albedo, as well as the average change in snow albedo during spring. Also, to underline the crucial role of in situ snow depth information, mean snow depth is shown. Mean station snow depth lies within the range of reanalysis values, with higher values reported by ERAI-LG. Moreover, stations have the lowest snow cover among all data sets (Fig. 5b and c). This difference is likely due to the conversion of snow depth to snow cover as well as to the precision (in centimetres) of the Russian snow depth measurement. The precision of snow depth diagnosed by reanalysis is much finer and the logarithmic conversion here can be performed more accurately. As a result, TEM values diagnosed by stations are probably too low. If we consider instead in situ snow cover information from stations, the average snow cover is quite similar to the reanalyses (ca. 55 %), and the average TEM value strengthens. However, replacing converted snow cover with observed snow cover in Eq. (2) is a questionable procedure, as the remaining terms were computed using snow depth conversion. Thus, for consistency we show lower values of TEM in Fig. 4.
Temperature is well represented by all data sets with MERRA-2 being about 1 K colder than at the stations, which is quite notable for such a robust variable.However, absolute values of temperature do not have a strong impact on the computation of TEM, since monthly changes in temperature affect both TEM and SNC computations.For the ERAI-LG albedo contrast, the effects of the underestimated snow-free albedo and overestimated snow albedo cancel each other out.Finally, the snow albedo change during spring (Fig. 5f) is very similar in station data and in MERRA-2 (−0.09 average in both data sets), which points towards an adequate representation of snow metamorphosis and aerosol deposition in MERRA-2.The ERAI-LG experiment shows a stronger change in snow albedo during spring than the standard version.ERAI-L potentially keeps the temperature and therefore snow metamorphosis more constant throughout spring due to a more stable local temperature climate induced by the vegetation.Note also that some stations show an increase in snow albedo during spring.This can be caused by fresh snow accumulation in late spring in some locations.
For snowmelt sensitivity (Fig. 6c) the agreement among the data sets is very good when it comes to magnitude and interannual variability, with MERRA-2 showing an amplified interannual variability (up to 1.5 % K⁻¹), which is beyond the magnitudes observed at the stations. As already noted above, snowmelt sensitivity seems to be a rather well-reproduced process in modern reanalyses. Since snow-free albedo is quite constant over time in the reanalyses, the albedo contrast is dominated by the snow albedo (Fig. 6d). ERAI-LG and the station network agree very well on the magnitude of snow albedo, whereas ERAI-L and MERRA-2 fail to reproduce such high values. Magnitudes of interannual variability can reach up to ±0.05 in stations, with slightly weaker responses in reanalyses. The correlation between stations and reanalyses is rather low: only individual years are captured correctly by ERAI-LG (see Supplement for correlation values). Snow albedo change within spring (Fig. 6e) is well captured by MERRA-2 and ERAI-LG. Furthermore, ERAI-LG captures the interannual variability well for this metric. Specifically, variability during the 2001-2004 and 2005-2008 periods is quite well represented. In contrast, ERAI-L seems to lack consistency with observations. Finally, as mentioned in Sect. 4.1, snow depth variability (Fig. 6f) is very well captured by all reanalyses. Again, ERAI-LG overestimates snow depth by up to 5 cm, with the other two reanalyses being on average 1-2 cm above the station values.
To further demonstrate the effect of the vegetation changes on the ERA-Interim/Land reanalysis, Fig. 7 shows differences between ERAI-L and ERAI-LG. The structure follows Fig. 6, with SNC and TEM shown in Fig. 7a and b. As is clearly visible, both variables are generally less negative in ERAI-L, a fact already known from the time series and box plot analysis. The largest impact of the vegetation changes is found for northern Russia, the Pacific coast and the western region between the Black Sea and the Caspian Sea. Interestingly, but as expected, snowmelt sensitivity (Fig. 7c) is not the key driver behind this distribution. Since snowmelt sensitivity is not directly linked to vegetation changes, the anomaly distribution is very heterogeneous, with positive and negative anomalies over the whole domain. As known from the time series plot, snowmelt sensitivity in ERAI-LG is overall slightly weaker than in ERAI-L, probably due to positive feedbacks such as reduction of night-time cooling over higher vegetation types. The main driver behind the distribution of SNC is the albedo contrast (Fig. 7d). The albedo contrast is higher overall in ERAI-LG, especially along the borders of the domain, which are already highlighted for SNC.

Discussion
We compared spring SAF and its components determined from in situ measurements over Russia for the period 2000-2013 with data derived from three modern reanalysis products restricted to the grid cells including the observational sites. This was achieved by using a unique collection of station measurements of radiation and snow characteristics, investigating observed SAF for the first time over this broad spatial and temporal domain. Besides ERAI-L we used a customized version of ERAI-L (ERAI-LG) in which vegetation was set to grass in all concerned grid cells. All three reanalysis data sets are completely independent from the analysed station data. While a direct comparison of point measurements with grid cell output always introduces uncertainties due to the spatial variability of the surface, this is for now the only way to evaluate reanalysis data using in situ observations. An alternative option would be satellite data, which come with their own uncertainties (e.g. Romanov et al., 2002; Foster et al., 2005; Wang et al., 2014).
Snow depth statistics derived from daily station data are reasonably well reproduced in all three modern reanalyses, which is in agreement with Wegmann et al. (2017), who investigated April snow depth in ERAI-L.While snow depth differences between ERAI-L and ERAI-LG are small, ERAI-LG shows slightly higher deviations from the station data than ERAI-L, which might be caused by the higher vegetation in the station surroundings and by an underestimation of snowfall due to instrumentation used at the Russian station network (Rasmussen et al., 2012).
Day-to-day variability of albedo is notably higher in station data than in any reanalysis product. Besides spatial averaging over the reanalysis grid cells, this is potentially caused by weather-driven land-surface changes (e.g. soil moisture change, aerosol deposition), which are not represented in the reanalyses. However, ERAI-LG shows increased albedo variability, nearly doubling the standard deviations diagnosed by ERAI-L with the standard vegetation scheme.
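The damping effect of spatial averaging on day-to-day variability can be illustrated with a minimal sketch. The albedo values below are hypothetical and purely illustrative (they are not taken from the station network or from any reanalysis), and the function names are introduced only for this example. The sketch also assumes that a grid cell behaves like the plain mean of a few sub-grid points, which simplifies how a land-surface scheme actually aggregates surface types.

```python
from statistics import mean, pstdev

def grid_cell_series(point_series):
    """Average several equally long daily point series into one grid-cell series."""
    return [mean(day_values) for day_values in zip(*point_series)]

def day_to_day_std(series):
    """Population standard deviation of a daily series."""
    return pstdev(series)

# Hypothetical daily albedo at three sub-grid points (illustrative values only).
# Two points fluctuate out of phase, the third is constant.
points = [
    [0.80, 0.60, 0.80, 0.60],
    [0.60, 0.80, 0.60, 0.80],
    [0.70, 0.70, 0.70, 0.70],
]

cell = grid_cell_series(points)
point_stds = [day_to_day_std(p) for p in points]
cell_std = day_to_day_std(cell)

print("point standard deviations:", point_stds)
print("grid-cell standard deviation:", cell_std)
```

In this deliberately extreme case the out-of-phase fluctuations cancel almost completely in the cell mean. More generally, the standard deviation of an average cannot exceed the average of the individual standard deviations, so some damping is expected whenever sub-grid fluctuations are not perfectly in phase.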
The limitations of the station data imply some constraints for comparisons with reanalysed data. As near-surface temperature is unavailable in station data, we used 2 m air temperature for both stations and reanalyses, which reduces the strength of the SAF. Moreover, using local 2 m air temperature first and then averaging over our domain leads to lower SAF values than if we had used NH-averaged 2 m air temperature. Since albedo changes at our stations are much more dramatic (due to the WMO conditions) than in model or satellite grid cells, using geographically smoothed temperature data would eventually lead to a much stronger impact of albedo changes on temperature changes. Thus, our results are not to be seen as a Northern Hemisphere impact analysis but rather as a contribution to reanalysis improvement and the investigation of SAF evolution.
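The effect of the temperature choice can be sketched as follows. The example assumes that SAF is estimated as the change in albedo per unit change in temperature, as described in the Introduction. It compares (i) normalising by the local temperature change at each station and then averaging with (ii) normalising the domain-mean albedo change by a single, spatially smoothed temperature change. All numbers and function names are hypothetical, and the choice of a smoothed temperature change that is smaller than the local ones is an assumption made for illustration only.

```python
from statistics import mean

def saf_local_then_average(d_albedo, d_temp_local):
    """Ratio per station first, then average over stations (% per K)."""
    return mean(da / dt for da, dt in zip(d_albedo, d_temp_local))

def saf_large_scale(d_albedo, d_temp_large_scale):
    """Domain-mean albedo change divided by one large-scale temperature change."""
    return mean(d_albedo) / d_temp_large_scale

# Hypothetical spring albedo changes (%) and local warming (K) at three stations.
d_albedo = [-30.0, -40.0, -50.0]
d_temp_local = [10.0, 16.0, 20.0]
# Hypothetical, weaker warming of a spatially smoothed temperature series (K).
d_temp_large_scale = 8.0

local = saf_local_then_average(d_albedo, d_temp_local)   # about -2.67 % per K
smooth = saf_large_scale(d_albedo, d_temp_large_scale)   # -5.0 % per K

print(f"local-then-average: {local:.2f} % per K")
print(f"large-scale temperature: {smooth:.2f} % per K")
```

Under these assumed inputs the same albedo changes yield a feedback estimate of larger magnitude when a smoother temperature series is used, which is the direction of the effect described above. The size of the difference depends entirely on the chosen numbers.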
Secondly, snow cover is underestimated in station data due to the measurement precision of 1 cm, which reduces the strength of the TEM component. The snow albedo and the snow-free albedo are substantially higher in station data than in the reanalyses with classic vegetation boundary conditions (MERRA-2 and ERAI-L). Compared to other observation-based studies, spring snow albedo and grass albedo derived from our station network are quite realistic (Roesch et al., 2009; Stroeve et al., 2006). Thus, the difference revealed by reanalyses is likely due to averaging over grid cells.
Results from ERAI-LG clearly demonstrate that SAF and its components are very close to those in the station data. The largest improvement was found for the albedo contrast and for snow albedo, which both are more realistic in ERAI-LG. At the same time snow-free albedo in all three reanalyses (including ERAI-LG) was found to be lower than in the station data, because snow-free albedo in all reanalysis data sets is prescribed as a monthly climatology from MODIS data. As MODIS mostly registers albedo from taiga and tundra vegetation, a stark difference to the grass albedo from the stations occurs.
MERRA-2 shows the lowest SAF values resulting from a very low albedo contrast, which is probably a consequence of the vegetation scheme in the MERRA-2 land module. In contrast, MERRA-2 represents TEM reasonably well, most likely due to the accurate representation of the intra-seasonal snow albedo changes. Thus, relative snowpack changes appear to be well represented in MERRA-2, probably also due to a more accurate representation of aerosols.
In general, we found higher SAF values in ERAI-L than in the recent CMIP3/CMIP5 analyses of NH SAF by Fletcher et al. (2015). This disagreement results from a variety of factors. First, our domain is limited to Russia only, thus excluding considerable parts of Eurasia as well as North America. In this respect our domain is set within a high SAF region, which may explain the higher SAF values compared to the NH average by Fletcher et al. (2015). In contrast, MERRA-2 shows good agreement with the NH CMIP3/CMIP5 SAF results, but mostly because the albedo contrast is very low. Furthermore, as we pointed out above, in situ observations used here tend to slightly overestimate SAF, mainly due to higher snow albedo values. This is because in situ snow albedo is typically measured by a sensor installed over a vegetation-free snowpack. The vegetation scheme used in reanalyses gives lower snow albedo values, implying realistic vegetation cover such as taiga or tundra. However, our MERRA-2 results agree fairly well with the findings of Fletcher et al. (2015). Moreover, mean values of the albedo-independent variable snowmelt sensitivity are very close to the "observational" snowmelt sensitivity computed by Fletcher et al. (2015).
We also found agreement with Fletcher et al. (2015) in the representation of the spatial pattern of the SAF components. Fletcher et al. (2015) as well as Fernandes et al. (2009) have shown maxima in SAF over northern Canada, northern Siberia and south-western Eurasia. The relation of 60 : 40 between SNC and TEM, which is found in modelled, satellite and reanalysis data, was replicated by our station network. We found similar spatial patterns for SAF and its components in both stations and gridded data specifically for southern Russia, while the pattern of station responses is less homogeneous than the gridded data. Also consistent with Fletcher et al. (2015), we found higher snowmelt sensitivity north of 50° N. Finally, the albedo contrast distribution, which closely follows the snow albedo pattern, is in very good agreement with the gridded analysis of snow albedo by Fletcher et al. (2015).
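How such a ratio between SNC and TEM is obtained can be sketched under an additive decomposition of the net feedback, NET = SNC + TEM. Here SNC is taken as snowmelt sensitivity times albedo contrast, and TEM as mean snow cover fraction times the snow albedo change per kelvin. This form is an assumption based on the decomposition commonly used in the SAF literature, and it may differ in detail from the formulation in the methods section. The input values are hypothetical and were chosen only so that the result is a round 60 : 40 split; they are not measured values.

```python
def saf_components(snowmelt_sensitivity, albedo_contrast,
                   mean_snow_cover, snow_albedo_change_per_k):
    """Return (SNC, TEM, NET) under the assumed additive decomposition."""
    snc = snowmelt_sensitivity * albedo_contrast      # snow cover component
    tem = mean_snow_cover * snow_albedo_change_per_k  # snow albedo (temperature) component
    return snc, tem, snc + tem

def shares(snc, tem):
    """Fractional contribution of each component to the net feedback."""
    net = snc + tem
    return snc / net, tem / net

# Hypothetical inputs (illustrative only):
# snow cover change of -1.5 % per K, albedo contrast of 0.4,
# mean snow cover fraction of 0.5, snow albedo change of -0.8 % per K.
snc, tem, net = saf_components(-1.5, 0.4, 0.5, -0.8)
snc_share, tem_share = shares(snc, tem)

print(f"SNC = {snc:.2f}, TEM = {tem:.2f}, NET = {net:.2f} (% per K)")
print(f"SNC share = {snc_share:.0%}, TEM share = {tem_share:.0%}")
```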

Conclusions
Reanalyses including land-surface modules show a physically consistent representation of SAF with realistic spatial patterns and area-averaged sensitivity estimates. ERAI-LG shows a better performance in representing station-based estimates considering the uncertainty associated with "point to grid cell" comparisons. Accounting for aerosol-related processes would likely improve this performance in future reanalysis releases. Thus, for the analysis and validation of large-scale temporal and spatial averages of SAF, modern reanalyses seem to be an appropriate tool. However, to analyse processes on smaller scales and at high temporal resolution, a healthy, dense station network is required. The idealized ERAI-LG simulation also highlights the caveats of comparing in situ observations with gridded model data. In this study, we show these discrepancies in terms of albedo and snow depth. Other variables, in particular 2 m temperature, can be expected to have a similar signal arising from the differences between the model's grid cell land cover and the actual station conditions. Our findings show that the experimental approach in ERAI-LG allows for enhanced use of in situ observations to diagnose the SAF in non-forested areas.
Considering future studies, the extension to other regions and use of other regional in situ data might give further insights into regional hotspots of SAF. Cross-validation efforts employing model, reanalysis, satellite and station data may help to generate blended products to investigate radiation and albedo feedbacks in the changing Arctic, a region where SAF is especially strong. Regional modelling, including a variety of multilayer land-surface models over areas with a relatively dense observation network, can provide a quantitative estimation of uncertainties among complex variables such as snow depth, albedo or SAF.
Competing interests. The authors declare that they have no conflict of interest.

Figure 1. Station location and snow depth (cm) for the 2000-2013 MAMJ average taken from ERAI-L. Red-coloured stations are excluded by the land-sea mask of ERAI-L.

Figure 2. Box plot analysis for daily albedo (a, c, e) and snow depth (b, d, f) estimates using data from 44 locations over the 2000-2013 MAMJ period. (a, b) Difference between station and reanalysis, (c, d) linear correlation between station and reanalysis, (e, f) standard deviation. Triangle indicates the mean value.

Figure 5. Box plot analysis for MAMJ 2000-2013: (a) snow-free albedo, (b) snow cover fraction, where the light grey box plot is the originally observed snow cover from stations, (c) snow depth, (d) 2 m temperature, (e) mean albedo and (f) snow albedo change within the season. Triangle indicates the mean value.