Diagnosing the decline in climatic mass balance of glaciers in Svalbard over 1957-2014

Estimating the longterm mass balance of the High Arctic Svalbard archipelago is difficult due to the incomplete geodetic and direct glaciological measurements, both in space and time. To close these gaps, we use a coupled surface energy balance and snow pack model to analyze the mass changes of all Svalbard glaciers for the period 1957-2014. The model is forced by ERA-40 and ERA-Interim reanalysis data downscaled to 1 km resolution. The model is validated using snow/ firn temperature and density measurements, mass balance from stakes and ice cores, meteorological measurements, snow depths 5 from radar profiles and remotely sensed surface albedo and skin temperatures. Overall model performance is good, but varies regionally. Over the entire period the model yields a climatic mass balance of 8.2 cmw.e. yr which corresponds to a mass input of 175 Gt. Climatic mass balance has a linear trend of -1.4±0.4 cmw.e. yr with a shift from a positive to a negative regime around 1980. Modeled mass balance exhibits large interannual variability, which is controlled by summer temperatures and further amplified by the albedo feedback. For the recent period 2004-13 climatic mass balance was -21 cmw.e. yr, and 10 accounting for frontal ablation estimated by Błaszczyk et al. (2009) yields a total Svalbard mass balance of -39 cmw.e. yr for this 10-year period. In terms of eustatic sea level, this corresponds to a rise of 0.037 mmyr. Refreezing of water in snow and firn is substantial at 22 cmw.e. yr, or 26 % of total annual accumulation. However, as warming leads to reduced firn area over the period, refreezing decreases both absolutely and relative to the total accumulation. Negative mass balance and elevated equilibrium line altitudes (ELAs) resulted in massive reduction of the thick (>2m) firn extent and an increase in the 15 superimposed ice, thin (<2m) firn and bare ice extents. Atmospheric warming also leads to a marked change in the thermal regime, with cooling of the glacier mid-elevation, and warming in the ablation zone and upper firn areas. On the longterm, by removing the thermal barrier, this warming has implications for the vertical transfer of surface meltwater through the glacier and down to the base influencing basal hydrology, sliding, and thereby overall glacier motion.


Introduction
Glaciers are widely acknowledged as good indicators of climate change (e.g. AMAP, 2011), but the relationship between atmosphere, surface energy balance and glacier mass balance is complex. Glaciers and ice caps are currently among the major contributors to current sea level rise (Church et al., 2011), despite their relative small volume compared to the ice sheets of Greenland and Antarctica, and are assumed to be important throughout the 21st century (Meier et al., 2007). The 5 High Arctic archipelago Svalbard has an estimated total eustatic sea level rise potential of 17-26 mm (Martín-Español et al., 2015;Huss and Farinotti, 2012;Radić and Hock, 2010). Global glacier mass balance assessments suggest that Svalbard is one of the most important regional contributors to sea level rise apart from Greenland and Antarctica over the 21st century (Giesen and Oerlemans, 2013;Marzeion et al., 2012;Radić et al., 2014), due to its location in one of the fastest warming regions on Earth. 10 Through feedbacks in the climate system, the Arctic region experiences a greater warming than the global average, the so-called Arctic Amplification (e.g. Serreze and Francis, 2006). For the moderate emission scenario RCP4.5, Svalbard has a predicted warming of 5-8 • C and a precipitation increase of 20-40 % by 2100 relative to the period 1986(IPCC, 2013. Since the 1960s, there has been a strong warming of 0.5 • C decade −1 in Svalbard, the strongest warming measured in Europe (Nordli et al., 2014). Simultaneously there was a precipitation increase of 1.7 % decade −1 15 2000). Steady negative glacier mass balance has been recorded since glaciological measurements began in 1967. However, direct measurements are mostly restricted to glaciers along the western coast of Svalbard, and are known to have more negative mass balance than the rest of the archipelago (Hagen et al., 2003a). Climatic mass balance (B clim ), the sum of surface mass balance and internal accumulation , as derived from ice cores over 1960(Pinglot et al., 1999, 2001Pohjola et al., 2002) and modeled balances for the period 1979-2013 (Lang et al., 2015a) show no trends. van Pelt et al. 20 (2016) find a weakly positive precipitation trend, with strongest changes observed to the north of the archipelago. In contrast, geodetic mass balance studies indicate accelerated glacier mass loss over the last decades (James et al., 2012;Kohler et al., 2007;Nuth et al., 2007). There are multiple causes for this apparent disagreement. Geodetic approaches include all components of the mass budget, i.e. the climatic balance and mass losses through calving and submarine melting at tidewater glacier termini. In addition, ice cores are taken in the accumulation area, while trends in the ablation area may differ; the latter have 25 been shown to have a substantial effect on the glacier-wide climatic balances in Svalbard (e.g. Aas et al., 2016;Van Pelt et al., 2012). Meteorologically driven mass balance modeling facilitates filling of these spatial and temporal gaps, however care must be taken to adequately represent spatial and temporal scales of relevant processes. For instance, the coarse spatial 10-km grid used by Lang et al. (2015a) does not well represent the glacier hypsometry at lower elevations thereby influencing the results.
Here we present results from a model study that covers the entire archipelago for the period 1957−2014 at high spatial 30 and temporal resolution. At time steps of 6 hours, we calculate the mass and energy fluxes at the glacier surface and in the subsurface layers using the model DEBAM (Distributed Energy Balance Model) developed by Hock and Holmgren (2005) and Reijmer and Hock (2008). We downscale ERA-40 (Uppala et al., 2005) and ERA-Interim (Dee et al., 2011) climate reanalysis data to 1-km horizontal resolution largely following the TopoSCALE methodology (Fiddes and Gruber, 2014), except for precipitation, where we use the Linear Theory (LT) for orographic enhancement (Smith and Barstad, 2004). We conduct a thorough comparison with a large number and different types of observations to validate model performance. From our model results we identify climatic controls on the climatic mass balance over the study period and discuss implications for the future by testing sensitivities and apply perturbations represinting a 2100 climate as suggested by Førland et al. (2011). We also examine modeled responses of the water retention capacity in a warmer climate and discuss related implications for B clim and 5 ice dynamics through changes in the hydrological and thermal regimes.

Svalbard climate and target glaciers
The Svalbard archipelago is located in the Norwegian Arctic between 75-81 • N ( Figure 1). The land area of the islands is ∼60 000 km 2 of which 57 % is covered by glaciers . While the western side of the archipelago is characterized by alpine topography, the eastern side has less rugged topography and many low altitude ice caps. Through the Norwegian 10 current, an extension of the Gulf stream, warm Atlantic water is advected northwards keeping the western side of Svalbard mostly ice-free year-round (Walczowski and Piechura, 2011). In contrast, the ocean east of Svalbard is dominated by Arctic ocean currents (Loeng, 1991). Similarly contrasting regimes are found in the atmosphere, where warm and moist air is associated with southerly flow, while colder and drier air masses originate from the northeast (Kaesmacher and Schneider, 2011).
These oceanic and atmospheric circulation patterns combined with the fluctuating sea ice edge cause large temporal and spatial 15 gradients of temperature and precipitation across the archipelago (Hisdal, 1998). Therefore, Svalbard has been identified as one of the most climatically sensitive regions of the world (Rogers et al., 2005).
The climate of Svalbard is polar maritime, with both rain or snowfall possible in all months of the year. At the main settlement Longyearbyen, mean annual air temperature for the normal period 1961-1990 is -6.7 • C. A warming trend of 2.6 • C century −1 has been identified from the 117 year long Svalbard Airport temperature record (Nordli et al., 2014); Supple-20 ment (Figure S1-S2). Although a positive trend exists for all seasons, the annual trend is dominated by an increase in winter temperatures. Increased air temperatures and precipitation have occurred simultaneously with reduced sea ice cover around Svalbard (Rodrigues, 2008).
Annual precipitation in Longyearbyen is 190 mm and has increased by 2.5 % decade −1 over the last 80 years (Førland et al., 1997;Hanssen-Bauer and Førland, 1998), although precipitation gauge undercatch complicate trend analysis (Førland et al., 25 1997;Hanssen-Bauer and Førland, 1998;Førland and Hanssen-Bauer, 2003). Large precipitation variability is observed across the archipelago, with about three times higher precipitation along the west coast compared to Longyearbyen (Førland and Hanssen-Bauer, 2003) and even more in southern Spitsbergen Winther et al., 2003). The drier central Spitsbergen has less extensive glacier coverage and is characterized by land-terminating cirque and valley glaciers.
Surface mass balance is measured at stakes along the center profiles of five glaciers (see Figure 1). Etonbreen (∼640 km 2 ) 30 is the largest of these glaciers, and drains gently westwards from the summit of the Austfonna ice cap. Surface mass balance has been monitored at Austfonna since 2004, but the stakes along Etonbreen are the only ones which have been measured every year. Glacier-wide specific surface mass balance has been close to zero, but margin retreat since the last surge in the 1930s makes the overall mass balance negative. Kongsvegen (∼100 km 2 ) and Holtedahlfonna (∼385 km 2 ) are situated in northwest Spitsbergen with mass balances measurements since 1987 and 2003, respectively. The ice field Holtedahlfonna feeds Kronebreen, a steady fast-flowing glacier, while Kongsvegen is nearly stagnant, since it is in the quiescent phase after a surge in 1948 (Kääb et al., 2005). Kongsvegen and Kronebreen merge downstream and enter Kongsfjorden together. Despite the proximity the two glaciers, Kongsvegen has higher specific accumulation rates than Holtedahlfonna, but because of their 5 distinct hypsometries, glacier-wide surface mass balance on Kongsvegen (-4 cm w.e. yr −1 ) is slightly more negative than Holtedahlfonna (-2 cm w.e. yr −1 ) over 1966(Nuth et al., 2012. However, calving and marginal retreat of Kronebreen make the overall mass balance of the combined Kronebreen and Holtedahlfonna system strongly negative. Nordenskiöldbreen (∼200 km 2 ) is a valley glacier located in central Spitsbergen, flowing southwest from the Lomonosovfonna ice field to enter the Adolfbukta fjord. Mass balance has been measured since 2006 and is negative (Van Pelt et al., 2012). Hansbreen (∼56 km 2 ) 10 is a marine-terminating glacier, where mass balance has been measured since 1989. Surface mass balance has been negative by -28 cm w.e. yr −1 since 1989, and wind redistribution of accumulation is important on Hansbreen (Grabiec et al., 2006(Grabiec et al., , 2012.

Topography and glacier masks
The 1-km resolution digital elevation model (DEM) used in this study, was resampled from a 90-m DEM by Nuth et al. 15 (2007). Slope and aspect were computed following Zevenbergen and Thorne (1987). Fractional glacier masks were created by computing the percentage of glacier coverage for each grid point of the 1 km DEM for three periods (1930-60s, 1990s, and 2000s) based on a multi-temporal inventory . However, only the latest DEM has a complete coverage over all of Svalbard ( Figure S3). We created a fourth fractional glacier mask that combines the masks from the three periods so that each grid cell contains the largest glacier extent of any of these periods (henceforth referred to as the reference glacier 20 mask). Finally, an annual time series of glacier masks was created by linear interpolation, thus assuming linear glacier retreat or advance between the epochs and no changes after the 2000s epoch ( Figure S4). Small discrepancies are introduced by converting glacier polygons to the 1 km grid. While the glacier cover in the 2000s inventory is 33,775 km 2 the fractional glacier mask is 14 km 2 (0.04%) larger. The reference glacier mask covers 36,943 km 2 and is 10 % larger than the 2000s inventory. Figure 2 shows the hypsometry of the glacierized area of each region and of all 25 Svalbard regions combined. The error (area difference in each elevation bin between 90-m and 1-km DEM) is generally low, but glacier area in the 1-km DEM is slightly underestimated below 150 m a.s.l.

Downscaled ERA-40 and ERA-Interim climate reanalysis data
The glacier model is forced by fields of downscaled near-surface air temperature, relative humidity, wind and downwelling shortwave and longwave radiation from the ERA-40 and ERA-Interim reanalyses of the European Centre for Medium-Range 30 Weather Forecasts (Uppala et al., 2005;Dee et al., 2011). The reanalysis data is provided at 6-hour intervals on a 0.  Precipitation is often heavily biased in coarsely-resolved reanalyses, especially in environments with pronounced topography, where it typically is too low and lacks spatial detail (Schuler et al., 2008). This is associated with the smoothed representation of the actual topography in the large-scale model used for the reanalysis (Figure 3), leading to an underestimate of 5 orographic precipitation enhancement. We assume that this is the main reason for the poor performance of reanalyzed precipitation, and use instead a linear theory (LT) of orographic precipitation (Smith and Barstad, 2004) to account for orographic enhancement when downscaling ERA-precipitation to our 1 km resolution model domain. The LT-model describes the motion of an air parcel, characterized by its temperature, stability, wind direction and speed. Terrain induced uplift of the air parcel results in condensation and eventually precipitation of moisture further downstream of the uplift. This model has been success-10 fully evaluated using precipitation gauges (Barstad and Smith, 2005) and snow measurements (Schuler et al., 2008) and applied for downscaling precipitation (e.g. Crochet et al., 2007). To discriminate solid from liquid precipitation, a simple thresholding approach was used, assuming that all precipitation is liquid at temperatures above 2.5 • C, all is solid at temperatures below 0.5 • C, with a linear transition between.
The other required climate variables are downscaled to the 1 km grid using the TopoSCALE methodology (Fiddes and Gruber,15 2014). TopoSCALE exploits the relatively high vertical resolution of the reanalysis data, since downscaled variables at the actual topography are based on the properties of the vertical structure in the reanalysis. The downscaled fields preserve the horizontal gradients present in ERA, but include additional features caused by the real topography not present in ERA (Figure 4).
This approach is assumed to outperform simpler bias corrections, since transient properties of the atmosphere are accounted for. For example, transient lapse rates including inversions in the reanalysis data will be preserved in the downscaled product. 20 We modify the TopoSCALE methodology regarding downscaling of direct shortwave radiation and air temperature. For direct solar radiation we apply the relationship of Kumar et al. (1997) for atmospheric attenuation rather than the one given in Fiddes and Gruber (2014). Solar geometry variables such as solar zenith and azimuth, and topographic shading due to local slope and aspect are calculated following Reda and Andreas (2004). Cast shadow and hemispherical obstructions caused by surrounding topography are calculated following Ratti (2001). 5 During summer when air temperatures (T air ) are above freezing, the TopoSCALE method resulted in too high (low) T air values wherever the actual topography is above (below) the ERA topography. During summer, surface temperatures are restricted to the melting point, even as air temperatures aloft are warmer. This near-surface inversion gives rise to erroneous extrapolation when used to scale temperatures for a large vertical distance. To avoid this problem, we use ERA 2-m T air in the downscaling under these conditions, implying that vertical lapse rates vanish whenever melt occurred in the reanalysis.

Validation of climate input
Downscaled variables are compared to observations at the meteorological stations listed in Table 1   In general the agreement is good between downscaled ERA and observed air temperatures, with biases mostly below 1.5 K (Tab. 2). Despite a small bias for mean annual temperatures, there is a clear seasonal bias, with ERA temperatures too warm during winter and too cold during summer ( Figure 5). Although the biases in Figure 5 are negative during summer, ERA is too warm over the glaciers during summer, when 2-m air temperatures are above freezing.
At Svalbard Airport, the performances of downscaled ERA-40 and ERA-Interim are investigated for the entire model period.

5
Over 1957-1979 only monthly measured temperatures are available at Svalbard Airport, where downscaled ERA-40 has a monthly root mean square error (RMSE) of 2.3 • C. For the 1979-2002 period the reanalysis products overlap with monthly RMSE of 1.8 • C and 1.5 • C at Svalbard Airport for ERA-40 and ERA-Interim, respectively. We attribute the lower performance prior to 1979 to the lack of satellite observations to constrain sea surface temperatures and sea ice cover in the reanalysis. Since the Svalbard Airport temperature record and other sites at the west coast are likely incorporated into the reanalysis, the quality 10 of the reanalysis in the pre-satellite era is possibly even lower in remote areas with no observations. The annual observed air temperature trend for the period 1957-2013 at Svalbard Airport is 0.70±0.22 • C decade −1 , while the downscaled ERA data has an insignificantly lower warming trend of 0.67±0.19 • C decade −1 at Svalbard Airport.
Downwelling shortwave and longwave radiation are compared to measurements at Etonbreen  and the Baseline Surface Radiation Network site in Ny-Ålesund (Maturilli et al., 2013). In Ny-Ålesund the model largely reproduces 15 observations both for short and longwave radiation. During winter, downwelling longwave radiation is slightly underestimated, while there is no bias during summer. Since there is no temperature bias in Ny-Ålesund during winter, the underestimation of longwave radiation is indicative of a too thin cloud cover in the reanalysis. Representations of clouds are among the major issues Table 1. Meteorological stations used for validation of the downscaled ERA-Interim reanalysis. N indicates the number of daily averages used in the validation, subscript "T" refers to 2 m air temperature, "RH" is relative humidity, "ws" is wind speed, "L,S" is downwelling short and long wave radiation and "prec" is precipitation. Also given is the elevation of the observation site (Z) and the closest grid point (ZDEM). of the reanalysis (Aas et al., 2016). Downwelling shortwave radiation is overestimated by 7 W m −2 over the summer season in Ny-Ålesund. There is a much better agreement with radiation observations in Ny-Ålesund than in northeastern Svalbard. This is to be expected, since radio soundings and other observation data from Ny-Ålesund are assimilated into ERA-Interim.
Therefore, cloud cover at Ny-Ålesund is much better represented by the reanalysis than at Austfonna.
On Etonbreen during summer, downwelling shortwave radiation is underestimated by 40 W m −2 while downwelling long-5 wave radiation is overestimated by 12 W m −2 , indicative of a too thick atmosphere or too many clouds in the reanalysis.
However, these biases could be partly explained by measurement uncertainty caused by rime on the sensor or by sensor tilt.
The latter issue is caused by melt and deformation of the foundation of the autonomous weather stations causing sensor tilt, changed hemispherical view and thereby errors, especially at large solar zenith angles (Bogren et al., 2015).
Wind speeds are reproduced reasonably well, including the seasonal cycle. Biases are within ±1.5 m s −1 with no clear 10 seasonal trend. It is likely that the biases are caused by site specific effects, such as deceleration of air flow in the lee of a topographic obstacle or acceleration due to channelizing through valleys.
For relative humidity the reanalysis represents the seasonality well, and in late summer both the humidity and the biases have largest magnitude. At the two coastal stations at Hopen and Rijpfjorden, the downscaled reanalysis is too dry whereas it is too humid at the two higher elevation stations. The coarse land mask of the reanalysis and the poor representation of sea ice are most likely the main causes for these biases.
Downscaled precipitation is overestimated by 5 to 25 mm per month at the weather stations, with a slightly higher bias during winter. These biases are partly caused by measurement undercatch, which is reported to be 50 % at Svalbard on an annual basis, with higher undercatch during winter than during summer (Førland and Hanssen-Bauer, 2000). algorithm (Wan and Dozier, 1996;Wan, 2008). Cloudy satellite scenes are masked out using the MODIS cloud mask products MOD35_L2 and MYD35_L2 (Ackerman et al., 1998;Frey et al., 2008). Under cloud-free conditions Terra and Aqua combined provide four estimates of T surf per day at 1000-m resolution. Automatically generated quality control flags represent the confidence level of the produced T surf . As suggested by Østby et al. (2014), observations flagged as "other quality" and "T surf error<3" have been excluded. MODIS T surf usually has an accuracy better than 1 K (Wan, 2014). However, a much lower ac-15 curacy is found for snow and ice surfaces due to the ambiguous cloud detection caused by the spectral similarities of snow and Table 2. Seasonal biases in meteorological variables (downscaled minus observational averages) at all observation sites averaged over each site's observation period (Table 1). Shown are air temperature T, relative humidity, RH, wind speed, WS, shortwave radiation, S ↓ , longwave radiation, L ↓ and precipitation, P. Column headings S and W denote summer (Jun-Aug) and winter (Sep-May), respectively.
Positive numbers indicate that the model results are larger than the observations. The second row at each site is the bias between the downscaled variables and the coarse ERA raw data. (Hall et al., 2004(Hall et al., , 2008Scambos et al., 2006;Østby et al., 2014). For Svalbard Østby et al. (2014) found a RMSE=5.0 K and bias of -3.0 K. Figure S5 in the supplement show average T surf and minimum albedo from MODIS.

MODIS albedo
Similar to the MODIS surface temperatures, daily satellite-derived albedo is provided by the snow cover products MOD10A1 and MYD10A1, at 500-m spatial resolution (Hall and Riggs, 2007). To minimize possible errors in the satellite albedo, acquisition during days of noon solar zenith above 70 • were excluded (Schaaf et al., 2011). Mean daily albedo was only calculated if both Terra and Aqua had observation flagged "good" by the internal quality check. Observations were also discarded if the 5 albedo difference between Aqua and Terra exceeded 0.1, since such rapid and large albedo change can only occur during snowfall, when clouds should anyway preclude acquisition as they are opaque in the visible and thermal regions of the spectrum.
Given these criteria, the satellite-derived albedo yielded RMSE=0.08, mean bias=-0.005 and R 2 =0.72 compared to the noon time albedo measured at Etonbreen.

Model description 10
A surface energy balance model coupled to a snow pack model was used to calculate surface energy fluxes, mass balance, water retention and snow and ice properties. Only the main features will be described here; details can be found in Reijmer and Hock (2008); Hock and Holmgren (2005); Østby et al. (2013). Our model set-up employs slightly different parameterizations for albedo, thermal conductivity, and runoff, compared to the original model.

Surface energy balance 15
The energy balance at the glacier surface is given by: where net radiation is Q N = S ↓ (1−α)+L ↓ +L ↑ . Downwelling short and longwave radiation fluxes are taken from downscaled ERA reanalysis data. Turbulent fluxes of sensible heat (Q H ) and latent heat (Q L ) are calculated from the Monin-Obukhovtheory using downscaled humidity, wind speed and air temperatures at screen level (2 m). Roughness lengths of momentum 20 for snow and ice surfaces are determined through calibration (Section 4.4), while roughness lengths for heat and vapor are calculated according to Andreas (1987). Sensible heat supplied by rain water (Q R ) is derived from the rainfall rate assuming that the hydrometeors have the same temperature as the surrounding air. Q G is the energy exchange with the subsurface layer (Section 4.2). Q M is the energy flux used for melting snow and ice. The sign convention is such that fluxes directed towards the surface carry a positive sign and vice versa.

Albedo
Since shortwave radiation is generally the most important energy supply for melt on Arctic glaciers (e.g. Arendt, 1999;Arnold et al., 2006;Østby et al., 2013), we carefully constructed an albedo parameterization similar to the one by Bougamont and Bamber (2005). Albedo is set to a maximum value during snowfall. Snow aging induces exponential albedo decay with different temperature dependent time scales for wet and dry snow. In case of a thin snow cover, albedo is reduced from snow albedo to 30 the underlying albedo (firn, ice, or superimposed ice), using the relationship described by Oerlemans and Knap (1998), with a characteristic snow depth scale of 3 cm, such that the albedo transition is smooth when snow cover is thin. Precipitation events are frequent (∼200 days a year), but usually yield low amounts (Aleksandrov et al., 2005). The snow depth dependency is essential to avoid that snow albedo is reset to fresh snow albedo in case of an insignificantly thin fresh snow layer. A similar approach was used for albedo reduction to account for water ponding at the surface with a characteristic water depth of 30 5 cm (Zuo and Oerlemans, 1996). Threshold values for albedo of firn and ice are determined during calibration (Section 4.4).
Finally, albedo is adjusted for specular reflection at large zenith angles following Gardner and Sharp (2010).

Subsurface processes
The subsurface model is based on the SOMARS model (Greuell and Konzelmann, 1994) which calculates temperature, density and water content of the subsurface layers and the subsurface energy flux Q G . The surface energy balance and subsurface 10 model are connected through the skin surface temperature assuming the surface to be an infinite layer with zero heat capacity.
Percolation follows a tipping-bucket scheme, where water percolating downwards is partly retained by capillary forces or refreezes upon encountering layers at sub-zero temperatures. Where water meets impermeable ice, slush builds up and lateral runoff is computed using a relationships defined by Zuo and Oerlemans (1996). A density dependent thermal conductivity parameterization (Douville et al., 1995) was calibrated for computing subsurface heat conduction. Irreducible water content is 15 calculated after Schneider and Jansson (2004) and densification of dry snow after Herron and Langway (1980).

Climatic mass balance
The climatic mass balance is the sum of melt and (re-)sublimation at the surface, refreezing in the subsurface layers, and solid precipitation from the downscaled reanalysis. Although the model also calculates the water balance, liquid water retained in snow or firn is not included in the mass balance .

Model set-up and calibration
The model is run for each glacierized grid cell based on the reference glacier mask (Section 3.1). The temporal resolution of the surface energy balance is that of the ERA reanalysis (6 hours Snow/ice temperature, density and water content are initialized with a 10-year spin-up using the climate data of the 1960s. To start the spin-up, the entire subsurface domain has an initial density of ice (900 kg m −3 ), zero water content, and temperatures of 0 • C at the surface; the latter linearly decrease to -5 • C at 7 m depth, beyond which the temperature remains constant. The initial value of -5 • C at depth is somewhat higher than the mean annual air temperature, and is based on observed temperatures 30 in boreholes (Björnsson et al., 1996). Different initial conditions for the spin-up period had little effect on the results at most locations where the spin-up time was tested. A few exceptions occurred at some locations close to the equilibrium line, where system memory seems to be longer; at these locations differences in annual mass balance of up to 5 cm w.e. yr −1 occurred in the first years following the spin-up period, when a 10-year spin-up is run twice instead of once.
Since model calibration is computationally expensive, only 48 grid cells were selected for calibration. Most of these sites correspond to locations where mass balance is monitored, but additional sites were included to achieve a good spatial coverage 5 over the entire archipelago ( Figure 1) At these additional sites only remote sensing data were available for calibration. Glacier edges were avoided such that the MODIS data (1-km & 0.5-km resolution) were entirely on glacier when resampled to the applied 1-km DEM.
Even though the model is physically based, some parameters do not have a consensus estimate in the literature. Østby et al. (2014) showed that model performance is sensitive to the choice of parameters concerning the turbulent fluxes (roughness 10 lengths) and the albedo parameterization. Here, the model is calibrated using an adaptive Monte Carlo Markov chain solver entitled DREAM ZS (Differential Evolution Adaptive Metropolis) (Vrugt et al., 2009;Laloy and Vrugt, 2012). This approach enables parallel computing, since the Markov chains evolve almost separately. Four parameters were selected for calibration: roughness lengths for snow and ice surfaces, and albedo for firn and ice. DREAM ZS seeks to maximize the likelihood L of an observed quantity (O) and its corresponding quantity predicted by the model (P ) using where N is the number of samples and σ is the uncertainty of the measurement.
Nine different types of data are used for calibration. Table 3 lists the variables along with the measurement uncertainty σ and number of observations N . Uncertainty of stake mass balance is assumed to be 10 cm w.e., in the lower range of reported uncertainties (e.g. Huss et al., 2009;Zemp et al., 2013). We adopt this low value due to the low mass turnover in Svalbard, but 20 acknowledge that higher uncertainties are likely for the accumulation area. For MODIS-derived albedo we apply an uncertainty of 8 % (Section3) and 5 K for MODIS T surf (Østby et al., 2014). For the measurements at the automatic weather station we assume σ=3 W m −2 for longwave radiation (Michel et al., 2008), while uncertainties for the albedo, the sonic ranger data and snow and ice temperatures from the thermistor string, are given by the instrument manufactures Østby et al., 2013). 25 Likelihood functions for each of the nine data types are summed and the parameter set yielding the highest total L is applied in the simulations. Calibrated parameter values and other model parameters are listed in Table S1.

Climatic mass balance
The modeled mean annual B clim averaged over the entire domain for the period 1957 to 2014 is positive (8.2 cm w.e. yr −1 ), Table 3. Data used for calibration: Point annual (ba), summer (bs), winter (bw) mass balances, albedo (α), surface temperature from MODIS (Ts,MODIS), longwave outgoing radiation (L ↑,AWS ) and surface height changes (SRAWS) from a sonic ranger. Subscripts, AWS, indicate that data are from the automatic weather station at Etonbreen. Ti,AWS is snow and ice temperatures measured by a thermistor string next to the Etonbeen AWS. Sites indicate the number of sites for which each of the listed data variables is available, N is total number of observations, and σ is the assumed observation uncertainty.
Data ( yields a trend of -9.6±9.9 cm w.e. decade −1 that is not significant at the 95 %-level. Melt is well correlated (R 2 = 0.93) with B clim , and controls both B clim interannual variability and the trend. Accumulation has a small increase (not significant at 95 %-level) over the period, while there is a significant decrease in refreezing.  Table 7 for details.
.  Q L is close to zero in the 1960s, while it has been mostly positive since 1970. Both the turbulent fluxes have a significant increase over the period. Q G decreases over the period, which we associate with the reduction of snow and firn volumes.
Glacier ice has a higher thermal conductivity than snow and firn, such that heat exchange is more efficient between the surface 5 and the subsurface layers. In addition, higher conductivity during winter enables efficient cooling of the near-surface layers.
The energy flux for melt increases over the study period by 3.8±1 W m −2 per decade. This trend is significant despite large year-to-year variability. During the 1960s, mean Q M (-13 W m −2 ) was less than half of that after 2000 (-30 W m −2 ). Over the same period, the increase in melt is of similar magnitude, rising from 48 to 110 cm w.e. yr −1 . There is a very weak 5 correlation (R=0.13) between winter snow accumulation and B clim . A much higher correlation of 0.55 is found between summer snowfall and B clim , which is linked to its effect on both albedo and refreezing. The correlation between PDD and snow accumulation is smaller (-0.40) such that the low temperatures alone cannot explain the summer snowfall effect. Of all the glacio-meteorological variables in Tables 5 and 6, PDD and albedo are the quantities that can explain B clim best. Q L is relatively small, but it is surprisingly well correlated to both melt and B clim . In the Q L calculation, humidity and temperatures 10 in the air and at the surface are included together with wind speed, thereby integrating several relevant meteorological variables.

Refreezing and subsurface properties
Overall refreezing comprises 26 % of total accumulation, but the amount of refreezing is reduced by 1.2 cm yr −1 decade −1 over the period. With the slight increase of snowfall, the role of refreezing is reduced from 29 % of the accumulation in the Table 5. Summary of modeled energy fluxes (W m −2 ) averaged over all glaciers and each year's melt season (15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30) for the period 1957-2014. STD is one standard deviation of the temporal variability, RQ M is the correlation with QM over the melt season, β is the estimate of a linear trend over 1957-2014, and σ slope is slope uncertainty given by two standard deviation. Slopes significant at the 95 %-level are marked in bold (|β| > 2σ slope ). 1960s to 22 % in the 2000s. The refreezing also has a profound effect on the subsurface thermal regime. When meltwater percolates and refreezes it releases large amounts of latent heat; and at the same time, densification leads to an increase in thermal conductivity, thereby intensifying heat conduction.
Through refreezing and prolonged negative B clim , subsurface density and temperature change markedly over the period. Figure 9 shows that refreezing increases with altitude, with large interannual variability. During the cold 1960s there is decreasing refreezing with altitude above 500 m elevation. In this case, refreezing is limited by available water; Figure 9b indicates the 5 presence of cold firn for these areas. After 1975 Svalbard accumulation areas are mostly temperate or near temperate at 15 m depth. Transition from cold to temperate firn is accompanied by a seasonal shift of the main refreezing period. In cold firn, most of the refreezing occurs when percolating water enters cold subsurface layers. While in the temperate firn, a large portion of the refreezing occurs when capillary water refreezes as the firn cools during winter. As the ELA increases, firn area extent is reduced and, accordingly, the potential for heat release through refreezing is also reduced. Henceforth, glacier areas 10 that lose firn due to raised ELA experience subsurface cooling. This occurs at different altitudes around Svalbard, consistent with the regional ELA pattern. In northeastern Svalbard cooling occurs above 100 m a.s.l., while cooling starts at 450 m a.s.l.
in southern Spitsbergen. Figure 9 shows the expansion of the cold ice area, which is predicted for polythermal glaciers in a warming climate, and also shown in model experiments (Irvine-Fynn et al., 2011;Wilson and Flowers, 2013). The regionally differentiated thermal response is shown in Figures S6 and S7. 15 In contrast to the firn area, refreezing in the ablation area results in the formation of superimposed ice. Below the ELA the newly formed superimposed ice ablates later during the same melt season. Figure 10 shows annual area-averaged refreezing, superimposed ice (SI) and internal accumulation, which refers to refreezing below the previous summer surface. Over the period there is a decrease in internal accumulation following the firn area decrease. The reduction in internal accumulation is only partly compensated by refreezing above the previous summer surface. Thus, total refreezing decreases over the period, 20 whereas the amount of SI formation in the SI-zone is quite stable, except in very negative years, when nearly all formed SI ablates later in the melt season.
The area covered by thick firn (dark blue, Figure 10b The overall model performance in terms of B clim is determined by comparing stake mass balance readings and mass balance retrieved from ice cores with the model grid box closest to the measuring site. In total, there are 1459 annual mass balances measurements covering different time periods at the various locations, see Section 2 and Table 7 for specifications. For all these measurements the model slightly underestimates B clim by 1 cm w.e. yr −1 . However the much higher RMSE of 59 cm w.e. yr −1 reveals the existence of compensating errors. At Hansbreen, mass balance is underestimated by more than 100 cm w.e. yr −1 in some years, while mass balance is overestimated at the lower stakes of the other three Spitsbergen glaciers. This is illustrated in Figure 11, which also shows that modeled mass balance gradients are too low for all five glaciers. Mass balance measurements for the period 2004-2013 at Kongsvegen, Etonbreen and Hansbreen used in the calibration, corresponds to about 25 % of the 5 total measurements. When removing the measurements used for calibration, the RMSE is slightly reduced (58 cm w.e. yr −1 ).
Since we apply equal weights to all measurements the impact of Hansbreen (the region with large underestimation) is reduced thereby improving the RMSE.
The model generally overestimates B clim (Tab. 7) at the ice core sites, in contrast to the stakes. At the higher elevations of Holtedahlfonna, Nordenskiöldbreen, and Kongsvegen, Figure 11 shows that B clim and winter balance are underestimated. 10 Although mass balance stakes at these glaciers are in the proximity of the drilling sites, there is no overlap in time, since the ice cores were retrieved at about the that the adjacent mass balance programs began. Assuming that both stake and ice corederived mass balance measurements are correct, model error is not time invariant at these sites. Ice-core-derived mass balances  [1968][1969][1970][1971][1972][1973][1974][1975][1976][1977][1978][1979]. This trend in model performance is driven by melt, and therefore most likely by summer air temperatures. Although B clim is overestimated, winter accumulation is underestimated in the model, which can be attributed to overestimation of sea ice in the reanalysis product at the northwest coast of Spitsbergen. The west coast is usually ice-free year round, but prior to 1979 satellite data were not available to constrain sea ice cover and sea surface temperature 10 in the reanalysis. An erroneous sea ice cover influence the heat and moisture uptake in the reanalyses, from which our forcing data were derived

Model parameters
Due to large computational cost, sensitivity experiments are limited to stake locations on the five target glaciers over the period 15 2004-2013. In contrast to the calibration procedure, here we perturb one parameter at a time to isolate its effect. Table 8 show parameters used in the control run, the perturbation and the B clim response averaged over all stakes and years, relative to the control run (dB clim ) and the geographical variability at all locations (SMB-stakes) in terms of standard deviation (STD). The sensitivity of ice albedo is surprisingly low in comparison to other parameters, while albedo aging parameters (t * ) have rather large impact on B clim . This can be explained by the relatively short exposure of glacier ice at the surface. Even in the ablation area, snowfall is common during the melt season, such that the rate of albedo decay is more important than the actual threshold values. The modeled B clim is robust with respect to the choice of roughness lengths (zo). A 1 K change in the rain-snow Table 7. Climatic mass balances derived from ice cores (B clim ) (cm w.e. yr −1 ) from Pinglot et al. (1999Pinglot et al. ( , 2001 and error ǫ (modeledmeasured) for the three time periods. Z is elavation of the ice core and Zǫ is elevation of the closest DEM grid cell minus ice core elevation.
The year of ice core retrieval depends on the site and is denoted by 199X, see Pinglot et al. (1999Pinglot et al. ( , 2001 for exact years. The error is positive, if modeled balance is larger than measured. Z Zǫ 1963Z Zǫ -1986Z Zǫ 1986Z Zǫ -199X 1963  threshold temperature has a comparable effect as a 10 % precipitation change. More than 90 % of this change is taking place during the melt season. Although a large portion of the precipitation occurs at temperatures close to the phase transition, the effect of the rain-snow threshold on winter mass balance is not so important in terms of mass balance, since most of the winter rain refreezes.

5
We explore the sensitivity of the climatic balance to climate change by applying uniform perturbations in temperature and precipitation to the climate data for the period 2004-2013. First we shift air temperature by ±1, ±2 and ±3 K and then increase precipitation by ±15 % and ±30 % leaving temperature unperturbed. A temperature increase of 1 K results in a B clim decrease of 30 cm w.e., same as found by Van Pelt et al. (2012). The temperature sensitivity of roughly -30 cm w.e. yr −1 K −1 is in the lower range reported from glaciers elsewhere in the world (see De Woul and Hock (2005) and references therein). However, the impact of this low sensitivity on B clim is higher than for other glacierized regions given the low mass flux turnover in Svalbard. This is exemplified by comparing the impact of a temperature perturbation to the mass balance components of the control run over the 2004-13 period, where a temperature increase of 1 K results in a 30 % increase in ablation, and a more than doubled the already negative B clim . A precipitation increase of nearly 40 % would be necessary to compensate a temperature increase of 1 K.
We also perform two perturbations which corresponds to temperature and precipitation projections for 2100 after Førland et al. (2011). Since changes in temperature and precipitation at Svalbard are projected to occur with a substantial southwest-northeast gradient, we perform two experiments. In the first scenario, we increase annual temperature and precipitation by 4 K and 5 5 %, respectively, to represent changes as expected for western Svalbard. In the second scenario, applicable for northeastern Svalbard we use annual increases of 6 K and 30 % precipitation. For these future scenarios we apply seasonality in air temperature, as suggested in Førland et al. (2011), with summer temperature increase of 2.5 K for western Svalbard and 4 K for the northeastern Svalbard. Figure 11 shows that the modeled ELA on the measured glaciers is mostly above the present topography under both scenarios. Note that the predictions of Førland et al. (2011) are for 2071-2100 averages relative to the normal period 10 1961-1990. We apply these increases to the 2004-2013 period which have an annual air temperature of -3.2 • C, already 3.5 K warmer than the normal period .

Glacier mask
We test the impact of choice of glacier mask on B clim by applying the three different fractional glacier masks (Section S2); the reference mask (all-time max), the 2000s mask and the time-varying mask. B clim for entire Svalbard is 4.5 cm w.e. yr −1 for the 15 reference mask, 10 cm w.e. yr −1 for the 2000s mask and 8.2 cm w.e. yr −1 for the time-varying mask. Despite relatively small differences in area and specific mass balance, there is a 100 Gt ice mass difference over the 56 year period for the different glacier masks. In comparison this is more than a third of the annual contribution of all glaciers and ice caps to sea level rise.
Since the area difference between the glacier mask mainly occurs in the lower ablation zone, where B clim fluxes are the largest, a small area change has a substantial impact on the glacier wide B clim . Hence, accurate representation of glacier margins in 20 B clim models are very important. Mass balance using the time-varying mask represents something between the conventional and reference surface mass balance (Elsberg et al., 2001), since glacier area is annually updated, while the DEM is static.  (Tables 4 and 9), is closely related to net shortwave radiation, the largest energy source for melt. Albedo is also indicative of winter snow and summer snowfall events, since it has higher values for snow than for bare ice (Table 9), but is also related to temperature through the precipitation phase, rate of albedo decay and the importance of temperature for melt.

30
With a future climate as projected by Førland et al. (2011), increased precipitation can only partly compensate for enhanced melting. Lang et al. (2015b) argue that increases in future cloud cover will reduce the negative effect of a warming climate on B clim . Statistics of our simulations show that the net radiation is positively correlated with summer air temperatures (Table 9), suggesting that increased cloud cover lead to higher net radiation through longwave radiation despite a decrease in short wave radiation. Cloud cover is not specifically addressed in our sensitivity test and an extrapolation to the future is hence afflicted 5 with large uncertainties.
As in Greenland, melt water retention through refreezing on Svalbard has been proposed as an efficient buffer for mass losses in a future warmer climate (Harper et al., 2012;Wright et al., 2005). However, here we find that refreezing and retention decrease over the 1957-2014 period, both in absolute numbers and as a percentage of the B clim . These findings are in line with other studies on Svalbard (Van Pelt and Kohler, 2015) and Greenland (Charalampidis et al., 2015;Machguth et al., 2016). 10 Even if the superimposed ice area increases, at the expense of the firn area, superimposed ice formation does not compensate for the loss of internal accumulation. In contrast to the Greenland ice sheet, the existing area of cold firn in Svalbard is not large enough to buffer effects of future warming. Furthermore, our model does not simulate impermeable ice layers within the firn; such layers would further decrease the available storage volume by diverting runoff laterally (Mikkelsen et al., 2015).
Despite an overall decrease in refreezing, some low altitude areas along the western coast show an increase due to winter 15 rain events (e.g. Hansen et al., 2014). During and in the first days after rain events substantial amounts of superimposed ice form. However, for the remaining of winter, densification of the snowpack reduces the insulating effect such that the initial latent heat release is largely compensated if not exceeded by intensified cooling.
For a warmer climate, our results suggest a cooling of glacier ice close to the (rising) ELA. At the glacier front, modeled temperatures at 15 m depth increased by 2-4 • C over 1957-2014. It is likely that glaciers that have their snouts frozen to the 20 ground will approach the melting point in the near future, if warming continues. Frozen glacier fronts may act as a plug for the upstream glacier flow, and provide a mechanism to slow the entire glacier (e.g. Dunse et al., 2015). Furthermore, a thermal switch at the glacier base allows sliding and is proposed as surge initiation mechanisms (e.g. Murray et al., 2000). Historical records show that surge frequency at Svalbard is connected with periods of warming and negative B clim (W. Farnsworth, 2015, pers. comm.). Increased ice flow leads to more crevassing, which again promote cryo-hydrological-warming and basal 25 lubrication, acting as a positive feedback in the system (Phillips et al., 2010;Dunse et al., 2015). Although these mechanism are not fully understood, theory suggests that increased runoff and changes in the thermal and hydrological regimes may trigger widespread changes in the velocity structure of large ice masses. To further study the coupling between surface and basal processes, melt rates and near-surface temperatures must be reliably quantified; this requirement emphasizes potential further use of the dataset resulting from our study.

Comparison to other studies
We compare our results to Svalbard-wide mass balance estimates by other studies (see Tab. 10), although direct comparability is hampered by differences in areal coverage or time periods due to lack of definition of areas in the other studies. Also some studies (e.g. based on gravimetric or geodetic methods) include all mass changes, i.e. also those by frontal ablation (e.g. the mass loss due to calving and submarine melt). Figure 12 shows mass balance through time compared with studies summarized in IPCC (2013) along with three other recent B clim -studies. Average mass balance for the respective studies are also listed in Table 10. Overall there is a good agreement with the other studies, but our B clim estimates are more negative than the others for the latter part of the study period, whereas the 5 opposite is true during the first part. Our estimate is expected to be higher than the geodetic and gravimetric estimates since they account for mass loss by frontal ablation, which is estimated to be 13±5 cm w.e. yr −1 (Błaszczyk et al., 2009). Compared to Nuth et al. (2010), our estimate is 30 cm w.e. yr −1 less negative due to more positive B clim in northeastern Spitsbergen. In the same area, ice core derived B clim also suggest that our model overestimates B clim , indicating too high precipitation and/ or too low air temperature in our forcing dataset for this region. The north-south gradient in downscaled mean annual air temperatures

Uncertainties
Uncertainties are introduced at every step in the process chain and accumulate in the simulated B clim . Uncertainties comprise the climate forcing and initial state, model set-up and parameterizations, topographic simplification, and uncertainties of the measurements used for calibration. However, it is hard to quantify the contribution of the individual sources, especially since the calibration may compensate for systematic errors. 5 Glaciological surface mass balance represents a useful quantity for validating our model. Not only because it resembles our goal, mass balance, but it is sensitivite to atmospheric and surface conditions over a large variate of temporal and spatial scales, thus highly valuable for validating land surface models in general. Since subsurface properties (temperature, density and water content) are usually not well captured by glaciological mass balance measurements, the measurement uncertainty is higher in the firn areas compared to the 10 cm w.e. yr −1 uncertainty for glacier ice areas (Section4.4).

10
Model performance at the validation sites (ice cores and stakes) is satisfactory with a mean bias of -1 cm w.e. yr −1 , but substantial compensating errors exist, as indicated by the RMSE of 59 cm w.e. yr −1 . For Svalbard as a whole, B clim is in agreement with other studies, but regionally our B clim is more positive in in northern Spitsbergen and more negative in southern Spitsbergen.
Uncertainties in the climate forcing are probably the main source of error. We substantiate this statement by the importance 15 of air temperature on B clim (-30 cm w.e. yr −1 K −1 ), the low sensitivity of model parameters (Section 5.5), and that the model is capable of reproducing measured B clim , when forced with local climate data at the weather station site (Østby et al., 2013).
Validation at weather stations indicates that our climate forcing is of similar or slightly lower quality as other, more expensive downscaling studies (Claremar et al., 2012;Lang et al., 2015a;Aas et al., 2016).
Downscaled summer air temperatures are in general too low at the coastal weather stations (Tab. 2). In contrast, downscaled 20 air temperatures are too warm at the glacier weather stations. Theses biases, combined with the underestimation of summer mass balance gradients (Figure 11), indicate too low air temperature lapse rates in the downscaling. Since satellite-derived surface temperatures are capped at the melting point, only temperature measurements at elevation substantially higher than the ERA-orography can resolve this issue. The underestimated summer mass balance gradient may, at least partly, be explained by underestimated winter mass balance gradients (Figure 11), through its effect of prolonged snow cover and increased albedo 25 and refreezing.
Downscaled precipitation compares well to measurements at the coastal stations (Section 3.2.1) and seasonal precipitation is mostly well reproduced over the glaciers when compared to ice cores and winter mass balance from stakes and ground penetrating radar. Underestimation of winter mass balance gradients ( Figure 11) is not necessarily indicative of too low orographic enhancement in the LT-model, but could be caused by wind redistribution. Snowdrift accumulates in concave-shaped 30 accumulation areas, while the wind erodes ablation areas that tend to have a convex-shaped surface topography. In contrast, ice cores indicate that precipitation is overestimated at higher elevation in northern Spitsbergen. This bias could be caused by the wind exposed ice core drilling sites at ice field summits. The largest precipitation underestimation (up to 100 cm w.e. yr −1 ) is found at Hansbreen, which is known to have an asymmetrical snow accumulation pattern across the centerline due to wind redistribution (Grabiec et al., 2006).
The TopoSCALE methodology for downscaling climate variables shows several shortcomings when applied to Svalbard. In the European Alps, where the methodology was developed, differences between the coarse reanalysis data and the finer grid for downscaling are governed by the elevation difference between the coarse and fine grid. On Svalbard, large horizontal gradients 5 of atmospheric heat and moisture arise from the interaction between open water, sea ice, tundra and glacier-covered areas, in addition to the vertical gradients. Due to the coarse spatial resolution of the reanalysis, some land areas of our finer topography may be wrongly represented as ocean in the reanalysis, thereby considerably affecting the downscaled variables. For instance, land areas of south Spitsbergen are largely represented as ocean in the ERA land mask, such that sea surface temperatures are incorporated into the downscaling of air temperatures over land and glaciers in the fine scale topography. This gives rise to 10 considerable biases and thereby also affects temperature gradients in the downscaled temperature field. A similar effect arises if an erroneous sea ice mask was employed in the reanalysis. The climate reanalysis is a quite homogeneous product in time, except for sea surface temperatures and sea ice cover, which have been substantially improved with the advent of satelliteborne sensors, and further improved with newer sensors. We link these improvements over time with the higher climate forcing quality after 1980 (Section 3.2.1). Nevertheless, this discontinuity in reanalysis quality coincides with the discontinuity of our 15 composite forcing dataset, that is based on ERA-40 before and on ERA-interim after 1979. To investigate the possibility that the resulting change in mass balance regime may be an artifact caused by this transition, simulations have been conducted over the overlap period 1979-2002 using both reanalyses, but only for the grid points used for calibration ( Figure 1). We find that the ERA40-based simulation yields an about 13 cm w.e. higher mass balance than the ERAinterim-based one, but ERA-40 based simulations still show a 20 cm drop of Bclim between 1970 and 1990, larger than that caused by the dataset discontinuity. This 20 suggests that this change in mass balance regime is not caused by the heterogeneity of our composite forcing. Nevertheless, we cannot rule out the possibility that this change was caused by the discontinuity inherent in both reanalyses due to availability of satellite observations after 1979 (Bromwich and Fogt, 2004;Screen and Simmonds, 2011;Uppala et al., 2005).
Calculations of turbulent fluxes and albedo are the processes within the model that have the highest sensitivities regarding parameter uncertainty on modeled B clim . Model performance is quite robust to the choice of roughness lengths used to calcu-25 late turbulent fluxes, but values typically span several orders of magnitude in the literature. Albedo parameterization is shown to be more sensitive to the aging parameters rather than the actual threshold values used in the parameterization. The applied albedo parameterization does not account for spatial variability due to impurity content, which is possibly a major weakness given that ice albedo varies from 0.15-0.44 across Svalbard (Greuell et al., 2007). Dark bands such as in western Greenland (Wientjes et al., 2011) are also observed in Svalbard, but are not included in the model. Extending the model albedo formu-30 lation to account for dust and impurity content would be a subject for future work. Model parameters concerning runoff and water retention may also be a significant error source, but we lack observational data to evaluate this aspect. The bucket-type water percolation method is also questionable, as infiltration is highly heterogeneous and horizontal fluxes are neglected (e.g. Reijmer et al., 2012;Cox et al., 2015). A 10 year spin-up time is found to be sufficient for the model performance, although sites close to the ELA showed a larger sensitive to the spin-up period in the first years of the run (<5 cm w.e.). From the 35 Svalbard temperature record, we know that the 1960s were colder than the 1950s. Using the cold 1960s as a spin-up period has likely led to a too large potential for refreezing given the lower temperatures of the early period. Additionally, lower temperatures cause less dense firn and further exaggerate the retention potential.
Although the 1-km resolution DEM applied in this study largely reproduces the hypsometry of the 90-m DEM, glacier mask sensitivity reveals differences of 1.8 Gt yr −1 between different glacier masks (Section5.5.3), slightly lower than the 2.1 5 Gt yr −1 mass loss from tidewater margin retreat (Błaszczyk et al., 2009). We assess possible errors introduced by using a static DEM by considering a typical mass balance gradient of 0.2-0.3 cm w.e. m −1 . A uniform elevation change of 10 m for the entire glacier surface would alter the mass balance by 2 cm w.e. yr −1 , equivalent to 1 Gt yr −1 . Observed elevation changes in southern and western Spitsbergen for 1930s to 1990s are typically a few tens of meters . In the early part of the study period, when the DEM difference is largest, this error might amount up to 1 Gt yr −1 .

20
-There is large interannual variability in B clim , which is controlled by summer melt. Decreasing B clim over the study period is primarily due to increased summer temperatures, amplified by the albedo feedback.
-Refreezing plays a major role in Svalbard B clim , representing about a quarter of annual accumulation. With increasing air temperatures over the study period, refreezing and water retention decrease both in absolute amounts and relative percentage of total accumulation 25 -With increasing ELAs, firn extent is reduced and subsurface cooling occurs in the vicinity of the ELA, while subsurface warming occurs in the lower ablation area. Increased runoff and changes in the hydrological and thermal regimes are likely to be important for glacier flow.
-Sensitivity experiments suggest that the expected future precipitation increase cannot compensate for the expected rise in air temperatures. Perturbing temperature by 6 K and precipitation by 30 % (Northestern Svabard) and by 4 K and 5 % (Western Svalbard) as projected for 2100 from a regional climate model (Førland et al., 2011), results in modeled ELA above the summits of most of today's ice caps and ice fields.
Two major shortcomings in the meteorological forcing are identified. First, prior to 1980, temperatures and precipitation are too low. This may be caused by inaccurate representation of sea ice in the reanalysis data for the pre-satellite period.
Second, the downscaling methodology likely introduces biases where the surface type (ocean, sea ice, tundra, glacier) does not 5 match across the scale gap between the coarsely resolved reanalysis and our finer grid resolution. This latter problem may be overcome by using the same downscaling approach on 10-25 km RCM output instead of the ERA-40 and ERA-Interim data, which could potentially improve our results.
In addition to considerable mass change, this study suggests there are significant changes in the thermal and hydrological regimes of Svalbard glaciers, which in turn may have important implications for glacier dynamics, as suggested by recent 10 observation of ice cap destabilization (Dunse et al., 2015); an understudied process that requires further investigation.  configuration and performance of the data assimilation system, Quarterly Journal of the Royal Meteorological Society, 137, 553-597, Table 8. Sensitivity of the climatic mass balance B clim in response to perturbations of model parameters and climate perturbations. dB clim (cm w.e.) is the departure of B clim from the control run averaged over all stakes and years, σ (cm w.e.) is the standard deviation of the difference for both year-to-year and site-to-site. α is albedo, t is aging factor for snow albedo, T rain/snow is rain snow threshold, zo is roughness lengths for momentum necessary for calculating the turbulent heat fluxes. Climate experiments are given by uniform air temperature shifts (K) and precipitation increases (%), but for the last two experiments we apply seasonal differences, see text for explanation. a time scales of aging is 5,15 and 100 days at temperatures of 0 • C (wet),0 • C (dry) and -10 • C.
b roughness lengths for ice and snow respectively. c , d Climate scenario for western and northeastern Svalbard respectively, as in Førland et al. (2011) . Table 9. Correlation coefficients between annual mass balance components, average melt season surface energy fluxes, and selected meteorological variables.
BACCU and BABLA is total annual accumulation and ablation respectively and B ref is annually refrozen water. Precipitation is separated into snow and rain where indices s refers to summer averages (15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30)(31) and w refers to the period 1 Oct-14 May. TA and TS are annual and summer averages of near-surface air temperature over the entire glacier area, respectively.   Table 10. Svalbard mass balance estimates from different studies and methods. Note that the numbers between the studies are not directly comparable due to difference in time periods, areal coverage and methodology. Mass balance method short names are: ICESat is laser altimetry from the Ice, Cloud and elevation Satellite, GRACE is gravimetry using data from the Gravimetry Recovery and Climate Experiment, Mix is a combination of GRACE, ICESat and modelling, WRF, MAR and RCM are three different Regional climate models, Local is direct glaciological measurements, see references for details. Gravimetry studies report total mass balance (Btot), which is the sum of climatic (B clim ), basal balance(B b ) and frontal ablation(A f ), The geodetic studies report (Btot) by integrating elevation changes over a constant area (B tot(A=const) ). Other studies report climatic mass balance (B clim ) or surface mass balance (B sfc  .