Ensemble-based assimilation of fractional snow covered area satellite retrievals to estimate snow distribution at a high Arctic site

Snow, with its high albedo, low thermal conductivity and large water storing capacity, strongly modulates the surface energy and water balance, making it a critical factor in mid- to high-latitude and mountain environments. At the same time, already at medium spatial resolutions of 1 km, estimating the average and subgrid variability of the snow water equivalent (SWE) is challenging in remote sensing applications. In this study, we present an ensemble-based data assimilation scheme that assimilates fractional snow covered area (fSCA) retrievals into a simple snow model forced by statistically downscaled reanalysis data so as to estimate peak SWE distributions at the kilometer scale. The basic idea is to relate the timing of the snow cover depletion (which is accessible from satellite products) to pre-melt SWE, while at the same time obtaining the subgrid-scale distribution. Subgrid SWE is assumed to be lognormally distributed, which can be translated into a modeled time series of fSCA by means of the snow model. Assimilation of satellite-derived fSCA hence facilitates the constrained estimation of the average SWE and coefficient of variation, while taking into account uncertainties in both the model and the assimilated data sets. As an extension to previous studies, our method makes use of the (to snow data assimilation) novel ensemble smoother with multiple data assimilation (ES-MDA) scheme combined with analytical Gaussian anamorphosis to assimilate time series of MODIS and Sentinel-2 fSCA retrievals. The scheme is applied to high-Arctic sites near Ny-Ålesund (79° N, Svalbard, Norway) where in-situ observations of fSCA and SWE distributions are available. The method is able to successfully recover accurate estimates of peak subgrid SWE distributions on most of the occasions considered.
Through the ES-MDA assimilation, the root mean squared error (RMSE) for the fSCA, peak mean SWE and subgrid coefficient of variation is improved by around 75%, 60% and 20% respectively when compared to the prior, yielding RMSEs of 0.01, 0.09 m water equivalent (w.e.) and 0.13 respectively. By comparing the performance of the ES-MDA to that of other ensemble-based batch smoother schemes, it was found that the ES-MDA either outperforms or at least nearly matches the performance of the other schemes with regards to various evaluation metrics. Given the modularity of the method, it could prove valuable for a range of satellite-era hydrometeorological reanalyses.

The Cryosphere Discuss., https://doi.org/10.5194/tc-2017-109. Manuscript under review for journal The Cryosphere. Discussion started: 4 July 2017. © Author(s) 2017. CC BY 4.0 License.


Introduction
The seasonal snow cover exhibits a high albedo, low thermal conductivity, large water holding capacity, a strong smoothing effect on the terrain, as well as considerable variability both in depth and horizontal extent. Consequently, the distribution of the seasonal snow cover in time and space is a key control on the terrestrial surface energy and mass balance (e.g. Boike et al., 2003). Thereby, snow is a central modulator of the global radiation balance and hydrological cycle and is thus one of the drivers of the atmospheric circulation and the associated climate (Liston, 1999; Andreadis and Lettenmaier, 2006). For example, in mid- and high-latitude regions, as well as in mountainous areas (a combined region that accounts for about one quarter of the global gross domestic product), snow melt plays a dominant role in the seasonal patterns of the stream flow (Barnett et al., 2005). In terms of water resource management, there is thus much to be gained in being able to map the spatiotemporal distribution of snow water equivalent (SWE) across this extensive region.

The primary controls on the distribution of SWE are topography, vegetation, precipitation, wind, radiation and avalanching (Sturm and Wagner, 2010). Of these, the former two are relatively fixed temporally. The remaining dynamic controls vary significantly not just in space, but also in time. All of these controls occur across a range of scales and thus cause great variability in snow depth. For example, snow tends to be affected by wind drift in non-forested regions (e.g. Gisnås et al., 2014), leading to accumulation in areas with preferential deposition such as topographic depressions or the lee side of a ridge, but the scale of such features varies dramatically across the landscape (Clark et al., 2011). Therefore, especially due to the dynamic controls, the mapping of SWE distributions is particularly challenging.
Nonetheless, the processes occurring at a given site are sometimes quite consistent from year to year, and so the SWE distribution pattern in any given year may be quite similar to the climatological snow distribution pattern (Sturm and Wagner, 2010). For mapping SWE distributions over large areas, manual measurement surveys usually have limited support, large spacing and small extent (Blöschl, 1999) and are thus exhausting, impractical and expensive, as well as potentially hazardous in steep terrain due to avalanches. Instead, modeling, remote sensing or a combination of the two can be employed to map SWE. Snow models range in complexity from relatively simple single-layer models, such as the Utah Energy Balance model (UEB; Tarboton and Luce, 1996; You et al., 2014), to detailed multi-layer snowpack models such as Crocus (Vionnet et al., 2012) and SNOWPACK (Bartelt and Lehning, 2002). Many snow models (e.g. ALPINE3D; Lehning et al., 2006) can also be run in distributed mode so as to predict the snow distribution over large areas. The accuracy of the model results is limited by the hydrometeorological forcing data, be it from reanalyses or local measurements, whose errors are typically the major source of uncertainty in snow modeling (De Lannoy et al., 2010; Raleigh et al., 2015). A problem with snow models is that they are often developed as point-scale models, and even if they are run as distributed models there is no guarantee that the grid-scale values predicted by the model are representative of the corresponding process scale (Blöschl, 1999). Indeed, for the example of a model forced by near-point-scale hydrometeorological measurements, the model predictions will only be representative of some grid scale if that particular point is representative of the mean conditions within the grid cell, which is generally unlikely
given the large spatial variability in snow depths (Clark et al., 2011). To circumvent this problem, probabilistic snow depletion curve (SDC) parametrizations have been developed (Liston, 1999; Luce and Tarboton, 2004; Liston, 2004). Therein, a distribution function is assigned to the SWE within a grid cell at peak accumulation and, along with an assumption of uniform melt across the grid cell, this allows for a direct relationship between the mean SWE and the fractional snow covered area (fSCA) of the grid cell. Liston (2004) used such an SDC parametrization in conjunction with a dichotomous key (a simple classification scheme) for the subgrid coefficient of variation of SWE with the ClimRAMS model at 80 km resolution to map the fSCA over North America; including the SDC in ClimRAMS increased the total snow-covered area considerably compared to the control run. More recently, Aas et al. (2017) used a mosaic, or tiling, approach to represent subgrid snow variability in the weather research and forecasting (WRF) model coupled to the Noah land surface scheme at 3 km resolution over southern Norway. The tiling simultaneously reduced the cold bias in the model and greatly improved the match to the observed fSCA evolution. Still, as mentioned, modeling alone is arguably not an effective tool for mapping SWE distributions in hindcast mode due to the inherently large uncertainties in the hydrometeorological forcing. Instead, models need to be combined with relevant data from remote sensing.
Snow-related data sets can be acquired from a variety of remote sensing platforms. The Gravity Recovery and Climate Experiment (GRACE) twin satellites allow for the retrieval of terrestrial water storage (TWS), from which SWE can be recovered at around 100 km spatial resolution (e.g. Niu et al., 2007). Similarly, passive microwave (PM) satellite sensors can retrieve SWE based on brightness temperature at a somewhat finer spatial resolution of around 25 km. However, PM SWE retrievals have problems over forested areas and complex topography, as well as for wet and deep snowpacks (Foster et al., 2005). Both gravimetric and PM sensors are able to retrieve SWE independent of cloud coverage, generally resulting in gap-free time series. While not capable of measuring SWE, moderate-resolution optical sensors such as MODIS can retrieve binary information on snow cover (i.e. snow or no snow), fSCA and snow grain size (Hall et al., 2002; Salomonson and Appel, 2004, 2006; Painter et al., 2009) at approximately 500 m spatial resolution at a daily revisit frequency. In addition, there exist high-resolution optical sensors, e.g. on board the Landsat and Sentinel-2 satellites, that can map snow cover and fSCA at 30 m spatial resolution (e.g. Cortés et al., 2014). Such optical sensors cannot see through cloud cover, which results in extended data gaps in most snow-covered regions. To obtain continuous time series, it is necessary to either interpolate such remote sensing data in time and space, or ingest them into models.
Data assimilation (DA) methods can objectively fuse uncertain information from observations and models. The simplest form of snow data assimilation is the class of so-called deterministic SWE reconstruction techniques (Girotto et al., 2014b) that make use of direct insertion of remotely sensed fSCA data. Such schemes back-calculate peak SWE from the disappearance date of the snow cover (as determined from fSCA retrievals) using snow melt models. Martinec and Rango (1981) used retrievals of fSCA from Landsat images during the melt season in conjunction with a simple degree-day snow melt model to estimate the peak mean SWE at 1 km spatial resolution. Similarly, Cline et al. (1998) also used fSCA retrievals based on Landsat imagery, but this time combined with a distributed energy balance model, to reconstruct the SWE distribution at 30 m resolution. More recently, Molotch and Margulis (2008) used fSCA information from multiple sensors (Landsat ETM+, MODIS and AVHRR) to reconstruct the SWE distribution at 100 m resolution. Durand et al. (2008) introduced a probabilistic framework for SWE reconstruction, based on assimilating synthetic fSCA retrievals during the ablation season into the SSiB3 model coupled to the probabilistic SDC of Liston (2004) using the ensemble smoother (ES; Van Leeuwen and Evensen, 1996) in batch mode (c.f. Dunne and Entekhabi, 2005). The assimilation of fSCA in this synthetic, or 'twin', experiment was used to correct annual biases in the snowfall, and enabled the recovery of the SWE distribution at 90 m resolution. Through the assimilation, the bias and root mean square error (RMSE) of the SWE estimation were reduced by 86% and 78% respectively for pixels with less than 90% vegetation cover. Girotto et al. (2014b) discussed how the method of Durand et al. (2008) was a generalization of the previous deterministic SWE reconstruction techniques cast in a probabilistic framework. In addition, Girotto et al.
(2014b) assimilated Landsat-based fSCA retrievals with the same DA framework to recover the peak SWE distribution at 90 m resolution. A significant reduction in RMSE was found when comparing the probabilistic to the deterministic SWE reconstruction techniques. Subsequently, Girotto et al. (2014a) used the same framework to perform a 27-year reanalysis of SWE distributions at 90 m resolution. Recently, Margulis et al. (2015) modified the probabilistic SWE reconstruction approach by adopting a particle batch smoother (PBS), as opposed to the ES, for the assimilation of fSCA retrievals to estimate the SWE distribution at 90 m resolution. The PBS was found to perform favorably compared to the ES, considerably reducing the RMSEs compared to in-situ measurements. Based on this work, Margulis et al. (2016) adopted the same framework to construct a 30-year reanalysis of SWE over the Sierra Nevada (USA) using Landsat-based fSCA retrievals, also at 90 m resolution. The resulting posterior SWE estimates had RMSEs of 0.11 m and 0.13 m and correlation coefficients of 0.97 and 0.95 for snow pillows and snow courses respectively.

In addition to deterministic and probabilistic SWE reconstruction, several other snow data assimilation techniques have been employed in recent studies. Andreadis and Lettenmaier (2006) assimilated MODIS fSCA retrievals into the VIC model through the ensemble Kalman filter (EnKF; Evensen, 1994) using a simple SDC for the SWE-fSCA inversion. However, the improvement in SWE estimates compared to the open-loop (no DA) run was only modest, which was also found in similar studies using the EnKF (e.g. Slater and Clark, 2006). A fully Bayesian DA technique was used by Kolberg and Gottschalk (2006) to assimilate fSCA retrievals from Landsat 7 ETM+ images into a snow model with a probabilistic SDC to estimate the peak SWE distribution at 1 km resolution.
They found a significant reduction in uncertainty when two retrievals were assimilated simultaneously as opposed to sequentially. At the continental scale, a multisensor assimilation of both GRACE TWS and MODIS fSCA, using the ES and EnKF for TWS and fSCA respectively, yielded significant improvements compared to the open loop (Su et al., 2010).
Of late, particle filter (PF; see Van Leeuwen, 2009) schemes have been gaining popularity in snow DA studies (Charrois et al., 2016; Magnusson et al., 2017). For example, Charrois et al. (2016) assimilated synthetic MODIS-like optical reflectance retrievals into the detailed snowpack model Crocus using the sequential importance re-sampling PF at a point scale, which resulted in a 45% reduction in RMSE in both snow depth and SWE when compared to the open loop. It is worth emphasizing that the most popular schemes in the snow DA community, i.e. both the EnKF and the PF, are filters, meaning that they are sequential techniques. As such, they are Markovian of order 1 ('memoryless'); i.e. the future state at a given point in time depends only on the present state. Moreover, observations are assimilated sequentially with only the current observation affecting the current state. Batch smoothers (see Dunne and Entekhabi, 2005), on the other hand, take into account the entire history of a model trajectory within a batch (observation window), and as such have memory (non-Markovian), so that they are better suited for reconstruction, i.e. reanalysis, type problems. Within petroleum reservoir geophysics, an entire subfield is dedicated to history matching (c.f. Le et al., 2016) that makes ubiquitous use of ensemble batch smoother type schemes.
In this study, we build on the probabilistic SWE reconstruction techniques outlined in Girotto et al. (2014b) and Margulis et al. (2015) to recover subgrid SWE distributions (SSD) for a study area in the high Arctic, based on fSCA retrievals from MODIS and Sentinel-2. The novelty of our study lies in the use of an iterative batch smoother scheme, namely the ensemble smoother with multiple data assimilation (ES-MDA; Emerick and Reynolds, 2013), which is popular in the aforementioned history matching community. To update parameters that are bounded in physical space we also make explicit use of analytical Gaussian anamorphosis (Bertino et al., 2003). We investigate the performance of the ES-MDA in terms of SWE reconstruction and compare it to the ES and the PBS employed by Girotto et al. (2014b) and Margulis et al. (2015) respectively. The results are evaluated against independent in-situ measurements of fSCA and peak SSD conducted over six snow seasons. As both the forcing data and satellite retrievals are available globally, the presented modular technique holds great potential for estimating peak SSD at a range of spatial scales.

Physical characteristics and climate
The high-Arctic study area is located in NW Svalbard close to the research town of Ny-Ålesund (78°55′ N, 11°50′ E) on the Brøgger peninsula, where in-situ measurements are available from three sites (Figure 1). "Bayelva", about two kilometers west of Ny-Ålesund, is the main study site, from where multi-year in-situ records on e.g. the surface energy balance, permafrost thermal regime and snow distribution are available (Boike et al., 2003; Westermann et al., 2009; Gisnås et al., 2014). In addition, snow surveys for a single season are available from "Steinflåen plateau" and "Kvadehuksletta". All sites feature undulating topography, with surfaces characterised by patterned ground features, leading to strong differences in snow cover due to wind drift.
While both Bayelva and Kvadehuksletta are located between 10 and 50 m a.s.l., the Steinflåen plateau is at a slightly higher elevation of around 200 m. Furthermore, the wind regime features considerable differences between the sites, with Kvadehuksletta exposed to most wind directions, while Bayelva and the Steinflåen plateau are partly sheltered by mountain ranges. The study sites are located within the continuous permafrost zone (Boike et al., 2003) with an active layer depth on the order of 1.5 m at the Bayelva site (Westermann et al., 2009). The Bayelva site is located around the heavily instrumented Bayelva climate and soil monitoring station (see Westermann et al., 2009). The surrounding area has been the subject of extensive study on a wide range of topics spanning permafrost (Roth and Boike, 2001; Boike et al., 2008; Westermann et al., 2011a), the surface energy balance (Boike et al., 2003; Westermann et al., 2009), CO2 exchange (Lüers et al., 2014; Cannone et al., 2016), ecology (Kohler and Aanes, 2004), snow (Gisnås et al., 2014; López-Moreno et al., 2016), hydrology (Nowak and Hodson, 2013) and satellite retrieval validation (Westermann et al., 2011b, 2012). At Bayelva, the surface cover alternates between bare soil, rocks and sparse vegetation in the form of low vascular plants, mosses and lichens (Westermann et al., 2009; Gisnås et al., 2014). The surface cover at Kvadehuksletta is similar to that at Bayelva, but the more elevated Steinflåen plateau is predominantly covered by loose rocks.
The climate of western Svalbard is influenced by the relatively warm West Spitsbergen current, causing a maritime climate with mild winters and cool summers for this latitude (Esau et al., 2012). At Ny-Ålesund, the winter, summer and annual average air temperatures were −12.0 °C, 3.8 °C and −5.2 °C respectively, while the average annual precipitation was 427 mm (Førland et al., 2011). Between September/October and May most of the precipitation falls as snow, although rain-on-snow events have become more frequent due to the recently observed warming of the local climate (Nowak and Hodson, 2013; López-Moreno et al., 2016). The perennial snow cover usually forms in late September or early October and lasts until mid June to early July, with a roughly month-long melt season (Winther et al., 2002). Measurements at the Bayelva site show that the energy available for snow melt is dominated by radiation, both shortwave and longwave (Boike et al., 2003; Westermann et al., 2009). In addition, the energy required to warm the frozen ground beneath the snow needs to be considered in the energy balance during the snow melt (Boike et al., 2003).

Field measurements

Manual surveys of snow depth and density were carried out in late April/early May for six years at the Bayelva site (see Gisnås et al., 2014, for details) and for one year (2016) at the two other sites (Table 1). At this time, just before the onset of the melt, the snow depth is near its maximum but the snowpack is still dry. The snow density was sampled in vertical layers at every fifth point. As no systematic stratification of the snow density was found, SWE was finally calculated from the snow depth and the average snow density computed from all measurements in a given year. At Bayelva, the snow density was generally confined to a range of 350±50 kg m−3 for all the surveys, while at the Steinflåen plateau, the snow density was found to be around 450 kg m−3 in 2016. At Kvadehuksletta and the Steinflåen plateau, the surveys were conducted along transects with regular sample intervals (see Figure 1), while a randomized array of sample points (see Gisnås et al., 2014, for details) was employed for Bayelva in most years, apart from the first years where transects were also used for Bayelva (see Table 1). Basal ice layers occur in the area in conjunction with rain-on-snow events (Kohler and Aanes, 2004; Westermann et al., 2011a), and can constitute a major source of uncertainty for SWE measurements, both for single measurements and the resulting spatial distribution. In 2016, the depth of basal ice layers was measured using ice screws and their contribution to the SWE was accounted for. In addition, unaccounted-for internal ice layers and the spatial variability of average snow densities (see above) contribute to the uncertainty of the measurements. Finally, only a limited number of sampling points is considered, so that the obtained snow distributions are expected to deviate to a certain extent from the true snow distributions in the area (c.f. Gisnås et al., 2014).
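The conversion from probed snow depths and a single survey-average bulk density to SWE can be sketched as follows (a minimal Python illustration with made-up transect values; `swe_m_we` is our own helper, not code from the paper):

```python
# Hypothetical sketch: converting manual snow-depth probings to SWE using a
# single bulk density per survey, as described in the text.
RHO_WATER = 1000.0  # density of water (kg m-3)

def swe_m_we(depth_m, bulk_density):
    """SWE in metres water equivalent from snow depth (m) and bulk density (kg m-3)."""
    return depth_m * bulk_density / RHO_WATER

depths = [0.4, 1.1, 0.0, 0.75]              # illustrative depths along a transect (m)
rho = 350.0                                  # Bayelva-like bulk density (kg m-3)
swe = [swe_m_we(d, rho) for d in depths]     # per-point SWE (m w.e.)
mean_swe = sum(swe) / len(swe)               # transect mean SWE (m w.e.)
```

With a single density per survey, the transect-mean SWE is simply the mean depth scaled by the density ratio.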
In 2012, 2013 and 2016, an automatic time-lapse camera was deployed near the summit of Scheteligfjellet (694 m a.s.l.; c.f. Figure 1) overlooking the Bayelva site. The camera was a standard digital camera triggered by a Harbortronics time-lapse system, delivering daily images except for prolonged periods with low cloud cover. The raw camera images were orthorectified at a 1 m resolution and the snow cover was mapped for each pixel using a simple threshold on the grayscale intensity, so that fSCA could be determined for each image. The orthorectified images for most of the years are freely available through Westermann et al. (2015a). In 2008, aerial images were obtained for the Bayelva site for five dates in June during the beginning of the snow melt period. For this purpose a digital camera was mounted to a UAV flying at an altitude of 100 to 250 m above ground, which took between 700 and 1000 images per mission at nadir angles. As the images were taken in a near-random fashion over the entire area, fSCA was calculated by averaging over the fSCA determined for each image using a simple threshold criterion. Note that orthorectification of the images is not necessary since a sufficient number of images taken from random positions and/or altitudes is available. GPS-based surveys of remaining snow patches were available for five additional dates, so that a complete fSCA time series is available for the snow melt period in 2008.

Simple snow model (SSM)

The model is a blend of a single-layer mass balance model, based on the UEB model (detailed in Tarboton and Luce, 1996; You et al., 2014), and a snow depletion curve parametrization based on the subgrid snow distribution submodel described in Liston (2004). Snow internal processes are not considered. In the following sections, we describe the governing equations of the SSM (see Table 2 for an explanation of constants).

Snow depletion curve
We use the probabilistic snow depletion curve formulations discussed in Liston (1999), Luce and Tarboton (2004) and Liston (2004), which parametrize the relationship between fSCA and SWE by using a probability density function (pdf) to represent the subgrid SWE distribution (SSD). A central assumption in these parametrizations is that the melt rate is uniform throughout the given grid cell. The relationship between the accumulated melt depth (D_m), the peak SSD pdf, which we denote f_P, and the fSCA within the grid cell at time t is given by

fSCA(t) = ∫_{D_m(t)}^{∞} f_P(D) dD.    (1)

Similarly, the mean SWE depth is given by

D̄(t) = ∫_{D_m(t)}^{∞} [D − D_m(t)] f_P(D) dD.    (2)

Following Liston (2004) we parametrize the peak SSD using a two-parameter lognormal pdf f_P = f_P(D | µ, χ), where µ is the peak mean SWE and χ = σ/µ is the peak subgrid coefficient of variation (σ is the standard deviation). Our choice of parametric distribution was motivated by independent measurements of the SSD, which fit reasonably well to a lognormal distribution for most of the years. Equations (1) and (2) can both be solved analytically as presented in Liston (2004).
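As an illustration, the analytic solutions of (1) and (2) for the lognormal case can be sketched as follows (our own minimal Python implementation of the standard lognormal tail formulas, not the paper's code; `sdc_lognormal` and its argument names are our own):

```python
from math import log, sqrt, erf

def _Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def sdc_lognormal(mu, chi, d_m):
    """fSCA and mean SWE for a lognormal peak SSD with mean mu and
    coefficient of variation chi, after a uniform melt depth d_m.
    Uses the closed-form tail integrals of the lognormal distribution."""
    if d_m <= 0.0:
        return 1.0, mu
    zeta2 = log(1.0 + chi * chi)          # log-space variance
    zeta = sqrt(zeta2)
    lam = log(mu) - 0.5 * zeta2           # log-space mean, so E[D] = mu
    fsca = _Phi((lam - log(d_m)) / zeta)  # P(D > d_m), i.e. eq. (1)
    # E[(D - d_m)^+], i.e. eq. (2), via the lognormal partial expectation
    swe = mu * _Phi((lam + zeta2 - log(d_m)) / zeta) - d_m * fsca
    return fsca, swe
```

Both fSCA and mean SWE decrease monotonically as the melt depth grows, which is the depletion-curve behaviour exploited in the assimilation.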

Mass and Energy Balance
To obtain the instantaneous net accumulation rate, A(t), we follow the UEB model (You et al., 2014) through

A(t) = P(t) − M(t),    (3)

where P(t) is the precipitation rate and M(t) is the melt rate. Sublimation is not considered as it is a relatively small contribution to the mass balance at our study area (Westermann et al., 2009). We use a linear transition to delineate between snowfall and rainfall as in You et al. (2014), with thresholds given in Table 2. We only consider rainfall as a positive contribution to
the mass balance during non-melting conditions when the rainwater generally refreezes in the snowpack (Westermann et al., 2011a). For melting conditions (where D m > 0), we assume that rainfall becomes runoff directly.
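The linear rain/snow transition can be sketched as follows (the threshold values used here are illustrative placeholders, not the actual Table 2 values, and the function name is our own):

```python
def snow_fraction(t_air, t_snow=-1.0, t_rain=3.0):
    """Fraction of precipitation falling as snow, with a linear
    transition between the all-snow threshold t_snow and the all-rain
    threshold t_rain (air temperatures in deg C). Threshold values
    here are illustrative, not those of Table 2."""
    if t_air <= t_snow:
        return 1.0                         # all precipitation is snow
    if t_air >= t_rain:
        return 0.0                         # all precipitation is rain
    return (t_rain - t_air) / (t_rain - t_snow)  # linear blend
```

The snowfall rate is then the precipitation rate multiplied by this fraction, with the remainder treated as rainfall.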
The melt rate, M, is calculated based on a simplified snow energy balance. Here SSM differs from UEB in that we assume that the snowpack is always isothermal and at 0 °C under melting conditions. In this case, the energy balance becomes

Q_M = Q*_R + Q_P − Q_H − Q_E − Q_G,    (4)

where Q_M is the snow melt flux, Q*_R is the global radiation, Q_P is the heat advected by precipitation, Q_H is the sensible heat flux, Q_E is the latent heat flux and Q_G is the ground heat flux. The last three fluxes are defined as positive when directed away from the surface and vice versa for the first two on the right-hand side of (4). The global radiation is given by

Q*_R = (1 − α_S) S↓ + L↓ − σ_SB T_0^4,    (5)

in which S↓ and L↓ are the downwelling shortwave and longwave irradiances and the last term is the upwelling longwave radiation for the isothermal snowpack at T_0 = 273.15 K. The snow albedo (α_S) is parametrized prognostically through the continuous reset formulation (Dutra et al., 2010), which differentiates between accumulating, steady and ablating conditions as follows for a time increment ∆t

α_S(t + ∆t) = α_S(t) + min(P_s ∆t / τ_S, 1) [α_max − α_S(t)]   (accumulating)
α_S(t + ∆t) = α_S(t) − τ_A ∆t                                  (steady, non-melting)
α_S(t + ∆t) = [α_S(t) − α_min] exp(−τ_F ∆t) + α_min            (ablating)    (6)

where α_min and α_max are the minimum and maximum snow albedo values respectively, while τ_A and τ_F are aging (decay) rates for non-melting and melting snow respectively, and finally τ_S is a threshold daily snowfall which, if exceeded, leads to a reset of the snow albedo to its maximum value. This simple decay-and-reset type of snow albedo parametrization has been shown to perform reasonably well at our study site in the study of Pedersen and Winther (2005). The heat advected by rainfall, Q_P, is diagnosed as in Tarboton and Luce (1996), while the turbulent fluxes of sensible (Q_H) and latent (Q_E) heat are evaluated through Monin-Obukhov similarity theory as presented in Westermann et al. (2016a). The ground heat flux is parametrized through a simple e-folding relationship during the melting period, i.e.

Q_G(t_m) = Q_0 exp(−d_H t_m / z_E^2),    (7)
where Q_0 is the initial ground heat flux, d_H is the thermal diffusivity of the ground and z_E is the effective depth of the heat transfer below the base of the snowpack, while t_m is the number of days with melting conditions after peak accumulation. With all terms in the energy balance diagnosed, the snow melt flux Q_M can now be evaluated through eq. (4). We recall that an isothermal snowpack at 0 °C is assumed for (3) and (4), which is only justified for a melting snowpack: in this case, positive Q_M corresponds to melting and SWE reduction, while negative values correspond to refreezing of melt water and thus SWE increase. For a dry snowpack (as is generally the case before the snow melt), negative Q_M values would lead to a cooling of the snowpack, which is not considered in this simple snow melt scheme. For simplicity, we work with a daily time step. To discard unphysical values (negative melt rates) we only consider days with net melting conditions, i.e. positive daily average snow melt fluxes. Thus, the daily averaged melt rate M_n at day n (lasting from t_n to t_{n+1}) is obtained through

M_n = max(0, (1/∆t) ∫_{t_n}^{t_{n+1}} Q_M(t) / (ρ_w L_f) dt),    (8)

where ρ_w is the density of fresh water, L_f is the latent heat of fusion and ∆t is the daily time step. It is worth emphasizing that with the formulation in (8) the effects of refreezing are still considered at subdaily time resolution. Similarly, the daily averaged precipitation rate is obtained through

P_n = (1/∆t) ∫_{t_n}^{t_{n+1}} P(t) dt,    (9)

where we reiterate that P(t) only includes the snowfall during the melt season. Now the daily averaged net accumulation rate is obtained through

A_n = P_n − M_n,    (10)

and the accumulated melt depth D_m is accounted for through

D_m^{n+1} = max(D_m^n + (M_n − P_n) ∆t, 0).    (11)

The peak mean SWE µ is updated via (13), in which µ is only non-zero if Λ exceeds the threshold τ_S. Note that the formulation in (11) gradually resets the melt depth towards zero in the case of snowfall after the onset of melt, following Liston (2004).
This means that fSCA is not reset to unity in the case of new snowfall after a melting period, unless the new snowfall leads to an increase in the peak SWE. In the study area, snowfall events occurred only rarely during the snow melt period and the new snow cover lasted only a short time. For sites where such events are more frequent, Durand et al. (2008) present an alternative solution, albeit at an increased computational expense. The initial conditions for the SSM are quite simple: the model integration is carried out independently for each hydrological year, starting in the beginning of September, where we assume that the surface is free of snow so that both µ and D_m are initialized as zero. Moreover, both µ and D_m are reset to zero following the complete disappearance of the snowpack within a grid cell, defined (due to the infinite tail of f_P) as when fSCA < 0.01.
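The daily melt-rate bookkeeping of (8) can be sketched as follows (a discrete Python approximation of the integral over subdaily fluxes; the constants are textbook values, not necessarily those of Table 2, and the function name is our own):

```python
RHO_W = 1000.0      # density of fresh water (kg m-3)
L_F = 334000.0      # latent heat of fusion (J kg-1), approximate textbook value
DAY = 86400.0       # seconds per day

def daily_melt_depth(q_m_subdaily):
    """Daily melt depth (m w.e.) from a list of subdaily snow-melt
    fluxes Q_M (W m-2). Days with a negative average flux (net
    refreezing) yield zero melt, while subdaily refreezing episodes
    still offset melt through the averaging, as in eq. (8)."""
    q_bar = sum(q_m_subdaily) / len(q_m_subdaily)   # daily mean flux
    return max(q_bar, 0.0) * DAY / (RHO_W * L_F)    # clip, convert to m w.e.
```

A constant flux of 334 W m-2 over a full day, for instance, melts 86.4 mm w.e., which gives a feel for the magnitudes involved during the roughly month-long melt season.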

Forcing
Forcing terms in the form of precipitation, air temperature, relative humidity and wind speed, as well as downwelling longwave and shortwave radiation, are required to diagnose the mass and energy balance in (3) and (4). These terms were obtained by downscaling ERA-Interim reanalysis data (Dee et al., 2011) at 0.75° resolution following Østby et al. (2017). This method uses the linear theory of orographic precipitation in Smith and Barstad (2004).

Satellite retrievals
We make use of satellite retrievals between May and September, a window which contained the snow melt period in all the investigated years. Only observations that fell inside the melt season were assimilated, as this is where information about the true snow cover depletion is contained. Due to frequent cloud cover, the effective revisit frequency of fSCA retrievals is irregular, with prolonged data gaps occurring regularly.

MODIS

Terra and Aqua MODIS fSCA retrievals are derived from the normalized difference snow index (NDSI; Salomonson and Appel, 2004, 2006). The NDSI exploits the fact that snow is highly reflective in the visible, but a good absorber in the shortwave infrared, which sets it apart from other natural surfaces such as clouds, vegetation and soil (Painter et al., 2009).
We average over all the pixels considered for each study area as shown in Figure 1. If both Terra and Aqua MODIS fSCA retrievals are available for a given day, the Terra MODIS retrievals are prioritized due to the faulty Aqua MODIS band 6 detectors and the subsequent use of band 7 in the NDSI causing greater pixel misregistration (Salomonson and Appel, 2006). If only Aqua MODIS fSCA retrievals are available for a given pixel, these are still used to maximize the temporal coverage of the snow cover depletion. Despite small deviations in the measurement footprint (Figure 1), we compare MODIS fSCA retrievals to the ground truth obtained from the automatic camera system, UAV surveys and ground-based measurements (Section 2.2).
From this comparison, we estimate an RMSE of σ_MOD = 0.13 for the MODIS retrievals, which is used for the observation error covariance matrix in the assimilation step (Section 3.3.2).

Sentinel-2
For the year 2016, we complement the MODIS fSCA retrievals with aggregated 20 m resolution retrievals from the Sentinel-2A and 2B missions (Drusch et al., 2012). fSCA estimates are derived from the level 1C orthorectified top of the atmosphere (TOA) reflectance product, with cloud-free scenes manually selected. For this purpose, the NDSI is computed from the reflectances (r) of a visible band (b3, centered on 0.56 µm) and a shortwave infrared band (b11, centered on 1.61 µm) through

NDSI = (r_b3 − r_b11) / (r_b3 + r_b11)

(see Figure 2). By comparing the Sentinel-2 retrievals to the ground truth fSCA from the automatic camera system in 2016, we estimate an RMSE of σ_S2 = 0.09, which is used for the observation error covariance matrix in the assimilation step.
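The NDSI-to-fSCA aggregation described above can be sketched as follows. This is a minimal illustration, not the operational retrieval: the 0.4 snow/no-snow threshold and the toy reflectance values are assumptions, and the actual products may derive per-pixel fractional values rather than a binary classification.

```python
import numpy as np

def fsca_from_ndsi(r_b3, r_b11, ndsi_threshold=0.4):
    """Aggregate a fractional snow covered area from per-pixel NDSI.

    r_b3, r_b11 : 2-D arrays of TOA reflectance in the visible (b3, 0.56 um)
    and shortwave infrared (b11, 1.61 um) bands. The 0.4 threshold is a
    commonly used NDSI cutoff and is an assumption here, not a value
    stated in the text.
    """
    ndsi = (r_b3 - r_b11) / (r_b3 + r_b11)
    snow = ndsi > ndsi_threshold   # binary snow / no-snow map
    return snow.mean()             # fraction of snow pixels = fSCA

# Toy scene: left half snow-like (bright visible, dark SWIR), right half bare.
r_b3 = np.hstack([np.full((10, 5), 0.8), np.full((10, 5), 0.2)])
r_b11 = np.hstack([np.full((10, 5), 0.1), np.full((10, 5), 0.3)])
print(fsca_from_ndsi(r_b3, r_b11))  # 0.5
```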

Ensemble Data Assimilation
In the following section we first describe how the prior ensemble of model realizations is set up, and subsequently how it is updated to a posterior ensemble through the assimilation of Terra/Aqua MODIS and Sentinel-2 fSCA retrievals to constrain the SWE distribution, using a novel land surface data assimilation scheme adopted from the petroleum reservoir history matching community (Emerick and Reynolds, 2013) in conjunction with analytical Gaussian anamorphosis (Bertino et al., 2003).

Ensemble Generation
The prior ensemble of SSM realizations is generated by independently drawing parameter values from the distributions listed in Table 3. These parameters are held constant throughout the integration of the SSM. Two of these are multiplicative bias parameters that perturb the mass balance through the net accumulation rate for each ensemble member j ∈ 1 : N_e, where N_e is the number of ensemble members. We inherently assume the model forcing to be the major source of uncertainty (De Lannoy et al., 2010; Raleigh et al., 2015), and that the error in the forcing can be modeled through constant multiplicative biases in the mass balance. Consequently, each of these bias parameters is modeled as a positive definite lognormal random variable. This is in line with the perturbations of the precipitation rate in Girotto et al. (2014b), but we also perturb the melt rate. Moreover, we assume that the ensemble of net accumulation rates is on average unbiased due to the applied downscaling method (Østby et al., 2017), and thus assign the two bias parameters a mean of unity. The precipitation rates are also perturbed by the same bias parameter in the computation of the heat advected by precipitation (Q_P) in the surface energy balance, which contributes to the melt rate M_n. In addition to the mass balance forcing, the peak subgrid coefficient of variation χ (Section 3.3.1) is a source of uncertainty.
Here our prior assumption is that χ is on average close to the value given in the dichotomous key (simple classification scheme) of Liston (2004). For the tundra environment of our study sites this amounts to a mean value of 0.4. In addition, we assume that χ is double bounded between 0 and 0.8, with negative values being unphysical and the upper bound close to the maximum value in the key of Liston (2004), and thus model it as a logit-normal random variable. Likewise, both the initial ground heat flux at the onset of melt Q_0 and the minimum snow albedo α_min are uncertain, and we model these parameters as double bounded logit-normal random variables (see Table 3 for values). The logit transform for a variable x bounded between a and b is given by

ψ(x) = ln [(x − a) / (b − x)],     (15)

while the inverse transform is given by

ψ⁻¹(y) = a + (b − a) / (1 + e⁻ʸ).     (16)

To generate a prior ensemble of a logit-normally distributed random variable, we apply the logit transform to the mean, add N_e ensemble members of Gaussian white noise with a consistent variance, and then apply the inverse transform. We emphasize that by perturbing the forcing, ground heat flux and coefficient of variation and performing an ensemble integration of the SSM, we also obtain an ensemble of states (fSCA_n,j, D_m,n,j, D̄_n,j), remaining parameters (µ_j) and pdfs (f_P,j) that are consistent with the fixed prior parameter ensemble.
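The prior generation steps above can be sketched in a few lines of numpy. This is a simplified illustration under stated assumptions: the spread values (sigma = 0.3 for the bias parameters, 0.5 in transformed space for χ) are illustrative, not taken from Table 3.

```python
import numpy as np

rng = np.random.default_rng(0)
N_e = 100  # ensemble size

def logit(x, a, b):
    """Logit transform (Eq. 15) for x double bounded between a and b."""
    return np.log((x - a) / (b - x))

def inv_logit(y, a, b):
    """Inverse logit transform (Eq. 16)."""
    return a + (b - a) / (1.0 + np.exp(-y))

def logit_normal_ensemble(mean, sd, a, b, n=N_e):
    # Transform the mean, add Gaussian white noise in transformed space,
    # then map back to the bounded physical space.
    return inv_logit(logit(mean, a, b) + sd * rng.standard_normal(n), a, b)

# Multiplicative bias parameters: lognormal with mean unity. sigma = 0.3 is
# illustrative; the -sigma^2/2 shift makes the expectation (not just the
# median) equal to one.
sigma = 0.3
b_acc = np.exp(-0.5 * sigma**2 + sigma * rng.standard_normal(N_e))   # accumulation bias
b_melt = np.exp(-0.5 * sigma**2 + sigma * rng.standard_normal(N_e))  # melt bias

# Peak subgrid coefficient of variation: logit-normal, mean 0.4, bounded (0, 0.8).
chi = logit_normal_ensemble(0.4, 0.5, 0.0, 0.8)
print(chi.min() > 0.0 and chi.max() < 0.8)  # True: the bounds are respected
```

Because the noise is added in the transformed space, every realization of χ respects the physical bounds by construction, which is the point of the anamorphosis.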

Herein we describe the main batch smoother scheme used in the assimilation, namely the ES-MDA, in addition to the ES and the PBS that are used for comparison. In a batch smoother, all the observations, in this case all fSCA retrievals from the snow cover depletion during a given melt season, are assimilated at once in a single batch (Dunne and Entekhabi, 2005), as opposed to sequentially as in a filter (Bertino et al., 2003). For clarity, we follow as closely as possible the conventional notation in the data assimilation literature as laid out in Ide et al. (1997). Collecting the predicted observations during the ensemble integration into a batch and performing the analysis step is referred to as one assimilation cycle, and we let the current iteration number be denoted as ℓ. In such a case, the ES-MDA scheme is set up as follows, for ℓ ∈ 0 : N_a iterations:

1. Run an ensemble model integration, that is, for n ∈ 0 : (N_t − 1) time steps compute

X_{n+1,j}^{(ℓ)} = M(X_{n,j}^{(ℓ)}, Θ_j^{(ℓ)}),

where M is the SSM operator defined through equations (1), (2), (11) and (13).

Collect the batch of predicted observations into an N_o × N_e matrix Ŷ^{(ℓ)}, with one column of modeled fSCA values per ensemble member, and the perturbed observations

D^{(ℓ)} = d ⊗ 1^T + √(α^{(ℓ)}) E^{(ℓ)},

where ⊗ is the outer product, 1 is an N_o × 1 vector of ones, the T superscript denotes the transpose, α^{(ℓ)} is the observation error inflation coefficient, and the columns of E^{(ℓ)} are drawn from N(0, R). Then transform the parameters, Θ̃^{(ℓ)} = ψ(Θ^{(ℓ)}), where ψ is the natural logarithm transform for the bias parameters and the logit transform (15) for the remaining parameters.

Perform the Kalman-like analysis step in the transformed space

Θ̃^{(ℓ+1)} = Θ̃^{(ℓ)} + C_{Θ̃Ŷ}^{(ℓ)} (C_{ŶŶ}^{(ℓ)} + α^{(ℓ)} R)^{−1} (D^{(ℓ)} − Ŷ^{(ℓ)}),     (21)

where the transformed parameter-predicted observation error covariance matrix and the predicted observation error covariance matrix are given by

C_{Θ̃Ŷ}^{(ℓ)} = Θ̃'^{(ℓ)} (Ŷ'^{(ℓ)})^T / (N_e − 1)  and  C_{ŶŶ}^{(ℓ)} = Ŷ'^{(ℓ)} (Ŷ'^{(ℓ)})^T / (N_e − 1),

respectively, in which primes (') denote anomalies (deviations from the ensemble mean).

Apply the appropriate inverse transforms to recover the updated parameters, Θ^{(ℓ+1)} = ψ^{−1}(Θ̃^{(ℓ+1)}),
where ψ^{−1} is the exponential transform for the bias parameters and the inverse logit transform (16) for the remaining parameters.
The ensemble smoother (ES; Van Leeuwen and Evensen, 1996) scheme is recovered in the algorithm above for N_a = α^{(ℓ)} = 1, that is, without inflating the observation error covariance and in the absence of iterative analysis steps. This is the algorithm that was used in the probabilistic SWE reconstruction schemes of Durand et al. (2008) and Girotto et al. (2014b). It is the observation error inflation coefficient α^{(ℓ)} in (21), along with the iterations, that sets the ES-MDA scheme apart from the traditional ES scheme. The idea of the ES-MDA is to perform multiple smaller analysis steps as opposed to one abrupt analysis step. In the case of a non-linear model this is expected to yield a better approximation of the true posterior (Emerick and Reynolds, 2013). For the ES-MDA to give a nearly unbiased estimate (c.f. Stordal and Elsheikh, 2015) it is required that the coefficients satisfy

Σ_{ℓ=0}^{N_a−1} 1/α^{(ℓ)} = 1.

In our case this is accomplished by setting all the coefficients to α^{(ℓ)} = N_a and specifying N_a before any assimilation cycles are carried out. As an alternative that is not pursued here, an adaptive form of the scheme is presented in Le et al. (2016), where the inflation coefficients are calculated on the fly based on a cost function and the iterations stop once the algorithm converges. It is worth emphasizing that the analysis step (21) only updates the parameters; a consistent ensemble of states is found from the subsequent ensemble model integration. As mentioned, the parameter matrix Θ in (21) is transformed through analytical Gaussian anamorphosis (Bertino et al., 2003) to ensure that the priors are Gaussian, in which case the ensemble smoother-based analysis step is variance minimizing for a linear model (Van Leeuwen and Evensen, 1996). The entire methodology, with ES-MDA as the DA scheme, is depicted in Figure 3.
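The ES-MDA loop can be condensed into a short numpy sketch. This is a generic illustration, not the authors' implementation: the forward operator, the linear toy problem and all numerical values are assumptions, and the Gaussian anamorphosis is omitted (the parameters are assumed already transformed). Setting N_a = 1 recovers the plain ES.

```python
import numpy as np

rng = np.random.default_rng(1)

def es_mda(theta, forward, d, R, N_a=4):
    """ES-MDA sketch: theta is the (N_p x N_e) transformed parameter
    ensemble, forward maps it to predicted observations (N_o x N_e),
    d is the observation vector and R the observation error covariance.
    All inflation coefficients are set to alpha = N_a, so that the sum
    of 1/alpha over the cycles equals one."""
    N_e = theta.shape[1]
    N_o = d.size
    for _ in range(N_a):
        alpha = N_a
        Y = forward(theta)                       # ensemble model integration
        # Perturbed observations with inflated error (columns ~ N(d, alpha R))
        D = d[:, None] + np.sqrt(alpha) * np.linalg.cholesky(R) @ rng.standard_normal((N_o, N_e))
        Tp = theta - theta.mean(axis=1, keepdims=True)  # parameter anomalies
        Yp = Y - Y.mean(axis=1, keepdims=True)          # predicted obs anomalies
        C_ty = Tp @ Yp.T / (N_e - 1)
        C_yy = Yp @ Yp.T / (N_e - 1)
        # Kalman-like analysis step (Eq. 21)
        theta = theta + C_ty @ np.linalg.solve(C_yy + alpha * R, D - Y)
    return theta

# Linear toy problem: recover theta_true = 2 from five noisy observations.
H = np.ones((5, 1))
theta0 = 2.0 * rng.standard_normal((1, 200))            # broad prior around 0
d = H @ np.array([2.0]) + 0.1 * rng.standard_normal(5)
post = es_mda(theta0, lambda t: H @ t, d, 0.01 * np.eye(5))
print(float(post.mean()))  # close to 2.0
```

For a linear forward operator the iterations are not strictly needed; their benefit appears for non-linear models such as the SSM, where several small corrections approximate the posterior better than one abrupt one.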
In the particle batch smoother (PBS), as introduced for snow assimilation by Margulis et al. (2015), an initial ensemble integration of the model is carried out to obtain the prior state and parameter estimates. A priori, each particle (i.e. ensemble member; Van Leeuwen, 2009) is given an equal weight of 1/N_e. Subsequently, following Margulis et al. (2015), the normalized posterior importance weights w_j ∈ [0, 1] are diagnosed through the expression

w_j = p(Y | X̃_j) p(X̃_j) / Σ_{k=1}^{N_e} p(Y | X̃_k) p(X̃_k),     (25)

where X̃_j = [X_j; Θ_j] is the augmented state vector for the j-th particle and the Gaussian likelihoods are given by

p(Y | X̃_j) = c_0 exp[−(Y − Ŷ_j)^T R^{−1} (Y − Ŷ_j) / 2].     (26)

Note in particular that (25) is a direct application of Bayes' rule, with the normalizing denominator having two important consequences. Firstly, c_0 = 1/√((2π)^{N_o} |R|) cancels out, thus avoiding errors introduced through floating point arithmetic ((2π)^{N_o} is a relatively large number). Secondly, the prior weights p(X̃_j) also fall out as they are equal (1/N_e) for all particles. Thereby, with this choice of likelihood, the analysis step (25) simplifies to a simple ratio of likelihoods

w_j = exp[−(Y − Ŷ_j)^T R^{−1} (Y − Ŷ_j) / 2] / Σ_{k=1}^{N_e} exp[−(Y − Ŷ_k)^T R^{−1} (Y − Ŷ_k) / 2],     (27)

where the posterior weights w_j sum to unity. The posterior ensemble will still span the range of the prior ensemble, as the analysis step only changes the relative weights of the ensemble members and not their position within the state and parameter space. From the weighted ensemble, posterior statistics such as the empirical cumulative distribution function are readily recovered, thus enabling a diagnosis of quantile values. Note that the PBS is equivalent to running a particle filter without re-sampling and using the prior as the importance density (see Van Leeuwen, 2009). Consequently, the PBS is in itself not novel as a data assimilation scheme; in fact, it is very similar to the generalized likelihood uncertainty estimation method (GLUE; Beven and Binley, 1992) with the choice of a formal Gaussian likelihood function.
As discussed in Van Leeuwen (2009), due to the absence of re-sampling, even for medium-dimensional systems with a large number of observations to be assimilated, we expect the PBS to be degenerate, with only a few particles carrying the majority of the importance weights.
Nevertheless, a major advantage of the PBS is its computational efficiency, requiring only one ensemble model integration and one efficient analysis step (27). In this study, the PBS and the ES are used to benchmark the performance of the ES-MDA.
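The likelihood-ratio analysis step (27) is a one-liner in practice. The sketch below works in log space before normalizing, which realizes the same cancellation of c_0 discussed above and avoids underflow; the toy values are illustrative.

```python
import numpy as np

def pbs_weights(Y, d, r_var):
    """Normalized PBS importance weights (Eq. 27): a ratio of Gaussian
    likelihoods in which the constant c0 cancels. Y is (N_o x N_e)
    predicted observations, d the (N_o,) observation vector, r_var the
    observation error variances (diagonal of R). Computed in log space
    for numerical stability."""
    resid = Y - d[:, None]
    loglik = -0.5 * np.sum(resid**2 / r_var[:, None], axis=0)
    loglik -= loglik.max()        # shift before exponentiating (cancels out)
    w = np.exp(loglik)
    return w / w.sum()            # posterior weights sum to unity

# Three particles; the second matches the observations exactly.
Y = np.array([[0.9, 0.5, 0.1],
              [0.6, 0.4, 0.0]])
d = np.array([0.5, 0.4])
w = pbs_weights(Y, d, np.full(2, 0.13**2))
print(int(w.argmax()))  # 1
```

Note that the update changes only the weights; the particles themselves stay where the prior put them, which is the root of the bias and degeneracy issues discussed below.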

Interannual variability and comparison to in-situ measurements
Here, we present results of the ES-MDA scheme with 100 ensemble members and 4 assimilation cycles (Section 3.3.2) for all the years and sites where snow surveys were conducted. Figure 4 shows the resulting prior and posterior estimates together with the observations. Some observed distributions, such as that for Kvadehuksletta 2016, are hard to match as they do not conform well to a lognormal distribution, most likely due to the limited number of sample points (Section 2.2).

Table 4. Summary of evaluation metrics, i.e. bias, time-averaged ensemble RMSE and square correlation coefficient (R²), for the independently observed variables/parameters fSCA, peak SWE (µ) and peak subgrid coefficient of variation (χ). The number of observations is given in brackets next to the corresponding symbols. All metrics are averaged over 100 independent runs, each with 100 ensemble members. The ES-MDA was run with N_a = 4 assimilation cycles.
In addition to the ES-MDA scheme, we evaluate the PBS and ES (Section 3.3.2) with regard to in-situ measurements, using an ensemble size of 100 members for all schemes. Three error metrics are considered: the bias (mean error), the RMSE and the square correlation coefficient; the results are summarized in Table 4. For the fSCA, all the schemes achieved a major improvement relative to the prior, with an almost tenfold reduction in bias, a halving of the RMSE and an almost perfect correlation to the in-situ fSCA observations. For the peak mean SWE, the PBS performs best in terms of RMSE and bias, followed closely by the ES-MDA, which in turn has the highest correlation coefficient. With regard to the peak subgrid coefficient of variation, the ES-MDA performs best across all the metrics, tying with the ES for (absolute) bias and with the PBS for RMSE. As considerably more in-situ observations are available for fSCA than for SWE, the evaluation for fSCA must be considered more robust.

Observed, prior and posterior peak mean SWE and peak subgrid coefficient of variation for the different years/sites are shown in Figure 6. As discussed in Section 4.1, the assimilation tends to move the posterior peak mean SWE estimates closer to the observed mean SWE when compared to the prior. However, clear performance differences are found between the schemes in a number of situations: in 2008, the PBS is not able to correct for as much of the bias in the peak mean SWE as the other schemes (see also Table 4). Similar to the peak mean SWE, the PBS shows signs of degeneracy (e.g. Bayelva 2009, complete degeneracy) for the coefficient of variation estimation. On some occasions (e.g. Bayelva 2008, Bayelva 2009 and Bayelva 2014), the posterior ensemble median is effectively pulled closer to the observed coefficient of variation when compared to the prior. Moreover, on the same occasions the ensemble spread is slightly constrained.
When compared to the peak mean SWE, however, it seems much harder to constrain estimates of the coefficient of variation regardless of the scheme, although it is possible to shift the ensemble in the right direction.
We gauged the sensitivity of the three batch smoother schemes with respect to ensemble size and the number of assimilation cycles by considering the fractional improvement (FI) in RMSE that was achieved through the analysis step for all available 'ground truth' observations. Figure 7 demonstrates the differences in performance with regard to this metric. On the one hand, the PBS requires an ensemble size of 1000 to converge to a stable FI of around 75%, 20% and 60% for the fSCA, peak subgrid coefficient of variation and peak mean SWE, respectively. On the other hand, the ES-MDA with 4 assimilation cycles converges already at 100 ensemble members, with an FI similar to the PBS. The ES performs worst regardless of ensemble size, with an FI of around 70%, 10% and 55% even with 10⁵ ensemble members, while requiring 100 ensemble members for convergent results.
For all schemes, the greatest improvements were achieved for fSCA, followed by peak mean SWE, while by far the lowest improvements were achieved for the peak subgrid coefficient of variation. With an ensemble size of 100, the ES-MDA required around 4 assimilation cycles to converge to a stable performance, i.e. there is no marked improvement in FI for additional cycles (Figure 7, bottom right). The results for the ES-MDA must therefore be considered particularly favorable, as stable performance is achieved with only 4 assimilation cycles and 100 ensemble members.
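The FI metric itself can be written compactly. The definition assumed here, consistent with how the numbers above read, is the share of the prior RMSE removed by the analysis step; the sample values are illustrative, not results from the paper.

```python
import numpy as np

def fractional_improvement(truth, prior, posterior):
    """FI = (RMSE_prior - RMSE_posterior) / RMSE_prior, i.e. the fraction
    of the prior RMSE removed by the analysis step (definition assumed
    here; expressed as a fraction, not a percentage)."""
    truth = np.asarray(truth)
    rmse = lambda x: np.sqrt(np.mean((np.asarray(x) - truth) ** 2))
    return (rmse(prior) - rmse(posterior)) / rmse(prior)

# Illustrative fSCA values: the posterior sits much closer to the truth.
truth = [0.30, 0.45, 0.20]
prior = [0.50, 0.25, 0.40]
post  = [0.32, 0.43, 0.22]
print(round(fractional_improvement(truth, prior, post), 2))  # 0.9
```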

The effects of observation error were studied by running the ES-MDA (N_e = 10², N_a = 4) and assimilating first only MODIS observations and then both MODIS and Sentinel-2 observations, the latter with significantly lower observation error (Section 3.2), for the 2016 snow season at all study sites. As previously discussed, the Sentinel-2 fSCA retrievals are based on higher resolution optical reflectance retrievals. As such, they are expected to be less prone to representativeness error (i.e. error due to differences in the measurement footprint), and thus observation error, in that the area in which the snow surveys were conducted is more accurately covered by the retrievals. Furthermore, the Sentinel-2 scenes used for fSCA retrievals were manually checked to be cloud free; this was not the case for the MODIS scenes. Table 5 summarizes various performance metrics for the two runs. For the peak mean SWE depth (µ), there is no difference when including Sentinel-2 fSCA retrievals in the assimilation. For the coefficient of variation, however, there is a noticeable increase in FI for both the bias and the RMSE, as well as a considerable increase in the correlation coefficient. This improvement in the coefficient of variation estimation is expected, because including Sentinel-2 fSCA retrievals with lower observation error further constrains the shape of the snow depletion curve. Since the shape of the depletion curve is closely tied to the value of the peak subgrid coefficient of variation (χ), the inclusion of even just a few Sentinel-2 fSCA retrievals with lower error can add considerable information to better constrain the value of χ. The assimilation thus constrains the estimates while at the same time accounting for uncertainties in the retrievals. In addition, the ES-MDA is generally able to correct the prior estimates of the peak mean SWE towards the independently observed values, which is essentially achieved through a bias correction on the forcing.
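Mixing sensors with different accuracies amounts to populating the diagonal of the observation error covariance R with sensor-specific variances, using the RMSEs estimated in Section 3.2 (σ_MOD = 0.13, σ_S2 = 0.09). The batch layout below, i.e. which retrieval comes from which sensor, is purely illustrative.

```python
import numpy as np

# Sensor-specific fSCA observation error standard deviations (Section 3.2).
sigma_mod, sigma_s2 = 0.13, 0.09

# One entry per fSCA retrieval in the batch (illustrative ordering).
sensors = np.array(["MODIS", "MODIS", "S2", "MODIS", "S2"])
sigma = np.where(sensors == "S2", sigma_s2, sigma_mod)

# Diagonal observation error covariance matrix for the batch.
R = np.diag(sigma**2)
print(np.diag(R))
```

Because the Sentinel-2 entries carry smaller variances, the analysis step automatically weights those retrievals more heavily, which is why even a few of them can noticeably constrain χ.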
Even if the downscaled forcing is biased, it is likely a better model input than forcing data obtained directly from coarse-scale reanalyses (Østby et al., 2017). For example, the lapse rate correction on temperature in the downscaling (c.f. Fiddes and Gruber, 2014) will adversely affect the precipitation and melt rates at the more elevated Steinflåen plateau. This effect is not captured in the reanalysis product (Dee et al., 2011), where the elevation of the nearest grid point is near sea level. Figure 5 shows that for most of the years the prior median is a poor estimate of the observed peak mean SWE. This has important consequences for SWE modeling, as it indicates that an open-loop (i.e. no DA) run would tend not to be representative at the grid scale. One reason for this discrepancy is that many crucial processes, such as wind drift, are not included in our model. Moreover, the subgrid variability of the SWE was typically overestimated in the prior, with the prior distributions typically being too skewed. It would be possible to run a more sophisticated model (e.g. ALPINE3D; Lehning et al., 2006) and make use of physiographic and climatological features in formulating the prior peak subgrid coefficient of variation distribution, perhaps even using climatological snow distribution patterns derived from high resolution fSCA imagery as proposed in Sturm and Wagner (2010). The posterior distributions were, on the other hand, much closer to the observed distributions for most of the years and sites considered. This suggests that there is sufficient information contained in the remotely sensed snow cover depletion to constrain the peak SSD estimates. On some occasions, especially for Bayelva 2015, the posterior SSD was far from the observed SSD both in shape and in mean. However, the posterior estimate was still slightly better than the prior, indicating that the assimilation had a positive effect on the outcome.
A similarly marginal performance is found for the Steinflåen plateau in 2016, but the number of SWE point observations (see Table 1) is in this case not sufficient to reliably constrain the shape of the distribution, while the peak mean SWE is slightly overestimated by the assimilation.

Evaluation of data assimilation schemes
Compared to the ES and the PBS, which were used in previous studies (e.g. Durand et al., 2008; Girotto et al., 2014b; Margulis et al., 2015), the ES-MDA exceeds or at least nearly matches the performance for all the evaluation metrics considered, i.e. bias, RMSE and correlation coefficient for fSCA, peak mean SWE and peak subgrid coefficient of variation. The performance gain over the ES is explained by the iterative nature of the ES-MDA, which performs a sequence of smaller corrections in the analysis step as opposed to one abrupt correction (Emerick and Reynolds, 2013; Stordal and Elsheikh, 2015). Particularly for a non-linear model, as is the case for the SSM, this process of "simulated annealing" (c.f. Stordal and Elsheikh, 2015) in the ES-MDA leads to a better approximation of the posterior than a single analysis. In terms of error metrics, the RMSE with 100 ensemble members for the peak SWE (µ) is lower for both the ES-MDA and the PBS than in the corresponding studies of Girotto et al. (2014b) and Margulis et al. (2015), although this could be attributed to the smaller size of the validation data set. At least for a low number of ensemble members, the ES-MDA also outperforms the PBS: a possible reason is that the PBS posterior ensemble only spans the same range as the prior ensemble, changing only the relative weights of the ensemble members in the analysis. Thus, if the prior ensemble is biased compared to the observations, the analysis step in the PBS is unlikely to be capable of correcting the posterior towards the observations. In such a case, the region with high likelihood is very small and not necessarily close to the observations. A good example is the 2008 season at Bayelva (c.f. Figure 4 and Figure 6), for which the prior is far away from the observed fSCA. Consequently, the PBS is unable to shift the ensemble outside the prior range, as opposed to both the ES and the ES-MDA.
In several years, the PBS also shows signs of degeneracy, i.e. a large part of the weight is carried by a very low number of particles. As the PBS is essentially a particle filter without re-sampling (Van Leeuwen, 2009), the weights can quickly converge on just a few particles in high-likelihood regions, leaving the remaining particles redundant, even for low-dimensional systems with a relatively large number of observations (as is the case here). Note that both the ES and the ES-MDA are less prone to degeneracy (provided that the prior ensemble has sufficient spread), as they modify the position of the ensemble in the analysis step.
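Degeneracy of this kind is commonly quantified with the effective sample size, N_eff = 1/Σ_j w_j². This diagnostic is not stated in the text and is offered here as a standard, assumed way to detect the behavior described above.

```python
import numpy as np

def effective_sample_size(w):
    """N_eff = 1 / sum(w^2) for normalized importance weights w.
    Values near 1 indicate a degenerate ensemble (one particle carries
    almost all the weight); values near N_e indicate a healthy one."""
    w = np.asarray(w)
    return 1.0 / np.sum(w**2)

print(round(effective_sample_size([0.25, 0.25, 0.25, 0.25]), 1))  # 4.0
print(round(effective_sample_size([0.97, 0.01, 0.01, 0.01]), 1))  # 1.1
```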
The sensitivity analysis for the ensemble size is consistent with higher dimensional models: the ES-MDA requires relatively few ensemble members for convergent performance, similar to the ensemble Kalman filter (EnKF; Evensen, 1994), while the PBS requires a relatively large ensemble size for convergence, as with the particle filter (PF; see Van Leeuwen, 2009). Moreover, the number of assimilation cycles required for convergence of the ES-MDA, i.e. four cycles, is in line with previous studies (Emerick and Reynolds, 2013).

Effects of observation error
As pointed out for Figure 6, all DA methods have problems constraining the spread of the peak subgrid coefficient of variation (χ), although they can effectively pull the median in the right direction. A likely reason is the limited information available in the remotely sensed snow depletion, with either too sparse or too uncertain fSCA retrievals. Therefore, it could be worth considering fSCA retrievals from even more sensors, such as LandSat, PROBA-V and AVHRR, which increases the chances of obtaining more cloud-free scenes. It is worth highlighting that χ is best constrained when a high number of accurate fSCA retrievals is available, such as Bayelva 2016 with the Sentinel-2 retrievals added (c.f. Figure 4). Thus, with fSCA retrievals from different sensors added, it may be possible to better constrain the posterior χ ensemble, especially in the years where the SDC follows a well defined, monotonically decreasing pattern. Moreover, more accurate high resolution fSCA retrievals help in constraining the peak coefficient of variation: even with just a few additional high accuracy retrievals from Sentinel-2, the performance of the assimilation was markedly improved with respect to χ across all evaluation metrics. This clearly points towards the benefits of high resolution optical reflectance based fSCA retrievals from sensors on board the LandSat and Sentinel-2 satellites.

Outlook
Several extensions to the ensemble-based subgrid snow data assimilation discussed herein could be considered. The first is to change the grid scale of the assimilation framework from the order of 1 km, moving to either larger or smaller scales. For the latter, it would be possible to assimilate only LandSat and Sentinel-2 based fSCA retrievals and operate at a grid scale on the order of 100 m, in line with the work of Girotto et al. (2014a) and Margulis et al. (2016). For the former, one would aggregate the satellite retrievals yet further and perform the assimilation at a grid scale on the order of 10 km or larger. This implementation could be problematic, as the probabilistic snow depletion curve formulation of Liston (2004) may no longer be valid at such scales, which might limit the applicability of such multisensor assimilation approaches.
The major caveat to the assimilation of fSCA retrievals is the occurrence of clouds, which causes extended gaps in the time series obtained from optical sensors. As discussed, using fSCA retrievals from even more sensors could help to fill the gaps in the remotely sensed snow cover depletion and further constrain the peak subgrid coefficient of variation (χ), which proved to be the most difficult parameter to constrain in the evaluation. In particular, the use of additional higher resolution fSCA retrievals, such as those from LandSat, with lower representativeness error (and thus observation error) could help in reducing the spread in χ and acquiring more reliable estimates.
To reduce the computational costs of the ES-MDA, the adaptive ES-MDA presented in Le et al. (2016) should be considered.
In addition, implementing the bias corrected ES-MDA outlined in Stordal and Elsheikh (2015) may be worthwhile for future applications, especially when applied to a larger domain with possibly even larger misfits between the prior and the observations. Furthermore, implementing the scheme with a more complex snow model, such as Crocus (Vionnet et al., 2012) or SNOWPACK (Bartelt and Lehning, 2002), could be explored. As snow is a crucial driver of many terrestrial and atmospheric processes, the ensemble-based subgrid snow data assimilation scheme may improve process modeling in a range of disciplines, in particular since the spatial resolution is considerably higher than in passive-microwave derived SWE data sets. For example, the subgrid variability of permafrost temperatures is closely tied to that of SWE depth (e.g. Gisnås et al., 2016), which has major implications for permafrost mapping (e.g. Westermann et al., 2015b, 2016b). Similarly, snow cover information is an important component of many ecological models (e.g. Kohler and Aanes, 2004). Moreover, peak SWE is intimately linked to stream flow, and high-resolution information on the snow distribution holds significant potential for hydrology and water resource management (Andreadis and Lettenmaier, 2006; Barnett et al., 2005). Finally, knowledge of snow distribution and snow melt is of interest for tourism, given its importance for e.g. skiing, hiking and backcountry travel.

Conclusions
In this study, we use an ensemble-based data assimilation scheme, the ensemble smoother with multiple data assimilation (ES-MDA), to assimilate fSCA retrievals from MODIS and Sentinel-2. The ES-MDA is combined with analytical Gaussian anamorphosis to update parameters that are either lower or double bounded in physical space. The data assimilation is applied to a simple snow model, based on the surface energy balance coupled to a probabilistic snow depletion curve. The scheme is driven by downscaled coarse ERA-Interim reanalysis data. As such, both the model forcing and the satellite retrievals are globally available.

The results are compared to in-situ data sets of snow distributions from high-arctic sites near Ny-Ålesund, Svalbard, so that the performance can be evaluated with respect to modeled fSCA, mean peak SWE and subgrid coefficient of variation.
Thereby, the following conclusions can be drawn:

- At the kilometer scale, the ES-MDA is able to successfully assimilate fSCA retrievals into the simple snow model and estimate the peak subgrid SWE distribution prior to snowmelt.

-A physically-based interpolation of the remotely sensed fSCA time series is obtained that takes into account uncertainties in both the model and the retrievals.
- For the peak mean SWE, the ES-MDA features an RMSE of 0.09 m w.e., which is lower than in previously published studies.
- For the peak subgrid coefficient of variation, which controls the width and skewness of the distribution, the ES-MDA to a certain extent fails to constrain the spread of the ensemble, although the posterior median was usually pulled in the right direction.
-By including high resolution fSCA retrievals from Sentinel-2, the ensemble can be better constrained (in particular with respect to the coefficient of variation) which highlights the potential of additional high resolution fSCA retrievals from sensors on board the LandSat and Sentinel-2 satellites in future work.
-In line with previous studies, the ES-MDA converges with as low as 100 ensemble members and 4 assimilation cycles.

- With this configuration, the fractional improvement in RMSE from prior to posterior is around 75%, 60% and 20% for the fSCA, peak mean SWE and peak subgrid coefficient of variation, respectively.
-The ES-MDA is shown to improve upon or at least nearly match the performance of the particle batch smoother and ensemble smoother for all evaluation metrics considered.
As the scheme exploits high and medium resolution satellite images from optical sensors, it is capable of estimating the snow distribution at considerably higher spatial resolutions than traditional SWE products, e.g. those based on passive microwave retrievals.
On the other hand, the scheme can only recover the peak subgrid SWE distribution prior to the onset of melt, as opposed to providing information on the seasonal evolution of the snow distribution, so it complements rather than replaces existing SWE retrieval algorithms. More generally, the method could become part of satellite-era hydrometeorological reanalysis schemes with a wide span of applications.