Ensemble-based assimilation of fractional snow-covered area satellite retrievals to estimate the snow distribution at Arctic sites

With its high albedo, low thermal conductivity and large water storage capacity, snow strongly modulates the surface energy and water balance, which makes it a critical factor in mid- to high-latitude and mountain environments. However, estimating the snow water equivalent (SWE) is challenging in remote-sensing applications, even at medium spatial resolutions of 1 km. We present an ensemble-based data assimilation framework that estimates the peak subgrid SWE distribution (SSD) at the 1 km scale by assimilating fractional snow-covered area (fSCA) satellite retrievals into a simple snow model forced by downscaled reanalysis data. The basic idea is to relate the timing of the snow cover depletion (accessible from satellite products) to the peak SSD. Peak subgrid SWE is assumed to be lognormally distributed, which can be translated into a modeled time series of fSCA through the snow model. Assimilation of satellite-derived fSCA facilitates the estimation of the peak SSD, while taking into account uncertainties in both the model and the assimilated data sets. As an extension to previous studies, our method makes use of the ensemble smoother with multiple data assimilation (ES-MDA) scheme, novel to snow data assimilation, combined with analytical Gaussian anamorphosis to assimilate time series of Moderate Resolution Imaging Spectroradiometer (MODIS) and Sentinel-2 fSCA retrievals. The scheme is applied to Arctic sites near Ny-Ålesund (79° N, Svalbard, Norway) where field measurements of fSCA and the SWE distribution are available. The method successfully recovers accurate estimates of the peak SSD on most of the occasions considered. Through the ES-MDA assimilation, the root-mean-square error (RMSE) for the fSCA, peak mean SWE and peak subgrid coefficient of variation is improved by around 75 %, 60 % and 20 %, respectively, when compared to the prior, yielding RMSEs of 0.01, 0.09 m water equivalent (w.e.) and 0.13, respectively.
The ES-MDA either outperforms or at least nearly matches the performance of other ensemble-based batch smoother schemes with regards to various evaluation metrics. Given the modularity of the method, it could prove valuable for a range of satellite-era hydrometeorological reanalyses.

sensors can retrieve SWE based on brightness temperature at a resolution of around 25 km. However, PM SWE retrievals have problems over forested areas and complex topography, as well as for wet and deep snowpacks (Foster et al., 2005). Both gravimetric and PM sensors are able to retrieve SWE independent of cloud coverage, resulting in gap-free time series. While not capable of measuring SWE, moderate-resolution optical sensors such as MODIS can retrieve binary information on snow cover (i.e. snow or no snow), fSCA and snow grain size (Hall et al., 2002; Salomonson and Appel, 2004; Painter et al., 2009) at approximately 500 m resolution with a daily revisit frequency. In addition, higher-resolution optical sensors, e.g. on board the Landsat and Sentinel-2 satellites, can map fSCA at around 30 m resolution (e.g. Cortés et al., 2014). Optical sensors cannot see through clouds, which results in data gaps over most snow-covered regions. To obtain gap-free time series, it is thus necessary to either interpolate optical remote sensing data in time and space, or ingest them in models.
Data assimilation (DA) methods can objectively fuse uncertain information from observations and models. Deterministic SWE reconstruction techniques (Girotto et al., 2014b) that directly insert remotely sensed fSCA data in models represent the simplest form of snow data assimilation. Such schemes back-calculate peak SWE from the disappearance date of the snow cover (as determined from fSCA retrievals) using snowmelt models. Martinec and Rango (1981) used Landsat fSCA retrievals during the melt season in conjunction with a simple degree-day snowmelt model to estimate the peak mean SWE. Similarly, Cline et al. (1998) used Landsat fSCA retrievals combined with a distributed energy balance model to reconstruct the SWE distribution. More recently, Molotch and Margulis (2008) used fSCA information from multiple sensors for deterministic SWE reconstruction. Durand et al. (2008) introduced a probabilistic framework for SWE reconstruction. This was based on assimilating synthetic fSCA retrievals during the ablation season into the SSiB3 land surface model coupled to the snow depletion curve (SDC) of Liston (2004) using the ensemble smoother (ES; Van Leeuwen and Evensen, 1996) in batch mode (cf. Dunne and Entekhabi, 2005). The assimilation of synthetic fSCA in this twin experiment was used to correct annual biases in the snowfall and facilitated the recovery of the SWE distribution. Using the Durand et al. (2008) framework, Girotto et al. (2014b) assimilated Landsat fSCA retrievals to recover the SWE distribution, yielding a significant reduction in RMSE relative to deterministic SWE reconstruction. Subsequently, Girotto et al. (2014a) used the same framework to perform a 27-year reanalysis of SWE distributions.
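The deterministic back-calculation described above can be illustrated with a degree-day model in the spirit of Martinec and Rango (1981): starting from the satellite-observed disappearance date, daily melt between the peak-SWE date and disappearance is summed to recover the peak SWE. The degree-day factor and threshold temperature below are illustrative placeholders, not values from any cited study:

```python
def reconstruct_peak_swe(daily_temps_c, ddf=0.004, t_thresh=0.0):
    """Deterministic SWE reconstruction with a degree-day model.

    daily_temps_c: mean air temperatures (deg C) for each day between the
    assumed peak-SWE date and the snow disappearance date (the latter
    determined from fSCA retrievals).
    ddf: degree-day factor in m w.e. per deg C per day (illustrative).
    Returns the back-calculated peak mean SWE in m w.e.
    """
    return sum(ddf * max(t - t_thresh, 0.0) for t in daily_temps_c)

# A 30-day melt season warming linearly from 0 to 5 deg C:
temps = [5.0 * d / 29 for d in range(30)]
peak_swe = reconstruct_peak_swe(temps)  # 0.3 m w.e. for these inputs
```

The key point is that the model is run without any uncertainty estimate: a single melt trajectory is accepted as truth, which is precisely the limitation the probabilistic frameworks discussed next address.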
Recently, Margulis et al. (2015) modified this probabilistic approach by adopting a particle batch smoother (PBS) as opposed to the ES for the assimilation of fSCA retrievals to estimate the SWE distribution. The PBS was found to outperform the ES, considerably reducing the RMSE. Based on this work, Margulis et al. (2016) adopted the PBS framework to conduct a 30-year reanalysis of SWE over the Sierra Nevada (USA) using Landsat fSCA retrievals. Cortés et al. (2016) applied the same PBS framework to construct a 30-year reanalysis of SWE over six instrumented basins in the Andes. Cortés and Margulis (2017) subsequently adopted this approach to perform a 31-year SWE reanalysis over the entire extratropical Andes.
Several other snow DA techniques have recently been employed. Andreadis and Lettenmaier (2006) assimilated MODIS fSCA retrievals into the VIC model through the ensemble Kalman filter (EnKF; Evensen, 2009) using a simple SDC for the SWE-fSCA inversion. However, the improvement compared to the open loop (OL, i.e. no DA) run was only modest, which was also found in similar studies (Slater and Clark, 2006). A Bayesian technique was used by Kolberg and Gottschalk (2006) to assimilate Landsat fSCA retrievals into a snow model with a probabilistic SDC to estimate the peak SWE distribution. They found a significant reduction in uncertainty when retrievals were assimilated simultaneously as opposed to sequentially. At the continental scale, a multisensor assimilation of both GRACE terrestrial water storage (TWS) and MODIS fSCA, using the ES and EnKF for TWS and fSCA respectively, yielded significant improvements relative to the OL (Su et al., 2010). De Lannoy et al. (2010) used the EnKF in a twin experiment to assimilate synthetic PM SWE retrievals and greatly outperformed the OL. This was extended to a real multisensor experiment by jointly assimilating PM SWE and MODIS fSCA retrievals (De Lannoy et al., 2012). Li et al. (2017) used the ES to assimilate PM SWE retrievals and estimate the SWE distribution, markedly outperforming the OL. Of late, particle filter (PF; see Van Leeuwen, 2009) schemes have been gaining popularity in snow DA studies (Charrois et al., 2016; Magnusson et al., 2017). For example, Charrois et al. (2016) assimilated synthetic optical reflectance retrievals into Crocus using the sequential importance re-sampling PF at a point scale and considerably outperformed the OL.
It is worth emphasizing that the most popular schemes in the snow DA community, both the EnKF and the PF, are filters (i.e. sequential techniques). As such, they are Markovian of order 1 (memoryless): the future state at a given point in time depends only on the present state. Furthermore, observations are assimilated sequentially, with only the current observation affecting the current state. Batch smoothers (Dunne and Entekhabi, 2005), on the other hand, take into account the entire history of a model trajectory within a batch (observation window) and as such have memory (they are non-Markovian), so that they are better suited for reanalysis problems.
In this study, we build on the probabilistic SWE reconstruction technique outlined in Girotto et al. (2014b) to recover the subgrid SWE distribution (SSD) for a study area in the Arctic based on fSCA retrievals from MODIS and Sentinel-2. The novelty of our study lies in the use of an iterative batch smoother scheme, namely the ensemble smoother with multiple data assimilation (ES-MDA; Emerick and Reynolds, 2013). To update physically bounded parameters, we make use of analytical Gaussian anamorphosis (Bertino et al., 2003). We investigate the performance of the ES-MDA in terms of SWE reconstruction and compare it to the ES and the PBS employed by Girotto et al. (2014b) and Margulis et al. (2015), respectively. The results are evaluated against independent field measurements of fSCA and snow surveys conducted over six snow seasons.
2 Study area

2.1 Physical characteristics and climate
The study area is located in NW Svalbard close to the research town of Ny-Ålesund (78°55′ N, 11°50′ E) on the Brøgger peninsula. Field measurements are available from three sites (Figure 1). "Bayelva", about two kilometers west of Ny-Ålesund, is the main study site, where multiyear in situ records on e.g. the surface energy balance, permafrost thermal regime and snow distribution are available (Westermann et al., 2009; Gisnås et al., 2014; Boike et al., 2017). In addition, snow surveys for a single season (2016) are available from "Steinflåen plateau" and "Kvadehuksletta". All sites feature gently undulating topography with small hills and surfaces characterized by patterned ground features, leading to strong differences in snow cover due to wind drift. Bayelva and Kvadehuksletta are located between 10 and 50 m a.s.l., while the Steinflåen plateau is at a higher elevation of around 200 m a.s.l. Kvadehuksletta is exposed to most wind directions, whereas Bayelva and Steinflåen plateau are partly sheltered by mountains. The sites are located within the continuous permafrost zone (Boike et al., 2003), with a maximum active layer depth of around 1.5 m at the Bayelva site (Westermann et al., 2009). The Bayelva site is located around the heavily instrumented Bayelva climate and soil monitoring station. This area has been the subject of extensive studies spanning permafrost (Roth and Boike, 2001; Boike et al., 2008; Westermann et al., 2011a), the surface energy balance (Boike et al., 2003; Westermann et al., 2009), CO2 exchange (Lüers et al., 2014; Cannone et al., 2016), ecology (Kohler and Aanes, 2004), snow (Bruland et al., 2001; Gisnås et al., 2014; López-Moreno et al., 2016), hydrology (Nowak and Hodson, 2013) and satellite retrieval validation (Westermann et al., 2011b, 2012).
The surface cover at Bayelva and Kvadehuksletta alternates between bare soil, rocks and sparse low vegetation (Westermann et al., 2009), while the more elevated Steinflåen plateau is predominantly covered by loose rocks.
The climate of western Svalbard is influenced by the relatively warm West Spitsbergen ocean current, causing a maritime climate with mild winters and cool summers for this latitude (Esau et al., 2012). At Ny-Ålesund, the winter, summer and annual average air temperatures were −12.0 °C, 3.8 °C and −5.2 °C, respectively, while the average annual precipitation was 427 mm (Førland et al., 2011). Between September/October and May, the precipitation mainly falls as snow, although rain-on-snow events have become more frequent due to the warming of the local climate (Nowak and Hodson, 2013; López-Moreno et al., 2016). The seasonal snow cover usually forms in late September or early October and lasts until mid June to early July, with a melt season of around one month (Winther et al., 2002). The dominant energy source during the snowmelt is radiation (longwave and shortwave), while the heat flux required to warm the frozen ground underlying the snow is an important energy sink (Boike et al., 2003; Westermann et al., 2009).

Manual surveys of snow depth and density were carried out in April/May for six years at the Bayelva site and for one year (2016) at the two other sites (Table 1). At this time, the snow depth is near its maximum but the snowpack is still dry. The snow density was sampled in vertical layers at every fifth point. As no systematic stratification of the snow density was found, SWE was finally calculated from snow depth and the average snow density at each site in a given year. At Bayelva, the snow density was generally confined to a range of 350 ± 50 kg m−3 for all the surveys, while the snow density was found to be around 450 kg m−3 at Steinflåen plateau in 2016. At Kvadehuksletta and Steinflåen plateau, the surveys were conducted along transects with regular sample intervals (see Figure 1). A randomized array of sample points was employed for Bayelva in most years, except for the first two years where transects were used.
Basal ice layers resulting from rain-on-snow events (Kohler and Aanes, 2004; Westermann et al., 2011a) occur in the area and can constitute a major source of uncertainty for SWE measurements. In 2016, the depth of basal ice layers was measured using ice screws and their contribution to the SWE was accounted for. In addition, internal ice layers and the spatial variability of average snow densities (see above) contribute to the uncertainty of the measurements. Furthermore, only a limited number of sampling points is available, so that the obtained snow distributions are expected to deviate to a certain extent from the true snow distributions in the area. Although the snow surveys coincide closely with peak SWE, some accumulation (ablation) may occur after (before) the surveys. To assess the magnitude of this error source, we used snow depth measurements at the Bayelva station to compare the snow depth at the survey dates to the maximum snow depth for each snow season.
In 2012, 2013 and 2016, an automatic time-lapse camera was deployed near the summit of Scheteligfjellet (694 m a.s.l.; cf. Figure 1) overlooking the Bayelva site. The camera was a standard digital camera triggered by a Harbortronics time-lapse system, delivering daily images except for prolonged periods with low cloud cover. The raw camera images were orthorectified at a 1 m resolution and snow was mapped for each pixel using a threshold on the intensity, so that fSCA could be determined for each image. The orthorectified images for two of the years are freely available in Westermann et al. (2015a).
In 2008, aerial images were obtained for the Bayelva site for five dates in June during the beginning of the snowmelt period.
This was accomplished by mounting a digital camera to a UAV flying at an altitude of 100 to 250 m above ground, which took between 700 and 1000 images per mission at nadir angles. As the images were taken in a near random fashion over the entire area, fSCA was calculated by averaging over the fSCA determined for each image using a simple threshold criterion.
GPS-based surveys of the remaining snow patches were available for five additional dates, so that a complete fSCA time series is available for the snowmelt period in 2008.

3 Method
3.1 Simple snow model

To efficiently run a large number of ensemble members, a simple snow model (SSM) is employed, which computes snowmelt rates according to surface energy balance formulations (as in the CryoGrid 3 ground thermal model; Westermann et al., 2016).
The model is a blend of a single layer mass balance scheme, based on the UEB model (Tarboton and Luce, 1996; You et al., 2014), and the Liston (2004) SDC. Many internal snow processes (occurring inside the snowpack), including heat conduction and melt water percolation, are omitted. In addition, several external processes such as sublimation and deposition are ignored. In the following sections, we describe the governing equations of the SSM (see Table 2 for the model constants).

3.1.1 Snow depletion curve
We use the SDC parametrizations discussed in Liston (1999), Luce and Tarboton (2004) and Liston (2004), which parametrize the relationship between fSCA, melt depth and SWE by using a probability density function (pdf) to represent the peak subgrid SWE distribution (SSD). A key assumption is that the melt rate is spatially uniform within each grid cell. The relationship between the accumulated melt depth (D_m), the peak SSD pdf (f_P) and the fSCA within the grid cell at time t is given by

fSCA(t) = ∫_{D_m(t)}^{∞} f_P(D) dD. (1)

Similarly, the mean SWE depth is given by

D̄(t) = ∫_{D_m(t)}^{∞} [D − D_m(t)] f_P(D) dD. (2)

Following Liston (2004), we parametrize the peak SSD using a two-parameter lognormal pdf f_P = f_P(D|µ, χ), where µ is the peak mean SWE and χ = σ/µ is the peak subgrid coefficient of variation (σ is the standard deviation). Our choice of parametric distribution was motivated by independent measurements of the SSD, which fit reasonably well to a lognormal distribution (Bruland et al., 2001). Equations (1) and (2) can both be solved analytically as presented in Liston (2004).
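For the lognormal peak SSD, the integrals in (1) and (2) reduce to evaluations of the Gaussian CDF via the standard lognormal partial-expectation identities. The sketch below is our own formulation of those identities, not code from Liston (2004):

```python
import math

def _phi(z):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_sdc(d_m, mu, chi):
    """fSCA and mean SWE for a lognormal peak subgrid SWE distribution.

    d_m: accumulated melt depth (m w.e.), mu: peak mean SWE (m w.e.),
    chi: peak subgrid coefficient of variation.
    Returns (fSCA, mean SWE) following Eqs. (1)-(2).
    """
    if d_m <= 0.0:
        return 1.0, mu
    # Lognormal parameters from the mean and coefficient of variation.
    s2 = math.log(1.0 + chi ** 2)
    s = math.sqrt(s2)
    m = math.log(mu) - 0.5 * s2
    z = (math.log(d_m) - m) / s
    fsca = 1.0 - _phi(z)                                     # Eq. (1)
    partial_mean = mu * _phi((m + s2 - math.log(d_m)) / s)   # E[D; D > d_m]
    swe = partial_mean - d_m * fsca                          # Eq. (2)
    return fsca, swe
```

As the melt depth grows, both fSCA and the mean SWE decay smoothly towards zero, which is exactly the depletion behaviour the assimilation exploits.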

3.1.2 Mass and Energy Balance
To obtain the instantaneous net accumulation rate, A(t), we follow the UEB model (You et al., 2014) through

A(t) = P(t) − M(t), (3)

where P(t) is the precipitation rate and M(t) is the melt rate. Sublimation is not considered as it is a relatively small contribution to the mass balance at our study area (Westermann et al., 2009). We use a linear transition to delineate between snowfall and rainfall (You et al., 2014), with thresholds given in Table 2. We only consider rainfall as a positive contribution to the mass balance during non-melting conditions, when the rainwater generally refreezes in the snowpack (Westermann et al., 2011a). For melting conditions (where D_m > 0), we assume that rainfall directly becomes runoff.
The melt rate, M, is calculated based on a simplified snow energy balance defined by

Q_M = Q*_R + Q_P − Q_H − Q_E − Q_G, (4)

where Q_M is the snowmelt flux, Q*_R is the global radiation, Q_P is the heat advected by precipitation, Q_H is the sensible heat flux, Q_E is the latent heat flux and Q_G is the ground heat flux. The last three fluxes are defined as positive when directed away from the surface and vice versa for the first two on the right-hand side of (4). The SSM differs from UEB in that we calculate the surface energy balance for a melting snowpack, i.e. isothermal at 0 °C, at all times. In this case, the global radiation is

Q*_R = (1 − α_S) S↓ + L↓ − σ_SB T_0^4, (5)

in which S↓ and L↓ are the downwelling shortwave and longwave irradiances and the last term is the upwelling longwave radiation for the assumed isothermal snowpack at T_0 = 273.15 K. The snow albedo (α_S) is parametrized prognostically through the continuous reset formulation following Dutra et al. (2010), which computes the albedo for time increments ∆t by distinguishing between accumulating, steady and ablating conditions:

α_S(t + ∆t) = α_max (accumulating: daily snowfall ≥ τ_S),
α_S(t + ∆t) = α_S(t) − τ_A ∆t (steady: non-melting),
α_S(t + ∆t) = [α_S(t) − α_min] exp(−τ_F ∆t) + α_min (ablating: melting). (6)

Here, α_min and α_max are the minimum and maximum snow albedo values, respectively, while τ_A and τ_F are aging (decay) rates for non-melting and melting snow, respectively. τ_S is a threshold for daily snowfall which, if exceeded, leads to a reset of the snow albedo to its maximum value. This simple decay-and-reset type of snow albedo parametrization has been shown to perform reasonably well at Bayelva (Pedersen and Winther, 2005). The heat advected by rainfall (Q_P) is computed as in Tarboton and Luce (1996), while the turbulent fluxes of sensible (Q_H) and latent (Q_E) heat are evaluated following Westermann et al. (2016).
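The decay-and-reset albedo scheme can be sketched as follows; the numerical values of α_min, α_max, τ_A, τ_F and τ_S below are illustrative placeholders in the style of Dutra et al. (2010), not the calibrated constants of Table 2:

```python
import math

def update_albedo(albedo, snowfall, melting, dt=1.0,
                  a_min=0.5, a_max=0.85, tau_a=0.008, tau_f=0.24,
                  tau_s=0.003):
    """One albedo update for a time increment dt (days), following a
    Dutra et al. (2010)-style continuous reset formulation.

    snowfall: daily snowfall (m w.e.); melting: True during ablation.
    All parameter values are illustrative, not from Table 2.
    """
    if snowfall >= tau_s:
        # Accumulating: fresh snow resets the albedo to its maximum.
        return a_max
    if melting:
        # Ablating: exponential decay towards the minimum albedo.
        return a_min + (albedo - a_min) * math.exp(-tau_f * dt)
    # Steady (non-melting): linear aging, bounded below by a_min.
    return max(a_min, albedo - tau_a * dt)
```

The exponential branch guarantees the albedo never undershoots α_min during prolonged melt, while the snowfall reset reproduces the sharp brightening after fresh snow events.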

The ground heat flux (Q_G) is parametrized through a simple e-folding relationship during the melting period, i.e.

Q_G = Q_0 exp(−d_H t_m / z_E²), (7)

where Q_0 is the initial ground heat flux, d_H is the thermal diffusivity of the ground, z_E is the effective depth of the heat transfer below the base of the snowpack and t_m is the number of days with melting conditions after peak accumulation. Q_0 is a perturbation parameter (see Table 4) that is updated in the assimilation, d_H is selected according to field measurements (Westermann et al., 2009) and z_E is set so that the ground heat flux decays to near zero a month into the melt season.
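A sketch of the e-folding decay; the diffusive scaling d_H t_m / z_E² in the exponent is our reading of the description, and the parameter values below are illustrative (d_H of order 10⁻⁶ m² s⁻¹ expressed per day, z_E chosen so the flux is near zero roughly a month into the melt season):

```python
import math

def ground_heat_flux(q0, t_m, d_h=0.0864, z_e=0.9):
    """Ground heat flux (W m-2) after t_m days of melting conditions.

    q0: initial ground heat flux at melt onset (updated in the DA),
    d_h: ground thermal diffusivity (m2 per day, illustrative),
    z_e: effective heat-transfer depth below the snowpack base (m).
    """
    return q0 * math.exp(-d_h * t_m / z_e ** 2)
```

With these placeholder values, a flux of 10 W m⁻² decays below 5 % of its initial value after about 30 melt days, matching the qualitative behaviour described in the text.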
The snowmelt flux Q_M can now be evaluated through (4). We recall that an isothermal snowpack at 0 °C is assumed for (4), which is only justified for a melting snowpack. In this case, positive Q_M corresponds to melting and SWE reduction, while negative values correspond to refreezing of melt water and thus SWE increase. For a dry snowpack (as is generally the case before the snowmelt), negative Q_M values would lead to a cooling of the snowpack, which is not considered in this simple snowmelt scheme. To discard unphysical values (negative melt rates), we only consider days with net melting conditions, i.e. positive daily average snowmelt fluxes. Thus, the daily averaged melt rate M_n at day n (lasting from t_n to t_{n+1}) is given by

M_n = max(⟨Q_M⟩_n, 0) / (ρ_w L_f), with ⟨Q_M⟩_n = (1/∆t) ∫_{t_n}^{t_{n+1}} Q_M(t) dt, (8)

where ρ_w is the density of fresh water, L_f is the latent heat of fusion and ∆t is the daily time step. We emphasize that the effects of refreezing are still considered at a subdaily time resolution in (8). Similarly, the daily averaged precipitation rate is

P_n = (1/∆t) ∫_{t_n}^{t_{n+1}} P(t) dt. (9)

Now the daily averaged net accumulation rate can be obtained through

A_n = P_n − M_n, (10)

and the accumulated melt depth D_m is accounted for through

D_m^{n+1} = max(D_m^n − A_n ∆t, 0). (11)

The peak mean SWE µ is updated via a thresholded accumulation rule, in which the alternative Heaviside function is defined through H(x) = 0 if x ≤ 0, and H(x) = 1 otherwise. Consequently, in (13) the peak mean SWE µ is only non-zero if Λ exceeds the threshold τ_S. Note that the formulation in (11) gradually resets the melt depth towards zero in the case of snowfall after the onset of melt, following Liston (2004). This means that fSCA is not reset to unity in the case of new snowfall after a melting period unless the new snowfall leads to an increase in the peak SWE.
In the study area, snowfall events occurred rarely during the snowmelt period and the new snow cover lasted only a short time. At sites where such events are more frequent, Durand et al. (2008) present an alternative solution, albeit at an increased computational cost. The annual model integrations start at the beginning of September, when the surface is assumed to be snow free, so that both µ and D_m are initialized as zero. Both µ and D_m are reset to zero following the complete disappearance of the snowpack, defined as when the fSCA decreases below 0.01 to account for the infinite tail of f_P. The model resolution is defined by the footprint of (area encompassed by) the snow surveys for each site (see Table 1 and Figure 1).

Table 2 (excerpt). Model constants.
ρ_w | Density of fresh liquid water | 10³ kg m⁻³ | Tarboton and Luce (1996)
σ_SB | Stefan-Boltzmann constant | 5.67 × 10⁻⁸ W m⁻² K⁻⁴ | Tarboton and Luce (1996)

3.1.3 Forcing
Forcing terms in the form of precipitation, air temperature, relative humidity and wind speed, as well as downwelling longwave and shortwave radiation, are required to diagnose the mass and energy balance in (3) and (4). These terms are obtained by downscaling ERA-Interim reanalysis data (Dee et al., 2011) at 0.75° resolution following Østby et al. (2017). This method uses the linear theory of orographic precipitation in Smith and Barstad (2004) to downscale precipitation and a modification of the TopoSCALE approach (Fiddes and Gruber, 2014) for the remaining fields. The reanalysis forcing is downscaled onto 1 km resolution DEM grid cells centered on each of the study sites. The downscaling is performed based on the mean physiographic conditions (elevation, slope and aspect) within each of these grid cells. The resulting values at 1 km spatial and 6-hourly temporal resolution are linearly interpolated in time to facilitate a stable computation of the time evolution of turbulent energy fluxes following Westermann et al. (2016). From these fluxes, and the remaining surface energy balance fields, diurnally averaged melt rates are calculated. Similarly, diurnally averaged rainfall and snowfall rates are computed by delineating between rain and snow in the time-interpolated precipitation rate (You et al., 2014) and then taking diurnal averages. While the resolution of the downscaled forcing data does not exactly match the model resolution (i.e. the footprint of the snow surveys, Section 3.1.2), the mismatch is small considering the gentle topography of the study sites (Section 2.1).

3.2 Satellite retrievals
We make use of satellite retrievals between May and September, a window that contains the snowmelt period for all the investigated years. Only retrievals that fall inside the melt season are assimilated, as these contain information about the snow cover depletion. Due to frequent cloud cover, the effective revisit frequency of fSCA retrievals is irregular, with prolonged data gaps occurring regularly. An overview of the number of available scenes is given in Table 3.

Table 3. Number of MODIS and Sentinel-2 scenes per melt season with field measurements available for the three study sites.
Location | Melt season(s) | # of MODIS scenes | # of Sentinel-2 scenes
Bayelva | 2008, 2009, 2012, 2013, 2014, 2015, 2016 | | −, −, −, −, −, −, 7
Steinflåen plateau | 2016 | 5 | 8
Kvadehuksletta | 2016 | 11 | 7

3.2.1 MODIS
We employ version 6 of the level 3 daily 500 m resolution fSCA retrievals from the Moderate Resolution Imaging Spectroradiometer (MODIS) on board the satellites Terra (MOD10A1 product; Hall and Riggs, 2016a) and Aqua (MYD10A1 product; Hall and Riggs, 2016b). The retrieval algorithm is based on a linear fit of the normalized difference snow index (NDSI) measured by MODIS to fSCA retrievals from ground truth Landsat scenes, as described in Salomonson and Appel (2004). The NDSI exploits the fact that snow is highly reflective in the visible but a good absorber in the shortwave infrared, which sets it apart from other natural surfaces such as clouds, vegetation and soil (Painter et al., 2009).
We average over all the pixels for each day and study site (see Figure 1). This average is only taken if cloud-free (as determined by the MODIS cloud mask) retrievals are available for each of these pixels. If both Terra and Aqua retrievals are available for a given day, only the former are used. Despite small deviations in the measurement footprint (see Figure 1), we compare MODIS fSCA retrievals to the field measurements of fSCA obtained from the automatic camera system, UAV and GPS surveys (Section 2.2). From this comparison, we estimate an RMSE of σ_MOD = 0.13 for the MODIS fSCA. We use σ²_MOD as the observation error variance in the corresponding diagonal entries of the observation error covariance matrix (Section 3.3.2).
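The sensor-specific RMSEs enter the assimilation through the diagonal of the observation error covariance matrix (Section 3.3.2). A minimal sketch of how such a matrix could be assembled for a mixed batch of retrievals (the function name and structure are ours, not from the paper):

```python
# fSCA RMSEs estimated against the field measurements in the text.
SIGMA = {"MODIS": 0.13, "S2": 0.09}

def build_obs_error_cov(sensors):
    """Diagonal observation error covariance for a batch of fSCA
    retrievals: one row/column per retrieval, variance sigma**2,
    zero covariance between retrievals."""
    n = len(sensors)
    return [[SIGMA[s] ** 2 if i == j else 0.0 for j in range(n)]
            for i, s in enumerate(sensors)]

# A batch of two MODIS scenes followed by one Sentinel-2 scene:
R = build_obs_error_cov(["MODIS", "MODIS", "S2"])
```

Treating the errors as uncorrelated between scenes keeps the matrix diagonal, which is what allows each retrieval type to be weighted only by its own variance in the analysis step.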

3.2.2 Sentinel-2
For the year 2016, we complement the MODIS fSCA retrievals with aggregated 20 m resolution retrievals from the Sentinel-2A mission (Drusch et al., 2012). fSCA estimates are derived from the level 1C orthorectified top-of-atmosphere reflectance product, with cloud-free scenes manually selected. For this purpose, the NDSI is computed from the reflectances (r) of a visible (b3, centered on 0.56 µm) and a shortwave infrared band (b11, centered on 1.61 µm) through

NDSI = (r_b3 − r_b11) / (r_b3 + r_b11). (14)

Each pixel is then either classified as snow covered (NDSI ≥ 0.4) or snow free (NDSI < 0.4), where the NDSI threshold was chosen in line with Hall et al. (2002). The binary (snow/no snow) pixels are then aggregated to the approximate footprint of the independent snow surveys conducted at each site (Figure 1) to obtain Sentinel-2 derived fSCA estimates. Therefore, the areal extent of the Sentinel-2 fSCA retrievals closely matches the areas of the corresponding study sites given in Table 1. The retrieval process is illustrated schematically in Figure 2. By comparing the Sentinel-2 retrievals to the field measurements of fSCA from the automatic camera system in 2016, we estimate an RMSE of σ_S2 = 0.09. We use σ²_S2 as the observation error variance in the corresponding diagonal entries of the observation error covariance matrix (Section 3.3.2).
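The per-pixel classification and aggregation step can be sketched as follows (reflectance values in the example are synthetic):

```python
def ndsi(r_b3, r_b11):
    """Normalized difference snow index (Eq. 14) from band 3 (0.56 um)
    and band 11 (1.61 um) top-of-atmosphere reflectances."""
    return (r_b3 - r_b11) / (r_b3 + r_b11)

def fsca_from_pixels(b3, b11, threshold=0.4):
    """Aggregate binary snow/no-snow pixels (snow if NDSI >= threshold)
    over a site footprint into a fractional snow-covered area."""
    snow = [ndsi(v, s) >= threshold for v, s in zip(b3, b11)]
    return sum(snow) / len(snow)

# Two snow pixels (bright visible, dark SWIR) and two bare pixels:
fsca = fsca_from_pixels([0.8, 0.7, 0.3, 0.25], [0.1, 0.15, 0.25, 0.22])
# -> 0.5
```

Because the classification is binary per pixel, the aggregated fSCA is simply the snow-covered pixel fraction within the survey footprint.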

3.3 Ensemble Data Assimilation
In this section we outline how the prior ensemble of model realizations is set up and how it is updated to a posterior ensemble through the assimilation of fSCA satellite retrievals using ensemble-based batch smoother schemes.

3.3.1 Prior ensemble
The prior ensemble of model realizations is generated by independently drawing perturbation parameter values from the distributions listed in Table 4. These perturbation parameters are held constant throughout the integration of the model for each water year. Two of these are multiplicative bias parameters that perturb the mass balance through the net accumulation rate for j ∈ 1 : N_e, where N_e is the number of ensemble members. We inherently assume the model forcing to be the major source of uncertainty (De Lannoy et al., 2010; Raleigh et al., 2015). Furthermore, we assume that the error in the forcing can be modeled through constant multiplicative biases (fixed throughout the annual integration) in the mass balance. Consequently, the bias parameters are modeled as positive definite lognormal random variables. This is in line with the perturbations in Girotto et al. (2014b) on the precipitation rate, but we also perturb the melt rate. Moreover, we assume that the ensemble of net accumulation rates is on average unbiased due to the applied downscaling method (Østby et al., 2017) and thus assign the two bias parameters a mean of unity. The precipitation rates are also perturbed by the same bias parameter in the computation of the heat advected by precipitation (Q_P) in the surface energy balance that contributes to the melt rate M_n.
In addition to the mass balance forcing, the peak subgrid coefficient of variation χ (Section 3.1.1) is a source of uncertainty.
We assume a prior mean of 0.4 for χ, which corresponds to the value provided by Liston (2004) for "Arctic tundra". Moreover, χ is assumed to be double bounded between 0 and 0.8, with negative values being unphysical and the upper bound close to the maximum value in Liston (2004). Furthermore, both the initial ground heat flux at the onset of melt (Q_0) and the minimum snow albedo (α_min) are uncertain, and we also assume that these are double bounded. The probability distributions of double bounded random variables are modeled as logit-normal distributions, with the logit transform for a variable x bounded between a and b given by

y = ln[(x − a)/(b − x)], (15)

while the inverse transform is given by

x = a + (b − a)/(1 + e^{−y}). (16)

To generate a prior ensemble of a logit-normally distributed random variable, we first apply the logit transform to the mean.
Then, N e ensemble members of Gaussian white noise with a consistent variance are added and finally the inverse transform is applied. We emphasize that through the perturbation parameters we effectively perturb the melt rate, precipitation rate and coefficient of variation. By performing a subsequent ensemble integration of the SSM we also get an ensemble of state variables that are consistent with the prior perturbation parameter ensemble.
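The generation of a logit-normal prior ensemble per Eqs. (15)-(16) can be sketched as follows; the perturbation standard deviation below is illustrative, not the value from Table 4:

```python
import math
import random

def logit(x, a, b):
    """Logit transform for x double bounded between a and b (Eq. 15)."""
    return math.log((x - a) / (b - x))

def inv_logit(y, a, b):
    """Inverse logit transform back to the bounded space (Eq. 16)."""
    return a + (b - a) / (1.0 + math.exp(-y))

def logit_normal_ensemble(mean, a, b, sd, n_e, seed=0):
    """Prior ensemble of a double-bounded parameter: transform the mean,
    add Gaussian white noise, and back-transform each member."""
    rng = random.Random(seed)
    y0 = logit(mean, a, b)
    return [inv_logit(y0 + rng.gauss(0.0, sd), a, b) for _ in range(n_e)]

# e.g. the coefficient of variation chi, bounded between 0 and 0.8:
chi_prior = logit_normal_ensemble(0.4, 0.0, 0.8, sd=0.5, n_e=100)
```

By construction, every ensemble member respects the physical bounds, which is exactly why the transform pair is needed before the Gaussian analysis step of the smoother.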

3.3.2 Batch Smoothers
Here, we describe the ES-MDA, the main batch smoother scheme used in the assimilation, as well as the ES and the PBS that are used for comparison. In a batch smoother, all the observations, in this case all fSCA retrievals from the snow cover depletion during one melt season, are assimilated at once in a single batch (Dunne and Entekhabi, 2005), as opposed to sequentially as in a filter (Bertino et al., 2003). We follow the conventional notation in the DA literature, as laid out in Ide et al. (1997). The ES-MDA is an iterative scheme, requiring multiple ensemble model integrations and analysis steps. Collecting the perturbed and predicted observations during the ensemble integration into a batch and performing the analysis step is referred to as one assimilation cycle, and we will let the current iteration number be denoted as ℓ. In such a case, the ES-MDA scheme is set up as follows, for ℓ ∈ 0 : N_a iterations:

1. Run an ensemble model integration, i.e. for n ∈ 0 : (N_t − 1) time steps compute

x_{n+1}^{j,(ℓ)} = M(x_n^{j,(ℓ)}, θ^{j,(ℓ)}), for j ∈ 1 : N_e, (17)

where M is the SSM operator defined through equations (1), (2), (11) and (13).
2. If ℓ < N_a (otherwise stop the algorithm here), collect the batch of predicted observations

Ŷ^{(ℓ)} = H(X^{(ℓ)})

and perturbed observations

D^{(ℓ)} = y ⊗ 1^T + √(α^{(ℓ)}) R^{1/2} E^{(ℓ)},

where ⊗ is the outer product, 1 is an N_e × 1 vector of ones (so that y ⊗ 1^T replicates the observation vector across the ensemble), the T superscript denotes the transpose, α^{(ℓ)} is the observation error inflation coefficient and E^{(ℓ)} is an N_o × N_e matrix containing zero mean Gaussian white noise with a variance of 1.
3. Transform the perturbation parameters using analytical Gaussian anamorphosis functions ψ (Bertino et al., 2003):

Θ̃^{(ℓ)} = ψ(Θ^{(ℓ)}),

where ψ is the natural logarithm and the logit for the biases and the remaining perturbation parameters, respectively.
4. Perform the Kalman-like analysis step in the transformed space,

Θ̃^{(ℓ+1)} = Θ̃^{(ℓ)} + C_{Θ̃Ŷ} (C_{ŶŶ} + α^{(ℓ)} R)^{−1} (D^{(ℓ)} − Ŷ^{(ℓ)}),    (21)

where the transformed perturbation parameter–predicted observation and predicted observation error covariance matrices are

C_{Θ̃Ŷ} = Θ̃'^{(ℓ)} (Ŷ'^{(ℓ)})^T / (N_e − 1) and C_{ŶŶ} = Ŷ'^{(ℓ)} (Ŷ'^{(ℓ)})^T / (N_e − 1),

respectively, in which primes (′) denote anomalies (deviations from the ensemble mean).
5. Apply the appropriate inverse transforms to recover the updated perturbation parameters,

Θ^{(ℓ+1)} = ψ^{−1}(Θ̃^{(ℓ+1)}),

where ψ^{−1} is the exponential and the inverse logit for the biases and the remaining perturbation parameters, respectively.

The ES-MDA can be viewed as an iterative generalization of the ensemble smoother (ES; Van Leeuwen and Evensen, 1996). For N_a = α^{(ℓ)} = 1, the ES scheme is recovered, which was used in the probabilistic SWE reconstruction of Durand et al. (2008) and Girotto et al. (2014b). The idea behind the ES-MDA is to perform multiple smaller analysis steps as opposed to one abrupt analysis step. In the case of a non-linear model, this is expected to yield a better approximation of the true posterior (Emerick and Reynolds, 2013). A requirement for the ES-MDA to give a nearly unbiased estimate (c.f. Stordal and Elsheikh, 2015) is that the coefficients satisfy Σ_{ℓ=0}^{N_a−1} 1/α^{(ℓ)} = 1. In our case this is accomplished by setting all the coefficients to α^{(ℓ)} = N_a and specifying N_a before any assimilation cycles are carried out. We emphasize that the analysis step (21) only updates the perturbation parameters; a consistent ensemble of states is found from the subsequent ensemble model integration. The model constants listed in Table 2 remain unchanged by the analysis and the integration. As mentioned, the perturbation parameter matrix Θ in (21) is transformed through analytical Gaussian anamorphosis (Bertino et al., 2003) to ensure that the priors are Gaussian. In this case, the Kalman-like analysis step (21) is variance minimizing for a linear model (Van Leeuwen and Evensen, 1996). The entire methodology, with ES-MDA as the DA scheme, is depicted in Figure 3.

Margulis et al. (2015) introduced the particle batch smoother (PBS) for snow data assimilation. In this scheme, each particle (i.e. ensemble member; Van Leeuwen, 2009) is given an equal prior weight of 1/N_e.
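The iteration over steps 1–5 can be sketched as follows. This is a simplified illustration assuming a diagonal observation error covariance R and constant coefficients α = N_a, with the Gaussian anamorphosis omitted for brevity; the hypothetical `forward` callable stands in for the ensemble SSM integration plus observation mapping.

```python
import numpy as np

def es_mda(theta, forward, y, r_var, n_cycles=4, rng=None):
    """Minimal ES-MDA sketch operating on transformed (Gaussian) parameters.

    theta   : (n_par, n_ens) perturbation parameter ensemble
    forward : callable mapping theta -> predicted observations (n_obs, n_ens)
    y       : (n_obs,) observed batch
    r_var   : (n_obs,) observation error variances (diagonal R)
    """
    rng = np.random.default_rng(rng)
    n_ens = theta.shape[1]
    alpha = float(n_cycles)  # constant coefficients: sum(1/alpha) over cycles = 1
    for _ in range(n_cycles):
        y_hat = forward(theta)                        # ensemble model integration
        d = y[:, None] + np.sqrt(alpha * r_var)[:, None] * \
            rng.standard_normal(y_hat.shape)          # perturbed observations
        th_a = theta - theta.mean(axis=1, keepdims=True)   # anomalies
        yh_a = y_hat - y_hat.mean(axis=1, keepdims=True)
        c_ty = th_a @ yh_a.T / (n_ens - 1)            # parameter-obs covariance
        c_yy = yh_a @ yh_a.T / (n_ens - 1)            # predicted obs covariance
        gain = c_ty @ np.linalg.inv(c_yy + alpha * np.diag(r_var))
        theta = theta + gain @ (d - y_hat)            # Kalman-like analysis step
    return theta
```

For a linear forward operator and Gaussian priors, repeating the inflated update N_a times recovers (in the large-ensemble limit) the same posterior as a single exact analysis, which is the motivation for the multiple-data-assimilation construction.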
Then, after an ensemble model integration, the normalized posterior importance weights w_j ∈ [0, 1] are diagnosed through the analysis step

w_j = p(y | Z_j) p(Z_j) / Σ_{k=1}^{N_e} p(y | Z_k) p(Z_k),    (25)

where Z_j = [X_j; Θ_j] is the augmented state vector for the j-th particle and the Gaussian likelihoods are given by

p(y | Z_j) = c_0 exp(−(1/2)(y − ŷ_j)^T R^{−1} (y − ŷ_j)).

This is a direct application of Bayes' rule in which the normalizing denominator has two important consequences. Firstly, c_0 = 1/√((2π)^{N_o} |R|) cancels out, thus avoiding errors introduced through floating point arithmetic ((2π)^{N_o} is generally large).
Secondly, the prior weights p(Z_j) also cancel, as they are equal for all particles. With Gaussian likelihoods, (25) becomes

w_j = exp(−(1/2)(y − ŷ_j)^T R^{−1} (y − ŷ_j)) / Σ_{k=1}^{N_e} exp(−(1/2)(y − ŷ_k)^T R^{−1} (y − ŷ_k)),

where the posterior weights w_j sum to unity. The posterior ensemble still spans the range of the prior ensemble, as the analysis step only changes the relative weights of the ensemble members and not their position within the state and perturbation parameter space. Through the individual ranking of the states and perturbation parameters, followed by a cumulative summation of the correspondingly sorted weights, the marginal cumulative distribution functions are readily recovered, facilitating the estimation of quantile values. Note that the PBS is equivalent to running a particle filter without re-sampling and using the prior as the importance density (see Van Leeuwen, 2009). As such, the PBS corresponds to the generalized likelihood uncertainty estimation (GLUE) methodology with a formal likelihood. In this study, the PBS and the ES are used to benchmark the ES-MDA.
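The weight computation can be sketched as follows. This is a minimal illustration assuming a diagonal R; working in log space and subtracting the maximum realizes numerically the cancellation of c_0 and of common factors in the numerator and denominator.

```python
import numpy as np

def pbs_weights(y_hat, y, r_var):
    """Particle batch smoother weights from Bayes' rule with Gaussian
    likelihoods, computed in log space for numerical stability.

    y_hat : (n_obs, n_ens) predicted observations per particle
    y     : (n_obs,) observed batch
    r_var : (n_obs,) observation error variances (diagonal R)
    """
    resid = y[:, None] - y_hat
    log_lik = -0.5 * np.sum(resid**2 / r_var[:, None], axis=0)
    log_lik -= log_lik.max()        # constants cancel in the normalization
    w = np.exp(log_lik)
    return w / w.sum()              # normalized weights sum to unity
```

Degeneracy, as discussed for the PBS, shows up here as one entry of the returned weight vector approaching 1 while the rest vanish.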

Interannual variability and comparison to field measurements
In this section, we present results of the ES-MDA scheme with 100 ensemble members and 4 assimilation cycles (Section 3.3.2) for all the years and sites where snow surveys were conducted. Figure 4 shows that the posterior ensemble closely tracks the observed fSCA, despite the prior range being far from the observations. The posterior ensemble median peak mean SWE is generally close to the independently observed peak mean SWE, but absolute relative differences up to 40% (minimum 0.5%, mean 19%) occur. The posterior SSDs (Figure 5) are generally close to the observed snow distributions. Some of the observed distributions, such as that for Kvadehuksletta 2016, are hard to match, as they do not conform well to a lognormal distribution, most likely due to the limited number of sample points (Section 2.2).
The posterior bias parameters can be directly evaluated by comparing the bias corrected forcing to field measurements. Due to a lack of snowfall observations (see Boike et al., 2017), an evaluation of the precipitation bias parameter is not possible.
However, the melt bias parameter can be evaluated by comparing the estimated snowmelt flux (which is directly proportional to it) to independent measurements.

In ensemble-based data assimilation the spread of the posterior ensemble should represent the uncertainty. To verify this, one can compare two metrics: the residual, i.e. the instantaneous posterior RMSE of the ensemble relative to the corresponding independent field measurement, and the ensemble standard deviation (e.g. Evensen, 2009). For this comparison we define the relative residual as the ratio of the residual to the ensemble standard deviation. In the ideal case the two metrics are equal, so that the posterior ensemble spread accurately captures the estimation uncertainty. For the fSCA, peak mean SWE and peak subgrid coefficient of variation, the average (over all available field measurements) relative residuals were 2.22, 1.53 and 1.66, respectively, so the posterior ensemble underestimates the uncertainty. This effect has been extensively described by Evensen (2009); it arises in part because of model structural errors related to neglected physical processes (Section 3.1). Still, the assimilation is generally able to simultaneously (but not to the same extent) reduce the spread and the error in the ensemble (Figure 4).

Table 5. Summary of evaluation metrics, i.e. bias, RMSE and square correlation coefficient (R²), for the fSCA, peak SWE (µ) and peak subgrid coefficient of variation (χ). These metrics are based on comparisons to all the field measurements presented in Section 2.2, with the number of observations for the comparisons in brackets next to the corresponding symbols. All the metrics are averaged over 100 independent runs, each with 100 ensemble members. The ES-MDA was run with N_a = 4 assimilation cycles.
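The comparison of the two metrics can be sketched as follows; this is an illustrative helper for a single field measurement, not the exact evaluation code of the study.

```python
import numpy as np

def spread_skill(ensemble, truth):
    """Compare the residual (instantaneous RMSE of the ensemble about an
    independent measurement) with the ensemble standard deviation.
    Their ratio, the relative residual, should be near 1 when the
    posterior spread captures the estimation uncertainty."""
    ensemble = np.asarray(ensemble)
    residual = np.sqrt(np.mean((ensemble - truth) ** 2))
    spread = np.std(ensemble, ddof=1)
    return residual, residual / spread
```

A relative residual well above 1, as reported in the text (2.22, 1.53 and 1.66), indicates an overconfident (underdispersive) posterior ensemble.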

Evaluation of data assimilation schemes
In addition to the ES-MDA scheme, we evaluate the PBS and ES (Section 3.3.2) with regard to field measurements, using an ensemble size of 100 members for all schemes. Three error metrics are summarized in Table 5: the bias (mean error), the RMSE and the square correlation coefficient. For the fSCA, all the schemes achieve a major improvement relative to the prior, with an almost tenfold reduction in bias, a halving of the RMSE and an almost perfect correlation to the field measurements of fSCA (Section 2.2). For the peak mean SWE (µ), the PBS performs best in terms of RMSE and bias, followed closely by the ES-MDA which, in turn, has the highest correlation coefficient. With regard to the peak subgrid coefficient of variation (χ), the ES-MDA performs best across all the metrics, tying with the ES for (absolute) bias and with the PBS for RMSE. As considerably more field measurements are available for fSCA than for µ and χ, the evaluation for fSCA must be considered more robust. The scatter plots in Figure 6 visualize the performance of the prior and all the considered DA schemes relative to the field measurements.
Observed, prior and posterior peak mean SWE and peak subgrid coefficient of variation for different years/sites are shown in Figure 7. As discussed in Section 4.1, the assimilation moves the posterior peak mean SWE estimates closer to the observed peak mean SWE in most cases when compared to the prior. However, clear performance differences are found between the different schemes for a number of situations. In 2008, the PBS is not able to correct for as much of the bias in the peak mean SWE compared to the ES-MDA and the ES. For the remaining years, the performance of the schemes in terms of peak mean SWE is comparable (c.f. Table 5). The PBS also shows signs of degeneracy (e.g. Bayelva 2009, complete degeneracy) for the coefficient of variation estimation. On some occasions (e.g. Bayelva in 2008, 2009 and 2014), the posterior ensemble median is effectively pulled closer to the observed coefficient of variation when compared to the prior. On the same occasions the ensemble spread is slightly constrained. Compared to the peak mean SWE, it is much harder to constrain estimates of the coefficient of variation regardless of scheme, although it is possible to shift the ensemble in the right direction.

We gauged the sensitivity of the three batch smoother schemes with respect to ensemble size and the number of assimilation cycles by considering the fractional improvement (FI) in RMSE that was achieved through the analysis step, based on all available field measurements (Figure 8). On the one hand, the PBS requires an ensemble size of 1000 to converge to a stable FI of around 75%, 20% and 60% for the fSCA, peak subgrid coefficient of variation and peak mean SWE, respectively. On the other hand, the ES-MDA with 4 assimilation cycles converges already at 100 ensemble members with similar FIs to the PBS.
The ES performs worst regardless of ensemble size, with FIs of around 70%, 10% and 55% even with 10^5 ensemble members, while requiring 100 ensemble members for convergent results. For all schemes, the available validation data suggest that the greatest improvements are achieved for the fSCA, followed by the peak mean SWE, while by far the lowest improvements are found for the peak subgrid coefficient of variation. With 100 ensemble members, the ES-MDA converges to a stable performance at 4 assimilation cycles, i.e. there is no marked improvement in FI for more cycles (Figure 8, bottom right).
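Assuming FI is defined as the relative reduction in RMSE from the prior to the posterior (one minus the RMSE ratio, which is how the reported percentages read), it can be computed as:

```python
import numpy as np

def fractional_improvement(prior, posterior, truth):
    """Fractional improvement (FI) in RMSE achieved by the analysis,
    relative to independent field measurements."""
    rmse = lambda est: np.sqrt(np.mean((np.asarray(est) - np.asarray(truth)) ** 2))
    return 1.0 - rmse(posterior) / rmse(prior)
```

An FI of 0 means the analysis did not improve on the prior, while an FI of 1 would mean a perfect posterior match to the measurements.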

Effects of observation error and assimilation frequency
The effects of observation error and assimilation frequency are studied by running the ES-MDA (N_e = 10², N_a = 4) and assimilating first only MODIS and then both MODIS and Sentinel-2 retrievals for the 2016 snow season at all study sites. As discussed in Section 3.2, the Sentinel-2 fSCA retrievals are based on higher resolution optical reflectance retrievals. As such, they are expected to be less prone to representativeness error, and thus observation error, since the area in which the snow surveys were conducted is more accurately covered by the retrievals. Furthermore, the Sentinel-2 scenes used for fSCA retrievals were manually checked to be cloud free, which was not the case for the MODIS scenes. Table 6 summarizes various performance metrics for the two different runs. For the peak mean SWE (µ), there is no difference when including Sentinel-2 fSCA retrievals in the assimilation. For the coefficient of variation (χ), however, there is an increase in FI for both the bias and the RMSE, as well as an increase in the square correlation coefficient. Sentinel-2 fSCA retrievals with lower observation error help to further constrain the shape of the snow depletion curve, which explains the improvement in the χ estimation. We emphasize that this evaluation is based on the only 3 available field measurements of µ and χ in 2016 (from the snow surveys), so these preliminary results need to be consolidated by future studies with more validation data.

Table 6. Summary of evaluation metrics, i.e. fractional improvement in bias and RMSE as well as prior and posterior square correlation coefficient (R²), using the ES-MDA (N_e = 10², N_a = 4) for peak mean SWE (µ) and coefficient of variation (χ) when assimilating only MODIS as well as assimilating both MODIS and Sentinel-2 observations. These metrics are based on a comparison to all the snow surveys conducted in 2016 (see Table 1) and are averaged over 100 independent runs, each with 100 ensemble members.

Interannual variability and comparison to field measurements
For all considered years and sites, the ES-MDA scheme both brings the ensemble median fSCA closer to the observed fSCA and significantly constrains the spread of the ensemble (Figure 4). Thus, the posterior effectively fills the gaps in the remotely sensed fSCA time series using a physically based snow model which is bias-corrected through the assimilation, while at the same time accounting for uncertainties in the retrievals. In addition, the ES-MDA is generally able to correct the prior estimates of the peak mean SWE towards the independently observed values, which is essentially achieved through a bias correction of the model forcing. Although the downscaled forcing is biased, it is a more reliable input than forcing data obtained directly from coarse-scale reanalyses (Østby et al., 2017). For example, the lapse rate correction on temperature in the downscaling (c.f. Fiddes and Gruber, 2014) influences the snowfall and melt rates at the more elevated Steinflåen plateau. This effect is not captured in the reanalysis product (Dee et al., 2011), for which the elevation of the nearest grid point is near sea level.
An inherent equifinality problem (see Beven, 2006) arises in the assimilation: if the prior fSCA melts out too early, this could be due to the prior precipitation having a negative bias, the prior melt having a positive bias or a combination of these two. The opposite would be true if the prior fSCA melts out too late. It is not possible to resolve this equifinality problem with observations of fSCA alone. A key assumption in deterministic SWE reconstruction is that the melt flux is more constrained than the precipitation, so that uncertainty in the melt is ignored (Slater et al., 2013). We perturb both the precipitation and the melt, although the latter is assigned a lower uncertainty (Table 4). Through the assimilation we obtain snowmelts that are consistent with the observed snow cover depletion.

Figure 5 shows that for most years the prior median is a poor estimate of the observed peak mean SWE. This indicates that a deterministic (no assimilation and unperturbed) run is not a good representation of the true state. In addition to biases in the precipitation and melt forcing, crucial processes for peak SWE, such as deposition, are not included in the simple snow model. Furthermore, the subgrid variability of the SWE is typically overestimated in the prior, with the prior distributions typically being too skewed. To circumvent these issues, a more sophisticated model (e.g. ALPINE3D; Lehning et al., 2006) accounting for wind drift could be employed, and the climatological snow distribution pattern (Sturm and Wagner, 2010) could help formulate the prior peak subgrid coefficient of variation distribution.
The posterior distributions are, on the other hand, much closer to the observed distributions for most of the years and sites considered. This suggests that there is sufficient information contained in the remotely sensed snow cover depletion to constrain the peak SSD estimates. On some occasions, especially for Bayelva 2015, the posterior SSD is far from the observed SSD, both in shape and in mean. However, the posterior estimate is still slightly better than the prior, indicating that the assimilation has a positive effect on the outcome. A similar marginal performance is found for the Steinflåen plateau in 2016, but the number of SWE point observations (see Table 1) is not sufficient to reliably constrain the shape of the observed distribution in this case.

Evaluation of data assimilation schemes
Compared to the ES and the PBS, which were used in previous studies (e.g. Durand et al., 2008; Girotto et al., 2014b; Margulis et al., 2015), the ES-MDA exceeds or at least nearly matches their performance for all the evaluation metrics considered, i.e. bias, RMSE and correlation coefficient for the fSCA, peak mean SWE and peak subgrid coefficient of variation. The performance gain over the ES is explained by the iterative nature of the ES-MDA, which performs a sequence of smaller corrections in the analysis steps as opposed to one abrupt correction (Emerick and Reynolds, 2013; Stordal and Elsheikh, 2015). Particularly in the case of a non-linear model, as is the case for the SSM, this process of simulated annealing (c.f. Stordal and Elsheikh, 2015) in the ES-MDA leads to a better approximation of the posterior than a single analysis step.
At least with a low number of ensemble members, the ES-MDA also outperforms the PBS. A possible reason for this is that the PBS posterior ensemble spans the same range as the prior ensemble and only changes the relative weights of the ensemble members in the analysis. Thus, if the prior ensemble is so biased that it does not encompass the observations, the PBS is incapable of correcting the posterior towards the observations outside the bounds of the prior. In such a case, the region with high likelihood is very small and not necessarily close to the observations. A good example is the 2008 season at Bayelva (c.f. Figure 7), for which the prior is far away from the observed fSCA. Consequently, the PBS is unable to shift the ensemble outside the prior range, as opposed to both the ES and the ES-MDA. In several years, the PBS also shows signs of degeneracy, i.e. a large part of the weight is carried by a very low number of particles. As the PBS is essentially a particle filter without re-sampling (Van Leeuwen, 2009), the weights can quickly converge on just a few particles in high likelihood regions. Therefore, the remaining particles become redundant, even for low-dimensional systems with a relatively large number of observations such as the one considered here.

The sensitivity analysis for the ensemble size is consistent with higher dimensional models. The ES-MDA requires relatively few ensemble members for convergence, similar to the EnKF (Evensen, 2009), while the PBS requires a larger ensemble for convergence, as with the PF (Van Leeuwen, 2009). The number of assimilation cycles required for convergence of the ES-MDA (4 cycles) is also in line with previous studies (Emerick and Reynolds, 2013). In terms of computational cost, the ES-MDA requires N_a + 1 ensemble model integrations and N_a analysis steps, where N_a (typically ≥ 2) is the number of assimilation cycles (Section 3.3.2). On the other hand, the ES requires only two ensemble model integrations and a single analysis step, while the PBS only needs one ensemble model integration and a single analysis step. Based on a sensitivity analysis (Section 4.2) we set N_a = 4, so the computational cost of the ES-MDA is higher than for the other schemes. For more complex models, such as Crocus (Vionnet et al., 2012) and SNOWPACK (Bartelt and Lehning, 2002), the ES-MDA could prove to be prohibitively expensive. However, an adaptive version of the ES-MDA (Le et al., 2016) could reduce the number of assimilation cycles, the assimilation is performed independently for each grid cell (with no communication between neighboring grid cells), and the ES-MDA algorithm can be parallelized using high performance computing.

Effects of observation error and assimilation frequency
All the DA methods have problems constraining the spread of the peak subgrid coefficient of variation (χ, see Figure 7), although they can pull the median in the right direction. A likely reason is the limited information available in the remotely sensed snow cover depletion, with either too sparse or too uncertain fSCA retrievals. It is worth considering fSCA retrievals from even more sensors, such as Landsat and PROBA-V, which increases the chances of obtaining more cloud-free scenes.
With more scenes available, it may be possible to better constrain the posterior χ ensemble: even with just a few additional retrievals from Sentinel-2, the χ estimation was improved across all evaluation metrics. This also points towards the benefits of including higher resolution fSCA retrievals from the Landsat and Sentinel-2 satellites, which will be more representative and thus more accurate. The effective MODIS footprint is inhomogeneous and differs markedly from the nominal 500 m pixel resolution when the view angle deviates from nadir (Peng et al., 2015). So, even for gridded applications, there is a considerable representativeness error in MODIS fSCA, although this is reduced when several pixels are aggregated.

Outlook
Several extensions to the presented ensemble-based data assimilation framework could be considered. The first is to change the grid scale of the framework from the order of 1 km to larger or smaller scales. For the latter, it would be possible to assimilate only Landsat and Sentinel-2 based fSCA retrievals and operate at a grid scale on the order of 100 m, in line with the work of e.g. Girotto et al. (2014a). For the former, one would aggregate the satellite retrievals even further and perform the assimilation at a grid scale on the order of 10 km or larger. This implementation could be problematic, as the uniform snowmelt assumption in the SDC (Liston, 2004) may no longer hold at such coarse scales. Moreover, passive microwave (PM) retrievals are known to saturate for deep snowpacks (Foster et al., 2005), which might limit the applicability of such multisensor assimilation approaches.
The major problem in the assimilation of fSCA retrievals is the occurrence of clouds, which causes extended gaps in the time series obtained from optical sensors. As discussed, using fSCA retrievals from even more sensors could help to fill the gaps in the remotely sensed snow cover depletion and further constrain the peak subgrid coefficient of variation (χ). The use of additional higher resolution fSCA retrievals with lower representativeness error (and thus observation error) could also prove especially beneficial for constraining χ.
To reduce the computational costs of the ES-MDA, the adaptive ES-MDA (Le et al., 2016) should be considered. Furthermore, the bias corrected ES-MDA outlined in Stordal and Elsheikh (2015) may be worth pursuing for future applications, especially when applied to bigger domains with possibly even larger misfits between the prior and the observations. Using a more complex snow model, such as Crocus (Vionnet et al., 2012) or SNOWPACK (Bartelt and Lehning, 2002), may not only improve the modeled melt rates, but also offer the possibility to assimilate snow grain size retrievals (c.f. Painter et al., 2009), as noted by Durand et al. (2008). In addition, the method could be applied in a fully coupled land-atmosphere model. The intermediate complexity atmospheric research model (ICAR; Gutmann et al., 2016) shows particular promise as an atmospheric model that can efficiently and iteratively be run in ensemble mode, as required for applications of the ES-MDA. In principle, one could run ICAR in ensemble mode coupled to a land surface model with an adequately complex snow scheme and assimilate fSCA (and possibly PM and TWS) retrievals with the ES-MDA to deliver a snow reanalysis.
As snow is a crucial driver for many terrestrial and atmospheric processes, the presented framework has the potential to improve process modeling in a range of disciplines, especially since the spatial resolution is considerably higher than in passive-microwave derived SWE data sets. For example, the subgrid variability of permafrost temperatures is closely tied to that of SWE depth (e.g. Gisnås et al., 2016), which has major implications for permafrost mapping (e.g. Westermann et al., 2015b, 2017). Similarly, snow cover information is an important component of many ecological models (e.g. Kohler and Aanes, 2004), and peak SWE is intimately linked to stream flow, which is crucial for hydrology and water resource management (Andreadis and Lettenmaier, 2006; Barnett et al., 2005). Finally, knowledge of the snow distribution and snowmelt is of interest for tourism, given its importance for e.g. skiing, hiking and backcountry travel.
In this study, we use the ensemble smoother with multiple data assimilation (ES-MDA) scheme to estimate peak SWE distributions at the kilometer scale from time series of remotely sensed fractional snow-covered area (fSCA) from MODIS and Sentinel-2. The ES-MDA is combined with analytical Gaussian anamorphosis to update perturbation parameters that are either lower or double bounded in physical space. The data assimilation is applied to a simple snow model based on the surface energy balance coupled to a probabilistic snow depletion curve. The scheme is driven by statistically downscaled ERA-Interim reanalysis data. As such, both the model forcing and the satellite retrievals are globally available.
The results are compared to field measurements of peak SWE distributions and fSCA from Arctic sites near Ny-Ålesund (79° N, Svalbard, Norway), so that the performance can be evaluated with respect to the estimated fSCA, peak mean SWE and peak subgrid coefficient of variation. From this study, the following conclusions can be drawn:

-At the kilometer scale, the ES-MDA is able to successfully assimilate fSCA retrievals into the simple snow model and estimate the peak subgrid SWE distribution prior to the snowmelt.
-A physically-based interpolation of the remotely sensed fSCA time series is obtained that takes into account uncertainties in both the model and the retrievals.
-For the peak mean SWE, the ES-MDA features an average RMSE of 0.09 m w.e. compared to field measurements.

-For the peak subgrid coefficient of variation that controls the width and skewness of the distribution, the ES-MDA usually manages to pull the posterior median in the right direction, but the spread of the ensemble is difficult to constrain.
-By including higher resolution fSCA retrievals from Sentinel-2, the posterior peak subgrid coefficient of variation ensemble can be better constrained. This highlights the potential benefits of assimilating additional higher resolution fSCA retrievals from sensors on board the Landsat and Sentinel-2 satellites in future work.

-In line with previous studies, the ES-MDA converges with as few as 100 ensemble members and 4 assimilation cycles.
-With this ES-MDA configuration, the fractional improvement in RMSE from prior to posterior is around 75%, 60% and 20% for the fSCA, peak mean SWE and peak subgrid coefficient of variation, respectively.
-The ES-MDA exceeds or at least nearly matches the performance of the particle batch smoother and the ensemble smoother for all evaluation metrics considered.

As the scheme exploits high and medium resolution satellite images from optical sensors, it is capable of estimating the snow distribution at considerably higher spatial resolutions than traditional SWE products, e.g. those based on passive microwave retrievals. On the other hand, the scheme can only recover the peak subgrid SWE distribution prior to the onset of melt, as opposed to providing information on the seasonal evolution of the snow distribution, so that it complements rather than replaces existing SWE retrieval algorithms. However, the method could become a part of satellite-era hydrometeorological reanalysis schemes with a wide range of applications.