ENSO influence on surface energy and mass balance at Shallap Glacier

The El Niño/Southern Oscillation (ENSO) is a major driver of climate variability in the tropical Andes, where recent Niño and Niña events left an observable footprint on glacier mass balance. The nature and strength of the relationship between ENSO and glacier mass balance, however, varies between regions and time periods, leaving several unanswered questions about its exact mechanisms. The starting point of this study is a 4-year long time series of distributed surface energy and mass balance (SEB/SMB) calculated using a process-based model driven by observations at Shallap Glacier (Cordillera Blanca, Peru). These data are used to calibrate a regression-based downscaling model that links the local SEB/SMB fluxes to atmospheric reanalysis variables on a monthly basis, allowing an unprecedented quantification of the ENSO influence on the SEB/SMB at climatological time scales (1980–2013, ERA-Interim period). We find a stronger and steadier anti-correlation between Pacific sea-surface temperature (SST) and glacier mass balance than previously reported. This relationship is most pronounced during the wet season (December–May) and at low altitudes where Niño (Niña) events are accompanied with a snowfall deficit (excess) and a higher (lower) radiation energy input. We detect a weaker but significant ENSO anticorrelation with total precipitation (Niño dry signal) and positive correlation with the sensible heat flux, but find no ENSO influence on sublimation. Sensitivity analyses comparing several downscaling methods and reanalysis data sets resulted in stable mass balance correlations with Pacific SST but also revealed large uncertainties in computing the mass balance trend of the last decades. The newly introduced open-source downscaling tool can be applied easily to other glaciers in the tropics, opening new research possibilities on even longer time scales.


Introduction
The climate of the Cordillera Blanca in the Peruvian Andes is characterized by a wet season from October to April followed by a dry season with little or no precipitation.These dry and wet periods can be modified during El Niño and La Niña events, the El Niño Southern Oscillation (ENSO) being an important driver of climate variability in the region (e.g.Garreaud et al., 2009).In this particular setting, the glaciers of the Cordillera Blanca are of great economical, environmental and scientific importance.They are not only important suppliers of fresh-water during the dry periods (e.g.Chevallier et al., 2011), but they also act as sensitive indicators of climate variability and climate change, as evidenced by the glacier shrinkage observed since the Little Ice Age (Kaser et al., 1990;Georges, 2004;Racoviteanu et al., 2008;Schauwecker et al., 2014).
The response of tropical glaciers to climate variations differs from their mid-latitudinal counterparts (e.g.Kaser, 1999) and has been studied extensively, in Africa (e.g.Kaser et al., 2004;Nicholson et al., 2013) and in South-America (e.g.Hastenrath, 1978;Kaser et al., 1990;Francou et al., 2000 and references herein; see Vuille et al., 2008a andRabatel et al., 2013 for a review).At low latitudes the annual cycle of temperature is small and humidity becomes an important driver of mass balance seasonality by its control on precipitation, net radiation, and sublimation (Wagnon et al., 1999;Kaser, 2001;Winkler et al., 2009;Sicart et al., 2011).By determining the phase of precipitation and thus the surface albedo, changes in temperature can have a significant impact on mass balance inter-annual variability (e.g.Favier et al., 2004;Gurgiser et al., 2013).The physical basis of tropical glaciers' responses to various atmospheric forcings are therefore best studied with process-based models that aim at the F. Maussion et al.: ENSO influence on Shallap Glacier, Cordillera Blanca full decomposition of the surface energy and mass balance (SEB/SMB) (e.g.Wagnon et al., 2003;Mölg et al., 2008).Since SEB/SMB models require high quality, high resolution glacio-meteorological observations for calibration and validation, the available time-series are short and unsuitable for long-term studies of glacier-climate interactions.
The starting point of this study is a 4-year long time series of distributed SEB/SMB fluxes at Shallap Glacier, Cordillera Blanca, obtained using a process-based model (Gurgiser et al., 2013).Our first objective is to extend the length of these time series while still preserving the advantages of the decomposition into individual SEB/SMB components.The SEB/SMB variability is tied to large scale driven weather conditions, and we hypothesize that by using atmospheric reanalysis data we can compute (downscale) the energy fluxes with sufficient accuracy to determine the atmospheric drivers of SEB/SMB variability on longer timescales.This hypothesis is the foundation of any empirical statistical retrieval of glacier climatic mass balance (MB), no matter of which complexity.
"Temperature index" or "positive degree day" models (e.g.Braithwaite, 1995;Hock, 2003) are probably the simplest example of seeking statistical relationships between glacier MB and local climate variables (in this case temperature and precipitation).Extensions of temperature-based models include so-called "semi-empirical" models that incorporate further explanatory variables and/or physical processes while still relying on observational data for calibration (e.g.Kaser, 2001;Juen et al., 2007;Pellicciotti et al., 2008).Another approach is to use observed relations between the MB and atmospheric variables or global circulation indexes in order to build statistical models that predict glacier MB (see Hoinkes, 1968, for a probably very first attempt in this direction).Several variations of this method have been applied to glaciers in Northern Europe (Mernild et al., 2014;Trachsel and Nesje, 2015), northern America (Hodge et al., 1998;Shea and Marshall, 2007) and in the Tropics (Manciati et al., 2014).All these studies use the MB as the predicted variable and do not use the terminology of "downscaling", that is extensively used in climate research.Statistical downscaling studies that target glaciological applications often focus on one or more meteorological variables at the glacier surface (Hofer et al., 2010(Hofer et al., , 2012) ) for use in a subsequent MB model for example (Jarosch et al., 2010;Weidemann et al., 2013).
Here we follow the general idea that, in principle, any target variable can be downscaled from large-scale atmospheric fields -as long as there is a physical reason for the local-and large-scale variables to be related (Benestad, 2004;Maraun et al., 2010).We present a new open-source tool (DownGlacier) developed especially to retrieve glacier SEB/SMB fluxes from large-scale atmospheric data.Inspired from existing software packages (Wilby et al., 2002;Hessami et al., 2008), it is a semi-automated, regression-based statistical downscaling tool (see Sect. 2.2).
The second and main objective of this study is to quantitatively assess the impact of ENSO on the SEB/SMB of the Shallap Glacier.The influence of ENSO in the tropical and central Andes can be roughly summarized with prevailing warmer and drier conditions during El Niño phases, while colder and wetter conditions prevail during La Niña phases.As a result, studies dealing with ENSO's influence on tropical Andean glaciers reported a significant anti-correlation between Pacific sea-surface temperature anomalies (SSTA) and MB (Arnaud et al., 2001;Francou et al., 2004;Vuille et al., 2008b;Veettil et al., 2014).The extreme 1997/98 Niño year, for example, caused exceptional glacier melt in the outer tropics (Wagnon et al., 2001;Francou et al., 2003).Favier et al. (2004) advanced that glaciers in the outer and inner tropics react similarly to El Niño events, mainly because of a precipitation deficit in the outer tropics and a temperature increase in the inner tropics, both leading to a rise in snow line altitude.
However, ENSO influences are neither spatially nor temporally coherent, especially in regions of complex terrain between the outer and inner tropics (Vuille and Keimig, 2004;Garreaud et al., 2009).Several studies in the Zongo valley (Bolivia, ∼ 16 • S; Ronchail and Gallaire, 2006) or in the Cordillera Vilcanota (∼ 14 • S; Perry et al., 2014;Salzmann et al., 2013) report less strong or even opposite ("Niño/wet, Niña/dry") local ENSO effects.The related studies of Kaser et al. (2003) and Vuille et al. (2008b) are the only reports of ENSO influence in the Cordillera Blanca (∼ 9 • S) to date.Based on a hydrological reconstruction of glacier MB for the period 1953-1993 (Kaser et al., 2003), they found a significant anti-correlation between annual MB and October to April Pacific SSTA, supporting the expected "Niño → negative MB, Niña → positive MB" pattern.This relationship however did not hold true during at least three individual years after the mid-1970s, leading the authors to conclude that ENSO characteristics may have undergone changes in recent decades.
Here we use DownGlacier to retrieve monthly SEB/SMB fluxes at Shallap Glacier from atmospheric reanalysis data.This allows a first time assessment of the influence of ENSO on the individual components of the SEB/SMB during a longer climatological period .The seasonal variations of the ENSO signal and its varying impact with altitude will be of particular interest.The rest of the paper is organized as follows.In Sect.2, we present the study region, describe the DownGlacier tool and the data used.In Sect.3, we present the downscaling results for the ablation area of the glacier where the glacio-meteorological measurements took place.In Sect.4, we apply the downscaling procedure to the entire glacier area and discuss the strengths and limitations of our method.We assess the robustness of our results in Sect. 5 by using several sensitivity analyses.The influence of ENSO will be analyzed and discussed for each of these steps before concluding our study in Sect.6.

Study region and meteorological data
The Shallap Glacier (9 • 20 S, 77 • 20 W, cf.Fig. 1) spans the altitude range 4700-5800 m a.s.l..It lies in the Cordillera Blanca, which hosts nearly a quarter of all tropical glaciers by area (Kaser, 1999).Precipitation in the region is essentially of convective nature and is tied to the moisture originating from the Amazonian Basin (Vuille and Keimig, 2004;Perry et al., 2014).The Andes mountain chain (reaching > 6700 m a.s.l. in the Cordillera Blanca) divides the wet Amazonian climate in the east from the dry coastal areas in the west (e.g.Kaser and Osmaston, 2002).The map in Fig. 1 illustrates the control of topography on triggering precipitation and the pronounced changes occurring within short distances.
The Shallap Glacier has been the subject of an intensive field program in recent years.Two automatic weather stations were operated over two distinct and partly overlapping periods: at the glacier surface (July 2010-September 2012, with several gaps) and on the southern moraine (2002-2009and July 2011to February 2012).Here, we used the southern moraine data from October 2005 to September 2009 (longest period with complete data coverage).The Unidad de Glaciología y Recursos Hidricos (UGRH) of the Peruvian Autoridad Nacional de Agua (ANA) started surface height change measurements in the ablation zone of the glacier in 2003.From August 2006 to August 2008 (end of data collection), additional measurement points are available (20 ablation stakes in total) with a reading frequency of 14 to 64 days.The average altitudes of the stake points as measured by the UGRH in August 2006 and August 2009 ranges between 4758 and 4824 m a.s.l.For a geographic overview of the stations and stakes see Gurgiser et al. (2013) (their Fig. 1).

DownGlacier
DownGlacier is an open-source tool programmed in the Python language.It relies on the statistical libraries Scikit-learn (Pedregosa et al., 2012) and Statsmodels (Seabold and Perktold, 2010) for the regression models, and adds specific SEB/SMB and uncertainty assessment tools.The project repository (https://bitbucket.org/fmaussion/downglacier) contains the source code, some usage examples and all data and scripts used to generate the plots presented in this paper.

Surface energy and mass balance
The function of DownGlacier is to compute the glacier SEB equation as resolved by most process-based melt models (e.g.Mölg et al., 2012): where SW in and SW out are the incoming and outgoing shortwave radiation, LW in and LW out the incoming and outgoing longwave radiation, QS and QL the turbulent sensible and latent heat fluxes, QC the conductive heat flux inside snow or ice, and QPS the penetrating shortwave radiation.An energy flux (W m −2 ) has a positive (negative) sign when it induces an energy gain (loss) at the surface.The sum of these fluxes yields a resulting flux F , which represents the available energy for melting QM if the glacier surface temperature is at the melting point (0  2) are valid at any instant, but not for averaged time periods.To compute the SEB/SMB from monthly averaged fluxes we assume that F is always equal to QM and that l subli is equal to the enthalpy of sublimation (and not vaporization).The effect of these approximations is generally small and depends on temperature and therefore on altitude (see Appendix A1 for details and Sect. 4 where we present a method to mitigate the errors related to these assumptions).

Downscaling strategy
The purpose of the downscaling procedure is to find a function f such as the following: where Y is the variable to be predicted (predictand), X = X 1 , X 2 , . .., X p are the explanatory variables (predictors), and ε is a random error term1 .In principle, the downscaling process is similar to any statistical learning problem (Hastie et al., 2009).The term downscaling refers to the fact that, in this case, the predictors X are extracted from large-scale atmospheric data (reanalysis data of atmospheric model output, representative of a large space) and the predicted variables Y are the glacier SEB/SMB fluxes, representative of a local state (Benestad, 2004).DownGlacier proposes several options to define f but for this study we use the so-called Lasso ("least absolute shrinkage and selection operator", Tibshirani, 1996) which performed best in our cross-validation tests.The Lasso is a shrinkage method developed to overcome some of the problems of least-squares regression such as over-fitting and the high sensitivity to the predictor subset.By penalizing the fitting of the regression coefficients by a factor λ, it shrinks some coefficients and sets others to zero (Tibshirani, 1996;Hastie et al., 2009).The resulting model is still a linear combination of multiple predictors (as for stepwise regression), but the chosen coefficients are not the same as with standard least-squares.Lasso is widely used in statistical learning problems across disciplines but it is not (yet) used much in climate downscaling studies despite of encouraging results (e.g.Hammami et al., 2012;Gao et al., 2014).Due to the novelty of this approach in a glaciological context, we provide more elements about Lasso in Appendix A2.

Calibration SEB/SMB data
The SEB/SMB data used to calibrate and validate the downscaling model was generated using an updated version of the process-based model developed and described by Mölg et al. (2008Mölg et al. ( , 2009Mölg et al. ( , 2012) ) previously applied at Shallap glacier by Gurgiser et al. (2013).Air temperature, humidity, wind speed, global radiation and total precipitation measured at the southern glacier moraine serve as model input for the period October 2005 to September 2009.The model calculates the SEB/SMB as formulated in Eqs. ( 1) and (2) at hourly time steps and for the entire glacier surface on a 50 m×50 m grid.
The distributed SEB/SMB time series are aggregated to monthly values and averaged spatially over altitude slices of 50 m height (the altitude slice at 4750 m a.s.l. for example being the average of the grid points in the 4750-4800 m range).The uncertainty associated with this reference data has to be assessed independently using the measurements at the ablation stakes: the annual RMSE (root mean square error) of the reference MB was estimated to 0.76 m w.e.(water equivalent) for the year 2007 and 0.88 m w.e. for the year 2008.We kept the more conservative estimate of 0.88 m w.e. and scaled it by a factor of 1/ √ 12 (following the normality assumption) to obtain a monthly RMSE of 0.25 m w.e.month −1 .This value will be taken into account and added to the downscaling error when analyzing our results at 4750 m a.s.l.(where most ablation stakes are located).For other altitudes and for the intermediate SEB variables no uncertainty assessment can be realized: this is discussed in more detail in Sect. 4.

Atmospheric predictors
The selection of the predictor set is crucial for the accuracy and stability of the downscaled time series (e.g.Maraun et al., 2010;Fowler et al., 2007;Sauter and Venema, 2011).For this study, we chose to select the predictors out of the nearest grid point of the atmospheric reanalysis data set, which is a common approach in downscaling studies (e.g.Gutiérrez et al., 2013;Hofer et al., 2012Hofer et al., , 2015)).It prevents dubious correlations with remote indices and ensures that the local glacier features are indeed related to the local atmospheric state (from the coarse data set perspective).Another more practical advantage of this procedure is its systematic and objective aspect.
In a first step, we chose to use ERA-Interim reanalysis data (Dee et al., 2011) provided by the European Centre for Medium-range Weather Forecasts (ECMWF), which proved to be most accurate for downscaling purposes in the region (Hofer et al., 2012).We chose to follow a similar approach as in Hofer et al. (2012) and previously smoothed the ERAinterim fields using a spatial Gaussian filter with σ = 1 (approximately a 3 × 3 box average), reducing the noise related to the arbitrary choice of the nearest grid-point.The starting predictor set consists of 27 predictors at the surface and at selected pressure levels in the atmosphere (Table 1).The sensitivity of our results on the chosen predictor set and the reanalysis data set is assessed in Sects.5.2 and 5.3.

Uncertainty analysis
The uncertainty associated with our method has two major sources: the calibration of the SEB/SMB time series (see Sect. 2.2.3) and the downscaling procedure itself.To a certain extent, the later can be assessed using cross-validation (e.g.Michaelsen, 1987).Here we use a variant of the leaveone-out cross-validation in which a five-elements window is removed iteratively from the calibration set.The model selection and calibration procedure is repeated 48 times (one for each month), providing new "penalized" time series obtained by 48 different models, each of them unaware of the 5 months period surrounding each data point.The period of ±2 months was chosen based on the predictands properties: the lag-3 autocorrelation values of the predictands at 4750 m a.s.l. were all close to 0, the highest being M Subs with an r 2 of 0.08.Refer to Appendix A3 for more details about the cross-validation procedure.
For the evaluation of the model skill we used standard metrics computed from the cross-validation: coefficient of determination r 2 , root mean square error (RMSE), and the Brier skill score (BSS), defined as follows: with MSE ds and MSE ref being the mean square error of the downscaling and of the reference model, respectively.The reference model is the leave-one-out monthly average of the calibration time series (i.e. the value for June 2007 is the average of the June values in 2006, 2008 and 2009).A posi-tive BSS evaluates the capacity of the downscaling model to make better predictions than taking the "climatology" (a perfect model having a BSS of 1).

ENSO classification
For the ENSO events classification we use the sea surface temperature anomalies (SSTA, relative to the base period 1981-2010) in the Niño 3.4 region obtained from the National Oceanic and Atmospheric Administration (NOAA) Climate Prediction Center (CPC).We follow the classification recommended by Trenberth (1997): El Niño (La Niña) occurs if the 5-month running average of Niño 3.4 SSTA exceeds 0.4 K (falls below −0.4 K) for 6 months or more (Fig. 2).This results in 192 neutral (47 %), 95 Niño (23 %) and 121 Niña months (30 %).The same index was also chosen by Vuille et al. (2008b) and Francou et al. (2004) for similar purposes.In addition, we shifted the SSTA time series by 3 months (as suggested by Francou et al. (2004) and confirmed by our own correlation analyses) to account for the lagged response of the MB anomalies at Shallap glacier.
We assess the sensitivity of our results to the choice of other ENSO indexes and lag values in Sect.5.4.

Results
We first present the results of the downscaling at the 4750-4800 m a.s.l.altitude slice which is located in the ablation area of the glacier.Here we have the highest confidence in the calibration time series and in their uncertainty estimates.
From now on we use the term "year" as replacement for "hydrological years" (1998 for example referring to the period October 1997 to September 1998).For simplicity, we refer to the 4750-4800 m a.s.l.altitude slice with the shorter "4750 m a.s.l." and add a 4750 subscript to the MB variable name wherever judged necessary to avoid ambiguities.

Validation
A summary of the cross-validation results is presented in Table 2.All downscaled variables have a positive BSS, the highest (0.81) for Temp and the lowest (0.21) for PRCP Total .
The most determinant variables for MB 4750 are the shortwave variables SW in and SW out as well as PRCP Solid .Their scores are generally lower than those of other variables but they are satisfying considering the complex nature of the precipitation and surface albedo processes.We discuss the conditions for the successful downscaling of SW out in Sect. 4. The example of PRCP Total illustrates the importance of considering all scores when assessing the model results.RMSE σ of PRCP Total is lower than that of PRCP Solid , meaning that the model is working satisfyingly.However, the inter-annual variability of PRCP Total is smaller and results   in an efficient reference model that penalizes the BSS.PRCP Solid , in turn, has a higher inter-annual variability (tied to the temperature variations, see Fig. 3) better caught by the downscaling model than by the reference climatology.
Figure 3 shows a comparison between the reference and modeled time series of Temp, SW net , PRCP Solid and MB 4750 .As expected, the full-model time series are closer to the reference than the cross-validation time series.However, the differences between the two are small, which indicates that the chosen predictors and their coefficients are stable regardless of the calibration period.The inter-annual variability is well caught by the model: the MB 4750 of the two last years is less negative due to lower air temperatures, higher snowfall and lower short-wave radiation input, for both the reference and the downscaled model.This raises a question: are the down-scaled variables consistent with the glacier surface processes and can we interpret them in the same way as we would do it with a physical model?

Physical consistency of the downscaled variables
Albedo (ratio SW out /SW in ) for example is strongly related to solid precipitation (Fig. 4e).The downscaled fields reproduce the expected relationship and the spread (related to other factors such as snowfall frequency) but some issues arise: in rare cases (6 %) the downscaled precipitation is slightly negative and for two cases the albedo is close to the high value of 1.This is due to the linear nature of the downscaling algorithm and is a known issue of statistical models, which are not aware of the physical properties of the downscaled variables.In DownGlacier the precipitation values are clipped to zero but we decided to leave the short-wave variables unchanged, since the occurrence of extreme low/high albedo are rare and correspond to a realistic atmospheric forcing (low/high solid precipitation).For other expected relationships such as the relation between the turbulent fluxes QS and QL with wind speed and vapor pressure (Fig. 4a and c), the downscaling produces realistic fields as well.Since monthly air temperature is always close to 0 • and therefore close to the surface temperature, the sensible heat flux is more dependent on wind speed than on air temper-ature (Fig. 4d).The latent heat flux is also well correlated with wind speed (Fig. 4e), but this is also due to the peculiar conditions at Shallap glacier, where wind speed and vapor pressure are well correlated (not shown).
The physical consistency of the MB with the downscaled fluxes is ensured by the computation of the SEB/SMB budget (Eqs. 1 and 2) and is an advantage of DownGlacier over other approaches downscaling the MB only.Interestingly, the direct downscaling of MB 4750 is slightly less accurate than the diagnostic method (BSS of 0.65 instead of 0.69) but we obtain a downscaled MB Down extremely close to the diagnostic MB Diag (Fig. 4f).Two reasons can explain this encouraging result.First, while the SEB/SMB equations are additive in nature, the non-linear processes are resolved beforehand by the process-based model and then mimicked by the downscaling procedure.Second, this result can be seen as an implicit confirmation that the downscaling procedure has "caught" all the SEB/SMB variability that can be explained by the large scale atmospheric fields.The remaining uncertainty is related either to missing information and errors in the large-scale atmospheric data or to the simplifying nature of the downscaling functions.In Appendix A4, we describe these functions and discuss their interpretation.

Influence of ENSO on the SEB/SMB fluxes
The downscaled monthly MB 4750 is mostly negative for the period 1980-2013 and displays a pronounced intra-and inter-annual variability (Fig. 5).A few months have a positive MB 4750 , all of them occurring during Niña periods.Inversely, the most negative events occur during Niño periods.We will now investigate if we can detect a systematic pattern by building composites of the Niño and Niña periods.

Niño/Niña composites
The annual cycles of MB 4750 , temperature, snowfall and total precipitation for each of the seven Niña and eight Niño periods are shown in Fig. 6.Despite of a large spread between individual years we distinguish a clear signal with below (above) average MB 4750 during Niño (Niña) periods, confirming the findings of previous studies (e.g.Francou et al., 2004;Favier et al., 2004;Vuille et al., 2008b).The largest differences between Niño and Niña occur between December and May, although the ENSO signal remains visible towards the end of the year for both MB 4750 and temperature.Temperature displays a larger spread for Niño than for Niña years, with temperature anomalies up to +2 K for extreme months.We also distinguish a Niño → dry/Niña → wet signal in total precipitation during the wet season, but it is less pronounced, with at least 2 wetter than average Niño years.Snowfall displays a clearer tendency, all Niña years being above the average from December to May and several Niño years having almost no snowfall at 4750 m a.s.l.during the same period.Due to this large spread it would be difficult to  define a "typical" Niño or Niña period.In fact, our attempts to go beyond the visual interpretation by testing the statistical significance of these differences were unfruitful because of the large standard deviation between years, the small number of composites, and the variables' RMSEs.
As for most glaciers, the energy budget at the ablation area of Shallap is dominated by the radiation fluxes (Fig. 7).The annual cycle of the energy budget is rather flat with an average annual energy gain of ∼ 60 W m −2 .The minimum of SW Net in February/March is combined with a smaller energy loss by LW Net , while during the dry season LW Net and QL inversely compensate the high SW Net .The turbulent fluxes are more important during the dry season, the sensible (latent) flux being constantly positive (negative) throughout the year.The resulting SMB cycle follows a bimodal pattern: the first peak (less negative MB 4750 ) in February is due to a combined effect of a smaller energy gain and a maximum of accumulation, while the second peak in July is related to the stronger energy sink by QL.
The composites presented on the right panel of Fig. 7 (note the different y axis ranges) provide useful information about the factors that possibly control the differences between Niño and Niña periods.The differences in SEB are overwhelmingly dominated by the short-wave balance, the other fluxes playing a smaller role (higher energy loss by LW Net in January/February of Niño years, smaller energy loss from April to June by QL).The increase in SW Net is directly related to a snowfall deficit, mostly between December and May.At least in the ablation zone of the glacier, the picture seems to unequivocally follow the pattern described for other tropical Andes glaciers (e.g.Favier et al., 2004).

Inter-annual variability
The individual Niña and especially Niño years are highly variable regarding their signal on MB 4750 since the events differ in strength, but how well is the Pacific SST related to MB 4750 ? Figure 8 displays the annual averages of MB 4750 and of Niño 3.4 SSTA.The relationship is striking throughout most of the period with a coefficient of determination of r 2 = 0.8 (p 10 −5 ) which reduces to r 2 = 0.67 ± 0.06 (p 10 −5 ) when taking the RMSE into account 2 .The latter figure is more realistic because the downscaled MB 4750 represents the deterministic part of the "real" MB 4750 : local and random processes which are not caught by the downscaling procedure are more likely to weaken the relationship than enhance it.We distinguish two periods with a slightly weakened relationship : 1991-1995 and 2002-2005, which are the exact same periods described by Rabatel et al. (2013) (their Fig. 9) or by Kaser et al. (2003) (their Fig. 9, for 1991-1995).
2 Mean and standard deviation of r 2 computed from 10 000 random realizations of MB 4750 ± RMSE.

Distributed SEB/SMB: exploring the potential and limitations of the procedure
In the previous section we limited our analyses to 4750 m a.s.l.where the accuracy of the reference SEB/SMB model could be assessed thoroughly using external data (ablation stakes), leading to a robust error assessment of the entire modeling chain.Using DownGlacier for the entire glacier is straightforward, at least in practice: the distributed SEB/SMB data are averaged over altitude slices of a fixed range (here 50 m, see Sect.2.2.3) and each variable/slice is downscaled independently.The cross-validation scores are computed in the same way (Fig. 9).The scores of PRCP Solid and LW Net are stable for all altitudes (PRCP Solid is getting closer to PRCP Total as temperature lowers).The score of SW Net , however, is highly variable and determines the accuracy of MB at lower altitudes where it is the largest energy input.In the 4800-4900 altitude range the capacity to downscale SW Net worsens with a maximum RMSE σ = 0.83 (the reasons for this low accuracy are discussed below).After 5000 m a.s.l., SW Net becomes less relevant for the energy budget and has less impact on the model skill.The BSS scores are low at high altitudes due to uncertainties in the estimation of the energy available for melting (QM).To address this issue, we can introduce a correction factor c which guarantees that the average downscaled QM is equal to the average calibration QM.The usage of this factor can also be cross-validated (MB Cor in Fig. 9 pact from 4950 m a.s.l.upwards, and is particularly efficient at high altitudes.However, it has a negative impact at low altitudes where it constrains the variability of the model.Altogether, this correction factor has only a very small impact on our conclusions, and we decided to use the non-corrected time series for further analyses. Figure 10 shows that the negative BSS at 5450 m a.s.l. is related to exceptional errors during the dry season where unrealistic negative MB is predicted for a few isolated months, an issue that is strongly reduced when using MB Cor (not shown).At 5700 m a.s.l. the problem is weaker and the predictions are satisfying.At ∼ 4850 m a.s.l., however, we reach the limits of the downscaling procedure: abrupt MB varia- tions from one month to another are not reproduced and the model's attempts to catch those result in bad predictions for the second half of the period.These jumps from positive to highly negative values are directly related to the surface conditions of the glacier; snow cover is a function of previous snowfall and melt (information which is not available in the reanalysis data).Our efforts to account for this monthly persistence by including lagged predictors were unsuccessful: increasing the number of predictors also increased the noise, and it is probable that the linear nature of the Lasso method is not able to cope for these complex effects.
The conditions for the successful downscaling of SW Net are found for example at 4750 m a.s.l.where snowfall and melt occur within days, or at higher altitudes when there is a permanent snow cover.It is therefore probable that the current version of DownGlacier will perform poorly on e.g.midlatitudes glaciers, where persistent effects can be determinant for the annual MB (e.g.Mölg et al., 2013, who showed that spring snowfall conditions on a Tibetan glacier regulate the entire annual mass balance due to the albedo feedback).In these cases the purely statistical approach used here should be complemented by physical albedo models.
The altitudes between 4850 and 4950 m a.s.l. with the highest RMSE represent approx.20 % of the glacier area and are responsible for most uncertainties of the MB variability 3 .Despite of these errors occurring around the location of the equilibrium line, the MB averaged over the entire glacier (specific MB) is well predicted by the model (RMSE σ of 0.5 and BSS of 0.64, time series in Fig. 10).The reasons for these good scores are the reliable downscaling in the lower parts of the glacier (which account for the majority of the mass loss) and of the accumulation processes in the upper parts.
These encouraging results call for an analysis of the model's glacier-wide predictions for 1980-2013, presented in Fig. 11, presented in Fig. 11.We arbitrarily multiplied the cross-validation RMSE by a factor 2 to account for unknown errors in the 3 These errors however have no systematic impact on the average specific MB, thanks to the property of statistical models to preserve the mean.upper parts.We see that the SSTA → MB relationship is less strong for the glacier average, with a deterministic correlation of r 2 = 0.51 (p < 10 −5 ) diminishing to r 2 = 0.38±0.08(p < 10 −3 ) when taking the RMSE into account.These values are lower than at the 4750 m a.s.l.altitude, and are closer to the correlation values found by Vuille et al. (2008b).As for most tropical glaciers (Kaser and Osmaston, 2002), Shallap glacier has large accumulation areas where precipitation falls as snow most of the time.Total precipitation is less sensitive to ENSO events than temperature: at 4750 m a.s.l. the deterministic correlation of snowfall with Pacific SSTA is r 2 = 0.75 (p < 10 −5 ) while it is 0.39 (p < 10 −3 ) for total precipitation.

Sensitivity analyses
We test the robustness of our conclusions by presenting the results of a series of sensitivity experiments grouped in three categories: downscaling method, predictor set and reanalysis data (Table 3 and Fig. 12).In a fourth experiment we analyze the sensitivity of our results to the choice of the ENSO index and lag value.

Sensitivity to the downscaling method
In this study we have used the Lasso, but other traditional regression methods include stepwise regression or principle component regression (e.g.Wilby et al., 2002;Hessami et al., 2008).We test several variants: -S Pcor : after an iterative selection, all predictors have a partial correlation significant at the p = 0.01 value;  -L 8fold : same as the standard run (Lasso) but with the λ parameter selection based on 8-fold cross-validation (instead of 4-fold).
All methods have a lower skill than the reference run (Table 3), with an increase of the RMSE by about 20 % for S RMSE or S Pcor and up to 33 % for S PC .As shown by the correlation values and the time series in Fig. 12, the sensitivity of the results to the chosen method is marginal with respect to the MB 4750 variability (with the exception of the principle components regression which shows a different trend and smaller variability).The Lasso is only weakly sensitive to the method chosen to select the penalization parameter.Stepwise regression methods show a stronger sensitivity to the choice of the stopping rule, such as the significance of the partial correlation (not shown).

Sensitivity to the predictor choice
We run five experiments with another predictor set.Predictors were either removed (temperature, relative humidity or surface variables), changed (pressure levels) or added (with a lag of 1 month).Here again, all experiments result in lower downscaling skill but lead to similar conclusions.Surprisingly, omitting temperature has the smallest effect on the model skill and has only a relative impact on the correlation with SST.This means that large parts of the temperature signal can be found in the other predictors.This is not the case with relative humidity: omitting this predictor has the strongest negative impact on the prediction skill.Further predictor denial experiments lead to inefficient models and are not shown here.
The Lag 1 experiment is particularly instructive with respect to the skill of the downscaling procedure: doubling the number of predictors by adding the lagged ones results in a lower out of sample cross-validation skill by increasing the noise and the chance for Lasso to select false-positive predictors.This is more likely to occur with short calibration periods and might also be one of the reasons for the increase of RMSE of 15 % when changing the predictor pressure levels (hPa experiment).Indeed, it is possible that the MB 4750 variability is more related to the levels chosen for the reference run (350, 450, 550 and 650 hPa) than the new ones, but it is more likely that the hPa experiment increased the noise and made the job for the Lasso more difficult.

Sensitivity to the reanalysis choice
Several studies (e.g.Brands et al., 2012;Hofer et al., 2012Hofer et al., , 2015) ) have discussed the sensitivity of the downscaling results to the choice of the reanalysis products used for calibration.Here we test three additional data sets chosen for their historical significance (NCEP/NCAR R1) or for their relative novelty and sophistication (ERA-Interim, MERRA and CFSR)4 : -NCEP: NCEP/NCAR R1 reanalysis (Kalnay et al., 1996) belongs to the most widely used reanalysis data sets.It is of coarser resolution (2.5 • ) and is one of the oldest systems still operating to date.
-MERRA: Modern Era Retrospective-Analysis for Research and Applications reanalysis from the NASA (Rienecker et al., 2011) is of higher resolution (0.5 • ) and belongs to the so-called "third generation" of reanalysis products (including ERA-Interim and CFSR).
The sensitivity of the downscaling to the various reanalysis data sets is larger than to the other experiments (Table 3).The three most recent reanalyses have comparably higher skills than NCEP, and CFSR shows the highest skill overall (higher than the reference run).Unlike for the other experiments, the differences in skill are accompanied with differences in trends and correlations with Pacific SST.As shown in Fig. 12, the time series still display a strong covariability but disagree for certain years (e.g. 1985, 2010).The low correlation of NCEP with SST is attributed to a smaller variability and a lower accuracy, while the lower correlation of CFSR is quite unexpected.Overall, the most striking differences concern the trends of the time series, from negative for MERRA and CFSR to statistically insignificant for ERA and NCEP.Looking for the reasons of these disagreements is beyond the scope of this study, but we can learn from this analysis that if the ENSO → MB 4750 relationship is quite stable regardless of the method and data used, it is less the case for trends or for the predicted absolute MB 4750 .In Appendix A5, we use NCEP/NCAR R1 to analyze the relationship for the longer period 1950-2013.

Sensitivity to the ENSO index
In this study we used the Niño 3.4 index which is widely acknowledged as a good indicator for ENSO variations (e.g.Trenberth, 1997) and was also used by Vuille et al. (2008b) for their study in the Cordillera Blanca.However, other studies found that the Niño 1+2 index had a higher predictive skill for Chacaltaya Glacier's MB in the outer tropics (Francou et al., 2003;Rabatel et al., 2013, with a lag of 2 and 4 months, respectively).Recently, the Multivariate ENSO Index (MEI, Wolter and Timlin, 2011) was presented as alternative to the purely SST based indices such as Niño 3.4.In Table 4, we test if our results at 4750 m a.s.l. are sensitive to a change to the Niño 1+2 and MEI indices.It appears that Niño 3.4 and MEI both have very high correlations with MB 4750 , with a maximum at lag 2 and 3, respectively (which was to expect since both indexes are highly covariable during the study period).The Niño 1+2 however has a lower predictive skill, which maximizes at lag 2. Most of these differences, however, are not significant in comparison to the uncertainty estimates of our computed MB.

Summary and discussion
Based on four years of distributed SEB/SMB at Shallap glacier, we calibrated a statistical model linking each individual SEB/SMB flux to local atmospheric variables extracted from reanalysis data.We presented a new open-source tool developed for this purpose and applied it first to the ablation area and then to the entire glacier surface.The downscaled time series    3 for the description of the experiments).The period 2005-2009 is the calibration period and thus with the smallest spread.
to the decomposition of the SEB/SMB into individual fluxes (a summary of the SSTA ↔ SEB/SMB correlations is provided in Fig. 13).Niño (Niña) events imply an increase (decrease) of air temperature leading to a higher (lower) snowfall altitude and thus to an increase (decrease) of the net short wave radiation supply.This effect is enhanced by a further precipitation deficit (excess) during Niño (Niña) years.The influence of ENSO is therefore stronger at lower altitudes but it remains detectable at higher elevations through changes in total precipitation.We find a small influence of ENSO on the sensible heat flux but no significant influence on net longwave radiation or sublimation.
Our results are in accordance with our current understanding of the ENSO/glacier relationship in the Central and Tropical Andes (e.g.Arnaud et al., 2001;Favier et al., 2004;Francou et al., 2004;Vuille et al., 2008b;Veettil et al., 2014).However, we find a stronger SSTA → MB relationship than described in Vuille et al. (2008b) and cannot confirm their exceptional years (1983 and 1994).This discrepancy could be explained by the different methods used to retrieve the MB, but it is likely that the relationship is also modified by regional and altitudinal differences: Vuille et al. (2008b) analyzed the MB for the sum of several glacierized catchments of the western part of the Cordillera Blanca, while our results are valid for Shallap glacier only.If ENSO's influence on temperature is regionally stable in the Andes, its influence on precipitation is less known and highly variable.A recent study by Perry et al. (2014) found a Niño/wet signal in the Cordillera Vilcanota south of the Cordillera Blanca which, if confirmed, could counterbalance the albedo effect described here.A bit further south, Ronchail and Gallaire (2006) reported opposite ENSO effects within short distances, with a Niña/dry signal in the Zongo valley lowlands and a Niña/wet signal on the higher Altiplano.MB, future studies should focus on the atmospheric mechanisms of this relationship and assess its latitudinal and altitudinal stability.
A major source of uncertainty in our method is the short period available for calibration, preventing us to develop more sophisticated models (for example including distant predictors or seasonally dependant downscaling functions).Fortunately, the four years used here are dynamically variable and contain neutral and Niña periods, as well as a few months with SSTA above the Niño threshold.Our uncertainty estimates computed with cross-validation are robust, but they remain high and prevent more detailed analyses of individual events.In particular, the sensitivity analyses showed that if the MB variability is persistent between the experiments, the absolute values and trends can vary considerably.This is especially the case when changing the reanalysis products, an issue that should be kept in mind when carrying out long-term glacier modeling studies.
Nevertheless, DownGlacier proved to be a versatile and efficient tool to extend existing SEB/SMB series in time, provided that there are no persistence effects or heavy autocorrelation in the calibration time series.These conditions are met for monthly values in the tropics, were we expect DownGlacier to bring helpful insights on decadal to centennial glacier variability.For mid-latitude glaciers covered with seasonal snowpack it will be necessary to include nonlinear and persistent effects (for example by adding surface albedo parameterizations).The major obstacle to such enhancements is the lack of long and reliable SEB/SMB time series for calibration: here, combined statistical and dynamical approaches might help to complement the otherwise irreplaceable glacio-meteorological observations.

A1 Solving the SEB/SMB equations on monthly averages
On monthly averages, the ice surface is practically never at the melting point and the equality F = QM does not hold true.DownGlacier implements a simple test to assess this error: in a "perfect downscaling" experiment, the downscaled variables (bold in Eqs. 1 and 2) are set to their calibration values and the skill of the diagnostic variables is assessed using the usual statistical scores that show the error related to the averaging only.Figure A1 displays the RMSE σ of the perfect downscaling experiment for all altitude slices of the glacier along with monthly air temperature.For most parts of the glacier the error is close to 1 %, but reaches 14 % at the 5000 m a.s.l.altitude slice where the air temperature is closest to 0 • C. For conditions close to the melting point a substantial part of the energy residual F will not be converted to melt but will heat the ice.At colder temperatures, F will be close to 0 and less relevant.This error is small in comparison to the other uncertainties of the method (see Sect. 4) and is negligible at the altitude of 4750 m a.s.l.

A2 The Lasso
Extensive treatment of the Lasso method can be found in Tibshirani (1996) and in statistical textbooks (e.g.Hastie et al., 2009).Here we provide some elements about basic principles of the method.First, we recall that for a multiple linear regression problem with p predictors the objective is to find the parameters β 0 . ..β p such as where Y is the variable to predict (vector of n observations y 1 . ..y n ) and X 1 . ..X p are the predictor vectors (also of length n).The free parameters β 0 . ..β p are usually fitted by minimizing the residual sum of squares RSS: ) This method becomes unstable when the predictors are collinear (anti-correlated predictors will lead to very high parameter estimates) and is subject to over-fitting when p becomes large.The role of stepwise regression algorithms is to select meaningful predictors in order to keep p small and prevent these problems.The Lasso, in turn, can fit a model containing all original p predictors (thus generalizing the predictor selection problem) using a technique that constrains the 4800 5000 5200 5400 5600 5800 Altitude (m) where λ ≥ 0 is a penalization coefficient which has to be determined separately.This penalization has the advantage to prevent over-fitting by shrinking the coefficients and to force some of the coefficients to be equal to zero (predictor selection) when λ is large.λ is usually chosen among an ensemble of predefined values which are tested one by one: the value leading to the smallest cross-validation RMSE is selected5 .The advantage of Lasso over stepwise regression is shown by our sensitivity analyses (Sect.5) and is also illustrated in Fig. A2 (see next Appendix for details).The improvements over the other methods is not overwhelming in this case but Lasso proved to be much more stable (and fast) in the early exploration stages of this study, when we considered many different predictor combinations.With very large p, stepwise regression showed high variance and high sensitivity to the predictor set (low out-of-sample cross-validation scores) while Lasso remained robust.

A3 Cross-validation
The principle of cross-validation it to hide information to the statistical model by calibrating it with a smaller subset of the data and testing its predictions against the remaining (unseen) subset.DownGlacier realizes two automatic steps to  2).
choose the downscaling function f : selection (s) and calibration (c).In the case of Lasso, (s) consists of choosing the penalization parameter λ using in-sample cross-validation and (c) consists of fitting the penalized coefficients.In the case of stepwise regression, (s) consists of choosing a subset of the predictors and (c) consists of fitting the least-square coefficients.As discussed early by e.g.Elsner and Schmertmann (1994), it is crucial to evaluate both steps (s) and (c) in the cross-validation procedure.
The need for out-of-sample cross-validation is not always obvious (when model selection is based on partialcorrelation for example) even if all automated predictor selection methods should be cross-validated.The following way to select the predictors is more obvious: the predictors might be added and removed iteratively for their capacity to reduce the cross-validation RMSE.We provide an example of using this stepwise algorithm in Fig. A2, which displays the scores of three different validation steps: full model (selection s and fit f based on all available data), crossvalidation (selected only once based on all available data but fitted 48 times using cross-validation) and out-of-sample cross-validation (selected and fitted 48 times using crossvalidation).We see that the algorithm is able to reach "better" cross-validation scores than the Lasso.However, several of the predictors chosen by the algorithm are very likely to be added by chance rather than for their real predictive skill, as shown by the out-of-sample cross-validation scores.

A4 Interpretation of the downscaling functions
The number of predictors selected by the Lasso varies between 7 and 17 (Table 2), which is larger than the number of predictors we would obtain with stepwise regression algorithms.Indeed, the Lasso might choose a linear combi-nation of correlated predictors instead of a single predictor with less predictive skill, by shrinking less significant coefficients to values close to zero.Table A1 presents the six most important predictors and their coefficients (normalized in %) for each downscaled variable.Some of the functions allow a direct and meaningful interpretation: LW in for example is strongly related to relative humidity.It is also coherent that higher temperatures imply a more negative LW out .Similarly, the first two predictors of QS are wind components, and QL is controlled by relative humidity to a large extent.PRCP Total is a function of relative humidity and total cloud cover and is also inversely proportional to the zonal wind flow at 650 hPa, which is consistent with the assumption that most of the moisture in the Cordillera Blanca originates from the Amazon Basin.We should however not over-interpret these functions, as shown by some unexpected results (e.g.prcp sfc positively correlated to SW in ).Covariability (positive and negative) between predictors confuses the interpretation, and choosing another predictor set can produce very similar predictions despite of distinct downscaling functions (see Sect. 5.2).

A5 Relationship between SSTA and MB for 1951-2013
The NCEP/NCAR R1 products are available from 1948 onwards and the SSTA data from 1950 onwards, allowing us to analyze the SSTA → MB 4750 relationship for the period 1951-2013 (Fig. A5).In general, the downscaled MB 4750 does not correlate as well with NCEP/NCAR R1 as it does with other reanalysis products (Table 3).The relation is clear throughout the 60 years but with lower correlations values for the 1951-1980 period ("deterministic" r 2 , i.e. without RMSE and without detrending) The lower pre-1980 correlations are mostly due to several exceptionally low MB 4750 values during La Niña and Neutral events during that period.The reasons for these differences are speculative but some discrepancies are likely to be explained by deficiencies in the reanalysis products before the introduction of satellite data in 1979.This effect is particularly strong in the Southern Hemisphere, where observational data are sparse (e.g.Tennant, 2004).

Figure 2 .
Figure 2. The 5-month running average of Niño 3.4 sea surface temperature anomalies (SSTA, base period: 1981-2010) and Niño/Niña classification afterTrenberth (1997).The threshold values (−0.4 and 0.4 K) are indicated by black broken lines.The length of each period (in months) is indicated at the bottom.Note that here and throughout the paper the SSTAa time series have been shifted forward by 3 months to account for the lagged response of the atmospheric conditions at Shallap Glacier.

Figure 3 .
Figure 3.Time series of the reference data set (black), full downscaling model (dotted blue) and out-of-sample cross-validation (red) during the calibration period.Shown are the variables air temperature, solid precipitation, net shortwave radiation, and MB at 4750 m a.s.l.

Figure 4 .
Figure 4. Checking the physical consistency of the downscaled variables.Scatter plots of reference (2005-2009, red) and downscaled (1980-2014, blue) time series at 4750 m a.s.l.(a) and (b): sensible heat flux vs. wind-speed and air temperature.(c) and (d): latent heat flux vs. vapor pressure and wind-speed.(e): albedo vs. solid precipitation.(f) represents the scatter plot of the diagnostic mass balance (computed from the several downscaled variables) vs. the downscaled mass balance (1980-2014).

Figure 5 .
Figure 5.Time series of the computed monthly mass balance at 4750 m a.s.l.The grey shading represents ±RMSE (including the RMSE of both the downscaled and the reference data).The calibration period is outlined by the green vertical bars.

Figure 6 .
Figure6.Annual cycles of mass balance, air temperature, solid and total precipitation at 4750 m a.s.l. for each Niño (red) and Niña (blue) periods (note that some annual cycles are incomplete).The average of all neutral months is drawn in black (error range omitted for clarity).

Figure 7 .Figure 8 .
Figure 7. Annual cycles of the surface energy (top) and mass (bottom) fluxes at 4750 m a.s.l.Left: 1980-2014 average.Right: average difference between the Niño and Niña composites.The numbers of values for the Niño (red) and Niña (blue) composites are indicated at the bottom of the plots.Note the different y axis ranges and that none of these differences is significant in the statistical sense because of the large standard deviation between years combined with the small number of composites and the variables' RMSE.

Figure 9 .
Figure 9. Out-of-sample cross-validation scores for selected variables and for each 50 m altitude slice at Shallap Glacier.Left: RMSE expressed in % of the standard deviation σ .Right: Brier skill score (BSS).

Figure 10 .
Figure 10.Time series of the reference data set (black), full downscaling model (dotted, blue) and out-of-sample cross-validation (red) during the calibration period.Shown are the mass balance time series at the 4850, 5450 and 5700 m a.s.l.altitude slices and averaged over the whole glacier.

FFigure 11 .Figure 12 .
Figure 11.Same as Fig. 8 but for the glacier averaged MB.The shading represents ±2 RMSE.Note that this mass balance does not account for changing glacier geometry.

Figure 13 .
Figure 13.Glacier averaged contribution (x axis) and correlation (y axis) between annual Niño 3.4 SSTA and each SEB (left panel) and SMB (right panel) flux for 1980-2013.Note that the error bars are related to the uncertainty of the downscaling only (not of the calibration data) and that these results do not account for changing glacier geometry.

Figure A2 .
Figure A2.Box plots of the Brier skill score (BSS) of each validation step for the Lasso and the stepwise downscaling algorithms.Each box represents a population of 14 scores (one for each downscaled variable listed in Table2).

Table 1 .
Selected predictors from the monthly ERA-Interim fields.

Table 2 .
Variables statistics (monthly mean and standard deviation), number of selected predictors and out-of-sample cross-validation scores r 2 , RMSE, RMSE σ (expressed in % of the standard deviation σ ) and Brier skill score (BSS) for the downscaled variables and the diagnostic variable MB at the 4750 m a.s.l.altitude slice.The variables Temp (air temperature), VP (vapor pressure), WS (wind speed) and PRCP Total (total precipitation) are downscaled and listed here for information, but they are not used to calculate MB.

Table 3 .
Results of the sensitivity experiments for MB 4750 : skill scores RMSE (mm w.e month −1 ) and BSS, linear trend (m w.e yr −1 ) and correlation with Pacific SST Anomalies (detrended, without taking RMSE into account).Trends and correlation values adjoined with a * indicate significance at p < 0.01.

Table 4 .
Coefficient of determination (r 2 ) between the computed annual MB 4750 and various ENSO indices, for lags between 0 and 5 months.