Semi-automated calibration method for modelling of mountain permafrost evolution in Switzerland

Permafrost is a widespread phenomenon in mountainous regions of the world such as the European Alps. Many important topics such as the future evolution of permafrost related to climate change and the detection of permafrost related to potential natural hazards sites are of major concern to our society. Numerical permafrost models are the only tools which allow for the projection of the future evolution of permafrost. Due to the complexity of the processes involved and the heterogeneity of Alpine terrain, models must be carefully calibrated, and results should be compared with observations at the site (borehole) scale. However, for large-scale applications, a site-specific model calibration for a multitude of grid points would be very timeconsuming. To tackle this issue, this study presents a semiautomated calibration method using the Generalized Likelihood Uncertainty Estimation (GLUE) as implemented in a 1-D soil model (CoupModel) and applies it to six permafrost sites in the Swiss Alps. We show that this semiautomated calibration method is able to accurately reproduce the main thermal condition characteristics with some limitations at sites with unique conditions such as 3-D air or water circulation, which have to be calibrated manually. The calibration obtained was used for global and regional climate model (GCM/RCM)-based long-term climate projections under the A1B climate scenario (EU-ENSEMBLES project) specifically downscaled at each borehole site. The projection shows general permafrost degradation with thawing at 10 m, even partially reaching 20 m depth by the end of the century, but with different timing among the sites and with partly considerable uncertainties due to the spread of the applied climatic forcing.

The ensemble accounts for a comprehensive range of model uncertainty and is forced by the IPCC SRES 7 A1B emission scenario (Nakicenovic and Swart, 2000). Due to their limited spatial resolution, site-8 specific features are typically not resolved by climate models and even on resolved scales, models are  To obtain robust and reliable climate scenarios at the six considered sites, a newly implemented SD/BC 16 method was used that specifically targets locations that lack long-term data. A detailed description and 17 comprehensive validation of the approach is given by Rajczak et al. (2016). It is designed as a two-step 18 procedure sketched in Figure 1. In the first step, climate model simulations are downscaled to match long- 19 term observational measurements at a most representative site (MRS) within a surrounding measurement 20 network (e.g. MeteoSwiss weather stations). In the second step, the downscaled and bias-corrected time 21 series from the MRS are spatially transferred to the site of interest (e.g. a permafrost monitoring site). 22 Both steps rely on the quantile mapping (QM) method, a well-established statistical downscaling and bias 23 correction technique (e.g. Themessl, 2011). The concept behind QM is to correct the distribution of a 24 given predictor (e.g. climate model output) in such a way that it matches the distribution of a predictand 25 (e.g. observations of the same variable at a monitoring site). Values outside the range of calibrated values 26 are treated using the correction for the 1st (99th) quantile. Within this study, the spatial transfer is 27 performed from an objectively selected MRS within the MeteoSwiss monitoring network. Consequently, 28 Rajczak et al., (2016) show that the MRS is, in many cases, not the closest station but rather one at a 29 similar altitude.

Reconstruction of meteorological observations 1
The two-step procedure (Figure 1) additionally facilitates the reconstruction of data at the monitoring sites 2 for non-measured periods. The concept behind reconstructing data is to spatially transfer ( Figure 1, step 2) 3 observed values from an MRS to the target site. In the framework of the present study, data were 4 reconstructed for some periods between 1981 and 2013. Note, that reconstruction is constrained by the 5 availability of data at the MRS. An extensive validation of the reconstruction performance is given by  14 For each site, the reconstructed meteorological data consists of daily series for the period between 1981-15 2013 for five variables: mean air temperature, precipitation sum, mean wind speed, mean relative 16 humidity and global radiation. For the site MBP, the global radiation series could not be reconstructed 17 because of a lack of validation data and could therefore not be used as forcing variable in the calibration 18 for this site. Global radiation for MBP has therefore been estimated by CoupModel based on potential 19 global radiation (depending on latitude and declination) and atmospheric turbidity (Jansson 2012). 20 Independent comparison between measured, reconstructed and CoupModel estimated global radiation 21 values for COR showed an overestimation of global radiation by the CoupModel leading to near-surface 22 maximum temperature biases of up to 10°C in summer (cf. supplementary material). However, the 23 calibration technique applied (see sections 4 and 5) would compensate potential biases in the temperature 24 simulations by adjusting related parameters in the model, e.g. snow cover parameters or the albedo.
Despite the good quality of the reconstruction, some short gaps could not be avoided. These gaps have 1 been filled by artificial random selection of data from other years at the same date. This method is 2 satisfactory as the gaps are short and infrequent. 3 For seven of the chains, wind speed scenarios were not available. As the wind speed scenarios of all 4 available GCM-RCM chains are very similar, we consider it acceptable to use the median of these 5 scenarios as a substitute for the seven chains with missing wind speed scenarios. For calibration, we used series of borehole temperature data for each site with a minimum length of 10 8 years (Table 1). Borehole data is often considered as "ground truth data" but potential measurement errors 9 are possible due to several reasons (such as sensor or logger drift, logger failure and infiltration of water 10 inside the borehole casing, to name a few). Borehole data in mountainous terrain are also influenced by 3-   18 The model used for this study is the CoupModel, a 1-dimensional numerical model coupling soil, snow  The model couples the water and heat transfer of the soil using the general heat flow equation:

COUP model description & experimental set-up
where C (J K −1 ) is the heat capacity of soil, C w (J K −1 ) is the heat capacity of water, T (z,t) (K) is the soil 1 temperature, L f and L v (J kg −1 ) are the latent heat of freezing and vapor, ρ (kg m −3 ) is the density, Θ i (z,t) is 2 the volumetric ice content, k (W m −1 K −1 ) is the thermal conductivity, t is the time, z is the depth and q w 3 (z,t) and q v (z,t) (kg m −2 s −1 ) are the water and vapor fluxes.

4
The lower boundary condition is derived from the sine variation of the temperature at the soil surface and 5 a damping factor with depth. The maximum model depth is different for the various sites due to the 6 varying maximum depth of the available boreholes, but it is at least 30 meters for all sites and well below 7 the depth of zero annual amplitude (see Figure 3). The prescribed heat flux at the lower boundary 8 condition is therefore negligible. This enables comparatively stable conditions at the lower boundary, and 9 accounts for the often isothermal conditions found in Alpine permafrost at this depth (Scherler et al. 2013, 10 PERMOS 2013). However, the long-term variability of permafrost conditions at the lower boundary 11 cannot be simulated with this approach. The hydraulic boundary condition is given by gravity-driven 12 percolation if the lowest compartment is unsaturated..

13
The upper boundary condition is calculated using the complete energy balance at the soil surface (or snow 14 surface, if present). The convective heat inflow of water is given by precipitation and snow melt 15 multiplied by the surface temperature and the heat capacity of liquid water (C w ): where q h (0) (J m -2 d -1 ) is the soil surface heat flow, T s is the soil surface temperature, T 1 is the temperature 18 in the uppermost soil layer, ∆ is a parameter representing the temperature difference between air and 19 precipitation, q v (0) and q w (0)are the vapour and water fluxes at the surface and L v is the latent heat of 20 vapour. For periods with snow cover, the upper boundary condition is calculated assuming a steady state 21 heat flow between the soil and a homogeneous snow pack using the thermal conductivity of snow.

22
Temporally changing insulation conditions of the snow cover can be simulated by a critical snow height 23 that corresponds to the snow height that completely covers the soil. It depends mainly on the surface 24 roughness and reflects the fact that 50 cm of snow induces different insulation properties for a surface 25 consisting of 1-2 m high boulders (e.g. for a rock glacier, COR) compared to a rather homogenous surface 26 covered by sandy soil (e.g. at SCH, cf. also the discussion in Staub and Delaloye 2016). The fraction of 27 bare soil is then calculated by a ratio between 0 and this threshold (see Table 2) and further used to estimate the average soil surface temperature and surface albedo. This critical snow height is one of the 1 parameters having the largest influence in our calibration procedure.

2
Snow is simulated by partitioning precipitation into rain and snow depending on temperature threshold 3 parameters. The snow cover is assumed to be horizontally and vertically homogenous. Snow melt is 4 estimated as part of the heat balance of the snow pack, including net radiation, sensible and latent heat 5 flux to the atmosphere, heat flux in precipitation, snow temperature change and heat flux to the soil.

6
Further important processes in COUP are listed in Table 2 together with the respective equations. only negligible differences with respect to the procedure described above, which may also be due to low 14 ice contents in the bedrock layers at larger depths, which exert no large cooling effect to the surface from 15 below during thawing. However, this approach clearly neglects all long-term effects of past climatic 16 conditions on the ground thermal regime at larger depths. Therefore, simulation results at larger depths 17 should not be interpreted in a climate context.

12
In addition, a model set-up which is consistent with present day conditions may not be optimal for future 13 climatic conditions. This well-known problem is inherent to most long-term transient simulations with a outliers, which cannot be explained by site-specific conditions. However, it has to be noted that the aim of 18 the calibration procedure is not the determination of the parameter values (e.g. physical properties) 19 themselves, but to get a model that is thermally most representative for the ground thermal regime at a 20 given site. Keeping the above constraints in mind, for long-term simulations, where no observations are 21 available, it has to be assumed that the parameters governing the ground thermal regime do not change 22 significantly over the duration of the simulation. . Those four to six sensitive parameters were then used in a second iteration of 20'000 simulations to refine the calibration. It is important to note that the sensitive parameters may 1 differ from site to site depending on site-specific characteristics, although initial parameters and their 2 ranges were equivalent for all sites (see next section). From the 20'000 simulations of the second GLUE 3 calibration iteration, an optimal model set-up for each site was then selected based on statistical 4 performance indicators (r2 and the mean error, ME) for ground temperature at several depths. The 5 calibration procedure is summarized in Figure 4. In addition to useful information about site-specific 6 processes and their representation in the model (see next section), the calibration obtained by this method 7 led to the selection of a model set-up for the long-term simulations forced by the GCM/RCM data.

Calibration results
3 5.1 Relative importance metrics 4 The GLUE method was used to test a large number of parameters at each site and to statistically assess lasting freeze/thaw events a good correlation will always be difficult to achieve, but a reasonable good 19 match was achieved at least for the near-surface layers by optimising the correlation. In a future step, 20 other quantities such as the energy content of the ground (Jafarov et al. 2012) could be used for 21 calibration, but also variables directly related to the amount of water present (see section 7) can be used to 22 enhance the calibration.

23
The results of the calibration are shown in Figure 5

13
In comparison with r2, ME is less influenced by snow parameters, as snow cover is more important for very limited influence as it explains only 0.1 % of the r2 and 0 % of the ME at this depth. This is probably 20 related to the thick model layer with high porosity (cf Figure 3), where massive ice is permanently present 21 which decouples the lowest layers from processes at the upper boundary.

22
The albedo parameters have a significant influence on the calibration results at all sites, with relative 23 importance for the ME ranging from 17 % at SCH to 74 % at LAP, reflecting the calibration of the surface 24 temperature amplitudes. The r2 (reflecting the inter-seasonal variation) is less or not influenced by the 25 albedo. In some cases at greater depths (r2 and ME at 10 m at COR, 8 m at LAP), albedo appears to have 26 a high relative influence, sometimes higher than at the surface. This is most likely not related to realistic conditions to a larger extent than at MBP and RIT, which are also sites with coarse-grained material, but 12 with a smaller estimated porosity by the model (Figure 3).

13
Not surprisingly, the thermal conductivity plays a large role at depth where the relative importance (ME) 14 ranges from 34 % at SCH to 89 % at STO and even 100 % at LAP (an exception is COR with 11 %). At Among all sites, only RIT is insensitive to changes in the thermal conductivity (3 % of ME at 30 m depth).

25
On the other hand, evaporation has a strong influence on calibration even at larger depths (44%), which is 26 in strong contrast to all other sites, where this parameter shows only little influence (between 0 and 10 %).

27
When analyzing the specific values obtained for the different calibration parameters, the parameter related 28 to evaporation (water tension  eg , cf. Table 2) did not show specifically high or low values for RIT, but 29 the parameterized values for T snow (minimum temperature at which precipitation falls only as snow) and S crit (critical snow depth, at which the whole surface is considered to be covered by snow) were very low 1 (T snow = -4.86°C) and high (S crit = 1.9 m), respectively. Whereas the former leads to comparatively large 2 precipitation input as rain, the latter leads to an almost never completely snow-covered surface. In 3 addition, the wet soil albedo for RIT is calibrated with the lowest value of all sites ( wet = 7.0) whereas its 4 dry albedo is comparatively high ( dry = 34.6). In total, this parameter combination enables additional 5 energy input by liquid water into the subsurface, which of course also explains the high sensitivity to (1.6 to 3.6 m depth). This points to an imprecise initial soil structure set-up that the model needed to 16 correct, in this case the thickness of the surface blocky layer with high porosity 17 When considering the absolute importance (% in Figure 5, left), we notice that deep boreholes (COR, RIT 18 and STO) have low percentages, which is not surprising as the temperatures at those depths vary on much 19 longer time-scales and depend primarily on the structural set-up of the model. As their future evolution is 20 influenced by past climates, which are not included in the present study, simulated temperature changes at 21 large depths will not be discussed within this study. However, their correct representation for present day 22 climate is important as lower boundary condition for shallower levels. Contrary to these deep levels, the 23 surface at all sites shows the highest sensitivity to the tested parameters mainly due, as explained above, to 24 the high importance of snow and albedo parameters.

25
After the LGM analysis, the most sensitive parameters for each site were identified to be used in the 26 second iteration of the GLUE calibration procedure (cf. Figure 4) to refine the calibration. The parameters 27 listed in Figure 5 (right), are the four to six most important parameters in the variation of statistical 28 indicators; their relative importance for the variation of r2 and ME at three different depths is represented by the pie charts. One parameter that shows high sensitivity is ΔS crit which allows the model to correct for preferred over a globally best r2 or ME averaged over all depths because the latter would put the weight 15 equally to all depths, whereas the surface is more important (and more accurate) regarding decadal 16 changes. Figure 6 shows the performance of the calibration at each site for three different depths, indicating the 18 obtained value of r2 and ME. It has to be pointed out that a low r2 or high ME value does not mean that a 19 better result at a certain depth cannot be obtained by GLUE, because the selection process is a 20 compromise between r2 and ME at several depths. Most calibration runs produce either well calibrated 21 temperatures near the surface or at greater depths, but not both for the same set of calibration parameter. 14 Seasonal variations at this depth are only reproduced correctly at MBP (low ME and high r2) and, to a 15 certain extent, at COR and SCH (cf. Figure 6). At SCH a warm bias is introduced in the model at 3 m  temperature (see Figure 2) and precipitation are summarized in Table 3.  Once its ice has permanently melted, the 10 m layer is subject to significant seasonal variations (see COR, 20 RIT and STO). SCH is not as much affected by the seasonal variations although the layer is projected to 21 be unfrozen early in the century because of a smaller decrease in snow cover duration in comparison with 22 other sites. In addition, its permafrost degradation is less pronounced than projected in Scherler et al.

23
(2013). This is most probably due to the cold bias introduced during the calibration and to a slightly As mentioned above, the snow cover duration is one key element for the evolution of the ground thermal 3 regime. Its evolution in the future is expected to be mostly influenced by changes in air temperature: the 4 changes in the annual sum of precipitation are highly uncertain and do not generally exceed ±5 % in the 5 GCM/RCM output (see Table 3; but with high variability among the chains), while the simulated mean 6 change in snow cover duration ranges from -20 % (SCH) to -37 % (LAP). Figure 8 shows the relationship The GLUE calibration method is not meant to determine the physical value of a parameter. The model is 17 physically-based regarding its underlying equations, but has to rely on parameterisations for many of the 18 complex processes in the subsurface and at the soil-snow-atmosphere boundary. The values for all model 19 parameters at all depths cannot be known exactly, especially as almost no direct measurements of these 20 properties are available. The GLUE method enables to find the value which gives the best fit with 21 observations within the number of tested runs. But as the system is complex, with sometimes highly

17
One challenge of the calibration with GLUE is that there are many parameters to calibrate which are often 18 underdetermined with respect to the available data. Therefore, the optimum is sometimes poorly-defined 19 especially for sites that include processes like 2-d air circulation which is not taken into account in the  Other processes that were not taken into account in the model concern the snow redistribution by    course always highly uncertain, but it has to be noted that the best results of the GLUE procedure were not 10 obtained with the highest porosities for the deeper layers: during the selection process, the consideration 11 of the r2 tended towards high porosities, but the best performances were obtained with lower porosities 12 when considering the ME (cf. Figure 3).

24
The present paper tested a semi-automated method for a soil/permafrost model calibration, in order to be 10  The calibration of upper boundary parameters, especially parameters related to snow cover, was 11 shown to have a large influence on the calibration performance, also on deeper ground layers.

12
Therefore, efforts to obtain a precise upper boundary calibration must be undertaken, especially by study focuses more on the site-specific processes understanding, while the long-term simulation results will not necessarily be better than results from simpler approaches as in the above cited 1 studies. But we believe that the considerably higher efforts of our approach are well justified by 2 the knowledge gained regarding the effect of the dominant processes at the different sites. Of 3 course, future work has to be directed into including the already identified missing processes into 4 the model formulation (i.e. convection).

6
We believe that the method presented here can be used as a starting point for large-scale modelling of 7 the permafrost distribution in the Alps provided that an increased number of sites with high quality 8 data series of observed ground temperature become available. A distributed model could be derived 9 from the numerous calibrated sites by interpolation, in combination with digital elevation models, 10 remote sensing data, GST measurements and subsurface data from geophysical surveys.        18 Figure 10 -difference in simulated 10 m temperature for the long-term simulation between the reference 19 run for SCH (Figure 7) and the improved calibration of Fig. 9 (c,d).   and the mean error, ME) for ground temperature at several depths. Among those four to six set-ups, the 7 median (regarding the evolution of active layer thickness) is eventually used for long-term simulations. LGM relative importance of six groups of parameters (snow, albedo, hydraulic 2 conductivity, saturation, thermal conductivity and evaporation) on the r2 (left) and the ME (right) at three 3 different depths. The percentage indicates the total LGM absolute importance. Right panel: LGM relative 4 importance of the most sensitive parameters that were selected for the second step of the calibration procedure.     Fig. 9 (c,d).