How accurate are estimates of glacier ice thickness? Results from ITMIX, the Ice Thickness Models Intercomparison eXperiment

. Knowledge of the ice thickness distribution of glaciers and ice caps is an important prerequisite for many glaciological and hydrological investigations. A wealth of approaches has recently been presented for inferring ice thickness from characteristics of the surface. With the Ice Thickness Models Intercomparison eXperiment (ITMIX) we performed the ﬁrst coordinated assessment quantifying individual model performance. A set of 17 different models showed that individual ice thickness estimates can differ con-siderably – locally by a spread comparable to the observed thickness. Averaging the results of multiple models, however, signiﬁcantly improved the results: on average over the 21 considered test cases, comparison against direct ice thickness measurements revealed deviations on the order of 10 ± 24 % of the mean ice thickness (1 σ estimate). Models relying on multiple data sets – such as surface ice velocity ﬁelds, surface mass balance, or rates of ice thickness change – showed high sensitivity to input data quality. Together with the require-ment of being able to handle large regions in an automated fashion, the capacity of better accounting for uncertainties in the input data will be a key for an improved next generation of ice thickness estimation approaches.

Abstract. Knowledge of the ice thickness distribution of glaciers and ice caps is an important prerequisite for many glaciological and hydrological investigations. A wealth of approaches has recently been presented for inferring ice thickness from characteristics of the surface. With the Ice Thickness Models Intercomparison eXperiment (ITMIX) we performed the first coordinated assessment quantifying individual model performance. A set of 17 different models showed that individual ice thickness estimates can differ considerably -locally by a spread comparable to the observed thickness. Averaging the results of multiple models, however, significantly improved the results: on average over the 21 considered test cases, comparison against direct ice thickness measurements revealed deviations on the order of 10 ± 24 % of the mean ice thickness (1σ estimate). Models relying on multiple data sets -such as surface ice velocity fields, surface mass balance, or rates of ice thickness change -showed high sensitivity to input data quality. Together with the requirement of being able to handle large regions in an automated fashion, the capacity of better accounting for uncertainties in the input data will be a key for an improved next generation of ice thickness estimation approaches.

Introduction
The ice thickness distribution of a glacier, ice cap, or ice sheet is a fundamental parameter for many glaciological applications. It determines the total volume of the ice body, which is crucial to quantify water availability or sea-level change, and provides the link between surface and subglacial topography, which is a prerequisite for ice flow modelling studies. Despite this importance, knowledge about the ice thickness of glaciers and ice caps around the globe is limited -a fact linked mainly to the difficulties in measuring ice thickness directly. To overcome this problem, a number of methods have been developed to infer the total volume and/or the ice thickness distribution of ice masses from characteristics of the surface.
Amongst the simplest methods, so-called "scaling approaches" are the most popular (see Bahr et al., 2015, for a recent review). These approaches explore relationships between the area and the volume of a glacier (e.g. Chen and Ohmura, 1990;Bahr et al., 1997), partially including other characteristics such as glacier length or surface slope (e.g. Lüthi, 2009;Radić and Hock, 2011;Grinsted, 2013). Such approaches, however, yield estimates of the mean ice thickness and total volume of a glacier only.
Methods that yield distributed information about the ice thickness generally rely on theoretical considerations. Nye (1952), for example, noted that for the case of an idealized glacier of infinite width, ice thickness can be calculated from the surface slope using estimates of basal shear stress and assuming perfect plastic behaviour. Nye (1965) successively extended the considerations to valley glaciers of idealized shapes, whilst Li et al. (2012) additionally accounted for the effect of side drag from the glacier margins. Common to these three approaches is the assumption of a constant and known basal shear stress. Haeberli and Hoelzle (1995) were the first suggesting that the latter can be estimated from the glacier elevation range, and the corresponding parametrization has been used in a series of recent studies (e.g. Paul and Linsbauer, 2011;Linsbauer et al., 2012;Frey et al., 2014).
Early approaches that take into account mass conservation and ice flow dynamics go back to Budd and Allison (1975) and Rasmussen (1988), whose ideas were further developed by Fastook et al. (1995) and Farinotti et al. (2009). The latter approach was successively extended by Huss and Farinotti (2012), who presented the first globally complete estimate for the ice thickness distribution of individual glaciers. Alternative methods based on more rigorous inverse modelling, in contrast, have often focused on additionally inferring basal slipperiness together with bedrock topography (e.g. Gudmundsson et al., 2001;Thorsteinsson et al., 2003;Raymond-Pralong and Gudmundsson, 2011;Mosbeux et al., 2016).
In the recent past, the number of methods aiming at estimating the ice thickness distribution from characteristics of the surface has increased at a rapid pace. Methods have been presented that include additional data such as surface velocities and mass balance (e.g. Morlighem et al., 2011;McNabb et al., 2012;Clarke et al., 2013;Farinotti et al., 2013;Huss and Farinotti, 2014;Gantayat et al., 2014;Brinkerhoff et al., 2016), as well as approaches that make iterative use of more complex forward models of ice flow (e.g. van Pelt et al., 2013;Michel et al., 2013Michel et al., , 2014 or non-physical methods based on neural network approaches (Clarke et al., 2009;Haq et al., 2014). This development has led to a situation in which a wealth of approaches is potentially available, but no assessment comparing the relative strengths and weaknesses of the models exists.
Against this background, the working group on glacier ice thickness estimation, hosted by the International Association of Cryospheric Sciences (www.cryosphericsciences. org), launched the Ice Thickness Models Intercomparison eXperiment (ITMIX). The experiment aimed at conducting a coordinated comparison between models capable of estimating the ice thickness distribution of glaciers and ice caps from surface characteristics. Emphasis was put on evaluating the model performance when no a priori information on actual ice thickness is provided. This was to focus on the most widespread application of such models, that is, the estimation of the ice thickness of an unmeasured glacier.
This article presents both the experimental set-up of ITMIX and the results of the intercomparison. The accuracy of individual approaches is assessed in a unified manner, and the strengths and shortcomings of individual models are highlighted. By doing so, ITMIX not only provides quantitative constraints on the accuracies that can be expected from individual models but also aims at setting the basis for devel-The Cryosphere, 11, 2017 www.the-cryosphere.net/11/949/2017/ Table 1. Overview of the test cases considered in ITMIX. Glacier type follows the GLIMS classification guidance (Rau et al., 2005). "Calv" indicates whether the glacier or ice cap is affected by calving (x) or not (-). The following abbreviations are used: glacier area (A); simple basin (SB); compound basin (CB); mountain (mtn.); glacier outline (OL); digital elevation model of the glacier surface (DEM); surface mass balance (SMB); surface ice flow velocity (Vel.); rate of ice thickness change (∂h/∂t); ice thickness measurements (H); unpublished data (Unpub.). References to the data are given. oping a new generation of improved ice thickness estimation approaches.
2 Experimental set-up ITMIX was conducted as an open experiment, with a call for participation posted on the email distribution list "Cryolist" (http://cryolist.org/) on 13 October 2015. Individual researchers known to have developed a method for estimating glacier ice thickness were invited personally. Upon registration, participants were granted access to the input data necessary for the experiment and the corresponding set of instructions.
The input data referred to the surface characteristics of a predefined set of 21 test cases (see next section, Table 1, and Fig. 1) and participants were asked to use these data for generating an estimate of the corresponding ice thickness distribution. Results were collected and compared to direct ice thickness measurements.
No prior information about ice thickness was provided, and the participants were asked not to make use of published ice thickness measurements referring to the considered test cases for model calibration. This was to mimic the general case in which the ice thickness distribution for unmeasured glaciers has to be estimated. The compliance to the above rule relied on honesty.
www.the-cryosphere.net/11/949/2017/ The Cryosphere, 11, 949-970, 2017 Participants were asked to treat as many test cases as possible and to consider data availability (see next section and Table 1) as the only factor limiting the number of addressed cases. Details on the considered test cases and the participating models are given in Sects. 3 and 4 respectively. An overview of the solutions submitted to the experiment is given in Table 2.

Considered test cases and data
The considered test cases included 15 glaciers and 3 ice caps for which direct ice thickness measurements are available and 3 synthetically generated glaciers virtually "grown" over known bedrock topographies (more detailed information below). The real-world test cases (see Fig. 1 for geographical distribution) were chosen to reflect different glacier morphologies (see Table 1) and different climatic regions, whilst the synthetic test cases were included to have a set of experiments for which all necessary information is perfectly known. Since most published approaches for estimating ice thickness were developed for applications on mountain glaciers and smaller ice caps, ice sheets were not included in the experiment.
For each test case, the input data provided to the ITMIX participants included at a minimum (a) an outline of the glacier or ice cap and (b) a gridded digital elevation model (DEM) of the ice surface. Further information was provided on a case-by-case basis depending on data availability, including the spatial distribution of the (i) surface mass balance (SMB), (ii) rate of ice thickness change (∂h/∂t), and (iii) surface flow velocity. An overview of the data available for individual test cases and the corresponding data sources is given in Tables 2 and 1 respectively. For the real-world test cases, and whenever possible, temporal consistency was ensured between individual data sets. Glacier outlines and DEMs were snapshots for a given point in time, whereas SMB, ∂h/∂t, and velocity fields generally referred to multi-year averages for an epoch as close as possible to the corresponding DEM. Glacier-wide estimates of surface velocities were not available for any of the considered cases. For obtaining a possibly complete coverage, velocities from separate sources were therefore merged, which often led to discontinuities along the tile margins.
Ice thickness measurements were only used for quantifying model performance but were not distributed to the ITMIX participants. Bedrock elevations were obtained by subtracting observed ice thicknesses from surface elevations, and the bedrock was assumed to remain unchanged over time. The time periods the individual data sets are referring to are given in Supplement Table S1. Note that no specific information about the uncertainties associated to individual measurements were available. Reported uncertainties for ice thickness measurements, however, are typically below 5 % (Plewes and Hubbard, 2001). The synthetic test cases were generated by "growing" ice masses over known bedrock topographies with the Elmer/Ice ice flow model (Gagliardini et al., 2013). To do so, selected deglacierized areas located in the European Alps were extracted from local high-resolution DEMs (product DHM25 by the Swiss Federal Office of Topography) and the flow model forced with a prescribed SMB field. The SMB field was either generated by prescribing an equilibrium-line altitude and two separate SMB elevation gradients for the accumulation and ablation zone (test cases "Synthetic1" and "Synthetic2") or by constructing the field through a multiple linear regression between SMB and terrain elevation, slope, aspect, curvature, and local position (test case "Synthetic3"). In the latter case, the individual regression parameters were defined arbitrarily but such to ensure a plausible range for the resulting SMB field. The Elmer/Ice simulations were stopped after the formation of a glacier judged to be of suitable size and shape, and the corresponding ∂h/∂t and surface velocity fields were extracted. No sliding at the glacier base was assumed, and all three resulting geometries were close to steady state. Note that, to avoid numerical instabilities, the DEM used for prescribing the bedrock topography had to be smoothed significantly. For anonymizing the individual locations, the original coordinates were removed, and the individual tiles arbitrarily rotated and shifted in elevation.
All data provided as input to the ITMIX participants, as well as the results submitted by individual models, will be provided as an electronic Supplement to this article. The direct ice thickness measurements were additionally included in the Glacier Thickness Database (GlaThiDa) version 2 (WGMS, 2016).

Participating models
The ITMIX call for participation was answered by 13 research groups having access to 15 different models in total. Two modelling approaches were used twice, with two and includes (1) minimization approaches, (2) mass conserving approaches, (3) shear-stress-based approaches, (4) velocity-based approaches, and (5) other approaches. Abbreviations: glacier outline and digital elevation model of the surface (OL+DEM); surface mass balance (SMB); surface ice flow velocity (Vel.); rate of ice thickness change (∂h/∂t). For "Vel.", a distributed field of flow speeds (s) and flow directions (d) or individual point measurements (p) were provided. "x" (".") indicates that the given information was (not) provided/used.
In the columns "Data used", "(x)" indicates that the information was used when available, but that it is not strictly necessary for model application. References for the data source are given in Table 1. TOTAL models independent implementations stemming from two different groups, nine models were published prior to the call, one model consisted of a modification of an existing approach, and five models were previously unpublished. In total, thus, 17 different models submitted individual solutions ( Table 2). The 17 approaches providing individual solutions can be classified into five different categories: (1) approaches casting ice thickness inversion as a minimization problem (minimization approaches), (2) approaches based on mass conservation (mass conserving approaches), (3) approaches based on a parametrization of basal shear stress (shear-stressbased approaches), (4) approaches based on observed surface velocities (velocity-based approaches), and (5) other approaches. The principle of each of the five categories is briefly described hereafter. A more detailed description, including information about parameter choices, is found in the Supplement (Supplement Sect. S1). The supplementary description is exhaustive for unpublished approaches and is held at a minimum for published ones.

Minimization approaches
Methods within this category formulate the problem of ice thickness inversion as a minimization problem. They do so by defining a cost function that penalizes the difference between a modelled and an observable quantity. Typically, the observable quantity includes the elevation of the glacier surface (e.g. Leclercq et al., 2012;Michel et al., 2013;van Pelt et al., 2013), which can be obtained from a surface DEM. Given an initial guess for the subglacial bedrock topography, a forward model for glacier ice flow is then used to predict the observable quantity. The difference between model and observation is subsequently used to update the model, and the procedure is repeated iteratively to minimize the cost function. The forward model can be of any type, generally considers mass conservation (see next section), and often relies on a higher-order representation of ice dynamics. Three models of this category participated in ITMIX: -"Brinkerhoff-v2" (Brinkerhoff, unpublished; see Supplement Sect. S1.2 for details) includes three terms in the cost function. The first term quantifies the difference between modelled and observed surface elevations; the second penalizes strong spatial variations in bedrock elevations; and the third is used to impose zero ice thickness outside the glacier boundaries. If available, surface flow velocities are used to additionally invert for the basal traction field. The forward model is based on the Blatter-Pattyn approximation to the Stokes equations (Pattyn, 2003).
-"VanPeltLeclercq" (adapted from van Pelt et al., 2013; see Supplement Sect. S1.17) has a cost function based on the difference of modelled and observed surface elevation as well. In contrast to , which evaluates the cost function for steady-state surfaces, this approach allows for transient surface geometries to be taken into account. If available, the mismatch between calculated and observed surface velocities is used for both stopping the iteration procedure and for optimizing the model parameters affecting basal sliding and deformational flow. The implemented forward model "SIADYN" is part of the ICEDYN package (Sect. 3.3 in Reerink et al., 2010), relies on the vertically integrated shallow ice approximation (e.g. Hutter, 1983), and includes Weertman-type sliding (Huybrechts, 1991).
-"Fuerst" (Fürst et al., unpublished; Supplement Sect. S1.4) differs from the two above approaches in that the cost function is not linked to surface elevations. Instead, the function penalizes (i) negative thickness values, (ii) the mismatch between modelled and observed surface velocities, (iii) the mismatch between modelled and observed SMB, and (iv) strong spatial variations in ice thickness or surface velocities. The forward model is based on Elmer/Ice (Gagliardini et al., 2013) and the mass conservation approach of Morlighem et al. (2011).

Mass conserving approaches
Methods appertaining to this category are based on the principle of mass conservation. If ice is treated as an incompressible medium, the corresponding continuum equation states that the ice flux divergence ∇ · q has to be compensated by the rate of ice thickness change ∂h ∂t and the climatic mass balanceḃ: The methods of this category estimate the distribution of both ∂h ∂t andḃ and use that estimate to quantify the glacier's mass turnover along the glacier. The mass flux is then converted into ice thickness by prescribing some constitutive relation. Most often, an integrated form of Glen's flow law (Glen, 1955) is used. The corresponding equation, solved for ice thickness, is then generally formulated as where h is glacier ice thickness, q the mean specific ice volume flux, A the flow rate factor, n Glen's flow law exponent, ρ the ice density, g the gravitational acceleration, α the surface slope, and f a factor accounting for valley shape, basal sliding, and parameter uncertainty. To avoid infinite h for α tending to zero, a minimal surface slope is often imposed, or α is averaged over a given distance. Based on theoretical considerations (Kamb and Echelmeyer, 1986), this distance should correspond to 10-20 times the ice thickness. In most cases, the ice thickness is first inferred along prescribed ice The Cryosphere, 11, 949-970, 2017 www.the-cryosphere.net/11/949/2017/ flow lines and then distributed across the glacier or ice cap by choosing a suitable interpolation scheme. Five of the models participating in ITMIX belong to this category.
(2) for manually digitized "ice flow catchments" and along manually predefined ice flow lines. The ice volume flux across individual cross sections is estimated by integrating the SMB field of the corresponding upstream area. The method was the first suggesting that the necessity of a steady-state assumption can be circumvented when directly estimating the differenceḃ − ∂h/∂t rather than imposing constraints on the two terms separately. Many of the approaches within this and other categories have adopted this idea. , however, it fully automatizes the method, thus making it applicable at larger scales. Automatization is achieved by generating multiple flow lines according to the methods presented in Kienholz et al. (2014). The major difference between Maussion/OGGM and the approaches Farinotti or Huss is that SMB is not prescribed as a linear function of elevation but with a temperature-index model driven by gridded climate data (Marzeion et al., 2012).
(2) to account for additional factors such as basal sliding, longitudinal variations in the valley shape factor, frontal ablation, and the influence of ice temperature and the climatic regime. The latter is achieved by imposing site-specific parameters. A major difference compared to other models in this category is that all calculations are performed on elevation bands. Mean elevation-band thickness is then extrapolated to a spatially distributed field by considering local surface slope and the distance from the glacier margin. The approach was the first ice thickness model that was applied to the global scale.
-"GCbedstress" (Clarke et al., 2013; see Supplement Sec. S1.7) shares many conceptual features with Farinotti et al. (2009) as well, but it differs in its implementation. Manually delineated flow sheds are transversely dissected by ladder-like "rungs" representing flux gates. Ice flow discharges -derived from integration of the mass contribution from the upstream areaare then applied to intervening cells by interpolation. "Raw" ice thickness is derived from Eq.
(2) and the final ice thickness is smoothed by minimizing a cost function that negotiates a tradeoff between accepting the raw estimates or maximizing the smoothness of the solution.
-"Morlighem" (Morlighem et al., 2011;Supplement Sect. S1.13) was originally designed to fill gaps between ground-penetrating radar measurements over ice sheets. As such, it was cast as an optimization problem minimizing the misfit between observed and modelled thicknesses. Since no such measurements were provided within ITMIX, the method was applied without the minimization scheme. The method is thus purely based on mass conservation. Ice thickness is computed by requiring the ice flux divergence to be balanced by the rate of thickness change and the net surface and basal mass balances (see Eq. 1). For the test cases for which no ice velocities were provided, the shallow ice approximation (see below) was used together with an assumption of no-sliding to convert the computed ice mass flux into ice thickness.

Shear-stress-based approaches
Methods of this category rely on the shallow ice approximation (e.g Fowler and Larson, 1978). In the latter, the relation is assumed to hold true everywhere, from which it follows that knowledge of the basal shear stress, τ , allows for the ice thickness to be determined. Most existing approaches estimate τ from the empirical relation proposed by Haeberli and Hoelzle (1995), which relates τ to the elevation range of a glacier. The denominator of the right-hand side of the equation often includes an additional factor f , with a similar meaning as described for Eq.
(2). The models of this category mostly differ from the ones in the last section in that they do not account for mass conservation. Three such approaches participated in ITMIX: -"Linsbauer" (Linsbauer et al., 2009, 2012 see Sect, S1.10) was the first proposing to use the empirical relation by Haeberli and Hoelzle (1995) to solve Eq. (3). This is done by considering manually digitized branch lines, and determining α within 50 m elevation bins. An ice thickness distribution is then obtained by interpolating the so-obtained ice thickness along several branch lines.
-"Machguth" (Frey et al., 2014, GlabTop2; see Supplement Sect. S1.11) is based on the same concept but overcomes the need of manually drawing branch lines by applying the relation at randomly selected grid cells. During this process, α is determined from the average slope of all grid cells within a predefined elevation buffer. The final ice thickness distribution is derived from interpolation of the randomly selected points and the condition of zero ice thickness at the glacier margin. The procedure by which the random points are selected has an influence on the shape of the obtained bedrock topography.

Velocity-based approaches
As for models in Sect. 4.2, models in this category are based on an integrated form of Glen's flow law (Glen, 1955). Differently as in Eq.
(2), however, the flow law is either expressed as where u s and u b are the surface and basal ice flow velocities respectively, or such to replace q in Eq.
(2) with the depthaveraged profile velocity u (since q = u h). An assumption relating u s to u b or u is then made, which usually implies postulating the existence of some coefficient k or k for which u b = ku s or u = k u s holds true everywhere. Four models participating in ITMIX follow this strategy: -"Gantayat" (Gantayat et al., 2014; see Supplement Sect. S1.5) solves Eq. (4) in elevation bands, and by substituting τ according to Eq. (3). The central assumption is that u b = 0.25 u s . A final, gridded ice thickness distribution is then obtained by smoothing the elevation-band thickness with a 3 × 3 kernel.
-"RAAJgantayat" (re-implemented from Gantayat et al., 2014; see Supplement Sect. S1.14) follows exactly the same procedure. In fact, the method is an independent re-implementation of the Gantayat approach.
-"Gantayat-v2" (adapted from Gantayat et al., 2014; Supplement Sect. S1.6) closely follows the original approach by Gantayat et al. (2014). Instead of solving Eq. (4) for elevation bands, however, the equation is solved for discrete points along manually digitized branch lines. Interpolation between various branch lines is then used to obtain an ice thickness distribution. Note that none of the approaches based on the ideas by Gantayat et al. (2014) account for mass conservation.
-"Rabatel" (Rabatel et al., unpublished; see Supplement Sect. S1.16) is based on the knowledge of surface velocities as well but includes some elements of the mass conserving approaches. Basically, the ice thickness along individual glacier cross sections is calculated by assuming that u = 0.8 u s and by determining the ice volume flux for a given cross section from an estimate of the mass flux from the upstream area. Combining this information allows for the area of a given cross section to be computed, and the spatial distribution of u s along the cross section is used to determine the local ice thickness. The final ice thickness distribution is obtained by interpolation of various cross sections.

Other approaches
This last category includes two additional approaches that cannot be classified in any of the categories above: -"GCneuralnet" (Clarke et al., 2009; see Supplement Sect. S1.8) is based on artificial neural networks (ANNs) and thus neglects any kind of glacier physics. The basic assumption is that the bedrock topography underneath glacierized areas closely resembles nearby ice-free landscapes. In principle, the method uses an elevation-dependent azimuthal stencil to "paste" icefree landscape sections into glacierized parts of a given region.
-"Brinkerhoff" (Brinkerhoff et al., 2016; see Supplement Sect. S1.1) poses the problem of finding bedrock elevations in the context of Bayesian inference. The main hypothesis is that both bed elevations and ice flux divergence can be modelled as Gaussian random fields with assumed covariance but unknown mean. Depthaveraged velocities are found by solving the continuity equation (Eq. 1) and by prescribing normally distributed likelihoods with known covariance around the available velocity, SMB, and ∂h/∂t data. A Metropolis-Hastings algorithm (Hastings, 1970) is then used to generate samples from the posterior distribution of bed elevations.

Results and discussion
In total, 189 different solutions were submitted to ITMIX (Table 2). Three models (Farinotti, Huss, Linsbauer) were able to handle all 21 test cases, one model handled 20 cases (Machguth), and one model handled 19 cases (Maussion). Data availability was the main factor hindering the consideration of additional test cases. This is particularly true for the approaches (a) Brinkerhoff, Brinkerhoff-v2, Morlighem, and VanPeltLeclercq, requiring SMB at least; (b) Gantayat, Gantayat-v2, and RAAJgantayat, requiring surface velocity fields; (c) Fuerst, requiring SMB, ∂h/∂t, and velocity fields simultaneously; and (d) GCneuralnet, requiring surrounding ice-free terrain for algorithm training. For the approaches GCbedstress, RAAJglabtop2, and Rabatel, the time required for model set-up was a deterrent for considering additional test cases.

Between-model comparison
Locally, the solutions provided by the different models can differ considerably. As an example, Fig. 2 provides an overview of the solutions generated for the test case "Unteraar" (the real-world case considered by the largest number of models). The large differences between the solutions are particularly evident when comparing the average composite ice thickness (i.e. the distribution obtained when averaging The Cryosphere, 11, 949-970, 2017 www.the-cryosphere.net/11/949/2017/ all solutions grid cell by grid cell; Fig. 2a) with the local ensemble spread (i.e. the spread between all solutions at a given grid cell; Fig. 2b). Often, the local spread is larger than the local average. This observation holds true for most of the other test cases as well (not shown). Figure 2c and d provide insights into the composition of the ensemble spread by presenting the composites of the minimum and maximum provided thicknesses respectively. The models providing the most extreme solutions are depicted in Fig. 2e and f. In the Unteraar example, the approaches GCneuralnet and Fuerst tend to provide the smallest and largest local ice thickness of the ensemble respectively. For the specific case, closer inspection shows that the very low ice thicknesses estimated by GCneuralnet are associated with the debris-covered parts of the glacier and with the steep slopes delimiting these parts in particular. This is an artefact introduced by the specific set-up of the stencil used within the ANN. In fact, Clarke et al. (2009) found that including steep ice in the definition of valley walls can be advantageous for ANN training. An unforeseen consequence is that steep ice walls close to debris-covered glacier ice are inter-preted as valley walls as well, thus causing the surrounding ice thickness to be too thin. Flagging debris-covered glacier parts and treating them as a special case could be an option for alleviating this issue. For Fuerst, large ice thicknesses (locally exceeding 900 m) mostly occur in the accumulation area. This is the area for which no measured ice flow velocities were available, thus precluding precise model constraint. Uncertainties in this area are propagated downstream thus perturbing the inferred ice thickness even in areas with velocity information. For the particular test case, the approach also provides the minimal ice thickness for large areas, indicating that important oscillations are present in the estimated ice thickness field.
The overall tendency for individual models to provide "extreme" solutions is shown in Fig. 3. Two models (Rabatel and GCbedstress) seem to be particularly prone to predict large ice thicknesses, providing the largest ice thickness of the ensemble for 33 and 25 % of the area they considered. Although for Rabatel the basis of the statement is weak (only one test case considered), possible explanations lie in (a)  VanPeltLeclercq 10 Farinotti 21 Maussion 19 Huss 21 GCbedstress 15 Morlighem 10 Linsbauer 21 Machguth 20 RAAJglabtop2 4 Gantayat 7 Gantayat-v2 7 RAAJgantayat 5 Rabatel 1 Brinkerhoff 3 GCneuralnet 10 Figure 3. Share of "extreme results" provided by individual models. An extreme result is defined as either the minimum (MIN) or maximum (MAX) ice thickness occurring in the ensemble of solutions provided for a given test case. The share is based on test case area and assigns equal weights to all cases (a 10 % "fraction of MAX solutions provided" indicates, for example, that on average, the model generated the maximal ice thickness for 10 % of the area of any considered case). The number of test cases considered by individual models is given. Models are sorted according to the categories introduced in Sect. 4. flux of individual profiles and (b) the assumed relation between depth-averaged and surface flow velocity (see Supplement Sect. S1.16). For GCbedstress the possible reasons are less clear. The no-sliding assumption included by the model (see Supplement Sect. S1.7) -which causes systematically higher thickness estimates than if sliding is assumed -could be a reason. The model, however, seems not to be particularly sensitive to it: assuming that half of the surface velocity is due to sliding decreases the mean estimated thickness by 13 % only (not shown).
Very small ice thicknesses are often predicted by the models Maussion and GCneuralnet. The two models provided the smallest ice thickness of the ensemble in 30 and 23 % of the considered area respectively. For Maussion, the result is mainly driven by the ice thickness predicted for ice caps (Academy, Austfonna, Devon) and large glaciers (Columbia, Elbrus). This is likely related to the applied calibration procedure (see Supplement Sect. S1.12), which is based on data included in GlaThiDa v1. The observations in that data set, in fact, mostly refer to smaller glaciers (Gärtner-Roer et al., 2014). For GCneuralnet, it can be noted that the smallest ice thicknesses are often predicted along the glacier centre lines (not shown). Besides the previously discussed issue related to steep ice in proximity of e.g. medial moraines, the ad hoc solution adopted to allow the ANN stencil to be trained (see Supplement Sect. S1.8) might be an additional cause.
Although the above observations provide insights into the general behaviour of individual models, it should be noted that a tendency of providing extreme results is not necessarily an indicator of poor model performance. Actual model performance, in fact, can only be assessed through comparison against direct observations (see next section).

Comparison to ice thickness measurements
The solutions submitted by individual models are compared to ice thickness measurements in Figs. 4 and 5. For every glacier, the figures show one selected profile along and one across the main ice flow direction. The previously noted large spread between individual solutions re-emerges, as well as the tendency of individual models to produce rather large oscillations. The spread is particularly pronounced for ice caps (Academy, Austfonna, Devon) and for across-flow profiles (Fig. 5).
It is interesting to note that the spread between models is not reduced when individual model categories are considered separately (see also Fig. S3). We interpret this as an indication that even models based on the same conceptual principles can be regarded as independent. Whilst this is not surprising for the minimization approaches since they are based on very different forward models (see Sect. 4.1), or for the mass conserving approaches since they differ significantly in terms of implementation (Sect. 4.2), the observation is rather unexpected for the shear-stress-based and velocity-based approaches (Sects. 4.3 and 4.4 respectively). The latter two categories, in fact, both rely on very similar concepts. Figure 5 reveals that for shear-stress-based approaches the differences are particularly prominent for ice caps, in the vicinity of ice divides in particular. This seems to be related to the way individual models (a) subdivide individual ice caps, (b) treat the resulting boundaries, and (c) handle very small surface slopes. Also for the participating velocity-based approaches, apart from Rabatel all rely on the ideas of Gantayat et al. (2014), and it seems that the implementation differences of conceptually similar approaches (see  are sufficient for considering the models as independent. The above consideration is relevant when interpreting the average solution of the model ensemble (thick green line in Figs. 4 and 5): this average solution matches the direct measurements relatively well for most glaciers, with an average deviation below 10 % in 17 out of 21 cases. This increase in prediction accuracy is expected for an unbiased model ensemble. For a set of independent random realizations of the same variable, in fact, Poisson's law of large numbers predicts the average result to converge to the expected value (the "true bedrock" in this case) with increasing number of realizations. The so-inferred unbiasedness of the ensemble has an important consequence, as it suggests that future estimates The Cryosphere, 11, 949-970, 2017 www.the-cryosphere.net/11/949/2017/ could be significantly improved when relying on such model ensembles. Model weighting -such as used in numerical weather prediction for example (e.g. Raftery et al., 2005)could additionally be considered in this respect, but it would require a sufficiently large data set to quantify model performance. The positive effect of averaging the results of individual models is best seen in Fig. 6. On average over the individual model solutions, the difference between modelled and measured ice thickness is −17 ± 36 % (1σ estimate) of the mean glacier thickness (first box plot in the ALL group). This value reduces to +10 ± 24 % when the average composite solution is considered and is close to the value obtained when selecting the best single solution for every test case individually (third and second box plots of the group respectively).
Two notable exceptions in the above considerations are given by the test cases Unteraar and Tasman, for which the ensembles of solutions (15 and 11 solutions provided respectively) converge to a significantly smaller ice thickness than observed (median deviations of −84 and −65 % respectively). Two common features that might partially explain the observation are (a) the significant debris cover of the two glaciers, which might bury ice thicker than what would be expected from the present-day SMB fields, and (b) the branched nature of the glaciers, which might be insufficiently captured by the models. Both hypotheses, however, are difficult to test further, as the remaining cases show very different morphological characteristics. An erroneous interpretation of the actual ice thickness measurements, in contrast, seems unlikely. This is particularly true for Unteraar, for which the reported quality of original radio-echo soundings is high and independent verifications through borehole measurements were performed (Bauder et al., 2003).
"Urumqi" and "Washmawapta", for which eight and six individual solutions were provided, respectively, are the other two cases for which the average ice thickness composite differs largely from the observations (median deviations of −71 and −125 % respectively; Figs. 4, 5, and 6; recall that because the "true" ice thickness is not known everywhere, deviations are expressed in terms of mean thickness of the average composite). For Washmawapta -a cirque glacier mostly fed by steep ice-free headwalls (Sanders et al., 2010) -it is interesting to note that the Farinotti approach is the only one predicting ice thickness in the observed range. This suggests that the concept of "ice flow catchments", which is used in the approach for accommodating areas outside the glacier margin that contribute to snow accumulation (see Farinotti et al., 2009), is an effective workaround for taking such areas into account. Failure of doing so, in fact, causes the ice volume flux (and thus the ice thickness) to be underestimated. For Urumqi, in contrast, the reasons for the substantial underestimation of actual ice thickness are less clear. Potentially, they could be linked to (a) the cold nature of the glacier (e.g. Maohuan et al., 1989), which requires thicker ice to produce a given surface velocity (note that most models assumed flow rate factors for temperate ice; Supplement Table S2), and (b) the artefacts in the provided DEM (note the step-like features in the surface shown in Fig. 4), which lead to locally very high surface slope and thus low ice thickness.
The comparison between Figs. 4 and 5 also suggests that, in general, the ice thickness distribution along flow is better captured than the distribution across flow. This is likely due to the combination of the fact that most participating approaches include considerations about mass conservation and that virtually all models include surface slope as a predictor for the local ice thickness. Indeed, these two factors have a stronger control on the along-flow ice thickness distribution than they have across flow.
The results also indicate that, compared to real-world cases, the ice thickness distribution of the three synthetic cases is better reproduced. On average over individual solutions, the difference to the correct ice thickness is −17±20 % (Fig. 6). This difference reduces to −15 ± 11 % for the average composites, i.e. to a 1σ spread reduced by a factor of 2. Again, two factors provide the most likely explanation. On the one hand, the model used for generating the synthetic cases is built upon the same theoretical knowledge as the models used for generating the ice thickness estimates. On the other hand, and more importantly, the input data from which the ice thickness distribution is inferred are known without any uncertainty in the synthetic cases. The latter is in contrast to the data available for the real-world cases: whilst the provided DEMs, ∂h/∂t fields, and outlines can be considered of good quality, SMB fields are often the product of the inter-and extrapolation of sparse in situ measurements. The inconsistencies that may arise between ∂h/∂t and SMB, together with the previously mentioned discontinuities in the available velocity fields (see Sect. 3), are obviously problematic for methods that use this information. Two additional observations that might be related to the better model performance in the synthetic cases are (1) that the no-sliding assumption adopted in most models was adequate for the considered synthetic cases, but does not hold true in the realworld ones, and (2) that synthetic glacier geometries were close to steady state. Testing the importance of the second consideration is not possible with the data at hand and would require the generation of transient synthetic geometries.
In relative terms, the average composite solutions seem to better predict (smaller interquartile range, IQR) the ice thickness distribution of ice caps than that of glaciers. In fact, the 1σ deviations from the measurements for ice caps and glaciers are of 12 ± 16 % and 12 ± 34 % respectively (Fig. 6). This might be surprising at first, but Fig. 4 illustrates that for all three considered ice caps, the average composites are the results of a relatively small set (six or seven) of largely differing solutions. This issue is particularly evident for the ice cap interiors, for which two model clusters emerge, predicting extremely high and extremely low ice thicknesses respectively. The relatively small IQR of the ensemble mean, thus, appears to be rather fortuitous and calls for additional work www.the-cryosphere.net/11/949/2017/ The Cryosphere, 11, 2017 in this domain. Note, moreover, that the relative accuracy is expressed in relation to the mean ice thickness. In absolute terms, the abovementioned values translate into average deviations on the order of 48 ± 63 m for ice caps and 11 ± 27 m for glaciers. Obviously, these values are strongly affected by the particular test cases included in the intercomparison and should not be expected to hold true in general.
To put the average model performance into context, the results are compared to a benchmark model based on volumearea scaling (last box plot in Fig. 6). The "model" neglects spatial variations in thickness altogether and simply assigns the mean ice thickness predicted by a scaling relation to the whole glacier. For the scaling relation, we use the form h = cA γ −1 , where h (m) and A (km 2 ) are the mean ice thickness and the area of the glacier respectively. The parameters c and γ are set to c = 0.034 and γ = 1.36 for glaciers (Bahr et al., 2015) and to c = 0.054 and γ = 1.25 for ice caps (Radić and Hock, 2010). The values of parameter γ have a strong theoretical foundation (Bahr et al., 1997(Bahr et al., , 2015, whilst c is a free parameter. Since the relation between c and h is linear, it must be noted that as long as the distribution of c is symmetric and as long as the value chosen for c corresponds to the mean of that distribution, the results of the above relation correspond to the maximum likelihood estimator for the mean of the distribution of h. In other words, randomly sampling different values for c would increase the spread of our estimates but not its mean. This simple model deviates from the measured ice thickness by −42±59 %, which is a spread (bias) more than twice (4 times) as large as estimated for the average composites of the model ensemble (10 ± 24 %). This result is reassuring as it suggests that the individual models have actual skill in estimating both the relative ice thickness distribution and the total glacier volume of individual glaciers. The negative sign of the bias -which is consistent with results obtained from a comprehensive data set in Norway (Andreassen et al., 2015) -should not be overinterpreted, since a different choice for c could be used to alter it. It has again to be noted, however, that this would not reduce the spread in the results and that for real-world applications the value of c is unknown. In general, a site-specific calibration of c would be required.

Individual model performance
The considerations in the previous section refer mainly to the average composite ice thickness provided by the ensemble of models. Running a model ensemble, however, can be very impractical. This opens the question on whether individual models can be recommended for particular settings, or whether a single best model can be identified.
To address this question, we propose two separate rankings. Both are based on the (I) average, (II) median, (III) interquartile range, and (IV) 95 % confidence interval (95 % CI) of the distribution of the deviations between modelled and measured ice thicknesses (Fig. 7).
The first ranking considers the individual test cases separately. All models considering a particular test case are first ranked separately for the four indicators (I-IV). When a model does not include a particular test case, no ranks are assigned. For every model, the four indicators are then averaged individually over all test cases. The final rank is computed by computing the mean of these average ranks (Table 3). The ranking rewards models with a consistently high performance over a large number of test cases.
The second ranking is only based on the average model performance. In this case, ranks for the above indicators (I-IV) are assigned to the ensemble of point-to-point deviations of the various models (last row of box plots in Fig. 7; same weight between test cases ensured). The ranks for the four individual indicators are then averaged to obtain the overall rank (Table 4). In contrast to the first option, this ranking does not consider the test cases individually and does not account for the number of considered test cases. A model considering only one test case but performing perfectly on it, for example, would score highest.
The ranking result of every model on a case-by-case basis is given in Table S3. The distributions of the deviations between modelled and measured ice thicknesses for every model and considered test case are given in Figs. 7 and S2. Similarly as noted during the discussion of the last section (Sect. 5.2), the rankings do not suggest a performance advantage in any of the five model categories introduced in Sect. 4.
Combined over the two rankings, the model Brinkerhoff-v2 scores highest (third and first rank respectively). The good score is mainly driven by the comparatively small model spread (IQR and 95 % CI) and bias (Table 4). The small model bias (−3 % average deviation), however, arises from a partial compensation between positive bias for glaciers (+5 %) and negative bias for the synthetic cases (−22 %) (Table 4). Unfortunately, the model did not consider any ice cap, thus hampering any statement on model performance in this particular setting. Ice caps were not considered mainly because of the absence of the necessary data. Apart from the model Brinkerhoff-v2, the first positions in the first ranking are occupied by models that consider a large number of test cases (Table 3). The model by Maussion is rated highest. Similar to Brinkerhoff-v2, the good result is driven by the small IQRs and 95 % CIs, in particular for glaciers and ice caps. In the second ranking, the model is severely penalized (11th rank) for its large bias (−36 % on average; Table 4). The bias is particularly prominent in the case of ice caps and the synthetic cases (−42 and −45 % average deviation respectively) and may be related to the fact that the Maussion model was developed and calibrated by using data from valley glaciers only. For the synthetic cases in particular, the calibration with real-world glaciers (i.e. cases that include sliding) seems to be a likely explanation for a systematic underestimation of the ice thickness. This, however, appears to be only a partial explanation, as such a negative bias is apparent for most approaches, i.e.
In general, the model bias can be interpreted as an indicator for the performance of the models in reproducing the total glacier ice volume. The latter is not discussed explicitly as the computation of a "measured volume" would need the available measurements to be interpolated over large distances. Seven of the considered models show a bias of less than 8 % (Table 4). An interesting case in this respect is given by the model by Gantayat et al. (2014), which yields small biases (−4 and −8 %) for both considered implementations (Gantayat and RAAJgantayat respectively). The relatively low overall ranks assigned to these models (ranks 10 and 14 in the first ranking, ranks 3 and 12 in the second respectively) are an expression of the relatively small number of considered test cases (first ranking) and the relatively large model spread (second ranking). Of interest is also the observation that the version of the model considering multiple flow lines (Gantayat-v2) yields a significantly higher bias (−32 % on average) than the approach based on elevation bands, despite a moderate decrease in model spread. The increase is particularly visible for real-world glaciers, for which the bias changes from +4 to −61 %. This might hint at the difficulty in correctly subdividing a given glaciers into individual flow lines and could be an indication that the rather mechanistic procedure used in this case (see Supplement Sect. S1.6) is insufficient for achieving a sensible subdivision.
The difficulty in correctly interpreting the overall model bias is well illustrated in the case of the Linsbauer model: the model yields the smallest bias over the entire set of considered test cases (−1 % on average) but is the result of a compensation between (a) a moderate negative bias for glaciers and the synthetic test cases (both −16 %) and (b) a large positive bias for ice caps (+91 %).
Together with Brinkerhoff-v2, the model Farinotti is the second one included in the first five places of both rankings (ranks 4 and 5 respectively; Tables 3 and 4). The relatively high ranking is due to a combination of comparatively high model performance (small bias and spread) and large number of considered test cases. The consideration of all test cases, however, should not be interpreted as capability of handling large samples of glaciers in this case. The application of the model, in fact, requires a significant amount of manual input (see Sect. 4). This is in contrast to the fully automated methods of Maussion, Huss, and Machguth. In this respect it is interesting to note that the model by Huss slips from the second rank in the first ranking to the eighth in the second one. The relatively low score in the second ranking is mainly an expression of the comparatively large confidence intervals (Table 4). Combined over the two rankings, however, the model can be considered as the best amongst the fully automated approaches.
The model GCbedstress (fifth and sixth in the two rankings) ranks highest when only ice caps are considered. The average deviation of 3 ± 17 % indeed suggests a very high www.the-cryosphere.net/11/949/2017/ The Cryosphere, 11, 949-970, 2017 model performance. However, it has to be noted that the result is based on one test case only (Table 2). For the models considering more ice caps, the results are heterogeneous and difficult to interpret, as models showing small IQRs show large bias, and vice versa (Table 4). The model GCneuralnet is found at the other end of the ranking (penultimate and last ranks respectively). The average deviation of −39 ± 52 % highlights both the large bias and large spread of the estimates. Obviously, the performance of approaches based on ANNs are highly dependent on the data set used for algorithm training. The large deviations might therefore be an expression of the issues encountered with the provided DEMs (see Supplement Sect. S1.8) rather than an indication of generally low model performance. As already noted, however, the general absence of ice-free analogues for ice caps or crater glaciers makes the approach unsuitable for this kind of morphologies.
An interesting result emerges when considering the IQRs and 95 % CIs in the synthetic test cases (see Table 4): approaches that include SMB, ∂h/∂t or velocity information in addition to the glacier outline and the DEM of the surface (e.g. approaches by Brinkerhoff, Fuerst, Morlighem, or Van-PeltLeclerq) yield the smallest spreads around the average deviation (IQR < 22 % for all mentioned models). This is in marked contrast to the real-world cases, in which the IQRs are about 5 times larger (average IQR for the same models = 105 %), and similar considerations apply when analysing the 95 % CIs. As already noted, this is most likely linked to the differences in data quality. Whilst the input data for the synthetic cases are perfectly known, the data available for the real-world cases were necessarily retrieved from various independent data sources (see Table 1). This often led to problems in the mutual consistency of the surface fields and caused particular difficulties to those models requiring all of the information. The capability of accounting for observational uncertainties, as in the Brinkerhoff approach for example (see Supplement Sect. S1.1), hence seems to be an important prerequisite when handling real-world cases. Similarly, having access to reliable uncertainty estimates for any particular data set would be important. Emphasis has to be put in this domain if significant advances are to be achieved.

Conclusions
ITMIX was the first coordinated intercomparison of approaches that estimate the ice thickness of glacier and ice caps from surface characteristics. The goal was to assess model performance for cases in which no a priori information about ice thickness is available. The experiment included 15 glaciers and 3 ice caps spread across a range of different climatic regions, as well as 3 synthetically generated test cases.
ITMIX attracted 13 research groups with 17 different models that can be classified into (1) minimization approaches, (2) mass conserving approaches, (3) shear- Table 3. Ranking of individual models based on case-by-case performance. Numbers in front of model names refer to the categories introduced in Sect. 4. Displayed values are mean ranks for the average (avg), median (med), interquartile range (IQR), and 95 % confidence interval (95 %) of the deviations from ice thickness measurements. For every group (glaciers, ice caps, and synthetic cases), the average of the above three ranks is given (AVG). Values for ALL correspond to the average of the three groups. The three best average results for every group are given in bold. n is the number of considered test cases per model.     Figure 7. Difference between estimated and measured ice thicknesses. For every test case (rows) and every model (columns; ordered according to the categories defined in Sect. 4), the distribution of the point-by-point deviations between estimated and measured ice thicknesses is shown. Differences are expressed relative to the mean ice thickness (e.g. a 100 % deviation means that the modelled ice thickness deviates from the measured one by one mean ice thickness   stress-based approaches, (4) velocity-based approaches, and (5) other approaches outside of the previous categories. The 189 solutions submitted in total provided insights into the performance of the various models and the accuracies that can be expected from their application.
The submitted results highlighted the large deviations between individual solutions and even between solutions of the same model category. The local spread often exceeded the local ice thickness. Caution is thus required when interpreting the results of individual models, especially if they are applied to individual sites. Substantial improvements in terms of accuracy, however, could be achieved when combining the results of different models. Locally, the mean deviation between an average composite solution and the measured ice thickness was on the order of 10 ± 24 % of the mean ice thickness (1σ estimate). This hints at the random nature of individual model errors, and suggests that ensembles of models could help in improving the estimates. For applications at the large scale -such as the estimation of the ice thickness distribution of an entire mountain range and beyond -reducing the uncertainties through such a strategy will be challenging, as only few models are currently capable of operating at the regional or global scale.
Although no clear pattern emerged for the performance of individual model categories, the intercomparison allowed statements about the performance of individual models. The model Brinkerhoff-v2 was detected as the best single model, with average deviations for real-world glaciers on the order of −3 ± 27 %. Some caution has to be expressed, however, since the model considered only about half of the provided test cases and was not applied to any ice cap. The model Huss scored highest amongst the automated methods capable of handling large sets of glaciers. With average deviations of −14 ± 35 %, the approach ranged mid-way when considering point-to-point deviations from measurements. For ice caps, the model GCbedstress showed very promising results (average deviations of 3 ± 17 %), although generalizing this observation would be speculative, as the approach considered only one test case. For ice caps, particularly large differences between individual models were detected in the proximity of ice divides. This calls for improvements in how models treat these regions.
Somewhat surprisingly, models that include SMB, ∂h/∂t, or surface flow velocity fields in addition to the glacier outline and DEM did not perform better when compared to approaches requiring less data, in particular for real-world cases. Inconsistencies between available data sets -which are often acquired with very different techniques, spatial footprints, and temporal resolutions -appeared to be the most likely cause. Although it must be noted that the set of considered synthetic cases was generated upon the same theoretical knowledge as the approaches used for ice thickness inversion, the generally better model performance for these cases supports the previous hypothesis. In the synthetic cases, in fact, input data were known precisely, i.e. without The Cryosphere, 11, 949-970, 2017 www.the-cryosphere.net/11/949/2017/ observational errors. This highlights the importance for mutually consistent data sets and suggests that improved observational capabilities could help to improve the performance of the next generation of ice thickness estimation methods. Similarly, improving the model's capability of taking into account uncertainties in the input data should be considered a priority.
Besides improved data concerning glacier surface characteristics, a key for developing a new generation of ice thickness estimation models will be the data base against which the models can be calibrated and validated. The data utilized within ITMIX are available as a supplement to this paper (see link at the end of this section), but a much larger effort is ongoing in collaboration with the World Glacier Monitoring Service. With the initiation of the Glacier Thickness Database (Gärtner-Roer et al., 2014;WGMS, 2016), the first steps towards a freely accessible, global database of ice thickness measurements have been undertaken. We anticipate that this effort, together with a second phase of ITMIX targeting at how to best integrate sparse thickness measurements to improve model performance, will foster the development of improved ice thickness estimation approaches.
To summarize, in order to improve available thickness estimates for glacier and ice caps, we make the following recommendations: -Ensemble methods comprising a variety of independent, physically based approaches should be considered. This is likely to be a more effective strategy than focusing on one individual approach.
-Models should be extended to take observational uncertainty into account. The Bayesian framework used by Brinkerhoff et al. (2016), for example, showed promising results in this respect.
-The increasing availability of surface ice flow velocity data (e.g. Scambos et al., 2016) should be exploited. In this context, the previously mentioned necessity of accounting for observational uncertainty is crucial.
-The way individual models treat ice divides has to be improved. This is important when addressing ice caps and glacier complexes and to ensure consistency between subsurface topographies of adjacent ice masses.
-Efforts for centralizing available ice thickness measurements should be strengthened. Initiatives such as the GlaThiDa database launched by the World Glacier Monitoring Service are essential for new generations of ice thickness models to be developed and validated.
Data availability. All data used within ITMIX, as well as the individual solutions submitted by the participating models, can be found at http://dx.doi.org/10.5905/ethz-1007-92.
The Supplement related to this article is available online at doi:10.5194/tc-11-949-2017-supplement.