Knowledge of the ice thickness distribution of glaciers and ice caps
is an important prerequisite for many glaciological and hydrological
investigations. A wealth of approaches has recently been presented for
inferring ice thickness from characteristics of the surface. With the
Ice Thickness Models Intercomparison eXperiment (ITMIX) we performed
the first coordinated assessment quantifying individual model
performance. A set of 17 different models showed that individual ice
thickness estimates can differ considerably – locally by a spread
comparable to the observed thickness. Averaging the results of
multiple models, however, significantly improved the results: on
average over the 21 considered test cases, comparison against direct
ice thickness measurements revealed deviations on the order of

The ice thickness distribution of a glacier, ice cap, or ice sheet is a fundamental parameter for many glaciological applications. It determines the total volume of the ice body, which is crucial to quantify water availability or sea-level change, and provides the link between surface and subglacial topography, which is a prerequisite for ice flow modelling studies. Despite this importance, knowledge about the ice thickness of glaciers and ice caps around the globe is limited – a fact linked mainly to the difficulties in measuring ice thickness directly. To overcome this problem, a number of methods have been developed to infer the total volume and/or the ice thickness distribution of ice masses from characteristics of the surface.

Amongst the simplest methods, so-called “scaling approaches” are the
most popular

Methods that yield distributed information about the ice
thickness generally rely on theoretical
considerations.

Early approaches that take into account mass conservation and ice flow
dynamics go back to

Overview of the test cases considered in
ITMIX. Glacier type follows the GLIMS classification guidance

In the recent past, the number of methods aiming at estimating the ice
thickness distribution from characteristics of the surface has
increased at a rapid pace. Methods have been presented that include
additional data such as surface velocities and mass balance

Against this background, the working group on glacier ice thickness
estimation, hosted by the International Association of Cryospheric
Sciences (

This article presents both the experimental set-up of ITMIX and the results of the intercomparison. The accuracy of individual approaches is assessed in a unified manner, and the strengths and shortcomings of individual models are highlighted. By doing so, ITMIX not only provides quantitative constraints on the accuracies that can be expected from individual models but also aims at setting the basis for developing a new generation of improved ice thickness estimation approaches.

ITMIX was conducted as an open experiment, with a call for
participation posted on the email distribution list “Cryolist”
(

The input data referred to the surface characteristics of a predefined
set of 21 test cases (see next section, Table

No prior information about ice thickness was provided, and the participants were asked not to make use of published ice thickness measurements referring to the considered test cases for model calibration. This was to mimic the general case in which the ice thickness distribution for unmeasured glaciers has to be estimated. Compliance with the above rule relied on honesty.

Participants were asked to treat as many test cases as possible
and to consider data availability (see next section and
Table

Overview of provided and used data, as
well as test cases considered by individual models. Names of ice
caps are flagged with an asterisk (*). Models are named after the
modeller submitting the results; alternative model identifiers that
have been used in the literature are given in parentheses. The
category refers to the classification provided in
Sect.

The considered test cases included 15 glaciers and 3 ice caps for
which direct ice thickness measurements are available and 3
synthetically generated glaciers virtually “grown” over known
bedrock topographies (more detailed information below). The
real-world test cases (see Fig.

For each test case, the input data provided to the ITMIX participants
included at a minimum (a) an outline of the glacier or ice cap and
(b) a gridded digital elevation model (DEM) of the ice
surface. Further information was provided on a case-by-case basis
depending on data availability, including the spatial distribution of
the (i) surface mass balance (SMB), (ii) rate of ice thickness change
(

For the real-world test cases, and whenever possible, temporal
consistency was ensured between individual data sets. Glacier outlines
and DEMs were snapshots for a given point in time, whereas SMB,

Ice thickness measurements were only used for quantifying model
performance but were not distributed to the ITMIX
participants. Bedrock elevations were obtained by subtracting observed
ice thicknesses from surface elevations, and the bedrock was assumed
to remain unchanged over time. The time periods to which the individual data
sets refer are given in Supplement Table S1. Note that no
specific information about the uncertainties associated with individual
measurements was available. Reported uncertainties for ice thickness
measurements, however, are typically below 5 %

The synthetic test cases were generated by “growing” ice masses over
known bedrock topographies with the Elmer/Ice ice flow model

All data provided as input to the ITMIX participants, as well as the
results submitted by individual models, will be provided as an electronic
Supplement to this article. The direct ice thickness measurements were
additionally included in the Glacier Thickness Database (GlaThiDa) version 2

Overview of the considered real-world test cases. Note that some names are shortened for convenience (Academy is Academy of Sciences Ice Cap; Devon is Devon Ice Cap; Mocho is Glaciar Mocho-Choshuenco; Unteraar is Unteraargletscher; Urumqi is Urumqi Glacier no. 1).

The ITMIX call for participation was answered by 13 research groups
having access to 15 different models in total. Two modelling
approaches were used twice, with two independent implementations
stemming from two different groups. Of the 15 models, nine were published prior to
the call, one consisted of a modification of an existing approach,
and five were previously unpublished. In total, thus, 17 different
models submitted individual solutions (Table

The 17 approaches providing individual solutions can be classified
into five different categories: (1) approaches casting ice thickness
inversion as a minimization problem (

Methods within this category formulate the problem of ice thickness
inversion as a minimization problem. They do so by defining a cost
function that penalizes the difference between a modelled and an
observable quantity. Typically, the observable quantity includes the
elevation of the glacier surface

“Brinkerhoff-v2” (Brinkerhoff, unpublished; see
Supplement Sect. S1.2 for details) includes three terms in the cost
function. The first term quantifies the difference between modelled
and observed surface elevations; the second penalizes strong spatial
variations in bedrock elevations; and the third is used to impose zero
ice thickness outside the glacier boundaries. If available, surface
flow velocities are used to additionally invert for the basal traction
field. The forward model is based on the Blatter–Pattyn approximation
to the Stokes equations
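
The structure of such a three-term cost function can be illustrated with a minimal sketch. It is not the Brinkerhoff-v2 implementation: the function name, the quadratic form of each term, and the weights `w_surface`, `w_smooth`, and `w_outside` are assumptions made for illustration, and the modelled surface is taken as a given input rather than computed by a forward ice flow model.

```python
import numpy as np

def three_term_cost(bed, surface_modelled, surface_observed, inside_mask,
                    w_surface=1.0, w_smooth=1.0, w_outside=1.0):
    """Illustrative cost for a candidate bed (all arrays share one 2-D grid).

    inside_mask is True for cells within the glacier outline.
    The weights are hypothetical tuning parameters.
    """
    # Term 1: mismatch between modelled and observed surface elevations
    misfit = np.sum((surface_modelled - surface_observed) ** 2)

    # Term 2: penalty on strong spatial variations in bedrock elevation
    d_row, d_col = np.gradient(bed)
    roughness = np.sum(d_row ** 2 + d_col ** 2)

    # Term 3: penalty on non-zero ice thickness outside the glacier outline
    thickness = surface_modelled - bed
    outside = np.sum(thickness[~inside_mask] ** 2)

    return w_surface * misfit + w_smooth * roughness + w_outside * outside
```

In an actual inversion, a minimizer would adjust `bed` (and rerun the forward model to update `surface_modelled`) until this cost stops decreasing.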

“VanPeltLeclercq” (adapted from

“Fuerst” (Fürst et al., unpublished; Supplement Sect. S1.4) differs from the two approaches above in that the cost
function is not linked to surface elevations. Instead, the function
penalizes (i) negative thickness values, (ii) the mismatch between
modelled and observed surface velocities, (iii) the mismatch between
modelled and observed SMB, and (iv) strong spatial variations in ice
thickness or surface velocities. The forward model is based on
Elmer/Ice

Methods belonging to this category are based on the principle of
mass conservation. If ice is treated as an incompressible medium, the
corresponding continuum equation states that the ice flux divergence

“Farinotti”

“Maussion” (Maussion et al., unpublished; see Supplement
Sect. S1.12) is based on the same approach as

“Huss”

“GCbedstress” (

“Morlighem” (

Methods of this category rely on the shallow ice approximation

“Linsbauer”

“Machguth”

“RAAJglabtop2”

As for models in Sect.

“Gantayat” (

“RAAJgantayat” (re-implemented from

“Gantayat-v2” (adapted from

“Rabatel” (Rabatel et al., unpublished; see Supplement
Sect. S1.16) is based on the knowledge of surface velocities as
well but includes some elements of the mass conserving
approaches. Basically, the ice thickness along individual glacier
cross sections is calculated by assuming that

This last category includes two additional approaches that cannot be
classified in any of the categories above:

“GCneuralnet” (

“Brinkerhoff” (

In total, 189 different solutions were submitted to ITMIX
(Table

Overview of the range of
solutions provided by the ensemble of models. The example refers to
the test case “Unteraar”. The first four panels show composites
for the

Share of “extreme results”
provided by individual models. An extreme result is defined as
either the minimum (MIN) or maximum (MAX) ice thickness occurring
in the ensemble of solutions provided for a given test case. The
share is based on test case area and assigns equal weights to all
cases (a 10 % “fraction of MAX solutions provided” indicates,
for example, that on average, the model generated the maximal ice
thickness for 10 % of the area of any considered case). The
number of test cases considered by individual models is
given. Models are sorted according to the categories introduced in
Sect.

Locally, the solutions provided by the different models can differ
considerably. As an example, Fig.

Figure

The overall tendency for individual models to provide “extreme”
solutions is shown in Fig.

Very small ice thicknesses are often predicted by the models
Maussion and GCneuralnet. The two models provided the smallest
ice thickness of the ensemble in 30 and 23 % of the considered area
respectively. For Maussion, the result is mainly driven by the ice
thickness predicted for ice caps (Academy, Austfonna, Devon) and large
glaciers (Columbia, Elbrus). This is likely related to the applied
calibration procedure (see Supplement Sect. S1.12), which is based
on data included in GlaThiDa v1. The observations in that data set, in
fact, mostly refer to smaller glaciers

Although the above observations provide insights into the general behaviour of individual models, it should be noted that a tendency to provide extreme results is not necessarily an indicator of poor model performance. Actual model performance, in fact, can only be assessed through comparison against direct observations (see next section).
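
The share of extreme results discussed above can be computed along the following lines for a single test case. This is a minimal sketch, not the evaluation code used in ITMIX: the function name and array layout are assumptions, every model in the stack is assumed to have provided a solution for the case, and a cell in which several models tie for the minimum (or maximum) is counted for each of them.

```python
import numpy as np

def extreme_share(thickness_stack, inside_mask):
    """Area fraction for which each model gives the ensemble MIN or MAX.

    thickness_stack: array (n_models, ny, nx) of modelled ice thickness
    inside_mask:     boolean array (ny, nx), True inside the glacier outline
    Returns (share_min, share_max), each of shape (n_models,).
    """
    # Keep glacierized cells only: shape (n_models, n_cells)
    cells = thickness_stack[:, inside_mask]

    # Flag, per cell, the model(s) matching the ensemble minimum / maximum
    is_min = cells == cells.min(axis=0)
    is_max = cells == cells.max(axis=0)

    # On a regular grid the fraction of cells equals the fraction of area
    return is_min.mean(axis=1), is_max.mean(axis=1)
```

Averaging the per-case shares of a model over the test cases it considered then gives every case the same weight, regardless of its area.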

The solutions submitted by individual models are compared to
ice thickness measurements in Figs.

It is interesting to note that the spread between models is not
reduced when individual model categories are considered separately
(see also Fig. S3). We interpret this as an indication
that even models based on the same conceptual principles can be
regarded as independent. Whilst this is not surprising for the
minimization approaches since they are based on very different forward
models (see Sect.

The above consideration is relevant when interpreting the average
solution of the model ensemble (thick green line in
Figs.

The positive effect of averaging the results of individual models is
best seen in Fig.

Two notable exceptions in the above considerations are given by the
test cases Unteraar and Tasman, for which the ensembles of
solutions (15 and 11 solutions provided respectively) converge to a
significantly smaller ice thickness than observed (median deviations
of

“Urumqi” and “Washmawapta”, for which eight and six individual solutions
were provided, respectively, are the other two cases for which the
average ice thickness composite differs substantially from the observations
(median deviations of

The comparison between Figs.

The results also indicate that, compared to real-world cases, the ice
thickness distribution of the three synthetic cases is better
reproduced. On average over individual solutions, the difference to
the correct ice thickness is

In relative terms, the average composite solutions seem to better
predict (smaller interquartile range, IQR) the ice thickness
distribution of ice caps than that of glaciers. In fact, the
1

To put the average model performance into context, the results are
compared to a benchmark model based on volume-area scaling (last
box plot in Fig.
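
For orientation, a volume-area scaling estimate of the mean ice thickness can be sketched as follows. The relation V = c A^γ is the usual scaling form; the default values of `c` and `gamma` below are placeholders of the magnitude often quoted in the scaling literature and are an assumption of this sketch, not values taken from the text.

```python
def mean_thickness_from_area(area_km2, c=0.034, gamma=1.36):
    """Mean ice thickness (m) from volume-area scaling, V = c * A**gamma.

    area_km2: glacier area in km^2 (volume is then in km^3).
    c, gamma: scaling parameters; the defaults are illustrative assumptions.
    """
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    volume_km3 = c * area_km2 ** gamma
    # Mean thickness = volume / area, converted from km to m
    return 1000.0 * volume_km3 / area_km2
```

Because the estimate depends on area alone, it yields a single mean thickness per glacier and carries no information about the spatial thickness distribution.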

This simple model deviates from the measured ice thickness by

Comparison between estimated and measured bedrock topographies. For every test case, a longitudinal profile showing the glacier surface (thick black line), the bedrock solution of individual models (coloured lines), the average composite solution (thick green line), and the available GPR measurements (black-encircled red dots) is given. The coloured squares on the upper left of the panels indicate which models provided solutions for the considered test case (see legend on the right margin for colour key). The locations of the profiles are shown on the small map on the bottom left of the panels (red), and the beginning of the profile (blue dot) is to the left. Available ice thickness measurements are shown in grey.

Same as Fig.

Effect of merging individual model
solutions. For every test case, the distribution of the deviations
between modelled and measured ice thicknesses is shown for the case
in which (i) the individual point-to-point comparisons of all
available solutions are pooled (grey box plots), (ii) only the
provided single best solution is considered (blue box plots), and
(iii) the deviations are computed from the average composite
thickness of all model solutions (green box plots). Deviations are
expressed relative to the mean ice thickness. The best single
solution is computed by summing the ranks for the (a) average
deviation, (b) median deviation, (c) interquartile range, and (d) 95 % confidence interval. The distributions of the deviations
when grouping glaciers, ice caps, and synthetic glaciers separately
are additionally shown, as are the results when grouping all test
cases together (ALL). When forming the groups, point-to-point
deviations for every test case are resampled so that every test
case has the same weight. The last box plot to the right refers to
the case in which the mean ice thickness is predicted by
volume–area scaling (see Sect.
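
The pooled and composite comparisons described in this caption can be sketched as below. The function names are hypothetical, and two points are assumptions of the sketch: cells without a value are encoded as NaN, and deviations are normalized by the mean of the measured thickness (the caption only states "relative to the mean ice thickness").

```python
import numpy as np

def average_composite(thickness_stack):
    """Cell-wise mean over models; NaN entries (no solution) are ignored."""
    return np.nanmean(thickness_stack, axis=0)

def relative_deviations(modelled, measured):
    """Modelled minus measured thickness at measurement points,
    divided by the mean measured thickness (assumed normalization)."""
    valid = ~np.isnan(measured) & ~np.isnan(modelled)
    return (modelled[valid] - measured[valid]) / measured[valid].mean()

def pooled_deviations(thickness_stack, measured):
    """Point-to-point deviations of all individual solutions, pooled."""
    return np.concatenate(
        [relative_deviations(solution, measured) for solution in thickness_stack]
    )
```

Comparing the distribution returned by `pooled_deviations` with `relative_deviations(average_composite(stack), measured)` corresponds to comparing cases (i) and (iii) of the caption.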

The considerations in the previous section refer mainly to the average composite ice thickness provided by the ensemble of models. Running a model ensemble, however, can be very impractical. This raises the question of whether individual models can be recommended for particular settings, or whether a single best model can be identified.

To address this question, we propose two separate rankings. Both are
based on the (I) average, (II) median, (III) interquartile range, and (IV)
95 % confidence interval (95 % CI) of the distribution of the
deviations between modelled and measured ice thicknesses
(Fig.

The first ranking considers the individual test cases separately. All
models considering a particular test case are first ranked separately
for the four indicators (I–IV). When a model does not include a
particular test case, no ranks are assigned. For every model, the ranks for the four
indicators are then averaged individually over all test cases. The
final rank is obtained by taking the mean of these average ranks
(Table
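
The case-by-case ranking procedure can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the paper: the function names are hypothetical, the average and median are ranked by absolute value (closer to zero is better), the 95 % confidence interval is taken as the width between the 2.5th and 97.5th percentiles, and ties are broken by input order.

```python
import numpy as np

def indicators(deviations):
    """Indicators I-IV for one model and one test case (smaller is better)."""
    dev = np.asarray(deviations, dtype=float)
    q25, q75 = np.percentile(dev, [25, 75])
    p_lo, p_hi = np.percentile(dev, [2.5, 97.5])
    return np.array([abs(dev.mean()), abs(np.median(dev)), q75 - q25, p_hi - p_lo])

def rank_within_case(dev_by_model):
    """Rank the models that considered one test case (1 = best) per indicator."""
    names = list(dev_by_model)
    table = np.array([indicators(dev_by_model[n]) for n in names])
    # Double argsort converts indicator values into ordinal ranks per column
    ranks = table.argsort(axis=0).argsort(axis=0) + 1
    return {name: ranks[i] for i, name in enumerate(names)}

def case_by_case_ranking(cases):
    """cases: list of {model: deviations} dicts, one per test case.

    Models absent from a case receive no ranks for it. Returns the mean
    of the four per-indicator average ranks for every model.
    """
    collected = {}
    for dev_by_model in cases:
        for name, ranks in rank_within_case(dev_by_model).items():
            collected.setdefault(name, []).append(ranks)
    return {name: float(np.mean(np.mean(r, axis=0)))
            for name, r in collected.items()}
```

Sorting the models by the returned score gives the final order, with the lowest score ranked first.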

The second ranking is only based on the average model performance. In
this case, ranks for the above indicators (I–IV) are assigned to the
ensemble of point-to-point deviations of the various models (last row
of box plots in Fig.

The ranking result of every model on a case-by-case basis is given in
Table S3. The distributions of the deviations between
modelled and measured ice thicknesses for every model and considered
test case are given in Figs.

Combined over the two rankings, the model Brinkerhoff-v2 scores
highest (third and first rank respectively). The good
score is mainly driven by the comparatively small model spread (IQR
and 95 % CI) and bias (Table

Apart from the model Brinkerhoff-v2, the first positions in the
first ranking are occupied by models that consider a large number of
test cases (Table

In general, the model bias can be interpreted as an indicator for the
performance of the models in reproducing the total glacier ice
volume. The latter is not discussed explicitly as the computation of a
“measured volume” would need the available measurements to be
interpolated over large distances. Seven of the considered models show
a bias of less than 8 % (Table

The difficulty in correctly interpreting the overall model bias is
well illustrated in the case of the Linsbauer model: the model
yields the smallest bias over the entire set of considered test cases
(

Together with Brinkerhoff-v2, the model Farinotti is the
second one included in the first five places of both rankings (ranks 4
and 5 respectively; Tables

The model GCbedstress (fifth and sixth in the two
rankings) ranks highest when only ice caps are considered. The average
deviation of

The model GCneuralnet is found at the other end of the ranking
(penultimate and last ranks respectively). The average deviation of

An interesting result emerges when considering the IQRs and 95 % CIs
in the synthetic test cases (see Table

Ranking of individual models based on case-by-case
performance. Numbers in front of model names refer to the categories
introduced in Sect.

Ranking of individual models based on
average model performance. Numbers in front of model names refer to
the categories introduced in Sect.

Difference between estimated and
measured ice thicknesses. For every test case (rows) and every
model (columns; ordered according to the categories defined
in Sect.

ITMIX was the first coordinated intercomparison of approaches that estimate the ice thickness of glaciers and ice caps from surface characteristics. The goal was to assess model performance for cases in which no a priori information about ice thickness is available. The experiment included 15 glaciers and 3 ice caps spread across a range of different climatic regions, as well as 3 synthetically generated test cases.

ITMIX attracted 13 research groups with 17 different models that can be classified into (1) minimization approaches, (2) mass conserving approaches, (3) shear-stress-based approaches, (4) velocity-based approaches, and (5) other approaches outside of the previous categories. The 189 solutions submitted in total provided insights into the performance of the various models and the accuracies that can be expected from their application.

The submitted results highlighted the large deviations between
individual solutions and even between solutions of the same model
category. The local spread often exceeded the local ice
thickness. Caution is thus required when interpreting the results of
individual models, especially if they are applied to individual
sites. Substantial improvements in terms of accuracy, however, could
be achieved when combining the results of different models. Locally,
the mean deviation between an average composite solution and the
measured ice thickness was on the order of

Although no clear pattern emerged for the performance of individual
model categories, the intercomparison allowed statements about the
performance of individual models. The model Brinkerhoff-v2 was
identified as the best single model, with average deviations for
real-world glaciers on the order of

Somewhat surprisingly, models that include SMB,

Besides improved data concerning glacier surface characteristics, a
key for developing a new generation of ice thickness estimation models
will be the database against which the models can be calibrated and
validated. The data utilized within ITMIX are available as a
supplement to this paper (see link at the end of this section), but a
much larger effort is ongoing in collaboration with the World Glacier
Monitoring Service. With the initiation of the Glacier Thickness
Database

To summarize, in order to improve available thickness estimates for
glaciers and ice caps, we make the following recommendations:

Ensemble methods comprising a variety of independent, physically based approaches should be considered. This is likely to be a more effective strategy than focusing on one individual approach.

Models should be extended to take
observational uncertainty into account. The Bayesian framework used
by

The increasing availability of
surface ice flow velocity data

The way individual models treat ice divides has to be improved. This is important when addressing ice caps and glacier complexes and to ensure consistency between subsurface topographies of adjacent ice masses.

Efforts for centralizing available ice thickness measurements should be strengthened. Initiatives such as the GlaThiDa database launched by the World Glacier Monitoring Service are essential for new generations of ice thickness models to be developed and validated.

All data used within ITMIX, as well as the individual solutions
submitted by the participating models, can be found at

D. Farinotti and H. Li designed ITMIX. D. Farinotti coordinated the experiment, performed the model evaluations, prepared the figures, and wrote the paper with contributions by D. J. Brinkerhoff, G. K. C. Clarke, J. J. Fürst, P. Gantayat, F. Gillet-Chaulet, M. Huss, P. W. Leclercq, A. Linsbauer, H. Machguth, F. Maussion, M. Morlighem, A. Rabatel, R. Ramsankaran, O. Sanchez, and W. J. J. van Pelt. D. J. Brinkerhoff, G. K. C. Clarke, J. J. Fürst, H. Frey, P. Gantayat, F. Gillet-Chaulet, C. Girard, M. Huss, P. W. Leclercq, A. Linsbauer, H. Machguth, C. Martin, F. Maussion, M. Morlighem, C. Mosbeux, A. Pandit, A. Portmann, A. Rabatel, R. Ramsankaran, O. Sanchez, S. S. Kumari, and W. J. J. van Pelt participated in the experiment and provided modelling results. B. Anderson, L. M. Andreassen, T. Benham, D. Binder, J. A. Dowdeswell, D. Farinotti, A. Fischer, K. Helfricht, S. Kutuzov, I. Lavrentiev, H. Li, R. McNabb, and P. A. Stentoft provided the data necessary for the real-world test cases. C. Martin and D. Farinotti generated the synthetic test cases. M. Huss, L. M. Andreassen, and G. H. Gudmundsson provided advice during experimental set-up and evaluation.

The authors declare that they have no conflict of interest.

ITMIX was made possible by the International Association of Cryospheric Sciences (IACS). We are indebted to the European Space Agency's project Glaciers_cci as well as to Elisa Bjerre, Emiliano Cimoli, Gwenn Flowers, Marco Marcer, Geir Moholdt, Johnny Sanders, Marius Schäfer, Konrad Schindler, Tazio Strozzi, and Christoph Vogel for providing data necessary for the experiment. D. Farinotti acknowledges support from the Swiss National Science Foundation (SNSF). M. Morlighem and C. Girard were funded by the National Aeronautics and Space Administration, Cryospheric Sciences Program, grant NNX15AD55G. P. W. Leclercq was funded by the European Research Council under the European Union's Seventh Framework Programme (FP/2007–2013)/ERC grant agreement no. 320816. J. A. Dowdeswell acknowledges the UK Natural Environment Research Council for supporting the airborne radio-echo sounding surveys of the three Arctic ice caps included in the experiment (grants GR3/4663, GR3/9958, GR3/12469, and NE/K004999/1). A. Rabatel and O. Sanchez acknowledge the contribution of Labex OSUG@2020 (Investissements d'avenir – ANR10 LABX56) and the VIP_Mont-Blanc project (ANR-14-CE03-0006-03). F. Gillet-Chaulet and C. Mosbeux acknowledge support from the French National Research Agency (ANR) under the SUMER (Blanc SIMI 6) 2012 project ANR-12BS06-0018. The constructive comments by Shin Sugiyama and an anonymous referee helped to improve the manuscript.

Edited by: A. Vieli
Reviewed by: S. Sugiyama and one anonymous referee