Bayesian hierarchical modeling can assist the
study of glacial dynamics and ice flow properties. This approach
will allow glaciologists to make fully probabilistic predictions for
the thickness of a glacier at unobserved spatiotemporal
coordinates, and it will also allow for the derivation of posterior
probability distributions for key physical parameters such as ice
viscosity and basal sliding. The goal of this paper is to develop a
proof of concept for a Bayesian hierarchical model
that uses exact analytical solutions for the shallow ice
approximation (SIA) introduced by

Schematic of a simple Bayesian hierarchical model; here,

The shallow ice approximation (SIA) is a nonlinear partial
differential equation (PDE) that describes ice flow when glacier
thickness is relatively small compared to the horizontal dimensions.
Derived from the principle of mass conservation, the SIA PDE depends
on two key physical parameters: ice viscosity and basal sliding
(sometimes described as basal friction or drag). The primary
objective of this paper is to develop a Bayesian hierarchical model
(BHM) for glacier flow utilizing the framework espoused by

Before describing how BHMs are used in physical–statistical models, particularly for geophysical problems, a very brief overview of Bayesian modeling and Bayesian hierarchical modeling is given for the uninitiated reader. A main component of Bayesian statistics is the use of probability distributions to model parameters thought to be fixed quantities (i.e., scientific constants); this assumption allows one to use rules of conditional probability (i.e., Bayes' theorem) to derive probability distributions for scientific quantities of interest, such as physical constants or predictions of future quantities of a system being studied. Typically, the major assumptions required as input to the analysis are prior distributions for parameters as well as a probabilistic model for the data. The output is a probability distribution for parameters or predictions conditional on data that have been collected or observed; canonically, this is referred to as the posterior distribution.
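As a minimal illustration of the prior-plus-data-model recipe described above, the following sketch computes a posterior in closed form for a toy problem: a single unknown mean with a normal prior and normally distributed observations with known standard deviation. The function name `normal_posterior` and all numerical values are assumptions made for illustration; they are not taken from the model developed in this paper.

```python
import numpy as np

def normal_posterior(y, sigma, prior_mean, prior_sd):
    """Posterior of theta for y_i ~ N(theta, sigma^2) with prior theta ~ N(prior_mean, prior_sd^2)."""
    y = np.asarray(y, dtype=float)
    prior_prec = 1.0 / prior_sd**2      # precision (inverse variance) of the prior
    data_prec = y.size / sigma**2       # precision contributed by the data
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * y.mean())
    return post_mean, np.sqrt(post_var)

# Toy data: 50 noisy observations of an (assumed) true value of 2.0
rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=50)
post_mean, post_sd = normal_posterior(y, sigma=1.0, prior_mean=0.0, prior_sd=10.0)
print(post_mean, post_sd)
```

The posterior standard deviation is always smaller than the prior standard deviation here, which reflects how conditioning on data sharpens the distribution for the parameter.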

A BHM is a Bayesian model in which the probabilistic model for data
is specified in a hierarchy. Working with such a hierarchy has a
number of advantages – it is usually easier to conceptualize the
probabilistic model for the data, and it is also easier to model
various parts of a system of interest modularly instead of all at
once. Such an approach is conducive to the construction of a
probabilistic model that tightly corresponds to a scientific system
of interest, which is naturally thought of in separate components or
modules. In a BHM, the rules of conditional probability can be used
to specify the relevant distributions. For example, let us consider
a mock system that has parameter vector

The case for applying Bayesian hierarchical modeling and methodology
in geophysics is strongly made by

At the highest level of a BHM, prior probability distributions are laid out for the physical parameters of interest. At the intermediary level, a probability distribution for the physical process of interest is laid out conditional on the parameters, which is typically motivated by a numerical scheme for solving PDEs. In particular, this level may be modeled as the sum of the output from a numerical solver and an error-correcting process. Finally, at the observed level, a probability distribution is set forth for the observed data conditional on the latent physical process and other relevant measurement parameters, which include variances of measuring procedures or devices. The product of these probability distributions specifies the joint distribution of all relevant quantities, which is proportional to the posterior distribution by the definition of conditional probability. While a traditional analysis may handle each of these disparate sources of uncertainty in an ad hoc and disjointed fashion, the Bayesian hierarchical approach leverages probability measures to cohesively model major sources of uncertainty and undertake inference in a principled manner. Figure 2 diagrams what a prototypical physical–statistical Bayesian hierarchical model might look like.
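The three levels described above can be sketched in code as a sum of log densities, since the joint distribution is the product of the parameter, process, and data distributions. This is only a toy sketch: `toy_solver` is a hypothetical stand-in for a numerical PDE solver, the error-correcting process is simplified to independent Gaussian noise, and all priors and standard deviations are assumed values.

```python
import numpy as np

def norm_logpdf(x, mean, sd):
    """Sum of independent normal log densities."""
    return np.sum(-0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((x - mean) / sd) ** 2)

def toy_solver(theta, t):
    """Hypothetical stand-in for the output of a numerical solver."""
    return 10.0 * np.exp(-theta * t)

def log_joint(theta, process, y, t, prior_mean=0.5, prior_sd=0.5, err_sd=0.1, obs_sd=1.0):
    lp_param = norm_logpdf(theta, prior_mean, prior_sd)             # parameter level
    lp_process = norm_logpdf(process, toy_solver(theta, t), err_sd) # process level
    lp_data = norm_logpdf(y, process, obs_sd)                       # data level
    return lp_param + lp_process + lp_data

# Simulate top-down through the hierarchy, then evaluate the joint log density
rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 11)
theta = 0.3
process = toy_solver(theta, t) + rng.normal(0.0, 0.1, t.size)
y = process + rng.normal(0.0, 1.0, t.size)
lj = log_joint(theta, process, y, t)
print(lj)
```

Viewed as a function of the unknowns with the data held fixed, this joint log density equals the log posterior up to an additive constant.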

Schematic of a prototypical physical–statistical Bayesian hierarchical model. At the top layer, physical parameters, initial conditions, and boundary conditions are fed into a numerical solver, and the output of this is corrected with an error-correcting process; finally, the actual observations are dependent on the actual physical process values.

While the BHM approach to physical–statistical problems offers many
advantages, it is not an infallible approach. In particular, while
constructing a BHM may be straightforward, actually fitting a BHM to
data can be computationally difficult. In the analysis that follows,
there are only one or two physical parameters and the likelihood
function is tractable, so posterior computation is not difficult. In
more complex scenarios with many physical parameters (e.g., a basal
sliding field with a parameter for each grid point), it becomes more
difficult to compute the posterior or draw samples from it. There
are now many tools, however, for Bayesian inference with
complicated and high-dimensional posterior distributions, such as
Stan

To put the contributions of this work into context, we briefly
review glaciology papers that have employed Bayesian modeling. In

The approach taken in this work can also be used for non-SIA problems
in cryosphere science, and the Bayesian hierarchical model does not
require analytical solutions; here the analytical solutions are used
only to evaluate the particular SIA-based BHM developed in this
paper. In general, however, the biggest difficulty will be in
developing a statistical error-correcting process that appropriately
models numerical errors for an arbitrary scenario, where a numerical
solver for a different set of dynamical equations is used. In the
SIA context, we can rely on prior studies of

The main differentiating contribution of this paper is to utilize
the exact analytical solutions from

The physics of glaciers is an extensive topic; hence, only the
portions which are most relevant to this paper are described. The
reader is pointed to the comprehensive works by

Written in terms of glacial thickness,

A summary of main parameters and notation utilized.

It is important to make explicit that there are some limitations of
this PDE. Besides ignoring longitudinal and transverse stress terms,
the PDE does not model subglacial hydrology, tunneling systems,
jökulhlaups, or surges, all of which are believed to
contribute to the dynamics of glaciers as a whole. Nonetheless, one
hopes these equations may serve as a first approximation for shallow
glacier dynamics. In addition to dynamics, another important
physical consideration of glaciers is the relationship between
temperature and viscosity, which follows an Arrhenius relationship

In this section, we give an overview of the BHM employed and
introduce notation for the key statistical and physical
parameters. The reader is referred to Table 1 for a summary of the
model parameters and to Fig. 3 for a schematic illustrating the
BHM. We use index

Schematic of the physical–statistical BHM that has been constructed based on the SIA PDE. The main parameters and variables for each module of the physical–statistical model are highlighted in red. The main levels of a physical–statistical model shown in Fig. 2 are displayed here, along with the parameters and variables describing each level.

At the data level, the observed height for each grid point
is modeled with a normal distribution (denoted with the notation

At the process level,

Since we believe numerical errors will accumulate over time

An illustration marking the 25 measurement sites on the glacier. This is a top-down view of the glacier, where the blue points indicate the glacier, the red points indicate the measurement locations, and the black points indicate locations surrounding the glacier with no glacial thickness.

Finally, at the parameter level,

It is prudent to discuss the motivations and justifications of the
various modeling choices employed in the model previously
delineated. The data level is assumed to have independent normal
errors with fixed variance; this is justified because of the
uniformity of the measuring technology used from site to site (e.g.,
digital GPS) and symmetry of errors. In contrast, the precise
functional form of the data level is chosen somewhat arbitrarily as
a Gaussian, which affords analytical convenience. Similarly, the
error-correcting process at the process level uses a zero-mean
Gaussian process with a parameterized covariance kernel (e.g.,
squared exponential), mostly as an analytically manageable way to
induce spatial correlation in the error-correcting process. Spatial
correlation in numerical errors has been demonstrated, for example,
in

Moreover, it is appropriate to consider potential variations of this model for slightly different scenarios. Natural candidates are alternative covariance kernels at the process level (e.g., Matérn, to allow for a less smooth error-correcting process) and varying errors at the data level, for example to account for compaction or densification that occurs between seasons. For the latter, one suggestion is to use conjugate inverse-gamma distributions for the variances, so that sampling can be accomplished with a Gibbs sampler. Additionally, as mentioned above, one can conceivably use any numerical solver for a PDE at the process level. Future variations may consider non-zero-mean Gaussian processes for the error-correcting process, which may be more computationally costly yet perhaps more realistic. More generally, this model can be adapted to any science or engineering system that is driven by physically meaningful parameters, whose dynamics are solved by noisy numerical methods, and for which noisy, continuous data are collected with well-characterized errors.
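To make the kernel choice concrete, the sketch below evaluates a squared exponential kernel and a Matérn kernel with smoothness 3/2 on the same set of sites, and draws one realization of a zero-mean Gaussian process from each. The site coordinates, variance, length scale, and jitter are hypothetical values chosen only for illustration; the Matérn smoothness of 3/2 is likewise an assumption, since the text does not fix a particular value.

```python
import numpy as np

def pairwise_dist(x):
    """Euclidean distances between rows of an (n, dim) coordinate array."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))

def sq_exp_kernel(d, variance, length_scale):
    """Squared exponential covariance as a function of distance d."""
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def matern32_kernel(d, variance, length_scale):
    """Matern covariance with smoothness 3/2 (rougher sample paths)."""
    a = np.sqrt(3.0) * d / length_scale
    return variance * (1.0 + a) * np.exp(-a)

rng = np.random.default_rng(3)
sites = rng.uniform(0.0, 10.0, size=(25, 2))   # 25 hypothetical site coordinates
d = pairwise_dist(sites)

K_se = sq_exp_kernel(d, variance=1.0, length_scale=2.0)
K_m32 = matern32_kernel(d, variance=1.0, length_scale=2.0)

# Draw one zero-mean sample from each process; jitter stabilizes the Cholesky factor
jitter = 1e-6 * np.eye(len(sites))
z = rng.standard_normal(len(sites))
sample_se = np.linalg.cholesky(K_se + jitter) @ z
sample_m32 = np.linalg.cholesky(K_m32 + jitter) @ z
```

At a separation of one length scale, the Matérn 3/2 correlation is lower than the squared exponential correlation, which is one way to see that it imposes less smoothness on the error-correcting process.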

The mathematical details for how to do posterior computation within this model are given in Appendix B, which includes a derivation of an approximation to the log-likelihood that allows for computational efficiency. In summary, we compute the posterior of physical parameters directly on a grid since there are at most two physical parameters, and we use samples from the posterior distribution of physical parameters to generate predictions for glacier thickness in the future.
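The grid-based strategy summarized above can be sketched for a single parameter as follows. This is a simplified illustration, not the computation in Appendix B: `toy_forward` is a hypothetical stand-in for the numerical solver, the prior is assumed flat on the grid, the errors are assumed independent with unit variance, and the approximate log-likelihood derived in the appendix is not used.

```python
import numpy as np

def toy_forward(theta, t):
    """Hypothetical stand-in for a numerical solver: thickness declining linearly in time."""
    return 10.0 - theta * t

rng = np.random.default_rng(2)
t_obs = np.arange(1.0, 21.0)                     # 20 observation times
theta_true = 0.25
y = toy_forward(theta_true, t_obs) + rng.normal(0.0, 1.0, t_obs.size)

# Evaluate the unnormalized log posterior on a grid of parameter values
grid = np.linspace(0.0, 1.0, 1001)
log_prior = np.zeros_like(grid)                  # flat prior on the grid (assumption)
resid = y[None, :] - toy_forward(grid[:, None], t_obs[None, :])
log_lik = -0.5 * np.sum(resid**2, axis=1)        # Gaussian errors with unit variance
log_post = log_prior + log_lik
log_post -= log_post.max()                       # guard against underflow
post = np.exp(log_post)
post /= post.sum()                               # normalize to a probability vector

# Sample parameters from the grid posterior and push them through the forward model
theta_samples = rng.choice(grid, size=2000, p=post)
t_future = 100.0
pred = toy_forward(theta_samples, t_future) + rng.normal(0.0, 1.0, theta_samples.size)
print(np.percentile(pred, [2.5, 50.0, 97.5]))
```

A grid is practical here because the parameter space has at most two dimensions; the cost grows exponentially with the number of parameters, which is why sampling-based tools are preferred in higher dimensions.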

In

Ice viscosity posterior intervals.

Results of prediction at

Error-correcting process hyperparameters;

We make use of four analytical solutions from

Grid map used to interpret the following box plots in
Fig. 6. Eight randomly chosen grid points are selected for testing
predictions; these are not the same as the measurement locations.
Only one quadrant of the glacier is shown due to symmetry, as is done
in Figs. 9, 10, and 12 of

Thickness prediction samples 100 years from

Comparison of posterior and prior distributions of ice viscosity for test case D (i.e., mass balance field producing a periodic SIA solution).

Conditions of the simulation study have been chosen to closely
emulate the data collected at Langjökull ice cap by the University
of Iceland Institute of Earth Sciences (IES-UI).
In particular, 20 years of data are assumed, which is comparable to
the data provided by the IES-UI. Surface elevation is measured twice
a year (i.e., for summer and winter mass balance measurements) at
25 fixed measurement sites, which are geographically
distributed on the glacier in a manner that is comparable to the
real data provided by the IES-UI. Figure 4 illustrates the locations
of these measurement sites on the glacier. The surface elevation
measurements are generated by adding Gaussian noise (zero mean, unit
variance) to the analytical solutions at the spatiotemporal
coordinates of the fixed measurement sites. This unit
variance is larger than the errors produced by digital GPS
measurements. Remaining physical parameters were chosen using the
values from Table 2 in
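The data-generating procedure for the simulation study can be sketched as follows: evaluate a known thickness field at 25 fixed sites twice a year for 20 years, then add zero-mean, unit-variance Gaussian noise. The function `toy_thickness` is a hypothetical placeholder rather than one of the analytical SIA solutions, and the site coordinates are random rather than the layout shown in Fig. 4.

```python
import numpy as np

def toy_thickness(x, y, t):
    """Hypothetical dome-shaped thickness field (placeholder, not an SIA solution)."""
    r2 = x**2 + y**2
    dome = np.maximum(0.0, 1000.0 * (1.0 - r2 / 50.0**2))
    return dome * (1.0 - 0.001 * t)              # slow thinning over time (assumption)

rng = np.random.default_rng(5)
sites = rng.uniform(-30.0, 30.0, size=(25, 2))   # 25 fixed measurement sites
times = np.linspace(0.5, 20.0, 40)               # two measurements per year for 20 years

# Noise-free field at every (time, site) pair; result has shape (40, 25)
truth = toy_thickness(sites[None, :, 0], sites[None, :, 1], times[:, None])

# Observations: add zero-mean, unit-variance Gaussian measurement noise
obs = truth + rng.normal(0.0, 1.0, size=truth.shape)
```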

A comparison of posteriors under strong and weak prior information for the error-correcting process in test case D (i.e., mass balance field producing a periodic SIA solution); prior information for the error-correcting process leads to a less biased posterior, though with slightly more posterior uncertainty.

A comparison of posteriors in test case D (i.e., mass balance field producing a periodic SIA solution) under different sampling periods: data sampled once every 10 years, every 5 years, once a year, and twice a year. The general trend is that the posterior tends to become less biased as the period of sampling decreases, although the posterior becomes more diffuse. The University of Iceland Institute of Earth Sciences Glaciology Team takes measurements twice a year for summer and winter mass balance measurements.

An illustration comparing the expected variability of the error-correcting process (as per the Bayesian hierarchical model) to the observed variability of residuals at the interior, margin, and dome for test case B (i.e., no mass balance field or basal sliding). These residuals are the differences between the observed data and the numerical solution.

Validation and diagnostics of the BHM were achieved through a
combination of an assessment of posterior probability intervals, a
test of the predictive error of thickness values 100 years from the
initial time point

Table 2 contains posterior credibility intervals for ice viscosity
in test cases B–D. A 3 SD credibility interval was computed with
mean

To investigate the frequentist properties of the posterior probability distribution for ice viscosity (i.e., its performance under repeated sampling of data), 500 simulations were completed under repeated sampling of the surface elevation data at the 25 fixed measurement sites for test cases B–D. The coverage of ice viscosity for a 3 SD interval was computed for each of the simulations, where coverage for a given interval is binary; either the actual parameter value is in the interval or it is not. For test case B, in 499 of 500 simulations the 3 SD credibility interval covered the actual value of ice viscosity. In test cases C and D, the 3 SD credibility interval covered the actual value of ice viscosity in all of the simulations. This suggests that the frequentist coverage probability of the credibility interval is at least 99 %.
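The repeated-sampling check described above can be sketched for a toy model in which the posterior is available in closed form. The setup is an assumption for illustration only: a single unknown mean with known observation standard deviation and a flat prior, so that the posterior is normal with mean equal to the sample mean. It does not reproduce the SIA-based simulations.

```python
import numpy as np

rng = np.random.default_rng(4)
theta_true, sigma, n_obs, n_sims = 5.0, 1.0, 25, 500

covered = 0
for _ in range(n_sims):
    # Draw a fresh data set under the true parameter value
    y = rng.normal(theta_true, sigma, n_obs)
    # Posterior under a flat prior: N(sample mean, sigma^2 / n_obs)
    post_mean = y.mean()
    post_sd = sigma / np.sqrt(n_obs)
    lower, upper = post_mean - 3.0 * post_sd, post_mean + 3.0 * post_sd
    # Coverage is binary for each simulated data set
    covered += int(lower <= theta_true <= upper)

coverage = covered / n_sims
print(coverage)
```

For this toy model the nominal coverage of a 3 SD interval is about 99.7 %, so the empirical coverage over 500 repetitions should be close to that value.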

For test case E, one assumes that the field is described by
parameterized Eq. (16) of

While the credibility intervals achieved coverage of the actual
values of the parameters, we noticed that the posterior
distributions for physical parameters and predictions are biased.

To assess how the posterior distribution for ice viscosity evolves
under different sampling plans of the data, we conducted a series of
simulations in test case D under varying sampling periods. In
particular, we considered data sampled once every 10 years, once
every 5 years, once a year, and twice a year; the resulting
posteriors for ice viscosity are in Fig. 9. The general pattern is
that the bias of the posterior distributions decreases as the
sampling period gets shorter, although the posterior becomes more diffuse. The
result that some posterior uncertainty does not go away with more
collected data is also consistent with the results in

To assess the accumulating error-correcting process model, we
estimated the marginal variances of the error-correcting process for
each of the time points with observed data in test case B, by
examining the residuals formed by the difference between the
numerical solver and the observed data. According to the model, the
standard deviation of these residuals at the interior of the glacier
should grow as

The primary contribution of this work has been to construct a BHM
for glacier flow based on the SIA that successfully models numerical
errors induced by a numerical solver, which accumulate with time and
vary spatially. This BHM leads to full posterior probability
distributions for physical parameters as well as a principled method
for making predictions that takes into account both numerical errors
and uncertainty in key physical parameters. Furthermore, the BHM
operates in two spatial dimensions and time, which, to our
knowledge, is new to the field of glaciology. An additional
contribution is the derivation of a novel finite difference method
for solving the SIA. When tested using simulated data sets based on
analytical solutions to the SIA from

R scripts and R data files are included in the Supplement so that the simulations in this paper can be rerun.

Here a finite difference scheme is derived for the SIA PDE. The
overarching strategy in developing this finite difference scheme
is to take a second-order Taylor expansion for

In the following derivations, note that the subscripts mean
“derivative with respect to” (e.g.,

Now we solve for these derivatives in terms of spatial derivatives
in

Note that

In the following subsections, we go through the key details
regarding Bayesian computation for the model used in this work.
Assume

In this subsection, we derive both the likelihood of the observed
data:

Though Sect. 2.2 specifies the BHM in greater detail, the process
and data levels of the BHM (i.e., conditioning on

Assume

Conditional on

Therefore, the covariance matrix for the observed data can be
written as

The joint distribution

Posterior inference is accomplished with grid sampling

The minimum is (5.25, 5.00, 6.00).

The first quartile is (23.8, 23.5, 24.0).

The median is (27.0, 26.5, 27.0).

The mean is (27.1, 26.7, 27.1).

The third quartile is (30.5, 30.0, 30.0).

The maximum is (51.50, 49.0, 51.0).

In this section, we give details for how to make predictions under
the proposed Bayesian model. Denote

We must then determine how to sample from the distribution of

The supplement related to this article is available online at:

All of the glaciologists contributed equally to this work.

The authors declare that they have no conflict of interest.

The Icelandic Research Fund (RANNIS) is thanked for funding this research.

Edited by: Eric Larour
Reviewed by: Lambert Caron and one anonymous referee