A Bayesian Hierarchical Model for Glacial Dynamics Based on the Shallow Ice Approximation and its Evaluation Using Analytical Solutions

Bayesian hierarchical modeling can assist the study of glacial dynamics and ice flow properties. This approach will allow glaciologists to make fully probabilistic predictions for the thickness of a glacier at unobserved spatio-temporal coordinates, and it will also allow for the derivation of posterior probability distributions for key physical parameters such as ice viscosity and basal sliding. The goal of this paper is to develop a proof of concept for a Bayesian hierarchical model constructed, which uses exact analytical solutions for the shallow ice approximation (SIA) introduced by Bueler et al. (2005). 5 A suite of test simulations utilizing these exact solutions suggests that this approach is able to adequately model numerical errors and produce useful physical parameter posterior distributions and predictions. A byproduct of the development of the Bayesian hierarchical model is the derivation of a novel finite difference method for solving the SIA partial differential equation (PDE). An additional novelty of this work is the correction of numerical errors induced through a numerical solution using a statistical model. This error correcting process models numerical errors that accumulate forward in time and spatial variation 10 of numerical errors between the dome, interior, and margin of a glacier.


Introduction
The shallow ice approximation (SIA) is a nonlinear partial differential equation (PDE) that describes ice flow when glacier thickness is relatively small compared to the horizontal dimensions.Derived from the principle of mass conser-vation, the SIA PDE depends on two key physical parameters: ice viscosity and basal sliding (sometimes described as basal friction or drag).The primary objective of this paper is to develop a Bayesian hierarchical model (BHM) for glacier flow utilizing the framework espoused by Wikle (2016) and Cressie and Wikle (2015), which allows one to (1) infer ice viscosity and basal sliding parameters and (2) make probabilistic predictions for glacial thickness at unobserved spatiotemporal coordinates.This BHM relies upon a finite difference scheme for solving the SIA that is inspired by the Lax-Wendroff method (Hudson, 1998).To validate this BHM, we utilize exact analytical solutions from Bueler et al. (2005).Hence, in addition to the development of a BHM for shallow glaciers, this paper serves as a case study for the strategy of using exact analytical solutions to validate or tune BHMs governed by physical dynamics.Moreover, the BHM developed can be applied to the general "physicalstatistical" problem (Berliner, 2003).This BHM is verified and diagnosed through a combination of assessments of posterior probability intervals, checks of predictive accuracy for glacial thickness prediction, and a comparison between observed and expected errors due to the numerical solution of the SIA.

An overview of Bayesian modeling and BHMs
Before describing how BHMs are used in physical-statistical models, particularly for geophysical problems, a very brief overview of Bayesian modeling and Bayesian hierarchical modeling is given for the uninitiated reader.A main component of Bayesian statistics is the use of probability distri- butions to model parameters thought to be fixed quantities (i.e., scientific constants); this assumption allows one to use rules of conditional probability (i.e., Bayes' theorem) to derive probability distributions for scientific quantities of interest, such as physical constants or predictions of future quantities of a system being studied.Typically, the major assumptions required as input to the analysis are prior distributions for parameters as well as a probabilistic model for the data.
The output is a probability distribution for parameters or predictions conditional on data that have been collected or observed; canonically, this is referred to as the posterior distribution.
A BHM is a Bayesian model in which the probabilistic model for data is specified in a hierarchy.Working with such a hierarchy has a number of advantages -it is usually easier to conceptualize the probabilistic model for the data, and it is also easier to model various parts of a system of interest modularly instead of all at once.Such an approach is conducive to the construction of a probabilistic model that tightly corresponds to a scientific system of interest, which is naturally thought of in separate components or modules.In a BHM, the rules of conditional probability can be used to specify the relevant distributions.For example, let us consider a mock system that has parameter vector θ , an intermediate unobserved vector S, and observations Y .θ might be statistical or physical parameters, S could be a quantity of scientific interest, and Y could be noisy observations of S. A schematic for such a model is given in Fig. 1, and the joint probability distribution is p(Y , S, θ ) = p(θ )p(S|θ )p(Y |S, θ ). (1) The distribution p(θ ) represents prior beliefs about parameters before data are collected, while p(S|θ ) represents prior knowledge or assumptions for how S is generated given parameters.For instance, this prior knowledge could entail clustering or some dependence between the elements of S. The process that models Y conditional on S and θ is p(Y |S, θ ).The posterior distribution of scientific quantities of interest, p(θ, S|Y ), is proportional to p(Y , S, θ) by Bayes' theorem.Estimates and assessments of uncertainty of scien-tific parameters and quantities can be extracted from the posterior distribution.

An overview of physical-statistical modeling with BHMs
The case for applying Bayesian hierarchical modeling and methodology in geophysics is strongly made by Berliner (2003), which he describes as "physical-statistical modeling".Particularly, employing the Bayesian hierarchical approach has the primary advantage of incorporating all relevant sources of uncertainty and randomness into one coherent probabilistic framework.The sources typically modeled together are (1) measurement errors in the data collection process; (2) lack of full knowledge of the precise functional form of the underlying physical equations describing the physical phenomenon being modeled, or else simplification of the physical system description; (3) numerical errors induced while approximating the solution to a system of PDEs; and (4) lack of precise knowledge of fundamental parameters (constants) in the underlying PDEs describing said phenomenon.In the Bayesian hierarchical framework (Berliner, 1996;Wikle, 2016;Cressie and Wikle, 2015) each of these sources of uncertainty is modeled by conditioning on the appropriate quantities, and inference is performed by sampling from or approximating the posterior distribution (the distribution of the unknown quantities of interest conditional on the observed data).At the highest level of a BHM, prior probability distributions are laid out for the physical parameters of interest.At the intermediary level, a probability distribution for the physical process of interest is laid out conditional on the parameters, which is typically motivated by a numerical scheme for solving PDEs.In particular, this level may be modeled as the sum of the output from a numerical solver and an errorcorrecting process.Finally, at the observed level, a probability distribution is set forth for the observed data conditional on the latent physical process and other relevant measurement parameters, which include variances of measuring procedures or devices.The product of these probability distributions specifies the joint distribution of all relevant quan- tities, which is proportional to the posterior distribution by the definition of conditional probability.While a traditional analysis may handle each of these disparate sources of uncertainty in an ad hoc and disjointed fashion, the Bayesian hierarchical approach leverages probability measures to cohesively model major sources of uncertainty and undertake inference in a principled manner.Figure 2 diagrams what a prototypical physical-statistical Bayesian hierarchical model might look like.
While the BHM approach to physical-statistical problems offers many advantages, it is not an infallible approach.In particular, while constructing a BHM may be straightforward, actually fitting a BHM to data can be computationally difficult.In the analysis that follows, there are only one to two physical parameters and the likelihood function is tractable, so posterior computation is not difficult.In more complex scenarios with many physical parameters (e.g., a basal sliding field with a parameter for each grid point), it becomes more difficult to compute the posterior or draw samples from it.There are now many new tools, however, for Bayesian inference of complicated and high dimensional posterior distributions, such as Stan (Stan Development Team, 2018) and INLA (Rue et al., 2017).Another potential difficulty in using BHMs for physical-statistical problems is that solving for a set of dynamical equations with a numerical method can be computationally onerous, generally speaking; while this is not a detriment in the work that follows, this can be a problem for posterior computation.One way to circumvent this issue is to emulate a numerical solver, using techniques as in Hooten et al. (2011).Another methodology that can be used to efficiently solve PDEs using Bayesian numerical analysis comes from Owhadi and Scovel (2017).Finally, Calderhead et al. (2008) suggest methodology to avoid explicitly solving ordinary differential equations by using Gaussian processes.
To put the contributions of this work into context, we briefly review glaciology papers that have employed Bayesian modeling.In Berliner et al. (2008), a Bayesian hierarchical approach is used to model ice streams in one spatial dimension, and an error-correcting process is utilized to account for a simplification in the physical model.A combination of Markov chain Monte Carlo and empirical Bayes methodology is used to fit the model, and basal shear stress and resistive stresses are included.Furthermore, wavelets are used for dimensionality reduction purposes in order to make the computations more feasible.In Pralong and Gudmundsson (2011), a Bayesian model is constructed for an ice stream where the likelihood and prior are Gaussian.The observed data are surface topography, horizontal and vertical surface velocities, and the latent system state is basal topography and slipperiness.The goal is to infer the system state given the observed data, and ultimately a maximum a posteriori point estimate is used for inference in conjunction with an iterative method for posterior maximization.Physics is incorporated by solving for the steady state solution with a finite element method (FEM) solver, given the system state.In Brinkerhoff et al. (2016) a flowline model of the SIA is considered with vertically integrated velocities.Gaussian process priors are used for all unknowns, and the Metropolis-Hastings algorithm is used to fit the model.The approach yields convincing results in simulations and a real data set.In Isaac et al. (2015), numerical methods are presented for solving a nonlinear Stokes equation boundary value problem for an ice sheet in Antarctica.The method ultimately uses a low rank approximation to a covariance matrix for the posterior distribution of a basal parameter field.Finally, and perhaps most directly related to this research, in Minchew et al. (2015) interferometric synthetic aperture radar (InSAR) is used to determine velocity fields at Langjökull and Hofsjökull in early June 2012.The velocity directions match the surface gradient, but magnitudes do not appear to coincide with the theoretical predictions of other authors (likely due to the inappropriate modeling of basal sliding).
The same approach within this work can be used for non-SIA problems in cryosphere science, and the Bayesian hierarchical model does not necessitate analytical solutions; the analytical solutions are used for the evaluation of the particular BHM in the paper based upon the SIA.However, in general, the biggest difficulty will be in developing a statistical error-correcting process that appropriately models numerical errors for an arbitrary scenario, where a numerical solver for a different set of dynamical equations is used.In the SIA context, we can rely on prior studies of Bueler et al. (2005) to tell us something about how the numerical errors will look in the SIA case -i.e., spatial variation in the scale of numerical errors between the dome, interior, and margin.This error pattern will not hold in general for other geometries and systems, and so different prior studies must be utilized or, if these do not exist, the hierarchical model must be extended to include a more general model for the error-correcting process (e.g., a spatially varying field for the log of the scale of numerical errors with a Gaussian process prior).
The main differentiating contribution of this paper is to utilize the exact analytical solutions from Bueler et al. (2005) to evaluate the BHM employed.An additional novelty is the derivation and utilization of a novel finite difference method for solving the SIA PDE that operates in two spatial dimensions; consequently, the Bayesian model employed also operates in two spatial dimensions, in addition to time.Finally, we explicitly model the errors due to a numerical solver with a spatiotemporal statistical process, which accounts for different scales of spatial variability within the dome, within the interior, and within the margin of the glacier, as well as accumulation of numerical errors forward in time.

Shallow ice approximation
The physics of glaciers is an extensive topic; hence, only the portions which are most relevant to this paper are described.The reader is pointed to the comprehensive works by Cuffey and Paterson (2010) and van der Veen (2017) for further reading on the subject.PDEs for glaciers are derived from the following considerations.First, glaciers are modeled as very slowly moving and viscous fluids.By applying the principle of mass conservation, the net ice flux moving in or out of an infinitesimal column of the glacier located at some spatial coordinate, plus the net mass change due to precipitation or melting, yields the change in the height of the column over an infinitesimal time interval.Such a heuristic argument provides a PDE in two dimensions for a glacier, with averaged velocities in two spatial dimensions.The PDE relates the time derivative of the thickness of the glacier to the flux and net mass change (i.e., mass balance).The main assumptions are that ice is isotropic and homogeneous and also that longitudinal and transverse stress terms can be ignored, which is reasonable when the overall thickness of the glacier is small in comparison to its width.Under these assumptions, the velocity of the ice is made up of two additive components.The first component of the velocity is based upon deformation due to gravity, which acts in the direction of steepest descent of the surface and is a function of the ice viscosity parameter.The second component of velocity also acts along the gradient of the glacier surface and is a function of the basal sliding parameter field.The formulations stem from Glen's flow law (Glen, 1955(Glen, , 1958) and Weertman's sliding relation (Weertman, 1964).
Written in terms of glacial thickness, H (x, y, t), the SIA PDE is Here H (x, y, t) is the thickness of the glacier at spatial coordinate (x, y) and time t, u is the average velocity in the x direction, and v is the average velocity in the y direction.This model is vertically integrated, and hence only two spatial dimensions are modeled.R(x, y, t) is the bedrock elevation, which is assumed to be constant in time and thus can be written as R(x, y); ḃ(x, y, t) is the mass balance field, B and C 0 γ are physical parameters governing the viscosity and basal sliding; ρ governs the mass density of the ice; and finally n is Glen's flow law constant, typically set to 3. Initial conditions (i.e., H (x, y, 0)) are assumed to be given, and the boundary condition H ≥ 0 is assumed, just as in Table 2 of Bueler et al. (2005).Additional derivations and details on the SIA are covered in a variety of sources, including Fowler and Larson (1978), Hutter (1982), Hutter (1983), andFlowers et al. (2005).
It is important to make explicit that there are some limitations of this PDE.Besides ignoring longitudinal and transverse stress terms, the PDE does not model subglacial hydrology, tunneling systems, jökulhlaups, or surges, the dynamics of which are believed to contribute to dynamics of glaciers as a whole.Nonetheless, one hopes these equations may serve as a first approximation for shallow glacier dynamics.In addition to dynamics, another important physical consideration of glaciers is the relationship between temperature and viscosity, which follows an Arrhenius relationship (Cuffey and Paterson, 2010).However, in the context of Icelandic glaciers like Langjökull, this is not consequential since they are temperate (i.e., their temperature is at melting point).

Bayesian hierarchical model
In this section, we provide an overview and setup of the BHM employed in addition to notation for the key parameters, both statistical and physical.The reader is referred, however, to a schematic illustrating the BHM in Fig. 3.We use index i to refer to spatial coordinates (for this model space is assumed to be discretized into squares) and index j to refer to time coordinates.Furthermore, the notation S .,jrefers to the surface elevation at all spatial coordinates for a particular time index j .Keeping in line with the Bayesian hierarchical modeling framework from Wikle (2016) and Cressie and Wikle (2015), we delineate the models used for the data level, process level, and parameter level.The primary inferential goals are to infer physical process parameters (i.e., ice viscosity and basal sliding) and to predict the height of the glacier at various time points and spatial locations besides those that have been observed (aligned to a grid for which we have bedrock and initial surface height conditions).Within the Bayesian framework, all inferential goals may be achieved by determining the posterior distribution of these quantities (i.e., their probability distributions conditioned on observed data).At the data level, the observed height for each grid point is modeled with a normal distribution (denoted with the notation N (µ,τ 2 ), where µ is the mean and τ 2 is the variance), where the mean is the physical process value, and the variance is assumed to be known.In particular it is assumed that , where Y ij is the observed surface elevation of the glacier at location i and time index j , S ij is the latent (i.e., unobserved) surface elevation at location i and time index j (equivalent to sum of the glacier thickness and bedrock level), and σ 2 is the variance of the measurement errors for the surface height observations, a fixed and known quantity.The number of observed spatial indices is assumed to be much smaller than the number of total spatial indices modeled at the latent level.
At the process level, S .,j= f (S 0 , B, ḃ, C 0 γ , j ) + X j , where f is a numerical solution to the SIA at time index j , and X j is an error-correcting process at time index j .A finite difference version of the SIA PDE is described in full detail in Appendix A. In principle, however, the function f may be derived from other numerical solvers.Additionally, it should be made clear that f is the output of a numerical solver for the underlying dynamics.Also, S 0 denotes the glacier surface elevation values at the initial time point, which are assumed to be known; e.g., with high precision light detection and ranging (lidar) initial conditions provided by the Institute of Earth Sciences at the University of Iceland (IES-UI).ḃ.,j is the mass balance field for time index j at all the grid points, which is assumed to be fixed and known for the purpose of this analysis.B is the ice viscosity parameter and C 0 γ is the basal sliding field, which itself is parameterized with µ max as in Eq. ( 16) of Bueler et al. (2005) and, furthermore, is static in time.For compact notation, θ is used to refer to B in test cases B-D and (B, µ max ) jointly in test case E.
Since we believe numerical errors will accumulate over time (Bueler et al., 2005), we define the error-correcting process as follows: X j +1 = X j + j +1 , where j +1 is MVN(0, ).(MVN stands for multivariate normal, and the first argument is the mean and the second is the covariance.) is block diagonal, with three blocks for indices corresponding to the margin, interior, and dome of the glacier (the margin is defined as the last grid squares before the glacier drops to 0 thickness, and the dome is the origin grid square), respectively.Each block is defined from a squareexponential kernel with the same length scale, denoted by φ, but distinct marginal variances, σ 2 interior , σ 2 margin , and σ 2 dome .The motivation for using different marginal variance parameters is to account for the widely different errors exhibited at the dome, interior, and margin, as is demonstrated by Bueler et al. (2005) and Jarosch et al. (2013).This error-correcting process leads to a tractable likelihood function, as is shown in Appendix B.
Finally, at the parameter level, B and µ max are endowed with truncated normal distributions as priors.B has a normal prior with mean 3.5×10 −24 and standard deviation 3×10  for ice viscosity and Pa −1 m s −1 for basal sliding.)The prior supports for B and µ max provide plausible values for temperate ice caps.
It is prudent to discuss the motivations and justifications of the various modeling choices employed in the model previously delineated.The data level is assumed to have independent normal errors with fixed variance; this is justified because of the uniformity of the measuring technology used from site to site (e.g., digital GPS) and symmetry of errors.In contrast, the precise functional form of the data level is chosen somewhat arbitrarily as a Gaussian, which affords one analytical convenience.Similarly, the error-correcting process at the process level uses a zero mean Gaussian process with a parameterized covariance kernel (e.g., square exponential), mostly as an analytically manageable way to induce spatial correlation in the error-correcting process.Spa-tial correlation in numerical errors has been demonstrated, for example, in Bueler et al. (2005).
Moreover, it is appropriate to consider potential variations of this model for slightly different scenarios; naturally, these could fall into alternate choices of covariance kernel at the process level (e.g., Matérn, to allow for a less smooth errorcorrecting process) and varying errors at the data level, for example to account for compaction or densification that occurs between seasons.For the latter, a suggestion is to use conjugate inverse-gamma distributions for the variances, so that sampling can be accomplished with a Gibbs sampler.Additionally, as aforementioned, one can conceivably use any numerical solver for a PDE at the process level.Future variations may consider utilizing non-zero mean Gaussian processes for the error-correcting process, which may be more computationally costly yet perhaps more realistic.Gen- erally, this model can be adapted to any science or engineering system that is driven by physically meaningful parameters, whose dynamics are solved by noisy numerical methods, and for which noisy and continuous data are collected with well-probed errors.
The mathematical details for how to do posterior computation within this model are given in Appendix B, which includes a derivation of an approximation to the log-likelihood that allows for computational efficiency.In summary, we compute the posterior of physical parameters directly on a grid since there are at most two physical parameters, and we use samples from the posterior distribution of physical parameters to generate predictions for glacier thickness in the future.
3 Experiments to assess the Bayesian hierarchical model

Analytical solutions
In Bueler et al. (2005), analytical solutions to the SIA are presented as benchmarks for numerical solvers of the SIA.As opposed to using other benchmarks such as the EISMINT experiment (Payne et al., 2000), which itself is based on numerical modeling and hence subject to numerical errors, the benchmark solutions provided in this work can be treated as ground truth to compare to.(This is in the sense that these are exact solutions of the SIA, but it must be stressed that the SIA is an approximation of the true physical dynamics governing a glacier.)These analytical solutions serve as a basis  for simulating data sets to validate the Bayesian hierarchical approaches developed in this paper.In other words, the exact analytical solutions provide the latent process in the BHM, conditioning on given initial conditions and mass balance functions.Hence to simulate data from the BHM, one can bypass the need to numerically solve the PDE and introduce errors.We make use of four analytical solutions from Bueler et al. (2005) that are summarized here, but the reader is referred to the original paper for the exact mathematical formulation and derivation of these analytical solutions.All of the analytical solutions assume a flat bedrock.Test case B includes no mass balance or basal sliding, and, consequently, the motion of the glacier is only attributable to deformation due to gravity.Test case C makes use of a mass balance field that is inversely proportional to time and directly proportional to thickness, but there is no basal sliding field modeled.Similarly, test case D utilizes a mass balance field with no basal sliding field modeled.In distinction from test case C, how- ever, the mass balance field of test case D is such that the overall solution for glacial thickness is periodic in time.Finally, in contrast to the other tests, test case E has a spatially varying basal sliding field, yet the overall solution is static in time.Note that test A was not utilized in this study because it is a steady state solution without a varying mass balance or basal sliding field.

Simulation study test details
Conditions of the simulation study have been chosen to closely emulate the data collected at Langjökull ice cap by the IES-UI.In particular, 20 years of data are assumed, which is comparable to data provided by the IES. 25 fixed measurement sites are used for bi-annual surface elevation measurements, which are geographically distributed on the glacier in a manner that is comparable to the real data provided by the IES-UI.Figure 4 illustrates the locations of these measurement sites on the glacier.Surface elevation measurements for these sites are taken twice a year (i.e., for summer and winter mass balance measurements).The surface elevation measurements are generated by adding Gaussian noise (zero mean, unit variance) to the analytical solutions at the spatiotemporal coordinates of the fixed measurement sites.The  (b) shows predictions that are between the dome and margin of the glacier (blue squares of Fig. 5), and panel (c) shows predictions that are towards the dome of the glacier (black squares of Fig. 5).Refer to Fig. 5 for a grid map to spatially reference each of the box plots.As can be expected according to Bueler et al. (2005), the largest errors occur at the dome and the margin.Note on interpretation: the middle of each box is the median, the interquartile range is denoted by the box, and 1.5 of the interquartile range beyond the first and third quartile is illustrated with the whiskers.Those points that are more than 1.5 of the interquartile range beyond the first and third quartiles are outliers, and they are denoted with circles.
choice of unit variance is larger than the errors produced by digital GPS measurements.Remaining physical parameters were chosen using the values from Table 2 in Bueler et al. (2005) to allow for comparisons to this work and the EIS-MINT I experiment (Payne et al., 2000).

Results
Validation and diagnostics of the BHM were achieved through a combination of an assessment of posterior probability intervals, a test of the predictive error of thickness values 100 years from the initial time point t 0 , and a comparison between observed and expected values for numerical errors based on the error-correcting process utilized.As is discussed in more detail below, these assessments suggest that the BHM is useful for inference of posterior probability distributions for physical parameters, prediction of future glacial thickness values on the order of 100 years, and the modeling of numerical errors at the margin, interior, and dome of the glacier.Table 2 contains posterior credibility intervals for ice viscosity in test cases B-D.A 3 SD credibility interval was computed with mean ±3 standard deviations of the posterior samples.In all of these test cases, the 3 SD credibility interval covers the actual ice viscosity.Furthermore, as is apparent in Table 3, the predictive error, relative to thickness values on the order of a kilometer, appears be small overall, particularly at the interior; predictive error is the root mean squared difference between predictions and the exact analytical values for each of the test cases.Note that test E was not included with the predictive checks since it is static in time.Consistent with Bueler et al. (2005) and Jarosch et al. (2013), however, errors are greatest at the margin and dome of the glacier (evident in Fig. 6).Nonetheless, the predictive distributions cover the actual thicknesses even at these extremes.This illustrates the utility of the BHM for accounting for errors induced by the numerical solution of the SIA.Additionally, an illustration comparing the posterior and prior distributions for test case D is shown in Fig. 7.
To investigate the frequentist properties of the posterior probability distribution for ice viscosity (i.e., its performance under repeated sampling of data), 500 simulations were completed under repeated sampling of the surface elevation data at the 25 fixed measurement sites for test cases B-D.The coverage of ice viscosity for a 3 SD interval was computed for each of the simulations, where coverage for a given in-Figure 8.A comparison of posteriors under strong and weak prior information for the error-correcting process in test case D (i.e., mass balance field producing a periodic SIA solution); prior information for the error-correcting process leads to a less biased posterior, though with slightly more posterior uncertainty.
terval is binary; either the actual parameter value is in the interval or it is not.For test case B, in 499 of 500 simulations the 3 SD credibility interval covered the actual value of ice viscosity.In test cases C and D, the 3 SD credibility interval covered the actual value of ice viscosity in all of the simulations.This suggests that the frequentist coverage probability of the credibility interval is at least 99 %.
For test case E, one assumes that the field is described by parameterized Eq. ( 16) of Bueler et al. (2005).That is, in polar coordinates with radius r and angle , for θ 1 < < θ 2 and r 1 < r < r 2 , and C 0 γ = 0 otherwise.In addition to ice viscosity, the inferential object of interest is the scale parameter µ max .The 3 SD posterior cred- an error-correcting process).To demonstrate that this also holds in the BHM presented in this paper, we consider the following comparison.To assign prior information to the error-correcting process, we consider a discrete parameter set for σ 2 interior , σ 2 margin , and σ 2 dome : {0.1, 1, 10, 100} in units of square meters, which corresponds to different orders of magnitude for variability.In one case, we ignore prior information from Bueler et al. (2005) and put equal probability mass on the parameter space for these parameters.In the second case, we encode more realistic prior information into the scales of errors at the three regions: equal mass on 10 and 100 at the margin, equal mass on 0.1 and 1 at the interior, and equal mass at 1 and 10 at the dome (all units are square meters).In both cases, the parameter φ is fixed at 70 km to place emphasis on the scales of error.The results of inferring the posterior distribution for ice viscosity B are shown in Fig. 8. Consistent with Brynjarsdóttir and O'Hagan (2014), the posterior distribution of the physical parameter B is much less biased when prior information is encoded into the errorcorrecting process.
To assess how the posterior distribution for ice viscosity evolves under different sampling plans of the data, we conducted a series of simulations in test case D under varying sampling periods.In particular, we considered data samples once every 10 years, once every 5 years, once a year, and twice a year; the resulting posteriors for ice viscosity are in Fig. 9.The general pattern is that the bias of the posterior distributions reduces as the period gets shorter, although the posterior becomes more diffuse.The result that some posterior uncertainty does not go away with more collected data is also consistent with the results in Brynjarsdóttir and O'Hagan (2014).The particular period we chose in this analysis (data collected twice a year) was meant to model how the UI-IES Glaciology Team collects data, that is, twice a year due to summer and winter mass balance measurements.
To assess the accumulating error-correcting process model, we estimated the marginal variances of the error- correcting process for each of the time points with observed data in test case B, by examining the residuals formed by the difference between the numerical solver and the observed data.According to the model, the standard deviation of these residuals at the interior of the glacier should grow as σ 2 + tσ 2 interior , where t is the number of time steps (and likewise at the dome and margin).Figure 10 shows a match between observed and expected in this regard, and, in particular, the 99 % confidence bands appear to cover the expected variability as time progresses.Also apparent from this figure is that, as time progresses, the errors at the margin, dome, and interior contribute more error than measurement error, which is on the order of 1 m.Moreover, this is also evident in Table 4, since after 200 time steps from t 0 (i.e., 20 years), the marginal variances will be 200σ 2 interior , 200σ 2 margin , and 200σ 2 dome based on the accumulating errors model; all of these values exceed 1, the measurement variance.

Summary, discussion, and future work
The primary contribution of this work has been to construct a BHM for glacier flow based on the SIA that operates in two spatial dimensions and time, which successfully mod-els numerical errors induced by a numerical solver that accumulate with time and vary spatially.This BHM leads to full posterior probability distributions for physical parameters as well as a principled method for making predictions that takes into account both numerical errors and uncertainty in key physical parameters.Furthermore, the BHM operates in two spatial dimensions and time, which, to our knowledge, is new to the field of glaciology.An additional contribution is the derivation of a novel finite difference method for solving the SIA.When tested using simulated data sets based on analytical solutions to the SIA from Bueler et al. (2005), the results herein indicate that our approach is able to infer meaningful probability distributions for glacial parameters, and, furthermore, this approach makes probabilistic predictions for glacial thickness that adequately account for the error induced by using a numerical solver of the SIA.A future goal is to create an R package for fitting a generalized version of the model used within this work, where the function f (.) is provided by the user.This will allow glaciologists to extend the modeling approach we have developed to other similar scenarios in which the physical dynamics are more complex than the SIA.An additional scenario for which this package can be useful is when the numerical method is not a finite difference method, e.g., a FEM.To this end, we will at-Appendix A: Finite difference method for the shallow ice approximation Here a finite difference scheme is derived for the SIA PDE.The overarching strategy in developing this finite discretization scheme is to take a second-order Taylor expansion for H (x, y, t) with x, y fixed, and then equate the resultant time derivatives, H t and H tt , to functions of spatial derivatives by using the original SIA PDE.That is, one starts with the approximation H (x, y, t + t) ≈ H (x, y, t) + H t (x, y, t) t + H tt (x, y, t) t 2 /2 and uses the first equation of Sect. 2 to write H t and H tt in terms of spatial derivatives.Finally, central differences in space are substituted for the spatial derivatives.This finite difference scheme is motivated by the Lax-Wendroff (Hudson, 1998) method, which is generally better than finite difference methods that use only a single-order Taylor expansion (indeed, in the advection-diffusion equation such methods may be unconditionally unstable).
In the following derivations, note that the subscripts mean "derivative with respect to" (e.g., H t means derivative of H with respect to t): Now we solve for these derivatives in terms of spatial derivatives in H (x, y, t), the glacier thickness, and R(x, y), the bedrock level.The derivation makes repeated use of the differentiation rule for products, the chain rule for differentiation, and equality of partial derivatives.(e.g., H xt = H tx ).
By symmetry in x and y, −[vH ] y can be analogously derived: In this subsection, we derive both the likelihood of the observed data: p(Y k , . .., Y T |θ) and an approximation to the likelihood.Though Sect.2.2 specifies the BHM in greater detail, the process and data levels of the BHM (i.e., conditioning on θ) are concisely written as follows.
Assume j ∈ 1, 2, . ..T and c ∈ 1, 2, . ..N ; hence there are N total spatial vectors observed with a period of length k.Furthermore, X 1 is marginally MVN(0, ).That is, the process level vectors, S j , are modeled conditional on the parameter level and the error-correcting process.The data level vectors, Y ck , are generated conditional on the process level S ck .Throughout the following, we condition on θ being fixed.

B1.1 The exact likelihood
Conditional on θ , the distribution of (Y k , . .., Y T ), viewed as one long random vector, is multivariate normal.Also, conditional on θ , the mean of (Y k , . .., Y T ) is (Af (θ , k), . ..Af (θ , T )) because both (X k , . .., X T ) and Therefore, the covariance matrix for the observed data can be written as M ⊗ (A A ) + σ 2 I , where M ij = k min (i, j ) and M ∈ R N×N .This is a useful matrix representation because the inverse of M is band-limited and sparse, for which there exist efficient computationally efficient linear algebraic routines (Rue, 2001).

B2 Posterior computation
Posterior inference is accomplished with grid sampling (Gelman et al., 2013); this approach directly computes the posterior distribution, p(θ |Y k , . .., Y T ) of the parameter, proportional to p(Y k , . .., Y T |θ )p(θ ), on a grid of plausible values.The likelihood is derived in the previous subsection.Parameters for the error-correcting process are selected using knowledge elicited from the studies of Bueler et al. (2005).To verify the sensitivity of grid sampling to the grid width, three grid widths for B are considered: 0.25, 0.50, and 1, and the grid's range is from [1,70] (all in units of 10 −25 s −1 Pa −3 ).The summary statistics for generating 10 6 posterior samples from more to less fine (0.25, 0.50, 1) are given below.
The similarity of summary statistics across grid widths indicates that the posterior samples are not very sensitive to grid width; a grid width of 0.50 was used for the analyses within.Moreover, the posterior samples in this check were generated for test case D (i.e., mass balance field producing a periodic solution to the SIA).

B3 Making spatiotemporal predictions of glacial surface elevation
In this section, we give details for how to make predictions under the proposed Bayesian model.Denote S T end ∈

Figure 1 .
Figure 1.Schematic of a simple Bayesian hierarchical model; here, θ represents physical parameters, S represents unobserved scientific quantities of interest, and Y represents the observed data.

Figure 3 .
Figure3.Schematic of the physical-statistical BHM that has been constructed based on the SIA PDE.The main parameters and variables for each module of the physical-statistical model are highlighted in red.The main levels of a physical-statistical model shown in Fig.2are displayed here, along with the parameters and variables describing each level.

Figure 4 .
Figure 4.An illustration marking the 25 measurement sites on the glacier.This is a top level view of the glacier, where the blue points indicate the glacier, the red points indicate the measurement locations, and the black points indicate locations surrounding the glacier with no glacial thickness.

Figure 5 .
Figure 5. Grid map used to interpret the following box plots in Fig. 6.Eight randomly chosen grid points are selected for testing predictions; these are not the same as the measurement locations.Only one quadrant of the glacier is shown due to symmetry, as is done in Figs. 9, 10, and 12 of Bueler et al. (2005), and the width of each cell is 10 5 m.Additionally, the red squares indicate locations at or close to the margin, the blue squares indicate locations that are between the dome and margin of the glacier, and the black squares indicate locations at or close to the dome of the glacier.Moreover, glacier grid squares with non-zero thickness are shaded in grey to indicate the glacier location.

Figure 6 .
Figure 6.Thickness prediction samples 100 years from t 0 for test case B (i.e., no mass balance field or basal sliding).Triangles indicate the actual thickness values from the analytical solution.Panel (a) shows predictions that are close to the margin (red squares of Fig. 5), panel(b)  shows predictions that are between the dome and margin of the glacier (blue squares of Fig.5), and panel (c) shows predictions that are towards the dome of the glacier (black squares of Fig.5).Refer to Fig.5for a grid map to spatially reference each of the box plots.As can be expected according toBueler et al. (2005), the largest errors occur at the dome and the margin.Note on interpretation: the middle of each box is the median, the interquartile range is denoted by the box, and 1.5 of the interquartile range beyond the first and third quartile is illustrated with the whiskers.Those points that are more than 1.5 of the interquartile range beyond the first and third quartiles are outliers, and they are denoted with circles.

Figure 7 .
Figure 7.Comparison of posterior and prior distributions of ice viscosity for test case D (i.e., mass balance field producing a periodic SIA solution).
Figure9.A comparison of posteriors in test case D (i.e., mass balance field producing a periodic SIA solution) under different sampling periods: data sampled once every 10 years, every 5 years, once a year, and twice a year.The general trend is that the posterior tends to become less biased as the period of sampling decreases, although the posterior becomes more diffuse.The University of Iceland Institute of Earth Sciences Glaciology Team takes measurements twice a year for summer and winter mass balance measurements.

Figure 10 .
Figure10.An illustration comparing the expected variability of the error-correcting process (as per the Bayesian hierarchical model) to the observed variability of residuals at the interior, margin, and dome for test case B (i.e., no mass balance field or basal sliding).These residuals are the differences between the observed data and the numerical solution.
Schematic of a prototypical physical-statistical Bayesian hierarchical model.At the top layer, physical parameters, initial conditions, and boundary conditions are fed into a numerical solver, and the output of this is corrected with an error-correcting process; finally, the actual observations are dependent on the actual physical process values.

Table 1 .
A summary of main parameters and notation utilized.

Table 3 .
Results of prediction at t 0 + 100 years.RMSE stands for root mean squared error.This is calculated by taking the average of the squared difference between the actual glacial thickness values and predicted glacial thickness values, and then taking the square root.

Table 4 .
Error-correcting process hyperparameters; σ 2 dome is the error-correcting process variance at the dome, σ 2 interior is the errorcorrecting process variance at the interior, σ 2 margin is the errorcorrecting process variance at the margin, and φ is the length scale parameter.