Antarctica and Greenland hold enough ice to raise sea level by more than 65 m if both ice sheets were to melt completely. Predicting future ice sheet mass balance depends on our ability to model these ice sheets, which is limited by our current understanding of several key physical processes, such as iceberg calving. Large-scale ice flow models either ignore this process or represent it crudely. To model fractured zones, an important component of many calving models, continuum damage mechanics and linear elastic fracture mechanics are commonly used. However, these methods carry large uncertainties when applied across the entire Antarctic continent because the models were typically tuned to match processes seen on particular ice shelves. Here we present an alternative, statistics-based method to model the most probable locations of fractures and demonstrate our approach on all main ice shelf regions in Antarctica, including the Antarctic Peninsula. We can predict the location of observed fractures with an average success rate of 84 % for grounded ice and 61 % for floating ice and a mean overestimation error rate of 26 % and 20 %, respectively. We found that Antarctic ice shelves can be classified into groups based on the factors that control fracture location.

In recent years, increased positive-temperature anomalies have been observed
in Antarctica

Overall, research on crevasse propagation started as early as 1955, and
calving parameterization has been under development for the last 20 years. It
has been shown that the increased ice mass loss from Antarctica is caused
primarily by an increased number of calving events in the last 2 decades,
which has led to significant ice front retreat (e.g. the collapse of the Larsen
B Ice Shelf (IS) and break-off of a part of the Larsen C Ice Shelf;

Developing a reliable calving law requires knowledge of where these fractures
are located and how they evolve. Fractures in the Antarctic Ice Sheet and ice
shelves are visible in satellite imagery and can be spaced less than
50 m apart. Because of the size of Antarctica, the only feasible means of
creating a database of fractured zones is through the analysis of satellite
imagery and altimetry. However, fractures can be covered by snow and/or not
be visible because of poor resolution of the available imagery. It is for
these reasons that inverse methods are often used

The aim of this study is to construct an empirical model that can predict the
locations of fractures. We focus on modelling of crevasses (surface fractures
less than 200 m wide) on the surface of the Antarctic Ice Sheet and
surrounding ice shelves. Our model for predicting fractured zones is based on a
probabilistic approach, where we utilize a logistic regression algorithm
(LRA) to find a relationship that enables the prediction of fracture
locations. Our approach accounts for many potential parameters – including
geometry, mechanical properties and flow regime (predictor parameters) – and is
based on a combination of modelling and remote sensing. We use a data set of
fracture observations, built by careful manual selection of the locations of
visible fractures in satellite images, to build a model that can identify
fractured regions on most of the Antarctic ice shelves as well as grounded
ice regions around ice shelves in Antarctica, including the Antarctic
Peninsula. We compare the ability of our model to match observations of
fractures from satellite imagery with the predictive ability of the
damage-based method of

In Sect.

Development of calving parameterizations.

A number of calving parameterizations have been developed and implemented in
some software packages, but none of them includes the propagation of
fractures both vertically and horizontally. Most of the available
parameterizations are specific to a particular case and set of predictors

The majority of calving models are one- or two-dimensional and are built on
simplified physics

A number of other approaches such as calving laws proposed by

To date, linear elastic fracture mechanics (LEFM)

The continuum damage mechanics approach, based on the method suggested by

Recently,

Our statistical model is built upon knowledge of the flow and geometry
parameters collected from 35 ice shelf regions in Antarctica (see
Fig.

To construct the model, we needed two types of data: observations of fractures
and the flow/geometry information at each fracture location. To run the model for a
particular ice shelf region, only information about the flow/geometry is
required. We chose 45 ice shelves/glaciers where high-resolution imagery was
available. For calibration of the model we constructed a data set containing
information about flow regime/geometry and the locations of observed
fractures or the lack thereof (described in Sect.

Data about geometry and velocity used in our statistical model were taken
from observations, while other flow and geometry parameters were calculated
either using ISSM or independently (described in Sect.

ISSM is a fully dynamic model that includes both two-dimensional (2-D) and
three-dimensional (3-D) stress balance approximations. Our experiments rely
on the shelfy-stream approximation (SSA) as it is computationally cheap and
suitable for modelling floating ice shelves and grounded ice streams
undergoing widespread basal sliding. We ran one simulation to create a stress
balance solution per region (ice shelf/glacier), which allowed us to obtain
the predictor parameters required for the calculation of the probability of
fracturing. We used SeaRISE air temperature, snow accumulation and geothermal
heat flux

For all the simulations, we used a multi-resolution mesh approach for the chosen domains in East and West Antarctica as well as the Antarctic Peninsula. A fine mesh is needed to model surface fractures, as the distance between them is normally around 50–100 m, but solving the stress balance on a 50 or 100 m mesh increases the computational time significantly. We therefore first calculated all the main predictors on a 200 m resolution mesh and then interpolated the values to the nodes of a 100 m resolution mesh, on which our fracture model is resolved. All further computations and analyses were performed on this finer mesh.

We focus only on predicting the location of surface crevasses without
modelling rifts, since the processes that cause rift opening might differ
from processes that allow surface crevasses to stay open. In fact, rifts might
be formed due to the presence of basal fractures, tidal deformation

To obtain observations of the location of fractures on the ice sheet surface, we used satellite images taken from Google Earth Pro, where images of the Antarctic Ice Sheet were available at different spatial resolutions. However, to be able to see surface fractures, we limited our choice to images with a horizontal resolution finer than 10 m for the period between 2011 and 2015. We included only regions with at least one high-resolution satellite image and where it was relatively easy to identify surface fractures.

The satellite images of the ice surface include many features, and it is
important to distinguish the surface fractures from other patterns such as
surface troughs (formed as a result of the presence of bottom crevasses or
subglacial channels). It has been suggested by

To construct a set of observed fractures, we manually selected fractured locations as well as non-fractured zones that we could identify in the satellite images. Most of the identified non-fractured regions are located in blue-ice regions, which are areas with low snow accumulation or where the snow has been removed by the wind. In such areas we can clearly see where the ice is not damaged. It is important to note that in some locations the resolution of images was not always sufficiently high to clearly see every fracture. Moreover, some surface fractures may be covered in snow and, therefore, are not identified by our analysis.

For 35 ice shelf regions we constructed two different types of data sets:
“calibration” and “evaluation”, for building the statistical model and
for studying the output of the model, respectively. For the other 10 ice
shelves we only constructed the evaluation data set as we did not use
these 10 regions to construct our model. For the calibration data set we
selected a subset of observed fractures that forms a representative sample of
locations where fractures are found on the 35 ice shelves/glaciers. The
statistical approach requires training on a large number of ice shelf regions
with different characteristics and a variety of observations. Therefore, we
used the calibration data set to build the LRA model. This improves the
reliability of the model, as the diversity in sampling provides a better
estimation of correlation coefficients (called

We form the evaluation data set to test how well our new approach predicts fractures for each ice shelf region individually. The evaluation set for each glacier/ice shelf is much larger than the number of fractures selected from each of the regions for calibration as we did not need every observed fracture to construct the model (as previously mentioned, it was the variety and not the number of data points that was required for a successful construction of our model). Although we did not need to select all the fractures on the ice sheet surface to build the calibration data set, to construct the evaluation data set we made a concerted effort to select the majority of the visible surface fractures in each of the ice shelf regions. It is possible that some fractures were missed due to the large spatial extent of the experiments. Moreover, we do not present every fracture in the figures in this paper in order to make the figures legible. In addition, we perform validation experiments with another 10 ice shelf regions to test how well the LRA works for a randomly selected ice shelf/glacier that was not a part of the construction of the model.

It is important to note that the evaluation data sets are not just discrete values (0 and 1) but rather a continuous field representing the probability of observing a fracture at each location. At a node where we could see a fracture, we set the probability of observing a fracture to 1. Nodes around an observed fracture are more likely than not to be fractured, so within a 500 m radius we let the probability decrease from 1 to 0.55 with increasing distance from the observed fracture. Conversely, where a non-fractured node was found within a region with high-resolution imagery, we set the probability of fracturing at this node to 0.05 and allowed it to increase linearly from 0.05 to 0.4 within a 500 m radius of the node. At all other nodes we set the probability of observing a fracture to 0.5: if no fractures are visible in an area where the image resolution is poor, the node is equally likely to be fractured or non-fractured. This allows us to account for uncertainties in the observations, since it is not always possible to determine whether there are no fractures or whether fractures are just not visible. The spacing of crevasses is often linked to their depth, and a single crevasse can penetrate much deeper than a crevasse in a set of closely spaced crevasses; however, in this study we do not estimate either the depth or the spacing of crevasses, and the evaluation data sets contain no such information.
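The assignment of observation probabilities described above can be illustrated with a minimal Python sketch. The function name, the point-list inputs and the nearest-observation tie-breaking rule are assumptions made for illustration; the decay from 1 to 0.55 is assumed here to be linear, as is stated for the non-fractured case.

```python
import math

def observation_probability(node, fractured_pts, intact_pts, radius=500.0):
    """Assign the probability of observing a fracture at `node` (x, y in metres).

    fractured_pts / intact_pts are lists of (x, y) observation points.
    Hypothetical helper: where both kinds of observation lie within the
    radius, the nearest one is assumed to win.
    """
    def nearest(points):
        return min((math.dist(node, p) for p in points), default=math.inf)

    d_frac = nearest(fractured_pts)
    d_intact = nearest(intact_pts)

    if d_frac <= radius and d_frac <= d_intact:
        # 1 at the observed fracture, 0.55 at the 500 m radius (linear: assumption)
        return 1.0 - (1.0 - 0.55) * d_frac / radius
    if d_intact <= radius:
        # 0.05 at the non-fractured node, rising linearly to 0.4 at the radius
        return 0.05 + (0.4 - 0.05) * d_intact / radius
    # no usable observation nearby: fractured and non-fractured equally likely
    return 0.5
```

Calling `observation_probability((250.0, 0.0), [(0.0, 0.0)], [])` gives a value midway between 1 and 0.55, and any node further than 500 m from every observation receives 0.5.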

We used statistics-based methods as an alternative to physics-based approaches in order to gain insights into the location of fractured zones in ice shelves and glaciers. In the well-known damage-based approach, the damage variable varies from 0 to 1, representing the fraction of a volume that is fractured, with 0 being not fractured and 1 being fully fractured. Instead of using the damage-based method, we use the LRA, which provides us with the probability of fracturing (also varying from 0 to 1). We then apply this method to derive fracture likelihood functions for both floating ice shelves and the grounded ice for any ice shelf region. To construct the likelihood function, we need to find coefficients that describe the relationships between predictor variables and what we want to predict (in our case it is surface fractures not including rifts). Thus, in order to create a statistical model, we use our calibration data set of observations of surface fractures and non-fractures as well as information about the flow regime at the locations of each observation (predictor parameters).

Our main goal is to determine the most likely location of surface fractures.
We do not focus on identifying the location of their initiation, since it is
not possible to know whether the observed fractures were formed where
observed or advected to that position after having formed upstream. We tried
to select observed fractures where there were no other fractures visible
upstream, meaning that the observed fractures would identify the initiation
zones, but this may not always be possible. The model will, therefore,
predict the locations not only of initiation of fractures but also of some zones to which
fractures have advected. For this reason, we do not distinguish between the
high-advection (advection from upstream) and low-advection (because of local
stresses) cycles

This section is structured in the following order. First, we present our method (logistic regression algorithm) used for predicting the formation of fractures. Second, we describe the predictor factors (predictors) we include in this method. Then, two methods used for optimizing a set of predictor factors are presented (Bayesian-based algorithm and Jensen–Shannon divergence). Finally, we present the damage calculation used for a qualitative comparison with our results.

Predictor factors (predictors).

Logistic regression is a statistical technique generally used to classify
data based on values of input fields. The method is similar to linear
regression but takes a categorical target field (in our case nodes which are
fractured or non-fractured) instead of a numerical series. The logistic
function allows us to calculate the likelihood of an event as a function of
different predictor factors (see Table

To apply the logistic regression algorithm, we constructed a logistic
function

The unknown coefficients

Once the values of

We started with a set of 19 predictors,

All the experiments and sets of parameters used in the LRA were constructed separately for floating and grounded ice, because some parameters used for predicting fractures on grounded ice are not applicable to floating ice and vice versa (for example, friction and bed slope are irrelevant on floating ice, whereas back stress cannot be applied to grounded ice).
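As an illustration of the kind of logistic function used in this section, the following Python sketch evaluates a fracture probability from normalized predictors. It shows the generic logistic form only; the min–max normalization, the function names and the coefficient values in the usage note are assumptions and are not taken from this study.

```python
import math

def normalize(values):
    """Rescale one predictor to [0, 1] (min-max scaling: an assumed choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def fracture_probability(predictors, coefficients, intercept=0.0):
    """Generic logistic function: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    # evaluate in a numerically stable way for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

In keeping with the separate treatment of grounded and floating ice, one would hold two coefficient sets, for example `fracture_probability(x, grounded_coefficients)` and `fracture_probability(x, floating_coefficients)`, where both coefficient lists are hypothetical names for the fitted weights.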

The calculation of some predictors was performed using methods already implemented in ISSM (e.g. stresses, strain rates, friction coefficient). Other predictors (e.g. curvature, distances to the ice front and grounding line, and proximity to glacier edges and nunataks) are not produced by ISSM and were calculated independently. Here we describe the methods we used to calculate each predictor parameter and briefly explain why each parameter may have an impact on the location of fractures:

The deviatoric stress values have a direct effect on the opening and closing of crevasses; the sign of the first principal stress component determines whether it is compressive (negative) or tensile (positive).

Effective deviatoric stress is calculated as

Von Mises stress is calculated as

The principal strain rates are calculated using the observed velocities as
eigenvalues of the matrix

Again using the shelfy-stream approximation, vertical shear is neglected, and
the effective strain rate is approximated as
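As an illustration only, the Python sketch below computes principal and effective strain rates from horizontal velocity gradients. It assumes the standard 2-D strain rate tensor and the commonly used form of the effective strain rate for incompressible ice with vertical shear neglected; the function and argument names are hypothetical, and the expressions may differ in detail from those used in this study.

```python
import math

def strain_rates(dudx, dudy, dvdx, dvdy):
    """Return (e1, e2, e_eff) from gradients of the horizontal velocity (u, v).

    e1 >= e2 are the eigenvalues of the symmetric 2-D strain rate tensor
    [[exx, exy], [exy, eyy]]; e_eff assumes incompressibility
    (ezz = -(exx + eyy)) and no vertical shear.
    """
    exx = dudx
    eyy = dvdy
    exy = 0.5 * (dudy + dvdx)

    # closed-form eigenvalues of a symmetric 2x2 matrix
    mean = 0.5 * (exx + eyy)
    half_diff = math.hypot(0.5 * (exx - eyy), exy)
    e1 = mean + half_diff
    e2 = mean - half_diff

    # e_eff^2 = exx^2 + eyy^2 + exx*eyy + exy^2 under the stated assumptions
    e_eff = math.sqrt(exx**2 + eyy**2 + exx * eyy + exy**2)
    return e1, e2, e_eff
```

For pure along-flow extension (`dudx > 0`, all other gradients zero) the first principal strain rate and the effective strain rate both equal `dudx`, and the second principal strain rate is zero.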

Comparison between the success of identifying fractures using LRA
(purple for grounded ice, green for floating ice) and using test set 1:
effective deviatoric stress (effective dev. stress); test set 2: principal
deviatoric stress 1 and 2; and test set 3: von Mises stress (blue for grounded
ice, yellow for floating ice). Left column represents grounded
ice

Finally, due to the fact that all predictor parameters have different units,
as well as significantly different magnitudes, we normalize each predictor
used in Eq. (

Including stress variables to predict fractures is intuitive as they are one of the major indicators of ice being fractured or non-fractured. Other variables such as geometry correlate with stress variables, but we found that it is important to include them in the model because the results are inferior if these parameters are not included. This might be caused by limitations of the predictor parameter values produced by the ice sheet model or by the simplification of the Stokes equations.

In order to show that our fracture model works better when including both
physics-based and geometry-based predictors, we ran three additional
experiments. In the first test run we included only the effective deviatoric
stress as a predictor and found that, although it produces reasonable results
matching the observations, the success of identifying fractures is about
20 % lower than the results of the model with the chosen optimal set of
the predictors (Fig.

Moreover, including both friction and strain rate might be redundant since less friction can lead to larger strain rates. By looking at the predictor data sets, we found that the optimal choice of parameters for each group includes either friction or strain rate but never both at the same time. We ran an experiment replacing strain rates by friction and found that the prediction success for some glaciers decreases by only about 3 %. We therefore kept only strain rate as a predictor parameter and discarded friction.

To construct the probability function for each glacier, we sought a set (or subset) of the predictor factors to include in the LRA. We started with a first guess (calculated using the LRA and a potential choice of the predictor parameters) and then improved it using three methods: a random walk, a Bayesian inversion and the Jensen–Shannon divergence algorithm.

For each of the 45 ice shelf regions we performed a 100 000-step run with random sets of predictors used at each step (the number and selection of predictors were chosen at random at each step). We defined a potentially good model to be the one with a success of identifying fractures larger than 70 % and the error rate of overestimation not exceeding 15 % (however, when a good-fit model was not found after 2000 steps, we looked for a model with a 65 % success rate and 20 % error rate). Once a good fit was found, we saved it as a potentially good set and continued running the model with different sets of predictor factors for the remaining number of steps to search for a better model. At the end of each run the algorithm provided us with a mean set of factors for a best-fitting model.
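The random-walk search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `evaluate` is a hypothetical user-supplied function that fits the LRA for a given predictor subset and returns the success and overestimation error as fractions, and the uniform choice of subset size is an assumption.

```python
import random

def random_walk_search(predictors, evaluate, n_steps=100_000,
                       relax_after=2000, seed=0):
    """Collect predictor subsets that give a potentially good model.

    evaluate(subset) -> (success, error), both in [0, 1] (hypothetical callback).
    """
    predictors = list(predictors)
    rng = random.Random(seed)
    min_success, max_error = 0.70, 0.15
    good_sets = []

    for step in range(n_steps):
        # relax the criteria if nothing has been found after `relax_after` steps
        if step == relax_after and not good_sets:
            min_success, max_error = 0.65, 0.20

        # number and selection of predictors chosen at random at each step
        k = rng.randint(1, len(predictors))
        subset = frozenset(rng.sample(predictors, k))

        success, error = evaluate(subset)
        if success > min_success and error <= max_error:
            good_sets.append((subset, success, error))

    return good_sets
```

The returned list can then be summarized, for instance by counting how often each predictor appears among the saved sets.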

To test the behaviour of the models with different sets of parameters and, thus, to choose with more precision an optimal set of predictors from the full set, we performed a non-linear Bayesian inversion.

To find the likelihood function for a Bayesian inversion, we need to add the
probabilities of fracturing for all nodes. The area of each ice shelf region
is

In addition to defining a likelihood function, Bayesian inversion requires an
input of prior model and prior scores. For a prior model, we took a
calculated fracture probability

Finding an expression for a likelihood function,

However, all of them produced very large likelihoods which increased
dramatically with a small percentage change in the probability density
function. The value of the likelihood function increased up to an order of

In order to construct the function, first we assumed that the measure

Second, our idea was to calculate the likelihood

We performed a Bayesian analysis for 500 steps, then narrowed down the selection and accepted only those models that had likelihoods greater than 90 % of the best likelihood. Each step included two criteria: if a new likelihood was greater than the prior likelihood or was greater than a certain percentage (taken at random at each step) of the old likelihood, we accepted the model. This allowed us to identify the most commonly chosen sets of parameters.
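The two acceptance criteria and the final narrowing step can be sketched in Python as below. This is an illustrative reading of the text, assuming positive likelihood values; the function names and the `(model, likelihood)` pair layout are hypothetical.

```python
import random

def accept(new_likelihood, old_likelihood, rng):
    """Accept if the new likelihood beats the old one, or beats a random
    fraction of it (fraction drawn uniformly from [0, 1) at each step)."""
    if new_likelihood > old_likelihood:
        return True
    return new_likelihood > rng.random() * old_likelihood

def select_near_best(models, fraction=0.9):
    """Keep models whose likelihood exceeds `fraction` of the best likelihood.

    models: list of (model, likelihood) pairs with positive likelihoods.
    """
    best = max(likelihood for _, likelihood in models)
    return [model for model, likelihood in models if likelihood > fraction * best]
```

Accepting some models with a lower likelihood lets the chain leave local maxima, which is why the second criterion uses a random rather than a fixed percentage.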

In order to select a set of predictors for a general case and to find whether it is possible to identify a set that can be used for any ice shelf region, we started with the construction of a binary array for each ice shelf/glacier, where the number of rows represents the number of well-fitting models for an ice shelf/glacier and the number of columns represents each of the predictor factors.

We then found the average occurrence of each predictor:

We could then determine how often a certain predictor was included in the
good-fit models. If a predictor was selected more than 50 % of the time,
then it was assigned as a candidate for the best-fitting model. Thus, we obtained a

Then, we classified the glaciers into groups. There is a large number of
possible ways to form such groups. Therefore, we constructed a test
that assessed every possible combination and calculated a percentage of
similarity between glaciers in a group (Eq.

Finally, we found that we could categorize all 45 glaciers into four different groups, with group 1 having glaciers/ice shelves that can be more easily combined and group 4 being a narrower group of specific glaciers/ice shelves that cannot be placed in any of the other three groups.
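The predictor-occurrence step that precedes the grouping can be sketched in Python as follows. The sketch assumes the binary array is stored as a list of rows (one row per well-fitting model, one column per predictor); the function names and the example predictor names are hypothetical.

```python
def predictor_occurrence(binary_rows):
    """Average occurrence of each predictor over all well-fitting models."""
    n_models = len(binary_rows)
    return [sum(column) / n_models for column in zip(*binary_rows)]

def candidate_predictors(binary_rows, names, threshold=0.5):
    """Predictors selected more than `threshold` of the time."""
    occurrence = predictor_occurrence(binary_rows)
    return [name for name, freq in zip(names, occurrence) if freq > threshold]

# four well-fitting models, three predictors (illustrative values only)
rows = [[1, 0, 1],
        [1, 1, 0],
        [1, 0, 0],
        [1, 0, 1]]
names = ["strain_rate", "bed_slope", "thickness"]
candidates = candidate_predictors(rows, names)
```

In this toy example only the first predictor exceeds the 50 % threshold; the third is selected exactly half of the time and is therefore not a candidate.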

The JSD method

The Kullback–Leibler divergence
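For reference, the standard definitions of the Kullback–Leibler and Jensen–Shannon divergences for discrete distributions can be sketched in Python as below. How the fracture probability fields are turned into distributions before comparison is not specified here, so the inputs are simply assumed to be sequences of non-negative values summing to 1.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q), natural logarithm.

    Terms with p_i = 0 contribute zero; q_i must be positive wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the mixture M = (P + Q) / 2."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Unlike the Kullback–Leibler divergence, the Jensen–Shannon divergence is symmetric and bounded (by ln 2 with the natural logarithm), which makes it convenient as a distance-like measure between a candidate model and a best-fitting one.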

Here we use the damage-based model as an independent method for comparison with our statistics-based method. We do not compare our probability-based model with the damage model directly; rather, we evaluate their respective ability to predict the formation of fractures in ice. For this we compare calculated damage with the observations of fractures and identify areas where the damage model can and cannot accurately predict the presence of fractures.

In this study we use the damage inversion method proposed by

It is important to keep in mind that the inversion only infers damage in areas where fractures (crevasses or rifts) are being actively formed and thus create a jump in strain rate/velocity. Many rifts are formed at one point in time and then only intermittently propagate. If the velocity observations do not show a discrete jump across a fracture, then there is nothing for the inversion to pick up in terms of damage. It only finds fractures that are actively enhancing the flow, and it is not meant to locate every fracture.

Estimation of

Fractures that have been advected can be identified by damage, but this is not always the case, because the inverse method for calculating damage only finds damage where fractures give rise to velocity gradients. Damage will capture a fracture that was formed upstream and advected to a region with different stress conditions only if the fracture enhances the flow and creates a local velocity gradient. Thus, we first calculate the flow line through each observed fracture. If the damage upstream from the fracture is larger than 50 %, we assume that the damage calculation may be correct and that the observed fracture was formed upstream. If there is no damage at the observation point or upstream from it, we classify the observation as not captured by the damage method and consider this a failure of damage to identify the fracture (which can occur because the fracture at the observation point does not cause a local gradient in strain rate).

Physics-based methods, such as linear elastic fracture mechanics (LEFM) and
continuum damage mechanics (CDM), are necessary when modelling fractures in
Antarctica. We do not intend to replace these methods; rather, we seek a
method that can complement them in cases where physics-based models
do not predict the formation of fractures well. In particular it is possible
that some fractures are initiated upstream from the grounding line rather
than on floating ice. It is therefore important to be able to predict the
formation of fractures in both cases. Damage is calculated only on floating
ice based on model inversions using ISSM

We applied the LRA method combined with the random-walk method to 45 ice
shelf regions, including both ice shelves and surrounding grounded ice (the
corresponding names and locations can be found in
Table

In total, for each ice shelf/glacier the random-walk analysis gave a number of possible sets of predictors that can produce a well-fitting model. We combined all of these possible sets for each glacier to see which predictors are always present in the well-fitting model and which ones are never included. The results of the random walk and the Bayesian inversion agreed well. Most of the essential predictors for each particular glacier selected in the Bayesian approach were also chosen when performing the random walk. In most cases, the Bayesian analysis showed equal importance of most of the predictors although effective strain rate and velocity had a slightly higher rate of selection. There was no universal set of factors that could be used to model all ice shelf regions. However, subsets of glaciers had some similarities in terms of the predictors that had to be included in order to achieve a well-fitting model.

To estimate how well our probability model and the damage model identify observed fractures, we calculated the percentage of success and error for each ice shelf/glacier model. First, we counted the cases in which there is a modelled fracture in the vicinity of an observed fracture (within a 100 m radius) and divided this number by the total number of observed fractures to obtain the percentage of success. To obtain the percentage of error due to overestimation, we counted the cases in which there is a modelled fracture but no observed fracture within a 100 m radius and divided this number by the total number of non-fractured nodes.
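The success and error percentages can be sketched in Python as follows. This is one possible reading of the procedure, assuming that modelled fractures, observed fractures and non-fractured nodes are given as lists of (x, y) points in metres and that the error is counted at the non-fractured nodes; the function name is hypothetical.

```python
import math

def success_and_error(modelled, observed, intact, radius=100.0):
    """Return (success %, overestimation error %) for one region."""
    def near(point, points):
        return any(math.dist(point, q) <= radius for q in points)

    # observed fractures that have a modelled fracture within the radius
    hits = sum(near(obs, modelled) for obs in observed)

    # non-fractured nodes with a modelled fracture nearby but no observed one
    false_alarms = sum(near(node, modelled) and not near(node, observed)
                       for node in intact)

    return 100.0 * hits / len(observed), 100.0 * false_alarms / len(intact)

# toy example (coordinates are illustrative only)
modelled = [(0.0, 0.0), (1000.0, 0.0)]
observed = [(50.0, 0.0), (5000.0, 0.0)]
intact = [(1000.0, 50.0), (3000.0, 0.0)]
success, error = success_and_error(modelled, observed, intact)
```

In the toy example one of the two observed fractures is matched and one of the two non-fractured nodes lies next to a modelled fracture, so both percentages are 50.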

Formed groups of ice shelf regions.

The value of the calculated weight coefficient

It is important to mention that

Same as Table 4 but for floating ice.

Success and error percentages for LRA for grounded ice are shown
in

Thus, we categorized the 45 glaciers/ice shelves into four groups, requiring
that the deviation from the best-fit models did not exceed 5 %. Next, we
performed a test to assess whether these selected sets were the optimal
choice, by estimating the deviation from the best solution using the
Jensen–Shannon divergence algorithm. We assigned each glacier to a particular
group based on its minimum value of the deviation from the best-fitting model
in JSD analysis. In so doing, we slightly modified the members of each group
that we had previously created. For example, glacier 27 belonged to group 1
previously, and it fit well with only a slight change of the best-fit score.
The JSD showed, however, that if we move this glacier to group 2, the deviation
from the best fit decreases from 0.01 to 0.003. We nevertheless had to take into
account the fact that the JSD algorithm measures the total distance to the
best-fit probability and, thus, can decrease the overestimation error while,
at the same time, significantly decreasing the success rate. This took place
for six glaciers/ice shelves with the numbers 10, 11, 13, 15, 30 and 32 (see
Table

Finally, to reach an optimal agreement between our model and the observations
of fractures, we assigned each glacier to a particular group; the sets of
factors for each group are presented in Tables

While the success rate of identifying fractures on floating ice was lower than for
grounded ice, we were still able to identify the main fracture patterns, and
the success rate was high for the majority of ice shelves (see
Fig.

This was the largest group of glaciers, and the best-fit model includes as
many as 10 predictors for grounded ice and seven predictors for floating ice.
The analysis of the estimated coefficients in LRA showed that predictors with
the highest weights in our model for this group of glaciers were effective
strain rate, proximity to glacier edges and nunataks, and the surface
elevation change. We present the modelled probability of fractures in
Figs.

The location of each of the ice shelf regions in our analysis.

The number of observed fractures versus distance from the ice front.

A list of analysed ice shelf regions. Fracture observations for the
model calibration process were taken from regions marked with

AP: Antarctic Peninsula; WA: West Antarctica; EA: East Antarctica.

The main pattern of surface fractures is well represented for this group. On
grounded ice the success rate of identifying fractures is larger than
88 %, with a quarter of glaciers at almost 100 %. The failure related to
overestimation of fractures is 27 %. On floating ice the success
amounted to 55 %, and the failure was equal to 15 % on average. For
Vanderford IS (see Fig.

The modelling results for the Cook Ice Shelf are shown in
Fig.

Modelled probability of a fracture for group 2, Rayner Thyner
IS

Modelled probability of a fracture for group 2, Larsen A
IS

The modelling results for Larsen B IS are illustrated in
Fig.

The results for Nansen IS (Fig.

The model for the second group of glaciers has the best fit when the bed slope is excluded. Effective strain rate and surface slopes were found to be the most important predictors in the model for this group.

For this group the LRA method predicts fractures with a 70 %–90 %
success rate on grounded ice and finds about 67 % of observed fractures on
floating ice with an overestimation of 25 % and 27 %, respectively.
In most cases the model represents the non-fractured nodes with high
precision, except for the slight overestimation at the front of the ice
shelf. Similar situations are observed for most glaciers in this group: the
area of floating ice is relatively small; thus the main prediction is
performed for grounded ice. For example, for Edward VII IS and Rayner Thyner
IS (Fig.

Interesting results were found for Larsen A IS (see Fig.

Modelled probability for Pine Island (group 1)

This group includes four ice shelf regions, namely Totten IS, Nivl IS, Dibble IS and Holmes IS. These glaciers were very sensitive to the choice of predictor factors, and the JSD process could not assign them to either of the two aforementioned groups. On grounded ice the mean success rate for this group was around 93 %, with an overestimation rate just above 23 %; on floating ice the success and error rates were 56 % and 23 %, respectively. Potentially, in the model for this group we could include the proximity to the ice front since it produces slightly better results for three of the four glaciers. However, it lowers the success rate for Nivl IS significantly. Thus, in order to achieve a set that would give a well-fitting model for all of the glaciers, we exclude back stress and the ice front proximity from the list of predictor factors for this group.

Modelled probability

In terms of results for Holmes IS (Fig.

Finally, we present the result for the Totten IS (see Fig.

Modelled probability of a fracture vs. modelled damage for Holmes IS
(group 3)

This group includes Larsen C, Amery, George VI and Borchgrevink IS. The
average success rate for this group amounted to 66 % and 56 % for
floating and grounded ice, respectively, while the average error rate
amounted to 15 % and 20 %. The most important predictor factors
in this group for floating ice are effective strain rate, surface change and
ice thickness, while for prediction of fractures on grounded ice curvature
and surface velocity need to be included in the model. For all of the ice
shelves/glaciers in this group we found that including the ice front and
grounding line proximity distorts the model, significantly increasing the
error due to overestimation of fractures. For Borchgrevnik IS it also led to
a drop in the success rate of fracture prediction. In addition, Larsen C and
Amery ice shelves can be grouped together but cannot be included in any of
the groups mentioned above. For these glaciers only a small number of
predictors needed to be included in the model. The Bayesian analysis also
confirmed the sensitivity of the Amery fracture model to this set of
predictor factors. The LRA model for the Amery IS
(Fig.

The ice shelves/glaciers in this group have a number of characteristics that distinguish them from other ice shelves/glaciers. Most of them are relatively wide with a large floating area. The floating part is not restricted by any channel walls, and the width of the shelf is similar to its length. All glaciers in this group are relatively static, with less curvature and no significant surface elevation changes. However, there were two exceptions: Abbot IS and Drygalski IS have slightly different characteristics. First, Abbot is a wide glacier that exhibits most of the properties of group 1. However, it has a large number of glaciers that restrict its outflow towards the ocean and, therefore, has similarities with the glaciers from group 4. This observation is in good agreement with the JSD results, which showed that Abbot IS could also be assigned to group 4, as the change in JSD distance in this case would be very small. Second, the JSD results showed that Drygalski IS could equally well be placed in group 2 or group 3. This ice shelf has some characteristics similar to group 2 (large number of nunataks) and group 3 (a very long floating tongue). Therefore, we suggest that some glaciers have mixed features of group 1–group 2 (such as Vanderford) or group 1–group 3 (Ekström, Tracy-Tremenchus, Rennik); however, they still have more characteristics of group 1 and produce better-fitting results when assigned to this group.

This group includes a relatively small number of ice shelves/glaciers. All of the ice shelves/glaciers have a large number of nunataks and smaller ice thickness as well as many small narrow channels and fast ice streams. They are mostly located on the Antarctic Peninsula or near the Transantarctic Mountains. All of the ice streams are relatively steep, which may explain why it is necessary to include surface slopes in order to achieve a well-fitting model.

Group 3 glaciers were found to have many similar features. Most of the ice shelf regions in this group have one relatively long glacier that flows inside an embayment. For most of them the ice shelf is much longer than it is wide, and they all have a very low glacier channel curvature. The surface velocities of these ice shelves/glaciers are relatively high, which explains why changes in strain rate and surface velocity are the most important predictors for this group.

Interestingly, although the average back stress for Totten IS is one of the highest out of all 45 ice shelf regions, including it in the model does not significantly change the fracture probabilities. Thus, apparently, even though predictors may have large magnitudes, they can make only minor contributions to the constructed probability, and other predictors dominate the fracturing process for the Totten IS. The effective strain rate is also one of the highest for Totten, but we found that it is not this predictor that contributes most to fracturing; rather it is the effective strain rate change. Thus, sudden changes in the flow regime of the glacier would be the most likely cause of an increase in the number of fractures.
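A predictor with a large magnitude can still contribute little to the modelled probability, as a minimal logistic-regression sketch shows. The intercept, coefficients and predictor values below are hypothetical and chosen only for illustration. They are not fitted values from this study, and the predictors are assumed to be standardized.

```python
import math


def fracture_probability(intercept, coefficients, predictors):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i))).

    Returns the probability and each predictor's additive contribution
    (b_i * x_i) to the log-odds.
    """
    contributions = {
        name: coefficients[name] * predictors[name] for name in coefficients
    }
    log_odds = intercept + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions


# Hypothetical values: back stress is large in magnitude but has a small
# coefficient, whereas the strain rate change has a large coefficient.
coefficients = {"back_stress": 0.01, "strain_rate_change": 2.0}
predictors = {"back_stress": 3.0, "strain_rate_change": 1.0}

p_full, contributions = fracture_probability(-1.0, coefficients, predictors)

# Same node with back stress left out of the model.
p_reduced, _ = fracture_probability(
    -1.0,
    {"strain_rate_change": 2.0},
    {"strain_rate_change": 1.0},
)

print(contributions)
print(f"with back stress: {p_full:.3f}, without: {p_reduced:.3f}")
```

In this toy case the back stress term shifts the log-odds by only 0.03, so dropping it changes the probability by less than 0.01, whereas the strain rate change term carries almost all of the signal.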

The JSD analysis has shown that Borchgrevnik IS could also be assigned to group 1, but it produced slightly better results being placed in group 4. On the other hand, Amery and Larsen C need to be strictly assigned to group 4 only. George IV IS and Amery IS have similar characteristics as they are both narrow and long (in fact much longer than any other ice shelves of this type in Antarctica) and are located inside an embayment. Although Larsen C IS is not inside an embayment, it is a significantly long and narrow ice shelf stretching around the coast. Borchgrevnik IS also has similar features to the Amery and George IV ice shelves as it is of a similar shape and is located inside a narrow channel. However, it does not have exactly the same characteristics as the other ice shelf regions in this group as it is much shorter, which could be why JSD showed that it could also be placed in group 1.
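The kind of comparison behind such group assignments can be sketched as follows, taking JSD to mean the Jensen-Shannon divergence between discrete distributions. How the distributions were constructed is not described in this section, so the histograms, the group names used as dictionary keys and the helper `closest_group` are hypothetical.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Inputs are non-negative weights over the same bins; they are normalized
    to sum to 1. The result lies between 0 (identical) and 1 (disjoint).
    """
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; bins with zero mass in `a` add nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def closest_group(shelf_hist, group_hists):
    """Return the group with the smallest JSD to the shelf, plus all values."""
    divergences = {
        name: jensen_shannon_divergence(shelf_hist, hist)
        for name, hist in group_hists.items()
    }
    return min(divergences, key=divergences.get), divergences


# Hypothetical three-bin histograms of modelled fracture probability.
shelf = [0.1, 0.2, 0.7]
groups = {"group_1": [0.1, 0.3, 0.6], "group_4": [0.6, 0.3, 0.1]}

best, divergences = closest_group(shelf, groups)
print(best, divergences)
```

When the two smallest divergences are nearly equal, the shelf could plausibly be placed in either group, which is the situation described above for Borchgrevnik IS.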

On the Amery IS (see Fig.

Adding the proximity to the grounding line and the ice front as predictors did not produce a good fit for the Borchgrevnik IS because of the specific shape of this region. The distance between the ice front and the grounding line is very small relative to other glaciers in our analysis.

We found that, in general, the most important predictor factors for modelling
surface fractures on grounded ice for all analysed glaciers were the surface
velocity and the surface change (maximum difference between the surface
elevation within a 500 m radius), which is in agreement with the theory of
possible mechanism of fracture formation
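The surface change predictor defined above can be sketched in a few lines. The sketch interprets "maximum difference between the surface elevation within a 500 m radius" as the highest minus the lowest elevation among all points within 500 m of a node, which is an assumption. The function name and sample points are hypothetical, and the brute-force neighbour search is only suitable for small point sets.

```python
import math


def surface_change(points, radius=500.0):
    """Elevation range (max - min) within `radius` metres of each point.

    `points` is a list of (x, y, elevation) tuples in metres. Each point is
    its own neighbour, so an isolated point gets a surface change of 0.
    """
    changes = []
    for x0, y0, _ in points:
        nearby = [
            z for x, y, z in points
            if math.hypot(x - x0, y - y0) <= radius
        ]
        changes.append(max(nearby) - min(nearby))
    return changes


# Hypothetical nodes along a flow line (x, y, surface elevation).
points = [
    (0.0, 0.0, 100.0),
    (300.0, 0.0, 110.0),
    (600.0, 0.0, 95.0),
    (2000.0, 0.0, 50.0),
]
print(surface_change(points))
```

For a full ice sheet mesh, a spatial index such as a k-d tree would replace the nested loop.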

We do not claim that all the predictors that were chosen in the final set for each group represent the exact fracture mechanisms for each glacier. For some ice shelf regions, sets containing different predictors can lead to results close to the best-fitting model. However, in some cases, such as the Amery and Totten ice shelves, the number of well-fitting models is very limited. For example, by including the effective strain rate and proximity to the ice front in the analysis, we can achieve a better fit to the observations. Therefore, we conclude that some factors have a very strong effect on fracturing, while others are only minor for some glaciers. Ultimately, we seek only to develop a model that can correctly identify the geographical location of fractures, not necessarily explain why they are there.

The main uncertainty of our method is related to the overestimation of the
number of fractures. It could be argued that we predict fractures at
locations where none are observed because snow accumulation or coarse
resolution makes them invisible in the imagery. A
possible solution to this could be to supplement satellite images with radar
and seismic measurements

Modelled probability of a fracture vs. modelled damage for George IV
IS (group 4)

Modelled probability of a fracture vs. modelled damage for Moscow
University IS (group 1)

Modelled probability of a fracture vs. modelled damage for Shirase
IS (group 2)

A significant overestimation of predicted fracturing can be seen at the
front of Vanderford IS (see Fig.

We looked at various properties of Ronne IS, for which we could not find a good approximation using any of the 17 predictors. We found that the Ronne IS has the lowest elevation change as well as the lowest principal stress components. We do not have enough samples to cover values that are non-typical for the majority of glaciers, which may explain why we could not find a good-fit model for this ice shelf, either with LRA or using the Bayesian analysis. Thus, we conclude that our probabilistic model is not appropriate in this case.

The damage-based approach sometimes produces areas of high damage downstream
of the observed fractures (Fig.

Most previous large-scale modelling of surface fractures has focused on
applying zero-stress models (propagation of crevasses to the depth where the
overburden pressure and the tensile stress are equal;

We found that the logistic regression algorithm, combined with other statistical methods, can significantly improve the prediction of fractured zones for the Antarctic ice shelves/glaciers and can lead to the identification of up to 99 % of observed surface crevasses for some ice shelf regions, with an average of 70 % for all ice shelf regions. Our approach has a number of uncertainties and leads to some overestimation of the number of fractures in comparison to the observations, but the rate is not significantly higher than the overestimation error found when using the damage-based method. However, the damage-based method did not predict the location of many fractures either upstream or downstream of the observed locations, suggesting an underestimation when applying the damage method. The probabilistic results suggest that our statistics-based methods are more reliable in identifying fractures and rifts at the locations where the damage method does not predict them (which is related not to a failure in the damage method but to the fact that it is not constructed to do so). There are also uncertainties in the damage-based method related to the surface temperatures in Antarctica, which may be poorly represented by the available observations. It is possible that the damage-based method needs to be tuned for every ice shelf separately, but this is beyond the scope of this study.

We classified the Antarctic ice shelf regions into four groups, where ice shelves/glaciers in each group have similar characteristics and each group has a set of predictors that can be used to predict the location of fractures. Although some ice shelves/glaciers with specific shapes and specific regimes are more difficult to describe using the general set of factors suggested in this study, overall our method provides a tool that can be used in the analysis of fracturing for most of the ice shelves/glaciers in Antarctica.

Our model is easy to implement and can be effectively used as a basis for modelling fracture propagation and as the first step in implementing a calving parameterization in ice sheet models. This statistics-based method can help to expand our current knowledge of crevasses as well as improve the mapping of potential hazards. Our results can be used to identify potential regions with snow-covered crevasses that may pose hazards for navigation in Antarctica and, thus, complement field campaigns.

The data set of the location of fractures can be accessed
at

The supplement related to this article is available online at:

VE designed the study, developed the methodology, collected the data, performed the analysis and wrote the manuscript. PT helped with models and helped revise the manuscript significantly. MM helped with the implementation of the analysis in ISSM and development of the code, implemented some features needed for the model, and helped revise the manuscript. CB suggested a way to improve the observational set using the damage-based results and helped with the damage description and modelling. MS developed the source function for the Bayesian analysis.

The authors declare that they have no conflict of interest.

We would like to thank Teresa Neeman for her help with the logistic regression algorithm. We are also grateful to Anthony Purcell for his consistently helpful advice and his help revising the manuscript. Veronika Emetc received a scholarship from ARC Discovery (DP140103679). We would like to thank Julian Byrne for his help with setting up the ISSM software on Terrawulf. We are grateful to the reviewers for their careful and insightful comments on our paper.

Edited by: Olivier Gagliardini

Reviewed by: Jeremy Bassis and one anonymous referee