On the influence of model physics on simulations of Arctic and Antarctic sea ice



Introduction
Current General Circulation Models (GCMs) show large intermodel spread in simulating future (decadal to centennial) characteristics of sea ice (Zhang and Walsh, 2005; Arzel et al., 2006). This disagreement appears for both sea ice extent and volume, with an even more striking scatter in the Southern Hemisphere (Flato, 2004; Lefebvre and Goosse, 2008). In addition, most of those GCMs present large discrepancies with respect to observations over the last decades, in terms of mean seasonal cycle as well as interannual variability, for both hemispheres (Parkinson et al., 2006; Arzel et al., 2006; Holland and Raphael, 2006; Connolley and Bracegirdle, 2007; Lefebvre and Goosse, 2008; Stroeve et al., 2007).
The sources of this spread are manifold. First, the ability of GCMs to reproduce the observed atmospheric state is not always satisfactory. Bitz et al. (2002) show that the biases in Arctic surface pressure and winds create anomalous ice exports and thickness patterns. Holland and Raphael (2006) come to similar conclusions for the Southern Hemisphere (SH). In the Northern Hemisphere (NH), errors in simulated air temperatures, precipitation rates, clouds and humidities are other well-known sources of spread in GCMs (Walsh et al., 2002). Second, the initial conditions, and in particular those of the Southern Ocean, are important for the multi-decadal evolution of sea ice (Goosse and Rensen, 2005) but are still uncertain. Third, the model equations are solved differently from one model to another, using different numerical methods and horizontal and/or vertical resolutions in the atmosphere, sea ice and ocean. Finally, the representation of sea ice-related thermodynamical and dynamical processes differs from one model to another, ranging from simple static models with no explicit ice thickness distribution to sophisticated dynamical models, including a snow component and a multi-category ice thickness distribution framework.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Only a few studies have attempted to investigate how sensitive sea ice simulations are to the representation of these physical processes. Bitz et al. (2001) focused on Arctic sea ice and noted that the inclusion of an ice thickness distribution (ITD) in their sea ice model led to thicker ice and higher variability of ice export at Fram Strait. Holland et al. (2001) found that Arctic ice thickness spatial patterns were more realistic when including a dynamical ice component.  obtained similar results at the global scale. In addition, they noted that the presence of an ITD enhanced the ice thickness response to external perturbations.
Here, we propose to continue the work initiated by those studies. We run two versions of the ocean-sea ice GCM NEMO-LIM¹ driven by atmospheric reanalyses. These two simulations differ only in their sea ice component. Their differences will thus depend only on the model physics and not on any of the other sources of errors mentioned above. We evaluate these simulations with a comprehensive set of metrics adapted to our models' resolution (∼1°). These metrics are designed to evaluate sea ice models at the global scale (allowing the comparison of the performance in each hemisphere) and for seasonal to multi-decadal time scales. Whenever observations are sufficiently distributed, we include diagnostics about mean state and seasonal to interannual variability.
We incorporate diagnostics for the ice concentration, thickness and drift. Each of these prognostic variables indeed plays an important role at seasonal to decadal time scales. Ice concentration controls the open water fraction of a given region and consequently the heat exchanges between atmosphere and ocean. Ice thickness has important implications for the memory of the sea ice system at multi-year time scales and directly affects ice dynamics as well as thermodynamics. Ice drift controls the large-scale sea ice thickness patterns but also has important connections at the local scale since it determines the local divergence and shear of the ice pack. In addition to the analyses focused on the regional characteristics of the sea ice cover, we also evaluate the models using integrated quantities, such as the total ice extent in each hemisphere and the Fram Strait export, that are simple but useful diagnostics. The interest of our metrics is clear: they provide a quantitative method for evaluating the sensitivity of a sea ice model to its representation of physics. In a larger framework, our metrics can serve for the evaluation of other sea ice models.
The rest of the paper is organized as follows. Section 2 describes the two ocean-sea ice models, namely NEMO-LIM2 and NEMO-LIM3. In Sect. 3, we present the observations used as the basis of our evaluation and explain how we derive model metrics associated with the different sea ice variables. Section 4 discusses some of the models' characteristics in light of these metrics. Finally, we discuss the physical processes possibly responsible for the differences between the two simulations in Sect. 5.
¹ NEMO: Nucleus for European Modelling of the Ocean (http://www.nemo-ocean.eu); LIM: Louvain-la-Neuve Sea Ice Model (www.climate.be/lim).

Models description
In this study, we use two versions of the global coupled ocean-sea ice model NEMO-LIM. These versions differ only in their sea ice component, as described in the next section. Unless otherwise stated, all other experimental conditions are identical and are presented in Sects. 2.2, 2.3 and 2.4.

Sea ice models
LIM2 (Louvain-la-Neuve Sea Ice Model, version 2) is a dynamic-thermodynamic sea ice model. A full description of the model can be found in Fichefet and Morales Maqueda (1997). Timmerman et al. (2005) have validated the ocean-sea ice coupling. Here, we only present the salient features that are important for this study. LIM2 comprises the 3-layer (1 of snow and 2 of ice) model of Semtner (1976) to account for sensible heat storage and vertical heat conduction. The thermal conductivities of ice and snow are corrected by a multiplicative factor to account for unresolved thickness distribution. Finally, the model takes into account in a rather simplistic way the latent heat storage by brine pockets. Regarding sea ice dynamics, the viscous-plastic (VP) constitutive law of Hibler (1979) is used. The model includes a lead parameterization and the momentum equation is solved using a B-grid formulation.
LIM3 is based on LIM2 but presents notable differences, presented hereafter. A complete description and validation of the model is given in Vancoppenolle et al. (2009b). LIM3 has a more sophisticated thermodynamic component than LIM2. It has a finer vertical resolution (5 layers of ice and 1 of snow). While the storage of latent heat in brine is highly parameterized in LIM2 using a heat reservoir, it is explicitly represented in LIM3, using a vertically varying salinity profile. In addition, salinity variations in time are resolved in LIM3 using parameterizations of brine entrapment and drainage processes based on a simplification of the brine drainage model of Vancoppenolle et al. (2007). LIM3 also includes an explicit ice thickness distribution (5 ice categories) that makes it possible to resolve the more intense growth and melt of thin ice, as well as the redistribution of thinner ice onto thicker ice due to ridging and rafting. For the dynamics, the elastic viscous-plastic (EVP) formulation of Hunke and Dukowicz (1997) is used. The momentum equation is solved using the new C-grid formulation of Bouillon et al. (2009). The full sets of parameters for the LIM2 and LIM3 models result from independent historical tuning procedures with these models (Timmerman et al., 2005; Vancoppenolle et al., 2009b). They should be viewed as reference values for each model based on earlier experience. For information, four of them are reported in Table 1.

Ocean general circulation model
The ocean component is based on version 9 of the finite difference, hydrostatic, primitive equation ocean model OPA, fully documented in Madec (2008). We run our experiments on a global tripolar ORCA1 grid (about 1° resolution) with 42 vertical levels. This grid extends from 78° S to 90° N with a mesh refinement down to 1/3° around the equator. A restoring term towards climatological sea surface salinities (Levitus, 1998) is added to the freshwater budget equation to avoid spurious model drift. Both sea ice models are coupled to OPA following the formulation of Goosse and Fichefet (1999).

Model forcing
The ocean-sea ice models are driven by atmospheric reanalyses and various climatologies. We use NCEP/NCAR daily values of 2 m air temperature, and 10 m u- and v-wind components (Kalnay et al., 1996), together with monthly climatologies of relative humidity (Trenberth et al., 1989), total cloudiness (Berliand and Strokina, 1980) and precipitation (Large and Yeager, 2004). River runoff rates are derived from Dai and Trenberth (2002). The use of some climatological forcings is motivated by the questionable reliability of the corresponding NCEP/NCAR atmospheric data sets in the polar regions (Bromwich et al., 2007; Walsh et al., 2009; Vancoppenolle et al., 2011) as well as the realistic global sea ice cover obtained in similar studies with these climatologies (Timmerman et al., 2005; Vancoppenolle et al., 2009b). These forcing fields are all spatially interpolated onto the ORCA1 grid. The atmosphere-sea ice turbulent and radiative heat fluxes follow the formulation of Goosse (1997). Surface wind stress on sea ice is computed using a quadratic bulk formula, with respective drag coefficients for LIM2 and LIM3 tuned after model calibration.

Simulations setup
Both simulations start in 1948, but only the period 1983-2007 is compared to observations (as explained in the next section). We use initial sea temperature and salinity fields from Levitus (1998). Where sea surface temperature is below 0 °C, LIM2 (LIM3) initial sea ice thickness is set to 3.0 m (3.5 m) and 1 m (1 m) for the NH and SH, respectively.
At the same grid locations, LIM2 (LIM3) initial snow thickness is set to 0.5 m (0.3 m) and 0.1 m (0.1 m). Given the 35 years of spinup, the slight difference in Arctic initial ice and snow thicknesses used as standard values for those two model versions has virtually no influence on the sea ice properties during the investigation period. For both simulations, initial concentrations are prescribed to 95 % in the NH and 90 % in the SH. The ocean model has a time step of 3600 s (1/24 day). Both sea ice models are called every 6 h, i.e. every 6 ocean time steps. Finally, the ocean-sea ice drag coefficients are set to 5.0 × 10⁻³ in both models.

Sea ice metrics
It is convenient to develop a set of metrics to quantify the performance of the two models. However, caution must be taken. As underlined by Knutti (2010) for climate models, the choice of a metric is dependent on the intended application. Hence, we are not claiming that the metrics described below are exhaustive. They form a baseline for evaluating sea ice models at climatic resolution and are especially designed for seasonal to multi-decadal simulations. Particular attention has been paid to the following points:
- We base our metrics on the three main prognostic sea ice state variables: concentration, thickness and drift.
- We use the same metrics for both hemispheres. In this manner, we are able to compare the hemispheric performances on a common basis.
- When observations are sufficiently well distributed in time and space, we evaluate models both on their mean state as well as on their variability (seasonal to interannual).
- When possible, we evaluate models on their ability to reproduce observations at local and regional scales. In this perspective, a simulation characterized by errors of opposite sign in different regions, which compensate when averaged globally, is still penalized.
We chose to focus on the 1983-2007 period. Although satellite measurements of ice concentration and drifts are available from 1979, we decided to exclude the first 4 years because of a known bias towards warm temperatures in the NCEP/NCAR reanalyses during fall 1980 and winter 1981, along Siberia and Alaska (Tartinville et al., unpublished manuscript). For consistency, we also excluded the years 1979-1982 from the diagnostics in the SH.
We discuss hereafter our choice of the metrics that are used in Sect. 4 to evaluate both simulations. We chose to proceed in two steps for each of the variables: (1) compute a set of model versus observation errors (in absolute value) and (2) scale these errors by typical, acceptable values of errors. This procedure has the advantage of making inter-variable and inter-hemispheric comparisons possible. Thus, we get positive metrics, and lower values indicate higher skill. Moreover, metrics below 1 (above 1) indicate a better (worse) performance than expected.
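The two-step procedure above can be sketched in a few lines of Python. This is only an illustration: the function name `scaled_metric` and the numerical values are hypothetical and are not taken from Table 2.

```python
def scaled_metric(model_value, observed_value, typical_error):
    """Step 1: absolute model-observation error. Step 2: scale by a typical error."""
    if typical_error <= 0:
        raise ValueError("typical_error must be positive")
    return abs(model_value - observed_value) / typical_error


# Hypothetical values, for illustration only.
metric = scaled_metric(model_value=12.4, observed_value=11.9, typical_error=1.0)
better_than_expected = metric < 1.0  # metrics below 1 indicate better skill than expected
```

Because every metric is dimensionless after scaling, values for different variables or hemispheres can be placed side by side.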

Sea ice concentration and extent
We use the global sea ice concentrations from the Scanning Multichannel Microwave Radiometer (SMMR) and the Special Sensor Microwave/Imager (SSM/I) reprocessed by the EUMETSAT Ocean and Sea Ice Satellite Application Facility (EUMETSAT OSISAF, 2010). The observations are available over the period 1983-2007. We interpolated this data set onto the model grid with a bilinear scheme to allow pointwise (grid cell by grid cell) comparison. This interpolation also avoids the presence of systematic bias in ice extent due to the difference in land-sea mask between model and observations.
For each grid cell, we compute modelled and observed (a) monthly mean ice concentration over 1983-2007 (i.e. the mean seasonal cycle of ice concentration), (b) standard deviation of monthly anomalies of ice concentration over 1983-2007, and (c) ice concentration trend computed from linear regression on monthly anomalies over 1983-2007. For (a), we calculate the mean absolute difference between model and observations over the climatology. For (b) and (c), we retain the absolute difference between model and observations. Finally, we average these errors spatially for each hemisphere, weighted by grid cell areas. In summary, we evaluate a model on its ability to reproduce regional patterns of seasonal cycle and interannual variability. Rows 1-6 of Table 2 show the corresponding metrics (i.e. the errors described in the previous paragraph, scaled by typical errors).
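As a minimal sketch of the per-cell diagnostics (a), (b) and (c) and of the area-weighted average, the following standard-library Python may help. All function names are hypothetical. The sketch assumes monthly series that start in January and cover whole years, and it uses the population standard deviation, a choice the text does not specify.

```python
from statistics import mean, pstdev


def monthly_climatology(series):
    """Mean seasonal cycle: 12 values (series assumed to start in January)."""
    return [mean(series[m::12]) for m in range(12)]


def monthly_anomalies(series):
    """Departures of each month from its climatological mean."""
    clim = monthly_climatology(series)
    return [x - clim[i % 12] for i, x in enumerate(series)]


def linear_trend(values):
    """Least-squares slope per time step (here, per month)."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = mean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def cell_errors(model, obs):
    """Errors (a), (b), (c) for one grid cell."""
    clim_m, clim_o = monthly_climatology(model), monthly_climatology(obs)
    anom_m, anom_o = monthly_anomalies(model), monthly_anomalies(obs)
    err_cycle = mean(abs(a - b) for a, b in zip(clim_m, clim_o))
    err_std = abs(pstdev(anom_m) - pstdev(anom_o))
    err_trend = abs(linear_trend(anom_m) - linear_trend(anom_o))
    return err_cycle, err_std, err_trend


def area_weighted_mean(values, areas):
    """Spatial average of per-cell errors, weighted by grid cell area."""
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)
```

Calling `cell_errors` on every cell of a hemisphere and passing each of the three error lists to `area_weighted_mean` gives the three hemispheric errors, which are then scaled by typical errors.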
We adopt a similar strategy for evaluating sea ice extent: we calculate the ice extent as the total area of grid cells covered by more than 15 % of ice, based on monthly mean data of ice concentration. For each hemisphere, we compute modelled and observed (a) monthly mean ice extent over 1983-2007 (i.e. the mean seasonal cycle of ice extent), (b) standard deviation of monthly anomalies of ice extent over the same period, and (c) ice extent trend computed from linear regression on monthly anomalies over 1983-2007. For (a), we compute the mean absolute difference between model and observations over the 12 months; for (b) and (c), we calculate the absolute difference between model and observations. In summary, we evaluate a model on its ability to reproduce large-scale patterns. These 6 errors (3 for each hemisphere) are expressed in 10⁶ km² and we scale them by typical errors shown in Table 2, rows 7-12, to get the corresponding metrics. Note that these metrics are less restrictive than the metrics for ice concentration: errors in ice concentration in one region can be compensated by errors of opposite sign elsewhere, with no net impact on the total ice extent.
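The extent definition and the compensation effect noted above can be illustrated with a short sketch. The function name, the cell areas and the concentration values are hypothetical, and concentrations are assumed to be expressed as fractions between 0 and 1.

```python
def ice_extent(concentrations, cell_areas, threshold=0.15):
    """Total area of the grid cells whose ice concentration exceeds the threshold."""
    return sum(a for c, a in zip(concentrations, cell_areas) if c > threshold)


# Hypothetical three-cell example (areas in km^2).
areas = [1.0e4, 1.0e4, 1.0e4]
observed = [0.90, 0.50, 0.10]
modelled = [0.60, 0.80, 0.10]

extent_error = abs(ice_extent(modelled, areas) - ice_extent(observed, areas))
concentration_error = sum(abs(m - o) for m, o in zip(modelled, observed)) / len(observed)
```

In this toy case the extent error is zero although the mean absolute concentration error is 0.2, which is why the extent metrics are the less restrictive of the two.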

Sea ice thickness and draft
In the NH, sea ice thickness has been measured since 1958 by Upward Looking Sonars (ULS) onboard submarines. The quantity effectively measured is the ice draft, defined as the ice thickness below sea level (usually around 90 % of the total ice thickness). In our study we use draft data from the National Snow and Ice Data Center (1998, updated 2006) during the period 1983-2000. This data set, described in detail by Rothrock et al. (2008), includes about 30 cruises from which we used the mean drafts taken from more than 3000 50 km-long transects. These averages, including open water, are used here as a basis for the evaluation. In the SH, we use the ASPeCT data set of Worby et al. (2008). Here, the sea ice thicknesses are estimated visually from ships when they break the ice and turn it sideways. We only retained observations that are at least 6 nautical miles apart, to ensure independence of each observation. This data set covers about 14 000 observations over the period 1983-2005. For each individual measurement, we pick the model thickness/draft with corresponding year and month, and whose grid cell coordinates are the closest to the observation location. Then, for each hemisphere, we average all absolute differences in thickness/draft (including open water weighting) with equal weight. For the NH (only), we also calculate the absolute error on relative thinning inside the ice pack, $|T_m - T_o|$, where
$$T_j = \frac{h_j^{1983-1991} - h_j^{1992-2000}}{h_j^{1983-1991}}, \quad j = m, o \qquad (1)$$
with $h_m^{yr1-yr2}$ and $h_o^{yr1-yr2}$ denoting the modelled and observed mean thickness in the central Arctic (latitude higher than 80° N) over all ULS locations between years yr1 and yr2. We are interested in quantifying $|T_m - T_o|$ because there is a strong climatic signal over the last decades in Arctic mean ice draft (Rothrock et al., 2008; Lindsay and Zhang, 2005). However, we did not include any counterpart in the Antarctic due to large spatio-temporal gaps in the ASPeCT data set of Worby et al. (2008).
In summary, we evaluate here the model for each hemisphere in terms of mean absolute error with respect to observed draft/thickness. The corresponding metrics are shown in rows 13 and 15 of Table 2. In addition, for the NH, we also retain the absolute difference between modelled and observed relative changes over the period of interest. The corresponding metrics are shown in row 14 of the same table.
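The collocation and error steps described for draft/thickness can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually coded for the study: the function names are hypothetical, the nearest cell is found with a great-circle distance, and the relative thinning is assumed to be normalised by the mean thickness of the earlier period.

```python
import math


def nearest_cell(lat, lon, cell_lats, cell_lons):
    """Index of the grid cell whose centre is closest to (lat, lon), in degrees."""
    def central_angle(lat1, lon1, lat2, lon2):
        # Haversine formula for the angle between two points on a sphere.
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dlon = math.radians(lon2 - lon1)
        h = (math.sin((p2 - p1) / 2) ** 2
             + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
        return 2 * math.asin(min(1.0, math.sqrt(h)))

    return min(range(len(cell_lats)),
               key=lambda k: central_angle(lat, lon, cell_lats[k], cell_lons[k]))


def mean_absolute_error(model_values, observed_values):
    """Equal-weight average of absolute model-observation differences."""
    pairs = list(zip(model_values, observed_values))
    return sum(abs(m - o) for m, o in pairs) / len(pairs)


def relative_thinning(h_early, h_late):
    """Fractional thinning between two periods (assumed relative to the earlier one)."""
    return (h_early - h_late) / h_early
```

For each observation, `nearest_cell` selects the model value for the matching year and month; the NH thinning error is then `abs(relative_thinning(...) - relative_thinning(...))` evaluated for model and observations.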

Sea ice drift and Fram export
We use the data set of Fowler (2003, updated 2007) from SMMR-SSM/I satellite observations. The data cover the period 1983-2006 for both hemispheres. As for ice concentration, we interpolated monthly values of ice drift onto the model grid.
Arctic sea ice drift can be viewed as the superposition of a mean field and stochastic perturbations (Rampal et al., 2009). In this study, we are not considering the turbulent-like fluctuations. For evaluating the mean circulation appropriately, Rampal et al. (2009) suggest specific spatio-temporal averaging scales of ∼2.5 months and 200 km in summer, and ∼5.5 months and 500 km in winter. We follow these recommendations for Arctic sea ice. As a first guess, and because no equivalent study exists in the SH to our knowledge, we transpose this averaging method to the Antarctic sea ice. We proceed in two steps for each hemisphere and for each season. (a) We compute the mean kinetic energy per unit mass of the ice drift:
$$\langle KE \rangle = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \left( u_i^2 + v_i^2 \right) \qquad (2)$$
where $u_i$ and $v_i$ are the zonal and meridional components of ice drift after spatial smoothing ($i = 1, \ldots, N$ denote the grid cells), respectively. (b) We compute spatial correlation as the mean of componentwise spatial correlations between model and observations:
$$C = \frac{1}{2} \left[ \mathrm{corr}(\mathbf{u}_m, \mathbf{u}_o) + \mathrm{corr}(\mathbf{v}_m, \mathbf{v}_o) \right] \qquad (3)$$
where $\mathbf{u}_j = [u_1, \ldots, u_N]_j$ and likewise for $\mathbf{v}_j$ ($j = m, o$). Again, the indices "m" and "o" denote model and observations, respectively. Note that only grid cells containing non-zero modelled and observed drifts were taken into account for the evaluation.
In summary, we evaluate here the model drift in terms of magnitude and circulation. For each hemisphere, we average <KE> and C (Eqs. 2 and 3) over all summers and winters. We are thus left with 2 errors for kinetic energy (one for each hemisphere, in J kg⁻¹) and 2 mean correlations (with no units). As higher correlation indicates higher skill (contrary to all other errors discussed in this paper), we subtract them from 1 to get an error-like correlation (i.e. 0 is the best score, 2 the worst). We obtain our metrics of ice drift after scaling these 4 errors with typical errors (rows 16-19 of Table 2).
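A minimal sketch of the two drift diagnostics for one season and one hemisphere is given below. It rests on assumptions that the text does not spell out: the kinetic energy error is taken as the absolute difference between modelled and observed spatial means (by analogy with the other errors), correlations are Pearson correlations, and a cell counts as "non-zero" when its drift speed is non-zero. All names are hypothetical.

```python
import math


def mean_kinetic_energy(u, v):
    """Spatial mean kinetic energy per unit mass (J/kg) for drift components in m/s."""
    return sum(0.5 * (ui * ui + vi * vi) for ui, vi in zip(u, v)) / len(u)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drift_errors(u_m, v_m, u_o, v_o):
    """Return (kinetic energy error, 1 - mean componentwise correlation)."""
    # Keep only cells where both modelled and observed drifts are non-zero.
    keep = [i for i in range(len(u_m))
            if (u_m[i] != 0.0 or v_m[i] != 0.0) and (u_o[i] != 0.0 or v_o[i] != 0.0)]
    um, vm = [u_m[i] for i in keep], [v_m[i] for i in keep]
    uo, vo = [u_o[i] for i in keep], [v_o[i] for i in keep]
    ke_error = abs(mean_kinetic_energy(um, vm) - mean_kinetic_energy(uo, vo))
    correlation = 0.5 * (pearson(um, uo) + pearson(vm, vo))
    return ke_error, 1.0 - correlation
```

A model whose drift field is everywhere twice the observed one would obtain a perfect correlation score (0) but a large kinetic energy error, which is why both diagnostics are kept.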
We also evaluate the model on its export of sea ice at Fram Strait. Integrated monthly exports of sea ice area and volume are available over 1983-2007 from a combination of high-quality data from different sensors onboard moorings and satellites with high spatial and temporal coverage (Kwok et al., 2004;Spreen et al., 2009). We break down the signals of monthly areal and volume export into (a) the mean seasonal cycle and (b) monthly anomalies. For (a), we compute the mean absolute difference between model and observations over the 12 months of the year. For (b), we compute the absolute difference between modelled and observed standard deviation of ice export anomalies. The corresponding errors in mean seasonal cycle and standard deviations of the monthly anomalies of volume and area ice export are then scaled to give the metrics given in rows 20-23 of Table 2.

Results
We summarize the two models' performance in Table 2. The left and right columns of this table correspond to the simulations using LIM2 and LIM3, respectively. The rows of the table correspond to the diagnostics defined in the previous section: 6 for ice concentration (3 per hemisphere), 6 for ice extent (3 per hemisphere), 3 for ice draft and thickness (2 for NH, 1 for SH), 4 for ice drift (2 per hemisphere) and 4 for Fram export (2 for area, 2 for volume).

Sea ice concentration and extent
In the NH, LIM3 clearly outperforms LIM2. LIM3 is consistent with observations not only for mean state (Fig. 1) but also for interannual variability (Fig. 2b). LIM3 is particularly skillful in summer months: it catches the September 2007 minimum and displays realistic trend and monthly anomalies. LIM2 systematically overestimates the mean sea ice extent, particularly during summer months. It simulates too little interannual variability, particularly from July to October. Linear trends of ice extent computed from classical regression are excellent in both models, but LIM2 underestimates the magnitude of observed deviations such as in September 1996, 2005 and 2007, whereas LIM3 is skillful in this respect. Rows 1-3 and 7-9 of Table 2 summarize these findings. These metrics are in agreement with the statements drawn from Figs. 1 and 2: LIM3 shows convincingly better performance than LIM2 in terms of ice concentration and extent in the NH.
In the SH, LIM3 exhibits a better seasonal cycle of ice extent than LIM2 but tends to overestimate the summer interannual variability (Figs. 1, 2d). As in the NH, LIM2 overestimates the mean seasonal extent throughout the year. Despite this systematic bias, the distribution of interannual variations of ice extent around the monthly climatologies is better reproduced in LIM2. This is mainly due to the absence of peaks present in LIM3 (Fig. 2d). Rows 4-6 and 10-12 of Table 2 confirm these statements. Note that, following the metrics developed for ice concentration and extent in this table, LIM2 and LIM3 display similar performance in the SH, each of them showing the best score 3 times out of 6.

Sea ice thickness and draft
Figure 3a-b shows the spatial distribution of draft errors (model minus observations) in the Arctic for LIM2 and LIM3. As shown in this figure, the spatial sampling of ice draft is limited to the central part of the basin. From available observations, we note that LIM2 simulates too thick ice in general. In this respect, LIM3 is more realistic, but also overestimates the ice thickness in the Beaufort Sea. Observations in the central Arctic (latitude higher than 80° N) reveal a relative thinning between the periods 1983-1991 and 1992-2000 of 23.5 % (see Table 3). LIM2 (LIM3) simulates a corresponding relative thinning of 16.2 % (20.2 %).
The distribution of ice thickness errors in the SH is depicted in Fig. 3c-d. Both models overestimate the ice thickness in the eastern part of the Weddell Sea (with a stronger overestimation for LIM2), and underestimate ice thickness along the Antarctic Peninsula (western part of the Weddell Sea). Patterns of error are similar along the coasts of East Antarctica, with an underestimation close to the coasts and an overestimation away from them. Finally, in the Amundsen and Ross Seas, LIM3 shows better skill than LIM2. Overall, the metrics for ice thickness in the SH are favourable to LIM3 (Table 2). Note that the typical errors for NH and SH (1 m and 0.15 m) have been chosen proportional to the mean observed draft/thickness of all corresponding records. Comparatively, the two models thus have more skill in the NH.
Table 3. Summary statistics for the two simulations, and comparison with observations (NA = Not Available) of: mean annual sea ice extent, standard deviation of the monthly anomalies of ice extent, mean annual volume, standard deviation of the monthly anomalies of ice volume, mean ice draft in central Arctic (latitude > 80° N) between 1983 and 1991, and between 1992 and 2000.

Sea ice drift and Fram export
The observed and simulated annual mean ice drifts in the NH are shown in Fig. 4a-b. Both models show the expected circulation: an anti-cyclonic gyre in the Beaufort Sea and the presence of a Transpolar Drift, from the coasts of Eastern Siberia to Fram Strait. Ice drift within the ice pack is underestimated in LIM2 but in good agreement with observations at Fram Strait and northwards. LIM3 simulates realistic drift within the ice pack but overestimates the ice export at Fram Strait. Annual mean cycles of ice export at Fram Strait (Fig. 5a) indicate that LIM3 overestimates the monthly mean areal export of ice through Fram Strait nearly all year long. LIM2 is closer to observations, but exhibits a weaker seasonal cycle than expected. Furthermore, the interannual variability of this areal export is generally overestimated in LIM3, while LIM2 is closer to observations. Monthly mean volume exports (Fig. 5b) are better represented in LIM3, but their interannual variability is more realistic in LIM2. Figure 4c-d shows the annual mean simulated and observed drifts in the SH. Both models feature the same distribution of ice drift. They largely overestimate the magnitude of the drift away from East Antarctica. The observed northward export of sea ice in the Ross Sea is also exaggerated but has the right direction. In the Weddell Sea, the simulations show a reasonable magnitude but the simulated velocities are too zonal. The metrics in Table 2 (rows 18-19 of the table) reveal that simulated ice drift in the SH is worse than in the NH, for the same typical errors.

Discussion
In the previous section we developed a set of metrics for each of the simulations described in Sect. 2. We illustrated the differences of skill with appropriate figures of ice extent (Figs. 1 and 2), draft/thickness (Fig. 3), drift (Fig. 4) and export at Fram Strait (Fig. 5). In this section, we discuss some hypotheses about the physical processes and mechanisms that could be responsible for the differences in model skill. We chose to split the discussion by hemisphere as suggested by our metrics in Table 2.

Northern Hemisphere
LIM3 presents a more faithful representation of sea ice draft, concentration and extent than LIM2. In particular, LIM3 shows more realistic seasonal to interannual variability than LIM2. We suggest that this is mainly due to the difference of representation of the ice thickness distribution (ITD) in the two models. As an illustration, we show in Fig. 6 the distribution of mean ice thickness in LIM3 in a given area at the beginning of spring 2007 (green bars), and the corresponding virtual distribution of LIM2 (red line) for the same mean thickness and concentration. We chose this particular box because it contains the actual ice edge and thus encloses much variability. As shown in Fig. 6, LIM2 artificially resolves the ice thickness distribution by correcting the ice and snow thermal conductivities assuming that snow and ice are uniformly distributed between zero and twice their mean values over the ice-covered portion of the grid cells (Fichefet and Morales Maqueda, 1997). This correction was originally included to improve the heat fluxes representation, but underestimates the concentration of thin ice in early spring, as shown in the figure. For the same mean thickness, the reductions in ice concentration and thickness are thus enhanced in LIM3 compared to LIM2 when melting occurs. To a large extent, this is a result of the sensitivity of the identical parameterization of surface albedo in the two models (Shine and Henderson-Sellers, 1985) to different ice thickness distributions. The mean LIM2 (LIM3) albedo over the ice-covered surface shown in Fig. 6 is 0.52 (0.46), indicating that the summer melt is enhanced with a multi-category sea ice model. In this context, it is not surprising that LIM3 shows better skill both for mean state and variability of ice concentration (rows 1-3 of Table 2). Due to the presence of its multi-category framework, this model is more responsive to changes in atmospheric forcing, from seasonal to interannual scales.
It simulates realistic seasonal cycle and anomalies of ice extent (Figs. 1 and 2). The total Arctic mean ice volume is significantly reduced when switching from LIM2 to LIM3 (Table 3), but the associated interannual variability is enhanced (no observations are available). This is compatible with our hypothesis that the simple approximation of ice thickness distribution in LIM2 retains too much ice in summer (and thus in winter), and that this model is less sensitive to atmospheric variability than LIM3.
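The contrast between the virtual uniform distribution of LIM2 and a multi-category distribution can be made concrete with a toy calculation. Under a uniform distribution between zero and twice the mean thickness, the fraction of the ice cover thinner than a given value h is h divided by twice the mean. The five-category distribution below is purely hypothetical (it is not the one shown in Fig. 6) and was chosen only so that it has the same mean thickness.

```python
def uniform_fraction_below(h, mean_thickness):
    """Fraction of ice thinner than h for a uniform distribution on [0, 2 * mean]."""
    return min(max(h / (2.0 * mean_thickness), 0.0), 1.0)


def category_mean(fractions, thicknesses):
    """Mean thickness of a discrete distribution (fractions sum to 1)."""
    return sum(f * h for f, h in zip(fractions, thicknesses))


def category_fraction_below(h, fractions, thicknesses):
    """Fraction of ice in the categories whose thickness is below h."""
    return sum(f for f, t in zip(fractions, thicknesses) if t < h)


# Hypothetical 5-category distribution (fraction of ice cover, thickness in m).
fractions = [0.40, 0.25, 0.20, 0.10, 0.05]
thicknesses = [0.3, 0.8, 1.5, 2.3, 3.0]

mean_h = category_mean(fractions, thicknesses)                  # about 1.0 m
thin_uniform = uniform_fraction_below(0.5, mean_h)              # about 0.25
thin_categories = category_fraction_below(0.5, fractions, thicknesses)
```

With these illustrative numbers, 40 % of the ice cover is thinner than 0.5 m in the multi-category case against about 25 % for the uniform distribution with the same mean, which is the kind of thin-ice deficit discussed above.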
It is argued here that the multi-category framework in LIM3 is the primary source of its differences with LIM2, and this is supported by other studies: by increasing the number of ice categories in a multi-category sea ice model run in a climate model framework, Bitz et al. (2001) and  found differences in ice thickness between 1 and 2 m, averaged over the whole Arctic Basin. However, there are other processes that might contribute to the differences between LIM2 and LIM3. For instance, Vancoppenolle et al. (2009a) tested the impact of salinity variations in LIM3 compared to the reference Bitz and Lipscomb (1999) model, where the salinity profile is prescribed. They found differences in ice thickness locally up to 1 m in the Arctic, but on the order of 30 cm averaged over the whole Arctic Ocean. They also showed that including salinity variations reduces the model bias compared to ULS ice draft data. Hence, differences in the thermodynamics and halodynamics between LIM2 and LIM3 must account for a significant part of the differences between the two models, but quantifying their role precisely is difficult, given the other differences in the model formulation (in particular the multi-category framework) that most likely dominate the differences in simulated ice thickness between the two models.
Regarding the dynamics, LIM2 shows a better agreement with observations in terms of mean kinetic energy (Table 2). An examination of all seasonal mean ice drifts from 1983 to 2006 (not shown here) revealed that both models generally overestimate the mean kinetic energy. The shift towards higher speeds in LIM3 is possibly due to three additive effects. First, the elastic-viscous-plastic rheology (on which LIM3 is based) is more responsive to wind forcing than the viscous-plastic (LIM2) rheology, particularly for high (> 0.9) ice concentrations (Hunke and Dukowicz, 1997). We indeed observed that LIM3 simulates higher ice speeds within the ice pack in winter than LIM2. Second, the ice resistance to compression is a monotonic function of ice concentration and thickness. Along the ice edge, LIM3 simulates thinner and less concentrated ice, as explained in the first paragraph of this section. Consequently, the mean sea ice drift at the ice edge is larger than that of LIM2, as shown in Fig. 4. Lastly, sea ice is particularly sensitive to the value of the air-sea ice drag coefficient (C_a = 1.0 × 10⁻³ for LIM2, C_a = 1.4 × 10⁻³ for LIM3) and it should be noted at this point that the model parameters (particularly C_a) have not originally been tuned to optimize sea ice drift. We also note that the spatial patterns of ice circulation in the Arctic Basin are not appreciably different between the two experiments, as the metrics in Table 2 suggest. It should be recalled that both experiments are driven by atmospheric reanalyses; given the high dependence of sea ice dynamics on wind forcing (Girard et al., 2009), similar patterns were expected.
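To give a sense of the third effect, the quadratic bulk formula for the wind stress on sea ice can be sketched as below with the two drag coefficients quoted above. The air density value and the neglect of the ice velocity relative to the wind are assumptions made for this illustration, and the function name is hypothetical.

```python
import math

RHO_AIR = 1.3  # kg m^-3, assumed air density (not given in the text)


def wind_stress(u10, v10, drag_coefficient, rho_air=RHO_AIR):
    """Quadratic bulk formula tau = rho_a * C_a * |U| * U, ice velocity neglected."""
    speed = math.hypot(u10, v10)
    factor = rho_air * drag_coefficient * speed
    return factor * u10, factor * v10


# Same 10 m/s zonal wind, drag coefficients of LIM2 and LIM3 respectively.
tau_lim2 = wind_stress(10.0, 0.0, 1.0e-3)
tau_lim3 = wind_stress(10.0, 0.0, 1.4e-3)
stress_ratio = tau_lim3[0] / tau_lim2[0]
```

For identical winds, the stress applied to the ice is therefore 40 % larger with the LIM3 coefficient, independently of the assumed air density.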
Finally, the grid formulation (B-grid for LIM2, C-grid for LIM3) seems to have an influence on the ice export at Fram Strait. A schematic representation of Fram Strait is given in Fig. 7. The actual ORCA1 grid resolves Fram Strait with 9 grid cells, but an example with 3 cells is sufficient to illustrate our reasoning. On a B-grid, the ice velocity vectors are computed at the lower-left corners of the grid cells. Because of the no-slip conditions and the presence of the land-sea mask, only 2 non-zero velocities are taken into account when calculating the total export of ice. In LIM3, however, the ice velocities are defined at the centre of the cell edges. Thus 3 non-zero velocities are taken into account for the interpolation to the centre of the grid cells. In conclusion, ceteris paribus, the B-grid formulation tends to simulate less ice export than the C-grid formulation. This effect, combined with higher drifts in LIM3 (see previous paragraph), yields a higher mean areal export at Fram Strait for LIM3 (Fig. 5). Note, however, that the volume export at Fram Strait is more faithfully simulated in LIM3: higher drifts compensate for thinner ice north of Fram Strait (Fig. 3), whereas LIM2 has ice that is too thick at the same location and accordingly an excessive mean volume export through Fram Strait. The difference between the B- and C-grid formulations can also explain the better reproduction of ice thickness along the Canadian Archipelago in LIM3: the Parry Channel (connecting Baffin Bay and Beaufort Sea) is resolved with 2 grid cells on ORCA1. For the same reasons as explained above, sea ice tends to accumulate faster in LIM2 (with a B-grid) because its flow through the channel is underestimated.
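The counting argument above can be illustrated with a minimal sketch of a straight channel that is a given number of cells wide. It assumes a uniform along-channel ice speed at every non-coastal velocity point, zero velocity at coastal corners (no-slip), and, for the B-grid, a face velocity taken as the mean of the two adjacent corner values. The function names and this averaging rule are illustrative assumptions and do not reproduce the actual LIM2 or LIM3 discretization.

```python
def export_b_grid(n_cells, speed, dx=1.0):
    """Transport across a channel with velocities defined at cell corners.

    A channel n_cells wide has n_cells + 1 corners; the two coastal
    corners are set to zero by the no-slip condition, leaving
    n_cells - 1 non-zero velocities.
    """
    corners = [0.0] + [speed] * (n_cells - 1) + [0.0]
    # Assumed: each cell face moves at the mean of its two corner velocities
    return sum(0.5 * (corners[i] + corners[i + 1]) * dx for i in range(n_cells))


def export_c_grid(n_cells, speed, dx=1.0):
    """Transport across a channel with velocities defined at cell-edge centres.

    Every cell across the channel carries one non-zero velocity.
    """
    return sum(speed * dx for _ in range(n_cells))


# 2 cells (Parry Channel), 3 cells (schematic Fram Strait), 9 cells (ORCA1 Fram Strait)
for n in (2, 3, 9):
    ratio = export_b_grid(n, 1.0) / export_c_grid(n, 1.0)
    print(f"{n} cells: B-grid / C-grid export ratio = {ratio:.3f}")
```

In this idealization the B-grid export is a fraction (n - 1)/n of the C-grid export for a channel n cells wide, so the underestimation is most pronounced for narrow passages such as the Parry Channel and becomes smaller for the 9 cells of Fram Strait.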

Southern Hemisphere
A careful look at Table 2 suggests that the performances of LIM2 and LIM3 in the SH are comparable for ice thickness and drift, and that neither model systematically outperforms the other for ice concentration and extent. We advance 3 possible reasons for this observation. First and foremost, the quality of the atmospheric reanalyses (NCEP/NCAR) in the SH is lower on average than in the NH, essentially due to the sparse spatial coverage of records in Antarctica and the Southern Ocean (Bromwich et al., 2007). Substantial biases in the surface energy budget due to errors in the reanalysis have been suggested (Vihma et al., 2002). It is also worth mentioning that the poor representation of the Antarctic Peninsula in the reanalysis land-sea mask introduces a bias in the representation of winds, with an overestimation of westerlies (Timmerman et al., 2004). Accordingly, the simulated ice accumulates (is drifted away) immediately west (east) of the peninsula, and the simulated ice thickness is thus overestimated (underestimated) at these locations (Fig. 3). The biases in winds are also potentially responsible for the unrealistic magnitude of the drift as depicted in Fig. 4. Second, one has to bear in mind that both simulations have been carried out at a coarse (1°) resolution. Important small-scale ocean processes (e.g. eddies) are not represented in the models, although they transport considerable amounts of heat and momentum (Rintoul et al., 2001). Consequently, sea ice thicknesses (Fig. 4) and concentrations (not shown here) are misrepresented in both simulations along the ice edge. Finally, the actual mean ice thickness in the SH is smaller than in the NH. This implies that the representation of sea ice thermodynamical and dynamical processes might be less important for model performance in Antarctica, where the skill of the models depends more on factors other than the sea ice model physics.

Conclusions
We have investigated the sensitivity of an ocean-sea ice model to the representation of physics in its sea ice component: two hindcast simulations have been studied over the period 1983-2007, for both Arctic and Antarctic sea ice, with an ocean General Circulation Model driven by atmospheric reanalyses and various climatologies. For the purposes of this study, we have developed a set of comprehensive metrics designed for sea ice. These metrics involve the main sea ice characteristics (i.e. concentration, thickness and drift), focus on both regional and global scales, and take mean state as well as variability into account. We chose to define all our metrics as the ratio of the actual error of the model with respect to observations to a typical, or acceptable, error. The use of our metrics can extend beyond the purpose of this study and could be valuable for assessing the performance of fully coupled GCMs in the polar regions in terms of mean sea ice cover and variability. Following our metrics, we obtained results similar to those of Timmerman et al. (2005) and Vancoppenolle et al. (2009b). We concluded that the model skill in the NH was highly dependent on the representation of physics for ice concentration, extent and thickness. We suggested that the inclusion of a detailed ice thickness distribution (ITD) in one of the models enhanced the interannual variability of sea ice extent, and significantly reduced, and thereby improved, the simulated ice thickness in the Arctic. We also emphasized that the explicit formulation of brine entrapment and drainage in this model could reinforce the effects of the ITD, with higher melt rates associated with the more sophisticated thermodynamical module. Regarding ice dynamics, the simplest model (with viscous-plastic rheology) was found to be in better overall agreement with observations, but still too energetic. The other model (with elastic-viscous-plastic formulation) performed worse, probably due to its more responsive rheology and to a higher air-sea ice drag coefficient. Both simulations showed similar patterns of drift, most likely due to the strong dependence of sea ice drift on the identical wind forcing. In the SH, limitations in terms of model skill do not stem from model physics but rather from external causes, such as resolution and atmospheric forcing. Neither model systematically outperforms the other, and the global performance is lower in the SH than in the NH.

Acknowledgements
NCEP Reanalysis data were provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at http://www.esrl.noaa.gov/psd/. The global sea ice concentration reprocessed data set was provided by the Ocean and Sea Ice Satellite Application Facilities (OSISAF). We particularly thank T. Lavergne for useful discussions regarding the use of these data. The Upward Looking Sonar ice draft profiles were provided by the National Snow and Ice Data Center (NSIDC, www.nsidc.org). The ship-based sea ice and snow thickness data were provided by the SCAR Antarctic Sea Ice Processes and Climate (ASPeCt) program (www.aspect.aq). The sea ice motion vectors were provided by the NSIDC.