
Saturated surfaces in Antarctica

September 9, 2016

This post is mostly about production rate scaling models, but it is also neat because it's a genuinely useful application of the ICE-D Antarctic exposure age database. It's a pretty simple one, but hey, mighty oaks from little acorns grow. This is another immensely long post, but the short summary is that a look at samples from Antarctica with very high Be-10 and Al-26 concentrations seems to indicate that the relatively recent, and extremely complicated, production rate scaling model of Nat Lifton and colleagues (the "LSD" scaling method) works better at high latitudes than the simpler and very widely used scaling method based on the work of Devendra Lal in the 1980s. This is interesting because so far it has been quite difficult to determine whether there is any substantive difference in performance between these scaling methods.

We start with some recent Be-10 measurements by Gordon Bromley of the University of Maine on sandstone boulders, presumably glacially transported, at the same site in the southern Transantarctic Mountains where we recently measured what appears to be the highest concentration of cosmogenic He-3 ever observed in a terrestrial sample. These Be-10 measurements are interesting because, even after a reasonably thorough effort to make sure that no mistakes were made, they appear to show that these samples contain an impossible amount of Be-10. What does impossible mean? Basically, if a surface is left undisturbed and exposed to the cosmic-ray flux long enough, the concentration of cosmic-ray-produced Be-10 will eventually build up to a level high enough that the rate at which Be-10 is lost to radioactive decay equals the production rate. At this point, the amount of Be-10 can't increase any more. So this relationship provides a limit on the maximum possible concentration of Be-10 that can be present in quartz at a particular site. Generally this condition is referred to as either 'production-decay equilibrium,' or, more commonly, just 'saturation.' In math, accumulation of Be-10 in a non-eroding surface is governed by the differential equation:

\frac{dN}{dt} = P - N\lambda

where N is the Be-10 concentration (atoms/g), P is the production rate at the site (atoms/g/yr), and \lambda is the decay constant for Be-10 (4.99e-7 /yr). For a non-eroding surface the solution is N(t) = (P/\lambda)(1 - e^{-\lambda t}), so as t becomes large, dN/dt approaches zero and saturation occurs when N = P/\lambda. Thus, if we know the production rate at some site, we equivalently know the saturation concentration.

At Gordon's sample site, the Be-10 production rate (calculated with the Antarctic atmosphere model and scaling scheme of Stone (2000) and the "primary" production rate calibration data set of Borchers and others (2015)) is 38.2 atoms/g/yr, which means the saturation concentration for Be-10 is 7.65 x 10^7 atoms/g. However, the sample actually contains 8.5 x 10^7 atoms/g. Inconceivable!  If we accept this production rate estimate, this is an impossible Be-10 concentration.
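As a quick check of that arithmetic, here is a minimal sketch using only the production rate and decay constant quoted above:

P = 38.2;          % Be-10 production rate at this site (atoms/g/yr)
lambda = 4.99e-7;  % Be-10 decay constant (1/yr)
Nsat = P / lambda; % saturation concentration, about 7.65e7 atoms/g
Nmeas = 8.5e7;     % measured Be-10 concentration (atoms/g)
Nmeas / Nsat       % about 1.11, i.e. roughly 11% above saturation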

OK, that seems weird, but it's not actually all that unusual in Antarctica. To show this, look at nearly all known Be-10 and Al-26 measurements from Antarctica as they are represented in the ICE-D:ANTARCTICA database (making sure they are all correctly normalized to the '07KNSTD' and 'KNSTD' standardizations, respectively). This data set looks like this:

[Figure: Be-10 and Al-26 concentrations vs. elevation for Antarctic samples in the ICE-D database]

What I have done here is just plot nuclide concentration vs. elevation. Because production rates increase with elevation, the nuclide concentration associated with a particular exposure age also increases with elevation, so the envelope of possible nuclide concentrations that one could potentially observe widens as elevation increases. There are about 1200 Be-10 measurements and 500 Al-26 measurements represented here.

Now, because we are using an atmosphere model and scaling scheme that vary only with elevation (magnetic cutoff rigidity is effectively zero in Antarctica for all practical purposes), predicted saturation concentrations for Be-10 and Al-26 are also only a function of elevation, so we can plot them on these axes, as follows:

[Figure: same plot with predicted saturation concentrations (black line) added]

Here the black line shows predicted saturation concentrations. It is evident that there are lots of samples in Antarctica, mostly at high elevation above about 2000 m, whose Be-10 concentrations appear to be above saturation. Gordon’s samples are sitting off to the right of the line at about 2300 m elevation. Some measurements are up to 12% above saturation concentrations calculated in this way.

There are four possible reasons for this. One, the measurements are somehow messed up. This is unlikely, because the measurements represented here are from several different laboratories and/or AMS facilities, and it’s hard to come up with any reason why they would all be similarly spurious. Certainly it is suspicious that the maximum amount that some samples exceed predicted saturation concentrations (12%) is about the same as the difference between the KNSTD and 07KNSTD standardization (11%). However, I spot-checked most of these samples, most of the measurements are well documented, and I am pretty sure that this error has not been made.

The second reason is that we might be overestimating the atmospheric pressure at high-elevation sites, and therefore underestimating the production rate and the saturation concentration. We can evaluate that possibility by using a totally different atmosphere model, one derived from the ERA40 reanalysis and adapted for use in estimating production rates by Nat Lifton (you can read about it here). To do this, we need slightly different axes, because in the ERA40-derived atmosphere model, atmospheric pressure varies with location as well as elevation. So instead of plotting just raw nuclide concentrations, we will plot the ratio of the measured nuclide concentration at a particular site to the saturation concentration predicted for that site. For samples above saturation, this ratio will be greater than one.

[Figure: ratio of measured concentration to predicted saturation vs. elevation; left panels use the Stone (2000) Antarctic atmosphere, right panels the ERA40-derived atmosphere]

The two left-hand panels use the Stone (2000) Antarctic atmosphere, and the two right-hand panels use the ERA40-derived atmosphere. Basically, there is no difference. Both atmosphere models with this scaling scheme indicate that there are a bunch of samples at high elevations that exceed saturation concentrations for both Be-10 and Al-26. The fact that two independent atmosphere models agree on this point would tend to indicate that our problem cannot be explained just by a bad atmosphere approximation.

The third reason is that the sites where we observe greater-than-saturated nuclide concentrations might have changed elevation. If the samples had spent a significant amount of their exposure history at an elevation higher than their present one, they could potentially have higher nuclide concentrations than they could ever achieve at their present elevation. Elevation changes could be caused by local effects (for example, all the samples are on the hanging wall of an active normal fault), or by regional effects (for example, changes in dynamic topography during the several-million-year exposure history of these samples; or glacial isostasy, if the ice sheet surrounding sample sites had continuously thickened during the period of exposure). Exceeding predicted saturation concentrations by the observed amount requires approximately 300-400 meters of elevation change in the last few million years. We can evaluate some of these possibilities by looking at where these samples are actually located in Antarctica. Here they are:

[Figure: map of above-saturation samples (white: Be-10; gray: Al-26) over modeled dynamic topography change for the last 3 Ma]

The white dots are Be-10 measurements exceeding saturation, and the gray ones are for Al-26. There are some more white dots hidden behind some of the gray dots. These samples are widely distributed around the high mountains of East Antarctica, which makes it unlikely that concentrations above saturation are due to local faulting or mass-movement effects at a particular location. The background image in this figure is a model for changes in dynamic topography in the last 3 Ma from a recent paper by Austermann and others; the color scale shows total uplift in the past 3 Ma in meters. Samples exhibiting Be-10 concentrations above saturation are not located in areas that have experienced significant regional subsidence, at least due to this process, over long time periods. In fact, a few of them are located in areas that are predicted to have experienced significant uplift. So we can probably exclude local faulting as a blanket explanation for the many samples in Antarctica that have Be-10 concentrations above saturation, and we can also exclude long-wavelength elevation change due to mantle dynamics. What about glacial isostasy? If the ice sheet in the vicinity of these samples had continuously thickened over the past several million years, then the elevation of the samples would have decreased during that time due to isostatic compensation, potentially leading to above-saturation nuclide concentrations. However, we need this change to be large — order 1000 m of ice to obtain the needed few hundred meters of isostatic response — and this is not consistent with sea level records or ice sheet modeling.
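For scale, here is a minimal sketch of that isostatic arithmetic, assuming simple local compensation and typical ice and mantle densities; this is only an order-of-magnitude check, not an ice sheet model:

rho_ice = 917;         % ice density (kg/m^3)
rho_mantle = 3300;     % upper mantle density (kg/m^3)
ice_thickening = 1000; % added ice thickness (m)
ice_thickening * rho_ice / rho_mantle   % roughly 280 m of bed lowering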

[Figure: modeled bed elevation change over the last 5 Ma at two representative sites (central Transantarctic Mountains, red; Sor Rondane area, blue)]

For example, the above figure shows changes in bed elevation from a 5-million-year-long Antarctic ice sheet simulation by David Pollard and others, at two locations that are representative of where we see Be-10 concentrations above saturation: the central Transantarctic Mountains (red), and the East Antarctic marginal highland in the area of the Sor Rondane mountains (blue). At both of these sites, this model run predicts a long-term decrease in land surface elevation over the last ~4 Ma, presumably driven by Antarctic ice sheet expansion due to lower sea levels in the Pleistocene. However, the magnitude of the decrease is only several tens of meters, much too small to account for the above-saturation Be-10 concentrations. Thus, long-term lowering of sample sites due to glacial isostasy does not appear to be a good explanation for above-saturation cosmogenic-nuclide concentrations either.

The fourth possibility, of course, is that our production rate scaling is incorrect. I've left it for last because it's the most likely. We don't have any Be-10 or Al-26 production rate calibration data at high elevation near the poles (see here), so we are relying on the scaling method of Stone (2000), which is just based on the work of Devendra Lal, to extrapolate from production rate calibration sites at low elevation to sample sites at high elevation. We can test whether inaccuracy in this scaling method explains the apparently-above-saturation samples by trying a different scaling method. Specifically, let's try the more recent (and much more complicated) scaling method of Lifton and others (2015), now commonly known as the 'LSD' scaling method. At low magnetic cutoff rigidity (i.e., in polar regions), this scaling method predicts a larger altitude dependence than Lal-based scaling methods. Because magnetic field variability is out of the picture at polar latitudes, this elevation dependence is the primary difference between the two scaling methods. Here are the results, calculated in the same way as for the four-panel figure above (compare with the first two panels of that figure):

[Figure: ratio of measured concentration to predicted saturation vs. elevation using LSD scaling]

Using the LSD scaling method completely fixes the problem (for both atmosphere models, although only one is shown here). Here is another view, in the same format as the first figure above, showing raw concentration vs. elevation compared with saturation concentrations. The solid lines are saturation concentrations predicted using the Stone/Lal scaling method as shown above, and the dashed lines are saturation concentrations predicted using the LSD scaling method. If we use the LSD scaling method, there are no measured Be-10 or Al-26 concentrations in Antarctica that are impossibly high.

[Figure: concentration vs. elevation with saturation concentrations for Lal/Stone (solid lines) and LSD (dashed lines) scaling]

 

So, overall, nuclide concentrations in many high-elevation Antarctic surfaces are inconsistent with production rates estimated using the Lal/Stone scaling method, but are consistent with production rates estimated using the LSD method. This would appear to imply that LSD scaling performs better for this data set, and, by implication, very likely for high-elevation sites in polar regions generally.

This is interesting for several reasons.

First, it would tend to indicate that one should probably use LSD rather than Lal-based scaling to compute exposure ages from old, high-elevation glacial deposits in Antarctica. It makes a pretty big difference: almost 20% at 2500 m elevation. For example, you might find a moraine at this elevation that belonged to the mid-Pliocene warm period (ca. 3-3.3 Ma) if you used Lal-based scaling, but was coeval with the onset of Northern Hemisphere glaciation (ca. 2.7 Ma) if you used LSD scaling. Certainly some potential for confusion there.
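To see how a modest production rate difference turns into an age difference of that size near saturation, here is a minimal sketch; the concentration and production rates are made-up round numbers chosen only to roughly reproduce the example above, not values for any actual site:

lambda = 4.99e-7;     % Be-10 decay constant (1/yr)
N = 6.15e7;           % hypothetical measured Be-10 concentration (atoms/g)
P_lal = 38;           % hypothetical production rate, Lal/Stone scaling (atoms/g/yr)
P_lsd = 1.1 * P_lal;  % hypothetical LSD production rate, 10% higher
% invert N = (P/lambda)*(1 - exp(-lambda*t)) for the apparent age t:
t_lal = -log(1 - N*lambda/P_lal)/lambda   % about 3.3 Ma
t_lsd = -log(1 - N*lambda/P_lsd)/lambda   % about 2.7 Ma

Because the concentration is close to saturation, the fractional change in apparent age is larger than the fractional change in production rate.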

Second, the LSD scaling method is very much more complex than the Lal-based methods, and there are some physical arguments indicating that LSD should be more accurate than Lal; even so, it has so far been very difficult to show that there is any practical difference in performance between the two. This issue is important mainly because of the complexity of the LSD scheme — it requires a lot more computation, so code to implement it is more complicated and very much slower. If we don't have to use it, we'd rather not. The calibration exercise of Borchers and others (2015) showed that for available Be-10 and Al-26 calibration data, there was basically no detectable difference in performance, and that's been taken (certainly by me) to indicate that for most practical purposes, it's fine to use the simpler method. However, the Antarctic exposure-age data set would tend to indicate the opposite: that the Lal-based methods, when calibrated with existing calibration data sets that lack coverage at high altitudes in polar regions, significantly underestimate production rates at these sites.

Third, if we accept that the LSD scaling method is correct, we’ve gone from the conclusion that there are a lot of Be-10 and Al-26-saturated surfaces in Antarctica, to the conclusion that there aren’t any. No known measurements from Antarctica quite overlap with predicted saturation values. In one sense, this is not really a huge surprise, because for a site to reach true production-decay saturation, the surface erosion rate must be exactly zero for several million years. Really exactly zero erosion for millions of years is a big ask — this doesn’t even occur in outer space — and is probably unlikely to occur in reality, even at -60° C in the upper Transantarctic Mountains. Again, if we accept that the LSD scaling method is correct, we can estimate steady-state erosion rates (that is, maximum limits on the erosion rate) for the samples that are closest to saturation; for the Be-10 measurements this is in the range of 1-2 cm per million years; for the Al-26 measurements it is 3-4 cm per million years. It is not clear why Be-10 measurements are closer to saturation than Al-26 measurements, although because they are all very close to saturation, this calculation is quite sensitive to the calibrated reference production rates for both nuclides, so this could easily be explained by inaccuracies in those values. In any case, these are plausible values for the lowest erosion rates in Antarctica. Not zero, but extremely slow.
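For reference, here is a minimal sketch of that erosion-rate calculation, using the standard steady-state erosion equation for spallogenic production; the attenuation length, density, and production rate are illustrative round numbers, not the values used for the estimates above:

lambda = 4.99e-7;  % Be-10 decay constant (1/yr)
Lambda = 160;      % spallogenic attenuation length (g/cm^2), illustrative
rho = 2.65;        % rock density (g/cm^3)
P = 45;            % hypothetical site production rate (atoms/g/yr)
N = 8.5e7;         % near-saturation measured concentration (atoms/g)
% steady state: N = P/(lambda + rho*ero/Lambda); solve for the erosion rate ero
ero = (Lambda/rho) * (P/N - lambda);   % erosion rate (cm/yr)
ero * 1e6                              % about 1.8 cm per million years here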

So the summary here is that it appears that the LSD scaling method is consistent with the overall data set of Be-10 and Al-26 concentrations in Antarctic surfaces, and the Lal-Stone scaling method isn't. One final point, however, is that Stone (2000) also considered this issue (with fewer data; see Figure 4 in that paper) and concluded that the Lal scaling method did not, in fact, undershoot saturation concentrations. The difference, I am fairly sure, is that new calibration data imply lower production rates than the calibration data available at the time of that paper; if currently available production rate calibration data had been used in that paper they would have led to the same conclusion I come to here, that Lal-based scaling underestimates saturation concentrations at high elevations in Antarctica. However, there are a lot of moving parts involved in comparing the two calculations (Be-10 standardizations, production rates, decay constants, etc.), and that claim could probably use further investigation.

Possibly the most He-3 ever observed in a terrestrial sample

September 1, 2016

In a previous post I pointed out that we (that is, the noble gas mass spectrometry facility at BGC) might have recently analysed the sample with the highest concentration of cosmogenic Ne-21 ever observed in a terrestrial sample. By 'terrestrial sample,' of course, I mean a rock that has accumulated its entire cosmogenic-nuclide inventory on Earth — because cosmogenic-nuclide production rates are so much higher outside the Earth's atmosphere, meteorites have orders-of-magnitude higher concentrations of cosmic-ray-produced nuclides than are possible on Earth. So meteorites, or extraterrestrial-dust-laden ocean-floor sediments from the central Pacific, don't count. In any case, no one has shown up to tell me about anything with a higher Ne-21 concentration, so the claim may be true.

The point of this post is that it now appears that we may also have analysed the sample with the highest concentration of in-situ-produced cosmogenic helium-3 ever observed in a terrestrial rock. This sample comes from about 2300 m elevation at Roberts Massif, a nunatak at the head of the Shackleton Glacier in the Transantarctic Mountains at 85.5° S latitude. Gordon Bromley and his colleagues collected it during fieldwork there last year; it is a loose clast of the local bedrock, a Jurassic mafic intrusive rock known as the Ferrar Dolerite, coarsely faceted and therefore possibly glacially transported. Here is a picture (photo credit: Gordon Bromley).

[Photo: sample 15-ROB-028 at Roberts Massif (photo: Gordon Bromley)]

Two separate analyses at BGC of pyroxene separated from this sample yielded 1.39 x 10^10 and 1.35 x 10^10 atoms/g He-3. Two adjacent smaller loose rocks — this one and this one — had slightly lower, but still pretty unreasonable, He-3 concentrations near 1.25 x 10^10 and 1.07 x 10^10 atoms/g, respectively. As surface rocks go, that’s a lot of helium-3. Note, however, that even though He-3 is extremely expensive at present, you would still have to mine something on the order of 1000 metric tons of this pyroxene to have $1 worth of He-3. It’s not ore-grade.

Pyroxene in this lithology is known to contain only about 6 x 10^6 atoms/g of non-cosmogenic He-3, presumably inherited from initial mineral crystallization, so the measured concentration in these samples is essentially all cosmic-ray-produced. 1.39 x 10^10 atoms/g cosmogenic He-3 at this site implies an apparent exposure age of approximately 13 Ma. As discussed in the previous post, this is not the oldest exposure age ever measured; that record still belongs to samples from the Atacama Desert, analysed by Tibor Dunai and colleagues, with apparent exposure ages exceeding 30 Ma. However, those samples are at much lower elevation where production rates are also much lower, so even though they are older, the total cosmic-ray dose they have experienced is smaller.
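For a rough sense of where the ~13 Ma figure comes from, here is a back-of-the-envelope sketch; since He-3 is stable, the apparent age for a non-eroding surface is just concentration divided by production rate, and the local production rate used here is an illustrative round number, not a calibrated value for this site:

N = 1.39e10;  % measured cosmogenic He-3 (atoms/g)
P = 1.1e3;    % assumed local He-3 production rate in pyroxene (atoms/g/yr), illustrative only
N / P / 1e6   % apparent exposure age in Ma, about 13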

In any case, a relatively slapdash literature search reveals that the nearest competitor appears to be another sample of Ferrar pyroxene from Mt. Fleming, also a high-elevation site in the Transantarctic Mountains. Joerg Schaefer and colleagues found 6.9 x 10^9 atoms/g cosmogenic He-3 in this sample, which is still plenty but is about a factor of two less than in the Roberts Massif sample. Are there any other contenders?

The point, once again, is that these high-elevation rock surfaces in Antarctica have enjoyed a stunning and expansive view of nearly the entire late Cenozoic evolution of the Antarctic ice sheets. Frankly, a lot of that was probably pretty boring, but if the East Antarctic Ice Sheet really collapsed in past warm-climate periods, they watched.

Let a hundred flowers bloom

August 1, 2016

This post is about the current proliferation of online exposure age calculators. Summary: good thing. Details: complicated.

Here is a short summary of the situation. Actually, it’s a long summary. But I can’t make it any shorter.

2008-v2. The initial online exposure age calculator at hess.ess.washington.edu, also sometimes referred to by the deeply unfortunate and ill-conceived nickname “Balculator” after the first author of the accompanying paper, dates from 2008 and reflected current thinking at that time about how best to compute exposure ages. The initial funding for me to develop it (a Linux server and several months of postdoctoral salary) came from the then-just-beginning, NSF-funded “CRONUS-Earth” project, so it has also been widely known as the “CRONUS calculator;” however, as discussed below, this is no longer quite accurate. To avoid confusion in the rest of this post, I’ll refer to it here as “the 2008-v2 calculator,” because at the time it entered widespread use it was denoted by this version number, which is often reported in papers and other downstream uses.

Two aspects of this calculator have subsequently turned out to be important. First, it includes five different production rate scaling methods, two based on the work of Devendra Lal ("Lal-based schemes") and the rest on later interpretation of more recent neutron monitor data ("neutron-monitor-based schemes"). This probably excessive diversity was simply because at the time no one knew which one worked better. Second, reference production rates for Be-10 and Al-26 were based on a compilation of currently available calibration data (the "2008 global calibration data set").

Between 2008 and 2015 there were several changes to this calculator that resulted from discoveries that various parameter values in version 2 were incorrect.

2008-v2.2. This revision (in 2009) made the important change that users were now required to enter information about isotope ratio standardizations used for Be and Al AMS measurements. The details of this are a giant can of worms that you can learn about here. Interestingly, this turned out to be a really effective example of incentive-based behavioral engineering in that users were successfully induced to solve a messy and unpleasant problem (standardization-related confusion) by being presented with an incentive to do so (the opportunity to outsource all the hard work of exposure age calculations to the online calculator). Also, a subsequent 2.2.1 revision changed a few parameter values, most notably the Be-10 decay constant, that had only minor effects.

2008-v2.2 with alternate production rate calibration data sets. Shortly after 2008, it became clear, from new production rate calibration data that were being generated, that the 2008 global calibration data set was simply wrong, predicting Be-10 and Al-26 production rates that were approximately 10% too high. The reasons for this are diverse and not entirely clear in some cases, but the overall result is quite clear. Although this was basically a non-issue for most applications of erosion rate calculations, it was a big issue for exposure-dating (for example, because the ages of the Younger Dryas and the Antarctic Cold Reversal differ by about 10%). To deal with this issue, I added features to the 2008-v2.2 calculator, mainly the capability to enter arbitrary production rate calibration data, use them to generate a best-fitting value of the reference production rate, and then use this production rate value to determine exposure ages at unknown-age sites. In addition, I carried out this exercise for a variety of published alternatives to the 2008 global calibration data set and made them available as distinct data entry pages. Thus, although the erroneous 2008 calibration data set was still presented as the default means of calculating exposure ages, the availability of other calibration data effectively solved this problem for most practical purposes.

This version (2008-v2.2 with various calibration data sets) has been widely used between ca. 2009 and the present.

Results of the CRONUS-Earth project.  Starting in 2014, results of the now-completed “CRONUS-Earth” project began to become available. Of these results, the ones most relevant to the issue of online exposure age calculators are as follows.

1. Calibration data. A lot more calibration data now exist, and they continue to indicate that the 2008 data set gives incorrect results.

2. Scaling schemes. It is now clear that the neutron-monitor-based scaling schemes are inaccurate, for reasons to do with how neutron monitors work. Nat Lifton and colleagues have put together a new class of scaling scheme based on particle transport models (“Sato-based schemes” after the author responsible for the particle transport modeling) that works better. Both Lal-based and Sato-based schemes appear to work indistinguishably well for most practical purposes, although they are not indistinguishable — they make different predictions for some situations.

3. Muons. We have better estimates of muon interaction cross-sections. This is only marginally relevant for surface exposure dating, but has a nonzero effect on erosion rate calculations.

Of these results, only (1) above has been dealt with at all in the 2008 calculator framework. Thus, at this point we have the situation where the 2008-v2.2 calculator includes incorrect default calibration data, scaling schemes that we know to be inaccurate, and muon interaction cross-sections that we know to be wrong. The calibration data issue can be fixed by use of alternate data sets, but not the others.

CRONUS-2016. In 2015-2016 (the accompanying paper is dated 2016), Shasta Marrero, Brian Borchers, and Rob Aumer put together a new online calculator, intended as one of the capstone products of the overall CRONUS project, that is here. Thus, this product is now the “CRONUS-Earth online exposure age calculator,” and, if one can learn from history, will presumably in future also be known as the “Shastalator” or “Marrerolator.”

One implication of this is that it appears that the 2008-v2.2 calculator is no longer the “CRONUS calculator.” Therefore, in homage to Prince, the 2008-v2.2 calculator will henceforth be known as “The online exposure age calculator formerly known as the CRONUS-Earth online exposure age calculator.”

The CRONUS-2016 calculator has the following features.

  1. Even more scaling schemes. Sato-based schemes (n = 2) have been added to the existing Lal-based (n = 2) and neutron-monitor-based (n = 3) schemes from 2008. Thus, n = 7.
  2. More complex code. More physical processes are represented in the code.
  3. More nuclides. It is possible to compute exposure ages not only for Be-10 and Al-26 measurements, but also C-14, He-3, and Cl-36.
  4. No shortcuts with respect to numerical precision. The 2008 code takes many numerical shortcuts that speed up calculations while maintaining acceptable accuracy for surface exposure dating, but that sacrifice some capabilities, such as the ability to accurately calculate subsurface production rates. These shortcuts are not taken in the 2016 calculator.
  5. Default production rate calibration data derived from recent work.

Items 1-4 above were done for an important reason: the code that the online interface is running was also designed for quantitative testing of all known scaling schemes against all known calibration data (see this paper), and clearly in carrying out such an exercise it is important to take numerical imprecisions and unphysical shortcuts out of the picture. However, the downside of these features is that the code is quite slow; instead of immediate return of results via the web server as in the 2008-v2.2 calculator, the input page of the CRONUS-2016 calculator starts an often-fairly-time-consuming compute job on the server, which emails you the results when the job is complete.

2008-v2.3. At present, despite the existence of the CRONUS-2016 online calculator, there is still quite a lot of user demand for the 2008-v2.2 calculator, presumably for the following reasons. One, folks are used to it, so it is like a pair of nice fuzzy old socks. Two, it is faster than the CRONUS-2016 code. Three, it facilitates use of non-default calibration data. Possibly, four, it has a web service API. This is all fine, but as noted above, the 2008-v2.2 code perpetuates three major errors: incorrect default calibration data, incorrect scaling schemes, and incorrect muon interaction cross-sections. Thus, I have put together a version 2.3 that mitigates two of these. Specifically, the default production rate calibration data set is now the same (the “CRONUS primary data set” of Borchers and others, 2016) as is used in the CRONUS-2016 calculator, and the muon interaction cross-sections are also updated to reflect calibration data also from that paper. What’s not fixed is that the neutron-monitor-based scaling schemes (now obsolete) are still there, and the Sato-based scaling schemes are not there. However, if you can ignore the neutron-monitor-based scaling schemes, this update removes everything from the 2008-v2.2 code that we actually know to be incorrect. Thus, for most practical purposes the 2008-v2.3 calculators can be used for surface exposure dating with equivalent accuracy to the CRONUS-2016 calculators. Version 2.3 is now the default version at hess.ess.washington.edu. However, it will not be updated further, but, hopefully, instead replaced by:

v3. The v2.3 update doesn't fix the scaling scheme issue, and in addition it would be nice if the 2008-v2.2 calculators could be used at least for the other primarily spallogenic nuclides (C-14, He-3, and Ne-21), even if not Cl-36 yet. To deal with these issues, there now exists a developmental version 3 of the online exposure age calculators formerly known as the CRONUS calculators, which is here. The design principle of the v3 calculator is to do only exposure-age and erosion-rate calculations for surface samples, but to do them as fast as possible while maintaining acceptable accuracy for these purposes. Thus, it includes a highly streamlined version of Sato-based scaling that works by interpolation of precalculated gridded data and runs a couple of orders of magnitude faster than the code in the CRONUS-2016 calculator. Muon production systematics are also highly simplified. In addition, the v3 calculator ingests Be-10, Al-26, C-14, Ne-21, and He-3 data for various mineral targets (simultaneously, and also does the multiple-nuclide plots, which is fun). At the time of this writing, I believe it works correctly, but it hasn't been extensively checked and one should expect that it will be modified fairly often in the near future.

The ICE-D database. One motivation for developing a fast v3 calculator is to facilitate the development of online databases of both exposure-age data and production rate calibration data, such as those here and here. As discussed in other posts, the idea of these databases is that they pretty much make the current approach to online exposure age calculators obsolete, because raw observational data such as nuclide concentrations live behind the scenes in a database and exposure-age calculations happen dynamically and transparently when viewing the data. This completely deals with the problem of comparing published exposure-age data that were calculated inconsistently. But for it to work, the calculations have to be fast. Hence the need for v3. At present, the developmental v3 code is the back end for the ICE-D database.

Another goal of this database development is to improve how we deal with production rate calibration data. The idea of the ICE-D production rate calibration database is that it represents a compilation of up-to-date production rate calibration data that is generally believed to be complete and accurate, and can be available to whatever online calculator system wants to use it. This overall principle was one of the motivations for…

CREp. This is a new online exposure age calculator put together by Leo Martin, PH Blard, and their colleagues at CRPG in France, that is here. It includes Lal-based and Sato-based scaling schemes and has a large range of options for production rate calibration data sets that are derived from the ICE-D calibration database. This is a good step towards solving the data-assimilation problem for production rate calibration data (which continue to be steadily generated): theoretically, as we add new data to the ICE-D database, they should be immediately available to the CREp online calculator. CREp also has by far the most attractive user interface (although unfortunately this is not a very high bar).

To summarize. The current situation features at least four different options for online exposure age calculations. One and two, the online calculators formerly known as the CRONUS calculators — the 2008-v2.3 and v3 calculators at hess.ess.washington.edu. Three, the CRONUS-2016 calculator. Four, CREp. Arguably five, the ICE-D database, although as noted above that just uses the v3 code as the back end.

I think this is great.

I should note that there is not universal agreement on this being great. Some members of the CRONUS project envisioned something very different — that an eventual capstone result of the overall project would be a single online calculator that was used by all researchers everywhere as a de facto universal standard for exposure-age calculations. That is the vision described in this paper, which suggests the appointment of an international advisory committee to oversee a single online calculator and issue recommendations for best practices. I think this is a bad idea. In my view, the main value of the CRONUS project is that it generated a really large quantity of valuable information that is relevant to exposure-age calculations and production rate scaling. I think the best possible outcome of generating all this data is that a wide variety of people, whether associated with the original project or not, use it in whatever way they think is best to do whatever research they want to do, and to improve the state of the overall science. Ideally, instead of one committee-approved online calculator, we should have a whole bunch of online calculators that can compete based on their own merits. And that’s what seems to be happening. I think that’s really good.

However, this situation does create a couple of new problems.

One is more general. The question is, how are these things actually going to compete on their own merits? At present, certainly they can compete on the basis of ease of use and general attractiveness. However, really we would like for them to compete on the basis of some sort of quantitative performance metrics that reflect their speed, accuracy, and usefulness for whatever the needed application is. For this to happen, there needs to be some way to easily and transparently quantify the relative performance of different options. This is one important potential function of the ICE-D calibration database: to make the data needed for benchmarking and performance assessment of calculation methods available in an easy and straightforward way. Actually making that process straightforward and transparent is more complicated, but at least that is a start.

The second problem is more specific. If I need to calculate some exposure ages right now, what do I do? The answer to this is fairly simple. It doesn’t matter very much which of the four options you use. There are only two things you need to remember. One, don’t use the neutron-monitor-based scaling schemes. Use the Lal- or Sato-based ones. These both work and are effectively equivalent for nearly all practical purposes. Two, which production rate calibration data set you use is much more important than which exposure age calculator you use. If you use the different calculators with the same calibration data, you should get indistinguishable (i.e., differing at less than measurement uncertainty) results in nearly all cases. Much more important, as always, is to completely record all the raw observations needed to compute the exposure ages in your papers, so that readers can recalculate the exposure ages themselves with different methods or calibration data as needed.

 

Another database project — for production rate calibration data

February 5, 2016

In a previous post I described the ICE-D: Antarctica project to collate exposure-age data from Antarctica. This post is about something similar that I am working on with Pierre-Henri Blard and his colleagues at CRPG. Basically, we've now done the same thing with production rate calibration data, which is more useful and more important for several reasons that I will now enumerate in too much detail. If you can't wait that long, here is the link:

ICE-D: Production rate calibration data

First, some review of what is going on here. To compute exposure ages from cosmogenic-nuclide concentrations, we need to know the production rate of the nuclide in question. We determine this in two steps. We start with some sort of a "scaling scheme," which is just a model of how production rates vary with location, elevation, and time. Then we need to fit that scaling scheme to a "production rate calibration data set," which is a set of measurements of cosmogenic-nuclide concentrations in rock surfaces whose exposure ages we already know from some sort of independent evidence. So we measure the average production rate during whatever the exposure time was at sites whose exposure age we already know, and then use a scaling scheme to combine data from multiple locations and apply the results to sites whose exposure age we would like to measure.
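As a minimal sketch of what "fitting" means here (a toy version for a single nuclide with made-up numbers; real calibration exercises use more elaborate fitting and error propagation): for each calibration sample with independent age t, measured concentration N, and scaling factor S from the scaling scheme, there is an implied reference production rate, and the fit combines these.

lambda = 4.99e-7;            % Be-10 decay constant (1/yr)
t = [11.6e3 13.1e3 9.7e3];   % hypothetical independent ages of calibration surfaces (yr)
N = [5.2e4 1.1e5 3.4e5];     % hypothetical measured Be-10 concentrations (atoms/g)
S = [1.1 2.0 8.5];           % hypothetical scaling factors for each site
% reference production rate implied by each sample (no erosion assumed):
Pref_each = N .* lambda ./ (S .* (1 - exp(-lambda .* t)));
Pref = mean(Pref_each)       % around 4 atoms/g/yr here; real fits weight by uncertainty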

The other thing that falls out of this process is the ability to evaluate how well a scaling scheme works. If we can do a good job of fitting the scaling scheme to a set of real calibration data from different locations, elevations, and ages, then we conclude that the scaling scheme is doing a good job. Consult a paper by Brian Borchers and others to see how this works. Basically, this is how we decide which scaling scheme we should be using if we want to compute exposure ages in the most accurate way possible.

Obviously, production-rate calibration data are quite central to this whole process. Another thing that is important is that this whole issue just became a lot more complicated. Expect this to become more blog fodder in the future, but the summary is that the recently completed CRONUS-Earth project, as well as some other efforts, have resulted in a proliferation of scaling methods, online exposure age calculation schemes, and calibration data. In addition, the rate that new production-rate calibration data get generated is reasonably high — a handful of papers describing new calibration data appear each year — which means that the total data set is now rather larger than what was included in the main calibration-and-evaluation project that was done a couple of years ago as part of CRONUS and is described in the paper by Borchers and others. And it’s growing.

So this creates a couple of problems.

One is just a data assimilation problem. What we really want is for the production rate calibration process to be, basically, self-updating — when new calibration data are generated, they are incorporated into our current best estimate of production rates. We, in large part meaning me, have done a terrible job of this in the past few years; there really weren't any systematic updates of how well scaling models fit the entire set of calibration data between the 2008 paper by myself and others and the 2015 Borchers paper. That's embarrassing, because it was clear very shortly after the 2008 paper was published that the calibration data set in that paper isn't very good. The data assimilation problem isn't particularly computationally difficult; we know how to do this. However, there are some obstacles, the first one being the fact that the calibration data need to be all in one place so that software can easily access the current data set, whatever it may be, in a fast and low-hassle fashion.

The other is the issue of how to decide which of the various scaling and age-calculation options to use, which of course is wound up in the data-assimilation problem because the correct answer will likely change as new data appear. A paper by Fred Phillips and others, which summarizes some of the where-do-we-go-next discussions at the end of the CRONUS project, envisions the existence of an ongoing international committee that will decide what the best way to compute exposure ages is, and prescribe this method to folks who want to do exposure-age calculations. The idea seems to be that this committee would evaluate various proposed scaling schemes and calibration data and determine which were acceptable, which were not, and which one is best.  I'm not a co-author of this paper, and I think this is a bad idea. It creates disincentives for folks who are not on the committee to engage with trying to make the overall field of exposure-dating better, and incentives for people on the committee (who will, of course, be selected because they are responsible for the present state of the art, whatever it is) to maintain the status quo. In my view, this approach would impede progress. As an Economist subscriber who lives near Silicon Valley, of course, I think this is a software problem and not a governance problem: if you create software tools to make it easy to evaluate calibration data and scaling schemes, then people can figure things out for themselves without any help from a committee, and the best-performing calculation methods will float to the top because they are the best-performing. Of course, I don't have this software yet. But regardless, the first thing that needs to happen here is to make it possible for this software, or anything else, to easily get the entire set of existing calibration data, whatever it is and whenever you want it.

So that’s the initial problem that a calibration data database needs to solve: putting all the calibration data we know about in one place and delivering it to anyone or anything who wants to use it. At present, this problem is not solved. What we have currently is the usual terrible situation in geochronology where various researchers are all maintaining their own mutually inconsistent Excel spreadsheets that each include some fraction of the existing calibration data, are variably up-to-date, and contain different errors and omissions. This situation, of course, maximizes confusion, redundant work, potential for error, and general hassle; and minimizes accuracy, repeatability, and transparency. We can do better. Specifically, the ICE-D calibration data project aims to replace the inconsistent-spreadsheet situation with an online database of all known production rate calibration data that has the following properties: (1) it is generally believed to be reasonably complete and accurate; (2) it only exists in one place so that there are not multiple inconsistent versions; and (3) it can easily be ingested into any software that wants to use these data to do calculations, whether elaborate online exposure age calculator frameworks or odd bits of MATLAB code running on one’s local machine.

So here are the details:

What. The ICE-D: Production Rate Calibration database project.

Where. It’s here: http://calibration.ice-d.org.

Who. Pierre-Henri Blard of CRPG/CNRS and I are at present collaborating on putting it together. If you are involved with collecting production rate calibration data, or you think some data are missing or wrong, you should help too. That does require some knowledge of relational databases and MySQL. Give one of us a call.

Disclaimer. This project is not part of the CRONUS-Earth project.

What, in more detail. As with the ICE-D:Antarctica database, data live in a MySQL database hosted on the Google Cloud SQL service. There is a front end running in Python on Google App Engine. At present, the front end provides various browser interfaces (by location, publication, etc.) to look at data associated with individual samples or sites. It looks very similar to the Antarctica one. Some interesting features are as follows:

Nonprescriptiveness. The database is organized such that samples can be grouped into “calibration data sets.” For example, the database contains many beryllium-10 measurements from calibration sites that were not included in the CRONUS calibration exercise described in the Borchers and others paper. However, it’s possible to access just the samples that were used in that study as a distinct “calibration data set.” The idea here is to make it possible to replicate previous calibration studies using the data that were used in that study, even as the overall data set grows. It’s also to make the project non-prescriptive: the database should contain all data that can plausibly be described as decent calibration data, but you should be able to decide which data you want to use.

No more spreadsheets. What we want here is for any software to be able to ingest the most up-to-date calibration data from anywhere. This is the fun part. For example, say I am running MATLAB on my laptop and I want to ingest some calibration data to do some kind of calculations. I can get the current up-to-date data for the “Primary Be-10 calibration data set” described in the Borchers et al. paper noted above by using the ‘urlread’ function of MATLAB as follows:

% URL for the "primary Be-10 calibration data set" page in ICE-D
urls = 'http://calibration.ice-d.org/cds/4';
s = urlread(urls);               % fetch the page HTML as one string
% the v3-calculator-format data are delimited by these HTML comments:
l1 = '<!-- begin v3 --><pre>';
l2 = '</pre><!-- end v3 -->';
% extract just the text between the delimiters
ss = s(strfind(s,l1)+length(l1) : strfind(s,l2)-1);

What you just did was to read the entire calibration data set (102 measurements) in online exposure age calculator v3 input format into a string variable. The following then parses it into a useful data structure:
in = validate_v3_input(ss);

That uses the ‘validate_v3_input’ function from the version 3 online exposure age calculator, which isn’t in really great shape yet so it’s not posted anywhere, but if you want a copy let me know. The point is, you don’t need the spreadsheet any more. You just need an internet connection and five lines of code.

Repository of associated knowledge. This is a neat feature that I am pretty sure no one other than me will use. Lots of people have been involved in collecting production rate calibration data over the years and all of them have odd bits of knowledge about sites and samples that aren't part of the very short list of numerical data (elevation, nuclide concentration, etc.) associated with each sample. Still, this harder-to-quantify descriptive information might be useful and it would be great to have it in the same place with the numerical data. Thus, the browser interface has a 'discussion' feature. If you log in with a Google ID (sorry about that, but it's the minimum level of authentication needed for decent security practice) you can say whatever you want about sites and samples. If you know that a Be-10 standardization recorded in the database is wrong, say so, so someone knows to fix it. If you were concerned about the amount of moss on the rock surface, say so. If you know something unpublished about the radiocarbon age constraints, you can add that to the permanent record. Wouldn't it be useful if all this sort of info were together with the numerical data? Absolutely. The database also has provision for site and sample photos, which are included for projects I was personally involved with and therefore have photos for. However, enabling photo upload from the general population is a bigger coding project and that hasn't happened yet.

Where is it going?  A couple of improvements and applications are really going to happen. First, Pierre-Henri and  his colleagues at CRPG are working on a MATLAB-based online exposure age calculator that will use this database to update calibration data as needed. This is in testing and it is happening. Second, the database is reasonably complete right now for Be-10, Al-26, and He-3 calibration data, but nothing else. In-situ-produced carbon-14 in quartz will happen — that is just a matter of data entry. Including chlorine-36 data is part of the plan, but is more difficult simply because a lot more data need to be recorded. That’s more speculative, but there’s been a bit of progress. Other items on the wish list include making it look less like a static database and more like Wikipedia, so more people can contribute. That’s certainly feasible but a lot more programming work; we’ll need help for that. Finally, we’ll need means to use these data in whatever computational environments people are using for exposure-age calculations. The above example shows how it works for MATLAB, but that could be a lot smoother and, obviously, it would be useful to do the same thing for stuff like Python, R, and, yes, Excel if you really must.

Summary. You don’t need to keep the incomprehensible spreadsheet updated any more. Done with spreadsheets.

 

Elevation/atmospheric pressure models

October 16, 2015

This post is about elevation/atmospheric pressure models (or, “atmosphere models”), and which one should be used for exposure-dating purposes. These are important because the reason cosmogenic-nuclide production rates change with elevation is that the air pressure changes with elevation; at higher elevation there is less atmosphere between you and the extraterrestrial cosmic-ray flux. Thus, the parameter that is actually controlling the production rate is not the elevation, but the atmospheric pressure. The difficulty this presents is that while it is relatively easy to determine the elevation of an exposure-dating sample site by direct measurement, it is not at all obvious how one might directly measure the mean atmospheric pressure at that site over the entire exposure history of the sample. Mostly we deal with this by (i) measuring the elevation, and then (ii) converting that to a mean atmospheric pressure by reference to some sort of an atmosphere model that relates elevation to atmospheric pressure and is based on the characteristics of the modern atmosphere.

Early on in the development of production rate scaling schemes (like in the '90s), we generally accomplished step (ii) using the ICAO standard atmosphere model. This consists of a single formula relating elevation above sea level to atmospheric pressure that is intended to be a decent approximation for the part of the Earth's atmosphere lying in temperate latitudes commonly transited by aircraft. Unfortunately for exposure-dating use, although the standard atmosphere is reasonably accurate in an average sense, it fails to capture fairly large spatial variations in the global atmospheric pressure distribution. The following plots show this by comparing the standard atmosphere model to measured mean barometric pressures at various places on Earth. The data set for comparison here is this, which is a set of long-term means of various climate data measured at weather stations globally.
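For reference, before the comparison plots, the standard atmosphere formula itself is simple; here is a minimal sketch using the usual ICAO constants (pressure in hPa, elevation in m):

z = 2300;       % elevation (m), example value
p0 = 1013.25;   % standard sea level pressure (hPa)
T0 = 288.15;    % standard sea level temperature (K)
L = 0.0065;     % temperature lapse rate (K/m)
gmr = 0.03417;  % g*M/R (K/m)
p = p0 * (1 - L*z/T0)^(gmr/L)   % about 766 hPa at 2300 m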

[Figure: station mean pressures vs. elevation (red dots) compared with the standard atmosphere (black line)]

This first figure shows observed mean barometric pressures as a function of elevation from the data set of station measurements (red dots) compared with the standard atmosphere approximation (black line). As noted, the standard atmosphere does pretty well on average, but there is quite a lot of scatter. In particular, as noted some time ago in this paper among others, it does a terrible job in the deep southern hemisphere and Antarctica. To see how terrible, here is the above plot showing only station data south of 60° S.

[Figure: same comparison for stations south of 60° S only]

The elevation-pressure relationship is significantly and systematically different in Antarctica, and the standard atmosphere approximation doesn’t work at all.

The important question, of course, is how much this scatter matters from the perspective of production rate estimation. To attempt to answer this, try the following. For all the weather stations in the global data set, calculate the production rate scaling factor for the actual measured mean atmospheric pressure at each station. Here and henceforth in this post I will do this with the 'St' scaling scheme in the online exposure age calculators. Then, use the standard atmosphere approximation to estimate the atmospheric pressure at the site of each station, and calculate the production rate scaling factor based on this estimated pressure. Then compare the production rate computed using the actual observed atmospheric pressure to that calculated using the estimated atmospheric pressure. For the standard atmosphere and just considering stations north of 60° S, this yields the following:

[Figure: ratio of production rate from standard-atmosphere pressure to that from measured pressure, vs. elevation]

What is being plotted here is the production rate calculated for the pressure estimated from site elevations and the standard atmosphere approximation, divided by the production rate calculated for the actual measured pressure. A value above 1 means the standard atmosphere overestimates the production rate relative to measured mean pressure; a value below 1 means an underestimate. A few rather extreme values in this plot (like 40% errors) most likely reflect errors in the weather station data set that I did not catch in a quick screening, so we won't get too worried about them. However, if we disregard the worst 2% of the data as outliers most likely due to errors, exclude data south of 60° S, and take the mean and standard deviation of the remaining results, we find that using the standard atmosphere approximation even outside the high southern latitudes systematically overestimates production rates across the board by about 1% (with a standard deviation of 3.2%). In addition, as evident in the above plot there is a significant systematic drift toward larger overestimates at higher elevations. So the summary here is that if we just apply the standard atmosphere approximation to estimate production rates at arbitrary locations globally, we will incur a systematic error at high elevations and an uncertainty in the production rate estimate of at least 3%, depending on how you interpret the distribution of the residuals.

In the paper describing the 2008 online exposure age calculators, we showed that you could do better than this by using a spatially variable atmosphere model. We used the same formula relating pressure to elevation as used in the standard atmosphere approximation, but made two input parameters to the formula — sea level pressure and temperature — spatially variable. The MATLAB function that implements this method (NCEPatm_2.m — downloadable from here) looks up long-term average sea level temperature and pressure values computed by the NCEP reanalysis (specifically, the 2008 version thereof) at one's sample location, and uses them as input parameters to the standard atmosphere equation, as sketched below. The equivalent plot for this method follows the sketch:
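In other words, the conversion uses the same formula as the standard-atmosphere sketch above, but with site-specific sea level values; in this minimal sketch the sea level pressure and temperature are made-up numbers standing in for the gridded NCEP climatology that NCEPatm_2.m actually interpolates:

z = 2300;      % site elevation (m)
Ps = 990;      % long-term mean sea level pressure at the site (hPa), made up
Ts = 268;      % long-term mean sea level temperature at the site (K), made up
L = 0.0065;    % lapse rate (K/m)
gmr = 0.03417; % g*M/R (K/m)
p = Ps * (1 - L*z/Ts)^(gmr/L)   % about 730 hPa with these made-up inputs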

[Figure: production rate residuals for the standard atmosphere (red circles) and the NCEP-based model (blue dots)]

Note that the y-axis scale has changed. The red circles are production rate residuals using the standard atmosphere as plotted above in the previous figure, and the blue dots use the 2-spatially-variable-parameter model based on the NCEP reanalysis. The residuals are smaller across the elevation range, and both the overall systematic offset and the systematic drift at high elevations are corrected. Note that this figure is the same as Figure 1 in the paper describing the online exposure age calculators linked above (although the y-axis is inverted here). Much of the apparent improvement visible here, however, is just from bringing in the serious outliers — the standard deviation of the residuals for the entire data set, calculated in the same way as described above, is 2.1% in contrast to 3.2%, which is a bit better but not all that much different. So if we apply this atmosphere model to arbitrary exposure-dating sites globally, we get rid of the systematic overestimate and the uncertainty in production rates attributable to this source drops to about 2%. This is pretty good, but note that this is still a significant contributor to production rate uncertainties. Scatter in globally distributed production rate calibration data sets based on geologic calibration data is typically in the 6-8% range; uncertainty in air pressure estimation could account for a quarter to a third of that total scatter; more if you consider the issue of past changes in the atmospheric pressure distribution.

Now to get to the actual point of this post. A recent paper by Nat Lifton, as well as the CRONUS-Earth production rate calibration project described in the paper by Brian Borchers and others, used a different atmosphere model that Lifton developed. It is similar to the two-spatially-variable-parameter NCEP-based model used in the 2008 online exposure age calculators and described above, but it uses data from the more recent ERA40 reanalysis and adds latitude-dependent variation in a third parameter related to the lapse rate. The MATLAB code is available as part of the supplement to the Lifton paper; I'll try to add a link here in the future. The question here is how this scheme compares to the 2008 NCEP-based one. To look at this, we can replicate the above figure, but this time comparing the 2008 NCEP scheme to the 2014 Lifton/ERA40 scheme instead of comparing the standard atmosphere to the spatially variable NCEP scheme.

ERA401

Here the blue points are the same as the blue points in the previous figure (the 2008 NCEP-based model) and the green points use the 2014 ERA40-based model. There are some differences, but the two models agree very well in overall performance: both show an insignificant systematic offset and 2.1% scatter. Thus, although to my knowledge the ERA40 reanalysis is generally believed to be more accurate in a climatological sense than the NCEP reanalysis, this does not appear to make a significant difference from the perspective of the global uncertainty in production rate estimates contributed by air pressure estimation, at least given the data available to test this.
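
For what it's worth, here is a sketch of the extra ingredient. To be clear, this does not reproduce the actual ERA40atm.m parameterization; the lapse_rate function and its coefficients are placeholders for whatever fit to the reanalysis climatology is actually used. The point is only to show where a latitude-dependent lapse rate enters the same barometric formula.

```python
import numpy as np

def lapse_rate(lat_deg, coeffs):
    """Latitude-dependent lapse rate (K/m). `coeffs` stands in for a polynomial
    fit to the reanalysis climatology; these are NOT the published values."""
    return np.polyval(coeffs, np.abs(lat_deg))

def site_pressure_three_param(elv_m, slp, slt, lat_deg, coeffs):
    """Same barometric form as before, but with sea level pressure (hPa), sea
    level temperature (K), and the lapse rate all varying spatially. slp and
    slt are assumed to have already been interpolated to the site."""
    gMR = 9.80665 * 0.0289644 / 8.31432          # hydrostatic constant (K/m)
    L = lapse_rate(lat_deg, coeffs)
    return slp * (1.0 - L * elv_m / slt) ** (gMR / L)
```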

Note that there are some regional differences between the two atmosphere models. The following figure shows, in blue, sites at which the production rate based on the ERA40 model is lower than that based on the NCEP model, and, in red, sites where the opposite is true (P(NCEP) < P(ERA40)).

compare

These differences are actually quite small (the standard deviation of P(NCEP)/P(ERA40) is 0.4%, and the largest symbols in the plot above correspond to a difference of about 2%). Still, even though there appears to be no significant difference in the performance of the two atmosphere models when evaluated against the entire data set, there may be percent-level differences in performance in certain regions. I haven't investigated this in much detail, but based on the map above it might be worth looking into for some mountainous regions, in particular the central Andes, the Himalayas, and central Asia.

One more thing for completeness. Here is the relative performance of various atmosphere models for the deep southern hemisphere and Antarctica (all stations south of 60° S).

south

The red data are for the standard atmosphere, which we already know does a terrible job here. Blue data, the spatially variable NCEP-based scheme; green data, the spatially variable ERA40-based scheme. The black data are for the single-formula Antarctic atmosphere approximation described in the paper by John Stone referenced above (the 'antatm' formula in the online exposure age calculators, which is actually based on this paper). Of the methods that don't do a terrible job (antatm, NCEP, ERA40), none displays a significant systematic offset, and the scatter in the residuals (considering all data, with no effort to remove outliers) is 3.5% (antatm), 4.6% (NCEP), and 4.0% (ERA40). So it appears that the best air pressure approximation for Antarctica continues to be the simple single-formula antatm approximation, although both spatially variable schemes perform nearly as well. Note, however, that although this scatter for Antarctic sites appears to be significantly larger than for the data north of 60° S, there are not very many station measurements for Antarctica, so this could be the result of only a couple of errors in the data set. Don't take those scatter estimates too seriously without investigating the input data in more detail.

Summary:

Even without considering changes in the atmospheric pressure distribution over the entire exposure history of exposure-dating samples, uncertainties in approximating mean atmospheric pressure from sample elevations probably contribute at least 2% uncertainty to production rate estimates, which is a significant fraction of the scatter evident in fits to geological calibration data.

All of the atmosphere models that are more sophisticated than the standard atmosphere (the NCEP-based scheme from Balco et al., 2008; the ERA40-based scheme from Lifton et al., 2014; the simple Antarctic model from Stone, 2000) display similar performance when tested against a global data set of measured mean station pressures (obviously, I am only testing the Antarctic model against Antarctic stations, as described above). There may be some significant regional differences between the ERA40- and NCEP-based models, but I haven't looked into that in enough detail to determine which is better.

One more thing:

If the NCEP-based scheme of Balco et al. (2008) and the ERA40-based scheme of Lifton et al. (2014) perform basically the same, which one should you use? As discussed above, there is a possibility that ERA40 is better in some mountain areas; that would be worth looking into. The other consideration is speed. In MATLAB R2014b, the ERA40 model (ERA40atm.m, written by Nat Lifton) is faster than NCEPatm_2.m by approximately a factor of 2, for reasons that are unclear but seem to have something to do with differences in interpolation schemes between versions of MATLAB. So it might not be faster in Octave; that would also be worth looking into. In any case, there is no reason not to use ERA40atm.m henceforth.

Exposure-age data archiving performance experiment

July 16, 2015

A notable aspect of the U.S. National Science Foundation's Antarctic research program is that it has data-availability and archiving requirements not generally present in other programs. This is in part a responsibility of NSF under the Antarctic Treaty system, which obligates treaty nations to report on their scientific activities to other treaty nations. As currently implemented at the level of most project PIs, the requirement is that PIs must create some sort of pointer to any data sets created during the project in an index maintained by NASA and known as the Antarctic Master Directory. My understanding of this policy is, furthermore, that not only must some sort of pointer to the data be in the AMD by the end of the grant period, but all data collected in the course of the project must also be publicly available, presumably online, no later than two years after data collection. Although I think this is an excellent policy in principle and I do my best to comply in letter and spirit, the reality is that it's not at all trivial to do this in a comprehensive way that really makes the data useful and accessible to others. Part of the reason for putting the ICE-D:Antarctica database together was to make this a little easier for me to accomplish.

Recently, while putting the ICE-D database together (the overall idea of which is that it should contain all known cosmogenic-nuclide exposure-age data for Antarctica), it seemed like a sensible idea to see exactly what cosmogenic-nuclide data were accessible through the AMD. The hope was that some of the large known inventory of unpublished cosmogenic-nuclide data from Antarctica would be indexed and archived there, which would facilitate its inclusion in the ICE-D database.

This page shows the results of my experiment. Basically, I used the search utility available on the AMD front page to conduct a full-text search for words such as "cosmogenic," "exposure-age," and similar terms that seemed relevant. I then attempted to navigate through the links in each AMD entry that this search located, so as to actually find the data described in the entry. This exercise was interesting. At the time of this writing, I found a total of 34 entries in the AMD whose descriptions indicated that they might point to useful exposure-age data. In 14 of these cases, I was easily able to follow links to obtain a data set that closely resembled what was described in the AMD entry. In an additional 10 cases, I was able to navigate to at least some of the data described in the AMD entry, and in some of these cases I could use additional knowledge (for example, I independently knew where the data were located on another web site, or in a publication, that was not mentioned in the AMD entry) to obtain all the data or verify that they were publicly accessible. In the remaining 10 cases, I was not able to follow links to anything that remotely resembled the data described in the entry; the links were either dead or uninformative. Again, this page shows all the details.

I think this is interesting. Clearly the requirement to index data generated in Antarctic research projects has had a positive effect; in the aggregate, I obtained a significant amount of information that does not otherwise appear in easily accessible publications. However, many of the AMD entries contain extremely sparse information that may minimally comply with the letter of the NSF indexing requirement, but falls far short of the spirit of the overall goal of public access to data.


The ICE-D database project

April 20, 2015

The point of this posting is to describe a database project that I have been working on for the past few months.

Why:

There has been a lot of talk over the past ten years or so about building a "community database" for cosmogenic-nuclide data. The motivation is pretty obvious. There exists an enormous inventory of cosmogenic-nuclide measurements that are potentially useful for synoptic studies of paleoclimate, ice sheet change, and Earth surface processes. However, because of continual research into production rate systematics, how we calculate exposure ages, erosion rates, etc., changes all the time. So if you want to compare exposure ages from papers published at different times, you have to go back, find all the raw observations in all the papers, and use them to recalculate the exposure ages with a consistent calibration data set, scaling scheme, and age calculation method. This is a big pain in the neck and, in my view more importantly, it leads to unreviewable papers. There is simply no way to verify in detail the large and complex exposure-age compilation spreadsheets in papers that use compilations of this sort to address large-scale questions.

It would be oh-so-much easier if you could have (i) a single database that stored only raw observations and was generally believed to be fairly accurate, coupled with (ii) some kind of software to dynamically calculate exposure ages with whatever the currently accepted calibration data set and scaling scheme happens to be. Neither the problem nor the solution is particularly unusual. Everyone pretty much knows what is needed here.

The existing online exposure age calculator that has been in use since 2008 sort of provides this capability: one can maintain a spreadsheet of data extracted from old papers and paste it all into the online calculator, thus generating a compilation of ages or erosion rates that are computed in a consistent way. It is important that this capability exists, it is much better than nothing, and a lot of projects and papers that have collated large amounts of cosmogenic-nuclide data to address large-scale questions have only been possible because of it. But this is silly, right? Why should everyone wade through the supplementary data for the same old papers and then maintain their own separate, mutually sort-of-inconsistent spreadsheets of the same thing? Although the idea that this is a valuable exercise for the student is not completely without merit, in general it is an inefficient use of resources that maximizes both the number of opportunities for error and the amount of work needed every time new data or production rate calibrations appear.

So for those of you who are interested in applying the vast inventory of existing cosmogenic-nuclide measurements to synoptic questions in paleoclimate and Earth surface processes, the vision is pretty clear. It is obvious what is needed. There has been plenty of discussion of what and how. However, basically nothing has happened. The reasons for this are beyond the scope of the current post, but they are fairly routine in the overall area of scientific data discovery and management, and are well documented in both Earth science and other fields.

The point of the current post is that even though I've certainly made my contribution (or non-contribution) to past inaction, I am tired of waiting for something to happen. Thus, I have built exactly this sort of database for cosmogenic-nuclide exposure-age data from Antarctica. Antarctica is an interesting case because, first, exposure-age data from its ice-free areas are by far the most extensive data set we have for reconstructing past ice thickness change, and thus Antarctica's contribution to past sea level change, so it is really, legitimately important to be able to look at these data together at continental scale. Second, Antarctic exposure-age data have been collected over more than two decades, spanning innumerable changes in how we calculate production rates and exposure ages, so they are a great example of an intercomparison mess. And third, there are not that many exposure-age measurements for Antarctica (two or three thousand in total), so the data set is manageable in scale.

What:

The project is located here:

http://antarctica.ice-d.org/

In addition, Brad Herried of the Polar Geospatial Center at the University of Minnesota has built a geographic interface at this address:

http://hess.ess.washington.edu/iced/map/

For those of you who (i) are personally involved in collecting this type of data in Antarctica, and/or (ii) have access to the PGC high-resolution imagery, the map interface is just awesome. It’s fascinating and immersive to be able to look at exactly where all the samples collected in previous work are throughout the entire continent. On the other hand, it suddenly makes the continent feel a lot smaller.

newsletter_image

Exposure age samples from Mt. Darling, Ford Ranges, Marie Byrd Land, via the ICE-D database. Thanks PGC.

A few important features of the project:

1. Functionally, what is happening is that the data are stored in a MySQL database running on the Google Cloud SQL service. The web pages are then served by Python code, running on Google App Engine, that extracts data from the database and interacts programmatically with the web service API for the online exposure age calculator (a minimal sketch of this flow appears after this list). This all employs commonly used software tools, and none of it is rocket science.

2. The data that are in there fall into two main categories.

One, the majority of published cosmogenic-nuclide data for Antarctica are in there and indexed according to publication. I'm still working on entering some published data, in particular older data from the Dry Valleys area that are not as well documented in some ways as most newer data, so there are some gaps that are being gradually filled. In general, the data are not very complete for the Dry Valleys, but they are fairly complete, at least as regards published data, elsewhere on the continent.

Two, every single bit of exposure-age data I have ever collected in Antarctica, published or unpublished, is there. I should qualify that by saying that there are some data collected in collaborative projects, where I didn't collect the samples and was mainly providing noble gas measurements, that are not there yet. But I am working on that. The point is that there are tons of unpublished data in there. I feel great about this, because it is no fun to feel guilty about sitting on a huge hoard of unpublished data that were collected at public expense. Also notable is that the samples I collected are attached to extremely comprehensive background data, including a lot (thousands) of photos of samples and sample sites. Here's an example. Basically, you now have all the information you need to tell me that I collected the wrong sample in the field.

3. The data that are not there fall into two categories:

One, older papers that I have not yet gotten around to extracting data from, as noted above.

Two, several large hoards of unpublished exposure-age data collected by researchers who are actively working in Antarctica. Some of the most important data collected in recent years, which should be both published and represented here, are not. Those responsible know who they are.

4. Everything in there that I did not collect personally is, to the best of my knowledge, an accurate representation of what was published in the source papers. But I am sure there are many publication and transcription errors that I don't know about. One goal of this project that is not yet implemented is an editing interface, so that those who know about errors in existing data, or about new data, can contribute to improving the overall product. Really, what this should be is not a means of data archiving but something more like a structured data wiki. It's not there yet, but that's the goal.

If you do have specific knowledge of data that aren’t there or data that are there but are incorrect, let me know.

5. I’m still working on the interface code and I will be for the foreseeable future. Possibly for years. The web interface may, and probably will, be broken at any time. It’s a mess.

6. It’s a bit slow. Working on that. Be patient.

7. This is not a manifestation of any larger project. There were no meetings. Except for Brad Herried at PGC, who built the map interface, no one else was involved to any significant extent, although several people helped by reminding me about papers and studies that I had forgotten or never knew about. Except for the resources contributed by PGC (which is funded by the NSF Antarctic and Arctic research programs) to build and run the map interface, there is no specific funding source. In fact, I am paying Google $10/month for SQL database hosting, so if you think this project is worthwhile you are welcome to contribute. This also means that if you send me an email with the sentence "Greg, I think you should add a feature that does…something," then the next sentence of the email should read "I am really good at writing Python code and I would very much like to help you make that happen."
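
To make feature 1 above a little more concrete, here is a minimal sketch of that flow in Python. To be clear, this is not the actual ICE-D code: the table, column, and form field names and the calculator endpoint are placeholders, and the only claim is the general pattern described above (raw observations in SQL, ages computed on demand by the calculator's web service).

```python
import pymysql          # MySQL client; the schema assumed below is hypothetical, not the ICE-D schema
import requests

# Pull raw observations for one site out of the database.
conn = pymysql.connect(host="CLOUD_SQL_HOST", user="reader",
                       password="...", database="iced")
with conn.cursor() as cur:
    cur.execute(
        "SELECT name, lat, lon, elv, thickness, density, shielding, N10, delN10 "
        "FROM samples WHERE site = %s", ("MTDARLING",))
    rows = cur.fetchall()

# Assemble a text block of sample data, one line per sample, and hand it to the
# exposure age calculator's web service. The endpoint URL, form field name, and
# input format shown here are schematic placeholders.
text_block = "\n".join(" ".join(str(v) for v in row) for row in rows)
resp = requests.post("https://hess.ess.washington.edu/SOME_CALCULATOR_ENDPOINT",
                     data={"text_block": text_block})
print(resp.text)        # ages come back for display on the dynamically generated page
```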

Finally, to summarize what’s been accomplished or not accomplished:

1. This project solves the problem of making a dynamic cosmogenic-nuclide exposure-age database. In principle, the idea of dynamically calculating exposure ages from stored raw observations isn't complicated, but no one had done it. Now someone has. It's not nearly as smooth as, for example, the Neotoma database, but it works. Progress.

aepDAR

Dynamically generated exposure age – elevation plot for Mt. Darling (pictured above). Compare to Figure 2 in Stone and others, Science, 2003.

2. This shows what is possible and what we should have done years ago. There is no obstacle to doing this with basin-scale erosion-rate data, alpine glacial moraine data, or anything else. With modern cloud computing services, it is easy to do this cheaply, efficiently, and scalably. It really is.

3. I haven't discussed this in any detail above, but it is possible to interact with the database programmatically to do synoptic analyses (a minimal sketch appears at the end of this post). In fact, that's the whole point. Thus, as the database itself and the age calculation methods evolve and improve, the synoptic analyses that stem from the database can automatically evolve as well. This is important. More on this later.

4. The geographic interface is awesome. Awesome.

5. I am not there yet in terms of making this more of a data wiki than a data archive. That’s an important idea, and it’s what is going to make this actually a useful tool, but it’s not yet implemented at all.
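
As an illustration of the kind of programmatic, synoptic use mentioned in point 3 above, here is a sketch that pulls every Be-10 concentration and its sample elevation straight out of the database and plots one against the other. As with the previous sketch, the table and column names are placeholders rather than the actual ICE-D schema.

```python
import pymysql
import matplotlib.pyplot as plt

# Connect to the database (hypothetical host and credentials).
conn = pymysql.connect(host="CLOUD_SQL_HOST", user="reader",
                       password="...", database="iced")
with conn.cursor() as cur:
    # Hypothetical schema: one row per Be-10 measurement, joined to its sample.
    cur.execute(
        "SELECT s.elv, m.N10 FROM samples s "
        "JOIN be10_measurements m ON m.sample_id = s.id "
        "WHERE m.N10 IS NOT NULL")
    elv, n10 = zip(*cur.fetchall())

# Nuclide concentration vs. elevation, with a log concentration axis.
plt.semilogx(n10, elv, ".")
plt.xlabel("Be-10 concentration (atoms/g)")
plt.ylabel("Elevation (m)")
plt.show()
```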