AMS “intercomparison” at GS09
At the Goldschmidt meeting last week, Silke Merchel gave a talk about a sort of ad hoc AMS intercomparison exercise that she carried out recently. Basically, she prepared a large number of Be-10 and Cl-36 AMS cathodes from a few solutions with different isotope ratios, and shipped them off to various AMS labs to be analysed as unknowns. In the talk, she then revealed the results and stated which AMS lab produced which result. At first appearance the results were not too good: results from the full set of AMS labs differed by up to tens of percent for both nuclides, and there appeared to be significant systematic differences between various pairs of labs. A PDF of her abstract (which unfortunately gives only a fraction of the information in her talk) is here.
This is sort of a controversial thing to do. Immemorial custom in the radiocarbon-dating community requires that AMS intercomparison exercises be carried out in an anonymous fashion, where a central authority distributes an intercomparison material, the labs run it and submit results in sealed envelopes to the central authority, and the C.A. compiles the results and publishes them as statistical abstracts without associating specific labs with specific results. This is a comfortable and nonconfrontational way of doing things, and the idea is that labs whose own results are far away from the average result will take steps to get their act together without further prodding, whereas a public airing of inter-lab differences would just make people look bad, annoy them, and inhibit rather than increase cooperation.
However, this isn’t the most satisfying method from the user’s perspective. As a user I’d like to know how various labs perform so I can make a sensible decision about where to send samples. Thus Silke’s talk was extremely refreshing. I would like to see this exercise carried out in public more often.
However, it did bring up the important question of whether or not we should panic about this. If AMS labs analyse the same material and get results that vary by 10%, then actual exposure ages could be totally wrong by an additional 10% on top of production rate errors etc. If inter-lab differences were systematic, then PRIME Lab users would always assign moraines in New Zealand to the Younger Dryas, whereas LLNL users would find only the Antarctic Cold Reversal. This would be bad.
I don’t think things are that bad, for a couple of reasons. First, Silke’s results disagree with other measurements, including some of the CRONUS rock-sample intercomparisons. Those exercises were less elaborate and are hard to compare directly with these results, but they showed better agreement between labs than Silke’s data suggest. Second, at most AMS labs this experiment only involved a couple of analyses of a couple of cathodes. Thus, random measurement error hasn’t been completely (or, in some cases, at all) averaged out of the experiment, which makes it hard to evaluate the results. This experiment would be much better if all labs were obligated to carry out a large number of analyses of each intercomparison solution.
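To put a number on this: if each cathode measurement carries a few percent of random uncertainty, the scatter on a lab's mean only shrinks as the square root of the number of cathodes run. A minimal sketch (the 3% per-cathode figure is an assumed, illustrative value, not anything from Silke's data):

```python
import math

def mean_uncertainty_pct(per_cathode_pct, n_cathodes):
    """Relative uncertainty of the mean of n independent cathode
    measurements, each carrying the same relative uncertainty."""
    # Random error on an average of N independent measurements
    # shrinks as 1/sqrt(N).
    return per_cathode_pct / math.sqrt(n_cathodes)

# With an assumed 3% per-cathode uncertainty, a two-cathode mean still
# carries ~2.1% random scatter; a ten-cathode mean carries ~0.9%.
two_cathodes = mean_uncertainty_pct(3.0, 2)
ten_cathodes = mean_uncertainty_pct(3.0, 10)
```

So with only a couple of cathodes per lab, apparent inter-lab offsets of a few percent are roughly what counting statistics alone would produce even if every lab were perfectly calibrated.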
Third, I am nearly certain that the Be-10 results at least are still contaminated by Be isotope ratio standardization errors. Remember, in making a Be isotope ratio measurement you are comparing the ratio in your sample to the ratio in a standard material with a defined isotope ratio. Get the defined ratio of the standard wrong and you get the sample wrong too. The problem here arises because different labs use different standard materials, and the assumed isotope ratios of these materials aren’t internally consistent. Thus, to properly compare Be isotope ratio measurements from different labs, you need to i) determine which standards they were normalized to, ii) determine what the assumed isotope ratios of those standards were, iii) determine (from an entirely separate set of experiments that may or may not ever have been conducted) whether the assumed isotope ratios for the standards are mutually consistent, and iv) if they’re not, correct the sample results accordingly before comparing them. This is extremely difficult to get right. When Silke distributed the samples, she told the receiving AMS labs to submit results that were normalized to a particular standard. However, because different labs use different assumed isotope ratios for this standard (and others), these instructions left, in my opinion, too much wiggle room, and it is nearly certain that the results in Silke’s data set have not been consistently standardized. Silke’s instructions probably prevented apples-to-oranges comparisons, but some oranges are still probably being compared to grapefruit.
A better approach would have been to not specify a particular standardization, but to obtain from each lab the following pieces of information: i) the measured ratio of the sample, ii) the identity of the normalization standard used to make this measurement, and iii) the isotope ratio assumed for the normalization standard. With this information, it would be possible to ensure that internally consistent ratios were being compared, as well as to discard from the experiment any situations where adequate data to do the renormalization didn’t exist (i.e., if the two normalization standards in question had never been compared). Without this information, we continue to wonder if each datum is an orange or a grapefruit.
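With those three pieces of information, the renormalization itself is just a multiplicative correction: a reported sample ratio is the measured sample/standard count ratio multiplied by the assumed isotope ratio of the standard, so swapping in a different assumed value rescales the result proportionally. A minimal sketch, with purely hypothetical standard ratios for illustration:

```python
def renormalize(reported_ratio, assumed_std_ratio, consensus_std_ratio):
    """Re-express a sample isotope ratio that was normalized to a
    standard using one assumed ratio, in terms of a different (e.g.
    consensus) assumed ratio for the same standard material."""
    # reported = (sample/standard count ratio) * assumed_std_ratio,
    # so changing the assumed standard ratio is a simple rescale.
    return reported_ratio * (consensus_std_ratio / assumed_std_ratio)

# Hypothetical example: a lab assumed 2.68e-11 for its 10Be/9Be
# standard, but the agreed-on value for that material is 2.79e-11 --
# about a 4% shift in every sample ratio from standardization alone.
r = renormalize(1.00e-13, 2.68e-11, 2.79e-11)
```

When two labs normalized to physically different standard materials, the same rescaling applies, but the conversion factor has to come from a separate cross-calibration experiment, which may never have been done; those are the cases that should be discarded from the comparison.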
To summarize, I think Silke’s exercise was a very good thing to do. It is certainly embarrassing to the cosmogenic-nuclide community that large differences appear to exist among AMS labs. Silke’s talk called much-needed attention to the issue and made AMS operators squirm a little. Unless anonymous intercomparisons are supplemented with fully reported public ones, labs risk having low credibility with the user community. More public intercomparison, more credibility. However, this experiment was not quite fair. If you’re going to stir up a storm of public opinion, you should do it with adequate data. Random measurement error should be taken out of the equation by standardizing the number of cathodes to be run (at a large number). More importantly, one needs to make damn sure all standardization issues have been sorted out before concluding that there really are unresolved systematic differences. So we should do this again — send out intercomparison standards and make the results fully public — but we should do a better job next time.