# What is a camel diagram anyway?

This post addresses what a “camel diagram” actually is. So what is it? Basically, this is a stupid name, apparently invented by myself (the name, not the diagram, although even that is hard to believe), for a type of diagram that is commonly used in the cosmogenic-nuclide literature to represent exposure-age data. Here is an example from a recent paper (Kelly MA and 6 others, 2008, *Quat. Sci. Rev.* 27, 2273-2282):

Basically, the caption above says what this is: it’s a way of representing a lot of measurements of the same thing that have Gaussian uncertainties. You draw a Gaussian with mean and standard deviation corresponding to each of your individual measurements, and then add them all up to obtain a summary curve. The use of this type of diagram in geochronology dates back to the 1980s, mostly in the fission-track and argon-argon dating literature. Here is an interesting example from a paper by Tim Culler on the ages of spherules in the lunar regolith (*Science* 287 pp. 1785-1788):

More relevant to glacial chronology might be another example in a paper by Tom Lowell about radiocarbon dating of LGM moraines in Ohio (Lowell, T.V., 1995. The application of radiocarbon age estimates to the dating of glacial sequences: an example from the Miami Sublobe, Ohio, USA. *Quaternary Science Reviews* 14, 85–99). The aim of this post is to explain what this diagram is for, why and when you should use it, and how to apply it both rightly and wrongly to exposure-age data. As with all good statistical constructs, it can also be used to mislead readers into thinking what you want them to think.

First, what is the point of this diagram? Basically, we are using this diagram to describe the frequency distribution of observations. We have made a set of measurements of what we believe to be the same thing, and we want to represent the distribution of those measurements. Normally to carry out this task, we would use a histogram, which is a fairly basic sort of a diagram in which we divide the observation space into bins, determine how many observations fall into each bin, and then fill each bin with a bar whose height is proportional to the number of measurements. Let’s say we measured a bunch of exposure ages, in thousands of years BP (i.e., ka), on a moraine, and got the following results:

`[23.1 24.1 16.3 24.1 21.3 15.9 17.8 20.5 24.6 24.6 16.6 24.7 24.6 19.9 23.0 16.4 19.2 24.2 22.9 24.6 ]`

We could create a histogram of these data by defining bins, let’s say with a width of 2000 years and starting at 0, and assigning these data to bins. An exposure age of 19.2 ka goes in the 18-20 ka bin, an exposure age of 22.9 ka goes in the 22-24 ka bin, and so on. This yields the following table of how many samples fall in each bin:

| Bin | Number in bin |
|---|---|
| 14-16 ka | 1 |
| 16-18 ka | 4 |
| 18-20 ka | 2 |
| 20-22 ka | 2 |
| 22-24 ka | 3 |
| 24-26 ka | 8 |

Which in turn produces the following histogram:

The x-axis is the exposure age, each bar is a bin, and the y-axis is the number of samples that fall into each bin. There are three important points to make about histograms. First, they represent an observed frequency distribution of measurements. They’re not necessarily a probability density function for the ages of boulders on the moraine. If you i) made the additional assumption that the probability of observing a certain exposure age is exactly equal to the frequency distribution of exposure ages we have already observed (which is highly restrictive, but might be true if you had analysed all the boulders on the moraine), and then ii) rescaled the y-axis so that the total area of the bars was equal to 1, then you would arguably have a probability density function for boulder age. Second, you need to make two arbitrary decisions when you create a histogram: how wide are the bins, and where are they located? If you change these things, the histogram changes. Third, there is no uncertainty in histograms. Each measurement goes in one and only one bin.
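To make the second point concrete, here is a minimal Python sketch (standard library only; `histogram_counts` is a hypothetical helper, not code from this post) that bins the 20 ages above twice: once with 2 ka bins starting at 0 ka, as in the table, and once with the same width but with the bin edges shifted by 1 ka. Each bin is assumed to include its lower edge and exclude its upper edge; the text doesn't say how ages falling exactly on an edge are handled.

```python
import math

# Exposure ages (ka) from the example above
ages = [23.1, 24.1, 16.3, 24.1, 21.3, 15.9, 17.8, 20.5, 24.6, 24.6,
        16.6, 24.7, 24.6, 19.9, 23.0, 16.4, 19.2, 24.2, 22.9, 24.6]

def histogram_counts(data, width, origin):
    """Count data in bins [origin + k*width, origin + (k+1)*width).

    Returns a dict mapping each bin's lower edge to its count,
    sorted by lower edge; empty bins are omitted.
    """
    counts = {}
    for x in data:
        k = math.floor((x - origin) / width)  # index of the bin holding x
        lower = origin + k * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# 2 ka bins starting at 0 ka (reproduces the table above)
counts_a = histogram_counts(ages, width=2, origin=0)
# Same bin width, but with the bin edges shifted by 1 ka
counts_b = histogram_counts(ages, width=2, origin=1)

print(counts_a)  # {14: 1, 16: 4, 18: 2, 20: 2, 22: 3, 24: 8}
print(counts_b)  # {15: 4, 17: 1, 19: 3, 21: 2, 23: 10}
```

Same data and same bin width, yet the tallest bar goes from 8 samples in 24-26 ka to 10 samples in 23-25 ka, and the smaller cluster at 16-18 ka turns into a bar of 4 next to a nearly empty 17-19 ka bin.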

Whether a histogram is or is not a probability density function is largely semantic and depends on your definition of terms, but the second and third points above mean that histograms are a lousy way to represent data when either of two things is true: i) there are only a few measurements, or ii) the measurements have uncertainty associated with them. Obviously, these two things describe most geochronological data, and cosmogenic-nuclide exposure ages in particular: we don’t collect very many because they’re expensive, and they have measurement uncertainty. So here is an example of how wrong you can go with a histogram representation of exposure-age data. Let’s say you analysed two boulders and found them to have apparent exposure ages of 16.9 +/- 2.1 ka and 18.2 +/- 1.5 ka. There are two important things about these results. First, the two ages are different. Second, they agree when their uncertainties are taken into account. However, it’s impossible to communicate both of these important observations at the same time using a histogram. Here’s one possible histogram for these ages:

This one gives the impression that the two ages are irreconcilably different. Wrong. So that’s misleading. How about this one:

That one indicates that the two ages are the same. Also wrong and misleading. The point is that when data are sparse and have measurement uncertainties, representing their distribution with histograms fails to communicate the information we are trying to communicate. This is the problem that “camel diagrams” are intended to solve. In constructing a histogram, we are basically representing each measurement by a rectangle with width equal to the bin width, and then adding the representations of all the samples together to get a summary histogram. Now what we will do instead is represent each measurement by something other than a rectangle. Usually, because we are generally working with cosmogenic-nuclide measurements that have normal, i.e. Gaussian, uncertainties, we represent each sample by a Gaussian-shaped curve. This is just a curve generated by the formula for a normal probability distribution:

$$f(t) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right]$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the probability distribution. To do this for a single exposure age, we take the age we measured to be $\mu$ and the 1-standard-error uncertainty in the age to be $\sigma$. Doing this for the two ages just mentioned above gives:

Representing the measurements as Gaussian curves visually communicates a lot of important things that we couldn’t communicate with the histogram. First, although the measurements are different, they are similar in light of their measurement uncertainties. If we envision each curve as something like a probability density function for the actual age of each sample, then the large overlap between the curves indicates a high likelihood that they are both measurements of the same thing, and differ only because of measurement error. Second, we can compare the difference between the best estimates of the two measurements (the locations of the peaks) with the size of the uncertainty on each measurement. In this case the difference between the measurements is smaller than their uncertainties, which also suggests that they are both measurements of the same thing. Third, because the formula for a Gaussian curve is defined such that the area under each curve is always equal to 1, the height of each curve is inversely proportional to its measurement uncertainty. This draws the eye immediately toward the most precise, i.e. tallest, measurements, and the viewer naturally tends to give those more weight. So representing data by continuous Gaussians instead of rectangles clears up a lot of the visual misrepresentation that histograms incur with small and uncertain data sets.
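The third point follows directly from the formula: the peak of a normal curve has height 1 / (σ√(2π)), so halving the uncertainty doubles the height. The Python sketch below (standard library only; `normal_pdf` and `area_under` are hypothetical helper names) checks this for the two example ages, using a simple trapezoid-rule integral to confirm that each curve has unit area.

```python
import math

def normal_pdf(t, mu, sigma):
    """Normal (Gaussian) density with mean mu and standard deviation sigma."""
    return math.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area_under(f, lo, hi, n=20000):
    """Approximate the integral of f over [lo, hi] with the trapezoid rule."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

# The two exposure ages from the text: (age, 1-sigma uncertainty), both in ka
samples = [(16.9, 2.1), (18.2, 1.5)]

results = []
for mu, sigma in samples:
    peak = normal_pdf(mu, mu, sigma)  # height of the curve at its center
    area = area_under(lambda t: normal_pdf(t, mu, sigma), 0.0, 40.0)
    results.append((peak, area))
    print(f"{mu} +/- {sigma} ka: peak height {peak:.3f} per ka, area {area:.4f}")
```

Both areas come out as 1, while the ratio of the two peak heights is 2.1/1.5 = 1.4, the inverse of the ratio of the uncertainties.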

Typically one then adds the Gaussian curves corresponding to the single measurements together to come up with a summary plot, as follows:

The black line is the sum of the two individual Gaussians. The fact that it has only one peak correctly communicates the idea that the two measurements are both imperfect measurements of the same thing (the true age of whatever we are dating), and considering them together tells us that this true age is likely to be somewhere between the two measurements, slightly closer to the more precise of the two. Basically, we are performing a sort of visual maximum likelihood estimate.
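The summing step for the same two ages can be sketched in Python (standard library only; the 1-year grid spacing and all names are arbitrary choices for illustration). For comparison, it also computes the inverse-variance weighted mean, which is the maximum likelihood estimate of a single true age if both ages are independent Gaussian measurements of it.

```python
import math

def normal_pdf(t, mu, sigma):
    """Normal (Gaussian) density with mean mu and standard deviation sigma."""
    return math.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

samples = [(16.9, 2.1), (18.2, 1.5)]  # (age, 1-sigma) in ka, from the text

# Evaluate the summed curve on a fine grid of ages from 10 to 25 ka
grid = [10.0 + 0.001 * i for i in range(15001)]
summed = [sum(normal_pdf(t, mu, s) for mu, s in samples) for t in grid]

# Location of the highest point of the summed curve
t_peak = grid[summed.index(max(summed))]

# Inverse-variance weighted mean of the two ages
weights = [1 / s ** 2 for _, s in samples]
wmean = sum(w * mu for w, (mu, _) in zip(weights, samples)) / sum(weights)

print(f"peak of summed curve: {t_peak:.2f} ka")
print(f"inverse-variance weighted mean: {wmean:.2f} ka")
```

The peak lands near 17.9 ka and the weighted mean near 17.8 ka. Both lie between the two measurements and closer to the more precise one, but they are not identical, which is why the peak of the sum is only a sort of maximum likelihood estimate.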

If we had data that didn’t agree even considering their uncertainties, we’d get something like this:

Because there’s not much overlap between the two Gaussian curves, they don’t add much to each other and we have a two-hump rather than one-hump summary plot. Hence the name “camel plot.” One hump good, two humps bad.
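Counting humps can be sketched in Python as well. The second pair of ages below (16.9 +/- 0.8 ka and 21.0 +/- 0.8 ka) is made up for illustration, and `count_humps` is a hypothetical helper that simply counts local maxima of the summed curve on a grid.

```python
import math

def normal_pdf(t, mu, sigma):
    """Normal (Gaussian) density with mean mu and standard deviation sigma."""
    return math.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def count_humps(samples, lo, hi, step=0.01):
    """Count local maxima of the summed Gaussian curve on a grid from lo to hi."""
    n = int(round((hi - lo) / step))
    grid = [lo + step * i for i in range(n + 1)]
    y = [sum(normal_pdf(t, mu, s) for mu, s in samples) for t in grid]
    return sum(1 for i in range(1, n) if y[i - 1] < y[i] >= y[i + 1])

agreeing = [(16.9, 2.1), (18.2, 1.5)]     # the two ages from the text (ka)
disagreeing = [(16.9, 0.8), (21.0, 0.8)]  # hypothetical ages, illustration only

print(count_humps(agreeing, 5.0, 30.0))     # 1: one hump good
print(count_humps(disagreeing, 5.0, 30.0))  # 2: two humps bad
```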

So the summary is that this type of presentation, in which we represent observations by continuous functions rather than by a histogram, solves the problem that a histogram fails to communicate the important information about small data sets with measurement uncertainty.

The next question is what to call it. The term “camel diagram,” while easy to remember, is pretty dumb. It’s not a histogram. It’s not really a probability density function (as suggested in the caption from Meredith Kelly’s paper given as an example above) because it’s not intended to represent the probability of observing a particular outcome; it’s intended to represent the frequency distribution of measurements already collected. This fact led Culler and others (the other example above) to call it an “ideogram created by summing the Gaussians.” As the word “ideogram” is more commonly used to describe written characters in Chinese and other “ideographic” languages that communicate an entire idea or concept by a single character, using “ideogram” to describe this sort of plot is, at the very least, confusing. What it really is, is a sort of smoothed frequency distribution, and the proper statistical term for it is a “normal kernel density estimate.” This term communicates the fact that we are trying to estimate the frequency density of actual observations. The “kernel” is just the shape used to represent each datum. In a standard histogram, the kernel is a rectangle. Here it is the equation of a normal, i.e. Gaussian, PDF, so it is a “normal kernel.” In principle one could use any sort of kernel: triangular, Poisson, sinusoidal, anything you want. There is a lot of statistical research devoted to the proper way to construct a kernel density estimate.
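A general version of the construction, with the kernel as a parameter, might look like the following Python sketch (not the MATLAB code mentioned at the end of this post; all names are hypothetical). Two caveats: a textbook kernel density estimate uses one bandwidth for every datum, whereas here each kernel’s width is set by that measurement’s own uncertainty; and the triangular kernel is scaled so that its standard deviation equals the measurement uncertainty, which is only one reasonable choice. Dividing the sum by the number of measurements makes the total area 1; the plain sum described above has an area equal to the number of measurements.

```python
import math

def normal_kernel(u, sigma):
    """Gaussian kernel with standard deviation sigma; integrates to 1."""
    return math.exp(-u * u / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def triangular_kernel(u, sigma):
    """Triangular kernel scaled to standard deviation sigma; integrates to 1."""
    a = sigma * math.sqrt(6)  # a triangle on [-a, a] has variance a**2 / 6
    return max(a - abs(u), 0.0) / (a * a)

def kernel_density(t, data, kernel=normal_kernel, normalize=True):
    """Sum one kernel per (value, uncertainty) pair, evaluated at age t.

    normalize=True divides by the number of data so the curve integrates to 1;
    normalize=False gives the plain sum, which integrates to len(data).
    """
    total = sum(kernel(t - value, sigma) for value, sigma in data)
    return total / len(data) if normalize else total

ages = [(16.9, 2.1), (18.2, 1.5)]  # (age, 1-sigma) in ka, from the text
step = 0.01
grid = [step * i for i in range(4001)]  # ages from 0 to 40 ka

areas = {}
for kern in (normal_kernel, triangular_kernel):
    # Simple Riemann sum; both curves are essentially zero at the grid ends
    areas[kern.__name__] = step * sum(kernel_density(t, ages, kern) for t in grid)
print(areas)
```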

When and why to use it? As noted above, the value of this type of plot is in overcoming the fact that histograms are visually misleading for sparse and uncertain data. If you have sparse and uncertain data, the camel diagram is a very good way to visually communicate a lot of the important conclusions that should be drawn from the data. For this reason, it’s a very good way of presenting geochronological data. It doesn’t make a whole lot of sense to use it when the opposite is true of your data set: data that are numerous and whose uncertainties are very small compared to the spread in their values are easily and honestly presented in a histogram.

Finally, several marginally related things to note. First, the fact that one **adds** the Gaussian kernels together is also related to the question of whether this diagram is a frequency density estimate or a probability estimate. If, as in the example above, we had two age measurements with Gaussian uncertainties and took each of those to be a probability density function for the age of the landform, then to combine them into a single probability distribution one could argue that we would instead want to **multiply** them, to obtain the joint probability of both things being true at once.
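For two Gaussians the multiplication works out neatly: the product of two normal curves is proportional to another normal curve whose mean is the inverse-variance weighted mean of the two ages and whose variance is 1/(1/σ1² + 1/σ2²). The Python sketch below checks this numerically for the two example ages (standard library only; names are illustrative).

```python
import math

def normal_pdf(t, mu, sigma):
    """Normal (Gaussian) density with mean mu and standard deviation sigma."""
    return math.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

samples = [(16.9, 2.1), (18.2, 1.5)]  # (age, 1-sigma) in ka, from the text

# Numerical route: multiply the curves on a grid, then rescale to unit area
step = 0.001
grid = [10.0 + step * i for i in range(15001)]  # 10 to 25 ka
product = [math.prod(normal_pdf(t, mu, s) for mu, s in samples) for t in grid]
area = step * sum(product)
density = [p / area for p in product]
num_mean = step * sum(t * d for t, d in zip(grid, density))
num_sd = math.sqrt(step * sum((t - num_mean) ** 2 * d for t, d in zip(grid, density)))

# Closed form: inverse-variance weighted mean and combined standard deviation
weights = [1 / s ** 2 for _, s in samples]
mean = sum(w * mu for w, (mu, _) in zip(weights, samples)) / sum(weights)
sd = 1 / math.sqrt(sum(weights))

print(f"numerical:   {num_mean:.3f} +/- {num_sd:.3f} ka")
print(f"closed form: {mean:.3f} +/- {sd:.3f} ka")
```

Note that the product always has exactly one hump, however badly the measurements disagree, so multiplying the curves is no substitute for the summed plot when the question is whether the ages agree.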

Second, one potentially serious error in the use of this diagram for geochronological data occurs when an age measurement is not distinguishable from zero. Let’s say you have an exposure age of 300 +/- 200 years on a Little Ice Age moraine. If you take this to be a Gaussian uncertainty, then you are saying there is a nonzero probability that the age is less than zero. Of course, it is not possible for the age of the boulder to be less than zero, so taking this uncertainty to be Gaussian is wrong. Hence, a normal kernel density estimate representation of such data would also be misleading. In principle, you could overcome this by using a different type of kernel, one that assigns no probability to negative ages (Poisson, for example). Again, there is lots of statistical literature describing improved kernels that correct this problem.
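To put a number on the problem, the sketch below computes how much of a normal curve for 300 +/- 200 years lies below zero (about 7%, because zero sits 1.5 standard deviations below the mean). As one example of a kernel that is zero for all non-positive ages, it also defines a log-normal kernel with the same mean and standard deviation; the log-normal is chosen here purely for illustration and is only one of many positive-only kernels. Standard library only; names are hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def lognormal_kernel(t, mean, sd):
    """Log-normal density with the given mean and s.d. (illustrative); zero for t <= 0."""
    if t <= 0:
        return 0.0
    s2 = math.log(1 + (sd / mean) ** 2)  # variance of log(t)
    m = math.log(mean) - s2 / 2          # mean of log(t)
    return math.exp(-(math.log(t) - m) ** 2 / (2 * s2)) / (t * math.sqrt(2 * math.pi * s2))

age, sigma = 300.0, 200.0  # years: the Little Ice Age example in the text
p_negative = normal_cdf(0.0, age, sigma)
print(f"normal curve: probability of a negative age = {p_negative:.3f}")  # 0.067
print(f"log-normal kernel at t = 0: {lognormal_kernel(0.0, age, sigma)}")  # 0.0
```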

Third, one question specifically about applying camel diagrams to exposure-age data: which uncertainty to use? Commonly in exposure dating we talk about two different values for the uncertainty. The so-called “internal” uncertainty includes only measurement uncertainty on the cosmogenic-nuclide concentration. So if we make a Be-10 measurement with 5% precision, then the internal error on the exposure age calculated from that Be-10 concentration is also 5%. The so-called “external” uncertainty adds uncertainty in the nuclide production rate that we use to compute the exposure age from the Be-10 concentration. For example, if the production-rate uncertainty is 10%, then the same Be-10 measurement will yield an external uncertainty on the exposure age of about 11% (adding the two in quadrature). The important difference between these two is that the internal uncertainties are independent between exposure ages for samples from the same location, whereas the external uncertainties are not: they are all subject to a shared production-rate uncertainty. This means that when comparing two samples at the same site to each other, one needs to use the internal uncertainty, not the external uncertainty. If we used the external uncertainty, we would often conclude that two samples agreed within their respective uncertainties when in fact this was not true. Because constructing a “camel diagram” for exposure ages from a particular landform is basically an exercise in comparing a set of samples to each other (you want to come to a conclusion about whether exposure ages are scattered due to postdepositional disturbance, for example), in most cases you should use the internal uncertainty alone in constructing the diagram.
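A small Python sketch of why this matters (standard library only). It assumes that the measurement and production-rate uncertainties are independent and add in quadrature, and it uses two hypothetical ages of 20 and 24 ka from the same moraine. The two-standard-deviation threshold in the comments is a common rule of thumb, not a rule stated in this post.

```python
import math

internal_frac = 0.05   # 5% Be-10 measurement precision (example in the text)
prod_rate_frac = 0.10  # 10% production-rate uncertainty (example in the text)

# Assumption: independent error sources, combined in quadrature
external_frac = math.sqrt(internal_frac ** 2 + prod_rate_frac ** 2)

def z_score(a, b, sa, sb):
    """Difference between two ages in units of their combined 1-sigma uncertainty."""
    return abs(a - b) / math.sqrt(sa ** 2 + sb ** 2)

# Two hypothetical exposure ages (ka) from the same moraine
a, b = 20.0, 24.0
z_int = z_score(a, b, internal_frac * a, internal_frac * b)
z_ext = z_score(a, b, external_frac * a, external_frac * b)

print(f"external uncertainty: {100 * external_frac:.1f}%")      # 11.2%
print(f"difference in internal-uncertainty units: {z_int:.2f}")  # > 2: disagree
print(f"difference in external-uncertainty units: {z_ext:.2f}")  # < 2: appear to agree
```

With internal uncertainties the two ages differ by more than two standard deviations; inflating both by the shared production-rate term makes them look consistent, even though that shared error cannot explain a difference between two samples from the same site.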

Lastly, here is some MATLAB code for actually constructing camel diagrams.

Was this post by any chance motivated by this article that came out shortly prior? https://www.medpagetoday.com/Pulmonology/SleepDisorders/26811

I noted at the time that this paragraph:

“To explore the issue, O’Brien and her colleagues looked at children in grades 2 and 5 (mean age 9)…”

was a perfect example of egregious misuse of statistics in the sciences. I happened to think back on it, and searched on the phrase “camel distribution” (the obvious name for the shape of the age distribution of 2nd and 5th graders) and found this post, which came out a month after the sleep disorders article.