
The “fancy-pants” camelplot

September 25, 2018

This follows up on a previous post that describes the application of the normal kernel density estimate, also sometimes known as a “camel plot,” as a visual aid for displaying cosmogenic-nuclide exposure-age data. This type of diagram has one interesting problem that arises when it is used to display data that cover a wide range of ages. As covered at some length in the previous post, one useful aspect of this diagram for visual communication of data is that, as usually constructed, the Gaussian kernel drawn for each measurement is normalized to unit area. Thus, a more precise measurement will produce a narrower and taller kernel than a less precise one, which draws the viewer’s eye to the more precise measurement and provides a neat and effective way of rapidly communicating the difference in precision between measurements.

The problem with this is that, again as these figures are usually drawn, the normalization to unit area is linear in real units, i.e. units of years, rather than in relative units. Thus, suppose you have two measurements that have very different ages, say 10,000 and 100,000 years, but the same relative precision, say 3%. Even though the relative precision is the same, the uncertainty in the older age is larger in units of years; for this example the uncertainties are 300 years and 3000 years. Thus, the kernel for the older age spans more years, so normalizing to unit area involves dividing by a larger factor. The result is that even though both ages have the same relative precision, the kernel for the older age is shorter than the one for the younger age. Here is what this example looks like:

The ages have the same relative precision, but they don’t look like it. Depending on what you are trying to communicate with this diagram, this might not be what you want: the difference in size signals to the viewer that the older age is less important or less reliable than the younger one.

So one might want to correct this. Here is one way to do it. The equation for a Gaussian probability distribution is:

p(x) =  \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp{\left[ \frac{-\left(x-\mu\right)^{2}}{2\sigma^{2}}\right]}

where \mu is the age, \sigma is the uncertainty, and dividing by \sqrt{2\pi\sigma^{2}} normalizes each kernel to unit area. So the maximum height p_{max} of each normalized Gaussian, i.e., the value at x = \mu, is:

p_{max} = \frac{1}{\sqrt{2\pi\sigma^{2}}}
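As a quick numerical check of this peak height for the two-measurement example (10,000 and 100,000 years, both at 3%), here is a minimal Python sketch. The function name `unit_area_peak` is made up for illustration and is not taken from the MATLAB code linked in this post.

```python
import math

def unit_area_peak(sigma):
    """Peak height of a unit-area Gaussian kernel with standard deviation sigma."""
    return 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)

# Two ages (years) with the same 3% relative uncertainty, as in the example.
ages = [10_000.0, 100_000.0]
rel_unc = 0.03

peaks = [unit_area_peak(rel_unc * age) for age in ages]
for age, peak in zip(ages, peaks):
    print(f"age {age:>9.0f} yr  sigma {rel_unc * age:>6.0f} yr  peak {peak:.3e} per yr")

# With equal relative uncertainty, the height ratio equals the ratio of the ages.
print(f"height ratio young/old: {peaks[0] / peaks[1]:.1f}")
```

The younger kernel comes out ten times taller than the older one, which is the visual mismatch described above.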

To correct our problem, we want this not to vary with the age. A simple way to do this is to assume that all measurements have the same relative uncertainty — for example, a typical analytical uncertainty of 3% — and compute the expected height of the kernels as a function of the age. This expected height p^{*}_{max}(\mu)  is:

p^{*}_{max}(\mu) = \frac{1}{\sqrt{2\pi\left(0.03\mu\right)^{2}}}

Then, if we want to adjust the visuals so that measurements with the same relative precision, but different ages, have the same kernel height, we can just normalize again by this expected height. Then the kernels plotted on the diagram have the form:

p(x) =  \frac{1}{p^{*}_{max}(\mu)}\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp{\left[ \frac{-\left(x-\mu\right)^{2}}{2\sigma^{2}}\right]}
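Substituting the expression for p^{*}_{max}(\mu) into this form shows what the rescaling does. This simplification is a worked step added here for illustration, assuming \mu > 0 and the 3% reference uncertainty used above:

p(x) = \frac{0.03\mu}{\sigma}\exp{\left[ \frac{-\left(x-\mu\right)^{2}}{2\sigma^{2}}\right]}

So the peak height at x = \mu is just 0.03\mu / \sigma, the ratio of the reference relative uncertainty to the actual relative uncertainty: 1 for a measurement with 3% uncertainty and 0.5 for one with 6%, whatever the age.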

This loses the normalization to unit area, but that isn’t relevant in most applications of this diagram for visual presentation of data. What it does is give measurements with different ages, but the same relative uncertainty, the same kernel height. So the example above now looks like this:

[Figure: fcp2]

Measurements with different relative uncertainties will still have different heights. If the older measurement now has 6% uncertainty:

[Figure: fcp3]

If the younger has 6% and the older has 3%:

[Figure: fcp4]
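The kernel heights in these three cases can also be checked numerically. The following is a rough Python sketch, not the MATLAB code linked in this post; the names `rescaled_kernel`, `peak_height`, and `REF_REL_UNC` are invented for the example, and the 3% reference uncertainty is the one assumed in the text.

```python
import math

REF_REL_UNC = 0.03  # assumed reference relative uncertainty (3%), as in the text

def rescaled_kernel(x, age, sigma, ref_rel_unc=REF_REL_UNC):
    """Unit-area Gaussian kernel divided by the peak height expected at ref_rel_unc."""
    unit_area = math.exp(-(x - age) ** 2 / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)
    expected_peak = 1.0 / math.sqrt(2.0 * math.pi * (ref_rel_unc * age) ** 2)
    return unit_area / expected_peak

def peak_height(age, rel_unc):
    """Height of the rescaled kernel at its centre, x = age."""
    return rescaled_kernel(age, age, rel_unc * age)

# (age in years, relative uncertainty) for the three cases discussed in the post.
cases = {
    "both 3%": [(10_000.0, 0.03), (100_000.0, 0.03)],
    "older 6%": [(10_000.0, 0.03), (100_000.0, 0.06)],
    "younger 6%": [(10_000.0, 0.06), (100_000.0, 0.03)],
}
for label, measurements in cases.items():
    heights = [peak_height(age, rel) for age, rel in measurements]
    print(label, [round(h, 3) for h in heights])
```

A 3% measurement gets a height of about 1 at either age, and a 6% measurement gets about half that, so height now tracks relative precision only.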

So this formulation gets rid of the problem that two measurements with the same relative precision, but different age, will appear to have different importance, while retaining the effect that measurements with different relative precision will have different heights. It will have a minimal effect on the appearance of this figure for a data set if all the ages are similar; it only changes the appearance for data sets with a very wide range of ages. It does have the side effect that the area under each kernel scales with the age, so older ages may visually appear very large; this may or may not be a problem depending on what you are trying to do. It doesn’t affect the basic function of the camel diagram, which is to communicate whether ages do or don’t agree with each other, something that is difficult to do accurately with a histogram.
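The side effect on kernel area is easy to see numerically. The sketch below integrates the rescaled kernel with a simple trapezoid rule; it is an illustration only, and the function names, the integration range of 8 standard deviations either side of the age, and the step count are all arbitrary choices made here.

```python
import math

def rescaled_kernel(x, age, sigma, ref_rel_unc=0.03):
    """Unit-area Gaussian divided by its expected peak height at ref_rel_unc."""
    unit_area = math.exp(-(x - age) ** 2 / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)
    expected_peak = 1.0 / math.sqrt(2.0 * math.pi * (ref_rel_unc * age) ** 2)
    return unit_area / expected_peak

def kernel_area(age, sigma, n_steps=20_000):
    """Trapezoid-rule area of the rescaled kernel over age +/- 8 sigma."""
    lo, hi = age - 8.0 * sigma, age + 8.0 * sigma
    dx = (hi - lo) / n_steps
    total = 0.5 * (rescaled_kernel(lo, age, sigma) + rescaled_kernel(hi, age, sigma))
    for i in range(1, n_steps):
        total += rescaled_kernel(lo + i * dx, age, sigma)
    return total * dx

area_young = kernel_area(10_000.0, 300.0)
area_old = kernel_area(100_000.0, 3_000.0)
print(f"area, 10 ka kernel:  {area_young:.1f}")
print(f"area, 100 ka kernel: {area_old:.1f}")
print(f"ratio old/young:     {area_old / area_young:.2f}")
```

Because the unit-area kernel is multiplied by 1/p^{*}_{max}(\mu), the area is \sqrt{2\pi}(0.03\mu) regardless of \sigma, so the 100,000-year kernel covers ten times the area of the 10,000-year one.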

Note that this modification is intended to solve a visual-communication problem, and not to actually generate a quantitative probability density estimate. So you would not want to use the results of this procedure for subsequent likelihood estimates or anything of that nature.

The MATLAB code to do this is here:

fancy_pants_camelplot.m
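For readers without MATLAB, here is a minimal Python sketch of the same idea: sum the rescaled kernels over a grid of ages to get the curve that would be plotted. It is not a port of fancy_pants_camelplot.m. The function name `camel_curve`, its arguments, the grid spacing, and the sample data are all assumptions made for illustration.

```python
import math

def camel_curve(ages, sigmas, grid, ref_rel_unc=0.03):
    """Sum of rescaled Gaussian kernels evaluated at each grid point."""
    curve = []
    for x in grid:
        total = 0.0
        for age, sigma in zip(ages, sigmas):
            # Unit-area Gaussian divided by the expected peak height simplifies
            # to (ref_rel_unc * age / sigma) * exp(-(x - age)^2 / (2 sigma^2)).
            total += (ref_rel_unc * age / sigma) * math.exp(-(x - age) ** 2 / (2.0 * sigma ** 2))
        curve.append(total)
    return curve

# Hypothetical data: two well-separated ages (years), each with 3% uncertainty.
ages = [10_000.0, 100_000.0]
sigmas = [300.0, 3_000.0]
grid = [float(x) for x in range(0, 120_001, 100)]  # 0 to 120,000 yr in 100-yr steps

curve = camel_curve(ages, sigmas, grid)
print(f"curve height at 10 ka:  {curve[grid.index(10_000.0)]:.3f}")
print(f"curve height at 100 ka: {curve[grid.index(100_000.0)]:.3f}")
```

The resulting `curve` can be passed to any plotting library against `grid`. As with the MATLAB version, it is meant for display only and should not be treated as a probability density.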

The name is terrible, but it does give the idea that this is both kind of complicated and also a little bit fake at the same time, which is probably not a bad description and may help as a reminder that it’s designed for a visual effect and not a quantitative probability estimate. Other options might be pretentious_camelplot.m, uppity_camelplot.m, or too_big_for_britches_camelplot.m.

