
Putting the O in OCTOPUS

April 13, 2018

This post is about the recently developed “OCTOPUS” database of cosmogenic-nuclide erosion rate data. During the last year, folks involved in cosmogenic-nuclide erosion-rate measurements began to receive emails like this:

Dear [researcher],

I hope this email finds you well! You are receiving this email because you and coauthors published catchment-averaged 10Be / 26Al data within the last years. In doing so, you contributed to the number of more than 3500 in situ detrital catchment-averaged 10Be data that are available to date, forming a highly variable, statistically resilient dataset that represents substantial effort of both capital and labour. However, published data are often still inaccessible to researchers, are frequently subject to lacking crucial information, and are commonly different in underlying calculation and standardisation algorithms. Resulting data disharmony has confounded the purposeful (re)use of published 10Be-derived data, for instance for inter-study comparison, for looking at the greater picture of Earth surface’s downwearing or for innovative data re-evaluation.

Going online in June 2017, OCTOPUS database will do away with these problems on global level by providing and maintaining a freely available, fully harmonized, and easy to use compilation of catchment-averaged 10Be data. The project has obtained A$176,000 funding from the Australian National Data Service (ANDS) to build the infrastructure for hosting and maintaining the data at the University of Wollongong (UOW) and making this available to the research community via an OGC compliant Web Map Service.

However, this email is not just to inform you about OCTOPUS, but also to ask for your valued contribution to the project. To reconcile our compiled data and GIS derivates with your original files, we would highly appreciate if you would provide us with (a) drainage point shape file(s), and more important (b) catchment polygon files of your above named publication(s) within the next two weeks. Cross-checking your original files with our reproduced data will be an important part of our quality management.

I thought, OK, that’s interesting, databases are good, but lots of people come up with database projects that sound good to funding agencies but never accomplish much, and in addition I’ll be very surprised if anyone complies with the demand for shapefiles. Then a couple of weeks ago (that is, several months after June 2017) I got this via a listserv:

Dear Friends

A few days ago we have made available online a new open and global database of cosmogenic radionuclide and luminescence measurements in fluvial sediment (OCTOPUS).

With support from the Australian National Data Service (ANDS) we have built infrastructure for hosting and maintaining the data at the University of Wollongong and making this available to the research community via an OGC compliant Web Map Service.

The cosmogenic radionuclide (CRN) part of the database consists of Be-10 and Al-26 measurements in fluvial sediment samples along with ancillary geospatial vector and raster layers, including sample site, basin outline, digital elevation model, gradient raster, flow direction and flow accumulation rasters, atmospheric pressure raster, and CRN production scaling and topographic shielding factor rasters. The database also includes a comprehensive metadata and all necessary information and input files for the recalculation of denudation rates using CAIRN, an open source program for calculating basin-wide denudation rates from Be-10 and Al-26 data.

OCTOPUS can be accessed at: https://earth.uow.edu.au.

Great. Time to check this out and see what it is.

But first of all, to get one thing out of the way, this is proof that Earth science has now reached the acronym-pocalypse — in which it has become so critically important to have a fancy acronym that all rules relating the acronym to the words being abbreviated have been thrown away, and you can do whatever you want.  It is true that even though we hope the responsible parties will be experts in geochemistry, we don’t necessarily need them to be wise in the use of the English language — but, frankly, this is pretty bad. While all the letters in “OCTOPUS” do occur in the full title “an Open Cosmogenic isoTOPe and lUmineScence database”, the total lack of relation between word beginnings and acronym elements makes the reader wonder why it could not equally well have been called “MESS” (‘open cosMogEnic iSotope and luminescence databaSe’) or “POOP” (‘oPen cOsmOgenic isotoPe and luminescence database’), because both “MESS” and “POOP” are just as closely related to geomorphology and/or cosmogenic nuclide geochemistry as “OCTOPUS.”  Or, perhaps, in light of the discussion below, it could be “NODATA” (‘opeN cOsmogenic isotope and luminescence DATAbase’).

So now that that’s out of the way, let’s proceed to actually looking at the database. As noted above, and as also evident from the fact that I’ve developed three other online databases of cosmogenic-nuclide data (see here), I think online databases are great. The current state of the art in cosmogenic-nuclide geochemistry — in which everyone interested in synoptic analysis of exposure ages or erosion rates must maintain tedious, unwieldy, and mutually inconsistent spreadsheets that require constant updating — is a huge waste of everyone’s time and is very effectively calculated to maximize extra work and errors. It would be much better if there were an online repository of generally-agreed-upon raw observations from previous studies, so that everyone interested in synoptic analysis is referring to the same data set. So from this perspective, let’s get over the embarrassing acronym choice and begin looking at the website with a positive attitude.

Right. Upon locating the website, we’re presented with (i) a very large area devoted to advertising the University of Wollongong, surrounding (ii) a Google map of the world, with (iii) a layer selection panel.

Selecting an example layer displays river basins on the map that are presumably associated with cosmogenic-nuclide data:

Clicking on a random basin in South Africa brings up a window displaying some information about this basin:

Great. But now we have a problem. Incomprehensibly, nothing in this information window is a live link. One of the most basic questions that a user wants to ask in this situation is whether the erosion-rate estimate that’s presented for this basin agrees with what one might calculate using a different scaling method or a different online erosion rate calculator. To facilitate this, I would expect that, for example, the ID number of the basin would be a link to a page displaying all known data for that basin: location, Be-10 concentrations and standardization, and so on, hopefully formatted in such a way as to allow for easy recalculation. However, no such thing is present. This is a serious problem — why is there no access to the actual source data?

Fortunately, a bit of experimenting reveals that making the window bigger causes a “download” link to appear:

This seems like a bit of a coding error — obviously the existence of the link shouldn’t be dependent on the window size — but whatever. A bit more experimentation shows that one can select an area within which data are to be downloaded:

But now we have a serious problem. I can’t actually get any data without providing a name and email address. Full stop.

OK, so far we have what is advertised as an “Open” “Database,” but it is neither. It’s not open — anonymous download is not possible — and so far I haven’t actually seen any data, just a picture of what locations are purported to have some associated data. I haven’t so far entered anything in the name and email boxes and pressed ‘request download,’ so I don’t know whether (i) the web server will just record the information and send me the data, or whether (ii) it starts an approval process where the developers have to decide whether or not I’m worthy of the privilege. Let’s consider both options, though. If (i) is the case and I can just enter “Mark Zuckerberg” under “name” and “mzuckerberg@facebook.com” as the email address, then there is absolutely no point to interrupting the user’s interaction with the website to collect valueless information, except to give the user the cheap, queasy, and frustrated feeling, familiar to everyone from their interaction with many commercial websites, that he or she is not the customer but the product. If (ii) is the case and the site developers must decide if I am a suitable person to whom to entrust data which have, of course, already been published in publicly available scientific literature, then this entire project is the opposite of “open.” Either option is pretty bad, and I’m extremely disappointed to see that the developers here don’t seem to have the strength of character to stand behind the use of the word “open” in the title. Not impressive.

In addition, requiring identity information to access data is going to make it rather difficult to really have anonymous peer review of the accompanying paper. Not great if the anonymous peer reviewers have to log in as themselves to see the thing they are supposed to be reviewing.

Fortunately there are some clues to how we can get around this problem. The email quoted above announcing the existence of the database makes reference to OGC-compliant web services — these are, essentially, links that GIS or mapping software can use to access online data dynamically rather than having to download a local copy of the data — and, presumably, this is what is being referenced by the embedded Google map on the web page. Unfortunately, the website itself makes no mention of whether these web services exist or how to access them. So clearly a bit of investigation is needed. Inspecting the JavaScript source code associated with the embedded map shows us:

which indicates that the Google map is, in fact, getting data from an OGC-style web map service (WMS) at the address https://earth.uow.edu.au/geoserver/wms. A “web map service” is intended to serve raster data — digital elevation models, imagery, etc. — which is actually not what we want if we are interested in attribute-type data associated with point or vector features. The equivalent service for that is a “web feature service,” and if such a thing is also running on the CTOPUS web server (if it’s not “Open,” we can’t really call it OCTOPUS any more) we ought to be able to find out via the following URL:

https://earth.uow.edu.au/geoserver/wfs?request=GetCapabilities

Obtaining that URL with a browser (try it) results in a large XML object that indicates that yes, in fact, a web feature service is operating at that address:

This contains a couple of useful bits of information. One, information about what feature layers are available. The following screen shot shows information (<FeatureType> elements) for two layers, which correspond to those on the layer selection box available via the website:

Two, information about what output formats are available:
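Both bits of information can also be pulled out of the GetCapabilities response programmatically, rather than by reading the XML by eye. Here is a minimal Python sketch using only the standard library; the XML fragment is illustrative (the real response is much larger), the second layer name and the output-format list are assumptions, and the namespaces are those of the standard WFS 1.1 schema:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment of a WFS GetCapabilities response. The real one at
# https://earth.uow.edu.au/geoserver/wfs?request=GetCapabilities is much
# larger; only 'crn_aus_basins' below is a layer name taken from the post,
# the rest is a hypothetical stand-in with the same schema.
SAMPLE = """<?xml version="1.0"?>
<WFS_Capabilities xmlns="http://www.opengis.net/wfs"
                  xmlns:ows="http://www.opengis.net/ows">
  <FeatureTypeList>
    <FeatureType><Name>be10-denude:crn_aus_basins</Name></FeatureType>
    <FeatureType><Name>be10-denude:crn_aus_outlets</Name></FeatureType>
  </FeatureTypeList>
  <ows:OperationsMetadata>
    <ows:Operation name="GetFeature">
      <ows:Parameter name="outputFormat">
        <ows:Value>KML</ows:Value>
        <ows:Value>SHAPE-ZIP</ows:Value>
        <ows:Value>csv</ows:Value>
      </ows:Parameter>
    </ows:Operation>
  </ows:OperationsMetadata>
</WFS_Capabilities>"""

NS = {"wfs": "http://www.opengis.net/wfs",
      "ows": "http://www.opengis.net/ows"}

def feature_types(xml_text):
    """List the <Name> of every advertised feature layer."""
    root = ET.fromstring(xml_text)
    return [n.text for n in root.findall(".//wfs:FeatureType/wfs:Name", NS)]

def output_formats(xml_text):
    """List the output formats the GetFeature operation advertises."""
    root = ET.fromstring(xml_text)
    return [v.text for v in root.findall(
        ".//ows:Operation[@name='GetFeature']/ows:Parameter/ows:Value", NS)]

print(feature_types(SAMPLE))
print(output_formats(SAMPLE))
```

Pointed at the live response instead of `SAMPLE`, the same two functions enumerate everything the server is willing to hand out.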

Using this information, we can form a URL asking the web feature server to supply all data from the ‘crn_aus_basins’ layer as a KML file:

https://earth.uow.edu.au/geoserver/wfs?request=GetFeature&typename=be10-denude:crn_aus_basins&outputformat=KML
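The same request can be assembled programmatically, which also takes care of URL-encoding the colon in the layer name. A small sketch — note that `maxFeatures` is a standard WFS 1.x parameter that GeoServer honors, useful for a quick test pull before committing to the full download:

```python
from urllib.parse import urlencode

BASE = "https://earth.uow.edu.au/geoserver/wfs"

def getfeature_url(layer, fmt="KML", max_features=None):
    """Build a WFS GetFeature URL for a layer in the requested output format."""
    params = {"request": "GetFeature", "typename": layer, "outputformat": fmt}
    if max_features is not None:
        # Limit the response size; handy for testing against a large layer.
        params["maxFeatures"] = max_features
    return BASE + "?" + urlencode(params)

url = getfeature_url("be10-denude:crn_aus_basins")
print(url)
# To actually fetch the (fairly large) file:
#   from urllib.request import urlretrieve
#   urlretrieve(url, "crn_aus_basins.kml")
```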

Using a browser to download resulting data from that URL does, in fact, yield a (fairly large) KML file containing basin outlines and a complete data dump for each element, including the critical observations such as Be-10 concentrations and standardizations. Here it is when viewed in a local copy of Google Earth:
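Google Earth aside, the KML is ordinary XML, so the per-basin attribute dump can be extracted with the standard library as well. A sketch — the fragment below and its attribute names (`BE10_CONC`, `BE10_STD`) are hypothetical stand-ins for whatever GeoServer actually emits in each feature’s `ExtendedData` block:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mimicking GeoServer KML output; the attribute
# names are illustrative, not the real OCTOPUS field names.
KML = """<?xml version="1.0"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>basin_001</name>
      <ExtendedData>
        <Data name="BE10_CONC"><value>123456</value></Data>
        <Data name="BE10_STD"><value>07KNSTD</value></Data>
      </ExtendedData>
    </Placemark>
  </Document>
</kml>"""

NS = {"kml": "http://www.opengis.net/kml/2.2"}

def placemark_attrs(kml_text):
    """Return {placemark name: {attribute: value}} for every Placemark."""
    root = ET.fromstring(kml_text)
    out = {}
    for pm in root.findall(".//kml:Placemark", NS):
        name = pm.findtext("kml:name", default="?", namespaces=NS)
        out[name] = {d.get("name"): d.findtext("kml:value", namespaces=NS)
                     for d in pm.findall(".//kml:Data", NS)}
    return out

print(placemark_attrs(KML))
```

From there it is a short step to reformatting the critical observations for input to another erosion rate calculator.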

So it turns out that it is, in fact, possible to anonymously obtain cosmogenic-nuclide data from OCTOPUS. This is quite valuable: a WFS implementation of this data set is potentially really useful for all sorts of applications, and is a major step toward fulfilling the basic goal of doing better than the current state of the art of mutually-inconsistent-spreadsheets. And we can put the “O” back in. But that was kind of a pain in the neck, it required encouraging the server to display capabilities that the website developers apparently did not want me to know about, and it was much more complicated than it had to be. Wouldn’t it be easier simply to provide direct links to the raw data, as well as the data formatted in such a way as to be easily entered into other online erosion rate calculators, on the web site?

So I’ll try to summarize here. Placing this data set online is potentially very useful. The WFS server implementation — once I located it — is enormously better than the primary existing competition in this field, which is the unwieldy spreadsheet distributed as an appendix to an article by Eric Portenga. But in my view there are several serious problems here.

One is that of credibility. A critical element of having a useful online source of these data is that it is generally believed to be a complete and accurate representation of the raw observations — locations, elevations, nuclide concentrations, etc. — and it’s hard for users to believe this if they can’t easily examine the data. The ICE-D exposure age databases have direct access through the web interface to all raw data for this reason. If you can’t examine all the data in a granular way, you start asking why you’re not being allowed to do this, and this makes it harder to believe that the data are really accurate and believable. This is a basic principle of having a data source, or anything else, become trusted and credible: if Ronald Reagan hadn’t been able to trust but also verify, the Cold War would still be in progress.

The second is just the message that is being sent to users. The really useful part of this database implementation — the web feature service that makes it possible to access these data through desktop GIS systems that one would use for data analysis, figure preparation, etc. — is completely hidden from users. One would think that this service would be the thing that is most highlighted and most advertised by the developers, because it’s the most useful and the most potentially transformative in terms of synoptic analysis of these data. It’s really a contribution to science. This is a good thing. But instead the developers have hidden this and made it as difficult as possible to view, evaluate, or obtain source data. To say the least, this is puzzling, and, frankly, it leads the observer to uncharitable hypotheses about the motivation of the website developers. Are they trying to use this website to keep tabs on competitors who might carry out and publish synoptic research on global erosion rate distribution before they do? Is the key function of the database not so much to be useful to the overall research community as to improve the citation metrics of the authors? Not knowing any of the developers personally, I have no idea whether these hypotheses are anywhere near the truth, and, having developed similar websites, I sympathize with the feeling of having done a lot of work that other people are going to take advantage of for free, and the feeling of wanting to keep control of what I just developed. However, according to the initial email describing this project, the developers are not, in fact, giving this away for free, but rather for A$176,000 in public funding. Under these circumstances, the lack of openness built into this website cannot help but make me suspicious of the developers’ motivations. This is not a good thing.

To summarize, it’s either an open database or it’s not. Right now it’s not: open means complete, anonymous, granular access to the raw data, and we don’t have that. On one hand, this project has the potential to be fundamentally useful in enabling synoptic research into how erosion works globally. That’s good. On the other hand, the developers have inexplicably designed the website to make access to the data harder instead of easier. That’s bad, and why they have done this is a total mystery. But this project is going to remain just potentially transformative, instead of actually transformative, until they fix it.

4 Comments
  1. May 10, 2018 02:01

    Dear Greg

    I am assuming that you have been asked to review our manuscript. I think this is great! – I always enjoy reading your reviews that are always exhaustive and fair.

    But sometimes, mate, it is safe to remove the tinfoil hat. The air is actually quite fresh outside…

    OCTOPUS does not want to harm anyone, but with the 200+ GB of GIS files, we figured that sending you links via email would be the easiest way to disseminate the data. Most people like this and have sent us emails to let us know. We do not check the details entered and once you enter the info, the link is sent directly without delay.

    And because it is an OGC service, you can do all that hocus-pocus you did and get shapefiles or KML files without letting us know that it is you.

    Now that we all know what is happening, and hopefully you feel calmer — please download some data and do a proper and thorough review.

    Thank you
    Tibi

    • Greg Balco permalink*
      May 18, 2018 04:57

      To give some help to the non-native English speakers in the audience, a “tinfoil hat” refers to the belief, which is often attributed to paranoid or otherwise mentally disturbed persons, that surrounding one’s brain with an aluminum foil shield gives protection from mind-control technology thought to be commonly used by government agencies, secret international conspiracies, and space aliens. Climate deniers, Kennedy assassination theorists, Bernie Sanders voters, and followers of Lyndon Larouche, to give several examples, are commonly described as being clad in tinfoil hats.

      So, if nothing else, readers can be sure that this message, perhaps unlike the cosmic-ray neutron flux, wasn’t brought to them by extraterrestrials.

  2. Brandon Graham permalink
    May 18, 2018 18:39

    Would a FTP server be of value to allow users easier access to the data? New to the game and don’t have experience making and maintaining large databases, but I’ve used them to get large amounts of Lidar .LAS files with ease. (>200 GB worth of data in an afternoon)

  3. Greg Balco permalink*
    June 26, 2018 01:20

    An update to this discussion is that an editor of Earth System Science Data, having read this blog posting, invited me to review the paper describing the “OCTOPUS” database. This journal has an open discussion and review process and the review, with quite a lot of additional commentary by others and responses by the authors, is at this link:

    https://www.earth-syst-sci-data-discuss.net/essd-2018-32/#discussion
