Monday, March 31, 2008

Parataxonomy vs. taxonomy

I just read Krell's (2004) interesting article about parataxonomy, an organism-sorting method based on identifying so-called recognizable taxonomic units (RTUs) instead of 'real' species.

Simplified taxonomy is also very common in paleontological and related investigations.

Yes, I confess: I also submitted studies without clarifying my taxonomic concepts. To repent, I enclose a JPG of what I formerly regarded as Neogloboquadrina pachyderma Ehrenberg 1861.
As far as I remember from my former life as a 'foram picker', the proper use of taxonomy is quite uncommon in, for example, paleoclimate studies. A large amount of data based on organism counts with a very unclear taxonomic basis therefore already exists (which was a motivation for TaxonRank).

After reading Krell, I wondered how good the taxonomic documentation in the current literature of my former discipline actually is. I therefore decided to run a quick test on the 2007 volumes of Marine Micropaleontology. I scanned 40 articles (surely not enough, but I just wanted a quick impression), and the result more or less confirmed my low expectations.

Only 2 articles included a complete systematics section in which the species concepts were described, including synonymy lists. At least 15 articles provided a species list in the appendix; 6 of these lists included references to the original reference of the treated taxon. 1 article used DNA analysis.
However, the majority of articles (22) used species names but did not include any taxonomic documentation, which was a bit surprising to me. Three of these articles included electronic supplements with species lists, which are unavailable to readers of the printed version. Nine of these articles at least included one or more references as a taxonomic key.

But still, more than 25% of all scanned publications used species names but did not document their taxonomic concepts at all! So there surely is a lack of proper methodological documentation.
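
The counts above can be tallied in a few lines. This is only a sketch of the arithmetic: the numbers are the ones reported in this post, while the variable names are made up for illustration. One assumption is labeled in the code: the post does not say whether the three articles with electronic supplements overlap with the nine that cite taxonomic references, so the sketch computes both extremes.

```python
# Counts as reported in the post (40 scanned articles).
TOTAL = 40
counts = {
    "complete systematics section": 2,
    "species list in appendix": 15,
    "DNA analysis": 1,
    "species names without taxonomic documentation": 22,
}


def share(n, total=TOTAL):
    """Return n as a percentage of total."""
    return 100.0 * n / total


# The four categories account for every scanned article.
assert sum(counts.values()) == TOTAL

for label, n in counts.items():
    print(f"{label}: {n} ({share(n):.1f}%)")

# Within the 22 articles, 3 had electronic supplements and 9 cited references.
# Assumption: the overlap between these two subgroups is not stated in the
# post, so we bound the number of articles with no documentation at all.
undocumented = counts["species names without taxonomic documentation"]
with_supplement, with_references = 3, 9
fewest = undocumented - (with_supplement + with_references)  # subgroups disjoint
most = undocumented - max(with_supplement, with_references)  # one inside the other
print(
    f"no documentation at all: {fewest} to {most} articles "
    f"({share(fewest):.1f}% to {share(most):.1f}%)"
)
```

Depending on how much the two subgroups overlap, the share of articles without any documented taxonomic concept lies somewhere between the two printed percentages.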


Krell (2004): Parataxonomy vs. taxonomy in biodiversity studies - pitfalls and applicability of 'morphospecies' sorting. Biodiversity and Conservation, Vol. 13, pp. 795-812.

Friday, March 28, 2008

Introducing Agenames

We are proud to announce the first 'alpha' release of 'Agenames', a new Stratigraphy.Net project we started last year.

Agenames is very much inspired by the geonames initiative, which collects and publishes geographical names and their coordinates. By analogy, Agenames aims to collect stratigraphic terms together with their chronostratigraphic position (relative age).
So if you need to find out what, for example, the 'Ammergauer Schichten' or the 'Black Donald Pluton' means, Agenames might help.

But Agenames will offer more! We have started some first experiments with Ageparser, a text-parsing tool that uses the Agenames index to scan documents for stratigraphic terms and thus identify the stratigraphic context of a publication.
We will use Ageparser for stratigraphic indexing of geoscientific documents, with the ultimate goal of a 4D (space and time) search engine ... someday ...
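
To give an idea of what scanning a document against a term index could look like, here is a minimal sketch. It is not the actual Ageparser code: the tiny index, the function names and the regex-based matching are all assumptions made for illustration, and the position strings are placeholders rather than real Agenames data.

```python
import re

# Hypothetical miniature index: term -> chronostratigraphic position.
# The position strings are placeholders, not real Agenames data.
AGENAMES_INDEX = {
    "Ammergauer Schichten": "POSITION_A (placeholder)",
    "Black Donald Pluton": "POSITION_B (placeholder)",
}


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any indexed term."""
    # Longest terms first, so a longer name wins over a shorter prefix.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def find_stratigraphic_terms(text, index=AGENAMES_INDEX):
    """Return (term, position, character offset) for each hit in text."""
    canonical = {term.lower(): term for term in index}
    hits = []
    for match in build_pattern(index).finditer(text):
        term = canonical[match.group(0).lower()]
        hits.append((term, index[term], match.start()))
    return hits


sample = "Samples were taken from the Ammergauer Schichten near the type locality."
for term, position, offset in find_stratigraphic_terms(sample):
    print(term, position, offset)
```

A plain dictionary lookup like this only finds exact spellings; handling spelling variants, abbreviations and translated unit names would need more than this sketch shows.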

To learn more, visit the Agenames homepage, or come to the EGU 2008 in Vienna, where Jens will give a talk on Agenames on Wednesday, 16 April 2008.

Thursday, March 27, 2008

The end of the sandbox

Originally, we provided TaxonConcept's sandbox as a testing area for people interested in the functionality of TaxonConcept. The sandbox was a complete mirror of TaxonConcept, intended as the place where people could just play around with the tool without obligations or risks after a short online registration.
We offered free access to the sandbox for several months, but we found that although comparatively many people registered, none of them really used it. After more than one year, there was only a handful of test entries. Weighing this small impact against the effort required to maintain the sandbox, we decided to close it.
I now wonder why so many people registered for the sandbox and then apparently decided not to use it. OK, TaxonConcept is not an easy tool, but we now have two students working with it, and it took just one or two days to train them.
Probably the people who registered simply expected to receive something completely different after clicking on the 'register me!' link, maybe something like a newsletter?

Wednesday, March 12, 2008

New business models for earth science data management

The value of data sharing has long been recognised and proclaimed in several manifestos and policies (e.g. Berlin Declaration, Budapest Open Access Initiative, OECD). Most funding organisations strongly support Open Access, and some have even initiated programs that aim to strengthen data and information infrastructures (e.g. NSF). The importance of data archiving has also been acknowledged by funding agencies, and some of them have published good practice guides or data policies that aim to convince scientists to publish their primary data in appropriate archives.

However, data management costs money. In the past, funding agencies preferred to fund the development of new systems, but they failed to ensure funding for the long-term operation of the resulting infrastructures. Funding organisations have slowly woken up to the problem of how projects can be transformed into infrastructures, but the problem is still not solved.

Even though the value of data sharing is recognised, there is little motivation for researchers to prepare their data for online access. It only causes extra work and adds little to prestige and recognition among peers. From a researcher’s perspective, the money is better spent on further research. Under these conditions, policies on data sharing remain without effect.

This does not mean that researchers are unwilling. In fact, the majority are willing to share their data, but in many cases they are frustrated by the difficulties that arise when they try to submit their data to a database. Many scientific database operators have not understood the paradigm shift in how the web works: the shift towards user-generated content. I know that mentioning "user-generated content" in this context opens a can of worms. The point is that most scientific databases, especially the publicly mandated ones, are not service-oriented and simply rely on their mandate.

The funding agencies are in a dilemma. Their rules make it difficult to adapt to this rapidly evolving field. So, where is the business model for starting up independent, innovative, service-oriented scientific databases? Restricted data access and paid services? This will not work, because individual researchers are neither able nor willing to pay. Another possible avenue to obtain funding is to convince researchers of the benefits of data services and to join scientific projects, e.g. as subcontractors.

For this model to succeed, motivation is required on both sides. User frustration needs to be avoided, and the technical as well as the service infrastructure needs to be fully up to date. Improved cooperation between data centres would surely help each centre close its own service gaps. Project-specific data management networks that share responsibilities might be one way to satisfy user needs.

Friday, March 7, 2008

Dublin Core for stratigraphic units

This is a first draft which shows how I would encode metadata on stratigraphic units.
The use of the coverage and subject tags (both of which currently contain the chronostratigraphic position of the unit) is still preliminary. I have to look up the DC specification; as far as I remember, it is possible to specify spatial as well as temporal coverage.

<oai_dc:dc xsi:schemaLocation="">
<dc:rights>Licensed under a Creative Commons Attribution 2.5 License</dc:rights>
<dc:type>Stratigraphic Unit</dc:type>
<dc:title>Ammergauer Schichten</dc:title>
</oai_dc:dc>
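
On the question of separate spatial and temporal coverage: the DCMI Metadata Terms define dcterms:temporal and dcterms:spatial as refinements of coverage. The following is a rough sketch of how such a record could be generated. Several things are assumptions: the wrapper element name "record" is hypothetical (as far as I know, the oai_dc container only allows the 15 simple DC elements, so qualified terms would need a different container), the function name is made up, and the coverage values are placeholders, not real data for this unit.

```python
import xml.etree.ElementTree as ET

# Namespace URIs published by DCMI.
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def stratigraphic_unit_record(title, temporal, spatial):
    """Build a small metadata record for one stratigraphic unit."""
    # "record" is a hypothetical wrapper element, not a standard container.
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC}}}type").text = "Stratigraphic Unit"
    ET.SubElement(record, f"{{{DC}}}title").text = title
    # dcterms:temporal and dcterms:spatial refine dc:coverage.
    ET.SubElement(record, f"{{{DCTERMS}}}temporal").text = temporal
    ET.SubElement(record, f"{{{DCTERMS}}}spatial").text = spatial
    return record


record = stratigraphic_unit_record(
    title="Ammergauer Schichten",
    temporal="PLACEHOLDER chronostratigraphic position",  # not real data
    spatial="PLACEHOLDER region",  # not real data
)
print(ET.tostring(record, encoding="unicode"))
```

Keeping the temporal and spatial parts in separate elements would free the subject tag for actual subject keywords instead of duplicating the chronostratigraphic position.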

Monday, March 3, 2008

Taxonomy as a form of art

From 29 February to 9 March 2008, the Natural History Museum in Berlin is the stage for "HUM - The Art of Collecting - A Taxomaniac Parcours". The parcours leads through the Natural History Museum collection and touches on issues such as:
- The meaning of names, originals and order,
- The temporal validity of knowledge,
- The scope of cognition and pattern recognition,
- The economy of categories,
- The mental and vital capacity of our world.
The performance is complemented by interviews with the museum's custodians.