Thursday, April 30, 2009

Agetagging the Geoblogosphere - 2nd try

Some of you may already have noticed the new 'stratigraphic tag cloud' at Geoblogosphere News. After my first attempt to use the ageparser tool for stratigraphic indexing of geoblogosphere posts, this is a new attempt to add more semantics to Geoblogosphere News.

Every post summary is now sent to ageparser, which returns the 'most likely chronostratigraphic context' of the post. These tags are stored in the database and used to build the tag cloud, which is an attempt to give a quick overview of the weekly stratigraphic focus of the geoblogosphere.
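The counting step behind such a tag cloud can be sketched roughly as follows. This is only an illustration, not the code running at Geoblogosphere News: the function name `weekly_tag_cloud`, the `(published, tag)` input format and the font-size range are all assumptions, and the call to ageparser itself is left out because the sketch starts from tags that are already stored.

```python
from collections import Counter
from datetime import datetime, timedelta


def weekly_tag_cloud(tagged_posts, now, min_size=10, max_size=28):
    """Map each stratigraphic tag of the last 7 days to a font size.

    tagged_posts: iterable of (published_datetime, tag) pairs
    (hypothetical format, standing in for rows read from the database).
    """
    cutoff = now - timedelta(days=7)
    # Count how often each tag was assigned during the last week.
    counts = Counter(
        tag for published, tag in tagged_posts if tag and published >= cutoff
    )
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    sizes = {}
    for tag, n in counts.items():
        if span == 0:
            # All tags are equally frequent: give them the same size.
            sizes[tag] = max_size
        else:
            # Linear scaling between min_size and max_size.
            sizes[tag] = round(min_size + (n - lowest) * (max_size - min_size) / span)
    return sizes
```

Linear scaling is the simplest choice; a logarithmic scale may look better when a single period dominates the week.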

The fun part will be to use the temporal+spatial tags (agetags, geotags) to build some 4D search interface for Geoblogosphere News, which will allow you to search for, e.g., blog posts treating the Cretaceous of Spain.
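How such a combined query might work can be sketched as below. Nothing like this exists yet, so everything here is hypothetical: the `TaggedPost` record, the `search_4d` function, and the idea of storing an agetag as a numeric interval in million years (Ma) and a geotag as a single latitude/longitude point. The Cretaceous bounds (roughly 145 to 66 Ma) and the bounding box for Spain used in the example are approximate values chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TaggedPost:
    title: str
    age_old_ma: float    # older bound of the agetag, in Ma (assumed format)
    age_young_ma: float  # younger bound of the agetag, in Ma
    lat: float           # geotag latitude in decimal degrees
    lon: float           # geotag longitude in decimal degrees


def search_4d(posts, old_ma, young_ma, south, west, north, east):
    """Return posts whose age interval overlaps [young_ma, old_ma]
    and whose geotag lies inside the given bounding box."""
    return [
        p for p in posts
        # Two time intervals overlap if each starts before the other ends.
        if p.age_young_ma <= old_ma and p.age_old_ma >= young_ma
        and south <= p.lat <= north
        and west <= p.lon <= east
    ]


posts = [
    TaggedPost("Dinosaur tracks in La Rioja", 145.0, 100.0, 42.3, -2.5),
    TaggedPost("Miocene mammals near Madrid", 23.0, 5.0, 40.4, -3.7),
    TaggedPost("Late Cretaceous of Texas", 100.0, 66.0, 31.0, -99.0),
]

# "Cretaceous of Spain": approximate age bounds and a rough bounding box.
hits = search_4d(posts, old_ma=145.0, young_ma=66.0,
                 south=36.0, west=-9.5, north=44.0, east=3.5)
```

A real interface would need proper region polygons instead of a bounding box and would have to decide how to treat posts that only touch a period boundary.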

Friday, April 24, 2009

How to trust the persistence of GUIDs

After some rather quiet months, members of the Taxonomic Database Working Group (TDWG) have started to discuss the GUID (LSID) issue again.
Besides the never-ending discussion on which identifier system (DOI, LSID, PURL, handle etc.) is best, some other interesting issues have been raised: does a GUID system need central services, does it need a business model (money) etc.

Most interestingly, Roger Hyam has questioned the trustworthiness, and thus the persistence, of GUIDs. He comments:

What I find most interesting is that the only people who promise persistence are people who are trying to persuade you to use their system. The basic message is “don’t trust them trust us”
When we say GUID persistence what we really mean is reliability of GUID resolution in the short term. This is something entirely different from persistence for the long term! If it doesn’t work now we will never build a system that is worth preserving into the future. Lets just do the easy stuff now and then migrate it if we ever need to. I believe this is my final word on persistence of GUIDs
Good point! But instead of questioning the trustworthiness of GUIDs in general, we should start thinking about ways to ensure or at least enhance it. So, how can we REALLY trust? Surely not based on promises, but based on facts!

Therefore, the only way such service providers (also archives, databases etc.) could prove their trustworthiness would be a specific audit procedure and certificates.

There are some initiatives working on this, for example the German nestor group, which developed a criteria catalogue for long-term archives largely based on the OAIS model. Another checklist is the TRAC list. And there is the European DRAMBORA project, which provides a repository audit method based on risk assessment.

The TDWG GUID group was focusing on technology, which was good to demonstrate a GUID system can work for biodiversity informatics. Now that we face these 'new' problems with GUIDs (business model + trustworthiness) TDWG should continue, revitalize the GUID group and care about these issues.

A good starting point would be to develop such an audit procedure for LSIDs, based on the models mentioned above. TDWG (or some other organisation) could offer these audits and LSID authorities could get certified. This would enable them to demonstrate their trustworthiness, thus the persistence of their GUIDs. Such a certificate could electronically be assigned and added for example as a link to the LSID metadata.
And why offer such audits for free? Maybe revenues from audit certificates could be one pillar of a future business model for LSIDs... or TDWG?

Wednesday, April 8, 2009

iPhylo: Patenting biodiversity tools

Software patents are a real pain!
Especially when claimed by members of a research community. I was quite shocked when I read Rod Page's post Patenting biodiversity tools.
Even worse is to see that the very organizations which make massive use of freely available information, like uBio, which I was a big fan of, apply for software patents!

By claiming a software patent they reserve the right to prohibit others from using the same methods to handle biodiversity data, and the uBio patent goes pretty far...

Biodiversity informatics is not the research area to earn the big money. So in this particular research context do we have to assume such patents aim to hinder scientific progress? Or what else can be the intention of such patents?

Monday, April 6, 2009

Sponge Bob is not related to humans

Pre-Cambrian life is a truly fascinating subject. A recent study looked at the genetic evolution of 128 genes of 55 species to determine the order of early branching of taxa (sponge groups, ctenophores, placozoans, cnidarians, and bilaterians). One of the questions asked was whether the nervous system evolved once or twice in the course of early evolution. The study also sheds some light on the nature of sponges, a taxon that had already fascinated Charles Darwin at the beginning of his career.

"Phylogenomics revives traditional views on deep animal relationships";
Hervé Philippe, Romain Derelle, Philippe Lopez, Kerstin Pick, Carole Borchiellini, Nicole Boury-Esnault, Jean Vacelet, Emmanuelle Renard, Evelyn Houliston, Eric Quéinnec, Corinne Da Silva, Patrick Wincker, Hervé Le Guyader, Sally Leys, Daniel J. Jackson, Fabian Schreiber, Dirk Erpenbeck, Burkhard Morgenstern, Gert Wörheide and Michaël Manuel;
Current Biology online 2. April 2009;

See also: Deep Phylogeny Project

Thursday, April 2, 2009

A preprint archive for the geosciences

Today two interesting blog posts about getting hold of geoscientific papers have been published: The Open Source Paleontologist lists several ways to find and retrieve PDFs on your own. Dave Hone picks this up and complains a bit about those people who are too lazy to use these resources and instead ask the authors directly for copies, thus causing extra work.

Surely, there are a variety of ways to find relevant literature on the web. But the majority of geoscientific papers are still published in commercial, closed access journals. So even if you find your article on Google Scholar, you'll frequently end up on a page asking you for your credit card, unless you work for an institution which can afford the necessary licenses or the article was published in an Open Access journal.

Open Access is a fine thing and many interesting papers already appear in open access journals such as PLOS. The idea certainly will become even more popular in the future. But in my opinion it is unrealistic to assume that researchers will favor open access journals for their top results when they have the chance to get their 'Nature paper'.
Other disciplines have reacted to this dilemma and provide and maintain so-called preprint archives such as arXiv. A preprint is a draft of a scientific paper that has not yet been published in a scientific journal. Preprint archives enable authors to quickly circulate their results and, most importantly, copies of archived preprints are freely available!

Unfortunately, there is no dedicated geoscientific preprint archive :(( You can find geoscientific preprints in some institutional repositories and postprints on some homepages etc., but there is no common access point.

But it should be there! Could a geo preprint archive be a community effort? Of course it would have to include the community, but could a community-driven archive work?
Alternatively, libraries could take on that responsibility. So... geolibrarians, can you hear me? Give us a working geo preprint archive, ..please.. (maybe you could call it Geoxiv ;) ).
Sure, motivating the geo community to contribute will not be an easy job, but maybe one just has to start?