Tuesday, December 16, 2008

Call for Disasters in the Field Submissions

Gillian Ice and Darna Dufour from Ohio University are preparing a book about disasters that happened to people during their field work! Today, Philip Cantino sent around this message via the taxacom list, which mainly addresses biologists. However, I hope this message is also spread among geologists. Many heroic situations in exotic places have been experienced by some geobloggers (e.g. reported here, here, here, here, here, here, here, here, here, here, here, here, and here). I imagine you could also tell hundreds of disaster stories ;)

Here is the message sent around today by the taxacom list:

Call for Disasters in the Field Submissions

Have you ever had essential equipment fail when you are in the middle of the jungle? A difficult student? Research permission revoked in the middle of the project? Been struck by lightning? Dropped your camcorder in a river? We need your stories about challenges in field research. We all know that the old adage, "what can go wrong, will go wrong" often holds true when we do international field research. However, we all find creative ways of working around these potential disasters. We are working on a book titled, Disasters in the Field: Preparing for and Coping with Unexpected Events to be published by Altamira Press. The purpose is to present students and researchers with an overview of problems associated with doing international fieldwork - to provide them with practical suggestions that will help them prepare for the field and minimize the impact of unexpected events. We're going to use real stories to make these issues come to life. If you have a story about any of the topics below, please consider submitting it.
Stories are 200-1000 words and will be incorporated into chapters. You have the option of being credited with the submission or requesting that it be anonymous. Please contact Gillian Ice and Darna Dufour at field.disasters@oucom.ohiou.edu if you are interested. We'll provide you with guidelines and deadlines.

Monday, December 15, 2008

Paleostrat is now GeoStratSys


Paleostrat has changed its name to GeoStratSys.

"The GeoStrat Digital Information System (GeoStratSys) provides a desktop working environment for stratigraphy-related geologic data. Providing a secure private working space, access to public data, imbedded GIS capabilities, and visualization and analytical tools, it may be used by individuals, collaborative projects, and organizations in support of research, publication, and public outreach. GeoStratSys is an evolutionary version of PaleoStrat that it replaces. It has been designed to move away from the classic web site that delivers static information to the next generation of web-accessible, digital information systems."

They just started, so most services are not yet finished. However, this really sounds exciting! I'm pretty curious what the final system will look like.

Friday, December 12, 2008

My Eocene Honey Bee

In his Amphibol blog Gunnar reported about a fossil insect literature list at fossilinsects.net which brought up some memories ...

As a student I was working at the Eocene Eckfelder Maar fossil site, which is well known for its rich insect fauna. During this job we (OK, it was Stefan..) found a well preserved honey bee. A big surprise, as no other honey bee of this age was known at that time (1992). Fossilinsects has a link to the EDNA Fossil Insects Database, where I queried for insects from Eckfeld, Germany. The first hit was Eckfeldapis electrapoides. Googling for Eckfeldapis brought several hits, e.g. a citation for:

Lutz, H., 1993. Eckfeldapis electrapoides nov.gen.n.sp., eine "Honigbiene" aus dem Mittel-Eozän des "Eckfelder-Maares" bei Manderscheid/Eifel, Deutschland (Hymenoptera: Apidae, Apinae). Mainzer Naturwissenschaftliches Archiv, 31, 177-199.


I guess this was the bee we found that day...

Anyway, these 4 weeks in Eckfeld were great and exciting and this evening I discovered some old photos which brought even more nice memories..

Wednesday, December 10, 2008

Taxonomy Puzzles


In his blog Taxonomy vs. Systematics Dave Hone suspected that at least some paleontologists are not aware that systematics and taxonomy are two different things.
Even worse: As I wrote earlier both are completely ignored in a large number of paleontology related research articles! The authors of these articles do use species names but there is no 'systematics/taxonomy' section which explains the taxonomic concept of the author.

Some may now argue that this is not a problem because many species used in these investigations are so common or simple that everybody knows what is meant by species XY. But this is not true. For example, the 'El Kef blind test' (set up to clarify how abrupt the K/T extinction really was) has shown that differing taxonomic concepts can drastically reduce the comparability of scientific results (Lipps, 1997, Keller, 1997).
There are many reasons why this happens: taxon names change, names are wrongly assigned, different reference specimens or images are used, etc. The result is a big chaos of synonymies which is hard to untangle. As a consequence, scientists may use the same taxon name but mean totally different species.

Taxonomists traditionally try to track such taxonomic changes with carefully prepared synonymy lists. With our TaxonConcept tool we have stored thousands of synonymy lists, and it is often surprising how complicated it can get when e.g. synonyms of taxon names get their own synonymy lists or taxon names have been wrongly assigned. As an example, the picture at the top of this page shows a graph of the relations of the taxon 'Archaeoglobigerina cretacea' to hundreds of other taxon names.
So, if you use taxon names in your studies, try to provide some hints on what your taxonomic concept is. Otherwise you may leave future generations with taxonomy puzzles like this.

References:

Keller, Gerta: Analysis of El Kef blind test I, Marine Micropaleontology, Volume 29, Issue 2, January 1997, Pages 89-93.

Lipps, Jere H.: The Cretaceous-Tertiary boundary: The El Kef blind test, Marine Micropaleontology, Volume 29, Issue 2, January 1997, Pages 65-66.

Tuesday, December 2, 2008

Agetagging the Geoblogosphere - 1st try

I took blogs of the last 7 days listed in Geoblogosphere - News and threw them against my ageparser tool. Here is a wordle illustrating the result of my first attempt to agetag the geoblogosphere.
The stratigraphic focus of the last 7 days seems to be on the Ordovician - really? The difficulty is to identify the 'real content' of a blog, and the results seem to be biased somehow by the surrounding HTML. For example, Geology.com News has a 'related stories' section with numerous entries on the Barnett Shale.

Monday, December 1, 2008

Metadata: Oh - The Pain!

If you really want to scare away scientists from your project, say: "metadata". Most scientists develop a skin rash on having to deal with metadata, yet useful information systems depend on metadata of some form or other. Many metadata are straightforward and can be generated from context. The most difficult metadata elements seem to be keywords.

Isn't there a way to generate metadata automatically? Well, there are methods proposed and there are some tools around for automatic metadata extraction. With less guesswork involved, the extraction process can be made more efficient. Therefore it is also useful to know which metadata can be embedded in which file formats.

A report by Polfreman and Rajbhandari was published last week in the JISC information environment repository. The extensive report looks at methods and tools for automated metadata generation, mainly from the angle of generating Dublin Core metadata for institutional repositories.

Polfreman, Malcolm, and Shrija Rajbhandari (2008), MetaTools - Investigating Metadata Generation Tools, JISC, London, United Kingdom. [online] Available from: http://ie-repository.jisc.ac.uk/258/



Robert has already looked at the tool offered by Yahoo! and considers it potentially useful for Stratigraphy.net. We'll keep you posted on any progress we make with automated metadata extraction.

Wednesday, November 26, 2008

Geoblogosphere aggregator

I have started a new experiment to test my ageparser tool: I will try to determine the 'chronostratigraphic pulse' of the geoblogosphere which I intend to publish monthly.

As a starting point I have created a geoblogosphere aggregator, which collects titles, URLs etc. from the RSS feeds of as many geoblogs as possible, basically those blogs listed here in geoberg.de. Check it out.. new blogs can be entered by visitors themselves. The next step will be to add stratigraphic information, which will be extracted automatically from the blog content. If you are interested in how this works, visit my agesearch toy..

Currently listed blogs are:

Wednesday, November 12, 2008

Neogene Chaos

The Tertiary has been gone since 2003, but the discussion on the stratigraphic classification of the Neogene is still ongoing. For example, the International Commission on Stratigraphy decided to ban the Quaternary from the official time scale, but geologists have successfully fought to keep this system (see for example this document). Both the Quaternary and the Tertiary were regarded as 'remnants of the Neptunist concept of stratigraphy'. Strange, as the 'Anthropocene' was proposed in 2007 by the Stratigraphic Commission of the Geological Society (UK), which also caused some blog echo, for example here and here.

These 'outs' and 'ins' are hard to follow, therefore the International Commission on Stratigraphy quite frequently provides updates on their stratigraphic chart at http://www.stratigraphy.org/cheu.pdf. Every time I visit this page this 'standard chart' looks different, and unfortunately the stratigraphy.org page does not archive older versions of the chart for comparison.
Fortunately there is the Internet Archive which never forgets! Just visit http://web.archive.org/web/*/http://www.stratigraphy.org/cheu.pdf and you can download older versions of the stratigraphic chart back to 2004: a 'sequence of stratigraphic charts'.
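If you want to step through the archived charts year by year, the Wayback Machine URL scheme can be scripted. The following is only a minimal Python sketch: the helper name `wayback_url` is made up for this illustration, and it assumes that the archive resolves a date-stamped URL to the capture closest to that date, which is how it usually behaves.

```python
# Chart URL as given in the post.
CHART_URL = "http://www.stratigraphy.org/cheu.pdf"

def wayback_url(url, year, month=1):
    """Build a Wayback Machine URL for the first day of the given month.

    Hypothetical helper: the archive is assumed to redirect to the
    capture nearest to the requested date.
    """
    timestamp = f"{year:04d}{month:02d}01"
    return f"http://web.archive.org/web/{timestamp}/{url}"

# One candidate URL per year, 2004 to 2008.
snapshots = [wayback_url(CHART_URL, year) for year in range(2004, 2009)]

for link in snapshots:
    print(link)
```

Opening these links one after the other gives roughly the 'sequence of stratigraphic charts' described here, as far as captures exist for those dates.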

Are you an old-fashioned geologist who still keeps using terms like 'Tertiary'? You are not alone ;) Indeed, the Commission itself was not really sure how to deal with these Neptunistic concepts even after their 2003 decision. A closer look at these old charts quite nicely illustrates the discussion which followed:

  1. 2004 The Tertiary and the Quaternary ... both gone ...

  2. December 2005: The Quaternary is back! looks strange, but it's there..
    The * is a placeholder for the following footnotes:
    • until April, 2006: 'Proposed by ICS'
    • from October 2006 on: 'Formal chronostratigraphic unit sensu joint ICS-INQUA taskforce (2005) and ICS.'

  3. October 2006: The Tertiary is back!
    Footnote:' Informal chronostratigraphic unit sensu Aubry et al. (2005, Episodes 28/2).'



  4. September 2007: Tertiary off again.. and a strange line from Quaternary to the base of Gelasian..
    Footnote: The status of the Quaternary is not yet decided.




  5. Current version (Nov. 2008): Quaternary has two potential bases?



What comes next? The International Commission on Stratigraphy provides an interesting pdf which is named: 'A Proposal for Simplifying the International Geological Time Scale Chart' ... and this is the simplification for the Cenozoic:

No comment...

Friday, November 7, 2008

CollectConcept

Today I 'exhumed' one of my older projects which I hadn't touched for weeks: CollectConcept. This post shall remind me in the future to proceed with it and finish it soon ;)
CollectConcept is an online collection management tool and initially was an exercise to play with museum standards and OAI-PMH. It is now used for the management of the objects of the 'Deichmuseum Dorum', but it is still in an early beta phase...

Thursday, October 30, 2008

Biblical genesis most likely of Ordovician age


One of the reasons why Agesearch is still alpha is that it sometimes returns false positives e.g. when stratigraphic terms are ambiguous such as 'Canadian'.
I am testing this by feeding Agesearch with terms which should have no chronostratigraphic context, such as 'beer' or 'pizza' etc.

After it surprisingly turned out that beer does have a stratigraphic context, I formulated some more provocative queries.
Definitely the funniest result was Ordovician as the best stratigraphic context for the query 'bible genesis'.

Ha! In stratigraphic terms this would indeed be a good compromise between the creationists' and the scientists' points of view ;).

Wednesday, October 22, 2008

Status of the Geoblogosphere

Here is a really good analysis on the current status of the geoblogosphere, an initiative of the NOVA geoblog.
Interestingly, the majority of geo bloggers are students, followed by faculty staff and industry employees. And if you take the numbers on slide 27 and weigh bloggers with a 'professional background' against the rest, you get a ratio of approx. 1:1!

What does this mean for science communication in the age of web 2.0? At least it shows that not only paleontologists are late adopters ;)

P.S.: A very good compilation of geoblogs can be found at geoberg.de which offers a categorized list of geoblogs.

Monday, October 20, 2008

Visiting OneGeology


After reading this enthusiastic blog on OneGeology I was curious how cool OneGeology really is and tried to play with it a bit.

The first thing I got from OneGeology was a "Your browser is not supported" message: "This application is optimized for IE6, IE7, Flock 1.2 and Firefox 2".
Good to know... but not too cool: I am using Firefox 3. So I had to switch to Internet Explorer and a nice Google-Maps like satellite image was shown by their browser plug-in after I entered their 'portal'.

My plan was to get a geological map of the area around Moixent (Valencia, Spain) where I did the field work for my diploma thesis. I zoomed to Spain to get an overview of the geology of the prebetic mountains. Still no geological map appeared. It took me a while to discover that (and where) I had to add additional layers. Finally, a nice geological map appeared ... but.. now the satellite map layer disappeared and I had no idea where to zoom next. Here I stopped playing with OneGeology.

I would say OneGeology still is a cool project; I know how hard it can be to bring people together to share data. The list of contributing parties is impressive and really a big success for OneGeology.

However, for field geologists (and others) the tool is far from being useful in its current state. The map is much too small, a legend is missing, there are few export options and the overall usability could be improved.

Their press release states that OneGeology aims to do "the same for the rocks beneath our feet that Google does for maps of the Earth's surface". There is still a lot of homework to do to reach this goal.

Saturday, October 11, 2008

Eocene beer


Finally I found something on the web which indeed relates geology and beer and is NOT about drinking;)

This blog on Geology News reports about beer which is brewed by 'Fossil Fuel Brewing Co.'. The company produces beer with yeast which was extracted from the gut of an Eocene insect found in a piece of fossil amber.

Sunday, October 5, 2008

Stratigraphy for web searches: Agesearch


This is a screenshot of Snet's latest toy: Agesearch. It is a demonstration mashup of basically two web services, the Google API as well as our own Ageparser REST service. I will post more technological details about all this soon.

Agesearch is a scientific web search engine which estimates the "stratigraphy" of web search results. It scans the pages found for names of stratigraphic units, e.g. litho-, chrono- or biostratigraphic zones, and displays the best matching stratigraphic context of the search term.

It can be useful if you need a quick estimate of the stratigraphic context of e.g. a geographic region or a distinct fossil name; try e.g. this search for Morozovella angulata. You can also run some fun queries, e.g. enter the name of one of your colleagues to get an overview of their stratigraphic expertise.

Tuesday, September 23, 2008

Chronotagging vs. agetagging

The new version of ageparser is almost ready to be released, so I was thinking about which term would be appropriate to describe what ageparser does.

Ageparser V.2 is going to provide a REST webservice which will analyse strings or webpages for stratigraphically relevant terms and return the weighted results (the identified stratigraphical units and their chronostratigraphic position) as JSON or XML. So basically, the service returns a list of stratigraphic terms which can be used to describe or tag the content of a document. Therefore I thought something like 'agetagging' or 'chronotagging' perfectly describes what ageparser does.
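To make the idea of 'agetagging' a bit more concrete, here is a minimal Python sketch of the principle. It is not the ageparser code: the tiny term list, the function name `agetag` and the weighting by relative frequency are assumptions chosen for illustration only, and the chronostratigraphic position of each unit is left out.

```python
import json
import re
from collections import Counter

# Tiny illustrative vocabulary (assumption); a real service would rely on
# a full lexicon of litho-, chrono- and biostratigraphic names.
TERMS = ["Ordovician", "Eocene", "Cretaceous", "Neogene", "Quaternary"]
TERM_PATTERN = re.compile(r"\b(" + "|".join(TERMS) + r")\b", re.IGNORECASE)

def agetag(text):
    """Return the stratigraphic terms found in text, weighted by frequency."""
    counts = Counter(hit.capitalize() for hit in TERM_PATTERN.findall(text))
    total = sum(counts.values())
    return [
        {"term": term, "weight": count / total}
        for term, count in counts.most_common()
    ]

sample = "Eocene insects from the Eocene maar, with one Cretaceous comparison."
tags = agetag(sample)
print(json.dumps(tags, indent=2))
```

The resulting list of terms and weights is the kind of structure that could be serialised as JSON or XML and attached to a document as tags.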

After googling for those terms I found that 'chronotagging' is already used when a page is tagged with a specific date. But 'agetagging' seems to be a term which does not yet have another meaning, so I probably will use 'agetagging'..

Wednesday, September 17, 2008

Is Beer bad for Science?

In this blog we had looked at some social factors in science already. Geologists have the reputation for really liking beer. But what does it do to science? Read more at FREAK Shots.

Monday, September 1, 2008

Stratigraphy.net internals on public radio

Last week Thursday (28 August 2008, 1930h), "Stratigraphy.net internals" was featured briefly on Deutschlandradio Kultur. In their series "Forschung und Gesellschaft" (Research and Society) the broadcast "Unter dem Wikiskop" (Under the Wikiscope) by Jana Wuttke looks at how digital networks revolutionise the way scientific research operates. "Stratigraphy.net internals" is mentioned as one of the few examples of science blogs from Germany.

I am sure there must be more science blogs out there. We will be on the lookout for more of them.

Monday, August 25, 2008

International Geo Sample Numbers (IGSN) in publications


I came across another interesting article by Rod Page. He reports on his attempt to use a regular expression to find Genbank identifiers in full texts. His regular expression worked for Genbank identifiers but surprisingly also matched UTM coordinates, and so gave some false positives.

At first sight, the identifiers he found looked very similar to those used by the International Geo Sample Numbers (IGSN) project, which aims to resolve ambiguities in sample naming.

IGSNs are assigned to samples (on request) by a central registry (SESAR) which takes care that these identifiers are unambiguous. By definition, an IGSN is a 9 character identifier where the first 3 characters stand for the institution responsible for the sample and the remaining 6 characters for the sample itself, for example HRV0002Y4.

So can such mismatches as reported by Rod also occur if I would search for IGSNs in scientific articles?
To test whether other identifier systems use the same pattern, I turned to the cool exalead search engine, which allows regular expressions in web searches. The regular expression which would match such identifiers is [A-Z]{3}[0-9]{6}.
And indeed, the first match exalead returned leads to data from the Interpro protein database, which uses the pattern IPRxxxxxx for its accession numbers. Good that SESAR has not yet assigned the prefix IPR ;) However, the example shows that theoretically IGSNs can be ambiguous.

Today IGSNs are mostly used in geoscientific sample (core) repositories and there they truly are unambiguous. However, most probably these identifiers will also be used in scientific publications. As molecular biological methods are sometimes used in paleontology and paleoclimatology studies, it is not completely unlikely that such accession numbers are used in publications together with IGSNs. Trouble for geoinformatics text mining applications;)

A simple solution for this dilemma would be to recommend that authors cite an IGSN as IGSN:ABC012345, i.e. the prefix IGSN: followed by the 9 character identifier. This is already the way they are displayed on the bar code labels SESAR provides.
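The effect of the prefix can be tried out with a few lines of Python. This is only a sketch: the sample sentence is invented, and the second pattern is an assumption that also allows capital letters in the sample part, because the example HRV0002Y4 contains one.

```python
import re

# Bare pattern from the post: three capital letters followed by six digits.
BARE = re.compile(r"\b[A-Z]{3}[0-9]{6}\b")

# Prefixed form; letters are allowed in the sample part (assumption,
# motivated by the example identifier HRV0002Y4).
PREFIXED = re.compile(r"\bIGSN:([A-Z]{3}[A-Z0-9]{6})\b")

# Invented sentence mixing two IGSNs with an Interpro accession number.
text = ("Sample IGSN:HRV0002Y4 was compared with IGSN:ABC012345; "
        "the protein family is IPR000001.")

bare_hits = BARE.findall(text)          # also catches the Interpro number
prefixed_hits = PREFIXED.findall(text)  # only the identifiers cited as IGSN

print(bare_hits)
print(prefixed_hits)
```

The bare pattern cannot tell an IGSN from an Interpro accession number (and misses identifiers with letters in the sample part), while the prefixed pattern returns only what was explicitly cited as an IGSN.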

Sunday, August 24, 2008

Biodiversity informatics session at EGU 2009

I just discovered the provisional programme for the EGU 2009 ESSI (Earth and Space Science Informatics) sessions. The session topics really sound interesting... and surprise... : for the first time there will be a biodiversity informatics session.

Seems as if things will grow together and, hey, is this the beginning of a biogeoinformatics community? ;)

Friday, August 22, 2008

Disinforming Google Street View


After the first Google cars appeared here in my home town Bremen, Germany, many people have been concerned about Google's Street View activities. But apparently the legal situation does not allow anyone to stop Google from taking pictures of every corner of the city.

I personally was very amused to see the Google car in my street just before I finished painting our house;)

So what can you do to protect your privacy at least a bit when Google comes? By camouflage and disinformation;)

Friday, August 15, 2008

Citation parsing

The next version of ageparser will extensively use regular expressions to identify stratigraphic terms. While working on this, I also played with some regular expressions which are useful to identify citations within a scientific document and to parse authors and year of these citations. I assume this is a quite common task for some of you, so maybe you find some of the following expressions useful for your own code:


Pattern for common person names:

$personpat="(([Bb][Ee][Nn]\s|[Dd][Ee]\s[Ll][Aa]\s|P'|[Dd]'|[vV][Aa][Nn]\s|[vV][Oo][Nn]\s|[dD][eE][lL]?\s|[dD][iI]\s)?[ÄÖÜA-Z]{1}[A-ZÄÖÜÒÓÀÉÈóòäöüàéèâa-z-]{1,})";

Pattern for authors:

$citpersonpat=$personpat."{1}(,\s".$personpat.",?)?((\sand\s".$personpat.")|((,\s".$personpat.",?)?\set al(\.|[i]{2})))?";

And two patterns for citations:

$citpattern1="[\s\.]{1}(".$citpersonpat.")\s\([0-9]{4}[a-z]?\)";
$citpattern2="[\s\.]{1}[\(,;]?(".$citpersonpat.")[,;]?\s[0-9]{4}[a-z]?[\),;\.\s]";
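
For those who prefer to experiment in Python, here is a simplified sketch of the same idea. It is not a one-to-one port of the expressions above: the particle list is shortened, only "and" and "et al." are handled, and all names (`PARTICLE`, `find_citations`, the sample sentence) are made up for this illustration.

```python
import re

# Optional name particle (von, van, de la, ...) and a capitalised surname.
PARTICLE = r"(?:(?:[Vv]on|[Vv]an|[Dd]e\s[Ll]a|[Dd]el?|[Dd]i|[Bb]en)\s|[Dd]')?"
NAME = r"[A-ZÄÖÜ][A-Za-zÄÖÜäöüàéèâóò-]+"
PERSON = PARTICLE + NAME

# One author, two authors joined by "and", or a first author plus "et al."
AUTHORS = rf"{PERSON}(?:\sand\s{PERSON}|\set\sal\.)?"

# Narrative form "Keller (1997)" and parenthetical form "(Lipps, 1997)".
NARRATIVE = re.compile(rf"({AUTHORS})\s\((\d{{4}}[a-z]?)\)")
PARENTHETICAL = re.compile(rf"({AUTHORS}),?\s(\d{{4}}[a-z]?)(?=[),;.\s])")

def find_citations(text):
    """Return (authors, year) pairs for both citation styles."""
    hits = []
    for pattern in (NARRATIVE, PARENTHETICAL):
        hits += [(m.group(1), m.group(2)) for m in pattern.finditer(text)]
    return hits

sample = ("As shown by Keller (1997) and later by von Rad et al. (2003b), "
          "results differ (Lipps, 1997; Hägele and Rott 2006).")
citations = find_citations(sample)
for authors, year in citations:
    print(authors, year)
```

Like the original expressions, this will produce false positives on capitalised words that happen to be followed by a four digit number, so the hits are candidates rather than verified citations.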

Tuesday, July 29, 2008

Crucified by Cuil


I just visited cuil (self description: the 'world's biggest search engine') and searched for TaxonConcept to see if the site is already indexed.
Besides some TDWG wiki entries on the TaxonConcept Scheme, cuil showed links to some paleonet newsgroup posts where I mentioned TaxonConcept. No link to TaxonConcept or Snet, which is a bit disappointing but not surprising for a search engine start-up.

The real surprise, however, was the thumbnail which cuil displayed (for whatever reason) beneath the link: a crucifixion scene! Hmm.. does this mean something?

Friday, July 18, 2008

Analysis of Author Networks in Wikipedia

The Social Sciences Department of the J.W. Goethe University Frankfurt sent out a press release on idw about a nice piece of work by Christian Stegbauer, a professor of social science at this department.

In his work, Stegbauer analysed the network of Wikipedia authors and their contributions to Wikipedia topic discussions on philosophy. The network analysis - or graph analysis - showed the social network of Wikipedia authors, how they interact and how they are connected through common topics.

In their graph analysis Stegbauer et al. used an approach very similar to TaxonRank. Of course, the same questions that Stegbauer asked about Wikipedia author networks could also be applied to taxonomists.

Thursday, July 17, 2008

Geology-related Reality TV

In his Arizona geologist blog Lee Allison featured a reality TV show on drilling for oil. "Black Gold - 2 Miles Deep or 6 Feet Under". Sounds dramatic, doesn't it? Is there any geology footage on YouTube? I haven't checked for that yet.

Wednesday, July 16, 2008

Fearless geologists

Stefan just sent me a scanned page of the Johannesburg Star containing a very funny story about a (fictitious) 'Survivor'-like TV show: some geologists were sent to a very active volcano, and the winner would be the last 'hard-core' geologist remaining. I also found some PDF scans of this article in Chris Rowan's blog article Surviver: Geologists, where you can read the whole story.

Regarding fearless geologists I also found MJC Rocks' article A Carnival of Death-Defying Geologists ....

And there is this article on Uncyclopedia on geologists, which is also very funny and worth reading ;)

Well, and I simply could not resist posting this picture showing 5 young fearless geologists jumping from a very high dune in central Iran - also sent to me by Stefan, who saved my day;)

Tuesday, July 8, 2008

Geoparser

Today I was scanning the web for tools which are able to scan documents and identify locations or coordinates (which we'll need to reach our ultimate goal of a 4D (space and time) index and search engine ;) ) and found Rod Page's interesting article: iPhylo: From PDFs to Google Earth.
He offers an online service, probably based on some regular expressions, which is able to extract coordinates from pdf files and returns KML or JSON files. A simple and pragmatic approach. Cool!
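
Just to illustrate how far a plain regular expression can get, here is a small Python sketch. It is my own guess at the general approach, not Rod Page's implementation: the pattern only covers degrees with optional minutes plus a hemisphere letter, and the coordinates in the sample sentence are illustrative values.

```python
import json
import re

# Degrees, optional minutes and a hemisphere letter, e.g. 38°52'N or 12.5°E.
COORD = re.compile(
    r"(\d{1,3}(?:\.\d+)?)°\s?(?:(\d{1,2}(?:\.\d+)?)['′]\s?)?([NSEW])"
)

def to_decimal(degrees, minutes, hemisphere):
    """Convert degrees/minutes to signed decimal degrees."""
    value = float(degrees) + (float(minutes) / 60 if minutes else 0.0)
    return -value if hemisphere in "SW" else value

def extract_points(text):
    """Pair each latitude (N/S) with the longitude (E/W) that follows it."""
    points = []
    latitude = None
    for degrees, minutes, hemisphere in COORD.findall(text):
        value = to_decimal(degrees, minutes, hemisphere)
        if hemisphere in "NS":
            latitude = value
        elif latitude is not None:
            points.append({"lat": latitude, "lon": value})
            latitude = None
    return points

sample = "Samples were taken near Moixent (38°52'N, 0°48'W) and at 47.5°N 12.5°E."
points = extract_points(sample)
print(json.dumps(points))
```

The list of points could then be written out as JSON, as here, or wrapped into KML placemarks for Google Earth.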

I also found some geoparser tools which are able to identify location names in texts. The most interesting is Metacarta's geoparser API, which seems to give good results. Metacarta's web pages offer some impressive examples of how this API can be used.
Another geoparser is DIGMAP's text mining service, which returns an OGC-compliant XML file containing all found (not only geographic) features.
And there is MEDINA's geoxwalk, which seems to be restricted to the British Isles. However, I could not test this tool: the mentioned site only offers a screenshot and some pdf documents on this tool.

Metacarta's geoparser seems to be the most advanced solution; however, it is a service offered by a commercial company, and unfortunately their 'terms of use' page returns a 404 error. Most probably they will not offer this service for free.

I wonder if it would be possible to create something similar to agenames which identifies location names and returns coordinate pairs. Maybe based on the geonames gazetteer?

Friday, July 4, 2008

The new Stratigraphy.net pages

Today, I released the new Stratigraphy.net pages which are a real improvement in comparison to the old Snet homepage. The site structure is simpler and much easier to navigate. It now contains a data portal which allows searches on Snet data from our subprojects TaxonConcept and Agenames. I already wrote some comments on the technical details of the new data portal which is based on the OAI-PMH (Open Archives) protocol.

Interestingly, the Snet data page allows even more flexible queries than the original search pages of TaxonConcept or Agenames. For example authors of taxa or localities of stratigraphic units can be used in queries while TaxonConcept and Agenames only allow queries on the taxon name or name of a stratigraphic unit respectively.
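
For readers who have not met OAI-PMH yet: a harvester simply sends HTTP requests with a 'verb' and a few arguments. The following Python sketch builds such a request URL. The verb `ListRecords` and the arguments `metadataPrefix` and `set` are part of the OAI-PMH protocol, but the endpoint URL and the set name used here are placeholders, not the real Snet addresses.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); not the actual Snet data portal URL.
BASE_URL = "http://example.org/oai"

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Hypothetical set name, e.g. one set per subproject.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

request_url = list_records_url(BASE_URL, set_spec="taxonconcept")
print(request_url)
```

The response to such a request is an XML document with one record per item, here in Dublin Core (`oai_dc`), which is what a portal can index and search.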

Saturday, June 21, 2008

Back to the roots - Chiemgau impact

This is a bit off topic, but since I am on holiday...:
The Chiemgau impact is a controversially discussed, postulated meteorite impact, located in the heart of my Bavarian home, the Chiemgau. The change log for the German Wikipedia entry has more than 500 (!) pages and shows the quality of the controversy.
One of the pro arguments was the discovery of some strange, fingerprint-like surfaces on stones collected in the Chiemsee, which were identified by the impact advocates as 'regmaglypts' (surface ablation structures produced by partial melting of the meteorite surface when it passes through the atmosphere).
I found this explanation very unlikely, as regmaglypts are extremely rare and such ornamented stones are very common in this lake. The carbonate-rich waters of the Chiemsee favor calcareous algae such as Charophyta (v) or Cyanobacteria. The bed of the river Alz, which originates in the Chiemsee, is covered near Seebruck by recent calcareous oncoids (e.g. Rott, 1991, Hägele et al., 2006). Further, similar structures are quite common and well known at the Bodensee, the Attersee and other Alpine lakes, where they are known as 'Furchensteine'.
Therefore a biogenic origin of those 'regmaglypts' seems more likely, most probably produced by endolithic cyanobacteria. I reported my suspicion, and my hint was frankly published on their homepage ... but they didn't really believe it.
I am currently visiting my home village, and last rainy Tuesday I had the opportunity to visit the Chiemsee and could collect and observe a large number of these Furchensteine near Chieming. Here they are really very common; I could count up to 40 of them per square meter.
I found dry specimens on the lake beach as well as some fresh, in-situ specimens in the lake itself, covered by water. Those Furchensteine which are covered by water show a cauliflower-like structure, made up of some kind of organic coating, most probably algae. After scratching off this coating, the typical fingerprint structures appear.
For me there is no doubt that these structures are biogenic. If you want to see more pictures, I have uploaded some here.

Tuesday, June 17, 2008

What LSIDs are good for

Recently, I found David Shorthouse's blog post on LSIDs (Life Science Identifiers) where he reports that apparently several LSIDs from different data sources exist for a distinct (spider) taxon. His intended use for LSIDs was to use them instead of the real taxon names and to "confidently link names with other sources of information such as information about the type specimens, gene sequences, synonyms, specimens etc". But for this purpose he concluded LSIDs are useless without a centralized identifier registry which ensures that a taxon name has only one LSID.

Well, replacing taxon names by unique identifiers is of course not the most obvious usage of LSIDs. LSIDs are only useful to persistently link to an electronic resource and are not suitable to link to an abstract entity such as a taxon name. Instead, LSIDs link to a set of metadata, which contains the information which was considered to be useful by an LSID authority. Thus, LSIDs for taxon names link to nothing more than the electronic resource (metadata, data sheet) which represents the LSID authority's concept of this taxon (btw.: a very interesting article on the whole complex of taxon names, identifiers, authorities, 'real taxonomists' and 'name users' is the Nature paper written by Nimis (2001): A tale from Bioutopia.).

LSIDs are however extremely useful for taxon names when they link to an electronic resource which serves as the authoritative record for this name. Especially LSIDs for newly assigned names which are officially registered by the ICZN in ZOOBANK can serve as a citation for the name. This is similar to the usage of DOIs for scientific primary data. R. Pyle, for example, linked his newly assigned fish names to ZOOBANK records by LSIDs in his recently published paper. An impressive demonstration of how useful LSIDs for taxon names can be.

Thursday, May 29, 2008

Geoinformatics 2008 Programme On-line

The meeting programme for Geoinformatics 2008 is now on-line at <http://gi2008.gfz-potsdam.de/index.php?id=1037>

Jens wrote in his announcement email today: 'The meeting provides an international forum for researchers and educators from earth and planetary sciences, and information technology/computer science to present new data, data analysis or modelling techniques, visualization schemes or technologies as they relate to developing the cyberinfrastructure for the geosciences.'

There will be many interesting talks for those who are interested in SSP (Sedimentology, Stratigraphy and Paleontology) geoinformatics, among others the keynote on OGC standards by G.Perceval, the talk on GeoSciNet by W. Snyder et al. and the presentation of OneGeology by I. Jackson.

And there is another very interesting talk: Neptune - Developing a Digital Information Infrastructure for Micropaleontology in the 21st Century presented by Lazarus D. et al. Good to hear that the flagship of CHRONOS will be continued!

Tuesday, May 20, 2008

Salisbury Crags

You may have noticed this blog's new look ;) I have adapted the design of the blog to the look of the upcoming new version of Snet.
The image is taken from the classic publication 'Essay on the geology of the Lothians' written by Robert J. H. Cunningham in 1838.
It's a colored drawing of the famous 'Hutton Section' at Salisbury Crags, Scotland, where a prominent irregular junction between sandstone and dolerite can be admired. The discovery of this process (a magmatic intrusion) was very important for Hutton's theory of rock formations.
I don't know who the illustrated geologist is, but this wondering man just perfectly illustrates how science should work...

Friday, May 9, 2008

AGU data position statement - YACOA

In the recent volume of EOS (Vol 89(16)), AGU invites comments on its data position statement, whose earliest version dates back to 1997. Needless to say, such statements are a good thing, and many other organisations have also provided their statements on open access to scientific information.

However, I found it strange to read the new draft without finding a comment on the role of the societies themselves.
Those who work in scientific data management know how difficult it is to motivate researchers to submit their data to archives. Submitting data to data centers is voluntary for researchers. To my knowledge, there is no funding agency which obliges researchers to submit their primary data to data archives, nor is there a publisher of geo-journals which asks for links to the archived primary data on which the results of an article are based.

Now, AGU is the publisher of about 20 high impact journals and it is astonishing but true that AGU publishing itself does not provide a real data policy which, for example, cares about primary data! This is strange, and I really think AGU should consider its own responsibility and its potential to act as an exemplar when calling for data policies.
Without a true data policy of its own, one which includes clear guidelines and rules for data handling within the responsibility of AGU, such a statement is a paper tiger, a YACOA: yet another call for open access.

Monday, May 5, 2008

We're living in a georeferenced world

In an editorial in its current issue (1 May 2008), Nature calls for georeferencing of research data. Guessing locations from place-names can only be a workaround. With GPS technology at hand it should be so easy to record the time and the place where a sample or specimen was taken.

"Gene sequence and structure databases have flourished in part because journals require authors to submit published data to them. It is worth considering a similar requirement that all samples in a published study be registered, along with GPS coordinates, in online databases such as the Global Biodiversity Information Facility. At the same time, it would behove spatial scientists to articulate to the broader research community the potential of recording and making accessible spatial data in the appropriate formats — and the painlessness of the process."

Hopefully, scientists will listen to Nature.

Friday, April 25, 2008

Agenames Vienna presentation

The slides of the talk (Google docs version) Jens gave on Agenames during the EGU 2008 are available here:

http://docs.google.com/Presentation?id=dfx3hqfg_7gftng9dm

Thursday, April 24, 2008

Technological Twists on Taxonomy

Browsing through Nature, I came across a book review by Kevin Kelly on "Systematics as Cyberscience: Computers, Change, and Continuity in Science" by Christine Hine. In his review Kelly takes a look at why taxonomy has been so slow to adopt the new tools provided by information technology.

"Taxonomy, the science of identification and classification of new species, has been one of the slowest disciplines to adopt computers. When most other scientists routinely use these number crunchers to detect patterns within large sets of data, why have taxonomists only recently started to use them?

The reasons are many. Foremost has been the subtle variation among closely related species, which makes quantification of their traits difficult. No computer program can outdo the highly refined judgements of a taxonomic expert who can classify from nuanced alterations even the smallest organism. Consequently, new species are identified and described in a manner that would have been familiar to Charles Darwin 150 years ago.

Second, much taxonomic information has been, and remains, parochial. The expertise required for classifying fly parasites has little in common with that for fungal species or whales. Taxonomic information occupies niches — niche being the exact biological term for these narrow confines. Specialized niches of information with their own protocols challenge computerization.

Third, the low priority given to taxonomy has meant it is perennially underfunded. High-powered computation and software come low on the list after the meagre needs of traditional taxonomy are (barely) met.

Despite these hurdles, the related field of systematics (exploring relationships between organisms over time) is rapidly transforming itself as computation becomes integral. In Systematics as Cyberscience, sociologist Christine Hine investigates the effects of computers and communication technology on the taxonomic community."

Read more ... (Sorry, no Open Access).

Friday, April 18, 2008

A preview of the new Stratigraphy.net look

For quite a while we have been very unhappy with the old Snet homepage. The content is completely outdated and simply no longer reflects or promotes our current activities. Further, after more than 5 years we found it was really time to reconsider the overall look of the homepage ;)
The old version was based on the CMS Contenido, which is nice, but developing modules for such a system, e.g. a search interface allowing direct access to Snet data, is a pain. Further, Contenido is now popular enough to attract hackers, which forces us webmasters to carefully watch the latest vulnerability reports and thus to frequently install security fixes.

Therefore I decided to completely redesign the Snet homepage. The new version will be home-brewed, slimmer but more informative. It will concentrate more on content, Snet data, services and news.

To keep maintenance costs as low as possible, Snet 2 will support major standard protocols to ingest content from various sources. For example, we will use OAI-PMH to collect data from Agenames, TaxonConcept (and possibly other third-party sources). It will also use RSS feeds to include news from all Snet projects as well as from this blog.
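As a rough illustration of what harvesting via OAI-PMH involves, the following Python sketch builds a ListRecords request and pulls the Dublin Core titles out of a response. It is only a sketch under stated assumptions: the base URL is a placeholder, the sample response is hand-written and heavily shortened, and the function names are made up for this example. An actual harvester would fetch the URL (e.g. with urllib.request.urlopen) and would also have to follow resumption tokens.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of unqualified Dublin Core as used in oai_dc records.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request for the given repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(response_xml):
    """Return the text of every dc:title element in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [el.text.strip() for el in root.iterfind(".//dc:title", NS) if el.text]


# Hand-written, shortened response used only to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Ammergauer Schichten</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Placeholder base URL, not an actual Snet endpoint.
print(list_records_url("http://example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```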

More details on the new Snet architecture will be published here soon. For now, I would like to invite those interested to visit the first beta of Snet2 here:
http://www.stratigraphy.net/Snet2.

Wednesday, April 2, 2008

3D Scanners for Paleontology

Browsing through the pages of the Arts and Humanities E-Science Support Centre, I came across a really cool project: a 3D colour scanner.

This is what their short write-up states: "

E-Curator : 3D colour scans for remote object identification and assessment

This project will use University College London's collections and state of the art 3D colour scanner, which can revolutionize the traditional methods in museums and archives based on text and images. The project envisions to use 3D recording to describe artefacts as a whole. This method will offer yet unknown details and insights into the object's structure. Such 3D scans could then help with the identification of degraded surfaces. They would allow comparisons of whole three-dimensional objects. As a proof-of-concept, six artefacts will be 3D-scanned and stored at UCL and federated sites."

Wouldn't this be a really useful application for sharing paleontological collections without actually having to move anything physically?

Is anybody using this technology already in paleontology? I am curious to find out more.

Monday, March 31, 2008

Parataxonomy vs. taxonomy

I just read Krell's (2004) interesting article about parataxonomy, an organism sorting method based on the identification of so-called recognizable taxonomic units (RTU) instead of 'real' species.

Simplification of taxonomy is also very common in paleontology (-related) investigations.


Yes, I confess: I also submitted studies without clarifying my taxonomic concepts. To repent, I enclose a jpg of what I formerly regarded as Neogloboquadrina pachyderma Ehrenberg 1861.
As far as I remember from my former life as a 'foram picker', the proper use of taxonomy is quite uncommon in e.g. paleoclimate studies. A large amount of data based on organism counts with a very unclear taxonomic basis therefore already exists (which was a motivation for TaxonRank).

After reading Krell I wondered how good the taxonomic documentation in the current literature of my former discipline actually is. Therefore, I decided to perform a quick test on the 2007 volumes of Marine Micropaleontology. I scanned 40 articles (surely not enough, but I just wanted to get a quick impression) and the result more or less confirmed my bad expectations.

Only 2 articles included a complete systematics section in which the species concepts were described, including synonymy lists. At least 15 articles provided a species list in the appendix; 6 of these lists cited the original reference for the treated taxon. 1 article used DNA analysis.
However, the majority of articles (22) used species names but did not include any taxonomic documentation, which was a bit surprising to me. Three of these articles included electronic supplements with species lists, unavailable to readers of the printed version. Nine of these articles at least included one or more references as a taxonomic key.

But still, more than 25% of all scanned publications used species names but did not document their taxonomic concepts at all! So there surely is a lack of proper methodological documentation.
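For those who want to check the arithmetic, here is a small Python tally of the counts given above. It assumes that the four main categories do not overlap and that the articles with electronic supplements and those with key references are distinct; if some articles fall into both groups, the undocumented share would be higher than this lower bound.

```python
# Counts as reported in the post for the 40 scanned articles.
total = 40
complete_systematics = 2
species_list_appendix = 15
dna_analysis = 1
names_only = 22

# Assumption: the four categories are mutually exclusive and cover all articles.
assert complete_systematics + species_list_appendix + dna_analysis + names_only == total

# Within the 'names only' group, some articles offered at least a little help.
with_supplement = 3
with_key_reference = 9

# Assumption: these two subgroups do not overlap, so this is a lower bound.
undocumented = names_only - with_supplement - with_key_reference
share = 100 * undocumented / total

print(f"{undocumented} of {total} articles ({share:.0f}%) document nothing at all")
```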

Reference:

Krell (2004): Parataxonomy vs. taxonomy in biodiversity studies - pitfalls and applicability of 'morphospecies' sorting. Biodiversity and Conservation Vol. 13, p. 795-812.

Friday, March 28, 2008

Introducing Agenames

We are proud to announce a first 'alpha' release of 'Agenames', a new Stratigraphy.Net project we started last year.

Agenames is very much inspired by the geonames initiative which collects and publishes geographical names and their coordinates. In analogy, Agenames aims to collect stratigraphic terms in relation to their chronostratigraphic position (relative age).
So if you need to find out what e.g. the 'Ammergauer Schichten' or the 'Black Donald Pluton' means, Agenames might help.

But Agenames will offer more! We have started some first experiments with Ageparser, a text parsing tool which uses the Agenames index and is able to scan documents for stratigraphic terms and thus identify the stratigraphic context of a publication.
We will use Ageparser for stratigraphic indexing of geoscientific documents, with the ultimate goal of a 4D (space and time) search engine ... some day ...
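The basic idea of scanning a text against a term index can be sketched in a few lines of Python. This is not the Ageparser code, just a toy illustration: the index holds one entry taken from this blog ('Ammergauer Schichten' in the Malm) plus one clearly invented placeholder, and the function name is made up.

```python
import re

# Toy index: stratigraphic term -> chronostratigraphic position.
# The second entry is a hypothetical placeholder, not a real unit.
AGE_INDEX = {
    "Ammergauer Schichten": "Malm",
    "Example Formation": "Hypothetical Stage",
}


def find_stratigraphic_terms(text, index=AGE_INDEX):
    """Return (position, term, age) for every index term found in the text."""
    hits = []
    for term, age in index.items():
        # Whole-word, case-insensitive match of the literal term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), term, age))
    return sorted(hits)


sample = "Samples were taken from the Ammergauer Schichten in the Ammergebirge."
for position, term, age in find_stratigraphic_terms(sample):
    print(position, term, age)
```

A real index would of course need to deal with spelling variants, abbreviations and terms in several languages, which is where most of the work lies.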

To learn more, visit the Agenames homepage at http://agenames.stratigraphy.net or come to the EGU 2008 in Vienna, where Jens will give a talk on Agenames on Wednesday, 16 April 2008.

Thursday, March 27, 2008

The end of the sandbox

Originally, we provided TaxonConcept's sandbox as a testing area for people interested in the functionality of TaxonConcept. The sandbox was a complete mirror of TaxonConcept; it was intended to be the site where people could just play around with the tool without obligations and risks after a short online registration.
We offered free access to the sandbox for several months, but we found that although comparatively many people registered, none of them really used it. After more than one year, there was only a handful of test entries. Comparing this small impact with the effort required to maintain the sandbox, we decided to close it.
I now wonder why so many people registered for the sandbox and then apparently decided not to use it. OK, TaxonConcept is not an easy tool, but we now have two students working with it and it took just one or two days to train them.
Probably the people who registered simply expected to receive something completely different after clicking on the 'register me!' link, maybe something like a newsletter?

Wednesday, March 12, 2008

New business models for earth science data management

The value of data sharing has long been recognised and proclaimed in several manifestos and policies (e.g. Berlin Declaration, Budapest Open Access Initiative, OECD). Most funding organisations strongly support Open Access, and some have even initiated programs which aim to strengthen data and information infrastructures (e.g. NSF). The importance of data archiving has also been acknowledged by funding agencies, and some of them have published good practice guides or data policies which aim to convince scientists to publish their primary data in appropriate archives.

However, data management costs money. In the past, funding agencies preferred to fund the development of new systems, but they failed to ensure funding for the long-term operation of the resulting infrastructures. Funding organisations have slowly woken up to the problem of how projects can be transformed into infrastructures, but the problem is still not solved.

Even though the value of data sharing is recognised, there is little motivation for researchers to prepare their data for online access. It only causes extra work and does not add much to prestige or recognition among peers. From a researcher’s perspective, the money is better spent on further research. In this framework, policies on data sharing remain without effect.

This does not mean that researchers are unwilling. In fact, the majority are willing to share their data but are in many cases frustrated by the difficulties that arise when they try to submit their data to a database. Many scientific database operators have not understood the paradigm shift in how the web works, the shift towards user generated content. I know that mentioning "user generated content" in this context opens a can of worms. The point is that most scientific databases, especially the publicly mandated ones, are not service oriented and simply rely on their mandate.

The funding agencies are in a dilemma. Their rules make it difficult to adapt to this rapidly evolving field. So, where is the business model for starting up independent, innovative, service oriented scientific databases? Restricted data access and paid services? This will not work because individual researchers are not able or willing to pay. Another possible avenue to obtain funding is to convince researchers of the benefits of data services and to join scientific projects, e.g. as subcontractors.

For this model to succeed, motivation is required on both sides. User frustration needs to be avoided, and the technical as well as the service infrastructure needs to be fully up to date. Improved cooperation between data centres is surely an advantage for closing gaps in one's own services. Project specific data management networks which can share responsibilities might be a solution to satisfy user needs.

Friday, March 7, 2008

Dublin Core for stratigraphic units

This is a first draft which shows how I would encode metadata on stratigraphic units.
The usage of the coverage and subject tags (which both currently contain the chronostratigraphic position of the unit) is still preliminary. I have to look up the DC specifications; as far as I remember, it is possible to specify spatial as well as temporal coverage.


<metadata>
  <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
    <dc:format>text/html</dc:format>
    <dc:rights>Licensed under a Creative Commons Attribution 2.5 License</dc:rights>
    <dc:publisher>Stratigraphy.Net http://www.stratigraphy.net</dc:publisher>
    <dc:type>Stratigraphic Unit</dc:type>
    <dc:title>Ammergauer Schichten</dc:title>
    <dc:date>1855</dc:date>
    <dc:creator>Studer</dc:creator>
    <dc:identifier>http://some_URL</dc:identifier>
    <dc:relation>Wetzstein-Schichten</dc:relation>
    <dc:subject>Malm</dc:subject>
    <dc:coverage>Malm</dc:coverage>
    <dc:coverage>Germany</dc:coverage>
    <dc:coverage>Ammergebirge</dc:coverage>
  </oai_dc:dc>
</metadata>
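On the open question of spatial versus temporal coverage: the DCMI Metadata Terms define 'spatial' and 'temporal' as refinements of 'coverage'. The Python sketch below shows how the record above might look with that distinction. Note the assumptions: the plain oai_dc format only carries the unqualified elements, so a qualified record like this would need its own metadata format, and the wrapper element name 'record' as well as the function name are made up for this example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def stratigraphic_unit_record(title, creator, date, temporal, spatial):
    """Build a record that separates temporal from spatial coverage."""
    record = ET.Element("record")  # hypothetical wrapper element
    for tag, value in (("title", title), ("creator", creator), ("date", date)):
        ET.SubElement(record, f"{{{DC}}}{tag}").text = value
    # Chronostratigraphic position goes into the temporal refinement ...
    ET.SubElement(record, f"{{{DCTERMS}}}temporal").text = temporal
    # ... and place names go into the spatial refinement.
    for place in spatial:
        ET.SubElement(record, f"{{{DCTERMS}}}spatial").text = place
    return record


record = stratigraphic_unit_record(
    "Ammergauer Schichten", "Studer", "1855", "Malm", ["Germany", "Ammergebirge"]
)
print(ET.tostring(record, encoding="unicode"))
```

With this split, 'Malm' no longer has to share the coverage tag with 'Germany' and 'Ammergebirge', which makes the record easier to interpret for a harvester.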

Monday, March 3, 2008

Taxonomy as a form of art

From 29 February to 9 March 2008 the Natural History Museum in Berlin is the stage for "HUM - The Art of Collecting - A Taxomaniac Parcours". The parcours leads through the Natural History Museum collection and touches on issues such as:
- The meaning of names, originals and order,
- The temporal validity of knowledge,
- The scope of cognition and pattern recognition,
- The economy of categories,
- The mental and vital capacity of our world.
The performance is complemented by interviews with the museum's custodians.

Thursday, February 28, 2008

TaxonConcept's Taxon Concepts II

Well, it is done ;) Here is an example which shows what our TCS XML looks like:
http://taxonconcept.stratigraphy.net/taxon_tcs.php?taxid=792

TCS really seems to be the right XML standard for describing TaxonConcept's synonymy list entries.
The only open problem is how fine the granularity of a data set should be. I have now included all synonymy lists of a distinct taxon in one TCS file, so our TCS granularity is at the taxon level. A finer granularity at the synonymy list level, however, also seems reasonable.

Wednesday, February 20, 2008

Dublin Core for Taxa

I recently wrote here that TaxonConcept will soon offer its metadata in TCS (Taxon Concept Schema) format to improve data exchange and interoperability with other groups. As the planned exchange interface will be an Open Archives (OAI) provider, we also have to deliver our data in Dublin Core (DC) format.
This brought up the problem of how taxonomic data should be encoded in DC, which is mainly designed for documents, electronic resources and similar entities.
I had a short discussion with some of the members of the TDWG GUID list and decided to treat taxa similarly to museum objects. For physical objects, the DC fields creator and date should contain e.g. the artist's name and the date of creation instead of the name of the creator and the publishing date of the electronic metadata representation. Therefore, I decided to use the creator tag for the taxon author, the date field for the date of description of the taxon and the title field for the taxon name. To describe the classification of a taxon I'll use the subject tag. Further, the relation tag will be used to handle the information TaxonConcept stores from published synonymy lists.
An example:


<metadata>
  <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
    <dc:format>text/html</dc:format>
    <dc:format>application/pdf</dc:format>
    <dc:rights>Licensed under a Creative Commons Attribution 2.5 License</dc:rights>
    <dc:publisher>Stratigraphy.Net http://taxonconcept.stratigraphy.net</dc:publisher>
    <dc:type>Taxon Concept</dc:type>
    <dc:title>Eoglobigerina edita</dc:title>
    <dc:date>1953</dc:date>
    <dc:creator>Subbotina, N.N.</dc:creator>
    <dc:identifier>http://taxonconcept.stratigraphy.net/taxon_details.php?taxid=787</dc:identifier>
    <dc:relation>Globigerina edita Subbotina 1953</dc:relation>
    <dc:relation>Globorotalia (Globorotalia) edita Subbotina 1953</dc:relation>
    <dc:relation>Eoglobigerina edita edita Subbotina 1953</dc:relation>
    <dc:relation>Globigerina edita var. polycamera Khalilov 1956</dc:relation>
    <dc:relation>Eoglobigerina edita polycamera Khalilov 1956</dc:relation>
    <dc:relation>Globigerina (Eoglobigerina) hemisphaerica Morozova 1961</dc:relation>
    <dc:relation>Globigerina (Eoglobigerina) tetragona Morozova 1961</dc:relation>
    <dc:relation>Globigerina (Eoglobigerina) pentagona Morozova 1961</dc:relation>
    <dc:relation>Globigerina (Eoglobigerina) theodosica Morozova 1961</dc:relation>
    <dc:relation>Globanomalina pentagona Morozova 1961</dc:relation>
    <dc:subject>Eukaryota</dc:subject>
    <dc:subject>Protoctista</dc:subject>
    <dc:subject>Granuloreticulosa</dc:subject>
    <dc:subject>Foraminifera</dc:subject>
    <dc:subject>Globigerinida</dc:subject>
    <dc:subject>Globigerinaceae</dc:subject>
    <dc:subject>Globigerinidae</dc:subject>
    <dc:subject>Eoglobigerinidae</dc:subject>
    <dc:subject>Eoglobigerina</dc:subject>
    <dc:subject>Eoglobigerina edita</dc:subject>
    <dc:subject>Acarinina nitida</dc:subject>
  </oai_dc:dc>
</metadata>
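The mapping described above (taxon name to title, taxon author to creator, year of description to date, classification to subject, synonymy list entries to relation) can be written down as a small Python function. This is only a sketch of the mapping, not the TaxonConcept code; the Taxon class and the function name are made up for this example, and the sample data is a shortened version of the record above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Taxon:
    """Hypothetical minimal taxon record used for this illustration."""
    name: str
    author: str
    year: str
    classification: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)


def taxon_to_dc(taxon: Taxon) -> Dict[str, List[str]]:
    """Map a taxon onto Dublin Core fields, treating it like a museum object."""
    return {
        "type": ["Taxon Concept"],
        "title": [taxon.name],                    # taxon name
        "creator": [taxon.author],                # taxon author, not the database editor
        "date": [taxon.year],                     # year of description
        "subject": list(taxon.classification),    # classification, highest rank first
        "relation": list(taxon.synonyms),         # entries from published synonymy lists
    }


edita = Taxon(
    name="Eoglobigerina edita",
    author="Subbotina, N.N.",
    year="1953",
    classification=["Eukaryota", "Foraminifera", "Eoglobigerina"],
    synonyms=["Globigerina edita Subbotina 1953"],
)
for key, values in taxon_to_dc(edita).items():
    print(key, values)
```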