Showing posts with label taxonconcept. Show all posts

Wednesday, August 19, 2009

Charting taxonomic knowledge through ontologies and ranking algorithms - Post-print at GFZ

If you are interested in reading our paper on TaxonRank, feel free to download a post-print copy at: http://edoc.gfz-potsdam.de/gfz/get/13007/0/d8b09c133462792c99eb6a163a6c5601/13007.pdf


TaxonRank is a ranking algorithm based on bibliometric analysis and Internet page ranking algorithms. TaxonRank uses published synonymy list data stored in TaxonConcept, a taxonomic information system. The basic ranking algorithm has been modified to include a measure of confidence in species identification based on the Open Nomenclature notation used in synonymy lists, as well as other synonymy-specific criteria...

Wednesday, December 10, 2008

Taxonomy Puzzles


In his blog post Taxonomy vs. Systematics, Dave Hone suspected that at least some paleontologists are not aware that systematics and taxonomy are two different things.
Even worse: as I wrote earlier, both are completely ignored in a large number of paleontology-related research articles! The authors of these articles do use species names, but there is no 'systematics/taxonomy' section that explains the author's taxonomic concept.

Some may now argue that this is not a problem because many species used in these investigations are so common or simple that everybody knows what is meant by species XY. But this is not true. For example, the 'El Kef blind test' (set up to clarify how abrupt the K/T extinction really was) showed that differing taxonomic concepts can drastically reduce the comparability of scientific results (Lipps, 1997; Keller, 1997).
There are many reasons why this happens: taxon names change, names are wrongly assigned, different reference specimens or images are used, etc. The result is a tangle of synonymies that is hard to unravel. As a consequence, scientists may use the same taxon name but mean totally different species.

Taxonomists traditionally try to track such taxonomic changes with carefully prepared synonymy lists. With our TaxonConcept tool we have stored thousands of synonymy lists, and it is often surprising how complicated things can get when, for example, synonyms of taxon names get their own synonymy lists or taxon names have been wrongly assigned. As an example, the picture at the top of this page shows a graph of the relations of the taxon 'Archaeoglobigerina cretacea' to hundreds of other taxon names.
So, if you use taxon names in your studies, try to provide some hints on what your taxonomic concept is. Otherwise you may leave future generations with taxonomy puzzles like this.

References:

Keller, Gerta: Analysis of El Kef blind test I, Marine Micropaleontology, Volume 29, Issue 2, January 1997, Pages 89-93.

Lipps, Jere H.: The Cretaceous-Tertiary boundary: The El Kef blind test, Marine Micropaleontology, Volume 29, Issue 2, January 1997, Pages 65-66.

Friday, April 18, 2008

A preview of the new Stratigraphy.net look

For quite a while we have been very unhappy with the old Snet homepage. The content is completely outdated and simply no longer reflects or promotes our current activities. Further, after more than 5 years we found it was really time to reconsider the overall look of the homepage ;)
The old version was based on the CMS Contenido, which is nice, but developing modules for such a system, e.g. a search interface allowing direct access to Snet data, is a pain. Further, Contenido is now popular enough to attract hackers, which forces us webmasters to watch the latest vulnerability reports carefully and to apply security updates frequently.

Therefore I decided to completely redesign the Snet homepage. The new version will be home-brewed, slimmer but more informative. It will concentrate more on content, Snet data, services and news.

To keep maintenance costs as low as possible, Snet 2 will support major standard protocols to ingest content from various sources. For example, we will use OAI-PMH to collect data from Agenames, TaxonConcept (and possibly other third-party sources). It will also use RSS feeds to include news from all Snet projects as well as from this blog.
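As a rough illustration of what OAI-PMH harvesting involves, here is a minimal Python sketch that builds a ListRecords request and reads record identifiers and the resumption token from one response page. The base URL, the identifiers and the sample response are made up for illustration; only the verb, the argument names and the XML namespace come from the OAI-PMH 2.0 protocol itself. This is not how Snet 2 is actually implemented.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 protocol
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    return ids, (token.text if token is not None and token.text else None)


# Made-up response page; a real harvester would fetch it over HTTP.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:taxon/1</identifier></header></record>
    <record><header><identifier>oai:example.org:taxon/2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")  # hypothetical endpoint
ids, token = parse_page(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A harvester would keep requesting pages with the returned token until a response carries no (or an empty) resumptionToken.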

More details on the new Snet architecture will be published here soon. For now, I would like to invite those interested to visit the first beta of Snet2 here:
http://www.stratigraphy.net/Snet2.

Thursday, March 27, 2008

The end of the sandbox

Originally, we provided TaxonConcept's sandbox as a testing area for people interested in the functionality of TaxonConcept. The sandbox was a complete mirror of TaxonConcept, intended as a site where people could play around with the tool without obligations or risks after a short online registration.
We offered free access to the sandbox for several months, but found that although comparatively many people registered, none of them really used it. After more than a year, there was only a handful of test entries. Comparing this small impact with the effort required to maintain the sandbox, we decided to close it.
I now wonder why so many people registered for the sandbox and then apparently decided not to use it. OK, TaxonConcept is not an easy tool, but we now have two students working with it, and it took just one or two days to train them.
Probably the people who registered simply expected to receive something completely different after clicking on the 'register me!' link, maybe something like a newsletter?

Wednesday, January 30, 2008

TaxonConcept's Taxon Concepts

Currently, TaxonConcept's data exchange capabilities are quite limited. This is mainly because we have not been able to find an appropriate XML format that would allow us to represent the majority of TaxonConcept's information categories, such as concepts, descriptions, links to image objects, references etc.
Commonly used formats such as DarwinCore and ABCD are mainly designed to represent metadata of collection objects or species observation data and are therefore not suited to our purposes. But at least TC's most important pieces of information, the taxonomic concepts and references, can be represented fairly well by the TDWG standard TCS (Taxonomic Concept Transfer Schema).
According to this TDWG standard, a Taxon Concept is a name plus a description of a taxon, a definition which fits perfectly with what we do:
The taxonomic concepts stored in TC are entries of published synonymy lists. These basically represent definitions of individual taxonomic opinions by including or excluding other authors' descriptions. In other words, a synonymy list is a Taxon Concept which includes other Taxon Concepts.
The most difficult part when we start to translate our data to TCS will be the translation of 'Open Nomenclature' to TCS. Fortunately, TCS lets us represent our synonymy list entries as concept relationships, including pro parte relationships (p) and misapplied names (non), which should enable us to create adequate XML representations of our data.

Monday, January 21, 2008

How TaxonRank works

Fossilized organisms are important tools in the study of stratigraphy, past climates and ecologies. However, taxonomic classifications of organisms, and thereby their names, change frequently, and finding correct synonymies for a given species is a considerable problem for non-taxonomists.
Computer-based knowledge management systems can help to make the existing wealth of taxonomic knowledge accessible and easier to interpret. TaxonRank is a ranking algorithm based on bibliometric analysis and Internet page ranking technologies. TaxonRank uses published synonymy list data stored in TaxonConcept, Snet's taxonomic information system.

Synonymy lists contain valuable taxonomic information. For decades, the Open Nomenclature notation has made it possible to express unclear classifications and to comment on other authors' identifications of a specimen. These lists contain occurrences of specimens in the literature that match the author's concept of a specific taxon, and they reflect the taxonomic opinion or concept of the list's author on that taxon. Their highly formalized nature makes them ideally suited for information systems that analyze and describe relations between taxonomic concepts. Since synonymy lists contain references to other synonymy lists or taxonomic descriptions, they represent a typical ontology.

The idea behind TaxonRank is that some authors' species identifications might have a stronger impact on a 'common taxon concept' than others. This can be the result of many factors, e.g. the quality of species illustrations, the reputation of the author or the availability of a publication. In analogy to PageRank, we state that the rank of a synonymy list is determined by the ranks of the synonymies cited in that synonymy list.
The PageRank algorithm is based on the concepts and topology of the World Wide Web, and therefore we first need to define 'pages' and 'links' between these pages.

To apply PageRank to synonymy lists, we define a synonymy list Si for a taxon t published by author i as the analogue of an Internet page containing an arbitrary number of pairs of a synonymous name syn and the cited publication doc, listed by author i as l{syn, doc}. We further define such pairs as synonymy list entries.
The order of such a synonymy list entry o({syn, doc}, Si) is in turn defined by the publication year of the document containing the synonymy list.
A link from Si to Sj is present when the synonymy list entry l{syn, doc} exists in both Si and Sj, o({syn, doc}, Sj) > o({syn, doc}, Si), and syn is an element of P, i.e. a (synonym name, publication) pair has previously been used by the author of an older publication.
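The link definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, the example lists are made up, and the order o is taken to be the publication year, with the inequality read literally (a link runs from Si to Sj when Sj has the greater order). The membership test "syn is an element of P" is left out because P is not available in this toy setting.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class SynonymyList:
    name: str            # label of the list, e.g. "S1" (hypothetical)
    year: int            # publication year of the document containing the list
    entries: frozenset   # synonymy list entries as (syn, doc) pairs


def find_links(lists):
    """Return (source, target) name pairs for all links Si -> Sj.

    A link is recorded when both lists share at least one (syn, doc)
    entry and the target list has the greater order (publication year).
    """
    links = set()
    for si, sj in permutations(lists, 2):
        shared = si.entries & sj.entries
        if shared and sj.year > si.year:
            links.add((si.name, sj.name))
    return links


# Made-up example data, not taken from TaxonConcept
s1 = SynonymyList("S1", 1957, frozenset({("Taxon a", "Doc 1927")}))
s2 = SynonymyList("S2", 1985, frozenset({("Taxon a", "Doc 1927"),
                                         ("Taxon b", "Doc 1960")}))
s3 = SynonymyList("S3", 1999, frozenset({("Taxon b", "Doc 1960")}))

links = find_links([s1, s2, s3])
print(sorted(links))
```

S1 and S3 share no entry, so no link is created between them even though their publication years differ.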

The set of all synonymy lists Sj of P is LSi and the number of links from Sj is Nj. Further, any pair l{syn, doc} is defined as a synonymy list having itself as its only synonymy list entry. The distance distj between taxon tj and taxon ti is the distance between the nodes within the ontological graph network P and determines the strength strength(Si, Sj) = 1/distj of a link l leading from Si to Sj.
The SynonymyRank SR of a specific synonymy list Si is defined analogously to the PageRank algorithm and calculated recursively using equation (1):
(1)

The rank of a synonymy list Si is thus defined as the sum of the ranks of all synonymy lists Sj pointing to list Si, each divided by the number of links on Sj.
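To make the recursion concrete, here is a small Python sketch of a PageRank-style iteration over a link set. It follows the verbal definition above (each list receives the rank of every list pointing to it, divided by that list's number of links). The damping factor and the fixed number of iterations are borrowed from the classic PageRank formulation and are my assumptions; the link strength 1/dist is left out for simplicity. The function name and the toy links are hypothetical.

```python
def synonymy_rank(links, damping=0.85, iterations=50):
    """Iteratively compute a PageRank-style rank for each synonymy list.

    links: iterable of (source, target) name pairs.
    damping: assumed damping factor (not stated in the text above);
             damping=1.0 gives the plain sum described in the text.
    """
    links = list(links)
    nodes = sorted({name for pair in links for name in pair})
    out_count = {n: 0 for n in nodes}      # Nj: number of links on each list
    incoming = {n: [] for n in nodes}      # lists pointing to each list
    for src, dst in links:
        out_count[src] += 1
        incoming[dst].append(src)

    rank = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Every new rank is computed from the ranks of the previous round.
        rank = {
            n: (1 - damping)
               + damping * sum(rank[s] / out_count[s] for s in incoming[n])
            for n in nodes
        }
    return rank


# Toy example: two lists point to a third one
ranks = synonymy_rank([("A", "C"), ("B", "C")])
print(ranks)
```

In this toy graph the list C, which is pointed to by both A and B, ends up with a higher rank than either of them.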

To calculate the rank of a specific taxon within a synonymy list, we included a pre-ranking derived from the Open Nomenclature notations used by both the synonymy list author and the citation author. In our ranking experiment we regard certain open nomenclature tags as indicators of confidence in a species identification and assign a scalar value to each tag.
This scalar value is used as a confidence factor for each species determination in a synonymy list and represents a measure of the taxonomic expert knowledge.
The rank of a taxon occurrence tik in a synonymy list is calculated as the product of this confidence factor and the synonymy list rank SR.
(2)

We can then calculate the rank of a taxon within a synonymy list, TR(ti, Sj), using equation (2) as the mean over all instances of the taxon, taking the confidence factor into account (e.g. 0.5 for open nomenclature tags such as cf, p, sp.).
As a first approach to determining the rank of a taxon as an element of P, we can calculate the total taxon rank as the sum of TR(ti) over all synonymy lists.
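The two steps just described, the confidence-weighted mean within one list and the sum over all lists, can be sketched in Python as follows. The value 0.5 for the tags cf, p and sp. is the example given above; the default confidence of 1.0 for untagged occurrences, the function names and the toy numbers are my assumptions, not values from our experiment.

```python
from statistics import mean

# Confidence factors per open nomenclature tag (0.5 as in the example above)
CONFIDENCE = {"cf": 0.5, "p": 0.5, "sp": 0.5}
DEFAULT_CONFIDENCE = 1.0  # assumption: untagged occurrences get full confidence


def taxon_rank_in_list(tags, list_rank):
    """TR(ti, Sj): mean of confidence * SR(Sj) over all occurrences of a taxon.

    tags: one open nomenclature tag (or None) per occurrence of the taxon.
    list_rank: the synonymy list rank SR(Sj).
    """
    return mean(CONFIDENCE.get(tag, DEFAULT_CONFIDENCE) * list_rank
                for tag in tags)


def total_taxon_rank(occurrences, list_ranks):
    """Sum TR(ti, Sj) over all synonymy lists in which the taxon occurs.

    occurrences: dict mapping list name -> tags of the taxon in that list.
    list_ranks: dict mapping list name -> SR of that list.
    """
    return sum(taxon_rank_in_list(tags, list_ranks[name])
               for name, tags in occurrences.items() if tags)


# Toy numbers, made up for illustration
list_ranks = {"S1": 2.0, "S2": 4.0}
occurrences = {"S1": [None, "cf"], "S2": [None]}

total = total_taxon_rank(occurrences, list_ranks)
print(total)
```

Here the taxon occurs twice in S1, once untagged and once as cf, giving a mean of 1.5, and once untagged in S2, giving 4.0, so the total taxon rank is 5.5.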

As an example, we calculated TaxonRank for Subbotina triangularis. The sizes of the circles reflect the ranks of the synonym candidate taxa. All highly ranked taxa plot in the cluster around the target species, which indicates that TaxonRank correctly identifies the most important taxa. Testing the quality of TaxonRank is simplified by the fact that the most recent literature has little influence on the rank of a specific taxon. And in fact we find all taxa for which a rank > 30 was calculated in the synonymy list of the Paleocene foraminifer working group (Olsson, 1999).

Thursday, January 17, 2008

Graph layout experiments with Jsviz


Yesterday I found jsviz, a set of JavaScript-based tree and graph layout classes, at http://www.jsviz.org.
While I am quite happy to have the prefuse Java plug-in to display the complex relationships between taxa, I have always wished for something PHP- or JS-based because the prefuse applet loads quite slowly.
The first experiments with jsviz are quite promising! I began by simply modifying the examples from the jsviz blog pages and feeding them with XML files generated by TaxonConcept. The results are very impressive; for example, the snowflake graph for Archaeoglobigerina cretacea just looks beautiful: http://taxonconcept.stratigraphy.net/taxon_jsviz.php?taxid=1861