Wednesday, March 4, 2009

Ageparser is reaching beta status

This week I finally found some time to work on my Ageparser tool. Ageparser is a text mining service that finds stratigraphic terms in a text or document. In a second step, it looks up the chronostratigraphic age of these terms in the Agenames database.
The idea was to provide a service that analyzes a text and determines its chronostratigraphic context. You could upload a document and it would say: this text is about the Cretaceous.
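To make the two steps a bit more concrete, here is a minimal Python sketch. It is not the actual Ageparser code: the tiny `AGENAMES` dictionary is only a hypothetical stand-in for the Agenames database, and the function names `find_ages` and `dominant_age` are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for the Agenames database:
# stratigraphic term (lowercase) -> chronostratigraphic system.
AGENAMES = {
    "maastrichtian": "Cretaceous",
    "campanian": "Cretaceous",
    "albian": "Cretaceous",
    "toarcian": "Jurassic",
}

def find_ages(text):
    """Step 1 and 2: find known terms in the text and look up their age."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(word, AGENAMES[word]) for word in words if word in AGENAMES]

def dominant_age(text):
    """Return the age that is hit most often, or None if no term was found."""
    counts = Counter(age for _, age in find_ages(text))
    return counts.most_common(1)[0][0] if counts else None

sample = "Maastrichtian and Campanian chalk overlies Toarcian shale."
print(find_ages(sample))
print(dominant_age(sample))
```

A real service would also have to handle multi-word terms such as formation names, which this word-by-word lookup ignores.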
However, in some cases Ageparser gave very strange results. For example, if it discovered the term 'Canadian' in a document, it returned 'Lower Ordovician' as a possible stratigraphic context. I therefore had to implement something that at least gives an estimate of the credibility of such an age determination.
The current solution is that Ageparser calculates a credibility index (between 0 and 1), which is based on the frequency of a determination as well as on the diversity of terms for a given age. The underlying assumption is that an age determination is more credible if, for example, someone uses many different stratigraphic terms (different formation names or stages) of a distinct system.
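One way such an index could be computed is sketched below in Python. The formula is an assumption made purely for illustration, not the one Ageparser actually uses: the share of all hits that point to an age (frequency) is multiplied by a diversity factor d / (d + 1), where d is the number of distinct terms found for that age. Both factors lie between 0 and 1, so the product does too. The function name `credibility` and the input format are hypothetical as well.

```python
from collections import defaultdict

def credibility(hits):
    """Score each age between 0 and 1 from a list of (term, age) hits.

    Assumed formula (illustration only):
        score = (hits for this age / all hits) * (d / (d + 1))
    where d is the number of distinct terms found for the age.
    """
    total = len(hits)
    terms_by_age = defaultdict(list)
    for term, age in hits:
        terms_by_age[age].append(term)

    scores = {}
    for age, terms in terms_by_age.items():
        share = len(terms) / total          # frequency of this determination
        distinct = len(set(terms))          # diversity of terms for this age
        scores[age] = share * distinct / (distinct + 1)
    return scores

# A stray 'Canadian' next to several different Cretaceous terms.
hits = [
    ("canadian", "Lower Ordovician"),
    ("maastrichtian", "Cretaceous"),
    ("maastrichtian", "Cretaceous"),
    ("campanian", "Cretaceous"),
    ("albian", "Cretaceous"),
]
for age, score in credibility(hits).items():
    print(age, round(score, 2))
```

With this assumed formula, a single stray term like 'Canadian' scores much lower than an age that is backed by several different terms.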
So I think Ageparser is now ready to leave alpha status and move to beta. And, well, I have to start preparing a publication on it.
