Sunday, August 24, 2008

Biodiversity informatics session at EGU 2009

I just discovered the provisional programme for the EGU 2009 ESSI (Earth and Space Science Informatics) sessions. The session topics sound really interesting and, surprise: for the first time there will be a biodiversity informatics session.

It seems as if these fields are growing together and, hey, is this the beginning of a biogeoinformatics community? ;)

Friday, August 22, 2008

Disinforming Google Street View


After the first Google cars appeared here in my home town of Bremen, Germany, many people became concerned about Google's Street View activities. But apparently the legal situation does not allow anyone to stop Google from taking pictures of every corner of the city.

I personally was very amused to see the Google car in my street just before I finished painting our house ;)

So what can you do to protect your privacy at least a bit when Google comes? Use camouflage and disinformation ;)

Friday, August 15, 2008

Citation parsing

The next version of ageparser will make extensive use of regular expressions to identify stratigraphic terms. While working on this, I also played with some regular expressions that are useful for identifying citations within a scientific document and for parsing the authors and year of these citations. I assume this is a fairly common task for some of you, so you may find some of the following expressions useful for your own code:


Pattern for common person names:

$personpat="(([Bb][Ee][Nn]\s|[Dd][Ee]\s[Ll][Aa]\s|P'|[Dd]'|[vV][Aa][Nn]\s|[vV][Oo][Nn]\s|[dD][eE][lL]?\s|[dD][iI]\s)?[ÄÖÜA-Z]{1}[A-ZÄÖÜÒÓÀÉÈóòäöüàéèâa-z-]{1,})";

Pattern for authors:

$citpersonpat=$personpat."{1}(,\s".$personpat.",?)?((\sand\s".$personpat.")|((,\s".$personpat.",?)?\set al(\.|[i]{2})))?";

And two patterns for citations:

$citpattern1="[\s\.]{1}(".$citpersonpat.")\s\([0-9]{4}[a-z]?\)";
$citpattern2="[\s\.]{1}[\(,;]?(".$citpersonpat.")[,;]?\s[0-9]{4}[a-z]?[\),;\.\s]";
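
To see how the patterns above fit together, here is a minimal sketch that ports them to Python and pulls authors and year out of a sentence. This is not the ageparser code, just an illustration. The names PERSON, AUTHORS, CIT1, CIT2 and find_citations are my own, the named groups for authors and year are additions that the patterns above do not have, and the inner groups are made non-capturing to keep the results simple.

```python
import re

# Port of $personpat: an optional prefix such as "van ", "de la " or "d'",
# followed by a capitalised name of at least two characters.
PERSON = (
    r"(?:(?:[Bb][Ee][Nn]\s|[Dd][Ee]\s[Ll][Aa]\s|P'|[Dd]'|[vV][Aa][Nn]\s"
    r"|[vV][Oo][Nn]\s|[dD][eE][lL]?\s|[dD][iI]\s)?"
    r"[ÄÖÜA-Z][A-ZÄÖÜÒÓÀÉÈóòäöüàéèâa-z-]+)"
)

# Port of $citpersonpat: one name, an optional second name after a comma,
# then optionally "and <name>" or "et al." / "et alii".
AUTHORS = (
    PERSON
    + r"(?:,\s" + PERSON + r",?)?"
    + r"(?:(?:\sand\s" + PERSON + r")"
    + r"|(?:(?:,\s" + PERSON + r",?)?\set al(?:\.|i{2})))?"
)

# Port of $citpattern1: "Smith and Jones (2004)".
# The named groups are an addition for easier extraction.
CIT1 = re.compile(
    r"[\s.](?P<authors>" + AUTHORS + r")\s\((?P<year>[0-9]{4}[a-z]?)\)"
)

# Port of $citpattern2: "(Smith et al., 2004)" or "Smith 2004;".
CIT2 = re.compile(
    r"[\s.][(,;]?(?P<authors>" + AUTHORS + r")[,;]?\s"
    r"(?P<year>[0-9]{4}[a-z]?)[),;.\s]"
)


def find_citations(text):
    """Return (authors, year) tuples: all CIT1 matches first, then all CIT2 matches."""
    found = []
    for pattern in (CIT1, CIT2):
        for match in pattern.finditer(text):
            found.append((match.group("authors"), match.group("year")))
    return found


if __name__ == "__main__":
    sample = ("Earlier work by Smith and Jones (2004) was later revised "
              "(von Bitter et al., 2007).")
    print(find_citations(sample))
    # [('Smith and Jones', '2004'), ('von Bitter et al.', '2007')]
```

Note that both citation patterns require a whitespace character or a full stop in front of the first author, so a citation at the very beginning of a string is not found unless you prepend a space.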