Today I was scanning the web for tools that can scan documents and identify locations or coordinates (which we'll need to reach our ultimate goal of a 4D (space and time) index and search engine ;) ) and found Rod Page's interesting article:
iPhylo: From PDFs to Google Earth. He offers an online service, probably based on regular expressions, which extracts coordinates from PDF files and returns KML or JSON files. A simple and pragmatic approach. Cool!
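To get a feeling for how far plain regular expressions can go, here is a minimal Python sketch. It is only my guess at the general idea, not how Rod Page's service actually works: the pattern `DMS_PAIR` and the helpers `to_decimal` and `extract_coordinates` are made up for illustration, and the input is assumed to be text already extracted from a PDF, with coordinates written as degree/minute/second pairs.

```python
import json
import re

# Hypothetical pattern: a latitude/longitude pair such as 12°30'00"S 45°15'W.
# Seconds are optional; real-world documents use many more notations.
DMS_PAIR = re.compile(
    r"""(\d{1,2})°\s*(\d{1,2})['′]\s*(?:(\d{1,2}(?:\.\d+)?)["″]\s*)?([NS])
        [\s,;]+
        (\d{1,3})°\s*(\d{1,2})['′]\s*(?:(\d{1,2}(?:\.\d+)?)["″]\s*)?([EW])""",
    re.VERBOSE,
)


def to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = int(degrees) + int(minutes) / 60 + float(seconds or 0) / 3600
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


def extract_coordinates(text):
    """Return every coordinate pair found in text as a lat/lon dict."""
    points = []
    for match in DMS_PAIR.finditer(text):
        lat = to_decimal(*match.group(1, 2, 3, 4))
        lon = to_decimal(*match.group(5, 6, 7, 8))
        points.append({"lat": round(lat, 6), "lon": round(lon, 6)})
    return points


if __name__ == "__main__":
    sample = "Specimens were collected at 12°30'00\"S 45°15'W in 1998."
    print(json.dumps(extract_coordinates(sample)))
```

The result is a plain list of dictionaries, so dumping it as JSON is a one-liner; writing KML would mean wrapping each pair in a Placemark element instead.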
I also found some geoparser tools that can identify location names in text. The most interesting is
Metacarta's geoparser API, which seems to give good results. Metacarta's web pages offer some impressive examples of how this API can be used.
Another geoparser is DIGMAP's
text mining service, which returns an OGC-compliant XML file containing all features found (not only geographic ones).
And there is EDINA's
geoxwalk, which seems to be restricted to the British Isles. However, I could not test this tool: the site only offers a screenshot and some PDF documents about it.
Metacarta's geoparser seems to be the most advanced solution. However, it is a service offered by a commercial company, and unfortunately their 'terms of use' page returns a 404 error. Most probably they will not offer this service for free.
I wonder if it would be possible to create something similar to
agenames which identifies location names and returns coordinate pairs. Maybe based on the
GeoNames gazetteer?
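Just to make the idea concrete, a gazetteer-based lookup could start out as simple as the Python sketch below. Everything in it is an assumption for illustration: `SAMPLE_GAZETTEER` is a tiny hand-made dictionary with approximate coordinates standing in for a real gazetteer such as GeoNames, and `geoparse` is a hypothetical helper that only does exact, case-sensitive name matching with no disambiguation.

```python
import re

# Tiny stand-in for a real gazetteer; coordinates are approximate
# and only here for illustration.
SAMPLE_GAZETTEER = {
    "Berlin": (52.52, 13.405),
    "Nairobi": (-1.2921, 36.8219),
    "Lima": (-12.0464, -77.0428),
}


def geoparse(text, gazetteer):
    """Return (name, lat, lon) for each gazetteer name found in text."""
    if not gazetteer:
        return []
    # Try longer names first so that multi-word names win over their parts.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(1), *gazetteer[m.group(1)]) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for hit in geoparse("Samples from Nairobi and Lima were compared.", SAMPLE_GAZETTEER):
        print(hit)
```

The hard parts are exactly what this sketch leaves out: ambiguous names (many places share one name), names that are also ordinary words, and alternate spellings.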