Showing posts with label data management.

Tuesday, October 23, 2012

Biodiversity Data Journal - archiving or self-archiving?

Another interesting development in the world of research data management: Pensoft has announced that the Biodiversity Data Journal (BDJ) will start accepting submissions in December 2012. BDJ is a new data journal similar to the Earth System Science Data Journal (ESSD), which pioneered the publication of research data.

Biodiversity Data Journal (BDJ) is a community peer-reviewed, open-access, comprehensive online platform, designed to accelerate publishing, dissemination and sharing of biodiversity-related data of any kind. All structural elements of the articles – text, morphological descriptions, occurrences, data tables, etc. – will be treated and stored as DATA.

So far, this sounds very good. Less good, IMHO, is BDJ's data archiving policy, which is rather vague.

The BDJ homepage only provides a link to Pensoft's general data publishing guidelines, which offer many options for how data should be archived. Of course, they mention Dryad, PANGAEA etc., which are suitable data archives. But it is hard to understand why GBIF's Integrated Publishing Toolkit (IPT) and Scratchpads are also included as options.

GBIF is not an archive, and both Scratchpads and the IPT are software solutions that can be used for archiving but need to be hosted by a reliable organisation, and no such organisations are named. Therefore, in its current form, Pensoft's data archiving policy could be read as an invitation to self-archiving... or does BDJ mean, e.g., the IPT in the sense of a "Pensoft IPT"? This should be fixed before December.

Wednesday, August 12, 2009

How much money is in the scientific data management 'business'?

Probably nobody knows the answer to this question... But based on some numbers I know, I will give a rough estimate of how much money could potentially be made with scientific data management in Europe.

The 7th European Framework Programme (fp7) provides more than 50 billion € for research projects between 2007 and 2013. The structure of fp7 is quite complicated, and it is almost impossible to find out exactly how much of this money is spent on the bio- and geosciences.
The themes 'Agriculture and Fisheries, Biotechnology' and 'Environment (including Climate Change)' alone are funded with more than 3.8 billion €. So let's estimate that the European Commission spends around 4 billion € on the bio- and geosciences.

Source                      Million €
fp7 (total)                 50,000
fp7 (bio/geo themes)        ~4,000
fp7 (potential for DM)      ~45-50

To my knowledge, European research projects that provide money for data management reserve 0.5-3% of the total project funding for this purpose, with a mean of about 1.5%. Of the 4 billion € mentioned above, some projects may not need data management at all. However, if at least 75-80% of all research projects produce data, 1.5% of the remaining roughly 3 billion € would amount to around 45-50 million € for data management. This seems like a lot of money, but fp7 started in 2007 and will last until 2013, so the amount potentially(!) spent on data management (DM) per year is around 6.5-7 million €. The potential market for scientific data management still seems considerable.
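The estimate above is simple arithmetic. The following is a minimal Python sketch of it, assuming only the post's own rough inputs: about 4 billion € for the bio/geo themes, 75-80% of projects producing data, a mean data management share of 1.5%, and 7 programme years (2007-2013 inclusive). The function name `dm_budget` and its parameters are hypothetical and chosen only for illustration; none of the figures are official fp7 statistics.

```python
# Back-of-the-envelope sketch of the fp7 data management (DM) estimate.
# All inputs are the post's rough assumptions, not official fp7 statistics.


def dm_budget(bio_geo_funding_eur: float, share_producing_data: float,
              dm_share: float = 0.015, years: int = 7) -> tuple[float, float]:
    """Return (total, per_year) potential DM budget in euros.

    bio_geo_funding_eur:  funding for the bio/geo themes (post assumes ~4e9 EUR)
    share_producing_data: fraction of projects that produce data (e.g. 0.75)
    dm_share:             fraction of project funding reserved for DM (mean ~1.5%)
    years:                programme duration; 2007-2013 inclusive is 7 years
    """
    total = bio_geo_funding_eur * share_producing_data * dm_share
    return total, total / years


if __name__ == "__main__":
    # Print the lower and upper end of the post's 75-80% range.
    for share in (0.75, 0.80):
        total, per_year = dm_budget(4e9, share)
        print(f"{share:.0%} of projects: {total / 1e6:.0f} million EUR total, "
              f"{per_year / 1e6:.1f} million EUR per year")
```

With these inputs the sketch yields about 45-48 million € in total and roughly 6.4-6.9 million € per year, consistent with the rounded figures in the post. Lowering `share_producing_data` to reflect a more pessimistic guess about how many projects actually budget for data handling scales both results down proportionally.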

Of course, this money is not actually spent in full. The percentage of projects that include proper data management is certainly below 75%; I would estimate that fewer than 30% of all projects reserve money for data handling. Further, we need to consider that applicants from 27 member states compete for this money (although, as few suitable data centres exist, the competition for the remaining millions is not as fierce in reality...).

I used fp7 as an example because the European Commission has shown considerable interest in improving open access to scientific data. As far as I know, this issue is also considered when project proposals are evaluated. However, the money the European Commission spends is only one source of funding; ideally, national funding agencies will also support access to data. If you live in a lucky country, national funding for DM could equal the amount the Commission spends.

In summary, I think the scientific data management niche is still interesting, even though there is not as much money in it as you might have expected. And since the importance of e-science infrastructures and open access to scientific information has only recently been recognized, this sector may still grow in the future.

Wednesday, March 12, 2008

New business models for earth science data management

The value of data sharing has long been recognised and proclaimed in several manifestos and policies (e.g. the Berlin Declaration, the Budapest Open Access Initiative, OECD). Funding organisations strongly support Open Access, and some have even initiated programs that aim to strengthen data and information infrastructures (e.g. NSF). Funding agencies have also acknowledged the importance of data archiving, and some of them have published good practice guides or data policies intended to convince scientists to publish their primary data in appropriate archives.

However, data management costs money. In the past, funding agencies preferred to fund the development of new systems, but they failed to ensure long-term funding for operating the resulting infrastructures. Funding organisations have slowly woken up to the problem of how projects can be transformed into infrastructures, but the problem is still not solved.

Even though the value of data sharing is recognised, researchers have little motivation to prepare their data for online access. It causes extra work and adds little to their prestige and recognition among peers. From a researcher's perspective, the money is better spent on further research. In this situation, data sharing policies remain ineffective.

This does not mean that researchers are unwilling. In fact, the majority are willing to share their data but are often frustrated by the difficulties that arise when they try to submit it to a database. Many scientific database operators have not understood the paradigm shift in how the web works: the shift towards user-generated content. I know that mentioning "user-generated content" in this context opens a can of worms. The point is that most scientific databases, especially the publicly mandated ones, are not service-oriented and simply rely on their mandate.

The funding agencies are in a dilemma. Their rules make it difficult to adapt to this rapidly evolving field. So where is the business model for starting up independent, innovative, service-oriented scientific databases? Restricted data access and paid services? This will not work, because individual researchers are neither able nor willing to pay. Another possible avenue to funding is to convince researchers of the benefits of data services and to join scientific projects, e.g. as subcontractors.

For this model to succeed, both sides need to be motivated. User frustration must be avoided, and both the technical and the service infrastructure need to be fully up to date. Improved cooperation between data centres would certainly help them close their own service gaps. Project-specific data management networks that share responsibilities might be one way to satisfy user needs.