Yesterday I was pointed to an interesting post on Heather Piwowar's blog researchremix about the problems of tracking data citation.
Heather was trying to track the reuse of datasets that have a DOI through different channels (Google Scholar, Web of Science, etc.) and was not satisfied with the results.
The DOI names she was tracking (10.3334/ORNLDAAC/*) were assigned not by DataCite but through CrossRef, and they are actually a good example of why DataCite is needed.
A DOI name for a dataset is, on its own, nothing but an identifier. DataCite’s goal is to build additional services for all datasets registered by our members. These include uploading the citations into Web of Science or Google Scholar and providing tools to measure use, re-use and citation. These are major efforts that we want to achieve in cooperation with our members and data centers. As written in my last entry, the first services will be available in the middle of June. For a first glance at what is possible with data registration, I strongly suggest taking a closer look at the Publishing Network for Geoscientific & Environmental Data (PANGAEA). They have registered over half a million datasets through our system and have started to upload their content to Google Scholar and OAIster. If you try to track their data DOI names (all of which start with 10.1594/PANGAEA), the results should be much more satisfying, because they not only have DOI names for their datasets but also an excellent infrastructure behind them that is freely crawlable by third parties.
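For anyone who wants to try this kind of tracking on crawled text, here is a minimal Python sketch of the first step: pulling DOI names out of a passage and keeping only those under a given prefix. The regular expression is a deliberately loose assumption rather than an official DOI grammar, the function names are my own, and the PANGAEA DOI name in the sample text is a made-up placeholder, not a real dataset.

```python
import re

# Loose pattern (an assumption, not an official grammar): "10.", a 4-9 digit
# registrant code, "/", then a suffix that runs until whitespace or a quote/bracket.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_doi_names(text):
    """Return DOI names found in text, with trailing punctuation stripped."""
    return [m.group(0).rstrip(".,;:)") for m in DOI_PATTERN.finditer(text)]


def has_prefix(doi_name, prefix):
    """Check whether a DOI name starts with a prefix (DOI names are case-insensitive)."""
    return doi_name.lower().startswith(prefix.lower())


# Sample text; the PANGAEA DOI name below is a hypothetical placeholder.
sample = (
    "Data from doi:10.1594/PANGAEA.000000 were used; "
    "see also the article doi:10.1016/j.margeo.2004.03.017."
)

names = find_doi_names(sample)
pangaea = [n for n in names if has_prefix(n, "10.1594/PANGAEA")]
print(pangaea)
```

Matching on the prefix only tells you who registered the name. Whether the text around it actually counts as a citation is a separate question that still needs a human, or better metadata.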
Nevertheless, tracking the re-use of datasets through DOI names is still a problem, because the very idea of re-using, referencing or citing existing data has only just taken hold. Many of PANGAEA’s datasets are used in scholarly publications, and although these connections are sometimes visible (doi:10.1016/j.margeo.2004.03.017), the DOI names of the datasets do not yet appear in the metadata of the articles. One of our goals for the next month is to convince editorial boards to allow and explicitly ask for citations of datasets used in a manuscript or, as a first step, to actively provide publishers with the information that data is available for their articles.
This is where we are going. Just assigning DOI names or any other simple identifier is not enough: we need a central portal, additional services and direct cooperation with third parties, but most of all a strong voice that is heard by all the other players.