DataCite summer meeting recap

Over the last four days, Berkeley has been the heart of the “data citation world”. We first had the summit of the CODATA task group on data citation, followed by our two-day DataCite summer meeting.

Our meeting began with a fantastic keynote by John Wilbanks from Creative Commons. He gave us a motto for our path: we should keep in mind that all our solutions have to be:

                        Simple. Weak. Scalable. Open

John’s slides can be found here, but I understand that he will upload a version with audio soon.

DataCite is about to become a community: a community to achieve data citation. It was great to see that so many different players were present to build this community here with us:

Data centers, libraries, publishers, universities, funding organisations and scientists.

Some of the statements that I will take home from the last few days:

–         Abstracts to articles are open; it is time for reference lists to be open too

–         We have to move from our current practice of “data sighting” to “data citing”

–         But even the ability to cite data might not be enough of an incentive for scientists to publish their data. (“Waving the carrot only makes sense to those who like carrots”)

–         One possibility could be establishing data papers as independent scientific items. There are already data journals out there: ESSD, G3, and the upcoming GigaScience journal

–         The web was originally invented for scholars; it has since changed everything except scholarly publication. It is time to end this.

–         Data is the real outcome of science, the article is only the summary or even mere metadata to the data.

–         Concerning the information overflow: we do not have to turn off the taps, we have to build boats.

–         Great work that we can all learn from and cooperate with has already been done, for example at ICPSR, ORNL and PANGAEA.

–         What is publishing data anyway: publishing with a small “p” (putting it online somehow) or with a big “P” (quality-controlled, peer-reviewed, persistently available)?

–         The difference between CrossRef and DataCite is the difference between communities. Those communities cooperate, and so do DataCite and CrossRef.

–         Concerning the question of whether datasets and journals should be stored and maintained together: there is already a place for this, and that place is the library.

It has been a great week, full of great discussions and interesting thoughts.

A big thank you to all of those who have been here and enriched the discussions. We have received great feedback from the community, which also gives us a weighty mission. We will respect it. See you all next year in Copenhagen for the 2012 summer meeting.

The slides from the summer meeting will be up on the DataCite website soon. You can find more summaries of the meeting on the web:

By Karthik Ram

By @mrgunn

By Carl Boettiger here and here

By GigaScience



Posted in Meetings

Metadata Schema Version 2.2 released

Today a new DataCite Metadata Schema version was released.
Version 2.2 of the DataCite Metadata Schema introduces several changes, as noted below:

  • Addition of “URL” to the list of allowed values for relatedIdentifierType.
  • Addition of the following values to the list of allowed values for contributorType: Producer, Distributor, RelatedPerson, Supervisor, Sponsor, Funder, RightsHolder.
  • Addition of “SeriesInformation” to the list of allowed values for descriptionType.
  • Addition of “Model” to the list of allowed values for resourceTypeGeneral.
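As a rough illustration of what these additions mean for anyone checking records, the following Python sketch lists only the values named above and tests whether a given value was introduced in Version 2.2. The names `ADDED_IN_2_2` and `added_in_2_2` are hypothetical, and the sets are deliberately not the complete controlled lists, which are defined in the schema itself.

```python
# Values newly allowed in Version 2.2, copied from the list above.
# These are NOT the complete controlled lists; the schema defines those.
ADDED_IN_2_2 = {
    "relatedIdentifierType": {"URL"},
    "contributorType": {
        "Producer", "Distributor", "RelatedPerson", "Supervisor",
        "Sponsor", "Funder", "RightsHolder",
    },
    "descriptionType": {"SeriesInformation"},
    "resourceTypeGeneral": {"Model"},
}


def added_in_2_2(attribute: str, value: str) -> bool:
    """Return True if `value` was added to `attribute` in Version 2.2."""
    return value in ADDED_IN_2_2.get(attribute, set())


if __name__ == "__main__":
    # A record that uses any of these values needs at least schema 2.2.
    print(added_in_2_2("contributorType", "Funder"))
    print(added_in_2_2("descriptionType", "Abstract"))
```

A check like this could flag records that rely on the new values before they are submitted against an older schema version.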

Version 2.2 of the DataCite Metadata Schema documentation includes these changes:

  • Provision of more XML examples for different types of objects.
  • Explanation of the PublicationYear property in consideration of the requirements of citation.
  • A change to the definition of the Publisher property, which now reads, “The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role.”

The DOIs for the new version are:

Documentation: 10.5438/0005

XSD: 10.5438/0006

Posted in working groups

DataCite Summer Meeting “Data and the Scholarly Record: the Changing Landscape”

DataCite will hold its second Summer Meeting on August 24th and 25th at the historic Shattuck Plaza Hotel in Berkeley, California. The Summer Meeting will be a 1.5-day event and is open to all. You can register at:

The Summer Meeting brings together people from research organisations, data centers, government, and information service providers to hear about the latest developments in data science, data citation, discovery, and reuse. It also provides opportunities to exchange experience and influence the next generation of data citation services.

This year’s program will include sessions on data citation, the state of the art in data publishing, and discussion rounds on the new challenges that come with increasing access to scientific data. The keynote speaker is John Wilbanks, Vice President for Science at Creative Commons.

The 2010 DataCite summer meeting brought together a strong programme of speakers and participants. Highlights were published in a D-Lib special issue.

Posted in Meetings

EHEC genome with a DOI name

The May 2011 outbreak of an E. coli infection in Europe has raised serious concerns about the potential appearance of a new deadly strain of bacteria. The Beijing Genomics Institute (BGI), in collaboration with researchers at the University Medical Center Hamburg-Eppendorf, used its genomic technology to determine the infectious strain, reveal the mechanisms of infection, and facilitate the development of measures to control the spread of the epidemic, thus helping German scientists to find the cause of the epidemic and understand the nature of the infection.

Why do I mention it here?
BGI is a DataCite customer, using the British Library as its DOI agency. The sequence data for the bacterium’s genome was therefore published with a DOI name.

The data is available here:

To maximise its utility to the research community and aid those fighting the current epidemic, the genomic data has been released by BGI into the public domain under a CC0 licence. Until research papers on the assembly and whole-genome analysis of this isolate are published, the dataset can be cited as:

Li, D; Xi, F; Zhao, M; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, C; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Song, Y; Zhao, X; Chen, F; Yin, X; Rohde, H; Liang, Y; Li, Y
and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011):
Genomic data from Escherichia coli O104:H4 isolate TY-2482.
BGI Shenzhen. doi:10.5524/100001
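The citation above follows the pattern “Creators (Year): Title. Publisher. doi:Identifier”. As a minimal sketch, assuming that pattern, a citation string and a resolver link could be assembled in Python as follows. The function names `format_citation` and `doi_url` are hypothetical, and the creator list in the usage example is shortened for illustration only; the full list is the one given above.

```python
def format_citation(creators, year, title, publisher, doi):
    """Build a string of the form 'Creators (Year): Title. Publisher. doi:DOI'."""
    names = "; ".join(creators)
    # Drop a trailing full stop so the title is not followed by two of them.
    clean_title = title.rstrip(".")
    return f"{names} ({year}): {clean_title}. {publisher}. doi:{doi}"


def doi_url(doi):
    """Return a link that resolves the DOI name through the doi.org proxy."""
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    # Creator list shortened for illustration; see the full citation above.
    creators = ["Li, D", "Xi, F", "Zhao, M"]
    print(format_citation(
        creators,
        2011,
        "Genomic data from Escherichia coli O104:H4 isolate TY-2482.",
        "BGI Shenzhen",
        "10.5524/100001",
    ))
    print(doi_url("10.5524/100001"))
```

Keeping the DOI name as a separate field, rather than storing only the formatted string, makes it easy to produce both the human-readable citation and a resolvable link from the same metadata.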



Posted in DOI, News

DataCite website refurbished

Dear all,

we have refurbished the DataCite website.
The new version is easier to navigate and allows users to find the information they are looking for more directly. Please take your time and, if you like, give us some feedback on it.

Furthermore, we have scheduled the dates for our DataCite summer meeting! Following the great success of last year’s meeting in Hannover, Germany, this year’s meeting will be on August 24 and 25 in Berkeley, California. The aim of the summer meeting is to:

  • present the state of the art in data-related science
  • give examples of projects and infrastructure
  • provide a platform for data centers to exchange their experience
  • give all DataCite members a platform to present their latest developments to a wide audience

Look out for more details on the agenda coming up soon.

Best wishes from Vancouver, where I am currently attending the IASSIST conference, which has the motto “Data Science Professionals: a Global Community of Sharing”. Today we will present DataCite’s work and hold a session on the current practice of data citation, which is of course one of DataCite’s main subjects.

I will report on it next week.



Posted in Meetings, Website

DataCite email accounts

Dear all,

we experienced technical issues with DataCite email accounts between February and March 2011, which may have resulted in undelivered messages. All technical issues have been resolved, but please let us know if you have contacted us in the past two months and have not yet received a response.

Sorry about that.

Oh, and by the way: next week we will launch a new, refurbished website that should make navigating and accessing information a little easier than the old one.

I will keep you informed.


Posted in Website

Update of DataCite Metadata Schema online

The DataCite Metadata Schema has been updated, and the schema and accompanying documentation are now available in Version 2.1. The schema is accessible at this location: The documentation is here:

Several early adopters and other careful readers generously provided us with feedback regarding the details of the specification. As a result, we were able to make a number of improvements. The most significant change to the schema is that it now includes a namespace, which provides OAI-PMH compatibility.

The documentation changes may be less significant, but we hope they add clarity. A new column in the properties tables provides guidance as to whether the property being described is an attribute or a child of the corresponding property that has preceded it. In addition, in response to a request, we gave one of the allowed values lists (the relationType pairs) a thorough overhaul.

I’d like to add my personal thanks to the Metadata Working Group members who helped review these changes, to the technical experts from our member institutions who provided advice, and to our Metadata Coordinator, Frauke Ziedorn, for everything she does, including keeping track of the feedback we get from community members.

On another note, a small team from the Metadata Working Group will begin working in April on a second version of the schema that is interoperable with the Dublin Core. Please stay tuned for more information on this development as it unfolds.

Joan Starr

Posted in working groups