Data citation, especially using Digital Object Identifiers (DOIs), is an increasingly accepted scientific practice. For example, the AGU Council asserts that data “publications” should “be credited and cited like the products of any other scientific activity,” and Thomson Reuters has recently announced a data citation index built from DOIs assigned to data sets. Correspondingly, formal guidelines for how to cite a data set (using DOIs or similar identifiers/locators) have recently emerged, notably those from the international DataCite consortium, the UK Digital Curation Centre, and the US Federation of Earth Science Information Partners.
These different data citation guidelines are largely congruent. They agree on the basic practice and elements of data citation, especially for relatively static, whole data collections. There is less agreement on some of the more subtle nuances of data citation. They define different methods for handling different data set versions, especially for the very dynamic, growing data sets that are common in Earth Sciences. They also differ in how people should cite specific, arbitrarily large elements, “passages,” or subsets of a larger data collection, i.e., the precise data records actually used in a study. This detailed “micro-citation”, and careful reference to exact versions of data are essential to ensure scientific reproducibility. Identifiers such as DOIs are necessary but not sufficient for the precise, detailed, references necessary. Careful practice must be coupled with the use of curated identifiers. In this paper we review the pros and cons of different approaches to versioning and micro-citation. We suggest a workable solution for most existing Earth science data and suggest a more rigorous path forward for the future.