Clarifying Data Citation and Sharing

Some Practical Information for Journals

Wednesday, August 31, 2016, Philadelphia, PA

Agenda

Overview

This workshop will provide journal editors and other members of their editorial teams with a summary of recent developments in data citation, management, and sharing, and suggest ways in which these developments can be integrated into journal workflows. It will update editorial teams on current practices, including (where relevant) the trade‐offs between different options. The intention is to explain some of the less visible aspects of data citation and sharing and to give more information about the available options.

The workshop is organized by the Data Preservation Alliance for the Social Sciences (Data‐PASS), a voluntary partnership of organizations created to archive, catalog, and preserve data used for social science research. The event will be co‐led by five Data‐PASS organizations with foci that include political science: the Institute for Quantitative Social Science (IQSS) at Harvard University; the Howard W. Odum Institute for Research in Social Science at the University of North Carolina‐Chapel Hill; the Inter‐university Consortium for Political and Social Research (ICPSR) at the University of Michigan; the Qualitative Data Repository (QDR) at Syracuse University; and the Cornell Institute for Social and Economic Research (CISER) and the Roper Center for Public Opinion Research at Cornell University.

Light Refreshments (1:00pm – 1:30pm)
Introduction (1:30pm – 2:00pm)

Slides by George Alter (ICPSR) – Slides

Introductions, learning outcomes, initial questions

Participants receive a list of attendees for networking and continued discussion on data management and citation.

Data Citation (2:00pm – 2:50pm)

Presentation by Sebastian Karcher (QDR) – Slides

Brief review of current best practices in data citation, including the use of permanent identifiers, notably Digital Object Identifiers (DOIs), which provide stable, persistent, and resolvable references.

There is a growing consensus that data, just like other scholarly products, should be cited in scholarly works. Appropriate citation gives credit to data creators and makes it easy for readers to track down data used in scholarly works. Permanent identifiers such as DOIs ensure that such citations remain stable over time. They also facilitate subsequent analysis of the data‐publication linkages.
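As a rough illustration of what a data citation with a permanent identifier can look like, the following minimal sketch assembles one from its parts. The element order (author, year, title, distributor, version, identifier) is an assumption for illustration rather than a prescribed style, and the dataset, repository name, and DOI are invented placeholders. The only real convention relied on is that a DOI resolves through https://doi.org/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Elements of a data citation (hypothetical structure for illustration)."""

    authors: str
    year: int
    title: str
    distributor: str
    doi: str            # bare DOI, e.g. "10.1234/EXAMPLE" (placeholder, not a real DOI)
    version: str = ""   # optional, since not every dataset is versioned

    def doi_url(self) -> str:
        # DOIs are made resolvable by prefixing them with the doi.org resolver.
        return f"https://doi.org/{self.doi}"

    def formatted(self) -> str:
        # Join the elements in one assumed order; journals may prefer another.
        parts = [self.authors, str(self.year), self.title, self.distributor]
        if self.version:
            parts.append(self.version)
        return ". ".join(parts) + ". " + self.doi_url()


# Invented example values, used only to show the output shape.
citation = DataCitation(
    authors="Doe, Jane",
    year=2016,
    title="Replication Data for: An Example Article",
    distributor="Example Repository",
    doi="10.1234/EXAMPLE",
    version="V1",
)
print(citation.formatted())
```

Keeping the DOI as a separate field, rather than burying it in free text, is what makes later analysis of data‐publication linkages straightforward.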

Participants receive a summary of recent discussions of data citation principles, including sample data citations.

Coffee Break (2:50pm – 3:10pm)
Managing and Archiving Data (3:10pm – 4:00pm)

Presentation by Tom Carsey (Odum) – Slides
Presentation by William Block (CISER) – Slides

We will discuss several characteristics of data and how they affect the extent to which data can be effectively described and stored. For example, whether they arrive as a spreadsheet of numbers, a collection of archival pages, or a set of transcribed interviews, data need to be accompanied by clear and comprehensive documentation. Similarly, adhering to common norms for file storage formats helps ensure the long‐term usability of data. Likewise, for researchers to be able to find data cited in scholarly work, the data should be described in human‐ and machine‐readable formats (metadata). Careful attention must be paid to whether the data are subject to ethical or legal constraints, for example where they bear on the privacy or security of the human participants to whom they refer. Finally, where replicating scholarly work is a goal, additional information (e.g., software versions and code) is required. Researchers should keep these data management concerns in mind from the planning to the publication phase of their research (the “data lifecycle”). While journals interact with data during only one portion of the lifecycle, they can play an important role in ensuring data integrity and discoverability.
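To make the idea of human‐ and machine‐readable description concrete, here is a minimal sketch of a single metadata record rendered both ways. The field names are borrowed from the Dublin Core element set. The choice of which fields to treat as required, the helper function names, and all values are assumptions made for illustration. Actual repositories use richer schemas.

```python
import json

# Fields treated as required here: an assumption, not a repository rule.
REQUIRED_FIELDS = ("title", "creator", "identifier", "date", "format", "rights")

# Invented example record; the DOI is a placeholder, not a real identifier.
record = {
    "title": "Replication Data for: An Example Article",
    "creator": ["Doe, Jane"],
    "identifier": "https://doi.org/10.1234/EXAMPLE",
    "date": "2016",
    "format": "text/csv",
    "rights": "Restricted access: interview transcripts",
    "description": "Data, documentation, and code for the article.",
}


def missing_fields(rec, required=REQUIRED_FIELDS):
    """Return the required field names that are absent or empty."""
    return [name for name in required if not rec.get(name)]


def to_machine_readable(rec):
    """Serialize the record as JSON so that software can index it."""
    return json.dumps(rec, indent=2, sort_keys=True)


def to_human_readable(rec):
    """Render the same record as labelled lines for a reader."""
    lines = []
    for name, value in rec.items():
        text = "; ".join(value) if isinstance(value, list) else str(value)
        lines.append(f"{name.capitalize()}: {text}")
    return "\n".join(lines)


print(missing_fields(record))
print(to_machine_readable(record))
print(to_human_readable(record))
```

A check like `missing_fields` is the kind of step a journal could run at submission time, when gaps in documentation are still easy for authors to fix.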

Participants receive handouts describing the data lifecycle, and a glossary of terms.

Brief Break (4:00pm – 4:10pm)
Journal Workflows (4:10pm – 5:00pm)

Presentation by Tom Carsey (Odum) – Slides
Presentation by Gustavo Durand (IQSS) – Slides

Where journals offer or require the submission of replication datasets together with publications, it can be helpful for them to develop a workflow with domain repositories to facilitate the process. Such workflows can range from a simple set of steps to fully automated processes through application program interfaces (APIs). This session describes some general principles, presents case studies of journal workflows, and suggests different options journals may consider in making their choices.

Participants receive a set of slides describing sample workflows and best practices of several journals and repositories.

Journal Policies (5:00pm – 5:30pm)

Presentation by Colin Elman (QDR) – Slides

A journal’s approach to data is encapsulated in its policies, which in turn are set out in a series of documents, including a data and replication policy, a set of author’s guidelines, and a checklist to assist authors in preparing their materials. This session will provide a description of each of these documents.

Participants receive exemplars of journal policy documents, including a data and replication policy, author’s guidelines, and a data checklist.