Preserving Data and Workflows

University policy is that research data generated through funded projects, and any associated records, should be retained for as long as they are of continuing value to the researcher and the wider research community, and for as long as specified by the research funder, patent law, and legislative and other regulatory requirements.

In general, as specified in the RCUK ‘Guidance on best practice in the management of research data’, the UK Research Councils expect that data underpinning findings in publications should be accessible for at least ten years after publication.

Individual Research Councils’ and other funders’ data policies and good research practice guidance provide additional requirements. These should be consulted, and retention periods specified, in each Research Data Management Plan.  In many instances, researchers will resolve to retain research data and records for longer than the minimum requirement.

Preservation means the storage of a project’s digital outputs in such a way that they remain usable, understandable and accessible beyond the end of funding.  In practice, therefore, preservation is often achieved by depositing the digital material in an archive/repository during the project, or shortly afterwards.  Often, charges made by the archive for preparing and ingesting the data can be directly costed into your grant application (NB. such charges usually need to be paid within the lifetime of the project and not after it has finished).

Think about preservation at the start of your research project: what data will you be working with, what digital outputs will be produced, where will they be stored, in what way and for how long?  Contact RIES early on in your project for advice.

Active steps to preserve data also usually enable a wider sharing of data within the academic community, bringing with it the associated benefits in terms of research impact and visibility.

A data archive is a place to securely hold digital research materials (data) of any sort, along with documentation that helps explain what they are and how to use them (metadata). The application of consistent archiving policies, preservation techniques and discovery tools further increases the long-term availability and usefulness of the data. This is the main difference between storage and archiving of data.

An archive is for stable (completed) versions of the data, not a research workspace. Once data are deposited, they remain in that state and are attributable via a persistent identifier. Active (‘live’) data that are constantly being worked on should not be deposited in an archive. Stable snapshots of longitudinal data can be deposited.

You should always, in the first instance, seek to deposit your data in a national data archive (maintaining a metadata record in the UWTSD research data repository as well).  In many cases your research funder will maintain a national archive and specify that this must be used.  The UWTSD data repository can be used when there is no suitable external repository.

Deposit Workflow: Do I need to deposit my data? 

The UWTSD data deposit decision tree outlines some of the questions involved in choosing which research data to preserve and in identifying a suitable repository.

Is there already an archive appropriate for your subject area?

Some disciplines are well served by established and well-known data archives. Examples include the UK Data Archive, Dryad, GenBank, EMBL-EBI and the Natural Environment Research Council (NERC) Open Research Archive. Deposit in some of these archives is dependent on funding body or publisher. Over a thousand specialist archives are listed in the registry of research data repositories (re3data.org). However, some disciplines do not have obvious locations for archiving data. If this is the case, the UWTSD data archive can be used.

Long term accessibility?   A data archive should agree to store your data for a significant period of time. It should also undertake to ensure data will remain findable and accessible for this period. It should give details of scenarios and procedures where data will be removed or deleted. Check the policies and terms and conditions of the archive carefully to ensure that it will retain your data for as long as you require, or at least give you sufficient notice of removal.

A record for the data.   Two of the minimum requirements for datasets, set by funders and journal publishers, are that the dataset can be cited and found. To this end, the dataset should have a persistent, meaningful and discoverable record. The metadata describing the dataset should be compliant with common standards. This is likely to include the DataCite minimum metadata set for data citation.
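As a rough illustration of what a minimum metadata record involves, the sketch below checks a draft record for missing fields before deposit. The field names are assumptions loosely based on the mandatory properties of the DataCite Metadata Schema (identifier, creator, title, publisher, publication year, resource type); the function name and the example record are hypothetical, and your chosen archive's own deposit form remains the authority on what is required.

```python
# Hypothetical pre-deposit check. The field names are illustrative and
# loosely follow the mandatory properties of the DataCite Metadata Schema;
# they are not the schema's exact element names.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Made-up example record: the DOI is left empty because it is usually
# assigned by the archive at the point of deposit.
example_record = {
    "identifier": "",
    "creators": ["Surname, Forename"],
    "title": "Example survey dataset",
    "publisher": "Example Data Repository",
    "publication_year": 2016,
    "resource_type": "Dataset",
}

print(missing_fields(example_record))
```

Running the check on the example record reports that only the identifier is still outstanding, which is the expected position before an archive has assigned a DOI.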

Digital Object Identifiers (DOIs).  It is rapidly becoming the norm that datasets require a DOI (the digital equivalent of an ISBN) so they can be cited and found. Some major funders recommend (although don’t require) that a DOI is used for the unique identifier. Does your selected archive assign DOIs to deposited datasets?

Meeting funder’s requirements.   Does the archive enable you to meet your funder’s requirements? The requirements might include elements such as:

  • Ensuring access for a given period (e.g. 10 years after last access)
  • Assignment of sufficient metadata to describe and locate the data
  • Assignment of a DOI or other unique identifier

What are the costs of preservation?

Data management and sharing activities need to be costed into research, in terms of time and resources needed. Early planning of data management can significantly reduce the costs. The UK Data Archive has a very useful tool to assist in identifying costs of data management in the social sciences.

What you shouldn’t use:

Personal storage on an external hard drive

An external hard drive is inexpensive but should only be seen as a temporary or short-term storage option. The UWTSD RDM policy states that all research data should be stored in the University’s managed IT environment, and if you are working remotely, it should be transferred at the earliest opportunity.  The difference between storing your data on a drive and in a reputable archive is that:

  • A drive does not benefit from multiple backups in multiple locations
  • Preservation and curation actions will not be carried out to ensure continued accessibility of content over time
  • Using a drive places all the responsibility on the drive owner to do it properly
  • The data creator suffers the consequences of it not being assigned a DOI
  • The data creator suffers the consequences of it not being discoverable
  • All hard drives fail eventually

Cloud storage

This should be seen as an online equivalent of hard drive storage rather than as an alternative to archival preservation. Pricing and convenience of use can be attractive, but the terms and conditions of each provider should be examined in detail. It is common for such services to state clearly that they accept no liability for security breaches or data loss. In addition, most cloud storage services do not offer appropriate citation or access options.

Can I preserve my data at UWTSD?

  • Yes.  The UWTSD Research Data Repository is available for simple deposit and long-term preservation of datasets.  However, it should only be used if your dataset cannot be offered to an external repository.  In every case, you should submit a metadata record to the UWTSD repository. If you have any questions about depositing your data at UWTSD, please contact RIES or the LLC.
  • Your funder may require you to deposit data to be preserved for the long term in a specific data centre.  As an award holder, however, you will be expected to deal with any copyright/third party issues that concern your research.
  • There are a number of services available which can handle the preservation of research outputs for you, including repositories and data centres/archives. More information on these services can be found on our How to share page.
  • Many journals require that data underpinning an article is cited and stored in a data archive for long-term accessibility.  It is also recommended that your data can be cited using a persistent identifier such as a DOI.  If you are not using an established subject or national data archive, the UWTSD Library and Learning Resource Centre can assign a DOI to your archived data. 

I am collaborating with a University overseas on a project & my data is not in the UK – what do I need to consider?

  • If your data is not stored in the UK, you will need to ensure that it is not held somewhere where the legal safeguards are lower than in the UK.  If you are using a cloud storage solution, you will need to be aware of the legal jurisdiction covering your cloud storage provider.

Do I have to preserve all digital data from my research – or just those data underpinning publication?

  • This differs by funder.  Check your funder data policy or see the section on funder requirements.  It may be that not everything can or should be preserved, as there may be ethical issues or too high a cost in doing so. Quite simply, you should aim to justify any decision to discard a digital output.   For further guidance see:
  • Link to To share or not to share   
  • Link to Ethical issues in data protection

What format should data be preserved in?

Data archives work with different data formats for different purposes. There are, however, optimal data formats for the long-term preservation of data.  The UK Data Archive has a useful summary of optimal file formats for long-term preservation, which you should consult.

What about ethical and legal issues concerning retention of data?  How does this conflict with preservation?

  • Data may need to be anonymised so that individuals, organisations or businesses cannot be identified.  Alternatively sensitive and confidential data may be safeguarded effectively by regulating or controlling access to data or use of them.  Some repositories will allow you to submit your data at the end of a project (when it is easiest to pull the data together) but embargo (restrict access) its release for a few years.
  • In some situations, repositories may allow you to store part of a set of materials publicly and will maintain the other, more proprietary parts while keeping them hidden and inaccessible. For more information, see the UK Data Archive’s advice on Access Control.
  • In all cases, however, you must ensure that you are compliant with the terms of the Data Protection Act (1998).  The UK Data Archive has very good advice.  In general you should consider:

How do I ensure my data is understandable and usable into the future?

  • Ensure effective documentation is being collected that describes not only types of data but also the decisions that lay behind any file naming.  If changes have occurred in working practices during the course of the research project, these need to be documented too.  See the section on Documentation and Organisation.

How do I preserve data to be shared – are there any additional stages of data preparation needed to allow re-use by others?

  • The UK Data Archive has excellent advice on ‘Planning for Sharing’. This covers:
    • Why share data
    • Roles & responsibilities
    • Costing
    • How to share data

How do I cover the costs of data preservation?

  • Most funders will cover appropriate costs of preparation and ingest of digital outputs that are incurred within the funding period.  Therefore it is important to address the issue of data preservation from the start of your project (in the data management plan) and include costs in grant applications.

 The UK Data Archive has produced this checklist which can help you identify what to put in place for good data practices, and which actions to take to optimise data sharing.

  • Are you using standardised and consistent procedures to collect, process, check, validate and verify data?
  • Are your structured data self-explanatory in terms of variable names, codes and abbreviations used?
  • Which descriptions and contextual documentation can explain what your data mean, how they were collected and the methods used to create them?
  • How will you label and organise data, records and files?
  • Will you apply consistency in how data are catalogued, transcribed and organised, e.g. standard templates or input forms?
  • Which data formats will you use? Do formats and software enable sharing and long-term validity of data, such as non-proprietary software and software based on open standards?
  • When converting data across formats, do you check that no data or internal metadata have been lost or changed?
  • Are your digital and non-digital data, and any copies, held in a safe and secure location?
  • Do you need to securely store personal or sensitive data?
  • If data are collected with mobile devices, how will you transfer and store the data?
  • If data are held in various places, how will you keep track of versions?
  • Are your files backed up sufficiently and regularly and are back-ups stored safely?
  • Do you know what the master version of your data files is?
  • Do your data contain confidential or sensitive information? If so, have you discussed data sharing with the respondents from whom you collected the data?
  • Are you gaining (written) consent from respondents to share data beyond your research?
  • Do you need to anonymise data, e.g. to remove identifying information or personal data, during research or in preparation for sharing?
  • Have you established who owns the copyright of your data? Might there be joint copyright?
  • Who has access to which data during and after research? Are various access regulations needed?
  • Who is responsible for which part of data management?
  • Do you need extra resources to manage data, such as people, time or hardware?
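The checklist questions about back-ups, copies held in various places and the master version can be supported by recording a checksum for every file. The sketch below is one possible approach under stated assumptions, not a UK Data Archive or UWTSD tool: it uses only the Python standard library, and the function names are hypothetical. Note that a checksum only shows whether a file's bytes have changed; it cannot confirm that a conversion to a different format preserved the content.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that were added, removed or altered between two manifests."""
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )
```

A typical use would be to build a manifest of the master copy when the data are finalised, store it with the project documentation, and later compare it against a manifest of a back-up or transferred copy. An empty list from `changed_files` indicates that the two copies are byte-for-byte identical.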

Training

  • RIES and LLC can offer advice and bespoke courses on research data management.  If you would like to arrange a workshop with your faculty, school or research group, please contact RIES.
  • Resources from the 2015-16 CPD sessions will be available shortly, here.
  • Preservation, Sharing and Licensing is an interactive online training module from Edinburgh University’s MANTRA Project.
  • DataCite and Databib both provide lists of research data repositories.

re3data.org

re3data.org is a global registry of research data repositories, covering a wide range of academic disciplines. The registry offers details of repositories intended for the permanent storage and access of research datasets, and aims to promote a culture of sharing, increased access and better visibility of research data.

Open Access Directory

A list of repositories and databases for open data from the Open Access Directory.

DataCite

DataCite’s aim is to help make data more accessible and more useful. Its purpose is to develop and support methods to locate, identify and cite data and other research objects. Specifically, it develops and supports the standards behind persistent identifiers for data, and its members assign them.  DataCite brings together actors from the research community to address the challenges of making research objects visible and accessible. Together they constitute a global network of dataset researchers.

Digital Curation Centre

The Digital Curation Centre (DCC) is a world-leading centre of expertise in digital information curation with a focus on building capacity, capability and skills for research data management across the UK's higher education research community. The Digital Curation Centre provides expert advice and practical help to anyone in UK higher education and research wanting to store, manage, protect and share digital research data.

Resources provided by the DCC include:

UK Data Archive

The UK Data Archive provides extensive good practice advice from the UK’s largest social science and humanities data archive.  The UK Data Archive website offers advice regarding:

BioSharing

BioSharing is a curated, searchable portal of inter-related data standards, databases, and policies in the life, environmental, and biomedical sciences.

Sherpa/Juliet

Research funders’ policies on open access data archiving and open access publishing.

Archaeology Data Service

Advice from the Archaeology Data Service, covering the whole data lifecycle.

These good practice guides cover a wide range of topics, including:

  • Planning for the Creation of Digital Data
  • Project Documentation
  • Project Metadata
  • Data Selection: Preservation Intervention Points
  • Copyright and Intellectual Property Rights
  • Databases and Spreadsheets
  • Data Collection and Fieldwork
  • Data Analysis and Visualisation
  • Preparing and Depositing Your Archive

Dataverse 

An open source web application for sharing, preserving, citing, exploring, and analysing research data. The Dataverse Project is based at Harvard University. The Harvard Dataverse repository is open to researchers from all disciplines (and from all over the world) who wish to deposit data and make it available for others to use.

Adapted from the University of Oxford under Creative Commons Attribution 3.0 Unported licence (CC BY 3.0).  Original content at: http://researchdata.ox.ac.uk/