Organising your data
Future-proof your data! Find out more about organising, naming, structuring and documenting.
You may have planned your data management strategy down to the last detail and cleared all of the ethical issues and intellectual property rights, but if you don’t organise your data properly on a day-to-day basis, there is always a risk that you won’t be able to find things when you need them. Likewise, if you don’t document your data, you may not be able to understand why exactly you recorded what you did, or how your data was derived when you come back to it in future. If you are planning on sharing your data at any point, then documentation is especially important.
Take a look at the software and web services available to you. Ensure you are using the most appropriate tools for structuring the information you are gathering.
What do I need to consider when choosing a filename?
- It’s generally useful to aim for file names which are concise, but informative – it makes life easier if you can tell what’s in a file without having to open it.
- Similarly, being consistent in your file naming practices will make it easier to locate the file you want. Within a research group, you may want to agree on file naming conventions early on in the project.
- Operating systems usually default to sorting files alphabetically, so it can be helpful to think about what comes at the start of the file name – is it more useful to order the files by date, by author, or by subject, for example?
- If you have multiple versions or drafts of a file, it can also be useful to include a version number in the file name – this makes it straightforward to see which copy is the most recent one.
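As a rough illustration of the points above, here is a minimal Python sketch of one possible naming convention: date first, then subject, author, and a zero-padded version number. The function name `build_filename`, the field order, and the example values are all assumptions made for illustration, not a recommended standard; your research group may well agree on something different.

```python
import re
from datetime import date


def build_filename(created: date, subject: str, author: str, version: int, ext: str) -> str:
    """Build a name of the form YYYY-MM-DD_subject_author_vNN.ext (hypothetical convention)."""

    def slug(text: str) -> str:
        # Lower-case the text and replace runs of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # An ISO 8601 date at the start means alphabetical order is also date order.
    # Zero-padding the version number means v02 sorts before v10.
    return (
        f"{created.isoformat()}_{slug(subject)}_{slug(author)}"
        f"_v{version:02d}.{ext.lstrip('.')}"
    )


# Hypothetical files, for illustration only.
names = [
    build_filename(date(2024, 3, 5), "Soil survey", "J Smith", 10, "csv"),
    build_filename(date(2024, 3, 5), "Soil survey", "J Smith", 2, "csv"),
    build_filename(date(2023, 11, 20), "Soil survey", "J Smith", 1, "csv"),
]

# Sorting alphabetically, as most file browsers do, lists oldest first
# and puts the most recent version of each day's file last.
for name in sorted(names):
    print(name)
```

Putting the date first suits material you mainly browse chronologically; if you more often look things up by subject or author, moving that field to the front would give a more useful default sort order.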
What about file structures? What tips are there for developing a system?
- Most operating systems default to a hierarchical file structure – files inside folders, which may be nested inside other folders. This is great if your material can easily be grouped into relatively discrete categories.
- In planning a hierarchical folder structure, aim for a balance between breadth and depth – so no one category gets too big, but also so that you don’t have to click through endless folders to find a file.
- In some cases, it may be more helpful to use a tag-based system – where each file is assigned one or more tags, or labels. This makes it easier to have overlapping categories, and files can be categorised in multiple ways simultaneously (by subject, by author, and by the project it relates to, for example). Some modern operating systems will allow you to add tags to files; file tagging software is also available.
- It’s worth taking time every now and then to reassess your folder or tag structure, perhaps moving old, unused items to a folder called ‘Archive’ or something similar so they don’t clutter up the screen.
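To show how a tag-based system differs from folders, the sketch below keeps a small index in Python in which each file carries several tags at once, and then looks up files by any combination of tags. The file names, the tags, and the `find` helper are hypothetical; real tagging tools store this information in their own way.

```python
from collections import defaultdict

# Hypothetical files and tags, for illustration only.
file_tags = {
    "interview_01.txt": {"fieldwork", "smith", "project-a"},
    "interview_02.txt": {"fieldwork", "jones", "project-a"},
    "budget.xlsx": {"admin", "project-a"},
    "survey_results.csv": {"fieldwork", "smith", "project-b"},
}

# Invert the mapping so each tag points to the set of files that carry it.
tag_index = defaultdict(set)
for filename, tags in file_tags.items():
    for tag in tags:
        tag_index[tag].add(filename)


def find(*tags: str) -> list[str]:
    """Return the files that carry every one of the given tags, sorted by name."""
    if not tags:
        return sorted(file_tags)
    matches = set.intersection(*(tag_index.get(tag, set()) for tag in tags))
    return sorted(matches)


print(find("fieldwork", "smith"))  # files by one author from fieldwork
print(find("project-a"))           # everything belonging to one project
```

In a folder hierarchy each of these files could live in only one place, so you would have to choose between grouping by project, by author, or by activity. With tags, the same file can be found through any of them.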
What are documentation and metadata, and why should I consider creating them?
- Good documentation makes material understandable, verifiable, and reusable (by you or by others).
- It includes all the contextual information needed to help a future user interpret the data properly – for example, information about when, why, and by whom the data was created, what methods were used, and explanations of acronyms, coding, or jargon.
- It is good practice to begin documenting your data at the start of your research project and to continue to add information as the project progresses. You should also include procedures for documentation in your data planning activities.
- ‘Metadata’ is simply ‘data about data’. It is related to the broader contextual information that describes your data, but is usually more structured in that it conforms to set standards and is machine readable. One typical use of metadata is to create a catalogue record for a dataset held in an archive. By using a standard set of tags, an automatic system can tell where the title, creator, description and so forth each begin and end.
- The UK Data Archive has an excellent overview of this topic.
- Library staff are expert in data modelling and metadata for description, discovery and preservation. Please contact us for help.
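The idea of a standard set of tags can be sketched as follows. This example assumes the Dublin Core element set (one widely used metadata standard, chosen here purely for illustration; the right standard for your data will depend on your discipline and archive) and uses Python's built-in XML library to write a small record and read a field back out. The dataset details and the `metadata` wrapper element are invented.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set (assumed standard for this sketch).
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

# Hypothetical dataset description, for illustration only.
record = {
    "title": "Soil survey measurements",
    "creator": "J. Smith",
    "description": "pH and moisture readings from 40 sample sites.",
    "date": "2024-03-05",
}

# Wrap each value in its own named tag inside a simple container element.
root = ET.Element("metadata")
for field, value in record.items():
    ET.SubElement(root, f"{{{DC}}}{field}").text = value

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Because each value sits inside a named tag, a program can find it reliably.
parsed = ET.fromstring(xml_text)
print(parsed.find(f"{{{DC}}}title").text)
```

A free-text README could hold the same information, but a catalogue system would have no dependable way to tell which part is the title and which is the creator. The fixed tag names are what make the record machine readable.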
What is reference management software?
- Reference management software can be used to store details of all the articles, books, and other sources you make use of in your research, and to automatically generate citations in written work.
- You can also use reference management software to store copies of articles (usually as PDFs), and to record your own notes. Some software packages offer additional features, such as the ability to annotate PDFs.
- Popular reference managers include EndNote, RefWorks, Mendeley, Zotero, and Colwiz.
The Digital Curation Centre offers a list of tools and services for Managing Active Research Data.
JISC offers a collection of advice pages on Managing Digital Media.
Social Sciences: The UK Data Archive provides advice pages on Documenting Your Data and Formatting Your Data, aimed at researchers working in the social sciences and some humanities disciplines. Teaching materials for a classroom-based course are also available.
Archaeology: Module 3 of the DataTrain Archaeology teaching materials provides a guide to working with digital data (including file structure and documentation). The entire course Open Access Post-Graduate Teaching Materials in Managing Research Data in Archaeology will also be of interest.
Social Anthropology: Module 2 of the DataTrain Social Anthropology teaching materials (aimed at pre-fieldwork doctoral students) looks at documentation.