This is part 4 of a four-part blog series on the FAIR data principles (Part 1: Findable, Part 2: Accessible, Part 3: Interoperable, Part 4: Reusable). Part 1 covered making data findable, Part 2 making it accessible, and Part 3 making it interoperable. In this post, we discuss how to make data reusable, which is the ultimate goal of the FAIR data principles.
Part 4: Reusable
To optimize the reuse of data, metadata and data must be well described so that they can be replicated and/or combined in different settings. There are four principles that will help accomplish this goal. We will use the phrase ‘(meta)data’ in cases where each principle should be applied to both metadata and data.
- (Meta)data are richly described with a plurality of accurate and relevant attributes
This principle focuses on a ‘plurality’ of metadata: provide as much information as possible so that others can judge whether the data is useful in their particular context. Some ways to enrich your (meta)data include:
- Describe the scope of your data: for what purpose was it generated/collected?
- Mention any particularities or limitations about the data that other users should be aware of.
- Describe the level of aggregation (e.g., individual, aggregated, summarized), and the degree of data processing that has occurred (i.e., raw or processed).
- Specify the equipment and the software (name and version) used to collect/analyze data.
- Ensure that all variable names are explained (e.g., in a data dictionary) or self-explanatory (i.e., defined in the research field’s controlled vocabulary).
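As a rough illustration of the checklist above, the sketch below stores descriptive metadata next to a data dictionary and reports any variable that has no explanation. Every field name and value here (for example `aggregation_level` and `ExampleSurveyTool`) is a hypothetical placeholder, not part of any standard.

```python
# Hypothetical metadata record; all field names and values are illustrative.
dataset_metadata = {
    "purpose": "Survey collected to study commuting habits",
    "limitations": "Self-reported values; adults only",
    "aggregation_level": "individual",   # individual, aggregated, or summarized
    "processing_level": "processed",     # raw or processed
    "software": [{"name": "ExampleSurveyTool", "version": "2.1"}],
    "data_dictionary": {
        "resp_id": "Anonymous respondent identifier",
        "commute_min": "One-way commute time in minutes",
    },
}


def undocumented_variables(columns, metadata):
    """Return the column names that have no entry in the data dictionary."""
    dictionary = metadata.get("data_dictionary", {})
    return [name for name in columns if name not in dictionary]


columns = ["resp_id", "commute_min", "mode"]
missing = undocumented_variables(columns, dataset_metadata)
print(missing)  # ['mode']
```

A check like this could run before each release so that a new column never ships without a description.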
- (Meta)data are released with a clear and accessible data usage license
This principle is about legal interoperability. Describe usage rights clearly, since ambiguity can severely limit the reuse of your data. The licensing status and the conditions under which the data can be used should be clear to both human users and machines (e.g., automated searches that filter on license). Commonly used licenses such as MIT or Creative Commons can be linked to your data. The DTL FAIRifier provides methods for marking up this metadata.
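One way to make a license readable by machines is to record it as a URL rather than as free text. The sketch below assumes the schema.org `Dataset` type and its `license` property, serialized as JSON-LD. The dataset name and the helper `with_license` are hypothetical, and other vocabularies could serve equally well.

```python
import json

# Canonical URL of the Creative Commons Attribution 4.0 license.
CC_BY_4 = "https://creativecommons.org/licenses/by/4.0/"


def with_license(record, license_url):
    """Return a copy of the record that carries a license URL."""
    if not license_url.startswith(("http://", "https://")):
        raise ValueError("license should be a URL, not free text")
    updated = dict(record)
    updated["license"] = license_url
    return updated


record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example commuting survey",  # hypothetical dataset name
}
licensed = with_license(record, CC_BY_4)
print(json.dumps(licensed, indent=2))
```

Because the license is a URL, an automated search can compare it against a list of acceptable licenses without having to parse prose.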
- (Meta)data are associated with detailed provenance
For others to reuse your data, they should know where the data came from, whom to cite and how you wish to be acknowledged. Please include a description of the workflow that led to your data, how it has been processed, whether it has been published before, and whether it contains data from another source (secondary analysis). This information should be described in a machine-readable format.
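As one possible machine-readable form, provenance can be written with terms from the W3C PROV vocabulary (`prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:wasAttributedTo`). The sketch below builds a small JSON-LD record. The helper `provenance_record` and all `example.org` identifiers are hypothetical, and a real record would usually carry more detail about each processing step.

```python
import json


def provenance_record(dataset_id, source_id, activity_id, agent_id):
    """Build a small JSON-LD provenance description using W3C PROV terms."""
    return {
        "@context": {"prov": "http://www.w3.org/ns/prov#"},
        "@id": dataset_id,
        "@type": "prov:Entity",
        # The earlier data set this one was derived from (secondary analysis).
        "prov:wasDerivedFrom": {"@id": source_id},
        # The workflow or processing step that produced this data set.
        "prov:wasGeneratedBy": {"@id": activity_id},
        # The person or organization to cite or acknowledge.
        "prov:wasAttributedTo": {"@id": agent_id},
    }


# All identifiers below are placeholders.
prov = provenance_record(
    dataset_id="https://example.org/dataset/cleaned-v2",
    source_id="https://example.org/dataset/raw-v1",
    activity_id="https://example.org/workflow/cleaning-run-7",
    agent_id="https://example.org/people/data-steward",
)
print(json.dumps(prov, indent=2))
```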
- (Meta)data meet domain-relevant community standards
It is easier to reuse (e.g., compare or combine) data sets if they are organized in a standardized way or in a standard template, use sustainable open file formats and a common vocabulary, and come with documentation (metadata). If community standards (e.g., common data elements and data standards) or best practices for data archiving and sharing exist in a field, they should be followed.
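Following a community template can be partly automated by checking each record against the fields the template requires. In the sketch below, the required-field list is an assumption loosely modeled on the mandatory properties of the DataCite metadata schema. The snake_case key names and the sample record (including its DOI) are hypothetical, so consult the current schema of your own community before relying on such a list.

```python
# Assumption: field list loosely modeled on DataCite's mandatory properties;
# the key names themselves are hypothetical.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in required if not record.get(field)]


# Hypothetical record that forgot to name a publisher.
record = {
    "identifier": "10.1234/example",
    "creators": ["Example Lab"],
    "titles": ["Example commuting survey"],
    "publication_year": 2024,
    "resource_type": "Dataset",
}
gaps = missing_fields(record)
print(gaps)  # ['publisher']
```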
For more details about the FAIRification process for Reusability (R1 and its sub-principles R1.1-R1.3), please visit the GO FAIR website.
This concludes the four-part blog series on the FAIR data principles. Good data management and stewardship using the FAIR data principles will lead to high-quality digital publications that facilitate and simplify the ongoing process of discovery, integration, and reuse in secondary studies, thus maximizing their digital publishing value.
Resources
- FAIR data principles: The keys to data sharing – part 1 of 4: Findable
- FAIR data principles: The keys to data sharing – part 2 of 4: Accessible
- FAIR data principles: The keys to data sharing – part 3 of 4: Interoperable
- GO FAIR: FAIR data principles
- The NIH Strategic Plan for data science
- DataCite metadata standard
- Creative Commons license RDF
- DTL data FAIRPort: Find FAIR data tools
Readings
- Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). DOI: https://doi.org/10.1038/sdata.2016.18