This is Part 3 of a four-part blog series on the FAIR data principles (Part 1: Findable, Part 2: Accessible, Part 3: Interoperable, Part 4: Reusable). We covered making data findable using metadata in the Part 1 blog and making data accessible in the Part 2 blog. In this blog, we will discuss how to make data interoperable.
Part 3: Interoperable
Interoperability is defined as “the ability of data or tools from non-cooperating resources to integrate or work together with minimal effort” in the 2016 FAIR Guiding Principles publication. Data need to be interoperable so they can be integrated into larger data sets and processed by various applications or workflows for analysis and storage. The key to making this happen can be broken down into the following three principles. We will use ‘(meta)data’ in cases where a principle applies to both metadata and data.
- (Meta)data use a formal, accessible, shared and broadly applicable language for knowledge representation
A critical aspect of FAIR data is that both humans and computers can exchange and interpret each other’s data without requiring specialized programs, translators or mappings. For computers, interoperability means that each system knows the other systems’ data exchange formats. For this to happen, and to ensure automatic findability and interoperability of datasets, it is crucial to use open file formats (e.g., JPEG, MP4, CSV, TXT, PDF, HTML) along with commonly used controlled vocabularies, ontologies and thesauri. Wikipedia provides an extensive list of open file formats; convert your datasets into open file formats before you submit them to a repository. In addition, using common data elements and data standards helps accomplish this goal.
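As a small illustration, the following Python sketch converts a proprietary spreadsheet into CSV, an open text format, before deposit. It assumes pandas (with the openpyxl engine) is installed, and the file names are hypothetical placeholders:

```python
# Sketch: converting a proprietary spreadsheet to an open, interoperable format.
import pandas as pd

# Read the proprietary file (.xlsx reading requires the openpyxl engine).
df = pd.read_excel("measurements.xlsx")

# Normalize column names so they match a shared data dictionary:
# lowercase, underscores instead of spaces.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Write CSV with an explicit encoding; any tool or workflow can parse this.
df.to_csv("measurements.csv", index=False, encoding="utf-8")
```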
- (Meta)data use vocabularies that follow FAIR data principles
The controlled vocabulary that describes datasets needs to be documented and resolvable using globally unique and persistent identifiers (PIDs) such as DOIs (Digital Object Identifiers). This documentation needs to be easy to find and access for anyone who uses the datasets. A good example is the set of supplementary files, such as a data dictionary and a readme file, that are required when you submit datasets to a repository. These files become part of the record that the dataset’s DOI resolves to and can be downloaded along with the dataset. One way to support interoperability is to use the FAIR Data Point (FDP), a metadata service that provides access to metadata following the FAIR data principles.
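To make this concrete, here is a minimal Python sketch that uses rdflib to describe a dataset with the DCAT vocabulary, the kind of machine-readable metadata a FAIR Data Point serves. The DOI, title and documentation URL are hypothetical placeholders:

```python
# Sketch: FAIR-aligned, machine-readable dataset metadata in RDF/DCAT.
# Requires rdflib (pip install rdflib); all values below are placeholders.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
dataset = URIRef("https://doi.org/10.1234/example-dataset")  # resolvable PID

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Example survey dataset")))
g.add((dataset, DCTERMS.identifier, Literal("10.1234/example-dataset")))
# Link to the data dictionary documenting the controlled vocabulary used.
g.add((dataset, DCTERMS.conformsTo, URIRef("https://example.org/data-dictionary")))

# Serialize as Turtle, a plain-text RDF format any FAIR-aware client can read.
print(g.serialize(format="turtle"))  # returns a str in rdflib 6+
```

Because the description is plain RDF built from shared vocabularies, other systems can interpret it without custom translators or mappings.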
- (Meta)data include qualified references to other (meta)data
A qualified reference is a cross-reference that explains its intent. For example, you should specify whether one dataset builds on another, whether additional datasets are needed to complete the data, or whether complementary information is stored in a different dataset. The goal is to create meaningful links between (meta)data resources that enrich the contextual knowledge of the data. In particular, the scientific links between datasets need to be described, and all datasets must be appropriately cited, including their globally unique and persistent identifiers (PIDs).
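One widely used way to record such links is the relatedIdentifiers element of the DataCite metadata schema (listed in the resources below). The Python sketch here builds a DataCite-style record; the DOIs are hypothetical placeholders, while relation types such as IsDerivedFrom and IsSupplementTo come from the DataCite schema:

```python
# Sketch: qualified references between datasets via DataCite-style
# relatedIdentifiers. All DOIs below are hypothetical placeholders.
import json

metadata = {
    "doi": "10.1234/derived-dataset",
    "titles": [{"title": "Derived analysis dataset"}],
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.1234/source-dataset",
            "relatedIdentifierType": "DOI",
            "relationType": "IsDerivedFrom",  # this dataset builds on the source
        },
        {
            "relatedIdentifier": "10.1234/companion-dataset",
            "relatedIdentifierType": "DOI",
            "relationType": "IsSupplementTo",  # complementary information lives here
        },
    ],
}

print(json.dumps(metadata, indent=2))
```

Each reference pairs a persistent identifier with an explicit relation type, so both humans and machines can tell why the linked dataset matters.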
For more details about the FAIRification process for Interoperability (I1-I3), please visit the GO FAIR website. Stay tuned for the last part of this blog series (Part 4), where I will review the Reusable aspect of the FAIR data principles, the ultimate goal of FAIR data!
Resources
- FAIR data principles: The keys to data sharing – part 1 of 4: Findable
- FAIR data principles: The keys to data sharing – part 2 of 4: Accessible
- GO FAIR: FAIR data principles
- The NIH strategic plan for data science
- DataCite metadata standard
- FAIR Data Point (FDP)
- Resource Description Framework (RDF)
Readings
- Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18