This is part 2 of a four-part blog series on FAIR data principles (part 1: Findable, part 2: Accessible, part 3: Interoperable, part 4: Reusable). Part 1 reviewed how metadata makes data findable; this part discusses how to make data accessible.
Part 2: Accessible
Accessibility rests on three principles:
- Data should be retrievable by their identifier using a standardized communication protocol that is open, free, and universally available.
To maximize data reuse, the protocol should be free and open. Most internet users retrieve data by clicking a link, which loads the data in their web browser. FAIR data retrieval should not require specialized or proprietary tools or communication methods. Commonly used open protocols include HTTP, FTP, and SMTP, which let anyone with a computer and an internet connection retrieve data from anywhere in the world. Non-compliant examples are Skype and the Microsoft Exchange Server protocol, which are proprietary and therefore not universally implementable. For highly sensitive data, it is perfectly FAIR to use a contact protocol, which provides the email address and telephone number of a contact person who can grant access to the data. Contact protocols should be clearly captured in the metadata.
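As a rough illustration of retrieval by identifier over an open protocol, the sketch below turns a DOI into an HTTPS URL at the public doi.org resolver and fetches it with Python's standard library only. The function names `doi_to_url` and `fetch`, and the `User-Agent` string, are made up for this example; a given repository may offer its own download routes.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver, reachable over plain HTTPS


def doi_to_url(doi: str) -> str:
    """Turn a DOI into a resolvable HTTPS URL (illustrative helper)."""
    doi = doi.strip()
    # Accept DOIs pasted with a "doi:" prefix or as a full resolver URL.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return DOI_RESOLVER + quote(doi, safe="/")


def fetch(doi: str, timeout: float = 10.0) -> bytes:
    """Retrieve whatever the identifier resolves to, using only HTTP(S)."""
    request = Request(doi_to_url(doi), headers={"User-Agent": "fair-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

For example, `doi_to_url("10.1038/sdata.2016.18")` gives the link to the FAIR paper listed under Readings. No proprietary client is involved at any step, which is the point of the principle.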
- The communication protocol allows for an authentication and authorization procedure where necessary.
The “A” in FAIR implies that one should state the exact conditions under which the data are accessible, so that even protected and private data can be FAIR. Accessibility therefore means that computer software must be able to understand the access requirements and either fulfill them automatically or alert the user. For restricted-access data, a repository typically asks users to create an account so that it can authenticate them and enforce the specific data use agreement set by the dataset owner.
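One way software can "understand the access requirements and alert the user" is by reading the standard HTTP status codes a repository returns. The sketch below is only an assumption about how a client might phrase those alerts; the function name `describe_access` and the wording of each message are invented for illustration, and real repositories may signal data use agreements differently.

```python
from http import HTTPStatus


def describe_access(status: int) -> str:
    """Translate an HTTP status code into a message a user can act on."""
    if status == HTTPStatus.OK:
        return "open: data can be downloaded"
    if status == HTTPStatus.UNAUTHORIZED:  # 401: authentication is required
        return "restricted: sign in to the repository first"
    if status == HTTPStatus.FORBIDDEN:  # 403: request understood but not permitted
        # Assumption: the repository refuses until a data use agreement is in place.
        return "restricted: signed in, but access has not been granted"
    if status in (HTTPStatus.NOT_FOUND, HTTPStatus.GONE):  # 404 / 410
        return "unavailable: check the metadata record for a contact"
    return f"unexpected response ({status}): retry or contact the repository"
```

The useful distinction is between 401 (who are you?) and 403 (you are known, but not allowed), which correspond to authentication and authorization respectively.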
- Metadata should be accessible even when the data are no longer available.
Maintaining data online has associated costs, and over time, old datasets can degrade or disappear, leaving broken links behind. Storing metadata, however, is generally easier and cheaper, so the metadata should persist even after the data are no longer maintained. That way, the authors, institutions, or publications associated with the original research can still be identified and contacted.
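To make the idea concrete, here is a minimal sketch of a record whose metadata outlives its data files. The `DatasetRecord` class, its field names, and the `landing_page` function are hypothetical and do not follow any particular repository's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    # Illustrative fields only; a real record would follow a metadata standard.
    identifier: str
    title: str
    contact: str
    data_url: Optional[str]  # None once the data files are withdrawn


def landing_page(record: DatasetRecord) -> dict:
    """What the identifier should still resolve to, with or without the data."""
    page = {
        "identifier": record.identifier,
        "title": record.title,
        "contact": record.contact,
    }
    if record.data_url is None:
        # The data are gone, but the description and contact remain.
        page["status"] = "data no longer available; contact the listed person"
    else:
        page["status"] = "available"
        page["data_url"] = record.data_url
    return page
```

When `data_url` is `None`, the landing page still returns the title and contact, so a reader who follows an old link learns what the dataset was and whom to ask about it.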
Selecting a data repository that has the desirable characteristics recommended by NIH is an important step toward meeting these accessibility principles. For example, Digital Commons@Becker meets the recommended criteria and is available at no charge to all researchers at WashU School of Medicine.
For more details about the FAIRification process for Accessibility (A1, A1.1, A1.2, and A2), please visit the Go FAIR website. Stay tuned for Part 3 of this blog series, where I will review the Interoperability aspect of FAIR Data Principles.
Resources
- FAIR data principles: The keys to data sharing – part 1 of 4: Findable
- Go FAIR: FAIR data principles
- The NIH strategic plan for data science
- DataCite metadata standard
Readings
- Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). DOI: https://doi.org/10.1038/sdata.2016.18