Washington University School of Medicine

FAIR data principles: The keys to data sharing — part 1 of 4: Findable

By Seonyoung Kim — October 26, 2022

Here, we will break down the details of the FAIR data principles in a four-part blog series: Part 1: Findable, Part 2: Accessible, Part 3: Interoperable, Part 4: Reusable.

So what are the FAIR data principles, and why are they important? The FAIR data principles emphasize machine-actionability: computational systems should be able to find, access, interoperate with, and reuse data with minimal human intervention. This matters because data is growing in volume, complexity, and creation speed faster than individual researchers can work with it effectively by hand.

The NIH and many funding organizations worldwide require that data generated from their funded research be managed and shared using the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. These principles were first introduced in 2016 with the publication of the FAIR Guiding Principles for Scientific Data Management and Stewardship in Scientific Data and further expanded through the GO FAIR Initiative.

Adopting FAIR data principles will make it easier for researchers to use computational tools to search, process and analyze large datasets. Standardization with FAIR principles is also crucial for repurposing datasets for secondary research purposes. Below is a quick overview of the FAIR data principles described in the NIH Strategic Plan for Data Science.

  • To be Findable, data must have unique persistent identifiers (e.g., a Digital Object Identifier, or DOI) to label it and make it searchable within a larger data structure
  • To be Accessible, data must be easily retrievable via open systems and include effective and secure authentication and authorization procedures
  • To be Interoperable, data should “use and speak the same language” using standardized vocabulary and data format
  • To be Reusable, data must be adequately described to a new user, have clear information about data-usage licenses, and have a traceable “owner’s manual” or provenance

Part 1: Findable

The key component that makes data findable online is metadata. Metadata is essentially “data about data,” or data documentation, and it needs to be clear and detailed. It describes the data’s content, format, and internal organization, which allows researchers to find, use, and cite your data set.

Metadata is required when data is deposited into a repository to be shared. However, it is best to document data from the beginning of the project through its completion to prevent important details from being lost or forgotten. Once data is deposited in a repository, it will often be assigned a DOI, a type of persistent unique identifier (PID) for research products, which is crucial for “findability”.

Globally unique and persistent identifiers often take the form of an internet link (e.g., a URL that resolves to a web page defining the concept, such as a particular human protein). Identifiers are essential to Open Science and will help others properly cite your work.
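To illustrate how an identifier doubles as a link, here is a minimal Python sketch that turns a DOI written in several common ways into its resolvable https://doi.org/ form. The function name `doi_to_url` and the validation pattern are assumptions made for this example; the pattern is a common heuristic for modern DOIs, not an official definition, and the sample DOI is the FAIR Guiding Principles paper listed under Readings.

```python
import re

# Heuristic for modern DOIs: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This is an illustrative check, not the formal DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly put in front of a bare DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(raw: str) -> str:
    """Return the https://doi.org/ link for a DOI (hypothetical helper)."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Does not look like a DOI: {raw!r}")
    return "https://doi.org/" + doi


# The same dataset or article can be cited consistently however the DOI was typed.
link = doi_to_url("doi:10.1038/sdata.2016.18")
```

Storing the full link form in your metadata means both people and software can follow it directly to the record.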

Below are the key elements that should be included to create rich metadata. An easy way to create metadata is to use an Excel spreadsheet as a collection template. Fill in the available information at the start of a project and update it as the project progresses. Keep this metadata file, along with the data itself, in the same folder.

  • Title: Name of the dataset or research project that generated it. If the dataset is part of a manuscript, it is recommended (and often required) to use the manuscript title to link the data with the publication
  • Creator/Author: Names and addresses of the organizations or people who created the data; the preferred format for personal names is surname first (e.g., Smith, Jane). Using ORCID iDs for authors is highly recommended
  • Funder: Funding agencies/organizations, including the Crossref Funder ID
  • Date: Key dates, including project start and end dates, release date, time period covered by the data and other dates related to the data lifespan, such as maintenance cycle and update schedule. The preferred format is the ISO 8601 standard (e.g., yyyy-mm-dd for a single date, or yyyy-mm-dd/yyyy-mm-dd for a range)
  • Description: Keywords or phrases describing the subject or content of the data
  • Place: Note the physical locations where data are collected (e.g., Washington University School of Medicine in St. Louis, MPRB Building, Room 10302, Molecular Microbiology Imaging Facility)
  • Method: Describe how the data were generated, listing equipment and software used (including model and/or version numbers), formulae, algorithms, experimental protocols, reagents, and other details that one might include in a lab notebook. RRIDs (Research Resource Identifiers) are recommended for citing key resources such as antibodies, model organisms, cell lines, plasmids, and other tools (e.g., software, databases, services). Protocol DOIs can be generated using protocols.io. In addition, LabArchives, the Electronic Lab Notebook (ELN) that WashU offers to researchers at no charge, recently added a protocols.io integration, making it easy to bring protocols into your LabArchives notebook
  • Processing: A description of how the data have been filtered and processed prior to analysis
  • Source: Citations to data derived from other sources, including details of where the source data is held and how it was accessed
  • File inventory: A list of all files associated with the project, including extensions (e.g., baseline_CDI.csv, readme.txt)
  • File structure: Directory URL where your datasets are located, along with a description of how data files are organized
  • Necessary software: List special-purpose software required to create, view, analyze, or otherwise use the data
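If you prefer to generate the collection template with a script rather than by hand, the elements above could be sketched in Python roughly as follows. This is only one possible layout: the two-column "Element, Value" structure, the function names, and the placeholder title are assumptions made for this example, not a repository requirement. The resulting CSV text opens directly in Excel.

```python
import csv
import io
from datetime import date

# The metadata elements from the list above, in the same order.
METADATA_ELEMENTS = [
    "Title", "Creator/Author", "Funder", "Date", "Description", "Place",
    "Method", "Processing", "Source", "File inventory", "File structure",
    "Necessary software",
]


def new_template() -> dict:
    """Return an empty record with one entry per metadata element."""
    return {element: "" for element in METADATA_ELEMENTS}


def template_to_csv(metadata: dict) -> str:
    """Render the record as two-column CSV text (assumed layout: Element, Value)."""
    unknown = set(metadata) - set(METADATA_ELEMENTS)
    if unknown:
        raise KeyError(f"Unknown metadata elements: {sorted(unknown)}")
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Element", "Value"])
    for element in METADATA_ELEMENTS:
        writer.writerow([element, metadata.get(element, "")])
    return buffer.getvalue()


# Fill in what is known at the start of the project; update the rest later.
record = new_template()
record["Title"] = "Example dataset title (placeholder)"
record["Creator/Author"] = "Smith, Jane"              # surname first
record["Date"] = date(2022, 10, 26).isoformat()       # ISO 8601: yyyy-mm-dd
record["File inventory"] = "baseline_CDI.csv; readme.txt"
csv_text = template_to_csv(record)
```

Saving `csv_text` to a file in the same folder as the data keeps the documentation and the dataset together, and unfilled elements stay visible as empty rows until you complete them.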

When you submit your datasets to a repository, you might be required to convert this metadata collection template to an open file format (e.g., .pdf or .txt) or transfer it into the repository’s metadata collection form. Once your dataset is deposited into a repository, your data will be assigned a DOI with rich metadata, allowing it to be findable online.
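As a rough illustration of converting a template to an open format, the Python sketch below turns spreadsheet contents into plain text suitable for a readme.txt file. It assumes the spreadsheet was exported as CSV with two columns headed "Element" and "Value"; that layout, the function name `csv_to_readme`, and the sample values are all hypothetical, so adapt them to however your own template is organized.

```python
import csv
import io


def csv_to_readme(csv_text: str) -> str:
    """Convert two-column CSV text (Element, Value) into 'Element: Value' lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        element = (row.get("Element") or "").strip()
        value = (row.get("Value") or "").strip()
        if not element:
            continue  # skip blank rows left in the spreadsheet
        # Flag gaps explicitly so they are noticed before deposit.
        lines.append(f"{element}: {value or '(not recorded)'}")
    return "\n".join(lines) + "\n"


# Sample export with placeholder values; the Processing row is still empty.
sample = (
    "Element,Value\n"
    "Title,Example dataset title (placeholder)\n"
    'Creator/Author,"Smith, Jane"\n'
    "Date,2022-10-26\n"
    "Processing,\n"
)
readme_text = csv_to_readme(sample)
```

Writing `readme_text` to readme.txt gives a file that any text editor can open, independent of spreadsheet software.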

For more details about the FAIRification process for Findability (F1-F4), please visit the GO FAIR website. In my next blog post, I will review the Accessible principle of the FAIR data principles, so please stay tuned!

Resources

  • Go FAIR: FAIR Data Principles
  • The NIH Strategic Plan for Data Science
  • DataCite metadata standard

Readings

  • Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). DOI: https://doi.org/10.1038/sdata.2016.18
  • Gould, Maria. “People, places, and things: Persistent identifiers in the scholarly communication landscape.” College & Research Libraries News [Online], 83.9 (2022): 398. Web. 17 Oct. 2022. DOI: https://doi.org/10.5860/crln.83.9.398

Posted in Mastering Information, Science and Informatics
Tagged FAIR Data, FAIR Data Principles, Findable, Metadata, NIH Data Management and Sharing Policy, Open Science, Persistent Unique Identifier (PID)

    © 2023 Washington University in St. Louis