Sample Exchange Blog

Insights, research, and innovations in the world of scientific sample sharing and open science. Stay updated on the latest developments in sample tracking, attribution, and collaborative research.


Revolutionizing Sample Science: How Better Tracking Can Unlock Hidden Research Potential

Opening Doors to a New Era of Sample Discovery and Attribution


Workflow diagram showing the integration of persistent identifiers in sample tracking and research attribution. Source: Damerow et al., Scientific Data (2025)

Imagine a world where every rock sample collected from a remote mountain, every soil core extracted from a wetland, and every water sample drawn from a river could tell its complete story—not just where it came from, but every analysis performed, every dataset created, and every scientific discovery it enabled. This vision is closer to reality thanks to groundbreaking new guidelines from the Earth Science Information Partners (ESIP) Physical Samples Curation Cluster.

The Hidden Crisis in Sample Science

Physical samples are the unsung heroes of scientific research. From understanding climate change through ice cores to discovering new medicines from soil microbes, samples drive innovation across disciplines. Yet, we're facing a critical problem: most samples disappear into a "black hole" after collection.

Consider this: the University of Michigan Museum of Zoology manages over 150,000 specimens. When researchers tried to track how these specimens were used in publications, they found that only 245 of 1,297 papers, fewer than one in five, properly cited the specimens. In the rest, the link between publication and specimen was lost to poor documentation practices, making it impossible to measure the true impact of these invaluable collections.

Why This Matters for Open Science

The new ESIP guidelines address four critical challenges:

1. Enabling Large-Scale Sample Discovery

When researchers can efficiently cite hundreds or thousands of samples in their work, we can finally see the big picture of how samples contribute to science. The IEDA2 system already links over 34,000 unique samples to datasets—imagine this scaled globally!

2. Giving Credit Where It's Due

Collection managers, field researchers, and laboratories invest enormous resources in sample collection and curation. Proper tracking means they can finally demonstrate their impact to funders and institutions, ensuring continued support for these critical resources.

3. Tracking the Data Journey

A single sample might generate genomic data at one lab, chemical analyses at another, and ecological observations in the field. Current practices often lose these connections, but with persistent identifiers (PIDs) we can follow a sample's complete scientific journey.
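To make the "data journey" concrete, here is a minimal Python sketch that treats it as a set of links from datasets back to sample PIDs. It assumes each dataset record simply lists the PIDs of the samples it used; the `Dataset` class, the `datasets_for` helper, the sample IDs, and the placeholder DOIs are all hypothetical and are not drawn from any particular registry.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset produced from one or more physical samples (hypothetical model)."""
    doi: str            # placeholder DOI, not a registered identifier
    kind: str           # e.g. "genomic", "geochemistry", "field observation"
    producer: str       # lab or team that produced the dataset
    sample_pids: list[str] = field(default_factory=list)


def datasets_for(sample_pid: str, catalog: list[Dataset]) -> list[Dataset]:
    """Return every dataset in the catalog that lists the given sample PID."""
    return [d for d in catalog if sample_pid in d.sample_pids]


# Hypothetical identifiers, for illustration only.
catalog = [
    Dataset("10.0000/example-genomics", "genomic", "Lab A", ["SAMPLE-001"]),
    Dataset("10.0000/example-chem", "geochemistry", "Lab B", ["SAMPLE-001", "SAMPLE-002"]),
    Dataset("10.0000/example-field", "field observation", "Field team", ["SAMPLE-002"]),
]

# Follow one sample's journey across labs.
for d in datasets_for("SAMPLE-001", catalog):
    print(d.kind, d.producer, d.doi)
```

Because the link is keyed on the PID rather than on a free-text sample name, the same lookup works no matter which lab produced the dataset.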

4. Breaking Down Disciplinary Silos

Environmental samples often cross disciplines—a soil sample might be relevant to microbiologists, geochemists, and climate scientists. Better tracking enables unexpected discoveries at these intersections.

The Power of Persistent Identifiers

The solution centers on PIDs: globally unique identifiers that do for physical samples what DOIs do for publications. Just as a DOI lets you reliably find a paper, a PID keeps a sample findable and citable over the long term. More than 12.5 million IGSN IDs (International Generic Sample Numbers, a PID scheme designed for physical samples) have already been created, showing the momentum behind this movement.
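A PID only helps if software can recognize and resolve it. The sketch below assumes a DOI-based sample PID (IGSN IDs registered through DataCite take this form, but other sample PID schemes may not); the helper names are hypothetical, and the regular expression is a common heuristic for modern DOIs rather than a complete DOI grammar. The example input is the DOI of the paper cited at the end of this article, used only to show the format.

```python
import re

# Heuristic pattern for modern DOIs (case-insensitive). It covers most,
# but not all, DOIs ever registered.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Common ways a DOI is written; stripped so that equivalent forms compare equal.
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalize_doi(pid: str) -> str:
    """Strip a resolver or 'doi:' prefix, leaving the bare '10.xxxx/...' string."""
    pid = pid.strip()
    for prefix in PREFIXES:
        if pid.lower().startswith(prefix):
            return pid[len(prefix):]
    return pid


def resolver_url(pid: str) -> str:
    """Return a resolvable https://doi.org/ URL for a DOI-style PID, or raise ValueError."""
    doi = normalize_doi(pid)
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI-style identifier: {pid!r}")
    return f"https://doi.org/{doi}"


print(resolver_url("doi:10.1038/s41597-025-05295-z"))
```

Normalizing before storing means "doi:10.x/y" and "https://doi.org/10.x/y" are treated as the same identifier when datasets and papers are linked.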

What This Means for Researchers

The guidelines provide practical steps any researcher can implement today:

  • Describe samples with rich metadata using community standards
  • Assign or use existing PIDs for all samples
  • Include PIDs in datasets and publications
  • Cite samples properly in papers, making them discoverable
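The four steps above can be sketched as a small Python record type that carries its PID alongside descriptive metadata, plus a check for samples that cannot be cited yet. The field names, the citation layout, and the placeholder DOI-style PID are illustrative assumptions, not a format prescribed by the ESIP guidelines or by any community metadata standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    """Minimal sample metadata; field names are illustrative, not a formal standard."""
    pid: str               # assumed DOI-style PID; empty string if none assigned yet
    name: str
    material: str
    collector: str
    collection_date: str   # ISO 8601, e.g. "2023-07-14"
    latitude: float
    longitude: float


def cite(sample: SampleRecord) -> str:
    """Format a human-readable citation that carries the sample's PID (hypothetical layout)."""
    return (f"{sample.collector} ({sample.collection_date[:4]}). {sample.name}, "
            f"{sample.material}. https://doi.org/{sample.pid}")


def missing_pids(samples: list[SampleRecord]) -> list[str]:
    """Names of samples that cannot be cited yet because they lack a PID."""
    return [s.name for s in samples if not s.pid]


# Hypothetical sample; the PID is a placeholder, not a registered identifier.
core = SampleRecord("10.0000/EXAMPLE-CORE-01", "Wetland core WC-01", "sediment",
                    "J. Doe", "2023-07-14", 42.28, -83.74)
print(cite(core))
```

Running `missing_pids` over a dataset's sample list before publication is one simple way to catch samples that still need a PID.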

Building Tomorrow's Research Infrastructure

This isn't just about better bookkeeping—it's about transforming how we do science. When samples are properly tracked:

  • Synthesis studies become possible across massive scales (think GBIF's 3.1 billion species occurrence records)
  • Interdisciplinary collaborations flourish as researchers discover relevant samples from other fields
  • Irreplaceable samples (from sites that no longer exist or can't be resampled) become accessible to global research
  • Research reproducibility improves as others can access the exact samples used in studies

Join the Movement

The future of sample-based research is collaborative, transparent, and interconnected. Whether you're a researcher planning field work, a collection manager preserving specimens, or a data scientist building the next generation of research tools, you have a role to play.

By adopting these practices, we're not just organizing data—we're unlocking the full potential of every sample collected, ensuring that the effort invested in field work continues to generate discoveries for generations to come.

Ready to transform your sample management? Start by exploring the full guidelines and join the growing community building the future of open sample science.

The physical samples we collect today are the scientific treasures of tomorrow. Let's make sure their stories are never lost.

Research Citation

Damerow, J.E., Raia, N.H., Stanley, V. et al. Opening doors to physical sample tracking and attribution in Earth and environmental sciences. Sci Data 12, 1047 (2025). https://doi.org/10.1038/s41597-025-05295-z

Received: 13 May 2024
Accepted: 29 May 2025
Published: 20 June 2025

From Chaos to Clarity: How BioSamples is Revolutionizing Biological Data Management with 18 Million Samples and Counting

Making Sample Metadata FAIR at Scale Through Machine Learning and Community Standards


BioSamples database architecture showing the integration of machine learning, community standards, and multi-omics data management. Source: Courtot et al., Nucleic Acids Research (2022)

In the world of biological research, we're drowning in data—but not in the good way. Imagine having 18 million biological samples scattered across databases worldwide, each described with different terms, formats, and standards. Now imagine trying to find all COVID-19 samples, or all plant samples with both genetic and physical trait data. Until recently, this was like searching for needles in a haystack while blindfolded.

Enter the BioSamples database at EMBL-EBI, which has grown from a simple metadata repository into a sophisticated FAIR (Findable, Accessible, Interoperable, Reusable) data powerhouse that is revolutionizing how we manage, discover, and connect biological samples across the globe.

The Scale of the Challenge

The numbers tell a compelling story:

  • 18 million samples (more than tripled from 5 million in just 3 years)
  • 50,000+ unique attributes describing samples
  • 60 million unique attribute values
  • 24 different ways to record longitude alone!

This explosion of heterogeneous data reflects the beautiful diversity of life sciences research—but it also creates a massive integration headache. How do you harmonize data when one researcher calls it "collection date" and another types "collection timestamp" or "collection time"?
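One common way to tame this is a curated synonym table that maps variant attribute names onto a single canonical name. The sketch below shows that technique in plain Python; the table entries and function names are hypothetical, and deciding which names are truly equivalent (a timestamp is more precise than a date, for instance) remains a curation judgment rather than something code can settle.

```python
# Hypothetical curated synonym table: canonical name -> variants seen in submissions.
CANONICAL_ATTRIBUTES = {
    "collection date": ["collection date", "collection timestamp", "collection time",
                        "date of collection"],
    "longitude": ["longitude", "long", "lon", "lng"],
}


def normalize_key(name: str) -> str:
    """Lower-case, turn underscores into spaces, and collapse runs of whitespace."""
    return " ".join(name.replace("_", " ").lower().split())


# Invert the table once so each lookup is a single dictionary access.
SYNONYM_TO_CANONICAL = {
    normalize_key(syn): canon
    for canon, synonyms in CANONICAL_ATTRIBUTES.items()
    for syn in synonyms
}


def harmonize(record: dict[str, str]) -> dict[str, str]:
    """Rename known synonyms to canonical names; unknown names pass through normalized.

    If two input keys map to the same canonical name, the first value is kept.
    """
    out: dict[str, str] = {}
    for key, value in record.items():
        norm = normalize_key(key)
        out.setdefault(SYNONYM_TO_CANONICAL.get(norm, norm), value)
    return out


print(harmonize({"Collection_Timestamp": "2021-03-02", "Lng": "-0.12", "host": "Homo sapiens"}))
# {'collection date': '2021-03-02', 'longitude': '-0.12', 'host': 'Homo sapiens'}
```

Keeping the first value on a collision is a conservative choice; a real curation pipeline would more likely flag the conflict for review.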

Three Layers of Innovation

BioSamples has developed a brilliant "layer cake" approach to tackle this complexity:

Layer 1: Automated FAIRification

Using machine learning and text mining, BioSamples automatically:

  • Identifies and merges similar attributes (reducing the attribute space by 10%)
  • Suggests corrections for common mistakes
  • Validates samples against community standards
  • Adds ontology annotations for better searchability
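BioSamples does this with machine learning and text mining, which this post does not detail. As a much simpler stand-in, the sketch below uses string similarity from Python's standard-library `difflib` to suggest a correction for a misspelled attribute name. The accepted vocabulary and the 0.8 similarity cutoff are arbitrary assumptions for illustration, and this is not BioSamples' algorithm.

```python
import difflib

# A small, hypothetical controlled vocabulary of accepted attribute names.
ACCEPTED = ["organism", "collection date", "geographic location", "sample type", "tissue"]


def suggest(attribute: str, vocabulary: list[str] = ACCEPTED, cutoff: float = 0.8):
    """Return the closest accepted attribute name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(attribute.lower().strip(), vocabulary,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest("organsim"))    # organism
print(suggest("SampleType"))  # sample type
print(suggest("colour"))      # None
```

A high cutoff keeps false "corrections" rare at the cost of missing some genuine typos; suggestions like these are best shown to the submitter rather than applied silently.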

One stunning example: after curation, searches for "sample type" yielded 25% more results—that's 180,000 additional samples made discoverable!
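The two figures also imply the size of that search. Reading "25% more" as an increase over the pre-curation result count (an interpretation, since the baseline is not stated here):

```latex
N_{\text{before}} = \frac{180\,000}{0.25} = 720\,000,
\qquad
N_{\text{after}} = N_{\text{before}} + 180\,000 = 900\,000.
```

That is, roughly 720,000 matching samples before curation and about 900,000 after.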

Layer 2: Data Management Infrastructure

BioSamples acts as a central hub, allowing researchers to:

  • Register samples before experiments (capturing metadata that might otherwise be lost)
  • Link samples across multiple archives
  • Track relationships between parent samples and subsamples
  • Maintain consistent identifiers across time and institutions
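Parent and subsample tracking amounts to recording a "derived from" link between sample records and walking those links. The sketch below models only that idea with a plain dictionary; the accessions and helper names are hypothetical, and this is not the BioSamples data model or API.

```python
# Hypothetical "derived from" links: child accession -> parent accession.
DERIVED_FROM = {
    "TISSUE-01": "ORGANISM-01",
    "DNA-01": "TISSUE-01",
    "RNA-01": "TISSUE-01",
}


def lineage(accession: str, parents: dict[str, str]) -> list[str]:
    """Walk 'derived from' links up to the original sample, failing on cycles."""
    chain = [accession]
    seen = {accession}
    while chain[-1] in parents:
        parent = parents[chain[-1]]
        if parent in seen:
            raise ValueError(f"cycle detected at {parent}")
        chain.append(parent)
        seen.add(parent)
    return chain


def subsamples(accession: str, parents: dict[str, str]) -> list[str]:
    """Direct children of a sample, i.e. samples recorded as derived from it."""
    return sorted(child for child, parent in parents.items() if parent == accession)


print(lineage("DNA-01", DERIVED_FROM))        # ['DNA-01', 'TISSUE-01', 'ORGANISM-01']
print(subsamples("TISSUE-01", DERIVED_FROM))  # ['DNA-01', 'RNA-01']
```

Storing only the child-to-parent link keeps each record simple, while both directions of the tree remain queryable.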

Layer 3: Multi-Omics Integration

At the highest level, BioSamples enables complex queries across archives. Imagine finding "COVID-19 pathogen samples in ENA associated with human samples in EGA," connecting viral data in the European Nucleotide Archive (ENA) with patient data in the controlled-access European Genome-phenome Archive (EGA) while respecting privacy controls.
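Conceptually, such a query is a join across archives on recorded sample relationships, filtered by what the requester is authorized to see. The sketch below uses in-memory stand-ins; the record types, fields such as `consent_group`, and the simple group-based access rule are hypothetical simplifications, not how ENA, EGA, or BioSamples actually implement queries or access control.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class PathogenSample:
    """Stand-in for an openly available pathogen sample record (hypothetical fields)."""
    accession: str
    organism: str
    host_sample: str      # accession of the human sample the pathogen came from


@dataclass
class HumanSample:
    """Stand-in for a controlled-access human sample record (hypothetical fields)."""
    accession: str
    consent_group: str    # access group allowed to see this record


def linked_pairs(pathogens: Iterable[PathogenSample], humans: Iterable[HumanSample],
                 organism: str, granted_groups: set[str]) -> Iterator[tuple[str, str]]:
    """Yield (pathogen, host) accession pairs the requester is allowed to see."""
    hosts = {h.accession: h for h in humans}
    for p in pathogens:
        host = hosts.get(p.host_sample)
        if p.organism == organism and host is not None and host.consent_group in granted_groups:
            yield p.accession, host.accession


pathogens = [
    PathogenSample("VIR-1", "SARS-CoV-2", "HUM-1"),
    PathogenSample("VIR-2", "SARS-CoV-2", "HUM-2"),
    PathogenSample("VIR-3", "Influenza A virus", "HUM-1"),
]
humans = [HumanSample("HUM-1", "research-use"), HumanSample("HUM-2", "restricted")]
print(list(linked_pairs(pathogens, humans, "SARS-CoV-2", {"research-use"})))
# [('VIR-1', 'HUM-1')]
```

The access check happens inside the join, so records the requester may not see never appear in the results at all.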

Real-World Impact Stories

Fighting COVID-19 Through Better Data

When the pandemic struck, researchers faced an urgent challenge: COVID-19 samples were labeled dozens of different ways ("novel coronavirus pneumonia," "nCoV," "Wuhan seafood market pneumonia virus"). BioSamples:

  • Standardized 1.2 million COVID-19 samples with consistent taxonomy
  • Created a drag-and-drop submission tool for overwhelmed labs
  • Enabled the COVID-19 Data Portal to aggregate global research efforts
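Standardizing these labels boils down to mapping free-text organism names onto one taxonomy term. The sketch below maps a few variants to the NCBI Taxonomy entry for SARS-CoV-2 (taxon ID 2697049); the lookup table and function name are illustrative assumptions, and anything not in the table is returned as `None` so a curator can review it.

```python
# (scientific name, NCBI Taxonomy ID) for SARS-CoV-2.
SARS_COV_2 = ("Severe acute respiratory syndrome coronavirus 2", 2697049)

# Illustrative free-text labels seen in submissions, keyed in lower case.
LABEL_TO_TAXON = {
    "novel coronavirus pneumonia": SARS_COV_2,
    "ncov": SARS_COV_2,
    "2019-ncov": SARS_COV_2,
    "wuhan seafood market pneumonia virus": SARS_COV_2,
    "sars-cov-2": SARS_COV_2,
}


def standardize_organism(label: str):
    """Return (scientific name, taxon ID) for a known label, else None for manual curation."""
    return LABEL_TO_TAXON.get(label.strip().lower())


print(standardize_organism("Wuhan seafood market pneumonia virus"))
```

Once every record carries the same taxon ID, a single query on that ID finds all of them, whatever the original label said.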

Bridging the Plant Genomics Gap

Plant researchers have long struggled to connect genetic data with physical traits—crucial for crop breeding. BioSamples now:

  • Implements the MIAPPE standard for plant experiments
  • Links genotype data in EVA with phenotype data in ENA
  • Provides unified authentication across archives

Empowering Drug Discovery

The ReSOLUTE project studies proteins crucial for drug development. Previously, vital experimental details were buried in PDF files. Now:

  • Automated ETL (extract, transform, load) pipelines pull metadata out of PDFs
  • Samples are validated against the MINSEQE standard
  • Cell line information links to the Cellosaurus database
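The ReSOLUTE pipeline itself is not described here, so the sketch below only illustrates the "transform" step under stated assumptions: text has already been extracted from the PDF, metadata appears as `Label: value` lines, and the label-to-field mapping is hypothetical. It also applies a simple format check to Cellosaurus accessions (assumed here to be `CVCL_` followed by four alphanumeric characters) before they are linked; the accession in the sample text is a placeholder.

```python
import re

# Hypothetical mapping from labels found in protocol PDFs to sample attribute names.
FIELD_MAP = {
    "cell line": "cell_line",
    "cellosaurus id": "cellosaurus_accession",
    "treatment": "treatment",
    "replicate": "replicate",
}

LINE_PATTERN = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$")
CELLOSAURUS_PATTERN = re.compile(r"^CVCL_[A-Z0-9]{4}$")


def extract_metadata(text: str) -> dict[str, str]:
    """Pick out known 'Label: value' lines from text already extracted from a PDF."""
    record = {}
    for line in text.splitlines():
        m = LINE_PATTERN.match(line)
        if m and m.group(1).lower() in FIELD_MAP:
            record[FIELD_MAP[m.group(1).lower()]] = m.group(2)
    return record


def problems(record: dict[str, str]) -> list[str]:
    """Flag values that need curation before the record is linked or submitted."""
    issues = []
    acc = record.get("cellosaurus_accession")
    if acc is not None and not CELLOSAURUS_PATTERN.match(acc):
        issues.append(f"unexpected Cellosaurus accession format: {acc}")
    return issues


SAMPLE_TEXT = """Protocol summary
Cell line: HEK293
Cellosaurus ID: CVCL_AB12
Treatment: compound X, 10 uM
Replicate: 3
"""

record = extract_metadata(SAMPLE_TEXT)
print(record, problems(record))
```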

The Power of Community Standards

What makes BioSamples special isn't just its size—it's the commitment to community-driven standards. The platform hosts:

  • JSON Schema representations of community standards
  • Automated validation at submission time
  • A recommendation engine suggesting metadata improvements
  • Integration with GA4GH for genomic data sharing
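BioSamples expresses community checklists as JSON Schema and applies them at submission. The sketch below is a deliberately small stand-in in plain Python (not a JSON Schema validator, with an invented checklist) that shows the two kinds of feedback listed above: hard validation errors and softer recommendations. A production system would use a full JSON Schema validator instead of hand-written checks.

```python
# A tiny, invented checklist in the spirit of a JSON Schema: required fields,
# expected types, allowed values, and recommended (optional) fields.
CHECKLIST = {
    "required": ["organism", "collection date", "geographic location"],
    "types": {"organism": str, "collection date": str, "geographic location": str},
    "allowed": {"geographic location": {"France", "Kenya", "United Kingdom"}},
    "recommended": ["sample type", "tissue"],
}


def validate(sample: dict, checklist: dict = CHECKLIST) -> tuple[list[str], list[str]]:
    """Return (errors, recommendations) for a sample's attribute dictionary."""
    errors = [f"missing required field: {f}" for f in checklist["required"] if f not in sample]
    for name, expected in checklist["types"].items():
        if name in sample and not isinstance(sample[name], expected):
            errors.append(f"{name} should be {expected.__name__}")
    for name, allowed in checklist["allowed"].items():
        if name in sample and sample[name] not in allowed:
            errors.append(f"{name} has unexpected value: {sample[name]!r}")
    recommendations = [f"consider adding: {f}"
                       for f in checklist["recommended"] if f not in sample]
    return errors, recommendations


errors, recs = validate({"organism": "Homo sapiens", "collection date": "2021-03-02",
                         "geographic location": "Atlantis"})
print(errors)
print(recs)
```

Separating errors from recommendations lets a submission pipeline reject records that break the checklist while merely nudging submitters toward richer metadata.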

Looking Forward: Federation, Not Centralization

Perhaps most importantly, BioSamples represents a philosophical shift in how we approach biological data. Rather than forcing everyone into one system, it provides:

  • APIs for automated curation
  • Tools that can be used independently
  • Standards that work across platforms
  • A federation model where data stays distributed but becomes discoverable

Join the FAIR Data Revolution

The transformation of BioSamples from a simple repository to an intelligent, interconnected system shows what's possible when we combine:

  • Smart technology (machine learning, graph databases)
  • Community standards (MIAPPE, MINSEQE, GA4GH)
  • Human expertise (manual curation, domain knowledge)
  • Open science principles (FAIR data, accessibility)

For researchers, this means less time wrestling with metadata and more time making discoveries. For institutions, it means their sample collections become more valuable and impactful. For science as a whole, it means we can finally start answering questions that require integrating data across disciplines, institutions, and continents.

The future of biological research isn't just about generating more data—it's about making that data work together. BioSamples is showing us how.

Ready to make your samples FAIR? Visit BioSamples to explore the database, validate your metadata, or contribute to the growing ecosystem of discoverable biological samples.

Research Citation

Mélanie Courtot, Dipayan Gupta, Isuru Liyanage, Fuqi Xu, Tony Burdett, BioSamples database: FAIRer samples metadata to accelerate research data management, Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D1500–D1507. https://doi.org/10.1093/nar/gkab1046

Published: 7 January 2022