The FAIR² Chronicles: Data stories for an AI world

Senscience

“The FAIR² Chronicles: Data Stories for an AI World” is an AI-generated podcast bringing published FAIR² datasets to life. Each episode spotlights real-world datasets structured under FAIR² principles (FAIR + AI-readiness, Responsible AI, and Context), showcasing their impact on scientific discovery, innovation, and global challenges. From climate research to biomedical breakthroughs, AI narrates the data’s journey, revealing how structured, machine-actionable datasets are driving the future of open science. senscience.substack.com

  1. From Trash to Trajectories: Linking Waste, Emissions, and Development in a Single Global System

    JAN 27

    Waste is often reported as a management problem. Emissions are reported as a climate problem. This episode is about what happens when those two are finally treated as the same system.

    We dive into the Frontiers Planet Prize National Champion-Awarded Global Waste Sector Dataset (1990–2050), a harmonized resource spanning historical data through mid-century projections that explicitly links municipal solid waste generation to greenhouse gas emissions, while tying both back to the socioeconomic forces that drive them. Developed by Hoy, Woon, Chin, Fan, Yoo, and an international consortium of researchers, the dataset is framed not as a single study outcome, but as durable research infrastructure designed for reuse, comparison, and modeling.

    At its core, the dataset connects population growth and PPP-adjusted economic development to physical waste generation, then traces how that waste translates into carbon dioxide, methane, and nitrous oxide emissions through different treatment pathways. Historical data from major public sources, including the World Bank, OECD, Eurostat, and UNFCCC national reports, is rigorously harmonized before being extended into the future using Shared Socioeconomic Pathways (SSPs).

    Methodologically, the project is notable for how seriously it treats system complexity. Historical waste generation is reconstructed using fixed-effects panel regression to control for country-specific characteristics, while future emissions are modeled using country-level machine learning ensembles that capture nonlinear relationships. This is particularly critical for methane, whose climate impact is handled using GWP-STAR rather than conventional metrics.

    The result is a dataset that allows researchers to do more than track growth. It supports cross-country benchmarking, long-term decoupling analysis, and exploration of how waste management choices shape near- and long-term climate outcomes. By keeping waste generation and emissions structurally linked, the dataset avoids the common pitfall of treating climate impacts as detached from material flows.

    The authors are also explicit about the limits: national-scale resolution only, scenario-dependent futures, no explicit uncertainty intervals, and uneven country coverage driven by historical data availability. These constraints are documented as part of the dataset’s context, reinforcing responsible reuse rather than obscuring uncertainty.

    Delivered through a FAIR²-aligned data portal with persistent identifiers, rich metadata, and machine-actionable structure, this resource is designed to move directly into lifecycle assessment, climate modeling, and AI-driven analysis. If you’re interested in understanding waste not just as an output of consumption, but as a measurable driver of emissions across decades, and in how economic development, infrastructure, and climate impacts intersect at national scale, this episode offers a clear, integrated starting point.

    Hoy, Z.X., Woon, K.S., Chin, W.C., Fan, Y.V., & Yoo, S.J. (2025). Global Waste Sector Dataset (1990–2050): Scenario-Based Projections of Generation, Emissions, and Socioeconomic Drivers. Front. Environ. Sci., section Environmental Economics and Management.
    Data article: https://doi.org/10.3389/fenvs.2025.1717992
    FAIR² Data portal: https://doi.org/10.71728/senscience.k2f7-p5v9

    This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit senscience.substack.com

    17 min
  2. Seeing Atolls as a System: Inside the First Fully Integrated Indo-Pacific Atoll Dataset

    JAN 27

    If you work in ecology, conservation, island biogeography, or environmental data, and you’ve ever struggled to connect biodiversity, climate, oceanography, and human history across small islands, this episode is for you.

    We take a deep dive into a landmark effort to harmonize data for all 310 Indo-Pacific atolls with permanent emergent land, transforming decades of scattered literature, field surveys, satellite products, and historical records into a single, integrated, machine-readable resource. Led by Frontiers Planet Prize national champion Sebastian Steibl and an international team spanning academia and conservation organizations, this project represents a foundational shift in how atolls can be studied: not as isolated case studies, but as a connected system.

    The dataset synthesizes over 4,200 species records from 677 sources, standardized across 90 environmental, biological, and contextual variables. It combines terrestrial biodiversity inventories, seabird population estimates, climate and oceanographic drivers, reef and land habitat classifications, human population data, and a uniquely detailed layer on historical military land use, capturing legacy impacts that often shape present-day ecological outcomes but are rarely included in large-scale models.

    What makes this resource especially powerful is how it’s delivered. Rather than a static download, the data is available through an interactive, FAIR²-certified portal, designed to be immediately usable by both researchers and machines. By prioritizing AI readiness and responsible reuse, the project removes long-standing barriers between ecological data and predictive modeling.

    We also discuss the realities and limitations of working with historical sources (uneven sampling, taxonomic gaps, and the impossibility of retroactively standardizing past fieldwork) and why acknowledging those constraints is essential for responsible analysis. Even so, this dataset establishes a long-needed baseline for comparative research, conservation planning, and data-driven forecasting across one of the world’s most fragile and important ecosystems.

    If you’re interested in moving from descriptive ecology to predictive conservation, and in understanding how climate, biodiversity, and human history intersect across remote island systems, this episode is for you.

    Steibl S, Burnett MW, Holmes ND, Wegmann AS and Russell JC (2025). Atoll biodiversity and environments: an AI-ready, interactive data portal for Indo-Pacific atolls. Front. Environ. Sci., section Environmental Informatics and Remote Sensing.
    Data article: https://doi.org/10.3389/fenvs.2025.1723851
    FAIR² Data portal: https://doi.org/10.71728/senscience.4f2j-8h1k

    12 min
  3. A Multi-Site View of Brain Injury: The TOP-NT Harmonized Rat MRI Dataset

    12/15/2025

    If you work in neuroimaging, neurotrauma, or computational neuroscience, and especially if you’ve ever wished for a truly comparable, cross-lab view of traumatic brain injury, this episode is for you.

    We dig into a remarkable new FAIR² dataset from the TOP-NT consortium: the first harmonized, multi-site diffusion MRI resource for preclinical TBI. It brings together 343 high-resolution scans from 184 rats across four research centers, all acquired under a unified protocol and processed through a rigorous, standardized pipeline.

    What makes this dataset so valuable? It marries tightly controlled acquisition with advanced harmonization methods like neuroCombat and multi-site template registration, removing scanner biases while preserving the biological injury signal. The result is a clean, comparable view of how structural brain changes unfold at 3 and 30 days after controlled cortical impact. Researchers can now reliably detect diffusion abnormalities, quantify tissue atrophy, and visualize injury progression across institutions.

    We also note the limitations (the single injury model, two timepoints, and the challenge of fully removing site effects) and emphasize how the dataset, now viewable through the interactive FAIR² Data Portal and archived in the ODC-TBI repository, can still advance harmonization research, benchmark AI models, and strengthen reproducible TBI science moving forward.

    If you’re looking for a benchmark dataset for AI model training, injury signature discovery, cross-site reproducibility, or simply a clearer map of TBI evolution, this episode has you covered.

    Kislik G, Fox R, Korotcov A, Zhou J, Febo M, Moghadas B, Bibic A, Zou Y, Wan J, Koehler RC, Adebayo T, Burns MP, McCabe JT, Wang KKW, Huie JR, Ferguson AR, Paydar A, Wanner IB, Harris NG and The TOP-NT Investigators (2025) Multi-site, in vivo MRI dataset of brain diffusivity measures before and after harmonization, and atrophy measures following controlled cortical impact in male and female adult rats. Front. Neurol. 16:1719618. doi: 10.3389/fneur.2025.1719618

    20 min
  4. Staying Ahead of SARS-CoV-2: Inside the Ultimate Spike Protein Mutation Dataset

    09/08/2025

    If you work anywhere near virology, structural biology, or computational genomics, and especially if you’ve ever wished for a true map of the SARS-CoV-2 spike protein’s mutational landscape, this episode and the new FAIR² Data Article from the Stay Ahead Project are for you.

    We’re diving into the Stay Ahead Project’s latest data release: a thoroughly curated, structure-informed resource focused on the spike protein’s receptor binding domain (RBD). Created by a team led by Erik Schultes (LACDR/GO FAIR Foundation) with Max van den Boom and Thomas Hankemeier, this dataset systematically catalogs every possible single-point mutation in the RBD (over 3,700 in total), plus real-world Omicron variants.

    What sets this resource apart? It combines state-of-the-art protein structure prediction (using both AlphaFold2 and ESMFold), deep mutational scanning data for ACE2 binding and surface expression, and biophysical sequence features. We discuss the technical details, the challenges of integrating computational and experimental data, and how this dataset can inform predictive modeling of variant behavior.

    We also talk openly about the limitations (the focus on the RBD and the challenges of modeling higher-order mutational effects) and the ways this resource, accessible via the FAIR² Data Portal, can support the research community moving forward.

    If you’re looking for new tools for variant surveillance or functional annotation, or just want a deeper understanding of spike protein evolution, this episode is for you.

    van den Boom M, Schultes E and Hankemeier T (2025) Structure-based prediction of SARS-CoV-2 variant properties using machine learning on mutational neighborhoods. Front. Bioinform. 5:1634111. doi: 10.3389/fbinf.2025.1634111

    16 min
