Base by Base

Gustavo Barra

Base by Base explores advances in genetics and genomics, with a focus on gene-disease associations, variant interpretation, protein structure, and insights from exome and genome sequencing. Each episode breaks down key studies and their clinical relevance—one base at a time. Powered by AI, Base by Base offers a new way to learn on the go. Special thanks to authors who publish under CC BY 4.0, making open-access science faster to share and easier to explore.

  1. 1 day ago

    401: LDB1 variants split neurodevelopmental outcomes by location and mechanism

    Fluri R et al., The American Journal of Human Genetics - This episode examines a cohort study of 16 individuals with de novo LDB1 variants that reveals two overlapping but distinct neurodevelopmental phenotypes tied to variant location. Functional assays and Drosophila models demonstrate loss-of-function effects for N-terminal variants and dominant-negative effects for C-terminal variants. Key terms: LDB1, neurodevelopmental disorder, ventriculomegaly, dominant-negative, haploinsufficiency. Study Highlights: The authors assembled 16 individuals with de novo LDB1 variants and mapped variants to the N-terminal dimerization domain or the C-terminal LIM interaction domain. In vitro assays showed N-terminal missense variants disrupt homodimerization leading to loss of function, while C-terminal variants impair LHX2 binding and act in a dominant-negative manner. Drosophila knockdown and overexpression corroborated dosage sensitivity and distinct in vivo effects, including rescue by wild-type LDB1 and worsening by C-terminal variants. Clinically, C-terminal LID-affecting variants associate with congenital ventriculomegaly and more frequent extra‑neural anomalies, whereas N-terminal variants tend to cause variable NDD without consistent brain malformations. Conclusion: Variant location in LDB1 predicts distinct pathomechanisms and overlapping clinical presentations: N-terminal variants cause haploinsufficiency/loss of function, while C-terminal LID variants act dominant-negatively and are linked to ventriculomegaly and broader organ involvement. Music: Enjoy the music based on this article at the end of the episode. Article title: De novo variants in LDB1 are linked to distinct neurodevelopmental phenotypes determined by variant location and differing pathomechanisms First author: Fluri R Journal: The American Journal of Human Genetics DOI: 10.1016/j.ajhg.2026.05.012 Reference: Fluri R., Coll-Tané M., Brunet T., et al. 
De novo variants in LDB1 are linked to distinct neurodevelopmental phenotypes determined by variant location and differing pathomechanisms. The American Journal of Human Genetics. 2026;113:1–15. doi:10.1016/j.ajhg.2026.05.012 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Support: Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00 Official website https://basebybase.com On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics. Episode link: https://basebybase.com/episodes/ldb1-variant-location-pathomechanisms QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-23. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Audited sections describing LDB1 structure (DD and LID), variant spectrum (N-terminal vs C-terminal), two mechanisms (haploinsufficiency vs dominant-negative), cellular assays (protein levels, ubiquitination, aggregates), LDB1-LHX2 interactions, Drosophila chi model (dosage sensitivity, rescue/toxicity, sleep), and cli - transcript topics: LDB1 structure and domains (DD and LID); Variant spectrum across LDB1 (N-terminal vs C-terminal); Mechanisms: haploinsufficiency and dominant-negative effects; Cellular assays: protein stability, ubiquitination, aggregates; LDB1 interactions: dimerization and LHX2 binding; Drosophila model chi (chip) dosage sensitivity and experiments QC Summary: - factual score: 9/10 - metadata score: 10/10 - supported core claims: 7 - claims flagged for review: 1... 
Chapters (00:00:20) - Beyond the genetic blueprint of neurodevelopmental disorders (00:02:51) - Common mutations in the LDB1 gene cause congenital ventric (00:08:42) - Mutations in the LDB1 gene cause severe brain dysfunction (00:14:24) - C terminal variant causes sleep disorders in flies

    24 min
  2. 1 day ago

    400: Complete chromosome 21 centromere sequencing and Down syndrome

    Mastrorosa F et al., The American Journal of Human Genetics - Long-read assemblies and epigenetic mapping of chromosome 21 centromeres in families with trisomy 21 reveal centromere size diversity, two cases of extreme maternal centromere size asymmetry, and no global enrichment of small centromeres in affected individuals. Key terms: trisomy 21, centromere, alpha-satellite, long-read sequencing, meiotic nondisjunction. Study Highlights: Using PacBio HiFi and ultra-long ONT reads with hybrid assembly and DiMeLo-seq, the authors fully resolved chr21 centromeres in eight T21 individuals and several parents and compared them to 287 population haplotypes. Small centromeres were not overall enriched in T21 cases, contradicting earlier reports, but two families showed extreme (>10-fold) maternal centromere size asymmetry. CDRs and CENP-A/CENP-C signals were present across haplotypes and methylation profiles were largely conserved between generations and sample types. Phylogenetic analysis indicates recent rapid evolution of chr21 centromere haplotypes that may facilitate such asymmetry. Conclusion: Centromere size alone does not explain trisomy 21 risk at the population level, but extreme maternal centromere size asymmetry appears in a minority of families and may contribute to nondisjunction in those cases. Music: Enjoy the music based on this article at the end of the episode. Article title: Complete chromosome 21 centromere sequencing of families with Down syndrome First author: Mastrorosa F Journal: The American Journal of Human Genetics DOI: 10.1016/j.ajhg.2026.05.010 Reference: Mastrorosa F.K., Daponte A., de Gennaro L., et al. Complete chromosome 21 centromere sequencing of families with Down syndrome. The American Journal of Human Genetics. 113, 1–18 (2026). 
https://doi.org/10.1016/j.ajhg.2026.05.010 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Support: Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00 Official website https://basebybase.com On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics. Episode link: https://basebybase.com/episodes/chr21-centromere-sequencing-down-syndrome QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-23. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Audited transcript sections covering centromere structure, long-read sequencing workflow, extreme centromere size asymmetry findings, CpG/epigenetic mapping (CDRs, CENP-A/CENP-C), and population/evolutionary context. - transcript topics: Centromere structure and alpha-satellite HOR arrays; Maternal nondisjunction and Down syndrome etiology; Long-read sequencing technologies and hybrid phasing; Epigenetic centromere mapping (CDRs, CENP-A/CENP-C, CpG methylation); Centromere size asymmetry in Down syndrome families; Population diversity of chr21 centromeres (African ancestry four-mer HOR) QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 7 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license Factual Items Audited: - Small chr21 centromeres are not enriched in Down syndrome cases compared with controls (p = 0.72). - Extreme centromere size asymmetry (>10-fold) observed in two Down syndrome families (e.g., 10.7-fold... 
Chapters (00:00:20) - Down Syndrome: The mystery of the cell division (00:04:45) - Down Syndrome: The repetitive DNA handles (00:09:49) - Down Syndrome: The tug of war (00:14:16) - The genetics of trisomy 21 (00:15:58) - Down Syndrome: The mystery of the genetic cause (00:20:13) - A Single Link in the Code

    23 min
  3. 2 days ago

    399: Ménière disease: inner ear development and retinoic acid pathways

    Shi Z et al., The American Journal of Human Genetics - A large GWAS meta-analysis across five biobanks (8,969 cases, 1,962,542 controls) identifies five genome-wide significant loci for Ménière disease, implicating developmental regulators EYA1/EYA4 and retinoic acid metabolism genes including CYP26A1. Integrative fine-mapping, eQTL, and single-cell expression place these signals in inner ear cell types and link MD to related sensory and neurological traits. Key terms: Ménière disease, EYA1, EYA4, retinoic acid, GWAS. Study Highlights: A GWAS meta-analysis of 8,969 Ménière disease cases and 1,962,542 controls across five biobanks identified five independent genome-wide significant loci, including two signals each at EYA4 and EYA1 and one near CYP26A1. Observed-scale SNP heritability was estimated at 7% (SE 0.8%), indicating a modest contribution of common variation. Fine-mapping, eQTL and single-cell expression data implicate dysregulation of inner ear developmental regulators and retinoic acid metabolism. Phenome-wide and genetic-correlation analyses reveal shared architecture with vertigo, tinnitus, hearing loss, migraine, and sleep apnea. Conclusion: Regulatory common variants in genes governing inner ear development (EYA1, EYA4) and retinoic acid signaling (CYP26A1/C1, ALDH1A2) contribute to Ménière disease risk, providing a genetic framework for functional follow-up and polygenic risk modeling. Music: Enjoy the music based on this article at the end of the episode. Article title: Genome-wide analysis implicates inner ear development in Ménière disease First author: Shi Z Journal: The American Journal of Human Genetics DOI: 10.1016/j.ajhg.2026.05.011 Reference: Shi Z, Mandla R, Li J, et al. Genome-wide analysis implicates inner ear development in Ménière disease. The American Journal of Human Genetics. 2026;113:1–12. 
https://doi.org/10.1016/j.ajhg.2026.05.011 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Support: Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00 Official website https://basebybase.com On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics. Episode link: https://basebybase.com/episodes/base-by-base-399-meniere-inner-ear QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-22. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Substantive audit of the transcript's representation of GWAS scale, loci and genes (EYA4, EYA1, CYP26A1, ALDH1A2, LMO4), developmental/retinoic acid pathways, genetic correlations, limitations, and future directions as reported in the canonical article. - transcript topics: Genome-wide association study scale and meta-analysis across five biobanks; Identification of five independent signals: two at EYA4, two at EYA1, one near CYP26A1; EYA4 and EYA1 as developmental regulators of inner ear; Regulatory vs coding variants and gene expression implications; Retinoic acid signaling pathway involvement: CYP26A1/C1 and ALDH1A2; LMO4 as a suggestive signal and its developmental context QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 7 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license Factual Items Audited: - MD SNP-based heritability estimated at 7% (SE 0.8%) on the observ... 
Chapters (00:00:20) - The genetic basis of Meniere disease (00:02:04) - Scientists solve the genetic mystery of Meniere's disease (00:06:35) - The genetic heritability of Meniere's (00:10:18) - Genetics of Meniere's Disease and gl (00:15:35) - Genetic determinants of Meniere's

    22 min
  4. 2 days ago

    398: Modeling JAK2V617F Clonal Expansion in the General Population

    Snyder J et al., Proceedings of the National Academy of Sciences (PNAS) - Longitudinal VAF measurements from 67 JAK2V617F-positive participants in the Danish GESUS study were analyzed with a Moran-process stem cell model and ABC-SMC to infer per-individual self-renewal advantages and assess prognostic value for MPN progression. Key terms: JAK2V617F, clonal hematopoiesis, Moran model, myeloproliferative neoplasms, mathematical modeling. Study Highlights: The study follows 67 individuals from the GESUS cohort with >1% JAK2V617F VAF and multiple follow-up measurements over >10 years. A Moran-process model at the HSC level fitted by ABC-SMC reproduced longitudinal VAF trajectories for 66 of 67 subjects and yielded per-individual estimates of the mutant self-renewal advantage s. Results show heterogeneity: ~70% of subjects had a statistically positive s, ~18% had a negative s, and ~12% were neutral, indicating many carriers show no expansion or even contraction. The fitted model can predict future VAF evolution for most subjects but s alone is not a perfect predictor of MPN diagnosis. Conclusion: A stem-cell Moran-process model explains longitudinal JAK2V617F VAF dynamics in most GESUS participants; inferred selective advantage varies widely, correlates with—but does not fully predict—MPN diagnosis, supporting individualized monitoring and further study of non-VAF risk factors. Music: Enjoy the music based on this article at the end of the episode. Article title: Mathematical modeling of JAK2V617F clonal expansion in a general population cohort First author: Snyder J Journal: Proceedings of the National Academy of Sciences (PNAS) DOI: 10.1073/pnas.2507773123 Reference: Snyder J, Andersen M, Gudmand-Høyer J, et al. Mathematical modeling of JAK2V617F clonal expansion in a general population cohort. Proc Natl Acad Sci U.S.A. 2026;123:e2507773123. 
doi:10.1073/pnas.2507773123 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Support: Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00 Official website https://basebybase.com On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics. Episode link: https://basebybase.com/episodes/base398-jak2v617f-moran QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-21. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Substantively audited sections include background on JAK2V617F and MPN, Moran process model with carrying capacity and generation time, ABC-SMC inference of per-subject s, results breakdown (positive/neutral/negative s and 66/67 fit), link between s and MPN progression, inflammation/CRP and statin discussion, and limit - transcript topics: JAK2V617F mutation and myeloproliferative neoplasms biology; Moran process model and stem cell carrying capacity; Inference of per-individual selective advantage (s) via ABC-SMC; VAF trajectories across individuals and fit to the model; Association between s and progression to MPN; imperfect prediction; Inflammation (CRP) and statins as modifiers of clonal dynamics QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 8 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license Factual Items Audited: - 67 individuals with >1% JAK2V617F VAF in... 
Chapters (00:00:00) - Can Your Body Naturally Suppress a Cancer Mutation? (00:05:30) - The genetics of JAK2 cancer (00:10:47) - The Stochastic Selection of MPN (00:16:05) - JAK2 mutation and the immune system

    25 min
  5. 4 days ago

    397: SciPhy: Bayesian phylogenetics for sequential genetic lineage tracing

    Seidel et al., Nature Communications - SciPhy is a BEAST2-integrated Bayesian framework that models sequential CRISPR‑based insertion edits to jointly infer time-scaled single-cell lineage trees, editing dynamics, and population growth. The authors validate SciPhy on simulations and apply it to HEK293T monoclonal expansion and murine gastruloid datasets, showing improved tree and branch-length inference relative to UPGMA and enabling phylodynamic estimates of growth. Key terms: Bayesian phylogenetics, lineage tracing, CRISPR, DNA Typewriter, phylodynamics. Study Highlights: SciPhy implements a mechanistic model of ordered, irreversible insertions with per-tape clock rates and insertion probabilities and computes the likelihood using a pruning algorithm within BEAST2. Validation on calibrated simulations shows correct posterior coverage and high correlations between true and inferred editing and tree parameters. Application to HEK293T and gastruloid data recovers per-tape edit rates and preferential insert probabilities, infers growth rates including time-varying dynamics in gastruloid development, and yields more accurate topologies and branch lengths than UPGMA. The framework reports uncertainty and enables joint phylodynamic analysis of lineage tracing data. Conclusion: A mechanistic, order-aware Bayesian model for sequential genome-editing lineage recorders improves reconstruction of time-calibrated cell lineage trees, quantifies editing biases and clock rates, and enables inference of cell population dynamics from single-cell lineage tracing data. Music: Enjoy the music based on this article at the end of the episode. Article title: SciPhy: A Bayesian phylogenetic framework using sequential genetic lineage tracing data First author: Seidel Journal: Nature Communications DOI: 10.1038/s41467-026-73377-6 Reference: Seidel, S., Zwaans, A., Regalado, S. et al. SciPhy: A Bayesian phylogenetic framework using sequential genetic lineage tracing data. 
Nat Commun (2026). https://doi.org/10.1038/s41467-026-73377-6 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Support: Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00 Official website https://basebybase.com On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics. Episode link: https://basebybase.com/episodes/sciphy-bayesian-lineage-tracing QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-20. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Substantive auditing of the transcript's description of SciPhy's mechanistic model (ordered CRISPR edits), BEAST2 implementation, validation results, HEK293T monoclonal expansion, gastruloid development with CHIR treatment, and discussed limitations and computational considerations. - transcript topics: Mechanistic editing model with ordered inserts; BEAST2 integration and likelihood calculation; Editing rate clock rates and insertion probabilities; Validation: in-silico, HEK293T monoclonal expansion; Insertion bias: CAT vs GCG; Growth dynamics and phylodynamic inference QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 7 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license Factual Items Audited: - SciPhy is a Bayesian framewor...

    23 min
  6. 5 days ago

    396: Physical homology recognition between DNA duplexes

    Stannard A et al., Proceedings of the National Academy of Sciences (PNAS) - This episode summarizes a PNAS study that uses a FRET-responsive DNA tweezers nanosensor to detect and quantify sequence-dependent interactions between intact double-stranded DNA duplexes in ionic solutions. Key terms: homologous recognition, double-stranded DNA, electrostatic interactions, DNA nanosensor, divalent cations. Study Highlights: Using a tuned DNA-tweezers FRET assay, the authors show that homologous dsDNA duplexes coalign more readily than heterologous ones in the presence of divalent cations. They quantify a homologous recognition free energy of roughly −0.02 kBT (≈ −0.01 kcal/mol) per base pair and show this is largely independent of Mg2+ versus Ca2+ within the tested range. Controls exclude strand exchange and sequence-specific ion adsorption as alternative explanations. An electrostatic helical coherence model reproduces the magnitude and salt dependence of the measured effect. Conclusion: Protein-free, sequence-specific electrostatic interactions between intact dsDNA can produce a small but measurable homology recognition energy consistent with helical coherence theory and relevant under confined, DNA-rich conditions. Music: Enjoy the music based on this article at the end of the episode. Article title: Direct evidence and quantification of homologous recognition between DNA duplexes First author: Stannard A Journal: Proceedings of the National Academy of Sciences (PNAS) DOI: 10.1073/pnas.2530949123 Reference: Stannard A., Haimov E., Hedley J.G., et al. Direct evidence and quantification of homologous recognition between DNA duplexes. Proc. Natl. Acad. Sci. U.S.A. 2026; doi:10.1073/pnas.2530949123. 
License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Support: Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00 Official website https://basebybase.com On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics. Episode link: https://basebybase.com/episodes/homologous-dna-recognition-396 QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-18. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Audited the transcript portions describing the DNA tweezers design and readout, cation dependence, homologous vs heterologous comparison, longer-duplex effects, strand-exchange controls, helical-coherence theory, and cellular relevance. - transcript topics: DNA tweezers design and FRET readout; Monovalent vs divalent cation effects on duplex coalignment; Homologous versus heterologous sequence comparisons; Strand-exchange controls and GC clamps; Length dependence: 36 bp vs 68 bp and entropic effects; Helical coherence theory mechanism and charge patterning QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 7 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license Factual Items Audited: - Direct evidence for homologous recognition between intact dsDNA in protein-free ionic conditions. - Recognition energy per base pair is about -0.02 kBT (≈ -0.01 kcal/mol per base pair). - Recognition is largely independent of whether Mg2+ or Ca2+ is the divalent cation and of their concentration within tested range. - Divalent cations promote coalignme...

    23 min
  7. 5 days ago

    395: Extended sequence context shapes mutational bias in Escherichia coli

    Green R et al., PNAS - Collating >100,000 base-pair substitutions from 32 mutation-accumulation experiments, this study shows that sequence context well beyond adjacent bases — up to ±6 bp and even hundreds of bp — shapes mutational biases in E. coli and interacts with DNA repair. Key terms: mutational bias, sequence context, Escherichia coli, mismatch repair, mononucleotide runs. Study Highlights: The authors analyzed 117,807 base-pair substitutions from 32 MA experiments and quantified nucleotide frequencies up to ±6 bp (and sliding windows to 1,000 bp) around mutation sites. Extended context effects vary by substitution type, DNA repair background (proofreading and MMR), and replication strand. Mononucleotide runs (notably AC3+ and GC3+) are strong hotspots consistent with transient misalignment; GC3+ can increase G:C→C:G transversions by orders of magnitude. Broader GC% biases persist hundreds of base pairs and are modulated by MMR activity. Conclusion: Extended sequence context and its interaction with proofreading, mismatch repair, and replication strand identity create complex, BPS-specific mutational signatures in E. coli, improving the resolution of mutation-rate predictions and highlighting long-range and motif-specific hotspots. Music: Enjoy the music based on this article at the end of the episode. Article title: Extended sequence context shapes mutational bias in Escherichia coli First author: Green R Journal: PNAS DOI: 10.1073/pnas.2601345123 Reference: Green R., Jago M.J., Knight C.G., Czernuszka M.R., Denisova S., Krašovec R., Lagator M. Extended sequence context shapes mutational bias in Escherichia coli. PNAS. 2026;123(23):e2601345123. doi:10.1073/pnas.2601345123. 
License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Support: Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00 Official website https://basebybase.com On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics. Episode link: https://basebybase.com/episodes/extended-sequence-context-mutational-bias-e-coli QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-18. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Audited sections covering extended sequence context (±6 bp), mononucleotide run hotspots (AC3+, GC3+), GC3+ and G:C→C:G transversions, 5′ preceding nucleotide effects, leading vs lagging strand replication, and GC-content effects up to 1000 bp. - transcript topics: Extended sequence context (±6 bp); Mononucleotide runs and transient misalignment; GC3+ hotspot and other motifs; DNA proofreading and mismatch repair effects; Leading vs lagging strand replication and context biases; Regional GC-content effects up to 1000 bp QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 6 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license Factual Items Audited: - Extended context up to ±6 bp influences mutational bias beyond trinucleotide context - Mononucleotide runs AC3+ and GC3+ are mutational hotspots; GC3+ increases G:C→C:G transversions up to ~10^4-fold - A strong GC3+ hotspot near GG sequences can reach extremely large fold increases; in some backgrounds ~50,000-fold for G:C→C:G transversions at GG C7 - 5′ preceding nucleotide...

    23 min
  8. 17 Jun

    394: Benchmarking LLMs for cfRNA biomarker discovery

    Gaudio HA et al., Nature Communications - This episode examines a systematic benchmark of six commercial large language models applied to plasma cell-free RNA across three clinical cohorts, assessing LLM-driven gene-panel nomination and autonomous classifier construction versus conventional statistical workflows. Key terms: large language models, cell-free RNA, biomarker discovery, machine learning, diagnostics. Study Highlights: Six state-of-the-art LLMs were tested on cfRNA datasets from Kawasaki disease vs MIS-C, tuberculosis vs symptomatic controls, and ME/CFS vs sedentary controls for gene-panel nomination and end-to-end classifier building. LLM-nominated panels recapitulated canonical immune pathways and outperformed random gene sets, matching differential expression–derived panels in the tuberculosis cohort. End-to-end automation was feasible but model- and task-dependent: OpenAI o3 matched conventional performance for KD vs MIS-C but underperformed for TB and ME/CFS. Models showed prompt-adherence issues and sometimes returned non-reference or hallucinated features, which limits reproducibility. Conclusion: Current LLMs can extract biologically meaningful cfRNA candidate panels and partially automate biomarker workflows, but results are variable and traditional or hybrid statistical workflows remain necessary; rigorous validation and constrained output schemas are required before clinical deployment. Music: Enjoy the music based on this article at the end of the episode. Article title: Benchmarking large language models for cell-free RNA diagnostic biomarker discovery First author: Gaudio HA Journal: Nature Communications DOI: 10.1038/s41467-026-74077-x Reference: Gaudio HA, Bliss A, Loy CJ, Eweis‑LaBolle D, Gardella AE & De Vlaminck I. Benchmarking large language models for cell-free RNA diagnostic biomarker discovery. Nature Communications (2026). 
doi:10.1038/s41467-026-74077-x License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Support: Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00 Official website https://basebybase.com On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics. Episode link: https://basebybase.com/episodes/benchmarking-llms-cfrna QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-17. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Substantively audited the transcript's coverage of the study design, LLM benchmarking across three cohorts, gene-panel nomination, end-to-end classifier construction, prompt effects, and the hybrid-workflow conclusions, with reference to supporting results in the article. - transcript topics: Study design and cohorts (KD vs MIS-C, TB vs symptomatic controls, ME/CFS vs sedentary controls); Prompt adherence and gene-panel nomination; Comparison of LLM panels to random and DGE panels; End-to-end classifier construction and cross-cohort performance; Disease-informed vs disease-naïve prompts impact; Limitations: probability calibration, data leakage concerns QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 5 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license Factual Items Audited: - Six LLMs were evaluated across three cohorts: OpenAI o3, GPT-4o, Claude...

    23 min
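
The unit conversion quoted in episode 396 (about −0.02 kBT ≈ −0.01 kcal/mol per base pair) can be checked with a short calculation. The sketch below is only an arithmetic illustration, not code from the study. The temperature of 298.15 K and the function name kbt_to_kcal_per_mol are assumptions, since the episode notes do not state the temperature used.

```python
# Illustrative check of the kBT -> kcal/mol conversion quoted in episode 396.
# Assumption: room temperature (298.15 K); the episode notes give no temperature.

BOLTZMANN_J_PER_K = 1.380649e-23   # Boltzmann constant, exact SI value
AVOGADRO_PER_MOL = 6.02214076e23   # Avogadro constant, exact SI value
JOULES_PER_KCAL = 4184.0           # thermochemical kilocalorie


def kbt_to_kcal_per_mol(energy_in_kbt: float, temperature_k: float = 298.15) -> float:
    """Convert an energy given in units of kB*T into kcal/mol (hypothetical helper)."""
    joules_per_mol = energy_in_kbt * BOLTZMANN_J_PER_K * temperature_k * AVOGADRO_PER_MOL
    return joules_per_mol / JOULES_PER_KCAL


if __name__ == "__main__":
    per_base_pair = kbt_to_kcal_per_mol(-0.02)
    print(f"{per_base_pair:.4f} kcal/mol per base pair")
```

At the assumed temperature one kBT is roughly 0.59 kcal/mol, so the result lands near −0.012 kcal/mol, consistent with the rounded figure in the notes.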
