Base by Base

Gustavo Barra

Base by Base explores advances in genetics and genomics, with a focus on gene-disease associations, variant interpretation, protein structure, and insights from exome and genome sequencing. Each episode breaks down key studies and their clinical relevance—one base at a time. Powered by AI, Base by Base offers a new way to learn on the go. Special thanks to authors who publish under CC BY 4.0, making open-access science faster to share and easier to explore.

  1. 11h ago

    384: RNA Brake on Cholera Phage: CisR Controls CTXϕ

    Haycocks JRJ et al., PNAS - This episode examines the discovery of CisR, a small RNA produced from the 3’UTR of prtV in Vibrio cholerae, which posttranscriptionally represses the CTXϕ-encoded cep mRNA via Hfq-mediated base-pairing. CisR accumulation is controlled by HapR and CRP and processed by RNase E, linking quorum sensing and carbon status to phage activation. We discuss the experimental evidence showing CisR lowers Cep protein levels and limits CTXϕ production under induction conditions. Key terms: small RNA, CTXϕ, Vibrio cholerae, Hfq, quorum sensing. Study Highlights: Researchers identified CisR as a ~50-nt sRNA derived from the 3’UTR of prtV that requires RNase E for processing and Hfq for stability and action. RIL-seq and reporter assays show CisR base-pairs with the cep mRNA ribosome binding site to inhibit translation, and deletion or overexpression of cisR respectively increases or decreases Cep levels and extracellular CTXϕ DNA after induction. Transcription of the vca0224-prtV-cisR operon is directly activated by HapR and CRP, linking CisR to quorum sensing and carbon catabolite signals. The work positions a core-genome sRNA as a posttranscriptional regulator that modulates a horizontally acquired phage life cycle. Conclusion: CisR is a 3’UTR-derived sRNA that integrates cell-density and metabolic signals to repress CTXϕ coat protein production via Hfq-dependent base-pairing with cep, thereby limiting phage production under stress and coordinating phage activation with host physiology. Music: Enjoy the music based on this article at the end of the episode. Article title: A 3’UTR-derived small RNA modulates the life cycle of the cholera toxin–encoding filamentous phage, CTXϕ First author: Haycocks JRJ Journal: PNAS DOI: 10.1073/pnas.2535142123 Reference: Haycocks JRJ, O’Driscoll E, Sprenger M, Thriene K, Jung E-M, Siemers M, Lippegaus A, Krautwurst S, Grainger DC, Papenfort K. 
A 3’UTR-derived small RNA modulates the life cycle of the cholera toxin–encoding filamentous phage, CTXϕ. PNAS. 2026;123(23):e2535142123. doi:10.1073/pnas.2535142123 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Support: Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00 Official website https://basebybase.com On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics. Episode link: https://basebybase.com/episodes/cisr-controls-ctxphi-life-cycle QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-03. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Substantively audited the transcript sections describing CisR origin, CisR-cep interaction, HapR/CRP regulation, RIL-seq mapping, and functional impact on CTXϕ production under induction conditions. - transcript topics: CTXϕ life cycle and filamentous phage biology; CisR discovery from prtV 3’UTR and RNase E processing; HFQ and RIL-seq methodology to map RNA interactions; CisR-cep base-pairing and translational repression; HapR and CRP regulation of prtV-cisR transcription; CisR impact on CTXϕ production during MMC induction QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 6 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license Factual Items Audited: - CisR is produced from the 3’...

    23 min
  2. 1d ago

    383: Genetics of the Circulating Proteome: pQTLs, Pathways, and Disease Links

    Koprulu M et al., Cell - A 38-cohort proteogenomic meta-analysis of up to 78,664 people maps fine‑mapped protein quantitative trait loci (pQTLs) across 1,116 circulating proteins, uses machine learning to assign trans effector genes, and triangulates genetic and observational evidence to highlight disease mechanisms and therapeutic opportunities. Key terms: proteogenomics, protein QTLs, N-linked glycosylation, Mendelian randomization, drug repurposing. Study Highlights: The study meta-analyzed antibody-based proteomic data across 38 cohorts (n up to 78,664) and identified 24,738 fine-mapped pQTL credible sets for 1,116 proteins, including 5,040 cis and 19,698 trans pQTLs. Machine-learning effector-gene assignment for trans-pQTLs revealed enriched pathways and cell types that regulate plasma proteins, with N-linked glycosylation and liver/hepatocyte signals prominent. Systematic causal inference and triangulation with observational biomarker studies identified candidate drug targets and repurposing signals (e.g., TYK2, furin) but also showed limited concordance between cis genetic instruments and measured protein–disease associations. Conclusion: Large-scale multi-cohort proteogenomics uncovers widespread distal genetic regulation of the circulating proteome, identifies biological pathways and tissues that shape plasma protein levels (notably N-linked glycosylation and hepatic/immune contributors), and provides genetic evidence to prioritize biomarkers and drug targets while highlighting discordance between genetic and observational signatures that requires careful interpretation. Music: Enjoy the music based on this article at the end of the episode. Article title: Multi-cohort proteogenomic analyses reveal genetic effects across the proteome and diseasome First author: Koprulu M Journal: Cell DOI: 10.1016/j.cell.2026.03.049 Reference: Koprulu M., Smith-Byrne K., Ferolito B.R., et al., 2026. 
Multi-cohort proteogenomic analyses reveal genetic effects across the proteome and diseasome. Cell 189, 3339–3357. https://doi.org/10.1016/j.cell.2026.03.049 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Episode link: https://basebybase.com/episodes/multi-cohort-proteogenomics-pqtl-diseasome QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-02. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Audited the transcript sections describing trans-pQTL dominance, effector-gene mapping, N-linked glycosylation, tissue/cell-type enrichment, MR concordance issues, and clinical exemplars (TYK2 for RA, NT-proBNP for heart failure, extracellular furin). - transcript topics: Trans-pQTL paradigm and global regulatory architecture; Cis vs trans pQTL landscape and study scale; Olink proximity extension assay methodology; Effector gene assignment and pathway/cell-type enrichment; N-linked glycosylation as a key trans-regulatory pathway; Clinical translation: TYK2 and rheumatoid arthritis QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 6 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license ...

    23 min
  3. 1d ago

    382: How animal blood cells evolved from unicellular ancestors

    Nagahata Y et al., PNAS - A transcriptome-driven reconstruction of blood cell evolution shows modern animal blood lineages arose by repurposing an ancestral unicellular toolkit. The study traces macrophage-like origins, a bilaterian mast/killer split, and later deuterostome/vertebrate innovations. Key terms: blood cell evolution, macrophage, mast cell, Fos, thymus. Study Highlights: The authors used cross-species transcriptomics and TF-focused phylogenies to show that the metazoan blood cell program traces to a premetazoan toolkit governed by Fos, producing macrophage-like initial blood cells. They infer a first major bifurcation at the origin of Bilateria that gave rise to a mast/killer lineage equipped with granular proteases specialized for antiparasitic defense. Deuterostome and vertebrate ancestors then diversified this mast lineage into T/NK and erythrocyte/thrombocyte branches while B cells emerged from the macrophage branch, and a prototypic thymus formed at tunicate gill edges. Murine hematopoiesis retains vestiges of this history: macrophage and mast potentials are widely preserved and ancient HSC-like progenitors persist. Conclusion: Blood cell diversity in animals represents a Fos-mediated repurposing of an ancestral unicellular program, with macrophage-like cells as the earliest blood lineage and subsequent bilaterian and deuterostome innovations producing mast/killer, lymphoid, and erythroid branches. Music: Enjoy the music based on this article at the end of the episode. Article title: Animals have expanded the evolutionary legacy of unicellular ancestors in blood cells First author: Nagahata Y Journal: PNAS DOI: 10.1073/pnas.2528110123 Reference: Nagahata Y, Ishidae T, Satou Y, Nishimura Y, Kaitani R, Leong JCK, Oda-Ishiie I, Carmona-Rivas M, Najle SR, Ruiz-Trillo I, Kohtsuka H, Abeg S, Ikuta K, Miura T, Kawamoto H, Casacuberta E, Ogasawara M, Irieda N. Animals have expanded the evolutionary legacy of unicellular ancestors in blood cells. PNAS. 
2026;123(23):e2528110123. doi:10.1073/pnas.2528110123 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Episode link: https://basebybase.com/episodes/base-by-base-382-blood-cell-evolution QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-02. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Audited the spoken content describing the article’s core evolutionary timeline for blood cells, Fos-driven regulatory origins, macrophage-like ancestries, bilaterian mast/killer divergence, lineage branching to T/NK and erythrocyte/thrombocyte, B cell origin from macrophages, thymus evolution at gill edges, and implica - transcript topics: Origin of blood cells at the metazoan root and Fos-driven premetazoan toolkit; Cross-species transcriptomics and transcription factor phylogeny; Macrophage-like primitive blood cells; Divergence of mast/killer lineage at Bilateria origin and granzyme-containing cells; Erythrocyte/thrombocyte lineage derived from mast/killer cells; B cells arising from macrophages QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 6 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited:

    26 min
  4. 2d ago

    381: Light-written spatial barcodes enable tunable multiomic sequencing (BALI)

    Battistoni G et al., PNAS - This paper presents BALI, a light-driven method that writes combinatorial DNA spatial barcodes directly onto biomolecules in tissue by iterative photocleavage and ligation, enabling user-defined, scalable spatial profiling of RNA, chromatin accessibility, or both from the same section and automation via a LightScribe instrument. Key terms: spatial multiomics, photocaged ligation, BALI, LightScribe, chromatin accessibility. Study Highlights: BALI uses a photocaged ligation root and patterned UV illumination to direct iterative DNA ligations that assemble combinatorial spatial barcodes in situ. The method achieves tunable spatial resolution down to ~3 µm and can scale barcode complexity from a few regions to theoretical millions by increasing barcode digits. As proof of concept, the authors profiled transcriptomes and chromatin accessibility in defined regions of embryonic and adult mouse brain and combined both readouts in a single-section multiomic workflow, showing concordance with established datasets. They also built the LightScribe instrument to automate combinatorial barcode writing and demonstrated automated encoding of hundreds of regions. Conclusion: BALI couples light-directed combinatorial ligation with standard sequencing workflows to offer histology-aware, tunable, and scalable spatial multiomic profiling with subcellular resolution and an accessible automation path, enabling targeted high-throughput studies across large sample sets. Music: Enjoy the music based on this article at the end of the episode. Article title: Spatially tuneable multiomic sequencing using light-driven combinatorial barcoding of molecules in tissues First author: Battistoni G Journal: PNAS DOI: 10.1073/pnas.2527896123 Reference: Battistoni G, Torres-Garcia S, Sia CY, Corriero S, Boquetale C, Williams E, et al. Spatially tuneable multi-omics sequencing using light-driven combinatorial barcoding of molecules in tissues. Proc Natl Acad Sci U S A. 
2026;123(21):e2527896123. doi:10.1073/pnas.2527896123 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Episode link: https://basebybase.com/episodes/bali-light-driven-combinatorial-barcoding QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-01. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Audited the transcript sections describing the BALI method, UV uncaging, iterative ligation, LightScribe automation, and multiomic validation (RNA and ATAC) as well as scalability and cost considerations; compared these claims with the canonical article. - transcript topics: Spatial omics trade-offs and the need for histology-driven boundaries; BALI: Barcoding by Activated Linkage of Indexes; UV uncaging, ligation cycles, and barcode encoding; LightScribe automation with DMD mirrors; RNA profiling in embryonic mouse brain regions; Chromatin accessibility (ATAC) profiling in brain tissue QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 5 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license Factual Items Audited: -...

    26 min
  5. 5d ago

    380: Prime-SGE maps drug-resistance variants at scale

    Abadie FMC et al., Cell Genomics - Abadie et al. present prime‑SGE, a pooled prime‑editing framework that installs thousands of precise point mutations across multiple oncogenes and identifies drug‑resistance variants by sequencing integrated pegRNAs after positive‑selection with kinase inhibitors. The method resolved known resistance mutations (e.g., EGFR C797S, KRAS G12 variants), uncovered less-characterized candidates, compared resistance landscapes across covalent and non‑covalent EGFR inhibitors, and validated resistant edits in vivo. Key terms: prime editing, drug resistance, EGFR, KRAS, multiplex screening. Study Highlights: Prime‑SGE uses libraries of barcoded pegRNAs/epegRNAs delivered at low MOI into PEmax‑expressing, MLH1‑knockout cell lines to program thousands of point mutations and read out variant abundance by sequencing integrated guides after drug selection. In pooled screens across eight oncogenes and three EGFR inhibitors, prime‑SGE recovered established resistance mutations (EGFR C797S, KRAS G12 variants) and identified less-characterized hits (e.g., EGFR Q791, Y801). Distinct resistance landscapes emerged for covalent versus non‑covalent EGFR inhibitors, and barcodes showed many resistant clones arose from independent editing events. Prime‑edited resistant cells formed tumors in osimertinib-treated xenografts, demonstrating in vivo relevance. Conclusion: Prime‑SGE enables scalable, positive‑selection profiling of thousands of precise point mutations across the genome to identify and compare drug‑resistance variants, though sensitivity is limited by variable prime editing efficiency. The approach can prioritize resistance variants for follow-up and inform inhibitor development and choice. Music: Enjoy the music based on this article at the end of the episode. 
Article title: A multiplex, prime editing framework for identifying drug resistance variants at scale First author: Abadie FMC Journal: Cell Genomics DOI: 10.1016/j.xgen.2026.101167 Reference: Abadie FMC, Suiter CC, Smith NT, et al. A multiplex, prime editing framework for identifying drug resistance variants at scale. Cell Genomics. 2026;6:101167. doi:10.1016/j.xgen.2026.101167 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Episode link: https://basebybase.com/episodes/prime-sge-drug-resistance-variants QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-05-29. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Audited the transcript's sections describing Prime-SGE concept, the scale of edits, key resistance mutations (EGFR C797S, KRAS G12 variants, Q791, Y801), inhibitor contexts (osimertinib, sunvozertinib, CH7233163), in vivo xenograft validation, and limitations (editing efficiency, false negatives). QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 8 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license Factual Items Audited: - Prime-SGE enables multiplexed installation of thousands of precise edits across multiple genes with readout by integrated pegRNA barcodes - Large-scale scr...

    12 min
  6. May 27

    379: Long reads reveal hidden structural and repeat variation in autism

    Mortazavi M et al., Cell Genomics - PaperCast Base by Base discusses a long-read whole-genome sequencing study of 267 individuals from 63 families that increased detection of structural variants and tandem repeats, resolved complex rearrangements, linked repeat expansions to methylation at FMR1, and estimated rare-variant contributions to ASD heritability. Key terms: long-read sequencing, structural variants, tandem repeats, autism, methylation. Study Highlights: The authors performed long-read WGS (PacBio HiFi and Oxford Nanopore) on 267 individuals and integrated calls with prior short-read data, increasing detection of gene-disrupting SVs by 33% and TRs by 38%. They discovered novel exonic de novo and somatic-mosaic SVs and characterized a previously undescribed class of nested DUP-DEL complex rearrangements. Joint phasing and methylation analysis identified deletions affecting imprinted genes (e.g., ADNP2) and showed that intermediate FMR1 CGG expansions (35–54 repeats) associate with allele-specific hypermethylation. Burden and heritability analyses indicate rare SVs, TRs, and damaging SNVs together explain a measurable fraction of ASD heritability, though power is limited by sample size. Conclusion: Long-read WGS uncovers substantial previously hidden structural and repeat variation and enables combined phased genetic and methylation analysis to improve functional interpretation in ASD, but larger cohorts and deeper coverage are needed to refine associations and heritability estimates. Music: Enjoy the music based on this article at the end of the episode. Article title: Long-read genome sequencing improves detection and functional interpretation of structural and repeat variants in autism First author: Mortazavi M Journal: Cell Genomics DOI: 10.1016/j.xgen.2026.101186 Reference: Mortazavi M., Guevara J., Diaz J., et al., 2026. Long-read genome sequencing improves detection and functional interpretation of structural and repeat variants in autism. 
Cell Genomics 6, 101186. https://doi.org/10.1016/j.xgen.2026.101186 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Episode link: https://basebybase.com/episodes/long-read-wgs-autism-structural-repeat-variants QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-05-27. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Substantively audited portions describing LR-WGS methodology, SV/TR detection gains, mosaic and de novo SVs (STK33), large balanced rearrangements, nested DUP-DEL SVs, imprinting (ADNP2), FMR1 gray-zone methylation, ASD heritability, and study limitations/future directions. - transcript topics: LR-WGS methods and methylation calling; Structural variants and tandem repeats detection gains; Mosaic and de novo SVs (STK33) and functional impact; Complex DUP-DEL rearrangements; Imprinted genes and ADNP2; FMR1 CGG repeats and methylation, XCI independence QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 7 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license Factual Items Audited: - Cohort: LR-WGS perform...

    27 min
  7. May 26

    378: Dominant-negative PSMB8 variants stall immunoproteasome assembly

    Wijngaard R et al., The American Journal of Human Genetics - Researchers describe seven individuals with monoallelic PSMB8 missense variants that impair immunoproteasome assembly, causing early-onset immunodeficiency and variable systemic inflammation via a dominant-negative mechanism. Key terms: PSMB8, immunoproteasome, PRAAS-ID, immunodeficiency, proteasome assembly. Study Highlights: Seven individuals from five families carrying distinct monoallelic PSMB8 variants presented with neonatal-onset immunodeficiency, B cell lymphopenia, hypogammaglobulinemia, and variable inflammatory disease. Structural modeling predicted destabilization of proteasome interfaces, and complexome profiling plus native assays showed reduced fully assembled immunoproteasomes with accumulation of a ∼440-kDa assembly intermediate. Mutant PSMB8 precursors accumulated, incorporation into 20S/26S complexes was reduced, immunoproteasome-specific activity decreased, and integrated stress response genes were induced. These data support a shared dominant-negative mechanism disrupting immunoproteasome biogenesis and immune signaling. Conclusion: Monoallelic PSMB8 missense variants impair incorporation of β5i into assembling immunoproteasomes, stalling biogenesis, reducing immunoproteasome abundance and activity, and producing clinically variable immunodeficiency with systemic inflammation consistent with PRAAS-ID. Music: Enjoy the music based on this article at the end of the episode. Article title: Monoallelic PSMB8 variants cause PRAAS with immunodeficiency through impaired immunoproteasome assembly First author: Wijngaard R Journal: The American Journal of Human Genetics DOI: 10.1016/j.ajhg.2026.04.015 Reference: Wijngaard R., van der Made C.I., Kalkan Uçar S., et al. Monoallelic PSMB8 variants cause PRAAS with immunodeficiency through impaired immunoproteasome assembly. Am J Hum Genet. 2026;113:1–19. 
doi:10.1016/j.ajhg.2026.04.015 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Episode link: https://basebybase.com/episodes/monoallelic-psmb8-praas-id-immunoproteasome-assembly QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-05-26. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Substantive audit of immunoproteasome biology, dominant-negative mechanism of monoallelic PSMB8 variants, complexome profiling findings (440-kDa assembly intermediate, reduced IP abundance), functional consequences (IP activity reduction, ISR activation), and clinical implications described in the transcript. - transcript topics: Immunoproteasome structure and SP/IP distinction; Dominant-negative PSMB8 variants and mechanism; Complexome profiling methodology and IP assembly intermediates; Impaired IP biogenesis and 440-kDa intermediate; ISR activation and immune signaling effects; Clinical features: B cell lymphopenia, hypogammaglobulinemia, leukocyte inclusions QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 6 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi - article_title - article_journal - license Factual Items Audited: - Monoallelic PSMB8 vari...

    23 min
  8. May 26

    377: ProteomeLM — proteome-scale language modeling for interactomes and essential genes

    Malbranke C et al., Proceedings of the National Academy of Sciences (PNAS) - ProteomeLM is a transformer-based language model trained on complete proteomes that produces contextualized protein embeddings and attention signals which recover protein–protein interactions unsupervised and support supervised PPI and gene essentiality prediction across diverse taxa. Key terms: proteome language model, protein–protein interactions, gene essentiality, ProteomeLM, deep learning. Study Highlights: ProteomeLM was trained on ~32,000 proteomes using ESM‑C embeddings and a custom polar loss to reconstruct masked protein embeddings in proteome context. Its attention heads encode protein–protein interactions without supervision and distinguish direct physical binding, complex membership, and broader functional associations. As a fast first-pass filter it outperforms amino-acid coevolution (DCA) in recall while reducing compute by orders of magnitude. Downstream supervised models—ProteomeLM-PPI and ProteomeLM-Ess—achieve state-of-the-art cross-species PPI prediction and strong gene essentiality prediction that generalizes to held-out and synthetic minimal genomes. Conclusion: Representing proteins in whole-proteome context yields interpretable attention signals that capture functional and physical relationships, enabling rapid, accurate interactome screening and improved gene essentiality prediction across the tree of life. Music: Enjoy the music based on this article at the end of the episode. Article title: ProteomeLM: A proteome-scale language model enables accurate and rapid prediction of protein–protein interactions and gene essentiality across taxa First author: Malbranke C Journal: Proceedings of the National Academy of Sciences (PNAS) DOI: 10.1073/pnas.2524201123 Reference: Malbranke C, Zalaffi GP, Bitbol A-F. ProteomeLM: A proteome-scale language model enabling accurate and rapid prediction of protein–protein interactions and gene essentiality across taxa. 
Proc Natl Acad Sci U S A. 2026;123:e2524201123. doi:10.1073/pnas.2524201123 License: This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/ Episode link: https://basebybase.com/episodes/proteomelm-interactomes-essentiality QC: This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-05-26. QC Scope: - article metadata and core scientific claims from the narration - excludes analogies, intro/outro, and music - transcript coverage: Audited substantive scientific content in transcript: ProteomeLM architecture, functional encoding, polar loss, unsupervised PPI via attention, speed/screening benefits, supervised PPI (ProteomeLM-PPI), gene essentiality predictions (ProteomeLM-Ess), and cross-species/minimal cells. - transcript topics: ProteomeLM architecture and training on whole proteomes; Functional encoding using orthology (OrthoDB); Polar loss and avoiding reliance on coarse functional encoding; Attention coefficients encoding protein-protein interactions (PPI) in unsupervised manner; Unsupervised PPI detection and protein complex membership; Speed and scalability of whole-interactome screening vs DCA QC Summary: - factual score: 10/10 - metadata score: 10/10 - supported core claims: 6 - claims flagged for review: 0 - metadata checks passed: 4 - metadata issues found: 0 Metadata Audited: - article_doi...

    26 min


