Code & Cure

Vasanth Sarathy & Laura Hagopian

Decoding health in the age of AI. Hosted by an AI researcher and a medical doctor, this podcast unpacks how artificial intelligence and emerging technologies are transforming how we understand, measure, and care for our bodies and minds. Each episode digs into a real-world topic to ask not just what’s new, but what’s true—and what’s at stake as healthcare becomes increasingly data-driven. If you're curious about how health tech really works—and what it means for your body, your choices, and your future—this podcast is for you. We’re here to explore ideas—not to diagnose or treat. This podcast doesn’t provide medical advice.

  1. 23H AGO

    #33 - Patients Don’t Talk Like Textbooks

    What if the most confident answer in the room is also the most misleading? Large language models can ace medical exams, yet falter when faced with a real person’s messy, incomplete story. In this episode, we explore how that gap plays out in one of medicine’s highest-stakes decisions: triage.

    Drawing on Laura’s experience in emergency medicine and Vasanth’s background in AI research, we unpack a new study in which laypeople role-played both routine and high-risk conditions and turned to leading LLMs for advice. The surprising twist? Tiny shifts in phrasing produced opposite recommendations—“rest at home” versus “go to the ER”—revealing how sensitive these systems are to prompts, and how an agreeable tone can drown out critical clinical signals.

    We take you inside the exam room to contrast what clinicians actually do. Real diagnosis isn’t a single question and answer—it’s an evolving process. Doctors gather a history that unfolds with each response, test competing hypotheses, and scan for subtle red flags and nonverbal cues that never show up in a chat window. From the ominous “worst headache of my life” to abdominal pain that could signal gallstones—or a heart attack—Laura explains how risk-first thinking and strategic follow-ups shape safe decisions. Meanwhile, Vasanth breaks down how preference-tuned models are trained to satisfy users, not challenge them—and why linguistic confidence can increase even as clinical accuracy declines.

    The study’s findings are sobering: models struggled to identify key conditions, and their triage decisions were no better than basic symptom checkers. But this isn’t a story of hype or doom—it’s about design. Reliable medical AI must interrogate before it interprets. That means structured red-flag checks, resistance to user-led anchors like “maybe it’s just stress,” and clear, actionable next steps instead of overwhelming option lists. Calibrated uncertainty, transparent reasoning, and human oversight can transform AI from a risky decider into a valuable assistant.

    If you care about digital health, safe triage, and the future of human-AI collaboration in medicine, this conversation offers a grounded look at both the limits—and the real promise—of these tools. If this episode resonated, follow the show, share it with a colleague, and leave a quick review to help more listeners discover Code & Cure.

    Reference: Reliability of LLMs as medical assistants for the general public: a randomized preregistered study, Andrew M. Bean et al., Nature Medicine (2026)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/

    30 min
  2. FEB 19

    #32 - When Data Isn’t Better: Rethinking Fertility Tracking

    What if the most reliable ways to track fertility are also the simplest? In this episode, we examine the science of ovulation timing and hold modern wearables to a high standard, comparing passive temperature and vital-sign data with established methods like LH surge testing and cervical mucus observation. Drawing on perspectives from a cognitive scientist and an emergency physician, we explain what each method actually measures, how well it performs outside the lab, and where convenience falls short of accuracy.

    We begin by clarifying the fertile window and the underlying physiology, then connect that biology to signals people can track at home. Changes in cervical mucus provide a strong, real-time indicator of peak fertility. Urine LH strips offer a clear 24-to-36-hour advance signal at low cost. Basal body temperature can confirm that ovulation has already occurred, but it is less helpful for predicting timing in advance. Against this foundation, we review a meta-analysis of wearable data showing that temperature remains the strongest predictor, while heart rate and heart-rate variability contribute only modest improvements. The conclusion is straightforward: wearables can approximate existing signals, but they do not clearly outperform simple tools for timing intercourse, insemination, or pregnancy avoidance.

    Along the way, we challenge the idea that more data and a paid app automatically lead to better outcomes. We weigh privacy risks, cost, and false confidence against the accessibility of test strips and the high signal value of mucus observations. The takeaway is a practical hierarchy: use LH strips and cervical mucus as primary guides, add calendar context and basal temperature if useful, and treat wearables as optional conveniences rather than a definitive solution. Women’s health deserves thoughtful innovation, and sometimes real progress comes from choosing what works, not what is marketed most aggressively.

    If this episode resonated, follow the show, share it with a friend navigating fertility, and leave a review with your experience and what has worked best for you.

    Reference: The diagnostic accuracy of wearable digital technology in detecting fertility window and menstrual cycles: a systematic review and Bayesian network meta-analysis, Yue Shi et al., Nature NPJ Digital Medicine (2026)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/

    20 min
  3. FEB 12

    #31 - How Retrieval-Augmented AI Can Verify Clinical Summaries

    Fluent summaries that cannot prove their claims are a hidden liability in healthcare, quietly eroding clinician trust and wasting time. In this episode, we walk through a practical system that replaces “sounds right” narratives with evidence-backed summaries by pairing retrieval-augmented generation with a large language model that serves as a judge. Instead of asking one AI to write and police itself, the work is divided: one model drafts the summary, while another breaks it into atomic claims, retrieves supporting chart excerpts, and issues clear verdicts of supported, not supported, or insufficient, with explanations clinicians can review.

    We explain why generic summarization often breaks down in clinical settings and how retrieval-augmented generation keeps the model grounded in the patient’s actual record. The conversation digs into subtle but common failure modes, including when a model ignores retrieved evidence, when a sentence mixes correct and incorrect facts, and when wording implies causation that the record does not support. A concrete example brings this to life: a claim that a patient was intubated for septic shock is overturned by operative notes showing intubation for a procedure, with the system flagging the discrepancy and guiding a precise correction. That is not just higher accuracy; it is accountability you can audit later.

    We also explore a deeper layer of the problem: argumentation. Clinical care is not just a list of facts, but the relationships between them. By evaluating claims alongside their evidence, surfacing contradictions, and pushing for precise language, the system helps generate summaries that reflect real clinical reasoning rather than confident guessing. The payoff is less time spent chasing errors, more time with patients, and a defensible trail for quality review and compliance.

    If you care about chart review, clinical documentation, retrieval-augmented generation, and building AI systems clinicians can trust, this episode offers practical takeaways.

    Reference: Verifying Facts in Patient Care Documents Generated by Large Language Models Using Electronic Health Records, Philip Chung et al., NEJM AI (2025)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/
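    For the curious, the draft-then-verify loop described above can be caricatured in a few lines of Python. This is a toy sketch, not the study’s system: sentence splitting stands in for claim extraction, keyword overlap stands in for both retrieval and the LLM judge, the toy issues only “supported” or “insufficient” verdicts, and every function name and note is invented for illustration.

```python
# Toy sketch of a verify-before-trust loop: split a draft summary into
# claims, retrieve the best-matching chart excerpt, and issue a verdict.
# Keyword overlap stands in for real retrieval and for the LLM judge.

def split_into_claims(summary: str) -> list[str]:
    """Treat each sentence as one atomic claim (a big simplification)."""
    return [s.strip() for s in summary.split(".") if s.strip()]

def retrieve(claim: str, chart_notes: list[str], k: int = 1) -> list[str]:
    """Rank chart excerpts by crude word overlap with the claim."""
    claim_words = set(claim.lower().split())
    return sorted(chart_notes,
                  key=lambda note: len(claim_words & set(note.lower().split())),
                  reverse=True)[:k]

def judge(claim: str, evidence: list[str]) -> str:
    """'supported' if a majority of the claim's words appear in some
    retrieved excerpt, else 'insufficient' (no negation handling here)."""
    words = set(claim.lower().split())
    threshold = len(words) // 2 + 1
    for note in evidence:
        if len(words & set(note.lower().split())) >= threshold:
            return "supported"
    return "insufficient"

def verify_summary(summary: str, chart_notes: list[str]) -> dict[str, str]:
    """Map each atomic claim in the summary to a verdict."""
    return {c: judge(c, retrieve(c, chart_notes))
            for c in split_into_claims(summary)}

notes = ["the patient received vancomycin for a wound infection"]
draft = "The patient received vancomycin. The patient climbed Everest."
print(verify_summary(draft, notes))
```

    Even this caricature shows the design point from the episode: the verdict comes with the claim and its evidence attached, so a reviewer can audit why something was flagged rather than trusting a fluent paragraph wholesale.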

    24 min
  4. FEB 5

    #30 - From Reddit To Rescue: Real-Time Signals Of The Opioid Crisis

    What if the earliest warning sign of an opioid overdose surge isn’t locked inside a delayed report, but unfolding in real time on Reddit? In this episode, we explore how social media conversations, especially pseudonymous, community-led forums, can reveal emerging overdose risks before traditional surveillance systems catch up. We unpack research that analyzed more than a decade of posts to show how even simple drug mentions sharpened forecasts of overdose death rates. The signal was especially strong for fentanyl, exposing where existing public health tools lag and why online communities often see danger first.

    Along the way, we explain the mechanics in plain language: how time-series models respond faster than surveys, why subreddit structure filters noise, and how historical archives enable rigorous validation. But it doesn’t stop at counting mentions. We dig into what happens when posts are classified by lived experience: overdose stories, sourcing concerns, or test strip discussions. We also examine what broke during COVID, when behavior and access shifted overnight, and how to detect those regime changes before models start to fail.

    The takeaway is urgent and practical. Social data won’t replace public health surveillance, but it can make it fast enough to save lives. We share a field-ready playbook for turning online signals into timely interventions, and show how feedback from the same communities can explain why a response worked—or didn’t—so teams can adapt quickly. If you care about real-time epidemiology, harm reduction, and responsible AI in healthcare, this conversation connects raw text to real-world impact.

    Reference: Monitoring the opioid epidemic via social media discussions, Delaney A. Smith et al., Nature NPJ Digital Health (2025)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/
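    The “leading indicator” idea can be illustrated with a toy lag search: slide a mention series against a death series and keep the lag with the highest correlation. All numbers and names below are synthetic and invented for illustration; the study’s time-series models, and its Reddit corpus, are far richer than this.

```python
# Toy lag search: how far do social-media mentions lead overdose deaths?
# Synthetic data: the death series simply echoes mentions two periods later.

def lagged_correlation(mentions, deaths, lag):
    """Pearson correlation between mentions[t] and deaths[t + lag]."""
    x = mentions[:len(mentions) - lag] if lag else mentions
    y = deaths[lag:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

mentions = [3, 5, 9, 14, 13, 10, 8, 12, 18, 20]   # weekly drug mentions
deaths   = [1, 1, 3, 5, 9, 14, 13, 10, 8, 12]     # mentions, shifted by two

# Pick the lag (0..3 periods) at which mentions best anticipate deaths.
best_lag = max(range(4), key=lambda lag: lagged_correlation(mentions, deaths, lag))
print(best_lag)
```

    In real surveillance the same idea runs in the other direction: once you know the lead time, a spike in mentions today becomes an early warning for the weeks ahead.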

    19 min
  5. JAN 29

    #29 - AI Hype Meets Hospital Reality

    What really happens when a “smart” system steps into the operating room and collides with the messy, time-pressured reality of clinical care? In this episode, we unpack a multi-center pilot that streamed audio and video from live surgeries to fuel safety checklists, flag cases for review, and promise rapid, actionable insight. What emerged instead was a clear-eyed lesson in the gap between aspiration and execution. Across four fault lines, the story shows where clinicians’ expectations of AI ran ahead of what today’s systems can reliably deliver, and what that means for patient safety.

    We begin with the promise. Surgeons and care teams envisioned near-instant post-case summaries: what went well, what raised concern, and which patients might be at risk. The reality looked different. Training demands, configuration work, and brittle workflows made it clear that AI is anything but plug-and-play. We explore why polished language can be mistaken for intelligence, why models need the right tools to reason effectively, and why moving AI from one hospital to another is closer to a redesign than a simple deployment.

    Then we follow the data. When it takes six to eight weeks to turn raw footage into usable insight, the value of learning forums like morbidity and mortality conferences quickly erodes. Privacy protections, de-identification, and quality control matter—but without pipelines built for speed and trust, insights arrive too late to change practice. We contrast where the system delivered real value, such as checklists and procedural signals, with where it fell short: predicting post-operative complications and producing research-ready datasets.

    Throughout the conversation, we argue for a minimum clinically viable product: tightly scoped use cases, early and deep involvement from surgeons and nurses, and data flows that respect governance without stalling learning. AI can strengthen patient safety and team performance—but only when expectations align with capability and operations are designed for real clinical tempo.

    If this resonates, follow the show, share it with a colleague, and leave a review with one takeaway you’d apply in your own clinical setting.

    Reference: Expectations vs Reality of an Intraoperative Artificial Intelligence Intervention, Melissa Thornton et al., JAMA Surgery (2026)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/

    26 min
  6. JAN 22

    #28 - How AI Confidence Masks Medical Uncertainty

    Can you trust a confident answer, especially when your health is on the line? This episode explores the uneasy relationship between language fluency and medical truth in the age of large language models (LLMs). New research asks these models to rate their own certainty, but the results reveal a troubling mismatch: high confidence doesn’t always mean high accuracy, and in some cases, the least reliable models sound the most sure.

    Drawing on her ER experience, Laura illustrates how real clinical care embraces uncertainty—listening, testing, adjusting. Meanwhile, Vasanth breaks down how LLMs generate their fluent responses by predicting the next word, and why their self-reported “confidence” is just more language, not actual evidence.

    We contrast AI use in medicine with more structured domains like programming, where feedback is immediate and unambiguous. In healthcare, missing data, patient preferences, and shifting guidelines mean there’s rarely a single “right” answer. That’s why fluency can mislead, and why understanding what a model doesn’t know may matter just as much as what it claims. If you’re navigating AI in healthcare, this episode will sharpen your eye for nuance and help you build stronger safeguards.

    Reference: Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study, Mahmud Omar et al., JMIR (2025)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/

    26 min
  7. JAN 15

    #27 - Sleep’s Hidden Forecast

    What if one night in a sleep lab could offer a glimpse into your long-term health? Researchers are now using a foundation model trained on hundreds of thousands of hours of sleep data to do just that. By predicting the next five seconds of a polysomnogram, the model learns the rhythms of sleep and, with minimal fine-tuning, begins estimating risks for conditions like Parkinson’s, dementia, heart failure, stroke, and even some cancers.

    We break down how it works: during a sleep study, sensors capture brain waves (EEG), eye movements (EOG), muscle tone (EMG), heart rhythms (ECG), and breathing. The model compresses these multimodal signals into a reusable format, much like how language models process text. Add a small neural network, and suddenly those sleep signals can help predict disease risk up to six years out. The associations make clinical sense: EEG patterns are more telling for neurodegeneration, respiratory signals flag pulmonary issues, and cardiac rhythms hint at circulatory problems. But the scale of what’s possible from a single night’s data is remarkable.

    We also tackle the practical and ethical questions. Since sleep lab patients aren’t always representative of the general population, we explore issues of selection bias, fairness, and external validation. Could this model eventually work with consumer wearables that capture less data but do so every night? And what should patients be told when risk estimates are uncertain or only partially actionable?

    If you’re interested in sleep science, AI in healthcare, or the delicate balance of early detection and patient anxiety, this episode offers a thoughtful look at what the future might hold—and the trade-offs we’ll face along the way.

    Reference: A multimodal sleep foundation model for disease prediction, Rahul Thapa, Nature (2026)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/

    24 min
  8. JAN 8

    #26 - How Your Phone Keyboard Signals Your State Of Mind

    What if your keyboard could reveal your mental health? Emerging research suggests that how you type—not what you type—could signal early signs of depression. By analyzing keystroke patterns like speed, timing, pauses, and autocorrect use, researchers are exploring digital biomarkers that might quietly reflect changes in mood.

    In this episode, we break down how this passive tracking compares to traditional screening tools like the PHQ. While questionnaires offer valuable insight, they rely on memory and reflect isolated moments. In contrast, continuous keystroke monitoring captures real-world behaviors—faster typing, more pauses, shorter sessions, and increased autocorrect usage—all patterns linked to mood shifts, especially when anxiety overlaps with depression.

    We discuss the practical questions this raises: How do we account for personal baselines and confounding factors like time of day or age? What’s the difference between correlation and causation? And how can we design systems that protect privacy while still offering clinical value? From privacy-preserving on-device processing to broader behavioral signals like sleep and movement, this conversation explores how digital phenotyping might help detect depression earlier—and more gently.

    If you’re curious about AI in healthcare, behavioral science, or the ethics of digital mental health tools, this episode lays out both the potential and the caution needed.

    Reference: Effects of mood and aging on keystroke dynamics metadata and their diurnal patterns in a large open-science sample: A BiAffect iOS study, Claudia Vesel et al., J Am Med Inform Assoc (2020)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/
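    To make “how you type, not what you type” concrete, here is a minimal sketch of the kind of timing features such systems can derive from keystroke metadata. The feature names and the two-second pause threshold are invented for illustration; they are not the BiAffect study’s actual metrics.

```python
# Minimal sketch: derive typing-dynamics features from key-press timestamps.
# Only timing is used; the content of what was typed never appears.

def keystroke_features(timestamps, pause_threshold=2.0):
    """Summarize one typing session given key-press times in seconds."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        # Median inter-key interval: overall typing speed.
        "median_iki": sorted(intervals)[len(intervals) // 2],
        # Fraction of gaps longer than the threshold: hesitation / pauses.
        "pause_rate": sum(i > pause_threshold for i in intervals) / len(intervals),
        # Total session duration, first key to last.
        "session_length": timestamps[-1] - timestamps[0],
    }

# One short session: steady typing, one long pause, then two more keys.
features = keystroke_features([0.0, 0.2, 0.4, 3.0, 3.2])
print(features)
```

    A real system would compare such features against a person’s own baseline over weeks, which is exactly where the episode’s questions about confounders and personal variation come in.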

    20 min
5 out of 5 (5 Ratings)
