Code & Cure

Vasanth Sarathy & Laura Hagopian

Decoding health in the age of AI. Hosted by an AI researcher and a medical doctor, this podcast unpacks how artificial intelligence and emerging technologies are transforming how we understand, measure, and care for our bodies and minds. Each episode digs into a real-world topic to ask not just what’s new, but what’s true—and what’s at stake as healthcare becomes increasingly data-driven. If you're curious about how health tech really works—and what it means for your body, your choices, and your future—this podcast is for you. We’re here to explore ideas—not to diagnose or treat. This podcast doesn’t provide medical advice.

  1. 10H AGO

    #36 - Should A Chatbot Ever Refuse To Reassure You?

    What if the chatbot that always has an answer is actually making anxiety worse? For people living with obsessive-compulsive disorder (OCD), instant, endless reassurance can feel helpful in the moment while quietly strengthening the very cycle that keeps OCD going. In this episode, we explore why AI chatbots and large language models are designed to be responsive, agreeable, and supportive—and how those same qualities can unintentionally fuel reassurance seeking, compulsive checking, and avoidance instead of real relief.

    We break down OCD in clear, practical terms: intrusive thoughts trigger fear, compulsions bring temporary comfort, and that short-term relief reinforces the cycle over time. Whether it shows up as repeated handwashing, constant checking, or asking the same question again and again, OCD often centers on the desperate need to eliminate uncertainty. That is exactly where evidence-based treatment takes a different path. We discuss exposure and response prevention (ERP), the gold-standard therapy that helps people face doubt without falling back on rituals, and why a general-purpose chatbot may accidentally validate the opposite by offering reassurance, endorsing avoidance, or helping users “pivot” toward the answer they were hoping to hear.

    We also look at the broader mental health challenge now that people are already turning to AI for support. What responsibility do clinicians, AI companies, and regulators have? We argue that clinicians should ask directly about chatbot use, and we examine what meaningful guardrails might look like—from detecting repetitive reassurance loops to refusing to continue harmful patterns. Using a real-world germ-related prompting example, we show where chatbot advice can be useful and where it can slip into enabling OCD. This conversation will change how you think about AI, anxiety, and the line between support and harm.

    Reference: A transdiagnostic model for how general purpose AI chatbots can perpetuate OCD and anxiety disorders, Golden and Aboujaoude, Nature npj Digital Medicine (2026)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/
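
As a thought experiment on the guardrails mentioned above, here is a rough sketch of how a chatbot might flag a repetitive reassurance loop before responding. The phrase list, threshold, and function names are illustrative placeholders, not anything proposed in the referenced paper.

```python
# Crude reassurance-loop detector: if this and several earlier messages in the
# session all ask for reassurance, stop re-reassuring and redirect instead.
# Phrases and thresholds are illustrative only, not a clinically validated rule.

REASSURANCE_PHRASES = [
    "are you sure", "are you certain", "can you promise",
    "just to be safe", "but what if",
]

def seeks_reassurance(message: str) -> bool:
    """Crude check for reassurance-seeking language in one message."""
    text = message.lower()
    return any(phrase in text for phrase in REASSURANCE_PHRASES)

def reassurance_loop(history: list[str], new_message: str, max_repeats: int = 3) -> bool:
    """True once the same kind of reassurance request has recurred several times."""
    if not seeks_reassurance(new_message):
        return False
    prior = sum(seeks_reassurance(m) for m in history)
    return prior + 1 > max_repeats

history = [
    "Are you sure I didn't get sick from touching that doorknob?",
    "Are you certain the doorknob couldn't have made me sick?",
    "Can you promise I won't get sick from the doorknob?",
]
if reassurance_loop(history, "Just to be safe, are you sure the doorknob was fine?"):
    print("Loop detected: redirect gently instead of offering more reassurance.")
```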

    19 min
  2. MAR 12

    #35 - How AI Image Generators Portray Substance Use Disorder

    What does an AI-generated image of addiction look like, and why does it so often default to darkness, isolation, and despair? As AI tools make it easier than ever to produce visuals for health education, those same tools can unintentionally reinforce stigma about substance use disorder.

    In this episode, we explore how AI image generators shape the way addiction is portrayed. Laura brings the perspective from emergency medicine and digital health, where substance use disorder is part of everyday clinical reality and where language and imagery can influence how patients are perceived. Vasanth breaks down the technical side, explaining how diffusion models create images by gradually denoising noise into structured visuals, guided by text prompts that steer what the model produces. That process is powerful, but it also means biases from internet training data and the connotations embedded in words can compound. The result? AI outputs that repeatedly frame addiction through dramatic “rock bottom” scenes, lone figures, and visual cues that unintentionally reinforce shame rather than understanding.

    We also look at research that systematically tests prompts and applies best-practice guidelines for more respectful depictions. The difference is striking: fewer stigmatizing signals, more human-centered imagery, and practical guardrails such as avoiding drug paraphernalia and moving beyond the isolated, ashamed figure. But sanitization has a price. For healthcare AI teams, the lesson is clear: visuals should be treated like clinical content, not decoration, with thoughtful review processes that protect dignity and support stigma-free health communication.

    Reference: AI-Generated Images of Substance Use and Recovery: Mixed Methods Case Study, Heley et al., JMIR AI (2026)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/
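
To make the denoising idea concrete, here is a highly simplified sketch of a reverse-diffusion sampling loop. The "denoiser" is a stand-in function rather than a trained model, so this only illustrates the control flow Vasanth describes, not real image generation.

```python
# Reverse-diffusion sampling, heavily simplified: start from pure noise and
# repeatedly subtract a predicted-noise estimate, nudged toward the prompt.

import numpy as np

def fake_denoiser(x: np.ndarray, t: int, prompt_embedding: np.ndarray) -> np.ndarray:
    """Stand-in for a trained network that predicts the noise present in x."""
    # Pretend the model believes the clean image looks like the prompt pattern,
    # so the "noise" is whatever separates x from that pattern.
    return x - prompt_embedding

def sample(prompt_embedding: np.ndarray, steps: int = 50, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    x = rng.normal(size=prompt_embedding.shape)      # start from pure Gaussian noise
    for t in reversed(range(steps)):
        predicted_noise = fake_denoiser(x, t, prompt_embedding)
        x = x - (1.0 / steps) * predicted_noise      # remove a small slice of the noise
        if t > 0:
            x = x + 0.01 * rng.normal(size=x.shape)  # keep a little stochasticity
    return x

# The "prompt" here is just a target pattern; real systems embed text with an encoder.
prompt = np.full((8, 8), 0.5)
image = sample(prompt)
print("mean distance from prompt pattern:", float(np.abs(image - prompt).mean()))
```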

    20 min
  3. MAR 5

    #34 - Inside ChatGPT Health: Promise, Peril, And Triage Failures

    What if an AI health chatbot told you to stay home when you actually needed emergency care? In this episode, we put ChatGPT Health under the microscope using a clinician-authored evaluation designed to test a critical question: can an AI safely guide people on whether to go to the ER, visit urgent care, or wait it out at home? The results reveal a troubling pattern. When symptoms fall into the “middle” of the medical spectrum—uncertain but stable—the model often sounds helpful and reasonable. But when the stakes rise and subtle warning signs matter most, its judgment becomes unreliable.

    We explore how ChatGPT Health is positioned as a privacy-focused workspace that can read personal medical records, summarize visit notes, and translate complex information into plain language. Those capabilities can be valuable for education and preparation. But triage is a different challenge entirely. It requires causal reasoning, clear thresholds, and a bias toward catching the worst-case scenario before it’s too late.

    Two case studies highlight the gap. In an asthma scenario involving rising carbon dioxide, low oxygen levels, and poor peak flow—signals that should trigger urgent care—the model labeled the situation as only moderate. In diabetes, where the difference between routine high blood sugar and life-threatening diabetic ketoacidosis demands careful nuance, templated guidance struggled to capture the clinical reality.

    The most concerning findings emerged around suicidality. Crisis response protocols are explicit: when someone expresses intent or a plan, escalation and connection to the 988 crisis line should happen immediately. Yet in several scenarios with explicit plans, those prompts never appeared—while more ambiguous statements did trigger them. Safety in healthcare can’t be optional or probabilistic.

    We break down why large language models tend to gravitate toward the statistical middle, why medicine often lives in the dangerous “long tail,” and what this means for anyone using AI health tools today. AI can help you prepare for care, understand medical information, and ask better questions. But decisions about whether to seek urgent help still demand human judgment—and clear, non-negotiable safety guardrails.

    If this conversation resonates, follow the show, share the episode with someone exploring health tech, and leave a quick review telling us one takeaway you had. What safety rule would you hard-code into an AI health system?

    Reference: ChatGPT Health performance in a structured test of triage recommendations, Ashwin Ramaswamy et al., Nature (2026)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/
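
As one possible answer to the closing question, here is a sketch of the kind of non-negotiable guardrail the episode argues for: a deterministic pre-check that escalates to the 988 crisis line whenever intent or a plan is expressed, before any model-generated reply is shown. The phrase list and wording are illustrative, not a validated clinical screen or anything tested in the referenced study.

```python
# Hard, non-probabilistic override: crisis language always wins over the
# model's own reply. Patterns below are illustrative placeholders only.

CRISIS_PATTERNS = [
    "kill myself", "end my life", "suicide plan", "want to die",
    "hurt myself", "overdose on purpose",
]

CRISIS_MESSAGE = (
    "If you are thinking about harming yourself, please call or text 988 "
    "(Suicide & Crisis Lifeline) now, or go to the nearest emergency department."
)

def triage_reply(user_message: str, model_reply: str) -> str:
    """Return the crisis message whenever crisis language is detected."""
    text = user_message.lower()
    if any(pattern in text for pattern in CRISIS_PATTERNS):
        return CRISIS_MESSAGE   # never let the generated reply suppress escalation
    return model_reply

print(triage_reply("I have a plan to end my life tonight", "Try to get some rest."))
```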

    25 min
  4. FEB 26

    #33 - Patients Don’t Talk Like Textbooks

    What if the most confident answer in the room is also the most misleading? Large language models can ace medical exams, yet falter when faced with a real person’s messy, incomplete story. In this episode, we explore how that gap plays out in one of medicine’s highest-stakes decisions: triage. Drawing on Laura’s experience in emergency medicine and Vasanth’s background in AI research, we unpack a new study where laypeople role-played both routine and high-risk conditions and turned to leading LLMs for advice. The surprising twist? Tiny shifts in phrasing produced opposite recommendations—“rest at home” versus “go to the ER”—revealing how sensitive these systems are to prompts, and how an agreeable tone can drown out critical clinical signals.

    We take you inside the exam room to contrast what clinicians actually do. Real diagnosis isn’t a single question and answer—it’s an evolving process. Doctors gather a history that unfolds with each response, test competing hypotheses, and scan for subtle red flags and nonverbal cues that never show up in a chat window. From the ominous “worst headache of my life” to abdominal pain that could signal gallstones—or a heart attack—Laura explains how risk-first thinking and strategic follow-ups shape safe decisions. Meanwhile, Vasanth breaks down how preference-tuned models are trained to satisfy users, not challenge them—and why linguistic confidence can increase even as clinical accuracy declines.

    The study’s findings are sobering: models struggled to identify key conditions, and their triage decisions were no better than basic symptom checkers. But this isn’t a story of hype or doom—it’s about design. Reliable medical AI must interrogate before it interprets. That means structured red-flag checks, resistance to user-led anchors like “maybe it’s just stress,” and clear, actionable next steps instead of overwhelming option lists. Calibrated uncertainty, transparent reasoning, and human oversight can transform AI from a risky decider into a valuable assistant. If you care about digital health, safe triage, and the future of human-AI collaboration in medicine, this conversation offers a grounded look at both the limits—and the real promise—of these tools.

    If this episode resonated, follow the show, share it with a colleague, and leave a quick review to help more listeners discover Code and Cure.

    Reference: Reliability of LLMs as medical assistants for the general public: a randomized preregistered study, Andrew M. Bean et al., Nature Medicine (2026)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/
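
A minimal sketch of the "interrogate before interpret" idea discussed above: run a structured red-flag check first, and only fall through to open-ended guidance when nothing dangerous is flagged. The symptom and red-flag pairs below are illustrative examples, not a complete or validated triage rule set.

```python
# Structured red-flag check that runs before any interpretive answer, so a
# user-supplied anchor ("probably just stress") cannot talk the system out of it.

RED_FLAGS = {
    "headache": ["worst headache of my life", "sudden", "thunderclap", "stiff neck"],
    "abdominal pain": ["chest pressure", "pain spreading to arm", "sweating", "fainting"],
}

def check_red_flags(chief_complaint: str, description: str) -> list[str]:
    """Return any red-flag phrases present, regardless of how the user frames them."""
    text = description.lower()
    return [flag for flag in RED_FLAGS.get(chief_complaint, []) if flag in text]

def respond(chief_complaint: str, description: str) -> str:
    hits = check_red_flags(chief_complaint, description)
    if hits:
        return f"Red flags detected ({', '.join(hits)}): seek emergency care now."
    return "No red flags found in what you described; here is some general guidance..."

print(respond("headache",
              "Probably just stress, but it was sudden and the worst headache of my life"))
```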

    30 min
  5. FEB 19

    #32 - When Data Isn’t Better: Rethinking Fertility Tracking

    What if the most reliable ways to track fertility are also the simplest? In this episode, we examine the science of ovulation timing and hold modern wearables to a high standard, comparing passive temperature and vital sign data with established methods like LH surge testing and cervical mucus observation. Drawing on perspectives from a cognitive scientist and an emergency physician, we explain what each method actually measures, how well it performs outside the lab, and where convenience falls short of accuracy.

    We begin by clarifying the fertile window and the underlying physiology, then connect that biology to signals people can track at home. Changes in cervical mucus provide a strong, real-time indicator of peak fertility. Urine LH strips offer a clear 24- to 36-hour advance signal at low cost. Basal body temperature can confirm that ovulation has already occurred, but it is less helpful for predicting timing in advance. Against this foundation, we review a meta-analysis of wearable data showing that temperature remains the strongest predictor, while heart rate and variability contribute only modest improvements. The conclusion is straightforward: wearables can approximate existing signals, but they do not clearly outperform simple tools for timing intercourse, insemination, or pregnancy avoidance.

    Along the way, we challenge the idea that more data and a paid app automatically lead to better outcomes. We weigh privacy risks, cost, and false confidence against the accessibility of test strips and the high signal value of mucus observations. The takeaway is a practical hierarchy. Use LH strips and cervical mucus as primary guides, add calendar context and basal temperature if useful, and treat wearables as optional conveniences rather than a definitive solution. Women’s health deserves thoughtful innovation, and sometimes real progress comes from choosing what works, not what is marketed most aggressively.

    If this episode resonated, follow the show, share it with a friend navigating fertility, and leave a review with your experience and what has worked best for you.

    Reference: The diagnostic accuracy of wearable digital technology in detecting fertility window and menstrual cycles: a systematic review and Bayesian network meta-analysis, Yue Shi et al., Nature npj Digital Medicine (2026)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/
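
As a small worked example of the timing arithmetic above: a first positive LH strip implies ovulation roughly 24 to 36 hours later, and the most fertile days are roughly the five days before ovulation plus ovulation day itself. The helper below is an illustrative sketch of that calculation, not fertility or medical guidance.

```python
# Estimate ovulation and the surrounding fertile window from the date of a
# first positive LH test, using the rough timings mentioned in the episode.

from datetime import date, timedelta

def fertile_window(lh_positive: date) -> tuple[date, date, date]:
    ovulation = lh_positive + timedelta(days=1)   # midpoint of the 24-36 h range
    window_start = ovulation - timedelta(days=5)  # sperm can survive roughly 5 days
    window_end = ovulation                        # egg viability is roughly 1 day
    return window_start, ovulation, window_end

start, ovulation, end = fertile_window(date(2026, 2, 10))
print(f"Estimated ovulation: {ovulation}; most fertile days: {start} to {end}")
```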

    20 min
  6. FEB 12

    #31 - How Retrieval-Augmented AI Can Verify Clinical Summaries

    Fluent summaries that cannot prove their claims are a hidden liability in healthcare, quietly eroding clinician trust and wasting time. In this episode, we walk through a practical system that replaces “sounds right” narratives with evidence-backed summaries by pairing retrieval-augmented generation with a large language model that serves as a judge. Instead of asking one AI to write and police itself, the work is divided. One model drafts the summary, while another breaks it into atomic claims, retrieves supporting chart excerpts, and issues clear verdicts of supported, not supported, or insufficient, with explanations clinicians can review.

    We explain why generic summarization often breaks down in clinical settings and how retrieval-augmented generation keeps the model grounded in the patient’s actual record. The conversation digs into subtle but common failure modes, including when a model ignores retrieved evidence, when a sentence mixes correct and incorrect facts, and when wording implies causation that the record does not support. A concrete example brings this to life: a claim that a patient was intubated for septic shock is overturned by operative notes showing intubation for a procedure, with the system flagging the discrepancy and guiding a precise correction. That is not just higher accuracy; it is accountability you can audit later.

    We also explore a deeper layer of the problem: argumentation. Clinical care is not just a list of facts, but the relationships between them. By evaluating claims alongside their evidence, surfacing contradictions, and pushing for precise language, the system helps generate summaries that reflect real clinical reasoning rather than confident guessing. The payoff is less time spent chasing errors, more time with patients, and a defensible trail for quality review and compliance. If you care about chart review, clinical documentation, retrieval-augmented generation, and building AI systems clinicians can trust, this episode offers practical takeaways.

    Reference: Verifying Facts in Patient Care Documents Generated by Large Language Models Using Electronic Health Records, Philip Chung et al., NEJM AI (2025)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/
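
A minimal, self-contained sketch of the draft-then-verify flow described above. A real system would use retrieval over the electronic record and an LLM judge at each step; here simple word-overlap heuristics stand in so the control flow is runnable. The function names, labels, and thresholds are illustrative, not the authors' implementation.

```python
# Decompose a generated summary into atomic claims, retrieve chart excerpts for
# each claim, and issue a supported / not supported / insufficient verdict.

import re
from dataclasses import dataclass

def tokens(text: str) -> set[str]:
    """Lowercased word set; stands in for a real embedding or semantic match."""
    return set(re.findall(r"[a-z]+", text.lower()))

@dataclass
class Verdict:
    claim: str
    label: str            # "supported" | "not supported" | "insufficient"
    evidence: list[str]

def atomic_claims(summary: str) -> list[str]:
    # Stand-in for an LLM that splits a summary into single-fact claims:
    # here we simply split on sentence boundaries.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", summary) if s.strip()]

def retrieve(claim: str, chart: list[str], top_k: int = 3) -> list[str]:
    # Stand-in for semantic retrieval over the patient record:
    # rank chart excerpts by word overlap with the claim.
    claim_words = tokens(claim)
    ranked = sorted(chart, key=lambda ex: -len(claim_words & tokens(ex)))
    return ranked[:top_k]

def judge(claim: str, evidence: list[str]) -> str:
    # Stand-in for an LLM judge: how much of the claim appears in the evidence?
    claim_words = tokens(claim)
    coverage = len(claim_words & tokens(" ".join(evidence))) / max(len(claim_words), 1)
    if coverage > 0.8:
        return "supported"
    return "insufficient" if coverage > 0.4 else "not supported"

def verify_summary(summary: str, chart: list[str]) -> list[Verdict]:
    verdicts = []
    for claim in atomic_claims(summary):
        evidence = retrieve(claim, chart)
        verdicts.append(Verdict(claim, judge(claim, evidence), evidence))
    return verdicts

chart = [
    "Operative note: patient intubated electively for laparoscopic procedure.",
    "ICU note: vasopressors started for septic shock on hospital day two.",
]
# The crude heuristic can only flag this claim for review ("insufficient");
# a real LLM judge would mark it not supported based on the operative note.
for v in verify_summary("The patient was intubated for septic shock.", chart):
    print(v.label, "->", v.claim)
```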

    24 min
  7. FEB 5

    #30 - From Reddit To Rescue: Real-Time Signals Of The Opioid Crisis

    What if the earliest warning sign of an opioid overdose surge isn’t locked inside a delayed report, but unfolding in real time on Reddit? In this episode, we explore how social media conversations, especially pseudonymous, community-led forums, can reveal emerging overdose risks before traditional surveillance systems catch up. We unpack research that analyzed more than a decade of posts to show how even simple drug mentions sharpened forecasts of overdose death rates. The signal was especially strong for fentanyl, exposing where existing public health tools lag and why online communities often see danger first. Along the way, we explain the mechanics in plain language: how time-series models respond faster than surveys, why subreddit structure filters noise, and how historical archives enable rigorous validation. But it doesn’t stop at counting mentions. We dig into what happens when posts are classified by lived experience: overdose stories, sourcing concerns, or test strip discussions.

    We also examine what broke during COVID, when behavior and access shifted overnight, and how to detect those regime changes before models start to fail. The takeaway is urgent and practical. Social data won’t replace public health surveillance, but it can make it fast enough to save lives. We share a field-ready playbook for turning online signals into timely interventions, and show how feedback from the same communities can explain why a response worked—or didn’t—so teams can adapt quickly. If you care about real-time epidemiology, harm reduction, and responsible AI in healthcare, this conversation connects raw text to real-world impact.

    Reference: Monitoring the opioid epidemic via social media discussions, Delaney A Smith et al., Nature npj Digital Health (2025)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/
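
A toy illustration of the modeling idea in plain code: compare one-step forecasts of an overdose-death series with and without lagged drug-mention counts as an extra predictor. Everything here, including the data, is synthetic and made up for illustration; the referenced study uses its own models and real surveillance data.

```python
# Does adding lagged mention counts to a simple autoregressive baseline reduce
# one-step forecast error? Expanding-window least-squares on synthetic data.

import numpy as np

rng = np.random.default_rng(0)
T = 120                                    # months of synthetic data
mentions = rng.poisson(50, T).astype(float)
mentions[60:] += 30                        # simulated surge in fentanyl mentions
deaths = 10 + 0.1 * np.roll(mentions, 2) + rng.normal(0, 1, T)  # deaths lag mentions

def one_step_rmse(use_mentions: bool) -> float:
    errs = []
    for t in range(24, T):                 # refit on all data seen so far
        X = [np.ones(t - 1), deaths[:t - 1]]           # intercept + lagged deaths
        if use_mentions:
            X.append(mentions[:t - 1])                 # + lagged mention counts
        X = np.column_stack(X)
        beta, *_ = np.linalg.lstsq(X, deaths[1:t], rcond=None)
        x_next = [1.0, deaths[t - 1]] + ([mentions[t - 1]] if use_mentions else [])
        errs.append(deaths[t] - np.dot(beta, x_next))
    return float(np.sqrt(np.mean(np.square(errs))))

print("RMSE, deaths history only: ", round(one_step_rmse(False), 2))
print("RMSE, plus mention counts: ", round(one_step_rmse(True), 2))
```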

    19 min
  8. JAN 29

    #29 - AI Hype Meets Hospital Reality

    What really happens when a “smart” system steps into the operating room and collides with the messy, time-pressured reality of clinical care? In this episode, we unpack a multi-center pilot that streamed audio and video from live surgeries to fuel safety checklists, flag cases for review, and promise rapid, actionable insight. What emerged instead was a clear-eyed lesson in the gap between aspiration and execution. Across four fault lines, the story shows where clinicians’ expectations of AI ran ahead of what today’s systems can reliably deliver, and what that means for patient safety.

    We begin with the promise. Surgeons and care teams envisioned near-instant post-case summaries: what went well, what raised concern, and which patients might be at risk. The reality looked different. Training demands, configuration work, and brittle workflows made it clear that AI is anything but plug-and-play. We explore why polished language can be mistaken for intelligence, why models need the right tools to reason effectively, and why moving AI from one hospital to another is closer to a redesign than a simple deployment.

    Then we follow the data. When it takes six to eight weeks to turn raw footage into usable insight, the value of learning forums like morbidity and mortality conferences quickly erodes. Privacy protections, de-identification, and quality control matter—but without pipelines built for speed and trust, insights arrive too late to change practice. We contrast where the system delivered real value, such as checklists and procedural signals, with where it fell short: predicting post-operative complications and producing research-ready datasets.

    Throughout the conversation, we argue for a minimum clinically viable product: tightly scoped use cases, early and deep involvement from surgeons and nurses, and data flows that respect governance without stalling learning. AI can strengthen patient safety and team performance—but only when expectations align with capability and operations are designed for real clinical tempo. If this resonates, follow the show, share it with a colleague, and leave a review with one takeaway you’d apply in your own clinical setting.

    Reference: Expectations vs Reality of an Intraoperative Artificial Intelligence Intervention, Melissa Thornton et al., JAMA Surgery (2026)

    Credits: Theme music: Nowhere Land, Kevin MacLeod (incompetech.com). Licensed under Creative Commons: By Attribution 4.0, https://creativecommons.org/licenses/by/4.0/

    26 min
