Paper Discussed in this Episode:
Can large language models like ChatGPT and Gemini interpret cervical cytology accurately? Saroja Devi Geetha. Annals of Diagnostic Pathology 2026; Volume 83, 152641.

Episode Summary:
In this journal club deep dive, we explore what happens when advanced artificial intelligence is thrown into the visually chaotic realm of human biology. We examine a 2026 study evaluating whether two massive multimodal models, GPT-5 and Gemini 2.5 Pro, can accurately read digital cervical Pap smears without any prior fine-tuning. We unpack how these general-purpose models perform on highly specialized visual tasks, revealing that while they aren't ready to fly solo, they exhibit fascinating and distinct diagnostic "personalities" that will undoubtedly reshape the future of the pathology lab.

In This Episode, We Cover:
• The "Textbook" Test Setup: How researchers tested the baseline visual reasoning of GPT-5 and Gemini 2.5 Pro by feeding them 100 curated, gold-standard digital Pap test images from the Hologic Education Site to classify using the Bethesda System.
• The Clinical Reality Check: While the models only achieved a coin-toss exact diagnostic match rate (47% for GPT-5 and 48% for Gemini), their accuracy jumped to 66% when evaluating clinical management protocols, proving they are beginning to grasp the underlying severity and medical consequences of cellular abnormalities.
• The Over-Anxious Resident (Gemini 2.5 Pro): Gemini acted like a highly sensitive but unrefined trainee, hitting 84% sensitivity and expertly spotting infectious organisms (71%). However, its tendency to confuse dense, overlapping cellular clumps with high-grade squamous intraepithelial lesions (HSIL) led to massive overcalling, dragging its specificity down to 71% and creating a risk of false alarms.
• The Big-Picture Academic (GPT-5): GPT-5 proved to be much more measured, demonstrating better overall specificity (74%) and excelling at identifying subtle structural shifts like low-grade squamous intraepithelial lesions (LSIL) (75%) and glandular changes. Yet, in its focus on the big picture, it completely missed obvious infectious organisms, scoring a dismal 20%.
• The Future of the Lab - Prompt Engineering & The Algorithmic Auditor: Why the next era of cytopathology requires rigorous AI fine-tuning on proprietary datasets and cytology-specific prompt optimization. We discuss a major paradigm shift where human pathologists may transition from actively hunting for disease to acting as "algorithmic auditors" whose primary job is to filter out the hyper-vigilant machine's noise.

Key Takeaway:
Current multimodal LLMs are not yet reliable for independent Pap test interpretation due to critical blind spots and tendencies to overcall lesions. However, their out-of-the-box performance establishes a staggering baseline. By understanding their unique mechanical flaws, pathologists can prepare to use these systems as highly effective co-pilots, seamlessly combining the algorithm's computational brute force with the indispensable filter of human medical reasoning.
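
For listeners who want a refresher on the sensitivity and specificity figures quoted in these notes, here is a minimal sketch of how the two metrics are usually computed from a confusion matrix. The counts are hypothetical and chosen only for easy arithmetic; they are not taken from the paper, and the function names are our own.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly abnormal slides that the model flagged as abnormal."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly normal slides that the model correctly left unflagged."""
    return tn / (tn + fp)


# Hypothetical counts for illustration only (NOT data from the study).
tp, fn = 8, 2    # abnormal slides: caught vs. missed
tn, fp = 15, 5   # normal slides: correctly cleared vs. overcalled

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

Overcalling shows up as extra false positives (`fp`), which is why a model that flags aggressively can post high sensitivity while its specificity drops.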