Dev and Doc: AI For Healthcare Podcast

Dev and Doc

Bringing doctors and developers together to unlock the potential of AI in healthcare. Together, we can build models that matter. 🤖👨🏻‍⚕️ Hello! We are Dev & Doc, Zeljko and Josh :) Josh is a Neurologist, AI Researcher and Clinical AI Lead. Zeljko is an AI engineer, CTO and associate professor (UCL) ------------- Substack- https://aiforhealthcare.substack.com/ YT - https://youtube.com/@DevAndDoc

  1. 22 AUG

    Everything you need to know about LLM benchmarks- Turing Test, OpenAI's Healthbench, ARC prize, LM arena

    Wherever there has been AI, there have been benchmarks. From the Turing test, to society-changing benchmarks like MNIST and ImageNet, to modern challenges like the ARC prize, benchmarks have served a vital purpose: measuring the performance of AI models. But something has shifted. In the LLM era, have benchmarks lost their utility and become mere advertising for big tech? Even seemingly more sophisticated benchmarks like LM Arena can be gamed by tech giants. We also deep dive into healthcare benchmarks like OpenAI's Healthbench (deeply problematic) and Microsoft's AI-DXO orchestrator agent for diagnosis. Where is this all going? How do we make the perfect benchmark? Or is the real work to be done afterwards, in the real world? 👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) --- Timestamps 00:00 Intro - The OG benchmarks - Turing test, MNIST, ImageNet 06:40 Are large language model benchmarks similar to humans taking tests? 10:05 Are we testing model capability or production readiness? 12:00 LLM era - data contamination 15:30 LM Arena - The leaderboard illusion paper - how big tech games benchmarks 28:35 Goodhart's law - When a measure becomes a target, it ceases to be a good measure 32:05 Some good benchmarks - games - Pokemon, ARC prize, Minecraft 34:35 Medical benchmarks - OpenAI's Healthbench has some big problems 46:50 Microsoft AI-DXO orchestrator for case reports --- Connect with Us Your Hosts: 👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - LinkedIn 🤖 Dev - Zeljko Kraljevic - Twitter Follow & Subscribe: YT: https://youtube.com/@DevAndDoc Spotify: Follow us on Spotify Apple Podcasts: Listen on Apple Podcasts Substack: https://aiforhealthcare.substack.com/ For enquiries: 📧 Devanddoc@gmail.com --- Production Credits 🎞️ Editor: Dragan Kraljević - Instagram 🎨 Brand & Art: Ana Grigorovici - Behance

    55 min
  2. 9 MAY

    #28 AI agents explained - Manus AI, computer control, Agentic workflows (healthcare)

    AI agents are here, but how did we get here in the first place? How do we build and leverage AI agents for high-stakes domains like healthcare? In this episode of Dev and Doc, we go deep into the forest that is AI agents and computer control, starting from the "caveman" era of LLMs discovering tools and moving on to cultivating intelligent models and agentic workflows. We dissect everyday agents like MANUS AI, and deep dive into how, where and when AI agents should be used. Are these agents hype or hope? Is this actually the second DeepSeek moment? 👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) Episode Timestamps: 00:00 Highlight 3:13 start / intro 5:20 LLM's caveman era - tool usage 6:46 Agents have autonomy and interact with environment 11:15 workflows and agentic flows 15:30 when should you be using an agent? 24:27 vibe coding is like driving a car 29:07 Demo - MANUS gathering financial trends, computer control 35:55 Demo MANUS AI - website creation for Autism Assessment 49:05 computer control factions - Freedom vs Process automation 55:00 Autism website testing 59:13 summary + end Hosts: 👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/ 🤖 Dev - Zeljko Kraljevic https://twitter.com/zeljkokr Find us on: YT - https://youtube.com/@DevAndDoc Spotify - https://podcasters.spotify.com/pod/show/devanddoc Apple - https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120 Substack - https://aiforhealthcare.substack.com/ For enquiries: 📧 Devanddoc@gmail.com Credits: 🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kraljevic/ 🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d

    1h 1m
  3. 26 FEB

    #27 Exploring Claude Sonnet 3.7 for healthcare

    Can Claude perform a range of complex clinical tasks? Dev and Doc are here to investigate. Claude Sonnet 3.7 was released less than 48 hours ago. The model is highly intelligent and is one of the best we have seen in recent memory. It definitely passes the vibe check. We give some amazing examples of coding with Claude using few-shot prompts, cover technical and clinical evaluations, and share our first thoughts. We even tested Claude on taking a patient history! NB - PLEASE don't do this at home. Obviously this is a demo, and we do not in any way condone or recommend using an LLM as your doctor or healthcare provider; we are just demonstrating what the future could be. If you are sick, please seek a medical professional. 👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) TIMESTAMPS 00:00 start + highlights 01:54 Introduction 08:54 Benchmarks, state of the art 14:44 guardrails, refusals, AI safety and catastrophic risks 22:36 show and tell - great for coding and making video games! 26:54 example hospital runner 30:17 Medical use cases - clinical coding, biomedical entity extraction 37:04 only medical example in Claude model card - still hallucinating citations 38:37 making an anatomy app 40:10 forecasting clinical diagnoses 43:36 taking a medical history from a patient 53:33 wrap up 👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - linkedin.com/in/dr-joshua-auyeung 🤖 Dev - Zeljko Kraljevic twitter.com/zeljkokr YT: youtube.com/@DevAndDoc Spotify: podcasters.spotify.com/pod/show/devanddoc Apple: podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120 Substack: aiforhealthcare.substack.com For enquiries - 📧 Devanddoc@gmail.com 🎞️ Editor - Dragan Kraljević instagram.com/dragan_kraljevic 🎨 Brand design - Ana Grigorovici behance.net/anagrigorovici027d

    58 min
  4. 21 FEB

    #26 Is it still worth doing a PhD in 2025? (Computer Science / Machine Learning)

    Is it still worth doing a PhD in 2025? Is the academic system broken in this publish-or-perish landscape? When is a PhD not worth pursuing? About this Episode In this Dev and Doc episode, Zeljko (now associate professor!) and Josh (doctor, PhD drop out) talk about the good and the bad of PhD life. They provide insight into the academic world with a focus on computer science and machine learning. 👋 Connect With Us! Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) 🎙️ Hosts 👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - LinkedIn 🤖 Dev - Zeljko Kraljevic - Twitter ⏳ Timestamps 00:00 - Start and highlight 01:42 - Intro 03:11 - What made you pursue PhD in the first place 05:05 - Industry or PhD first 10:00 - Positives - Moonshots 17:03 - Positives - Access to world experts and collaboration 20:55 - Positives - Open source and open science 24:49 - Positives - A good environment enables a smooth PhD 27:04 - Negatives - You are a one-man show 31:33 - Negatives - Publish or Perish 45:44 - Bring your research closer to the audience through blogs and other media, journals are legacy media 51:20 - Verdict - Is a PhD still worth it in 2025? 📢 Follow Us LinkedIn Newsletter YouTube Spotify Apple Podcasts Substack 📧 Contact Us For enquiries - devanddoc@gmail.com 🎞️ Video Production 🎬 Editor - Dragan Kraljević - Instagram 🎨 Brand Design & Art Direction - Ana Grigorovici - Behance

    57 min
  5. 7 FEB

    #25 Testing Deepseek R1 on Complex Medical Tasks. Here's what we found. (GRPO explainer)

    Dev and Doc put DeepSeek R1 to the test in a technical and clinical deep dive. 👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) 👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-au-yeung/ 🤖 Dev - Zeljko Kraljevic https://twitter.com/zeljkokr TIMESTAMPS 00:00 Highlights 04:36 Intro 08:29 Response from OpenAI, Anthropic - model training costs, tightening restrictions on China, pricing wars 13:13 What an open-source DeepSeek means for the world 15:38 Sam Altman and Dario Amodei feeling the pressure 23:10 TECHNICAL deep dive - RLHF, PPO, DPO 37:08 GRPO, R1's secret sauce 45:02 The aha moment, learning like a human? 50:25 DeepSeek R1 training and controversy 59:08 DeepSeek healthcare evaluation - Ethnic Bias 1:06:17 The diagnostic acid test (fail) 1:12:46 Coding clinical data / Medical billing (shout out SNOMED) LinkedIn Newsletter https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7216474068085026817 YT - https://youtube.com/@DevAndDoc Spotify - https://podcasters.spotify.com/pod/show/devanddoc Apple - https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120 Substack - https://aiforhealthcare.substack.com/ For enquiries - 📧 Devanddoc@gmail.com 🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kraljevic/ 🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d

    1h 21m
  6. 10 JAN

    #24 Significantly advancing LLMs with RAG (Google's Gemini 2.0, Deep Research, notebookLM)

    Dev and Doc - Latest News. It's 2025, and Dev and Doc cover the latest news, including Google's Deep Research and NotebookLM, DeepMind's Promptbreeder, and Anthropic's new RAG approach. We also go through what retrieval augmented generation (RAG) is, and how this technique is advancing LLM performance. 👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) Meet the Team 👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - LinkedIn 🤖 Dev - Zeljko Kraljevic - Twitter Where to Follow Us LinkedIn Newsletter YouTube Spotify Apple Podcasts Substack Contact Us 📧 For enquiries - Devanddoc@gmail.com Credits 🎞️ Editor - Dragan Kraljević - Instagram 🎨 Brand Design and Art Direction - Ana Grigorovici - Behance Episode Timeline 00:00 Highlights 00:53 News - NotebookLM, OpenAI 12 days of Christmas 07:44 Change in the meta - post-training 11:34 Optimizing prompts with DeepMind Promptbreeder 13:20 Is OpenAI losing their lead against Google? 16:45 Deep Research vs Perplexity 24:18 AIME and oncology 26:00 Deep Research results 30:20 RAG intro 33:14 Second pass RAG 36:20 RAG didn't take off 38:40 Wikichat 39:16 How do we improve on RAG? 41:11 Semantic/topic chunking, cross-encoders, agentic RAG 51:15 Google’s Problem Decomposition 53:32 Anthropic’s Contextual Retrieval Processing 56:07 Summary and wrap up References Cross Encoders Wikichat Google's Problem Decomposition Anthropic's Contextual Retrieval Google AIME in Oncology DeepMind's Promptbreeder

    58 min
  7. 20 SEP 2024

    #23 Can OpenAI's GPT o1 solve complex medical problems?

    First Thoughts and Preliminary Insights into OpenAI's GPT o1 Strawberry in the Medical Domain. With some expected and unexpected findings, we have a "bake off" between o1 and Doc to demonstrate how o1 fares with tricky medical scenarios. Disclaimer: Obviously, don't use AI to diagnose or treat your medical problems. If you are unwell, please seek a medical professional (AI isn't good enough just yet :)). 👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) Contributors • 👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/ • 🤖 Dev - Zeljko Kraljevic - https://twitter.com/zeljkokr Follow Us • https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7216474068085026817 • https://youtube.com/@DevAndDoc • https://podcasters.spotify.com/pod/show/devanddoc • https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120 • https://aiforhealthcare.substack.com/ For enquiries - 📧 Devanddoc@gmail.com Team • 🎞️ Editor - Dragan Kraljević - https://www.instagram.com/dragan_kraljevic/ • 🎨 Brand Design and Art Direction - Ana Grigorovici - https://www.behance.net/anagrigorovici027d Timestamps • 00:00 - Start + Highlights • 01:28 - Intro, What is GPT o1? • 05:18 - What is "Reasoning" in o1? • 12:38 - Benchmarks: o1's Successes and Failures • 24:07 - o1 and Doctor Bake Off! • 24:21 - The Pregnancy Acid Test for LLMs • 26:23 - Clinical Coding • 30:06 - Tricky Patient Scenarios • 32:25 - Opioid Dose Conversions

    40 min
  8. 15 AUG 2024

    #22 Explaining Explainable AI (for healthcare) with Dr Annabelle Painter (RSM digital health section Podcast)

    Dev and Doc is joined by guest Annabelle Painter, doctor, CMO, and podcaster for the Royal Society of Medicine Digital Health Podcast. We deep dive into explainability and interpretability with concrete healthcare examples. Check out Dr. Painter's Podcast here, she has some amazing guests and great insights into AI in healthcare! - https://spotify.link/pzSgxmpD5yb 👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) 👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/ 🤖 Dev - Zeljko Kraljevic - https://twitter.com/zeljkokr LinkedIn Newsletter YouTube Channel Spotify Apple Podcasts Substack For enquiries - 📧 Devanddoc@gmail.com 🎞️ Editor - Dragan Kraljević - https://www.instagram.com/dragan_kraljevic/ 🎨 Brand design and art direction - Ana Grigorovici - https://www.behance.net/anagrigorovici027d Timestamps: 00:00 - Start + highlights 03:47 - Intro 08:16 - Does all AI in healthcare need to be explainable? 15:56 - History and explanation of Explainable/Interpretable AI 20:43 - Gradient-based saliency and heat maps 24:14 - LIME - Local Interpretable Model-agnostic Explanations 30:09 - Nonsensical correlations - When explainability goes wrong 33:57 - Modern explainability - Anthropic 37:15 - Comparing LLMs with the human brain 40:02 - Clinician-AI interaction 47:11 - Where is this all going? Aligning models to ground truth and teaching them to say "I don't know" References: Fun Examples of when models go wrong - Nonsensical correlations Mechanistic interpretability Anthropic - Mapping the mind of language models Limitations of current AI explainability approaches Explainability does not improve automation bias in radiologists

    59 min
