From molecular biology to electronic health records, Prof. Karin Verspoor discusses why structured vocabularies still matter in the age of LLMs — and why domain expertise is the one thing AI can’t replace.

Guest
Professor Karin Verspoor, Executive Dean of Computer Science, RMIT University

Keywords
natural language processing, healthcare analytics, structured data, unstructured data, medical terminology, UMLS, SNOMED, ICD, electronic health records, AI governance, human-in-the-loop, knowledge representation, cognitive science, protein function prediction, clinical documentation, machine learning

Summary
Jon Scheele speaks with Professor Karin Verspoor, Executive Dean of Computer Science at RMIT University, about the critical role of language in making sense of healthcare data. Karin traces her journey from cognitive science and NLP research, through an AI startup and Los Alamos National Lab, to healthcare analytics — starting with a colleague’s question about protein function prediction when she didn’t even know what a protein was. They discuss how structured vocabularies like the Unified Medical Language System (UMLS), SNOMED, and ICD codes provide an anchoring framework for clinical data, why simple dictionary lookup falls short (especially with negation in medical records), and how LLMs are changing the landscape while still lacking domain-specific clinical context. The conversation explores the balance between generative AI tools and traditional predictive models, and why human oversight and domain expertise remain essential for safe, effective use of AI in healthcare.

Key Takeaways
- Karin’s path into healthcare started with a colleague asking her to apply NLP to protein function prediction — she didn’t know what a protein was at the time.
- Scientific literature and clinical records are overwhelmingly expressed in natural language, making NLP essential for extracting structured insights.
- The Unified Medical Language System (UMLS) unifies standards like ICD and SNOMED into a shared framework — and underpins billing systems worldwide (see the crosswalk sketch after these notes).
- Simple dictionary lookup against these vocabularies is a useful starting point, but fails with negation, e.g., “no evidence of infection” being read as “infection” (see the lookup sketch after these notes).
- LLMs have shifted clinician attitudes — before ChatGPT, many didn’t see the value of AI tools; now demand outpaces what can be safely deployed.
- AI scribes and documentation tools are among the first clinical adoptions, but rely on doctors manually verifying output — a model that may not scale.
- Generative AI won’t replace traditional predictive and classification models — healthcare will use a mix of approaches for different tasks.
- The key question to ask of any AI system is: “What’s not in your data?” LLMs lack the specific context of individual situations.
- Domain knowledge is what allows humans to critically evaluate AI output — without it, you can’t spot errors.
- Every situation is unique, and that contextual understanding is what humans bring that LLMs currently cannot.

Sound Bites
“What’s not in your data?”
“I literally looked at him and said, what’s a protein?”
“Every situation is unique — and that’s what a human can bring that the LLM doesn’t have access to.”
“People don’t always use the terminology correctly.”
“I checked it nine times and it was right… the tenth time, they just tick the box.”
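For readers curious what “unifying standards into a shared framework” means in practice, here is a minimal Python sketch of a UMLS-style crosswalk: the same clinical concept carries different codes in ICD-10 and SNOMED CT, and a shared concept identifier (CUI) ties them together. The codes and the `CROSSWALK` table are illustrative assumptions for this sketch, not an actual UMLS extract.

```python
# Toy UMLS-style crosswalk: one concept identifier (CUI) unifies the codes
# that different standards assign to the same clinical concept.
# All codes below are illustrative; a real UMLS release maps millions of terms.
CROSSWALK = {
    # CUI: {source vocabulary: code, plus a preferred name}
    "C0032285": {"ICD-10": "J18.9", "SNOMEDCT": "233604007", "name": "Pneumonia"},
    "C0011849": {"ICD-10": "E11.9", "SNOMEDCT": "73211009", "name": "Diabetes mellitus"},
}

def to_cui(system: str, code: str):
    """Resolve a source-vocabulary code to its unifying concept identifier."""
    for cui, entry in CROSSWALK.items():
        if entry.get(system) == code:
            return cui
    return None

# An ICD-10 billing code and a SNOMED clinical code resolve to the same concept,
# which is what lets billing data and clinical records be analysed together.
assert to_cui("ICD-10", "J18.9") == to_cui("SNOMEDCT", "233604007") == "C0032285"
```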
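And here is a minimal sketch of the dictionary-lookup failure mode mentioned in the takeaways, with a simple negation check in the spirit of the classic NegEx algorithm (Chapman et al., 2001). The vocabulary, cue list, and context-window size are toy assumptions; production systems match against full UMLS/SNOMED dictionaries and use far more careful negation scoping.

```python
import re

# Toy vocabulary: surface term -> UMLS-style concept code (illustrative, not verified).
VOCAB = {
    "infection": "C0009450",
    "pneumonia": "C0032285",
    "fever": "C0015967",
}

# A few negation cues, roughly in the spirit of NegEx (Chapman et al., 2001).
NEGATION_CUES = ["no evidence of", "no sign of", "denies", "without"]

def naive_lookup(text):
    """Dictionary lookup alone: flags 'infection' even when it is negated."""
    text = text.lower()
    return [code for term, code in VOCAB.items() if term in text]

def negation_aware_lookup(text, window=40):
    """Suppress a match when a negation cue appears shortly before the term."""
    text = text.lower()
    hits = []
    for term, code in VOCAB.items():
        for m in re.finditer(re.escape(term), text):
            preceding = text[max(0, m.start() - window):m.start()]
            negated = any(cue in preceding for cue in NEGATION_CUES)
            hits.append((term, code, negated))
    return hits

note = "Chest X-ray shows no evidence of infection. Patient reports fever."
print(naive_lookup(note))           # ['C0009450', 'C0015967'] -- wrongly includes infection
print(negation_aware_lookup(note))  # infection flagged as negated; fever is not
```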