The Glitchatorio

Witch of Glitch

30-minute introductions to some of the trickiest issues around AI today, such as:

- The alignment problem
- Questions of LLM consciousness
- Chain-of-thought and monitorability
- Scheming and hallucinations

The Glitchatorio is a podcast about the aspects of AI that don't fit into standard narratives about superintelligence or technology-as-destiny. We look into the failure modes, emergent mysteries and unexpected behaviors of artificial intelligence that baffle even the experts. You'll hear from technical researchers, data scientists and machine learning experts, as well as psychologists, philosophers and others whose work intersects with AI. Most Glitchatorio episodes follow the standard podcast interview format. Sometimes these episodes alternate with fictional audio skits or personal voice notes. The voices, music and audio effects you hear on The Glitchatorio are all recorded or composed by the Witch of Glitch; they are not AI-generated.

  1. The Scratchpad Monologues (CoT part 2)

    MAR 16

    The Scratchpad Monologues (CoT part 2)

    If chain of thought is a model "thinking aloud" to itself, then why does it express doubt, frustration or suspicion about the problems it's solving, sometimes for pages and pages of its scratchpad? And what does chain of thought mean for AI safety? We'll hear from Julian Schulz, a researcher who's studying encoded reasoning in large language models, about where the opportunities, risks and weirdness lie in chain of thought.

    Here are some links to his research:

    - On a model jailbreaking its monitor: https://www.lesswrong.com/posts/szyZi5d4febZZSiq3/monitor-jailbreaking-evading-chain-of-thought-monitoring
    - A roadmap for safety cases based on CoT: https://arxiv.org/html/2510.19476v1#S1
    - His posts on LessWrong: https://www.lesswrong.com/users/wuschel-schulz

    Some of the other papers we discussed include:

    - On the biology of a large language model: https://transformer-circuits.pub/2025/attribution-graphs/biology.html
    - Monitoring reasoning models for misbehavior and the risks of promoting obfuscation: https://arxiv.org/pdf/2503.11926
    - How steganography comes about: https://arxiv.org/pdf/2506.01926
    - Assuring agent safety evals by analysing transcripts (with excerpts from weird monologues): https://www.alignmentforum.org/posts/e8nMZewwonifENQYB/assuring-agent-safety-evaluations-by-analysing-transcripts
    - Stress-testing deliberative alignment: https://www.apolloresearch.ai/research/stress-testing-deliberative-alignment-for-anti-scheming-training/
    - And the "watchers" CoT snippet from the paper above: https://www.antischeming.ai/snippets#using-non-standard-language

    46 min
