Into AI Safety

Jacob Haimes

The Into AI Safety podcast aims to make it easier for everyone, regardless of background, to get meaningfully involved with the conversations surrounding the rules and regulations which should govern the research, development, deployment, and use of the technologies encompassed by the term "artificial intelligence" or "AI". For better-formatted show notes, additional resources, and more, go to https://kairos.fm/intoaisafety/

  1. Drawing Red Lines w/ Su Cizem

    APR 6

    Drawing Red Lines w/ Su Cizem

    Technology has been moving faster than policy for some time now, and the advent of AI isn't changing that. So what can we do to maintain safety despite uncertainty? Su Cizem has spent the last few years trying to answer that question. As an analyst at The Future Society, she works on global AI governance, specifically on building international consensus around AI red lines: the thresholds we collectively agree must never be crossed. In this conversation, Su walks through her path from philosophy to policy, the evolution of the global AI safety summit series, why voluntary commitments from AI labs aren't enough, and what it would actually take to make international cooperation on AI safety real.

    Chapters
    (00:00) - Introduction
    (03:23) - From Philosophy to Policy
    (22:25) - What AI Governance Actually Means
    (26:49) - The Summit Series
    (43:01) - Drawing The Red Lines
    (01:10:51) - Can These Companies Govern Themselves?
    (01:24:01) - Breaking Into The Field
    (01:27:51) - Closing Thoughts & Outro

    Critical Links
    Below are the most important links for this episode. For more, visit the episode page on Kairos.fm.
    Su's LinkedIn
    Global Call for AI Red Lines
    The Future Society report - “Facing the Stakes of AI Together”: 2025 Athens Roundtable Report
    Politico article - How the global effort to keep AI safe went off the rails
    TechPolicy.Press article - A Timeline of the Anthropic-Pentagon Dispute
    The Guardian article - AI got the blame for the Iran school bombing. The truth is far more worrying
    Google and OpenAI employee open letter - We Will Not Be Divided
    The Register article - Altman said no to military AI abuses – then signed Pentagon deal anyway
    SaferAI report - Evaluating AI Providers’ Frontier AI Safety Frameworks

    1h 32m
  2. Thinking Through "Digital Minds" w/ Jacy Reese-Anthis

    MAR 10

    Thinking Through "Digital Minds" w/ Jacy Reese-Anthis

    Jacy Reese-Anthis, founder of Sentience Institute and researcher at Stanford, began his journey working for animal welfare, but is now finishing up his PhD with research in many different AI subfields at the intersection of neuroscience, philosophy, social science, and machine learning. While this may seem like an odd jump at first, Jacy shares how his work has all been centered around the idea of moral circle expansion. In this episode, we dig into what sentience actually means (or at least how we can begin to think about it), why anthropomorphization is more complicated than it sounds, and how language models might be leveraged as an effective tool for social science research. Jacy also shares his median AGI estimate somewhere in there, so stay tuned if you want to catch it. As part of my effort to make this whole podcasting thing more sustainable, I have created a Kairos.fm Patreon which includes an extended version of this episode. Supporting gets you access to these extended cuts, as well as other perks in development.

    Chapters
    (00:00) - Introduction
    (05:41) - From Animal Welfare to Digital Minds
    (09:00) - Founding Sentience Institute
    (22:00) - Defining Sentience
    (27:13) - The Anthropomorphization Problem
    (47:51) - Why "Digital Minds" (Not "Artificial Intelligence")
    (51:05) - LLMs as Social Science Tools
    (01:07:03) - Jacy’s AGI Timeline & The Singularity
    (01:09:23) - Final Thoughts & Outro

    Critical Links
    Below are the most important links for this episode. For more, visit the episode page on Kairos.fm.
    Jacy's website
    Wikipedia article - Jacy Reese Anthis
    Sentience Institute website
    CHI paper - Digital Companionship: Overlapping Uses of AI Companions and AI Assistants
    ICML paper - LLM Social Simulations Are a Promising Research Method
    ACL paper - The Impossibility of Fair LLMs
    Wikipedia article - ELIZA effect
    The Atlantic article - How a Google Employee Fell for the Eliza Effect

    1h 11m
  3. Scaling AI Safety Through Mentorship w/ Dr. Ryan Kidd

    FEB 2

    Scaling AI Safety Through Mentorship w/ Dr. Ryan Kidd

    What does it actually take to build a successful AI safety organization? I'm joined by Dr. Ryan Kidd, who has co-led MATS from a small pilot program to one of the field's premier talent pipelines. In this episode, he reveals the low-hanging fruit in AI safety field-building that most people are missing: the amplifier archetype. I pushed Ryan on some hard questions, from balancing funder priorities and research independence, to building a robust selection process for both mentors and participants. Whether you're considering a career pivot into AI safety or already working in the field, this conversation offers practical advice on how to actually make an impact.

    Chapters
    (00:00) - Intro
    (08:16) - Building MATS Post-FTX & Summer of Love
    (13:09) - Balancing Funder Priorities and Research Independence
    (19:44) - The MATS Selection Process
    (33:15) - Talent Archetypes in AI Safety
    (50:22) - Comparative Advantage and Career Capital in AI Safety
    (01:04:35) - Building the AI Safety Ecosystem
    (01:15:28) - What Makes a Great AI Safety Amplifier
    (01:21:44) - Lightning Round Questions
    (01:30:30) - Final Thoughts & Outro

    Links
    MATS

    Ryan's Writing
    LessWrong post - Talent needs of technical AI safety teams
    LessWrong post - AI safety undervalues founders
    LessWrong comment - Comment permalink with 2025 MATS program details
    LessWrong post - Talk: AI Safety Fieldbuilding at MATS
    LessWrong post - MATS Mentor Selection
    LessWrong post - Why I funded PIBBSS
    EA Forum post - How MATS addresses mass movement building concerns

    FTX Funding of AI Safety
    LessWrong blogpost - An Overview of the AI Safety Funding Situation
    Fortune article - Why Sam Bankman-Fried’s FTX debacle is roiling A.I. research
    NY Times article - FTX probes $6.5M in payments to AI safety group amid clawback crusade
    Cointelegraph article - FTX probes $6.5M in payments to AI safety group amid clawback crusade
    FTX Future Fund article - Future Fund June 2022 Update (archive)
    Tracxn page - Anthropic Funding and Investors

    Training & Support Programs
    Catalyze Impact
    Seldon Lab
    SPAR
    BlueDot Impact
    YCombinator
    Pivotal
    Athena
    Astra Fellowship
    Horizon Fellowship
    BASE Fellowship
    LASR Labs
    Entrepreneur First

    Funding Organizations
    Coefficient Giving (previously Open Philanthropy)
    LTFF
    Longview Philanthropy
    Renaissance Philanthropy

    Coworking Spaces
    LISA
    Mox
    Lighthaven
    FAR Labs
    Constellation
    Collider
    NET Office
    BAISH

    Research Organizations & Startups
    Atla AI
    Apollo Research
    Timaeus
    RAND CAST
    CHAI

    Other Sources
    AXRP website - The AI X-risk Research Podcast
    LessWrong blogpost - Shard Theory: An Overview

    1h 32m
  4. Sobering Up on AI Progress w/ Dr. Sean McGregor

    12/29/2025

    Sobering Up on AI Progress w/ Dr. Sean McGregor

    Sean McGregor and I discuss why evaluating AI systems has become so difficult; we cover everything from the breakdown of benchmarking to how incentives shape safety work to what approaches like BenchRisk (his recent paper at NeurIPS) and AI auditing aim to fix as systems move into the real world. We also talk about his history and journey in AI safety, including his PhD on ML for public policy, how he started the AI Incident Database, and what he's working on now: AVERI, a non-profit for frontier model auditing.

    Chapters
    (00:00) - Intro
    (02:36) - What's broken about benchmarking
    (03:41) - Sean’s wild PhD
    (14:28) - The phantom internship
    (19:25) - Sean's journey
    (22:25) - Market-vs-regulatory modes and AIID
    (32:13) - Drunk on AI progress
    (38:34) - BenchRisk
    (43:20) - Moral hazards and Master Hand
    (50:34) - Liability, Section 230, and open source
    (59:20) - AVERI
    (01:11:30) - Closing thoughts & outro

    Links
    Sean McGregor's website
    AVERI website

    BenchRisk
    BenchRisk website
    NeurIPS paper - Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
    NeurIPS paper - AI and the Everything in the Whole Wide World Benchmark

    AIID
    AI Incident Database website
    IAAI paper - Preventing Repeated Real World AI Failures by Cataloging Incidents: The AI Incident Database
    Preprint - Lessons for Editors of AI Incidents from the AI Incident Database
    AIAAIC website (another incident tracker)

    Hot AI Summer
    CACM article - A Few Useful Things to Know About Machine Learning
    CACM article - How the AI Boom Went Bust
    Undergraduate Thesis - Analyzing the Prospect of an Approaching AI Winter
    Tech Genies article - AI History: The First Summer and Winter of AI
    CACM article - There Was No ‘First AI Winter’

    Measuring Generalization
    Neural Computation article - The Lack of A Priori Distinctions Between Learning Algorithms
    ICLR paper - Understanding deep learning requires rethinking generalization
    ICML paper - Model-agnostic Measure of Generalization Difficulty
    Radiology Artificial Intelligence article - Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls
    Preprint - Quantifying Generalization Complexity for Large Language Models

    Insurers Exclude AI
    Financial Times article - Insurers retreat from AI cover as risk of multibillion-dollar claims mount
    Tom's Hardware article - Major insurers move to avoid liability for AI lawsuits as multi-billion dollar risks emerge — Recent public incidents have led to costly repercussions
    Insurance Newsnet article - Insurers Scale Back AI Coverage Amid Fears of Billion-Dollar Claims
    Insurance Business article - Insurance’s gen AI reckoning has come

    Section 230
    Section 230 overview
    Legal sidebar - Section 230 Immunity and Generative Artificial Intelligence
    Bad Internet Bills website
    TechDirt article - Section 230 Faces Repeal. Support The Coverage That’s Been Getting It Right All Along.
    Privacy Guides video - Dissecting Bad Internet Bills with Taylor Lorenz: KOSA, SCREEN Act, Section 230
    Journal of Technology in Behavioral Health article - Social Media and Mental Health: Benefits, Risks, and Opportunities for Research and Practice
    Time article - Lawmakers Unveil New Bills to Curb Big Tech’s Power and Profit
    House Hearing transcript - Legislative Solutions to Protect Children and Teens Online

    Relevant Kairos.fm Episodes
    Into AI Safety episode - Growing BlueDot's Impact w/ Li-Lian Ang
    muckrAIkers episode - NeurIPS 2024 Wrapped 🌯

    Other Links
    Encyclopedia of Life website
    IBM Watson AI XPRIZE website
    ML Commons website
    Wikipedia article

    1h 14m
  5. Against 'The Singularity' w/ Dr. David Thorstad

    11/24/2025

    Against 'The Singularity' w/ Dr. David Thorstad

    Philosopher Dr. David Thorstad tears into one of AI safety's most influential arguments: the singularity hypothesis. We discuss why the idea of recursive self-improvement leading to superintelligence doesn't hold up under scrutiny, how these arguments have redirected hundreds of millions in funding away from proven interventions, and why people keep backpedaling to weaker versions when challenged. David walks through the actual structure of singularity arguments, explains why similar patterns show up in other longtermist claims, and makes the case for why we should focus on concrete problems happening right now, like poverty, disease, and the rise of authoritarianism, instead of speculative far-future scenarios.

    Chapters
    (00:00) - Intro
    (02:13) - David's background
    (08:00) - (Against) The Singularity Hypothesis
    (29:46) - Beyond The Singularity
    (39:56) - What We Should Actually Be Worried About
    (49:00) - Philanthropic Funding

    Links
    David's personal website
    Reflective Altruism, David's blog

    The Singularity Hypothesis
    David's Philosophical Studies article - Against the singularity hypothesis
    Time "AI Dictionary" page - Singularity
    EA Forum blogpost - Summary: Against the singularity hypothesis
    Journal of Consciousness Studies article - The Singularity: A Philosophical Analysis
    Interim Report from the Panel Chairs: AAAI Presidential Panel on Long-Term AI Futures
    Epoch AI blogpost - Do the returns to software R&D point towards a singularity?
    Epoch AI report - Estimating Idea Production: A Methodological Survey

    Funding References
    LessWrong blogpost - An Overview of the AI Safety Funding Situation
    AISafety.com funding page
    Report - Stanford AI Index 2025, Chapter 4.3
    Forbes article - AI Spending To Exceed A Quarter Trillion Next Year
    AI Panic article - The “AI Existential Risk” Industrial Complex
    GiveWell webpage - How Much Does It Cost To Save a Life?
    Wikipedia article - Purchasing power parity

    Pascal's Mugging and the St. Petersburg Paradox
    Wikipedia article - St. Petersburg Paradox
    Conjecture Magazine article - Pascal’s Mugging and Bad Explanations
    neurabites explainer - Ergodicity: the Most Over-Looked Assumption
    Wikipedia article - Extraordinary claims require extraordinary evidence

    The Time of Perils
    Global Priorities Institute working paper - Existential risk pessimism and the time of perils
    Ethics article - Mistakes in the Moral Mathematics of Existential Risk
    Philosophy & Public Affairs article - High Risk, Low Reward: A Challenge to the Astronomical Value of Existential Risk Mitigation
    Toby Ord Book - The Precipice
    Rethink Priorities blogpost - Charting the precipice
    AI Futures Project blogpost - AI 2027

    Trump's Higher Education Threat Compact
    Wikipedia article - Compact for Academic Excellence in Higher Education
    Pen America explainer - What is Trump’s Compact for Higher Education? And More Frequently Asked Questions
    Statement by the Vanderbilt AAUP Executive Committee on the “Compact for Academic Excellence in Higher Education”
    The Vanderbilt Hustler article - BREAKING: Chancellor Daniel Diermeier fails to reject higher education compact, reaffirms Vanderbilt’s values and openness to discussion
    The Vanderbilt Hustler article - Students and faculty organize rally outside Kirkland Hall against Trump administration’s higher education compact
    Free Speech Center article - Compact for Academic Excellence

    More of David's Work
    Global Priorities Institute working paper - What power-seeking theorems do not show
    Book - Essays on Longtermism

    Vibe Shift
    Blood in the Machine article - GPT-5 Is a Joke. Will It Matter?
    Futurism article - Evidence Grows That GPT-5 Is a Bit of a Dud
    Gary Marcus substack - GPT-5: Overdue, overhyped and underwhelming. And that’s not the worst of it.
    Pew Research report - How the U.S. Public and AI Experts View Artificial Intelligence

    1h 9m
  6. Getting Agentic w/ Alistair Lowe-Norris

    10/20/2025

    Getting Agentic w/ Alistair Lowe-Norris

    Alistair Lowe-Norris, Chief Responsible AI Officer at Iridius and co-host of The Agentic Insider podcast, joins to discuss AI compliance standards, the importance of narrowly scoping systems, and how procurement requirements could encourage responsible AI adoption across industries. We explore the gap between companies' empty promises and their actual safety practices, as well as the importance of vigilance and continuous oversight. Listen to Alistair on his podcast, The Agentic Insider! As part of my effort to make this whole podcasting thing more sustainable, I have created a Kairos.fm Patreon which includes an extended version of this episode. Supporting gets you access to these extended cuts, as well as other perks in development.

    Chapters
    (00:00) - Intro
    (02:46) - Trustworthy AI and the Human Side of Change
    (13:57) - This is Essentially Avatar, Right?
    (23:00) - AI Call Centers
    (49:38) - Standards, Audits, and Accountability
    (01:04:11) - What Happens when Standards aren’t Met?

    Links
    Iridius website

    GPT-5 Commentary
    Where's Your Ed At blogpost - How Does GPT-5 Work?
    Zvi LessWrong blogpost - GPT-5: The Reverse DeepSeek moment
    Blood in the Machine article - GPT-5 Is a Joke. Will It Matter?
    Futurism article - Evidence Grows That GPT-5 Is a Bit of a Dud
    Gary Marcus substack - GPT-5: Overdue, overhyped and underwhelming. And that’s not the worst of it.

    Customer Service and AI Adoption
    Gartner press release - Gartner Survey Finds 64% of Customers Would Prefer That Companies Didn't Use AI for Customer Service
    Preprint - Deploying Chatbots in Customer Service: Adoption Hurdles and Simple Remedies
    KDD '25 paper - Retrieval And Structuring Augmented Generation with Large Language Models
    Global Nerdy blogpost - Retrieval-augmented generation explained “Star Wars” style
    The Security Cafe article - A Quick And Dirty Guide To Starting SOC2

    Standards
    ISO overview - AI management systems
    ISO standard - ISO/IEC 42001
    CyberZoni guide - ISO 42001 The Complete Guide
    A-LIGN article - Understanding ISO 42001
    ISO standard - ISO/IEC 27001
    ISO standard - ISO/IEC 42005

    Governance and Regulation
    NIST framework - AI Risk Management Framework
    EU AI Act article - Article 99: Penalties
    Colorado Senate Bill 24-205 (Colorado AI Act) webpage
    Utah Senate Bill 149 webpage

    Microsoft AI Compliance
    Schellman blogpost - Microsoft DPR AI Requirements and ISO 42001
    Microsoft documentation - ISO/IEC 42001 AI Management System offering
    Microsoft webpage - Responsible AI Principles and Approach
    Microsoft Service Trust Portal documentation - Responsible AI Standard v2
    Microsoft documentation - Supplier Security & Privacy Assurance Program Guide v11 April 2025

    1h 12m
  7. Growing BlueDot's Impact w/ Li-Lian Ang

    09/15/2025

    Growing BlueDot's Impact w/ Li-Lian Ang

    I'm joined by my good friend, Li-Lian Ang, first hire and product manager at BlueDot Impact. We discuss how BlueDot has evolved from their original course offerings to a new "defense-in-depth" approach, which focuses on three core threat models: reduced oversight in high risk scenarios (e.g. accelerated warfare), catastrophic terrorism (e.g. rogue actors with bioweapons), and the concentration of wealth and power (e.g. supercharged surveillance states). On top of that, we cover how BlueDot's strategies account for and reduce the negative impacts of common issues in AI safety, including exclusionary tendencies, elitism, and echo chambers. 2025.09.15: Learn more about how to design effective interventions to make AI go well, and potentially even get funded for it, on BlueDot Impact's AGI Strategy course! BlueDot is also hiring, so if you think you’d be a good fit, I definitely recommend applying; I had a great experience when I contracted as a course facilitator. If you do end up applying, let them know you found out about the opportunity from the podcast! Follow Li-Lian on LinkedIn, and look at more of her work on her blog! As part of my effort to make this whole podcasting thing more sustainable, I have created a Kairos.fm Patreon which includes an extended version of this episode. Supporting gets you access to these extended cuts, as well as other perks in development.

    Chapters
    (03:23) - Meeting Through the Course
    (05:46) - Eating Your Own Dog Food
    (13:13) - Impact Acceleration
    (22:13) - Breaking Out of the AI Safety Mold
    (26:06) - Bluedot’s Risk Framework
    (41:38) - Dangers of "Frontier" Models
    (54:06) - The Need for AI Safety Advocates
    (01:00:11) - Hot Takes and Pet Peeves

    Links
    BlueDot Impact website

    Defense-in-Depth
    BlueDot Impact blogpost - Our vision for comprehensive AI safety training
    Engineering for Humans blogpost - The Swiss cheese model: Designing to reduce catastrophic losses
    Open Journal of Safety Science and Technology article - The Evolution of Defense in Depth Approach: A Cross Sectorial Analysis

    X-clusion and X-risk
    Nature article - AI Safety for Everyone
    Ben Kuhn blogpost - On being welcoming
    Reflective Altruism blogpost - Belonging (Part 1: That Bostrom email)

    AIxBio
    RAND report - The Operational Risks of AI in Large-Scale Biological Attacks
    OpenAI "publication" (press release) - Building an early warning system for LLM-aided biological threat creation
    Anthropic Frontier AI Red Team blogpost - Why do we take LLMs seriously as a potential source of biorisk?
    Kevin Esvelt preprint - Foundation models may exhibit staged progression in novel CBRN threat disclosure
    Anthropic press release - Activating AI Safety Level 3 protections

    Persuasive AI
    Preprint - Lies, Damned Lies, and Distributional Language Statistics: Persuasion and Deception with Large Language Models
    Nature Human Behavior article - On the conversational persuasiveness of GPT-4
    Preprint - Large Language Models Are More Persuasive Than Incentivized Human Persuaders

    AI, Anthropomorphization, and Mental Health
    Western News article - Expert insight: Humanlike chatbots detract from developing AI for the human good
    AI & Society article - Anthropomorphization and beyond: conceptualizing humanwashing of AI-enabled machines
    Artificial Ignorance article - The Chatbot Trap
    Making Noise and Hearing Things blogpost - Large language models cannot replace mental health professionals
    Idealogo blogpost - 4 reasons not to turn ChatGPT into your therapist
    Journal of Medical Society editorial - Importance of informed consent in medical practice
    Indian Journal of Medical Research article - Consent in psychiatry - concept, application & implications
    MediaNama article - The Risk of Humanising AI Chatbots: Why ChatGPT Mimicking Feelings Can Backfire
    Becker's Behavioral Health blogpost - OpenAI’s mental health roadmap: 5 things to know

    Miscellaneous References
    Carnegie Council blogpost - What Do We Mean When We Talk About "AI Democratization"?
    Collective Intelligence Project policy brief - Four Approaches to Democratizing AI
    BlueDot Impact blogpost - How Does AI Learn? A Beginner's Guide with Examples
    BlueDot Impact blogpost - AI safety needs more public-facing advocacy

    More Li-Lian Links
    Humans of Minerva podcast website
    Li-Lian's book - Purple is the Noblest Shroud

    Relevant Podcasts from Kairos.fm
    Scaling Democracy w/ Dr. Igor Krawczuk for AI safety exclusion and echo chambers
    Getting into PauseAI w/ Will Petillo for AI in warfare and exclusion in AI safety

    1h 8m
  8. Layoffs to Leadership w/ Andres Sepulveda Morales

    08/04/2025

    Layoffs to Leadership w/ Andres Sepulveda Morales

    Andres Sepulveda Morales joins me to discuss his journey from three tech layoffs to founding Red Mage Creative and leading the Fort Collins chapter of the Rocky Mountain AI Interest Group (RMAIIG). We explore the current tech job market, AI anxiety in nonprofits, dark patterns in AI systems, and building inclusive tech communities that welcome diverse perspectives. Reach out to Andres on his LinkedIn, or check out the Red Mage Creative website! For any listeners in Colorado, consider attending an RMAIIG event: Boulder; Fort Collins

    Chapters
    (00:00) - Intro
    (01:04) - Andres' Journey
    (05:15) - Tech Layoff Cycle
    (26:12) - Why AI?
    (30:58) - What is Red Mage?
    (36:12) - AI as a Tool
    (41:55) - AInxiety
    (47:26) - Dark Patterns and Critical Perspectives
    (01:01:35) - RMAIIG
    (01:10:09) - Inclusive Tech Education
    (01:18:05) - Colorado AI Governance
    (01:23:46) - Building Your Own Tech Community

    Links
    Tech Job Market
    Layoff tracker website
    The Big Newsletter article - Why Are We Pretending AI Is Going to Take All the Jobs?
    METR preprint - Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
    AI Business blogpost - https://aibusiness.com/responsible-ai/debunking-the-ai-job-crisis
    Crunchbase article - Data: Tech Layoffs Remain Stubbornly High, With Big Tech Leading The Way
    Computerworld article - Tech layoffs surge even as US unemployment remains stable
    Apollo Technical blogpost - Ghost jobs in tech: Why companies are posting roles they don’t plan to fill
    The HR Digest article - The Rise of Ghost Jobs Is Leaving Job Seekers Frustrated and Disappointed
    A Life After Layoff video - The Tech Job Market Is Hot Trash Right Now
    Economy Media video - Will The Tech Job Market Ever Recover?
    Soleyman Shahir video - Tech CEO Explains: The Real Reason Behind AI Layoffs

    Dark Patterns
    Deceptive Design website
    Journal of Legal Analysis article - Shining a Light on Dark Patterns
    ICLR paper - DarkBench: Benchmarking Dark Patterns in Large Language Models
    Computing Within Limits paper - Imposing AI: Deceptive design patterns against sustainability
    Communications of the ACM blogpost - Dark Patterns
    Preprint - A Comprehensive Study on Dark Patterns

    Colorado AI Regulation
    Senate Bill 24-205 (Colorado AI Act) bill and webpage
    NAAG article - A Deep Dive into Colorado’s Artificial Intelligence Act
    Colorado Sun article - Why Colorado’s artificial intelligence law is a big deal for the whole country
    CFO Dive blogpost - ‘Heavy lift’: Colorado AI law sets high bar, analysts say
    Denver 7 article - Colorado could lose federal funding as Trump administration targets AI regulations
    America's AI Action Plan document

    Other Sources
    Concordia Framework report and repo
    80,000 Hours website
    AI Incident Database website

    1h 40m

Ratings & Reviews

5 out of 5 (4 Ratings)

