Last Week in AI

Skynet Today

Weekly summaries of the AI news that matters!

  1. #235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon

    1 DAY AGO

    Our 235th episode with a summary and discussion of last week's big AI news! Recorded on 02/27/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    Model and tool updates highlight Anthropic’s Sonnet 4.6 (1M context; strong ARC-AGI-2 results), Google’s Gemini 3.1 Pro (major ARC-AGI-2 jump and multimodal demos), xAI’s Grok 4.20 beta (multi-agent debate), plus Anthropic’s Claude Code “Remote Control” and Perplexity’s multi-agent “Computer” coordinator.
    Compute and business moves include Meta’s reported up-to-$100B AMD chip deal with warrant/equity incentives, MatX raising $500M to build specialized transformer chips shipping in 2027, World Labs raising $1B for world-model/3D environment tech, and a new startup raising $100M to simulate/predict human behavior.
    Infrastructure and geopolitics cover Stargate data-center delays amid OpenAI/Oracle/SoftBank control disputes and cash concerns, and China’s plan to scale 7nm/5nm wafer output despite yield and tooling constraints.
    Research and safety/policy discuss optimizer gains from masked updates, “deep thinking tokens” as a reasoning-effort signal, LLM attractor-state behaviors in bot-to-bot chats, mechanistic interpretability of counting/line-wrapping, methods to map task difficulty to human time horizons, plus Anthropic–Pentagon contract tensions, Anthropic’s report on distillation attacks (DeepSeek/Moonshot/Minimax), and OpenAI’s report on disrupting malicious use.

    A thank you to our current sponsors:

    Box - visit Box.com/AI to learn more
    ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
    Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

    Timestamps:

    (00:00:10) Intro / Banter
    (00:01:52) News Preview

    Tools & Apps
    (00:03:20) Anthropic releases Sonnet 4.6 | TechCrunch
    (00:11:24) Google Rolls Out Latest AI Model, Gemini 3.1 Pro - CNET
    (00:14:54) Elon Musk says Grok 4.20 public beta is now available: Capabilities of AI chatbot offered by xAI - The Times of India
    (00:18:06) Anthropic just released a mobile version of Claude Code called Remote Control | VentureBeat
    (00:21:01) Perplexity announces "Computer," an AI agent that assigns work to other AI agents - Ars Technica

    Applications & Business
    (00:23:40) Meta strikes up to $100B AMD chip deal as it chases 'personal superintelligence' | TechCrunch
    (00:27:05) Nvidia challenger AI chip startup MatX raised $500M | TechCrunch
    (00:31:00) World Labs lands $1B, with $200M from Autodesk, to bring world models into 3D workflows | TechCrunch
    (00:33:07) Simile Raises $100 Million for AI Aiming to Predict Human Behavior
    (00:33:52) Stargate AI data centers for OpenAI reportedly delayed by squabbles between partners — sources say OpenAI, Oracle, and SoftBank disagreed on who would have ultimate control of the planned data centers
    (00:36:43) China to increase leading-edge chip output by 5x in two years, report claims — aims to lift 7nm and 5nm production to 100,000 wafers per month, targeting half a million monthly by 2030

    Research & Advancements
    (00:40:33) On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
    (00:48:03) Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens
    (00:54:52) models have some pretty funny attractor states
    (01:01:41) When Models Manipulate Manifolds: The Geometry of a Counting Task
    (01:05:16) BRIDGE: Predicting Human Task Completion Time From Model Performance
    (01:12:00) NESSiE: The Necessary Safety Benchmark -- Identifying Errors that should not Exist
    (01:13:15) The least understood driver of AI progress
    (01:21:45) The Persona Selection Model: Why AI Assistants might Behave like Humans

    Policy & Safety
    (01:25:04) Anthropic CEO Amodei says Pentagon's threats 'do not change our position' on AI
    (01:33:04) Musk's xAI, Pentagon reach deal to use Grok in classified systems
    (01:34:17) Detecting and preventing distillation attacks
    (01:38:36) OpenAI details expanding efforts to disrupt malicious use of AI in new report - SiliconANGLE

    See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

    1h 42m
  2. #234 - Opus 4.6, GPT-5.3-codex, Seedance 2.0, GLM-5

    16 FEB

    Our 234th episode with a summary and discussion of last week's big AI news! Recorded on 01/02/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    Major model launches include Anthropic’s Opus 4.6 with a 1M-token context window and “agent teams,” OpenAI’s GPT-5.3 Codex and faster Codex Spark via Cerebras, and Google’s Gemini 3 Deep Think posting big jumps on ARC-AGI-2 and other STEM benchmarks amid criticism about missing safety documentation.
    Generative media advances feature ByteDance’s Seedance 2.0 text-to-video with high realism and broad prompting inputs, new image models Seedream 5.0 and Alibaba’s Qwen Image 2.0, plus xAI’s Grok Imagine API for text/image-to-video.
    Open and competitive releases expand with Zhipu’s GLM-5, DeepSeek’s 1M-token context model, Cursor Composer 1.5, and open-weight Qwen3 Coder Next using hybrid attention aimed at efficient local/agentic coding.
    Business updates include ElevenLabs raising $500M at an $11B valuation, Runway raising $315M at a $5.3B valuation, humanoid robotics firm Apptronik raising $935M at a $5.3B valuation, Waymo announcing readiness for high-volume production of its 6th-gen hardware, plus industry drama around Anthropic’s Super Bowl ad and departures from xAI.

    A thank you to our current sponsors:

    Box - visit Box.com/AI to learn more
    ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
    Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

    Timestamps:

    (00:00:10) Intro / Banter
    (00:02:03) Sponsor Break
    (00:05:33) Response to listener comments

    Tools & Apps
    (00:07:27) Anthropic releases Opus 4.6 with new 'agent teams' | TechCrunch
    (00:11:28) OpenAI's new GPT-5.3-Codex is 25% faster and goes way beyond coding now - what's new | ZDNET
    (00:25:30) OpenAI launches new macOS app for agentic coding | TechCrunch
    (00:26:38) Google Unveils Gemini 3 Deep Think for Science & Engineering | The Tech Buzz
    (00:31:26) ByteDance's Seedance 2.0 Might be the Best AI Video Generator Yet - TechEBlog
    (00:35:14) China’s ByteDance, Alibaba unveil AI image tools to rival Google’s popular Nano Banana | South China Morning Post
    (00:36:54) DeepSeek boosts AI model with 10-fold token addition as Zhipu AI unveils GLM-5 | South China Morning Post
    (00:43:11) Cursor launches Composer 1.5 with upgrades for complex tasks
    (00:44:03) xAI launches Grok Imagine API for text and image to video

    Applications & Business
    (00:45:47) Nvidia-backed AI voice startup ElevenLabs hits $11 billion valuation
    (00:52:04) AI video startup Runway raises $315M at $5.3B valuation, eyes more capable world models | TechCrunch
    (00:54:02) Humanoid robot startup Apptronik has now raised $935M at a $5B+ valuation | TechCrunch
    (00:57:10) Anthropic says ‘Claude will remain ad-free,’ unlike an unnamed rival | The Verge
    (01:00:18) Okay, now exactly half of xAI's founding team has left the company | TechCrunch
    (01:04:03) Waymo’s next-gen robotaxi is ready for passengers — and also ‘high-volume production’ | The Verge

    Projects & Open Source
    (01:04:59) Qwen3-Coder-Next: Pushing Small Hybrid Models on Agentic Coding
    (01:08:38) OpenClaw’s AI ‘skill’ extensions are a security nightmare | The Verge

    Research & Advancements
    (01:10:40) Learning to Reason in 13 Parameters
    (01:16:01) Reinforcement World Model Learning for LLM-based Agents
    (01:20:00) Opus 4.6 on Vending-Bench – Not Just a Helpful Assistant

    Policy & Safety
    (01:22:28) METR GPT-5.2
    (01:26:59) The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?

    1h 31m
  3. #233 - Moltbot, Genie 3, Qwen3-Max-Thinking

    6 FEB

    Our 233rd episode with a summary and discussion of last week's big AI news! Recorded on 01/30/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    Google introduces a Gemini AI agent in Chrome for advanced browser functionality, including auto-browsing for Pro and Ultra subscribers.
    OpenAI releases ChatGPT Translator and Prism, expanding its applications beyond core business to language translation and scientific research assistance.
    Significant funding rounds and valuations achieved by startups Ricursive and Neurophos, focusing on specialized AI chips and optical processors respectively.
    Political and social issues, including violence in Minnesota, prompt tech leaders in AI like Amodei from Anthropic and Jeff Dean from Google to express concerns about the current administration's actions.

    Timestamps:

    (00:00:10) Intro / Banter

    Tools & Apps
    (00:04:09) Google adds Gemini AI-powered ‘auto browse’ to Chrome | The Verge
    (00:07:11) Users flock to open source Moltbot for always-on AI, despite major risks - Ars Technica
    (00:13:25) Google Brings Genie 3 'World Building' Experiment to AI Ultra Subscribers - CNET
    (00:16:17) OpenAI’s ChatGPT translator challenges Google Translate | The Verge
    (00:18:27) OpenAI launches Prism, a new AI workspace for scientists | TechCrunch

    Applications & Business
    (00:19:49) Exclusive: China gives nod to ByteDance, Alibaba and Tencent to buy Nvidia's H200 chips - sources | Reuters
    (00:22:55) AI chip startup Ricursive hits $4B valuation 2 months after launch
    (00:24:38) AI Startup Recursive in Funding Talks at $4 Billion Valuation - Bloomberg
    (00:27:30) Flapping Airplanes and the promise of research-driven AI | TechCrunch
    (00:31:54) From invisibility cloaks to AI chips: Neurophos raises $110M to build tiny optical processors for inferencing | TechCrunch

    Projects & Open Source
    (00:35:34) Qwen3-Max-Thinking debuts with focus on hard math, code
    (00:38:26) China's Moonshot releases a new open-source model Kimi K2.5 and a coding agent | TechCrunch
    (00:46:00) Ai2 launches family of open-source AI developer agents that adapt to any codebase - SiliconANGLE
    (00:47:46) Tiny startup Arcee AI built a 400B-parameter open source LLM from scratch to best Meta’s Llama

    Research & Advancements
    (00:52:53) Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
    (00:58:00) [2601.19897] Self-Distillation Enables Continual Learning
    (01:03:04) [2601.20802] Reinforcement Learning via Self-Distillation
    (01:05:58) Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

    Policy & Safety
    (01:09:13) Amodei, Hoffman Join Tech Workers Decrying Minnesota Violence - Bloomberg

    1h 21m
  4. #232 - ChatGPT Ads, Thinking Machines Drama, STEM

    28 JAN

    Our 232nd episode with a summary and discussion of last week's big AI news! Recorded on 01/23/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    OpenAI announces testing of ads in ChatGPT and introduces child age prediction to enhance safety features, amidst ongoing ethical debates and funding expansions in AI integration with educational tools and business models.
    China's AI landscape sees significant progress with AI firm Zhipu training advanced models on domestic hardware, and strong competitive moves by data centers, highlighting the intense demand in AI manufacturing and infrastructure.
    Silicon Valley tensions rise as startup Thinking Machines experiences high-profile departures back to OpenAI, reflecting broader industry struggles and rapid shifts in organizational dynamics.
    AI legislation and safety measures advance with the US Senate's DEFIANCE Act addressing explicit content, and Anthropic updating Claude's constitution to guide ethical AI interactions, while cultural pushbacks from artists signal ongoing debates in intellectual property and AI-generated content.

    Timestamps:

    (00:00:10) Intro / Banter
    (00:02:08) News Preview
    (00:02:26) Response to listener comments

    Tools & Apps
    (00:11:55) OpenAI to test ads in ChatGPT as it burns through billions - Ars Technica
    (00:18:05) OpenAI is launching age prediction for ChatGPT accounts
    (00:23:37) Google now offers free SAT practice exams, powered by Gemini | TechCrunch
    (00:24:57) Baidu’s AI Assistant Reaches Milestone of 200 Million Monthly Active Users - WSJ

    Applications & Business
    (00:26:53) The Drama at Thinking Machines, a New A.I. Start-Up, Is Riveting Silicon Valley - The New York Times
    (00:31:44) Zhipu AI breaks US chip reliance with first major model trained on Huawei stack | South China Morning Post
    (00:36:31) Elon Musk’s xAI launches world’s first Gigawatt AI supercluster to rival OpenAI and Anthropic
    (00:41:25) Sequoia to invest in Anthropic, breaking VC taboo on backing rivals: FT
    (00:45:18) Humans&, a 'human-centric' AI startup founded by Anthropic, xAI, Google alums, raised $480M seed round | TechCrunch

    Projects & Open Source
    (00:48:51) Black Forest Labs Releases FLUX.2 [klein]: Compact Flow Models for Interactive Visual Intelligence - MarkTechPost
    (00:50:35) [2601.10611] Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
    (00:52:53) [2601.10547] HeartMuLa: A Family of Open Sourced Music Foundation Models
    (00:54:46) [2601.11044] AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts

    Research & Advancements
    (00:57:05) STEM: Scaling Transformers with Embedding Modules
    (01:06:22) Reasoning Models Generate Societies of Thought
    (01:14:21) Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts

    Policy & Safety
    (01:19:41) Senate passes bill letting victims sue over Grok AI explicit images
    (01:22:03) Building Production-Ready Probes For Gemini
    (01:27:32) Anthropic Publishes Claude AI's New Constitution | TIME

    Synthetic Media & Art
    (01:34:13) Artists Launch Stealing Isn't Innovation Campaign To Protest Big Tech

    1h 41m
  5. #231 - Claude Cowork, Anthropic $10B, Deep Delta Learning

    21 JAN

    Our 231st episode with a summary and discussion of last week's big AI news! Recorded on 01/16/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    Anthropic's new Cowork tool integrates Claude Code, potentially simplifying multiple computing tasks from editing videos to compiling spreadsheets.
    Significant funding rounds see Anthropic raising $10B at a valuation of $350B, while xAI raises $20B, underscoring the immense market interest in AI startups.
    Nvidia faces supply challenges for H200 AI chips due to overwhelming demand from China, despite high costs per unit and its potential impact on U.S. company revenue.
    Policy debates highlight tensions around U.S. export controls to China, with leaders like Justin Lin from Alibaba and Jake Sullivan, former national security advisor, weighing in on the ramifications for the AI industry's future.

    Timestamps:

    (00:00:10) Intro / Banter
    (00:01:30) News Preview

    Tools & Apps
    (00:02:13) Anthropic’s new Cowork tool offers Claude Code without the code | TechCrunch
    (00:09:45) Google’s Gemini AI will use what it knows about you from Gmail, Search, and YouTube | The Verge
    (00:12:45) Google removes some AI health summaries after investigation finds “dangerous” flaws - Ars Technica
    (00:16:29) Gmail is getting a Gemini AI overhaul
    (00:18:12) Slackbot is an AI agent now | TechCrunch

    Applications & Business
    (00:20:11) Anthropic Raising $10 Billion at $350 Billion Value
    (00:22:25) Elon Musk xAI raises $20 billion from Nvidia, Cisco, investors
    (00:24:47) NVIDIA Needs a Supply Chain ‘Miracle’ From TSMC as China’s H200 AI Chip Orders Overwhelm Supply, Triggering a Bottleneck
    (00:29:26) OpenAI signs deal, worth $10B, for compute from Cerebras | TechCrunch
    (00:31:49) CoreWeave in focus as it amends credit agreement
    (00:34:30) LMArena lands $1.7B valuation four months after launching its product | TechCrunch

    Projects & Open Source
    (00:35:54) Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
    (00:43:15) mHC: Manifold-Constrained Hyper-Connections
    (00:49:53) IQuest_Coder_Technical_Report
    (00:54:58) TII Abu-Dhabi Released Falcon H1R-7B: A New Reasoning Model Outperforming Others in Math and Coding with only 7B Params with 256k Context Window - MarkTechPost

    Research & Advancements
    (01:01:42) Deep Delta Learning
    (01:07:47) Recursive Language Models
    (01:13:39) Conditional memory via scalable lookup
    (01:18:54) Extending the Context of Pretrained LLMs by Dropping their Positional Embeddings

    Policy & Safety
    (01:26:06) Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks
    (01:31:00) Nvidia CEO says purchase orders, not formal declaration, will signal Chinese approval of H200
    (01:32:24) China AI Leaders Warn of Widening Gap With US After $1B IPO Week
    (01:37:25) Jake Sullivan is furious that Trump removed Biden’s AI chip export controls | The Verge

    1h 43m
  6. #230 - 2025 Retrospective, Nvidia buys Groq, GLM 4.7, METR

    7 JAN

    Our 230th episode with a summary and discussion of last week's big AI news! Recorded on 01/02/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    Nvidia's acquisition of AI chip startup Groq for $20 billion highlights a strategic move for enhanced inference technology in GPUs.
    New York's RAISE Act legislation aims to regulate AI safety, marking the second major AI safety bill in the US.
    The launch of GLM 4.7 by Zhipu AI marks a significant advancement in open-source AI models for coding.
    Evaluation of long-horizon AI agents raises concerns about the rising costs and efficiency of AI in performing extended tasks.

    Timestamps:

    (00:00:10) Intro / Banter
    (00:01:58) 2025 Retrospective

    Tools & Apps
    (00:24:39) OpenAI bets big on audio as Silicon Valley declares war on screens | TechCrunch

    Applications & Business
    (00:26:39) Nvidia buying AI chip startup Groq for about $20 billion, biggest deal
    (00:34:28) Exclusive | Meta Buys AI Startup Manus, Adding Millions of Paying Users - WSJ
    (00:38:05) Cursor continues acquisition spree with Graphite deal | TechCrunch
    (00:39:15) Micron Hikes CapEx to $20B with 2026 HBM Supply Fully Booked; HBM4 Ramps 2Q26
    (00:42:06) Chinese fabs are reportedly upgrading older ASML DUV lithography chipmaking machines — secondary channels and independent engineers used to soup up Twinscan NXT series

    Projects & Open Source
    (00:47:52) Z.AI launches GLM-4.7, new SOTA open-source model for coding
    (00:50:11) Evaluating AI’s ability to perform scientific research tasks

    Research & Advancements
    (00:54:32) Large Causal Models from Large Language Models
    (00:57:33) Universally Converging Representations of Matter Across Scientific Foundation Models
    (01:02:11) META-RL INDUCES EXPLORATION IN LANGUAGE AGENTS
    (01:07:16) Are the Costs of AI Agents Also Rising Exponentially?
    (01:11:17) METR eval for Opus 4.5
    (01:16:19) How to game the METR plot

    Policy & Safety
    (01:17:24) New York governor Kathy Hochul signs RAISE Act to regulate AI safety | TechCrunch
    (01:20:40) Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers
    (01:26:46) Monitoring Monitorability
    (01:32:07) Sam Altman is hiring someone to worry about the dangers of AI | The Verge
    (01:33:38) X users asking Grok to put this girl in bikini, Grok is happy obliging - India Today

    1h 38m
  7. #229 - Gemini 3 Flash, ChatGPT Apps, Nemotron 3

    25 DEC 2025

    Our 229th episode with a summary and discussion of last week's big AI news! Recorded on 12/19/2025

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    Notable releases include OpenAI's GPT-5.2 Codex for advanced coding and Google's Gemini 3 Flash for competitive AI application performance. Nvidia's new open-source Nemotron 3 models also showcase impressive benchmarks.
    Funding updates highlight Lovable's $330M Series B, valuing the AI coding startup at $6.6B, and Fal's $140M Series D for AI model hosting, valued at $4.5B.
    China makes significant strides in semiconductor technology with advances in EUV lithography machines, led by Huawei and SMIC, potentially disrupting global chip manufacturing dominance.
    Key safety and policy updates include OpenAI's GPT-5.2 system card focusing on biosecurity and cybersecurity risks, while Google partners with the US military to power a new AI platform with Gemini models.

    Timestamps:

    (00:00:10) Intro / Banter
    (00:02:09) News Preview

    Tools & Apps
    (00:02:56) Google launches Gemini 3 Flash, makes it the default model in the Gemini app | TechCrunch
    (00:10:13) ChatGPT launches an app store, lets developers know it's open for business | TechCrunch
    (00:13:35) Introducing GPT-5.2-Codex | OpenAI
    (00:19:23) Story about OpenAI release - GPT image 1.5
    (00:22:27) Meta partners with ElevenLabs to power AI audio across Instagram, Horizon - The Economic Times

    Applications & Business
    (00:23:16) OpenAI to End Equity Vesting Period for Employees, WSJ Says
    (00:28:20) How China built its ‘Manhattan Project’ to rival the West in AI chips
    (00:36:47) China’s Huawei, SMIC Make Progress With Chips, Report Finds
    (00:41:03) OpenAI in Talks to Raise At Least $10 Billion From Amazon and Use Its AI Chips
    (00:43:32) Amazon has a new leader for its ‘AGI’ group as it plays catch-up on AI | The Verge
    (00:47:27) Broadcom reveals its mystery $10 billion customer is Anthropic
    (00:49:12) Vibe-coding startup Lovable raises $330M at a $6.6B valuation | TechCrunch
    (00:50:38) Fal nabs $140M in fresh funding led by Sequoia, tripling valuation to $4.5B | TechCrunch

    Projects & Open Source
    (00:51:10) Nvidia Becomes a Major Model Maker With Nemotron 3 | WIRED
    (00:59:24) Meta introduces new SAM AI able to isolate and edit audio • The Register
    (00:59:54) [2512.14856] T5Gemma 2: Seeing, Reading, and Understanding Longer
    (01:03:10) Anthropic makes agent Skills an open standard - SiliconANGLE

    Research & Advancements
    (01:03:47) Budget-Aware Tool-Use Enables Effective Agent Scaling
    (01:08:21) Rethinking Thinking Tokens: LLMs as Improvement Operators
    (01:10:50) What if AI capabilities suddenly accelerated in 2027? How would the world know?

    Policy & Safety
    (01:12:58) Update to GPT-5 System Card: GPT-5.2
    (01:18:04) Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors
    (01:20:47) Async Control: Stress-testing Asynchronous Control Measures for LLM Agents
    (01:24:37) Google is powering a new US military AI platform | The Verge

    1h 27m
  8. #228 - GPT 5.2, Scaling Agents, Weird Generalization

    17 DEC 2025

    Our 228th episode with a summary and discussion of last week's big AI news! Recorded on 12/12/2025

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    OpenAI's latest model GPT-5.2 demonstrates improved performance and enhanced multi-modal capabilities but comes with increased costs and a different knowledge cutoff date.
    Disney invests $1 billion in OpenAI to generate Disney character content, creating unique licensing agreements across characters from Marvel, Pixar, and Star Wars franchises.
    The U.S. government imposes new AI chip export rules involving security reviews, while simultaneously moving to prevent states from independently regulating AI.
    DeepMind releases a paper outlining the challenges and findings in scaling multi-agent systems, highlighting the complexities of tool coordination and task performance.

    Timestamps:

    (00:00:00) Intro / Banter
    (00:01:19) News Preview

    Tools & Apps
    (00:01:58) GPT-5.2 is OpenAI’s latest move in the agentic AI battle | The Verge
    (00:08:48) Runway releases its first world model, adds native audio to latest video model | TechCrunch
    (00:11:51) Google says it will link to more sources in AI Mode | The Verge
    (00:12:24) ChatGPT can now use Adobe apps to edit your photos and PDFs for free | The Verge
    (00:13:05) Tencent releases Hunyuan 2.0 with 406B parameters

    Applications & Business
    (00:16:15) China set to limit access to Nvidia’s H200 chips despite Trump export approval
    (00:21:02) Disney investing $1 billion in OpenAI, will allow characters on Sora
    (00:24:48) Unconventional AI confirms its massive $475M seed round
    (00:29:06) Slack CEO Denise Dresser to join OpenAI as chief revenue officer | TechCrunch
    (00:31:18) The state of enterprise AI

    Projects & Open Source
    (00:33:49) [2512.10791] The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality
    (00:36:27) Claude 4.5 Opus' Soul Document

    Research & Advancements
    (00:43:49) [2512.08296] Towards a Science of Scaling Agent Systems
    (00:48:43) Evaluating Gemini Robotics Policies in a Veo World Simulator
    (00:52:10) Guided Self-Evolving LLMs with Minimal Human Supervision
    (00:56:08) Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
    (01:00:39) [2512.07783] On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
    (01:04:42) Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
    (01:09:42) Google’s AI unit DeepMind announces UK 'automated research lab'

    Policy & Safety
    (01:10:28) Trump Moves to Stop States From Regulating AI With a New Executive Order - The New York Times
    (01:13:54) [2512.09742] Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
    (01:17:57) Forecasting AI Time Horizon Under Compute Slowdowns
    (01:20:46) AI Security Institute focuses on AI measurements and evaluations
    (01:21:16) Nvidia AI Chips to Undergo Unusual U.S. Security Review Before Export to China
    (01:22:01) U.S. Authorities Shut Down Major China-Linked AI Tech Smuggling Network

    Synthetic Media & Art
    (01:24:01) RSL 1.0 has arrived, allowing publishers to ask AI companies pay to scrape content | The Verge

    1h 27m
