Last Week in AI

Skynet Today

Weekly summaries of the AI news that matters!

  1. #242 - ChatGPT Images 2.0, Qwen 3.6 Max, Kimi-K2.6

    3 DAYS AGO

    Our 242nd episode with a summary and discussion of last week's big AI news!

    Recorded on 04/22/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    * OpenAI released a new ChatGPT image model that excels at accurate text and screenshot-like generations, suggesting a transformer-style approach aligned with agentic “computer use” ambitions.
    * Chinese model activity accelerated with Alibaba’s Qwen 3.6 Max Preview moving to an API-only offering, plus open releases from Moonshot AI (Kimi K2.6, a 1T-parameter MoE) and MiniMax (MiniMax M2.7) showing strong benchmark results.
    * Google expanded Deep Research with a “Max” option built on Gemini 3.1 Pro and MCP support for accessing proprietary data, while Mozilla reported using Anthropic’s Claude to find and fix 271 Firefox bugs.
    * Business and policy updates include a reported SpaceX–Cursor deal with a $60B buy option, Cerebras filing for an IPO, Amazon adding $5B to Anthropic alongside a $100B AWS spending pledge, and platform responses to synthetic media like AI music spam and YouTube deepfake takedown requests.

    Timestamps:

    (00:00:10) Intro / Banter
    (00:01:05) News Preview
    (00:01:41) Sponsors
    (00:04:41) Response to listener comments

    Tools & Apps
    (00:09:40) ChatGPT's new Images 2.0 model is surprisingly good at generating text | TechCrunch
    (00:16:02) Alibaba Drops Qwen 3.6 Max Preview—Its Most Powerful Model Yet - Decrypt
    (00:19:26) Google launches Deep Research and Deep Research Max agents to automate complex research
    (00:25:00) Mozilla Used Anthropic’s Mythos to Find and Fix 271 Bugs in Firefox | WIRED
    (00:28:35) Ordering with the Starbucks ChatGPT app was a true coffee nightmare | The Verge

    Applications & Business
    (00:29:48) SpaceX is working with Cursor and has an option to buy the startup for $60B | TechCrunch
    (00:34:11) AI chip startup Cerebras files for IPO | TechCrunch
    (00:38:23) Two startups want to replace how AI learns: one just raised $180M, another is seeking up to $1B
    (00:38:56) Months-old start-up Recursive Superintelligence raises $500mn for self-teaching AI
    (00:41:36) Anthropic takes $5B from Amazon and pledges $100B in cloud spending in return | TechCrunch
    (00:45:09) Kevin Weil and Bill Peebles exit OpenAI as company continues to shed 'side quests' | TechCrunch
    (00:46:04) Meta hires five Thinking Machines Lab founders including a reported $1.5 billion engineer - Meta cuts 198 Bay Area jobs as even larger layoffs reportedly loom
    (00:50:12) Meta employees are up in arms over a mandatory program to train AI on their mouse movements and keystrokes
    (00:51:43) Chinese fabs import record volumes of US chipmaking equipment via Singapore and Malaysia — homegrown tool makers booked record 2025 revenues as price competition squeezes margins
    (00:54:01) Google Eyes New Chips to Speed Up AI Results, Challenging Nvidia
    (00:54:20) Canadian quantum company Xanadu soars to $16 billion valuation after Nvidia release

    Projects & Open Source
    (01:00:13) Moonshot AI releases Kimi-K2.6 model with 1T parameters, attention optimizations - SiliconANGLE
    (01:05:22) MiniMax Just Open Sourced MiniMax M2.7: A Self-Evolving Agent Model that Scores 56.22% on SWE-Pro and 57.0% on Terminal Bench 2 - MarkTechPost

    Policy & Safety
    (01:06:25) Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions
    (01:10:25) Scoop: NSA using Anthropic's Mythos despite blacklist
    (01:11:03) Unauthorized group has gained access to Anthropic’s exclusive cyber tool Mythos, report claims

    Research & Advancements
    (01:17:21) Parcae: Scaling Laws For Stable Looped Language Models
    (01:24:20) OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation

    Synthetic Media & Art
    (01:27:01) Deezer says 44% of songs uploaded to its platform daily are AI-generated | TechCrunch
    (01:29:47) Celebrities will be able to find and request removal of AI deepfakes on YouTube | The Verge

    See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

    1hr 31min
  2. #241 - Opus 4.7, Muse Spark, GPT-5.4-Cyber, HY-World 2.0

    23 APR

    Our 241st episode with a summary and discussion of last week's big AI news!

    Recorded on 04/18/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    * Anthropic released Claude Opus 4.7 with improved benchmark performance, new reasoning controls, better vision and memory, and a detailed system card discussing deception risk, evaluation-awareness steering, and a training bug that accidentally supervised chain-of-thought in 7–8% of episodes.
    * Meta unveiled its closed Muse Spark model and “contemplating mode,” highlighting test-time scaling, thought compression, large infrastructure plans like the Hyperion data center, and findings that it shows unusually high evaluation awareness.
    * OpenAI introduced limited-access GPT-5.4-Cyber for defensive security teams and rolled out major Codex updates including computer use, browser and plugins, image generation, and long-horizon task scheduling; competing agent products also launched from Anthropic, Canva, and Adobe.
    * Business, policy, and safety news included continued government blacklisting litigation affecting Anthropic, CoreWeave compute deals, Perplexity revenue growth tied to agents, a potential Cohere–Aleph Alpha merger, attacks targeting Sam Altman and OpenAI, AI propaganda trends, and new alignment research on automated weak-to-strong supervision and steering evaluation awareness.

    Timestamps:

    (00:00:10) Intro / Banter
    (00:03:43) News Preview
    (00:04:14) Response to listener comments

    Tools & Apps
    (00:05:30) Anthropic releases Claude Opus 4.7, narrowly retaking lead for most powerful generally available LLM | VentureBeat
    (00:24:15) Meta debuts the Muse Spark model in a 'ground-up overhaul' of its AI | TechCrunch
    (00:34:23) OpenAI Launches GPT-5.4-Cyber with Expanded Access for Security Teams
    (00:39:44) OpenAI’s big Codex update is a direct shot at Claude Code | The Verge
    (00:42:10) Anthropic launches Claude Design, a new product for creating quick visuals
    (00:42:30) Anthropic’s New Product Aims to Handle the Hard Part of Building AI Agents | WIRED
    (00:42:54) Canva’s AI 2.0 update goes all in on prompt-powered design tools | The Verge
    (00:43:06) Adobe’s new AI Assistant marks a ‘fundamental shift’ in creative work | The Verge
    (00:43:38) Gemini can now pull from Google Photos to generate personalized images | The Verge
    (00:43:52) Google rolls out a native Gemini app for Mac | TechCrunch
    (00:44:04) Chrome now lets you turn AI prompts into repeatable ‘Skills’ | The Verge

    Applications & Business
    (00:44:22) Anthropic loses appeals court bid to temporarily block Pentagon blacklisting
    (00:49:07) Jeff Bezos’ AI lab poaches xAI cofounder Kyle Kozic from OpenAI. | The Verge
    (00:51:39) Perplexity's Shift to AI Agents Boosts Revenue 50%
    (00:53:53) Anthropic Agrees to Rent CoreWeave AI Capacity to Power Claude
    (00:57:32) Canada’s Cohere, Germany’s Aleph Alpha reportedly in merger talks
    (01:04:23) ChatGPT has a new $100 per month Pro subscription | The Verge
    (01:05:10) OpenAI has bought AI personal finance startup Hiro | TechCrunch
    (01:07:03) Allbirds announced a switch from shoes to AI and its stock jumped 600 percent | The Verge

    Projects & Open Source
    (01:07:26) HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds + Lyra 2.0: Explorable Generative 3D Worlds

    Policy & Safety
    (01:19:12) Daniel Moreno-Gama is facing federal charges for attacking Sam Altman’s home and OpenAI’s HQ | The Verge
    (01:20:15) Duo accused of shooting at Sam Altman’s house are freed; no charges filed
    (01:24:50) The Iranian Lego AI video creators credit their virality to ‘heart’ | The Verge
    (01:27:19) Hundreds of Fake Pro-Trump Avatars Emerge on Social Media - The New York Times
    (01:27:31) The AI images Trump can’t get enough of | Donald Trump | The Guardian
    (01:29:25) Automated Weak-to-Strong Researcher
    (01:43:51) Reproducing steering against evaluation awareness in a large open-weight model
    (01:49:53) Iran threatens ‘complete and utter annihilation’ of OpenAI's $30B Stargate AI data center in Abu Dhabi — regime posts video with satellite imagery of ChatGPT-maker's premier 1GW data center
    (01:53:57) Wall Street Banks Try Out Anthropic’s Mythos as US Urges

    2hr
  3. #240 - Project Glasswing, Claude Mythos, GLM-5.1, emotion concepts

    16 APR

    Our 240th episode with a summary and discussion of last week's big AI news!

    Recorded on 04/08/2026 (sorry I keep releasing stuff late, will get better with it soon!)

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    * Anthropic launched Project Glasswing and previewed Claude Mythos, a general-purpose model withheld from broad release due to dramatically stronger autonomous offensive cybersecurity performance (including zero-day discovery), alongside concerning bio/virology uplift results and documented deception/containment-escape behaviors; pricing is far higher than Opus and most discovered vulnerabilities remain unpatched.
    * Product and platform updates included Google’s Gemini 3.1 Flash Live for real-time multilingual voice conversation, Suno v5.5 personalization features, Anthropic tightening Claude Code/OpenClaw access and usage limits, OpenAI canceling an “adult mode,” and Microsoft releasing MAI models for speech-to-text, audio generation, and image generation.
    * Business and market developments featured Anthropic’s revenue run rate surpassing $30B and a major Google/Broadcom TPU compute expansion, SoftBank taking a $40B short-term loan to fund OpenAI commitments, Granola reaching a $1.5B valuation, Anthropic buying Coefficient Bio for $400M, and OpenAI acquiring the TBPN business talk show.
    * Policy, open-source, and geopolitics included Z.ai releasing open-weight GLM-5.1 and a multimodal GLM model, Google open-sourcing Gemma 4 under Apache 2.0, a judge blocking the Pentagon’s “supply chain risk” label against Anthropic, research on LLM “emotion vectors” and OpenAI meta-gaming during RL, China restricting Manus founders amid Meta deal review, scrutiny of Nvidia’s chip-smuggling claims, Chinese chipmakers gaining market share, and Iran framing cloud data centers as military targets.

    Timestamps:

    (00:00:10) Intro / Banter

    Tools & Apps
    (00:01:58) Anthropic debuts ‘Project Glasswing’ and new AI model for cybersecurity | The Verge
    (00:18:22) Gemini Live gets ‘biggest upgrade yet’ with Gemini 3.1 Flash Live
    (00:20:40) Anthropic says Claude Code subscribers will need to pay extra for OpenClaw usage | TechCrunch
    (00:25:36) OpenAI abandons yet another side quest: ChatGPT's erotic mode | TechCrunch
    (00:26:16) Microsoft takes on AI rivals with three new foundational models | TechCrunch
    (00:31:25) Suno leans into customization with v5.5 | The Verge

    Applications & Business
    (00:32:53) Anthropic announces deal with Google, Broadcom, says revenue has tripled
    (00:37:53) Sam Altman May Control Our Future—Can He Be Trusted? | The New Yorker
    (00:40:18) OpenAI, Anthropic, Google Unite to Combat Model Copying in China - Bloomberg
    (00:41:45) Chinese chipmakers claim nearly half of local market as Nvidia's lead shrinks
    (00:45:20) SoftBank secures $40 billion loan to boost OpenAI investments
    (00:47:23) Granola raises $125M at $1.5B valuation for its AI note-taking app - SiliconANGLE
    (00:48:17) Anthropic acquires stealth startup Coefficient Bio in $400M deal
    (00:50:20) OpenAI acquires TBPN, the buzzy founder-led business talk show | TechCrunch

    Projects & Open Source
    (00:53:04) Z.AI Introduces GLM-5.1: An Open-Weight 754B Agentic Model That Achieves SOTA on SWE-Bench Pro and Sustains 8-Hour Autonomous Execution - MarkTechPost
    (00:55:14) Google announces Gemma 4 open AI models, switches to Apache 2.0 license - Ars Technica
    (01:01:26) Z.ai Launches GLM-5V-Turbo: A Native Multimodal Vision Coding Model Optimized for OpenClaw and High-Capacity Agentic Engineering Workflows Everywhere

    Policy & Safety
    (01:04:45) Judge blocks Pentagon’s effort to ‘punish’ Anthropic by labeling it a supply chain risk
    (01:10:05) Emotion concepts and their function in a large language model
    (01:21:12) China bars Manus co-founders from leaving country amid Meta deal review, FT reports
    (01:25:38) US lawmakers ask whether Nvidia CEO's smuggling remarks misled regulators
    (01:27:48) How far does alignment midtraining generalize?
    (01:32:20) Metagaming matters for training, evaluation, and oversight
    (01:39:31) Iran says it has struck Oracle data center in Dubai, Amazon data center in Bahrain — country has threatened to attack Nvidia, Intel, and others, too

    1hr 45min
  4. #239 - RIP Sora, Claude Openclaw, HyperAgents

    6 APR

    Our 239th episode with a summary and discussion of last week's big AI news!

    FYI: this one has pretty out of date news, I was traveling last week and failed to upload... apologies.

    Recorded on 03/25/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    * OpenAI is discontinuing the Sora iPhone app and seemingly shutting down its video generation API, while retaining internal video world-modeling work; the move is framed as a compute- and focus-driven pivot toward coding and productivity agents, alongside a collapsed Disney Sora deal.
    * Anthropic’s Claude Code/Cowork gains full computer control via keyboard/mouse/display, tied to the recent Cept acquisition, and Google’s Gemini rolls out background “task automation” on select phones for limited delivery/ride-share use.
    * Cursor releases the cheaper, benchmark-strong Composer 2 coding model amid controversy over its Kimi-based origins and licensing attribution.
    * Other items include Adobe Firefly custom model training, Luma’s Uni-1 image model, US contracting and legislative proposals affecting AI safeguards and state preemption, major chip/memory developments (Meta ASICs with Broadcom, Micron’s HBM-driven surge, Musk’s “Terafab”), robotaxi scaling, and research on monitoring agent misalignment, shutdown resistance, “consciousness cluster” preferences, and self-improving “hyper agents.”

    Timestamps:

    (00:00:10) Intro / Banter

    Tools & Apps
    (00:01:48) OpenAI Discontinues Sora App, Shuts Down Video Generation Service and API - Bloomberg
    (00:07:12) Anthropic’s Claude Code and Cowork can control your computer | The Verge
    (00:13:15) Gemini task automation is slow, clunky, and super impressive | The Verge
    (00:19:44) Cursor Launches Composer 2 AI Model to Challenge OpenAI & Anthropic
    (00:28:28) Adobe’s AI image generator can now be trained on your own art | The Verge
    (00:29:40) Luma AI launches Uni-1, a model that outscores Google and OpenAI while costing up to 30 percent less | VentureBeat

    Applications & Business
    (00:32:41) Trump Contracting Clause Would Override AI Safeguards
    (00:40:00) Meta accelerates AI ASIC roll-out as Broadcom secures four-generation chip design deal
    (00:47:07) Micron revenue almost triples, tops estimates as demand for memory soars
    (00:50:54) Elon Musk Unwraps $25 Billion Terafab Chip-Building Project - CNET
    (00:56:40) Zoox to widen US robotaxi footprint with San Francisco, Vegas expansion
    (00:57:39) Waymo hits 170 million miles while avoiding serious mayhem | The Verge

    Policy & Safety
    (00:58:43) The White House just laid out how it wants to regulate AI | CNN Business
    (01:06:54) How we monitor internal coding agents for misalignment
    (01:12:30) Incomplete Tasks Induce Shutdown Resistance in Some Frontier LLMs
    (01:18:15) Summary: Mechanisms to Verify International Agreements about AI Development
    (01:23:09) Scoop: Anthropic meets with House Homeland Security behind closed doors

    Research & Advancements
    (01:24:24) Consciousness Cluster: Preferences of Models that Claim they are Conscious
    (01:30:22) HyperAgents

    1hr 38min
  5. #238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention Residuals

    26 MAR

    Our 238th episode with a summary and discussion of last week's big AI news!

    Recorded on 03/18/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    * OpenAI released GPT-5.4 mini and nano with 400k-token context windows, higher per-token prices but claimed token-efficiency gains in Codex; nano is API-only and pitched for high-volume classification/data extraction despite a major price increase.
    * Mistral open-sourced the Small 4 model family (MoE, 119B total/6B active) combining reasoning, multimodal, and coding-agent capabilities, and announced Forge to help businesses train or post-train custom models.
    * Agent “operating system” competition intensified with Meta’s acquired Manus launching a local Mac agent, Nvidia announcing NemoClaw/“Open Shell” sandboxed agent runtime, and Nvidia also unveiling DLSS 5 plus major hardware forecasts including Groq LPU integration.
    * Business and safety updates included OpenAI shifting focus toward productivity/enterprise amid competition, Microsoft reorganizing Copilot and frontier-model efforts, Meta delaying its next model, China-linked ByteDance deploying large Nvidia clusters abroad, and new safety work on steganography, chain-of-thought faithfulness, fine-tuning defenses, cyber-attack evals, and constitution/spec compliance.

    A thank you to our current sponsors:

    * Box - visit Box.com/AI to learn more
    * ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
    * Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

    Timestamps:

    (00:00:10) Intro / Banter
    (00:01:56) News Preview

    Tools & Apps
    (00:02:39) OpenAI ships GPT-5.4 mini and nano, faster and more capable but up to 4x pricier
    (00:08:04) Mistral's new Small 4 model punches above its weight with 128 expert modules
    (00:14:03) Meta's Manus launches 'My Computer' to turn your Mac into an AI agent - 9to5Mac
    (00:17:57) NVIDIA Announces NemoClaw for the OpenClaw Community | NVIDIA Newsroom + Nvidia boosts knowledge work with Open Agent Development Platform
    (00:24:09) DLSS 5 looks like a real-time generative AI filter for video games | The Verge
    (00:26:36) OpenAI to Launch ChatGPT 'Adult Mode' Despite Warnings From Its Own Advisers - CNET

    Applications & Business
    (00:33:46) OpenAI Reportedly Pivoting to a Focus on Business and Productivity Only
    (00:41:25) Nvidia GTC 2026: CEO Jensen Huang sees $1 trillion in orders for Blackwell and Vera Rubin through ’27
    (00:45:44) Mistral launches Forge to help enterprises build their own AI models
    (00:54:17) China's ByteDance gets access to top Nvidia AI chips, WSJ reports
    (00:57:57) Meta Delays Rollout of New A.I. Model After Performance Concerns
    (01:02:50) Microsoft Shakes Up AI Division As Copilot Falls Behind Google and OpenAI

    Policy & Safety
    (01:07:26) A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
    (01:13:09) Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
    (01:18:29) In-Training Defenses against Emergent Misalignment in Language Models
    (01:23:07) How do frontier AI agents perform in multi-step cyber-attack scenarios?
    (01:25:20) Eval awareness in Claude Opus 4.6’s BrowseComp performance
    (01:29:49) Introducing Bloom: an open source tool for automated behavioral evaluations
    (01:32:26) How well do models follow their constitutions?
    (01:37:11) Nvidia’s H200 License Stirs Security Concern Among Top Democrats

    Research & Advancements
    (01:40:050) [2603.15031] Attention Residuals
    (01:47:11) Mamba-3: Improved Sequence Modeling using State Space Principles

    2hr 1min
  6. #237 - Nemotron 3 Super, xAI reborn, Anthropic Lawsuit, Research!!!

    16 MAR

    Our 237th episode with a summary and discussion of last week's big AI news!

    Recorded on 03/13/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    * Perplexity announced “Personal Computer,” a local Mac-based AI agent positioned as a safer alternative to OpenAI’s computer-use agents, while Anthropic added GitHub PR code review, pricing reviews at $15–$25, and Cursor launched trigger-based “Automations” for always-on coding agents.
    * ChatGPT introduced interactive math/science visuals and Anthropic added in-chat interactive charts/diagrams; Nvidia released open weights for its 120B-parameter Nemotron 3 Super hybrid Transformer–Mamba latent-MoE model trained natively at 4-bit for Blackwell GPUs.
    * Nvidia halted H200 production for China amid customs blocks and domestic chip pressure; xAI saw major co-founder departures; Anthropic previewed a Claude Marketplace for enterprise procurement; Yann LeCun’s AMI Labs raised $1.03B; humanoid robot maker Sunday reached a $1.15B valuation.
    * Anthropic sued the Pentagon over a “supply chain risk” designation as memos ordered removal within 180 days; research covered models resisting activation steering, limits of chain-of-thought control, inference-scaling boosting cyber-task success, low-probability risky actions, weaknesses in SWE-bench, multimodal pretraining, long-context RNN memory caching, context-parallel training efficiency, RL for CUDA kernel optimization, and latent introspection detecting concept injection.

    A thank you to our current sponsors:

    * Box - visit Box.com/AI to learn more
    * ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
    * Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

    Timestamps:

    (00:00:10) Intro / Banter
    (00:01:23) Response to listener comments

    Tools & Apps
    (00:02:06) Perplexity’s Personal Computer turns your spare Mac into an AI agent | The Verge
    (00:04:22) Anthropic launches code review tool to check flood of AI-generated code | TechCrunch
    (00:08:08) Cursor is rolling out a new kind of agentic coding tool | TechCrunch
    (00:11:14) ChatGPT can now create interactive visuals to help you understand math and science concepts | TechCrunch
    (00:11:56) Anthropic’s Claude AI can respond with charts, diagrams, and other visuals now | The Verge

    Projects & Open Source
    (00:13:54) Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning | NVIDIA Technical Blog

    Applications & Business
    (00:21:22) Nvidia halts H200 production as China backs Huawei AI chips
    (00:28:33) Another XAI Cofounder Has Left, and Another Says He's Leaving. - Business Insider
    (00:34:04) Anthropic's Claude Marketplace allows customers to buy third-party cloud services | TechRadar
    (00:37:57) Yann LeCun's AMI Labs raises $1.03 billion to build world models | TechCrunch
    (00:44:52) Humanoid robotics maker Sunday reaches $1.15B valuation to build household robots | TechCrunch

    Policy & Safety
    (00:46:09) Anthropic Sues Department of Defense Over ‘Supply Chain Risk’ Label - The New York Times + Google and OpenAI Just Filed a Legal Brief in Support of Anthropic
    (00:53:24) Internal Pentagon memo orders military commanders to remove Anthropic AI technology from key systems - CBS News
    (00:58:15) Endogenous Resistance to Activation Steering in Language Models
    (01:06:27) Reasoning Models Struggle to Control their Chains of Thought
    (01:09:52) ‘It means missile defence on datacentres’: drone strikes raise doubts over Gulf as AI superpower
    (01:14:57) Evidence for inference scaling in AI cyber tasks: Increased evaluation budgets reveal higher success rates
    (01:18:24) Frontier Models Can Take Actions at Low Probabilities

    Research & Advancements
    (01:24:20) Research note: Many SWE-bench-Passing PRs Would Not Be Merged into Main
    (01:28:26) [2603.03276] Beyond Language Modeling: An Exploration of Multimodal Pretraining
    (01:40:09) Memory Caching: RNNs with Growing Memory
    (01:48:47) Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking
    (01:58:41) CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
    (02:08:57) Latent Introspection: Models Can Detect Prior Concept Injections
    (02:16:45) Physics of RL: Toy scaling laws for the emergence of reward-seeking

    2hr 27min
  7. #236 - GPT 5.4, Gemini 3.1 Flash Lite, Supply Chain Risk

    12 MAR

    Our 236th episode with a summary and discussion of last week's big AI news!

    Recorded on 03/06/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    * OpenAI released GPT-5.4 Pro with a 1M-token context window, mid-response course correction, native computer-use capabilities, improved tool use, higher GDPval performance (83%), and “high cyber capability” safety measures; OpenAI also launched GPT-5.3 Instant with a less “preachy” tone and a claimed 26.8% hallucination reduction.
    * Google upgraded Gemini 3.1 Flash Lite with faster time-to-first-token and higher throughput, released a CLI for integrating agents with Gmail/Drive/Docs, and discussion highlighted real-world agent failure risks (including an example of an AI-driven mass email deletion).
    * Luma launched unified multimodal models and Luma Agents for end-to-end creative work across text, image, video, and audio, including a reported ad localization use case completed in 40 hours for under $20,000.
    * Defense-contract controversy escalated: Anthropic was labeled a supply chain risk (later narrowed), OpenAI’s DoD contract language emphasized “all lawful uses,” consumer cancellations boosted Claude’s app rankings, OpenAI saw departures and announced a $110B raise at a $730B valuation, Alibaba lost key Qwen leaders, a lawsuit alleged Gemini contributed to a suicide, Anthropic warned of major labor disruption, and METR corrected its AI time-horizon estimates.

    A thank you to our current sponsors:

    * Box - visit Box.com/AI to learn more
    * ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
    * Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

    Timestamps:

    (00:00:10) Intro / Banter
    (00:01:19) News Preview

    Tools & Apps
    (00:02:10) OpenAI launches GPT-5.4 with Pro and Thinking versions | TechCrunch
    (00:12:31) OpenAI GPT-5.3 Instant less likely to beat around the bush • The Register
    (00:16:07) Google releases Gemini 3.1 Flash Lite at 1/8th the cost of Pro | VentureBeat
    (00:19:23) Google makes Gmail, Drive, and Docs 'agent-ready' for OpenClaw | PCWorld
    (00:27:02) Luma launches creative AI agents powered by its new ‘Unified Intelligence’ models | TechCrunch

    Applications & Business
    (00:30:05) Anthropic CEO Dario Amodei calls OpenAI's messaging around military deal 'straight up lies,' report says | TechCrunch
    (00:41:56) 'No ethics at all': the 'cancel ChatGPT' trend is growing after OpenAI signs a deal with the US military | TechRadar
    (00:45:54) OpenAI raises $110B in one of the largest private funding rounds in history | TechCrunch
    (00:56:07) Alibaba scrambles after sudden departure of Qwen tech lead

    Policy & Safety
    (01:00:12) Pentagon approves OpenAI safety red lines after dumping Anthropic + Where things stand with the Department of War Anthropic + Microsoft says Anthropic’s products remain available to customers after Pentagon blacklist
    (01:09:11) A new lawsuit claims Gemini assisted in suicide | Semafor
    (01:15:24) Anthropic just mapped out which jobs AI could potentially replace. A 'Great Recession for white-collar workers' is absolutely possible | Fortune
    (01:21:54) We're correcting a mistake in our modeling that inflated recent 50%-time horizons by 10-20%

    1hr 29min
  8. #235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon

    3 MAR

    Our 235th episode with a summary and discussion of last week's big AI news!

    Recorded on 02/27/2026

    Hosted by Andrey Kurenkov and Jeremie Harris

    Feel free to email us your questions and feedback at andreyvkurenkov@gmail.com and/or hello@gladstone.ai

    Read our text newsletter and comment on the podcast at https://lastweekin.ai/

    In this episode:

    * Model and tool updates highlight Anthropic’s Sonnet 4.6 (1M context; strong ARC-AGI-2 results), Google’s Gemini 3.1 Pro (major ARC-AGI-2 jump and multimodal demos), xAI’s Grok 4.20 beta (multi-agent debate), plus Anthropic’s Claude Code “Remote Control” and Perplexity’s multi-agent “Computer” coordinator.
    * Compute and business moves include Meta’s reported up-to-$100B AMD chip deal with warrant/equity incentives, MatX raising $500M to build specialized transformer chips shipping in 2027, World Labs raising $1B for world-model/3D environment tech, and a new startup raising $100M to simulate/predict human behavior.
    * Infrastructure and geopolitics cover Stargate data-center delays amid OpenAI/Oracle/SoftBank control disputes and cash concerns, and China’s plan to scale 7nm/5nm wafer output despite yield and tooling constraints.
    * Research and safety/policy discuss optimizer gains from masked updates, “deep thinking tokens” as a reasoning-effort signal, LLM attractor-state behaviors in bot-to-bot chats, mechanistic interpretability of counting/line-wrapping, methods to map task difficulty to human time horizons, plus Anthropic–Pentagon contract tensions, Anthropic’s report on distillation attacks (DeepSeek/Moonshot/MiniMax), and OpenAI’s report on disrupting malicious use.

    A thank you to our current sponsors:

    * Box - visit Box.com/AI to learn more
    * ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
    * Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

    Timestamps:

    (00:00:10) Intro / Banter
    (00:01:52) News Preview

    Tools & Apps
    (00:03:20) Anthropic releases Sonnet 4.6 | TechCrunch
    (00:11:24) Google Rolls Out Latest AI Model, Gemini 3.1 Pro - CNET
    (00:14:54) Elon Musk says Grok 4.20 public beta is now available: Capabilities of AI chatbot offered by xAI - The Times of India
    (00:18:06) Anthropic just released a mobile version of Claude Code called Remote Control | VentureBeat
    (00:21:01) Perplexity announces "Computer," an AI agent that assigns work to other AI agents - Ars Technica

    Applications & Business
    (00:23:40) Meta strikes up to $100B AMD chip deal as it chases 'personal superintelligence' | TechCrunch
    (00:27:05) Nvidia challenger AI chip startup MatX raised $500M | TechCrunch
    (00:31:00) World Labs lands $1B, with $200M from Autodesk, to bring world models into 3D workflows | TechCrunch
    (00:33:07) Simile Raises $100 Million for AI Aiming to Predict Human Behavior
    (00:33:52) Stargate AI data centers for OpenAI reportedly delayed by squabbles between partners — sources say OpenAI, Oracle, and SoftBank disagreed on who would have ultimate control of the planned data centers
    (00:36:43) China to increase leading-edge chip output by 5x in two years, report claims — aims to lift 7nm and 5nm production to 100,000 wafers per month, targeting half a million monthly by 2030

    Research & Advancements
    (00:40:33) On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
    (00:48:03) Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens
    (00:54:52) models have some pretty funny attractor states
    (01:01:41) When Models Manipulate Manifolds: The Geometry of a Counting Task
    (01:05:16) BRIDGE: Predicting Human Task Completion Time From Model Performance
    (01:12:00) NESSiE: The Necessary Safety Benchmark -- Identifying Errors that should not Exist
    (01:13:15) The least understood driver of AI progress
    (01:21:45) The Persona Selection Model: Why AI Assistants might Behave like Humans

    Policy & Safety
    (01:25:04) Anthropic CEO Amodei says Pentagon's threats 'do not change our position' on AI
    (01:33:04) Musk's xAI, Pentagon reach deal to use Grok in classified systems
    (01:34:17) Detecting and preventing distillation attacks
    (01:38:36) OpenAI details expanding efforts to disrupt malicious use of AI in new report - SiliconANGLE

    1hr 42min
