AI Explained Official Podcast

Philip - Host of AI Explained YT

Covering the biggest news of the century - the arrival of smarter-than-human AI. From the author of Simple Bench, which reveals the remaining gap between LLM and human reasoning. Hype-free, and the British accent is a freebie bonus.

  1. What the Freakiness of 2025 in AI Tells Us About 2026

    23/12/2025

    It’s probably not possible to satisfactorily condense 12 months’ worth of weird progress in AI, as well as predictions for the year to come, into one video. But I’m gonna try anyway because it has been a very strange time.

    http://matsprogram.org/s26-aie
    My new app! https://lmcouncil.ai
    Patreon Interview: https://www.patreon.com/posts/robot-in-your-27-146376094

    Chapters:
    00:00 - Introduction
    00:34 - Reasoning Models … and limits
    02:54 - A playable world
    03:36 - Realism
    03:50 - AI Slop gone mainstream
    05:03 - DolphinGemma
    05:39 - Public Mood
    07:34 - AI Enlisted
    08:30 - GPT-5
    11:05 - Open Weight not out
    13:00 - METR Breakout
    17:30 - VASA-1
    18:28 - Lateral Productivity
    20:15 - 1 or 1000 benchmarks needed?
    24:54 - Continual Learning + Altman on Superintelligence
    28:08 - Automated Information Discovery ft AlphaEvolve

    Hassabis on Generality: https://x.com/demishassabis/status/2003097405026193809 https://www.youtube.com/watch?v=PqVbypvxDto
    Gemini 3: https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
    Reasoning Trade-offs: https://arxiv.org/pdf/2504.13837
    DolphinGemma: https://blog.google/technology/ai/dolphingemma/?s=09
    Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
    METR Time Horizon: https://arxiv.org/pdf/2503.14499 https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
    Flaws: https://x.com/ShashwatGoel7/status/2002369517499105443 https://shash42.substack.com/p/how-to-game-the-metr-plot https://x.com/METR_Evals/status/2002203627377574113
    GPT-5 - Altman PhD in everything: https://edition.cnn.com/2025/08/14/business/chatgpt-rollout-problems
    https://simple-bench.com/
    AI Slop: https://www.youtube.com/watch?v=I_3vxoJDD9k https://www.theguardian.com/technology/2025/dec/16/boost-for-artists-in-ai-copyright-battle-as-only-3-per-cent-back-uk-active-opt-out-plan
    Survey: https://x.com/SearchlightInst/status/2001057144842387920/photo/1
    Nvidia Nemotron: https://x.com/percyliang/status/2000608134205985169
    OpenAI Compute Flywheel: https://x.com/OpenAI/status/2001363007209914399/photo/1
    Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
    AI in Govt: https://x.com/jdcmedlock/status/1939814516503847259
    Benchmark Gaming: https://techcrunch.com/2025/04/07/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores/
    AlphaEvolve: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/ https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content=
    Continual Learning: https://abehrouz.github.io/files/NL.pdf
    Job Risk: https://archive.ph/20250708204527/https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic
    GPT4o: https://x.com/AISafetyMemes/status/1916889492172013989
    Vasa-1: https://www.microsoft.com/en-us/research/project/vasa-1/
    Three Views: https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines
    Turing Test: https://x.com/tunguz/status/1907185471211422147
    Karpathy Year in Review: https://karpathy.bearblog.dev/year-in-review-2025/
    LLM Brainrot: https://arxiv.org/pdf/2510.13928
    Lateral Productivity: https://www.aisi.gov.uk/frontier-ai-trends-report
    Emotional Quotient: https://arxiv.org/pdf/2511.08394
    Non-hype Newsletter: https://signaltonoise.beehiiv.com/
    Podcast: https://aiexplainedopodcast.buzzsprout.com/
    AI Insiders ($9!): https://www.patreon.com/AIExplained

    33 min
  2. Gemini Exponential, Demis Hassabis' ‘Proto-AGI’ coming, but …

    19/12/2025

    The condensed highlights of hours of AI lab leader interviews, model releases, Gemini 3 Flash insights (plus its hidden flaw), Hassabis’ ‘proto-AGI’ and much more…

    https://matsprogram.org/apply?utm_source=ai-explained&utm_medium=youtube&utm_campaign=s26
    Also, do check out my new app: https://lmcouncil.ai

    Chapters:
    00:00 - Introduction
    00:50 - Results
    02:44 - But… the Flaw
    04:49 - So Benchmarks are fake? No
    07:37 - Spatial Reasoning + Hassabis
    10:06 - Proto-AGI
    12:07 - Minimal AGI
    15:07 - Compute Slowdown
    17:56 - New Data Paradigm

    Gemini 3 Flash: https://deepmind.google/models/gemini/flash/
    Hassabis Interview: https://www.youtube.com/watch?v=PqVbypvxDto
    Legg Interview: https://www.youtube.com/watch?v=l3u_FAv33G0
    Pre-training Lead Interview: https://www.youtube.com/watch?v=cNGDAqFXvew
    Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
    Brockman Video: https://x.com/OpenAI/status/2001336514786017417
    Post-Training Reveal: https://x.com/OfficialLoganK/status/2001742530472534442
    Hallucinations Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf
    Patreon Hallucinations Vid: https://www.patreon.com/posts/blockers-to-and-139264812
    AA-Omniscience Benchmark: https://artificialanalysis.ai/evaluations/omniscience https://arxiv.org/pdf/2511.13029 https://lmcouncil.ai/benchmarks https://simple-bench.com/ https://x.com/scaling01/status/1999620587744813205
    5.2 Codex Drop: https://cdn.openai.com/pdf/ac7c37ae-7f4c-4442-b741-2eabdeaf77e0/oai_5_2_Codex.pdf
    OpenAI Compute Trend: https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq
    Cramer Tweet/Response: https://x.com/BorisMPower/status/2001440650210976018
    OpenAI Valuation: https://www.theinformation.com/articles/openai-discussed-raising-tens-billions-valuation-around-750-billion?rc=sy0ihq
    Indian Data: https://www.reuters.com/world/india/with-freebies-openai-google-vie-indian-users-training-data-2025-12-17/
    TheInformation Data: https://x.com/theinformation/status/2001421225751351778
    Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
    Sima 2: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
    Veo 3.1: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
    METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
    AI Insiders ($9!): https://www.patreon.com/AIExplained
    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    20 min
  3. GPT 5.2: OpenAI Strikes Back

    12/12/2025

    Full GPT-5.2 breakdown - did OpenAI reclaim the crown? A story of tokens, time and cost, plus 9 details you wouldn’t get just from reading the headlines.

    https://www.youtube.com/@eightythousandhours
    AI Insiders ($9!): https://www.patreon.com/AIExplained
    https://lmcouncil.ai

    Chapters:
    00:00 - Introduction
    00:55 - Better than Human @ Professional Tasks?
    04:42 - Test time Compute
    07:05 - Benchmark Selection
    09:32 - Simple Results + council comparison
    13:01 - Long Context
    13:52 - Self-Improvement
    15:00 - 10 Years + New Models

    Release Page: https://openai.com/index/introducing-gpt-5-2/
    GPT 5.2 Benchmark Comparison: https://www.reddit.com/r/singularity/comments/1pka1y9/gpt52_all_20_benchmarks_rankings_and_pricing/ https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif https://lmcouncil.ai/benchmarks
    Charxiv: https://charxiv.github.io/#leaderboard
    GDPval: https://arxiv.org/pdf/2510.04374
    My vid: https://www.youtube.com/watch?v=oK5LxMaROSA
    Kilpatrick: https://x.com/OfficialLoganK/status/1999270402712023158/photo/1
    Noam Brown: https://x.com/polynoamial/status/1999189845164667132
    New Model in New Year: https://www.theinformation.com/articles/openai-developing-garlic-model-counter-googles-recent-gains?rc=sy0ihq
    10 Years of OpenAI: https://openai.com/index/ten-years/
    GPQA: https://x.com/idavidrein/status/1841265634170278063
    ARC-AGI 1-2: https://arcprize.org/arc-agi/2/
    Sunday Robotics: https://x.com/tonyzzhao/status/1991204839578300813
    Non-hype Newsletter: https://signaltonoise.beehiiv.com/
    https://lmcouncil.ai

    18 min
  4. You Are Being Told Contradictory Things About AI: 8 examples

    05/12/2025

    With headlines of an imminent job apocalypse, code red for ChatGPT and recursive self-improvement, at the same time as Anthropic's CEO yesterday saying we know how to scale to AGI, and Gemini 3 DeepThink out today, it is easy to get lost among the narratives and counter-narratives. So here are both, plus the facts behind them, for you to decide.

    https://epoch.ai/data/data-centers
    Epoch AI is the sponsor of today’s video, and my views, and those expressed in this video, do not necessarily reflect Epoch AI’s views in any way.

    Chapters:
    00:00 - Introduction
    00:42 - Job Apocalypse?
    01:45 - Scaling to AGI
    04:15 - Recursive Self-Improvement Needed, or Not
    09:57 - OpenAI Code Red vs Gemini 3 DeepThink vs Claude Opus 4.5
    13:27 - DeepSeek Speciale vs Mistral Large v3
    16:45 - Claude Soul Document

    https://lmcouncil.ai/
    AI Insiders ($9!): https://www.patreon.com/AIExplained
    Guardian Interview: https://www.theguardian.com/technology/ng-interactive/2025/dec/02/jared-kaplan-artificial-intelligence-train-itself
    MIT Study on Jobs/Tasks: https://iceberg.mit.edu/report.pdf vs https://www.cnbc.com/2025/11/26/mit-study-finds-ai-can-already-replace-11point7percent-of-us-workforce.html
    Amodei on Scaling: https://www.youtube.com/watch?v=FEj7wAjwQIk
    Claude Soul Document: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document
    Capabilities Original Stance: https://www.anthropic.com/news/core-views-on-ai-safety
    Ilya Interview: https://www.dwarkesh.com/p/ilya-sutskever-2
    Ricursive Intelligence: https://x.com/RicursiveAI/status/1995932204703346946
    Economist Worker Usage of GenAI: https://www.economist.com/finance-and-economics/2025/11/26/investors-expect-ai-use-to-soar-thats-not-happening#selection-1409.94-1413.42
    Mistral v3 Large: https://docs.mistral.ai/models/mistral-large-3-25-12
    Compute Slowdown Paper: https://joel-becker.com/images/publications/forecasting_time_horizon_under_compute_slowdown.pdf https://x.com/joel_bkr/status/1993023436541903155
    METR Chart: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq
    OpenAI Code Red: https://www.anthropic.com/news/core-views-on-ai-safety
    Rocket Company: https://www.independent.co.uk/news/world/americas/sam-altman-rocket-elon-musk-spacex-b2878351.html
    DeepSeek Paper: https://arxiv.org/html/2512.02556v1
    DeepSeek Crowdstrike CCP: https://www.crowdstrike.com/en-us/blog/crowdstrike-researchers-identify-hidden-vulnerabilities-ai-coded-software/
    https://simple-bench.com/
    Patreon Post: https://www.patreon.com/c/aiexplained/posts
    Robot: https://x.com/jloganolson/status/1985850115379351799

    20 min
  5. Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that

    14/11/2025

    A lot just got released in the last 36 hours, and it will all affect hundreds of millions of people. 10 details you would miss if you just read the headlines, from GPT 5.1 regressions, to how Claude hacked Govt Agencies, to SIMA 2, and Musical Turing Tests.

    https://assemblyai.com/aiexplained

    Chapters:
    00:00 - Introduction
    00:56 - GPT 5.1 Smarter?
    01:47 - Some Regressions
    03:22 - Sycophancy?
    05:22 - Claude Auto-Hacking
    06:16 - Jailbreaking through Granularity
    08:22 - This Will be Re-used
    09:30 - Hallucinating Hacker
    09:57 - Surprisingly Neutral Tone
    12:18 - SIMA 2
    14:10 - Alpha Parallels
    17:24 - AI Music

    GPT 5.1 Announcement: https://openai.com/index/gpt-5-1/
    System Card: https://cdn.openai.com/pdf/4173ec8d-1229-47db-96de-06d87147e07e/5_1_system_card.pdf
    Benchmarks: https://openai.com/index/gpt-5-1-for-developers/
    Simple Bench: https://lmcouncil.ai/benchmarks
    Auto-Hacking: https://x.com/AnthropicAI/status/1989033793190277618 https://www.anthropic.com/news/disrupting-AI-espionage
    Report: https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf
    Sima 2 Announcement: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/ https://x.com/amoufarek/status/1988986075331858693
    Scepticism: https://www.technologyreview.com/2025/11/13/1127921/google-deepmind-is-using-gemini-to-train-agents-inside-goat-simulator-3/
    Voyager: https://voyager.minedojo.org/
    Reuters Music: https://www.reuters.com/legal/litigation/are-you-listening-bots-survey-shows-ai-music-is-virtually-undetectable-2025-11-12/

    18 min
  6. Bubble or No Bubble, AI Keeps Progressing (ft. Relentless Learning + Introspection)

    10/11/2025

    Don’t let headlines about bubbles distract you from the real avenues of progress being explored in AI every week, including what had been thought to be a long-term blocker - continual learning (learning on the fly). This, plus models introspecting (hesitate before you berate), Nano Banana 2 possibly spotted, Chinese imagen and more.

    https://app.grayswan.ai/ai-explained
    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    01:26 - Continual Learning (Nested Learning / HOPE)
    07:00 - Introspection
    10:54 - Image-Gen Progress

    Nested Learning Post: https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/
    Nested Learning Paper: https://abehrouz.github.io/files/NL.pdf
    Original Titans Paper: https://arxiv.org/pdf/2501.00663
    Siri News: https://www.bloomberg.com/news/articles/2025-11-05/apple-plans-to-use-1-2-trillion-parameter-google-gemini-model-to-power-new-siri
    Introspection: https://www.anthropic.com/research/introspection
    Full Paper: https://transformer-circuits.pub/2025/introspection/index.html#mechanisms
    Earlier Work: https://www.anthropic.com/research/mapping-mind-language-model https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html
    Release Post: https://x.com/AnthropicAI/status/1983584136972677319
    https://lmcouncil.ai
    Non-hype Newsletter: https://signaltonoise.beehiiv.com/
    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    13 min
  7. Sora 2 - It will only get more realistic from here

    01/10/2025

    Sora 2 - the start of the infinite slop-feed or a key step to a generalist agent? Better than VEO 3 or over-hyped? I bring out 6 details you may have missed, contrast the announcement to Periodic Labs and even squeeze in some Claude Sonnet 4.5 analysis. Maybe I should make my videos longer…

    https://80000hours.org/aiexplained
    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:40 - Two models?
    01:15 - Rollout Details
    01:43 - Versus Sora 1 / Veo 3
    04:30 - Sora App / Social Media
    06:40 - Masterplan
    09:30 - Generalist Agent? Periodic Labs
    12:05 - Claude Sonnet 4.5
    13:42 - Future Outlook

    Announcement: https://openai.com/index/sora-2/
    Launch Video: https://www.youtube.com/live/gzneGhpXwjU
    System Card: https://cdn.openai.com/pdf/50d5973c-c4ff-4c2d-986f-c72b5d0ff069/sora_2_system_card.pdf
    Sam Altman Blog Post on Sora App: https://blog.samaltman.com/sora-2
    Most Intelligent Claim: https://x.com/willdepue/status/1973089331284681110
    GTA: https://x.com/AndrewCurran_/status/1973298436536766666
    Meta Vibes: https://x.com/alexandr_wang/status/1971295156411433228?s=46
    Altman on Regulations: https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman
    OpenAI Profit: https://www.theinformation.com/articles/openais-first-half-results-4-3-billion-sales-2-5-billion-cash-burn?rc=sy0ihq
    Periodic Labs: https://periodic.com/ https://www.nytimes.com/2025/09/30/technology/ai-meta-google-openai-periodic.html https://x.com/LiamFedus/status/1973055380193431965 https://baincapitalventures.com/insight/we-must-know-we-will-know/?s=09
    Sonnet 4.5: https://www.anthropic.com/news/claude-sonnet-4-5
    https://simple-bench.com/
    Non-hype Newsletter: https://signaltonoise.beehiiv.com/
    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    16 min
