AI Post Transformers

mcgrof

AI-generated podcast where hosts Hal Turing and Dr. Ada Shannon discuss the latest research papers and reports in machine learning, AI systems, and optimization. Featuring honest critical analysis, proper citations, and nerdy humor.

  1. 19 hr ago

    PaperBench: Can AI Replicate AI Research?

    This episode explores PaperBench, a benchmark designed to test whether frontier AI agents can independently replicate the empirical work of recent machine learning papers from scratch rather than merely explain them. It breaks down what agentic AI actually entails in this setting: reading papers, writing code, choosing baselines, reconstructing missing details, running experiments, debugging failures, and judging whether reproduced results match the original claims. The discussion compares PaperBench with other evaluation ladders such as CORE-Bench, MLE-bench, RE-Bench, and JudgeEval, while also debating whether controlled scratch replication should be viewed as advanced engineering or a meaningful proxy for real research practice. Listeners get a clear look at why this matters for both AI capability measurement and safety, especially given PaperBench’s carefully curated design of 20 ICML 2024 papers, 12 topics, and more than 8,000 graded tasks.

    Sources:
    1. PaperBench: Evaluating AI's Ability to Replicate AI Research — Giulio Starace, Oliver Jaffe, Dane Sherburn, James Aung, Jun Shern Chan, Leon Maksin, Rachel Dias, Evan Mays, Benjamin Kinsella, Wyatt Thompson, Johannes Heidecke, Amelia Glaese, Tejal Patwardhan, 2025 http://arxiv.org/abs/2504.01848
    2. PaperBench: Evaluating AI's Ability to Replicate AI Research — Giulio Starace, Oliver Jaffe, Dane Sherburn, James Aung, et al., 2025 https://scholar.google.com/scholar?q=PaperBench:+Evaluating+AI's+Ability+to+Replicate+AI+Research
    3. RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts — Hjalmar Wijk, Tao Lin, Joel Becker, Sami Jawhar, et al., 2024 https://scholar.google.com/scholar?q=RE-Bench:+Evaluating+frontier+AI+R&D+capabilities+of+language+model+agents+against+human+experts
    4. CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark — Zachary S. Siegel, Sayash Kapoor, Nitya Nagdir, Benedikt Stroebl, Arvind Narayanan, 2024 https://scholar.google.com/scholar?q=CORE-Bench:+Fostering+the+Credibility+of+Published+Research+Through+a+Computational+Reproducibility+Agent+Benchmark
    5. MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation — Qian Huang, Jian Vora, Percy Liang, Jure Leskovec, 2023 https://scholar.google.com/scholar?q=MLAgentBench:+Evaluating+Language+Agents+on+Machine+Learning+Experimentation
    6. MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering — Jun Shern Chan et al., 2024 https://scholar.google.com/scholar?q=MLE-bench:+Evaluating+Machine+Learning+Agents+on+Machine+Learning+Engineering
    7. EXP-Bench: Can AI Conduct AI Research Experiments? — Patrick Tser Jern Kon et al., 2025 https://scholar.google.com/scholar?q=EXP-Bench:+Can+AI+Conduct+AI+Research+Experiments?
    8. MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research — Hui Chen, Miao Xiong, Yujie Lu, Wei Han, Ailin Deng, Yufei He, Jiaying Wu, Yibo Li, Yue Liu, Bryan Hooi, 2025 https://scholar.google.com/scholar?q=MLR-Bench:+Evaluating+AI+Agents+on+Open-Ended+Machine+Learning+Research
    9. ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers? — Christine Ye et al., 2025 https://scholar.google.com/scholar?q=ReplicationBench:+Can+AI+Agents+Replicate+Astrophysics+Research+Papers?
    10. Can Large Language Models Be an Alternative to Human Evaluations? — Cheng-Han Chiang, Hung-yi Lee, 2023 https://scholar.google.com/scholar?q=Can+Large+Language+Models+Be+an+Alternative+to+Human+Evaluations?
    11. RubricEval: A Rubric-Level Meta-Evaluation Benchmark for LLM Judges in Instruction Following — Tianjun Pan et al., 2026 https://arxiv.org/abs/2603.25133
    12. JudgeBench: A Benchmark for Evaluating LLM-based Judges — Sijun Tan et al., 2024 https://arxiv.org/abs/2410.12784
    13. When Can We Trust LLMs in Mental Health? Large-Scale Benchmarks for Reliable LLM Evaluation — Abeer Badawi et al., 2025 https://arxiv.org/abs/2510.19032
    14. A Dataset For Computational Reproducibility — Lazaro Costa, Susana Barbosa, Jacome Cunha, 2025 https://arxiv.org/abs/2504.08684
    15. SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers — Yanzheng Xiang et al., 2025 https://arxiv.org/abs/2504.00255
    16. OctoBench: Benchmarking Scaffold-Aware Instruction Following in Repository-Grounded Agentic Coding — Deming Ding et al., 2026 https://arxiv.org/abs/2601.10343
    17. ContextBench: A Benchmark for Context Retrieval in Coding Agents — Han Li et al., 2026 https://arxiv.org/abs/2602.05892
    18. AI Post Transformers: When AI Builds Itself and Recursive Self-Improvement — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-05-when-ai-builds-itself-and-recursive-self-8bbf9e.mp3
    19. AI Post Transformers: When LLM Judges Become Coin Flips — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-05-when-llm-judges-become-coin-flips-8b43ef.mp3
    20. AI Post Transformers: ASI-Evolve for Data, Architectures, and RL — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-05-asi-evolve-for-data-architectures-and-rl-197b2b.mp3
    21. AI Post Transformers: Kimi K2.5 and Visual Agent Swarms — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-24-kimi-k25-and-visual-agent-swarms-7d04d7.mp3

  2. 19 hr ago

    Training Modular KV Caches at Scale

    This episode explores the paper Cartridges at Scale, which asks whether large document collections can be distilled into reusable modular KV-cache memories so a model can answer questions without repeatedly rereading raw text. It explains what a cartridge is, how context distillation turns full-document context into compact learned prefixes, and why that differs from prompt caching, fine-tuning, ordinary long-context prompting, and text RAG. The discussion centers on the paper’s main claim that per-document memories do not reliably compose when trained independently, so the authors jointly train cartridges with both relevant and irrelevant memories present to teach a frozen model which compressed document to attend to in a noisy multi-document setting. Listeners will find it interesting because it treats the KV cache as a potential external memory layer that could reduce inference cost and latency while exposing hard questions about compositionality, transparency, and whether learned memory modules can outperform standard retrieval pipelines.

    Sources:
    1. Cartridges at Scale: Training Modular KV Caches over Large Document Collections — Momchil Hardalov, Gonzalo Iglesias, Adrià de Gispert, 2026 http://arxiv.org/abs/2606.04557
    2. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Douwe Kiela, et al., 2020 https://arxiv.org/abs/2005.11401
    3. Prompt Cache: Modular Attention Reuse for Low-Latency Inference — In Gim, Guojun Chen, Seung-seob Lee, Nikhil Sarda, Anurag Khandelwal, Lin Zhong, 2023 https://arxiv.org/abs/2311.04934
    4. Cartridges: Lightweight and General-Purpose Long Context Representations via Self-Study — Sabri Eyuboglu, Ryan Ehrlich, Simran Arora, Neel Guha, James Zou, Azalia Mirhoseini, Christopher Re, et al., 2025 https://arxiv.org/abs/2506.06266
    5. Cartridges at Scale: Training Modular KV Caches over Large Document Collections — Momchil Hardalov, Gonzalo Iglesias, Adrià de Gispert, 2026 https://arxiv.org/abs/2606.04557
    6. Cartridges: Lightweight and General-Purpose Long Context Representations via Self-Study (https://arxiv.org/abs/2506.06266) — Sabri Eyuboglu, Ryan S. Ehrlich, Simran Arora, Neel Guha, Dylan Zinsley, Emily R. Liu, Atri Rudra, James Y. Zou, Azalia Mirhoseini, Christopher Re, 2025 https://scholar.google.com/scholar?q=Cartridges:+Lightweight+and+General-Purpose+Long+Context+Representations+via+Self-Study+(https://arxiv.org/abs/2506.06266)
    7. Learned Structure in CARTRIDGES: Keys as Shareable Routers in Self-Studied Representations (https://arxiv.org/abs/2508.17032) — Maurizio Diaz, 2025 https://scholar.google.com/scholar?q=Learned+Structure+in+CARTRIDGES:+Keys+as+Shareable+Routers+in+Self-Studied+Representations+(https://arxiv.org/abs/2508.17032)
    8. xRAG: Extreme Context Compression for Retrieval-Augmented Generation with One Token (https://arxiv.org/abs/2405.13792) — Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao, 2024 https://scholar.google.com/scholar?q=xRAG:+Extreme+Context+Compression+for+Retrieval-Augmented+Generation+with+One+Token+(https://arxiv.org/abs/2405.13792)
    9. KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction (https://arxiv.org/abs/2505.23416) — Jang-Hyun Kim, Jinuk Kim, Sangwoo Kwon, Jae W. Lee, Sangdoo Yun, Hyun Oh Song, 2025 https://scholar.google.com/scholar?q=KVzip:+Query-Agnostic+KV+Cache+Compression+with+Context+Reconstruction+(https://arxiv.org/abs/2505.23416)
    10. T2-RAGBench: Text-and-Table Benchmark for Evaluating Retrieval-Augmented Generation (https://aclanthology.org/2026.eacl-long.8/) — Jan Strich, Enes Kutay Isgorur, Maximilian Trescher, Chris Biemann, Martin Semmann, 2026 https://scholar.google.com/scholar?q=T2-RAGBench:+Text-and-Table+Benchmark+for+Evaluating+Retrieval-Augmented+Generation+(https://aclanthology.org/2026.eacl-long.8/)
    11. KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse — Jingbo Yang et al., 2025 https://scholar.google.com/scholar?q=KVLink:+Accelerating+Large+Language+Models+via+Efficient+KV+Cache+Reuse
    12. Hierarchical Document Refinement for Long-context Retrieval-augmented Generation — Jiajie Jin et al., 2025 https://scholar.google.com/scholar?q=Hierarchical+Document+Refinement+for+Long-context+Retrieval-augmented+Generation
    13. LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs -- No Silver Bullet for LC or RAG Routing — Kuan Li et al., 2025 https://scholar.google.com/scholar?q=LaRA:+Benchmarking+Retrieval-Augmented+Generation+and+Long-Context+LLMs+--+No+Silver+Bullet+for+LC+or+RAG+Routing
    14. ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities — Peng Xu et al., 2024 https://scholar.google.com/scholar?q=ChatQA+2:+Bridging+the+Gap+to+Proprietary+LLMs+in+Long+Context+and+RAG+Capabilities
    15. LegalBench-RAG: A Benchmark for Retrieval-Augmented Generation in the Legal Domain — Nicholas Pipitone and Ghita Houir Alami, 2024 https://scholar.google.com/scholar?q=LegalBench-RAG:+A+Benchmark+for+Retrieval-Augmented+Generation+in+the+Legal+Domain
    16. KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse — Huan Yang et al., 2025 https://scholar.google.com/scholar?q=KVShare:+An+LLM+Service+System+with+Efficient+and+Effective+Multi-Tenant+KV+Cache+Reuse
    17. AI Post Transformers: KVzip for Query-Agnostic KV Cache Compression — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-29-kvzip-for-query-agnostic-kv-cache-compre-72afe5.mp3
    18. AI Post Transformers: From Prefix Cache to Fusion RAG Cache: Accelerating LLM Inference in Retrieval-Augmented Generation — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-22-from-prefix-cache-to-fusion-rag-9c5d39.mp3
    19. AI Post Transformers: Experimental Comparison of Agentic and Enhanced RAG — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-14-experimental-comparison-of-agentic-and-e-37d8bc.mp3
    20. AI Post Transformers: Can Models Learn from Long Context? — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-05-can-models-learn-from-long-context-77533e.mp3
    21. AI Post Transformers: δ-mem and Online Memory for LLMs — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-13-d-mem-and-online-memory-for-llms-6622fa.mp3
    22. AI Post Transformers: FengHuang for Rack-Scale LLM Inference Memory — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-12-fenghuang-for-rack-scale-llm-inference-m-62708e.mp3

  3. 1 day ago

    From AGI to ASI and Beyond

    This episode explores DeepMind’s From AGI to ASI as a foresight report that treats human-level general intelligence not as the endpoint, but as a possible stepping stone toward systems that could outperform entire organizations in planning, research, engineering, and coordination. It breaks down how the paper defines AGI and the much more ambitious idea of ASI, then examines the conceptual tools behind that framing, including universal intelligence, AIXI as an idealized reference point, recursive self-improvement, collective intelligence, and the notion of effective compute. The discussion also probes the paper’s method, arguing that it is a structured synthesis of trends and bottlenecks rather than empirical proof, and questions how much precision is needed before such forecasts become meaningful. Listeners will find it interesting because it connects abstract AI theory, concrete scaling dynamics, and real uncertainty about whether progress in models, compute, and autonomy could compound into organization-level superintelligence.

    Sources:
    1. From AGI to ASI — Tim Genewein, Matija Franklin, Alexander Lerchner, Laurent Orseau, Samuel Albanie, Adam Bales, Cole Wyeth, Stephanie Chan, Iason Gabriel, Joel Z. Leibo, Allan Dafoe, Marcus Hutter, Thore Graepel, Shane Legg, 2026 http://arxiv.org/abs/2606.12683
    2. From AGI to ASI (https://arxiv.org/abs/2606.12683) — Tim Genewein, Matija Franklin, Alexander Lerchner, Marcus Hutter, Shane Legg, et al., 2026 https://scholar.google.com/scholar?q=From+AGI+to+ASI+(https://arxiv.org/abs/2606.12683)
    3. Can Intelligence Explode? (https://arxiv.org/abs/1202.6177) — Marcus Hutter, 2012 https://scholar.google.com/scholar?q=Can+Intelligence+Explode?+(https://arxiv.org/abs/1202.6177)
    4. Research Priorities for Robust and Beneficial Artificial Intelligence (https://arxiv.org/abs/1602.03506) — Stuart Russell, Daniel Dewey, Max Tegmark, 2015 https://scholar.google.com/scholar?q=Research+Priorities+for+Robust+and+Beneficial+Artificial+Intelligence+(https://arxiv.org/abs/1602.03506)
    5. Emerging Practices in Frontier AI Safety Frameworks (https://arxiv.org/abs/2503.04746) — Marie Davidsen Buhl, Ben Bucknall, Tammy Masterson, 2025 https://scholar.google.com/scholar?q=Emerging+Practices+in+Frontier+AI+Safety+Frameworks+(https://arxiv.org/abs/2503.04746)
    6. A Theory of Universal Artificial Intelligence based on Algorithmic Complexity (https://arxiv.org/abs/cs/0004001) — Marcus Hutter, 2000 https://scholar.google.com/scholar?q=A+Theory+of+Universal+Artificial+Intelligence+based+on+Algorithmic+Complexity+(https://arxiv.org/abs/cs/0004001)
    7. Universal Intelligence: A Definition of Machine Intelligence (https://arxiv.org/abs/0712.3329) — Shane Legg, Marcus Hutter, 2007 https://scholar.google.com/scholar?q=Universal+Intelligence:+A+Definition+of+Machine+Intelligence+(https://arxiv.org/abs/0712.3329)
    8. A Monte Carlo AIXI Approximation (https://arxiv.org/abs/0909.0801) — Joel Veness, Kee Siong Ng, Marcus Hutter, William Uther, David Silver, 2009 (later published in JAIR, 2011) https://scholar.google.com/scholar?q=A+Monte+Carlo+AIXI+Approximation+(https://arxiv.org/abs/0909.0801)
    9. One Decade of Universal Artificial Intelligence (https://arxiv.org/abs/1202.6153) — Marcus Hutter, 2012 https://scholar.google.com/scholar?q=One+Decade+of+Universal+Artificial+Intelligence+(https://arxiv.org/abs/1202.6153)
    10. Universal Intelligence: A Definition of Machine Intelligence — Shane Legg, Marcus Hutter, 2007 https://scholar.google.com/scholar?q=Universal+Intelligence:+A+Definition+of+Machine+Intelligence
    11. Levels of AGI for Operationalizing Progress on the Path to AGI — Meredith Ringel Morris, Jascha Sohl-Dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg, 2023 https://scholar.google.com/scholar?q=Levels+of+AGI+for+Operationalizing+Progress+on+the+Path+to+AGI
    12. AI as Normal Technology — Arvind Narayanan, Sayash Kapoor, 2025 https://scholar.google.com/scholar?q=AI+as+Normal+Technology
    13. Preparing for the Intelligence Explosion — William MacAskill, Fin Moorhouse, 2025 https://scholar.google.com/scholar?q=Preparing+for+the+Intelligence+Explosion
    14. A Rosetta Stone for AI Benchmarks — Anson Ho, Jean-Stanislas Denain, David Atanasov, Samuel Albanie, Rohin Shah, 2025 https://scholar.google.com/scholar?q=A+Rosetta+Stone+for+AI+Benchmarks
    15. Measuring AI Ability to Complete Long Tasks — Thomas Kwa et al., 2025 https://scholar.google.com/scholar?q=Measuring+AI+Ability+to+Complete+Long+Tasks
    16. PaperBench: Evaluating AI's Ability to Replicate AI Research — Giulio Starace et al., 2025 https://scholar.google.com/scholar?q=PaperBench:+Evaluating+AI's+Ability+to+Replicate+AI+Research
    17. Inverse Scaling in Test-Time Compute — Aryo P. Gema et al., 2025 https://scholar.google.com/scholar?q=Inverse+Scaling+in+Test-Time+Compute
    18. The Art of Scaling Test-Time Compute for Large Language Models — Aradhye Agarwal et al., 2025 https://scholar.google.com/scholar?q=The+Art+of+Scaling+Test-Time+Compute+for+Large+Language+Models
    19. Test-Time Scaling Makes Overtraining Compute-Optimal — Nicholas Roberts et al., 2026 https://scholar.google.com/scholar?q=Test-Time+Scaling+Makes+Overtraining+Compute-Optimal
    20. When AI Benchmarks Plateau: A Systematic Study of Benchmark Saturation — Mubashara Akhtar et al., 2026 https://scholar.google.com/scholar?q=When+AI+Benchmarks+Plateau:+A+Systematic+Study+of+Benchmark+Saturation
    21. How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse — Mohamed El Amine Seddik et al., 2024 https://scholar.google.com/scholar?q=How+Bad+is+Training+on+Synthetic+Data?+A+Statistical+Analysis+of+Language+Model+Collapse
    22. Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data — Matthias Gerstgrasser et al., 2024 https://scholar.google.com/scholar?q=Is+Model+Collapse+Inevitable?+Breaking+the+Curse+of+Recursion+by+Accumulating+Real+and+Synthetic+Data
    23. MLGym: A New Framework and Benchmark for Advancing AI Research Agents — Deepak Nathani et al., 2025 https://scholar.google.com/scholar?q=MLGym:+A+New+Framework+and+Benchmark+for+Advancing+AI+Research+Agents
    24. MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation — Qian Huang et al., 2023 https://scholar.google.com/scholar?q=MLAgentBench:+Evaluating+Language+Agents+on+Machine+Learning+Experimentation
    25. AI Post Transformers: Technical AGI Safety and Security Framework — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-03-technical-agi-safety-and-security-framew-f27316.mp3
    26. AI Post Transformers: Unified Neural Scaling Laws Across Regimes — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-07-unified-neural-scaling-laws-across-regim-292e2d.mp3
    27. AI Post Transformers: TUMIX Multi-Agent Test-Time Scaling with Tools — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-22-tumix-multi-agent-test-time-scaling-with-40671c.mp3
    28. AI Post Transformers: Test-time Scaling for Multi-Agent Collaborative Reasoning — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-22-test-time-scaling-for-multi-agent-collab-082570.mp3

    Interactive Visualization: From AGI to ASI and Beyond

  4. 1 day ago

    MiniMax Sparse Attention at Million-Token Scale

    This episode explores MiniMax Sparse Attention, a long-context transformer design that aims to preserve dense-model quality at million-token scale while sharply reducing the quadratic compute and memory costs of standard attention. It explains how the method combines Grouped Query Attention with blockwise sparse retrieval: a lightweight Index Branch scores past context in blocks, forces a recent local block to stay visible, selects top-k candidate regions, and then lets a Main Branch run exact softmax attention only inside those chosen blocks. The discussion places the paper alongside Longformer, BigBird, Routing Transformers, MInference, and Native Sparse Attention, arguing that its main contribution is a simpler, more GPU-friendly routing scheme that could make sparse attention practical at deployment time. Listeners will find it interesting because it focuses on the real technical tension behind ultra-long-context models: whether this kind of sparse routing can reliably recover rare distant evidence, or whether it mainly wins through recency bias and careful systems engineering.

    Sources:
    1. MiniMax Sparse Attention — Xunhao Lai, Weiqi Xu, Yufeng Yang, Qiaorui Chen, Yang Xu, Lunbin Zeng, Xiaolong Li, Haohai Sun, Haichao Zhu, Vito Zhang, Pengyu Zhao, 2026 http://arxiv.org/abs/2606.13392
    2. Longformer: The Long-Document Transformer — Iz Beltagy, Matthew E. Peters, Arman Cohan, 2020 https://scholar.google.com/scholar?q=Longformer:+The+Long-Document+Transformer
    3. Big Bird: Transformers for Longer Sequences — Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Amr Ahmed, 2020 https://scholar.google.com/scholar?q=Big+Bird:+Transformers+for+Longer+Sequences
    4. Efficient Content-Based Sparse Attention with Routing Transformers — Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier, 2020 https://scholar.google.com/scholar?q=Efficient+Content-Based+Sparse+Attention+with+Routing+Transformers
    5. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention — Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Wenfeng Liang, Wangding Zeng, 2025 https://scholar.google.com/scholar?q=Native+Sparse+Attention:+Hardware-Aligned+and+Natively+Trainable+Sparse+Attention
    6. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints — Joshua Ainslie et al., 2023 https://scholar.google.com/scholar?q=GQA:+Training+Generalized+Multi-Query+Transformer+Models+from+Multi-Head+Checkpoints
    7. Optimizing Mixture of Block Attention — Guangxuan Xiao et al., 2025 https://scholar.google.com/scholar?q=Optimizing+Mixture+of+Block+Attention
    8. MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention — Huiqiang Jiang et al., 2024 https://scholar.google.com/scholar?q=MInference+1.0:+Accelerating+Pre-filling+for+Long-Context+LLMs+via+Dynamic+Sparse+Attention
    9. DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads — Guangxuan Xiao et al., 2024 https://scholar.google.com/scholar?q=DuoAttention:+Efficient+Long-Context+LLM+Inference+with+Retrieval+and+Streaming+Heads
    10. FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision — Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao, 2024 https://scholar.google.com/scholar?q=FlashAttention-3:+Fast+and+Accurate+Attention+with+Asynchrony+and+Low-precision
    11. FIER: Fine-Grained and Efficient KV Cache Retrieval for Long-context LLM Inference — Dongwei Wang et al., 2025 https://arxiv.org/abs/2508.08256
    12. SCBench: A KV Cache-Centric Analysis of Long-Context Methods — Yucheng Li et al., 2024 https://arxiv.org/abs/2412.10319
    13. Streaming Video Question-Answering with In-context Video KV-Cache Retrieval — Shangzhe Di et al., 2025 https://arxiv.org/abs/2503.00540
    14. Native Hybrid Attention for Efficient Sequence Modeling — Jusen Du et al., 2025 https://arxiv.org/abs/2510.07019
    15. Rope to Nope and Back Again: A New Hybrid Attention Strategy — Bowen Yang et al., 2025 https://arxiv.org/abs/2501.18795
    16. AI Post Transformers: Optimizing Mixture of Block Attention Through Statistical Theory — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-18-optimizing-mixture-of-block-attention-th-214f91.mp3
    17. AI Post Transformers: Deep Kernel Fusion for Transformer Decoding — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-15-deep-kernel-fusion-for-transformer-decod-b1a703.mp3
    18. AI Post Transformers: Mooncake for KV Cache-Centric LLM Serving — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-05-mooncake-for-kv-cache-centric-llm-servin-1086d0.mp3
    19. AI Post Transformers: How Induction Heads Emerge in Transformers — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-03-how-induction-heads-emerge-in-transforme-a7bfcb.mp3
    20. AI Post Transformers: δ-mem and Online Memory for LLMs — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-13-d-mem-and-online-memory-for-llms-6622fa.mp3
    21. AI Post Transformers: Ministral 3: Cascade Distillation for Long-Context Multimodal Models — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-15-cascade-distillation-for-long-context-mu-0ebd1a.mp3

    Interactive Visualization: MiniMax Sparse Attention at Million-Token Scale
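
    The blockwise routing summarized in the MiniMax Sparse Attention episode (score blocks, keep the recent local block, pick top-k, then run exact softmax inside the chosen blocks) can be sketched for a single query as follows. This is only an illustration under stated assumptions, not the paper's implementation: the mean-pooled block score stands in for the Index Branch, the names `block_sparse_attention`, `block_size`, and `top_k` are hypothetical, there is one head and no Grouped Query Attention, and the query is assumed to sit after every key so all keys count as past context.

```python
import math

def block_sparse_attention(q, keys, values, block_size=2, top_k=1):
    """Attend from one query to the top-k scored key blocks plus the newest block.

    Assumes at least one key and that every key precedes the query.
    """
    n = len(keys)
    # Split the past context into contiguous blocks of token positions.
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Stand-in index branch (assumption): score a block by the dot product
    # of the query with the mean of that block's keys.
    def block_score(idx):
        mean_key = [sum(keys[i][d] for i in idx) / len(idx) for d in range(len(q))]
        return dot(q, mean_key)

    local = len(blocks) - 1  # the most recent block is always kept visible
    ranked = sorted(range(local), key=lambda b: block_score(blocks[b]), reverse=True)
    selected = sorted(set(ranked[:top_k]) | {local})

    # Main branch: exact softmax attention restricted to the selected blocks.
    positions = [i for b in selected for i in blocks[b]]
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in positions]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    out = [
        sum(w * values[i][d] for w, i in zip(weights, positions)) / total
        for d in range(len(values[0]))
    ]
    return out, selected
```

    With six keys, `block_size=2`, and `top_k=1`, the softmax touches at most four positions instead of six. The tension raised in the episode shows up directly: if the block score ranks a block holding rare distant evidence too low, the exact attention step never sees it.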

  5. 2 days ago

    ACE: Matrix-Native AI Extensions for x86

    This episode explores the ACE proposal from AMD, Intel, and the x86 Ecosystem Advisory Group, which would add matrix-native AI instructions to x86 CPUs so transformer workloads can run dense linear algebra more efficiently without changing the models themselves. It explains why AVX10 and VNNI fall short for GEMM-heavy inference, and introduces outer-product updates, 2-D tile registers, and the reuse of the AMX palette model so operating systems and compilers can handle the new state within familiar x86 mechanisms. The discussion also challenges the proposal’s headline 16x compute-density claim for INT8 and BF16, arguing that real speed depends on full-kernel costs like packing, memory traffic, conversions, tails, and cache behavior. It also examines OCP FP8, MXFP8, MXINT8, and BF16 support as a sign that low-precision AI now depends on tight coordination between ISA design, quantization rules, and kernel implementation, making the proposal interesting both technically and strategically.

    Sources:
    1. ACE: Matrix-Native AI Extensions for x86 https://x86ecosystem.org/wp-content/uploads/2026/03/ACE-Whitepaper-v1.pdf
    2. The AI Compute Extensions (ACE) for x86 — Stuart Biles, Brian Thompto, Michael Estlick, Eric Schwarz, Thomas Fox, Gabriel Loh, Marius Evers, Michael Clark, Alexander Heinecke, Pradeep Dubey, Ido Ouziel, 2026 https://scholar.google.com/scholar?q=The+AI+Compute+Extensions+(ACE)+for+x86
    3. A matrix math facility for Power ISA(TM) processors — José E. Moreira, Kit Barton, Steven Battle, Peter Bergner, Ramon Bertran and others, 2021 https://scholar.google.com/scholar?q=A+matrix+math+facility+for+Power+ISA(TM)+processors
    4. Hello SME! Generating Fast Matrix Multiplication Kernels Using the Scalable Matrix Extension — Stefan Remke, Alexander Breuer, 2024 https://scholar.google.com/scholar?q=Hello+SME!+Generating+Fast+Matrix+Multiplication+Kernels+Using+the+Scalable+Matrix+Extension
    5. SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs — Ahmed F. AbouElhamayed, Jordan Dotzel, Yash Akhauri, Chi-Chih Chang, Sameh Gobriel, J. Pablo Muñoz, Vui Seng Chua, Nilesh Jain, Mohamed S. Abdelfattah, 2025 https://scholar.google.com/scholar?q=SparAMX:+Accelerating+Compressed+LLMs+Token+Generation+on+AMX-powered+CPUs
    6. Automating the Last-Mile for High Performance Dense Linear Algebra — Richard Michael Veras, Tze Meng Low, Tyler Michael Smith, Robert A. van de Geijn, Franz Franchetti, 2017 https://scholar.google.com/scholar?q=Automating+the+Last-Mile+for+High+Performance+Dense+Linear+Algebra
    7. Microscaling Data Formats for Deep Learning — Bita Darvish Rouhani et al., 2023 https://scholar.google.com/scholar?q=Microscaling+Data+Formats+for+Deep+Learning
    8. Recipes for Pre-training LLMs with MXFP8 — Asit Mishra, Dusan Stosic, Simon Layton, 2025 https://scholar.google.com/scholar?q=Recipes+for+Pre-training+LLMs+with+MXFP8
    9. Fast Matrix Multiplication via Compiler-only Layered Data Reorganization and Intrinsic Lowering — Braedy Kuzma et al., 2023 https://scholar.google.com/scholar?q=Fast+Matrix+Multiplication+via+Compiler-only+Layered+Data+Reorganization+and+Intrinsic+Lowering
    10. THOR: A Non-Speculative Value Dependent Timing Side Channel Attack Exploiting Intel AMX — Farshad Dizani et al., 2025 https://scholar.google.com/scholar?q=THOR:+A+Non-Speculative+Value+Dependent+Timing+Side+Channel+Attack+Exploiting+Intel+AMX
    11. Compute or Load KV Cache? Why Not Both? (https://arxiv.org/abs/2410.03065) — Shuowei Jin et al., 2024 https://scholar.google.com/scholar?q=Compute+or+Load+KV+Cache?+Why+Not+Both?+(https://arxiv.org/abs/2410.03065)
    12. SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference (https://arxiv.org/abs/2510.17189) — Wenxun Wang et al., 2025 https://scholar.google.com/scholar?q=SOLE:+Hardware-Software+Co-design+of+Softmax+and+LayerNorm+for+Efficient+Transformer+Inference+(https://arxiv.org/abs/2510.17189)
    13. Towards Fully FP8 GEMM LLM Training at Scale (https://arxiv.org/abs/2505.20524) — Alejandro Hernandez-Cano et al., 2025 https://scholar.google.com/scholar?q=Towards+Fully+FP8+GEMM+LLM+Training+at+Scale+(https://arxiv.org/abs/2505.20524)
    14. Pretraining Large Language Models with NVFP4 (https://arxiv.org/abs/2509.25149) — Felix Abecassis et al. (NVIDIA), 2025 https://scholar.google.com/scholar?q=Pretraining+Large+Language+Models+with+NVFP4+(https://arxiv.org/abs/2509.25149)
    15. SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training (https://arxiv.org/abs/2505.11594) — Jintao Zhang et al., 2025 https://scholar.google.com/scholar?q=SageAttention3:+Microscaling+FP4+Attention+for+Inference+and+An+Exploration+of+8-Bit+Training+(https://arxiv.org/abs/2505.11594)
    16. AI Post Transformers: Memory-Bound, Not Bandwidth-Limited Batch-1 LLM Decode — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-02-memory-bound-not-bandwidth-limited-batch-114799.mp3
    17. AI Post Transformers: LPU Chip for Low-Latency LLM Inference — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-20-lpu-chip-for-low-latency-llm-inference-be13c3.mp3
    18. AI Post Transformers: Deep Kernel Fusion for Transformer Decoding — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-15-deep-kernel-fusion-for-transformer-decod-b1a703.mp3
    19. AI Post Transformers: FlatAttention for Tile-Based Accelerator Inference — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-04-flatattention-for-tile-based-accelerator-56e6ca.mp3

    Interactive Visualization: ACE: Matrix-Native AI Extensions for x86
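
    The outer-product updates mentioned in the ACE episode can be illustrated in plain Python. The sketch below is not ACE code and models no real instruction, tile shape, or data type; `matmul_outer_product` is a hypothetical name, and the nested list `tile` merely plays the role of a 2-D accumulator that stays resident while one column of A and one row of B stream past per step.

```python
def matmul_outer_product(a, b):
    """Compute C = A @ B by accumulating one rank-1 outer product per step of k."""
    m, k_dim, n = len(a), len(b), len(b[0])
    # Accumulator standing in for a 2-D tile register (illustrative only).
    tile = [[0] * n for _ in range(m)]
    for k in range(k_dim):
        col = [a[i][k] for i in range(m)]  # k-th column of A
        row = b[k]                         # k-th row of B
        for i in range(m):
            for j in range(n):
                tile[i][j] += col[i] * row[j]  # rank-1 update of the tile
    return tile


def matmul_dot_product(a, b):
    """Reference: the usual row-times-column formulation of the same product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]
```

    Each step loads m + n values and performs m * n multiply-accumulates, which is the data-reuse argument behind outer-product formulations of GEMM. It says nothing about packing, memory traffic, or tail handling, the full-kernel costs the episode argues decide real speed.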

  6. 2 days ago

    Dafny for Trustworthy AI Code Generation

    This episode explores a 2025 paper on using Dafny as a hidden, verification-aware intermediate language for AI code generation: a model first produces a formal specification and a verified implementation, which is then compiled into ordinary Python. It examines the paper’s central trust claim: formal verification can prove that the generated code satisfies the hidden spec, but it cannot prove that the hidden spec matches the user’s intent, which makes spec alignment a separate and critical failure point. The discussion uses the paper’s fibfib example and HumanEval results to unpack that distinction, noting that the Dafny-only pipeline trails direct Python generation and that the best reported score comes only after falling back to unverified Python when the verification loop fails to converge. The episode gives a concrete, nuanced look at where AI coding assistants can become more reliable, where the guarantees stop, and why neuro-symbolic workflows may matter most for tightly specified code such as algorithms, parsers, and protocol logic.

    Sources:
    1. Dafny as Verification-Aware Intermediate Language for Code Generation — Yue Chen Li, Stefan Zetzsche, Siva Somayyajula, 2025 http://arxiv.org/abs/2501.06283
    2. Dafny: An Automatic Program Verifier for Functional Correctness — K. Rustan M. Leino, 2010 https://scholar.google.com/scholar?q=Dafny:+An+Automatic+Program+Verifier+for+Functional+Correctness
    3. Towards AI-Assisted Synthesis of Verified Dafny Methods — Md Rakib Hossain Misu, Cristina V. Lopes, Iris Ma, James Noble, 2024 https://scholar.google.com/scholar?q=Towards+AI-Assisted+Synthesis+of+Verified+Dafny+Methods
    4. Clover: Closed-Loop Verifiable Code Generation — Chuyue Sun, Ying Sheng, Oded Padon, Clark Barrett, 2024 https://scholar.google.com/scholar?q=Clover:+Closed-Loop+Verifiable+Code+Generation
    5. VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search — David Brandfonbrener, Simon Henniger, Sibi Raja, Tarun Prasad, Chloe Loughridge, Federico Cassano, Sabrina Ruixin Hu, Jianang Yang, William E. Byrd, Robert Zinkov, Nada Amin, 2024 https://scholar.google.com/scholar?q=VerMCTS:+Synthesizing+Multi-Step+Programs+using+a+Verifier,+a+Large+Language+Model,+and+Tree+Search
    6. Laurel: Generating Dafny Assertions Using Large Language Models — Eric Mugnier, Emmanuel Anaya Gonzalez, Ranjit Jhala, Nadia Polikarpova, Yuanyuan Zhou, 2024 https://scholar.google.com/scholar?q=Laurel:+Generating+Dafny+Assertions+Using+Large+Language+Models
    7. DafnyBench: A Benchmark for Formal Software Verification — Chloe Loughridge et al., 2024 https://scholar.google.com/scholar?q=DafnyBench:+A+Benchmark+for+Formal+Software+Verification
    8. Evaluating Large Language Models Trained on Code — Mark Chen et al., 2021 https://scholar.google.com/scholar?q=Evaluating+Large+Language+Models+Trained+on+Code
    9. Baking for Dafny: A CakeML Backend for Dafny — Daniel Nezamabadi, Magnus Myreen, 2025 https://scholar.google.com/scholar?q=Baking+for+Dafny:+A+CakeML+Backend+for+Dafny
    10. Intent-aligned Formal Specification Synthesis via Traceable Refinement — Zhe Ye et al., 2026 https://scholar.google.com/scholar?q=Intent-aligned+Formal+Specification+Synthesis+via+Traceable+Refinement
    11. Combining LLM Code Generation with Formal Specifications and Reactive Program Synthesis — William Murphy et al., 2024 https://scholar.google.com/scholar?q=Combining+LLM+Code+Generation+with+Formal+Specifications+and+Reactive+Program+Synthesis
    12. StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback — Shihan Dou et al., 2024 https://scholar.google.com/scholar?q=StepCoder:+Improve+Code+Generation+with+Reinforcement+Learning+from+Compiler+Feedback
    13. InspectCoder: Dynamic Analysis-Enabled Self Repair through interactive LLM-Debugger Collaboration — Yunkun Wang et al., 2025 https://scholar.google.com/scholar?q=InspectCoder:+Dynamic+Analysis-Enabled+Self+Repair+through+interactive+LLM-Debugger+Collaboration
    14. FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs — Madhurima Chakraborty et al., 2025 https://scholar.google.com/scholar?q=FormalSpecCpp:+A+Dataset+of+C+++Formal+Specifications+created+using+LLMs
    15. AI Post Transformers: From Natural Language to Verified Dafny Code — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-14-from-natural-language-to-verified-dafny-8abed9.mp3
    16. AI Post Transformers: Program Synthesis with Large Language Models — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-20-program-synthesis-with-large-language-mo-b962ec.mp3
    17. AI Post Transformers: Generative File Systems: Replacing Code with Formal Specifications — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-18-generative-file-systems-replacing-code-w-414029.mp3
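
    The verify-then-fall-back flow described in this episode can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the four callables (`draft_dafny`, `verify`, `compile_to_python`, `draft_python`), the `Result` type, and the `max_rounds` limit are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    code: str       # Python source handed back to the user
    verified: bool  # True only if the verifier accepted the hidden Dafny program


def generate_with_fallback(
    prompt: str,
    draft_dafny: Callable[[str, Optional[str]], str],  # hypothetical: (prompt, verifier feedback) -> Dafny source
    verify: Callable[[str], Optional[str]],            # hypothetical: None on success, else an error message
    compile_to_python: Callable[[str], str],           # hypothetical: verified Dafny -> Python source
    draft_python: Callable[[str], str],                # hypothetical: direct, unverified Python generation
    max_rounds: int = 3,                               # assumed retry budget
) -> Result:
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        # Ask for a spec plus implementation, passing back the last verifier error.
        dafny_src = draft_dafny(prompt, feedback)
        feedback = verify(dafny_src)
        if feedback is None:
            # Code provably meets the hidden spec; intent alignment is still unchecked.
            return Result(compile_to_python(dafny_src), verified=True)
    # The loop did not converge: return unverified Python and say so.
    return Result(draft_python(prompt), verified=False)
```

    Carrying the `verified` flag in the result keeps the two outcomes distinguishable, which matters because the fallback path offers none of the guarantees of the verified path.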

  7. 3 days ago

    Can LLMs Enable Mainstream Formal Verification?

    This episode explores whether large language models can help mainstream developers write code that is not just plausible, but formally verified in systems like Dafny, Nagini, and Verus. It explains the core ideas behind machine-checked correctness, including contracts, SMT solvers, and loop invariants, and why verification is far stricter than passing tests or matching statistical behavior. The discussion highlights the paper’s main argument that the real bottleneck is not only generating implementations, but also generating the formal scaffolding and annotations that proofs require, especially around loops. Listeners get a clear view of how the authors evaluate LLMs with verifier feedback and extra validation to catch weakened specifications, which speaks directly to whether AI can close the trust gap in code generation.

    Sources:
    1. Can LLMs Enable Verification in Mainstream Programming? — Aleksandr Shefer, Igor Engel, Stanislav Alekseev, Daniil Berezun, Ekaterina Verbitskaia, Anton Podkopaev, 2025 http://arxiv.org/abs/2503.14183
    2. Inferring Loop Invariants using Postconditions — Carlo A. Furia, Bertrand Meyer, 2009 https://scholar.google.com/scholar?q=Inferring+Loop+Invariants+using+Postconditions
    3. Inferring Loop Invariants by Mutation, Dynamic Analysis, and Static Checking — Juan P. Galeotti, Carlo A. Furia, Eva May, Gordon Fraser, Andreas Zeller, 2014 https://scholar.google.com/scholar?q=Inferring+Loop+Invariants+by+Mutation,+Dynamic+Analysis,+and+Static+Checking
    4. LoopInvGen: A Loop Invariant Generator based on Precondition Inference — Saswat Padhi, Rahul Sharma, Todd Millstein, 2017 https://scholar.google.com/scholar?q=LoopInvGen:+A+Loop+Invariant+Generator+based+on+Precondition+Inference
    5. On Scaling Data-Driven Loop Invariant Inference — Sahil Bhatia, Saswat Padhi, Nagarajan Natarajan, Rahul Sharma, Prateek Jain, 2019 https://scholar.google.com/scholar?q=On+Scaling+Data-Driven+Loop+Invariant+Inference
    6. Dafny: An Automatic Program Verifier for Functional Correctness — K. Rustan M. Leino, 2010 https://scholar.google.com/scholar?q=Dafny:+An+Automatic+Program+Verifier+for+Functional+Correctness
    7. Nagini: A Static Verifier for Python — Marco Eilers, Peter Müller, 2018 https://scholar.google.com/scholar?q=Nagini:+A+Static+Verifier+for+Python
    8. Verus: Verifying Rust Programs using Linear Ghost Types (extended version) — Andrea Lattuada, Travis Hance, Chanhee Cho, Matthias Brun, Isitha Subasinghe, Yi Zhou, Jon Howell, Bryan Parno, Chris Hawblitzel, 2023 https://scholar.google.com/scholar?q=Verus:+Verifying+Rust+Programs+using+Linear+Ghost+Types+(extended+version)
    9. Can LLMs Enable Verification in Mainstream Programming? — Aleksandr Shefer, Igor Engel, Stanislav Alekseev, Daniil Berezun, Ekaterina Verbitskaia, Anton Podkopaev, 2025 https://scholar.google.com/scholar?q=Can+LLMs+Enable+Verification+in+Mainstream+Programming?
    10. Clover: Closed-loop verifiable code generation — Chuyue Sun, Ying Sheng, Oded Padon, Clark Barrett, 2024 https://scholar.google.com/scholar?q=Clover:+Closed-loop+verifiable+code+generation
    11. AlphaVerus: Bootstrapping formally verified code generation through self-improving translation and treefinement — Pranjal Aggarwal, Bryan Parno, Sean Welleck, 2024 https://scholar.google.com/scholar?q=Alphaverus:+Bootstrapping+formally+verified+code+generation+through+self-improving+translation+and+treefinement
    12. Can large language models transform natural language intent into formal method postconditions? — Madeline Endres, Sarah Fakhoury, Saikat Chakraborty, Shuvendu K. Lahiri, 2024 https://scholar.google.com/scholar?q=Can+large+language+models+transform+natural+language+intent+into+formal+method+postconditions?
    13. Laurel: Generating Dafny assertions using large language models — Eric Mugnier, Emmanuel Anaya Gonzalez, Ranjit Jhala, Nadia Polikarpova, Yuanyuan Zhou, 2024 https://scholar.google.com/scholar?q=Laurel:+Generating+Dafny+assertions+using+large+language+models
    14. Finding Inductive Loop Invariants using Large Language Models — Adharsh Kamath, Aditya Senthilnathan, Saikat Chakraborty, Pantazis Deligiannis, Shuvendu K. Lahiri, Akash Lal, Aseem Rastogi, Subhajit Roy, Rahul Sharma, 2023 https://scholar.google.com/scholar?q=Finding+Inductive+Loop+Invariants+using+Large+Language+Models
    15. Towards Formal Verification of LLM-Generated Code from Natural Language Prompts — Aaron Councilman et al., 2025 https://arxiv.org/abs/2507.13290
    16. Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification — Cheng Wen et al., 2024 https://arxiv.org/abs/2404.00762
    17. SpecGen: Automated Generation of Formal Program Specifications via Large Language Models — Lezhi Ma et al., 2024 https://arxiv.org/abs/2401.08807
    18. Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification? — Cedric Richter and Heike Wehrheim, 2025 https://arxiv.org/abs/2510.12702
    19. Guiding LLM-based Loop Invariant Synthesis via Feedback on Local Reasoning Errors — Tianchi Li et al., 2026 https://arxiv.org/abs/2605.17914
    20. Loop Invariant Generation: A Hybrid Framework of Reasoning optimised LLMs and SMT Solvers — Varun Bharti et al., 2025 https://arxiv.org/abs/2508.00419
    21. LLM For Loop Invariant Generation and Fixing: How Far Are We? — Mostafijur Rahman Akhond et al., 2025 https://arxiv.org/abs/2511.06552
    22. AI Post Transformers: From Natural Language to Verified Dafny Code — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-06-14-from-natural-language-to-verified-dafny-8abed9.mp3
    23. AI Post Transformers: Generative File Systems: Replacing Code with Formal Specifications — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-03-18-generative-file-systems-replacing-code-w-414029.mp3
    24. AI Post Transformers: Trajectory Summaries for Long-Horizon Coding Agents — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-24-trajectory-summaries-for-long-horizon-co-0194be.mp3

    Interactive Visualization: Can LLMs Enable Mainstream Formal Verification?
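
    For readers unfamiliar with the annotations this episode refers to, here is a small Python sketch of a precondition, a loop invariant, and a postcondition. It is an illustration only: the function `sum_to` is a made-up example, and the runtime `assert` statements merely check the inputs that are actually executed, whereas a verifier such as Dafny would have to prove the same facts for every input.

```python
def sum_to(n: int) -> int:
    """Return 1 + 2 + ... + n, with the proof obligations written as asserts."""
    assert n >= 0  # precondition (roughly a `requires` clause)

    total, i = 0, 0
    while i < n:
        # Loop invariant: true before every iteration. This is the annotation
        # a verifier cannot guess on its own and a model has to supply.
        assert 0 <= i <= n and total == i * (i + 1) // 2
        i += 1
        total += i

    # Postcondition (roughly an `ensures` clause): follows from the invariant
    # plus the loop exit condition i == n.
    assert total == n * (n + 1) // 2
    return total
```

    The invariant is the piece that connects the loop body to the postcondition; without it, the final claim cannot be derived from the loop alone.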

  8. 3 days ago

    From Natural Language to Verified Dafny Code

    This episode explores a 2026 study on turning long natural-language programming problems into Dafny code that can be formally verified, asking whether AI systems can produce code that is not just fluent but provably correct. It explains how Dafny uses preconditions, postconditions, loop invariants, and proof obligations, and why weak specifications can lead to vacuous “verified” programs that still fail to capture the real task. The discussion highlights the paper’s NL2VC-60 benchmark of hand-written verified solutions to UVa-style algorithm problems, along with experiments comparing plain prompting, signature-guided prompting, and self-healing loops that revise code using verifier feedback and additional uDebug testing. The episode gets at the core trust problem in AI coding: whether formal methods can make generated software more reliable, and why the real bottleneck remains the human effort required to write strong specifications.

    Sources:
    1. From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification — Md Erfan, Md Kamal Hossain Chowdhury, Ahmed Ryan, Md Rayhanur Rahman, 2026 http://arxiv.org/abs/2604.22601
    2. Dafny: An Automatic Program Verifier for Functional Correctness — K. Rustan M. Leino, 2010 https://scholar.google.com/scholar?q=Dafny:+An+Automatic+Program+Verifier+for+Functional+Correctness
    3. seL4: Formal Verification of an Operating-System Kernel — Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, et al., 2009 https://scholar.google.com/scholar?q=seL4:+Formal+Verification+of+an+Operating-System+Kernel
    4. Formal verification of a realistic compiler — Xavier Leroy, 2009 https://scholar.google.com/scholar?q=Formal+verification+of+a+realistic+compiler
    5. Modularity, Code Specialization, and Zero-Cost Abstractions for Program Verification — Son Ho, Aymeric Fromherz, Jonathan Protzenko, 2021 https://scholar.google.com/scholar?q=Modularity,+Code+Specialization,+and+Zero-Cost+Abstractions+for+Program+Verification
    6. Towards AI-Assisted Synthesis of Verified Dafny Methods — Md Rakib Hossain Misu, Cristina V. Lopes, Iris Ma, James Noble, 2024 https://scholar.google.com/scholar?q=Towards+AI-Assisted+Synthesis+of+Verified+Dafny+Methods
    7. DafnyBench: A Benchmark for Formal Software Verification — Chloe Loughridge et al., 2024 https://scholar.google.com/scholar?q=DafnyBench:+A+Benchmark+for+Formal+Software+Verification
    8. Can LLMs Enable Verification in Mainstream Programming? — Aleksandr Shefer, Igor Engel, Stanislav Alekseev, Daniil Berezun, Ekaterina Verbitskaia, Anton Podkopaev, 2025 https://scholar.google.com/scholar?q=Can+LLMs+Enable+Verification+in+Mainstream+Programming?
    9. Dafny as Verification-Aware Intermediate Language for Code Generation — Yue Chen Li, Stefan Zetzsche, Siva Somayyajula, 2025 https://scholar.google.com/scholar?q=Dafny+as+Verification-Aware+Intermediate+Language+for+Code+Generation
    10. ATLAS: Automated Toolkit for Large-Scale Verified Code Synthesis — Mantas Baksys et al., 2025 https://scholar.google.com/scholar?q=ATLAS:+Automated+Toolkit+for+Large-Scale+Verified+Code+Synthesis
    11. DafnyPro: LLM-Assisted Automated Verification for Dafny Programs — Debangshu Banerjee, Olivier Bouissou, Stefan Zetzsche, 2026 https://scholar.google.com/scholar?q=DafnyPro:+LLM-Assisted+Automated+Verification+for+Dafny+Programs
    12. Neuro Symbolic Reasoning for Planning: Counterexample Guided Inductive Synthesis using Large Language Models and Satisfiability Solving — Sumit Kumar Jha et al., 2023 https://scholar.google.com/scholar?q=Neuro+Symbolic+Reasoning+for+Planning:+Counterexample+Guided+Inductive+Synthesis+using+Large+Language+Models+and+Satisfiability+Solving
    13. Property-Guided LLM Program Synthesis for Planning — André G. Pereira, Augusto B. Corrêa, Jendrik Seipp, 2026 https://scholar.google.com/scholar?q=Property-Guided+LLM+Program+Synthesis+for+Planning
    14. Finding Inductive Loop Invariants using Large Language Models — Adharsh Kamath et al., 2023 https://scholar.google.com/scholar?q=Finding+Inductive+Loop+Invariants+using+Large+Language+Models
    15. LLM For Loop Invariant Generation and Fixing: How Far Are We? — Mostafijur Rahman Akhond, Saikat Chakraborty, Gias Uddin, 2025 https://scholar.google.com/scholar?q=LLM+For+Loop+Invariant+Generation+and+Fixing:+How+Far+Are+We?
    16. Type-Constrained Code Generation with Language Models — Niels Mündler et al., 2025 https://scholar.google.com/scholar?q=Type-Constrained+Code+Generation+with+Language+Models
    17. Invariant-based Program Repair — Omar I. Al-Bataineh, 2024 https://scholar.google.com/scholar?q=Invariant-based+Program+Repair
    18. AI Post Transformers: Program Synthesis with Large Language Models — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-20-program-synthesis-with-large-language-mo-b962ec.mp3
    19. AI Post Transformers: SGLang for Faster Structured LLM Programs — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-05-06-sglang-for-faster-structured-llm-program-c59f1c.mp3
    20. AI Post Transformers: SkillsBench for Evaluating Agent Skills — Hal Turing & Dr. Ada Shannon, 2026 https://podcast.do-not-panic.com/episodes/2026-04-14-skillsbench-for-evaluating-agent-skills-58bb1e.mp3
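
    The point about vacuous "verified" programs can be made concrete with a toy example. The sketch below is not taken from the paper: the "maximum of a non-empty list" task, the function names, and the sample inputs are all assumptions made for illustration, and checking a handful of samples at runtime is only a stand-in for the static, all-inputs proof that Dafny performs.

```python
from typing import Callable, List

Post = Callable[[List[int], int], bool]


def weak_post(xs: List[int], r: int) -> bool:
    # Weak spec: the result merely has to be some element of the input.
    return r in xs


def strong_post(xs: List[int], r: int) -> bool:
    # Strong spec: the result is an element AND no element exceeds it.
    return r in xs and all(r >= x for x in xs)


def wrong_max(xs: List[int]) -> int:
    # Deliberately wrong "maximum": just returns the first element.
    return xs[0]


def satisfies(impl: Callable[[List[int]], int], post: Post,
              inputs: List[List[int]]) -> bool:
    # Stand-in for verification: check the postcondition on sample inputs only.
    return all(post(xs, impl(xs)) for xs in inputs)


SAMPLES = [[3, 1, 2], [1, 5, 4], [7]]  # non-empty lists (the assumed precondition)
```

    Here `satisfies(wrong_max, weak_post, SAMPLES)` holds even though `wrong_max` does not compute a maximum, while `satisfies(wrong_max, strong_post, SAMPLES)` fails on `[1, 5, 4]`. A program that passes only against the weak specification is "verified" in name but says nothing about the real task.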

