Machine Learning Street Talk (MLST)

Welcome! We engage in fascinating discussions with pre-eminent figures in the AI field. Our flagship show covers current affairs in AI, cognitive science, neuroscience and philosophy of mind with in-depth analysis. Our approach is unrivalled in terms of scope and rigour – we believe in intellectual diversity in AI, and we touch on all of the main ideas in the field with the hype surgically removed. MLST is run by Tim Scarfe, Ph.D. (https://www.linkedin.com/in/ecsquizor/) and features regular appearances from Dr. Keith Duggar, who holds a PhD from MIT (https://www.linkedin.com/in/dr-keith-duggar/).

  1. Sakana AI - Chris Lu, Robert Tjarko Lange, Cong Lu

    4 days ago

    We speak with Sakana AI, who are building nature-inspired methods that could fundamentally transform how we develop AI systems. The guests include Chris Lu, a researcher who recently completed his DPhil at Oxford University under Prof. Jakob Foerster's supervision, where he focused on meta-learning and multi-agent systems. Chris is the first author of the DiscoPOP paper, which demonstrates how language models can discover and design better training algorithms. Also joining is Robert Tjarko Lange, a founding member of Sakana AI who specializes in evolutionary algorithms and large language models. Robert leads research at the intersection of evolutionary computation and foundation models, and is completing his PhD at TU Berlin on evolutionary meta-learning. The discussion also features Cong Lu, currently a Research Scientist on Google DeepMind's Open-Endedness team, who previously helped develop The AI Scientist and Intelligent Go-Explore.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super-fast DeepSeek R1 hosting! https://centml.ai/pricing/
    Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier and focussed on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    * DiscoPOP - A framework where language models discover their own optimization algorithms
    * EvoLLM - Using language models as evolution strategies for optimization
    * The AI Scientist - A fully automated system that conducts scientific research end-to-end
    * Neural Attention Memory Models (NAMMs) - Evolved memory systems that make transformers both faster and more accurate

    TRANSCRIPT + REFS: https://www.dropbox.com/scl/fi/gflcyvnujp8cl7zlv3v9d/Sakana.pdf?rlkey=woaoo82943170jd4yyi2he71c&dl=0
    Robert Tjarko Lange https://roberttlange.com/
    Chris Lu https://chrislu.page/
    Cong Lu https://www.conglu.co.uk/
    Sakana https://sakana.ai/blog/

    TOC:
    1. LLMs for Algorithm Generation and Optimization
    [00:00:00] 1.1 LLMs generating algorithms for training other LLMs
    [00:04:00] 1.2 Evolutionary black-box optimization using neural network loss parameterization
    [00:11:50] 1.3 DiscoPOP: Non-convex loss function for noisy data
    [00:20:45] 1.4 External entropy injection for preventing model collapse
    [00:26:25] 1.5 LLMs for black-box optimization using abstract numerical sequences
    2. Model Learning and Generalization
    [00:31:05] 2.1 Fine-tuning on teacher algorithm trajectories
    [00:31:30] 2.2 Transformers learning gradient descent
    [00:33:00] 2.3 LLM tokenization biases towards specific numbers
    [00:34:50] 2.4 LLMs as evolution strategies for black-box optimization
    [00:38:05] 2.5 DiscoPOP: LLMs discovering novel optimization algorithms
    3. AI Agents and System Architectures
    [00:51:30] 3.1 ARC challenge: Induction vs. transformer approaches
    [00:54:35] 3.2 LangChain / modular agent components
    [00:57:50] 3.3 Debate improves LLM truthfulness
    [01:00:55] 3.4 Time limits controlling AI agent systems
    [01:03:00] 3.5 Gemini: Million-token context enables flatter hierarchies
    [01:04:05] 3.6 Agents follow own interest gradients
    [01:09:50] 3.7 Go-Explore algorithm: archive-based exploration
    [01:11:05] 3.8 Foundation models for interesting state discovery
    [01:13:00] 3.9 LLMs leverage prior game knowledge
    4. AI for Scientific Discovery and Human Alignment
    [01:17:45] 4.1 Encoding Alignment & Aesthetics via Reward Functions
    [01:20:00] 4.2 AI Scientist: Automated Open-Ended Scientific Discovery
    [01:24:15] 4.3 DiscoPOP: LLM for Preference Optimization Algorithms
    [01:28:30] 4.4 Balancing AI Knowledge with Human Understanding
    [01:33:55] 4.5 AI-Driven Conferences and Paper Review

    1h38min
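    The Sakana AI episode mentions EvoLLM, which uses a language model as an evolution strategy for black-box optimization. For reference, here is a minimal sketch of an ordinary evolution strategy (Gaussian sampling around a mean, truncation selection, averaging of the elites). This is not EvoLLM: as we understand it, EvoLLM delegates the update step to an LLM, whereas here plain elite averaging stands in for it. The function name `simple_es` and all hyperparameters are illustrative assumptions.

```python
import random


def simple_es(fitness, dim, generations=200, popsize=20, elite=5, sigma=0.3, seed=0):
    """Minimise `fitness` with a basic (mu, lambda) evolution strategy.

    Illustrative sketch only: in EvoLLM the update step below would be
    handled by a language model; here the elites are simply averaged.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    for _ in range(generations):
        # Sample a population of candidates around the current mean.
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(popsize)]
        # Truncation selection: keep the `elite` lowest-fitness candidates.
        population.sort(key=fitness)
        elites = population[:elite]
        # Recombination: the new mean is the average of the elites.
        mean = [sum(ind[j] for ind in elites) / elite for j in range(dim)]
    return mean


def shifted_sphere(x, target=(1.0, -2.0)):
    """Toy black-box objective with its minimum at `target`."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


if __name__ == "__main__":
    print(simple_es(shifted_sphere, dim=2))
```

    The optimizer only ever calls `fitness` on candidate points and never asks for a gradient, which is what makes this family of methods suitable for black-box problems.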
  2. Clement Bonnet - Can Latent Program Networks Solve Abstract Reasoning?

    19 Feb

    Clement Bonnet discusses his novel approach to the ARC (Abstraction and Reasoning Corpus) challenge. Unlike approaches that rely on fine-tuning LLMs or generating samples at inference time, Clement's method encodes input-output pairs into a latent space, optimizes this representation with a search algorithm, and decodes outputs for new inputs. This end-to-end architecture is trained with a VAE loss that combines a reconstruction loss and a prior loss.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super-fast DeepSeek R1 hosting! https://centml.ai/pricing/
    Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier and focussed on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    TRANSCRIPT + RESEARCH OVERVIEW: https://www.dropbox.com/scl/fi/j7m0gaz1126y594gswtma/CLEMMLST.pdf?rlkey=y5qvwq2er5nchbcibm07rcfpq&dl=0
    Clem and Matthew: https://www.linkedin.com/in/clement-bonnet16/ https://github.com/clement-bonnet https://mvmacfarlane.github.io/

    TOC:
    1. LPN Fundamentals
    [00:00:00] 1.1 Introduction to ARC Benchmark and LPN Overview
    [00:05:05] 1.2 Neural Networks' Challenges with ARC and Program Synthesis
    [00:06:55] 1.3 Induction vs Transduction in Machine Learning
    2. LPN Architecture and Latent Space
    [00:11:50] 2.1 LPN Architecture and Latent Space Implementation
    [00:16:25] 2.2 LPN Latent Space Encoding and VAE Architecture
    [00:20:25] 2.3 Gradient-Based Search Training Strategy
    [00:23:39] 2.4 LPN Model Architecture and Implementation Details
    3. Implementation and Scaling
    [00:27:34] 3.1 Training Data Generation and re-ARC Framework
    [00:31:28] 3.2 Limitations of Latent Space and Multi-Thread Search
    [00:34:43] 3.3 Program Composition and Computational Graph Architecture
    4. Advanced Concepts and Future Directions
    [00:45:09] 4.1 AI Creativity and Program Synthesis Approaches
    [00:49:47] 4.2 Scaling and Interpretability in Latent Space Models

    REFS:
    [00:00:05] ARC benchmark, Chollet https://arxiv.org/abs/2412.04604
    [00:02:10] Latent Program Spaces, Bonnet, Macfarlane https://arxiv.org/abs/2411.08706
    [00:07:45] Kevin Ellis's work on program generation https://www.cs.cornell.edu/~ellisk/
    [00:08:45] Induction vs transduction in abstract reasoning, Li et al. https://arxiv.org/abs/2411.02272
    [00:17:40] VAEs, Kingma, Welling https://arxiv.org/abs/1312.6114
    [00:27:50] re-ARC, Hodel https://github.com/michaelhodel/re-arc
    [00:29:40] Grid size in ARC tasks, Chollet https://github.com/fchollet/ARC-AGI
    [00:33:00] Critique of deep learning, Marcus https://arxiv.org/vc/arxiv/papers/2002/2002.06177v1.pdf

    51min
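    The LPN episode describes encoding input-output pairs into a latent space, refining that latent with a search algorithm, and training end to end with a VAE loss (reconstruction plus prior). The sketch below illustrates two of those ingredients on a deliberately tiny, hypothetical problem: the closed-form KL divergence between a diagonal Gaussian and a standard normal (the usual VAE prior term), and gradient descent on a one-dimensional latent `z` for a toy "decoder" y = z * x. The decoder and all function names are our assumptions, not the LPN architecture.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ): the VAE prior term."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def decode(z, x):
    """Hypothetical toy decoder: the latent 'program' z scales the input."""
    return z * x


def latent_search(pairs, z=0.0, lr=0.01, steps=500):
    """Gradient descent on z to minimise mean squared reconstruction error."""
    for _ in range(steps):
        # d/dz of mean((z*x - y)^2) is mean(2 * (z*x - y) * x).
        grad = sum(2.0 * (decode(z, x) - y) * x for x, y in pairs) / len(pairs)
        z -= lr * grad
    return z


if __name__ == "__main__":
    demo_pairs = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # examples of "multiply by 3"
    z_star = latent_search(demo_pairs)
    print(round(z_star, 4), decode(z_star, 4.0))  # apply the found latent to a new input
    print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))    # 0.0: posterior equals the prior
```

    In LPN, as described, the latent is higher-dimensional and the decoder is a neural network, but the principle is the same: candidate latents are scored by how well they reconstruct the given example pairs, and the best one is applied to the new input.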
  3. Prof. Jakob Foerster - ImageNet Moment for Reinforcement Learning?

    18 Feb

    Prof. Jakob Foerster, a leading AI researcher at Oxford University and Meta, and Chris Lu, a researcher at OpenAI, explain how AI is moving beyond mimicking human behaviour towards truly intelligent agents that can learn and solve problems on their own. Foerster champions open-source AI for responsible, decentralised development. He addresses AI scaling, goal misalignment (Goodhart's Law), and the need for holistic alignment, offering a quick look at the future of AI and how to guide it.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super-fast DeepSeek R1 hosting! https://centml.ai/pricing/
    Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier and focussed on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    TRANSCRIPT/REFS: https://www.dropbox.com/scl/fi/yqjszhntfr00bhjh6t565/JAKOB.pdf?rlkey=scvny4bnwj8th42fjv8zsfu2y&dl=0
    Prof. Jakob Foerster https://x.com/j_foerst https://www.jakobfoerster.com/
    University of Oxford profile: https://eng.ox.ac.uk/people/jakob-foerster/
    Chris Lu: https://chrislu.page/

    TOC:
    1. GPU Acceleration and Training Infrastructure
    [00:00:00] 1.1 ARC Challenge Criticism and FLAIR Lab Overview
    [00:01:25] 1.2 GPU Acceleration and Hardware Lottery in RL
    [00:05:50] 1.3 Data Wall Challenges and Simulation-Based Solutions
    [00:08:40] 1.4 JAX Implementation and Technical Acceleration
    2. Learning Frameworks and Policy Optimization
    [00:14:18] 2.1 Evolution of RL Algorithms and Mirror Learning Framework
    [00:15:25] 2.2 Meta-Learning and Policy Optimization Algorithms
    [00:21:47] 2.3 Language Models and Benchmark Challenges
    [00:28:15] 2.4 Creativity and Meta-Learning in AI Systems
    3. Multi-Agent Systems and Decentralization
    [00:31:24] 3.1 Multi-Agent Systems and Emergent Intelligence
    [00:38:35] 3.2 Swarm Intelligence vs Monolithic AGI Systems
    [00:42:44] 3.3 Democratic Control and Decentralization of AI Development
    [00:46:14] 3.4 Open Source AI and Alignment Challenges
    [00:49:31] 3.5 Collaborative Models for AI Development

    REFS:
    [00:00:05] ARC Benchmark, Chollet https://github.com/fchollet/ARC-AGI
    [00:03:05] Deep RL Doesn't Work Yet, Irpan https://www.alexirpan.com/2018/02/14/rl-hard.html
    [00:05:55] AI Training Data, Data Provenance Initiative https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html
    [00:06:10] JaxMARL, Foerster et al. https://arxiv.org/html/2311.10090v5
    [00:08:50] M-FOS, Lu et al. https://arxiv.org/abs/2205.01447
    [00:09:45] JAX Library, Google Research https://github.com/jax-ml/jax
    [00:12:10] Kinetix, Matthews, Beukman et al. https://arxiv.org/abs/2410.23208
    [00:12:45] Genie 2, DeepMind https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/
    [00:14:42] Mirror Learning, Grudzien, Kuba et al. https://arxiv.org/abs/2208.01682
    [00:16:30] Discovered Policy Optimisation, Lu et al. https://arxiv.org/abs/2210.05639
    [00:24:10] Goodhart's Law, Goodhart https://en.wikipedia.org/wiki/Goodhart%27s_law
    [00:25:15] LLM ARChitect, Franzen et al. https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf
    [00:28:55] AlphaZero, Silver et al. https://arxiv.org/pdf/1712.01815.pdf
    [00:30:10] Meta-learning, Lu, Towers, Foerster https://direct.mit.edu/isal/proceedings-pdf/isal2023/35/67/2354943/isal_a_00674.pdf
    [00:31:30] Emergence of Pragmatics, Yuan et al. https://arxiv.org/abs/2001.07752
    [00:34:30] AI Safety, Amodei et al. https://arxiv.org/abs/1606.06565
    [00:35:45] Intentional Stance, Dennett https://plato.stanford.edu/entries/ethics-ai/
    [00:39:25] Multi-Agent RL, Zhou et al. https://arxiv.org/pdf/2305.10091
    [00:41:00] Open Source Generative AI, Foerster et al. https://arxiv.org/abs/2405.08597

    54min
  4. Daniel Franzen & Jan Disselhoff - ARC Prize 2024 winners

    12 Feb

    Daniel Franzen and Jan Disselhoff, the "ARChitects", are the official winners of the ARC Prize 2024. In this conversation, filmed at Tufa Labs in Zurich, they reveal how they achieved a remarkable 53.5% accuracy by using large language models (LLMs) in creative new ways. Their techniques include depth-first search for token selection, test-time training, and a novel augmentation-based validation system. Their results were extremely surprising.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super-fast DeepSeek R1 hosting! https://centml.ai/pricing/
    Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier and focussed on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    Jan Disselhoff https://www.linkedin.com/in/jan-disselhoff-1423a2240/
    Daniel Franzen https://github.com/da-fr
    ARC Prize: http://arcprize.org/
    TRANSCRIPT AND BACKGROUND READING: https://www.dropbox.com/scl/fi/utkn2i1ma79fn6an4yvjw/ARCHitects.pdf?rlkey=67pe38mtss7oyhjk2ad0d2aza&dl=0

    TOC:
    1. Solution Architecture and Strategy Overview
    [00:00:00] 1.1 Initial Solution Overview and Model Architecture
    [00:04:25] 1.2 LLM Capabilities and Dataset Approach
    [00:10:51] 1.3 Test-Time Training and Data Augmentation Strategies
    [00:14:08] 1.4 Sampling Methods and Search Implementation
    [00:17:52] 1.5 ARC vs Language Model Context Comparison
    2. LLM Search and Model Implementation
    [00:21:53] 2.1 LLM-Guided Search Approaches and Solution Validation
    [00:27:04] 2.2 Symmetry Augmentation and Model Architecture
    [00:30:11] 2.3 Model Intelligence Characteristics and Performance
    [00:37:23] 2.4 Tokenization and Numerical Processing Challenges
    3. Advanced Training and Optimization
    [00:45:15] 3.1 DFS Token Selection and Probability Thresholds
    [00:49:41] 3.2 Model Size and Fine-tuning Performance Trade-offs
    [00:53:07] 3.3 LoRA Implementation and Catastrophic Forgetting Prevention
    [00:56:10] 3.4 Training Infrastructure and Optimization Experiments
    [01:02:34] 3.5 Search Tree Analysis and Entropy Distribution Patterns

    REFS:
    [00:01:05] Winning ARC 2024 solution using 12B param model, Franzen, Disselhoff, Hartmann https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf
    [00:03:40] Robustness of analogical reasoning in LLMs, Melanie Mitchell https://arxiv.org/html/2411.14215
    [00:07:50] Re-ARC dataset generator for ARC task variations, Michael Hodel https://github.com/michaelhodel/re-arc
    [00:15:00] Analysis of search methods in LLMs (greedy, beam, DFS), Chen et al. https://arxiv.org/html/2408.00724v2
    [00:16:55] Language model reachability space exploration, University of Toronto https://www.youtube.com/watch?v=Bpgloy1dDn0
    [00:22:30] GPT-4 guided code solutions for ARC tasks, Ryan Greenblatt https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
    [00:41:20] GPT tokenization approach for numbers, OpenAI https://platform.openai.com/docs/guides/text-generation/tokenizer-examples
    [00:46:25] DFS in AI search strategies, Russell & Norvig https://www.amazon.com/Artificial-Intelligence-Modern-Approach-4th/dp/0134610997
    [00:53:10] Paper on catastrophic forgetting in neural networks, Kirkpatrick et al. https://www.pnas.org/doi/10.1073/pnas.1611835114
    [00:54:00] LoRA for efficient fine-tuning of LLMs, Hu et al. https://arxiv.org/abs/2106.09685
    [00:57:20] NVIDIA H100 Tensor Core GPU specs, NVIDIA https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/
    [01:04:55] Original MCTS in computer Go, Yifan Jin https://stanford.edu/~rezab/classes/cme323/S15/projects/montecarlo_search_tree_report.pdf

    1h9min
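    The ARChitects episode mentions depth-first search for token selection with probability thresholds. The sketch below shows the general idea under our own assumptions: explore the tree of continuations depth-first and prune any branch whose cumulative probability drops below a threshold, which is safe because extending a sequence can only lower its probability. `next_probs` and `toy_model` are hypothetical stand-ins for a language model's next-token distribution; this is not the team's actual implementation.

```python
def dfs_sample(next_probs, threshold, max_len, eos="<eos>"):
    """Return all completions with total probability >= threshold.

    next_probs(prefix) -> dict mapping each next token to its probability.
    """
    results = []

    def visit(prefix, prob):
        if prefix and prefix[-1] == eos:
            results.append((prefix, prob))
            return
        if len(prefix) >= max_len:
            return  # give up on overly long branches
        for token, p in next_probs(prefix).items():
            q = prob * p
            # Prune: a branch below the threshold can never climb back above it.
            if q >= threshold:
                visit(prefix + (token,), q)

    visit((), 1.0)
    return sorted(results, key=lambda item: item[1], reverse=True)


def toy_model(prefix):
    """Hypothetical next-token distribution over a tiny vocabulary."""
    if len(prefix) == 0:
        return {"a": 0.6, "b": 0.4}
    if len(prefix) == 1:
        return {"<eos>": 0.5, "c": 0.5}
    return {"<eos>": 1.0}


if __name__ == "__main__":
    for seq, prob in dfs_sample(toy_model, threshold=0.25, max_len=4):
        print(seq, round(prob, 3))
```

    Unlike sampling, this returns every candidate above the threshold exactly once, which makes it a natural fit for a later step that re-scores a small set of candidate answers.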
  5. Sepp Hochreiter - LSTM: The Comeback Story?

    12 Feb

    Sepp Hochreiter is the inventor of LSTM (Long Short-Term Memory) networks, a foundational technology in AI. Sepp discusses his journey, the origins of LSTM, and why he believes his latest work, xLSTM, could be the next big thing in AI, particularly for applications like robotics and industrial simulation. He also shares his controversial perspective on Large Language Models (LLMs) and why he sees reasoning as a critical missing piece in current AI systems.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super-fast DeepSeek R1 hosting! https://centml.ai/pricing/
    Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier and focussed on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    TRANSCRIPT AND BACKGROUND READING: https://www.dropbox.com/scl/fi/n1vzm79t3uuss8xyinxzo/SEPPH.pdf?rlkey=fp7gwaopjk17uyvgjxekxrh5v&dl=0
    Prof. Sepp Hochreiter https://www.nx-ai.com/ https://x.com/hochreitersepp https://scholar.google.at/citations?user=tvUH3WMAAAAJ&hl=en

    TOC:
    1. LLM Evolution and Reasoning Capabilities
    [00:00:00] 1.1 LLM Capabilities and Limitations Debate
    [00:03:16] 1.2 Program Generation and Reasoning in AI Systems
    [00:06:30] 1.3 Human vs AI Reasoning Comparison
    [00:09:59] 1.4 New Research Initiatives and Hybrid Approaches
    2. LSTM Technical Architecture
    [00:13:18] 2.1 LSTM Development History and Technical Background
    [00:20:38] 2.2 LSTM vs RNN Architecture and Computational Complexity
    [00:25:10] 2.3 xLSTM Architecture and Flash Attention Comparison
    [00:30:51] 2.4 Evolution of Gating Mechanisms from Sigmoid to Exponential
    3. Industrial Applications and Neuro-Symbolic AI
    [00:40:35] 3.1 Industrial Applications and Fixed Memory Advantages
    [00:42:31] 3.2 Neuro-Symbolic Integration and Pi AI Project
    [00:46:00] 3.3 Integration of Symbolic and Neural AI Approaches
    [00:51:29] 3.4 Evolution of AI Paradigms and System Thinking
    [00:54:55] 3.5 AI Reasoning and Human Intelligence Comparison
    [00:58:12] 3.6 NXAI Company and Industrial AI Applications

    REFS:
    [00:00:15] Seminal LSTM paper establishing Hochreiter's expertise (Hochreiter & Schmidhuber) https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory
    [00:04:20] Kolmogorov complexity and program composition limitations (Kolmogorov) https://link.springer.com/article/10.1007/BF02478259
    [00:07:10] Limitations of LLM mathematical reasoning and symbolic integration (Various Authors) https://www.arxiv.org/pdf/2502.03671
    [00:09:05] AlphaGo’s Move 37 demonstrating creative AI (Google DeepMind) https://deepmind.google/research/breakthroughs/alphago/
    [00:10:15] New AI research lab in Zurich for fundamental LLM research (Benjamin Crouzier) https://tufalabs.ai
    [00:19:40] Introduction of xLSTM with exponential gating (Beck, Hochreiter, et al.) https://arxiv.org/abs/2405.04517
    [00:22:55] FlashAttention: fast & memory-efficient attention (Tri Dao et al.) https://arxiv.org/abs/2205.14135
    [00:31:00] Historical use of sigmoid/tanh activation in the 1990s (James A. McCaffrey) https://visualstudiomagazine.com/articles/2015/06/01/alternative-activation-functions.aspx
    [00:36:10] Mamba 2 state space model architecture (Albert Gu et al.) https://arxiv.org/abs/2312.00752
    [00:46:00] Austria’s Pi AI project integrating symbolic & neural AI (Hochreiter et al.) https://www.jku.at/en/institute-of-machine-learning/research/projects/
    [00:48:10] Neuro-symbolic integration challenges in language models (Diego Calanzone et al.) https://openreview.net/forum?id=7PGluppo4k
    [00:49:30] JKU Linz’s historical and neuro-symbolic research (Sepp Hochreiter) https://www.jku.at/en/news-events/news/detail/news/bilaterale-ki-projekt-unter-leitung-der-jku-erhaelt-fwf-cluster-of-excellence/

    YT: https://www.youtube.com/watch?v=8u2pW2zZLCs

    1h7min
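    The Hochreiter episode covers xLSTM and the move from sigmoid to exponential gating. Below is a scalar sketch of an sLSTM-style update as we understand it from Beck et al. (2024): an exponential input gate, a normalizer state `n` that accumulates gate mass, and a stabilizer state `m` that keeps `exp()` from overflowing without changing the output. Recurrent weights are omitted (the cell consumes pre-activations directly), both gates are exponential here (the paper also allows a sigmoid forget gate), and the function names are our own.

```python
import math


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def slstm_naive(z_pre, i_pre, f_pre, o_pre):
    """Unstabilised scalar cell with exponential input and forget gates."""
    c = n = 0.0
    outputs = []
    for zt, it, ft, ot in zip(z_pre, i_pre, f_pre, o_pre):
        i, f = math.exp(it), math.exp(ft)     # raises OverflowError for large inputs
        c = f * c + i * math.tanh(zt)         # cell state
        n = f * n + i                         # normalizer state
        outputs.append(_sigmoid(ot) * c / n)  # hidden state
    return outputs


def slstm_stabilized(z_pre, i_pre, f_pre, o_pre):
    """Same recurrence, with c and n rescaled by exp(-m) in log space."""
    c = n = m = 0.0
    outputs = []
    for zt, it, ft, ot in zip(z_pre, i_pre, f_pre, o_pre):
        m_new = max(ft + m, it)               # log f = ft, log i = it
        i = math.exp(it - m_new)              # always <= 1, so no overflow
        f = math.exp(ft + m - m_new)
        c = f * c + i * math.tanh(zt)
        n = f * n + i
        m = m_new
        outputs.append(_sigmoid(ot) * c / n)  # the exp(-m) factors cancel here
    return outputs


if __name__ == "__main__":
    args = ([0.5, -1.0, 2.0], [0.1, 1.5, -0.3], [0.2, -0.4, 0.9], [0.0, 1.0, -1.0])
    print(slstm_naive(*args))
    print(slstm_stabilized(*args))
    print(slstm_stabilized([0.5, -1.0], [800.0, 900.0], [1.0, 1.0], [0.0, 0.0]))
```

    Because `c` and `n` are scaled by the same factor, their ratio (and therefore the hidden state) is unchanged; the stabilized version simply keeps working on inputs where the naive one would overflow.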
  6. Want to Understand Neural Networks? Think Elastic Origami! - Prof. Randall Balestriero

    8 Feb

    Professor Randall Balestriero joins us to discuss neural network geometry, spline theory, and emerging phenomena in deep learning, based on research presented at ICML. Topics include the delayed emergence of adversarial robustness in neural networks ("grokking"), geometric interpretations of neural networks via spline theory, and challenges in reconstruction learning. We also cover geometric analysis of Large Language Models (LLMs) for toxicity detection and the relationship between intrinsic dimensionality and model control in RLHF.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/
    Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier and focussed on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Go to https://tufalabs.ai/
    ***

    Randall Balestriero https://x.com/randall_balestr https://randallbalestriero.github.io/
    Show notes and transcript: https://www.dropbox.com/scl/fi/3lufge4upq5gy0ug75j4a/RANDALLSHOW.pdf?rlkey=nbemgpa0jhawt1e86rx7372e4&dl=0

    TOC:
    Introduction
    [00:00:00] Introduction
    Neural Network Geometry and Spline Theory
    [00:01:41] Neural Network Geometry and Spline Theory
    [00:07:41] Deep Networks Always Grok
    [00:11:39] Grokking and Adversarial Robustness
    [00:16:09] Double Descent and Catastrophic Forgetting
    Reconstruction Learning
    [00:18:49] Reconstruction Learning
    [00:24:15] Frequency Bias in Neural Networks
    Geometric Analysis of Neural Networks
    [00:29:02] Geometric Analysis of Neural Networks
    [00:34:41] Adversarial Examples and Region Concentration
    LLM Safety and Geometric Analysis
    [00:40:05] LLM Safety and Geometric Analysis
    [00:46:11] Toxicity Detection in LLMs
    [00:52:24] Intrinsic Dimensionality and Model Control
    [00:58:07] RLHF and High-Dimensional Spaces
    Conclusion
    [01:02:13] Neural Tangent Kernel
    [01:08:07] Conclusion

    REFS:
    [00:01:35] Humayun – Deep network geometry & input space partitioning https://arxiv.org/html/2408.04809v1
    [00:03:55] Balestriero & Paris – Linking deep networks to adaptive spline operators https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
    [00:13:55] Song et al. – Gradient-based white-box adversarial attacks https://arxiv.org/abs/2012.14965
    [00:16:05] Humayun, Balestriero & Baraniuk – Grokking phenomenon & emergent robustness https://arxiv.org/abs/2402.15555
    [00:18:25] Humayun – Training dynamics & double descent via linear region evolution https://arxiv.org/abs/2310.12977
    [00:20:15] Balestriero – Power diagram partitions in DNN decision boundaries https://arxiv.org/abs/1905.08443
    [00:23:00] Frankle & Carbin – Lottery Ticket Hypothesis for network pruning https://arxiv.org/abs/1803.03635
    [00:24:00] Belkin et al. – Double descent phenomenon in modern ML https://arxiv.org/abs/1812.11118
    [00:25:55] Balestriero et al. – Batch normalization’s regularization effects https://arxiv.org/pdf/2209.14778
    [00:29:35] EU – EU AI Act 2024 with compute restrictions https://www.lw.com/admin/upload/SiteAttachments/EU-AI-Act-Navigating-a-Brave-New-World.pdf
    [00:39:30] Humayun, Balestriero & Baraniuk – SplineCam: Visualizing deep network geometry https://openaccess.thecvf.com/content/CVPR2023/papers/Humayun_SplineCam_Exact_Visualization_and_Characterization_of_Deep_Network_Geometry_and_CVPR_2023_paper.pdf
    [00:40:40] Carlini – Trade-offs between adversarial robustness and accuracy https://arxiv.org/pdf/2407.20099
    [00:44:55] Balestriero & LeCun – Limitations of reconstruction-based learning methods https://openreview.net/forum?id=ez7w0Ss4g9
    (truncated; see the show notes PDF)

    1h18min
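    The Balestriero episode discusses spline theory: a network built from ReLUs is a continuous piecewise-affine function, and each region of input space corresponds to a fixed on/off pattern of its units. The toy sketch below makes this concrete for a hypothetical one-dimensional network with two hidden units. The weights are arbitrary illustrative choices, and counting regions by dense sampling (rather than computing them exactly, as tools like SplineCam do) is our simplification, which can miss very small regions.

```python
# Hypothetical toy network f(x) = sum_k a_k * relu(w_k * x + b_k).
# Each tuple is (w_k, b_k, a_k); here f(x) = relu(x) + relu(x - 1).
UNITS = [(1.0, 0.0, 1.0), (1.0, -1.0, 1.0)]


def relu(v):
    return max(0.0, v)


def forward(x):
    return sum(a * relu(w * x + b) for w, b, a in UNITS)


def activation_pattern(x):
    """Which hidden units are active at x; constant within one affine region."""
    return tuple(w * x + b > 0.0 for w, b, _ in UNITS)


def count_regions(lo, hi, samples=10_001):
    """Count distinct consecutive activation patterns along [lo, hi]."""
    regions, previous = 0, None
    for k in range(samples):
        x = lo + (hi - lo) * k / (samples - 1)
        pattern = activation_pattern(x)
        if pattern != previous:
            regions += 1
            previous = pattern
    return regions


if __name__ == "__main__":
    print(count_regions(-2.0, 3.0))     # 3 regions: x <= 0, 0 < x <= 1, x > 1
    print(forward(2.0) - forward(1.5))  # slope 2 on the last region: 1.0
```

    Within each region the active units are fixed, so the network reduces to a single affine map; the "elastic origami" picture is these affine pieces being folded together at the region boundaries.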
  7. Nicholas Carlini (Google DeepMind)

    25 Jan

    Nicholas Carlini from Google DeepMind offers his view of AI security, emergent LLM capabilities, and his groundbreaking model-stealing research. He reveals how LLMs can unexpectedly excel at tasks like chess and discusses the security pitfalls of LLM-generated code.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/
    Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier and focussed on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Go to https://tufalabs.ai/
    ***

    Transcript: https://www.dropbox.com/scl/fi/lat7sfyd4k3g5k9crjpbf/CARLINI.pdf?rlkey=b7kcqbvau17uw6rksbr8ccd8v&dl=0

    TOC:
    1. ML Security Fundamentals
    [00:00:00] 1.1 ML Model Reasoning and Security Fundamentals
    [00:03:04] 1.2 ML Security Vulnerabilities and System Design
    [00:08:22] 1.3 LLM Chess Capabilities and Emergent Behavior
    [00:13:20] 1.4 Model Training, RLHF, and Calibration Effects
    2. Model Evaluation and Research Methods
    [00:19:40] 2.1 Model Reasoning and Evaluation Metrics
    [00:24:37] 2.2 Security Research Philosophy and Methodology
    [00:27:50] 2.3 Security Disclosure Norms and Community Differences
    3. LLM Applications and Best Practices
    [00:44:29] 3.1 Practical LLM Applications and Productivity Gains
    [00:49:51] 3.2 Effective LLM Usage and Prompting Strategies
    [00:53:03] 3.3 Security Vulnerabilities in LLM-Generated Code
    4. Advanced LLM Research and Architecture
    [00:59:13] 4.1 LLM Code Generation Performance and O(1) Labs Experience
    [01:03:31] 4.2 Adaptation Patterns and Benchmarking Challenges
    [01:10:10] 4.3 Model Stealing Research and Production LLM Architecture Extraction

    REFS:
    [00:01:15] Nicholas Carlini’s personal website & research profile (Google DeepMind, ML security) - https://nicholas.carlini.com/
    [00:01:50] CentML AI compute platform for language model workloads - https://centml.ai/
    [00:04:30] Seminal paper on neural network robustness against adversarial examples (Carlini & Wagner, 2016) - https://arxiv.org/abs/1608.04644
    [00:05:20] Computer Fraud and Abuse Act (CFAA) – primary U.S. federal law on computer hacking liability - https://www.justice.gov/jm/jm-9-48000-computer-fraud
    [00:08:30] Blog post: Emergent chess capabilities in GPT-3.5-turbo-instruct (Nicholas Carlini, Sept 2023) - https://nicholas.carlini.com/writing/2023/chess-llm.html
    [00:16:10] Paper: “Self-Play Preference Optimization for Language Model Alignment” (Yue Wu et al., 2024) - https://arxiv.org/abs/2405.00675
    [00:18:00] GPT-4 Technical Report: development, capabilities, and calibration analysis - https://arxiv.org/abs/2303.08774
    [00:22:40] Historical shift from descriptive to algebraic chess notation (FIDE) - https://en.wikipedia.org/wiki/Descriptive_notation
    [00:23:55] Analysis of distribution shift in ML (Hendrycks et al.) - https://arxiv.org/abs/2006.16241
    [00:27:40] Nicholas Carlini’s essay “Why I Attack” (June 2024) – motivations for security research - https://nicholas.carlini.com/writing/2024/why-i-attack.html
    [00:34:05] Google Project Zero’s 90-day vulnerability disclosure policy - https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html
    [00:51:15] Evolution of Google search syntax & user behavior (Daniel M. Russell) - https://www.amazon.com/Joy-Search-Google-Master-Information/dp/0262042878
    [01:04:05] Rust’s ownership & borrowing system for memory safety - https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html
    [01:10:05] Paper: “Stealing Part of a Production Language Model” (Carlini et al., March 2024) – extraction attacks on ChatGPT, PaLM-2 - https://arxiv.org/abs/2403.06634
    [01:10:55] First model-stealing paper (Tramèr et al., 2016) – stealing ML models via prediction APIs - https://arxiv.org/abs/1609.02943

    1h21min
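    The Carlini episode covers model-stealing research, including "Stealing Part of a Production Language Model". A key observation behind that work, as we understand it, is that a model's logits are a linear projection of a much smaller hidden state, so logit vectors collected across many queries span a subspace whose dimension reveals the hidden size. The self-contained toy below simulates this with a random projection matrix and estimates the rank by Gaussian elimination (the paper works with singular values; exact rank is our simplification). Real APIs expose far less than full logit vectors, and every name and size here is hypothetical.

```python
import random


def matrix_rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        if rank == len(m):
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # nothing left to eliminate in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def simulate_logit_queries(vocab=8, hidden=3, queries=12, seed=0):
    """Hypothetical model: logits = W @ h, with W of shape (vocab, hidden)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(hidden)] for _ in range(vocab)]
    rows = []
    for _ in range(queries):
        h = [rng.gauss(0.0, 1.0) for _ in range(hidden)]  # hidden state, never observed
        rows.append([sum(W[v][j] * h[j] for j in range(hidden)) for v in range(vocab)])
    return rows


if __name__ == "__main__":
    logits = simulate_logit_queries()
    print(matrix_rank(logits))  # expected: 3, the hidden size
```

    Even though each query returns eight logits, the stacked matrix has rank equal to the hidden size, which is exactly the leak the attack exploits.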
  8. Subbarao Kambhampati - Do o1 models search?

    23 Jan

    Join Prof. Subbarao Kambhampati and host Tim Scarfe for a deep dive into OpenAI's o1 model and the future of AI reasoning systems:
    * How o1 likely uses reinforcement learning similar to AlphaGo, with hidden reasoning tokens that users pay for but never see
    * The evolution from traditional Large Language Models to more sophisticated reasoning systems
    * The concept of "fractal intelligence" in AI: models that sometimes work brilliantly but fail unpredictably
    * Why o1's improved performance comes with substantial computational costs
    * The ongoing debate between single-model approaches (OpenAI) and hybrid systems (Google)
    * The critical distinction between AI as an intelligence amplifier and AI as an autonomous decision-maker

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/
    Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier and focussed on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Go to https://tufalabs.ai/
    ***

    TOC:
    1. o1 Architecture and Reasoning Foundations
    [00:00:00] 1.1 Fractal Intelligence and Reasoning Model Limitations
    [00:04:28] 1.2 LLM Evolution: From Simple Prompting to Advanced Reasoning
    [00:14:28] 1.3 o1's Architecture and AlphaGo-like Reasoning Approach
    [00:23:18] 1.4 Empirical Evaluation of o1's Planning Capabilities
    2. Monte Carlo Methods and Model Deep-Dive
    [00:29:30] 2.1 Monte Carlo Methods and Marco-o1 Implementation
    [00:31:30] 2.2 Reasoning vs. Retrieval in LLM Systems
    [00:40:40] 2.3 Fractal Intelligence Capabilities and Limitations
    [00:45:59] 2.4 Mechanistic Interpretability of Model Behavior
    [00:51:41] 2.5 o1 Response Patterns and Performance Analysis
    3. System Design and Real-World Applications
    [00:59:30] 3.1 Evolution from LLMs to Language Reasoning Models
    [01:06:48] 3.2 Cost-Efficiency Analysis: LLMs vs o1
    [01:11:28] 3.3 Autonomous vs Human-in-the-Loop Systems
    [01:16:01] 3.4 Program Generation and Fine-Tuning Approaches
    [01:26:08] 3.5 Hybrid Architecture Implementation Strategies

    Transcript: https://www.dropbox.com/scl/fi/d0ef4ovnfxi0lknirkvft/Subbarao.pdf?rlkey=l3rp29gs4hkut7he8u04mm1df&dl=0

    REFS:
    [00:02:00] Monty Python (1975) Witch trial scene: flawed logical reasoning. https://www.youtube.com/watch?v=zrzMhU_4m-g
    [00:04:00] Cade Metz (2024) Microsoft–OpenAI partnership evolution and control dynamics. https://www.nytimes.com/2024/10/17/technology/microsoft-openai-partnership-deal.html
    [00:07:25] Kojima et al. (2022) Zero-shot chain-of-thought prompting ('Let's think step by step'). https://arxiv.org/pdf/2205.11916
    [00:12:50] DeepMind Research Team (2023) Multi-bot game solving with external and internal planning. https://deepmind.google/research/publications/139455/
    [00:15:10] Silver et al. (2016) AlphaGo's Monte Carlo Tree Search with policy and value networks. https://www.nature.com/articles/nature16961
    [00:16:30] Kambhampati, S. et al. (2024) Evaluates o1's planning in "Strawberry Fields" benchmarks. https://arxiv.org/pdf/2410.02162
    [00:29:30] Alibaba AIDC-AI Team (2024) Marco-o1: Chain-of-Thought + MCTS for improved reasoning. https://arxiv.org/html/2411.14405
    [00:31:30] Kambhampati, S. (2024) Explores the LLM "reasoning vs retrieval" debate. https://arxiv.org/html/2403.04121v2
    [00:37:35] Wei, J. et al. (2022) Chain-of-thought prompting (introduces last-letter concatenation). https://arxiv.org/pdf/2201.11903
    [00:42:35] Barbero, F. et al. (2024) Transformer attention and "information over-squashing." https://arxiv.org/html/2406.04267v2
    [00:46:05] Ruis, L. et al. (2024) Influence functions to understand procedural knowledge in LLMs. https://arxiv.org/html/2411.12580v1
    (truncated; continued in the show notes/transcript document)

    1h32min
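    The Kambhampati episode discusses the hypothesis that o1 uses AlphaGo-style search and references Monte Carlo Tree Search. At the core of UCT, the classic MCTS variant, is the UCB1 rule for choosing which branch to explore next: average reward plus an exploration bonus that shrinks as a branch is visited more often. The sketch below applies UCB1 to a toy three-armed Bernoulli bandit. It illustrates the selection rule only and makes no claim about how o1 works; AlphaGo itself used a prior-weighted variant (PUCT). The function name and parameters are illustrative assumptions.

```python
import math
import random


def ucb1_bandit(probs, pulls=2000, c=math.sqrt(2.0), seed=0):
    """Play a Bernoulli bandit with UCB1; return how often each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    totals = [0.0] * len(probs)
    for t in range(1, pulls + 1):
        if t <= len(probs):
            arm = t - 1  # pull every arm once before applying the formula
        else:
            # Exploit (mean reward) plus explore (bonus for rarely tried arms).
            arm = max(
                range(len(probs)),
                key=lambda a: totals[a] / counts[a] + c * math.sqrt(math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb1_bandit([0.2, 0.5, 0.8]))  # most pulls should go to the last arm
```

    In a tree search, the same rule is applied at every node to pick a child, with the rewards coming from rollouts or a value estimate.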
Rating: 4.7 out of 5 (83 ratings)
