Machine Learning Street Talk (MLST)


Welcome! We engage in fascinating discussions with pre-eminent figures in the AI field. Our flagship show covers current affairs in AI, cognitive science, neuroscience and philosophy of mind with in-depth analysis. Our approach is unrivalled in terms of scope and rigour – we believe in intellectual diversity in AI, and we touch on all of the main ideas in the field with the hype surgically removed. MLST is run by Tim Scarfe, Ph.D (https://www.linkedin.com/in/ecsquizor/) and features regular appearances from MIT Doctor of Philosophy Keith Duggar (https://www.linkedin.com/in/dr-keith-duggar/).

  1. Transformers Need Glasses! - Federico Barbero

    5 HR AGO

    Federico Barbero (DeepMind/Oxford) is the lead author of "Transformers Need Glasses!". Have you ever wondered why LLMs struggle with seemingly simple tasks like counting or copying long strings of text? We break down the theoretical reasons behind these failures, revealing architectural bottlenecks and the challenges of maintaining information fidelity across extended contexts. Federico explains how these issues are rooted in the transformer's design, drawing parallels to over-squashing in graph neural networks and detailing how the softmax function limits sharp decision-making. But it's not all bad news! Discover practical "glasses" that can help transformers see more clearly, from simple input modifications to architectural tweaks.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting! https://centml.ai/pricing/
    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    https://federicobarbero.com/

    TRANSCRIPT + RESEARCH: https://www.dropbox.com/s/h7ys83ztwktqjje/Federico.pdf?dl=0

    TOC:
    1. Transformer Limitations: Token Detection & Representation
    [00:00:00] 1.1 Transformers fail at single token detection
    [00:02:45] 1.2 Representation collapse in transformers
    [00:03:21] 1.3 Experiment: LLMs fail at copying last tokens
    [00:18:00] 1.4 Attention sharpness limitations in transformers
    2. Transformer Limitations: Information Flow & Quantization
    [00:18:50] 2.1 Unidirectional information mixing
    [00:18:50] 2.2 Unidirectional information flow towards sequence beginning in transformers
    [00:21:50] 2.3 Diagonal attention heads as expensive no-ops in Llama/Gemma
    [00:27:14] 2.4 Sequence entropy affects transformer model distinguishability
    [00:30:36] 2.5 Quantization limitations lead to information loss & representational collapse
    [00:38:34] 2.6 LLMs use subitizing as opposed to counting algorithms
    3. Transformers and the Nature of Reasoning
    [00:40:30] 3.1 Turing completeness conditions in transformers
    [00:43:23] 3.2 Transformers struggle with sequential tasks
    [00:45:50] 3.3 Windowed attention as solution to information compression
    [00:51:04] 3.4 Chess engines: mechanical computation vs creative reasoning
    [01:00:35] 3.5 Epistemic foraging introduced

    REFS:
    [00:01:05] Transformers Need Glasses!, Barbero et al. https://proceedings.neurips.cc/paper_files/paper/2024/file/b1d35561c4a4a0e0b6012b2af531e149-Paper-Conference.pdf
    [00:05:30] Softmax is Not Enough, Veličković et al. https://arxiv.org/abs/2410.01104
    [00:11:30] Adv Alg Lecture 15, Chawla https://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec15.pdf
    [00:15:05] Graph Attention Networks, Veličković https://arxiv.org/abs/1710.10903
    [00:19:15] Extract Training Data, Carlini et al. https://arxiv.org/pdf/2311.17035
    [00:31:30] 1-bit LLMs, Ma et al. https://arxiv.org/abs/2402.17764
    [00:38:35] LLMs Solve Math, Nikankin et al. https://arxiv.org/html/2410.21272v1
    [00:38:45] Subitizing, Railo https://link.springer.com/10.1007/978-1-4419-1428-6_578
    [00:43:25] NN & Chomsky Hierarchy, Delétang et al. https://arxiv.org/abs/2207.02098
    [00:51:05] Measure of Intelligence, Chollet https://arxiv.org/abs/1911.01547
    [00:52:10] AlphaZero, Silver et al. https://pubmed.ncbi.nlm.nih.gov/30523106/
    [00:55:10] Golden Gate Claude, Anthropic https://www.anthropic.com/news/golden-gate-claude
    [00:56:40] Chess Positions, Chase & Simon https://www.sciencedirect.com/science/article/abs/pii/0010028573900042
    [01:00:35] Epistemic Foraging, Friston https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2016.00056/full

    1h1min
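
The "Transformers Need Glasses!" notes above mention that softmax limits sharp decision-making. A minimal sketch of that effect follows. It is an illustration written for these notes, not code from the paper or the episode, and it assumes a simplified setting: one attention logit exceeds all the others by a fixed gap (the `gap` parameter and function names are hypothetical). With a bounded gap, the largest softmax weight shrinks as the number of tokens grows.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_attention_weight(n, gap=5.0):
    """Largest softmax weight when one logit beats the other n-1 by `gap`.

    Assumption for illustration: the 'important' token has logit `gap`,
    every other token has logit 0.
    """
    logits = [gap] + [0.0] * (n - 1)
    return max(softmax(logits))


if __name__ == "__main__":
    # The winning weight equals e^gap / (e^gap + n - 1), so it decays with n.
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n={n:>6}  max weight={max_attention_weight(n):.4f}")
```

Under this assumption the head attends almost exclusively to one token at short lengths but spreads its weight thinly at long lengths, which is one way to picture why fixed-magnitude logits cannot stay sharp over long contexts.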
  2. Sakana AI - Chris Lu, Robert Tjarko Lange, Cong Lu

    MAR 1

    We speak with Sakana AI, who are building nature-inspired methods that could fundamentally transform how we develop AI systems. The guests include Chris Lu, a researcher who recently completed his DPhil at Oxford University under Prof. Jakob Foerster's supervision, where he focused on meta-learning and multi-agent systems. Chris is the first author of the DiscoPOP paper, which demonstrates how language models can discover and design better training algorithms. Also joining is Robert Tjarko Lange, a founding member of Sakana AI who specializes in evolutionary algorithms and large language models. Robert leads research at the intersection of evolutionary computation and foundation models, and is completing his PhD at TU Berlin on evolutionary meta-learning. The discussion also features Cong Lu, currently a Research Scientist at Google DeepMind's Open-Endedness team, who previously helped develop The AI Scientist and Intelligent Go-Explore.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting! https://centml.ai/pricing/
    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    * DiscoPOP - A framework where language models discover their own optimization algorithms
    * EvoLLM - Using language models as evolution strategies for optimization
    * The AI Scientist - A fully automated system that conducts scientific research end-to-end
    * Neural Attention Memory Models (NAMMs) - Evolved memory systems that make transformers both faster and more accurate

    TRANSCRIPT + REFS: https://www.dropbox.com/scl/fi/gflcyvnujp8cl7zlv3v9d/Sakana.pdf?rlkey=woaoo82943170jd4yyi2he71c&dl=0

    Robert Tjarko Lange https://roberttlange.com/
    Chris Lu https://chrislu.page/
    Cong Lu https://www.conglu.co.uk/
    Sakana https://sakana.ai/blog/

    TOC:
    1. LLMs for Algorithm Generation and Optimization
    [00:00:00] 1.1 LLMs generating algorithms for training other LLMs
    [00:04:00] 1.2 Evolutionary black-box optimization using neural network loss parameterization
    [00:11:50] 1.3 DiscoPOP: Non-convex loss function for noisy data
    [00:20:45] 1.4 External entropy injection for preventing model collapse
    [00:26:25] 1.5 LLMs for black-box optimization using abstract numerical sequences
    2. Model Learning and Generalization
    [00:31:05] 2.1 Fine-tuning on teacher algorithm trajectories
    [00:31:30] 2.2 Transformers learning gradient descent
    [00:33:00] 2.3 LLM tokenization biases towards specific numbers
    [00:34:50] 2.4 LLMs as evolution strategies for black box optimization
    [00:38:05] 2.5 DiscoPOP: LLMs discovering novel optimization algorithms
    3. AI Agents and System Architectures
    [00:51:30] 3.1 ARC challenge: Induction vs. transformer approaches
    [00:54:35] 3.2 LangChain / modular agent components
    [00:57:50] 3.3 Debate improves LLM truthfulness
    [01:00:55] 3.4 Time limits controlling AI agent systems
    [01:03:00] 3.5 Gemini: Million-token context enables flatter hierarchies
    [01:04:05] 3.6 Agents follow own interest gradients
    [01:09:50] 3.7 Go-Explore algorithm: archive-based exploration
    [01:11:05] 3.8 Foundation models for interesting state discovery
    [01:13:00] 3.9 LLMs leverage prior game knowledge
    4. AI for Scientific Discovery and Human Alignment
    [01:17:45] 4.1 Encoding Alignment & Aesthetics via Reward Functions
    [01:20:00] 4.2 AI Scientist: Automated Open-Ended Scientific Discovery
    [01:24:15] 4.3 DiscoPOP: LLM for Preference Optimization Algorithms
    [01:28:30] 4.4 Balancing AI Knowledge with Human Understanding
    [01:33:55] 4.5 AI-Driven Conferences and Paper Review

    1h38min
  3. Clement Bonnet - Can Latent Program Networks Solve Abstract Reasoning?

    FEB 19

    Clement Bonnet discusses his novel approach to the ARC (Abstraction and Reasoning Corpus) challenge. Unlike approaches that rely on fine-tuning LLMs or generating samples at inference time, Clement's method encodes input-output pairs into a latent space, optimizes this representation with a search algorithm, and decodes outputs for new inputs. This end-to-end architecture uses a VAE loss, including reconstruction and prior losses.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting! https://centml.ai/pricing/
    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    TRANSCRIPT + RESEARCH OVERVIEW: https://www.dropbox.com/scl/fi/j7m0gaz1126y594gswtma/CLEMMLST.pdf?rlkey=y5qvwq2er5nchbcibm07rcfpq&dl=0

    Clem and Matthew:
    https://www.linkedin.com/in/clement-bonnet16/
    https://github.com/clement-bonnet
    https://mvmacfarlane.github.io/

    TOC:
    1. LPN Fundamentals
    [00:00:00] 1.1 Introduction to ARC Benchmark and LPN Overview
    [00:05:05] 1.2 Neural Networks' Challenges with ARC and Program Synthesis
    [00:06:55] 1.3 Induction vs Transduction in Machine Learning
    2. LPN Architecture and Latent Space
    [00:11:50] 2.1 LPN Architecture and Latent Space Implementation
    [00:16:25] 2.2 LPN Latent Space Encoding and VAE Architecture
    [00:20:25] 2.3 Gradient-Based Search Training Strategy
    [00:23:39] 2.4 LPN Model Architecture and Implementation Details
    3. Implementation and Scaling
    [00:27:34] 3.1 Training Data Generation and re-ARC Framework
    [00:31:28] 3.2 Limitations of Latent Space and Multi-Thread Search
    [00:34:43] 3.3 Program Composition and Computational Graph Architecture
    4. Advanced Concepts and Future Directions
    [00:45:09] 4.1 AI Creativity and Program Synthesis Approaches
    [00:49:47] 4.2 Scaling and Interpretability in Latent Space Models

    REFS:
    [00:00:05] ARC benchmark, Chollet https://arxiv.org/abs/2412.04604
    [00:02:10] Latent Program Spaces, Bonnet, Macfarlane https://arxiv.org/abs/2411.08706
    [00:07:45] Kevin Ellis work on program generation https://www.cs.cornell.edu/~ellisk/
    [00:08:45] Induction vs transduction in abstract reasoning, Li et al. https://arxiv.org/abs/2411.02272
    [00:17:40] VAEs, Kingma, Welling https://arxiv.org/abs/1312.6114
    [00:27:50] re-ARC, Hodel https://github.com/michaelhodel/re-arc
    [00:29:40] Grid size in ARC tasks, Chollet https://github.com/fchollet/ARC-AGI
    [00:33:00] Critique of deep learning, Marcus https://arxiv.org/vc/arxiv/papers/2002/2002.06177v1.pdf

    51min
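
The Latent Program Networks notes above describe an encode, search, decode loop: input-output pairs are encoded into a latent, the latent is optimized at test time, and outputs for new inputs are decoded from it. The toy sketch below illustrates only that loop shape. It is not the LPN architecture: the scalar latent, the `encode`, `decode`, and `refine` functions, and the choice of plain gradient descent are all assumptions made for illustration.

```python
def decode(z, x):
    """Toy decoder (assumption): the latent z is a slope, so the 'program' is y = z * x."""
    return z * x


def encode(pairs):
    """Toy encoder (assumption): a crude initial guess taken from the first pair only."""
    x, y = pairs[0]
    return y / x if x != 0 else 0.0


def loss(z, pairs):
    """Mean squared reconstruction error of the pairs under latent z."""
    return sum((decode(z, x) - y) ** 2 for x, y in pairs) / len(pairs)


def refine(z, pairs, steps=200, lr=0.01):
    """Test-time search: gradient descent on the reconstruction error w.r.t. z."""
    for _ in range(steps):
        grad = sum(2 * (decode(z, x) - y) * x for x, y in pairs) / len(pairs)
        z -= lr * grad
    return z


if __name__ == "__main__":
    pairs = [(1.0, 3.2), (2.0, 5.9), (3.0, 9.1)]  # made-up demonstration pairs
    z0 = encode(pairs)        # initial latent from the encoder
    z = refine(z0, pairs)     # latent after test-time optimization
    print("initial loss:", loss(z0, pairs), "refined loss:", loss(z, pairs))
    print("prediction for x=4:", decode(z, 4.0))
```

The point of the sketch is the division of labour: the encoder only has to land near a good latent, and the search step closes the remaining gap using the demonstration pairs before any new input is decoded.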
  4. Prof. Jakob Foerster - ImageNet Moment for Reinforcement Learning?

    FEB 18

    Prof. Jakob Foerster, a leading AI researcher at Oxford University and Meta, and Chris Lu, a researcher at OpenAI, explain how AI is moving beyond just mimicking human behaviour to creating truly intelligent agents that can learn and solve problems on their own. Foerster champions open-source AI for responsible, decentralised development. He addresses AI scaling, goal misalignment (Goodhart's Law), and the need for holistic alignment, offering a quick look at the future of AI and how to guide it.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting! https://centml.ai/pricing/
    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    TRANSCRIPT/REFS: https://www.dropbox.com/scl/fi/yqjszhntfr00bhjh6t565/JAKOB.pdf?rlkey=scvny4bnwj8th42fjv8zsfu2y&dl=0

    Prof. Jakob Foerster
    https://x.com/j_foerst
    https://www.jakobfoerster.com/
    University of Oxford Profile: https://eng.ox.ac.uk/people/jakob-foerster/

    Chris Lu: https://chrislu.page/

    TOC:
    1. GPU Acceleration and Training Infrastructure
    [00:00:00] 1.1 ARC Challenge Criticism and FLAIR Lab Overview
    [00:01:25] 1.2 GPU Acceleration and Hardware Lottery in RL
    [00:05:50] 1.3 Data Wall Challenges and Simulation-Based Solutions
    [00:08:40] 1.4 JAX Implementation and Technical Acceleration
    2. Learning Frameworks and Policy Optimization
    [00:14:18] 2.1 Evolution of RL Algorithms and Mirror Learning Framework
    [00:15:25] 2.2 Meta-Learning and Policy Optimization Algorithms
    [00:21:47] 2.3 Language Models and Benchmark Challenges
    [00:28:15] 2.4 Creativity and Meta-Learning in AI Systems
    3. Multi-Agent Systems and Decentralization
    [00:31:24] 3.1 Multi-Agent Systems and Emergent Intelligence
    [00:38:35] 3.2 Swarm Intelligence vs Monolithic AGI Systems
    [00:42:44] 3.3 Democratic Control and Decentralization of AI Development
    [00:46:14] 3.4 Open Source AI and Alignment Challenges
    [00:49:31] 3.5 Collaborative Models for AI Development

    REFS:
    [00:00:05] ARC Benchmark, Chollet https://github.com/fchollet/ARC-AGI
    [00:03:05] DRL Doesn't Work, Irpan https://www.alexirpan.com/2018/02/14/rl-hard.html
    [00:05:55] AI Training Data, Data Provenance Initiative https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html
    [00:06:10] JaxMARL, Foerster et al. https://arxiv.org/html/2311.10090v5
    [00:08:50] M-FOS, Lu et al. https://arxiv.org/abs/2205.01447
    [00:09:45] JAX Library, Google Research https://github.com/jax-ml/jax
    [00:12:10] Kinetix, Mike and Michael https://arxiv.org/abs/2410.23208
    [00:12:45] Genie 2, DeepMind https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/
    [00:14:42] Mirror Learning, Grudzien, Kuba et al. https://arxiv.org/abs/2208.01682
    [00:16:30] Discovered Policy Optimisation, Lu et al. https://arxiv.org/abs/2210.05639
    [00:24:10] Goodhart's Law, Goodhart https://en.wikipedia.org/wiki/Goodhart%27s_law
    [00:25:15] LLM ARChitect, Franzen et al. https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf
    [00:28:55] AlphaGo, Silver et al. https://arxiv.org/pdf/1712.01815.pdf
    [00:30:10] Meta-learning, Lu, Towers, Foerster https://direct.mit.edu/isal/proceedings-pdf/isal2023/35/67/2354943/isal_a_00674.pdf
    [00:31:30] Emergence of Pragmatics, Yuan et al. https://arxiv.org/abs/2001.07752
    [00:34:30] AI Safety, Amodei et al. https://arxiv.org/abs/1606.06565
    [00:35:45] Intentional Stance, Dennett https://plato.stanford.edu/entries/ethics-ai/
    [00:39:25] Multi-Agent RL, Zhou et al. https://arxiv.org/pdf/2305.10091
    [00:41:00] Open Source Generative AI, Foerster et al. https://arxiv.org/abs/2405.08597

    54min
  5. Daniel Franzen & Jan Disselhoff - ARC Prize 2024 winners

    FEB 12

    Daniel Franzen and Jan Disselhoff, the "ARChitects", are the official winners of the ARC Prize 2024. In this episode, filmed at Tufa Labs in Zurich, they reveal how they achieved a remarkable 53.5% accuracy by creatively utilising large language models (LLMs) in new ways. Discover their innovative techniques, including depth-first search for token selection, test-time training, and a novel augmentation-based validation system. Their results were extremely surprising.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting! https://centml.ai/pricing/
    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    Jan Disselhoff https://www.linkedin.com/in/jan-disselhoff-1423a2240/
    Daniel Franzen https://github.com/da-fr
    ARC Prize: http://arcprize.org/

    TRANSCRIPT AND BACKGROUND READING: https://www.dropbox.com/scl/fi/utkn2i1ma79fn6an4yvjw/ARCHitects.pdf?rlkey=67pe38mtss7oyhjk2ad0d2aza&dl=0

    TOC:
    1. Solution Architecture and Strategy Overview
    [00:00:00] 1.1 Initial Solution Overview and Model Architecture
    [00:04:25] 1.2 LLM Capabilities and Dataset Approach
    [00:10:51] 1.3 Test-Time Training and Data Augmentation Strategies
    [00:14:08] 1.4 Sampling Methods and Search Implementation
    [00:17:52] 1.5 ARC vs Language Model Context Comparison
    2. LLM Search and Model Implementation
    [00:21:53] 2.1 LLM-Guided Search Approaches and Solution Validation
    [00:27:04] 2.2 Symmetry Augmentation and Model Architecture
    [00:30:11] 2.3 Model Intelligence Characteristics and Performance
    [00:37:23] 2.4 Tokenization and Numerical Processing Challenges
    3. Advanced Training and Optimization
    [00:45:15] 3.1 DFS Token Selection and Probability Thresholds
    [00:49:41] 3.2 Model Size and Fine-tuning Performance Trade-offs
    [00:53:07] 3.3 LoRA Implementation and Catastrophic Forgetting Prevention
    [00:56:10] 3.4 Training Infrastructure and Optimization Experiments
    [01:02:34] 3.5 Search Tree Analysis and Entropy Distribution Patterns

    REFS:
    [00:01:05] Winning ARC 2024 solution using 12B param model, Franzen, Disselhoff, Hartmann https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf
    [00:03:40] Robustness of analogical reasoning in LLMs, Melanie Mitchell https://arxiv.org/html/2411.14215
    [00:07:50] Re-ARC dataset generator for ARC task variations, Michael Hodel https://github.com/michaelhodel/re-arc
    [00:15:00] Analysis of search methods in LLMs (greedy, beam, DFS), Chen et al. https://arxiv.org/html/2408.00724v2
    [00:16:55] Language model reachability space exploration, University of Toronto https://www.youtube.com/watch?v=Bpgloy1dDn0
    [00:22:30] GPT-4 guided code solutions for ARC tasks, Ryan Greenblatt https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
    [00:41:20] GPT tokenization approach for numbers, OpenAI https://platform.openai.com/docs/guides/text-generation/tokenizer-examples
    [00:46:25] DFS in AI search strategies, Russell & Norvig https://www.amazon.com/Artificial-Intelligence-Modern-Approach-4th/dp/0134610997
    [00:53:10] Paper on catastrophic forgetting in neural networks, Kirkpatrick et al. https://www.pnas.org/doi/10.1073/pnas.1611835114
    [00:54:00] LoRA for efficient fine-tuning of LLMs, Hu et al. https://arxiv.org/abs/2106.09685
    [00:57:20] NVIDIA H100 Tensor Core GPU specs, NVIDIA https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/
    [01:04:55] Original MCTS in computer Go, Yifan Jin https://stanford.edu/~rezab/classes/cme323/S15/projects/montecarlo_search_tree_report.pdf

    1h9min
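
The ARC Prize notes above mention depth-first search for token selection with probability thresholds. The sketch below shows the general idea on a toy model. It is an illustration only, not the ARChitects' implementation: `toy_model`, the `EOS` marker, and the pruning rule (drop any branch whose cumulative probability falls below `threshold`) are assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence marker

Sequence = Tuple[str, ...]


def dfs_sample(next_probs: Callable[[Sequence], Dict[str, float]],
               threshold: float,
               max_len: int = 8) -> List[Tuple[Sequence, float]]:
    """Enumerate every finished sequence whose total probability >= threshold."""
    results: List[Tuple[Sequence, float]] = []

    def visit(prefix: Sequence, prob: float) -> None:
        if prefix and prefix[-1] == EOS:
            results.append((prefix, prob))  # complete candidate
            return
        if len(prefix) >= max_len:
            return  # give up on overly long branches
        for token, p in next_probs(prefix).items():
            # Prune: cumulative probability only shrinks as the branch grows.
            if prob * p >= threshold:
                visit(prefix + (token,), prob * p)

    visit((), 1.0)
    return sorted(results, key=lambda r: -r[1])


def toy_model(prefix: Sequence) -> Dict[str, float]:
    """Stand-in for an LLM: two binary tokens, then a forced end marker."""
    if len(prefix) < 2:
        return {"0": 0.7, "1": 0.3}
    return {EOS: 1.0}


if __name__ == "__main__":
    for seq, prob in dfs_sample(toy_model, threshold=0.2):
        print(" ".join(seq), f"{prob:.2f}")
```

Unlike random sampling, this search returns each sufficiently likely completion exactly once, and because the cumulative probability can only decrease along a branch, a pruned prefix can never lead to a candidate above the threshold.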
  6. Sepp Hochreiter - LSTM: The Comeback Story?

    FEB 12

    Sepp Hochreiter is the inventor of LSTM (Long Short-Term Memory) networks, a foundational technology in AI. Sepp discusses his journey, the origins of LSTM, and why he believes his latest work, xLSTM, could be the next big thing in AI, particularly for applications like robotics and industrial simulation. He also shares his controversial perspective on Large Language Models (LLMs) and why reasoning is a critical missing piece in current AI systems.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting! https://centml.ai/pricing/
    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
    ***

    TRANSCRIPT AND BACKGROUND READING: https://www.dropbox.com/scl/fi/n1vzm79t3uuss8xyinxzo/SEPPH.pdf?rlkey=fp7gwaopjk17uyvgjxekxrh5v&dl=0

    Prof. Sepp Hochreiter
    https://www.nx-ai.com/
    https://x.com/hochreitersepp
    https://scholar.google.at/citations?user=tvUH3WMAAAAJ&hl=en

    TOC:
    1. LLM Evolution and Reasoning Capabilities
    [00:00:00] 1.1 LLM Capabilities and Limitations Debate
    [00:03:16] 1.2 Program Generation and Reasoning in AI Systems
    [00:06:30] 1.3 Human vs AI Reasoning Comparison
    [00:09:59] 1.4 New Research Initiatives and Hybrid Approaches
    2. LSTM Technical Architecture
    [00:13:18] 2.1 LSTM Development History and Technical Background
    [00:20:38] 2.2 LSTM vs RNN Architecture and Computational Complexity
    [00:25:10] 2.3 xLSTM Architecture and Flash Attention Comparison
    [00:30:51] 2.4 Evolution of Gating Mechanisms from Sigmoid to Exponential
    3. Industrial Applications and Neuro-Symbolic AI
    [00:40:35] 3.1 Industrial Applications and Fixed Memory Advantages
    [00:42:31] 3.2 Neuro-Symbolic Integration and Pi AI Project
    [00:46:00] 3.3 Integration of Symbolic and Neural AI Approaches
    [00:51:29] 3.4 Evolution of AI Paradigms and System Thinking
    [00:54:55] 3.5 AI Reasoning and Human Intelligence Comparison
    [00:58:12] 3.6 NXAI Company and Industrial AI Applications

    REFS:
    [00:00:15] Seminal LSTM paper establishing Hochreiter's expertise (Hochreiter & Schmidhuber) https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory
    [00:04:20] Kolmogorov complexity and program composition limitations (Kolmogorov) https://link.springer.com/article/10.1007/BF02478259
    [00:07:10] Limitations of LLM mathematical reasoning and symbolic integration (Various Authors) https://www.arxiv.org/pdf/2502.03671
    [00:09:05] AlphaGo’s Move 37 demonstrating creative AI (Google DeepMind) https://deepmind.google/research/breakthroughs/alphago/
    [00:10:15] New AI research lab in Zurich for fundamental LLM research (Benjamin Crouzier) https://tufalabs.ai
    [00:19:40] Introduction of xLSTM with exponential gating (Beck, Hochreiter, et al.) https://arxiv.org/abs/2405.04517
    [00:22:55] FlashAttention: fast & memory-efficient attention (Tri Dao et al.) https://arxiv.org/abs/2205.14135
    [00:31:00] Historical use of sigmoid/tanh activation in 1990s (James A. McCaffrey) https://visualstudiomagazine.com/articles/2015/06/01/alternative-activation-functions.aspx
    [00:36:10] Mamba 2 state space model architecture (Albert Gu et al.) https://arxiv.org/abs/2312.00752
    [00:46:00] Austria’s Pi AI project integrating symbolic & neural AI (Hochreiter et al.) https://www.jku.at/en/institute-of-machine-learning/research/projects/
    [00:48:10] Neuro-symbolic integration challenges in language models (Diego Calanzone et al.) https://openreview.net/forum?id=7PGluppo4k
    [00:49:30] JKU Linz’s historical and neuro-symbolic research (Sepp Hochreiter) https://www.jku.at/en/news-events/news/detail/news/bilaterale-ki-projekt-unter-leitung-der-jku-erhaelt-fwf-cluster-of-excellence/

    YT: https://www.youtube.com/watch?v=8u2pW2zZLCs

    1h7min
  7. Want to Understand Neural Networks? Think Elastic Origami! - Prof. Randall Balestriero

    FEB 8

    Professor Randall Balestriero joins us to discuss neural network geometry, spline theory, and emerging phenomena in deep learning, based on research presented at ICML. Topics include the delayed emergence of adversarial robustness in neural networks ("grokking"), geometric interpretations of neural networks via spline theory, and challenges in reconstruction learning. We also cover geometric analysis of Large Language Models (LLMs) for toxicity detection and the relationship between intrinsic dimensionality and model control in RLHF.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/
    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Go to https://tufalabs.ai/
    ***

    Randall Balestriero
    https://x.com/randall_balestr
    https://randallbalestriero.github.io/

    Show notes and transcript: https://www.dropbox.com/scl/fi/3lufge4upq5gy0ug75j4a/RANDALLSHOW.pdf?rlkey=nbemgpa0jhawt1e86rx7372e4&dl=0

    TOC:
    - Introduction
      - 00:00:00: Introduction
    - Neural Network Geometry and Spline Theory
      - 00:01:41: Neural Network Geometry and Spline Theory
      - 00:07:41: Deep Networks Always Grok
      - 00:11:39: Grokking and Adversarial Robustness
      - 00:16:09: Double Descent and Catastrophic Forgetting
    - Reconstruction Learning
      - 00:18:49: Reconstruction Learning
      - 00:24:15: Frequency Bias in Neural Networks
    - Geometric Analysis of Neural Networks
      - 00:29:02: Geometric Analysis of Neural Networks
      - 00:34:41: Adversarial Examples and Region Concentration
    - LLM Safety and Geometric Analysis
      - 00:40:05: LLM Safety and Geometric Analysis
      - 00:46:11: Toxicity Detection in LLMs
      - 00:52:24: Intrinsic Dimensionality and Model Control
      - 00:58:07: RLHF and High-Dimensional Spaces
    - Conclusion
      - 01:02:13: Neural Tangent Kernel
      - 01:08:07: Conclusion

    REFS:
    [00:01:35] Humayun – Deep network geometry & input space partitioning https://arxiv.org/html/2408.04809v1
    [00:03:55] Balestriero & Paris – Linking deep networks to adaptive spline operators https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
    [00:13:55] Song et al. – Gradient-based white-box adversarial attacks https://arxiv.org/abs/2012.14965
    [00:16:05] Humayun, Balestriero & Baraniuk – Grokking phenomenon & emergent robustness https://arxiv.org/abs/2402.15555
    [00:18:25] Humayun – Training dynamics & double descent via linear region evolution https://arxiv.org/abs/2310.12977
    [00:20:15] Balestriero – Power diagram partitions in DNN decision boundaries https://arxiv.org/abs/1905.08443
    [00:23:00] Frankle & Carbin – Lottery Ticket Hypothesis for network pruning https://arxiv.org/abs/1803.03635
    [00:24:00] Belkin et al. – Double descent phenomenon in modern ML https://arxiv.org/abs/1812.11118
    [00:25:55] Balestriero et al. – Batch normalization’s regularization effects https://arxiv.org/pdf/2209.14778
    [00:29:35] EU – EU AI Act 2024 with compute restrictions https://www.lw.com/admin/upload/SiteAttachments/EU-AI-Act-Navigating-a-Brave-New-World.pdf
    [00:39:30] Humayun, Balestriero & Baraniuk – SplineCam: Visualizing deep network geometry https://openaccess.thecvf.com/content/CVPR2023/papers/Humayun_SplineCam_Exact_Visualization_and_Characterization_of_Deep_Network_Geometry_and_CVPR_2023_paper.pdf
    [00:40:40] Carlini – Trade-offs between adversarial robustness and accuracy https://arxiv.org/pdf/2407.20099
    [00:44:55] Balestriero & LeCun – Limitations of reconstruction-based learning methods https://openreview.net/forum?id=ez7w0Ss4g9
    (truncated, see shownotes PDF)

    1h18min
  8. Nicholas Carlini (Google DeepMind)

    JAN 25

    Nicholas Carlini from Google DeepMind offers his view of AI security, emergent LLM capabilities, and his groundbreaking model-stealing research. He reveals how LLMs can unexpectedly excel at tasks like chess and discusses the security pitfalls of LLM-generated code.

    SPONSOR MESSAGES:
    ***
    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. https://centml.ai/pricing/
    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events? Go to https://tufalabs.ai/
    ***

    Transcript: https://www.dropbox.com/scl/fi/lat7sfyd4k3g5k9crjpbf/CARLINI.pdf?rlkey=b7kcqbvau17uw6rksbr8ccd8v&dl=0

    TOC:
    1. ML Security Fundamentals
    [00:00:00] 1.1 ML Model Reasoning and Security Fundamentals
    [00:03:04] 1.2 ML Security Vulnerabilities and System Design
    [00:08:22] 1.3 LLM Chess Capabilities and Emergent Behavior
    [00:13:20] 1.4 Model Training, RLHF, and Calibration Effects
    2. Model Evaluation and Research Methods
    [00:19:40] 2.1 Model Reasoning and Evaluation Metrics
    [00:24:37] 2.2 Security Research Philosophy and Methodology
    [00:27:50] 2.3 Security Disclosure Norms and Community Differences
    3. LLM Applications and Best Practices
    [00:44:29] 3.1 Practical LLM Applications and Productivity Gains
    [00:49:51] 3.2 Effective LLM Usage and Prompting Strategies
    [00:53:03] 3.3 Security Vulnerabilities in LLM-Generated Code
    4. Advanced LLM Research and Architecture
    [00:59:13] 4.1 LLM Code Generation Performance and O(1) Labs Experience
    [01:03:31] 4.2 Adaptation Patterns and Benchmarking Challenges
    [01:10:10] 4.3 Model Stealing Research and Production LLM Architecture Extraction

    REFS:
    [00:01:15] Nicholas Carlini’s personal website & research profile (Google DeepMind, ML security) - https://nicholas.carlini.com/
    [00:01:50] CentML AI compute platform for language model workloads - https://centml.ai/
    [00:04:30] Seminal paper on neural network robustness against adversarial examples (Carlini & Wagner, 2016) - https://arxiv.org/abs/1608.04644
    [00:05:20] Computer Fraud and Abuse Act (CFAA) – primary U.S. federal law on computer hacking liability - https://www.justice.gov/jm/jm-9-48000-computer-fraud
    [00:08:30] Blog post: Emergent chess capabilities in GPT-3.5-turbo-instruct (Nicholas Carlini, Sept 2023) - https://nicholas.carlini.com/writing/2023/chess-llm.html
    [00:16:10] Paper: “Self-Play Preference Optimization for Language Model Alignment” (Yue Wu et al., 2024) - https://arxiv.org/abs/2405.00675
    [00:18:00] GPT-4 Technical Report: development, capabilities, and calibration analysis - https://arxiv.org/abs/2303.08774
    [00:22:40] Historical shift from descriptive to algebraic chess notation (FIDE) - https://en.wikipedia.org/wiki/Descriptive_notation
    [00:23:55] Analysis of distribution shift in ML (Hendrycks et al.) - https://arxiv.org/abs/2006.16241
    [00:27:40] Nicholas Carlini’s essay “Why I Attack” (June 2024) – motivations for security research - https://nicholas.carlini.com/writing/2024/why-i-attack.html
    [00:34:05] Google Project Zero’s 90-day vulnerability disclosure policy - https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html
    [00:51:15] Evolution of Google search syntax & user behavior (Daniel M. Russell) - https://www.amazon.com/Joy-Search-Google-Master-Information/dp/0262042878
    [01:04:05] Rust’s ownership & borrowing system for memory safety - https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html
    [01:10:05] Paper: “Stealing Part of a Production Language Model” (Carlini et al., March 2024) – extraction attacks on ChatGPT, PaLM-2 - https://arxiv.org/abs/2403.06634
    [01:10:55] First model stealing paper (Tramèr et al., 2016) – attacking ML APIs via prediction - https://arxiv.org/abs/1609.02943

    1h21min
    4.7 out of 5
    83 ratings
