157 episodes


Machine Learning Street Talk (MLST)

    • Technology
    • 4.8 • 69 Ratings

Welcome! We engage in fascinating discussions with pre-eminent figures in the AI field. Our flagship show covers current affairs in AI, cognitive science, neuroscience and philosophy of mind with in-depth analysis. Our approach is unrivalled in terms of scope and rigour – we believe in intellectual diversity in AI, and we touch on all of the main ideas in the field with the hype surgically removed. MLST is run by Tim Scarfe, Ph.D (https://www.linkedin.com/in/ecsquizor/) and features regular appearances from MIT Doctor of Philosophy Keith Duggar (https://www.linkedin.com/in/dr-keith-duggar/).


    Sara Hooker - Why US AI Act Compute Thresholds Are Misguided

    Sara Hooker is VP of Research at Cohere and leader of Cohere for AI. We discuss her recent paper critiquing the use of compute thresholds, measured in FLOPs (floating point operations), as an AI governance strategy.



    We explore why this approach, recently adopted in both US and EU AI policies, may be problematic and oversimplified. Sara explains the limitations of using raw computational power as a measure of AI capability or risk, and discusses the complex relationship between compute, data, and model architecture.
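
    For intuition, training compute is usually estimated with the rough rule of thumb from the scaling-laws literature: total FLOPs ≈ 6 × parameters × training tokens. Below is a minimal back-of-the-envelope sketch of how a model would be measured against the US Executive Order's 10^26 FLOP reporting threshold; the model sizes are illustrative, not figures from Sara's paper.

        # Back-of-the-envelope training-compute estimate using the common
        # ~6 * N * D approximation (FLOPs ~= 6 x parameters x training tokens).
        # Illustrative numbers only -- not figures from Sara Hooker's paper.
        US_EO_THRESHOLD_FLOPS = 1e26  # reporting threshold in the 2023 US Executive Order

        def training_flops(n_params: float, n_tokens: float) -> float:
            """Approximate total training FLOPs for a dense transformer."""
            return 6.0 * n_params * n_tokens

        models = {
            "7B params, 2T tokens": training_flops(7e9, 2e12),      # ~8.4e22
            "70B params, 15T tokens": training_flops(70e9, 15e12),  # ~6.3e24
            "1T params, 20T tokens": training_flops(1e12, 20e12),   # ~1.2e26
        }

        for name, flops in models.items():
            side = "above" if flops > US_EO_THRESHOLD_FLOPS else "below"
            print(f"{name}: ~{flops:.1e} FLOPs ({side} the 1e26 threshold)")

    Note that data quality and architecture appear nowhere in this formula, which is precisely why two models on the same side of the line can differ wildly in capability and risk.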



    Just as importantly, we dig into Sara's work on "The AI Language Gap." This research highlights the challenges and inequalities in developing AI systems that work across multiple languages. Sara discusses how current AI models, predominantly trained on English and a handful of high-resource languages, fail to serve the linguistic diversity of our global population. We explore the technical, ethical, and societal implications of this gap, and discuss potential solutions for creating more inclusive and representative AI systems.
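
    One measurable facet of this gap is tokenizer "fertility": subword vocabularies trained mostly on English split other languages and scripts into far more tokens, so speakers of those languages get less effective context and pay more per request. A quick illustration (our example, not from the paper; assumes the Hugging Face transformers library is installed):

        # Compare how many tokens an English-centric BPE tokenizer spends
        # on roughly equivalent sentences in different languages/scripts.
        from transformers import AutoTokenizer

        tok = AutoTokenizer.from_pretrained("gpt2")  # English-heavy byte-level BPE

        sentences = {
            "English": "How are you today?",
            "German": "Wie geht es dir heute?",
            "Hindi": "आज आप कैसे हैं?",
        }

        for lang, text in sentences.items():
            n_tokens = len(tok.encode(text))
            print(f"{lang:8s} {n_tokens:3d} tokens for {len(text)} characters")

        # Non-Latin scripts typically come out several times more expensive,
        # which compounds into worse effective context length and higher cost.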



    We broadly discuss the relationship between language, culture, and AI capabilities, as well as the ethical considerations in AI development and deployment.



    YT Version: https://youtu.be/dBZp47999Ko



    TOC:

    [00:00:00] Intro

    [00:02:12] FLOPS paper

    [00:26:42] Hardware lottery

    [00:30:22] The Language gap

    [00:33:25] Safety

    [00:38:31] Emergent

    [00:41:23] Creativity

    [00:43:40] Long tail

    [00:44:26] LLMs and society

    [00:45:36] Model bias

    [00:48:51] Language and capabilities

    [00:52:27] Ethical frameworks and RLHF





    Sara Hooker

    https://www.sarahooker.me/

    https://www.linkedin.com/in/sararosehooker/

    https://scholar.google.com/citations?user=2xy6h3sAAAAJ&hl=en

    https://x.com/sarahookr



    Interviewer: Tim Scarfe



    Refs



    The AI Language Gap

    https://cohere.com/research/papers/the-AI-language-gap.pdf



    On the Limitations of Compute Thresholds as a Governance Strategy

    https://arxiv.org/pdf/2407.05694v1



    The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

    https://arxiv.org/pdf/2406.18682



    Cohere Aya

    https://cohere.com/research/aya



    RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

    https://arxiv.org/pdf/2407.02552



    Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

    https://arxiv.org/pdf/2402.14740



    Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence

    https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/



    EU AI Act

    https://www.europarl.europa.eu/doceo/document/TA-9-2024-0138_EN.pdf



    The bitter lesson

    http://www.incompleteideas.net/IncIdeas/BitterLesson.html



    Neel Nanda interview

    https://www.youtube.com/watch?v=_Ygf0GnlwmY



    Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

    https://transformer-circuits.pub/2024/scaling-monosemanticity/



    Chollet's ARC challenge

    https://github.com/fchollet/ARC-AGI



    Ryan Greenblatt on ARC

    https://www.youtube.com/watch?v=z9j3wB1RRGA



    Disclaimer: This is the third video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out of the interview.

    • 1 hr 5 min

    Prof. Murray Shanahan - Machines Don't Think Like Us

    Murray Shanahan is a professor of Cognitive Robotics at Imperial College London and a senior research scientist at DeepMind. He challenges our assumptions about AI consciousness and urges us to rethink how we talk about machine intelligence.



    We explore the dangers of anthropomorphizing AI, the limitations of current language in describing AI capabilities, and the fascinating intersection of philosophy and artificial intelligence.



    Show notes and full references: https://docs.google.com/document/d/1ICtBI574W-xGi8Z2ZtUNeKWiOiGZ_DRsp9EnyYAISws/edit?usp=sharing



    Prof Murray Shanahan:

    https://www.doc.ic.ac.uk/~mpsha/ (look at his selected publications)

    https://scholar.google.co.uk/citations?user=00bnGpAAAAAJ&hl=en

    https://en.wikipedia.org/wiki/Murray_Shanahan

    https://x.com/mpshanahan



    Interviewer: Dr. Tim Scarfe



    Refs (links in the Google doc linked above):

    Role play with large language models

    Waluigi effect

    "Conscious Exotica" - Paper by Murray Shanahan (2016)

    "Simulators" - Article by Janis from LessWrong

    "Embodiment and the Inner Life" - Book by Murray Shanahan (2010)

    "The Technological Singularity" - Book by Murray Shanahan (2015)

    "Simulacra as Conscious Exotica" - Paper by Murray Shanahan (newer paper of the original focussed on LLMs)

    A recent paper by Anthropic on using autoencoders to find features in language models (referring to the "Scaling Monosemanticity" paper)

    Work by Peter Godfrey-Smith on octopus consciousness

    "Metaphors We Live By" - Book by George Lakoff (1980s)

    Work by Aaron Sloman on the concept of "space of possible minds" (1984 article mentioned)

    Wittgenstein's "Philosophical Investigations" (posthumously published)

    Daniel Dennett's work on the "intentional stance"

    Alan Turing's original paper on the Turing Test (1950)

    Thomas Nagel's paper "What is it like to be a bat?" (1974)

    John Searle's Chinese Room Argument (mentioned but not detailed)

    Work by Richard Evans on tackling reasoning problems

    Claude Shannon's quote on knowledge and control

    "Are We Bodies or Souls?" - Book by Richard Swinburne

    Reference to work by Ethan Perez and others at Anthropic on potential deceptive behavior in language models

    Reference to a paper by Murray Shanahan and Antonia Creswell on the "selection-inference framework"

    Mention of work by Francois Chollet, particularly the ARC (Abstraction and Reasoning Corpus) challenge

    Reference to Elizabeth Spelke's work on core knowledge in infants

    Mention of Karl Friston's work on planning as inference (active inference)

    The film "Ex Machina" - Murray Shanahan was the scientific advisor

    "The Waluigi Effect"

    Anthropic's constitutional AI approach

    Loom system by Laria Reynolds and Kyle McDonald for visualizing conversation trees

    DeepMind's AlphaGo (mentioned multiple times as an example)

    Mention of the "Golden Gate Claude" experiment

    Reference to an interview Tim Scarfe conducted with University of Toronto students about the self-attention controllability theorem

    Mention of an interview with Irina Rish

    Reference to an interview Tim Scarfe conducted with Daniel Dennett

    Reference to an interview with Maria Santacaterina

    Mention of an interview with Philip Goff

    Nick Chater and Morten Christiansen's book ("The Language Game: How Improvisation Created Language and Changed the World")

    Peter Singer's work from 1975 on ascribing moral status to conscious beings

    Demis Hassabis' discussion on the "ladder of creativity"

    Reference to B.F. Skinner and behaviorism

    • 2 hr 15 min

    David Chalmers - Reality+

    In the coming decades, the technology that enables virtual and augmented reality will improve beyond recognition. Within a century, world-renowned philosopher David J. Chalmers predicts, we will have virtual worlds that are impossible to distinguish from non-virtual worlds. But is virtual reality just escapism?



    In a highly original work of 'technophilosophy', Chalmers argues categorically, no: virtual reality is genuine reality. Virtual worlds are not second-class worlds. We can live a meaningful life in virtual reality - and increasingly, we will.



    What is reality, anyway? How can we lead a good life? Is there a god? How do we know there's an external world - and how do we know we're not living in a computer simulation? In Reality+, Chalmers conducts a grand tour of philosophy, using cutting-edge technology to provide invigorating new answers to age-old questions.



    David J. Chalmers is an Australian philosopher and cognitive scientist specializing in the areas of philosophy of mind and philosophy of language. He is Professor of Philosophy and Neural Science at New York University, as well as co-director of NYU's Center for Mind, Brain, and Consciousness. Chalmers is best known for his work on consciousness, including his formulation of the "hard problem of consciousness."



    Reality+: Virtual Worlds and the Problems of Philosophy

    https://amzn.to/3RYyGD2



    https://consc.net/

    https://x.com/davidchalmers42



    00:00:00 Reality+ Intro

    00:12:02 GPT conscious? 10/10

    00:14:19 The consciousness processor thought experiment (11/10)

    00:20:34 Intelligence and Consciousness entangled? 10/10

    00:22:44 Karl Friston / Meta Problem 10/10

    00:29:05 Knowledge argument / subjective experience (6/10)

    00:32:34 Emergence 11/10 (best chapter)

    00:42:45 Working with Douglas Hofstadter 10/10

    00:46:14 Intelligence is analogy making? 10/10

    00:50:47 Intelligence explosion 8/10

    00:58:44 Hypercomputation 10/10

    01:09:44 Who designed the designer? (7/10)

    01:13:57 Experience machine (7/10)

    • 1 hr 17 min

    Ryan Greenblatt - Solving ARC with GPT4o

    Ryan Greenblatt from Redwood Research recently published "Getting 50% (SoTA) on ARC-AGI with GPT-4o," where he used GPT-4o to reach state-of-the-art accuracy on the public evaluation set of Francois Chollet's ARC Challenge by generating many candidate Python programs.
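
    The core of the method is a sample-and-filter loop: prompt GPT-4o for a large number of candidate programs per task, execute each one against the task's training pairs, keep only the programs that reproduce every training output, and majority-vote the survivors' predictions on the test input. A stripped-down sketch of that loop (the ask_gpt4o and prompt_for helpers are hypothetical stand-ins, not Ryan's actual code):

        # Sample many candidate programs, keep those consistent with the
        # training examples, then majority-vote on the test input.
        from collections import Counter

        def solve_arc_task(task, n_samples=1000):
            train, test = task["train"], task["test"][0]
            survivors = []
            for _ in range(n_samples):
                # ask_gpt4o / prompt_for are hypothetical stand-ins; each sample
                # is Python source that defines transform(grid).
                source = ask_gpt4o(prompt_for(train))
                try:
                    scope = {}
                    exec(source, scope)  # a real pipeline would sandbox untrusted code
                    f = scope["transform"]
                    if all(f(ex["input"]) == ex["output"] for ex in train):
                        survivors.append(f)
                except Exception:
                    continue  # most samples fail the filter; that is expected
            votes, outputs = Counter(), {}
            for f in survivors:
                try:
                    out = f(test["input"])
                except Exception:
                    continue
                votes[str(out)] += 1
                outputs[str(out)] = out
            return outputs[votes.most_common(1)[0][0]] if votes else None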



    Sponsor:

    Sign up to Kalshi here https://kalshi.onelink.me/1r91/mlst -- the first 500 traders who deposit $100 will get a free $20 credit! Important disclaimer - In case it's not obvious - this is basically gambling and a *high risk* activity - only trade what you can afford to lose.



    We discuss:

    - Ryan's unique approach to solving the ARC Challenge and achieving impressive results.

    - The strengths and weaknesses of current AI models.

    - How AI and humans differ in learning and reasoning.

    - Combining various techniques to create smarter AI systems.

    - The potential risks and future advancements in AI, including the idea of agentic AI.



    https://x.com/RyanPGreenblatt

    https://www.redwoodresearch.org/





    Refs:

    Getting 50% (SoTA) on ARC-AGI with GPT-4o [Ryan Greenblatt]

    https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt



    On the Measure of Intelligence [Chollet]

    https://arxiv.org/abs/1911.01547



    Connectionism and Cognitive Architecture: A Critical Analysis [Jerry A. Fodor and Zenon W. Pylyshyn]

    https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/proseminars/Proseminar13/ConnectionistArchitecture.pdf



    Software 2.0 [Andrej Karpathy]

    https://karpathy.medium.com/software-2-0-a64152b37c35



    Why Greatness Cannot Be Planned: The Myth of the Objective [Kenneth Stanley]

    https://amzn.to/3Wfy2E0



    Biographical account of Terence Tao's mathematical development [M. A. (Ken) Clements]

    https://gwern.net/doc/iq/high/smpy/1984-clements.pdf



    Model Evaluation and Threat Research (METR)

    https://metr.org/



    Why Tool AIs Want to Be Agent AIs

    https://gwern.net/tool-ai



    Simulators - Janus

    https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators



    AI Control: Improving Safety Despite Intentional Subversion

    https://www.lesswrong.com/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion

    https://arxiv.org/abs/2312.06942



    What a Compute-Centric Framework Says About Takeoff Speeds

    https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/



    Global GDP over the long run

    https://ourworldindata.org/grapher/global-gdp-over-the-long-run?yScale=log



    Safety Cases: How to Justify the Safety of Advanced AI Systems

    https://arxiv.org/abs/2403.10462



    The Danger of a "Safety Case"

    http://sunnyday.mit.edu/The-Danger-of-a-Safety-Case.pdf



    The Future Of Work Looks Like A UPS Truck (~02:15:50)

    https://www.npr.org/sections/money/2014/05/02/308640135/episode-536-the-future-of-work-looks-like-a-ups-truck



    SWE-bench

    https://www.swebench.com/



    Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

    https://arxiv.org/pdf/2201.11990



    Algorithmic Progress in Language Models

    https://epochai.org/blog/algorithmic-progress-in-language-models

    • 2 hr 18 min

    Aidan Gomez - CEO of Cohere (AI's 'Inner Monologue' – Crucial for Reasoning)

    Aidan Gomez, CEO of Cohere, reveals how they're tackling AI hallucinations and improving reasoning abilities. He also explains why Cohere doesn't use any output from GPT-4 for training their models.
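
    The "inner monologue" of the episode title refers to chain-of-thought style generation: have the model write out intermediate reasoning before committing to an answer, and surface only the answer. A generic sketch of the pattern (the llm function is a hypothetical text-completion stand-in, not Cohere's API):

        def answer_with_monologue(question: str) -> str:
            # llm is a hypothetical text-completion function, not Cohere's API.
            prompt = (
                "Work through the problem step by step in a scratchpad, then "
                "give your final answer on a line starting with 'ANSWER:'.\n\n"
                f"Question: {question}\n"
            )
            completion = llm(prompt)
            # The monologue stays internal; only the conclusion reaches the user.
            return completion.split("ANSWER:", 1)[-1].strip()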



    Aidan shares his personal insights into the world of AI and LLMs, Cohere's unique approach to solving real-world business problems, and how their models are set apart from the competition. He reveals how Cohere is making major strides in AI technology, discussing everything from last-mile customer engineering to the robustness of prompts and future architectures.



    He also touches on the broader implications of AI for society, including potential risks and the role of regulation. He discusses Cohere's guiding principles and the health of the startup scene, with a particular focus on enterprise applications. Aidan provides a rare look into the internal workings of Cohere and their vision for driving productivity and innovation.



    https://cohere.com/

    https://x.com/aidangomez



    Check out Cohere's amazing new Command R* models here

    https://cohere.com/command



    Disclaimer: This is the second video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out of the interview.

    • 1 hr
    New "50%" ARC result and current winners interviewed

    New "50%" ARC result and current winners interviewed

    The ARC Challenge, created by Francois Chollet, tests how well AI systems can generalize from a few examples in a grid-based intelligence test. We interview the current winners of the ARC Challenge—Jack Cole, Mohamed Osman and their collaborator Michael Hodel. They discuss how they tackled ARC (Abstraction and Reasoning Corpus) using language models. We also discuss the new "50%" public set approach announced today by Redwood Research (Ryan Greenblatt).

    Jack and Mohamed explain their winning approach, which involves fine-tuning a language model on a large, specifically-generated dataset and then doing additional fine-tuning at test-time, a technique known in this context as "active inference" (sketched below). They use various strategies to represent the data for the language model and believe that with further improvements, the accuracy could reach above 50%. Michael talks about his work on generating new ARC-like tasks to help train the models.
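
    Mechanically, the test-time step looks like ordinary fine-tuning scoped to a single task: expand the task's few training pairs with rule-preserving augmentations, take a handful of gradient steps on them, then predict. A rough sketch of the control flow (fine_tune_step and generate are hypothetical stand-ins, not the team's code):

        # Rough control flow for test-time fine-tuning ("active inference").
        import copy

        def flip(grid):
            return [row[::-1] for row in grid]

        def augment(pairs):
            # Toy augmentation: add horizontal flips. Real pipelines apply many
            # more rule-preserving symmetries (rotations, colour permutations).
            return pairs + [{"input": flip(p["input"]), "output": flip(p["output"])}
                            for p in pairs]

        def predict_with_test_time_tuning(base_model, task, steps=50):
            model = copy.deepcopy(base_model)    # never mutate the shared base model
            examples = augment(task["train"])    # a few pairs become a tiny dataset
            for _ in range(steps):
                fine_tune_step(model, examples)  # hypothetical: one gradient step on serialized grids
            return generate(model, task["test"][0]["input"])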

    They also debate whether their methods stay true to the "spirit" of Chollet's measure of intelligence. Despite some concerns, they agree that their solutions are promising and adaptable for other similar problems.

    Note:
    Jack's team is still the current official winner at 33% on the private set. Ryan's entry is not on the private leaderboard, nor is it eligible.
    Chollet invented ARC in 2019 (not 2017 as stated)

    "Ryan's entry is not a new state of the art. We don't know exactly how well it does since it was only evaluated on 100 tasks from the evaluation set and does 50% on those, reportedly. Meanwhile Jacks team i.e. MindsAI's solution does 54% on the entire eval set and it is seemingly possible to do 60-70% with an ensemble"

    Jack Cole:
    https://x.com/Jcole75Cole
    https://lab42.global/community-interview-jack-cole/

    Mohamed Osman:
    Mohamed is looking to do a PhD in AI/ML; can you help him?
    Email: mothman198@outlook.com
    https://www.linkedin.com/in/mohamedosman1905/

    Michael Hodel:
    https://arxiv.org/pdf/2404.07353v1
    https://www.linkedin.com/in/michael-hodel/
    https://x.com/bayesilicon
    https://github.com/michaelhodel

    Getting 50% (SoTA) on ARC-AGI with GPT-4o - Ryan Greenblatt
    https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt

    Neural networks for abstraction and reasoning: Towards broad generalization in machines [Mikel Bober-Irizar, Soumya Banerjee]
    https://arxiv.org/pdf/2402.03507

    On the Measure of Intelligence [Chollet]:
    https://arxiv.org/abs/1911.01547

    YT version: https://youtu.be/jSAT_RuJ_Cg

    • 2 hr 14 min

Customer Reviews

4.8 out of 5
69 Ratings


diamond bishop,

Clear expert sharing with others

Worth listening to and learning. My only note is the Connor episodes can be skipped.

harryoekndn,

Super informative!

A podcast that has truly changed my life over the past three years. Phenomenal guests, impeccable ideas.

Usability guy,

Neel Nanda episode was fantastic

Adds to a strong catalog.


You Might Also Like

Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and al
Alessio + swyx
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Sam Charrington
Dwarkesh Podcast
Dwarkesh Patel
Practical AI: Machine Learning, Data Science, LLM
Changelog Media
No Priors: Artificial Intelligence | Technology | Startups
Conviction | Pod People
Last Week in AI
Skynet Today