Machine Learning Street Talk (MLST)
- Technology
Welcome! We engage in fascinating discussions with pre-eminent figures in the AI field. Our flagship show covers current affairs in AI, cognitive science, neuroscience and philosophy of mind with in-depth analysis. Our approach is unrivalled in terms of scope and rigour – we believe in intellectual diversity in AI, and we touch on all of the main ideas in the field with the hype surgically removed. MLST is run by Tim Scarfe, Ph.D (https://www.linkedin.com/in/ecsquizor/) and features regular appearances from Dr. Keith Duggar, who holds a Ph.D from MIT (https://www.linkedin.com/in/dr-keith-duggar/).
-
Sara Hooker - Why US AI Act Compute Thresholds Are Misguided
Sara Hooker is VP of Research at Cohere and leader of Cohere for AI. We discuss her recent paper critiquing the use of compute thresholds, measured in FLOPs (floating point operations), as an AI governance strategy.
We explore why this approach, recently adopted in both US and EU AI policies, may be problematic and oversimplified. Sara explains the limitations of using raw computational power as a measure of AI capability or risk, and discusses the complex relationship between compute, data, and model architecture.
Equally important, we go into Sara's work on "The AI Language Gap." This research highlights the challenges and inequalities in developing AI systems that work across multiple languages. Sara discusses how current AI models, predominantly trained on English and a handful of high-resource languages, fail to serve the linguistic diversity of our global population. We explore the technical, ethical, and societal implications of this gap, and discuss potential solutions for creating more inclusive and representative AI systems.
We broadly discuss the relationship between language, culture, and AI capabilities, as well as the ethical considerations in AI development and deployment.
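As a rough illustration of what a FLOP threshold actually measures, training compute is commonly estimated with the ~6·N·D rule of thumb (about 6 floating point operations per parameter per training token). The sketch below compares a hypothetical model (the parameter and token counts are made up for illustration) against the 10^26-operation threshold set in the 2023 US Executive Order:

```python
def training_flops(n_params, n_tokens):
    """Back-of-envelope estimate: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens

THRESHOLD = 1e26  # compute threshold in the 2023 US Executive Order

# Hypothetical model: 70 billion parameters trained on 15 trillion tokens
flops = training_flops(70e9, 15e12)
print(f"{flops:.2e}", flops > THRESHOLD)  # 6.30e+24 False
```

Note how coarse the proxy is: the estimate depends only on model size and token count, saying nothing about data quality, architecture, or post-training, which is part of Sara's critique.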
YT Version: https://youtu.be/dBZp47999Ko
TOC:
[00:00:00] Intro
[00:02:12] FLOPS paper
[00:26:42] Hardware lottery
[00:30:22] The Language gap
[00:33:25] Safety
[00:38:31] Emergent
[00:41:23] Creativity
[00:43:40] Long tail
[00:44:26] LLMs and society
[00:45:36] Model bias
[00:48:51] Language and capabilities
[00:52:27] Ethical frameworks and RLHF
Sara Hooker
https://www.sarahooker.me/
https://www.linkedin.com/in/sararosehooker/
https://scholar.google.com/citations?user=2xy6h3sAAAAJ&hl=en
https://x.com/sarahookr
Interviewer: Tim Scarfe
Refs
The AI Language gap
https://cohere.com/research/papers/the-AI-language-gap.pdf
On the Limitations of Compute Thresholds as a Governance Strategy.
https://arxiv.org/pdf/2407.05694v1
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
https://arxiv.org/pdf/2406.18682
Cohere Aya
https://cohere.com/research/aya
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
https://arxiv.org/pdf/2407.02552
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
https://arxiv.org/pdf/2402.14740
Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence
https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
EU AI Act
https://www.europarl.europa.eu/doceo/document/TA-9-2024-0138_EN.pdf
The bitter lesson
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Neel Nanda interview
https://www.youtube.com/watch?v=_Ygf0GnlwmY
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
https://transformer-circuits.pub/2024/scaling-monosemanticity/
Chollet's ARC challenge
https://github.com/fchollet/ARC-AGI
Ryan Greenblatt on ARC
https://www.youtube.com/watch?v=z9j3wB1RRGA
Disclaimer: This is the third video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview. -
Prof. Murray Shanahan - Machines Don't Think Like Us
Murray Shanahan is a professor of Cognitive Robotics at Imperial College London and a senior research scientist at DeepMind. He challenges our assumptions about AI consciousness and urges us to rethink how we talk about machine intelligence.
We explore the dangers of anthropomorphizing AI, the limitations of current language in describing AI capabilities, and the fascinating intersection of philosophy and artificial intelligence.
Show notes and full references: https://docs.google.com/document/d/1ICtBI574W-xGi8Z2ZtUNeKWiOiGZ_DRsp9EnyYAISws/edit?usp=sharing
Prof Murray Shanahan:
https://www.doc.ic.ac.uk/~mpsha/ (look at his selected publications)
https://scholar.google.co.uk/citations?user=00bnGpAAAAAJ&hl=en
https://en.wikipedia.org/wiki/Murray_Shanahan
https://x.com/mpshanahan
Interviewer: Dr. Tim Scarfe
Refs (links in the Google doc linked above):
Role play with large language models
Waluigi effect
"Conscious Exotica" - Paper by Murray Shanahan (2016)
"Simulators" - Article by Janus on LessWrong
"Embodiment and the Inner Life" - Book by Murray Shanahan (2010)
"The Technological Singularity" - Book by Murray Shanahan (2015)
"Simulacra as Conscious Exotica" - Paper by Murray Shanahan (newer paper of the original focussed on LLMs)
A recent paper by Anthropic on using autoencoders to find features in language models (referring to the "Scaling Monosemanticity" paper)
Work by Peter Godfrey-Smith on octopus consciousness
"Metaphors We Live By" - Book by George Lakoff (1980s)
Work by Aaron Sloman on the concept of "space of possible minds" (1984 article mentioned)
Wittgenstein's "Philosophical Investigations" (posthumously published)
Daniel Dennett's work on the "intentional stance"
Alan Turing's original paper on the Turing Test (1950)
Thomas Nagel's paper "What is it like to be a bat?" (1974)
John Searle's Chinese Room Argument (mentioned but not detailed)
Work by Richard Evans on tackling reasoning problems
Claude Shannon's quote on knowledge and control
"Are We Bodies or Souls?" - Book by Richard Swinburne
Reference to work by Ethan Perez and others at Anthropic on potential deceptive behavior in language models
Reference to a paper by Murray Shanahan and Antonia Creswell on the "selection inference framework"
Mention of work by Francois Chollet, particularly the ARC (Abstraction and Reasoning Corpus) challenge
Reference to Elizabeth Spelke's work on core knowledge in infants
Mention of Karl Friston's work on planning as inference (active inference)
The film "Ex Machina" - Murray Shanahan was the scientific advisor
"The Waluigi Effect"
Anthropic's constitutional AI approach
Loom system by Laria Reynolds and Kyle McDonald for visualizing conversation trees
DeepMind's AlphaGo (mentioned multiple times as an example)
Mention of the "Golden Gate Claude" experiment
Reference to an interview Tim Scarfe conducted with University of Toronto students about self-attention controllability theorem
Mention of an interview with Irina Rish
Reference to an interview Tim Scarfe conducted with Daniel Dennett
Reference to an interview with Maria Santacaterina
Mention of an interview with Philip Goff
Nick Chater and Morten Christiansen's book ("The Language Game: How Improvisation Created Language and Changed the World")
Peter Singer's work from 1975 on ascribing moral status to conscious beings
Demis Hassabis' discussion on the "ladder of creativity"
Reference to B.F. Skinner and behaviorism -
David Chalmers - Reality+
In the coming decades, the technology that enables virtual and augmented reality will improve beyond recognition. Within a century, world-renowned philosopher David J. Chalmers predicts, we will have virtual worlds that are impossible to distinguish from non-virtual worlds. But is virtual reality just escapism?
In a highly original work of 'technophilosophy', Chalmers argues categorically, no: virtual reality is genuine reality. Virtual worlds are not second-class worlds. We can live a meaningful life in virtual reality - and increasingly, we will.
What is reality, anyway? How can we lead a good life? Is there a god? How do we know there's an external world - and how do we know we're not living in a computer simulation? In Reality+, Chalmers conducts a grand tour of philosophy, using cutting-edge technology to provide invigorating new answers to age-old questions.
David J. Chalmers is an Australian philosopher and cognitive scientist specializing in the areas of philosophy of mind and philosophy of language. He is Professor of Philosophy and Neural Science at New York University, as well as co-director of NYU's Center for Mind, Brain, and Consciousness. Chalmers is best known for his work on consciousness, including his formulation of the "hard problem of consciousness."
Reality+: Virtual Worlds and the Problems of Philosophy
https://amzn.to/3RYyGD2
https://consc.net/
https://x.com/davidchalmers42
00:00:00 Reality+ Intro
00:12:02 GPT conscious? 10/10
00:14:19 The consciousness processor thought experiment (11/10)
00:20:34 Intelligence and Consciousness entangled? 10/10
00:22:44 Karl Friston / Meta Problem 10/10
00:29:05 Knowledge argument / subjective experience (6/10)
00:32:34 Emergence 11/10 (best chapter)
00:42:45 Working with Douglas Hofstadter 10/10
00:46:14 Intelligence is analogy making? 10/10
00:50:47 Intelligence explosion 8/10
00:58:44 Hypercomputation 10/10
01:09:44 Who designed the designer? (7/10)
01:13:57 Experience machine (7/10) -
Ryan Greenblatt - Solving ARC with GPT-4o
Ryan Greenblatt from Redwood Research recently published "Getting 50% on ARC-AGI with GPT-4o," in which he used GPT-4o to reach 50% accuracy on the public evaluation set of Francois Chollet's ARC Challenge by generating many Python programs.
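The core idea of the generate-many-programs approach can be caricatured in a few lines: sample many candidate Python programs, execute each on the task's demonstration pairs, and keep only those that reproduce every output. In this minimal sketch the LLM call is stubbed out with a fixed candidate pool (`candidates`) over toy 1-D lists, an assumption purely for illustration; the real pipeline samples thousands of programs from GPT-4o and operates on 2-D grids:

```python
# Sketch of generate-and-verify program search for ARC-style tasks.
# The "LLM" is stubbed: `candidates` stands in for programs sampled from GPT-4o.

train_pairs = [([1, 2, 3], [3, 2, 1]), ([4, 5], [5, 4])]  # toy (input, output) demos

candidates = [
    lambda g: g,                   # identity: fails the demos
    lambda g: g[::-1],             # reversal: matches every demo
    lambda g: [x + 1 for x in g],  # increment: fails the demos
]

def passes_all(program, pairs):
    """A candidate survives only if it reproduces every demonstration output."""
    try:
        return all(program(inp) == out for inp, out in pairs)
    except Exception:
        return False  # sampled programs may crash; count that as failure

survivors = [p for p in candidates if passes_all(p, train_pairs)]
# Apply a surviving program to the held-out test input.
prediction = survivors[0]([7, 8, 9]) if survivors else None
print(prediction)  # [9, 8, 7]
```

The verification step is what makes massive sampling viable: correctness on the demonstrations filters the candidate pool down to a handful of plausible programs.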
Sponsor:
Sign up to Kalshi here https://kalshi.onelink.me/1r91/mlst -- the first 500 traders who deposit $100 will get a free $20 credit! Important disclaimer - In case it's not obvious - this is basically gambling and a *high risk* activity - only trade what you can afford to lose.
We discuss:
- Ryan's unique approach to solving the ARC Challenge and achieving impressive results.
- The strengths and weaknesses of current AI models.
- How AI and humans differ in learning and reasoning.
- Combining various techniques to create smarter AI systems.
- The potential risks and future advancements in AI, including the idea of agentic AI.
https://x.com/RyanPGreenblatt
https://www.redwoodresearch.org/
Refs:
Getting 50% (SoTA) on ARC-AGI with GPT-4o [Ryan Greenblatt]
https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
On the Measure of Intelligence [Chollet]
https://arxiv.org/abs/1911.01547
Connectionism and Cognitive Architecture: A Critical Analysis [Jerry A. Fodor and Zenon W. Pylyshyn]
https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/proseminars/Proseminar13/ConnectionistArchitecture.pdf
Software 2.0 [Andrej Karpathy]
https://karpathy.medium.com/software-2-0-a64152b37c35
Why Greatness Cannot Be Planned: The Myth of the Objective [Kenneth Stanley]
https://amzn.to/3Wfy2E0
Biographical account of Terence Tao's mathematical development [M. A. (Ken) Clements]
https://gwern.net/doc/iq/high/smpy/1984-clements.pdf
Model Evaluation and Threat Research (METR)
https://metr.org/
Why Tool AIs Want to Be Agent AIs
https://gwern.net/tool-ai
Simulators - Janus
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
AI Control: Improving Safety Despite Intentional Subversion
https://www.lesswrong.com/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion
https://arxiv.org/abs/2312.06942
What a Compute-Centric Framework Says About Takeoff Speeds
https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/
Global GDP over the long run
https://ourworldindata.org/grapher/global-gdp-over-the-long-run?yScale=log
Safety Cases: How to Justify the Safety of Advanced AI Systems
https://arxiv.org/abs/2403.10462
The Danger of a “Safety Case"
http://sunnyday.mit.edu/The-Danger-of-a-Safety-Case.pdf
The Future Of Work Looks Like A UPS Truck (~02:15:50)
https://www.npr.org/sections/money/2014/05/02/308640135/episode-536-the-future-of-work-looks-like-a-ups-truck
SWE-bench
https://www.swebench.com/
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
https://arxiv.org/pdf/2201.11990
Algorithmic Progress in Language Models
https://epochai.org/blog/algorithmic-progress-in-language-models -
Aidan Gomez - CEO of Cohere (AI's 'Inner Monologue' – Crucial for Reasoning)
Aidan Gomez, CEO of Cohere, reveals how they're tackling AI hallucinations and improving reasoning abilities. He also explains why Cohere doesn't use any output from GPT-4 for training their models.
Aidan shares his personal insights into the world of AI and LLMs and Cohere's unique approach to solving real-world business problems, and how their models are set apart from the competition. Aidan reveals how they are making major strides in AI technology, discussing everything from last mile customer engineering to the robustness of prompts and future architectures.
He also touches on the broader implications of AI for society, including potential risks and the role of regulation. He discusses Cohere's guiding principles and the health of the startup scene, with a particular focus on enterprise applications. Aidan provides a rare look into the internal workings of Cohere and their vision for driving productivity and innovation.
https://cohere.com/
https://x.com/aidangomez
Check out Cohere's amazing new Command R* models here
https://cohere.com/command
Disclaimer: This is the second video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview. -
New "50%" ARC result and current winners interviewed
The ARC Challenge, created by Francois Chollet, tests how well AI systems can generalize from a few examples in a grid-based intelligence test. We interview the current winners of the ARC Challenge—Jack Cole, Mohamed Osman and their collaborator Michael Hodel. They discuss how they tackled ARC (Abstraction and Reasoning Corpus) using language models. We also discuss the new "50%" public set approach announced today from Redwood Research (Ryan Greenblatt).
Jack and Mohammed explain their winning approach, which involves fine-tuning a language model on a large, specifically-generated dataset and then doing additional fine-tuning at test-time, a technique known in this context as "active inference". They use various strategies to represent the data for the language model and believe that with further improvements, the accuracy could reach above 50%. Michael talks about his work on generating new ARC-like tasks to help train the models.
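One of the "strategies to represent the data" amounts to flattening each 2-D ARC grid into a compact, losslessly invertible text string a language model can consume. A minimal sketch follows, with the row separator and digit encoding chosen purely for illustration (the team's actual tokenization scheme is not described here):

```python
def grid_to_text(grid):
    """Serialize a 2-D ARC grid (lists of ints 0-9) into one line of text.
    Rows are joined with '|' and cells concatenated as digits -- an
    illustrative encoding, not necessarily the one used by the winning entry."""
    return "|".join("".join(str(cell) for cell in row) for row in grid)

def text_to_grid(text):
    """Invert grid_to_text, recovering the original grid."""
    return [[int(ch) for ch in row] for row in text.split("|")]

task = [[0, 1, 2], [3, 4, 5]]
encoded = grid_to_text(task)
print(encoded)  # 012|345
assert text_to_grid(encoded) == task  # the encoding round-trips
```

Because the encoding round-trips exactly, the model's text output can be decoded straight back into a candidate grid and scored against the expected answer.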
They also debate whether their methods stay true to the "spirit" of Chollet's measure of intelligence. Despite some concerns, they agree that their solutions are promising and adaptable for other similar problems.
Note:
Jack's team is still the current official winner at 33% on the private set. Ryan's entry is not on the private leaderboard or eligible.
Chollet invented ARC in 2019 (not 2017 as stated)
"Ryan's entry is not a new state of the art. We don't know exactly how well it does since it was only evaluated on 100 tasks from the evaluation set and does 50% on those, reportedly. Meanwhile Jack's team's (i.e. MindsAI's) solution does 54% on the entire eval set, and it is seemingly possible to do 60-70% with an ensemble."
Jack Cole:
https://x.com/Jcole75Cole
https://lab42.global/community-interview-jack-cole/
Mohamed Osman:
Mohamed is looking to do a PhD in AI/ML, can you help him?
Email: mothman198@outlook.com
https://www.linkedin.com/in/mohamedosman1905/
Michael Hodel:
https://arxiv.org/pdf/2404.07353v1
https://www.linkedin.com/in/michael-hodel/
https://x.com/bayesilicon
https://github.com/michaelhodel
Getting 50% (SoTA) on ARC-AGI with GPT-4o - Ryan Greenblatt
https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
Neural networks for abstraction and reasoning: Towards broad generalization in machines [Mikel Bober-Irizar, Soumya Banerjee]
https://arxiv.org/pdf/2402.03507
Measure of intelligence:
https://arxiv.org/abs/1911.01547
YT version: https://youtu.be/jSAT_RuJ_Cg