Machine Learning Street Talk (MLST)
- Technology
Welcome! We engage in fascinating discussions with pre-eminent figures in the AI field. Our flagship show covers current affairs in AI, cognitive science, neuroscience and philosophy of mind with in-depth analysis. Our approach is unrivalled in terms of scope and rigour – we believe in intellectual diversity in AI, and we touch on all of the main ideas in the field with the hype surgically removed. MLST is run by Tim Scarfe, Ph.D (https://www.linkedin.com/in/ecsquizor/) and features regular appearances from Dr. Keith Duggar, who holds a Ph.D from MIT (https://www.linkedin.com/in/dr-keith-duggar/).
-
Sara Hooker - Why US AI Act Compute Thresholds Are Misguided
Sara Hooker is VP of Research at Cohere and leader of Cohere for AI. We discuss her recent paper critiquing the use of compute thresholds, measured in FLOPs (floating point operations), as an AI governance strategy.
We explore why this approach, recently adopted in both US and EU AI policies, may be problematic and oversimplified. Sara explains the limitations of using raw computational power as a measure of AI capability or risk, and discusses the complex relationship between compute, data, and model architecture.
Equally important, we go into Sara's work on "The AI Language Gap." This research highlights the challenges and inequalities in developing AI systems that work across multiple languages. Sara discusses how current AI models, predominantly trained on English and a handful of high-resource languages, fail to serve the linguistic diversity of our global population. We explore the technical, ethical, and societal implications of this gap, and discuss potential solutions for creating more inclusive and representative AI systems.
We broadly discuss the relationship between language, culture, and AI capabilities, as well as the ethical considerations in AI development and deployment.
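As a rough illustration of what a FLOP threshold actually measures, training compute is commonly estimated with the ~6·N·D rule of thumb (about 6 floating point operations per parameter per training token). The sketch below compares a hypothetical model (the parameter and token counts are made up for illustration) against the 10^26-operation threshold set in the 2023 US Executive Order:

```python
def training_flops(n_params, n_tokens):
    """Back-of-envelope estimate: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens

THRESHOLD = 1e26  # compute threshold in the 2023 US Executive Order

# Hypothetical model: 70 billion parameters trained on 15 trillion tokens
flops = training_flops(70e9, 15e12)
print(f"{flops:.2e}", flops > THRESHOLD)  # 6.30e+24 False
```

Note how coarse the proxy is: the estimate depends only on model size and token count, saying nothing about data quality, architecture, or post-training, which is part of Sara's critique.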
YT Version: https://youtu.be/dBZp47999Ko
TOC:
[00:00:00] Intro
[00:02:12] FLOPS paper
[00:26:42] Hardware lottery
[00:30:22] The Language gap
[00:33:25] Safety
[00:38:31] Emergent
[00:41:23] Creativity
[00:43:40] Long tail
[00:44:26] LLMs and society
[00:45:36] Model bias
[00:48:51] Language and capabilities
[00:52:27] Ethical frameworks and RLHF
Sara Hooker
https://www.sarahooker.me/
https://www.linkedin.com/in/sararosehooker/
https://scholar.google.com/citations?user=2xy6h3sAAAAJ&hl=en
https://x.com/sarahookr
Interviewer: Tim Scarfe
Refs
The AI Language gap
https://cohere.com/research/papers/the-AI-language-gap.pdf
On the Limitations of Compute Thresholds as a Governance Strategy.
https://arxiv.org/pdf/2407.05694v1
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
https://arxiv.org/pdf/2406.18682
Cohere Aya
https://cohere.com/research/aya
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
https://arxiv.org/pdf/2407.02552
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
https://arxiv.org/pdf/2402.14740
Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence
https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
EU AI Act
https://www.europarl.europa.eu/doceo/document/TA-9-2024-0138_EN.pdf
The bitter lesson
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Neel Nanda interview
https://www.youtube.com/watch?v=_Ygf0GnlwmY
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
https://transformer-circuits.pub/2024/scaling-monosemanticity/
Chollet's ARC challenge
https://github.com/fchollet/ARC-AGI
Ryan Greenblatt on ARC
https://www.youtube.com/watch?v=z9j3wB1RRGA
Disclaimer: This is the third video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview. -
Prof. Murray Shanahan - Machines Don't Think Like Us
Murray Shanahan is a professor of Cognitive Robotics at Imperial College London and a senior research scientist at DeepMind. He challenges our assumptions about AI consciousness and urges us to rethink how we talk about machine intelligence.
We explore the dangers of anthropomorphizing AI, the limitations of current language in describing AI capabilities, and the fascinating intersection of philosophy and artificial intelligence.
Show notes and full references: https://docs.google.com/document/d/1ICtBI574W-xGi8Z2ZtUNeKWiOiGZ_DRsp9EnyYAISws/edit?usp=sharing
Prof Murray Shanahan:
https://www.doc.ic.ac.uk/~mpsha/ (look at his selected publications)
https://scholar.google.co.uk/citations?user=00bnGpAAAAAJ&hl=en
https://en.wikipedia.org/wiki/Murray_Shanahan
https://x.com/mpshanahan
Interviewer: Dr. Tim Scarfe
Refs (links in the Google doc linked above):
Role play with large language models
Waluigi effect
"Conscious Exotica" - Paper by Murray Shanahan (2016)
"Simulators" - Article by Janus on LessWrong
"Embodiment and the Inner Life" - Book by Murray Shanahan (2010)
"The Technological Singularity" - Book by Murray Shanahan (2015)
"Simulacra as Conscious Exotica" - Paper by Murray Shanahan (newer paper of the original focussed on LLMs)
A recent paper by Anthropic on using autoencoders to find features in language models (referring to the "Scaling Monosemanticity" paper)
Work by Peter Godfrey-Smith on octopus consciousness
"Metaphors We Live By" - Book by George Lakoff (1980s)
Work by Aaron Sloman on the concept of "space of possible minds" (1984 article mentioned)
Wittgenstein's "Philosophical Investigations" (posthumously published)
Daniel Dennett's work on the "intentional stance"
Alan Turing's original paper on the Turing Test (1950)
Thomas Nagel's paper "What is it like to be a bat?" (1974)
John Searle's Chinese Room Argument (mentioned but not detailed)
Work by Richard Evans on tackling reasoning problems
Claude Shannon's quote on knowledge and control
"Are We Bodies or Souls?" - Book by Richard Swinburne
Reference to work by Ethan Perez and others at Anthropic on potential deceptive behavior in language models
Reference to a paper by Murray Shanahan and Antonia Creswell on the "selection inference framework"
Mention of work by Francois Chollet, particularly the ARC (Abstraction and Reasoning Corpus) challenge
Reference to Elizabeth Spelke's work on core knowledge in infants
Mention of Karl Friston's work on planning as inference (active inference)
The film "Ex Machina" - Murray Shanahan was the scientific advisor
"The Waluigi Effect"
Anthropic's constitutional AI approach
Loom system by Laria Reynolds and Kyle McDonald for visualizing conversation trees
DeepMind's AlphaGo (mentioned multiple times as an example)
Mention of the "Golden Gate Claude" experiment
Reference to an interview Tim Scarfe conducted with University of Toronto students about self-attention controllability theorem
Mention of an interview with Irina Rish
Reference to an interview Tim Scarfe conducted with Daniel Dennett
Reference to an interview with Maria Santacaterina
Mention of an interview with Philip Goff
Nick Chater and Morten Christiansen's book ("The Language Game: How Improvisation Created Language and Changed the World")
Peter Singer's work from 1975 on ascribing moral status to conscious beings
Demis Hassabis' discussion on the "ladder of creativity"
Reference to B.F. Skinner and behaviorism -
David Chalmers - Reality+
In the coming decades, the technology that enables virtual and augmented reality will improve beyond recognition. Within a century, world-renowned philosopher David J. Chalmers predicts, we will have virtual worlds that are impossible to distinguish from non-virtual worlds. But is virtual reality just escapism?
In a highly original work of 'technophilosophy', Chalmers argues categorically, no: virtual reality is genuine reality. Virtual worlds are not second-class worlds. We can live a meaningful life in virtual reality - and increasingly, we will.
What is reality, anyway? How can we lead a good life? Is there a god? How do we know there's an external world - and how do we know we're not living in a computer simulation? In Reality+, Chalmers conducts a grand tour of philosophy, using cutting-edge technology to provide invigorating new answers to age-old questions.
David J. Chalmers is an Australian philosopher and cognitive scientist specializing in the areas of philosophy of mind and philosophy of language. He is Professor of Philosophy and Neural Science at New York University, as well as co-director of NYU's Center for Mind, Brain, and Consciousness. Chalmers is best known for his work on consciousness, including his formulation of the "hard problem of consciousness."
Reality+: Virtual Worlds and the Problems of Philosophy
https://amzn.to/3RYyGD2
https://consc.net/
https://x.com/davidchalmers42
00:00:00 Reality+ Intro
00:12:02 GPT conscious? 10/10
00:14:19 The consciousness processor thought experiment (11/10)
00:20:34 Intelligence and Consciousness entangled? 10/10
00:22:44 Karl Friston / Meta Problem 10/10
00:29:05 Knowledge argument / subjective experience (6/10)
00:32:34 Emergence 11/10 (best chapter)
00:42:45 Working with Douglas Hofstadter 10/10
00:46:14 Intelligence is analogy making? 10/10
00:50:47 Intelligence explosion 8/10
00:58:44 Hypercomputation 10/10
01:09:44 Who designed the designer? (7/10)
01:13:57 Experience machine (7/10) -
Ryan Greenblatt - Solving ARC with GPT-4o
Ryan Greenblatt from Redwood Research recently published "Getting 50% on ARC-AGI with GPT-4o," in which he used GPT-4o to reach 50% accuracy on the public evaluation set of Francois Chollet's ARC Challenge by generating many Python programs.
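The core idea of the generate-many-programs approach can be caricatured in a few lines: sample many candidate Python programs, execute each on the task's demonstration pairs, and keep only those that reproduce every output. In this minimal sketch the LLM call is stubbed out with a fixed candidate pool (`candidates`) over toy 1-D lists, an assumption purely for illustration; the real pipeline samples thousands of programs from GPT-4o and operates on 2-D grids:

```python
# Sketch of generate-and-verify program search for ARC-style tasks.
# The "LLM" is stubbed: `candidates` stands in for programs sampled from GPT-4o.

train_pairs = [([1, 2, 3], [3, 2, 1]), ([4, 5], [5, 4])]  # toy (input, output) demos

candidates = [
    lambda g: g,                   # identity: fails the demos
    lambda g: g[::-1],             # reversal: matches every demo
    lambda g: [x + 1 for x in g],  # increment: fails the demos
]

def passes_all(program, pairs):
    """A candidate survives only if it reproduces every demonstration output."""
    try:
        return all(program(inp) == out for inp, out in pairs)
    except Exception:
        return False  # sampled programs may crash; count that as failure

survivors = [p for p in candidates if passes_all(p, train_pairs)]
# Apply a surviving program to the held-out test input.
prediction = survivors[0]([7, 8, 9]) if survivors else None
print(prediction)  # [9, 8, 7]
```

The verification step is what makes massive sampling viable: correctness on the demonstrations filters the candidate pool down to a handful of plausible programs.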
Sponsor:
Sign up to Kalshi here https://kalshi.onelink.me/1r91/mlst -- the first 500 traders who deposit $100 will get a free $20 credit! Important disclaimer - In case it's not obvious - this is basically gambling and a *high risk* activity - only trade what you can afford to lose.
We discuss:
- Ryan's unique approach to solving the ARC Challenge and achieving impressive results.
- The strengths and weaknesses of current AI models.
- How AI and humans differ in learning and reasoning.
- Combining various techniques to create smarter AI systems.
- The potential risks and future advancements in AI, including the idea of agentic AI.
https://x.com/RyanPGreenblatt
https://www.redwoodresearch.org/
Refs:
Getting 50% (SoTA) on ARC-AGI with GPT-4o [Ryan Greenblatt]
https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
On the Measure of Intelligence [Chollet]
https://arxiv.org/abs/1911.01547
Connectionism and Cognitive Architecture: A Critical Analysis [Jerry A. Fodor and Zenon W. Pylyshyn]
https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/proseminars/Proseminar13/ConnectionistArchitecture.pdf
Software 2.0 [Andrej Karpathy]
https://karpathy.medium.com/software-2-0-a64152b37c35
Why Greatness Cannot Be Planned: The Myth of the Objective [Kenneth Stanley]
https://amzn.to/3Wfy2E0
Biographical account of Terence Tao's mathematical development [M. A. (Ken) Clements]
https://gwern.net/doc/iq/high/smpy/1984-clements.pdf
Model Evaluation and Threat Research (METR)
https://metr.org/
Why Tool AIs Want to Be Agent AIs
https://gwern.net/tool-ai
Simulators - Janus
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
AI Control: Improving Safety Despite Intentional Subversion
https://www.lesswrong.com/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion
https://arxiv.org/abs/2312.06942
What a Compute-Centric Framework Says About Takeoff Speeds
https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/
Global GDP over the long run
https://ourworldindata.org/grapher/global-gdp-over-the-long-run?yScale=log
Safety Cases: How to Justify the Safety of Advanced AI Systems
https://arxiv.org/abs/2403.10462
The Danger of a “Safety Case"
http://sunnyday.mit.edu/The-Danger-of-a-Safety-Case.pdf
The Future Of Work Looks Like A UPS Truck (~02:15:50)
https://www.npr.org/sections/money/2014/05/02/308640135/episode-536-the-future-of-work-looks-like-a-ups-truck
SWE-bench
https://www.swebench.com/
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
https://arxiv.org/pdf/2201.11990
Algorithmic Progress in Language Models
https://epochai.org/blog/algorithmic-progress-in-language-models -
Aidan Gomez - CEO of Cohere (AI's 'Inner Monologue' – Crucial for Reasoning)
Aidan Gomez, CEO of Cohere, reveals how they're tackling AI hallucinations and improving reasoning abilities. He also explains why Cohere doesn't use any output from GPT-4 for training their models.
Aidan shares his personal insights into the world of AI and LLMs and Cohere's unique approach to solving real-world business problems, and how their models are set apart from the competition. Aidan reveals how they are making major strides in AI technology, discussing everything from last mile customer engineering to the robustness of prompts and future architectures.
He also touches on the broader implications of AI for society, including potential risks and the role of regulation. He discusses Cohere's guiding principles and the health of the startup scene, with a particular focus on enterprise applications. Aidan provides a rare look into the internal workings of Cohere and their vision for driving productivity and innovation.
https://cohere.com/
https://x.com/aidangomez
Check out Cohere's amazing new Command R* models here
https://cohere.com/command
Disclaimer: This is the second video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview. -
New "50%" ARC result and current winners interviewed
The ARC Challenge, created by Francois Chollet, tests how well AI systems can generalize from a few examples in a grid-based intelligence test. We interview the current winners of the ARC Challenge—Jack Cole, Mohamed Osman and their collaborator Michael Hodel. They discuss how they tackled ARC (Abstraction and Reasoning Corpus) using language models. We also discuss the new "50%" public set approach announced today from Redwood Research (Ryan Greenblatt).
Jack and Mohammed explain their winning approach, which involves fine-tuning a language model on a large, specifically-generated dataset and then doing additional fine-tuning at test-time, a technique known in this context as "active inference". They use various strategies to represent the data for the language model and believe that with further improvements, the accuracy could reach above 50%. Michael talks about his work on generating new ARC-like tasks to help train the models.
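One of the "strategies to represent the data" amounts to flattening each 2-D ARC grid into a compact, losslessly invertible text string a language model can consume. A minimal sketch follows, with the row separator and digit encoding chosen purely for illustration (the team's actual tokenization scheme is not described here):

```python
def grid_to_text(grid):
    """Serialize a 2-D ARC grid (lists of ints 0-9) into one line of text.
    Rows are joined with '|' and cells concatenated as digits -- an
    illustrative encoding, not necessarily the one used by the winning entry."""
    return "|".join("".join(str(cell) for cell in row) for row in grid)

def text_to_grid(text):
    """Invert grid_to_text, recovering the original grid."""
    return [[int(ch) for ch in row] for row in text.split("|")]

task = [[0, 1, 2], [3, 4, 5]]
encoded = grid_to_text(task)
print(encoded)  # 012|345
assert text_to_grid(encoded) == task  # the encoding round-trips
```

Because the encoding round-trips exactly, the model's text output can be decoded straight back into a candidate grid and scored against the expected answer.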
They also debate whether their methods stay true to the "spirit" of Chollet's measure of intelligence. Despite some concerns, they agree that their solutions are promising and adaptable for other similar problems.
Note:
Jack's team is still the current official winner at 33% on the private set. Ryan's entry is not on the private leaderboard or eligible.
Chollet invented ARC in 2019 (not 2017 as stated)
"Ryan's entry is not a new state of the art. We don't know exactly how well it does since it was only evaluated on 100 tasks from the evaluation set and does 50% on those, reportedly. Meanwhile Jack's team's (i.e. MindsAI's) solution does 54% on the entire eval set, and it is seemingly possible to do 60-70% with an ensemble."
Jack Cole:
https://x.com/Jcole75Cole
https://lab42.global/community-interview-jack-cole/
Mohamed Osman:
Mohamed is looking to do a PhD in AI/ML, can you help him?
Email: mothman198@outlook.com
https://www.linkedin.com/in/mohamedosman1905/
Michael Hodel:
https://arxiv.org/pdf/2404.07353v1
https://www.linkedin.com/in/michael-hodel/
https://x.com/bayesilicon
https://github.com/michaelhodel
Getting 50% (SoTA) on ARC-AGI with GPT-4o - Ryan Greenblatt
https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
Neural networks for abstraction and reasoning: Towards broad generalization in machines [Mikel Bober-Irizar, Soumya Banerjee]
https://arxiv.org/pdf/2402.03507
Measure of intelligence:
https://arxiv.org/abs/1911.01547
YT version: https://youtu.be/jSAT_RuJ_Cg