46 episodes

About AI progress. You can watch the video recordings and check out the transcripts at theinsideview.ai

The Inside View, by Michaël Trazzi

    • Technology

    Kellin Pelrine on beating the strongest Go AI

    Youtube: https://youtu.be/_ANvfMblakQ

    Part 1 (about the paper): https://youtu.be/Tip1Ztjd-so

    Paper: https://arxiv.org/pdf/2211.00241

    Patreon: https://www.patreon.com/theinsideview

    • 18 min
    Paul Christiano's views on "doom" (ft. Robert Miles)

    Youtube: https://youtu.be/JXYcLQItZsk

    Paul Christiano's post: https://www.lesswrong.com/posts/xWMqsvHapP3nwdSW8/my-views-on-doom

    • 4 min
    Neel Nanda on mechanistic interpretability, superposition and grokking

    Neel Nanda is a researcher at Google DeepMind working on mechanistic interpretability. He is also known for his YouTube channel, where he explains to a large audience what is going on inside neural networks.

    In this conversation, we discuss what mechanistic interpretability is, how Neel got into it, his research methodology, and his advice for people who want to get started, as well as papers on superposition, toy models of universality, and grokking, among other things.

    Youtube: https://youtu.be/cVBGjhN4-1g

    Transcript: https://theinsideview.ai/neel


    (00:00) Intro

    (00:57) Why Neel Started Doing Walkthroughs Of Papers On Youtube

    (07:59) Induction Heads, Or Why Nanda Comes After Neel

    (12:19) Detecting Induction Heads In Basically Every Model

    (14:35) How Neel Got Into Mechanistic Interpretability

    (16:22) Neel's Journey Into Alignment

    (22:09) Enjoying Mechanistic Interpretability And Being Good At It Are The Main Multipliers

    (24:49) How Is AI Alignment Work At DeepMind?

    (25:46) Scalable Oversight

    (28:30) Most Ambitious Degree Of Interpretability With Current Transformer Architectures

    (31:05) To Understand Neel's Methodology, Watch The Research Walkthroughs

    (32:23) Three Modes Of Research: Confirming, Red Teaming And Gaining Surface Area

    (34:58) You Can Be Both Hypothesis Driven And Capable Of Being Surprised

    (36:51) You Need To Be Able To Generate Multiple Hypotheses Before Getting Started

    (37:55) All the theory is b******t without empirical evidence and it's overall dignified to make the mechanistic interpretability bet

    (40:11) Mechanistic interpretability is alien neuroscience for truth seeking biologists in a world of math

    (42:12) Actually, Othello-GPT Has A Linear Emergent World Representation

    (45:08) You Need To Use Simple Probes That Don't Do Any Computation To Prove The Model Actually Knows Something

    (47:29) The Mechanistic Interpretability Researcher Mindset

    (49:49) The Algorithms Learned By Models Might Or Might Not Be Universal

    (51:49) On The Importance Of Being Truth Seeking And Skeptical

    (54:18) The Linear Representation Hypothesis: Linear Representations Are The Right Abstractions

    (00:57:26) Superposition Is How Models Compress Information

    (01:00:15) The Polysemanticity Problem: Neurons Are Not Meaningful

    (01:05:42) Superposition and Interference are at the Frontier of the Field of Mechanistic Interpretability

    (01:07:33) Finding Neurons in a Haystack: Superposition Through De-Tokenization And Compound Word Detectors

    (01:09:03) Not Being Able to Be Both Blood Pressure and Social Security Number at the Same Time Is Prime Real Estate for Superposition

    (01:15:02) The Two Differences Of Superposition: Computational And Representational

    (01:18:07) Toy Models Of Superposition

    (01:25:39) How Mentoring Nine People at Once Through SERI MATS Helped Neel's Research

    (01:31:25) The Backstory Behind Toy Models of Universality

    (01:35:19) From Modular Addition To Permutation Groups

    (01:38:52) The Model Needs To Learn Modular Addition On A Finite Number Of Token Inputs

    (01:41:54) Why Is The Paper Called Toy Model Of Universality

    (01:46:16) Progress Measures For Grokking Via Mechanistic Interpretability, Circuit Formation

    (01:52:45) Getting Started In Mechanistic Interpretability And Which Walkthroughs To Start With

    (01:56:15) Why Does Mechanistic Interpretability Matter From an Alignment Perspective

    (01:58:41) How Detecting Deception With Mechanistic Interpretability Compares to Collin Burns' Work

    (02:01:20) Final Words From Neel

    • 2 hr 4 min
    Joscha Bach on how to stop worrying and love AI

    Joscha Bach (who describes himself as an AI researcher/cognitive scientist) has recently been debating existential risk from AI with Connor Leahy (a previous guest of the podcast), and since their conversation was quite short, I wanted to continue the debate in more depth.

    The resulting conversation ended up being quite long (over 3 hours of recording), with a lot of tangents, but I think it gives a somewhat better overview of Joscha's views on AI risk than other similar interviews. We also discussed many other topics, which you can find in the outline below.

    A raw version of this interview was published on Patreon about three weeks ago. To support the channel and have access to early previews, you can subscribe here: https://www.patreon.com/theinsideview

    Youtube: https://youtu.be/YeXHQts3xYM

    Transcript: https://theinsideview.ai/joscha

    Host: https://twitter.com/MichaelTrazzi

    Joscha: https://twitter.com/Plinz


    (00:00) Intro

    (00:57) Why Barbie Is Better Than Oppenheimer

    (08:55) The relationship between nuclear weapons and AI x-risk

    (12:51) Global warming and the limits to growth

    (20:24) Joscha’s reaction to the AI Political compass memes

    (23:53) On Uploads, Identity and Death

    (33:06) The Endgame: Playing The Longest Possible Game Given A Superposition Of Futures

    (37:31) On the evidence of delaying technology leading to better outcomes

    (40:49) Humanity is in locust mode

    (44:11) Scenarios in which Joscha would delay AI

    (48:04) On the dangers of AI regulation

    (55:34) From longtermist doomer who thinks AGI is good to 6x6 political compass

    (01:00:08) Joscha believes in god in the same sense as he believes in personal selves

    (01:05:45) The transition from cyanobacterium to photosynthesis as an allegory for technological revolutions

    (01:17:46) What Joscha would do as Aragorn in Middle-Earth

    (01:25:20) The endgame of brain computer interfaces is to liberate our minds and embody thinking molecules

    (01:28:50) Transcending politics and aligning humanity

    (01:35:53) On the feasibility of starting an AGI lab in 2023

    (01:43:19) Why green teaming is necessary for ethics

    (01:59:27) Joscha's Response to Connor Leahy on "if you don't do that, you die Joscha. You die"

    (02:07:54) Aligning with the agent playing the longest game

    (02:15:39) Joscha’s response to Connor on morality

    (02:19:06) Caring about mindchildren and actual children equally

    (02:20:54) On finding the function that generates human values

    (02:28:54) Twitter And Reddit Questions: Joscha’s AGI timelines and p(doom)

    (02:35:16) Why European AI regulations are bad for AI research

    (02:38:13) What regulation would Joscha Bach pass as president of the US

    (02:40:16) Is Open Source still beneficial today?

    (02:42:26) How to make sure that AI loves humanity

    (02:47:42) The movie Joscha would want to live in

    (02:50:06) Closing message for the audience

    • 2 hr 54 min
    Erik Jones on Automatically Auditing Large Language Models

    Erik is a PhD student at Berkeley working with Jacob Steinhardt, interested in making generative machine learning systems more robust, reliable, and aligned, with a focus on large language models. In this interview we talk about his paper "Automatically Auditing Large Language Models via Discrete Optimization", which he presented at ICML.

    Youtube: https://youtu.be/bhE5Zs3Y1n8

    Paper: https://arxiv.org/abs/2303.04381

    Erik: https://twitter.com/ErikJones313

    Host: https://twitter.com/MichaelTrazzi

    Patreon: https://www.patreon.com/theinsideview


    (00:00) Highlights

    (00:31) Erik's background and research at Berkeley

    (01:19) Motivation for doing safety research on language models

    (02:56) Is it too easy to fool today's language models?

    (03:31) The goal of adversarial attacks on language models

    (04:57) Automatically Auditing Large Language Models via Discrete Optimization

    (06:01) Optimizing over a finite set of tokens rather than continuous embeddings

    (06:44) Goal is revealing behaviors, not necessarily breaking the AI

    (07:51) On the feasibility of solving adversarial attacks

    (09:18) Suppressing dangerous knowledge vs just bypassing safety filters

    (10:35) Can you really ask a language model to cook meth?

    (11:48) Optimizing French to English translation example

    (13:07) Forcing toxic celebrity outputs just to test rare behaviors

    (13:19) Testing the method on GPT-2 and GPT-J

    (14:03) Adversarial prompts transferred to GPT-3 as well

    (14:39) How this auditing research fits into the broader AI safety field

    (15:49) Need for automated tools to audit failures beyond what humans can find

    (17:47) Auditing to avoid unsafe deployments, not for existential risk reduction

    (18:41) Adaptive auditing that updates based on the model's outputs

    (19:54) Prospects for using these methods to detect model deception

    (22:26) Prefer safety via alignment over just auditing constraints, closing thoughts

    Patreon supporters:

    Tassilo Neubauer
    Alexey Malafeev
    Jack Seroy
    JJ Hepburn
    Max Chiswick
    William Freire
    Edward Huff
    Gunnar Höglund
    Ryan Coppolo
    Cameron Holmes
    Emil Wallner
    Jesse Hoogland
    Jacques Thibodeau
    Vincent Weisser

    • 22 min
    Dylan Patel on the GPU Shortage, Nvidia and the Deep Learning Supply Chain

    Dylan Patel is Chief Analyst at SemiAnalysis, a boutique semiconductor research and consulting firm specializing in the semiconductor supply chain, from chemical inputs to fabs to design IP and strategy. The SemiAnalysis Substack has ~50,000 subscribers and is the second biggest tech Substack in the world. In this interview we discuss the current GPU shortage, why acquiring hardware is a multi-month process, the deep learning hardware supply chain, and Nvidia's strategy.

    Youtube: https://youtu.be/VItz2oEq5pA

    Transcript: https://theinsideview.ai/dylan

    • 12 min
