
46 episodes

The Inside View, by Michaël Trazzi
Technology
About AI progress. You can watch the video recordings and check out the transcripts at theinsideview.ai
Kellin Pelrine on beating the strongest Go AI
Youtube: https://youtu.be/_ANvfMblakQ
Part 1 (about the paper): https://youtu.be/Tip1Ztjd-so
Paper: https://arxiv.org/pdf/2211.00241
Patreon: https://www.patreon.com/theinsideview
Paul Christiano's views on "doom" (ft. Robert Miles)
Youtube: https://youtu.be/JXYcLQItZsk
Paul Christiano's post: https://www.lesswrong.com/posts/xWMqsvHapP3nwdSW8/my-views-on-doom
Neel Nanda on mechanistic interpretability, superposition and grokking
Neel Nanda is a researcher at Google DeepMind working on mechanistic interpretability. He is also known for his YouTube channel where he explains what is going on inside of neural networks to a large audience.
In this conversation, we discuss what mechanistic interpretability is, how Neel got into it, his research methodology, and his advice for people who want to get started, as well as papers on superposition, toy models of universality, and grokking, among other things.
Youtube: https://youtu.be/cVBGjhN4-1g
Transcript: https://theinsideview.ai/neel
OUTLINE
(00:00) Intro
(00:57) Why Neel Started Doing Walkthroughs Of Papers On Youtube
(07:59) Induction Heads, Or Why Nanda Comes After Neel
(12:19) Detecting Induction Heads In Basically Every Model
(14:35) How Neel Got Into Mechanistic Interpretability
(16:22) Neel's Journey Into Alignment
(22:09) Enjoying Mechanistic Interpretability And Being Good At It Are The Main Multipliers
(24:49) How Is AI Alignment Work At DeepMind?
(25:46) Scalable Oversight
(28:30) Most Ambitious Degree Of Interpretability With Current Transformer Architectures
(31:05) To Understand Neel's Methodology, Watch The Research Walkthroughs
(32:23) Three Modes Of Research: Confirming, Red Teaming And Gaining Surface Area
(34:58) You Can Be Both Hypothesis Driven And Capable Of Being Surprised
(36:51) You Need To Be Able To Generate Multiple Hypotheses Before Getting Started
(37:55) All the theory is b******t without empirical evidence and it's overall dignified to make the mechanistic interpretability bet
(40:11) Mechanistic interpretability is alien neuroscience for truth seeking biologists in a world of math
(42:12) Actually, Othello-GPT Has A Linear Emergent World Representation
(45:08) You Need To Use Simple Probes That Don't Do Any Computation To Prove The Model Actually Knows Something
(47:29) The Mechanistic Interpretability Researcher Mindset
(49:49) The Algorithms Learned By Models Might Or Might Not Be Universal
(51:49) On The Importance Of Being Truth Seeking And Skeptical
(54:18) The Linear Representation Hypothesis: Linear Representations Are The Right Abstractions
(57:26) Superposition Is How Models Compress Information
(01:00:15) The Polysemanticity Problem: Neurons Are Not Meaningful
(01:05:42) Superposition and Interference are at the Frontier of the Field of Mechanistic Interpretability
(01:07:33) Finding Neurons in a Haystack: Superposition Through De-Tokenization And Compound Word Detectors
(01:09:03) Not Being Able to Be Both Blood Pressure and Social Security Number at the Same Time Is Prime Real Estate for Superposition
(01:15:02) The Two Differences Of Superposition: Computational And Representational
(01:18:07) Toy Models Of Superposition
(01:25:39) How Mentoring Nine People at Once Through SERI MATS Helped Neel's Research
(01:31:25) The Backstory Behind Toy Models of Universality
(01:35:19) From Modular Addition To Permutation Groups
(01:38:52) The Model Needs To Learn Modular Addition On A Finite Number Of Token Inputs
(01:41:54) Why Is The Paper Called Toy Model Of Universality
(01:46:16) Progress Measures For Grokking Via Mechanistic Interpretability, Circuit Formation
(01:52:45) Getting Started In Mechanistic Interpretability And Which Walkthroughs To Start With
(01:56:15) Why Does Mechanistic Interpretability Matter From an Alignment Perspective
(01:58:41) How Detecting Deception With Mechanistic Interpretability Compares to Collin Burns' Work
(02:01:20) Final Words From Neel
Joscha Bach on how to stop worrying and love AI
Joscha Bach (who describes himself as an AI researcher/cognitive scientist) has recently been debating existential risk from AI with Connor Leahy (a previous guest of the podcast), and since their conversation was quite short, I wanted to continue the debate in more depth.
The resulting conversation ended up being quite long (over three hours of recording), with a lot of tangents, but I think it gives a somewhat better overview of Joscha’s views on AI risk than other similar interviews. We also discussed many other topics, which you can find in the outline below.
A raw version of this interview was published on Patreon about three weeks ago. To support the channel and have access to early previews, you can subscribe here: https://www.patreon.com/theinsideview
Youtube: https://youtu.be/YeXHQts3xYM
Transcript: https://theinsideview.ai/joscha
Host: https://twitter.com/MichaelTrazzi
Joscha: https://twitter.com/Plinz
OUTLINE
(00:00) Intro
(00:57) Why Barbie Is Better Than Oppenheimer
(08:55) The relationship between nuclear weapons and AI x-risk
(12:51) Global warming and the limits to growth
(20:24) Joscha’s reaction to the AI Political compass memes
(23:53) On Uploads, Identity and Death
(33:06) The Endgame: Playing The Longest Possible Game Given A Superposition Of Futures
(37:31) On the evidence of delaying technology leading to better outcomes
(40:49) Humanity is in locust mode
(44:11) Scenarios in which Joscha would delay AI
(48:04) On the dangers of AI regulation
(55:34) From longtermist doomer who thinks AGI is good to 6x6 political compass
(01:00:08) Joscha believes in god in the same sense as he believes in personal selves
(01:05:45) The transition from cyanobacterium to photosynthesis as an allegory for technological revolutions
(01:17:46) What Joscha would do as Aragorn in Middle-Earth
(01:25:20) The endgame of brain computer interfaces is to liberate our minds and embody thinking molecules
(01:28:50) Transcending politics and aligning humanity
(01:35:53) On the feasibility of starting an AGI lab in 2023
(01:43:19) Why green teaming is necessary for ethics
(01:59:27) Joscha's Response to Connor Leahy on "if you don't do that, you die Joscha. You die"
(02:07:54) Aligning with the agent playing the longest game
(02:15:39) Joscha’s response to Connor on morality
(02:19:06) Caring about mindchildren and actual children equally
(02:20:54) On finding the function that generates human values
(02:28:54) Twitter And Reddit Questions: Joscha’s AGI timelines and p(doom)
(02:35:16) Why European AI regulations are bad for AI research
(02:38:13) What regulation would Joscha Bach pass as president of the US
(02:40:16) Is Open Source still beneficial today?
(02:42:26) How to make sure that AI loves humanity
(02:47:42) The movie Joscha would want to live in
(02:50:06) Closing message for the audience
Erik Jones on Automatically Auditing Large Language Models
Erik is a PhD student at Berkeley working with Jacob Steinhardt, interested in making generative machine learning systems more robust, reliable, and aligned, with a focus on large language models. In this interview we talk about his paper "Automatically Auditing Large Language Models via Discrete Optimization", which he presented at ICML.
Youtube: https://youtu.be/bhE5Zs3Y1n8
Paper: https://arxiv.org/abs/2303.04381
Erik: https://twitter.com/ErikJones313
Host: https://twitter.com/MichaelTrazzi
Patreon: https://www.patreon.com/theinsideview
Outline
00:00 Highlights
00:31 Erik's background and research at Berkeley
01:19 Motivation for doing safety research on language models
02:56 Is it too easy to fool today's language models?
03:31 The goal of adversarial attacks on language models
04:57 Automatically Auditing Large Language Models via Discrete Optimization
06:01 Optimizing over a finite set of tokens rather than continuous embeddings
06:44 Goal is revealing behaviors, not necessarily breaking the AI
07:51 On the feasibility of solving adversarial attacks
09:18 Suppressing dangerous knowledge vs just bypassing safety filters
10:35 Can you really ask a language model to cook meth?
11:48 Optimizing French to English translation example
13:07 Forcing toxic celebrity outputs just to test rare behaviors
13:19 Testing the method on GPT-2 and GPT-J
14:03 Adversarial prompts transferred to GPT-3 as well
14:39 How this auditing research fits into the broader AI safety field
15:49 Need for automated tools to audit failures beyond what humans can find
17:47 Auditing to avoid unsafe deployments, not for existential risk reduction
18:41 Adaptive auditing that updates based on the model's outputs
19:54 Prospects for using these methods to detect model deception
22:26 Preferring safety via alignment over just auditing constraints; closing thoughts
Patreon supporters:
Tassilo Neubauer
MonikerEpsilon
Alexey Malafeev
Jack Seroy
JJ Hepburn
Max Chiswick
William Freire
Edward Huff
Gunnar Höglund
Ryan Coppolo
Cameron Holmes
Emil Wallner
Jesse Hoogland
Jacques Thibodeau
Vincent Weisser
Dylan Patel on the GPU Shortage, Nvidia and the Deep Learning Supply Chain
Dylan Patel is Chief Analyst at SemiAnalysis, a boutique semiconductor research and consulting firm specializing in the semiconductor supply chain, from chemical inputs to fabs to design IP and strategy. The SemiAnalysis Substack has ~50,000 subscribers and is the second-biggest tech Substack in the world. In this interview we discuss the current GPU shortage, why producing hardware is a multi-month process, the deep learning hardware supply chain, and Nvidia's strategy.
Youtube: https://youtu.be/VItz2oEq5pA
Transcript: https://theinsideview.ai/dylan