The Gradient: Perspectives on AI
The Gradient

122 episodes

    • Technology

Deeply researched, technical interviews with experts thinking about AI and technology. Hosted, recorded, researched, and produced by Daniel Bashir.

thegradientpub.substack.com

    Ryan Tibshirani: Statistics, Nonparametric Regression, Conformal Prediction

    Episode 121
    I spoke with Professor Ryan Tibshirani about:
    * Differences between the ML and statistics communities in scholarship, terminology, and other areas.
    * Trend filtering
    * Why you can’t just use garbage prediction functions when doing conformal prediction (see the sketch after the links below)
    Ryan is a Professor in the Department of Statistics at UC Berkeley and a Principal Investigator in the Delphi group. From 2011 to 2022, he was a faculty member in Statistics and Machine Learning at Carnegie Mellon University, and from 2007 to 2011 he did his Ph.D. in Statistics at Stanford University.
    Reach me at editor@thegradient.pub for feedback, ideas, and guest suggestions.
    The Gradient Podcast on: Apple Podcasts | Spotify | Pocket Casts | RSS. Follow The Gradient on Twitter.
    Outline:
    * (00:00) Intro
    * (01:10) Ryan’s background and path into statistics
    * (07:00) Cultivating taste as a researcher
    * (11:00) Conversations within the statistics community
    * (18:30) Use of terms, disagreements over stability and definitions
    * (23:05) Nonparametric Regression
    * (23:55) Background on trend filtering
    * (33:48) Analysis and synthesis frameworks in problem formulation
    * (39:45) Neural networks as a specific take on synthesis
    * (40:55) Divided differences, falling factorials, and discrete splines
    * (41:55) Motivations and background
    * (48:07) Divided differences vs. derivatives, approximation and efficiency
    * (51:40) Conformal prediction
    * (52:40) Motivations
    * (1:10:20) Probabilistic guarantees in conformal prediction, choice of predictors
    * (1:14:25) Assumptions: i.i.d. and exchangeability — conformal prediction beyond exchangeability
    * (1:25:00) Next directions
    * (1:28:12) Epidemic forecasting — COVID-19 impact and trends survey
    * (1:29:10) Survey methodology
    * (1:38:20) Data defect correlation and its limitations for characterizing datasets
    * (1:46:14) Outro
    Links:
    * Ryan’s homepage
    * Works read/mentioned
    * Nonparametric Regression
    * Adaptive Piecewise Polynomial Estimation via Trend Filtering (2014) 
    * Divided Differences, Falling Factorials, and Discrete Splines: Another Look at Trend Filtering and Related Problems (2020)
    * Distribution-free Inference
    * Distribution-Free Predictive Inference for Regression (2017)
    * Conformal Prediction Under Covariate Shift (2019)
    * Conformal Prediction Beyond Exchangeability (2023)
    * Delphi and COVID-19 research
    * Flexible Modeling of Epidemics
    * Real-Time Estimation of COVID-19 Infections
    * The US COVID-19 Trends and Impact Survey and Big data, big problems: Responding to “Are we there yet?”
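
    For context on the conformal prediction discussion (around 52:40 and 1:10:20), below is a minimal Python sketch of split conformal prediction for regression. It is not code from the episode or from the papers above; the function name, the absolute-residual nonconformity score, and the 90% coverage level are illustrative assumptions. The point it illustrates is that the marginal coverage guarantee holds for any fixed predictor under exchangeability, but a poor ("garbage") predictor simply yields very wide intervals.

    import numpy as np

    def split_conformal_intervals(predict, X_cal, y_cal, X_test, alpha=0.1):
        # "predict" can be any fixed regression function, even a poor one:
        # coverage does not depend on its accuracy, only interval width does.
        scores = np.abs(y_cal - predict(X_cal))                # nonconformity scores on calibration data
        n = len(scores)
        level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)   # finite-sample-corrected quantile level
        q = np.quantile(scores, level, method="higher")
        preds = predict(X_test)
        return preds - q, preds + q                            # endpoints with ~(1 - alpha) marginal coverage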




    • 1 hr 46 min
    Sasha Luccioni: Connecting the Dots Between AI's Environmental and Social Impacts

    In episode 120 of The Gradient Podcast, Daniel Bashir speaks to Sasha Luccioni.
    Sasha is the AI and Climate Lead at Hugging Face, where she spearheads research, consulting, and capacity-building to elevate the sustainability of AI systems. A founding member of Climate Change AI (CCAI) and a board member of Women in Machine Learning (WiML), Sasha is passionate about catalyzing impactful change, organizing events, and serving as a mentor to under-represented minorities within the AI community.
    Have suggestions for future podcast guests (or other feedback)? Let us know here or reach Daniel at editor@thegradient.pub
    Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS. Follow The Gradient on Twitter.
    Outline:
    * (00:00) Intro
    * (00:43) Sasha’s background
    * (01:52) How Sasha became interested in sociotechnical work
    * (03:08) Larger models and theory of change for AI/climate work
    * (07:18) Quantifying emissions for ML systems
    * (09:40) Aggregate inference vs training costs
    * (10:22) Hardware and data center locations
    * (15:10) More efficient hardware vs. bigger models — Jevons paradox
    * (17:55) Uninformative experiments, takeaways for individual scientists, knowledge sharing, failure reports
    * (27:10) Power Hungry Processing: systematic comparisons of ongoing inference costs
    * (28:22) General vs. task-specific models
    * (31:20) Architectures and efficiency
    * (33:45) Sequence-to-sequence architectures vs. decoder-only
    * (36:35) Hardware efficiency/utilization
    * (37:52) Estimating the carbon footprint of BLOOM and lifecycle assessment
    * (40:50) Stable Bias
    * (46:45) Understanding model biases and representations
    * (52:07) Future work
    * (53:45) Metaethical perspectives on benchmarking for AI ethics
    * (54:30) “Moral benchmarks”
    * (56:50) Reflecting on “ethicality” of systems
    * (59:00) Transparency and ethics
    * (1:00:05) Advice for picking research directions
    * (1:02:58) Outro
    Links:
    * Sasha’s homepage and Twitter
    * Papers read/discussed
    * Climate Change / Carbon Emissions of AI Models
    * Quantifying the Carbon Emissions of Machine Learning
    * Power Hungry Processing: Watts Driving the Cost of AI Deployment?
    * Tackling Climate Change with Machine Learning
    * CodeCarbon (see the usage sketch after this list)
    * Responsible AI
    * Stable Bias: Analyzing Societal Representations in Diffusion Models
    * Metaethical Perspectives on ‘Benchmarking’ AI Ethics
    * Measuring Data
    * Mind your Language (Model): Fact-Checking LLMs and their Role in NLP Research and Practice
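
    As a companion to the discussion of quantifying emissions for ML systems (07:18), here is a hedged sketch of how a practitioner might measure a training run with the CodeCarbon package linked above. The project name and the train_model stub are illustrative assumptions rather than code from the episode; the EmissionsTracker start/stop interface reflects CodeCarbon's documented usage in recent versions.

    from codecarbon import EmissionsTracker

    def train_model():
        pass  # placeholder for an actual training loop

    tracker = EmissionsTracker(project_name="demo-run")  # writes results to emissions.csv by default
    tracker.start()
    train_model()
    kg_co2eq = tracker.stop()  # estimated emissions for the tracked block, in kg CO2-equivalent
    print(f"Estimated emissions: {kg_co2eq:.6f} kg CO2eq")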



    • 1 hr 3 min
    Michael Sipser: Problems in the Theory of Computation

    In episode 119 of The Gradient Podcast, Daniel Bashir speaks to Professor Michael Sipser.
    Professor Sipser is the Donner Professor of Mathematics and member of the Computer Science and Artificial Intelligence Laboratory at MIT.
    He received his PhD from UC Berkeley in 1980 and joined the MIT faculty that same year. He was Chairman of Applied Mathematics from 1998 to 2000 and Head of the Mathematics Department from 2004 to 2014. He served as interim Dean of Science from 2013 to 2014 and then as Dean of Science from 2014 to 2020.
    He was a research staff member at IBM Research in 1980, spent the 1985-86 academic year on the faculty of the EECS department at Berkeley and at MSRI, and was a Lady Davis Fellow at Hebrew University in 1988. His research areas are in algorithms and complexity theory, specifically efficient error correcting codes, interactive proof systems, randomness, quantum computation, and establishing the inherent computational difficulty of problems. He is the author of the widely used textbook, Introduction to the Theory of Computation (Third Edition, Cengage, 2012).
    Have suggestions for future podcast guests (or other feedback)? Let us know here or reach Daniel at editor@thegradient.pub
    Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS. Follow The Gradient on Twitter.
    Outline:
    * (00:00) Intro
    * (01:40) Professor Sipser’s background
    * (04:35) On interesting questions
    * (09:00) Different kinds of research problems
    * (13:00) What makes certain problems difficult
    * (18:48) Nature of the P vs NP problem
    * (24:42) Identifying interesting problems
    * (28:50) Lower bounds on the size of sweeping automata
    * (29:50) Why sweeping automata + headway to P vs. NP
    * (36:40) Insights from sweeping automata, infinite analogues to finite automata problems
    * (40:45) Parity circuits
    * (43:20) Probabilistic restriction method
    * (47:20) Relativization and the polynomial time hierarchy
    * (55:10) P vs. NP
    * (57:23) The non-connection between Go's polynomial-space hardness and AlphaGo
    * (1:00:40) On handicapping Turing Machines vs. oracle strategies
    * (1:04:25) The Natural Proofs Barrier and approaches to P vs. NP
    * (1:11:05) Debates on methods for P vs. NP
    * (1:15:04) On the possibility of solving P vs. NP
    * (1:18:20) On academia and its role
    * (1:27:51) Outro
    Links:
    * Professor Sipser’s homepage
    * Papers discussed/read
    * Halting space-bounded computations (1978)
    * Lower bounds on the size of sweeping automata (1979)
    * GO is Polynomial-Space Hard (1980)
    * A complexity theoretic approach to randomness (1983)
    * Parity, circuits, and the polynomial-time hierarchy (1984)
    * A follow-up to Furst-Saxe-Sipser
    * The Complexity of Finite Functions (1991)



    • 1 hr 28 min
    Andrew Lee: How AI will Shape the Future of Email

    In episode 118 of The Gradient Podcast, Daniel Bashir speaks to Andrew Lee.
    Andrew is co-founder and CEO of Shortwave, a company dedicated to building a better product experience for email, particularly by leveraging AI. He previously co-founded Firebase, where he served as CTO.
    Have suggestions for future podcast guests (or other feedback)? Let us know here or reach Daniel at editor@thegradient.pub
    Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS. Follow The Gradient on Twitter.
    Outline:
    * (00:00) Intro
    * (01:43) Andrew’s previous work, Firebase
    * (04:48) Benefits of lacking experience in building Firebase
    * (08:55) On “abstract reasoning” vs empirical capabilities
    * (10:30) Shortwave’s AI system as a black box
    * (11:55) Motivations for Shortwave
    * (17:10) Why is Google not innovating on email?
    * (21:53) Shortwave’s overarching product vision and pivots
    * (27:40) Shortwave AI features
    * (33:20) AI features for email and security concerns
    * (35:45) Shortwave’s AI Email Assistant + architecture
    * (43:40) Issues with chaining LLM calls together
    * (45:25) Understanding implicit context in utterances, modularization without loss of context
    * (48:56) Performance for AI assistant, batching and pipelining
    * (55:10) Prompt length
    * (57:00) On shipping fast
    * (1:00:15) AI improvements that Andrew is following
    * (1:03:10) Outro
    Links:
    * Andrew’s blog and Twitter
    * Shortwave
    * Introducing Ghostwriter
    * Everything we shipped for AI Launch Week
    * A deep dive into the world’s smartest email AI



    • 1 hr 3 min
    Joss Fong: Videomaking, AI, and Science Communication

    “You get more of what you engage with. Everyone who complains about coverage should understand that every click, every quote tweet, every argument is registered by these publications as engagement. If what you want is really meaty, dispassionate, balanced, and fair explainers, you need to click on that, you need to read the whole thing, you need to share it, talk about it, comment on it. We get the media that we deserve.”
    In episode 117 of The Gradient Podcast, Daniel Bashir speaks to Joss Fong.
    Joss is a producer focused on science and technology, and was a founding member of the Vox video team. Her work has been recognized by the AAAS Kavli Science Journalism Awards, the Online Journalism Awards, and the News & Documentary Emmys. She holds a master's degree in science, health, and environmental reporting from NYU.
    Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub
    Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS. Follow The Gradient on Twitter.
    Outline:
    * (00:00) Intro
    * (01:32) Joss’s path into videomaking, J-school
    * (07:45) Consumption and creation in explainer journalism
    * (10:45) Finding clarity in information
    * (13:15) Communication of ML research
    * (15:55) Video journalism and science communication as separate and overlapping disciplines
    * (19:41) Evolution of videos and videomaking
    * (26:33) Explaining AI and communicating mental models
    * (30:47) Meeting viewers in the middle, competing for attention
    * (34:07) Explanatory techniques in Glad You Asked
    * (37:10) Storytelling and communicating scientific information
    * (40:57) “Is Beauty Culture Hurting Us?” and participating in video narratives
    * (46:37) AI beauty filters
    * (52:59) Obvious bias in generative AI
    * (59:31) Definitions and ideas of progress, humanities and technology
    * (1:05:08) “Iterative development” and outsourcing quality control to the public
    * (1:07:10) Disagreement about (tech) journalism’s purpose
    * (1:08:51) Incentives in newsrooms and journalistic organizations
    * (1:12:04) AI for video generation and implications, limits of creativity
    * (1:17:20) Skill and creativity
    * (1:22:35) Joss’s new YouTube channel!
    * (1:23:29) Outro
    Links:
    * Joss’s website and playlist of selected work
    * AI-focused videos
    * AI Art, Explained (2022)
    * AI can do your homework. Now what? (2023)
    * Computers just got a lot better at writing (2020)
    * Facebook showed this ad to 95% women. Is that a problem? (2020)
    * What facial recognition steals from us (2019)
    * The big debate about the future of work (2017)
    * AI and Creativity short film for Runway’s AIFF (2023)
    * Others
    * Is Beauty Culture Hurting Us? from Glad You Asked (2020)
    * Joss’s Scientific American videos :)



    • 1 hr 23 min
    Kate Park: Data Engines for Vision and Language

    In episode 116 of The Gradient Podcast, Daniel Bashir speaks to Kate Park.
    Kate is the Director of Product at Scale AI. Prior to joining Scale, Kate worked on Tesla Autopilot as the AI team's first and lead product manager, building the industry's first data engine. She has also published research on spoken natural language processing, as well as a travel memoir.
    Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub
    Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS. Follow The Gradient on Twitter.
    Outline:
    * (00:00) Intro
    * (01:11) Kate’s background
    * (03:22) Tesla and cameras vs. Lidar, importance of data
    * (05:12) “Data is key”
    * (07:35) Data vs. architectural improvements
    * (09:36) Effort for data scaling
    * (10:55) Transfer of capabilities in self-driving
    * (13:44) Data flywheels and edge cases, deployment
    * (15:48) Transition to Scale
    * (18:52) Perspectives on shifting to transformers and data
    * (21:00) Data engines for NLP vs. for vision
    * (25:32) Model evaluation for LLMs in data engines
    * (27:15) InstructGPT and data for RLHF
    * (29:15) Benchmark tasks for assessing potential labelers
    * (32:07) Biggest challenges for data engines
    * (33:40) Expert AI trainers
    * (36:22) Future work in data engines
    * (38:25) Need for human labeling when bootstrapping new domains or tasks
    * (41:05) Outro
    Links:
    * Scale Data Engine
    * OpenAI case study



    • 41 min
