15 episodes

Listen to resources from the AI Safety Fundamentals: Alignment 201 course! https://course.aisafetyfundamentals.com/alignment-201

AI Safety Fundamentals: Alignment 201
BlueDot Impact

    • Technology

    Empirical Findings Generalize Surprisingly Far

    Previously, I argued that emergent phenomena in machine learning mean that we can’t rely on current trends to predict what the future of ML will be like. In this post, I will argue that despite this, empirical findings often do generalize very far, including across “phase transitions” caused by emergent behavior. This might seem like a contradiction, but actually I think divergence from current trends and empirical generalization are consistent. Findings do often generalize, but you need to th...

    • 11 min
    Worst-Case Thinking in AI Alignment

    Alternative title: “When should you assume that what could go wrong, will go wrong?” Thanks to Mary Phuong and Ryan Greenblatt for helpful suggestions and discussion, and Akash Wasil for some edits. In discussions of AI safety, people often propose the assumption that something goes as badly as possible. Eliezer Yudkowsky in particular has argued for the importance of security mindset when thinking about AI alignment. I think there are several distinct reasons that this might be the right ass...

    • 11 min
    Least-To-Most Prompting Enables Complex Reasoning in Large Language Models

    Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. So...

    • 16 min
    Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions

    Using hard multiple-choice reading comprehension questions as a testbed, we assess whether presenting humans with arguments for two competing answer options, where one is correct and the other is incorrect, allows human judges to perform more accurately, even when one of the arguments is unreliable and deceptive. If this is helpful, we may be able to increase our justified trust in language-model-based systems by asking them to produce these arguments where needed. Previous research has shown...

    • 16 min
    Low-Stakes Alignment

    Right now I’m working on finding a good objective to optimize with ML, rather than trying to make sure our models are robustly optimizing that objective. (This is roughly “outer alignment.”) That’s pretty vague, and it’s not obvious whether “find a good objective” is a meaningful goal rather than being inherently confused or sweeping key distinctions under the rug. So I like to focus on a more precise special case of alignment: solve alignment when decisions are “low stakes.” I think this cas...

    • 13 min
    Imitative Generalisation (AKA ‘Learning the Prior’)

    This post tries to explain a simplified version of Paul Christiano’s mechanism introduced here (referred to there as ‘Learning the Prior’), and to explain why a mechanism like this potentially addresses some of the safety problems with naïve approaches. First we’ll go through a simple example in a familiar domain, then explain the problems with the example. Then I’ll discuss the open questions for making Imitative Generalization actually work, and the connection with the Microscope AI idea. A mo...

    • 18 min
