500 episodes

Audio narrations of LessWrong posts.

LessWrong (30+ Karma)
LessWrong

    • Technology

    “EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024” by scasper

    Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Part 13 of 12 in the Engineer's Interpretability Sequence.
    TL;DR
    On May 5, 2024, I made a set of 10 predictions about what the next sparse autoencoder (SAE) paper from Anthropic would and wouldn’t do. Today's new SAE paper from Anthropic was full of brilliant experiments and interesting insights, but it ultimately underperformed my expectations. I am beginning to be concerned that Anthropic's recent approach to interpretability research might be better explained by safety washing than practical safety work.
    Think of this post as a curt editorial instead of a technical piece. I hope to revisit my predictions and this post in light of future updates.
    Reflecting on predictions
    Please see my original post for 10 specific predictions about what today's paper would and wouldn’t accomplish. I think that Anthropic obviously did 1 and 2 [...]
    ---
    Outline:
    (00:16) TL;DR
    (00:57) Reflecting on predictions
    (02:09) A review + thoughts
    ---

    First published: May 21st, 2024
    Source: https://www.lesswrong.com/posts/pH6tyhEnngqWAXi9i/eis-xiii-reflections-on-anthropic-s-sae-research-circa-may

    ---
    Narrated by TYPE III AUDIO.

    • 6 min
    “On Dwarkesh’s Podcast with OpenAI’s John Schulman” by Zvi

    Dwarkesh Patel recorded a Podcast with John Schulman, cofounder of OpenAI and at the time their head of current model post-training. Transcript here. John's job at the time was to make the current AIs do what OpenAI wanted them to do. That is an important task, but one that employs techniques that their at-the-time head of alignment, Jan Leike, made clear we should not expect to work on future more capable systems. I strongly agree with Leike on that.

    Then Sutskever left and Leike resigned, and John Schulman was made the new head of alignment, now charged with what superalignment efforts remain at OpenAI to give us the ability to control future AGIs and ASIs.

    This gives us a golden opportunity to assess where his head is at, without him knowing he was about to step into that role.

    There is no question that John Schulman [...]
    ---
    Outline:
    (01:12) The Big Take
    (07:27) The Podcast
    (20:27) Reasoning and Capabilities Development
    (25:01) Practical Considerations
    ---

    First published: May 21st, 2024
    Source: https://www.lesswrong.com/posts/rC6CXZd34geayEH4s/on-dwarkesh-s-podcast-with-openai-s-john-schulman

    ---
    Narrated by TYPE III AUDIO.

    • 38 min
    “The Problem With the Word ‘Alignment’” by peligrietzer, particlemania

    Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This post was written by Peli Grietzer, inspired by internal writings by TJ (tushant jha), for AOI[1]. The original post, published on Feb 5, 2024, can be found here: https://ai.objectives.institute/blog/the-problem-with-alignment.
    The purpose of our work at the AI Objectives Institute (AOI) is to direct the impact of AI towards human autonomy and human flourishing. In the course of articulating our mission and positioning ourselves -- a young organization -- in the landscape of AI risk orgs, we’ve come to notice what we think are serious conceptual problems with the prevalent vocabulary of ‘AI alignment.’ This essay will discuss some of the major ways in which we think the concept of ‘alignment’ creates bias and confusion, as well as our own search for clarifying concepts.
    At AOI, we try to think about AI within the context of [...]
    The original text contained 2 footnotes which were omitted from this narration.
    ---

    First published: May 21st, 2024
    Source: https://www.lesswrong.com/posts/p3aL6BwpbPhqxnayL/the-problem-with-the-word-alignment-1

    ---
    Narrated by TYPE III AUDIO.

    • 11 min
    “New voluntary commitments (AI Seoul Summit)” by Zach Stein-Perlman

    The UK and Republic of Korea governments announced that the following organisations have agreed to the Frontier AI Safety Commitments:
    Amazon, Anthropic, Cohere, Google, G42, IBM, Inflection AI, Meta, Microsoft, Mistral AI, Naver, OpenAI, Samsung Electronics, Technology Innovation Institute, xAI, Zhipu.ai
    The above organisations, in furtherance of safe and trustworthy AI, undertake to develop and deploy their frontier AI models and systems[1] responsibly, in accordance with the following voluntary commitments, and to demonstrate how they have achieved this by publishing a safety framework focused on severe risks by the upcoming AI Summit in France.
    Given the evolving state of the science in this area, the undersigned organisations’ approaches (as detailed in paragraphs I-VIII) to meeting Outcomes 1, 2 and 3 may evolve in the future. In such instances, organisations will provide transparency on this, including their reasons, through public updates.
    The above organisations also affirm their commitment [...]
    The original text contained 3 footnotes which were omitted from this narration.
    ---

    First published: May 21st, 2024
    Source: https://www.lesswrong.com/posts/qfEgzQ9jGEk9Cegvy/new-voluntary-commitments-ai-seoul-summit

    ---
    Narrated by TYPE III AUDIO.

    • 11 min
    “What’s Going on With OpenAI’s Messaging?” by ozziegooen

    This is a quickly-written opinion piece on what I understand about OpenAI. I first posted it to Facebook, where it had some discussion.

    Some arguments that OpenAI is making, simultaneously:
    OpenAI will likely reach and own transformative AI (useful for attracting talent to work there).
    OpenAI cares a lot about safety (good for public PR and government regulations).
    OpenAI isn’t making anything dangerous and is unlikely to do so in the future (good for public PR and government regulations).
    OpenAI doesn’t need to spend many resources on safety, and implementing safe AI won’t put it at any competitive disadvantage (important for investors who own most of the company).
    Transformative AI will be incredibly valuable for all of humanity in the long term (for public PR and developers).
    People at OpenAI have thought long and hard about what will happen, and it will be fine.
    We can’t [...]
    ---

    First published: May 21st, 2024
    Source: https://www.lesswrong.com/posts/cy99dCEiLyxDrMHBi/what-s-going-on-with-openai-s-messaging

    ---
    Narrated by TYPE III AUDIO.

    • 6 min
    “Anthropic: Reflections on our Responsible Scaling Policy” by Zac Hatfield-Dodds

    This is a link post. Last September we published our first Responsible Scaling Policy (RSP) [LW discussion], which focuses on addressing catastrophic safety failures and misuse of frontier models. In adopting this policy, our primary goal is to help turn high-level safety concepts into practical guidelines for fast-moving technical organizations and demonstrate their viability as possible standards. As we operationalize the policy, we expect to learn a great deal and plan to share our findings. This post shares reflections from implementing the policy so far. We are also working on an updated RSP and will share this soon.

    We have found having a clearly-articulated policy on catastrophic risks extremely valuable. It has provided a structured framework to clarify our organizational priorities and frame discussions around project timelines, headcount, threat models, and tradeoffs. The process of implementing the policy has also surfaced a range of important questions, projects, and dependencies [...]
    ---
    Outline:
    (04:53) Threat Modeling and Evaluations
    (12:04) The ASL-3 Standard
    (16:26) Assurance Structures
    ---

    First published: May 20th, 2024
    Source: https://www.lesswrong.com/posts/vAopGQhFPdjcA8CEh/anthropic-reflections-on-our-responsible-scaling-policy

    ---
    Narrated by TYPE III AUDIO.

    • 20 min
