“AI companies are unlikely to make high-assurance safety cases if timelines are short” by ryan_greenblatt

LessWrong (Curated & Popular)

One hope for keeping existential risks low is to get AI companies to (successfully) make high-assurance safety cases: structured and auditable arguments that an AI system is very unlikely to result in existential risks given how it will be deployed.[1] Concretely, once AIs are quite powerful, high-assurance safety cases would require making a thorough argument that the level of (existential) risk caused by the company is very low; perhaps they would require that the total chance of existential risk over the lifetime of the AI company[2] is less than 0.25%[3][4].

The idea of making high-assurance safety cases (once AI systems are dangerously powerful) is popular in some parts of the AI safety community and a variety of work appears to focus on this. Further, Anthropic has expressed an intention (in their RSP) to "keep risks below acceptable levels"[5] and there is a common impression that Anthropic would pause [...]

---

Outline:

(03:19) Why are companies unlikely to succeed at making high-assurance safety cases in short timelines?

(04:14) Ensuring sufficient security is very difficult

(04:55) Sufficiently mitigating scheming risk is unlikely

(09:35) Accelerating safety and security with earlier AIs seems insufficient

(11:58) Other points

(14:07) Companies likely won't unilaterally slow down if they are unable to make high-assurance safety cases

(18:26) Could coordination or government action result in high-assurance safety cases?

(19:55) What about safety cases aiming at a higher risk threshold?

(21:57) Implications and conclusions

The original text contained 20 footnotes which were omitted from this narration.

---

First published:
January 23rd, 2025

Source:
https://www.lesswrong.com/posts/neTbrpBziAsTH5Bn7/ai-companies-are-unlikely-to-make-high-assurance-safety

---

Narrated by TYPE III AUDIO.
