
LW - the QACI alignment plan: table of contents, by carado (The Nonlinear Library: LessWrong)
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: the QACI alignment plan: table of contents, published by carado on March 21, 2023 on LessWrong.
this post aims to keep track of posts relating to the question-answer counterfactual interval proposal for AI alignment, abbreviated "QACI" and pronounced "quashy". i'll keep it updated to reflect the state of the research.
this research is primarily published on the Orthogonal website and discussed on the Orthogonal discord.
as an introduction to QACI, you might want to start with:
a narrative explanation of the QACI alignment plan (7 min read)
QACI blobs and interval illustrated (3 min read)
state of my research agenda (3 min read)
the full set of posts relevant to QACI totals 74 min of reading, and includes:
as overviews of QACI and how it's going:
state of my research agenda (3 min read)
problems for formal alignment (2 min read)
the original post introducing QACI (5 min read)
on the formal alignment perspective within which it fits:
formal alignment: what it is, and some proposals (2 min read)
clarifying formal alignment implementation (1 min read)
on being only polynomial capabilities away from alignment (1 min read)
on implementing capabilities and inner alignment, see also:
making it more tractable (4 min read)
RSI, LLM, AGI, DSA, imo (7 min read)
formal goal maximizing AI (2 min read)
you can't simulate the universe from the beginning? (1 min read)
on the blob location problem:
QACI blobs and interval illustrated (3 min read)
counterfactual computations in world models (3 min read)
QACI: the problem of blob location, causality, and counterfactuals (3 min read)
QACI blob location: no causality & answer signature (2 min read)
QACI blob location: an issue with firstness (2 min read)
on QACI as an implementation of long reflection / CEV:
CEV can be coherent enough (1 min read)
some thoughts about terminal alignment (2 min read)
on formalizing the QACI formal goal:
a rough sketch of formal aligned AI using QACI with some actual math (4 min read)
one-shot AI, delegating embedded agency and decision theory, and one-shot QACI (3 min read)
on how a formally aligned AI would actually run over time:
AI alignment curves (2 min read)
before the sharp left turn: what wins first? (1 min read)
on the metaethics grounding QACI:
surprise! you want what you want (1 min read)
outer alignment: two failure modes and past-user satisfaction (2 min read)
your terminal values are complex and not objective (3 min read)
on my view of the AI alignment research field within which i'm doing formal alignment:
my current outlook on AI risk mitigation (14 min read)
a casual intro to AI doom and alignment (5 min read)
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.