LW - Announcing Atlas Computing by miyazono

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing Atlas Computing, published by miyazono on April 12, 2024 on LessWrong.
Atlas Computing is a new nonprofit working to collaboratively advance AI capabilities that are asymmetrically risk-reducing. Our work consists of building scoped prototypes and creating an ecosystem around @davidad's Safeguarded AI programme at ARIA (formerly referred to as the Open Agency Architecture).
We formed in Oct 2023, and raised nearly $1M, primarily from the Survival and Flourishing Fund and Protocol Labs. We have no physical office, and are currently only Evan Miyazono (CEO) and Daniel Windham (software lead), but over the coming months and years, we hope to create compelling evidence that:
The Safeguarded AI research agenda includes both research and engineering projects where breakthroughs or tools can incrementally reduce AI risks.
If Atlas Computing makes only partial progress toward building safeguarded AI, we'll likely have put tools into the world that are useful for accelerating human oversight and review of AI outputs, asymmetrically favoring risk reduction.
When davidad's ARIA programme concludes, Atlas Computing will have worked in parallel on some of the tech-transfer challenges, magnifying the impact of any technologies he develops.
Our overall strategy
We think that, in addition to encoding human values into AI systems, a complementary way to dramatically reduce AI risk is to create external safeguards that limit AI outputs. Users (individuals, groups, or institutions) should have tools to create specifications that list baseline safety requirements (if not full desiderata for AI system outputs), and to interrogate those specifications with non-learned tools.
A separate system should then use the specification to generate candidate solutions, along with evidence that each proposed solution satisfies the spec. That evidence can be reviewed automatically for adherence to the specified safety properties. Compare this to interactions with today's generalist ML systems, where candidate solutions are at best reviewed manually. We hope to facilitate a paradigm in which even the least safety-conscious user's interactions with AI follow this specify-generate-verify pattern.
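The sketch below is a minimal illustration of that loop, not an Atlas Computing API: `Spec`, `propose`, and the example power-grid requirements are hypothetical stand-ins for a user-written specification, an untrusted generator that attaches a predicted world-model state as evidence, and a non-learned checker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

WorldState = Dict[str, float]  # toy stand-in for a state of the world-model


@dataclass
class Spec:
    """A machine-checkable specification: named safety requirements over world states."""
    requirements: List[Tuple[str, Callable[[WorldState], bool]]]

    def violations(self, state: WorldState) -> List[str]:
        """Non-learned check: return the requirements the predicted state violates."""
        return [name for name, holds in self.requirements if not holds(state)]


def propose(spec: Spec) -> Iterable[Tuple[str, WorldState]]:
    """Stand-in for the untrusted generator (e.g. an ML system): yields candidate
    plans together with evidence, here a predicted world-model state."""
    yield "plan A", {"grid_load_margin": 0.05, "outage_minutes": 12.0}
    yield "plan B", {"grid_load_margin": 0.25, "outage_minutes": 0.0}


spec = Spec(requirements=[
    ("load margin stays above 20%", lambda s: s["grid_load_margin"] >= 0.20),
    ("no outages",                  lambda s: s["outage_minutes"] == 0.0),
])

# Only candidates whose evidence passes the automatic check ever reach the user.
accepted = [plan for plan, state in propose(spec) if not spec.violations(state)]
print(accepted)  # -> ['plan B']
```

The point of the pattern is that the checker, not the user, sits between the generator and the world: whatever the generator proposes, nothing whose evidence violates the specification is surfaced.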
Specification-based AI vs other AI risk mitigation strategies
We consider near-term risk reductions that are possible with this architecture to be highly compatible with existing alignment techniques.
In Constitutional AI, humans act as legislators, but the laws are sufficiently nuanced and subjective that a language model is required to act as a scalable executive and judiciary. Using specifications to establish an objective preliminary safety baseline, automatically validated by a non-learned system, could be considered a variation or subset of Constitutional AI.
Some work on evaluations focuses on finding metrics that demonstrate the safety or alignment of outputs. Our architecture expresses goals in terms of states of a world-model used to understand the impact of policies proposed by the AI; we would be excited to see, and supportive of, evals researchers exploring work in this direction.
This approach could also be considered a form of scalable oversight, in which a baseline set of safety specifications is automatically enforced via validation and proof generation against the spec.
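As a toy illustration of why this kind of oversight can scale, validating a claim against a spec is often much cheaper than producing the claim. The example below is not from the post: the sorting spec and witness format are invented for illustration. An untrusted solver returns a claimed solution plus a witness, and a small non-learned checker validates the spec without redoing or trusting the solver's work.

```python
from typing import List, Tuple


def untrusted_sort(xs: List[int]) -> Tuple[List[int], List[int]]:
    """Stand-in for an untrusted (possibly learned) solver: returns a claimed
    solution plus a witness, here the permutation mapping output index -> input index."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    return [xs[i] for i in order], order


def check(xs: List[int], ys: List[int], witness: List[int]) -> bool:
    """Non-learned validator for the spec 'ys is a sorted permutation of xs',
    checked cheaply using the witness rather than by re-solving the problem."""
    if sorted(witness) != list(range(len(xs))):                    # witness is a permutation
        return False
    if any(ys[k] != xs[witness[k]] for k in range(len(xs))):       # ys matches that permutation
        return False
    return all(ys[k] <= ys[k + 1] for k in range(len(ys) - 1))     # ys is sorted


xs = [3, 1, 2]
ys, w = untrusted_sort(xs)
assert check(xs, ys, w)  # only spec-satisfying outputs are accepted
```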
How this differs from davidad's work at ARIA
You may be aware that davidad is funding similar work as a Programme Director at ARIA (watch his 30-minute solicitation presentation here). It's worth clarifying that, while davidad and Evan worked closely at Protocol Labs, davidad is not an employee of Atlas Computing, and Atlas has received no funding from ARIA. That said, we're pursuing highly complementary paths in our efforts to reduce AI risk.
His Safeguarded AI research agenda, described here, is focused on using cyberphysical systems, li
