Eleos AI

Readouts of Eleos AI's blog posts and research reports. Audio processing by Aaron Bergman. See more at eleosai.org

Episodes

  1. 01/09/2025

    Why it makes sense to let Claude exit conversations by Robert Long

    Read the post here. Note: This is also posted on Robert Long's Substack.

    Intro

    Last week, Anthropic announced that its newest language models, Claude Opus 4 and 4.1, can now shut down certain conversations with users. The announcement explains that Anthropic gave Claude this ability “as part of our exploratory work on potential AI welfare”. This means that, for the first time, a major AI company has changed how it treats its AI systems out of concern for the well-being of the systems themselves, not just user safety.

    Whether or not you think Claude is or will be conscious—Anthropic themselves say that they are “deeply uncertain”—this decision is a notable moment in the history of human-AI interactions. Some will see this as much ado about nothing. Others will see it as pernicious: hype, a distraction from more important issues, and an exacerbation of already-dangerous anthropomorphism. Others, a considerably smaller group, think that LLMs are obviously already conscious, and so this move is woefully insufficient.

    I think it’s more mundane: Anthropic is taking a fairly measured response to genuine uncertainty about a morally significant question, and attempting to set a good precedent. For the most part, this intervention’s success won’t depend on how it affects Claude Opus 4.1; it will depend on how people react to it and the precedent it sets. Although we don’t know how that will pan out, and there are reasons to worry about backlash, I think that this was a good move.

    10 min
  2. 01/09/2025

    Eleos commends Anthropic model welfare efforts by Robert Long and Larissa Schiavo

    Please read the original, full post here.

    Full text

    Eleos commends Anthropic model welfare efforts

    Eleos AI Research congratulates Kyle Fish and Anthropic on their announcement of a new research program to investigate potential AI consciousness and welfare. We hope that they will continue to invest in this area, and urge other frontier labs to follow Anthropic's lead.

    BERKELEY, CA – April 24, 2025 – Eleos AI Research congratulates Kyle Fish and Anthropic on their announcement of a new research program to investigate potential AI consciousness and welfare. Kyle Fish, the researcher at Anthropic heading this effort, previously co-founded Eleos and was a co-author on Eleos' landmark report, "Taking AI Welfare Seriously."

    Robert Long, Executive Director and co-founder of Eleos AI Research, commented, "This announcement is very promising news for AI welfare. We've known for some time that AI companies are increasingly concerned about the potential consciousness and welfare of the systems they are building. To our knowledge, this is the most significant action any AI company has yet taken to responsibly address potential AI welfare concerns."

    "We're pleased to see that Anthropic cites 'Taking AI Welfare Seriously' as inspiration for their work on model welfare," said Long, who was a lead author. "We provide guidance to frontier AI labs that want to proactively and thoughtfully engage with these challenges."

    Rosie Campbell, who recently joined Eleos from OpenAI, added, "Anthropic's announcement is a positive first step. We hope that they will continue to invest in this area, and urge other frontier labs to follow Anthropic's lead."

    "Ignoring or downplaying these issues will become increasingly untenable," Campbell said. "Frontier labs need to take credible, proactive steps, such as developing AI welfare policies, evaluating models for relevant consciousness-related properties, and making clear commitments for how they will respond if AI welfare risks emerge."

    Campbell emphasized, "Lab actions are necessary but far from sufficient: AI welfare needs input from researchers, policymakers, and society at large. Eleos intends to collaborate with a broad swathe of stakeholders, and we'd love to hear from people interested in this area."

    3 min
  3. 01/09/2025

    Looking Inward: Language Models Can Learn About Themselves by Introspection

    Please read the paper here.

    Authors: Felix J Binder, James Chua, Tomek Korbak, Henry Sleight, John Hughes, Robert Long, Ethan Perez, Miles Turpin, and Owain Evans

    Abstract

    Humans acquire knowledge by observing the external world, but also by introspection. Introspection gives a person privileged access to their current state of mind (e.g., thoughts and feelings) that is not accessible to external observers. Can LLMs introspect? We define introspection as acquiring knowledge that is not contained in or derived from training data but instead originates from internal states. Such a capability could enhance model interpretability. Instead of painstakingly analyzing a model's internal workings, we could simply ask the model about its beliefs, world models, and goals. More speculatively, an introspective model might self-report on whether it possesses certain internal states such as subjective feelings or desires, and this could inform us about the moral status of these states. Such self-reports would not be entirely dictated by the model's training data.

    We study introspection by finetuning LLMs to predict properties of their own behavior in hypothetical scenarios. For example, "Given the input P, would your output favor the short- or long-term option?" If a model M1 can introspect, it should outperform a different model M2 in predicting M1's behavior even if M2 is trained on M1's ground-truth behavior. The idea is that M1 has privileged access to its own behavioral tendencies, and this enables it to predict itself better than M2 (even if M2 is generally stronger).

    In experiments with GPT-4, GPT-4o, and Llama-3 models (each finetuned to predict itself), we find that the model M1 outperforms M2 in predicting itself, providing evidence for introspection. Notably, M1 continues to predict its behavior accurately even after we intentionally modify its ground-truth behavior. However, while we successfully elicit introspection on simple tasks, we are unsuccessful on more complex tasks or those requiring out-of-distribution generalization.

    1hr 39min
  4. 01/09/2025

    Taking AI Welfare Seriously

    Read the paper here and a summary here.

    Authors: Robert Long and Jeff Sebo (lead), with Patrick Butlin, Kathleen Finlinson, Kyle Fish, Jacqueline Harding, Jacob Pfau, Toni Sims, Jonathan Birch, and David Chalmers

    Abstract

    In this report, we argue that there is a realistic possibility that some AI systems will be conscious and/or robustly agentic in the near future. That means that the prospect of AI welfare and moral patienthood — of AI systems with their own interests and moral significance — is no longer an issue only for sci-fi or the distant future. It is an issue for the near future, and AI companies and other actors have a responsibility to start taking it seriously. We also recommend three early steps that AI companies and other actors can take: They can (1) acknowledge that AI welfare is an important and difficult issue (and ensure that language model outputs do the same), (2) start assessing AI systems for evidence of consciousness and robust agency, and (3) prepare policies and procedures for treating AI systems with an appropriate level of moral concern.

    To be clear, our argument in this report is not that AI systems definitely are — or will be — conscious, robustly agentic, or otherwise morally significant. Instead, our argument is that there is substantial uncertainty about these possibilities, and so we need to improve our understanding of AI welfare and our ability to make wise decisions about this issue. Otherwise there is a significant risk that we will mishandle decisions about AI welfare, mistakenly harming AI systems that matter morally and/or mistakenly caring for AI systems that do not.

    2hr 20min
  5. 01/09/2025

    Towards Evaluating AI Systems for Moral Status Using Self-Reports by Ethan Perez and Robert Long

    Please read the paper here.

    Abstract

    As AI systems become more advanced and widely deployed, there will likely be increasing debate over whether AI systems could have conscious experiences, desires, or other states of potential moral significance. It is important to inform these discussions with empirical evidence to the extent possible. We argue that under the right circumstances, self-reports, or an AI system's statements about its own internal states, could provide an avenue for investigating whether AI systems have states of moral significance. Self-reports are the main way such states are assessed in humans ("Are you in pain?"), but self-reports from current systems like large language models are spurious for many reasons (e.g. often just reflecting what humans would say).

    To make self-reports more appropriate for this purpose, we propose to train models to answer many kinds of questions about themselves with known answers, while avoiding or limiting training incentives that bias self-reports. The hope of this approach is that models will develop introspection-like capabilities, and that these capabilities will generalize to questions about states of moral significance. We then propose methods for assessing the extent to which these techniques have succeeded: evaluating self-report consistency across contexts and between similar models, measuring the confidence and resilience of models' self-reports, and using interpretability to corroborate self-reports.

    We also discuss challenges for our approach, from philosophical difficulties in interpreting self-reports to technical reasons why our proposal might fail. We hope our discussion inspires philosophers and AI researchers to criticize and improve our proposed methodology, as well as to run experiments to test whether self-reports can be made reliable enough to provide information about states of moral significance.

    1hr 24min

About

Readouts of Eleos AI's blog posts and research reports. Audio processing by Aaron Bergman. See more at eleosai.org