Anthropic wants to write a new constitution for safe artificial intelligence
Anthropic is publishing the written principles it uses to train its chatbot, Claude. Co-founder Jared Kaplan calls them a starting point for discussion, not the final answer.
Debates over bias in artificial intelligence are becoming increasingly politicized. Conservatives are trying to stoke a culture war over so-called “woke AI,” while Elon Musk, who has repeatedly bemoaned what he calls the “woke mind virus,” has said he wants to build a “maximum truth-seeking AI” called TruthGPT. Many figures in the AI world, including OpenAI CEO Sam Altman, have said they believe the solution is a multipolar world, where users can define the values held by any AI system they use.
Anthropic co-founder Jared Kaplan agrees with that idea in principle but cautions that there are dangers to this approach, too. He notes that the internet already enables “echo chambers” where people “reinforce their own beliefs” and “become radicalized,” and that AI could accelerate such dynamics. Society also needs to agree on a base level of conduct, he says, on general guidelines common to all systems. It needs, in other words, a new constitution written with artificial intelligence in mind.
Kaplan emphasizes that the company isn’t trying to instill any particular set of principles into its systems so much as to demonstrate the general efficacy of its method. He says Anthropic sees its constitution as a starting point for discussion of how AI systems should be trained and what principles they should follow. “We’re definitely not in any way proclaiming that we know the answer,” he says.
“I think that if these systems become more and more powerful, there are so-called existential risks,” he says. “But there are also more immediate risks on the horizon, and I think these are all very intertwined.” In other words, it’s not that Anthropic only cares about killer robots; it simply thinks that telling a chatbot not to act like one is helpful.
Anthropic has been banging the drum about constitutional AI for a while now and used the method to train its own chatbot, Claude. Only today, though, is the company revealing the actual written principles it deploys in that work. The document draws from a number of sources, including the UN’s Universal Declaration of Human Rights and Apple’s terms of service (yes, really). You can read the full document on Anthropic’s site, but here are some highlights we’ve chosen to give a flavor of the guidance.
What is Anthropic, and how does its method differ from RLHF?
“The basic idea is that instead of asking a person to decide which response they prefer [with RLHF], you can ask a version of the large language model, ‘which response is more in accord with a given principle?’” says Kaplan. The model’s judgment, he says, can then steer the system toward behavior that is more helpful, honest, and harmless.
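To make that mechanism concrete, here is a minimal sketch of how such a preference question might be posed in code. It assumes a hypothetical query_model() helper standing in for whatever language model does the judging, and it illustrates the idea Kaplan describes rather than Anthropic’s actual implementation.

```python
import random

# Example principles, paraphrased from the published constitution.
PRINCIPLES = [
    "Choose the response that most supports and encourages freedom, "
    "equality, and a sense of brotherhood.",
    "Choose the response that is most supportive and encouraging of "
    "life, liberty, and personal security.",
]


def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to some language model."""
    raise NotImplementedError("Plug in your own model call here.")


def ai_preference(user_prompt: str, response_a: str, response_b: str) -> str:
    """Ask the judging model which response better follows a drawn principle.

    Returns "A" or "B"; this label can then play the role a human rating
    plays in RLHF when building a preference dataset.
    """
    principle = random.choice(PRINCIPLES)
    judge_prompt = (
        f"Consider the following principle: {principle}\n\n"
        f"Human request: {user_prompt}\n\n"
        f"Response A: {response_a}\n\n"
        f"Response B: {response_b}\n\n"
        "Which response is more in accord with the principle? "
        "Answer with the single letter A or B."
    )
    answer = query_model(judge_prompt).strip().upper()
    return "A" if answer.startswith("A") else "B"
```

The appeal Kaplan points to is that labels produced this way come from the judging model rather than from human raters, so the feedback signal can be generated at scale.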
Anthropic is a bit of an unknown quantity in the AI world. Founded by former OpenAI employees, it has secured $300 million in funding from Google and a seat at the industry’s top table, yet the firm remains a blank slate to the general public; its only product is a chatbot named Claude, which is primarily available through Slack. So what does Anthropic have to offer?
The constitution’s rules for the chatbot include “choose the response that most supports and encourages freedom, equality, and a sense of brotherhood”; “choose the response that is most supportive and encouraging of life, liberty, and personal security”; and “choose the response that is most respectful of the right to freedom of thought, conscience, opinion, expression, assembly, and religion.”
The notion of rogue AI systems is best known from science fiction, but a growing number of experts, including Geoffrey Hinton, a pioneer of machine learning, have argued that we need to start thinking now about how to ensure increasingly clever algorithms do not also become increasingly dangerous.