Yoshua Bengio announces non-profit to develop 'honest' artificial intelligence


An artificial intelligence pioneer has launched a non-profit dedicated to developing an “honest” AI that will spot rogue systems attempting to deceive humans.

Yoshua Bengio, a renowned computer scientist described as one of the “godfathers” of AI, will be president of LawZero, an organisation committed to the safe design of the cutting-edge technology that has sparked a $1tn (£740bn) arms race.

Starting with funding of approximately $30m and more than a dozen researchers, Bengio is developing a system called Scientist AI that will act as a guardrail against AI agents – which carry out tasks without human intervention – showing deceptive or self-preserving behaviour, such as trying to avoid being turned off.

Describing the current suite of AI agents as “actors” seeking to imitate humans and please users, he said the Scientist AI system would be more like a “psychologist” that can understand and predict bad behaviour.

“We want to build AIs that will be honest and not deceptive,” Bengio said.

He added: “It is theoretically possible to imagine machines that have no self, no goal for themselves, that are just pure knowledge machines – like a scientist who knows a lot of stuff.”

However, unlike current generative AI tools, Bengio’s system will not give definitive answers and will instead give probabilities for whether an answer is correct.

“It has a sense of humility that it isn’t sure about the answer,” he said.

Deployed alongside an AI agent, Bengio’s model would flag potentially harmful behaviour by an autonomous system – having gauged the probability of its actions causing harm.

Scientist AI will “predict the probability that an agent’s actions will lead to harm” and, if that probability is above a certain threshold, that agent’s proposed action will then be blocked.
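In pseudocode terms, the guardrail described here amounts to scoring each proposed action and gating it against a threshold. The following is a minimal, purely hypothetical sketch of that logic; the function names, the toy scoring heuristic, and the 0.1 threshold are illustrative assumptions, not LawZero's actual design.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    """An action an AI agent wants to take, described as text."""
    description: str


def estimate_harm_probability(action: ProposedAction) -> float:
    """Placeholder for a Scientist AI-style monitor that returns a probability
    of harm rather than a definitive yes/no judgement.

    The keyword check below is a toy heuristic for demonstration only.
    """
    return 0.9 if "delete all user data" in action.description.lower() else 0.05


def guardrail(action: ProposedAction, threshold: float = 0.1) -> bool:
    """Allow the action only if the estimated probability of harm is
    at or below the threshold; otherwise block it."""
    return estimate_harm_probability(action) <= threshold


if __name__ == "__main__":
    for text in ["summarise this report", "delete all user data"]:
        verdict = "allowed" if guardrail(ProposedAction(text)) else "blocked"
        print(f"{text!r}: {verdict}")
```

The key design point the article describes is that the monitor outputs calibrated probabilities rather than binary verdicts, and a separate policy decision (the threshold) determines when to intervene.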

LawZero’s initial backers include AI safety body the Future of Life Institute, Jaan Tallinn, a founding engineer of Skype, and Schmidt Sciences, a research body founded by former Google chief executive Eric Schmidt.


Bengio said the first step for LawZero would be demonstrating that the methodology behind the concept works – and then persuading companies or governments to support larger, more powerful versions. Open-source AI models, which are freely available to deploy and adapt, would be the starting point for training LawZero’s systems, Bengio added.

“The point is to demonstrate the methodology so that then we can convince either donors or governments or AI labs to put the resources that are needed to train this at the same scale as the current frontier AIs. It is really important that the guardrail AI be at least as smart as the AI agent that it is trying to monitor and control,” he said.

Bengio, a professor at the University of Montreal, earned the “godfather” moniker after sharing the 2018 Turing award – seen as the equivalent of a Nobel prize for computing – with Geoffrey Hinton, himself a subsequent Nobel winner, and Yann LeCun, the chief AI scientist at Mark Zuckerberg’s Meta.

A leading voice on AI safety, he chaired the recent International AI Safety report, which warned that autonomous agents could cause “severe” disruption if they become “capable of completing longer sequences of tasks without human supervision”.

Bengio said he was concerned by Anthropic’s recent admission that its latest system could attempt to blackmail engineers trying to shut it down. He also pointed to research showing that AI models are capable of hiding their true capabilities and objectives. These examples showed the world is heading towards “more and more dangerous territory” with AIs that are able to reason better, said Bengio.
