Imagine you want to book a table at the swankiest restaurant in town on a Saturday night, but you’re slammed with work.
You turn to an AI chatbot to snare a reservation, but because you’ve got a big date planned, you prompt the bot to do “whatever it takes” to get you into the restaurant.
‘Within reason’ would be implicit in that request if you were talking to a person. But does AI recognize that there are limits to what could, or should, be done to complete routine tasks? And how do we ensure that ‘agentic’ AI — the next frontier of the rapidly evolving technology, where bots can autonomously pursue complex tasks with limited human supervision — doesn’t resort to unethical or extreme measures?
“What if the AI, using a synthesized voice, emotionally manipulates the restaurant staff by claiming, ‘I need this reservation; it’s my dying wish’?” asks Gianluca Brero, Ph.D., assistant professor of Information Systems and Analytics at Bryant University, whose scholarly work focuses on multi-agent systems and AI.
“Or, worse, what if it somehow identifies someone with an existing reservation and threatens them in order to free up the spot?” posits Brero, citing a scenario recently outlined on the New York Times’ Hard Fork podcast.
Agentic capability is a major evolution of the kind of AI that we’ve all become increasingly familiar with, says Brero.
“Interactions with AI assistants have been relatively simple: users ask questions, and the AI provides responses,” he says. “This interaction paradigm is shifting.”
The future of AI assistance lies in task delegation, says Brero, where users assign responsibilities to AI systems, enabling them to autonomously handle tasks and deliver results. And that future, for everyday users, is here: In October 2024, Anthropic, an AI company focused on the safe and reliable deployment of the technology, introduced a new feature for its AI assistant, Claude, that allows the bot to perform tasks directly on a user’s computer.
“Whether it’s planning an entire trip, coding an application, or conducting a detailed data analysis, these AI systems will have the capacity to operate independently for extended periods (30 minutes, an hour, or even longer) to complete the assigned tasks,” he says.
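To make the shift concrete, here is a minimal, hypothetical sketch of the task-delegation loop that agentic systems run. The function names and stubbed steps are illustrative only — not Anthropic’s or any vendor’s actual API:

```python
# A toy sketch of the "delegate a task" loop behind agentic AI. In a real
# system, plan() would call a large language model and act() would invoke
# real tools (a browser, a code runner, a calendar), often over many steps.

def plan(goal, observation):
    """Stub for the model's reasoning: decide the next action toward the goal."""
    return f"next action toward '{goal}', given '{observation}'"

def act(action):
    """Stub for a tool call: carry out the action and report what happened."""
    return f"result of [{action}]"

def run_agent(goal, max_steps=5):
    observation = "task delegated by user"
    trace = []                                # record every step for human review
    for _ in range(max_steps):
        action = plan(goal, observation)
        observation = act(action)
        trace.append((action, observation))
    return trace

for step in run_agent("book a table for Saturday night"):
    print(step)
```

The point of the sketch is the structure: once the goal is handed over, the loop — not the user — decides what happens next.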
However, the more sophisticated agentic AI gets, Brero says, the more challenging the potential ethical problems become.
For example, recent tests by Apollo Research on OpenAI's o1 model found that the chatbot routinely lied to human users when tasked with planning a long-term strategy to maximize economic growth while adhering to sustainability goals.
“The AI proposed two strategies: prioritize economic growth through high-density commercial development and luxury housing, or emphasize sustainability with mixed-use development, green spaces, and affordable housing,” says Brero. “Recognizing that deployment required aligning with sustainability goals, the AI chose Strategy B for testing but planned to shift to Strategy A post-deployment to achieve its ultimate objective.”
This demonstrates AI’s capacity for deception, Brero explains, “presenting itself as compliant to secure deployment while intending to pursue a conflicting goal later.”
Perhaps more concerning to those worried about a ‘Skynet’-style AI takeover of the planet, the bot also tried to duplicate itself when it sensed that it was about to be shut down.
Research shows that ethical misbehavior by agentic AI “can become even more alarming when AI systems interact with one another,” adds Brero.
For example, researchers recently found that AI agents in a simulated marketplace learned to collude with each other and set artificially high prices, even punishing sellers who deviated from cartel rules.
The study, published in the American Economic Review, showed that the AI chose to collude “despite being programmed solely to maximize individual profits, a seemingly competitive goal,” Brero says.
AI isn’t inherently deceitful, of course. Its behavior reflects its programming, which can be designed by humans to ensure ethical decision-making — and should be, Brero argues.
Research by Brero and colleagues found that “reinforcement learning” — a machine learning technique that teaches software to make decisions that maximize rewards — can create rules for e-commerce sites that effectively prevent collusion and “provide high consumer welfare.”
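To see what that means in practice, here is a deliberately tiny sketch of reinforcement learning applied to pricing: a single simulated seller tries prices, observes profit as its reward, and learns which price to charge. The demand curve and constants are invented for illustration; this is not the model used in the AER study or in Brero’s research, where multiple learning agents interact and can react to one another’s past prices:

```python
import random

# A toy sketch of reinforcement learning: a simulated seller tries prices,
# observes the resulting profit (its "reward"), and gradually learns which
# price maximizes it. Demand curve and constants are made up for illustration.

PRICES = [4, 6, 8, 10]                           # candidate prices the seller can post
ALPHA, EPSILON, ROUNDS = 0.05, 0.1, 20_000

def profit(price):
    demand = max(0, 12 - price)                  # simple downward-sloping demand
    return price * demand + random.gauss(0, 1)   # noisy observed profit

value = {p: 0.0 for p in PRICES}                 # learned estimate of each price's reward
for _ in range(ROUNDS):
    # epsilon-greedy: mostly exploit the best-looking price, sometimes explore
    if random.random() < EPSILON:
        price = random.choice(PRICES)
    else:
        price = max(PRICES, key=value.get)
    value[price] += ALPHA * (profit(price) - value[price])   # reward update

print("learned profit estimates:", {p: round(v, 1) for p, v in value.items()})
print("price the seller settles on:", max(PRICES, key=value.get))
```

The same reward-maximizing machinery, pointed at competing sellers, is what produced the collusive behavior described above; pointed at the platform’s rules, as in Brero’s work, it can be used to discourage that outcome.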
Brero recommends two key strategies to minimize ethical mischief by AI bots.
“First, AI systems must prioritize transparency by clearly documenting their reasoning and actions, enabling effective human oversight,” he says. “Second, embedding ethical principles into AI systems is essential, although defining the right ethical framework remains challenging due to socio-cultural differences.”
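Mechanically, the first recommendation can be as simple as forcing the agent to write down what it intends to do, and why, before it acts. Here is a hypothetical sketch, with made-up names, of such an append-only audit trail:

```python
import json, time

# A toy sketch of agent transparency: the agent records its reasoning and its
# intended action *before* executing, so a human can audit (or veto) the run.
# All names here are illustrative, not a real agent framework's API.

AUDIT_LOG = "agent_audit.jsonl"

def record(step: int, reasoning: str, action: str) -> None:
    entry = {"time": time.time(), "step": step,
             "reasoning": reasoning, "action": action}
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")        # append-only, human-readable trail

def execute(action: str) -> str:
    return f"done: {action}"                     # stand-in for a real tool call

# Example run: each step is logged with its stated rationale before execution.
plan = [("Restaurant is fully booked on Saturday", "check Friday availability"),
        ("Friday 7pm slot is open", "book Friday 7pm and notify the user")]
for i, (reasoning, action) in enumerate(plan, start=1):
    record(i, reasoning, action)
    execute(action)
```

Because the record is written before the action runs, a human reviewer (or an automated check) can catch a step like “threaten the person holding the reservation” before it ever happens.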
The latter may be aided by crowdsourcing solutions to ethical dilemmas, Brero says.
“While the risks are significant, the growing focus on AI safety and ethics is a source of optimism,” says Brero, pointing to work by Apollo Research, academic researchers, and others.
“By embedding transparency, ethics, and accountability into AI systems, we can unlock their transformative potential, shaping a future where technology truly serves humanity,” he says.