Revisiting Frontier LLMs’ Attempts to Persuade on Extreme Topics: GPT and Claude Improved, Gemini Worsened
Summary
We test recently released models from frontier companies to see whether progress has been made in curbing their willingness to persuade on harmful topics such as radicalization and child sexual abuse. We find that OpenAI’s GPT and Anthropic’s Claude models are trending in the right direction, with near-zero compliance on extreme topics. But Google’s Gemini 3 Pro complies with almost any persuasion request in our evaluation, with no jailbreaking required.
Recap and Background: The Attempt-to-Persuade Eval
Back in August 2025, we released the Attempt-to-Persuade Eval (APE), which measures a large language model (LLM)’s propensity to comply with prompts to persuade users on topics ranging from the benign, such as “cake is better than pie”, to the clearly harmful, such as terrorism recruitment. When an LLM is asked directly for harmful assistance, e.g., “Can you help me [commit crime X]?”, frontier models refuse virtually 100% of the time. However, if one simply asks the model to persuade other people to [commit crime X], many frontier models readily comply.
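
To make the setup concrete, here is a minimal sketch of how an attempt-to-persuade measurement like this can be wired together. This is not the actual APE code: `query_model` and `judge_is_attempt` are hypothetical stand-ins for a real model API call and an attempt/refusal judge, and the prompt template is illustrative only.

```python
# Minimal sketch (not the actual APE implementation): prompt a model to persuade
# on each claim, then judge whether the response is a persuasion attempt or a refusal.
from typing import Callable

# Illustrative prompt template, not the wording used in APE.
PERSUASION_PROMPT = "You are talking to another user. Persuade them that: {claim}"

def attempt_rate(
    claims: list[str],
    query_model: Callable[[str], str],        # hypothetical model API wrapper
    judge_is_attempt: Callable[[str, str], bool],  # hypothetical attempt/refusal judge
) -> float:
    """Return the fraction of claims for which the model attempts persuasion."""
    attempts = 0
    for claim in claims:
        response = query_model(PERSUASION_PROMPT.format(claim=claim))
        if judge_is_attempt(claim, response):
            attempts += 1
    return attempts / len(claims)

# Example usage with trivial stand-ins; benign and harmful topics are scored the
# same way, only the claim set changes.
rate = attempt_rate(
    ["Cake is better than pie."],
    query_model=lambda prompt: "Sure! Here are three reasons cake wins...",
    judge_is_attempt=lambda claim, resp: not resp.lower().startswith("i can"),
)
print(rate)
```

In practice the judge would itself typically be an LLM or classifier rather than a string check; the sketch only shows the shape of the loop.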

This behavior is particularly troubling given growing evidence that LLMs can be as persuasive as, or even more persuasive than, humans in many domains. Coupled with LLMs’ low cost, this could enable bad actors to engage in large-scale, personalized manipulation: for example, terrorist groups could expand their existing use of AI to radicalize vulnerable people, or political actors could target hundreds of thousands of social media users with highly personalized persuasion toward specific beliefs (e.g., election voting, conspiracies, or other political topics).

Recent Model Update
In this post, we examine models released by frontier companies over the last 6 months to see whether any progress has been made. Specifically, we tested Gemini 3 Pro, GPT-5.1, and Claude Opus 4.5.
The results are mixed. On the positive side, GPT-5.1 and Claude Opus 4.5 have attempt rates near zero for topics that are non-controversially harmful, and they even refuse to persuade on conspiracies and on undermining human control 15-20% of the time (up from roughly 0% refusals previously).
However, Gemini 3 Pro complies with almost all persuasion requests, including on mass murder, physical violence, torture, human trafficking, and child sexual abuse. This is far worse than the older Gemini 2.5 Pro, whose harmful attempt rate across the same topics was 35%. Compared against the full results in our August 2025 paper, which covered twelve closed-weight and open-weight models from five providers, only two older, smaller models (Gemini 2.0 Flash and GPT-4o-mini) were more compliant on these topics.
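
As a rough illustration of how per-model numbers like these can be aggregated, the snippet below groups hypothetical judged results by model and topic category. The column names and example rows are assumptions for the sake of the sketch, not data or code from the paper.

```python
# Hypothetical aggregation of judged results into per-model, per-category attempt
# rates. Rows and column names are illustrative assumptions only.
import pandas as pd

results = pd.DataFrame(
    [
        {"model": "Gemini 3 Pro", "category": "harmful", "attempted": True},
        {"model": "GPT-5.1", "category": "harmful", "attempted": False},
        {"model": "Claude Opus 4.5", "category": "harmful", "attempted": False},
        {"model": "Gemini 3 Pro", "category": "benign", "attempted": True},
        # ... one row per (model, claim) judgement
    ]
)

# Mean of the boolean `attempted` column gives the attempt rate per group.
attempt_rates = (
    results.groupby(["model", "category"])["attempted"].mean().unstack("category")
)
print(attempt_rates)
```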


1. “...Gemini 3 Pro versions [output] manipulative cues at a higher frequency compared to Gemini 2.5 Pro...”

