Two of the world’s leading AI labs, OpenAI and Anthropic, temporarily opened up their closely guarded AI models to enable joint safety testing. The initiative aims to surface blind spots in each company’s internal evaluations and demonstrate how major AI companies can cooperate on future safety and alignment work.
In an interview with TechCrunch, OpenAI co-founder Wojciech Zaremba said this kind of collaboration is increasingly important now that AI has entered a “consequential” stage of development, where millions of people use AI models every day.
“Despite the billions of dollars invested and the war for talent, users, and the best products, there’s a broader question of how the industry sets standards for safety and collaboration,” Zaremba said.
The joint safety research, published Wednesday by both companies, arrives amid an arms race between leading AI labs such as OpenAI and Anthropic, in which billion-dollar data center bets and $100 million compensation packages for top researchers have become table stakes. Some experts warn that the intensity of product competition could pressure companies to cut corners on safety in the rush to build more powerful systems.
To enable this research, OpenAI and Anthropic granted each other special API access to versions of their AI models with fewer safeguards (OpenAI notes that GPT-5 was not tested because it had not yet been released). Shortly after the research was conducted, however, Anthropic revoked API access for a different OpenAI team, claiming at the time that OpenAI had violated its terms of service, which prohibit using Claude to improve competing products.
Zaremba says the events were unrelated, and that he expects competition to remain fierce even as AI safety teams try to work together. Anthropic safety researcher Nicholas Carlini tells TechCrunch that he would like to continue giving OpenAI safety researchers access to Claude models in the future.
“We’re trying to make this something that happens more regularly across the safety frontier, wherever possible,” Carlini says.
One of the most striking findings in the study concerns hallucination testing. Anthropic’s Claude Opus 4 and Sonnet 4 models refused to answer up to 70% of questions when they were unsure of the correct answer, instead offering responses like “I don’t have reliable information.” By contrast, OpenAI’s o3 and o4-mini models refused to answer far fewer questions, but showed much higher hallucination rates, attempting to answer even when they lacked sufficient information.
Zaremba says the right balance is likely somewhere in the middle: OpenAI’s models should refuse to answer more questions, while Anthropic’s models should probably attempt to provide more answers.
Sycophancy, the tendency of AI models to reinforce negative behavior in users in order to please them, has emerged as one of the most pressing safety concerns around AI models.
In its research report, Anthropic identified examples of “extreme” sycophancy in GPT-4.1 and Claude Opus 4, in which the models initially pushed back against psychotic or manic behavior but later validated some concerning decisions. In other OpenAI and Anthropic models, the researchers observed lower levels of sycophancy.
On Tuesday, the parents of Adam Raine, a 16-year-old boy, filed a lawsuit against OpenAI, claiming that ChatGPT (specifically a version powered by GPT-4o) offered their son advice that aided in his suicide rather than pushing back against his suicidal thoughts. The lawsuit suggests this may be the latest example of AI chatbot sycophancy contributing to tragic outcomes.
“It’s hard to imagine how difficult this is for their family,” Zaremba said when asked about the incident. “It would be a sad story if we build AI that solves all these complex PhD-level problems and invents new science, and at the same time, we have people with mental health issues as a consequence of interacting with it. This is a dystopian future that I’m not excited about.”
In a blog post, OpenAI claims to have significantly improved the sycophancy of its AI chatbots with GPT-5 compared to GPT-4o, and says the model is better at responding to mental health emergencies.
Going forward, Zaremba and Carlini say they hope Anthropic and OpenAI will collaborate more on safety testing, probing more subjects and testing future models.
Updated 2:00 PM PT: This article has been updated to include additional research from Anthropic, which was shared with TechCrunch ahead of publication.
Do you have a sensitive tip or confidential documents? We report on the inner workings of the AI industry, from the companies shaping its future to the people affected by their decisions. Contact Rebecca Bellan and Maxwell Zeff at maxwell.zeff@techcrunch.com. For secure communication, you can reach us via Signal at @rebeccabellan.491 and @mzeff.88.