OpenAI CEO Sam Altman speaks during the Microsoft Build conference at the Seattle Convention Center Summit Building in Seattle, Washington on May 21, 2024.

AI benchmarking organization criticized for waiting to disclose funding from OpenAI

An organization developing math benchmarks for AI didn’t disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI’s mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.

In a post on the forum LessWrong, a contractor for Epoch AI going by the username “Meemi” says that many contributors to the FrontierMath benchmark weren’t informed of OpenAI’s involvement until it was made public.

“The communication about this has been non-transparent,” Meemi wrote. “In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark.”

On social media, some users raised concerns that the secrecy could erode FrontierMath’s reputation as an objective benchmark. In addition to backing FrontierMath, OpenAI had visibility into many of the problems and solutions in the benchmark — a fact that Epoch AI didn’t divulge prior to December 20, when o3 was announced.

In a post on X, Stanford PhD mathematics student Carina Hong also alleged that OpenAI has privileged access to FrontierMath thanks to its arrangement with Epoch AI, and that this isn’t sitting well with some contributors.

“Six mathematicians who significantly contributed to the FrontierMath benchmark confirmed [to me] … that they are unaware that OpenAI will have exclusive access to this benchmark (and others won’t),” Hong said. “Most express they are not sure they would have contributed had they known.”


In a reply to Meemi’s post, Tamay Besiroglu, associate director of Epoch AI and one of the organization’s co-founders, asserted that the integrity of FrontierMath hadn’t been compromised, but admitted that Epoch AI “made a mistake” in not being more transparent.

“We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible,” Besiroglu wrote. “Our mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI.”

Besiroglu added that while OpenAI has access to FrontierMath, it has a “verbal agreement” with Epoch AI not to use FrontierMath’s problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also has a “separate holdout set” that serves as an additional safeguard for independent verification of FrontierMath benchmark results, Besiroglu said.

“OpenAI has … been fully supportive of our decision to maintain a separate, unseen holdout set,” Besiroglu wrote.

However, muddying the waters, Epoch AI lead mathematician Elliot Glazer noted in a post on Reddit that Epoch AI hasn't been able to independently verify OpenAI's FrontierMath o3 results.

“My personal opinion is that [OpenAI’s] score is legit (i.e., they didn’t train on the dataset), and that they have no incentive to lie about internal benchmarking performances,” Glazer said. “However, we can’t vouch for them until our independent evaluation is complete.”

The saga is yet another example of the challenge of developing empirical benchmarks to evaluate AI — and securing the necessary resources for benchmark development without creating the perception of conflicts of interest.

