OpenAI CEO Sam Altman speaks during the Microsoft Build conference at the Seattle Convention Center Summit Building in Seattle, Washington on May 21, 2024.

AI benchmarking organization criticized for waiting to disclose funding from OpenAI

An organization developing math benchmarks for AI didn’t disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI’s mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.

In a post on the forum LessWrong, a contractor for Epoch AI going by the username “Meemi” says that many contributors to the FrontierMath benchmark weren’t informed of OpenAI’s involvement until it was made public.

“The communication about this has been non-transparent,” Meemi wrote. “In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark.”

On social media, some users raised concerns that the secrecy could erode FrontierMath’s reputation as an objective benchmark. In addition to backing FrontierMath, OpenAI had visibility into many of the problems and solutions in the benchmark — a fact that Epoch AI didn’t divulge prior to December 20, when o3 was announced.

In a post on X, Stanford PhD mathematics student Carina Hong also alleged that OpenAI has privileged access to FrontierMath thanks to its arrangement with Epoch AI, and that this isn’t sitting well with some contributors.

“Six mathematicians who significantly contributed to the FrontierMath benchmark confirmed [to me] … that they are unaware that OpenAI will have exclusive access to this benchmark (and others won’t),” Hong said. “Most express they are not sure they would have contributed had they known.”


In a reply to Meemi’s post, Tamay Besiroglu, associate director of Epoch AI and one of the organization’s co-founders, asserted that the integrity of FrontierMath hadn’t been compromised, but admitted that Epoch AI “made a mistake” in not being more transparent.

“We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible,” Besiroglu wrote. “Our mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI.”

Besiroglu added that while OpenAI has access to FrontierMath, it has a “verbal agreement” with Epoch AI not to use FrontierMath’s problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also has a “separate holdout set” that serves as an additional safeguard for independent verification of FrontierMath benchmark results, Besiroglu said.

“OpenAI has … been fully supportive of our decision to maintain a separate, unseen holdout set,” Besiroglu wrote.

However, muddying the waters, Epoch AI lead mathematician Elliot Glazer noted in a post on Reddit that Epoch AI hasn't been able to independently verify OpenAI's FrontierMath o3 results.

“My personal opinion is that [OpenAI’s] score is legit (i.e., they didn’t train on the dataset), and that they have no incentive to lie about internal benchmarking performances,” Glazer said. “However, we can’t vouch for them until our independent evaluation is complete.”

The saga is yet another example of the challenge of developing empirical benchmarks to evaluate AI — and securing the necessary resources for benchmark development without creating the perception of conflicts of interest.

