Physics Supernova: AI Agent Matches Elite Gold Medalists at IPhO 2025

Jiahao Qiu∗¹, Jingzhe Shi∗#², Xinzhe Juan³,⁴, Zelin Zhao⁵, Jiayi Geng¹, Shilong Liu¹, Hongru Wang¹, Sanfeng Wu⁶, Mengdi Wang¹
¹ AI Lab, Princeton University; ² IIIS, Tsinghua University; ³ Shanghai Jiao Tong University; ⁴ University of Michigan; ⁵ King’s College London; ⁶ Department of Physics, Princeton University
arXiv:2509.01659v1 [cs.AI] 1 Sep 2025

ABSTRACT
Physics provides fundamental laws that describe and predict the natural world. AI systems aspiring toward more general, real-world intelligence must therefore demonstrate strong physics problem-solving abilities: formulating and applying physical laws to explain and predict physical processes. The International Physics Olympiad (IPhO), the world’s most prestigious physics competition, offers a rigorous benchmark for this purpose. We introduce Physics Supernova, an AI agent system whose physics problem-solving abilities match those of elite IPhO gold medalists. On the IPhO 2025 theory problems, Physics Supernova attains 23.5/30 points, ranking 14th of 406 contestants and surpassing the median performance of human gold medalists. We extensively analyze Physics Supernova’s capabilities and flexibility across diverse physics tasks. These results show that principled tool integration within agent systems can deliver competitive improvements in solving challenging science problems. The code is available at https://github.com/CharlesQ9/Physics-Supernova.

1 Introduction
“The supreme task of the physicist is to arrive at those universal elementary laws from which the
cosmos can be built up by pure deduction.”

— Albert Einstein

Physics aims to formulate the fundamental laws that govern the behavior of the universe, from the dynamics of
subatomic particles and macroscopic objects to the evolution of cosmic structures on the largest scales [1, 2]. Expressed
through the compact formalism of mathematics and physical theory, these laws offer the most concise representations
of the complex physical world and enable reliable predictions of future events [3]. For both humans and AI systems,
mastery of physics entails constructing rigorous abstractions, such as well-defined state variables, conserved quantities,
and causal structures, that underpin the explanation, prediction, and control of physical systems [4, 5]. As AI systems
increasingly integrate into the physical world and advance toward Artificial General Intelligence (AGI) and potentially Artificial Superintelligence (ASI), a deep grounding in physics, along with the ability to solve physics problems through its abstractions, becomes a critical foundation for developing competent and reliable intelligence [6, 7, 8]. In this work, we investigate how to enhance AI systems’ physics problem-solving capabilities using agent-based architectures. To assess progress in this challenging domain, we benchmark our models on the theory problems of the 2025 International Physics Olympiad¹ (IPhO [9]), a globally recognized competition that emphasizes deep conceptual understanding, abstraction, and advanced problem solving in physics.
The International Physics Olympiad (IPhO) [9] is widely viewed as a prestigious physics competition and stands alongside the International Mathematical Olympiad (IMO) [10] in reputation and impact. The 2025 IPhO, held in July in France [11], featured three theory problems designed to test contestants’ conceptual understanding, logical reasoning, and problem-solving skills in physics. Earlier benchmarks for evaluating physics problem-solving abilities [12, 13, 14] may risk data contamination, use coarse evaluation (e.g., final-answer-only scoring), offer limited novelty, or lack assessment of precise figure reading and measurement. In contrast, the IPhO 2025 Theory Problems serve as a rigorous benchmark for evaluating an AI system’s mastery of physics. These problems exhibit several distinctive features: (1) they were newly released in July 2025 [11]; (2) they employ fine-grained, part-level scoring, enabling detailed assessment of reasoning and solution steps; (3) they incorporate uncommon physics models that challenge standard approaches; and (4) they include explicit figure-based measurement requirements (see Table 1), demanding precise interpretation of visual data. In this work, we use the IPhO 2025 Theory Problems as the benchmark for evaluating AI’s capability in physics.

∗ These authors contributed equally to this work.
# IPhO 2021 gold medalist (ranked #10), IPhO 2022 marker.
¹ IPhO 2025 official website: https://www.ipho2025.fr/.
IPhO contestants are provided with external tools, including calculators and tables of fundamental constants, and expert physicists routinely work with additional tools. Both facts highlight the importance of external tool use for solving physics problems and for physicists in general. However, previous work on applying AI systems to physics problems mainly evaluates base Large Language Models (LLMs), at most with simple best-of-N test-time scaling [14, 15, 16]. In this body of work, LLMs are not equipped with any extra tools: only their reasoning, problem-solving ability, and physics knowledge are tested, with the main aim of benchmarking and improving plain LLMs. Such evaluations therefore cannot represent the physics problem-solving ability of state-of-the-art AI architectures, which usually combine LLM agents with tools.
LLM-based agent systems demonstrate significant advantages over standalone LLMs in planning, generalization, and complex reasoning, and have been used to perform complicated tasks with little human interaction [17, 18]. For example, ReAct [19] introduces the Reasoning-Acting loop for agents. More recently, self-evolving agents have been proposed that rely even less on human-predefined designs [20, 21, 22, 23]. However, this line of work mainly examines general-purpose agents, focusing on virtual-assistant-style tasks (e.g., GAIA [24]) or on other domains such as math problems [25] and history [26]; it does not cover complex physics problems.
In this work, focusing on physics reasoning and problem solving, we introduce Physics Supernova, an agent system that equips LLMs with physics-oriented tools to improve their reasoning about the physical world. For understanding schematic diagrams and accurately extracting data from figures, we add the ImageAnalyzer tool; for self-review and refinement, we provide the AnswerReviewer tool. With these tools, Physics Supernova achieves gold-medal-level performance on the IPhO 2025 Theory Problems: it ranks in the top 10% on each of the three problems and ranks 14th among 406 human contestants on the theory part overall. Moreover, we explore improving Physics Supernova’s problem-solving skills with specialized physics QA tools (e.g., WolframAlpha [27]). We summarize our main contributions as follows:

1. We develop Physics Supernova, an agent system combining Large Language Models with physics-specific
tools, exhibiting strong problem-solving capabilities across a wide range of tasks.
2. We show that Physics Supernova achieves gold-medalist-level performance on IPhO 2025 Theory Problems,
ranking 14th among all 406 human contestants worldwide, exceeding the official gold medalist median score.
3. We conduct further analysis and experiments showing Physics Supernova’s capability and flexibility on physics problem-solving tasks.

2 Method
2.1 Agent Architecture

We introduce Physics Supernova, an AI agent system designed to solve complex physics theory problems. It follows the CodeAgent architecture from the smolagents framework [28]. As illustrated in Figure 1, the system consists of a Manager Agent $M$ and a set of domain-specific tools $D = \{d_i\}_{i=1}^{n}$, where each $d_i$ is a physics-problem-oriented tool. Unlike prior work in mathematical problem solving that often relies on fixed, manually hard-coded workflows [29, 30], our approach emphasizes flexible self-planning. Inspired by the design philosophy of minimal predefinition and maximal autonomy [21], the Manager Agent is granted access to the tool set $D$ but is not given a hard-coded execution graph or action script; instead, it decides which tools to call according to the progress it has made in solving the problem.
When presented with a physics theory problem $Q = \{(q_j, s_j)\}_{j=1}^{m}$, where each $q_j$ is a sub-question and $s_j$ contains the associated visual data, the agent solves the problem gradually, continually updating its trajectory $F$ of known information, assumptions, and relevant physical context.
[Figure 1 image: Physics Supernova execution pipeline. Theory Problem → Manager Agent (CodeAct loop: Thought → Action → Observation) → Final Answer; tools: ImageAnalyser, AnswerReviewer, WolframQuery & CodeGenerator. Example trace: Thought "First, I need to read the coordinates (r in kpc, v_c in km/s) …"; Action "Calling tool: ImageAnalyser with arguments …"; Observation "r-coordinate of the peak is 1 kpc...".]
Figure 1: Physics Supernova, our proposed agent system for solving physics theory problems.

Problem (topic)                               Part    Content {# Scoring Points†}           Novelty  Difficulty  Image Ability  Top 10% Score
Theory Problem 1 (Modern Physics)             Part A  Bohr Model {18}                       Common   Low         No Image       9.0/10.0
                                              Part B  Rotation of a Galaxy {18}             Rare     Med         Accurate Meas
                                              Part C  Mass Distribution of a Galaxy {24}    Rare     Med         Accurate Meas
                                              Part D  Tully-Fisher Relation & MOND {23}     Novel    Med         Rough Meas
Theory Problem 2 (Thermodynamics & Dynamics)  Part A  Pulling on a Submerged Tube {9}       Rare     Med         Understand     5.0/10.0
                                              Part B  Two-Part Barometric Tube {10}         Rare     High        Understand
                                              Part C  Cox’s Timepiece {38}                  Novel    High        Understand
Theory Problem 3 (Thermodynamics & Dynamics)  Part A  Nucleation & Growth of Bubbles {26}   Rare     Med         Rough Meas     8.0/10.0
                                              Part B  Acoustic Emission of Bubbles {20}     Rare     Med         Understand
                                              Part C  Popping Champagne {19}                Rare     Med         Understand

Table 1: Theory problems of IPhO 2025 [11]: subpart contents (with the number of scoring points† for each subpart, obtained from the official solutions at https://ipho.olimpicos.net/), physics model novelty (among Physics Olympiads), computation difficulty, required image-reading skills, and top-10% human scores. Novelty: Common means the model appears often in Physics Olympiads; Rare means it rarely appeared in previous Physics Olympiads; Novel means a highly novel model appearing in Physics Olympiads almost for the first time. Computation difficulty: Low means easy to compute, without tedious details, so a correct result is easy to obtain; Med means the problem requires a moderate amount of complicated computation; High means tedious computation and careful application of formulas are required, and lapses can cost many points. Image reading: No Image means there is no image; Understand means contestants must understand the image but need not take measurements; Rough Meas requires a rough, non-precise measurement from the provided image; Accurate Meas requires an accurate reading from the image. Examples of these problems are shown in Appendix A.

The agent then operates in an iterative Reason–Act loop. In each round, it generates a natural-language reasoning step that describes the current objective and justifies the selection of a tool $d_i$ (Reason); it then produces code that calls the tool, yielding an intermediate observation $o_t$ (Act). This observation is incorporated into the next reasoning step, allowing the agent to refine its understanding of the problem. The loop continues until final answers are produced for all sub-questions in $Q$.
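The Reason–Act loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation or the smolagents API: a `policy` callable stands in for the LLM’s Reason step, tools are plain functions keyed by name, and an action with `tool=None` signals a final answer. The smolagents CodeAgent instead executes model-written Python at each step; dispatching a named tool with a string argument is a simplification for brevity. The tool name `image_analyzer`, the scripted policy, and the stubbed reply are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Action:
    thought: str          # Reason: why this step is taken
    tool: Optional[str]   # tool to call; None means "give the final answer"
    argument: str = ""    # tool input, or the final answer when tool is None


@dataclass
class Trajectory:
    """Running record F of actions taken and observations received."""
    history: List[Tuple[Action, str]] = field(default_factory=list)


# Stand-in for the LLM: sees the sub-question and trajectory, proposes an action.
Policy = Callable[[str, Trajectory], Action]
Tool = Callable[[str], str]


def reason_act(question: str, policy: Policy, tools: Dict[str, Tool],
               max_rounds: int = 10) -> Tuple[str, Trajectory]:
    """Alternate Reason (policy) and Act (tool call) until a final answer."""
    trajectory = Trajectory()
    for _ in range(max_rounds):
        action = policy(question, trajectory)              # Reason
        if action.tool is None:
            return action.argument, trajectory             # final answer
        tool = tools.get(action.tool)
        if tool is None:
            observation = f"error: unknown tool {action.tool!r}"
        else:
            observation = tool(action.argument)            # Act, producing o_t
        trajectory.history.append((action, observation))   # feeds the next Reason
    raise RuntimeError("no final answer within max_rounds")


# Usage with a scripted policy and a stubbed tool (both hypothetical).
def scripted_policy(question: str, trajectory: Trajectory) -> Action:
    if not trajectory.history:
        return Action("I need the peak position from the figure.",
                      "image_analyzer", "r-coordinate of the v_c peak")
    _, last_observation = trajectory.history[-1]
    return Action("The reading answers the sub-question.", None, last_observation)


answer, trajectory = reason_act(
    "Where does the rotation curve peak?",
    scripted_policy,
    {"image_analyzer": lambda query: "1 kpc"},
)
print(answer)  # 1 kpc
```

A full run would repeat this loop for each sub-question $q_j$; no execution order is hard-coded, and the policy alone decides which tool to call next.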

2.2 Physics-Problem-Oriented Toolset

Here, we describe the physics-problem-oriented tools with which we equip the Manager Agent: ImageAnalyzer and AnswerReviewer.
ImageAnalyzer: Reading experimental results and extracting critical information from figures is an important skill for expert physicists and is critical to some Physics Olympiad problems. However, current LLMs show limited performance in providing accurate measurements from visual data such as data plots, images, and schematic diagrams. To improve image handling, ImageAnalyzer routes a high-resolution image to a dedicated Vision Language Model (VLM), which handles precise tasks such as reading numeric values and making measurements. Section 4.2 discusses how this improves the accuracy of the information extracted from visual data. The prompts for ImageAnalyzer are available in Appendix B.1.
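A minimal sketch of the routing idea, assuming the dedicated VLM is exposed as a Python callable that takes a text prompt and a base64-encoded image and returns text. The names `make_image_analyzer`, `VLMFn`, and `READING_INSTRUCTIONS` and the instruction text are hypothetical placeholders; the paper’s actual prompts are in Appendix B.1, and no specific VLM API is implied.

```python
import base64
from typing import Callable

# Hypothetical VLM interface: (prompt, base64-encoded image) -> text answer.
# Any real client (hosted API or local model) would be wrapped to fit this shape.
VLMFn = Callable[[str, str], str]

# Placeholder instructions; the paper's real prompts are in its Appendix B.1.
READING_INSTRUCTIONS = (
    "You are reading a scientific figure. Identify the axes and their units, "
    "then report each requested value numerically with units."
)


def make_image_analyzer(vlm: VLMFn) -> Callable[[bytes, str], str]:
    """Build a tool that forwards a full-resolution image to a dedicated VLM."""

    def image_analyzer(image_bytes: bytes, question: str) -> str:
        # Send the original bytes unmodified so fine details stay readable.
        encoded = base64.b64encode(image_bytes).decode("ascii")
        prompt = f"{READING_INSTRUCTIONS}\n\nTask: {question}"
        return vlm(prompt, encoded)

    return image_analyzer
```

Because the Manager Agent only sees the VLM’s text answer, the wiring can be tested with a stubbed `vlm` that records its inputs.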
AnswerReviewer: Physicists routinely evaluate whether their theoretical results are physically meaningful, that is, whether the outcomes exhibit reasonable physical properties, align with established principles, and make sense within known constraints. Such scrutiny is key to assessing solutions and has sometimes led to major breakthroughs. A famous example is the "ultraviolet catastrophe" [31]: the classical-physics prediction that black-body radiation should diverge at short wavelengths was unphysical and unsupported by experimental data. This paradox prompted Max Planck [32] to introduce his formulation of black-body radiation, laying the foundation for quantum mechanics. To strengthen the agent’s ability to re-examine its work, we provide AnswerReviewer, which classifies likely error types and locates erroneous expressions in the solution process. The ablation studies in Section 4.1 show that the reviewing tool improves performance. The prompts for AnswerReviewer are available in Appendix B.2.
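As a concrete instance of the plausibility check described above, the short script below compares the classical Rayleigh–Jeans spectral radiance, $B_\lambda = 2ckT/\lambda^4$, with Planck’s law, $B_\lambda = (2hc^2/\lambda^5)/(e^{hc/\lambda kT} - 1)$. It illustrates the kind of inconsistency a reviewer should flag and is not the AnswerReviewer implementation: at long wavelengths the two laws agree, while at short wavelengths the classical result exceeds Planck’s by many orders of magnitude and keeps growing as $\lambda^{-4}$. The temperature and wavelengths are arbitrary choices.

```python
import math

# Exact SI values (2019 redefinition).
H = 6.62607015e-34   # Planck constant, J s
C = 299792458.0      # speed of light in vacuum, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def planck(lam: float, temp: float) -> float:
    """Planck spectral radiance B_lambda(T) in W sr^-1 m^-3."""
    x = H * C / (lam * K_B * temp)
    return 2 * H * C**2 / lam**5 / math.expm1(x)   # expm1 stays accurate for small x


def rayleigh_jeans(lam: float, temp: float) -> float:
    """Classical (Rayleigh-Jeans) spectral radiance in W sr^-1 m^-3."""
    return 2 * C * K_B * temp / lam**4


T = 5000.0  # K
for lam in (100e-6, 1e-6, 100e-9):  # 100 um, 1 um, 100 nm
    ratio = rayleigh_jeans(lam, T) / planck(lam, T)
    print(f"lambda = {lam:.0e} m: Rayleigh-Jeans / Planck = {ratio:.3g}")
```

The ratio equals $(e^x - 1)/x$ with $x = hc/\lambda kT$, so it tends to 1 for long wavelengths and grows exponentially for short ones, which is exactly the unphysical behaviour the ultraviolet catastrophe exposed.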
With only ImageAnalyzer and AnswerReviewer, Physics Supernova powered by a state-of-the-art LLM (Gemini 2.5 Pro [33]) achieves gold-medal-level performance on the IPhO 2025 Theory Problems, above the median theory score of human gold medalists (Section 3). Moreover, the system supports the integration of additional advanced physics-related tools, such as the WolframAlpha question-answering engine (Section 4.3), which can assist with computationally intensive physics tasks.

3 Experiment: Physics Supernova Excels in IPhO 2025 Theory Problems


3.1 Experiment Setup

Benchmarking Dataset IPhO 2025 has 3 theory problems and 2 experimental problems, each worth 10.0 points, for 50.0 points in total. Of the 406 contestants from more than 90 countries and regions, the 37 with the highest total scores were awarded gold medals. The theory score is the sum of the scores on the three theory problems; detailed contents are shown in Table 1. Ranked by difficulty, Theory Problem 2 (T2) is the hardest, with a top-10% human score of 5.0/10.0; T3 is easier, with a top-10% score of 8.0/10.0; and T1 is the easiest, with a top-10% score of 9.0/10.0. The minimum and median theory scores of the gold medalists are 19.4 and 22.8. When contestants are ranked by theory score alone, the minimum and median theory scores of the top 37 are 20.8 and 23.0.

Methods We test Physics Supernova on each of the three theory problems with a predefined set of tools. Because smolagents [28] lacks a built-in summarization memory, we implemented a lightweight summarizing tool that condenses the progress made so far; it is provided alongside the ImageAnalyzer and AnswerReviewer tools. Throughout the experiments, we use Gemini 2.5 Pro [33] for the agent system and for all LLM-based tools.
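The paper does not describe the summarizing tool beyond the sentence above. One plausible minimal design, sketched here purely as an assumption, keeps the most recent steps verbatim and folds older steps into a running summary produced by a `summarize` callable (an LLM call in practice, a simple string join in the usage example). The class name `SummarizingMemory` and the `keep_last` parameter are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SummarizingMemory:
    """Keep recent steps verbatim; fold older ones into a running summary."""
    summarize: Callable[[List[str]], str]   # hypothetical: an LLM call in practice
    keep_last: int = 3
    summary: str = ""
    recent: List[str] = field(default_factory=list)

    def add(self, step: str) -> None:
        self.recent.append(step)
        if len(self.recent) > self.keep_last:
            overflow = self.recent[:-self.keep_last]
            self.recent = self.recent[-self.keep_last:]
            earlier = [self.summary] if self.summary else []
            self.summary = self.summarize(earlier + overflow)

    def context(self) -> str:
        """Text handed back to the agent at the start of its next step."""
        lines = [f"Summary of earlier progress: {self.summary}"] if self.summary else []
        return "\n".join(lines + self.recent)


memory = SummarizingMemory(summarize=lambda parts: " | ".join(parts), keep_last=3)
for step in ["step 1", "step 2", "step 3", "step 4", "step 5"]:
    memory.add(step)
print(memory.context())
```

Bounding the verbatim history keeps the agent’s context short on long multi-part problems while preserving a trace of earlier results.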

Metrics For each theory problem, as shown in Table 1, there are 3 or 4 major parts (Parts A, B, C, and D), each consisting of multiple scoring points. The score of a part is the sum of the scoring points addressed in that part. More precisely, for a problem $Q = \{(q_j, s_j)\}_{j=1}^{m}$, the total score is $S = \sum_i S_i$, where $S_i$, the score of Part $i$, is obtained by summing the scores $S_{ij}$ of its scoring points $P_{ij}$:

$$S_i = \sum_j S_{ij} \cdot \mathbb{1}[P_{ij} \text{ is addressed}].$$

The official scoring criteria were obtained online², and examples are provided in Appendix A. After the LLM answers are generated, human experts³ score them in detail according to the official scoring criteria. Each problem is run 5 times, and the mean and standard deviation are reported in Table 2.
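The scoring rule and the reporting protocol can be written out directly. The sketch below computes each part score $S_i$ as the sum of the addressed $S_{ij}$, sums the parts into $S$, and summarizes repeated runs as mean ± standard deviation. The rubric values and run scores are made-up illustrations, not IPhO data, and the paper does not say whether it reports the sample or population standard deviation; the sample version (`statistics.stdev`) is assumed here.

```python
import statistics
from typing import Dict, List, Tuple

# A rubric maps each part to its scoring points P_ij as (S_ij, addressed) pairs;
# `addressed` plays the role of the indicator 1[P_ij is addressed].
Rubric = Dict[str, List[Tuple[float, bool]]]


def part_score(points: List[Tuple[float, bool]]) -> float:
    """S_i: sum of the scores of the scoring points that were addressed."""
    return sum(score for score, addressed in points if addressed)


def problem_score(rubric: Rubric) -> float:
    """S = sum_i S_i over all parts of one problem."""
    return sum(part_score(points) for points in rubric.values())


def summarize_runs(scores: List[float]) -> str:
    """Report repeated runs as mean ± sample standard deviation."""
    return f"{statistics.mean(scores):.2f} ± {statistics.stdev(scores):.2f}"


# Illustrative grading of one answer (hypothetical values).
graded: Rubric = {
    "Part A": [(0.5, True), (0.3, True), (0.2, False)],
    "Part B": [(0.4, True), (0.6, False)],
}
print(problem_score(graded))                        # 0.5 + 0.3 + 0.4
print(summarize_runs([7.0, 8.0, 6.5, 7.5, 6.0]))    # five hypothetical runs
```

Switching to the population standard deviation would only require replacing `statistics.stdev` with `statistics.pstdev`.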

3.2 Main Experimental Results

The results of our main experiments are shown in Table 2. The mean theory score of Gemini 2.5 Pro alone would rank 30th among the 406 contestants, while the mean theory score of Physics Supernova ranks 14th, surpassing the median theory score of the gold medalists. Notably, Physics Supernova ranks in the top 10% on each of the three theory problems. Comparing Table 1 and Table 2, we make the following observations.
² We obtain official solutions and scoring criteria from https://ipho.olimpicos.net/.
³ Our human experts include previous Olympiad medalists.
Problem                    Part A        Part B        Part C        Part D        SUM
Theory 1 total score       2.2           2.5           3.0           2.3           10.0
  LLM Only                 2.20 ± 0.00   2.28 ± 0.04   2.06 ± 0.13   1.92 ± 0.11   8.46 ± 0.16
  Physics Supernova        2.20 ± 0.00   2.20 ± 0.00   2.46 ± 0.05   2.16 ± 0.05   9.02 ± 0.11
  Human Top 10%            /             /             /             /             ∼ 9.0
Theory 2 total score       1.3           2.0           6.7           /             10.0
  LLM Only                 1.16 ± 0.31   1.08 ± 0.19   3.04 ± 1.02   /             5.30 ± 0.99
  Physics Supernova        1.30 ± 0.00   1.16 ± 0.22   3.62 ± 0.79   /             6.08 ± 0.77
  Human Top 10%            /             /             /             /             ∼ 5.0
Theory 3 total score       4.3           3.3           2.4           /             10.0
  LLM Only                 3.70 ± 0.07   2.74 ± 0.36   1.22 ± 0.18   /             7.66 ± 0.51
  Physics Supernova        4.02 ± 0.22   3.26 ± 0.09   1.12 ± 0.11   /             8.40 ± 0.27
  Human Top 10%            /             /             /             /             ∼ 8.0
Theory Part total score    /             /             /             /             30.0
  LLM Only                 /             /             /             /             21.4 ± 1.1
  Physics Supernova        /             /             /             /             23.5 ± 0.8
  Gold Medalists (median)  /             /             /             /             22.8
Table 2: Experimental results (mean ± std over 5 runs) for Gemini 2.5 Pro alone (LLM Only) and Physics Supernova (with Gemini 2.5 Pro), by problem and part. Our agent ranks in the top 10% among human contestants on all three theory problems.

Larger advantage for Physics Supernova on more difficult problems. As the difficulty levels and human scores in Table 1 show, Theory Problem 2 is the most difficult, Theory Problem 3 is easier, and Theory Problem 1 is the easiest of the three. On harder problems, Physics Supernova shows (1) a more pronounced advantage over the Human Top 10% scores and (2) a larger gain over LLM Only.

Larger variance for LLM-based systems on more difficult problems. Both LLM Only and Physics Supernova show their largest variance (as reflected by the standard deviation) on Theory Problem 2, the most difficult problem. In contrast, on Theory Problem 1, the easiest of the three, both AI-based systems show low variance.
These results indicate that, with agent systems and dedicated tools, current state-of-the-art LLMs can match elite gold medalists on IPhO-level physics problems.

4 Analysis

4.1 Ablation Study

To study the impact of each tool on the final scores, we replicate the protocol described in Section 3 while providing different tools to the agent system. We consider four settings: (1) Physics Supernova; (2) without the ImageAnalyzer tool; (3) without the AnswerReviewer tool; and (4) LLM only. For each theory problem, we perform 5 independent runs and report the mean and standard deviation. All agents are powered by Gemini 2.5 Pro. The results are shown in Table 3. Comparing Table 1 and Table 3, we find the following.

ImageAnalyzer helps with tasks requiring Accurate Image Measurements. For example, Theory Problem 1 requires accurate measurements from figures (Parts B and C in Table 1), and removing ImageAnalyzer lowers its score from 9.02 to 8.58 (Table 3). This indicates that delegating high-accuracy image analysis to ImageAnalyzer improves the overall score. Section 4.2 presents a case study for illustration.

AnswerReviewer raises overall scores via post-hoc review. Removing AnswerReviewer lowers the scores on all three theory problems (from 9.02 to 8.62, from 6.08 to 5.54, and from 8.40 to 8.26; Table 3). This implies that, with AnswerReviewer reviewing the solution, locating errors, and providing feedback, Physics Supernova can improve its performance in many cases.

Method              Theory 1      Theory 2      Theory 3      Theory Part
Total score         10.0          10.0          10.0          30.0
Physics Supernova   9.02 ± 0.11   6.08 ± 0.77   8.40 ± 0.27   23.5 ± 0.8
w.o. ImgTool        8.58 ± 0.19   5.98 ± 0.54   8.26 ± 0.24   22.8 ± 0.6
w.o. RevTool        8.62 ± 0.16   5.54 ± 0.61   8.26 ± 0.15   22.4 ± 0.6
LLM Only            8.46 ± 0.16   5.30 ± 0.99   7.66 ± 0.51   21.4 ± 1.1
Table 3: Ablation study results for four experiment settings: Physics Supernova, Physics Supernova without ImageAnalyzer (ImgTool), Physics Supernova without AnswerReviewer (RevTool), and using the LLM only.

[Figure 2 image, left: Theory Problem 1, Fig. 3. Task: read the x-axis values (MHz) of all 3 peaks from the figure for problem solving. Ground truth: 0.03, 0.15, 0.26.]

Directly applying LLM    ImageAnalyser
0.050  0.160  0.280      0.030  0.155  0.260
0.050  0.160  0.280      0.030  0.160  0.260
0.050  0.160  0.280      0.030  0.155  0.258
0.040  0.160  0.280      0.032  0.155  0.260
0.040  0.160  0.280      0.045  0.160  0.275

Figure 2: Effect of ImageAnalyzer on Theory Problem 1 Part C. The problem requires accurate measurement from a figure (shown left). We show 5 repeated experiments of directly applying LLMs and of using the Image Analyzer Tool (shown right), with readings that differ from the ground truth by more than 0.01 MHz colored in red. As shown, the improvement mainly comes from a reduction in measurement error.

4.2 Case Study: Image Analyzer Tool

We zoom in on Theory Problem 1 Part C to see how the Image Analyzer Tool helps improve scores. This problem requires contestants to read a figure accurately; the specific image-related task is shown in Figure 2.
As Figure 2 shows, using a specialized image-reading LLM as the Image Analyzer Tool outperforms directly exposing the Manager Agent to all images of a problem, reducing the mean absolute error (MAE) from 0.015 to 0.004. This example further shows that, with the Image Analyzer Tool, Physics Supernova reads figures more accurately, which improves its overall performance.
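The MAE comparison can be recomputed from the readings listed in Figure 2. This sketch averages the absolute error over all 15 readings (5 runs × 3 peaks) for each method; that averaging scheme is an assumption, since the text does not spell it out. Under it, the values come out at about 0.0153 MHz for direct reading and about 0.0046 MHz with ImageAnalyzer.

```python
from typing import List

GROUND_TRUTH = [0.03, 0.15, 0.26]  # MHz, the three peak positions

# Readings from Figure 2: one row per run, one column per peak.
DIRECT_LLM = [
    [0.050, 0.160, 0.280],
    [0.050, 0.160, 0.280],
    [0.050, 0.160, 0.280],
    [0.040, 0.160, 0.280],
    [0.040, 0.160, 0.280],
]
IMAGE_ANALYZER = [
    [0.030, 0.155, 0.260],
    [0.030, 0.160, 0.260],
    [0.030, 0.155, 0.258],
    [0.032, 0.155, 0.260],
    [0.045, 0.160, 0.275],
]


def mean_absolute_error(runs: List[List[float]], truth: List[float]) -> float:
    """Average |reading - truth| over every reading in every run."""
    errors = [abs(r - t) for run in runs for r, t in zip(run, truth)]
    return sum(errors) / len(errors)


mae_direct = mean_absolute_error(DIRECT_LLM, GROUND_TRUTH)
mae_tool = mean_absolute_error(IMAGE_ANALYZER, GROUND_TRUTH)
print(f"direct LLM MAE:    {mae_direct:.4f} MHz")  # about 0.0153
print(f"ImageAnalyzer MAE: {mae_tool:.4f} MHz")    # about 0.0046
```

Most of the ImageAnalyzer error comes from the single outlying fifth run; the direct-reading error is a consistent bias of 0.01 to 0.02 MHz on every run.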

4.3 Domain-Specific Tools: WolframAlpha QA Tool

Physics research usually requires expert domain knowledge. In Physics Olympiads, some reference information (e.g., physical constants and specific physics formulas) is provided for contestants to use, but this is not the case in more complicated physics research. In these expert-level cases, accessing domain knowledge is very important. Figure 3 provides examples of such cases.
To better handle domain-specific queries, we further equip Physics Supernova with a question-answering (QA) tool for expert domain knowledge. Specifically, we use WolframAlpha [27], a computational knowledge engine capable of providing accurate and concise results for science-related queries.
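The paper does not state which WolframAlpha interface the QA tool uses. As an assumption, the sketch below wraps the WolframAlpha Short Answers API, which takes an `appid` and a plain-language input `i` and returns a short plain-text answer; a WolframAlpha AppID is required. `urllib.request.urlopen` raises `urllib.error.URLError` (or its subclass `HTTPError`) on network or HTTP failures, and the sketch returns those as text so the agent can react instead of crashing. The function names are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumed endpoint: WolframAlpha Short Answers API (plain-text responses).
SHORT_ANSWERS_URL = "https://api.wolframalpha.com/v1/result"


def build_query_url(question: str, app_id: str) -> str:
    """Encode the question and AppID as query parameters (spaces become '+')."""
    params = urllib.parse.urlencode({"appid": app_id, "i": question})
    return f"{SHORT_ANSWERS_URL}?{params}"


def wolfram_query(question: str, app_id: str, timeout: float = 30.0) -> str:
    """Tool body: return WolframAlpha's short answer, or an error string for the agent."""
    url = build_query_url(question, app_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError as err:
        return f"WolframAlpha request failed: {err.reason}"
```

For example, `wolfram_query("frequency of the hydrogen 21 cm line", app_id)` would send the question as a single natural-language query; how the agent phrases such queries is left to the Manager Agent.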
To study Physics Supernova’s performance on more expert tasks with the WolframAlpha QA tool, we generate 10 problems that require expert knowledge (most efficiently obtained from standard references or WolframAlpha), listed in Appendix C; Figure 3 shows examples of these problems. Table 4 reports the performance with and without the WolframAlpha tool. The results show that the WolframAlpha tool improves Physics Supernova’s ability to solve problems that require expert physics knowledge.
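Table 4 counts an answer as "N-digit accurate" when it differs from the ground truth (GT) only by ±1 in the N-th significant digit. The helper below reads that definition as |answer − GT| ≤ one unit in the N-th significant digit of GT and returns the largest such N, up to 5, the precision the problems request. This reading is an assumption, chosen because it reproduces the Q4 example in the Table 4 caption (1-digit versus 4-digit accurate). Exact decimal arithmetic avoids floating-point error at the ±1 boundary.

```python
from decimal import Decimal


def digits_accurate(answer: str, truth: str, max_digits: int = 5) -> int:
    """Largest N such that |answer - truth| <= 1 unit in truth's N-th significant digit."""
    a, g = Decimal(answer), Decimal(truth)
    leading = g.adjusted()          # power of ten of truth's first significant digit
    error = abs(a - g)
    n_ok = 0
    for n in range(1, max_digits + 1):
        unit = Decimal(10) ** (leading - n + 1)   # place value of the n-th digit
        if error > unit:
            break
        n_ok = n
    return n_ok


# Q4 from Figure 3: GT 2.8686E+3
print(digits_accurate("3.0462E+3", "2.8686E+3"))  # without WolframAlpha: 1
print(digits_accurate("2.8690E+3", "2.8686E+3"))  # with WolframAlpha: 4
```

Counting how many of the 10 answers reach N ≥ 3 or N ≥ 4 under each setting gives the entries of Table 4.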

Problem Q3. Using the Ciddor (1996) refractive-index model for air, at wavelength λ = 633 nm (vacuum), P = 101325 Pa, T = 20 °C, RH = 50%, and CO2 = 450 ppm, compute n − 1. Return a single number: the value (dimensionless), rounded to exactly 5 significant digits, in scientific notation. Do not include units or extra text.
GT Answer: 2.7132E-4   w.o.wolfTool Answer: 2.6894E-4   w.wolfTool Answer: 2.7139E-4

Problem Q4. Using the IAPWS-IF97 formulation for water/steam, compute the specific enthalpy of water at p = 15 MPa and T = 650 K (single-phase state as appropriate). Return a single number: the value in kJ kg⁻¹, rounded to exactly 5 significant digits, in scientific notation. Do not include units or extra text.
GT Answer: 2.8686E+3   w.o.wolfTool Answer: 3.0462E+3   w.wolfTool Answer: 2.8690E+3

Problem Q5. Using NIST X-ray transition energies (or equivalent), determine the photon energy of the copper Kα1 (KL3) line for elemental Cu at ambient conditions. Return a single number: the value in keV, rounded to exactly 5 significant digits, in scientific notation. Do not include units or extra text.
GT Answer: 8.0478E+0   w.o.wolfTool Answer: 8.0463E+0   w.wolfTool Answer: 8.0478E+0

Problem Q6. Using the IGRF 13th generation (epoch 2025.0), compute the total geomagnetic field magnitude at (40.0140° N, 105.2705° W, altitude 1624 m) on 2025-01-01 00:00 UTC. Return a single number: the value in nT, rounded to exactly 5 significant digits, in scientific notation. Do not include units or extra text.
GT Answer: 5.1321E+4   w.o.wolfTool Answer: 4.9726E+4   w.wolfTool Answer: 5.1300E+4

Problem Q7. Using CODATA-2022 fundamental constants, compute the rest frequency of the neutral hydrogen 21 cm hyperfine transition (ground-state spin-flip). Return a single number: the value in Hz, rounded to exactly 5 significant digits, in scientific notation. Do not include units or extra text.
GT Answer: 1.4204E+9   w.o.wolfTool Answer: 1.4228E+9   w.wolfTool Answer: 1.4204E+9

Problem Q8. Using the NIST ESTAR database (or equivalent), determine the mass stopping power of aluminum (Al) for electrons of kinetic energy 1.000 MeV. Return a single number: the value in MeV cm² g⁻¹, rounded to exactly 5 significant digits, in scientific notation. Do not include units or extra text.
GT Answer: 1.4860E+0   w.o.wolfTool Answer: 1.5980E+0   w.wolfTool Answer: 1.5980E+0

Problem Q9. Using JANAF/NIST thermochemical data (ideal-gas heat capacities), determine the molar heat capacity at constant pressure, Cp, of nitrogen gas (N2) at T = 1200 K (assume a thermally perfect ideal gas, no dissociation). Return a single number: the value in J mol⁻¹ K⁻¹, rounded to exactly 5 significant digits, in scientific notation. Do not include units or extra text.
GT Answer: 3.3723E+1   w.o.wolfTool Answer: 3.3540E+1   w.wolfTool Answer: 3.3724E+1

Figure 3: Examples of the 10 expert-knowledge-requiring problems we generate, listed in Appendix C. These examples require expert domain knowledge that can only be obtained through web search or expert database queries.

                              w.o. WolframAlpha Tool   w. WolframAlpha Tool
# 3-digit Accurate Answers    3/10                     9/10
# 4-digit Accurate Answers    2/10                     6/10

Table 4: Number of accurate answers for the expert-knowledge-requiring tasks shown in Appendix C. (‘N-digit accurate’ means that the answer differs from the GT only by ±1 in the N-th significant digit. For example, in Figure 3 Q4, the ‘w.o.wolfTool Answer’ is 1-digit accurate, while the ‘w.wolfTool Answer’ is 4-digit accurate, compared with the ground-truth answer.)

In all, the experiments presented in Section 4 show that, by integrating appropriate tools into the AI agent system, we can significantly advance the capabilities of LLMs in solving complex and challenging physics problems. This enhancement improves the agent system’s mastery of physics without requiring any modifications to the underlying LLMs. In conclusion, our Physics Supernova emerges as a powerful, flexible, and extensible physics problem-solving framework built upon the agent-based paradigm.

5 Related Work

5.1 Task Solving Agents

Since LLMs were shown to be capable of performing a wide range of tasks with appropriate prompts [34], researchers have tried to enhance their capabilities and task integration through prompt-based methods and tools. The concept of agent systems then developed, exemplified by ReAct [19], in which LLMs alternate between reasoning and tool calls to perform more complicated tasks [35]. Later, agent-specific codebases appeared: AutoGen [36] provides a framework for creating multi-agent systems, while smolagents [28] focuses more on code-related problem solving with defined domains. Built upon these are more complex general-purpose agent systems: for example, tool-creating agents like Alita [21] focus on creating and using tools for complex tasks with self-improvement, and products like OpenAI Deep Research [37] and Manus [38] focus on providing reliable research results to users. Other work optimizes the cost efficiency and effectiveness of agent systems [39, 40, 41]. These agent systems are mainly designed for general-purpose tasks, with a focus on the general virtual-assistant tasks represented by GAIA [24].
In addition to these general-purpose agents, there is a growing body of work on domain-specific agents. For math problems, some work has explored pipeline-based LLM approaches to mathematical problem solving [25, 29, 42]; for the humanities, HistAgent [26] introduces high-quality domain-specific benchmarks and specially designed agent systems for historical reasoning; for human-computer interaction and mental health, EmoAgent [43] studies the impact of LLM-based chat systems on users’ mental health and proposes agent-based methods to monitor users’ mental states and ensure safer human-AI interactions. These domain-specific agents show the potential of agent systems for professional tasks that require specialized domain knowledge; however, previous work rarely focuses on solving expert physics problems with agent systems.

5.2 Olympiads as Benchmarks

Olympiad competitions (e.g., IMO and IPhO) are widely regarded as challenging even for domain experts and have been adopted as challenging benchmarks for LLMs. Recent work shows progress on Math Olympiad problems, with both natural-language-based pipelines [29] and formal-language-based provers [30, 44].
Physics Olympiads have drawn increasing attention in recent years. OlympiadBench [13] collects Olympiad problems across subjects, including physics. SeePhysics [12], released in 2025, collects physics problems with images. Most benchmarks use publicly available problems from past Olympiads (e.g., prior IPhO problems). In contrast, PhyBench [14] curates an ‘entirely original’ dataset to minimize data leakage, but it excludes images. Other datasets, such as PhysicsArena [45] and PhysReason [46], feature solutions with detailed reasoning processes. Although some of these benchmarks contain problems that require image-related physics abilities, a high level of professional knowledge, or complicated reasoning and calculation, they are primarily designed for non-agentic, single-LLM settings.

6 Discussions

6.1 Physics Experimental Exams: Instrument-based Exam and Program-based Exam as Proxies for Research-level Physics Experiments

Experiments are foundational to physics research [47]. In Physics Olympiads, experimental exams have served as proxies for research-level physics investigations. There are two common formats of Physics Olympiad experimental exams: instrument-based experiments and program-based experiments⁴. In the former, contestants are provided with (sometimes simplified) instruments and asked to plan and carry out experiments and measurements; in the latter, contestants design and carry out simulated experiments with programs, followed by data analysis.
In this work, we focus mainly on the IPhO 2025 theory problems rather than the instrument-based experimental
problems, partly because we lack access to the experimental instruments. We hope that, with advances
in robotics, future LLM-based agents will also be able to take these experimental exams. We further argue that
program-based exams could serve as good proxies for benchmarking the ability of current and near-future AI to conduct
research-level experiments. Although instrument-based experiments are closer to real-world research, program-based
experiments are potentially better in two respects:
First, program-based experimental exams can simulate more advanced and complex experiments than
instrument-based exams. In instrument-based exams, contestants have to design experiments, perform manual operations and
measurements, process data, and answer questions.
In program-based exams, contestants run simulated experiments through programs. Although manual
experimentation is no longer required, core challenges such as experimental design and data analysis remain
central to the assessment. A major advantage is that program-based exams can include experiments that
are more complex or require more sophisticated design, because they are not constrained by cost, safety concerns, or other
practical limitations of instrument-based experiments. We provide examples of instrument-based and
program-based experiments in Figure 5, shown in Appendix D.
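The following toy sketch shows what a program-based experiment might look like. It rests on our own assumptions and is not taken from any actual Olympiad exam. The simulated "apparatus" is a small-amplitude pendulum whose hidden parameter g the contestant must infer, and each timing carries 1% Gaussian noise. The function names and the choice of experiment are hypothetical. The contestant's work consists of experimental design (which lengths to use and how many repeats) and data analysis (linearizing T² against L and fitting a slope).

```python
import math
import random

# Hypothetical hidden parameter of the simulated apparatus; the contestant never sees it.
_HIDDEN_G = 9.81  # m/s^2


def simulated_pendulum_period(length_m: float, rng: random.Random) -> float:
    """Simulated small-amplitude period measurement with 1% Gaussian timing noise."""
    true_period = 2 * math.pi * math.sqrt(length_m / _HIDDEN_G)
    return true_period * (1 + rng.gauss(0.0, 0.01))


def fit_slope_through_origin(xs, ys) -> float:
    """Least-squares slope k for the model y = k * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def run_experiment(lengths, repeats: int = 5, seed: int = 0) -> float:
    """Contestant-side procedure: measure, average, linearize, fit, and return an estimate of g."""
    rng = random.Random(seed)
    xs, ys = [], []
    for length in lengths:
        # Average repeated measurements at each length to reduce random error.
        periods = [simulated_pendulum_period(length, rng) for _ in range(repeats)]
        mean_period = sum(periods) / len(periods)
        xs.append(length)
        ys.append(mean_period ** 2)
    # Small-angle theory gives T^2 = (4 pi^2 / g) L, so g = 4 pi^2 / slope.
    slope = fit_slope_through_origin(xs, ys)
    return 4 * math.pi ** 2 / slope


if __name__ == "__main__":
    g_estimate = run_experiment([0.2, 0.4, 0.6, 0.8, 1.0])
    print(f"estimated g = {g_estimate:.2f} m/s^2")
```

A real program-based exam would typically hide a less obvious law and leave the linearization to the contestant. The sketch only illustrates that design and analysis, not manual dexterity, carry the weight of the assessment.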
Second, program-based experimental exams can shift the focus from testing robotic manipulation of instruments to evaluating
physics ability itself. There has been work on using robotic systems to carry out physics experiments [48]; however,
current AI systems still largely fail at the robotic tasks needed to manipulate experimental instruments. Advances in
robotics research may eventually address this critical challenge. In the meantime, we hope that program-based
experiments will allow a better evaluation of the essential abilities of AI systems in experimental physics in the near future.
We also emphasize that, although program-based experiments have potential advantages for benchmarking AI's capability
in experimental physics, instrument-based tests remain essential in the long run. As AI systems advance toward
ASI and everyday use, instrument-based tests should not be overlooked, because they (1) deviate less from
real-world research; (2) provide a better metric for characterizing robotic capability; and (3) better evaluate
performance under extreme or unexpected conditions, such as instrument failure.

⁴ Most Physics Olympiads, such as the current IPhO, APhO, and EuPhO, use instrument-based experiments. During the
pandemic, some Olympiads, such as APhO 2021, EuPhO 2020, and EuPhO 2021, used program-based experiments.

6.2 Verifiable Physics Reasoning

In this work, we use an answer-reviewer tool, which operates purely in natural language, to verify the worker's
deductions. In automated mathematical proving, a major step has been verifiable LLM-generated proofs written in Lean [49].
Some previous work proposes using Lean-like tools for verifiable derivations of physics formulas [50, 51]. However,
deriving physics formulas from problems stated in natural language, whether grounded in theoretical models or
experimental observations, currently lacks comparably reliable automatic verification. This limitation
remains an open area for further research. Promising directions for future exploration include: (1) developing methods
to verify the abstraction and transformation between formulas, physical representations, and intuitive reasoning; (2)
establishing a more rigorous and transparent calculation framework that supports verifiability; and (3) enhancing
answer-review systems with tools that possess broader and deeper expertise in physics.
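Direction (2) can be illustrated with one inexpensive automatic check that an answer reviewer could apply: dimensional consistency. This is our own illustrative example, not the AnswerReviewer used in this work. Dimensions are represented as hypothetical exponent vectors over mass, length, and time. Passing the check is necessary but not sufficient: a wrong dimensionless factor, such as a missing 2 in p²/2m, goes undetected.

```python
from fractions import Fraction

# Hypothetical minimal representation: a dimension is an exponent vector over (M, L, T).
Dim = tuple


def dim(M=0, L=0, T=0) -> Dim:
    return (Fraction(M), Fraction(L), Fraction(T))


def mul(a: Dim, b: Dim) -> Dim:
    return tuple(x + y for x, y in zip(a, b))


def div(a: Dim, b: Dim) -> Dim:
    return tuple(x - y for x, y in zip(a, b))


def power(a: Dim, n) -> Dim:
    # Fractional powers are allowed, e.g. square roots.
    return tuple(x * Fraction(n) for x in a)


MASS = dim(M=1)
LENGTH = dim(L=1)
TIME = dim(T=1)
VELOCITY = div(LENGTH, TIME)
MOMENTUM = mul(MASS, VELOCITY)
ENERGY = mul(MASS, power(VELOCITY, 2))


def check_equation(lhs: Dim, rhs: Dim) -> bool:
    """True if both sides have identical dimensions (necessary, not sufficient, for correctness)."""
    return lhs == rhs


# Candidate answer E = p^2 / (2m); the factor 2 is dimensionless and drops out.
correct = div(power(MOMENTUM, 2), MASS)
# Flawed candidate E = p^2 / m^2, which has dimensions of velocity squared.
flawed = div(power(MOMENTUM, 2), power(MASS, 2))

if __name__ == "__main__":
    print(check_equation(ENERGY, correct))  # prints True
    print(check_equation(ENERGY, flawed))   # prints False
```

Checks of this kind are cheap and fully automatic, which makes them a natural first layer under a natural-language reviewer. They cannot certify the modeling step that direction (1) targets.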
The first direction represents a major challenge in symbolic AI [52]. For verifiable physics calculations, there is
extensive prior work on machine-checked mathematics in Lean-like languages [53]. For example,
DeepSeek-Prover, Kimina-Prover, and AlphaProof [42, 54, 55] use RL-based methods to train LLMs that excel at
generating Lean proofs. Other work explores test-time scaling through pipeline workflows [56, 57, 58]. Future
work on solving physics problems might adopt similar methods to improve the ability to generate reliable and verifiable
solutions.
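The following Lean 4 snippet gives a sense of what a machine-checked step in a physics calculation might look like. It is our own example, assumes Mathlib, and is not drawn from the cited provers. It verifies a common algebraic step in mechanics: rewriting the kinetic energy ½mv² in terms of the momentum p = mv as p²/(2m). This step requires the side condition m ≠ 0. Only the algebra is checked. The physical modeling that produced the formula is exactly the part that still lacks automatic verification.

```lean
import Mathlib

-- Kinetic energy in terms of momentum: (1/2) m v^2 = (m v)^2 / (2 m), valid when m ≠ 0.
theorem kinetic_energy_momentum_form (m v : ℝ) (hm : m ≠ 0) :
    (1 / 2) * m * v ^ 2 = (m * v) ^ 2 / (2 * m) := by
  -- Clear the denominator 2m, which is nonzero because 2 ≠ 0 and m ≠ 0.
  rw [eq_div_iff (mul_ne_zero two_ne_zero hm)]
  -- What remains is a polynomial identity in m and v.
  ring
```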
In summary, we suggest that future work on AI systems for physics problem solving focus on: (1) program-based
or instrument-based experiments; and (2) verifiable and reliable solution generation.

7 Conclusion
In this work, we introduce Physics Supernova, a flexible agent system for solving Olympiad-level physics problems and
beyond. By equipping the manager agent with task-specific tools such as ImageAnalyzer and AnswerReviewer,
we extend the capability of state-of-the-art LLMs on physics problems. Previously, physics problem solving was
mostly treated as a benchmark for standalone base LLMs [12, 13, 14].
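The manager-plus-tools pattern described above can be sketched in a few lines. This is a hypothetical illustration, not the Physics Supernova implementation. The ManagerAgent class, the tool signature, and the stub tool bodies are assumptions. The plan is also hard-coded, whereas a real system would let the LLM manager decide which tool to call next.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool interface: a tool maps a text request to a text result.
Tool = Callable[[str], str]


class ManagerAgent:
    """Toy manager that routes sub-tasks to named tools (illustrative only)."""

    def __init__(self) -> None:
        self.tools: Dict[str, Tool] = {}

    def register(self, name: str, tool: Tool) -> None:
        # Adding a capability is just another registration.
        self.tools[name] = tool

    def call(self, name: str, request: str) -> str:
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        return self.tools[name](request)

    def solve(self, problem: str, plan: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
        # In a real agent the LLM manager would choose each step; here the plan is fixed.
        transcript = []
        for name, request in plan:
            transcript.append((name, self.call(name, f"{problem} | {request}")))
        return transcript


# Stub tools standing in for model- or API-backed implementations.
def image_analyzer(request: str) -> str:
    return f"[ImageAnalyzer stub] would inspect the figure for: {request}"


def answer_reviewer(request: str) -> str:
    return f"[AnswerReviewer stub] would check: {request}"


if __name__ == "__main__":
    manager = ManagerAgent()
    manager.register("ImageAnalyzer", image_analyzer)
    manager.register("AnswerReviewer", answer_reviewer)
    steps = manager.solve(
        "IPhO-style problem",
        [("ImageAnalyzer", "read the plotted data"), ("AnswerReviewer", "check the final expression")],
    )
    for name, output in steps:
        print(name, "->", output)
```

The design choice this sketch highlights is the uniform tool interface. With every tool behind the same signature, adding a new capability (for example, a WolframAlpha-based QA tool) is a registration rather than a redesign of the manager.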
On the newly released IPhO 2025 theory problems, our agent system powered by Gemini 2.5 Pro achieves gold-medal-level
performance. Specifically, our method ranks in the top 10% of human contestants on each of the three theory problems.
In total, it scores 23.5 points on the theory problems, ranking 14th among all 406 human contestants and
exceeding the median theory score of the gold medalists.
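As a rough percentile, the overall rank works out as follows. This is our own arithmetic from the figures above, not an official statistic.

```latex
\frac{14}{406} \approx 0.034 \quad\Rightarrow\quad \text{rank 14 of 406 lies within the top } 3.5\% \text{ of contestants.}
```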
Our ablation study demonstrates the effectiveness of the provided tools. We further explore improving
Physics Supernova's ability to solve more complicated, knowledge-intensive problems by equipping it with more
specialized tools, such as the WolframAlpha-based QA tool. These results suggest that the agent-oriented paradigm offers
a powerful and flexible platform for integrating tools to solve advanced physics problems.
Overall, Physics Supernova improves the physics problem-solving ability of LLMs through the
agent paradigm. This demonstrates the potential of agent systems to improve LLMs on scientific reasoning
and physics-related tasks, and further suggests their potential for developing superintelligence that is embedded in the real
world.

References
[1] Richard P. Feynman. The Character of Physical Law. MIT Press, 1965.
[2] P. J. E. Peebles. Principles of Physical Cosmology. Princeton University Press, 1993.
[3] Eugene P. Wigner. The unreasonable effectiveness of mathematics in the natural sciences. Communications on
Pure and Applied Mathematics, 13(1):1–14, 1960.
[4] Emmy Noether. Invariante Variationsprobleme. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen,
Mathematisch-Physikalische Klasse, pages 235–257, 1918.
[5] Karl J. Åström and Richard M. Murray. Feedback Systems: An Introduction for Scientists and Engineers. Princeton
University Press, 2008.
[6] Edward A. Lee. Cyber physical systems: Design challenges. In Proc. IEEE Int’l Symposium on
Object/Component/Service-Oriented Real-Time Distributed Computing (ISORC), pages 363–369, 2008.
[7] Nick Bostrom. Superintelligence: Paths, Dangers, Strategies. Oxford University Press, 2014.
[8] Yann LeCun. A path towards autonomous machine intelligence. OpenReview position paper, 2022.
[9] International Physics Olympiad (IPhO). Official Website: https://www.ipho-new.org/, ongoing. Accessed:
2025-07-29.
[10] International Mathematical Olympiad (IMO). Official Website: https://www.imo-official.org/, ongoing.
Accessed: 2025-07-29.
[11] International Physics Olympiad (IPhO) 2025. Official Website: https://www.ipho2025.fr/, ongoing.
Accessed: 2025-07-29.
[12] Kun Xiang, Heng Li, Terry Jingchen Zhang, Yinya Huang, Zirong Liu, Peixin Qu, Jixi He, Jiaqi Chen, Yu-Jie
Yuan, Jianhua Han, Hang Xu, Hanhui Li, Mrinmaya Sachan, and Xiaodan Liang. SeePhys: Does seeing help
thinking? Benchmarking vision-based physics reasoning, 2025.
[13] Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie
Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, and Maosong Sun. OlympiadBench: A challenging
benchmark for promoting AGI with Olympiad-level bilingual multimodal scientific problems, 2024.
[14] Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu
Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian
Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Jiaming Ji,
Weike Wang, Xudong Tian, Anqi Lv, Laifu Man, Jianxiang Li, Feiyu Tao, Qihua Sun, Zhou Liang, Yushu Mu,
Zhongxuan Li, Jing-Jun Zhang, Shutao Zhang, Xiaotian Li, Xingqi Xia, Jiawei Lin, Zheyu Shen, Jiahang Chen,
Qiuhao Xiong, Binran Wang, Fengyuan Wang, Ziyang Ni, Bohan Zhang, Fan Cui, Changkun Shao, Qing-Hong
Cao, Ming-xing Luo, Yaodong Yang, Muhan Zhang, and Hua Xing Zhu. PHYBench: Holistic evaluation of physical
perception and reasoning in large language models, 2025.
[15] Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling LLM test-time compute optimally can be more
effective than scaling model parameters, 2024.
[16] Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, and Yiming Yang. Scaling inference computation:
Compute-optimal inference for problem-solving with language models. In The 4th Workshop on Mathematical
Reasoning and AI at NeurIPS’24, 2024.
[17] Hongru Wang, Lingzhi Wang, Yiming Du, Liang Chen, Jingyan Zhou, Yufei Wang, and Kam-Fai Wong. A survey
of the evolution of language model-based dialogue systems: Data, task and models, 2025.
[18] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large language models
are zero-shot reasoners, 2023.
[19] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. ReAct:
Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning
Representations, 2023.
[20] Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao
Qiu, Xuan Qi, Yiran Wu, Hongru Wang, Han Xiao, Yuhang Zhou, Shaokun Zhang, Jiayi Zhang, Jinyu Xiang,
Yixiong Fang, Qiwen Zhao, Dongrui Liu, Qihan Ren, Cheng Qian, Zhenhailong Wang, Minda Hu, Huazheng
Wang, Qingyun Wu, Heng Ji, and Mengdi Wang. A survey of self-evolving agents: On path to artificial super
intelligence, 2025.
[21] Jiahao Qiu, Xuan Qi, Tongcheng Zhang, Xinzhe Juan, Jiacheng Guo, Yifu Lu, Yimin Wang, Zixin Yao, Qihan
Ren, Xun Jiang, Xing Zhou, Dongrui Liu, Ling Yang, Yue Wu, Kaixuan Huang, Shilong Liu, Hongru Wang,
and Mengdi Wang. Alita: Generalist agent enabling scalable agentic reasoning with minimal predefinition and
maximal self-evolution, 2025.
[22] Lang Feng, Zhenghai Xue, Tingcong Liu, and Bo An. Group-in-group policy optimization for LLM agent training,
2025.
[23] Jiahao Qiu, Xinzhe Juan, Yimin Wang, Ling Yang, Xuan Qi, Tongcheng Zhang, Jiacheng Guo, Yifu Lu, Zixin Yao,
Hongru Wang, Shilong Liu, Xun Jiang, Liu Leqi, and Mengdi Wang. AgentDistill: Training-free agent distillation
with generalizable MCP boxes, 2025.
[24] Grégoire Mialon, Clémentine Fourrier, Thomas Wolf, Yann LeCun, and Thomas Scialom. GAIA: A benchmark for
general AI assistants. In The Twelfth International Conference on Learning Representations, 2023.
[25] Thang Luong and Edward Lockhart. Advanced version of Gemini with Deep Think officially achieves gold-medal
standard at the International Mathematical Olympiad. Google DeepMind Blog, July 2025.
[26] Jiahao Qiu, Fulian Xiao, Yimin Wang, Yuchen Mao, Yijia Chen, Xinzhe Juan, Shu Zhang, Siran Wang, Xuan
Qi, Tongcheng Zhang, Zixin Yao, Jiacheng Guo, Yifu Lu, Charles Argon, Jundi Cui, Daixin Chen, Junran Zhou,
Shuyao Zhou, Zhanpeng Zhou, Ling Yang, Shilong Liu, Hongru Wang, Kaixuan Huang, Xun Jiang, Yuming Cao,
Yue Chen, Yunfei Chen, Zhengyi Chen, Ruowei Dai, Mengqiu Deng, Jiye Fu, Yunting Gu, Zijie Guan, Zirui
Huang, Xiaoyan Ji, Yumeng Jiang, Delong Kong, Haolong Li, Jiaqi Li, Ruipeng Li, Tianze Li, Zhuoran Li, Haixia
Lian, Mengyue Lin, Xudong Liu, Jiayi Lu, Jinghan Lu, Wanyu Luo, Ziyue Luo, Zihao Pu, Zhi Qiao, Ruihuan Ren,
Liang Wan, Ruixiang Wang, Tianhui Wang, Yang Wang, Zeyu Wang, Zihua Wang, Yujia Wu, Zhaoyi Wu, Hao
Xin, Weiao Xing, Ruojun Xiong, Weijie Xu, Yao Shu, Yao Xiao, Xiaorui Yang, Yuchen Yang, Nan Yi, Jiadong Yu,
Yangyuxuan Yu, Huiting Zeng, Danni Zhang, Yunjie Zhang, Zhaoyu Zhang, Zhiheng Zhang, Xiaofeng Zheng,
Peirong Zhou, Linyan Zhong, Xiaoyin Zong, Ying Zhao, Zhenxin Chen, Lin Ding, Xiaoyu Gao, Bingbing Gong,
Yichao Li, Yang Liao, Guang Ma, Tianyuan Ma, Xinrui Sun, Tianyi Wang, Han Xia, Ruobing Xian, Gen Ye,
Tengfei Yu, Wentao Zhang, Yuxi Wang, Xi Gao, and Mengdi Wang. On path to multimodal historical reasoning:
HistBench and HistAgent, 2025.
[27] Wolfram Alpha LLC. Wolfram|Alpha. https://www.wolframalpha.com/, 2009. Accessed: 2025-08-12.
[28] Aymeric Roucher, Albert Villanova del Moral, Thomas Wolf, Leandro von Werra, and Erik Kaunismäki. 'smolagents':
a smol library to build great agentic systems. https://github.com/huggingface/smolagents,
2025.
[29] Yichen Huang and Lin F. Yang. Gemini 2.5 Pro capable of winning gold at IMO 2025, 2025.
[30] Haohan Lin, Zhiqing Sun, Sean Welleck, and Yiming Yang. Lean-STaR: Learning to interleave thinking and
proving, 2025.
[31] Lord Rayleigh and J. H. Jeans. On the theory of quantized matter and radiation – the Rayleigh–Jeans law and the
ultraviolet catastrophe. Philosophical Magazine, 1900–1905. Classical law predicting divergent energy at short
wavelengths (“ultraviolet catastrophe”).
[32] Max Planck. On the law of distribution of energy in the normal spectrum. Annalen der Physik, 4:553–563, 1900.
Introduction of quantized energy elements, resolving the ultraviolet catastrophe.
[33] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel
Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan
Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran,
Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, Toby Boyd, Brad Hekman, Aaron Parisi, Chaoyi
Zhang, Kornraphop Kawintiranon, Tania Bedrax-Weiss, Oliver Wang, Ya Xu, Ollie Purkiss, Uri Mendlovic, Ilaï
Deutel, Nam Nguyen, Adam Langley, Flip Korn, Lucia Rossazza, Alexandre Ramé, Sagar Waghmare, Helen
Miller, Nathan Byrd, Ashrith Sheshan, Raia Hadsell Sangnie Bhardwaj, Pawel Janus, Tero Rissa, Dan Horgan,
Sharon Silver, Ayzaan Wahid, Sergey Brin, Yves Raimond, Klemen Kloboves, Cindy Wang, Nitesh Bharadwaj
Gundavarapu, Ilia Shumailov, Bo Wang, Mantas Pajarskas, Joe Heyward, Martin Nikoltchev, Maciej Kula, Hao
Zhou, Zachary Garrett, Sushant Kafle, Sercan Arik, Ankita Goel, Mingyao Yang, Jiho Park, Koji Kojima, Parsa
Mahmoudieh, Koray Kavukcuoglu, Grace Chen, Doug Fritz, Anton Bulyenov, Sudeshna Roy, Dimitris Paparas,
Hadar Shemtov, Bo-Juen Chen, Robin Strudel, David Reitter, Aurko Roy, Andrey Vlasov, Changwan Ryu, Chas
Leichner, Haichuan Yang, Zelda Mariet, Denis Vnukov, Tim Sohn, Amy Stuart, Wei Liang, Minmin Chen, Praynaa
Rawlani, Christy Koh, JD Co-Reyes, Guangda Lai, Praseem Banzal, Dimitrios Vytiniotis, Jieru Mei, Mu Cai,
Mohammed Badawi, Corey Fry, Ale Hartman, Daniel Zheng, Eric Jia, James Keeling, Annie Louis, Ying Chen,
Efren Robles, Wei-Chih Hung, Howard Zhou, Nikita Saxena, Sonam Goenka, Olivia Ma, Zach Fisher, Mor Hazan
Taege, Emily Graves, David Steiner, Yujia Li, Sarah Nguyen, Rahul Sukthankar, Joe Stanton, Ali Eslami, Gloria
Shen, Berkin Akin, Alexey Guseynov, Yiqian Zhou, Jean-Baptiste Alayrac, Armand Joulin, Efrat Farkash, Ashish
Thapliyal, Stephen Roller, Noam Shazeer, Todor Davchev, Terry Koo, Hannah Forbes-Pollard, Kartik Audhkhasi,
Greg Farquhar, Adi Mayrav Gilady, Maggie Song, John Aslanides, Piermaria Mendolicchio, Alicia Parrish,
John Blitzer, Pramod Gupta, Xiaoen Ju, Xiaochen Yang, Puranjay Datta, Andrea Tacchetti, Sanket Vaibhav
Mehta, Gregory Dibb, Shubham Gupta, Federico Piccinini, Raia Hadsell, Sujee Rajayogam, Jiepu Jiang, Patrick
Griffin, Patrik Sundberg, Jamie Hayes, Alexey Frolov, Tian Xie, Adam Zhang, Kingshuk Dasgupta, Uday Kalra,
Lior Shani, Klaus Macherey, Tzu-Kuo Huang, Liam MacDermed, Karthik Duddu, Paulo Zacchello, Zi Yang,
Jessica Lo, Kai Hui, Matej Kastelic, Derek Gasaway, Qijun Tan, Summer Yue, Pablo Barrio, John Wieting, Weel
Yang, Andrew Nystrom, Solomon Demmessie, Anselm Levskaya, Fabio Viola, Chetan Tekur, Greg Billock,
George Necula, Mandar Joshi, Rylan Schaeffer, Swachhand Lokhande, Christina Sorokin, Pradeep Shenoy, Mia
Chen, Mark Collier, Hongji Li, Taylor Bos, Nevan Wichers, Sun Jae Lee, Angéline Pouget, Santhosh Thangaraj,
Kyriakos Axiotis, Phil Crone, Rachel Sterneck, Nikolai Chinaev, Victoria Krakovna, Oleksandr Ferludin, Ian
Gemp, Stephanie Winkler, Dan Goldberg, Ivan Korotkov, Kefan Xiao, Malika Mehrotra, Sandeep Mariserla,
Vihari Piratla, Terry Thurk, Khiem Pham, Hongxu Ma, Alexandre Senges, Ravi Kumar, Clemens Meyer, Ellie
Talius, Nuo Wang Pierse, Ballie Sandhu, Horia Toma, Kuo Lin, Swaroop Nath, Tom Stone, Dorsa Sadigh, Nikita
Gupta, Arthur Guez, Avi Singh, Matt Thomas, Tom Duerig, Yuan Gong, Richard Tanburn, Lydia Lihui Zhang,
Phuong Dao, Mohamed Hammad, Sirui Xie, Shruti Rijhwani, Ben Murdoch, Duhyeon Kim, Will Thompson,
Heng-Tze Cheng, Daniel Sohn, Pablo Sprechmann, Qiantong Xu, Srinivas Tadepalli, Peter Young, Ye Zhang,
Hansa Srinivasan, Miranda Aperghis, Aditya Ayyar, Hen Fitoussi, Ryan Burnell, David Madras, Mike Dusenberry,
Xi Xiong, Tayo Oguntebi, Ben Albrecht, Jörg Bornschein, Jovana Mitrović, Mason Dimarco, Bhargav Kanagal
Shamanna, Premal Shah, Eren Sezener, Shyam Upadhyay, Dave Lacey, Craig Schiff, Sebastien Baur, Sanjay
Ganapathy, Eva Schnider, Mateo Wirth, Connor Schenck, Andrey Simanovsky, Yi-Xuan Tan, Philipp Fränken,
Dennis Duan, Bharath Mankalale, Nikhil Dhawan, Kevin Sequeira, Zichuan Wei, Shivanker Goel, Caglar Unlu,
Yukun Zhu, Haitian Sun, Ananth Balashankar, Kurt Shuster, Megh Umekar, Mahmoud Alnahlawi, Aäron van den
Oord, Kelly Chen, Yuexiang Zhai, Zihang Dai, Kuang-Huei Lee, Eric Doi, Lukas Zilka, Rohith Vallu, Disha
Shrivastava, Jason Lee, Hisham Husain, Honglei Zhuang, Vincent Cohen-Addad, Jarred Barber, James Atwood,
Adam Sadovsky, Quentin Wellens, Steven Hand, Arunkumar Rajendran, Aybuke Turker, CJ Carey, Yuanzhong
Xu, Hagen Soltau, Zefei Li, Xinying Song, Conglong Li, Iurii Kemaev, Sasha Brown, Andrea Burns, Viorica
Patraucean, Piotr Stanczyk, Renga Aravamudhan, Mathieu Blondel, Hila Noga, Lorenzo Blanco, Will Song,
Michael Isard, Mandar Sharma, Reid Hayes, Dalia El Badawy, Avery Lamp, Itay Laish, Olga Kozlova, Kelvin
Chan, Sahil Singla, Srinivas Sunkara, Mayank Upadhyay, Chang Liu, Aijun Bai, Jarek Wilkiewicz, Martin Zlocha,
Jeremiah Liu, Zhuowan Li, Haiguang Li, Omer Barak, Ganna Raboshchuk, Jiho Choi, Fangyu Liu, Erik Jue,
Mohit Sharma, Andreea Marzoca, Robert Busa-Fekete, Anna Korsun, Andre Elisseeff, Zhe Shen, Sara Mc Carthy,
Kay Lamerigts, Anahita Hosseini, Hanzhao Lin, Charlie Chen, Fan Yang, Kushal Chauhan, Mark Omernick,
Dawei Jia, Karina Zainullina, Demis Hassabis, Danny Vainstein, Ehsan Amid, Xiang Zhou, Ronny Votel, Eszter
Vértes, Xinjian Li, Zongwei Zhou, Angeliki Lazaridou, Brendan McMahan, Arjun Narayanan, Hubert Soyer,
Sujoy Basu, Kayi Lee, Bryan Perozzi, Qin Cao, Leonard Berrada, Rahul Arya, Ke Chen, Katrina, Xu, Matthias
Lochbrunner, Alex Hofer, Sahand Sharifzadeh, Renjie Wu, Sally Goldman, Pranjal Awasthi, Xuezhi Wang, Yan
Wu, Claire Sha, Biao Zhang, Maciej Mikuła, Filippo Graziano, Siobhan Mcloughlin, Irene Giannoumis, Youhei
Namiki, Chase Malik, Carey Radebaugh, Jamie Hall, Ramiro Leal-Cavazos, Jianmin Chen, Vikas Sindhwani,
David Kao, David Greene, Jordan Griffith, Chris Welty, Ceslee Montgomery, Toshihiro Yoshino, Liangzhe Yuan,
Noah Goodman, Assaf Hurwitz Michaely, Kevin Lee, KP Sawhney, Wei Chen, Zheng Zheng, Megan Shum,
Nikolay Savinov, Etienne Pot, Alex Pak, Morteza Zadimoghaddam, Sijal Bhatnagar, Yoad Lewenberg, Blair
Kutzman, Ji Liu, Lesley Katzen, Jeremy Selier, Josip Djolonga, Dmitry Lepikhin, Kelvin Xu, Jacky Liang, Jiewen
Tan, Benoit Schillings, Muge Ersoy, Pete Blois, Bernd Bandemer, Abhimanyu Singh, Sergei Lebedev, Pankaj
Joshi, Adam R. Brown, Evan Palmer, Shreya Pathak, Komal Jalan, Fedir Zubach, Shuba Lall, Randall Parker,
Alok Gunjan, Sergey Rogulenko, Sumit Sanghai, Zhaoqi Leng, Zoltan Egyed, Shixin Li, Maria Ivanova, Kostas
Andriopoulos, Jin Xie, Elan Rosenfeld, Auriel Wright, Ankur Sharma, Xinyang Geng, Yicheng Wang, Sam Kwei,
Renke Pan, Yujing Zhang, Gabby Wang, Xi Liu, Chak Yeung, Elizabeth Cole, Aviv Rosenberg, Zhen Yang, Phil
Chen, George Polovets, Pranav Nair, Rohun Saxena, Josh Smith, Shuo yiin Chang, Aroma Mahendru, Svetlana
Grant, Anand Iyer, Irene Cai, Jed McGiffin, Jiaming Shen, Alanna Walton, Antonious Girgis, Oliver Woodman,
Rosemary Ke, Mike Kwong, Louis Rouillard, Jinmeng Rao, Zhihao Li, Yuntao Xu, Flavien Prost, Chi Zou, Ziwei
Ji, Alberto Magni, Tyler Liechty, Dan A. Calian, Deepak Ramachandran, Igor Krivokon, Hui Huang, Terry Chen,
Anja Hauth, Anastasija Ilić, Weijuan Xi, Hyeontaek Lim, Vlad-Doru Ion, Pooya Moradi, Metin Toksoz-Exley,
Kalesha Bullard, Miltos Allamanis, Xiaomeng Yang, Sophie Wang, Zhi Hong, Anita Gergely, Cheng Li, Bhavishya
Mittal, Vitaly Kovalev, Victor Ungureanu, Jane Labanowski, Jan Wassenberg, Nicolas Lacasse, Geoffrey
Cideron, Petar Dević, Annie Marsden, Lynn Nguyen, Michael Fink, Yin Zhong, Tatsuya Kiyono, Desi Ivanov,
Sally Ma, Max Bain, Kiran Yalasangi, Jennifer She, Anastasia Petrushkina, Mayank Lunayach, Carla Bromberg,
Sarah Hodkinson, Vilobh Meshram, Daniel Vlasic, Austin Kyker, Steve Xu, Jeff Stanway, Zuguang Yang, Kai
Zhao, Matthew Tung, Seth Odoom, Yasuhisa Fujii, Justin Gilmer, Eunyoung Kim, Felix Halim, Quoc Le, Bernd
Bohnet, Seliem El-Sayed, Behnam Neyshabur, Malcolm Reynolds, Dean Reich, Yang Xu, Erica Moreira, Anuj
Sharma, Zeyu Liu, Mohammad Javad Hosseini, Naina Raisinghani, Yi Su, Ni Lao, Daniel Formoso, Marco Gelmi,
Almog Gueta, Tapomay Dey, Elena Gribovskaya, Domagoj Ćevid, Sidharth Mudgal, Garrett Bingham, Jianling
Wang, Anurag Kumar, Alex Cullum, Feng Han, Konstantinos Bousmalis, Diego Cedillo, Grace Chu, Vladimir
Magay, Paul Michel, Ester Hlavnova, Daniele Calandriello, Setareh Ariafar, Kaisheng Yao, Vikash Sehwag, Arpi
Vezer, Agustin Dal Lago, Zhenkai Zhu, Paul Kishan Rubenstein, Allen Porter, Anirudh Baddepudi, Oriana Riva,
Mihai Dorin Istin, Chih-Kuan Yeh, Zhi Li, Andrew Howard, Nilpa Jha, Jeremy Chen, Raoul de Liedekerke,
Zafarali Ahmed, Mikel Rodriguez, Tanuj Bhatia, Bangju Wang, Ali Elqursh, David Klinghoffer, Peter Chen,
Pushmeet Kohli, Te I, Weiyang Zhang, Zack Nado, Jilin Chen, Maxwell Chen, George Zhang, Aayush Singh,
Adam Hillier, Federico Lebron, Yiqing Tao, Ting Liu, Gabriel Dulac-Arnold, Jingwei Zhang, Shashi Narayan,
Buhuang Liu, Orhan Firat, Abhishek Bhowmick, Bingyuan Liu, Hao Zhang, Zizhao Zhang, Georges Rotival,
Nathan Howard, Anu Sinha, Alexander Grushetsky, Benjamin Beyret, Keerthana Gopalakrishnan, James Zhao,
Kyle He, Szabolcs Payrits, Zaid Nabulsi, Zhaoyi Zhang, Weijie Chen, Edward Lee, Nova Fallen, Sreenivas Gollapudi,
Aurick Zhou, Filip Pavetić, Thomas Köppe, Shiyu Huang, Rama Pasumarthi, Nick Fernando, Felix Fischer,
Daria Ćurko, Yang Gao, James Svensson, Austin Stone, Haroon Qureshi, Abhishek Sinha, Apoorv Kulshreshtha,
Martin Matysiak, Jieming Mao, Carl Saroufim, Aleksandra Faust, Qingnan Duan, Gil Fidel, Kaan Katircioglu,
Raphaël Lopez Kaufman, Dhruv Shah, Weize Kong, Abhishek Bapna, Gellért Weisz, Emma Dunleavy, Praneet
Dutta, Tianqi Liu, Rahma Chaabouni, Carolina Parada, Marcus Wu, Alexandra Belias, Alessandro Bissacco,
Stanislav Fort, Li Xiao, Fantine Huot, Chris Knutsen, Yochai Blau, Gang Li, Jennifer Prendki, Juliette Love,
Yinlam Chow, Pichi Charoenpanit, Hidetoshi Shimokawa, Vincent Coriou, Karol Gregor, Tomas Izo, Arjun
Akula, Mario Pinto, Chris Hahn, Dominik Paulus, Jiaxian Guo, Neha Sharma, Cho-Jui Hsieh, Adaeze Chukwuka,
Kazuma Hashimoto, Nathalie Rauschmayr, Ling Wu, Christof Angermueller, Yulong Wang, Sebastian Gerlach,
Michael Pliskin, Daniil Mirylenka, Min Ma, Lexi Baugher, Bryan Gale, Shaan Bijwadia, Nemanja Rakićević,
David Wood, Jane Park, Chung-Ching Chang, Babi Seal, Chris Tar, Kacper Krasowiak, Yiwen Song, Georgi
Stephanov, Gary Wang, Marcello Maggioni, Stein Xudong Lin, Felix Wu, Shachi Paul, Zixuan Jiang, Shubham
Agrawal, Bilal Piot, Alex Feng, Cheolmin Kim, Tulsee Doshi, Jonathan Lai, Chuqiao, Xu, Sharad Vikram, Ciprian
Chelba, Sebastian Krause, Vincent Zhuang, Jack Rae, Timo Denk, Adrian Collister, Lotte Weerts, Xianghong
Luo, Yifeng Lu, Håvard Garnes, Nitish Gupta, Terry Spitz, Avinatan Hassidim, Lihao Liang, Izhak Shafran, Peter
Humphreys, Kenny Vassigh, Phil Wallis, Virat Shejwalkar, Nicolas Perez-Nieves, Rachel Hornung, Melissa Tan,
Beka Westberg, Andy Ly, Richard Zhang, Brian Farris, Jongbin Park, Alec Kosik, Zeynep Cankara, Andrii Maksai,
Yunhan Xu, Albin Cassirer, Sergi Caelles, Abbas Abdolmaleki, Mencher Chiang, Alex Fabrikant, Shravya Shetty,
Luheng He, Mai Giménez, Hadi Hashemi, Sheena Panthaplackel, Yana Kulizhskaya, Salil Deshmukh, Daniele
Pighin, Robin Alazard, Disha Jindal, Seb Noury, Pradeep Kumar S, Siyang Qin, Xerxes Dotiwalla, Stephen
Spencer, Mohammad Babaeizadeh, Blake JianHang Chen, Vaibhav Mehta, Jennie Lees, Andrew Leach, Penporn
Koanantakool, Ilia Akolzin, Ramona Comanescu, Junwhan Ahn, Alexey Svyatkovskiy, Basil Mustafa, David
D’Ambrosio, Shiva Mohan Reddy Garlapati, Pascal Lamblin, Alekh Agarwal, Shuang Song, Pier Giuseppe
Sessa, Pauline Coquinot, John Maggs, Hussain Masoom, Divya Pitta, Yaqing Wang, Patrick Morris-Suzuki, Billy
Porter, Johnson Jia, Jeffrey Dudek, Raghavender R, Cosmin Paduraru, Alan Ansell, Tolga Bolukbasi, Tony Lu,
Ramya Ganeshan, Zi Wang, Henry Griffiths, Rodrigo Benenson, Yifan He, James Swirhun, George Papamakarios,
Aditya Chawla, Kuntal Sengupta, Yan Wang, Vedrana Milutinovic, Igor Mordatch, Zhipeng Jia, Jamie Smith,
Will Ng, Shitij Nigam, Matt Young, Eugen Vušak, Blake Hechtman, Sheela Goenka, Avital Zipori, Kareem
Ayoub, Ashok Popat, Trilok Acharya, Luo Yu, Dawn Bloxwich, Hugo Song, Paul Roit, Haiqiong Li, Aviel
Boag, Nigamaa Nayakanti, Bilva Chandra, Tianli Ding, Aahil Mehta, Cath Hope, Jiageng Zhang, Idan Heimlich
Shtacher, Kartikeya Badola, Ryo Nakashima, Andrei Sozanschi, Iulia Comşa, Ante Žužul, Emily Caveness, Julian
Odell, Matthew Watson, Dario de Cesare, Phillip Lippe, Derek Lockhart, Siddharth Verma, Huizhong Chen,
Sean Sun, Lin Zhuo, Aditya Shah, Prakhar Gupta, Alex Muzio, Ning Niu, Amir Zait, Abhinav Singh, Meenu
Gaba, Fan Ye, Prajit Ramachandran, Mohammad Saleh, Raluca Ada Popa, Ayush Dubey, Frederick Liu, Sara
Javanmardi, Mark Epstein, Ross Hemsley, Richard Green, Nishant Ranka, Eden Cohen, Chuyuan Kelly Fu, Sanjay
Ghemawat, Jed Borovik, James Martens, Anthony Chen, Pranav Shyam, André Susano Pinto, Ming-Hsuan Yang,
Alexandru Ţifrea, David Du, Boqing Gong, Ayushi Agarwal, Seungyeon Kim, Christian Frank, Saloni Shah,
Xiaodan Song, Zhiwei Deng, Ales Mikhalap, Kleopatra Chatziprimou, Timothy Chung, Toni Creswell, Susan
Zhang, Yennie Jun, Carl Lebsack, Will Truong, Slavica Andačić, Itay Yona, Marco Fornoni, Rong Rong, Serge
Toropov, Afzal Shama Soudagar, Andrew Audibert, Salah Zaiem, Zaheer Abbas, Andrei Rusu, Sahitya Potluri,
Shitao Weng, Anastasios Kementsietsidis, Anton Tsitsulin, Daiyi Peng, Natalie Ha, Sanil Jain, Tejasi Latkar,
Simeon Ivanov, Cory McLean, Anirudh GP, Rajesh Venkataraman, Canoee Liu, Dilip Krishnan, Joel D’sa, Roey
Yogev, Paul Collins, Benjamin Lee, Lewis Ho, Carl Doersch, Gal Yona, Shawn Gao, Felipe Tiengo Ferreira, Adnan
Ozturel, Hannah Muckenhirn, Ce Zheng, Gargi Balasubramaniam, Mudit Bansal, George van den Driessche, Sivan
Eiger, Salem Haykal, Vedant Misra, Abhimanyu Goyal, Danilo Martins, Gary Leung, Jonas Valfridsson, Four
Flynn, Will Bishop, Chenxi Pang, Yoni Halpern, Honglin Yu, Lawrence Moore, Yuvein, Zhu, Sridhar Thiagarajan,
Yoel Drori, Zhisheng Xiao, Lucio Dery, Rolf Jagerman, Jing Lu, Eric Ge, Vaibhav Aggarwal, Arjun Khare, Vinh
Tran, Oded Elyada, Ferran Alet, James Rubin, Ian Chou, David Tian, Libin Bai, Lawrence Chan, Lukasz Lew,
Karolis Misiunas, Taylan Bilal, Aniket Ray, Sindhu Raghuram, Alex Castro-Ros, Viral Carpenter, CJ Zheng,
Michael Kilgore, Josef Broder, Emily Xue, Praveen Kallakuri, Dheeru Dua, Nancy Yuen, Steve Chien, John
Schultz, Saurabh Agrawal, Reut Tsarfaty, Jingcao Hu, Ajay Kannan, Dror Marcus, Nisarg Kothari, Baochen Sun,
Ben Horn, Matko Bošnjak, Ferjad Naeem, Dean Hirsch, Lewis Chiang, Boya Fang, Jie Han, Qifei Wang, Ben
Hora, Antoine He, Mario Lučić, Beer Changpinyo, Anshuman Tripathi, John Youssef, Chester Kwak, Philippe
Schlattner, Cat Graves, Rémi Leblond, Wenjun Zeng, Anders Andreassen, Gabriel Rasskin, Yue Song, Eddie Cao,
Junhyuk Oh, Matt Hoffman, Wojtek Skut, Yichi Zhang, Jon Stritar, Xingyu Cai, Saarthak Khanna, Kathie Wang,
Shriya Sharma, Christian Reisswig, Younghoon Jun, Aman Prasad, Tatiana Sholokhova, Preeti Singh, Adi Gerzi
Rosenthal, Anian Ruoss, Françoise Beaufays, Sean Kirmani, Dongkai Chen, Johan Schalkwyk, Jonathan Herzig,
Been Kim, Josh Jacob, Damien Vincent, Adrian N Reyes, Ivana Balazevic, Léonard Hussenot, Jon Schneider,
Parker Barnes, Luis Castro, Spandana Raj Babbula, Simon Green, Serkan Cabi, Nico Duduta, Danny Driess,
Rich Galt, Noam Velan, Junjie Wang, Hongyang Jiao, Matthew Mauger, Du Phan, Miteyan Patel, Vlado Galić,
Jerry Chang, Eyal Marcus, Matt Harvey, Julian Salazar, Elahe Dabir, Suraj Satishkumar Sheth, Amol Mandhane,
Hanie Sedghi, Jeremiah Willcock, Amir Zandieh, Shruthi Prabhakara, Aida Amini, Antoine Miech, Victor Stone,
Massimo Nicosia, Paul Niemczyk, Ying Xiao, Lucy Kim, Sławek Kwasiborski, Vikas Verma, Ada Maksutaj
Oflazer, Christoph Hirnschall, Peter Sung, Lu Liu, Richard Everett, Michiel Bakker, Ágoston Weisz, Yufei Wang,
Vivek Sampathkumar, Uri Shaham, Bibo Xu, Yasemin Altun, Mingqiu Wang, Takaaki Saeki, Guanjie Chen,
Emanuel Taropa, Shanthal Vasanth, Sophia Austin, Lu Huang, Goran Petrovic, Qingyun Dou, Daniel Golovin,
Grigory Rozhdestvenskiy, Allie Culp, Will Wu, Motoki Sano, Divya Jain, Julia Proskurnia, Sébastien Cevey,
Alejandro Cruzado Ruiz, Piyush Patil, Mahdi Mirzazadeh, Eric Ni, Javier Snaider, Lijie Fan, Alexandre Fréchette,
AJ Pierigiovanni, Shariq Iqbal, Kenton Lee, Claudio Fantacci, Jinwei Xing, Lisa Wang, Alex Irpan, David Raposo,
Yi Luan, Zhuoyuan Chen, Harish Ganapathy, Kevin Hui, Jiazhong Nie, Isabelle Guyon, Heming Ge, Roopali
Vij, Hui Zheng, Dayeong Lee, Alfonso Castaño, Khuslen Baatarsukh, Gabriel Ibagon, Alexandra Chronopoulou,
Nicholas FitzGerald, Shashank Viswanadha, Safeen Huda, Rivka Moroshko, Georgi Stoyanov, Prateek Kolhar,
Alain Vaucher, Ishaan Watts, Adhi Kuncoro, Henryk Michalewski, Satish Kambala, Bat-Orgil Batsaikhan, Alek
Andreev, Irina Jurenka, Maigo Le, Qihang Chen, Wael Al Jishi, Sarah Chakera, Zhe Chen, Aditya Kini, Vikas
Yadav, Aditya Siddhant, Ilia Labzovsky, Balaji Lakshminarayanan, Carrie Grimes Bostock, Pankil Botadra,
Ankesh Anand, Colton Bishop, Sam Conway-Rahman, Mohit Agarwal, Yani Donchev, Achintya Singhal, Félix
de Chaumont Quitry, Natalia Ponomareva, Nishant Agrawal, Bin Ni, Kalpesh Krishna, Masha Samsikova, John
Karro, Yilun Du, Tamara von Glehn, Caden Lu, Christopher A. Choquette-Choo, Zhen Qin, Tingnan Zhang,
Sicheng Li, Divya Tyam, Swaroop Mishra, Wing Lowe, Colin Ji, Weiyi Wang, Manaal Faruqui, Ambrose Slone,
Valentin Dalibard, Arunachalam Narayanaswamy, John Lambert, Pierre-Antoine Manzagol, Dan Karliner, Andrew
Bolt, Ivan Lobov, Aditya Kusupati, Chang Ye, Xuan Yang, Heiga Zen, Nelson George, Mukul Bhutani, Olivier
Lacombe, Robert Riachi, Gagan Bansal, Rachel Soh, Yue Gao, Yang Yu, Adams Yu, Emily Nottage, Tania
Rojas-Esponda, James Noraky, Manish Gupta, Ragha Kotikalapudi, Jichuan Chang, Sanja Deur, Dan Graur, Alex
Mossin, Erin Farnese, Ricardo Figueira, Alexandre Moufarek, Austin Huang, Patrik Zochbauer, Ben Ingram,
Tongzhou Chen, Zelin Wu, Adrià Puigdomènech, Leland Rechis, Da Yu, Sri Gayatri Sundara Padmanabhan, Rui
Zhu, Chu ling Ko, Andrea Banino, Samira Daruki, Aarush Selvan, Dhruva Bhaswar, Daniel Hernandez Diaz, Chen
Su, Salvatore Scellato, Jennifer Brennan, Woohyun Han, Grace Chung, Priyanka Agrawal, Urvashi Khandelwal,
Khe Chai Sim, Morgane Lustman, Sam Ritter, Kelvin Guu, Jiawei Xia, Prateek Jain, Emma Wang, Tyrone Hill,
Mirko Rossini, Marija Kostelac, Tautvydas Misiunas, Amit Sabne, Kyuyeun Kim, Ahmet Iscen, Congchao Wang,
José Leal, Ashwin Sreevatsa, Utku Evci, Manfred Warmuth, Saket Joshi, Daniel Suo, James Lottes, Garrett Honke,
Brendan Jou, Stefani Karp, Jieru Hu, Himanshu Sahni, Adrien Ali Taïga, William Kong, Samrat Ghosh, Renshen
Wang, Jay Pavagadhi, Natalie Axelsson, Nikolai Grigorev, Patrick Siegler, Rebecca Lin, Guohui Wang, Emilio
Parisotto, Sharath Maddineni, Krishan Subudhi, Eyal Ben-David, Elena Pochernina, Orgad Keller, Thi Avrahami,
Zhe Yuan, Pulkit Mehta, Jialu Liu, Sherry Yang, Wendy Kan, Katherine Lee, Tom Funkhouser, Derek Cheng,
Hongzhi Shi, Archit Sharma, Joe Kelley, Matan Eyal, Yury Malkov, Corentin Tallec, Yuval Bahat, Shen Yan,
Xintian Wu, David Lindner, Chengda Wu, Avi Caciularu, Xiyang Luo, Rodolphe Jenatton, Tim Zaman, Yingying
Bi, Ilya Kornakov, Ganesh Mallya, Daisuke Ikeda, Itay Karo, Anima Singh, Colin Evans, Praneeth Netrapalli,
Vincent Nallatamby, Isaac Tian, Yannis Assael, Vikas Raunak, Victor Carbune, Ioana Bica, Lior Madmoni, Dee
Cattle, Snchit Grover, Krishna Somandepalli, Sid Lall, Amelio Vázquez-Reina, Riccardo Patana, Jiaqi Mu, Pranav
Talluri, Maggie Tran, Rajeev Aggarwal, RJ Skerry-Ryan, Jun Xu, Mike Burrows, Xiaoyue Pan, Edouard Yvinec,
Di Lu, Zhiying Zhang, Duc Dung Nguyen, Hairong Mu, Gabriel Barcik, Helen Ran, Lauren Beltrone, Krzysztof
Choromanski, Dia Kharrat, Samuel Albanie, Sean Purser-haskell, David Bieber, Carrie Zhang, Jing Wang, Tom
Hudson, Zhiyuan Zhang, Han Fu, Johannes Mauerer, Mohammad Hossein Bateni, AJ Maschinot, Bing Wang,
Muye Zhu, Arjun Pillai, Tobias Weyand, Shuang Liu, Oscar Akerlund, Fred Bertsch, Vittal Premachandran, Alicia
Jin, Vincent Roulet, Peter de Boursac, Shubham Mittal, Ndaba Ndebele, Georgi Karadzhov, Sahra Ghalebikesabi,
Ricky Liang, Allen Wu, Yale Cong, Nimesh Ghelani, Sumeet Singh, Bahar Fatemi, Warren Chen, Charles Kwong,
Alexey Kolganov, Steve Li, Richard Song, Chenkai Kuang, Sobhan Miryoosefi, Dale Webster, James Wendt,
Arkadiusz Socala, Guolong Su, Artur Mendonça, Abhinav Gupta, Xiaowei Li, Tomy Tsai, Qiong Hu, Kai Kang,
Angie Chen, Sertan Girgin, Yongqin Xian, Andrew Lee, Nolan Ramsden, Leslie Baker, Madeleine Clare Elish,
Varvara Krayvanova, Rishabh Joshi, Jiri Simsa, Yao-Yuan Yang, Piotr Ambroszczyk, Dipankar Ghosh, Arjun
Kar, Yuan Shangguan, Yumeya Yamamori, Yaroslav Akulov, Andy Brock, Haotian Tang, Siddharth Vashishtha,
Rich Munoz, Andreas Steiner, Kalyan Andra, Daniel Eppens, Qixuan Feng, Hayato Kobayashi, Sasha Goldshtein,
Mona El Mahdy, Xin Wang, Jilei Wang, Richard Killam, Tom Kwiatkowski, Kavya Kopparapu, Serena Zhan,
Chao Jia, Alexei Bendebury, Sheryl Luo, Adrià Recasens, Timothy Knight, Jing Chen, Mohak Patel, YaGuang Li,
Ben Withbroe, Dean Weesner, Kush Bhatia, Jie Ren, Danielle Eisenbud, Ebrahim Songhori, Yanhua Sun, Travis
Choma, Tasos Kementsietsidis, Lucas Manning, Brian Roark, Wael Farhan, Jie Feng, Susheel Tatineni, James
Cobon-Kerr, Yunjie Li, Lisa Anne Hendricks, Isaac Noble, Chris Breaux, Nate Kushman, Liqian Peng, Fuzhao
Xue, Taylor Tobin, Jamie Rogers, Josh Lipschultz, Chris Alberti, Alexey Vlaskin, Mostafa Dehghani, Roshan
Sharma, Tris Warkentin, Chen-Yu Lee, Benigno Uria, Da-Cheng Juan, Angad Chandorkar, Hila Sheftel, Ruibo
Liu, Elnaz Davoodi, Borja De Balle Pigem, Kedar Dhamdhere, David Ross, Jonathan Hoech, Mahdis Mahdieh,
Li Liu, Qiujia Li, Liam McCafferty, Chenxi Liu, Markus Mircea, Yunting Song, Omkar Savant, Alaa Saade,
Colin Cherry, Vincent Hellendoorn, Siddharth Goyal, Paul Pucciarelli, David Vilar Torres, Zohar Yahav, Hyo Lee,
Lars Lowe Sjoesund, Christo Kirov, Bo Chang, Deepanway Ghoshal, Lu Li, Gilles Baechler, Sébastien Pereira,
Tara Sainath, Anudhyan Boral, Dominik Grewe, Afief Halumi, Nguyet Minh Phu, Tianxiao Shen, Marco Tulio
Ribeiro, Dhriti Varma, Alex Kaskasoli, Vlad Feinberg, Navneet Potti, Jarrod Kahn, Matheus Wisniewski, Shakir
Mohamed, Arnar Mar Hrafnkelsson, Bobak Shahriari, Jean-Baptiste Lespiau, Lisa Patel, Legg Yeung, Tom Paine,
Lantao Mei, Alex Ramirez, Rakesh Shivanna, Li Zhong, Josh Woodward, Guilherme Tubone, Samira Khan, Heng
Chen, Elizabeth Nielsen, Catalin Ionescu, Utsav Prabhu, Mingcen Gao, Qingze Wang, Sean Augenstein, Neesha
Subramaniam, Jason Chang, Fotis Iliopoulos, Jiaming Luo, Myriam Khan, Weicheng Kuo, Denis Teplyashin,
Florence Perot, Logan Kilpatrick, Amir Globerson, Hongkun Yu, Anfal Siddiqui, Nick Sukhanov, Arun Kandoor,
Umang Gupta, Marco Andreetto, Moran Ambar, Donnie Kim, Paweł Wesołowski, Sarah Perrin, Ben Limonchik,
Wei Fan, Jim Stephan, Ian Stewart-Binks, Ryan Kappedal, Tong He, Sarah Cogan, Romina Datta, Tong Zhou,
Jiayu Ye, Leandro Kieliger, Ana Ramalho, Kyle Kastner, Fabian Mentzer, Wei-Jen Ko, Arun Suggala, Tianhao
Zhou, Shiraz Butt, Hana Strejček, Lior Belenki, Subhashini Venugopalan, Mingyang Ling, Evgenii Eltyshev,
Yunxiao Deng, Geza Kovacs, Mukund Raghavachari, Hanjun Dai, Tal Schuster, Steven Schwarcz, Richard Nguyen,
Arthur Nguyen, Gavin Buttimore, Shrestha Basu Mallick, Sudeep Gandhe, Seth Benjamin, Michal Jastrzebski,
Le Yan, Sugato Basu, Chris Apps, Isabel Edkins, James Allingham, Immanuel Odisho, Tomas Kocisky, Jewel
Zhao, Linting Xue, Apoorv Reddy, Chrysovalantis Anastasiou, Aviel Atias, Sam Redmond, Kieran Milan, Nicolas
Heess, Herman Schmit, Allan Dafoe, Daniel Andor, Tynan Gangwani, Anca Dragan, Sheng Zhang, Ashyana
Kachra, Gang Wu, Siyang Xue, Kevin Aydin, Siqi Liu, Yuxiang Zhou, Mahan Malihi, Austin Wu, Siddharth Gopal,
Candice Schumann, Peter Stys, Alek Wang, Mirek Olšák, Dangyi Liu, Christian Schallhart, Yiran Mao, Demetra
Brady, Hao Xu, Tomas Mery, Chawin Sitawarin, Siva Velusamy, Tom Cobley, Alex Zhai, Christian Walder, Nitzan
Katz, Ganesh Jawahar, Chinmay Kulkarni, Antoine Yang, Adam Paszke, Yinan Wang, Bogdan Damoc, Zalán
Borsos, Ray Smith, Jinning Li, Mansi Gupta, Andrei Kapishnikov, Sushant Prakash, Florian Luisier, Rishabh Agarwal,
Will Grathwohl, Kuangyuan Chen, Kehang Han, Nikhil Mehta, Andrew Over, Shekoofeh Azizi, Lei Meng,
Niccolò Dal Santo, Kelvin Zheng, Jane Shapiro, Igor Petrovski, Jeffrey Hui, Amin Ghafouri, Jasper Snoek, James
Qin, Mandy Jordan, Caitlin Sikora, Jonathan Malmaud, Yuheng Kuang, Aga Świetlik, Ruoxin Sang, Chongyang
Shi, Leon Li, Andrew Rosenberg, Shubin Zhao, Andy Crawford, Jan-Thorsten Peter, Yun Lei, Xavier Garcia, Long
Le, Todd Wang, Julien Amelot, Dave Orr, Praneeth Kacham, Dana Alon, Gladys Tyen, Abhinav Arora, James
Lyon, Alex Kurakin, Mimi Ly, Theo Guidroz, Zhipeng Yan, Rina Panigrahy, Pingmei Xu, Thais Kagohara, Yong
Cheng, Eric Noland, Jinhyuk Lee, Jonathan Lee, Cathy Yip, Maria Wang, Efrat Nehoran, Alexander Bykovsky,
Zhihao Shan, Ankit Bhagatwala, Chaochao Yan, Jie Tan, Guillermo Garrido, Dan Ethier, Nate Hurley, Grace
Vesom, Xu Chen, Siyuan Qiao, Abhishek Nayyar, Julian Walker, Paramjit Sandhu, Mihaela Rosca, Danny Swisher,
Mikhail Dektiarev, Josh Dillon, George-Cristian Muraru, Manuel Tragut, Artiom Myaskovsky, David Reid, Marko
Velic, Owen Xiao, Jasmine George, Mark Brand, Jing Li, Wenhao Yu, Shane Gu, Xiang Deng, François-Xavier
Aubet, Soheil Hassas Yeganeh, Fred Alcober, Celine Smith, Trevor Cohn, Kay McKinney, Michael Tschannen,
Ramesh Sampath, Gowoon Cheon, Liangchen Luo, Luyang Liu, Jordi Orbay, Hui Peng, Gabriela Botea, Xiaofan
Zhang, Charles Yoon, Cesar Magalhaes, Paweł Stradomski, Ian Mackinnon, Steven Hemingray, Kumaran
Venkatesan, Rhys May, Jaeyoun Kim, Alex Druinsky, Jingchen Ye, Zheng Xu, Terry Huang, Jad Al Abdallah,
Adil Dostmohamed, Rachana Fellinger, Tsendsuren Munkhdalai, Akanksha Maurya, Peter Garst, Yin Zhang,
Maxim Krikun, Simon Bucher, Aditya Srikanth Veerubhotla, Yaxin Liu, Sheng Li, Nishesh Gupta, Jakub Adamek,
Hanwen Chen, Bernett Orlando, Aleksandr Zaks, Joost van Amersfoort, Josh Camp, Hui Wan, HyunJeong Choe,
Zhichun Wu, Kate Olszewska, Weiren Yu, Archita Vadali, Martin Scholz, Daniel De Freitas, Jason Lin, Amy Hua,
Xin Liu, Frank Ding, Yichao Zhou, Boone Severson, Katerina Tsihlas, Samuel Yang, Tammo Spalink, Varun
Yerram, Helena Pankov, Rory Blevins, Ben Vargas, Sarthak Jauhari, Matt Miecnikowski, Ming Zhang, Sandeep
Kumar, Clement Farabet, Charline Le Lan, Sebastian Flennerhag, Yonatan Bitton, Ada Ma, Arthur Bražinskas, Eli
Collins, Niharika Ahuja, Sneha Kudugunta, Anna Bortsova, Minh Giang, Wanzheng Zhu, Ed Chi, Scott Lundberg,
Alexey Stern, Subha Puttagunta, Jing Xiong, Xiao Wu, Yash Pande, Amit Jhindal, Daniel Murphy, Jon Clark, Marc
Brockschmidt, Maxine Deines, Kevin R. McKee, Dan Bahir, Jiajun Shen, Minh Truong, Daniel McDuff, Andrea
Gesmundo, Edouard Rosseel, Bowen Liang, Ken Caluwaerts, Jessica Hamrick, Joseph Kready, Mary Cassin,
Rishikesh Ingale, Li Lao, Scott Pollom, Yifan Ding, Wei He, Lizzetth Bellot, Joana Iljazi, Ramya Sree Boppana,
Shan Han, Tara Thompson, Amr Khalifa, Anna Bulanova, Blagoj Mitrevski, Bo Pang, Emma Cooney, Tian Shi,
Rey Coaguila, Tamar Yakar, Marc’Aurelio Ranzato, Nikola Momchev, Chris Rawles, Zachary Charles, Young
Maeng, Yuan Zhang, Rishabh Bansal, Xiaokai Zhao, Brian Albert, Yuan Yuan, Sudheendra Vijayanarasimhan,
Roy Hirsch, Vinay Ramasesh, Kiran Vodrahalli, Xingyu Wang, Arushi Gupta, DJ Strouse, Jianmo Ni, Roma
Patel, Gabe Taubman, Zhouyuan Huo, Dero Gharibian, Marianne Monteiro, Hoi Lam, Shobha Vasudevan, Aditi
Chaudhary, Isabela Albuquerque, Kilol Gupta, Sebastian Riedel, Chaitra Hegde, Avraham Ruderman, András
György, Marcus Wainwright, Ashwin Chaugule, Burcu Karagol Ayan, Tomer Levinboim, Sam Shleifer, Yogesh
Kalley, Vahab Mirrokni, Abhishek Rao, Prabakar Radhakrishnan, Jay Hartford, Jialin Wu, Zhenhai Zhu, Francesco
Bertolini, Hao Xiong, Nicolas Serrano, Hamish Tomlinson, Myle Ott, Yifan Chang, Mark Graham, Jian Li, Marco
Liang, Xiangzhu Long, Sebastian Borgeaud, Yanif Ahmad, Alex Grills, Diana Mincu, Martin Izzard, Yuan Liu,
Jinyu Xie, Louis O’Bryan, Sameera Ponda, Simon Tong, Michelle Liu, Dan Malkin, Khalid Salama, Yuankai
Chen, Rohan Anil, Anand Rao, Rigel Swavely, Misha Bilenko, Nina Anderson, Tat Tan, Jing Xie, Xing Wu, Lijun
Yu, Oriol Vinyals, Andrey Ryabtsev, Rumen Dangovski, Kate Baumli, Daniel Keysers, Christian Wright, Zoe
Ashwood, Betty Chan, Artem Shtefan, Yaohui Guo, Ankur Bapna, Radu Soricut, Steven Pecht, Sabela Ramos,
Rui Wang, Jiahao Cai, Trieu Trinh, Paul Barham, Linda Friso, Eli Stickgold, Xiangzhuo Ding, Siamak Shakeri,
Diego Ardila, Eleftheria Briakou, Phil Culliton, Adam Raveret, Jingyu Cui, David Saxton, Subhrajit Roy, Javad
Azizi, Pengcheng Yin, Lucia Loher, Andrew Bunner, Min Choi, Faruk Ahmed, Eric Li, Yin Li, Shengyang Dai,
Michael Elabd, Sriram Ganapathy, Shivani Agrawal, Yiqing Hua, Paige Kunkle, Sujeevan Rajayogam, Arun
Ahuja, Arthur Conmy, Alex Vasiloff, Parker Beak, Christopher Yew, Jayaram Mudigonda, Bartek Wydrowski, Jon
Blanton, Zhengdong Wang, Yann Dauphin, Zhuo Xu, Martin Polacek, Xi Chen, Hexiang Hu, Pauline Sho, Markus
Kunesch, Mehdi Hafezi Manshadi, Eliza Rutherford, Bo Li, Sissie Hsiao, Iain Barr, Alex Tudor, Matija Kecman,
Arsha Nagrani, Vladimir Pchelin, Martin Sundermeyer, Aishwarya P S, Abhijit Karmarkar, Yi Gao, Grishma
Chole, Olivier Bachem, Isabel Gao, Arturo BC, Matt Dibb, Mauro Verzetti, Felix Hernandez-Campos, Yana Lunts,
Matthew Johnson, Julia Di Trapani, Raphael Koster, Idan Brusilovsky, Binbin Xiong, Megha Mohabey, Han Ke,
Joe Zou, Tea Sabolić, Víctor Campos, John Palowitch, Alex Morris, Linhai Qiu, Pranavaraj Ponnuramu, Fangtao
Li, Vivek Sharma, Kiranbir Sodhia, Kaan Tekelioglu, Aleksandr Chuklin, Madhavi Yenugula, Erika Gemzer,
Theofilos Strinopoulos, Sam El-Husseini, Huiyu Wang, Yan Zhong, Edouard Leurent, Paul Natsev, Weijun Wang,
Dre Mahaarachchi, Tao Zhu, Songyou Peng, Sami Alabed, Cheng-Chun Lee, Anthony Brohan, Arthur Szlam,
GS Oh, Anton Kovsharov, Jenny Lee, Renee Wong, Megan Barnes, Gregory Thornton, Felix Gimeno, Omer
Levy, Martin Sevenich, Melvin Johnson, Jonathan Mallinson, Robert Dadashi, Ziyue Wang, Qingchun Ren,
Preethi Lahoti, Arka Dhar, Josh Feldman, Dan Zheng, Thatcher Ulrich, Liviu Panait, Michiel Blokzijl, Cip Baetu,
Josip Matak, Jitendra Harlalka, Maulik Shah, Tal Marian, Daniel von Dincklage, Cosmo Du, Ruy Ley-Wild,
Bethanie Brownfield, Max Schumacher, Yury Stuken, Shadi Noghabi, Sonal Gupta, Xiaoqi Ren, Eric Malmi, Felix
Weissenberger, Blanca Huergo, Maria Bauza, Thomas Lampe, Arthur Douillard, Mojtaba Seyedhosseini, Roy
Frostig, Zoubin Ghahramani, Kelvin Nguyen, Kashyap Krishnakumar, Chengxi Ye, Rahul Gupta, Alireza Nazari,
Robert Geirhos, Pete Shaw, Ahmed Eleryan, Dima Damen, Jennimaria Palomaki, Ted Xiao, Qiyin Wu, Quan
Yuan, Phoenix Meadowlark, Matthew Bilotti, Raymond Lin, Mukund Sridhar, Yannick Schroecker, Da-Woon
Chung, Jincheng Luo, Trevor Strohman, Tianlin Liu, Anne Zheng, Jesse Emond, Wei Wang, Andrew Lampinen,
Toshiyuki Fukuzawa, Folawiyo Campbell-Ajala, Monica Roy, James Lee-Thorp, Lily Wang, Iftekhar Naim, Tony
Nguyễn, Guy Bensky, Aditya Gupta, Dominika Rogozińska, Justin Fu, Thanumalayan Sankaranarayana Pillai,
Petar Veličković, Shahar Drath, Philipp Neubeck, Vaibhav Tulsyan, Arseniy Klimovskiy, Don Metzler, Sage
Stevens, Angel Yeh, Junwei Yuan, Tianhe Yu, Kelvin Zhang, Alec Go, Vincent Tsang, Ying Xu, Andy Wan, Isaac
Galatzer-Levy, Sam Sobell, Abodunrinwa Toki, Elizabeth Salesky, Wenlei Zhou, Diego Antognini, Sholto Douglas,
Shimu Wu, Adam Lelkes, Frank Kim, Paul Cavallaro, Ana Salazar, Yuchi Liu, James Besley, Tiziana Refice,
Yiling Jia, Zhang Li, Michal Sokolik, Arvind Kannan, Jon Simon, Jo Chick, Avia Aharon, Meet Gandhi, Mayank
Daswani, Keyvan Amiri, Vighnesh Birodkar, Abe Ittycheriah, Peter Grabowski, Oscar Chang, Charles Sutton,
Zhixin Lai, Umesh Telang, Susie Sargsyan, Tao Jiang, Raphael Hoffmann, Nicole Brichtova, Matteo Hessel,
Jonathan Halcrow, Sammy Jerome, Geoff Brown, Alex Tomala, Elena Buchatskaya, Dian Yu, Sachit Menon, Pol
Moreno, Yuguo Liao, Vicky Zayats, Luming Tang, SQ Mah, Ashish Shenoy, Alex Siegman, Majid Hadian, Okwan
Kwon, Tao Tu, Nima Khajehnouri, Ryan Foley, Parisa Haghani, Zhongru Wu, Vaishakh Keshava, Khyatti Gupta,
Tony Bruguier, Rui Yao, Danny Karmon, Luisa Zintgraf, Zhicheng Wang, Enrique Piqueras, Junehyuk Jung, Jenny
Brennan, Diego Machado, Marissa Giustina, MH Tessler, Kamyu Lee, Qiao Zhang, Joss Moore, Kaspar Daugaard,
Alexander Frömmgen, Jennifer Beattie, Fred Zhang, Daniel Kasenberg, Ty Geri, Danfeng Qin, Gaurav Singh
Tomar, Tom Ouyang, Tianli Yu, Luowei Zhou, Rajiv Mathews, Andy Davis, Yaoyiran Li, Jai Gupta, Damion Yates,
Linda Deng, Elizabeth Kemp, Ga-Young Joung, Sergei Vassilvitskii, Mandy Guo, Pallavi LV, Dave Dopson, Sami
Lachgar, Lara McConnaughey, Himadri Choudhury, Dragos Dena, Aaron Cohen, Joshua Ainslie, Sergey Levi,
Parthasarathy Gopavarapu, Polina Zablotskaia, Hugo Vallet, Sanaz Bahargam, Xiaodan Tang, Nenad Tomasev,
Ethan Dyer, Daniel Balle, Hongrae Lee, William Bono, Jorge Gonzalez Mendez, Vadim Zubov, Shentao Yang,
Ivor Rendulic, Yanyan Zheng, Andrew Hogue, Golan Pundak, Ralph Leith, Avishkar Bhoopchand, Michael Han,
Mislav Žanić, Tom Schaul, Manolis Delakis, Tejas Iyer, Guanyu Wang, Harman Singh, Abdelrahman Abdelhamed,
Tara Thomas, Siddhartha Brahma, Hilal Dib, Naveen Kumar, Wenxuan Zhou, Liang Bai, Pushkar Mishra, Jiao
Sun, Valentin Anklin, Roykrong Sukkerd, Lauren Agubuzu, Anton Briukhov, Anmol Gulati, Maximilian Sieb,
Fabio Pardo, Sara Nasso, Junquan Chen, Kexin Zhu, Tiberiu Sosea, Alex Goldin, Keith Rush, Spurthi Amba
Hombaiah, Andreas Noever, Allan Zhou, Sam Haves, Mary Phuong, Jake Ades, Yi ting Chen, Lin Yang, Joseph
Pagadora, Stan Bileschi, Victor Cotruta, Rachel Saputro, Arijit Pramanik, Sean Ammirati, Dan Garrette, Kevin
Villela, Tim Blyth, Canfer Akbulut, Neha Jha, Alban Rrustemi, Arissa Wongpanich, Chirag Nagpal, Yonghui Wu,
Morgane Rivière, Sergey Kishchenko, Pranesh Srinivasan, Alice Chen, Animesh Sinha, Trang Pham, Bill Jia,
Tom Hennigan, Anton Bakalov, Nithya Attaluri, Drew Garmon, Daniel Rodriguez, Dawid Wegner, Wenhao Jia,
Evan Senter, Noah Fiedel, Denis Petek, Yuchuan Liu, Cassidy Hardin, Harshal Tushar Lehri, Joao Carreira, Sara
Smoot, Marcel Prasetya, Nami Akazawa, Anca Stefanoiu, Chia-Hua Ho, Anelia Angelova, Kate Lin, Min Kim,
Charles Chen, Marcin Sieniek, Alice Li, Tongfei Guo, Sorin Baltateanu, Pouya Tafti, Michael Wunder, Nadav
Olmert, Divyansh Shukla, Jingwei Shen, Neel Kovelamudi, Balaji Venkatraman, Seth Neel, Romal Thoppilan,
Jerome Connor, Frederik Benzing, Axel Stjerngren, Golnaz Ghiasi, Alex Polozov, Joshua Howland, Theophane
Weber, Justin Chiu, Ganesh Poomal Girirajan, Andreas Terzis, Pidong Wang, Fangda Li, Yoav Ben Shalom,
Dinesh Tewari, Matthew Denton, Roee Aharoni, Norbert Kalb, Heri Zhao, Junlin Zhang, Angelos Filos, Matthew
Rahtz, Lalit Jain, Connie Fan, Vitor Rodrigues, Ruth Wang, Richard Shin, Jacob Austin, Roman Ring, Mariella
Sanchez-Vargas, Mehadi Hassen, Ido Kessler, Uri Alon, Gufeng Zhang, Wenhu Chen, Yenai Ma, Xiance Si,
Le Hou, Azalia Mirhoseini, Marc Wilson, Geoff Bacon, Becca Roelofs, Lei Shu, Gautam Vasudevan, Jonas Adler,
Artur Dwornik, Tayfun Terzi, Matt Lawlor, Harry Askham, Mike Bernico, Xuanyi Dong, Chris Hidey, Kevin
Kilgour, Gaël Liu, Surya Bhupatiraju, Luke Leonhard, Siqi Zuo, Partha Talukdar, Qing Wei, Aliaksei Severyn,
Vít Listík, Jong Lee, Aditya Tripathi, SK Park, Yossi Matias, Hao Liu, Alex Ruiz, Rajesh Jayaram, Jackson
Tolins, Pierre Marcenac, Yiming Wang, Bryan Seybold, Henry Prior, Deepak Sharma, Jack Weber, Mikhail
Sirotenko, Yunhsuan Sung, Dayou Du, Ellie Pavlick, Stefan Zinke, Markus Freitag, Max Dylla, Montse Gonzalez
Arenas, Natan Potikha, Omer Goldman, Connie Tao, Rachita Chhaparia, Maria Voitovich, Pawan Dogra, Andrija
Ražnatović, Zak Tsai, Chong You, Oleaser Johnson, George Tucker, Chenjie Gu, Jae Yoo, Maryam Majzoubi,
Valentin Gabeur, Bahram Raad, Rocky Rhodes, Kashyap Kolipaka, Heidi Howard, Geta Sampemane, Benny Li,
Chulayuth Asawaroengchai, Duy Nguyen, Chiyuan Zhang, Timothee Cour, Xinxin Yu, Zhao Fu, Joe Jiang, Po-Sen
Huang, Gabriela Surita, Iñaki Iturrate, Yael Karov, Michael Collins, Martin Baeuml, Fabian Fuchs, Shilpa Shetty,
Swaroop Ramaswamy, Sayna Ebrahimi, Qiuchen Guo, Jeremy Shar, Gabe Barth-Maron, Sravanti Addepalli,
Bryan Richter, Chin-Yi Cheng, Eugénie Rives, Fei Zheng, Johannes Griesser, Nishanth Dikkala, Yoel Zeldes, Ilkin
Safarli, Dipanjan Das, Himanshu Srivastava, Sadh MNM Khan, Xin Li, Aditya Pandey, Larisa Markeeva, Dan
Belov, Qiqi Yan, Mikołaj Rybiński, Tao Chen, Megha Nawhal, Michael Quinn, Vineetha Govindaraj, Sarah York,
Reed Roberts, Roopal Garg, Namrata Godbole, Jake Abernethy, Anil Das, Lam Nguyen Thiet, Jonathan Tompson,
John Nham, Neera Vats, Ben Caine, Wesley Helmholz, Francesco Pongetti, Yeongil Ko, James An, Clara Huiyi
Hu, Yu-Cheng Ling, Julia Pawar, Robert Leland, Keisuke Kinoshita, Waleed Khawaja, Marco Selvi, Eugene Ie,
Danila Sinopalnikov, Lev Proleev, Nilesh Tripuraneni, Michele Bevilacqua, Seungji Lee, Clayton Sanford, Dan
Suh, Dustin Tran, Jeff Dean, Simon Baumgartner, Jens Heitkaemper, Sagar Gubbi, Kristina Toutanova, Yichong
Xu, Chandu Thekkath, Keran Rong, Palak Jain, Annie Xie, Yan Virin, Yang Li, Lubo Litchev, Richard Powell,
Tarun Bharti, Adam Kraft, Nan Hua, Marissa Ikonomidis, Ayal Hitron, Sanjiv Kumar, Loic Matthey, Sophie
Bridgers, Lauren Lax, Ishaan Malhi, Ondrej Skopek, Ashish Gupta, Jiawei Cao, Mitchelle Rasquinha, Siim Põder,
Wojciech Stokowiec, Nicholas Roth, Guowang Li, Michaël Sander, Joshua Kessinger, Vihan Jain, Edward Loper,
Wonpyo Park, Michal Yarom, Liqun Cheng, Guru Guruganesh, Kanishka Rao, Yan Li, Catarina Barros, Mikhail
Sushkov, Chun-Sung Ferng, Rohin Shah, Ophir Aharoni, Ravin Kumar, Tim McConnell, Peiran Li, Chen Wang,
Fernando Pereira, Craig Swanson, Fayaz Jamil, Yan Xiong, Anitha Vijayakumar, Prakash Shroff, Kedar Soparkar,
Jindong Gu, Livio Baldini Soares, Eric Wang, Kushal Majmundar, Aurora Wei, Kai Bailey, Nora Kassner, Chizu
Kawamoto, Goran Žužić, Victor Gomes, Abhirut Gupta, Michael Guzman, Ishita Dasgupta, Xinyi Bai, Zhufeng
Pan, Francesco Piccinno, Hadas Natalie Vogel, Octavio Ponce, Adrian Hutter, Paul Chang, Pan-Pan Jiang, Ionel
Gog, Vlad Ionescu, James Manyika, Fabian Pedregosa, Harry Ragan, Zach Behrman, Ryan Mullins, Coline Devin,
Aroonalok Pyne, Swapnil Gawde, Martin Chadwick, Yiming Gu, Sasan Tavakkol, Andy Twigg, Naman Goyal,
Ndidi Elue, Anna Goldie, Srinivasan Venkatachary, Hongliang Fei, Ziqiang Feng, Marvin Ritter, Isabel Leal,
Sudeep Dasari, Pei Sun, Alif Raditya Rochman, Brendan O’Donoghue, Yuchen Liu, Jim Sproch, Kai Chen, Natalie
Clay, Slav Petrov, Sailesh Sidhwani, Ioana Mihailescu, Alex Panagopoulos, AJ Piergiovanni, Yunfei Bai, George
Powell, Deep Karkhanis, Trevor Yacovone, Petr Mitrichev, Joe Kovac, Dave Uthus, Amir Yazdanbakhsh, David
Amos, Steven Zheng, Bing Zhang, Jin Miao, Bhuvana Ramabhadran, Soroush Radpour, Shantanu Thakoor, Josh
Newlan, Oran Lang, Orion Jankowski, Shikhar Bharadwaj, Jean-Michel Sarr, Shereen Ashraf, Sneha Mondal, Jun
Yan, Ankit Singh Rawat, Sarmishta Velury, Greg Kochanski, Tom Eccles, Franz Och, Abhanshu Sharma, Ethan
Mahintorabi, Alex Gurney, Carrie Muir, Vered Cohen, Saksham Thakur, Adam Bloniarz, Asier Mujika, Alexander
Pritzel, Paul Caron, Altaf Rahman, Fiona Lang, Yasumasa Onoe, Petar Sirkovic, Jay Hoover, Ying Jian, Pablo
Duque, Arun Narayanan, David Soergel, Alex Haig, Loren Maggiore, Shyamal Buch, Josef Dean, Ilya Figotin,
Igor Karpov, Shaleen Gupta, Denny Zhou, Muhuan Huang, Ashwin Vaswani, Christopher Semturs, Kaushik
Shivakumar, Yu Watanabe, Vinodh Kumar Rajendran, Eva Lu, Yanhan Hou, Wenting Ye, Shikhar Vashishth,
Nana Nti, Vytenis Sakenas, Darren Ni, Doug DeCarlo, Michael Bendersky, Sumit Bagri, Nacho Cano, Elijah
Peake, Simon Tokumine, Varun Godbole, Carlos Guía, Tanya Lando, Vittorio Selo, Seher Ellis, Danny Tarlow,
Daniel Gillick, Alessandro Epasto, Siddhartha Reddy Jonnalagadda, Meng Wei, Meiyan Xie, Ankur Taly, Michela
Paganini, Mukund Sundararajan, Daniel Toyama, Ting Yu, Dessie Petrova, Aneesh Pappu, Rohan Agrawal, Senaka
Buthpitiya, Justin Frye, Thomas Buschmann, Remi Crocker, Marco Tagliasacchi, Mengchao Wang, Da Huang,
Sagi Perel, Brian Wieder, Hideto Kazawa, Weiyue Wang, Jeremy Cole, Himanshu Gupta, Ben Golan, Seojin Bang,
Nitish Kulkarni, Ken Franko, Casper Liu, Doug Reid, Sid Dalmia, Jay Whang, Kevin Cen, Prasha Sundaram,
Johan Ferret, Berivan Isik, Lucian Ionita, Guan Sun, Anna Shekhawat, Muqthar Mohammad, Philip Pham, Ronny
Huang, Karthik Raman, Xingyi Zhou, Ross Mcilroy, Austin Myers, Sheng Peng, Jacob Scott, Paul Covington,
Sofia Erell, Pratik Joshi, João Gabriel Oliveira, Natasha Noy, Tajwar Nasir, Jake Walker, Vera Axelrod, Tim
Dozat, Pu Han, Chun-Te Chu, Eugene Weinstein, Anand Shukla, Shreyas Chandrakaladharan, Petra Poklukar,
Bonnie Li, Ye Jin, Prem Eruvbetine, Steven Hansen, Avigail Dabush, Alon Jacovi, Samrat Phatale, Chen Zhu,
Steven Baker, Mo Shomrat, Yang Xiao, Jean Pouget-Abadie, Mingyang Zhang, Fanny Wei, Yang Song, Helen
King, Yiling Huang, Yun Zhu, Ruoxi Sun, Juliana Vicente Franco, Chu-Cheng Lin, Sho Arora, Hui Li, Vivian
Xia, Luke Vilnis, Mariano Schain, Kaiz Alarakyia, Laurel Prince, Aaron Phillips, Caleb Habtegebriel, Luyao
Xu, Huan Gui, Santiago Ontanon, Lora Aroyo, Karan Gill, Peggy Lu, Yash Katariya, Dhruv Madeka, Shankar
Krishnan, Shubha Srinivas Raghvendra, James Freedman, Yi Tay, Gaurav Menghani, Peter Choy, Nishita Shetty,
Dan Abolafia, Doron Kukliansky, Edward Chou, Jared Lichtarge, Ken Burke, Ben Coleman, Dee Guo, Larry Jin,
Indro Bhattacharya, Victoria Langston, Yiming Li, Suyog Kotecha, Alex Yakubovich, Xinyun Chen, Petre Petrov,
Tolly Powell, Yanzhang He, Corbin Quick, Kanav Garg, Dawsen Hwang, Yang Lu, Srinadh Bhojanapalli, Kristian
Kjems, Ramin Mehran, Aaron Archer, Hado van Hasselt, Ashwin Balakrishna, JK Kearns, Meiqi Guo, Jason
Riesa, Mikita Sazanovich, Xu Gao, Chris Sauer, Chengrun Yang, XiangHai Sheng, Thomas Jimma, Wouter Van
Gansbeke, Vitaly Nikolaev, Wei Wei, Katie Millican, Ruizhe Zhao, Justin Snyder, Levent Bolelli, Maura O’Brien,
Shawn Xu, Fei Xia, Wentao Yuan, Arvind Neelakantan, David Barker, Sachin Yadav, Hannah Kirkwood, Farooq
Ahmad, Joel Wee, Jordan Grimstad, Boyu Wang, Matthew Wiethoff, Shane Settle, Miaosen Wang, Charles Blundell,
Jingjing Chen, Chris Duvarney, Grace Hu, Olaf Ronneberger, Alex Lee, Yuanzhen Li, Abhishek Chakladar,
Alena Butryna, Georgios Evangelopoulos, Guillaume Desjardins, Jonni Kanerva, Henry Wang, Averi Nowak, Nick
Li, Alyssa Loo, Art Khurshudov, Laurent El Shafey, Nagabhushan Baddi, Karel Lenc, Yasaman Razeghi, Tom
Lieber, Amer Sinha, Xiao Ma, Yao Su, James Huang, Asahi Ushio, Hanna Klimczak-Plucińska, Kareem Mohamed,
JD Chen, Simon Osindero, Stav Ginzburg, Lampros Lamprou, Vasilisa Bashlovkina, Duc-Hieu Tran, Ali Khodaei,
Ankit Anand, Yixian Di, Ramy Eskander, Manish Reddy Vuyyuru, Jasmine Liu, Aishwarya Kamath, Roman
Goldenberg, Mathias Bellaiche, Juliette Pluto, Bill Rosgen, Hassan Mansoor, William Wong, Suhas Ganesh, Eric
Bailey, Scott Baird, Dan Deutsch, Jinoo Baek, Xuhui Jia, Chansoo Lee, Abe Friesen, Nathaniel Braun, Kate
Lee, Amayika Panda, Steven M. Hernandez, Duncan Williams, Jianqiao Liu, Ethan Liang, Arnaud Autef, Emily
Pitler, Deepali Jain, Phoebe Kirk, Oskar Bunyan, Jaume Sanchez Elias, Tongxin Yin, Machel Reid, Aedan Pope,
Nikita Putikhin, Bidisha Samanta, Sergio Guadarrama, Dahun Kim, Simon Rowe, Marcella Valentine, Geng Yan,
Alex Salcianu, David Silver, Gan Song, Richa Singh, Shuai Ye, Hannah DeBalsi, Majd Al Merey, Eran Ofek,
Albert Webson, Shibl Mourad, Ashwin Kakarla, Silvio Lattanzi, Nick Roy, Evgeny Sluzhaev, Christina Butterfield,
Alessio Tonioni, Nathan Waters, Sudhindra Kopalle, Jason Chase, James Cohan, Girish Ramchandra Rao, Robert
Berry, Michael Voznesensky, Shuguang Hu, Kristen Chiafullo, Sharat Chikkerur, George Scrivener, Ivy Zheng,
Jeremy Wiesner, Wolfgang Macherey, Timothy Lillicrap, Fei Liu, Brian Walker, David Welling, Elinor Davies,
Yangsibo Huang, Lijie Ren, Nir Shabat, Alessandro Agostini, Mariko Iinuma, Dustin Zelle, Rohit Sathyanarayana,
Andrea D’olimpio, Morgan Redshaw, Matt Ginsberg, Ashwin Murthy, Mark Geller, Tatiana Matejovicova, Ayan
Chakrabarti, Ryan Julian, Christine Chan, Qiong Hu, Daniel Jarrett, Manu Agarwal, Jeshwanth Challagundla,
Tao Li, Sandeep Tata, Wen Ding, Maya Meng, Zhuyun Dai, Giulia Vezzani, Shefali Garg, Jannis Bulian, Mary
Jasarevic, Honglong Cai, Harish Rajamani, Adam Santoro, Florian Hartmann, Chen Liang, Bartek Perz, Apoorv
Jindal, Fan Bu, Sungyong Seo, Ryan Poplin, Adrian Goedeckemeyer, Badih Ghazi, Nikhil Khadke, Leon Liu,
Kevin Mather, Mingda Zhang, Ali Shah, Alex Chen, Jinliang Wei, Keshav Shivam, Yuan Cao, Donghyun Cho,
Angelo Scorza Scarpati, Michael Moffitt, Clara Barbu, Ivan Jurin, Ming-Wei Chang, Hongbin Liu, Hao Zheng,
Shachi Dave, Christine Kaeser-Chen, Xiaobin Yu, Alvin Abdagic, Lucas Gonzalez, Yanping Huang, Peilin Zhong,
Cordelia Schmid, Bryce Petrini, Alex Wertheim, Jifan Zhu, Hoang Nguyen, Kaiyang Ji, Yanqi Zhou, Tao Zhou,
Fangxiaoyu Feng, Regev Cohen, David Rim, Shubham Milind Phal, Petko Georgiev, Ariel Brand, Yue Ma,
Wei Li, Somit Gupta, Chao Wang, Pavel Dubov, Jean Tarbouriech, Kingshuk Majumder, Huijian Li, Norman
Rink, Apurv Suman, Yang Guo, Yinghao Sun, Arun Nair, Xiaowei Xu, Mohamed Elhawaty, Rodrigo Cabrera,
Guangxing Han, Julian Eisenschlos, Junwen Bai, Yuqi Li, Yamini Bansal, Thibault Sellam, Mina Khan, Hung
Nguyen, Justin Mao-Jones, Nikos Parotsidis, Jake Marcus, Cindy Fan, Roland Zimmermann, Yony Kochinski,
Laura Graesser, Feryal Behbahani, Alvaro Caceres, Michael Riley, Patrick Kane, Sandra Lefdal, Rob Willoughby,
Paul Vicol, Lun Wang, Shujian Zhang, Ashleah Gill, Yu Liang, Gautam Prasad, Soroosh Mariooryad, Mehran
Kazemi, Zifeng Wang, Kritika Muralidharan, Paul Voigtlaender, Jeffrey Zhao, Huanjie Zhou, Nina D’Souza, Aditi
Mavalankar, Séb Arnold, Nick Young, Obaid Sarvana, Chace Lee, Milad Nasr, Tingting Zou, Seokhwan Kim,
Lukas Haas, Kaushal Patel, Neslihan Bulut, David Parkinson, Courtney Biles, Dmitry Kalashnikov, Chi Ming To,
Aviral Kumar, Jessica Austin, Alex Greve, Lei Zhang, Megha Goel, Yeqing Li, Sergey Yaroshenko, Max Chang,
Abhishek Jindal, Geoff Clark, Hagai Taitelbaum, Dale Johnson, Ofir Roval, Jeongwoo Ko, Anhad Mohananey,
Christian Schuler, Shenil Dodhia, Ruichao Li, Kazuki Osawa, Claire Cui, Peng Xu, Rushin Shah, Tao Huang,
Ela Gruzewska, Nathan Clement, Mudit Verma, Olcan Sercinoglu, Hai Qian, Viral Shah, Masa Yamaguchi,
Abhinit Modi, Takahiro Kosakai, Thomas Strohmann, Junhao Zeng, Beliz Gunel, Jun Qian, Austin Tarango,
Krzysztof Jastrzębski, Robert David, Jyn Shan, Parker Schuh, Kunal Lad, Willi Gierke, Mukundan Madhavan,
Xinyi Chen, Mark Kurzeja, Rebeca Santamaria-Fernandez, Dawn Chen, Alexandra Cordell, Yuri Chervonyi,
Frankie Garcia, Nithish Kannen, Vincent Perot, Nan Ding, Shlomi Cohen-Ganor, Victor Lavrenko, Junru Wu,
Georgie Evans, Cicero Nogueira dos Santos, Madhavi Sewak, Ashley Brown, Andrew Hard, Joan Puigcerver,
Zeyu Zheng, Yizhong Liang, Evgeny Gladchenko, Reeve Ingle, Uri First, Pierre Sermanet, Charlotte Magister,
Mihajlo Velimirović, Sashank Reddi, Susanna Ricco, Eirikur Agustsson, Hartwig Adam, Nir Levine, David
Gaddy, Dan Holtmann-Rice, Xuanhui Wang, Ashutosh Sathe, Abhijit Guha Roy, Blaž Bratanič, Alen Carin, Harsh
Mehta, Silvano Bonacina, Nicola De Cao, Mara Finkelstein, Verena Rieser, Xinyi Wu, Florent Altché, Dylan
Scandinaro, Li Li, Nino Vieillard, Nikhil Sethi, Garrett Tanzer, Zhi Xing, Shibo Wang, Parul Bhatia, Gui Citovsky,
Thomas Anthony, Sharon Lin, Tianze Shi, Shoshana Jakobovits, Gena Gibson, Raj Apte, Lisa Lee, Mingqing
Chen, Arunkumar Byravan, Petros Maniatis, Kellie Webster, Andrew Dai, Pu-Chin Chen, Jiaqi Pan, Asya Fadeeva,
Zach Gleicher, Thang Luong, and Niket Kumar Bhumihar. Gemini 2.5: Pushing the frontier with advanced
reasoning, multimodality, long context, and next generation agentic capabilities, 2025.
[34] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are
unsupervised multitask learners. 2019.
[35] Hongru Wang, Cheng Qian, Manling Li, Jiahao Qiu, Boyang Xue, Mengdi Wang, Heng Ji, and Kam-Fai Wong.
Toward a theory of agents as tool-use decision-makers, 2025.
[36] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun
Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W. White, Doug Burger, and Chi Wang. AutoGen: Enabling
next-gen LLM applications via multi-agent conversation, 2023.
[37] OpenAI. Introducing deep research.
[38] Manus Team. Manus, 2024.
[39] Hongru Wang, Cheng Qian, Wanjun Zhong, Xiusi Chen, Jiahao Qiu, Shijue Huang, Bowen Jin, Mengdi Wang,
Kam-Fai Wong, and Heng Ji. Acting less is reasoning more! Teaching model to act efficiently, 2025.
[40] Guanting Dong, Hangyu Mao, Kai Ma, Licheng Bao, Yifei Chen, Zhongyuan Wang, Zhongxia Chen, Jiazhen Du,
Huiyang Wang, Fuzheng Zhang, Guorui Zhou, Yutao Zhu, Ji-Rong Wen, and Zhicheng Dou. Agentic reinforced
policy optimization, 2025.
[41] Xufang Luo, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, and Yuqing Yang.
Agent Lightning: Train any AI agents with reinforcement learning, 2025.
[42] AlphaProof and AlphaGeometry teams. AI achieves silver-medal standard solving International Mathematical
Olympiad problems. Google DeepMind Blog, July 25, 2024.
[43] Jiahao Qiu, Yinghui He, Xinzhe Juan, Yimin Wang, Yuhan Liu, Zixin Yao, Yue Wu, Xun Jiang, Ling Yang, and
Mengdi Wang. EmoAgent: Assessing and safeguarding human-AI interaction for mental health safety, 2025.
[44] Luoxin Chen, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin,
Chenggang Li, Kaijing Ma, Cheng Ren, Jiawei Shen, Wenlei Shi, Tong Sun, He Sun, Jiahui Wang, Siran Wang,
Zhihong Wang, Chenrui Wei, Shufa Wei, Yonghui Wu, Yuchen Wu, Yihang Xia, Huajian Xin, Fan Yang, Huaiyuan
Ying, Hongyi Yuan, Zheng Yuan, Tianyang Zhan, Chi Zhang, Yue Zhang, Ge Zhang, Tianyun Zhao, Jianqiu Zhao,
Yichi Zhou, and Thomas Hanwen Zhu. Seed-Prover: Deep and broad reasoning for automated theorem proving,
2025.
[45] Song Dai, Yibo Yan, Jiamin Su, Dongfang Zihao, Yubo Gao, Yonghua Hei, Jungang Li, Junyan Zhang, Sicheng
Tao, Zhuoran Gao, and Xuming Hu. PhysicsArena: The first multimodal physics reasoning benchmark exploring
variable, process, and solution dimensions, 2025.
[46] Xinyu Zhang, Yuxuan Dong, Yanrui Wu, Jiaxing Huang, Chengyou Jia, Basura Fernando, Mike Zheng Shou,
Lingling Zhang, and Jun Liu. PhysReason: A comprehensive benchmark towards physics-based reasoning, 2025.
[47] Isaac Newton. Philosophiæ Naturalis Principia Mathematica. Jussu Societatis Regiæ ac Typis Josephi Streater,
1687.
[48] Shiekh Zia Uddin, Sachin Vaidya, Shrish Choudhary, Zhuo Chen, Raafat K. Salib, Luke Huang, Dirk R. Englund,
and Marin Soljačić. AI-driven robotics for free-space optics, 2025.
[49] Mario Carneiro. Lean4Lean: Towards a verified typechecker for Lean, in Lean, 2024.
[50] Peiyang Song, Kaiyu Yang, and Anima Anandkumar. Lean Copilot: Large language models as copilots for theorem
proving in Lean, 2025.
[51] Maxwell P. Bobbin, Samiha Sharlin, Parivash Feyzishendi, An Hong Dang, Catherine M. Wraback, and Tyler R.
Josephson. Formalizing chemical physics using the Lean theorem prover, 2023.
[52] P. Smolensky. Connectionist AI, symbolic AI, and the brain. Artificial Intelligence Review, 1(3):95–109, 1987.
[53] Leonardo de Moura and Sebastian Ullrich. The Lean 4 theorem prover and programming language (system
description). In Automated Deduction – CADE 28, pages 625–635. Springer, Cham, 2021.
[54] Z. Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao
Zhu, Dejian Yang, Z. F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao, Daya Guo, and
Chong Ruan. DeepSeek-Prover-V2: Advancing formal mathematical reasoning via reinforcement learning for
subgoal decomposition, 2025.
[55] Numina & Kimi Team. Kimina-Prover Preview: Towards large formal reasoning models with reinforcement
learning, 2025.
[56] Yichi Zhou, Jianqiu Zhao, Yongxin Zhang, Bohan Wang, Siran Wang, Luoxin Chen, Jiahui Wang, Haowei Chen,
Allan Jie, Xinbo Zhang, Haocheng Wang, Trung Luong, Rong Ye, Phan Nhat Hoang, Huishuai Zhang, Peng Sun,
and Hang Li. Solving formal math problems by decomposition and iterative reflection, 2025.
[57] Kaito Baba, Chaoran Liu, Shuhei Kurita, and Akiyoshi Sannai. Prover Agent: An agent-based framework for
formal mathematical proofs, 2025.
[58] Azim Ospanov, Farzan Farnia, and Roozbeh Yousefzadeh. Apollo: Automated LLM and Lean collaboration for
advanced formal reasoning, 2025.
A Examples of IPhO 2025 Problem Scoring Criteria
We provide two examples of the IPhO 2025 scoring criteria in Figure 4 (for Theory Problem 1 Part C.1 and Theory
Problem 3 Part C.2, respectively), obtained from https://ipho.olimpicos.net/.

Figure 4, left panel (Q1-7, Theory, English (Official)):

C.1 Find the equation of motion on $z$ for the vertical motion of a point mass $m$ in such a potential, assuming $r$ is constant. Show that, if $r < r_0$, the galactic plane is a stable equilibrium state by giving the angular frequency $\omega_0$ of small oscillations around it. (0.5 pt)

SOLUTION: The equation of motion is given by Newton's second law $m\vec{a} = \vec{F} = -m\vec{\nabla}\varphi$; projected on $\vec{u}_z$, it gives $m\ddot{z} = -m\,\partial\varphi/\partial z$. Using the given potential we have $\ddot{z} = \frac{2z}{z_0^2}\varphi_0\ln\left(\frac{r}{r_0}\right)\exp\left[-\left(\frac{z}{z_0}\right)^2\right]$. Near the galactic plane ($z = 0$) the exponential is equal to 1, and the equation simplifies to $\ddot{z} \simeq \frac{2z}{z_0^2}\varphi_0\ln\left(\frac{r}{r_0}\right)$. If $r < r_0$ the logarithm is negative, and the equation of motion is of the form $\ddot{z} \simeq -\omega_0^2 z$ with $\omega_0 = \sqrt{\frac{2\varphi_0}{z_0^2}\left|\ln\left(\frac{r}{r_0}\right)\right|}$. This proves that $z$ oscillates around $z = 0$ and that the motion is stable.

Marker Scheme:
C.1.1: Newton's second law, or equivalent method (0.1)
C.1.2: Projection on the z axis (0.1)
C.1.3: Equation of motion (0.1)
C.1.4: Equation near the galactic plane (0.1)
C.1.5: Expression for $\omega_0$ (0.1)

From here on, we set $z = 0$.

C.2 Identify the regime, either $r \gg r_m$ or $r \ll r_m$, in which the model of Eq. 1 recovers a potential of the form $\varphi_G(r, 0)$ with a suitable definition of $\varphi_0$. Under this condition $v_c(r)$ no longer depends on $r$. Express it in terms of $\varphi_0$. (0.6 pt)

SOLUTION: Using the density given by equation (1) in part B, we have obtained
$$g_m(r) = -\frac{4\pi C_m G\left[r - r_m\arctan\left(\frac{r}{r_m}\right)\right]}{r^2}. \qquad (4)$$
Hence, for $r \gg r_m$, this relation simplifies to $g_m(r) \simeq -\frac{4\pi C_m G}{r}$. The gravitational potential is obtained by integration: $\varphi(r) = +4\pi C_m G\ln(r) + \mathrm{cst}$. The constant is fixed by correctly choosing the origin of the potential. This potential corresponds to $\varphi_G(r, z = 0) = \varphi_0\ln\left(\frac{r}{r_0}\right)$ with $\varphi_0 = +4\pi C_m G$. In that case, the equation of motion in the galactic plane gives $-m\frac{v_c^2}{r} = m\,g_m(r)$, so that $v_c = \sqrt{-r\,g_m(r)} = \sqrt{4\pi C_m G}$, i.e. $v_c = \sqrt{\varphi_0}$.

Figure 4, right panel (Q3-11, Theory, English (Official)):

The saturated vapor pressure of CO$_2$ for the solid/gas transition follows $\log_{10}\left(\frac{P_\mathrm{sat}^{\mathrm{CO_2}}}{P_0}\right) = A - \frac{B}{T + C}$, with $T$ in K, $A = 6.81$, $B = 1.30\times10^3$ K and $C = -3.49$ K.

C.2 Give the numerical value $T_f$ of the CO$_2$ gas at the end of the expansion, after opening a bottle, if $T_0 = 6$ °C and if $T_0 = 20$ °C, if no phase transition occurred. Choose which statements are true (several statements possible): (0.7 pt)
1. At $T_0 = 6$ °C a grey-white fog appears while opening the bottle.
2. At $T_0 = 6$ °C a blue fog appears while opening the bottle.
3. At $T_0 = 20$ °C a grey-white fog appears while opening the bottle.
4. At $T_0 = 20$ °C a blue fog appears while opening the bottle.

SOLUTION:
C.2.1. The adiabatic reversible expansion goes from $P_i$ to $P_0$.
C.2.2. $T_f = T_0\left(\frac{P_i}{P_0}\right)^{(1/\gamma) - 1}$
C.2.3. For $T_0 = 6$ °C: $P_i = 4.69$ bar and $T_f = 195.3$ K $= -77.8$ °C.
C.2.4. For $T_0 = 20$ °C: $P_i = 7.45$ bar and $T_f = 184.3$ K $= -88.8$ °C.
C.2.5. First method: comparison of $P_\mathrm{sat}(T_f)$ with $P_f = P_0$. Second method: evaluation of the transition temperature at $P_0$ and comparison with $T_f$.
C.2.6. First method: for $T_0 = 6$ °C, $P_\mathrm{sat}^{\mathrm{CO_2}}(T_f) = 1.07$ bar $> P_0$. As the solid-gas frontier has a positive slope in the $(P, T)$ state diagram, the final state of CO$_2$ is gaseous. For $T_0 = 20$ °C, $P_\mathrm{sat}^{\mathrm{CO_2}}(T_f) = 0.41$ bar $< P_0$. As the solid-gas frontier has a positive slope in the $(P, T)$ state diagram, the final gaseous state hypothesis is inconsistent, and a phase transition has occurred in this latter case.
Second method: $T_\mathrm{trans} = \frac{B}{A - \log_{10}(P_0/P_0)} - C = 194.4$ K $= -78.8$ °C. For $T_0 = 6$ °C: $T_f = 195.3$ K $> T_\mathrm{trans}$; the final state of CO$_2$ is gaseous. For $T_0 = 20$ °C: $T_f = 184.3$ K $< T_\mathrm{trans}$; the final gaseous state hypothesis is inconsistent, and a phase transition has occurred.
C.2.7. The true statements are: 1 and 4.

Marker Scheme:
C.2.1. Final pressure of the expansion. (0.1)
C.2.2. Literal expression of $T_f$. (0.1)
C.2.3. For $T_0 = 6$ °C: $P_i = 4.69$ bar and $T_f = 195.3$ K. (0.1)
C.2.4. For $T_0 = 20$ °C: $P_i = 7.45$ bar and $T_f = 184.3$ K. (0.1)
C.2.5. Idea of comparison between $P_\mathrm{sat}$ and $P_0$, or evaluation of the transition temperature at $P_0$ and idea of comparison with $T_f$. (0.1)
C.2.6. Numerical comparison. (0.1)
C.2.7. True statements (all or nothing). (0.1)

During bottle opening, the cork stopper pops out. We now determine the maximum height $H_c$ it reaches.
Figure 4: Two examples of scoring criteria, for IPhO 2025 Theory Problem 1 Part C.1 and Theory Problem 3 Part C.2,
respectively. Theory Problem 1 Part C.1 has 5 scoring criteria, and Theory Problem 3 Part C.2 has 7.
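The reference solutions in Figure 4 can be sanity-checked numerically. The sketch below is an illustration, not part of Physics Supernova. It integrates the C.1 equation of motion with a classical fourth-order Runge–Kutta (RK4) scheme in arbitrary units and compares the measured period with $2\pi/\omega_0$. It also evaluates the CO$_2$ saturation law and the adiabatic-expansion formula from Problem 3 C.2. The unit choices ($\varphi_0 = z_0 = 1$, $r/r_0 = 0.5$), the adiabatic index $\gamma = 1.30$ and $P_0 = 1$ bar are assumptions made for illustration; the figure does not state them.

```python
import math

# --- Problem 1, Part C.1: small vertical oscillations about the galactic plane ---

def z_accel(z: float, phi0: float, z0: float, r_over_r0: float) -> float:
    """z'' = (2 z / z0^2) * phi0 * ln(r/r0) * exp(-(z/z0)^2), as in the Figure 4 solution."""
    return 2.0 * z / z0**2 * phi0 * math.log(r_over_r0) * math.exp(-((z / z0) ** 2))


def omega0(phi0: float, z0: float, r_over_r0: float) -> float:
    """Closed form omega0 = sqrt((2 phi0 / z0^2) |ln(r/r0)|), valid for r < r0."""
    return math.sqrt(2.0 * phi0 / z0**2 * abs(math.log(r_over_r0)))


def measured_period(phi0, z0, r_over_r0, amplitude=1e-3, dt=1e-3, t_max=20.0):
    """RK4-integrate z'' from rest at z = amplitude; return the time between the
    first two downward zero crossings, i.e. one full oscillation period."""
    def accel(z):
        return z_accel(z, phi0, z0, r_over_r0)

    z, v, t = amplitude, 0.0, 0.0
    crossings = []
    while t < t_max and len(crossings) < 2:
        k1z, k1v = v, accel(z)
        k2z, k2v = v + 0.5 * dt * k1v, accel(z + 0.5 * dt * k1z)
        k3z, k3v = v + 0.5 * dt * k2v, accel(z + 0.5 * dt * k2z)
        k4z, k4v = v + dt * k3v, accel(z + dt * k3z)
        z_new = z + dt / 6.0 * (k1z + 2 * k2z + 2 * k3z + k4z)
        v_new = v + dt / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
        if z > 0.0 >= z_new:  # downward crossing: interpolate linearly inside the step
            crossings.append(t + dt * z / (z - z_new))
        z, v, t = z_new, v_new, t + dt
    if len(crossings) < 2:
        raise ValueError("no oscillation detected (is r < r0?)")
    return crossings[1] - crossings[0]


# --- Problem 3, Part C.2: state of CO2 after the expansion ---
A, B, C = 6.81, 1.30e3, -3.49  # log10(P_sat / P0) = A - B / (T + C), T in K


def psat_over_p0(T: float) -> float:
    """Saturated vapor pressure of CO2 (solid/gas) relative to P0."""
    return 10.0 ** (A - B / (T + C))


def transition_temperature() -> float:
    """P_sat = P0  <=>  A - B/(T + C) = 0  <=>  T = B/A - C."""
    return B / A - C


def final_temperature(T0: float, pi_over_p0: float, gamma: float) -> float:
    """C.2.2: reversible adiabatic expansion, T_f = T0 (P_i/P0)^((1/gamma) - 1)."""
    return T0 * pi_over_p0 ** (1.0 / gamma - 1.0)
```

With the assumed $\gamma = 1.30$, `final_temperature(279.15, 4.69, 1.30)` comes out at roughly 195 K, where `psat_over_p0` exceeds 1. The 20 °C case (`final_temperature(293.15, 7.45, 1.30)`, about 184 K) gives a ratio below 1, which is the ordering used in C.2.6.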
As shown in Figure 4, each problem comes with detailed reference answers and scoring criteria, which makes fine-grained
grading of answers possible. Table 1 lists the number of scoring criteria for each part of each theory problem.
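One way to represent such a marker scheme for fine-grained grading is as a mapping from criterion IDs to point values, summing the points of the criteria a grader marks as satisfied. The sketch below encodes the five Part C.1 criteria from Figure 4. The data layout and the function names are hypothetical and are not taken from the paper's grading setup. `Decimal` keeps the 0.1-point increments exact.

```python
from decimal import Decimal

# Hypothetical encoding of the IPhO 2025 Theory Problem 1, Part C.1 marker scheme
# (Figure 4): five criteria of 0.1 pt each, 0.5 pt in total.
C1_CRITERIA = {
    "C.1.1": ("Newton's second law, or equivalent method", Decimal("0.1")),
    "C.1.2": ("Projection on the z axis", Decimal("0.1")),
    "C.1.3": ("Equation of motion", Decimal("0.1")),
    "C.1.4": ("Equation near the galactic plane", Decimal("0.1")),
    "C.1.5": ("Expression for omega_0", Decimal("0.1")),
}


def part_score(criteria: dict, satisfied: set) -> Decimal:
    """Sum the points of the criteria that a grader marked as satisfied."""
    unknown = set(satisfied) - set(criteria)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return sum((criteria[c][1] for c in satisfied), Decimal("0"))


def max_score(criteria: dict) -> Decimal:
    """Total points available for the part."""
    return sum((points for _, points in criteria.values()), Decimal("0"))
```

For example, `part_score(C1_CRITERIA, {"C.1.1", "C.1.2", "C.1.3"})` returns `Decimal("0.3")` out of `max_score(C1_CRITERIA) == Decimal("0.5")`. An "all or nothing" criterion such as C.2.7 fits the same scheme as a single entry that is either satisfied or not.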

B Detailed Prompts
B.1 Image Analyzer Tool Prompt

The image analyzer tool passes the image and the manager agent's question to a vision-capable LLM, whose task is to
answer the question based on the image. It returns the LLM's answer, including any measurements, as a string.
# Inputs: manager_query: str, img_file: ImageFile, vision_expert_llm: LLMModel
# ChatMessage, MessageRole, ImageFile and LLMModel come from the agent framework,
# which this excerpt does not name; generate() returns the reply text as a str.
IMG_SYSTEM_PROMPT = "You are an expert in dealing with image in Physics Olympiads."
messages = [
    ChatMessage(role=MessageRole.SYSTEM, content=IMG_SYSTEM_PROMPT),
    ChatMessage(role=MessageRole.USER, content=[
        {"type": "image", "image": img_file},
        {"type": "text", "text": manager_query},  # the manager agent's question
    ]),
]
output: str = vision_expert_llm.generate(messages)
B.2 Answer Reviewer Tool Prompt

The Answer Reviewer tool provides an LLM with (1) the manager agent's solution, (2) the manager agent's notes, and (3)
the original problem (including text and figures). It returns the review as a string.

from typing import Any, Dict, List

# Inputs: agent_solution: str, agent_note: str,
#         markdown_content: List[Dict[str, Any]], review_expert_llm: LLMModel
# markdown_content includes the markdown file text and images as content parts.
# ChatMessage, MessageRole and LLMModel come from the agent framework (not named here).
REVIEW_SYSTEM_PROMPT = (
    "You are an uncompromising Physics peer-reviewer. Your job is to find *every* "
    "logical, mathematical error in the worker's answer. "
    "Check dimensional consistency, missing steps, incorrect sign conventions, "
    "numerical mistakes, and unclear explanations. Focus especially on wrong answers, "
    "less on presentations. "
    "Be extremely critical: if something is wrong, point it out and request "
    "clarification or correction. Mainly focus on errors that would lead to a wrong "
    "result, rather than focusing extremely on presentation or style. "
    "It is possible that the worker's answer is not correct, so please be prepared to "
    "provide detailed feedback. The worker's answer contains some error, so you must "
    "check and point it out. Also, if the worker reads measurements from image, make "
    "sure to remind the worker that whenever it reads or measures from image, it uses "
    "the ask_image_expert tool, or the readings might be very inaccurate.\n"
)

review_instruction = (
    "Please review the following solution:\n\n"
    f"WORKER'S SOLUTION:\n{agent_solution}\n\n"
    f"WORKER'S NOTE: {agent_note}\n\n"
    "Please provide detailed feedback on correctness. "
    "Point out any errors, wrong steps, focus more on correctness rather than presentation. "
    "The original problem follows: "
)

# The review instruction is followed by the original problem's text and images.
combined_content: List[Dict[str, Any]] = [
    {"type": "text", "text": review_instruction}
] + markdown_content

messages = [
    ChatMessage(role=MessageRole.SYSTEM, content=REVIEW_SYSTEM_PROMPT),
    ChatMessage(role=MessageRole.USER, content=combined_content),
]
output: str = review_expert_llm.generate(messages)

C Expert-knowledge Requiring Tasks

We further generated several tasks that require expert knowledge to test whether integrating WolframAlpha helps Physics
Supernova obtain accurate expert knowledge. In these experiments, we use Gemini 2.5 Pro as the LLM and compare the
results with and without the WolframAlpha tools (GT: ground truth; w.o.wolfTool / w.wolfTool: without / with the
WolframAlpha tools). In 8 of the 10 examples below, the answer obtained with WolframAlpha is closer to the ground truth;
in Q1 the answer without the tool is closer, and in Q8 both answers are identical. The aggregated results are also
shown in Table 4.
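Each task asks for a value "rounded to exactly 5 significant digits, in scientific notation", and the answers are written in the form `2.0391E+3`. Python's `E` format zero-pads the exponent (`2.0391E+03`), so a small helper is needed to produce and compare answers in this style. This is a minimal sketch with hypothetical names, not the paper's evaluation code.

```python
def format_sig5(x: float) -> str:
    """Format x with 5 significant digits, exponent unpadded (e.g. '2.0391E+3')."""
    mantissa, exponent = f"{x:.4E}".split("E")  # e.g. '2.0391', '+03'
    return f"{mantissa}E{int(exponent):+d}"


def answers_match(a: str, b: str) -> bool:
    """True if two answer strings agree after re-rounding to 5 significant digits."""
    return format_sig5(float(a)) == format_sig5(float(b))
```

Under this exact-match view, `answers_match("2.0391E+3", "2.0390E+3")` is `False` even though the two values differ by only 1 keV. A relative-error tolerance would treat such near misses more leniently.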
Problem Q1.
Using the latest AME (Atomic Mass Evaluation) atomic masses, compute the Q-value of double beta decay
$^{76}\mathrm{Ge} \to {}^{76}\mathrm{Se} + 2e^-$ (ground state → ground state). Return a single number: the value in keV, rounded to exactly
5 significant digits, in scientific notation. Do not include units or extra text.
GT Answer: 2.0391E+3 w.o.wolfTool Answer: 2.0391E+3 w.wolfTool Answer: 2.0390E+3

Problem Q2.
Using NIST XCOM (or an equivalent authoritative database), determine the mass attenuation coefficient $\mu/\rho$ of
lead (Pb) for photons of energy 662.0 keV ($^{137}$Cs $\gamma$ line). Return a single number: the value in cm$^2$ g$^{-1}$, rounded
to exactly 5 significant digits, in scientific notation. Do not include units or extra text.
GT Answer: 1.1105E-1 w.o.wolfTool Answer: 1.1352E-1 w.wolfTool Answer: 1.1150E-1

Problem Q3.
Using the Ciddor (1996) refractive-index model for air, at wavelength λ = 633 nm (vacuum), P = 101325 Pa, T =
20 °C, RH = 50%, and CO$_2$ = 450 ppm, compute $n - 1$. Return a single number: the value (dimensionless), rounded
to exactly 5 significant digits, in scientific notation. Do not include units or extra text.
GT Answer: 2.7132E-4 w.o.wolfTool Answer: 2.6894E-4 w.wolfTool Answer: 2.7139E-4

Problem Q4.
Using the IAPWS-IF97 formulation for water/steam, compute the specific enthalpy of water at p = 15 MPa and
T = 650 K (single-phase state as appropriate). Return a single number: the value in kJ kg$^{-1}$, rounded to exactly 5
significant digits, in scientific notation. Do not include units or extra text.
GT Answer: 2.8686E+3 w.o.wolfTool Answer: 3.0462E+3 w.wolfTool Answer: 2.8690E+3

Problem Q5.
Using NIST X-ray transition energies (or equivalent), determine the photon energy of the copper $K\alpha_1$ ($KL_3$) line
for elemental Cu at ambient conditions. Return a single number: the value in keV, rounded to exactly 5 significant
digits, in scientific notation. Do not include units or extra text.
GT Answer: 8.0478E+0 w.o.wolfTool Answer: 8.0463E+0 w.wolfTool Answer: 8.0478E+0

Problem Q6.
Using the IGRF 13th generation (epoch 2025.0), compute the total geomagnetic field magnitude at (40.0140° N,
105.2705° W, altitude 1624 m) on 2025-01-01 00:00 UTC. Return a single number: the value in nT, rounded to
exactly 5 significant digits, in scientific notation. Do not include units or extra text.
GT Answer: 5.1321E+4 w.o.wolfTool Answer: 4.9726E+4 w.wolfTool Answer: 5.1300E+4

Problem Q7.
Using CODATA-2022 fundamental constants, compute the rest frequency of the neutral hydrogen 21 cm hyperfine
transition (ground-state spin-flip). Return a single number: the value in Hz, rounded to exactly 5 significant digits,
in scientific notation. Do not include units or extra text.
GT Answer: 1.4204E+9 w.o.wolfTool Answer: 1.4228E+9 w.wolfTool Answer: 1.4204E+9

Problem Q8.
Using the NIST ESTAR database (or equivalent), determine the mass stopping power of aluminum (Al) for electrons
of kinetic energy 1.000 MeV. Return a single number: the value in MeV cm$^2$ g$^{-1}$, rounded to exactly 5 significant
digits, in scientific notation. Do not include units or extra text.
GT Answer: 1.4860E+0 w.o.wolfTool Answer: 1.5980E+0 w.wolfTool Answer: 1.5980E+0
Problem Q9.
Using JANAF/NIST thermochemical data (ideal-gas heat capacities), determine the molar heat capacity at constant
pressure, $C_p$, of nitrogen gas (N$_2$) at T = 1200 K (assume thermally perfect ideal gas, no dissociation). Return a
single number: the value in J mol$^{-1}$ K$^{-1}$, rounded to exactly 5 significant digits, in scientific notation. Do not
include units or extra text.
GT Answer: 3.3723E+1 w.o.wolfTool Answer: 3.3540E+1 w.wolfTool Answer: 3.3724E+1

Problem Q10.
For Beijing, China (39.9042° N, 116.4074° E, elevation 50 m), determine the umbral magnitude of the next lunar
eclipse after 2025-08-09 that is at least partially visible from that location. Return a single number: the umbral
magnitude (dimensionless), rounded to exactly 5 significant digits, in scientific notation. Do not include units or
extra text.
GT Answer: 1.3638E+0 w.o.wolfTool Answer: 1.1510E+0 w.wolfTool Answer: 1.3680E+0

[Figure 5 image. Left: IPhO 2021 Experiment Q1-1, "Non-ideal capacitors" (10 points): introduction (differential capacitance $C(U) = \frac{dq}{dU} = \frac{I\,dt}{dU} = \frac{I(U)}{dU/dt}$), the electric circuit (Fig. 1.1, with switch S1 selecting capacitor C1 or C2), measurement cautions, and Part A, "Capacitors at room temperature" (4.0 points), with task A.1 (2.3 pt). Right: EuPhO 2020 Online Experimental Problems, Problem 1, "Hidden Charge": introduction with the Rutherford scattering formula $b = \frac{kqQ}{2E}\frac{1}{\tan(\theta/2)}$ ($q = -1.602\times10^{-19}$ C, $k = 8.99\times10^{9}$ N m$^2$/C$^2$), the task of determining the position $(x_Q, y_Q, z_Q)$ and the magnitude and sign of $Q$, and the program interface; followed by the beginning of Problem 2, "Black box".]
Figure 5: Left: an instrument-based experiment example (IPhO 2021 experiment problem 1); Right: a program-based
experiment example (EuPhO 2020 experiment problem 1). In IPhO 2021 experiment problem 1, contestants receive a circuit
board containing the electronic components to be measured. This instrument-based experiment requires contestants to
handle real experimental instruments correctly, a skill that program-based experiments do not test. In EuPhO 2020
experiment problem 1, a program simulates the detection of an unknown charge with electron beams, similar to the
Rutherford scattering experiment. This program-simulated experiment is closer to modern physics, and a physical version
of it would be impractical at a typical Olympiad venue because of cost and safety constraints.

D Examples of Instrument-based and Program-based Experiments

Figure 5 shows one example of each type: the instrument-based example is IPhO 2021 experiment problem 1, and the
program-based example is EuPhO 2020 experiment problem 1.
As described in the caption of Figure 5, program-based experiments can address modern physics more directly and avoid
cost and safety constraints, although they are less realistic than instrument-based experiments.
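To illustrate what the EuPhO 2020 program-based experiment measures, the Rutherford relation printed on that problem page, $b = \frac{kqQ}{2E}\frac{1}{\tan(\theta/2)}$, can be inverted to $\theta = 2\arctan\left(\frac{kqQ}{2Eb}\right)$. The sketch below uses the constants quoted in the problem. It assumes that the electron's kinetic energy is $E = |q|V$ for the accelerating voltage $V$ entered into the program. It works with magnitudes only, so the sign of $Q$ (attractive versus repulsive deflection) is ignored. The function names are hypothetical.

```python
import math

K = 8.99e9        # N m^2 C^-2, as quoted in the EuPhO 2020 problem
Q_E = -1.602e-19  # C, electron charge as quoted in the problem


def kinetic_energy(accel_voltage: float) -> float:
    """Assumption: electron accelerated from rest through accel_voltage volts, E = |q| V (J)."""
    return abs(Q_E) * accel_voltage


def impact_parameter(theta: float, accel_voltage: float, charge_Q: float) -> float:
    """b = k |q Q| / (2 E tan(theta/2)), in metres; the sign of Q is ignored."""
    E = kinetic_energy(accel_voltage)
    return K * abs(Q_E * charge_Q) / (2.0 * E * math.tan(theta / 2.0))


def scattering_angle(b: float, accel_voltage: float, charge_Q: float) -> float:
    """Inverse relation theta = 2 arctan(k |q Q| / (2 E b)), in radians."""
    E = kinetic_energy(accel_voltage)
    return 2.0 * math.atan(K * abs(Q_E * charge_Q) / (2.0 * E * b))
```

Under these assumptions, the deflection depends only on $k|Q|/(2Vb)$, because $|q|$ cancels. Varying the beam position (which changes $b$) and the voltage $V$ therefore constrains $|Q|$. Locating the charge additionally requires the beam and screen geometry, which this sketch omits.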