Paper2Agent 2509.06917v1
Abstract
We introduce Paper2Agent, an automated framework that converts research pa-
pers into AI agents. Paper2Agent transforms research output from passive artifacts into
active systems that can accelerate downstream use, adoption, and discovery. Conven-
tional research papers require readers to invest substantial effort to understand and
adapt a paper’s code, data, and methods to their own work, creating barriers to dis-
semination and reuse. Paper2Agent addresses this challenge by automatically con-
verting a paper into an AI agent that acts as a knowledgeable research assistant. It
systematically analyzes the paper and the associated codebase using multiple agents
to construct a Model Context Protocol (MCP) server, then iteratively generates and
runs tests to refine and robustify the resulting MCP. These paper MCPs can then be
flexibly connected to a chat agent (e.g. Claude Code) to carry out complex scientific
queries through natural language while invoking tools and workflows from the orig-
inal paper. We demonstrate Paper2Agent’s effectiveness in creating reliable and ca-
pable paper agents through in-depth case studies. Paper2Agent created an agent that
leverages AlphaGenome to interpret genomic variants and agents based on Scanpy
and TISSUE to carry out single-cell and spatial transcriptomics analyses. We validate
that these paper agents can reproduce the original paper’s results and can correctly
carry out novel user queries. By turning static papers into dynamic, interactive AI
agents, Paper2Agent introduces a new paradigm for knowledge dissemination and a
foundation for the collaborative ecosystem of AI co-scientists.
Email: {jcmiao, jamesz}@stanford.edu
1 Introduction
The research paper is the traditional unit of scientific communication. It remains the norm for
documenting methods, results, and insights, and is the primary way research is shared with the
broader community. However, papers are fundamentally passive objects: a reader must discover
the paper (not an easy task given the flood of publications), parse its contributions, and manually
determine how to apply them to their own work. In particular, when a paper describes a
new computational method, significant technical barriers often remain before the method can be
used on new data [1]. A reader might need to locate the corresponding code repository, install
dependencies, configure environments, and interpret the correct inputs and outputs [2]. Even with
well-maintained repositories, this process is often non-trivial.
For instance, consider AlphaGenome, which provides a powerful framework for genome-scale
foundation modeling [3]. Despite its utility, this system requires substantial technical expertise
to set up and deploy, limiting accessibility for biologists who could otherwise benefit. Using
AlphaGenome in code involves installing the environment, importing multiple modules, creating
client objects with API keys, constructing inputs such as chromosomes and variant objects, and
selecting the desired output modalities. Users must understand the API hierarchy and parameter
semantics, which imposes a learning curve for biologists unfamiliar with these abstractions.
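To make the input-construction burden concrete, the following minimal sketch parses a variant string of the form used in this paper's example queries (e.g., chr19:8134523:G>A) into structured fields. The Variant dataclass and the parse_variant helper are hypothetical illustrations and are not part of the AlphaGenome API; a real client would additionally need an API key, a genomic interval, and the requested output modalities.

```python
import re
from dataclasses import dataclass


# Hypothetical container for illustration; AlphaGenome defines its own objects.
@dataclass(frozen=True)
class Variant:
    chromosome: str
    position: int  # coordinate convention is an assumption of this sketch
    reference: str
    alternate: str


# Matches strings such as "chr19:8134523:G>A".
_VARIANT_RE = re.compile(r"^(chr[0-9XYM]+):(\d+):([ACGT]+)>([ACGT]+)$")


def parse_variant(text: str) -> Variant:
    """Split a 'chrom:pos:ref>alt' string into typed fields."""
    match = _VARIANT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized variant string: {text!r}")
    chrom, pos, ref, alt = match.groups()
    return Variant(chrom, int(pos), ref, alt)


variant = parse_variant("chr19:8134523:G>A")
print(variant)
```

Even this small step requires the user to know the expected string layout and types, which is the kind of detail a paper agent can absorb on the user's behalf.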
This illustrates a broader challenge: research outputs are passively siloed behind technical
barriers. Paper2Agent re-imagines research dissemination by turning static papers into active
AI agents. Each agent serves as an interactive expert on the corresponding paper, capable of
demonstrating, applying, and adapting its methods to new projects.
Figure 1: Overview of Paper2Agent. (A) Paper2Agent turns research papers into interactive AI agents
by building remote MCP servers with tools, resources, and prompts. Connecting an AI agent to the server
creates a paper-specific agent for diverse tasks. (B) Workflow of Paper2Agent. It starts with codebase
extraction and automated environment setup for reproducibility. Core analytical features are wrapped as
MCP tools, then validated through iterative testing. The resulting MCP server is deployed remotely and
integrated with an AI agent, enabling natural-language interaction with the paper’s methods and analyses.
AI agents are autonomous systems that can reason about tasks and act to achieve goals by
leveraging external tools and resources [4]. Modern AI agents are typically powered by large
language models (LLMs) connected to external tools or APIs. They can perform reasoning, invoke
specialized models, and adapt based on feedback [5]. Agents differ from static models in that they
are interactive and adaptive. Rather than returning fixed outputs, they can take multi-step actions,
integrate context, and support iterative human–AI collaboration. Importantly, because agents
are built on top of LLMs, users can interact with agents through human language, substantially
reducing usage barriers for scientists.
Recent advances highlight the promise of agents for accelerating discovery. For example, the
Virtual Lab framework organizes teams of AI scientist agents that collaboratively design and exe-
cute research projects across biology and chemistry [6]. Similarly, Google’s AI co-scientist serves
as a virtual collaborator, assisting with hypothesis generation and research proposal development
[7]. Sakana AI’s co-scientist aims for automation of the research lifecycle—from ideation to pub-
lication [8]. FutureHouse provides an AI scientist platform designed for diverse scientific tasks
[9]. Alongside these general-purpose platforms, specialized agents are also emerging for specific
domains [10]. For example, CellVoyager introduces an agentic system for autonomous analysis of
single-cell omics data [11]. Biomni is an AI agent for diverse biological tasks [12]. These systems
demonstrate that agents can not only execute code, but also generate hypotheses, evaluate uncer-
tainty, and adapt methods to new datasets. Paper2Agent complements this emerging paradigm
by generalizing the concept: any research paper can be converted into an agent that embodies the
knowledge and methods described in the publication.
Paper2Agent provides an automated workflow for converting a scientific paper into an agent.
The core idea is to represent the paper as a Model Context Protocol (MCP) server [13]. MCP is a
standardized protocol that allows structured APIs and tools to be exposed in a way that is directly
accessible to LLMs and agent frameworks. The conversion process involves: (i) identifying the key
contributions of the paper (datasets, methods, models, or workflows); (ii) encapsulating these con-
tributions through an MCP server, defining the inputs, outputs, and usage instructions; (iii) link-
ing the MCP server to LLM-based agents, enabling natural language querying and autonomous
execution. Users can then interact with the paper by asking questions, requesting demonstrations,
or applying the method to new data.
As an illustration, applying Paper2Agent to AlphaGenome would expose its genome founda-
tion model as an MCP. Instead of requiring users to clone repositories and configure dependencies,
they could simply ask: “Generate AlphaGenome predictions for these variants”, “Interpret the expected
effect of this variant on chromatin accessibility in muscle cells”, or “Visually compare the AlphaGenome
predicted expression changes for a splicing variant in cell types of interest.” The Paper2Agent-generated
agent would handle the setup, execution, and presentation of results, making the method accessi-
ble to both computational experts and experimental biologists.
Efforts to make research outputs more executable and accessible have been ongoing for years.
Executable papers—such as those proposed in Elsevier’s Executable Paper Grand Challenge [14]
and more recent Jupyter Notebook–backed publications [15]—sought to merge narrative text with
runnable code. These approaches increased reproducibility but still required substantial technical
familiarity to engage with fully. The Papers with Code initiative [16] similarly aimed to bridge
papers and implementations by linking publications to open-source repositories. While this im-
proved discoverability, the barrier of installing and executing the code remained.
Paper2Agent substantially extends this trajectory by providing a new framework: a paper can
be transformed into a capable agent accessible via natural language. In contrast to previous efforts,
Paper2Agent shifts the research output from a document or codebase encoding knowledge to a
knowledgeable entity capable of execution and dialogue. This represents a new mode of scientific
communication, moving beyond static dissemination to interactive collaboration. This framework
lowers barriers to adoption, democratizes access to advanced methods, and accelerates the trans-
lation of research into practice.
2 Results
Overview of Paper2Agent
Paper2Agent is a multi-agent AI system that automatically transforms research papers into inter-
active AI agents with minimal human input. The paper agents created via this framework are:
1. Interactive and easy to use: Users can execute complex scientific analyses through natural
language prompts, eliminating the need for programming expertise.
2. Reliable and reproducible: Each tool used by a paper agent is validated against the refer-
ence codebase’s reported results and figures using example datasets, then locked to ensure
reproducibility. This design mitigates the risk of “code hallucination”, where executing in-
accurate LLM-generated code could lead to incorrect scientific results. It also minimizes
randomness in code generation, further strengthening reproducibility. Finally, every tool
includes a code reference from the original paper to provide transparency and traceability.
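The validate-then-lock idea in item 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not Paper2Agent's implementation: mean_expression is a hypothetical stand-in for an extracted tool, the reference cases are toy values, the bytecode fingerprint is one possible way to detect later changes, and the source_reference path is a placeholder.

```python
import hashlib
import math


def mean_expression(values):
    """Hypothetical stand-in for a tool extracted from a paper's codebase."""
    return sum(values) / len(values)


def validate(tool, reference_cases, rel_tol=1e-9):
    """Return True only if the tool reproduces every reference output."""
    return all(
        math.isclose(tool(inputs), expected, rel_tol=rel_tol)
        for inputs, expected in reference_cases
    )


def fingerprint(tool):
    """Hash the compiled bytecode and constants so later edits are detectable."""
    code = tool.__code__
    payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Toy reference cases: (inputs, expected output).
reference_cases = [([1.0, 2.0, 3.0], 2.0), ([0.5, 1.5], 1.0)]

locked = {}
if validate(mean_expression, reference_cases):
    locked["mean_expression"] = {
        "fingerprint": fingerprint(mean_expression),
        "source_reference": "path/in/original/repo.py",  # placeholder, not a real path
    }
print(locked)
```

Once a tool is locked, the agent calls the stored function rather than regenerating code per query, which is what removes the run-to-run variation described above.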
MCP has recently become an industry standard for connecting LLM-based agents with external
resources, providing a unified interface for accessing datasets and tools without custom integra-
tion [13]. Paper2Agent builds on this ecosystem with two components: (i) Paper2MCP, which
extracts information from papers and their codebases to build remote MCP servers; and (ii) an
agent layer, which wraps each MCP server as a context provider to instantiate paper-specific AI
agents (Figure 1A). Any LLM or external agent can invoke the servers’ tools through MCP with-
out extra setup. For presentation clarity, we assign one MCP server and one paper agent to each
paper. The same approach can create MCPs and agents for a group of related papers. Each MCP
server includes three core components:
1. MCP Tools are executable functions that encapsulate a paper’s methodological contribu-
tions. For example, one AlphaGenome MCP tool takes a genetic variant as input and gener-
ates predictions and visualizations of its effects on gene expression, chromatin accessibility,
and other modalities. These tools come with a pre-configured environment for seamless
execution.
2. MCP Resources serve as a repository of static assets, including the manuscript text, the
associated codebase, and supplementary materials such as datasets, tables, and figures. As
an illustration, the AlphaGenome MCP resources include links to the training data used
to train the model. All resources are stored in accessible, standardized formats to enable
efficient querying and integration by AI agents.
3. MCP Prompts contain concise instructions that guide AI agents through complex, multi-
step scientific workflows derived from a paper’s text or codebase. For example, a Scanpy
MCP Prompt encodes the sequence of steps for preprocessing and clustering single-cell data,
which we present later in the manuscript. These templates orchestrate tools and resources to
ensure reproducible, systematic analyses while reducing the barrier to effective prompting.
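The three components above can be pictured with a plain-Python stand-in. This sketch does not use an MCP SDK and does not speak the MCP wire protocol; the PaperServer class, the normalize_counts tool, and the prompt text are all hypothetical and only illustrate how tools, resources, and prompts sit side by side in one server object.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PaperServer:
    """Toy container mirroring the three MCP component types."""
    name: str
    tools: Dict[str, Callable] = field(default_factory=dict)
    resources: Dict[str, str] = field(default_factory=dict)
    prompts: Dict[str, str] = field(default_factory=dict)

    def tool(self, func):
        """Decorator that registers a function as a callable tool."""
        self.tools[func.__name__] = func
        return func

    def call(self, tool_name, **kwargs):
        """Dispatch a tool call by name, as an agent would."""
        return self.tools[tool_name](**kwargs)


server = PaperServer(name="example_paper")


@server.tool
def normalize_counts(counts, target_sum=1e4):
    """Scale a list of counts to a fixed total (hypothetical tool)."""
    total = sum(counts)
    return [c * target_sum / total for c in counts]


# Resources are static assets; prompts are reusable workflow instructions.
server.resources["codebase"] = "https://github.com/scverse/scanpy"
server.prompts["preprocess"] = "Run quality control, then normalization, then clustering."

print(server.call("normalize_counts", counts=[1, 1, 2], target_sum=4))
```

Keeping the three component types separate lets an agent discover what it can execute (tools), what it can read (resources), and which multi-step recipes it should follow (prompts).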
The paper MCP servers can be hosted remotely on platforms like Hugging Face Spaces, elimi-
nating local dependency issues. MCP standardizes communication, enabling secure and scalable
integration with AI agents. The agent layer wraps each Paper2MCP server as a context provider,
creating paper-specific conversational agents. Any compatible LLM or agent can connect to these
servers to perform tasks such as reproducibility checks, new data analyses, or figure regenera-
tion. For example, a user might ask, “Apply the method in this paper to the newly generated dataset”,
and the agent will automatically run the pipeline, produce results, and present interpretable out-
puts. By abstracting away technical details, the agent lowers barriers to method adoption, ensures
reproducibility, and helps researchers focus on insights rather than implementation.
We implemented Paper2Agent with Claude Code [17], an AI coding agent specialized in man-
aging complex coding tasks and real-time iterative debugging (Extended Methods). The workflow
begins by identifying the codebase associated with a paper (Figure 1B). Two specialized agents
are then invoked: the environment agent, which configures the necessary software environment,
and the extraction agent, which translates core methods into implemented tools. These tools are
validated through a testing agent that runs automated checks, refining both the code and envi-
ronment until results match the reference outputs. Once validated, the tools and environment are
packaged into an MCP Python file that can be deployed on a remote server such as Hugging Face.
Finally, the paper MCP server is connected with an AI agent to create a fully functional Paper
Agent, enabling interactive access to the paper’s methods through natural language queries. We
use Claude Code as the downstream AI agent in our case studies, though the paper MCPs can be
flexibly integrated with different chat agents. Because MCPs are modular, multiple MCPs can be
connected to the same chat agent, enabling users to leverage tools and resources across multiple
papers simultaneously.
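The test-and-refine step of this workflow can be sketched as a simple loop. This is an illustrative toy under stated assumptions, not the actual agent logic: in Paper2Agent the refinement is carried out by an LLM-based testing agent, whereas here refine is a hypothetical function that removes a constant error, and the test cases are made up.

```python
def refine_until_valid(tool, run_tests, refine, max_rounds=5):
    """Alternate testing and refinement until no test fails."""
    for round_index in range(1, max_rounds + 1):
        failures = run_tests(tool)
        if not failures:
            return tool, round_index
        tool = refine(tool, failures)
    raise RuntimeError(f"tool still failing after {max_rounds} rounds")


def make_tool(offset):
    """Build a toy tool; a nonzero offset plays the role of a bug."""
    return lambda x: 2 * x + offset


def run_tests(tool):
    """Compare tool outputs with toy reference outputs; return the mismatches."""
    cases = [(1, 2), (3, 6)]
    return [(x, expected, tool(x)) for x, expected in cases if tool(x) != expected]


def refine(tool, failures):
    """Toy refinement: estimate the constant error from one failure and remove it."""
    _, expected, actual = failures[0]
    error = actual - expected
    return lambda value: tool(value) - error


tool, rounds = refine_until_valid(make_tool(offset=1), run_tests, refine)
print(rounds, tool(5))
```

The cap on rounds is a design choice of this sketch: it makes a tool that cannot be brought into agreement with the reference outputs fail loudly instead of being exposed to users.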
Next, we present three case studies demonstrating Paper2Agent’s ability to convert diverse
research papers into reliable, interactive AI agents for different scientific tasks. These case studies
include AlphaGenome [3] for genomics, TISSUE [18] for spatial transcriptomics, and Scanpy [19]
for single-cell analysis.
Figure 2 (panel titles): (A) Paper2Agent automatically generates AlphaGenome MCP server and agent. (B) Generated MCP tool has flexible input and provides source of code. (C) AlphaGenome agent produces numerical results with 100% accuracy using example and novel inputs (accuracy: 15/15 on example inputs, 15/15 on novel inputs). Example from panel C: User: "Score variant chr19:8134523:G>A using ATAC-seq predictions for lung (UBERON:0002048). What is the quantile_score for this cell type?" Ground truth answer: 0.11485085. Agent answer: 0.11485085.
Importantly, the tools generated by Paper2Agent are designed with flexible, well-annotated
input parameters. For example, the visualize_variant_effects() tool exposes a rich set of
options that make it adaptable to diverse use cases (Figure 2B). Given an input genetic variant, the
AlphaGenome agent can select the organism to analyze (human or mouse), adjust the sequence
context length around the variant, and toggle different modalities such as RNA-seq, ATAC-seq, or
ChIP-seq histone tracks. Moreover, each MCP tool embeds a traceable link to the original GitHub
source code, ensuring transparency and reproducibility. Connecting an AI agent to the
AlphaGenome MCP server creates the AlphaGenome agent.
Next, we benchmarked the Paper2Agent-generated AlphaGenome agent on its ability to produce
numerical results and figures, relative to human experts configuring and running the code
manually (Figure 2C). We manually curated 15 example queries directly from the AlphaGenome
tutorial, such as "Score variant chr3:58394738:A>T using ATAC-seq predictions for motor neuron cells
(CL:0000100). What is the quantile_score for this cell type?" and "Make DNase-seq predictions for sequence
'GATTACA' (padded to 2048 length) for lung tissue (UBERON:0002048). What is the nonzero_mean
value in the dnase metadata". The AlphaGenome agent achieved 100% accuracy on these queries,
precisely matching all reported values. To assess generalizability and guard against potential
overfitting to the original examples, we also manually curated a set of novel queries that were not
present in either the paper or its codebase (see Supplementary Table 2). These included previously
untested variant positions, allelic substitutions, and tissue–cell type contexts, such as "Analyze
variant chr9:98765432:T>C with DNASE predictions for muscle cells (CL:0000187). What is the quan-
tile_score for muscle tissue?" and "Analyze histone ChIP-seq metadata for neuronal stem cells. What is
the nonzero_mean value for H3K4me3 in neuronal stem cells (CL:0000100)?" The AlphaGenome agent
again achieved 100% accuracy, faithfully producing the expected numerical outputs, which we
verified by manual execution of the original AlphaGenome code.
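The accuracy figure reported here amounts to comparing each agent answer with a manually obtained ground-truth value. A minimal sketch of such a scorer follows; the single record uses the query and values shown in Figure 2C, while the score_answers helper and the relative tolerance are assumptions of this sketch (the text describes the agent's outputs as matching the reported values precisely).

```python
import math


def score_answers(records, rel_tol=1e-6):
    """Count agent answers that match ground truth within a relative tolerance."""
    correct = sum(
        math.isclose(r["agent"], r["truth"], rel_tol=rel_tol) for r in records
    )
    return correct, len(records), correct / len(records)


records = [
    {
        "query": (
            "Score variant chr19:8134523:G>A using ATAC-seq predictions for "
            "lung (UBERON:0002048). What is the quantile_score for this cell type?"
        ),
        "truth": 0.11485085,
        "agent": 0.11485085,
    },
]

correct, total, accuracy = score_answers(records)
print(f"{correct}/{total} correct, accuracy = {accuracy:.0%}")
```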
Finally, we demonstrated that the AlphaGenome agent enables automatic interpretation of
GWAS loci and validation of the analysis in the original paper. We considered the example of
interpreting why the genetic variant chr1:109274968:G>T is associated with low-density lipopro-
tein cholesterol that was presented in the original AlphaGenome paper (Figure 2D). Based on
the tools available, the AlphaGenome agent constructs a step-by-step plan to solve this task. This
plan includes generating input files, scoring variants across multiple modalities, filtering results
for trait-relevant tissues, creating modality-specific visualizations (chromatin accessibility, histone
marks, transcription factor binding, and splicing), and assembling a comprehensive interpretation
report. The agent then executes these actions using implemented tools, such as score_variant()
and visualize_tf_binding(), automatically refining its strategy through iterative observation
and feedback. A final report is then presented to provide a unified interpretation of the regulatory
impact of the variant, integrating evidence across modalities and tissues.
Interestingly, the AlphaGenome agent prioritizes SORT1 as the most likely causal gene, whereas
the original paper emphasized CELSR2 and PSRC1. The agent favors SORT1 for two reasons: (1)
a high quantile score (0.99982), indicating a strong predicted impact on SORT1 expression in liver
tissue (the quantile score reflects how extreme the variant’s predicted effect is relative to other
variants); and (2) SORT1 encodes sortilin, which is directly involved in LDL/VLDL secretion [20].
We manually queried the GTEx [21] eQTL data and confirmed that this variant is a significant liver
eQTL for SORT1 (p = 1.1e-65). However, both CELSR2 and PSRC1 also exhibit high AlphaGenome
quantile scores (0.99998 each) and significant eQTL associations in GTEx liver (p = 4.7e-46 and
8.5e-50, respectively). This shows the inherent difficulty of confidently assigning causal genes at
complex GWAS loci where the variants are eQTLs for multiple nearby genes [22, 23].
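As a rough intuition for what a quantile score expresses, the sketch below ranks one raw effect score against a background set of scores. This is only an assumed, generic empirical-quantile calculation with made-up toy numbers; it is not AlphaGenome's scoring procedure, whose background distribution and calibration are not described in this text.

```python
from bisect import bisect_right


def empirical_quantile(score, background):
    """Fraction of background scores less than or equal to the given score."""
    ordered = sorted(background)
    return bisect_right(ordered, score) / len(ordered)


# Toy background of raw effect magnitudes (illustrative values only).
background = [0.01, 0.02, 0.05, 0.10, 0.20, 0.40, 0.80, 1.60, 3.20, 6.40]

print(empirical_quantile(3.0, background))
```

Under this reading, a quantile near 1 means the variant's predicted effect exceeds that of nearly all background variants, which is why values such as 0.99982 and 0.99998 are both treated as strong signals and cannot by themselves separate the candidate genes.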
This discrepancy highlights a key strength of Paper2Agent: with a single prompt, users can
re-evaluate published conclusions using independent model-based evidence. Rather than treating
the original interpretation as fixed, the agent enables dynamic hypothesis re-assessment and, at
scale, provides a systematic way to revisit conclusions across many studies.
Figure 3 (panel content): (A) Example TISSUE MCP tools: calibrate_uncertainties_and_prediction_intervals() calibrates uncertainties and obtains prediction intervals for spatial gene expression prediction; multiple_imputation_hypothesis_testing() performs hypothesis testing with the TISSUE multiple imputation framework. Example query: "Use TISSUE to perform uncertainty-aware dimensionality reduction on my spatial transcriptomics data". (B) TISSUE agent offers Q&A support that guides researchers in applying TISSUE effectively. (C) TISSUE agent produces identical results to those of human researchers. (D) TISSUE MCP resources provide a structured format for spatial transcriptomics data used in TISSUE.
Figure 3: Overview of the Paper2Agent-generated TISSUE agent. (A) Construction of the TISSUE MCP
server and agent. (B) Q&A support for uncertainty-aware spatial transcriptomics analysis. (C) Repro-
ducibility confirmed by matching human researcher results. (D) Structured MCP resources enable stan-
dardized dataset access and automated downloads.
TISSUE Agent for Uncertainty-Aware Single-Cell Spatial Transcriptomics Analysis
We next present the Paper2Agent-generated paper agent for TISSUE [18], a recent paper that
developed a new method for uncertainty-aware single-cell spatial transcriptomics analysis (Fig-
ure 3A). This case study reflects a common scenario: a new methodology paper is published,
and researchers want to apply the method to their own data but lack the time to navigate the
codebase, configure the environment, and grasp the method’s features and input requirements.
Paper2Agent addresses these challenges by automatically generating ready-to-use agents for di-
verse papers and providing Q&A support to guide input preparation and clarify what the method
can do.
Paper2Agent generated 6 tools for the TISSUE MCP server, covering spatial gene expression
prediction, prediction interval construction, and uncertainty-aware downstream analysis such as
hypothesis testing, prediction, and dimensionality reduction (Figure 3A). Importantly, the TIS-
SUE agent can also serve as an interactive guide (Figure 3B). For example, when prompted with
“Based on the TISSUE MCP server, what are the required inputs for TISSUE?”, the agent returns a
structured and comprehensive explanation of the method’s required inputs, expected outputs,
and available features. This transforms the TISSUE paper into an interactive AI agent: instead of
manually searching through documentation or code, users can directly ask the agent about how
to use TISSUE and receive precise, actionable instructions.
Next, we evaluate the TISSUE agent’s ability to construct prediction intervals for spatial
transcriptomic (ST) prediction. We prompt the agent with "Calculate the prediction interval for the spatial
gene expression prediction of gene Acta2 using TISSUE. This is my data: Spatial count matrix:
Spatial_count.txt Spatial locations: Locations.txt scRNA-seq count matrix: scRNA_count.txt". The agent
automatically executes the TISSUE pipeline without additional user intervention (Figure 3C). The
output matches the results obtained by human experts running the pipeline manually. This illus-
trates the paper agent’s ability to run entire analysis workflows (in this case, from data loading
and preprocessing through imputation and uncertainty estimation), not just individual tools.
Finally, we showcase the use of MCP resources by translating the data availability section of the
TISSUE paper into a structured registry. This registry harmonizes ST datasets with standardized
metadata (species, tissue type, modality, and data URL) and makes them directly accessible to
the TISSUE agent through data repository APIs such as the Zenodo REST API (Figure 3D). Users
can query and filter datasets (for example, by species) without manually navigating multiple
repositories. Combined with the TISSUE MCP tools, a user’s query might be: “Download the
mouse spatial transcriptomics data from this paper and run TISSUE to generate a prediction interval after
applying spatial prediction to the dataset.” The TISSUE agent then automatically filters for mouse
data, downloads it, and applies the TISSUE pipeline.
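A structured registry of this kind can be sketched with a few typed records and a filter. The metadata fields follow the ones named above (species, tissue type, modality, data URL); the DatasetEntry class and filter_datasets helper are hypothetical, the first entry is taken from this paper's Data availability section, and the second entry is a made-up placeholder included only so that the filter has something to exclude. No download is performed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    species: str
    tissue: str
    modality: str
    url: str


REGISTRY: List[DatasetEntry] = [
    # From the Data availability section of this paper.
    DatasetEntry(
        name="Dataset15",
        species="mouse",
        tissue="somatosensory cortex",
        modality="spatial transcriptomics",
        url="https://zenodo.org/records/8259942",
    ),
    # Hypothetical placeholder entry, for illustration only.
    DatasetEntry(
        name="placeholder_dataset",
        species="human",
        tissue="placeholder",
        modality="spatial transcriptomics",
        url="https://example.org/placeholder",
    ),
]


def filter_datasets(
    registry: List[DatasetEntry],
    species: Optional[str] = None,
    tissue: Optional[str] = None,
) -> List[DatasetEntry]:
    """Keep entries matching every criterion that was provided."""
    return [
        entry
        for entry in registry
        if (species is None or entry.species == species)
        and (tissue is None or entry.tissue == tissue)
    ]


mouse_datasets = filter_datasets(REGISTRY, species="mouse")
print([entry.name for entry in mouse_datasets])
```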
Figure 4 (panel titles): (A) Paper2Agent automatically generates Scanpy MCP server and agent. (C) Scanpy agent produces the same results as human researchers with identical inputs. Example query: "Perform standard single-cell preprocessing and clustering pipeline on this single cell data: data.h5ad".
Figure 4: Overview of the Paper2Agent-generated Scanpy agent. (A) Construction of the Scanpy
MCP server and agent. (B) MCP prompts encode a standardized single-cell preprocessing and
clustering pipeline. (C) Agent reproduces human researcher results, requiring only the dataset
path as input.
We focus on Scanpy’s most common use case: preprocessing and clustering single-cell data.
Paper2Agent generates 7 tools for this feature in around 45 minutes on a personal laptop. These
include quality_control() for calculating and visualizing QC metrics, filtering cells and genes,
and detecting doublets, and normalize_data() for normalizing count data (Figure 4A). This
allows users to prompt the Scanpy agent to perform, for example, quality control on their single-cell data.
In practice, many users prefer an end-to-end workflow for preprocessing and clustering, where
the implemented tools are executed sequentially in the correct order. This type of analysis work-
flow is not unique to single-cell analysis but is common across many scientific domains. However,
executing such workflows can be challenging: the AI agent must either already “know” the correct
order of actions, or the user must provide a carefully structured prompt that explicitly specifies
the sequence. To overcome this limitation, we use MCP prompts to guide the agent. MCP prompts
offer a standardized way to encode workflows, ensuring that tools are executed in the proper or-
der and relieving users from the burden of manually instructing the agent. Importantly, these
MCP prompts are inferred directly from the paper and codebase by Paper2Agent, without the
need for manual curation. This design improves both reproducibility and usability, particularly
for complex analyses such as single-cell data processing.
For example, the Paper2Agent-generated Scanpy MCP prompts encode a standard preprocess-
ing and clustering pipeline, including quality control, normalization, feature selection, dimen-
sionality reduction, graph construction, clustering, and cell-type annotation in the correct order
(Figure 4B). The prompt also instructs the Scanpy agent to inspect the data before analysis to se-
lect appropriate parameters. Users only need to provide the data path (e.g., data.h5ad), and the
Scanpy agent automatically runs the workflow and provides a summary of the analysis results.
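The role of such a prompt can be illustrated by generating the instruction text from an ordered list of steps. This sketch is an assumption-laden illustration, not the prompt Paper2Agent produced: quality_control and normalize_data are tool names mentioned in this paper, while the remaining tool names, the step descriptions, and the build_workflow_prompt helper are hypothetical.

```python
# Ordered (tool name, description) pairs following the pipeline order in the text.
PIPELINE_STEPS = [
    ("quality_control", "compute QC metrics, filter cells and genes, detect doublets"),
    ("normalize_data", "normalize count data"),
    ("select_features", "select highly variable genes"),       # hypothetical name
    ("reduce_dimensions", "run dimensionality reduction"),     # hypothetical name
    ("build_graph", "construct the neighborhood graph"),       # hypothetical name
    ("cluster_cells", "cluster cells on the graph"),           # hypothetical name
    ("annotate_cell_types", "assign cell-type labels"),        # hypothetical name
]


def build_workflow_prompt(data_path, steps=PIPELINE_STEPS):
    """Render an instruction block that fixes the order of tool calls."""
    lines = [
        f"Inspect the dataset at {data_path} before choosing parameters.",
        "Then call the following tools strictly in this order:",
    ]
    lines += [
        f"{index}. {name}: {description}"
        for index, (name, description) in enumerate(steps, start=1)
    ]
    lines.append("Finish with a short summary of the analysis results.")
    return "\n".join(lines)


prompt = build_workflow_prompt("data.h5ad")
print(prompt)
```

Because the order lives in the prompt template rather than in each user's message, the only input the user has to supply is the data path.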
To evaluate the Scanpy agent’s performance, we applied it to preprocess and cluster three
publicly available single-cell datasets from 10x Genomics (Data availability) that are not included in
the Scanpy codebase. We invoke the Scanpy MCP prompts and query the Scanpy agent with "Perform
standard single-cell preprocessing and clustering pipeline on this single-cell data: data.h5ad". As shown
in Figure 4C, the agent produces outputs that match those produced by human researchers when
processing the same data. This demonstrates how MCP-prompt-powered Scanpy agents
streamline workflow execution, making advanced single-cell analysis both accessible and reproducible.
3 Discussion
In this work, we introduce Paper2Agent, a framework that transforms a research paper from a
passive publication into an interactive AI agent. We demonstrate this approach by creating Pa-
per2Agent instances for several methodological advances, including AlphaGenome for genomics,
Scanpy for single-cell analysis, and TISSUE for spatial transcriptomics. These examples illustrate
how a paper agent can embody the research contribution, making it directly accessible through
natural language interaction. The generated paper MCPs are modular units that can be connected
to diverse user-facing agents, enabling broad adoption. By lowering the barrier between publica-
tion and practical application, Paper2Agent helps bridge the gap between how scientific discov-
eries are disseminated and how they are used in practice.
Our initial focus has been on methodological papers, since they offer the clearest use case. Such
papers typically describe algorithms, models, or computational workflows that other researchers
seek to adopt, but whose deployment often requires substantial technical expertise. Converting
them into agents allows the methods to be applied to new problems without the overhead of
mastering complex software ecosystems. In future work, we plan to extend Paper2Agent to other
forms of research output, including data resources and discovery papers. In those contexts, the
agent’s role may shift from computation to interpretation, curation, or explanation, guiding users
through datasets or contextualizing new insights for diverse scientific communities.
Not every paper can be seamlessly turned into a robust agent. If the original codebase is in-
complete, poorly documented, or contains unresolved errors, Paper2Agent cannot reliably expose
it as a functioning tool. Yet this limitation is itself informative: the ease with which a paper can
be transformed into an agent can serve as a practical measure of reproducibility and rigor. Just
as the scientific community has come to expect clear data and code availability, we envision that
a natural extension will be to expect contributions to be structured in ways that facilitate their
translation into agents. Well-documented, modular, and transparent papers will naturally lend
themselves to this new standard.
To better quantify this ease of reproducibility and agentification, we have introduced a
benchmarking approach based on manually evaluated examples from the paper as well as novel
examples meant to test generalizability. With this approach, we showed, for example, that the
AlphaGenome agent was able to execute both tutorial-based and novel queries with 100% accuracy.
This approach, however, is limited by its reliance on expert knowledge of the paper and method
and on manual implementation and review. A future direction is to further streamline this process
with additional agentic frameworks, e.g., LLM-as-judge evaluations [24].
Another consideration is the scale of agentification. While the paper is the conventional unit of
scientific communication, it is not always the best unit for agentification. In many fields, an idea
evolves across a sequence of publications, each adding refinements, benchmarks, or applications.
In such cases, the most useful agent may not represent a single paper but rather a collection of
related works aggregated into a coherent interface. A single MCP can encapsulate multiple related
papers. We plan to work on extensions of Paper2Agent to flexibly accommodate this broader
scope.
Looking forward, just as many journals now require data and code availability sections, we
anticipate the emergence of an “agent availability” section that specifies whether and how the
contribution has been embodied as an interactive agent. This would not only provide immedi-
ate utility to readers but also incentivize authors to present their work in a form conducive to
agentification.
Finally, once scientific knowledge is encoded in active agents rather than static artifacts, the po-
tential extends beyond individual use. Agents could interact with one another, linking methods to
datasets or combining insights from different domains. For example, an agent representing a new
analytical method could collaborate with an agent representing a newly released dataset, jointly
producing analyses that neither artifact could support alone. Communities of such agents could
form a dynamic, interoperable layer of scientific intelligence, accelerating connections across dis-
ciplines. Paper2Agent thus points toward a future in which scientific communication is not only
about describing results, but also about creating interactive, collaborative entities that embody
and extend the research.
Data availability
This paper utilized publicly available data for analysis:
10x Genomics PBMC single-cell RNA-seq datasets: http://cf.10xgenomics.com/samples/cell-exp/
3.0.0/pbmc_1k_v2/pbmc_1k_v2_filtered_feature_bc_matrix.h5, http://cf.10xgenomics.
com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_filtered_feature_bc_matrix.h5,
http://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_protein_v3/pbmc_1k_protein_
v3_filtered_feature_bc_matrix.h5.
Mouse somatosensory cortex spatial transcriptomic data: Dataset15 in https://zenodo.org/
records/8259942
GTEx portal: https://gtexportal.org/home/
AlphaGenome GitHub repository: https://github.com/google-deepmind/alphagenome
TISSUE GitHub repository: https://github.com/sunericd/TISSUE
Scanpy GitHub repository: https://github.com/scverse/scanpy
Code availability
Paper2Agent is publicly available at https://github.com/jmiao24/Paper2Agent.
AlphaGenome MCP server: https://huggingface.co/spaces/Paper2Agent/alphagenome_mcp.
Scanpy MCP server: https://huggingface.co/spaces/Paper2Agent/scanpy_mcp.
TISSUE MCP server: https://huggingface.co/spaces/Paper2Agent/tissue_mcp.
Agent availability
The Paper2Agent-generated AlphaGenome agent is publicly available at https://huggingface.co/
spaces/Paper2Agent/alphagenome_agent.
Acknowledgments
We thank Abubakar Abid, Eric Sun, Emma Dann, and members of the Zou lab and the Pritchard
lab for helpful feedback during the project. J.Z. is supported by funding from the Chan-Zuckerberg
Biohub.
References
[1] Ana Trisovic, Matthew K Lau, Thomas Pasquier, and Mercè Crosas. A large-scale study on
research code quality and execution. Scientific Data, 9(1):60, 2022.
[2] Dylan GE Gomes, Patrice Pottier, Robert Crystal-Ornelas, Emma J Hudgins, Vivienne For-
oughirad, Luna L Sánchez-Reyes, Rachel Turba, Paula Andrea Martinez, David Moreau,
Michael G Bertram, et al. Why don’t we share data and code? Perceived barriers and benefits
to public archiving practices. Proceedings of the Royal Society B, 289(1987):20221113, 2022.
[3] Ziga Avsec, Natasha Latysheva, Jun Cheng, Guido Novati, Kyle R Taylor, Tom Ward, Clare
Bycroft, Lauren Nicolaisen, Eirini Arvaniti, Joshua Pan, et al. AlphaGenome: advancing reg-
ulatory variant effect prediction with a unified DNA sequence model. bioRxiv, pages 2025–06,
2025.
[4] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan
Cao. ReAct: Synergizing reasoning and acting in language models. In International Conference
on Learning Representations (ICLR), 2023.
[5] Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang. Agentless: Demystify-
ing LLM-based software engineering agents. arXiv preprint arXiv:2407.01489, 2024.
[6] Kyle Swanson, Wesley Wu, Nash L Bulaong, John E Pak, and James Zou. The virtual lab of
AI agents designs new SARS-CoV-2 nanobodies. Nature, pages 1–3, 2025.
[7] Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Ar-
tiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, et al. Towards an AI
co-scientist. arXiv preprint arXiv:2502.18864, 2025.
[9] Ali Essam Ghareeb, Benjamin Chang, Ludovico Mitchener, Angela Yiu, Caralyn J
Szostkiewicz, Jon M Laurent, Muhammed T Razzak, Andrew D White, Michaela M Hinks,
and Samuel G Rodriques. Robin: A multi-agent system for automating scientific discovery.
arXiv preprint arXiv:2505.13400, 2025.
[10] Yuanhao Qu, Kaixuan Huang, Ming Yin, Kanghong Zhan, Dyllan Liu, Di Yin, Henry C
Cousins, William A Johnson, Xiaotong Wang, Mihir Shah, et al. CRISPR-GPT for agentic au-
tomation of gene-editing experiments. Nature Biomedical Engineering, pages 1–14, 2025.
[11] Samuel Alber, Bowen Chen, Eric Sun, Alina Isakova, Aaron J Wilk, and James Zou. CellVoy-
ager: AI compbio agent generates new insights by autonomously analyzing biological data.
bioRxiv, pages 2025–06, 2025.
[12] Kexin Huang, Serena Zhang, Hanchen Wang, Yuanhao Qu, Yingzhou Lu, Yusuf Roohani,
Ryan Li, Lin Qiu, Gavin Li, Junze Zhang, et al. Biomni: A general-purpose biomedical AI
agent. bioRxiv, pages 2025–05, 2025.
[13] Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. Model context protocol (MCP):
Landscape, security threats, and future research directions. arXiv preprint arXiv:2503.23278,
2025.
[14] Piotr Nowakowski, Eryk Ciepiela, Daniel Harężlak, Joanna Kocot, Marek Kasztelnik, Tomasz
Bartyński, Jan Meizner, Grzegorz Dyk, and Maciej Malawski. The Collage authoring environ-
ment. Procedia Computer Science, 4:608–617, 2011.
[15] Adam Rule, Amanda Birmingham, Cristal Zuniga, Ilkay Altintas, Shih-Cheng Huang, Rob
Knight, Niema Moshiri, Mai H Nguyen, Sara Brin Rosenthal, Fernando Pérez, et al. Ten
simple rules for writing and sharing computational analyses in Jupyter notebooks, 2019.
[16] Robert Stojnic and Ross Taylor. Papers with Code is joining Facebook AI, December 2019.
Medium article.
[17] Anthropic. Claude Code: Deep coding at terminal velocity, 2025. Accessed via Anthropic
website.
[18] Eric D Sun, Rong Ma, Paloma Navarro Negredo, Anne Brunet, and James Zou. TISSUE:
uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream
analyses. Nature Methods, 21(3):444–454, 2024.
[19] F Alexander Wolf, Philipp Angerer, and Fabian J Theis. Scanpy: large-scale single-cell gene
expression data analysis. Genome Biology, 19(1):15, 2018.
[20] Mads Kjolby, Morten Schallburg Nielsen, and Claus Munck Petersen. Sortilin, encoded by the
cardiovascular risk gene SORT1, and its suggested functions in cardiovascular disease. Current
Atherosclerosis Reports, 17(4):18, 2015.
[21] GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human
tissues. Science, 369(6509):1318–1330, 2020.
[22] Michael Wainberg, Nasa Sinnott-Armstrong, Nicholas Mancuso, Alvaro N Barbeira, David A
Knowles, David Golan, Raili Ermel, Arno Ruusalepp, Thomas Quertermous, Ke Hao, et al.
Opportunities and challenges for transcriptome-wide association studies. Nature Genetics,
51(4):592–599, 2019.
[23] Hakhamanesh Mostafavi, Jeffrey P Spence, Sahin Naqvi, and Jonathan K Pritchard. System-
atic differences in discovery of genetic effects on gene expression and complex traits. Nature
Genetics, 55(11):1866–1875, 2023.
[24] Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Ying-
han Shen, Shengjie Ma, Honghao Liu, et al. A survey on LLM-as-a-judge. arXiv preprint
arXiv:2411.15594, 2024.
Extended Methods
Details on implementing Paper2Agent
Paper2Agent converts a research paper and its public codebase into a production-ready MCP
server and then exposes that server to an AI agent interface. The process has four stages: (i)
codebase identification and extraction, (ii) environment configuration, (iii) tool synthesis and MCP
server generation, and (iv) testing, refinement, and deployment, followed by agent connection. We
implemented this multi-agent AI system in Claude Code. We designed an orchestrator agent that
coordinates four sub-agents:
• Environment-manager: a specialized agent responsible for creating clean, reproducible en-
vironments for research codebases. It analyzes project setup requirements, provisions an
isolated workspace, installs all necessary dependencies, and ensures the code runs without
conflicts. Standardizing environment setup enables reliable execution and reproducibility
across different systems.
• Tutorial-scanner: a specialized agent for reviewing the public codebases to identify and
organize educational resources. It systematically scans available materials, distinguishes
genuine tutorials from other files, and highlights those most useful for reuse. The agent then
produces clear summaries and reports, providing a structured view of which resources are
worth keeping and which can be set aside.
• Tutorial-tool-extractor-implementor: a specialized agent that converts tutorials into reusable
tools. It reviews selected tutorials, identifies tasks that generalize beyond the example data,
and implements each as a clean, single-purpose function with clear inputs, outputs, and de-
faults. The agent parameterizes hardcoded values, enforces file-based inputs, saves essential
results and figures, and returns a standardized summary of produced artifacts. Its goal is to
create a practical function library that reproduces tutorial results on the original data while
remaining ready to run on new datasets.
• Test-verifier-improver: a specialized agent that creates, runs, and refines tests for tutorial
implementations. It uses only the tutorial’s own examples to ensure complete coverage and
faithful reproduction of numerical and visualization results. The agent runs in a loop of
generating tests, executing them, diagnosing failures, and applying fixes. If functions re-
peatedly fail, their MCP decorators are removed, and they will not be included in the MCP
server. All results and logs are recorded for transparency.
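The generate, execute, diagnose, and fix loop of the Test-verifier-improver can be sketched as a bounded retry loop that drops tools which never pass. This is a minimal illustration under stated assumptions: `ToolCandidate`, `verify_and_filter`, and the `max_attempts` limit are hypothetical (the text only says functions that "repeatedly fail" are excluded), and the callables stand in for the agent's real test and repair steps.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCandidate:
    """A tool awaiting verification (hypothetical structure)."""
    name: str
    run_tests: Callable[[], bool]   # True when all tutorial-derived tests pass
    apply_fix: Callable[[], None]   # stand-in for the agent's repair step
    log: list[str] = field(default_factory=list)


def verify_and_filter(candidates: list[ToolCandidate], max_attempts: int = 3) -> list[str]:
    """Return names of tools that pass within max_attempts; the rest are excluded."""
    kept: list[str] = []
    for tool in candidates:
        for attempt in range(1, max_attempts + 1):
            if tool.run_tests():
                tool.log.append(f"attempt {attempt}: passed")
                kept.append(tool.name)
                break
            tool.log.append(f"attempt {attempt}: failed")
            if attempt < max_attempts:
                tool.apply_fix()  # try a repair before the next test run
    return kept


# Toy candidates: one passes after a single fix, one never passes.
state = {"fixed": False}
flaky = ToolCandidate(
    "cluster_cells",
    run_tests=lambda: state["fixed"],
    apply_fix=lambda: state.update(fixed=True),
)
broken = ToolCandidate("plot_umap", run_tests=lambda: False, apply_fix=lambda: None)

kept = verify_and_filter([flaky, broken], max_attempts=3)
```

Each candidate keeps its own log, mirroring the requirement that all results are recorded for transparency.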
The Paper2Agent workflow consists of six steps:
1. Locate and download the codebase. Identify the official repository linked to the paper, clone
or download it, and gather associated resources such as supplementary data or configura-
tion files.
2. Environment setup. Provision a clean, reproducible workspace using an environment man-
ager, pin dependencies, and verify imports so the codebase runs consistently across ma-
chines.
3. Tutorial discovery. Scan the repository to locate useful reference and educational materials
and produce an index of candidate tutorials for tooling.
4. Tutorial execution and audit. Run the selected tutorials end-to-end with their example data,
capture inputs, outputs, figures, and runtime constraints, and record any implicit assump-
tions that must be made explicit.
5. Tool extraction and implementation. Convert tutorial logic into reusable, single-purpose
functions with clear inputs and outputs, parameterize hardcoded values, and save essential
artifacts while preserving tutorial fidelity.
6. MCP server assembly. Integrate the implemented tools, resources, and prompts into a single
MCP server with a manifest, versioning, and basic security defaults, ready to be used by an
orchestrator or co-scientist agent.
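As an illustration of the kind of function step 5 aims for, the sketch below shows a single-purpose tool with file-based input, a formerly hardcoded threshold exposed as a parameter, a saved output artifact, and a standardized summary. The function name, column name, default threshold, and summary keys are assumptions chosen for the example and are not taken from the Paper2Agent codebase.

```python
import csv
from pathlib import Path


def filter_cells_by_counts(
    input_csv: str,
    output_dir: str,
    min_counts: float = 200,              # hypothetical default; a tutorial might hardcode this
    count_column: str = "total_counts",   # hypothetical column name
) -> dict:
    """Keep rows whose count_column is at least min_counts and save them to a new CSV."""
    in_path = Path(input_csv)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # File-based input: the tool reads from a path rather than an in-memory object.
    with in_path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)
    if count_column not in fieldnames:
        raise KeyError(f"column not found: {count_column}")

    kept = [row for row in rows if float(row[count_column]) >= min_counts]

    # Save the essential result as an artifact on disk.
    out_path = out_dir / f"{in_path.stem}_filtered.csv"
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)

    # Standardized summary of what was produced.
    return {
        "status": "success",
        "n_input": len(rows),
        "n_kept": len(kept),
        "artifacts": [str(out_path)],
    }
```

Returning file paths instead of large in-memory objects keeps the tool's output small enough for an agent to read and pass along to the next tool.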
The orchestrator agent invokes sub-agents as needed at different stages of the process. As
the Paper2Agent workflow progresses, the results are automatically recorded for each step for
traceability and reproducibility. The detailed setup and prompt are available in the Paper2Agent
GitHub repository.
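The per-step recording described above can be pictured as an orchestrator loop that runs each step in order and appends a record of its outcome. The following is a minimal sketch, not the actual implementation: `run_pipeline`, the record fields, and the toy step functions are hypothetical.

```python
import time
from typing import Callable

Step = Callable[[dict], dict]  # a step reads the shared context and returns updates


def run_pipeline(steps: list[tuple[str, Step]], context: dict) -> list[dict]:
    """Run steps in order, recording status and duration; stop at the first failure."""
    records: list[dict] = []
    for index, (name, step) in enumerate(steps, start=1):
        started = time.time()
        status, error = "ok", None
        try:
            context.update(step(context))
        except Exception as exc:
            status, error = "failed", str(exc)
        records.append({
            "step": index,
            "name": name,
            "status": status,
            "error": error,
            "seconds": round(time.time() - started, 3),
        })
        if status == "failed":
            break  # later steps depend on earlier outputs
    return records


def scan_tutorials(context: dict) -> dict:
    """Toy step that fails, to show how a failure is recorded."""
    raise RuntimeError("no tutorials found")


context: dict = {}
records = run_pipeline(
    [
        ("clone", lambda ctx: {"repo_dir": "repo"}),
        ("setup", lambda ctx: {"env": f"env-for-{ctx['repo_dir']}"}),
        ("scan", scan_tutorials),
        ("extract", lambda ctx: {"tools": []}),
    ],
    context,
)
```

In this toy run the fourth step is never reached, and the records show which step failed and why.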
ing, normalization, principal component analysis, neighborhood graph construction, and cluster-
ing—yielding outputs consistent with human-executed analyses.