Biomni: A General-Purpose Biomedical AI Agent
Kexin Huang1,∗,‡ , Serena Zhang1,∗ , Hanchen Wang1,2,∗ , Yuanhao Qu3,4,5,∗ , Yingzhou Lu5,∗ , Yusuf
Roohani1,6 , Ryan Li1 , Lin Qiu7 , Gavin Li1 , Junze Zhang3,5 , Di Yin3,5 , Shruti Marwaha8 , Jennefer
N. Carter8 , Xin Zhou5 , Matthew Wheeler8 , Jonathan A. Bernstein9 , Mengdi Wang10 , Peng He11 ,
Jingtian Zhou6 , Michael Snyder5 , Le Cong3,5 , Aviv Regev2 , and Jure Leskovec1,‡
1 Department of Computer Science, Stanford University School of Engineering, Stanford, CA, USA
2 Research and Early Development, Genentech, South San Francisco, CA, USA
3 Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
4 Cancer Biology Program, Stanford University School of Medicine, Stanford, CA, USA
5 Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
6 Arc Institute, Palo Alto, CA, USA
7 Paul G. Allen School of Computer Science and Engineering, University of Washington, WA, USA
8 Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
9 Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
10 Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ, USA
11 Department of Pathology, University of California San Francisco, San Francisco, CA, USA
∗Equal contribution.
‡Corresponding authors. Email: [email protected], [email protected]
Abstract
Biomedical research underpins progress in our understanding of human health and disease,
drug discovery, and clinical care. However, with the growth of complex lab experiments,
large datasets, many analytical tools, and expansive literature, biomedical research is
increasingly constrained by repetitive and fragmented workflows that slow discovery and limit
innovation, underscoring the need for a fundamentally new way to scale scientific expertise.
Here, we introduce Biomni, a general-purpose biomedical AI agent designed to autonomously
execute a wide spectrum of research tasks across diverse biomedical subfields.
To systematically map the biomedical action space, Biomni first employs an action discovery
agent to create the first unified agentic environment – mining essential tools, databases,
and protocols from tens of thousands of publications across 25 biomedical domains. Built
on this foundation, Biomni features a generalist agentic architecture that integrates large
language model (LLM) reasoning with retrieval-augmented planning and code-based execution,
enabling it to dynamically compose and carry out complex biomedical workflows – entirely
without relying on predefined templates or rigid task flows. Systematic benchmarking
demonstrates that Biomni achieves strong generalization across heterogeneous biomedical
tasks – including causal gene prioritization, drug repurposing, rare disease diagnosis,
microbiome analysis, and molecular cloning – without any task-specific prompt tuning. Real-world
case studies further showcase Biomni’s ability to interpret complex, multi-modal biomedical
datasets and autonomously generate experimentally testable protocols. Biomni envisions a
future where virtual AI biologists operate alongside and augment human scientists to
dramatically enhance research productivity, clinical insight, and healthcare. Biomni is ready to
use at https://biomni.stanford.edu, and we invite scientists to explore its capabilities,
stress-test its limits, and co-create the next era of biomedical discoveries.
1 Introduction
Biomedical research is a key pillar of modern science and medicine, driving discoveries in disease
mechanisms, diagnostics, and therapeutics1–4. Yet, with the growth in large-scale experiments,
data, tools, and literature, progress is increasingly slowed by fragmented, complex workflows that
require specialized tools, exhaustive literature reviews, intricate experimental design, and careful
statistical modeling5,6. A vast volume of valuable biomedical data sits underutilized7, many
sophisticated analyses are never conducted, and many connections to past knowledge and literature
are never made, not for lack of significance, but because the demand for expert researchers far
exceeds the supply. This mismatch between data abundance and limited human bandwidth highlights an
urgent need for a fundamentally new approach – one that can effectively scale expertise, streamline
workflows, and unlock the full potential of biomedical research.
Recent advances in Artificial Intelligence (AI) have created a paradigm shift, opening the
possibility of fundamentally reshaping biomedical research8. AI agents have dramatically
reshaped fields such as software engineering9, law10, materials science11, and healthcare12 by
automating repetitive tasks, enhancing productivity, and enabling breakthroughs that were
previously unimaginable. Given these developments, the question emerges: Can we build a virtual
AI biomedical scientist? Such a virtual scientist would autonomously tackle diverse biomedical
research tasks spanning multiple subfields, unlocking extensive capabilities and fostering novel
insights through interdisciplinary integration – an achievement that can radically augment human
biologists limited by specialized expertise. Capable of efficiently managing thousands of
concurrent tasks, this virtual scientist could dramatically enhance human productivity and
accelerate the pace of biomedical discovery.
Previous approaches have largely relied on specialist agentic workflows tailored to narrow
biomedical tasks13–19, which restricts their capacity to move fluidly and generalize across
the full spectrum of biomedical domains, as needed to answer key research questions. Enabling
an AI agent to handle a broad range of biomedical tasks introduces substantial technical
challenges – most notably, the need to tightly couple advanced reasoning20 with the ability to
execute highly specialized biomedical actions21. Although LLM-based reasoning has seen significant
advancements22, such LLMs need access to an environment that explicitly defines the biomedical
action space, which is inherently diverse, domain-specific, and complex. Moreover, a truly capable
system requires an agentic architecture that can natively interact with this biomedical environment
– autonomously selecting and composing actions, using its reasoning capabilities to plan and
execute diverse tasks without relying on rigid, pre-defined workflows.
Here we present Biomni, a generalist biomedical AI agent purpose-built to automate and
advance biomedical research across a wide range of subfields. Acting as a virtual AI biologist,
Biomni autonomously formulates novel, testable hypotheses, performs complex bioinformatics
analyses, and designs rigorous experimental protocols. To enable this capability, we first
constructed a unified and comprehensive biomedical action space by systematically analyzing tens
of thousands of biomedical research papers spanning 25 distinct subfields, curated from major
bio-literature repositories. From this foundation, we developed an LLM-powered action discovery
agent capable of reading papers and extracting key tasks, tools, and databases essential to driving
biomedical discoveries. These elements are then selected and implemented into Biomni-E1, the
foundational environment that defines the biomedical action space for agentic interaction.
Biomni-E1 includes 150 specialized biomedical tools, 105 software packages, and 59 databases. We
then designed Biomni-A1, a general-purpose agent architecture capable of flexibly executing a broad
spectrum of biomedical tasks by using tools and datasets provided by Biomni-E1. Given a user
query, the agent first uses a retrieval system to identify the most relevant tools, databases, and
software needed. It then applies LLM-based reasoning and domain expertise to generate a detailed,
step-by-step plan. Each step is expressed through executable code, enabling precise and flexible
compositions of biomedical actions – an essential feature given the domain’s reliance on highly
specialized tools and data resources. Unlike traditional function-calling methods, this approach
supports the dynamic and complex nature of biomedical workflows. This integrated system allows
Biomni not only to solve challenging, large-scale biomedical problems with efficiency, but also to
generalize to novel tasks across previously unseen areas of biomedical research.
Rigorous benchmarking demonstrates Biomni’s outstanding performance across established
biomedical Q&A benchmarks, and robust generalization performance in eight challenging, realistic
scenarios never encountered during development. Additionally, we highlight Biomni’s practical
capabilities through three impactful case studies: (1) analyzing 458 files of wearable sensor data
to generate novel insights; (2) rapidly performing comprehensive bioinformatics analyses on
massive raw datasets, such as single-cell RNA-seq and ATAC-seq data, to generate novel insights and
hypotheses; (3) autonomously designing laboratory protocols to assist wet-lab researchers. With
Biomni, we introduce the first generation of a scalable, general-purpose biomedical AI agent,
setting the stage for an era where virtual AI biologists work alongside human researchers to
dramatically accelerate biomedical discovery from basic research to translation.
2 Results
Overview of Biomni. Biomni is a general-purpose biomedical AI agent comprising two main
components: Biomni-E1, a foundational biomedical environment with a unified action space, and
Biomni-A1, an intelligent agent designed to utilize this environment effectively.
Curating a unified biomedical action space is challenging due to its inherent complexity and
vastness. We systematically address this by employing an AI-driven approach (Figure 1a).
Specifically, we leveraged the 25 subject categories defined by bioRxiv, selecting the 100 most
recent publications per category. An action discovery LLM agent processed each paper sequentially,
extracting essential tasks, tools, databases, and software necessary to replicate or generate the
described research. This comprehensive set of resources constitutes the essential actions required
to perform a large set of biological research tasks.
We then curated Biomni-E1, an environment for a biomedical AI agent to perform a wide
range of actions (Figure 1b). Identified tools were rigorously verified by human experts, along
with corresponding test cases. These tools (Supplementary Tables 1-18) were specifically
chosen for their non-trivial nature, encompassing complex code, domain-specific know-how, or
specialized AI models. Recognizing the inherent flexibility required by biological software, which
cannot always be simplified into static functions, we constructed an execution environment
pre-installed with 105 widely used biological software packages (Supplementary Tables 23-30),
supporting Python, R, and Bash scripts. For database integration, we categorized resources into two
distinct groups. The first group consists of massive relational databases accessible via web APIs
(e.g., PDB, OpenTarget, ClinVar) (Supplementary Tables 19-20). Rather than creating numerous
individual retrieval tools, we implemented a unified function per database. Each function accepts
natural language queries and internally employs an LLM to parse database schemas and generate
executable queries dynamically. Databases without web interfaces were downloaded into a data
lake and preprocessed locally into structured pandas DataFrames for seamless integration with the
agent, for a total of 59 databases in Biomni-E1 (Supplementary Tables 21-22). In summary,
Biomni-E1 is the first environment for biomedical AI agents and includes 150 specialized biomedical
tools, 105 software packages, and 59 databases.
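
The "one unified function per database" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not Biomni's implementation: an in-memory SQLite table stands in for a remote web API, `fake_llm` is a hypothetical stub in place of a real model call, and the function names and toy rows are invented for the example.

```python
import sqlite3
from typing import Callable, List, Tuple


def make_database_tool(conn: sqlite3.Connection, llm: Callable[[str], str]):
    """Build a single natural-language query function for one database.

    Hypothetical helper: the LLM sees the schema plus the question and
    returns one executable query, which is then run against the database.
    """

    def schema_text() -> str:
        rows = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        return "\n".join(row[0] for row in rows)

    def query(question: str) -> List[Tuple]:
        prompt = (
            f"Schema:\n{schema_text()}\n"
            f"Question: {question}\n"
            "Answer with a single SQL SELECT statement."
        )
        sql = llm(prompt).strip().rstrip(";")
        # Guard: only read-only queries are executed.
        if not sql.lower().startswith("select"):
            raise ValueError("only SELECT queries are allowed")
        return conn.execute(sql).fetchall()

    return query


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call (assumption for this sketch).
    return (
        "SELECT gene FROM variants "
        "WHERE significance = 'pathogenic' ORDER BY gene"
    )


# Toy rows, not real database records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (gene TEXT, significance TEXT)")
conn.executemany(
    "INSERT INTO variants VALUES (?, ?)",
    [("GENE_A", "pathogenic"), ("GENE_B", "pathogenic"), ("GENE_C", "benign")],
)

ask = make_database_tool(conn, fake_llm)
result = ask("Which genes carry pathogenic variants?")
```

The design point is that the agent sees one entry point per database and never needs a separate retrieval tool for each query type; the schema-aware query generation happens inside the function.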
To build a general-purpose agent capable of tackling diverse biomedical tasks, we require a
specialized agentic architecture – one that avoids hardcoding workflows for each individual task.
This led to the development of Biomni-A1, which incorporates several core innovations critical
for operating across the biomedical research landscape. First, we introduce an LLM-based tool
selection mechanism designed to navigate the complexity and specialization of biomedical tools,
dynamically retrieving a tailored subset of resources based on the user’s goal. Second,
recognizing that biomedical tasks often require rich procedural logic, Biomni-A1 uses code as a
universal action interface – allowing it to compose and execute complex workflows involving loops,
parallelization, and conditional logic. Crucially, this approach also enables the agent to
interleave calls to software, tools, databases, and raw data operations that do not conform to
predefined function signatures, supporting flexible and dynamic integration of heterogeneous
resources. Third, the agent adopts an adaptive planning strategy: it formulates an initial plan
grounded in biomedical knowledge and iteratively refines it throughout execution, enabling
responsive, context-aware behavior. Together, these innovations enable Biomni-A1 to generalize to
previously unseen tasks and domains, dynamically composing intelligent actions and interfacing with
software, data, and tools in a way that embodies generalist biomedical intelligence (Figure 1c).
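
A toy sketch of the first two ideas (tool retrieval, then code as the action interface) is given below. It rests on several assumptions: keyword overlap stands in for the LLM-based selection described above, the action string is hand-written rather than model-generated, and the registry, tool names, and helper functions are all hypothetical.

```python
from typing import Callable, Dict, List

TOOLS: Dict[str, Callable] = {}
DESCRIPTIONS: Dict[str, str] = {}


def register(description: str):
    """Add a function and its description to the toy tool registry."""
    def wrap(fn: Callable) -> Callable:
        TOOLS[fn.__name__] = fn
        DESCRIPTIONS[fn.__name__] = description
        return fn
    return wrap


@register("compute gc content of a dna sequence")
def gc_content(seq: str) -> float:
    return sum(base in "GC" for base in seq.upper()) / len(seq)


@register("reverse complement of a dna sequence")
def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


@register("arithmetic mean of a list of numbers")
def mean_value(values: List[float]) -> float:
    return sum(values) / len(values)


def retrieve_tools(query: str, top_k: int = 2) -> List[str]:
    """Rank tools by word overlap with the query (stand-in for an LLM)."""
    words = set(query.lower().split())
    ranked = sorted(
        DESCRIPTIONS,
        key=lambda name: -len(words & set(DESCRIPTIONS[name].split())),
    )
    return ranked[:top_k]


def run_action(code: str, tool_names: List[str]) -> dict:
    """Execute one code action with only the retrieved tools in scope."""
    namespace = {name: TOOLS[name] for name in tool_names}
    exec(code, namespace)  # a real system would sandbox this call
    return namespace


# A code action can mix loops and conditionals, which a single
# fixed-signature function call cannot express.
action = """
results = {}
for seq in ["ATGC", "GGCC", "ATAT"]:
    if gc_content(seq) >= 0.5:
        results[seq] = reverse_complement(seq)
"""

selected = retrieve_tools("GC content and reverse complement of DNA sequence")
state = run_action(action, selected)
```

Because the action is ordinary code, the loop and the conditional run in one step; with plain function calling, each `gc_content` call and each decision would need a separate round trip through the model.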
Biomni excels on general biomedical knowledge and reasoning benchmarks. We evaluated
Biomni on three challenging multiple-choice benchmarks of general biomedical knowledge and
reasoning: Humanity’s Last Exam (HLE)23 and LAB-Bench24, which includes two key subtasks –
DbQA (Database Question Answering) and SeqQA (Sequence Question Answering) (Figure 2a).
These tasks span tool use, symbolic reasoning, and structured biological information retrieval
– core competencies for any robust biomedical AI agent. To isolate the impact of tool access
and agent design, we compared Biomni against six strong baselines (details in Supplementary
Notes A).
For LAB-Bench, a 45-question development set was used to refine tool and database
interfaces, while the final evaluation was conducted on 315 held-out test questions, with
performance averaged across three independent runs. We only curated a representative 12.5% subset
of the full benchmark due to API cost constraints. In DbQA, which requires structured querying over
biological databases, Biomni achieved 74.4% accuracy – matching expert human performance (74.7%)
and outperforming all baselines, including the coding agent (ReAct+Code, 40.8%). In SeqQA,
which involves reasoning over DNA and protein sequences, Biomni achieved 81.9% accuracy,
again exceeding human-level performance (78.8%).
To test true generalization of biomedical knowledge and reasoning without any development
set, we also evaluated Biomni on a 52-question subset of HLE spanning 14 biomedical subfields
– from molecular biology to physiology. Biomni achieved 17.3% accuracy, significantly
outperforming the base LLM (6.0%), coding agent (12.8%), and literature agent (12.2%). These
results demonstrate Biomni’s ability to generalize across unfamiliar, open-ended biomedical domains
without any task-specific adaptation. Additional ablation results are shown in Supplementary
Figures 1-2. Performance for each subfield is reported in Supplementary Figure 3.
Biomni generalizes to new, real-world biomedical tasks across diverse subfields. To evaluate
generalization in realistic research tasks, we curated eight new biomedical benchmarks spanning
genetics, genomics, microbiology, pharmacology, and clinical medicine (Figure 2b). Each task
was framed to reflect a common, well-defined, but complex real-world biomedical research goal,
including: (1) Variant prioritization: Identify the most likely causal variant from a list of
potential variants for a trait, requiring reasoning about regulatory functions in non-coding
regions. (2) GWAS causal gene detection: Select the most likely causal gene within a locus,
demanding fine-grained locus-level inference. (3) CRISPR perturbation screen design: Construct gene
panels to maximize post-perturbation effect across a large (∼20,000 genes) search space. (4) Rare
disease diagnosis: Map patient phenotypes and genetic findings to rare disease diagnosis. (5) Drug
repurposing: Given a rare disease and a list of candidate drugs, select the best therapeutic match.
(6) Single-cell RNA-seq annotation: Assign accurate cell-type labels to individual cell profiles
across tissues, species, and platforms. (7) Microbiome disease-taxa analysis: Perform statistical
association tests on microbiome datasets to uncover disease-relevant taxa. (8) Patient gene
prioritization: Given an individual patient’s genetic profile and phenotype description, identify
the most plausible causal gene. We benchmarked Biomni without prompt engineering or task-specific
fine-tuning against three baselines: (1) a base LLM (Claude Sonnet 3.7) without tool use, (2) a
coding agent with direct function calls and code execution (ReAct+Code), and (3) Biomni-ReAct, an
ablation of Biomni that replaces code-based planning with ReAct-style chaining. The complete
benchmark constructions are described in Methods, with detailed performance comparisons in
Supplementary Notes B.
Across all tasks, Biomni outperformed the base LLM by an average relative performance
gain of 402.3%, the coding agent by 43.0%, and its own ablated variant Biomni-ReAct by 20.4%.
These findings highlight the importance of code-centric planning and environment grounding,
enabling Biomni to compose precise, flexible, and context-aware actions. For each benchmark, we
further analyzed the execution trajectories, identifying commonly invoked tools, software, and
datasets, as detailed in Supplementary Figures 6-16. These trajectories provide insight into the
complexity and structure of agent behavior across tasks. On average, Biomni executes between 6
and 24 distinct steps per task, involving combinations of 0-4 specialized tools, 1-8 software
packages, and 0-3 unique data lake items. The agent interleaves data extraction, search/retrieval,
reasoning, and computational analyses (Supplementary Figure 8) – reflecting a workflow pattern that
mirrors how human scientists alternate between retrieving knowledge and generating new insights.
Resource usage varies by task type: information synthesis tasks, such as CRISPR perturbation
screen design and GWAS causal gene identification, rely heavily on database queries (e.g., KEGG,
Reactome) and literature search (e.g., PubMed, Google), whereas bioinformatics analysis tasks
like microbiome profiling and single-cell annotation involve minimal database use but extensive
code execution with software libraries such as scanpy.
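
One plausible reading of "average relative performance gain" is the mean, over tasks, of the per-task improvement divided by the baseline score. The section does not spell out the formula, so the sketch below is an assumption, and the scores are toy numbers rather than benchmark results.

```python
from statistics import mean
from typing import Sequence


def average_relative_gain(
    agent: Sequence[float], baseline: Sequence[float]
) -> float:
    """Mean per-task relative gain of agent over baseline, in percent.

    Assumed definition: mean over tasks of (agent - baseline) / baseline.
    """
    if len(agent) != len(baseline):
        raise ValueError("score lists must cover the same tasks")
    if any(b == 0 for b in baseline):
        raise ValueError("relative gain is undefined for a zero baseline")
    return 100.0 * mean((a - b) / b for a, b in zip(agent, baseline))


# Toy scores for two tasks: gains of 200% and 50% average to 125%.
gain = average_relative_gain([60, 90], [20, 60])
```

Under this definition a single task with a very low baseline score can dominate the average, which is worth keeping in mind when reading large figures such as the gain over the base LLM.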
Biomni jointly analyzes 458 wearable sensor files to generate physiological hypotheses. To
evaluate Biomni’s performance in real-world biomedical workflows, we invited scientists to
apply it directly to their own research questions. In this case study, a researcher used Biomni to
analyze 458 Excel files containing months-long wearable sensor data (continuous glucose
monitoring (CGM) and body temperature) from 30 participants. The data were highly heterogeneous:
file formats varied, annotations were inconsistent, and participants exhibited substantial
variability (Figure 3a). The researcher posed an open-ended question: Can we uncover biologically
meaningful thermogenic patterns?
Biomni autonomously generated and executed a 10-step analysis pipeline (Figure 3b),
inferring meal events from glucose spikes, extracting pre/post meal temperature windows,
normalizing across individuals, and synthesizing population-level trends. Crucially, after
completing the pipeline, the agent delivered a structured, human-readable report summarizing its
key findings (Supplementary Notes D). It identified a consistent postprandial thermogenic response,
with an average temperature rise of 2.19 °C (median: 1.10 °C) and a wide range across individuals
(-0.11 °C to 15.56 °C). Some participants showed rapid, pronounced spikes within 30 minutes of
eating, while others had delayed or muted responses – indicating divergent metabolic phenotypes
(Figure 3c,d). These insights were not manually curated or extracted by a human; the agent
performed the entire analysis end-to-end and surfaced the results as a concise narrative
highlighting patterns that would otherwise be ignored in raw data.
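
The first two pipeline steps (inferring meals from glucose spikes, then comparing temperature before and after each meal) might look roughly like the sketch below. The thresholds, the 5-minute sampling interval, the function names, and the toy readings are all assumptions made for illustration; the actual pipeline is shown in Figure 3b.

```python
from statistics import mean
from typing import List, Optional


def infer_meal_onsets(
    glucose: List[float],
    rise_threshold: float = 30.0,  # assumed rise (mg/dL) that marks a meal
    lookahead: int = 6,            # 6 samples = 30 min at 5-min sampling
    refractory: int = 24,          # skip 2 h so one meal is counted once
) -> List[int]:
    """Return sample indices where glucose rises sharply within the window."""
    onsets = []
    i = 0
    while i + lookahead < len(glucose):
        if glucose[i + lookahead] - glucose[i] >= rise_threshold:
            onsets.append(i)
            i += refractory
        else:
            i += 1
    return onsets


def thermogenic_response(
    temperature: List[float], onset: int, window: int = 6
) -> Optional[float]:
    """Mean temperature after the onset minus mean temperature before it."""
    pre = temperature[max(0, onset - window):onset]
    post = temperature[onset:onset + window]
    if not pre or not post:
        return None
    return mean(post) - mean(pre)


# Toy series (not study data): flat glucose, one spike, then a plateau.
glucose = [100.0] * 12 + [105.0, 115.0, 125.0, 135.0, 140.0, 140.0] + [130.0] * 18
temperature = [33.0] * 12 + [34.0] * 24

onsets = infer_meal_onsets(glucose)
responses = [thermogenic_response(temperature, onset) for onset in onsets]
```

Normalizing each response against the participant's own baseline, as the text describes, would follow as a further step before pooling results across individuals.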
In a parallel workflow, the scientist asked Biomni to analyze 227 nights of wearable-recorded
sleep data across 10 participants. Biomni computed averages for duration, efficiency,
latency, and sleep stage composition, derived a composite sleep quality score, and conducted
chronobiological analyses. The agent delivered a structured summary to the user (Supplementary
Notes D, Supplementary Figure 4), including personalized sleep profiles and timing insights,
without human post hoc synthesis. Biomni uncovered several novel insights: sleep efficiency
consistently peaked mid-week (on Wednesdays) and declined on Sundays, suggesting a potential
behavioral pattern tied to pre-Monday stress or weekend-induced disruptions. Another important
finding was that consistent sleep timing correlated more strongly with higher sleep quality than
total sleep duration did, highlighting the critical role of circadian regularity in maintaining
restorative sleep.
The scientist then tasked Biomni with analyzing multi-omics data (652 lipidomic, 731 metabolomic,
and 1,470 proteomic features), jointly with the CGM data. Biomni conducted cross-omics
correlation analysis, applied hierarchical clustering to uncover biologically coherent feature
groups, and performed unsupervised PCA to link CGM signals to molecular pathways. It automatically
generated interpretable outputs – trajectory plots, heatmaps, boxplots, PCA biplots, and cluster
maps – empowering rapid insight generation from complex multimodal datasets (Supplementary Notes D,
Supplementary Figure 5). Significant correlations among lipids, metabolites, and proteins revealed
tightly interlinked regulatory pathways, underscoring the systems-level nature of metabolic
regulation. Notably, several identified biomarkers showed consistent patterns across samples and
exhibited high connectivity within correlation networks. Across all cases, the scientist noted that
Biomni accelerated the path from messy real-world data to testable hypotheses, supporting
applications in sleep optimization, metabolic research, and precision health.
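
The notion of "high connectivity within correlation networks" can be illustrated with a small sketch: compute pairwise Pearson correlations across features, keep strongly correlated pairs as edges, and count edges per feature. The 0.8 cutoff, the feature names, and the values are assumptions for illustration only, not data or parameters from the study.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_network(
    features: Dict[str, List[float]], min_abs_r: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Keep feature pairs whose absolute correlation meets the cutoff."""
    edges = []
    for a, b in combinations(features, 2):
        r = pearson(features[a], features[b])
        if abs(r) >= min_abs_r:
            edges.append((a, b, r))
    return edges


def connectivity(
    features: Dict[str, List[float]], edges: List[Tuple[str, str, float]]
) -> Dict[str, int]:
    """Number of network edges touching each feature."""
    degree = {name: 0 for name in features}
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Toy measurements across five samples (hypothetical feature names).
features = {
    "lipid_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "metabolite_1": [2.0, 4.0, 6.0, 8.0, 10.0],
    "protein_1": [5.0, 4.0, 3.0, 2.0, 1.0],
    "protein_2": [1.0, 3.0, 2.0, 3.0, 1.0],
}

edges = correlation_network(features)
degree = connectivity(features, edges)
```

With thousands of features, as in the dataset described above, a real analysis would also need multiple-testing correction before treating any edge as significant.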
In its final gene regulatory network (GRN) analysis (Figure 3h), Biomni recapitulated known
regulatory relationships between key osteogenic transcription factors such as RUNX2 and HHIP,
confirming how they are regulated by a shared set of anti-osteogenic transcription factors including
TWIST1, LMX1B, and ALX4 25. These findings align with the authors’ report 25 on the balanced
regulation needed for proper bone formation and suture patency. Furthermore, Biomni uncovered
several unreported TFs, including AUTS2, ZFHX3, and PBX1, that showed unexpectedly high
regulatory activity across multiple skeletal cell types. Although PBX1 is a well-established
skeletal regulator 26 and ZFHX3/AUTS2 have only limited or indirect skeletal reports (in mouse 27
or zebrafish 28), their broad activity here suggests under-appreciated roles across diverse
skeletal lineages. Biomni reported that these novel regulators were particularly active in
osteoblasts, preosteoblasts, and various chondrocyte populations, suggesting they play important
but previously unrecognized roles in the transcriptional control of skeletal cell fate
determination during human embryonic development. Finally, Figure 3g-h shows how Biomni’s
visualizations effectively captured both temporal dynamics of regulator activity and
cell-type-specific variations in key regulons like RUNX2. This demonstrates how Biomni enables
researchers to autonomously perform complex multi-omics analysis and rapidly generate testable
hypotheses without specialized programming expertise.
Biomni designs a wet-lab-validated experimental protocol for cloning. To evaluate Biomni’s
ability to support real-world experimental design, we focused on a core task in molecular biology:
cloning. This process is central to countless workflows in research and biotechnology and requires
complex reasoning, from designing high-fidelity primers to choosing the right assembly method
and validating constructs. While general-purpose LLMs have struggled to perform such tasks due
to limited domain knowledge and tool access 24, Biomni integrates LLM reasoning with dynamic
tool execution, enabling expert-level performance in molecular biology tasks.
To rigorously evaluate this task, we first collaborated with an expert group of gene-editing
researchers to design an open-ended cloning benchmark and expert user study (Figure 4a). Our
benchmark consisted of 10 realistic, representative cloning tasks covering Golden Gate, Gibson,
Gateway, and restriction cloning – each with options including single-fragment vs. pooled
assembly. The benchmark also included essential validation steps, such as designing Sanger
sequencing primers and analyzing restriction digests. We posed these tasks to four entities: an
LLM (Claude 3.7), Biomni, a human trainee (a Stanford biology master’s student with previous
cloning experience), and a senior human expert (a Stanford genetics postdoc with 5+ years of
cloning experience). Each was asked to generate a complete, end-to-end protocol along with the
final cloned plasmid map. Blinded expert reviewers assessed the outputs. Biomni produced protocols
and designs that matched the human expert in accuracy and completeness – often providing comparable
levels of detail and anticipating the same edge cases. In contrast, the human trainee’s submissions
were frequently incomplete or suboptimal, reflecting the experience gap typical of early-stage
researchers. Remarkably, Biomni completed all tasks autonomously in a fraction of the time taken
by the expert.
To further validate Biomni in a real-world setting, a scientist assigned it a practical cloning
task: cloning a guide RNA targeting the human B2M gene into the lentiCRISPR v2 Blast construct
(Figure 4b). Biomni successfully executed the task through a comprehensive workflow (Figure 4c).
First, it analyzed the plasmid structure using annotation and pattern search tools to identify key
features necessary for cloning. It then designed three Cas9 sgRNAs targeting B2M using specialized
knockout sgRNA design tools. For the cloning process, Biomni generated forward and reverse
oligos with BsmBI overhangs to enable directional insertion of the sgRNA sequence. It produced
detailed protocols (Figure 4d) for oligo annealing, double-stranded DNA formation, and Golden
Gate cloning into the target vector. Biomni also provided complete bacterial transformation
instructions, including heat-shock steps and antibiotic selection. For quality control, it designed
a U6 promoter sequencing primer to verify sgRNA insertion and simulated the Golden Gate
assembly to produce the final plasmid map.
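
The oligo-design step can be sketched as follows. The sketch assumes the widely used lentiCRISPR-style convention (a `CACC` overhang on the forward oligo, an `AAAC` overhang on the reverse oligo, and a 5' G prepended when the guide does not already start with G). The guide below is a hypothetical placeholder, not one of the B2M guides Biomni designed, and the function names are invented for the example.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase or lowercase DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_sgrna_oligos(guide: str) -> Tuple[str, str]:
    """Return (forward, reverse) oligos for cloning a 20-nt guide.

    Assumed convention: the annealed pair exposes CACC and AAAC 5'
    overhangs that match a BsmBI-digested vector, and a G is prepended
    when the guide lacks one, to favor transcription from the U6 promoter.
    """
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("guide must be 20 nt of A, C, G, T")
    core = guide if guide.startswith("G") else "G" + guide
    forward = "CACC" + core
    reverse = "AAAC" + reverse_complement(core)
    return forward, reverse


# Hypothetical placeholder guide, for illustration only.
guide = "AACCGGTTAACCGGTTAACC"
forward, reverse = design_sgrna_oligos(guide)
```

After annealing, the two oligos pair along the guide region and leave the four-base overhangs single-stranded, which is what makes the insertion directional.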
The scientist followed Biomni’s protocol exactly to perform the wet-lab experiment
(Figure 4e). Colonies appeared on the plate the next day; two were cultured, miniprepped, and
sequenced using the Biomni-designed primers – both showing perfect alignment. This case illustrates
how scientists can rely on Biomni to autonomously design complex molecular biology experiments
with accuracy comparable to human experts, but in a fraction of the time.
User-friendly interface to empower scientists to generate biomedical discoveries. To bring the
power of Biomni into the hands of every scientist, we built an intuitive graphical interface –
available at https://biomni.stanford.edu – to help transform the way researchers interact with
biomedical data and tools. This seamless platform enables users to submit natural language queries
and receive results powered by the full capabilities of Biomni’s agentic system. Whether designing
complex cloning experiments, querying multi-omics databases, or generating hypotheses from wearable
data, scientists can now access the intelligence of a general-purpose biomedical AI agent without
writing a single line of code. The interface is designed for rapid iteration, real-time feedback,
and visual traceability, allowing users to explore intermediate steps, inspect tool usage, and
validate results interactively. By closing the gap between biomedical intent and execution, Biomni
opens a new era of accessible, automated, and scalable scientific discovery. An example of this
interface is shown in Supplementary Figure 17.
3 Discussion
Biomni marks a major step forward in biomedical research, demonstrating robust generalization
across diverse subfields and laying the groundwork for AI agents as integral collaborators in
scientific discovery. Its zero-shot performance across complex tasks – including those in genetics,
genomics, microbiology, immunology, pharmacology, and clinical medicine – underscores its
potential to boost research productivity, accelerate discovery, and broaden access to advanced
biomedical analyses.
By automating complex, labor-intensive workflows, which normally require both expert
knowledge and coding skills, Biomni enables researchers to redirect their efforts toward creative
hypothesis generation, experimental innovation, and cross-disciplinary collaboration. This shift
holds profound implications. In the context of target and drug discovery for biopharma, Biomni
can autonomously prioritize targets, design perturbation screens, or repurpose drugs – offering a
path to faster, more cost-effective research. In clinical application settings, its capabilities in
gene prioritization and rare disease diagnosis point to more accurate, personalized insights and
streamlined diagnostics. For consumer health, Biomni’s integration of wearable data and multi-omics
analyses envisions real-time, individualized health monitoring and intervention.
343 Nonetheless, several limitations remain. While Biomni’s unified environment spans a wide
344 range of biomedical tools and databases, the evaluated tasks represent only a subset of the field,
345 and key domains remain unexplored. In addition, in the action discovery agent, our decision to pri-
13
346 oritize the most recent literature makes the agent appear timely, but risks overlooking foundational
347 concepts and techniques that have faded from current discourse despite their enduring relevance.
348 Future versions should draw on a broader range of publications when defining the environment.
349 Moreover, although Biomni approaches human-level performance in tasks like database
350 querying, sequence analysis, and molecular cloning, it still struggles in areas requiring nuanced
351 clinical judgment, novel experimental reasoning, analytical inventions, or deep biological thinking
352 and synthesis. No system yet captures the full scope of human biomedical expertise. As reflected
353 in our benchmarks, Biomni has not achieved expert-level performance across all task categories.
354 We expect continued improvements as foundation models evolve, as the agentic environment
355 expands, and as human experts and trainees deploy Biomni to facilitate or augment
356 their work.
357 These limitations open promising directions for future development. Training biomedical
358 reasoning agents with reinforcement learning could enable continuous self-improvement in plan-
359 ning and execution. Integrating multimodal data – text, images, and structured inputs – may further
360 deepen reasoning capabilities. Equipping Biomni to autonomously discover and incorporate new
361 tools and databases, as well as incorporate more historical methods (which may have high utility
362 but can be easily forgotten by human users), would ensure adaptability and long-term relevance.
363 Looking ahead, Biomni and its successors could become foundational infrastructure in an AI-
364 powered biomedical ecosystem, working seamlessly with human experts to unlock novel insights
365 into health and disease. This hybrid partnership may radically reshape biomedical research –
366 automating hypothesis generation, scaling discovery pipelines, and enabling medical innovation
367 to proceed at unprecedented speed and scope. General-purpose agents like Biomni could not only
368 accelerate breakthroughs but redefine the future of scientific inquiry itself.
[Figure 1 panel content. (a) 25 subjects defined in bioRxiv; 100 recent published papers per subject; expert curation. (b) An AI agent environment with a unified biomedical action space; example tools and databases shown include Scanpy, gseapy, DESeq2, macs2, bwa, Homer, cooler, CellxGene, ClinVar, gnomAD, BindingDB, miRBase, OpenTarget, JASPAR, UCSC, COSMIC, and UCE; 59 databases. (c) Example query: "Are oncogenic pathways enriched among differentially expressed genes in adata.h5ad?" Stages: Retrieving, Reasoning, Composing actions with code, Observation, Answer. Plan shown: preprocess, cluster and identify differentially expressed genes; load oncogenic gene sets for comparison; perform enrichment analysis to check for oncogenic pathways; validate results with literature and databases.]
Figure 1: Overview of the unified biomedical action space and agent environment in Biomni. (a) Workflow for systematically
curating the unified biomedical action space. Actions necessary to conduct biomedical research were extracted from 2,500 recent
bioRxiv publications across 25 biomedical subfields using an AI-driven discovery agent. Extracted actions were rigorously validated
and curated by human experts, resulting in the integration of 105 biomedical software tools, 150 specialized biological tools (includ-
ing wet-lab protocols, AI-driven predictive models, and domain-specific know-how), and 59 comprehensive biomedical databases.
(b) Illustration of the unified biomedical action space spanning diverse biomedical subfields such as genetics, genomics, synthetic
biology, cell biology, physiology, microbiology, pharmacology, bioengineering, biophysics, molecular biology, and pathology. Rep-
resentative tools and databases integrated into Biomni’s environment are shown, highlighting its general-purpose capabilities. (c)
Example workflow demonstrating Biomni’s reasoning and action composition process to autonomously answer a complex biological
question. Biomni retrieves relevant tools based on the user’s query, formulates a structured reasoning plan, and composes executable
code to perform comprehensive bioinformatics analyses, iteratively refining its reasoning based on observations until converging on
a final, precise answer.
[Figure 2 panel content. (a) Humanity's Last Exam: Biomedicine; LAB-Bench: DbQA; LAB-Bench: SeqQA; a "Human" reference is marked on two panels. (b) Real-world task panels; recoverable labels include a −log10(P) versus chromosome (1–22) plot, Patient, Microbiology, and Drug.]
Figure 2: Zero-shot generalization of Biomni across diverse realistic biomedical tasks. (a) Biomni is superior to 6 baselines in
Q&A multiple choice benchmarks that broadly evaluate the model’s capability across biomedical fields. (b) Biomni demonstrates
robust zero-shot performance across eight previously unseen, real-world biomedical scenarios spanning multiple biomedical sub-
fields, without any task-specific fine-tuning or prompt engineering. Evaluated tasks include variant prioritization and GWAS causal
gene detection (genetics and genomics), perturbation screen design (functional genomics, immunology), patient gene prioritiza-
tion, rare disease diagnosis (clinical genomics), drug repurposing (pharmacology), microbiome disease-taxa bioinformatics analysis
(microbiology), and single-cell RNA-seq cell annotation (single-cell biology). Across these diverse scenarios, Biomni consistently
outperformed baseline models (Base LLM, ReAct+Code) and specialized environments (Biomni ReAct), highlighting its general-
purpose biomedical capabilities and ability to autonomously adapt to new and complex biomedical tasks.
[Figure 3 panel content. (a) User prompt: "I have CGM-derived data from multiple participants collected continuously around meals. Study how temperature changes after food intake. ... [instructions] ..." Inputs: continuous glucose monitoring data and activity data, 30 samples, 458 raw Excel sheets. (b) Workflow: 1. Dataset exploration & preprocessing; 2. Meal event detection; 3. Postprandial temperature analysis; 4. Individual temperature response plots; 5. Cross-subject comparative analysis; 6. Temperature response normalization; 7. Peak thermogenic timing characterization; 8. Response pattern heatmap generation; 9. Temperature increase distribution analysis; 10. Identification of individual metabolic variability. (e) User prompt: "Map transcription factor circuits driving skeletal development on this scRNA & scATAC data. ... [instructions] ..." Inputs: snRNA-seq and snATAC-seq over time for {shoulder, hip, knee} joints, 336,000 nucleus droplets. (f) Workflow: 1. Load and explore all the datasets; 2. Prepare RNA-seq data for analysis; 3. Configure pySCENIC (e.g., retrieve motifs from data lake); 4. Run GRNBoost2 to infer gene regulatory networks; 5. Prune networks using cisTarget; 6. Calculate regulon activity (AUCell); 7. Extract accessibility data from ATAC-seq; 8. Filter predicted targets using ATAC-seq accessibility; 9. Analyze activity patterns across cell types, developmental stages, and anatomical regions; 10. Summary findings.]
Figure 3: Biomni autonomously executes complex multi-modal biomedical analyses to generate hypotheses. (a-d) Biomni rapidly
analyzed CGM-derived thermogenic responses data and activity data from 30 individuals, comprising 458 raw Excel sheets. (b)
Workflow demonstrating Biomni’s autonomous execution of data preprocessing, meal event detection, postprandial temperature
analysis, and thermogenic response characterization. (c) Representative individual temperature-response plots and temperature in-
crease distribution following meals, automatically generated by Biomni. (d) Summary of unique biological findings identified
by Biomni, including significant increases in core body temperature post-meal intake (average 2.19 °C, median 1.10 °C), and notable
inter-individual variability in thermogenic responses. (e-h) Biomni autonomously analyzed single-cell multiomics data from approx-
imately 336,000 nucleus droplets, combining single-nucleus RNA (snRNA-seq) and single-nucleus ATAC sequencing (snATAC-seq)
across human embryonic joint development (shoulder, hip, knee). (f) A detailed workflow diagram showing Biomni’s 10-step analy-
sis pipeline for gene regulatory networks with multiomics. (g) Two key figures generated from Biomni: Left panel shows a heatmap
of regulator activity by developmental stage, with color intensity indicating activity levels. Right panel displays a boxplot of RUNX2
regulon activity by cell type, showing variation in expression across different cell populations. (h) Key findings from the GRN anal-
ysis: 1) Novel transcription factors (AUTS2, ZFHX3, and PBX1) showing high regulatory activity across multiple skeletal lineages
despite no previous association with skeletal development, and 2) Across the 566-589 regulons recovered, limb mesenchyme cells
display the highest mean regulon activity score, underscoring their prominent role in skeletal transcriptional control.
[Figure 4 panel content. (a) Cloning Open Answer Evaluation; category labels: GG Multiple-piece, GG Restriction, GG Library, Gibson, Restriction Library. (b) User prompt: "I have a plasmid plentiCRISPR v2 blast: CGTTGGAGTCCACGTTCTTTAATAGTGG.....GCCCTTTGA. I hope to clone a CRISPR sgRNA targeting human B2M into this plasmid. Could you give me the final map of the plasmid and step-by-step guidance on how should I perform the cloning?" (c) Workflow: 1. Plasmid Analysis; 2. sgRNA Design; 3. Oligo Design for Cloning; 4. Oligo Annealing & Golden Gate Assembly; 5. Bacterial Transformation; 6. Colony Screening & Verification; 7. Final Plasmid Map Assemble.]
Figure 4: Biomni designs wet-lab experimental protocols. (a) Open-ended cloning benchmark on 10 real cloning scenarios. We
compared Biomni against a base LLM, trainee-level human scientists, and expert-level human scientists. Biomni achieved accuracy similar
to that of expert-level scientists and significantly higher than that of trainees, while using much less time. (b) Example of a
user request to Biomni for cloning an sgRNA targeting the human B2M gene into the lentiCRISPR v2 Blast plasmid. (c) Biomni’s
automated stepwise workflow, including plasmid analysis, sgRNA design, oligo synthesis, Golden Gate assembly, bacterial transfor-
mation, colony screening, and final plasmid mapping. (d) Biomni-generated detailed cloning protocol with step-by-step instructions
and comprehensive plasmid map, enabling laboratory scientists to execute the experiment autonomously. (e) Validation of Biomni’s
cloning protocol through successful colony growth on selection plates, followed by Sanger sequencing confirming perfect alignment
of sgRNA insertion in picked colonies, demonstrating Biomni’s robust capability for precise and reliable experimental design.
369 4 Methods
370 Action Discovery from Literature. For each of the 25 bioRxiv subject categories, 100 recent publications from 2024 were
371 collected (2,500 in total) and analyzed by extracting and parsing their PDF contents. Each paper was processed
372 in chunks, and a specialized prompt guided an LLM through each chunk to explicitly identify
373 and extract three categories of actionable insights: tasks, software, and databases. Specifically for
374 tasks, the LLM was instructed to highlight recurrent tasks requiring specialized implementations
375 within biomedical research workflows.
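The chunk-and-extract procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the chunk size and overlap, the `extract_from_chunk` callable (a placeholder for the prompted LLM), and the keyword-based `toy_extractor` are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

CATEGORIES = ("tasks", "software", "databases")


def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> List[str]:
    """Split text into overlapping character windows (sizes are assumptions)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text), 1), step)]


def extract_actions(
    text: str,
    extract_from_chunk: Callable[[str], Dict[str, Iterable[str]]],
    chunk_size: int = 2000,
    overlap: int = 200,
) -> Dict[str, Set[str]]:
    """Run the extractor on every chunk and merge the results per category."""
    merged: Dict[str, Set[str]] = {category: set() for category in CATEGORIES}
    for chunk in chunk_text(text, chunk_size, overlap):
        found = extract_from_chunk(chunk)
        for category in CATEGORIES:
            merged[category].update(found.get(category, []))
    return merged


def toy_extractor(chunk: str) -> Dict[str, List[str]]:
    """Hypothetical stand-in for the prompted LLM: simple keyword spotting."""
    vocabulary = {
        "tasks": ["differential expression analysis"],
        "software": ["Scanpy", "DESeq2"],
        "databases": ["ClinVar"],
    }
    return {
        category: [name for name in names if name in chunk]
        for category, names in vocabulary.items()
    }


paper_text = (
    "We ran differential expression analysis with DESeq2, clustered cells "
    "with Scanpy, and annotated variants using ClinVar."
)
actions = extract_actions(paper_text, toy_extractor)
print(actions)
```

Merging into sets deduplicates items that appear in several chunks; in a real setting the extractor would be an LLM call that returns the three categories in a structured form.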
376 Implementing the Biomni Environment. In the initial iteration of environment construction, a
377 conservative and focused approach was adopted for tool curation. First, tasks were filtered
378 based on relevance to the primary research interests (drug discovery and clinical biomedicine),
379 retaining fields such as biochemistry, bioengineering, biophysics, cancer biology, cell biology, de-
380 velopmental biology, genetics, genomics, immunology, microbiology, molecular biology, pathol-
381 ogy, pharmacology, physiology, synthetic biology, and systems biology. Subsequently, these were
382 narrowed down to approximately 1,900 commonly recurring tasks. These tasks were further man-
383 ually reviewed to eliminate redundancy and exclude tasks that were trivial or easily implementable
384 through simple code. Selecting highly specialized tasks that require significant domain expertise
385 was emphasized, such as wet-lab protocols and advanced AI models.
386 Human scientists then collaborated with software engineering agents equipped with web
387 search capabilities to implement each specialized tool. Every tool underwent rigorous validation,
388 requiring a clearly defined test case that it successfully passed. This stringent process culminated in
389 a curated collection of 150 specialized tools. Additionally, essential literature retrieval tools were
390 included, such as PubMed and Google Scholar, with provisions for future iterative expansions.
391 Each tool was strictly defined using a comprehensive checklist that mandated: (1) a clear
392 and descriptive name, (2) detailed documentation, (3) outputs formatted as detailed research logs
393 optimized for LLM interpretation, (4) the inclusion and successful passing of a specific test case,
394 and (5) a specialization criterion: if a task could easily be implemented via brief LLM-generated code
395 (e.g., simple database queries), no specialized tool was created.
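One way to picture the five-point checklist is as an automated gate applied to each candidate tool. The sketch below is illustrative only: the `ToolSpec` fields, the documentation-length threshold, and the GC-content example are assumptions and do not reflect how the Biomni tools are actually registered.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass
class ToolSpec:
    """Hypothetical record for one candidate tool."""
    name: str
    documentation: str
    run: Callable[..., str]               # expected to return a research log
    example_kwargs: Dict[str, object]     # inputs for the test case
    check_output: Callable[[str], bool]   # passes if the log is as expected
    trivially_codable: bool = False       # True if brief ad hoc code suffices


def checklist_failures(tool: ToolSpec, min_doc_chars: int = 40) -> List[str]:
    """Return the checklist items (1)-(5) that the tool fails."""
    failures: List[str] = []
    if not tool.name.strip():
        failures.append("(1) name is missing")
    if len(tool.documentation.strip()) < min_doc_chars:  # threshold is an assumption
        failures.append("(2) documentation is too short")
    try:
        output = tool.run(**tool.example_kwargs)
    except Exception as exc:
        return failures + [f"(4) test case raised {exc!r}"]
    if not isinstance(output, str) or not output.strip():
        failures.append("(3) output is not a non-empty research log")
    elif not tool.check_output(output):
        failures.append("(4) test case failed")
    if tool.trivially_codable:
        failures.append("(5) task is simple enough for ad hoc code")
    return failures


def gc_content_log(sequence: str) -> str:
    """Toy tool: report the GC content of a DNA sequence as a log line."""
    seq = sequence.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return f"Research log: GC content of {len(seq)}-nt sequence = {gc:.2%}"


candidate = ToolSpec(
    name="gc_content_log",
    documentation="Compute the GC content of a DNA sequence and report it as a research log.",
    run=gc_content_log,
    example_kwargs={"sequence": "GGCCAATT"},
    check_output=lambda log: "50.00%" in log,
    trivially_codable=True,  # a GC-content calculation is easy to write inline
)
print(checklist_failures(candidate))
print(checklist_failures(replace(candidate, trivially_codable=False)))
```

The toy tool passes items (1) to (4) but is rejected on item (5), which mirrors the stated policy of not creating specialized tools for tasks that brief generated code can handle.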
396 Databases were categorized and extensive relational databases accessible via web APIs (e.g.,
397 PDB, OpenTargets, ClinVar) were integrated using a unified querying function. This function
398 accepts natural language inputs and leverages an LLM to dynamically parse database schemas
399 and execute corresponding queries. Databases lacking web APIs were downloaded and locally
400 preprocessed into structured pandas DataFrames for seamless accessibility by the agent.
401 For software integration, recognizing the frequent necessity of concurrently utilizing multiple
402 software tools, a unified containerized environment was constructed, which was pre-installed with
403 a comprehensive suite of relevant software. Additionally, this environment supports the execution
404 of R packages and command-line interface (CLI) tools.
405 Biomni-A1. The Biomni agent is a general-purpose biomedical AI agent built upon the CodeAct29
406 framework, designed to systematically solve biomedical tasks by combining LLMs with an inter-
407 active coding environment. Given a user query, Biomni begins by prompting the LLM to generate
408 a clear, numbered bullet-list plan detailing the steps needed to tackle the given problem, keeping
409 careful track of progress and adjustments along the way. Because the space of tools, software, and databases
410 is vast, a given query typically requires only a small subset of these resources. To keep the context short, a
411 prompt-based retriever, powered by a separate LLM, dynamically selects the
412 most relevant functions, datasets, and software libraries from available resources. During execu-
413 tion, the LLM generates code, executes it in a coding environment (Python, R, or Bash), and returns
414 the resulting observations to inform subsequent reasoning. This iterative approach continues until
415 the agent converges on an accurate, validated solution.
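The plan, retrieve, execute, and observe cycle described above can be sketched as a small loop. This is a toy illustration, not the Biomni implementation: the `<execute>` and `<solution>` tags, the `toy_llm` and `retrieve` callables, and the step budget are hypothetical, and only Python execution is shown.

```python
import contextlib
import io
import re
from typing import Callable, Dict, List


def run_python(code: str, namespace: Dict[str, object]) -> str:
    """Execute generated code and capture its printed output as the observation."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        return buffer.getvalue() + f"Error: {exc!r}"
    return buffer.getvalue()


def agent_loop(
    query: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    max_steps: int = 10,
) -> str:
    """Iterate LLM reply -> code execution -> observation until a solution appears."""
    resources = retrieve(query)  # stands in for the prompt-based retriever
    history = [f"Query: {query}", f"Available resources: {', '.join(resources)}"]
    namespace: Dict[str, object] = {}  # state persists across code steps
    for _ in range(max_steps):
        reply = llm("\n".join(history))
        history.append(reply)
        solution = re.search(r"<solution>(.*?)</solution>", reply, re.S)
        if solution:
            return solution.group(1).strip()
        code = re.search(r"<execute>(.*?)</execute>", reply, re.S)
        if code:
            history.append(f"Observation: {run_python(code.group(1), namespace)}")
    return "No solution within the step budget"


def toy_llm(prompt: str) -> str:
    """Hypothetical scripted model: plan and run code first, then answer."""
    if "Observation:" not in prompt:
        return "Plan:\n1. [ ] Compute the value.\n<execute>print(2 + 2)</execute>"
    value = prompt.rsplit("Observation:", 1)[1].strip()
    return f"Plan:\n1. [x] Compute the value.\n<solution>{value}</solution>"


answer = agent_loop("What is 2 + 2?", toy_llm, lambda q: ["python"])
print(answer)
```

Keeping a shared namespace across steps lets later code reuse variables created earlier, which is the property that makes code-based action composition convenient for multi-step analyses.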
416 Q&A Benchmarks. Development and testing sets were created by sampling the LAB-Bench
417 Database Question-Answering and Sequence Question-Answering benchmarks 24 . Due to resource
418 constraints, each set comprises 12.5% of the complete reference, proportionally distributed across
419 benchmark subtasks, providing a cost-effective and representative assessment of model perfor-
420 mance. The development set informed iterative refinements to Biomni’s database integrations and
421 tool implementations, while the test set provided an independent evaluation of generalization ca-
422 pabilities. Accuracy was evaluated by following the LAB-Bench protocol, using multiple-choice
423 answer options with an option for abstention due to insufficient information. Results represent
424 averages across three independent evaluation runs.
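The sampling and scoring scheme can be illustrated with a short sketch. It assumes that the development and test sets are disjoint, that each takes the same fraction of every subtask, and that an abstention counts as an incorrect answer; these details, along with all names and toy data below, are assumptions rather than the LAB-Bench or Biomni code.

```python
import random
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple


def stratified_split(
    items: Sequence[dict],
    key: Callable[[dict], Hashable],
    fraction: float = 0.125,
    seed: int = 0,
) -> Tuple[List[dict], List[dict]]:
    """Draw disjoint dev and test samples, each about `fraction` of every subtask."""
    rng = random.Random(seed)
    groups: Dict[Hashable, List[dict]] = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    dev: List[dict] = []
    test: List[dict] = []
    for name in sorted(groups, key=str):
        members = groups[name][:]
        rng.shuffle(members)
        k = max(1, round(len(members) * fraction))
        dev.extend(members[:k])
        test.extend(members[k:2 * k])
    return dev, test


def accuracy(predictions: Sequence[Optional[str]], answers: Sequence[str]) -> float:
    """Fraction correct over all questions; None marks an abstention."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


# Toy question pool with two subtasks of different sizes.
questions = [{"id": i, "subtask": "a" if i < 80 else "b"} for i in range(120)]
dev_set, test_set = stratified_split(questions, key=lambda q: q["subtask"])

# Toy predictions from three independent runs on four questions.
answers = ["A", "B", "C", "C"]
runs = [["A", "B", None, "D"], ["A", "B", "C", "D"], ["A", None, "C", "D"]]
mean_accuracy = mean(accuracy(run, answers) for run in runs)
print(len(dev_set), len(test_set), round(mean_accuracy, 3))
```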
425 For Humanity’s Last Exam (HLE)23 , a representative sample of questions was selected, span-
426 ning fourteen subdisciplines of Biology/Medicine: Genetics, Biology, Ecology, Neuroscience,
427 Biochemistry, Microbiology, Immunology, Molecular Biology, Computational Biology, Biophysics,
428 Bioinformatics, Genomics, and Physiology. From each subdiscipline, up to five questions were
429 sampled (or the maximum number available if fewer than five existed in the category). This
430 sampling approach yielded a final evaluation set of 52 questions that comprehensively assessed
431 Biomni’s performance across the biological sciences. The evaluation was conducted directly with-
432 out the use of a development set.
433 Curating real-world benchmarks. The variant prioritization benchmark was curated from the Open
434 Targets Genetics30 ground-truth set and processed so that each ground-truth variant is paired with a set of negative
435 variants. The prompt was as follows: “Your task is to identify the most promising variant
436 associated with a given GWAS phenotype for futher examination. From the list, prioritize the
437 top associated variant (matching one of the given variant). GWAS phenotype: {trait} Variants:
438 {variant list}”. Accuracy was used as the metric. The GWAS causal gene detection benchmark
439 utilized a dataset curated from Shringarpure et al.31, using the original prompt: “Your task is to
440 identify likely causal genes within a locus for a given GWAS phenotype. From the list, provide
441 only the likely causal gene (matching one of the given genes). Identify the causal gene. GWAS
442 phenotype: {trait} Genes in locus: {gene str}”. Accuracy was used as the metric. The pertur-
443 bation screen design benchmark was curated from Schmidt et al.32 . The prompt is ”Task: Plan
444 a CRISPR screen to{task description}. There are 18,939 possible genes to perturb and only per-
445 turb {num genes} genes. For each perturbation, you can measure out {measurement} which will
446 be referred to as the score. Generate {num genes} genes that maximize the perturbation effect.
447 Output format: a list of genes 1. XXX 2.XXX 3.XXX ...”. The evaluation metric was the average
448 post-perturbed effect. As the scale differs for the post-perturbed effect, one screen (IL-2) was used.
449 The scRNA-seq annotation benchmark ensured flexibility across diverse data formats (e.g., Cel-
450 lxGene, author-hosted portals), encompassing multiple tissues, species, sequencing technologies,
451 and experimental conditions. Datasets with author-provided annotations (Tier 1 or Tier 2, typically
452 ≥10 cell types) were prioritized, and 20k-50k cells were subsampled proportionally to their cell
453 type distributions. Automatic evaluation was conducted at the single-cell level using LLMs via
454 semantic match, accounting for both naming variations (e.g., fibroblast vs. Fibroblast cells) and
455 hierarchical relations (e.g., CD8+ T cells vs. T cells), judged on-the-fly by LLM agents and later
456 verified by humans. In the microbiome benchmark, both Biomni and human experts independently
457 performed differential abundance analysis on five diverse microbiome datasets, selected to reflect
458 different data types, biological contexts, and analytical challenges. Dataset 1 comes from the MGM
459 2.0 platform33 and includes a table of relative microbial abundance across samples together with a separate table of sample
460 labels, making it well suited to classification tasks33. Dataset 2, curated from a well-known Nature study, offers
461 microbial abundance data in mice alongside metadata such as diet and sex, making it valuable for
462 modeling host-microbiome interactions34 . Dataset 3, developed by Pasolli et al.35 , combines eight
463 human metagenomic studies with species-level features processed using MetaPhlAn235 . Dataset 4
464 explores microbial communities in drinking water systems, providing an OTU matrix with abun-
465 dances represented as relative sequence counts. This environmental dataset allows models to be
466 tested beyond host-associated microbiomes36 . Finally, Dataset 5 is an in-house resource derived
467 from the Human Microbiome Project37 . Together, these datasets provide a comprehensive foun-
468 dation for benchmarking AI agents in microbiome analysis across both clinical and environmental
469 domains. Biomni results were compared against those generated by human experts for consistency,
470 accuracy, and efficiency. The drug repurposing benchmark used a dataset from Huang et al.38 , for
471 the task of identifying the most likely drug from a pre-defined list of drugs for repurposing in a
472 given indication. Evaluation was based on the alignment score with off-label prescription patterns
473 of clinicians from an EHR system. The prompt was ”Your task is to identify top 5 drugs that can
474 be potentially repurposed to treat the given disease. From the list, prioritize the drug list with
475 the highest potential (matching the given DrugBank IDs). Disease: {disease} Drugs: {drug list}
476 Output format: a list of drugs with their DrugBank IDs, no drug name, just the IDs: 1. DB00001
477 2. DB00002 3. DB00003 ..”. The rare disease diagnosis benchmark used the MyGene2 dataset,
478 curated by Alsentzer et al.39. The ground truth was the expert-annotated diagnosis. The prompt was
479 ”Task: given a patient’s phenotypes and a list of candidate genes, diagnose the rare disease that
480 the patient has. Phenotypes: {phenotype list} Candidate genes: {candidate genes} Output format:
481 {{’disease name’: XXX, ’OMIM ID’: XXX}}”. The patient gene prioritization benchmark used
482 a dataset curated by Alsentzer et al.39. The ground truth was the true causal gene. The prompt was
483 ”Task: Given a patient’s phenotypes and a list of candidate genes, identify the causal gene. Phe-
484 notypes: {phenotype list} Candidate genes: {candidate genes} Output format: {{’causal gene’:
485 [gene1]}}”.
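The doubled braces in the output-format strings suggest that the prompts are Python format templates. The sketch below shows how the patient gene prioritization prompt might be filled in and how a response could be parsed and scored by accuracy. The placeholder names with underscores, the parsing rules, and the toy responses are assumptions; the benchmark's actual evaluation code may differ.

```python
import ast
import re
from typing import List, Optional

# Template text follows the prompt quoted above; underscore names are assumed.
PROMPT_TEMPLATE = (
    "Task: Given a patient's phenotypes and a list of candidate genes, "
    "identify the causal gene. Phenotypes: {phenotype_list} "
    "Candidate genes: {candidate_genes} "
    "Output format: {{'causal gene': [gene1]}}"
)


def parse_causal_gene(response: str, key: str = "causal gene") -> Optional[str]:
    """Extract the predicted gene from the first {...} literal in a response."""
    match = re.search(r"\{.*?\}", response, re.S)
    if not match:
        return None
    try:
        payload = ast.literal_eval(match.group(0))
    except (ValueError, SyntaxError):
        return None
    if not isinstance(payload, dict):
        return None
    value = payload.get(key)
    if isinstance(value, list):
        value = value[0] if value else None
    return value.strip().upper() if isinstance(value, str) else None


def accuracy(responses: List[str], truths: List[str]) -> float:
    """Share of responses whose parsed gene equals the ground-truth gene."""
    hits = sum(parse_causal_gene(r) == t.upper() for r, t in zip(responses, truths))
    return hits / len(truths)


prompt = PROMPT_TEMPLATE.format(
    phenotype_list="phenotype_1, phenotype_2",   # toy placeholders, not real data
    candidate_genes="GENE_A, GENE_B, GENE_C",
)
toy_responses = [
    "After reviewing the evidence: {'causal gene': ['gene_b']}",
    "I could not decide.",
]
score = accuracy(toy_responses, ["GENE_B", "GENE_A"])
print(prompt)
print(score)
```

Unparseable responses are counted as incorrect here, which is one possible convention among several.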
486 Wearable analysis case study. A wearable case study integrated CGM-derived body temperature
487 data, sleep metrics, and multi-omics datasets from human participants40 , as follows: CGM Body
488 Temperature Data: For each participant, continuous glucose monitors (CGMs) equipped with tem-
489 perature sensors recorded skin temperature in high resolution. A total of 485 temperature files were
490 collected, each centered on a presumed meal event. The time window for each file spanned 6 hours
491 total, comprising 2 hours pre-meal and 4 hours post-meal. Sleep Data: Sleep metrics were derived
492 from wrist-worn wearable devices for a subset of 10 participants, covering 227 nights of sleep. Pa-
493 rameters collected included sleep duration, sleep efficiency, sleep latency, sleep stage composition
494 (light, deep, REM), and number of wake episodes. Omics Data: Blood samples were analyzed to
495 generate the following: Lipidomics: 652 lipid features across 147 samples; Metabolomics: 731
496 metabolite features across 147 samples; Proteomics: 1,470 protein features across 20 samples.
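To make the 6-hour window concrete, the following sketch computes a postprandial temperature increase for one meal-centered file. The definition used here (peak post-meal temperature minus mean pre-meal temperature) is an assumption chosen for illustration; the metric actually computed by Biomni in the case study is not specified in this section, and the readings are invented toy values.

```python
from statistics import mean, median
from typing import List, Sequence, Tuple

# One reading: (minutes relative to the presumed meal, temperature in °C).
Reading = Tuple[float, float]


def postprandial_increase(
    readings: Sequence[Reading],
    pre_minutes: float = 120.0,   # 2 hours before the meal
    post_minutes: float = 240.0,  # 4 hours after the meal
) -> float:
    """Peak post-meal temperature minus the mean pre-meal baseline."""
    pre = [temp for minute, temp in readings if -pre_minutes <= minute < 0]
    post = [temp for minute, temp in readings if 0 <= minute <= post_minutes]
    if not pre or not post:
        raise ValueError("window must contain pre-meal and post-meal readings")
    return max(post) - mean(pre)


def summarize(events: Sequence[Sequence[Reading]]) -> Tuple[float, float]:
    """Mean and median increase across meal events."""
    increases: List[float] = [postprandial_increase(event) for event in events]
    return mean(increases), median(increases)


# Toy meal events (not study data).
event_1 = [(-120, 30.0), (-60, 30.0), (0, 30.5), (60, 31.0), (240, 30.25)]
event_2 = [(-90, 31.0), (-30, 31.0), (30, 31.5), (180, 31.25)]
avg_increase, median_increase = summarize([event_1, event_2])
print(avg_increase, median_increase)
```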
497 Multiome analysis case study. The dataset from the original study25 was downloaded directly and used without
498 modification. That study generated a multi-omic dataset of human embryonic skeletal
499 development from 5-11 weeks post-conception. The dataset includes snRNA-seq and snATAC-seq
500 data from approximately 336,000 nuclei across five anatomical regions (hip, knee, shoulder joints,
501 calvaria, and skull base). The dataset covers both appendicular (limb) and cranial regions. No
502 additional tools or manual preprocessing were added. As the analytical traces are extensive, more
503 guidance was included in the prompt instruction and two use cases were tested:
504 Comparative Analysis. This analysis focused on how cellular processes differ across anatomi-
505 cal locations and developmental timepoints. Biomni was instructed to characterize the cellular
506 composition across anatomical regions (calvaria, skull base, shoulder, hip, knee) and developmen-
507 tal stages. We prompted Biomni with detailed instructions (Supplementary Section E), including
508 cell type proportion estimates, region-specific population labels, UMAP embeddings, stacked bar
509 plots, a comparison of intramembranous versus endochondral ossification, key transcription factor
510 highlights, and developmental trajectory tracing.
511 Gene Regulatory Network Analysis. We asked Biomni to identify transcriptional programs underlying
512 skeletal development. Following a systematic 10-step process, Biomni inferred gene regulatory
513 networks by: (1) loading and exploring all datasets, (2) preparing RNA-seq data for analysis, (3)
514 configuring pySCENIC to retrieve motifs, (4) running GRNBoost2 to infer gene regulatory net-
515 works, (5) pruning networks using cisTarget, (6) calculating regulon activity with AUCell, (7)
516 extracting accessibility data from ATAC-seq, (8) filtering predicted targets using ATAC-seq ac-
517 cessibility, (9) analyzing activity patterns across cell types, developmental stages, and anatomical
518 regions, and (10) summarizing findings.
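Step (8) of this process, filtering predicted targets by chromatin accessibility, can be illustrated with a small set-based sketch. It assumes that accessibility has already been reduced to a set of genes with accessible regulatory regions; how Biomni summarized the snATAC-seq signal per gene is not described here, and the regulon contents below are invented placeholders rather than results.

```python
from typing import Dict, Set


def filter_regulons_by_accessibility(
    regulons: Dict[str, Set[str]],
    accessible_genes: Set[str],
    min_targets: int = 1,
) -> Dict[str, Set[str]]:
    """Keep only targets supported by accessibility; drop regulons left too small."""
    filtered: Dict[str, Set[str]] = {}
    for factor, targets in regulons.items():
        supported = targets & accessible_genes
        if len(supported) >= min_targets:
            filtered[factor] = supported
    return filtered


# Toy regulons: transcription factor -> predicted target genes.
toy_regulons = {
    "TF_A": {"gene1", "gene2", "gene3"},
    "TF_B": {"gene4", "gene5"},
    "TF_C": {"gene6"},
}
# Toy set of genes with accessible regulatory regions in the ATAC-seq data.
toy_accessible = {"gene1", "gene2", "gene4"}

supported_regulons = filter_regulons_by_accessibility(toy_regulons, toy_accessible)
print(supported_regulons)
```

A stricter `min_targets` would discard regulons that retain only a handful of accessibility-supported targets.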
519 Manual verification. To evaluate whether the aggregated findings are truly reflected by the data or
520 merely simulated or hallucinated by the LLM, manual (human) verification was conducted by following
521 the traces and code generated by Biomni.
522 Wetlab Benchmark Development and Evaluation. A comprehensive benchmark was developed
523 consisting of 20 open-ended cloning questions curated from real-world applications to represent
524 the diversity and complexity of molecular cloning tasks across four major categories: Golden
525 Gate assembly, Gibson assembly, restriction enzyme cloning, and Gateway cloning. Each cate-
526 gory included both single-construct and pooled cloning scenarios. Additionally, the benchmark
527 incorporated common validation methods, including diagnostic restriction digestion, Sanger se-
528 quencing primer design, and sequence alignment analysis. For establishing baseline performance,
529 three human experts with extensive experience in molecular cloning were recruited. These experts
530 were instructed to complete each task without utilizing language models but were permitted to use
531 standard molecular biology tools, search engines, and publicly available online resources such as
532 plasmid repositories and primer design platforms. The time required for each expert to complete
533 each task was recorded, from initial task understanding to the final protocol and plasmid map gen-
534 eration. In parallel, Biomni and general LLM models were evaluated on identical tasks. Each
535 system was provided with the same task descriptions and required to generate detailed end-to-end
536 experimental protocols and final cloned plasmid maps. For the general LLM baseline, Claude 3.7 was used as
537 one of the most capable publicly available models at the time of testing; it was given the same
538 information but no access to specialized molecular biology tools. For evaluation, an independent
539 senior researcher with experience in molecular cloning technologies was recruited and blinded
540 to the source of each protocol (human expert, Biomni, or general LLM). The evaluator assessed
541 each protocol and plasmid map based on two primary criteria: (1) Accuracy: The correctness of
542 the proposed methodology, including appropriate enzyme selection, reaction conditions, primer
543 design parameters, and plasmid construction strategy. (2) Completeness: The thoroughness of the
544 protocol, including all necessary steps, reagents, concentrations, incubation times, and verification
545 methods. Each criterion was scored on a scale of 1-5 according to a detailed rubric (Supplementary
546 Table S31-32). The average scores across all 20 tasks were calculated for each system and human
547 expert to enable direct comparison.
548 Wetlab Validation. A practical cloning task was selected for validation: the insertion of a guide
549 RNA targeting the human B2M gene into the lentiCRISPR v2 Blast construct. This task was
550 chosen for its relevance to CRISPR-based gene editing applications and its moderate complexity,
551 involving multiple molecular biology techniques. The experiment was conducted in a standard
552 molecular biology laboratory setting using commercially available reagents and materials. The
553 lentiCRISPR v2 Blast plasmid was obtained from Addgene. All protocols for the experiment were
554 generated entirely by Biomni without modification (Supplementary Notes F), including plasmid
555 analysis, sgRNA design, oligo design with appropriate overhangs, detailed Golden Gate assembly
556 conditions, bacterial transformation parameters, and verification strategies. For validation of the
557 cloning results, standard molecular biology practices were followed, selecting colonies for cul-
558 ture and miniprep, followed by Sanger sequencing using the Biomni-designed primers. Sequence
559 alignment analysis was performed to verify the correct insertion of the sgRNA sequence. The
560 success of the cloning process was determined by the presence of bacterial colonies on selective
561 media and subsequent sequence verification confirming the accurate incorporation of the designed
562 sgRNA construct into the lentiCRISPR v2 Blast backbone.
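The oligo design step with overhangs can be sketched as below. The sketch assumes the overhang convention commonly used for BsmBI-digested lentiCRISPR v2 backbones (CACC on the top oligo, AAAC on the bottom oligo, with a G prepended to the guide); the protocol actually generated by Biomni is given in Supplementary Notes F and may differ. The 20-nt guide is an arbitrary placeholder, not a B2M-targeting sequence.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def design_sgrna_oligos(
    guide: str,
    top_overhang: str = "CACC",     # assumed convention for the top oligo
    bottom_overhang: str = "AAAC",  # assumed convention for the bottom oligo
) -> Tuple[str, str]:
    """Build the two oligos (5'->3') to anneal for a 20-nt guide."""
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("guide must be 20 nt of A, C, G, T")
    insert = "G" + guide  # extra 5' G, always added in this sketch
    top = top_overhang + insert
    bottom = bottom_overhang + reverse_complement(insert)
    return top, bottom


placeholder_guide = "GATTACAGATTACAGATTAC"  # arbitrary sequence for illustration
top_oligo, bottom_oligo = design_sgrna_oligos(placeholder_guide)
print(top_oligo)
print(bottom_oligo)
```

After annealing, the region following the 4-nt overhangs is fully complementary between the two oligos, which can be checked by comparing `top_oligo[4:]` with the reverse complement of `bottom_oligo[4:]`.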
563 Data availability. All data used in Biomni are publicly available at Harvard Dataverse under
564 https://doi.org/10.7910/DVN/CE4ZYG.
568 Acknowledgements. We thank Emily Alsentzer, Andrew Lee, members of Jure Leskovec’s lab,
569 and members of Euan Ashley’s lab, for providing helpful feedback. K.H. and J.L. also gratefully
570 acknowledge the support of NSF under Nos. OAC-1835598 (CINES), CCF-1918940 (Expedi-
571 tions), DMS-2327709 (IHBEM), IIS-2403318 (III); Stanford Data Applications Initiative, Wu Tsai
572 Neurosciences Institute, Stanford Institute for Human-Centered AI, Chan Zuckerberg Initiative,
573 Amazon, Genentech, GSK, Hitachi, SAP, and UCB. K.H. acknowledges the support of the Stanford
574 Bio-X fellowship. Research reported in this publication was supported by the National Institute
575 of Neurological Disorders and Stroke of the National Institutes of Health under Award Number
576 U01NS134358. The content is solely the responsibility of the authors and does not necessarily
577 represent the official views of the National Institutes of Health.
578 Author contributions. K.H., Y.R., J.L. conceived the study. K.H. and J.L. supervised the project.
579 K.H. designed and developed the framework. K.H., S.Z., H.W., Y.Q., Y.L. implemented tools
580 and databases. K.H. designed and implemented the generalist agent architecture. K.H. and R.L.
581 designed the action discovery agent. S.Z. performed benchmarks on Q&A tasks. K.H., H.W., Y.L.
582 collected and implemented benchmarks on realistic tasks. X.Z. provided advice on microbiome
583 benchmark. H.W., J.Z., P.H., K.H. performed multi-omics integration case study. Y.L., K.H.
584 performed wearable data analysis case study. Y.Q., J.Z., D.Y., S.Z., Y.L., K.H. performed wet-
585 lab case study. K.H., S.M., J.C., M.W., J.B. performed rare disease diagnosis case study. R.L.
586 performed qualitative trace analysis. R.L., L.Q., G.L. provided support for software. K.H., S.Z.,
587 H.W., Y.Q., A.R., Y.L. wrote the draft paper. All authors discussed the results and contributed to
588 the final manuscript.
589 Competing interests. A.R. and H.W. are employees of Genentech and A.R. has equity in Roche.
590 All other authors declare no competing interests.
References

1. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
2. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
3. Van Dyck, C. H. et al. Lecanemab in early Alzheimer's disease. New England Journal of Medicine 388, 9–21 (2023).
4. López-Otín, C., Blasco, M. A., Partridge, L., Serrano, M. & Kroemer, G. Hallmarks of aging: An expanding universe. Cell 186, 243–278 (2023).
5. Botvinik-Nezer, R. et al. Variability in the analysis of a single neuroimaging dataset by many teams. Nature 582, 84–88 (2020).
6. Thiele, I. & Palsson, B. Ø. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature Protocols 5, 93–121 (2010).
7. Gibney, E. & Van Noorden, R. Scientists losing data at a rapid rate. Nature 10 (2013).
8. Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).
9. Ridnik, T., Kredo, D. & Friedman, I. Code generation with AlphaCodium: From prompt engineering to flow engineering. arXiv preprint arXiv:2401.08500 (2024).
10. Cui, J., Li, Z., Yan, Y., Chen, B. & Yuan, L. ChatLaw: Open-source legal large language model with integrated external knowledge bases. CoRR (2023).
11. Tom, G. et al. Self-driving laboratories for chemistry and materials science. Chemical Reviews 124, 9633–9732 (2024).
12. Peng, C. et al. A study of generative large language model for medical research and healthcare. NPJ Digital Medicine 6, 210 (2023).
13. Qu, Y. et al. CRISPR-GPT: An LLM agent for automated design of gene-editing experiments. bioRxiv 2024–04 (2024).
14. Swanson, K., Wu, W., Bulaong, N. L., Pak, J. E. & Zou, J. The Virtual Lab: AI agents design new SARS-CoV-2 nanobodies with experimental validation. bioRxiv 2024–11 (2024).
15. Roohani, Y. et al. BioDiscoveryAgent: An AI agent for designing genetic perturbation experiments. International Conference on Learning Representations (2025).
16. Wang, E. et al. TxGemma: Efficient and agentic LLMs for therapeutics. arXiv preprint arXiv:2504.06196 (2025).
17. Xiao, Y. et al. CellAgent: An LLM-driven multi-agent framework for automated single-cell data analysis. bioRxiv 2024–05 (2024).
18. Youngblut, N. D. et al. scBaseCamp: an AI agent-curated, uniformly processed, and continually expanding single cell data repository. bioRxiv 2025–02 (2025).
19. Hu, M. et al. Evaluation of large language models for discovery of gene set function. Nature Methods 1–10 (2024).
20. Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35, 24824–24837 (2022).
21. Yao, S. et al. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR) (2023).
22. Guo, D. et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025).
23. Phan, L. et al. Humanity's last exam. arXiv preprint arXiv:2501.14249 (2025).
24. Laurent, J. M. et al. LAB-Bench: Measuring capabilities of language models for biology research. arXiv preprint arXiv:2407.10362 (2024).
25. To, K. et al. A multi-omic atlas of human embryonic skeletal development. Nature 635, 657–667 (2024).
26. Gordon, J. A. et al. Pbx1 represses osteoblastogenesis by blocking Hoxa10-mediated recruitment of chromatin remodeling factors. Molecular and Cellular Biology (2010).
27. Gomez, G. A. et al. Evaluation of potential roles of zinc finger homeobox 3 (ZFHX3) expressed in chondrocytes and osteoblasts on skeletal growth in mice. Calcified Tissue International 115, 445–454 (2024).
28. Geng, Z., Tai, Y. T., Wang, Q. & Gao, Z. AUTS2 disruption causes neuronal differentiation defects in human cerebral organoids through hyperactivation of the Wnt/β-catenin pathway. Scientific Reports 14, 19522 (2024).
29. Wang, X. et al. Executable code actions elicit better LLM agents. In Forty-first International Conference on Machine Learning (2024).
30. Ghoussaini, M. et al. Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics. Nucleic Acids Research 49, D1311–D1320 (2021).
31. Shringarpure, S. S. et al. Large language models identify causal genes in complex trait GWAS. medRxiv 2024–05 (2024).
32. Schmidt, R. et al. CRISPR activation and interference screens decode stimulation responses in primary human T cells. Science 375, eabj4008 (2022).
33. Zhang, H., Kang, Z., Zhang, Y., Yang, R. & Ning, K. Towards a generative paradigm for large-scale microbiome analysis by generative language model. bioRxiv 2025–01 (2025).
34. Turnbaugh, P. J. et al. A core gut microbiome in obese and lean twins. Nature 457, 480–484 (2009).
35. Pasolli, E., Truong, D. T., Malik, F., Waldron, L. & Segata, N. Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PLoS Computational Biology 12, e1004977 (2016).
36. Gomez-Alvarez, V. & Revetta, R. P. Monitoring of nitrification in chloraminated drinking water distribution systems with microbiome bioindicators using supervised machine learning. Frontiers in Microbiology 11, 571009 (2020).
37. Turnbaugh, P. J. et al. The human microbiome project. Nature 449, 804–810 (2007).
38. Huang, K. et al. A foundation model for clinician-centered drug repurposing. Nature Medicine 30, 3601–3613 (2024).
39. Alsentzer, E. et al. Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases. medRxiv 2022–12 (2022).
40. Metwally, A. A. et al. Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning. Nature Biomedical Engineering 1–18 (2024).
41. Katz, D. et al. The multi-omic, multi-tissue response to acute endurance and resistance exercise: Results from the molecular transducers of physical activity consortium. Circulation 150, A4143199–A4143199 (2024).
Supplementary materials
Supplementary Notes
A Details about baselines
We use the following baselines for benchmark comparison: (1) a base LLM without tools; (2) ReAct,
using function calling via chain-of-thought reasoning; (3) ReAct+Code, which adds Python
code execution capabilities; (4) ReAct+Literature, which includes access to PubMed, web search,
PDF extraction from URLs, and arXiv papers; (5) ReAct+Code+Literature, combining both structured
and unstructured tool resources; and (6) Biomni-ReAct, an ablation of Biomni that shares the
full environment but replaces Biomni-A1's code-based planning with standard ReAct-style tool
chaining.
species, tissues, and platforms, Biomni achieved a 93.6% improvement over Biomni-ReAct and
10.3% over ReAct+Code, based on semantic matching verified by human adjudication, demonstrating
highly accurate biological labeling in diverse contexts.
Finally, in microbiome disease-taxa analysis, Biomni autonomously performed statistical
testing and visualization on five metagenomic datasets drawn from public sources, such as the
Human Microbiome Project [37], and from the literature. It outperformed the base LLM by 180.0%,
ReAct+Code by 48.8%, and Biomni-ReAct by 25.0%, underscoring its strength in executing complex,
noisy workflows end to end [41].
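The percentages above read most naturally as relative gains over each baseline score. The sketch below shows that arithmetic under this assumption; the function name `relative_improvement` and the example scores are hypothetical and are not taken from the benchmark results.

```python
def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Percentage gain of new_score over baseline_score."""
    if baseline_score == 0:
        raise ValueError("baseline_score must be non-zero")
    return (new_score - baseline_score) / baseline_score * 100.0


# Hypothetical scores, chosen only to illustrate the arithmetic:
# a 180% relative improvement means the new score is 2.8x the baseline.
example = relative_improvement(0.70, 0.25)
print(f"{example:.1f}%")
```

Under this reading, a 25.0% improvement corresponds to 1.25 times the baseline score, and a 180.0% improvement to 2.8 times.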
C Biomni prompt
System prompt for the generalist agent
Follow the plan step by step. After completing each step, update
the checklist by replacing the empty checkbox with a checkmark:
1. [✓] First step (completed)
2. [ ] Second step
3. [ ] Third step
At each turn, you should first provide your detailed thinking and
reasoning given the conversation history, along with the
updated plan (Always show the updated plan after each step so
the user can track progress).
After that, you have two options:
biomedical researcher. Thus, stay true and rigorous.
For the thinking process, put before the execute code block. Do
not use print statement in the execute code block for the
thinking process.
If you draw figures, make publication-ready and beautiful figures.
For R code, use the #!R marker at the beginning of your code block
to indicate it’s R code.
For Bash scripts and commands, use the #!BASH marker at the
beginning of your code block. This allows for both simple
commands and multi-line scripts with variables, conditionals,
loops, and other Bash features.
Environment Resources:
- Function Dictionary:
{function_intro}
---
{tool_desc}
---
{import_instruction}
{data_lake_content}
----
- Software Library:
{library_intro}
Each library is listed with its description to help you understand
its functionality.
----
{library_content_formatted}
----
- Note on using R packages and Bash scripts:
- R packages: Use subprocess.run(['Rscript', '-e', 'your R code
here']) in Python, or use the #!R marker in your execute
block.
- Bash scripts and commands: Use the #!BASH marker in your
execute block for both simple commands and complex shell
scripts with variables, loops, conditionals, etc.
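The #!R and #!BASH markers described in the system prompt imply a dispatch on the first line of each execute block. The following is a minimal sketch of such a dispatch, assuming that unmarked blocks are treated as Python. The function names `detect_language` and `build_command` are hypothetical and are not taken from the Biomni codebase.

```python
import sys


def detect_language(code: str):
    """Return (language, body) for an execute block.

    A block whose first line is '#!R' or '#!BASH' is routed to R or Bash,
    and the marker line is stripped. Anything else is treated as Python.
    """
    first_line, _, rest = code.lstrip().partition("\n")
    marker = first_line.strip().upper()
    if marker == "#!R":
        return "r", rest
    if marker == "#!BASH":
        return "bash", rest
    return "python", code


def build_command(code: str):
    """Build an argument list that could be passed to subprocess.run."""
    language, body = detect_language(code)
    if language == "r":
        return ["Rscript", "-e", body]
    if language == "bash":
        return ["bash", "-c", body]
    return [sys.executable, "-c", body]


print(build_command("#!BASH\necho hello"))
```

The sketch only builds the command. Running it, capturing output, and enforcing timeouts are left out on purpose.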
Below are the available resources. For each category, select items
that are directly or indirectly relevant to answering the
query.
Be generous in your selection - include resources that might be
useful for the task, even if they’re not explicitly mentioned
in the query.
It’s better to include slightly more resources than to miss
potentially useful ones.
AVAILABLE TOOLS:
{self._format_resources_for_prompt(resources.get('tools', []))}
For each category, respond with ONLY the indices of the relevant
items in the following format:
TOOLS: [list of indices]
DATA_LAKE: [list of indices]
LIBRARIES: [list of indices]
For example:
TOOLS: [0, 3, 5, 7, 9]
DATA_LAKE: [1, 2, 4]
LIBRARIES: [0, 2, 4, 5, 8]
IMPORTANT GUIDELINES:
1. Be generous but not excessive - aim to include all potentially
relevant resources
2. ALWAYS prioritize database tools for general queries - include
as many database tools as possible
3. Include all literature search tools
4. For wet lab sequence type of queries, ALWAYS include molecular
biology tools
5. For data lake items, include datasets that could provide useful
information
6. For libraries, include those that provide functions needed for
analysis
7. Don’t exclude resources just because they’re not explicitly
mentioned in the query
8. When in doubt about a database tool or molecular biology tool,
include it rather than exclude it
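The response format requested above (one bracketed index list per category) can be read with a small parser. This is a sketch under stated assumptions: the function name `parse_selection` is hypothetical, and treating a missing or malformed category as an empty list is an illustrative choice rather than documented Biomni behavior.

```python
import ast
import re

CATEGORIES = ("TOOLS", "DATA_LAKE", "LIBRARIES")


def parse_selection(response: str):
    """Extract the selected indices for each category from a model response."""
    selection = {name: [] for name in CATEGORIES}
    for name in CATEGORIES:
        # Look for a line such as "TOOLS: [0, 3, 5]".
        match = re.search(rf"^\s*{name}:\s*(\[[^\]]*\])", response, flags=re.MULTILINE)
        if match is None:
            continue
        try:
            values = ast.literal_eval(match.group(1))
        except (ValueError, SyntaxError):
            continue  # e.g. the literal placeholder "[list of indices]"
        # Keep only non-negative integers.
        selection[name] = [v for v in values if type(v) is int and v >= 0]
    return selection


reply = "TOOLS: [0, 3, 5, 7, 9]\nDATA_LAKE: [1, 2, 4]\nLIBRARIES: [0, 2, 4, 5, 8]"
print(parse_selection(reply))
```

Using `ast.literal_eval` rather than `eval` keeps the parser from executing arbitrary text returned by the model.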
STRICT GUIDELINES:
1. ONLY extract tasks that are extremely common and standard in
computational biomedical research
2. Each task MUST have clear, well-defined inputs and outputs
3. Tasks MUST be generalizable across many different papers and
research questions
4. Be VERY selective - only include tasks that appear in hundreds
of papers
5. If a task is specific to this paper, unclear, or not widely
used, DO NOT include it
6. Focus on computational tasks that can be implemented with
Python or Linux code
7. Each task should be something that could be implemented as a
function with clear inputs/outputs
8. Also identify commonly used databases and software packages
mentioned in the text
9. Tasks MUST be CONCRETE and SPECIFIC - include exact
methodological details
10. Avoid vague task names like "Statistical Analysis" - instead
use specific protocol names like "Two-way ANOVA with Tukey’s
Post-hoc Test using SciPy"
11. DO NOT include wet lab procedures that cannot be implemented
computationally
12. ONLY include tasks that could be automated with code
PAPER CHUNK:
{chunk_text}
## Analysis Overview:
This analysis examined temperature data from CGM devices to
identify how body temperature changes after food intake and to
analyze individual differences in temperature responses.
## Methodology:
1. Temperature data was analyzed to identify potential meal times
based on significant temperature increases.
2. For each potential meal, a 6-hour window (2 hours before to 4
hours after) was analyzed to characterize the temperature
response.
3. Temperature changes were normalized relative to the pre-meal
baseline to enable comparison across subjects.
## Key Findings:
- Average temperature increase after meals: 2.19 °C
- Median temperature increase: 1.10 °C
- Range of temperature increases: -0.11 °C to 15.56 °C
Individual Differences:
- Substantial variation was observed in temperature responses
across subjects.
- Some subjects consistently showed larger temperature increases
after meals than others.
- The timing of peak temperature also varied across subjects.
## Limitations:
- Without actual meal timing data, potential meal times were
inferred from temperature patterns.
- Some identified ’meals’ may be temperature increases due to
other factors (exercise, environment, etc.).
- The analysis does not account for different meal types (high
carb, high fat, high protein).
## Generated Visualizations:
1. Daily temperature analysis plots
2. Meal window plots
3. Box plot comparing temperature increases across subjects
4. Bar chart of average temperature increases by subject
5. Average temperature response pattern after meals
6. Individual differences in temperature response patterns
7. Heatmap of temperature changes by subject
8. Distribution of temperature increases
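The windowing and baseline normalization described in the Methodology of the report above can be sketched as follows. This is an illustration only and not the code the agent generated: the function name `meal_response`, the use of the pre-meal mean as the baseline, and the synthetic trace are all assumptions.

```python
from statistics import mean


def meal_response(times_h, temps_c, meal_time_h, pre_h=2.0, post_h=4.0):
    """Baseline-normalized temperature trace around one candidate meal.

    Returns (relative_time_h, delta_temp_c) pairs for samples within
    [meal - pre_h, meal + post_h]. The baseline is the mean temperature
    of the samples before the meal inside that window.
    """
    window = [
        (t, c)
        for t, c in zip(times_h, temps_c)
        if meal_time_h - pre_h <= t <= meal_time_h + post_h
    ]
    pre_meal = [c for t, c in window if t < meal_time_h]
    if not pre_meal:
        raise ValueError("no pre-meal samples available for the baseline")
    baseline = mean(pre_meal)
    return [(t - meal_time_h, c - baseline) for t, c in window]


# Synthetic trace: flat at 33.0, then a 1.0 degree step after a meal at t = 10 h.
times = [i * 0.5 for i in range(48)]
temps = [34.0 if t >= 10.0 else 33.0 for t in times]
response = meal_response(times, temps, meal_time_h=10.0)
peak_increase = max(delta for _, delta in response)
print(peak_increase)
```

Normalizing each window to its own pre-meal baseline is what makes responses comparable across subjects with different resting temperatures.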
Task: Here I Have some CGM omics from participants,can you help me
fully explore the omics data and generate some new insights"
Dataset: CGM omics data in the format of csv
Path: {data_path}
Objective: fully explore omics data and generate biological
insights
Expected Output: plot the graphs please and write a insightful
summary
Output Format:save figures in {save_path}
Agent Output for CGM Wearable Multi-Omics Data Analysis
## Dataset Overview
The analysis explored three distinct omics datasets from CGM
participants:
- **Lipidomics**: 652 features across 147 samples
- **Metabolomics**: 731 features across 147 samples
- **Proteomics**: Various proteins measured across multiple
samples
## Key Findings
Strong correlations were observed between features across
different omics layers, indicating coordinated regulation and
biological pathways that span multiple molecular types. These
relationships help understand the interplay between different
molecular classes in health and disease.
## Future Directions
1. **Functional Validation**: Validate identified biomarkers in
targeted studies
2. **Pathway Analysis**: Map features to specific biological
pathways
3. **Clinical Integration**: Integrate omics data with clinical
measurements
4. **Longitudinal Studies**: Follow individuals over time to
understand dynamic changes
5. **Machine Learning Models**: Develop predictive models for
early disease markers
Task: Here I have some sleep data from wearables from participants
,can you help me fully explore the omics data and generate some
new insights"
Dataset: sleep data from wearables in the format of csv
Path: {data_path}
Objective: delve deep into sleep data and generate biological
insights
Expected Output: plot the graphs please and write a insightful
summary
Output Format:save figures in {save_path}
## Overview
This analysis explored sleep data from wearable devices across 10
participants, covering 227 nights of sleep. The dataset
captured detailed sleep metrics including duration, efficiency,
sleep stages (light, deep, REM), and various other parameters.
Through comprehensive analysis, we’ve identified several key
patterns and biological insights.
## Key Findings
- **Weekday vs. weekend**: Surprisingly, no statistically
significant difference between weekday and weekend sleep
duration (p=0.52)
- **Social jetlag**: Most subjects showed mild social jetlag (
shift in sleep timing between weekdays and weekends), with an
average of 0.49 hours
- **Sleep timing consistency**: Subjects with more consistent
sleep onset times tended to have better sleep quality
- **Day-of-week patterns**: Sleep efficiency was highest on
Wednesdays and lowest on Sundays, suggesting potential end-of-
weekend disruption
## Biological Implications
5. **Cognitive Processing**: REM sleep (associated with cognitive
processing and memory consolidation) shows notable variation,
which may relate to differences in cognitive demands or
emotional processing needs.
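The social jetlag figure reported in this sleep analysis is described only as a shift in sleep timing between weekdays and weekends. One common way to quantify it is the absolute difference between mean mid-sleep time on weekend and weekday nights. The sketch below uses that definition as an assumption; the function names and the rule that Friday and Saturday nights count as weekend nights are hypothetical and may differ from what the agent computed.

```python
from datetime import datetime, timedelta
from statistics import mean


def midsleep_hour(onset: datetime, wake: datetime) -> float:
    """Clock hour of the sleep midpoint, counted from noon so that
    midpoints on either side of midnight stay on one continuous scale."""
    mid = onset + (wake - onset) / 2
    clock = mid.hour + mid.minute / 60 + mid.second / 3600
    return (clock - 12) % 24


def social_jetlag(nights) -> float:
    """Absolute difference (hours) between mean weekend and weekday mid-sleep.

    `nights` is an iterable of (onset, wake) datetime pairs.
    """
    weekday, weekend = [], []
    for onset, wake in nights:
        # Shift by 12 h so a post-midnight onset is assigned to the previous evening.
        night_day = (onset - timedelta(hours=12)).weekday()
        bucket = weekend if night_day in (4, 5) else weekday  # Fri, Sat nights
        bucket.append(midsleep_hour(onset, wake))
    if not weekday or not weekend:
        raise ValueError("need at least one weekday and one weekend night")
    return abs(mean(weekend) - mean(weekday))


nights = [
    (datetime(2024, 1, 1, 23, 0), datetime(2024, 1, 2, 7, 0)),   # Monday night
    (datetime(2024, 1, 5, 23, 30), datetime(2024, 1, 6, 8, 30)),  # Friday night
]
print(social_jetlag(nights))
```

In this toy input the weekday midpoint falls at 03:00 and the weekend midpoint at 04:00, so the metric is one hour.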
E Biomni generated report for the scRNA and scATAC case study
User Prompt for scRNA and scATAC Joint Data Analysis
DATA
snRNA seq : {PATH}/filtered_core_rna.h5ad
snATAC seq : {PATH}/filtered_core_atac.h5ad
Workdir : {PATH}/buffer_o3_run6/
pySCENIC database : {PATH}/pyscenic/
All intermediates & logs must be written here.
2 TF SEED LIST
2.1 Start with canonical skeletal TFs (SOX9, RUNX2, etc.).
2.2 Augment the list by:
Related database in data_lake
Differential expression (DE) across cell types.
Differential accessibility (DA) of promoter peaks. (Add
any TF with DE padj < 0.05 AND DA padj < 0.05.)
3 PER-CELL-TYPE GRN INFERENCE
3.0 Filter cell types: keep only those with ≥ 500 nuclei (or
≥ 50 pseudobulks).
3.1 For each qualifying cell type **ct**:
a) adata_ct = adata[adata.obs["cell_type"] == ct]
b) `print(f"[{ct}] starting GRNBoost2 ")`
c) Run GRNBoost2 → save
`{workdir}/grnboost2/raw/{ct}_adjacency.csv`
5 REGULON BUILD
5.1 Convert pruned edges to regulons (.loom + .csv).
5.2 Discard regulons with <20 target genes.
7 COMPARATIVE ANALYSIS
7.0 Concatenate all AUCell matrices.
7.1 Within each **ct**, test stage / region effects.
7.2 Across cell types, highlight TFs unique to a lineage.
8 VISUALISATION
8.1 UMAP colored by selected regulons.
8.2 Heatmap: regulon × cell type (row z-scored AUCell).
8.3 Line plots: stage trajectory of key TF activities.
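The thresholds in steps 2.2, 3.0, and 5.2 of this prompt amount to three simple filters. The sketch below expresses them on plain Python containers. It is a hedged illustration rather than the agent's pipeline: the function names and toy inputs (including `TF_A`, `TF_B`, and `RareType`) are hypothetical, and the 500-nuclei figure is read as a minimum.

```python
from collections import Counter

PADJ_CUTOFF = 0.05   # step 2.2: DE padj < 0.05 AND DA padj < 0.05
MIN_NUCLEI = 500     # step 3.0: read as a minimum (an assumption)
MIN_TARGETS = 20     # step 5.2: discard regulons with fewer than 20 targets


def augment_seed_tfs(seed_tfs, de_padj, da_padj, cutoff=PADJ_CUTOFF):
    """Add TFs that are significant in both DE and promoter DA tests."""
    significant = {
        tf
        for tf, padj in de_padj.items()
        if padj < cutoff and da_padj.get(tf, 1.0) < cutoff
    }
    return sorted(set(seed_tfs) | significant)


def qualifying_cell_types(labels, min_nuclei=MIN_NUCLEI):
    """Keep cell types with at least min_nuclei nuclei."""
    counts = Counter(labels)
    return sorted(ct for ct, n in counts.items() if n >= min_nuclei)


def filter_regulons(regulons, min_targets=MIN_TARGETS):
    """Drop regulons with fewer than min_targets distinct target genes."""
    return {
        tf: sorted(set(targets))
        for tf, targets in regulons.items()
        if len(set(targets)) >= min_targets
    }


seed = augment_seed_tfs(
    ["SOX9", "RUNX2"],
    de_padj={"TF_A": 0.01, "TF_B": 0.20},
    da_padj={"TF_A": 0.03, "TF_B": 0.001},
)
kept = qualifying_cell_types(["LimbMes"] * 600 + ["RareType"] * 40)
regulons = filter_regulons(
    {"SOX9": [f"gene{i}" for i in range(25)], "TF_A": ["gene1", "gene2"]}
)
print(seed, kept, list(regulons))
```

In a real run these filters would operate on AnnData objects and pySCENIC outputs. Plain dictionaries and lists are used here so the logic can be checked in isolation.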
## Analysis Overview
distinct cell types.
3. **Quality Control**:
- Edge retention rates were consistently ~10% across all cell
types
- All cell types had 566-589 regulons with at least 20 target
genes
- AUCell score distributions showed consistent patterns with
LimbMes exhibiting highest activity
4. **Biological Analysis**:
- Identified 458 differentially active regulons between early
and late developmental stages
- Found 4,177 differentially active regulons between anatomical
regions
- Mapped lineage-specific master regulators like SATB2 (cranial
mesenchyme), SOX5 (chondroprogenitors), and TWIST1 (
preosteoblasts)
- Discovered novel regulators including AUTS2, ZFHX3, and PBX1
with high activity across skeletal lineages
- Performed GO/KEGG enrichment for top regulons revealing
distinct biological processes for each cell type
## Key Discoveries
highlighting complex spatiotemporal control of skeletal
development.
F Biomni generated report for the cloning protocol case study
User prompt for cloning protocol design
2. **CRISPR Components**:
- U6 promoter driving sgRNA expression
- B2M sgRNA: 5’-CGCGAGCACAGCTAAGGCCA-3’
- sgRNA scaffold for Cas9 binding
- EF1a promoter driving Cas9 expression
- Cas9 coding sequence
3. **Selection Markers**:
- Blasticidin resistance gene for mammalian selection
- Ampicillin resistance gene for bacterial selection
## Cloning Strategy
The B2M sgRNA will be inserted between the U6 promoter and sgRNA
scaffold using BsmBI restriction sites, replacing the original
stuffer sequence.
- Nuclease-free water to 20 µL total
2. Incubate in a thermocycler:
- 37 °C for 5 minutes
- 16 °C for 5 minutes
- Repeat steps 1-2 for 10 cycles
- 55 °C for 5 minutes (final digestion)
- 80 °C for 5 minutes (enzyme inactivation)
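The cycling program above can be expanded into an explicit step list to check the total hold time. This sketch assumes that "Repeat steps 1-2 for 10 cycles" means the two 5-minute steps at 37 and 16 degrees are run ten times in total, and it ignores ramp time between temperatures. The variable and function names are hypothetical.

```python
# (temperature in degrees C, hold time in minutes)
CYCLED_STEPS = [(37, 5), (16, 5)]
N_CYCLES = 10
FINAL_STEPS = [(55, 5), (80, 5)]  # final digestion, enzyme inactivation


def expand_program(cycled=CYCLED_STEPS, n_cycles=N_CYCLES, final=FINAL_STEPS):
    """Return the full ordered list of (temperature, minutes) steps."""
    return list(cycled) * n_cycles + list(final)


program = expand_program()
total_minutes = sum(minutes for _, minutes in program)
print(f"{len(program)} steps, {total_minutes} minutes")
```

Under these assumptions the program has 22 holds and 110 minutes of hold time, so the run takes a little under two hours before ramping is counted.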
## Applications
- Generation of B2M knockout cell lines for immunology research
- Creating cells resistant to CD8+ T cell recognition
- Development of universal donor cells for cell therapy
- Control for CRISPR screens
Supplementary Figures