
LLM Research Papers Collection

This repository contains a curated list of research papers related to Large Language Models (LLMs) in scientific research, code generation, and idea evaluation.

1️⃣ Automated Scientific Discovery Systems (AI Scientist)

Systems that leverage LLMs to perform end-to-end scientific research, from idea generation to experimentation and writing.

End-to-End Scientific Discovery

  • The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (Sakana AI, 2024)

    • Description: Automates idea generation, coding, experimentation, plotting, paper writing, and reviewing.
    • Paper: arXiv
  • DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively (Weng et al., 2025)

    • Description: Formalizes discovery as a Bayesian optimization problem with a "hypothesize–verify–analyze" loop.
    • Repository: GitHub
    • Paper: arXiv
  • Towards an AI Co-Scientist (Google / Gemini 2.0, 2025)

    • Description: Multi-agent system for biomedical discovery with generate-debate-evolve structure.
    • Paper: arXiv
  • Coscientist: Autonomous Chemical Research with LLM Agents (Nature, 2023)

    • Description: Multi-agent system controlling a cloud lab for complex organic synthesis.
    • Paper: Nature
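DeepScientist's "hypothesize–verify–analyze" loop, framed as a Bayesian-optimization-style search, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the proposal and verification functions are placeholder stand-ins for LLM-driven hypothesis generation and real experiments.

```python
import random

def propose_hypothesis(history):
    """Placeholder acquisition step: exploit near the best-scoring
    hypothesis seen so far, or explore a random candidate."""
    if history and random.random() < 0.5:
        best_x, _ = max(history, key=lambda h: h[1])
        return best_x + random.uniform(-0.5, 0.5)
    return random.uniform(-5.0, 5.0)

def verify(x):
    """Placeholder experiment: a noisy objective standing in for
    running generated code. Peaks at x = 2."""
    return -(x - 2.0) ** 2 + random.gauss(0.0, 0.1)

def analyze(history):
    """Analysis step: return the best hypothesis found so far."""
    return max(history, key=lambda h: h[1])

random.seed(0)
history = []
for _ in range(50):
    x = propose_hypothesis(history)   # hypothesize
    score = verify(x)                 # verify
    history.append((x, score))
best_x, best_score = analyze(history) # analyze
print(f"best hypothesis x={best_x:.2f}, score={best_score:.2f}")
```

In the actual systems, the "objective" is an experiment result and the proposal step is an LLM conditioned on the accumulated history, but the control flow is this same loop.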

Multi-Agent Research Teams

  • Many Heads Are Better Than One (VirSci): Improved Scientific Idea Generation by A LLM-Based Multi-Agent System (ACL 2025)

    • Description: Virtual Scientists ecosystem with multiple specialist agents for idea generation.
    • Repository: GitHub
    • Paper: ACL Anthology
  • ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models (2024)

    • Description: An idea-generation agent plus multiple reviewer agents, aligned with human review standards.
    • Repository: GitHub
    • Paper: arXiv
  • ToolUniverse / From Models to Scientists (Gao et al., 2025)

    • Description: A framework connecting LLMs to 600+ scientific tools.
    • Website: Kempner Institute
  • Agent Laboratory: Using LLM Agents as Research Assistants

    • Description: A "research OS" for human-AI collaboration in research workflows.
    • Repository: GitHub
  • Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents (2025)

    • Description: Comprehensive survey of scientific agents.
    • Paper: arXiv

2️⃣ Paper-to-Code Reproduction Systems

Multi-agent systems for automatically reproducing research papers as executable code.

Multi-Agent Reproduction Systems

  • Paper2Code (PaperCoder): Automating Code Generation from Scientific Papers in Machine Learning (2025)

    • Description: Planning-Analysis-Generation multi-agent pipeline, evaluated on PaperBench.
    • Repository: GitHub
    • Paper: arXiv
  • ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies (2025)

    • Description: Multi-agent system translating ML methodologies into executable code; reports high-quality code for 46.9% of tasks, outperforming baselines in 25% of cases.
    • Paper: arXiv
  • AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage (2025)

    • Description: Uses paper lineage (citation tracking) + multi-agent framework for full experiment reproduction. 70%+ improvement over baselines.
    • Paper: arXiv
  • RePro: Reflective Paper-to-Code Reproduction Enabled by Fine-Grained Verification (2025)

    • Description: Fingerprint-based verification + iterative refinement loop. 13% improvement on PaperBench Code-Dev.
    • Paper: arXiv
  • SciReplicate-Bench + Sci-Reproducer: Dual-Agent Algorithmic Reproduction (2025)

    • Description: Paper Agent (reasoning graph) + Code Agent for NLP algorithm reproduction.
    • Paper: arXiv
  • Paper2Code (Autonomous-Scientific-Agents)

    • Description: CrewAI-based system for reproducing computational science papers.
    • Repository: GitHub

3️⃣ Reproduction Benchmarks

Benchmarks for evaluating AI agents' ability to reproduce research papers and experiments.

  • PaperBench: Evaluating AI's Ability to Replicate AI Research (OpenAI, 2025)

    • Description: Benchmark on reproducing 20 ICML 2024 Spotlight/Oral papers. Best agents achieve replication scores of roughly 20%.
    • Repository: GitHub
    • Paper: arXiv
  • CORE-Bench: Computational Reproducibility Agent Benchmark (2024)

    • Description: 90 papers, 270 tasks for computational reproducibility. Includes CORE-Agent baseline.
    • Paper: arXiv
  • LMR-BENCH: Evaluating LLM Agents' Ability on Reproducing Language Modeling Research (EMNLP 2025)

    • Description: 28 tasks from 23 language modeling papers.
    • Paper: arXiv
  • SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction (2025)

    • Description: 100 algorithmic reproduction tasks from 36 NLP papers. Best model ~39% execution accuracy.
    • Paper: arXiv
  • ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code

  • MLAgentBench (2023)

    • Description: Kaggle-style end-to-end experiment benchmark.
    • Paper: ar5iv

4️⃣ Scientific Claim Verification & Review

Tools for evaluating ideas, reviewing papers, and verifying scientific claims.

  • DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process (2025)

    • Description: Structured analysis + literature retrieval + evidence-based argumentation.
    • Repository: GitHub
    • Paper: arXiv
  • ScholarEval: Research Idea Evaluation Grounded in Literature (2025)

    • Description: Retrieval-based idea evaluation for soundness + contribution scoring.
    • Repository: GitHub
    • Paper: arXiv
  • LLM-based Corroborating and Refuting Evidence Retrieval (CREDIFY / CIBER)

    • Description: Multi-view evidence retrieval for claim verification.
    • Repository: GitHub
    • Paper: arXiv
  • Zero-shot Scientific Claim Verification Using LLMs and Citation Text

  • AI4Research: A Survey of Artificial Intelligence for Scientific Research


5️⃣ Research Directions

Combining Reproduction + Multi-Agent Research

  1. Reproduction Multi-Agent (Paper2Code / ResearchCodeAgent / AutoReproduce / RePro) → Solves "implementation / reproduction & reliability"

  2. Research Multi-Agent (VirSci / ResearchAgent / AI Co-Scientist / DeepScientist) → Solves "idea / hypothesis / experiment design"

Natural Research Topics

  • End-to-end system: VirSci-style idea generation → AutoReproduce/RePro-style code generation → automated evaluation
  • Systematic comparison on benchmarks: single agent vs multi-role vs reflective multi-agent
  • New benchmark: Can multi-agent systems "read paper → question it → reproduce/refute with experiments"?
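The first direction above (idea generation → code generation → automated evaluation) can be sketched as a three-role pipeline. The agent functions below are hypothetical stand-ins for the cited systems, not their actual interfaces:

```python
from dataclasses import dataclass, field

@dataclass
class ResearchArtifact:
    idea: str = ""
    code: str = ""
    score: float = 0.0
    notes: list = field(default_factory=list)

def idea_agent(topic):
    # Stand-in for a VirSci-style idea-generation agent.
    return f"Hypothesis: scaling {topic} improves sample efficiency"

def code_agent(idea):
    # Stand-in for an AutoReproduce/RePro-style paper-to-code agent.
    return f"# experiment implementing: {idea}\nresult = run_experiment()"

def evaluator_agent(artifact):
    # Stand-in for automated evaluation, e.g. a PaperBench-style rubric.
    artifact.score = 1.0 if "experiment" in artifact.code else 0.0
    artifact.notes.append("rubric: code implements the stated idea")
    return artifact

def pipeline(topic):
    art = ResearchArtifact()
    art.idea = idea_agent(topic)      # 1. generate idea
    art.code = code_agent(art.idea)   # 2. generate code
    return evaluator_agent(art)       # 3. evaluate

result = pipeline("retrieval-augmented agents")
print(result.score, result.idea)
```

The interesting research questions live inside each stand-in (how ideas are debated, how code is verified, how evaluation is grounded); the pipeline itself is the easy part.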

6️⃣ Resources

  • Awesome LLM Agents for Scientific Discovery: GitHub
