
The Last Human-Written Paper

Agent-Native Research Artifacts

Posted on April 24, 2026 by Amber Liu & the Orchestra Research team

In the near future, most CS papers will be written by AI, and most will be read by AI.

When neither the author nor the audience is human, the three-century-old paper format stops making sense. Papers flatten a branching research process into a clean story, and that flattening imposes two taxes.

The Storytelling Tax

Research is inherently branching and exploratory. Scientists try dozens of approaches, hit dead ends, pivot, and iterate. But papers collapse this rich process into a single winning narrative, discarding every failed attempt, rejected hypothesis, and negative result.

[Figure: The Real Research Process. An initial question branches into three hypotheses (A: CNN baseline, B: Transformer, C: GAN variant). The dead ends include OOM at batch 64, standard LayerNorm, mode collapse, exploding gradients, and loss diverging at 7.28. A pivot to computing inv-std outside the forward pass makes training stable (loss 4.60). Differential learning rates (embeddings 3e-4, transformer 3e-5) then bring the loss to 3.98, a 13% improvement. Five dead ends, one pivot, one success, and all of it gets thrown away.]

[Figure: What Gets Published. A straight line from the research question (“Can ReLU transformers match softmax?”) through Abstract (“We present a ReLU transformer...”), Introduction (“Prior work lacks efficiency...”), Method (“We compute inv-std outside fwd...”), and Experiments (“Differential LR outperforms...”) to Results (“Loss 3.98, +13% improvement”). A clean narrative: no dead ends, no failures, no tricks. Knowledge lost forever.]
Research explores many branches, but papers only report the winning path. The map of where not to go, often the most expensive knowledge a project produces, never leaves the lab.
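To make the flattening concrete, here is a minimal sketch that assumes a research trace is stored as a directed acyclic graph whose nodes are attempts labeled with an outcome. The node names loosely echo the figure above, but the trace is trimmed and its edges are assumed. The `Node` schema and the `published_path` and `dead_ends` helpers are hypothetical illustrations, not ARA's actual `graph.json` format. A paper keeps only the root-to-success path, and everything `dead_ends` returns is what gets discarded.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One attempt in a research trace (hypothetical schema, not ARA's format)."""
    name: str
    outcome: str  # "open", "dead_end", "pivot", or "success"
    children: list[str] = field(default_factory=list)


# Illustrative, trimmed trace: names echo the figure, edges are assumed.
TRACE = {
    "question": Node("question", "open", ["hyp_a_cnn", "hyp_b_transformer", "hyp_c_gan"]),
    "hyp_a_cnn": Node("hyp_a_cnn", "dead_end"),  # e.g. OOM at batch 64
    "hyp_c_gan": Node("hyp_c_gan", "dead_end"),  # e.g. mode collapse
    "hyp_b_transformer": Node("hyp_b_transformer", "open",
                              ["standard_layernorm", "inv_std_outside_fwd"]),
    "standard_layernorm": Node("standard_layernorm", "dead_end"),
    "inv_std_outside_fwd": Node("inv_std_outside_fwd", "pivot", ["differential_lr"]),
    "differential_lr": Node("differential_lr", "success"),
}


def published_path(trace, root):
    """Depth-first search for a root-to-success path: the only story a paper tells."""
    node = trace[root]
    if node.outcome == "success":
        return [root]
    for child in node.children:
        path = published_path(trace, child)
        if path:
            return [root] + path
    return []  # no success reachable from this node


def dead_ends(trace):
    """Every attempt the narrative discards, sorted by name."""
    return sorted(name for name, node in trace.items() if node.outcome == "dead_end")


print(" -> ".join(published_path(TRACE, "question")))
print(dead_ends(TRACE))
```

Keeping the whole graph rather than the single path is what lets a later reader, human or agent, skip the branches that already failed.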

The Engineering Tax

Papers describe methods at the precision needed to convince a reviewer, not at the precision needed to reproduce the work. Hyperparameters are underspecified. Warmup schedules live in someone's head. Numerical stability fixes exist in no document. The gap between "sufficient to believe" and "sufficient to execute" is where reproduction breaks down.

Reproduction Information Gap

8,921 expert-annotated reproduction requirements across 23 ICML papers (PaperBench)

Fully specified in PDF: 45.4%
Missing hyperparameters: 26.2%
Vague description: 21.9%
Cross-reference only: 13.4%
Missing code / baseline detail: 21.7%

A single requirement can fall into several gap categories, so the shares sum to more than 100%. Less than half of what an agent needs to reproduce a paper is actually in the PDF.

The information exists somewhere (a lab notebook, a Slack thread, the original author's muscle memory), but not in any document an AI agent can access. Every reproduction attempt pays the full cost of rediscovering it.
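One way to close the gap between "sufficient to believe" and "sufficient to execute" is to record each hyperparameter with the context a reproducer needs. The sketch below uses a hypothetical format, not ARA's actual config schema: each entry carries its value, the range that was searched, and a sensitivity note. The two learning rates come from the differential-LR example above. The search ranges, the warmup length, the sensitivity labels, and the linear warmup schedule are made-up placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparam:
    """A hyperparameter annotated for reproduction (hypothetical format)."""
    value: float
    search_range: tuple[float, float]  # (low, high) bounds that were explored
    sensitivity: str                   # free-text note on how much results move


# Placeholder config: LRs echo the figure; ranges, warmup, and notes are invented.
CONFIG = {
    "lr_embedding": Hyperparam(3e-4, (1e-5, 1e-3), "high"),
    "lr_transformer": Hyperparam(3e-5, (1e-6, 1e-4), "high"),
    "warmup_steps": Hyperparam(2000, (0, 5000), "low"),
}


def validate(config):
    """Return names of entries whose value falls outside the recorded search range."""
    return [
        name for name, hp in config.items()
        if not hp.search_range[0] <= hp.value <= hp.search_range[1]
    ]


def warmup_lr(base_lr, step, warmup_steps):
    """Linear warmup written down explicitly instead of living in someone's head."""
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, (step + 1) / warmup_steps)


print(validate(CONFIG))
print(warmup_lr(CONFIG["lr_embedding"].value, 0, int(CONFIG["warmup_steps"].value)))
```

The point of the design is that the range and sensitivity travel with the value, so an agent can tell which knobs are safe to move and which ones were tuned carefully.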

The Solution: Four Interlocking Layers

ARA (Agent-Native Research Artifact) restructures a paper into four machine-native layers. Together they form a single executable knowledge package: the organized, evolving knowledge produced during research, not the narrative compiled afterward.

PAPER.md                      # Human-readable overview & entry point
│
├── logic/                    # Cognitive Layer
│   ├── claims.yaml           # Falsifiable claims with epistemic status
│   ├── concepts/             # Formal concept definitions
│   ├── experiments/          # Declarative experiment plans
│   └── problem_spec.md       # The "what and why" of the research
│
├── src/                     # Physical Layer
│   ├── kernel/               # Novel algorithm core
│   ├── configs/              # Annotated with search ranges & sensitivity
│   └── environment.yaml      # Exact reproducibility spec
│
├── trace/                   # Exploration Graph
│   ├── graph.json            # Full branching research DAG
│   ├── dead_ends/            # Every failed attempt preserved
│   └── pivots/               # Decision points & lessons learned
│
└── evidence/                # Evidence Layer
    ├── results/              # Machine-readable quantitative outputs
    ├── logs/                 # Raw experiment logs
    └── curves/               # Training curves & metrics
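Because the layout is fixed, a package can be checked mechanically. Here is a minimal sketch that assumes an ARA package is simply a directory that must contain the paths in the tree above. `REQUIRED` mirrors that tree. `scaffold` and `missing_paths` are hypothetical helpers for illustration, not part of any released ARA tooling.

```python
import tempfile
from pathlib import Path

# Paths taken from the layout above; entries ending in "/" are directories.
REQUIRED = [
    "PAPER.md",
    "logic/claims.yaml", "logic/concepts/", "logic/experiments/", "logic/problem_spec.md",
    "src/kernel/", "src/configs/", "src/environment.yaml",
    "trace/graph.json", "trace/dead_ends/", "trace/pivots/",
    "evidence/results/", "evidence/logs/", "evidence/curves/",
]


def scaffold(root: Path) -> None:
    """Create empty placeholders for every required file and directory."""
    for rel in REQUIRED:
        path = root / rel
        if rel.endswith("/"):
            path.mkdir(parents=True, exist_ok=True)
        else:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()


def missing_paths(root: Path) -> list[str]:
    """List required entries that are absent or of the wrong kind (file vs. directory)."""
    missing = []
    for rel in REQUIRED:
        path = root / rel
        ok = path.is_dir() if rel.endswith("/") else path.is_file()
        if not ok:
            missing.append(rel)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    scaffold(root)
    (root / "trace" / "graph.json").unlink()  # simulate an incomplete package
    print(missing_paths(root))                # -> ['trace/graph.json']
```

A check like this only confirms structure. Whether `claims.yaml` or `graph.json` contain anything meaningful would need schema-level validation on top.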

Live Research Manager

ARA doesn't require researchers to manually package their work. The Live Research Manager silently captures the research trajectory during AI-human collaboration: no interruptions, no extra effort. The artifact builds itself in the background.

Collaborate with AI on research, and the trajectory is automatically captured with epistemic provenance: every claim tagged with who proposed it, who verified it, and how strongly the evidence supports it.

Silent integration · Epistemic objectivity · Framework independence · Comprehensive capture · Faithful translation
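To illustrate what epistemic provenance could look like as data, here is a minimal sketch in which each claim records who proposed it, who verified it, and how strongly its evidence supports it. The field names, the three-level `Support` scale, and the evidence path are assumptions made for illustration. The actual `claims.yaml` schema is not described here. The claim text reuses the differential-LR result from the figure earlier in the post.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Support(str, Enum):
    """Hypothetical evidence-strength scale."""
    CONJECTURE = "conjecture"    # proposed, not yet tested
    PRELIMINARY = "preliminary"  # backed by a single run
    ESTABLISHED = "established"  # replicated across runs or seeds


@dataclass
class Claim:
    id: str
    statement: str
    proposed_by: str            # e.g. "human" or "agent"
    verified_by: Optional[str]  # None until someone checks the claim
    support: Support
    evidence: list[str]         # hypothetical paths into evidence/

    def is_verified(self) -> bool:
        """A claim counts as verified only if a verifier and evidence are both recorded."""
        return self.verified_by is not None and bool(self.evidence)


claim = Claim(
    id="C1",
    statement="Differential LR (emb 3e-4, tfm 3e-5) reaches loss 3.98.",
    proposed_by="agent",
    verified_by="human",
    support=Support.PRELIMINARY,
    evidence=["evidence/results/differential_lr.json"],  # hypothetical file
)
print(json.dumps(asdict(claim), indent=2))
```

Subclassing `str` for the enum keeps the serialized record plain JSON, which matters if the same records are later written out as YAML or read by another agent.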

Results

A paper shows the path taken. ARA remembers the paths abandoned, and the choices that made the road. We evaluate at three levels: understanding, reproduction, and extension.

Understanding: +21.3pp
93.7% vs 72.4% across 450 questions (PaperBench + RE-Bench) · wins every category

Reproduction: +7.0pp
64.4% vs 57.4% across 150 subtasks from 15 PaperBench papers · the advantage grows with difficulty

Extension: 3/5
ARA wins on best score in 3 of 5 tasks and reaches a useful first move earlier on all 5 · 5 RE-Bench tasks · MALT failure traces
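The headline gains are percentage-point (pp) differences, the absolute gap between two success rates. As a quick check, this snippet recomputes them and also shows the corresponding relative gains. The relative figures are our own arithmetic on the numbers above, not results reported in the evaluation.

```python
def gains(ara_pct: float, baseline_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    pp = ara_pct - baseline_pct
    relative = 100.0 * pp / baseline_pct
    return round(pp, 1), round(relative, 1)


print(gains(93.7, 72.4))  # understanding -> (21.3, 29.4)
print(gains(64.4, 57.4))  # reproduction  -> (7.0, 12.2)
```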
Knowledge over Narrative

The organized, evolving knowledge produced during research is the primary scientific object. The narrative paper is a compiled view.


Cite

@article{liu2026ara,
  title   = {The Last Human-Written Paper: Agent-Native Research Artifacts},
  author  = {Liu, Jiachen and Pei, Jiaxin and Huang, Jintao and Si, Chenglei and others},
  year    = {2026},
  journal = {arXiv preprint arXiv:2604.24658},
  url     = {https://arxiv.org/abs/2604.24658}
}