Autoresearch is Andrej Karpathy’s framework for autonomous AI-driven ML research. The idea is simple: point an AI agent (Claude Code) at a small but real LLM training setup and let it experiment autonomously overnight. The agent modifies the model code, trains for 5 minutes, checks whether the result improved, keeps or discards the change, and repeats, running ~12 experiments per hour and ~100 overnight.

This guide walks you through setting up autoresearch on a Vast.ai GPU instance with Claude Code as the autonomous research agent.
Autoresearch requires a single NVIDIA GPU with 80GB VRAM (H100 or A100 80GB). It needs CUDA 12.8+ and about 50GB of disk for the repo, data, and dependencies.
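Before installing anything you can sanity-check the instance against these requirements. A minimal sketch (the 80GB VRAM and 50GB disk thresholds come from the requirements above; the commands assume a standard Linux image with NVIDIA drivers):

```shell
# Report the GPU model and total VRAM, if NVIDIA drivers are present
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  echo "nvidia-smi not found - are the NVIDIA drivers installed?"
fi

# Report free disk space in GB on the current volume (about 50GB is needed)
df -Pk . | awk 'NR==2 {printf "%.0f GB free\n", $4/1024/1024}'
```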
Use Template
Manual Setup
Use the Autoresearcher template to launch a pre-configured instance with uv, Claude Code, and autoresearch already installed.
Learn more about templates
Templates are reusable configurations that bundle a Docker image, environment variables, and startup scripts into a one-click launch.
The template installs everything on first boot (~10 minutes). You can monitor progress with tail -f /var/log/provisioning.log.
The template automatically configures Claude Code permissions (Read, Edit, Write, Bash) in .claude/settings.json so it can run experiments without prompting; no manual setup is needed.

Once provisioning completes, skip ahead to Launch Autonomous Research.
Vast instances start in a tmux session by default. This keeps your processes running if your SSH connection drops — essential for overnight research runs.
Claude Code normally asks for permission before running commands or editing files. For autonomous overnight research, you need to pre-approve the tools Claude will use. Create a settings file in the autoresearch directory:
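A minimal sketch of that settings file, assuming the standard Claude Code permissions schema (an `allow` list under `permissions`):

```shell
# Pre-approve the four tools the research loop uses, so Claude Code
# never pauses for confirmation during an overnight run
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "permissions": {
    "allow": ["Read", "Edit", "Write", "Bash"]
  }
}
EOF
```

Note that allowing Bash wholesale lets Claude run arbitrary shell commands unattended, which is acceptable on a disposable research instance but not something to copy to a machine you care about.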
When Claude Code starts, log in to your Anthropic account:
/login
This will give you a URL to open in your browser. Follow the prompts to authenticate, then you’re ready to go.

Kick off the research loop:
Hi have a look at program.md and let's kick off a new experiment! let's do the setup first.
Claude will:
Read program.md for the research guidelines
Create a fresh git branch (e.g. autoresearch/mar10)
Run the baseline experiment
Begin the autonomous loop — modifying train.py, training for 5 minutes, evaluating, keeping improvements, discarding regressions
Log all results to results.tsv
Claude runs indefinitely until manually stopped. Each experiment takes ~5 minutes, so you can expect ~12 experiments/hour and ~100 experiments overnight. Each iteration also uses Claude API tokens.
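The keep-or-discard decision at the heart of that loop can be sketched in a few lines of shell. The two-column results.tsv layout and the loss values here are illustrative stand-ins, not the actual file format:

```shell
# Dummy baseline row standing in for the last line of results.tsv (layout is assumed)
printf 'baseline\t3.21\n' > results.tsv
new_loss=3.05                                  # pretend this experiment's validation loss

baseline=$(tail -n 1 results.tsv | cut -f2)    # previous best
if awk -v a="$new_loss" -v b="$baseline" 'BEGIN{exit !(a < b)}'; then
  echo "keep: $new_loss improves on $baseline"        # commit the change
else
  echo "discard: $new_loss regresses from $baseline"  # revert train.py
fi
```

In practice Claude makes this judgment itself from the training logs, but the shape of each iteration is exactly this: run, compare to the best recorded result, commit or revert.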
Claude has full freedom to edit train.py — the model architecture, optimizer, hyperparameters, batch size, model size, training loop. The only constraints are:
prepare.py is read-only — the evaluation harness and data loading are fixed
No new packages — only dependencies in pyproject.toml
5-minute time budget — every experiment runs for exactly 5 minutes
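The time budget can be enforced mechanically with coreutils `timeout`, which exits with status 124 when the deadline fires. Here a 5-second `sleep` stands in for the training run so the cutoff is visible in one second:

```shell
# A 1-second budget against a 5-second job, standing in for the 5-minute training cap
code=0
timeout 1 sleep 5 || code=$?
echo "exit code: $code"   # 124 means the run was cut off at the budget
```

For a real run the same wrapper would be something like `timeout 300 uv run train.py` (the `uv run` invocation is an assumption based on the repo using uv), giving the agent a hard 5-minute cap regardless of what it changed in the training loop.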
When you’re done, download your results and destroy the instance:
# From your local machine: copy results
scp -P PORT root@HOST_IP:/workspace/autoresearch/results.tsv ./results.tsv

# Destroy the instance
vastai destroy instance INSTANCE_ID
Destroying an instance permanently deletes all data on it. Make sure to copy any results you want to keep before destroying.