Introduction

Autoresearch is Andrej Karpathy’s framework for autonomous AI-driven ML research. The idea is simple: point an AI agent (Claude Code) at a small but real LLM training setup and let it experiment autonomously overnight. The agent modifies the model code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats — running ~12 experiments per hour, ~100 overnight. This guide walks you through setting up autoresearch on a Vast.ai GPU instance with Claude Code as the autonomous research agent.

Prerequisites

Install the Vast CLI if you haven’t already:
pip install vastai
vastai set api-key YOUR_API_KEY

Rent a GPU Instance and Set Up

Autoresearch requires a single NVIDIA GPU with 80GB VRAM (H100 or A100 80GB). It needs CUDA 12.8+ and about 50GB of disk for the repo, data, and dependencies.
Use the Autoresearcher template to launch a pre-configured instance with uv, Claude Code, and autoresearch already installed.

Learn more about templates

Templates are reusable configurations that bundle a Docker image, environment variables, and startup scripts into a one-click launch.
Search for available instances:
vastai search offers 'gpu_ram>=70 num_gpus=1 cuda_vers>=12.8 disk_space>=50 reliability>0.95' -o 'dph+'
Pick an instance ID from the results and rent it using the template:
vastai create instance INSTANCE_ID \
  --template_hash 934769670bfd9bc5e05d8696ef340c2b \
  --disk 50
Wait for the instance to be ready, then SSH in:
vastai show instances
ssh -p PORT root@HOST_IP
The template installs everything on first boot (~10 minutes). You can monitor progress with:
tail -f /var/log/provisioning.log
The template automatically configures Claude Code permissions (Read, Edit, Write, Bash) in .claude/settings.json so it can run experiments without prompting — no manual setup needed. Once provisioning completes, skip ahead to Launch Autonomous Research.

Launch Autonomous Research

Start Claude Code

cd /workspace/autoresearch
claude
When Claude Code starts, log in to your Anthropic account:
/login
This will give you a URL to open in your browser. Follow the prompts to authenticate, then you’re ready to go. Kick off the research loop:
Hi have a look at program.md and let's kick off a new experiment! let's do the setup first.
Claude will:
  1. Read program.md for the research guidelines
  2. Create a fresh git branch (e.g. autoresearch/mar10)
  3. Run the baseline experiment
  4. Begin the autonomous loop — modifying train.py, training for 5 minutes, evaluating, keeping improvements, discarding regressions
  5. Log all results to results.tsv
Claude runs indefinitely until manually stopped. Each experiment takes ~5 minutes, so expect ~12 experiments per hour and ~100 overnight. Each iteration also consumes Claude API tokens, so keep an eye on usage during long runs.
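The keep-or-discard step at the heart of the loop can be sketched in shell (illustrative only; the loss values and the git commands in the comments are stand-ins, not Claude's literal actions):

```shell
# Illustrative keep-or-discard decision; values are stand-ins.
BASELINE=3.20   # best validation loss so far
NEW_LOSS=3.15   # result of the experiment that just finished

# awk handles the floating-point comparison portably
if awk -v new="$NEW_LOSS" -v best="$BASELINE" 'BEGIN { exit !(new < best) }'; then
  DECISION="keep"      # e.g. git commit -am "improved val loss to $NEW_LOSS"
  BASELINE=$NEW_LOSS   # the new result becomes the baseline to beat
else
  DECISION="discard"   # e.g. git checkout -- train.py
fi
echo "$DECISION"
```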

What Claude can modify

Claude has full freedom to edit train.py — the model architecture, optimizer, hyperparameters, batch size, model size, training loop. The only constraints are:
  • prepare.py is read-only — the evaluation harness and data loading are fixed
  • No new packages — only dependencies in pyproject.toml
  • 5-minute time budget — every experiment runs for exactly 5 minutes
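A fixed wall-clock budget like this is the sort of thing coreutils timeout can enforce. A minimal sketch, with a 1-second budget standing in for the real 300 seconds and sleep standing in for training (the actual enforcement mechanism lives in the autoresearch harness and may differ):

```shell
# Sketch: cap a command's wall-clock time with coreutils timeout.
# autoresearch's budget is 300s; 1s here just keeps the demo fast.
timeout 1 sleep 10
STATUS=$?
echo "exit status: $STATUS"   # 124 means the time budget expired
```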

Monitoring progress

In another tmux pane (Ctrl+b then %), you can watch the experiment log:
watch -n 30 cat /workspace/autoresearch/results.tsv
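Raw tab-separated output can be hard to scan; column aligns it into readable columns. Shown here on a made-up sample log, since the actual column names in results.tsv may differ:

```shell
# Align tab-separated fields into readable columns (sample data is invented).
printf 'experiment\tval_loss\n001-baseline\t3.28\n002-lr-tweak\t3.21\n' > /tmp/sample_results.tsv
column -t -s"$(printf '\t')" /tmp/sample_results.tsv
```

On the instance, point the same invocation at /workspace/autoresearch/results.tsv instead of the sample file.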
Or check the git log to see what Claude has tried:
cd /workspace/autoresearch
git log --oneline -20

Cleanup

When you’re done, download your results and destroy the instance:
# From your local machine — copy results
scp -P PORT root@HOST_IP:/workspace/autoresearch/results.tsv ./results.tsv

# Destroy the instance
vastai destroy instance INSTANCE_ID
Destroying an instance permanently deletes all data on it. Make sure to copy any results you want to keep before destroying.

Additional Resources