In-Context Reinforcement Learning for Language Models

In-context reinforcement learning improves language models in real time by putting the agent's most useful past actions into context for the next task.


What is ICRL

Reinforcement learning requires retraining the model with every new experience, an expensive process that doesn't work for closed-source frontier models and takes weeks to complete. In-Context Reinforcement Learning (ICRL) lets LLM agents improve continuously without any post-training at all. When an agent successfully completes a task, ICRL stores that trajectory; the next time a similar task comes up, the agent retrieves the most relevant past steps and uses them as in-context examples to guide its decisions.
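The store-and-retrieve loop above can be sketched in a few lines. This is an illustrative toy, not the package's actual API: `TrajectoryStore`, `embed`, and the bag-of-words similarity are stand-ins (a real system would use a learned embedding model and a persistent database).

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real system would use a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TrajectoryStore:
    """Stores successful task trajectories and retrieves the most similar ones."""

    def __init__(self):
        self.entries = []  # list of (task_embedding, trajectory) pairs

    def add(self, task, trajectory):
        self.entries.append((embed(task), trajectory))

    def retrieve(self, task, k=2):
        q = embed(task)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [traj for _, traj in ranked[:k]]

store = TrajectoryStore()
store.add("fix failing unit test in parser", ["ran pytest", "patched tokenizer", "tests green"])
store.add("add logging to http client", ["opened client.py", "added logger", "verified output"])

# Retrieved trajectories would be prepended to the prompt for the next similar task.
examples = store.retrieve("fix broken test in lexer", k=1)
```

In a real deployment the retrieved trajectories are serialized into the system prompt, so the model conditions on concrete examples of what worked before.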

When the attention mechanism attends to repeated in-context examples, it forms what is functionally a LoRA-style update on top of the base model, equivalent to a small fine-tune on the in-context data [1]. We see improvements across a wide range of tasks, from coding to support triage to RLHF-style tasks [2].

ICRL works with any LLM provider, including closed-source ones. It ships as an npm and a pip package, so you can apply our research right away.

ICRL vs traditional RL

ICRL delivers improvements comparable to traditional RL at a fraction of the cost, works on closed-source models, and takes effect immediately.

| | ICRL | Traditional RL |
|---|---|---|
| Computational cost | Minimal: only storage and retrieval, no training | High: full training runs, GPUs, evals |
| Model compatibility | All models, including closed-source ones | Open-weight models only |
| When it improves | Immediately, on the next similar task | After a full retraining cycle |
| What changes | In-context examples (retrieval memory) | Model weights / policy parameters |
| Infrastructure | Lightweight trajectory DB + retrieval | GPU training pipelines + evals + rollouts |
| Feedback latency | Instant: same session | Batch-delayed: hours to days |
| Interpretability | Explicit: retrievable examples show what worked | Implicit: behavior encoded in weights |

Codebase-specific agents

ICRL lets you specialize an agent to a specific codebase. The ICRL CLI launches a terminal UI, similar to Claude Code or Codex, that runs an interactive coding assistant in your shell. Unlike static assistants, it gets better over time as successful trajectories are stored and retrieved for future tasks.

uv run icrl chat

The more you use it, the better it gets at understanding your project's patterns, conventions, and architecture, like a custom model fine-tuned to your codebase.

ICRLHF — learning from any feedback signal

ICRL can be used with any feedback signal, not just task success. With ICRLHF, agents learn from human preferences in real time, reacting to the first thumbs-up or thumbs-down signal they receive.

- Human preference: the user picks the best of N candidate outputs
- Task success / failure: tests pass, the build succeeds, the goal is reached
- Code review signals: PR approved, changes requested, review comments
- Conversational corrections: the user edits, rephrases, or overrides the output
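One way to unify these heterogeneous signals is to map each feedback event to a scalar reward and store only trajectories with clearly positive aggregate feedback. This is an illustrative sketch: the event names, reward values, and threshold below are hypothetical, not part of the ICRL package.

```python
# Hypothetical mapping from feedback events to scalar rewards. The names and
# values here are illustrative; a real system would calibrate them per signal.
REWARDS = {
    "thumbs_up": 1.0,
    "thumbs_down": -1.0,
    "tests_passed": 1.0,
    "build_failed": -1.0,
    "pr_approved": 1.0,
    "changes_requested": -0.5,
    "user_rewrote_output": -0.5,
}

def score(events):
    """Aggregate the feedback events for one trajectory into a single reward."""
    return sum(REWARDS.get(event, 0.0) for event in events)

def keep_for_retrieval(events, threshold=0.5):
    """Only trajectories with clearly positive feedback become in-context examples."""
    return score(events) >= threshold
```

With a filter like this, a trajectory that passed its tests and had its PR approved is stored, while one the user overrode never contaminates future retrievals.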