Core Loop

ICRL improves an agent by turning successful episodes into reusable in-context examples. The high-level cycle:
  1. Attempt task in environment.
  2. Store successful trajectory.
  3. Retrieve similar prior steps on next tasks.
  4. Reuse those examples in prompts.
  5. Update utility scores from feedback and curate away low-value trajectories.
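The cycle above can be sketched as a minimal in-memory trajectory store. Everything here is illustrative: the `TrajectoryStore` class, token-overlap relevance, and the utility field are assumptions standing in for the real database, embedding-based retrieval, and scoring.

```python
from dataclasses import dataclass, field

@dataclass
class TrajectoryStore:
    """Hypothetical stand-in for the trajectory database (an assumption)."""
    trajectories: list = field(default_factory=list)

    def store(self, goal, steps):
        # Step 2: keep successful trajectories for later reuse.
        self.trajectories.append({"goal": goal, "steps": steps, "utility": 0.0})

    def retrieve(self, goal, k=3):
        # Step 3: toy lexical relevance; a real system would use embeddings.
        def score(t):
            return len(set(goal.split()) & set(t["goal"].split()))
        return sorted(self.trajectories, key=score, reverse=True)[:k]

    def curate(self, min_utility=0.0):
        # Step 5: drop trajectories whose utility has fallen below threshold.
        self.trajectories = [t for t in self.trajectories
                             if t["utility"] >= min_utility]
```

Retrieved trajectories (step 4) are then formatted into the prompt as few-shot examples before the next attempt.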

Training vs Inference

  • train(...): successful runs can be stored (optionally gated by verification).
  • run(...): database is read-only for that episode.
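A minimal sketch of the mode split, assuming a hypothetical `Store` class and an `attempt` callable; only the write-gating logic reflects the description above, the rest is illustrative scaffolding.

```python
class Store:
    """Minimal stand-in for the trajectory database (an assumption)."""
    def __init__(self):
        self.items = []

    def retrieve(self, goal):
        return [traj for g, traj in self.items if g == goal]

    def store(self, goal, trajectory):
        self.items.append((goal, trajectory))


def train(attempt, goal, store, verify=None):
    """Training mode: a successful run may be written back, optionally
    gated by a verification callback."""
    trajectory, success = attempt(goal, store.retrieve(goal))
    if success and (verify is None or verify(trajectory)):
        store.store(goal, trajectory)
    return success


def run(attempt, goal, store):
    """Inference mode: retrieval only; the store is never written
    during the episode."""
    trajectory, success = attempt(goal, store.retrieve(goal))
    return success
```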

Retrieval Granularity

Both implementations retrieve at two points in the ReAct loop:
  • Plan retrieval: retrieve_for_plan(goal)
  • Step retrieval: retrieve_for_step(goal, plan, observation)
That keeps examples relevant to the exact observation, not only the goal text.
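The two retrieval calls can be sketched as follows. The function names match the ones above; the flat-list database, token-overlap scoring, and example fields are assumptions in place of the real index.

```python
def _overlap(query, text):
    # Toy lexical relevance; real systems would use embeddings (an assumption).
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve_for_plan(goal, db, k=2):
    """Rank stored examples by similarity to the goal text alone."""
    return sorted(db, key=lambda ex: _overlap(goal, ex["goal"]),
                  reverse=True)[:k]

def retrieve_for_step(goal, plan, observation, db, k=2):
    """Rank stored examples against the current observation too,
    so step-level context matters, not only the goal."""
    query = " ".join([goal, plan, observation])
    return sorted(db,
                  key=lambda ex: _overlap(query,
                                          ex["goal"] + " " + ex["observation"]),
                  reverse=True)[:k]
```

Including the observation in the step-level query is what lets an example match the agent's current state even when its goal text differs.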

Curation Signals

Utility is more than "did retrieval lead to success." The Python implementation also tracks deferred-validation metadata (for code-change persistence and supersession), then combines these signals into a single utility score.
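One way such signals could be combined is a weighted score like the sketch below. The signal names, weights, and penalty are illustrative assumptions, not the actual scoring scheme.

```python
def utility(used_count, success_after_use, still_valid, superseded,
            w_success=1.0, w_valid=0.5, penalty_superseded=1.0):
    """Combine retrieval outcomes with deferred-validation metadata.

    used_count / success_after_use: how often retrieval led to success.
    still_valid: the referenced code change still persists (assumption).
    superseded: a newer trajectory replaces this one (assumption).
    """
    success_rate = success_after_use / used_count if used_count else 0.0
    score = w_success * success_rate + w_valid * (1.0 if still_valid else 0.0)
    if superseded:
        score -= penalty_superseded  # push superseded entries toward curation
    return score
```

A curation pass would then drop entries whose score falls below a threshold.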

Why This Works

  • Relevant examples reduce planning errors.
  • Step-level examples improve local decisions.
  • Curation limits drift from stale or low-signal trajectories.
  • Over time, average trajectory quality improves without hand-written few-shot sets.