Core Loop

ICRL improves an agent by turning successful episodes into reusable in-context examples. The high-level cycle:
  1. Attempt task in environment.
  2. Store successful trajectory.
  3. Retrieve similar prior steps on next tasks.
  4. Reuse those examples in prompts.
  5. Update utility scores from feedback and curate away low-value trajectories.
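The cycle above can be sketched as a minimal in-memory trajectory store. Everything here is illustrative: the `TrajectoryStore` class, token-overlap relevance, and the utility field are assumptions standing in for the real database, embedding-based retrieval, and scoring.

```python
from dataclasses import dataclass, field

@dataclass
class TrajectoryStore:
    """Hypothetical stand-in for the trajectory database (an assumption)."""
    trajectories: list = field(default_factory=list)

    def store(self, goal, steps):
        # Step 2: keep successful trajectories for later reuse.
        self.trajectories.append({"goal": goal, "steps": steps, "utility": 0.0})

    def retrieve(self, goal, k=3):
        # Step 3: toy lexical relevance; a real system would use embeddings.
        def score(t):
            return len(set(goal.split()) & set(t["goal"].split()))
        return sorted(self.trajectories, key=score, reverse=True)[:k]

    def curate(self, min_utility=0.0):
        # Step 5: drop trajectories whose utility has fallen below threshold.
        self.trajectories = [t for t in self.trajectories
                             if t["utility"] >= min_utility]
```

Retrieved trajectories (step 4) are then formatted into the prompt as few-shot examples before the next attempt.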

Training vs Inference

  • train(...): successful runs can be stored (optionally gated by verification).
  • run(...): database is read-only for that episode.
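A minimal sketch of the mode split, assuming a hypothetical `Store` class and an `attempt` callable; only the write-gating logic reflects the description above, the rest is illustrative scaffolding.

```python
class Store:
    """Minimal stand-in for the trajectory database (an assumption)."""
    def __init__(self):
        self.items = []

    def retrieve(self, goal):
        return [traj for g, traj in self.items if g == goal]

    def store(self, goal, trajectory):
        self.items.append((goal, trajectory))


def train(attempt, goal, store, verify=None):
    """Training mode: a successful run may be written back, optionally
    gated by a verification callback."""
    trajectory, success = attempt(goal, store.retrieve(goal))
    if success and (verify is None or verify(trajectory)):
        store.store(goal, trajectory)
    return success


def run(attempt, goal, store):
    """Inference mode: retrieval only; the store is never written
    during the episode."""
    trajectory, success = attempt(goal, store.retrieve(goal))
    return success
```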

Retrieval Granularity

Both implementations retrieve at two points in the ReAct loop:
  • Plan retrieval: retrieve_for_plan(goal)
  • Step retrieval: retrieve_for_step(goal, plan, observation)
That keeps examples relevant to the exact observation, not only the goal text.
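The two retrieval calls can be sketched as follows. The function names match the ones above; the flat-list database, token-overlap scoring, and example fields are assumptions in place of the real index.

```python
def _overlap(query, text):
    # Toy lexical relevance; real systems would use embeddings (an assumption).
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve_for_plan(goal, db, k=2):
    """Rank stored examples by similarity to the goal text alone."""
    return sorted(db, key=lambda ex: _overlap(goal, ex["goal"]),
                  reverse=True)[:k]

def retrieve_for_step(goal, plan, observation, db, k=2):
    """Rank stored examples against the current observation too,
    so step-level context matters, not only the goal."""
    query = " ".join([goal, plan, observation])
    return sorted(db,
                  key=lambda ex: _overlap(query,
                                          ex["goal"] + " " + ex["observation"]),
                  reverse=True)[:k]
```

Including the observation in the step-level query is what lets an example match the agent's current state even when its goal text differs.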

Curation Signals

Utility is more than "did retrieval lead to success." The Python implementation also tracks deferred-validation metadata (for code-change persistence and supersession), then combines these signals into a single utility score.
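One way such signals could be combined is a weighted score like the sketch below. The signal names, weights, and penalty are illustrative assumptions, not the actual scoring scheme.

```python
def utility(used_count, success_after_use, still_valid, superseded,
            w_success=1.0, w_valid=0.5, penalty_superseded=1.0):
    """Combine retrieval outcomes with deferred-validation metadata.

    used_count / success_after_use: how often retrieval led to success.
    still_valid: the referenced code change still persists (assumption).
    superseded: a newer trajectory replaces this one (assumption).
    """
    success_rate = success_after_use / used_count if used_count else 0.0
    score = w_success * success_rate + w_valid * (1.0 if still_valid else 0.0)
    if superseded:
        score -= penalty_superseded  # push superseded entries toward curation
    return score
```

A curation pass would then drop entries whose score falls below a threshold.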

Why This Works

  • Relevant examples reduce planning errors.
  • Step-level examples improve local decisions.
  • Curation limits drift from stale or low-signal trajectories.
  • Over time, average trajectory quality improves without hand-written few-shot sets.