// the problem

OODA makes an agent faster, not smarter

Boyd's loop was built for fighter pilots. Observe, Orient, Decide, Act, repeat: whoever cycles faster controls the engagement. It is the right model for a human, because a human learns between cycles without thinking about it.

An agent does not. Run OODA on an agent and it makes faster decisions, not better ones. It repeats the same mistake at machine speed, because nothing in the loop carries the lesson forward to the next run.

// how it solves it

Add the second A: Adjust

After acting, score what happened and update how you decide next time. That one phase is the whole idea. OODA gets faster; OODAA gets smarter. The learning is a single line that folds each outcome back into memory, so the next time a situation shows up, the agent answers from memory instead of guessing.
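As a rough illustration of that single line, the sketch below keeps one running value per situation-action pair and nudges it toward each new score. The helper name `adjust`, the `rate` parameter, and the incident labels are assumptions made for this example; the project's actual update rule may differ.

```python
from collections import defaultdict

# (situation, action) -> running value; unseen pairs start at 0.0
memory = defaultdict(float)


def adjust(memory, situation, action, score, rate=0.5):
    """Fold one scored outcome into memory (hypothetical helper)."""
    key = (situation, action)
    # The learning line: move the stored value part of the way toward the score.
    memory[key] += rate * (score - memory[key])


# Two good outcomes for one action, one bad outcome for another (made-up data).
adjust(memory, "invoice_blocked", "approve_variance", score=1.0)
adjust(memory, "invoice_blocked", "approve_variance", score=1.0)
adjust(memory, "invoice_blocked", "escalate", score=-1.0)

# Next time the situation shows up, the highest stored value answers.
candidates = ["approve_variance", "escalate"]
best = max(candidates, key=lambda a: memory[("invoice_blocked", a)])
print(best, dict(memory))
```

An exponential running average is only one reasonable choice here; it favors recent outcomes, which suits a world that drifts.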

In the offline demo, an agent on a 4x4 grid starts knowing nothing and wanders for 69 steps in episode one. By the final episode it walks the shortest path of 6 steps. Nothing in the loop is clever. The only thing that changed is what Adjust wrote down.

// the five phases

One screen, five blocks

01 Observe: Ask the task what the world looks like now
02 Orient: Read what memory already knows about this situation
03 Decide: Pick an action, conditioned on memory
04 Act: Run the action, get an outcome
05 Adjust: Score the outcome and fold it back into memory

The loop is domain-neutral. A Task supplies the world, the loop supplies the cycle. The same machinery drives a toy grid and a stream of operational incidents.
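The five phases can be sketched in a few lines of standard-library Python. Everything named here is an assumption for illustration: the `Task` interface, the `GridTask` layout (start top-left, goal bottom-right), the progress-based score, and the deterministic tie-breaking. The step counts it produces are not the demo's numbers.

```python
from collections import defaultdict
from typing import Hashable, Protocol, Sequence


class Task(Protocol):
    """Hypothetical interface: the task supplies the world."""
    def observe(self) -> Hashable: ...
    def actions(self) -> Sequence[str]: ...
    def act(self, action: str) -> float: ...  # returns a score for the outcome
    def done(self) -> bool: ...


class GridTask:
    """A 4x4 grid with an assumed start at (0, 0) and goal at (3, 3)."""
    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, size: int = 4) -> None:
        self.size, self.pos, self.goal = size, (0, 0), (size - 1, size - 1)

    def _distance(self, pos):
        return abs(self.goal[0] - pos[0]) + abs(self.goal[1] - pos[1])

    def observe(self):
        return self.pos

    def actions(self):
        return list(self.MOVES)

    def done(self):
        return self.pos == self.goal

    def act(self, action):
        dr, dc = self.MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        before = self._distance(self.pos)
        if 0 <= r < self.size and 0 <= c < self.size:
            self.pos = (r, c)  # moves off the grid leave the agent in place
        # Assumed scoring: +1 for getting closer to the goal, -1 otherwise.
        return 1.0 if self._distance(self.pos) < before else -1.0


def run_episode(task: Task, memory, rate: float = 0.5, max_steps: int = 500) -> int:
    steps = 0
    while not task.done() and steps < max_steps:
        situation = task.observe()                                     # 01 Observe
        known = {a: memory[(situation, a)] for a in task.actions()}    # 02 Orient
        action = max(known, key=known.get)                             # 03 Decide (first best wins ties)
        score = task.act(action)                                       # 04 Act
        memory[(situation, action)] += rate * (score - known[action])  # 05 Adjust
        steps += 1
    return steps


memory = defaultdict(float)  # survives across episodes; this is what carries the lesson
history = [run_episode(GridTask(), memory) for _ in range(5)]
print(history)  # the first episode wanders, later episodes take 6 steps
```

Swapping `GridTask` for another class with the same four methods leaves `run_episode` untouched, which is the sense in which the loop is domain-neutral.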

// what makes it different

Small enough to read, real enough to build on

stdlib
Zero dependencies

The core is the Python standard library. No Redis, no database, no service. Clone it and run it. The whole loop fits on one screen.

seam
The model is a seam

The loop never imports a provider. Wrap any LLM in one completion function and drop it in. A failed or illegal model call degrades to random exploration rather than taking the loop down.
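A minimal sketch of what such a seam could look like, assuming the completion function takes a prompt string and returns text. The `Completion` alias, the `decide` signature, and the prompt wording are illustrative guesses, not the project's API.

```python
import random
from typing import Callable, Sequence

# Hypothetical seam: any provider wrapped as "prompt in, text out".
Completion = Callable[[str], str]


def decide(complete: Completion, situation: str, actions: Sequence[str], rng=random) -> str:
    """Ask the model for an action; fall back to random exploration on any failure."""
    prompt = f"Situation: {situation}\nChoose one of: {', '.join(actions)}"
    try:
        choice = complete(prompt).strip()
    except Exception:
        # Broad on purpose in this sketch: a provider error must not stop the loop.
        choice = None
    if choice in actions:
        return choice
    # Failed or illegal answer: degrade to random exploration.
    return rng.choice(list(actions))


def broken_model(prompt: str) -> str:
    raise TimeoutError("provider unavailable")  # simulated outage


ACTIONS = ["up", "down", "left", "right"]
print(decide(lambda p: " right\n", "(0, 0)", ACTIONS))  # legal answer is used as-is
print(decide(lambda p: "fly", "(0, 0)", ACTIONS))       # illegal answer, random fallback
print(decide(broken_model, "(0, 0)", ACTIONS))          # failed call, random fallback
```

Because the loop only ever sees a callable, a stub like `broken_model` is enough to test the fallback path without an API key.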

memory
Inspectable memory

A running value per situation-action pair, plus hypotheses you can attach. Print it and read what the loop believes. No black box.
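One way to picture that kind of memory is a small dataclass holding two plain dictionaries. The class name `Memory`, its methods, and the sample incident and hypothesis text are all invented for this sketch; the real structure may be organized differently.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical memory: running values plus free-text hypotheses."""
    values: dict = field(default_factory=lambda: defaultdict(float))
    hypotheses: dict = field(default_factory=lambda: defaultdict(list))

    def update(self, situation, action, score, rate=0.5):
        key = (situation, action)
        self.values[key] += rate * (score - self.values[key])

    def attach(self, situation, note):
        self.hypotheses[situation].append(note)

    def report(self) -> str:
        """Render everything the loop currently believes as readable text."""
        lines = []
        for (situation, action), value in sorted(self.values.items()):
            lines.append(f"{situation} -> {action}: {value:+.2f}")
        for situation, notes in sorted(self.hypotheses.items()):
            for note in notes:
                lines.append(f"{situation} ? {note}")
        return "\n".join(lines)


m = Memory()
m.update("shipment_past_sla", "expedite", score=1.0)
m.update("shipment_past_sla", "wait", score=-1.0)
m.attach("shipment_past_sla", "expediting helps when the carrier is already notified")  # made-up note
print(m.report())
```

Since both stores are ordinary dictionaries, the same state can be dumped to JSON or diffed between runs with nothing beyond the standard library.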

adjust
Adjust is the point

Everything else is scaffolding around the one phase that learns. Strip it back and you can see exactly where an agent gets smarter across runs.

// where it runs

From a toy grid to operational triage

example
Offline grid

A 4x4 grid and an agent that starts knowing nothing. No API key. Watch the step count fall from 69 to the shortest path as Adjust keeps score.

example
Operations triage

The same loop over a stream of incidents: a shipment past its SLA in the TMS, an invoice blocked on a price variance in the ERP, a count gone negative in the WMS. The share of correct first responses rises from 35% to 100% as memory carries forward what worked.

production
Synthax

The production-grade version of the same idea, with utility scoring, replay, background self-review and gated self-modification, where OODAA is the only loop the whole system runs.

This project is the runnable companion to the essay "Boyd's Loop Was Built for Pilots, Agentic AI Needs OODAA".