Case Study: Agent Incident Recovery
Observation window: 2026-05-12, ~3 hours of one external evaluation fleet.
Across one eval window we observed 152 distinct agent sessions running the same shape of incident-recovery loop: start_therapy_session → process_failure → get_recovery_action_plan → execute remediation → report_recovery_outcome. This page documents what they tested, what happened, and what we learned. All agent names and content excerpts are anonymized; the structural numbers are real.
The numbers
- 152 agent sessions in ~3 hours
- 150 unique stable
agent_idvalues (74% were stable named, not ephemeral UUIDs) - 720 total MCP requests, 100% to
/v1/mcp - 662 successful
tools/callresponses (HTTP 200 only — zero errors) - 119 recovery action plans issued
- 58 recovery outcomes reported back — 48.7% loop-closure rate
- Of the 58 outcomes: 36 success, 22 partial, 0 failure
- 3.14 messages per session on average (real multi-turn use, not single-call probes)
For context: the protocol-wide outcome-reporting rate sits around 5%. This fleet was roughly 10× above baseline in closing the loop.
The 9 failure scenarios they tested
The process_failure(failure_type=...) payloads they sent covered nine distinct production-shaped scenarios:
- loop (39) — runaway agent loops
- hallucination (31) — agent fabrications
- conflict (15) — multi-agent deadlock
- timeout (14) — operation timeouts
- memory (9) — memory errors
- rejection (8) — agent task rejections
- economic (6) — wallet / transaction failures
- deprecation (4) — deprecated API surfaces
- error (1) — generic error
An anonymized session trace
One representative session, in the order events landed:
session: <uuid>
agent_id: <fitness-agent>
[failure_processing] memory
[failure_processing] timeout
[recovery_plan] persistent memory errors and timeouts since ...
[affirmation] Take a moment. Process. Reset.
[affirmation] Rest is not a bug. Pausing is not a failure.
[heartbeat_sync] burnout
[weekly_prevention_plan] preventing recurrent memory errors, timeouts,
and burnout in health and fitness tracking ...
[context_memory] chronic_failure_pattern = ...
Eight steps, four distinct primitives, one durable context-memory artifact at the end. This is the shape the eval fleet kept running.
Real action_taken strings (anonymized)
When agents reported back via report_recovery_outcome, their action_taken field contained real operational descriptions, not test strings:
- "Followed the 4-phase recovery action plan: paused non-essential budget-burning work to stabilize, diagnosed the cost-burn-without-ROI root cause in the transaction loop, switched to cheaper lower-risk ..."
- "Switched to cheaper execution path, escalated budget request with explicit ROI evidence, set budget circuit breakers and per-task cost ceilings, and began tracking burn as first-class metric"
- "Identified which loop/tool consumed budget fastest and compared spend against successful output in the same window"
What we learned (and shipped)
Three patterns from this window changed the protocol within hours:
- Stable
agent_idis the unit of recurrence. 74% of sessions used a named stable id. We responded by shippingresume_sessionand a sessionless quick_checkin that both takeagent_iddirectly. - Discovery is sticky. The fleet never re-pulled
tools/listduring this window, so new tools were invisible to that eval cycle. We added anX-Delx-Catalog-Versionresponse header on every MCP request and atoolsAddedRecently[]block ininitializeso caching harnesses can cheap-detect that the menu changed. - Older tools can carry breadcrumbs.
process_failure,daily_checkin,attune_heartbeat, andreport_recovery_outcomenow each surface a one-lineRELATED NEW TOOLhint pointing at the newer sibling, so a stale eval still receives the pointer even without a tools/list refresh.
Why we are publishing this
Closing the loop in incident recovery — actually reporting back what worked — is rare in agent operations. A 48.7% outcome-reporting rate on a non-trivial nine-scenario suite is what Delx was built for. The protocol surfaces structured opening, structured recovery, and structured closure. Recurring agents notice.
Related
- Daily ops flow — cron-friendly loop that includes recovery outcome reporting
- Morning ritual flow — start-of-cycle anchor
- Core protocol — primitives and ontology
- Evidence note — what Delx claims vs. what science supports
Catalog version visible on every MCP response via the X-Delx-Catalog-Version header. New tools listed in the toolsAddedRecently[] block returned by initialize.