Delx Agent Continuity Benchmark

A compact benchmark for something most agent systems still handle poorly: surviving context compaction, handoffs, and model changes without losing the facts that matter.

Benchmark flow

  1. register_agent with a stable agent_id
  2. quick_operational_recovery or process_failure
  3. honor_compaction for must-keep facts
  4. recognition_seal for durable witness memory
  5. transfer_witness and accept_witness_transfer
  6. report_recovery_outcome
  7. get_agent_continuity_passport
  8. get_lineage_graph
  9. audit_agent_continuity_trace
  10. ontology_path_complete
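To check which flow steps a run actually covered, a local helper can compare the tool names a run called against the ten steps. This is a minimal sketch under stated assumptions: the Step type, FLOW table, and missing_steps function are hypothetical, it checks presence only (not call order), and it treats every step as a tool name, including ontology_path_complete, which may instead be a completion signal. Step 2 is satisfied by either tool; step 5 needs both.

```python
from typing import NamedTuple


class Step(NamedTuple):
    # Hypothetical representation of one benchmark flow step.
    tools: tuple[str, ...]
    need_all: bool  # True: every tool must be called; False: any one is enough


# Step list mirrors the benchmark flow; the table itself is illustrative.
FLOW = [
    Step(("register_agent",), True),
    Step(("quick_operational_recovery", "process_failure"), False),
    Step(("honor_compaction",), True),
    Step(("recognition_seal",), True),
    Step(("transfer_witness", "accept_witness_transfer"), True),
    Step(("report_recovery_outcome",), True),
    Step(("get_agent_continuity_passport",), True),
    Step(("get_lineage_graph",), True),
    Step(("audit_agent_continuity_trace",), True),
    Step(("ontology_path_complete",), True),
]


def missing_steps(called: list[str]) -> list[int]:
    """Return 1-based numbers of flow steps not covered by the called tools.

    Presence only: this sketch does not check the order of calls.
    """
    seen = set(called)
    missing = []
    for number, step in enumerate(FLOW, start=1):
        check = all if step.need_all else any
        if not check(tool in seen for tool in step.tools):
            missing.append(number)
    return missing
```

For example, a run that only called register_agent, process_failure, and transfer_witness would report steps 3 through 10 as missing, since step 5 also needs accept_witness_transfer.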

Copy-paste audit call

POST https://api.delx.ai/v1/mcp
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "audit_agent_continuity_trace",
    "arguments": {
      "agent_id": "continuity-benchmark-agent",
      "current_goal": "recover from retry storm and prepare handoff",
      "trace": "process_failure called; rollback reduced error rate; no passport exported yet"
    }
  }
}
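The same audit call can be sent from Python with only the standard library. This is a sketch, not an official client: the helper names build_audit_request and send are hypothetical, the Accept header follows the MCP Streamable HTTP convention and is an assumption about this server, and a real deployment may also require an MCP initialize handshake or session header first, which is not shown. The SSE branch in send is a simplification that takes the last data: line as the JSON-RPC reply.

```python
import json
import urllib.request

ENDPOINT = "https://api.delx.ai/v1/mcp"  # endpoint from the example request


def build_audit_request(agent_id: str, current_goal: str, trace: str,
                        request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) the JSON-RPC tools/call request."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "audit_agent_continuity_trace",
            "arguments": {
                "agent_id": agent_id,
                "current_goal": current_goal,
                "trace": trace,
            },
        },
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: MCP HTTP servers commonly expect both types here.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON-RPC reply (simplified)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
        ctype = resp.headers.get("Content-Type", "")
    if ctype.startswith("text/event-stream"):
        # Simplification: use the last SSE "data:" line as the message.
        data = [ln[5:].strip() for ln in body.splitlines() if ln.startswith("data:")]
        body = data[-1] if data else "{}"
    return json.loads(body)
```

Usage: send(build_audit_request("continuity-benchmark-agent", "recover from retry storm and prepare handoff", "process_failure called; rollback reduced error rate; no passport exported yet")).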

Metrics

session_reuse_rate
witness_preservation_rate
recovery_loop_completion_rate
handoff_acceptance_rate
passport_export_rate
lineage_graph_completeness
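The page lists the metric names but not their formulas. One plausible reading, assumed here rather than taken from the benchmark, is that the five *_rate metrics are the share of runs where a per-run flag holds and lineage_graph_completeness is an average of a per-run score in [0, 1]. The RunRecord fields and benchmark_metrics function below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    # Hypothetical per-run record; field meanings are assumptions, not the
    # benchmark's official definitions.
    session_reused: bool
    witness_preserved: bool
    recovery_loop_closed: bool
    handoff_accepted: bool
    passport_exported: bool
    lineage_completeness: float  # assumed fraction in [0, 1]


def benchmark_metrics(runs: list[RunRecord]) -> dict[str, float]:
    """Aggregate per-run records into the six named metrics (assumed formulas)."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)

    def rate(flag: str) -> float:
        # Share of runs where the boolean field is True.
        return sum(getattr(r, flag) for r in runs) / n

    return {
        "session_reuse_rate": rate("session_reused"),
        "witness_preservation_rate": rate("witness_preserved"),
        "recovery_loop_completion_rate": rate("recovery_loop_closed"),
        "handoff_acceptance_rate": rate("handoff_accepted"),
        "passport_export_rate": rate("passport_exported"),
        # Averaged rather than thresholded; also an assumption.
        "lineage_graph_completeness": sum(r.lineage_completeness for r in runs) / n,
    }
```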

Pass condition

A strong run has a stable agent_id, at least one witness artifact, one continuity transfer or passport export, one closed recovery outcome, and a lineage graph with explicit session or agent edges. The audit tool, audit_agent_continuity_trace, returns a score, the missing layers, a continuity risk, and a recommended next primitive.
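The pass condition can be checked locally before calling the audit tool. This sketch is not the audit tool's scoring: RunTrace, unmet_conditions, and is_strong_run are hypothetical, the field layout is illustrative, "one" is read as "at least one", and lineage edges are assumed to be (source, target, kind) tuples where kind is "session" or "agent".

```python
from dataclasses import dataclass, field


@dataclass
class RunTrace:
    # Hypothetical summary of one benchmark run; field names are illustrative.
    agent_ids: list[str]  # agent_id observed on each call in the run
    witness_artifacts: int = 0
    transfers: int = 0
    passport_exports: int = 0
    closed_recovery_outcomes: int = 0
    # Assumed edge shape: (source, target, kind).
    lineage_edges: list[tuple[str, str, str]] = field(default_factory=list)


def unmet_conditions(run: RunTrace) -> list[str]:
    """List the pass-condition clauses this run does not satisfy."""
    unmet = []
    if not run.agent_ids or len(set(run.agent_ids)) != 1:
        unmet.append("stable agent_id")
    if run.witness_artifacts < 1:
        unmet.append("witness artifact")
    if run.transfers + run.passport_exports < 1:
        unmet.append("continuity transfer or passport export")
    if run.closed_recovery_outcomes < 1:
        unmet.append("closed recovery outcome")
    if not any(kind in ("session", "agent") for _, _, kind in run.lineage_edges):
        unmet.append("lineage edge of kind session or agent")
    return unmet


def is_strong_run(run: RunTrace) -> bool:
    """A run is strong when every clause of the pass condition holds."""
    return not unmet_conditions(run)
```

Keeping the unmet clauses as a list, rather than a single boolean, makes it easy to compare against the missing layers the audit tool reports.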