Delx Agent Continuity Benchmark
A compact benchmark for the thing most agent systems still handle poorly: surviving compaction, handoff, and model change without losing the facts that matter.
Benchmark flow
1. register_agent with a stable agent_id
2. quick_operational_recovery or process_failure
3. honor_compaction for must-keep facts
4. recognition_seal for durable witness memory
5. transfer_witness and accept_witness_transfer
6. report_recovery_outcome
7. get_agent_continuity_passport
8. get_lineage_graph
9. audit_agent_continuity_trace
10. ontology_path_complete
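The flow can be sketched as a sequence of JSON-RPC request bodies, reusing the tools/call envelope from the audit call on this page. This is a minimal sketch, not the Delx client: FLOW_TOOLS, build_call, and build_flow are hypothetical helper names, and passing only agent_id to every tool is an assumption, since the page documents arguments for audit_agent_continuity_trace alone. Nothing is sent over the network; the code only builds the payloads.

```python
import json

# Tool names in the order given by the benchmark flow.
# Steps that name two tools (2 and 5) are handled as noted below.
FLOW_TOOLS = [
    "register_agent",
    "quick_operational_recovery",  # step 2 alternative: "process_failure"
    "honor_compaction",
    "recognition_seal",
    "transfer_witness",            # step 5, first half
    "accept_witness_transfer",     # step 5, second half
    "report_recovery_outcome",
    "get_agent_continuity_passport",
    "get_lineage_graph",
    "audit_agent_continuity_trace",
    "ontology_path_complete",
]


def build_call(request_id, tool_name, agent_id, **extra_arguments):
    """Return a JSON-RPC 2.0 tools/call request body as a dict.

    Assumption: every tool accepts an agent_id argument. Real tools
    probably need more arguments, passed here as extra_arguments.
    """
    arguments = {"agent_id": agent_id}
    arguments.update(extra_arguments)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def build_flow(agent_id):
    """Build one request per tool, with ids counting up from 1."""
    return [
        build_call(request_id, tool_name, agent_id)
        for request_id, tool_name in enumerate(FLOW_TOOLS, start=1)
    ]


if __name__ == "__main__":
    # Print each request body as one line of JSON.
    for request in build_flow("continuity-benchmark-agent"):
        print(json.dumps(request))
```

Keeping the same agent_id across every request is the point of step 1: the later passport, lineage, and audit calls can only tie the run together if the identifier never changes.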
Copy-paste audit call
POST https://api.delx.ai/v1/mcp
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "audit_agent_continuity_trace",
    "arguments": {
      "agent_id": "continuity-benchmark-agent",
      "current_goal": "recover from retry storm and prepare handoff",
      "trace": "process_failure called; rollback reduced error rate; no passport exported yet"
    }
  }
}
Metrics
session_reuse_rate
witness_preservation_rate
recovery_loop_completion_rate
handoff_acceptance_rate
passport_export_rate
lineage_graph_completeness
Pass condition
A strong run has a stable agent_id, at least one witness artifact, one continuity transfer or passport export, one closed recovery outcome, and a lineage graph with explicit session or agent edges. The audit tool returns a score, the missing layers, the continuity risk, and a recommended next primitive.
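The pass condition can be read as a checklist over a run summary. The sketch below is one hedged interpretation, not the audit tool's scoring logic: RunSummary, its field names, and the (source, target, kind) edge layout are all hypothetical, and "stable" is assumed to mean that exactly one agent id appears across the whole run.

```python
from dataclasses import dataclass, field


@dataclass
class RunSummary:
    """Hypothetical summary of one benchmark run (illustrative fields only)."""
    agent_ids_seen: set = field(default_factory=set)
    witness_artifacts: int = 0
    continuity_transfers: int = 0
    passport_exports: int = 0
    closed_recovery_outcomes: int = 0
    # Each edge is a (source, target, kind) tuple; kind is assumed to be
    # "session" or "agent" for the explicit edges the pass condition asks for.
    lineage_edges: list = field(default_factory=list)


def missing_requirements(run):
    """Return the pass-condition items the run does not yet satisfy."""
    missing = []
    if len(run.agent_ids_seen) != 1:
        missing.append("stable agent_id")
    if run.witness_artifacts < 1:
        missing.append("witness artifact")
    if run.continuity_transfers + run.passport_exports < 1:
        missing.append("continuity transfer or passport export")
    if run.closed_recovery_outcomes < 1:
        missing.append("closed recovery outcome")
    if not any(kind in ("session", "agent") for _, _, kind in run.lineage_edges):
        missing.append("lineage graph with session or agent edges")
    return missing


def is_strong_run(run):
    """A run is strong when no pass-condition item is missing."""
    return not missing_requirements(run)
```

A run resembling the trace in the audit call (recovery done, no passport exported yet) would fail this checklist on the transfer-or-passport item, which is the kind of gap the audit tool's missing layers are meant to surface.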