RFC-0013: Coordinator Governance & Meta-Coordination#
Status: Proposed
Created: 2026-02-07
Authors: OpenIntent Contributors
Requires: RFC-0001 (Intents), RFC-0003 (Leasing), RFC-0004 (Governance), RFC-0011 (Access Control), RFC-0012 (Task Decomposition & Planning)
Abstract#
This RFC formalizes the Coordinator as a governed participant in the OpenIntent protocol. A coordinator — whether an LLM, a human, or a composite system — is subject to the same accountability, leasing, and auditability mechanisms as any other agent. The RFC introduces supervisor hierarchies, coordinator guardrails, plan review gates, coordinator handoff and failover, and decision auditing. Together, these ensure that the entity orchestrating multi-agent work is itself observable, bounded, and overridable.
Motivation#
RFC-0012 introduced Plans and Tasks, enabling a coordinator to decompose intents into concrete work. But the coordinator itself operates outside the protocol's governance model. This creates a critical gap:
-
No accountability for bad plans. A coordinator can create an arbitrarily expensive, risky, or nonsensical plan. Nothing in the protocol constrains this or requires review before execution begins.
-
No failover. If a coordinator crashes, stalls, or becomes unresponsive, plans in flight have no recovery mechanism. The lease model (RFC-0003) applies to tasks and intents, but not to the coordination role itself.
-
No supervision hierarchy. In practice, coordination is layered — an LLM coordinator should have a human supervisor who can override, redirect, or pause. The protocol doesn't model this relationship.
-
No guardrails. Budget limits, scope boundaries, delegation depth limits, and escalation thresholds are ad-hoc. The protocol should express these as declarative constraints that the coordinator is bound by.
-
No decision auditing at the coordination level. Task events capture what agents do. But the coordinator's decisions — why it chose this plan, why it delegated to this agent, why it re-planned — are invisible to the audit trail.
Without these, the "virtuous loop" of LLM-as-coordinator is incomplete: the loop has no governor.
Terminology#
| Term | Definition |
|---|---|
| Coordinator | An agent (LLM, human, or composite) responsible for plan creation, task scheduling, monitoring, and escalation for one or more intents |
| Supervisor | A coordinator's designated overseer, with authority to override, pause, or replace the coordinator |
| Coordinator Lease | A lease (RFC-0003) granting an agent the coordinator role for a specific intent or portfolio |
| Guardrail | A declarative constraint bounding what a coordinator is allowed to do |
| Decision Record | An auditable log entry capturing a coordination decision (plan creation, re-planning, delegation, escalation) with rationale |
| Plan Review Gate | A checkpoint requiring supervisor approval before a plan is activated |
| Heartbeat | A periodic signal from a coordinator proving it is still active and responsive |
Design#
1. Coordinator as a Governed Agent#
A coordinator is an agent with a special role. It holds a coordinator lease on an intent (or portfolio), granting it authority to create plans, schedule tasks, and make orchestration decisions. Like any lease, it has a TTL, is revocable, and is exclusive.
1.1 Coordinator Lease#
{
"id": "clease_01HABC",
"intent_id": "intent_01HXYZ",
"agent_id": "llm-coordinator-01",
"role": "coordinator",
"supervisor_id": "human-operator-01",
"granted_at": "2026-02-07T10:00:00Z",
"expires_at": "2026-02-07T22:00:00Z",
"heartbeat_interval_seconds": 60,
"last_heartbeat": "2026-02-07T10:15:00Z",
"guardrails": {
"max_budget_usd": 50.00,
"max_tasks_per_plan": 20,
"max_delegation_depth": 3,
"max_concurrent_tasks": 10,
"allowed_capabilities": ["data_access", "analytics", "reporting"],
"requires_plan_review": true,
"auto_escalate_after_failures": 3,
"scope": "intent"
},
"status": "active",
"version": 1
}
Key properties:
- supervisor_id: Every coordinator has a supervisor. For LLM coordinators, this is typically a human. For sub-coordinators, it may be a parent coordinator. The chain terminates at a human.
- heartbeat_interval_seconds: The coordinator must send heartbeats at this interval. Miss two consecutive heartbeats, and failover triggers.
- guardrails: Declarative constraints the coordinator must operate within (see Section 3).
- scope: Whether this coordinator lease covers a single intent ("intent") or an entire portfolio ("portfolio").
1.2 Coordinator Registration#
A coordinator registers its capabilities and preferences:
{
"agent_id": "llm-coordinator-01",
"type": "llm",
"model": "claude-4",
"capabilities": ["planning", "monitoring", "escalation", "re-planning"],
"supported_domains": ["compliance", "data-pipeline", "content"],
"max_concurrent_intents": 5,
"preferred_heartbeat_interval": 60,
"metadata": {
"provider": "anthropic",
"context_window": 200000,
"supports_tools": true
}
}
The type field distinguishes coordinator kinds:
- "llm" — An LLM acting as coordinator (via MCP or tool integration)
- "human" — A human coordinator (via dashboard or API)
- "composite" — LLM + human working together (LLM proposes, human approves)
- "system" — Automated rule-based coordinator (cron-like, no judgment)
2. Supervisor Hierarchy#
Every coordinator has a supervisor. This creates a chain of accountability that terminates at a human.
Human Operator (ultimate authority)
└── LLM Coordinator A (portfolio-level)
├── LLM Coordinator B (intent-level, compliance domain)
│ └── Agent workers (task-level)
└── LLM Coordinator C (intent-level, data domain)
└── Agent workers (task-level)
2.1 Supervisor Authority#
A supervisor can:
| Action | Description | Event Logged |
|---|---|---|
| Override plan | Replace or modify coordinator's plan | coordinator.plan_overridden |
| Pause coordinator | Temporarily suspend coordinator activity | coordinator.paused |
| Resume coordinator | Resume a paused coordinator | coordinator.resumed |
| Replace coordinator | Revoke lease and assign a new coordinator | coordinator.replaced |
| Adjust guardrails | Tighten or loosen coordinator constraints | coordinator.guardrails_updated |
| Approve plan | Sign off on a plan at a review gate | plan.approved_by_supervisor |
| Reject plan | Reject a plan with feedback | plan.rejected_by_supervisor |
| Force-escalate | Force a specific task to escalate to human | task.force_escalated |
2.2 Escalation Chain#
When a coordinator encounters something outside its guardrails or capabilities, it escalates to its supervisor:
- Coordinator hits guardrail (e.g., budget exceeded) → auto-escalates
- Coordinator is uncertain (e.g., ambiguous requirements) → voluntary escalation
- Supervisor reviews context (intent, plan, events, coordinator decisions)
- Supervisor decides: adjust guardrails, override plan, provide guidance, or escalate further
- Decision is recorded as a
coordinator.escalation_resolvedevent
If the supervisor is also a coordinator (not a human), the escalation continues up the chain until it reaches a human. The protocol guarantees the chain terminates — a coordinator lease cannot be created without a supervisor_id, and at least one ancestor must have type: "human".
3. Guardrails#
Guardrails are declarative constraints that the protocol enforces. The coordinator cannot violate these — attempts are rejected with a guardrail_violation error, and the attempt is logged.
3.1 Budget Guardrails#
Integrates with RFC-0009 (Cost Tracking). When cumulative task costs approach the budget, a warning event is logged. On exceed, the specified action triggers (pause, escalate, or fail).
3.2 Scope Guardrails#
{
"max_tasks_per_plan": 20,
"max_delegation_depth": 3,
"max_concurrent_tasks": 10,
"max_plan_versions": 5,
"allowed_capabilities": ["data_access", "analytics", "reporting"]
}
These prevent runaway decomposition (infinite sub-tasks), unbounded delegation chains, resource exhaustion from too many concurrent tasks, and scope creep through capability restrictions.
3.3 Temporal Guardrails#
{
"max_plan_duration_hours": 48,
"max_task_wait_hours": 4,
"checkpoint_timeout_hours": 24,
"require_progress_every_minutes": 30
}
These ensure things don't stall indefinitely. If no progress events are recorded within the require_progress_every_minutes window, an coordinator.stalled event is logged and the supervisor is notified.
3.4 Review Guardrails#
{
"requires_plan_review": true,
"requires_replan_review": false,
"auto_escalate_after_failures": 3,
"require_human_for_capabilities": ["financial_approval", "legal_review"]
}
requires_plan_review means the coordinator must submit plans for supervisor approval before activation. require_human_for_capabilities ensures certain capability requirements always route to humans, never to other LLMs.
4. Decision Records#
Every significant coordinator decision produces a Decision Record — an event on the intent's audit trail capturing what the coordinator decided, why, and what alternatives were considered.
4.1 Decision Record Object#
{
"id": "drec_01HABC",
"type": "coordinator.decision",
"coordinator_id": "llm-coordinator-01",
"intent_id": "intent_01HXYZ",
"decision_type": "plan_created",
"summary": "Created 4-task plan for Q1 compliance report",
"rationale": "Decomposed into data gathering (2 parallel tasks), analysis (1 task dependent on gathering), and report generation (1 task dependent on analysis). Added checkpoint after analysis for compliance officer review.",
"alternatives_considered": [
{
"description": "Single-task approach: one agent does everything",
"rejected_reason": "No agent has all required capabilities (data_access + analytics + reporting)"
},
{
"description": "5-task plan with separate financial and HR analysis",
"rejected_reason": "Added complexity with minimal benefit; combined analysis is sufficient"
}
],
"confidence": 0.85,
"timestamp": "2026-02-07T10:01:00Z"
}
4.2 Decision Types#
| Decision Type | When Logged |
|---|---|
plan_created |
Coordinator creates a new plan |
plan_modified |
Coordinator re-plans (adds/removes/reorders tasks) |
task_assigned |
Coordinator selects an agent for a task |
task_delegated |
Coordinator approves or initiates delegation |
escalation_initiated |
Coordinator escalates to supervisor |
escalation_resolved |
Coordinator receives supervisor guidance and acts |
checkpoint_evaluated |
Coordinator makes a decision at a plan checkpoint |
failure_handled |
Coordinator decides how to respond to a task failure |
guardrail_approached |
Coordinator adjusts behavior to stay within guardrails |
coordinator_handoff |
Coordinator transfers control to another coordinator |
Decision records are what make LLM coordination auditable. When a human asks "why did the agent do that?", the decision record provides the answer — not just what happened, but the reasoning and alternatives.
5. Coordinator Lifecycle & Failover#
5.1 Lifecycle States#
┌──────────┐ ┌──────┐ ┌──────────┐
│registering├───►│active├───►│completing│
└──────────┘ └──┬─┬─┘ └────┬─────┘
│ │ │
│ │ ┌───▼────┐
│ │ │completed│
│ │ └────────┘
│ │
┌────▼─┘
│ │
┌────▼┐ ┌▼──────────┐
│paused│ │unresponsive│
└──┬──┘ └─────┬─────┘
│ │
│ ┌────▼────┐
│ │failed_over│
│ └─────────┘
│
┌───▼──┐
│active│ (resumed)
└──────┘
| State | Description |
|---|---|
registering |
Coordinator is being set up, guardrails applied |
active |
Coordinator is operational, sending heartbeats |
paused |
Supervisor has paused the coordinator |
unresponsive |
Missed 2+ heartbeats, failover pending |
failed_over |
Another coordinator has taken over |
completing |
All tasks finishing, coordinator wrapping up |
completed |
Coordination finished successfully |
5.2 Heartbeat & Failover#
Coordinators send periodic heartbeats:
{
"type": "coordinator.heartbeat",
"coordinator_id": "llm-coordinator-01",
"intent_id": "intent_01HXYZ",
"timestamp": "2026-02-07T10:15:00Z",
"active_tasks": 3,
"pending_decisions": 0,
"budget_used_usd": 12.50,
"status_summary": "3 tasks running, analysis 60% complete"
}
If two consecutive heartbeats are missed:
1. Coordinator state transitions to unresponsive
2. coordinator.unresponsive event is logged
3. Supervisor is notified
4. After a grace period (heartbeat_interval * 3), failover triggers:
- A new coordinator is assigned (from a configured failover pool, or the supervisor takes over)
- The new coordinator receives the full plan, task states, and event history
- The new coordinator resumes from the last known state
- coordinator.failed_over event records the transition
5.3 Coordinator Handoff#
A coordinator can voluntarily hand off to another coordinator:
{
"type": "coordinator.handoff",
"from_coordinator": "llm-coordinator-01",
"to_coordinator": "llm-coordinator-02",
"reason": "Context window approaching limit, transferring to fresh coordinator",
"state_summary": {
"plan_version": 3,
"tasks_completed": 5,
"tasks_remaining": 3,
"key_decisions": ["drec_01", "drec_02", "drec_03"],
"open_items": ["Checkpoint cp_02 pending approval"]
}
}
This is critical for LLM coordinators: context windows have limits. When a long-running coordination approaches the limit, the coordinator packages its state summary, hands off to a fresh instance, and the new coordinator picks up with the full protocol state (plan, tasks, events) plus the summarized context.
6. Composite Coordination#
The composite coordinator type formalizes the human-LLM collaboration pattern:
{
"agent_id": "composite-coord-01",
"type": "composite",
"components": {
"proposer": "llm-coordinator-01",
"approver": "human-operator-01"
},
"mode": "propose-approve",
"auto_approve_threshold": 0.95
}
Modes:
- propose-approve: LLM proposes plans/decisions, human approves or rejects. The default for high-stakes coordination.
- act-notify: LLM acts autonomously, human is notified of all decisions. For lower-stakes or time-sensitive coordination.
- act-audit: LLM acts fully autonomously, human can audit asynchronously. For routine, well-understood workflows.
The auto_approve_threshold allows high-confidence decisions to proceed without blocking on human approval (only in propose-approve mode). If the coordinator's confidence (from the Decision Record) exceeds this threshold, the decision auto-approves with a decision.auto_approved event.
7. Permissions Integration (RFC-0011)#
Coordinator governance integrates with the existing permissions model:
7.1 Coordinator Permissions#
intent:
compliance_report:
permissions:
policy: restricted
allow:
- agent: llm-coordinator-01
grant: [coordinate, read, delegate]
- agent: data-agent
grant: [execute]
- agent: compliance-officer
grant: [approve, read]
coordinator:
agent: llm-coordinator-01
supervisor: compliance-officer
guardrails:
max_budget_usd: 50
requires_plan_review: true
New permission grants:
- coordinate: Authority to create plans, assign tasks, and make orchestration decisions
- approve: Authority to approve plans at review gates and resolve checkpoints
- supervise: Authority to override, pause, or replace a coordinator
7.2 Task-Level Permission Delegation#
When a coordinator assigns a task, it delegates a scoped subset of its permissions:
- Coordinator holds
[coordinate, read, delegate]on the intent - Coordinator creates Task A and assigns it to
data-agent data-agentreceives[execute, read]on Task A (scoped from coordinator's delegation grant)data-agentcannot access other tasks, modify the plan, or escalate outside the task scope
8. Python SDK#
8.1 Coordinator Definition#
from openintent import coordinator, CoordinatorContext, Plan, Guardrails
@coordinator(
name="compliance-coordinator",
type="composite",
mode="propose-approve",
guardrails=Guardrails(
max_budget_usd=50.0,
max_tasks_per_plan=20,
requires_plan_review=True,
auto_escalate_after_failures=3,
),
)
async def compliance_coordinator(ctx: CoordinatorContext) -> Plan:
# Coordinator has access to intent, capabilities registry, and agent pool
available_agents = await ctx.discover_agents(
capabilities=["data_access", "analytics", "reporting"]
)
# Create plan with decision rationale
plan = await ctx.propose_plan(
tasks=[
fetch_financials.t(quarter=ctx.intent.input["quarter"]),
fetch_hr_data.t(quarter=ctx.intent.input["quarter"]),
run_analysis.t().depends_on(fetch_financials, fetch_hr_data),
generate_report.t().depends_on(run_analysis),
],
checkpoints=[
Checkpoint(after=run_analysis, approvers=["compliance-officer"]),
],
rationale="Parallel data gathering, sequential analysis, human review before report generation",
)
# In propose-approve mode, this blocks until supervisor approves
return plan
8.2 Supervisor Hooks#
from openintent import on_plan_proposed, on_escalation, on_guardrail_warning
@on_plan_proposed
async def review_plan(ctx: SupervisorContext, plan: Plan):
"""Called when a coordinator proposes a plan for review."""
if plan.estimated_cost_usd > 30:
return ctx.reject(reason="Budget too high, simplify the plan")
return ctx.approve()
@on_escalation
async def handle_escalation(ctx: SupervisorContext, escalation: Escalation):
"""Called when a coordinator escalates a decision."""
return ctx.decide(
action="proceed",
guidance="Use the conservative interpretation of section 4.2",
)
@on_guardrail_warning
async def handle_guardrail(ctx: SupervisorContext, warning: GuardrailWarning):
"""Called when a coordinator approaches a guardrail limit."""
if warning.guardrail == "budget" and warning.percentage > 90:
return ctx.adjust_guardrail(max_budget_usd=75.0)
8.3 Coordinator Context API#
class CoordinatorContext:
intent: Intent # The intent being coordinated
plan: Plan | None # Current plan (if exists)
guardrails: Guardrails # Active guardrails
supervisor: AgentRef # Supervisor reference
async def propose_plan(self, tasks, checkpoints, rationale) -> Plan:
"""Create a plan and submit for review (if required by guardrails)."""
async def replan(self, reason: str, changes: PlanChanges) -> Plan:
"""Modify the active plan with a new version."""
async def discover_agents(self, capabilities: list[str]) -> list[Agent]:
"""Find agents matching required capabilities."""
async def assign_task(self, task_id: str, agent_id: str, rationale: str) -> None:
"""Assign a task to an agent (creates decision record)."""
async def escalate(self, reason: str, context: dict = None) -> Decision:
"""Escalate to supervisor, blocks until resolved."""
async def record_decision(self, decision_type: str, summary: str,
rationale: str, confidence: float = None) -> None:
"""Explicitly log a coordination decision."""
async def heartbeat(self, status_summary: str = None) -> None:
"""Send a heartbeat (automatic in SDK, manual override available)."""
async def handoff(self, to_coordinator: str, reason: str) -> None:
"""Transfer coordination to another coordinator."""
async def check_guardrails(self) -> GuardrailStatus:
"""Check current guardrail usage and remaining budget."""
8.4 YAML Workflow Integration#
name: quarterly_compliance
version: "1.0"
coordinator:
agent: llm-coordinator
type: composite
mode: propose-approve
supervisor: compliance-officer
guardrails:
max_budget_usd: 50
max_tasks_per_plan: 20
max_delegation_depth: 3
requires_plan_review: true
auto_escalate_after_failures: 3
require_progress_every_minutes: 30
heartbeat_interval: 60
failover:
pool: [llm-coordinator-backup]
grace_period_seconds: 180
intents:
compliance_report:
description: "Generate quarterly compliance report"
permissions:
policy: restricted
allow:
- agent: llm-coordinator
grant: [coordinate, delegate]
- agent: data-agent
grant: [execute]
- agent: compliance-officer
grant: [approve, supervise]
plan:
tasks:
- name: fetch_financials
capabilities: [data_access, finance]
timeout: 300
- name: fetch_hr_data
capabilities: [data_access, hr]
timeout: 300
- name: run_analysis
capabilities: [analytics]
depends_on: [fetch_financials, fetch_hr_data]
- name: generate_report
capabilities: [reporting]
depends_on: [run_analysis]
checkpoints:
- after: run_analysis
requires_approval: true
approvers: [compliance-officer]
on_failure: pause_and_escalate
9. API Endpoints#
Coordinator Management#
| Method | Path | Description |
|---|---|---|
POST |
/v1/coordinators |
Register a coordinator |
GET |
/v1/coordinators/{id} |
Get coordinator details |
PATCH |
/v1/coordinators/{id} |
Update coordinator registration |
POST |
/v1/intents/{id}/coordinator |
Assign coordinator to intent |
DELETE |
/v1/intents/{id}/coordinator |
Remove coordinator from intent |
POST |
/v1/coordinators/{id}/heartbeat |
Send heartbeat |
POST |
/v1/coordinators/{id}/handoff |
Initiate coordinator handoff |
Guardrails#
| Method | Path | Description |
|---|---|---|
GET |
/v1/coordinators/{id}/guardrails |
Get active guardrails |
PATCH |
/v1/coordinators/{id}/guardrails |
Update guardrails (supervisor only) |
GET |
/v1/coordinators/{id}/guardrails/status |
Get guardrail usage and remaining budget |
Decision Records#
| Method | Path | Description |
|---|---|---|
GET |
/v1/intents/{id}/decisions |
List decision records for an intent |
GET |
/v1/decisions/{id} |
Get a specific decision record |
POST |
/v1/intents/{id}/decisions |
Create a decision record |
Supervisor Actions#
| Method | Path | Description |
|---|---|---|
POST |
/v1/coordinators/{id}/pause |
Pause coordinator |
POST |
/v1/coordinators/{id}/resume |
Resume coordinator |
POST |
/v1/coordinators/{id}/replace |
Replace coordinator |
POST |
/v1/plans/{id}/approve |
Approve a plan at review gate |
POST |
/v1/plans/{id}/reject |
Reject a plan at review gate |
All endpoints support optimistic concurrency via If-Match headers.
10. Event Types#
| Event | Data |
|---|---|
coordinator.registered |
coordinator_id, type, capabilities |
coordinator.assigned |
coordinator_id, intent_id, supervisor_id |
coordinator.heartbeat |
coordinator_id, active_tasks, budget_used |
coordinator.paused |
coordinator_id, paused_by, reason |
coordinator.resumed |
coordinator_id, resumed_by |
coordinator.unresponsive |
coordinator_id, last_heartbeat, missed_count |
coordinator.failed_over |
old_coordinator_id, new_coordinator_id, state_transferred |
coordinator.handoff |
from_coordinator, to_coordinator, reason |
coordinator.replaced |
old_coordinator_id, new_coordinator_id, replaced_by, reason |
coordinator.completed |
coordinator_id, summary |
coordinator.decision |
decision_type, summary, rationale, confidence |
coordinator.escalation_initiated |
coordinator_id, reason, escalated_to |
coordinator.escalation_resolved |
coordinator_id, resolved_by, action |
coordinator.plan_overridden |
coordinator_id, overridden_by, old_plan_version, new_plan_version |
coordinator.guardrails_updated |
coordinator_id, updated_by, changes |
coordinator.guardrail_warning |
coordinator_id, guardrail, current_value, limit |
coordinator.guardrail_violation |
coordinator_id, guardrail, attempted_value, limit |
plan.approved_by_supervisor |
plan_id, approved_by |
plan.rejected_by_supervisor |
plan_id, rejected_by, reason |
decision.auto_approved |
decision_id, confidence, threshold |
11. Interaction with Existing RFCs#
| RFC | Interaction |
|---|---|
| RFC-0001 (Intents) | Coordinators are assigned to intents. Intent state reflects coordinator status. |
| RFC-0003 (Leasing) | Coordinator lease extends lease model with heartbeat, guardrails, and supervisor. |
| RFC-0004 (Governance) | Coordinator decisions integrate with governance pipeline. Supervisor actions use arbitration. |
| RFC-0006 (Subscriptions) | Clients can subscribe to coordinator events (heartbeats, decisions, escalations). |
| RFC-0007 (Portfolios) | Coordinator lease can scope to a portfolio for cross-intent coordination. |
| RFC-0009 (Cost Tracking) | Budget guardrails integrate with cost tracking. Decision records include cost impact. |
| RFC-0010 (Retry Policies) | Coordinator failure triggers retry/failover. Task retry counts towards guardrail limits. |
| RFC-0011 (Access Control) | New permission grants (coordinate, approve, supervise). Task-level delegation scoping. |
| RFC-0012 (Tasks & Plans) | Coordinator creates and manages plans and tasks. Plan review gates are coordinator-level checkpoints. |
Open Questions#
-
Coordinator-to-coordinator trust: When a coordinator delegates sub-coordination to another LLM coordinator, what trust model applies? Should there be capability attestation or reputation scoring?
-
Decision record granularity: How much detail should decision records contain? Every micro-decision (which agent to pick) vs. only major decisions (plan creation, re-planning)? Too much noise defeats the purpose of auditability.
-
Guardrail inheritance: Should sub-coordinators inherit their parent coordinator's guardrails, or should they be independently configured? Inheritance is simpler but may be too restrictive.
-
Multi-coordinator consensus: For high-stakes intents, should the protocol support multiple coordinators that must reach consensus before acting? This adds resilience but also complexity and latency.