RFC-0026: Suspension Propagation & Retry v1.0
Status: Accepted
Version: v0.17.0
Date: 2026-03-24
Authors: OpenIntent Working Group
Extends: RFC-0025 (Human-in-the-Loop)
Abstract
RFC-0025 introduced suspended_awaiting_input as an intent-level lifecycle state but left three gaps: (1) how suspension interacts with container structures (intent graphs, portfolios, plans, workflows); (2) a single-shot timeout model with no re-notification or escalation ladder; and (3) no platform- or agent-level default for suspension policy. RFC-0026 closes all three gaps in a single coherent extension so the protocol has complete, end-to-end coverage of human engagement.
1. Motivation
Gap 1: Container semantics
RFC-0025 defines suspension at the intent level but does not specify how containers observe it:
- RFC-0002 (Intent Graphs): aggregate_status.by_status has no entry for suspended_awaiting_input. The completion gate does not explicitly say whether a suspended dependency satisfies it. No active → blocked trigger is defined for upstream suspension.
- RFC-0007 (Portfolios): The aggregate status algorithm does not enumerate suspension. The GET response has no suspension-aware fields.
- RFC-0012 (Plans & Tasks): The task blocked state was designed before RFC-0025 and has no defined relationship to suspended_awaiting_input. When a phase-agent calls request_input(), the plan task does not transition to blocked. The coordinator sees the task as still running.
- RFC-0024 (Workflows): validate_claim_inputs() has no rejection reason for upstream suspension. The workflow progress object has no suspended_phases field.
Gap 2: Human retry / re-notification
The RFC-0025 timeout model is single-shot: one window, then the fallback policy fires. A single missed Slack notification should not immediately trigger complete_with_fallback="deny". Systems need a grace sequence: notify once, re-notify, escalate, and only then fail.
Gap 3: Platform and agent-level defaults
Every request_input() call must specify its own policy from scratch. There is no platform-level constant and no agent-level default.
2. Container Rules (Five, Non-Negotiable)
Rule 1: Suspension is always intent-local
Only the suspended intent changes to suspended_awaiting_input. Container structures (parent intents, portfolios, plans, workflows) observe it; they never absorb it into their own state. A portfolio does not become suspended because a member is suspended.
Rule 2: Suspended intent is "not completed"; dependents stay blocked
suspended_awaiting_input does NOT satisfy the RFC-0002 completion gate. A dependent intent that is active and whose upstream suspends MUST transition to blocked (new active → blocked trigger). Auto-unblock fires when the dependency resumes and subsequently completes, not on resume alone.
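The gate and the new trigger can be sketched in a few lines. This is a minimal illustration rather than the reference implementation: the function names and the plain-string status representation are assumptions made for the example.

```python
# Hypothetical helpers illustrating Rule 2; these names are not part of any SDK.
SUSPENDED = "suspended_awaiting_input"


def gate_satisfied(dependency_statuses):
    """The completion gate passes only when every dependency is completed."""
    return all(status == "completed" for status in dependency_statuses)


def next_dependent_status(current, dependency_statuses):
    """Return the dependent intent's status after an upstream status change."""
    if current == "active" and SUSPENDED in dependency_statuses:
        # New active -> blocked trigger: an upstream dependency has suspended.
        return "blocked"
    if current == "blocked" and gate_satisfied(dependency_statuses):
        # Auto-unblock requires completion, not merely resumption.
        return "active"
    return current


print(next_dependent_status("active", [SUSPENDED]))     # blocked
print(next_dependent_status("blocked", ["active"]))     # blocked
print(next_dependent_status("blocked", ["completed"]))  # active
```

The second call models an upstream that has resumed but not yet completed: the dependent stays blocked, as the rule requires.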
Rule 3: RFC-0012 tasks mirror intent suspension bidirectionally
When an intent transitions to suspended_awaiting_input, its corresponding plan task MUST transition to blocked with blocked_reason: "intent_suspended" and suspended_intent_id set to the ID of the suspended intent.
On intent.resumed, the task transitions back to running. RFC-0012 checkpoints that require human approval SHOULD be implemented via RFC-0025 request_input(); this is the canonical pattern going forward.
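As a rough sketch of the mirroring, assuming a plain dict task record with a hypothetical "state" key (the real RFC-0012 task schema may name this differently):

```python
# Hypothetical sketch of Rule 3; the dict-based task record is an assumption.
def mirror_intent_event(task, event_type, intent_id):
    """Return a copy of `task` updated to mirror an intent lifecycle event."""
    updated = dict(task)
    if event_type == "intent.suspended":
        updated["state"] = "blocked"
        updated["blocked_reason"] = "intent_suspended"
        updated["suspended_intent_id"] = intent_id
    elif (
        event_type == "intent.resumed"
        and task.get("blocked_reason") == "intent_suspended"
    ):
        # Only undo a block that was caused by suspension.
        updated["state"] = "running"
        updated.pop("blocked_reason", None)
        updated.pop("suspended_intent_id", None)
    return updated


task = {"task_id": "task_01XYZ", "state": "running"}
blocked = mirror_intent_event(task, "intent.suspended", "intent_01ABC")
resumed = mirror_intent_event(blocked, "intent.resumed", "intent_01ABC")
print(blocked["state"], resumed["state"])  # blocked running
```

Guarding the resume branch on blocked_reason keeps a task that is blocked for some other reason from being released by an unrelated intent.resumed event; that guard is a design choice of this sketch, not something the RFC mandates.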
Rule 4: Container aggregates gain suspension-aware fields
RFC-0002 parent intents:
{
  "aggregate_status": {
    "total": 6,
    "by_status": {
      "completed": 3,
      "active": 2,
      "blocked": 0,
      "suspended_awaiting_input": 1
    }
  }
}
RFC-0007 portfolios:
Portfolio GET response adds two fields: has_suspended_members (bool) and suspended_member_count (int).
Aggregate status algorithm (revised):
| Condition | Aggregate status |
|---|---|
| All members completed | completed |
| Any member failed or abandoned | failed |
| Otherwise (including any suspended) | in_progress |
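The revised algorithm and the two new GET fields could be computed together along these lines. The function name, the "aggregate_status" key, and the treatment of an empty portfolio as in_progress are assumptions of this sketch.

```python
from collections import Counter

SUSPENDED = "suspended_awaiting_input"


def portfolio_summary(member_statuses):
    """Derive aggregate status and suspension fields from member statuses."""
    counts = Counter(member_statuses)
    if member_statuses and counts["completed"] == len(member_statuses):
        aggregate = "completed"
    elif counts["failed"] or counts["abandoned"]:
        aggregate = "failed"
    else:
        # Suspended members never change the portfolio's own state (Rule 1).
        aggregate = "in_progress"
    return {
        "aggregate_status": aggregate,  # hypothetical key name
        "has_suspended_members": counts[SUSPENDED] > 0,
        "suspended_member_count": counts[SUSPENDED],
    }


print(portfolio_summary(["completed", "active", SUSPENDED]))
```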
RFC-0012 plans:
Plan progress object gains:
{
  "suspended_tasks": [
    {
      "task_id": "task_01XYZ",
      "intent_id": "intent_01ABC",
      "suspended_since": "2026-03-24T10:00:00Z",
      "expires_at": "2026-03-24T13:00:00Z"
    }
  ]
}
RFC-0024 workflows:
Workflow progress object gains:
{
  "suspended_phases": [
    {
      "phase_name": "compliance_review",
      "intent_id": "intent_01ABC",
      "suspended_since": "2026-03-24T10:00:00Z",
      "expires_at": "2026-03-24T13:00:00Z"
    }
  ]
}
Rule 5: Portfolio deadline takes precedence over suspension timeout
If governance.deadline fires while a member intent is suspended_awaiting_input, the server MUST abandon the intent with abandonment_reason: "portfolio_deadline_exceeded", bypassing fallback_policy. intent.suspension_expired is still emitted with reason: "portfolio_deadline" for audit.
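One way to picture the precedence is a single decision function evaluated when a timer fires. The function name, the returned dict shape, and the "action" values are hypothetical; only the abandonment reason, the event name, and the audit reason come from the rule above.

```python
from datetime import datetime, timezone


def resolve_expiry(now, suspension_expires_at, portfolio_deadline, fallback_policy):
    """Decide what happens to a suspended member intent at time `now`."""
    if portfolio_deadline is not None and now >= portfolio_deadline:
        # The portfolio deadline wins and fallback_policy is bypassed.
        return {
            "action": "abandon",
            "abandonment_reason": "portfolio_deadline_exceeded",
            "event": {
                "type": "intent.suspension_expired",
                "reason": "portfolio_deadline",
            },
        }
    if now >= suspension_expires_at:
        return {
            "action": "apply_fallback",
            "fallback_policy": fallback_policy,
            "event": {"type": "intent.suspension_expired"},
        }
    return {"action": "wait"}


now = datetime(2026, 3, 24, 12, 0, tzinfo=timezone.utc)
deadline = datetime(2026, 3, 24, 11, 30, tzinfo=timezone.utc)
expires = datetime(2026, 3, 24, 13, 0, tzinfo=timezone.utc)
print(resolve_expiry(now, expires, deadline, "fail")["action"])  # abandon
```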
3. Coordinator Suspension Policy (RFC-0013 Extension)
Coordinator leases gain an optional suspension_policy field:
| Value | Behaviour |
|---|---|
| isolate | Default. No action beyond aggregate status update. |
| block_dependents | Coordinator explicitly pauses RFC-0024-wired downstream phases. |
| escalate | Emits coordinator.escalation_required or self-suspends. |
4. Human Retry / Re-notification Policy
4.1 The HumanRetryPolicy Object
{
  "max_attempts": 3,
  "interval_seconds": 3600,
  "strategy": "fixed",
  "escalation_ladder": [
    { "attempt": 2, "channel_hint": "email", "notify_to": null },
    { "attempt": 3, "channel_hint": "pagerduty", "notify_to": "supervisor@example.com" }
  ],
  "final_fallback_policy": "fail"
}
| Field | Type | Default | Description |
|---|---|---|---|
| max_attempts | integer | 1 | Total notification attempts (including initial) |
| interval_seconds | integer | (none) | Seconds between re-notification attempts (≤ timeout_seconds) |
| strategy | "fixed" | "fixed" | Re-notification cadence strategy |
| escalation_ladder | array | [] | Per-attempt channel/recipient overrides |
| final_fallback_policy | enum | (inherited) | Policy to apply after all attempts exhausted |
4.2 How It Works
- Attempt 1 fires immediately when request_input() is called. timeout_seconds becomes the per-attempt window.
- If the operator does not respond within interval_seconds (≤ timeout_seconds), a re-notification fires and the attempt counter increments.
- Each escalation_ladder entry triggers at its attempt number, overriding channel_hint and optionally routing to a different notify_to identity.
- After max_attempts notifications with no response, final_fallback_policy is applied.
- Total suspension window = interval_seconds × max_attempts. expires_at on SuspensionRecord reflects this total deadline.
- suspension_id is unchanged across all attempts; the operator can respond to the original request at any point.
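The timing described in this list can be worked through with a small sketch. The function name is hypothetical, and the "fixed" strategy is assumed; the arithmetic follows the stated rule that the total window equals interval_seconds × max_attempts.

```python
from datetime import datetime, timedelta, timezone


def notification_schedule(suspended_at, interval_seconds, max_attempts):
    """Return (attempt_times, expires_at) for the fixed cadence strategy."""
    step = timedelta(seconds=interval_seconds)
    # Attempt 1 fires immediately; each later attempt fires one interval on.
    attempts = [suspended_at + step * i for i in range(max_attempts)]
    # Total suspension window = interval_seconds * max_attempts.
    expires_at = suspended_at + step * max_attempts
    return attempts, expires_at


start = datetime(2026, 3, 24, 10, 0, tzinfo=timezone.utc)
attempts, expires_at = notification_schedule(start, 3600, 3)
for number, when in enumerate(attempts, start=1):
    print(number, when.isoformat())
print("expires_at", expires_at.isoformat())
```

With a 10:00 start, an interval of 3600 seconds, and three attempts, notifications fire at 10:00, 11:00, and 12:00, and expires_at lands at 13:00.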
4.3 Backwards Compatibility
The existing fallback_policy field on SuspensionRecord is kept as an alias:
- fallback_policy with no retry_policy is equivalent to HumanRetryPolicy(max_attempts=1).
- When a retry_policy is present, final_fallback_policy inside it takes precedence over the top-level fallback_policy.
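A minimal sketch of how a server might normalise the two fields into one effective policy. The function name and the dict representation are assumptions; so is the reading that a retry_policy without its own final_fallback_policy falls back to the top-level fallback_policy.

```python
def effective_retry_policy(fallback_policy=None, retry_policy=None):
    """Normalise fallback_policy / retry_policy into one policy dict."""
    if retry_policy is None:
        # Legacy single-shot behaviour: one attempt, then the fallback.
        return {"max_attempts": 1, "final_fallback_policy": fallback_policy}
    resolved = dict(retry_policy)
    if resolved.get("final_fallback_policy") is None:
        # Assumed meaning of "(inherited)": use the top-level value.
        resolved["final_fallback_policy"] = fallback_policy
    return resolved


print(effective_retry_policy(fallback_policy="fail"))
print(
    effective_retry_policy(
        fallback_policy="fail",
        retry_policy={
            "max_attempts": 3,
            "final_fallback_policy": "complete_with_fallback",
        },
    )
)
```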
4.4 New Events
| Event | When emitted |
|---|---|
| intent.suspension_renotified | Before each re-notification attempt (attempt ≥ 2) |
| intent.suspension_escalated | When an escalation_ladder entry triggers |
intent.suspension_renotified payload:
{
  "suspension_id": "susp-uuid",
  "attempt": 2,
  "max_attempts": 3,
  "channel_hint": "email",
  "notify_to": null,
  "next_attempt_at": "2026-03-24T11:00:00Z"
}
intent.suspension_escalated payload:
{
  "suspension_id": "susp-uuid",
  "attempt": 3,
  "escalated_to": "supervisor@example.com",
  "channel_hint": "pagerduty"
}
The existing intent.suspension_expired event fires after all attempts are exhausted; final_fallback_policy then executes.
4.5 @on_input_requested Re-fired on Each Attempt
The handler registered with the existing @on_input_requested decorator is called again on each re-notification, with the attempt number available in the suspension context. Agents can use it to customize messages:
@on_input_requested
async def notify_operator(self, intent, suspension):
    # Read the attempt counters from the suspension context.
    attempt = suspension.context.get("_attempt", 1)
    max_attempts = suspension.context.get("_max_attempts", 1)
    if attempt == 1:
        msg = f"Input needed: {suspension.question}"
    elif attempt < max_attempts:
        msg = f"Reminder ({attempt}): {suspension.question}"
    else:
        # Last attempt before final_fallback_policy is applied.
        msg = f"URGENT: final reminder: {suspension.question}"
    await send_notification(msg, channel=suspension.channel_hint)
5. Three-Level Configuration Cascade
server config            →  BaseAgent default        →  request_input() call
─────────────────────       ────────────────────────    ──────────────────────
default_human_retry_        default_human_retry_        retry_policy=
policy: {                   policy: {                   HumanRetryPolicy(
  max_attempts: 3,            max_attempts: 2,            max_attempts: 1,
  interval_seconds: 3600      interval_seconds: 1800      interval_seconds: 300
}                           }                           )
Resolution: per-suspension overrides agent default overrides platform default. Any field not specified at a lower level inherits from the level above.
Platform constant location: Server config file (openintent.yaml) under suspension.default_retry_policy. Exposed via GET /v1/server/config (read-only, for client introspection).
Agent-level default: BaseAgent.default_human_retry_policy — a HumanRetryPolicy instance set in the agent definition or __init__. If None, platform default applies.
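The field-by-field inheritance can be sketched as a layered merge. This is an illustration under stated assumptions: policies are modelled as plain dicts, a missing key or a None value means "not specified at this level", and resolve_policy is a hypothetical name.

```python
def resolve_policy(platform=None, agent=None, call=None):
    """Merge the three levels; more specific levels override earlier ones."""
    resolved = {}
    for level in (platform, agent, call):
        for key, value in (level or {}).items():
            if value is not None:
                resolved[key] = value
    return resolved


platform = {"max_attempts": 3, "interval_seconds": 3600}
agent = {"max_attempts": 2, "interval_seconds": 1800}
call = {"interval_seconds": 300}  # max_attempts not specified per call

print(resolve_policy(platform, agent, call))
# {'max_attempts': 2, 'interval_seconds': 300}
```

Here the call overrides only interval_seconds, so max_attempts is inherited from the agent default, which in turn overrode the platform value.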
6. RFC-0024 Patch: validate_claim_inputs() Rejection Reason
When an agent attempts to claim a task whose declared inputs reference an upstream phase that is currently suspended_awaiting_input, validate_claim_inputs() MUST reject with:
raise UpstreamIntentSuspendedError(
    task_id=task_id,
    phase_name=phase_name,
    suspended_intent_id="<upstream_intent_id>",
    expected_resume_at="<ISO-8601 or None>",
)
This is a new exception type (upstream_intent_suspended) that the executor surfaces as a claim rejection reason. The downstream task stays in pending / ready state and retries the claim check after the upstream resumes.
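To make the rejection concrete, here is a self-contained sketch of the check. Everything except the exception name and its four keyword arguments is assumed: the stand-in WorkflowError base class, the check_upstream_suspension helper, the reason attribute, and the dict that maps phase names to upstream intent records are all hypothetical.

```python
class WorkflowError(Exception):
    """Stand-in base class; the real SDK hierarchy may differ."""


class UpstreamIntentSuspendedError(WorkflowError):
    reason = "upstream_intent_suspended"

    def __init__(self, task_id, phase_name, suspended_intent_id,
                 expected_resume_at=None):
        super().__init__(
            f"task {task_id}: upstream phase {phase_name!r} is suspended "
            f"(intent {suspended_intent_id})"
        )
        self.task_id = task_id
        self.phase_name = phase_name
        self.suspended_intent_id = suspended_intent_id
        self.expected_resume_at = expected_resume_at


def check_upstream_suspension(task_id, input_phases, phase_intents):
    """Raise if any phase the task reads from is awaiting human input."""
    for phase_name in input_phases:
        upstream = phase_intents[phase_name]
        if upstream["status"] == "suspended_awaiting_input":
            raise UpstreamIntentSuspendedError(
                task_id=task_id,
                phase_name=phase_name,
                suspended_intent_id=upstream["intent_id"],
                expected_resume_at=upstream.get("expected_resume_at"),
            )


phases = {
    "compliance_review": {
        "intent_id": "intent_01ABC",
        "status": "suspended_awaiting_input",
    }
}
try:
    check_upstream_suspension("task_03", ["compliance_review"], phases)
except UpstreamIntentSuspendedError as exc:
    print(exc.reason, exc.suspended_intent_id)
```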
7. Relationship to RFC-0010 (Retry Policies)
RFC-0010 defines retry when the agent fails (picks a new agent attempt). RFC-0026 defines retry when the human fails to respond (resends notification, escalates channel). They are parallel constructs at different layers:
| Dimension | RFC-0010 | RFC-0026 |
|---|---|---|
| What failed? | Agent execution | Human responsiveness |
| What retries? | Agent assignment | Human notification |
| State during retry | Intent may be reassigned | Intent stays suspended_awaiting_input |
| Infrastructure | Scheduled retry job | Scheduled re-notification job |
The server SHOULD use the same scheduled-job infrastructure for both.
8. Python SDK: HumanRetryPolicy
from openintent import HumanRetryPolicy

# Three attempts, one hour apart, escalating the channel on each retry.
policy = HumanRetryPolicy(
    max_attempts=3,
    interval_seconds=3600,
    escalation_ladder=[
        {"attempt": 2, "channel_hint": "email"},
        {"attempt": 3, "channel_hint": "pagerduty", "notify_to": "supervisor@example.com"},
    ],
    final_fallback_policy="fail",
)

# Called from inside an agent method.
value = await self.request_input(
    intent_id,
    question="Should we proceed with the refund?",
    response_type="choice",
    choices=[...],
    timeout_seconds=3600,
    retry_policy=policy,
)
BaseAgent gains default_human_retry_policy:
@Agent("my-agent")
class MyAgent:
    default_human_retry_policy = HumanRetryPolicy(
        max_attempts=2,
        interval_seconds=1800,
        final_fallback_policy="complete_with_fallback",
    )
9. End-to-End Motivating Example
Scenario: Multi-phase compliance workflow. Phase 2 (compliance_review) requires human sign-off before Phase 3 (generate_report) can run.
Phase 1: fetch_data → completes OK
Phase 2: compliance_review → agent calls request_input()
Phase 3: generate_report → depends_on: compliance_review
Timeline:
| Time | Event |
|---|---|
| T+0 | Phase 2 agent calls request_input() with retry_policy(max_attempts=3, interval_seconds=3600) |
| T+0 | Intent 2 → suspended_awaiting_input |
| T+0 | Task 2 → blocked (blocked_reason: "intent_suspended") |
| T+0 | Intent 3 → blocked (upstream suspended, does not satisfy completion gate) |
| T+0 | intent.suspended emitted; @on_input_requested fires (attempt=1) → Slack message sent |
| T+0 | Portfolio: has_suspended_members: true, suspended_member_count: 1 |
| T+3600 | No response. intent.suspension_renotified emitted (attempt=2) |
| T+3600 | @on_input_requested fires again (attempt=2, channel_hint="email") → email sent |
| T+3600 | intent.suspension_escalated emitted (attempt=2) |
| T+5400 | Operator responds via POST /intents/{id}/suspend/respond |
| T+5400 | Intent 2 → active → completed |
| T+5400 | Task 2 → running → completed |
| T+5400 | Intent 3 → active (dependency now completed) |
| T+5400 | Task 3 claims Phase 3 inputs from Phase 2 outputs — validate_claim_inputs() succeeds |
| T+5500 | Phase 3 completes. Workflow done. |
What did NOT happen: Phase 3 did not try to claim while Phase 2 was suspended. The coordinator saw the suspension in the aggregate. The portfolio deadline was not exceeded.
10. Cross-RFC Patch Summary
RFC-0002 patches
- Status enum: add suspended_awaiting_input to by_status in aggregate_status.
- Completion gate: explicitly states suspended_awaiting_input does NOT satisfy the gate.
- New active → blocked trigger: upstream dependency transitions to suspended_awaiting_input.
- Cross-RFC table: add RFC-0026.
RFC-0007 patches
- Aggregate status algorithm: enumerated explicitly (completed/failed/in_progress).
- GET response: add has_suspended_members: bool, suspended_member_count: int.
- New events: portfolio.member_suspended, portfolio.member_resumed.
- Cross-RFC table: add RFC-0026.
RFC-0012 patches
- Bidirectional task/intent relationship: task blocked ↔ intent suspended_awaiting_input.
- blocked_reason: "intent_suspended" and suspended_intent_id on blocked task.
- Plan progress: add suspended_tasks array.
- Checkpoints: explicitly documented as RFC-0025 request_input() triggers (canonical pattern).
- Cross-RFC table: add RFC-0026.
RFC-0024 patches
- validate_claim_inputs(): add upstream_intent_suspended rejection reason (UpstreamIntentSuspendedError).
- Workflow progress: add suspended_phases array.
- Cross-RFC table: add RFC-0026.
RFC-0025 patches
- SuspensionRecord: add retry_policy field (optional HumanRetryPolicy).
- timeout_seconds semantics: clarified as per-attempt window when retry_policy is set.
- fallback_policy: documented as alias for HumanRetryPolicy(max_attempts=1, final_fallback_policy=...).
- Cross-RFC table: add RFC-0026, RFC-0010.
- Backwards compatibility: note fallback_policy unchanged; retry_policy is additive.
RFC-0010 patches
- Cross-RFC table: add RFC-0026 with note on parallel retry constructs.
11. Security Considerations
- Re-notification payloads to external channels (Slack, PagerDuty) MUST NOT include secrets or PII in question or context fields.
- escalation_ladder.notify_to identity values should be validated against an allowlist before delivery.
- Multiple re-notification attempts increase the attack surface for replay; suspension_id SHOULD remain the same (see §4.2 item 6) and the server MUST reject duplicate responses after the first.
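Two of these considerations lend themselves to a short sketch: allowlisting notify_to values and rejecting duplicate responses. Both helpers below are hypothetical, and the in-memory set stands in for whatever durable store a real server would use.

```python
def validate_ladder(escalation_ladder, allowlist):
    """Raise ValueError if any notify_to identity is not allowlisted."""
    for entry in escalation_ladder:
        notify_to = entry.get("notify_to")
        if notify_to is not None and notify_to not in allowlist:
            raise ValueError(f"notify_to not allowed: {notify_to!r}")


class ResponseRegistry:
    """Accepts the first response per suspension_id and rejects the rest."""

    def __init__(self):
        self._answered = set()

    def accept(self, suspension_id):
        if suspension_id in self._answered:
            return False  # duplicate or replayed response
        self._answered.add(suspension_id)
        return True


ladder = [
    {"attempt": 2, "channel_hint": "email", "notify_to": None},
    {"attempt": 3, "channel_hint": "pagerduty",
     "notify_to": "supervisor@example.com"},
]
validate_ladder(ladder, allowlist={"supervisor@example.com"})

registry = ResponseRegistry()
print(registry.accept("susp-uuid"))  # True
print(registry.accept("susp-uuid"))  # False
```

Because suspension_id stays constant across attempts, a single registry entry covers replies to the original notification and to every reminder.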
12. Backwards Compatibility
- retry_policy on SuspensionRecord is optional. Existing fallback_policy field continues to work unchanged with single-attempt semantics.
- UpstreamIntentSuspendedError is a new exception class; callers that only catch UnresolvableInputError will see uncaught exceptions if they don't update. Callers should catch WorkflowError for robust handling.
- New events (intent.suspension_renotified, intent.suspension_escalated) are additive; existing subscriptions propagate them through the same infrastructure.
- has_suspended_members / suspended_member_count are additive fields on portfolio GET responses; existing clients that ignore unknown fields are unaffected.
- suspended_tasks / suspended_phases are additive fields on progress objects.