RFC-0008: LLM Integration & Observability v1.0#
Status: Proposed
Created: 2026-02-01
Authors: OpenIntent Contributors
Requires: RFC-0001 (Intents), RFC-0009 (Cost Tracking)
Abstract#
This RFC defines the integration layer between the OpenIntent protocol and Large Language Model providers. It standardizes how LLM calls are initiated, tracked, and audited within the protocol, including token usage, cost attribution, streaming support, and distributed tracing.
Motivation#
LLMs are the primary reasoning engine for most AI agents. The protocol must:
- Track token usage: Every LLM call consumes tokens with associated costs
- Attribute costs: Link LLM usage to specific intents and agents
- Enable observability: Distributed tracing across multi-model workflows
- Support streaming: Real-time token streaming for responsive agents
- Remain provider-neutral: Work with any LLM provider through adapters
Adapter Architecture#
The SDK provides a pluggable adapter system for LLM providers:
```python
from openintent.adapters import (
    OpenAIAdapter,
    AnthropicAdapter,
    GeminiAdapter,
    GrokAdapter,
    DeepSeekAdapter,
    AzureOpenAIAdapter,
    OpenRouterAdapter,
)
```
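For illustration, an agent constructs one adapter and talks to it through the shared interface. The constructor arguments shown here are assumptions for the sketch, not part of this RFC:

```python
from openintent.adapters import AnthropicAdapter

# Hypothetical constructor arguments; the real signature is SDK-defined.
adapter = AnthropicAdapter(model="claude-sonnet", api_key="...")

async def summarize(text: str):
    # Same call shape regardless of which provider adapter is plugged in.
    return await adapter.complete(
        [{"role": "user", "content": f"Summarize: {text}"}]
    )
```

Because every adapter implements the same interface, swapping providers is a one-line change.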
Base Adapter Interface#
All adapters implement a common interface:
```python
from typing import Any, AsyncIterator

class BaseAdapter:
    # Returns a single completion for the given messages.
    async def complete(self, messages: list[dict], **kwargs: Any) -> "LLMResponse": ...

    # Yields completion tokens as they arrive.
    async def stream(self, messages: list[dict], **kwargs: Any) -> AsyncIterator[str]: ...

    # Streaming hooks (fail-safe wrappers around user callbacks)
    def _invoke_stream_start(self, metadata: dict) -> None: ...
    def _invoke_on_token(self, token: str) -> None: ...
    def _invoke_stream_end(self, metadata: dict) -> None: ...
    def _invoke_stream_error(self, error: Exception) -> None: ...
```
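A minimal sketch of a concrete adapter, showing where the streaming hooks are invoked. The `EchoAdapter` and the `LLMResponse` stand-in below are illustrative assumptions; the real response type is defined by the SDK:

```python
from dataclasses import dataclass
from typing import Any, AsyncIterator

@dataclass
class LLMResponse:  # stand-in for the SDK type; fields are assumptions
    text: str
    prompt_tokens: int
    completion_tokens: int

class EchoAdapter(BaseAdapter):
    """Toy adapter that echoes input, used only to illustrate the contract."""

    async def complete(self, messages: list[dict], **kwargs: Any) -> LLMResponse:
        text = messages[-1]["content"]
        return LLMResponse(text=text, prompt_tokens=0, completion_tokens=0)

    async def stream(self, messages: list[dict], **kwargs: Any) -> AsyncIterator[str]:
        self._invoke_stream_start({"model": "echo"})
        try:
            for token in messages[-1]["content"].split():
                self._invoke_on_token(token)
                yield token + " "
        except Exception as err:
            self._invoke_stream_error(err)
            raise
        self._invoke_stream_end({"model": "echo"})
```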
Streaming Hooks#
Adapters support lifecycle hooks for observability:
```python
from openintent.adapters import AdapterConfig

config = AdapterConfig(
    on_stream_start=lambda meta: print(f"Stream started: {meta}"),
    on_token=lambda token: print(token, end=""),
    on_stream_end=lambda meta: print(f"\nStream ended: {meta}"),
    on_stream_error=lambda err: print(f"Error: {err}"),
)
```
All hooks use a fail-safe pattern — exceptions in hooks are caught and logged without breaking the main flow.
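A minimal sketch of that fail-safe pattern, assuming hooks live on an `AdapterConfig` stored at `self.config` (the storage location is an assumption) and using the standard `logging` module:

```python
import logging

logger = logging.getLogger("openintent.adapters")

class _HookInvoker:
    def _invoke_on_token(self, token: str) -> None:
        hook = getattr(self.config, "on_token", None)  # config layout assumed
        if hook is None:
            return
        try:
            hook(token)
        except Exception:
            # Fail-safe: a misbehaving hook is logged, never re-raised.
            logger.exception("on_token hook raised; continuing stream")
```

The other three `_invoke_*` hooks follow the same shape.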
Event Types#
LLM interactions produce protocol events:
| Event Type | Description |
|---|---|
| `llm_request_started` | LLM call initiated, with model and prompt token count |
| `llm_request_completed` | LLM call finished, with completion token count and latency |
| `llm_request_failed` | LLM call failed, with error details |
| `llm_stream_started` | Streaming response began |
| `llm_stream_completed` | Streaming response finished |
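For illustration, an adapter wrapper might emit these events around a completion call. The `emit_event` helper, its payload shape, and the `adapter.model` attribute are assumptions for this sketch, not a defined SDK API:

```python
import time

async def traced_complete(adapter, intent_id: str, messages: list[dict]):
    # emit_event(intent_id, event_type, payload) is hypothetical.
    emit_event(intent_id, "llm_request_started", {"model": adapter.model})
    start = time.monotonic()
    try:
        response = await adapter.complete(messages)
    except Exception as err:
        emit_event(intent_id, "llm_request_failed", {"error": str(err)})
        raise
    emit_event(intent_id, "llm_request_completed", {
        "completion_tokens": response.completion_tokens,
        "latency_ms": int((time.monotonic() - start) * 1000),
    })
    return response
```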
Cost Attribution#
Each LLM call automatically records cost data (RFC-0009):
```json
{
  "intent_id": "uuid",
  "agent_id": "agent-research",
  "cost_type": "tokens",
  "provider": "openai",
  "metadata": {
    "model": "gpt-4",
    "prompt_tokens": 1200,
    "completion_tokens": 300,
    "total_tokens": 1500,
    "latency_ms": 2340
  }
}
```
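A sketch of assembling that record from a completed call; the field names mirror the JSON above, while the helper itself is hypothetical rather than a defined SDK function:

```python
def build_cost_record(intent_id: str, agent_id: str, provider: str,
                      model: str, response, latency_ms: int) -> dict:
    # response is assumed to expose prompt_tokens / completion_tokens.
    return {
        "intent_id": intent_id,
        "agent_id": agent_id,
        "cost_type": "tokens",
        "provider": provider,
        "metadata": {
            "model": model,
            "prompt_tokens": response.prompt_tokens,
            "completion_tokens": response.completion_tokens,
            "total_tokens": response.prompt_tokens + response.completion_tokens,
            "latency_ms": latency_ms,
        },
    }
```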
Distributed Tracing#
LLM calls are integrated with distributed tracing (OpenTelemetry compatible), as sketched after this list:
- Each LLM call creates a span linked to the parent intent
- Spans include model, token counts, latency, and provider
- Trace context propagates across agent-to-agent coordination
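A minimal OpenTelemetry sketch of the per-call span; the attribute names are illustrative, and only the `opentelemetry` API calls are real:

```python
from opentelemetry import trace

tracer = trace.get_tracer("openintent.adapters")

async def complete_with_span(adapter, intent_id: str, messages: list[dict]):
    # Opens a child span of whatever span the parent intent created in
    # the current context, so the call nests under the intent's trace.
    with tracer.start_as_current_span("llm.complete") as span:
        span.set_attribute("openintent.intent_id", intent_id)
        span.set_attribute("llm.provider", getattr(adapter, "provider", "unknown"))
        response = await adapter.complete(messages)
        span.set_attribute("llm.prompt_tokens", response.prompt_tokens)
        span.set_attribute("llm.completion_tokens", response.completion_tokens)
        return response
```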
Context Packing#
The adapter layer manages the context-window packing strategy, sketched after this list:
- Selects which memory entries (RFC-0015) to include in prompts
- Respects token limits per model
- Prioritizes recent and relevant context
- Supports structured context injection from permissions (RFC-0011)
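A simplified packing sketch under stated assumptions: each memory entry carries a combined recency/relevance score and a precomputed token count, neither of which is mandated by this RFC:

```python
def pack_context(entries: list[dict], token_budget: int) -> list[dict]:
    # Highest-scoring entries first; skip anything that would overflow
    # the model's remaining token budget.
    ranked = sorted(entries, key=lambda e: e["score"], reverse=True)
    packed, used = [], 0
    for entry in ranked:
        if used + entry["token_count"] > token_budget:
            continue
        packed.append(entry)
        used += entry["token_count"]
    return packed
```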
Cross-RFC Interactions#
| RFC | Interaction |
|---|---|
| RFC-0009 (Costs) | Automatic token and cost tracking per LLM call |
| RFC-0011 (Access) | Context injection based on agent permissions |
| RFC-0012 (Tasks) | LLM adapters used within task execution |
| RFC-0015 (Memory) | Memory entries packed into LLM context windows |