RFC-0008: LLM Integration & Observability v1.0#
Status: Proposed
Created: 2026-02-01
Authors: OpenIntent Contributors
Requires: RFC-0001 (Intents), RFC-0009 (Cost Tracking)
Abstract#
This RFC defines the integration layer between the OpenIntent protocol and Large Language Model providers. It standardizes how LLM calls are initiated, tracked, and audited within the protocol, including token usage, cost attribution, streaming support, and distributed tracing.
Motivation#
LLMs are the primary reasoning engine for most AI agents. The protocol must:
- Track token usage: Every LLM call consumes tokens with associated costs
- Attribute costs: Link LLM usage to specific intents and agents
- Enable observability: Distributed tracing across multi-model workflows
- Support streaming: Real-time token streaming for responsive agents
- Remain provider-neutral: Work with any LLM provider through adapters
Adapter Architecture#
The SDK provides a pluggable adapter system for LLM providers:
```python
from openintent.adapters import (
    OpenAIAdapter,
    AnthropicAdapter,
    GeminiAdapter,
    GrokAdapter,
    DeepSeekAdapter,
    AzureOpenAIAdapter,
    OpenRouterAdapter,
)
```
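For illustration, an agent constructs one adapter and talks to it through the shared interface. The constructor arguments shown here are assumptions for the sketch, not part of this RFC:

```python
from openintent.adapters import AnthropicAdapter

# Hypothetical constructor arguments; the real signature is SDK-defined.
adapter = AnthropicAdapter(model="claude-sonnet", api_key="...")

async def summarize(text: str):
    # Same call shape regardless of which provider adapter is plugged in.
    return await adapter.complete(
        [{"role": "user", "content": f"Summarize: {text}"}]
    )
```

Because every adapter implements the same interface, swapping providers is a one-line change.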
Base Adapter Interface#
All adapters implement a common interface:
```python
from typing import Any, AsyncIterator

class BaseAdapter:
    # Returns a single completion for the given messages.
    async def complete(self, messages: list[dict], **kwargs: Any) -> "LLMResponse": ...

    # Yields completion tokens as they arrive.
    async def stream(self, messages: list[dict], **kwargs: Any) -> AsyncIterator[str]: ...

    # Streaming hooks (fail-safe wrappers around user callbacks)
    def _invoke_stream_start(self, metadata: dict) -> None: ...
    def _invoke_on_token(self, token: str) -> None: ...
    def _invoke_stream_end(self, metadata: dict) -> None: ...
    def _invoke_stream_error(self, error: Exception) -> None: ...
```
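A minimal sketch of a concrete adapter, showing where the streaming hooks are invoked. The `EchoAdapter` and the `LLMResponse` stand-in below are illustrative assumptions; the real response type is defined by the SDK:

```python
from dataclasses import dataclass
from typing import Any, AsyncIterator

@dataclass
class LLMResponse:  # stand-in for the SDK type; fields are assumptions
    text: str
    prompt_tokens: int
    completion_tokens: int

class EchoAdapter(BaseAdapter):
    """Toy adapter that echoes input, used only to illustrate the contract."""

    async def complete(self, messages: list[dict], **kwargs: Any) -> LLMResponse:
        text = messages[-1]["content"]
        return LLMResponse(text=text, prompt_tokens=0, completion_tokens=0)

    async def stream(self, messages: list[dict], **kwargs: Any) -> AsyncIterator[str]:
        self._invoke_stream_start({"model": "echo"})
        try:
            for token in messages[-1]["content"].split():
                self._invoke_on_token(token)
                yield token + " "
        except Exception as err:
            self._invoke_stream_error(err)
            raise
        self._invoke_stream_end({"model": "echo"})
```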
Streaming Hooks#
Adapters support lifecycle hooks for observability:
```python
from openintent.adapters import AdapterConfig

config = AdapterConfig(
    on_stream_start=lambda meta: print(f"Stream started: {meta}"),
    on_token=lambda token: print(token, end=""),
    on_stream_end=lambda meta: print(f"\nStream ended: {meta}"),
    on_stream_error=lambda err: print(f"Error: {err}"),
)
```
All hooks use a fail-safe pattern — exceptions in hooks are caught and logged without breaking the main flow.
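A minimal sketch of that fail-safe pattern, assuming hooks live on an `AdapterConfig` stored at `self.config` (the storage location is an assumption) and using the standard `logging` module:

```python
import logging

logger = logging.getLogger("openintent.adapters")

class _HookInvoker:
    def _invoke_on_token(self, token: str) -> None:
        hook = getattr(self.config, "on_token", None)  # config layout assumed
        if hook is None:
            return
        try:
            hook(token)
        except Exception:
            # Fail-safe: a misbehaving hook is logged, never re-raised.
            logger.exception("on_token hook raised; continuing stream")
```

The other three `_invoke_*` hooks follow the same shape.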
Event Types#
LLM interactions produce protocol events:
| Event Type | Description |
|---|---|
| `llm_request_started` | LLM call initiated, with model and prompt token count |
| `llm_request_completed` | LLM call finished, with completion token count and latency |
| `llm_request_failed` | LLM call failed, with error details |
| `llm_stream_started` | Streaming response began |
| `llm_stream_completed` | Streaming response finished |
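For illustration, an adapter wrapper might emit these events around a completion call. The `emit_event` helper, its payload shape, and the `adapter.model` attribute are assumptions for this sketch, not a defined SDK API:

```python
import time

async def traced_complete(adapter, intent_id: str, messages: list[dict]):
    # emit_event(intent_id, event_type, payload) is hypothetical.
    emit_event(intent_id, "llm_request_started", {"model": adapter.model})
    start = time.monotonic()
    try:
        response = await adapter.complete(messages)
    except Exception as err:
        emit_event(intent_id, "llm_request_failed", {"error": str(err)})
        raise
    emit_event(intent_id, "llm_request_completed", {
        "completion_tokens": response.completion_tokens,
        "latency_ms": int((time.monotonic() - start) * 1000),
    })
    return response
```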
Cost Attribution#
Each LLM call automatically records cost data (RFC-0009):
```json
{
  "intent_id": "uuid",
  "agent_id": "agent-research",
  "cost_type": "tokens",
  "provider": "openai",
  "metadata": {
    "model": "gpt-4",
    "prompt_tokens": 1200,
    "completion_tokens": 300,
    "total_tokens": 1500,
    "latency_ms": 2340
  }
}
```
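A sketch of assembling that record from a completed call; the field names mirror the JSON above, while the helper itself is hypothetical rather than a defined SDK function:

```python
def build_cost_record(intent_id: str, agent_id: str, provider: str,
                      model: str, response, latency_ms: int) -> dict:
    # response is assumed to expose prompt_tokens / completion_tokens.
    return {
        "intent_id": intent_id,
        "agent_id": agent_id,
        "cost_type": "tokens",
        "provider": provider,
        "metadata": {
            "model": model,
            "prompt_tokens": response.prompt_tokens,
            "completion_tokens": response.completion_tokens,
            "total_tokens": response.prompt_tokens + response.completion_tokens,
            "latency_ms": latency_ms,
        },
    }
```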
Distributed Tracing#
LLM calls are integrated with distributed tracing (OpenTelemetry compatible), as sketched after this list:
- Each LLM call creates a span linked to the parent intent
- Spans include model, token counts, latency, and provider
- Trace context propagates across agent-to-agent coordination
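A minimal OpenTelemetry sketch of the per-call span; the attribute names are illustrative, and only the `opentelemetry` API calls are real:

```python
from opentelemetry import trace

tracer = trace.get_tracer("openintent.adapters")

async def complete_with_span(adapter, intent_id: str, messages: list[dict]):
    # Opens a child span of whatever span the parent intent created in
    # the current context, so the call nests under the intent's trace.
    with tracer.start_as_current_span("llm.complete") as span:
        span.set_attribute("openintent.intent_id", intent_id)
        span.set_attribute("llm.provider", getattr(adapter, "provider", "unknown"))
        response = await adapter.complete(messages)
        span.set_attribute("llm.prompt_tokens", response.prompt_tokens)
        span.set_attribute("llm.completion_tokens", response.completion_tokens)
        return response
```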
Context Packing#
The adapter layer manages the context-window packing strategy, sketched after this list:
- Selects which memory entries (RFC-0015) to include in prompts
- Respects token limits per model
- Prioritizes recent and relevant context
- Supports structured context injection from permissions (RFC-0011)
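A simplified packing sketch under stated assumptions: each memory entry carries a combined recency/relevance score and a precomputed token count, neither of which is mandated by this RFC:

```python
def pack_context(entries: list[dict], token_budget: int) -> list[dict]:
    # Highest-scoring entries first; skip anything that would overflow
    # the model's remaining token budget.
    ranked = sorted(entries, key=lambda e: e["score"], reverse=True)
    packed, used = [], 0
    for entry in ranked:
        if used + entry["token_count"] > token_budget:
            continue
        packed.append(entry)
        used += entry["token_count"]
    return packed
```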
Cross-RFC Interactions#
| RFC | Interaction |
|---|---|
| RFC-0009 (Costs) | Automatic token and cost tracking per LLM call |
| RFC-0011 (Access) | Context injection based on agent permissions |
| RFC-0012 (Tasks) | LLM adapters used within task execution |
| RFC-0015 (Memory) | Memory entries packed into LLM context windows |