RFC-0005: Attachments & Multi-modality v1.0#
Status: Proposed
Created: 2026-02-01
Authors: OpenIntent Contributors
Requires: RFC-0001 (Intents)
Abstract#
This RFC defines support for file attachments on intents, enabling multi-modal workflows involving images, audio, video, documents, and other binary content.
Motivation#
Modern AI agents frequently work with multi-modal content:
- Vision tasks: Image analysis, OCR, visual QA
- Audio processing: Transcription, voice commands, music analysis
- Document workflows: PDF parsing, contract review, data extraction
- Video understanding: Content moderation, scene detection, summarization
The protocol must provide a standard mechanism for associating binary content with intents so that agents can exchange rich media as part of structured coordination.
Data Model#
Attachment Object#
{
"id": "uuid",
"intent_id": "uuid",
"filename": "document.pdf",
"mime_type": "application/pdf",
"size": 1048576,
"storage_url": "https://storage.example.com/files/abc123",
"metadata": {
"width": null,
"height": null,
"duration": null,
"pages": 24
},
"uploaded_by": "agent-id",
"created_at": "ISO 8601"
}
Supported Metadata#
Metadata fields are content-type specific:
| Field | Applies To | Description |
|---|---|---|
width |
Images, Video | Width in pixels |
height |
Images, Video | Height in pixels |
duration |
Audio, Video | Duration in seconds |
pages |
Documents | Number of pages |
encoding |
Audio | Audio encoding format |
frame_rate |
Video | Frames per second |
Endpoints#
| Method | Path | Description |
|---|---|---|
POST |
/v1/intents/{id}/attachments |
Add attachment |
GET |
/v1/intents/{id}/attachments |
List attachments |
DELETE |
/v1/intents/{id}/attachments/{attachmentId} |
Remove attachment |
Example: Image Analysis Workflow#
# Upload an attachment reference
curl -X POST http://localhost:8000/api/v1/intents/{id}/attachments \
-H "X-API-Key: dev-user-key" \
-d '{
"filename": "receipt.jpg",
"mime_type": "image/jpeg",
"size": 245000,
"storage_url": "https://storage.example.com/receipts/r123.jpg",
"metadata": { "width": 1920, "height": 1080 }
}'
Storage Considerations#
This RFC defines the attachment metadata model only. The actual file storage mechanism (S3, GCS, local filesystem, etc.) is left to the implementation. The storage_url field provides the indirection needed to support any storage backend.
Integration with Other RFCs#
- RFC-0015 (Agent Memory): Memory entries can reference attachments for rich context
- RFC-0008 (LLM Integration): Adapters can include attachment content in LLM prompts for multi-modal inference