Channels

All channels support text, images, files, voice, and video. Live status updates show what the agent is doing ("⏳ Thinking..." → "🔧 Using shell_exec..." → response).

All channels share the same command vocabulary: /new, /stop, /status, /queue, /help, /usage.
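
As an illustration of how a bridge might recognise this shared vocabulary, here is a minimal sketch. The function name `parse_command` and the handling of a trailing `@botname` suffix (which Telegram can append in group chats) are assumptions for the example, not the project's actual parser.

```python
from typing import Optional, Tuple

# The six commands listed above; everything else is treated as normal chat text.
COMMANDS = {"new", "stop", "status", "queue", "help", "usage"}


def parse_command(text: str) -> Optional[Tuple[str, str]]:
    """Return (command, args) for a known slash command, else None."""
    stripped = text.strip()
    if not stripped.startswith("/"):
        return None
    head, _, args = stripped[1:].partition(" ")
    # Assumption: drop an optional "@botname" suffix such as "/stop@my_bot".
    name = head.split("@", 1)[0].lower()
    if name not in COMMANDS:
        return None
    return name, args.strip()
```

Unknown slash commands return `None` here so that they fall through to the agent as ordinary text; a real bridge might choose to reply with /help instead.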

Telegram

yaml

channels:
  telegram:
    token: ${TELEGRAM_BOT_TOKEN}
    allowed_users: ["123456789", "987654321"]

Features: inline stop button, voice transcription (faster-whisper local or OpenAI Whisper API), markdown rendering via HTML parse mode.

Discord

yaml

channels:
  discord:
    token: ${DISCORD_BOT_TOKEN}
    allowed_users:
      - "123456789012345678"
    # allowed_guilds: []         # [] = any server
    # listen_channels: []        # [] = mention required
    # dm_only: false             # true = DMs only

allowed_users is mandatory — the channel refuses to start without it. Unauthorized messages are silently ignored. Supports native slash commands and an inline stop button.
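
The gating rules above can be sketched roughly as follows. This is a hypothetical model, not the channel's real code: `DiscordConfig`, `IncomingMessage`, and `should_handle` are invented names, and the exact interplay of `listen_channels`, mentions, and `dm_only` is an assumption read off the config comments.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscordConfig:
    # Field names mirror the YAML keys above.
    allowed_users: List[str]
    allowed_guilds: List[str] = field(default_factory=list)   # [] = any server
    listen_channels: List[str] = field(default_factory=list)  # [] = mention required
    dm_only: bool = False                                     # True = DMs only

    def __post_init__(self) -> None:
        # Mirrors "the channel refuses to start without it".
        if not self.allowed_users:
            raise ValueError("channels.discord.allowed_users is mandatory")


@dataclass
class IncomingMessage:
    # Hypothetical, simplified view of a Discord message.
    author_id: str
    guild_id: Optional[str] = None    # None means a direct message
    channel_id: Optional[str] = None
    mentions_bot: bool = False


def should_handle(cfg: DiscordConfig, msg: IncomingMessage) -> bool:
    """Return False for anything that should be silently ignored."""
    if msg.author_id not in cfg.allowed_users:
        return False
    if msg.guild_id is None:
        return True                   # DM from an allowed user
    if cfg.dm_only:
        return False
    if cfg.allowed_guilds and msg.guild_id not in cfg.allowed_guilds:
        return False
    # Assumption: listed channels are answered without a mention;
    # everywhere else the bot must be mentioned.
    if msg.channel_id in cfg.listen_channels:
        return True
    return msg.mentions_bot
```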

WhatsApp (Green API)

yaml

channels:
  whatsapp:
    green_api_id: ${GREEN_API_ID}
    green_api_token: ${GREEN_API_TOKEN}
    allowed_users:
      - "391234567890"

WhatsApp supports neither message editing nor inline buttons, so users cancel with the /stop text command.

WebSocket (Desktop/Web App)

The desktop app and web app connect to the gateway over Iroh QUIC, authenticated by device certificates. No port configuration or shared tokens needed — the invite ticket carries the coordinator's NodeId, and the client dials directly via Iroh.

For development or custom clients, the loopback proxy exposes a local TCP endpoint:

bash

# The loopback proxy bridges localhost to the agent over Iroh
openagent-cli proxy

This exposes localhost:PORT, which acts as a plain HTTP/WS gateway; the proxy handles the Iroh transport and device-certificate presentation transparently.

Running Multiple Channels

bash

openagent serve                  # all configured channels
openagent serve -ch telegram     # specific channel only

Media Support

Files & images from user → agent

Upload endpoint:

POST /api/upload        (multipart/form-data, field: "file")

Returns {path, filename, transcription?}. The path is a local absolute path the agent can read with the filesystem MCP (filesystem_read_text_file, filesystem_read_media_file, filesystem_get_file_info, etc.). On macOS it's a /private/var/folders/.../T/oa_upload_<rand>/<filename> realpath — already resolved so the filesystem MCP's allowlist check doesn't reject it.

Flow:

1. The client (web app, desktop app, or any bridge) posts the file to /api/upload.
2. The returned path goes into the text of the next chat message (e.g. "Summarise the file at /private/var/.../report.pdf"), or into the WS attachments field if the client builds one directly via Agent.run(attachments=[...]).
3. The LLM calls a filesystem MCP tool with that path to read the content.
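
A custom client could implement the first two steps of this flow along the following lines. This is a minimal standard-library sketch: the helper names (`encode_multipart`, `upload_file`, `message_for`) are hypothetical, and `base_url` is assumed to point at whatever HTTP endpoint the gateway or loopback proxy exposes (for example `http://localhost:PORT`).

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def encode_multipart(filename: str, data: bytes, field: str = "file") -> Tuple[bytes, str]:
    """Build a single-part multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_file(base_url: str, file_path: str) -> dict:
    """POST the file to /api/upload and return the parsed JSON response."""
    p = Path(file_path)
    body, content_type = encode_multipart(p.name, p.read_bytes())
    req = urllib.request.Request(
        f"{base_url}/api/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # expected keys: path, filename, transcription (optional)


def message_for(upload: dict, instruction: str) -> str:
    """Compose the next chat message so the agent knows which path to read."""
    return f"{instruction} {upload['path']}"
```

Typical use would be `message_for(upload_file(base_url, "report.pdf"), "Summarise the file at")`, with the resulting string sent as the next chat message.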

Agent → user attachments

The agent signals attachments back to the client by emitting markers in its reply text:

[IMAGE:/path/to/chart.png]
[FILE:/path/to/report.pdf]
[VOICE:/path/to/memo.ogg]
[VIDEO:/path/to/clip.mp4]

The gateway strips these markers from the response text and delivers them as a structured attachments: [{type, path, filename}] array on the WS response message. Bridges render them as native media attachments (Telegram photo, Discord file, WhatsApp media).
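
To make the marker format concrete, the sketch below shows one way such markers could be pulled out of a reply. It is not the gateway's actual implementation: the function name `extract_attachments`, the lowercase `type` values, and the whitespace tidying are all assumptions.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Matches [IMAGE:...], [FILE:...], [VOICE:...] and [VIDEO:...] markers.
MARKER_RE = re.compile(r"\[(IMAGE|FILE|VOICE|VIDEO):([^\]]+)\]")


def extract_attachments(reply: str) -> Tuple[str, List[Dict[str, str]]]:
    """Return (reply text without markers, list of {type, path, filename})."""
    attachments: List[Dict[str, str]] = []

    def _collect(match: "re.Match[str]") -> str:
        path = match.group(2).strip()
        attachments.append({
            "type": match.group(1).lower(),        # assumption: lowercase type names
            "path": path,
            "filename": PurePosixPath(path).name,
        })
        return ""                                  # remove the marker from the text

    text = MARKER_RE.sub(_collect, reply)
    text = re.sub(r"\n{3,}", "\n\n", text).strip()  # collapse the gaps markers leave
    return text, attachments
```

For example, a reply of `"Here is the chart.\n[IMAGE:/path/to/chart.png]"` would yield the text `"Here is the chart."` plus one attachment entry whose filename is `chart.png`.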

Voice Transcription

Voice messages are transcribed automatically. Two backends (tried in order):

faster-whisper (local, free, no API key) — install with pip install openagent-framework[voice]
OpenAI Whisper API (cloud fallback) — requires OPENAI_API_KEY in environment
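
The "tried in order" behaviour can be pictured as a simple fallback chain. The sketch below is an assumption about the shape of that logic rather than the project's code: `transcribe_with_fallback`, `local_whisper`, and `openai_whisper` are hypothetical names, and the `"base"` model size is chosen only for illustration. The two wrappers use the public faster-whisper and OpenAI Python client calls and import lazily, so a missing package simply triggers the fallback.

```python
from typing import Callable, List, Sequence, Tuple

Backend = Tuple[str, Callable[[str], str]]


def local_whisper(audio_path: str) -> str:
    # Raises ImportError if the [voice] extra is not installed.
    from faster_whisper import WhisperModel
    model = WhisperModel("base")                   # assumption: model size
    segments, _info = model.transcribe(audio_path)
    return " ".join(segment.text.strip() for segment in segments)


def openai_whisper(audio_path: str) -> str:
    from openai import OpenAI
    client = OpenAI()                              # reads OPENAI_API_KEY from the environment
    with open(audio_path, "rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text


def transcribe_with_fallback(audio_path: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order; return (backend name, transcript) from the first success."""
    errors: List[str] = []
    for name, backend in backends:
        try:
            return name, backend(audio_path)
        except Exception as exc:                   # missing package, missing key, API error, ...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all transcription backends failed: " + "; ".join(errors))
```

A caller would pass `[("faster-whisper", local_whisper), ("openai", openai_whisper)]` to get the local-first, cloud-second order described above.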
