Builtins¶
Builtins are optional capabilities conditionally loaded based on API key availability and installed packages. They come in two types: tools (agent-invokable functions) and processors (message transformers).
Overview¶
OpenPaw ships with 17 built-in tools and 4 message processors. Builtins are discovered at runtime — if prerequisites (API keys, packages) are missing, the builtin is unavailable. The allow/deny system provides fine-grained control over which capabilities are active in each workspace.
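As a rough illustration of the runtime discovery described above, the sketch below checks a builtin's prerequisites against the environment. The Prerequisites class and is_available function are hypothetical names invented for this example; they are not OpenPaw's API, and the real registry may check availability differently.

```python
import importlib.util
import os
from dataclasses import dataclass, field


@dataclass
class Prerequisites:
    """Hypothetical stand-in for a builtin's declared requirements."""
    env_vars: list = field(default_factory=list)   # e.g. ["BRAVE_API_KEY"]
    packages: list = field(default_factory=list)   # importable top-level module names


def is_available(prereq, env=None):
    """Return True only if every env var is set and every module can be found."""
    env = os.environ if env is None else env
    has_keys = all(env.get(name) for name in prereq.env_vars)
    has_packages = all(
        importlib.util.find_spec(module) is not None for module in prereq.packages
    )
    return has_keys and has_packages
```

A builtin with no prerequisites is always available under this sketch, which matches the "None (always available)" entries listed for individual tools.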
Architecture:
BuiltinRegistry
├─ Tools (17)
│ ├─ browser Web automation via Playwright
│ ├─ email Email send/receive via Gmail
│ ├─ brave_search Web search
│ ├─ spawn Sub-agent spawning
│ ├─ cron Agent self-scheduling
│ ├─ cron_manager Persistent YAML cron management
│ ├─ acknowledge Silent system event acknowledgment
│ ├─ task_tracker Persistent task management
│ ├─ send_message Mid-execution messaging
│ ├─ report_progress Structured progress reporting
│ ├─ send_file Send workspace files to users
│ ├─ followup Self-continuation
│ ├─ plan Session-scoped planning
│ ├─ channel_history Channel history browsing
│ ├─ memory_search Semantic conversation search
│ ├─ shell Local command execution
│ ├─ md2pdf Markdown-to-PDF conversion
│ └─ elevenlabs Text-to-speech
│
└─ Processors (4)
├─ file_persistence Universal file upload handling
├─ whisper Audio transcription
├─ timestamp Message timestamp injection
└─ docling Document-to-markdown conversion
Processor Pipeline Order: file_persistence → whisper → timestamp → docling
The order matters — file_persistence runs first to save uploaded files, then downstream processors (whisper, docling) can read from disk.
Tools¶
browser¶
Group: browser
Type: Tool (11 functions)
Prerequisites: playwright (core dependency), chromium browser installed
Web automation via Playwright with accessibility tree navigation. Agents interact with pages via numeric element references instead of writing CSS selectors.
Available Functions:
- browser_navigate — Navigate to a URL (respects domain allowlist/blocklist)
- browser_snapshot — Get current page state as numbered accessibility tree
- browser_click — Click an element by numeric reference
- browser_type — Type text into an input field by numeric reference
- browser_select — Select dropdown option by numeric reference
- browser_scroll — Scroll the page (up/down/top/bottom)
- browser_back — Navigate back in browser history
- browser_screenshot — Capture page screenshot (saved to workspace/screenshots/)
- browser_close — Close current page/tab
- browser_tabs — List all open tabs
- browser_switch_tab — Switch to a different tab by index
Security Model:
Domain allowlisting and blocklisting prevent unauthorized navigation. If allowed_domains is non-empty, only those domains (and subdomains with *. prefix) are permitted. The blocked_domains list takes precedence and denies specific domains even if allowed.
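The precedence rules above can be sketched as follows. This is a minimal illustration, not the browser builtin's actual code: is_navigation_allowed and _matches are hypothetical helpers, and the sketch assumes a *. pattern matches subdomains only, not the bare domain.

```python
from urllib.parse import urlparse


def _matches(host, pattern):
    """Match an exact domain, or any subdomain when the pattern starts with '*.'."""
    pattern = pattern.lower()
    if pattern.startswith("*."):
        return host.endswith(pattern[1:])  # "*.google.com" -> ".google.com"
    return host == pattern


def is_navigation_allowed(url, allowed_domains, blocked_domains):
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False
    if any(_matches(host, p) for p in blocked_domains):
        return False  # the blocklist takes precedence
    if not allowed_domains:
        return True  # an empty allowlist allows everything
    return any(_matches(host, p) for p in allowed_domains)
```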
Configuration:
builtins:
  browser:
    enabled: true
    config:
      headless: true # Run browser without GUI
      allowed_domains: # Allowlist (empty = allow all)
        - "calendly.com"
        - "*.google.com" # Subdomain wildcard
      blocked_domains: [] # Blocklist (takes precedence)
      timeout_seconds: 30 # Default timeout for operations
      persist_cookies: false # Persist cookies across agent runs
      downloads_dir: "downloads" # Where to save downloaded files
      screenshots_dir: "screenshots" # Where to save screenshots
Installation:
Usage Example:
User: "Book a meeting on my Calendly for tomorrow at 2pm"
Agent: [Calls browser_navigate("https://calendly.com/myaccount")]
Agent: [Calls browser_snapshot() to see page elements]
Agent: [Calls browser_click(42) to click the "Schedule" button (element #42)]
Agent: [Fills in meeting details and confirms booking]
Lifecycle:
Browser instances are lazily initialized (no browser created until first use). Each session gets its own browser context. Browsers are automatically cleaned up on /new, /compact, and workspace shutdown.
Cookie Persistence:
When persist_cookies: true, authentication state and cookies survive across agent runs within the same session. Cookies are cleared on conversation reset.
Downloads and Screenshots:
Files downloaded by the browser are saved to {workspace}/workspace/downloads/ with sanitized filenames. Page screenshots are saved to {workspace}/workspace/screenshots/ and returned as relative paths for agent reference.
brave_search¶
Group: web
Type: Tool
Prerequisites: BRAVE_API_KEY, poetry install -E web
Web search capability using the Brave Search API.
Configuration:
Usage Example:
User: "What's the latest news about Python 3.13?"
Agent: [Uses brave_search tool to find recent articles]
Agent: "According to recent sources, Python 3.13 introduces..."
spawn¶
Group: agent
Type: Tool (4 functions)
Prerequisites: None (always available)
Sub-agent spawning for concurrent background tasks. Sub-agents run in isolated contexts with filtered tools to prevent recursion and unsolicited messaging.
Available Functions:
- spawn_agent — Spawn a background sub-agent with a task prompt and label
- list_subagents — List all sub-agents (active and recently completed)
- get_subagent_result — Retrieve result of a completed sub-agent by ID
- cancel_subagent — Cancel a running sub-agent
Configuration:
builtins:
  spawn:
    enabled: true
    config:
      max_concurrent: 8 # Maximum simultaneous sub-agents (default: 8)
      default_progress_interval: 5 # Minutes between progress updates (0 = disabled, default: 5)
Tool Exclusions:
Sub-agents have a restricted tool set to prevent recursion, unsolicited messaging, and persistent side effects:
- Spawning: no spawn_agent (prevents sub-agent recursion)
- Messaging: no send_message, send_file (sub-agents cannot contact users directly)
- Self-continuation: no request_followup
- Scheduling: no cron or dynamic scheduling tools (side effects outlive the sub-agent)
- Browser: no browser tools (browser sessions require a session key for cleanup)
- Cron manager: no persistent cron management tools (writes YAML files that persist after the sub-agent exits)
- Plan: no session-scoped planning tools (requires session key context)
Lifecycle:
pending → running → completed/failed/cancelled/timed_out. Running sub-agents exceeding their timeout are marked as timed_out during cleanup.
Notifications:
When notify: true (default), notifications are injected into the message queue for all terminal states — completed, failed, timed_out, and cancelled. Each notification includes a brief result summary and the session log path so the main agent can call read_file() on it for the full transcript.
Progress Updates:
Sub-agents emit progress updates every 5 minutes by default. Override per-spawn with progress_interval_minutes, or change the workspace default with default_progress_interval in the spawn config. Set to 0 to disable.
Progress messages are delivered as [SYSTEM] events to the main agent's queue and include elapsed time, tools called, and current activity. The main agent decides whether to relay updates to the user.
Usage Example:
User: "Research topic X in the background while I work on Y"
Agent: [Calls spawn_agent(task="Research topic X...", label="research-x", progress_interval_minutes=5)]
Sub-agent: [Runs concurrently, sends progress every 5 min, main agent continues working on Y]
System: [When complete, user receives notification with result summary]
Limits:
At most 8 sub-agents run concurrently (configurable). The timeout defaults to 30 minutes (allowed range: 1-120). Results are truncated at 50K characters, matching the read_file safety valve.
Storage:
Sub-agent state persists to {workspace}/data/subagents.yaml and survives restarts. Completed/failed/cancelled requests older than 24 hours are automatically cleaned up on initialization.
cron¶
Group: agent
Type: Tool (4 functions)
Prerequisites: None (always available)
Agent self-scheduling for one-time and recurring tasks. Enables autonomous workflows like "remind me in 20 minutes" or "check on this PR every hour".
Available Functions:
- schedule_at — Schedule a one-time action at a specific timestamp
- schedule_every — Schedule a recurring action at fixed intervals
- list_scheduled — List all pending scheduled tasks
- cancel_scheduled — Cancel a scheduled task by ID
Configuration:
builtins:
  cron:
    enabled: true
    config:
      min_interval_seconds: 300 # Minimum interval for recurring tasks (default: 5 min)
      max_tasks: 50 # Maximum pending tasks per workspace
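To show how the two limits in this configuration might be enforced, here is a small sketch. The function validate_recurring_task and its error strings are hypothetical; only the default values (300 seconds, 50 tasks) come from the configuration above.

```python
def validate_recurring_task(interval_seconds, pending_count,
                            min_interval_seconds=300, max_tasks=50):
    """Return an error string, or None if the request is within limits."""
    if pending_count >= max_tasks:
        return f"task limit reached ({max_tasks})"
    if interval_seconds < min_interval_seconds:
        return f"interval must be at least {min_interval_seconds} seconds"
    return None
```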
Storage:
Tasks persist to {workspace}/data/dynamic_crons.json and survive restarts. One-time tasks are automatically cleaned up after execution or if expired on startup.
Routing:
Responses are sent back to the first allowed user in the workspace's channel config.
Usage Example:
User: "Ping me in 10 minutes to check on the deploy"
Agent: [Calls schedule_at with timestamp 10 minutes from now]
System: [Task fires, agent sends reminder to user's chat]
cron_manager¶
Group: automation
Type: Tool (4 functions)
Prerequisites: None (always available)
Persistent cron management — create, list, update, and delete YAML cron jobs that survive restarts. Unlike dynamic scheduling (schedule_at/schedule_every), cron_manager writes YAML files to config/crons/ that are loaded by the cron scheduler at startup alongside any hand-authored cron files. Changes are also applied to the live scheduler immediately — no workspace restart required.
Available Functions:
- create_cron — Create a new persistent cron job (validates expression, writes YAML, hot-adds to scheduler)
- list_crons — List all YAML crons with name, schedule, enabled status, and next run time
- update_cron — Update fields on an existing cron job (hot-reloads in scheduler)
- delete_cron — Remove a cron job file and unregister from scheduler
Configuration:
Comparison with Dynamic Scheduling:
| Feature | Dynamic (cron) | Persistent (cron_manager) |
|---|---|---|
| Storage | data/dynamic_crons.json | config/crons/{name}.yaml |
| Scheduling | One-time or interval-based | Standard cron expressions |
| Lifecycle | Auto-cleaned after execution (one-time) | Permanent until deleted |
| Restarts | Loaded from JSON on restart | Loaded from YAML on restart |
| Use case | "Remind me in 10 minutes" | "Daily summary at 9am" |
Name Validation:
Cron names must be lowercase alphanumeric with hyphens only (^[a-z0-9][a-z0-9-]*$). Names become filenames ({name}.yaml).
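The naming rule can be checked with the regular expression quoted above. The helper cron_filename is a hypothetical name used only for this sketch; the pattern and the {name}.yaml mapping are taken from the text.

```python
import re

# Pattern quoted in the name validation rule above
CRON_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9-]*$")


def cron_filename(name):
    """Validate a cron name and return the YAML filename it maps to."""
    if not CRON_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid cron name: {name!r}")
    return f"{name}.yaml"
```

Because the name becomes a filename, rejecting characters such as slashes and dots also keeps a cron name from pointing outside config/crons/.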
Usage Example:
User: "Set up a daily summary cron at 9am"
Agent: [Calls create_cron(name="daily-summary", schedule="0 9 * * *", prompt="Generate a daily summary...", delivery="channel")]
Agent: "Done — 'daily-summary' will run every day at 9:00 AM and send results to this chat."
task_tracker¶
Group: agent
Type: Tool (4 functions)
Prerequisites: None (always available)
Task management via TASKS.yaml for tracking long-running operations across heartbeats and sessions.
Available Functions:
- create_task — Create a new tracked task
- update_task — Update task status or notes
- list_tasks — List all tasks (optionally filtered by status)
- get_task — Retrieve a specific task by ID
Configuration:
Storage:
Tasks persist to {workspace}/data/TASKS.yaml. Thread-safe with atomic writes.
Integration with Heartbeat:
When active tasks exist, a compact summary is injected into the heartbeat prompt as <active_tasks> XML tags. This avoids an extra LLM tool call to list_tasks().
Usage Example:
Agent: [Calls create_task(title="Monitor deploy", status="in_progress")]
Agent: [Works on the task]
Agent: [Calls update_task(task_id="task-001", status="completed")]
send_message¶
Group: agent
Type: Tool
Prerequisites: None (always available)
Mid-execution messaging to keep users informed during long operations. Agents can send progress updates while continuing to work.
Configuration:
Implementation:
Uses shared _channel_context for session-safe state access to the active channel.
Usage Example:
User: "Process this large dataset"
Agent: [Calls send_message("Starting analysis of 10,000 rows...")]
Agent: [Continues processing]
Agent: [Calls send_message("Halfway done, found 3 anomalies...")]
Agent: [Finishes and responds with full results]
report_progress¶
Group: communication
Type: Tool
Prerequisites: None (always available)
Structured progress reporting for long operations. Unlike send_message, this tool provides a dedicated schema with status label, optional detail, and optional percentage. Use it when you want to give the user more structured progress information than a plain text message.
Configuration:
Usage Example:
User: "Process this large dataset"
Agent: [Calls report_progress("Analyzing data", detail="Processing batch 1 of 10", percent=10)]
Agent: [Continues processing]
Agent: [Calls report_progress("Analyzing data", detail="Processing batch 5 of 10", percent=50, emoji="📊")]
Agent: [Finishes and responds with full results]
Optional Emoji Parameter:
Pass an emoji to prefix the status message with a custom emoji. If omitted, the framework uses a default emoji based on the status label.
Agent: [Calls report_progress("Deploying", detail="Pushing to staging", percent=30, emoji="🚀")]
User sees: "🚀 Deploying — Pushing to staging (30%)"
Implementation:
Uses shared _channel_context for session-safe state access to the active channel. Formats messages as: Status — Detail (Percent%) with an optional emoji prefix.
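The message layout described above can be sketched as a small formatter. format_progress is a hypothetical helper, not the tool's implementation; the long dash in the documented format is written as the \u2014 escape, and the default-emoji lookup by status label is left out because its mapping is not documented here.

```python
def format_progress(status, detail=None, percent=None, emoji=None):
    """Build the status line: status, optional detail, optional percent, optional emoji."""
    text = status
    if detail:
        text += f" \u2014 {detail}"
    if percent is not None:
        text += f" ({percent}%)"
    # Assumption: with no emoji given, this sketch adds no prefix at all
    return f"{emoji} {text}" if emoji else text
```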
send_file¶
Group: agent
Type: Tool
Prerequisites: None (always available)
Sends workspace files to users through the active channel. The tool checks that the file is inside the sandbox, infers its MIME type, and enforces a 50 MB size limit.
Configuration:
Implementation:
Uses shared _channel_context for session-safe state. Validates paths with resolve_sandboxed_path() for security.
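The three checks (sandbox, size, MIME type) might look roughly like the sketch below. check_sendable is a hypothetical stand-in and does not reproduce resolve_sandboxed_path(); it only illustrates the same idea with the standard library.

```python
import mimetypes
from pathlib import Path

MAX_FILE_BYTES = 50 * 1024 * 1024  # 50 MB limit stated for send_file


def check_sendable(workspace_root, relative_path):
    """Return (resolved_path, mime_type) or raise ValueError."""
    root = Path(workspace_root).resolve()
    target = (root / relative_path).resolve()
    if not target.is_relative_to(root):  # requires Python 3.9+
        raise ValueError("path escapes the workspace sandbox")
    if not target.is_file():
        raise ValueError("file not found")
    if target.stat().st_size > MAX_FILE_BYTES:
        raise ValueError("file exceeds 50 MB")
    mime_type = mimetypes.guess_type(target.name)[0] or "application/octet-stream"
    return target, mime_type
```

Resolving the path before the containment check is what stops inputs such as ../secrets.txt from escaping the workspace.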
Usage Example:
Agent: [Generates a report.pdf in workspace]
Agent: [Calls send_file("report.pdf", caption="Monthly report")]
User: [Receives file via Telegram]
followup¶
Group: agent
Type: Tool
Prerequisites: None (always available)
Self-continuation for multi-step autonomous workflows with depth limiting. Agents request re-invocation after responding.
Configuration:
Usage Example:
Agent: "I've completed step 1 of 3. [Calls request_followup()]"
System: [Re-invokes agent]
Agent: "Now completing step 2..."
Depth Limiting:
Prevents infinite loops via configurable depth limits in the message processing loop.
memory_search¶
Group: memory
Type: Tool
Prerequisites: sqlite-vec, poetry install -E memory
Semantic search over past conversations using vector embeddings.
Configuration:
Usage Example:
User: "What did we discuss about the deployment last week?"
Agent: [Calls memory_search("deployment last week")]
Agent: "Last Tuesday we discussed rolling back the deployment due to..."
shell¶
Group: system
Type: Tool
Prerequisites: None (core dependency)
Execute shell commands on the host system with configurable security controls. Disabled by default — must explicitly enable.
Security:
- Disabled by default
- Default blocked commands list prevents dangerous operations (rm -rf, sudo, etc.)
- Optional command allowlist for strict control
- Optional working directory constraint
Configuration:
builtins:
  shell:
    enabled: true # Must explicitly enable
    config:
      allowed_commands: # Optional allowlist
        - ls
        - cat
        - grep
      blocked_commands: # Optional override of defaults
        - rm -rf
        - sudo
      working_directory: /home/user/sandbox # Optional constraint
Default Blocked Commands:
rm -rf, sudo, chmod 777, chown, wget, curl, dd if=, mkfs, fork bombs
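A simplified sketch of how a blocklist and an optional allowlist could combine is shown below. is_command_permitted is a hypothetical helper: it uses plain substring matching, which is cruder than a real implementation would need to be (it omits fork-bomb detection and would also block a command that merely contains the word curl).

```python
# Subset of the default blocked patterns listed above (fork bombs omitted)
DEFAULT_BLOCKED = ["rm -rf", "sudo", "chmod 777", "chown", "wget", "curl", "dd if=", "mkfs"]


def is_command_permitted(command, allowed_commands=None, blocked_commands=None):
    blocked = DEFAULT_BLOCKED if blocked_commands is None else blocked_commands
    if any(pattern in command for pattern in blocked):
        return False  # blocked patterns are checked first
    if allowed_commands:
        parts = command.split()
        # With an allowlist, the executable name must be listed explicitly
        return bool(parts) and parts[0] in allowed_commands
    return True
```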
Usage Example:
User: "What files are in the current directory?"
Agent: [Calls shell with command "ls -la"]
Agent: "Here are the files in the directory..."
md2pdf¶
Group: document
Type: Tool
Prerequisites: weasyprint, markdown, pygments (core dependencies)
Convert workspace markdown files to polished PDF documents with CSS theming, Mermaid diagram rendering, and AI self-healing for broken diagrams.
Themes:
| Theme | Style |
|---|---|
| minimal | Clean serif font, light styling, academic feel |
| professional | Indigo accents, sans-serif, business report look |
| technical | Dark code blocks, monospace-heavy, engineering docs |
Features:
- Mermaid diagrams rendered via mermaid.ink API (no local dependencies)
- SVG auto-scaling to fit page width
- AI self-healing for broken Mermaid syntax (configurable LLM, default: gpt-4o-mini)
- Syntax-highlighted code blocks via Pygments
- Tables, table of contents, and standard markdown extensions
Configuration:
builtins:
  md2pdf:
    theme: professional # minimal, professional, or technical
    max_diagram_width: 6.5 # Max diagram width in inches
    self_heal: true # AI repair for broken Mermaid diagrams
    self_heal_model: "openai:gpt-4o-mini" # Any LangChain model spec
    max_heal_iterations: 3 # Max repair attempts per diagram
Self-Healing:
When a Mermaid diagram fails to render, the tool can optionally invoke a LangGraph subgraph that:
- Sends the broken source + error to a configurable LLM
- Validates the repair by re-rendering via mermaid.ink
- Loops up to max_heal_iterations times
- Marks repaired diagrams with a visual indicator in the PDF
Self-healing requires an API key for the configured model (e.g., OPENAI_API_KEY for gpt-4o-mini). If unavailable, the tool degrades gracefully — broken diagrams get an error placeholder instead.
Usage Example:
User: "Convert my research notes to a PDF"
Agent: [Calls markdown_to_pdf(source_path="reports/notes.md", theme="professional")]
Agent: "PDF created: reports/notes.pdf (3 Mermaid diagrams rendered, 1 repaired by AI)"
plan¶
Group: system
Type: Tool
Prerequisites: None (always available)
Session-scoped planning tool for multi-step work. Agents use write_plan to externalize their thinking into a structured plan and read_plan to retrieve it later. Plans persist for the current session and reset on /new or /compact.
Available tools:
- write_plan(plan): Write or overwrite the session plan
- read_plan(): Retrieve the current plan
When agents use this:
Agents create plans when tackling complex, multi-step tasks — especially when the work involves multiple tool calls, file operations, or research phases. The plan serves as working memory that survives across tool calls within a single session.
Usage Example:
User: "Research the latest AI safety papers and write a summary report"
Agent: [Calls write_plan("1. Search for recent AI safety papers\n2. Read top 5 results\n3. Synthesize findings\n4. Write summary to reports/ai-safety.md")]
Agent: [Proceeds to execute each step, updating the plan as steps complete]
acknowledge¶
Group: automation
Type: Tool
Prerequisites: None (always available)
Silent acknowledgment for system events. When the agent receives a [SYSTEM] event (cron result, heartbeat injection, sub-agent completion) and determines there is nothing the user needs to know, it calls acknowledge_event to suppress channel delivery. Everything is still logged — conversation history, token usage, and the acknowledgment reason. Silence means "don't message the user," not "don't record."
When to use:
The agent receives a [SYSTEM] notification with routine information — a cron ran successfully with no notable output, a heartbeat check found nothing actionable, or a background sub-agent completed a task the user doesn't need to hear about. Instead of sending a noisy "nothing to report" message, the agent calls acknowledge_event with a brief reason.
Key Behaviors:
- Only suppresses channel delivery for system-originated events. Has no effect on user messages.
- One acknowledge_event call per agent invocation; duplicate calls return an error.
- The agent's text response is still written to conversation history; it just isn't delivered to the channel.
Configuration:
Usage Example:
System: [SYSTEM] Cron 'daily-check' completed. Session log: memory/sessions/cron/daily-check_2026-03-25T09-00-00.jsonl
Agent: [Reads session log — routine status, no anomalies found]
Agent: [Calls acknowledge_event(reason="daily-check ran clean, no anomalies to report")]
Agent: "Checked the daily-check cron result — all systems nominal." (not delivered to user)
elevenlabs¶
Group: voice
Type: Tool
Prerequisites: ELEVENLABS_API_KEY, poetry install -E voice
Text-to-speech for voice responses using ElevenLabs API.
Configuration:
builtins:
  elevenlabs:
    enabled: true
    config:
      voice_id: 21m00Tcm4TlvDq8ikWAM # ElevenLabs voice ID
      model_id: eleven_turbo_v2_5
Usage Example:
User: "Read me the summary"
Agent: [Calls elevenlabs to generate audio]
Agent: [Sends voice message via Telegram]
To find voice IDs, visit the ElevenLabs Voice Library.
email¶
Group: communication
Type: Tool (8 functions)
Prerequisites: poetry install --extras email (google-auth, google-api-python-client)
Send and receive email via Google service account with domain-wide delegation. Designed with an abstract provider interface for future SMTP/SES support.
Security Model:
The email builtin enforces a safe-by-default outbound policy via RecipientPolicy:
- An empty allowed_recipients list blocks ALL outbound email — sending must be explicitly enabled
- Recipient patterns use fnmatch glob syntax (e.g., *@company.com)
- All recipients (to, cc, bcc) are validated before sending
- Max recipients per message is configurable (default: 10)
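The policy rules listed above can be approximated with the standard-library fnmatch module. check_recipients is a hypothetical function for illustration and is not the RecipientPolicy class itself; the case-insensitive comparison is an assumption of this sketch.

```python
from fnmatch import fnmatch


def check_recipients(to, cc=(), bcc=(), allowed_recipients=(), max_recipients=10):
    """Return a list of policy violations; an empty list means the send may proceed."""
    recipients = [*to, *cc, *bcc]
    if not allowed_recipients:
        # Safe default: an empty allowlist blocks all outbound email
        return ["outbound email is disabled (allowed_recipients is empty)"]
    errors = []
    if len(recipients) > max_recipients:
        errors.append(f"too many recipients ({len(recipients)} > {max_recipients})")
    for address in recipients:
        if not any(fnmatch(address.lower(), p.lower()) for p in allowed_recipients):
            errors.append(f"recipient not allowed: {address}")
    return errors
```

Checking to, cc, and bcc together means a disallowed address cannot be slipped in through a secondary field.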
Available Functions:
- send_email — Send email with recipient policy validation and optional file attachments
- check_email — List recent messages from a Gmail label (default: INBOX)
- get_email — Retrieve full content of a specific email by message ID
- search_email — Search using Gmail query syntax (from:, subject:, is:unread, etc.)
- reply_email — Reply to an email thread with proper In-Reply-To/References headers
- download_attachment — Download email attachment to workspace downloads/email/
- mark_as_read — Remove UNREAD label from a message
- mark_as_unread — Add UNREAD label to a message
Configuration:
builtins:
  email:
    enabled: true
    config:
      provider: gmail
      service_account_file: config/service-account.json # Relative to workspace root
      delegated_user: agent@yourdomain.com
      allowed_recipients: # Empty = block all sends (safe default)
        - "*@yourdomain.com"
        - "partner@external.com"
      max_recipients: 10
Provider Setup (Gmail):
- Create a Google Cloud project with Gmail API enabled
- Create a service account and download the JSON key file
- Enable domain-wide delegation on the service account
- In Google Workspace Admin, authorize the service account with scopes:
  - https://www.googleapis.com/auth/gmail.readonly
  - https://www.googleapis.com/auth/gmail.send
  - https://www.googleapis.com/auth/gmail.modify
- Place the JSON key file in your workspace (e.g., config/service-account.json)
- Set delegated_user to the email address the agent should impersonate
Usage Example:
User: "Check my email for anything from Alice"
Agent: [Calls search_email(query="from:alice@example.com")]
Agent: "Found 3 emails from Alice. The most recent is about the Q4 report..."
User: "Download the PDF attachment from that email"
Agent: [Calls download_attachment(message_id="...", attachment_id="...", filename="q4-report.pdf")]
Agent: "Saved to downloads/email/q4-report.pdf"
User: "Reply and tell her I'll review it by Friday"
Agent: [Calls reply_email(message_id="...", body="Thanks Alice, I'll review the Q4 report by Friday.")]
Agent: "Reply sent to alice@example.com"
status_reminder¶
Group: agent
Type: Middleware (not a tool — operates automatically)
Prerequisites: send_message builtin must be enabled
Automatic detection of long silent tool-calling runs. When the agent completes multiple tool-calling turns without calling send_message(), the middleware injects a reminder to communicate with the user. Uses a three-gate decision model to avoid nagging: threshold (minimum silent turns before first reminder), budget (maximum reminders per run), and cooldown (minimum turns between consecutive reminders).
This is not a tool the agent calls — it operates transparently as middleware on every agent run where send_message is available.
Configuration:
status_reminder:
  enabled: true # Default: true
  threshold: 5 # Tool-calling turns before first reminder (1-50)
  max_reminders: 3 # Max reminders per agent run (0-20)
  cooldown_turns: 1 # Min turns between consecutive reminders (0-10)
Behavior:
- Reminders are injected as <framework_instruction> tags into existing messages, with no extra checkpoint entries and no additional API calls
- Only active when the send_message builtin is loaded for the workspace
- Resets between agent runs (each user message starts a fresh count)
- Set max_reminders: 0 to disable reminders while keeping detection active
- Disabled entirely with enabled: false
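The three-gate decision (threshold, budget, cooldown) described in this section can be sketched as below. ReminderState and should_remind are hypothetical names; the defaults mirror the configuration values, and the way the middleware actually counts turns is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class ReminderState:
    silent_turns: int = 0          # tool-calling turns since the last send_message
    reminders_sent: int = 0        # reminders already injected in this run
    turns_since_reminder: int = 0  # turns elapsed since the most recent reminder


def should_remind(state, threshold=5, max_reminders=3, cooldown_turns=1):
    if state.silent_turns < threshold:
        return False  # gate 1: not silent for long enough
    if state.reminders_sent >= max_reminders:
        return False  # gate 2: reminder budget exhausted
    if state.reminders_sent > 0 and state.turns_since_reminder < cooldown_turns:
        return False  # gate 3: still cooling down after the last reminder
    return True
```

With max_reminders set to 0, the second gate always fails, so silence is still tracked but no reminder is ever injected.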
status_updates¶
Group: communication
Type: Middleware (not a tool — operates automatically)
Prerequisites: None (always available)
Automatic status updates sent to the user during agent execution. The middleware hooks into the agent run to detect and report:
- When the agent starts working ("Starting work...")
- When the agent decides to call tools ("Using tools: X, Y...")
- When a sub-agent is dispatched ("Dispatched sub-agent: label")
- Optionally, per-tool start and completion messages
Status updates are sent directly to the channel, bypassing the agent state to avoid extra LLM calls. They are throttled by time and budget to prevent spam.
This is not a tool the agent calls — it operates transparently as middleware on every agent run.
Configuration:
status_updates:
  enabled: true # Default: true
  agent_start: true # Report "Starting work..." (default: true)
  tool_calls_detected: true # Report tool usage detection (default: true)
  tool_start: false # Report per-tool start (default: false)
  tool_complete: false # Report per-tool completion (default: false)
  subagent_spawned: true # Report sub-agent dispatch (default: true)
  min_interval_seconds: 3 # Min seconds between auto-updates (default: 3)
  edit_in_place: true # Edit single message in place (default: true)
Behavior:
- Edit-in-place pattern (default): A single status message is maintained and edited in place. First update sends a new message; subsequent updates edit the same message. The message is deleted after the agent run completes.
- Run-aware labels: First run shows "Starting work...", subsequent runs show "Continuing work...".
- System events skip: Cron, heartbeat, and sub-agent completion events do not trigger status updates to avoid mid-task confusion.
- Time-based throttle: min_interval_seconds between auto-detected status messages
- Deduplication: if the same tool set is detected twice, only report once
- Agent-driven report_progress tool calls bypass all throttling
- Resets between agent runs (each user message starts a fresh count)
- Disabled entirely with enabled: false
Processors¶
file_persistence¶
Group: None
Type: Processor
Prerequisites: None (always available)
Universal file upload handling with date partitioning. First processor in the pipeline — saves all uploaded files to {workspace}/data/uploads/{YYYY-MM-DD}/.
Configuration:
builtins:
  file_persistence:
    enabled: true
    config:
      max_file_size: 52428800 # 50 MB default
      clear_data_after_save: false # Free memory after saving
Behavior:
Sets attachment.saved_path (relative to workspace root) so downstream processors can read from disk. Enriches message content with file receipt notifications:
[File received: report.pdf (2.3 MB, application/pdf)]
[Saved to: data/uploads/2026-02-07/report.pdf]
Filename Handling:
sanitize_filename() normalizes filenames (lowercases, removes special chars, replaces spaces with underscores). deduplicate_path() appends counters (1), (2), etc. to prevent overwrites.
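The two helpers could behave roughly as in the sketch below. These are illustrative re-implementations written for this page, not OpenPaw's code: the exact set of characters removed and the counter format (here report(1).pdf with no space) are assumptions.

```python
import re
from pathlib import Path


def sanitize_filename(name):
    """Lowercase, turn spaces into underscores, and drop other special characters."""
    path = Path(name)
    stem = re.sub(r"[^a-z0-9_-]", "", path.stem.lower().replace(" ", "_"))
    suffix = re.sub(r"[^a-z0-9.]", "", path.suffix.lower())
    return (stem or "file") + suffix  # fall back to "file" if nothing survives


def deduplicate_path(path):
    """Append (1), (2), ... before the extension until the path is unused."""
    path = Path(path)
    candidate, counter = path, 1
    while candidate.exists():
        candidate = path.with_name(f"{path.stem}({counter}){path.suffix}")
        counter += 1
    return candidate
```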
whisper¶
Group: voice
Type: Processor
Prerequisites: OPENAI_API_KEY, poetry install -E voice
Audio transcription for voice and audio messages using OpenAI's Whisper API.
Configuration:
builtins:
  whisper:
    enabled: true
    config:
      model: whisper-1
      language: en # Optional: auto-detect if omitted
Behavior:
Transcribes audio/voice messages and saves transcript as .txt sibling to the audio file (e.g., voice_123.ogg → voice_123.txt). Appends transcript inline to message content.
Usage Example:
User: [Sends voice message]
Channel: [Downloads audio file]
Whisper: [Transcribes to text, saves voice_123.txt]
Agent: [Processes transcribed text as normal message]
timestamp¶
Group: context
Type: Processor
Prerequisites: None (always available)
Prepends current date/time context to inbound messages, helping agents understand the current time in the user's timezone.
Configuration:
builtins:
  timestamp:
    enabled: true
    config:
      format: "%Y-%m-%d %H:%M %Z" # Optional datetime format (strftime)
      template: "[Current time: {datetime}]" # Optional prefix template
Behavior:
Automatically adds timestamp context to every message:
User: "What's the weather today?"
[Timestamp processor adds: "[Current time: 2026-02-17 14:30 PST]"]
Agent sees: "[Current time: 2026-02-17 14:30 PST]\n\nWhat's the weather today?"
Format Examples:
# ISO 8601
format: "%Y-%m-%d %H:%M:%S %Z"
# Output: [Current time: 2026-02-17 14:30:00 PST]
# Human-readable
format: "%A, %B %d, %Y at %I:%M %p %Z"
# Output: [Current time: Tuesday, February 17, 2026 at 02:30 PM PST]
Note: Timestamp formatting uses the workspace timezone (configurable in agent.yaml).
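To see how the format and template options combine, here is a minimal sketch using only the standard library. add_timestamp is a hypothetical function, and the fixed-offset PST zone is for illustration only; the real processor uses the workspace timezone.

```python
from datetime import datetime, timedelta, timezone


def add_timestamp(content, now, fmt="%Y-%m-%d %H:%M %Z",
                  template="[Current time: {datetime}]"):
    """Prepend the formatted time, separated from the message by a blank line."""
    prefix = template.format(datetime=now.strftime(fmt))
    return f"{prefix}\n\n{content}"


# Fixed UTC-8 offset labelled "PST" so that %Z renders a zone abbreviation
PST = timezone(timedelta(hours=-8), "PST")
now = datetime(2026, 2, 17, 14, 30, tzinfo=PST)
print(add_timestamp("What's the weather today?", now))
```

The now value must be timezone-aware; with a naive datetime, %Z renders as an empty string and the zone abbreviation disappears from the prefix.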
docling¶
Group: None
Type: Processor
Prerequisites: docling (core dependency)
Document conversion (PDF, DOCX, PPTX, etc.) to markdown with OCR support.
Configuration:
Behavior:
Converts documents to markdown and saves as .md sibling file (e.g., report.pdf → report.md). Appends converted markdown inline to message content.
OCR Support:
- macOS: Uses OcrMacOptions(force_full_page_ocr=True) for scanned PDFs
- Linux: Uses EasyOcrOptions for OCR
Usage Example:
User: [Uploads report.pdf]
FilePersistence: [Saves to data/uploads/2026-02-17/report.pdf]
Docling: [Converts to markdown, saves report.md]
Agent: [Processes markdown content]
Configuration¶
Global Configuration¶
Configure builtins in config.yaml:
builtins:
  # Allow/deny lists
  allow: [] # Empty = allow all available
  deny:
    - group:voice # Deny all voice-related builtins
  # Individual builtin configs
  browser:
    enabled: true
    config:
      headless: true
      allowed_domains: ["calendly.com"]
  brave_search:
    enabled: true
    config:
      count: 5
  whisper:
    enabled: true
    config:
      model: whisper-1
  spawn:
    enabled: true
    config:
      max_concurrent: 8
      default_progress_interval: 5
  cron:
    enabled: true
    config:
      max_tasks: 50
      min_interval_seconds: 300
  file_persistence:
    enabled: true
    config:
      max_file_size: 52428800
Per-Workspace Configuration¶
Override builtin settings per workspace in agent.yaml:
# workspace1/agent.yaml - Enable web search only
builtins:
  allow:
    - brave_search
  deny:
    - elevenlabs
# workspace2/agent.yaml - Enable voice features
builtins:
  allow:
    - group:voice # Allow all voice builtins
  deny:
    - brave_search
Allow/Deny Behavior¶
Empty allow list — Allow all available builtins (default)
Specific allow list — Only enable listed builtins
Group allow — Enable all builtins in a group
Deny list — Block specific builtins or groups
Group deny — Block all builtins in a group
Priority: Deny takes precedence over allow.
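The allow/deny rules can be expressed as a short resolution function. resolve_active is a hypothetical helper written for this example; it assumes group entries use the group:name form shown in the configuration above.

```python
def resolve_active(available, allow, deny):
    """available maps builtin name -> group (or None); returns the set of active names."""

    def selected(name, group, entries):
        return name in entries or (group is not None and f"group:{group}" in entries)

    active = set()
    for name, group in available.items():
        if selected(name, group, deny):
            continue  # deny takes precedence over allow
        if not allow or selected(name, group, allow):
            active.add(name)  # an empty allow list permits everything not denied
    return active
```

For example, allowing group:voice while denying whisper leaves only elevenlabs active from the voice group.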
Builtin Groups¶
| Group | Members |
|---|---|
| voice | whisper, elevenlabs |
| web | brave_search |
| system | shell |
| context | timestamp |
| agent | spawn, cron, task_tracker, send_message, followup, send_file |
| automation | cron_manager, acknowledge |
| browser | browser |
| communication | email, report_progress |
| memory | memory_search |
| document | md2pdf |
Usage:
Installation¶
Most builtins are included in the core install. A few require optional extras:
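Voice capabilities:
poetry install -E voice
Installs: openai, elevenlabs
Web capabilities:
poetry install -E web
Installs: langchain-community
Memory search:
poetry install -E memory
Installs: sqlite-vec
Email capabilities:
poetry install --extras email
Installs: google-auth, google-api-python-client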
All optional builtins:
Core dependencies (included in base poetry install):
- docling, easyocr, opencv-python-headless — Document conversion and OCR
- playwright — Browser automation
- langchain-anthropic, langchain-openai, langchain-aws, langchain-xai, langchain-fireworks — LLM providers
- weasyprint, markdown, pygments — Markdown-to-PDF conversion
- Shell tool — No extra dependencies required
Adding Custom Builtins¶
You can extend OpenPaw with custom tools and processors.
Creating a Custom Tool¶
- Create tool file:
openpaw/builtins/tools/my_tool.py
import os

from langchain_core.tools import StructuredTool

from openpaw.builtins.base import (
    BaseBuiltinTool,
    BuiltinMetadata,
    BuiltinType,
    BuiltinPrerequisite,
)


class MyCustomTool(BaseBuiltinTool):
    """Custom tool implementation."""

    metadata = BuiltinMetadata(
        name="my_custom_tool",
        display_name="My Custom Tool",
        description="Custom functionality for X",
        builtin_type=BuiltinType.TOOL,
        group="custom",
        prerequisites=BuiltinPrerequisite(
            env_vars=["MY_API_KEY"],
            packages=["my-package"],
        ),
    )

    def get_langchain_tool(self) -> list:
        """Return LangChain tool instances."""

        def my_tool_func(query: str) -> str:
            """Execute the tool."""
            # Prefer the workspace config value, fall back to the environment
            api_key = self.config.get("api_key") or os.getenv("MY_API_KEY")
            # Implementation here; this placeholder only echoes the query
            result = f"Result for: {query}"
            return result

        return [
            StructuredTool.from_function(
                func=my_tool_func,
                name="my_custom_tool",
                description="What this tool does",
            )
        ]
Key Points:
- Extend BaseBuiltinTool, not BaseTool from LangChain
- Use the StructuredTool.from_function() factory pattern
- get_langchain_tool() returns a list (can contain multiple tools)
- Access config via self.config

- Register in registry:
openpaw/builtins/registry.py
try:
    from openpaw.builtins.tools.my_tool import MyCustomTool
    self.register_tool(MyCustomTool)
except ImportError as e:
    logger.debug(f"My custom tool not available: {e}")
- Configure in config.yaml:
- Set environment variable:
Creating a Custom Processor¶
- Create processor file:
openpaw/builtins/processors/my_processor.py
from openpaw.builtins.base import (
    BaseBuiltinProcessor,
    BuiltinMetadata,
    BuiltinType,
    BuiltinPrerequisite,
    ProcessorResult,
)
from openpaw.domain.message import Message


class MyCustomProcessor(BaseBuiltinProcessor):
    """Custom message processor."""

    metadata = BuiltinMetadata(
        name="my_processor",
        display_name="My Processor",
        description="Processes messages before agent sees them",
        builtin_type=BuiltinType.PROCESSOR,
        group="custom",
        prerequisites=BuiltinPrerequisite(
            env_vars=["MY_API_KEY"],
        ),
    )

    async def process_inbound(self, message: Message) -> ProcessorResult:
        """Transform the message."""
        # Access config
        option = self.config.get("option1", "default")
        # Transform message content
        message.content = f"[Processed] {message.content}"
        return ProcessorResult(message=message)
- Register in registry:
openpaw/builtins/registry.py
try:
    from openpaw.builtins.processors.my_processor import MyCustomProcessor
    self.register_processor(MyCustomProcessor)
except ImportError as e:
    logger.debug(f"My processor not available: {e}")
- Configure and use:
Processors run automatically on all messages in the channel layer.
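To make the chaining concrete, here is a minimal, self-contained sketch of how inbound processors might be applied in sequence. The `Message` and `ProcessorResult` classes below are simplified stand-ins, and `run_pipeline`, `UppercaseProcessor`, and `PrefixProcessor` are hypothetical names; the real channel layer may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Simplified stand-in for openpaw.domain.message.Message."""
    content: str


@dataclass
class ProcessorResult:
    """Simplified stand-in for the ProcessorResult returned by processors."""
    message: Message


class UppercaseProcessor:
    async def process_inbound(self, message: Message) -> ProcessorResult:
        message.content = message.content.upper()
        return ProcessorResult(message=message)


class PrefixProcessor:
    async def process_inbound(self, message: Message) -> ProcessorResult:
        message.content = f"[Processed] {message.content}"
        return ProcessorResult(message=message)


async def run_pipeline(processors: List, message: Message) -> Message:
    """Feed each processor's output message into the next processor."""
    for processor in processors:
        result = await processor.process_inbound(message)
        message = result.message
    return message


if __name__ == "__main__":
    final = asyncio.run(
        run_pipeline([UppercaseProcessor(), PrefixProcessor()], Message("hello"))
    )
    print(final.content)
```

Swapping the two processors changes the result, which is why the registration order of processors matters.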
Best Practices¶
1. Use Environment Variables for Secrets¶
Never hardcode API keys:
# Bad
builtins:
  brave_search:
    config:
      api_key: "actual-key-here"  # Don't do this

# Good: relies on the BRAVE_API_KEY environment variable
builtins:
  brave_search:
    enabled: true
2. Install Only Needed Extras¶
Minimize dependencies:
3. Deny Unused Builtins¶
Reduce attack surface:
builtins:
  deny:
    - elevenlabs    # Don't need TTS in this workspace
    - group:system  # Disable shell for security
Security Note: The shell tool should be denied unless explicitly needed. It is disabled by default and requires explicit enablement.
4. Use Groups for Bulk Operations¶
Simplify configuration:
# Instead of denying individual tools:
builtins:
  deny:
    - whisper
    - elevenlabs

# Use group deny:
builtins:
  deny:
    - group:voice
5. Test Prerequisites¶
Verify API keys before deploying:
# Test OpenAI key
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"

# Test ElevenLabs key
curl https://api.elevenlabs.io/v1/voices \
  -H "xi-api-key: $ELEVENLABS_API_KEY"
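A local pre-flight check can complement the curl tests by confirming that the expected environment variables and Python modules are present before startup. The sketch below is an assumption-laden illustration, not part of OpenPaw: `missing_prerequisites` is a hypothetical helper, and it takes importable module names, which can differ from the package names used at install time.

```python
import importlib.util
import os
from typing import Dict, Iterable, List, Mapping, Optional


def missing_prerequisites(
    env_vars: Iterable[str],
    modules: Iterable[str],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, List[str]]:
    """Report which environment variables and top-level modules are missing."""
    env = os.environ if environ is None else environ
    # An unset or empty variable counts as missing.
    missing_env = [name for name in env_vars if not env.get(name)]
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [
        name for name in modules if importlib.util.find_spec(name) is None
    ]
    return {"env_vars": missing_env, "modules": missing_modules}


if __name__ == "__main__":
    report = missing_prerequisites(
        env_vars=["OPENAI_API_KEY", "ELEVENLABS_API_KEY"],
        modules=["openai", "elevenlabs"],
    )
    print(report)
```

An empty list under both keys suggests the local prerequisites are in place; it does not prove the keys themselves are valid, which is what the curl checks are for.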
Troubleshooting¶
Builtin not available:
- Check environment variable is set: echo $OPENAI_API_KEY
- Verify extras are installed: poetry install -E voice
- Check allow/deny lists in config
- Enable verbose logging: poetry run openpaw -w agent -v
API key errors:
- Verify key is valid and active
- Check API quota/billing status
- Test key with curl (see examples above)
Import errors:
- Missing optional dependency
- Run poetry install -E <extra-name>
- Check pyproject.toml for correct package versions
Processor not running:
- Verify enabled: true in config
- Check processor isn't denied
- Ensure processor is registered in registry
- Check logs for initialization errors
Tool not available to agent:
- Verify tool prerequisites are met
- Check allow/deny lists
- Tool must be properly registered
- Agent must have permission to use tools (model capability)
Security Considerations¶
Shell Tool¶
The shell tool provides powerful system access and requires careful configuration:
- Disabled by default — must explicitly enable in config
- Use allowed_commands for strict allowlisting when possible
- The default blocked_commands list prevents common dangerous operations
- Consider constraining working_directory to a sandbox
- Never enable in untrusted environments
Best Practices:
1. Enable the shell tool only in workspaces that need it
2. Use group:system deny rule in untrusted workspaces
3. Configure minimal allowed_commands
4. Monitor logs for blocked command attempts
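As a rough illustration of how allowed_commands and blocked_commands could interact, the following sketch checks a command line before it would be executed. It is not the shell tool's real logic: `is_command_permitted` is a hypothetical function, and rejecting any command that contains shell metacharacters is a deliberately conservative assumption of this example.

```python
import os
import shlex
from typing import Iterable

# Characters that allow chaining, piping, substitution, or redirection.
SHELL_METACHARACTERS = set(";|&`$<>\n")


def is_command_permitted(
    command: str,
    allowed_commands: Iterable[str] = (),
    blocked_commands: Iterable[str] = (),
) -> bool:
    """Return True only if the command's program passes both lists."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False  # conservative: refuse anything that could chain commands
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes or similar parse errors
    if not tokens:
        return False
    # Compare on the program name so "/bin/rm" is treated like "rm".
    program = os.path.basename(tokens[0])
    if program in set(blocked_commands):
        return False
    allowed = set(allowed_commands)
    if allowed:
        return program in allowed
    return True


if __name__ == "__main__":
    print(is_command_permitted("git status", allowed_commands=["git", "ls"]))
    print(is_command_permitted("rm -rf /", blocked_commands=["rm"]))
```

A name-based check like this is only a first line of defense; constraining the working directory and running in a sandbox remain important.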
Browser Tool¶
Domain Security:
- Use allowed_domains allowlist for production workspaces
- blocked_domains takes precedence over allowlist
- Wildcard subdomain support with *.example.com
- Consider persist_cookies: false to avoid session leakage
Best Practices:
1. Configure domain allowlist for untrusted agents
2. Monitor the workspace/downloads/ directory for unexpected files
3. Set reasonable timeout values
4. Review screenshot captures for sensitive data
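The Domain Security rules above (blocklist precedence and *.example.com wildcards) can be modeled with a short sketch. The function `is_url_allowed` is hypothetical, and two behaviors are assumptions of this example rather than documented facts: an empty allowlist permits every non-blocked domain, and *.example.com matches subdomains but not the bare example.com.

```python
from typing import Iterable
from urllib.parse import urlsplit


def _matches(host: str, patterns: Iterable[str]) -> bool:
    """Check a hostname against exact names and '*.domain' wildcards."""
    for pattern in patterns:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            # "*.example.com" -> suffix ".example.com" (subdomains only)
            if host.endswith(pattern[1:]):
                return True
        elif host == pattern:
            return True
    return False


def is_url_allowed(
    url: str,
    allowed_domains: Iterable[str] = (),
    blocked_domains: Iterable[str] = (),
) -> bool:
    """Apply the blocklist first, then the allowlist if one is configured."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False
    if _matches(host, blocked_domains):
        return False  # blocked_domains takes precedence over the allowlist
    allowed = list(allowed_domains)
    if not allowed:
        return True  # assumption: no allowlist means no restriction
    return _matches(host, allowed)


if __name__ == "__main__":
    print(is_url_allowed("https://docs.example.com/page", ["*.example.com"]))
    print(is_url_allowed("https://ads.example.com/", ["*.example.com"], ["ads.example.com"]))
```

Matching on the parsed hostname rather than the raw URL string avoids false positives from domains that merely appear in a path or query string.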
