
# MCP (Model Context Protocol) Server Integration

The orchestrator can call tools hosted on external **MCP servers** (OAuth-secured
SaaS like Canva, or self-hosted servers with static API keys). Connected MCP
tools are exposed to pieces via `mcp__<server>__<tool>` names, and can be
allowlisted with `mcp__<server>__*` wildcards in `piece.allowed_tools`.

This document is the **operator runbook** for setting up, troubleshooting, and
maintaining MCP integrations. For internal design notes, see

## Prerequisites

### 1. Generate `MCP_ENCRYPTION_KEY`

All OAuth client secrets, static API tokens, and user access tokens are
encrypted at rest with AES-256-GCM. The key is 32 bytes, supplied as a
64-character hex string.

```bash
openssl rand -hex 32
```

Set it in your environment before starting the orchestrator:

```bash
export MCP_ENCRYPTION_KEY=<the 64-hex output>
scripts/server.sh start
```

If `MCP_ENCRYPTION_KEY` is **not** set, the MCP subsystem boots fail-soft: a
warning is logged and all MCP endpoints return 503 / are hidden from the UI.
Other features continue normally.

> ⚠ **Key rotation invalidates all encrypted tokens.** Plan rotation as a
> migration event: ask every user to re-connect. There is no automatic
> re-encryption today.
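
To illustrate why rotation invalidates stored tokens, here is a minimal sketch of AES-256-GCM with a hex-encoded key, using Node's built-in `crypto` module. The helper names (`parseKey`, `encrypt`, `decrypt`) and the `iv | tag | ciphertext` storage layout are assumptions made for this example, not the orchestrator's actual implementation.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical helper: turn a 64-character hex string into a 32-byte key.
function parseKey(hex: string): Buffer {
  if (!/^[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error("MCP_ENCRYPTION_KEY must be exactly 64 hex characters");
  }
  return Buffer.from(hex, "hex");
}

// Encrypt with a fresh 12-byte nonce. The base64(iv | tag | ciphertext)
// layout is an assumption of this sketch.
function encrypt(key: Buffer, plaintext: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64");
}

// Decrypt and authenticate. Throws if the key or the data is wrong.
function decrypt(key: Buffer, blob: string): string {
  const raw = Buffer.from(blob, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const body = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}
```

Because GCM authenticates the ciphertext, calling `decrypt` with a different key throws rather than returning garbage, which is why every stored token becomes unusable after a key change.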

### 2. Optional: `mcp.allow_private_addresses`

By default, MCP requests are routed through the SSRF strict-check, which
rejects loopback and private-IP addresses. For **self-hosted MCP servers** on
`localhost` or LAN, set in `config.yaml`:

```yaml
mcp:
  allow_private_addresses: true
```

This skips the SSRF check entirely (same semantics as `insecureLocalTestMode`).
**Only enable in trusted networks.** Better CIDR-aware controls are tracked in
the Phase 8 follow-ups.

## Authentication modes

There are two `auth_kind` values for an MCP server registration:

| `auth_kind` | Use case | Setup |
|---|---|---|
| `oauth` | SaaS providers (Canva, GitHub Apps, etc.) | Register OAuth app in provider's dev portal, capture `client_id` + `client_secret`, plug into Settings UI. User clicks **Connect** to authorize. |
| `api_key` | Self-hosted MCP, providers with personal access tokens | Generate a bearer token on the provider side, paste it into the server registration. No per-user dance. |

## Global vs user-owned servers

| Owner | Visibility | Who can register |
|---|---|---|
| **Global** (`owner_id IS NULL`) | All users see it on the Connections panel | Admins via `/api/mcp/servers` |
| **User-owned** (`owner_id = userId`) | Only the owner sees / uses it | Any user via `/api/mcp/user-servers` |

Admins can also register **user-owned** servers (they're "users too" from the
API's perspective). The Settings → User Folder → MCP Servers panel has both
sections — global at top (admin only), user's own below.
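
The visibility rule in the table above can be sketched as a simple filter. The `McpServer` shape and the `visibleServers` name are hypothetical; only the `owner_id` semantics come from this document.

```typescript
// Hypothetical row shape: ownerId === null marks a global server.
interface McpServer {
  id: string;
  ownerId: string | null;
}

// A user sees every global server plus the servers they own.
function visibleServers(servers: McpServer[], userId: string): McpServer[] {
  return servers.filter((s) => s.ownerId === null || s.ownerId === userId);
}
```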

## Setup walkthroughs

### A. OAuth server (global, admin-managed)

1. **Provider portal**: register a new OAuth client. Configure the callback URL:
   ```
   https://<your-orchestrator-host>/auth/mcp/<server_id>/callback
   ```
   where `<server_id>` is the slug you'll use in step 2 (e.g. `canva`).
2. **Admin UI** → Settings → User Folder → **Global Servers** → **+ Add server**:
   - **ID**: `canva` (matches callback URL)
   - **Name**: `Canva` (display only)
   - **URL**: `https://api.canva.com/mcp` (the MCP endpoint, not the OAuth host)
   - **Auth**: OAuth
   - **Client ID / Secret / Scopes**: from the provider portal
3. The orchestrator fetches `<URL_origin>/.well-known/oauth-authorization-server`
   and stores `issuer`, `authorization_endpoint`, `token_endpoint`,
   `discovery_fingerprint`. If discovery fails, see Troubleshooting below.
4. **Each user** clicks **Connect** on the Connections panel → OAuth flow runs
   → access + refresh tokens persisted (encrypted) under `user_mcp_tokens`.
5. From here on, tools cached for that server are usable by the user as
   `mcp__canva__<tool>`.
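
As a rough illustration of steps 1 and 3, the two URLs involved can be derived as follows. The function names are hypothetical and the host `maestro.example.com` in the usage note is a placeholder; the path shapes are the ones given above.

```typescript
// Callback URL to register in the provider portal (step 1).
function callbackUrl(orchestratorHost: string, serverId: string): string {
  return `https://${orchestratorHost}/auth/mcp/${encodeURIComponent(serverId)}/callback`;
}

// Discovery document location (step 3): the well-known path is resolved
// against the origin of the MCP URL, so any path on the MCP URL is dropped.
function discoveryUrl(mcpUrl: string): string {
  const origin = new URL(mcpUrl).origin;
  return new URL("/.well-known/oauth-authorization-server", origin).toString();
}
```

For example, `discoveryUrl("https://api.canva.com/mcp")` yields `https://api.canva.com/.well-known/oauth-authorization-server`, and `callbackUrl("maestro.example.com", "canva")` yields `https://maestro.example.com/auth/mcp/canva/callback`.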

### B. api_key server (self-hosted, user-managed)

1. **Provider**: generate a bearer token (e.g. `sk-...`). For most self-hosted
   MCP, this is a static value in the server's config.
2. **User UI** → Settings → User Folder → **Your Servers** → **+ Add server**:
   - **ID**: `my-tools`
   - **Name**: `My self-hosted tools`
   - **URL**: `http://10.0.0.10:8080/mcp` (or wherever)
   - **Auth**: API Key
   - **Static Token**: paste the bearer
3. No OAuth dance: `tools/list` and `tools/call` use the static token
   directly. The token is encrypted at rest in `mcp_servers.static_token_enc`.
4. If using a private IP, ensure `mcp.allow_private_addresses: true` is set
   (see Prerequisites).
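
To make step 3 concrete, here is a sketch of what a bearer-authenticated `tools/list` request could look like on the wire. It only builds the request and does not send it. The `OutgoingRequest` type and `buildToolsListRequest` name are hypothetical, and the sketch skips the session initialization that an MCP client normally performs first; the orchestrator itself goes through the SDK transport rather than hand-built requests.

```typescript
// Hypothetical plain-data description of an HTTP request.
interface OutgoingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a JSON-RPC 2.0 `tools/list` call that carries the static token.
function buildToolsListRequest(mcpUrl: string, staticToken: string): OutgoingRequest {
  return {
    url: mcpUrl,
    method: "POST",
    headers: {
      Authorization: `Bearer ${staticToken}`,
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list" }),
  };
}
```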

## How tools flow into pieces

The orchestrator caches `tools/list` results in `mcp_server_tools`, refreshed
on registration and on explicit admin refresh (no automatic TTL today). Piece
authors expose them via `allowed_tools`:

```yaml
movements:
  - name: design
    allowed_tools:
      - Read
      - Write
      - mcp__canva__*           # all tools from server `canva`
      - mcp__my-tools__lint     # a specific tool from `my-tools`
```

The wildcard `mcp__<server>__*` expands to all currently-cached tools for that
server.
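
The expansion can be sketched as below. This is an illustrative model only: `expandAllowedTools` is a hypothetical name, and the in-memory `cache` map stands in for the `mcp_server_tools` table.

```typescript
// cache maps a server ID to the tool names currently cached for it.
function expandAllowedTools(
  allowed: string[],
  cache: Record<string, string[]>,
): string[] {
  const out = new Set<string>();
  for (const entry of allowed) {
    // Match entries of the form mcp__<server>__* and capture <server>.
    const server = /^mcp__(.+?)__\*$/.exec(entry)?.[1];
    if (server !== undefined) {
      for (const tool of cache[server] ?? []) {
        out.add(`mcp__${server}__${tool}`);
      }
    } else {
      out.add(entry); // built-in tool or a fully qualified MCP tool
    }
  }
  return Array.from(out);
}
```

With a cache of `{ canva: ["create_design", "export"] }`, the entry `mcp__canva__*` becomes `mcp__canva__create_design` and `mcp__canva__export`. A wildcard for a server with no cached tools expands to nothing, so a stale or empty cache silently narrows the allowlist.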

## Job parking and resume

When a piece requires an MCP server (via `required_mcp` frontmatter or
discovered from `allowed_tools`) and the user has no connection, the worker
parks the job:

- `jobs.status = 'waiting_human'`
- `jobs.wait_reason = 'mcp_auth_required'`
- A comment is posted on the local task with a Connect link

When the user completes the OAuth flow, `resumeWaitingJobs(userId, serverId)`
re-queues every parked job for that pair. Jobs that use `api_key` servers are
never parked, because those credentials are server-level rather than per-user.
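
The park and resume cycle can be modeled in memory as follows. This is a sketch under assumptions: the `Job` shape, the `waitingOnServer` field, the `queued` status after resume, and both function names are hypothetical. Only the `waiting_human` status and the `mcp_auth_required` wait reason come from this document.

```typescript
type JobStatus = "queued" | "waiting_human";

interface Job {
  id: number;
  userId: string;
  status: JobStatus;
  waitReason: string | null;
  waitingOnServer: string | null; // hypothetical: which server blocked the job
}

// Park a job because its user has no connection to the required server.
function parkForMcpAuth(job: Job, serverId: string): void {
  job.status = "waiting_human";
  job.waitReason = "mcp_auth_required";
  job.waitingOnServer = serverId;
}

// Re-queue every job parked for this (user, server) pair; returns their IDs.
function resumeParked(jobs: Job[], userId: string, serverId: string): number[] {
  const resumed: number[] = [];
  for (const job of jobs) {
    const parkedForPair =
      job.status === "waiting_human" &&
      job.waitReason === "mcp_auth_required" &&
      job.userId === userId &&
      job.waitingOnServer === serverId;
    if (parkedForPair) {
      job.status = "queued";
      job.waitReason = null;
      job.waitingOnServer = null;
      resumed.push(job.id);
    }
  }
  return resumed;
}
```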

## Troubleshooting

### Discovery fails (`/api/mcp/servers` returns 400 on POST)

Symptoms: registration fails with `Discovery fetch failed: <code>` or
`authorization_endpoint origin must match MCP url origin`.

Causes:
- Provider doesn't expose `/.well-known/oauth-authorization-server` at the
  origin of the MCP URL. Check with `curl <origin>/.well-known/oauth-authorization-server`.
- Cross-origin `authorization_endpoint` or `token_endpoint` — orchestrator
  enforces same-origin to prevent malicious redirects.
- SSRF block on a private-IP URL — set `mcp.allow_private_addresses: true`.
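
The same-origin rule in the second cause can be sketched like this. The `DiscoveryDoc` type and `assertSameOrigin` name are hypothetical; the error text mirrors the message quoted in the symptoms above.

```typescript
// Subset of the discovery document fields named in this runbook.
interface DiscoveryDoc {
  issuer: string;
  authorization_endpoint: string;
  token_endpoint: string;
}

// Throw if either endpoint lives on a different origin than the MCP URL.
function assertSameOrigin(mcpUrl: string, doc: DiscoveryDoc): void {
  const expected = new URL(mcpUrl).origin;
  const fields = ["authorization_endpoint", "token_endpoint"] as const;
  for (const field of fields) {
    if (new URL(doc[field]).origin !== expected) {
      throw new Error(`${field} origin must match MCP url origin`);
    }
  }
}
```

Note that an origin comparison includes scheme and port, so `https://api.example.com` and `https://auth.example.com` count as different origins even though they share a parent domain.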

### OAuth callback fails with 400

Symptoms: After clicking **Connect**, the browser lands on `/auth/mcp/<id>/callback`
and gets `400 Bad Request`.

Causes:
- **State mismatch**: `code` or `state` query param missing, or the `state`
  was already consumed (single-use, by design). Re-trigger the flow from scratch.
- **Token endpoint rejected the code**: check provider portal for misconfigured
  redirect URI. The orchestrator uses exactly `<your-host>/auth/mcp/<server_id>/callback`.
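
The single-use behavior of `state` explains why reloading the callback page or reusing an old link produces a 400. A minimal in-memory sketch, assuming a hypothetical `OAuthStateStore` class (the real storage and expiry policy are not described here):

```typescript
// Hypothetical store mapping each issued state value to its server ID.
class OAuthStateStore {
  private readonly pending = new Map<string, string>();

  // Record a state value when the flow starts. The caller is expected to
  // generate it with a cryptographically secure random source.
  issue(state: string, serverId: string): void {
    this.pending.set(state, serverId);
  }

  // Validate and consume a state value. A second call with the same value
  // fails because the entry is deleted on first use.
  consume(state: string | null, serverId: string): boolean {
    if (state === null) return false;
    const expected = this.pending.get(state);
    if (expected === undefined) return false;
    this.pending.delete(state);
    return expected === serverId;
  }
}
```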

### Tool calls return 401 silently

The token may have expired and refresh failed. Check the audit log:

```sql
SELECT detail FROM audit_log WHERE action LIKE 'mcp.%' AND created_at > datetime('now', '-1 hour');
```

`mcp.token.refresh` rows with a non-200 status, or any `mcp.token.invalid_grant`
rows, indicate that the user's refresh token is no longer valid. The user needs
to re-connect.

### "MCP_ENCRYPTION_KEY not configured"

The env var was not set, OR it's the wrong length (must be exactly 64 hex chars
= 32 bytes). Verify with:

```bash
printf '%s' "$MCP_ENCRYPTION_KEY" | wc -c   # should be 64
```

### Private-IP MCP rejected

You're hitting `http://localhost:...` or an RFC1918 address and SSRF is blocking
it. Set in `config.yaml`:

```yaml
mcp:
  allow_private_addresses: true
```

then restart (`scripts/server.sh restart`). Note: `loadConfig().mcp` is read
once at boot, so a runtime hot-reload does not pick up the change.

## Log prefixes

Grep the orchestrator log for:

| Prefix | Subsystem |
|---|---|
| `[mcp:registry]` | Server CRUD, discovery snapshots |
| `[mcp:token]` | hasToken, getValidToken, refresh, invalidation |
| `[mcp:oauth]` | OAuth start / callback handlers |
| `[mcp:client]` | SDK transport connect / close |
| `[mcp:executor]` | callTool execution + content[] handling |
| `[mcp:aggregator]` | tool list resolution, dispatch |

## Audit log entries

| `action` | Trigger | Detail (redacted) |
|---|---|---|
| `mcp.server.upsert` | Admin or user adds/updates a server | `serverId`, `authKind` |
| `mcp.oauth.start` | User clicks Connect on an oauth server | `serverId` |
| `mcp.oauth.callback` | User completes OAuth dance | `serverId`, `success` |
| `mcp.token.refresh` | getValidToken triggers refresh | `serverId`, `outcome` |
| `mcp.token.invalid_grant` | Refresh failed with `invalid_grant` | `serverId` |
| `mcp.call_tool` | A tool was invoked | `serverId`, `toolName`, `argsHash` |

Token strings, OAuth codes, and Authorization headers are scrubbed by
`src/mcp/redact.ts` before being written to detail JSON.
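
For orientation, key-based scrubbing of a detail object might look like the sketch below. This is not the contents of `src/mcp/redact.ts`: the `redact` function, the list of sensitive key names, and the `[REDACTED]` placeholder are all assumptions made for illustration.

```typescript
// Hypothetical set of key names whose values must never reach the audit log.
const SENSITIVE_KEY = /^(authorization|access_token|refresh_token|client_secret|code)$/i;

// Recursively copy a value, replacing the value of any sensitive key.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value; // strings, numbers, booleans, null, undefined pass through
}
```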

## SSRF and private IPs

The strict SSRF check (`src/mcp/ssrf-strict.ts`) is enforced for **all** MCP
fetches (discovery, token, /mcp). It:

1. Resolves the URL hostname to an IP.
2. Rejects loopback (`127.0.0.0/8`, `::1`), RFC1918, link-local, multicast,
   CGNAT (`100.64.0.0/10`), and broadcast.
3. Pins the resolved IP to prevent TOCTOU attacks (`pinnedFetch`).

`mcp.allow_private_addresses: true` bypasses **all** of the above. Use only in
trusted dev/CI environments. Granular allow/deny (e.g. allow loopback but deny
multicast) is a Phase 8 follow-up.
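
Step 2 can be illustrated with a small IPv4 range check. This is a sketch, not the logic in `src/mcp/ssrf-strict.ts`: the function names are hypothetical, the range table is transcribed from the list above (with `169.254.0.0/16` for link-local and `224.0.0.0/4` for multicast), and IPv6 addresses such as `::1` are not handled here.

```typescript
// Convert dotted-quad IPv4 to a number, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    n = n * 256 + octet;
  }
  return n;
}

// [network base, prefix length] for each rejected range.
const BLOCKED_RANGES: Array<[string, number]> = [
  ["127.0.0.0", 8],        // loopback
  ["10.0.0.0", 8],         // RFC1918
  ["172.16.0.0", 12],      // RFC1918
  ["192.168.0.0", 16],     // RFC1918
  ["169.254.0.0", 16],     // link-local
  ["224.0.0.0", 4],        // multicast
  ["100.64.0.0", 10],      // CGNAT
  ["255.255.255.255", 32], // broadcast
];

// True if addr falls inside base/bits. Uses division instead of bit shifts
// to avoid 32-bit signed overflow in JavaScript.
function inCidr(addr: number, base: number, bits: number): boolean {
  const size = Math.pow(2, 32 - bits);
  return Math.floor(addr / size) === Math.floor(base / size);
}

// Fail closed: anything unparseable is treated as blocked.
function isBlockedIpv4(ip: string): boolean {
  const addr = ipv4ToInt(ip);
  if (addr === null) return true;
  return BLOCKED_RANGES.some(([base, bits]) => {
    const baseInt = ipv4ToInt(base);
    return baseInt !== null && inCidr(addr, baseInt, bits);
  });
}
```

A range check like this only covers the classification step. It must run on the resolved IP, not the hostname, and the connection must then use that same IP, which is what the pinning in step 3 is for.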

## Future work

- Refresh-on-401 retry inside `tool-executor` (currently a 401 fails the call;
  the user must re-trigger)
- Stdio transport for local MCP servers (no HTTP)
- Org-scoped shared tokens (schema already has `scope_type` / `scope_id`)
- TTL-based opportunistic refresh of `mcp_server_tools` cache
- `MCP_ENCRYPTION_KEY` rotation without invalidating tokens
- CIDR-aware private-IP allowlist (replace blanket `allow_private_addresses`)