# OSS Documentation Cleanup Plan (MAESTRO)

Status: PLAN (audit + execution checklist). Branch: `docs/oss-cleanup`. Author: docs audit pass, 2026-06-10.

This plan audits MAESTRO's documentation against GitHub/OSS publishing conventions and the repo owner's preference for HTML-staged docs, and lays out a concrete, ordered execution path. It does **not** change any shipping docs; the only new files on this branch are this plan and two clearly labeled DRAFT stubs (`docs/README.en.draft.md`, `docs/SECURITY.draft.md`).

## How OSS publishing actually works (verified)

`scripts/oss-sync.sh` builds the public tree from `git archive HEAD`, then:

1. removes every path in `oss/exclude.txt` (CLAUDE.md, `docs/superpowers`, `docs/plans`, dated internal docs, `oss/` itself, sync tooling, editor configs);
2. scrubs internal hostnames/IPs via `oss/scrub.sed`;
3. strips dead links to excluded docs via `oss/scrub-deadlinks.py`;
4. overlays `oss/overlay/.` last (public-only files **win**);
5. runs a release gate (`oss/verify_release.py` + `oss/forbidden.txt`).

So the **public** doc set = (tracked docs that survive exclude) **+** the overlay. The overlay is the source of truth for the public-facing files. The tracked root `README.md` is the **private** README and is *replaced* by `oss/overlay/README.md` (README is overlaid, not excluded). Anything authored for the public must land in `oss/overlay/` or in a tracked `docs/` path that is **not** excluded.

Implication for this cleanup: most "public-facing" doc work means editing files under `oss/overlay/`, not the tracked root files.

---

## A. INVENTORY

Audience = who the doc is for. OSS-ships = `overlay` (public-only file), `tracked` (survives exclude and ships as-is), or `excluded` (removed by `oss/exclude.txt`). Language is as of today.

### Top-level files

| Path | Audience | Lang | OSS-ships? | Verdict |
|------|----------|------|-----------|---------|
| `README.md` (tracked root) | internal | JA | replaced by overlay README | keep as private; **not** public |
| `oss/overlay/README.md` | public | **JA** | overlay (becomes public README) | **translate→EN-first** + add badges/screenshot |
| `oss/overlay/AGENTS.md` | public (contributors) | EN | overlay | keep |
| `oss/overlay/CONTRIBUTING.md` | public | EN | overlay | keep (add CoC + DCO/CLA note) |
| `oss/overlay/SECURITY.md` | public | EN | overlay | keep (good) |
| `oss/overlay/CHANGELOG.md` | public | EN | overlay | keep (stale date: 2026-06-02) |
| `oss/overlay/LICENSE` | public | EN | overlay | keep (Apache-2.0) |
| `oss/overlay/NOTICE` | public | EN | overlay | keep |
| `AGENTS.md` (tracked root) | internal | EN | tracked (not excluded); **collides** with overlay | see Note 1 |
| `GEMINI.md` (tracked root) | internal | n/a | tracked (not excluded!) | **add to exclude.txt** (internal editor config) |
| `CLAUDE.md` | internal | JA | excluded | keep excluded |

Note 1: `AGENTS.md` exists both tracked at the root and in the overlay. The overlay copy wins (it is overlaid last), so the public ships the overlay version. Confirm that the tracked root `AGENTS.md` would be acceptable to ship *if* the overlay ever stopped providing one; today it is harmless but a maintenance trap (two AGENTS.md files to keep in sync). Recommendation: keep only the overlay AGENTS.md public and treat the tracked root copy as internal, which is fine as long as the duplication is documented.

### `docs/` overlay (public)

| Path | Audience | Lang | OSS-ships? | Verdict |
|------|----------|------|-----------|---------|
| `oss/overlay/docs/getting-started.md` | public | JA | overlay | translate→EN |
| `oss/overlay/docs/configuration.md` | public | JA | overlay | translate→EN |
| `oss/overlay/docs/architecture.md` | public | JA | overlay | translate→EN |

### `docs/` tracked (ship unless excluded)

| Path | Audience | Lang | OSS-ships? | Verdict |
|------|----------|------|-----------|---------|
| `docs/architecture.md` | mixed | JA | tracked | reconcile with the overlay architecture doc (see Note 2) |
| `docs/tools/*.md` (34) | public | JA | tracked | keep; translate top-N later |
| `docs/operations/bash-sandbox-provisioning.md` | public | JA | tracked | keep |
| `docs/operations/index.html` + `initial-setup.html` + `guide.css` | public | JA | tracked (HTML) | see RECONCILIATION |
| `docs/design/**` (ui_kits_reference jsx/html/css) | internal | mixed | tracked | **exclude** (dev design refs, not user docs) |
| `docs/aao-gateway-overview.md` | public | JA | tracked | keep |
| `docs/mcp.md`, `docs/skills.md`, `docs/ssh.md`, `docs/bench.md` | public | JA | tracked | keep |
| `docs/context-flow.md`, `docs/user-folder-layout.md` | public | JA | tracked | keep |
| `docs/maintenance-checklist.md` | internal (contributors) | JA | tracked | keep (referenced by CONTRIBUTING) |

Note 2: duplicate architecture docs. Both `docs/architecture.md` (tracked) and `oss/overlay/docs/architecture.md` (overlay) exist, and the overlay wins for the public. Decide which is canonical and have the other redirect/stub, or exclude the tracked one to avoid drift.

### docs-wip branch (in-progress reorg, NOT yet on main)

`docs-wip` does two things:

2. **Adds** a consolidated, curated doc set: `docs/reference/*.md` (18 Japanese files forming a feature reference: scheduler, config, mcp, ssh, memory, gateway, pieces, skills, notifications, media-tools, etc.), plus an HTML build system: `docs/build_html.py` (Markdown→staged HTML), generated `docs/html/**`, and manifest JSON (`.investigate-status.json`, `.consolidate-manifest.json`).

`docs/build_html.py` reads each doc's implementation status from `.investigate-status.json` and renders staged HTML (design=blue / implementation=amber / completed=green) with an `index.html` nav, which is exactly the owner's HTML-staged convention.
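To make the staged rendering concrete, here is a minimal Python sketch of what a generator like `docs/build_html.py` does with its status manifest. It is not the actual script: the manifest shape (a flat JSON object mapping doc path to stage name), the `badge-*` CSS class names, the grey fallback for unknown stages, and the helper names are all assumptions. Only the design=blue / implementation=amber / completed=green pairing comes from the description above.

```python
import html
import json

# Assumed palette: only the stage -> colour pairing is taken from the plan;
# the class names below are hypothetical.
STAGE_COLOURS = {
    "design": "blue",
    "implementation": "amber",
    "completed": "green",
}


def load_stages(manifest_text: str) -> dict[str, str]:
    """Parse a status manifest, assumed to map doc path -> stage name."""
    data = json.loads(manifest_text)
    return {str(path): str(stage) for path, stage in data.items()}


def stage_badge(stage: str) -> str:
    """Render one coloured badge; unknown stages fall back to grey."""
    colour = STAGE_COLOURS.get(stage, "grey")
    return f'<span class="badge badge-{colour}">{html.escape(stage)}</span>'


def render_index(stages: dict[str, str]) -> str:
    """Build a minimal index.html nav: one link per doc plus its stage badge."""
    items = []
    for path in sorted(stages):
        href = html.escape(path.removesuffix(".md") + ".html")
        items.append(
            f'<li><a href="{href}">{html.escape(path)}</a> {stage_badge(stages[path])}</li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    sample = '{"reference/scheduler.md": "completed", "reference/memory.md": "design"}'
    print(render_index(load_stages(sample)))
```

The point of the sketch is that stage is pure metadata layered on top of the Markdown: the same `.md` files render unchanged on the repo host, and only the generator consults the manifest.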
The `docs/reference/*.md` consolidation is the single most valuable doc asset for OSS: a clean, deduplicated feature reference that replaces scattered, dated design specs.

**Recommendation:** land `docs-wip`'s `docs/reference/*.md` consolidation (highest value, low risk) ahead of the OSS push, but treat the *generated* `docs/html/**` output and the `*.json` manifests as build artifacts (see RECONCILIATION: keep the generator, gitignore/exclude the output).

---

## B. GAP ANALYSIS: GitHub OSS readiness

Verified present: `LICENSE` (Apache-2.0), `NOTICE`, `CONTRIBUTING.md`, `SECURITY.md`, `CHANGELOG.md`, `AGENTS.md`, `Dockerfile`, `docker-compose.yml`, `.env.example`, `.dockerignore`.

Gaps found:

| Gap | Severity | Notes |
|-----|----------|-------|
| **No `CODE_OF_CONDUCT.md`** | high | Standard for OSS; GitHub surfaces it in the community profile. Add Contributor Covenant 2.1 to `oss/overlay/`. |
| **No `.github/` community files** | high | Missing issue templates (bug/feature), `PULL_REQUEST_TEMPLATE.md`, `FUNDING.yml` (optional). Note: the repo is published to **Gitea** (`swallow/maestro`), not GitHub; Gitea reads `.gitea/ISSUE_TEMPLATE/` (and also `.github/`). Add templates under a host-appropriate dir in the overlay. |
| **README is JA-only** | high | A first-time visitor on an English-default host can't read it. EN-first is the single biggest readiness fix. |
| **README has only 1 badge** | medium | Only a static license badge. Add: build/CI status, release/version, Node version, "PRs welcome". Avoid badges that need a live service. |
| **No screenshots/GIF in README** | medium | An agent UI sells itself visually. Add 1–2 screenshots (task detail / settings) under `oss/overlay/docs/assets/` and embed them in the README. |
| **No architecture diagram image** | low | The execution-flow ASCII block is fine; a simple diagram would help. Optional. |
| **CHANGELOG date stale** | low | `v0.1.0 — 2026-06-02` predates current HEAD; refresh on release. |
| **No top-level "Documentation" landing for the curated set** | medium | If `docs/reference/*` lands, README/getting-started should link an index. |
| **License headers in source** | low | Apache-2.0 doesn't require per-file headers, but `NOTICE` + a short header policy in CONTRIBUTING avoids questions. Optional. |
| **`GEMINI.md` would leak** | medium | Tracked at root and NOT in `oss/exclude.txt`, so it ships publicly. It's an internal editor/assistant config like CLAUDE.md. Add to exclude. |
| **`docs/design/ui_kits_reference/**` would ship** | low-medium | Internal design references (JSX prototypes, legacy admin kit), not user docs. Add `docs/design` to exclude. |

README quality for a first-time visitor (overlay README): the structure is good (features → quickstart → requirements → docs → security → license). What's missing for a strong first impression: English, a screenshot, a one-line "what/why" hook in English at the very top, and CI/release badges.

---

## C. i18n PLAN

Current state: all public docs (overlay README + getting-started + configuration + architecture) and tracked `docs/**` are Japanese. GitHub/Gitea OSS audiences default to English. The product also lists multilingual support as a goal.

**Convention (recommended): English-first with a `.ja` sibling.** This is the lowest-friction, most widely recognized GitHub pattern.

- `README.md` → English (the public README, i.e. `oss/overlay/README.md`).
- `README.ja.md` → Japanese (current content moved here).
- Cross-link at the top of each: `[English](README.md) | [日本語](README.ja.md)`.

For `docs/`, use a suffix convention to avoid a parallel directory tree:

```
oss/overlay/docs/
  getting-started.md      # EN (canonical)
  getting-started.ja.md   # JA
  configuration.md        # EN
  configuration.ja.md     # JA
  architecture.md         # EN
  architecture.ja.md      # JA
```

Rationale: a `docs/en/` + `docs/ja/` split doubles directory depth and breaks relative links on every move. The `.ja.md` suffix keeps EN as the default a visitor lands on and keeps JA one click away.
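Because the `.ja.md` convention is purely mechanical, drift between EN pages and their JA siblings can be reported automatically. The Python sketch below uses hypothetical helper names (`ja_sibling`, `missing_translations`; neither is existing repo tooling) to derive each English page's Japanese sibling and list English pages that have none. It only prints a report, so it can run as an advisory check rather than a blocking one.

```python
from pathlib import Path, PurePosixPath


def ja_sibling(en_path: str) -> str:
    """Map an English page to its Japanese sibling, e.g. foo.md -> foo.ja.md."""
    p = PurePosixPath(en_path)
    if p.suffix != ".md" or p.stem.endswith(".ja"):
        raise ValueError(f"not an English Markdown page: {en_path}")
    return str(p.with_name(p.stem + ".ja.md"))


def missing_translations(paths: list[str]) -> list[str]:
    """Return English pages (sorted) whose .ja.md sibling is absent from `paths`."""
    present = set(paths)
    english = [p for p in paths if p.endswith(".md") and not p.endswith(".ja.md")]
    return sorted(p for p in english if ja_sibling(p) not in present)


if __name__ == "__main__":
    root = Path("oss/overlay")  # assumed to be run from the repo root
    if root.is_dir():
        pages = [p.as_posix() for p in root.rglob("*.md")]
        for page in missing_translations(pages):
            print(f"no JA sibling yet: {page}")
```

Once the migration is done, files that are intentionally EN-only (for example `CHANGELOG.md` or `SECURITY.md`) will also appear in the report; an allowlist would be the obvious next step.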
(If a richer i18n site is built later, e.g. with Docusaurus or MkDocs, migrate then; don't over-engineer now.)

**Translate-first order (highest visitor impact first):**

1. `README.md` (overlay): the storefront. **Do first.**
2. `docs/getting-started.md`: the clone→running path.
3. `docs/configuration.md`: the `config.yaml` reference.
4. `docs/architecture.md`: for evaluators/contributors.
5. `CHANGELOG.md`, `SECURITY.md`, `CONTRIBUTING.md`, `AGENTS.md`: already EN, no action.
6. Later/optional: the top 5–8 `docs/tools/*.md` by usage (bash, websearch, browseweb, spawnsubtask, office) and the `docs/reference/*` set if it lands.

**Tooling/convention:**

- Keep EN canonical and JA as the translation. Do not auto-generate JA at build time; hand-maintain the high-value pages and accept staleness on the long tail.
- Add a short "Translations" note in CONTRIBUTING describing the `.ja.md` convention and stating that EN is canonical (PRs that change EN should flag the JA sibling as needing an update, but should not be blocked on it).
- The existing JA overlay docs are already written and accurate; they become the `.ja.md` siblings essentially for free. The work is the EN translation, not throwing away the JA.

---

## D. DOCKER DOCS PLAN

From `git clone` to running via Docker, a new user needs (and currently has):

| Step | Covered today? | Where |
|------|----------------|-------|
| `cp .env.example .env` | yes | README quickstart + getting-started §5 |
| Set LLM endpoint | yes | `.env.example` comments (`OLLAMA_BASE_URL`, `OLLAMA_MODEL`) |
| `docker compose up -d` | yes | README + getting-started |
| Where the UI is | yes | `http://localhost:9876` |
| Data persistence | yes | named volumes `maestro-data` / `maestro-workspaces` |
| Security default (localhost-only) | yes | README + getting-started + SECURITY |

What's **missing or unclear** for a Docker-first OSS user:

1. **`host.docker.internal` on Linux.** `.env.example` defaults to `http://host.docker.internal:11434/v1`. On Linux Docker this name does not resolve by default (it works on Docker Desktop for macOS/Windows), so new Linux users will hit a name-resolution or connection error with no hint. **Add a note**: on Linux, use the host gateway IP or `--add-host=host.docker.internal:host-gateway` (in compose: `extra_hosts`), or point at the LAN IP of the Ollama host.
2. **No "verify it's running" step.** Add a healthcheck / `docker compose logs -f` step plus an "open the UI, create a task" smoke check to getting-started §5.
3. **Mounting `config.yaml` into the container** is referenced ("see the comments in docker-compose.yml") but not shown inline. New users would benefit from one explicit example of mounting a host `config.yaml`, and from knowing where setup runs in the container (does the image run `npm run setup`, or is configuration env-var only?).
4. **Bash sandbox in Docker.** bwrap needs unprivileged user namespaces; inside a container this may require extra flags or `--privileged`-adjacent settings, otherwise the hardened fallback applies. Getting-started §8 covers host provisioning but not the containerized story. Add one paragraph: which `bash_sandbox` mode the Docker image ships with, and any caveats.
5. **Image build vs prebuilt.** Is there a published image, or does `docker compose up` build locally from the `Dockerfile`? State it. If it builds from source, note the first-run build time.
6. **GPU / external Ollama.** Make explicit that MAESTRO's container does **not** run the LLM; users point it at an existing OpenAI-compatible endpoint. The README says so, but the Docker section should restate it (a common source of confusion).

Recommend a dedicated `oss/overlay/docs/docker.md` (EN) that consolidates the above and is linked from README + getting-started §5, rather than growing getting-started.

---

## E. RECONCILIATION: HTML-staged docs vs GitHub Markdown

The tension: the owner's convention wants `docs/` as **staged HTML** (design/implementation/completed, color badges, `index.html` nav), while GitHub/OSS convention wants a **Markdown** README and `docs/` that render on the repo host. These are not actually in conflict if HTML is treated as a **generated artifact**, not the source:

1. **Markdown is the single source of truth.** Author everything as `.md` (`README.md`, `docs/**/*.md`, the `docs/reference/*.md` set from `docs-wip`). Markdown renders natively on Gitea/GitHub and is what OSS contributors expect.
2. **`docs/build_html.py` (from `docs-wip`) generates the staged HTML view** from that Markdown for the owner's internal browsing/handoff workflow. The generator stays; its **output (`docs/html/**`) is a build artifact**. Do not hand-edit it, and keep it out of OSS:
   - add `docs/html/` to `.gitignore` (don't commit generated output), and
   - add `docs/html` + the `*.json` manifests to `oss/exclude.txt` as belt-and-suspenders, so even a stray commit never ships generated HTML publicly.

   This gives the owner the staged-HTML experience locally (`python3 docs/build_html.py`) while the OSS repo ships clean Markdown: **no duplication of content**, only a generation step.
3. **The few hand-written HTML docs that exist today** (`docs/operations/index.html`, `initial-setup.html`, `guide.css`) are the exception: they're authored HTML, not generated. Two options: (a) convert them to Markdown so they fit the generated-from-MD model (preferred for OSS consistency), or (b) keep them as authored HTML but exclude them from OSS and let the public read the Markdown equivalents (`docs/operations/bash-sandbox-provisioning.md` already exists). Recommend (a) long-term and (b) as the immediate no-risk choice.
4. **Stage metadata** (design/impl/completed) lives in the Markdown frontmatter or the manifest JSON and is consumed only by `build_html.py`. OSS readers never see stages; they see finished Markdown. Internal readers get the staged HTML view.

Net: keep `docs-wip`'s generator and the `docs/reference/*.md` consolidation; gate the generated HTML behind `.gitignore` + `oss/exclude.txt`. One source (MD), two renderings (host-native MD for OSS, staged HTML for internal use).
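The belt-and-suspenders argument rests on the order of steps 1 and 4 of the sync pipeline: exclusion removes a tracked path, and only an overlay file can put a path back. The Python model below assumes `oss/exclude.txt` holds one path or glob per line with `#` comments; the real file format and matching rules inside `scripts/oss-sync.sh` may differ, and the function names are hypothetical. Scrubbing and the release gate (steps 2, 3 and 5) are deliberately left out.

```python
from fnmatch import fnmatchcase


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Assumed matching rule: a pattern excludes an exact path, everything
    beneath a directory prefix, or anything matching it as a glob."""
    for raw in patterns:
        pat = raw.strip()
        if not pat or pat.startswith("#"):
            continue  # skip blank lines and comments
        pat = pat.rstrip("/")
        if path == pat or path.startswith(pat + "/") or fnmatchcase(path, pat):
            return True
    return False


def public_tree(
    tracked: dict[str, str], exclude: list[str], overlay: dict[str, str]
) -> dict[str, str]:
    """Model of steps 1 and 4: drop excluded paths, then apply the overlay last."""
    tree = {p: body for p, body in tracked.items() if not is_excluded(p, exclude)}
    tree.update(overlay)  # overlay is applied last, so public-only files win
    return tree


if __name__ == "__main__":
    tracked = {
        "README.md": "private README",
        "docs/mcp.md": "MCP reference",
        "docs/html/index.html": "generated",
        "docs/.consolidate-manifest.json": "{}",
    }
    exclude = ["# generated artifacts", "docs/html/", "docs/.*-manifest.json"]
    overlay = {"README.md": "public README"}
    for path, body in sorted(public_tree(tracked, exclude, overlay).items()):
        print(f"{path}: {body}")
```

In this model a stray commit of `docs/html/**` or a manifest is dropped by the exclude list even if `.gitignore` was bypassed, while the overlay README still replaces the private one.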
---

## F. PRIORITIZED EXECUTION CHECKLIST

Effort: S ≤30min, M ≤2h, L ≤half-day. Tags: [public-facing] ships to OSS; [internal] private repo / tooling only.

### P0: leak/correctness fixes (do before any OSS push)

1. [internal] **Add `GEMINI.md` to `oss/exclude.txt`** (internal assistant config, currently ships). (S)
2. [internal] **Add `docs/design` to `oss/exclude.txt`** (JSX/HTML UI prototypes, not user docs). (S)
3. [internal] **Resolve the duplicate architecture doc**: pick the canonical one (`oss/overlay/docs/architecture.md`) and exclude or stub the tracked `docs/architecture.md`. (S)
4. [internal] **Decide on the AGENTS.md duplication** (the overlay wins; document that the tracked root copy is internal). (S)
5. [internal] Run `scripts/oss-sync.sh --dry-run --local-only` and read the release-gate output + diff stat to confirm nothing internal leaks. (S)

### P1: English README + storefront (highest visitor impact)

6. [public-facing] **Translate `oss/overlay/README.md` to English**; move the JA content to `oss/overlay/README.ja.md`; add the `EN | 日本語` switcher line. Use `docs/README.en.draft.md` (this branch) as the starting skeleton. (M)
7. [public-facing] **Add badges** to the README: CI/build, release/version, Node 22+, license (keep), PRs welcome. Only badges that don't require a live host. (S)
8. [public-facing] **Add 1–2 screenshots** (`oss/overlay/docs/assets/`) and embed them under a "Screenshots" section. (S–M; needs UI captures)

### P2: community health files

9. [public-facing] **Add `CODE_OF_CONDUCT.md`** (Contributor Covenant 2.1) to `oss/overlay/`. (S)
10. [public-facing] **Add issue + PR templates** under the host-appropriate dir in the overlay (`.gitea/ISSUE_TEMPLATE/{bug,feature}.md` + a `PULL_REQUEST_TEMPLATE.md`; also `.github/` if mirroring to GitHub). (M)
11. [public-facing] **Resolve `SECURITY.draft.md`** (this branch): confirm the existing `oss/overlay/SECURITY.md` is sufficient (it is); the draft is a redundant stub, so delete it once confirmed. (S)

### P3: i18n of core docs

12. [public-facing] Translate `docs/getting-started.md` → EN and move the JA content to `getting-started.ja.md`. (M)
13. [public-facing] Translate `docs/configuration.md` → EN (+ `.ja.md`). (M–L)
14. [public-facing] Translate `docs/architecture.md` → EN (+ `.ja.md`). (M)
15. [public-facing] Add a "Translations" note to CONTRIBUTING describing the `.ja.md` convention (EN canonical). (S)

### P4: Docker docs

16. [public-facing] Add `oss/overlay/docs/docker.md` (EN) covering the 6 gaps in section D (esp. Linux `host.docker.internal`, verify-running, config mount, sandbox-in-Docker, build-vs-prebuilt, external LLM). Link it from README + getting-started §5. (M)

### P5: docs reorg reconciliation (coordinate with `docs-wip`)

17. [internal] **Land `docs-wip`'s `docs/reference/*.md` consolidation** onto main (the 18 curated feature docs; drop the generated `docs/html/**` from the merge). (L; review of 18 docs)
18. [internal] **Add `docs/html/` to `.gitignore`** and **`docs/html` + `docs/.investigate-status.json` + `docs/.*-manifest.json` to `oss/exclude.txt`** (generated artifacts; section E). (S)
19. [public-facing] Add a `docs/README.md` (or a section in the main README) indexing the `docs/reference/*` set so the curated docs are discoverable. (S)
20. [public-facing] Convert `docs/operations/*.html` to Markdown (or exclude them from OSS); section E option (a)/(b). (M)

### P6: polish (optional, post-launch)

21. [public-facing] Translate the top 5–8 `docs/tools/*.md`. (L)
22. [public-facing] Refresh `CHANGELOG.md` date/contents at the actual release. (S)
23. [public-facing] Add an architecture diagram image to the README. (M)

---

## Drafts created on this branch (NOT replacing anything)

- `docs/README.en.draft.md`: English README skeleton (starting point for item 6).
- `docs/SECURITY.draft.md`: redundant stub pointing at the existing policy; it exists only to confirm coverage (item 11) and should be deleted once the existing `oss/overlay/SECURITY.md` is accepted.

Both are clearly labeled DRAFT and live under `docs/`, so they do not collide with or overwrite the shipping overlay files.