# OSS Documentation Cleanup Plan (MAESTRO)
Status: PLAN (audit + execution checklist). Branch: `docs/oss-cleanup`.
Author: docs audit pass, 2026-06-10.
This plan audits MAESTRO's documentation against GitHub/OSS publishing
conventions and the repo owner's HTML-staged-docs preference, and lays out a
concrete, ordered execution path. It does **not** change any shipping docs; the
only new files on this branch are this plan and two clearly-labeled DRAFT stubs
(`docs/README.en.draft.md`, `docs/SECURITY.draft.md`).
## How OSS publishing actually works (verified)
`scripts/oss-sync.sh` builds the public tree from `git archive HEAD`, then:
1. removes every path in `oss/exclude.txt` (CLAUDE.md, `docs/superpowers`,
`docs/plans`, dated internal docs, `oss/` itself, sync tooling, editor configs);
2. scrubs internal hostnames/IPs via `oss/scrub.sed`;
3. strips dead links to excluded docs via `oss/scrub-deadlinks.py`;
4. overlays `oss/overlay/.` last (public-only files **win**);
5. runs a release gate (`oss/verify_release.py` + `oss/forbidden.txt`).
So the **public** doc set = (tracked docs that survive exclude) **+** the
overlay. The overlay is the source of truth for the public-facing files. The
tracked root `README.md` is the **private** README and is *replaced* by
`oss/overlay/README.md` (README is overlaid, not excluded). Anything authored
for the public must land in `oss/overlay/` or in a tracked `docs/` path that is
**not** excluded.
Implication for this cleanup: most "public-facing" doc work means editing files
under `oss/overlay/`, not the tracked root files.
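
The composition rule above (tracked files that survive exclude, then the overlay applied last) can be modeled in a few lines of Python. This is an illustrative sketch only, not the logic of `scripts/oss-sync.sh`: the function names, the "entry matches itself and everything under it" rule, and the sample path `docs/plans/x.md` are assumptions.

```python
from pathlib import PurePosixPath


def is_excluded(path: str, exclude_entries: list[str]) -> bool:
    """True if path equals an exclude entry or lives under it (assumed rule)."""
    p = PurePosixPath(path)
    return any(
        p == PurePosixPath(entry) or PurePosixPath(entry) in p.parents
        for entry in exclude_entries
    )


def public_doc_set(
    tracked: list[str], exclude_entries: list[str], overlay: list[str]
) -> dict[str, str]:
    """Map each public path to its origin: 'tracked' or 'overlay'."""
    result = {p: "tracked" for p in tracked if not is_excluded(p, exclude_entries)}
    # The overlay is applied last, so it replaces a tracked file at the same path.
    for p in overlay:
        result[p] = "overlay"
    return result


# Hypothetical sample inputs; overlay paths are relative to oss/overlay/.
tracked = ["README.md", "CLAUDE.md", "GEMINI.md", "docs/plans/x.md", "docs/mcp.md"]
exclude = ["CLAUDE.md", "docs/plans"]
overlay = ["README.md", "SECURITY.md"]

shipped = public_doc_set(tracked, exclude, overlay)
print(shipped)
```

With these sample inputs `README.md` resolves to `overlay`, while `GEMINI.md` resolves to `tracked` because nothing excludes it.
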
---
## A. INVENTORY
Audience = who the doc is for. OSS-ships = `overlay` (public-only file),
`tracked` (survives exclude and ships as-is), or `excluded` (removed by
`oss/exclude.txt`). Language as of today.
### Top-level files
| Path | Audience | Lang | OSS-ships? | Verdict |
|------|----------|------|-----------|---------|
| `README.md` (tracked root) | internal | JA | replaced by overlay README | keep as private; **not** public |
| `oss/overlay/README.md` | public | **JA** | overlay (becomes public README) | **translate→EN-first** + add badges/screenshot |
| `oss/overlay/AGENTS.md` | public (contributors) | EN | overlay | keep |
| `oss/overlay/CONTRIBUTING.md` | public | EN | overlay | keep (add CoC + DCO/CLA note) |
| `oss/overlay/SECURITY.md` | public | EN | overlay | keep (good) |
| `oss/overlay/CHANGELOG.md` | public | EN | overlay | keep (date stale: dated 2026-06-02) |
| `oss/overlay/LICENSE` | public | EN | overlay | keep (Apache-2.0) |
| `oss/overlay/NOTICE` | public | EN | overlay | keep |
| `AGENTS.md` (tracked root) | internal | EN | tracked (not excluded) — **collides** w/ overlay | see note 1 |
| `GEMINI.md` (tracked root) | internal | — | tracked (not excluded!) | **add to exclude.txt** (internal editor config) |
| `CLAUDE.md` | internal | JA | excluded | keep excluded |
Note 1 — `AGENTS.md` exists both tracked-at-root and in the overlay. The overlay
copy wins (overlaid last), so the public ships the overlay version. Today the
tracked root copy never reaches the public tree, but it would ship *if* the
overlay ever stopped providing `AGENTS.md`, so confirm it is acceptable to
publish. Keeping two `AGENTS.md` files in sync is also a maintenance trap.
Recommend: treat the overlay `AGENTS.md` as the only public one, treat the
tracked root one as internal, and document the duplication.
### `docs/` overlay (public)
| Path | Audience | Lang | OSS-ships? | Verdict |
|------|----------|------|-----------|---------|
| `oss/overlay/docs/getting-started.md` | public | JA | overlay | translate→EN |
| `oss/overlay/docs/configuration.md` | public | JA | overlay | translate→EN |
| `oss/overlay/docs/architecture.md` | public | JA | overlay | translate→EN |
### `docs/` tracked (ship unless excluded)
| Path | Audience | Lang | OSS-ships? | Verdict |
|------|----------|------|-----------|---------|
| `docs/architecture.md` | mixed | JA | tracked | reconcile vs overlay arch (see note 2) |
| `docs/tools/*.md` (34) | public | JA | tracked | keep; translate top-N later |
| `docs/operations/bash-sandbox-provisioning.md` | public | JA | tracked | keep |
| `docs/operations/index.html` + `initial-setup.html` + `guide.css` | public | JA | tracked (HTML) | see RECONCILIATION |
| `docs/design/**` (ui_kits_reference jsx/html/css) | internal | mixed | tracked | **exclude** (dev design refs, not user docs) |
| `docs/aao-gateway-overview.md` | public | JA | tracked | keep |
| `docs/mcp.md`, `docs/skills.md`, `docs/ssh.md`, `docs/bench.md` | public | JA | tracked | keep |
| `docs/context-flow.md`, `docs/user-folder-layout.md` | public | JA | tracked | keep |
| `docs/maintenance-checklist.md` | internal (contributors) | JA | tracked | keep (referenced by CONTRIBUTING) |
Note 2 — duplicate architecture docs: `docs/architecture.md` (tracked) AND
`oss/overlay/docs/architecture.md` (overlay). The overlay wins for the public.
Decide which is canonical and have the other redirect/stub, or exclude the
tracked one to avoid drift.
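
A quick way to surface collisions like the ones in notes 1 and 2 is to intersect the two path sets. The sketch below is a hypothetical helper, not part of the existing sync tooling; the `intentional` allowlist (README is overlaid on purpose) and the sample sets are assumptions for illustration.

```python
def find_collisions(
    tracked: set[str],
    overlay: set[str],
    intentional: frozenset[str] = frozenset({"README.md"}),
) -> list[str]:
    """Paths present in both trees, minus overlaps that are intended."""
    return sorted((tracked & overlay) - intentional)


# Sample inputs; overlay paths are relative to oss/overlay/.
tracked = {"README.md", "AGENTS.md", "docs/architecture.md", "docs/mcp.md"}
overlay = {
    "README.md",
    "AGENTS.md",
    "LICENSE",
    "docs/architecture.md",
    "docs/getting-started.md",
}

collisions = find_collisions(tracked, overlay)
print(collisions)  # ['AGENTS.md', 'docs/architecture.md']
```
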
### docs-wip branch (in-progress reorg — NOT yet on main)
`docs-wip` does two things:
2. **Adds** a consolidated, curated doc set: `docs/reference/*.md` (18 files,
Japanese — feature reference: scheduler, config, mcp, ssh, memory, gateway,
pieces, skills, notifications, media-tools, etc.), plus an HTML build system:
`docs/build_html.py` (Markdown→staged HTML), generated `docs/html/**`, and
manifest JSON (`.investigate-status.json`, `.consolidate-manifest.json`).
`docs/build_html.py` reads each doc's implementation status from
`.investigate-status.json` and renders staged HTML (design=blue /
implementation=amber / completed=green) with an `index.html` nav — exactly the
owner's HTML-staged convention. The `docs/reference/*.md` consolidation is the
single most valuable doc asset for OSS: a clean, deduplicated feature reference
replacing scattered dated design specs.
**Recommendation:** land `docs-wip`'s `docs/reference/*.md` consolidation
(highest-value, low-risk) ahead of the OSS push, but treat the *generated*
`docs/html/**` output and the `*.json` manifests as build artifacts (see
RECONCILIATION — keep the generator, gitignore/exclude the output).
---
## B. GAP ANALYSIS — GitHub OSS readiness
Verified-present: `LICENSE` (Apache-2.0), `NOTICE`, `CONTRIBUTING.md`,
`SECURITY.md`, `CHANGELOG.md`, `AGENTS.md`, `Dockerfile`, `docker-compose.yml`,
`.env.example`, `.dockerignore`.
Gaps found:
| Gap | Severity | Notes |
|-----|----------|-------|
| **No `CODE_OF_CONDUCT.md`** | high | Standard for OSS; GitHub surfaces it in the community profile. Add Contributor Covenant 2.1 to `oss/overlay/`. |
| **No `.github/` community files** | high | Missing issue templates (bug/feature), `PULL_REQUEST_TEMPLATE.md`, `FUNDING.yml` (optional). Note: repo is published to **Gitea** (`swallow/maestro`), not GitHub — Gitea reads `.gitea/ISSUE_TEMPLATE/` (and also `.github/`). Add templates under a host-appropriate dir in the overlay. |
| **README is JA-only** | high | First-time visitor on an English-default host can't read it. EN-first is the single biggest readiness fix. |
| **README has 1 badge only** | medium | Only a static license badge. Add: build/CI status, release/version, Node version, "PRs welcome". Avoid badges that need a live service. |
| **No screenshots/GIF in README** | medium | An agent UI sells itself visually. Add 1-2 screenshots (task detail / settings) under `oss/overlay/docs/assets/` and embed in README. |
| **No architecture diagram image** | low | The execution-flow ASCII block is fine; a simple diagram would help. Optional. |
| **CHANGELOG date stale** | low | `v0.1.0 — 2026-06-02` predates current HEAD; refresh on release. |
| **No top-level "Documentation" landing for the curated set** | medium | If `docs/reference/*` lands, README/getting-started should link an index. |
| **License headers in source** | low | Apache-2.0 doesn't require per-file headers, but `NOTICE` + a short header policy in CONTRIBUTING avoids questions. Optional. |
| **`GEMINI.md` would leak** | medium | Tracked at root, NOT in `oss/exclude.txt` → ships publicly. It's an internal editor/assistant config like CLAUDE.md. Add to exclude. |
| **`docs/design/ui_kits_reference/**` would ship** | low-medium | Internal design references (JSX prototypes, legacy admin kit). Not user docs. Add `docs/design` to exclude. |
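
The community-file gaps in the table could be checked mechanically against the public tree. The following is a minimal sketch under stated assumptions: `REQUIRED_FILES`, `TEMPLATE_DIRS`, and `community_gaps` are hypothetical names, and the sample set simply mirrors the "verified-present" list above.

```python
REQUIRED_FILES = ["LICENSE", "CONTRIBUTING.md", "SECURITY.md", "CODE_OF_CONDUCT.md"]
TEMPLATE_DIRS = [".gitea/ISSUE_TEMPLATE/", ".github/ISSUE_TEMPLATE/"]


def community_gaps(public_paths: set[str]) -> list[str]:
    """List required community files and template dirs missing from the tree."""
    gaps = [name for name in REQUIRED_FILES if name not in public_paths]
    has_templates = any(
        path.startswith(prefix) for path in public_paths for prefix in TEMPLATE_DIRS
    )
    if not has_templates:
        gaps.append("issue templates")
    return gaps


current = {
    "README.md",
    "LICENSE",
    "NOTICE",
    "CONTRIBUTING.md",
    "SECURITY.md",
    "CHANGELOG.md",
    "AGENTS.md",
}
print(community_gaps(current))  # ['CODE_OF_CONDUCT.md', 'issue templates']
```
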
README quality for a first-time visitor (overlay README): structure is good
(features → quickstart → requirements → docs → security → license). What's
missing for a strong first impression: English, a screenshot, a one-line
"what/why" hook in English at the very top, and CI/release badges.
---
## C. i18n PLAN
Current state: all public docs (overlay README, getting-started, configuration,
and architecture) and tracked `docs/**` are Japanese. GitHub/Gitea OSS audiences
default to English. The product also targets multilingual support (多言語対応) as a goal.
**Convention (recommended): English-first with a `.ja` sibling.**
This is the lowest-friction, most-recognized GitHub pattern.
- `README.md` → English (the public README, i.e. `oss/overlay/README.md`).
- `README.ja.md` → Japanese (current content moved here).
- Cross-link at the top of each: `[English](README.md) | [日本語](README.ja.md)`.
For `docs/`, use a suffix convention to avoid a parallel directory tree:
```
oss/overlay/docs/
  getting-started.md      # EN (canonical)
  getting-started.ja.md   # JA
  configuration.md        # EN
  configuration.ja.md     # JA
  architecture.md         # EN
  architecture.ja.md      # JA
```
Rationale: a `docs/en/` + `docs/ja/` split doubles directory depth and breaks
relative links on every move. The `.ja.md` suffix keeps EN as the default a
visitor hits and keeps JA one click away. (If a richer i18n site is built later
— e.g. Docusaurus/MkDocs — migrate then; don't over-engineer now.)
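
The suffix convention is mechanical enough to express as a pair of helpers. This is a sketch, assuming the `.ja.md` naming described above; `ja_sibling` and `switcher_line` are hypothetical names, not existing tooling.

```python
from pathlib import PurePosixPath


def ja_sibling(en_path: str) -> str:
    """Return the Japanese sibling path for an English doc (foo.md -> foo.ja.md)."""
    p = PurePosixPath(en_path)
    return str(p.with_name(f"{p.stem}.ja{p.suffix}"))


def switcher_line(en_path: str) -> str:
    """Build the language switcher using file names relative to the doc itself."""
    en_name = PurePosixPath(en_path).name
    ja_name = PurePosixPath(ja_sibling(en_path)).name
    return f"[English]({en_name}) | [日本語]({ja_name})"


print(ja_sibling("oss/overlay/docs/getting-started.md"))
# oss/overlay/docs/getting-started.ja.md
print(switcher_line("oss/overlay/README.md"))
# [English](README.md) | [日本語](README.ja.md)
```

Because both links are plain sibling file names, the switcher keeps working if the pair of files is moved together.
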
**Translate-first order (highest visitor impact first):**
1. `README.md` (overlay) — the storefront. **Do first.**
2. `docs/getting-started.md` — clone→running path.
3. `docs/configuration.md`: the `config.yaml` reference.
4. `docs/architecture.md` — for evaluators/contributors.
5. `CHANGELOG.md` (already EN), `SECURITY.md` (already EN), `CONTRIBUTING.md`
(already EN), `AGENTS.md` (already EN) — no action.
6. Later/optional: top 5-8 `docs/tools/*.md` by usage (bash, websearch, browseweb,
spawnsubtask, office) and the `docs/reference/*` set if landed.
**Tooling/convention:**
- Keep EN canonical, JA as translation. Do not auto-generate JA at build time;
hand-maintain the high-value pages, accept staleness on the long tail.
- Add a short "Translations" note in CONTRIBUTING describing the `.ja.md`
convention and that EN is canonical (PRs that change EN should flag the JA
sibling as needing update — don't block on it).
- The existing JA overlay docs are already written and accurate — they become
the `.ja.md` siblings essentially for free. The work is the EN translation,
not throwing away the JA.
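
The "flag the JA sibling, don't block" rule could be supported by a small check over the files a PR touches. A minimal sketch, assuming the `.ja.md` convention above; `stale_translations` and the sample paths are hypothetical, and how the changed-file list is obtained is left out.

```python
def stale_translations(changed: set[str], all_paths: set[str]) -> list[str]:
    """JA siblings that exist but were not updated alongside their EN source."""
    flagged = []
    for path in sorted(changed):
        # Only English Markdown sources are of interest here.
        if not path.endswith(".md") or path.endswith(".ja.md"):
            continue
        sibling = path[: -len(".md")] + ".ja.md"
        if sibling in all_paths and sibling not in changed:
            flagged.append(sibling)
    return flagged


repo = {
    "oss/overlay/docs/configuration.md",
    "oss/overlay/docs/configuration.ja.md",
    "oss/overlay/docs/architecture.md",
    "oss/overlay/docs/architecture.ja.md",
}
changed = {
    "oss/overlay/docs/configuration.md",
    "oss/overlay/docs/architecture.md",
    "oss/overlay/docs/architecture.ja.md",
}

needs_review = stale_translations(changed, repo)
print(needs_review)  # ['oss/overlay/docs/configuration.ja.md']
```

The result is meant as a reminder to attach to the PR, not as a failing gate.
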
---
## D. DOCKER DOCS PLAN
From `git clone` to running via Docker, a new user needs (and currently has):
| Step | Covered today? | Where |
|------|----------------|-------|
| `cp .env.example .env` | yes | README quickstart + getting-started §5 |
| Set LLM endpoint | yes | `.env.example` comments (`OLLAMA_BASE_URL`, `OLLAMA_MODEL`) |
| `docker compose up -d` | yes | README + getting-started |
| Where the UI is | yes | `http://localhost:9876` |
| Data persistence | yes | named volumes `maestro-data` / `maestro-workspaces` |
| Security default (localhost-only) | yes | README + getting-started + SECURITY |
What's **missing / unclear** for a Docker-first OSS user:
1. **`host.docker.internal` on Linux.** `.env.example` defaults to
`http://host.docker.internal:11434/v1`. On Linux Docker Engine this name is not
resolvable by default (it works out of the box on Docker Desktop for mac/win).
New Linux users will hit a name-resolution or connection error with no hint.
**Add a note**: on Linux, map the name with
`--add-host=host.docker.internal:host-gateway` (compose: `extra_hosts`), use the
host gateway IP directly, or point at the LAN IP of the Ollama host.
2. **No "verify it's running" step.** Add a healthcheck or `docker compose logs -f`,
plus an "open the UI, create a task" smoke check, to getting-started §5.
3. **Mounting `config.yaml` into the container** is referenced ("see the comments
in docker-compose.yml") but not shown inline. New users benefit from one
explicit example of mounting a host `config.yaml` and where setup runs in the
container (does the image run `npm run setup`, or only env-var config?).
4. **Bash sandbox in Docker.** bwrap needs unprivileged user namespaces; in a
container that may need extra flags or `--privileged`-adjacent settings, or
the hardened fallback applies. Getting-started §8 covers host provisioning but
not the containerized story. Add one paragraph: what `bash_sandbox` mode the
Docker image ships with and any caveats.
5. **Image build vs prebuilt.** Is there a published image, or is
`docker compose up` building locally from the `Dockerfile` every time? State
it. If build-from-source, note the first-run build time.
6. **GPU / external Ollama.** Make explicit that MAESTRO's container does **not**
run the LLM; users point it at an existing OpenAI-compatible endpoint. The
README says it but the Docker section could restate it (common confusion).
Recommend a dedicated `oss/overlay/docs/docker.md` (EN) consolidating the above,
linked from README + getting-started §5, rather than growing getting-started.
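
For the "verify it's running" gap, one option is a tiny poller that `docker.md` could show next to `docker compose logs -f`. This is a sketch using only the Python standard library; `wait_for_ui` is a hypothetical helper, and the only value taken from this plan is the UI address `http://localhost:9876`.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll url until it returns any HTTP response, or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status, so the container is up.
            return True
        except (urllib.error.URLError, OSError):
            # Not listening yet (or name not resolvable); retry after a pause.
            time.sleep(interval_s)
    return False
```

After `docker compose up -d`, calling `wait_for_ui("http://localhost:9876")` returns `True` once the UI answers; a `False` result is the cue to read the container logs.
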
---
## E. RECONCILIATION — HTML-staged docs vs GitHub Markdown
The tension: the owner's convention wants `docs/` as **staged HTML**
(design/implementation/completed, color badges, `index.html` nav). GitHub/OSS
convention wants **Markdown** README + `docs/` that render on the repo host.
These are not actually in conflict if we treat HTML as a **generated artifact**,
not the source:
1. **Markdown is the single source of truth.** Author everything as `.md`
(`README.md`, `docs/**/*.md`, the `docs/reference/*.md` set from `docs-wip`).
Markdown renders natively on Gitea/GitHub and is what OSS contributors expect.
2. **`docs/build_html.py` (from `docs-wip`) generates the staged HTML view** from
that Markdown for the owner's internal browsing/handoff workflow. The
generator stays; its **output (`docs/html/**`) is a build artifact** — do not
hand-edit it, and keep it out of OSS:
- add `docs/html/` to `.gitignore` (don't commit generated output), and
- add `docs/html` + the `*.json` manifests to `oss/exclude.txt` as
belt-and-suspenders so even a stray commit never ships generated HTML publicly.
This gives the owner the staged-HTML experience locally (`python3
docs/build_html.py`) while the OSS repo ships clean Markdown — **no
duplication of content**, only a generation step.
3. **The few hand-written HTML docs that exist today**
(`docs/operations/index.html`, `initial-setup.html`, `guide.css`) are the
exception: they're authored HTML, not generated. Two options —
(a) convert them to Markdown so they fit the generated-from-MD model
(preferred for OSS consistency), or
(b) keep them as authored HTML but exclude them from OSS and let the public
read the Markdown equivalents (`docs/operations/bash-sandbox-provisioning.md`
already exists). Recommend (a) long-term, (b) as the immediate no-risk choice.
4. **Stage metadata** (design/impl/completed) lives in the Markdown frontmatter
or the manifest JSON, consumed only by `build_html.py`. OSS readers never see
stages; they see finished Markdown. Internal readers get the staged HTML view.
Net: keep `docs-wip`'s generator + the `docs/reference/*.md` consolidation; gate
the generated HTML behind `.gitignore` + `oss/exclude.txt`. One source (MD), two
renderings (host-native MD for OSS, staged HTML for internal).
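
To make the "one source, two renderings" idea concrete, here is a minimal sketch of turning stage metadata into a staged index. It is not the actual `docs/build_html.py`: the JSON shape (path mapped to stage), the CSS class names, the gray fallback, and the sample file names are all assumptions; only the design=blue / implementation=amber / completed=green mapping comes from this plan.

```python
import html
import json

STAGE_COLORS = {"design": "blue", "implementation": "amber", "completed": "green"}


def stage_badge(stage: str) -> str:
    """Render a colored badge for a stage; unknown stages fall back to gray."""
    color = STAGE_COLORS.get(stage, "gray")
    return f'<span class="badge badge-{color}">{html.escape(stage)}</span>'


def render_index(status_json: str) -> str:
    """Build an index list from an assumed {markdown_path: stage} mapping."""
    status = json.loads(status_json)
    items = []
    for md_path, stage in sorted(status.items()):
        # Assumes every key ends in ".md"; link to the generated HTML page.
        href = html.escape(md_path[: -len(".md")] + ".html")
        items.append(f'<li><a href="{href}">{html.escape(md_path)}</a> {stage_badge(stage)}</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical status data in the assumed shape.
sample = json.dumps({"reference/scheduler.md": "completed", "reference/mcp.md": "design"})
print(render_index(sample))
```

The Markdown files stay untouched by this step, which is what keeps the staged view a pure build artifact.
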
---
## F. PRIORITIZED EXECUTION CHECKLIST
Effort: S ≤30min, M ≤2h, L ≤half-day. Tags: [public-facing] ships to OSS;
[internal] private-repo / tooling only.
### P0 — leak/correctness fixes (do before any OSS push)
1. [internal] **Add `GEMINI.md` to `oss/exclude.txt`** (internal assistant
config, currently ships). — S
2. [internal] **Add `docs/design` to `oss/exclude.txt`** (JSX/HTML UI prototypes,
not user docs). — S
3. [internal] **Resolve duplicate architecture doc**: pick canonical
(`oss/overlay/docs/architecture.md`), exclude or stub the tracked
`docs/architecture.md`. — S
4. [internal] **Decide AGENTS.md duplication** (overlay wins; document that the
tracked root copy is internal). — S
5. [internal] Run `scripts/oss-sync.sh --dry-run --local-only` and read the
release-gate output + diff stat to confirm nothing internal leaks. — S
### P1 — English README + storefront (highest visitor impact)
6. [public-facing] **Translate `oss/overlay/README.md` to English**; move JA to
`oss/overlay/README.ja.md`; add the `EN | 日本語` switcher line. Use
`docs/README.en.draft.md` (this branch) as the starting skeleton. — M
7. [public-facing] **Add badges** to README: CI/build, release/version, Node 22+,
license (keep), PRs-welcome. Only badges that don't require a live host. — S
8. [public-facing] **Add 1-2 screenshots** (`oss/overlay/docs/assets/`) and embed
under a "Screenshots" section. — S-M (needs capturing UI)
### P2 — community health files
9. [public-facing] **Add `CODE_OF_CONDUCT.md`** (Contributor Covenant 2.1) to
`oss/overlay/`. — S
10. [public-facing] **Add issue + PR templates** under the host-appropriate dir
in the overlay (`.gitea/ISSUE_TEMPLATE/{bug,feature}.md` + a
`PULL_REQUEST_TEMPLATE.md`; also `.github/` if mirroring to GitHub). — M
11. [public-facing] **Retire `SECURITY.draft.md`** (this branch): confirm the
existing `oss/overlay/SECURITY.md` is sufficient (it is), then delete the
draft, which is a redundant stub. — S
### P3 — i18n of core docs
12. [public-facing] Translate `docs/getting-started.md` → EN, move JA to
`getting-started.ja.md`. — M
13. [public-facing] Translate `docs/configuration.md` → EN (+ `.ja.md`). — M-L
14. [public-facing] Translate `docs/architecture.md` → EN (+ `.ja.md`). — M
15. [public-facing] Add a "Translations" note to CONTRIBUTING describing the
`.ja.md` convention (EN canonical). — S
### P4 — Docker docs
16. [public-facing] Add `oss/overlay/docs/docker.md` (EN) covering the 6 gaps in
section D (esp. Linux `host.docker.internal`, verify-running, config mount,
sandbox-in-Docker, build-vs-prebuilt, external LLM). Link from README +
getting-started §5. — M
### P5 — docs reorg reconciliation (coordinate with `docs-wip`)
17. [internal] **Land `docs-wip`'s `docs/reference/*.md` consolidation** onto main
(the 18 curated feature docs; drop the generated `docs/html/**` from the
merge). — L (review of 18 docs)
18. [internal] **Add `docs/html/` to `.gitignore`** and **`docs/html` +
`docs/.investigate-status.json` + `docs/.*-manifest.json` to
`oss/exclude.txt`** (generated artifacts; section E). — S
19. [public-facing] Add a `docs/README.md` (or section in main README) indexing
the `docs/reference/*` set so the curated docs are discoverable. — S
20. [public-facing] Convert `docs/operations/*.html` to Markdown (or exclude from
OSS); section E option (a)/(b). — M
### P6 — polish (optional, post-launch)
21. [public-facing] Translate top 5-8 `docs/tools/*.md`. — L
22. [public-facing] Refresh `CHANGELOG.md` date/contents at actual release. — S
23. [public-facing] Add an architecture diagram image to README. — M
---
## Drafts created on this branch (NOT replacing anything)
- `docs/README.en.draft.md` — English README skeleton (starting point for item 6).
- `docs/SECURITY.draft.md` — redundant stub pointing at the existing policy;
exists only to confirm coverage (item 11) and should be deleted once the
existing `oss/overlay/SECURITY.md` is accepted.
Both are clearly labeled DRAFT and live under `docs/` so they do not collide with
or overwrite the shipping overlay files.