docs(sprint-0): expert-panel review v2 + doc-alignment

- Add Sprint-0-Plan-Review-v2.md (3-expert panel + code reality check, verdict REVISE 72%) - Fix stale Assessment migration count (V1..V6, retracts earlier 'V5 tail') - Add Open-Questions Q13-Q17 (review-v2 blockers B1-B5) - Update Home page status to flag v2 REVISE verdict
2026-06-24 20:22:49 +02:00
parent 9bceb0ae96
commit 20678d8ac6
4 changed files with 166 additions and 2 deletions
@@ -41,7 +41,8 @@ plate-auth is the carve-out of [InspectFlow](https://git.plate-software.de/pplat
 | [Sprint-0-Assessment](Sprint-0-Assessment) | State of the InspectFlow auth code, what is reusable, risks, recommendation |
 | [Sprint-0-Plan](Sprint-0-Plan) | File-by-file extraction plan, package renames, Flyway consolidation, publishing pipeline |
 | [Sprint-0-Testplan](Sprint-0-Testplan) | Unit / integration / contract test coverage matrix |
-| [Sprint-0-Plan-Review](Sprint-0-Plan-Review) | ✅ Plan Review — APPROVED (2 warnings, panel confidence 82%) |
+| [Sprint-0-Plan-Review](Sprint-0-Plan-Review) | Plan Review v1 — APPROVED docs-only (2 warnings, panel 82%) |
 | [Sprint-0-Plan-Review-v2](Sprint-0-Plan-Review-v2) | ⚠️ **Plan Review v2 — REVISE (panel 72%)** — code-reality check found 5 blockers (B1 unscoped security chain, B2 `@ComponentScan` in auto-config, B3 plan↔code scope gap, B4 open CORS default, B5 undocumented RefreshToken). Read this before resuming code. |
 ### Consumer guides
@@ -40,6 +40,11 @@ When a question is decided, this doc gets updated; the decision is also reflecte
 | Q10 | License — MIT, Apache-2.0, or internal proprietary | ✅ Decided (2026-06-24) | n/a |
 | Q11 | First-class i18n in `@platesoft/auth`/react? | ⏭️ Deferred | n/a |
 | Q12 | Audit emit channel — DB rows only, or also event stream? | 🟡 Leaning | Before W2 |
 | Q13 | SecurityFilterChain scoping — `securityMatcher` vs unscoped `anyRequest` | 🔴 Blocker (Review v2 B1) | Before v0.1.0 |
 | Q14 | Remove `@ComponentScan` from auto-config — explicit `@Bean`/`@Import`? | 🔴 Blocker (Review v2 B2) | Before v0.1.0 |
 | Q15 | v0.1.0 surface — finish full extraction vs rescope to OAuth-core | 🔴 Blocker (Review v2 B3) | Before any code resumes |
 | Q16 | CORS default — fail-closed (same-origin) vs current open `*` | 🔴 Blocker (Review v2 B4) | Before v0.1.0 |
 | Q17 | `RefreshToken` table — stateful rotation now (doc+migration) vs remove | 🔴 Blocker (Review v2 B5) | Before v0.1.0 |
 ---
@@ -48,7 +48,9 @@ directly from `inspectflow/backend` and `inspectflow/frontend` at the time of th
 | Migration | V28 — `invitations` + audit | — | None | **Renumber to V3** |
 | Migration | V29 — `access_requests` | — | None | **Renumber to V4** |
 | Migration | V30 — `companies.microsoft_tenant_id` | — | **Pure InspectFlow** | **Leave in InspectFlow's migration set** |
-| Migration | V31 — `login_events` + revinfo actor | — | None | **Renumber to V5** (V30 stays in app, so plate-auth's tail is V5 not V6) |
+| Migration | V31 — `login_events` + revinfo actor | — | None | **Renumber to V6** (see note below) |
 > **Migration-count note (corrected 2026-06-24, supersedes earlier "V5 tail"):** plate-auth ships **6** migrations, V1..V6. V26→V1, V27→V2, V28→V3, V29→V4, then a **standalone** `V5__add_microsoft_tenant_id_index.sql` (an index on `user_identities.microsoft_tenant_id` — T1 data, distinct from V30's `companies.microsoft_tenant_id` which is T3 and stays in InspectFlow), and finally V31→**V6** `create_login_events_and_revinfo_actor.sql`. This matches [Architecture.md §8.1](Architecture.md), [Open-Questions F2](Open-Questions.md), and the 6 migration files present in `plate-auth-starter/src/main/resources/db/migration/auth/`. The earlier "tail is V5 not V6" phrasing was wrong and is retracted.
 ### 1.2 Frontend — TypeScript / Next.js 15 / NextAuth v5
@@ -0,0 +1,156 @@
 # Plan Review v2 — plate-auth Sprint 0 (Expert Panel + Code Reality Check)
 **Date:** 2026-06-24
 **Module:** plate-auth
 **Reviewer:** Lumen (Architect / Plan Reviewer)
 **Method:** 3-expert panel (Domain / Architecture / Risk) + web-validated best practices + **actual code inspection** of `main` @ `9d314a4`
 **Supersedes:** [Sprint-0-Plan-Review.md](Sprint-0-Plan-Review.md) (v1 — reviewed docs only, not code)
 **Verdict:** ⚠️ **REVISE** — plan is sound, but code materially diverges from it and two real security defects exist in the starter.
 ---
 ## 0. Why a v2 review
 The v1 review (82% panel confidence, APPROVED-with-2-warnings) reviewed **only the wiki documents**. Since then, code was committed (W1–W7, HEAD `9d314a4` *"test(w7): greenfield consumer integration test"*). This v2 review reads the **actual code** against the plan and finds a gap large enough to move the verdict to REVISE.
 This is not a regression in planning quality — the plan is still good. It is that **the code is not yet what the plan promises**, and shipping `v0.1.0` against the current code would publish a misleading and partially-insecure artifact.
 ---
 ## 1. Headline finding: plan says "extraction", code is "partial greenfield"
 The [Sprint-0-Assessment](Sprint-0-Assessment.md) classifies the work as **65% lift-and-shift / 25% reshape / 10% leave-behind** — a *verbatim extraction* of InspectFlow Sprint 14.1–14.6. The [README](https://git.plate-software.de/pplate/plate-auth) advertises a 5-line quick-start with Google SSO, invitations, access requests, and admin audit.
 **The code on `main` does not contain most of that.**
 | Plan promises (lift-and-shift) | Present in code? |
 |---|---|
 | `AuthController` (password login/register/refresh/me) | ❌ Absent |
 | `InvitationController` + `InvitationService` | ❌ Absent |
 | `AccessRequestController` + `AccessRequestService` | ❌ Absent |
 | `AdminAuditController` | ❌ Absent |
 | `OrgContextResolver` (filter) | ❌ Absent |
 | `OAuthController` + `ExchangeService` | ✅ Present |
 | `JwtService`, `LoginEventService`, `MembershipService` | ✅ Present |
 | Entities (User, UserIdentity, Membership, Invitation, AccessRequest, LoginEvent) | ✅ Present |
 | 6 Flyway migrations V1–V6 | ✅ Present |
 | 5 SPIs + defaults | ✅ Present |
 | Frontend `createAuthConfig()` | ❌ **`throw new Error('Not yet implemented — Sprint 0 W3')`** |
 | Frontend proxy / exchange / middleware / hooks | ❌ Stubs |
 So the backend is a **re-skeletoned subset** (the exchange/OAuth happy-path only), and the **entire npm package is stubs**. The entities for invitations/access-requests exist but have no service or controller behind them.
 **Implication:** The `test(w7) greenfield consumer integration test` is green because it only exercises bootstrap + exchange + Flyway — not the feature surface the plan and README describe. The [CHANGELOG](https://git.plate-software.de/pplate/plate-auth) still lists only "W1 scaffold", which is closer to the truth than the README.
 This is the single most important thing to reconcile: **either** the plan/README must be rescoped to match what v0.1.0 will actually ship, **or** the missing extraction work (W2 controllers/services + W3 frontend) must be completed before tagging. The panel recommends the latter, because Sparkboard and InspectFlow both depend on the advertised surface.
 ---
 ## 2. Expert Panel
 ### 🏛️ Domain Expert — "Is the auth/multi-tenancy domain modelled correctly?"
 **Confidence: 78%**
 **What's right:**
 - The `(org_type, org_id)` polymorphic membership model is correct and consistent across entity, migration ([V2](https://git.plate-software.de/pplate/plate-auth), unique constraint on `(user_id, org_type, org_id)`), and the `OrgValidator` SPI seam. This is a sound multi-tenant primitive.
 - Find-or-create-user in `ExchangeService` correctly keys on `(provider, subject)` first, then falls back to email — the right precedence for OAuth identity linking.
 - **Privilege-assignment is safe:** new users are hard-coded to `Role.ROLE_USER`; the exchange envelope does *not* carry a role claim, so a tampered envelope cannot mint an admin. Good.
 **Domain gaps:**
 - **D1 — The membership lifecycle is unreachable.** `Invitation` and `AccessRequest` entities exist, but with no service/controller the *only* way a user gets a membership is… there is none in code. `MembershipService` exists but nothing calls `grant()` from a request path. For a multi-tenancy library, "how does a user acquire a membership" is the core domain workflow and it is currently absent.
 - **D2 — `OnboardingHook.onFirstSignIn` fires inside the exchange transaction** (`ExchangeService.findOrCreateUser`). For InspectFlow's heavy `OnboardingService` (tenant auto-map) and Sparkboard's `SparkboardOnboardingHook`, running consumer business logic inside the auth library's `@Transactional` is a domain-coupling hazard — a slow or failing hook rolls back the login. The plan's sequence diagram (Architecture §5 step 13) shows the hook *after* the login event, but the code runs it *before* `recordSuccess`. Worth an explicit contract: "hooks run in the auth transaction; keep them fast and idempotent" — or move to post-commit.
 - **D3 — `last_login_at` is set on the identity but `findOrCreateUser` never persists the `User.lastProvider` change for the existing-identity branch** (no explicit save; relies on dirty-checking inside the transaction, which works, but the new-user email-collision branch mutates a managed entity from `findByEmail` without re-save — fragile).
 ---
 ### 🔧 Architecture Expert — "Is this a well-behaved Spring Boot 4 starter?"
 **Confidence: 68%**
 This is where the most consequential defects live. Validated against the Spring Boot 4.1 *Creating Your Own Auto-configuration* and Spring Security 7 reference docs.
 - **A1 (BLOCKER) — The `SecurityFilterChain` is unscoped and hijacks the consumer.** [`SecurityConfig`](https://git.plate-software.de/pplate/plate-auth) registers a chain at `@Order(-100)` with `.anyRequest().authenticated()` and **no `securityMatcher(...)`**. In Spring Security, the first chain whose matcher matches handles the request; an unscoped chain at the highest priority matches **every** request in the consuming app. This means InspectFlow's and Sparkboard's *own* security rules (public marketing routes, their own endpoints) are overridden by plate-auth. A library security chain **must** be scoped — e.g. `http.securityMatcher("/api/auth/**", "/api/invitations/**", "/api/access-requests/**", "/api/admin/**", "/api/me")` — and the consumer keeps a default chain for everything else. This is the textbook "library provides a *scoped* chain, app provides the catch-all" pattern. As written, plate-auth is not embeddable without breaking the host.
 - **A2 — `@ComponentScan(basePackages = "de.platesoft.auth")` in an auto-configuration is an anti-pattern.** The Spring Boot docs are explicit: *"auto-configuration classes should never use `@ComponentScan`."* It re-scans on every consumer and can pick up beans unpredictably; it also double-instantiates when the consumer's own scan overlaps. Starters should declare beans explicitly with `@Bean` methods (the SPI defaults already do this correctly — extend that pattern to services/controllers, or use `@Import`).
 - **A3 — Beans are discovered by `@Service`/`@Configuration` stereotype + component-scan, not by the auto-config.** Combined with A2, this means the starter only works because of the broad scan. Remove the scan (per A2) and the services vanish. The dependency graph needs to be made explicit.
 - **A4 — Frontend is unbuildable as advertised.** `createAuthConfig` throws. `tsup` config exists but there is nothing real to bundle. A6 ("npm artifact bundles + ships ESM") cannot pass against current code. The lockstep-version promise (Roadmap Amendment 1) is moot while one half is a stub.
 - **A5 — Second Flyway bean (`PlateAuthFlywayConfig`) ordering vs Spring Boot's managed Flyway.** The plan (§7.3) correctly flags that plate-auth's `flyway_schema_history_auth` migration must run before JPA init. This needs an integration test asserting ordering when the *consumer also has* a primary Flyway — `PlateAuthFlywayMigrationIT` exists but only tests the auth migrations in isolation, not co-existence with a consumer's `flyway_schema_history`. That co-existence is the actual R1 risk.
 **What's right:** `@ConditionalOnMissingBean` SPI override pattern is idiomatic and correctly implemented; `@AutoConfiguration` + `AutoConfiguration.imports` registration is correct; `PlateAuthProperties` consolidation with `@Validated` is clean.
 ---
 ### 🛡️ Risk / Compliance Expert — "What's the blast radius of shipping this as v0.1.0?"
 **Confidence: 70%**
 - **R1 (HIGH) — Default-open CORS in an auth library.** [`SecurityConfig.corsConfigurationSource`](https://git.plate-software.de/pplate/plate-auth): when `allowedOrigins` is empty (the default), it sets `allowedOriginPatterns("*")`. For a library whose entire job is authentication, a permissive-by-default CORS is the wrong default. It should **fail closed** — empty origins → no cross-origin allowed (same-origin only), forcing the consumer to opt in. The current default also silently flips `allowCredentials` off in the `*` branch, which masks the danger during testing and surprises in prod.
 - **R2 (MEDIUM) — Undocumented `RefreshToken` entity/table.** Code ships a `RefreshToken` entity + `RefreshTokenRepository` that appear in **no** wiki doc. The plan's threat model (Architecture §10) and §9.3 explicitly say refresh tokens are *stateless JWTs* and rotation is *deferred to v0.3*. Either the code added stateful refresh (good, but then the data model, threat model, and a V7 migration are missing and the security checklist is wrong), or it's a dead half-implementation. Undocumented security-relevant persistence is a compliance red flag — it must be reconciled.
 - **R3 (MEDIUM) — Audit surface is partial.** The plan's §9.7 audit checklist requires Envers revisions with `actor_user_id` on every state-changing op across Membership/Invitation/AccessRequest/Auth services. With those services largely absent and no `RevInfo`/`RevInfoListener` in the code tree, the audit guarantees the security checklist claims are not yet backed by code. `LoginEventService` exists (good) but is the only audit actually present.
 - **R4 (LOW, known) — In-memory nonce store.** Correctly implemented (`ConcurrentHashMap` + TTL GC, `putIfAbsent` race-safe) and correctly documented as single-replica-only with v0.3 fix. Accepted.
 - **R5 (LOW) — Nonce GC is unbounded between calls.** `gcNonces` runs on every exchange; under burst load the map can grow to the burst size before GC. Fine for 2-consumer scale; note for v0.3 alongside the distributed store.
 **Positive compliance notes:** secrets are `@NotBlank @Size(min=32)` validated (fail-fast confirmed in `PlateAuthProperties`); JWT carries only `sub`/`email`/`role` (no excess PII); HMAC compare is constant-time (`MessageDigest.isEqual`); the `PermissiveOrgValidator` per-call WARN is implemented exactly as F1 decided.
 ---
 ## 3. Panel Verdict
 **Overall Confidence: 72%**
 - 🏛️ Domain: 78% — model is sound; membership-acquisition workflow is absent in code.
 - 🔧 Architecture: 68% — **unscoped security chain + `@ComponentScan` in auto-config make the starter non-embeddable as-is; frontend is a stub.**
 - 🛡️ Risk: 70% — default-open CORS + undocumented refresh-token table + partial audit.
 > Per the panel rule, **any expert < 70% (Architecture 68%) → the verdict is REVISE**, regardless of the document-quality checklist (which remains strong).
 ### ❌ Blockers (must fix before v0.1.0 tag)
 - **B1 — Scope the `SecurityFilterChain`** with `securityMatcher(...)` to plate-auth's own endpoints so it cannot hijack the consuming app. (Arch A1)
 - **B2 — Remove `@ComponentScan` from the auto-configuration**; declare service/controller/filter beans explicitly or via `@Import`. (Arch A2/A3)
 - **B3 — Reconcile plan ↔ code scope.** Decide v0.1.0 surface: either (a) finish the extraction (AuthController, Invitation/AccessRequest/AdminAudit services+controllers, OrgContextResolver, real frontend `createAuthConfig`/proxy/exchange), or (b) rescope v0.1.0 to "OAuth-exchange + memberships core" and move the rest to v0.2 — and rewrite README/Assessment/Roadmap to match. (Headline §1)
 - **B4 — Make CORS fail-closed by default.** Empty `allowedOrigins` → same-origin only, never `*`. (Risk R1)
 - **B5 — Reconcile the `RefreshToken` table** with the threat model: document it + add migration + update §9.3, or remove it. (Risk R2)
 ### ⚠️ Warnings (resolve during the sprint)
 - **W-A — Onboarding hook transaction contract.** Define whether SPI hooks run inside or after the auth transaction; document it; align code to Architecture §5. (Domain D2)
 - **W-B — Flyway co-existence test.** Add an IT where the consumer has its *own* primary Flyway + `flyway_schema_history`, asserting plate-auth's `flyway_schema_history_auth` runs independently and before JPA. (Arch A5)
 - **W-C — Stale Assessment migration count.** [Sprint-0-Assessment §1.1](Sprint-0-Assessment.md) still says "V1..V5 / tail is V5 not V6". Code has 6 migrations and F2 locked V1..V6. Fix the Assessment. (carry-over of v1 W-2)
 - **W-D — CHANGELOG vs README truth.** CHANGELOG says "W1 scaffold"; README advertises full feature set. Align both to actual shipped surface.
 ### ℹ️ Recommendations
 - Add an `@ConditionalOnMissingBean(SecurityFilterChain.class)` escape so consumers can fully replace the chain (Sparkboard Q06 wants defaults; InspectFlow may want override).
 - Provide a `spring-configuration-metadata` (annotation processor is in the plan §4.3 W2-10 — verify it's wired) so `plate.auth.*` autocompletes in consumer IDEs.
 - Consider `@AutoConfiguration(after = {DataSourceAutoConfiguration.class, FlywayAutoConfiguration.class})` to make ordering explicit rather than relying on bean-factory side effects.
 - The npm package should ship a golden HMAC test vector generated from the backend (Testplan TR-3) so the two halves can't drift on the envelope canonicalization.
 ---
 ## 4. What stays APPROVED from v1
 The planning *artifacts* remain high quality. The tier model (T1/T2/T3), the 5 SPI seams, the lockstep-version strategy, the separate-Flyway-history decision, the 43-test matrix, and the threat-model table are all sound and should be kept. v2 does not re-litigate them — it flags that **the code has not yet caught up to them**, plus the two starter-embedding defects (B1/B2) and the CORS default (B4) which are code-design issues the document review could not have caught.
 ---
 ## 5. Recommended next action
 1. **Apply doc-alignment fixes** (W-C, W-D, B3-rescope-decision) in the wiki — cheap, immediate.
 2. **Switch to Code mode** to fix B1/B2/B4/B5 in `plate-auth-starter` (the security-embedding defects) and to complete or rescope the extraction (B3).
 3. Re-run this panel against the post-fix code before cutting the `v0.0.1` validation tag (Plan §8.2) — which remains the right gate before `v0.1.0`.
 **Panel Confidence: 72% → REVISE** (Domain 78 / Architecture 68 / Risk 70).
 ---
 ## 6. Cross-references
 - [Sprint-0-Plan-Review.md](Sprint-0-Plan-Review.md) (v1)
 - [Sprint-0-Plan.md](Sprint-0-Plan.md)
 - [Sprint-0-Assessment.md](Sprint-0-Assessment.md)
 - [Architecture.md](Architecture.md)
 - [Open-Questions.md](Open-Questions.md)
 ---
 **End of Sprint-0-Plan-Review-v2.md.**