initial commit

2026-03-16 19:54:53 -04:00
commit bfe0e01254
3341 changed files with 483939 additions and 0 deletions


@@ -0,0 +1,197 @@
---
name: 'step-01-load-context'
description: 'Load knowledge base, determine scope, and gather context'
nextStepFile: './step-02-discover-tests.md'
knowledgeIndex: '{project-root}/_bmad/tea/testarch/tea-index.csv'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 1: Load Context & Knowledge Base
## STEP GOAL
Determine review scope, load required knowledge fragments, and gather related artifacts.
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: config, loaded artifacts, and knowledge fragments
- Focus: this step's goal only
- Limits: do not execute future steps
- Dependencies: prior steps' outputs (if any)
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.
## 1. Determine Scope and Stack
Use `review_scope`:
- **single**: one file
- **directory**: all tests in folder
- **suite**: all tests in repo
If unclear, ask the user.
**Stack Detection** (for context-aware loading):
Read `test_stack_type` from `{config_source}`. If `"auto"` or not configured, infer `{detected_stack}` by scanning `{project-root}`:
- **Frontend indicators**: `playwright.config.*`, `cypress.config.*`, `package.json` with react/vue/angular
- **Backend indicators**: `pyproject.toml`, `pom.xml`/`build.gradle`, `go.mod`, `*.csproj`, `Gemfile`, `Cargo.toml`
- **Both present** → `fullstack`; only frontend → `frontend`; only backend → `backend`
- Explicit `test_stack_type` overrides auto-detection
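The detection heuristic above can be sketched as follows. The marker lists mirror the indicators named here; the `package.json` dependency check is omitted for brevity (a fuller version would also inspect it for react/vue/angular):

```javascript
// Sketch of stack detection; `files` stands in for a listing of {project-root}.
const FRONTEND_MARKERS = ['playwright.config', 'cypress.config'];
const BACKEND_MARKERS = ['pyproject.toml', 'pom.xml', 'build.gradle', 'go.mod', 'Gemfile', 'Cargo.toml'];

function detectStack(files, configured = 'auto') {
  // Explicit test_stack_type always overrides auto-detection.
  if (configured && configured !== 'auto') return configured;
  const frontend = files.some((f) => FRONTEND_MARKERS.some((m) => f.startsWith(m)));
  const backend = files.some((f) => BACKEND_MARKERS.includes(f) || f.endsWith('.csproj'));
  if (frontend && backend) return 'fullstack';
  if (frontend) return 'frontend';
  if (backend) return 'backend';
  return 'unknown'; // neither detected; ask the user
}
```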
---
### Tiered Knowledge Loading
Load fragments based on their `tier` classification in `tea-index.csv`:
1. **Core tier** (always load): Foundational fragments required for this workflow
2. **Extended tier** (load on-demand): Load when deeper analysis is needed or when the user's context requires it
3. **Specialized tier** (load only when relevant): Load only when the specific use case matches (e.g., contract-testing only for microservices, email-auth only for email flows)
> **Context Efficiency**: Loading only core fragments reduces context usage by 40-50% compared to loading all fragments.
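A minimal sketch of tier-based selection, assuming `tea-index.csv` parses into rows with `fragment`, `tier`, and `useCase` columns (the column names are assumptions for illustration, not the actual index schema):

```javascript
// Select fragments by tier: core always, extended on demand,
// specialized only when the use case matches.
function selectFragments(rows, { extended = false, useCases = [] } = {}) {
  return rows
    .filter((r) =>
      r.tier === 'core' ||
      (r.tier === 'extended' && extended) ||
      (r.tier === 'specialized' && useCases.includes(r.useCase)))
    .map((r) => r.fragment);
}
```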
### Playwright Utils Loading Profiles
**If `tea_use_playwright_utils` is enabled**, select the appropriate loading profile:
- **API-only profile** (when `{detected_stack}` is `backend` or no `page.goto`/`page.locator` found in test files):
Load: `overview`, `api-request`, `auth-session`, `recurse` (~1,800 lines)
- **Full UI+API profile** (when `{detected_stack}` is `frontend`/`fullstack` or browser tests detected):
Load: all Playwright Utils core fragments (~4,500 lines)
**Detection**: Scan `{test_dir}` for files containing `page.goto` or `page.locator`. If none found, use API-only profile.
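The profile selection can be sketched as follows, where `testSources` holds the content of files scanned under `{test_dir}`:

```javascript
// Pick a Playwright Utils loading profile: full UI+API when browser APIs
// appear in tests or the stack is frontend/fullstack, otherwise API-only.
function selectPlaywrightProfile(detectedStack, testSources) {
  const browserTests = testSources.some(
    (src) => src.includes('page.goto') || src.includes('page.locator'),
  );
  return browserTests || ['frontend', 'fullstack'].includes(detectedStack)
    ? 'full-ui-api' // all Playwright Utils core fragments
    : 'api-only';   // overview, api-request, auth-session, recurse
}
```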
### Pact.js Utils Loading
**If `tea_use_pactjs_utils` is enabled** (and contract tests detected in review scope):
Load: `pactjs-utils-overview.md`, `pactjs-utils-provider-verifier.md`, `pactjs-utils-request-filter.md` (the 3 most relevant for reviewing provider verification tests)
**If `tea_use_pactjs_utils` is disabled** but contract tests are in review scope:
Load: `contract-testing.md`
### Pact MCP Loading
**If `tea_pact_mcp` is `"mcp"`:**
Load: `pact-mcp.md` — enables agent to use SmartBear MCP "Review Pact Tests" tool for automated best-practice feedback during test review.
## 2. Load Knowledge Base
Read `{config_source}` and check `tea_use_playwright_utils`, `tea_use_pactjs_utils`, `tea_pact_mcp`, and `tea_browser_automation` to select the correct fragment set, then load from `{knowledgeIndex}`:
**Core:**
- `test-quality.md`
- `data-factories.md`
- `test-levels-framework.md`
- `selective-testing.md`
- `test-healing-patterns.md`
- `selector-resilience.md`
- `timing-debugging.md`
**If Playwright Utils enabled:**
- `overview.md`, `api-request.md`, `network-recorder.md`, `auth-session.md`, `intercept-network-call.md`, `recurse.md`, `log.md`, `file-utils.md`, `burn-in.md`, `network-error-monitor.md`, `fixtures-composition.md`
**If disabled:**
- `fixture-architecture.md`
- `network-first.md`
- `playwright-config.md`
- `component-tdd.md`
- `ci-burn-in.md`
**Playwright CLI (if `tea_browser_automation` is "cli" or "auto"):**
- `playwright-cli.md`
**MCP Patterns (if `tea_browser_automation` is "mcp" or "auto"):**
- (existing MCP-related fragments, if any are added in future)
**Pact.js Utils (if enabled and contract tests in review scope):**
- `pactjs-utils-overview.md`, `pactjs-utils-provider-verifier.md`, `pactjs-utils-request-filter.md`
**Contract Testing (if pactjs-utils disabled but contract tests in review scope):**
- `contract-testing.md`
**Pact MCP (if tea_pact_mcp is "mcp"):**
- `pact-mcp.md`
---
## 3. Gather Context Artifacts
If available:
- Story file (acceptance criteria)
- Test design doc (priorities)
- Framework config
Summarize what was found.
Coverage mapping and coverage gates are out of scope in `test-review`. Route those concerns to `trace`.
---
## 4. Save Progress
**Save this step's accumulated work to `{outputFile}`.**
- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
```yaml
---
stepsCompleted: ['step-01-load-context']
lastStep: 'step-01-load-context'
lastSaved: '{date}'
---
```
Then write this step's output below the frontmatter.
- **If `{outputFile}` already exists**, update:
- Add `'step-01-load-context'` to `stepsCompleted` array (only if not already present)
- Set `lastStep: 'step-01-load-context'`
- Set `lastSaved: '{date}'`
- Append this step's output to the appropriate section of the document.
**Update `inputDocuments`**: Set `inputDocuments` in the output template frontmatter to the list of artifact paths loaded in this step (e.g., knowledge fragments, test design documents, configuration files).
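The create-or-update logic above can be sketched as follows, with `doc` the parsed frontmatter object (or `null` on first save); a real implementation would round-trip through a YAML parser:

```javascript
// Update progress frontmatter: append the step once, refresh lastStep/lastSaved.
function updateProgress(doc, step, date) {
  const fm = doc || { stepsCompleted: [] };
  if (!fm.stepsCompleted.includes(step)) fm.stepsCompleted.push(step); // no duplicates
  fm.lastStep = step;
  fm.lastSaved = date;
  return fm;
}
```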
Load next step: `{nextStepFile}`
## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS:
- Step completed in full with required outputs
### ❌ SYSTEM FAILURE:
- Skipped sequence steps or missing outputs
**Master Rule:** Skipping steps is FORBIDDEN.


@@ -0,0 +1,104 @@
---
name: 'step-01b-resume'
description: 'Resume interrupted workflow from last completed step'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 1b: Resume Workflow
## STEP GOAL
Resume an interrupted workflow by loading the existing output document, displaying progress, and routing to the next incomplete step.
## MANDATORY EXECUTION RULES
- Read the entire step file before acting
- Speak in `{communication_language}`
---
## EXECUTION PROTOCOLS:
- Follow the MANDATORY SEQUENCE exactly
- Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: Output document with progress frontmatter
- Focus: Load progress and route to next step
- Limits: Do not re-execute completed steps
- Dependencies: Output document must exist from a previous run
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly.
### 1. Load Output Document
Read `{outputFile}` and parse YAML frontmatter for:
- `stepsCompleted` -- array of completed step names
- `lastStep` -- last completed step name
- `lastSaved` -- timestamp of last save
**If `{outputFile}` does not exist**, display:
"No previous progress found. There is no output document to resume from. Please use **[C] Create** to start a fresh workflow run."
**THEN:** Halt. Do not proceed.
---
### 2. Display Progress Dashboard
Display progress with checkmark/empty indicators:
```
Test Quality Review - Resume Progress:
1. Load Context (step-01-load-context) [completed/pending]
2. Discover Tests (step-02-discover-tests) [completed/pending]
3. Quality Evaluation + Aggregate (step-03f-aggregate-scores) [completed/pending]
4. Generate Report (step-04-generate-report) [completed/pending]
Last saved: {lastSaved}
```
---
### 3. Route to Next Step
Based on `lastStep`, load the next incomplete step:
| lastStep | Next Step File |
| --------------------------- | --------------------------------- |
| `step-01-load-context` | `./step-02-discover-tests.md` |
| `step-02-discover-tests` | `./step-03-quality-evaluation.md` |
| `step-03f-aggregate-scores` | `./step-04-generate-report.md` |
| `step-04-generate-report` | **Workflow already complete.** |
**If `lastStep` is the final step** (`step-04-generate-report`), display: "All steps completed. Use **[C] Create** to start fresh, **[V] Validate** to review outputs, or **[E] Edit** to make revisions." Then halt.
**If `lastStep` does not match any value above**, display: "Unknown progress state (`lastStep`: {lastStep}). Please use **[C] Create** to start fresh." Then halt.
**Otherwise**, load the identified step file, read completely, and execute.
The existing content in `{outputFile}` provides context from previously completed steps.
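The routing table above, expressed as a lookup map (a sketch; the halt messages are abbreviated):

```javascript
// Map each lastStep to the next step file; unknown or final states halt.
const NEXT_STEP = {
  'step-01-load-context': './step-02-discover-tests.md',
  'step-02-discover-tests': './step-03-quality-evaluation.md',
  'step-03f-aggregate-scores': './step-04-generate-report.md',
};

function routeNext(lastStep) {
  if (lastStep === 'step-04-generate-report') return { halt: true, reason: 'complete' };
  const next = NEXT_STEP[lastStep];
  return next ? { halt: false, next } : { halt: true, reason: 'unknown' };
}
```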
---
## SYSTEM SUCCESS/FAILURE METRICS
### SUCCESS:
- Output document loaded and parsed correctly
- Progress dashboard displayed accurately
- Routed to correct next step
### FAILURE:
- Not loading output document
- Incorrect progress display
- Routing to wrong step
**Master Rule:** Resume MUST route to the exact next incomplete step. Never re-execute completed steps.


@@ -0,0 +1,113 @@
---
name: 'step-02-discover-tests'
description: 'Find and parse test files'
nextStepFile: './step-03-quality-evaluation.md'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 2: Discover & Parse Tests
## STEP GOAL
Collect test files in scope and parse structure/metadata.
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: config, loaded artifacts, and knowledge fragments
- Focus: this step's goal only
- Limits: do not execute future steps
- Dependencies: prior steps' outputs (if any)
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.
## 1. Discover Test Files
- **single**: use provided file path
- **directory**: glob under `{test_dir}` or selected folder
- **suite**: glob all tests in repo
Halt if no tests are found.
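The scope-to-glob mapping can be sketched as follows; the glob patterns are illustrative and assume a JS/TS test suite:

```javascript
// Resolve review_scope to glob patterns; halt (throw) on an unknown scope.
function testGlobs(scope, { file, testDir = 'tests' } = {}) {
  switch (scope) {
    case 'single':    return [file];
    case 'directory': return [`${testDir}/**/*.{spec,test}.{js,ts}`];
    case 'suite':     return ['**/*.{spec,test}.{js,ts}'];
    default: throw new Error(`Unknown review_scope: ${scope}`);
  }
}
```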
---
## 2. Parse Metadata (per file)
Collect:
- File size and line count
- Test framework detected
- Describe/test block counts
- Test IDs and priority markers
- Imports, fixtures, factories, network interception
- Waits/timeouts and control flow (if/try/catch)
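A rough sketch of the per-file metadata collection; the regexes are heuristics rather than a full parser, and framework/import detection is omitted:

```javascript
// Collect lightweight per-file metadata by pattern counting.
function parseTestFile(path, content) {
  return {
    path,
    lineCount: content.split('\n').length,
    describeBlocks: (content.match(/\bdescribe\(/g) || []).length,
    testBlocks: (content.match(/\b(test|it)\(/g) || []).length,
    hardWaits: (content.match(/waitForTimeout\(/g) || []).length,
    controlFlow: (content.match(/\b(if|try)\b/g) || []).length,
  };
}
```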
---
## 3. Evidence Collection (if `tea_browser_automation` is `cli` or `auto`)
> **Fallback:** If CLI is not installed, fall back to MCP (if available) or skip evidence collection.
**CLI Evidence Collection:**
All commands use the same named session to target the correct browser:
1. `playwright-cli -s=tea-review open <target_url>`
2. `playwright-cli -s=tea-review tracing-start`
3. Execute the flow under review (using `-s=tea-review` on each command)
4. `playwright-cli -s=tea-review tracing-stop` → saves trace.zip
5. `playwright-cli -s=tea-review screenshot --filename={test_artifacts}/review-evidence.png`
6. `playwright-cli -s=tea-review network` → capture network request log
7. `playwright-cli -s=tea-review close`
> **Session Hygiene:** Always close sessions using `playwright-cli -s=tea-review close`. Do NOT use `close-all` — it kills every session on the machine and breaks parallel execution.
---
## 4. Save Progress
**Save this step's accumulated work to `{outputFile}`.**
- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
```yaml
---
stepsCompleted: ['step-02-discover-tests']
lastStep: 'step-02-discover-tests'
lastSaved: '{date}'
---
```
Then write this step's output below the frontmatter.
- **If `{outputFile}` already exists**, update:
- Add `'step-02-discover-tests'` to `stepsCompleted` array (only if not already present)
- Set `lastStep: 'step-02-discover-tests'`
- Set `lastSaved: '{date}'`
- Append this step's output to the appropriate section of the document.
Load next step: `{nextStepFile}`
## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS:
- Step completed in full with required outputs
### ❌ SYSTEM FAILURE:
- Skipped sequence steps or missing outputs
**Master Rule:** Skipping steps is FORBIDDEN.


@@ -0,0 +1,274 @@
---
name: 'step-03-quality-evaluation'
description: 'Orchestrate adaptive quality dimension checks (agent-team, subagent, or sequential)'
nextStepFile: './step-03f-aggregate-scores.md'
---
# Step 3: Orchestrate Adaptive Quality Evaluation
## STEP GOAL
Select execution mode deterministically, then evaluate quality dimensions using agent-team, subagent, or sequential execution while preserving output contracts:
- Determinism
- Isolation
- Maintainability
- Performance
Coverage is intentionally excluded from this workflow and handled by `trace`.
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
- ✅ Resolve execution mode from config (`tea_execution_mode`, `tea_capability_probe`)
- ✅ Apply fallback rules deterministically when requested mode is unsupported
- ✅ Wait for required worker steps to complete
- ❌ Do NOT skip capability checks when probing is enabled
- ❌ Do NOT proceed until required worker steps finish
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Wait for subagent outputs
- 📖 Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: test files from Step 2, knowledge fragments
- Focus: orchestration only (mode selection + worker dispatch)
- Limits: do not evaluate quality directly (delegate to worker steps)
---
## MANDATORY SEQUENCE
### 1. Prepare Execution Context
**Generate unique timestamp:**
```javascript
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
```
**Prepare context for all subagents:**
```javascript
const parseBooleanFlag = (value, defaultValue = true) => {
if (typeof value === 'string') {
const normalized = value.trim().toLowerCase();
if (['false', '0', 'off', 'no'].includes(normalized)) return false;
if (['true', '1', 'on', 'yes'].includes(normalized)) return true;
}
if (value === undefined || value === null) return defaultValue;
return Boolean(value);
};
const subagentContext = {
test_files: /* from Step 2 */,
knowledge_fragments_loaded: ['test-quality'],
config: {
execution_mode: config.tea_execution_mode || 'auto', // "auto" | "subagent" | "agent-team" | "sequential"
capability_probe: parseBooleanFlag(config.tea_capability_probe, true), // supports booleans and "false"/"true" strings
},
timestamp: timestamp
};
```
---
### 2. Resolve Execution Mode with Capability Probe
```javascript
const normalizeUserExecutionMode = (mode) => {
if (typeof mode !== 'string') return null;
const normalized = mode.trim().toLowerCase().replace(/[-_]/g, ' ').replace(/\s+/g, ' ');
if (normalized === 'auto') return 'auto';
if (normalized === 'sequential') return 'sequential';
if (normalized === 'subagent' || normalized === 'sub agent' || normalized === 'subagents' || normalized === 'sub agents') {
return 'subagent';
}
if (normalized === 'agent team' || normalized === 'agent teams' || normalized === 'agentteam') {
return 'agent-team';
}
return null;
};
const normalizeConfigExecutionMode = (mode) => {
  if (mode === 'auto' || mode === 'sequential' || mode === 'subagent' || mode === 'agent-team') {
    return mode;
  }
  return null;
};
// Explicit user instruction in the active run takes priority over config.
const explicitModeFromUser = normalizeUserExecutionMode(runtime.getExplicitExecutionModeHint?.() || null);
const requestedMode = explicitModeFromUser || normalizeConfigExecutionMode(subagentContext.config.execution_mode) || 'auto';
const probeEnabled = subagentContext.config.capability_probe;
const supports = {
subagent: false,
agentTeam: false,
};
if (probeEnabled) {
supports.subagent = runtime.canLaunchSubagents?.() === true;
supports.agentTeam = runtime.canLaunchAgentTeams?.() === true;
}
let resolvedMode = requestedMode;
if (requestedMode === 'auto') {
if (supports.agentTeam) resolvedMode = 'agent-team';
else if (supports.subagent) resolvedMode = 'subagent';
else resolvedMode = 'sequential';
} else if (probeEnabled && requestedMode === 'agent-team' && !supports.agentTeam) {
resolvedMode = supports.subagent ? 'subagent' : 'sequential';
} else if (probeEnabled && requestedMode === 'subagent' && !supports.subagent) {
resolvedMode = 'sequential';
}
subagentContext.execution = {
requestedMode,
resolvedMode,
probeEnabled,
supports,
};
```
Resolution precedence:
1. Explicit user request in this run (`agent team` => `agent-team`; `subagent` => `subagent`; `sequential`; `auto`)
2. `tea_execution_mode` from config
3. Runtime capability fallback (when probing enabled)
If probing is disabled, honor the requested mode strictly. If that mode cannot be executed at runtime, fail with explicit error instead of silent fallback.
---
### 3. Dispatch 4 Quality Workers
**Subagent A: Determinism**
- File: `./step-03a-subagent-determinism.md`
- Output: `/tmp/tea-test-review-determinism-${timestamp}.json`
- Execution:
- `agent-team` or `subagent`: launch non-blocking
- `sequential`: run blocking and wait
- Status: Running... ⟳
**Subagent B: Isolation**
- File: `./step-03b-subagent-isolation.md`
- Output: `/tmp/tea-test-review-isolation-${timestamp}.json`
- Status: Running... ⟳
**Subagent C: Maintainability**
- File: `./step-03c-subagent-maintainability.md`
- Output: `/tmp/tea-test-review-maintainability-${timestamp}.json`
- Status: Running... ⟳
**Subagent D: Performance**
- File: `./step-03e-subagent-performance.md`
- Output: `/tmp/tea-test-review-performance-${timestamp}.json`
- Status: Running... ⟳
In `agent-team` and `subagent` modes, runtime decides worker scheduling and concurrency.
---
### 4. Wait for Expected Worker Completion
**If `resolvedMode` is `agent-team` or `subagent`:**
```
⏳ Waiting for 4 quality subagents to complete...
✅ All 4 quality subagents completed successfully!
```
**If `resolvedMode` is `sequential`:**
```
✅ Sequential mode: each worker already completed during dispatch.
```
---
### 5. Verify All Outputs Exist
```javascript
const fs = require('node:fs');
const outputs = ['determinism', 'isolation', 'maintainability', 'performance'].map(
(dim) => `/tmp/tea-test-review-${dim}-${timestamp}.json`,
);
outputs.forEach((output) => {
if (!fs.existsSync(output)) {
throw new Error(`Subagent output missing: ${output}`);
}
});
```
---
### 6. Execution Report
```
🚀 Performance Report:
- Execution Mode: {resolvedMode}
- Total Elapsed: (mode-dependent)
- Parallel Gain: ~60-70% faster in subagent/agent-team mode than sequential
```
---
### 7. Proceed to Aggregation
Pass the same `timestamp` value to Step 3F (do not regenerate it). Step 3F must read the exact temp files written in this step.
Load next step: `{nextStepFile}`
The aggregation step (3F) will:
- Read all 4 subagent outputs
- Calculate weighted overall score (0-100)
- Aggregate violations by severity
- Generate review report with top suggestions
---
## EXIT CONDITION
Proceed to Step 3F when:
- ✅ All 4 subagents completed successfully
- ✅ All output files exist and are valid JSON
- ✅ Execution metrics displayed
**Do NOT proceed if any subagent failed.**
---
## 🚨 SYSTEM SUCCESS METRICS
### ✅ SUCCESS:
- All 4 subagents launched and completed
- All required worker steps completed
- Output files generated and valid
- Fallback behavior respected configuration and capability probe rules
### ❌ FAILURE:
- One or more subagents failed
- Output files missing or invalid
- Unsupported requested mode with probing disabled
**Master Rule:** Deterministic mode selection + stable output contract. Use the best supported mode, then aggregate normally.


@@ -0,0 +1,214 @@
---
name: 'step-03a-subagent-determinism'
description: 'Subagent: Check test determinism (no random/time dependencies)'
subagent: true
outputFile: '/tmp/tea-test-review-determinism-{{timestamp}}.json'
---
# Subagent 3A: Determinism Quality Check
## SUBAGENT CONTEXT
This is an **isolated subagent** running in parallel with other quality dimension checks.
**What you have from parent workflow:**
- Test files discovered in Step 2
- Knowledge fragment: test-quality (determinism criteria)
- Config: test framework
**Your task:** Analyze test files for DETERMINISM violations only.
---
## MANDATORY EXECUTION RULES
- 📖 Read this entire subagent file before acting
- ✅ Check DETERMINISM only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check isolation, maintainability, coverage, or performance (other subagents)
- ❌ Do NOT modify test files (read-only analysis)
- ❌ Do NOT run tests (just analyze code)
---
## SUBAGENT TASK
### 1. Identify Determinism Violations
**Scan test files for non-deterministic patterns:**
**HIGH SEVERITY Violations**:
- `Math.random()` - Random number generation
- `Date.now()` or `new Date()` without mocking
- `setTimeout` / `setInterval` without proper waits
- External API calls without mocking
- File system operations on random paths
- Database queries with non-deterministic ordering
**MEDIUM SEVERITY Violations**:
- `page.waitForTimeout(N)` - Hard waits instead of conditions
- Flaky selectors (CSS classes that may change)
- Race conditions (missing proper synchronization)
- Test order dependencies (test A must run before test B)
**LOW SEVERITY Violations**:
- Missing test isolation (shared state between tests)
- Console timestamps without fixed timezone
### 2. Analyze Each Test File
For each test file from Step 2:
```javascript
const violations = [];
// Helper (assumed): 1-based line number of the first occurrence, or null
const findLineNumber = (pattern) => {
  const idx = testFileContent.split('\n').findIndex((l) => l.includes(pattern));
  return idx === -1 ? null : idx + 1;
};
// Check for Math.random()
if (testFileContent.includes('Math.random()')) {
violations.push({
file: testFile,
line: findLineNumber('Math.random()'),
severity: 'HIGH',
category: 'random-generation',
description: 'Test uses Math.random() - non-deterministic',
suggestion: 'Use faker.seed(12345) for deterministic random data',
});
}
// Check for Date.now()
if (testFileContent.includes('Date.now()') || testFileContent.includes('new Date()')) {
violations.push({
file: testFile,
line: findLineNumber('Date.now()'),
severity: 'HIGH',
category: 'time-dependency',
description: 'Test uses Date.now() or new Date() without mocking',
suggestion: 'Mock system time with test.useFakeTimers() or use fixed timestamps',
});
}
// Check for hard waits
if (testFileContent.includes('waitForTimeout')) {
violations.push({
file: testFile,
line: findLineNumber('waitForTimeout'),
severity: 'MEDIUM',
category: 'hard-wait',
description: 'Test uses waitForTimeout - creates flakiness',
suggestion: 'Replace with expect(locator).toBeVisible() or waitForResponse',
});
}
// ... check other patterns
```
### 3. Calculate Determinism Score
**Scoring Logic**:
```javascript
const totalChecks = testFiles.length * checksPerFile;
const failedChecks = violations.length;
const passedChecks = totalChecks - failedChecks;
// Weight violations by severity
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
// Score: 100 - (penalty points)
const score = Math.max(0, 100 - totalPenalty);
```
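The output format below also reports a letter `grade`, but no score-to-grade mapping is defined in this step; one plausible mapping (the thresholds are assumptions) is:

```javascript
// Assumed score-to-grade thresholds; adjust to project conventions.
function gradeFromScore(score) {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
}
```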
---
## OUTPUT FORMAT
Write JSON to temp file: `/tmp/tea-test-review-determinism-{{timestamp}}.json`
```json
{
"dimension": "determinism",
"score": 85,
"max_score": 100,
"grade": "B",
"violations": [
{
"file": "tests/api/user.spec.ts",
"line": 42,
"severity": "HIGH",
"category": "random-generation",
"description": "Test uses Math.random() - non-deterministic",
"suggestion": "Use faker.seed(12345) for deterministic random data",
"code_snippet": "const userId = Math.random() * 1000;"
},
{
"file": "tests/e2e/checkout.spec.ts",
"line": 78,
"severity": "MEDIUM",
"category": "hard-wait",
"description": "Test uses waitForTimeout - creates flakiness",
"suggestion": "Replace with expect(locator).toBeVisible()",
"code_snippet": "await page.waitForTimeout(5000);"
}
],
"passed_checks": 12,
"failed_checks": 3,
"total_checks": 15,
"violation_summary": {
"HIGH": 1,
"MEDIUM": 1,
"LOW": 1
},
"recommendations": [
"Use faker with fixed seed for all random data",
"Replace all waitForTimeout with conditional waits",
"Mock Date.now() in tests that use current time"
],
"summary": "Tests are mostly deterministic with 3 violations (1 HIGH, 1 MEDIUM, 1 LOW)"
}
```
**On Error:**
```json
{
"dimension": "determinism",
"success": false,
"error": "Error message describing what went wrong"
}
```
---
## EXIT CONDITION
Subagent completes when:
- ✅ All test files analyzed for determinism violations
- ✅ Score calculated (0-100)
- ✅ Violations categorized by severity
- ✅ Recommendations generated
- ✅ JSON output written to temp file
**Subagent terminates here.** Parent workflow will read output and aggregate with other quality dimensions.
---
## 🚨 SUBAGENT SUCCESS METRICS
### ✅ SUCCESS:
- All test files scanned for determinism violations
- Score calculated with proper severity weighting
- JSON output valid and complete
- Only determinism checked (not other dimensions)
### ❌ FAILURE:
- Checked quality dimensions other than determinism
- Invalid or missing JSON output
- Score calculation incorrect
- Modified test files (should be read-only)


@@ -0,0 +1,125 @@
---
name: 'step-03b-subagent-isolation'
description: 'Subagent: Check test isolation (no shared state/dependencies)'
subagent: true
outputFile: '/tmp/tea-test-review-isolation-{{timestamp}}.json'
---
# Subagent 3B: Isolation Quality Check
## SUBAGENT CONTEXT
This is an **isolated subagent** running in parallel with other quality dimension checks.
**Your task:** Analyze test files for ISOLATION violations only.
---
## MANDATORY EXECUTION RULES
- ✅ Check ISOLATION only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check determinism, maintainability, coverage, or performance
- ❌ Do NOT modify test files (read-only analysis)
---
## SUBAGENT TASK
### 1. Identify Isolation Violations
**Scan test files for isolation issues:**
**HIGH SEVERITY Violations**:
- Global state mutations (global variables modified)
- Test order dependencies (test B depends on test A running first)
- Shared database records without cleanup
- beforeAll/afterAll with side effects leaking to other tests
**MEDIUM SEVERITY Violations**:
- Missing test cleanup (created data not deleted)
- Shared fixtures that mutate state
- Tests that assume specific execution order
- Environment variables modified without restoration
**LOW SEVERITY Violations**:
- Tests sharing test data (but not mutating)
- Missing test.describe grouping
- Tests that could be more isolated
### 2. Calculate Isolation Score
```javascript
const totalChecks = testFiles.length * checksPerFile;
const failedChecks = violations.length;
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
const score = Math.max(0, 100 - totalPenalty);
```
---
## OUTPUT FORMAT
```json
{
"dimension": "isolation",
"score": 90,
"max_score": 100,
"grade": "A-",
"violations": [
{
"file": "tests/api/integration.spec.ts",
"line": 15,
"severity": "HIGH",
"category": "test-order-dependency",
"description": "Test depends on previous test creating user record",
"suggestion": "Each test should create its own test data in beforeEach",
"code_snippet": "test('should update user', async () => { /* assumes user exists */ });"
}
],
"passed_checks": 14,
"failed_checks": 1,
"total_checks": 15,
"violation_summary": {
"HIGH": 1,
"MEDIUM": 0,
"LOW": 0
},
"recommendations": [
"Add beforeEach hooks to create test data",
"Add afterEach hooks to cleanup created records",
"Use test.describe.configure({ mode: 'parallel' }) to enforce isolation"
],
"summary": "Tests are well isolated with 1 HIGH severity violation"
}
```
---
## EXIT CONDITION
Subagent completes when:
- ✅ All test files analyzed for isolation violations
- ✅ Score calculated
- ✅ JSON output written to temp file
**Subagent terminates here.**
---
## 🚨 SUBAGENT SUCCESS METRICS
### ✅ SUCCESS:
- Only isolation checked (not other dimensions)
- JSON output valid and complete
### ❌ FAILURE:
- Checked quality dimensions other than isolation
- Invalid or missing JSON output


@@ -0,0 +1,102 @@
---
name: 'step-03c-subagent-maintainability'
description: 'Subagent: Check test maintainability (readability, structure, DRY)'
subagent: true
outputFile: '/tmp/tea-test-review-maintainability-{{timestamp}}.json'
---
# Subagent 3C: Maintainability Quality Check
## SUBAGENT CONTEXT
This is an **isolated subagent** running in parallel with other quality dimension checks.
**Your task:** Analyze test files for MAINTAINABILITY violations only.
---
## MANDATORY EXECUTION RULES
- ✅ Check MAINTAINABILITY only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check determinism, isolation, coverage, or performance
---
## SUBAGENT TASK
### 1. Identify Maintainability Violations
**HIGH SEVERITY Violations**:
- Tests >100 lines (too complex)
- No test.describe grouping
- Duplicate test logic (copy-paste)
- Unclear test names (no Given/When/Then structure)
- Magic numbers/strings without constants
**MEDIUM SEVERITY Violations**:
- Tests missing comments for complex logic
- Inconsistent naming conventions
- Excessive nesting (>3 levels)
- Large setup/teardown blocks
**LOW SEVERITY Violations**:
- Minor code style issues
- Could benefit from helper functions
- Inconsistent assertion styles
### 2. Calculate Maintainability Score
```javascript
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
const score = Math.max(0, 100 - totalPenalty);
```
---
## OUTPUT FORMAT
```json
{
"dimension": "maintainability",
"score": 75,
"max_score": 100,
"grade": "C",
"violations": [
{
"file": "tests/e2e/complex-flow.spec.ts",
"line": 1,
"severity": "HIGH",
"category": "test-too-long",
"description": "Test file is 250 lines - too complex to maintain",
"suggestion": "Split into multiple smaller test files by feature area",
"code_snippet": "test.describe('Complex flow', () => { /* 250 lines */ });"
}
],
"passed_checks": 10,
"failed_checks": 5,
"violation_summary": {
"HIGH": 2,
"MEDIUM": 2,
"LOW": 1
},
"recommendations": [
"Split large test files into smaller, focused files (<100 lines each)",
"Add test.describe grouping for related tests",
"Extract duplicate logic into helper functions"
],
"summary": "Tests have maintainability issues - 5 violations (2 HIGH)"
}
```
---
## EXIT CONDITION
Subagent completes when JSON output written to temp file.
**Subagent terminates here.**


@@ -0,0 +1,117 @@
---
name: 'step-03e-subagent-performance'
description: 'Subagent: Check test performance (speed, efficiency, parallelization)'
subagent: true
outputFile: '/tmp/tea-test-review-performance-{{timestamp}}.json'
---
# Subagent 3E: Performance Quality Check
## SUBAGENT CONTEXT
This is an **isolated subagent** running in parallel with other quality dimension checks.
**Your task:** Analyze test files for PERFORMANCE violations only.
---
## MANDATORY EXECUTION RULES
- ✅ Check PERFORMANCE only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check determinism, isolation, maintainability, or coverage
---
## SUBAGENT TASK
### 1. Identify Performance Violations
**HIGH SEVERITY Violations**:
- Tests not parallelizable (using test.describe.serial unnecessarily)
- Slow setup/teardown (creating fresh DB for every test)
- Excessive navigation (reloading pages unnecessarily)
- No fixture reuse (repeating expensive operations)
**MEDIUM SEVERITY Violations**:
- Hard waits >2 seconds (waitForTimeout(5000))
- Inefficient selectors (page.$$ instead of locators)
- Large data sets in tests without pagination
- Missing performance optimizations
**LOW SEVERITY Violations**:
- Could use parallelization (test.describe.configure({ mode: 'parallel' }))
- Minor inefficiencies
- Excessive logging
### 2. Calculate Performance Score
```javascript
// `violations` is the array of findings identified in section 1 above
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
const score = Math.max(0, 100 - totalPenalty);
```
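A minimal runnable sketch of the penalty formula above, with illustrative violations (not from a real review):

```javascript
// Penalty-based scoring: each violation subtracts its severity weight from 100, floored at 0
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };

const scoreViolations = (violations) => {
  const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
  return Math.max(0, 100 - totalPenalty); // floor so many violations cannot push the score negative
};

// One HIGH (10) + one MEDIUM (5) => penalty 15 => score 85
console.log(scoreViolations([{ severity: 'HIGH' }, { severity: 'MEDIUM' }]));
```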
---
## OUTPUT FORMAT
```json
{
"dimension": "performance",
"score": 80,
"max_score": 100,
"grade": "B",
"violations": [
{
"file": "tests/e2e/search.spec.ts",
"line": 10,
"severity": "HIGH",
"category": "not-parallelizable",
"description": "Tests use test.describe.serial unnecessarily - reduces parallel execution",
"suggestion": "Remove .serial unless tests truly share state",
"code_snippet": "test.describe.serial('Search tests', () => { ... });"
},
{
"file": "tests/api/bulk-operations.spec.ts",
"line": 35,
"severity": "MEDIUM",
"category": "slow-setup",
"description": "Test creates 1000 records in setup - very slow",
"suggestion": "Use smaller data sets or fixture factories",
"code_snippet": "beforeEach(async () => { for (let i=0; i<1000; i++) { ... } });"
}
],
"passed_checks": 13,
"failed_checks": 2,
"violation_summary": {
"HIGH": 1,
"MEDIUM": 1,
"LOW": 0
},
"performance_metrics": {
"parallelizable_tests": 80,
"serial_tests": 20,
"avg_test_duration_estimate": "~2 seconds",
"slow_tests": ["bulk-operations.spec.ts (>30s)"]
},
"recommendations": [
"Enable parallel mode where possible",
"Reduce setup data to minimum needed",
"Use fixtures to share expensive setup across tests",
"Remove unnecessary .serial constraints"
],
"summary": "Good performance with 2 violations - 80% tests can run in parallel"
}
```
---
## EXIT CONDITION
Subagent completes when the JSON output is written to the temp file.
**Subagent terminates here.**


@@ -0,0 +1,277 @@
---
name: 'step-03f-aggregate-scores'
description: 'Aggregate quality dimension scores into overall 0-100 score'
nextStepFile: './step-04-generate-report.md'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 3F: Aggregate Quality Scores
## STEP GOAL
Read outputs from 4 quality subagents, calculate weighted overall score (0-100), and aggregate violations for report generation.
---
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
- ✅ Read all 4 subagent outputs
- ✅ Calculate weighted overall score
- ✅ Aggregate violations by severity
- ❌ Do NOT re-evaluate quality (use subagent outputs)
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed
---
## MANDATORY SEQUENCE
### 1. Read All Subagent Outputs
```javascript
const fs = require('fs'); // Node core module, needed for reading the subagent temp files

// Use the SAME timestamp generated in Step 3 (do not regenerate).
const timestamp = subagentContext?.timestamp;
if (!timestamp) {
throw new Error('Missing timestamp from Step 3 context. Pass Step 3 timestamp into Step 3F.');
}
const dimensions = ['determinism', 'isolation', 'maintainability', 'performance'];
const results = {};
dimensions.forEach((dim) => {
const outputPath = `/tmp/tea-test-review-${dim}-${timestamp}.json`;
results[dim] = JSON.parse(fs.readFileSync(outputPath, 'utf8'));
});
```
**Verify all succeeded:**
```javascript
const allSucceeded = dimensions.every((dim) => results[dim].score !== undefined);
if (!allSucceeded) {
throw new Error('One or more quality subagents failed!');
}
```
---
### 2. Calculate Weighted Overall Score
**Dimension Weights** (based on TEA quality priorities):
```javascript
const weights = {
determinism: 0.3, // 30% - Reliability and flake prevention
isolation: 0.3, // 30% - Parallel safety and independence
maintainability: 0.25, // 25% - Readability and long-term health
performance: 0.15, // 15% - Speed and execution efficiency
};
```
**Calculate overall score:**
```javascript
const overallScore = dimensions.reduce((sum, dim) => {
return sum + results[dim].score * weights[dim];
}, 0);
const roundedScore = Math.round(overallScore);
```
**Determine grade:**
```javascript
const getGrade = (score) => {
if (score >= 90) return 'A';
if (score >= 80) return 'B';
if (score >= 70) return 'C';
if (score >= 60) return 'D';
return 'F';
};
const overallGrade = getGrade(roundedScore);
```
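As a worked check of the weighting and grading above (sample dimension scores, not from a real run):

```javascript
// Weighted aggregation with the TEA dimension weights, then letter grading
const weights = { determinism: 0.3, isolation: 0.3, maintainability: 0.25, performance: 0.15 };
const sampleScores = { determinism: 90, isolation: 80, maintainability: 75, performance: 80 };

const overall = Object.entries(weights).reduce(
  (sum, [dim, w]) => sum + sampleScores[dim] * w,
  0,
);

const getGrade = (score) => {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
};

// 90*0.30 + 80*0.30 + 75*0.25 + 80*0.15 = 27 + 24 + 18.75 + 12 = 81.75, rounds to 82 (B)
console.log(Math.round(overall), getGrade(Math.round(overall)));
```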
---
### 3. Aggregate Violations by Severity
**Collect all violations from all dimensions:**
```javascript
const allViolations = dimensions.flatMap((dim) =>
results[dim].violations.map((v) => ({
...v,
dimension: dim,
})),
);
// Group by severity
const highSeverity = allViolations.filter((v) => v.severity === 'HIGH');
const mediumSeverity = allViolations.filter((v) => v.severity === 'MEDIUM');
const lowSeverity = allViolations.filter((v) => v.severity === 'LOW');
const violationSummary = {
total: allViolations.length,
HIGH: highSeverity.length,
MEDIUM: mediumSeverity.length,
LOW: lowSeverity.length,
};
```
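The three `filter` passes above can equivalently be collapsed into one `reduce`; a worked example with illustrative violations:

```javascript
// Single-pass severity counting, equivalent to the filter-based grouping above (illustrative data)
const allViolations = [
  { severity: 'HIGH' },
  { severity: 'HIGH' },
  { severity: 'MEDIUM' },
  { severity: 'LOW' },
];

const violationSummary = allViolations.reduce(
  (acc, v) => ({ ...acc, [v.severity]: acc[v.severity] + 1 }),
  { total: allViolations.length, HIGH: 0, MEDIUM: 0, LOW: 0 },
);

console.log(violationSummary);
```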
---
### 4. Prioritize Recommendations
**Extract recommendations from all dimensions:**
```javascript
const allRecommendations = dimensions.flatMap((dim) =>
results[dim].recommendations.map((rec) => ({
dimension: dim,
recommendation: rec,
impact: results[dim].score < 70 ? 'HIGH' : 'MEDIUM',
})),
);
// Sort by impact (HIGH first) using a symmetric comparator, then keep the top 10
const prioritizedRecommendations = allRecommendations
  .sort((a, b) => Number(b.impact === 'HIGH') - Number(a.impact === 'HIGH'))
  .slice(0, 10);
```
---
### 5. Create Review Summary Object
**Aggregate all results:**
```javascript
// Maps the numeric score to a label; assumed thresholds mirroring getGrade above
const getQualityAssessment = (score) => {
  if (score >= 90) return 'Excellent';
  if (score >= 80) return 'Good';
  if (score >= 70) return 'Acceptable';
  if (score >= 60) return 'Needs Improvement';
  return 'Poor';
};

const reviewSummary = {
overall_score: roundedScore,
overall_grade: overallGrade,
quality_assessment: getQualityAssessment(roundedScore),
dimension_scores: {
determinism: results.determinism.score,
isolation: results.isolation.score,
maintainability: results.maintainability.score,
performance: results.performance.score,
},
dimension_grades: {
determinism: results.determinism.grade,
isolation: results.isolation.grade,
maintainability: results.maintainability.grade,
performance: results.performance.grade,
},
violations_summary: violationSummary,
all_violations: allViolations,
high_severity_violations: highSeverity,
top_10_recommendations: prioritizedRecommendations,
subagent_execution: 'PARALLEL (4 quality dimensions)',
performance_gain: '~60% faster than sequential',
};
// Save for Step 4 (report generation)
fs.writeFileSync(`/tmp/tea-test-review-summary-${timestamp}.json`, JSON.stringify(reviewSummary, null, 2), 'utf8');
```
---
### 6. Display Summary to User
```
✅ Quality Evaluation Complete (Parallel Execution)
📊 Overall Quality Score: {roundedScore}/100 (Grade: {overallGrade})
📈 Dimension Scores:
- Determinism: {determinism_score}/100 ({determinism_grade})
- Isolation: {isolation_score}/100 ({isolation_grade})
- Maintainability: {maintainability_score}/100 ({maintainability_grade})
- Performance: {performance_score}/100 ({performance_grade})
Coverage is excluded from `test-review` scoring. Use `trace` for coverage analysis and gates.
⚠️ Violations Found:
- HIGH: {high_count} violations
- MEDIUM: {medium_count} violations
- LOW: {low_count} violations
- TOTAL: {total_count} violations
🚀 Performance: Parallel execution ~60% faster than sequential
✅ Ready for report generation (Step 4)
```
---
### 7. Save Progress
**Save this step's accumulated work to `{outputFile}`.**
- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
```yaml
---
stepsCompleted: ['step-03f-aggregate-scores']
lastStep: 'step-03f-aggregate-scores'
lastSaved: '{date}'
---
```
Then write this step's output below the frontmatter.
- **If `{outputFile}` already exists**, update:
- Add `'step-03f-aggregate-scores'` to `stepsCompleted` array (only if not already present)
- Set `lastStep: 'step-03f-aggregate-scores'`
- Set `lastSaved: '{date}'`
- Append this step's output to the appropriate section of the document.
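The frontmatter update above can be sketched as follows. The workflow does not mandate an implementation; `updateFrontmatter` is a hypothetical helper, and it assumes a simple `---`-delimited header containing only the three tracked keys (it is not a full YAML parser):

```javascript
// Update stepsCompleted/lastStep/lastSaved in a '---'-delimited frontmatter header
const updateFrontmatter = (doc, stepName, date) => {
  const [, header, body] = doc.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/) || [];
  if (header === undefined) throw new Error('No YAML frontmatter found');

  // Parse the existing stepsCompleted array (single-quoted entries)
  const steps = (header.match(/stepsCompleted:\s*\[(.*)\]/) || [, ''])[1]
    .split(',')
    .map((s) => s.trim().replace(/^'|'$/g, ''))
    .filter(Boolean);
  if (!steps.includes(stepName)) steps.push(stepName); // only add if not already present

  const newHeader = [
    `stepsCompleted: [${steps.map((s) => `'${s}'`).join(', ')}]`,
    `lastStep: '${stepName}'`,
    `lastSaved: '${date}'`,
  ].join('\n');
  return `---\n${newHeader}\n---\n${body}`;
};
```

Re-running the helper with the same step name is a no-op, which matches the "only if not already present" rule above.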
---
## EXIT CONDITION
Proceed to Step 4 when:
- ✅ All subagent outputs read successfully
- ✅ Overall score calculated
- ✅ Violations aggregated
- ✅ Recommendations prioritized
- ✅ Summary saved to temp file
- ✅ Output displayed to user
- ✅ Progress saved to output document
Load next step: `{nextStepFile}`
---
## 🚨 SYSTEM SUCCESS METRICS
### ✅ SUCCESS:
- All 4 subagent outputs read and parsed
- Overall score calculated with proper weights
- Violations aggregated correctly
- Summary complete and saved
### ❌ FAILURE:
- Failed to read one or more subagent outputs
- Score calculation incorrect
- Summary missing or incomplete
**Master Rule:** Aggregate determinism, isolation, maintainability, and performance only.


@@ -0,0 +1,111 @@
---
name: 'step-04-generate-report'
description: 'Create test-review report and validate'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 4: Generate Report & Validate
## STEP GOAL
Produce the test-review report and validate against checklist.
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: config, loaded artifacts, and knowledge fragments
- Focus: this step's goal only
- Limits: do not execute future steps
- Dependencies: prior steps' outputs (if any)
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.
## 1. Report Generation
Use `test-review-template.md` to produce `{outputFile}` including:
- Score summary
- Critical findings with fixes
- Warnings and recommendations
- Context references (story/test-design if available)
- Coverage boundary note: `test-review` does not score coverage. Direct coverage findings to `trace`.
---
## 2. Polish Output
Before finalizing, review the complete output document for quality:
1. **Remove duplication**: Progressive-append workflow may have created repeated sections — consolidate
2. **Verify consistency**: Ensure terminology, risk scores, and references are consistent throughout
3. **Check completeness**: All template sections should be populated or explicitly marked N/A
4. **Format cleanup**: Ensure markdown formatting is clean (tables aligned, headers consistent, no orphaned references)
---
## 3. Validation
Validate against `checklist.md` and fix any gaps.
- [ ] CLI sessions cleaned up (no orphaned browsers)
- [ ] Temp artifacts stored in `{test_artifacts}/` not random locations
---
## 4. Save Progress
**Save this step's accumulated work to `{outputFile}`.**
- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
```yaml
---
stepsCompleted: ['step-04-generate-report']
lastStep: 'step-04-generate-report'
lastSaved: '{date}'
---
```
Then write this step's output below the frontmatter.
- **If `{outputFile}` already exists**, update:
- Add `'step-04-generate-report'` to `stepsCompleted` array (only if not already present)
- Set `lastStep: 'step-04-generate-report'`
- Set `lastSaved: '{date}'`
- Append this step's output to the appropriate section of the document.
---
## 5. Completion Summary
Report:
- Scope reviewed
- Overall score
- Critical blockers
- Next recommended workflow (e.g., `automate` or `trace`)
## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS:
- Step completed in full with required outputs
### ❌ SYSTEM FAILURE:
- Skipped sequence steps or missing outputs
**Master Rule:** Skipping steps is FORBIDDEN.