initial commit

2026-03-16 19:54:53 -04:00
commit bfe0e01254
3341 changed files with 483939 additions and 0 deletions


@@ -0,0 +1,197 @@
---
name: 'step-01-load-context'
description: 'Load knowledge base, determine scope, and gather context'
nextStepFile: './step-02-discover-tests.md'
knowledgeIndex: '{project-root}/_bmad/tea/testarch/tea-index.csv'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 1: Load Context & Knowledge Base
## STEP GOAL
Determine review scope, load required knowledge fragments, and gather related artifacts.
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: config, loaded artifacts, and knowledge fragments
- Focus: this step's goal only
- Limits: do not execute future steps
- Dependencies: prior steps' outputs (if any)
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.
## 1. Determine Scope and Stack
Use `review_scope`:
- **single**: one file
- **directory**: all tests in folder
- **suite**: all tests in repo
If unclear, ask the user.
**Stack Detection** (for context-aware loading):
Read `test_stack_type` from `{config_source}`. If `"auto"` or not configured, infer `{detected_stack}` by scanning `{project-root}`:
- **Frontend indicators**: `playwright.config.*`, `cypress.config.*`, `package.json` with react/vue/angular
- **Backend indicators**: `pyproject.toml`, `pom.xml`/`build.gradle`, `go.mod`, `*.csproj`, `Gemfile`, `Cargo.toml`
- **Both present** → `fullstack`; only frontend → `frontend`; only backend → `backend`
- Explicit `test_stack_type` overrides auto-detection
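The detection heuristic above can be sketched as follows. The marker lists mirror the indicators named here; the `package.json` dependency check is omitted for brevity (a fuller version would also inspect it for react/vue/angular):

```javascript
// Sketch of stack detection; `files` stands in for a listing of {project-root}.
const FRONTEND_MARKERS = ['playwright.config', 'cypress.config'];
const BACKEND_MARKERS = ['pyproject.toml', 'pom.xml', 'build.gradle', 'go.mod', 'Gemfile', 'Cargo.toml'];

function detectStack(files, configured = 'auto') {
  // Explicit test_stack_type always overrides auto-detection.
  if (configured && configured !== 'auto') return configured;
  const frontend = files.some((f) => FRONTEND_MARKERS.some((m) => f.startsWith(m)));
  const backend = files.some((f) => BACKEND_MARKERS.includes(f) || f.endsWith('.csproj'));
  if (frontend && backend) return 'fullstack';
  if (frontend) return 'frontend';
  if (backend) return 'backend';
  return 'unknown'; // neither detected; ask the user
}
```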
---
### Tiered Knowledge Loading
Load fragments based on their `tier` classification in `tea-index.csv`:
1. **Core tier** (always load): Foundational fragments required for this workflow
2. **Extended tier** (load on-demand): Load when deeper analysis is needed or when the user's context requires it
3. **Specialized tier** (load only when relevant): Load only when the specific use case matches (e.g., contract-testing only for microservices, email-auth only for email flows)
> **Context Efficiency**: Loading only core fragments reduces context usage by 40-50% compared to loading all fragments.
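A minimal sketch of tier-based selection, assuming `tea-index.csv` parses into rows with `fragment`, `tier`, and `useCase` columns (the column names are assumptions for illustration, not the actual index schema):

```javascript
// Select fragments by tier: core always, extended on demand,
// specialized only when the use case matches.
function selectFragments(rows, { extended = false, useCases = [] } = {}) {
  return rows
    .filter((r) =>
      r.tier === 'core' ||
      (r.tier === 'extended' && extended) ||
      (r.tier === 'specialized' && useCases.includes(r.useCase)))
    .map((r) => r.fragment);
}
```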
### Playwright Utils Loading Profiles
**If `tea_use_playwright_utils` is enabled**, select the appropriate loading profile:
- **API-only profile** (when `{detected_stack}` is `backend` or no `page.goto`/`page.locator` found in test files):
Load: `overview`, `api-request`, `auth-session`, `recurse` (~1,800 lines)
- **Full UI+API profile** (when `{detected_stack}` is `frontend`/`fullstack` or browser tests detected):
Load: all Playwright Utils core fragments (~4,500 lines)
**Detection**: Scan `{test_dir}` for files containing `page.goto` or `page.locator`. If none found, use API-only profile.
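The profile selection can be sketched as follows, where `testSources` holds the content of files scanned under `{test_dir}`:

```javascript
// Pick a Playwright Utils loading profile: full UI+API when browser APIs
// appear in tests or the stack is frontend/fullstack, otherwise API-only.
function selectPlaywrightProfile(detectedStack, testSources) {
  const browserTests = testSources.some(
    (src) => src.includes('page.goto') || src.includes('page.locator'),
  );
  return browserTests || ['frontend', 'fullstack'].includes(detectedStack)
    ? 'full-ui-api' // all Playwright Utils core fragments
    : 'api-only';   // overview, api-request, auth-session, recurse
}
```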
### Pact.js Utils Loading
**If `tea_use_pactjs_utils` is enabled** (and contract tests detected in review scope):
Load: `pactjs-utils-overview.md`, `pactjs-utils-provider-verifier.md`, `pactjs-utils-request-filter.md` (the 3 most relevant for reviewing provider verification tests)
**If `tea_use_pactjs_utils` is disabled** but contract tests are in review scope:
Load: `contract-testing.md`
### Pact MCP Loading
**If `tea_pact_mcp` is `"mcp"`:**
Load: `pact-mcp.md` — enables agent to use SmartBear MCP "Review Pact Tests" tool for automated best-practice feedback during test review.
## 2. Load Knowledge Base
Read `{config_source}` and check `tea_use_playwright_utils`, `tea_use_pactjs_utils`, `tea_pact_mcp`, and `tea_browser_automation` to select the correct fragment set, then load from `{knowledgeIndex}`:
**Core:**
- `test-quality.md`
- `data-factories.md`
- `test-levels-framework.md`
- `selective-testing.md`
- `test-healing-patterns.md`
- `selector-resilience.md`
- `timing-debugging.md`
**If Playwright Utils enabled:**
- `overview.md`, `api-request.md`, `network-recorder.md`, `auth-session.md`, `intercept-network-call.md`, `recurse.md`, `log.md`, `file-utils.md`, `burn-in.md`, `network-error-monitor.md`, `fixtures-composition.md`
**If disabled:**
- `fixture-architecture.md`
- `network-first.md`
- `playwright-config.md`
- `component-tdd.md`
- `ci-burn-in.md`
**Playwright CLI (if `tea_browser_automation` is "cli" or "auto"):**
- `playwright-cli.md`
**MCP Patterns (if `tea_browser_automation` is "mcp" or "auto"):**
- (existing MCP-related fragments, if any are added in future)
**Pact.js Utils (if enabled and contract tests in review scope):**
- `pactjs-utils-overview.md`, `pactjs-utils-provider-verifier.md`, `pactjs-utils-request-filter.md`
**Contract Testing (if pactjs-utils disabled but contract tests in review scope):**
- `contract-testing.md`
**Pact MCP (if tea_pact_mcp is "mcp"):**
- `pact-mcp.md`
---
## 3. Gather Context Artifacts
If available:
- Story file (acceptance criteria)
- Test design doc (priorities)
- Framework config
Summarize what was found.
Coverage mapping and coverage gates are out of scope in `test-review`. Route those concerns to `trace`.
---
## 4. Save Progress
**Save this step's accumulated work to `{outputFile}`.**
- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
```yaml
---
stepsCompleted: ['step-01-load-context']
lastStep: 'step-01-load-context'
lastSaved: '{date}'
---
```
Then write this step's output below the frontmatter.
- **If `{outputFile}` already exists**, update:
- Add `'step-01-load-context'` to `stepsCompleted` array (only if not already present)
- Set `lastStep: 'step-01-load-context'`
- Set `lastSaved: '{date}'`
- Append this step's output to the appropriate section of the document.
**Update `inputDocuments`**: Set `inputDocuments` in the output template frontmatter to the list of artifact paths loaded in this step (e.g., knowledge fragments, test design documents, configuration files).
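The create-or-update logic above can be sketched as follows, with `doc` the parsed frontmatter object (or `null` on first save); a real implementation would round-trip through a YAML parser:

```javascript
// Update progress frontmatter: append the step once, refresh lastStep/lastSaved.
function updateProgress(doc, step, date) {
  const fm = doc || { stepsCompleted: [] };
  if (!fm.stepsCompleted.includes(step)) fm.stepsCompleted.push(step); // no duplicates
  fm.lastStep = step;
  fm.lastSaved = date;
  return fm;
}
```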
Load next step: `{nextStepFile}`
## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS:
- Step completed in full with required outputs
### ❌ SYSTEM FAILURE:
- Skipped sequence steps or missing outputs
**Master Rule:** Skipping steps is FORBIDDEN.


@@ -0,0 +1,104 @@
---
name: 'step-01b-resume'
description: 'Resume interrupted workflow from last completed step'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 1b: Resume Workflow
## STEP GOAL
Resume an interrupted workflow by loading the existing output document, displaying progress, and routing to the next incomplete step.
## MANDATORY EXECUTION RULES
- Read the entire step file before acting
- Speak in `{communication_language}`
---
## EXECUTION PROTOCOLS:
- Follow the MANDATORY SEQUENCE exactly
- Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: Output document with progress frontmatter
- Focus: Load progress and route to next step
- Limits: Do not re-execute completed steps
- Dependencies: Output document must exist from a previous run
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly.
### 1. Load Output Document
Read `{outputFile}` and parse YAML frontmatter for:
- `stepsCompleted` -- array of completed step names
- `lastStep` -- last completed step name
- `lastSaved` -- timestamp of last save
**If `{outputFile}` does not exist**, display:
"No previous progress found. There is no output document to resume from. Please use **[C] Create** to start a fresh workflow run."
**THEN:** Halt. Do not proceed.
---
### 2. Display Progress Dashboard
Display progress with checkmark/empty indicators:
```
Test Quality Review - Resume Progress:
1. Load Context (step-01-load-context) [completed/pending]
2. Discover Tests (step-02-discover-tests) [completed/pending]
3. Quality Evaluation + Aggregate (step-03f-aggregate-scores) [completed/pending]
4. Generate Report (step-04-generate-report) [completed/pending]
Last saved: {lastSaved}
```
---
### 3. Route to Next Step
Based on `lastStep`, load the next incomplete step:
| lastStep | Next Step File |
| --------------------------- | --------------------------------- |
| `step-01-load-context` | `./step-02-discover-tests.md` |
| `step-02-discover-tests` | `./step-03-quality-evaluation.md` |
| `step-03f-aggregate-scores` | `./step-04-generate-report.md` |
| `step-04-generate-report` | **Workflow already complete.** |
**If `lastStep` is the final step** (`step-04-generate-report`), display: "All steps completed. Use **[C] Create** to start fresh, **[V] Validate** to review outputs, or **[E] Edit** to make revisions." Then halt.
**If `lastStep` does not match any value above**, display: "Unknown progress state (`lastStep`: {lastStep}). Please use **[C] Create** to start fresh." Then halt.
**Otherwise**, load the identified step file, read completely, and execute.
The existing content in `{outputFile}` provides context from previously completed steps.
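The routing table above, expressed as a lookup map (a sketch; the halt messages are abbreviated):

```javascript
// Map each lastStep to the next step file; unknown or final states halt.
const NEXT_STEP = {
  'step-01-load-context': './step-02-discover-tests.md',
  'step-02-discover-tests': './step-03-quality-evaluation.md',
  'step-03f-aggregate-scores': './step-04-generate-report.md',
};

function routeNext(lastStep) {
  if (lastStep === 'step-04-generate-report') return { halt: true, reason: 'complete' };
  const next = NEXT_STEP[lastStep];
  return next ? { halt: false, next } : { halt: true, reason: 'unknown' };
}
```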
---
## SYSTEM SUCCESS/FAILURE METRICS
### SUCCESS:
- Output document loaded and parsed correctly
- Progress dashboard displayed accurately
- Routed to correct next step
### FAILURE:
- Not loading output document
- Incorrect progress display
- Routing to wrong step
**Master Rule:** Resume MUST route to the exact next incomplete step. Never re-execute completed steps.


@@ -0,0 +1,113 @@
---
name: 'step-02-discover-tests'
description: 'Find and parse test files'
nextStepFile: './step-03-quality-evaluation.md'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 2: Discover & Parse Tests
## STEP GOAL
Collect test files in scope and parse structure/metadata.
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: config, loaded artifacts, and knowledge fragments
- Focus: this step's goal only
- Limits: do not execute future steps
- Dependencies: prior steps' outputs (if any)
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.
## 1. Discover Test Files
- **single**: use provided file path
- **directory**: glob under `{test_dir}` or selected folder
- **suite**: glob all tests in repo
Halt if no tests are found.
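The scope-to-glob mapping can be sketched as follows; the glob patterns are illustrative and assume a JS/TS test suite:

```javascript
// Resolve review_scope to glob patterns; halt (throw) on an unknown scope.
function testGlobs(scope, { file, testDir = 'tests' } = {}) {
  switch (scope) {
    case 'single':    return [file];
    case 'directory': return [`${testDir}/**/*.{spec,test}.{js,ts}`];
    case 'suite':     return ['**/*.{spec,test}.{js,ts}'];
    default: throw new Error(`Unknown review_scope: ${scope}`);
  }
}
```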
---
## 2. Parse Metadata (per file)
Collect:
- File size and line count
- Test framework detected
- Describe/test block counts
- Test IDs and priority markers
- Imports, fixtures, factories, network interception
- Waits/timeouts and control flow (if/try/catch)
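A rough sketch of the per-file metadata collection; the regexes are heuristics rather than a full parser, and framework/import detection is omitted:

```javascript
// Collect lightweight per-file metadata by pattern counting.
function parseTestFile(path, content) {
  return {
    path,
    lineCount: content.split('\n').length,
    describeBlocks: (content.match(/\bdescribe\(/g) || []).length,
    testBlocks: (content.match(/\b(test|it)\(/g) || []).length,
    hardWaits: (content.match(/waitForTimeout\(/g) || []).length,
    controlFlow: (content.match(/\b(if|try)\b/g) || []).length,
  };
}
```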
---
## 3. Evidence Collection (if `tea_browser_automation` is `cli` or `auto`)
> **Fallback:** If CLI is not installed, fall back to MCP (if available) or skip evidence collection.
**CLI Evidence Collection:**
All commands use the same named session to target the correct browser:
1. `playwright-cli -s=tea-review open <target_url>`
2. `playwright-cli -s=tea-review tracing-start`
3. Execute the flow under review (using `-s=tea-review` on each command)
4. `playwright-cli -s=tea-review tracing-stop` → saves trace.zip
5. `playwright-cli -s=tea-review screenshot --filename={test_artifacts}/review-evidence.png`
6. `playwright-cli -s=tea-review network` → capture network request log
7. `playwright-cli -s=tea-review close`
> **Session Hygiene:** Always close sessions using `playwright-cli -s=tea-review close`. Do NOT use `close-all` — it kills every session on the machine and breaks parallel execution.
---
## 4. Save Progress
**Save this step's accumulated work to `{outputFile}`.**
- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
```yaml
---
stepsCompleted: ['step-02-discover-tests']
lastStep: 'step-02-discover-tests'
lastSaved: '{date}'
---
```
Then write this step's output below the frontmatter.
- **If `{outputFile}` already exists**, update:
- Add `'step-02-discover-tests'` to `stepsCompleted` array (only if not already present)
- Set `lastStep: 'step-02-discover-tests'`
- Set `lastSaved: '{date}'`
- Append this step's output to the appropriate section of the document.
Load next step: `{nextStepFile}`
## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS:
- Step completed in full with required outputs
### ❌ SYSTEM FAILURE:
- Skipped sequence steps or missing outputs
**Master Rule:** Skipping steps is FORBIDDEN.


@@ -0,0 +1,274 @@
---
name: 'step-03-quality-evaluation'
description: 'Orchestrate adaptive quality dimension checks (agent-team, subagent, or sequential)'
nextStepFile: './step-03f-aggregate-scores.md'
---
# Step 3: Orchestrate Adaptive Quality Evaluation
## STEP GOAL
Select execution mode deterministically, then evaluate quality dimensions using agent-team, subagent, or sequential execution while preserving output contracts:
- Determinism
- Isolation
- Maintainability
- Performance
Coverage is intentionally excluded from this workflow and handled by `trace`.
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
- ✅ Resolve execution mode from config (`tea_execution_mode`, `tea_capability_probe`)
- ✅ Apply fallback rules deterministically when requested mode is unsupported
- ✅ Wait for required worker steps to complete
- ❌ Do NOT skip capability checks when probing is enabled
- ❌ Do NOT proceed until required worker steps finish
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Wait for subagent outputs
- 📖 Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: test files from Step 2, knowledge fragments
- Focus: orchestration only (mode selection + worker dispatch)
- Limits: do not evaluate quality directly (delegate to worker steps)
---
## MANDATORY SEQUENCE
### 1. Prepare Execution Context
**Generate unique timestamp:**
```javascript
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
```
**Prepare context for all subagents:**
```javascript
const parseBooleanFlag = (value, defaultValue = true) => {
if (typeof value === 'string') {
const normalized = value.trim().toLowerCase();
if (['false', '0', 'off', 'no'].includes(normalized)) return false;
if (['true', '1', 'on', 'yes'].includes(normalized)) return true;
}
if (value === undefined || value === null) return defaultValue;
return Boolean(value);
};
const subagentContext = {
test_files: /* from Step 2 */,
knowledge_fragments_loaded: ['test-quality'],
config: {
execution_mode: config.tea_execution_mode || 'auto', // "auto" | "subagent" | "agent-team" | "sequential"
capability_probe: parseBooleanFlag(config.tea_capability_probe, true), // supports booleans and "false"/"true" strings
},
timestamp: timestamp
};
```
---
### 2. Resolve Execution Mode with Capability Probe
```javascript
const normalizeUserExecutionMode = (mode) => {
if (typeof mode !== 'string') return null;
const normalized = mode.trim().toLowerCase().replace(/[-_]/g, ' ').replace(/\s+/g, ' ');
if (normalized === 'auto') return 'auto';
if (normalized === 'sequential') return 'sequential';
if (normalized === 'subagent' || normalized === 'sub agent' || normalized === 'subagents' || normalized === 'sub agents') {
return 'subagent';
}
if (normalized === 'agent team' || normalized === 'agent teams' || normalized === 'agentteam') {
return 'agent-team';
}
return null;
};
const normalizeConfigExecutionMode = (mode) => {
  if (mode === 'auto' || mode === 'sequential' || mode === 'subagent' || mode === 'agent-team') {
    return mode;
  }
  return null;
};
// Explicit user instruction in the active run takes priority over config.
const explicitModeFromUser = normalizeUserExecutionMode(runtime.getExplicitExecutionModeHint?.() || null);
const requestedMode = explicitModeFromUser || normalizeConfigExecutionMode(subagentContext.config.execution_mode) || 'auto';
const probeEnabled = subagentContext.config.capability_probe;
const supports = {
subagent: false,
agentTeam: false,
};
if (probeEnabled) {
supports.subagent = runtime.canLaunchSubagents?.() === true;
supports.agentTeam = runtime.canLaunchAgentTeams?.() === true;
}
let resolvedMode = requestedMode;
if (requestedMode === 'auto') {
if (supports.agentTeam) resolvedMode = 'agent-team';
else if (supports.subagent) resolvedMode = 'subagent';
else resolvedMode = 'sequential';
} else if (probeEnabled && requestedMode === 'agent-team' && !supports.agentTeam) {
resolvedMode = supports.subagent ? 'subagent' : 'sequential';
} else if (probeEnabled && requestedMode === 'subagent' && !supports.subagent) {
resolvedMode = 'sequential';
}
subagentContext.execution = {
requestedMode,
resolvedMode,
probeEnabled,
supports,
};
```
Resolution precedence:
1. Explicit user request in this run (`agent team` => `agent-team`; `subagent` => `subagent`; `sequential`; `auto`)
2. `tea_execution_mode` from config
3. Runtime capability fallback (when probing enabled)
If probing is disabled, honor the requested mode strictly. If that mode cannot be executed at runtime, fail with explicit error instead of silent fallback.
---
### 3. Dispatch 4 Quality Workers
**Subagent A: Determinism**
- File: `./step-03a-subagent-determinism.md`
- Output: `/tmp/tea-test-review-determinism-${timestamp}.json`
- Execution:
- `agent-team` or `subagent`: launch non-blocking
- `sequential`: run blocking and wait
- Status: Running... ⟳
**Subagent B: Isolation**
- File: `./step-03b-subagent-isolation.md`
- Output: `/tmp/tea-test-review-isolation-${timestamp}.json`
- Status: Running... ⟳
**Subagent C: Maintainability**
- File: `./step-03c-subagent-maintainability.md`
- Output: `/tmp/tea-test-review-maintainability-${timestamp}.json`
- Status: Running... ⟳
**Subagent D: Performance**
- File: `./step-03e-subagent-performance.md`
- Output: `/tmp/tea-test-review-performance-${timestamp}.json`
- Status: Running... ⟳
In `agent-team` and `subagent` modes, runtime decides worker scheduling and concurrency.
---
### 4. Wait for Expected Worker Completion
**If `resolvedMode` is `agent-team` or `subagent`:**
```
⏳ Waiting for 4 quality subagents to complete...
✅ All 4 quality subagents completed successfully!
```
**If `resolvedMode` is `sequential`:**
```
✅ Sequential mode: each worker already completed during dispatch.
```
---
### 5. Verify All Outputs Exist
```javascript
const fs = require('node:fs');
const outputs = ['determinism', 'isolation', 'maintainability', 'performance'].map(
(dim) => `/tmp/tea-test-review-${dim}-${timestamp}.json`,
);
outputs.forEach((output) => {
if (!fs.existsSync(output)) {
throw new Error(`Subagent output missing: ${output}`);
}
});
```
---
### 6. Execution Report
```
🚀 Performance Report:
- Execution Mode: {resolvedMode}
- Total Elapsed: (mode-dependent)
- Parallel Gain: ~60-70% faster in subagent/agent-team mode than sequential
```
---
### 7. Proceed to Aggregation
Pass the same `timestamp` value to Step 3F (do not regenerate it). Step 3F must read the exact temp files written in this step.
Load next step: `{nextStepFile}`
The aggregation step (3F) will:
- Read all 4 subagent outputs
- Calculate weighted overall score (0-100)
- Aggregate violations by severity
- Generate review report with top suggestions
---
## EXIT CONDITION
Proceed to Step 3F when:
- ✅ All 4 subagents completed successfully
- ✅ All output files exist and are valid JSON
- ✅ Execution metrics displayed
**Do NOT proceed if any subagent failed.**
---
## 🚨 SYSTEM SUCCESS METRICS
### ✅ SUCCESS:
- All 4 subagents launched and completed
- All required worker steps completed
- Output files generated and valid
- Fallback behavior respected configuration and capability probe rules
### ❌ FAILURE:
- One or more subagents failed
- Output files missing or invalid
- Unsupported requested mode with probing disabled
**Master Rule:** Deterministic mode selection + stable output contract. Use the best supported mode, then aggregate normally.


@@ -0,0 +1,214 @@
---
name: 'step-03a-subagent-determinism'
description: 'Subagent: Check test determinism (no random/time dependencies)'
subagent: true
outputFile: '/tmp/tea-test-review-determinism-{{timestamp}}.json'
---
# Subagent 3A: Determinism Quality Check
## SUBAGENT CONTEXT
This is an **isolated subagent** running in parallel with other quality dimension checks.
**What you have from parent workflow:**
- Test files discovered in Step 2
- Knowledge fragment: test-quality (determinism criteria)
- Config: test framework
**Your task:** Analyze test files for DETERMINISM violations only.
---
## MANDATORY EXECUTION RULES
- 📖 Read this entire subagent file before acting
- ✅ Check DETERMINISM only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check isolation, maintainability, coverage, or performance (other subagents)
- ❌ Do NOT modify test files (read-only analysis)
- ❌ Do NOT run tests (just analyze code)
---
## SUBAGENT TASK
### 1. Identify Determinism Violations
**Scan test files for non-deterministic patterns:**
**HIGH SEVERITY Violations**:
- `Math.random()` - Random number generation
- `Date.now()` or `new Date()` without mocking
- `setTimeout` / `setInterval` without proper waits
- External API calls without mocking
- File system operations on random paths
- Database queries with non-deterministic ordering
**MEDIUM SEVERITY Violations**:
- `page.waitForTimeout(N)` - Hard waits instead of conditions
- Flaky selectors (CSS classes that may change)
- Race conditions (missing proper synchronization)
- Test order dependencies (test A must run before test B)
**LOW SEVERITY Violations**:
- Missing test isolation (shared state between tests)
- Console timestamps without fixed timezone
### 2. Analyze Each Test File
For each test file from Step 2:
```javascript
const violations = [];
// Helper (assumed): 1-based line number of the first occurrence, or null
const findLineNumber = (pattern) => {
  const idx = testFileContent.split('\n').findIndex((l) => l.includes(pattern));
  return idx === -1 ? null : idx + 1;
};
// Check for Math.random()
if (testFileContent.includes('Math.random()')) {
violations.push({
file: testFile,
line: findLineNumber('Math.random()'),
severity: 'HIGH',
category: 'random-generation',
description: 'Test uses Math.random() - non-deterministic',
suggestion: 'Use faker.seed(12345) for deterministic random data',
});
}
// Check for Date.now()
if (testFileContent.includes('Date.now()') || testFileContent.includes('new Date()')) {
violations.push({
file: testFile,
line: findLineNumber('Date.now()'),
severity: 'HIGH',
category: 'time-dependency',
description: 'Test uses Date.now() or new Date() without mocking',
suggestion: 'Mock system time with test.useFakeTimers() or use fixed timestamps',
});
}
// Check for hard waits
if (testFileContent.includes('waitForTimeout')) {
violations.push({
file: testFile,
line: findLineNumber('waitForTimeout'),
severity: 'MEDIUM',
category: 'hard-wait',
description: 'Test uses waitForTimeout - creates flakiness',
suggestion: 'Replace with expect(locator).toBeVisible() or waitForResponse',
});
}
// ... check other patterns
```
### 3. Calculate Determinism Score
**Scoring Logic**:
```javascript
const totalChecks = testFiles.length * checksPerFile;
const failedChecks = violations.length;
const passedChecks = totalChecks - failedChecks;
// Weight violations by severity
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
// Score: 100 - (penalty points)
const score = Math.max(0, 100 - totalPenalty);
```
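The output format below also reports a letter `grade`, but no score-to-grade mapping is defined in this step; one plausible mapping (the thresholds are assumptions) is:

```javascript
// Assumed score-to-grade thresholds; adjust to project conventions.
function gradeFromScore(score) {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
}
```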
---
## OUTPUT FORMAT
Write JSON to temp file: `/tmp/tea-test-review-determinism-{{timestamp}}.json`
```json
{
"dimension": "determinism",
"score": 85,
"max_score": 100,
"grade": "B",
"violations": [
{
"file": "tests/api/user.spec.ts",
"line": 42,
"severity": "HIGH",
"category": "random-generation",
"description": "Test uses Math.random() - non-deterministic",
"suggestion": "Use faker.seed(12345) for deterministic random data",
"code_snippet": "const userId = Math.random() * 1000;"
},
{
"file": "tests/e2e/checkout.spec.ts",
"line": 78,
"severity": "MEDIUM",
"category": "hard-wait",
"description": "Test uses waitForTimeout - creates flakiness",
"suggestion": "Replace with expect(locator).toBeVisible()",
"code_snippet": "await page.waitForTimeout(5000);"
}
],
"passed_checks": 12,
"failed_checks": 3,
"total_checks": 15,
"violation_summary": {
"HIGH": 1,
"MEDIUM": 1,
"LOW": 1
},
"recommendations": [
"Use faker with fixed seed for all random data",
"Replace all waitForTimeout with conditional waits",
"Mock Date.now() in tests that use current time"
],
"summary": "Tests are mostly deterministic with 3 violations (1 HIGH, 1 MEDIUM, 1 LOW)"
}
```
**On Error:**
```json
{
"dimension": "determinism",
"success": false,
"error": "Error message describing what went wrong"
}
```
---
## EXIT CONDITION
Subagent completes when:
- ✅ All test files analyzed for determinism violations
- ✅ Score calculated (0-100)
- ✅ Violations categorized by severity
- ✅ Recommendations generated
- ✅ JSON output written to temp file
**Subagent terminates here.** Parent workflow will read output and aggregate with other quality dimensions.
---
## 🚨 SUBAGENT SUCCESS METRICS
### ✅ SUCCESS:
- All test files scanned for determinism violations
- Score calculated with proper severity weighting
- JSON output valid and complete
- Only determinism checked (not other dimensions)
### ❌ FAILURE:
- Checked quality dimensions other than determinism
- Invalid or missing JSON output
- Score calculation incorrect
- Modified test files (should be read-only)


@@ -0,0 +1,125 @@
---
name: 'step-03b-subagent-isolation'
description: 'Subagent: Check test isolation (no shared state/dependencies)'
subagent: true
outputFile: '/tmp/tea-test-review-isolation-{{timestamp}}.json'
---
# Subagent 3B: Isolation Quality Check
## SUBAGENT CONTEXT
This is an **isolated subagent** running in parallel with other quality dimension checks.
**Your task:** Analyze test files for ISOLATION violations only.
---
## MANDATORY EXECUTION RULES
- ✅ Check ISOLATION only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check determinism, maintainability, coverage, or performance
- ❌ Do NOT modify test files (read-only analysis)
---
## SUBAGENT TASK
### 1. Identify Isolation Violations
**Scan test files for isolation issues:**
**HIGH SEVERITY Violations**:
- Global state mutations (global variables modified)
- Test order dependencies (test B depends on test A running first)
- Shared database records without cleanup
- beforeAll/afterAll with side effects leaking to other tests
**MEDIUM SEVERITY Violations**:
- Missing test cleanup (created data not deleted)
- Shared fixtures that mutate state
- Tests that assume specific execution order
- Environment variables modified without restoration
**LOW SEVERITY Violations**:
- Tests sharing test data (but not mutating)
- Missing test.describe grouping
- Tests that could be more isolated
### 2. Calculate Isolation Score
```javascript
const totalChecks = testFiles.length * checksPerFile;
const failedChecks = violations.length;
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
const score = Math.max(0, 100 - totalPenalty);
```
---
## OUTPUT FORMAT
```json
{
"dimension": "isolation",
"score": 90,
"max_score": 100,
"grade": "A-",
"violations": [
{
"file": "tests/api/integration.spec.ts",
"line": 15,
"severity": "HIGH",
"category": "test-order-dependency",
"description": "Test depends on previous test creating user record",
"suggestion": "Each test should create its own test data in beforeEach",
"code_snippet": "test('should update user', async () => { /* assumes user exists */ });"
}
],
"passed_checks": 14,
"failed_checks": 1,
"total_checks": 15,
"violation_summary": {
"HIGH": 1,
"MEDIUM": 0,
"LOW": 0
},
"recommendations": [
"Add beforeEach hooks to create test data",
"Add afterEach hooks to cleanup created records",
"Use test.describe.configure({ mode: 'parallel' }) to enforce isolation"
],
"summary": "Tests are well isolated with 1 HIGH severity violation"
}
```
---
## EXIT CONDITION
Subagent completes when:
- ✅ All test files analyzed for isolation violations
- ✅ Score calculated
- ✅ JSON output written to temp file
**Subagent terminates here.**
---
## 🚨 SUBAGENT SUCCESS METRICS
### ✅ SUCCESS:
- Only isolation checked (not other dimensions)
- JSON output valid and complete
### ❌ FAILURE:
- Checked quality dimensions other than isolation
- Invalid or missing JSON output


@@ -0,0 +1,102 @@
---
name: 'step-03c-subagent-maintainability'
description: 'Subagent: Check test maintainability (readability, structure, DRY)'
subagent: true
outputFile: '/tmp/tea-test-review-maintainability-{{timestamp}}.json'
---
# Subagent 3C: Maintainability Quality Check
## SUBAGENT CONTEXT
This is an **isolated subagent** running in parallel with other quality dimension checks.
**Your task:** Analyze test files for MAINTAINABILITY violations only.
---
## MANDATORY EXECUTION RULES
- ✅ Check MAINTAINABILITY only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check determinism, isolation, coverage, or performance
---
## SUBAGENT TASK
### 1. Identify Maintainability Violations
**HIGH SEVERITY Violations**:
- Tests >100 lines (too complex)
- No test.describe grouping
- Duplicate test logic (copy-paste)
- Unclear test names (no Given/When/Then structure)
- Magic numbers/strings without constants
**MEDIUM SEVERITY Violations**:
- Tests missing comments for complex logic
- Inconsistent naming conventions
- Excessive nesting (>3 levels)
- Large setup/teardown blocks
**LOW SEVERITY Violations**:
- Minor code style issues
- Could benefit from helper functions
- Inconsistent assertion styles
### 2. Calculate Maintainability Score
```javascript
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
const score = Math.max(0, 100 - totalPenalty);
```
---
## OUTPUT FORMAT
```json
{
"dimension": "maintainability",
"score": 75,
"max_score": 100,
"grade": "C",
"violations": [
{
"file": "tests/e2e/complex-flow.spec.ts",
"line": 1,
"severity": "HIGH",
"category": "test-too-long",
"description": "Test file is 250 lines - too complex to maintain",
"suggestion": "Split into multiple smaller test files by feature area",
"code_snippet": "test.describe('Complex flow', () => { /* 250 lines */ });"
}
],
"passed_checks": 10,
"failed_checks": 5,
"violation_summary": {
"HIGH": 2,
"MEDIUM": 2,
"LOW": 1
},
"recommendations": [
"Split large test files into smaller, focused files (<100 lines each)",
"Add test.describe grouping for related tests",
"Extract duplicate logic into helper functions"
],
"summary": "Tests have maintainability issues - 5 violations (2 HIGH)"
}
```
---
## EXIT CONDITION
Subagent completes when JSON output written to temp file.
**Subagent terminates here.**


@@ -0,0 +1,117 @@
---
name: 'step-03e-subagent-performance'
description: 'Subagent: Check test performance (speed, efficiency, parallelization)'
subagent: true
outputFile: '/tmp/tea-test-review-performance-{{timestamp}}.json'
---
# Subagent 3E: Performance Quality Check
## SUBAGENT CONTEXT
This is an **isolated subagent** running in parallel with other quality dimension checks.
**Your task:** Analyze test files for PERFORMANCE violations only.
---
## MANDATORY EXECUTION RULES
- ✅ Check PERFORMANCE only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check determinism, isolation, maintainability, or coverage
---
## SUBAGENT TASK
### 1. Identify Performance Violations
**HIGH SEVERITY Violations**:
- Tests not parallelizable (using test.describe.serial unnecessarily)
- Slow setup/teardown (creating fresh DB for every test)
- Excessive navigation (reloading pages unnecessarily)
- No fixture reuse (repeating expensive operations)
**MEDIUM SEVERITY Violations**:
- Hard waits >2 seconds (waitForTimeout(5000))
- Inefficient selectors (page.$$ instead of locators)
- Large data sets in tests without pagination
- Missing performance optimizations
**LOW SEVERITY Violations**:
- Could use parallelization (test.describe.configure({ mode: 'parallel' }))
- Minor inefficiencies
- Excessive logging
### 2. Calculate Performance Score
```javascript
// `violations` is the array of findings identified in section 1 above
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
const score = Math.max(0, 100 - totalPenalty);
```
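A minimal runnable sketch of the penalty formula above, with illustrative violations (not from a real review):

```javascript
// Penalty-based scoring: each violation subtracts its severity weight from 100, floored at 0
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };

const scoreViolations = (violations) => {
  const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
  return Math.max(0, 100 - totalPenalty); // floor so many violations cannot push the score negative
};

// One HIGH (10) + one MEDIUM (5) => penalty 15 => score 85
console.log(scoreViolations([{ severity: 'HIGH' }, { severity: 'MEDIUM' }]));
```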
---
## OUTPUT FORMAT
```json
{
"dimension": "performance",
"score": 80,
"max_score": 100,
"grade": "B",
"violations": [
{
"file": "tests/e2e/search.spec.ts",
"line": 10,
"severity": "HIGH",
"category": "not-parallelizable",
"description": "Tests use test.describe.serial unnecessarily - reduces parallel execution",
"suggestion": "Remove .serial unless tests truly share state",
"code_snippet": "test.describe.serial('Search tests', () => { ... });"
},
{
"file": "tests/api/bulk-operations.spec.ts",
"line": 35,
"severity": "MEDIUM",
"category": "slow-setup",
"description": "Test creates 1000 records in setup - very slow",
"suggestion": "Use smaller data sets or fixture factories",
"code_snippet": "beforeEach(async () => { for (let i=0; i<1000; i++) { ... } });"
}
],
"passed_checks": 13,
"failed_checks": 2,
"violation_summary": {
"HIGH": 1,
"MEDIUM": 1,
"LOW": 0
},
"performance_metrics": {
"parallelizable_tests": 80,
"serial_tests": 20,
"avg_test_duration_estimate": "~2 seconds",
"slow_tests": ["bulk-operations.spec.ts (>30s)"]
},
"recommendations": [
"Enable parallel mode where possible",
"Reduce setup data to minimum needed",
"Use fixtures to share expensive setup across tests",
"Remove unnecessary .serial constraints"
],
"summary": "Good performance with 2 violations - 80% tests can run in parallel"
}
```
---
## EXIT CONDITION
Subagent completes when the JSON output is written to the temp file.
**Subagent terminates here.**


@@ -0,0 +1,277 @@
---
name: 'step-03f-aggregate-scores'
description: 'Aggregate quality dimension scores into overall 0-100 score'
nextStepFile: './step-04-generate-report.md'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 3F: Aggregate Quality Scores
## STEP GOAL
Read outputs from 4 quality subagents, calculate weighted overall score (0-100), and aggregate violations for report generation.
---
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
- ✅ Read all 4 subagent outputs
- ✅ Calculate weighted overall score
- ✅ Aggregate violations by severity
- ❌ Do NOT re-evaluate quality (use subagent outputs)
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed
---
## MANDATORY SEQUENCE
### 1. Read All Subagent Outputs
```javascript
const fs = require('fs'); // Node core module, needed for reading the subagent temp files

// Use the SAME timestamp generated in Step 3 (do not regenerate).
const timestamp = subagentContext?.timestamp;
if (!timestamp) {
throw new Error('Missing timestamp from Step 3 context. Pass Step 3 timestamp into Step 3F.');
}
const dimensions = ['determinism', 'isolation', 'maintainability', 'performance'];
const results = {};
dimensions.forEach((dim) => {
const outputPath = `/tmp/tea-test-review-${dim}-${timestamp}.json`;
results[dim] = JSON.parse(fs.readFileSync(outputPath, 'utf8'));
});
```
**Verify all succeeded:**
```javascript
const allSucceeded = dimensions.every((dim) => results[dim].score !== undefined);
if (!allSucceeded) {
throw new Error('One or more quality subagents failed!');
}
```
---
### 2. Calculate Weighted Overall Score
**Dimension Weights** (based on TEA quality priorities):
```javascript
const weights = {
determinism: 0.3, // 30% - Reliability and flake prevention
isolation: 0.3, // 30% - Parallel safety and independence
maintainability: 0.25, // 25% - Readability and long-term health
performance: 0.15, // 15% - Speed and execution efficiency
};
```
**Calculate overall score:**
```javascript
const overallScore = dimensions.reduce((sum, dim) => {
return sum + results[dim].score * weights[dim];
}, 0);
const roundedScore = Math.round(overallScore);
```
**Determine grade:**
```javascript
const getGrade = (score) => {
if (score >= 90) return 'A';
if (score >= 80) return 'B';
if (score >= 70) return 'C';
if (score >= 60) return 'D';
return 'F';
};
const overallGrade = getGrade(roundedScore);
```
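As a worked check of the weighting and grading above (sample dimension scores, not from a real run):

```javascript
// Weighted aggregation with the TEA dimension weights, then letter grading
const weights = { determinism: 0.3, isolation: 0.3, maintainability: 0.25, performance: 0.15 };
const sampleScores = { determinism: 90, isolation: 80, maintainability: 75, performance: 80 };

const overall = Object.entries(weights).reduce(
  (sum, [dim, w]) => sum + sampleScores[dim] * w,
  0,
);

const getGrade = (score) => {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
};

// 90*0.30 + 80*0.30 + 75*0.25 + 80*0.15 = 27 + 24 + 18.75 + 12 = 81.75, rounds to 82 (B)
console.log(Math.round(overall), getGrade(Math.round(overall)));
```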
---
### 3. Aggregate Violations by Severity
**Collect all violations from all dimensions:**
```javascript
const allViolations = dimensions.flatMap((dim) =>
results[dim].violations.map((v) => ({
...v,
dimension: dim,
})),
);
// Group by severity
const highSeverity = allViolations.filter((v) => v.severity === 'HIGH');
const mediumSeverity = allViolations.filter((v) => v.severity === 'MEDIUM');
const lowSeverity = allViolations.filter((v) => v.severity === 'LOW');
const violationSummary = {
total: allViolations.length,
HIGH: highSeverity.length,
MEDIUM: mediumSeverity.length,
LOW: lowSeverity.length,
};
```
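The three `filter` passes above can equivalently be collapsed into one `reduce`; a worked example with illustrative violations:

```javascript
// Single-pass severity counting, equivalent to the filter-based grouping above (illustrative data)
const allViolations = [
  { severity: 'HIGH' },
  { severity: 'HIGH' },
  { severity: 'MEDIUM' },
  { severity: 'LOW' },
];

const violationSummary = allViolations.reduce(
  (acc, v) => ({ ...acc, [v.severity]: acc[v.severity] + 1 }),
  { total: allViolations.length, HIGH: 0, MEDIUM: 0, LOW: 0 },
);

console.log(violationSummary);
```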
---
### 4. Prioritize Recommendations
**Extract recommendations from all dimensions:**
```javascript
const allRecommendations = dimensions.flatMap((dim) =>
results[dim].recommendations.map((rec) => ({
dimension: dim,
recommendation: rec,
impact: results[dim].score < 70 ? 'HIGH' : 'MEDIUM',
})),
);
// Sort by impact (HIGH first) using a symmetric comparator, then keep the top 10
const prioritizedRecommendations = allRecommendations
  .sort((a, b) => Number(b.impact === 'HIGH') - Number(a.impact === 'HIGH'))
  .slice(0, 10);
```
---
### 5. Create Review Summary Object
**Aggregate all results:**
```javascript
// Maps the numeric score to a label; assumed thresholds mirroring getGrade above
const getQualityAssessment = (score) => {
  if (score >= 90) return 'Excellent';
  if (score >= 80) return 'Good';
  if (score >= 70) return 'Acceptable';
  if (score >= 60) return 'Needs Improvement';
  return 'Poor';
};

const reviewSummary = {
overall_score: roundedScore,
overall_grade: overallGrade,
quality_assessment: getQualityAssessment(roundedScore),
dimension_scores: {
determinism: results.determinism.score,
isolation: results.isolation.score,
maintainability: results.maintainability.score,
performance: results.performance.score,
},
dimension_grades: {
determinism: results.determinism.grade,
isolation: results.isolation.grade,
maintainability: results.maintainability.grade,
performance: results.performance.grade,
},
violations_summary: violationSummary,
all_violations: allViolations,
high_severity_violations: highSeverity,
top_10_recommendations: prioritizedRecommendations,
subagent_execution: 'PARALLEL (4 quality dimensions)',
performance_gain: '~60% faster than sequential',
};
// Save for Step 4 (report generation)
fs.writeFileSync(`/tmp/tea-test-review-summary-${timestamp}.json`, JSON.stringify(reviewSummary, null, 2), 'utf8');
```
---
### 6. Display Summary to User
```
✅ Quality Evaluation Complete (Parallel Execution)
📊 Overall Quality Score: {roundedScore}/100 (Grade: {overallGrade})
📈 Dimension Scores:
- Determinism: {determinism_score}/100 ({determinism_grade})
- Isolation: {isolation_score}/100 ({isolation_grade})
- Maintainability: {maintainability_score}/100 ({maintainability_grade})
- Performance: {performance_score}/100 ({performance_grade})
Coverage is excluded from `test-review` scoring. Use `trace` for coverage analysis and gates.
⚠️ Violations Found:
- HIGH: {high_count} violations
- MEDIUM: {medium_count} violations
- LOW: {low_count} violations
- TOTAL: {total_count} violations
🚀 Performance: Parallel execution ~60% faster than sequential
✅ Ready for report generation (Step 4)
```
---
### 7. Save Progress
**Save this step's accumulated work to `{outputFile}`.**
- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
```yaml
---
stepsCompleted: ['step-03f-aggregate-scores']
lastStep: 'step-03f-aggregate-scores'
lastSaved: '{date}'
---
```
Then write this step's output below the frontmatter.
- **If `{outputFile}` already exists**, update:
- Add `'step-03f-aggregate-scores'` to `stepsCompleted` array (only if not already present)
- Set `lastStep: 'step-03f-aggregate-scores'`
- Set `lastSaved: '{date}'`
- Append this step's output to the appropriate section of the document.
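The frontmatter update above can be sketched as follows. The workflow does not mandate an implementation; `updateFrontmatter` is a hypothetical helper, and it assumes a simple `---`-delimited header containing only the three tracked keys (it is not a full YAML parser):

```javascript
// Update stepsCompleted/lastStep/lastSaved in a '---'-delimited frontmatter header
const updateFrontmatter = (doc, stepName, date) => {
  const [, header, body] = doc.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/) || [];
  if (header === undefined) throw new Error('No YAML frontmatter found');

  // Parse the existing stepsCompleted array (single-quoted entries)
  const steps = (header.match(/stepsCompleted:\s*\[(.*)\]/) || [, ''])[1]
    .split(',')
    .map((s) => s.trim().replace(/^'|'$/g, ''))
    .filter(Boolean);
  if (!steps.includes(stepName)) steps.push(stepName); // only add if not already present

  const newHeader = [
    `stepsCompleted: [${steps.map((s) => `'${s}'`).join(', ')}]`,
    `lastStep: '${stepName}'`,
    `lastSaved: '${date}'`,
  ].join('\n');
  return `---\n${newHeader}\n---\n${body}`;
};
```

Re-running the helper with the same step name is a no-op, which matches the "only if not already present" rule above.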
---
## EXIT CONDITION
Proceed to Step 4 when:
- ✅ All subagent outputs read successfully
- ✅ Overall score calculated
- ✅ Violations aggregated
- ✅ Recommendations prioritized
- ✅ Summary saved to temp file
- ✅ Output displayed to user
- ✅ Progress saved to output document
Load next step: `{nextStepFile}`
---
## 🚨 SYSTEM SUCCESS METRICS
### ✅ SUCCESS:
- All 4 subagent outputs read and parsed
- Overall score calculated with proper weights
- Violations aggregated correctly
- Summary complete and saved
### ❌ FAILURE:
- Failed to read one or more subagent outputs
- Score calculation incorrect
- Summary missing or incomplete
**Master Rule:** Aggregate determinism, isolation, maintainability, and performance only.


@@ -0,0 +1,111 @@
---
name: 'step-04-generate-report'
description: 'Create test-review report and validate'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 4: Generate Report & Validate
## STEP GOAL
Produce the test-review report and validate against checklist.
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: config, loaded artifacts, and knowledge fragments
- Focus: this step's goal only
- Limits: do not execute future steps
- Dependencies: prior steps' outputs (if any)
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.
## 1. Report Generation
Use `test-review-template.md` to produce `{outputFile}` including:
- Score summary
- Critical findings with fixes
- Warnings and recommendations
- Context references (story/test-design if available)
- Coverage boundary note: `test-review` does not score coverage. Direct coverage findings to `trace`.
---
## 2. Polish Output
Before finalizing, review the complete output document for quality:
1. **Remove duplication**: Progressive-append workflow may have created repeated sections — consolidate
2. **Verify consistency**: Ensure terminology, risk scores, and references are consistent throughout
3. **Check completeness**: All template sections should be populated or explicitly marked N/A
4. **Format cleanup**: Ensure markdown formatting is clean (tables aligned, headers consistent, no orphaned references)
---
## 3. Validation
Validate against `checklist.md` and fix any gaps.
- [ ] CLI sessions cleaned up (no orphaned browsers)
- [ ] Temp artifacts stored in `{test_artifacts}/` not random locations
---
## 4. Save Progress
**Save this step's accumulated work to `{outputFile}`.**
- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
```yaml
---
stepsCompleted: ['step-04-generate-report']
lastStep: 'step-04-generate-report'
lastSaved: '{date}'
---
```
Then write this step's output below the frontmatter.
- **If `{outputFile}` already exists**, update:
- Add `'step-04-generate-report'` to `stepsCompleted` array (only if not already present)
- Set `lastStep: 'step-04-generate-report'`
- Set `lastSaved: '{date}'`
- Append this step's output to the appropriate section of the document.
---
## 5. Completion Summary
Report:
- Scope reviewed
- Overall score
- Critical blockers
- Next recommended workflow (e.g., `automate` or `trace`)
## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS:
- Step completed in full with required outputs
### ❌ SYSTEM FAILURE:
- Skipped sequence steps or missing outputs
**Master Rule:** Skipping steps is FORBIDDEN.