initial commit

2026-03-16 19:54:53 -04:00
commit bfe0e01254
3341 changed files with 483939 additions and 0 deletions

@@ -0,0 +1,6 @@
---
name: bmad-testarch-test-review
description: 'Review test quality using best practices validation. Use when user says "lets review tests" or "I want to evaluate test quality"'
---
Follow the instructions in [workflow.md](workflow.md).

@@ -0,0 +1 @@
type: skill

@@ -0,0 +1,475 @@
# Test Quality Review - Validation Checklist
Use this checklist to validate that the test quality review workflow completed successfully and all quality criteria were properly evaluated.
---
## Prerequisites
Note: `test-review` is optional and only audits existing tests; it does not generate tests.
Coverage analysis is out of scope for this workflow. Use `trace` for coverage metrics and coverage gate decisions.
### Test File Discovery
- [ ] Test file(s) identified for review (single/directory/suite scope)
- [ ] Test files exist and are readable
- [ ] Test framework detected (Playwright, Jest, Cypress, Vitest, etc.)
- [ ] Test framework configuration found (playwright.config.ts, jest.config.js, etc.)
### Knowledge Base Loading
- [ ] tea-index.csv loaded successfully
- [ ] `test-quality.md` loaded (Definition of Done)
- [ ] `fixture-architecture.md` loaded (Pure function → Fixture patterns)
- [ ] `network-first.md` loaded (Route intercept before navigate)
- [ ] `data-factories.md` loaded (Factory patterns)
- [ ] `test-levels-framework.md` loaded (E2E vs API vs Component vs Unit)
- [ ] All other enabled fragments loaded successfully
### Context Gathering
- [ ] Story file discovered or explicitly provided (if available)
- [ ] Test design document discovered or explicitly provided (if available)
- [ ] Acceptance criteria extracted from story (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted from test-design (if available)
---
## Process Steps
### Step 1: Context Loading
- [ ] Review scope determined (single/directory/suite)
- [ ] Test file paths collected
- [ ] Related artifacts discovered (story, test-design)
- [ ] Knowledge base fragments loaded successfully
- [ ] Quality criteria flags read from workflow variables
### Step 2: Test File Parsing
**For Each Test File:**
- [ ] File read successfully
- [ ] File size measured (lines, KB)
- [ ] File structure parsed (describe blocks, it blocks)
- [ ] Test IDs extracted (if present)
- [ ] Priority markers extracted (if present)
- [ ] Imports analyzed
- [ ] Dependencies identified
**Test Structure Analysis:**
- [ ] Describe block count calculated
- [ ] It/test block count calculated
- [ ] BDD structure identified (Given-When-Then)
- [ ] Fixture usage detected
- [ ] Data factory usage detected
- [ ] Network interception patterns identified
- [ ] Assertions counted
- [ ] Waits and timeouts cataloged
- [ ] Conditionals (if/else) detected
- [ ] Try/catch blocks detected
- [ ] Shared state or globals detected
### Step 3: Quality Criteria Validation
Coverage criteria are intentionally excluded from this checklist.
**For Each Enabled Criterion:**
#### BDD Format (if `check_given_when_then: true`)
- [ ] Given-When-Then structure evaluated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers
- [ ] Examples of good/bad patterns noted
#### Test IDs (if `check_test_ids: true`)
- [ ] Test ID presence validated
- [ ] Test ID format checked (e.g., 1.3-E2E-001)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing IDs cataloged
#### Priority Markers (if `check_priority_markers: true`)
- [ ] P0/P1/P2/P3 classification validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Missing priorities cataloged
#### Hard Waits (if `check_hard_waits: true`)
- [ ] sleep(), waitForTimeout(), hardcoded delays detected
- [ ] Justification comments checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with line numbers and recommended fixes
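A minimal sketch of how the hard-wait scan above could be approximated with a line-based regex pass. The helper name `findHardWaits`, the pattern list, and the rule that a justification is a comment on the same or the preceding line are assumptions for illustration; the workflow does not prescribe them.

```javascript
// Hypothetical helper: flags likely hard waits and reports 1-based line numbers.
const HARD_WAIT_PATTERNS = [
  /\bwaitForTimeout\s*\(/, // e.g. page.waitForTimeout(5000)
  /\bsleep\s*\(/, // custom sleep helpers
  /\bsetTimeout\s*\(/, // hardcoded delays wrapped in promises
];

function findHardWaits(source) {
  const lines = source.split('\n');
  const violations = [];
  lines.forEach((line, index) => {
    // Strip trailing comment text so commented-out calls are not flagged.
    const code = line.replace(/\/\/.*$/, '');
    if (!HARD_WAIT_PATTERNS.some((pattern) => pattern.test(code))) return;
    // Assumption: a justification is a comment on the same line or the line above.
    const previous = index > 0 ? lines[index - 1].trim() : '';
    const justified = /\/\//.test(line) || previous.startsWith('//');
    violations.push({ line: index + 1, text: line.trim(), justified });
  });
  return violations;
}
```

A regex pass like this is only a first filter; whether a justification comment is legitimate still needs reviewer judgment.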
#### Determinism (if `check_determinism: true`)
- [ ] Conditionals (if/else/switch) detected
- [ ] Try/catch abuse detected
- [ ] Random values (Math.random, Date.now) detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Isolation (if `check_isolation: true`)
- [ ] Cleanup hooks (afterEach/afterAll) validated
- [ ] Shared state detected
- [ ] Global variable mutations detected
- [ ] Resource cleanup verified
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Fixture Patterns (if `check_fixture_patterns: true`)
- [ ] Fixtures detected (test.extend)
- [ ] Pure functions validated
- [ ] mergeTests usage checked
- [ ] beforeEach complexity analyzed
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Data Factories (if `check_data_factories: true`)
- [ ] Factory functions detected
- [ ] Hardcoded data (magic strings/numbers) detected
- [ ] Faker.js or similar usage validated
- [ ] API-first setup pattern checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Network-First (if `check_network_first: true`)
- [ ] page.route() before page.goto() validated
- [ ] Race conditions detected (route after navigate)
- [ ] waitForResponse patterns checked
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
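The ordering check above (`page.route()` before `page.goto()`) can be sketched as a simple source scan. The function name `findLateRoutes` is hypothetical, and the sketch assumes the source it receives is a single test body; a real check would need to split the file per test first.

```javascript
// Hypothetical check: every page.route() should be registered before the first page.goto().
function findLateRoutes(source) {
  const lines = source.split('\n');
  const firstGoto = lines.findIndex((line) => /\bpage\.goto\s*\(/.test(line));
  if (firstGoto === -1) return []; // no navigation, so nothing to race against
  return lines
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter(({ line, text }) => line > firstGoto + 1 && /\bpage\.route\s*\(/.test(text));
}
```

Each returned entry is a candidate race condition: the route is registered after navigation has already started.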
#### Assertions (if `check_assertions: true`)
- [ ] Explicit assertions counted
- [ ] Implicit waits without assertions detected
- [ ] Assertion specificity validated
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
#### Test Length (if `check_test_length: true`)
- [ ] File line count calculated
- [ ] Threshold comparison (≤300 lines ideal)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Splitting recommendations generated (if >300 lines)
#### Test Duration (if `check_test_duration: true`)
- [ ] Test complexity analyzed (as proxy for duration if no execution data)
- [ ] Threshold comparison (≤1.5 min target)
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Optimization recommendations generated
#### Flakiness Patterns (if `check_flakiness_patterns: true`)
- [ ] Tight timeouts detected (e.g., { timeout: 1000 })
- [ ] Race conditions detected
- [ ] Timing-dependent assertions detected
- [ ] Retry logic detected
- [ ] Environment-dependent assumptions detected
- [ ] Status assigned (PASS/WARN/FAIL)
- [ ] Violations recorded with recommended fixes
---
### Step 4: Quality Score Calculation
**Violation Counting:**
- [ ] Critical (P0) violations counted
- [ ] High (P1) violations counted
- [ ] Medium (P2) violations counted
- [ ] Low (P3) violations counted
- [ ] Violation breakdown by criterion recorded
**Score Calculation:**
- [ ] Starting score: 100
- [ ] Critical violations deducted (-10 each)
- [ ] High violations deducted (-5 each)
- [ ] Medium violations deducted (-2 each)
- [ ] Low violations deducted (-1 each)
- [ ] Bonus points added (max +30):
  - [ ] Excellent BDD structure (+5 if applicable)
  - [ ] Comprehensive fixtures (+5 if applicable)
  - [ ] Comprehensive data factories (+5 if applicable)
  - [ ] Network-first pattern (+5 if applicable)
  - [ ] Perfect isolation (+5 if applicable)
  - [ ] All test IDs present (+5 if applicable)
- [ ] Final score calculated: max(0, min(100, Starting - Violations + Bonus))
**Quality Grade:**
- [ ] Grade assigned based on score:
  - 90-100: A+ (Excellent)
  - 80-89: A (Good)
  - 70-79: B (Acceptable)
  - 60-69: C (Needs Improvement)
  - <60: F (Critical Issues)
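The scoring and grading rules above can be sketched as two small functions. The names `computeQualityScore` and `gradeFor`, and the shape of the `violations` object, are assumptions for illustration; the weights, the +30 bonus cap, the clamping formula, and the grade bands come from the checklist.

```javascript
// Deduction weights per severity, as listed in the checklist.
const DEDUCTIONS = { critical: 10, high: 5, medium: 2, low: 1 };
const MAX_BONUS = 30;

// violations: hypothetical shape { critical, high, medium, low } holding counts.
function computeQualityScore(violations, bonusPoints = 0) {
  const penalty = Object.entries(DEDUCTIONS).reduce(
    (sum, [severity, weight]) => sum + weight * (violations[severity] || 0),
    0,
  );
  const bonus = Math.min(MAX_BONUS, Math.max(0, bonusPoints));
  // max(0, min(100, Starting - Violations + Bonus)) with Starting = 100
  return Math.max(0, Math.min(100, 100 - penalty + bonus));
}

function gradeFor(score) {
  if (score >= 90) return 'A+';
  if (score >= 80) return 'A';
  if (score >= 70) return 'B';
  if (score >= 60) return 'C';
  return 'F';
}
```

For example, one critical, two high, and three medium violations deduct 26 points; with 10 bonus points the score is 84, which falls in the A band.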
---
### Step 5: Review Report Generation
**Report Sections Created:**
- [ ] **Header Section**:
  - [ ] Test file(s) reviewed listed
  - [ ] Review date recorded
  - [ ] Review scope noted (single/directory/suite)
  - [ ] Quality score and grade displayed
- [ ] **Executive Summary**:
  - [ ] Overall assessment (Excellent/Good/Needs Improvement/Critical)
  - [ ] Key strengths listed (3-5 bullet points)
  - [ ] Key weaknesses listed (3-5 bullet points)
  - [ ] Recommendation stated (Approve/Approve with comments/Request changes/Block)
- [ ] **Quality Criteria Assessment**:
  - [ ] Table with all criteria evaluated
  - [ ] Status for each criterion (PASS/WARN/FAIL)
  - [ ] Violation count per criterion
- [ ] **Critical Issues (Must Fix)**:
  - [ ] P0/P1 violations listed
  - [ ] Code location provided for each (file:line)
  - [ ] Issue explanation clear
  - [ ] Recommended fix provided with code example
  - [ ] Knowledge base reference provided
- [ ] **Recommendations (Should Fix)**:
  - [ ] P2/P3 violations listed
  - [ ] Code location provided for each (file:line)
  - [ ] Issue explanation clear
  - [ ] Recommended improvement provided with code example
  - [ ] Knowledge base reference provided
- [ ] **Best Practices Examples** (if good patterns found):
  - [ ] Good patterns highlighted from tests
  - [ ] Knowledge base fragments referenced
  - [ ] Examples provided for others to follow
- [ ] **Knowledge Base References**:
  - [ ] All fragments consulted listed
  - [ ] Links to detailed guidance provided
---
### Step 6: Optional Outputs Generation
**Inline Comments** (if `generate_inline_comments: true`):
- [ ] Inline comments generated at violation locations
- [ ] Comment format: `// TODO (TEA Review): [Issue] - See test-review-{filename}.md`
- [ ] Comments added to test files (no logic changes)
- [ ] Test files remain valid and executable
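A minimal sketch of how inline comments in the format above could be added without touching test logic. The helper names, the `{ line, issue }` violation shape, and the assumption that `{filename}` is the test file name without its final extension are all illustrative.

```javascript
// Hypothetical formatter for the inline review comment format described above.
function formatReviewComment(issue, testFileName) {
  return `// TODO (TEA Review): ${issue} - See test-review-${testFileName}.md`;
}

// Inserts one comment line above each violation; existing lines are never modified.
function addInlineComments(source, violations, testFileName) {
  const lines = source.split('\n');
  // Insert from the bottom up so earlier line numbers stay valid.
  [...violations]
    .sort((a, b) => b.line - a.line)
    .forEach(({ line, issue }) => {
      const indent = lines[line - 1].match(/^\s*/)[0]; // match surrounding indentation
      lines.splice(line - 1, 0, indent + formatReviewComment(issue, testFileName));
    });
  return lines.join('\n');
}
```

Because comments are only inserted as new lines, the test file should stay valid and executable, as the checklist requires.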
**Quality Badge** (if `generate_quality_badge: true`):
- [ ] Badge created with quality score (e.g., "Test Quality: 87/100 (A)")
- [ ] Badge format suitable for README or documentation
- [ ] Badge saved to output folder
**Story Update** (if `append_to_story: true` and story file exists):
- [ ] "Test Quality Review" section created
- [ ] Quality score included
- [ ] Critical issues summarized
- [ ] Link to full review report provided
- [ ] Story file updated successfully
---
### Step 7: Save and Notify
**Outputs Saved:**
- [ ] Review report saved to `{output_file}`
- [ ] Inline comments written to test files (if enabled)
- [ ] Quality badge saved (if enabled)
- [ ] Story file updated (if enabled)
- [ ] All outputs are valid and readable
**Summary Message Generated:**
- [ ] Quality score and grade included
- [ ] Critical issue count stated
- [ ] Recommendation provided (Approve/Approve with comments/Request changes/Block)
- [ ] Next steps clarified
- [ ] Message displayed to user
---
## Output Validation
### Review Report Completeness
- [ ] All required sections present
- [ ] No placeholder text or TODOs in report
- [ ] All code locations are accurate (file:line)
- [ ] All code examples are valid and demonstrate fix
- [ ] All knowledge base references are correct
### Review Report Accuracy
- [ ] Quality score matches violation breakdown
- [ ] Grade matches score range
- [ ] Violations correctly categorized by severity (P0/P1/P2/P3)
- [ ] Violations correctly attributed to quality criteria
- [ ] No false positives (violations are legitimate issues)
- [ ] No false negatives (critical issues not missed)
### Review Report Clarity
- [ ] Executive summary is clear and actionable
- [ ] Issue explanations are understandable
- [ ] Recommended fixes are implementable
- [ ] Code examples are correct and runnable
- [ ] Recommendation (Approve/Approve with comments/Request changes/Block) is clear
---
## Quality Checks
### Knowledge-Based Validation
- [ ] All feedback grounded in knowledge base fragments
- [ ] Recommendations follow proven patterns
- [ ] No arbitrary or opinion-based feedback
- [ ] Knowledge fragment references accurate and relevant
### Actionable Feedback
- [ ] Every issue includes recommended fix
- [ ] Every fix includes code example
- [ ] Code examples demonstrate correct pattern
- [ ] Fixes reference knowledge base for more detail
### Severity Classification
- [ ] Critical (P0) issues are genuinely critical (hard waits, race conditions, no assertions)
- [ ] High (P1) issues impact maintainability/reliability (missing IDs, hardcoded data)
- [ ] Medium (P2) issues are nice-to-have improvements (long files, missing priorities)
- [ ] Low (P3) issues are minor style/preference (verbose tests)
### Context Awareness
- [ ] Review considers project context (some patterns may be justified)
- [ ] Violations with justification comments noted as acceptable
- [ ] Edge cases acknowledged
- [ ] Recommendations are pragmatic, not dogmatic
---
## Integration Points
### Story File Integration
- [ ] Story file discovered correctly (if available)
- [ ] Acceptance criteria extracted and used for context
- [ ] Test quality section appended to story (if enabled)
- [ ] Link to review report added to story
### Test Design Integration
- [ ] Test design document discovered correctly (if available)
- [ ] Priority context (P0/P1/P2/P3) extracted and used
- [ ] Review validates tests align with prioritization
- [ ] Misalignment flagged (e.g., P0 scenario missing tests)
### Knowledge Base Integration
- [ ] tea-index.csv loaded successfully
- [ ] All required fragments loaded
- [ ] Fragments applied correctly to validation
- [ ] Fragment references in report are accurate
---
## Edge Cases and Special Situations
### Empty or Minimal Tests
- [ ] If test file is empty, report notes "No tests found"
- [ ] If test file has only boilerplate, report notes "No meaningful tests"
- [ ] Score reflects lack of content appropriately
### Legacy Tests
- [ ] Legacy tests acknowledged in context
- [ ] Review provides practical recommendations for improvement
- [ ] Recognizes that complete refactor may not be feasible
- [ ] Prioritizes critical issues (flakiness) over style
### Test Framework Variations
- [ ] Review adapts to test framework (Playwright vs Jest vs Cypress)
- [ ] Framework-specific patterns recognized (e.g., Playwright fixtures)
- [ ] Framework-specific violations detected (e.g., Cypress anti-patterns)
- [ ] Knowledge fragments applied appropriately for framework
### Justified Violations
- [ ] Violations with justification comments in code noted as acceptable
- [ ] Justifications evaluated for legitimacy
- [ ] Report acknowledges justified patterns
- [ ] Score not penalized for justified violations
---
## Final Validation
### Review Completeness
- [ ] All enabled quality criteria evaluated
- [ ] All test files in scope reviewed
- [ ] All violations cataloged
- [ ] All recommendations provided
- [ ] Review report is comprehensive
### Review Accuracy
- [ ] Quality score is accurate
- [ ] Violations are correct (no false positives)
- [ ] Critical issues not missed (no false negatives)
- [ ] Code locations are correct
- [ ] Knowledge base references are accurate
### Review Usefulness
- [ ] Feedback is actionable
- [ ] Recommendations are implementable
- [ ] Code examples are correct
- [ ] Review helps developer improve tests
- [ ] Review educates on best practices
### Workflow Complete
- [ ] All checklist items completed
- [ ] All outputs validated and saved
- [ ] User notified with summary
- [ ] Review ready for developer consumption
- [ ] Follow-up actions identified (if any)
---
## Notes
Record any issues, observations, or important context during workflow execution:
- **Test Framework**: [Playwright, Jest, Cypress, etc.]
- **Review Scope**: [single file, directory, full suite]
- **Quality Score**: [0-100 score, letter grade]
- **Critical Issues**: [Count of P0/P1 violations]
- **Recommendation**: [Approve / Approve with comments / Request changes / Block]
- **Special Considerations**: [Legacy code, justified patterns, edge cases]
- **Follow-up Actions**: [Re-review after fixes, pair programming, etc.]

@@ -0,0 +1,45 @@
# Test Quality Review
**Workflow:** `bmad-testarch-test-review`
**Version:** 5.0 (Step-File Architecture)
---
## Overview
Review test quality using the TEA knowledge base and produce a 0-100 quality score with actionable findings.
Coverage assessment is intentionally out of scope for this workflow. Use `trace` for requirements coverage and coverage gate decisions.
---
## WORKFLOW ARCHITECTURE
This workflow uses **step-file architecture**:
- **Micro-file Design**: Each step is self-contained
- **JIT Loading**: Only the current step file is in memory
- **Sequential Enforcement**: Execute steps in order
---
## INITIALIZATION SEQUENCE
### 1. Configuration Loading
From `workflow.yaml`, resolve:
- `config_source`, `test_artifacts`, `user_name`, `communication_language`, `document_output_language`, `date`
- `test_dir`, `review_scope`
### 2. First Step
Load, read completely, and execute:
`./steps-c/step-01-load-context.md`
### 3. Resume Support
If the user selects **Resume** mode, load, read completely, and execute:
`./steps-c/step-01b-resume.md`
This checks the output document for progress tracking frontmatter and routes to the next incomplete step.

@@ -0,0 +1,197 @@
---
name: 'step-01-load-context'
description: 'Load knowledge base, determine scope, and gather context'
nextStepFile: './step-02-discover-tests.md'
knowledgeIndex: '{project-root}/_bmad/tea/testarch/tea-index.csv'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 1: Load Context & Knowledge Base
## STEP GOAL
Determine review scope, load required knowledge fragments, and gather related artifacts.
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: config, loaded artifacts, and knowledge fragments
- Focus: this step's goal only
- Limits: do not execute future steps
- Dependencies: prior steps' outputs (if any)
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.
## 1. Determine Scope and Stack
Use `review_scope`:
- **single**: one file
- **directory**: all tests in folder
- **suite**: all tests in repo
If unclear, ask the user.
**Stack Detection** (for context-aware loading):
Read `test_stack_type` from `{config_source}`. If `"auto"` or not configured, infer `{detected_stack}` by scanning `{project-root}`:
- **Frontend indicators**: `playwright.config.*`, `cypress.config.*`, `package.json` with react/vue/angular
- **Backend indicators**: `pyproject.toml`, `pom.xml`/`build.gradle`, `go.mod`, `*.csproj`, `Gemfile`, `Cargo.toml`
- **Both present** → `fullstack`; only frontend → `frontend`; only backend → `backend`
- Explicit `test_stack_type` overrides auto-detection
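The detection rules above could be sketched as follows. The function name `detectStack`, its parameters, the mapping of "angular" to the `@angular/core` package, and the `null` result when no indicator matches are assumptions; the source only defines the indicator lists and the three outcomes.

```javascript
// Indicator lists taken from the rules above; regexes match file names at the project root.
const FRONTEND_FILES = [/^playwright\.config\./, /^cypress\.config\./];
const FRONTEND_DEPS = ['react', 'vue', '@angular/core']; // assumption: package names
const BACKEND_FILES = [
  /^pyproject\.toml$/, /^pom\.xml$/, /^build\.gradle/, /^go\.mod$/,
  /\.csproj$/, /^Gemfile$/, /^Cargo\.toml$/,
];

function detectStack(configuredType, rootFileNames, packageJson = {}) {
  // An explicit test_stack_type overrides auto-detection.
  if (configuredType && configuredType !== 'auto') return configuredType;
  const deps = { ...packageJson.dependencies, ...packageJson.devDependencies };
  const hasFrontend =
    rootFileNames.some((name) => FRONTEND_FILES.some((re) => re.test(name))) ||
    FRONTEND_DEPS.some((dep) => dep in deps);
  const hasBackend = rootFileNames.some((name) => BACKEND_FILES.some((re) => re.test(name)));
  if (hasFrontend && hasBackend) return 'fullstack';
  if (hasFrontend) return 'frontend';
  if (hasBackend) return 'backend';
  return null; // assumption: the rules do not say what to do when nothing matches
}
```

Usage: `detectStack('auto', ['playwright.config.ts', 'go.mod'])` yields `fullstack` under these assumptions.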
---
### Tiered Knowledge Loading
Load fragments based on their `tier` classification in `tea-index.csv`:
1. **Core tier** (always load): Foundational fragments required for this workflow
2. **Extended tier** (load on-demand): Load when deeper analysis is needed or when the user's context requires it
3. **Specialized tier** (load only when relevant): Load only when the specific use case matches (e.g., contract-testing only for microservices, email-auth only for email flows)
> **Context Efficiency**: Loading only core fragments reduces context usage by 40-50% compared to loading all fragments.
### Playwright Utils Loading Profiles
**If `tea_use_playwright_utils` is enabled**, select the appropriate loading profile:
- **API-only profile** (when `{detected_stack}` is `backend` or no `page.goto`/`page.locator` found in test files):
Load: `overview`, `api-request`, `auth-session`, `recurse` (~1,800 lines)
- **Full UI+API profile** (when `{detected_stack}` is `frontend`/`fullstack` or browser tests detected):
Load: all Playwright Utils core fragments (~4,500 lines)
**Detection**: Scan `{test_dir}` for files containing `page.goto` or `page.locator`. If none found, use API-only profile.
### Pact.js Utils Loading
**If `tea_use_pactjs_utils` is enabled** (and contract tests detected in review scope):
Load: `pactjs-utils-overview.md`, `pactjs-utils-provider-verifier.md`, `pactjs-utils-request-filter.md` (the 3 most relevant for reviewing provider verification tests)
**If `tea_use_pactjs_utils` is disabled** but contract tests are in review scope:
Load: `contract-testing.md`
### Pact MCP Loading
**If `tea_pact_mcp` is `"mcp"`:**
Load: `pact-mcp.md` — enables agent to use SmartBear MCP "Review Pact Tests" tool for automated best-practice feedback during test review.
## 2. Load Knowledge Base
Read `{config_source}` and check `tea_use_playwright_utils`, `tea_use_pactjs_utils`, `tea_pact_mcp`, and `tea_browser_automation` to select the correct fragment set.
Then load the following fragments from `{knowledgeIndex}`:
**Core:**
- `test-quality.md`
- `data-factories.md`
- `test-levels-framework.md`
- `selective-testing.md`
- `test-healing-patterns.md`
- `selector-resilience.md`
- `timing-debugging.md`
**If Playwright Utils enabled:**
- `overview.md`, `api-request.md`, `network-recorder.md`, `auth-session.md`, `intercept-network-call.md`, `recurse.md`, `log.md`, `file-utils.md`, `burn-in.md`, `network-error-monitor.md`, `fixtures-composition.md`
**If Playwright Utils disabled:**
- `fixture-architecture.md`
- `network-first.md`
- `playwright-config.md`
- `component-tdd.md`
- `ci-burn-in.md`
**Playwright CLI (if `tea_browser_automation` is "cli" or "auto"):**
- `playwright-cli.md`
**MCP Patterns (if `tea_browser_automation` is "mcp" or "auto"):**
- (existing MCP-related fragments, if any are added in future)
**Pact.js Utils (if enabled and contract tests in review scope):**
- `pactjs-utils-overview.md`, `pactjs-utils-provider-verifier.md`, `pactjs-utils-request-filter.md`
**Contract Testing (if pactjs-utils disabled but contract tests in review scope):**
- `contract-testing.md`
**Pact MCP (if tea_pact_mcp is "mcp"):**
- `pact-mcp.md`
---
## 3. Gather Context Artifacts
If available:
- Story file (acceptance criteria)
- Test design doc (priorities)
- Framework config
Summarize what was found.
Coverage mapping and coverage gates are out of scope in `test-review`. Route those concerns to `trace`.
---
## 4. Save Progress
**Save this step's accumulated work to `{outputFile}`.**
- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
```yaml
---
stepsCompleted: ['step-01-load-context']
lastStep: 'step-01-load-context'
lastSaved: '{date}'
---
```
Then write this step's output below the frontmatter.
- **If `{outputFile}` already exists**, update:
- Add `'step-01-load-context'` to `stepsCompleted` array (only if not already present)
- Set `lastStep: 'step-01-load-context'`
- Set `lastSaved: '{date}'`
- Append this step's output to the appropriate section of the document.
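The frontmatter update rules above can be sketched as a pure function over the parsed frontmatter. The name `recordStepCompleted` and the plain-object representation are assumptions; parsing and writing the YAML itself is left out.

```javascript
// Hypothetical helper: returns updated progress frontmatter without mutating the input.
function recordStepCompleted(frontmatter, stepName, date) {
  const stepsCompleted = frontmatter.stepsCompleted || [];
  return {
    ...frontmatter,
    // Add the step only if it is not already present.
    stepsCompleted: stepsCompleted.includes(stepName)
      ? stepsCompleted
      : [...stepsCompleted, stepName],
    lastStep: stepName,
    lastSaved: date,
  };
}
```

Calling it twice with the same step leaves `stepsCompleted` unchanged, which keeps re-saves idempotent.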
**Update `inputDocuments`**: Set `inputDocuments` in the output template frontmatter to the list of artifact paths loaded in this step (e.g., knowledge fragments, test design documents, configuration files).
Load next step: `{nextStepFile}`
## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS:
- Step completed in full with required outputs
### ❌ SYSTEM FAILURE:
- Skipped sequence steps or missing outputs
**Master Rule:** Skipping steps is FORBIDDEN.

@@ -0,0 +1,104 @@
---
name: 'step-01b-resume'
description: 'Resume interrupted workflow from last completed step'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 1b: Resume Workflow
## STEP GOAL
Resume an interrupted workflow by loading the existing output document, displaying progress, and routing to the next incomplete step.
## MANDATORY EXECUTION RULES
- Read the entire step file before acting
- Speak in `{communication_language}`
---
## EXECUTION PROTOCOLS:
- Follow the MANDATORY SEQUENCE exactly
- Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: Output document with progress frontmatter
- Focus: Load progress and route to next step
- Limits: Do not re-execute completed steps
- Dependencies: Output document must exist from a previous run
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly.
### 1. Load Output Document
Read `{outputFile}` and parse YAML frontmatter for:
- `stepsCompleted` -- array of completed step names
- `lastStep` -- last completed step name
- `lastSaved` -- timestamp of last save
**If `{outputFile}` does not exist**, display:
"No previous progress found. There is no output document to resume from. Please use **[C] Create** to start a fresh workflow run."
**THEN:** Halt. Do not proceed.
---
### 2. Display Progress Dashboard
Display progress with checkmark/empty indicators:
```
Test Quality Review - Resume Progress:
1. Load Context (step-01-load-context) [completed/pending]
2. Discover Tests (step-02-discover-tests) [completed/pending]
3. Quality Evaluation + Aggregate (step-03f-aggregate-scores) [completed/pending]
4. Generate Report (step-04-generate-report) [completed/pending]
Last saved: {lastSaved}
```
---
### 3. Route to Next Step
Based on `lastStep`, load the next incomplete step:
| lastStep | Next Step File |
| --------------------------- | --------------------------------- |
| `step-01-load-context` | `./step-02-discover-tests.md` |
| `step-02-discover-tests` | `./step-03-quality-evaluation.md` |
| `step-03f-aggregate-scores` | `./step-04-generate-report.md` |
| `step-04-generate-report` | **Workflow already complete.** |
**If `lastStep` is the final step** (`step-04-generate-report`), display: "All steps completed. Use **[C] Create** to start fresh, **[V] Validate** to review outputs, or **[E] Edit** to make revisions." Then halt.
**If `lastStep` does not match any value above**, display: "Unknown progress state (`lastStep`: {lastStep}). Please use **[C] Create** to start fresh." Then halt.
**Otherwise**, load the identified step file, read completely, and execute.
The existing content in `{outputFile}` provides context from previously completed steps.
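The routing table and its two halt conditions can be sketched as a lookup. The function name `routeResume` and the shape of the returned object are assumptions for illustration; the step names and file paths are those in the table above.

```javascript
// Routing table from the step above: last completed step -> next step file.
const NEXT_STEP_BY_LAST_STEP = new Map([
  ['step-01-load-context', './step-02-discover-tests.md'],
  ['step-02-discover-tests', './step-03-quality-evaluation.md'],
  ['step-03f-aggregate-scores', './step-04-generate-report.md'],
]);
const FINAL_STEP = 'step-04-generate-report';

function routeResume(lastStep) {
  if (lastStep === FINAL_STEP) return { action: 'halt', reason: 'complete' };
  const nextStepFile = NEXT_STEP_BY_LAST_STEP.get(lastStep);
  if (!nextStepFile) return { action: 'halt', reason: 'unknown' }; // unknown progress state
  return { action: 'load', nextStepFile };
}
```

A `Map` is used rather than a plain object so that unexpected `lastStep` values such as `constructor` cannot match inherited properties.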
---
## SYSTEM SUCCESS/FAILURE METRICS
### SUCCESS:
- Output document loaded and parsed correctly
- Progress dashboard displayed accurately
- Routed to correct next step
### FAILURE:
- Not loading output document
- Incorrect progress display
- Routing to wrong step
**Master Rule:** Resume MUST route to the exact next incomplete step. Never re-execute completed steps.

@@ -0,0 +1,113 @@
---
name: 'step-02-discover-tests'
description: 'Find and parse test files'
nextStepFile: './step-03-quality-evaluation.md'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 2: Discover & Parse Tests
## STEP GOAL
Collect test files in scope and parse structure/metadata.
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: config, loaded artifacts, and knowledge fragments
- Focus: this step's goal only
- Limits: do not execute future steps
- Dependencies: prior steps' outputs (if any)
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.
## 1. Discover Test Files
- **single**: use provided file path
- **directory**: glob under `{test_dir}` or selected folder
- **suite**: glob all tests in repo
Halt if no tests are found.
---
## 2. Parse Metadata (per file)
Collect:
- File size and line count
- Test framework detected
- Describe/test block counts
- Test IDs and priority markers
- Imports, fixtures, factories, network interception
- Waits/timeouts and control flow (if/try/catch)
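A rough sketch of per-file metadata extraction using regular expressions. The function name `parseTestMetadata`, the returned field names, and the exact patterns are assumptions (the test ID pattern follows the `1.3-E2E-001` example format); a production parser would more likely walk an AST.

```javascript
// Hypothetical regex-based parser; counts are approximate by design.
function parseTestMetadata(source) {
  const matches = (pattern) => source.match(pattern) || [];
  return {
    lineCount: source.split('\n').length,
    // describe( and test.describe(, including variants such as describe.only(
    describeBlocks: matches(/\b(?:test\.)?describe(?:\.\w+)?\s*\(/g).length,
    // it( and test(, including .only/.skip/.fixme variants
    testBlocks: matches(/\b(?:it|test)(?:\.(?:only|skip|fixme))?\s*\(/g).length,
    testIds: matches(/\b\d+\.\d+-[A-Z0-9]+-\d{3}\b/g),
    priorityMarkers: matches(/\bP[0-3]\b/g),
    conditionals: matches(/\bif\s*\(/g).length,
    tryBlocks: matches(/\btry\s*\{/g).length,
  };
}
```

The counts feed the later quality checks; for example, a non-zero `conditionals` count is a prompt to inspect the file for non-deterministic control flow.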
---
## 3. Evidence Collection (if `tea_browser_automation` is `cli` or `auto`)
> **Fallback:** If CLI is not installed, fall back to MCP (if available) or skip evidence collection.
**CLI Evidence Collection:**
All commands use the same named session to target the correct browser:
1. `playwright-cli -s=tea-review open <target_url>`
2. `playwright-cli -s=tea-review tracing-start`
3. Execute the flow under review (using `-s=tea-review` on each command)
4. `playwright-cli -s=tea-review tracing-stop` → saves trace.zip
5. `playwright-cli -s=tea-review screenshot --filename={test_artifacts}/review-evidence.png`
6. `playwright-cli -s=tea-review network` → capture network request log
7. `playwright-cli -s=tea-review close`
> **Session Hygiene:** Always close sessions using `playwright-cli -s=tea-review close`. Do NOT use `close-all` — it kills every session on the machine and breaks parallel execution.
---
## 4. Save Progress
**Save this step's accumulated work to `{outputFile}`.**
- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
```yaml
---
stepsCompleted: ['step-02-discover-tests']
lastStep: 'step-02-discover-tests'
lastSaved: '{date}'
---
```
Then write this step's output below the frontmatter.
- **If `{outputFile}` already exists**, update:
- Add `'step-02-discover-tests'` to `stepsCompleted` array (only if not already present)
- Set `lastStep: 'step-02-discover-tests'`
- Set `lastSaved: '{date}'`
- Append this step's output to the appropriate section of the document.
Load next step: `{nextStepFile}`
## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS:
- Step completed in full with required outputs
### ❌ SYSTEM FAILURE:
- Skipped sequence steps or missing outputs
**Master Rule:** Skipping steps is FORBIDDEN.

@@ -0,0 +1,274 @@
---
name: 'step-03-quality-evaluation'
description: 'Orchestrate adaptive quality dimension checks (agent-team, subagent, or sequential)'
nextStepFile: './step-03f-aggregate-scores.md'
---
# Step 3: Orchestrate Adaptive Quality Evaluation
## STEP GOAL
Select execution mode deterministically, then evaluate quality dimensions using agent-team, subagent, or sequential execution while preserving output contracts:
- Determinism
- Isolation
- Maintainability
- Performance
Coverage is intentionally excluded from this workflow and handled by `trace`.
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
- ✅ Resolve execution mode from config (`tea_execution_mode`, `tea_capability_probe`)
- ✅ Apply fallback rules deterministically when requested mode is unsupported
- ✅ Wait for required worker steps to complete
- ❌ Do NOT skip capability checks when probing is enabled
- ❌ Do NOT proceed until required worker steps finish
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Wait for subagent outputs
- 📖 Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: test files from Step 2, knowledge fragments
- Focus: orchestration only (mode selection + worker dispatch)
- Limits: do not evaluate quality directly (delegate to worker steps)
---
## MANDATORY SEQUENCE
### 1. Prepare Execution Context
**Generate unique timestamp:**
```javascript
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
```
**Prepare context for all subagents:**
```javascript
// Accepts real booleans as well as common string spellings ("false", "0", "off", "no", ...).
const parseBooleanFlag = (value, defaultValue = true) => {
  if (typeof value === 'string') {
    const normalized = value.trim().toLowerCase();
    if (['false', '0', 'off', 'no'].includes(normalized)) return false;
    if (['true', '1', 'on', 'yes'].includes(normalized)) return true;
  }
  if (value === undefined || value === null) return defaultValue;
  return Boolean(value);
};

const subagentContext = {
  test_files: testFiles, // placeholder name: the test file list collected in Step 2
  knowledge_fragments_loaded: ['test-quality'],
  config: {
    execution_mode: config.tea_execution_mode || 'auto', // "auto" | "subagent" | "agent-team" | "sequential"
    capability_probe: parseBooleanFlag(config.tea_capability_probe, true), // supports booleans and "false"/"true" strings
  },
  timestamp,
};
```
---
### 2. Resolve Execution Mode with Capability Probe
```javascript
// Maps free-form user phrasing ("agent team", "sub-agents", ...) to a canonical mode.
const normalizeUserExecutionMode = (mode) => {
  if (typeof mode !== 'string') return null;
  const normalized = mode.trim().toLowerCase().replace(/[-_]/g, ' ').replace(/\s+/g, ' ');
  if (normalized === 'auto') return 'auto';
  if (normalized === 'sequential') return 'sequential';
  if (normalized === 'subagent' || normalized === 'sub agent' || normalized === 'subagents' || normalized === 'sub agents') {
    return 'subagent';
  }
  if (normalized === 'agent team' || normalized === 'agent teams' || normalized === 'agentteam') {
    return 'agent-team';
  }
  return null;
};

// Config values must already be canonical; anything else is rejected.
const normalizeConfigExecutionMode = (mode) => {
  if (mode === 'auto' || mode === 'sequential' || mode === 'subagent' || mode === 'agent-team') {
    return mode;
  }
  return null;
};
// Explicit user instruction in the active run takes priority over config.
const explicitModeFromUser = normalizeUserExecutionMode(runtime.getExplicitExecutionModeHint?.() || null);
const requestedMode = explicitModeFromUser || normalizeConfigExecutionMode(subagentContext.config.execution_mode) || 'auto';
const probeEnabled = subagentContext.config.capability_probe;
const supports = {
subagent: false,
agentTeam: false,
};
if (probeEnabled) {
supports.subagent = runtime.canLaunchSubagents?.() === true;
supports.agentTeam = runtime.canLaunchAgentTeams?.() === true;
}
let resolvedMode = requestedMode;
if (requestedMode === 'auto') {
if (supports.agentTeam) resolvedMode = 'agent-team';
else if (supports.subagent) resolvedMode = 'subagent';
else resolvedMode = 'sequential';
} else if (probeEnabled && requestedMode === 'agent-team' && !supports.agentTeam) {
resolvedMode = supports.subagent ? 'subagent' : 'sequential';
} else if (probeEnabled && requestedMode === 'subagent' && !supports.subagent) {
resolvedMode = 'sequential';
}
subagentContext.execution = {
requestedMode,
resolvedMode,
probeEnabled,
supports,
};
```
Resolution precedence:
1. Explicit user request in this run (`agent team` => `agent-team`; `subagent` => `subagent`; `sequential`; `auto`)
2. `tea_execution_mode` from config
3. Runtime capability fallback (when probing enabled)
If probing is disabled, honor the requested mode strictly. If that mode cannot be executed at runtime, fail with an explicit error instead of falling back silently.
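The rules above can be sketched as a pure function, which makes the fallback behavior easy to check in isolation. The names `resolveExecutionMode` and `assertModeExecutable`, as well as the error wording, are assumptions for illustration; they are not part of the workflow or of any runtime API.

```javascript
// Sketch: same branching as the resolution block above, without runtime access.
const resolveExecutionMode = ({ requestedMode, probeEnabled, supports }) => {
  if (requestedMode === 'auto') {
    if (supports.agentTeam) return 'agent-team';
    if (supports.subagent) return 'subagent';
    return 'sequential';
  }
  if (probeEnabled && requestedMode === 'agent-team' && !supports.agentTeam) {
    return supports.subagent ? 'subagent' : 'sequential';
  }
  if (probeEnabled && requestedMode === 'subagent' && !supports.subagent) {
    return 'sequential';
  }
  return requestedMode; // honored as requested
};

// Hypothetical guard for the "probing disabled" case: no silent fallback.
const assertModeExecutable = (mode, canExecute) => {
  if (!canExecute) {
    throw new Error(`Execution mode "${mode}" cannot run here and capability probing is disabled.`);
  }
  return mode;
};

const none = { subagent: false, agentTeam: false };
console.log(resolveExecutionMode({ requestedMode: 'auto', probeEnabled: true, supports: { subagent: true, agentTeam: false } })); // subagent
console.log(resolveExecutionMode({ requestedMode: 'agent-team', probeEnabled: true, supports: none })); // sequential
console.log(resolveExecutionMode({ requestedMode: 'agent-team', probeEnabled: false, supports: none })); // agent-team
```

In the last call probing is off, so the requested mode is kept even though nothing was detected; a guard such as `assertModeExecutable` would then raise the explicit error at launch time.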
---
### 3. Dispatch 4 Quality Workers
**Subagent A: Determinism**
- File: `./step-03a-subagent-determinism.md`
- Output: `/tmp/tea-test-review-determinism-${timestamp}.json`
- Execution:
  - `agent-team` or `subagent`: launch non-blocking
  - `sequential`: run blocking and wait
- Status: Running... ⟳
**Subagent B: Isolation**
- File: `./step-03b-subagent-isolation.md`
- Output: `/tmp/tea-test-review-isolation-${timestamp}.json`
- Status: Running... ⟳
**Subagent C: Maintainability**
- File: `./step-03c-subagent-maintainability.md`
- Output: `/tmp/tea-test-review-maintainability-${timestamp}.json`
- Status: Running... ⟳
**Subagent D: Performance**
- File: `./step-03e-subagent-performance.md`
- Output: `/tmp/tea-test-review-performance-${timestamp}.json`
- Status: Running... ⟳
In `agent-team` and `subagent` modes, the runtime decides worker scheduling and concurrency.
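The dispatch list above can be expressed as data, which keeps step files and output paths in one place. This is a minimal sketch: `WORKERS`, `buildDispatchPlan`, and the `blocking` field are hypothetical names, and actually launching a worker is left to the runtime.

```javascript
// Step files and dimensions mirror the four workers listed above.
const WORKERS = [
  { dimension: 'determinism', stepFile: './step-03a-subagent-determinism.md' },
  { dimension: 'isolation', stepFile: './step-03b-subagent-isolation.md' },
  { dimension: 'maintainability', stepFile: './step-03c-subagent-maintainability.md' },
  { dimension: 'performance', stepFile: './step-03e-subagent-performance.md' },
];

// Hypothetical helper: derive one plan entry per worker for the resolved mode.
const buildDispatchPlan = (resolvedMode, timestamp) =>
  WORKERS.map((worker) => ({
    ...worker,
    outputFile: `/tmp/tea-test-review-${worker.dimension}-${timestamp}.json`,
    // sequential: run each worker and wait; otherwise launch non-blocking
    blocking: resolvedMode === 'sequential',
  }));

const plan = buildDispatchPlan('subagent', '2026-03-16T19-54-53-000Z');
plan.forEach((w) => console.log(`${w.dimension} -> ${w.outputFile} (blocking: ${w.blocking})`));
```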
---
### 4. Wait for Expected Worker Completion
**If `resolvedMode` is `agent-team` or `subagent`:**
```
⏳ Waiting for 4 quality subagents to complete...
✅ All 4 quality subagents completed successfully!
```
**If `resolvedMode` is `sequential`:**
```
✅ Sequential mode: each worker already completed during dispatch.
```
---
### 5. Verify All Outputs Exist
```javascript
const fs = require('fs');

const outputs = ['determinism', 'isolation', 'maintainability', 'performance'].map(
  (dim) => `/tmp/tea-test-review-${dim}-${timestamp}.json`,
);
outputs.forEach((output) => {
  if (!fs.existsSync(output)) {
    throw new Error(`Subagent output missing: ${output}`);
  }
});
```
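The exit condition for this step also requires the outputs to be valid JSON, which an existence check alone does not establish. A minimal sketch of a content check follows; `validateWorkerOutput` and its rules are assumptions derived from the output formats shown in the subagent step files, not an existing helper.

```javascript
// Hypothetical validator: takes the raw file contents and the expected dimension.
const validateWorkerOutput = (raw, expectedDimension) => {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    return { ok: false, reason: `invalid JSON: ${err.message}` };
  }
  if (parsed === null || typeof parsed !== 'object') {
    return { ok: false, reason: 'output is not a JSON object' };
  }
  if (parsed.dimension !== expectedDimension) {
    return { ok: false, reason: `unexpected dimension: ${parsed.dimension}` };
  }
  if (parsed.success === false) {
    return { ok: false, reason: `worker reported an error: ${parsed.error}` };
  }
  if (typeof parsed.score !== 'number' || !Array.isArray(parsed.violations)) {
    return { ok: false, reason: 'missing score or violations' };
  }
  return { ok: true, reason: null };
};

console.log(validateWorkerOutput('{"dimension":"isolation","score":90,"violations":[]}', 'isolation'));
console.log(validateWorkerOutput('{"dimension":"isolation","success":false,"error":"boom"}', 'isolation'));
console.log(validateWorkerOutput('{not json', 'isolation').ok); // false
```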
---
### 6. Execution Report
```
🚀 Performance Report:
- Execution Mode: {resolvedMode}
- Total Elapsed: ~mode-dependent
- Parallel Gain: ~60-70% faster when mode is subagent/agent-team
```
---
### 7. Proceed to Aggregation
Pass the same `timestamp` value to Step 3F (do not regenerate it). Step 3F must read the exact temp files written in this step.
Load next step: `{nextStepFile}`
The aggregation step (3F) will:
- Read all 4 subagent outputs
- Calculate weighted overall score (0-100)
- Aggregate violations by severity
- Generate review report with top suggestions
---
## EXIT CONDITION
Proceed to Step 3F when:
- ✅ All 4 subagents completed successfully
- ✅ All output files exist and are valid JSON
- ✅ Execution metrics displayed
**Do NOT proceed if any subagent failed.**
---
## 🚨 SYSTEM SUCCESS METRICS
### ✅ SUCCESS:
- All 4 subagents launched and completed
- All required worker steps completed
- Output files generated and valid
- Fallback behavior respected configuration and capability probe rules
### ❌ FAILURE:
- One or more subagents failed
- Output files missing or invalid
- Unsupported requested mode with probing disabled
**Master Rule:** Deterministic mode selection + stable output contract. Use the best supported mode, then aggregate normally.


@@ -0,0 +1,214 @@
---
name: 'step-03a-subagent-determinism'
description: 'Subagent: Check test determinism (no random/time dependencies)'
subagent: true
outputFile: '/tmp/tea-test-review-determinism-{{timestamp}}.json'
---
# Subagent 3A: Determinism Quality Check
## SUBAGENT CONTEXT
This is an **isolated subagent** running in parallel with other quality dimension checks.
**What you have from parent workflow:**
- Test files discovered in Step 2
- Knowledge fragment: test-quality (determinism criteria)
- Config: test framework
**Your task:** Analyze test files for DETERMINISM violations only.
---
## MANDATORY EXECUTION RULES
- 📖 Read this entire subagent file before acting
- ✅ Check DETERMINISM only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check isolation, maintainability, coverage, or performance (other subagents)
- ❌ Do NOT modify test files (read-only analysis)
- ❌ Do NOT run tests (just analyze code)
---
## SUBAGENT TASK
### 1. Identify Determinism Violations
**Scan test files for non-deterministic patterns:**
**HIGH SEVERITY Violations**:
- `Math.random()` - Random number generation
- `Date.now()` or `new Date()` without mocking
- `setTimeout` / `setInterval` without proper waits
- External API calls without mocking
- File system operations on random paths
- Database queries with non-deterministic ordering
**MEDIUM SEVERITY Violations**:
- `page.waitForTimeout(N)` - Hard waits instead of conditions
- Flaky selectors (CSS classes that may change)
- Race conditions (missing proper synchronization)
- Test order dependencies (test A must run before test B)
**LOW SEVERITY Violations**:
- Missing test isolation (shared state between tests)
- Console timestamps without fixed timezone
### 2. Analyze Each Test File
For each test file from Step 2:
```javascript
const violations = [];

// Check for Math.random()
if (testFileContent.includes('Math.random()')) {
  violations.push({
    file: testFile,
    line: findLineNumber('Math.random()'),
    severity: 'HIGH',
    category: 'random-generation',
    description: 'Test uses Math.random() - non-deterministic',
    suggestion: 'Use faker.seed(12345) for deterministic random data',
  });
}

// Check for Date.now() / new Date()
const timePattern = ['Date.now()', 'new Date()'].find((pattern) => testFileContent.includes(pattern));
if (timePattern) {
  violations.push({
    file: testFile,
    line: findLineNumber(timePattern), // report the pattern that actually matched
    severity: 'HIGH',
    category: 'time-dependency',
    description: 'Test uses Date.now() or new Date() without mocking',
    suggestion: 'Mock system time (e.g. jest.useFakeTimers(), vi.useFakeTimers(), or Playwright page.clock) or use fixed timestamps',
  });
}

// Check for hard waits
if (testFileContent.includes('waitForTimeout')) {
  violations.push({
    file: testFile,
    line: findLineNumber('waitForTimeout'),
    severity: 'MEDIUM',
    category: 'hard-wait',
    description: 'Test uses waitForTimeout - creates flakiness',
    suggestion: 'Replace with expect(locator).toBeVisible() or waitForResponse',
  });
}

// ... check other patterns
```
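The block above relies on a `findLineNumber` helper that is not defined in this file. One possible shape is sketched below under a different, hypothetical name (`findLineNumbers`) that returns every matching line, so each occurrence can be reported as its own violation. It uses plain substring matching, which is an assumption of this sketch and will also flag occurrences inside comments or string literals.

```javascript
// Hypothetical helper: 1-based line numbers of every line containing `pattern`.
const findLineNumbers = (content, pattern) =>
  content
    .split('\n')
    .map((text, index) => (text.includes(pattern) ? index + 1 : null))
    .filter((lineNumber) => lineNumber !== null);

// Made-up test file content for illustration.
const sampleTest = [
  "test('creates user', async () => {",
  '  const id = Math.random() * 1000;',
  '  await page.waitForTimeout(5000);',
  '  const again = Math.random();',
  '});',
].join('\n');

console.log(findLineNumbers(sampleTest, 'Math.random()')); // [ 2, 4 ]
console.log(findLineNumbers(sampleTest, 'waitForTimeout')); // [ 3 ]
console.log(findLineNumbers(sampleTest, 'Date.now()')); // []
```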
### 3. Calculate Determinism Score
**Scoring Logic**:
```javascript
const totalChecks = testFiles.length * checksPerFile;
const failedChecks = violations.length;
const passedChecks = totalChecks - failedChecks;
// Weight violations by severity
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
// Score: 100 - (penalty points)
const score = Math.max(0, 100 - totalPenalty);
```
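A worked example of the penalty arithmetic, using made-up violations: one HIGH, one MEDIUM, and one LOW violation cost 10 + 5 + 2 = 17 points, so the score is 83. The function names `scoreViolations` and `summarizeBySeverity` are assumptions for this sketch.

```javascript
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };

// Same formula as above: subtract weighted penalties, never go below 0.
const scoreViolations = (violations) => {
  const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
  return Math.max(0, 100 - totalPenalty);
};

// Count violations per severity (shape of `violation_summary` in the output).
const summarizeBySeverity = (violations) =>
  violations.reduce((summary, v) => ({ ...summary, [v.severity]: summary[v.severity] + 1 }), {
    HIGH: 0,
    MEDIUM: 0,
    LOW: 0,
  });

const sampleViolations = [{ severity: 'HIGH' }, { severity: 'MEDIUM' }, { severity: 'LOW' }];
console.log(scoreViolations(sampleViolations)); // 83
console.log(summarizeBySeverity(sampleViolations)); // { HIGH: 1, MEDIUM: 1, LOW: 1 }
console.log(scoreViolations(Array(11).fill({ severity: 'HIGH' }))); // 0 (clamped)
```

Note that the score depends only on the penalties, not on `totalChecks`, so a large suite with eleven HIGH violations bottoms out at 0 just like a small one.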
---
## OUTPUT FORMAT
Write JSON to temp file: `/tmp/tea-test-review-determinism-{{timestamp}}.json`
```json
{
  "dimension": "determinism",
  "score": 85,
  "max_score": 100,
  "grade": "B",
  "violations": [
    {
      "file": "tests/api/user.spec.ts",
      "line": 42,
      "severity": "HIGH",
      "category": "random-generation",
      "description": "Test uses Math.random() - non-deterministic",
      "suggestion": "Use faker.seed(12345) for deterministic random data",
      "code_snippet": "const userId = Math.random() * 1000;"
    },
    {
      "file": "tests/e2e/checkout.spec.ts",
      "line": 78,
      "severity": "MEDIUM",
      "category": "hard-wait",
      "description": "Test uses waitForTimeout - creates flakiness",
      "suggestion": "Replace with expect(locator).toBeVisible()",
      "code_snippet": "await page.waitForTimeout(5000);"
    }
  ],
  "passed_checks": 13,
  "failed_checks": 2,
  "total_checks": 15,
  "violation_summary": {
    "HIGH": 1,
    "MEDIUM": 1,
    "LOW": 0
  },
  "recommendations": [
    "Use faker with fixed seed for all random data",
    "Replace all waitForTimeout with conditional waits",
    "Mock Date.now() in tests that use current time"
  ],
  "summary": "Tests are mostly deterministic with 2 violations (1 HIGH, 1 MEDIUM)"
}
```
**On Error:**
```json
{
  "dimension": "determinism",
  "success": false,
  "error": "Error message describing what went wrong"
}
```
---
## EXIT CONDITION
Subagent completes when:
- ✅ All test files analyzed for determinism violations
- ✅ Score calculated (0-100)
- ✅ Violations categorized by severity
- ✅ Recommendations generated
- ✅ JSON output written to temp file
**Subagent terminates here.** Parent workflow will read output and aggregate with other quality dimensions.
---
## 🚨 SUBAGENT SUCCESS METRICS
### ✅ SUCCESS:
- All test files scanned for determinism violations
- Score calculated with proper severity weighting
- JSON output valid and complete
- Only determinism checked (not other dimensions)
### ❌ FAILURE:
- Checked quality dimensions other than determinism
- Invalid or missing JSON output
- Score calculation incorrect
- Modified test files (should be read-only)


@@ -0,0 +1,125 @@
---
name: 'step-03b-subagent-isolation'
description: 'Subagent: Check test isolation (no shared state/dependencies)'
subagent: true
outputFile: '/tmp/tea-test-review-isolation-{{timestamp}}.json'
---
# Subagent 3B: Isolation Quality Check
## SUBAGENT CONTEXT
This is an **isolated subagent** running in parallel with other quality dimension checks.
**Your task:** Analyze test files for ISOLATION violations only.
---
## MANDATORY EXECUTION RULES
- ✅ Check ISOLATION only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check determinism, maintainability, coverage, or performance
- ❌ Do NOT modify test files (read-only analysis)
---
## SUBAGENT TASK
### 1. Identify Isolation Violations
**Scan test files for isolation issues:**
**HIGH SEVERITY Violations**:
- Global state mutations (global variables modified)
- Test order dependencies (test B depends on test A running first)
- Shared database records without cleanup
- beforeAll/afterAll with side effects leaking to other tests
**MEDIUM SEVERITY Violations**:
- Missing test cleanup (created data not deleted)
- Shared fixtures that mutate state
- Tests that assume specific execution order
- Environment variables modified without restoration
**LOW SEVERITY Violations**:
- Tests sharing test data (but not mutating)
- Missing test.describe grouping
- Tests that could be more isolated
### 2. Calculate Isolation Score
```javascript
const totalChecks = testFiles.length * checksPerFile;
const failedChecks = violations.length;
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
const score = Math.max(0, 100 - totalPenalty);
```
---
## OUTPUT FORMAT
```json
{
  "dimension": "isolation",
  "score": 90,
  "max_score": 100,
  "grade": "A-",
  "violations": [
    {
      "file": "tests/api/integration.spec.ts",
      "line": 15,
      "severity": "HIGH",
      "category": "test-order-dependency",
      "description": "Test depends on previous test creating user record",
      "suggestion": "Each test should create its own test data in beforeEach",
      "code_snippet": "test('should update user', async () => { /* assumes user exists */ });"
    }
  ],
  "passed_checks": 14,
  "failed_checks": 1,
  "total_checks": 15,
  "violation_summary": {
    "HIGH": 1,
    "MEDIUM": 0,
    "LOW": 0
  },
  "recommendations": [
    "Add beforeEach hooks to create test data",
    "Add afterEach hooks to cleanup created records",
    "Use test.describe.configure({ mode: 'parallel' }) to enforce isolation"
  ],
  "summary": "Tests are well isolated with 1 HIGH severity violation"
}
```
---
## EXIT CONDITION
Subagent completes when:
- ✅ All test files analyzed for isolation violations
- ✅ Score calculated
- ✅ JSON output written to temp file
**Subagent terminates here.**
---
## 🚨 SUBAGENT SUCCESS METRICS
### ✅ SUCCESS:
- Only isolation checked (not other dimensions)
- JSON output valid and complete
### ❌ FAILURE:
- Checked quality dimensions other than isolation
- Invalid or missing JSON output


@@ -0,0 +1,102 @@
---
name: 'step-03c-subagent-maintainability'
description: 'Subagent: Check test maintainability (readability, structure, DRY)'
subagent: true
outputFile: '/tmp/tea-test-review-maintainability-{{timestamp}}.json'
---
# Subagent 3C: Maintainability Quality Check
## SUBAGENT CONTEXT
This is an **isolated subagent** running in parallel with other quality dimension checks.
**Your task:** Analyze test files for MAINTAINABILITY violations only.
---
## MANDATORY EXECUTION RULES
- ✅ Check MAINTAINABILITY only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check determinism, isolation, coverage, or performance
---
## SUBAGENT TASK
### 1. Identify Maintainability Violations
**HIGH SEVERITY Violations**:
- Tests >100 lines (too complex)
- No test.describe grouping
- Duplicate test logic (copy-paste)
- Unclear test names (no Given/When/Then structure)
- Magic numbers/strings without constants
**MEDIUM SEVERITY Violations**:
- Tests missing comments for complex logic
- Inconsistent naming conventions
- Excessive nesting (>3 levels)
- Large setup/teardown blocks
**LOW SEVERITY Violations**:
- Minor code style issues
- Could benefit from helper functions
- Inconsistent assertion styles
### 2. Calculate Maintainability Score
```javascript
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
const score = Math.max(0, 100 - totalPenalty);
```
---
## OUTPUT FORMAT
```json
{
  "dimension": "maintainability",
  "score": 68,
  "max_score": 100,
  "grade": "D",
  "violations": [
    {
      "file": "tests/e2e/complex-flow.spec.ts",
      "line": 1,
      "severity": "HIGH",
      "category": "test-too-long",
      "description": "Test file is 250 lines - too complex to maintain",
      "suggestion": "Split into multiple smaller test files by feature area",
      "code_snippet": "test.describe('Complex flow', () => { /* 250 lines */ });"
    }
  ],
  "passed_checks": 10,
  "failed_checks": 5,
  "total_checks": 15,
  "violation_summary": {
    "HIGH": 2,
    "MEDIUM": 2,
    "LOW": 1
  },
  "recommendations": [
    "Split large test files into smaller, focused files (<100 lines each)",
    "Add test.describe grouping for related tests",
    "Extract duplicate logic into helper functions"
  ],
  "summary": "Tests have maintainability issues - 5 violations (2 HIGH)"
}
```
---
## EXIT CONDITION
Subagent completes when JSON output written to temp file.
**Subagent terminates here.**


@@ -0,0 +1,117 @@
---
name: 'step-03e-subagent-performance'
description: 'Subagent: Check test performance (speed, efficiency, parallelization)'
subagent: true
outputFile: '/tmp/tea-test-review-performance-{{timestamp}}.json'
---
# Subagent 3E: Performance Quality Check
## SUBAGENT CONTEXT
This is an **isolated subagent** running in parallel with other quality dimension checks.
**Your task:** Analyze test files for PERFORMANCE violations only.
---
## MANDATORY EXECUTION RULES
- ✅ Check PERFORMANCE only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check determinism, isolation, maintainability, or coverage
---
## SUBAGENT TASK
### 1. Identify Performance Violations
**HIGH SEVERITY Violations**:
- Tests not parallelizable (using test.describe.serial unnecessarily)
- Slow setup/teardown (creating fresh DB for every test)
- Excessive navigation (reloading pages unnecessarily)
- No fixture reuse (repeating expensive operations)
**MEDIUM SEVERITY Violations**:
- Hard waits >2 seconds (waitForTimeout(5000))
- Inefficient selectors (page.$$ instead of locators)
- Large data sets in tests without pagination
- Missing performance optimizations
**LOW SEVERITY Violations**:
- Could use parallelization (test.describe.configure({ mode: 'parallel' }))
- Minor inefficiencies
- Excessive logging
### 2. Calculate Performance Score
```javascript
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
const score = Math.max(0, 100 - totalPenalty);
```
---
## OUTPUT FORMAT
```json
{
  "dimension": "performance",
  "score": 85,
  "max_score": 100,
  "grade": "B",
  "violations": [
    {
      "file": "tests/e2e/search.spec.ts",
      "line": 10,
      "severity": "HIGH",
      "category": "not-parallelizable",
      "description": "Tests use test.describe.serial unnecessarily - reduces parallel execution",
      "suggestion": "Remove .serial unless tests truly share state",
      "code_snippet": "test.describe.serial('Search tests', () => { ... });"
    },
    {
      "file": "tests/api/bulk-operations.spec.ts",
      "line": 35,
      "severity": "MEDIUM",
      "category": "slow-setup",
      "description": "Test creates 1000 records in setup - very slow",
      "suggestion": "Use smaller data sets or fixture factories",
      "code_snippet": "beforeEach(async () => { for (let i=0; i<1000; i++) { ... } });"
    }
  ],
  "passed_checks": 13,
  "failed_checks": 2,
  "total_checks": 15,
  "violation_summary": {
    "HIGH": 1,
    "MEDIUM": 1,
    "LOW": 0
  },
  "performance_metrics": {
    "parallelizable_tests": 80,
    "serial_tests": 20,
    "avg_test_duration_estimate": "~2 seconds",
    "slow_tests": ["bulk-operations.spec.ts (>30s)"]
  },
  "recommendations": [
    "Enable parallel mode where possible",
    "Reduce setup data to minimum needed",
    "Use fixtures to share expensive setup across tests",
    "Remove unnecessary .serial constraints"
  ],
  "summary": "Good performance with 2 violations - 80% of tests can run in parallel"
}
```
---
## EXIT CONDITION
Subagent completes when JSON output written to temp file.
**Subagent terminates here.**


@@ -0,0 +1,277 @@
---
name: 'step-03f-aggregate-scores'
description: 'Aggregate quality dimension scores into overall 0-100 score'
nextStepFile: './step-04-generate-report.md'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 3F: Aggregate Quality Scores
## STEP GOAL
Read outputs from 4 quality subagents, calculate weighted overall score (0-100), and aggregate violations for report generation.
---
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
- ✅ Read all 4 subagent outputs
- ✅ Calculate weighted overall score
- ✅ Aggregate violations by severity
- ❌ Do NOT re-evaluate quality (use subagent outputs)
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed
---
## MANDATORY SEQUENCE
### 1. Read All Subagent Outputs
```javascript
const fs = require('fs');

// Use the SAME timestamp generated in Step 3 (do not regenerate).
const timestamp = subagentContext?.timestamp;
if (!timestamp) {
  throw new Error('Missing timestamp from Step 3 context. Pass Step 3 timestamp into Step 3F.');
}

const dimensions = ['determinism', 'isolation', 'maintainability', 'performance'];
const results = {};
dimensions.forEach((dim) => {
  const outputPath = `/tmp/tea-test-review-${dim}-${timestamp}.json`;
  results[dim] = JSON.parse(fs.readFileSync(outputPath, 'utf8'));
});
```
**Verify all succeeded:**
```javascript
const allSucceeded = dimensions.every((dim) => results[dim].score !== undefined);
if (!allSucceeded) {
  throw new Error('One or more quality subagents failed!');
}
```
---
### 2. Calculate Weighted Overall Score
**Dimension Weights** (based on TEA quality priorities):
```javascript
const weights = {
  determinism: 0.3, // 30% - Reliability and flake prevention
  isolation: 0.3, // 30% - Parallel safety and independence
  maintainability: 0.25, // 25% - Readability and long-term health
  performance: 0.15, // 15% - Speed and execution efficiency
};
```
**Calculate overall score:**
```javascript
const overallScore = dimensions.reduce((sum, dim) => {
  return sum + results[dim].score * weights[dim];
}, 0);
const roundedScore = Math.round(overallScore);
```
**Determine grade:**
```javascript
const getGrade = (score) => {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
};
const overallGrade = getGrade(roundedScore);
```
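A worked example of the weighting and grading, using illustrative dimension scores rather than results from a real review: 85 × 0.3 + 90 × 0.3 + 70 × 0.25 + 60 × 0.15 = 25.5 + 27 + 17.5 + 9 = 79, which maps to grade C. The wrapper name `aggregateScores` is an assumption for this sketch.

```javascript
const weights = { determinism: 0.3, isolation: 0.3, maintainability: 0.25, performance: 0.15 };

const getGrade = (score) => {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
};

// Hypothetical wrapper: weighted sum, rounded, then graded.
const aggregateScores = (scores) => {
  const overall = Object.keys(weights).reduce((sum, dim) => sum + scores[dim] * weights[dim], 0);
  const rounded = Math.round(overall);
  return { score: rounded, grade: getGrade(rounded) };
};

// Illustrative inputs only.
const example = aggregateScores({ determinism: 85, isolation: 90, maintainability: 70, performance: 60 });
console.log(example); // { score: 79, grade: 'C' }
```

Because the weights sum to 1, four perfect dimension scores yield an overall score of 100, and the weakest dimension can pull the grade down only in proportion to its weight.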
---
### 3. Aggregate Violations by Severity
**Collect all violations from all dimensions:**
```javascript
const allViolations = dimensions.flatMap((dim) =>
  results[dim].violations.map((v) => ({
    ...v,
    dimension: dim,
  })),
);

// Group by severity
const highSeverity = allViolations.filter((v) => v.severity === 'HIGH');
const mediumSeverity = allViolations.filter((v) => v.severity === 'MEDIUM');
const lowSeverity = allViolations.filter((v) => v.severity === 'LOW');

const violationSummary = {
  total: allViolations.length,
  HIGH: highSeverity.length,
  MEDIUM: mediumSeverity.length,
  LOW: lowSeverity.length,
};
```
---
### 4. Prioritize Recommendations
**Extract recommendations from all dimensions:**
```javascript
const allRecommendations = dimensions.flatMap((dim) =>
  results[dim].recommendations.map((rec) => ({
    dimension: dim,
    recommendation: rec,
    impact: results[dim].score < 70 ? 'HIGH' : 'MEDIUM',
  })),
);

// Sort by impact (HIGH first). The comparator returns 0 for equal impact,
// so recommendations keep their original order within each impact level.
const impactRank = { HIGH: 0, MEDIUM: 1 };
const prioritizedRecommendations = [...allRecommendations]
  .sort((a, b) => impactRank[a.impact] - impactRank[b.impact])
  .slice(0, 10); // Top 10 recommendations
```
---
### 5. Create Review Summary Object
**Aggregate all results:**
```javascript
// Report the execution mode that Step 3 actually resolved instead of assuming parallel.
const resolvedMode = subagentContext?.execution?.resolvedMode;
const ranInParallel = resolvedMode === 'subagent' || resolvedMode === 'agent-team';

const reviewSummary = {
  overall_score: roundedScore,
  overall_grade: overallGrade,
  quality_assessment: getQualityAssessment(roundedScore),
  dimension_scores: {
    determinism: results.determinism.score,
    isolation: results.isolation.score,
    maintainability: results.maintainability.score,
    performance: results.performance.score,
  },
  dimension_grades: {
    determinism: results.determinism.grade,
    isolation: results.isolation.grade,
    maintainability: results.maintainability.grade,
    performance: results.performance.grade,
  },
  violations_summary: violationSummary,
  all_violations: allViolations,
  high_severity_violations: highSeverity,
  top_10_recommendations: prioritizedRecommendations,
  subagent_execution: ranInParallel ? 'PARALLEL (4 quality dimensions)' : 'SEQUENTIAL (4 quality dimensions)',
  performance_gain: ranInParallel ? '~60% faster than sequential' : 'none (sequential execution)',
};

// Save for Step 4 (report generation)
fs.writeFileSync(`/tmp/tea-test-review-summary-${timestamp}.json`, JSON.stringify(reviewSummary, null, 2), 'utf8');
```
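The summary object calls `getQualityAssessment`, which is not defined in this step. A minimal sketch follows; the labels are taken from the "Overall Assessment" options in the report template, while the thresholds are an assumption that simply mirrors the A to F grade bands used above.

```javascript
// Hypothetical implementation: thresholds are assumed, not specified by the workflow.
const getQualityAssessment = (score) => {
  if (score >= 90) return 'Excellent';
  if (score >= 80) return 'Good';
  if (score >= 70) return 'Acceptable';
  if (score >= 60) return 'Needs Improvement';
  return 'Critical Issues';
};

[95, 82, 79, 61, 40].forEach((score) => console.log(score, '=>', getQualityAssessment(score)));
```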
---
### 6. Display Summary to User
```
✅ Quality Evaluation Complete (Execution Mode: {resolvedMode})
📊 Overall Quality Score: {roundedScore}/100 (Grade: {overallGrade})
📈 Dimension Scores:
- Determinism: {determinism_score}/100 ({determinism_grade})
- Isolation: {isolation_score}/100 ({isolation_grade})
- Maintainability: {maintainability_score}/100 ({maintainability_grade})
- Performance: {performance_score}/100 ({performance_grade})
Coverage is excluded from `test-review` scoring. Use `trace` for coverage analysis and gates.
⚠️ Violations Found:
- HIGH: {high_count} violations
- MEDIUM: {medium_count} violations
- LOW: {low_count} violations
- TOTAL: {total_count} violations
🚀 Performance: ~60% faster than sequential when mode is subagent/agent-team
✅ Ready for report generation (Step 4)
```
---
### 7. Save Progress
**Save this step's accumulated work to `{outputFile}`.**
- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
  ```yaml
  ---
  stepsCompleted: ['step-03f-aggregate-scores']
  lastStep: 'step-03f-aggregate-scores'
  lastSaved: '{date}'
  ---
  ```
  Then write this step's output below the frontmatter.
- **If `{outputFile}` already exists**, update:
  - Add `'step-03f-aggregate-scores'` to `stepsCompleted` array (only if not already present)
  - Set `lastStep: 'step-03f-aggregate-scores'`
  - Set `lastSaved: '{date}'`
  - Append this step's output to the appropriate section of the document.
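The frontmatter update rules above amount to an idempotent merge. The sketch below shows that logic on an already-parsed frontmatter object; `markStepCompleted` is a hypothetical helper, and reading or writing the YAML itself is deliberately left out.

```javascript
// Hypothetical helper: returns a new frontmatter object, leaving the input untouched.
const markStepCompleted = (frontmatter, stepName, date) => {
  const completed = Array.isArray(frontmatter.stepsCompleted) ? frontmatter.stepsCompleted : [];
  return {
    ...frontmatter,
    // Add the step only if it is not already present.
    stepsCompleted: completed.includes(stepName) ? completed : [...completed, stepName],
    lastStep: stepName,
    lastSaved: date,
  };
};

const firstSave = markStepCompleted({ stepsCompleted: [], lastStep: '', lastSaved: '' }, 'step-03f-aggregate-scores', '2026-03-16');
const secondSave = markStepCompleted(firstSave, 'step-03f-aggregate-scores', '2026-03-17');
console.log(secondSave.stepsCompleted); // [ 'step-03f-aggregate-scores' ]
console.log(secondSave.lastSaved); // 2026-03-17
```

Saving the same step twice leaves a single entry in `stepsCompleted` and only refreshes `lastSaved`.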
---
## EXIT CONDITION
Proceed to Step 4 when:
- ✅ All subagent outputs read successfully
- ✅ Overall score calculated
- ✅ Violations aggregated
- ✅ Recommendations prioritized
- ✅ Summary saved to temp file
- ✅ Output displayed to user
- ✅ Progress saved to output document
Load next step: `{nextStepFile}`
---
## 🚨 SYSTEM SUCCESS METRICS
### ✅ SUCCESS:
- All 4 subagent outputs read and parsed
- Overall score calculated with proper weights
- Violations aggregated correctly
- Summary complete and saved
### ❌ FAILURE:
- Failed to read one or more subagent outputs
- Score calculation incorrect
- Summary missing or incomplete
**Master Rule:** Aggregate determinism, isolation, maintainability, and performance only.


@@ -0,0 +1,111 @@
---
name: 'step-04-generate-report'
description: 'Create test-review report and validate'
outputFile: '{test_artifacts}/test-review.md'
---
# Step 4: Generate Report & Validate
## STEP GOAL
Produce the test-review report and validate against checklist.
## MANDATORY EXECUTION RULES
- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
---
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed
## CONTEXT BOUNDARIES:
- Available context: config, loaded artifacts, and knowledge fragments
- Focus: this step's goal only
- Limits: do not execute future steps
- Dependencies: prior steps' outputs (if any)
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.
### 1. Report Generation
Use `test-review-template.md` to produce `{outputFile}` including:
- Score summary
- Critical findings with fixes
- Warnings and recommendations
- Context references (story/test-design if available)
- Coverage boundary note: `test-review` does not score coverage. Direct coverage findings to `trace`.
---
### 2. Polish Output
Before finalizing, review the complete output document for quality:
1. **Remove duplication**: Progressive-append workflow may have created repeated sections — consolidate
2. **Verify consistency**: Ensure terminology, quality scores, and references are consistent throughout
3. **Check completeness**: All template sections should be populated or explicitly marked N/A
4. **Format cleanup**: Ensure markdown formatting is clean (tables aligned, headers consistent, no orphaned references)
---
### 3. Validation
Validate against `checklist.md` and fix any gaps.
- [ ] CLI sessions cleaned up (no orphaned browsers)
- [ ] Temp artifacts stored in `{test_artifacts}/` not random locations
---
### 4. Save Progress
**Save this step's accumulated work to `{outputFile}`.**
- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:
  ```yaml
  ---
  stepsCompleted: ['step-04-generate-report']
  lastStep: 'step-04-generate-report'
  lastSaved: '{date}'
  ---
  ```
  Then write this step's output below the frontmatter.
- **If `{outputFile}` already exists**, update:
  - Add `'step-04-generate-report'` to `stepsCompleted` array (only if not already present)
  - Set `lastStep: 'step-04-generate-report'`
  - Set `lastSaved: '{date}'`
  - Append this step's output to the appropriate section of the document.
---
### 5. Completion Summary
Report:
- Scope reviewed
- Overall score
- Critical blockers
- Next recommended workflow (e.g., `automate` or `trace`)
## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS:
- Step completed in full with required outputs
### ❌ SYSTEM FAILURE:
- Skipped sequence steps or missing outputs
**Master Rule:** Skipping steps is FORBIDDEN.


@@ -0,0 +1,65 @@
---
name: 'step-01-assess'
description: 'Load an existing output for editing'
nextStepFile: './step-02-apply-edit.md'
---
# Step 1: Assess Edit Target
## STEP GOAL:
Identify which output should be edited and load it.
## MANDATORY EXECUTION RULES (READ FIRST):
### Universal Rules:
- 📖 Read the complete step file before taking any action
- ✅ Speak in `{communication_language}`
### Role Reinforcement:
- ✅ You are the Master Test Architect
### Step-Specific Rules:
- 🎯 Ask the user which output file to edit
- 🚫 Do not edit until target is confirmed
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
## CONTEXT BOUNDARIES:
- Available context: existing outputs
- Focus: select edit target
- Limits: no edits yet
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly.
### 1. Identify Target
Ask the user to provide the output file path or select from known outputs.
### 2. Load Target
Read the provided output file in full.
### 3. Confirm
Confirm the target and proceed to edit.
Load next step: `{nextStepFile}`
## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS:
- Target identified and loaded
### ❌ SYSTEM FAILURE:
- Proceeding without a confirmed target


@@ -0,0 +1,60 @@
---
name: 'step-02-apply-edit'
description: 'Apply edits to the selected output'
---
# Step 2: Apply Edits
## STEP GOAL:
Apply the requested edits to the selected output and confirm changes.
## MANDATORY EXECUTION RULES (READ FIRST):
### Universal Rules:
- 📖 Read the complete step file before taking any action
- ✅ Speak in `{communication_language}`
### Role Reinforcement:
- ✅ You are the Master Test Architect
### Step-Specific Rules:
- 🎯 Only apply edits explicitly requested by the user
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
## CONTEXT BOUNDARIES:
- Available context: selected output and user changes
- Focus: apply edits only
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly.
### 1. Confirm Requested Changes
Restate what will be changed and confirm.
### 2. Apply Changes
Update the output file accordingly.
### 3. Report
Summarize the edits applied.
## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS:
- Changes applied and confirmed
### ❌ SYSTEM FAILURE:
- Unconfirmed edits or missing update


@@ -0,0 +1,67 @@
---
name: 'step-01-validate'
description: 'Validate workflow outputs against checklist'
outputFile: '{test_artifacts}/test-review-validation-report.md'
validationChecklist: '../checklist.md'
---
# Step 1: Validate Outputs
## STEP GOAL:
Validate outputs using the workflow checklist and record findings.
## MANDATORY EXECUTION RULES (READ FIRST):
### Universal Rules:
- 📖 Read the complete step file before taking any action
- ✅ Speak in `{communication_language}`
### Role Reinforcement:
- ✅ You are the Master Test Architect
### Step-Specific Rules:
- 🎯 Validate against `{validationChecklist}`
- 🚫 Do not skip checks
## EXECUTION PROTOCOLS:
- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Write findings to `{outputFile}`
## CONTEXT BOUNDARIES:
- Available context: workflow outputs and checklist
- Focus: validation only
- Limits: do not modify outputs in this step
## MANDATORY SEQUENCE
**CRITICAL:** Follow this sequence exactly.
### 1. Load Checklist
Read `{validationChecklist}` and list all criteria.
### 2. Validate Outputs
Evaluate outputs against each checklist item.
### 3. Write Report
Write a validation report to `{outputFile}` with PASS/WARN/FAIL per section.
## 🚨 SYSTEM SUCCESS/FAILURE METRICS:
### ✅ SUCCESS:
- Validation report written
- All checklist items evaluated
### ❌ SYSTEM FAILURE:
- Skipped checklist items
- No report produced


@@ -0,0 +1,387 @@
---
stepsCompleted: []
lastStep: ''
lastSaved: ''
workflowType: 'testarch-test-review'
inputDocuments: []
---
# Test Quality Review: {test_filename}
**Quality Score**: {score}/100 ({grade} - {assessment})
**Review Date**: {YYYY-MM-DD}
**Review Scope**: {single | directory | suite}
**Reviewer**: {user_name or TEA Agent}
---
Note: This review audits existing tests; it does not generate tests.
Coverage mapping and coverage gates are out of scope here. Use `trace` for coverage decisions.
## Executive Summary
**Overall Assessment**: {Excellent | Good | Acceptable | Needs Improvement | Critical Issues}
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
### Key Strengths
✅ {strength_1}
✅ {strength_2}
✅ {strength_3}
### Key Weaknesses
❌ {weakness_1}
❌ {weakness_2}
❌ {weakness_3}
### Summary
{1-2 paragraph summary of overall test quality, highlighting major findings and recommendation rationale}
---
## Quality Criteria Assessment
| Criterion | Status | Violations | Notes |
| ------------------------------------ | ------------------------------- | ---------- | ------------ |
| BDD Format (Given-When-Then) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Test IDs | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Priority Markers (P0/P1/P2/P3) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Hard Waits (sleep, waitForTimeout) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Determinism (no conditionals) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Isolation (cleanup, no shared state) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Fixture Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Data Factories | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Network-First Pattern | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Explicit Assertions | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
| Test Length (≤300 lines) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {lines} | {brief_note} |
| Test Duration (≤1.5 min) | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {duration} | {brief_note} |
| Flakiness Patterns | {✅ PASS \| ⚠️ WARN \| ❌ FAIL} | {count} | {brief_note} |
**Total Violations**: {critical_count} Critical, {high_count} High, {medium_count} Medium, {low_count} Low
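As an illustration of how a violation count for the "Hard Waits" row might be gathered, the sketch below scans source text line by line. The regular expression covers only the two call names in the criterion label (`sleep`, `waitForTimeout`). That pattern and the helper name `findHardWaits` are assumptions made for this example; a real review would also need to skip comments and string literals.

```typescript
interface HardWaitHit {
  line: number; // 1-based line number in the scanned source
  text: string; // trimmed source line
}

// Assumption: only these two call names count as hard waits in this sketch.
const HARD_WAIT_PATTERN = /\b(?:waitForTimeout|sleep)\s*\(/;

function findHardWaits(source: string): HardWaitHit[] {
  const hits: HardWaitHit[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    if (HARD_WAIT_PATTERN.test(text)) {
      hits.push({ line: index + 1, text: text.trim() });
    }
  });
  return hits;
}
```

The returned line numbers can feed the `{filename}:{line_number}` locations used elsewhere in this report.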
---
## Quality Score Breakdown
```
Starting Score:          100
Critical Violations:     -{critical_count} × 10 = -{critical_deduction}
High Violations:         -{high_count} × 5 = -{high_deduction}
Medium Violations:       -{medium_count} × 2 = -{medium_deduction}
Low Violations:          -{low_count} × 1 = -{low_deduction}

Bonus Points:
  Excellent BDD:         +{0|5}
  Comprehensive Fixtures: +{0|5}
  Data Factories:        +{0|5}
  Network-First:         +{0|5}
  Perfect Isolation:     +{0|5}
  All Test IDs:          +{0|5}
                         --------
Total Bonus:             +{bonus_total}

Final Score:             {final_score}/100
Grade:                   {grade}
```
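The arithmetic in the breakdown can be sketched as a small function. The deduction weights (10/5/2/1) and the 5-point bonuses come from the breakdown itself. Clamping the result to the 0 to 100 range is an assumption, made because the score is reported out of 100. `computeQualityScore` is a hypothetical name, and grade thresholds are deliberately left out because this template does not define them.

```typescript
// Violation counts per severity, as reported in the "Total Violations" line.
interface ViolationCounts {
  critical: number;
  high: number;
  medium: number;
  low: number;
}

// Per-violation deductions taken from the score breakdown.
const DEDUCTIONS: ViolationCounts = { critical: 10, high: 5, medium: 2, low: 1 };

// Each bonus line in the breakdown is worth either 0 or 5 points.
const BONUS_POINTS = 5;

function computeQualityScore(counts: ViolationCounts, bonusesEarned: number): number {
  const deduction =
    counts.critical * DEDUCTIONS.critical +
    counts.high * DEDUCTIONS.high +
    counts.medium * DEDUCTIONS.medium +
    counts.low * DEDUCTIONS.low;
  const raw = 100 - deduction + bonusesEarned * BONUS_POINTS;
  // Assumption: keep the final score within 0..100.
  return Math.max(0, Math.min(100, raw));
}
```

For example, 1 critical, 2 high and 3 medium violations with 2 bonuses earned gives 100 - 26 + 10 = 84.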
---
## Critical Issues (Must Fix)
{If no critical issues: "No critical issues detected. ✅"}
{For each critical issue:}
### {issue_number}. {Issue Title}
**Severity**: P0 (Critical)
**Location**: `{filename}:{line_number}`
**Criterion**: {criterion_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})
**Issue Description**:
{Detailed explanation of what the problem is and why it's critical}
**Current Code**:
```typescript
// ❌ Bad (current implementation)
{
code_snippet_showing_problem;
}
```
**Recommended Fix**:
```typescript
// ✅ Good (recommended approach)
{
code_snippet_showing_solution;
}
```
**Why This Matters**:
{Explanation of impact - flakiness risk, maintainability, reliability}
**Related Violations**:
{If similar issue appears elsewhere, note line numbers}
---
## Recommendations (Should Fix)
{If no recommendations: "No additional recommendations. Test quality is excellent. ✅"}
{For each recommendation:}
### {rec_number}. {Recommendation Title}
**Severity**: {P1 (High) | P2 (Medium) | P3 (Low)}
**Location**: `{filename}:{line_number}`
**Criterion**: {criterion_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})
**Issue Description**:
{Detailed explanation of what could be improved and why}
**Current Code**:
```typescript
// ⚠️ Could be improved (current implementation)
{
code_snippet_showing_current_approach;
}
```
**Recommended Improvement**:
```typescript
// ✅ Better approach (recommended)
{
code_snippet_showing_improvement;
}
```
**Benefits**:
{Explanation of benefits - maintainability, readability, reusability}
**Priority**:
{Why this is P1/P2/P3 - urgency and impact}
---
## Best Practices Found
{If good patterns found, highlight them}
{For each best practice:}
### {practice_number}. {Best Practice Title}
**Location**: `{filename}:{line_number}`
**Pattern**: {pattern_name}
**Knowledge Base**: [{fragment_name}]({fragment_path})
**Why This Is Good**:
{Explanation of why this pattern is excellent}
**Code Example**:
```typescript
// ✅ Excellent pattern demonstrated in this test
{
code_snippet_showing_best_practice;
}
```
**Use as Reference**:
{Encourage using this pattern in other tests}
---
## Test File Analysis
### File Metadata
- **File Path**: `{relative_path_from_project_root}`
- **File Size**: {line_count} lines, {kb_size} KB
- **Test Framework**: {Playwright | Jest | Cypress | Vitest | Other}
- **Language**: {TypeScript | JavaScript}
### Test Structure
- **Describe Blocks**: {describe_count}
- **Test Cases (it/test)**: {test_count}
- **Average Test Length**: {avg_lines_per_test} lines per test
- **Fixtures Used**: {fixture_count} ({fixture_names})
- **Data Factories Used**: {factory_count} ({factory_names})
### Test Scope
- **Test IDs**: {test_id_list}
- **Priority Distribution**:
  - P0 (Critical): {p0_count} tests
  - P1 (High): {p1_count} tests
  - P2 (Medium): {p2_count} tests
  - P3 (Low): {p3_count} tests
  - Unknown: {unknown_count} tests
### Assertions Analysis
- **Total Assertions**: {assertion_count}
- **Assertions per Test**: {avg_assertions_per_test} (avg)
- **Assertion Types**: {assertion_types_used}
---
## Context and Integration
### Related Artifacts
{If story file found:}
- **Story File**: [{story_filename}]({story_path})
{If test-design found:}
- **Test Design**: [{test_design_filename}]({test_design_path})
  - **Risk Assessment**: {risk_level}
  - **Priority Framework**: P0-P3 applied
---
## Knowledge Base References
This review consulted the following knowledge base fragments:
- **[test-quality.md](../../../testarch/knowledge/test-quality.md)** - Definition of Done for tests (no hard waits, <300 lines, <1.5 min, self-cleaning)
- **[fixture-architecture.md](../../../testarch/knowledge/fixture-architecture.md)** - Pure function → Fixture → mergeTests pattern
- **[network-first.md](../../../testarch/knowledge/network-first.md)** - Route intercept before navigate (race condition prevention)
- **[data-factories.md](../../../testarch/knowledge/data-factories.md)** - Factory functions with overrides, API-first setup
- **[test-levels-framework.md](../../../testarch/knowledge/test-levels-framework.md)** - E2E vs API vs Component vs Unit appropriateness
- **[tdd-cycles.md](../../../testarch/knowledge/tdd-cycles.md)** - Red-Green-Refactor patterns
- **[selective-testing.md](../../../testarch/knowledge/selective-testing.md)** - Duplicate coverage detection
- **[ci-burn-in.md](../../../testarch/knowledge/ci-burn-in.md)** - Flakiness detection patterns (10-iteration loop)
- **[test-priorities.md](../../../testarch/knowledge/test-priorities.md)** - P0/P1/P2/P3 classification framework
For coverage mapping, consult `trace` workflow outputs.
See [tea-index.csv](../../../testarch/tea-index.csv) for complete knowledge base.
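To make the "factory functions with overrides" idea listed for `data-factories.md` concrete, here is a minimal sketch. The `User` shape, the `createUser` name and the `example.test` addresses are hypothetical and chosen only for illustration; the knowledge base fragment remains the authoritative description of the pattern.

```typescript
// Hypothetical entity used only for this illustration.
interface User {
  id: string;
  email: string;
  role: "admin" | "member";
}

let sequence = 0;

// Returns a valid default user; callers override only the fields a test cares about.
function createUser(overrides: Partial<User> = {}): User {
  sequence += 1;
  return {
    id: `user-${sequence}`,
    email: `user-${sequence}@example.test`,
    role: "member",
    ...overrides,
  };
}
```

A test that needs an administrator would call `createUser({ role: "admin" })` and still receive a unique `id` and `email`, which helps avoid shared state between tests.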
---
## Next Steps
### Immediate Actions (Before Merge)
1. **{action_1}** - {description}
   - Priority: {P0 | P1 | P2}
   - Owner: {team_or_person}
   - Estimated Effort: {time_estimate}
2. **{action_2}** - {description}
   - Priority: {P0 | P1 | P2}
   - Owner: {team_or_person}
   - Estimated Effort: {time_estimate}
### Follow-up Actions (Future PRs)
1. **{action_1}** - {description}
   - Priority: {P2 | P3}
   - Target: {next_milestone | backlog}
2. **{action_2}** - {description}
   - Priority: {P2 | P3}
   - Target: {next_milestone | backlog}
### Re-Review Needed?
{✅ No re-review needed - approve as-is}
{⚠️ Re-review after critical fixes - request changes, then re-review}
{❌ Major refactor required - block merge, pair programming recommended}
---
## Decision
**Recommendation**: {Approve | Approve with Comments | Request Changes | Block}
**Rationale**:
{1-2 paragraph explanation of recommendation based on findings}
**For Approve**:
> Test quality is excellent/good, with a score of {score}/100. {Minor issues noted can be addressed in follow-up PRs.} Tests are production-ready and follow best practices.
**For Approve with Comments**:
> Test quality is acceptable, with a score of {score}/100. {High-priority recommendations should be addressed but don't block merge.} Critical issues are resolved, but further improvements would enhance maintainability.
**For Request Changes**:
> Test quality needs improvement, with a score of {score}/100. {Critical issues must be fixed before merge.} {X} critical violations were detected that pose flakiness/maintainability risks.
**For Block**:
> Test quality is insufficient, with a score of {score}/100. {Multiple critical issues make tests unsuitable for production.} Recommend a pairing session with a QA engineer to apply patterns from the knowledge base.
---
## Appendix
### Violation Summary by Location
{Table of all violations sorted by line number:}
| Line | Severity | Criterion | Issue | Fix |
| ------ | ------------- | ----------- | ------------- | ----------- |
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
| {line} | {P0/P1/P2/P3} | {criterion} | {brief_issue} | {brief_fix} |
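A sketch of how the rows of this table could be produced, sorted by line number as the note above requires. The `Violation` shape mirrors the table columns. The type and function names are hypothetical.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";

interface Violation {
  line: number;
  severity: Severity;
  criterion: string;
  issue: string;
  fix: string;
}

// Returns one Markdown table row per violation, ordered by ascending line number.
// The input array is copied first so the caller's ordering is left untouched.
function renderViolationRows(violations: Violation[]): string[] {
  return violations
    .slice()
    .sort((a, b) => a.line - b.line)
    .map((v) => `| ${v.line} | ${v.severity} | ${v.criterion} | ${v.issue} | ${v.fix} |`);
}
```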
### Quality Trends
{If reviewing same file multiple times, show trend:}
| Review Date | Score | Grade | Critical Issues | Trend |
| ------------ | ------------- | --------- | --------------- | ----------- |
| {YYYY-MM-DD} | {score_1}/100 | {grade_1} | {count_1} | Improved |
| {YYYY-MM-DD} | {score_2}/100 | {grade_2} | {count_2} | Declined |
| {YYYY-MM-DD} | {score_3}/100 | {grade_3} | {count_3} | Stable |
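The Trend column could be filled in with a comparison like the one below. It assumes the trend is judged on score alone, against the immediately preceding review. The template does not say how the trend is determined, so treat this rule and the name `trendBetween` as illustrative.

```typescript
type Trend = "Improved" | "Declined" | "Stable";

// Compares the current review score with the previous one.
function trendBetween(previousScore: number, currentScore: number): Trend {
  if (currentScore > previousScore) return "Improved";
  if (currentScore < previousScore) return "Declined";
  return "Stable";
}
```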
### Related Reviews
{If reviewing multiple files in directory/suite:}
| File | Score | Grade | Critical | Status |
| -------- | ----------- | ------- | -------- | ------------------ |
| {file_1} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
| {file_2} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
| {file_3} | {score}/100 | {grade} | {count} | {Approved/Blocked} |
**Suite Average**: {avg_score}/100 ({avg_grade})
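A minimal sketch of the suite average, assuming it is the arithmetic mean of the per-file scores rounded to the nearest integer. The rounding choice and the name `suiteAverage` are assumptions.

```typescript
// Averages per-file quality scores for the "Suite Average" line.
function suiteAverage(scores: number[]): number {
  if (scores.length === 0) {
    throw new RangeError("suiteAverage requires at least one score");
  }
  const total = scores.reduce((sum, score) => sum + score, 0);
  // Assumption: report the mean rounded to the nearest integer.
  return Math.round(total / scores.length);
}
```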
---
## Review Metadata
**Generated By**: BMad TEA Agent (Test Architect)
**Workflow**: testarch-test-review v4.0
**Review ID**: test-review-{filename}-{YYYYMMDD}
**Timestamp**: {YYYY-MM-DD HH:MM:SS}
**Version**: 1.0
---
## Feedback on This Review
If you have questions or feedback on this review:
1. Review patterns in the knowledge base: `testarch/knowledge/`
2. Consult tea-index.csv for detailed guidance
3. Request clarification on specific violations
4. Pair with a QA engineer to apply patterns
This review is guidance, not rigid rules. Context matters - if a pattern is justified, document it with a comment.

@@ -0,0 +1,72 @@
---
validationDate: 2026-01-27
workflowName: testarch-test-review
workflowPath: {project-root}/src/workflows/testarch/bmad-testarch-test-review
validationStatus: COMPLETE
completionDate: 2026-01-27 10:03:10
---
# Validation Report: testarch-test-review
**Validation Started:** 2026-01-27 09:50:21
**Validator:** BMAD Workflow Validation System (Codex)
**Standards Version:** BMAD Workflow Standards
## File Structure & Size
- workflow.md present: YES
- instructions.md present: YES
- workflow.yaml present: YES
- step files found: 7
**Step File Sizes:**
- steps-c/step-01-load-context.md: 91 lines [GOOD]
- steps-c/step-02-discover-tests.md: 63 lines [GOOD]
- steps-c/step-03-quality-evaluation.md: 69 lines [GOOD]
- steps-c/step-04-generate-report.md: 65 lines [GOOD]
- steps-e/step-01-assess.md: 51 lines [GOOD]
- steps-e/step-02-apply-edit.md: 46 lines [GOOD]
- steps-v/step-01-validate.md: 53 lines [GOOD]
- workflow-plan.md present: YES
## Frontmatter Validation
- No frontmatter violations found
## Critical Path Violations
- No {project-root} hardcoded paths detected in body
- No dead relative links detected
## Menu Handling Validation
- No menu structures detected (linear step flow) [N/A]
## Step Type Validation
- Last step steps-v/step-01-validate.md has no nextStepFile (final step OK)
- Step type validation assumes linear sequence (no branching/menu). Workflow-plan.md present for reference. [INFO]
## Output Format Validation
- Templates present: test-review-template.md
- Steps with outputFile in frontmatter:
  - steps-c/step-04-generate-report.md
  - steps-v/step-01-validate.md
## Validation Design Check
- checklist.md present: YES
- Validation steps folder (steps-v) present: YES
## Instruction Style Check
- All steps include STEP GOAL, MANDATORY EXECUTION RULES, EXECUTION PROTOCOLS, CONTEXT BOUNDARIES, and SUCCESS/FAILURE metrics
## Summary
- Validation completed: 2026-01-27 10:03:10
- Critical issues: 0
- Warnings: 0 (informational notes only)
- Readiness: READY (manual review optional)

@@ -0,0 +1,114 @@
---
validationDate: 2026-01-27
workflowName: testarch-test-review
workflowPath: {project-root}/src/workflows/testarch/bmad-testarch-test-review
validationStatus: COMPLETE
completionDate: 2026-01-27 10:24:01
---
# Validation Report: testarch-test-review
**Validation Started:** 2026-01-27 10:24:01
**Validator:** BMAD Workflow Validation System (Codex)
**Standards Version:** BMAD Workflow Standards
## File Structure & Size
- workflow.md present: YES
- instructions.md present: YES
- workflow.yaml present: YES
- step files found: 7
**Step File Sizes:**
- steps-c/step-01-load-context.md: 90 lines [GOOD]
- steps-c/step-02-discover-tests.md: 62 lines [GOOD]
- steps-c/step-03-quality-evaluation.md: 68 lines [GOOD]
- steps-c/step-04-generate-report.md: 64 lines [GOOD]
- steps-e/step-01-assess.md: 50 lines [GOOD]
- steps-e/step-02-apply-edit.md: 45 lines [GOOD]
- steps-v/step-01-validate.md: 52 lines [GOOD]
- workflow-plan.md present: YES
## Frontmatter Validation
- No frontmatter violations found
## Critical Path Violations
### Config Variables (Exceptions)
Standard BMAD config variables treated as valid exceptions: bmb_creations_output_folder, communication_language, document_output_language, output_folder, planning_artifacts, project-root, project_name, test_artifacts, user_name
- No {project-root} hardcoded paths detected in body
- No dead relative links detected
- No module path assumptions detected
**Status:** ✅ PASS - No critical violations
## Menu Handling Validation
- No menu structures detected (linear step flow) [N/A]
## Step Type Validation
- steps-c/step-01-load-context.md: Init [PASS]
- steps-c/step-02-discover-tests.md: Middle [PASS]
- steps-c/step-03-quality-evaluation.md: Middle [PASS]
- steps-c/step-04-generate-report.md: Final [PASS]
- Step type validation assumes linear sequence (no branching/menu). Workflow-plan.md present for reference. [INFO]
## Output Format Validation
- Templates present: test-review-template.md
- Steps with outputFile in frontmatter:
  - steps-c/step-04-generate-report.md
  - steps-v/step-01-validate.md
- checklist.md present: YES
## Validation Design Check
- Validation steps folder (steps-v) present: YES
- Validation step(s) present: step-01-validate.md
- Validation steps reference checklist data and auto-proceed
## Instruction Style Check
- Instruction style: Prescriptive (appropriate for TEA quality/compliance workflows)
- Steps emphasize mandatory sequence, explicit success/failure metrics, and risk-based guidance
## Collaborative Experience Check
- Overall facilitation quality: GOOD
- Steps use progressive prompts and clear role reinforcement; no laundry-list interrogation detected
- Flow progression is clear and aligned to workflow goals
## Subagent Optimization Opportunities
- No high-priority subagent optimizations identified; workflow already uses step-file architecture
- Pattern 1 (grep/regex): N/A for most steps
- Pattern 2 (per-file analysis): already aligned to validation structure
- Pattern 3 (data ops): minimal data file loads
- Pattern 4 (parallel): optional for validation only
## Cohesive Review
- Overall assessment: GOOD
- Flow is linear, goals are clear, and outputs map to TEA artifacts
- Voice and tone consistent with Test Architect persona
- Recommendation: READY (minor refinements optional)
## Plan Quality Validation
- Plan file present: workflow-plan.md
- Planned steps found: 7 (all implemented)
- Plan implementation status: Fully Implemented
## Summary
- Validation completed: 2026-01-27 10:24:01
- Critical issues: 0
- Warnings: 0 (informational notes only)
- Readiness: READY (manual review optional)

@@ -0,0 +1,18 @@
# Workflow Plan: testarch-test-review
## Create Mode (steps-c)
- step-01-load-context.md
- step-02-discover-tests.md
- step-03-quality-evaluation.md
- step-04-generate-report.md
## Validate Mode (steps-v)
- step-01-validate.md
## Edit Mode (steps-e)
- step-01-assess.md
- step-02-apply-edit.md
## Outputs
- {test_artifacts}/test-review.md

@@ -0,0 +1,41 @@
---
name: bmad-testarch-test-review
description: Review test quality using best practices validation. Use when the user says 'lets review tests' or 'I want to evaluate test quality'
web_bundle: true
---
# Test Quality Review
**Goal:** Review test quality using the comprehensive knowledge base and best-practices validation.
**Role:** You are the Master Test Architect.
---
## WORKFLOW ARCHITECTURE
This workflow uses **tri-modal step-file architecture**:
- **Create mode (steps-c/)**: primary execution flow
- **Validate mode (steps-v/)**: validation against checklist
- **Edit mode (steps-e/)**: revise existing outputs
---
## INITIALIZATION SEQUENCE
### 1. Mode Determination
"Welcome to the workflow. What would you like to do?"
- **[C] Create** — Run the workflow
- **[R] Resume** — Resume an interrupted workflow
- **[V] Validate** — Validate existing outputs
- **[E] Edit** — Edit existing outputs
### 2. Route to First Step
- **If C:** Load `steps-c/step-01-load-context.md`
- **If R:** Load `steps-c/step-01b-resume.md`
- **If V:** Load `steps-v/step-01-validate.md`
- **If E:** Load `steps-e/step-01-assess.md`
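The routing above amounts to a lookup from the selected mode letter to a step file. A minimal sketch follows. The paths are the ones listed above, while the function name and the lenient input handling (trimming, case-insensitivity) are assumptions for illustration.

```typescript
type Mode = "C" | "R" | "V" | "E";

// First step file for each mode, as listed in "Route to First Step".
const FIRST_STEP: Record<Mode, string> = {
  C: "steps-c/step-01-load-context.md",
  R: "steps-c/step-01b-resume.md",
  V: "steps-v/step-01-validate.md",
  E: "steps-e/step-01-assess.md",
};

// Returns the step file for a menu choice, or undefined for an unknown choice.
function firstStepFor(input: string): string | undefined {
  const key = input.trim().toUpperCase();
  return Object.prototype.hasOwnProperty.call(FIRST_STEP, key)
    ? FIRST_STEP[key as Mode]
    : undefined;
}
```

Returning `undefined` for an unrecognized choice leaves it to the caller to re-prompt rather than guess a mode.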

@@ -0,0 +1,48 @@
# Test Architect workflow: bmad-testarch-test-review
name: bmad-testarch-test-review
# prettier-ignore
description: 'Review test quality using best practices validation. Use when the user says "lets review tests" or "I want to evaluate test quality"'
# Critical variables from config
config_source: "{project-root}/_bmad/tea/config.yaml"
output_folder: "{config_source}:output_folder"
test_artifacts: "{config_source}:test_artifacts"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
document_output_language: "{config_source}:document_output_language"
date: system-generated
# Workflow components
installed_path: "."
instructions: "./instructions.md"
validation: "./checklist.md"
template: "./test-review-template.md"
# Variables and inputs
variables:
  test_dir: "{project-root}/tests" # Root test directory
  review_scope: "single" # single (one file), directory (folder), suite (all tests)
  test_stack_type: "auto" # auto, frontend, backend, fullstack - from config or auto-detected
# Output configuration
default_output_file: "{test_artifacts}/test-review.md"
# Required tools
required_tools:
- read_file # Read test files, story, test-design
- write_file # Create review report
- list_files # Discover test files in directory
- search_repo # Find tests by patterns
- glob # Find test files matching patterns
tags:
- qa
- test-architect
- code-review
- quality
- best-practices
execution_hints:
  interactive: false # Minimize prompts
  autonomous: true # Proceed without user input unless blocked
  iterative: true # Can review multiple files