initial commit
@@ -0,0 +1,197 @@

---
name: 'step-01-load-context'
description: 'Load knowledge base, determine scope, and gather context'
nextStepFile: './step-02-discover-tests.md'
knowledgeIndex: '{project-root}/_bmad/tea/testarch/tea-index.csv'
outputFile: '{test_artifacts}/test-review.md'
---

# Step 1: Load Context & Knowledge Base

## STEP GOAL

Determine review scope, load required knowledge fragments, and gather related artifacts.

## MANDATORY EXECUTION RULES

- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`

---

## EXECUTION PROTOCOLS:

- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed

## CONTEXT BOUNDARIES:

- Available context: config, loaded artifacts, and knowledge fragments
- Focus: this step's goal only
- Limits: do not execute future steps
- Dependencies: prior steps' outputs (if any)

## MANDATORY SEQUENCE

**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.

## 1. Determine Scope and Stack

Use `review_scope`:

- **single**: one file
- **directory**: all tests in folder
- **suite**: all tests in repo

If unclear, ask the user.

**Stack Detection** (for context-aware loading):

Read `test_stack_type` from `{config_source}`. If `"auto"` or not configured, infer `{detected_stack}` by scanning `{project-root}`:

- **Frontend indicators**: `playwright.config.*`, `cypress.config.*`, `package.json` with react/vue/angular
- **Backend indicators**: `pyproject.toml`, `pom.xml`/`build.gradle`, `go.mod`, `*.csproj`, `Gemfile`, `Cargo.toml`
- **Both present** → `fullstack`; only frontend → `frontend`; only backend → `backend`
- Explicit `test_stack_type` overrides auto-detection
---

### Tiered Knowledge Loading

Load fragments based on their `tier` classification in `tea-index.csv`:

1. **Core tier** (always load): Foundational fragments required for this workflow
2. **Extended tier** (load on-demand): Load when deeper analysis is needed or when the user's context requires it
3. **Specialized tier** (load only when relevant): Load only when the specific use case matches (e.g., contract-testing only for microservices, email-auth only for email flows)

> **Context Efficiency**: Loading only core fragments reduces context usage by 40-50% compared to loading all fragments.
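The tier rules can be pictured as a filter over the index (illustrative only; the column names `fragment`, `tier`, and `use_case` are assumptions about the shape of `tea-index.csv`):

```javascript
// Hypothetical sketch: select which fragments to load from a tier-annotated CSV index.
function selectFragments(indexCsv, { loadExtended = false, useCases = [] } = {}) {
  const [header, ...rows] = indexCsv.trim().split('\n').map((line) => line.split(','));
  const col = (name) => header.indexOf(name);
  return rows
    .filter((row) => {
      const tier = row[col('tier')];
      if (tier === 'core') return true;                 // always load
      if (tier === 'extended') return loadExtended;     // load on demand
      return useCases.includes(row[col('use_case')]);   // specialized: only when the use case matches
    })
    .map((row) => row[col('fragment')]);
}
```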
### Playwright Utils Loading Profiles

**If `tea_use_playwright_utils` is enabled**, select the appropriate loading profile:

- **API-only profile** (when `{detected_stack}` is `backend` or no `page.goto`/`page.locator` found in test files):

  Load: `overview`, `api-request`, `auth-session`, `recurse` (~1,800 lines)

- **Full UI+API profile** (when `{detected_stack}` is `frontend`/`fullstack` or browser tests detected):

  Load: all Playwright Utils core fragments (~4,500 lines)

**Detection**: Scan `{test_dir}` for files containing `page.goto` or `page.locator`. If none found, use API-only profile.
### Pact.js Utils Loading

**If `tea_use_pactjs_utils` is enabled** (and contract tests detected in review scope):

Load: `pactjs-utils-overview.md`, `pactjs-utils-provider-verifier.md`, `pactjs-utils-request-filter.md` (the 3 most relevant for reviewing provider verification tests)

**If `tea_use_pactjs_utils` is disabled** but contract tests are in review scope:

Load: `contract-testing.md`

### Pact MCP Loading

**If `tea_pact_mcp` is `"mcp"`:**

Load: `pact-mcp.md` — enables the agent to use the SmartBear MCP "Review Pact Tests" tool for automated best-practice feedback during test review.

## 2. Load Knowledge Base

Read `{config_source}` and check `tea_use_playwright_utils`, `tea_use_pactjs_utils`, `tea_pact_mcp`, and `tea_browser_automation` to select the correct fragment set. Then load from `{knowledgeIndex}`:

**Core:**

- `test-quality.md`
- `data-factories.md`
- `test-levels-framework.md`
- `selective-testing.md`
- `test-healing-patterns.md`
- `selector-resilience.md`
- `timing-debugging.md`

**If Playwright Utils enabled:**

- `overview.md`, `api-request.md`, `network-recorder.md`, `auth-session.md`, `intercept-network-call.md`, `recurse.md`, `log.md`, `file-utils.md`, `burn-in.md`, `network-error-monitor.md`, `fixtures-composition.md`

**If disabled:**

- `fixture-architecture.md`
- `network-first.md`
- `playwright-config.md`
- `component-tdd.md`
- `ci-burn-in.md`

**Playwright CLI (if `tea_browser_automation` is "cli" or "auto"):**

- `playwright-cli.md`

**MCP Patterns (if `tea_browser_automation` is "mcp" or "auto"):**

- (existing MCP-related fragments, if any are added in future)

**Pact.js Utils (if enabled and contract tests in review scope):**

- `pactjs-utils-overview.md`, `pactjs-utils-provider-verifier.md`, `pactjs-utils-request-filter.md`

**Contract Testing (if pactjs-utils disabled but contract tests in review scope):**

- `contract-testing.md`

**Pact MCP (if `tea_pact_mcp` is "mcp"):**

- `pact-mcp.md`

---

## 3. Gather Context Artifacts

If available:

- Story file (acceptance criteria)
- Test design doc (priorities)
- Framework config

Summarize what was found.

Coverage mapping and coverage gates are out of scope in `test-review`. Route those concerns to `trace`.

---

## 4. Save Progress

**Save this step's accumulated work to `{outputFile}`.**

- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:

  ```yaml
  ---
  stepsCompleted: ['step-01-load-context']
  lastStep: 'step-01-load-context'
  lastSaved: '{date}'
  ---
  ```

  Then write this step's output below the frontmatter.

- **If `{outputFile}` already exists**, update:
  - Add `'step-01-load-context'` to the `stepsCompleted` array (only if not already present)
  - Set `lastStep: 'step-01-load-context'`
  - Set `lastSaved: '{date}'`
  - Append this step's output to the appropriate section of the document.

**Update `inputDocuments`**: Set `inputDocuments` in the output template frontmatter to the list of artifact paths loaded in this step (e.g., knowledge fragments, test design documents, configuration files).
Load next step: `{nextStepFile}`

## 🚨 SYSTEM SUCCESS/FAILURE METRICS:

### ✅ SUCCESS:

- Step completed in full with required outputs

### ❌ SYSTEM FAILURE:

- Skipped sequence steps or missing outputs

**Master Rule:** Skipping steps is FORBIDDEN.

@@ -0,0 +1,104 @@

---
name: 'step-01b-resume'
description: 'Resume interrupted workflow from last completed step'
outputFile: '{test_artifacts}/test-review.md'
---

# Step 1b: Resume Workflow

## STEP GOAL

Resume an interrupted workflow by loading the existing output document, displaying progress, and routing to the next incomplete step.

## MANDATORY EXECUTION RULES

- Read the entire step file before acting
- Speak in `{communication_language}`

---

## EXECUTION PROTOCOLS:

- Follow the MANDATORY SEQUENCE exactly
- Load the next step only when instructed

## CONTEXT BOUNDARIES:

- Available context: Output document with progress frontmatter
- Focus: Load progress and route to next step
- Limits: Do not re-execute completed steps
- Dependencies: Output document must exist from a previous run

## MANDATORY SEQUENCE

**CRITICAL:** Follow this sequence exactly.

### 1. Load Output Document

Read `{outputFile}` and parse YAML frontmatter for:

- `stepsCompleted`: array of completed step names
- `lastStep`: last completed step name
- `lastSaved`: timestamp of last save

**If `{outputFile}` does not exist**, display:

"No previous progress found. There is no output document to resume from. Please use **[C] Create** to start a fresh workflow run."

**THEN:** Halt. Do not proceed.

---

### 2. Display Progress Dashboard

Display progress with checkmark/empty indicators:

```
Test Quality Review - Resume Progress:

1. Load Context (step-01-load-context) [completed/pending]
2. Discover Tests (step-02-discover-tests) [completed/pending]
3. Quality Evaluation + Aggregate (step-03f-aggregate-scores) [completed/pending]
4. Generate Report (step-04-generate-report) [completed/pending]

Last saved: {lastSaved}
```

---

### 3. Route to Next Step

Based on `lastStep`, load the next incomplete step:

| lastStep                    | Next Step File                    |
| --------------------------- | --------------------------------- |
| `step-01-load-context`      | `./step-02-discover-tests.md`     |
| `step-02-discover-tests`    | `./step-03-quality-evaluation.md` |
| `step-03f-aggregate-scores` | `./step-04-generate-report.md`    |
| `step-04-generate-report`   | **Workflow already complete.**    |
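The routing table above can be expressed as a small lookup (illustrative only; the `routeNext` name and the `action` result shape are assumptions, and the user-facing display strings stay in the prose):

```javascript
// Sketch of the resume-routing table as a lookup map.
const NEXT_STEP = {
  'step-01-load-context': './step-02-discover-tests.md',
  'step-02-discover-tests': './step-03-quality-evaluation.md',
  'step-03f-aggregate-scores': './step-04-generate-report.md',
  'step-04-generate-report': null, // workflow already complete
};

function routeNext(lastStep) {
  if (!(lastStep in NEXT_STEP)) {
    // Unknown progress state: halt rather than guess.
    return { action: 'halt', message: `Unknown progress state (lastStep: ${lastStep}).` };
  }
  const next = NEXT_STEP[lastStep];
  return next ? { action: 'load', file: next } : { action: 'complete' };
}
```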
**If `lastStep` is the final step** (`step-04-generate-report`), display: "All steps completed. Use **[C] Create** to start fresh, **[V] Validate** to review outputs, or **[E] Edit** to make revisions." Then halt.

**If `lastStep` does not match any value above**, display: "Unknown progress state (`lastStep`: {lastStep}). Please use **[C] Create** to start fresh." Then halt.

**Otherwise**, load the identified step file, read it completely, and execute it.

The existing content in `{outputFile}` provides context from previously completed steps.

---

## SYSTEM SUCCESS/FAILURE METRICS

### SUCCESS:

- Output document loaded and parsed correctly
- Progress dashboard displayed accurately
- Routed to the correct next step

### FAILURE:

- Not loading the output document
- Incorrect progress display
- Routing to the wrong step

**Master Rule:** Resume MUST route to the exact next incomplete step. Never re-execute completed steps.

@@ -0,0 +1,113 @@

---
name: 'step-02-discover-tests'
description: 'Find and parse test files'
nextStepFile: './step-03-quality-evaluation.md'
outputFile: '{test_artifacts}/test-review.md'
---

# Step 2: Discover & Parse Tests

## STEP GOAL

Collect test files in scope and parse structure/metadata.

## MANDATORY EXECUTION RULES

- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`

---

## EXECUTION PROTOCOLS:

- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed

## CONTEXT BOUNDARIES:

- Available context: config, loaded artifacts, and knowledge fragments
- Focus: this step's goal only
- Limits: do not execute future steps
- Dependencies: prior steps' outputs (if any)

## MANDATORY SEQUENCE

**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.

## 1. Discover Test Files

Apply `review_scope`:

- **single**: use the provided file path
- **directory**: glob under `{test_dir}` or the selected folder
- **suite**: glob all tests in the repo

Halt if no tests are found.

---

## 2. Parse Metadata (per file)

Collect:

- File size and line count
- Test framework detected
- Describe/test block counts
- Test IDs and priority markers
- Imports, fixtures, factories, network interception
- Waits/timeouts and control flow (if/try/catch)
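The per-file metadata collection can be sketched with simple regex scans (illustrative only; a real implementation might use an AST parser, and the `parseTestFile` name and the particular patterns are assumptions):

```javascript
// Sketch: extract per-file metadata counts from test source text.
function parseTestFile(source) {
  const count = (re) => (source.match(re) || []).length;
  return {
    lineCount: source.split('\n').length,
    describeBlocks: count(/\bdescribe(\.\w+)?\(/g),
    testBlocks: count(/\b(test|it)(\.\w+)?\(/g),
    hardWaits: count(/waitForTimeout\(/g),          // timing signal for later quality checks
    networkIntercepts: count(/\b(route|intercept)\(/g),
    controlFlow: count(/\b(if\s*\(|try\s*\{)/g),    // if/try in tests is a review flag
  };
}
```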
---

## 3. Evidence Collection (if `tea_browser_automation` is `cli` or `auto`)

> **Fallback:** If CLI is not installed, fall back to MCP (if available) or skip evidence collection.

**CLI Evidence Collection:**

All commands use the same named session to target the correct browser:

1. `playwright-cli -s=tea-review open <target_url>`
2. `playwright-cli -s=tea-review tracing-start`
3. Execute the flow under review (using `-s=tea-review` on each command)
4. `playwright-cli -s=tea-review tracing-stop` → saves trace.zip
5. `playwright-cli -s=tea-review screenshot --filename={test_artifacts}/review-evidence.png`
6. `playwright-cli -s=tea-review network` → capture network request log
7. `playwright-cli -s=tea-review close`

> **Session Hygiene:** Always close sessions using `playwright-cli -s=tea-review close`. Do NOT use `close-all` — it kills every session on the machine and breaks parallel execution.

---

## 4. Save Progress

**Save this step's accumulated work to `{outputFile}`.**

- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:

  ```yaml
  ---
  stepsCompleted: ['step-02-discover-tests']
  lastStep: 'step-02-discover-tests'
  lastSaved: '{date}'
  ---
  ```

  Then write this step's output below the frontmatter.

- **If `{outputFile}` already exists**, update:
  - Add `'step-02-discover-tests'` to the `stepsCompleted` array (only if not already present)
  - Set `lastStep: 'step-02-discover-tests'`
  - Set `lastSaved: '{date}'`
  - Append this step's output to the appropriate section of the document.

Load next step: `{nextStepFile}`

## 🚨 SYSTEM SUCCESS/FAILURE METRICS:

### ✅ SUCCESS:

- Step completed in full with required outputs

### ❌ SYSTEM FAILURE:

- Skipped sequence steps or missing outputs

**Master Rule:** Skipping steps is FORBIDDEN.
@@ -0,0 +1,274 @@

---
name: 'step-03-quality-evaluation'
description: 'Orchestrate adaptive quality dimension checks (agent-team, subagent, or sequential)'
nextStepFile: './step-03f-aggregate-scores.md'
---

# Step 3: Orchestrate Adaptive Quality Evaluation

## STEP GOAL

Select the execution mode deterministically, then evaluate the following quality dimensions using agent-team, subagent, or sequential execution while preserving output contracts:

- Determinism
- Isolation
- Maintainability
- Performance

Coverage is intentionally excluded from this workflow and handled by `trace`.

## MANDATORY EXECUTION RULES

- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
- ✅ Resolve execution mode from config (`tea_execution_mode`, `tea_capability_probe`)
- ✅ Apply fallback rules deterministically when the requested mode is unsupported
- ✅ Wait for required worker steps to complete
- ❌ Do NOT skip capability checks when probing is enabled
- ❌ Do NOT proceed until required worker steps finish

---

## EXECUTION PROTOCOLS:

- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Wait for subagent outputs
- 📖 Load the next step only when instructed

## CONTEXT BOUNDARIES:

- Available context: test files from Step 2, knowledge fragments
- Focus: orchestration only (mode selection + worker dispatch)
- Limits: do not evaluate quality directly (delegate to worker steps)

---

## MANDATORY SEQUENCE

### 1. Prepare Execution Context

**Generate a unique timestamp:**

```javascript
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
```
**Prepare context for all subagents:**

```javascript
const parseBooleanFlag = (value, defaultValue = true) => {
  if (typeof value === 'string') {
    const normalized = value.trim().toLowerCase();
    if (['false', '0', 'off', 'no'].includes(normalized)) return false;
    if (['true', '1', 'on', 'yes'].includes(normalized)) return true;
  }
  if (value === undefined || value === null) return defaultValue;
  return Boolean(value);
};

const subagentContext = {
  test_files: testFiles, // hypothetical handle to the files discovered in Step 2
  knowledge_fragments_loaded: ['test-quality'],
  config: {
    execution_mode: config.tea_execution_mode || 'auto', // "auto" | "subagent" | "agent-team" | "sequential"
    capability_probe: parseBooleanFlag(config.tea_capability_probe, true), // supports booleans and "false"/"true" strings
  },
  timestamp: timestamp,
};
```
---

### 2. Resolve Execution Mode with Capability Probe

```javascript
const normalizeUserExecutionMode = (mode) => {
  if (typeof mode !== 'string') return null;
  const normalized = mode.trim().toLowerCase().replace(/[-_]/g, ' ').replace(/\s+/g, ' ');

  if (normalized === 'auto') return 'auto';
  if (normalized === 'sequential') return 'sequential';
  if (normalized === 'subagent' || normalized === 'sub agent' || normalized === 'subagents' || normalized === 'sub agents') {
    return 'subagent';
  }
  if (normalized === 'agent team' || normalized === 'agent teams' || normalized === 'agentteam') {
    return 'agent-team';
  }

  return null;
};

const normalizeConfigExecutionMode = (mode) => {
  if (mode === 'auto' || mode === 'sequential' || mode === 'subagent' || mode === 'agent-team') {
    return mode;
  }
  return null;
};

// Explicit user instruction in the active run takes priority over config.
const explicitModeFromUser = normalizeUserExecutionMode(runtime.getExplicitExecutionModeHint?.() || null);

const requestedMode = explicitModeFromUser || normalizeConfigExecutionMode(subagentContext.config.execution_mode) || 'auto';
const probeEnabled = subagentContext.config.capability_probe;

const supports = {
  subagent: false,
  agentTeam: false,
};

if (probeEnabled) {
  supports.subagent = runtime.canLaunchSubagents?.() === true;
  supports.agentTeam = runtime.canLaunchAgentTeams?.() === true;
}

let resolvedMode = requestedMode;

if (requestedMode === 'auto') {
  if (supports.agentTeam) resolvedMode = 'agent-team';
  else if (supports.subagent) resolvedMode = 'subagent';
  else resolvedMode = 'sequential';
} else if (probeEnabled && requestedMode === 'agent-team' && !supports.agentTeam) {
  resolvedMode = supports.subagent ? 'subagent' : 'sequential';
} else if (probeEnabled && requestedMode === 'subagent' && !supports.subagent) {
  resolvedMode = 'sequential';
}

subagentContext.execution = {
  requestedMode,
  resolvedMode,
  probeEnabled,
  supports,
};
```
Resolution precedence:

1. Explicit user request in this run (`agent team` => `agent-team`; `subagent` => `subagent`; `sequential`; `auto`)
2. `tea_execution_mode` from config
3. Runtime capability fallback (when probing is enabled)

If probing is disabled, honor the requested mode strictly. If that mode cannot be executed at runtime, fail with an explicit error instead of silently falling back.

---

### 3. Dispatch 4 Quality Workers

**Subagent A: Determinism**

- File: `./step-03a-subagent-determinism.md`
- Output: `/tmp/tea-test-review-determinism-${timestamp}.json`
- Execution:
  - `agent-team` or `subagent`: launch non-blocking
  - `sequential`: run blocking and wait
- Status: Running... ⟳

**Subagent B: Isolation**

- File: `./step-03b-subagent-isolation.md`
- Output: `/tmp/tea-test-review-isolation-${timestamp}.json`
- Status: Running... ⟳

**Subagent C: Maintainability**

- File: `./step-03c-subagent-maintainability.md`
- Output: `/tmp/tea-test-review-maintainability-${timestamp}.json`
- Status: Running... ⟳

**Subagent D: Performance**

- File: `./step-03e-subagent-performance.md`
- Output: `/tmp/tea-test-review-performance-${timestamp}.json`
- Status: Running... ⟳

In `agent-team` and `subagent` modes, the runtime decides worker scheduling and concurrency.

---

### 4. Wait for Expected Worker Completion

**If `resolvedMode` is `agent-team` or `subagent`:**

```
⏳ Waiting for 4 quality subagents to complete...
✅ All 4 quality subagents completed successfully!
```

**If `resolvedMode` is `sequential`:**

```
✅ Sequential mode: each worker already completed during dispatch.
```

---

### 5. Verify All Outputs Exist

```javascript
const outputs = ['determinism', 'isolation', 'maintainability', 'performance'].map(
  (dim) => `/tmp/tea-test-review-${dim}-${timestamp}.json`,
);

outputs.forEach((output) => {
  if (!fs.existsSync(output)) {
    throw new Error(`Subagent output missing: ${output}`);
  }
});
```

---

### 6. Execution Report

```
🚀 Performance Report:
- Execution Mode: {resolvedMode}
- Total Elapsed: ~mode-dependent
- Parallel Gain: ~60-70% faster when mode is subagent/agent-team
```

---

### 7. Proceed to Aggregation

Pass the same `timestamp` value to Step 3F (do not regenerate it). Step 3F must read the exact temp files written in this step.

Load next step: `{nextStepFile}`

The aggregation step (3F) will:

- Read all 4 subagent outputs
- Calculate the weighted overall score (0-100)
- Aggregate violations by severity
- Generate the review report with top suggestions

---

## EXIT CONDITION

Proceed to Step 3F when:

- ✅ All 4 subagents completed successfully
- ✅ All output files exist and are valid JSON
- ✅ Execution metrics displayed

**Do NOT proceed if any subagent failed.**

---

## 🚨 SYSTEM SUCCESS METRICS

### ✅ SUCCESS:

- All 4 subagents launched and completed
- All required worker steps completed
- Output files generated and valid
- Fallback behavior respected configuration and capability-probe rules

### ❌ FAILURE:

- One or more subagents failed
- Output files missing or invalid
- Unsupported requested mode with probing disabled

**Master Rule:** Deterministic mode selection + stable output contract. Use the best supported mode, then aggregate normally.
@@ -0,0 +1,214 @@

---
name: 'step-03a-subagent-determinism'
description: 'Subagent: Check test determinism (no random/time dependencies)'
subagent: true
outputFile: '/tmp/tea-test-review-determinism-{{timestamp}}.json'
---

# Subagent 3A: Determinism Quality Check

## SUBAGENT CONTEXT

This is an **isolated subagent** running in parallel with other quality dimension checks.

**What you have from parent workflow:**

- Test files discovered in Step 2
- Knowledge fragment: test-quality (determinism criteria)
- Config: test framework

**Your task:** Analyze test files for DETERMINISM violations only.

---

## MANDATORY EXECUTION RULES

- 📖 Read this entire subagent file before acting
- ✅ Check DETERMINISM only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check isolation, maintainability, coverage, or performance (other subagents)
- ❌ Do NOT modify test files (read-only analysis)
- ❌ Do NOT run tests (just analyze code)

---

## SUBAGENT TASK

### 1. Identify Determinism Violations

**Scan test files for non-deterministic patterns:**

**HIGH SEVERITY Violations**:

- `Math.random()` - random number generation
- `Date.now()` or `new Date()` without mocking
- `setTimeout` / `setInterval` without proper waits
- External API calls without mocking
- File system operations on random paths
- Database queries with non-deterministic ordering

**MEDIUM SEVERITY Violations**:

- `page.waitForTimeout(N)` - hard waits instead of conditions
- Flaky selectors (CSS classes that may change)
- Race conditions (missing proper synchronization)
- Test order dependencies (test A must run before test B)

**LOW SEVERITY Violations**:

- Missing test isolation (shared state between tests)
- Console timestamps without fixed timezone

### 2. Analyze Each Test File

For each test file from Step 2:
```javascript
// Assumes two inputs per file from the discovery step: `testFile` (path)
// and `testFileContent` (source text).
// findLineNumber(pattern): 1-based line of the first occurrence in testFileContent.
const findLineNumber = (pattern) => {
  const idx = testFileContent.indexOf(pattern);
  return idx === -1 ? null : testFileContent.slice(0, idx).split('\n').length;
};

const violations = [];

// Check for Math.random()
if (testFileContent.includes('Math.random()')) {
  violations.push({
    file: testFile,
    line: findLineNumber('Math.random()'),
    severity: 'HIGH',
    category: 'random-generation',
    description: 'Test uses Math.random() - non-deterministic',
    suggestion: 'Use faker.seed(12345) for deterministic random data',
  });
}

// Check for Date.now() / new Date()
if (testFileContent.includes('Date.now()') || testFileContent.includes('new Date()')) {
  const timePattern = testFileContent.includes('Date.now()') ? 'Date.now()' : 'new Date()';
  violations.push({
    file: testFile,
    line: findLineNumber(timePattern), // report whichever pattern actually matched
    severity: 'HIGH',
    category: 'time-dependency',
    description: 'Test uses Date.now() or new Date() without mocking',
    suggestion: 'Mock system time with test.useFakeTimers() or use fixed timestamps',
  });
}

// Check for hard waits
if (testFileContent.includes('waitForTimeout')) {
  violations.push({
    file: testFile,
    line: findLineNumber('waitForTimeout'),
    severity: 'MEDIUM',
    category: 'hard-wait',
    description: 'Test uses waitForTimeout - creates flakiness',
    suggestion: 'Replace with expect(locator).toBeVisible() or waitForResponse',
  });
}

// ... check other patterns
```
### 3. Calculate Determinism Score

**Scoring Logic**:

```javascript
// checksPerFile = number of determinism patterns checked per file;
// totalChecks/passedChecks feed the passed_checks/failed_checks fields in the output JSON.
const totalChecks = testFiles.length * checksPerFile;
const failedChecks = violations.length;
const passedChecks = totalChecks - failedChecks;

// Weight violations by severity
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);

// Score: 100 - (penalty points)
const score = Math.max(0, 100 - totalPenalty);
```
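The output JSON also carries a letter `grade` next to the numeric score. One possible mapping, consistent with the sample output's score of 85 earning a "B" (the exact cutoffs are an assumption; the source does not specify them):

```javascript
// Hypothetical score-to-grade mapping for the "grade" field.
function toGrade(score) {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
}
```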
|
||||
|
||||
---
|
||||
|
||||
## OUTPUT FORMAT
|
||||
|
||||
Write JSON to temp file: `/tmp/tea-test-review-determinism-{{timestamp}}.json`
|
||||
|
||||
```json
|
||||
{
|
||||
"dimension": "determinism",
|
||||
"score": 85,
|
||||
"max_score": 100,
|
||||
"grade": "B",
|
||||
"violations": [
|
||||
{
|
||||
"file": "tests/api/user.spec.ts",
|
||||
"line": 42,
|
||||
"severity": "HIGH",
|
||||
"category": "random-generation",
|
||||
"description": "Test uses Math.random() - non-deterministic",
|
||||
"suggestion": "Use faker.seed(12345) for deterministic random data",
|
||||
"code_snippet": "const userId = Math.random() * 1000;"
|
||||
},
|
||||
{
|
||||
"file": "tests/e2e/checkout.spec.ts",
|
||||
"line": 78,
|
||||
"severity": "MEDIUM",
|
||||
"category": "hard-wait",
|
||||
"description": "Test uses waitForTimeout - creates flakiness",
|
||||
"suggestion": "Replace with expect(locator).toBeVisible()",
|
||||
"code_snippet": "await page.waitForTimeout(5000);"
|
||||
}
|
||||
],
|
||||
"passed_checks": 12,
|
||||
"failed_checks": 3,
|
||||
"total_checks": 15,
|
||||
"violation_summary": {
|
||||
"HIGH": 1,
|
||||
"MEDIUM": 1,
|
||||
"LOW": 1
|
||||
},
|
||||
"recommendations": [
|
||||
"Use faker with fixed seed for all random data",
|
||||
"Replace all waitForTimeout with conditional waits",
|
||||
"Mock Date.now() in tests that use current time"
|
||||
],
|
||||
"summary": "Tests are mostly deterministic with 3 violations (1 HIGH, 1 MEDIUM, 1 LOW)"
|
||||
}
|
||||
```
|
||||
|
||||
**On Error:**
|
||||
|
||||
```json
|
||||
{
|
||||
"dimension": "determinism",
|
||||
"success": false,
|
||||
"error": "Error message describing what went wrong"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## EXIT CONDITION
|
||||
|
||||
Subagent completes when:
|
||||
|
||||
- ✅ All test files analyzed for determinism violations
|
||||
- ✅ Score calculated (0-100)
|
||||
- ✅ Violations categorized by severity
|
||||
- ✅ Recommendations generated
|
||||
- ✅ JSON output written to temp file
|
||||
|
||||
**Subagent terminates here.** Parent workflow will read output and aggregate with other quality dimensions.
|
||||
|
||||
---
|
||||
|
||||
## 🚨 SUBAGENT SUCCESS METRICS
|
||||
|
||||
### ✅ SUCCESS:
|
||||
|
||||
- All test files scanned for determinism violations
|
||||
- Score calculated with proper severity weighting
|
||||
- JSON output valid and complete
|
||||
- Only determinism checked (not other dimensions)
|
||||
|
||||
### ❌ FAILURE:
|
||||
|
||||
- Checked quality dimensions other than determinism
|
||||
- Invalid or missing JSON output
|
||||
- Score calculation incorrect
|
||||
- Modified test files (should be read-only)
|
||||
@@ -0,0 +1,125 @@

---
name: 'step-03b-subagent-isolation'
description: 'Subagent: Check test isolation (no shared state/dependencies)'
subagent: true
outputFile: '/tmp/tea-test-review-isolation-{{timestamp}}.json'
---

# Subagent 3B: Isolation Quality Check

## SUBAGENT CONTEXT

This is an **isolated subagent** running in parallel with other quality dimension checks.

**Your task:** Analyze test files for ISOLATION violations only.

---

## MANDATORY EXECUTION RULES

- ✅ Check ISOLATION only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check determinism, maintainability, coverage, or performance
- ❌ Do NOT modify test files (read-only analysis)

---

## SUBAGENT TASK

### 1. Identify Isolation Violations

**Scan test files for isolation issues:**

**HIGH SEVERITY Violations**:

- Global state mutations (global variables modified)
- Test order dependencies (test B depends on test A running first)
- Shared database records without cleanup
- beforeAll/afterAll with side effects leaking to other tests

**MEDIUM SEVERITY Violations**:

- Missing test cleanup (created data not deleted)
- Shared fixtures that mutate state
- Tests that assume specific execution order
- Environment variables modified without restoration

**LOW SEVERITY Violations**:

- Tests sharing test data (but not mutating)
- Missing test.describe grouping
- Tests that could be more isolated
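The HIGH severity order-dependency pattern and its fix can be sketched with an in-memory store (hypothetical `db`/`createUser` helpers standing in for real fixtures, not a framework API):

```javascript
// Hypothetical in-memory store standing in for a shared database fixture.
const db = new Map();
let nextId = 0;
const createUser = () => {
  const user = { id: `u-${nextId++}`, name: 'fresh' };
  db.set(user.id, user);
  return user;
};

// ❌ Order-dependent: test B mutates a record that test A created,
//    so it fails when run alone or in parallel.

// ✅ Isolated: each test creates its own data (beforeEach) and removes
//    it again (afterEach), so any execution order works.
function runIsolated(testFn) {
  const user = createUser(); // beforeEach: fresh record per test
  try {
    return testFn(user);
  } finally {
    db.delete(user.id); // afterEach: cleanup
  }
}

const renamed = runIsolated((user) => {
  user.name = 'renamed';
  return user.name;
});
```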

### 2. Calculate Isolation Score

```javascript
const totalChecks = testFiles.length * checksPerFile;
const failedChecks = violations.length; // reported as "failed_checks" in the output JSON
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
const score = Math.max(0, 100 - totalPenalty);
```
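Plugging a single HIGH severity violation into the formula gives the score used in the sample output for this dimension:

```javascript
// Worked example: one HIGH violation.
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const violations = [{ severity: 'HIGH', category: 'test-order-dependency' }];

const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0); // 10
const score = Math.max(0, 100 - totalPenalty); // 90
```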

---

## OUTPUT FORMAT

```json
{
  "dimension": "isolation",
  "score": 90,
  "max_score": 100,
  "grade": "A-",
  "violations": [
    {
      "file": "tests/api/integration.spec.ts",
      "line": 15,
      "severity": "HIGH",
      "category": "test-order-dependency",
      "description": "Test depends on previous test creating user record",
      "suggestion": "Each test should create its own test data in beforeEach",
      "code_snippet": "test('should update user', async () => { /* assumes user exists */ });"
    }
  ],
  "passed_checks": 14,
  "failed_checks": 1,
  "total_checks": 15,
  "violation_summary": {
    "HIGH": 1,
    "MEDIUM": 0,
    "LOW": 0
  },
  "recommendations": [
    "Add beforeEach hooks to create test data",
    "Add afterEach hooks to clean up created records",
    "Use test.describe.configure({ mode: 'parallel' }) to enforce isolation"
  ],
  "summary": "Tests are well isolated with 1 HIGH severity violation"
}
```

---

## EXIT CONDITION

Subagent completes when:

- ✅ All test files analyzed for isolation violations
- ✅ Score calculated
- ✅ JSON output written to temp file

**Subagent terminates here.**

---

## 🚨 SUBAGENT SUCCESS METRICS

### ✅ SUCCESS:

- Only isolation checked (not other dimensions)
- JSON output valid and complete

### ❌ FAILURE:

- Checked quality dimensions other than isolation
- Invalid or missing JSON output
@@ -0,0 +1,102 @@

---
name: 'step-03c-subagent-maintainability'
description: 'Subagent: Check test maintainability (readability, structure, DRY)'
subagent: true
outputFile: '/tmp/tea-test-review-maintainability-{{timestamp}}.json'
---

# Subagent 3C: Maintainability Quality Check

## SUBAGENT CONTEXT

This is an **isolated subagent** running in parallel with other quality dimension checks.

**Your task:** Analyze test files for MAINTAINABILITY violations only.

---

## MANDATORY EXECUTION RULES

- ✅ Check MAINTAINABILITY only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check determinism, isolation, coverage, or performance
- ❌ Do NOT modify test files (read-only analysis)

---

## SUBAGENT TASK

### 1. Identify Maintainability Violations

**HIGH SEVERITY Violations**:

- Tests >100 lines (too complex)
- No test.describe grouping
- Duplicate test logic (copy-paste)
- Unclear test names (no Given/When/Then structure)
- Magic numbers/strings without named constants

**MEDIUM SEVERITY Violations**:

- Complex logic without explanatory comments
- Inconsistent naming conventions
- Excessive nesting (>3 levels)
- Large setup/teardown blocks

**LOW SEVERITY Violations**:

- Minor code style issues
- Logic that could be extracted into helper functions
- Inconsistent assertion styles
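The magic-number and helper-extraction fixes can be sketched as follows (all names hypothetical, for illustration only):

```javascript
// ❌ expect(cart.total).toBe(107.98); // magic number -- why this value?

// ✅ Named constants plus a small helper make the expectation
//    self-documenting and reusable across tests.
const UNIT_PRICE = 49.99; // hypothetical product price
const TAX_RATE = 0.08;

function expectedTotal(quantity) {
  const subtotal = UNIT_PRICE * quantity;
  return Math.round(subtotal * (1 + TAX_RATE) * 100) / 100; // round to cents
}

const total = expectedTotal(2); // 107.98
```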

### 2. Calculate Maintainability Score

```javascript
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
const score = Math.max(0, 100 - totalPenalty);
```

---

## OUTPUT FORMAT

```json
{
  "dimension": "maintainability",
  "score": 75,
  "max_score": 100,
  "grade": "C",
  "violations": [
    {
      "file": "tests/e2e/complex-flow.spec.ts",
      "line": 1,
      "severity": "HIGH",
      "category": "test-too-long",
      "description": "Test file is 250 lines - too complex to maintain",
      "suggestion": "Split into multiple smaller test files by feature area",
      "code_snippet": "test.describe('Complex flow', () => { /* 250 lines */ });"
    }
  ],
  "passed_checks": 12,
  "failed_checks": 3,
  "total_checks": 15,
  "violation_summary": {
    "HIGH": 2,
    "MEDIUM": 1,
    "LOW": 0
  },
  "recommendations": [
    "Split large test files into smaller, focused files (<100 lines each)",
    "Add test.describe grouping for related tests",
    "Extract duplicate logic into helper functions"
  ],
  "summary": "Tests have maintainability issues - 3 violations (2 HIGH, 1 MEDIUM)"
}
```

---

## EXIT CONDITION

Subagent completes when JSON output is written to the temp file.

**Subagent terminates here.**
@@ -0,0 +1,117 @@

---
name: 'step-03e-subagent-performance'
description: 'Subagent: Check test performance (speed, efficiency, parallelization)'
subagent: true
outputFile: '/tmp/tea-test-review-performance-{{timestamp}}.json'
---

# Subagent 3E: Performance Quality Check

## SUBAGENT CONTEXT

This is an **isolated subagent** running in parallel with other quality dimension checks.

**Your task:** Analyze test files for PERFORMANCE violations only.

---

## MANDATORY EXECUTION RULES

- ✅ Check PERFORMANCE only (not other quality dimensions)
- ✅ Output structured JSON to temp file
- ❌ Do NOT check determinism, isolation, maintainability, or coverage
- ❌ Do NOT modify test files (read-only analysis)

---

## SUBAGENT TASK

### 1. Identify Performance Violations

**HIGH SEVERITY Violations**:

- Tests not parallelizable (using test.describe.serial unnecessarily)
- Slow setup/teardown (creating a fresh DB for every test)
- Excessive navigation (reloading pages unnecessarily)
- No fixture reuse (repeating expensive operations)

**MEDIUM SEVERITY Violations**:

- Hard waits >2 seconds (e.g. waitForTimeout(5000))
- Inefficient selectors (page.$$ instead of locators)
- Large data sets in tests without pagination
- Missed performance optimizations

**LOW SEVERITY Violations**:

- Could use parallelization (test.describe.configure({ mode: 'parallel' }))
- Minor inefficiencies
- Excessive logging
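The fixture-reuse point can be sketched outside any test framework (hypothetical `getFixture` helper; in Playwright this role is typically played by worker-scoped fixtures):

```javascript
// ❌ Repeating expensive setup in every test multiplies total runtime.
// ✅ Memoize it once and share the result across tests in the worker.
let setupRuns = 0;
let cached;

function getFixture() {
  if (!cached) {
    setupRuns += 1; // expensive work (e.g. seeding data) happens only once
    cached = { users: Array.from({ length: 5 }, (_, i) => ({ id: i })) };
  }
  return cached;
}

const first = getFixture();
const second = getFixture();
const third = getFixture();
```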

### 2. Calculate Performance Score

```javascript
const severityWeights = { HIGH: 10, MEDIUM: 5, LOW: 2 };
const totalPenalty = violations.reduce((sum, v) => sum + severityWeights[v.severity], 0);
const score = Math.max(0, 100 - totalPenalty);
```

---

## OUTPUT FORMAT

```json
{
  "dimension": "performance",
  "score": 85,
  "max_score": 100,
  "grade": "B",
  "violations": [
    {
      "file": "tests/e2e/search.spec.ts",
      "line": 10,
      "severity": "HIGH",
      "category": "not-parallelizable",
      "description": "Tests use test.describe.serial unnecessarily - reduces parallel execution",
      "suggestion": "Remove .serial unless tests truly share state",
      "code_snippet": "test.describe.serial('Search tests', () => { ... });"
    },
    {
      "file": "tests/api/bulk-operations.spec.ts",
      "line": 35,
      "severity": "MEDIUM",
      "category": "slow-setup",
      "description": "Test creates 1000 records in setup - very slow",
      "suggestion": "Use smaller data sets or fixture factories",
      "code_snippet": "beforeEach(async () => { for (let i = 0; i < 1000; i++) { ... } });"
    }
  ],
  "passed_checks": 13,
  "failed_checks": 2,
  "total_checks": 15,
  "violation_summary": {
    "HIGH": 1,
    "MEDIUM": 1,
    "LOW": 0
  },
  "performance_metrics": {
    "parallelizable_tests": 80,
    "serial_tests": 20,
    "avg_test_duration_estimate": "~2 seconds",
    "slow_tests": ["bulk-operations.spec.ts (>30s)"]
  },
  "recommendations": [
    "Enable parallel mode where possible",
    "Reduce setup data to the minimum needed",
    "Use fixtures to share expensive setup across tests",
    "Remove unnecessary .serial constraints"
  ],
  "summary": "Good performance with 2 violations - 80% of tests can run in parallel"
}
```

---

## EXIT CONDITION

Subagent completes when JSON output is written to the temp file.

**Subagent terminates here.**
@@ -0,0 +1,277 @@

---
name: 'step-03f-aggregate-scores'
description: 'Aggregate quality dimension scores into overall 0-100 score'
nextStepFile: './step-04-generate-report.md'
outputFile: '{test_artifacts}/test-review.md'
---

# Step 3F: Aggregate Quality Scores

## STEP GOAL

Read outputs from 4 quality subagents, calculate weighted overall score (0-100), and aggregate violations for report generation.

---

## MANDATORY EXECUTION RULES

- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`
- ✅ Read all 4 subagent outputs
- ✅ Calculate weighted overall score
- ✅ Aggregate violations by severity
- ❌ Do NOT re-evaluate quality (use subagent outputs)

---

## EXECUTION PROTOCOLS:

- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed

---

## MANDATORY SEQUENCE

### 1. Read All Subagent Outputs

```javascript
// Use the SAME timestamp generated in Step 3 (do not regenerate).
const timestamp = subagentContext?.timestamp;
if (!timestamp) {
  throw new Error('Missing timestamp from Step 3 context. Pass Step 3 timestamp into Step 3F.');
}

const dimensions = ['determinism', 'isolation', 'maintainability', 'performance'];
const results = {};

dimensions.forEach((dim) => {
  const outputPath = `/tmp/tea-test-review-${dim}-${timestamp}.json`;
  results[dim] = JSON.parse(fs.readFileSync(outputPath, 'utf8'));
});
```

**Verify all succeeded:**

```javascript
const allSucceeded = dimensions.every((dim) => results[dim].score !== undefined);
if (!allSucceeded) {
  throw new Error('One or more quality subagents failed!');
}
```

---

### 2. Calculate Weighted Overall Score

**Dimension Weights** (based on TEA quality priorities):

```javascript
const weights = {
  determinism: 0.3, // 30% - Reliability and flake prevention
  isolation: 0.3, // 30% - Parallel safety and independence
  maintainability: 0.25, // 25% - Readability and long-term health
  performance: 0.15, // 15% - Speed and execution efficiency
};
```

**Calculate overall score:**

```javascript
const overallScore = dimensions.reduce((sum, dim) => {
  return sum + results[dim].score * weights[dim];
}, 0);

const roundedScore = Math.round(overallScore);
```

**Determine grade:**

```javascript
const getGrade = (score) => {
  if (score >= 90) return 'A';
  if (score >= 80) return 'B';
  if (score >= 70) return 'C';
  if (score >= 60) return 'D';
  return 'F';
};

const overallGrade = getGrade(roundedScore);
```
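A worked example with hypothetical dimension scores confirms the weighting produces a 0-100 result:

```javascript
// Hypothetical dimension scores, weighted per the table above.
const weights = { determinism: 0.3, isolation: 0.3, maintainability: 0.25, performance: 0.15 };
const scores = { determinism: 85, isolation: 90, maintainability: 75, performance: 80 };

const overall = Object.keys(weights).reduce((sum, dim) => sum + scores[dim] * weights[dim], 0);
// 85*0.30 + 90*0.30 + 75*0.25 + 80*0.15 = 25.5 + 27 + 18.75 + 12 = 83.25
const roundedScore = Math.round(overall); // 83 -> grade B
```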

---

### 3. Aggregate Violations by Severity

**Collect all violations from all dimensions:**

```javascript
const allViolations = dimensions.flatMap((dim) =>
  results[dim].violations.map((v) => ({
    ...v,
    dimension: dim,
  })),
);

// Group by severity
const highSeverity = allViolations.filter((v) => v.severity === 'HIGH');
const mediumSeverity = allViolations.filter((v) => v.severity === 'MEDIUM');
const lowSeverity = allViolations.filter((v) => v.severity === 'LOW');

const violationSummary = {
  total: allViolations.length,
  HIGH: highSeverity.length,
  MEDIUM: mediumSeverity.length,
  LOW: lowSeverity.length,
};
```

---

### 4. Prioritize Recommendations

**Extract recommendations from all dimensions:**

```javascript
const allRecommendations = dimensions.flatMap((dim) =>
  results[dim].recommendations.map((rec) => ({
    dimension: dim,
    recommendation: rec,
    impact: results[dim].score < 70 ? 'HIGH' : 'MEDIUM',
  })),
);

// Sort by impact (HIGH first). The comparator must return 0 for equal
// impacts to be a valid, consistent sort.
const prioritizedRecommendations = allRecommendations
  .sort((a, b) => (a.impact === b.impact ? 0 : a.impact === 'HIGH' ? -1 : 1))
  .slice(0, 10); // Top 10 recommendations
```

---

### 5. Create Review Summary Object

**Aggregate all results:**

```javascript
const reviewSummary = {
  overall_score: roundedScore,
  overall_grade: overallGrade,
  quality_assessment: getQualityAssessment(roundedScore),

  dimension_scores: {
    determinism: results.determinism.score,
    isolation: results.isolation.score,
    maintainability: results.maintainability.score,
    performance: results.performance.score,
  },

  dimension_grades: {
    determinism: results.determinism.grade,
    isolation: results.isolation.grade,
    maintainability: results.maintainability.grade,
    performance: results.performance.grade,
  },

  violations_summary: violationSummary,
  all_violations: allViolations,
  high_severity_violations: highSeverity,
  top_10_recommendations: prioritizedRecommendations,

  subagent_execution: 'PARALLEL (4 quality dimensions)',
  performance_gain: '~60% faster than sequential',
};

// Save for Step 4 (report generation)
fs.writeFileSync(`/tmp/tea-test-review-summary-${timestamp}.json`, JSON.stringify(reviewSummary, null, 2), 'utf8');
```

---

### 6. Display Summary to User

```
✅ Quality Evaluation Complete (Parallel Execution)

📊 Overall Quality Score: {roundedScore}/100 (Grade: {overallGrade})

📈 Dimension Scores:
- Determinism: {determinism_score}/100 ({determinism_grade})
- Isolation: {isolation_score}/100 ({isolation_grade})
- Maintainability: {maintainability_score}/100 ({maintainability_grade})
- Performance: {performance_score}/100 ({performance_grade})

ℹ️ Coverage is excluded from `test-review` scoring. Use `trace` for coverage analysis and gates.

⚠️ Violations Found:
- HIGH: {high_count} violations
- MEDIUM: {medium_count} violations
- LOW: {low_count} violations
- TOTAL: {total_count} violations

🚀 Performance: Parallel execution ~60% faster than sequential

✅ Ready for report generation (Step 4)
```

---

### 7. Save Progress

**Save this step's accumulated work to `{outputFile}`.**

- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:

```yaml
---
stepsCompleted: ['step-03f-aggregate-scores']
lastStep: 'step-03f-aggregate-scores'
lastSaved: '{date}'
---
```

Then write this step's output below the frontmatter.

- **If `{outputFile}` already exists**, update:
  - Add `'step-03f-aggregate-scores'` to the `stepsCompleted` array (only if not already present)
  - Set `lastStep: 'step-03f-aggregate-scores'`
  - Set `lastSaved: '{date}'`
  - Append this step's output to the appropriate section of the document.
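The frontmatter update above can be sketched with a minimal hand-rolled parser (illustrative only; a real implementation might use a YAML library instead of regexes):

```javascript
// Update stepsCompleted/lastStep/lastSaved inside a '---'-delimited frontmatter block.
function updateFrontmatter(doc, step, date) {
  const m = doc.match(/^---\n([\s\S]*?)\n---\n/);
  if (!m) return doc; // no frontmatter -- leave the document untouched
  let fm = m[1];
  if (!fm.includes(`'${step}'`)) {
    // append to the stepsCompleted array only if not already present
    fm = fm.replace(/stepsCompleted: \[(.*?)\]/, (_, list) =>
      `stepsCompleted: [${list ? list + ', ' : ''}'${step}']`);
  }
  fm = fm.replace(/lastStep: .*/, `lastStep: '${step}'`);
  fm = fm.replace(/lastSaved: .*/, `lastSaved: '${date}'`);
  return doc.replace(m[0], `---\n${fm}\n---\n`);
}

const before = `---\nstepsCompleted: ['step-02-discover-tests']\nlastStep: 'step-02-discover-tests'\nlastSaved: '2024-01-01'\n---\nBody`;
const after = updateFrontmatter(before, 'step-03f-aggregate-scores', '2024-01-02');
```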

---

## EXIT CONDITION

Proceed to Step 4 when:

- ✅ All subagent outputs read successfully
- ✅ Overall score calculated
- ✅ Violations aggregated
- ✅ Recommendations prioritized
- ✅ Summary saved to temp file
- ✅ Output displayed to user
- ✅ Progress saved to output document

Load next step: `{nextStepFile}`

---

## 🚨 SYSTEM SUCCESS METRICS

### ✅ SUCCESS:

- All 4 subagent outputs read and parsed
- Overall score calculated with proper weights
- Violations aggregated correctly
- Summary complete and saved

### ❌ FAILURE:

- Failed to read one or more subagent outputs
- Score calculation incorrect
- Summary missing or incomplete

**Master Rule:** Aggregate determinism, isolation, maintainability, and performance only.
@@ -0,0 +1,111 @@

---
name: 'step-04-generate-report'
description: 'Create test-review report and validate'
outputFile: '{test_artifacts}/test-review.md'
---

# Step 4: Generate Report & Validate

## STEP GOAL

Produce the test-review report and validate it against the checklist.

## MANDATORY EXECUTION RULES

- 📖 Read the entire step file before acting
- ✅ Speak in `{communication_language}`

---

## EXECUTION PROTOCOLS:

- 🎯 Follow the MANDATORY SEQUENCE exactly
- 💾 Record outputs before proceeding
- 📖 Load the next step only when instructed

## CONTEXT BOUNDARIES:

- Available context: config, loaded artifacts, and knowledge fragments
- Focus: this step's goal only
- Limits: do not execute future steps
- Dependencies: prior steps' outputs (if any)

## MANDATORY SEQUENCE

**CRITICAL:** Follow this sequence exactly. Do not skip, reorder, or improvise.

## 1. Report Generation

Use `test-review-template.md` to produce `{outputFile}` including:

- Score summary
- Critical findings with fixes
- Warnings and recommendations
- Context references (story/test-design if available)
- Coverage boundary note: `test-review` does not score coverage. Direct coverage findings to `trace`.

---

## 2. Polish Output

Before finalizing, review the complete output document for quality:

1. **Remove duplication**: The progressive-append workflow may have created repeated sections; consolidate them
2. **Verify consistency**: Ensure terminology, risk scores, and references are consistent throughout
3. **Check completeness**: All template sections should be populated or explicitly marked N/A
4. **Format cleanup**: Ensure markdown formatting is clean (tables aligned, headers consistent, no orphaned references)

---

## 3. Validation

Validate against `checklist.md` and fix any gaps.

- [ ] CLI sessions cleaned up (no orphaned browsers)
- [ ] Temp artifacts stored in `{test_artifacts}/`, not random locations

---

## 4. Save Progress

**Save this step's accumulated work to `{outputFile}`.**

- **If `{outputFile}` does not exist** (first save), create it using the workflow template (if available) with YAML frontmatter:

```yaml
---
stepsCompleted: ['step-04-generate-report']
lastStep: 'step-04-generate-report'
lastSaved: '{date}'
---
```

Then write this step's output below the frontmatter.

- **If `{outputFile}` already exists**, update:
  - Add `'step-04-generate-report'` to the `stepsCompleted` array (only if not already present)
  - Set `lastStep: 'step-04-generate-report'`
  - Set `lastSaved: '{date}'`
  - Append this step's output to the appropriate section of the document.

---

## 5. Completion Summary

Report:

- Scope reviewed
- Overall score
- Critical blockers
- Next recommended workflow (e.g., `automate` or `trace`)

## 🚨 SYSTEM SUCCESS/FAILURE METRICS:

### ✅ SUCCESS:

- Step completed in full with required outputs

### ❌ SYSTEM FAILURE:

- Skipped sequence steps or missing outputs

**Master Rule:** Skipping steps is FORBIDDEN.