# Quality Scan Script Opportunities: Reference Guide

**Reference:** see `references/script-standards.md` for script creation guidelines.

This document identifies deterministic operations that should be offloaded from the LLM into scripts for quality validation of BMad agents.

---

## Core Principle

Scripts validate structure and syntax (deterministic). Prompts evaluate semantics and meaning (judgment). Create scripts for checks that have clear pass/fail criteria.

---

## How to Spot Script Opportunities

During build, walk through every capability/operation and apply these tests:

### The Determinism Test

For each operation the agent performs, ask:

- Given identical input, will this ALWAYS produce identical output? → Script
- Does this require interpreting meaning, tone, context, or ambiguity? → Prompt
- Could you write a unit test with expected output for every input? → Script

### The Judgment Boundary

Scripts handle: fetch, transform, validate, count, parse, compare, extract, format, check structure.

Prompts handle: interpret, classify with ambiguity, create, decide with incomplete info, evaluate quality, synthesize meaning.

### Pattern Recognition Checklist

Signal verbs and patterns map to script types as follows:

| Signal Verb/Pattern | Script Type |
|---------------------|-------------|
| "validate", "check", "verify" | Validation script |
| "count", "tally", "aggregate", "sum" | Metric/counting script |
| "extract", "parse", "pull from" | Data extraction script |
| "convert", "transform", "format" | Transformation script |
| "compare", "diff", "match against" | Comparison script |
| "scan for", "find all", "list all" | Pattern scanning script |
| "check structure", "verify exists" | File structure checker |
| "against schema", "conforms to" | Schema validation script |
| "graph", "map dependencies" | Dependency analysis script |

### The Outside-the-Box Test

Beyond obvious validation, consider:

- Could any data-gathering step be a script that returns structured JSON for the LLM to interpret?
- Could pre-processing reduce what the LLM needs to read?
- Could post-processing validate what the LLM produced?
- Could metric collection feed into LLM decision-making without the LLM doing the counting?

### Your Toolbox

Scripts have access to full system capabilities, so think broadly:

- **Bash**: Full shell, including `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, `sort`, `uniq`, `curl`, plus piping and composition
- **Python**: Standard library (`json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml`, etc.) plus PEP 723 inline-declared dependencies (`tiktoken`, `jsonschema`, `pyyaml`, etc.)
- **System tools**: `git` commands for history/diff/blame, filesystem operations, process execution

If you can express the logic as deterministic code, it's a script candidate.

### The --help Pattern

All scripts use PEP 723 and `--help`. When a skill's prompt needs to invoke a script, it can say "Run `scripts/foo.py --help` to understand inputs/outputs, then invoke appropriately" instead of inlining the script's interface. This saves tokens in prompts and keeps a single source of truth for the script's API.
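As a concrete illustration, such a script's scaffold might look like the sketch below. This is a minimal sketch, assuming PEP 723 inline metadata and `argparse`; the script name and the empty check logic are placeholders, not an existing script.

```python
# /// script
# requires-python = ">=3.11"
# dependencies = []
# ///
"""Skeleton for a validation script following the --help pattern (illustrative)."""
import argparse
import json
import sys


def main() -> int:
    parser = argparse.ArgumentParser(
        description="Validate an agent artifact and emit JSON findings on stdout."
    )
    parser.add_argument("--agent-path", required=True, help="Path to the target agent")
    parser.add_argument("--verbose", action="store_true", help="Write diagnostics to stderr")
    args = parser.parse_args()

    findings: list[dict] = []  # a real script would populate this with check results
    if args.verbose:
        print(f"scanning {args.agent_path}", file=sys.stderr)

    # JSON to stdout, exit code reflects pass/fail (see Implementation Checklist below).
    print(json.dumps({
        "script": "example-validator",
        "status": "pass" if not findings else "fail",
        "findings": findings,
    }))
    return 0 if not findings else 1


if __name__ == "__main__":
    sys.exit(main())
```

With this shape, the `--help` output alone is enough for a prompt to discover the script's interface.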
---

## Priority 1: High-Value Validation Scripts

### 1. Frontmatter Validator

**What:** Validate SKILL.md frontmatter structure and content

**Why:** Frontmatter is the #1 factor in skill triggering. Catch errors early.

**Checks:**

```python
# checks:
- name exists and is kebab-case
- description exists and follows pattern "Use when..."
- No forbidden fields (XML, reserved prefixes)
- Optional fields have valid values if present
```

**Output:** JSON with pass/fail per field, line numbers for errors

**Implementation:** Python with argparse, no external deps needed

---

### 2. Manifest Schema Validator

**Status:** ✅ Already exists at `scripts/manifest.py` (create, add-capability, update, read, validate)

**Enhancement opportunities:**

- Add `--agent-path` flag for auto-discovery
- Check menu code uniqueness within agent
- Verify prompt files exist for `type: "prompt"` capabilities
- Verify external skill names are registered (could check against skill registry)

---

### 3. Template Artifact Scanner

**What:** Scan for orphaned template substitution artifacts

**Why:** The build process may leave `{if-autonomous}`, `{displayName}`, etc.

**Output:** JSON with file path, line number, artifact type

**Implementation:** Bash script with JSON output via jq

---

### 4. Access Boundaries Extractor

**What:** Extract and validate access boundaries from memory-system.md

**Why:** Security critical: boundaries must be defined before file operations

**Checks:**

```python
# Parse memory-system.md for:
- ## Read Access section exists
- ## Write Access section exists
- ## Deny Zones section exists (can be empty)
- Paths use placeholders correctly ({project-root} for _bmad paths, relative for skill-internal)
```

**Output:** Structured JSON of read/write/deny zones

**Implementation:** Python with markdown parsing

---

### 5. Prompt Frontmatter Comparator

**What:** Compare prompt file frontmatter against bmad-manifest.json

**Why:** Capability misalignment causes runtime errors

**Checks:**

```python
# For each prompt .md file at skill root:
- Has frontmatter (name, description, menu-code)
- name matches manifest capability name
- menu-code matches manifest (case-insensitive)
- description is present
```

**Output:** JSON with mismatches, missing files

**Implementation:** Python, reads bmad-manifest.json and all prompt .md files at skill root
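The core comparison might reduce to something like the sketch below. It assumes the manifest has a `capabilities` array with `name`, `type`, and `menu-code` fields, and that prompt files are named after their capability; both are assumptions about the BMad layout, not confirmed structure.

```python
# /// script
# requires-python = ">=3.11"
# dependencies = ["pyyaml"]
# ///
"""Core logic for a prompt-frontmatter comparator (illustrative)."""
import json
import pathlib
import re

import yaml


def read_frontmatter(path: pathlib.Path) -> dict:
    # Frontmatter is the YAML block between the leading '---' fences.
    match = re.match(r"^---\n(.*?)\n---", path.read_text(), re.DOTALL)
    return yaml.safe_load(match.group(1)) if match else {}


def compare(skill_root: pathlib.Path) -> list[dict]:
    manifest = json.loads((skill_root / "bmad-manifest.json").read_text())
    findings = []
    for cap in manifest.get("capabilities", []):  # assumed manifest shape
        if cap.get("type") != "prompt":
            continue
        prompt = skill_root / f"{cap['name']}.md"  # assumed naming convention
        if not prompt.exists():
            findings.append({"severity": "high",
                             "issue": f"missing prompt file for {cap['name']}"})
            continue
        fm = read_frontmatter(prompt)
        if fm.get("name") != cap.get("name"):
            findings.append({"severity": "medium",
                             "issue": f"name mismatch in {prompt.name}"})
        menu = str(fm.get("menu-code", "")).lower()
        if menu != str(cap.get("menu-code", "")).lower():  # case-insensitive per spec
            findings.append({"severity": "medium",
                             "issue": f"menu-code mismatch in {prompt.name}"})
    return findings
```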
---

## Priority 2: Analysis Scripts

### 6. Token Counter

**What:** Count tokens in each file of an agent

**Why:** Identify verbose files that need optimization

**Checks:**

```python
# For each .md file:
- Total tokens (approximate: chars / 4)
- Code block tokens
- Token density (tokens / meaningful content)
```

**Output:** JSON with file path, token count, density score

**Implementation:** Python with tiktoken for accurate counting, or char approximation

---

### 7. Dependency Graph Generator

**What:** Map skill → external skill dependencies

**Why:** Understand the agent's dependency surface

**Checks:**

```python
# Parse bmad-manifest.json for external skills
# Parse SKILL.md for skill invocation patterns
# Build dependency graph
```

**Output:** DOT format (GraphViz) or JSON adjacency list

**Implementation:** Python, JSON parsing only

---

### 8. Activation Flow Analyzer

**What:** Parse the SKILL.md On Activation section for its sequence

**Why:** Validate that activation order matches best practices

**Checks:**

```python
# Look for steps in order:
1. Activation mode detection
2. Config loading
3. First-run check
4. Access boundaries load
5. Memory load
6. Manifest load
7. Greet
8. Present menu
```

**Output:** JSON with detected steps, missing steps, out-of-order warnings

**Implementation:** Python with regex pattern matching

---

### 9. Memory Structure Validator

**What:** Validate memory-system.md structure

**Why:** Memory files have specific requirements

**Checks:**

```python
# Required sections:
- ## Core Principle
- ## File Structure
- ## Write Discipline
- ## Memory Maintenance
```

**Output:** JSON with missing sections, validation errors

**Implementation:** Python with markdown parsing

---

### 10. Subagent Pattern Detector

**What:** Detect if agent uses the BMAD Advanced Context Pattern

**Why:** Agents processing 5+ sources MUST use subagents

**Checks:**

```python
# Pattern detection in SKILL.md:
- "DO NOT read sources yourself"
- "delegate to sub-agents"
- "/tmp/analysis-" temp file pattern
- Sub-agent output template (50-100 token summary)
```

**Output:** JSON with pattern found/missing, recommendations

**Implementation:** Python with keyword search and context extraction

---

## Priority 3: Composite Scripts

### 11. Agent Health Check

**What:** Run all validation scripts and aggregate results

**Why:** One-stop shop for agent quality assessment

**Composition:** Runs Priority 1 scripts, aggregates JSON outputs

**Output:** Structured health report with severity levels

**Implementation:** Bash script orchestrating Python scripts, jq for aggregation

---

### 12. Comparison Validator

**What:** Compare two versions of an agent for differences

**Why:** Validate changes during iteration

**Checks:**

```bash
# Git diff with structure awareness:
- Frontmatter changes
- Capability additions/removals
- New prompt files
- Token count changes
```

**Output:** JSON with categorized changes

**Implementation:** Bash with git and jq, plus Python for analysis

---

## Script Output Standard

All scripts MUST output structured JSON for agent consumption:

```json
{
  "script": "script-name",
  "version": "1.0.0",
  "agent_path": "/path/to/agent",
  "timestamp": "2025-03-08T10:30:00Z",
  "status": "pass|fail|warning",
  "findings": [
    {
      "severity": "critical|high|medium|low|info",
      "category": "structure|security|performance|consistency",
      "location": {"file": "SKILL.md", "line": 42},
      "issue": "Clear description",
      "fix": "Specific action to resolve"
    }
  ],
  "summary": {
    "total": 10,
    "critical": 1,
    "high": 2,
    "medium": 3,
    "low": 4
  }
}
```
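A small shared helper can derive the `summary` block, `status`, and exit code mechanically from the findings list, so individual scripts cannot drift from this standard. A minimal sketch; the severity-to-status mapping (critical/high fails, anything else warns) is an assumption, not part of the standard above.

```python
import json
import sys
from collections import Counter
from datetime import datetime, timezone


def emit(script: str, agent_path: str, findings: list[dict]) -> int:
    """Print a standard-conforming result to stdout; return the exit code."""
    counts = Counter(f["severity"] for f in findings)
    # Assumption: critical/high findings fail the run; medium/low only warn.
    status = ("fail" if counts["critical"] or counts["high"]
              else "warning" if findings else "pass")
    print(json.dumps({
        "script": script,
        "version": "1.0.0",
        "agent_path": agent_path,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": status,
        "findings": findings,
        "summary": {"total": len(findings),
                    **{s: counts[s] for s in ("critical", "high", "medium", "low")}},
    }, indent=2))
    return 1 if status == "fail" else 0
```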
---

## Implementation Checklist

When creating validation scripts:

- [ ] Uses `--help` for documentation
- [ ] Accepts `--agent-path` for target agent
- [ ] Outputs JSON to stdout
- [ ] Writes diagnostics to stderr
- [ ] Returns meaningful exit codes (0=pass, 1=fail, 2=error)
- [ ] Includes `--verbose` flag for debugging
- [ ] Has tests in `scripts/tests/` subfolder
- [ ] Self-contained (PEP 723 for Python)
- [ ] No interactive prompts

---

## Integration with Quality Optimizer

The Quality Optimizer should:

1. **First**: Run available scripts for fast, deterministic checks
2. **Then**: Use sub-agents for semantic analysis (requires judgment)
3. **Finally**: Synthesize both sources into the report

**Example flow:**

```bash
# Run all validation scripts
python scripts/validate-frontmatter.py --agent-path {path}
bash scripts/scan-template-artifacts.sh --agent-path {path}
python scripts/compare-prompts-manifest.py --agent-path {path}

# Collect JSON outputs
# Spawn sub-agents only for semantic checks
# Synthesize complete report
```

A Python sketch of this run-then-aggregate orchestration appears at the end of this guide.

---

## Script Creation Priorities

**Phase 1 (Immediate value):**

1. Template Artifact Scanner (Bash + jq)
2. Prompt Frontmatter Comparator (Python)
3. Access Boundaries Extractor (Python)

**Phase 2 (Enhanced validation):**

4. Token Counter (Python)
5. Subagent Pattern Detector (Python)
6. Activation Flow Analyzer (Python)

**Phase 3 (Advanced features):**

7. Dependency Graph Generator (Python)
8. Memory Structure Validator (Python)
9. Agent Health Check orchestrator (Bash)

**Phase 4 (Comparison tools):**

10. Comparison Validator (Bash + Python)
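The Agent Health Check (#11) is specified above as a Bash orchestrator; for consistency with the other examples, here is the same run-then-aggregate logic sketched in Python instead. The validator list is illustrative, and the exit-code handling follows the convention in the Implementation Checklist.

```python
# /// script
# requires-python = ">=3.11"
# dependencies = []
# ///
"""Illustrative orchestrator: run validators, aggregate their JSON findings."""
import json
import subprocess
import sys

# Illustrative commands; real paths depend on which scripts actually ship.
VALIDATORS = [
    ["python", "scripts/validate-frontmatter.py"],
    ["python", "scripts/compare-prompts-manifest.py"],
    ["bash", "scripts/scan-template-artifacts.sh"],
]


def main(agent_path: str) -> int:
    findings = []
    for cmd in VALIDATORS:
        proc = subprocess.run([*cmd, "--agent-path", agent_path],
                              capture_output=True, text=True)
        if proc.returncode == 2:  # per the checklist, 2 means the script itself errored
            print(f"{cmd[-1]} errored: {proc.stderr.strip()}", file=sys.stderr)
            continue
        findings.extend(json.loads(proc.stdout).get("findings", []))
    # Deterministic results in hand; semantic checks go to sub-agents next.
    print(json.dumps({"script": "agent-health-check", "findings": findings}, indent=2))
    return 1 if any(f.get("severity") in ("critical", "high") for f in findings) else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```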