bi-agents/.agents/skills/bmad-agent-builder/references/script-opportunities-reference.md
Cassel 647cbec54f docs: update all documentation and add AI tooling configs
2026-03-19 13:29:03 -04:00


Quality Scan Script Opportunities — Reference Guide

See references/script-standards.md for script creation guidelines.

This document identifies deterministic operations that should be offloaded from the LLM into scripts for quality validation of BMad agents.


Core Principle

Scripts validate structure and syntax (deterministic). Prompts evaluate semantics and meaning (judgment). Create scripts for checks that have clear pass/fail criteria.


How to Spot Script Opportunities

During build, walk through every capability/operation and apply these tests:

The Determinism Test

For each operation the agent performs, ask:

  • Given identical input, will this ALWAYS produce identical output? → Script
  • Does this require interpreting meaning, tone, context, or ambiguity? → Prompt
  • Could you write a unit test with expected output for every input? → Script

The Judgment Boundary

Scripts handle: fetch, transform, validate, count, parse, compare, extract, format, check structure

Prompts handle: interpret, classify with ambiguity, create, decide with incomplete info, evaluate quality, synthesize meaning

Pattern Recognition Checklist

Signal verbs and patterns map to script types as follows:

| Signal Verb/Pattern | Script Type |
|---|---|
| "validate", "check", "verify" | Validation script |
| "count", "tally", "aggregate", "sum" | Metric/counting script |
| "extract", "parse", "pull from" | Data extraction script |
| "convert", "transform", "format" | Transformation script |
| "compare", "diff", "match against" | Comparison script |
| "scan for", "find all", "list all" | Pattern scanning script |
| "check structure", "verify exists" | File structure checker |
| "against schema", "conforms to" | Schema validation script |
| "graph", "map dependencies" | Dependency analysis script |

The Outside-the-Box Test

Beyond obvious validation, consider:

  • Could any data gathering step be a script that returns structured JSON for the LLM to interpret?
  • Could pre-processing reduce what the LLM needs to read?
  • Could post-processing validate what the LLM produced?
  • Could metric collection feed into LLM decision-making without the LLM doing the counting?

Your Toolbox

Scripts have access to full capabilities — think broadly:

  • Bash: Full shell — jq, grep, awk, sed, find, diff, wc, sort, uniq, curl, plus piping and composition
  • Python: Standard library (json, yaml, pathlib, re, argparse, collections, difflib, ast, csv, xml, etc.) plus PEP 723 inline-declared dependencies (tiktoken, jsonschema, pyyaml, etc.)
  • System tools: git commands for history/diff/blame, filesystem operations, process execution

If you can express the logic as deterministic code, it's a script candidate.

The --help Pattern

Every script supports --help, and Python scripts declare their dependencies with PEP 723 inline metadata. When a skill's prompt needs to invoke a script, it can say "Run scripts/foo.py --help to understand inputs/outputs, then invoke appropriately" instead of inlining the script's interface. This saves tokens in prompts and keeps a single source of truth for the script's API.
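As an illustration of the pattern, here is a minimal sketch of a self-documenting script. The file name foo.py comes from the example above; what the script does (listing Markdown files) and its JSON keys are hypothetical, chosen only to keep the sketch short.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Hypothetical scripts/foo.py: list the Markdown files in an agent folder."""
import argparse
import json
import sys
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # The description and epilog are what a prompt reads via --help,
    # so inputs, outputs, and exit codes are documented here.
    parser = argparse.ArgumentParser(
        prog="foo.py",
        description="List the Markdown files in an agent folder.",
        epilog="Output: JSON on stdout. Exit codes: 0=pass, 1=fail, 2=error.",
    )
    parser.add_argument("--agent-path", required=True, help="Path to the agent folder.")
    parser.add_argument("--verbose", action="store_true", help="Write diagnostics to stderr.")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    root = Path(args.agent_path)
    if not root.is_dir():
        print(f"error: {root} is not a directory", file=sys.stderr)
        return 2
    md_files = sorted(p.name for p in root.glob("*.md"))
    if args.verbose:
        print(f"scanned {root}", file=sys.stderr)
    print(json.dumps({"script": "foo", "markdown_files": md_files}))
    return 0
```

Saved as a standalone file, the script would end by passing the result of `main()` to `sys.exit` under the usual `__main__` guard, so that the exit code reaches the caller.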


Priority 1: High-Value Validation Scripts

1. Frontmatter Validator

What: Validate SKILL.md frontmatter structure and content

Why: Frontmatter is the #1 factor in skill triggering. Catch errors early.

Checks:

# checks:
- name exists and is kebab-case
- description exists and follows pattern "Use when..."
- No forbidden fields (XML, reserved prefixes)
- Optional fields have valid values if present

Output: JSON with pass/fail per field, line numbers for errors

Implementation: Python with argparse, no external deps needed
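A rough sketch of the first two checks, using only the standard library. It assumes frontmatter made of simple single-line `key: value` pairs between `---` delimiters; it is not a full YAML parser, and the finding keys are illustrative.

```python
import re

KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def parse_frontmatter(text: str) -> dict:
    """Return {key: (value, line_number)} for simple `key: value` frontmatter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for number, line in enumerate(lines[1:], start=2):
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        # Skip indented continuation lines and comments.
        if sep and not line.startswith((" ", "\t", "#")):
            fields[key.strip()] = (value.strip().strip("\"'"), number)
    return fields


def validate_frontmatter(text: str) -> list:
    """Return a list of findings; an empty list means both checks passed."""
    fields = parse_frontmatter(text)
    findings = []
    name, name_line = fields.get("name", (None, 1))
    if not name:
        findings.append({"field": "name", "line": name_line, "issue": "missing"})
    elif not KEBAB_CASE.match(name):
        findings.append({"field": "name", "line": name_line, "issue": "not kebab-case"})
    description, desc_line = fields.get("description", (None, 1))
    if not description:
        findings.append({"field": "description", "line": desc_line, "issue": "missing"})
    elif not description.startswith("Use when"):
        findings.append({"field": "description", "line": desc_line,
                         "issue": 'does not start with "Use when"'})
    return findings
```

The forbidden-field and optional-field checks are left out because their exact rules are not spelled out here.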


2. Manifest Schema Validator

Status: Already exists at scripts/manifest.py (create, add-capability, update, read, validate)

Enhancement opportunities:

  • Add --agent-path flag for auto-discovery
  • Check menu code uniqueness within agent
  • Verify prompt files exist for type: "prompt" capabilities
  • Verify external skill names are registered (could check against skill registry)

3. Template Artifact Scanner

What: Scan for orphaned template substitution artifacts

Why: The build process may leave unresolved placeholders such as {if-autonomous} or {displayName} in generated files.

Output: JSON with file path, line number, artifact type

Implementation: Bash script with JSON output via jq
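The scanning logic can be sketched as follows. This sketch uses Python rather than the Bash and jq combination proposed above, purely to keep the examples in one language. The allow-list of legitimate placeholders is an assumption; only {project-root} is named elsewhere in this document as a valid runtime placeholder.

```python
import re
from pathlib import Path

# Matches single-brace tokens such as {displayName} or {if-autonomous}.
PLACEHOLDER = re.compile(r"\{([A-Za-z][A-Za-z0-9_-]*)\}")


def scan_text(text, file="<string>", allowed=frozenset({"project-root"})):
    """Return one finding per placeholder that is not on the allow-list."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            if match.group(1) not in allowed:
                findings.append(
                    {"file": file, "line": line_number, "artifact": match.group(0)}
                )
    return findings


def scan_agent(agent_path):
    """Scan every Markdown file under agent_path."""
    findings = []
    for path in sorted(Path(agent_path).rglob("*.md")):
        findings.extend(scan_text(path.read_text(encoding="utf-8"), file=str(path)))
    return findings
```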


4. Access Boundaries Extractor

What: Extract and validate access boundaries from memory-system.md

Why: Security critical — must be defined before file operations

Checks:

# Parse memory-system.md for:
- ## Read Access section exists
- ## Write Access section exists
- ## Deny Zones section exists (can be empty)
- Paths use placeholders correctly ({project-root} for _bmad paths, relative for skill-internal)

Output: Structured JSON of read/write/deny zones

Implementation: Python with markdown parsing
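One possible shape for the extractor is sketched below. It assumes each zone lists its paths as Markdown bullets directly under the `##` heading; the actual layout of memory-system.md may differ, and the output keys are illustrative.

```python
import re

SECTIONS = {"Read Access": "read", "Write Access": "write", "Deny Zones": "deny"}


def extract_boundaries(markdown: str) -> dict:
    """Collect bullet items under each access section and report missing sections."""
    zones = {}
    current = None
    for line in markdown.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)
        if heading:
            # Any other level-2 heading ends the current zone.
            current = SECTIONS.get(heading.group(1))
            if current:
                zones.setdefault(current, [])
            continue
        item = re.match(r"^\s*[-*]\s+(.+?)\s*$", line)
        if current and item:
            zones[current].append(item.group(1).strip("`"))
    missing = [title for title, key in SECTIONS.items() if key not in zones]
    return {"zones": zones, "missing_sections": missing}
```

An empty Deny Zones section yields an empty list rather than a missing-section entry, which matches the "can be empty" rule above.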


5. Prompt Frontmatter Comparator

What: Compare prompt file frontmatter against bmad-manifest.json

Why: Capability misalignment causes runtime errors

Checks:

# For each prompt .md file at skill root:
- Has frontmatter (name, description, menu-code)
- name matches manifest capability name
- menu-code matches manifest (case-insensitive)
- description is present

Output: JSON with mismatches, missing files

Implementation: Python, reads bmad-manifest.json and all prompt .md files at skill root
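The comparison itself might look like the sketch below, which works on already-parsed data. The manifest keys used here (`capabilities`, `name`, `menu-code`, `type`) are guesses at the layout of bmad-manifest.json, not a confirmed schema.

```python
def compare_prompts(manifest: dict, prompts: dict) -> list:
    """Compare prompt frontmatter against manifest capabilities.

    `prompts` maps a prompt file name to its parsed frontmatter dict.
    All manifest key names are assumptions for illustration.
    """
    findings = []
    by_name = {fm.get("name"): (file, fm) for file, fm in prompts.items()}
    for cap in manifest.get("capabilities", []):
        if cap.get("type") != "prompt":
            continue
        cap_name = cap.get("name")
        entry = by_name.get(cap_name)
        if entry is None:
            findings.append(
                {"capability": cap_name, "issue": "no prompt file with matching name"}
            )
            continue
        file, fm = entry
        # Menu codes are compared case-insensitively, as required above.
        if str(fm.get("menu-code", "")).lower() != str(cap.get("menu-code", "")).lower():
            findings.append(
                {"capability": cap_name, "file": file, "issue": "menu-code mismatch"}
            )
        if not fm.get("description"):
            findings.append(
                {"capability": cap_name, "file": file, "issue": "description missing"}
            )
    return findings
```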


Priority 2: Analysis Scripts

6. Token Counter

What: Count tokens in each file of an agent

Why: Identify verbose files that need optimization

Checks:

# For each .md file:
- Total tokens (approximate: chars / 4)
- Code block tokens
- Token density (tokens / meaningful content)

Output: JSON with file path, token count, density score

Implementation: Python with tiktoken for accurate counting, or char approximation
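The character-based approximation can be sketched without any dependencies. This assumes code blocks are delimited by triple-backtick fences and leaves out the density score, since "meaningful content" is not defined precisely here.

```python
import math

FENCE = "`" * 3  # triple-backtick fence marker


def approx_tokens(text: str) -> int:
    """Approximate token count as characters divided by four, rounded up."""
    return math.ceil(len(text) / 4)


def count_tokens(markdown: str) -> dict:
    """Return approximate totals for the whole file and for fenced code only."""
    in_fence = False
    code_chars = 0
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue  # fence lines themselves are not counted as code
        if in_fence:
            code_chars += len(line)
    return {
        "total_tokens": approx_tokens(markdown),
        "code_block_tokens": math.ceil(code_chars / 4),
    }
```

Swapping `approx_tokens` for a tiktoken-based counter would give exact counts for a specific model's tokenizer, at the cost of an external dependency.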


7. Dependency Graph Generator

What: Map skill → external skill dependencies

Why: Understand agent's dependency surface

Checks:

# Parse bmad-manifest.json for external skills
# Parse SKILL.md for skill invocation patterns
# Build dependency graph

Output: DOT format (GraphViz) or JSON adjacency list

Implementation: Python; JSON parsing for the manifest plus pattern matching over SKILL.md
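Once the dependencies are collected into an adjacency list, emitting DOT is straightforward. The sketch below covers only that final step; the skill names in the usage note are hypothetical.

```python
def to_dot(graph: dict) -> str:
    """Render {skill: [dependency, ...]} as a GraphViz digraph."""
    lines = ["digraph dependencies {"]
    for skill in sorted(graph):
        deps = sorted(graph[skill])
        if not deps:
            # Keep isolated skills visible as standalone nodes.
            lines.append(f'  "{skill}";')
        for dep in deps:
            lines.append(f'  "{skill}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)
```

For example, `to_dot({"agent-a": ["skill-x"]})` produces a three-line digraph with a single edge from "agent-a" to "skill-x".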


8. Activation Flow Analyzer

What: Parse SKILL.md On Activation section for sequence

Why: Validate activation order matches best practices

Checks:

# Look for steps in order:
1. Activation mode detection
2. Config loading
3. First-run check
4. Access boundaries load
5. Memory load
6. Manifest load
7. Greet
8. Present menu

Output: JSON with detected steps, missing steps, out-of-order warnings

Implementation: Python with regex pattern matching
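A naive version of the order check is sketched below. The regular expressions are assumptions about how each step is worded in SKILL.md and would need tuning against real agents; for instance, "memory" could also match a mention of memory-system.md in an earlier step.

```python
import re

# Expected order, each with a hypothetical pattern for detecting the step.
STEP_PATTERNS = [
    ("activation-mode", r"activation mode"),
    ("config", r"config"),
    ("first-run", r"first[- ]run"),
    ("access-boundaries", r"access boundar"),
    ("memory", r"memory"),
    ("manifest", r"manifest"),
    ("greet", r"greet"),
    ("menu", r"menu"),
]


def analyze_activation(section: str) -> dict:
    """Report detected, missing, and out-of-order activation steps."""
    positions = {}
    for step, pattern in STEP_PATTERNS:
        match = re.search(pattern, section, flags=re.IGNORECASE)
        if match:
            positions[step] = match.start()
    missing = [step for step, _ in STEP_PATTERNS if step not in positions]
    out_of_order = []
    furthest = -1
    for step, _ in STEP_PATTERNS:
        if step not in positions:
            continue
        # A step is flagged if it first appears before a step that should precede it.
        if positions[step] < furthest:
            out_of_order.append(step)
        else:
            furthest = positions[step]
    return {
        "detected": sorted(positions, key=positions.get),
        "missing": missing,
        "out_of_order": out_of_order,
    }
```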


9. Memory Structure Validator

What: Validate memory-system.md structure

Why: Memory files have specific requirements

Checks:

# Required sections:
- ## Core Principle
- ## File Structure
- ## Write Discipline
- ## Memory Maintenance

Output: JSON with missing sections, validation errors

Implementation: Python with markdown parsing
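The required-section check reduces to a set comparison over level-2 headings. A minimal sketch, assuming the sections appear as `##` headings exactly as listed:

```python
import re

REQUIRED_SECTIONS = [
    "Core Principle",
    "File Structure",
    "Write Discipline",
    "Memory Maintenance",
]


def missing_sections(markdown: str) -> list:
    """Return the required section titles that have no matching ## heading."""
    headings = {
        match.group(1).strip()
        for match in re.finditer(r"^##[ \t]+(.+)$", markdown, flags=re.MULTILINE)
    }
    return [title for title in REQUIRED_SECTIONS if title not in headings]
```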


10. Subagent Pattern Detector

What: Detect if agent uses BMAD Advanced Context Pattern

Why: Agents processing 5+ sources MUST use subagents

Checks:

# Pattern detection in SKILL.md:
- "DO NOT read sources yourself"
- "delegate to sub-agents"
- "/tmp/analysis-" temp file pattern
- Sub-agent output template (50-100 token summary)

Output: JSON with pattern found/missing, recommendations

Implementation: Python with keyword search and context extraction
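A keyword-search sketch covering the first three signals follows. The signal keys are illustrative, and the fourth check (the sub-agent output template) is omitted because its exact form is not specified here.

```python
# Lower-cased phrases taken from the checks above; the dict keys are illustrative.
SIGNALS = {
    "no-direct-read": "do not read sources yourself",
    "delegation": "delegate to sub-agents",
    "temp-files": "/tmp/analysis-",
}


def detect_subagent_pattern(skill_md: str) -> dict:
    """Record the first line on which each signal phrase appears."""
    found = {}
    for number, line in enumerate(skill_md.splitlines(), start=1):
        lowered = line.lower()
        for key, phrase in SIGNALS.items():
            if key not in found and phrase in lowered:
                found[key] = {"line": number, "context": line.strip()}
    missing = [key for key in SIGNALS if key not in found]
    return {"found": found, "missing": missing, "uses_pattern": not missing}
```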


Priority 3: Composite Scripts

11. Agent Health Check

What: Run all validation scripts and aggregate results

Why: One-stop shop for agent quality assessment

Composition: Runs Priority 1 scripts, aggregates JSON outputs

Output: Structured health report with severity levels

Implementation: Bash script orchestrating Python scripts, jq for aggregation
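The aggregation step could be sketched as below, in Python instead of jq to keep the examples in one language. The rule that the overall status is the worst individual status is an assumption, as are the `scripts_run` key and the report name.

```python
STATUS_RANK = {"pass": 0, "warning": 1, "fail": 2}


def aggregate(reports: list) -> dict:
    """Merge per-script reports into one health report."""
    findings = []
    worst = "pass"
    for report in reports:
        status = report.get("status")
        if status not in STATUS_RANK:
            status = "fail"  # treat unknown or missing statuses as failures
        if STATUS_RANK[status] > STATUS_RANK[worst]:
            worst = status
        for finding in report.get("findings", []):
            # Tag each finding with the script that produced it.
            findings.append({"script": report.get("script"), **finding})
    return {
        "script": "agent-health-check",
        "status": worst,
        "scripts_run": len(reports),
        "findings": findings,
    }
```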


12. Comparison Validator

What: Compare two versions of an agent for differences

Why: Validate changes during iteration

Checks:

# Git diff with structure awareness:
- Frontmatter changes
- Capability additions/removals
- New prompt files
- Token count changes

Output: JSON with categorized changes

Implementation: Bash with git, jq, python for analysis


Script Output Standard

All scripts MUST output structured JSON for agent consumption:

{
  "script": "script-name",
  "version": "1.0.0",
  "agent_path": "/path/to/agent",
  "timestamp": "2025-03-08T10:30:00Z",
  "status": "pass|fail|warning",
  "findings": [
    {
      "severity": "critical|high|medium|low|info",
      "category": "structure|security|performance|consistency",
      "location": {"file": "SKILL.md", "line": 42},
      "issue": "Clear description",
      "fix": "Specific action to resolve"
    }
  ],
  "summary": {
    "total": 10,
    "critical": 1,
    "high": 2,
    "medium": 3,
    "low": 4
  }
}
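A small helper can keep the summary counts consistent with the findings list. In this sketch the rule for deriving `status` from severities is an assumption, since the standard above does not define it.

```python
from collections import Counter
from datetime import datetime, timezone


def build_report(script: str, agent_path: str, findings: list, version: str = "1.0.0") -> dict:
    """Wrap findings in the standard envelope and compute the summary counts."""
    counts = Counter(finding["severity"] for finding in findings)
    # Assumed rule: critical/high findings fail, medium/low warn, otherwise pass.
    if counts["critical"] or counts["high"]:
        status = "fail"
    elif counts["medium"] or counts["low"]:
        status = "warning"
    else:
        status = "pass"
    return {
        "script": script,
        "version": version,
        "agent_path": agent_path,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "status": status,
        "findings": findings,
        "summary": {
            "total": len(findings),
            "critical": counts["critical"],
            "high": counts["high"],
            "medium": counts["medium"],
            "low": counts["low"],
        },
    }
```

Passing the result to `json.dumps` and printing it to stdout gives the output the standard calls for.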

Implementation Checklist

When creating validation scripts:

  • Uses --help for documentation
  • Accepts --agent-path for target agent
  • Outputs JSON to stdout
  • Writes diagnostics to stderr
  • Returns meaningful exit codes (0=pass, 1=fail, 2=error)
  • Includes --verbose flag for debugging
  • Has tests in scripts/tests/ subfolder
  • Self-contained (PEP 723 for Python)
  • No interactive prompts

Integration with Quality Optimizer

The Quality Optimizer should:

  1. First: Run available scripts for fast, deterministic checks
  2. Then: Use sub-agents for semantic analysis (requires judgment)
  3. Finally: Synthesize both sources into a report

Example flow:

# Run all validation scripts
python scripts/validate-frontmatter.py --agent-path {path}
bash scripts/scan-template-artifacts.sh --agent-path {path}
python scripts/compare-prompts-manifest.py --agent-path {path}

# Collect JSON outputs
# Spawn sub-agents only for semantic checks
# Synthesize complete report
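The "collect JSON outputs" step can be sketched as a small runner that invokes each script and parses its stdout. This is one possible approach written in Python; how unparseable output is reported (the `error` status and `stderr` key) is an assumption.

```python
import json
import subprocess
import sys


def run_check(command: list) -> dict:
    """Run one validation script and return its parsed JSON report."""
    completed = subprocess.run(command, capture_output=True, text=True)
    try:
        report = json.loads(completed.stdout)
    except json.JSONDecodeError:
        report = None
    if not isinstance(report, dict):
        # The script crashed or printed something other than a JSON object.
        report = {"status": "error", "findings": [], "stderr": completed.stderr.strip()}
    report["exit_code"] = completed.returncode
    return report


def run_all(commands: list) -> list:
    """Run every command in order and collect the reports."""
    return [run_check(command) for command in commands]
```

A call such as `run_all([[sys.executable, "scripts/validate-frontmatter.py", "--agent-path", path]])` would then return a list of reports ready to hand to the synthesis step.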

Script Creation Priorities

Phase 1 (Immediate value):

  1. Template Artifact Scanner (Bash + jq)
  2. Prompt Frontmatter Comparator (Python)
  3. Access Boundaries Extractor (Python)

Phase 2 (Enhanced validation):

  4. Token Counter (Python)
  5. Subagent Pattern Detector (Python)
  6. Activation Flow Analyzer (Python)

Phase 3 (Advanced features):

  7. Dependency Graph Generator (Python)
  8. Memory Structure Validator (Python)
  9. Agent Health Check orchestrator (Bash)

Phase 4 (Comparison tools):

  10. Comparison Validator (Bash + Python)