Quality Scan Script Opportunities — Reference Guide
See references/script-standards.md for script creation guidelines.
This document identifies deterministic operations that should be offloaded from the LLM into scripts for quality validation of BMad agents.
Core Principle
Scripts validate structure and syntax (deterministic). Prompts evaluate semantics and meaning (judgment). Create scripts for checks that have clear pass/fail criteria.
How to Spot Script Opportunities
During build, walk through every capability/operation and apply these tests:
The Determinism Test
For each operation the agent performs, ask:
- Given identical input, will this ALWAYS produce identical output? → Script
- Does this require interpreting meaning, tone, context, or ambiguity? → Prompt
- Could you write a unit test with expected output for every input? → Script
The Judgment Boundary
Scripts handle: fetch, transform, validate, count, parse, compare, extract, format, check structure
Prompts handle: interpret, classify with ambiguity, create, decide with incomplete info, evaluate quality, synthesize meaning
Pattern Recognition Checklist
Table of signal verbs/patterns mapping to script types:
| Signal Verb/Pattern | Script Type |
|---|---|
| "validate", "check", "verify" | Validation script |
| "count", "tally", "aggregate", "sum" | Metric/counting script |
| "extract", "parse", "pull from" | Data extraction script |
| "convert", "transform", "format" | Transformation script |
| "compare", "diff", "match against" | Comparison script |
| "scan for", "find all", "list all" | Pattern scanning script |
| "check structure", "verify exists" | File structure checker |
| "against schema", "conforms to" | Schema validation script |
| "graph", "map dependencies" | Dependency analysis script |
The Outside-the-Box Test
Beyond obvious validation, consider:
- Could any data gathering step be a script that returns structured JSON for the LLM to interpret?
- Could pre-processing reduce what the LLM needs to read?
- Could post-processing validate what the LLM produced?
- Could metric collection feed into LLM decision-making without the LLM doing the counting?
Your Toolbox
Scripts have access to full capabilities — think broadly:
- Bash: Full shell (jq, grep, awk, sed, find, diff, wc, sort, uniq, curl), plus piping and composition
- Python: Standard library (json, pathlib, re, argparse, collections, difflib, ast, csv, xml, etc.) plus PEP 723 inline-declared dependencies (tiktoken, jsonschema, pyyaml, etc.)
- System tools: git commands for history/diff/blame, filesystem operations, process execution
If you can express the logic as deterministic code, it's a script candidate.
The --help Pattern
All scripts use PEP 723 and --help. When a skill's prompt needs to invoke a script, it can say "Run scripts/foo.py --help to understand inputs/outputs, then invoke appropriately" instead of inlining the script's interface. This saves tokens in prompts and keeps a single source of truth for the script's API.
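A minimal sketch of what such a script skeleton could look like. The script name `foo.py` comes from the example sentence above; the description text, the report fields, and the `build_parser`/`main` helper names are illustrative assumptions, not part of any existing script.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Hypothetical skeleton for a quality-scan script (names are illustrative)."""
import argparse
import json
import sys


def build_parser() -> argparse.ArgumentParser:
    # The --help text is the single source of truth for the script's interface.
    parser = argparse.ArgumentParser(
        prog="foo.py",
        description="Check an agent directory and print JSON findings to stdout.",
        epilog="Exit codes: 0=pass, 1=fail, 2=error.",
    )
    parser.add_argument("--agent-path", required=True, help="Path to the agent to scan.")
    parser.add_argument("--verbose", action="store_true", help="Write debug details to stderr.")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    if args.verbose:
        # Diagnostics go to stderr so stdout stays pure JSON.
        print(f"scanning {args.agent_path}", file=sys.stderr)
    report = {"script": "foo", "agent_path": args.agent_path, "status": "pass", "findings": []}
    print(json.dumps(report, indent=2))
    return 0
```

A real script would end with `raise SystemExit(main())` so that the return value becomes the process exit code.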
Priority 1: High-Value Validation Scripts
1. Frontmatter Validator
What: Validate SKILL.md frontmatter structure and content
Why: Frontmatter is the #1 factor in skill triggering. Catch errors early.
Checks:
# checks:
- name exists and is kebab-case
- description exists and follows pattern "Use when..."
- No forbidden fields (XML, reserved prefixes)
- Optional fields have valid values if present
Output: JSON with pass/fail per field, line numbers for errors
Implementation: Python with argparse, no external deps needed
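The name and description checks could be sketched as follows, using only the standard library. This assumes flat `key: value` frontmatter between `---` delimiters and treats "follows pattern" as "starts with `Use when`"; both are simplifying assumptions, and `parse_frontmatter` and `validate_frontmatter` are hypothetical helper names.

```python
import re

KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def parse_frontmatter(text: str) -> dict:
    """Return {key: (value, line_number)} for a flat '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for number, line in enumerate(lines[1:], start=2):
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = (value.strip().strip("\"'"), number)
    return fields


def validate_frontmatter(text: str) -> list:
    """Return one finding per failed check, with the line number when known."""
    fields = parse_frontmatter(text)
    findings = []
    name, name_line = fields.get("name", ("", None))
    if not name:
        findings.append({"field": "name", "line": name_line, "issue": "missing"})
    elif not KEBAB_CASE.match(name):
        findings.append({"field": "name", "line": name_line, "issue": "not kebab-case"})
    description, desc_line = fields.get("description", ("", None))
    if not description:
        findings.append({"field": "description", "line": desc_line, "issue": "missing"})
    elif not description.startswith("Use when"):
        findings.append(
            {"field": "description", "line": desc_line, "issue": "does not start with 'Use when'"}
        )
    return findings
```

The forbidden-field and optional-field checks are left out because this guide does not enumerate them.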
2. Manifest Schema Validator
Status: ✅ Already exists at scripts/manifest.py (create, add-capability, update, read, validate)
Enhancement opportunities:
- Add --agent-path flag for auto-discovery
- Check menu code uniqueness within agent
- Verify prompt files exist for type: "prompt" capabilities
- Verify external skill names are registered (could check against skill registry)
3. Template Artifact Scanner
What: Scan for orphaned template substitution artifacts
Why: Build process may leave {if-autonomous}, {displayName}, etc.
Output: JSON with file path, line number, artifact type
Implementation: Bash script with JSON output via jq
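The scan itself can be sketched in a few lines. Note that this sketch uses Python rather than the Bash + jq implementation proposed above, purely to keep the examples in one language. The placeholder regex and the `ALLOWED` set (which exempts `{project-root}`, a placeholder this guide treats as legitimate) are assumptions.

```python
import re

# Matches single-brace tokens such as {displayName} or {if-autonomous}.
PLACEHOLDER = re.compile(r"\{([A-Za-z][A-Za-z0-9_-]*)\}")
# Assumption: placeholders that are meant to survive the build.
ALLOWED = {"project-root"}


def scan_text(path: str, text: str) -> list:
    """Return one finding per leftover template token in the given text."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            token = match.group(1)
            if token in ALLOWED:
                continue
            kind = "conditional" if token.startswith("if-") else "variable"
            findings.append(
                {"file": path, "line": line_number, "artifact": match.group(0), "type": kind}
            )
    return findings
```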
4. Access Boundaries Extractor
What: Extract and validate access boundaries from memory-system.md
Why: Security critical — must be defined before file operations
Checks:
# Parse memory-system.md for:
- ## Read Access section exists
- ## Write Access section exists
- ## Deny Zones section exists (can be empty)
- Paths use placeholders correctly ({project-root} for _bmad paths, relative for skill-internal)
Output: Structured JSON of read/write/deny zones
Implementation: Python with markdown parsing
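One possible shape for the extractor, assuming each zone lists its paths as Markdown bullets directly under its `##` heading (the actual layout of memory-system.md may differ). `extract_boundaries` is a hypothetical name, and the placeholder-correctness check is not covered here.

```python
import re

SECTIONS = {"Read Access": "read", "Write Access": "write", "Deny Zones": "deny"}


def extract_boundaries(markdown: str) -> dict:
    """Collect bullet paths under the three access headings; report missing headings."""
    zones = {}
    current = None
    for line in markdown.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)
        if heading:
            # Any other level-2 heading ends the current zone.
            current = SECTIONS.get(heading.group(1))
            if current:
                zones[current] = []
            continue
        bullet = re.match(r"^\s*[-*]\s+`?([^`]+?)`?\s*$", line)
        if current and bullet:
            zones[current].append(bullet.group(1))
    missing = [title for title, key in SECTIONS.items() if key not in zones]
    return {"zones": zones, "missing_sections": missing}
```

An empty Deny Zones section yields `"deny": []`, which is distinct from the heading being absent.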
5. Prompt Frontmatter Comparator
What: Compare prompt file frontmatter against bmad-manifest.json
Why: Capability misalignment causes runtime errors
Checks:
# For each prompt .md file at skill root:
- Has frontmatter (name, description, menu-code)
- name matches manifest capability name
- menu-code matches manifest (case-insensitive)
- description is present
Output: JSON with mismatches, missing files
Implementation: Python, reads bmad-manifest.json and all prompt .md files at skill root
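The comparison logic might look like the sketch below. It works on already-parsed data and assumes the manifest holds a `capabilities` list whose entries carry `name`, `menu-code`, and `type` keys; that shape is a guess for illustration, not the documented bmad-manifest.json schema.

```python
def compare_prompts(manifest: dict, prompts: dict) -> list:
    """Compare prompt frontmatter against manifest capabilities.

    `prompts` maps a prompt file name to its parsed frontmatter dict.
    The manifest layout used here is an assumption, not the real schema.
    """
    by_name = {fm.get("name"): (file, fm) for file, fm in prompts.items()}
    mismatches = []
    for cap in manifest.get("capabilities", []):
        if cap.get("type") != "prompt":
            continue
        entry = by_name.get(cap["name"])
        if entry is None:
            mismatches.append({"capability": cap["name"], "issue": "missing prompt file"})
            continue
        file, fm = entry
        # menu-code is compared case-insensitively, as the checks above require.
        if str(fm.get("menu-code", "")).lower() != str(cap.get("menu-code", "")).lower():
            mismatches.append({"capability": cap["name"], "file": file, "issue": "menu-code mismatch"})
        if not fm.get("description"):
            mismatches.append({"capability": cap["name"], "file": file, "issue": "missing description"})
    return mismatches
```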
Priority 2: Analysis Scripts
6. Token Counter
What: Count tokens in each file of an agent
Why: Identify verbose files that need optimization
Checks:
# For each .md file:
- Total tokens (approximate: chars / 4)
- Code block tokens
- Token density (tokens / meaningful content)
Output: JSON with file path, token count, density score
Implementation: Python with tiktoken for accurate counting, or char approximation
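As a rough illustration of the character-based approximation (not the tiktoken path), the counting could be sketched like this. Fenced code is detected by lines starting with three backticks; the density metric is omitted because "meaningful content" is not defined here.

```python
FENCE = "`" * 3  # built this way so the example can sit inside a fenced block


def approx_tokens(text: str) -> int:
    """Approximate token count as characters divided by four."""
    return len(text) // 4


def count_tokens(markdown: str) -> dict:
    """Return approximate totals for the whole file and for fenced code only."""
    code_lines = []
    in_code = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code  # fence lines toggle state and are not counted as code
            continue
        if in_code:
            code_lines.append(line)
    return {
        "total_tokens": approx_tokens(markdown),
        "code_block_tokens": approx_tokens("\n".join(code_lines)),
    }
```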
7. Dependency Graph Generator
What: Map skill → external skill dependencies
Why: Understand agent's dependency surface
Checks:
# Parse bmad-manifest.json for external skills
# Parse SKILL.md for skill invocation patterns
# Build dependency graph
Output: DOT format (GraphViz) or JSON adjacency list
Implementation: Python, JSON parsing only
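Once the dependencies have been collected, emitting DOT is straightforward. The sketch below covers only that last step, taking an adjacency mapping as input; how the mapping is built from the manifest and SKILL.md is not shown, and the skill names in the usage example are made up.

```python
def to_dot(graph: dict) -> str:
    """Render {skill: [dependencies]} as a GraphViz digraph with stable ordering."""
    lines = ["digraph skills {"]
    for skill in sorted(graph):
        lines.append(f'  "{skill}";')
        for dep in sorted(graph[skill]):
            lines.append(f'  "{skill}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical skill names, for illustration only.
print(to_dot({"quality-optimizer": ["script-runner", "report-writer"], "script-runner": []}))
```

Sorting both nodes and edges keeps the output identical across runs, which makes the graph easy to diff.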
8. Activation Flow Analyzer
What: Parse SKILL.md On Activation section for sequence
Why: Validate activation order matches best practices
Checks:
# Look for steps in order:
1. Activation mode detection
2. Config loading
3. First-run check
4. Access boundaries load
5. Memory load
6. Manifest load
7. Greet
8. Present menu
Output: JSON with detected steps, missing steps, out-of-order warnings
Implementation: Python with regex pattern matching
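A simple way to approximate this check is to search for one keyword pattern per step and compare the order of first occurrences with the expected order. The keyword patterns below are guesses at how each step might be worded in a SKILL.md; a real analyzer would need patterns tuned to actual files.

```python
import re

# (step id, regex) in the expected order; the regexes are illustrative assumptions.
EXPECTED_STEPS = [
    ("activation-mode", r"activation mode"),
    ("config", r"config"),
    ("first-run", r"first[- ]run"),
    ("access-boundaries", r"access boundaries"),
    ("memory", r"memory"),
    ("manifest", r"manifest"),
    ("greet", r"greet"),
    ("menu", r"menu"),
]


def analyze_activation(section: str) -> dict:
    """Report detected steps (in document order), missing steps, and order mismatches."""
    lowered = section.lower()
    positions = {}
    for step, pattern in EXPECTED_STEPS:
        match = re.search(pattern, lowered)
        if match:
            positions[step] = match.start()
    detected = sorted(positions, key=positions.get)
    missing = [step for step, _ in EXPECTED_STEPS if step not in positions]
    expected_order = [step for step, _ in EXPECTED_STEPS if step in positions]
    out_of_order = [seen for seen, want in zip(detected, expected_order) if seen != want]
    return {"detected": detected, "missing": missing, "out_of_order": out_of_order}
```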
9. Memory Structure Validator
What: Validate memory-system.md structure
Why: Memory files have specific requirements
Checks:
# Required sections:
- ## Core Principle
- ## File Structure
- ## Write Discipline
- ## Memory Maintenance
Output: JSON with missing sections, validation errors
Implementation: Python with markdown parsing
10. Subagent Pattern Detector
What: Detect if agent uses BMAD Advanced Context Pattern
Why: Agents processing 5+ sources MUST use subagents
Checks:
# Pattern detection in SKILL.md:
- "DO NOT read sources yourself"
- "delegate to sub-agents"
- "/tmp/analysis-" temp file pattern
- Sub-agent output template (50-100 token summary)
Output: JSON with pattern found/missing, recommendations
Implementation: Python with keyword search and context extraction
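The keyword search with context extraction could be sketched as below. It covers the three literal phrases from the checklist; detecting the sub-agent output template is harder to express as a fixed string and is left out. The signal keys and the `uses_pattern` rule (all signals present) are assumptions.

```python
# Signal id -> lowercase phrase to search for (phrases taken from the checklist above).
SIGNALS = {
    "no-direct-read": "do not read sources yourself",
    "delegation": "delegate to sub-agents",
    "temp-files": "/tmp/analysis-",
}


def detect_subagent_pattern(skill_text: str) -> dict:
    """Return matching lines per signal plus the list of signals never seen."""
    lines = skill_text.splitlines()
    found, missing = {}, []
    for key, phrase in SIGNALS.items():
        hits = [
            {"line": number, "context": line.strip()}
            for number, line in enumerate(lines, start=1)
            if phrase in line.lower()
        ]
        if hits:
            found[key] = hits
        else:
            missing.append(key)
    return {"found": found, "missing": missing, "uses_pattern": not missing}
```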
Priority 3: Composite Scripts
11. Agent Health Check
What: Run all validation scripts and aggregate results
Why: One-stop shop for agent quality assessment
Composition: Runs Priority 1 scripts, aggregates JSON outputs
Output: Structured health report with severity levels
Implementation: Bash script orchestrating Python scripts, jq for aggregation
12. Comparison Validator
What: Compare two versions of an agent for differences
Why: Validate changes during iteration
Checks:
# Git diff with structure awareness:
- Frontmatter changes
- Capability additions/removals
- New prompt files
- Token count changes
Output: JSON with categorized changes
Implementation: Bash with git, jq, python for analysis
Script Output Standard
All scripts MUST output structured JSON for agent consumption:
{
  "script": "script-name",
  "version": "1.0.0",
  "agent_path": "/path/to/agent",
  "timestamp": "2025-03-08T10:30:00Z",
  "status": "pass|fail|warning",
  "findings": [
    {
      "severity": "critical|high|medium|low|info",
      "category": "structure|security|performance|consistency",
      "location": {"file": "SKILL.md", "line": 42},
      "issue": "Clear description",
      "fix": "Specific action to resolve"
    }
  ],
  "summary": {
    "total": 10,
    "critical": 1,
    "high": 2,
    "medium": 3,
    "low": 4
  }
}
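A small helper can keep the summary counts and the status consistent with the findings list instead of writing them by hand. The status rule used here (fail on any critical or high finding, warning on medium or low, otherwise pass) is an assumption, since the standard above does not define how status is derived; `build_report` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime, timezone

SUMMARY_LEVELS = ["critical", "high", "medium", "low"]


def build_report(script: str, agent_path: str, findings: list) -> dict:
    """Assemble a report in the standard shape, deriving summary and status from findings."""
    counts = Counter(f["severity"] for f in findings)
    if counts["critical"] or counts["high"]:
        status = "fail"  # assumed rule, not defined by the standard
    elif counts["medium"] or counts["low"]:
        status = "warning"
    else:
        status = "pass"
    summary = {"total": len(findings)}
    summary.update({level: counts[level] for level in SUMMARY_LEVELS})
    return {
        "script": script,
        "version": "1.0.0",
        "agent_path": agent_path,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "status": status,
        "findings": findings,
        "summary": summary,
    }
```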
Implementation Checklist
When creating validation scripts:
- Uses --help for documentation
- Accepts --agent-path for target agent
- Outputs JSON to stdout
- Writes diagnostics to stderr
- Returns meaningful exit codes (0=pass, 1=fail, 2=error)
- Includes --verbose flag for debugging
- Has tests in scripts/tests/ subfolder
- Self-contained (PEP 723 for Python)
- No interactive prompts
- No interactive prompts
Integration with Quality Optimizer
The Quality Optimizer should:
- First: Run available scripts for fast, deterministic checks
- Then: Use sub-agents for semantic analysis (requires judgment)
- Finally: Synthesize both sources into report
Example flow:
# Run all validation scripts
python scripts/validate-frontmatter.py --agent-path {path}
bash scripts/scan-template-artifacts.sh --agent-path {path}
python scripts/compare-prompts-manifest.py --agent-path {path}
# Collect JSON outputs
# Spawn sub-agents only for semantic checks
# Synthesize complete report
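The "collect JSON outputs" step could be sketched as a small runner that executes each script, treats exit codes 0 and 1 as "a report was produced" and anything else as an error, and merges the findings. The `run_script` and `collect` names and the status precedence (error, then fail, then warning, then pass) are assumptions for illustration.

```python
import json
import subprocess
import sys


def run_script(command: list) -> dict:
    """Run one script; exit codes 0 (pass) and 1 (fail) both carry a JSON report."""
    proc = subprocess.run(command, capture_output=True, text=True)
    if proc.returncode not in (0, 1):
        return {"status": "error", "findings": [], "stderr": proc.stderr.strip()}
    return json.loads(proc.stdout)


def collect(commands: list) -> dict:
    """Run every command and merge the reports into one overall result."""
    reports = [run_script(command) for command in commands]
    statuses = {report.get("status") for report in reports}
    # Assumed precedence: the worst status across all scripts wins.
    overall = next((s for s in ("error", "fail", "warning") if s in statuses), "pass")
    findings = [f for report in reports for f in report.get("findings", [])]
    return {"status": overall, "findings": findings, "reports": reports}
```

For example, `collect([[sys.executable, "scripts/validate-frontmatter.py", "--agent-path", path]])` would run the first script from the flow above, assuming that script exists and follows the output standard.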
Script Creation Priorities
Phase 1 (Immediate value):
1. Template Artifact Scanner (Bash + jq)
2. Prompt Frontmatter Comparator (Python)
3. Access Boundaries Extractor (Python)
Phase 2 (Enhanced validation):
4. Token Counter (Python)
5. Subagent Pattern Detector (Python)
6. Activation Flow Analyzer (Python)
Phase 3 (Advanced features):
7. Dependency Graph Generator (Python)
8. Memory Structure Validator (Python)
9. Agent Health Check orchestrator (Bash)
Phase 4 (Comparison tools):
10. Comparison Validator (Bash + Python)