
Script Opportunities Reference — Workflow Builder

Core Principle

Scripts handle deterministic operations (validate, transform, count). Prompts handle judgment (interpret, classify, decide). If a check has clear pass/fail criteria, it belongs in a script.


Section 1: How to Spot Script Opportunities

The Determinism Test

Ask two questions about any operation:

  1. Given identical input, will it always produce identical output? If yes, it's a script candidate.
  2. Could you write a unit test with expected output? If yes, it's definitely a script.

Script territory: The operation has no ambiguity — same input, same result, every time.

Prompt territory: The operation requires interpreting meaning, tone, or context — reasonable people could disagree on the output.
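The second question can be made concrete. If you can write assertions like these (is_kebab_case is a hypothetical helper, not part of any skill), the operation belongs in a script:

```python
import re

def is_kebab_case(name: str) -> bool:
    """Deterministic check: identical input always yields the identical verdict."""
    return re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name) is not None

# A unit test with a fixed expected output is possible, so this is script territory.
assert is_kebab_case("bmad-workflow-builder") is True
assert is_kebab_case("BadName") is False
```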

The Judgment Boundary

| Scripts Handle | Prompts Handle |
| --- | --- |
| Fetch | Interpret |
| Transform | Classify (with ambiguity) |
| Validate | Create |
| Count | Decide (with incomplete info) |
| Parse | Evaluate quality |
| Compare | Synthesize meaning |
| Extract | Assess tone/style |
| Format | Generate recommendations |
| Check structure | Weigh tradeoffs |

Pattern Recognition Checklist

When you see these verbs or patterns in a workflow's requirements, think scripts first:

| Signal Verb / Pattern | Script Type | Example |
| --- | --- | --- |
| validate | Validation script | "Validate frontmatter fields exist" |
| count | Metric script | "Count tokens per file" |
| extract | Data extraction | "Extract all config variable references" |
| convert / transform | Transformation script | "Convert stage definitions to graph" |
| compare | Comparison script | "Compare prompt frontmatter vs manifest" |
| scan for | Pattern scanning | "Scan for orphaned template artifacts" |
| check structure | File structure checker | "Check skill directory has required files" |
| against schema | Schema validation | "Validate output against JSON schema" |
| graph / map dependencies | Dependency analysis | "Map skill-to-skill dependencies" |
| list all | Enumeration script | "List all resource files loaded by prompts" |
| detect pattern | Pattern detector | "Detect subagent delegation patterns" |
| diff / changes between | Diff analysis | "Show what changed between versions" |

The Outside-the-Box Test

Scripts are not limited to validation. Push your thinking:

  • Data gathering as script: Could a script collect structured data (file sizes, dependency lists, config values) and return JSON for the LLM to interpret? The LLM gets pre-digested facts instead of reading raw files.
  • Pre-processing: Could a script reduce what the LLM needs to read? Extract only the relevant sections, strip boilerplate, summarize structure.
  • Post-processing validation: Could a script validate LLM output after generation? Check that generated YAML parses, that referenced files exist, that naming conventions are followed.
  • Metric collection: Could scripts count, measure, and tabulate so the LLM makes decisions based on numbers it didn't have to compute? Token counts, file counts, complexity scores — feed these to LLM judgment without making the LLM count.
  • Workflow stage analysis: Could a script parse stage definitions and progression conditions, giving the LLM a structural map without it needing to parse markdown?

Your Toolbox

Scripts have access to the full capabilities of the execution environment. Think broadly — if you can express the logic as deterministic code, it's a script candidate.

Bash: Full shell power — jq, grep, awk, sed, find, diff, wc, sort, uniq, curl, plus piping and composition. Great for file discovery, text processing, and orchestrating other scripts.

Python: The entire standard library — json, yaml, pathlib, re, argparse, collections, difflib, ast, csv, xml.etree, textwrap, dataclasses, and more. Plus PEP 723 inline-declared dependencies for anything else: tiktoken for accurate token counting, jsonschema for schema validation, pyyaml for YAML parsing, etc.

System tools: git commands for history, diff, blame, and log analysis. Filesystem operations for directory scanning and structure validation. Process execution for orchestrating multi-script pipelines.

The --help Pattern

All scripts use PEP 723 metadata and implement --help. This creates a powerful integration pattern for prompts:

Instead of inlining a script's interface details into a prompt, the prompt can simply say:

Run scripts/foo.py --help to understand its inputs and outputs, then invoke appropriately.

This saves tokens in the prompt and keeps a single source of truth for the script's API. When a script's interface changes, the prompt doesn't need updating — --help always reflects the current contract.


Section 2: Script Opportunity Catalog

Each entry follows the format: What it does, Why it matters for workflows, What it checks, What it outputs, and Implementation notes.


1. Frontmatter Validator

What: Validate SKILL.md frontmatter structure and content.

Why: Frontmatter drives skill triggering and routing. Malformed frontmatter means the skill never activates or activates incorrectly.

Checks:

  • name exists and is kebab-case
  • description exists and follows "Use when..." pattern
  • argument-hint is present if the skill accepts arguments
  • No forbidden fields or reserved prefixes
  • Optional fields have valid values if present

Output: JSON with pass/fail per field, line numbers for errors.

Implementation: Python with argparse, no external deps needed. Parse YAML frontmatter between --- delimiters.
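A minimal sketch of the name and description checks, assuming the frontmatter is simple key: value lines between --- delimiters (which is why no YAML library is needed); the remaining field rules are elided:

```python
import re

def parse_frontmatter(text: str) -> dict:
    """Extract simple key: value pairs between the leading --- delimiters."""
    match = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def check_frontmatter(text: str) -> list[str]:
    """Return a list of failure messages; an empty list means pass."""
    fields = parse_frontmatter(text)
    errors = []
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", fields.get("name", "")):
        errors.append("name missing or not kebab-case")
    if not fields.get("description", "").startswith("Use when"):
        errors.append('description missing or does not follow "Use when..." pattern')
    return errors

doc = "---\nname: my-skill\ndescription: Use when building workflows\n---\n# Body"
assert check_frontmatter(doc) == []
```

A real script would wrap this in argparse, read the file from the skill path argument, and emit the errors as JSON with line numbers.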


2. Template Artifact Scanner

What: Scan all skill files for orphaned template substitution artifacts.

Why: The build process may leave behind {if-autonomous}, {displayName}, {skill-name}, or other placeholders that should have been replaced. These cause runtime confusion.

Checks:

  • Scan all .md files for {placeholder} patterns
  • Distinguish real config variables (loaded at runtime) from build-time artifacts
  • Flag any that don't match known runtime variables

Output: JSON with file path, line number, artifact text, and whether it looks intentional.

Implementation: Bash script with grep and jq for JSON output, or Python with regex.
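The Python variant might look like this sketch; the known-runtime-variable set here is a placeholder, since the real list would come from config:

```python
import re

# Placeholder set of legitimate runtime variables; the real list comes from config.
KNOWN_RUNTIME_VARS = {"project_name", "output_folder"}

def scan_artifacts(text: str, path: str) -> list[dict]:
    """Flag {placeholder} tokens that are not known runtime variables."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(r"\{([A-Za-z][\w-]*)\}", line):
            if match.group(1) not in KNOWN_RUNTIME_VARS:
                findings.append({"file": path, "line": lineno,
                                 "artifact": match.group(0)})
    return findings

hits = scan_artifacts("Title\nHello {displayName}, saved to {output_folder}\n", "SKILL.md")
assert hits == [{"file": "SKILL.md", "line": 2, "artifact": "{displayName}"}]
```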


3. Prompt Frontmatter Comparator

What: Compare prompt file frontmatter against the skill's bmad-skill-manifest.yaml.

Why: Capability misalignment between prompts and the manifest causes routing failures — the skill advertises a capability it can't deliver, or has a prompt that's never reachable.

Checks:

  • Every prompt file at root has frontmatter with name, description, menu-code
  • Prompt name matches manifest capability name
  • menu-code matches manifest entry (case-insensitive)
  • Every manifest capability with type: "prompt" has a corresponding file
  • Flag orphaned prompts not listed in manifest

Output: JSON with mismatches, missing files, orphaned prompts.

Implementation: Python, reads bmad-skill-manifest.yaml and all prompt .md files at skill root.


4. Token Counter

What: Count approximate token counts for each file in a skill.

Why: Identify verbose files that need optimization. Catch skills that exceed context window budgets. Understand where token budget is spent across prompts, resources, and the SKILL.md.

Checks:

  • Total tokens per .md file (approximate: chars / 4, or accurate via tiktoken)
  • Code block tokens vs prose tokens
  • Cumulative token cost of full skill activation (SKILL.md + loaded resources + initial prompt)

Output: JSON with file path, token count, percentage of total, and a sorted ranking.

Implementation: Python. Use tiktoken (PEP 723 dependency) for accuracy, or fall back to character approximation.
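A sketch of the counting core, with the tiktoken dependency declared inline via PEP 723 and the chars / 4 approximation as the fallback; the cl100k_base encoding name is an assumption, not a project requirement:

```python
# /// script
# requires-python = ">=3.11"
# dependencies = ["tiktoken"]
# ///
"""Approximate per-file token counts (illustrative; run via `uv run`)."""

def count_tokens(text: str) -> int:
    try:
        import tiktoken  # resolved from the PEP 723 block above when run via uv
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except ImportError:
        return len(text) // 4  # character approximation fallback

assert count_tokens("") == 0
```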


5. Dependency Graph Generator

What: Map dependencies between the current skill and external skills it invokes.

Why: Understand the skill's dependency surface. Catch references to skills that don't exist or have been renamed.

Checks:

  • Parse bmad-skill-manifest.yaml for external skill references
  • Parse SKILL.md and prompts for skill invocation patterns (invoke, load, skill name references)
  • Build a dependency list with direction (this skill depends on X, Y depends on this skill)

Output: JSON adjacency list or DOT format (GraphViz). Include whether each dependency is required or optional.

Implementation: Python, JSON/YAML parsing with regex for invocation pattern detection.


6. Stage Flow Analyzer

What: Parse multi-stage workflow definitions to extract stage ordering, progression conditions, and routing logic.

Why: Complex workflows define stages with specific progression conditions. Misaligned stage ordering, missing progression gates, or unreachable stages cause workflow failures that are hard to debug at runtime.

Checks:

  • Extract all defined stages from SKILL.md and prompt files
  • Verify each stage has a clear entry condition and exit/progression condition
  • Detect unreachable stages (no path leads to them)
  • Detect dead-end stages (no progression and not marked as terminal)
  • Validate stage ordering matches the documented flow
  • Check for circular stage references

Output: JSON with stage list, progression map, and structural warnings.

Implementation: Python with regex for stage/condition extraction from markdown.
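Once stages and their progression conditions have been extracted, unreachable-stage detection is a plain graph traversal. A sketch over a hypothetical progression map (stage name to list of next stages):

```python
from collections import deque

def unreachable_stages(stages: dict[str, list[str]], entry: str) -> set[str]:
    """Stages with no path from the entry stage (BFS over the progression map)."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        for nxt in stages.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(stages) - seen

# Hypothetical map extracted from markdown: "orphan" has no inbound path.
flow = {"discover": ["design"], "design": ["build"], "build": [], "orphan": ["build"]}
assert unreachable_stages(flow, "discover") == {"orphan"}
```

Dead-end and cycle detection are the same traversal run in the other direction or with a visited-on-stack marker.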


7. Config Variable Tracker

What: Find all {var} references across skill files and verify they are loaded or defined.

Why: Unresolved config variables cause runtime errors or produce literal {var_name} text in outputs. This is especially common after refactoring or renaming variables.

Checks:

  • Scan all .md files for {variable_name} patterns
  • Cross-reference against variables loaded by bmad-init or defined in config
  • Distinguish template variables from literal text in code blocks
  • Flag undefined variables and unused loaded variables

Output: JSON with variable name, locations where used, and whether it's defined/loaded.

Implementation: Python with regex scanning and config file parsing.
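The cross-reference itself is a set operation. A sketch, with the loaded-variable set passed in directly rather than parsed out of config:

```python
import re

def track_vars(texts: dict[str, str], loaded: set[str]) -> dict[str, set[str]]:
    """Cross-reference {var} usages in all files against the loaded-variable set."""
    used = set()
    for text in texts.values():
        used |= set(re.findall(r"\{([\w-]+)\}", text))
    return {"undefined": used - loaded, "unused": loaded - used}

report = track_vars(
    {"SKILL.md": "Write to {output_folder} as {user_name}"},
    loaded={"output_folder", "project_name"},
)
assert report == {"undefined": {"user_name"}, "unused": {"project_name"}}
```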


8. Resource Loading Analyzer

What: Map which resources are loaded at which point during skill execution.

Why: Resources loaded too early waste context. Resources never loaded are dead weight in the skill directory. Understanding the loading sequence helps optimize token budget.

Checks:

  • Parse SKILL.md and prompts for Load resource / Read / file reference patterns
  • Map each resource to the stage/prompt where it's first loaded
  • Identify resources in references/ that are never referenced
  • Identify resources referenced but missing from references/
  • Calculate cumulative token cost at each loading point

Output: JSON with resource file, loading trigger (which prompt/stage), and orphan/missing flags.

Implementation: Python with regex for load-pattern detection and directory scanning.


9. Subagent Pattern Detector

What: Detect whether a skill that processes multiple sources uses the BMad Advanced Context Pattern (subagent delegation).

Why: Skills processing 5+ sources without subagent delegation risk context overflow and degraded output quality. This pattern is required for high-source-count workflows.

Checks:

  • Count distinct source/input references in the skill
  • Look for subagent delegation patterns: "DO NOT read sources yourself", "delegate to sub-agents", /tmp/analysis- temp file patterns
  • Check for sub-agent output templates (50-100 token summaries)
  • Flag skills with 5+ sources that lack the pattern

Output: JSON with source count, pattern found/missing, and recommendations.

Implementation: Python with keyword search and context extraction.


10. Prompt Chain Validator

What: Trace the chain of prompt loads through a workflow and verify every path is valid.

Why: Workflows route between prompts based on user intent and stage progression. A broken link in the chain — a Load foo.md where foo.md doesn't exist — halts the workflow.

Checks:

  • Extract all Load *.md prompt references from SKILL.md and every prompt file
  • Verify each referenced prompt file exists
  • Build a reachability map from SKILL.md entry points
  • Flag prompts that exist but are unreachable from any entry point

Output: JSON with prompt chain map, broken links, and unreachable prompts.

Implementation: Python with regex extraction and file existence checks.
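A sketch of the broken-link and reachability checks, operating on an in-memory filename-to-content map rather than the real filesystem, and assuming `Load foo.md` as the reference syntax:

```python
import re

def chain_report(files: dict[str, str], entry: str) -> dict[str, list[str]]:
    """files maps prompt filename -> content; entry is the SKILL.md entry point."""
    def refs(name: str) -> list[str]:
        return re.findall(r"Load (\S+\.md)", files.get(name, ""))
    seen, stack, broken = {entry}, [entry], []
    while stack:
        for ref in refs(stack.pop()):
            if ref not in files:
                broken.append(ref)          # referenced but missing
            elif ref not in seen:
                seen.add(ref)
                stack.append(ref)
    unreachable = [name for name in files if name not in seen]
    return {"broken": broken, "unreachable": unreachable}

demo = {
    "SKILL.md": "Load plan.md then continue",
    "plan.md": "Load missing.md",
    "stray.md": "never referenced",
}
assert chain_report(demo, "SKILL.md") == {"broken": ["missing.md"],
                                          "unreachable": ["stray.md"]}
```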


11. Skill Health Check (Composite)

What: Run all available validation scripts and aggregate results into a single report.

Why: One command to assess overall skill quality. Useful as a build gate or pre-commit check.

Composition: Runs scripts 1-10 in sequence, collects JSON outputs, aggregates findings by severity.

Output: Unified JSON health report with per-script results and overall status.

Implementation: Bash script orchestrating Python scripts, jq for JSON aggregation. Or a Python orchestrator using subprocess.


12. Skill Comparison Validator

What: Compare two versions of a skill (or two skills) for structural differences.

Why: Validate that changes during iteration didn't break structure. Useful for reviewing edits, comparing before/after optimization, or diffing a skill against a template.

Checks:

  • Frontmatter changes
  • Capability additions/removals in manifest
  • New or removed prompt files
  • Token count changes per file
  • Stage flow changes (for workflows)
  • Resource additions/removals

Output: JSON with categorized changes and severity assessment.

Implementation: Bash with git diff or file comparison, Python for structural analysis.


Section 3: Script Output Standard and Implementation Checklist

Script Output Standard

All scripts MUST output structured JSON for agent consumption:

```json
{
  "script": "script-name",
  "version": "1.0.0",
  "skill_path": "/path/to/skill",
  "timestamp": "2025-03-08T10:30:00Z",
  "status": "pass|fail|warning",
  "findings": [
    {
      "severity": "critical|high|medium|low|info",
      "category": "structure|security|performance|consistency",
      "location": {"file": "SKILL.md", "line": 42},
      "issue": "Clear description",
      "fix": "Specific action to resolve"
    }
  ],
  "summary": {
    "total": 0,
    "critical": 0,
    "high": 0,
    "medium": 0,
    "low": 0
  }
}
```
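A helper that assembles this envelope might look like the following sketch. The status policy shown (fail on any critical or high finding, warning otherwise) is an assumption for illustration, not part of the standard:

```python
import json
from datetime import datetime, timezone

def make_report(script: str, skill_path: str, findings: list[dict]) -> dict:
    """Assemble the standard JSON envelope from a list of findings."""
    severities = ["critical", "high", "medium", "low"]
    summary = {"total": len(findings)}
    summary.update({s: sum(1 for f in findings if f["severity"] == s)
                    for s in severities})
    # Assumed policy: critical/high -> fail, anything else found -> warning.
    status = "fail" if summary["critical"] or summary["high"] else (
        "warning" if summary["total"] else "pass")
    return {
        "script": script,
        "version": "1.0.0",
        "skill_path": skill_path,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": status,
        "findings": findings,
        "summary": summary,
    }

report = make_report("frontmatter-validator", "/path/to/skill", [])
assert report["status"] == "pass"
print(json.dumps(report, indent=2))
```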

Implementation Checklist

When creating new validation scripts:

  • Uses --help for documentation (PEP 723 metadata)
  • Accepts skill path as argument
  • -o flag for output file (defaults to stdout)
  • Writes diagnostics to stderr
  • Returns meaningful exit codes: 0=pass, 1=fail, 2=error
  • Includes --verbose flag for debugging
  • Self-contained (PEP 723 for Python dependencies)
  • No interactive prompts
  • No network dependencies
  • Outputs valid JSON to stdout
  • Has tests in scripts/tests/ subfolder