# Script Opportunities Reference — Workflow Builder
## Core Principle
Scripts handle deterministic operations (validate, transform, count). Prompts handle judgment (interpret, classify, decide). If a check has clear pass/fail criteria, it belongs in a script.
---
## Section 1: How to Spot Script Opportunities
### The Determinism Test
Ask two questions about any operation:
1. **Given identical input, will it always produce identical output?** If yes, it's a script candidate.
2. **Could you write a unit test with expected output?** If yes, it's definitely a script.
**Script territory:** The operation has no ambiguity — same input, same result, every time.
**Prompt territory:** The operation requires interpreting meaning, tone, or context — reasonable people could disagree on the output.
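For example, a kebab-case name check passes both questions: it is a pure function with an obvious unit test. A minimal sketch (the helper name is illustrative):

```python
import re

def is_kebab_case(name: str) -> bool:
    # Deterministic: identical input always yields identical output.
    return bool(re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name))
```
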
### The Judgment Boundary
| Scripts Handle | Prompts Handle |
|----------------|----------------|
| Fetch | Interpret |
| Transform | Classify (with ambiguity) |
| Validate | Create |
| Count | Decide (with incomplete info) |
| Parse | Evaluate quality |
| Compare | Synthesize meaning |
| Extract | Assess tone/style |
| Format | Generate recommendations |
| Check structure | Weigh tradeoffs |
### Pattern Recognition Checklist
When you see these verbs or patterns in a workflow's requirements, think scripts first:
| Signal Verb / Pattern | Script Type | Example |
|----------------------|-------------|---------|
| validate | Validation script | "Validate frontmatter fields exist" |
| count | Metric script | "Count tokens per file" |
| extract | Data extraction | "Extract all config variable references" |
| convert / transform | Transformation script | "Convert stage definitions to graph" |
| compare | Comparison script | "Compare prompt frontmatter vs manifest" |
| scan for | Pattern scanning | "Scan for orphaned template artifacts" |
| check structure | File structure checker | "Check skill directory has required files" |
| against schema | Schema validation | "Validate output against JSON schema" |
| graph / map dependencies | Dependency analysis | "Map skill-to-skill dependencies" |
| list all | Enumeration script | "List all resource files loaded by prompts" |
| detect pattern | Pattern detector | "Detect subagent delegation patterns" |
| diff / changes between | Diff analysis | "Show what changed between versions" |
### The Outside-the-Box Test
Scripts are not limited to validation. Push your thinking:
- **Data gathering as script:** Could a script collect structured data (file sizes, dependency lists, config values) and return JSON for the LLM to interpret? The LLM gets pre-digested facts instead of reading raw files.
- **Pre-processing:** Could a script reduce what the LLM needs to read? Extract only the relevant sections, strip boilerplate, summarize structure.
- **Post-processing validation:** Could a script validate LLM output after generation? Check that generated YAML parses, that referenced files exist, that naming conventions are followed.
- **Metric collection:** Could scripts count, measure, and tabulate so the LLM makes decisions based on numbers it didn't have to compute? Token counts, file counts, complexity scores — feed these to LLM judgment without making the LLM count.
- **Workflow stage analysis:** Could a script parse stage definitions and progression conditions, giving the LLM a structural map without it needing to parse markdown?
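Post-processing validation, for instance, can be a few lines: parse the LLM's generated output and return a structured verdict instead of letting a malformed artifact propagate. A sketch for the JSON case (YAML is analogous with `pyyaml`; the function name is illustrative):

```python
import json

def validate_generated_json(text: str) -> dict:
    """Return a pass/fail verdict for LLM-generated JSON output."""
    try:
        json.loads(text)
        return {"status": "pass", "findings": []}
    except json.JSONDecodeError as exc:
        return {
            "status": "fail",
            "findings": [{
                "severity": "high",
                "issue": f"invalid JSON at line {exc.lineno}: {exc.msg}",
            }],
        }
```
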
### Your Toolbox
Scripts have access to the full capabilities of the execution environment. Think broadly — if you can express the logic as deterministic code, it's a script candidate.
**Bash:** Full shell power — `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, `sort`, `uniq`, `curl`, plus piping and composition. Great for file discovery, text processing, and orchestrating other scripts.
**Python:** The entire standard library — `json`, `yaml`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml.etree`, `textwrap`, `dataclasses`, and more. Plus PEP 723 inline-declared dependencies for anything else: `tiktoken` for accurate token counting, `jsonschema` for schema validation, `pyyaml` for YAML parsing, etc.
**System tools:** `git` commands for history, diff, blame, and log analysis. Filesystem operations for directory scanning and structure validation. Process execution for orchestrating multi-script pipelines.
### The --help Pattern
All scripts use PEP 723 metadata and implement `--help`. This creates a powerful integration pattern for prompts:
Instead of inlining a script's interface details into a prompt, the prompt can simply say:
> Run `scripts/foo.py --help` to understand its inputs and outputs, then invoke appropriately.
This saves tokens in the prompt and keeps a single source of truth for the script's API. When a script's interface changes, the prompt doesn't need updating — `--help` always reflects the current contract.
---
## Section 2: Script Opportunity Catalog
Each entry follows the format: What it does, Why it matters for workflows, What it checks, What it outputs, and Implementation notes.
---
### 1. Frontmatter Validator
**What:** Validate SKILL.md frontmatter structure and content.
**Why:** Frontmatter drives skill triggering and routing. Malformed frontmatter means the skill never activates or activates incorrectly.
**Checks:**
- `name` exists and is kebab-case
- `description` exists and follows "Use when..." pattern
- `argument-hint` is present if the skill accepts arguments
- No forbidden fields or reserved prefixes
- Optional fields have valid values if present
**Output:** JSON with pass/fail per field, line numbers for errors.
**Implementation:** Python with argparse, no external deps needed. Parse YAML frontmatter between `---` delimiters.
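**Sketch:** A minimal version of the core checks, using naive `key: value` parsing instead of a YAML library (field names come from the checks above; severities are illustrative):

```python
import re

KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def validate_frontmatter(text: str) -> list[dict]:
    """Return findings for a SKILL.md body; an empty list means pass."""
    match = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return [{"severity": "critical", "issue": "no frontmatter block found"}]
    # Naive key: value parsing; a real script would use a YAML parser.
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    findings = []
    if "name" not in fields:
        findings.append({"severity": "critical", "issue": "missing name"})
    elif not KEBAB.match(fields["name"]):
        findings.append({"severity": "high", "issue": f"name not kebab-case: {fields['name']}"})
    if not fields.get("description", "").startswith("Use when"):
        findings.append({"severity": "medium", "issue": "description should follow 'Use when...' pattern"})
    return findings
```
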
---
### 2. Template Artifact Scanner
**What:** Scan all skill files for orphaned template substitution artifacts.
**Why:** The build process may leave behind `{if-autonomous}`, `{displayName}`, `{skill-name}`, or other placeholders that should have been replaced. These cause runtime confusion.
**Checks:**
- Scan all `.md` files for `{placeholder}` patterns
- Distinguish real config variables (loaded at runtime) from build-time artifacts
- Flag any that don't match known runtime variables
**Output:** JSON with file path, line number, artifact text, and whether it looks intentional.
**Implementation:** Bash script with `grep` and `jq` for JSON output, or Python with regex.
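**Sketch:** The core scan in Python (the `RUNTIME_VARS` allowlist is hypothetical; a real script would load the known runtime variables from config):

```python
import re

RUNTIME_VARS = {"user_name", "output_folder"}  # hypothetical allowlist of runtime variables

def scan_artifacts(text: str, path: str = "<memory>") -> list[dict]:
    """Report {placeholder} patterns that are not known runtime variables."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(r"\{([a-zA-Z][\w-]*)\}", line):
            if match.group(1) not in RUNTIME_VARS:
                findings.append({"file": path, "line": lineno, "artifact": match.group(0)})
    return findings
```
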
---
### 3. Prompt Frontmatter Comparator
**What:** Compare prompt file frontmatter against the skill's `bmad-skill-manifest.yaml`.
**Why:** Capability misalignment between prompts and the manifest causes routing failures — the skill advertises a capability it can't deliver, or has a prompt that's never reachable.
**Checks:**
- Every prompt file at root has frontmatter with `name`, `description`, `menu-code`
- Prompt `name` matches manifest capability name
- `menu-code` matches manifest entry (case-insensitive)
- Every manifest capability with `type: "prompt"` has a corresponding file
- Flag orphaned prompts not listed in manifest
**Output:** JSON with mismatches, missing files, orphaned prompts.
**Implementation:** Python, reads `bmad-skill-manifest.yaml` and all prompt `.md` files at skill root.
---
### 4. Token Counter
**What:** Compute an approximate token count for each file in a skill.
**Why:** Identify verbose files that need optimization. Catch skills that exceed context window budgets. Understand where token budget is spent across prompts, resources, and the SKILL.md.
**Checks:**
- Total tokens per `.md` file (approximate: chars / 4, or accurate via tiktoken)
- Code block tokens vs prose tokens
- Cumulative token cost of full skill activation (SKILL.md + loaded resources + initial prompt)
**Output:** JSON with file path, token count, percentage of total, and a sorted ranking.
**Implementation:** Python. Use `tiktoken` (PEP 723 dependency) for accuracy, or fall back to character approximation.
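**Sketch:** The character-approximation variant fits in a few lines; swap `approx_tokens` for a `tiktoken` call when accuracy matters. Shown here over an in-memory path-to-text mapping; a real script would walk the skill directory:

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def rank_files(files: dict[str, str]) -> list[dict]:
    """files maps path -> text; returns per-file counts sorted largest-first."""
    rows = [{"file": path, "tokens": approx_tokens(text)} for path, text in files.items()]
    total = sum(row["tokens"] for row in rows) or 1
    for row in rows:
        row["pct"] = round(100 * row["tokens"] / total, 1)
    return sorted(rows, key=lambda row: row["tokens"], reverse=True)
```
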
---
### 5. Dependency Graph Generator
**What:** Map dependencies between the current skill and external skills it invokes.
**Why:** Understand the skill's dependency surface. Catch references to skills that don't exist or have been renamed.
**Checks:**
- Parse `bmad-skill-manifest.yaml` for external skill references
- Parse SKILL.md and prompts for skill invocation patterns (`invoke`, `load`, skill name references)
- Build a dependency list with direction (this skill depends on X, Y depends on this skill)
**Output:** JSON adjacency list or DOT format (GraphViz). Include whether each dependency is required or optional.
**Implementation:** Python, JSON/YAML parsing with regex for invocation pattern detection.
---
### 6. Stage Flow Analyzer
**What:** Parse multi-stage workflow definitions to extract stage ordering, progression conditions, and routing logic.
**Why:** Complex workflows define stages with specific progression conditions. Misaligned stage ordering, missing progression gates, or unreachable stages cause workflow failures that are hard to debug at runtime.
**Checks:**
- Extract all defined stages from SKILL.md and prompt files
- Verify each stage has a clear entry condition and exit/progression condition
- Detect unreachable stages (no path leads to them)
- Detect dead-end stages (no progression and not marked as terminal)
- Validate stage ordering matches the documented flow
- Check for circular stage references
**Output:** JSON with stage list, progression map, and structural warnings.
**Implementation:** Python with regex for stage/condition extraction from markdown.
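**Sketch:** Once stage edges have been extracted from the markdown, unreachable-stage detection is a plain graph traversal (the extraction step is assumed here; edges are passed in precomputed):

```python
from collections import deque

def unreachable_stages(edges: dict[str, list[str]], entry: str) -> set[str]:
    """Return stages never reached from the entry stage via BFS."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(edges) - seen
```
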
---
### 7. Config Variable Tracker
**What:** Find all `{var}` references across skill files and verify they are loaded or defined.
**Why:** Unresolved config variables cause runtime errors or produce literal `{var_name}` text in outputs. This is especially common after refactoring or renaming variables.
**Checks:**
- Scan all `.md` files for `{variable_name}` patterns
- Cross-reference against variables loaded by `bmad-init` or defined in config
- Distinguish template variables from literal text in code blocks
- Flag undefined variables and unused loaded variables
**Output:** JSON with variable name, locations where used, and whether it's defined/loaded.
**Implementation:** Python with regex scanning and config file parsing.
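**Sketch:** The cross-reference step, assuming the set of loaded variables has already been gathered from config (variable naming convention assumed to be lowercase with underscores):

```python
import re

def track_variables(text: str, loaded: set[str]) -> dict:
    """Compare {var_name} references in text against the loaded-variable set."""
    used = set(re.findall(r"\{([a-z_][a-z0-9_]*)\}", text))
    return {"undefined": sorted(used - loaded), "unused": sorted(loaded - used)}
```
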
---
### 8. Resource Loading Analyzer
**What:** Map which resources are loaded at which point during skill execution.
**Why:** Resources loaded too early waste context. Resources never loaded are dead weight in the skill directory. Understanding the loading sequence helps optimize token budget.
**Checks:**
- Parse SKILL.md and prompts for `Load resource` / `Read` / file reference patterns
- Map each resource to the stage/prompt where it's first loaded
- Identify resources in `references/` that are never referenced
- Identify resources referenced but missing from `references/`
- Calculate cumulative token cost at each loading point
**Output:** JSON with resource file, loading trigger (which prompt/stage), and orphan/missing flags.
**Implementation:** Python with regex for load-pattern detection and directory scanning.
---
### 9. Subagent Pattern Detector
**What:** Detect whether a skill that processes multiple sources uses the BMad Advanced Context Pattern (subagent delegation).
**Why:** Skills processing 5+ sources without subagent delegation risk context overflow and degraded output quality. This pattern is required for high-source-count workflows.
**Checks:**
- Count distinct source/input references in the skill
- Look for subagent delegation patterns: "DO NOT read sources yourself", "delegate to sub-agents", `/tmp/analysis-` temp file patterns
- Check for sub-agent output templates (50-100 token summaries)
- Flag skills with 5+ sources that lack the pattern
**Output:** JSON with source count, pattern found/missing, and recommendations.
**Implementation:** Python with keyword search and context extraction.
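**Sketch:** The keyword-search core, using the delegation markers listed above (source counting is assumed to happen elsewhere and is passed in):

```python
PATTERNS = ("DO NOT read sources yourself", "delegate to sub-agents", "/tmp/analysis-")

def detect_subagent_pattern(text: str, source_count: int) -> dict:
    """Flag skills with 5+ sources that lack any delegation marker."""
    markers = [pattern for pattern in PATTERNS if pattern in text]
    return {
        "source_count": source_count,
        "pattern_found": bool(markers),
        "markers": markers,
        "flag": source_count >= 5 and not markers,
    }
```
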
---
### 10. Prompt Chain Validator
**What:** Trace the chain of prompt loads through a workflow and verify every path is valid.
**Why:** Workflows route between prompts based on user intent and stage progression. A broken link in the chain — a `Load foo.md` where `foo.md` doesn't exist — halts the workflow.
**Checks:**
- Extract all `Load *.md` prompt references from SKILL.md and every prompt file
- Verify each referenced prompt file exists
- Build a reachability map from SKILL.md entry points
- Flag prompts that exist but are unreachable from any entry point
**Output:** JSON with prompt chain map, broken links, and unreachable prompts.
**Implementation:** Python with regex extraction and file existence checks.
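**Sketch:** The link check over an in-memory path-to-text mapping (a real script would walk the skill directory; the `Load *.md` regex is a simplification of the actual reference patterns):

```python
import re

LOAD_RE = re.compile(r"Load\s+([\w./-]+\.md)")

def broken_links(files: dict[str, str]) -> list[dict]:
    """files maps relative path -> text; returns Load references with no matching file."""
    issues = []
    for source, text in files.items():
        for target in LOAD_RE.findall(text):
            if target not in files:
                issues.append({"source": source, "missing": target})
    return issues
```
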
---
### 11. Skill Health Check (Composite)
**What:** Run all available validation scripts and aggregate results into a single report.
**Why:** One command to assess overall skill quality. Useful as a build gate or pre-commit check.
**Composition:** Runs scripts 1-10 in sequence, collects JSON outputs, aggregates findings by severity.
**Output:** Unified JSON health report with per-script results and overall status.
**Implementation:** Bash script orchestrating Python scripts, `jq` for JSON aggregation. Or a Python orchestrator using `subprocess`.
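**Sketch:** The aggregation step for a Python orchestrator, merging per-script reports that follow the output standard in Section 3 (the status thresholds shown are illustrative):

```python
def aggregate(reports: list[dict]) -> dict:
    """Merge per-script JSON reports into a single health report."""
    findings = [f for report in reports for f in report.get("findings", [])]
    summary = {"total": len(findings)}
    for severity in ("critical", "high", "medium", "low"):
        summary[severity] = sum(1 for f in findings if f.get("severity") == severity)
    # Illustrative policy: critical/high fail the gate, medium/low only warn.
    if summary["critical"] or summary["high"]:
        status = "fail"
    elif summary["medium"] or summary["low"]:
        status = "warning"
    else:
        status = "pass"
    return {"script": "skill-health-check", "status": status, "findings": findings, "summary": summary}
```
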
---
### 12. Skill Comparison Validator
**What:** Compare two versions of a skill (or two skills) for structural differences.
**Why:** Validate that changes during iteration didn't break structure. Useful for reviewing edits, comparing before/after optimization, or diffing a skill against a template.
**Checks:**
- Frontmatter changes
- Capability additions/removals in manifest
- New or removed prompt files
- Token count changes per file
- Stage flow changes (for workflows)
- Resource additions/removals
**Output:** JSON with categorized changes and severity assessment.
**Implementation:** Bash with `git diff` or file comparison, Python for structural analysis.
---
## Section 3: Script Output Standard and Implementation Checklist
### Script Output Standard
All scripts MUST output structured JSON for agent consumption:
```json
{
  "script": "script-name",
  "version": "1.0.0",
  "skill_path": "/path/to/skill",
  "timestamp": "2025-03-08T10:30:00Z",
  "status": "pass|fail|warning",
  "findings": [
    {
      "severity": "critical|high|medium|low|info",
      "category": "structure|security|performance|consistency",
      "location": {"file": "SKILL.md", "line": 42},
      "issue": "Clear description",
      "fix": "Specific action to resolve"
    }
  ],
  "summary": {
    "total": 0,
    "critical": 0,
    "high": 0,
    "medium": 0,
    "low": 0
  }
}
```
### Implementation Checklist
When creating new validation scripts:
- [ ] Implements `--help` so prompts can discover the interface (see the `--help` pattern)
- [ ] Accepts skill path as argument
- [ ] `-o` flag for output file (defaults to stdout)
- [ ] Writes diagnostics to stderr
- [ ] Returns meaningful exit codes: 0=pass, 1=fail, 2=error
- [ ] Includes `--verbose` flag for debugging
- [ ] Self-contained (PEP 723 for Python dependencies)
- [ ] No interactive prompts
- [ ] No network dependencies
- [ ] Outputs valid JSON to stdout
- [ ] Has tests in `scripts/tests/` subfolder
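Put together, a minimal skeleton satisfying this checklist could look like the following. This is a sketch, not a prescribed template; the parser fields and the always-passing `run` stub are illustrative:

```python
# /// script
# requires-python = ">=3.11"
# dependencies = []
# ///
"""Skeleton for a skill validation script (illustrative)."""
import argparse
import json
import sys

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Validate a skill directory (sketch).")
    parser.add_argument("skill_path", help="Path to the skill directory")
    parser.add_argument("-o", "--output", default="-", help="Output file; '-' means stdout")
    parser.add_argument("--verbose", action="store_true", help="Emit diagnostics to stderr")
    return parser

def run(skill_path: str) -> dict:
    # Real checks go here; this stub always passes.
    return {"script": "skeleton", "status": "pass", "findings": [], "summary": {"total": 0}}

def main(argv: list[str]) -> int:
    args = build_parser().parse_args(argv)
    if args.verbose:
        # Diagnostics go to stderr so they never pollute the JSON on stdout.
        print(f"checking {args.skill_path}", file=sys.stderr)
    payload = json.dumps(run(args.skill_path), indent=2)
    if args.output == "-":
        print(payload)
    else:
        with open(args.output, "w", encoding="utf-8") as fh:
            fh.write(payload)
    # Exit codes: 0=pass, 1=fail (2=error would surface from uncaught exceptions).
    return 0 if json.loads(payload)["status"] == "pass" else 1

# Entry point (guarded so importing the module for tests has no side effects):
# if __name__ == "__main__":
#     sys.exit(main(sys.argv[1:]))
```
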