# Script Opportunities Reference: Workflow Builder

## Core Principle

Scripts handle deterministic operations (validate, transform, count). Prompts handle judgment (interpret, classify, decide). If a check has clear pass/fail criteria, it belongs in a script.

---

## Section 1: How to Spot Script Opportunities

### The Determinism Test

Ask two questions about any operation:

1. **Given identical input, will it always produce identical output?** If yes, it's a script candidate.
2. **Could you write a unit test with expected output?** If yes, it's definitely a script.

**Script territory:** The operation has no ambiguity: same input, same result, every time.

**Prompt territory:** The operation requires interpreting meaning, tone, or context, so reasonable people could disagree on the output.

### The Judgment Boundary

| Scripts Handle | Prompts Handle |
|----------------|----------------|
| Fetch | Interpret |
| Transform | Classify (with ambiguity) |
| Validate | Create |
| Count | Decide (with incomplete info) |
| Parse | Evaluate quality |
| Compare | Synthesize meaning |
| Extract | Assess tone/style |
| Format | Generate recommendations |
| Check structure | Weigh tradeoffs |

### Pattern Recognition Checklist

When you see these verbs or patterns in a workflow's requirements, think scripts first:

| Signal Verb / Pattern | Script Type | Example |
|----------------------|-------------|---------|
| validate | Validation script | "Validate frontmatter fields exist" |
| count | Metric script | "Count tokens per file" |
| extract | Data extraction | "Extract all config variable references" |
| convert / transform | Transformation script | "Convert stage definitions to graph" |
| compare | Comparison script | "Compare prompt frontmatter vs manifest" |
| scan for | Pattern scanning | "Scan for orphaned template artifacts" |
| check structure | File structure checker | "Check skill directory has required files" |
| against schema | Schema validation | "Validate output against JSON schema" |
| graph / map dependencies | Dependency analysis | "Map skill-to-skill dependencies" |
| list all | Enumeration script | "List all resource files loaded by prompts" |
| detect pattern | Pattern detector | "Detect subagent delegation patterns" |
| diff / changes between | Diff analysis | "Show what changed between versions" |

### The Outside-the-Box Test

Scripts are not limited to validation. Push your thinking:

- **Data gathering as script:** Could a script collect structured data (file sizes, dependency lists, config values) and return JSON for the LLM to interpret? The LLM gets pre-digested facts instead of reading raw files.
- **Pre-processing:** Could a script reduce what the LLM needs to read? Extract only the relevant sections, strip boilerplate, summarize structure.
- **Post-processing validation:** Could a script validate LLM output after generation? Check that generated YAML parses, that referenced files exist, that naming conventions are followed.
- **Metric collection:** Could scripts count, measure, and tabulate so the LLM makes decisions based on numbers it didn't have to compute? Feed token counts, file counts, and complexity scores to LLM judgment without making the LLM count.
- **Workflow stage analysis:** Could a script parse stage definitions and progression conditions, giving the LLM a structural map without it needing to parse markdown?

### Your Toolbox

Scripts have access to the full capabilities of the execution environment. Think broadly: if you can express the logic as deterministic code, it's a script candidate.

**Bash:** Full shell power, including `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, `sort`, `uniq`, `curl`, plus piping and composition. Great for file discovery, text processing, and orchestrating other scripts.

**Python:** The entire standard library, including `json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml.etree`, `textwrap`, `dataclasses`, and more.
Plus PEP 723 inline-declared dependencies for anything else: `tiktoken` for accurate token counting, `jsonschema` for schema validation, `pyyaml` for YAML parsing, etc.

**System tools:** `git` commands for history, diff, blame, and log analysis. Filesystem operations for directory scanning and structure validation. Process execution for orchestrating multi-script pipelines.

### The --help Pattern

All scripts use PEP 723 metadata and implement `--help`. This creates a powerful integration pattern for prompts:

Instead of inlining a script's interface details into a prompt, the prompt can simply say:

> Run `scripts/foo.py --help` to understand its inputs and outputs, then invoke appropriately.

This saves tokens in the prompt and keeps a single source of truth for the script's API. When a script's interface changes, the prompt doesn't need updating, because `--help` always reflects the current contract.

---

## Section 2: Script Opportunity Catalog

Each entry follows the format: What it does, Why it matters for workflows, What it checks, What it outputs, and Implementation notes.

---

### 1. Frontmatter Validator

**What:** Validate SKILL.md frontmatter structure and content.

**Why:** Frontmatter drives skill triggering and routing. Malformed frontmatter means the skill never activates or activates incorrectly.

**Checks:**

- `name` exists and is kebab-case
- `description` exists and follows "Use when..." pattern
- `argument-hint` is present if the skill accepts arguments
- No forbidden fields or reserved prefixes
- Optional fields have valid values if present

**Output:** JSON with pass/fail per field, line numbers for errors.

**Implementation:** Python with argparse, no external deps needed. Parse YAML frontmatter between `---` delimiters.

---

### 2. Template Artifact Scanner

**What:** Scan all skill files for orphaned template substitution artifacts.

**Why:** The build process may leave behind `{if-autonomous}`, `{displayName}`, `{skill-name}`, or other placeholders that should have been replaced. These cause runtime confusion.

**Checks:**

- Scan all `.md` files for `{placeholder}` patterns
- Distinguish real config variables (loaded at runtime) from build-time artifacts
- Flag any that don't match known runtime variables

**Output:** JSON with file path, line number, artifact text, and whether it looks intentional.

**Implementation:** Bash script with `grep` and `jq` for JSON output, or Python with regex.

---

### 3. Prompt Frontmatter Comparator

**What:** Compare prompt file frontmatter against the skill's `bmad-skill-manifest.yaml`.

**Why:** Capability misalignment between prompts and the manifest causes routing failures: the skill advertises a capability it can't deliver, or has a prompt that's never reachable.

**Checks:**

- Every prompt file at root has frontmatter with `name`, `description`, `menu-code`
- Prompt `name` matches manifest capability name
- `menu-code` matches manifest entry (case-insensitive)
- Every manifest capability with `type: "prompt"` has a corresponding file
- Flag orphaned prompts not listed in manifest

**Output:** JSON with mismatches, missing files, orphaned prompts.

**Implementation:** Python, reads `bmad-skill-manifest.yaml` and all prompt `.md` files at skill root.

---

### 4. Token Counter

**What:** Estimate the token count of each file in a skill.

**Why:** Identify verbose files that need optimization. Catch skills that exceed context window budgets. Understand where token budget is spent across prompts, resources, and the SKILL.md.

**Checks:**

- Total tokens per `.md` file (approximate: chars / 4, or accurate via tiktoken)
- Code block tokens vs prose tokens
- Cumulative token cost of full skill activation (SKILL.md + loaded resources + initial prompt)

**Output:** JSON with file path, token count, percentage of total, and a sorted ranking.

**Implementation:** Python.
Use `tiktoken` (PEP 723 dependency) for accuracy, or fall back to character approximation.

---

### 5. Dependency Graph Generator

**What:** Map dependencies between the current skill and external skills it invokes.

**Why:** Understand the skill's dependency surface. Catch references to skills that don't exist or have been renamed.

**Checks:**

- Parse `bmad-skill-manifest.yaml` for external skill references
- Parse SKILL.md and prompts for skill invocation patterns (`invoke`, `load`, skill name references)
- Build a dependency list with direction (this skill depends on X, Y depends on this skill)

**Output:** JSON adjacency list or DOT format (GraphViz). Include whether each dependency is required or optional.

**Implementation:** Python, JSON/YAML parsing with regex for invocation pattern detection.

---

### 6. Stage Flow Analyzer

**What:** Parse multi-stage workflow definitions to extract stage ordering, progression conditions, and routing logic.

**Why:** Complex workflows define stages with specific progression conditions. Misaligned stage ordering, missing progression gates, or unreachable stages cause workflow failures that are hard to debug at runtime.

**Checks:**

- Extract all defined stages from SKILL.md and prompt files
- Verify each stage has a clear entry condition and exit/progression condition
- Detect unreachable stages (no path leads to them)
- Detect dead-end stages (no progression and not marked as terminal)
- Validate stage ordering matches the documented flow
- Check for circular stage references

**Output:** JSON with stage list, progression map, and structural warnings.

**Implementation:** Python with regex for stage/condition extraction from markdown.

---

### 7. Config Variable Tracker

**What:** Find all `{var}` references across skill files and verify they are loaded or defined.

**Why:** Unresolved config variables cause runtime errors or produce literal `{var_name}` text in outputs.
This is especially common after refactoring or renaming variables.

**Checks:**

- Scan all `.md` files for `{variable_name}` patterns
- Cross-reference against variables loaded by `bmad-init` or defined in config
- Distinguish template variables from literal text in code blocks
- Flag undefined variables and unused loaded variables

**Output:** JSON with variable name, locations where used, and whether it's defined/loaded.

**Implementation:** Python with regex scanning and config file parsing.

---

### 8. Resource Loading Analyzer

**What:** Map which resources are loaded at which point during skill execution.

**Why:** Resources loaded too early waste context. Resources never loaded are dead weight in the skill directory. Understanding the loading sequence helps optimize token budget.

**Checks:**

- Parse SKILL.md and prompts for `Load resource` / `Read` / file reference patterns
- Map each resource to the stage/prompt where it's first loaded
- Identify resources in `references/` that are never referenced
- Identify resources referenced but missing from `references/`
- Calculate cumulative token cost at each loading point

**Output:** JSON with resource file, loading trigger (which prompt/stage), and orphan/missing flags.

**Implementation:** Python with regex for load-pattern detection and directory scanning.

---

### 9. Subagent Pattern Detector

**What:** Detect whether a skill that processes multiple sources uses the BMad Advanced Context Pattern (subagent delegation).

**Why:** Skills processing 5+ sources without subagent delegation risk context overflow and degraded output quality. This pattern is required for high-source-count workflows.

**Checks:**

- Count distinct source/input references in the skill
- Look for subagent delegation patterns: "DO NOT read sources yourself", "delegate to sub-agents", `/tmp/analysis-` temp file patterns
- Check for sub-agent output templates (50-100 token summaries)
- Flag skills with 5+ sources that lack the pattern

**Output:** JSON with source count, pattern found/missing, and recommendations.

**Implementation:** Python with keyword search and context extraction.

---

### 10. Prompt Chain Validator

**What:** Trace the chain of prompt loads through a workflow and verify every path is valid.

**Why:** Workflows route between prompts based on user intent and stage progression. A broken link in the chain (a `Load foo.md` where `foo.md` doesn't exist) halts the workflow.

**Checks:**

- Extract all `Load *.md` prompt references from SKILL.md and every prompt file
- Verify each referenced prompt file exists
- Build a reachability map from SKILL.md entry points
- Flag prompts that exist but are unreachable from any entry point

**Output:** JSON with prompt chain map, broken links, and unreachable prompts.

**Implementation:** Python with regex extraction and file existence checks.

---

### 11. Skill Health Check (Composite)

**What:** Run all available validation scripts and aggregate results into a single report.

**Why:** One command to assess overall skill quality. Useful as a build gate or pre-commit check.

**Composition:** Runs scripts 1-10 in sequence, collects JSON outputs, aggregates findings by severity.

**Output:** Unified JSON health report with per-script results and overall status.

**Implementation:** Bash script orchestrating Python scripts, `jq` for JSON aggregation. Or a Python orchestrator using `subprocess`.

---

### 12. Skill Comparison Validator

**What:** Compare two versions of a skill (or two skills) for structural differences.

**Why:** Validate that changes during iteration didn't break structure.
Useful for reviewing edits, comparing before/after optimization, or diffing a skill against a template.

**Checks:**

- Frontmatter changes
- Capability additions/removals in manifest
- New or removed prompt files
- Token count changes per file
- Stage flow changes (for workflows)
- Resource additions/removals

**Output:** JSON with categorized changes and severity assessment.

**Implementation:** Bash with `git diff` or file comparison, Python for structural analysis.

---

## Section 3: Script Output Standard and Implementation Checklist

### Script Output Standard

All scripts MUST output structured JSON for agent consumption:

```json
{
  "script": "script-name",
  "version": "1.0.0",
  "skill_path": "/path/to/skill",
  "timestamp": "2025-03-08T10:30:00Z",
  "status": "pass|fail|warning",
  "findings": [
    {
      "severity": "critical|high|medium|low|info",
      "category": "structure|security|performance|consistency",
      "location": {"file": "SKILL.md", "line": 42},
      "issue": "Clear description",
      "fix": "Specific action to resolve"
    }
  ],
  "summary": {
    "total": 0,
    "critical": 0,
    "high": 0,
    "medium": 0,
    "low": 0
  }
}
```

### Implementation Checklist

When creating new validation scripts:

- [ ] Implements `--help` for documentation and declares PEP 723 metadata
- [ ] Accepts skill path as argument
- [ ] `-o` flag for output file (defaults to stdout)
- [ ] Writes diagnostics to stderr
- [ ] Returns meaningful exit codes: 0=pass, 1=fail, 2=error
- [ ] Includes `--verbose` flag for debugging
- [ ] Self-contained (PEP 723 for Python dependencies)
- [ ] No interactive prompts
- [ ] No network dependencies
- [ ] Outputs valid JSON to stdout
- [ ] Has tests in `scripts/tests/` subfolder
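
The checklist and output standard above can be combined into one small skeleton. What follows is a minimal sketch under stated assumptions, not the project's actual implementation: the script name `check-frontmatter`, the helpers `read_frontmatter`, `check_name`, `make_finding`, and `build_report`, and the rule that maps findings to `status` are all hypothetical. It runs a single example check (`name` exists and is kebab-case) and only understands flat, unquoted `key: value` frontmatter, so it needs nothing beyond the standard library.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Illustrative validation-script skeleton; every name here is hypothetical."""
import argparse
import json
import re
import sys
from datetime import datetime, timezone
from pathlib import Path

KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def read_frontmatter(text):
    """Map each flat `key: value` frontmatter field to (value, line number)."""
    lines = text.splitlines()
    fields = {}
    if not lines or lines[0].strip() != "---":
        return fields
    for number, line in enumerate(lines[1:], start=2):
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = (value.strip(), number)
    return fields


def make_finding(severity, line, issue, fix):
    """Build one finding in the shape of the output standard."""
    return {
        "severity": severity,
        "category": "structure",
        "location": {"file": "SKILL.md", "line": line},
        "issue": issue,
        "fix": fix,
    }


def check_name(fields):
    """Example check: `name` exists and is kebab-case."""
    if "name" not in fields:
        return [make_finding("critical", 1, "Missing `name` field", "Add a kebab-case `name`")]
    value, line = fields["name"]
    if not KEBAB_CASE.match(value):
        return [make_finding("high", line, f"`name` is not kebab-case: {value}",
                             "Use lowercase words joined by hyphens")]
    return []


def build_report(skill_path, findings):
    """Wrap findings in the standard envelope. The status rule is an assumption."""
    counts = {level: 0 for level in ("critical", "high", "medium", "low")}
    for item in findings:
        if item["severity"] in counts:
            counts[item["severity"]] += 1
    if counts["critical"] or counts["high"]:
        status = "fail"
    elif findings:
        status = "warning"
    else:
        status = "pass"
    return {
        "script": "check-frontmatter",
        "version": "1.0.0",
        "skill_path": str(skill_path),
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "status": status,
        "findings": findings,
        "summary": {"total": len(findings), **counts},
    }


def main(argv=None):
    parser = argparse.ArgumentParser(description="Validate SKILL.md frontmatter; emit a JSON report.")
    parser.add_argument("skill_path", type=Path, help="path to the skill directory")
    parser.add_argument("-o", "--output", type=Path, help="write JSON here instead of stdout")
    parser.add_argument("--verbose", action="store_true", help="print diagnostics to stderr")
    args = parser.parse_args(argv)
    try:
        text = (args.skill_path / "SKILL.md").read_text(encoding="utf-8")
    except OSError as error:
        print(f"error: {error}", file=sys.stderr)
        return 2  # error: the check itself could not run
    fields = read_frontmatter(text)
    if args.verbose:
        print(f"parsed fields: {sorted(fields)}", file=sys.stderr)
    payload = json.dumps(build_report(args.skill_path, check_name(fields)), indent=2)
    if args.output:
        args.output.write_text(payload + "\n", encoding="utf-8")
    else:
        print(payload)
    return 1 if '"status": "fail"' in payload else 0

# When saved as a script, finish with: raise SystemExit(main())
```

Having `main` return the exit code instead of calling `sys.exit` directly keeps the script callable from tests in `scripts/tests/`. In this sketch a `warning` status exits with 0; a stricter build gate might prefer to treat it as a failure.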