# Script Opportunities Reference — Workflow Builder

## Core Principle

Scripts handle deterministic operations (validate, transform, count). Prompts handle judgment (interpret, classify, decide). If a check has clear pass/fail criteria, it belongs in a script.

---
## Section 1: How to Spot Script Opportunities

### The Determinism Test

Ask two questions about any operation:

1. **Given identical input, will it always produce identical output?** If yes, it's a script candidate.
2. **Could you write a unit test with expected output?** If yes, it's definitely a script.

**Script territory:** The operation has no ambiguity — same input, same result, every time.

**Prompt territory:** The operation requires interpreting meaning, tone, or context — reasonable people could disagree on the output.
### The Judgment Boundary

| Scripts Handle | Prompts Handle |
|----------------|----------------|
| Fetch | Interpret |
| Transform | Classify (with ambiguity) |
| Validate | Create |
| Count | Decide (with incomplete info) |
| Parse | Evaluate quality |
| Compare | Synthesize meaning |
| Extract | Assess tone/style |
| Format | Generate recommendations |
| Check structure | Weigh tradeoffs |
### Pattern Recognition Checklist

When you see these verbs or patterns in a workflow's requirements, think scripts first:

| Signal Verb / Pattern | Script Type | Example |
|----------------------|-------------|---------|
| validate | Validation script | "Validate frontmatter fields exist" |
| count | Metric script | "Count tokens per file" |
| extract | Data extraction | "Extract all config variable references" |
| convert / transform | Transformation script | "Convert stage definitions to graph" |
| compare | Comparison script | "Compare prompt frontmatter vs manifest" |
| scan for | Pattern scanning | "Scan for orphaned template artifacts" |
| check structure | File structure checker | "Check skill directory has required files" |
| against schema | Schema validation | "Validate output against JSON schema" |
| graph / map dependencies | Dependency analysis | "Map skill-to-skill dependencies" |
| list all | Enumeration script | "List all resource files loaded by prompts" |
| detect pattern | Pattern detector | "Detect subagent delegation patterns" |
| diff / changes between | Diff analysis | "Show what changed between versions" |
### The Outside-the-Box Test

Scripts are not limited to validation. Push your thinking:

- **Data gathering as script:** Could a script collect structured data (file sizes, dependency lists, config values) and return JSON for the LLM to interpret? The LLM gets pre-digested facts instead of reading raw files.
- **Pre-processing:** Could a script reduce what the LLM needs to read? Extract only the relevant sections, strip boilerplate, summarize structure.
- **Post-processing validation:** Could a script validate LLM output after generation? Check that generated YAML parses, that referenced files exist, that naming conventions are followed.
- **Metric collection:** Could scripts count, measure, and tabulate so the LLM makes decisions based on numbers it didn't have to compute? Token counts, file counts, complexity scores — feed these to LLM judgment without making the LLM count.
- **Workflow stage analysis:** Could a script parse stage definitions and progression conditions, giving the LLM a structural map without it needing to parse markdown?
### Your Toolbox

Scripts have access to the full capabilities of the execution environment. Think broadly — if you can express the logic as deterministic code, it's a script candidate.

**Bash:** Full shell power — `jq`, `grep`, `awk`, `sed`, `find`, `diff`, `wc`, `sort`, `uniq`, `curl`, plus piping and composition. Great for file discovery, text processing, and orchestrating other scripts.

**Python:** The entire standard library — `json`, `pathlib`, `re`, `argparse`, `collections`, `difflib`, `ast`, `csv`, `xml.etree`, `textwrap`, `dataclasses`, and more. Plus PEP 723 inline-declared dependencies for anything else: `tiktoken` for accurate token counting, `jsonschema` for schema validation, `pyyaml` for YAML parsing, etc.

**System tools:** `git` commands for history, diff, blame, and log analysis. Filesystem operations for directory scanning and structure validation. Process execution for orchestrating multi-script pipelines.
### The --help Pattern

All scripts use PEP 723 metadata and implement `--help`. This creates a powerful integration pattern for prompts:

Instead of inlining a script's interface details into a prompt, the prompt can simply say:

> Run `scripts/foo.py --help` to understand its inputs and outputs, then invoke appropriately.

This saves tokens in the prompt and keeps a single source of truth for the script's API. When a script's interface changes, the prompt doesn't need updating — `--help` always reflects the current contract.

---
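Concretely, a script honoring this contract might begin with the following skeleton. The script name, arguments, and help text are illustrative, not an existing file:

```python
# /// script
# requires-python = ">=3.11"
# dependencies = []
# ///
"""Hypothetical validator skeleton showing the shared CLI contract."""
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    # What this parser prints for --help is exactly what a prompt sees
    # when it runs `scripts/foo.py --help`.
    parser = argparse.ArgumentParser(
        description="Validate <something> in a skill directory; prints a JSON report to stdout."
    )
    parser.add_argument("skill_path", help="Path to the skill directory to check")
    parser.add_argument("-o", "--output", help="Write the JSON report here instead of stdout")
    parser.add_argument("--verbose", action="store_true", help="Emit debug detail on stderr")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    report = {"script": "example-validator", "status": "pass", "findings": []}
    text = json.dumps(report, indent=2)
    if args.output:
        with open(args.output, "w") as fh:
            fh.write(text)
    else:
        print(text)
    return 0  # exit codes: 0=pass, 1=fail, 2=error


# A real script would end with: if __name__ == "__main__": sys.exit(main())
```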
## Section 2: Script Opportunity Catalog

Each entry follows the format: What it does, Why it matters for workflows, What it checks, What it outputs, and Implementation notes.

---
### 1. Frontmatter Validator

**What:** Validate SKILL.md frontmatter structure and content.

**Why:** Frontmatter drives skill triggering and routing. Malformed frontmatter means the skill never activates or activates incorrectly.

**Checks:**

- `name` exists and is kebab-case
- `description` exists and follows "Use when..." pattern
- `argument-hint` is present if the skill accepts arguments
- No forbidden fields or reserved prefixes
- Optional fields have valid values if present

**Output:** JSON with pass/fail per field, line numbers for errors.

**Implementation:** Python with argparse, no external deps needed. Parse YAML frontmatter between `---` delimiters.

---
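A minimal sketch of entry 1's core checks. It uses a deliberately naive parser (flat `key: value` fields between `---` delimiters) instead of a YAML library, and encodes the kebab-case and "Use when..." rules directly:

```python
import re

KEBAB_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def parse_frontmatter(text: str) -> dict:
    """Extract flat key: value pairs from the YAML block between --- delimiters."""
    match = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        # Skip nested/indented lines and comments; enough for flat frontmatter.
        if ":" in line and not line.startswith((" ", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("\"'")
    return fields


def check_frontmatter(text: str) -> list[str]:
    """Return a list of human-readable problems (empty list means pass)."""
    fields = parse_frontmatter(text)
    problems = []
    if "name" not in fields:
        problems.append("missing field: name")
    elif not KEBAB_RE.match(fields["name"]):
        problems.append(f"name is not kebab-case: {fields['name']!r}")
    if "description" not in fields:
        problems.append("missing field: description")
    elif not fields["description"].startswith("Use when"):
        problems.append("description does not follow the 'Use when...' pattern")
    return problems
```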
### 2. Template Artifact Scanner

**What:** Scan all skill files for orphaned template substitution artifacts.

**Why:** The build process may leave behind `{if-autonomous}`, `{displayName}`, `{skill-name}`, or other placeholders that should have been replaced. These cause runtime confusion.

**Checks:**

- Scan all `.md` files for `{placeholder}` patterns
- Distinguish real config variables (loaded at runtime) from build-time artifacts
- Flag any that don't match known runtime variables

**Output:** JSON with file path, line number, artifact text, and whether it looks intentional.

**Implementation:** Bash script with `grep` and `jq` for JSON output, or Python with regex.

---
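Entry 2's scan reduces to a regex pass plus an allow-list. The `KNOWN_RUNTIME_VARS` set here is a stand-in; a real scanner would load the runtime variable names from config:

```python
import re

# Illustrative allow-list: variables the runtime is known to substitute.
# Anything else in braces is flagged as a likely build-time leftover.
KNOWN_RUNTIME_VARS = {"project_name", "output_folder", "user_name"}

PLACEHOLDER_RE = re.compile(r"\{([a-zA-Z][\w-]*)\}")


def find_artifacts(text: str, known: set[str] = KNOWN_RUNTIME_VARS) -> list[dict]:
    """Return one finding per {placeholder} that is not a known runtime variable."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER_RE.finditer(line):
            if match.group(1) not in known:
                findings.append({"line": lineno, "artifact": match.group(0)})
    return findings
```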
### 3. Prompt Frontmatter Comparator

**What:** Compare prompt file frontmatter against the skill's `bmad-skill-manifest.yaml`.

**Why:** Capability misalignment between prompts and the manifest causes routing failures — the skill advertises a capability it can't deliver, or has a prompt that's never reachable.

**Checks:**

- Every prompt file at root has frontmatter with `name`, `description`, `menu-code`
- Prompt `name` matches manifest capability name
- `menu-code` matches manifest entry (case-insensitive)
- Every manifest capability with `type: "prompt"` has a corresponding file
- Flag orphaned prompts not listed in manifest

**Output:** JSON with mismatches, missing files, orphaned prompts.

**Implementation:** Python, reads `bmad-skill-manifest.yaml` and all prompt `.md` files at skill root.

---
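Entry 3's core comparison is set arithmetic once the names are collected. This sketch assumes the parsing happens elsewhere and takes the two name sets directly:

```python
def compare_capabilities(manifest_prompts: set[str], prompt_files: set[str]) -> dict:
    """Compare capability names declared in the manifest against prompt files on disk.

    Both arguments are sets of names (file stems); a real script would build them
    by parsing bmad-skill-manifest.yaml and globbing *.md files at the skill root.
    """
    return {
        # Advertised in the manifest but no file exists: capability can't be delivered.
        "missing_files": sorted(manifest_prompts - prompt_files),
        # File exists but the manifest never routes to it.
        "orphaned_prompts": sorted(prompt_files - manifest_prompts),
        "matched": sorted(manifest_prompts & prompt_files),
    }
```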
### 4. Token Counter

**What:** Report approximate token counts for each file in a skill.

**Why:** Identify verbose files that need optimization. Catch skills that exceed context window budgets. Understand where token budget is spent across prompts, resources, and the SKILL.md.

**Checks:**

- Total tokens per `.md` file (approximate: chars / 4, or accurate via tiktoken)
- Code block tokens vs prose tokens
- Cumulative token cost of full skill activation (SKILL.md + loaded resources + initial prompt)

**Output:** JSON with file path, token count, percentage of total, and a sorted ranking.

**Implementation:** Python. Use `tiktoken` (PEP 723 dependency) for accuracy, or fall back to character approximation.

---
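Entry 4's fallback approximation (chars / 4) and the code-vs-prose split can be sketched as:

```python
CHARS_PER_TOKEN = 4  # rough heuristic; swap in tiktoken for accurate counts


def approx_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN


def split_code_and_prose(markdown: str) -> tuple[int, int]:
    """Return (code_tokens, prose_tokens) by tracking fenced-block boundaries."""
    code_chars = prose_chars = 0
    in_code = False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith("```"):
            # Fence lines toggle state and count toward the code budget.
            in_code = not in_code
            code_chars += len(line)
        elif in_code:
            code_chars += len(line)
        else:
            prose_chars += len(line)
    return code_chars // CHARS_PER_TOKEN, prose_chars // CHARS_PER_TOKEN
```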
### 5. Dependency Graph Generator

**What:** Map dependencies between the current skill and external skills it invokes.

**Why:** Understand the skill's dependency surface. Catch references to skills that don't exist or have been renamed.

**Checks:**

- Parse `bmad-skill-manifest.yaml` for external skill references
- Parse SKILL.md and prompts for skill invocation patterns (`invoke`, `load`, skill name references)
- Build a dependency list with direction (this skill depends on X, Y depends on this skill)

**Output:** JSON adjacency list or DOT format (GraphViz). Include whether each dependency is required or optional.

**Implementation:** Python, JSON/YAML parsing with regex for invocation pattern detection.

---
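For entry 5, emitting DOT is a few lines once the edges are extracted. This sketch takes `(from, to, required)` tuples as input; collecting them from the manifest and invocation patterns is the harder half. The skill names below are placeholders:

```python
def to_dot(edges: list[tuple[str, str, bool]]) -> str:
    """Render (from_skill, to_skill, required) edges as a GraphViz digraph.

    Optional dependencies are drawn dashed so required edges stand out.
    """
    lines = ["digraph skill_deps {"]
    for src, dst, required in edges:
        style = "" if required else " [style=dashed]"
        lines.append(f'  "{src}" -> "{dst}"{style};')
    lines.append("}")
    return "\n".join(lines)
```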
### 6. Stage Flow Analyzer

**What:** Parse multi-stage workflow definitions to extract stage ordering, progression conditions, and routing logic.

**Why:** Complex workflows define stages with specific progression conditions. Misaligned stage ordering, missing progression gates, or unreachable stages cause workflow failures that are hard to debug at runtime.

**Checks:**

- Extract all defined stages from SKILL.md and prompt files
- Verify each stage has a clear entry condition and exit/progression condition
- Detect unreachable stages (no path leads to them)
- Detect dead-end stages (no progression and not marked as terminal)
- Validate stage ordering matches the documented flow
- Check for circular stage references

**Output:** JSON with stage list, progression map, and structural warnings.

**Implementation:** Python with regex for stage/condition extraction from markdown.

---
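Entry 6's unreachable-stage and dead-end checks are a graph traversal. This sketch assumes the stage map has already been extracted into `stage -> [next stages]` form, and treats any reachable stage with no successors as a dead-end candidate (a real analyzer would exempt stages marked terminal):

```python
from collections import deque


def analyze_stages(flow: dict[str, list[str]], entry: str) -> dict:
    """Given stage -> [next stages], report unreachable and dead-end stages."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        # Breadth-first walk from the entry stage; cycles are handled by `seen`.
        for nxt in flow.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {
        "reachable": sorted(seen),
        "unreachable": sorted(set(flow) - seen),
        "dead_ends": sorted(s for s in seen if not flow.get(s)),
    }
```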
### 7. Config Variable Tracker

**What:** Find all `{var}` references across skill files and verify they are loaded or defined.

**Why:** Unresolved config variables cause runtime errors or produce literal `{var_name}` text in outputs. This is especially common after refactoring or renaming variables.

**Checks:**

- Scan all `.md` files for `{variable_name}` patterns
- Cross-reference against variables loaded by `bmad-init` or defined in config
- Distinguish template variables from literal text in code blocks
- Flag undefined variables and unused loaded variables

**Output:** JSON with variable name, locations where used, and whether it's defined/loaded.

**Implementation:** Python with regex scanning and config file parsing.

---
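Entry 7's cross-reference is again set arithmetic over regex matches. `files` and `defined` are passed in directly here (and code-block exclusion is omitted); a real script would read the skill directory and pull the defined set from config:

```python
import re

VAR_RE = re.compile(r"\{([a-z_][a-z0-9_]*)\}")


def track_variables(files: dict[str, str], defined: set[str]) -> dict:
    """Cross-reference {var} uses in file contents against defined variables.

    `files` maps file name -> markdown text.
    """
    used: dict[str, list[str]] = {}
    for name, text in files.items():
        for var in VAR_RE.findall(text):
            used.setdefault(var, []).append(name)
    return {
        "undefined": sorted(v for v in used if v not in defined),
        "unused": sorted(defined - used.keys()),
        "usage": used,
    }
```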
### 8. Resource Loading Analyzer

**What:** Map which resources are loaded at which point during skill execution.

**Why:** Resources loaded too early waste context. Resources never loaded are dead weight in the skill directory. Understanding the loading sequence helps optimize token budget.

**Checks:**

- Parse SKILL.md and prompts for `Load resource` / `Read` / file reference patterns
- Map each resource to the stage/prompt where it's first loaded
- Identify resources in `references/` that are never referenced
- Identify resources referenced but missing from `references/`
- Calculate cumulative token cost at each loading point

**Output:** JSON with resource file, loading trigger (which prompt/stage), and orphan/missing flags.

**Implementation:** Python with regex for load-pattern detection and directory scanning.

---
### 9. Subagent Pattern Detector

**What:** Detect whether a skill that processes multiple sources uses the BMad Advanced Context Pattern (subagent delegation).

**Why:** Skills processing 5+ sources without subagent delegation risk context overflow and degraded output quality. This pattern is required for high-source-count workflows.

**Checks:**

- Count distinct source/input references in the skill
- Look for subagent delegation patterns: "DO NOT read sources yourself", "delegate to sub-agents", `/tmp/analysis-` temp file patterns
- Check for sub-agent output templates (50-100 token summaries)
- Flag skills with 5+ sources that lack the pattern

**Output:** JSON with source count, pattern found/missing, and recommendations.

**Implementation:** Python with keyword search and context extraction.

---
### 10. Prompt Chain Validator

**What:** Trace the chain of prompt loads through a workflow and verify every path is valid.

**Why:** Workflows route between prompts based on user intent and stage progression. A broken link in the chain — a `Load foo.md` where `foo.md` doesn't exist — halts the workflow.

**Checks:**

- Extract all `Load *.md` prompt references from SKILL.md and every prompt file
- Verify each referenced prompt file exists
- Build a reachability map from SKILL.md entry points
- Flag prompts that exist but are unreachable from any entry point

**Output:** JSON with prompt chain map, broken links, and unreachable prompts.

**Implementation:** Python with regex extraction and file existence checks.

---
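A sketch of entry 10's trace, operating on an in-memory `file -> content` map so the traversal logic stays visible. A real script would read from disk, and the `Load foo.md` regex is an assumed convention:

```python
import re

LOAD_RE = re.compile(r"Load\s+([\w./-]+\.md)")


def validate_chain(files: dict[str, str], entry: str = "SKILL.md") -> dict:
    """Trace `Load foo.md` references; report broken links and unreachable prompts."""
    broken = []
    seen = {entry}
    stack = [entry]
    while stack:
        current = stack.pop()
        for target in LOAD_RE.findall(files.get(current, "")):
            if target not in files:
                broken.append({"from": current, "missing": target})
            elif target not in seen:
                seen.add(target)
                stack.append(target)
    return {
        "broken_links": broken,
        "unreachable": sorted(set(files) - seen),
    }
```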
### 11. Skill Health Check (Composite)

**What:** Run all available validation scripts and aggregate results into a single report.

**Why:** One command to assess overall skill quality. Useful as a build gate or pre-commit check.

**Composition:** Runs scripts 1-10 in sequence, collects JSON outputs, aggregates findings by severity.

**Output:** Unified JSON health report with per-script results and overall status.

**Implementation:** Bash script orchestrating Python scripts, `jq` for JSON aggregation. Or a Python orchestrator using `subprocess`.

---
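The Python-orchestrator variant of entry 11 is mostly `subprocess` plus JSON merging. This sketch assumes each child script follows the Section 3 output standard and the 0/1/2 exit-code convention:

```python
import json
import subprocess
import sys


def run_checks(script_paths: list[str], skill_path: str) -> dict:
    """Run each validator and merge their JSON reports into one envelope."""
    results = []
    worst = 0
    for script in script_paths:
        proc = subprocess.run(
            [sys.executable, script, skill_path],
            capture_output=True, text=True,
        )
        # Overall status degrades to the worst child exit code.
        worst = max(worst, proc.returncode)
        try:
            results.append(json.loads(proc.stdout))
        except json.JSONDecodeError:
            results.append({"script": script, "status": "error", "findings": []})
    status = {0: "pass", 1: "fail"}.get(worst, "error")
    return {"status": status, "reports": results}
```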
### 12. Skill Comparison Validator

**What:** Compare two versions of a skill (or two skills) for structural differences.

**Why:** Validate that changes during iteration didn't break structure. Useful for reviewing edits, comparing before/after optimization, or diffing a skill against a template.

**Checks:**

- Frontmatter changes
- Capability additions/removals in manifest
- New or removed prompt files
- Token count changes per file
- Stage flow changes (for workflows)
- Resource additions/removals

**Output:** JSON with categorized changes and severity assessment.

**Implementation:** Bash with `git diff` or file comparison, Python for structural analysis.

---
## Section 3: Script Output Standard and Implementation Checklist

### Script Output Standard

All scripts MUST output structured JSON for agent consumption:
```json
{
  "script": "script-name",
  "version": "1.0.0",
  "skill_path": "/path/to/skill",
  "timestamp": "2025-03-08T10:30:00Z",
  "status": "pass|fail|warning",
  "findings": [
    {
      "severity": "critical|high|medium|low|info",
      "category": "structure|security|performance|consistency",
      "location": {"file": "SKILL.md", "line": 42},
      "issue": "Clear description",
      "fix": "Specific action to resolve"
    }
  ],
  "summary": {
    "total": 0,
    "critical": 0,
    "high": 0,
    "medium": 0,
    "low": 0
  }
}
```
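A small shared helper that assembles this envelope keeps individual scripts consistent. The severity-to-status mapping below (critical/high fail, medium/low warn, info ignored) is one reasonable choice, not a documented rule:

```python
import json
from datetime import datetime, timezone


def make_report(script: str, skill_path: str, findings: list[dict]) -> dict:
    """Assemble a report dict matching the output standard above."""
    counts = {"critical": 0, "high": 0, "medium": 0, "low": 0}
    for finding in findings:
        severity = finding.get("severity", "info")
        if severity in counts:
            counts[severity] += 1
    if counts["critical"] or counts["high"]:
        status = "fail"
    elif counts["medium"] or counts["low"]:
        status = "warning"
    else:
        status = "pass"
    return {
        "script": script,
        "version": "1.0.0",
        "skill_path": skill_path,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "status": status,
        "findings": findings,
        "summary": {"total": len(findings), **counts},
    }
```

A script then only builds its findings list and prints `json.dumps(make_report(...), indent=2)`.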
### Implementation Checklist

When creating new validation scripts:

- [ ] Uses `--help` for documentation (PEP 723 metadata)
- [ ] Accepts skill path as argument
- [ ] `-o` flag for output file (defaults to stdout)
- [ ] Writes diagnostics to stderr
- [ ] Returns meaningful exit codes: 0=pass, 1=fail, 2=error
- [ ] Includes `--verbose` flag for debugging
- [ ] Self-contained (PEP 723 for Python dependencies)
- [ ] No interactive prompts
- [ ] No network dependencies
- [ ] Outputs valid JSON to stdout
- [ ] Has tests in `scripts/tests/` subfolder