Semantic Splitting Strategy

When the source content is large (exceeds ~15,000 tokens) or a token_budget requires it, split the distillate into semantically coherent sections rather than at arbitrary size boundaries.
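
The trigger above can be sketched as a small predicate. This is a minimal illustration, not the skill's actual implementation: the function name and parameters are hypothetical, and it assumes that a token_budget "requires" splitting when a single distillate would not fit inside it.

```python
from typing import Optional

# Approximate threshold taken from the text above (~15,000 tokens).
SPLIT_THRESHOLD_TOKENS = 15_000


def should_split(source_tokens: int,
                 distillate_tokens: int,
                 token_budget: Optional[int] = None) -> bool:
    """Return True when semantic splitting looks warranted (hypothetical helper)."""
    # Large source content triggers splitting on its own.
    if source_tokens > SPLIT_THRESHOLD_TOKENS:
        return True
    # Assumption: a budget requires splitting when the unsplit
    # distillate is larger than the budget allows.
    return token_budget is not None and distillate_tokens > token_budget
```

For instance, `should_split(20_000, 4_000)` is true because of the source size alone, while `should_split(10_000, 4_000, token_budget=3_000)` is true only because of the budget.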

Why Semantic Over Size-Based

Arbitrary splits (every N tokens) break coherence: a downstream workflow loading "part 2 of 4" receives disconnected fragments of context. Semantic splits produce self-contained topic clusters that a workflow can load selectively (for example, "give me just the technical decisions section"), which is more useful and more token-efficient for the consumer.

Splitting Process

1. Identify Natural Boundaries

After the initial extraction and deduplication (Steps 1-2 of the compression process), look for natural semantic boundaries:

  • Distinct problem domains or functional areas
  • Different stakeholder perspectives (users, technical, business)
  • Temporal boundaries (current state vs future vision)
  • Scope boundaries (in-scope vs out-of-scope vs deferred)
  • Phase boundaries (analysis, design, implementation)

Choose boundaries that produce sections a downstream workflow might load independently.

2. Assign Items to Sections

For each extracted item, assign it to the most relevant section. Items that span multiple sections go in the root distillate.

Routing rules for cross-cutting and section-specific items:

  • Constraints that affect all areas → root distillate
  • Decisions with broad impact → root distillate
  • Section-specific decisions → section distillate
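
The assignment rule can be sketched as follows. The data shape (each item paired with the set of sections it is relevant to) and the names `assign_items` and `ROOT` are assumptions made for illustration. Items with no matching section are also sent to the root here, which the text does not specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

ROOT = "_index"  # hypothetical key standing in for the root distillate


def assign_items(items: Iterable[Tuple[str, Set[str]]]) -> Dict[str, List[str]]:
    """Route each extracted item to one section, or to the root if it spans several."""
    assignment: Dict[str, List[str]] = defaultdict(list)
    for text, sections in items:
        if len(sections) == 1:
            # Section-specific item: goes to that section's distillate.
            (only_section,) = sections
            assignment[only_section].append(text)
        else:
            # Cross-cutting item (or, by assumption, an unmatched one): root distillate.
            assignment[ROOT].append(text)
    return dict(assignment)


# Illustrative items only; they do not come from any real source document.
example = assign_items([
    ("Must run offline", {"technical", "users"}),
    ("Use SQLite for storage", {"technical"}),
    ("Primary persona is field agents", {"users"}),
])
```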

3. Produce Root Distillate

The root distillate contains:

  • Orientation (3-5 bullets): what was distilled, from what sources, for what consumer, how many sections
  • Cross-references: list of section distillates with 1-line descriptions
  • Cross-cutting items: facts, decisions, and constraints that span multiple sections
  • Scope summary: high-level in/out/deferred if applicable
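
One way to keep the four parts of the root distillate together is a small record type that renders them in order. This is only a sketch: the class name, the field names, the heading labels, and the `- ` bullet style are all assumptions, not a format prescribed by the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RootDistillate:
    orientation: List[str]        # 3-5 bullets: what, from where, for whom, how many sections
    sections: Dict[str, str]      # section file name -> one-line description
    cross_cutting: List[str]      # facts, decisions, constraints spanning sections
    scope: List[str] = field(default_factory=list)  # optional in/out/deferred summary

    def render(self) -> str:
        """Render the root distillate as plain text (hypothetical layout)."""
        if not 3 <= len(self.orientation) <= 5:
            raise ValueError("orientation should have 3-5 bullets")
        lines = ["Orientation"]
        lines += [f"- {bullet}" for bullet in self.orientation]
        lines.append("Sections")
        lines += [f"- {name}: {desc}" for name, desc in self.sections.items()]
        lines.append("Cross-cutting items")
        lines += [f"- {item}" for item in self.cross_cutting]
        if self.scope:  # scope summary is included only if applicable
            lines.append("Scope")
            lines += [f"- {entry}" for entry in self.scope]
        return "\n".join(lines)
```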

4. Produce Section Distillates

Each section distillate must be self-sufficient — a reader loading only one section should understand it without the others.

Each section includes:

  • Context header (1 line): "This section covers [topic]. Part N of M from [source document names]."
  • Section content: thematically-grouped bullets following the same compression rules as a single distillate
  • Cross-references (if needed): pointers to other sections for related content
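
The one-line context header follows a fixed template, so it can be generated rather than hand-written. A minimal sketch, with a hypothetical function name and parameter names:

```python
from typing import Sequence


def context_header(topic: str, part: int, total: int, sources: Sequence[str]) -> str:
    """Build the one-line context header for a section distillate."""
    return (
        f"This section covers {topic}. "
        f"Part {part} of {total} from {', '.join(sources)}."
    )
```

Calling `context_header("technical decisions", 2, 3, ["product-brief.md"])` yields "This section covers technical decisions. Part 2 of 3 from product-brief.md." (the source file name here is illustrative).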

5. Output Structure

Create a folder {base-name}-distillate/ containing:

{base-name}-distillate/
├── _index.md           # Root distillate: orientation, cross-cutting items, section manifest
├── 01-{topic-slug}.md  # Self-contained section
├── 02-{topic-slug}.md
└── 03-{topic-slug}.md

Example:

product-brief-distillate/
├── _index.md
├── 01-problem-solution.md
├── 02-technical-decisions.md
└── 03-users-market.md
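
The naming scheme can be sketched in a few lines. The helpers `slugify` and `distillate_paths` are hypothetical, and the slug rule (lowercase, runs of non-alphanumeric characters collapsed to a single hyphen) is an assumption chosen to reproduce the example above.

```python
import re
from typing import List, Sequence


def slugify(topic: str) -> str:
    """Lowercase the topic and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")


def distillate_paths(base_name: str, topics: Sequence[str]) -> List[str]:
    """Return the relative file paths for a split distillate folder."""
    folder = f"{base_name}-distillate"
    paths = [f"{folder}/_index.md"]  # root distillate comes first
    for number, topic in enumerate(topics, start=1):
        # Two-digit prefix keeps sections in a stable order.
        paths.append(f"{folder}/{number:02d}-{slugify(topic)}.md")
    return paths


paths = distillate_paths(
    "product-brief",
    ["Problem & Solution", "Technical Decisions", "Users / Market"],
)
```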

Size Targets

When a token_budget is specified:

  • Root distillate: ~20% of budget (orientation + cross-cutting items)
  • Remaining budget split proportionally across sections based on content density
  • If a section exceeds its proportional share, compress more aggressively or sub-split
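
The budget arithmetic can be sketched as below. The function names are hypothetical, and "content density" is assumed here to be measured as the token count of each section's extracted items before final compression; the source does not define it. Integer division is used so the allocations never sum to more than the budget.

```python
from typing import Dict, List

ROOT_PERCENT = 20  # approximate root share from the text above


def allocate_budget(token_budget: int, density: Dict[str, int]) -> Dict[str, int]:
    """Give ~20% to the root and split the rest proportionally to density."""
    total = sum(density.values())
    if total <= 0:
        raise ValueError("at least one section needs a positive density")
    root = token_budget * ROOT_PERCENT // 100
    remaining = token_budget - root
    allocation = {"_index": root}
    for name, weight in density.items():
        allocation[name] = remaining * weight // total
    return allocation


def over_share(allocation: Dict[str, int], section_tokens: Dict[str, int]) -> List[str]:
    """List sections whose current size exceeds their proportional share."""
    return [name for name, tokens in section_tokens.items()
            if tokens > allocation[name]]


alloc = allocate_budget(5_000, {"problem": 6_000, "technical": 3_000, "users": 1_000})
flagged = over_share(alloc, {"problem": 2_000, "technical": 1_500, "users": 400})
```

With a 5,000-token budget and densities in a 6:3:1 ratio, the root receives 1,000 tokens and the sections receive 2,400, 1,200, and 400. Any section returned by `over_share` is a candidate for more aggressive compression or a sub-split.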

When no token_budget but splitting is needed:

  • Aim for sections of 3,000-5,000 tokens each
  • Keep the root distillate as small as possible while still useful on its own
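
Because boundaries are chosen semantically, the 3,000-5,000 token range works as an advisory check rather than a cutting rule. A minimal sketch of such a check, with a hypothetical function name and labels:

```python
from typing import Dict

TARGET_MIN_TOKENS = 3_000
TARGET_MAX_TOKENS = 5_000


def size_report(section_tokens: Dict[str, int]) -> Dict[str, str]:
    """Label each section as under, within, or over the target range."""
    report = {}
    for name, tokens in section_tokens.items():
        if tokens > TARGET_MAX_TOKENS:
            report[name] = "over"    # candidate for compression or sub-splitting
        elif tokens < TARGET_MIN_TOKENS:
            report[name] = "under"   # smaller than the target; may still be fine
        else:
            report[name] = "within"
    return report
```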