initial commit

commit bfe0e01254 (2026-03-16 19:54:53 -04:00)

---
epic: 1
story: 1.1
title: "Lexer & Tokenizer"
status: draft
---
## Epic 1 — Core Calculation Engine (Rust Crate)
**Goal:** Build `calcpad-engine` as a standalone Rust crate that powers all platforms. This is the foundation.
### Story 1.1: Lexer & Tokenizer
As a CalcPad engine consumer,
I want input lines tokenized into a well-defined token stream,
So that the parser can build an AST from structured, unambiguous tokens rather than raw text.
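
The story above implies a token vocabulary covering numbers, currency symbols, units, operators, assignment, identifiers, comments, and plain text. A minimal sketch of such an enum follows; the variant names are illustrative, drawn from the criteria below, not the crate's actual API:

```rust
// Hypothetical token set for calcpad-engine's lexer; names are
// illustrative, not the crate's real API.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Number(f64),            // 42, 3.14, 6.022e23, SI-scaled values
    CurrencySymbol(String), // $, €, £, ¥, R$
    Unit(String),           // kg, g, m
    Operator(char),         // + - * / ^ %
    Assign,                 // =
    Identifier(String),     // variable names such as x
    Comment(String),        // // trailing note
    Text(String),           // a line with no calculable expression
}

fn main() {
    // A line like `x = 10` would tokenize to roughly this sequence.
    let tokens = vec![
        Token::Identifier("x".to_string()),
        Token::Assign,
        Token::Number(10.0),
    ];
    println!("{:?}", tokens);
}
```
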
**Acceptance Criteria:**

**Given** an input line containing an integer such as `42`
**When** the lexer tokenizes the input
**Then** it produces a single `Number` token with value `42`
**And** no heap allocations occur for this simple expression

**Given** an input line containing a decimal number such as `3.14`
**When** the lexer tokenizes the input
**Then** it produces a single `Number` token with value `3.14`

**Given** an input line containing a negative number such as `-7`
**When** the lexer tokenizes the input
**Then** it produces tokens representing the negation operator and the number `7`

**Given** an input line containing scientific notation such as `6.022e23`
**When** the lexer tokenizes the input
**Then** it produces a single `Number` token with value `6.022e23`
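
The integer, decimal, and scientific-notation cases above can share one scanning routine. A sketch, assuming a hypothetical `scan_number` helper that returns the parsed value and the number of bytes consumed (the leading `-` of `-7` is deliberately left to the negation operator, per the criterion above):

```rust
// Sketch of a number scanner for the integer, decimal, and
// scientific-notation cases. `scan_number` is a hypothetical
// helper, not the crate's real API.
fn scan_number(input: &str) -> Option<(f64, usize)> {
    let bytes = input.as_bytes();
    let mut i = 0;
    // Integer part (required).
    while i < bytes.len() && bytes[i].is_ascii_digit() { i += 1; }
    if i == 0 { return None; }
    // Optional fractional part.
    if i < bytes.len() && bytes[i] == b'.' {
        i += 1;
        while i < bytes.len() && bytes[i].is_ascii_digit() { i += 1; }
    }
    // Optional exponent, consumed only if digits follow `e`/`E`.
    if i < bytes.len() && (bytes[i] == b'e' || bytes[i] == b'E') {
        let mut j = i + 1;
        if j < bytes.len() && (bytes[j] == b'+' || bytes[j] == b'-') { j += 1; }
        let digits_start = j;
        while j < bytes.len() && bytes[j].is_ascii_digit() { j += 1; }
        if j > digits_start { i = j; }
    }
    input[..i].parse().ok().map(|v| (v, i))
}

fn main() {
    assert_eq!(scan_number("42"), Some((42.0, 2)));
    assert_eq!(scan_number("3.14"), Some((3.14, 4)));
    assert_eq!(scan_number("6.022e23"), Some((6.022e23, 8)));
    assert_eq!(scan_number("-7"), None); // negation handled by the parser
}
```
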

**Given** an input line containing SI scale suffixes such as `5k`, `2.5M`, or `1B`
**When** the lexer tokenizes the input
**Then** it produces `Number` tokens with values `5000`, `2500000`, and `1000000000` respectively
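
SI-suffix handling reduces to a lookup table applied to the number the lexer has just scanned. A sketch limited to the suffixes the criterion names (`k`, `M`, `B`); the real crate may accept more:

```rust
// Sketch of SI-suffix scaling (5k -> 5000, 2.5M -> 2500000, 1B -> 1e9).
// The suffix table is an assumption based on the examples above.
fn apply_suffix(value: f64, suffix: char) -> Option<f64> {
    let factor = match suffix {
        'k' | 'K' => 1e3,
        'M' => 1e6,
        'B' => 1e9,
        _ => return None, // not an SI suffix; likely a unit instead
    };
    Some(value * factor)
}

fn main() {
    assert_eq!(apply_suffix(5.0, 'k'), Some(5000.0));
    assert_eq!(apply_suffix(2.5, 'M'), Some(2500000.0));
    assert_eq!(apply_suffix(1.0, 'B'), Some(1000000000.0));
}
```

Returning `None` for unknown characters lets the caller fall back to unit lexing, so `5kg` can still become `Number(5)` followed by `Unit(kg)`.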

**Given** an input line containing currency symbols such as `$20`, `€15`, `£10`, `¥500`, or `R$100`
**When** the lexer tokenizes the input
**Then** it produces `CurrencySymbol` tokens paired with their `Number` tokens
**And** multi-character symbols like `R$` are recognized as a single token
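
Recognizing `R$` as a single token requires trying longer symbols before shorter ones (longest match), so `R$` wins over a bare `$`. A sketch, with the symbol list taken from the examples above and a hypothetical helper name:

```rust
// Sketch of longest-match currency-symbol recognition. The returned
// usize is the matched byte length (non-ASCII symbols span multiple
// bytes in UTF-8).
fn match_currency(input: &str) -> Option<(&'static str, usize)> {
    // Longer symbols listed first so multi-character ones match as one token.
    const SYMBOLS: [&str; 5] = ["R$", "$", "€", "£", "¥"];
    SYMBOLS
        .iter()
        .find(|s| input.starts_with(**s))
        .map(|s| (*s, s.len()))
}

fn main() {
    assert_eq!(match_currency("R$100"), Some(("R$", 2)));
    assert_eq!(match_currency("$20"), Some(("$", 1)));
    assert_eq!(match_currency("€15"), Some(("€", 3))); // € is 3 bytes in UTF-8
}
```
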

**Given** an input line containing unit suffixes such as `5kg`, `200g`, or `3.5m`
**When** the lexer tokenizes the input
**Then** it produces `Number` tokens followed by `Unit` tokens

**Given** an input line containing arithmetic operators `+`, `-`, `*`, `/`, `^`, `%`
**When** the lexer tokenizes the input
**Then** it produces the corresponding `Operator` tokens

**Given** an input line containing natural language operators such as `plus`, `minus`, `times`, or `divided by`
**When** the lexer tokenizes the input
**Then** it produces the same `Operator` tokens as their symbolic equivalents
**And** `divided by` is recognized as a single two-word operator
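
Natural-language operators can use the same longest-match approach as currency symbols, with the two-word `divided by` tried before the one-word forms. A sketch under those assumptions (helper name and word table are illustrative):

```rust
// Sketch mapping natural-language operators to their symbolic
// equivalents. `divided by` is listed first so the two-word form
// matches as a single operator.
fn word_operator(input: &str) -> Option<(char, usize)> {
    const WORDS: [(&str, char); 4] = [
        ("divided by", '/'), // two-word form must be tried first
        ("plus", '+'),
        ("minus", '-'),
        ("times", '*'),
    ];
    WORDS
        .iter()
        .find(|(w, _)| input.starts_with(w))
        .map(|(w, op)| (*op, w.len()))
}

fn main() {
    assert_eq!(word_operator("plus 2"), Some(('+', 4)));
    assert_eq!(word_operator("divided by 3"), Some(('/', 10)));
    assert_eq!(word_operator("discount"), None);
}
```
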

**Given** an input line containing a variable assignment such as `x = 10`
**When** the lexer tokenizes the input
**Then** it produces an `Identifier` token, an `Assign` token, and a `Number` token

**Given** an input line containing a comment such as `// this is a note`
**When** the lexer tokenizes the input
**Then** it produces a `Comment` token containing the comment text
**And** the comment token is preserved for display but excluded from evaluation

**Given** an input line containing plain text with no calculable expression
**When** the lexer tokenizes the input
**Then** it produces a `Text` token representing the entire line

**Given** an input line containing mixed content such as `$20 in euro - 5% discount`
**When** the lexer tokenizes the input
**Then** it produces tokens for the currency value `$20`, the conversion keyword `in`, the target currency `euro`, the `-` operator, the percentage `5%`, and the keyword `discount`
**And** each token includes its byte span (start, end) within the input
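
The span requirement in the last criterion can be modeled by wrapping every token in a struct that records its byte range in the input line. A sketch with illustrative names:

```rust
// Sketch of a span-carrying token: every token records the (start, end)
// byte range it occupies in the input. Struct and field names are
// illustrative, not the crate's real API.
#[derive(Debug, Clone, PartialEq)]
struct Spanned<T> {
    value: T,
    start: usize, // byte offset of the token's first byte
    end: usize,   // byte offset one past the token's last byte
}

fn main() {
    // For the input `$20 in euro`, `$` occupies bytes 0..1 and
    // `20` occupies bytes 1..3; slicing the input by each span
    // recovers the original lexeme.
    let input = "$20 in euro";
    let sym = Spanned { value: "$", start: 0, end: 1 };
    let num = Spanned { value: "20", start: 1, end: 3 };
    assert_eq!(&input[sym.start..sym.end], "$");
    assert_eq!(&input[num.start..num.end], "20");
}
```

Byte spans (rather than char indices) keep slicing O(1) and work directly with multi-byte symbols like `€`.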