---
epic: 1
story: 1.1
title: "Lexer & Tokenizer"
status: draft
---

## Epic 1 — Core Calculation Engine (Rust Crate)

**Goal:** Build `calcpad-engine` as a standalone Rust crate that powers all platforms. This is the foundation.

### Story 1.1: Lexer & Tokenizer

As a CalcPad engine consumer,
I want input lines tokenized into a well-defined token stream,
So that the parser can build an AST from structured, unambiguous tokens rather than raw text.

**Acceptance Criteria:**

**Given** an input line containing an integer such as `42`
**When** the lexer tokenizes the input
**Then** it produces a single `Number` token with value `42`
**And** no heap allocations occur for this simple expression

**Given** an input line containing a decimal number such as `3.14`
**When** the lexer tokenizes the input
**Then** it produces a single `Number` token with value `3.14`

**Given** an input line containing a negative number such as `-7`
**When** the lexer tokenizes the input
**Then** it produces tokens representing the negation operator and the number `7`

**Given** an input line containing scientific notation such as `6.022e23`
**When** the lexer tokenizes the input
**Then** it produces a single `Number` token with value `6.022e23`

**Given** an input line containing SI scale suffixes such as `5k`, `2.5M`, or `1B`
**When** the lexer tokenizes the input
**Then** it produces `Number` tokens with values `5000`, `2500000`, and `1000000000` respectively

**Given** an input line containing currency symbols such as `$20`, `€15`, `£10`, `¥500`, or `R$100`
**When** the lexer tokenizes the input
**Then** it produces `CurrencySymbol` tokens paired with their `Number` tokens
**And** multi-character symbols like `R$` are recognized as a single token

**Given** an input line containing unit suffixes such as `5kg`, `200g`, or `3.5m`
**When** the lexer tokenizes the input
**Then** it produces `Number` tokens followed by `Unit` tokens

**Given** an input line containing arithmetic operators `+`, `-`, `*`, `/`, `^`, `%`
**When** the lexer tokenizes the input
**Then** it produces the corresponding `Operator` tokens

**Given** an input line containing natural language operators such as `plus`, `minus`, `times`, or `divided by`
**When** the lexer tokenizes the input
**Then** it produces the same `Operator` tokens as their symbolic equivalents
**And** `divided by` is recognized as a single two-word operator

**Given** an input line containing a variable assignment such as `x = 10`
**When** the lexer tokenizes the input
**Then** it produces an `Identifier` token, an `Assign` token, and a `Number` token

**Given** an input line containing a comment such as `// this is a note`
**When** the lexer tokenizes the input
**Then** it produces a `Comment` token containing the comment text
**And** the comment token is preserved for display but excluded from evaluation

**Given** an input line containing plain text with no calculable expression
**When** the lexer tokenizes the input
**Then** it produces a `Text` token representing the entire line

**Given** an input line containing mixed content such as `$20 in euro - 5% discount`
**When** the lexer tokenizes the input
**Then** it produces tokens for the currency value, the conversion keyword, the currency target, the operator, the percentage, and the `discount` keyword
**And** each token includes its byte span (start, end) within the input
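The criteria above can be sketched as a minimal Rust lexer. This is an illustrative sketch, not the final `calcpad-engine` API: the `Token` and `Spanned` names are assumptions, and it covers only plain/decimal numbers, the SI scale suffixes, and single-character operators, while skipping everything else.

```rust
// Hypothetical token shapes; names are illustrative, not the crate's API.
#[derive(Debug, PartialEq)]
enum Token {
    Number(f64),
    Operator(char),
}

/// A token paired with its byte span (start, end) in the input,
/// as required by the mixed-content criterion.
#[derive(Debug, PartialEq)]
struct Spanned {
    token: Token,
    span: (usize, usize),
}

/// Minimal lexer sketch: numbers (with optional decimal point and
/// SI scale suffix) and single-character arithmetic operators.
fn lex(input: &str) -> Vec<Spanned> {
    let bytes = input.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let c = bytes[i] as char;
        if c.is_ascii_digit() {
            let start = i;
            while i < bytes.len() && (bytes[i].is_ascii_digit() || bytes[i] == b'.') {
                i += 1;
            }
            // Sketch only: assumes the digit run is a valid f64 literal.
            let mut value: f64 = input[start..i].parse().unwrap();
            // SI scale suffixes: 5k -> 5000, 2.5M -> 2500000, 1B -> 1000000000
            if i < bytes.len() {
                let scale = match bytes[i] as char {
                    'k' => Some(1e3),
                    'M' => Some(1e6),
                    'B' => Some(1e9),
                    _ => None,
                };
                if let Some(s) = scale {
                    value *= s;
                    i += 1;
                }
            }
            out.push(Spanned { token: Token::Number(value), span: (start, i) });
        } else if "+-*/^%".contains(c) {
            out.push(Spanned { token: Token::Operator(c), span: (i, i + 1) });
            i += 1;
        } else {
            i += 1; // skip whitespace and anything this sketch doesn't handle
        }
    }
    out
}

fn main() {
    // `5k` scales to 5000; each token carries its byte span.
    for t in lex("5k + 3.14") {
        println!("{:?}", t);
    }
}
```

Note that this sketch tokenizes `-` uniformly as an operator; whether `-7` becomes a negation-plus-number pair (as the criterion above requires) or a binary minus is a parser-level decision, which is one reason the story emits a negation token rather than a negative `Number`.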