---
epic: 1
story: 1.1
title: "Lexer & Tokenizer"
status: draft
---

## Epic 1 — Core Calculation Engine (Rust Crate)

**Goal:** Build `calcpad-engine` as a standalone Rust crate that powers all platforms. This is the foundation.

### Story 1.1: Lexer & Tokenizer

As a CalcPad engine consumer,
I want input lines tokenized into a well-defined token stream,
So that the parser can build an AST from structured, unambiguous tokens rather than raw text.

**Acceptance Criteria:**

**Given** an input line containing an integer such as `42`
**When** the lexer tokenizes the input
**Then** it produces a single `Number` token with value `42`
**And** no heap allocations occur for this simple expression

**Given** an input line containing a decimal number such as `3.14`
**When** the lexer tokenizes the input
**Then** it produces a single `Number` token with value `3.14`

**Given** an input line containing a negative number such as `-7`
**When** the lexer tokenizes the input
**Then** it produces tokens representing the negation operator and the number `7`

**Given** an input line containing scientific notation such as `6.022e23`
**When** the lexer tokenizes the input
**Then** it produces a single `Number` token with value `6.022e23`

**Given** an input line containing SI scale suffixes such as `5k`, `2.5M`, or `1B`
**When** the lexer tokenizes the input
**Then** it produces `Number` tokens with values `5000`, `2500000`, and `1000000000` respectively

**Given** an input line containing currency symbols such as `$20`, `€15`, `£10`, `¥500`, or `R$100`
**When** the lexer tokenizes the input
**Then** it produces `CurrencySymbol` tokens paired with their `Number` tokens
**And** multi-character symbols like `R$` are recognized as a single token

**Given** an input line containing unit suffixes such as `5kg`, `200g`, or `3.5m`
**When** the lexer tokenizes the input
**Then** it produces `Number` tokens followed by `Unit` tokens

**Given** an input line containing arithmetic operators `+`, `-`, `*`, `/`, `^`, `%`
**When** the lexer tokenizes the input
**Then** it produces the corresponding `Operator` tokens

**Given** an input line containing natural language operators such as `plus`, `minus`, `times`, or `divided by`
**When** the lexer tokenizes the input
**Then** it produces the same `Operator` tokens as their symbolic equivalents
**And** `divided by` is recognized as a single two-word operator

**Given** an input line containing a variable assignment such as `x = 10`
**When** the lexer tokenizes the input
**Then** it produces an `Identifier` token, an `Assign` token, and a `Number` token

**Given** an input line containing a comment such as `// this is a note`
**When** the lexer tokenizes the input
**Then** it produces a `Comment` token containing the comment text
**And** the comment token is preserved for display but excluded from evaluation

**Given** an input line containing plain text with no calculable expression
**When** the lexer tokenizes the input
**Then** it produces a `Text` token representing the entire line

**Given** an input line containing mixed content such as `$20 in euro - 5% discount`
**When** the lexer tokenizes the input
**Then** it produces tokens for the currency value, the conversion keyword, the currency target, the operator, the percentage, and the `discount` keyword
**And** each token includes its byte span (start, end) within the input
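The criteria above can be sketched as a minimal Rust lexer. This is an illustrative sketch, not the final `calcpad-engine` API: the `Token` and `Spanned` names are assumptions, and it covers only plain/decimal numbers, the SI scale suffixes, and single-character operators, while skipping everything else.

```rust
// Hypothetical token shapes; names are illustrative, not the crate's API.
#[derive(Debug, PartialEq)]
enum Token {
    Number(f64),
    Operator(char),
}

/// A token paired with its byte span (start, end) in the input,
/// as required by the mixed-content criterion.
#[derive(Debug, PartialEq)]
struct Spanned {
    token: Token,
    span: (usize, usize),
}

/// Minimal lexer sketch: numbers (with optional decimal point and
/// SI scale suffix) and single-character arithmetic operators.
fn lex(input: &str) -> Vec<Spanned> {
    let bytes = input.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let c = bytes[i] as char;
        if c.is_ascii_digit() {
            let start = i;
            while i < bytes.len() && (bytes[i].is_ascii_digit() || bytes[i] == b'.') {
                i += 1;
            }
            // Sketch only: assumes the digit run is a valid f64 literal.
            let mut value: f64 = input[start..i].parse().unwrap();
            // SI scale suffixes: 5k -> 5000, 2.5M -> 2500000, 1B -> 1000000000
            if i < bytes.len() {
                let scale = match bytes[i] as char {
                    'k' => Some(1e3),
                    'M' => Some(1e6),
                    'B' => Some(1e9),
                    _ => None,
                };
                if let Some(s) = scale {
                    value *= s;
                    i += 1;
                }
            }
            out.push(Spanned { token: Token::Number(value), span: (start, i) });
        } else if "+-*/^%".contains(c) {
            out.push(Spanned { token: Token::Operator(c), span: (i, i + 1) });
            i += 1;
        } else {
            i += 1; // skip whitespace and anything this sketch doesn't handle
        }
    }
    out
}

fn main() {
    // `5k` scales to 5000; each token carries its byte span.
    for t in lex("5k + 3.14") {
        println!("{:?}", t);
    }
}
```

Note that this sketch tokenizes `-` uniformly as an operator; whether `-7` becomes a negation-plus-number pair (as the criterion above requires) or a binary minus is a parser-level decision, which is one reason the story emits a negation token rather than a negative `Number`.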