| epic | story | title | status |
|---|---|---|---|
| 1 | 1.1 | Lexer & Tokenizer | draft |
Epic 1 — Core Calculation Engine (Rust Crate)
Goal: Build calcpad-engine as a standalone Rust crate that powers all platforms. This is the foundation.
Story 1.1: Lexer & Tokenizer
As a CalcPad engine consumer, I want input lines tokenized into a well-defined token stream, so that the parser can build an AST from structured, unambiguous tokens rather than raw text.
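To make "well-defined token stream" concrete, here is a minimal sketch of what the token types could look like. The token names (Number, CurrencySymbol, Unit, Operator, Identifier, Assign, Comment, Text) come from the acceptance criteria below; everything else (`Span`, `Op`, `TokenKind`, `Keyword`, the `text` helper, and the choice of `f64` for numbers) is an assumption made for illustration, not a decided design.

```rust
/// Byte offsets into the input line (start inclusive, end exclusive).
/// Hypothetical type, named here only for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Operators shared by symbolic and natural-language spellings.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Add,
    Sub,
    Mul,
    Div,
    Pow,
    Percent,
}

/// Token kinds borrow from the input line instead of owning strings,
/// so constructing a token does not require a heap allocation.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenKind<'a> {
    Number(f64),
    CurrencySymbol(&'a str),
    Unit(&'a str),
    Operator(Op),
    Identifier(&'a str),
    Assign,
    Keyword(&'a str), // assumed kind for words such as "in" or "discount"
    Comment(&'a str),
    Text(&'a str),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Token<'a> {
    pub kind: TokenKind<'a>,
    pub span: Span,
}

impl<'a> Token<'a> {
    /// Returns the exact slice of the input line that this token covers.
    pub fn text(&self, input: &'a str) -> &'a str {
        &input[self.span.start..self.span.end]
    }
}
```

Borrowing `&'a str` slices keeps tokens `Copy` and ties their lifetime to the input line, which fits the no-allocation criterion for simple expressions.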
Acceptance Criteria:
Given an input line containing an integer such as 42
When the lexer tokenizes the input
Then it produces a single Number token with value 42
And no heap allocations occur for this simple expression
Given an input line containing a decimal number such as 3.14
When the lexer tokenizes the input
Then it produces a single Number token with value 3.14
Given an input line containing a negative number such as -7
When the lexer tokenizes the input
Then it produces tokens representing the negation operator and the number 7
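The negation criterion implies that the lexer never folds a sign into the number; it emits the operator and leaves the unary-versus-binary decision to the parser. A minimal sketch of that behavior, assuming a hypothetical two-variant `Tok` enum and a `lex_signed` helper that only understands minus signs and plain decimal digits:

```rust
#[derive(Debug, PartialEq)]
pub enum Tok {
    Minus,
    Number(f64),
}

/// Lexes minus signs and unsigned decimal numbers, skipping whitespace.
/// Stops at the first character it does not recognize.
pub fn lex_signed(input: &str) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut rest = input.trim();
    while !rest.is_empty() {
        if let Some(tail) = rest.strip_prefix('-') {
            // Always a separate token, whether it turns out unary or binary.
            out.push(Tok::Minus);
            rest = tail.trim_start();
        } else {
            // Count leading ASCII digits and dots; these are one byte each,
            // so `len` is always a valid slice boundary.
            let len = rest
                .bytes()
                .take_while(|b| b.is_ascii_digit() || *b == b'.')
                .count();
            if len == 0 {
                break;
            }
            match rest[..len].parse::<f64>() {
                Ok(v) => out.push(Tok::Number(v)),
                Err(_) => break,
            }
            rest = rest[len..].trim_start();
        }
    }
    out
}
```

With this approach `-7` and `10 - 7` use the same minus token, so the lexer needs no lookbehind to tell the two cases apart.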
Given an input line containing scientific notation such as 6.022e23
When the lexer tokenizes the input
Then it produces a single Number token with value 6.022e23
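The integer, decimal, and scientific-notation criteria can be covered by one scanning routine. The sketch below is one possible shape, assuming a hypothetical `scan_number` function that returns the parsed value together with the number of bytes consumed, and assuming numbers are stored as `f64` (the story does not specify the numeric type).

```rust
/// Scans a number at the start of `input`.
/// Returns the parsed value and the number of bytes consumed,
/// or `None` if `input` does not start with a digit.
pub fn scan_number(input: &str) -> Option<(f64, usize)> {
    let bytes = input.as_bytes();
    let mut i = 0;

    // Integer part: one or more ASCII digits.
    while i < bytes.len() && bytes[i].is_ascii_digit() {
        i += 1;
    }
    if i == 0 {
        return None;
    }

    // Optional fraction: '.' followed by at least one digit.
    if i + 1 < bytes.len() && bytes[i] == b'.' && bytes[i + 1].is_ascii_digit() {
        i += 1;
        while i < bytes.len() && bytes[i].is_ascii_digit() {
            i += 1;
        }
    }

    // Optional exponent: 'e' or 'E', optional sign, at least one digit.
    // If no digit follows, the 'e' is left for the next token.
    if i < bytes.len() && (bytes[i] == b'e' || bytes[i] == b'E') {
        let mut j = i + 1;
        if j < bytes.len() && (bytes[j] == b'+' || bytes[j] == b'-') {
            j += 1;
        }
        if j < bytes.len() && bytes[j].is_ascii_digit() {
            while j < bytes.len() && bytes[j].is_ascii_digit() {
                j += 1;
            }
            i = j;
        }
    }

    // Only ASCII bytes were consumed, so `i` is a valid slice boundary.
    input[..i].parse::<f64>().ok().map(|v| (v, i))
}
```

The routine works on a borrowed slice and returns plain values, so it performs no heap allocation of its own. Returning the consumed length lets the caller continue lexing at the right offset, for example on the `kg` in `5kg`.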
Given an input line containing scale suffixes such as 5k, 2.5M, or 1B (k and M follow the SI prefixes; B denotes billion)
When the lexer tokenizes the input
Then it produces Number tokens with values 5000, 2500000, and 1000000000 respectively
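The scale-suffix criterion amounts to multiplying the parsed number by a factor chosen from the trailing character. A minimal sketch, assuming hypothetical helpers `scale_multiplier` and `apply_scale` and assuming the match is case-sensitive:

```rust
/// Maps a scale suffix to its multiplier. Case-sensitive by assumption.
pub fn scale_multiplier(suffix: char) -> Option<f64> {
    match suffix {
        'k' => Some(1e3),
        'M' => Some(1e6),
        'B' => Some(1e9),
        _ => None,
    }
}

/// Parses text such as "5k" or "2.5M" into its scaled value.
/// Text without a recognized suffix is parsed as a plain number.
pub fn apply_scale(text: &str) -> Option<f64> {
    let last = text.chars().last()?;
    match scale_multiplier(last) {
        Some(multiplier) => {
            // Strip the suffix by its UTF-8 length to stay on a char boundary.
            let digits = &text[..text.len() - last.len_utf8()];
            digits.parse::<f64>().ok().map(|v| v * multiplier)
        }
        None => text.parse::<f64>().ok(),
    }
}
```

Case sensitivity matters here because a lowercase `m` also appears as a unit suffix in `3.5m`; this sketch leaves lowercase `m` to the unit rule rather than treating it as a scale.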
Given an input line containing currency symbols such as $20, €15, £10, ¥500, or R$100
When the lexer tokenizes the input
Then it produces CurrencySymbol tokens paired with their Number tokens
And multi-character symbols like R$ are recognized as a single token
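Recognizing `R$` as one token suggests a prefix match against a table of known symbols, tried before the identifier rule so that the `R` is not lexed as a name. The sketch below assumes a hypothetical `scan_currency_symbol` function and a symbol table limited to the five symbols named in the criterion.

```rust
/// Known currency symbols, longest first, so that a multi-character
/// symbol is never shadowed by a shorter one sharing its prefix.
const CURRENCY_SYMBOLS: &[&str] = &["R$", "$", "€", "£", "¥"];

/// Returns the currency symbol at the start of `input`, if any,
/// together with its length in bytes.
pub fn scan_currency_symbol(input: &str) -> Option<(&'static str, usize)> {
    CURRENCY_SYMBOLS
        .iter()
        .copied()
        .find(|sym| input.starts_with(*sym))
        .map(|sym| (sym, sym.len()))
}
```

The returned length is in bytes, not characters: `€` occupies three bytes in UTF-8, while `£` and `¥` occupy two, which affects the spans of the tokens that follow.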
Given an input line containing unit suffixes such as 5kg, 200g, or 3.5m
When the lexer tokenizes the input
Then it produces Number tokens followed by Unit tokens
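For unit suffixes, the lexer has to split text such as `5kg` at the boundary between the numeric part and the unit. A minimal sketch, assuming a hypothetical `split_number_unit` helper and a unit table containing only the three units from the example (a real table would be much larger):

```rust
/// Units recognized by this sketch; illustrative only.
const UNITS: &[&str] = &["kg", "g", "m"];

/// Splits text such as "5kg" into its numeric value and unit.
/// Returns `None` if there is no unit or the unit is unknown.
pub fn split_number_unit(text: &str) -> Option<(f64, &str)> {
    // Byte index of the first character that cannot belong to the number.
    let split = text.find(|c: char| !(c.is_ascii_digit() || c == '.'))?;
    let (number, unit) = text.split_at(split);
    let value = number.parse::<f64>().ok()?;
    if UNITS.iter().any(|known| *known == unit) {
        Some((value, unit))
    } else {
        None
    }
}
```

The unit is returned as a slice of the input, so the Unit token can borrow it directly without copying.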
Given an input line containing arithmetic operators +, -, *, /, ^, %
When the lexer tokenizes the input
Then it produces the corresponding Operator tokens
Given an input line containing natural language operators such as plus, minus, times, or divided by
When the lexer tokenizes the input
Then it produces the same Operator tokens as their symbolic equivalents
And divided by is recognized as a single two-word operator
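Both operator criteria can share one operator type, with symbols and words mapping onto it. The sketch below assumes a hypothetical `Op` enum and two helpers, `symbol_operator` and `scan_word_operator`; it also assumes lowercase input and exactly one space inside `divided by`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Add,
    Sub,
    Mul,
    Div,
    Pow,
    Percent, // whether '%' means percent or modulo is left to the parser here
}

/// Maps a symbolic operator character to its operator.
pub fn symbol_operator(c: char) -> Option<Op> {
    match c {
        '+' => Some(Op::Add),
        '-' => Some(Op::Sub),
        '*' => Some(Op::Mul),
        '/' => Some(Op::Div),
        '^' => Some(Op::Pow),
        '%' => Some(Op::Percent),
        _ => None,
    }
}

/// Word operators and the symbolic operator each one stands for.
const WORD_OPERATORS: &[(&str, Op)] = &[
    ("divided by", Op::Div),
    ("minus", Op::Sub),
    ("times", Op::Mul),
    ("plus", Op::Add),
];

/// Matches a word operator at the start of `input`.
/// Returns the operator and the number of bytes consumed.
pub fn scan_word_operator(input: &str) -> Option<(Op, usize)> {
    for &(phrase, op) in WORD_OPERATORS {
        if let Some(rest) = input.strip_prefix(phrase) {
            // Require a word boundary so "plush" does not match "plus".
            let at_boundary = rest.chars().next().map_or(true, |c| !c.is_alphanumeric());
            if at_boundary {
                return Some((op, phrase.len()));
            }
        }
    }
    None
}
```

Because `divided by` is matched as a whole phrase, a lone `divided` produces no operator and falls through to whatever rule handles ordinary words.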
Given an input line containing a variable assignment such as x = 10
When the lexer tokenizes the input
Then it produces an Identifier token, an Assign token, and a Number token
Given an input line containing a comment such as // this is a note
When the lexer tokenizes the input
Then it produces a Comment token containing the comment text
And the comment token is preserved for display but excluded from evaluation
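One way to satisfy the comment criterion is to split the line at the comment marker before lexing the rest. The sketch below assumes a hypothetical `split_comment` helper; treating `//` anywhere in the line as the start of a comment is an assumption, since the criterion only shows a whole-line comment.

```rust
/// Splits a line into its calculable part and an optional comment.
/// The comment text excludes the "//" marker and surrounding whitespace.
pub fn split_comment(line: &str) -> (&str, Option<&str>) {
    match line.find("//") {
        // "//" is two ASCII bytes, so `pos + 2` is a valid boundary.
        Some(pos) => (&line[..pos], Some(line[pos + 2..].trim())),
        None => (line, None),
    }
}
```

The evaluator would receive only the first element, while the display layer can still render the comment from the second, which matches "preserved for display but excluded from evaluation".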
Given an input line containing plain text with no calculable expression
When the lexer tokenizes the input
Then it produces a Text token representing the entire line
Given an input line containing mixed content such as $20 in euro - 5% discount
When the lexer tokenizes the input
Then it produces tokens for the currency value ($20), the conversion keyword (in), the target currency (euro), the operator (-), the percentage (5%), and the keyword (discount)
And each token includes its byte span (start, end) within the input
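Byte spans are easiest to get right when offsets come from `char_indices`, which reports byte positions even for multi-byte characters such as `€`. The sketch below is not a full lexer: it assumes a hypothetical `word_spans` helper that only chunks a line on whitespace, to show how the (start, end) pairs would be computed. A real lexer would split chunks such as `$20` further.

```rust
/// Returns the byte span (start inclusive, end exclusive) of every
/// whitespace-separated chunk in `input`.
pub fn word_spans(input: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut start: Option<usize> = None;

    // `char_indices` yields byte offsets, not character counts.
    for (i, c) in input.char_indices() {
        if c.is_whitespace() {
            if let Some(s) = start.take() {
                spans.push((s, i));
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }

    // Close a chunk that runs to the end of the line.
    if let Some(s) = start {
        spans.push((s, input.len()));
    }
    spans
}
```

For the mixed-content line `$20 in euro - 5% discount`, the chunks start at bytes 0, 4, 7, 12, 14, and 17. Slicing the input with any returned span yields the original chunk text, which is the property error reporting and syntax highlighting rely on.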