1.2.0 • Published 6 months ago

@mazard/scanner v1.2.0

License
MIT

Mazard Scanner

This scanner converts a Markdown document into an array of tokens. These tokens can then be interpreted by a parser into an expression tree. Much inspiration has been taken from Robert Nystrom's Crafting Interpreters as well as Alfred Aho and Jeffrey Ullman's The Theory of Parsing, Translation, and Compiling.

Token types

| Type | Description | Example |
| --- | --- | --- |
| SYMBOL | An alphanumeric string that closely resembles a variable name in other languages | `Foo`, `foo`, `foo-bar`, `foo_bar` |
| RUNE | Similar to a symbol, but these strings contain non-alphanumeric content | `Foo#`, `-foo`, `_foo`, `1foo`, `fo>o` |
| NUMBER | An integer, decimal, or a number in exponential notation | `1`, `1.0`, `+1`, `-1`, `1.0e1` |
| SPACE | One or more space characters. The literal value is the number of spaces encountered. | |
| TAB | A `"\t"` or `" "` at the start of a line. | |
| BR | One or more line break characters. | |
| COLON | A, well, colon | `:` |
| COLON_COLON | Two colons in sequence, likely indicating an Obsidian metadata value | `::` |
| FRONTMATTER_START | The triple-dash at the start of a frontmatter section | `---` |
| FRONTMATTER_END | The triple-dash at the end of a frontmatter section | `---` |
| FRONTMATTER_KEY | A frontmatter key | The `foo` in `foo: bar` |
| FRONTMATTER_VALUE | A frontmatter value | The `bar` in `foo: bar` |
| FRONTMATTER_BULLET | A dash at the beginning of a line | The `-` in `- bar` |
| CODE_START | The triple-backtick at the start of a code section | `` ``` `` |
| CODE_LANGUAGE | The language specified after the triple backticks of a CODE_START | The `typescript` in `` ```typescript `` |
| CODE_KEY | Similar to frontmatter, code blocks can have keys and values after the CODE_START | The `foo` in `foo: bar` |
| CODE_VALUE | A metadata code value | The `bar` in `foo: bar` |
| CODE_SOURCE | The source code inside of a code block | |
| CODE_END | The triple-backtick at the end of a code section | `` ``` `` |
| HHASH | One to six `#` characters at the beginning of a line | The `###` in `### Foo` |
| HGTHAN | A `>` at the beginning of a line | The `>` in `> Foo` |
| L_BRACKET | A single left bracket | `[` |
| LL_BRACKET | Two left brackets | `[[` |
| R_BRACKET | A single right bracket | `]` |
| RR_BRACKET | Two right brackets | `]]` |
| LL_BRACE | Two left braces | `{{` |
| RR_BRACE | Two right braces | `}}` |
| ASTERISK | A single asterisk | `*` |
| ASTERISK_ASTERISK | Two asterisks | `**` |
| EQUALS_EQUALS | Two equals signs | `==` |
| ORDINAL | A number with an ordinal suffix | `1st`, `2nd`, `3rd`, `4th` |
| PIPE | A bar pipe | `\|` |
| TAG | A symbol prefixed with a hashtag | `#tag`, `#tag-foo`, `#tag1` |
| TILDE_TILDE | Two tildes | `~~` |
| ESCAPE | A backslash followed by any character | `\\|` |
| L_PAREN | A left parenthesis | `(` |
| R_PAREN | A right parenthesis | `)` |
| BACKTICK | A single backtick | `` ` `` |
| DOLLAR | A dollar sign | `$` |
| DOLLAR_DOLLAR | Two dollar signs | `$$` |
| PERCENT_PERCENT | Two percent signs | `%%` |
| COMMENT | The content of a comment | `A comment` in `%% A comment` |
| HTML_TAG | An HTML tag | `<div>`, `</div>`, `<p />` |
| HR | A horizontal rule | `---`, `***`, `___` |
| BULLET | A dash or asterisk at the beginning of a line | The `-` in `- foo` |
| N_BULLET | A numbered bullet at the beginning of a line | The `1.` in `1. foo` |
| CHECKBOX | A checkbox at the beginning of a line | The `- [ ]` in `- [ ] foo` |
| URL | A URL | `https://www.google.com` |
| EOF | The very end of the string or file | |
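
The word-like token types above (SYMBOL, RUNE, NUMBER, ORDINAL, TAG) differ only in which characters they allow. The sketch below shows one way such a word could be classified once it has been isolated. The function name `classifyWord` and every regular expression are assumptions inferred from the examples in the table; none of this is taken from the @mazard/scanner source, and the real scanner may draw the lines differently.

```typescript
// Hypothetical classifier for a single, already isolated word.
// The patterns are guesses based on the examples in the token table.
type WordType = "NUMBER" | "ORDINAL" | "TAG" | "SYMBOL" | "RUNE";

// Integer, decimal, or exponential notation with an optional sign: 1, +1, 1.0e1
const NUMBER_PATTERN = /^[+-]?\d+(\.\d+)?(e[+-]?\d+)?$/i;
// Digits followed by an ordinal suffix: 1st, 2nd, 3rd, 4th
const ORDINAL_PATTERN = /^\d+(st|nd|rd|th)$/;
// A hash followed by a symbol-like name: #tag, #tag-foo, #tag1
const TAG_PATTERN = /^#[A-Za-z][A-Za-z0-9_-]*$/;
// Letters and digits, optionally joined by single dashes or underscores: foo-bar
const SYMBOL_PATTERN = /^[A-Za-z][A-Za-z0-9]*([_-][A-Za-z0-9]+)*$/;

function classifyWord(word: string): WordType {
  if (NUMBER_PATTERN.test(word)) return "NUMBER";
  if (ORDINAL_PATTERN.test(word)) return "ORDINAL";
  if (TAG_PATTERN.test(word)) return "TAG";
  if (SYMBOL_PATTERN.test(word)) return "SYMBOL";
  // Anything else with non-alphanumeric content falls through to RUNE.
  return "RUNE";
}
```

Under these assumed patterns, `classifyWord("foo-bar")` gives `"SYMBOL"`, while `classifyWord("tokens.")` and `classifyWord("1foo")` both fall through to `"RUNE"`.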

Some examples

const tokens = scanTokens([
	"# Mazard Scanner",
	"",
	"This scanner converts a Markdown document into an array of tokens.",
]);

printTokens(tokens);
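
| No | Type | Lexeme | Literal | Line | Column |
| --- | --- | --- | --- | --- | --- |
| 0 | HHASH | "#" | 1 | 0 | 0 |
| 1 | SPACE | " " | 1 | 0 | 1 |
| 2 | SYMBOL | "Mazard" | "Mazard" | 0 | 2 |
| 3 | SPACE | " " | 1 | 0 | 8 |
| 4 | SYMBOL | "Scanner" | "Scanner" | 0 | 9 |
| 5 | BR | "\n\n" | 2 | 0 | 16 |
| 6 | SYMBOL | "This" | "This" | 2 | 0 |
| 7 | SPACE | " " | 1 | 2 | 4 |
| 8 | SYMBOL | "scanner" | "scanner" | 2 | 5 |
| 9 | SPACE | " " | 1 | 2 | 12 |
| 10 | SYMBOL | "converts" | "converts" | 2 | 13 |
| 11 | SPACE | " " | 1 | 2 | 21 |
| 12 | SYMBOL | "a" | "a" | 2 | 22 |
| 13 | SPACE | " " | 1 | 2 | 23 |
| 14 | SYMBOL | "Markdown" | "Markdown" | 2 | 24 |
| 15 | SPACE | " " | 1 | 2 | 32 |
| 16 | SYMBOL | "document" | "document" | 2 | 33 |
| 17 | SPACE | " " | 1 | 2 | 41 |
| 18 | SYMBOL | "into" | "into" | 2 | 42 |
| 19 | SPACE | " " | 1 | 2 | 46 |
| 20 | SYMBOL | "an" | "an" | 2 | 47 |
| 21 | SPACE | " " | 1 | 2 | 49 |
| 22 | SYMBOL | "array" | "array" | 2 | 50 |
| 23 | SPACE | " " | 1 | 2 | 55 |
| 24 | SYMBOL | "of" | "of" | 2 | 56 |
| 25 | SPACE | " " | 1 | 2 | 58 |
| 26 | RUNE | "tokens." | "tokens." | 2 | 59 |
| 27 | EOF | "" | "" | 2 | 66 |
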
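
In this output the lexemes appear to cover the input without gaps, so joining them in order should give back the scanned text. The sketch below illustrates that with the first six tokens typed in by hand. The `ScanToken` interface and the `joinLexemes` helper are assumptions that mirror the printed columns; they are not the package's documented types.

```typescript
// Assumed token shape, mirroring the columns that printTokens displays.
interface ScanToken {
  type: string;
  lexeme: string;
  literal: string | number | boolean;
  line: number;
  column: number;
}

// The first six tokens of the heading example, copied by hand.
const headingTokens: ScanToken[] = [
  { type: "HHASH", lexeme: "#", literal: 1, line: 0, column: 0 },
  { type: "SPACE", lexeme: " ", literal: 1, line: 0, column: 1 },
  { type: "SYMBOL", lexeme: "Mazard", literal: "Mazard", line: 0, column: 2 },
  { type: "SPACE", lexeme: " ", literal: 1, line: 0, column: 8 },
  { type: "SYMBOL", lexeme: "Scanner", literal: "Scanner", line: 0, column: 9 },
  { type: "BR", lexeme: "\n\n", literal: 2, line: 0, column: 16 },
];

// Hypothetical helper: concatenate lexemes in order to rebuild the source text.
function joinLexemes(tokens: ScanToken[]): string {
  return tokens.map((token) => token.lexeme).join("");
}

const rebuilt = joinLexemes(headingTokens); // "# Mazard Scanner\n\n"
```

A round trip like this is a cheap sanity check when experimenting with a scanner: if the joined lexemes differ from the input, some characters were dropped or duplicated.
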
const tokens = scanTokens("here's a *line* with some ~~formatting~~.");
printTokens(tokens);
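
| No | Type | Lexeme | Literal | Line | Column |
| --- | --- | --- | --- | --- | --- |
| 0 | RUNE | "here's" | "here's" | 0 | 0 |
| 1 | SPACE | " " | 1 | 0 | 6 |
| 2 | SYMBOL | "a" | "a" | 0 | 7 |
| 3 | SPACE | " " | 1 | 0 | 8 |
| 4 | ASTERISK | "*" | "*" | 0 | 9 |
| 5 | SYMBOL | "line" | "line" | 0 | 10 |
| 6 | ASTERISK | "*" | "*" | 0 | 14 |
| 7 | SPACE | " " | 1 | 0 | 15 |
| 8 | SYMBOL | "with" | "with" | 0 | 16 |
| 9 | SPACE | " " | 1 | 0 | 20 |
| 10 | SYMBOL | "some" | "some" | 0 | 21 |
| 11 | SPACE | " " | 1 | 0 | 25 |
| 12 | TILDE_TILDE | "~~" | "~~" | 0 | 26 |
| 13 | SYMBOL | "formatting" | "formatting" | 0 | 28 |
| 14 | TILDE_TILDE | "~~" | "~~" | 0 | 38 |
| 15 | RUNE | "." | "." | 0 | 40 |
| 16 | EOF | "" | "" | 0 | 41 |
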
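
The scanner only reports delimiters such as ASTERISK and TILDE_TILDE; deciding which ones pair up is left to a parser. As a rough illustration of that next step, the sketch below pairs each delimiter with the next token of the same type and collects the text in between. The `ScanToken` shape, the `findSpans` helper, and the no-nesting simplification are all assumptions made for this example, not part of the package.

```typescript
// Assumed minimal token shape; only the fields this sketch needs.
interface ScanToken {
  type: string;
  lexeme: string;
}

interface DelimitedSpan {
  delimiter: string;
  text: string;
}

// Hypothetical helper: pair each delimiter token with the next token of the
// same type and return the text between them. Nested spans are not handled;
// a different delimiter seen while one is pending is treated as plain content.
function findSpans(tokens: ScanToken[], delimiters: string[]): DelimitedSpan[] {
  const spans: DelimitedSpan[] = [];
  let pending: { delimiter: string; index: number } | null = null;

  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    if (delimiters.indexOf(token.type) === -1) continue;

    if (pending === null) {
      // Opening delimiter: remember where the span starts.
      pending = { delimiter: token.type, index: i };
    } else if (pending.delimiter === token.type) {
      // Matching closing delimiter: join the lexemes in between.
      const text = tokens
        .slice(pending.index + 1, i)
        .map((inner) => inner.lexeme)
        .join("");
      spans.push({ delimiter: token.type, text });
      pending = null;
    }
  }
  return spans;
}

// The relevant tokens from the formatting example, copied by hand.
const formattedLine: ScanToken[] = [
  { type: "ASTERISK", lexeme: "*" },
  { type: "SYMBOL", lexeme: "line" },
  { type: "ASTERISK", lexeme: "*" },
  { type: "SPACE", lexeme: " " },
  { type: "TILDE_TILDE", lexeme: "~~" },
  { type: "SYMBOL", lexeme: "formatting" },
  { type: "TILDE_TILDE", lexeme: "~~" },
];

const spans = findSpans(formattedLine, ["ASTERISK", "TILDE_TILDE"]);
```

Here `spans` holds two entries: an ASTERISK span with the text `line` and a TILDE_TILDE span with the text `formatting`.
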
const tokens = scanTokens([
	"- [x] Finish the scanner.",
	"- [ ] Write some reasonable documentation",
]);

printTokens(tokens);
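
| No | Type | Lexeme | Literal | Line | Column |
| --- | --- | --- | --- | --- | --- |
| 0 | CHECKBOX | "- [x]" | true | 0 | 0 |
| 1 | SPACE | " " | 1 | 0 | 5 |
| 2 | SYMBOL | "Finish" | "Finish" | 0 | 6 |
| 3 | SPACE | " " | 1 | 0 | 12 |
| 4 | SYMBOL | "the" | "the" | 0 | 13 |
| 5 | SPACE | " " | 1 | 0 | 16 |
| 6 | RUNE | "scanner." | "scanner." | 0 | 17 |
| 7 | BR | "\n" | 1 | 0 | 25 |
| 8 | CHECKBOX | "- [ ]" | false | 1 | 0 |
| 9 | SPACE | " " | 1 | 1 | 5 |
| 10 | SYMBOL | "Write" | "Write" | 1 | 6 |
| 11 | SPACE | " " | 1 | 1 | 11 |
| 12 | SYMBOL | "some" | "some" | 1 | 12 |
| 13 | SPACE | " " | 1 | 1 | 16 |
| 14 | SYMBOL | "reasonable" | "reasonable" | 1 | 17 |
| 15 | SPACE | " " | 1 | 1 | 27 |
| 16 | SYMBOL | "documentation" | "documentation" | 1 | 28 |
| 17 | EOF | "" | "" | 1 | 41 |
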
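
Because each CHECKBOX token carries its checked state as a boolean literal, a task list can be recovered from the token stream without re-reading the source. The sketch below shows one possible way to do that. The `ScanToken` and `TaskItem` shapes and the `collectTasks` helper are assumptions for illustration, and the sample tokens are an abbreviated hand copy of the checkbox example.

```typescript
// Assumed minimal token shape; only the fields this sketch needs.
interface ScanToken {
  type: string;
  lexeme: string;
  literal: string | number | boolean;
}

interface TaskItem {
  done: boolean;
  text: string;
}

// Hypothetical helper: start a task at every CHECKBOX token and append the
// lexemes that follow until the next line break or the end of input.
function collectTasks(tokens: ScanToken[]): TaskItem[] {
  const tasks: TaskItem[] = [];
  let current: TaskItem | null = null;

  for (const token of tokens) {
    if (token.type === "CHECKBOX") {
      current = { done: token.literal === true, text: "" };
      tasks.push(current);
    } else if (token.type === "BR" || token.type === "EOF") {
      current = null;
    } else if (current !== null) {
      current.text += token.lexeme;
    }
  }
  // Drop the space that separates the checkbox from its label.
  return tasks.map((task) => ({ done: task.done, text: task.text.trim() }));
}

const checklist: ScanToken[] = [
  { type: "CHECKBOX", lexeme: "- [x]", literal: true },
  { type: "SPACE", lexeme: " ", literal: 1 },
  { type: "SYMBOL", lexeme: "Finish", literal: "Finish" },
  { type: "SPACE", lexeme: " ", literal: 1 },
  { type: "SYMBOL", lexeme: "the", literal: "the" },
  { type: "SPACE", lexeme: " ", literal: 1 },
  { type: "RUNE", lexeme: "scanner.", literal: "scanner." },
  { type: "BR", lexeme: "\n", literal: 1 },
  { type: "CHECKBOX", lexeme: "- [ ]", literal: false },
  { type: "SPACE", lexeme: " ", literal: 1 },
  { type: "SYMBOL", lexeme: "Write", literal: "Write" },
  { type: "EOF", lexeme: "", literal: "" },
];

const tasks = collectTasks(checklist);
```

With the abbreviated input above, `tasks` contains a finished item with the text `Finish the scanner.` and an open item with the text `Write`.
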