dbay-sql-lexer v1.1.0
𓆤DBay SQL Lexer
The DBay SQL Lexer takes an SQL string as input and returns a list of tokens in the format { type, text, idx, }:
tokens = ( require 'dbay-sql-lexer' ).tokenize """select * from my_table"""
gives
[ { type: 'select', text: 'select', idx: 0 },
{ type: 'star', text: '*', idx: 7 },
{ type: 'from', text: 'from', idx: 9 },
{ type: 'identifier', text: 'my_table', idx: 14 } ]
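Since each token records the offset (idx) at which its text starts, tokens can be mapped back onto the source string. The following is a minimal JavaScript sketch, not part of the library: checkOffsets is a hypothetical helper, and the token list is copied by hand from the output above rather than produced by calling the lexer.

```javascript
// Hypothetical helper (not part of dbay-sql-lexer): returns true when every
// token's text is found in the source exactly at the offset given by its idx.
function checkOffsets(source, tokens) {
  return tokens.every(
    (token) => source.slice(token.idx, token.idx + token.text.length) === token.text
  );
}

// Token list copied by hand from the example output above.
const source = 'select * from my_table';
const tokens = [
  { type: 'select', text: 'select', idx: 0 },
  { type: 'star', text: '*', idx: 7 },
  { type: 'from', text: 'from', idx: 9 },
  { type: 'identifier', text: 'my_table', idx: 14 },
];

console.log(checkOffsets(source, tokens)); // true
```

A check like this is mostly useful in tests, where it guards against offsets drifting when whitespace or comments are skipped.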
Acknowledgements
The DBay SQL Lexer is a fork of mistic100/sql-parser, with much of the original code that was outside the scope of a lexer removed.
To Do
- – documentation
- – make lexer accept Unicode identifiers
– regex on line 176 is incorrect because the opening and closing backticks are each optional independently of the other, so unbalanced quoting is accepted:
LITERAL = /^`?([a-z_][a-z0-9_]{0,}(:(number|float|string|date|boolean))?)`?/iu
– implement correct identifier parsing; from Requirements For The SQLite Tokenizer: Identifier tokens:
Identifiers follow the usual rules with the exception that SQLite allows the dollar-sign symbol in the interior of an identifier. The dollar-sign is for compatibility with Microsoft SQL-Server and is not part of the SQL standard.
H41130: SQLite shall recognize as an ID token any sequence of characters that begins with an ALPHABETIC character and continue with zero or more ALPHANUMERIC characters and/or "$" (u0024) characters and which is not a keyword token. Identifiers can be arbitrary character strings within square brackets. This feature is also for compatibility with Microsoft SQL-Server and not a part of the SQL standard.
H41140: SQLite shall recognize as an ID token any sequence of non-zero characters that begins with "[" (u005b) and continuing through the first "]" (u005d) character. The standard way of quoting SQL identifiers is to use double-quotes.
– replace with re-written parser based on moo (or similar), making use of the regex sticky (y) flag
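The identifier-related items above can be illustrated together. The JavaScript sketch below first shows that the LITERAL regex quoted above accepts unbalanced backticks, then outlines one possible sticky-flag matcher for the rules H41130 and H41140. It is a sketch under stated assumptions, not the planned implementation: the names matchIdentifier, BARE_ID and BRACKET_ID are hypothetical, ALPHABETIC and ALPHANUMERIC are approximated with the Unicode property escapes \p{L} and \p{N} plus the underscore (SQLite's own definition may differ in detail), and the exclusion of keyword tokens is left out.

```javascript
// The regex quoted in the To Do list: each backtick is optional on its own.
const LITERAL = /^`?([a-z_][a-z0-9_]{0,}(:(number|float|string|date|boolean))?)`?/iu;

// An opening backtick without a closing one is accepted, and vice versa.
console.log(LITERAL.exec('`foo')[0]); // "`foo"
console.log(LITERAL.exec('foo`')[0]); // "foo`"

// Hypothetical sticky matchers. The y flag makes exec() match only at
// lastIndex, so the source string never has to be sliced.
// Assumption: \p{L} and \p{N} stand in for ALPHABETIC and ALPHANUMERIC.
const BARE_ID = /[\p{L}_][\p{L}\p{N}_$]*/uy;
// H41140: from "[" through the first "]", with no NUL characters in between.
const BRACKET_ID = /\[[^\]\0]*\]/uy;

// Tries to match an identifier in source starting exactly at offset idx.
// Returns a token in the { type, text, idx } format, or null if none matches.
// Keyword exclusion is deliberately not handled here.
function matchIdentifier(source, idx) {
  for (const re of [BRACKET_ID, BARE_ID]) {
    re.lastIndex = idx;
    const match = re.exec(source);
    if (match !== null) {
      return { type: 'identifier', text: match[0], idx };
    }
  }
  return null;
}

console.log(matchIdentifier('select [my table] from t', 7));
console.log(matchIdentifier('select größe$1 from t', 7));
console.log(matchIdentifier('select [open from t', 7)); // null: bracket never closes
```

Writing the bracketed form as a single pattern that requires both delimiters avoids the problem seen with LITERAL, where either delimiter may be present without the other.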