dbay-sql-lexer v1.1.0 (published 2 years ago)

License: MIT · Repository: GitHub

𓆤DBay SQL Lexer

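The DBay SQL Lexer takes an SQL string as input and returns a list of tokens. Each token is an object of the form { type, text, idx, }, where idx is the zero-based character offset at which the token starts in the input: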

tokens = ( require 'dbay-sql-lexer' ).tokenize """select * from my_table"""

gives

[ { type: 'select',       text: 'select',   idx: 0  },
  { type: 'star',         text: '*',        idx: 7  },
  { type: 'from',         text: 'from',     idx: 9  },
  { type: 'identifier',   text: 'my_table', idx: 14 } ]
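To make the token format concrete, here is a toy tokenizer that reproduces the example output above. It is a sketch under stated assumptions, not the lexer's actual implementation: SqlToken, toyTokenize and the three-word KEYWORDS set are hypothetical names, and only whitespace, '*', keywords and plain identifiers are recognized.

```typescript
// One token, in the { type, text, idx } shape described above.
interface SqlToken {
  type: string; // lower-case token type, e.g. 'select', 'star', 'identifier'
  text: string; // the token's text exactly as it appears in the input
  idx: number;  // zero-based offset of the token's first character
}

// Hypothetical, deliberately tiny keyword set; a real SQL lexer knows many more.
const KEYWORDS = new Set(["select", "from", "where"]);

// Toy tokenizer, NOT the library's implementation: it only knows
// whitespace, '*', keywords and plain identifiers.
function toyTokenize(sql: string): SqlToken[] {
  const tokens: SqlToken[] = [];
  // With the 'g' flag, exec() resumes at re.lastIndex on each call.
  const re = /\s+|\*|[a-z_][a-z0-9_]*/giu;
  let next = 0; // offset where the next match has to start
  let m: RegExpExecArray | null;
  while ((m = re.exec(sql)) !== null) {
    // exec() silently skips unmatchable characters; treat a gap as an error.
    if (m.index !== next) throw new Error(`unexpected character at ${next}`);
    next = re.lastIndex;
    const text = m[0];
    if (/^\s+$/u.test(text)) continue; // whitespace is consumed, not emitted
    const lower = text.toLowerCase();
    let type: string;
    if (text === "*") type = "star";
    else if (KEYWORDS.has(lower)) type = lower; // keyword types are lower case
    else type = "identifier";
    tokens.push({ type, text, idx: m.index });
  }
  // A trailing unmatchable character leaves next short of the end.
  if (next !== sql.length) throw new Error(`unexpected character at ${next}`);
  return tokens;
}

console.log(toyTokenize("select * from my_table"));
// prints the same four tokens as the example output above
```

Matching whitespace and then dropping it is one simple way to account for every character of the input: the check on m.index turns a skipped character into an error instead of silently losing it.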

Acknowledgements

The DBay SQL Lexer is a fork of mistic100/sql-parser; much of the original code that fell outside the scope of a lexer has been removed.

To Do

  • documentation
  • make the lexer accept Unicode identifiers
  • the regex on line 176 is incorrect because its two optional backticks are matched independently of each other, so a name with only an opening or only a closing backtick is also accepted:

    LITERAL             = /^`?([a-z_][a-z0-9_]{0,}(:(number|float|string|date|boolean))?)`?/iu
  • implement correct identifier parsing; from Requirements For The SQLite Tokenizer, section "Identifier tokens":

    Identifiers follow the usual rules with the exception that SQLite allows the dollar-sign symbol in the interior of an identifier. The dollar-sign is for compatibility with Microsoft SQL-Server and is not part of the SQL standard.

    H41130: SQLite shall recognize as an ID token any sequence of characters that begins with an ALPHABETIC character and continue with zero or more ALPHANUMERIC characters and/or "$" (u0024) characters and which is not a keyword token.

    Identifiers can be arbitrary character strings within square brackets. This feature is also for compatibility with Microsoft SQL-Server and not a part of the SQL standard.

    H41140: SQLite shall recognize as an ID token any sequence of non-zero characters that begins with "[" (u005b) and continuing through the first "]" (u005d) character.

    The standard way of quoting SQL identifiers is to use double-quotes.

  • replace the current implementation with a rewrite based on moo (or a similar library) that makes use of the regex sticky flag (y)
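The identifier rules quoted above, the unbalanced-backtick problem and the sticky-flag idea can be combined in a short sketch. Everything in it is hypothetical and illustrative rather than the planned rewrite: scanIdentifier, BARE, BRACKETED, BACKTICKED and the KEYWORDS list are invented names, and the ALPHABETIC and ALPHANUMERIC classes of H41130 are approximated with the Unicode property classes \p{L} and \p{N} (an assumption; the SQLite requirements define the exact character sets). Every regex carries the sticky flag y, so exec() only ever matches at the position being scanned.

```typescript
// Hypothetical helper, not part of dbay-sql-lexer: scans one identifier at pos.
interface IdToken {
  type: "identifier";
  text: string; // identifier text including any brackets or backticks
  idx: number;  // zero-based offset where the identifier starts
}

// Hypothetical keyword list; per H41130 a bare keyword is not an ID token.
const KEYWORDS = new Set(["select", "from", "where", "order"]);

// Sticky ('y') regexes match only at lastIndex, never further on;
// the 'u' flag enables the \p{...} Unicode property classes.
// H41130, approximated: a letter or '_', then letters, digits, '_' or '$'.
const BARE = /[\p{L}_][\p{L}\p{N}_$]*/uy;
// H41140: '[' followed by non-NUL characters through the first ']'.
const BRACKETED = /\[[^\]\u0000]*\]/uy;
// Backticks must come as a pair, unlike the two independent `? in LITERAL.
const BACKTICKED = /`[^`\u0000]*`/uy;

function scanIdentifier(sql: string, pos: number): IdToken | null {
  for (const re of [BRACKETED, BACKTICKED, BARE]) {
    re.lastIndex = pos; // anchor the sticky match at pos
    const m = re.exec(sql);
    if (m === null) continue;
    // Quoted names may be keywords; bare keywords are rejected.
    if (re === BARE && KEYWORDS.has(m[0].toLowerCase())) return null;
    return { type: "identifier", text: m[0], idx: pos };
  }
  return null;
}

console.log(scanIdentifier("select [order] from t", 7));
// → { type: 'identifier', text: '[order]', idx: 7 }
```

For comparison, the LITERAL regex quoted above also matches the string '`my_table', which has an opening backtick but no closing one; here BACKTICKED rejects that input and BARE cannot start at a backtick, so scanIdentifier returns null. Double-quoted identifiers (the standard form mentioned at the end of the quote) and the :number-style type suffix accepted by LITERAL are deliberately left out of this sketch.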

Is Done

  • + use the Unicode flag (u) on all regexes
  • + return a list of objects instead of a list of lists
  • + use lower case for type names

Versions

  • 1.1.0 (2 years ago)
  • 1.0.0 (2 years ago)
  • 0.0.3 (2 years ago)
  • 0.0.2 (2 years ago)