@financial-times/n-search-parser v2.0.0
n-search-parser
This parser is not that smart, but that's OK. You don't need to know about parsing expression grammars (or the tools surrounding them) to use or modify it. It's written in sane JavaScript, is very fast, and consists of a tokenizer and an expression tree builder.
Supported features
- Conjunction operators (AND, OR and NOT)
- Quoted phrases ("Theresa May")
- Grouping with parentheses (("Theresa May" OR "Boris Johnson"))
Installation
$ npm i -S @financial-times/n-search-parser
Usage
First include the module in your code:
const parser = require("@financial-times/n-search-parser");
This module exports three methods:
.tokenize(query)
Accepts a string and returns an array of tokens (see "grammar" below for more details).
const tokens = parser.tokenize('"Elon Musk" AND (Space-X OR Tesla)');
/* => [
{
type: 'phrase',
text: '"Elon Musk"',
offset: 0,
length: 11
},
{
type: 'operator',
text: 'AND',
offset: 12,
length: 3
},
{
type: 'group',
text: '(Space-X OR Tesla)',
offset: 16,
length: 18,
children: [ ... ]
}
] */
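The offset and length properties in the output above appear to locate each token within the original query string. A minimal sketch of how they might be used, assuming offset is a zero-based character index (the helpers sourceOf and underline are hypothetical and not part of this module):

```javascript
// Query and tokens copied from the example above (group children omitted).
const query = '"Elon Musk" AND (Space-X OR Tesla)';
const tokens = [
  { type: 'phrase', text: '"Elon Musk"', offset: 0, length: 11 },
  { type: 'operator', text: 'AND', offset: 12, length: 3 },
  { type: 'group', text: '(Space-X OR Tesla)', offset: 16, length: 18 }
];

// Hypothetical helper: the slice of the query that a token was generated from.
function sourceOf(query, token) {
  return query.slice(token.offset, token.offset + token.length);
}

// Hypothetical helper: a line of carets marking every token of the given type,
// suitable for printing underneath the query (e.g. in an error message).
function underline(query, tokens, type) {
  const marks = new Array(query.length).fill(' ');
  for (const token of tokens) {
    if (token.type !== type) continue;
    marks.fill('^', token.offset, token.offset + token.length);
  }
  return marks.join('').trimEnd();
}

console.log(query);
console.log(underline(query, tokens, 'operator'));
```

For the tokens shown, sourceOf(query, token) returns the same string as token.text for each token, and the second line printed is twelve spaces followed by ^^^, lining up under AND.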
.build(tokens)
Accepts an array of tokens and returns an expression tree object (see "grammar" below for more details).
parser.build(tokens);
/* => {
left: {
type: 'phrase',
text: '"Elon Musk"'
},
operator: 'AND',
right: {
left: {
left: {
type: 'word',
text: 'Space-X'
},
operator: 'OR',
right: {
type: 'word',
text: 'Tesla'
}
}
}
} */
.parse(string)
Combines the tokenize and build methods. Accepts a string and returns an expression tree object.
Grammar
The tokenize method will return an array of tokens. Each token has a type property and the raw text that it was generated from. The types are:
- group is an expression within parentheses.
- phrase is a word or series of words within double quotes.
- operator is one of 'AND', 'OR' or 'NOT'.
- word is any series of characters up to, but not including, a whitespace character.
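To make the token types concrete, here is a simplified sketch of a classifier for three of them. It is not this module's implementation: the function sketchTokenize and its regular expression are assumptions made for illustration, group tokens (parentheses) are deliberately not handled, and operators are assumed to be matched case-sensitively.

```javascript
// Illustration only: NOT the tokenizer shipped with this module.
const OPERATORS = new Set(['AND', 'OR', 'NOT']);

function sketchTokenize(query) {
  // Either a double-quoted run, or any run of non-whitespace characters.
  const pattern = /"[^"]*"|\S+/g;
  const tokens = [];

  for (const match of query.matchAll(pattern)) {
    const text = match[0];
    let type = 'word';

    if (text.length > 1 && text.startsWith('"') && text.endsWith('"')) {
      type = 'phrase';
    } else if (OPERATORS.has(text)) {
      type = 'operator';
    }

    // offset/length mirror the shape of the tokens shown under .tokenize(query).
    tokens.push({ type, text, offset: match.index, length: text.length });
  }

  return tokens;
}

console.log(sketchTokenize('"Theresa May" AND Brexit').map((t) => t.type));
// => [ 'phrase', 'operator', 'word' ]
```

An unterminated quote falls through to the word rule in this sketch; how the real tokenizer treats that case is not covered here.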
The build method will return an expression tree object. The tree is constructed from the tokens and is a nested structure showing the relationship between left and right operands.
For example, the string Good morning World! will generate the following tokens:
[
{
"type": "word",
"text": "Good"
},
{
"type": "word",
"text": "morning"
},
{
"type": "word",
"text": "World!"
}
]
These tokens can be used to construct the following expression tree:
{
"left": {
"type": "word",
"text": "Good"
},
"operator": "<implicit>",
"right": {
"left": {
"type": "word",
"text": "morning"
},
"operator": "<implicit>",
"right": {
"type": "word",
"text": "World!"
}
}
}
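In the tree above, adjacent terms with no explicit operator between them are joined by an <implicit> operator, nesting to the right. A tree of this shape can be consumed with a short recursive walk. The sketch below assumes every branch has left, operator and right properties and every leaf is a token with a type; the helpers collectTerms and render are hypothetical, and treating <implicit> as AND is a choice made for this example, not something the module prescribes.

```javascript
// Expression tree copied from the example above.
const tree = {
  left: { type: 'word', text: 'Good' },
  operator: '<implicit>',
  right: {
    left: { type: 'word', text: 'morning' },
    operator: '<implicit>',
    right: { type: 'word', text: 'World!' }
  }
};

// A leaf is a token (it has a type); anything else is treated as a branch.
function isLeaf(node) {
  return typeof node.type === 'string';
}

// Hypothetical helper: list the text of every leaf, left to right.
function collectTerms(node) {
  if (!node) return [];
  if (isLeaf(node)) return [node.text];
  return [...collectTerms(node.left), ...collectTerms(node.right)];
}

// Hypothetical helper: render the tree as a fully parenthesised string,
// substituting a caller-chosen operator wherever none was written.
function render(node, implicitOperator = 'AND') {
  if (isLeaf(node)) return node.text;
  const operator = node.operator === '<implicit>' ? implicitOperator : node.operator;
  const left = render(node.left, implicitOperator);
  const right = render(node.right, implicitOperator);
  return `(${left} ${operator} ${right})`;
}

console.log(collectTerms(tree)); // => [ 'Good', 'morning', 'World!' ]
console.log(render(tree));       // => (Good AND (morning AND World!))
```

Passing 'OR' as the second argument to render would instead produce (Good OR (morning OR World!)), which is one way a consumer could decide how unqualified terms should combine.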
Performance
This module has been continuously benchmarked using real search data:
Benchmark processed 54348 items in 0.518364711 seconds
Inspired by
- Lucene query parser (NPM module, generated from PEG)
- Building a search query parser (Article by Tom Ashworth, about implementing search on Twitter)