@tamim.jabr/parser

Parser

A package that parses a string into sentences of different types: regular sentences, questions, and exclamations.

How to install it?

npm i @tamim.jabr/parser

How to import it?

import { TokenizerFactory, Document } from '@tamim.jabr/parser'

How to use it?

Create a tokenizer by passing the string to parse to the tokenizer factory, then hand the tokenizer to a Document:

import { TokenizerFactory, Document } from '@tamim.jabr/parser'

const doc = new Document()
const tokenizerFactory = new TokenizerFactory()
const tokenizer = tokenizerFactory.getTokenizer(
  'Hello! it is the string that will be parsed! did you know that? really? good for you.'
)
doc.parse(tokenizer)
// sentences is an array with objects of the type Sentence
const sentences = doc.getSentences()

for (let i = 0; i < sentences.length; i++) {
  const singleSentence = sentences[i]
  console.log(singleSentence.getWordTokens())
  console.log(singleSentence.getEndType())
  console.log(singleSentence.toString())
}
[Image: console output of the loop above]

// To get only one type of sentence, use the following methods:
const regularSentences = doc.getRegularSentences()
const questionSentences = doc.getQuestionSentences()
const exclamationSentences = doc.getExclamationSentences()

Public Interface (Methods to use):

On the document object:

1. parse(tokenizer) takes a tokenizer as its parameter. Get the tokenizer from the tokenizer factory so that it is compatible with the parser, because the parser only supports sentences that end with one of the following: ! ? .
2. getSentences() returns an array of Sentence objects.
3. getRegularSentences() returns an array with only RegularSentence objects.
4. getExclamationSentences() returns an array with only ExclamationSentence objects.
5. getQuestionSentences() returns an array with only QuestionSentence objects.
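To illustrate the rule that a sentence must end with one of ! ? . the following minimal sketch maps the last character of a string to an end type name. The function endTypeOf and the END_TYPES map are hypothetical and not part of @tamim.jabr/parser; the names DOT, EXCLAMATION_MARK and QUESTION_MARK are the ones this README lists for getEndType(), and the package's real logic may differ.

```javascript
// Hypothetical illustration only, not the package's implementation.
// Maps each supported end character to the end type name used in this README.
const END_TYPES = new Map([
  ['.', 'DOT'],
  ['!', 'EXCLAMATION_MARK'],
  ['?', 'QUESTION_MARK']
])

// Returns the end type name for the last non-whitespace character,
// or null when the text does not end with a supported character.
function endTypeOf(sentenceText) {
  const lastChar = sentenceText.trim().slice(-1)
  return END_TYPES.get(lastChar) || null
}

console.log(endTypeOf('Hello!'))
console.log(endTypeOf('hello  '))
```

A null result corresponds to the kind of input that the parser rejects, because no supported end character is present.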

On the sentence object:

1. getWordTokens() returns the word objects of the sentence, each with a tokenType and a tokenValue.
2. getEndType() returns the end type of the sentence, which is one of the following: DOT, EXCLAMATION_MARK or QUESTION_MARK.
3. toString() returns the sentence as a string with one space between words and the end type character at the end.
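As a sketch of how getEndType() and toString() can be combined, the following hypothetical helper groups sentences by their end type. It assumes that getEndType() returns the listed names as plain strings (the package may represent them differently), and the stub objects only stand in for real Sentence objects so that the snippet runs without the package installed.

```javascript
// Hypothetical helper, not part of @tamim.jabr/parser.
// Assumes getEndType() returns 'DOT', 'EXCLAMATION_MARK' or 'QUESTION_MARK'.
function groupByEndType(sentences) {
  const groups = new Map()
  for (const sentence of sentences) {
    const endType = String(sentence.getEndType())
    if (!groups.has(endType)) {
      groups.set(endType, [])
    }
    groups.get(endType).push(sentence.toString())
  }
  return groups
}

// Stand-in for a Sentence object, exposing only the two methods used above.
function makeStubSentence(text, endType) {
  return {
    getEndType: () => endType,
    toString: () => text
  }
}

const stubSentences = [
  makeStubSentence('Hello!', 'EXCLAMATION_MARK'),
  makeStubSentence('did you know that?', 'QUESTION_MARK'),
  makeStubSentence('good for you.', 'DOT')
]

console.log(groupByEndType(stubSentences))
```

With a parsed document, the same helper would be called as groupByEndType(doc.getSentences()).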

Errors:

parse(tokenizer) throws an error of the type InvalidEndtypeError when a sentence has no end type character. Example:

      const tokenizer = tokenizerFactory.getTokenizer('hello  ')
      doc.parse(tokenizer)
      // error: Invalid end type of a sentence

parse(tokenizer) throws an error of the type InvalidSentenceError when it detects an end type character with no words before it. Example:

      const tokenizer = tokenizerFactory.getTokenizer('hello. !')
      doc.parse(tokenizer)
      // error: ! is an invalid sentence
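Since parse(tokenizer) can throw, callers may want to catch the error instead of letting it propagate. The wrapper below is a minimal sketch under the assumption that the thrown errors are regular Error objects carrying a message; tryParse is a hypothetical name, and the stub document only imitates a failing parse so that the snippet runs without the package installed.

```javascript
// Hypothetical wrapper, not part of @tamim.jabr/parser.
// Assumes parse(tokenizer) throws an Error with a message on invalid input.
function tryParse(doc, tokenizer) {
  try {
    doc.parse(tokenizer)
    return { ok: true, sentences: doc.getSentences() }
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error)
    return { ok: false, reason }
  }
}

// Stub document that imitates a failing parse for demonstration purposes.
const failingDoc = {
  parse() {
    throw new Error('Invalid end type of a sentence')
  },
  getSentences() {
    return []
  }
}

const parseResult = tryParse(failingDoc, null)
console.log(parseResult.ok, parseResult.reason)
```

With the real package, the call would be tryParse(doc, tokenizer), using the doc and tokenizer objects created as in the usage section.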

