0.1.0 • Published 4 years ago

@timus/parser v0.1.0

License
MIT

@timus/parser

Introduction

The main purpose of the Timus Parser is to parse programming languages that share JavaScript's syntax but use different keywords.

The parser is necessary so that we can build a compiler for those languages.

Consider the following code snippet, written in JavaScript:

function factorial(n) {
    if (n === 0)
        return 1
    return n * factorial(n - 1)
}

Now see the same code written in Lume, a variant of JavaScript that uses Portuguese keywords.

função fatorial(n) {
    se (n === 0)
        retornar 1
    retornar n * fatorial(n - 1)
}

Note that the same JavaScript rules are valid, but with different keywords.

Also note that we changed the identifier "factorial" to "fatorial", which is how the word is written in Portuguese. We did this only to stay consistent with the language change; the choice of identifier names always remains with the developer.

Other variations could be created, using terms in other languages - Spanish, for example - or at the discretion of those who are creating the "new" language:

fn fact(n) {
    if (n === 0)
        rtn 1
    rtn n * fact(n - 1)
}

Remember that any Unicode character can be used. You can even use emojis as synonyms for some words:

fn fact(n) {
    🤔 (n === 0)
        👉 1
    👉 n * fact(n - 1)
}

The parser produces a tree data structure known as an abstract syntax tree (AST). Below is a small excerpt of the AST generated for the JavaScript example above:

{
  "type": "Program",
  "start": 0,
  "end": 93,
  "body": [
    {
      "type": "FunctionDeclaration",
      "start": 0,
      "end": 93,
      "id": {
        "type": "Identifier",
        "start": 9,
        "end": 18,
        "name": "factorial"
      },
      "expression": false,
      "generator": false,
      "async": false,
      "params": [
        {
          "type": "Identifier",
          "start": 19,
          "end": 20,
          "name": "n"
        }
      ],
      "body": {          
      }
    }
  ],
  "sourceType": "module"
}

The purpose of Timus Parser is to generate this same structure for any of the language variations. The only differences are in the start and end fields of each node, because the words vary in length. Keeping these offsets accurate matters, because the structure must faithfully represent the analyzed source code.
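To illustrate why the start and end offsets differ between variations, here is a minimal sketch. It does not call the parser; the helper identifierSpan is hypothetical and simply locates an identifier by string search, which is enough to show how shorter keywords shift the positions.

```javascript
// Hypothetical helper for illustration only, not part of @timus/parser.
// It finds the first occurrence of `name` and returns its offsets.
function identifierSpan(source, name) {
    const start = source.indexOf(name);
    return { start, end: start + name.length };
}

const jsSource = 'function factorial(n) {}';
// normalize('NFC') keeps "ç" and "ã" as single characters so the offsets are stable.
const lumeSource = 'função fatorial(n) {}'.normalize('NFC');

const jsSpan = identifierSpan(jsSource, 'factorial');    // { start: 9, end: 18 }
const lumeSpan = identifierSpan(lumeSource, 'fatorial'); // { start: 7, end: 15 }

console.log(jsSpan, lumeSpan);
```

The JavaScript offsets match the Identifier node in the AST excerpt above; the Lume offsets are smaller because "função" is shorter than "function".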

The parser

We decided to build Timus Parser as an extension of acorn.

Acorn is a widely used and tested JavaScript parser.

Our goal is to keep our parser as close to the original as possible. To achieve this, we kept all of acorn's tokens and translate words only at specific points in the code.

We will keep looking for ways to optimize it, mainly by reducing code repetition at the points where we translate JavaScript words into the current language and vice versa.

The language option

To tell the parser what the new language looks like, pass a property called language in the parser's options object.

This property is an object that contains a map between the original JavaScript words and the synonym options in the new language. See an example below.

const parserOptions = {
    language: {
        'function': 'função',
        'if': 'se'
    }
}

You can also give more than one synonym for a word, separating the options with the '|' character:

const parserOptions = {
    language: {
        'new': 'novo | nova'
    }
}

You do not need to replace every word. Original words that have no synonyms defined in the language object remain valid as they are in the new language.
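The rules above can be sketched as a small lookup table. This is only an illustration under stated assumptions, not the parser's actual implementation: the helper buildKeywordLookup and its originalWords parameter are hypothetical, and the sketch assumes that a word with synonyms is recognized only through those synonyms.

```javascript
// Hypothetical helper for illustration only, not part of @timus/parser.
// Maps each word of the new language back to the original JavaScript word.
function buildKeywordLookup(language, originalWords) {
    const lookup = new Map();
    for (const word of originalWords) {
        const synonyms = language[word];
        if (synonyms === undefined) {
            // No synonym defined: the original word is kept as is.
            lookup.set(word, word);
            continue;
        }
        // Several synonyms may be given, separated by '|'.
        for (const synonym of synonyms.split('|')) {
            const trimmed = synonym.trim();
            if (trimmed !== '') {
                lookup.set(trimmed, word);
            }
        }
    }
    return lookup;
}

const language = {
    'function': 'função',
    'if': 'se',
    'new': 'novo | nova'
};

const lookup = buildKeywordLookup(language, ['function', 'if', 'new', 'return']);

console.log(lookup.get('se'));     // 'if'
console.log(lookup.get('nova'));   // 'new'
console.log(lookup.get('return')); // 'return'
```

Building the table from the original word list, rather than from the language object alone, is what lets unmapped words such as return fall through unchanged.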
