1.1.2 • Published 3 years ago

pylex v1.1.2

PyLex

Implements a Parser class for modeling the high-level control structures of Python programs.

Parser

The high-level structure of a Python file can be represented as a parse tree. Consider the following snippet of Python code:

class Bot(object):
    def __init__(self, id):
        self.name = id

    def work(self):
        print("Beep, Boop .-.")

b = Bot(1)
while True:
    b.work()

This can be imagined as the following parse tree, where each node is enclosed in a box:

Parse tree for the above code sample. Indentation represents a parent-child relationship

Calling the parse() method of an initialized Parser returns such a tree, which models the control flow of the Python program. A Parser can be given its input at instantiation or when calling parse(). In both cases it accepts two optional arguments:

  • text?: string: The text string to parse.
  • tabFmt?: TabInfo: A tab information descriptor (see Data Types).

When an argument is omitted, the previously passed value for that field will be reused.

By preserving the state this way, parse() can be called repeatedly without arguments to get the same tree more than once, and any new text passed without a tabFmt will be assumed to use the same format.
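The argument-reuse behavior described above can be sketched as follows. This is an illustrative stand-in, not the pylex implementation: SketchParser, SketchTabInfo, and currentTabFmt() are hypothetical names, and parse() here returns the non-blank lines as a placeholder for the real tree.

```typescript
// Illustrative stand-in only: NOT the pylex implementation.
interface SketchTabInfo {
  size: number;  // width of one indent level, in spaces
  hard: boolean; // true when indentation uses tab characters
}

class SketchParser {
  private text: string | undefined;
  private tabFmt: SketchTabInfo | undefined;

  constructor(text?: string, tabFmt?: SketchTabInfo) {
    this.text = text;
    this.tabFmt = tabFmt;
  }

  // Placeholder result: the non-blank lines of the stored text.
  parse(text?: string, tabFmt?: SketchTabInfo): string[] {
    // An omitted argument falls back to the value stored earlier.
    if (text !== undefined) this.text = text;
    if (tabFmt !== undefined) this.tabFmt = tabFmt;
    return (this.text ?? "").split("\n").filter((l) => l.trim() !== "");
  }

  // Hypothetical accessor, used only to show which format is stored.
  currentTabFmt(): SketchTabInfo | undefined {
    return this.tabFmt;
  }
}

const sketchParser = new SketchParser("while True:\n    pass", { size: 4, hard: false });
const firstResult = sketchParser.parse();  // uses the text given to the constructor
const secondResult = sketchParser.parse(); // same stored text, same result
const thirdResult = sketchParser.parse("for x in y:\n    pass"); // new text, stored tabFmt reused
```

In this sketch the second call returns the same lines as the first, and the third call keeps the tab format passed to the constructor.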


Once a Parser object has been initialized, calling its context(lineNumber: number) method returns a path of nodes from the leaf node containing the specified line number to the root. Take, for example, the print() statement inside work(). The returned context would be:

Context Path: "def work()" inside of "class Bot" inside of the document root

and can be read as:

The line print("Beep, Boop .-.") is inside the function work(), inside the class Bot, inside the root of the document.

NOTE: The print statement is not an actual node in the tree, but it is nonetheless helpful to think of it as being "inside" the leaf.
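To make the idea of an enclosing path concrete, here is a minimal sketch that derives one directly from indentation levels. It is a different technique from the tree walk pylex describes, shown only because it fits in a few lines. SketchLine, sketchContext, and the "root" label are hypothetical, and the sample omits blank lines for simplicity.

```typescript
// Hypothetical line record: not a pylex type.
interface SketchLine {
  text: string;
  indentLevel: number;
}

// Walks upward from `lineNumber`, keeping each line that is indented
// less than the last one kept; those are the enclosing block headers.
function sketchContext(lines: SketchLine[], lineNumber: number): string[] {
  const path: string[] = [];
  const start = lines[lineNumber];
  if (start === undefined) return ["root"];
  let level = start.indentLevel;
  for (let i = lineNumber - 1; i >= 0 && level > 0; i--) {
    const candidate = lines[i];
    if (candidate !== undefined && candidate.indentLevel < level) {
      path.push(candidate.text);
      level = candidate.indentLevel;
    }
  }
  path.push("root"); // assumed label for the document root
  return path;
}

// The Bot example, 0-indexed, with blank lines left out.
const sketchLines: SketchLine[] = [
  { text: "class Bot(object):", indentLevel: 0 },
  { text: "def __init__(self, id):", indentLevel: 1 },
  { text: "self.name = id", indentLevel: 2 },
  { text: "def work(self):", indentLevel: 1 },
  { text: 'print("Beep, Boop .-.")', indentLevel: 2 },
  { text: "b = Bot(1)", indentLevel: 0 },
];

const printContext = sketchContext(sketchLines, 4);
// printContext: ["def work(self):", "class Bot(object):", "root"]
```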

Data Types

Node

Each parser node is of type LexNode, which extends vscode.TreeItem and has the following fields:

class LexNode {
  readonly label: string // Text label for node e.g., "function foo", "while True", "class Bot"
  readonly collapsibleState: vscode.TreeItemCollapsibleState // None (0), Collapsed (1), or Expanded (2)
  readonly token?: LineToken // Token associated with this node.
  private  _children?: LexNode[] // Child nodes. Accessed with children()
  private  _parent?: LexNode // Parent. Accessed with parent()
}

Additionally:

  • Use the hasChildren() method to check for children
  • Use the rootPath() method to return a path of nodes starting from the current node and ending at the root. Internally the context() method of Parser uses the root path of the leaf node "containing" the specified line number.
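The root-path walk can be sketched with a stripped-down node that keeps only a label, its children, and a parent pointer. SketchNode and its adopt() helper are hypothetical and do not extend vscode.TreeItem. The labels, including "root", are assumptions chosen to match the Bot example.

```typescript
// Hypothetical, simplified node: only the fields needed to show the walk.
class SketchNode {
  readonly label: string;
  private _children: SketchNode[] = [];
  private _parent: SketchNode | undefined = undefined;

  constructor(label: string) {
    this.label = label;
  }

  // Hypothetical helper: attaches `child` under this node and returns it.
  adopt(child: SketchNode): SketchNode {
    child._parent = this;
    this._children.push(child);
    return child;
  }

  hasChildren(): boolean {
    return this._children.length > 0;
  }

  // Path from this node up to the root, starting with this node.
  rootPath(): SketchNode[] {
    const path: SketchNode[] = [];
    let current: SketchNode | undefined = this;
    while (current !== undefined) {
      path.push(current);
      current = current._parent;
    }
    return path;
  }
}

const rootNode = new SketchNode("root");
const botNode = rootNode.adopt(new SketchNode("class Bot"));
const workNode = botNode.adopt(new SketchNode("function work"));
const pathLabels = workNode.rootPath().map((n) => n.label);
// pathLabels: ["function work", "class Bot", "root"]
```

Following parent pointers keeps the walk proportional to the depth of the node, no matter how large the rest of the tree is.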

LineToken

A line token represents a single line of a Python file:

class LineToken {
  readonly type: Symbol // Type of token
  readonly linenr: number // Line number of this token (0-indexed)
  readonly indentLevel: number // Indent level of this line
  readonly attr?: any // Any additional things a token might need (class name, control condition)
}

TabInfo

A descriptor class that specifies the tab format the Lexer should expect.

class TabInfo {
  public size?: number = 4; // width of one tab
  public hard?: boolean = false; // whether to use tab characters
}
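As a rough illustration of how the two fields could be used, the sketch below computes the indent level of a single line. SketchTabFmt and sketchIndentLevel are hypothetical names. The counting rules (one hard tab per level, size spaces per level, leftover spaces ignored) are assumptions and may differ from how pylex actually interprets TabInfo.

```typescript
// Hypothetical mirror of TabInfo with both fields required.
interface SketchTabFmt {
  size: number;  // width of one tab, in spaces
  hard: boolean; // whether indentation uses tab characters
}

// Counts the indent levels of one line under the given tab format.
function sketchIndentLevel(line: string, fmt: SketchTabFmt): number {
  const leading = line.match(/^[ \t]*/)?.[0] ?? "";
  if (fmt.hard) {
    // Assumption: with hard tabs, each tab character is one level.
    return leading.split("\t").length - 1;
  }
  // Assumption: with soft tabs, every `size` spaces form one level
  // and any leftover spaces are ignored.
  const spaces = leading.split(" ").length - 1;
  return Math.floor(spaces / fmt.size);
}

const softLevel = sketchIndentLevel("        print(x)", { size: 4, hard: false }); // 2
const hardLevel = sketchIndentLevel("\t\tprint(x)", { size: 4, hard: true });      // 2
```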

Symbol

Each symbol type represents either a Python construct, an indentation symbol, or EOF. Indentation symbols are used to track indentation inside of blocks.

enum Symbol {
  function,
  class,
  if,
  else,
  elif,
  for,
  while,
  try,
  except,
  finally,
  with,
  indent,
  eof
}
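A minimal sketch of how a line might be mapped to one of these symbol types by its leading keyword follows. SketchSymbol, SKETCH_KEYWORDS, and sketchClassify are hypothetical. Treating every line without a recognized keyword as "indent" is an assumption of this sketch, not documented pylex behavior.

```typescript
// String-literal stand-in for the Symbol enum above.
type SketchSymbol =
  | "function" | "class" | "if" | "else" | "elif" | "for" | "while"
  | "try" | "except" | "finally" | "with" | "indent" | "eof";

// Python keyword at the start of a line -> symbol type.
const SKETCH_KEYWORDS = new Map<string, SketchSymbol>([
  ["def", "function"],
  ["class", "class"],
  ["if", "if"],
  ["else", "else"],
  ["elif", "elif"],
  ["for", "for"],
  ["while", "while"],
  ["try", "try"],
  ["except", "except"],
  ["finally", "finally"],
  ["with", "with"],
]);

function sketchClassify(line: string): SketchSymbol {
  // Take the whole leading identifier so "format" is not read as "for".
  const keyword = line.trim().match(/^[A-Za-z_][A-Za-z0-9_]*/)?.[0] ?? "";
  // Assumption: lines without a recognized keyword count as "indent".
  return SKETCH_KEYWORDS.get(keyword) ?? "indent";
}

const defSymbol = sketchClassify("    def work(self):"); // "function"
const loopSymbol = sketchClassify("while True:");        // "while"
const plainSymbol = sketchClassify("    b.work()");      // "indent"
```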

Find example programs in the examples sub-directory.
