
ched-shred

Lightweight C syntax parser intended for reading .h header files.

Still in early development, not yet ready for any real use!

Interfaces

// a token is an individual unit of C syntax
// 'int' | '{' | '}' | ...
type TokenString = string;

// a macro is its array of replacement tokens, plus optional metadata
// describing a function-like macro's parameters
interface Macro extends Array<TokenString> {
  macroArgumentNames?: string[];
  isVariadic?: boolean;
}

// a macro set maps macro names to their definitions
interface MacroSet {
  [macroName: string]: Macro;
}
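
For illustration, a function-like macro such as #define ADD(a, b) ((a) + (b)) could be represented roughly as follows (a sketch based on the interfaces above; the variable name is hypothetical):

// the replacement list is the array itself; argument names and
// variadic-ness are attached as extra properties
const ADD = Object.assign(
  ['(', '(', 'a', ')', '+', '(', 'b', ')', ')'],
  { macroArgumentNames: ['a', 'b'], isVariadic: false }
);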

Preprocessor

Main Class

The default export of the 'ched-shred/preprocessor' module is a Readable stream class named Preprocessor. The output of this stream is a series of token strings representing the preprocessed code. A usage sketch follows the list below.

  • Output (Object Mode): TokenStrings
  • Constructor parameters:
    • initialPath : string
    • options (optional) : {...}
      • resolvePath : (string, string) => string
        • Default: (p1, p2) => path.resolve(path.dirname(p1), p2)
      • createReadStream : (string) => stream.Readable
        • Default: (p) => fs.createReadStream(p, 'utf8')
      • trigraphMode: 'replace' | 'ignore' | 'error'
        • Default: 'replace'
      • initialMacros: MacroSet
  • Properties:
    • .macros : MacroSet
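
A minimal usage sketch, assuming the module is loaded with require() and exposes Preprocessor as its default export (the header path and option value are illustrative):

const Preprocessor = require('ched-shred/preprocessor');

const pp = new Preprocessor('src/main.h', {
  trigraphMode: 'replace' // 'replace' | 'ignore' | 'error'
});

pp.on('data', (token) => {
  // each object-mode chunk is a single TokenString, e.g. 'int', '{', '}'
  process.stdout.write(token + ' ');
});
pp.on('end', () => console.log());
pp.on('error', (err) => console.error(err));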

Helper Functions

createMacroSet()

Creates and returns a new MacroSet. If a plain object is passed as a parameter, the properties of this object will be added to the new macro set.
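
For example (a sketch, assuming createMacroSet is exported from 'ched-shred/preprocessor' alongside Preprocessor; the macro values follow the Macro interface above):

const Preprocessor = require('ched-shred/preprocessor');
const { createMacroSet } = require('ched-shred/preprocessor'); // assumed export location

// seed the preprocessor with two object-like macros
const macros = createMacroSet({
  DEBUG: ['1'],
  VERSION: ['"0.3.0"']
});

const pp = new Preprocessor('src/main.h', { initialMacros: macros });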

Low-Level Transforms

The 'ched-shred/preprocessor' module also contains a number of Transform stream classes that replicate each phase of the C preprocessor.
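
The chunked text transforms can be chained in the usual preprocessing order, for example (a sketch; it assumes these classes are exported by name from 'ched-shred/preprocessor', and the file path is illustrative):

const fs = require('fs');
const {
  TrigraphReplaceTransform,
  ContinuedLineTransform,
  CommentToWhitespaceTransform
} = require('ched-shred/preprocessor');

fs.createReadStream('example.h', 'utf8')
  .pipe(new TrigraphReplaceTransform())     // trigraphs -> plain symbols
  .pipe(new ContinuedLineTransform())       // join \-continued lines
  .pipe(new CommentToWhitespaceTransform()) // comments -> single spaces
  .pipe(process.stdout);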

TrigraphReplaceTransform

Replace the nine trigraphs with the symbols they represent: ??= becomes #, ??/ becomes \, ??' becomes ^, ??( becomes [, ??) becomes ], ??! becomes |, ??< becomes {, ??> becomes }, and ??- becomes ~.

  • Input (Chunked): Text
  • Output (Chunked): Text

TrigraphErrorTransform

Pass through input to output unchanged, but throw an error if a trigraph is detected in the input.

  • Input (Chunked): Text
  • Output (Chunked): Text

ContinuedLineTransform

Join a line with the one that follows when it ends in a \ backslash, optionally followed by whitespace.

  • Input (Chunked): Text
  • Output (Chunked): Text

CommentToWhitespaceTransform

Replace comments (of the style /*...*/ and //...) with a single space character, ignoring those inside string literals.

  • Input (Chunked): Text
  • Output (Chunked): Text

Note that /*...*/-style comments do not nest.

DirectiveSplitTransform

Split the text content into preprocessor directives and sections of code. Each code section contains only complete lines of code, so it can be tokenized independently. Directives and code sections are both guaranteed to end with a whitespace character (an extra space is appended to the final line if there is no whitespace before the end), so the text always yields a set of complete tokens when processed by TokenizeTransform. An example of the output shape follows the notes below.

  • Input (Chunked): Text
  • Output (Object Mode):
    • ["", "...code section\n"]
    • ["#directive", "...directive parameters\n"]

Note that any whitespace between # and the name of a directive is removed in the output.

Comments and line continuations must be handled before this transform is applied, or the results are likely to be mangled.

Depending on the input stream there may sometimes be a run of two or more code sections without a directive.
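
To illustrate the output shape (a sketch, assuming DirectiveSplitTransform is exported by name from 'ched-shred/preprocessor'; the expected values are illustrative, not captured output):

const { DirectiveSplitTransform } = require('ched-shred/preprocessor');

const split = new DirectiveSplitTransform();
split.on('data', ([directive, text]) => console.log([directive, text]));

split.end('#  include <stdio.h>\nint x = 1;\nint y = 2;\n');
// expected shape (illustrative):
//   ['#include', ' <stdio.h>\n']       // whitespace between '#' and the name removed
//   ['', 'int x = 1;\nint y = 2;\n']   // a code section of complete lines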

TokenizeTransform

Split the incoming text stream into a series of atomic token strings.

  • Input (Chunked): Text
  • Output (Object Mode): TokenStrings
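
A small sketch of the tokenizer in isolation (assumes TokenizeTransform is exported by name from 'ched-shred/preprocessor'; the expected tokens are illustrative):

const { TokenizeTransform } = require('ched-shred/preprocessor');

const tok = new TokenizeTransform();
tok.on('data', (t) => console.log(t));

// trailing whitespace so the final token is complete
tok.end('int x = 1; ');
// expected tokens (illustrative): 'int', 'x', '=', '1', ';'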

MacroExpansionTransform

Pass through a stream of token strings, expanding macros as they are encountered. An error is thrown if .end() is called while a function-like macro invocation is still unfinished.

  • Input (Object Mode): TokenStrings
  • Output (Object Mode): TokenStrings
  • Constructor parameters:
    • macros : MacroSet
  • Properties:
    • .macros : MacroSet
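
A sketch of expanding a simple object-like macro (assumes MacroExpansionTransform and createMacroSet are exported by name from 'ched-shred/preprocessor', and that the macro set is passed as the first constructor argument):

const { MacroExpansionTransform, createMacroSet } = require('ched-shred/preprocessor');

const expand = new MacroExpansionTransform(createMacroSet({ PI: ['3.14159'] }));
expand.on('data', (t) => process.stdout.write(t + ' '));

// object-mode input: one token string per write
['double', 'x', '=', 'PI', ';'].forEach((t) => expand.write(t));
expand.end();
// expected output (illustrative): double x = 3.14159 ;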

Syntax Parser

The 'ched-shred/c-syntax' module contains one main Transform class:

TokenParseTransform

This transform stream takes in a series of preprocessed tokens and emits one object for each complete top-level declaration it has read (see the sketch after the list below).

  • Input (Object Mode): TokenStrings
  • Output (Object Mode): Declaration objects, one of the following:
    • InitDeclaration
    • FunctionDefinition
    • LegacyFunctionDefinition
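
Putting the pieces together, the preprocessor output can be piped straight into the parser (a sketch; it assumes TokenParseTransform is exported by name from 'ched-shred/c-syntax', and the header path is illustrative):

const Preprocessor = require('ched-shred/preprocessor');
const { TokenParseTransform } = require('ched-shred/c-syntax');

new Preprocessor('example.h')
  .pipe(new TokenParseTransform())
  .on('data', (decl) => {
    // one object per complete top-level declaration; its exact shape
    // depends on the declaration kind listed above
    console.log(decl);
  });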