gedcom555-token v555.0.1 • Published 10 months ago


Gedcom 5.5.5 Token

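A tokenizer for Gedcom 5.5.5. It splits each line into its level, optional cross-reference ID (xrefId), tag and optional line item.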

Install

npm i gedcom555-token

Usage

import {tokenizeFromString} from "gedcom555-token";

const tokenized = tokenizeFromString(`0 HEAD
1 GEDC
2 VERS 5.5.5
2 FORM LINEAGE-LINKED
3 VERS 5.5.5
1 CHAR UTF-8
1 SOUR gedcom.org
0 @U@ SUBM
1 NAME gedcom.org
0 TRLR`);
/*
    [
      {
        level: 0,
        tag: `HEAD`,
      },
      {
        level: 1,
        tag: `GEDC`,
      },
      {
        level: 2,
        tag: `VERS`,
        lineItem: `5.5.5`,
      },
      {
        level: 2,
        tag: `FORM`,
        lineItem: `LINEAGE-LINKED`,
      },
      {
        level: 3,
        tag: `VERS`,
        lineItem: `5.5.5`,
      },
      {
        level: 1,
        tag: `CHAR`,
        lineItem: `UTF-8`,
      },
      {
        level: 1,
        tag: `SOUR`,
        lineItem: `gedcom.org`,
      },
      {
        level: 0,
        tag: `SUBM`,
        xrefId: `@U@`,
      },
      {
        level: 1,
        tag: `NAME`,
        lineItem: `gedcom.org`,
      },
      {
        level: 0,
        tag: `TRLR`,
      },
    ]
  */
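The tokens come back as a flat list, and the record hierarchy is carried only by each token's `level`. Building the tree is left to a parser (see note 5). The sketch below shows one way a parser might nest these tokens. `nestTokens` and `GedcomNode` are hypothetical names, not part of gedcom555-token, and the sketch assumes each level is at most one deeper than the previous line's level.

```typescript
// Token shape as printed in the example output above.
interface Token {
  level: number;
  xrefId?: string;
  tag: string;
  lineItem?: string;
}

// Hypothetical tree node built by a parser; not part of gedcom555-token.
interface GedcomNode extends Token {
  children: GedcomNode[];
}

// Nest a flat token list by level: each token becomes a child of the
// nearest preceding token whose level is exactly one lower.
function nestTokens(tokens: Token[]): GedcomNode[] {
  const roots: GedcomNode[] = [];
  const stack: GedcomNode[] = []; // stack[i] holds the most recent node at level i
  for (const token of tokens) {
    const node: GedcomNode = { ...token, children: [] };
    if (token.level === 0) {
      roots.push(node);
    } else {
      const parent = stack[token.level - 1];
      if (parent === undefined) {
        // Level jumped by more than one (or the first line is not level 0).
        throw new Error(`Level ${token.level} ${token.tag} has no parent`);
      }
      parent.children.push(node);
    }
    stack.length = token.level; // forget deeper nodes from earlier lines
    stack.push(node);
  }
  return roots;
}
```

Indexing the stack by level keeps the lookup for a parent constant-time, and truncating it on every line ensures a new level 1 line never attaches to a level 2 node left over from an earlier branch.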

Line by line

The tokenizer can also be called on a single line.

import {tokenize} from "gedcom555-token/dist/token";

const tokenized = tokenize(`0 head`);
/*
{
  level: 0,
  tag: `HEAD`
}
*/
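To show what a single-line call does, here is a minimal sketch of a line tokenizer. It assumes a line consists of a level, an optional cross-reference ID such as `@U@`, a tag and an optional line item, separated by single spaces. `tokenizeLine` is a hypothetical name; it is not the package's `tokenize`, and it performs none of the tag-list, terminator or "@" checks listed in the notes.

```typescript
// Token shape matching the fields in the examples above.
interface Token {
  level: number;
  xrefId?: string;
  tag: string;
  lineItem?: string;
}

// Hypothetical single-line tokenizer, not the package's `tokenize`.
// Assumed line shape: level, optional @xref@, tag, optional line item,
// separated by single spaces.
const LINE_PATTERN = /^(\d+) (?:(@[^@ ]+@) )?([A-Za-z0-9_]+)(?: (.*))?$/;

function tokenizeLine(line: string): Token {
  const match = LINE_PATTERN.exec(line);
  if (match === null) {
    throw new Error(`Malformed Gedcom line: ${line}`);
  }
  const [, level, xrefId, tag, lineItem] = match;
  // Tags are upper-cased, mirroring what note 6 says the real tokenizer does.
  const token: Token = { level: Number(level), tag: tag.toUpperCase() };
  // Unmatched optional groups come back as undefined, so omit those fields.
  if (xrefId !== undefined) token.xrefId = xrefId;
  if (lineItem !== undefined) token.lineItem = lineItem;
  return token;
}
```

Called on `"0 head"`, this sketch returns `level: 0` and `tag: "HEAD"`, the same fields as the output shown above.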

Notes

  1. Does not check the encoding; the input string is assumed to be Unicode.
  2. Checks that line terminators are used consistently.
  3. Checks tags against a list of known tags. To do (low priority): allow the tag list to be extended.
  4. Checks line items for single "@" signs.
  5. Does not check any other grammar rules; these are left for the parser to implement.
  6. Because Gedcom 5.5.5 tags are case-insensitive, tokenize converts them to upper case.
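Notes 2 and 4 can be illustrated with a sketch of the two checks. Both functions, `checkTerminators` and `hasLoneAtSign`, are hypothetical and not exported by the package. The sketch assumes CRLF, LF and CR are the possible line terminators, that a doubled "@@" stands for a literal at sign, and that a line item consisting entirely of an ID such as `@U@` is a pointer; any other "@" rules in the specification are not handled here.

```typescript
// Hypothetical check for note 2 (not the package's code).
// Returns the terminator used throughout `text`, null for single-line text,
// or throws if CRLF, LF and CR are mixed.
function checkTerminators(text: string): string | null {
  // CRLF is listed first so "\r\n" is not counted as a CR plus an LF.
  const terminators = text.match(/\r\n|\r|\n/g);
  if (terminators === null) return null;
  const first = terminators[0];
  for (const terminator of terminators) {
    if (terminator !== first) {
      throw new Error("Inconsistent line terminators");
    }
  }
  return first;
}

// Hypothetical check for note 4 (not the package's code).
// Returns true if `lineItem` contains an "@" that is neither part of a
// doubled "@@" nor a whole-value pointer such as "@U@".
function hasLoneAtSign(lineItem: string): boolean {
  if (/^@[^@ ]+@$/.test(lineItem)) return false; // assumed pointer value
  // Drop doubled pairs, then look for any "@" that remains.
  return lineItem.replace(/@@/g, "").indexOf("@") !== -1;
}
```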

License

MIT

Contact / Issues

Use the project's GitHub page.
