555.0.2 • Published 1 year ago

gedcom555-token v555.0.2


Gedcom 5.5.5 Token

A tokenizer for 'Gedcom 5.5.5'.

Install

npm i gedcom555-token

Usage

import {tokenizeFromString} from "gedcom555-token";

const tokenized = tokenizeFromString(`0 HEAD
1 GEDC
2 VERS 5.5.5
2 FORM LINEAGE-LINKED
3 VERS 5.5.5
1 CHAR UTF-8
1 SOUR gedcom.org
0 @U@ SUBM
1 NAME gedcom.org
0 TRLR`);
/*
    [
      {
        level: 0,
        tag: `HEAD`,
      },
      {
        level: 1,
        tag: `GEDC`,
      },
      {
        level: 2,
        tag: `VERS`,
        lineItem: `5.5.5`,
      },
      {
        level: 2,
        tag: `FORM`,
        lineItem: `LINEAGE-LINKED`,
      },
      {
        level: 3,
        tag: `VERS`,
        lineItem: `5.5.5`,
      },
      {
        level: 1,
        tag: `CHAR`,
        lineItem: `UTF-8`,
      },
      {
        level: 1,
        tag: `SOUR`,
        lineItem: `gedcom.org`,
      },
      {
        level: 0,
        tag: `SUBM`,
        xrefId: `@U@`,
      },
      {
        level: 1,
        tag: `NAME`,
        lineItem: `gedcom.org`,
      },
      {
        level: 0,
        tag: `TRLR`,
      },
    ]
  */

Line by line:

When required, the tokenizer can be called for a single line.

import {tokenize} from "gedcom555-token/dist/token";

const tokenized = tokenize(`0 head`);
/*
{
  level: 0,
  tag: `HEAD`
}
*/
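
For orientation, the token shape shown above can be reproduced with a small standalone sketch. This is not the package's implementation: `SketchToken`, `LINE_PATTERN` and `sketchTokenize` are hypothetical names, the regular expression is an assumption inferred from the examples in this README, and the tag list and at sign checks are left out.

```typescript
// Hypothetical sketch, not the package's source: splits one gedcom line
// (terminator already removed) into the fields shown in the examples above.
interface SketchToken {
  level: number;
  tag: string;
  xrefId?: string;
  lineItem?: string;
}

// level, optional @xref@, tag, then an optional line item after one space.
const LINE_PATTERN = /^(\d+) (?:(@[^@]+@) )?([A-Za-z0-9_]+)(?: (.+))?$/;

function sketchTokenize(line: string): SketchToken {
  const match = LINE_PATTERN.exec(line);
  if (match === null) {
    throw new Error(`Not a valid gedcom line: "${line}"`);
  }
  const [, level = "", xrefId, tag = "", lineItem] = match;
  // Tags are normalized to upper case, as in the example above.
  const token: SketchToken = { level: Number(level), tag: tag.toUpperCase() };
  if (xrefId !== undefined) token.xrefId = xrefId;
  if (lineItem !== undefined) token.lineItem = lineItem;
  return token;
}

console.log(sketchTokenize("0 head"));
console.log(sketchTokenize("0 @U@ SUBM"));
console.log(sketchTokenize("2 VERS 5.5.5"));
```

Optional fields are only set when present, which matches the example output where `xrefId` and `lineItem` are absent rather than empty.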

Notes

  1. Does not check encoding; the input string is assumed to be Unicode.
  2. Checks for line terminator consistency.
  3. Checks tags against the known tag list. Todo (low priority): allow the tag list to be extended.
  4. Checks line items for single "@" at signs.
  5. Does not check other grammar rules. These are left for the parser to implement.
  6. Gedcom 5.5.5 tags are treated as case insensitive, so tokenize converts them to upper case.
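
The line terminator consistency check in note 2 could look roughly like the sketch below. It is a guess at the idea, not the package's code: `findTerminators` and `hasConsistentTerminators` are hypothetical names, and it assumes the candidate terminators are CR, LF and CRLF. How the package reports an inconsistency is not covered here.

```typescript
// Hypothetical sketch: collects the distinct line terminators used in a text.
// CRLF is listed first so that it is matched as one terminator, not as CR + LF.
function findTerminators(text: string): Set<string> {
  return new Set<string>(text.match(/\r\n|\r|\n/g) ?? []);
}

// A text is consistent when it uses at most one kind of terminator.
function hasConsistentTerminators(text: string): boolean {
  return findTerminators(text).size <= 1;
}

console.log(hasConsistentTerminators("0 HEAD\n1 GEDC\n0 TRLR"));
console.log(hasConsistentTerminators("0 HEAD\r\n1 GEDC\n0 TRLR"));
```

The first call uses LF throughout, while the second mixes CRLF and LF.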

License

MIT

Issues / FAQ

  • Empty CONT. As per the gedcom line definition, a CONT tag can appear without a line value. If so, the line terminator MUST come directly after the tag. A trailing delimiter space after the tag and before the terminator will cause an error.
"2 CONT"   : is legal   : +1 CONT[terminator]
"2 CONT "  : is illegal : +1 CONT[delim space][terminator]
"2 CONT  " : is legal   : +1 CONT[delim space][line value space][terminator]

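The three CONT cases above can be expressed as a single pattern. The sketch below is illustrative only: `isLegalContLine` is a hypothetical helper, not part of the package, and it assumes the line terminator has already been removed from the input.

```typescript
// Hypothetical sketch of the empty CONT rule: after the tag there is either
// nothing, or one delimiter space followed by at least one character of
// line value (which may itself be a space).
function isLegalContLine(line: string): boolean {
  return /^\d+ CONT(?: .+)?$/i.test(line);
}

for (const line of ["2 CONT", "2 CONT ", "2 CONT  ", "2 CONT more text"]) {
  console.log(JSON.stringify(line), isLegalContLine(line) ? "is legal" : "is illegal");
}
```

Only the second input, where a lone delimiter space follows the tag, is rejected.
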
Contact / Issues

GitLab page
