@kbss4/text2json-parser v0.0.8

License: MIT
Last release: 1 year ago

text2json-parser

A customizable text-to-JSON parser utility that takes a deliberately simple approach.

This is still an experiment that I am using in a private project. The idea is to parse different text files into JSON without coding an ad hoc solution for each file. To do so, you write a JSON descriptor that lists the operations to apply to the raw data.

The function expects two parameters (described below) and returns a JSON object with each file identifier as key and the parsed JSON entries as values.

Files

A JSON object that holds the files to process. The keys are the identifiers/names/paths of the files and the values are the file contents as a string or buffer.

<pre>
// Keys that contain a dot must be quoted to be valid JavaScript.
let filesJson = {
    'logs.txt': '<FILE-CONTENT-BUFFER>',
    'operations.log': '<FILE-CONTENT-BUFFER>'
};
</pre>
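
One way to build this object in Node.js is to read each file from disk and key it by its path. The sketch below is only an illustration: `buildFilesJson` is a hypothetical helper, not part of the package, and it uses nothing beyond the standard `fs` module.

```javascript
const fs = require('node:fs');

// Hypothetical helper (not part of the package):
// maps each given path to that file's content as a UTF-8 string.
function buildFilesJson(paths) {
  const filesJson = {};
  for (const filePath of paths) {
    filesJson[filePath] = fs.readFileSync(filePath, 'utf8');
  }
  return filesJson;
}
```

Calling `buildFilesJson(['logs.txt', 'operations.log'])` would produce an object shaped like the one above, with real file contents in place of the placeholders.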

JSON files descriptor

This descriptor lists all the operations to apply to the raw files in order to parse their contents. Each field is described below.

  • auxiliarFields: Fields that may be used during the parsing process.
  • commonFields: Fields that are added to every parsed entry.
  • files: The files to parse, with the name as key and the entries list as value.

File descriptor

Fields

  • name: The name of the field.
  • source: The source of the field (each source type requires different parameters).

| Source | Description | Required fields |
|---|---|---|
| FILE | From a file content | filename |
| FILENAME | From a file name | filename |
| LITERAL | From a literal value set by the user | value |
| ROW | From a split file row | |
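
To make the four source types concrete, the sketch below shows how a raw value could be picked for a field. It illustrates the table above and is not the package's implementation. `resolveSource` and `findKey` are hypothetical names, and matching a descriptor `filename` against the end of a files key is an assumption.

```javascript
// Assumption: a descriptor filename matches the files key that ends with it,
// e.g. 'rosout.log' matches the key 'run42/rosout.log'.
function findKey(filesJson, filename) {
  const key = Object.keys(filesJson).find((k) => k.endsWith(filename));
  if (key === undefined) throw new Error(`No file matches ${filename}`);
  return key;
}

// Hypothetical illustration of the four source types.
// field: a field descriptor, filesJson: the files object, row: the current row.
function resolveSource(field, filesJson, row) {
  switch (field.source) {
    case 'FILE': // the content of the file
      return String(filesJson[findKey(filesJson, field.filename)]);
    case 'FILENAME': // the identifier/name/path of the file
      return findKey(filesJson, field.filename);
    case 'LITERAL': // a value set by the user in the descriptor
      return field.value;
    case 'ROW': // the current row of the split file
      return row;
    default:
      throw new Error(`Unknown source: ${field.source}`);
  }
}
```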

Procedures

To parse a field we can add three types of procedures:

| Procedure | Description | Value fields |
|---|---|---|
| extract | Initial extraction operation | operation description |
| process | Additional operations to perform on the previously extracted field value | operation descriptions list |
| conditional | Conditions needed to add the field to the output entries | conditional descriptor |

Operations

| Operation | Description | Additional fields | Expected values |
|---|---|---|---|
| EXTRACT | Extract value using a regular expression | regex | (string) |
| SUBSTRING | Substring operation on a string parameter | from, to | string (value) or number (index) |
| PARSE | Parse value to another field type | from, to | 'STRING', 'NUMBER', 'DATE', 'LIST' |
| REPLACE | Replace occurrences of one value with another | from, to | |
| SPLIT | Split value into a list by a separator | by | |
| JOIN | Join values into a single string with a joiner | values, by | values: [], by: (joiner) |
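
The operations are easier to follow when chained on a sample row. The sketch below re-implements a few of them in plain JavaScript to show the intended effect. It is not the package's code: `applyOperation` is a hypothetical name, the sample row is made up, and SUBSTRING excluding its string boundaries is an assumption. JOIN is left out because its inputs are not described in enough detail.

```javascript
// Illustrative re-implementation of some operations; not the package's code.
function applyOperation(value, op) {
  switch (op.operation) {
    case 'EXTRACT': {
      const match = String(value).match(new RegExp(op.regex));
      return match ? match[0] : undefined;
    }
    case 'SUBSTRING': {
      const text = String(value);
      let start = 0;
      if (typeof op.from === 'number') start = op.from;
      if (typeof op.from === 'string') {
        const i = text.indexOf(op.from);
        if (i === -1) return undefined;
        start = i + op.from.length; // assumption: boundary text is excluded
      }
      let end = text.length;
      if (typeof op.to === 'number') end = op.to;
      if (typeof op.to === 'string') {
        const j = text.indexOf(op.to, start);
        if (j === -1) return undefined;
        end = j;
      }
      return text.slice(start, end);
    }
    case 'REPLACE': // replaces every occurrence of op.from
      return String(value).split(op.from).join(op.to);
    case 'SPLIT':
      return String(value).split(op.by);
    case 'PARSE':
      if (op.to === 'NUMBER') return Number(value);
      if (op.to === 'STRING') return String(value);
      if (op.to === 'DATE') return new Date(op.from === 'NUMBER' ? Number(value) : value);
      throw new Error(`Unsupported target type: ${op.to}`);
    default:
      throw new Error(`Unsupported operation: ${op.operation}`);
  }
}

// Made-up sample row: epoch seconds with nanoseconds, then the message.
const row = '1612345678.123456789 INFO /node message';
const steps = [
  { operation: 'SUBSTRING', from: 0, to: ' ' },       // '1612345678.123456789'
  { operation: 'REPLACE', from: '.', to: '' },        // '1612345678123456789'
  { operation: 'SUBSTRING', from: 0, to: 13 },        // '1612345678123' (milliseconds)
  { operation: 'PARSE', from: 'NUMBER', to: 'DATE' }, // Date object
];
const time = steps.reduce(applyOperation, row);
```

Under these assumptions `time.getTime()` is `1612345678123`, the timestamp truncated to millisecond precision.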

Conditionals

The example in the Field descriptor section writes each condition as an object with key, operator and value fields. Cells are filled in only for the operators that the example uses.

| Conditional | Additional fields | Expected values |
|---|---|---|
| EQUALS | key, value | value: string |
| NOT_EQUALS | | |
| LIKE | | |
| NOT_LIKE | | |
| IN | key, value | value: list of strings |
| NOT_IN | | |
| STARTSWITH | key, value | value: string |
| ENDSWITH | key, value | value: string |
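
As a rough guide to how such conditions could be evaluated, the sketch below checks a list of conditions against a candidate value. It is an illustration rather than the package's logic. `checkCondition` and `passes` are hypothetical names. It assumes that `$this` refers to the value being extracted and that `$name` refers to an already parsed field called `name`. LIKE and NOT_LIKE are omitted because their matching rules are not described.

```javascript
// Hypothetical evaluator for a single operator.
function checkCondition(actual, operator, expected) {
  switch (operator) {
    case 'EQUALS': return actual === expected;
    case 'NOT_EQUALS': return actual !== expected;
    case 'IN': return expected.includes(actual);
    case 'NOT_IN': return !expected.includes(actual);
    case 'STARTSWITH': return String(actual).startsWith(expected);
    case 'ENDSWITH': return String(actual).endsWith(expected);
    default: throw new Error(`Unsupported operator: ${operator}`);
  }
}

// Assumption: '$this' is the candidate value, '$node' is entry.node, and so on.
// A field is kept only when every condition holds.
function passes(conditions, candidate, entry) {
  return conditions.every(({ key, operator, value }) => {
    const actual = key === '$this' ? candidate : entry[key.slice(1)];
    return checkCondition(actual, operator, value);
  });
}
```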

Field descriptor

Fields are used to describe the parameters we will extract from our sources.

<pre>
const filesDescriptorJson = {
  auxiliarFields: [
    {
      name: 'year', source: 'FILE', filename: 'journalctl.log',
      extract: { operation: 'EXTRACT', regex: /\d{4}/ }
    }
  ],
  commonFields: [
    {
      name: 'session', type: 'STRING', source: 'FILENAME', filename: 'rosout.log',
      extract: { operation: 'SUBSTRING', from: undefined, to: '/' }
    },
    {
      name: 'serial', type: 'STRING', source: 'FILE', filename: 'journalctl.log',
      extract: { operation: 'EXTRACT', regex: /\w{2}-\w{4}-\w{3}-\w{6}/ }
    }
  ],
  files: {
    rosout: {
      filename: 'rosout.log', lineDelimiter: /(\d{10}\.\d{9})/,
      fields: [
        { name: 'file', source: 'LITERAL', value: 'rosout.log' },
        { name: 'raw', type: 'STRING', source: 'ROW' },
        {
          name: 'time', type: 'DATE', source: 'ROW',
          extract: { operation: 'SUBSTRING', from: 0, to: ' ' },
          process: [
            { operation: 'REPLACE', from: '.', to: '' },
            { operation: 'SUBSTRING', from: 0, to: 13 },
            { operation: 'PARSE', from: 'NUMBER', to: 'DATE' },
          ]
        },
        {
          name: 'severity', type: 'STRING', source: 'ROW',
          conditions: [{ key: '$this', operator: 'IN', value: ['INFO', 'WARN', 'ERROR', 'FATAL'] }],
          extract: { operation: 'SUBSTRING', from: 0, to: ' ' }
        },
        {
          name: 'node', type: 'STRING', source: 'ROW',
          conditions: [{ key: '$this', operator: 'STARTSWITH', value: '/' }],
          extract: { operation: 'SUBSTRING', from: 0, to: ' ' }
        },
        {
          name: 'source', type: 'STRING', source: 'ROW',
          conditions: [
            { key: '$this', operator: 'STARTSWITH', value: '[' },
            { key: '$this', operator: 'ENDSWITH', value: ')' }
          ],
          extract: { operation: 'SUBSTRING', from: 0, to: ' ' }
        },
        {
          name: 'topics', type: 'LIST', source: 'ROW',
          extract: { operation: 'SUBSTRING', from: '[topics:', to: ']' },
          conditions: [{ key: '$this', operator: 'STARTSWITH', value: '/' }],
          process: [{ operation: 'SPLIT', by: ', ' }]
        },
        { name: 'message', type: 'STRING', source: 'ROW' },
        {
          name: 'mongoTime', type: 'DATE', source: 'ROW',
          conditions: [{ key: '$node', operator: 'EQUALS', value: '/mongo_server' }],
          extract: { operation: 'SUBSTRING', from: 0, to: ' ' },
          process: [{ operation: 'PARSE', from: 'STRING', to: 'DATE' }]
        },
        {
          name: 'mongoProcess', type: 'DATE', source: 'ROW',
          conditions: [{ key: '$node', operator: 'EQUALS', value: '/mongo_server' }],
          extract: { operation: 'SUBSTRING', from: '[', to: ']' }
        },
        {
          name: 'mongoMessage', type: 'DATE', source: 'ROW',
          conditions: [{ key: '$node', operator: 'EQUALS', value: '/mongo_server' }]
        },
      ]
    },
    // ...
  }
};
    </pre>
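
The `lineDelimiter` in this descriptor is a regular expression with a capture group, which suggests that rows start at each timestamp rather than at each newline. The sketch below shows one way such a split could work in plain JavaScript. `splitRows` is a hypothetical helper, the sample content is made up, and the package may split rows differently.

```javascript
// Hypothetical row splitter: keeps each delimiter match at the start of its row.
// Assumption: lineDelimiter contains exactly one capture group.
function splitRows(content, lineDelimiter) {
  // With a capture group, String.prototype.split also returns the matched
  // delimiters: [before, match1, text1, match2, text2, ...].
  const parts = content.split(lineDelimiter);
  const rows = [];
  for (let i = 1; i < parts.length; i += 2) {
    rows.push((parts[i] + parts[i + 1]).trim());
  }
  return rows;
}

// Made-up sample content: the second entry spans two physical lines.
const sample =
  '1612345678.123456789 INFO /a hello\n' +
  '1612345679.000000001 WARN /b multi\nline\n';
const rows = splitRows(sample, /(\d{10}\.\d{9})/);
```

Here `rows` holds two entries, and the second one keeps its embedded newline, which a plain split on `\n` would have broken apart.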