@kbss4/text2json-parser v0.0.8
text2json-parser
A customizable text-to-JSON parser utility with a very simple approach.
This is still an experiment I am using for a private project. The idea is to parse different text files to JSON without coding an ad-hoc solution for each file. To do so, we write a JSON descriptor that describes the operations we want to apply to the raw data.
The function expects two parameters (described below); the output is a JSON object with each file identifier as key and that file's parsed entries as value.
Files
A JSON object that holds the files to process. The keys are the identifiers/names/paths of the files and the values are the file contents as a string or buffer.
<pre>
let filesJson = {
  'logs.txt': '<FILE-CONTENT-BUFFER>',
  'operations.log': '<FILE-CONTENT-BUFFER>'
}
</pre>
JSON files descriptor
This file describes all the operations we want to perform on the raw files to parse their contents. Here is a description of each field.
- auxiliarFields: Fields that we may use during the parsing process.
- commonFields: Fields that will be added to every entry we parse.
- files: The files we will parse, with the file name as key and its file descriptor as value.
File descriptor
Fields
- name: The name of the field
- source: The source of the field (depending on the source type, different parameters are required).
Source | Description | Required fields |
---|---|---|
FILE | From a file content | filename |
FILENAME | From a file name | filename |
LITERAL | From a literal value set by the user | value |
ROW | From a split file row | |
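The table above can be sketched as a small resolver. This is a hypothetical illustration of how a field's raw input could be picked from its `source` type; `resolveSource` and its signature are assumptions, not the package's actual API.

```javascript
// Hypothetical sketch: resolve a field's raw input from its `source` type.
function resolveSource(field, filename, fileContent, row) {
  switch (field.source) {
    case 'FILE':     return fileContent;  // the whole file content
    case 'FILENAME': return filename;     // the file's name/path
    case 'LITERAL':  return field.value;  // fixed value set in the descriptor
    case 'ROW':      return row;          // the current split file row
    default: throw new Error(`Unknown source: ${field.source}`);
  }
}
```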
Procedures
To parse a field we can apply three types of procedures:
Procedure | Description | Value fields |
---|---|---|
extract | Initial extraction operation | operation description |
process | Additional operations to perform on the previously extracted field value | operation descriptions list |
conditional | Conditions needed to add the field to the output entries | conditional descriptor |
Operations
Operation | Description | Additional fields | Expected values |
---|---|---|---|
EXTRACT | Extract a value using a regular expression | regex (string) | |
SUBSTRING | Substring operation on a string value | from, to (string: delimiter / number: index) | |
PARSE | Parse the value to another field type | from, to | 'STRING', 'NUMBER', 'DATE', 'LIST' |
REPLACE | Replace occurrences of a substring | from, to | |
SPLIT | Split a string value into a list | by | |
JOIN | Join a list of values into a string | values, by | values: [], by: (joiner) |
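The operations above can be sketched as one dispatch function. This is a hedged, self-contained illustration of how each operation could behave; `applyOperation` and the exact edge-case handling are assumptions drawn from the table and the example below, not the package's code.

```javascript
// Hypothetical sketch of the operations table; not the package's actual implementation.
function applyOperation(value, op) {
  switch (op.operation) {
    case 'EXTRACT': {                  // first regex match, or null if none
      const m = String(value).match(op.regex);
      return m ? m[0] : null;
    }
    case 'SUBSTRING': {                // from/to: number index or string delimiter
      const s = String(value);
      const from = typeof op.from === 'string'
        ? s.indexOf(op.from) + op.from.length
        : (op.from ?? 0);
      const rawTo = typeof op.to === 'string' ? s.indexOf(op.to, from) : (op.to ?? s.length);
      return s.substring(from, rawTo === -1 ? s.length : rawTo);
    }
    case 'REPLACE':                    // replace every occurrence of `from` with `to`
      return String(value).split(op.from).join(op.to);
    case 'SPLIT':                      // string -> list
      return String(value).split(op.by);
    case 'JOIN':                       // list -> string
      return op.values.join(op.by);
    case 'PARSE':                      // convert between field types
      if (op.to === 'NUMBER') return Number(value);
      if (op.to === 'DATE')   return op.from === 'NUMBER' ? new Date(Number(value)) : new Date(value);
      if (op.to === 'STRING') return String(value);
      return value;
    default: throw new Error(`Unknown operation: ${op.operation}`);
  }
}
```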
Conditionals
Each condition is described as { key, operator, value }, where key is '$this' (the field being parsed) or '$<fieldName>' to reference another field of the entry (see the example below).
Conditional | Description |
---|---|
EQUALS | The value equals the given value |
NOT_EQUALS | The value does not equal the given value |
LIKE | The value matches the given pattern |
NOT_LIKE | The value does not match the given pattern |
IN | The value is contained in the given list |
NOT_IN | The value is not contained in the given list |
STARTSWITH | The value starts with the given string |
ENDSWITH | The value ends with the given string |
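A conditional evaluator could look like the sketch below. The { key, operator, value } shape comes from the descriptor example further down; `checkCondition` itself and the exact matching semantics (e.g. LIKE as a regular expression test) are assumptions.

```javascript
// Hypothetical sketch of the conditional operators; not the package's actual code.
function checkCondition(fieldValue, cond) {
  const v = cond.value;
  switch (cond.operator) {
    case 'EQUALS':     return fieldValue === v;
    case 'NOT_EQUALS': return fieldValue !== v;
    case 'LIKE':       return new RegExp(v).test(String(fieldValue));   // assumed: pattern match
    case 'NOT_LIKE':   return !new RegExp(v).test(String(fieldValue));
    case 'IN':         return v.includes(fieldValue);                   // v is a list
    case 'NOT_IN':     return !v.includes(fieldValue);
    case 'STARTSWITH': return String(fieldValue).startsWith(v);
    case 'ENDSWITH':   return String(fieldValue).endsWith(v);
    default: throw new Error(`Unknown operator: ${cond.operator}`);
  }
}
```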
Example
Fields describe the parameters we will extract from our sources. Here is a full files descriptor:
<pre>
const filesDescriptorJson = {
auxiliarFields: [
{
name: 'year', source: 'FILE', filename: 'journalctl.log',
extract: { operation: 'EXTRACT', regex: /\d{4}/ }
}
],
commonFields: [
{
name: 'session', type: 'STRING', source: 'FILENAME', filename: 'rosout.log',
extract: { operation: 'SUBSTRING', from: undefined, to: '/' }
},
{
name: 'serial', type: 'STRING', source: 'FILE', filename: 'journalctl.log',
extract: { operation: 'EXTRACT', regex: /\w{2}-\w{4}-\w{3}-\w{6}/ }
}
],
files: {
rosout: {
filename: 'rosout.log', lineDelimiter: /(\d{10}\.\d{9})/,
fields: [
{ name: 'file', source: 'LITERAL', value: 'rosout.log' },
{ name: 'raw', type: 'STRING', source: 'ROW' },
{
name: 'time', type: 'DATE', source: 'ROW',
extract: { operation: 'SUBSTRING', from: 0, to: ' ' },
process: [
{ operation: 'REPLACE', from: '.', to: '' },
{ operation: 'SUBSTRING', from: 0, to: 13 },
{ operation: 'PARSE', from: 'NUMBER', to: 'DATE' },
]
},
{
name: 'severity', type: 'STRING', source: 'ROW',
conditions: [{ key: '$this', operator: 'IN', value: ['INFO', 'WARN', 'ERROR', 'FATAL'] }],
extract: { operation: 'SUBSTRING', from: 0, to: ' ' }
},
{
name: 'node', type: 'STRING', source: 'ROW',
conditions: [{ key: '$this', operator: 'STARTSWITH', value: '/' }],
extract: { operation: 'SUBSTRING', from: 0, to: ' ' }
},
{
name: 'source', type: 'STRING', source: 'ROW',
conditions: [
{ key: '$this', operator: 'STARTSWITH', value: '[' },
{ key: '$this', operator: 'ENDSWITH', value: ')' }
],
extract: { operation: 'SUBSTRING', from: 0, to: ' ' }
},
{
name: 'topics', type: 'LIST', source: 'ROW',
extract: { operation: 'SUBSTRING', from: '[topics:', to: ']' },
conditions: [{ key: '$this', operator: 'STARTSWITH', value: '/' }],
process: [{ operation: 'SPLIT', by: ', ' }]
},
{ name: 'message', type: 'STRING', source: 'ROW' },
{
name: 'mongoTime', type: 'DATE', source: 'ROW',
conditions: [{ key: '$node', operator: 'EQUALS', value: '/mongo_server' }],
extract: { operation: 'SUBSTRING', from: 0, to: ' ' },
process: [{ operation: 'PARSE', from: 'STRING', to: 'DATE' }]
},
{
        name: 'mongoProcess', type: 'STRING', source: 'ROW',
conditions: [{ key: '$node', operator: 'EQUALS', value: '/mongo_server' }],
extract: { operation: 'SUBSTRING', from: '[', to: ']' }
},
{
        name: 'mongoMessage', type: 'STRING', source: 'ROW',
conditions: [{ key: '$node', operator: 'EQUALS', value: '/mongo_server' }]
},
]
    }
  }
}
</pre>
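To illustrate how a field's procedures chain together, here is a self-contained walk-through of the `time` field above on an invented sample row. The row content is made up for illustration, and each step is written out inline rather than through the package:

```javascript
// Invented sample rosout row; only the leading epoch timestamp matters here.
const row = '1612345678.123456789 INFO /mongo_server [file.cpp(42)] message';

// extract: SUBSTRING from 0 to ' ' -> the raw timestamp token
const token = row.substring(0, row.indexOf(' '));  // '1612345678.123456789'

// process step 1: REPLACE '.' with '' -> digits only
const digits = token.split('.').join('');          // '1612345678123456789'

// process step 2: SUBSTRING from 0 to 13 -> milliseconds since the epoch
const millis = digits.substring(0, 13);            // '1612345678123'

// process step 3: PARSE NUMBER -> DATE
const time = new Date(Number(millis));
```

The chain turns a nanosecond epoch string into a JavaScript Date by trimming it down to milliseconds before the numeric conversion.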