jl v2.4.1
node-jl
Utility kit for working with JSON files/streams divided by \n symbol.
JSON-representation of each record shouldn't contain line breaking symbol.
Install
npm install -g jlExamples
We have a log file /tmp/test.json, that contains next records
{"ts": 1416595508, "type": "click"}
{"ts": 1416478467, "type": "buy", "price": 10}
{"ts": 1416466930, "type": "click"}
{"ts": 1416622653, "type": "buy", "price": 20}
{"ts": 1416699396, "type": "click"}
{"ts": 1416624334, "type": "click"}
{"ts": 1416518859, "type": "click"}
{"ts": 1416569870, "type": "click"}
{"ts": 1416573325, "type": "click"}
{"ts": 1416682270, "type": "click"}SQL-interface
jl-sql utility supports query language very similar to SQL
cat /tmp/test.json \
| jl-sql 'SELECT type, COUNT(*) AS c, SUM(price) AS sum GROUP BY type ORDER BY c NUMERIC DESC'{"type":"click","c":8,"sum":0}
{"type":"buy","c":2,"sum":30}As you can see we receive that we expect.
Please note non-standard keyword NUMERIC after ORDER BY:
without this keyword, sorting will be by string representation, not by numerical value.
Constants
NULLTRUEFALSE
Operators
Arithmetic
+-*/%
Comparison
=,==!====- Comparison with type check!==- Negative comparison with type check><>=<=
Logical
AND,&&OR,||!
Functions
Functions for working with data
FROM_UNIXTIME(unixTimestamp)- converts unix timestamp to DateUNIX_TIMESTAMP(date)- converts Date to unix timestamp, date should be string in'YYYY-MM-DD hh:mm:ss'formatDATE(date)- returns date in'YYYY-MM-DD'format
Misc
IF(expression, ifTrue, ifFalse)COALESCE(expression1[, expression2[, ...]])
Aggregating
Right now, only very basic aggregating functions supported:
SUM(expression)- counts summary of field's values. Non-numerical values will be ignoredMIN(expression)- returns minimal field's value. Non-numerical values will be ignoredMAX(expression)- returns maximal field's value. Non-numerical values will be ignoredCOUNT(expression)- counts amount of elements for which comparison withexpressionwill be !== null or !== undefinedCOUNT(*)- counts amount of elementsHLL_COUNT_DISTINCT(arg1[, args2[, ...]])- counts amount of unique argument combinations. UsesO(1)of memory andO(N)of CPU, but returns result with 0.1% error. For more information read HyperLogLog algorithm.
Restrictions
JOINis not supported(and probably never will be)- Sorting by multiple fields is not supported
LIMITis not supported, but could be replaced withheadandtail
Low-level interface
SQL-interface only wrapper for utility kit described down below. By combining those utilities you can do much more than with SQL.
Filtering and aggregation
Let's filter only events with type buy
cat /tmp/test.json \
| jl-filter 'type == "buy"'{"ts": 1416478467, "type": "buy", "price": 10}
{"ts": 1416622653, "type": "buy", "price": 20}Add sum calculation for all price fields
cat /tmp/test.json \
| jl-filter 'type == "buy"'
| jl-sum price{"value":30}To get just 30 you can add jl-extract value at the end - this command extract value of the field.
Sorting
Let sort log file by ts field
cat /tmp/test.json \
| jl-sort ts{"ts": 1416466930, "type": "click"}
{"ts": 1416478467, "type": "buy", "price": 10}
{"ts": 1416518859, "type": "click"}
{"ts": 1416569870, "type": "click"}
{"ts": 1416573325, "type": "click"}
{"ts": 1416595508, "type": "click"}
{"ts": 1416622653, "type": "buy", "price": 20}
{"ts": 1416624334, "type": "click"}
{"ts": 1416682270, "type": "click"}
{"ts": 1416699396, "type": "click"}The standard arguments of sort utility is also supported: -r, -n, -u, -m, -s, -T, -S
Modification
Let's add to each object field date, that contains event date in UTC format
cat /tmp/test.json \
| jl-transform '{r.date = (new Date(r.ts * 1000)).toUTCString()}'{"ts":1416595508,"type":"click","date":"Fri, 21 Nov 2014 18:45:08 GMT"}
{"ts":1416478467,"type":"buy","price":10,"date":"Thu, 20 Nov 2014 10:14:27 GMT"}
{"ts":1416466930,"type":"click","date":"Thu, 20 Nov 2014 07:02:10 GMT"}
{"ts":1416622653,"type":"buy","price":20,"date":"Sat, 22 Nov 2014 02:17:33 GMT"}
{"ts":1416699396,"type":"click","date":"Sat, 22 Nov 2014 23:36:36 GMT"}
{"ts":1416624334,"type":"click","date":"Sat, 22 Nov 2014 02:45:34 GMT"}
{"ts":1416518859,"type":"click","date":"Thu, 20 Nov 2014 21:27:39 GMT"}
{"ts":1416569870,"type":"click","date":"Fri, 21 Nov 2014 11:37:50 GMT"}
{"ts":1416573325,"type":"click","date":"Fri, 21 Nov 2014 12:35:25 GMT"}
{"ts":1416682270,"type":"click","date":"Sat, 22 Nov 2014 18:51:10 GMT"}Internal pipe
The system pipes are inefficient because of serialization/deserialization of JSON each time when working with multiple jl-utilities because each utility receives and returns JSON using stdin/stdout.
To solve this problem jl-utilities supports internal piping which allows to multiple utilities interact with each other in the context of a single process without spending resources for parsing. To use internal pipes replace | to \| between jl-commands in the command line. For example:
cat /tmp/test.json \
| jl-filter 'type == "buy"'
| jl-sum priceyou can change to:
cat /tmp/test.json \
| jl-filter 'type == "buy"'
\| jl-sum pricePerformance
CPU consumption of all utilities except for jl-sort is O(n), by memory is O(1).
The performance of jl-sort completely depends on system's sort realization, additional CPU consumptions also O(n), and by memory O(1).
By using internal pipes significantly increases performance because JSON parsing is very expensive operation. But in another hand, all operations in pipes is computing in a single thread, so with very long pipes you should keep the balance.
Utilities
jl-sql - SQL
Requires single argument - SQL and has few options:
-T DIR- temporary directory which will be used for sorting if it didn't use RAM.$TMPDIRor/tmpby default. Pay attention when sorting big amount of data, by having/tmpon the RAM disk.-S BUF- size of sorting buffer in bytes. On buffer overflow, filesystem will be used(see-Toption)
jl-sort - sorting
Wrapper for GNU Sort, allowing to sort JSON. Uses all advantages of sort, also, supports sorting in filesystem, merge-sort, stable sorting, limit buffer size.
jl-filter - filtering
Alternative to grep, but for JSON.
jl-reduce - aggregation and grouping by key
General utility that produces welding elements into a single value for each group. Expects 3 JS-functions arguments:
-i FUNC- accumulator initialization: gets called for each group. The function executes in a group context.-u FUNC- accumulator update: the function that will be executed for each element of a group. The function executes in a group context and receives argumentr, containing each element of a thread.-r FUNC- received accumulator's value: will be executed at the end of group with the passed result. The function executes in a group context.
and one required argument
-k KEYDEF- key which will be used for grouping, could be a function. For correct execution thread's input should be sorted by this key in any direction.
Here is simple example of using jl-sum utility, which counts sum for amount field for each user that gets identified by uid field:
jl-reduce -k uid -i '{this.sum = 0}' -u '{this.sum += r.amount}' -r '{return this.sum}'For thread's input containing:
{"uid": 1, "amount": 10}
{"uid": 1, "amount": 11}
{"uid": 2, "amount": 12}
{"uid": 3, "amount": 13}Output will be:
{"key":1,"value":21}
{"key":2,"value":12}
{"key":3,"value":13}If -k argument will be missing, convolution will go over whole thread and will return:
{"value":46}jl-sum - summation
Predefined jl-reduce, which counts sum of parameter in group
jl-count - подсчёт количества
Predefined jl-reduce, which counts amount of elements in group
jl-transform - modification
jl-extract - field's value extraction
jl-from-csv - CSV to JSON conversation
jl-plainify - Flattens object to a single level deep k-v
9 years ago
9 years ago
9 years ago
9 years ago
9 years ago
9 years ago
9 years ago
9 years ago
9 years ago
9 years ago
9 years ago
9 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
10 years ago
11 years ago
11 years ago
11 years ago
11 years ago
11 years ago
11 years ago
11 years ago
11 years ago
11 years ago