binary-parse v0.11.0
binary-parse allows you to define a structure and parse binary data into that structure.
If you've ever analyzed a binary protocol or file format, you know how bloated and unreadable the source code can get. It gets even worse when the protocol has quirks such as integers that start or end in the middle of a byte, a mixture of little and big endian, or structures that are not static, where the fields that are present are only decided during parsing.
binary-parse tries to provide an interface for analyzing such binary formats declaratively without losing the ability to make decisions dynamically.
Let's take a look at how an IPv4 packet could be parsed (leaving the IP options and the payload raw):
const BinaryParser = require('binary-parse');
const structure =
{
Header:
{
version: 4,
headerLength: 4,
tos: 8,
length: 16,
identification: ['buffer', 2],
flags: ['bits', 3],
fragmentOffset: 13,
ttl: 8,
protocol: 8,
checksum: ['buffer', 2],
sourceAddress: ['buffer', 4],
destinationAddress: ['buffer', 4],
options: ['buffer', (header) => (header.headerLength - 5) * 4],
},
Packet:
{
header: 'Header',
body: 'buffer',
},
};
let parser = new BinaryParser(structure);
let rawPacket = Buffer.from(/* ... */);
let packet = parser.parse(rawPacket, 'Packet');
/*
***Just sample data. Not a real IPv4-packet***
packet =>
{
header:
{
version: 4,
headerLength: 5,
tos: 0,
length: 1337,
identification: Buffer <42 42>,
flags: [0, 1, 0],
fragmentOffset: 4711,
ttl: 42,
protocol: 0x10,
checksum: Buffer <13 37>,
sourceAddress: Buffer <0a 0d 25 2a>,
destinationAddress: Buffer <0a 0d 25 9f>,
options: Buffer <...>,
},
body: Buffer <43 61 74 73 20 77 69 6c 6c 20 74 61 6b 65 20 6f 76 65 72 20 74 68 65 20 77 6f 72 6c 64 ...>,
}
*/
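For comparison, here is a rough sketch of what extracting just the first few header fields looks like by hand with Node's built-in Buffer methods. It is not part of binary-parse, and the sample bytes are made up to match the sample output above; it only illustrates the shifting and masking that the declarative structure is meant to replace.

```javascript
// Hand-written parsing of the first 8 bytes of an IPv4 header
// (sample bytes, not a real packet).
const rawHeader = Buffer.from([0x45, 0x00, 0x05, 0x39, 0x42, 0x42, 0x52, 0x67]);

const manualHeader = {
    version: rawHeader[0] >> 4,          // upper 4 bits of byte 0
    headerLength: rawHeader[0] & 0x0f,   // lower 4 bits of byte 0
    tos: rawHeader[1],
    length: rawHeader.readUInt16BE(2),
    identification: rawHeader.subarray(4, 6),
    // flags (3 bits) and fragmentOffset (13 bits) share bytes 6 and 7
    flags: [(rawHeader[6] >> 7) & 1, (rawHeader[6] >> 6) & 1, (rawHeader[6] >> 5) & 1],
    fragmentOffset: rawHeader.readUInt16BE(6) & 0x1fff,
};

console.log(manualHeader);
```

Every field that does not sit on a byte boundary needs its own mask and shift here, which is exactly the part that grows unreadable for larger formats.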
Usage
To use the binary parser, create a structure, pass it to a new BinaryParser instance, and then call parse with the data and the name of the root structure. For how to build such a structure, see the examples below.
const BinaryParser = require('binary-parse');
const structure = { /* See examples below */ };
let parser = new BinaryParser(structure);
let data = Buffer.from(/* ... */);
let parsedObject = parser.parse(data, '<NAME OF ROOT STRUCTURE>');
Available types
If no endianness is given, big-endian is used.
byte: Int8, unsigned
int8
uint8
int16
int16be
int16le
uint16
uint16be
uint16le
int32
int32be
int32le
uint32
uint32be
uint32le
int64
int64be
int64le
uint64
uint64be
uint64le
int: Generic int. See the examples below for how to use this type.
Parameters:[length (bits), endianness, signed]
- Values for endianness: b, be, big, bigendian, l, le, little, littleendian. Default: bigendian
- Values for signed: signed, s, 1, true, unsigned, u, 0, false. Default: unsigned
floatbe
floatle
float: Float, 32 bit
Parameters:[endianness]
- Values for endianness: b, be, big, bigendian, l, le, little, littleendian. Default: bigendian
doublebe
doublele
double: Double, 64 bit
Parameters:[endianness]
- Values for endianness: b, be, big, bigendian, l, le, little, littleendian. Default: bigendian
string
Parameters:[length (bytes), encoding]
- Encoding: All encodings accepted by the Node.js Buffer.toString method. Default: ascii
utf8String
Parameters:[length (bytes)]
utf16LEString
Parameters:[length (bytes)]
base64String
Parameters:[length (bytes)]
This type reads binary data and outputs a base64 string
hexString
Parameters:[length (bytes)]
This type reads binary data and outputs a hex string
cstring
C string / null-terminated string. Data is read until a null byte is reached. The null byte is consumed but not included in the string.
Parameters:[encoding]
- Encoding: All encodings accepted by the Node.js Buffer.toString method. Default: ascii
utf8CString
Parameters:none
utf16LECString
Parameters:none
base64CString
Parameters:none
This type reads binary data and outputs a base64 string
hexCString
Parameters:none
This type reads binary data and outputs a hex string
array
Parameters:[type, length (elements)]
buffer
Parameters:[length (bytes)]
bits
Parameters:[length (bits)]
Reads bits into an array of 0s and 1s
Single numbers
If a single number is used as a type, it is interpreted as an unsigned, big-endian integer with that number of bits
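The effect of the be/le suffixes and of signed versus unsigned types can be illustrated with Node's built-in Buffer methods. This is only a sketch of the underlying byte interpretation, not binary-parse code, and the variable names are made up for the example.

```javascript
// The same two bytes interpreted with different byte orders and signedness,
// using Node's built-in Buffer methods (not binary-parse itself).
const twoBytes = Buffer.from([0xff, 0x01]);

const asUint16be = twoBytes.readUInt16BE(0); // 0xff01 = 65281
const asUint16le = twoBytes.readUInt16LE(0); // 0x01ff = 511
const asInt16be = twoBytes.readInt16BE(0);   // 0xff01 as two's complement = -255

console.log(asUint16be, asUint16le, asInt16be);
```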
Examples
Single struct
const structure =
{
MyStruct:
{
field1: 'int8',
field2: 'uint16',
field3: 'int16le',
},
};
let parsedObject = parser.parse(data, 'MyStruct');
/* =>
{
field1: 13,
field2: 37,
field3: 42,
}
*/
Include other structs
const structure =
{
MyStruct:
{
field1: 'int8',
field2: 'uint16',
field3: 'int16le',
},
StructShell:
{
field1: 'int32',
field2: 'double',
child3: 'MyStruct',
field4: 'byte',
child5: 'MyStruct',
},
};
/* =>
{
field1: 111,
field2: 222,
child3:
{
field1: 13,
field2: 37,
field3: 42,
},
field4: 789,
child5:
{
field1: 31,
field2: 73,
field3: 24,
},
}
*/
Nested structs
const structure =
{
StructShell:
{
field1: 'int32',
field2: 'double',
child3:
{
field1: 'int8',
field2: 'uint16',
field3: 'int16le',
},
field4: 'byte',
child5: 'MyStruct', //assumes MyStruct is defined as in the previous example
},
};
Reading single bits as number
const structure =
{
MyStruct:
{
field1: 3, //3 Bits
field2: 'uint16', //16 bit are read across byte boundaries
field3: 5, //5 Bits
//In total, 3 bytes have been read
},
};
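To make the "across byte boundaries" behavior concrete, here is a minimal sketch of reading unsigned, big-endian integers at arbitrary bit offsets. The helper readBitsAsNumber is hypothetical and is not how binary-parse is necessarily implemented; the sample bytes are chosen so that the three fields above yield small, easy-to-check values.

```javascript
// Reads `length` bits starting at bit offset `bitOffset` as an unsigned,
// big-endian integer. Only suitable for small lengths (well below 53 bits).
function readBitsAsNumber(buffer, bitOffset, length) {
    let value = 0;
    for (let i = 0; i < length; i++) {
        const position = bitOffset + i;
        const byte = buffer[position >> 3];                 // byte containing this bit
        const bit = (byte >> (7 - (position & 7))) & 1;     // most significant bit first
        value = value * 2 + bit;
    }
    return value;
}

const threeBytes = Buffer.from([0b10100000, 0b00000100, 0b10101010]);
const field1 = readBitsAsNumber(threeBytes, 0, 3);   // bits 0..2
const field2 = readBitsAsNumber(threeBytes, 3, 16);  // bits 3..18, spans all three bytes
const field3 = readBitsAsNumber(threeBytes, 19, 5);  // bits 19..23

console.log(field1, field2, field3);
```

The three reads consume 3 + 16 + 5 = 24 bits, so exactly 3 bytes in total, even though none of the fields starts and ends on a byte boundary.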
Reading single bits
const structure =
{
MyStruct:
{
field1: ['bits', 3],
field2: ['bits', 5],
field3: 'bits',
},
};
/* =>
{
field1: [1, 1, 0],
field2: [1, 0, 0, 0, 1],
field3: [0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, ...],
},
*/
Buffers
const structure =
{
MyStruct:
{
field1: 3,
buffer2: ['buffer', 4], //Buffer with 4 bytes, also reading across byte boundaries
field3: 5,
},
};
/* =>
{
field1: 1,
buffer2: Buffer <11 22 33 44>,
field3: 42,
},
*/
Arrays
const structure =
{
MyStruct:
{
array1: ['array', 'int32', 3],
array2: ['array', ['buffer', 4], 3],
},
StructShell:
{
structs: ['array', 'MyStruct', 2],
},
};
/* =>
{
structs:
[
{
array1: [13, 37, 42],
array2:
[
Buffer <a1 bb cc dd>,
Buffer <a2 bb cc dd>,
Buffer <a3 bb cc dd>,
]
},
{
array1: [73, 73, 73],
array2:
[
Buffer <00 12 34 56>,
Buffer <00 13 34 56>,
Buffer <00 14 34 56>,
]
}
]
}
*/
Custom ints
const structure =
{
PacketHeader:
{
//Endians: b, be, big, bigendian, l, le, little, littleendian
//Signed: signed, s, 1, true, unsigned, u, 0, false
int1: ['int', 4, 'be', 'unsigned'],
int2: ['int', 5, 'l', 'u'],
int3: ['int', 8, 'big', 1],
},
};
Read remaining data
const structure =
{
Header:
{
field1: 'int8',
field2: 'int16',
field3: 'int16',
},
Packet:
{
header: 'Header',
body: 'buffer',
//or: body: ['buffer'],
//or: body: ['buffer', 0],
},
};
/* =>
{
header:
{
field1: 5,
field2: 67,
field3: 89,
},
body: Buffer <12 34 56 ...>,
}
*/
Dynamic values
Every value except the struct definitions at root level is allowed to be a function. This example only returns fixed values, but the functions can of course execute and return whatever they want.
const structure =
{
Header:
{
field1: () => 'int8',
field2: () => ['buffer', 4],
field3: [() => 'array', 'int16', () => 5],
},
Packet:
{
header: () => 'Header',
body: 'buffer'
},
};
Calculations
const structure =
{
Header:
{
field1: 'int8',
field2: 'int16',
headerLength: 'uint16',
//headerLength minus the 5 bytes of the previous fields, times 8 because single numbers are interpreted as a bit count
headerOptions: (header) => (header.headerLength - 5) * 8,
//Instead of arrow functions, which have to use the passed object instance, normal functions
//can be used, in which `this` points to the current object:
//or: headerOptions: function() { return (this.headerLength - 5) * 8; },
//or: headerOptions: ['buffer', (header) => header.headerLength - 5],
},
};
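One way to picture how both function styles can work is a resolver that calls a function with the current object as `this` and as the first argument. The helper resolveValue below is a hypothetical sketch under that assumption; it is not taken from the binary-parse source.

```javascript
// Hypothetical helper: if `value` is a function, call it with the current
// object as both `this` and the first argument; otherwise return it unchanged.
function resolveValue(value, currentObject) {
    return typeof value === 'function' ? value.call(currentObject, currentObject) : value;
}

const currentHeader = { field1: 1, field2: 2, headerLength: 9 };

// Arrow functions ignore `this`, so they use the passed object.
const viaArrow = resolveValue((header) => (header.headerLength - 5) * 8, currentHeader);
// Normal functions can use `this` instead.
const viaFunction = resolveValue(function () { return (this.headerLength - 5) * 8; }, currentHeader);
// Plain values pass through untouched.
const viaPlain = resolveValue(32, currentHeader);

console.log(viaArrow, viaFunction, viaPlain);
```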
Fixed values
const structure =
{
MyStruct:
{
field1: 'int8',
field2: 'uint16',
field3: 'uint16',
fixed4: ['fixed', 424242],
//or: fixed4: ['fixed', (header) => { /* ... */ return someFinalValue; }],
}
};
/* =>
{
field1: 13,
field2: 37,
field3: 42,
fixed4: 424242,
}
*/
Limited structures
If a number is passed as an argument to a struct, the struct has exactly that number of bits available. The fields of the structure start at the same position they would without this parameter, but if the structure tries to read more than the given number of bits, an error is thrown, because the struct has reached the end of its block of data. After the structure has finished reading and returns, the bit cursor is set to the end of the block that was given to the structure, so reading continues as if the structure had read all of its available bits, no matter how many it actually read.
Byte boundaries can be broken in this case as well. You can start and end in the middle of a byte or pass a number of bits that isn't divisible by 8.
const structure =
{
ChildStruct:
{
childField1: 16,
childField2: 8,
data: 'buffer'
},
MyStruct:
{
childLength: 'uint8',
subStruct: ['ChildStruct', (struct) => struct.childLength * 8],
field3: 8,
field4: 16,
data: 'buffer'
}
};
/* =>
{
childLength: 10,
subStruct:
{
childField1: 42,
childField2: 43,
data: Buffer <11 22 33 44 55 66 77>,
},
field3: 13,
field4: 37,
data: Buffer <12 34 56 ...>,
}
*/
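The cursor rule described for limited structures can be sketched with a small bit cursor. The BitCursor class and its method names are hypothetical and only model the positions, not actual data; the check that a block must fit into the remaining data is an assumption of this sketch.

```javascript
// Hypothetical bit cursor illustrating the "limited structure" rule.
class BitCursor {
    constructor(totalBits) {
        this.position = 0;
        this.limit = totalBits;
    }

    // Advance by `bits`, failing if that would cross the current limit.
    advance(bits) {
        if (this.position + bits > this.limit) {
            throw new RangeError('Tried to read past the end of the available block');
        }
        this.position += bits;
    }

    // Run `readFields` with at most `bits` available, then jump to the end of
    // that block regardless of how much was actually read.
    withLimit(bits, readFields) {
        const outerLimit = this.limit;
        const blockEnd = this.position + bits;
        if (blockEnd > outerLimit) {
            throw new RangeError('Block does not fit into the remaining data');
        }
        this.limit = blockEnd;
        try {
            readFields(this);
        } finally {
            this.limit = outerLimit;
        }
        this.position = blockEnd;
    }
}

const cursor = new BitCursor(15 * 8);
cursor.advance(8); // childLength
// The child gets 10 bytes but only reads 3 of them (16 + 8 bits).
cursor.withLimit(10 * 8, (c) => { c.advance(16); c.advance(8); });
const positionAfterChild = cursor.position; // 8 + 80 = 88

let overflowError = null;
try {
    // The child wants 24 bits but only 16 are available to it.
    new BitCursor(64).withLimit(16, (c) => c.advance(24));
} catch (error) {
    overflowError = error;
}
console.log(positionAfterChild, overflowError instanceof RangeError);
```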
Parameters
Some types accept parameters. These can be passed using []. Example:
- Type: array
- Parameters: type, length
- Syntax: ['array', <MY TYPE>, <MY LENGTH>]
As shown in the Dynamic values example, functions can be used instead of hard-coded values.
For every parameter of the listed predefined types, functions can be used:
//...
someField: ['array', () => { return someType; }, () => { return someLength; }]
//...
someField: ['string', () => { return someEncoding; }, () => { return someLength; }]
//...
Reading to end
Using any type which accepts a length (int, string, array, buffer, bits) with no length, or with a length of 0, null, undefined, or Infinity, makes it read the data until the end.
The rule which all of these types follow is:
Is there any data, even a single bit, left? If yes, read the next element. If no, stop
That means that if, for example, you're reading an array of int32 with length 0 and there's one byte left at the end, the array tries to read another int32, which is not possible, and an exception is thrown. The same happens when reading a buffer or a string if there are single bits left.
The only safe type for reading to the end is bits, since it reads single bits and therefore never tries to read more data than is available.
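The rule can be sketched for an array of int32 values as follows. The function readInt32ArrayToEnd is a hypothetical illustration built on Node's Buffer, not the library's implementation, and the error type is an assumption.

```javascript
// Hypothetical sketch of the read-to-end rule for an array of int32 values.
function readInt32ArrayToEnd(buffer) {
    const values = [];
    let offset = 0;
    // "Is there any data left? If yes, read the next element."
    while (offset < buffer.length) {
        if (offset + 4 > buffer.length) {
            throw new RangeError('Not enough data left for another int32');
        }
        values.push(buffer.readInt32BE(offset));
        offset += 4;
    }
    return values;
}

const evenData = Buffer.from([0, 0, 0, 1, 0, 0, 0, 2]);
const parsedValues = readInt32ArrayToEnd(evenData);

let trailingByteError = null;
try {
    // One byte is left over after the first int32, so the next read fails.
    readInt32ArrayToEnd(Buffer.from([0, 0, 0, 1, 0xff]));
} catch (error) {
    trailingByteError = error;
}
console.log(parsedValues, trailingByteError instanceof RangeError);
```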
Big numbers
JavaScript numbers can represent integers exactly only up to 2^53 - 1 (Number.MAX_SAFE_INTEGER). For that reason, all numbers which are read with more than 51 bits are represented using native JavaScript BigInts.
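The precision problem and the BigInt alternative can be seen with plain Node.js, independent of binary-parse. The sample bytes below are made up so that the 64-bit value is 2^53 + 1, the first integer a regular number cannot represent.

```javascript
// Above Number.MAX_SAFE_INTEGER, plain numbers can no longer tell neighbours apart.
const unsafeComparison = 2 ** 53 === 2 ** 53 + 1; // both round to the same double

// Node's Buffer can read 64-bit integers as BigInt, which stays exact.
const eightBytes = Buffer.from([0x00, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01]);
const exactValue = eightBytes.readBigUInt64BE(0); // 2^53 + 1 as a BigInt
const isExact = exactValue === 2n ** 53n + 1n;

console.log(unsafeComparison, typeof exactValue, isExact);
```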
Parents
You can access data of the current object when using functions (see the examples Dynamic values and Calculations).
If you need to use the data of parent objects, you can use the key BinaryParser.symbols.parent to access that object:
//...
someDynamicField: (currentObject) => currentObject[BinaryParser.symbols.parent].someHeaderField
//...
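The mechanism can be pictured with a plain symbol key. In this sketch, parentKey is a hypothetical stand-in for BinaryParser.symbols.parent, and the objects are built by hand rather than by the parser.

```javascript
// Hypothetical stand-in for BinaryParser.symbols.parent.
const parentKey = Symbol('parent');

const packetObject = { header: { bodyLength: 4 } };
const bodyObject = { [parentKey]: packetObject };

// A dynamic field on the child looks up a value parsed earlier in the parent.
const bodyLengthFn = (currentObject) => currentObject[parentKey].header.bodyLength;
const resolvedLength = bodyLengthFn(bodyObject);

// The symbol key does not show up among the normal keys of the child.
const visibleKeys = Object.keys(bodyObject);

console.log(resolvedLength, visibleKeys);
```

Using a symbol for the back-reference keeps it out of the way of parsed field names, so a field that happens to be called "parent" cannot collide with it.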
Symbols as keys
Symbols as keys are not supported and will be completely ignored.
In general, this library is using a feature of many JavaScript engines which is not defined by ECMAScript: When iterating over an object, the keys are given in the order they were defined.
As symbols aren't taken into account when iterating over an object, it's not possible to find out their positions in relation to the normal keys.
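Both observations can be checked in Node.js with a small object. The names below are made up for the illustration and the field types are just placeholder strings.

```javascript
const hiddenKey = Symbol('hidden');
const exampleStruct = { first: 'int8', [hiddenKey]: 'int16', second: 'uint16', third: 'int16le' };

// String keys come back in definition order; the symbol key is skipped.
const iteratedKeys = Object.keys(exampleStruct);
// Symbol keys are only reachable through a separate call, which carries no
// information about where they sat between the string keys.
const symbolKeys = Object.getOwnPropertySymbols(exampleStruct);

console.log(iteratedKeys, symbolKeys.length);
```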
API
new BinaryParser(structure)
Arguments:
- structure: object
BinaryParser.parse(data, rootStructure)
Arguments:
- data: Buffer | byte[] | Uint8Array
- rootStructure: string
Returns: object