laconia-batch v0.4.0
🛡️ Laconia Batch
Reads a large number of records without the Lambda time limit.
AWS Lambda caps the execution duration of each request (300 seconds when this
was written; the limit has since been raised to 900 seconds), so a single Lambda
invocation cannot execute a long-running task. laconia-batch handles your batch
processing needs by providing a beautifully designed API which abstracts away
the time limitation problem.
FAQ
Check out the FAQ.
Usage
Install laconia-batch using yarn:
yarn add laconia-batch
Or via npm:
npm install --save laconia-batch
These are the currently supported input sources:
- DynamoDB
- S3
Example of batch processing by scanning a DynamoDB table:
const laconiaBatch = require("laconia-batch");
// processItem is your own item-handling function, not part of laconia-batch
module.exports.handler = laconiaBatch(
  _ =>
    laconiaBatch.dynamoDb({
      operation: "SCAN",
      dynamoDbParams: { TableName: "Music" }
    }),
  { itemsPerSecond: 2 }
).on("item", ({ event }, item) => processItem(event, item));
Rate limiting is supported out of the box by setting the batchOptions.itemsPerSecond
option.
How it works
laconia-batch works around Lambda's time limitation by using recursion.
It automatically recurses when a Lambda timeout is about to happen, then resumes
from where it left off in the new invocation.
Imagine you are about to process the array [1, 2, 3, 4, 5] and each request can only handle two items. The following will happen:
- request 1: Process 1
- request 1: Process 2
- request 1: Not enough time, recursing with current cursor
- request 2: Process 3
- request 2: Process 4
- request 2: Not enough time, recursing with current cursor
- request 3: Process 5
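The request trace above can be reproduced with a small simulation. This is a hypothetical sketch of the idea, not laconia-batch's implementation: `simulateRecursion` and `itemsPerRequest` are invented names, and "running out of time" is modelled as a fixed item budget per request instead of a check on the Lambda's remaining time.

```javascript
// Hypothetical simulation of the recursion strategy; not laconia-batch's code.
// Each request may process at most `itemsPerRequest` items before it runs out
// of time and hands the cursor over to a new request.
function simulateRecursion(items, itemsPerRequest) {
  const log = [];
  let cursor = 0; // index of the next item to process
  let request = 1;
  while (cursor < items.length) {
    let processedThisRequest = 0;
    while (cursor < items.length && processedThisRequest < itemsPerRequest) {
      log.push(`request ${request}: Process ${items[cursor]}`);
      cursor += 1;
      processedThisRequest += 1;
    }
    if (cursor < items.length) {
      // Out of time: the next request resumes from `cursor`.
      log.push(`request ${request}: Not enough time, recursing with current cursor`);
      request += 1;
    }
  }
  return log;
}

console.log(simulateRecursion([1, 2, 3, 4, 5], 2).join("\n"));
```

The cursor is the only state carried between requests, which is why each new invocation can pick up exactly where the previous one stopped.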
API
laconiaBatch(readerFn, batchOptions)
readerFn(laconiaContext)
- This function is called when your Lambda is invoked
- The function must return a reader object, i.e. dynamoDb() or s3()
- It is called with the laconiaContext object, which can be destructured to {event, context}
batchOptions
- itemsPerSecond
  - Optional
  - Rate limit will not be applied if the value is not set
  - Can be set to a decimal, e.g. 0.5 equates to 1 item every 2 seconds
- timeNeededToRecurseInMillis
  - Optional
  - The value set here is used to check whether the current execution should be stopped and recursed
  - If your item processing is very slow, the batch processor might not have enough time to recurse and your Lambda execution might time out. Increase this value to increase the chance of the recursion happening
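To make the two options concrete, here is a sketch of the arithmetic they imply. The helper names are hypothetical, and the assumption that the stop check compares the Lambda's remaining time (as returned by `context.getRemainingTimeInMillis()`) against `timeNeededToRecurseInMillis` is ours, not taken from the library source.

```javascript
// Hypothetical helpers illustrating batchOptions; not laconia-batch internals.

// Minimum gap in milliseconds between two items for a given itemsPerSecond.
// Returns 0 (no throttling) when the option is not set.
function minIntervalMillis(itemsPerSecond) {
  if (itemsPerSecond === undefined) return 0;
  return 1000 / itemsPerSecond;
}

// Assumed stop check: recurse once the time left in this invocation
// (e.g. context.getRemainingTimeInMillis()) drops to the configured margin.
function shouldRecurse(remainingTimeInMillis, timeNeededToRecurseInMillis) {
  return remainingTimeInMillis <= timeNeededToRecurseInMillis;
}

console.log(minIntervalMillis(2)); // 500
console.log(minIntervalMillis(0.5)); // 2000
console.log(shouldRecurse(8000, 10000)); // true
```

A larger margin makes the check fire earlier, leaving more headroom for a slow item to finish before the recursion call.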
Example:
// Use all default batch options (No rate limiting)
laconiaBatch(_ => dynamoDb());
// Customise batch options
laconiaBatch(_ => dynamoDb(), {
  itemsPerSecond: 2,
  timeNeededToRecurseInMillis: 10000
});
Events
There are events that you can listen to while laconia-batch is working.
- item: (laconiaContext, item)
  - Fired on every item read
  - item is an object found during the read
  - laconiaContext can be destructured to {event, context}
- start: (laconiaContext)
  - Fired when the batch process is started for the very first time
  - laconiaContext can be destructured to {event, context}
- stop: (laconiaContext, cursor)
  - Fired when the current execution is about to time out and is about to be recursed
  - cursor contains the information about how the last item was read
  - laconiaContext can be destructured to {event, context}
- end: (laconiaContext)
  - Fired when the batch processor can no longer find any more records
  - laconiaContext can be destructured to {event, context}
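The sketch below, built on Node's `EventEmitter`, shows the order in which the four events would fire across two invocations when only two items fit into one invocation. `runInvocation`, its `capacity` budget and the `{ index }` shape of the cursor are hypothetical; the real cursor's contents depend on the reader.

```javascript
const { EventEmitter } = require("events");

// Hypothetical model of one invocation's event lifecycle; invented for
// illustration, not laconia-batch's API. `cursor` is the index to resume
// from and `capacity` is how many items fit before time runs out.
function runInvocation(emitter, laconiaContext, items, cursor, capacity) {
  if (cursor === 0) emitter.emit("start", laconiaContext); // very first run only
  let processed = 0;
  while (cursor < items.length) {
    if (processed === capacity) {
      emitter.emit("stop", laconiaContext, { index: cursor }); // about to recurse
      return cursor;
    }
    emitter.emit("item", laconiaContext, items[cursor]);
    cursor += 1;
    processed += 1;
  }
  emitter.emit("end", laconiaContext); // no more records
  return null;
}

const events = [];
const emitter = new EventEmitter()
  .on("start", () => events.push("start"))
  .on("item", (ctx, item) => events.push(`item ${item}`))
  .on("stop", (ctx, cursor) => events.push(`stop at ${cursor.index}`))
  .on("end", () => events.push("end"));

// Each loop iteration stands in for one Lambda invocation.
const laconiaContext = { event: {}, context: {} };
let cursor = 0;
while (cursor !== null) {
  cursor = runInvocation(emitter, laconiaContext, [1, 2, 3], cursor, 2);
}
console.log(events);
```

Note that start fires once for the whole batch, while stop fires at the end of every invocation except the last, which fires end instead.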
Example:
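laconiaBatch(_ => dynamoDb({ operation: "SCAN", dynamoDbParams: { TableName: "Music" } }))
  .on("start", laconiaContext => console.log("Batch started"))
  .on("item", (laconiaContext, item) => console.log("Read item", item))
  .on("stop", (laconiaContext, cursor) => console.log("Recursing from cursor", cursor))
  .on("end", laconiaContext => console.log("Batch finished"));
dynamoDb(readerOptions)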
Creates a reader for a DynamoDB table.
- operation
  - Mandatory
  - Valid values are 'SCAN' and 'QUERY'
- dynamoDbParams
  - Mandatory
  - This parameter is used when the documentClient's operations are called
  - The ExclusiveStartKey param can't be used, as it will be overridden during processing!
- documentClient = new AWS.DynamoDB.DocumentClient()
  - Optional
  - Set this option if you need to customise the AWS.DynamoDB.DocumentClient instantiation
  - Used for the DynamoDB operations
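The ExclusiveStartKey restriction follows from how DynamoDB paginates: each Scan or Query page returns a LastEvaluatedKey, and the next page is requested by passing it back as ExclusiveStartKey. Assuming the reader resumes this way (an assumption about its internals), any ExclusiveStartKey you supply would be overwritten. The sketch uses `fakeScan`, a hypothetical in-memory stand-in for `documentClient.scan(params).promise()`, so it runs without AWS.

```javascript
// Hypothetical in-memory table and scan; they only mimic the
// Items / LastEvaluatedKey / ExclusiveStartKey contract of DocumentClient.scan.
const TABLE = [{ Id: 1 }, { Id: 2 }, { Id: 3 }, { Id: 4 }, { Id: 5 }];

async function fakeScan(params) {
  const start = params.ExclusiveStartKey
    ? TABLE.findIndex(row => row.Id === params.ExclusiveStartKey.Id) + 1
    : 0;
  const Items = TABLE.slice(start, start + params.Limit);
  const last = start + Items.length;
  // This fake omits LastEvaluatedKey once the table is exhausted.
  return last < TABLE.length
    ? { Items, LastEvaluatedKey: { Id: TABLE[last - 1].Id } }
    : { Items };
}

// Reads every page by feeding LastEvaluatedKey back as ExclusiveStartKey.
async function readAll(dynamoDbParams) {
  const items = [];
  let exclusiveStartKey; // undefined on the first page
  do {
    // Any user-supplied ExclusiveStartKey is overwritten here on every page.
    const page = await fakeScan({ ...dynamoDbParams, ExclusiveStartKey: exclusiveStartKey });
    items.push(...page.Items);
    exclusiveStartKey = page.LastEvaluatedKey;
  } while (exclusiveStartKey);
  return items;
}

readAll({ TableName: "Music", Limit: 2 }).then(items => console.log(items.length)); // 5
```

Because the last LastEvaluatedKey is effectively a cursor, it is also the kind of value a reader could hand to the next invocation when recursing.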
Example:
// Scans the entire Music table
dynamoDb({
  operation: "SCAN",
  dynamoDbParams: { TableName: "Music" }
});
// Queries the Music table with more complex DynamoDB parameters
// (assumes Artist is the table's partition key: a Query needs a KeyConditionExpression)
dynamoDb({
  operation: "QUERY",
  dynamoDbParams: {
    TableName: "Music",
    Limit: 1,
    ExpressionAttributeValues: {
      ":a": "Bar"
    },
    KeyConditionExpression: "Artist = :a"
  }
});
s3(readerOptions)
Creates a reader for an array stored in S3.
- path
  - Mandatory
  - The path to the array to be processed
  - Set to '.' if the object stored in S3 is the array itself
  - Set to a path if an object is stored in S3 and the array is a property of that object
  - lodash.get is used to retrieve the array
- s3Params
  - Mandatory
  - This parameter is used when s3.getObject is called to retrieve the array stored in S3
- s3 = new AWS.S3()
  - Optional
  - Set this option if you need to customise the AWS.S3 instantiation
  - Used for the S3 operations
Example:
// Reads an array from array.json in MyBucket
s3({
  path: ".",
  s3Params: {
    Bucket: "MyBucket",
    Key: "array.json"
  }
});
// Reads the array retrieved at database.music[0]["category"].list from object.json in MyBucket
s3({
  path: 'database.music[0]["category"].list',
  s3Params: {
    Bucket: "MyBucket",
    Key: "object.json"
  }
});
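How path selects the array from the parsed S3 object can be sketched as follows. `getPath` is a hypothetical, simplified stand-in for `lodash.get` (which the reader actually uses) that only understands dots and bracketed keys; the special-casing of `'.'` is an assumption based on the option description above.

```javascript
// Hypothetical, simplified stand-in for lodash.get; real lodash.get
// supports more path syntax. Used here only to illustrate the path option.
function getPath(object, path) {
  if (path === ".") return object; // the stored object is the array itself
  // Turn 'a.b[0]["c"]' into ['a', 'b', '0', 'c'].
  const keys = path
    .replace(/\[["']?([^\]"']+)["']?\]/g, ".$1")
    .split(".")
    .filter(Boolean);
  return keys.reduce((value, key) => (value == null ? undefined : value[key]), object);
}

const stored = { database: { music: [{ category: { list: ["a", "b"] } }] } };
console.log(getPath(stored, 'database.music[0]["category"].list')); // [ 'a', 'b' ]
console.log(getPath([1, 2, 3], ".")); // [ 1, 2, 3 ]
```

A path that does not match anything resolves to undefined rather than throwing, mirroring lodash.get's behaviour for missing keys.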