0.0.5 • Published 5 years ago

@statengine/se-ingest-router v0.0.5

License
GPL-3.0

se-ingest-router

The se-ingest-router service normalizes raw ingest data and routes the normalized data to an ingest partition queue, where it is consumed by other services. se-ingest-router should be run as a persistent container that consumes messages from an SQS queue.

Message processing overview

  1. Because of size constraints, incoming messages are simply pointers to objects in S3:
{
  s3Location: {
    Bucket: '....',
    Key: '...',
  }
}

After receiving a message from SQS, the router fetches the real payload that this pointer refers to.

  2. After fetching the payload, the router does two things:
  • determines the sequence number
  • normalizes the message using the appropriate siamese package
  3. Based on the incident number, the router routes the normalized message to the appropriate AMQP queue
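
The steps above can be sketched roughly as follows. This is a minimal illustration, not the router's actual code: `processMessage` and the injected `fetchObject`, `getSequenceNumber`, `normalize`, `selectQueue`, and `publish` functions are hypothetical names, and the `incidentNumber` field on the normalized message is an assumption.

```javascript
// Hypothetical sketch of the message flow. Every helper in `deps` is an
// assumed, injected function; none of these names come from the package.
async function processMessage(body, deps) {
  const { fetchObject, getSequenceNumber, normalize, selectQueue, publish } = deps;

  // 1. The SQS message body is only a pointer to an object in S3.
  const pointer = typeof body === 'string' ? JSON.parse(body) : body;
  const { Bucket, Key } = pointer.s3Location || {};
  if (!Bucket || !Key) {
    throw new Error('message is missing s3Location.Bucket or s3Location.Key');
  }

  // Fetch the real payload that the pointer refers to.
  const payload = await fetchObject({ Bucket, Key });

  // 2. Determine the sequence number and normalize the payload.
  const sequenceNumber = getSequenceNumber(Key, payload);
  const normalized = normalize(payload);

  // 3. Route by incident number (assumed field name) to an AMQP queue.
  const queue = selectQueue(normalized.incidentNumber);
  await publish(queue, { ...normalized, sequenceNumber });
  return { queue, sequenceNumber };
}
```

Passing the S3 fetch, normalizer, and publisher in as functions keeps the sketch independent of any particular AWS or AMQP client library.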

Routing semantics

The following guidelines should be adhered to regarding routing logic.

1) Messages regarding the same incident should ALWAYS be routed to the same ingest partition. Each ingest partition processes messages sequentially, so routing every message for a given incident to the same processor guarantees that those messages are processed sequentially. This avoids read/write collisions and brings other benefits, such as improved cache hits on the same processor.
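
One common way to satisfy this guideline is to hash the incident number and reduce it modulo the number of partitions. The sketch below illustrates that approach; it is an assumption, not necessarily how se-ingest-router selects a partition. The names `partitionFor` and `queueFor` and the `ingest-partition-N` queue naming scheme are hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical helper: maps an incident number to one of `partitionCount`
// partitions. The same incident number always yields the same partition.
function partitionFor(incidentNumber, partitionCount) {
  if (!Number.isInteger(partitionCount) || partitionCount < 1) {
    throw new RangeError('partitionCount must be a positive integer');
  }
  // Hash the incident number so the mapping is deterministic and roughly uniform.
  const digest = crypto.createHash('sha256').update(String(incidentNumber)).digest();
  // Read the first 4 bytes as an unsigned integer and reduce modulo the partition count.
  return digest.readUInt32BE(0) % partitionCount;
}

// Hypothetical queue naming scheme, for illustration only.
function queueFor(incidentNumber, partitionCount) {
  return `ingest-partition-${partitionFor(incidentNumber, partitionCount)}`;
}
```

With this kind of scheme, changing the partition count remaps incidents to different partitions, so messages already in flight for an incident would need to be drained before resizing.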

Sequence Number Semantics

The sequence number plays an important role in downstream processing. It allows us to handle messages that are received out of order from the source (e.g. due to publisher race conditions, non-guaranteed AWS processing order, or parallel consumption from SQS). By establishing a sequence number, we know which messages occurred first. This is especially important for departments that send multiple messages per incident (such as Boston, DC, Tucson).

We attempt to determine the sequence number from the id (a.k.a. the unique filename). If the filename doesn't follow an established convention, we fall back to the message timestamp.
See test/getSequenceNumber for examples, or when modifying the logic.
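
The fallback behaviour can be illustrated with a short sketch. The filename convention used here (an id ending in `_<digits>` before an optional extension, such as `F123_0002.json`) is purely hypothetical, as is the function signature; the conventions actually supported are the ones covered in test/getSequenceNumber.

```javascript
// ASSUMED convention (hypothetical): the id ends in "_<digits>" before an
// optional file extension, e.g. "F123_0002.json". The real conventions are
// defined by the project's own tests, not by this pattern.
const SEQUENCE_PATTERN = /_(\d+)(?:\.[A-Za-z0-9]+)?$/;

function getSequenceNumber(id, timestamp) {
  const match = SEQUENCE_PATTERN.exec(String(id));
  if (match) {
    // The filename follows the assumed convention: use its numeric suffix.
    return Number.parseInt(match[1], 10);
  }
  // Fallback: use the message timestamp as milliseconds since the epoch.
  const millis = new Date(timestamp).getTime();
  if (Number.isNaN(millis)) {
    throw new Error('cannot determine sequence number: no filename match and invalid timestamp');
  }
  return millis;
}
```

In this sketch the two branches return values on different scales (a small counter versus epoch milliseconds), so it only orders messages correctly when all messages for an incident take the same branch.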