gs-storage v1.0.0

License
ISC
Last release
2 years ago

Overview

The storage layer of the GrafSync system is responsible for all writes to the system based on the consumption of messages from the queue.

Startup

Startup of the server and storage layers can happen in either order. There is a two-way sync to ensure that both are ready to consume messages.

The storage layer consumes existing messages during startup. It sends out a 'systemReady' message and, when it consumes that message itself, pauses consumption and waits for the server layer to consume the same message. When the server layer consumes that message, it loads any data it needs, knowing that the storage layer has processed the messages up to the same point, and then calls the 'signal' endpoint in the storage layer to say it is ready. At this point both layers start consuming further messages.
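The storage layer's side of this handshake can be sketched as a small state machine. This is an illustration only: the function name, the `id` field on the 'systemReady' message, the in-memory `held` buffer (standing in for paused consumption) and the handling of a signal that arrives early are all assumptions, not details taken from the GrafSync code.

```javascript
// Hypothetical sketch of the storage layer's startup handshake.
// States: REPLAYING (catching up) -> PAUSED (waiting for server) -> RUNNING.
function createStorageStartup(publish, startupId) {
  let state = 'REPLAYING';
  let serverSignalled = false; // assumption: a signal may arrive before we pause
  const processed = [];        // messages that have been applied
  const held = [];             // stands in for messages left unconsumed while paused

  // Send the marker message during startup.
  publish({ type: 'systemReady', id: startupId });

  function resume() {
    state = 'RUNNING';
    while (held.length > 0) processed.push(held.shift());
  }

  // Called for every message taken from the queue, in order.
  function onMessage(message) {
    if (state === 'PAUSED') {
      held.push(message);
      return;
    }
    const isOwnMarker =
      message.type === 'systemReady' && message.id === startupId;
    if (state === 'REPLAYING' && isOwnMarker) {
      // We have caught up to our own marker: wait for the server layer,
      // unless it has already signalled.
      if (serverSignalled) resume();
      else state = 'PAUSED';
      return;
    }
    processed.push(message);
  }

  // Called when the server layer hits the 'signal' endpoint.
  function signal() {
    serverSignalled = true;
    if (state === 'PAUSED') resume();
  }

  return { onMessage, signal, getState: () => state, processed };
}
```

Because the signal is remembered if it arrives first, the sketch behaves the same whichever layer starts up first.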

Stack creation

Stack creation is based on the idea that an instance has a number of 'subscription units' that it can support. Different types of subscription are allocated different numbers of units. Subscription types are grouped into sets, and only subscription types in the same set can be put into the same bucket.

If a new subscription is required, the code checks whether any existing bucket has space and chooses the first one in the results. If no bucket has enough space, a new stack is created.
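The allocation rule amounts to a first-fit search restricted to buckets of the same set. The sketch below assumes made-up subscription types, unit costs and set names, since the real values are not listed here; `bucketUnits` plays the role of the BUCKET_UNITS setting.

```javascript
// Illustrative values only: the real subscription types, unit costs and
// set groupings are not documented in this README.
const UNITS_BY_TYPE = { trial: 1, standard: 2, enterprise: 4 };
const SET_BY_TYPE = { trial: 'shared', standard: 'shared', enterprise: 'dedicated' };

// First-fit allocation: reuse the first bucket of the right set with enough
// free units, otherwise create a new bucket (standing in for a new stack).
function allocate(buckets, subscriptionType, bucketUnits) {
  const units = UNITS_BY_TYPE[subscriptionType];
  if (units === undefined) {
    throw new Error(`Unknown subscription type: ${subscriptionType}`);
  }
  const set = SET_BY_TYPE[subscriptionType];

  let bucket = buckets.find(
    (b) => b.set === set && b.usedUnits + units <= bucketUnits
  );
  let created = false;
  if (!bucket) {
    bucket = { number: buckets.length + 1, set, usedUnits: 0 };
    buckets.push(bucket);
    created = true;
  }
  bucket.usedUnits += units;
  return { bucketNumber: bucket.number, created };
}
```

With `bucketUnits` set to 4, a 'standard' subscription followed by a 'trial' share bucket 1, while a second 'standard' no longer fits and triggers a new bucket.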

Stack deletion is currently manual.

Message consumption

Messages are consumed one at a time. Every message is acknowledged, either positively (ACK) or negatively (NACK). If a failure occurs, the message is still acknowledged and the subscription is marked as 'failed'. This avoids blocking other subscriptions from being processed.
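A minimal sketch of this acknowledge-always rule follows. The message shape (`id`, `subscriptionId`) and the function name are hypothetical, and mapping a failure to NACK is an assumption; the text above only says that a failed message is still acknowledged. A real handler would likely be asynchronous, but the sequential structure would be the same.

```javascript
// Hypothetical sketch: process messages strictly one at a time and always
// acknowledge, so a failing subscription never blocks the ones behind it.
function consumeQueue(queue, handler) {
  const acks = [];
  const failedSubscriptions = new Set();

  for (const message of queue) {
    try {
      handler(message);
      acks.push({ id: message.id, result: 'ACK' });
    } catch (err) {
      // Assumption: a failure is reported as NACK. Either way the message
      // is acknowledged and the subscription is flagged as failed.
      failedSubscriptions.add(message.subscriptionId);
      acks.push({ id: message.id, result: 'NACK' });
    }
  }
  return { acks, failedSubscriptions };
}
```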

Email sending

In general, emails are sent by the storage instance master in the region in which the message originated. The exception is system errors, which may require any instance to send an email to support.

Emails are sent via Amazon SES and applicable configuration of SES is needed for emails to be handled.
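The sending rule can be expressed as a single predicate. The field names below (`isRegionMaster`, `region`, `originRegion`, `kind`) are hypothetical and chosen only to illustrate the rule; they are not taken from the GrafSync message format.

```javascript
// Hypothetical sketch of the rule deciding whether this instance sends an email.
function shouldSendEmail(instance, message) {
  // System errors are the exception: any instance may email support.
  if (message.kind === 'systemError') return true;
  // Otherwise only the master in the message's originating region sends.
  return instance.isRegionMaster && instance.region === message.originRegion;
}
```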

Imports

All imported files processed by the server layer are uploaded to an S3 bucket once checked and validated. The storage layer will retrieve the file and load it into a staging named graph, apply final processing and then transfer it into the applicable 'live' named graph.
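One way to picture the staging-to-live flow is as a sequence of SPARQL 1.1 Update operations. The sketch below is an assumption about how this could be done, not the actual GrafSync queries: it uses `LOAD`, `ADD` and `DROP`, takes the file location and graph names as parameters, and leaves out the final processing step because it is not described here. Whether the RDF store permits `LOAD` from a local file IRI depends on its configuration.

```javascript
// Wrap a value as an IRI reference, rejecting characters that are not
// allowed inside <...> so that values cannot break out of the update string.
function iri(value) {
  if (/[<>"{}|^`\\\s]/.test(value)) {
    throw new Error(`Invalid IRI: ${value}`);
  }
  return `<${value}>`;
}

// Hypothetical sketch: build the updates for one import, in execution order.
function buildImportUpdates(fileIri, stagingGraph, liveGraph) {
  return [
    // 1. Load the retrieved file into the staging named graph.
    `LOAD ${iri(fileIri)} INTO GRAPH ${iri(stagingGraph)}`,
    // 2. Final processing of the staging graph would run here (not shown).
    // 3. Copy the staged triples into the live graph (assumes a merge,
    //    not a replacement, of the live graph's contents).
    `ADD GRAPH ${iri(stagingGraph)} TO GRAPH ${iri(liveGraph)}`,
    // 4. Remove the staging graph.
    `DROP GRAPH ${iri(stagingGraph)}`,
  ];
}
```

Using `ADD` keeps existing triples in the live graph; if an import should replace the live graph instead, `MOVE` would be the corresponding operation.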

Only one S3 bucket is used by all regions the system is deployed in.

Environment variables

| Name | Values | Purpose |
| --- | --- | --- |
| API_ENDPOINT | URL | The web address and port of the server layer API for use in communications to users, such as emails, e.g. http://graphologi.graphifi.com |
| AUTO_REGISTER | true or false | Indicates if users should be automatically registered (i.e. no email confirmation). |
| AWS_REGION | AWS region name, e.g. eu-west-2 | The region in which this instance is deployed. |
| BACKUP_DIRECTORY | Folder path | The path to the folder where the RDF store will back up data. |
| BUCKET_MODE | SINGLE or omit | Controls whether all new subscriptions go into a single bucket. |
| BUCKET_UNITS | Integer | Defines the maximum number of subscription units on a single instance. |
| DATA_DIRECTORY | Folder path | The path to the folder where the RDF store will store data. |
| LUCENCE_DIRECTORY | Folder path | The path to the folder where the RDF store will store search indexes (if applicable). |
| DB | FUSEKI | The RDF store being used by the system. |
| KB_SERVICE | Relative folder path | The path to the database service code, e.g. /fuseki/fusekiService. |
| INSTANCE_TYPE | FULL or omit | Controls whether the instance acts as a subscription master. |
| INTEGRATION_EY | String | The text to prepend to an integration key. |
| IS_STACK_CREATOR | YES or omit | Indicates whether this instance is responsible for creating AWS stacks (only one instance globally). |
| LOG_DIRECTORY | Folder path | The folder for logging. |
| MESSAGE_HOST | Web address | The address of the AmazonMQ instance. |
| MESSAGE_PORT | Port number | The port number on which the AmazonMQ instance will listen. |
| MESSAGE_USERNAME | Username | The username of the AmazonMQ login. |
| MESSAGE_PASSWORD | Password | The password of the AmazonMQ login. |
| MESSAGE_SSL | true or false | Indicates whether the connection with AmazonMQ is over SSL. |
| MESSAGE_TOPIC | String | The name of the AmazonMQ topic used by GrafSync. |
| MUTIPLE_TRIALS | true or false | Indicates whether multiple trial subscriptions are permitted for a user (for test purposes). |
| PATH_TO_FILES | Folder path | The path to files used by the system to load data into the RDF store. |
| PORT | Port number | Controls the port number the storage layer will listen on. |
| REGIONS | Number | The number of regions in which the system will be deployed. |
| S3_IMPORT_BUCKET | S3 bucket name | The name of the bucket that is used for storage of import files. |
| S3_SNAPSHOT_BUCKET | S3 bucket name | The name of the bucket that is used for storage of snapshots. |
| S3_UPGRADE_BUCKET | S3 bucket name | The name of the bucket that is used for storing backups for subscription porting. |
| S3_REGION | AWS region name | The name of the region in which the import S3 bucket is located. |
| SERVER_ENDPOINT | URL | The web address and port of the server layer used internally. |
| SERVER_NUMBER | Integer | The bucket number of the instance. |
| SPARQL_ENDPOINT | URL | The web address and port of the SPARQL endpoint. |
| SYSTEM_SPARQL_ENDPOINT | URL | The web address of the regional master SPARQL endpoint. |
| UPLOAD_DIRECTORY | Folder path | Upload directory |
| VERSIONS_DIRECTORY | Folder path | Versions directory |
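
As an illustration of how a few of these variables might be read at startup, the sketch below parses a subset of them into a typed object. The function name, the choice of which variables are treated as required, and the shape of the returned object are assumptions made for the example; only the variable names and their documented values come from the table above.

```javascript
// Hypothetical sketch: read a subset of the environment variables above.
function loadConfig(env) {
  // Assumption: these four are treated as mandatory in this example.
  const required = ['MESSAGE_HOST', 'MESSAGE_PORT', 'MESSAGE_TOPIC', 'PORT'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  const toInt = (name) => {
    const value = Number.parseInt(env[name], 10);
    if (Number.isNaN(value)) {
      throw new Error(`${name} must be an integer, got "${env[name]}"`);
    }
    return value;
  };

  return {
    port: toInt('PORT'),
    messageHost: env.MESSAGE_HOST,
    messagePort: toInt('MESSAGE_PORT'),
    messageTopic: env.MESSAGE_TOPIC,
    messageSsl: env.MESSAGE_SSL === 'true',
    // "X or omit" variables are enabled only by their documented value.
    singleBucketMode: env.BUCKET_MODE === 'SINGLE',
    isSubscriptionMaster: env.INSTANCE_TYPE === 'FULL',
    isStackCreator: env.IS_STACK_CREATOR === 'YES',
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)`, so that a missing or malformed value fails at startup rather than on first use.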