gs-storage v1.0.0
Overview
The storage layer of the GrafSync system is responsible for all writes to the system, which it performs by consuming messages from the queue.
Startup
Startup of the server and storage layers can happen in either order. There is a two-way sync to ensure that both are ready to consume messages.
The storage layer consumes existing messages during startup. It publishes a 'systemReady' message and, when it consumes that message itself, pauses consumption and waits for the server layer to consume the same message. When the server layer consumes that message, it loads any data it needs, knowing that the storage layer has processed the messages up to the same point, and then calls the 'signal' endpoint in the storage layer to say it is ready. At this point both layers start consuming further messages.
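The handshake can be illustrated with a small in-memory simulation. This is only a sketch: the `StorageLayer` class, the `start_up` function and the list standing in for the queue are hypothetical, and the real 'signal' endpoint is replaced by a plain method call.

```python
from __future__ import annotations

from dataclasses import dataclass, field

SYSTEM_READY = "systemReady"


@dataclass
class StorageLayer:
    """Hypothetical stand-in for the storage layer's consumer."""
    processed: list[str] = field(default_factory=list)
    paused: bool = False

    def consume(self, message: str) -> None:
        if self.paused:
            raise RuntimeError("storage layer is paused")
        if message == SYSTEM_READY:
            # Our own marker: the backlog is done, so wait for the server.
            self.paused = True
            return
        self.processed.append(message)

    def signal(self) -> None:
        # Stand-in for the 'signal' endpoint called by the server layer.
        self.paused = False


def start_up(queue: list[str], storage: StorageLayer) -> None:
    # The storage layer publishes the marker behind the existing backlog.
    queue.append(SYSTEM_READY)
    for message in queue:
        storage.consume(message)
    # The server layer reads the same queue; on reaching the marker it would
    # load its data (omitted here) and then signal that it is ready.
    for message in queue:
        if message == SYSTEM_READY:
            storage.signal()
            break
```

After `start_up` returns, the storage layer has processed the whole backlog and is no longer paused, so both sides can continue from the same point in the queue.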
Stack creation
Stack creation is based on the idea that an instance has a number of 'subscription units' that it can support. Different types of subscription are allocated different numbers of units. Subscription types are grouped into sets, and only subscription types in the same set can be put into the same bucket.
If a new subscription is required, the code checks whether any existing bucket has space and chooses the first one in the results. If no bucket has enough space, a new stack is created.
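The allocation described above is a first-fit placement, which might be sketched as follows. The `Bucket` class, the `allocate` function, the set names and the unit counts are all hypothetical; `capacity` plays the role of BUCKET_UNITS, and appending to the list stands in for creating a new stack.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bucket:
    number: int
    subscription_set: str  # only types from this set may share the bucket
    capacity: int          # maximum subscription units (cf. BUCKET_UNITS)
    used: int = 0


def allocate(buckets: list[Bucket], subscription_set: str,
             units: int, capacity: int) -> Bucket:
    """Place a subscription in the first bucket with room, else add a bucket."""
    for bucket in buckets:
        same_set = bucket.subscription_set == subscription_set
        if same_set and bucket.capacity - bucket.used >= units:
            bucket.used += units
            return bucket
    # No existing bucket fits: this is where a new stack would be created.
    new_bucket = Bucket(len(buckets) + 1, subscription_set, capacity, units)
    buckets.append(new_bucket)
    return new_bucket
```

With a capacity of 10 units, two 6-unit subscriptions from the same set end up in different buckets, while a later 4-unit subscription from that set fills the remaining space in the first bucket.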
Stack deletion is currently manual.
Message consumption
Messages are consumed one at a time. Every message is acknowledged, either positively (ACK) or negatively (NACK). If a failure occurs, the message is still acknowledged and the subscription is marked as 'failed'. This avoids blocking other subscriptions from being processed.
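A minimal sketch of this consumption loop is shown below. The message shape (a dict with `id` and `subscription` keys) and the `consume` function are hypothetical, and the sketch assumes that a failure maps to a NACK, which the text above does not state explicitly.

```python
from __future__ import annotations

from typing import Callable, Iterable


def consume(messages: Iterable[dict],
            handler: Callable[[dict], None]) -> tuple[list[tuple], set]:
    """Process messages strictly one at a time, acknowledging every one."""
    acknowledgements = []
    failed_subscriptions = set()
    for message in messages:
        try:
            handler(message)
            acknowledgements.append((message["id"], "ACK"))
        except Exception:
            # Assumption: a failure is acknowledged negatively. The affected
            # subscription is marked as failed and the loop carries on, so
            # other subscriptions are not blocked.
            failed_subscriptions.add(message["subscription"])
            acknowledgements.append((message["id"], "NACK"))
    return acknowledgements, failed_subscriptions
```

Because the exception is caught per message, one failing subscription never stops the messages that follow it from being handled.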
Email sending
In general, emails are sent by the storage instance master in the region in which the message originated. The exception is system errors, for which any instance may need to send an email to support.
Emails are sent via Amazon SES, which must be configured appropriately for emails to be handled.
Imports
All imported files processed by the server layer are uploaded to an S3 bucket once checked and validated. The storage layer retrieves the file and loads it into a staging named graph, applies final processing and then transfers it into the applicable 'live' named graph.
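The staging-then-live flow could be expressed as a sequence of SPARQL 1.1 Update operations. The sketch below only builds the update strings; the `import_updates` function, the graph IRIs and the file IRI are hypothetical, and the use of `LOAD`, `ADD` and `DROP` is an assumption rather than a description of how the storage layer actually does it.

```python
from __future__ import annotations


def import_updates(file_iri: str, staging_graph: str, live_graph: str,
                   processing: list[str] | None = None) -> list[str]:
    """Build the SPARQL updates for one import, in execution order."""
    updates = [f"LOAD <{file_iri}> INTO GRAPH <{staging_graph}>"]
    # Final processing runs against the staging graph only, so a failure
    # here leaves the live graph untouched.
    updates.extend(processing or [])
    updates.append(f"ADD GRAPH <{staging_graph}> TO GRAPH <{live_graph}>")
    updates.append(f"DROP GRAPH <{staging_graph}>")
    return updates
```

Keeping the load and processing steps in a separate graph means the live data is only modified by the final transfer.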
Only one S3 bucket is used by all regions the system is deployed in.
Environment variables
Name | Values | Purpose |
---|---|---|
API_ENDPOINT | URL | The web address and port of the server layer API for use in communications to users, such as emails, e.g. http://graphologi.graphifi.com |
AUTO_REGISTER | true or false | Indicates if users should be automatically registered (i.e. no email confirmation). |
AWS_REGION | AWS region name, e.g. eu-west-2 | The region in which this instance is deployed. |
BACKUP_DIRECTORY | Folder path | The path to the folder where the RDF store will back up data. |
BUCKET_MODE | SINGLE or omit | Controls whether all new subscriptions go into a single bucket. |
BUCKET_UNITS | Integer | Defines the maximum number of subscription units on a single instance. |
DATA_DIRECTORY | Folder path | The path to the folder where the RDF store will store data. |
LUCENCE_DIRECTORY | Folder path | The path to the folder where the RDF store will store search indexes (if applicable). |
DB | FUSEKI | The RDF store being used by the system. |
KB_SERVICE | Relative folder path | The path to the database service code, e.g. /fuseki/fusekiService. |
INSTANCE_TYPE | FULL or omit | Controls whether the instance acts as a subscription master. |
INTEGRATION_EY | String | The text to prepend an integration key with. |
IS_STACK_CREATOR | YES or omit | Indicates whether this instance is responsible for creating AWS stacks (only one instance globally). |
LOG_DIRECTORY | Folder path | The path to the folder where logs are written. |
MESSAGE_HOST | Web address | The address of the AmazonMQ instance. |
MESSAGE_PORT | Port number | The port number on which the AmazonMQ instance will listen. |
MESSAGE_USERNAME | Username | The username of the AmazonMQ login. |
MESSAGE_PASSWORD | Password | The password of the AmazonMQ login. |
MESSAGE_SSL | true or false | Indicates whether the connection with AmazonMQ is over SSL. |
MESSAGE_TOPIC | String | The name of the AmazonMQ topic used by GrafSync. |
MUTIPLE_TRIALS | true or false | Indicates whether multiple trial subscriptions are permitted for a user (for test purposes). |
PATH_TO_FILES | Folder path | The path to files used by the system to load data into the RDF store. |
PORT | Port number | Controls the port number the storage layer will listen on. |
REGIONS | Number | The number of regions in which the system will be deployed. |
S3_IMPORT_BUCKET | S3 bucket name | The name of the bucket that is used for storage of import files. |
S3_SNAPSHOT_BUCKET | S3 bucket name | The name of the bucket that is used for storage of snapshots. |
S3_UPGRADE_BUCKET | S3 bucket name | The name of the bucket that is used for storing backups for subscription porting. |
S3_REGION | AWS region name | The name of the region in which the import S3 bucket is located. |
SERVER_ENDPOINT | URL | The web address and port of the server layer used internally. |
SERVER_NUMBER | Integer | The bucket number of the instance. |
SPARQL_ENDPOINT | URL | The web address and port of the SPARQL endpoint. |
SYSTEM_SPARQL_ENDPOINT | URL | The web address of the regional master SPARQL endpoint. |
UPLOAD_DIRECTORY | Folder path | The path to the upload directory. |
VERSIONS_DIRECTORY | Folder path | The path to the versions directory. |
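As an illustration of how the "or omit" and "true or false" conventions in the table might be interpreted, here is a small sketch that parses a handful of the variables. The `StorageConfig` class and `load_config` function are hypothetical, and treating a missing MESSAGE_SSL as false is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class StorageConfig:
    port: int
    bucket_units: int
    message_ssl: bool
    single_bucket: bool
    is_stack_creator: bool
    is_full_instance: bool


def load_config(env: Mapping[str, str]) -> StorageConfig:
    """Parse a subset of the variables; pass os.environ in real use."""
    return StorageConfig(
        port=int(env["PORT"]),
        bucket_units=int(env["BUCKET_UNITS"]),
        # Assumption: an absent MESSAGE_SSL means no SSL.
        message_ssl=env.get("MESSAGE_SSL", "false").strip().lower() == "true",
        # "X or omit" variables are on only when set to the listed value.
        single_bucket=env.get("BUCKET_MODE") == "SINGLE",
        is_stack_creator=env.get("IS_STACK_CREATOR") == "YES",
        is_full_instance=env.get("INSTANCE_TYPE") == "FULL",
    )
```

Accepting any mapping rather than reading `os.environ` directly keeps the parsing easy to check with a plain dictionary.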