0.24.5 • Published 6 months ago

blobby v0.24.5

License
MIT

blobby

No, not that Mr. Blobby.

Blobby is an HTTP proxy for blob storage systems (such as S3) that automatically shards and replicates your data. Useful for both single and multi datacenter architectures, blobby scales your storage and throughput by way of sharding, and enables fast local reads in multi datacenter replication setups. Additionally, blobby provides a simple CLI for analyzing complex data architectures by way of storage comparisons, repairs, stats, and more.

Installation

Blobby can be installed as a local dependency of your app:

npm i blobby --save
./node_modules/.bin/blobby

Or installed globally:

npm i blobby -g
blobby

Basic Usage

Start the HTTP Proxy Server:

blobby server

Copy between storage systems:

blobby copy myOldStorage myNewStorage

See help for a full list of commands:

blobby help

See Full Command List for a description of each command.

Options

A number of configuration formats are supported, including JSON, JSON5, CommonJS, and Secure Configurations.

Option | Type | Default | Desc
config | arrayOf(string) | [] | One or more configuration files. If none are provided, config-env will be used
config-dir | string | "config" | Directory of configuration files
config-env | string | "NODE_ENV" | Environment variable used to detect configuration
config-default | string | "local" | Default configuration to use if environment is not available
config-base | string | none | If specified, this configuration is used as the base (defaults) config that will be deep merged
config-exts | arrayOf(string) | ['.json', '.json5', '.js'] | Supported extensions to detect for configuration files
secure-config | string | none | Directory of secure configuration files
secure-secret | string | none | The secret required to decrypt secure configuration files
secure-file | string | none | File to load that holds the secret required to decrypt secure configuration files
mode | string | "headers" | Used when comparing files. For usage see Compare Modes
recursive | boolean | true | Enable deep query (recursive subdirectories) for operations that support it
removeGhosts | boolean | false | For repairs: if true, removes missing file instances instead of copying to the missing storage
resume-key | string | none | If a previous command was stopped, you can resume from where you left off with this option
date-min | string | none | Minimum date required when processing records, all others are ignored
date-max | string | none | Maximum date required when processing records, all others are ignored
retry-min | number | 1000 | Minimum timeout (in ms) for first retry, where retries are applicable
retry-factor | number | 2 | Multiple in time applied to retry attempts, where retries are applicable
retry-attempts | number | 3 | Maximum retry attempts before failure is reported, where retries are applicable
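
To get a feel for how retry-min, retry-factor, and retry-attempts interact, the sketch below computes a delay schedule. It assumes a plain exponential backoff (delay = min * factor^attempt), which is the usual reading of "multiple in time applied to retry attempts"; blobby's actual schedule may differ, and retryDelays is a hypothetical helper, not part of blobby.

```javascript
// Hypothetical helper: computes wait times (in ms) before each retry,
// ASSUMING exponential backoff. Not taken from blobby's source.
function retryDelays({ min = 1000, factor = 2, attempts = 3 } = {}) {
  const delays = [];
  for (let attempt = 0; attempt < attempts; attempt++) {
    delays.push(min * Math.pow(factor, attempt));
  }
  return delays;
}

// With the CLI defaults (retry-min 1000, retry-factor 2, retry-attempts 3):
console.log(retryDelays()); // [ 1000, 2000, 4000 ]
```

Under this assumption the defaults add up to roughly seven seconds of waiting before a failure is reported.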

Example using the default NODE_ENV environment variable to load config data:

blobby server --config-dir lib/config
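
The options above suggest a simple lookup order: read the environment variable named by config-env, fall back to config-default, then look for that name with each extension in config-dir. The sketch below only illustrates that reading of the table; configCandidates is a hypothetical helper and blobby's real resolution logic may differ.

```javascript
const path = require('node:path');

// Hypothetical helper: lists the config files that would be probed,
// ASSUMING the lookup order described in the Options table.
function configCandidates(opts = {}, env = process.env) {
  const dir = opts['config-dir'] || 'config';
  const envVar = opts['config-env'] || 'NODE_ENV';
  const fallback = opts['config-default'] || 'local';
  const exts = opts['config-exts'] || ['.json', '.json5', '.js'];
  const name = env[envVar] || fallback; // e.g. "production", else "local"
  // posix.join keeps the output stable across platforms for this example
  return exts.map((ext) => path.posix.join(dir, name + ext));
}

console.log(configCandidates({ 'config-dir': 'lib/config' }, { NODE_ENV: 'production' }));
```

With NODE_ENV unset, the same call would list lib/config/local.json, lib/config/local.json5, and lib/config/local.js.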

Configuration

Name | Type | Default | Desc
http | HttpBindings | { "default": { "port": 80 } } | Collection (hash for ease of merging) of HTTP bindings
http.{id} | HttpBinding | (required) | HTTP Binding Object
http.{id}.port | number | 80 | Port to bind to
http.{id}.host | string | undefined | Host to bind to, or nothing to use Node.js default
http.{id}.ssl | Object | (required if enabling SSL) | See Node.js TLS Options
http.{id}.ssl.pfx | Buffer or string | none | If string, will attempt to load pfx from disk
http.{id}.ssl.key | Buffer or string | none | If string, will attempt to load private key from disk
http.{id}.ssl.cert | Buffer or string | none | If string, will attempt to load certificate from disk
httpAgent | Object or Boolean | Defaults | Initialize global http(s) agents with these options. Defaults are optimized for most scenarios.
httpHandler | string | undefined | If a path to a module (Function(req, res)) is provided, the parent app can peek into incoming requests. If the handler returns false, Blobby will ignore the request altogether and assume the parent is handling the response
storage | StorageBindings | (required) | Collection of storage bindings
storage.{id} | StorageBinding | (required) | Storage Binding Object
storage.{id}.driver | string | (required) | Module name/path to use as storage client
storage.{id}.maxUploadSize | number | none | Size in bytes allowed by uploads
storage.{id}.cacheControl | string | "public,max-age=31536000" | Default cache control headers to apply for GET's and PUT's if file does not provide it
storage.{id}.accessControl | string | "public-read" | Defaults to publicly readable. See Full ACL List
storage.{id}.dirSplit | number | false | (future) If Number, auto-split paths every N characters to make listing of directories much faster
storage.{id}.auth | string | none | Required to support Uploads and Deletes, see Secure API Operations
storage.{id}.replicas | arrayOf(string) | [] | Required to support Replication, see File Replication
storage.{id}.options | Object | {} | Options provided to storage driver
retry | RetryOptions | (optional) | Retry options used by some HTTP Server operations
retry.min | number | 500 | Minimum timeout (in ms) for first retry
retry.factor | number | 2 | Multiple in time applied to retry attempts
retry.retries | number | 3 | Maximum retry attempts before failure is reported
cors | CorsOptions | (optional) | CORS access is enabled by default, for GET's only
cors.access-control-allow-credentials | string | true | Allow credentials
cors.access-control-allow-headers | string | * | Allow headers
cors.access-control-allow-methods | string | GET | Allow methods
cors.access-control-allow-origin | string | * | Allow origins
cors.access-control-max-age | string | 86400 | Cache duration of CORS headers
auth | AuthOptions | (optional) | Collection of named auth groups
auth.{id}.driver | string | (required) | Path of the driver to load, such as blobby-auth-header
auth.{id}.options | Object | (optional) | Any options to pass to the auth driver
auth.{id}.publicReads | Boolean | true | Set to false if GET's also require auth
log | LogOptions | (optional) | Options based on EventEmitter
log.warnings | bool | true | Log warnings to console.warn automatically. You can subscribe to client.on('warn') if you prefer
log.errors | bool | true | Log errors to console.error automatically. You can subscribe to client.on('error') if you prefer
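
As a rough illustration of how these keys fit together, here is a minimal CommonJS configuration (for example config/local.js). Only the key names come from the table above; the storage ids (app, appBackup), the port, and the driver paths are placeholders chosen for this sketch, so substitute the drivers you actually use.

```javascript
// config/local.js: picked up when NODE_ENV is unset (config-default is "local").
// Driver paths and storage ids below are HYPOTHETICAL placeholders.
const config = {
  http: {
    default: { port: 8080 }
  },
  storage: {
    app: {
      driver: './lib/my-storage-driver', // placeholder module path
      maxUploadSize: 10 * 1024 * 1024,   // 10 MiB
      cacheControl: 'public,max-age=31536000',
      accessControl: 'public-read',
      auth: 'mainAuth',                  // must match a key under "auth"
      replicas: ['appBackup'],           // must match another storage id
      options: {}
    },
    appBackup: {
      driver: './lib/my-storage-driver', // placeholder module path
      options: {}
    }
  },
  auth: {
    mainAuth: {
      driver: './lib/my-jwt-handler',    // path reused from the example in this README
      options: {},
      publicReads: true
    }
  },
  retry: { min: 500, factor: 2, retries: 3 },
  log: { warnings: true, errors: true }
};

module.exports = config;
```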

Storage Drivers

Secure Configuration

For sensitive credentials, you can optionally use the included Config Shield support. Any secure configuration objects will be merged into the parent configuration object. If the secure-config option is provided, every configuration file is expected to have a corresponding secure configuration file with the same file name under the secure-config directory.

blobby server --secure-config config/secure --secure-file config/secure/secret.txt

Example for creating a secure configuration:

npm i config-shield -g
cshield config/secure/local.json config/secure/secret.txt
set storage { app1: { options: { password: 'super secret!' } } }
save
exit

See Config Shield for more advanced usage.
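
To picture what "merged into the parent configuration object" means for the example above, the sketch below deep merges a decrypted secure object into a regular config. deepMerge is a hypothetical helper written for this illustration (arrays and scalars are overwritten, plain objects are merged recursively); blobby's own merge rules are not documented here and may differ.

```javascript
// Hypothetical deep merge, for illustration only.
function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

function deepMerge(base, override) {
  const out = { ...base };
  for (const [key, value] of Object.entries(override)) {
    out[key] = isPlainObject(value) && isPlainObject(out[key])
      ? deepMerge(out[key], value) // merge nested objects key by key
      : value;                     // scalars and arrays replace the base value
  }
  return out;
}

// Regular config (placeholder values) and the secure object set via cshield above.
const base = { storage: { app1: { driver: './lib/my-storage-driver', options: { bucket: 'assets' } } } };
const secure = { storage: { app1: { options: { password: 'super secret!' } } } };

const merged = deepMerge(base, secure);
console.log(merged.storage.app1.options); // { bucket: 'assets', password: 'super secret!' }
```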

Server

Start HTTP Server using the provided Configuration.

blobby server

REST API

Method | Route | Auth | Info
GET | /{storageId}/{filePath} | Public | Get a file from storage
HEAD | /{storageId}/{filePath} | Public | Get info for file from storage
PUT | /{storageId}/{filePath} | Secure | Create or overwrite file in storage
PUT (copy) | /{storageId}/{filePath} | Secure | Copy file via experimental header x-amz-copy-source: [optional-bucket:]/source/path
DELETE | /{storageId}/{filePath} | Secure | Delete file from storage
GET | /{storageId}/{directoryPath}/ | Secure | Get directory contents by postfixing the path with /
DELETE | /{storageId}/{directoryPath}/ | Secure | Delete directory (recursively) from storage

Example Usage:

curl -XPUT -H "Authorization: ApiKey shhMySecret" --data-binary "@./some-file.jpg" http://localhost/myStorage/some/file.jpg
curl -I http://localhost/myStorage/some/file.jpg
curl http://localhost/myStorage/some/file.jpg
curl -H "Authorization: ApiKey shhMySecret" http://localhost/myStorage/some/
curl -XDELETE -H "Authorization: ApiKey shhMySecret" http://localhost/myStorage/some/file.jpg

Default permissions will be applied via storage.{id}.accessControl, but can be overridden via the x-amz-acl header, like so:

curl -XPUT -H "x-amz-acl: private" -H "Authorization: ApiKey shhMySecret" --data-binary "@./some-file.jpg" http://localhost/myStorage/some/file.jpg
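
The same requests can be prepared from Node.js. The sketch below only builds the URL and request options from the routes and headers documented above; buildRequest, its parameter names, and the base URL are assumptions made for this example, not part of blobby.

```javascript
// Hypothetical helper: builds the URL and fetch() options for a REST call.
function buildRequest(baseUrl, method, storageId, filePath, opts = {}) {
  const headers = {};
  if (opts.apiKey) headers['Authorization'] = `ApiKey ${opts.apiKey}`;
  if (opts.acl) headers['x-amz-acl'] = opts.acl; // overrides storage.{id}.accessControl
  if (opts.copySource) headers['x-amz-copy-source'] = opts.copySource;
  // Encode each path segment but keep "/" separators; a trailing "/" addresses a directory.
  const encodedPath = filePath.split('/').map(encodeURIComponent).join('/');
  const url = `${baseUrl.replace(/\/+$/, '')}/${encodeURIComponent(storageId)}/${encodedPath}`;
  return { url, options: { method, headers, body: opts.body } };
}

const { url, options } = buildRequest('http://localhost', 'PUT', 'myStorage', 'some/file.jpg', {
  apiKey: 'shhMySecret',
  acl: 'private',
  body: Buffer.from('example bytes')
});
console.log(url); // http://localhost/myStorage/some/file.jpg
// fetch(url, options) would then send the upload (Node.js 18+ ships a global fetch).
```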

The examples above are a natural segue into Secure API Operations.

Secure API Operations

As indicated in Configuration, storage.{id}.auth is required to support uploads and deletes.

Example Config:

  auth: {
    mainAuth: {
      driver: './lib/my-jwt-handler',
      options: { /* options only my auth driver will understand */ }
    }
  },
  storage: {
    store1: {
      driver: '...',
      auth: 'mainAuth' // uploads to store1 require mainAuth
    }
  }

If you're creating your own Authorization handler, you can export a module with the following format:

module.exports = function(req, storageId, fileKey, authConfig, cb) {
  doSomethingAsync((err) => {
    if (err) return void cb(err); // fail authorization

    cb(); // authorization check passed, let them through
  });
};

Your handler can be synchronous or asynchronous, but cb must be invoked in either case.
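
For comparison, here is what a synchronous handler might look like, checking the ApiKey scheme used in the curl examples. It assumes that authConfig is the matching auth.{id} entry and that an apiKey value lives under its options; both are assumptions for this sketch, so verify the actual shape your blobby version passes in.

```javascript
// Hypothetical synchronous auth handler.
// ASSUMES authConfig.options.apiKey holds the expected key.
function apiKeyAuth(req, storageId, fileKey, authConfig, cb) {
  const expected = authConfig && authConfig.options && authConfig.options.apiKey;
  const header = (req.headers && req.headers['authorization']) || '';

  if (!expected || header !== `ApiKey ${expected}`) {
    // Passing an error to cb fails authorization
    return void cb(new Error(`Not authorized for ${storageId}/${fileKey}`));
  }

  cb(); // authorization check passed, let them through
}

module.exports = apiKeyAuth;
```

Node.js lowercases incoming header names, which is why the handler reads req.headers['authorization'].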

Authorization Drivers

File Replication

As indicated in Configuration, storage.{id}.replicas is required to enable replication. An array of one or more replicas can be provided, each consisting of the storage identifier and, optionally, the configuration if the desired storage exists in a different environment (such as replication across data centers).

Format is [ConfigId::]StorageId, where ConfigId only needs to be specified if from a different environment.

Example of two replicas, one from same environment, other from a different environment:

replicas: ['myOtherStorage', 'otherConfig::AnotherStorage']
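
The [ConfigId::]StorageId format is easy to split programmatically. The helper below is a hypothetical illustration of the format, not blobby's own parser.

```javascript
// Hypothetical parser for the "[ConfigId::]StorageId" replica format.
function parseReplica(replica) {
  const idx = replica.indexOf('::');
  if (idx === -1) {
    // No ConfigId prefix: the storage lives in the current environment
    return { configId: null, storageId: replica };
  }
  return { configId: replica.slice(0, idx), storageId: replica.slice(idx + 2) };
}

console.log(parseReplica('myOtherStorage'));
// { configId: null, storageId: 'myOtherStorage' }
console.log(parseReplica('otherConfig::AnotherStorage'));
// { configId: 'otherConfig', storageId: 'AnotherStorage' }
```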

Important: Successful uploads (PUT's) and deletes (DELETE's) are only confirmed once all replicas have been written to. This avoids data inconsistencies and race conditions (i.e. performing an action on an asset before it has been written in all locations). The downside of this approach is that high availability is expected of every replica, and uploads (or deletes) will fail if one of the replicas cannot be written to. In cases where speed is more important than consistency, the querystring param waitForReplicas=0 can be set. Replication cannot be turned off without removing it from configuration, so this option only changes when success is returned: as soon as the local storage write succeeds.
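
A small sketch of adding the waitForReplicas=0 querystring param to a request URL. The helper name and the example URL are assumptions; only the parameter name and value come from the text above.

```javascript
// Hypothetical helper: appends waitForReplicas=0 to an upload/delete URL.
function skipReplicaWait(fileUrl) {
  const url = new URL(fileUrl);
  url.searchParams.set('waitForReplicas', '0');
  return url.toString();
}

console.log(skipReplicaWait('http://localhost/myStorage/some/file.jpg'));
// http://localhost/myStorage/some/file.jpg?waitForReplicas=0
```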

Full Command List

Commands:
  checkdir <dir> <storage..>  One-Way shallow directory compare between storage
                              bindings and/or environments
  check <storage..>           One-Way compare files between storage bindings
                              and/or environments
  compare <storage..>         Compare files between storage bindings and/or
                              environments
  copydir <dir> <storage..>   One-way shallow directory copy between storage
                              bindings and/or environments
  copy <storage..>            One-way copy of files between storage bindings
                              and/or environments
  shard <storage> <dir>       Look up the given shard for a given storage and
                              path
  initialize <storage..>      Perform any initialization tasks required by the
                              given storage (ex: pre-creating bucket shards in
                              S3)
  repair <storage..>          Repair files between storage bindings and/or
                              environments
  rmdir <dir> <storage..>     Delete files for the given directory and storage
                              bindings and/or environments
  server                      Start HTTP API Server
  acl <dir> <storage..>       Set ACL's for a given directory for the given
                              storage bindings and/or environments
  stats <storage..>           Compute stats for storage bindings and/or
                              environments

Compare

For comparing the difference between storage bindings and/or environments. This is a two-way comparison. Use check instead if you only want to do a one-way comparison.

blobby compare <storage..>

Example of comparing two bindings:

blobby compare old new

Example of comparing one binding across 2 datacenters:

blobby compare app --config dc1 dc2

Example of comparing two bindings across 2 datacenters:

blobby compare old new --config dc1 dc2

Compare Modes

blobby compare old new --mode deep

Available modes:

  • fast - A simple check of file existence. Only recommended when you're comparing stores configured for immutable data. Size check will also be performed, if the storage driver provides it.
  • headers (recommended) - Similar in speed to fast, but requires ETag or LastModified headers or comparison will fail. Should only be used between storage drivers that support at least one of these headers. NOTE: S3 should only be compared against other S3 storages in this mode due to their inability to overwrite these headers.
  • deep - Performs an ETag check if available, otherwise falls back to loading files and performing hash checks. This option can range from a little slower, to much slower, depending on ETag availability. Recommended for mutable storage comparisons where caching headers are not available (ex: comparing a file system with S3 or vice versa).
  • force - If you want to skip comparison for any reason, this will force the comparison to fail, resulting in an update of the destination for all source files. It is also the fastest option, since the destination does not need to be queried.
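
The four modes can be read as a decision function over file metadata. The sketch below is one interpretation of the descriptions above, not blobby's implementation: the record shape ({ exists, size, etag, lastModified, body }), the MD5 hash, and the strict equality check on LastModified are all assumptions made for illustration.

```javascript
const crypto = require('node:crypto');

// Hash used for the "deep" fallback; the choice of MD5 is an assumption.
function contentHash(buffer) {
  return crypto.createHash('md5').update(buffer).digest('hex');
}

// Hypothetical comparison: returns true when dst is considered identical to src.
function filesMatch(mode, src, dst) {
  if (mode === 'force') return false;     // always "different": destination is never queried
  if (!dst || !dst.exists) return false;  // missing on the destination
  switch (mode) {
    case 'fast':
      // Existence check, plus size when both drivers report one
      if (src.size != null && dst.size != null) return src.size === dst.size;
      return true;
    case 'headers':
      if (src.etag && dst.etag) return src.etag === dst.etag;
      if (src.lastModified && dst.lastModified) return src.lastModified === dst.lastModified;
      return false;                       // no comparable header: comparison fails
    case 'deep':
      if (src.etag && dst.etag) return src.etag === dst.etag;
      return contentHash(src.body) === contentHash(dst.body); // load and hash both files
    default:
      throw new Error(`Unknown compare mode: ${mode}`);
  }
}
```

For example, filesMatch('headers', { exists: true }, { exists: true }) fails because neither side exposes an ETag or LastModified value, which mirrors the warning in the headers description.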

Repair

For repairing the difference between storage bindings and/or environments. This is a two-way repair. Use copy instead if you only want to do a one-way repair.

blobby repair <storage..>

Example of syncing data between old and new storage:

blobby repair old new

Example of syncing one storage across 2 datacenters:

blobby repair app --config dc1 dc2

Example of syncing two storages across 2 datacenters:

blobby repair old new --config dc1 dc2

For usage of mode, see Compare Modes.

Stats

Query statistics against your storage(s).

blobby stats <storage..>

Example of querying stats for a single storage:

blobby stats old

Initialize

Useful one-time initialization required by some storage drivers, such as pre-creating shard buckets in S3.

blobby initialize <storage..>

Example of initializing a single storage:

blobby initialize new

Shard

Useful for identifying the location of a given directory for storage drivers that support sharding.

blobby shard <storage> <dir>

Example:

blobby shard new 'some/path'