s3db.js


Another way to create a cheap document-based database with an easy ORM to handle your dataset!

  1. Motivation
  2. Usage
    1. Install
    2. Quick Setup
    3. Insights
    4. Database
    5. Create a resource
  3. Resource methods
    1. Insert one
    2. Get one
    3. Update one
    4. Delete one
    5. Count
    6. Insert many
    7. Get many
    8. Get all
    9. Delete many
    10. Delete all
    11. List ids
  4. Resource streams
    1. Readable stream
    2. Writable stream
  5. S3 Client
  6. Events
  7. Plugins
  8. Examples
  9. Cost Simulation
    1. Big Example
    2. Small example
  10. Roadmap

Motivation

First of all:

  1. Nothing is for free, but it can be cheaper.
  2. I'm not responsible for your AWS costs strategy; use s3db.js at your own risk.
  3. Please do not use it in production!

Let's go!

You might know AWS's S3 product for its high availability and its cheap pricing rules. I'll show you another clever and fun way to use S3.

AWS allows you to define metadata for every single file you upload into your bucket. This attribute must fit within a 2 KB limit using UTF-8 encoding. As this encoding varies the byte width of each symbol, you get roughly 500 to 2,000 chars of metadata storage. Follow the docs at AWS S3 User Guide: Using metadata.

There is another subset of management data called tags, used globally as key/value params. You can assign up to 10 tags per object, where each key must be at most 128 unicode chars long and each value up to 256 chars. With those key/values we can use 2.5 KB more of data, and unicode allows up to roughly 2,500 more chars. Follow the official docs at AWS User Guide: Object Tagging.

With all this set, each object should give you up to about 4.5 KB of free storage space.

Check the cost simulation section below for a deep cost dive!

Let's give it a try! :)


Usage

You may check the snippets below or go straight to the Examples section!

Install

npm i s3db.js

# or

yarn add s3db.js

Quick setup

Our S3db client uses connection-string params.

import { S3db } from "s3db.js";

const {
  AWS_BUCKET,
  AWS_ACCESS_KEY_ID,
  AWS_SECRET_ACCESS_KEY,
} = process.env

const s3db = new S3db({
  uri: `s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}@${AWS_BUCKET}/databases/mydatabase`
});

s3db
  .connect()
  .then(() => console.log('connected!'));

If you use the dotenv package:

import * as dotenv from "dotenv";
dotenv.config();

import { S3db } from "s3db.js";

Insights

  • This ORM implementation simulates a document repository. Because s3db.js uses the aws-sdk's S3 api, all requests are GET/PUT on key=value resources, so the best-case scenario is to access it like a document store.

  • For better use of the cache and listing, the best ID format is sequential ids with leading zeros (e.g. 00001, 00002, 00003), due to S3's internal key-sorting method. But you will need to manage this incremental ID on your own, as sketched below.
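
For illustration, a minimal sketch of such zero-padded sequential ids (the pad and nextId helpers are hypothetical, not part of s3db.js, and persisting the counter between runs is up to you):

// hypothetical helpers for zero-padded incremental ids
const pad = (n, width = 5) => String(n).padStart(width, "0");

let counter = 0;
const nextId = () => pad(++counter); // "00001", "00002", "00003", ...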

Database

Your s3db.js client can be instantiated with options:

| option | optional | description | type | default |
| --- | --- | --- | --- | --- |
| cache | true | Persist searched data to reduce repeated requests | boolean | undefined |
| parallelism | true | Number of simultaneous tasks | number | 10 |
| passphrase | true | Your encryption secret | string | undefined |
| ttl | true | (Coming soon) TTL for your cache duration, in seconds | number | 86400 |
| uri | false | A URL as your S3 connection string | string | undefined |

Config example:

import fs from "fs";

const {
  AWS_BUCKET = "my-bucket",
  AWS_ACCESS_KEY_ID = "secret",
  AWS_SECRET_ACCESS_KEY = "secret",
  AWS_BUCKET_PREFIX = "databases/test-" + Date.now(),
} = process.env;

const uri = `s3://${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}@${AWS_BUCKET}/${AWS_BUCKET_PREFIX}`;

const options = {
  uri,
  parallelism: 25,
  passphrase: fs.readFileSync("./cert.pem"),
};

s3db.connect()

This method must always be invoked before any operation takes place. It interacts with AWS's S3 api and checks the items below (a usage sketch follows the list):

  1. With current credentials:
    • Check if client has access to the S3 bucket.
    • Check if client has access to bucket life-cycle policies.
  2. With defined database:
    • Check if there is already a database in this connection string.
      • If any database is found, it downloads its metadata and loads each Resource definition.
      • Otherwise, it generates an empty metadata file at this prefix and marks it as a new database from scratch.
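
A minimal usage sketch (the error handling shown is illustrative):

try {
  await s3db.connect(); // validates credentials and loads (or creates) the metadata file
  const leads = s3db.resource("leads");
} catch (err) {
  console.error("s3db connection failed:", err);
}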

Metadata file

s3db.js will generate a file /s3db.json at the pre-defined prefix with this structure:

{
  // file version
  "version": "1",

  // previously defined resources
  "resources": {
    // definition example
    "leads": {
      "name": "leads",

      // resource options
      "options": {},

      // resource defined schema
      "schema": {
        "name": "string",
        "token": "secret"
      },

      // rules to simplify metadata usage
      "mapper": {
        "name": "0",
        "token": "1"
      }
    }
  }
}

Create a resource

Resources are definitions of data collections.

// resource
const attributes = {
  utm: {
    source: "string|optional",
    medium: "string|optional",
    campaign: "string|optional",
    term: "string|optional",
  },
  lead: {
    fullName: "string",
    mobileNumber: "string",
    personalEmail: "email",
  },
};

const resource = await s3db.createResource({
  name: "leads",
  attributes,
});

Resource names must not prefix one another, like leads and leads-copy! S3's api lists keys using prefix notation, so every time you list leads, all keys of leads-copy will appear as well.

Attributes

s3db.js uses the fastest-validator package to define and validate your resources. A few examples:

const attributes = {
  // few simple examples
  name: "string|min:4|max:64|trim",
  email: "email|nullable",
  mobile: "string|optional",
  count: "number|integer|positive",
  currency: "currency|symbol:R$",
  createdAt: "date",
  website: "url",
  id: "uuid",
  ids: "array|items:uuid|unique",

  // s3db defines a custom type "secret" that is encrypted
  token: "secret",

  // nested data works as well
  geo: {
    lat: "number",
    long: "number",
    city: "string",
  },

  // may have multiple definitions.
  address_number: ["string", "number"],
};
Reference:

You may simply grab a previously defined resource by reference:

const Leads = s3db.resource("leads");
Limitations:

As we need to store the resource definition within a JSON file, the best way to keep your definitions intact is to use string-based shorthand definitions in your resource.

By design, the resource definition will strip all functions in attributes to avoid eval() calls.

The fastest-validator starts with the params below:

// fastest-validator params
{
  useNewCustomCheckerFunction: true,
  defaults: {
    object: {
      strict: "remove",
    },
  },
}
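
A hedged sketch of what strict: "remove" means in practice, using fastest-validator's equivalent $$strict: "remove" schema flag directly (this is plain fastest-validator code, not s3db.js code):

import Validator from "fastest-validator";

const v = new Validator({ useNewCustomCheckerFunction: true });
const check = v.compile({ name: "string", $$strict: "remove" });

const data = { name: "Ana", extra: 1 };
check(data);       // true
console.log(data); // { name: "Ana" } — the unknown "extra" key was removed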

Resource methods

Consider resource as:

const resource = s3db.resource("leads");

Insert one

// data
const insertedData = await resource.insert({
  id: "mypersonal@email.com", // if not defined a id will be generated!
  utm: {
    source: "abc",
  },
  lead: {
    fullName: "My Complex Name",
    personalEmail: "mypersonal@email.com",
    mobileNumber: "+5511234567890",
  },
  invalidAttr: "this attribute will disappear",
});

// {
//   id: "mypersonal@email.com",
//   utm: {
//     source: "abc",
//   },
//   lead: {
//     fullName: "My Complex Name",
//     personalEmail: "mypersonal@email.com",
//     mobileNumber: "+5511234567890",
//   },
// }

If you don't define an id attribute, s3db.js will use nanoid to generate a random unique id!
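
Roughly what happens when you omit the id, assuming nanoid as stated above (the generated value below is just an example):

import { nanoid } from "nanoid";

const id = nanoid(); // e.g. "V1StGXR8_Z5jdHi6B-myT"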

Get one

const obj = await resource.get("mypersonal@email.com");

// {
//   id: "mypersonal@email.com",
//   utm: {
//     source: "abc",
//   },
//   lead: {
//     fullName: "My Complex Name",
//     personalEmail: "mypersonal@email.com",
//     mobileNumber: "+5511234567890",
//   },
// }

Update one

const obj = await resource.update("mypersonal@email.com", {
  lead: {
    fullName: "My New Name",
    mobileNumber: "+5511999999999",
  },
});

// {
//   id: "mypersonal@email.com",
//   utm: {
//     source: "abc",
//   },
//   lead: {
//     fullName: "My New Name",
//     personalEmail: "mypersonal@email.com",
//     mobileNumber: "+5511999999999",
//   },
// }

Delete one

await resource.delete(id);

Count

await resource.count();

// 101

Insert many

You may bulk insert data with a friendly method that receives a list of objects.

const objects = new Array(100).fill(0).map((v, k) => ({
  id: `bulk-${k}@mymail.com`,
  lead: {
    fullName: "My Test Name",
    personalEmail: `bulk-${k}@mymail.com`,
    mobileNumber: "+55 11 1234567890",
  },
}));

await resource.insertMany(objects);

Keep in mind that we need to send one request per object created. There is an option to change the number of simultaneous connections your client will handle.

const s3db = new S3db({
  parallelism: 100, // default = 10
});

This method uses supercharge/promise-pool to organize the parallel promises.
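A rough sketch of what insertMany does under the hood, based on @supercharge/promise-pool's documented API (s3db.js's actual internals may differ):

import { PromisePool } from "@supercharge/promise-pool";

const { results, errors } = await PromisePool
  .withConcurrency(10) // s3db's parallelism option
  .for(objects)
  .process((obj) => resource.insert(obj));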

Get many

await resource.getMany(["id1", "id2", "id3"]);

// [
//   obj1,
//   obj2,
//   obj3,
// ]

Get all

const data = await resource.getAll();

// [
//   obj1,
//   obj2,
//   ...
// ]

Delete many

await resource.deleteMany(["id1", "id2", "id3"]);

Delete all

await resource.deleteAll();

List ids

const ids = await resource.listIds();

// [
//   'id1',
//   'id2',
//   'id3',
// ]

Resource streams

As we need to request the metadata of each id to return its attributes, a better way to handle a huge amount of data may be to use streams.

Readable stream

const readableStream = await resource.readable();

readableStream.on("id", (id) => console.log("id =", id));
readableStream.on("data", (lead) => console.log("lead.id =", lead.id));
readableStream.on("end", console.log("end"));

Writable stream

const writableStream = await resource.writable();

writableStream.write({
  lead: {
    fullName: "My Test Name",
    personalEmail: "bulk-0@mymail.com",
    mobileNumber: "+55 11 1234567890",
  },
});
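
Assuming both ends behave as regular Node object-mode streams, you could, for example, copy one resource into another by piping them (a sketch; the archivedLeads resource is hypothetical):

const source = await s3db.resource("leads").readable();
const target = await s3db.resource("archivedLeads").writable();

source.pipe(target);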

S3 Client

s3db.js has an S3 proxied client named S3Client. It brings a few handy and less verbose functions to deal with AWS S3's api.

import { S3Client } from "s3db.js";

const client = new S3Client({ connectionString });

Each method has a :link: link to the official aws-sdk docs.

getObject :link:
const { Body, Metadata } = await client.getObject({
  key: `my-prefixed-file.csv`,
});

// AWS.Response
putObject :link:
const response = await client.putObject({
  key: `my-prefixed-file.csv`,
  contentType: "text/csv",
  metadata: { a: "1", b: "2", c: "3" },
  body: "a;b;c\n1;2;3\n4;5;6",
});

// AWS.Response
headObject :link:
const { Metadata } = await client.headObject({
  key: `my-prefixed-file.csv`,
});

// AWS.Response
deleteObject :link:
const response = await client.deleteObject({
  key: `my-prefixed-file.csv`,
});

// AWS.Response
deleteObjects :link:
const response = await client.deleteObjects({
  keys: [`my-prefixed-file.csv`, `my-other-prefixed-file.csv`],
});

// AWS.Response
listObjects :link:
const response = await client.listObjects({
  prefix: `my-subdir`,
});

// AWS.Response
count

Custom made method to make it easier to count keys within a listObjects loop.

const count = await client.count({
  prefix: `my-subdir`,
});

// 10
getAllKeys

Custom made method to make it easier to return all keys in a subpath within a listObjects loop.

All returned keys will have their full path made relative to the current "scope" prefix.

const keys = await client.getAllKeys({
  prefix: `my-subdir`,
});

// [
//   key1,
//   key2,
//   ...
// ]
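
For reference, a rough sketch of the pagination loop such helpers imply, written against the aws-sdk v2 listObjectsV2 API (s3db.js's actual internals may differ):

import AWS from "aws-sdk";

// lists every key under a prefix, following continuation tokens
async function listAllKeys(s3, params) {
  const keys = [];
  let ContinuationToken;
  do {
    const page = await s3
      .listObjectsV2({ ...params, ContinuationToken })
      .promise();
    keys.push(...(page.Contents || []).map((o) => o.Key));
    ContinuationToken = page.NextContinuationToken;
  } while (ContinuationToken);
  return keys;
}

// usage sketch (hypothetical bucket/prefix):
// const keys = await listAllKeys(new AWS.S3(), { Bucket: "my-bucket", Prefix: "my-subdir" });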

Events

The 3 main classes S3db, Resource and S3Client are extensions of JavaScript's EventEmitter.

| S3Database | S3Client | S3Resource | S3Resource Readable Stream |
| --- | --- | --- | --- |
| error | error | error | error |
| connected | request | insert | id |
| | response | get | data |
| | getObject | update | |
| | putObject | delete | |
| | headObject | count | |
| | deleteObject | insertMany | |
| | deleteObjects | deleteAll | |
| | listObjects | listIds | |
| | count | getMany | |
| | getAllKeys | getAll | |

S3Database

error

s3db.on("error", (error) => console.error(error));

connected

s3db.on("connected", () => {});

S3Client

Using this reference for the events:

const client = s3db.client;

error

client.on("error", (error) => console.error(error));

request

Emitted when a request is generated to AWS.

client.on("request", (action, params) => {});

response

Emitted when a response is received from AWS.

client.on("response", (action, params, response) => {});

getObject

client.on("getObject", (options, response) => {});

putObject

client.on("putObject", (options, response) => {});

headObject

client.on("headObject", (options, response) => {});

deleteObject

client.on("deleteObject", (options, response) => {});

deleteObjects

client.on("deleteObjects", (options, response) => {});

listObjects

client.on("listObjects", (options, response) => {});

count

client.on("count", (options, response) => {});

getAllKeys

client.on("getAllKeys", (options, response) => {});

S3Resource

Using this reference for the events:

const resource = s3db.resource("leads");

error

resource.on("error", (err) => console.error(err));

insert

resource.on("insert", (data) => {});

get

resource.on("get", (data) => {});

update

resource.on("update", (attrs, data) => {});

delete

resource.on("delete", (id) => {});

count

resource.on("count", (count) => {});

insertMany

resource.on("insertMany", (count) => {});

getMany

resource.on("getMany", (count) => {});

getAll

resource.on("getAll", (count) => {});

deleteAll

resource.on("deleteAll", (count) => {});

listIds

resource.on("listIds", (count) => {});

Plugins

Anatomy of a plugin:

const MyPlugin = {
  setup(s3db) {},
  start() {},
};
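
As an illustration, here is a hypothetical request-counting plugin following this shape (the "request" event is documented in the Events section; this plugin itself is not part of s3db.js):

const RequestCounterPlugin = {
  requests: 0,
  client: null,
  setup(s3db) {
    this.client = s3db.client;
  },
  start() {
    // count every request the client sends to AWS
    this.client.on("request", () => this.requests++);
  },
};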

We have an example of a costs simulator plugin here!


Examples

Processing power was not the priority here; I just used my little Dell XPS notebook. Check the ./examples directory to get some ideas on how to use this package and the code of the examples below.

Examples' random data uses fakerator, give it a try!

Bulk insert

$ npm run example:1

> s3db.js@1.0.0 example:1
> cd examples; node 1-bulk-insert.js

creating 10000 leads.
parallelism of 250 requests.

bulk-writing  10000/10000 (100%)  [==============================]  255/bps  0.0s (39.2s) [10001 requests]
bulk-writing: 40.404s

Total cost: 0.0500 USD

Resource read stream

$ npm run example:2

> s3db.js@1.0.0 example:2
> cd examples; node 2-read-stream.js

reading 10000 leads.
parallelism of 250 requests.

reading-pages   40/1 (100%)  [==============================]  1/bps  0.0s (64.4s)
reading-ids     10000/10000 (100%)  [==============================]  155/bps  0.0s (64.5s)
reading-data    10000/10000 (100%)  [==============================]  153/bps  0.0s (65.3s)
reading: 1:07.246 (m:ss.mmm)

Total cost: 0.0041 USD

Resource read stream writing into a csv

$ npm run example:3

> s3db.js@1.0.0 example:3
> cd examples; node 3-read-stream-to-csv.js

reading 10000 leads.
parallelism of 250 requests.

reading-data  10000/10000 (100%)  [==============================]  123/bps  0.0s (81.3s)
reading-data: 1:23.852 (m:ss.mmm)

Total size: 1.31 Mb

Resource read stream writing into a zipped csv

$ npm run example:4

> s3db.js@1.0.0 example:4
> cd examples; node 4-read-stream-to-zip.js

reading 10000 leads.
parallelism of 250 requests.

reading-data  10000/10000 (100%)  [==============================]  141/bps  0.0s (71.0s)
reading-data: 1:13.078 (m:ss.mmm)

Total zip size: 0.68 Mb

Write Stream

$ npm run example:5

> s3db.js@1.0.0 example:5
> cd examples; node 5-write-stream.js

reading 10000 leads.
parallelism of 250 requests.

requests        20010/1 (100%)  [==============================]  49/bps  0.0s (410.0s)
reading-pages   40/1 (100%)  [==============================]  0/bps  0.0s (395.6s)
reading-ids     10000/10000 (100%)  [==============================]  25/bps  0.0s (395.6s)
reading-data    10000/10000 (100%)  [==============================]  25/bps  0.0s (401.5s)
writing-ids     10000/10000 (100%)  [==============================]  25/bps  0.0s (395.7s)
writing-data    10000/10000 (100%)  [==============================]  25/bps  0.0s (395.7s)
copying-data: 6:51.352 (m:ss.mmm)

Total cost: 0.0541 USD

JWT Token validator

$ npm run example:6

> s3db.js@1.0.0 example:6
> cd examples; node jwt-tokens.js

Created tokens: .....
Validated tokens: .....

Cost simulation

S3's pricing deep dive:

Check by yourself the pricing page details at https://aws.amazon.com/s3/pricing/ and https://calculator.aws/#/addService/S3.

Big example

Let's simulate a big project where you have a database with a few tables:

  • pageviews: 100,000,000 lines of 100 bytes each
  • leads: 1,000,000 lines of 200 bytes each
const Fakerator = require("fakerator");
const fake = Fakerator("pt-BR");

const pageview = {
  ip: fake.internet.ip(),
  domain: fake.internet.url(),
  path: fake.internet.url(),
  query: `?q=${fake.lorem.word()}`,
};

const lead = {
  name: fake.names.name(),
  mobile: fake.phone.number(),
  email: fake.internet.email(),
  country: "Brazil",
  city: fake.address.city(),
  state: fake.address.countryCode(),
  address: fake.address.street(),
};

If you write the whole database of:

  • pageviews:
    • 100,000,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 500.00 USD (S3 Standard PUT requests cost)
  • leads:
    • 1,000,000 PUT requests for S3 Standard Storage x 0.000005 USD per request = 5.00 USD (S3 Standard PUT requests cost)

It will cost 505.00 USD, once.

If you want to read the whole database:

  • pageviews:
    • 100,000,000 GET requests in a month x 0.0000004 USD per request = 40.00 USD (S3 Standard GET requests cost)
    • (100,000,000 × 100 bytes) ÷ (1024 × 1000 × 1000) ≈ 10 GB of transfer: 10 GB × 0.09 USD per GB = 0.90 USD
  • leads:
    • 1,000,000 GET requests in a month x 0.0000004 USD per request = 0.40 USD (S3 Standard GET requests cost)
    • (1,000,000 × 200 bytes) ÷ (1024 × 1000 × 1000) ≈ 0.19 GB of transfer, billed as 1 GB × 0.09 USD per GB = 0.09 USD

It will cost 41.39 USD, once.
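
The arithmetic above, reproduced as code (prices as quoted in this section):

const PUT = 0.000005;  // USD per PUT request
const GET = 0.0000004; // USD per GET request
const OUT = 0.09;      // USD per GB transferred out

const writeCost = (100_000_000 + 1_000_000) * PUT; // 505.00 USD
const readCost =
  (100_000_000 + 1_000_000) * GET +                // 40.40 USD
  (10 + 1) * OUT;                                  // 0.99 USD (leads rounded up to 1 GB)
// writeCost = 505.00 USD, readCost = 41.39 USD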

Small example

Let's save some JWT tokens following RFC 7519.

await s3db.createResource({
  name: "tokens",
  attributes: {
    iss: 'url|max:256',
    sub: 'string',
    aud: 'string',
    exp: 'number',
    email: 'email',
    name: 'string',
    scope: 'string',
    email_verified: 'boolean',
  },
});

async function generateToken () {
  const token = createTokenLib(...)

  await resource.insert({
    id: token.jti || md5(token),
    ...token,
  })

  return token
}

async function validateToken (token) {
  const id = token.jti || md5(token)

  if (!validateTokenSignature(token, ...)) {
    await resource.delete(id)
    throw new Error('invalid-token')
  }

  return resource.get(id)
}

Roadmap

Tasks board can be found at this link!

Feel free to interact and PRs are welcome! :)