0.9.5 • Published 6 years ago

athena-admin v0.9.5

License
MIT

athena-admin

Migrates Athena table schemas, moves S3 objects so that their keys carry a partition key=value prefix, and adds partitions.

Overview

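$ npm install athena-admin

const AthenaAdmin = require('athena-admin').AthenaAdmin;
const dbDef = require('./sampledatabase.json');

// await is only valid inside an async function in CommonJS modules.
(async () => {
  const admin = new AthenaAdmin(dbDef);
  await admin.replaceObjects(); // move objects under key=value prefixes
  await admin.migrate();        // create/drop tables or update the schema
  await admin.partition();      // add the detected partitions
})();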

Database definition

Describe the database definition in the following format.

{
  "general": {
    "athenaRegion": "ap-northeast-1",
    "databaseName": "aaaa",
    "saveDefinitionLocation": "s3://saveDefinitionBucket/aaaa.json"
  },
  "tables": {
    "sample_data": {
      "columns": {
        "user_id": "int",
        "some_value": { /* = "struct<score:int,category:string>" */
          "score": "int",
          "category": "string"
        },
        "some_array1": ["string"], /* = array<string> */
        "some_array2": [{ /* = array<struct<aaa:int,bbb:string>> */
          "aaa": "int",
          "bbb": "string"
        }]
      },
      "srcLocation": "s3://src/location/",
      "partition": {
        "prePartitionLocation": "s3://pre/partition/", /* optional */
        "regexp": "(\\d{4})/(\\d{2})/(\\d{2})/", /* optional */
        "keys": [
          {
            "name": "dt",
            "type": "string",
            "format": "{1}-{2}-{3}" /* optional */
          }
        ]
      }
    }
  }
}

general

Field | Description
--- | ---
athenaRegion | Region for Athena
databaseName | Athena database name
saveDefinitionLocation | Location where the previous definition is saved

tables

  • Each root field name (sample_data in the example) is a table name.

Field | Description
--- | ---
columns | Column name and type pairs. struct<> and array<> can also be written as a JSON object or array, so you can describe them by replacing the values in actual data with their types.
srcLocation | Location to be referenced by Athena
partition | Partition settings. Partitions are detected from the key=value prefix of the objects' locations. If the objects' locations do not have a key=value prefix, replaceObjects() can move them from prePartitionLocation to srcLocation so that partition() can detect and add the partitions automatically. Each entry of keys defines a partition with keys.name as its key, keys.type as its type, and a value built from keys.format. {n} in keys.format refers to the n-th group of regexp. (e.g. s3://pre/partition/2017/12/01/00/aaa.png => [2017/12/01, 2017, 12, 01])
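The mapping from the JSON form of columns to Athena type strings can be sketched as follows. This is only an illustration of the notation described above: toAthenaType is a hypothetical helper, not a documented athena-admin function, and the library's own conversion may differ.

```javascript
// Hypothetical helper (not part of athena-admin's documented API):
// converts a column definition in the JSON notation to an Athena type string.
function toAthenaType(def) {
  if (typeof def === 'string') {
    return def; // primitive type such as "int" or "string"
  }
  if (Array.isArray(def)) {
    // A one-element array describes the element type of array<>.
    return `array<${toAthenaType(def[0])}>`;
  }
  // A plain object describes struct<>; each field becomes name:type.
  const fields = Object.entries(def)
    .map(([name, type]) => `${name}:${toAthenaType(type)}`)
    .join(',');
  return `struct<${fields}>`;
}

// Column definitions taken from the sample database definition.
const columns = {
  user_id: 'int',
  some_value: { score: 'int', category: 'string' },
  some_array1: ['string'],
  some_array2: [{ aaa: 'int', bbb: 'string' }],
};

for (const [name, def] of Object.entries(columns)) {
  console.log(`${name} ${toAthenaType(def)}`);
}
```

For some_array2 this yields array<struct<aaa:int,bbb:string>>, which matches the comment in the sample definition.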

API

replaceObjects(deletePreObject=true, matchedHandler=(matched, objKey, table)=>matched)

Moves objects located under prePartitionLocation to srcLocation, adding the partition key=value prefix. (e.g. s3://pre/partition/2017/12/01/00/aaa.png => s3://src/location/dt=2017-12-01/00/aaa.png)
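The key rewrite in this example can be sketched as below. toPartitionedKey is a hypothetical helper written for illustration only, not athena-admin's implementation; it assumes that both regexp and keys.format are set and that the key is given relative to prePartitionLocation.

```javascript
// Hypothetical helper (not athena-admin's code): rewrites an object key so
// that the part matched by partition.regexp becomes a key=value prefix.
function toPartitionedKey(objKey, partition) {
  const matched = objKey.match(new RegExp(partition.regexp));
  if (!matched) {
    return null; // the key does not contain the expected date path
  }
  const prefix = partition.keys
    .map((key) => {
      // Replace each {n} in the format with the n-th regexp group.
      const value = key.format.replace(/\{(\d+)\}/g, (_, n) => matched[Number(n)]);
      return `${key.name}=${value}/`;
    })
    .join('');
  // Swap the matched text for the key=value prefix.
  return objKey.replace(matched[0], () => prefix);
}

// Partition settings taken from the sample database definition.
const partition = {
  regexp: '(\\d{4})/(\\d{2})/(\\d{2})/',
  keys: [{ name: 'dt', type: 'string', format: '{1}-{2}-{3}' }],
};

console.log(toPartitionedKey('2017/12/01/00/aaa.png', partition));
// dt=2017-12-01/00/aaa.png
```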

If you need to change the key before this operation, use matchedHandler. The following example converts a UTC date string to the local time zone (e.g. 2017/12/01/19 => 2017/12/02/04). The full code is in /sample.

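const moment = require('moment');

// Converts the matched UTC date parts to the local time zone of the process.
// Assumes the table's regexp captures year, month, day and hour as groups 1 to 4.
const utcToTZ = (matched, objKey, table) => {
  // Leave tables without a dt partition key untouched.
  const existsDt = table.partition.keys.some((key) => key.name === 'dt');
  if (!existsDt) {
    return matched;
  }

  // Parse the matched path as UTC; format() then outputs local time.
  const tz = moment(`${matched[0]} +00:00`, 'YYYY/MM/DD/HH ZZ');
  matched[1] = tz.format('YYYY');
  matched[2] = tz.format('MM');
  matched[3] = tz.format('DD');
  matched[4] = tz.format('HH');
  return matched;
};

// Call from inside an async function. Passing false keeps the original objects.
await admin.replaceObjects(false, utcToTZ);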

migrate()

Compares the current definition with the previously saved definition in S3 and, if they differ, creates or drops tables or updates the schema.
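As a rough illustration of this kind of comparison, the sketch below derives a create/drop/update plan from two tables objects. diffTables is a hypothetical function; how athena-admin actually compares definitions is not documented here and may differ.

```javascript
// Hypothetical comparison of two "tables" objects from database definitions.
// athena-admin's actual migration logic may differ.
function diffTables(previous, current) {
  const plan = { create: [], drop: [], update: [] };
  for (const name of Object.keys(current)) {
    if (!(name in previous)) {
      plan.create.push(name); // table is new in the current definition
    } else if (
      // Simple structural comparison; note that it is sensitive to key order.
      JSON.stringify(previous[name].columns) !== JSON.stringify(current[name].columns)
    ) {
      plan.update.push(name); // columns changed
    }
  }
  for (const name of Object.keys(previous)) {
    if (!(name in current)) {
      plan.drop.push(name); // table was removed from the definition
    }
  }
  return plan;
}

const previous = {
  users: { columns: { user_id: 'int' } },
  old_logs: { columns: { message: 'string' } },
};
const current = {
  users: { columns: { user_id: 'int', name: 'string' } },
  sample_data: { columns: { user_id: 'int' } },
};

console.log(diffTables(previous, current));
// { create: [ 'sample_data' ], drop: [ 'old_logs' ], update: [ 'users' ] }
```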

partition()

Runs MSCK REPAIR TABLE. Partitions are detected automatically from the objects' key=value prefix and added.
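To show what a key=value prefix carries, the sketch below reads partition values out of an object key. partitionValues is a hypothetical helper for illustration; the actual detection is done by Athena when MSCK REPAIR TABLE runs, not by code like this.

```javascript
// Hypothetical helper: extracts partition key/value pairs from the
// directory-like segments of an object key. Illustration only.
function partitionValues(objKey) {
  const values = {};
  // Ignore the last segment, which is the file name.
  for (const segment of objKey.split('/').slice(0, -1)) {
    const m = segment.match(/^([^=]+)=(.+)$/);
    if (m) {
      values[m[1]] = m[2];
    }
  }
  return values;
}

console.log(partitionValues('dt=2017-12-01/00/aaa.png'));
// { dt: '2017-12-01' }
```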

Article

I made athena-manager, which handles migration and partitioning for Athena - sambaiz-net
