athena-admin v0.9.5
Migrates Athena table schemas, moves S3 objects so that their keys have a partition key=value prefix, and adds partitions.

$ npm install athena-admin

const AthenaAdmin = require('athena-admin').AthenaAdmin;
const dbDef = require('./sampledatabase.json');
const admin = new AthenaAdmin(dbDef);
await admin.replaceObjects();
await admin.migrate();
await admin.partition();

Database definition
Describe the database definition in the following format.
{
  "general": {
    "athenaRegion": "ap-northeast-1",
    "databaseName": "aaaa",
    "saveDefinitionLocation": "s3://saveDefinitionBucket/aaaa.json"
  },
  "tables": {
    "sample_data": {
      "columns": {
        "user_id": "int",
        "some_value": { /* = "struct<score:int,category:string>" */
          "score": "int",
          "category": "string"
        },
        "some_array1": ["string"], /* = array<string> */
        "some_array2": [{ /* = array<struct<aaa:int,bbb:string>> */
          "aaa": "int",
          "bbb": "string"
        }]
      },
      "srcLocation": "s3://src/location/",
      "partition": {
        "prePartitionLocation": "s3://pre/partition/", /* optional */
        "regexp": "(\\d{4})/(\\d{2})/(\\d{2})/", /* optional */
        "keys": [
          {
            "name": "dt",
            "type": "string",
            "format": "{1}-{2}-{3}" /* optional */
          }
        ]
      }
    }
  }
}

general
| Field | Description |
|---|---|
| athenaRegion | Region for Athena |
| databaseName | Athena database name |
| saveDefinitionLocation | Location to save the previous definition |
tables
- Each top-level field name (sample_data in the example) is a table name.
| Field | Description |
|---|---|
| columns | Column name and type pairs. struct<> and array<> can also be written as JSON objects and arrays, so you can derive the definition from a sample record by replacing each value with its type. |
| srcLocation | S3 location referenced by the Athena table |
| partition | Partition settings. partition() detects partitions from the key=value prefix of the object keys. If the object keys do not have a key=value prefix, replaceObjects() can move the objects from prePartitionLocation to srcLocation and add the prefix. Each entry in keys describes one partition: name is the partition key, type is its type, and format builds its value. {n} in format refers to the n-th capture group of regexp (e.g. s3://pre/partition/2017/12/01/00/aaa.png => [2017/12/01/, 2017, 12, 01]). |
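
The JSON notation for columns maps onto Athena type strings in a mechanical way. The following is a minimal sketch of that mapping, assuming a one-element array stands for array<> and a plain object stands for struct<>. The helper name toAthenaType is hypothetical and is not part of athena-admin's documented API; the library's own conversion may differ.

```javascript
// Hypothetical helper for illustration only; not athena-admin's API.
// Converts a JSON column description into an Athena type string.
function toAthenaType(def) {
  if (typeof def === 'string') {
    return def; // primitive type such as "int" or "string"
  }
  if (Array.isArray(def)) {
    // A one-element array describes the element type.
    return `array<${toAthenaType(def[0])}>`;
  }
  // A plain object describes a struct, one field per key.
  const fields = Object.entries(def)
    .map(([name, type]) => `${name}:${toAthenaType(type)}`)
    .join(',');
  return `struct<${fields}>`;
}

console.log(toAthenaType({ score: 'int', category: 'string' }));
// struct<score:int,category:string>
console.log(toAthenaType([{ aaa: 'int', bbb: 'string' }]));
// array<struct<aaa:int,bbb:string>>
```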
API
replaceObjects(deletePreObject=true, matchedHandler=(matched, objKey, table)=>matched)
Moves the objects located in prePartitionLocation to srcLocation, adding the partition key=value prefix to their keys.
(e.g. s3://pre/partition/2017/12/01/00/aaa.png => s3://src/location/dt=2017-12-01/00/aaa.png)
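
The key rewrite in the example above can be sketched as follows. This is an illustration under assumptions, not athena-admin's implementation: the helper name toPartitionedUrl is hypothetical, every partition key is assumed to define format, and the sketch only computes the destination URL without touching S3.

```javascript
// Hypothetical helper for illustration; athena-admin's internals may differ.
function toPartitionedUrl(srcUrl, table) {
  const { prePartitionLocation, regexp, keys } = table.partition;
  // Object key relative to prePartitionLocation, e.g. "2017/12/01/00/aaa.png".
  const relKey = srcUrl.slice(prePartitionLocation.length);
  const matched = relKey.match(new RegExp(regexp));
  if (!matched) {
    return null; // the key does not follow the expected layout
  }
  // Build "name=value/" per partition key, filling {n} from capture group n.
  const prefix = keys
    .map((key) => {
      const value = key.format.replace(/\{(\d+)\}/g, (_, n) => matched[Number(n)]);
      return `${key.name}=${value}/`;
    })
    .join('');
  // Swap the matched part of the key for the key=value prefix.
  return table.srcLocation + relKey.replace(matched[0], prefix);
}

const table = {
  srcLocation: 's3://src/location/',
  partition: {
    prePartitionLocation: 's3://pre/partition/',
    regexp: '(\\d{4})/(\\d{2})/(\\d{2})/',
    keys: [{ name: 'dt', type: 'string', format: '{1}-{2}-{3}' }],
  },
};

console.log(toPartitionedUrl('s3://pre/partition/2017/12/01/00/aaa.png', table));
// s3://src/location/dt=2017-12-01/00/aaa.png
```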
If you need to change the matched values before the objects are moved, pass a matchedHandler.
The following example converts a UTC date and hour in the key to the local time zone
(e.g. 2017/12/01/19 => 2017/12/02/04 in a UTC+9 time zone).
The full code is in /sample.
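const moment = require('moment');

// Shifts the matched UTC date and hour to the local time zone.
// Assumes regexp captures year, month, day and hour as groups 1 to 4.
const utcToTZ = (matched, objKey, table) => {
  // Only rewrite keys of tables that are partitioned by dt.
  const existsDt = table.partition.keys.some((key) => key.name === 'dt');
  if (!existsDt) {
    return matched;
  }
  // matched[0] is the whole match (e.g. 2017/12/01/19); parse it as UTC.
  const tz = moment(`${matched[0]} +00:00`, 'YYYY/MM/DD/HH ZZ');
  matched[1] = tz.format('YYYY');
  matched[2] = tz.format('MM');
  matched[3] = tz.format('DD');
  matched[4] = tz.format('HH');
  return matched;
};
await admin.replaceObjects(false, utcToTZ);

migrate()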
Compares the current definition with the previously saved definition in S3 and, where they differ, creates or drops tables or updates their schema.
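
As a rough illustration of this comparison, the sketch below derives a plan of tables to create, drop, or update from two definitions. It is an assumption-laden sketch, not athena-admin's actual logic: the function name diffTables and the plan shape are hypothetical, and table definitions are compared with JSON.stringify, which is sensitive to key order.

```javascript
// Hypothetical sketch; not athena-admin's actual implementation.
function diffTables(previous, current) {
  const prevTables = previous.tables || {};
  const currTables = current.tables || {};
  const plan = { create: [], drop: [], update: [] };

  for (const name of Object.keys(currTables)) {
    if (!(name in prevTables)) {
      plan.create.push(name); // new table in the current definition
    } else if (JSON.stringify(prevTables[name]) !== JSON.stringify(currTables[name])) {
      plan.update.push(name); // definition changed (key-order sensitive check)
    }
  }
  for (const name of Object.keys(prevTables)) {
    if (!(name in currTables)) {
      plan.drop.push(name); // table removed from the current definition
    }
  }
  return plan;
}

const previous = { tables: { a: { columns: { x: 'int' } }, b: { columns: { y: 'string' } } } };
const current = { tables: { a: { columns: { x: 'string' } }, c: { columns: { z: 'int' } } } };

console.log(diffTables(previous, current));
// { create: [ 'c' ], drop: [ 'b' ], update: [ 'a' ] }
```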
partition()
Runs MSCK REPAIR TABLE. Partitions are detected and added automatically from the key=value prefix of the object keys.
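
MSCK REPAIR TABLE relies on Hive-style key=value path segments under the table location. The sketch below shows which partition values such a key carries. The helper name partitionsFromKey is hypothetical and only illustrates the naming convention; the actual detection is done by Athena, not by this code.

```javascript
// Hypothetical helper: reads Hive-style key=value segments from an object key
// given relative to the table location. The file name itself is ignored.
function partitionsFromKey(objKey) {
  const partitions = {};
  const directories = objKey.split('/').slice(0, -1);
  for (const segment of directories) {
    const eq = segment.indexOf('=');
    if (eq > 0) {
      partitions[segment.slice(0, eq)] = segment.slice(eq + 1);
    }
  }
  return partitions;
}

console.log(partitionsFromKey('dt=2017-12-01/00/aaa.png'));
// { dt: '2017-12-01' }
```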
Article
I made athena-manager, a tool for Athena migration and partitioning - sambaiz-net