
AWS Chameleon

aws-chameleon is a server-side A/B/n testing framework backed by AWS S3. It lets you create instances in your backend code and A/B test those parts of the backend. You can create experiments, activate/deactivate and archive/unarchive them, and create variations for them. Most importantly, it puts users into experiment variations so you can test and collect data for better business decisions.

aws-chameleon is:

partly in-memory, for the user-facing side of things (getting variations for users).
partly off-memory, in S3 storage, for the management side of things (managing experiments).

aws-chameleon response time

For user-facing functions such as getVariation, the response time is under 10ms thanks to the in-memory part.
For management functions such as createExperiment, the response time is about 1s because of the S3 dependency.

On initialization, the framework pulls data from S3 and loads it into memory, so users can be classified into experiments with no network latency. When someone manages an experiment (e.g. creates or activates it), the change is applied in memory and written to S3. Other processes pick up the change only on their next refresh from S3, so the system is semi-realtime.

All chameleon indexes are refreshed from S3 into memory every few minutes (configured with CHAMELEON_REFRESH_RATE, described below).
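The in-memory/S3 split above can be pictured as a cache that serves reads from memory and swaps in a fresh snapshot on a timer. The following is a minimal sketch of that pattern, not aws-chameleon's actual internals. `IndexCache` and the `loadIndexes` loader are hypothetical names, and a fake loader stands in for reading index.json files from S3.

```javascript
// Hypothetical sketch of an in-memory index that is refreshed on a timer.
// In aws-chameleon the loader would read index.json files from S3; here it is any async function.
class IndexCache {
  constructor(loadIndexes, refreshMs) {
    this.loadIndexes = loadIndexes; // async () => snapshot object
    this.refreshMs = refreshMs;
    this.snapshot = {};
    this.timer = null;
  }

  // Replace the whole snapshot at once, so reads never see a half-applied update.
  async refresh() {
    this.snapshot = await this.loadIndexes();
    return this.snapshot;
  }

  // Load once up front, then keep refreshing in the background.
  async start() {
    await this.refresh();
    this.timer = setInterval(() => {
      // On failure, keep serving the previous snapshot.
      this.refresh().catch((err) => console.error("index refresh failed:", err.message));
    }, this.refreshMs);
    this.timer.unref(); // do not keep the process alive just for refreshes
  }

  stop() {
    clearInterval(this.timer);
    this.timer = null;
  }

  // User-facing reads are synchronous and touch memory only.
  get(instanceName) {
    return this.snapshot[instanceName] || null;
  }
}

// Usage with a fake loader standing in for S3.
const cache = new IndexCache(async () => ({ paywall: { activeExperiment: "experiment-1" } }), 60 * 1000);
cache.start().then(() => {
  console.log(cache.get("paywall"));
  cache.stop();
});
```

Each refresh replaces the snapshot object rather than mutating it. A getVariation-style read therefore sees either the old data or the new data, never a mix.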

Installation

Chameleon requires Node.js v8+ to run.

npm install aws-chameleon --save

Export the S3 bucket name as a Node environment variable (CHAMELEON_S3_BUCKET_NAME):

export CHAMELEON_S3_BUCKET_NAME='XXXXX'

Make sure the name is DNS-compliant and follows AWS's bucket naming restrictions (https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html). Use a different bucket for each environment (e.g. prod, stage, dev).

e.g. sharvil.chameleon.prod, sharvil.chameleon.stage, ...
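Before exporting the variable, you can sanity-check the name. The function below is a hypothetical helper. It checks only a subset of the AWS bucket naming rules: length, allowed characters, first and last character, adjacent periods, IP-address format, and the `xn--` prefix. The linked AWS page remains the authoritative list.

```javascript
// Checks a SUBSET of the AWS S3 general-purpose bucket naming rules (hypothetical helper).
// Not exhaustive: see the AWS bucket naming documentation for the complete list.
function checkBucketName(name) {
  const problems = [];
  if (name.length < 3 || name.length > 63) {
    problems.push("must be 3-63 characters long");
  }
  if (!/^[a-z0-9.-]+$/.test(name)) {
    problems.push("may only contain lowercase letters, numbers, periods and hyphens");
  }
  if (!/^[a-z0-9]/.test(name) || !/[a-z0-9]$/.test(name)) {
    problems.push("must begin and end with a letter or number");
  }
  if (name.includes("..")) {
    problems.push("must not contain two adjacent periods");
  }
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(name)) {
    problems.push("must not be formatted as an IP address");
  }
  if (name.startsWith("xn--")) {
    problems.push("must not start with xn--");
  }
  return problems; // an empty array means no rule in this subset was violated
}

const bucket = process.env.CHAMELEON_S3_BUCKET_NAME || "sharvil.chameleon.prod";
const problems = checkBucketName(bucket);
console.log(problems.length === 0 ? `${bucket} looks valid` : `${bucket}: ${problems.join("; ")}`);
```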

Export the refresh rate, in whole minutes, as a Node environment variable (CHAMELEON_REFRESH_RATE):

export CHAMELEON_REFRESH_RATE=1

Getting Started

High Level Definitions

CHAMELEON_S3_BUCKET_NAME

This is the main S3 bucket, created by the framework, that holds everything. Export it as an environment variable and make sure the name is DNS-compliant.

CHAMELEON_REFRESH_RATE

This is the interval, in minutes, at which the indexes are refreshed (i.e. pulled from S3 into memory). The default is 5 minutes. Export it as an environment variable. It must be an integer, e.g. 1, 2 or 4.
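Here is a sketch of how a process might read CHAMELEON_REFRESH_RATE and turn it into a timer interval. The 5-minute default comes from the text above. `refreshRateMs` is a hypothetical helper, and rejecting non-integer or zero values is an assumption about how invalid input should be handled.

```javascript
// Documented default when CHAMELEON_REFRESH_RATE is not set.
const DEFAULT_REFRESH_MINUTES = 5;

// Hypothetical helper: convert CHAMELEON_REFRESH_RATE (whole minutes) to milliseconds.
// Throwing on invalid values is an assumption; the library may handle them differently.
function refreshRateMs(env) {
  const raw = env.CHAMELEON_REFRESH_RATE;
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_REFRESH_MINUTES * 60 * 1000;
  }
  // Accept only positive whole numbers such as "1", "2" or "4".
  if (!/^\d+$/.test(raw.trim()) || Number(raw) === 0) {
    throw new Error(`CHAMELEON_REFRESH_RATE must be a positive integer, got "${raw}"`);
  }
  return Number(raw) * 60 * 1000;
}

console.log(refreshRateMs({ CHAMELEON_REFRESH_RATE: "1" })); // 60000
console.log(refreshRateMs({}));                              // 300000
```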

instance (e.g. paywall)

This is the name of the part of your backend you want to A/B test. For example, to test different paywalls from the backend, you might name the instance paywall (or whatever you like). The name must be unique. Each instance gets its own folder in the main bucket.

index.json

This file stores the information chameleon needs to initialize and build its indexes. There is one main index.json at the bucket root and one inside each instance folder.

experiment

Each instance has experiments. Each experiment is a folder within the instance folder that contains all the variation JSON files for that experiment. Each experiment has a name and an id, and its folder is named experimentName-experimentId. An instance can have at most ONE ACTIVE experiment at a time. An instance can hold a maximum of 10 experiments; beyond that, you need to offload (archive) the ones you no longer need. Archived experiments can be unarchived, but the same maximum applies. The limit keeps the framework's in-memory data in check.

variation

Each experiment has variations to test against users. They are stored as JSON files in the experiment's S3 folder. Each variation has a name and an id, and its file is named variationName-variationId.json.

control

Control is the default variation. It follows the same rules as any other variation. A control is always required when creating an experiment.

Directory Structure in S3

ab.testing.chameleon/
    - index.json
    - paywall/
        - index.json
        - experiment-1/
            - variation-1.json
            - variation-2.json
            - variation-3.json
    - onboarding/
        - index.json
        - experiment-1/
            - variation-1.json
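The naming rules above (experimentName-experimentId folders, variationName-variationId files) can be expressed as key-building helpers. These helpers are hypothetical. They reproduce the `url` format shown in the getAllExperiments() output further down, but the exact keys the library writes are an assumption.

```javascript
// Hypothetical key builders for the S3 layout shown above:
//   <instance>/index.json
//   <instance>/<experimentName>-<experimentId>/<variationName>-<variationId>.json
function instanceIndexKey(instanceName) {
  return `${instanceName}/index.json`;
}

function experimentFolder(instanceName, experiment) {
  return `${instanceName}/${experiment.name}-${experiment.id}`;
}

function variationKey(instanceName, experiment, variation) {
  return `${experimentFolder(instanceName, experiment)}/${variation.name}-${variation.id}.json`;
}

const experiment = { name: "test", id: "92b5e900-1897-4bad-866a-29331f8b88ff" };
const control = { name: "var0", id: "a71411c5-7593-4640-bc4d-01bb66e965bf" };
console.log(variationKey("paywall-html", experiment, control));
// paywall-html/test-92b5e900-1897-4bad-866a-29331f8b88ff/var0-a71411c5-7593-4640-bc4d-01bb66e965bf.json
```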

Low Level Definitions

template

Every instance has a JSON template associated with it. Each variation must follow the template, or creating the experiment throws an error. The template is defined in code, where the chameleon instance is initialized and used to get variations for users.
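One plausible reading of "follow the template" is a structural match: the same keys, recursively, with the same value types. The sketch below implements that rule as a hypothetical `matchesTemplate` helper. aws-chameleon's own check may be stricter or looser.

```javascript
// Assumed rule: same keys (recursively for nested objects) and same value types.
// Returns a list of problems; an empty list means the data matches the template.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function matchesTemplate(template, data, path = "data") {
  if (isPlainObject(template)) {
    if (!isPlainObject(data)) return [`${path} should be an object`];
    const errors = [];
    const templateKeys = Object.keys(template);
    for (const key of templateKeys) {
      if (!Object.prototype.hasOwnProperty.call(data, key)) {
        errors.push(`${path}.${key} is missing`);
      } else {
        errors.push(...matchesTemplate(template[key], data[key], `${path}.${key}`));
      }
    }
    for (const key of Object.keys(data)) {
      if (!templateKeys.includes(key)) errors.push(`${path}.${key} is not in the template`);
    }
    return errors;
  }
  if (Array.isArray(template)) {
    return Array.isArray(data) ? [] : [`${path} should be an array`];
  }
  // Primitive leaf: only the type has to match (the template's value is a placeholder).
  return typeof data === typeof template ? [] : [`${path} should be a ${typeof template}`];
}

const template = { htmlName: "" };
console.log(matchesTemplate(template, { htmlName: "timerPaywall" })); // []
console.log(matchesTemplate(template, { html: "timerPaywall" }));
```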

active experiment

Each instance can have multiple experiments, but only one active experiment.

archived experiment

You can archive experiments to make room for new ones, since an instance can hold a maximum of 10 experiments.

splits

You can specify how an experiment's traffic is split across its variations. Splits are an array of integers that must add up to 100, with one entry per variation (control included). For example, if an experiment has 3 variations in total (control + 2 variations) and you want 70% of traffic to go to control, 20% to variation1 and 10% to variation2, the splits array is [70, 20, 10]. Splits are optional; if omitted, traffic is split equally.
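The splits rules above translate into a small validation step. This is a sketch under assumptions. `resolveSplits` is a hypothetical helper. When no splits are given and 100 does not divide evenly by the number of variations, it gives the leftover percentage points to the first buckets; the library may round differently.

```javascript
// Hypothetical helper: validate a splits array, or build equal splits when none is given.
// variationCount includes the control.
function resolveSplits(variationCount, splits) {
  if (!Number.isInteger(variationCount) || variationCount < 1) {
    throw new Error("variationCount must be a positive integer");
  }
  if (splits === undefined) {
    // Equal splits; any remainder goes to the first buckets (an assumption).
    const base = Math.floor(100 / variationCount);
    const remainder = 100 - base * variationCount;
    return Array.from({ length: variationCount }, (_, i) => base + (i < remainder ? 1 : 0));
  }
  if (splits.length !== variationCount) {
    throw new Error(`expected ${variationCount} splits (control + variations), got ${splits.length}`);
  }
  if (!splits.every((s) => Number.isInteger(s) && s >= 0)) {
    throw new Error("splits must be non-negative integers");
  }
  const total = splits.reduce((sum, s) => sum + s, 0);
  if (total !== 100) {
    throw new Error(`splits must add up to 100, got ${total}`);
  }
  return splits.slice();
}

console.log(resolveSplits(3, [70, 20, 10])); // [ 70, 20, 10 ]
console.log(resolveSplits(3));               // [ 34, 33, 33 ]
console.log(resolveSplits(4));               // [ 25, 25, 25, 25 ]
```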

bucketing

Bucketing currently uses murmurhash to hash the unique id passed to getVariation. The hash then determines which variation (bucket) the id is assigned to.
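To illustrate bucketing, the sketch below implements MurmurHash3 (the x86 32-bit variant) and maps the hash onto the cumulative splits. The hash modulo 100 picks a point in [0, 100), and the first variation whose cumulative split exceeds that point wins. The README only says murmurhash is used. The variant, the modulo-100 mapping, and the `pickVariationIndex` name are assumptions; `seed` plays the role of hash_seed.

```javascript
// MurmurHash3 (x86, 32-bit) over the UTF-8 bytes of a string; returns an unsigned 32-bit int.
function murmurhash3_32(key, seed) {
  const data = Buffer.from(String(key), "utf8");
  const c1 = 0xcc9e2d51;
  const c2 = 0x1b873593;
  const nblocks = data.length >>> 2;
  let h = seed >>> 0;
  let k;

  // Body: mix each 4-byte little-endian block into the state.
  for (let i = 0; i < nblocks; i++) {
    k = data.readUInt32LE(i * 4);
    k = Math.imul(k, c1);
    k = (k << 15) | (k >>> 17);
    k = Math.imul(k, c2);
    h ^= k;
    h = (h << 13) | (h >>> 19);
    h = (Math.imul(h, 5) + 0xe6546b64) | 0;
  }

  // Tail: mix the remaining 1-3 bytes.
  k = 0;
  const tail = nblocks * 4;
  switch (data.length & 3) {
    case 3:
      k ^= data[tail + 2] << 16; // falls through
    case 2:
      k ^= data[tail + 1] << 8; // falls through
    case 1:
      k ^= data[tail];
      k = Math.imul(k, c1);
      k = (k << 15) | (k >>> 17);
      k = Math.imul(k, c2);
      h ^= k;
  }

  // Finalization: force all bits to avalanche.
  h ^= data.length;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Assumed mapping: hash into [0, 100), then walk the cumulative splits.
function pickVariationIndex(id, splits, seed) {
  const point = murmurhash3_32(id, seed) % 100;
  let cumulative = 0;
  for (let i = 0; i < splits.length; i++) {
    cumulative += splits[i];
    if (point < cumulative) return i;
  }
  return splits.length - 1; // only reachable if splits do not add up to 100
}

// Always the same index for this id, splits and seed.
console.log(pickVariationIndex("user-42", [70, 20, 10], 2));
```

The hash depends only on the id and the seed. As long as the splits don't change, the same user always gets the same variation, and no per-user state has to be stored.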

Initialization

Require the module and create a template JSON. Call getInstance() with the instance name and the template to get a chameleon instance object. Use that object's methods for everything else: mainly getVariation(), plus management methods such as createExperiment() and activateExperiment().

const chameleon = require('aws-chameleon');
const template = {
  htmlName: ''             // every variation's data must follow this shape
};
const instanceName = 'paywall';
const options = {
  hash_seed: 2             // optional; defaults to a hash of instanceName
};
const chameleonInstance = chameleon.getInstance(instanceName, template, options);

Options:

hash_seed (number):

Default: a hash of the instanceName, converted to a number. If you specify hash_seed, it must be a number (usually between 1 and 500). The seed is used when bucketing an id into a variation of the active experiment. It exists because instances run independently: with a shared seed, an id that hashes to the 1st bucket of one instance's active experiment would also hash to the 1st bucket of every other instance's active experiment. A per-instance hash_seed randomizes this selection.
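The correlation problem that hash_seed solves can be measured directly. In this sketch, SHA-256 from Node's crypto module stands in for murmurhash, a deliberate swap to keep the code short. `seededBucket` and `sameBucketRate` are hypothetical helpers. With identical seeds, every user lands in the same bucket index in both instances. With different seeds, the overlap drops to roughly 1/3 for 3 buckets, which is what independent assignment would give.

```javascript
const crypto = require("crypto");

// Stand-in seeded hash: SHA-256 of "<seed>:<id>", first 4 bytes read as an unsigned int.
// aws-chameleon uses murmurhash; SHA-256 is swapped in here only to keep the sketch short.
function seededBucket(id, seed, bucketCount) {
  const digest = crypto.createHash("sha256").update(`${seed}:${id}`).digest();
  return digest.readUInt32BE(0) % bucketCount;
}

// Fraction of users that land in the same bucket index in two independently seeded instances.
function sameBucketRate(seedA, seedB, bucketCount, userCount) {
  let same = 0;
  for (let i = 0; i < userCount; i++) {
    const id = `user-${i}`;
    if (seededBucket(id, seedA, bucketCount) === seededBucket(id, seedB, bucketCount)) same++;
  }
  return same / userCount;
}

// Two instances, each running a 3-variation experiment (hypothetical seeds).
console.log(sameBucketRate(2, 2, 3, 3000));   // 1: shared seed, fully correlated buckets
console.log(sameBucketRate(2, 417, 3, 3000)); // close to 1/3: buckets are independent
```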

skip_template_validation (boolean):

False (default): the template makes sure that control and variations follow a given structure, and that this structure isn't changed by code updates or by new experiments' control/variations once the instance has been created. You can skip this validation. In that case, clients reading variation data should have their own mechanisms to make sure the data is valid, ideally with defaults to fall back to.
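With skip_template_validation enabled, the checking moves to the client. Below is a minimal sketch, assuming the `{ htmlName }` data shape used in this README's examples. Each expected key falls back to a default when it is missing or has the wrong type. `withDefaults` is a hypothetical helper, not part of aws-chameleon.

```javascript
// Hypothetical client-side guard for unvalidated variation data.
// Each key in `defaults` is taken from `data` only if present with the same type.
function withDefaults(data, defaults) {
  const safe = {};
  for (const key of Object.keys(defaults)) {
    const value = data && Object.prototype.hasOwnProperty.call(data, key) ? data[key] : undefined;
    safe[key] = typeof value === typeof defaults[key] ? value : defaults[key];
  }
  return safe;
}

console.log(withDefaults({ htmlName: "slideshowPaywall" }, { htmlName: "default" })); // { htmlName: 'slideshowPaywall' }
console.log(withDefaults({ htmlName: 42 }, { htmlName: "default" }));                 // { htmlName: 'default' }
console.log(withDefaults(null, { htmlName: "default" }));                              // { htmlName: 'default' }
```

Keys that are not in the defaults are dropped, so unexpected fields in a variation never reach the rest of the code.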

(deprecated) run_by_itself (boolean):

(defaults to true): All instances run on their own, i.e. every user goes through every instance. Active instances are not pooled with user traffic split among them. Instead, each instance has its own hash_seed, so users are randomized independently into every active instance and, statistically, the instances don't interfere with each other's results.

Methods

Main Chameleon

const chameleon = require('aws-chameleon');

Using the main chameleon object you can call the following methods

getInstance(instanceName, template, options)

DOES NOT RETURN A PROMISE

This method is used to initialize and get an object of chameleon for that instance.

input:
    - instanceName:String (name of the instance/part of the backend to be tested)
    - template:JSON object (template of the expected variation)
    - options:JSON object (optional; see Options above)
output:
    - chameleon instance object that can be used to get variations and manage experiments (see below)

getAllInstances()

RETURNS A PROMISE

This method returns a list of all the instances created in S3.

input:
    - N/A
output:
    - a promise resolving to an array of instance names (e.g. onboarding, paywall, price)

getActiveInstances()

RETURNS A PROMISE

This method returns a list of all active instances. An instance is active if it has an active experiment. All incoming traffic is divided equally among these active experiments. The list is a subset of the array returned by getAllInstances().

input:
    - N/A
output:
    - Array of instanceNames (eg. onboarding)
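Since getActiveInstances() returns a subset of getAllInstances(), the instances with no active experiment are the difference between the two. The sketch below uses only those two documented methods and assumes both resolve to arrays of instance-name strings. `getInactiveInstances` is a hypothetical helper, and a stub object stands in for `require('aws-chameleon')` so the example runs without S3.

```javascript
// Hypothetical helper: instances that exist in S3 but have no active experiment.
async function getInactiveInstances(chameleon) {
  const results = await Promise.all([chameleon.getAllInstances(), chameleon.getActiveInstances()]);
  const all = results[0];
  const activeSet = new Set(results[1]);
  return all.filter((name) => !activeSet.has(name));
}

// Stub for illustration only; in real code pass require('aws-chameleon').
const stubChameleon = {
  getAllInstances: async () => ["onboarding", "paywall", "price"],
  getActiveInstances: async () => ["onboarding"],
};

getInactiveInstances(stubChameleon).then((names) => console.log(names)); // [ 'paywall', 'price' ]
```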

deleteInstance(instanceName)

#TODO

RETURNS A PROMISE

This method deletes the instance from S3. Deletion cannot be undone, so use the backupInstance() method to download the instance folder from S3 to your machine first. To remove an instance completely, run this method, remove the code that initializes the instance from your backend, and redeploy.

input:
    - instanceName:String (name of the instance to delete)
output:
    - N/A

backupInstance(instanceName, path)

#TODO

RETURNS A PROMISE

This method downloads the instance's S3 folder to the specified path, backing up your chameleon instance. If you want to keep the instance's history, use this method before deleting the instance. Downloading the folder directly from S3 works just as well.

input:
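    - instanceName:String (name of the instance to back up)
    - path:String (local path to download the folder to)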
output:
    - N/A

Chameleon instance object

const chameleon = require('aws-chameleon');
const template = {
  htmlName: ''
};
const instanceName = 'paywall';
const chameleonInstance = chameleon.getInstance(instanceName, template);

Using the chameleonInstance object you can call the following methods

getVariation(id, reqExperimentObject = {}, reqVariationObject = {})

DOES NOT RETURN A PROMISE

This method returns the variation of the instance's active experiment that the given user id is bucketed into. It runs in real time, with a sub-10ms response time, because it works in memory. A null return means no experiment is active or the user got bucketed into another instance. Handle the null case and fall back to a default value in your code.

You can override bucketing and retrieve a specific variation of an experiment for the instance by passing the optional reqExperimentObject and reqVariationObject. These contain the id and name of the experiment and of the variation, respectively. Both are required for the override. Make sure they are correct; otherwise an error is thrown saying that the variation or the experiment is not present. You can get these values from the get methods described below.

input:
    - id:String (an unique id of the user to bucket into an experiment)
    - reqExperimentObject:(optional)
        eg:{
                name:"experiment-1",
                id:"9059a7f0-a4ed-11e9-90d4-478235203f52",
            }
    - reqVariationObject:(optional)
        eg:{
                name:"variation-1",
                id:"12349a7f0-a4ed-11e9-90d4-478235203f5",
            }
output:
    - A JSON object of the variation the user is bucketed into for that instance & experiment.
    - OR a JSON object of the requested variation, when the override parameters are passed.
    - OR null (handle this case explicitly)
    - OR an error (handle the error)
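Because getVariation() can return null or throw, a small wrapper keeps the fallback logic in one place. This is a sketch. `getVariationOrDefault`, the `override` parameter shape `{ experiment, variation }`, and the fallback object are hypothetical. The stub only imitates the contract described above: a variation, null, or an error. The wrapper passes through whatever getVariation returns, since the exact shape of the variation object is not spelled out here.

```javascript
// Hypothetical wrapper: fall back to a default when getVariation returns null or throws.
function getVariationOrDefault(instance, userId, fallback, override) {
  try {
    const variation = override
      ? instance.getVariation(userId, override.experiment, override.variation)
      : instance.getVariation(userId);
    return variation === null || variation === undefined ? fallback : variation;
  } catch (err) {
    console.error(`getVariation failed for ${userId}: ${err.message}`);
    return fallback;
  }
}

const fallback = { name: "default", data: { htmlName: "default" } };

// Stub imitating the documented contract; in real code use a chameleon instance.
const stubInstance = {
  getVariation(id) {
    if (id === "broken") throw new Error("experiment not present");
    return id === "user-1" ? { name: "variation-1", data: { htmlName: "slideshowPaywall" } } : null;
  },
};

console.log(getVariationOrDefault(stubInstance, "user-1", fallback).data.htmlName); // slideshowPaywall
console.log(getVariationOrDefault(stubInstance, "user-2", fallback).data.htmlName); // default
```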

createExperiment(experimentObject, control, variationsArray, splitsArray)

RETURNS A PROMISE

This method creates an experiment for the instance. It does not activate the experiment; use the activateExperiment() method for that. If the instance already holds the maximum of 10 experiments in memory, it throws an error; archive some old experiments to free up room.

input:
    - experimentObject :{ id, name } this is an object with id and name properties in it
        eg:{
                name:"experiment-1",
                id:"9059a7f0-a4ed-11e9-90d4-478235203f52",
            }
    - control :{ id, name, data } an object with id, name and data properties.
        data, a JSON object, must always match the template specified for the instance, or an error is thrown.
        eg:{
                name:"variation-0",
                id:"9059a7f0-a4ed-11e9-90d4-478235203f5c",
                data:{ htmlName:"timerPaywall" }
            }
    - variationsArray :an array of all the variations of the experiment, each an object with { id, name, data } properties. The same rules as for control apply.
        eg:[
                {
                    name:"variation-1",
                    id:"9059a7f0-a4ed-11e9-90d4-4782352023145",
                    data:{ htmlName:"slideshowPaywall" }
                },
                {
                    name:"variation-2",
                    id:"9059a7f0-a4ed-11e9-90d4-478235201125c",
                    data:{ htmlName:"contentPaywall" }
                }
            ]
    - splitsArray (optional) :an array of integers that add up to 100, with length equal to the length of variationsArray + 1 (control)
        eg:[60, 30, 10]
output:
    - a promise resolving to 'success' if the experiment was created successfully
    - or it throws an error (handle the error)

activateExperiment(experimentObject)

RETURNS A PROMISE

This method activates an experiment for the instance. Only one experiment can be active at a time per instance. If another experiment is already active, or the input is not correct, it throws an error. If there is already an active experiment, call deactivateExperiment() on it first, then activate this one.

input:
    - experimentObject :{ id, name } this is an object with id and name properties in it
        eg:{
                name:"experiment-1",
                id:"9059a7f0-a4ed-11e9-90d4-478235203f52",
            }
output:
    - a promise resolving to 'success' if the experiment was activated successfully
    - or it throws an error (handle the error)
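createExperiment() does not activate anything, and activateExperiment() fails while another experiment is active. Switching to a new experiment therefore takes three steps. Below is a sketch using only the documented instance methods. `launchExperiment` and the call-recording stub that stands in for a real chameleon instance are hypothetical.

```javascript
// Hypothetical helper: create an experiment and make it the active one.
async function launchExperiment(instance, experiment, control, variations, splits) {
  // Create first: if this fails, the currently active experiment keeps running.
  await instance.createExperiment(experiment, control, variations, splits);

  const active = await instance.getActiveExperiment(); // null when nothing is active
  if (active) {
    await instance.deactivateExperiment({ id: active.id, name: active.name });
  }
  return instance.activateExperiment({ id: experiment.id, name: experiment.name });
}

// Hypothetical stub that records calls, standing in for a real chameleon instance.
function makeStubInstance(initiallyActive) {
  let active = initiallyActive;
  const calls = [];
  return {
    calls,
    createExperiment: async (exp) => { calls.push(`create ${exp.name}`); return "success"; },
    getActiveExperiment: async () => active,
    deactivateExperiment: async (exp) => { calls.push(`deactivate ${exp.name}`); active = null; return "success"; },
    activateExperiment: async (exp) => { calls.push(`activate ${exp.name}`); active = exp; return "success"; },
  };
}

const stub = makeStubInstance({ name: "experiment-0", id: "old-id" });
launchExperiment(
  stub,
  { name: "experiment-1", id: "9059a7f0-a4ed-11e9-90d4-478235203f52" },
  { name: "variation-0", id: "9059a7f0-a4ed-11e9-90d4-478235203f5c", data: { htmlName: "timerPaywall" } },
  [{ name: "variation-1", id: "9059a7f0-a4ed-11e9-90d4-4782352023145", data: { htmlName: "slideshowPaywall" } }],
  [50, 50]
).then(() => console.log(stub.calls)); // logs the create, deactivate and activate calls in order
```

Creating before deactivating means that if createExperiment() fails (for example because of the 10-experiment limit), the current experiment keeps serving traffic. A short window remains between deactivation and activation in which getVariation() returns null, so the null handling described for getVariation() still matters.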

deactivateExperiment(experimentObject)

RETURNS A PROMISE

This method deactivates an experiment for the instance. Only one experiment can be active at a time per instance. If the input is not correct, it throws an error.

input:
    - experimentObject :{ id, name } this is an object with id and name properties in it
        eg:{
                name:"experiment-1",
                id:"9059a7f0-a4ed-11e9-90d4-478235203f52",
            }
output:
    - a promise resolving to 'success' if the experiment was deactivated successfully
    - or it throws an error (handle the error)

archiveExperiment(experimentObject)

RETURNS A PROMISE

This method archives an experiment, offloading it from memory to S3 for the instance. An instance can hold a maximum of 10 experiments in memory at a time; once that limit is reached, use this method to archive some old ones. If you try to archive the active experiment, or the input is not correct, it throws an error.

input:
    - experimentObject :{ id, name } this is an object with id and name properties in it
        eg:{
                name:"experiment-1",
                id:"9059a7f0-a4ed-11e9-90d4-478235203f52",
            }
output:
    - a promise resolving to 'success' if the experiment was archived successfully
    - or it throws an error (handle the error)
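When an instance is at the 10-experiment limit, some inactive experiments have to be archived before you create or unarchive another. The sketch below uses the documented getAllExperiments() and archiveExperiment() methods. `makeRoom` is a hypothetical helper. It archives in the key order of the `experiments` object, an arbitrary policy chosen for illustration that is not necessarily oldest-first.

```javascript
// Documented limit: at most 10 experiments in memory per instance.
const MAX_IN_MEMORY = 10;

// Hypothetical helper: archive inactive experiments until `needed` more fit under the limit.
async function makeRoom(instance, needed) {
  const all = await instance.getAllExperiments();
  if (!all) return []; // getAllExperiments() may resolve to null
  const keys = Object.keys(all.experiments);
  const excess = keys.length + needed - MAX_IN_MEMORY;
  if (excess <= 0) return [];

  // The active experiment cannot be archived, so only the others are candidates.
  const candidates = keys.filter((key) => key !== all.activeExperiment);
  if (excess > candidates.length) {
    throw new Error(`cannot free ${excess} slot(s) without archiving the active experiment`);
  }

  const archived = [];
  for (const key of candidates.slice(0, excess)) {
    const { id, name } = all.experiments[key];
    await instance.archiveExperiment({ id, name }); // sequential: each call updates S3
    archived.push(key);
  }
  return archived;
}

// Usage with a real instance (not run here):
// makeRoom(chameleonInstance, 1).then((keys) => console.log("archived:", keys));
```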

unarchiveExperiment(experimentObject)

RETURNS A PROMISE

This method unarchives an old experiment, loading it back into memory for the instance. An instance can hold a maximum of 10 experiments in memory at a time; if the limit is reached, archive others before unarchiving this one. If the input is not correct or the experiment doesn't exist, it throws an error.

input:
    - experimentObject :{ id, name } this is an object with id and name properties in it
        eg:{
                name:"experiment-1",
                id:"9059a7f0-a4ed-11e9-90d4-478235203f52",
            }
output:
    - a promise resolving to 'success' if the experiment was unarchived successfully
    - or it throws an error (handle the error)

deleteExperiment(experimentObject)

RETURNS A PROMISE

This method permanently deletes an old or unwanted experiment from the instance. Back up its data first if you need it. If the input is not correct or the experiment doesn't exist, it throws an error.

input:
    - experimentObject :{ id, name } this is an object with id and name properties in it
        eg:{
                name:"experiment-1",
                id:"9059a7f0-a4ed-11e9-90d4-478235203f52",
            }
output:
    - a promise resolving to 'success' if the experiment was deleted successfully
    - or it throws an error (handle the error)

getAllExperiments()

RETURNS A PROMISE

This method retrieves all the in-memory experiments for the instance. It returns the instance's template, the key (name-id) of the active experiment, and all the experiments in memory (max 10).

input:
    - N/A
output:
    - an object with following properties
        {
            template:{ htmlName:'' },
            activeExperiment:'experiment-1-9059a7f0-a4ed-11e9-90d4-478235203f5c',
            experiments:{
                experiment-1-9059a7f0-a4ed-11e9-90d4-478235203f5c:{
                    "name":"experiment-1",
                    "id":"9059a7f0-a4ed-11e9-90d4-478235203f5c",
                    "control":{
                        "name":"var0",
                        "id":"a71411c5-7593-4640-bc4d-01bb66e965bf",
                        "data":{
                            "htmlName":"default"
                        },
                        "url":"paywall-html/test-92b5e900-1897-4bad-866a-29331f8b88ff/var0-a71411c5-7593-4640-bc4d-01bb66e965bf.json"
                    },
                    "variations":[ ... ]
                },
                experiment-2-13127f0-a4ed-11e9-90d4-478235203f5c:{
                    ....
                }
            }
        }
    - or null (handle this case explicitly)
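The `experiments` map is keyed by `name-id`, while the management methods take `{ id, name }` objects. Below is a hypothetical helper that flattens one into the other, using the sample data above. The keys are quoted here so the object is valid JavaScript.

```javascript
// Hypothetical helper: flatten getAllExperiments() output into { name, id, active } refs,
// e.g. to pass to archiveExperiment() or to build getVariation overrides. Handles null.
function listExperimentRefs(result) {
  if (!result) return [];
  return Object.keys(result.experiments).map((key) => ({
    name: result.experiments[key].name,
    id: result.experiments[key].id,
    active: key === result.activeExperiment,
  }));
}

const sample = {
  template: { htmlName: "" },
  activeExperiment: "experiment-1-9059a7f0-a4ed-11e9-90d4-478235203f5c",
  experiments: {
    "experiment-1-9059a7f0-a4ed-11e9-90d4-478235203f5c": { name: "experiment-1", id: "9059a7f0-a4ed-11e9-90d4-478235203f5c" },
    "experiment-2-13127f0-a4ed-11e9-90d4-478235203f5c": { name: "experiment-2", id: "13127f0-a4ed-11e9-90d4-478235203f5c" },
  },
};

console.log(listExperimentRefs(sample));
```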

getAllArchivedExperiments()

RETURNS A PROMISE

This method retrieves all the experiments archived in S3 for the instance. It returns the instance's template and all previously archived experiments.

input:
    - N/A
output:
    - an object with following properties
        {
            template:{ htmlName:'' },
            experiments:{
                experiment-1-9059a7f0-a4ed-11e9-90d4-478235203f5c:{
                    "name":"experiment-1",
                    "id":"9059a7f0-a4ed-11e9-90d4-478235203f5c",
                    "control":{
                        "name":"var0",
                        "id":"a71411c5-7593-4640-bc4d-01bb66e965bf",
                        "data":{
                            "htmlName":"default"
                        },
                        "url":"paywall-html/test-92b5e900-1897-4bad-866a-29331f8b88ff/var0-a71411c5-7593-4640-bc4d-01bb66e965bf.json"
                    },
                    "variations":[ ... ]
                },
                experiment-2-13127f0-a4ed-11e9-90d4-478235203f5c:{
                    ....
                }
            }
        }
    - or null (handle this case explicitly)

getActiveExperiment()

RETURNS A PROMISE

This method retrieves the active experiment for the instance: an object with the experiment's id, name, control and variations.

input:
    - N/A
output:
    - an object with following properties
        {
            "name":"experiment-1",
            "id":"9059a7f0-a4ed-11e9-90d4-478235203f5c",
            "control":{
                "name":"var0",
                "id":"a71411c5-7593-4640-bc4d-01bb66e965bf",
                "data":{
                    "htmlName":"default"
                },
                "url":"paywall-html/test-92b5e900-1897-4bad-866a-29331f8b88ff/var0-a71411c5-7593-4640-bc4d-01bb66e965bf.json"
            },
            "variations":[ ... ]
        }
    - or null (handle this case explicitly)
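The override parameters of getVariation() need the exact name and id of an experiment and a variation, which getActiveExperiment() provides. Below is a sketch of a hypothetical `overrideFor` helper that builds the pair from the active experiment, e.g. to let QA force a specific variation. It assumes the entries in `variations` have the same `{ name, id }` fields as `control`.

```javascript
// Hypothetical helper: build getVariation override objects for a named variation
// (or the control) of the object returned by getActiveExperiment().
function overrideFor(activeExperiment, variationName) {
  if (!activeExperiment) throw new Error("no active experiment");
  const candidates = [activeExperiment.control].concat(activeExperiment.variations || []);
  const match = candidates.find((v) => v && v.name === variationName);
  if (!match) throw new Error(`variation ${variationName} not present`);
  return {
    reqExperimentObject: { name: activeExperiment.name, id: activeExperiment.id },
    reqVariationObject: { name: match.name, id: match.id },
  };
}

// Usage (hypothetical, with chameleonInstance created as in the snippet above):
// const active = await chameleonInstance.getActiveExperiment();
// const { reqExperimentObject, reqVariationObject } = overrideFor(active, "var0");
// const variation = chameleonInstance.getVariation("qa-user", reqExperimentObject, reqVariationObject);
```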

getActiveExperimentVariations()

#TODO

RETURNS A PROMISE

This method retrieves the variations of the instance's active experiment.

input:
    - N/A
output:
    - N/A

NODE Principles used

Caching

Modules are cached after the first time they are loaded. This means (among other things) that every call to require('foo') will get exactly the same object returned, if it would resolve to the same file.
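Here is a self-contained demonstration of the caching behavior. It writes a throwaway module to a temp directory, requires it twice, and shows that both calls return the same object until the entry is removed from `require.cache`. Given this section's heading, the framework presumably relies on this so that every `require('aws-chameleon')` in one process shares the same in-memory state. That connection is an inference, not something the code below checks.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write a tiny module to a temp directory so the example is self-contained.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "require-cache-"));
const modulePath = path.join(dir, "counter.js");
fs.writeFileSync(modulePath, "module.exports = { hits: 0 };\n");

const first = require(modulePath);
first.hits += 1;
const second = require(modulePath); // served from require.cache, the module is not re-executed
console.log(first === second, second.hits); // true 1

// Removing the cache entry forces a fresh load, which creates a new object.
delete require.cache[require.resolve(modulePath)];
const third = require(modulePath);
console.log(first === third, third.hits); // false 0

// Clean up the temp files.
fs.unlinkSync(modulePath);
fs.rmdirSync(dir);
```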

TODOs

Per-experiment hash seeds. Within a chameleon instance, if one experiment has 3 buckets (variations) and user A lands in bucket #2, then user A will land in bucket #2 again in a new experiment with 3 buckets. To randomize this, we need to:
- Add a hash seed string for individual experiments in an instance, and generate the numeric hash seed from that string.
- Use the experiment's ID as the default seed string.
- Expose the experiment-level hash seed string via the API, so users can choose to reuse a previous experiment's seed. E.g. if a new experiment only updates the JSON of an old one, it can reuse the old experiment's seed string so that users stay in the same buckets.
Delete instance API (from S3)
Download instance (backup before delete)
Write Tests
Complete few more extra functions
Improve infrastructure