benefit-prediction-service v1.0.4

License: ISC
Last release: 3 years ago

Plan Prediction Service

The plan prediction service uses ML/AI techniques to generate plans for groups based on plans that were historically generated. Currently there are two methods for predicting the plan: baseline and ML/model based.

The baseline method predicts the most frequently observed value for each benefit slug. This will not be correct every time, but it is a good starting point and is easy to implement.
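The baseline method can be sketched as a simple frequency count per benefit slug (a minimal sketch; the input shape shown here is hypothetical, as the service's actual schema is not documented):

```python
from collections import Counter


def baseline_predict(historical_plans):
    """Predict the most frequently observed value for each benefit slug.

    `historical_plans` is a list of dicts mapping benefit slug -> value
    (an assumed input shape for illustration).
    """
    values_by_slug = {}
    for plan in historical_plans:
        for slug, value in plan.items():
            values_by_slug.setdefault(slug, []).append(value)
    # For each slug, take the single most common observed value.
    return {slug: Counter(vals).most_common(1)[0][0]
            for slug, vals in values_by_slug.items()}
```

For example, given three historical plans where "deductible" was "500" twice and "1000" once, the baseline prediction for "deductible" is "500".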

The model-based method uses CatBoost models, which are gradient-boosted trees with optimizations for categorical data. The benefits predicted by these models are selected by a variance threshold, which filters out benefits that take a single value more than x% of the time. A second threshold, the count threshold, filters out infrequently observed combinations of benefit values. Together, the two thresholds determine which benefits are predicted and the possible values for the predictions. In some cases there is no benefit to generating a model, because the selected benefits do not vary enough to observe value; in those cases only baseline results are returned.

Environment Variables

  • MLFLOW_TRACKING_URI - MLflow tracking server hosting the Model Registry where the models are logged
  • MODEL_NAME_ROOT - root name of the models to be served. Default: _all_benefits_grouped
  • MODEL_STAGE - stage of the models to be used. Choices: STAGING, PRODUCTION. Default: PRODUCTION
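Reading this configuration with the documented defaults might look like the following (a sketch; the variable names and defaults come from the list above, while the `load_config` helper itself is hypothetical):

```python
import os


def load_config(env):
    """Build the service configuration from an environment mapping,
    applying the documented defaults."""
    stage = env.get("MODEL_STAGE", "PRODUCTION")
    if stage not in {"STAGING", "PRODUCTION"}:
        raise ValueError("MODEL_STAGE must be STAGING or PRODUCTION")
    return {
        "tracking_uri": env.get("MLFLOW_TRACKING_URI"),  # no default; needed for model loading
        "name_root": env.get("MODEL_NAME_ROOT", "_all_benefits_grouped"),
        "stage": stage,
    }


# In the service this would be called with the real environment:
# config = load_config(os.environ)
```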

Required Permissions

The service must also have access to the S3 bucket that MLflow uses as its artifact store.

Models

The models used by this service are named in the MLflow Model Registry using the following naming convention.

pl*{MODEL_NAME_ROOT}

If a model cannot be found for a registered product line under the correct tag, no model is loaded and only the baseline plan prediction is generated for that product line.
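The registry lookup with baseline fallback could be sketched as below (a sketch under assumptions: `model_name` assumes the `pl*` in the convention stands for the product-line identifier followed by MODEL_NAME_ROOT, and the loading function assumes mlflow is installed and MLFLOW_TRACKING_URI points at the tracking server):

```python
def model_name(product_line, name_root="_all_benefits_grouped"):
    """Build the registry name; assumes the product-line identifier
    replaces the `pl*` part of the naming convention."""
    return f"{product_line}{name_root}"


def load_model_or_none(product_line, stage="PRODUCTION"):
    """Try to load the registered CatBoost model for a product line from
    the MLflow Model Registry; return None so the caller can fall back
    to the baseline prediction."""
    import mlflow  # imported lazily so the naming helper has no dependencies

    try:
        return mlflow.catboost.load_model(
            f"models:/{model_name(product_line)}/{stage}"
        )
    except Exception:
        # No model registered under this name/stage: caller uses baseline.
        return None
```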

Development Environment Requirements

  • AWS CLI - provides credentials and access to S3. Must be configured for MFA if using IAM credentials.
  • docker - used to run the container locally

Running the docker container

The service API runs on port 8001 in the container using HTTP; HTTPS is currently not used for this service. The port is forwarded out of the container so that the API may be accessed by client connections.

If running the service from your development environment, the docker container will need access to an AWS credentials store so that a connection can be made to the model files in S3. The bucket and file names used by the service are determined at runtime via MODEL_STAGE and MLflow tagging.

You can start the service without MLflow or access to S3, but then only the most common values (the baseline) will be predicted.

Docker command tested

docker run --name benefit-prediction-test -p 8001:80 --env MLFLOW_TRACKING_URI=http://ec2-34-217-53-154.us-west-2.compute.amazonaws.com -v ${HOME}/.aws/credentials:/root/.aws/credentials:ro benefit-prediction

Version history

  • 1.0.4 - 3 years ago
  • 1.0.3 - 3 years ago
  • 1.0.1 - 3 years ago
  • 1.0.0 - 3 years ago