How to Run a Benchmark Test on Wallaroo
Benchmarking the performance of ML models in production gives organizations much-needed visibility into bottlenecks within their existing infrastructure and deployment solutions.
This step-by-step guide demonstrates how to run Aloha-CNN-LSTM, an open source model that classifies domain names as legitimate or nefarious. Besides being open source, the ALOHA model is a good proxy for benchmarking how complex ML models perform in a given data stack from a compute and latency perspective. For example, when we ran this model against Google Vertex, AWS SageMaker, and Databricks (running on Azure) on comparable 16-CPU server clusters, Wallaroo generated nearly 10x more inferences per second at 80% less cost.
That said, the best test is always running your own data on your own models. If you are interested in a benchmark comparing Wallaroo to other model deployment solutions, contact us here to get in touch with our specialists.
Step-by-Step Guide to Run ALOHA
Assuming you’ve installed Wallaroo in a cloud Kubernetes cluster, the steps below cover the process of running ALOHA, from creating the workspace and uploading the model to deploying a pipeline that feeds the model data and verifying functionality.
Step 1: Establishing a Connection to Wallaroo
First, establish a connection to Wallaroo via the Wallaroo client using the wallaroo.Client() command. Using the Python library in the JupyterHub interface, this looks like:
import wallaroo
wl = wallaroo.Client()
A URL granting the SDK permission to your Wallaroo environment will be displayed. Open the URL in your browser and confirm permissions. The connection is saved as the wl variable so it can be referenced later.
Step 2: Create the Wallaroo Workspace
To create a new workspace named "aloha-workspace" and set it as the current workspace environment, enter the following commands:
new_workspace = wl.create_workspace("aloha-workspace")
_ = wl.set_current_workspace(new_workspace)
Next, verify that your workspace has been created and set as the current workspace by entering the get_current_workspace() command:
wl.get_current_workspace()
The result should be as follows:
Step 3: Configuring the Wallaroo Engine
Now we will set the properties of our inference engine using the Wallaroo DeploymentConfigBuilder(), which defines how Kubernetes allocates resources to the deployment.
You will need to define the number of inference engine replicas in the deployment, the number of CPU cores per inference engine, and the memory (in GB) per inference engine.
In this case, we will use the following command to set our configuration to a single inference engine, 4 cores, and 8GB of memory per inference engine:
deployment_config = (wallaroo.DeploymentConfigBuilder()
    .replica_count(1)
    .cpus(4)
    .memory("8Gi")
    .build())
Step 4: Uploading the Models
Next, upload your models. The following command uploads the Aloha model from a .zip file containing a protocol buffer file for webpage evaluation, configured to use data in the TensorFlow format.
model = wl.upload_model("aloha-2", "./aloha-cnn-lstm.zip").configure("tensorflow")
Step 5: Deploying a Model
In this stage, you state the deployment name, add the TensorFlow model as a pipeline step, and deploy with your preferred configuration. Create a pipeline called aloha-test-demo that ingests data, passes it to the Aloha model, and generates the final output; deployment typically completes within about 45 seconds. Use the commands below:
aloha_pipeline = wl.build_pipeline("aloha-test-demo")
aloha_pipeline.add_model_step(model)
aloha_pipeline.deploy()
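If you want the deployment to use the resource configuration built in Step 3 rather than the engine defaults, the configuration object can be passed to the deploy call. This is a minimal sketch; the deployment_config parameter name is assumed here, so confirm it against your Wallaroo SDK version:
# Deploy using the 1-replica, 4-CPU, 8Gi configuration from Step 3
# (parameter name assumed; check your SDK documentation).
aloha_pipeline.deploy(deployment_config=deployment_config)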
If you encounter a deployment error citing insufficient resources, you can free resources by undeploying any other running pipelines with the following command, then redeploying this pipeline:
for p in wl.list_pipelines():
    p.undeploy()
If successfully deployed, the result will appear as below:
To verify that the pipeline and its associated models are running, enter the following command:
aloha_pipeline.status()
It should generate the following results:
Step 6: Inferences
You can verify the functionality of your deployed pipeline by performing a smoke test using the infer_from_file command as follows:
aloha_pipeline.infer_from_file("data-1.json")
This will load a single encoded URL into the inference engine and print back the results as shown below:
Step 7: Batch Inference
If the smoke test is successful, you can proceed to feed in larger batches of real data. In this case, we will use data-1k.json with 1,000 inferences and data-25k.json with 25,000 inferences, recording how long each takes; a sketch for the smaller batch is shown below.
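For the 1,000-inference batch, you can reuse the same infer_from_file() call from the smoke test. This is a minimal sketch that assumes data-1k.json sits in the working directory and uses Python's standard time module to measure wall-clock duration:
import time

start = time.time()
result = aloha_pipeline.infer_from_file("data-1k.json")  # 1,000 encoded URLs in a single call
print(f"1,000 inferences completed in {time.time() - start:.2f} seconds")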
The following commands run the data-25k.json file through the aloha_pipeline deployment URL and write the results to a file titled curl_response.txt. Retrieve the URL with the _deployment._url() command and substitute it into the curl command below:
aloha_pipeline._deployment._url()
'http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo'
!curl -X POST http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo -H "Content-Type: application/json" --data @data-25k.json > curl_response.txt
The result should look like this:
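Since the goal is benchmarking, you may also want curl itself to report how long the request took. This optional variant uses curl's standard -w/--write-out option with the time_total variable (the URL is the example value from above; substitute your own deployment URL):
!curl -X POST http://engine-lb.aloha-test-demo-5:29502/pipelines/aloha-test-demo -H "Content-Type: application/json" --data @data-25k.json -o curl_response.txt -w "Total request time: %{time_total}s\n"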
Step 8: Undeploy Pipeline
Finally, when you are done with your tests, use the command below to undeploy the pipeline and return its Kubernetes resources to other tasks:
aloha_pipeline.undeploy()
Note that if the deployment configuration is left unaltered, deploying the pipeline again later will restart the inference engine in the same configuration as before.
Following the steps above should let you successfully run the ALOHA model and produce a performance benchmark to evaluate against your current model deployment solution. For more information or assistance, please visit the Wallaroo documentation site.