Model Interoperability with ONNX
Make it easier to share and deploy your machine learning models
ONNX (the Open Neural Network Exchange) is an open-source format for serializing machine learning models. While it was originally developed for representing neural nets, it has been extended to cover a variety of traditional machine learning algorithms as well. Because the model representation is independent of any specific environment, ONNX allows data scientists to share the machine learning models that they produce, regardless of their preferred modeling framework, and to deploy them across a variety of runtime platforms.
In this article, we’ll give an overview of ONNX, and talk about why it’s an important tool for sharing and deploying machine learning models. We’ll also provide some tips and resources for converting models to ONNX.
The Basic Idea
Let’s take a simple linear model:
This expression can be represented by a computation graph, made up of features (inputs), edges, weights, and operators:
[Figure: A notional computational graph for a linear model]
An ONNX model is a description of this graph. The graph can then be “executed” by any runtime that understands the representation.
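To make the idea concrete, here is a minimal sketch in plain Python of a graph description and a tiny "runtime" that executes it. This is not the ONNX format or the ONNX Runtime API: the dictionary layout, the `run_graph` function, the two-feature linear model, and its weights are all hypothetical, chosen only to mirror the inputs, weights, and operators described above.

```python
import operator

# Operators the toy runtime understands (hypothetical names).
OPERATORS = {"Mul": operator.mul, "Add": operator.add}

# A toy description of y = w1*x1 + w2*x2 + b. NOT the ONNX format,
# just an illustration of "a model is a description of a graph".
linear_graph = {
    "inputs": ["x1", "x2"],
    "weights": {"w1": 2.0, "w2": -1.0, "b": 0.5},  # made-up values
    # each node: (operator, input names, output name), in execution order
    "nodes": [
        ("Mul", ("w1", "x1"), "t1"),
        ("Mul", ("w2", "x2"), "t2"),
        ("Add", ("t1", "t2"), "t3"),
        ("Add", ("t3", "b"), "y"),
    ],
    "outputs": ["y"],
}


def run_graph(graph, feeds):
    """Execute the graph description on the given input values."""
    missing = [name for name in graph["inputs"] if name not in feeds]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    # every named value (weights, inputs, intermediates) lives in one table
    values = {**graph["weights"], **feeds}
    for op_name, in_names, out_name in graph["nodes"]:
        args = [values[name] for name in in_names]
        values[out_name] = OPERATORS[op_name](*args)
    return [values[name] for name in graph["outputs"]]


result = run_graph(linear_graph, {"x1": 3.0, "x2": 4.0})
print(result)
```

Note that `run_graph` knows nothing about how the weights were fit. Any program that understands the description can execute it, which is the property that makes a shared graph format useful.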
The beauty of this representation is that it can be used to express a wide variety of complex model types, regardless of how that model was originally fit. Whether you fit a gradient boosting model using scikit-learn or xgboost, or fit an LSTM using PyTorch or TensorFlow, you can serialize your model to an ONNX representation that’s not beholden to the original modeling framework.
These models can be run with ONNX Runtime, a cross-platform model accelerator that supports a wide variety of operating systems, architectures, and hardware accelerators.
This gives Data Scientists and ML Engineers a lot of flexibility to tune their respective ecosystems to their needs. Data Scientists can develop in the language and framework of their choice, and share the models with colleagues who may prefer another framework. These colleagues can test out the model without needing to know much about the environment where it was originally developed; they need only the appropriate format for the input data and the appropriate version of ONNX.
ML Engineers can deploy these models to the best environment for their inferencing use case, with minimal or no dependence on the model’s development framework.
For example, our company, Wallaroo.ai, uses ONNX as the primary model framework for our ML production platform. Data Scientists can develop models in their preferred Python framework, convert them to ONNX, and upload them to the Wallaroo high-performance compute engine, which is implemented in Rust. Wallaroo then efficiently runs the model in the production environment.
Other production environments might run the model in C, or on special hardware accelerators, or deploy the models to the edge (a scenario Wallaroo also supports).
Let’s See It in Action
Let’s see an example of training a model, converting it to ONNX, and running inference with ONNX Runtime in Python. For this example, we will train a simple Keras model to classify IMDB movie reviews as positive or negative. Since the focus of this article is on model conversion, rather than training, we’ll use the already tokenized version of the data set that is included in Keras.
This code snippet trains the model and saves it to the TensorFlow SavedModel format. It also saves off a small bit of data (five rows) for testing the fidelity of the ONNX conversion, later on.
import tensorflow.keras as keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Embedding, Flatten

#
# get the data
#
# a bit small, but this is just to create an example, not to make a good model
max_len = 100
embed_dim = 8
max_features = 10000

# this is already tokenized
(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(
    num_words=max_features,
)
print(len(x_train), "Training sequences")
print(len(x_val), "Validation sequences")

# pad or truncate every review to exactly max_len tokens
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)
x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=max_len)

# save a small amount of data for demonstrating the autoconversion
test_data_small = x_val[0:5, ]

#
# Train a simple keras classifier
#
model = Sequential()
model.add(Embedding(max_features, embed_dim, input_length=max_len))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model.summary()

history = model.fit(x_train, y_train, epochs=5,
                    batch_size=32, validation_split=0.2)

# save the trained model in SavedModel format
model.save("models/simple_sentiment_model/")
Note that for this example, the model input is a vector of 100 integer tokens (max_len = 100).
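Every review ends up as exactly 100 tokens because of the padding step. The sketch below imitates that step in plain Python for a single review, assuming the default `padding='pre'` and `truncating='pre'` behavior of Keras's `pad_sequences`. The `pad_sequence` helper and the token values are hypothetical, and a small `maxlen` is used so the result is easy to read.

```python
def pad_sequence(tokens, maxlen, value=0):
    """Left-pad or left-truncate one token list to exactly maxlen items."""
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    tokens = list(tokens)[-maxlen:]  # too long: keep only the last maxlen tokens
    return [value] * (maxlen - len(tokens)) + tokens  # too short: zeros in front


short_review = [14, 22, 16]       # made-up token ids
long_review = list(range(1, 8))   # seven made-up token ids: 1..7

padded_short = pad_sequence(short_review, maxlen=5)
padded_long = pad_sequence(long_review, maxlen=5)
print(padded_short)
print(padded_long)
```

Whoever runs the converted model has to apply the same padding to new text, since the ONNX graph in this example starts at the padded integer vector.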
Converting the Model
To convert our model to ONNX, we will use the onnxmltools package. The conversion function takes as input the trained Keras model and a description of the model’s input. This description is a list of tuples, where each tuple holds the name of an input and its type.
import onnx
import onnxmltools
from onnxmltools.convert.common.data_types import Int32TensorType
# create the input description
shape = [None, test_data_small.shape[1]]
input_name = 'input_text'
initial_types = [(input_name, Int32TensorType(shape))]
Our model has one input, of type Int32TensorType([None, 100]); that is, the model accepts as input an arbitrary number of integer vectors of length 100. We’ll call that input “input_text.”
Finally, we convert and save the model.
onnx_model = onnxmltools.convert_keras(model, initial_types=initial_types)
onnx.save_model(onnx_model, 'models/sentiment.onnx')
Inferring with the ONNX Model
After the model is converted, it can be shared with other data scientists, who can run it using ONNX Runtime. We’ll show an example of that in Python, using the onnxruntime package. The first thing a new user might want to do is interrogate the model to determine its inputs and outputs.
import onnxruntime

# start up an inference session
sess = onnxruntime.InferenceSession('models/sentiment.onnx')

# get the names, types, and shapes of the inputs and outputs
for inp in sess.get_inputs():
    print(f'input {inp.name} : {inp.type} of shape {inp.shape}')
for outp in sess.get_outputs():
    print(f'output {outp.name} : {outp.type} of shape {outp.shape}')

# get just the input names
inputs = [inp.name for inp in sess.get_inputs()]
This gives us the following output:
input input_text : tensor(int32) of shape ['unk__8', 100]
output dense : tensor(float) of shape ['unk__9', 1]
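This tells us that the model takes a single input named “input_text”, consisting of integer vectors of length 100, and for each vector returns a single float in an output named “dense” (the probability that the text is a positive review). The first dimension of each shape is a symbolic name ('unk__8', 'unk__9') rather than a number, which means the batch size is not fixed. In this example, we aren’t really using the output names.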
Finally, let’s predict on our example input data with a call to sess.run(). The arguments to the run method are a list of the output names to return (we’ll pass None here, which returns all outputs) and a dictionary keyed by the input name(s).
pred_onnx = sess.run(None, {inputs[0]: test_data_small})
pred_onnx
And now we’ve successfully inferred with the model, without needing the Keras environment.
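The five rows saved earlier were set aside to test the fidelity of the conversion, so a natural last step is to compare the ONNX predictions with the original model's predictions on those rows. Below is a minimal sketch of such a check in plain Python. The `predictions_match` helper, the tolerances, and the sample numbers are all hypothetical stand-ins, not output from the model in this article.

```python
import math


def flatten(values):
    """Yield floats from nested lists/tuples, or from anything with .tolist()."""
    if hasattr(values, "tolist"):  # e.g. a NumPy array
        values = values.tolist()
    if isinstance(values, (list, tuple)):
        for item in values:
            yield from flatten(item)
    else:
        yield float(values)


def predictions_match(expected, actual, rel_tol=1e-4, abs_tol=1e-5):
    """True if both prediction sets have the same size and agree within tolerance."""
    a, b = list(flatten(expected)), list(flatten(actual))
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


# Made-up numbers standing in for the original and converted model outputs.
keras_like = [[0.91], [0.12], [0.55], [0.03], [0.78]]
onnx_like = [[0.910001], [0.119999], [0.55], [0.03], [0.780002]]

match_ok = predictions_match(keras_like, onnx_like)
match_bad = predictions_match(keras_like, [[0.5], [0.12], [0.55], [0.03], [0.78]])
print(match_ok, match_bad)
```

In the setting of this article, and assuming the Keras model is still available in the session, the call would look something like `predictions_match(model.predict(test_data_small), pred_onnx[0])`. Small differences in the last few decimal places are expected after conversion, which is why the comparison uses a tolerance rather than exact equality. With NumPy installed, `numpy.testing.assert_allclose` is a common alternative.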
Tips and Resources for ONNX Conversion
ONNX provides a lot of advantages in terms of sharing, running, and deploying models, but model conversion can be a challenge. Fortunately, both PyTorch and Hugging Face have fairly well documented and straightforward procedures for converting models from those respective frameworks.
For other ONNX-supported frameworks, the documentation is a bit diffuse, and there have been several conversion packages that have come and gone. I’ve found that onnxmltools is the most reliable and up-to-date; the package supplies some useful examples for converting models from a variety of frameworks.
For deployment, the ideal situation would be for data scientists to be able to submit their original models to a deployment registry and have that registry automatically convert those models to ONNX or another appropriate representation to run in production. Wallaroo is currently working on making this situation a reality. But in the meantime, knowing how to convert models to ONNX for maximum interoperability is a valuable tool in the Data Scientist’s arsenal.