Model Deployment for Data Scientists
Abstract
In the world of machine learning, model deployment is a crucial piece of the puzzle. While data scientists excel at other parts of the pipeline, deploying machine learning models tends to fall under the umbrella of software engineering or IT operations. And for good reason—successful deployments require a myriad of complex tasks, including building infrastructure, implementing APIs, load balancing, and integrating with data pipelines. We’ll briefly walk you through a basic model deployment example by picking out tools and planning an approach to serve a simple sentiment classification model.
Intro
By the end of this post you will have the tools to serve your deep learning (DL) models via an API. We will also discuss the pros and cons of each tool, as well as common challenges you will face in deployment. Below are the requirements for the tools that we’ll use in this tutorial. We will be using Python, so you will need to be familiar with Python code. Note that some packages have large file sizes, so feel free to download them beforehand. Let’s begin with the most basic form of deployment: a Flask server.
Requirements
Python 3.6 or later
Tools
Flask
Python has rich support for various types of web frameworks such as Flask and Django. We have decided to use Flask, since it is lightweight and efficient, which makes it perfect for quick development cycles.
Web servers that host machine learning models can easily be integrated into a production environment via an API or take inputs through web forms directly on your webpage.
Flask isn’t built to be a production-ready framework on its own, as it ships with a development-only server.
TensorFlow & Keras
TensorFlow is an expansive platform/toolkit for creating prediction models. The library supports numerical computation and is well suited for computer vision and natural language processing. It defines the model as a static dataflow graph, with details on how data moves through the graph and what operations are applied to the data. Other features include TensorBoard, which provides tools for visualization, and TensorFlow Hub, which hosts pre-trained models.
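For instance, in TensorFlow 1.x (which this tutorial assumes) you first build the graph and only later execute it inside a session; the toy snippet below simply illustrates that two-step pattern:

import tensorflow as tf

# building the graph only describes the computation; nothing runs yet
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b

# the session executes the dataflow graph and returns concrete values
with tf.Session() as sess:
    print(sess.run(c))  # 6.0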
Keras is a high-level neural network API that uses TensorFlow as its backend and allows for fast experimentation.
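As a flavor of how quick that experimentation can be, a toy sentiment classifier (a rough sketch, not the exact architecture shipped in the repo) can be defined in a handful of lines:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# toy binary sentiment model: 10,000-word vocabulary, sequences padded to length 50
model = Sequential([
    Embedding(input_dim=10000, output_dim=64, input_length=50),
    LSTM(32),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])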
Anaconda
Anaconda is a data science platform that comes prepackaged with common Python packages. The platform also includes other useful built-in tools, such as conda, a virtual environment and package manager for Python. It also comes with Spyder and Jupyter Notebook, popular tools that streamline data science work such as data exploration and visualization.
By leveraging these tools together, we use the Flask framework to create an application server that can serve inference for your TensorFlow and Keras models.
Use Case
Flask is ideal to use in scenarios where you would like to quickly demo your model output and mock up the ideal workflow. This quick deployment workflow gives you more time for model development. Since Flask includes a built-in development server, it doesn’t need much configuration or support to accept HTTP requests. However, Flask does not scale up well, as it is really only designed to support one request at a time. Typically, you would place dedicated web (e.g., NGINX) and application servers (e.g., uWSGI, Gunicorn, mod_wsgi) in front of Flask so that it can handle a high number of concurrent requests, as sketched below.
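As a sketch of what that might look like (assuming your Flask app lives in test_app.py as an object named application, as it will in the walkthrough below; Gunicorn itself is not part of this tutorial's repo), you could put Gunicorn in front of the app like this:

$ pip install gunicorn
# 4 worker processes serving the `application` object defined in test_app.py
$ gunicorn --workers 4 --bind 0.0.0.0:8000 test_app:application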
Proof of concept (POC) projects benefit from Flask since it can quickly demonstrate the end-to-end workflow of the model. Having a prediction server can help business leaders easily see and understand the problem that the model is solving and how it fits into the pipeline in production.
Now that we’ve explained all the tools and dependencies we will need, let’s go ahead and deploy the model. You can swap in your own model architecture and weights in the code, but included in the repo is a sentiment analysis classification model. We’ve already completed the data cleaning, feature engineering, and model selection steps; in our example we will focus on how to deploy the model.
Now let’s get started and deploy our sentiment classification model!
Environment/Dependencies
First, we need to create a virtual environment to download dependencies.
If you have these packages already, you can skip this step. We will first create a virtual environment called “flask_deployment”, although you can choose any other name you prefer. Creating a virtual environment helps contain packages within that environment so that you can easily manage package versions and avoid conflicts between different projects.
$ conda create -n flask_deployment python=3.6 anaconda
$ source activate flask_deployment
$ pip install Flask
$ pip install tensorflow
$ pip install keras
Code Walkthrough
*Full code can be accessed here: https://github.com/nightfalldlp/model_deployment
Python Code Walkthrough
Here are the steps to deploy a Flask server. We have broken down the lines of code for easier comprehension. We also have provided the full code repository at the beginning and end of this section.
1. Imports
In lines 8-12, we are importing the necessary libraries.
import pickle
import tensorflow as tf
from flask import Flask, request, jsonify
from keras.preprocessing.sequence import pad_sequences
from keras.models import load_model
2. Creating the application
In line 16, we are creating an instance of the Flask object.
application = Flask(__name__)
3. Route and decorator
Lines 27-29 use a decorator that routes any request to the specified endpoint (e.g., '/') and runs the function it decorates, in this case the running() function.
@application.route('/', methods=['GET'])
def running():
    return "Testing the landing page! This worked"
4. Running the application
Lines 87-88 run the app on the local development server, which is accessible as localhost (although if you tell it to run on '0.0.0.0' it will expose the app to your local network). However, neither option is recommended for a production environment; they are typically used only during development.
if __name__ == "__main__":
    application.run(host="0.0.0.0", port=5000, debug=True)
You can start your server by running the script in your terminal. If you see the content in the box below, that means you have successfully created your app. Running your app this way will expose the route to your local network, which allows other team members to quickly test the performance of your model.
$ export FLASK_APP=test_app.py
$ flask run -h 0.0.0.0

If successful, you should see:

 * Serving Flask app "test_app.py"
 * Environment: production
   WARNING: Do not use the development server in a production environment. Use a production WSGI server instead.
 * Debug mode: off

To test your server, run the following in your terminal:

$ curl http://127.0.0.1:5000/
5. Route number 2 using POST request
In lines 40-45 we switch from a GET request to a POST request. Both methods can transfer data over HTTP; however, a GET request carries parameters in the URL string, while a POST carries them in the message body. This spares you from having to URL-encode your data and allows you to make larger (multipart) requests. With TLS both are encrypted, although your logs at the TLS terminator might still capture whole URLs.
Here is where the request object comes in. You can extract the information in the request body by using request.get_json (line 42).
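The repo contains the exact implementation, but a minimal sketch of such an echo route, assuming the same application object and imports from the steps above, could look like this:

@application.route('/json-example', methods=['POST'])
def json_example():
    # parse the JSON body of the POST request
    req_data = request.get_json()
    msg = req_data.get('message')
    # echo the message back as a JSON response
    return jsonify({'output': msg})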
Restart the server using the commands in Step 4; again, you can try to ping your new endpoint with the following command:
$ curl http://127.0.0.1:5000/json-example -d '{"message":"this is a message"}' -H "Content-Type: application/json"

If it is working successfully, you should get back {"output": "this is a message"}.
6. Run your own TensorFlow model.
In lines 55-80 we have provided the code for our own TensorFlow model, but you can modify it to bring in your own. If you would like to use your own model, save your Keras model using the model.save(path) method so that it can be loaded in the same manner. We have also saved the tokenizer object (which converts text into a sequence of vocabulary ids) as a pickle file and are now loading it back as a global variable.
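For reference, the saving side (not shown in this post) might look like the sketch below; it assumes you already have a trained Keras model and a fitted tokenizer in memory, and mirrors the file paths the loading code expects:

import pickle

# `model` is your trained Keras model and `tokenizer` is the fitted Tokenizer -- both assumed to exist
model.save('./dumps/whole_model.h5')  # serialize architecture + weights to a single HDF5 file
with open('./dumps/tokenizer.pkl', 'wb') as f:
    pickle.dump(tokenizer, f)  # serialize the tokenizer so inference reuses the same vocabulary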
with open('./dumps/tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)
Lines 59-63 are loading the TensorFlow model.
def get_model():
    # global model
    model = load_model('./dumps/whole_model.h5')
    model._make_predict_function()
    return model

model = get_model()
Lines 70-80 are used to create an endpoint so that you can run the inference.
In this function, we first receive the message from the POST request, then use the loaded tokenizer to convert the message into a NumPy array so that our model can run inference. Once the prediction is done, we need to index into the result to access the prediction score; you cannot return a NumPy array directly because it is not JSON serializable. Instead, you need to jsonify the output, which converts it into a JSON string representation (line 80).
@application.route('/keras-example', methods=['POST'])
def keras_example():
    # pull the message out of the JSON request body
    req_data = request.json
    msg = req_data.get('message')
    ## tokenizing and model inference
    seq = tokenizer.texts_to_sequences([msg])
    seq = pad_sequences(seq, maxlen=50, padding='post')
    pred = model.predict(seq).tolist()[0][0]
    # wrap the score in a dict and jsonify it for the response
    out = {'prob': pred}
    return jsonify(out)
7. Inference route
The /keras-example route is the endpoint for our inference. If your server is running, you can run the following to get predictions from your Flask server:
$ curl -X POST http://127.0.0.1:5000/keras-example -d '{"message":"I am extremely happy!"}' -H "Content-Type: application/json"

If successful, a JSON response like the following should come back:

{"prob": 0.023021150380373}
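If you prefer Python over curl, a quick client sketch using the requests library (not part of the repo, and assuming the server is still running locally on port 5000) looks like this:

import requests

# POST a message to the local inference endpoint; `json=` sets the Content-Type header for us
resp = requests.post('http://127.0.0.1:5000/keras-example',
                     json={'message': 'I am extremely happy!'})
print(resp.json())  # e.g. {'prob': 0.023021150380373}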
Wait a minute: the sentence "I am extremely happy" gets such a low positive sentiment score. What is going on? Well, that’s because our model was not trained and its weights were randomly initialized :D. Model training is itself an extensive topic; we will cover it in a future post.
If your server is not running, restart it from the terminal:
$ export FLASK_APP=test_app.py
$ flask run -h 0.0.0.0
When this runs successfully, you have deployed your first deep learning model!
Conclusion
In this post, we learned how to deploy a deep learning model with Flask and perform inference.
While this is an excellent first step, you have quite a way to go before the model hosting layer is production-ready. Even if you are planning to keep your local computer running all the time, you still need to find a way to scale up when the demand exceeds the capability of your machine.
The immediate challenges are:
- How to scale up and handle concurrent requests within your application
- How to auto-scale the number of machines (optimally through a cloud service provider).
We can cover these challenges in a future blog post and show you how to build a production-ready server by shifting this Flask server onto Amazon Elastic Beanstalk. We will also explore the additional feature sets that come out of the box with Elastic Beanstalk.