Machine Learning Engineering on AWS by Joshua Arvin Lat

Author: Joshua Arvin Lat
Language: eng
Format: epub
Publisher: Packt
Published: 2022-02-15T00:00:00+00:00


SageMaker Deployment Solutions

After training our machine learning (ML) model, we can proceed with deploying it to a web API. This API can then be invoked by other applications (for example, a mobile application) to perform a “prediction” or inference. For example, the ML model we trained in Chapter 1, Introduction to ML Engineering on AWS, can be deployed to a web API and used to predict the likelihood that customers will cancel their reservations, given a set of inputs. Deploying the ML model to a web API makes it accessible to different applications and systems.

A few years ago, ML practitioners had to build a custom backend API from scratch to host and deploy a model. Given this requirement, you might have used a Python framework such as Flask, Pyramid, or Django to deploy the ML model. Building a custom API to serve as an inference endpoint can take about a week or so since most of the application logic needs to be coded from scratch. If we also wanted A/B testing, auto-scaling, or model monitoring for the API, then we might spend a few additional weeks on top of the time needed to set up the base API. ML engineers and software developers generally underestimate the amount of work required to build and maintain ML inference endpoints. Requirements evolve over time, and the custom application code becomes harder to manage as the requirements and solutions pile up. At this point, you might ask, “Is there a better and faster way to do this?” The good news is that we could do all of it in “less than a day” if we were to use SageMaker to deploy our model! SageMaker has already automated most of the work, so instead of building everything from scratch, all we need to do is specify the right configuration parameters. If needed, SageMaker also lets us customize certain components, and we can easily replace some of the default automated solutions with our own implementations. The sketch below shows what the do-it-yourself approach looks like in its simplest form.
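To make this concrete, here is a minimal sketch of such a custom Flask inference endpoint, assuming a scikit-learn model serialized to a file named model.pkl (the filename and the JSON payload format are hypothetical choices for this illustration):

import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the pre-trained model once at startup (model.pkl is a placeholder name).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON payload such as {"instances": [[5.1, 3.5, 1.4, 0.2]]}.
    payload = request.get_json()
    predictions = model.predict(payload["instances"])
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

Even in this toy version, everything beyond the model itself (routing, input validation, scaling, and monitoring) is left for us to build and maintain, which is exactly the overhead described above.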

One of the misconceptions when using SageMaker is that ML models need to be trained in SageMaker first before they can be deployed using the SageMaker hosting services. It is important to note that “this is not true” since the service was designed and built to support different scenarios, including deploying a pre-trained model straight away. This means that if we have a model trained outside of SageMaker, we can proceed with deploying it without having to go through the training steps again. In this chapter, you’ll discover how easy it is to use the SageMaker Python SDK when performing model deployments. We will show you how to deploy our pre-trained model, in just a few lines of code, to a variety of inference endpoint types: real-time, serverless, and asynchronous inference endpoints. We will also discuss when it’s best to use each of these endpoint types later in this chapter.
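As a preview, here is a hedged sketch of deploying a pre-trained scikit-learn model with the SageMaker Python SDK; the S3 artifact path, IAM role ARN, and inference.py entry point script are placeholders you would replace with your own values:

from sagemaker.sklearn.model import SKLearnModel

# Placeholders: point these at your own model artifact and execution role.
model = SKLearnModel(
    model_data="s3://<BUCKET>/models/model.tar.gz",  # pre-trained model archive
    role="arn:aws:iam::<ACCOUNT_ID>:role/<EXECUTION_ROLE>",
    entry_point="inference.py",  # script defining the model loading/prediction hooks
    framework_version="1.2-1",
)

# Deploy a real-time inference endpoint backed by a dedicated instance.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

print(predictor.predict([[5.1, 3.5, 1.4, 0.2]]))

For a serverless endpoint, we would instead pass a sagemaker.serverless.ServerlessInferenceConfig object to deploy(); asynchronous endpoints similarly take an async inference configuration. We will walk through each of these variations in detail in this chapter.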


