The desired outcome of a machine learning deployment is the ability to make predictions or classifications based on available data. Enterprises achieve this by building, training and deploying ML models on compute infrastructure suited to the required inference tasks. In most cases, running inference is compute-intensive, and optimizing this stage requires several iterations of performance monitoring and tuning.
Many use cases require high performance when running inference tasks. A cloud-based service, such as Amazon SageMaker, delivers a wide range of EC2 instances and ample compute capacity for these tasks. Even so, enterprises risk missing performance goals or incurring high AWS costs from relying on powerful cloud servers.
Edge computing use cases have compute capacity constraints, so it's critical for enterprises to deploy performance-optimized ML models. This is where Amazon SageMaker Neo becomes relevant. It optimizes the performance of ML models based on the specific framework on which they're built and the hardware on which they execute.
How to optimize ML models with SageMaker Neo
The process for optimizing ML models using SageMaker Neo consists of the following steps:
- Build an ML model using any of the frameworks SageMaker Neo supports.
- Train the model, preferably using SageMaker.
- Use SageMaker Neo to create an optimized deployment package for the ML model framework and target hardware, such as EC2 instances and edge devices. This is the only additional task compared to the usual ML deployment process.
- Deploy the optimized ML model generated by SageMaker Neo on the target cloud or edge infrastructure.
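The Neo-specific step above -- creating an optimized deployment package -- can be sketched with the SageMaker Python SDK's `compile_model` method on a trained estimator. The target instance family, input shape, framework version and bucket name below are illustrative assumptions, not fixed values:

```python
# Hedged sketch of step 3, assuming a trained PyTorch estimator from the
# SageMaker Python SDK. All names and shapes here are placeholders.
def compile_for_target(estimator, bucket):
    # compile_model submits a Neo compilation job and returns a deployable
    # Model object pointing at the optimized artifact in S3.
    compiled_model = estimator.compile_model(
        target_instance_family="ml_c5",            # target cloud hardware
        input_shape={"input0": [1, 3, 224, 224]},  # model's expected input tensor
        output_path=f"s3://{bucket}/neo-output/",  # where Neo writes the artifact
        framework="pytorch",
        framework_version="1.8",
    )
    return compiled_model
```

The returned object can then be deployed like any other SageMaker model, which keeps the rest of the release process unchanged.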
SageMaker Neo supports models built in the following ML frameworks: Apache MXNet; Keras; Open Neural Network Exchange, or ONNX; PyTorch; TensorFlow; TensorFlow Lite; and XGBoost.
It supports hardware for target deployments from the following manufacturers: Ambarella, Arm, Intel, Nvidia, NXP, Qualcomm, Texas Instruments and Xilinx. It also supports devices running Windows, Linux, Android and Apple operating systems.
The combination of supported frameworks and hardware is an important consideration when planning the implementation of ML models in SageMaker. Ideally, enterprises evaluate their options in early stages of the design and development cycle.
SageMaker Neo delivers optimizations through two main components: a compiler and a runtime. The compiler applies optimizations based on the ML framework and target infrastructure, and it generates deployment artifacts by executing compilation jobs. These jobs can be triggered from the AWS console, SDK or CLI.
The output artifacts from Neo compilation jobs are placed in an S3 bucket, where they're available for deployment on target infrastructure. The compiled model then runs on the target using a Neo runtime optimized for that specific platform.
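When triggering a compilation job from the SDK rather than the console, the request carries the same fields the console collects. The sketch below builds an illustrative boto3 `create_compilation_job` request; the job name, role ARN, bucket paths and framework are placeholder assumptions, and the live call is left commented out:

```python
# Illustrative create_compilation_job request (boto3). All ARNs, bucket
# names and the framework are placeholders for this sketch.
params = {
    "CompilationJobName": "resnet50-neo-demo",
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerNeoRole",
    "InputConfig": {
        "S3Uri": "s3://my-models/resnet50/model.tar.gz",   # trained model tarball
        "DataInputConfig": '{"input0": [1, 3, 224, 224]}', # expected input shape
        "Framework": "PYTORCH",
    },
    "OutputConfig": {
        "S3OutputLocation": "s3://my-models/neo-output/",  # where artifacts land
        "TargetDevice": "ml_c5",                           # target cloud instance family
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 900},     # compilation job timeout
}

# import boto3
# sm = boto3.client("sagemaker")
# sm.create_compilation_job(**params)  # starts the job asynchronously
```

The same JSON structure works with the CLI's `--cli-input-json` option, so one definition can serve both interfaces.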
Users can start SageMaker Neo compilation jobs in the SageMaker console by clicking Compilation jobs, found under Inference in the left navigation menu.
This launches the Compilation jobs screen, which displays a list of jobs. It also provides the option to start a job by clicking Create compilation job.
The first step is to enter a name for the job. Then, assign permissions through an Identity and Access Management (IAM) role by either creating a new role or selecting an existing one.
The Input configuration section provides the option to select an existing model artifact available in S3. It's important to make sure the assigned IAM role has access to that S3 location and that the file is in tarball format (.tar.gz). Data input configuration specifies the data format required by the ML model, which is provided in JSON format.
When choosing the Model artifacts option, it's also necessary to configure the ML framework that was used to build the input ML model. A drop-down shows a list of the frameworks SageMaker Neo supports.
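The data input configuration is a small JSON map from each input tensor's name to its shape, and the expected tensor name and layout vary by framework. The names below are typical examples for this sketch, not fixed values -- the actual name must match the model's own input:

```python
import json

# Typical DataInputConfig values per framework (illustrative tensor names).
# PyTorch and MXNet models commonly expect NCHW layout; TensorFlow, NHWC.
data_input_configs = {
    "PYTORCH": '{"input0": [1, 3, 224, 224]}',
    "TENSORFLOW": '{"input": [1, 224, 224, 3]}',
    "MXNET": '{"data": [1, 3, 224, 224]}',
}

# Sanity-check that each config is well-formed JSON with 4-D image shapes.
parsed = {fw: json.loads(cfg) for fw, cfg in data_input_configs.items()}
```

Validating the JSON locally before submitting the job avoids a compilation failure caused by a malformed shape definition.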
The Input configuration section also provides the option to choose Model version. This feature is provided by SageMaker Model Registry, which, together with SageMaker Pipelines and the SDK, enables application owners to store, manage and access ML models.
The Output configuration section enables users to configure the target device or target platform for which the compiled model is optimized. It's also how users specify which S3 location the compiled output is stored in.
This section also provides the option to configure encryption and compiler options. Compiler options are optional for most targets; they supply additional details, such as input data types and CPU or platform settings, relevant to specific targets.
When choosing the Target device configuration, users must select an option from a list of supported cloud instances or edge devices for which the model is optimized. For edge devices, it's recommended to use AWS IoT Greengrass to manage ML model deployments after the optimized model has been compiled.
The Target platform option provides a list of supported OSes, architectures and accelerators.
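An output configuration that uses a target platform instead of a named device combines an OS, an architecture and an optional accelerator, plus any compiler options the target needs. The bucket path and compiler option values below are illustrative assumptions for a Linux/ARM64 device with an Nvidia accelerator; actual version strings depend on the device's software stack:

```python
# Illustrative OutputConfig using TargetPlatform fields (Os/Arch/Accelerator)
# rather than a named TargetDevice. Values are placeholders for this sketch.
output_config = {
    "S3OutputLocation": "s3://my-models/neo-output/",
    "TargetPlatform": {
        "Os": "LINUX",
        "Arch": "ARM64",
        "Accelerator": "NVIDIA",  # optional; omit for CPU-only targets
    },
    # Compiler options are target-specific; these GPU/TensorRT/CUDA settings
    # are example values for an Nvidia-accelerated ARM64 board.
    "CompilerOptions": '{"gpu-code": "sm_72", "trt-ver": "7.1.3", "cuda-ver": "10.2"}',
}
```

This dictionary drops into the `OutputConfig` field of a `create_compilation_job` request in place of a `TargetDevice` entry.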
The console offers additional optional configurations, such as compilation job timeout, VPC, subnet, security groups and tags.
Once all parameters are provided, the next step is to click Submit, which starts the compilation job.
Deploy the ML model
Once the compilation job is complete, the output package is placed in the configured S3 output location. That package is then available for deployment to targets that execute inference tasks. Application owners only pay for the ML instance that runs inference tasks -- if that's the selected target type -- not for the Neo compilation job.
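Before deploying, it's worth confirming the job finished and locating the compiled artifact. The sketch below shows the relevant fields of a `describe_compilation_job` response; the response contents and job name are illustrative, and the live boto3 call is commented out:

```python
# Illustrative shape of a describe_compilation_job response; the status and
# artifact path below are placeholder values for this sketch.
response = {
    "CompilationJobStatus": "COMPLETED",
    "ModelArtifacts": {
        "S3ModelArtifacts": "s3://my-models/neo-output/model-ml_c5.tar.gz"
    },
}

# import boto3
# sm = boto3.client("sagemaker")
# response = sm.describe_compilation_job(CompilationJobName="resnet50-neo-demo")

artifact = None
if response["CompilationJobStatus"] == "COMPLETED":
    # This S3 path is the package to deploy to the cloud or edge target.
    artifact = response["ModelArtifacts"]["S3ModelArtifacts"]
```

Polling this status in a loop (or waiting on a console refresh) is the usual way to gate the deployment step on a successful compilation.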
SageMaker Neo is a feature that can improve the UX of ML applications and enable application owners to allocate optimal compute capacity for inference tasks running on the cloud or edge devices. Neo applies these optimizations without affecting model accuracy -- an important consideration, since accuracy often degrades with other ML performance optimization techniques.
SageMaker Neo adds value and is relatively simple to implement, making it a recommended step to include in ML model release cycles.