Amazon SageMaker now supports Inf1 instances providing high performance and cost-effective machine learning inference

Posted on: Apr 22, 2020

Amazon SageMaker customers can now select Inf1 instances when deploying their machine learning models for real-time inference. Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. Using Inf1 instances on Amazon SageMaker, customers can run large-scale machine learning and deep learning inference applications such as image recognition, speech recognition, natural language processing, personalization, forecasting, and fraud detection with high performance and significantly lower costs.

Inf1 instances are built from the ground up to support machine learning inference applications and feature up to 16 AWS Inferentia chips, machine learning chips designed and built by AWS to optimize cost for deep learning inference. The Inferentia chips are coupled with the latest custom 2nd generation Intel® Xeon® Scalable processors and 100 Gbps networking to provide high performance and the lowest cost in the industry for ML inference applications. With 1 to 16 AWS Inferentia chips per instance, Inf1 instances can scale in performance up to 2000 Tera Operations per Second (TOPS) and deliver up to 3x higher throughput and up to 45% lower cost per inference compared to AWS GPU-based instances. The large on-chip memory on the AWS Inferentia chips allows machine learning models to be cached directly on the chip. This eliminates the need to access outside memory resources during inference and enables low latency and high inference throughput. To learn more about Inf1 instances, visit the product pages.

Inf1 instances in Amazon SageMaker are now available in the N. Virginia and Oregon AWS Regions in the US and come in four sizes: ml.inf1.xlarge, ml.inf1.2xlarge, ml.inf1.6xlarge, and ml.inf1.24xlarge. Machine learning models developed using the TensorFlow and MXNet frameworks can be deployed on Inf1 instances in Amazon SageMaker for real-time inference. To use Inf1 instances in Amazon SageMaker, compile your trained model using Amazon SageMaker Neo, then select an Inf1 instance type when you deploy the compiled model on Amazon SageMaker.
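As a rough illustration of the compile-then-deploy flow described above, the following Python sketch builds the request payloads for a SageMaker Neo compilation job targeting Inf1 and for an endpoint configuration on an Inf1 instance. The helper names (neo_compilation_request, inf1_endpoint_config), the 900-second timeout, and all job names, ARNs, S3 paths, and input shapes are hypothetical placeholders, not values from this announcement. The sketch only constructs dictionaries and does not call AWS.

```python
# Minimal sketch: builds request payloads only; nothing here calls AWS.
# Helper names and all example values are assumptions for illustration.

# Instance sizes and frameworks listed in this announcement.
INF1_SIZES = (
    "ml.inf1.xlarge",
    "ml.inf1.2xlarge",
    "ml.inf1.6xlarge",
    "ml.inf1.24xlarge",
)
FRAMEWORKS = {"tensorflow": "TENSORFLOW", "mxnet": "MXNET"}


def neo_compilation_request(job_name, role_arn, model_s3_uri,
                            output_s3_uri, framework, input_shape_json):
    """Build a payload for a Neo compilation job that targets Inf1."""
    key = framework.lower()
    if key not in FRAMEWORKS:
        raise ValueError(f"unsupported framework for this sketch: {framework}")
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # JSON string describing input names and shapes of the model.
            "DataInputConfig": input_shape_json,
            "Framework": FRAMEWORKS[key],
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": "ml_inf1",
        },
        # Assumed timeout; tune for the model being compiled.
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


def inf1_endpoint_config(config_name, model_name,
                         instance_type="ml.inf1.xlarge", instance_count=1):
    """Build an endpoint configuration that places a model on Inf1."""
    if instance_type not in INF1_SIZES:
        raise ValueError(f"not an Inf1 instance type: {instance_type}")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": instance_count,
                "InstanceType": instance_type,
            }
        ],
    }
```

With AWS credentials configured, these payloads could be passed to the boto3 SageMaker client, for example boto3.client("sagemaker").create_compilation_job(**request) and create_endpoint_config(**config). A SageMaker model that points at the compiled artifact and an Inf1-compatible inference container must be created between those two calls, which is not shown here.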

Visit the Amazon SageMaker developer guide for more information, and see the Amazon SageMaker examples on GitHub to learn more about how to deploy machine learning models on Inf1 instances in Amazon SageMaker.

Modified 8/27/2021 – In an effort to ensure a great experience, expired links in this post have been updated or removed from the original post.