aws-neuron (AWS Inferentia and Trainium)

Hugging Face is working with Amazon Web Services to make it easier than ever for startups and enterprises to train and deploy Hugging Face models on AWS Inferentia and Trainium instances, either directly on Amazon EC2 or through Amazon SageMaker.

AWS Inferentia accelerators are designed by AWS to deliver high performance at the lowest cost for your deep learning (DL) inference applications. AWS Inferentia2 delivers up to 4x higher throughput and up to 10x lower latency than first-generation Inferentia. Inferentia2-based Amazon EC2 Inf2 instances are designed to deliver high performance at the lowest cost in Amazon EC2 for your DL inference and generative artificial intelligence (AI) applications. They are optimized to deploy increasingly complex models, such as large language models (LLMs) and vision transformers, at scale. Inf2 instances are the first inference-optimized instances in Amazon EC2 to support scale-out distributed inference with ultra-high-speed connectivity between accelerators.

AWS Trainium is the second-generation machine learning (ML) accelerator that AWS purpose-built for deep learning training of 100B+ parameter models. Each Amazon Elastic Compute Cloud (EC2) Trn1 instance deploys up to 16 AWS Trainium accelerators to deliver a high-performance, low-cost solution for deep learning training in the cloud. Trainium-based EC2 Trn1 instances deliver faster time to train while offering up to 50% cost-to-train savings over comparable Amazon EC2 instances.

🤗 Optimum Neuron
🤗 Optimum Neuron is the interface between the 🤗 Transformers library and AWS accelerators, including AWS Trainium and AWS Inferentia. It provides a set of tools enabling easy model loading, training, and inference in single- and multi-accelerator settings for different downstream tasks. The list of officially validated models and tasks is available in the Optimum Neuron documentation.
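
As a rough illustration of the model loading and inference described above, the sketch below exports a Transformers checkpoint for Neuron and runs one prediction. It is a minimal sketch under assumptions, not an officially validated recipe: the model ID is only an illustrative choice, the helper names `static_input_shapes` and `classify` are hypothetical, and `classify` assumes an Inferentia or Trainium instance with the Neuron SDK and `optimum-neuron` installed. Neuron compiles models for fixed input shapes, which is why the batch size and sequence length are set at export time.

```python
from typing import Dict


def static_input_shapes(batch_size: int = 1, sequence_length: int = 128) -> Dict[str, int]:
    """Collect the fixed input shapes used when compiling a model for Neuron.

    Hypothetical helper for this sketch; the default values are arbitrary.
    """
    if batch_size < 1 or sequence_length < 1:
        raise ValueError("batch_size and sequence_length must be positive")
    return {"batch_size": batch_size, "sequence_length": sequence_length}


def classify(
    text: str,
    model_id: str = "distilbert-base-uncased-finetuned-sst-2-english",  # illustrative choice
) -> str:
    """Export a checkpoint for Neuron and return the predicted label for `text`.

    Assumes a Neuron-enabled instance; the imports are deferred so this file
    can still be imported on a machine without optimum-neuron installed.
    """
    from optimum.neuron import NeuronModelForSequenceClassification
    from transformers import AutoTokenizer

    shapes = static_input_shapes()
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # export=True compiles the checkpoint for Neuron with the static shapes above.
    model = NeuronModelForSequenceClassification.from_pretrained(
        model_id, export=True, **shapes
    )

    # Pad or truncate the input to the sequence length the model was compiled for.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        padding="max_length",
        truncation=True,
        max_length=shapes["sequence_length"],
    )
    logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(dim=-1)[0])]


if __name__ == "__main__":
    print(classify("Hamilton is considered to be the best musical of recent years."))
```

Compilation can take several minutes, so in practice the exported model would typically be saved with `save_pretrained` and reloaded rather than re-exported on every run.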

Learn More
Optimum Neuron
Tutorials
How to Get Started
References
AWS Neuron Documentation

Blogs / Videos
Fine-tune Llama 7B on AWS Trainium
Deploy Embedding Models on AWS Inferentia2 with Amazon SageMaker
Deploy Llama 2 7B on AWS Inferentia2 with Amazon SageMaker
Deploy Stable Diffusion XL on AWS Inferentia2 with Amazon SageMaker
Accelerating Transformers with Optimum Neuron, AWS Trainium and AWS Inferentia2