Intel
/

distilbert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa-int8

# Model Details: int8 1x4 Sparse DistilBERT
The article discusses how to make inference of transformer-based models more efficient on Intel hardware. The authors propose a 1x4 block sparsity pattern that fits Intel instructions and improves performance. We implement 1x4 block pruning and obtain an 80% sparse model on the SQuAD1.1 dataset. Combined with quantization, it achieves up to a 24.2x speedup with less than 1% accuracy loss. The article also shows performance gains for other models with this approach.
This model card was written by Intel.
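
To make the 1x4 pattern concrete, here is a minimal sketch of magnitude-based 1x4 block pruning in plain Python. It is an illustration only, not the training-time pruning procedure used to produce this model. The function name `prune_1x4_blocks`, the block orientation (4 consecutive weights within a row), and the scoring rule (sum of absolute values) are all assumptions made for the example.

```python
import random

BLOCK = 4  # assumed orientation: 4 consecutive weights in a row form one block


def prune_1x4_blocks(weight, sparsity=0.8):
    """Return a copy of `weight` (a list of rows) with the lowest-magnitude
    1x4 blocks set to zero, so that about `sparsity` of all blocks are zero."""
    cols = len(weight[0])
    if cols % BLOCK != 0:
        raise ValueError("column count must be a multiple of 4")

    # Score each block by the sum of absolute values of its 4 weights.
    scored = []
    for r, row in enumerate(weight):
        for c in range(0, cols, BLOCK):
            scored.append((sum(abs(v) for v in row[c:c + BLOCK]), r, c))
    scored.sort()

    # Zero the blocks with the smallest scores; whole blocks are removed together.
    n_prune = round(sparsity * len(scored))
    pruned = [list(row) for row in weight]
    for _, r, c in scored[:n_prune]:
        pruned[r][c:c + BLOCK] = [0.0] * BLOCK
    return pruned


random.seed(0)
w = [[random.gauss(0.0, 1.0) for _ in range(20)] for _ in range(10)]
pruned = prune_1x4_blocks(w, sparsity=0.8)
zeros = sum(v == 0.0 for row in pruned for v in row)
sparsity = zeros / (10 * 20)
print(f"sparsity: {sparsity:.2f}")
```

Because zeros always come in aligned groups of four, a kernel can skip an entire block at once, which is the property the 1x4 pattern is meant to expose to the hardware.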

### Model license
Licensed under the MIT license.

| Model Detail | Description |
| ---- | --- |
| Language | en |
| Model Authors Company | Intel |
| Date | June 7, 2023 |
| Version | 1 |
| Type | NLP - Tiny language model |
| Architecture | "we propose a new pipeline for creating and running Fast Transformer models on CPUs, utilizing hardware-aware pruning, knowledge distillation, quantization, and our own Transformer inference runtime engine with optimized kernels for sparse and quantized operators. We demonstrate the efficiency of our pipeline by creating a Fast DistilBERT model showing minimal accuracy loss on the question-answering SQuADv1.1 benchmark, and throughput results under typical production constraints and environments. Our results outperform existing state-of-the-art Neural Magic's DeepSparse runtime performance by up to 50% and up to 4.1x performance speedup over ONNX Runtime." |
| Paper or Other Resources | https://arxiv.org/abs/2211.07715 |
| License | TBD |
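
The Architecture entry lists quantization as one stage of the pipeline. The source does not say which quantization scheme is used, so the following is only a generic sketch of symmetric per-tensor int8 quantization, with hypothetical helper names `quantize_int8` and `dequantize`. It shows that weights already pruned to zero stay exactly zero after quantization, so sparsity and int8 storage can be combined.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Assumed scheme for illustration; zero maps to zero because there is no
    zero-point offset.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [x * scale for x in q]


# One pruned 1x4 block (all zeros) followed by one dense block.
weights = [0.0, 0.0, 0.0, 0.0, 0.52, -1.27, 0.03, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The rounding error per weight is bounded by half the scale, which is one reason a small accuracy loss is plausible for well-conditioned weight tensors.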

### How to use
Please follow the README at https://github.com/intel/intel-extension-for-transformers/tree/main/examples/huggingface/pytorch/text-classification/deployment/sparse/distilbert_base_uncased