metadata
license: apache-2.0
tags:
- onnx
- ort
ONNX and ORT models with quantization of google-bert/bert-large-cased-whole-word-masking
This repository contains the ONNX and ORT formats of the model google-bert/bert-large-cased-whole-word-masking, along with quantized versions.
License
The license for this model is "apache-2.0". For details, please refer to the original model page: google-bert/bert-large-cased-whole-word-masking.
Usage
To use this model, install the onnxruntime and transformers packages (the tokenizer comes from Transformers), then run inference as shown below.
# Example code
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-large-cased-whole-word-masking')

# Prepare inputs as NumPy arrays
text = 'Replace this text with your input.'
inputs = tokenizer(text, return_tensors='np')

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',  # ONNX model
    'ort_models/model.ort',        # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')

    # Load the model; ONNX Runtime treats a '.ort' file extension as an ORT format model,
    # so both formats are loaded the same way
    session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])

    # Run inference
    outputs = session.run(None, dict(inputs))

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
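The example above passes every tokenizer output straight to session.run. Depending on how a model was exported, its graph may not declare all of those inputs (token_type_ids is a typical candidate), and ONNX Runtime rejects feed names it does not recognise. The following is a minimal sketch of a guard, not part of this repository: build_feed is a hypothetical helper name, while session.get_inputs() is the ONNX Runtime call that lists the inputs a loaded model declares.

```python
def build_feed(declared_names, encoded):
    """Keep only the tokenizer outputs that the model graph declares as inputs.

    build_feed is a hypothetical helper for illustration, not an ONNX Runtime API.
    """
    declared = set(declared_names)

    # Fail early if the model expects an input the tokenizer did not produce
    missing = declared - set(encoded)
    if missing:
        raise KeyError(f"tokenizer did not produce: {sorted(missing)}")

    # Drop any tokenizer output the model does not declare
    return {name: value for name, value in encoded.items() if name in declared}


# With a real session (assumed to be created as in the example above):
#   declared_names = [i.name for i in session.get_inputs()]
#   outputs = session.run(None, build_feed(declared_names, dict(inputs)))
```

If the models in this repository accept all tokenizer outputs, the helper simply returns the same mapping, so it is safe to leave in place.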
Contents of the Model
This repository includes the following models:
ONNX Models
- onnx_models/model.onnx: Original ONNX model converted from google-bert/bert-large-cased-whole-word-masking
- onnx_models/model_opt.onnx: Optimized ONNX model
- onnx_models/model_fp16.onnx: FP16 quantized model
- onnx_models/model_int8.onnx: INT8 quantized model
- onnx_models/model_uint8.onnx: UINT8 quantized model
ORT Models
- ort_models/model.ort: ORT model using the optimized ONNX model
- ort_models/model_fp16.ort: ORT model using the FP16 quantized model
- ort_models/model_int8.ort: ORT model using the INT8 quantized model
- ort_models/model_uint8.ort: ORT model using the UINT8 quantized model
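The file layout listed above can be captured in a small lookup, which avoids hard-coding paths when switching between variants. This is a sketch only: MODEL_FILES and model_path are hypothetical names, and the variant keys ('base', 'opt', 'fp16', 'int8', 'uint8') are labels chosen here for illustration. The paths themselves are the ones listed in this section.

```python
# Paths as listed in this repository; the dictionary keys are illustrative labels.
MODEL_FILES = {
    "onnx": {
        "base": "onnx_models/model.onnx",
        "opt": "onnx_models/model_opt.onnx",
        "fp16": "onnx_models/model_fp16.onnx",
        "int8": "onnx_models/model_int8.onnx",
        "uint8": "onnx_models/model_uint8.onnx",
    },
    "ort": {
        # model.ort is built from the optimized ONNX model
        "opt": "ort_models/model.ort",
        "fp16": "ort_models/model_fp16.ort",
        "int8": "ort_models/model_int8.ort",
        "uint8": "ort_models/model_uint8.ort",
    },
}


def model_path(fmt, variant):
    """Return the repository-relative path for a format ('onnx' or 'ort') and variant."""
    try:
        return MODEL_FILES[fmt][variant]
    except KeyError:
        raise ValueError(f"no {variant!r} model in {fmt!r} format") from None
```

For example, model_path('ort', 'int8') returns 'ort_models/model_int8.ort', while model_path('ort', 'base') raises ValueError because the unoptimized model is only listed in ONNX format.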
Notes
Please adhere to the license and usage conditions of the original model google-bert/bert-large-cased-whole-word-masking.
Contribution
If you find any issues or have improvements, please create an issue or submit a pull request.