---
license: apache-2.0
tags:
- onnx
- ort
---

# ONNX and ORT models with quantization of [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking)

[Japanese README](README_ja.md)

This repository contains the ONNX and ORT formats of the model [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking), along with quantized versions.

## License

The license for this model is "apache-2.0". For details, please refer to the original model page: [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking).

## Usage

To use this model, install ONNX Runtime (plus `transformers` for the tokenizer) and run inference as shown below.

```python
# Example code
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-large-cased-whole-word-masking')

# Prepare inputs
text = 'Replace this text with your input.'
inputs = tokenizer(text, return_tensors='np')

# Specify the model paths
# Test both the ONNX model and the ORT model
model_paths = [
    'onnx_models/model_opt.onnx',  # ONNX model
    'ort_models/model.ort',        # ORT format model
]

# Run inference with each model
for model_path in model_paths:
    print(f'\n===== Using model: {model_path} =====')

    # InferenceSession loads both .onnx and .ort files.
    # Passing providers explicitly is required on GPU-enabled builds of ONNX Runtime.
    session = ort.InferenceSession(model_path, providers=['CPUExecutionProvider'])

    # Run inference (None = return all outputs)
    outputs = session.run(None, dict(inputs))

    # Display the output shapes
    for idx, output in enumerate(outputs):
        print(f'Output {idx} shape: {output.shape}')

    # Display the results (add further processing if needed)
    print(outputs)
```

## Contents of the Model

This repository includes the following models:

### ONNX Models

- `onnx_models/model.onnx`: Original ONNX model converted from [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking)
- `onnx_models/model_opt.onnx`: Optimized ONNX model
- `onnx_models/model_fp16.onnx`: FP16 (half-precision) model
- `onnx_models/model_int8.onnx`: INT8 quantized model
- `onnx_models/model_uint8.onnx`: UINT8 quantized model

### ORT Models

- `ort_models/model.ort`: ORT-format version of the optimized ONNX model
- `ort_models/model_fp16.ort`: ORT-format version of the FP16 model
- `ort_models/model_int8.ort`: ORT-format version of the INT8 quantized model
- `ort_models/model_uint8.ort`: ORT-format version of the UINT8 quantized model

## Notes

Please adhere to the license and usage conditions of the original model [google-bert/bert-large-cased-whole-word-masking](https://huggingface.co/google-bert/bert-large-cased-whole-word-masking).

## Contribution

If you find any issues or have improvements, please create an issue or submit a pull request.
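
## Example: decoding masked-token predictions (sketch)

The Usage example passes every tokenizer output to `session.run` and prints the raw arrays. The following is a minimal, hypothetical sketch of two further processing steps: keeping only the inputs that the loaded graph declares (names taken from `session.get_inputs()`), and turning the first output into top-k predictions at each `[MASK]` position. It assumes that the first output holds masked-LM logits of shape `(batch, seq_len, vocab_size)`; this repository does not state which head was exported, so check the printed output shapes first. The helper names `select_inputs` and `top_k_at_mask` are illustrative and not part of any library.

```python
import numpy as np


def select_inputs(encoded, expected_names):
    """Return only the tokenizer outputs that the ONNX graph declares as inputs.

    `encoded` is any mapping of name -> array, e.g. the BatchEncoding returned
    by tokenizer(..., return_tensors='np'). `expected_names` would typically be
    [i.name for i in session.get_inputs()].
    """
    missing = [name for name in expected_names if name not in encoded]
    if missing:
        raise KeyError(f'model expects inputs the tokenizer did not produce: {missing}')
    return {name: encoded[name] for name in expected_names}


def top_k_at_mask(logits, input_ids, mask_token_id, k=5):
    """Return the k most likely token ids at every mask position.

    ASSUMPTION: `logits` has shape (batch, seq_len, vocab_size), i.e. the
    first output of the exported graph is a masked-LM head.
    Returns a list of (batch_index, position, [(token_id, probability), ...]).
    """
    results = []
    # Locate every (batch, position) pair where the input is the mask token
    batch_idx, pos_idx = np.nonzero(np.asarray(input_ids) == mask_token_id)
    for b, p in zip(batch_idx, pos_idx):
        # Upcast so FP16 and FP32 outputs are handled the same way
        scores = logits[b, p].astype(np.float64)
        # Numerically stable softmax over the vocabulary
        exp = np.exp(scores - scores.max())
        probs = exp / exp.sum()
        top = np.argsort(probs)[::-1][:k]
        results.append((int(b), int(p), [(int(t), float(probs[t])) for t in top]))
    return results


# Usage sketch (needs the model files and the tokenizer, so it is left as comments):
# tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-large-cased-whole-word-masking')
# session = ort.InferenceSession('onnx_models/model_int8.onnx', providers=['CPUExecutionProvider'])
# inputs = tokenizer(f'Paris is the capital of {tokenizer.mask_token}.', return_tensors='np')
# feed = select_inputs(inputs, [i.name for i in session.get_inputs()])
# logits = session.run(None, feed)[0]
# for b, p, top in top_k_at_mask(logits, inputs['input_ids'], tokenizer.mask_token_id):
#     print(p, [(tokenizer.convert_ids_to_tokens(t), round(prob, 3)) for t, prob in top])
```

Filtering by the names from `session.get_inputs()` lets the same feed work whether or not a particular export declares `token_type_ids`, and it fails with a readable error when the graph expects an input the tokenizer does not produce.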