---
license: mit
base_model:
- ibm-granite/granite-embedding-125m-english
pipeline_tag: feature-extraction
tags:
- rag
- embedding
---

# ONNX Converted Version of IBM Granite Embedding Model

This repository contains an ONNX-converted version of the Hugging Face model [IBM Granite Embedding 125M English](https://huggingface.co/ibm-granite/granite-embedding-125m-english).
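
The conversion process itself is not documented in this repository. As a rough sketch, an export like this can be reproduced with Hugging Face Optimum followed by dynamic quantization in ONNX Runtime; the exact settings behind `model_uint8.onnx` here are an assumption.

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from onnxruntime.quantization import QuantType, quantize_dynamic

# Export the PyTorch checkpoint to ONNX (assumed starting point)
model = ORTModelForFeatureExtraction.from_pretrained(
    "ibm-granite/granite-embedding-125m-english", export=True
)
model.save_pretrained("./onnx")  # writes ./onnx/model.onnx plus config files

# Dynamically quantize the weights to uint8
# (assumed to correspond to model_uint8.onnx in this repository)
quantize_dynamic(
    model_input="./onnx/model.onnx",
    model_output="./model_uint8.onnx",
    weight_type=QuantType.QUInt8,
)
```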

## Running the Model
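
The example below assumes `onnxruntime`, `transformers`, and `numpy` are installed (for example via `pip install onnxruntime transformers numpy`).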

You can run the ONNX model using the following code:

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# Define paths
model_path = "./model_uint8.onnx"  # Path to the ONNX model file
tokenizer_path = "./"  # Folder containing tokenizer.json and tokenizer_config.json

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

# Load the ONNX model with ONNX Runtime
onnx_session = ort.InferenceSession(model_path)

# Example text input
text = "hi."

# Tokenize the input
inputs = tokenizer(text, return_tensors="np", truncation=True, padding=True)

# Keep only the inputs the ONNX graph declares, cast to int64
input_names = {i.name for i in onnx_session.get_inputs()}
onnx_inputs = {
    name: array.astype(np.int64)
    for name, array in inputs.items()
    if name in input_names
}

# Run inference
outputs = onnx_session.run(None, onnx_inputs)

# Extract embeddings (e.g., via mean pooling), assuming the first output
# is the last hidden state
last_hidden_state = outputs[0]
pooled_embedding = last_hidden_state.mean(axis=1)  # Mean over the sequence dimension

print(f"Embedding: {pooled_embedding}")
```
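
Note that the plain mean above also averages over padding tokens when inputs are batched. A mask-aware mean pooling followed by L2 normalization (useful when comparing embeddings with cosine similarity, e.g., for RAG retrieval) is a common refinement. The snippet below is a sketch of that idea, continuing from the variables in the script above; it is not necessarily the pooling used by the upstream model.

```python
# Sketch: attention-mask-aware mean pooling plus L2 normalization.
# With a single unpadded input this matches the plain mean above;
# it differs once padded batches are embedded together.
mask = inputs["attention_mask"][..., np.newaxis].astype(np.float32)  # (batch, seq, 1)
summed = (last_hidden_state * mask).sum(axis=1)
counts = np.maximum(mask.sum(axis=1), 1e-9)  # avoid division by zero
masked_mean = summed / counts
normalized = masked_mean / np.linalg.norm(masked_mean, axis=1, keepdims=True)
```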