---
license: mit
base_model:
- ibm-granite/granite-embedding-125m-english
pipeline_tag: feature-extraction
tags:
- rag
- embedding
---

# ONNX Converted Version of IBM Granite Embedding Model

This repository contains an ONNX conversion of the Hugging Face model [IBM Granite Embedding 125M English](https://huggingface.co/ibm-granite/granite-embedding-125m-english).

## Running the Model

The following example runs the ONNX model with ONNX Runtime, using the `transformers` tokenizer for preprocessing:

```python
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np

# Define paths
model_path = "./model_uint8.onnx"  # Path to ONNX model file
tokenizer_path = "./"  # Path to folder containing tokenizer.json and tokenizer_config.json

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

# Load ONNX model using ONNX Runtime
onnx_session = ort.InferenceSession(model_path)

# Example text input
text = "hi."

# Tokenize input
inputs = tokenizer(text, return_tensors="np", truncation=True, padding=True)

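# Prepare inputs for the ONNX model: keep only the tensors the graph declares
# (e.g. drop token_type_ids if the tokenizer emits them) and cast to int64
expected_inputs = {node.name for node in onnx_session.get_inputs()}
onnx_inputs = {
    name: array.astype(np.int64)
    for name, array in inputs.items()
    if name in expected_inputs
}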

# Run inference
outputs = onnx_session.run(None, onnx_inputs)

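# Extract embeddings via mean pooling.
# Assumption: the first output is the last hidden state, shape (batch, seq_len, hidden_size).
last_hidden_state = outputs[0]
# A plain mean over the sequence axis is fine for a single, unpadded input like this one;
# with padded batches it would also average the padding positions.
# Check the upstream model card for its intended pooling strategy; CLS pooling
# (last_hidden_state[:, 0]) is a common alternative to mean pooling.
pooled_embedding = last_hidden_state.mean(axis=1)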
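
# Hedged sketch, not part of the original example: for padded batches, a mean
# that ignores padding needs the attention mask. The helper names below are
# hypothetical, and L2 normalization is an assumption about downstream use
# (it makes dot products behave as cosine similarities).
import numpy as np  # repeated so the helpers can be copied on their own


def masked_mean_pool(hidden, attention_mask):
    """Average token vectors over the sequence axis, skipping padded positions."""
    mask = attention_mask[..., None].astype(hidden.dtype)  # (batch, seq_len, 1)
    token_counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return (hidden * mask).sum(axis=1) / token_counts


def l2_normalize(vectors):
    """Scale each row to unit length."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


# Possible usage with the variables above:
# normalized = l2_normalize(masked_mean_pool(last_hidden_state, inputs["attention_mask"]))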

print(f"Embedding: {pooled_embedding}")
```