---
license: mit
base_model:
- ibm-granite/granite-embedding-125m-english
pipeline_tag: feature-extraction
tags:
- rag
- embedding
---
# ONNX Converted Version of IBM Granite Embedding Model
This repository contains an ONNX conversion of the Hugging Face model [IBM Granite Embedding 125M English](https://huggingface.co/ibm-granite/granite-embedding-125m-english).
## Running the Model
You can run the ONNX model using the following code:
```python
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np
# Define paths
model_path = "./model_uint8.onnx" # Path to ONNX model file
tokenizer_path = "./" # Path to folder containing tokenizer.json and tokenizer_config.json
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
# Load ONNX model using ONNX Runtime
onnx_session = ort.InferenceSession(model_path)
# Example text input
text = "hi."
# Tokenize input
inputs = tokenizer(text, return_tensors="np", truncation=True, padding=True)
# Prepare input for ONNX model
onnx_inputs = {key: inputs[key].astype(np.int64) for key in inputs.keys()}
# Run inference
outputs = onnx_session.run(None, onnx_inputs)
# Extract embeddings (e.g., using mean pooling)
last_hidden_state = outputs[0] # Assuming the first output is the last hidden state
pooled_embedding = last_hidden_state.mean(axis=1) # Mean pooling over the sequence dimension
print(f"Embedding: {pooled_embedding}")
```
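The plain `.mean(axis=1)` above is fine for a single text, but when several texts are tokenized together with `padding=True`, the shorter ones are filled with pad tokens that would be averaged in as well. The following is a minimal sketch of mask-aware pooling, assuming the first ONNX output has shape `(batch, seq_len, hidden)` and the tokenizer returns an `attention_mask`. The helper names `masked_mean_pool`, `cls_pool`, and `l2_normalize` are hypothetical, and the arrays are synthetic stand-ins rather than real model output. If the upstream model card specifies a different pooling strategy (for example, taking the first token), `cls_pool` shows that variant.

```python
import numpy as np

def masked_mean_pool(last_hidden_state, attention_mask):
    """Average token vectors, counting only positions where attention_mask == 1."""
    mask = attention_mask[..., np.newaxis].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def cls_pool(last_hidden_state):
    """Take the vector of the first token of each sequence."""
    return last_hidden_state[:, 0]

def l2_normalize(embeddings):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)

# Synthetic stand-ins for outputs[0] and inputs["attention_mask"] (assumption:
# batch of 2, sequence length 3, hidden size 2; the last token of row 0 is padding).
hidden = np.array(
    [[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]],
     [[2.0, 0.0], [4.0, 0.0], [6.0, 0.0]]],
    dtype=np.float32,
)
mask = np.array([[1, 1, 0], [1, 1, 1]], dtype=np.int64)

pooled = masked_mean_pool(hidden, mask)
print(pooled)  # the padded position (100.0) does not affect row 0
```

In the script above, this would be called as `masked_mean_pool(outputs[0], inputs["attention_mask"])`, optionally followed by `l2_normalize` before comparing embeddings.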