---
license: mit
base_model:
- ibm-granite/granite-embedding-125m-english
pipeline_tag: feature-extraction
tags:
- rag
- embedding
---

# ONNX Converted Version of IBM Granite Embedding Model

This repository contains the ONNX-converted version of the Hugging Face model [IBM Granite Embedding 125M English](https://huggingface.co/ibm-granite/granite-embedding-125m-english).

## Running the Model

You can run the ONNX model with the following code:

```python
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np

# Define paths
model_path = "./model_uint8.onnx"  # Path to the ONNX model file
tokenizer_path = "./"  # Path to the folder containing tokenizer.json and tokenizer_config.json

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

# Load the ONNX model with ONNX Runtime
onnx_session = ort.InferenceSession(model_path)

# Example text input
text = "hi."

# Tokenize the input
inputs = tokenizer(text, return_tensors="np", truncation=True, padding=True)

# Convert the inputs to int64, as expected by the ONNX graph
onnx_inputs = {key: value.astype(np.int64) for key, value in inputs.items()}

# Run inference
outputs = onnx_session.run(None, onnx_inputs)

# Extract an embedding via mean pooling over the sequence dimension
last_hidden_state = outputs[0]  # The first output is the last hidden state
pooled_embedding = last_hidden_state.mean(axis=1)

print(f"Embedding: {pooled_embedding}")
```
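Note that the pooling above ignores the attention mask, which is fine for a single text but will average over padding tokens when you batch several texts of different lengths. A minimal sketch of mask-aware mean pooling, shown here with synthetic arrays standing in for real model outputs:

```python
import numpy as np

def masked_mean_pool(last_hidden_state, attention_mask):
    """Mean-pool token embeddings, excluding padding positions."""
    # Expand mask to (batch, seq, 1) so it broadcasts over the hidden dimension
    mask = attention_mask[..., np.newaxis].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)  # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid division by zero
    return summed / counts

# Synthetic example: batch of 2, sequence length 4, hidden size 3
hidden = np.ones((2, 4, 3))
hidden[0, 2:] = 100.0  # positions 2-3 of the first sequence simulate padding junk
mask = np.array([[1, 1, 0, 0], [1, 1, 1, 1]])

pooled = masked_mean_pool(hidden, mask)
print(pooled[0])  # padding values do not distort the mean
```

To use this with the model, pass `outputs[0]` and `inputs["attention_mask"]` from the snippet above in place of the synthetic arrays.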