---
license: mit
base_model:
- ibm-granite/granite-embedding-125m-english
pipeline_tag: feature-extraction
tags:
- rag
- embedding
---

# ONNX Converted Version of IBM Granite Embedding Model

This repository contains an ONNX-converted version of the Hugging Face model [IBM Granite Embedding 125M English](https://huggingface.co/ibm-granite/granite-embedding-125m-english).
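
The conversion process itself is not documented in this repository. As a rough sketch, an export like this can be reproduced with Hugging Face Optimum followed by dynamic quantization in ONNX Runtime; the exact settings behind `model_uint8.onnx` here are an assumption.

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from onnxruntime.quantization import QuantType, quantize_dynamic

# Export the PyTorch checkpoint to ONNX (assumed starting point)
model = ORTModelForFeatureExtraction.from_pretrained(
    "ibm-granite/granite-embedding-125m-english", export=True
)
model.save_pretrained("./onnx")  # writes ./onnx/model.onnx plus config files

# Dynamically quantize the weights to uint8
# (assumed to correspond to model_uint8.onnx in this repository)
quantize_dynamic(
    model_input="./onnx/model.onnx",
    model_output="./model_uint8.onnx",
    weight_type=QuantType.QUInt8,
)
```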

## Running the Model
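
The example below assumes `onnxruntime`, `transformers`, and `numpy` are installed (for example via `pip install onnxruntime transformers numpy`).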

You can run the ONNX model using the following code:

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# Define paths
model_path = "./model_uint8.onnx"  # Path to the ONNX model file
tokenizer_path = "./"  # Folder containing tokenizer.json and tokenizer_config.json

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

# Load the ONNX model with ONNX Runtime
onnx_session = ort.InferenceSession(model_path)

# Example text input
text = "hi."

# Tokenize the input
inputs = tokenizer(text, return_tensors="np", truncation=True, padding=True)

# Keep only the inputs the ONNX graph declares, cast to int64
input_names = {i.name for i in onnx_session.get_inputs()}
onnx_inputs = {
    name: array.astype(np.int64)
    for name, array in inputs.items()
    if name in input_names
}

# Run inference
outputs = onnx_session.run(None, onnx_inputs)

# Extract embeddings (e.g., via mean pooling), assuming the first output
# is the last hidden state
last_hidden_state = outputs[0]
pooled_embedding = last_hidden_state.mean(axis=1)  # Mean over the sequence dimension

print(f"Embedding: {pooled_embedding}")
```
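
Note that the plain mean above also averages over padding tokens when inputs are batched. A mask-aware mean pooling followed by L2 normalization (useful when comparing embeddings with cosine similarity, e.g., for RAG retrieval) is a common refinement. The snippet below is a sketch of that idea, continuing from the variables in the script above; it is not necessarily the pooling used by the upstream model.

```python
# Sketch: attention-mask-aware mean pooling plus L2 normalization.
# With a single unpadded input this matches the plain mean above;
# it differs once padded batches are embedded together.
mask = inputs["attention_mask"][..., np.newaxis].astype(np.float32)  # (batch, seq, 1)
summed = (last_hidden_state * mask).sum(axis=1)
counts = np.maximum(mask.sum(axis=1), 1e-9)  # avoid division by zero
masked_mean = summed / counts
normalized = masked_mean / np.linalg.norm(masked_mean, axis=1, keepdims=True)
```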