---
license: mit
language:
- en
pipeline_tag: text-classification
tags:
- medical
- finance
- chemistry
- biology
---

![BGE-reranking](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*tCBbIjV_jLZP1AKLTX7rAw.png)

# BGE-Reranker-Large

This is an `int8` converted version of [bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large). Thanks to `ctranslate2`, it should be at least 3 times faster than the original Hugging Face Transformers version, and it is smaller, with minimal performance loss.

## Model Details

Unlike the embedding model `bge-large-en-v1.5`, the reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You get a relevance score by passing a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range. In addition, this is a highly optimized version built with the `ctranslate2` library, suitable for production environments.

### Model Sources

This model is based on the `BAAI` `BGE-Reranker` model. Please visit the [original bge-reranker-large repo](https://huggingface.co/BAAI/bge-reranker-large) for more details.
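
Because the relevance score described under Model Details is an unbounded logit, it can help to squash it into (0, 1) when a bounded value is needed, for example for thresholding. The sketch below assumes a plain sigmoid is an acceptable mapping. The helper name `to_unit_interval` is hypothetical and is not part of this model or of `ctranslate2`. The ordering of passages does not change, because the sigmoid is monotonic.

```python
import math


def to_unit_interval(logit: float) -> float:
    """Map an unbounded relevance logit to (0, 1) with a sigmoid (assumed mapping)."""
    # Branch on the sign so math.exp() is only ever called on a non-positive
    # argument, which avoids overflow for logits of large magnitude.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Example logits, taken from the output shown in the Usage section.
scores = [to_unit_interval(x) for x in (1.0474, -9.4694)]
print(scores)
```

A higher value still means a more relevant passage. Any threshold applied to the squashed score is a choice to tune on your own data, not a property of the model.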
## Usage

Install CTranslate2 with `pip install ctranslate2` (the example also imports `transformers` and `torch`), then run:

```python
import ctranslate2
import transformers
import torch

device_mapping = "cuda" if torch.cuda.is_available() else "cpu"
model_dir = "hooman650/ct2fast-bge-reranker"

# ctranslate2 encoder does the heavy lifting
encoder = ctranslate2.Encoder(model_dir, device=device_mapping)

# the tokenizer and the classification head come from the original HF model
model_name = "BAAI/bge-reranker-large"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
classifier = transformers.AutoModelForSequenceClassification.from_pretrained(model_name).classifier

classifier.eval()
classifier.to(device_mapping)

# each pair is [query, passage]
pairs = [
    ["I like Ctranslate2", "Ctranslate2 makes mid range models faster"],
    ["I like Ctranslate2", "Using naive transformers might not be suitable for deployment"],
]

with torch.no_grad():
    tokens = tokenizer(pairs, padding=True, truncation=True, max_length=512).input_ids
    output = encoder.forward_batch(tokens)
    hidden_state = torch.as_tensor(output.last_hidden_state, device=device_mapping)
    # squeeze only the last dimension so a batch of one pair still yields a 1-D tensor
    logits = classifier(hidden_state).squeeze(-1)

print(logits)
# tensor([ 1.0474, -9.4694], device='cuda:0')
```

#### Hardware

Supports both GPU and CPU.
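
Returning to the usage example: to turn the logits into a ranking, the passages can be sorted by score in descending order. This is a minimal sketch in plain Python. `rank_passages` and its `top_k` parameter are hypothetical names used for illustration, and the scores are the ones printed in the usage example, hard-coded so the snippet runs without loading a model.

```python
from typing import List, Optional, Tuple


def rank_passages(
    passages: List[str],
    scores: List[float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (passage, score) pairs sorted from most to least relevant."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Passages and scores copied from the usage example (hard-coded for illustration).
passages = [
    "Ctranslate2 makes mid range models faster",
    "Using naive transformers might not be suitable for deployment",
]
scores = [1.0474, -9.4694]

ranked = rank_passages(passages, scores)
for passage, score in ranked:
    print(f"{score:8.4f}  {passage}")
```

With real model output, `logits.tolist()` converts the tensor into the list of floats this helper expects.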