---
language: en
tags:
- rag
- context-compression
- gemma
license: apache-2.0
datasets:
- hotpotqa
base_model:
- google/gemma-2b-it
---

# EXIT: Context-Aware Extractive Compression for RAG

EXIT is a context-aware extractive compression model that improves the efficiency and effectiveness of Retrieval-Augmented Generation (RAG) by intelligently selecting relevant sentences while preserving contextual dependencies.

[[Paper]](https://arxiv.org/abs/2412.12559) [[GitHub]](https://github.com/ThisIsHwang/EXIT)

## Model Description

EXIT is designed to:

- Compress retrieved documents while preserving critical information
- Consider full document context when evaluating sentence importance
- Enable parallelizable, context-aware extraction
- Adapt dynamically to query complexity
- Balance compression ratio and answer accuracy

## Task and Intended Use

EXIT is trained to classify sentences as either relevant or irrelevant for answering a query, based on their content and surrounding context. It is specifically designed for:

- RAG context compression
- Open-domain question answering
- Both single-hop and multi-hop queries

## Quickstart

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import spacy

# 1. Load the base model and attach the EXIT LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    device_map="auto",
    torch_dtype=torch.float16,
)
exit_model = PeftModel.from_pretrained(base_model, "doubleyyh/exit-gemma-2b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

# 2. Initialize a lightweight sentence splitter (sentence segmentation only)
nlp = spacy.load("en_core_web_sm", disable=[
    "tok2vec", "tagger", "parser", "attribute_ruler",
    "lemmatizer", "ner",
])
nlp.enable_pipe("senter")

# 3. Input
query = "How do solid-state drives (SSDs) improve computer performance?"
context = """
Solid-state drives use flash memory to store data without moving parts.
Unlike traditional hard drives, SSDs have no mechanical components.
The absence of physical movement allows for much faster data access speeds.
I bought my computer last week.
SSDs significantly reduce boot times and application loading speeds.
They consume less power and are more reliable than mechanical drives.
The price of SSDs has decreased significantly in recent years.
"""

# 4. Score one sentence: P("Yes") for the next token, thresholded at tau
def get_relevance(query: str, context: str, sentence: str, tau: float = 0.5) -> bool:
    prompt = f'''<start_of_turn>user
Query:
{query}
Full context:
{context}
Sentence:
{sentence}
Is this sentence useful in answering the query? Answer only "Yes" or "No".<end_of_turn>
<start_of_turn>model
'''
    inputs = tokenizer(prompt, return_tensors="pt").to(exit_model.device)

    with torch.no_grad():
        outputs = exit_model(**inputs)

    # Use the first token id of "Yes"/"No" and compare their next-token logits
    yes_id = tokenizer.encode("Yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode("No", add_special_tokens=False)[0]
    logits = outputs.logits[0, -1, [yes_id, no_id]]
    prob_yes = torch.softmax(logits, dim=0)[0].item()

    return prob_yes >= tau

# 5. Compress the document: keep only sentences judged relevant
sentences = [sent.text.strip() for sent in nlp(context).sents if sent.text.strip()]
compressed = [sent for sent in sentences if get_relevance(query, context, sent)]
compressed_text = " ".join(compressed)

print(f"Compressed text ({len(compressed)}/{len(sentences)} sentences):", compressed_text)
```

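Because each sentence is scored independently given the same query and full context, the per-sentence calls above can be batched into a single forward pass. The sketch below is a minimal illustration of this, not part of the released API: `get_relevance_batched` is a hypothetical helper that reuses the quickstart's `tokenizer`, `exit_model`, `query`, `context`, and `sentences`, and left-pads so the last position lines up across prompts.

```python
# Hypothetical batched variant of get_relevance (illustrative sketch).
def get_relevance_batched(query, context, sentences, tau=0.5):
    prompts = [
        f'<start_of_turn>user\nQuery:\n{query}\nFull context:\n{context}\n'
        f'Sentence:\n{s}\nIs this sentence useful in answering the query? '
        f'Answer only "Yes" or "No".<end_of_turn>\n<start_of_turn>model\n'
        for s in sentences
    ]
    # Left padding keeps each prompt's last real token at position -1
    tokenizer.padding_side = "left"
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(exit_model.device)

    with torch.no_grad():
        logits = exit_model(**inputs).logits[:, -1, :]

    yes_id = tokenizer.encode("Yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode("No", add_special_tokens=False)[0]
    probs = torch.softmax(logits[:, [yes_id, no_id]], dim=-1)[:, 0]
    return [p.item() >= tau for p in probs]
```
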
## Training Data

The model was trained on the HotpotQA dataset using three kinds of examples (see the sketch after this list):

- Positive examples: sentences marked as supporting facts
- Hard negatives: sentences from the same documents that are not supporting facts
- Random negatives: sentences from unrelated documents

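As a rough illustration of how such pairs can be derived from HotpotQA's `supporting_facts` annotations, here is a minimal sketch. The field names follow the public HotpotQA JSON format, but `build_examples` and the 1:1 sampling of hard negatives are hypothetical choices, not the exact recipe from the paper:

```python
import random

def build_examples(record, rng=random.Random(0)):
    """Hypothetical sketch: derive (question, sentence, label) pairs
    from one HotpotQA record via its supporting_facts annotations."""
    question = record["question"]
    gold = {(title, idx) for title, idx in record["supporting_facts"]}

    positives, hard_negatives = [], []
    for title, sentences in record["context"]:
        for idx, sentence in enumerate(sentences):
            if (title, idx) in gold:
                positives.append((question, sentence, "Yes"))      # supporting fact
            else:
                hard_negatives.append((question, sentence, "No"))  # same docs, not supporting

    # Random negatives would additionally be sampled from unrelated documents.
    k = min(len(hard_negatives), len(positives))
    return positives + rng.sample(hard_negatives, k)
```
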
## Parameters

- Base model: Gemma-2b-it
- Training method: PEFT/LoRA
- Recommended tau threshold: 0.5

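The tau threshold trades compression rate against recall: higher values keep fewer sentences. A quick way to inspect this trade-off on your own data, reusing `get_relevance` from the quickstart (this re-scores at each threshold for simplicity; in practice you would compute the probability once and threshold it afterwards):

```python
# Sweep tau to see how aggressively the context is compressed.
for tau in (0.3, 0.5, 0.7):
    kept = [s for s in sentences if get_relevance(query, context, s, tau=tau)]
    print(f"tau={tau}: kept {len(kept)}/{len(sentences)} sentences")
```
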
## Limitations

- Currently optimized for English text only
- No support for cross-lingual compression

## Citation

```bibtex
@article{hwang2024exit,
  title={EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation},
  author={Hwang, Taeho and Cho, Sukmin and Jeong, Soyeong and Song, Hoyun and Han, SeungYoon and Park, Jong C.},
  journal={arXiv preprint arXiv:2412.12559},
  year={2024}
}
```