---
language: en
tags:
- rag
- context-compression
- gemma
license: apache-2.0
datasets:
- hotpotqa
base_model:
- google/gemma-2b-it
---

# EXIT: Context-Aware Extractive Compression for RAG

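EXIT is a context-aware extractive compression model for Retrieval-Augmented Generation (RAG). It scores each sentence of a retrieved document for relevance to the query, using the full document as context, and keeps only the relevant sentences. This shortens the input passed to the generator while preserving contextual dependencies between sentences.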

[[Paper]](https://arxiv.org/abs/2412.12559) [[GitHub]](https://github.com/ThisIsHwang/EXIT)

## Model Description

EXIT is designed to:
- Compress retrieved documents while preserving critical information
- Consider full document context when evaluating sentence importance 
- Enable parallelizable, context-aware extraction
- Adapt dynamically to query complexity
- Balance compression ratio and answer accuracy
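
The "parallelizable, context-aware" property can be illustrated with a minimal sketch: every sentence gets its own prompt that also carries the full context, so the prompts are independent of one another and could be scored in a single batch. The template mirrors the one used in the Quickstart; `PROMPT_TEMPLATE` and `build_prompts` are hypothetical names introduced here, and joining the sentences with spaces to form the context is an assumption.

```python
from typing import List

# Same wording as the Quickstart prompt; the constant name is illustrative.
PROMPT_TEMPLATE = (
    "<start_of_turn>user\n"
    "Query:\n{query}\n"
    "Full context:\n{context}\n"
    "Sentence:\n{sentence}\n"
    'Is this sentence useful in answering the query? Answer only "Yes" or "No".'
    "<end_of_turn>\n"
    "<start_of_turn>model\n"
)


def build_prompts(query: str, sentences: List[str]) -> List[str]:
    """Build one classification prompt per sentence.

    Each prompt contains the full context, so no prompt depends on the
    decision made for another sentence.
    """
    # Assumption: the context is the sentences joined by single spaces.
    context = " ".join(sentences)
    return [
        PROMPT_TEMPLATE.format(query=query, context=context, sentence=sentence)
        for sentence in sentences
    ]


prompts = build_prompts(
    "How do SSDs improve performance?",
    ["SSDs have no moving parts.", "I bought my computer last week."],
)
print(len(prompts))
```

Because the prompts share nothing but the inputs, they can be tokenized with padding and sent through the model together rather than one at a time.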

## Task and Intended Use

EXIT is trained to classify sentences as either relevant or irrelevant for answering a query based on their content and surrounding context. It is specifically designed for:

- RAG context compression
- Open-domain question answering
- Both single-hop and multi-hop queries

## Quickstart

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import spacy

# 1. Load models
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    device_map="auto",
    torch_dtype=torch.float16
)
exit_model = PeftModel.from_pretrained(
    base_model, 
    "doubleyyh/exit-gemma-2b"
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

# 2. Initialize sentence splitter
nlp = spacy.load("en_core_web_sm", disable=[
    "tok2vec", "tagger", "parser", "attribute_ruler", 
    "lemmatizer", "ner"
])
nlp.enable_pipe("senter")

# 3. Input
query = "How do solid-state drives (SSDs) improve computer performance?"
context = """
Solid-state drives use flash memory to store data without moving parts.
Unlike traditional hard drives, SSDs have no mechanical components.
The absence of physical movement allows for much faster data access speeds.
I bought my computer last week.
SSDs significantly reduce boot times and application loading speeds.
They consume less power and are more reliable than mechanical drives.
The price of SSDs has decreased significantly in recent years.
"""

# 4. Process sentences
def get_relevance(query: str, context: str, sentence: str, tau: float = 0.5) -> bool:
    prompt = f'''<start_of_turn>user
Query:
{query}
Full context:
{context}
Sentence:
{sentence}
Is this sentence useful in answering the query? Answer only "Yes" or "No".<end_of_turn>
<start_of_turn>model
'''
    inputs = tokenizer(prompt, return_tensors="pt").to(exit_model.device)
    
    with torch.no_grad():
        outputs = exit_model(**inputs)
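        # encode() returns a list of ids; take the single token for each answer.
        yes_id = tokenizer.encode("Yes", add_special_tokens=False)[0]
        no_id = tokenizer.encode("No", add_special_tokens=False)[0]
        # Renormalize over the two answer tokens at the final position.
        logits = outputs.logits[0, -1, [yes_id, no_id]].float()
        prob = torch.softmax(logits, dim=0)[0].item()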
        
    return prob >= tau

# 5. Compress document
sentences = [sent.text.strip() for sent in nlp(context).sents]
compressed = [sent for sent in sentences if get_relevance(query, context, sent)]
compressed_text = " ".join(compressed)

print(f"Compressed text ({len(compressed)}/{len(sentences)} sentences):", compressed_text)
```

## Training Data

The model was trained on the HotpotQA dataset using:
- Positive examples: Sentences marked as supporting facts
- Hard negatives: Sentences from the same documents that are not supporting facts
- Random negatives: Sentences from unrelated documents
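
As a rough illustration of the three example types, the sketch below labels sentences from a single record. It assumes the layout of the original HotpotQA JSON release (`context` as `[title, sentences]` pairs and `supporting_facts` as `[title, sentence_index]` pairs). The function name `build_examples`, the `unrelated_sentences` pool, and the sampling of random negatives are hypothetical; the authors' actual sampling procedure is not described here.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[str, str, str]  # (sentence, label, example_type)


def build_examples(
    record: Dict,
    unrelated_sentences: List[str],
    num_random: int = 1,
    seed: int = 0,
) -> List[Example]:
    """Label sentences as positives, hard negatives, or random negatives."""
    support = {(title, idx) for title, idx in record["supporting_facts"]}
    support_titles = {title for title, _ in support}

    examples: List[Example] = []
    for title, sentences in record["context"]:
        for idx, sentence in enumerate(sentences):
            if (title, idx) in support:
                examples.append((sentence, "Yes", "positive"))
            elif title in support_titles:
                # Same document as a supporting fact, but not one itself.
                examples.append((sentence, "No", "hard_negative"))

    # Assumption: random negatives are drawn from a caller-supplied pool
    # of sentences taken from unrelated documents.
    rng = random.Random(seed)
    k = min(num_random, len(unrelated_sentences))
    for sentence in rng.sample(unrelated_sentences, k):
        examples.append((sentence, "No", "random_negative"))
    return examples


record = {
    "context": [["SSD", ["SSDs use flash memory.", "They were once costly."]]],
    "supporting_facts": [["SSD", 0]],
}
for example in build_examples(record, ["Paris is in France."]):
    print(example)
```

Sentences from documents in the record that contain no supporting fact are simply skipped in this sketch.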

## Parameters

- Base model: Gemma-2b-it
- Training method: PEFT/LoRA
- Recommended tau threshold: 0.5
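
To show how the tau threshold trades compression against recall, here is a small sketch that filters sentences by their "Yes" probability, using the same `prob >= tau` rule as the Quickstart. The helper `select_sentences` is a hypothetical name, and the probabilities are made-up values for illustration, not model outputs.

```python
from typing import List, Sequence


def select_sentences(
    sentences: Sequence[str],
    yes_probs: Sequence[float],
    tau: float = 0.5,
) -> List[str]:
    """Keep sentences whose "Yes" probability is at least tau."""
    if len(sentences) != len(yes_probs):
        raise ValueError("sentences and yes_probs must have the same length")
    return [s for s, p in zip(sentences, yes_probs) if p >= tau]


sentences = [
    "SSDs significantly reduce boot times.",
    "The price of SSDs has decreased.",
    "I bought my computer last week.",
]
yes_probs = [0.9, 0.5, 0.2]  # illustrative values only

print(select_sentences(sentences, yes_probs, tau=0.5))
print(select_sentences(sentences, yes_probs, tau=0.8))
```

Raising tau keeps fewer sentences and compresses more aggressively; lowering it keeps more context at the cost of a longer prompt.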

## Limitations

- Currently optimized for English text only
- No support for cross-lingual compression

## Citation

```bibtex
@article{hwang2024exit,
  title={EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation},
  author={Hwang, Taeho and Cho, Sukmin and Jeong, Soyeong and Song, Hoyun and Han, SeungYoon and Park, Jong C.},
  journal={arXiv preprint arXiv:2412.12559},
  year={2024}
}
```