Feature(LLMLingua): update the news
app.py
CHANGED
@@ -7,7 +7,7 @@ INTRO = """
 # LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models (EMNLP 2023) [[paper](https://arxiv.org/abs/2310.05736)]
 _Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang and Lili Qiu_
 
-This is an early demo of the prompt compression method LLMLingua.
+### This is an <b>early demo</b> of the prompt compression method LLMLingua and <b>the capabilities are limited</b>, restricted to using only the GPT-2 small size mode.
 
 It should be noted that due to limited resources, we only provide the **GPT2-Small** size language model in this demo. Using the **LLaMA2-7B** as a small language model would result in a significant performance improvement, especially at high compression ratios.
 
@@ -19,10 +19,15 @@ To use it, upload your prompt and set the compression target.
 2. ✅ Set the target_token or compression ratio.
 3. 🤔 Try experimenting with different target compression ratios or other hyperparameters to optimize the performance.
 
-You can check our [
+You can check our [project page](https://llmlingua.com/)!
 
 We also has a work to compress long context scenories, using less cost but even improve the downstream performance, LongLLMLingua.<br>
 [LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression](https://arxiv.org/abs/2310.06839) (Under Review).<br>
+
+## News
+
+- 🎈 We launched a [project page](https://llmlingua.com/) showcasing real-world case studies, including RAG, Online Meetings, CoT, and Code;
+- 👾 LongLLMLingua has been incorporated into the [LlamaIndex pipeline](https://github.com/run-llama/llama_index/blob/main/llama_index/indices/postprocessor/longllmlingua.py), which is a widely used RAG framework.
 """
 
 INTRO_EXAMPLES = '''