IlysvlVEizbr
commited on
Commit
•
970e4b6
1
Parent(s):
29cbdb2
Fix typo
Browse files
README.md
CHANGED
@@ -244,7 +244,7 @@ The accuracy of Qwen-7B-Chat on GSM8K is shown below
|
|
244 |
|
245 |
通过NTK插值,LogN注意力缩放可以扩展Qwen-7B-Chat的上下文长度。在长文本摘要数据集[VCSUM](https://arxiv.org/abs/2305.05280)上(文本平均长度在15K左右),Qwen-7B-Chat的Rouge-L结果如下:
|
246 |
|
247 |
-
**(若要启用这些技巧,请将config.json里的`
|
248 |
|
249 |
We introduce NTK-aware interpolation, LogN attention scaling to extend the context length of Qwen-7B-Chat. The Rouge-L results of Qwen-7B-Chat on long-text summarization dataset [VCSUM](https://arxiv.org/abs/2305.05280) (The average length of this dataset is around 15K) are shown below:
|
250 |
|
|
|
244 |
|
245 |
通过NTK插值,LogN注意力缩放可以扩展Qwen-7B-Chat的上下文长度。在长文本摘要数据集[VCSUM](https://arxiv.org/abs/2305.05280)上(文本平均长度在15K左右),Qwen-7B-Chat的Rouge-L结果如下:
|
246 |
|
247 |
+
**(若要启用这些技巧,请将config.json里的`use_dynamic_ntk`和`use_logn_attn`设置为true)**
|
248 |
|
249 |
We introduce NTK-aware interpolation, LogN attention scaling to extend the context length of Qwen-7B-Chat. The Rouge-L results of Qwen-7B-Chat on long-text summarization dataset [VCSUM](https://arxiv.org/abs/2305.05280) (The average length of this dataset is around 15K) are shown below:
|
250 |
|