Pclanglais committed
Update README.md
README.md CHANGED
---
license: apache-2.0
---

***Jambert*** is an experimental Jamba model fine-tuned for RAG tasks and document synthesis.

Given a question and a list of references, Jambert will write a summarized version of the referenced content.

As an initial test, Jambert is for now trained on a 4,096-token context window, with the expectation of later iterating on significantly longer texts, thanks to the Mamba architecture.
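
In other words, the input bundles the question together with its references into a single query, and the output is a short synthesis. A minimal sketch of such an input, with invented placeholder texts:

```python
# Hypothetical illustration of the model's input: a question plus its references,
# bundled into a single query. The texts and separators below are invented placeholders.
question = "What are the deadlines for filing the annual report?"
references = [
    "Reference 1: The annual report must be filed within three months of year end.",
    "Reference 2: Late filings may incur administrative penalties.",
]
query = question + "\n\n" + "\n\n".join(references)
# The expected output is a short synthesis of the referenced content.
```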

## Training.
Jambert was trained with Axolotl on a set of administrative documents and their associated syntheses in French and English. It may work just as well in other languages, as this task has been shown to transfer easily across languages.

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: jamba
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: rag_dataset.json
    ds_type: json
    type: sharegpt
    conversation: chatml
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./out

sequence_len: 6000
sample_packing: true
pad_to_sequence_len: false
eval_sample_packing: true

use_wandb: false

adapter: qlora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

low_cpu_mem_usage: true
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
saves_per_epoch: 2
debug:
weight_decay: 0.0
special_tokens:
```

</details><br>
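
Per this config, the training data is read from `rag_dataset.json` in Axolotl's ShareGPT format with ChatML turns. A minimal sketch of what a single record might look like (the question, references, and answer below are invented placeholders):

```python
import json

# Hypothetical single training record in ShareGPT format, matching the axolotl
# settings above (type: sharegpt, conversation: chatml). All texts are placeholders.
record = {
    "conversations": [
        {
            "from": "human",
            "value": "What are the deadlines for filing the annual report?\n\n"
                     "Reference 1: ...\n\nReference 2: ...",
        },
        {
            "from": "gpt",
            "value": "A short synthesis of the references goes here.",
        },
    ]
}

# Write a small JSON dataset containing this one record.
with open("rag_dataset.json", "w", encoding="utf-8") as f:
    json.dump([record], f, ensure_ascii=False, indent=2)
```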

## Inference.
The repository provides both a 4-bit version, which should run easily on any 80 GB or even 40 GB GPU, and the original adapter, to be used in combination with the base model.

Inference was tested with the following script:
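
A minimal sketch of such a script, assuming the 4-bit weights load directly with `transformers` (the repository id, question, references, and generation settings are placeholders; the adapter variant would instead be loaded on top of the base model with `peft`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id: replace with the actual Jambert checkpoint.
model_id = "user/jambert-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# The training data uses ChatML-style conversations (see the axolotl config),
# so the question and references are wrapped in ChatML tags here. The exact
# prompt layout is an assumption.
question = "What are the deadlines for filing the annual report?"
references = ["Reference 1: ...", "Reference 2: ..."]
query = question + "\n\n" + "\n\n".join(references)
prompt = f"<|im_start|>user\n{query}<|im_end|>\n<|im_start|>assistant\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```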