Pclanglais committed
Update README.md
README.md CHANGED
---
license: apache-2.0
---

***Jambert*** is an experimental Jamba model fine-tuned for RAG tasks and document synthesis.

Given a question and a list of references, Jambert will write a summarized version of the referenced content.

As an initial test, Jambert is for now trained on a 4,096-token context window, with the expectation of later iterating on significantly longer texts, thanks to the Mamba architecture.
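
In other words, the input bundles the question together with its references into a single query, and the output is a short synthesis. A minimal sketch of such an input, with invented placeholder texts:

```python
# Hypothetical illustration of the model's input: a question plus its references,
# bundled into a single query. The texts and separators below are invented placeholders.
question = "What are the deadlines for filing the annual report?"
references = [
    "Reference 1: The annual report must be filed within three months of year end.",
    "Reference 2: Late filings may incur administrative penalties.",
]
query = question + "\n\n" + "\n\n".join(references)
# The expected output is a short synthesis of the referenced content.
```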

## Training.
Jambert was trained with Axolotl on a set of administrative documents and their associated syntheses in French and English. It may work just as well in other languages, as this task has been shown to transfer easily across languages.

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: jamba
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: rag_dataset.json
    ds_type: json
    type: sharegpt
    conversation: chatml
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./out

sequence_len: 6000
sample_packing: true
pad_to_sequence_len: false
eval_sample_packing: true

use_wandb: false

adapter: qlora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

low_cpu_mem_usage: true
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
saves_per_epoch: 2
debug:
weight_decay: 0.0
special_tokens:
```

</details><br>
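
Per this config, the training data is read from `rag_dataset.json` in Axolotl's ShareGPT format with ChatML turns. A minimal sketch of what a single record might look like (the question, references, and answer below are invented placeholders):

```python
import json

# Hypothetical single training record in ShareGPT format, matching the axolotl
# settings above (type: sharegpt, conversation: chatml). All texts are placeholders.
record = {
    "conversations": [
        {
            "from": "human",
            "value": "What are the deadlines for filing the annual report?\n\n"
                     "Reference 1: ...\n\nReference 2: ...",
        },
        {
            "from": "gpt",
            "value": "A short synthesis of the references goes here.",
        },
    ]
}

# Write a small JSON dataset containing this one record.
with open("rag_dataset.json", "w", encoding="utf-8") as f:
    json.dump([record], f, ensure_ascii=False, indent=2)
```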

## Inference.
The repository provides both a 4-bit version, which should run easily on any 80 GB or even 40 GB GPU, and the original adapter, to be used in combination with the base model.

Inference was tested with the following script:
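
A minimal sketch of such a script, assuming the 4-bit weights load directly with `transformers` (the repository id, question, references, and generation settings are placeholders; the adapter variant would instead be loaded on top of the base model with `peft`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id: replace with the actual Jambert checkpoint.
model_id = "user/jambert-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# The training data uses ChatML-style conversations (see the axolotl config),
# so the question and references are wrapped in ChatML tags here. The exact
# prompt layout is an assumption.
question = "What are the deadlines for filing the annual report?"
references = ["Reference 1: ...", "Reference 2: ..."]
query = question + "\n\n" + "\n\n".join(references)
prompt = f"<|im_start|>user\n{query}<|im_end|>\n<|im_start|>assistant\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```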