ikala-ray committed on
Commit d512a31 · 1 Parent(s): ea4f9c5

Update README.md

Files changed (1)
  1. README.md +153 -17
README.md CHANGED
@@ -1,5 +1,20 @@
  ---
  license: bigscience-openrail-m
  datasets:
  - OpenAssistant/oasst1
  - databricks/databricks-dolly-15k
@@ -8,34 +23,155 @@ datasets:
  - theblackcat102/joke_explaination
  ---

- Based from [ckip-joint/bloom-3b-zh](https://huggingface.co/ckip-joint/bloom-3b-zh)

  **License:** MEDIATEK RESEARCH License ([link](https://huggingface.co/ckip-joint/bloom-3b-zh/blob/main/LICENSE_MR.md)) and RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license)), Non commercial


- Following [open assistant](https://github.com/LAION-AI/Open-Assistant) chat format

  ```
- <|prompter|>Hi, can you introduce yourself?</s><|assistant|>
  ```

- Output:
  ```
- Hello! My name is Open Assistant. I am an AI assistant trained by iKala AI Team. I can help you with a wide range of tasks, from answering questions to providing information and generating text. How can I help you?</s>
  ```

- What this model is good for:
-
- - Summarization
- - Translation
- - Simple conversations

- Model is bad in:

- - Math
- - World knowledge
- - Chain of thought
- - use of tools
- - LangChain or AutoGPT

- [wandb](https://wandb.ai/ikala-ml-team/llm-supervised-finetuning/runs/lrfa95j3)

  ---
  license: bigscience-openrail-m
+ language:
+ - en
+ - zh
+ - ja
+ tags:
+ - sft
+ pipeline_tag: text-generation
+ widget:
+ - text: >-
+     <|prompter|>What is a meme, and what's the history behind this
+     word?<|endoftext|><|assistant|>
+ - text: <|prompter|>What's the Earth total population<|endoftext|><|assistant|>
+ - text: >-
+     <|prompter|>Write a story about future of AI
+     development<|endoftext|><|assistant|>
  datasets:
  - OpenAssistant/oasst1
  - databricks/databricks-dolly-15k

  - theblackcat102/joke_explaination
  ---

+ # Bloom-3B SFT model
+
+ It is based on Bloom-zh 3B and was fine-tuned on human demonstrations
+ of assistant conversations collected through the
+ [https://open-assistant.io/](https://open-assistant.io/) human feedback web
+ app before April 12, 2023.
+
+ Supervised fine-tuning was performed on conversation datasets with a sequence length of 5120 tokens.
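+
+ As a rough illustration of what that 5120-token budget means, the sketch below serializes a single conversation in the chat format described in the Prompting section further down and tokenizes it. The repo id, the example conversation, and the packing details are assumptions for illustration, not the actual training code.
+
+ ```python
+ # Illustrative sketch only (assumed repo id; not the training pipeline).
+ from transformers import AutoTokenizer
+
+ MAX_LEN = 5120  # training sequence length stated above
+ tokenizer = AutoTokenizer.from_pretrained("ikala/bloom-zh-chat-3b")  # assumed id
+
+ conversation = [
+     ("prompter", "Hi, can you introduce yourself?"),
+     ("assistant", "Hello! I am an AI assistant trained by the iKala AI team."),
+ ]
+
+ # Each turn starts with <|prompter|> or <|assistant|> and ends with <|endoftext|>.
+ text = "".join(f"<|{role}|>{msg}<|endoftext|>" for role, msg in conversation)
+
+ ids = tokenizer(text, truncation=True, max_length=MAX_LEN)["input_ids"]
+ print(f"{len(ids)} tokens (budget: {MAX_LEN})")
+ ```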

  **License:** MEDIATEK RESEARCH License ([link](https://huggingface.co/ckip-joint/bloom-3b-zh/blob/main/LICENSE_MR.md)) and RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license)), Non commercial

+ ## Model Details
+
+ - **Developed by:** [Open-Assistant Contributors](https://open-assistant.io/) and [iKala](https://ikala.ai/)
+ - **Model type:** Transformer-based Language Model
+ - **Language:** English, Chinese, Japanese
+ - **Finetuned from:** [ckip-joint/bloom-3b-zh](https://huggingface.co/ckip-joint/bloom-3b-zh)
+ - **Code:** [Open-Assistant/model/model_training](https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_training)
+
+ ## Prompting
+
+ Two special tokens are used to mark the beginning of user and assistant turns:
+ `<|prompter|>` and `<|assistant|>`. Each turn ends with a `<|endoftext|>` token.

+ Input prompt example:
  ```
+ <|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>
  ```
+ The input ends with the `<|assistant|>` token to signal that the model should
+ start generating the assistant reply.
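+
+ A minimal generation sketch with Hugging Face Transformers follows; the repo id is taken from the benchmark table below, and the sampling settings are illustrative assumptions rather than recommended values.
+
+ ```python
+ # Minimal inference sketch (assumed repo id and illustrative sampling settings).
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "ikala/bloom-zh-chat-3b"  # assumed repo id
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.float16, device_map="auto"
+ )
+
+ # The prompt ends with <|assistant|>, so the model continues with the reply.
+ prompt = (
+     "<|prompter|>What is a meme, and what's the history behind this "
+     "word?<|endoftext|><|assistant|>"
+ )
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+ output = model.generate(
+     **inputs,
+     max_new_tokens=256,
+     do_sample=True,
+     top_p=0.95,
+     temperature=0.7,
+ )
+ reply = tokenizer.decode(
+     output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
+ )
+ print(reply)
+ ```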
+
+ ## Benchmark
+
+ | model | MMLU | BBH | HumanEval @10 |
+ |---|---|---|---|
+ | [ikala/redpajama-3b-chat](https://huggingface.co/ikala/redpajama-3b-chat) | 24.6 | 29.3 | 4.8 |
+ | [ikala/bloom-zh-chat-3b](https://huggingface.co/ikala/bloom-zh-chat-3b) | 31.4 | 30.2 | 0.0 |
+ | llama-7b (reference) | 30.9 | 27.6 | 10.3 |
+
+ ## Dev Details

+ - base model: [togethercomputer/RedPajama-INCITE-Base-3B-v1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1)
+ - checkpoint: 1 epoch (6000 steps)
+
+ command: `deepspeed trainer_sft.py --configs defaults stablelm-7b oasst-mix --cache_dir /home/ubuntu/data_cache --output_dir .saved/stable-lm-7b-1 --num_train_epochs 4 --deepspeed`
+
+ data:
  ```
+ datasets:
+   - wmt2019_zh-en:
+       max_val_set: 1000
+       max_train_set: 20000
+   - ted_trans_en-ja:
+       max_val_set: 1000
+       max_train_set: 20000
+   - ted_trans_zh-ja:
+       max_val_set: 1000
+       max_train_set: 20000
+   - ikala:
+       input_file_path: export_conversation_v4.4.jsonl
+       val_split: 0.05
+   - dolly15k:
+       val_split: 0.05
+   - oasst_export:
+       lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk,zh,ja,th,ko"
+       input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
+       val_split: 0.05
+   - joke
+   - gsm8k
+   - webgpt
  ```

+ The data config above includes the internal `ikala` dataset; if you try to reproduce this training, remove that entry.
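+
+ A small sketch of that step, assuming the `datasets:` block above is stored in a YAML file (both file names below are hypothetical):
+
+ ```python
+ # Drop the internal `ikala` entry from the data config before training.
+ import yaml
+
+ with open("oasst-mix.yaml") as f:  # hypothetical path to the config above
+     cfg = yaml.safe_load(f)
+
+ # Entries are either plain names (e.g. "joke") or single-key mappings.
+ cfg["datasets"] = [
+     d for d in cfg["datasets"]
+     if not (isinstance(d, dict) and "ikala" in d)
+ ]
+
+ with open("oasst-mix-public.yaml", "w") as f:
+     yaml.safe_dump(cfg, f, sort_keys=False)
+ ```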

+ Training config (`redpajama-3b`):
+ ```
+ redpajama-3b:
+   dtype: fp16
+   log_dir: "redpajama_3b"
+   learning_rate: 1e-5
+   model_name: saved_models/RedPajama-INCITE-Base-3B-v1
+   output_dir: ikala_v4_3b
+   weight_decay: 0.0
+   max_length: 8196
+   warmup_steps: 2000
+   gradient_checkpointing: true
+   gradient_accumulation_steps: 32
+   per_device_train_batch_size: 1
+   per_device_eval_batch_size: 2
+   eval_steps: 500
+   save_steps: 1000
+   num_train_epochs: 8
+   save_total_limit: 2
+   deepspeed_config: configs/zero3_config_sft.json
+ ```

+ DeepSpeed ZeRO (stage 3) config:
+ ```
+ {
+   "fp16": {
+     "enabled": "auto",
+     "loss_scale": 0,
+     "loss_scale_window": 1000,
+     "initial_scale_power": 16,
+     "hysteresis": 2,
+     "min_loss_scale": 1
+   },
+   "bf16": {
+     "enabled": "auto"
+   },
+   "optimizer": {
+     "type": "AdamW",
+     "params": {
+       "lr": "auto",
+       "betas": "auto",
+       "eps": "auto",
+       "weight_decay": "auto"
+     }
+   },
+   "scheduler": {
+     "type": "WarmupDecayLR",
+     "params": {
+       "warmup_min_lr": "auto",
+       "warmup_max_lr": "auto",
+       "warmup_num_steps": "auto",
+       "warmup_type": "linear",
+       "total_num_steps": "auto"
+     }
+   },
+   "zero_optimization": {
+     "stage": 3,
+     "overlap_comm": true,
+     "contiguous_gradients": true,
+     "sub_group_size": 1e9,
+     "reduce_bucket_size": "auto",
+     "stage3_prefetch_bucket_size": "auto",
+     "stage3_param_persistence_threshold": "auto",
+     "stage3_max_live_parameters": 1e9,
+     "stage3_max_reuse_distance": 1e9,
+     "stage3_gather_16bit_weights_on_model_save": true
+   },
+   "gradient_accumulation_steps": "auto",
+   "gradient_clipping": "auto",
+   "steps_per_print": 2000,
+   "train_batch_size": "auto",
+   "train_micro_batch_size_per_gpu": "auto",
+   "wall_clock_breakdown": false
+ }
+ ```