Update README.md
README.md CHANGED
@@ -5,7 +5,7 @@ datasets:
 - HuggingFaceTB/smoltalk
 - HuggingFaceH4/ultrafeedback_binarized
 base_model:
--
+- SmallDoge/Doge-20M
 language:
 - en
 pipeline_tag: question-answering
@@ -26,8 +26,8 @@ In addition, Doge uses Dynamic Mask Attention as sequence transformation and can
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer
 
-tokenizer = AutoTokenizer.from_pretrained("
-model = AutoModelForCausalLM.from_pretrained("
+tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M-Instruct")
+model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M-Instruct", trust_remote_code=True)
 
 generation_config = GenerationConfig(
     max_new_tokens=100,
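For context, the usage snippet this hunk patches continues past `max_new_tokens=100`. Below is a minimal end-to-end sketch of the updated usage: the repo IDs come from the `+` lines above, while everything beyond `max_new_tokens=100` (sampling settings, the example prompt) is an illustrative assumption, not text from the diff.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer

# Repo IDs as introduced by this commit; trust_remote_code=True is needed
# because Doge ships custom modeling code.
tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M-Instruct")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M-Instruct", trust_remote_code=True)

# Only max_new_tokens=100 is visible in the hunk; the sampling values below
# are placeholder assumptions.
generation_config = GenerationConfig(
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)

# Stream tokens to stdout as they are generated, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)

# Format a one-turn conversation with the model's chat template and generate.
conversation = [{"role": "user", "content": "Hi, how are you doing today?"}]
inputs = tokenizer.apply_chat_template(conversation, add_generation_prompt=True, return_tensors="pt")
model.generate(inputs, generation_config=generation_config, streamer=streamer)
```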
@@ -70,14 +70,14 @@ We build the Doge-Instruct by first SFT on [SmolTalk](https://huggingface.co/dat
 **SFT**:
 | Model | Training Data | Epochs | Content Length | LR | Batch Size | Precision |
 |---|---|---|---|---|---|---|
-| [Doge-20M-Instruct-SFT](https://huggingface.co/
-| [Doge-60M-Instruct](https://huggingface.co/
+| [Doge-20M-Instruct-SFT](https://huggingface.co/SmallDoge/Doge-20M-Instruct-SFT) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 2048 | 8e-4 | 0.25M | bfloat16 |
+| [Doge-60M-Instruct](https://huggingface.co/SmallDoge/Doge-60M-Instruct) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 2048 | 6e-4 | 0.25M | bfloat16 |
 
 **DPO**:
 | Model | Training Data | Epochs | Content Length | LR | Batch Size | Precision |
 |---|---|---|---|---|---|---|
-| [Doge-20M-Instruct](https://huggingface.co/
-| [Doge-60M-Instruct](https://huggingface.co/
+| [Doge-20M-Instruct](https://huggingface.co/SmallDoge/Doge-20M-Instruct) | [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 2 | 1024 | 8e-5 | 0.125M | bfloat16 |
+| [Doge-60M-Instruct](https://huggingface.co/SmallDoge/Doge-60M-Instruct) | [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 2 | 1024 | 6e-5 | 0.125M | bfloat16 |
 
 
 **Procedure**:
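The restored rows make the two-stage recipe explicit: SFT on smoltalk (2 epochs, 2048-token context, LR 8e-4 for the 20M model and 6e-4 for the 60M model, ~0.25M-token batches, bfloat16), then DPO on ultrafeedback_binarized (2 epochs, 1024-token context, LR 8e-5 / 6e-5, ~0.125M-token batches). A minimal sketch of that recipe for the 20M model using trl follows; the trainer wiring, dataset config/splits, and per-device batch sizes are assumptions (the diff records hyperparameters, not the training script), and argument names vary slightly across trl versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

# Base checkpoint named in the card's new front matter.
tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M", trust_remote_code=True)

# Stage 1: SFT on smoltalk, using the Doge-20M row of the SFT table.
sft_args = SFTConfig(
    output_dir="doge-20m-sft",
    num_train_epochs=2,
    learning_rate=8e-4,
    max_seq_length=2048,             # "Content Length" column; renamed max_length in newer trl
    bf16=True,
    per_device_train_batch_size=8,   # placeholder: tune together with accumulation
    gradient_accumulation_steps=16,  # so the global batch approaches ~0.25M tokens
)
sft_trainer = SFTTrainer(
    model=model,
    args=sft_args,
    train_dataset=load_dataset("HuggingFaceTB/smoltalk", "all", split="train"),
    processing_class=tokenizer,      # older trl: tokenizer=tokenizer
)
sft_trainer.train()

# Stage 2: DPO on ultrafeedback_binarized, starting from the SFT model.
dpo_args = DPOConfig(
    output_dir="doge-20m-dpo",
    num_train_epochs=2,
    learning_rate=8e-5,
    max_length=1024,
    bf16=True,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=32,  # targeting the ~0.125M-token batch
)
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,
    args=dpo_args,
    train_dataset=load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs"),
    processing_class=tokenizer,
)
dpo_trainer.train()
```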