JingzeShi committed on
Commit
ea6fe65
verified
1 Parent(s): c5dfd83

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -5,7 +5,7 @@ datasets:
  - HuggingFaceTB/smoltalk
  - HuggingFaceH4/ultrafeedback_binarized
  base_model:
- - JingzeShi/Doge-20M
+ - SmallDoge/Doge-20M
  language:
  - en
  pipeline_tag: question-answering
@@ -26,8 +26,8 @@ In addition, Doge uses Dynamic Mask Attention as sequence transformation and can
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer

- tokenizer = AutoTokenizer.from_pretrained("JingzeShi/Doge-20M-Instruct")
- model = AutoModelForCausalLM.from_pretrained("JingzeShi/Doge-20M-Instruct", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M-Instruct")
+ model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M-Instruct", trust_remote_code=True)

  generation_config = GenerationConfig(
      max_new_tokens=100,
@@ -70,14 +70,14 @@ We build the Doge-Instruct by first SFT on [SmolTalk](https://huggingface.co/dat
  **SFT**:
  | Model | Training Data | Epochs | Content Length | LR | Batch Size | Precision |
  |---|---|---|---|---|---|---|
- | [Doge-20M-Instruct-SFT](https://huggingface.co/JingzeShi/Doge-20M-Instruct-SFT) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 2048 | 8e-4 | 0.25M | bfloat16 |
- | [Doge-60M-Instruct](https://huggingface.co/JingzeShi/Doge-60M-Instruct) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 2048 | 6e-4 | 0.25M | bfloat16 |
+ | [Doge-20M-Instruct-SFT](https://huggingface.co/SmallDoge/Doge-20M-Instruct-SFT) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 2048 | 8e-4 | 0.25M | bfloat16 |
+ | [Doge-60M-Instruct](https://huggingface.co/SmallDoge/Doge-60M-Instruct) | [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) | 2 | 2048 | 6e-4 | 0.25M | bfloat16 |

  **DPO**:
  | Model | Training Data | Epochs | Content Length | LR | Batch Size | Precision |
  |---|---|---|---|---|---|---|
- | [Doge-20M-Instruct](https://huggingface.co/JingzeShi/Doge-20M-Instruct) | [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 2 | 1024 | 8e-5 | 0.125M | bfloat16 |
- | [Doge-60M-Instruct](https://huggingface.co/JingzeShi/Doge-60M-Instruct) | [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 2 | 1024 | 6e-5 | 0.125M | bfloat16 |
+ | [Doge-20M-Instruct](https://huggingface.co/SmallDoge/Doge-20M-Instruct) | [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 2 | 1024 | 8e-5 | 0.125M | bfloat16 |
+ | [Doge-60M-Instruct](https://huggingface.co/SmallDoge/Doge-60M-Instruct) | [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 2 | 1024 | 6e-5 | 0.125M | bfloat16 |


  **Procedure**:
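
For reference, here is a minimal usage sketch with the repository path updated by this commit (`SmallDoge/Doge-20M-Instruct`). The hunk above truncates the README's generation example after `max_new_tokens=100`, so the remaining sampling settings, the chat-template call, and the prompt below are illustrative assumptions rather than the model card's exact values:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer

# Updated repository path from this commit (SmallDoge org instead of JingzeShi).
tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M-Instruct")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M-Instruct", trust_remote_code=True)

# max_new_tokens=100 comes from the hunk above; the sampling settings are placeholders.
generation_config = GenerationConfig(
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
streamer = TextStreamer(tokenizer=tokenizer, skip_prompt=True)

# Hypothetical prompt, formatted with the instruct model's chat template.
conversation = [{"role": "user", "content": "Hi, how are you doing today?"}]
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
)

# Stream the completion to stdout as it is generated.
model.generate(inputs, generation_config=generation_config, streamer=streamer)
```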