davidkim205 committed
Commit fe4f3ca · verified · 1 Parent(s): f5219e7

Update README.md

Files changed (1): README.md (+91 −2)
README.md CHANGED
@@ -125,9 +125,98 @@ This method proposes a novel approach for generating datasets for DPO (Self-superv
  * **Model Developers** : davidkim (Changyeon Kim)
  * **Repository** : [https://github.com/davidkim205/nox](https://github.com/davidkim205/nox)
  * **base model** : abacusai/Smaug-72B-v0.1
- * **sft dataset** : will be updated soon.
- * **dpo dataset** : will be updated soon.
+ * **sft dataset** : datasets_enconv_4m
+ * **dpo dataset** : datasets_encomp_151k

+ ## sft dataset info : datasets_enconv_4m
+ ### 100k random shuffle datasets
+ - stack-exchange-preferences
+ - SlimOrca
+ - alpaca-gpt4
+ - SHP
+ - HC3
+ - databricks-dolly-15k
+ - orca-dpo-pairs
+ - us-stockname
+ - OpenHermes2.5-dpo-binarized-alpha
+ - distilabel-math-preference-dpo
+ - Neural-DPO
+ - truthy-dpo-v0.1
+ - distilabel-capybara-dpo-7k-binarized
+ - us-sentiment
+ - contextual-dpo-v0.1
+
+ ### 1k random shuffle datasets
+ - bigbench
+ - glue_mnli
+ - glue_qqp
+ - xnli
+ - codexglue_code2text_go
+ - trivia_qa
+ - medmcqa
+ - hendrycks_ethics
+ - super_glue_record
+ - glue_qnli
+ - anli_r3
+ - swag
+ - squad_v2
+ - nq_open
+ - drop
+ - glue_sst2
+ - blimp
+ - paws-x
+ - unscramble
+ - anli_r2
+ - babi
+ - math_qa
+ - social_i_qa
+ - piqa
+ - arithmetic
+ - anli_r1
+ - prost
+ - sciq
+ - mc_taco
+ - medqa
+ - super_glue_boolq
+ - hendrycks_math
+ - lambada
+ - toxigen-data
+ - glue_cola
+ - pubmed_qa
+ - logiqa
+ - mutual
+ - headqa
+ - bbh
+ - super_glue_wic
+ - openbookqa
+ - glue_mrpc
+ - web_questions
+ - qasper
+ - super_glue_multirc
+ - story_cloze
+ - super_glue_rte
+ - glue_rte
+ - race
+ - xwinograd
+ - asdiv
+ - xstory_cloze
+ - crows_pairs_multilingual
+ - belebele
+ - glue_wnli
+ - super_glue_wsc
+ - coqa
+ - super_glue_copa
+ - super_glue_cb
+ - winograd_wsc
+ - mgsm
+ - scrolls_contract_nli
+
+ * If a dataset listed above cannot be found publicly, it is internal company data and cannot be made public.
+
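The two groups above describe a simple mixing recipe: each source in the first group contributes roughly 100k randomly shuffled examples, and each source in the second roughly 1k. Below is a minimal sketch of that recipe using the Hugging Face `datasets` library; the repository IDs, split names, and sample counts are illustrative assumptions, not the exact sources or settings used to build datasets_enconv_4m.

```python
# Illustrative sketch only: shuffle each source, take a fixed number of
# examples, and concatenate the slices into one SFT mixture.
from datasets import load_dataset, concatenate_datasets

SOURCES_100K = ["Open-Orca/SlimOrca", "databricks/databricks-dolly-15k"]  # assumed repo IDs
SOURCES_1K = ["piqa", "sciq"]                                             # assumed repo IDs

def sample_source(repo_id: str, n: int, seed: int = 42):
    ds = load_dataset(repo_id, split="train")   # split name is an assumption
    ds = ds.shuffle(seed=seed)                  # the "random shuffle" step
    return ds.select(range(min(n, len(ds))))    # cap at the dataset's size

parts = [sample_source(r, 100_000) for r in SOURCES_100K]
parts += [sample_source(r, 1_000) for r in SOURCES_1K]

# Note: concatenate_datasets requires identical column schemas, so a real
# pipeline would first normalize each source to a shared prompt/response format.
sft_mix = concatenate_datasets(parts)
print(len(sft_mix))
```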
+ ## dpo dataset info : datasets_encomp_151k
+ We randomly selected data from each category of the training dataset and constructed the DPO (Direct Preference Optimization) dataset from model-generated sentences whose logits were lower than the mean.
+ * Unfortunately, this dataset cannot be made public.
+
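Read literally, the selection rule is: score each model-generated sentence (e.g., by its mean token log-probability), compute the mean score across generations, and treat the below-mean generations as the dispreferred side of the preference pairs. The following is a minimal sketch under that reading; the record fields and the use of a reference answer as the "chosen" response are assumptions, not the author's confirmed pipeline.

```python
# Illustrative sketch only: turn scored model generations into DPO-style
# (prompt, chosen, rejected) pairs using a below-the-mean score threshold.
from statistics import mean

def build_dpo_pairs(records):
    """records: dicts with 'prompt', 'reference', 'generation', 'score',
    where 'score' is e.g. the mean token log-probability of the generation."""
    threshold = mean(r["score"] for r in records)
    pairs = []
    for r in records:
        if r["score"] < threshold:              # below-mean generation
            pairs.append({
                "prompt": r["prompt"],
                "chosen": r["reference"],       # assumed: reference answer serves as "chosen"
                "rejected": r["generation"],    # dispreferred model output
            })
    return pairs
```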
  ## Evaluation
  ### [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
  | **model** | **average** | **arc** | **hellaswag** | **mmlu** | **truthfulQA** | **winogrande** | **GSM8k** |