davidkim205 committed
Commit fe4f3ca · verified · 1 Parent(s): f5219e7

Update README.md

Files changed (1): README.md (+91 −2)
README.md CHANGED
@@ -125,9 +125,98 @@ This method proposes a novel approach for generating datasets for DPO (Self-superv
  * **Model Developers** : davidkim (Changyeon Kim)
  * **Repository** : [https://github.com/davidkim205/nox](https://github.com/davidkim205/nox)
  * **base model** : abacusai/Smaug-72B-v0.1
- * **sft dataset** : will be updated soon.
- * **dpo dataset** : will be updated soon.
+ * **sft dataset** : datasets_enconv_4m
+ * **dpo dataset** : datasets_encomp_151k

+ ## sft dataset info : datasets_enconv_4m
+ ### 100k random shuffle datasets
+ - stack-exchange-preferences
+ - SlimOrca
+ - alpaca-gpt4
+ - SHP
+ - HC3
+ - databricks-dolly-15k
+ - orca-dpo-pairs
+ - us-stockname
+ - OpenHermes2.5-dpo-binarized-alpha
+ - distilabel-math-preference-dpo
+ - Neural-DPO
+ - truthy-dpo-v0.1
+ - distilabel-capybara-dpo-7k-binarized
+ - us-sentiment
+ - contextual-dpo-v0.1
+
+ ### 1k random shuffle datasets
+ - bigbench
+ - glue_mnli
+ - glue_qqp
+ - xnli
+ - codexglue_code2text_go
+ - trivia_qa
+ - medmcqa
+ - hendrycks_ethics
+ - super_glue_record
+ - glue_qnli
+ - anli_r3
+ - swag
+ - squad_v2
+ - nq_open
+ - drop
+ - glue_sst2
+ - blimp
+ - paws-x
+ - unscramble
+ - anli_r2
+ - babi
+ - math_qa
+ - social_i_qa
+ - piqa
+ - arithmetic
+ - anli_r1
+ - prost
+ - sciq
+ - mc_taco
+ - medqa
+ - super_glue_boolq
+ - hendrycks_math
+ - lambada
+ - toxigen-data
+ - glue_cola
+ - pubmed_qa
+ - logiqa
+ - mutual
+ - headqa
+ - bbh
+ - super_glue_wic
+ - openbookqa
+ - glue_mrpc
+ - web_questions
+ - qasper
+ - super_glue_multirc
+ - story_cloze
+ - super_glue_rte
+ - glue_rte
+ - race
+ - xwinograd
+ - asdiv
+ - xstory_cloze
+ - crows_pairs_multilingual
+ - belebele
+ - glue_wnli
+ - super_glue_wsc
+ - coqa
+ - super_glue_copa
+ - super_glue_cb
+ - winograd_wsc
+ - mgsm
+ - scrolls_contract_nli
+
+ * If a dataset listed above cannot be found publicly, it is internal company data and cannot be made public.
+
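The two groups above describe a simple mixing recipe: each source in the first group contributes roughly 100k randomly shuffled examples, and each source in the second roughly 1k. Below is a minimal sketch of that recipe using the Hugging Face `datasets` library; the repository IDs, split names, and sample counts are illustrative assumptions, not the exact sources or settings used to build datasets_enconv_4m.

```python
# Illustrative sketch only: shuffle each source, take a fixed number of
# examples, and concatenate the slices into one SFT mixture.
from datasets import load_dataset, concatenate_datasets

SOURCES_100K = ["Open-Orca/SlimOrca", "databricks/databricks-dolly-15k"]  # assumed repo IDs
SOURCES_1K = ["piqa", "sciq"]                                             # assumed repo IDs

def sample_source(repo_id: str, n: int, seed: int = 42):
    ds = load_dataset(repo_id, split="train")   # split name is an assumption
    ds = ds.shuffle(seed=seed)                  # the "random shuffle" step
    return ds.select(range(min(n, len(ds))))    # cap at the dataset's size

parts = [sample_source(r, 100_000) for r in SOURCES_100K]
parts += [sample_source(r, 1_000) for r in SOURCES_1K]

# Note: concatenate_datasets requires identical column schemas, so a real
# pipeline would first normalize each source to a shared prompt/response format.
sft_mix = concatenate_datasets(parts)
print(len(sft_mix))
```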
+ ## dpo dataset info : datasets_encomp_151k
+ We randomly selected data from each category of the training dataset and constructed the DPO (Direct Preference Optimization) dataset from model-generated sentences whose logits were lower than the mean.
+ * Unfortunately, this dataset cannot be made public.
+
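Read literally, the selection rule is: score each model-generated sentence (e.g., by its mean token log-probability), compute the mean score across generations, and treat the below-mean generations as the dispreferred side of the preference pairs. The following is a minimal sketch under that reading; the record fields and the use of a reference answer as the "chosen" response are assumptions, not the author's confirmed pipeline.

```python
# Illustrative sketch only: turn scored model generations into DPO-style
# (prompt, chosen, rejected) pairs using a below-the-mean score threshold.
from statistics import mean

def build_dpo_pairs(records):
    """records: dicts with 'prompt', 'reference', 'generation', 'score',
    where 'score' is e.g. the mean token log-probability of the generation."""
    threshold = mean(r["score"] for r in records)
    pairs = []
    for r in records:
        if r["score"] < threshold:              # below-mean generation
            pairs.append({
                "prompt": r["prompt"],
                "chosen": r["reference"],       # assumed: reference answer serves as "chosen"
                "rejected": r["generation"],    # dispreferred model output
            })
    return pairs
```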
  ## Evaluation
  ### [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
  | **model** | **average** | **arc** | **hellaswag** | **mmlu** | **truthfulQA** | **winogrande** | **GSM8k** |