Skywork-Reward-Data-Collection Collection Open-source preference datasets used to train the Skywork reward model series • 17 items • Updated Oct 12, 2024 • 12
MagpieLM Collection Aligning LMs with Fully Open Recipe + Synthetic Data Generated from Open-Source LMs. • 9 items • Updated 2 days ago • 15
Direct Preference Optimization Datasets Collection Datasets suitable for Direct Preference Optimization based on their column names • 1597 items • Updated Jul 10, 2024 • 2
Probably DPO datasets Collection A collection of datasets that probably support DPO • 146 items • Updated Jun 26, 2024 • 12
Datasets built with ⚗️ distilabel Collection This collection contains some datasets generated and/or labelled using https://github.com/argilla-io/distilabel • 8 items • Updated 26 days ago • 12
Translated (En->Ko) dataset Collection Datasets translated from English to Korean using llama3-instrucTrans-enko-8b • 19 items • Updated Nov 23, 2024 • 3
Synthetic (text) Dataset Generation Collection Papers about synthetic dataset generation • 9 items • Updated Jun 21, 2024 • 8
synthetic-data-generation-demos Collection A collection of demos for various approaches to synthetic data generation • 4 items • Updated Jun 25, 2024 • 14
Model Merging Collection Model merging is a very popular technique for LLMs nowadays. Here is a chronological list of papers in the space that will help you get started with it! • 30 items • Updated Jun 12, 2024 • 221
Korean Datasets I've released so far. Collection A collection of the Korean datasets I have uploaded so far. • 8 items • Updated May 24, 2024 • 16