---
library_name: transformers
tags:
- llm
- Large Language Model
- llama3
- ORPO
- ORPO β
license: apache-2.0
datasets:
- heegyu/hh-rlhf-ko
language:
- ko
---

# Model Card for llama3-8b-instruct-orpo-ko

## Model Summary

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct, trained with [odds ratio preference optimization (ORPO)](https://arxiv.org/abs/2403.07691) to perform NLP tasks in Korean.

## Model Details

### Model Description

- **Developed by:** Sungjoo Byun (Grace Byun)
- **Language(s) (NLP):** Korean
- **License:** Apache 2.0
- **Finetuned from model:** meta-llama/Meta-Llama-3-8B-Instruct

## Training Details

### Training Data

The model was trained on the [heegyu/hh-rlhf-ko](https://huggingface.co/datasets/heegyu/hh-rlhf-ko) dataset. We thank heegyu for sharing this valuable resource.

### Training Procedure

We applied ORPO β to Meta-Llama-3-8B-Instruct. Training was conducted on a single NVIDIA A100 GPU with 80 GB of memory.

## How to Get Started with the Model

Use the code below to load the tokenizer and the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "SungJoo/llama3-8b-instruct-orpo-ko"

# Download the tokenizer and the fine-tuned weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```

## Citations

Please cite the ORPO paper and our model as follows:

```bibtex
@misc{hong2024orpo,
      title={ORPO: Monolithic Preference Optimization without Reference Model},
      author={Jiwoo Hong and Noah Lee and James Thorne},
      year={2024},
      eprint={2403.07691},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

```bibtex
@misc{byun,
  author = {Sungjoo Byun},
  title = {llama3-8b-orpo-ko},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/SungJoo/llama3-8b-instruct-orpo-ko}}
}
```

## Contact

For questions or issues, please contact byunsj@snu.ac.kr.
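The Training Procedure section names ORPO without spelling out its objective. The sketch below illustrates it for a single preference pair in plain Python, following the formulation in Hong et al. (2024): the likelihood of a response is the exponentiated mean per-token log-probability, its odds are P / (1 - P), and the loss adds a weighted odds-ratio term, -log σ(log odds(chosen) - log odds(rejected)), to the usual negative log-likelihood of the chosen response. This is an illustrative sketch, not the training code used for this model. The helper names, the example log-probabilities, and the weight `lam` (λ in the paper) are hypothetical. In Hugging Face TRL, the corresponding weight is exposed as `beta` in `ORPOConfig`.

```python
import math
from typing import Sequence


def avg_log_prob(token_log_probs: Sequence[float]) -> float:
    """Length-normalized log-likelihood: (1/m) * sum_t log P(y_t | x, y_<t)."""
    return sum(token_log_probs) / len(token_log_probs)


def log_odds(avg_lp: float) -> float:
    """log odds(y|x) = log P - log(1 - P), where P = exp(avg_lp) and avg_lp < 0."""
    # log1p(-exp(a)) computes log(1 - exp(a)) accurately for a < 0
    return avg_lp - math.log1p(-math.exp(avg_lp))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(chosen: Sequence[float], rejected: Sequence[float], lam: float = 0.1) -> float:
    """ORPO loss for one preference pair: L_SFT + lam * L_OR (hypothetical helper).

    `chosen` and `rejected` hold per-token log-probabilities of the preferred
    and dispreferred responses under the model being trained.
    """
    lp_w, lp_l = avg_log_prob(chosen), avg_log_prob(rejected)
    # SFT term: negative log-likelihood of the chosen response, averaged over tokens
    l_sft = -lp_w
    # Odds-ratio term: penalizes the model when the rejected response's odds
    # approach or exceed those of the chosen response
    l_or = -log_sigmoid(log_odds(lp_w) - log_odds(lp_l))
    return l_sft + lam * l_or


# Hypothetical per-token log-probabilities for a single preference pair
chosen = [-0.2, -0.1, -0.3]
rejected = [-1.2, -0.9, -1.5]
print(orpo_loss(chosen, rejected, lam=0.1))
```

Because no reference model appears anywhere in the loss, ORPO needs only the policy's own log-probabilities, which is the property the paper's title ("without Reference Model") refers to. Swapping `chosen` and `rejected` in the example raises the loss, since the odds-ratio term then pushes against the model's current preference.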