Model Card for rshacter/rshacter-llama-3.2-1B-instruct

ORPO-Tuned Llama-3.2-1B-Instruct

Model Details

  • This model is a fine-tuned version of the meta-llama/Llama-3.2-1B-Instruct base model, adapted using the ORPO (Odds Ratio Preference Optimization) technique.
  • Base Model: meta-llama/Llama-3.2-1B-Instruct, a 1-billion-parameter instruction-following language model.
  • Fine-Tuning Technique: ORPO, which combines supervised fine-tuning with preference optimization in a single objective.
  • Training Data: mlabonne/orpo-dpo-mix-40k, a dataset of 44,245 examples containing prompts, chosen answers, and rejected answers.
  • Purpose: The model is designed to generate responses that are better aligned with human preferences while maintaining the general knowledge and capabilities of the base Llama 3.2 model.
  • Efficient Fine-Tuning: LoRA (Low-Rank Adaptation) was used for efficient adaptation, allowing faster training and smaller storage requirements.
  • Capabilities: The model follows instructions and generates responses that are more in line with human preferences than those of the base model.
  • Evaluation: The model's performance was evaluated on the HellaSwag benchmark.
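
As a quick start, a minimal inference sketch is shown below. It assumes the model is loaded from the Hub under the id rshacter/rshacter-llama-3.2-1B-instruct (the repository this card accompanies) and uses the standard transformers chat-template workflow; adjust the id, device placement, and generation settings as needed.

```python
# Minimal inference sketch (assumed Hub id; adjust if you use a local path).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rshacter/rshacter-llama-3.2-1B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Llama-3.2 Instruct models expect a chat template, so build the prompt from messages.
messages = [{"role": "user", "content": "Give me three tips for writing clear emails."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```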

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

Model Sources [optional]

  • Course assignment: https://uplimit.com/course/open-source-llms/session/session_clu1q3j6f016d128r2zxe3uyj/assignment/assignment_clyvnyyjh019h199337oef4ur
  • Notebook: https://uplimit.com/ugc-assets/course/course_clmz6fh2a00aa12bqdtjv6ygs/assets/1728565337395-85hdx93s03d0v9bd8j1nnxfjylyty2/uplimitopensourcellmsoctoberweekone.ipynb

Uses

Hands-on learning: Finetuning LLMs

Direct Use

This model was built as a learning exercise for the Introduction to Finetuning LLMs course.

Downstream Use [optional]

This model is designed for tasks requiring improved alignment with human preferences, such as:

  • Chatbots
  • Question-answering systems
  • General text generation with enhanced preference alignment

Out-of-Scope Use

This model should not yet be used in production; further fine-tuning and evaluation are required.

Bias, Risks, and Limitations

  • Performance may vary on tasks outside the training distribution
  • May inherit biases present in the base model and training data
  • Limited to 1B parameters, which may impact performance on complex tasks

Recommendations

  • Users should be aware of potential biases in model outputs
  • Not suitable for critical decision-making without human oversight
  • May generate plausible-sounding but incorrect information

Training Details

Training Data

The model was trained on the mlabonne/orpo-dpo-mix-40k dataset.

This dataset is designed for ORPO (Odds Ratio Preference Optimization) or DPO (Direct Preference Optimization) training of language models.

  • It contains 44,245 examples in the training split.
  • Includes prompts, chosen answers, and rejected answers for each sample.
  • Combines various high-quality DPO datasets.
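
To get a feel for the data, the snippet below loads the dataset and prints one example. The prompt/chosen/rejected structure follows the description above; the exact column names should be verified against the dataset itself.

```python
# Quick inspection of the training data; column names are assumed from the
# prompt/chosen/rejected structure described above and should be verified.
from datasets import load_dataset

dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
print(len(dataset))        # expected to be around 44,245 examples
example = dataset[0]
print(example.keys())      # check the actual column names
print(example["chosen"])   # preferred response
print(example["rejected"]) # dispreferred response
```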

Training Procedure

This model was fine-tuned using the ORPO (Odds Ratio Preference Optimization) technique on the meta-llama/Llama-3.2-1B-Instruct base model.

  • Base Model: meta-llama/Llama-3.2-1B-Instruct
  • Training Technique: ORPO (Odds Ratio Preference Optimization)
  • Efficient Fine-Tuning Method: LoRA (Low-Rank Adaptation)
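
In rough terms, ORPO keeps the ordinary supervised loss on the chosen response and adds an odds-ratio term that pushes the chosen response's odds above the rejected response's. The snippet below is a conceptual sketch of that objective, not the actual training code; the function and variable names are illustrative, and TRL exposes the weighting factor as `beta`.

```python
# Conceptual ORPO objective (after Hong et al., 2024); illustrative names only.
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen, logp_rejected, nll_chosen, lam=0.1):
    # logp_* are length-normalized log-probabilities of the chosen/rejected
    # responses under the model; nll_chosen is the usual SFT loss on the chosen one.
    log_odds_chosen = logp_chosen - torch.log1p(-torch.exp(logp_chosen))
    log_odds_rejected = logp_rejected - torch.log1p(-torch.exp(logp_rejected))
    # Odds-ratio term: reward a larger gap between chosen and rejected odds.
    odds_ratio_term = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return nll_chosen + lam * odds_ratio_term
```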

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Learning Rate: 2e-5
  • Batch Size: 4
  • Gradient Accumulation Steps: 4
  • Training Steps: 500
  • Warmup Steps: 20
  • LoRA Rank: 16
  • LoRA Alpha: 32
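
For reference, a hedged sketch of how these hyperparameters might be wired into TRL's ORPOTrainer with a PEFT LoRA adapter is shown below. It is illustrative rather than the exact training script: argument names vary across TRL/PEFT versions, and details such as the LoRA target modules and the ORPO beta are not recorded in this card.

```python
# Hedged training-setup sketch; not the exact script used for this model.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

# LoRA rank/alpha from the table above; target modules are not recorded in the
# card, so PEFT's defaults for Llama-style models are used here.
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

args = ORPOConfig(
    output_dir="llama-3.2-1b-orpo",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    max_steps=500,
    warmup_steps=20,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions name this argument `tokenizer`
    peft_config=peft_config,
)
trainer.train()
```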

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

The model was evaluated on the HellaSwag benchmark. Results:

Tasks      Version  Filter  n-shot  Metric       Value    Stderr
hellaswag  1        none    0       acc ↑        0.4516   ± 0.0050
                    none    0       acc_norm ↑   0.6139   ± 0.0049
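
These numbers follow the output format of EleutherAI's lm-evaluation-harness. A hedged sketch of how they could be reproduced with its Python API is shown below; the harness version and exact options used for this card are not recorded, so treat the call as illustrative.

```python
# Illustrative re-run of the HellaSwag evaluation with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=rshacter/rshacter-llama-3.2-1B-instruct",
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"]["hellaswag"])  # expect acc / acc_norm near the table above
```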

Interpretation:

  • Performance Level: The model achieves a raw accuracy of 45.16% and a normalized accuracy of 61.39% on the HellaSwag task.
  • Confidence: The small standard errors (about 0.5% for both metrics) indicate that these results are fairly precise.
  • Improvement over Random: HellaSwag has four answer choices per question, so a random baseline would achieve 25% accuracy. This model performs well above that baseline.
  • Normalized vs. Raw Accuracy: acc_norm normalizes each answer choice's log-likelihood by its length, reducing the bias toward shorter completions; the higher normalized accuracy (61.39% vs. 45.16%) indicates the model looks stronger once that length bias is removed.
  • Room for Improvement: While the performance is well above random, there's still significant room for improvement to reach human-level performance (which is typically above 95% on HellaSwag).

Summary

  • Base Model: meta-llama/Llama-3.2-1B-Instruct
  • Model Type: Causal Language Model
  • Language: English

Intended Use

This model is designed for tasks requiring improved alignment with human preferences, such as:
  • Chatbots
  • Question-answering systems
  • General text generation with enhanced preference alignment

Training Data

  • Dataset: mlabonne/orpo-dpo-mix-40k
  • Size: 44,245 examples
  • Content: Prompts, chosen answers, and rejected answers

Task: HellaSwag

  • This is a benchmark task designed to evaluate a model's commonsense reasoning and ability to complete scenarios logically.
  • No specific filtering was applied to the test set.
  • The evaluation was done in a zero-shot setting, where the model didn't receive any examples before making predictions.

  • Metrics: acc (Accuracy) = 0.4516 (45.16%), Stderr ± 0.0050 (0.50%); acc_norm (Normalized Accuracy) = 0.6139 (61.39%), Stderr ± 0.0049 (0.49%)

Environmental Impact

  • Hardware Type: A100
  • Hours used: Not recorded
  • Cloud Provider: Google Colab
  • Compute Region: Sacramento, CA, US
  • Framework: PyTorch

Technical Specifications [optional]

Hardware: A100 GPU

Model Card Author

Ruth Shacterman

Model Card Contact

[More Information Needed]

Model Size

1.24B parameters (Safetensors format, F32 tensor type)
