metadata
license: apache-2.0
datasets:
  - BertilBraun/competency-extraction-dpo-v2
base_model:
  - mistralai/Mixtral-8x7B-Instruct-v0.1
library_name: transformers
language:
  - en
pipeline_tag: text-generation
tags:
  - extraction

Fine-Tuning Mixtral-8x7B-Instruct-v0.1 for Competence Extraction

This document gives an overview of the fine-tuning process for the competence extraction task, implemented in src/finetuning/ of github.com/BertilBraun/Master-Thesis. The process involves creating a custom synthetic dataset, training the model with Direct Preference Optimization (DPO), and evaluating the model's performance with both automated metrics and expert verification.

Competence Extraction Task

The objective is to extract detailed competency profiles from textual data, such as abstracts or documents. These profiles represent the skills and knowledge areas related to a specific domain.

Format of the Profiles

The profiles follow a standardized format with two parts:

  • Domain: The main area of expertise.
  • Competencies: A list of skills or knowledge areas with accompanying descriptions.

Example profile:

Domain: "Data Science"

Competencies:
- Machine Learning: Advanced knowledge of fine-tuning and training...
- Statistical Analysis: Mathematical modeling etc...
- Data Visualization: Creation of visualizations using Matplotlib and Python...
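
To work with profiles programmatically, the layout above can be read into a small data structure. The following is a minimal sketch, not the parser from the repository: the names `Competency`, `Profile`, and `parse_profile` are hypothetical, and it assumes every competency sits on its own line in the form `- name: description`.

```python
from dataclasses import dataclass, field


@dataclass
class Competency:
    name: str
    description: str


@dataclass
class Profile:
    domain: str
    competencies: list[Competency] = field(default_factory=list)


def parse_profile(text: str) -> Profile:
    """Parse the 'Domain: ...' / '- name: description' layout (assumed format)."""
    domain = ""
    competencies: list[Competency] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Domain:"):
            # Drop the label and any surrounding quotes.
            domain = line[len("Domain:"):].strip().strip('"')
        elif line.startswith("- ") and ":" in line:
            # Split on the first colon only, so descriptions may contain colons.
            name, description = line[2:].split(":", 1)
            competencies.append(Competency(name.strip(), description.strip()))
    return Profile(domain, competencies)


example = '''Domain: "Data Science"

Competencies:
- Machine Learning: Advanced knowledge of fine-tuning and training
- Statistical Analysis: Mathematical modeling
'''
profile = parse_profile(example)
print(profile.domain, [c.name for c in profile.competencies])
```

Lines that match neither pattern, such as the `Competencies:` label, are skipped, which keeps the parser tolerant of extra text around a generated profile.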

Synthetic Custom Dataset

To train the model, a synthetic dataset is generated with the following components:

  • Abstracts: Collections of textual data related to various domains.
  • Generated Profiles: Competency profiles created based on the abstracts.
  • Preference Samples: Pairs of profiles with annotations indicating which profile better represents the competencies in the given abstracts.

This dataset simulates real-world data and provides the model with diverse examples to learn from.
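
One preference sample can be pictured as a record that holds the abstracts, two candidate profiles, and the annotation. The sketch below is illustrative only: `PreferenceSample`, `to_dpo_record`, and the instruction text are assumptions, and the actual column names of the published dataset may differ. The `prompt` / `chosen` / `rejected` layout is the one commonly used for preference datasets.

```python
from dataclasses import dataclass


@dataclass
class PreferenceSample:
    abstracts: list[str]
    profile_a: str
    profile_b: str
    preferred: str  # "a" or "b" (hypothetical annotation encoding)


def to_dpo_record(sample: PreferenceSample) -> dict[str, str]:
    """Turn one annotated pair into a prompt/chosen/rejected record."""
    if sample.preferred not in ("a", "b"):
        raise ValueError(f"unexpected annotation: {sample.preferred!r}")
    # Hypothetical instruction text; the real prompt is not given in this document.
    prompt = (
        "Extract a competency profile from the following abstracts:\n\n"
        + "\n\n".join(sample.abstracts)
    )
    if sample.preferred == "a":
        chosen, rejected = sample.profile_a, sample.profile_b
    else:
        chosen, rejected = sample.profile_b, sample.profile_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


sample = PreferenceSample(
    abstracts=["We study preference optimization for language models."],
    profile_a='Domain: "Cooking"',
    profile_b='Domain: "Machine Learning"',
    preferred="b",
)
record = to_dpo_record(sample)
print(record["chosen"])
```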

Training with Direct Preference Optimization (DPO)

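The model is fine-tuned using Direct Preference Optimization (DPO), which trains directly on preference judgments between pairs of outputs. For each prompt, the model learns to assign a higher likelihood to the preferred output than to the rejected one, measured relative to a frozen reference model, without training a separate reward model.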
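
To make the objective concrete, here is a minimal sketch of the DPO loss for a single preference pair, written in plain Python. It assumes the summed log-probabilities of each response under the policy and the reference model are already available; the function name `dpo_loss` and the default `beta=0.1` are illustrative choices, not values taken from this project.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not the project's setting
) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * (chosen_ratio - rejected_ratio)))."""
    # Implicit rewards: how much more likely each response is under the
    # policy than under the frozen reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: no preference learned yet.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy has shifted probability toward the chosen response: lower loss.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

When the policy equals the reference model the margin is zero and the loss is log 2; it falls as the policy raises the likelihood of the chosen response relative to the rejected one.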

Training Steps

  1. Data Preparation: Format the synthetic dataset into prompts and responses suitable for DPO.
  2. Model Configuration: Initialize the base model and configure training parameters, such as learning rate and batch size.
  3. Fine-Tuning: Train the model using the DPO algorithm to prefer outputs that better match the desired competencies.
  4. Evaluation: Assess the model's performance on a validation set to monitor improvement.

LLM as Evaluator

An auxiliary Large Language Model (LLM) is used to evaluate the generated profiles. The LLM assesses the quality and relevance of profiles, providing an automated way to generate preference judgments for training.
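
A rough sketch of how an evaluator's verdict could be turned into a preference pair follows. The evaluator is modeled as a plain callable so the example runs without any model; `judge_pair` and `toy_judge` are hypothetical names, and `toy_judge` is a word-overlap stand-in for the LLM call, not the evaluation prompt used in the project.

```python
from typing import Callable, Optional

# (abstracts, first_profile, second_profile) -> "1" or "2"
Judge = Callable[[str, str, str], str]


def judge_pair(
    judge: Judge, abstracts: str, profile_a: str, profile_b: str
) -> Optional[dict[str, str]]:
    """Ask the evaluator which profile is better and build a preference pair."""
    verdict = judge(abstracts, profile_a, profile_b)
    if verdict == "1":
        chosen, rejected = profile_a, profile_b
    elif verdict == "2":
        chosen, rejected = profile_b, profile_a
    else:
        # Unparseable verdict: skip the pair rather than guess.
        return None
    return {"prompt": abstracts, "chosen": chosen, "rejected": rejected}


def toy_judge(abstracts: str, first: str, second: str) -> str:
    """Stand-in for the LLM: prefers the profile sharing more words with the abstracts."""
    words = set(abstracts.lower().split())

    def overlap(profile: str) -> int:
        return len(words & set(profile.lower().split()))

    return "1" if overlap(first) >= overlap(second) else "2"


pair = judge_pair(
    toy_judge,
    "We fine-tune language models with preference data",
    "Cooking: recipes",
    "Language models: fine-tune with preference data",
)
print(pair["chosen"])
```

Returning `None` for an unrecognized verdict keeps malformed evaluator outputs out of the training data; a real evaluator would replace `toy_judge` with a call to the auxiliary LLM and a parser for its answer.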

Expert Verification

Human experts review a subset of the model's outputs to verify the accuracy and quality of the extracted competencies. This step ensures that the fine-tuned model aligns with domain-specific expectations and provides reliable results.

Performance Metrics

  • Preference Over Base Model: The fine-tuned model achieves an 80% preference rate over the base model, meaning its profile is preferred in 80% of pairwise comparisons, which indicates a substantial improvement in extracting relevant competencies.
  • Comparison with Larger Models: While improved, the model's performance still lags behind larger models in terms of profile quality, suggesting room for further enhancement.
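
A preference rate like the one above can be computed from a list of pairwise verdicts. The snippet below is a minimal sketch with made-up verdicts chosen only to illustrate the arithmetic; the labels `"finetuned"` and `"base"` and the decision to ignore ties are assumptions, not details from the actual evaluation.

```python
def preference_rate(verdicts: list[str]) -> float:
    """Share of decided comparisons won by the fine-tuned model (ties ignored)."""
    wins = verdicts.count("finetuned")
    losses = verdicts.count("base")
    decided = wins + losses
    if decided == 0:
        raise ValueError("no decided comparisons")
    return wins / decided


# Illustrative data only, not the project's results.
verdicts = ["finetuned"] * 8 + ["base"] * 2
rate = preference_rate(verdicts)
print(f"{rate:.0%}")
```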

Conclusion

The fine-tuning process successfully enhances the model's ability to extract competencies from textual data. Combining synthetic datasets, DPO training, and evaluations using both LLMs and expert verification contributes to the model's improved performance.