LLaMA-3.1-8B-AGNews-SFT

This model is a fine-tuned version of meta-llama/Llama-3.1-8B on the ag_news dataset.

Model description

LLaMA-3.1-8B-AGNews-SFT is a transformer-based language model obtained by supervised fine-tuning of meta-llama/Llama-3.1-8B on the ag_news_train_num dataset. The base model was pretrained on a large text corpus; this checkpoint performs text classification, assigning text to one of four categories: World, Sports, Business, and Sci/Tech.

Intended uses & limitations

How to use

You can use this model to classify text into one of four categories: World, Sports, Business, and Sci/Tech. Load it with the transformers library and pass in the text you want to classify; the model returns the predicted category, as in the sketch below.
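The snippet below is a minimal sketch of this usage. It assumes the fine-tuned model answers a classification prompt by generating the category name; the exact prompt template used during fine-tuning is not documented in this card, so the prompt shown here is illustrative.

```python
# Minimal sketch: classify a news snippet by prompting the fine-tuned model.
# The prompt template is an assumption; the exact format used during SFT
# is not documented in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Word2Li/LLaMA-3.1-8B-AGNews-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

text = "Wall St. slides as oil prices climb and tech earnings disappoint."
prompt = (
    "Classify the following news article into one of four categories: "
    "World, Sports, Business, or Sci/Tech.\n\n"
    f"Article: {text}\nCategory:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)
prediction = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(prediction.strip())  # e.g. "Business"
```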

Limitations and bias

The model may not perform well on text outside the domain of its training data, and its predictions may reflect biases present in that data. Be aware of these limitations and evaluate the model on your specific use case before relying on it.

Training and evaluation data

Dataset

The model was fine-tuned on the ag_news_train_num dataset, a subset of the AG News dataset, which is drawn from the AG's corpus of news articles on the web. The ag_news_train_num dataset contains 120,000 articles, with 30,000 in each of the four categories: World, Sports, Business, and Sci/Tech.
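For reference, the public ag_news dataset on the Hugging Face Hub can be inspected as below; how the ag_news_train_num subset was derived from it is not described in this card, so the snippet only shows the parent dataset.

```python
# Sketch: inspect the public ag_news dataset, the parent of ag_news_train_num.
from datasets import load_dataset

ds = load_dataset("ag_news", split="train")
print(ds.num_rows)                          # 120000 examples
print(ds.features["label"].names)           # ['World', 'Sports', 'Business', 'Sci/Tech']
example = ds[0]
print(example["text"][:80], "->", ds.features["label"].int2str(example["label"]))
```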

Data preprocessing

The data was preprocessed with the transformers tokenizer for the LLaMA-3.1-8B model: the text was split into subword tokens, and the resulting token IDs were used as input features for fine-tuning on the classification task.
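As an illustration of this step, the sketch below tokenizes one training-style example with the base model's tokenizer; the "Article ... Category ..." layout is an assumed SFT format, not one documented in this card.

```python
# Sketch: subword tokenization with the LLaMA-3.1-8B tokenizer.
# The "Article ... Category ..." formatting is an assumed SFT layout.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
example = "Article: Oil prices rise as OPEC trims output.\nCategory: Business"
encoded = tokenizer(example, truncation=True, max_length=512)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"])[:12])
print(len(encoded["input_ids"]), "input tokens")
```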

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1.0
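For readers who want to reproduce a similar run, the values above map roughly onto transformers TrainingArguments as sketched below; the launcher, model and data wiring, and any SFT-framework specifics are omitted and assumed.

```python
# Sketch: the hyperparameters above expressed as transformers TrainingArguments.
# Only values listed in this card are mirrored; everything else is left at
# library defaults or is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-agnews-sft",
    learning_rate=2e-5,
    per_device_train_batch_size=4,    # train_batch_size
    per_device_eval_batch_size=8,     # eval_batch_size
    gradient_accumulation_steps=8,    # 4 x 8 GPUs x 8 steps = 256 effective batch
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    seed=42,
    bf16=True,                        # assumption: BF16 mixed-precision training
    optim="adamw_torch",              # Adam with betas=(0.9, 0.999), eps=1e-8 (defaults)
)
```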

Training results

Class                 Accuracy   Precision   Recall    F1 Score
World                 95.95%     96.87%      95.95%    96.40%
Sports                99.42%     99.00%      99.42%    99.21%
Business              91.53%     93.95%      91.53%    92.72%
Sci/Tech              94.84%     91.99%      94.84%    93.39%
Overall (Macro Avg)   95.43%     95.45%      95.43%    95.43%
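The per-class scores and the macro average above can be reproduced from raw predictions with a standard classification report; the sketch below uses placeholder labels, since the evaluation predictions themselves are not included in this card.

```python
# Sketch: computing per-class precision/recall/F1 and the macro average,
# as reported in the table above. y_true / y_pred are placeholder lists of
# reference and predicted category names.
from sklearn.metrics import classification_report

labels = ["World", "Sports", "Business", "Sci/Tech"]
y_true = ["World", "Sports", "Business", "Sci/Tech", "Business"]   # placeholder data
y_pred = ["World", "Sports", "Business", "Sci/Tech", "Sci/Tech"]   # placeholder data
print(classification_report(y_true, y_pred, labels=labels, digits=4))
```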

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.4.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.1