LLaMA-3.1-8B-AGNews-SFT
This model is a fine-tuned version of meta-llama/Llama-3.1-8B on the ag_news dataset.
Model description
LLaMA-3.1-8B is a transformer-based language model pretrained on a large corpus of text. This checkpoint was fine-tuned on the ag_news_train_num dataset for text classification and assigns a piece of text to one of four categories: World, Sports, Business, and Sci/Tech.
Intended uses & limitations
How to use
You can use this model to classify text into one of four categories: World, Sports, Business, and Sci/Tech. Load it with the transformers library and pass in the text you want to classify; the model returns the predicted category.
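A minimal usage sketch follows. It assumes the checkpoint is loaded as a causal language model that generates the category name after a prompt; the exact prompt template used during fine-tuning is not documented in this card, so the one below is illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Word2Li/LLaMA-3.1-8B-AGNews-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative prompt; the template used during fine-tuning may differ.
prompt = (
    "Classify the following news article into one of: World, Sports, Business, Sci/Tech.\n\n"
    "Article: Wall Street rallied after the central bank held interest rates steady.\n"
    "Category:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Decode only the newly generated tokens (the predicted category).
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```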
Limitations and bias
The model may not perform well on text outside the domain of its training data, and its predictions may reflect biases present in that data. Keep these limitations in mind and evaluate the model on your specific use case before relying on it.
Training and evaluation data
Dataset
The model was fine-tuned on the ag_news_train_num dataset, a subset of the AG News dataset. AG News is a collection of news articles drawn from the AG corpus of web news. The ag_news_train_num dataset contains 120,000 news articles from AG News, with 30,000 articles in each of the four categories: World, Sports, Business, and Sci/Tech.
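For reference, the parent AG News dataset can be inspected with the datasets library. This is only a sketch of the public dataset on the Hub; the exact ag_news_train_num subset used for fine-tuning is not published in this card.

```python
from datasets import load_dataset

# Public AG News dataset on the Hugging Face Hub; ag_news_train_num is described above as a subset of it.
train = load_dataset("ag_news", split="train")
print(train.num_rows)                 # 120000 articles, 30000 per class
print(train.features["label"].names)  # ['World', 'Sports', 'Business', 'Sci/Tech']
```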
Data preprocessing
The data was preprocessed with the transformers tokenizer for the LLaMA-3.1-8B model: each text was split into subword tokens, and the resulting tokens were converted into input features used to train the model on the classification task.
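A minimal preprocessing sketch, assuming plain-text inputs and the base model's tokenizer; the truncation length and any prompt/label formatting applied during fine-tuning are assumptions, as they are not specified in this card.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

# Hypothetical article text; max_length is an assumption, not a documented training setting.
text = "Oil prices climb as supply concerns mount."
features = tokenizer(text, truncation=True, max_length=512)
print(features["input_ids"][:10])  # subword token ids passed to the model
```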
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 1.0
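As a rough sketch, the hyperparameters above map onto transformers TrainingArguments as follows. The output path and precision are assumptions, and the 8-GPU distributed launch is handled outside this object (e.g. via torchrun).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-agnews-sft",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # 4 per device * 8 GPUs * 8 steps = 256 total train batch
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    seed=42,
    bf16=True,                       # assumption; the precision used is not stated in this card
)
```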
Training results
| Class | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| World | 95.95% | 96.87% | 95.95% | 96.40% |
| Sports | 99.42% | 99.00% | 99.42% | 99.21% |
| Business | 91.53% | 93.95% | 91.53% | 92.72% |
| Sci/Tech | 94.84% | 91.99% | 94.84% | 93.39% |
| Overall (macro avg) | 95.43% | 95.45% | 95.43% | 95.43% |
Framework versions
- Transformers 4.45.2
- PyTorch 2.4.1+cu121
- Datasets 2.21.0
- Tokenizers 0.20.1
Model tree for Word2Li/LLaMA-3.1-8B-AGNews-SFT
- Base model: meta-llama/Llama-3.1-8B