LLaMA-3.1-8B-AGNews-SFT
This model is a fine-tuned version of meta-llama/Llama-3.1-8B on the ag_news dataset.
Model description
LLaMA-3.1-8B is a transformer-based language model pretrained on a large corpus of text. This checkpoint was fine-tuned on the ag_news_train_num dataset for text classification and assigns a piece of text to one of four categories: World, Sports, Business, and Sci/Tech.
Intended uses & limitations
How to use
You can use this model to classify text into one of four categories: World, Sports, Business, and Sci/Tech. Load it with the transformers library and pass in the text you want to classify; the model returns the predicted category.
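A minimal usage sketch follows. It assumes the checkpoint is loaded as a causal language model that generates the category name after a prompt; the exact prompt template used during fine-tuning is not documented in this card, so the one below is illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Word2Li/LLaMA-3.1-8B-AGNews-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative prompt; the template used during fine-tuning may differ.
prompt = (
    "Classify the following news article into one of: World, Sports, Business, Sci/Tech.\n\n"
    "Article: Wall Street rallied after the central bank held interest rates steady.\n"
    "Category:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Decode only the newly generated tokens (the predicted category).
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```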
Limitations and bias
The model may not perform well on text outside the domain of its training data, and its predictions may reflect biases present in that data. Keep these limitations in mind and evaluate the model on your specific use case before relying on it.
Training and evaluation data
Dataset
The model was fine-tuned on the ag_news_train_num dataset, a subset of the AG News dataset. AG News is a collection of news articles drawn from the AG corpus of web news. The ag_news_train_num dataset contains 120,000 news articles from AG News, with 30,000 articles in each of the four categories: World, Sports, Business, and Sci/Tech.
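For reference, the parent AG News dataset can be inspected with the datasets library. This is only a sketch of the public dataset on the Hub; the exact ag_news_train_num subset used for fine-tuning is not published in this card.

```python
from datasets import load_dataset

# Public AG News dataset on the Hugging Face Hub; ag_news_train_num is described above as a subset of it.
train = load_dataset("ag_news", split="train")
print(train.num_rows)                 # 120000 articles, 30000 per class
print(train.features["label"].names)  # ['World', 'Sports', 'Business', 'Sci/Tech']
```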
Data preprocessing
The data was preprocessed with the transformers tokenizer for the LLaMA-3.1-8B model: each text was split into subword tokens, and the resulting tokens were converted into input features used to train the model on the classification task.
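A minimal preprocessing sketch, assuming plain-text inputs and the base model's tokenizer; the truncation length and any prompt/label formatting applied during fine-tuning are assumptions, as they are not specified in this card.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

# Hypothetical article text; max_length is an assumption, not a documented training setting.
text = "Oil prices climb as supply concerns mount."
features = tokenizer(text, truncation=True, max_length=512)
print(features["input_ids"][:10])  # subword token ids passed to the model
```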
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 1.0
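As a rough sketch, the hyperparameters above map onto transformers TrainingArguments as follows. The output path and precision are assumptions, and the 8-GPU distributed launch is handled outside this object (e.g. via torchrun).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-agnews-sft",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # 4 per device * 8 GPUs * 8 steps = 256 total train batch
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    seed=42,
    bf16=True,                       # assumption; the precision used is not stated in this card
)
```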
Training results
| Class | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| World | 95.95% | 96.87% | 95.95% | 96.40% |
| Sports | 99.42% | 99.00% | 99.42% | 99.21% |
| Business | 91.53% | 93.95% | 91.53% | 92.72% |
| Sci/Tech | 94.84% | 91.99% | 94.84% | 93.39% |
| Overall (macro avg) | 95.43% | 95.45% | 95.43% | 95.43% |
Framework versions
- Transformers 4.45.2
- PyTorch 2.4.1+cu121
- Datasets 2.21.0
- Tokenizers 0.20.1
Model tree for Word2Li/LLaMA-3.1-8B-AGNews-SFT
- Base model: meta-llama/Llama-3.1-8B