---
library_name: peft
license: apache-2.0
pipeline_tag: text-classification
tags:
- hatespeech
- hatecot
- cot
- llama
---

## Update
* The paper has been accepted to EMNLP 2024 Findings:
https://aclanthology.org/2024.findings-emnlp.343/

## Introduction
This is the LoRA adapter for Llama-7B introduced in the paper
*HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models*.
The base model is instruction-finetuned on 52,000 samples that include augmented human annotations to produce
legible explanations based on predefined criteria in the **provided definition**.


To use the adapter, load it together with the original Llama model (the quantization configuration is detailed under *Training procedure*).
For instructions on loading PEFT models, see https://huggingface.co/docs/transformers/main/en/peft
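
The loading step can be sketched as follows. This is a minimal sketch, not the authors' script: it assumes `transformers`, `peft`, and `accelerate` are installed, and `load_model`, `generate_text`, and both repository ids passed to them are hypothetical placeholders that must be replaced with the Llama checkpoint you have access to and with this adapter's repository.

```python
def load_model(base_model_id, adapter_id, quantization_config=None):
    """Load the base Llama model and attach the LoRA adapter on top of it.

    Both ids are placeholders supplied by the caller. The heavy libraries are
    imported inside the function so that importing this module stays cheap.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        quantization_config=quantization_config,  # optional, e.g. an 8-bit config
        device_map="auto",  # relies on the accelerate package
    )
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer


def generate_text(model, tokenizer, prompt, max_new_tokens=256):
    """Generate a continuation and return only the newly produced text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the model's answer is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

The value of `max_new_tokens` is an arbitrary choice for illustration; the card does not state a recommended generation length.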

These adapters can also be finetuned on a new set of data. See the article for more details.

## Usage
Use the following template to prompt the model:
```
### Instruction
Perform this task by considering the following Definitions.
Based on the message, label the input as only one of the following categories:
[Class 1], [Class 2], ..., or [Class N].
Provide a brief paragraph to explain step-by-step why the post should be classsified
with the provided Label based on the given Definitions. If this post targets a group or
entity relevant to the definition of the specified Label, explain who this target is and how
that leads to that Label.
Append the string '<END>' to the end of your response. Provide your response in the following format:
EXPLANATION: [text]
LABEL:[text] <END>
### Definitions:
[Class 1]: [Definition 1]
[Class 2]: [Definition 2]
...
[Class N]: [Definition N]
### Input
{post}
### Response:
```
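
Filling in the template by hand is error-prone, so a small helper can help. The sketch below is illustrative only: `build_prompt` and `parse_response` are hypothetical names, the bracketed items in the template are assumed to be placeholders rather than literal brackets, and the trailing newline after `### Response:` is a guess. The instruction text is copied verbatim from the template, spelling included, on the assumption that the adapter was trained on exactly that wording.

```python
import re

# Instruction text copied verbatim from the template (including its spelling).
TEMPLATE = """### Instruction
Perform this task by considering the following Definitions.
Based on the message, label the input as only one of the following categories:
{class_list}.
Provide a brief paragraph to explain step-by-step why the post should be classsified
with the provided Label based on the given Definitions. If this post targets a group or
entity relevant to the definition of the specified Label, explain who this target is and how
that leads to that Label.
Append the string '<END>' to the end of your response. Provide your response in the following format:
EXPLANATION: [text]
LABEL:[text] <END>
### Definitions:
{definition_lines}
### Input
{post}
### Response:
"""


def build_prompt(definitions: dict[str, str], post: str) -> str:
    """Fill the template from a {label: definition} mapping and a post."""
    labels = list(definitions)
    if len(labels) < 2:
        raise ValueError("at least two classes are required")
    if len(labels) == 2:
        class_list = f"{labels[0]} or {labels[1]}"
    else:
        class_list = ", ".join(labels[:-1]) + f", or {labels[-1]}"
    definition_lines = "\n".join(f"{k}: {v}" for k, v in definitions.items())
    return TEMPLATE.format(
        class_list=class_list, definition_lines=definition_lines, post=post
    )


def parse_response(text: str) -> tuple[str, str]:
    """Split a model response into (explanation, label), ignoring text after <END>."""
    body = text.split("<END>", 1)[0]
    match = re.search(r"EXPLANATION:\s*(.*?)\s*LABEL:\s*(.*)", body, flags=re.DOTALL)
    if match is None:
        raise ValueError("response does not follow the expected format")
    return match.group(1).strip(), match.group(2).strip()
```

The class names and definitions are supplied by the caller; none are prescribed here. `parse_response` raises `ValueError` when the output does not contain both fields, which makes malformed generations easy to count.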

## Citation
```bibtex
@article{nghiem2024hatecot,
  title={HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models},
  author={Nghiem, Huy and Daum{\'e} III, Hal},
  journal={arXiv preprint arXiv:2403.11456},
  year={2024}
}
```

## Original Model
Please visit the main repository to request access to the original model weights.

https://huggingface.co/meta-llama



## Training procedure


The following `bitsandbytes` quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: True
- load_in_4bit: False
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float16
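
The values listed above map onto keyword arguments of `transformers.BitsAndBytesConfig`. The following is a minimal sketch, assuming a recent `transformers` release with `bitsandbytes` installed; `QUANT_KWARGS` and `make_quant_config` are illustrative names and not part of the original training code. `quant_method` is left out on the assumption that the config class sets it itself.

```python
# Keyword arguments mirroring the quantization settings listed above.
QUANT_KWARGS = {
    "load_in_8bit": True,
    "load_in_4bit": False,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": False,
    "bnb_4bit_compute_dtype": "float16",  # dtype given by name, resolved by the config class
}


def make_quant_config():
    """Build the quantization config; transformers is imported only when called."""
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**QUANT_KWARGS)
```

The resulting object can be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained` when loading the base model. Since 8-bit loading is enabled, the `bnb_4bit_*` entries are carried along only to mirror the list.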

### Framework versions


- PEFT 0.5.0