|
--- |
|
language: |
|
- zh |
|
metrics: |
|
- accuracy |
|
- recall |
|
- precision |
|
library_name: transformers |
|
pipeline_tag: text-classification |
|
--- |
|
# Flames-scorer |
|
|
|
This is the specified scorer for the Flames benchmark, a highly adversarial Chinese benchmark for evaluating the value alignment of LLMs.
|
For more details, please refer to our [paper](https://arxiv.org/abs/2311.06899) and [GitHub repo](https://github.com/AIFlames/Flames/tree/main).
|
|
|
## Model Details |
|
* Developed by: Shanghai AI Lab and Fudan NLP Group. |
|
* Model type: We employ InternLM-chat-7b as the backbone and build a separate classifier for each dimension on top of it, then apply multi-task training to train the scorer.
|
* Language(s): Chinese |
|
* Paper: [FLAMES: Benchmarking Value Alignment of LLMs in Chinese](https://arxiv.org/abs/2311.06899) |
|
* Contact: For questions and comments about the model, please email [email protected]. |
|
|
|
## Usage |
|
|
|
The environment can be set up as: |
|
```shell |
|
$ pip install -r requirements.txt |
|
``` |
|
And you can use `infer.py` to evaluate your model: |
|
```shell |
|
python infer.py --data_path YOUR_DATA_FILE.jsonl |
|
``` |
|
|
|
The flames-scorer can be loaded by: |
|
```python |
|
from tokenization_internlm import InternLMTokenizer |
|
from modeling_internlm import InternLMForSequenceClassification |
|
|
|
tokenizer = InternLMTokenizer.from_pretrained("CaasiHUANG/flames-scorer", trust_remote_code=True) |
|
model = InternLMForSequenceClassification.from_pretrained("CaasiHUANG/flames-scorer", trust_remote_code=True)

```
|
|
|
|
|
|
|
Please note that: |
|
1. Ensure each entry in `YOUR_DATA_FILE.jsonl` includes the fields: "dimension", "prompt", and "response". |
|
2. The predicted score will be stored in the "predicted" field, and the output will be saved in the same directory as `YOUR_DATA_FILE.jsonl`. |
|
3. The accuracy of the Flames-scorer on out-of-distribution prompts (i.e., prompts not included in Flames-prompts) has not been evaluated; consequently, its predictions on such data may not be reliable.
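The input format described in note 1 can be sketched as follows. This is an illustrative example only: the field names come from the notes above, while the dimension value, prompt, and response texts are made up (consult the Flames repo for the actual dimension names used by the scorer).

```python
import json

# Each line of the JSONL file is one record with the three required fields.
# The values below are placeholders, not real Flames data.
entries = [
    {
        "dimension": "Fairness",  # illustrative dimension name
        "prompt": "An example user prompt.",
        "response": "An example model response to be scored.",
    },
]

with open("YOUR_DATA_FILE.jsonl", "w", encoding="utf-8") as f:
    for entry in entries:
        # ensure_ascii=False keeps Chinese text readable in the output file
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

After running `infer.py` on this file, each record would additionally carry a "predicted" field with the score, per note 2.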