File size: 1,680 Bytes
1f3fcd0 128a1cc 1f3fcd0 128a1cc 1f3fcd0 128a1cc |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
---
language:
- zh
license: apache-2.0
tags:
- bert
- NLU
- NLI
inference: true
widget:
- text: "今天心情不好[SEP]今天很开心"
---
# Erlangshen-Roberta-110M-Similarity, model (Chinese),one model of [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM).
We collect 20 paraphrace datasets in the Chinese domain for finetune, with a total of 2773880 samples. Our model is mainly based on [roberta](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large)
## Usage
```python
from transformers import BertForSequenceClassification
from transformers import BertTokenizer
import torch
tokenizer=BertTokenizer.from_pretrained('IDEA-CCNL/Erlangshen-Roberta-110M-Similarity')
model=BertForSequenceClassification.from_pretrained('IDEA-CCNL/Erlangshen-Roberta-110M-Similarity')
texta='今天的饭不好吃'
textb='今天心情不好'
output=model(torch.tensor([tokenizer.encode(texta,textb)]))
print(torch.nn.functional.softmax(output.logits,dim=-1))
```
## Scores on downstream chinese tasks(The dev datasets of BUSTM and AFQMC may exist in the train set)
| Model | BQ | BUSTM | AFQMC |
| :--------: | :-----: | :----: | :-----: |
| Erlangshen-Roberta-110M-Similarity | 85.41 | 95.18 | 81.72 |
| Erlangshen-Roberta-330M-Similarity | 86.21 | 99.29 | 93.89 |
| Erlangshen-MegatronBert-1.3B-Similarity | 86.31 | - | - |
## Citation
If you find the resource is useful, please cite the following website in your paper.
```
@misc{Fengshenbang-LM,
title={Fengshenbang-LM},
author={IDEA-CCNL},
year={2021},
howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
``` |