suolyer committed 09489d7 (parent: e54db0d)

Update README.md

README.md changed (+71 -0)

---
language:
- zh
license: apache-2.0

tags:
- bert
- NLU
- Sentiment
- Chinese

inference: false

widget:
- text: "今天心情不好"
---
# Erlangshen-Ubert-110M (Chinese), one of the models in [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)
We collected 70+ Chinese datasets for fine-tuning, totaling 1,065,069 samples. The model is based mainly on [MacBERT](https://huggingface.co/hfl/chinese-macbert-base).

Ubert is a solution we proposed for the [2022 AIWIN World Artificial Intelligence Innovation Competition](http://ailab.aiwin.org.cn/competitions/68#results), where it took first place on both the A and B leaderboards, scoring 20 percentage points higher than the officially provided baseline. Besides common extraction tasks such as entity recognition and event extraction, Ubert also handles classification tasks such as news classification and natural language inference.

## Usage
Install our fengshen framework. For now, we provide the following installation method:
```shell
git clone https://github.com/IDEA-CCNL/Fengshenbang-LM.git
cd Fengshenbang-LM
pip install --editable ./
```
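
After installing, you can quickly confirm that the `fengshen` package is visible to the Python environment you will run the example in. This is a minimal sketch using only the standard library; the function name `fengshen_installed` is our own and is not part of fengshen.

```python
import importlib.util


def fengshen_installed() -> bool:
    """Return True if the `fengshen` package can be found on the current Python path."""
    # find_spec returns None when no importable module or package of that name exists
    return importlib.util.find_spec("fengshen") is not None


if __name__ == "__main__":
    print("fengshen installed:", fengshen_installed())
```

If this reports `False`, the editable install most likely went into a different interpreter or virtual environment than the one you are using.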

Run the code below to get predictions in one step. You can freely change the example `text` and the `entity_type` values to extract, to try out Ubert's zero-shot performance.
```python
import argparse
from fengshen import UbertPiplines

# Build the argument parser and register Ubert's pipeline arguments
# ("TASK NAME" is used as the program name in the help output)
total_parser = argparse.ArgumentParser("TASK NAME")
total_parser = UbertPiplines.piplines_args(total_parser)
args = total_parser.parse_args()

# Each sample gives the task type, subtask type, input text and the
# candidate labels (here: the entity types to extract)
test_data = [
    {
        "task_type": "抽取任务",      # extraction task
        "subtask_type": "实体识别",   # entity recognition
        "text": "这也让很多业主据此认为,雅清苑是政府公务员挤对了国家的经适房政策。",
        "choices": [
            {"entity_type": "小区名字"},  # residential community name
            {"entity_type": "岗位职责"},  # job responsibilities
        ],
        "id": 0,
    }
]

# Load the model and print one prediction per input sample
model = UbertPiplines(args)
result = model.predict(test_data)
for line in result:
    print(line)
```
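
To try several sentences at once, the input list can be generated instead of written by hand. The sketch below reuses only the field names and values from the entity-recognition example above; the helper `make_ner_samples` is hypothetical and not part of fengshen, and other task types are deliberately not covered because their field values are not documented here.

```python
from typing import Dict, List


def make_ner_samples(texts: List[str], entity_types: List[str]) -> List[Dict]:
    """Build entity-recognition inputs in the same shape as `test_data` above.

    Hypothetical helper: one dict per text, every text sharing the same
    candidate entity types, with ids numbered from 0.
    """
    return [
        {
            "task_type": "抽取任务",      # extraction task
            "subtask_type": "实体识别",   # entity recognition
            "text": text,
            "choices": [{"entity_type": et} for et in entity_types],
            "id": i,
        }
        for i, text in enumerate(texts)
    ]


if __name__ == "__main__":
    samples = make_ner_samples(
        ["这也让很多业主据此认为,雅清苑是政府公务员挤对了国家的经适房政策。"],
        ["小区名字", "岗位职责"],
    )
    print(samples)
```

Assuming `model.predict` accepts any list in this shape, the result can be passed to it in place of `test_data`.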

## Scores on downstream Chinese tasks
| Model | ASAP-SENT | ASAP-ASPECT | ChnSentiCorp |
| :--------: | :-----: | :----: | :-----: |
| Erlangshen-Roberta-110M-Sentiment | 97.77 | 97.31 | 96.61 |
| Erlangshen-Roberta-330M-Sentiment | 97.9 | 97.51 | 96.66 |
| Erlangshen-MegatronBert-1.3B-Sentiment | 98.1 | 97.8 | 97 |

## Citation
If you find this resource useful, please cite the following website in your paper.
```
@misc{Fengshenbang-LM,
  title={Fengshenbang-LM},
  author={IDEA-CCNL},
  year={2021},
  howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
```