---
tags:
- flair
- token-classification
- sequence-tagger-model
language: en
datasets:
- conll2000
inference: false
---

## English Chunking in Flair (default model)

This is the standard phrase chunking model for English that ships with [Flair](https://github.com/flairNLP/flair/).

F1-Score: **96.48** (corrected CoNLL-2000)
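
CoNLL-2000 chunking is conventionally scored at the chunk level: a predicted chunk counts as correct only if both its boundaries and its type exactly match a gold chunk, and F1 is the harmonic mean of precision and recall over chunks. The following is a minimal sketch of that metric, assuming chunks are given as hypothetical `(start, end_exclusive, label)` tuples with 0-based token offsets. It is not the official CoNLL evaluation script (`conlleval`) or Flair's own evaluation code, and the gold and predicted chunks are toy values chosen to mirror the demo sentence.

```python
from typing import Iterable, Tuple

# A chunk is (start_token, end_token_exclusive, label); this layout is an assumption for illustration.
Chunk = Tuple[int, int, str]


def chunk_f1(gold: Iterable[Chunk], predicted: Iterable[Chunk]) -> float:
    """Chunk-level F1: a predicted chunk is a true positive only if its
    boundaries and its label exactly match a gold chunk."""
    gold_set = set(gold)
    pred_set = set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy gold chunks for "The happy man has been eating at the diner"
gold = [(0, 3, "NP"), (3, 6, "VP"), (6, 7, "PP"), (7, 9, "NP")]
# A toy prediction that gets the left boundary of the last NP wrong
predicted = [(0, 3, "NP"), (3, 6, "VP"), (6, 7, "PP"), (8, 9, "NP")]

print(chunk_f1(gold, predicted))  # 0.75
```

A chunk with the wrong boundary earns no partial credit: here 3 of 4 chunks match, so precision and recall are both 0.75.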

Predicts 10 tags:

| **tag** | **meaning** |
|---------|-------------|
| ADJP | adjectival phrase |
| ADVP | adverbial phrase |
| CONJP | conjunction phrase |
| INTJ | interjection |
| LST | list marker |
| NP | noun phrase |
| PP | prepositional phrase |
| PRT | particle |
| SBAR | subordinate clause |
| VP | verb phrase |
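
A sequence tagger assigns one tag per token, and chunks are recovered by grouping consecutive tokens. The CoNLL-2000 data encodes chunks with BIO tags: `B-NP` begins a noun phrase, `I-NP` continues it, and `O` marks a token outside any chunk. The sketch below decodes such a tag sequence into labeled chunks. It assumes plain BIO input, the helper name `bio_to_chunks` is hypothetical, and it is not Flair's own decoding code (Flair may use a different tag scheme internally). The tag sequence in the usage example is hand-written for illustration.

```python
from typing import List, Tuple


def bio_to_chunks(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group per-token BIO tags into (start, end_exclusive, label) chunks.

    An I- tag whose label differs from the open chunk (or that appears
    with no open chunk) starts a new chunk, a common lenient convention.
    """
    chunks = []
    start, label = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, tag_label = "O", None
        else:
            prefix, tag_label = tag.split("-", 1)
        # close the open chunk if this tag does not continue it
        if label is not None and not (prefix == "I" and tag_label == label):
            chunks.append((start, i, label))
            start, label = None, None
        # open a new chunk on B-, or on an I- that could not continue one
        if prefix in ("B", "I") and label is None:
            start, label = i, tag_label
    # close a chunk still open at the end of the sentence
    if label is not None:
        chunks.append((start, len(tags), label))
    return chunks


tokens = "The happy man has been eating at the diner".split()
tags = ["B-NP", "I-NP", "I-NP", "B-VP", "I-VP", "I-VP", "B-PP", "B-NP", "I-NP"]
for start, end, label in bio_to_chunks(tags):
    print(label, " ".join(tokens[start:end]))
# NP The happy man
# VP has been eating
# PP at
# NP the diner
```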

Based on [Flair embeddings](https://www.aclweb.org/anthology/C18-1139/) and LSTM-CRF.
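
In an LSTM-CRF tagger, the LSTM produces a score for every (token, tag) pair and the CRF layer adds learned tag-to-tag transition scores; at prediction time the best-scoring tag sequence is found with Viterbi decoding. The toy decoder below illustrates only that decoding step, using small hand-made score tables. The tag set, the numbers, and the function name `viterbi` are hypothetical, and this is not Flair's implementation (a real CRF layer typically also scores start and end transitions and works on batches).

```python
from typing import List


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the tag-index sequence maximizing
    sum_t emissions[t][y_t] + sum_t transitions[y_{t-1}][y_t].

    emissions[t][j]: score of tag j at token t (e.g. from an LSTM).
    transitions[i][j]: score of moving from tag i to tag j (learned by a CRF).
    """
    if not emissions:
        return []
    num_tags = len(transitions)
    # best[j]: best score of any path ending in tag j at the current token
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        step_best, step_back = [], []
        for j in range(num_tags):
            # pick the predecessor tag i that maximizes the path score into tag j
            i_best = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            step_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            step_back.append(i_best)
        best = step_best
        backpointers.append(step_back)
    # start from the best final tag and follow the backpointers
    path = [max(range(num_tags), key=lambda j: best[j])]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    return path[::-1]


tag_names = ["B-NP", "I-NP", "O"]
# only O -> I-NP is penalized, since BIO forbids continuing a chunk that was never opened
transitions = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, -10.0, 0.0]]
emissions = [[0.0, 0.0, 1.0], [0.5, 1.0, 0.0]]
# a greedy per-token argmax would give ['O', 'I-NP']
print([tag_names[k] for k in viterbi(emissions, transitions)])  # ['O', 'B-NP']
```

The transition scores let the decoder trade a slightly lower per-token score for a globally consistent tag sequence, which is the motivation for putting a CRF on top of the LSTM.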

---

### Demo: How to use in Flair

Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/chunk-english")

# make example sentence
sentence = Sentence("The happy man has been eating at the diner")

# predict chunk tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted chunk spans
print('The following chunk tags are found:')
# 'np' is the tag type this model was trained with; it covers all chunk types, not only noun phrases
for span in sentence.get_spans('np'):
    print(span)
```

This yields the following output:
```
Span [1,2,3]: "The happy man"   [− Labels: NP (0.9958)]
Span [4,5,6]: "has been eating"   [− Labels: VP (0.8759)]
Span [7]: "at"   [− Labels: PP (1.0)]
Span [8,9]: "the diner"   [− Labels: NP (0.9991)]
```

So, the spans "*The happy man*" and "*the diner*" are labeled as **noun phrases** (NP), "*has been eating*" is labeled as a **verb phrase** (VP), and "*at*" is labeled as a **prepositional phrase** (PP) in the sentence "*The happy man has been eating at the diner*".
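
The `Span [1,2,3]` notation in the output lists 1-based token positions. A common way to display a chunked sentence is bracket notation. The sketch below rebuilds it from the spans shown in the output, assuming the sentence is tokenized on whitespace (true for this example), that chunks are contiguous, and that the spans are copied by hand from the printed output; `bracket_chunks` is a hypothetical helper, not part of Flair.

```python
from typing import List, Tuple


def bracket_chunks(tokens: List[str], spans: List[Tuple[List[int], str]]) -> str:
    """Render chunks as "[LABEL tok tok]"; tokens outside every span stay bare.

    spans: (1-based token positions, label) pairs, as printed by the demo.
    Assumes each chunk covers a contiguous run of positions.
    """
    # index chunks by their first position so tokens can be walked in order
    by_start = {positions[0]: (positions, label) for positions, label in spans}
    parts = []
    position = 1
    while position <= len(tokens):
        if position in by_start:
            positions, label = by_start[position]
            words = " ".join(tokens[p - 1] for p in positions)
            parts.append(f"[{label} {words}]")
            # jump past the last token of this chunk
            position = positions[-1] + 1
        else:
            parts.append(tokens[position - 1])
            position += 1
    return " ".join(parts)


tokens = "The happy man has been eating at the diner".split()
spans = [([1, 2, 3], "NP"), ([4, 5, 6], "VP"), ([7], "PP"), ([8, 9], "NP")]
print(bracket_chunks(tokens, spans))
# [NP The happy man] [VP has been eating] [PP at] [NP the diner]
```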

---

### Training: Script to train this model

The following Flair script was used to train this model:

```python
from flair.data import Corpus
from flair.datasets import CONLL_2000
from flair.embeddings import StackedEmbeddings, FlairEmbeddings

# 1. get the corpus
corpus: Corpus = CONLL_2000()

# 2. what tag do we want to predict?
tag_type = 'np'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # contextual string embeddings, forward
    FlairEmbeddings('news-forward'),

    # contextual string embeddings, backward
    FlairEmbeddings('news-backward'),
]

# embedding stack consists of forward and backward Flair embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# 7. run training
trainer.train('resources/taggers/chunk-english',
              train_with_dev=True,
              max_epochs=150)
```

---

### Cite

Please cite the following paper when using this model.

```
@inproceedings{akbik2018coling,
  title = {Contextual String Embeddings for Sequence Labeling},
  author = {Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages = {1638--1649},
  year = {2018}
}
```

---

### Issues?

The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).