emanuelaboros committed debc4ef (1 parent: 6136a2f)

Update README.md

Files changed (1): README.md +183 -0
---

language:
- multilingual
- af
- am
- ar
- as
- az
- be
- bg
- bm
- bn
- br
- bs
- ca
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- ff
- fi
- fr
- fy
- ga
- gd
- gl
- gn
- gu
- ha
- he
- hi
- hr
- ht
- hu
- hy
- id
- ig
- is
- it
- ja
- jv
- ka
- kg
- kk
- km
- kn
- ko
- ku
- ky
- la
- lg
- ln
- lo
- lt
- lv
- mg
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- no
- om
- or
- pa
- pl
- ps
- pt
- qu
- ro
- ru
- sa
- sd
- si
- sk
- sl
- so
- sq
- sr
- ss
- su
- sv
- sw
- ta
- te
- th
- ti
- tl
- tn
- tr
- uk
- ur
- uz
- vi
- wo
- xh
- yo
- zh

tags:
- retrieval
- entity-retrieval
- named-entity-disambiguation
- entity-disambiguation
- named-entity-linking
- entity-linking
- text2text-generation
---

# mGENRE

The mGENRE (multilingual Generative ENtity REtrieval) system, as presented in [Multilingual Autoregressive Entity Linking](https://arxiv.org/abs/2103.12528), implemented in PyTorch.

In a nutshell, mGENRE uses a sequence-to-sequence approach to entity retrieval (e.g., linking), based on a fine-tuned [mBART](https://arxiv.org/abs/2001.08210) architecture. mGENRE performs retrieval by generating the unique entity name conditioned on the input text, using constrained beam search so that only valid identifiers are generated. The model was first released in the [facebookresearch/GENRE](https://github.com/facebookresearch/GENRE) repository using `fairseq`; the `transformers` models are obtained with a conversion script similar to [this one](https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/convert_bart_original_pytorch_checkpoint_to_pytorch.py).
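
The constrained beam search mentioned above can be illustrated with a prefix trie: at each decoding step, only tokens that extend a prefix of some valid entity name are allowed. The sketch below is a minimal, framework-free illustration under stated assumptions: `PrefixTrie`, `allowed_tokens`, and the toy token ids are hypothetical, and the GENRE repository's own trie utilities are not reproduced here. In `transformers`, a function with this role can be passed to `model.generate` as `prefix_allowed_tokens_fn`; it receives the batch index and the tokens generated so far as a tensor (so a real hook would call `.tolist()` first), and the trie would have to be built from the tokenizer's ids for the actual entity names, including any decoder start tokens.

```python
from typing import Dict, List, Sequence


class PrefixTrie:
    """Hypothetical prefix trie over token-id sequences of valid entity names."""

    def __init__(self, sequences: Sequence[Sequence[int]]) -> None:
        self.root: Dict[int, dict] = {}
        for seq in sequences:
            self.add(seq)

    def add(self, seq: Sequence[int]) -> None:
        node = self.root
        for tok in seq:
            # Create the child for this token if needed, then descend into it.
            node = node.setdefault(tok, {})

    def allowed_next(self, prefix: Sequence[int]) -> List[int]:
        """Token ids that may follow `prefix`; an empty list if `prefix` is not valid."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return sorted(node)


# Toy token ids (assumed, not from the real tokenizer); 2 plays the role of end-of-sequence.
EOS = 2
trie = PrefixTrie([
    [10, 11, EOS],      # e.g. "London"
    [10, 12, 13, EOS],  # e.g. "London, Ontario"
    [20, EOS],          # e.g. "United Press"
])


def allowed_tokens(batch_id: int, generated: List[int]) -> List[int]:
    # Mirrors the (batch_id, generated_tokens) shape of a prefix-constraint hook;
    # batch_id is unused because every input shares the same set of valid names.
    return trie.allowed_next(generated)


print(allowed_tokens(0, []))        # [10, 20]
print(allowed_tokens(0, [10]))      # [11, 12]
print(allowed_tokens(0, [10, 12]))  # [13]
print(allowed_tokens(0, [30]))      # []
```

An empty list marks a prefix that cannot lead to any valid name; this sketch does not address how a real decoder should handle such dead ends.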

This model was trained on 105 languages from Wikipedia.

## BibTeX entry and citation info

**Please consider citing our work if you use this model.**

```bibtex
@article{decao2020multilingual,
    author = {De Cao, Nicola and Wu, Ledell and Popat, Kashyap and Artetxe, Mikel
              and Goyal, Naman and Plekhanov, Mikhail and Zettlemoyer, Luke
              and Cancedda, Nicola and Riedel, Sebastian and Petroni, Fabio},
    title = "{Multilingual Autoregressive Entity Linking}",
    journal = {Transactions of the Association for Computational Linguistics},
    volume = {10},
    pages = {274-290},
    year = {2022},
    month = {03},
    issn = {2307-387X},
    doi = {10.1162/tacl_a_00460},
    url = {https://doi.org/10.1162/tacl\_a\_00460},
    eprint = {https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl\_a\_00460/2004070/tacl\_a\_00460.pdf},
}
```

## Usage

Here is an example of generation for Wikipedia page disambiguation:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "impresso-project/nel-historic-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

# Each mention to disambiguate is wrapped in [START] ... [END] markers.
sentences = [
    "[START] United Press [END] - On the home front, the British populace remains steadfast "
    "in the face of ongoing air raids. In [START] London [END], despite the destruction, "
    "the spirit of the people is unbroken, with volunteers and civil defense units working "
    "tirelessly to support the war effort. Reports from [START] BUP [END] correspondents "
    "highlight the nationwide push for increased production in factories, essential for "
    "supplying the front lines with the materials needed for victory."
]

# Beam search returning the 5 highest-scoring entity names.
outputs = model.generate(
    **tokenizer(sentences, return_tensors="pt"),
    num_beams=5,
    num_return_sequences=5,
)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
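The call returns the top-5 predictions, each in the form `<entity name> >> <language code>`. Note that the snippet above does not pass any constraint to `generate`, so it runs plain beam search. The predictions below were obtained with constrained beam search and refer to a mention of "Einstein"; they illustrate the output format rather than the result for the sentence above:
```
['Albert Einstein >> it',
 'Albert Einstein (disambiguation) >> en',
 'Alfred Einstein >> it',
 'Alberto Einstein >> it',
 'Einstein >> it']
```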
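
The input marks each mention with `[START]` and `[END]`, and each prediction has the form `<entity name> >> <language code>`. The sketch below shows small helpers for both steps; the function names `mark_mention` and `parse_prediction` are hypothetical, and splitting on the last ` >> ` is an assumption based on the predictions shown above.

```python
from typing import Tuple


def mark_mention(text: str, start: int, end: int) -> str:
    """Wrap text[start:end] in the [START] ... [END] markers used in the example input."""
    if not 0 <= start < end <= len(text):
        raise ValueError("invalid mention span")
    return f"{text[:start]}[START] {text[start:end]} [END]{text[end:]}"


def parse_prediction(prediction: str) -> Tuple[str, str]:
    """Split a decoded prediction such as 'Albert Einstein >> it' into (title, language)."""
    title, sep, lang = prediction.rpartition(" >> ")
    if not sep:
        raise ValueError(f"unexpected prediction format: {prediction!r}")
    return title.strip(), lang.strip()


sentence = "In London, despite the destruction, the spirit of the people is unbroken."
print(mark_mention(sentence, 3, 9))
# In [START] London [END], despite the destruction, the spirit of the people is unbroken.

predictions = ["Albert Einstein >> it", "Albert Einstein (disambiguation) >> en"]
print([parse_prediction(p) for p in predictions])
# [('Albert Einstein', 'it'), ('Albert Einstein (disambiguation)', 'en')]
```

Using `rpartition` keeps any ` >> ` that might appear inside a title intact and only splits off the trailing language code.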

---
license: agpl-3.0
---