# mGPT-nsp
mGPT-nsp is fine-tuned for the Next Sentence Prediction task on the [wikipedia dataset](https://huggingface.co/datasets/wikipedia), starting from the [multilingual GPT](https://huggingface.co/THUMT/mGPT) model. It was introduced in this [paper](https://arxiv.org/abs/2307.07331) and first released on this page.
## Model description
mGPT-nsp is a Transformer-based model fine-tuned for the Next Sentence Prediction task on 11,000 English and 11,000 German Wikipedia articles. Like GPT-2, it was pre-trained on raw text only, with no human labeling. It uses the same tokenization and vocabulary as the [mT5 model](https://huggingface.co/google/mt5-base).
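
As a quick illustration of the shared vocabulary, the mT5 tokenizer can be inspected directly on English and German input (a minimal sketch; `google/mt5-base` is used here only as the source of the tokenizer, and the example sentences are illustrative):

```python
from transformers import MT5Tokenizer

# mGPT-nsp reuses mT5's SentencePiece vocabulary, so English and German text
# go through the same tokenizer.
mt5_tokenizer = MT5Tokenizer.from_pretrained("google/mt5-base")
print(mt5_tokenizer.tokenize("In Italy, pizza is presented unsliced."))
print(mt5_tokenizer.tokenize("In Italien wird die Pizza ungeschnitten serviert."))
```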
## Intended uses
- Apply it to Next Sentence Prediction tasks (and compare the results with BERT models, since BERT natively supports this task; see the sketch after this list)
- See how to fine-tune an mGPT model using our [code](https://github.com/slds-lmu/stereotypes-multi/tree/main)
- Check our [paper](https://arxiv.org/abs/2307.07331) to see its results
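
For the BERT comparison mentioned above, a minimal sketch of scoring a sentence pair with BERT's native NSP head might look like this (`bert-base-uncased` is just an example checkpoint, not part of this repository):

```python
from transformers import BertTokenizer, BertForNextSentencePrediction
import torch

bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert_model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased").eval()

inputs = bert_tokenizer("In Italy, pizza is presented unsliced.",
                        "However, it is served sliced in Turkey.", return_tensors="pt")
with torch.no_grad():
    logits = bert_model(**inputs).logits  # shape (1, 2)
# For BERT's NSP head, index 0 corresponds to "the second sentence follows the first"
print(torch.softmax(logits, dim=-1))
```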
## How to use
You can use this model directly for next sentence prediction. Here is how to use it in PyTorch:
### Necessary Initialization
```python
from transformers import MT5Tokenizer, GPT2Model
import torch
from huggingface_hub import hf_hub_download

class ModelNSP(torch.nn.Module):
    def __init__(self, pretrained_model="THUMT/mGPT"):
        super(ModelNSP, self).__init__()
        # Pre-trained multilingual GPT backbone
        self.core_model = GPT2Model.from_pretrained(pretrained_model)
        hidden_size = self.core_model.config.hidden_size
        # Classification head mapping the pooled representation to the two NSP classes
        self.nsp_head = torch.nn.Sequential(torch.nn.Linear(hidden_size, 300), torch.nn.Linear(300, 300), torch.nn.Linear(300, 2))

    def forward(self, input_ids, attention_mask=None):
        # Mean-pool the last hidden states over the sequence dimension
        core_model_outputs = self.core_model(input_ids, attention_mask=attention_mask)[0].mean(dim=1)
        return self.nsp_head(core_model_outputs).softmax(dim=-1)

# Download the fine-tuned NSP weights and load them into the wrapper model
weights = torch.load(hf_hub_download(repo_id="tolga-ozturk/mGPT-nsp", filename="model_weights.bin"))
model = torch.nn.DataParallel(ModelNSP().eval())
model.load_state_dict(weights)
tokenizer = MT5Tokenizer.from_pretrained("tolga-ozturk/mGPT-nsp")
```
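
Note that the model is wrapped in `torch.nn.DataParallel` before `load_state_dict`; the released checkpoint appears to have been saved from a DataParallel-wrapped model, so its keys carry the `module.` prefix that the wrapper expects.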
### Inference
```python
# Each pair is (first sentence, candidate next sentence)
encoded_dict = tokenizer.batch_encode_plus(
    batch_text_or_text_pairs=[
        ("In Italy, pizza is presented unsliced.", "The sky is blue."),
        ("In Italy, pizza is presented unsliced.", "However, it is served sliced in Turkey.")],
    truncation="longest_first", padding=True, return_tensors="pt", return_attention_mask=True, max_length=256)
outputs = model(encoded_dict.input_ids, attention_mask=encoded_dict.attention_mask)
print(outputs)  # class probabilities for each sentence pair
print(torch.argmax(outputs, dim=-1))  # predicted class per pair
```
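
`outputs` contains one row of two class probabilities per sentence pair, and `torch.argmax` returns the predicted class for each pair.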
## BibTeX entry and citation info
```bibtex
@misc{ozturk2023different,
    title={How Different Is Stereotypical Bias Across Languages?},
    author={Ibrahim Tolga Öztürk and Rostislav Nedelchev and Christian Heumann and Esteban Garces Arias and Marius Roger and Bernd Bischl and Matthias Aßenmacher},
    year={2023},
    eprint={2307.07331},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
This work was done with the Ludwig-Maximilians-Universität Statistics group; don't forget to check out [their Hugging Face page](https://huggingface.co/misoda) for other interesting work!