Update README.md
README.md
---

The presented model can be used for text de-noising.
Use it when text is noisy after extraction, for example text loaded from PDF files.
The model was trained on Polish texts; the noise in the training data was introduced automatically.
[allegro/plt5-base](https://huggingface.co/allegro/plt5-base) was used as the base model.
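
The README does not say how the training noise was generated. Purely as an illustration, here is one hypothetical way to corrupt clean Polish text automatically: random case flips, inserted spaces, and inserted junk characters, loosely resembling the example input further down. `add_noise`, its probabilities, and the `JUNK` character set are assumptions, not the authors' procedure.

```python
import random
from typing import Optional

# Hypothetical junk characters, chosen to resemble the noisy example input.
JUNK = "#@!&*^|-"


def add_noise(text: str, p_junk: float = 0.1, p_space: float = 0.1,
              p_case: float = 0.05, seed: Optional[int] = None) -> str:
    """Corrupt clean text (hypothetical scheme, not the authors' method).

    For each character: flip its case with probability p_case, then
    possibly insert a space, then possibly insert a junk character.
    """
    rng = random.Random(seed)  # a seeded RNG makes the corruption reproducible
    out = []
    for ch in text:
        if rng.random() < p_case:
            ch = ch.swapcase()
        out.append(ch)
        if rng.random() < p_space:
            out.append(" ")
        if rng.random() < p_junk:
            out.append(rng.choice(JUNK))
    return "".join(out)


clean = "Astronomia jest jedną z najstarszych nauk."
print(add_noise(clean, p_junk=0.15, p_space=0.2, seed=0))
```

Pairs of noisy and clean text produced this way could serve as (input, target) examples, with `denoise:` prepended to the noisy side.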

**Model input**
Model input must be preceded by the tag `denoise:`.
If you have text to be de-noised, e.g.:
```
As | -Tron^# om ia je@st je!d &*ną z na -J s | AA ta rsZy ch n a u k.
```
then the model input must be constructed as follows:
```
denoise: As | -Tron^# om ia je@st je!d &*ną z na -J s | AA ta rsZy ch n a u k.
```
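
Building the input therefore amounts to prepending the `denoise:` tag. A minimal sketch; the helper name `to_model_input` is hypothetical, and skipping the tag when it is already present is an illustrative choice.

```python
PREFIX = "denoise: "


def to_model_input(text: str) -> str:
    """Prepend the `denoise:` tag the model expects (hypothetical helper)."""
    # Avoid adding the tag twice if the caller already included it.
    if text.startswith("denoise:"):
        return text
    return PREFIX + text


noisy = "As | -Tron^# om ia je@st je!d &*ną z na -J s | AA ta rsZy ch n a u k."
print(to_model_input(noisy))
```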

**Sample model usage**
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# def do_inference(text, model, tokenizer): ...  (body omitted here)

model = T5ForConditionalGeneration.from_pretrained("radlab/polish-denoiser-t5-base")
tokenizer = T5Tokenizer.from_pretrained("radlab/polish-denoiser-t5-base")

text_str = "As | -Tron^# om ia je@st je!d &*ną z na -J s | AA ta rsZy ch n a u k."
print(do_inference(text_str, model, tokenizer))
```

Model response for **input**:
```
denoise: As | -Tron^# om ia je@st je!d &*ną z na -J s | AA ta rsZy ch n a u k.
```
is:
```
Astronomia jest jedną z najstarszych nauk.
```
(In English: "Astronomy is one of the oldest sciences.")
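
The body of `do_inference` is not part of this excerpt. Below is a minimal sketch of a helper taking the same `(text, model, tokenizer)` arguments, assuming it adds the `denoise:` tag and decodes with the Hugging Face `generate` method. The name `denoise`, the `max_length` value, and the reliance on the model's default generation settings are assumptions, not the README's own code.

```python
def denoise(text, model, tokenizer, max_length=512):
    """Hypothetical stand-in for do_inference: denoise one text with a T5 model.

    `model` and `tokenizer` are expected to be a transformers
    T5ForConditionalGeneration and T5Tokenizer, loaded as in the usage example.
    """
    # The model expects the task tag in front of the noisy text.
    prompt = text if text.startswith("denoise:") else "denoise: " + text
    # Tokenize to PyTorch tensors, truncating overly long inputs (max_length is an assumption).
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=max_length)
    # Generate with the model's default generation settings.
    output_ids = model.generate(**inputs, max_length=max_length)
    # Decode the first (and only) sequence, dropping special tokens such as </s>.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With `model` and `tokenizer` loaded as in the usage example, `denoise(text_str, model, tokenizer)` can be called in the same way as `do_inference`.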

**Evaluation**
Eval loss:
![Eval loss](https://cdn-uploads.huggingface.co/production/uploads/644addfe9279988e0cbc296b/HIJI2a1nojM6lbDyYe0-A.png)