pkedzia committed on
Commit
0953cd0
·
verified ·
1 Parent(s): c9b59aa

Update README.md

Files changed (1)
  1. README.md +26 -11
README.md CHANGED
@@ -6,21 +6,23 @@ library_name: transformers
6
  ---
7
 
8
 
9
- **Sample input**:
10
  ```
11
- As | -T ron^# om ia je@st je!d &*ną z na -J s | AA ta rsZy ch n a u k.
12
  ```
13
-
14
- **The model response**
15
  ```
16
- Astronomia jest jedną z najstarszych nauk.
17
  ```
18
 
19
- Eval loss:
20
-
21
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/644addfe9279988e0cbc296b/HIJI2a1nojM6lbDyYe0-A.png)
22
-
23
-
24
  **Sample model usage**
25
  ```python
26
  from transformers import T5ForConditionalGeneration, T5Tokenizer
@@ -50,8 +52,21 @@ def do_inference(text, model, tokenizer):
50
  model = T5ForConditionalGeneration.from_pretrained("radlab/polish-denoiser-t5-base")
51
  tokenizer = T5Tokenizer.from_pretrained("radlab/polish-denoiser-t5-base")
52
 
53
- text_str = "As | -T ron^# om ia je@st je!d &*ną z na -J s | AA ta rsZy ch n a u k."
54
  print(do_inference(text_str, model, tokenizer))
55
 
56
  ```
57
 
6
  ---
7
 
8
 
9
+ This model removes noise from text (de-noising).
10
+ Use it on text that picked up noise while being loaded, for example text extracted from PDF files.
11
+ The model was trained on Polish texts to which noise was added automatically.
12
+ [allegro/plt5-base](https://huggingface.co/allegro/plt5-base) was used as the base model.
13
+
14
+
15
+ **Model input**
16
+ Model input must be preceded by the tag `denoise:`.
17
+ If you have text to be de-noised, e.g.:
18
  ```
19
+ As | -Tron^# om ia je@st je!d &*ną z na -J s | AA ta rsZy ch n a u k.
20
  ```
21
+ then the input to the model must be constructed as follows:
 
22
  ```
23
+ denoise: As | -Tron^# om ia je@st je!d &*ną z na -J s | AA ta rsZy ch n a u k.
24
  ```
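The prefixing rule described in the added README lines can be sketched in Python. The helper name `build_denoiser_input` is hypothetical and not part of the model repository; it only prepends the `denoise:` tag and, as an assumed convenience, leaves text alone when the tag is already there.

```python
TASK_PREFIX = "denoise:"  # tag the README says must precede the model input


def build_denoiser_input(text: str) -> str:
    """Return `text` preceded by the `denoise:` tag (hypothetical helper)."""
    # Assumption: do not double the tag if the caller already added it.
    if text.startswith(TASK_PREFIX):
        return text
    # A single space separates the tag from the noisy text, as in the README example.
    return f"{TASK_PREFIX} {text}"
```

Keeping the tag in one constant means a caller cannot misspell it when building inputs by hand.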
25
 
26
  **Sample model usage**
27
  ```python
28
  from transformers import T5ForConditionalGeneration, T5Tokenizer
 
52
  model = T5ForConditionalGeneration.from_pretrained("radlab/polish-denoiser-t5-base")
53
  tokenizer = T5Tokenizer.from_pretrained("radlab/polish-denoiser-t5-base")
54
 
55
+ text_str = "As | -Tron^# om ia je@st je!d &*ną z na -J s | AA ta rsZy ch n a u k."
56
  print(do_inference(text_str, model, tokenizer))
57
 
58
  ```
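The diff elides the body of `do_inference`, so what follows is only a sketch of what such a function could look like, not the author's implementation. The name `denoise_text` and the `max_new_tokens` default are assumptions made for illustration; the calls themselves (`tokenizer(..., return_tensors="pt")`, `model.generate`, `tokenizer.decode`) are standard `transformers` usage.

```python
def denoise_text(text, model, tokenizer, max_new_tokens=256):
    """Hypothetical stand-in for the elided `do_inference` body."""
    # The README requires the `denoise:` tag in front of the noisy text.
    prompt = f"denoise: {text}"
    # Tokenize the prompt into PyTorch tensors (input_ids, attention_mask).
    inputs = tokenizer(prompt, return_tensors="pt")
    # Generate the cleaned sequence; max_new_tokens is an assumed limit.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) generated sequence, dropping special tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With `model` and `tokenizer` loaded as in the sample, `denoise_text(text_str, model, tokenizer)` would be called the same way as `do_inference`. Because this is not the README's code, its output may differ from the response shown there.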
59
 
60
+ Model response for **input**:
61
+ ```
62
+ denoise: As | -Tron^# om ia je@st je!d &*ną z na -J s | AA ta rsZy ch n a u k.
63
+ ```
64
+ is:
65
+ ```
66
+ Astronomia jest jedną z najstarszych nauk.
67
+ ```
68
+
69
+
70
+ **Evaluation**
71
+ Eval loss:
72
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/644addfe9279988e0cbc296b/HIJI2a1nojM6lbDyYe0-A.png)