Mxode committed · Commit 6c5df71 · verified · 1 Parent(s): 04ec7ab

Update README.md

Files changed (1):
  1. README.md +126 -34
README.md CHANGED
@@ -1,34 +1,126 @@
- ---
- license: gpl-3.0
- ---
- # **NanoTranslator-S**
-
- ## Introduction
-
- This is the Small model of NanoTranslator; it currently only supports **English-to-Chinese** translation. An ONNX version of the model is also provided in the repository.
-
-
-
- | Size | Params. | V. | H. | I. | L. | Att. H. | KV H. | Tie Emb. |
- | :--: | :-----: | :--: | :--: | :--: | :--: | :-----: | :---: | :------: |
- | XL | 50 M | 8000 | 320 | 1792 | 24 | 16 | 4 | True |
- | L | 22 M | 8000 | 256 | 1408 | 16 | 16 | 4 | True |
- | M | 9 M | 4000 | 168 | 896 | 16 | 12 | 4 | True |
- | S | 2 M | 2000 | 96 | 512 | 12 | 12 | 4 | True |
-
- - **V.** - vocab size
- - **H.** - hidden size
- - **I.** - intermediate size
- - **L.** - num layers
- - **Att. H.** - num attention heads
- - **KV H.** - num kv heads
- - **Tie Emb.** - tie word embeddings
-
-
-
- ## How to use
-
- ### Normal
-
-
- ### ONNX
+ ---
+ license: gpl-3.0
+ datasets:
+ - Mxode/BiST
+ language:
+ - en
+ - zh
+ pipeline_tag: translation
+ library_name: transformers
+ ---
+ # **NanoTranslator-S**
+
+ English | [简体中文](README_zh-CN.md)
+
+ ## Introduction
+
+ This is the Small model of NanoTranslator; it currently supports **English to Chinese** translation only.
+
+ An ONNX version of the model is also available in this repository.
+
+
+ | Size | Params. | V. | H. | I. | L. | Att. H. | KV H. | Tie Emb. |
+ | :--: | :-----: | :--: | :--: | :--: | :--: | :-----: | :---: | :------: |
+ | XL | 50 M | 8000 | 320 | 1792 | 24 | 16 | 4 | True |
+ | L | 22 M | 8000 | 256 | 1408 | 16 | 16 | 4 | True |
+ | M | 9 M | 4000 | 168 | 896 | 16 | 12 | 4 | True |
+ | S | 2 M | 2000 | 96 | 512 | 12 | 12 | 4 | True |
+
+ - **V.** - vocab size
+ - **H.** - hidden size
+ - **I.** - intermediate size
+ - **L.** - num layers
+ - **Att. H.** - num attention heads
+ - **KV H.** - num kv heads
+ - **Tie Emb.** - tie word embeddings
+
+
+
+ ## How to use
+
+ The prompt format is as follows:
+
+ ```
+ <|im_start|> {English Text} <|endoftext|>
+ ```
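+
+ For instance, following the concatenation used in the code below (which inserts no extra whitespace around the text), the example sentence used later in this README would be fed to the model as:
+
+ ```
+ <|im_start|>I love to watch my favorite TV series.<|endoftext|>
+ ```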
+
+ ### Directly using transformers
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_path = 'Mxode/NanoTranslator-S'
+
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ model = AutoModelForCausalLM.from_pretrained(model_path)
+
+ def translate(text: str, model, **kwargs):
+     generation_args = dict(
+         max_new_tokens = kwargs.pop("max_new_tokens", 512),
+         do_sample = kwargs.pop("do_sample", True),
+         temperature = kwargs.pop("temperature", 0.55),
+         top_p = kwargs.pop("top_p", 0.8),
+         top_k = kwargs.pop("top_k", 40),
+         **kwargs
+     )
+
+     # Wrap the source text in the prompt format described above.
+     prompt = "<|im_start|>" + text + "<|endoftext|>"
+     model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
+
+     # Generate, then strip the prompt tokens from the output before decoding.
+     generated_ids = model.generate(model_inputs.input_ids, **generation_args)
+     generated_ids = [
+         output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+     ]
+
+     response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+     return response
+
+ text = "I love to watch my favorite TV series."
+
+ response = translate(text, model, max_new_tokens=64, do_sample=False)
+ print(response)
+ ```
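+
+ If a GPU is available you can move the model onto it before calling `translate` (an optional sketch; the snippet above runs on CPU by default, and the helper already sends its inputs to `model.device`):
+
+ ```python
+ import torch
+
+ # Assumption: a CUDA-capable GPU and a CUDA build of PyTorch are installed.
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model = model.to(device)
+
+ print(translate("I love to watch my favorite TV series.", model, do_sample=False))
+ ```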
+
+
+ ### ONNX
+
+ In our measurements, inference with the ONNX model is **2-10 times faster** than inference with the transformers model directly.
+
+ You need to switch to the [onnx branch](https://huggingface.co/Mxode/NanoTranslator-S/tree/onnx) manually and download the files to a local folder.
+
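+ For example, a minimal download sketch with `huggingface_hub` (the local directory name is arbitrary):
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Download the onnx branch of the repository into a local folder.
+ local_dir = snapshot_download(
+     repo_id="Mxode/NanoTranslator-S",
+     revision="onnx",          # the ONNX files live on this branch
+     local_dir="onnx_model",   # arbitrary local folder name
+ )
+ print(local_dir)
+ ```
+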
+ Reference docs:
+
+ - [Export to ONNX](https://huggingface.co/docs/transformers/serialization)
+ - [Inference pipelines with the ONNX Runtime accelerator](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines)
+
+ **Using ORTModelForCausalLM**
+
+ ```python
+ from optimum.onnxruntime import ORTModelForCausalLM
+ from transformers import AutoTokenizer
+
+ model_path = "your/folder/to/onnx_model"
+
+ ort_model = ORTModelForCausalLM.from_pretrained(model_path)
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+ text = "I love to watch my favorite TV series."
+
+ # Reuse the translate() helper defined in the transformers example above.
+ response = translate(text, ort_model, max_new_tokens=64, do_sample=False)
+ print(response)
+ ```
+
+ **Using pipeline**
+
+ ```python
+ from optimum.pipelines import pipeline
+
+ model_path = "your/folder/to/onnx_model"
+ pipe = pipeline("text-generation", model=model_path, accelerator="ort")
+
+ text = "I love to watch my favorite TV series."
+
+ response = pipe(text, max_new_tokens=64, do_sample=False)
+ print(response)
+ ```
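+
+ Note that a `text-generation` pipeline returns a list of dicts rather than a plain string, and the example above does not wrap the input in the prompt format shown earlier. A hedged sketch of doing both yourself (whether the prompt wrapping is required here is an assumption carried over from the transformers example):
+
+ ```python
+ # Apply the prompt format, then pull the generated text out of the pipeline output.
+ prompt = "<|im_start|>" + text + "<|endoftext|>"
+ # return_full_text=False asks the pipeline to return only the newly generated text.
+ response = pipe(prompt, max_new_tokens=64, do_sample=False, return_full_text=False)
+ print(response[0]["generated_text"])
+ ```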