---
license: gpl-3.0
datasets:
- Mxode/BiST
language:
- en
- zh
pipeline_tag: translation
library_name: transformers
---
# **NanoTranslator-M**

English | [简体中文](README_zh-CN.md)

## Introduction

This is the **medium** model of NanoTranslator. It currently supports only **English-to-Chinese** translation.

An ONNX version of the model is also available in this repository, on the `onnx` branch.

All models are collected in the [NanoTranslator Collection](https://huggingface.co/collections/Mxode/nanotranslator-66e1de2ba352e926ae865bd2).

| Model | P. | Arch. | Act. | V. | H. | I. | L. | A.H. | K.H. | Tie |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| [XXL2](https://huggingface.co/Mxode/NanoTranslator-XXL2) | 102 | LLaMA | SwiGLU | 16K | 1120 | 3072 | 6 | 16 | 8 | True |
| [XXL](https://huggingface.co/Mxode/NanoTranslator-XXL) | 100 | LLaMA | SwiGLU | 16K | 768 | 4096 | 8 | 24 | 8 | True |
| [XL](https://huggingface.co/Mxode/NanoTranslator-XL) | 78 | LLaMA | GeGLU | 16K | 768 | 4096 | 6 | 24 | 8 | True |
| [L](https://huggingface.co/Mxode/NanoTranslator-L) | 49 | LLaMA | GeGLU | 16K | 512 | 2816 | 8 | 16 | 8 | True |
| [M2](https://huggingface.co/Mxode/NanoTranslator-M2) | 22 | Qwen2 | GeGLU | 4K | 432 | 2304 | 6 | 24 | 8 | True |
| [M](https://huggingface.co/Mxode/NanoTranslator-M) | 22 | LLaMA | SwiGLU | 8K | 256 | 1408 | 16 | 16 | 4 | True |
| [S](https://huggingface.co/Mxode/NanoTranslator-S) | 9 | LLaMA | SwiGLU | 4K | 168 | 896 | 16 | 12 | 4 | True |
| [XS](https://huggingface.co/Mxode/NanoTranslator-XS) | 2 | LLaMA | SwiGLU | 2K | 96 | 512 | 12 | 12 | 4 | True |

- **P.** - parameters (in millions)
- **Arch.** - architecture
- **Act.** - activation function
- **V.** - vocab size
- **H.** - hidden size
- **I.** - intermediate size
- **L.** - number of layers
- **A.H.** - number of attention heads
- **K.H.** - number of KV heads
- **Tie** - whether word embeddings are tied
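The columns above are enough to roughly reconstruct each parameter count. Below is a minimal sketch, assuming a LLaMA-style layer with no linear biases, grouped-query attention (K.H. key/value heads of size H / A.H.), a gated MLP with gate, up and down projections (SwiGLU and GeGLU have the same shapes), two RMSNorm weight vectors per layer plus a final norm, and tied embeddings counted once. It also assumes that "8K" means 8,192 tokens. `Config` and `estimate_params` are hypothetical names for illustration, not part of the model's code. Qwen2 adds biases to the q/k/v projections, so this sketch would slightly undercount M2.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Field order follows the table columns V., H., I., L., A.H., K.H.
    vocab_size: int
    hidden_size: int
    intermediate_size: int
    num_layers: int
    num_heads: int
    num_kv_heads: int


def estimate_params(cfg: Config) -> int:
    """Rough parameter count for a LLaMA-style decoder with tied embeddings (assumed, no biases)."""
    head_dim = cfg.hidden_size // cfg.num_heads
    kv_dim = cfg.num_kv_heads * head_dim
    # q_proj and o_proj are H x H; k_proj and v_proj are H x kv_dim (grouped-query attention).
    attn = 2 * cfg.hidden_size * cfg.hidden_size + 2 * cfg.hidden_size * kv_dim
    # Gated MLP: gate_proj, up_proj and down_proj, each H x I in size.
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size
    # Two RMSNorm weight vectors per layer (before attention and before the MLP).
    norms = 2 * cfg.hidden_size
    per_layer = attn + mlp + norms
    # Tied embeddings: one V x H matrix serves as both input embedding and LM head; add the final norm.
    return cfg.num_layers * per_layer + cfg.vocab_size * cfg.hidden_size + cfg.hidden_size


# NanoTranslator-M, read from the table (assuming 8K = 8192).
M = Config(vocab_size=8192, hidden_size=256, intermediate_size=1408,
           num_layers=16, num_heads=16, num_kv_heads=4)
print(f"{estimate_params(M) / 1e6:.1f}M")
```

For the M configuration this gives 22,028,544 parameters (about 22.0M), consistent with the 22 in the table. Applying the same function to the other LLaMA rows and truncating to whole millions also reproduces their P. values, which suggests the table truncates rather than rounds.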



## How to use

The prompt format is as follows:

```
<|im_start|>{English Text}<|endoftext|>
```
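As a minimal sketch of that format, the helpers below wrap the English text between the two special tokens with no extra spaces, the same way the transformers example builds its prompt. `build_prompt` and `build_prompts` are hypothetical names, and the check that rejects input already containing a special-token string is an assumption added for illustration, not something the model card requires.

```python
START_TOKEN = "<|im_start|>"
END_TOKEN = "<|endoftext|>"


def build_prompt(text: str) -> str:
    """Wrap English source text in the NanoTranslator prompt format."""
    # Assumed defensive check: a literal special token inside the text would be
    # tokenized as a control token and would break the prompt structure.
    if START_TOKEN in text or END_TOKEN in text:
        raise ValueError("input text must not contain special-token strings")
    # No spaces are inserted around the text.
    return START_TOKEN + text + END_TOKEN


def build_prompts(texts: list[str]) -> list[str]:
    """Build one prompt per source sentence."""
    return [build_prompt(t) for t in texts]


print(build_prompt("I love to watch my favorite TV series."))
```

The model's Chinese output is whatever it generates after the closing `<|endoftext|>` of the prompt.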

### Using transformers directly

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'Mxode/NanoTranslator-M'

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

def translate(text: str, model, **kwargs):
    # Default sampling settings; any keyword argument passed in overrides them,
    # and extra keyword arguments are forwarded to `generate`.
    generation_args = dict(
        max_new_tokens = kwargs.pop("max_new_tokens", 512),
        do_sample = kwargs.pop("do_sample", True),
        temperature = kwargs.pop("temperature", 0.55),
        top_p = kwargs.pop("top_p", 0.8),
        top_k = kwargs.pop("top_k", 40),
        **kwargs
    )

    # Wrap the English text in the prompt format and tokenize it.
    # Note: this uses the global `tokenizer` defined above.
    prompt = "<|im_start|>" + text + "<|endoftext|>"
    model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    generated_ids = model.generate(model_inputs.input_ids, **generation_args)
    # Strip the prompt tokens so that only the generated translation remains.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

text = "I love to watch my favorite TV series."

# Greedy decoding (do_sample=False) gives a deterministic translation.
response = translate(text, model, max_new_tokens=64, do_sample=False)
print(response)
```


### ONNX

In the author's measurements, inference with the ONNX model is **2-10 times faster** than inference with the transformers model.
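The speedup you observe is likely to vary with hardware, input length and generation settings. Below is a minimal timing sketch for checking it locally. `median_latency` and `speedup` are hypothetical helper names, and the warm-up and run counts are arbitrary defaults rather than the settings behind the figure above.

```python
import statistics
import time
from typing import Callable


def median_latency(fn: Callable[[], object], warmup: int = 2, runs: int = 10) -> float:
    """Return the median wall-clock time of fn() in seconds."""
    # Warm-up calls absorb one-off costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline: Callable[[], object], candidate: Callable[[], object],
            warmup: int = 2, runs: int = 10) -> float:
    """Ratio of baseline latency to candidate latency; above 1 means the candidate is faster."""
    return (median_latency(baseline, warmup, runs)
            / median_latency(candidate, warmup, runs))
```

For example, with `model` and `ort_model` loaded as in this card's examples, `speedup(lambda: translate(text, model, max_new_tokens=64, do_sample=False), lambda: translate(text, ort_model, max_new_tokens=64, do_sample=False))` compares the two backends. Greedy decoding keeps the generated lengths comparable between runs.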

To use it, manually switch to the [onnx branch](https://huggingface.co/Mxode/NanoTranslator-M/tree/onnx) and download the files locally.
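The download can also be scripted. This is a minimal sketch, assuming the `huggingface_hub` package is installed. `snapshot_download` with `revision="onnx"` fetches that branch of the repository. The function name `download_onnx_branch` and the default folder `NanoTranslator-M-onnx` are hypothetical choices.

```python
from huggingface_hub import snapshot_download


def download_onnx_branch(local_dir: str = "NanoTranslator-M-onnx") -> str:
    """Download the onnx branch of the model repository and return the local folder path."""
    # The ONNX files live on the "onnx" branch, so request that revision explicitly.
    return snapshot_download(
        repo_id="Mxode/NanoTranslator-M",
        revision="onnx",
        local_dir=local_dir,
    )
```

Calling `download_onnx_branch()` returns the local folder path, which can then be used as `model_path` in the ONNX examples.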

Reference docs:

- [Export to ONNX](https://huggingface.co/docs/transformers/serialization)
- [Inference pipelines with the ONNX Runtime accelerator](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines)

**Using ORTModelForCausalLM**

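```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Local folder containing the files downloaded from the onnx branch.
model_path = "your/folder/to/onnx_model"

ort_model = ORTModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

text = "I love to watch my favorite TV series."

# Reuses the `translate` function from the transformers example above;
# `translate` reads the global `tokenizer`, which now refers to this one.
response = translate(text, ort_model, max_new_tokens=64, do_sample=False)
print(response)
```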

**Using pipeline**

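```python
from optimum.pipelines import pipeline

model_path = "your/folder/to/onnx_model"
pipe = pipeline("text-generation", model=model_path, accelerator="ort")

text = "I love to watch my favorite TV series."

# Wrap the text in the same prompt format used by `translate`, and return
# only the generated continuation instead of prompt + translation.
prompt = "<|im_start|>" + text + "<|endoftext|>"
response = pipe(prompt, max_new_tokens=64, do_sample=False, return_full_text=False)
print(response[0]["generated_text"])
```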