---
license: gpl-3.0
datasets:
- Mxode/BiST
language:
- en
- zh
pipeline_tag: translation
library_name: transformers
---
# **NanoTranslator-M**
English | [简体中文](README_zh-CN.md)
## Introduction
This is the **medium** model of NanoTranslator; it currently supports **English-to-Chinese** translation only.
The ONNX version of the model is also available in the repository.
All models are collected in the [NanoTranslator Collection](https://huggingface.co/collections/Mxode/nanotranslator-66e1de2ba352e926ae865bd2).
| Model | P. | Arch. | Act. | V. | H. | I. | L. | A.H. | K.H. | Tie |
| :--: | :-----: | :--: | :--: | :--: | :-----: | :---: | :------: | :--: | :--: | :--: |
| [XXL2](https://huggingface.co/Mxode/NanoTranslator-XXL2) | 102 | LLaMA | SwiGLU | 16K | 1120 | 3072 | 6 | 16 | 8 | True |
| [XXL](https://huggingface.co/Mxode/NanoTranslator-XXL) | 100 | LLaMA | SwiGLU | 16K | 768 | 4096 | 8 | 24 | 8 | True |
| [XL](https://huggingface.co/Mxode/NanoTranslator-XL) | 78 | LLaMA | GeGLU | 16K | 768 | 4096 | 6 | 24 | 8 | True |
| [L](https://huggingface.co/Mxode/NanoTranslator-L) | 49 | LLaMA | GeGLU | 16K | 512 | 2816 | 8 | 16 | 8 | True |
| [M2](https://huggingface.co/Mxode/NanoTranslator-M2) | 22 | Qwen2 | GeGLU | 4K | 432 | 2304 | 6 | 24 | 8 | True |
| [M](https://huggingface.co/Mxode/NanoTranslator-M) | 22 | LLaMA | SwiGLU | 8K | 256 | 1408 | 16 | 16 | 4 | True |
| [S](https://huggingface.co/Mxode/NanoTranslator-S) | 9 | LLaMA | SwiGLU | 4K | 168 | 896 | 16 | 12 | 4 | True |
| [XS](https://huggingface.co/Mxode/NanoTranslator-XS) | 2 | LLaMA | SwiGLU | 2K | 96 | 512 | 12 | 12 | 4 | True |
- **P.** - parameters (in millions)
- **Arch.** - architecture
- **Act.** - activation function
- **V.** - vocab size
- **H.** - hidden size
- **I.** - intermediate size
- **L.** - num layers
- **A.H.** - num attention heads
- **K.H.** - num kv heads
- **Tie** - tie word embeddings
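As a sanity check on the table, the parameter count of the **M** model can be roughly reproduced from its listed hyperparameters. The sketch below is an approximation under stated assumptions: a vocabulary of 8,192 tokens (the table lists 8K), tied embeddings counted once, grouped-query attention with 4 KV heads, a SwiGLU MLP with three projection matrices, and RMSNorm weights ignored as negligible:

```python
# Rough parameter count for NanoTranslator-M from the table above.
# Assumption: V = 8192 ("8K"); norm weights are ignored.
V, H, I, L, AH, KH = 8192, 256, 1408, 16, 16, 4

head_dim = H // AH                               # 256 / 16 = 16
embed = V * H                                    # tied with the LM head, so counted once
attn = H * H + 2 * H * (KH * head_dim) + H * H   # q, k, v, o projections
mlp = 3 * H * I                                  # SwiGLU: gate, up, down projections
total = embed + L * (attn + mlp)

print(f"{total / 1e6:.1f}M")  # 22.0M, matching the table
```

The same back-of-the-envelope arithmetic applies to the other rows, up to the exact vocabulary size and norm parameters.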
## How to use
The prompt format is as follows:
```
<|im_start|> {English Text} <|endoftext|>
```
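For instance, wrapping an English sentence in this format could look like the following sketch (`build_prompt` is a hypothetical helper name, not part of the model's API):

```python
def build_prompt(text: str) -> str:
    """Wrap English source text in the prompt format expected by the model."""
    return "<|im_start|>" + text + "<|endoftext|>"

print(build_prompt("I love to watch my favorite TV series."))
# <|im_start|>I love to watch my favorite TV series.<|endoftext|>
```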
### Directly using transformers
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'Mxode/NanoTranslator-M'

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

def translate(text: str, model, **kwargs):
    generation_args = dict(
        max_new_tokens = kwargs.pop("max_new_tokens", 512),
        do_sample = kwargs.pop("do_sample", True),
        temperature = kwargs.pop("temperature", 0.55),
        top_p = kwargs.pop("top_p", 0.8),
        top_k = kwargs.pop("top_k", 40),
        **kwargs
    )

    prompt = "<|im_start|>" + text + "<|endoftext|>"
    model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    generated_ids = model.generate(model_inputs.input_ids, **generation_args)
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

text = "I love to watch my favorite TV series."

response = translate(text, model, max_new_tokens=64, do_sample=False)
print(response)
```
### ONNX
In our measurements, inference with the ONNX model is **2-10 times faster** than inference directly with the transformers model.

Switch to the [onnx branch](https://huggingface.co/Mxode/NanoTranslator-M/tree/onnx) manually and download the model files locally.

Reference docs:
- [Export to ONNX](https://huggingface.co/docs/transformers/serialization)
- [Inference pipelines with the ONNX Runtime accelerator](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines)
**Using ORTModelForCausalLM**
```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_path = "your/folder/to/onnx_model"

ort_model = ORTModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

text = "I love to watch my favorite TV series."

# Reuses the translate() helper defined in the transformers example above.
response = translate(text, ort_model, max_new_tokens=64, do_sample=False)
print(response)
```
**Using pipeline**
```python
from optimum.pipelines import pipeline

model_path = "your/folder/to/onnx_model"
pipe = pipeline("text-generation", model=model_path, accelerator="ort")

text = "I love to watch my favorite TV series."

response = pipe(text, max_new_tokens=64, do_sample=False)
print(response)
```