shibing624 committed: Update README.md
README.md
CHANGED
@@ -121,6 +121,70 @@ print("Sentence embeddings:")
print(sentence_embeddings)
```

## Model speed up

| Model | ATEC | BQ | LCQMC | PAWSX | STSB |
|---|---|---|---|---|---|
| shibing624/text2vec-base-chinese (fp32, baseline) | 0.31928 | 0.42672 | 0.70157 | 0.17214 | 0.79296 |
| shibing624/text2vec-base-chinese (onnx-O4, [#29](https://huggingface.co/shibing624/text2vec-base-chinese/discussions/29)) | 0.31928 | 0.42672 | 0.70157 | 0.17214 | 0.79296 |
| shibing624/text2vec-base-chinese (ov, [#27](https://huggingface.co/shibing624/text2vec-base-chinese/discussions/27)) | 0.31928 | 0.42672 | 0.70157 | 0.17214 | 0.79296 |
| shibing624/text2vec-base-chinese (ov-qint8, [#30](https://huggingface.co/shibing624/text2vec-base-chinese/discussions/30)) | 0.30778 (-3.60%) | 0.43474 (+1.88%) | 0.69620 (-0.77%) | 0.16662 (-3.20%) | 0.79396 (+0.13%) |

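The percentages in the ov-qint8 row read as relative changes against the fp32 baseline. The sketch below recomputes them from the rounded scores in the table; the helper name `relative_change` is an illustrative assumption, and because the inputs are already rounded, the last digit can differ slightly from the figures reported above.

```python
# Scores copied from the table above (fp32 baseline vs. ov-qint8).
baseline = {"ATEC": 0.31928, "BQ": 0.42672, "LCQMC": 0.70157, "PAWSX": 0.17214, "STSB": 0.79296}
quantized = {"ATEC": 0.30778, "BQ": 0.43474, "LCQMC": 0.69620, "PAWSX": 0.16662, "STSB": 0.79396}


def relative_change(new: float, old: float) -> float:
    """Return the relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0


# Relative change per task, keyed by task name.
changes = {task: relative_change(quantized[task], baseline[task]) for task in baseline}

for task, pct in changes.items():
    print(f"{task}: {pct:+.2f}%")
```
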
In short:

1. ✅ shibing624/text2vec-base-chinese (onnx-O4): ONNX optimized to [O4](https://huggingface.co/docs/optimum/en/onnxruntime/usage_guides/optimization) leaves the scores unchanged and gives a [~2x speedup](https://sbert.net/docs/sentence_transformer/usage/efficiency.html#benchmarks) on GPU.
2. ✅ shibing624/text2vec-base-chinese (ov): OpenVINO leaves the scores unchanged and gives a 1.12x speedup on CPU.
3. 🟡 shibing624/text2vec-base-chinese (ov-qint8): int8 quantization with OpenVINO, when quantizing with [Chinese STSB](https://huggingface.co/datasets/PhilipMay/stsb_multi_mt), costs a little accuracy on some tasks and gains a little on others. It also gives a [4.78x speedup](https://sbert.net/docs/sentence_transformer/usage/efficiency.html#benchmarks) on CPU.

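The speedup figures above come from the linked benchmarks and will vary with hardware and batch size. To check them on your own machine, a minimal timing sketch could look like the following. The helpers `time_encode` and `speedup` are hypothetical names introduced here for illustration; they are not part of Sentence Transformers.

```python
import time
from typing import Callable, Sequence


def time_encode(encode_fn: Callable[[Sequence[str]], object],
                sentences: Sequence[str],
                repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds of encode_fn(sentences) over `repeats` runs."""
    encode_fn(sentences)  # warm-up run, excluded from the timing
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode_fn(sentences)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Speedup factor of the candidate over the baseline (values above 1 mean faster)."""
    return baseline_seconds / candidate_seconds
```

In practice you would pass `model.encode` from a model loaded with the default backend and from one loaded with `backend="onnx"` or `backend="openvino"`, using the same sentences for both, then compare the two timings with `speedup`.
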
- Usage: shibing624/text2vec-base-chinese (onnx-O4), for GPU
```
from sentence_transformers import SentenceTransformer

# Load the O4-optimized ONNX weights through the ONNX backend.
model = SentenceTransformer(
    "shibing624/text2vec-base-chinese",
    backend="onnx",
    model_kwargs={"file_name": "model_O4.onnx"},
)
# Example Chinese queries about opening or re-binding a bank card.
embeddings = model.encode(["怎么开通银行卡", "如何更换花呗绑定银行卡", "花呗更改绑定银行卡"])
print(embeddings.shape)

# Pairwise similarity between all encoded sentences.
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```

- Usage: shibing624/text2vec-base-chinese (ov), for CPU
```
from sentence_transformers import SentenceTransformer

# Load the model through the OpenVINO backend.
model = SentenceTransformer(
    "shibing624/text2vec-base-chinese",
    backend="openvino",
)

# Example Chinese queries about opening or re-binding a bank card.
embeddings = model.encode(["怎么开通银行卡", "如何更换花呗绑定银行卡", "花呗更改绑定银行卡"])
print(embeddings.shape)

# Pairwise similarity between all encoded sentences.
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```

- Usage: shibing624/text2vec-base-chinese (ov-qint8), for CPU
```
from sentence_transformers import SentenceTransformer

# Load the int8-quantized weights (AVX512-VNNI variant) through the ONNX backend.
model = SentenceTransformer(
    "shibing624/text2vec-base-chinese",
    backend="onnx",
    model_kwargs={"file_name": "model_qint8_avx512_vnni.onnx"},
)
# Example Chinese queries about opening or re-binding a bank card.
embeddings = model.encode(["怎么开通银行卡", "如何更换花呗绑定银行卡", "花呗更改绑定银行卡"])
print(embeddings.shape)

# Pairwise similarity between all encoded sentences.
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```

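Each usage snippet ends by calling `model.similarity(embeddings, embeddings)`. Assuming the model uses cosine similarity (which I understand to be the Sentence Transformers default, though it is worth confirming via `model.similarity_fn_name`), the result is a square matrix of pairwise cosine scores. The dependency-free sketch below shows that computation on toy vectors; `cosine_similarity_matrix` is an illustrative helper, not a library function.

```python
import math
from typing import List, Sequence


def cosine_similarity_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the matrix of pairwise cosine similarities between the given vectors."""
    # Euclidean norm of every vector, computed once up front.
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


# Toy 2-D vectors standing in for sentence embeddings.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
sims = cosine_similarity_matrix(toy)
for row in sims:
    print([round(value, 4) for value in row])
```

The diagonal is 1 because every vector is identical to itself, and the matrix is symmetric, so for three input sentences the output has shape 3 x 3.
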
## Full Model Architecture
```