Add new SentenceTransformer model.
- README.md +133 -81
- config_sentence_transformers.json +2 -2
README.md
CHANGED
@@ -17,89 +17,89 @@ tags:
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
- - dataset_size:
  - loss:MSELoss
- - dataset_size:5000
- - dataset_size:8000
- - dataset_size:100000
  widget:
- - source_sentence: '

  '
  sentences:
- - '

  '
- - '

  '
- - '

  '
- - source_sentence: '
-

  '
  sentences:
- - '
-

  '
- - '
-
-   बाबेलका राजाको सेनाहरूकहाँ सुम्पिनेछु।

  '
- - '
-
- '
- - source_sentence: 'The two-day conference will participate in investors from China,
-   India, Japan, the US, European countries, Britain and other countries, the Federation
-   said.

  '
  sentences:
- - '
-
-
- - 'दुई दिनसम्म हुने सम्मेलनमा चीन, भारत, जापान, अमेरिका, युरोपियन देशहरू, बेलायत
-   लगायत देशबाट लगानीकर्ताको सहभागिता गराउने महासंघले जानकारी दिएको छ

  '
- -

  '
- - source_sentence: '
-

  '
  sentences:
- - '
-   तर त्यसपछि यसलाई फिर्ता लिनुभयो।

  '
- - '
-

  '
- - '
-

  '
- - source_sentence: '
-   to see if they can finally close this chapter.

  '
  sentences:
- - '

  '
- - '
-   शीर्षकनियम स्���ेसबाट बनेको छ, जुन ठूलो गहिराइमा उच्च दबाब बुझ्न सक्षम छ।

  '
- - '
-   यो अध्याय बन्द गर्न सक्छन्।

  '
  model-index:
@@ -113,7 +113,7 @@ model-index:
  type: unknown
  metrics:
  - type: negative_mse
-   value: -0.
  name: Negative Mse
  - task:
  type: translation
@@ -123,13 +123,13 @@ model-index:
  type: unknown
  metrics:
  - type: src2trg_accuracy
-   value: 0.
  name: Src2Trg Accuracy
  - type: trg2src_accuracy
-   value: 0.
  name: Trg2Src Accuracy
  - type: mean_accuracy
-   value: 0.
  name: Mean Accuracy
  ---

@@ -184,9 +184,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("jangedoo/all-MiniLM-L6-v2-nepali")
  # Run inference
  sentences = [
-     '
-     '
-     '
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
@@ -232,7 +232,7 @@ You can finetune this model on your own dataset.

  | Metric | Value |
  |:-----------------|:------------|
- | **negative_mse** | **-0.

  #### Translation

@@ -240,9 +240,9 @@ You can finetune this model on your own dataset.

  | Metric | Value |
  |:------------------|:-----------|
- | src2trg_accuracy | 0.
- | trg2src_accuracy | 0.
- | **mean_accuracy** | **0.

  <!--
  ## Bias, Risks and Limitations
@@ -263,7 +263,7 @@ You can finetune this model on your own dataset.
  #### momo22/eng2nep

  * Dataset: [momo22/eng2nep](https://huggingface.co/datasets/momo22/eng2nep) at [57da8d4](https://huggingface.co/datasets/momo22/eng2nep/tree/57da8d44266896e334c1d8f2528cbbf666fbd0ca)
- * Size:
  * Columns: <code>English</code>, <code>Nepali</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
  | | English | Nepali | label |
@@ -274,8 +274,8 @@ You can finetune this model on your own dataset.
  | English | Nepali | label |
  |:---|:---|:---|
  | <code>But with the origin of feudal practices in the Middle Ages, the practice of untouchability began, as well as discrimination against women.<br></code> | <code>तर मध्ययुगमा सामन्ती प्रथाको उद्भव भएसँगै जसरी छुवाछुत प्रथाको शुरुवात भयो, त्यसैगरी नारी प्रति पनि विभेद गरिन थालियो<br></code> | <code>[-0.05432726442813873, 0.029996933415532112, -0.008532932959496975, -0.035200122743844986, 0.008856767788529396, ...]</code> |
- | <code>A Pandit was found on the way to Pokhara from Baglung.<br></code> | <code>वाग्लुङ्गबाट पोखरा आउँदा बाटोमा एकजना पण्डित भेटिए।<br></code> | <code>[-0.
- | <code>He went on: "She ate a perfectly normal and healthy diet.<br></code> | <code>उनी गए: "उनले पूर्ण सामान्य र स्वस्थ आहार खाइन्।<br></code> | <code>[0.
  * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)

  ### Evaluation Dataset
@@ -291,11 +291,11 @@ You can finetune this model on your own dataset.
  | type | string | string | list |
  | details | <ul><li>min: 4 tokens</li><li>mean: 26.48 tokens</li><li>max: 213 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 63.73 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>size: 384 elements</li></ul> |
  * Samples:
- | English | Nepali | label
-
- | <code>Chapter 3<br></code> | <code>परिच्छेद–३<br></code> | <code>[-0.
- | <code>The capability of MOF would be strengthened to enable it to efficiently play the lead role in donor coordination, and to secure support from all stakeholders in aid coordination activities.<br></code> | <code>दाताहरूको समन्वयमा नेतृत्वदायीको भूमिका निर्वाह प्रभावकारी ढंगले गर्न अर्थ मन्त्रालयको क्षमता सुदृढ गरिनेछ यसको लागि सबै सरोकारवालाबाट समर्थन प्राप्त गरिनेछ ।<br></code> | <code>[-0.
- | <code>Polimatrix, Inc. is a system integrator and total solutions provider delivering radiation and nuclear protection and detection.<br></code> | <code>पोलिमाट्रिक्स, इन्कर्पोरेटिड प्रणाली इन्टिजर र कुल समाधान प्रदायक रेडियो र आणविक संरक्षण र पत्ता लगाउने प्रणाली इन्टिजर र कुल समाधान प्रदायक हो।<br></code> | <code>[-0.
  * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)

  ### Training Hyperparameters
@@ -305,6 +305,7 @@ You can finetune this model on your own dataset.
  - `per_device_train_batch_size`: 64
  - `per_device_eval_batch_size`: 64
  - `learning_rate`: 2e-05
  - `warmup_ratio`: 0.1
  - `bf16`: True
  - `push_to_hub`: True
@@ -324,13 +325,14 @@ You can finetune this model on your own dataset.
  - `per_gpu_eval_batch_size`: None
  - `gradient_accumulation_steps`: 1
  - `eval_accumulation_steps`: None
  - `learning_rate`: 2e-05
  - `weight_decay`: 0.0
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
- - `num_train_epochs`:
  - `max_steps`: -1
  - `lr_scheduler_type`: linear
  - `lr_scheduler_kwargs`: {}
@@ -421,35 +423,85 @@ You can finetune this model on your own dataset.
  - `optim_target_modules`: None
  - `batch_eval_metrics`: False
  - `eval_on_start`: False
  - `batch_sampler`: batch_sampler
  - `multi_dataset_batch_sampler`: proportional

  </details>

  ### Training Logs
- | Epoch | Step
-
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
- | 0.
-
-
-
-
-
-


  ### Framework Versions
- - Python: 3.
  - Sentence Transformers: 3.0.1
- - Transformers: 4.
- - PyTorch: 2.
- - Accelerate: 0.
  - Datasets: 2.21.0
  - Tokenizers: 0.19.1
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
+ - dataset_size:800000
  - loss:MSELoss
  widget:
+ - source_sentence: 'OUTDOOR SPACE: A covered porch and a deck.

  '
  sentences:
+ - 'नेपालमा चीन र भारतबाट अवैध रूपले प्रशस्तै प्लाष्टिकका सामानहरू आइरहे पनि के कति
+   आउँछ भन्ने तथ्याङ्क कसैसँग छैन।

  '
+ - 'पछिल्लो समयमा बेलायतले ब्रिटिस – गोर्खा सेनामा कार्यरत भूपू सैनिकहरूलाई नागरिकता
+   दिने जनाएको छ।

  '
+ - 'OUTDOOR SPACE: ढाकिएको दलान र डक।

  '
+ - source_sentence: 'Gunakar Aryal, station manager of Madanpokhara FM, says preparations
+   are underway to construct a radio station building from this amount.

  '
  sentences:
+ - 'उक्त अवसरमा समितीका उपाध्यक्ष सहीद हवारी, कोषाध्यक्ष ईलियास अन्सारी, सचिव ताजमा
+   खातुन, विश्वास सामुदायिक संस्थाका अध्यक्ष सुलेमान हवारी, युवा नेता रामकिसोर सिंह
+   पराग लगायतको उपस्थिती रहेको थियो

  '
+ - 'मदनपोखरा एफएमका स्टेशन मेनेजर गुणाकर अर्याल यो रकमबाट रेडियोको स्टेशन भवन निर्माण
+   गर्ने तयारी भइरहेको बताउँछन्।

  '
+ - 'एकपटक तिनीहरू माथि छन्, भण्डारले गृह पहुँचकर्ताहरूमा ठूलो छुट राखे।

  '
+ - source_sentence: "I will stay here, because a good opportunity for a great and growing\
+   \ work has been given to me now. And there are many people working against it.\
+   \ \n"
  sentences:
+ - 'राज्य विभागका प्रवक्ताले आफ्ना सबै कैदीहरूको भल्भकालागि FARC जिम्मेवार राखे र
+   भन्नुभयो "जीवनको प्रमाण होस्टहरूको निष्कासन सुरक्षित गर्न कुनै श्रेणी प्रयासका
+   लागि आवश्यक र आवश्यक कदम हो।"

  '
+ - "किनभने त्यहाँ प्रभावपूर्ण कार्यको एउटा विशाल मौका हात लाग्नेवाला छ। अनि धेरैजना\
+   \ त्यस कार्यको विरोधमा पनि काम गर्दैछन्। \n"
+ - '(८) यस नियम बमोजिम इजाजतपत्रवालाहरु गाभिएको सूचना उपनियम (४) बमोजिम इजाजतपत्र
+   प्राप्त गर्ने संस्थाले राष्ट्रियस्तरको दैनिक पत्रिकामा प्रकाशन गर्नु पर्नेछ ।
+   ९. इजाजतपत्र रद्द भएको जानकारी दिनु पर्नेः ऐनको दफा १३ बमोजिम इजाजतपत्र रद्द भएमा
+   विभागले सोको जानकारी इजाजतपत्रवालालाई दिनु पर्नेछ ।

  '
+ - source_sentence: 'Due to the fake, the audio CDs and VCDs of foreign songs at a
+   very cheap rate in the open roads and markets of Marashyam Memorial Care have
+   started to affect the Nepalese music market.

  '
  sentences:
+ - 'दुवैजना गोदावरी घुमेर आएका थिए।

  '
+ - 'नीलो सूर्य बायोडिजेल शुद्ध तरकारी तेलबाट बनेको प्रिमियम जैविक इन्धनको प्रमुख
+   आपूर्तिकर्ता हो।

  '
+ - 'नक्कलीले गर्दा सक्कलीलाई मारश्याम स्मृतराजधानीका खुला सडक र बजारमा अत्यन्त सस्तो
+   दरका विदेशी गीतका अडियो सीडी तथा भीसीडी पाइन थालेपछि त्यसको ठाडो असर नेपाली संगीत
+   बजारमा पर्र्न थालेको छ।

  '
+ - source_sentence: '"This was very surprising to me," said UM Professor Michael Combi.

  '
  sentences:
+ - '९) अनाजलाई भिजाएको भाँडोबाट निकालेर एक पटक सफा पानीले धोई चालनीजस्तो जालीदार
+   भाँडोमा खन्याउनु पर्दछ र यसमा एक घण्टा जति राखी पानी पूरा तर्केपछि मोटो कपडामा
+   बाँध्ने या मोटो कपडाको थैलोमा भरेर झुण्ड्याई दिने या कुनै भाँडामा राखी दिने।

  '
+ - '"यो मेरोलागि निकै आश्चर्यजनक थियो," युएम प्राध्यापक माइकल कम्बिले भन्नुभयो।

  '
+ - 'ऐंसेलुखर्क – ३, नयाँ टोल, खोटाङ

  '
  model-index:
  type: unknown
  metrics:
  - type: negative_mse
+   value: -0.21079338621348143
  name: Negative Mse
  - task:
  type: translation
  type: unknown
  metrics:
  - type: src2trg_accuracy
+   value: 0.7323
  name: Src2Trg Accuracy
  - type: trg2src_accuracy
+   value: 0.5639
  name: Trg2Src Accuracy
  - type: mean_accuracy
+   value: 0.6480999999999999
  name: Mean Accuracy
  ---

  model = SentenceTransformer("jangedoo/all-MiniLM-L6-v2-nepali")
  # Run inference
  sentences = [
+     '"This was very surprising to me," said UM Professor Michael Combi.\n',
+     '"यो मेरोलागि निकै आश्चर्यजनक थियो," युएम प्राध्यापक माइकल कम्बिले भन्नुभयो।\n',
+     'ऐंसेलुखर्क – ३, नयाँ टोल, खोटाङ\n',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
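The snippet above produces one embedding per sentence; the usual next step is a pairwise cosine-similarity matrix over those embeddings (in Sentence Transformers 3.x, `model.similarity(embeddings, embeddings)` does this for you). A minimal sketch of that step with small dummy vectors, so it runs without downloading the model:

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    # L2-normalize each row, then a dot product gives cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

# Dummy 4-dim vectors standing in for the real (3, 384) output of model.encode.
emb = np.array([
    [1.0, 0.0, 0.0, 0.0],   # "sentence A"
    [0.9, 0.1, 0.0, 0.0],   # near-paraphrase of A -> high similarity
    [0.0, 0.0, 1.0, 0.0],   # unrelated sentence -> low similarity
])
sim = cosine_similarity_matrix(emb)
print(sim.round(3))
```

With the real model, passing the English sentence and its Nepali translation from the snippet above should likewise give a much higher score to each other than to the unrelated third sentence.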

  | Metric | Value |
  |:-----------------|:------------|
+ | **negative_mse** | **-0.2108** |

  #### Translation

  | Metric | Value |
  |:------------------|:-----------|
+ | src2trg_accuracy | 0.7323 |
+ | trg2src_accuracy | 0.5639 |
+ | **mean_accuracy** | **0.6481** |
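The translation metrics above come from a retrieval-style check: for `src2trg_accuracy`, each English embedding is compared against all Nepali embeddings, and a pair counts as correct when the aligned translation is the nearest neighbour; `trg2src_accuracy` swaps the direction, and `mean_accuracy` averages the two. A toy sketch of that computation (not the sentence-transformers `TranslationEvaluator` itself):

```python
import numpy as np

def src2trg_accuracy(src_emb: np.ndarray, trg_emb: np.ndarray) -> float:
    # Row i of src_emb is aligned with row i of trg_emb.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    trg = trg_emb / np.linalg.norm(trg_emb, axis=1, keepdims=True)
    nearest = (src @ trg.T).argmax(axis=1)  # most similar target per source
    return float((nearest == np.arange(len(src_emb))).mean())

# Toy aligned pairs: pairs 0 and 1 retrieve their translation, pair 2 does not.
src = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
trg = np.array([[0.9, 0.1], [0.1, 0.9], [0.0, -1.0]])
acc = src2trg_accuracy(src, trg)
print(acc)  # 2 of 3 pairs correct
```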

  <!--
  ## Bias, Risks and Limitations
  #### momo22/eng2nep

  * Dataset: [momo22/eng2nep](https://huggingface.co/datasets/momo22/eng2nep) at [57da8d4](https://huggingface.co/datasets/momo22/eng2nep/tree/57da8d44266896e334c1d8f2528cbbf666fbd0ca)
+ * Size: 800,000 training samples
  * Columns: <code>English</code>, <code>Nepali</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
  | | English | Nepali | label |
  | English | Nepali | label |
  |:---|:---|:---|
  | <code>But with the origin of feudal practices in the Middle Ages, the practice of untouchability began, as well as discrimination against women.<br></code> | <code>तर मध्ययुगमा सामन्ती प्रथाको उद्भव भएसँगै जसरी छुवाछुत प्रथाको शुरुवात भयो, त्यसैगरी नारी प्रति पनि विभेद गरिन थालियो<br></code> | <code>[-0.05432726442813873, 0.029996933415532112, -0.008532932959496975, -0.035200122743844986, 0.008856767788529396, ...]</code> |
+ | <code>A Pandit was found on the way to Pokhara from Baglung.<br></code> | <code>वाग्लुङ्गबाट पोखरा आउँदा बाटोमा एकजना पण्डित भेटिए।<br></code> | <code>[-0.023763157427310944, 0.09590080380439758, -0.11197677254676819, 0.10978180170059204, -0.028137221932411194, ...]</code> |
+ | <code>He went on: "She ate a perfectly normal and healthy diet.<br></code> | <code>उनी गए: "उनले पूर्ण सामान्य र स्वस्थ आहार खाइन्।<br></code> | <code>[0.028130438178777695, 0.03038676083087921, -0.012276142835617065, 0.1316222846508026, -0.01928197592496872, ...]</code> |
  * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)

  ### Evaluation Dataset
  | type | string | string | list |
  | details | <ul><li>min: 4 tokens</li><li>mean: 26.48 tokens</li><li>max: 213 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 63.73 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>size: 384 elements</li></ul> |
  * Samples:
+ | English | Nepali | label |
+ |:---|:---|:---|
+ | <code>Chapter 3<br></code> | <code>परिच्छेद–३<br></code> | <code>[-0.04945989325642586, 0.048675231635570526, 0.016583407297730446, 0.048761602491140366, -0.020754696801304817, ...]</code> |
+ | <code>The capability of MOF would be strengthened to enable it to efficiently play the lead role in donor coordination, and to secure support from all stakeholders in aid coordination activities.<br></code> | <code>दाताहरूको समन्वयमा नेतृत्वदायीको भूमिका निर्वाह प्रभावकारी ढंगले गर्न अर्थ मन्त्रालयको क्षमता सुदृढ गरिनेछ यसको लागि सबै सरोकारवालाबाट समर्थन प्राप्त गरिनेछ ।<br></code> | <code>[-0.06200314313173294, -0.016507906839251518, -0.029924260452389717, -0.05250919610261917, 0.07746176421642303, ...]</code> |
+ | <code>Polimatrix, Inc. is a system integrator and total solutions provider delivering radiation and nuclear protection and detection.<br></code> | <code>पोलिमाट्रिक्स, इन्कर्पोरेटिड प्रणाली इन्टिजर र कुल समाधान प्रदायक रेडियो र आणविक संरक्षण र पत्ता लगाउने प्रणाली इन्टिजर र कुल समाधान प्रदायक हो।<br></code> | <code>[-0.0446796789765358, 0.02642829343676567, -0.09837698936462402, -0.07765442132949829, -0.02036469243466854, ...]</code> |
  * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)
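`MSELoss` here is the knowledge-distillation objective from "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation": the `label` column in the tables above stores a precomputed 384-dim teacher embedding of the English sentence, and the student is trained so that both the English sentence and its Nepali translation map close to that vector. A toy numpy sketch of the objective (4-dim vectors instead of 384, not the sentence-transformers implementation):

```python
import numpy as np

def mse_loss(student_emb: np.ndarray, teacher_emb: np.ndarray) -> float:
    # Mean squared error between student and teacher embeddings.
    return float(np.mean((student_emb - teacher_emb) ** 2))

teacher = np.array([[0.1, -0.2, 0.3, 0.0]])     # precomputed teacher vector (the `label`)
student_en = np.array([[0.1, -0.1, 0.3, 0.1]])  # student("English sentence")
student_ne = np.array([[0.0, -0.2, 0.2, 0.0]])  # student("Nepali translation")

# Both the source and the translation are pulled toward the same teacher vector.
loss = mse_loss(student_en, teacher) + mse_loss(student_ne, teacher)
print(loss)
```

Minimizing this makes the Nepali embedding land near the English one, which is what the translation-accuracy metrics above measure.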

  ### Training Hyperparameters
  - `per_device_train_batch_size`: 64
  - `per_device_eval_batch_size`: 64
  - `learning_rate`: 2e-05
+ - `num_train_epochs`: 5
  - `warmup_ratio`: 0.1
  - `bf16`: True
  - `push_to_hub`: True
  - `per_gpu_eval_batch_size`: None
  - `gradient_accumulation_steps`: 1
  - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
  - `learning_rate`: 2e-05
  - `weight_decay`: 0.0
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 5
  - `max_steps`: -1
  - `lr_scheduler_type`: linear
  - `lr_scheduler_kwargs`: {}
  - `optim_target_modules`: None
  - `batch_eval_metrics`: False
  - `eval_on_start`: False
+ - `eval_use_gather_object`: False
  - `batch_sampler`: batch_sampler
  - `multi_dataset_batch_sampler`: proportional

  </details>
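`lr_scheduler_type: linear` with `warmup_ratio: 0.1` means the learning rate ramps from 0 up to 2e-05 over the first 10% of optimizer steps, then decays linearly back to 0. A sketch of the schedule's shape; the total-step count is an estimate (800,000 samples / batch 64 × 5 epochs ≈ 62,500 steps), not a value stated on the card, and this is not the `transformers` implementation:

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-05,
              warmup_ratio: float = 0.1) -> float:
    # Linear warmup over the first warmup_ratio of steps, then linear decay to 0.
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 62500  # assumed: ~12,500 steps/epoch at batch size 64, for 5 epochs
for s in (0, 3125, 6250, 62500):
    print(s, linear_lr(s, total))
```

The peak (2e-05) is hit at step 6,250, which is consistent with the Training Logs below ending near step 62,000 at epoch 4.96.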

  ### Training Logs
+ | Epoch | Step | Training Loss | loss | mean_accuracy | negative_mse |
+ |:------:|:-----:|:-------------:|:------:|:-------------:|:------------:|
+ | 0.08 | 1000 | 0.0022 | 0.0019 | 0.0132 | -0.3831 |
+ | 0.16 | 2000 | 0.002 | 0.0018 | 0.0184 | -0.3665 |
+ | 0.24 | 3000 | 0.0019 | 0.0018 | 0.0243 | -0.3511 |
+ | 0.32 | 4000 | 0.0019 | 0.0017 | 0.0307 | -0.3400 |
+ | 0.4 | 5000 | 0.0018 | 0.0017 | 0.0386 | -0.3317 |
+ | 0.48 | 6000 | 0.0018 | 0.0016 | 0.0504 | -0.3239 |
+ | 0.56 | 7000 | 0.0017 | 0.0016 | 0.0701 | -0.3148 |
+ | 0.64 | 8000 | 0.0017 | 0.0016 | 0.0973 | -0.3057 |
+ | 0.72 | 9000 | 0.0017 | 0.0015 | 0.1307 | -0.2964 |
+ | 0.8 | 10000 | 0.0016 | 0.0015 | 0.1672 | -0.2882 |
+ | 0.88 | 11000 | 0.0016 | 0.0014 | 0.2049 | -0.2802 |
+ | 0.96 | 12000 | 0.0016 | 0.0014 | 0.2358 | -0.2752 |
+ | 1.04 | 13000 | 0.0015 | 0.0014 | 0.2631 | -0.2701 |
+ | 1.12 | 14000 | 0.0015 | 0.0014 | 0.2896 | -0.2650 |
+ | 1.2 | 15000 | 0.0015 | 0.0013 | 0.3191 | -0.2606 |
+ | 1.28 | 16000 | 0.0015 | 0.0013 | 0.3467 | -0.2570 |
+ | 1.3600 | 17000 | 0.0014 | 0.0013 | 0.3674 | -0.2536 |
+ | 1.44 | 18000 | 0.0014 | 0.0013 | 0.3868 | -0.2502 |
+ | 1.52 | 19000 | 0.0014 | 0.0013 | 0.4069 | -0.2475 |
+ | 1.6 | 20000 | 0.0014 | 0.0013 | 0.4235 | -0.2456 |
+ | 1.6800 | 21000 | 0.0014 | 0.0013 | 0.4397 | -0.2433 |
+ | 1.76 | 22000 | 0.0014 | 0.0012 | 0.4538 | -0.2410 |
+ | 1.8400 | 23000 | 0.0014 | 0.0012 | 0.4630 | -0.2392 |
+ | 1.92 | 24000 | 0.0014 | 0.0012 | 0.4798 | -0.2374 |
+ | 2.0 | 25000 | 0.0014 | 0.0012 | 0.4880 | -0.2354 |
+ | 2.08 | 26000 | 0.0013 | 0.0012 | 0.5018 | -0.2340 |
+ | 2.16 | 27000 | 0.0013 | 0.0012 | 0.5097 | -0.2324 |
+ | 2.24 | 28000 | 0.0013 | 0.0012 | 0.5199 | -0.2305 |
+ | 2.32 | 29000 | 0.0013 | 0.0012 | 0.5291 | -0.2292 |
+ | 2.4 | 30000 | 0.0013 | 0.0012 | 0.5373 | -0.2292 |
+ | 2.48 | 31000 | 0.0013 | 0.0012 | 0.5487 | -0.2271 |
+ | 2.56 | 32000 | 0.0013 | 0.0012 | 0.5543 | -0.2259 |
+ | 2.64 | 33000 | 0.0013 | 0.0012 | 0.5616 | -0.2249 |
+ | 2.7200 | 34000 | 0.0013 | 0.0012 | 0.5698 | -0.2236 |
+ | 2.8 | 35000 | 0.0013 | 0.0012 | 0.5779 | -0.2225 |
+ | 2.88 | 36000 | 0.0013 | 0.0012 | 0.5829 | -0.2218 |
+ | 2.96 | 37000 | 0.0013 | 0.0011 | 0.5893 | -0.2208 |
+ | 3.04 | 38000 | 0.0013 | 0.0011 | 0.5947 | -0.2202 |
+ | 3.12 | 39000 | 0.0013 | 0.0011 | 0.5986 | -0.2195 |
+ | 3.2 | 40000 | 0.0013 | 0.0011 | 0.6019 | -0.2183 |
+ | 3.2800 | 41000 | 0.0013 | 0.0011 | 0.6076 | -0.2177 |
+ | 3.36 | 42000 | 0.0013 | 0.0011 | 0.6112 | -0.2173 |
+ | 3.44 | 43000 | 0.0013 | 0.0011 | 0.6143 | -0.2166 |
+ | 3.52 | 44000 | 0.0012 | 0.0011 | 0.6178 | -0.2163 |
+ | 3.6 | 45000 | 0.0012 | 0.0011 | 0.6225 | -0.2153 |
+ | 3.68 | 46000 | 0.0012 | 0.0011 | 0.6232 | -0.2148 |
+ | 3.76 | 47000 | 0.0012 | 0.0011 | 0.6292 | -0.2142 |
+ | 3.84 | 48000 | 0.0012 | 0.0011 | 0.6317 | -0.2136 |
+ | 3.92 | 49000 | 0.0012 | 0.0011 | 0.6323 | -0.2135 |
+ | 4.0 | 50000 | 0.0012 | 0.0011 | 0.634 | -0.2134 |
+ | 4.08 | 51000 | 0.0012 | 0.0011 | 0.6362 | -0.2129 |
+ | 4.16 | 52000 | 0.0012 | 0.0011 | 0.6377 | -0.2126 |
+ | 4.24 | 53000 | 0.0012 | 0.0011 | 0.6379 | -0.2122 |
+ | 4.32 | 54000 | 0.0012 | 0.0011 | 0.6413 | -0.2118 |
+ | 4.4 | 55000 | 0.0012 | 0.0011 | 0.6425 | -0.2117 |
+ | 4.48 | 56000 | 0.0012 | 0.0011 | 0.6425 | -0.2115 |
+ | 4.5600 | 57000 | 0.0012 | 0.0011 | 0.6454 | -0.2114 |
+ | 4.64 | 58000 | 0.0012 | 0.0011 | 0.6440 | -0.2112 |
+ | 4.72 | 59000 | 0.0012 | 0.0011 | 0.6463 | -0.2110 |
+ | 4.8 | 60000 | 0.0012 | 0.0011 | 0.6466 | -0.2110 |
+ | 4.88 | 61000 | 0.0012 | 0.0011 | 0.6465 | -0.2109 |
+ | 4.96 | 62000 | 0.0012 | 0.0011 | 0.6481 | -0.2108 |


  ### Framework Versions
+ - Python: 3.11.9
  - Sentence Transformers: 3.0.1
+ - Transformers: 4.44.0
+ - PyTorch: 2.4.0+cu121
+ - Accelerate: 0.33.0
  - Datasets: 2.21.0
  - Tokenizers: 0.19.1
config_sentence_transformers.json
CHANGED
@@ -1,8 +1,8 @@
  {
    "__version__": {
      "sentence_transformers": "3.0.1",
-     "transformers": "4.
-     "pytorch": "2.
+     "transformers": "4.44.0",
+     "pytorch": "2.4.0+cu121"
    },
    "prompts": {},
    "default_prompt_name": null,