Saideepthi55 committed · Commit be0d478 · verified · 1 Parent(s): 5c8233b

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
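
The pooling config above enables only `pooling_mode_mean_tokens`, so each sentence vector is the average of its token vectors. A minimal sketch of that idea in plain Python follows. `mean_pool` and the toy inputs are hypothetical names used for illustration; this is not the library's implementation, which works on batched tensors of width 768.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding positions are ignored."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input so we never divide by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))
# [2.0, 3.0]
```

The padded third vector does not affect the result because its mask entry is 0.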
README.md ADDED
@@ -0,0 +1,551 @@
1
+ ---
2
+ base_model: microsoft/mpnet-base
3
+ datasets:
4
+ - SwastikN/sxc_med_llm_chemical_gen
5
+ language:
6
+ - en
7
+ library_name: sentence-transformers
8
+ license: apache-2.0
9
+ metrics:
10
+ - cosine_accuracy
11
+ - dot_accuracy
12
+ - manhattan_accuracy
13
+ - euclidean_accuracy
14
+ - max_accuracy
15
+ pipeline_tag: sentence-similarity
16
+ tags:
17
+ - sentence-transformers
18
+ - sentence-similarity
19
+ - feature-extraction
20
+ - generated_from_trainer
21
+ - dataset_size:117502
22
+ - loss:MultipleNegativesRankingLoss
23
+ widget:
24
+ - source_sentence: Help me make the molecule CC(=O)OC[C@H](OC(C)=O)C(=O)N1CCCC[C@H]1C1CCN(C(=O)c2cc3ccccc3n2C)CC1
25
+ with the same hydrogen bond donors. The output molecule should be similar to the
26
+ input molecule. Please inform me of the number of hydrogen bond donor(s) of the
27
+ optimized molecule.
28
+ sentences:
29
+ - Your requirements guided the optimization, resulting in the molecule "CC(=O)OC(CCl)C(Cc1cccs1)[C@H](OC(C)=O)C(=O)N1CCCC[C@H]1C1CCN(C(=O)c2cc3ccccc3n2C)CC1"
30
+ with an approximate hydrogen bond donor(s) of 0.
31
+ - Given a molecule expressed in SMILES string, help me optimize it according to
32
+ my requirements.
33
+ - Help me adapt a molecular structure denoted in SMILES string based on my preferences.
34
+ - source_sentence: How can we modify the molecule CCC(CC)=C(CC)c1ccccc1OC(=O)OC(N=[N+]=[N-])c1ccccc1
35
+ to decrease its blood-brain barrier penetration (BBBP) value while keeping it
36
+ similar to the input molecule? Please inform me of the BBBP value of the optimized
37
+ molecule.
38
+ sentences:
39
+ - Describe a technology used for measuring people's emotional responses.
40
+ - I've successfully optimized the molecule according to your needs, resulting in
41
+ "CCOC(=O)c1ccccc1OC(=O)OC(N=[N+]=[N-])c1ccccc1" with an approximate BBBP value
42
+ of 0.71.
43
+ - Given a molecule expressed in SMILES string, help me optimize it according to
44
+ my requirements.
45
+ - source_sentence: How can we modify the molecule C/C(=C/C(=O)N1CC[C@H](CC(CCCCCC(CO)C(=O)O)NC(=O)OC(C)(C)C)[C@H]1c1cccnc1)C(=O)O
46
+ to increase its blood-brain barrier penetration (BBBP) value while keeping it
47
+ similar to the input molecule?
48
+ sentences:
49
+ - Given a molecule expressed in SMILES string, help me optimize it according to
50
+ my requirements.
51
+ - Aid me in refining a molecular structure written in SMILES notation based on my
52
+ criteria.
53
+ - Taking your requirements into account, I've optimized the molecule to "C/C(=C/C(=O)N1CC[C@H](CNC(=O)[C@H](CO)NC(=O)OC(C)(C)C)[C@H]1c1cccnc1)C(=O)O".
54
+ - source_sentence: Support me in transforming the molecule [SMILES] by incorporating
55
+ the same hydrogen bond acceptors and maintaining its resemblance to the original
56
+ molecule.
57
+ sentences:
58
+ - Taking your requirements into account, I've optimized the molecule to "CCOc1cccc(C2c3c(oc4ccc(C)cc4c3=O)C(=O)N2CCN(CC)CC)c1".
59
+ - Help me adapt a molecular structure denoted in SMILES string based on my preferences.
60
+ - Help me adapt a molecular structure denoted in SMILES string based on my preferences.
61
+ - source_sentence: With a molecule represented by the SMILES string CNNNCC(=O)N[C@H](C)C[C@@H](C)NCc1ccc2c(c1)CCC2,
62
+ propose adjustments that can increase its logP value while keeping the output
63
+ molecule structurally related to the input molecule.
64
+ sentences:
65
+ - Aid me in refining a molecular structure written in SMILES notation based on my
66
+ criteria.
67
+ - Given a molecule expressed in SMILES string, help me optimize it according to
68
+ my requirements.
69
+ - In line with your criteria, I've optimized the molecule and present it as "C[C@H](C[C@@H](C)NC(=O)COC(C)(C)C)NCc1ccc2c(c1)CCC2".
70
+ model-index:
71
+ - name: MPNet base trained on AllNLI triplets
72
+ results:
73
+ - task:
74
+ type: triplet
75
+ name: Triplet
76
+ dataset:
77
+ name: all nli dev
78
+ type: all-nli-dev
79
+ metrics:
80
+ - type: cosine_accuracy
81
+ value: 0.6562222222222223
82
+ name: Cosine Accuracy
83
+ - type: dot_accuracy
84
+ value: 0.5342222222222223
85
+ name: Dot Accuracy
86
+ - type: manhattan_accuracy
87
+ value: 0.7075555555555556
88
+ name: Manhattan Accuracy
89
+ - type: euclidean_accuracy
90
+ value: 0.6584444444444445
91
+ name: Euclidean Accuracy
92
+ - type: max_accuracy
93
+ value: 0.7075555555555556
94
+ name: Max Accuracy
95
+ - type: cosine_accuracy
96
+ value: 0.9804444444444445
97
+ name: Cosine Accuracy
98
+ - type: dot_accuracy
99
+ value: 0.01888888888888889
100
+ name: Dot Accuracy
101
+ - type: manhattan_accuracy
102
+ value: 0.9811111111111112
103
+ name: Manhattan Accuracy
104
+ - type: euclidean_accuracy
105
+ value: 0.9802222222222222
106
+ name: Euclidean Accuracy
107
+ - type: max_accuracy
108
+ value: 0.9811111111111112
109
+ name: Max Accuracy
110
+ ---
111
+
112
+ # MPNet base trained on sxc_med_llm_chemical_gen
113
+
114
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) on the [sxc_med_llm_chemical_gen](https://huggingface.co/datasets/SwastikN/sxc_med_llm_chemical_gen) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
115
+
116
+ ## Model Details
117
+
118
+ ### Model Description
119
+ - **Model Type:** Sentence Transformer
120
+ - **Base model:** [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) <!-- at revision 6996ce1e91bd2a9c7d7f61daec37463394f73f09 -->
121
+ - **Maximum Sequence Length:** 512 tokens
122
+ - **Output Dimensionality:** 768 dimensions
123
+ - **Similarity Function:** Cosine Similarity
124
+ - **Training Dataset:**
125
+ - [sxc_med_llm_chemical_gen](https://huggingface.co/datasets/SwastikN/sxc_med_llm_chemical_gen)
126
+ - **Language:** en
127
+ - **License:** apache-2.0
128
+
129
+ ### Model Sources
130
+
131
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
132
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
133
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
134
+
135
+ ### Full Model Architecture
136
+
137
+ ```
138
+ SentenceTransformer(
139
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
140
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
141
+ )
142
+ ```
143
+
144
+ ## Usage
145
+
146
+ ### Direct Usage (Sentence Transformers)
147
+
148
+ First install the Sentence Transformers library:
149
+
150
+ ```bash
151
+ pip install -U sentence-transformers
152
+ ```
153
+
154
+ Then you can load this model and run inference.
155
+ ```python
156
+ from sentence_transformers import SentenceTransformer
157
+
158
+ # Download from the 🤗 Hub
159
+ model = SentenceTransformer("Saideepthi55/sentencetransformer-ft")
160
+ # Run inference
161
+ sentences = [
162
+ 'With a molecule represented by the SMILES string CNNNCC(=O)N[C@H](C)C[C@@H](C)NCc1ccc2c(c1)CCC2, propose adjustments that can increase its logP value while keeping the output molecule structurally related to the input molecule.',
163
+ 'Given a molecule expressed in SMILES string, help me optimize it according to my requirements.',
164
+ 'In line with your criteria, I\'ve optimized the molecule and present it as "C[C@H](C[C@@H](C)NC(=O)COC(C)(C)C)NCc1ccc2c(c1)CCC2".',
165
+ ]
166
+ embeddings = model.encode(sentences)
167
+ print(embeddings.shape)
168
+ # (3, 768)
169
+
170
+ # Get the similarity scores for the embeddings
171
+ similarities = model.similarity(embeddings, embeddings)
172
+ print(similarities.shape)
173
+ # torch.Size([3, 3])
174
+ ```
175
+
176
+ <!--
177
+ ### Direct Usage (Transformers)
178
+
179
+ <details><summary>Click to see the direct usage in Transformers</summary>
180
+
181
+ </details>
182
+ -->
183
+
184
+ <!--
185
+ ### Downstream Usage (Sentence Transformers)
186
+
187
+ You can finetune this model on your own dataset.
188
+
189
+ <details><summary>Click to expand</summary>
190
+
191
+ </details>
192
+ -->
193
+
194
+ <!--
195
+ ### Out-of-Scope Use
196
+
197
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
198
+ -->
199
+
200
+ ## Evaluation
201
+
202
+ ### Metrics
203
+
204
+ #### Triplet (step 0, before fine-tuning)
205
+ * Dataset: `all-nli-dev`
206
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
207
+
208
+ | Metric | Value |
209
+ |:-------------------|:-----------|
210
+ | cosine_accuracy | 0.6562 |
211
+ | dot_accuracy | 0.5342 |
212
+ | manhattan_accuracy | 0.7076 |
213
+ | euclidean_accuracy | 0.6584 |
214
+ | **max_accuracy** | **0.7076** |
215
+
216
+ #### Triplet (step 7000, last logged checkpoint)
217
+ * Dataset: `all-nli-dev`
218
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
219
+
220
+ | Metric | Value |
221
+ |:-------------------|:-----------|
222
+ | cosine_accuracy | 0.9804 |
223
+ | dot_accuracy | 0.0189 |
224
+ | manhattan_accuracy | 0.9811 |
225
+ | euclidean_accuracy | 0.9802 |
226
+ | **max_accuracy** | **0.9811** |
227
+
228
+ <!--
229
+ ## Bias, Risks and Limitations
230
+
231
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
232
+ -->
233
+
234
+ <!--
235
+ ### Recommendations
236
+
237
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
238
+ -->
239
+
240
+ ## Training Details
241
+
242
+ ### Training Dataset
243
+
244
+ #### sxc_med_llm_chemical_gen
245
+
246
+ * Dataset: [sxc_med_llm_chemical_gen](https://huggingface.co/datasets/SwastikN/sxc_med_llm_chemical_gen) at [ae28874](https://huggingface.co/datasets/SwastikN/sxc_med_llm_chemical_gen/tree/ae28874de7084969f9aa310b357ac699dbbafa23)
247
+ * Size: 117,502 training samples
248
+ * Columns: <code>input</code>, <code>instruction</code>, and <code>output</code>
249
+ * Approximate statistics based on the first 1000 samples:
250
+ | | input | instruction | output |
251
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
252
+ | type | string | string | string |
253
+ | details | <ul><li>min: 2 tokens</li><li>mean: 50.03 tokens</li><li>max: 156 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 16.19 tokens</li><li>max: 69 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 67.94 tokens</li><li>max: 321 tokens</li></ul> |
254
+ * Samples:
255
+ | input | instruction | output |
256
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
257
+ | <code></code> | <code>List the countries of the Middle East</code> | <code>Egypt, Israel, Iran, Iraq, Jordan, Kuwait, Lebanon, Qatar, Saudi Arabia, Syria, Turkey, The United Arab Emirates, Oman, and Yemen.</code> |
258
+ | <code>Support me in refining the molecule CCC(C)=C(CC)c1ccc(Nc2nc3nonc3nc2N2CCC(Oc3ccc(Cl)cc3)CC2)cc1 to have a more drug-like nature, ensuring it remains close to the input molecule. Please inform me of the QED value of the optimized molecule.</code> | <code>Help me adapt a molecular structure denoted in SMILES string based on my preferences.</code> | <code>I've optimized the molecule according to your requirements, yielding the structure "COC(=O)c1ccc(Nc2nc3nonc3nc2N2CCC(Oc3ccc(Cl)cc3)CC2)cc1" and an estimated QED value of 0.4.</code> |
259
+ | <code></code> | <code>Create a social media strategy to attract followers on Twitter.</code> | <code>The social media strategy for Twitter should involve Tweeting on a regular basis, with content that is engaging and relevant to the target audience. This can include retweetable quotes, links to videos and articles, sharing photos/images, polls, and so on. It is also beneficial for the information to include hashtags relevant to the content and any relevant topics of the industry. <br><br>It is also important to leverage Twitter influencers and accounts, as well as engaging with potential followers regularly and DM them when necessary. Additionally, running campaigns, contests and giveaways to gain more followers should also be part of the strategy. Finally, hashtag research should be conducted to increase the reach of the tweets.</code> |
260
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
261
+ ```json
262
+ {
263
+ "scale": 20.0,
264
+ "similarity_fct": "cos_sim"
265
+ }
266
+ ```
267
+
268
+ ### Evaluation Dataset
269
+
270
+ #### sxc_med_llm_chemical_gen
271
+
272
+ * Dataset: [sxc_med_llm_chemical_gen](https://huggingface.co/datasets/SwastikN/sxc_med_llm_chemical_gen) at [ae28874](https://huggingface.co/datasets/SwastikN/sxc_med_llm_chemical_gen/tree/ae28874de7084969f9aa310b357ac699dbbafa23)
273
+ * Size: 4,500 evaluation samples
274
+ * Columns: <code>input</code>, <code>instruction</code>, and <code>output</code>
275
+ * Approximate statistics based on the first 1000 samples:
276
+ | | input | instruction | output |
277
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
278
+ | type | string | string | string |
279
+ | details | <ul><li>min: 2 tokens</li><li>mean: 68.49 tokens</li><li>max: 165 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 16.89 tokens</li><li>max: 36 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 69.16 tokens</li><li>max: 205 tokens</li></ul> |
280
+ * Samples:
281
+ | input | instruction | output |
282
+ |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
283
+ | <code>With a molecule represented by the SMILES string COc1ccc(CN(C(=O)Nc2ccc([C@H](C#N)c3ccccc3)c(Cl)c2)C2CC2)cc1, propose adjustments that can decrease its blood-brain barrier permeability (BBBP) value by at least 0.1 compared to the pre-optimized value while keeping the output molecule structurally related to the input molecule. Please inform me of the BBBP value of the optimized molecule.</code> | <code>Help me adapt a molecular structure denoted in SMILES string based on my preferences.</code> | <code>I've optimized the molecule according to your requirements, yielding the structure "CC(C)c1cccc(Oc2ccc(CN(C(=O)Nc3ccc([C@H](C#N)c4ccccc4)c(Cl)c3)C3CC3)cc2)c1" and an estimated BBBP value of 0.34.</code> |
284
+ | <code>Which modifications can be applied to the molecule Cc1nn(-c2cc3cc(F)ccc3s2)c2ncc(C(=O)N3CCCC3c3nnc4n3CCCCC4)cc12 to increase its water solubility value while ensuring it remains similar to the original molecule?</code> | <code>Aid me in refining a molecular structure written in SMILES notation based on my criteria.</code> | <code>I've optimized the molecule according to your requirements, resulting in the following structure: "Cc1nn(C)c2ncc(C(=O)N3CCCC3c3nnc4n3CCCCC4)cc12".</code> |
285
+ | <code></code> | <code>Describe the benefits of having a mentor.</code> | <code>Having a mentor can provide tremendous benefits, including guidance, advice, support and connection to a network of contacts. Mentors can offer valuable feedback, help to create goals and strategies, and provide the accountability necessary to carry out the desired goals. They can also provide a fresh perspective which can help to create new ideas and solutions.</code> |
286
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
287
+ ```json
288
+ {
289
+ "scale": 20.0,
290
+ "similarity_fct": "cos_sim"
291
+ }
292
+ ```
293
+
294
+ ### Training Hyperparameters
295
+ #### Non-Default Hyperparameters
296
+
297
+ - `eval_strategy`: steps
298
+ - `learning_rate`: 2e-05
299
+ - `num_train_epochs`: 1
300
+ - `warmup_ratio`: 0.1
301
+ - `fp16`: True
302
+
303
+ #### All Hyperparameters
304
+ <details><summary>Click to expand</summary>
305
+
306
+ - `overwrite_output_dir`: False
307
+ - `do_predict`: False
308
+ - `eval_strategy`: steps
309
+ - `prediction_loss_only`: True
310
+ - `per_device_train_batch_size`: 8
311
+ - `per_device_eval_batch_size`: 8
312
+ - `per_gpu_train_batch_size`: None
313
+ - `per_gpu_eval_batch_size`: None
314
+ - `gradient_accumulation_steps`: 1
315
+ - `eval_accumulation_steps`: None
316
+ - `torch_empty_cache_steps`: None
317
+ - `learning_rate`: 2e-05
318
+ - `weight_decay`: 0.0
319
+ - `adam_beta1`: 0.9
320
+ - `adam_beta2`: 0.999
321
+ - `adam_epsilon`: 1e-08
322
+ - `max_grad_norm`: 1.0
323
+ - `num_train_epochs`: 1
324
+ - `max_steps`: -1
325
+ - `lr_scheduler_type`: linear
326
+ - `lr_scheduler_kwargs`: {}
327
+ - `warmup_ratio`: 0.1
328
+ - `warmup_steps`: 0
329
+ - `log_level`: passive
330
+ - `log_level_replica`: warning
331
+ - `log_on_each_node`: True
332
+ - `logging_nan_inf_filter`: True
333
+ - `save_safetensors`: True
334
+ - `save_on_each_node`: False
335
+ - `save_only_model`: False
336
+ - `restore_callback_states_from_checkpoint`: False
337
+ - `no_cuda`: False
338
+ - `use_cpu`: False
339
+ - `use_mps_device`: False
340
+ - `seed`: 42
341
+ - `data_seed`: None
342
+ - `jit_mode_eval`: False
343
+ - `use_ipex`: False
344
+ - `bf16`: False
345
+ - `fp16`: True
346
+ - `fp16_opt_level`: O1
347
+ - `half_precision_backend`: auto
348
+ - `bf16_full_eval`: False
349
+ - `fp16_full_eval`: False
350
+ - `tf32`: None
351
+ - `local_rank`: 0
352
+ - `ddp_backend`: None
353
+ - `tpu_num_cores`: None
354
+ - `tpu_metrics_debug`: False
355
+ - `debug`: []
356
+ - `dataloader_drop_last`: False
357
+ - `dataloader_num_workers`: 0
358
+ - `dataloader_prefetch_factor`: None
359
+ - `past_index`: -1
360
+ - `disable_tqdm`: False
361
+ - `remove_unused_columns`: True
362
+ - `label_names`: None
363
+ - `load_best_model_at_end`: False
364
+ - `ignore_data_skip`: False
365
+ - `fsdp`: []
366
+ - `fsdp_min_num_params`: 0
367
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
368
+ - `fsdp_transformer_layer_cls_to_wrap`: None
369
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
370
+ - `deepspeed`: None
371
+ - `label_smoothing_factor`: 0.0
372
+ - `optim`: adamw_torch
373
+ - `optim_args`: None
374
+ - `adafactor`: False
375
+ - `group_by_length`: False
376
+ - `length_column_name`: length
377
+ - `ddp_find_unused_parameters`: None
378
+ - `ddp_bucket_cap_mb`: None
379
+ - `ddp_broadcast_buffers`: False
380
+ - `dataloader_pin_memory`: True
381
+ - `dataloader_persistent_workers`: False
382
+ - `skip_memory_metrics`: True
383
+ - `use_legacy_prediction_loop`: False
384
+ - `push_to_hub`: False
385
+ - `resume_from_checkpoint`: None
386
+ - `hub_model_id`: None
387
+ - `hub_strategy`: every_save
388
+ - `hub_private_repo`: False
389
+ - `hub_always_push`: False
390
+ - `gradient_checkpointing`: False
391
+ - `gradient_checkpointing_kwargs`: None
392
+ - `include_inputs_for_metrics`: False
393
+ - `eval_do_concat_batches`: True
394
+ - `fp16_backend`: auto
395
+ - `push_to_hub_model_id`: None
396
+ - `push_to_hub_organization`: None
397
+ - `mp_parameters`:
398
+ - `auto_find_batch_size`: False
399
+ - `full_determinism`: False
400
+ - `torchdynamo`: None
401
+ - `ray_scope`: last
402
+ - `ddp_timeout`: 1800
403
+ - `torch_compile`: False
404
+ - `torch_compile_backend`: None
405
+ - `torch_compile_mode`: None
406
+ - `dispatch_batches`: None
407
+ - `split_batches`: None
408
+ - `include_tokens_per_second`: False
409
+ - `include_num_input_tokens_seen`: False
410
+ - `neftune_noise_alpha`: None
411
+ - `optim_target_modules`: None
412
+ - `batch_eval_metrics`: False
413
+ - `eval_on_start`: False
414
+ - `eval_use_gather_object`: False
415
+ - `batch_sampler`: batch_sampler
416
+ - `multi_dataset_batch_sampler`: proportional
417
+
418
+ </details>
419
+
420
+ ### Training Logs
421
+ | Epoch | Step | Training Loss | loss | all-nli-dev_max_accuracy |
422
+ |:------:|:----:|:-------------:|:------:|:------------------------:|
423
+ | 0 | 0 | - | - | 0.7076 |
424
+ | 0.0174 | 64 | - | - | 0.7156 |
425
+ | 0.0068 | 100 | 2.7336 | 2.6486 | 0.7524 |
426
+ | 0.0136 | 200 | 2.4965 | 1.9213 | 0.8162 |
427
+ | 0.0204 | 300 | 1.9042 | 1.7761 | 0.822 |
428
+ | 0.0272 | 400 | 1.6856 | 1.7172 | 0.8371 |
429
+ | 0.0340 | 500 | 1.6117 | 1.6916 | 0.8507 |
430
+ | 0.0408 | 600 | 1.5673 | 1.6809 | 0.8976 |
431
+ | 0.0477 | 700 | 1.5984 | 1.7052 | 0.9329 |
432
+ | 0.0545 | 800 | 1.5828 | 1.6841 | 0.9391 |
433
+ | 0.0613 | 900 | 1.5375 | 1.6534 | 0.9267 |
434
+ | 0.0681 | 1000 | 1.5561 | 1.6619 | 0.9509 |
435
+ | 0.0749 | 1100 | 1.4911 | 1.6538 | 0.9556 |
436
+ | 0.0817 | 1200 | 1.5075 | 1.6498 | 0.966 |
437
+ | 0.0885 | 1300 | 1.4722 | 1.6468 | 0.946 |
438
+ | 0.0953 | 1400 | 1.4806 | 1.6981 | 0.9631 |
439
+ | 0.1021 | 1500 | 1.4788 | 1.6335 | 0.9662 |
440
+ | 0.1089 | 1600 | 1.4668 | 1.6668 | 0.9731 |
441
+ | 0.1157 | 1700 | 1.4383 | 1.6473 | 0.9711 |
442
+ | 0.1225 | 1800 | 1.4549 | 1.6462 | 0.9713 |
443
+ | 0.1294 | 1900 | 1.4394 | 1.6184 | 0.9718 |
444
+ | 0.1362 | 2000 | 1.3861 | 1.6156 | 0.9676 |
445
+ | 0.1430 | 2100 | 1.4111 | 1.6045 | 0.9711 |
446
+ | 0.1498 | 2200 | 1.4286 | 1.6056 | 0.9782 |
447
+ | 0.1566 | 2300 | 1.4669 | 1.6174 | 0.9764 |
448
+ | 0.1634 | 2400 | 1.3761 | 1.6182 | 0.9776 |
449
+ | 0.1702 | 2500 | 1.4119 | 1.6150 | 0.9738 |
450
+ | 0.1770 | 2600 | 1.3625 | 1.5984 | 0.9776 |
451
+ | 0.1838 | 2700 | 1.3726 | 1.6092 | 0.9807 |
452
+ | 0.1906 | 2800 | 1.3265 | 1.6059 | 0.9789 |
453
+ | 0.1974 | 2900 | 1.3925 | 1.6004 | 0.978 |
454
+ | 0.2042 | 3000 | 1.3524 | 1.5964 | 0.9773 |
455
+ | 0.2111 | 3100 | 1.342 | 1.6213 | 0.9787 |
456
+ | 0.2179 | 3200 | 1.3478 | 1.6016 | 0.9822 |
457
+ | 0.2247 | 3300 | 1.3888 | 1.6038 | 0.9793 |
458
+ | 0.2315 | 3400 | 1.3328 | 1.5977 | 0.9813 |
459
+ | 0.2383 | 3500 | 1.372 | 1.6114 | 0.9824 |
460
+ | 0.2451 | 3600 | 1.3046 | 1.6082 | 0.9824 |
461
+ | 0.2519 | 3700 | 1.3857 | 1.5922 | 0.9824 |
462
+ | 0.2587 | 3800 | 1.3236 | 1.6127 | 0.9809 |
463
+ | 0.2655 | 3900 | 1.2929 | 1.5935 | 0.9824 |
464
+ | 0.2723 | 4000 | 1.3889 | 1.6047 | 0.9831 |
465
+ | 0.2791 | 4100 | 1.3509 | 1.6030 | 0.9844 |
466
+ | 0.2859 | 4200 | 1.3455 | 1.6099 | 0.9824 |
467
+ | 0.2928 | 4300 | 1.337 | 1.5939 | 0.984 |
468
+ | 0.2996 | 4400 | 1.3302 | 1.6057 | 0.9827 |
469
+ | 0.3064 | 4500 | 1.3377 | 1.6254 | 0.9833 |
470
+ | 0.3132 | 4600 | 1.3221 | 1.6020 | 0.9849 |
471
+ | 0.3200 | 4700 | 1.3209 | 1.6146 | 0.9824 |
472
+ | 0.3268 | 4800 | 1.354 | 1.6022 | 0.9824 |
473
+ | 0.3336 | 4900 | 1.3213 | 1.6136 | 0.9822 |
474
+ | 0.3404 | 5000 | 1.3484 | 1.5920 | 0.9807 |
475
+ | 0.3472 | 5100 | 1.3412 | 1.6106 | 0.978 |
476
+ | 0.3540 | 5200 | 1.3532 | 1.6001 | 0.9784 |
477
+ | 0.3608 | 5300 | 1.2984 | 1.6192 | 0.9762 |
478
+ | 0.3676 | 5400 | 1.3621 | 1.5850 | 0.98 |
479
+ | 0.3745 | 5500 | 1.2839 | 1.6158 | 0.9807 |
480
+ | 0.3813 | 5600 | 1.3664 | 1.6030 | 0.9831 |
481
+ | 0.3881 | 5700 | 1.327 | 1.6168 | 0.9822 |
482
+ | 0.3949 | 5800 | 1.3123 | 1.6040 | 0.982 |
483
+ | 0.4017 | 5900 | 1.3019 | 1.6092 | 0.9824 |
484
+ | 0.4085 | 6000 | 1.3908 | 1.5935 | 0.9829 |
485
+ | 0.4153 | 6100 | 1.3136 | 1.5916 | 0.9791 |
486
+ | 0.4221 | 6200 | 1.32 | 1.6091 | 0.9807 |
487
+ | 0.4289 | 6300 | 1.3018 | 1.6052 | 0.9827 |
488
+ | 0.4357 | 6400 | 1.3144 | 1.6083 | 0.9816 |
489
+ | 0.4425 | 6500 | 1.2865 | 1.6015 | 0.9829 |
490
+ | 0.4493 | 6600 | 1.2946 | 1.5882 | 0.9818 |
491
+ | 0.4562 | 6700 | 1.3245 | 1.5949 | 0.9824 |
492
+ | 0.4630 | 6800 | 1.3278 | 1.6081 | 0.9831 |
493
+ | 0.4698 | 6900 | 1.2842 | 1.6086 | 0.9836 |
494
+ | 0.4766 | 7000 | 1.3231 | 1.6170 | 0.9811 |
495
+
496
+
497
+ ### Framework Versions
498
+ - Python: 3.10.12
499
+ - Sentence Transformers: 3.1.0
500
+ - Transformers: 4.44.2
501
+ - PyTorch: 2.4.0+cu121
502
+ - Accelerate: 0.34.2
503
+ - Datasets: 3.0.0
504
+ - Tokenizers: 0.19.1
505
+
506
+ ## Citation
507
+
508
+ ### BibTeX
509
+
510
+ #### Sentence Transformers
511
+ ```bibtex
512
+ @inproceedings{reimers-2019-sentence-bert,
513
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
514
+ author = "Reimers, Nils and Gurevych, Iryna",
515
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
516
+ month = "11",
517
+ year = "2019",
518
+ publisher = "Association for Computational Linguistics",
519
+ url = "https://arxiv.org/abs/1908.10084",
520
+ }
521
+ ```
522
+
523
+ #### MultipleNegativesRankingLoss
524
+ ```bibtex
525
+ @misc{henderson2017efficient,
526
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
527
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
528
+ year={2017},
529
+ eprint={1705.00652},
530
+ archivePrefix={arXiv},
531
+ primaryClass={cs.CL}
532
+ }
533
+ ```
534
+
535
+ <!--
536
+ ## Glossary
537
+
538
+ *Clearly define terms in order to be accessible across audiences.*
539
+ -->
540
+
541
+ <!--
542
+ ## Model Card Authors
543
+
544
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
545
+ -->
546
+
547
+ <!--
548
+ ## Model Card Contact
549
+
550
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
551
+ -->
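
The README lists `MultipleNegativesRankingLoss` with `scale` 20.0 and `cos_sim` as the similarity function. As a rough illustration, assuming the simplest case of (anchor, positive) pairs where the other positives in the batch serve as negatives, the loss is a cross-entropy over scaled cosine similarities. `cos_sim` and `mnr_loss` below are hypothetical helpers written for this note, not the library's code, and the commit does not state how the `input`, `instruction`, and `output` columns were mapped to anchors and positives.

```python
import math
from typing import List, Sequence


def cos_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors: List[Sequence[float]],
             positives: List[Sequence[float]],
             scale: float = 20.0) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        # Score the anchor against every positive in the batch.
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of the matching positive.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


aligned = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
# True
```

With `scale` set to 20.0, a well-aligned batch gives a loss near zero, while a batch whose pairs are swapped gives a loss near 20.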
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "microsoft/mpnet-base",
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.0",
+     "transformers": "4.44.2",
+     "pytorch": "2.4.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
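
`similarity_fn_name` is null in this file, while the README names cosine similarity as the similarity function and reports a `cosine_accuracy` from `TripletEvaluator`. The sketch below shows what such a metric measures, assuming it is the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. `cos_sim` and `cosine_triplet_accuracy` are hypothetical helpers for illustration, and the toy vectors are not taken from the evaluation data.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cos_sim(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_triplet_accuracy(triplets: List[Tuple[Vector, Vector, Vector]]) -> float:
    """Share of triplets where the anchor is closer to the positive than to the negative."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cos_sim(anchor, positive) > cos_sim(anchor, negative)
    )
    return correct / len(triplets)


toy_triplets = [
    ([1.0, 0.0], [1.0, 0.1], [0.0, 1.0]),  # positive is closer: counted as correct
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]),  # negative is closer: counted as wrong
]
print(cosine_triplet_accuracy(toy_triplets))
# 0.5
```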
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff494406fd6659c3ef8748c54140e819067196ae4c6fb989cc6d50ac0173c457
+ size 437967672
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
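
`modules.json` records the order of the pipeline stages: the Transformer at index 0 and the Pooling module stored under `1_Pooling` at index 1, matching the architecture printed in the README. A small sketch of reading that order with the standard library is below. `module_order` is a hypothetical helper rather than the library's loader, and treating an empty `path` as the repository root is an assumption of this sketch.

```python
import json
from typing import List, Tuple

# Contents of modules.json as added in this commit.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]
"""


def module_order(raw: str) -> List[Tuple[int, str, str]]:
    """Return (idx, class name, directory) for each module, sorted by idx."""
    modules = sorted(json.loads(raw), key=lambda m: m["idx"])
    return [
        # Assumption: an empty path means the module lives at the repository root.
        (m["idx"], m["type"].rsplit(".", 1)[-1], m["path"] or ".")
        for m in modules
    ]


for idx, class_name, directory in module_order(MODULES_JSON):
    print(idx, class_name, directory)
# 0 Transformer .
# 1 Pooling 1_Pooling
```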
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff