Saideepthi55 committed on
Commit 691ab76 · verified · 1 Parent(s): a55fe14

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,528 @@
+ ---
+ base_model: microsoft/mpnet-base
+ datasets:
+ - SwastikN/sxc_med_llm_chemical_gen
+ language:
+ - en
+ library_name: sentence-transformers
+ license: apache-2.0
+ metrics:
+ - cosine_accuracy
+ - dot_accuracy
+ - manhattan_accuracy
+ - euclidean_accuracy
+ - max_accuracy
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:117502
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: Help me make the molecule CC(=O)OC[C@H](OC(C)=O)C(=O)N1CCCC[C@H]1C1CCN(C(=O)c2cc3ccccc3n2C)CC1
+     with the same hydrogen bond donors. The output molecule should be similar to the
+     input molecule. Please inform me of the number of hydrogen bond donor(s) of the
+     optimized molecule.
+   sentences:
+   - Your requirements guided the optimization, resulting in the molecule "CC(=O)OC(CCl)C(Cc1cccs1)[C@H](OC(C)=O)C(=O)N1CCCC[C@H]1C1CCN(C(=O)c2cc3ccccc3n2C)CC1"
+     with an approximate hydrogen bond donor(s) of 0.
+   - Given a molecule expressed in SMILES string, help me optimize it according to
+     my requirements.
+   - Help me adapt a molecular structure denoted in SMILES string based on my preferences.
+ - source_sentence: How can we modify the molecule CCC(CC)=C(CC)c1ccccc1OC(=O)OC(N=[N+]=[N-])c1ccccc1
+     to decrease its blood-brain barrier penetration (BBBP) value while keeping it
+     similar to the input molecule? Please inform me of the BBBP value of the optimized
+     molecule.
+   sentences:
+   - Describe a technology used for measuring people's emotional responses.
+   - I've successfully optimized the molecule according to your needs, resulting in
+     "CCOC(=O)c1ccccc1OC(=O)OC(N=[N+]=[N-])c1ccccc1" with an approximate BBBP value
+     of 0.71.
+   - Given a molecule expressed in SMILES string, help me optimize it according to
+     my requirements.
+ - source_sentence: How can we modify the molecule C/C(=C/C(=O)N1CC[C@H](CC(CCCCCC(CO)C(=O)O)NC(=O)OC(C)(C)C)[C@H]1c1cccnc1)C(=O)O
+     to increase its blood-brain barrier penetration (BBBP) value while keeping it
+     similar to the input molecule?
+   sentences:
+   - Given a molecule expressed in SMILES string, help me optimize it according to
+     my requirements.
+   - Aid me in refining a molecular structure written in SMILES notation based on my
+     criteria.
+   - Taking your requirements into account, I've optimized the molecule to "C/C(=C/C(=O)N1CC[C@H](CNC(=O)[C@H](CO)NC(=O)OC(C)(C)C)[C@H]1c1cccnc1)C(=O)O".
+ - source_sentence: Support me in transforming the molecule [SMILES] by incorporating
+     the same hydrogen bond acceptors and maintaining its resemblance to the original
+     molecule.
+   sentences:
+   - Taking your requirements into account, I've optimized the molecule to "CCOc1cccc(C2c3c(oc4ccc(C)cc4c3=O)C(=O)N2CCN(CC)CC)c1".
+   - Help me adapt a molecular structure denoted in SMILES string based on my preferences.
+   - Help me adapt a molecular structure denoted in SMILES string based on my preferences.
+ - source_sentence: With a molecule represented by the SMILES string CNNNCC(=O)N[C@H](C)C[C@@H](C)NCc1ccc2c(c1)CCC2,
+     propose adjustments that can increase its logP value while keeping the output
+     molecule structurally related to the input molecule.
+   sentences:
+   - Aid me in refining a molecular structure written in SMILES notation based on my
+     criteria.
+   - Given a molecule expressed in SMILES string, help me optimize it according to
+     my requirements.
+   - In line with your criteria, I've optimized the molecule and present it as "C[C@H](C[C@@H](C)NC(=O)COC(C)(C)C)NCc1ccc2c(c1)CCC2".
+ model-index:
+ - name: MPNet base trained on sxc_med_llm_chemical_gen
+   results:
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: sxc med llm chemical gen
+       type: sxc_med_llm_chemical_gen
+     metrics:
+     - type: cosine_accuracy
+       value: 0.9837777777777778
+       name: Cosine Accuracy
+     - type: dot_accuracy
+       value: 0.01622222222222222
+       name: Dot Accuracy
+     - type: manhattan_accuracy
+       value: 0.9826666666666667
+       name: Manhattan Accuracy
+     - type: euclidean_accuracy
+       value: 0.9835555555555555
+       name: Euclidean Accuracy
+     - type: max_accuracy
+       value: 0.9837777777777778
+       name: Max Accuracy
+ ---
+
+ # MPNet base trained on sxc_med_llm_chemical_gen
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) on the [sxc_med_llm_chemical_gen](https://huggingface.co/datasets/SwastikN/sxc_med_llm_chemical_gen) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) <!-- at revision 6996ce1e91bd2a9c7d7f61daec37463394f73f09 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - [sxc_med_llm_chemical_gen](https://huggingface.co/datasets/SwastikN/sxc_med_llm_chemical_gen)
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
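+ The pooling module above applies attention-masked mean pooling. As an added illustration (not part of the auto-generated card), the following sketch reproduces that pooling with plain `transformers`, assuming only the standard `AutoModel`/`AutoTokenizer` APIs:
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ repo = "Saideepthi55/sentencetransformer_ftmodel_on_chemical_dataset"
+ tokenizer = AutoTokenizer.from_pretrained(repo)
+ model = AutoModel.from_pretrained(repo)
+
+ encoded = tokenizer(["example sentence"], padding=True, truncation=True, return_tensors="pt")
+ with torch.no_grad():
+     token_embeddings = model(**encoded).last_hidden_state  # [batch, seq_len, 768]
+
+ # Mean pooling over non-padding tokens, mirroring pooling_mode_mean_tokens=True
+ mask = encoded["attention_mask"].unsqueeze(-1).float()
+ sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
+ print(sentence_embedding.shape)  # torch.Size([1, 768])
+ ```
+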
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("Saideepthi55/sentencetransformer_ftmodel_on_chemical_dataset")
+ # Run inference
+ sentences = [
+     'With a molecule represented by the SMILES string CNNNCC(=O)N[C@H](C)C[C@@H](C)NCc1ccc2c(c1)CCC2, propose adjustments that can increase its logP value while keeping the output molecule structurally related to the input molecule.',
+     'Given a molecule expressed in SMILES string, help me optimize it according to my requirements.',
+     'In line with your criteria, I\'ve optimized the molecule and present it as "C[C@H](C[C@@H](C)NC(=O)COC(C)(C)C)NCc1ccc2c(c1)CCC2".',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+ * Dataset: `sxc_med_llm_chemical_gen`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric             | Value      |
+ |:-------------------|:-----------|
+ | cosine_accuracy    | 0.9838     |
+ | dot_accuracy       | 0.0162     |
+ | manhattan_accuracy | 0.9827     |
+ | euclidean_accuracy | 0.9836     |
+ | **max_accuracy**   | **0.9838** |
+
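+ As a hedged sketch (not part of the auto-generated card), the evaluation above can be re-run with `TripletEvaluator`; the example triplet below is hypothetical, and the exact column-to-role mapping used for the reported scores is not recorded in this card:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import TripletEvaluator
+
+ model = SentenceTransformer("Saideepthi55/sentencetransformer_ftmodel_on_chemical_dataset")
+
+ # Hypothetical single triplet; the reported numbers used 4,500 held-out samples.
+ evaluator = TripletEvaluator(
+     anchors=["How can we modify this molecule to increase its logP value?"],
+     positives=["Given a molecule expressed in SMILES string, help me optimize it according to my requirements."],
+     negatives=["Describe a technology used for measuring people's emotional responses."],
+     name="sxc_med_llm_chemical_gen",
+ )
+ print(evaluator(model))  # dict of *_cosine_accuracy, *_dot_accuracy, ...
+ ```
+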
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### sxc_med_llm_chemical_gen
+
+ * Dataset: [sxc_med_llm_chemical_gen](https://huggingface.co/datasets/SwastikN/sxc_med_llm_chemical_gen) at [ae28874](https://huggingface.co/datasets/SwastikN/sxc_med_llm_chemical_gen/tree/ae28874de7084969f9aa310b357ac699dbbafa23)
+ * Size: 117,502 training samples
+ * Columns: <code>input</code>, <code>instruction</code>, and <code>output</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | input | instruction | output |
+   |:--------|:------|:------------|:-------|
+   | type    | string | string | string |
+   | details | <ul><li>min: 2 tokens</li><li>mean: 50.03 tokens</li><li>max: 156 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 16.19 tokens</li><li>max: 69 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 67.94 tokens</li><li>max: 321 tokens</li></ul> |
+ * Samples:
+   | input | instruction | output |
+   |:------|:------------|:-------|
+   | <code></code> | <code>List the countries of the Middle East</code> | <code>Egypt, Israel, Iran, Iraq, Jordan, Kuwait, Lebanon, Qatar, Saudi Arabia, Syria, Turkey, The United Arab Emirates, Oman, and Yemen.</code> |
+   | <code>Support me in refining the molecule CCC(C)=C(CC)c1ccc(Nc2nc3nonc3nc2N2CCC(Oc3ccc(Cl)cc3)CC2)cc1 to have a more drug-like nature, ensuring it remains close to the input molecule. Please inform me of the QED value of the optimized molecule.</code> | <code>Help me adapt a molecular structure denoted in SMILES string based on my preferences.</code> | <code>I've optimized the molecule according to your requirements, yielding the structure "COC(=O)c1ccc(Nc2nc3nonc3nc2N2CCC(Oc3ccc(Cl)cc3)CC2)cc1" and an estimated QED value of 0.4.</code> |
+   | <code></code> | <code>Create a social media strategy to attract followers on Twitter.</code> | <code>The social media strategy for Twitter should involve Tweeting on a regular basis, with content that is engaging and relevant to the target audience. This can include retweetable quotes, links to videos and articles, sharing photos/images, polls, and so on. It is also beneficial for the information to include hashtags relevant to the content and any relevant topics of the industry. <br><br>It is also important to leverage Twitter influencers and accounts, as well as engaging with potential followers regularly and DM them when necessary. Additionally, running campaigns, contests and giveaways to gain more followers should also be part of the strategy. Finally, hashtag research should be conducted to increase the reach of the tweets.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
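+ For reference, a minimal sketch (an assumption, not the original training script) of how a loss with these parameters is constructed; `cos_sim` is the similarity function named above:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.losses import MultipleNegativesRankingLoss
+ from sentence_transformers.util import cos_sim
+
+ model = SentenceTransformer("microsoft/mpnet-base")
+ # scale=20.0 multiplies the cosine similarities before the in-batch
+ # cross-entropy; every other positive in the batch serves as a negative.
+ loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)
+ ```
+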
+ ### Evaluation Dataset
+
+ #### sxc_med_llm_chemical_gen
+
+ * Dataset: [sxc_med_llm_chemical_gen](https://huggingface.co/datasets/SwastikN/sxc_med_llm_chemical_gen) at [ae28874](https://huggingface.co/datasets/SwastikN/sxc_med_llm_chemical_gen/tree/ae28874de7084969f9aa310b357ac699dbbafa23)
+ * Size: 4,500 evaluation samples
+ * Columns: <code>input</code>, <code>instruction</code>, and <code>output</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | input | instruction | output |
+   |:--------|:------|:------------|:-------|
+   | type    | string | string | string |
+   | details | <ul><li>min: 2 tokens</li><li>mean: 68.49 tokens</li><li>max: 165 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 16.89 tokens</li><li>max: 36 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 69.16 tokens</li><li>max: 205 tokens</li></ul> |
+ * Samples:
+   | input | instruction | output |
+   |:------|:------------|:-------|
+   | <code>With a molecule represented by the SMILES string COc1ccc(CN(C(=O)Nc2ccc([C@H](C#N)c3ccccc3)c(Cl)c2)C2CC2)cc1, propose adjustments that can decrease its blood-brain barrier permeability (BBBP) value by at least 0.1 compared to the pre-optimized value while keeping the output molecule structurally related to the input molecule. Please inform me of the BBBP value of the optimized molecule.</code> | <code>Help me adapt a molecular structure denoted in SMILES string based on my preferences.</code> | <code>I've optimized the molecule according to your requirements, yielding the structure "CC(C)c1cccc(Oc2ccc(CN(C(=O)Nc3ccc([C@H](C#N)c4ccccc4)c(Cl)c3)C3CC3)cc2)c1" and an estimated BBBP value of 0.34.</code> |
+   | <code>Which modifications can be applied to the molecule Cc1nn(-c2cc3cc(F)ccc3s2)c2ncc(C(=O)N3CCCC3c3nnc4n3CCCCC4)cc12 to increase its water solubility value while ensuring it remains similar to the original molecule?</code> | <code>Aid me in refining a molecular structure written in SMILES notation based on my criteria.</code> | <code>I've optimized the molecule according to your requirements, resulting in the following structure: "Cc1nn(C)c2ncc(C(=O)N3CCCC3c3nnc4n3CCCCC4)cc12".</code> |
+   | <code></code> | <code>Describe the benefits of having a mentor.</code> | <code>Having a mentor can provide tremendous benefits, including guidance, advice, support and connection to a network of contacts. Mentors can offer valuable feedback, help to create goals and strategies, and provide the accountability necessary to carry out the desired goals. They can also provide a fresh perspective which can help to create new ideas and solutions.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+
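+ A minimal sketch of how these non-default values map onto `SentenceTransformerTrainingArguments` (an illustration, since the original training script is not part of this commit; the `output_dir` is hypothetical):
+
+ ```python
+ from sentence_transformers import SentenceTransformerTrainingArguments
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="outputs/mpnet-base-sxc",  # hypothetical path
+     eval_strategy="steps",
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     learning_rate=2e-5,
+     num_train_epochs=1,
+     warmup_ratio=0.1,
+     fp16=True,
+ )
+ ```
+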
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step | Training Loss | Validation Loss | sxc_med_llm_chemical_gen_max_accuracy |
+ |:------:|:----:|:-------------:|:---------------:|:-------------------------------------:|
+ | 0      | 0    | -             | -               | 0.7076                                |
+ | 0.0136 | 100  | 4.1705        | 3.6314          | 0.7469                                |
+ | 0.0272 | 200  | 3.0088        | 2.4771          | 0.8191                                |
+ | 0.0408 | 300  | 2.3803        | 2.3765          | 0.8716                                |
+ | 0.0545 | 400  | 2.2281        | 2.3122          | 0.9316                                |
+ | 0.0681 | 500  | 2.1647        | 2.2997          | 0.9436                                |
+ | 0.0817 | 600  | 2.1289        | 2.2663          | 0.9511                                |
+ | 0.0953 | 700  | 2.0663        | 2.2601          | 0.9629                                |
+ | 0.1089 | 800  | 2.065         | 2.2500          | 0.9687                                |
+ | 0.1225 | 900  | 2.0399        | 2.2595          | 0.9693                                |
+ | 0.1362 | 1000 | 1.9939        | 2.2375          | 0.9707                                |
+ | 0.1498 | 1100 | 1.9858        | 2.2220          | 0.9684                                |
+ | 0.1634 | 1200 | 2.0069        | 2.2265          | 0.9758                                |
+ | 0.1770 | 1300 | 1.9591        | 2.2073          | 0.9702                                |
+ | 0.1906 | 1400 | 1.9288        | 2.2078          | 0.976                                 |
+ | 0.2042 | 1500 | 1.9466        | 2.2036          | 0.9758                                |
+ | 0.2179 | 1600 | 1.9175        | 2.2086          | 0.9764                                |
+ | 0.2315 | 1700 | 1.8835        | 2.2329          | 0.9796                                |
+ | 0.2451 | 1800 | 1.9134        | 2.2003          | 0.9796                                |
+ | 0.2587 | 1900 | 1.8809        | 2.2003          | 0.9811                                |
+ | 0.2723 | 2000 | 1.9263        | 2.2039          | 0.9824                                |
+ | 0.2859 | 2100 | 1.9101        | 2.2084          | 0.9804                                |
+ | 0.2996 | 2200 | 1.8846        | 2.2052          | 0.9831                                |
+ | 0.3132 | 2300 | 1.8842        | 2.1903          | 0.9818                                |
+ | 0.3268 | 2400 | 1.8945        | 2.1984          | 0.9807                                |
+ | 0.3404 | 2500 | 1.9217        | 2.1859          | 0.9829                                |
+ | 0.3540 | 2600 | 1.8704        | 2.1995          | 0.982                                 |
+ | 0.3676 | 2700 | 1.889         | 2.2038          | 0.9822                                |
+ | 0.3813 | 2800 | 1.875         | 2.2079          | 0.9829                                |
+ | 0.3949 | 2900 | 1.8792        | 2.1975          | 0.9833                                |
+ | 0.4085 | 3000 | 1.882         | 2.1895          | 0.9796                                |
+ | 0.4221 | 3100 | 1.8886        | 2.2115          | 0.9831                                |
+ | 0.4357 | 3200 | 1.8629        | 2.2040          | 0.9838                                |
+ | 0.4493 | 3300 | 1.8647        | 2.1973          | 0.9836                                |
+ | 0.4630 | 3400 | 1.8888        | 2.1961          | 0.9838                                |
+ | 0.4766 | 3500 | 1.8692        | 2.2027          | 0.9829                                |
+ | 0.4902 | 3600 | 1.8846        | 2.1954          | 0.9838                                |
+ | 0.5038 | 3700 | 1.8482        | 2.1888          | 0.9822                                |
+ | 0.5174 | 3800 | 1.8527        | 2.1873          | 0.9824                                |
+ | 0.5310 | 3900 | 1.8378        | 2.1940          | 0.9811                                |
+ | 0.5447 | 4000 | 1.8679        | 2.2008          | 0.9833                                |
+ | 0.5583 | 4100 | 1.8421        | 2.1845          | 0.9842                                |
+ | 0.5719 | 4200 | 1.8325        | 2.1948          | 0.9847                                |
+ | 0.5855 | 4300 | 1.8675        | 2.1750          | 0.9836                                |
+ | 0.5991 | 4400 | 1.8483        | 2.1828          | 0.9831                                |
+ | 0.6127 | 4500 | 1.854         | 2.1886          | 0.9831                                |
+ | 0.6264 | 4600 | 1.827         | 2.1876          | 0.9824                                |
+ | 0.6400 | 4700 | 1.8863        | 2.1849          | 0.9836                                |
+ | 0.6536 | 4800 | 1.8919        | 2.1816          | 0.984                                 |
+ | 0.6672 | 4900 | 1.8211        | 2.1830          | 0.9847                                |
+ | 0.6808 | 5000 | 1.8345        | 2.1847          | 0.9842                                |
+ | 0.6944 | 5100 | 1.8685        | 2.1855          | 0.9853                                |
+ | 0.7081 | 5200 | 1.85          | 2.1864          | 0.9844                                |
+ | 0.7217 | 5300 | 1.8222        | 2.1875          | 0.9842                                |
+ | 0.7353 | 5400 | 1.8179        | 2.1923          | 0.9844                                |
+ | 0.7489 | 5500 | 1.7992        | 2.1909          | 0.9851                                |
+ | 0.7625 | 5600 | 1.8495        | 2.1811          | 0.9847                                |
+ | 0.7761 | 5700 | 1.808         | 2.1763          | 0.9842                                |
+ | 0.7898 | 5800 | 1.8293        | 2.1861          | 0.9849                                |
+ | 0.8034 | 5900 | 1.8184        | 2.1845          | 0.9851                                |
+ | 0.8170 | 6000 | 1.8256        | 2.1956          | 0.9849                                |
+ | 0.8306 | 6100 | 1.7904        | 2.1916          | 0.9842                                |
+ | 0.8442 | 6200 | 1.8028        | 2.1918          | 0.9847                                |
+ | 0.8578 | 6300 | 1.8316        | 2.1917          | 0.9856                                |
+ | 0.8715 | 6400 | 1.7951        | 2.1929          | 0.9851                                |
+ | 0.8851 | 6500 | 1.8175        | 2.1866          | 0.9847                                |
+ | 0.8987 | 6600 | 1.8071        | 2.1899          | 0.9853                                |
+ | 0.9123 | 6700 | 1.8632        | 2.1905          | 0.9844                                |
+ | 0.9259 | 6800 | 1.8441        | 2.1885          | 0.984                                 |
+ | 0.9395 | 6900 | 1.8243        | 2.1865          | 0.9836                                |
+ | 0.9532 | 7000 | 1.8055        | 2.1852          | 0.9842                                |
+ | 0.9668 | 7100 | 1.8227        | 2.1843          | 0.984                                 |
+ | 0.9804 | 7200 | 1.8287        | 2.1831          | 0.984                                 |
+ | 0.9940 | 7300 | 1.8379        | 2.1838          | 0.9838                                |
+
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.2.0
+ - Transformers: 4.44.2
+ - PyTorch: 2.4.1+cu121
+ - Accelerate: 0.34.2
+ - Datasets: 3.0.1
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "microsoft/mpnet-base",
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.2.0",
+     "transformers": "4.44.2",
+     "pytorch": "2.4.1+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:948f8fffe8431527aec8961c9df8ffe97d5b0c89a8ae2c8f3a176336fa8885ab
+ size 437967672
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff