niting089 committed
Commit a8c5691 · verified · 1 Parent(s): a16dac0

Add new SentenceTransformer model.
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
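This configuration enables only `pooling_mode_cls_token`: the sentence embedding is the final hidden state of the `[CLS]` token rather than a mean over token embeddings. A minimal sketch of the equivalent computation in plain `transformers`, assuming you only want to reproduce the pooling step (the closing L2 normalization mirrors the `Normalize()` module registered in `modules.json` below):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("niting089/finetuned_arctic")
model = AutoModel.from_pretrained("niting089/finetuned_arctic")

batch = tokenizer(["an example sentence"], return_tensors="pt",
                  truncation=True, max_length=512)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state   # [batch, seq_len, 768]

cls_embedding = hidden[:, 0]                    # CLS pooling: take the first token
cls_embedding = torch.nn.functional.normalize(cls_embedding, p=2, dim=1)
```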
README.md ADDED
@@ -0,0 +1,618 @@
+ ---
+ base_model: Snowflake/snowflake-arctic-embed-m
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ - dot_accuracy@1
+ - dot_accuracy@3
+ - dot_accuracy@5
+ - dot_accuracy@10
+ - dot_precision@1
+ - dot_precision@3
+ - dot_precision@5
+ - dot_precision@10
+ - dot_recall@1
+ - dot_recall@3
+ - dot_recall@5
+ - dot_recall@10
+ - dot_ndcg@10
+ - dot_mrr@10
+ - dot_map@100
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:600
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: How does the Blueprint for an AI Bill of Rights aim to protect
+     the rights of the American public?
+   sentences:
+   - "and use prohibitions. You and your communities should be free from unchecked\
+     \ surveillance; surveillance \ntechnologies should be subject to heightened oversight\
+     \ that includes at least pre-deployment assessment of their \npotential harms\
+     \ and scope limits to protect privacy and civil liberties. Continuous surveillance\
+     \ and monitoring"
+   - "steps to move these principles into practice and promote common approaches that\
+     \ allow technological \ninnovation to flourish while protecting people from harm.\
+     \ \n9"
+   - "ABOUT THIS FRAMEWORK­­­­­\nThe Blueprint for an AI Bill of Rights is a set of\
+     \ five principles and associated practices to help guide the \ndesign, use, and\
+     \ deployment of automated systems to protect the rights of the American public\
+     \ in the age of \nartificial intel-ligence. Developed through extensive consultation\
+     \ with the American public, these principles are"
+ - source_sentence: How can organizations monitor the impact of proxy features on algorithmic
+     discrimination?
+   sentences:
+   - "sociodemographic variables that adjust or “correct” the algorithm’s output on\
+     \ the basis of a patient’s race or\nethnicity, which can lead to race-based health\
+     \ inequities.47\n25\nAlgorithmic \nDiscrimination \nProtections"
+   - "proxy; if needed, it may be possible to identify alternative attributes that\
+     \ can be used instead. At a minimum, \norganizations should ensure a proxy feature\
+     \ is not given undue weight and should monitor the system closely \nfor any resulting\
+     \ algorithmic discrimination. \n26\nAlgorithmic \nDiscrimination \nProtections"
+   - "velopment, and deployment of automated systems, and from the \ncompounded harm\
+     \ of its reuse. Independent evaluation and report­\ning that confirms that the\
+     \ system is safe and effective, including re­\nporting of steps taken to mitigate\
+     \ potential harms, should be per­\nformed and the results made public whenever\
+     \ possible. \n15"
+ - source_sentence: What measures can be taken to ensure that AI systems are designed
+     to be accessible for people with disabilities?
+   sentences:
+   - "potential for meaningful impact on people’s rights, opportunities, or access\
+     \ and include those to impacted \ncommunities that may not be direct users of\
+     \ the automated system, risks resulting from purposeful misuse of \nthe system,\
+     \ and other concerns identified via the consultation process. Assessment and,\
+     \ where possible, mea­"
+   - "and as a lifecycle minimum performance standard. Decision possibilities resulting\
+     \ from performance testing \nshould include the possibility of not deploying the\
+     \ system. \nRisk identification and mitigation. Before deployment, and in a proactive\
+     \ and ongoing manner, poten­\ntial risks of the automated system should be identified\
+     \ and mitigated. Identified risks should focus on the"
+   - "individuals \nand \ncommunities \nfrom algorithmic \ndiscrimination and to use\
+     \ and design systems in an equitable way. This protection should include proactive\
+     \ \nequity assessments as part of the system design, use of representative data\
+     \ and protection against proxies \nfor demographic features, ensuring accessibility\
+     \ for people with disabilities in design and development,"
+ - source_sentence: 'How should organizations address concerns raised during public
+     consultations regarding AI data processing and interpretation? '
+   sentences:
+   - "and testing and evaluation of AI technologies and systems. It is expected to\
+     \ be released in the winter of 2022-23. \n21"
+   - "provide guidance whenever automated systems can meaningfully impact the public’s\
+     \ rights, opportunities, \nor access to critical needs. \n3"
+   - "learning models or for other purposes, including how data sources were processed\
+     \ and interpreted, a \nsummary of what data might be missing, incomplete, or erroneous,\
+     \ and data relevancy justifications; the \nresults of public consultation such\
+     \ as concerns raised and any decisions made due to these concerns; risk"
+ - source_sentence: What role do ethical considerations play in the development and
+     implementation of automated systems?
+   sentences:
+   - "tial to meaningfully impact rights, opportunities, or access. Additionally, this\
+     \ framework does not analyze or \ntake a position on legislative and regulatory\
+     \ proposals in municipal, state, and federal government, or those in \nother countries.\
+     \ \nWe have seen modest progress in recent years, with some state and local governments\
+     \ responding to these prob­"
+   - '•
+ 
+     Searches for “Black girls,” “Asian girls,” or “Latina girls” return predominantly39
+     sexualized content, rather
+ 
+     than role models, toys, or activities.40 Some search engines have been working
+     to reduce the prevalence of
+ 
+     these results, but the problem remains.41
+ 
+ 
+ 
+     Advertisement delivery systems that predict who is most likely to click on a job
+     advertisement end up deliv-'
+   - "particularly relevant to automated systems, without articulating a specific set\
+     \ of FIPPs or scoping \napplicability or the interests served to a single particular\
+     \ domain, like privacy, civil rights and civil liberties, \nethics, or risk management.\
+     \ The Technical Companion builds on this prior work to provide practical next"
+ model-index:
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.83
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.96
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.98
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.99
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.83
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.31999999999999995
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19599999999999995
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09899999999999999
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.83
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.96
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.98
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.99
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9195971547817925
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.8960000000000001
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.8966666666666666
+       name: Cosine Map@100
+     - type: dot_accuracy@1
+       value: 0.83
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.96
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.98
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.99
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.83
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.31999999999999995
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.19599999999999995
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.09899999999999999
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.83
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.96
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.98
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.99
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.9195971547817925
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.8960000000000001
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.8966666666666666
+       name: Dot Map@100
+ ---
+ 
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m) <!-- at revision e2b128b9fa60c82b4585512b33e1544224ffff42 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("niting089/finetuned_arctic")
+ # Run inference
+ sentences = [
+     'What role do ethical considerations play in the development and implementation of automated systems?',
+     'particularly relevant to automated systems, without articulating a specific set of FIPPs or scoping \napplicability or the interests served to a single particular domain, like privacy, civil rights and civil liberties, \nethics, or risk management. The Technical Companion builds on this prior work to provide practical next',
+     '•\nSearches for “Black girls,” “Asian girls,” or “Latina girls” return predominantly39 sexualized content, rather\nthan role models, toys, or activities.40 Some search engines have been working to reduce the prevalence of\nthese results, but the problem remains.41\n•\nAdvertisement delivery systems that predict who is most likely to click on a job advertisement end up deliv-',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+ 
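The snippet above encodes every string the same way. For retrieval-style (query vs. passage) usage, note that `config_sentence_transformers.json` further down in this commit defines a `query` prompt ("Represent this sentence for searching relevant passages: "), inherited from the Arctic-embed base model; it can be applied with `encode(..., prompt_name="query")`. A short sketch with illustrative strings:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("niting089/finetuned_arctic")

# Queries receive the "query" prompt from config_sentence_transformers.json;
# passages are encoded without a prompt.
query_emb = model.encode(
    ["How does the AI Bill of Rights address surveillance?"], prompt_name="query"
)
passage_emb = model.encode(
    ["Surveillance technologies should be subject to heightened oversight."]
)
print(model.similarity(query_emb, passage_emb))  # 1x1 matrix of cosine scores
```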
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ 
+ #### Information Retrieval
+ 
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+ 
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.83       |
+ | cosine_accuracy@3   | 0.96       |
+ | cosine_accuracy@5   | 0.98       |
+ | cosine_accuracy@10  | 0.99       |
+ | cosine_precision@1  | 0.83       |
+ | cosine_precision@3  | 0.32       |
+ | cosine_precision@5  | 0.196      |
+ | cosine_precision@10 | 0.099      |
+ | cosine_recall@1     | 0.83       |
+ | cosine_recall@3     | 0.96       |
+ | cosine_recall@5     | 0.98       |
+ | cosine_recall@10    | 0.99       |
+ | cosine_ndcg@10      | 0.9196     |
+ | cosine_mrr@10       | 0.896      |
+ | **cosine_map@100**  | **0.8967** |
+ | dot_accuracy@1      | 0.83       |
+ | dot_accuracy@3      | 0.96       |
+ | dot_accuracy@5      | 0.98       |
+ | dot_accuracy@10     | 0.99       |
+ | dot_precision@1     | 0.83       |
+ | dot_precision@3     | 0.32       |
+ | dot_precision@5     | 0.196      |
+ | dot_precision@10    | 0.099      |
+ | dot_recall@1        | 0.83       |
+ | dot_recall@3        | 0.96       |
+ | dot_recall@5        | 0.98       |
+ | dot_recall@10       | 0.99       |
+ | dot_ndcg@10         | 0.9196     |
+ | dot_mrr@10          | 0.896      |
+ | dot_map@100         | 0.8967     |
+ 
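Every `dot_*` value above matches its `cosine_*` counterpart. That is expected rather than a reporting error: the model ends in a `Normalize()` module, so all embeddings are unit-length, and on unit vectors the dot product equals the cosine similarity. A quick sanity check with illustrative sentences:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("niting089/finetuned_arctic")
emb = model.encode(["a sample sentence", "another sample sentence"])

print(np.linalg.norm(emb, axis=1))  # ~[1. 1.]: embeddings are unit length
print(float(emb[0] @ emb[1]))       # dot product == cosine similarity here
```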
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 600 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 600 samples:
+   |         | sentence_0 | sentence_1 |
+   |:--------|:-----------|:-----------|
+   | type    | string     | string     |
+   | details | <ul><li>min: 11 tokens</li><li>mean: 19.86 tokens</li><li>max: 36 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 60.47 tokens</li><li>max: 94 tokens</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:-----------|:-----------|
+   | <code>What are the key principles outlined in the AI Bill of Rights aimed at ensuring automated systems benefit the American people? </code> | <code>BLUEPRINT FOR AN <br>AI BILL OF <br>RIGHTS <br>MAKING AUTOMATED <br>SYSTEMS WORK FOR <br>THE AMERICAN PEOPLE <br>OCTOBER 2022</code> |
+   | <code>How does the AI Bill of Rights address potential ethical concerns related to automated decision-making systems?</code> | <code>BLUEPRINT FOR AN <br>AI BILL OF <br>RIGHTS <br>MAKING AUTOMATED <br>SYSTEMS WORK FOR <br>THE AMERICAN PEOPLE <br>OCTOBER 2022</code> |
+   | <code>What is the purpose of the Blueprint for an AI Bill of Rights as outlined by the White House Office of Science and Technology Policy? </code> | <code>About this Document <br>The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People was <br>published by the White House Office of Science and Technology Policy in October 2022. This framework was <br>released one year after OSTP announced the launch of a process to develop “a bill of rights for an AI-powered</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
+ 
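Because training used `MatryoshkaLoss` over dimensions [768, 512, 256, 128, 64], embeddings can be truncated to any of those sizes for cheaper storage and search, at some cost in quality. A sketch using the `truncate_dim` argument available in recent Sentence Transformers releases (if you instead truncate vectors manually, re-normalize them before computing dot-product scores):

```python
from sentence_transformers import SentenceTransformer

# Load the model so that encode() returns 64-dimensional Matryoshka embeddings
model_64 = SentenceTransformer("niting089/finetuned_arctic", truncate_dim=64)
emb = model_64.encode(["a sample sentence"])
print(emb.shape)  # (1, 64)
```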
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 20
+ - `per_device_eval_batch_size`: 20
+ - `num_train_epochs`: 5
+ - `multi_dataset_batch_sampler`: round_robin
+ 
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 20
+ - `per_device_eval_batch_size`: 20
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 5
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+ 
+ </details>
+ 
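For reference, a hedged sketch of how a run with these hyperparameters could be reproduced using the Sentence Transformers 3.x trainer API. The dataset rows below are placeholders for the 600 `sentence_0`/`sentence_1` pairs described above, and the step-wise evaluator implied by `eval_strategy: steps` is omitted:

```python
from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

# Placeholder rows; the actual training set pairs 600 generated questions with passages.
train_dataset = Dataset.from_dict({
    "sentence_0": ["an example question"],
    "sentence_1": ["an example passage"],
})

# MultipleNegativesRankingLoss wrapped in MatryoshkaLoss, as listed in the card
loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[768, 512, 256, 128, 64],
)

args = SentenceTransformerTrainingArguments(
    output_dir="finetuned_arctic",
    num_train_epochs=5,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
)
SentenceTransformerTrainer(model=model, args=args,
                           train_dataset=train_dataset, loss=loss).train()
```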
+ ### Training Logs
+ | Epoch  | Step | cosine_map@100 |
+ |:------:|:----:|:--------------:|
+ | 1.0    | 30   | 0.8731         |
+ | 1.6667 | 50   | 0.89           |
+ | 2.0    | 60   | 0.895          |
+ | 3.0    | 90   | 0.8959         |
+ | 3.3333 | 100  | 0.8967         |
+ 
+ 
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.1.1
+ - Transformers: 4.44.2
+ - PyTorch: 2.4.1+cu121
+ - Accelerate: 0.34.2
+ - Datasets: 3.0.0
+ - Tokenizers: 0.19.1
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+ 
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "finetuned_arctic",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.1",
+     "transformers": "4.44.2",
+     "pytorch": "2.4.1+cu121"
+   },
+   "prompts": {
+     "query": "Represent this sentence for searching relevant passages: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5de40de0b87a0fa4602678a65b38d93016dd2b71825d08129ceabce8c675f6b5
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
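The three entries above mirror the `Transformer → Pooling → Normalize` pipeline shown in the README; `SentenceTransformer("niting089/finetuned_arctic")` assembles it automatically on load. A sketch of the equivalent manual construction with `sentence_transformers.models`:

```python
from sentence_transformers import SentenceTransformer, models

# Manual assembly mirroring modules.json (normally done automatically on load)
word = models.Transformer("niting089/finetuned_arctic", max_seq_length=512)
pool = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="cls")
model = SentenceTransformer(modules=[word, pool, models.Normalize()])
```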
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff