mogmix committed
Commit 45d5f8b · verified · 1 parent: 2f832b4

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "word_embedding_dimension": 768,
+     "pooling_mode_cls_token": true,
+     "pooling_mode_mean_tokens": false,
+     "pooling_mode_max_tokens": false,
+     "pooling_mode_mean_sqrt_len_tokens": false,
+     "pooling_mode_weightedmean_tokens": false,
+     "pooling_mode_lasttoken": false,
+     "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,731 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:6300
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: BAAI/bge-base-en-v1.5
+ widget:
+ - source_sentence: Balance as of December 31, 2023 for Medicaid and Medicare Rebates
+     was $5,297 million, for Managed Care Rebates was $7,020 million, and for Wholesaler
+     Chargebacks was $1,172 million.
+   sentences:
+   - What can Membership Rewards points be redeemed for?
+   - What were the ending balances for Medicaid and Medicare Rebates, Managed Care
+     Rebates, and Wholesaler Chargebacks as of December 31, 2023?
+   - What was the percentage increase in the general and administrative expenses from
+     the fiscal year ending on October 2, 2022, to the fiscal year ending on October
+     1, 2023?
+ - source_sentence: In analyzing goodwill for potential impairment in the quantitative
+     impairment test, the company uses the market approach, when available and appropriate,
+     or a combination of the income and market approaches to estimate the reporting
+     unit’s fair value.
+   sentences:
+   - What is the purpose of Visa according to the overview provided?
+   - What approaches does the company use to analyze goodwill for potential impairment
+     in the quantitative impairment test?
+   - What method is used to record amortization and costs for owned content that is
+     predominantly monetized on an individual basis?
+ - source_sentence: This report includes forward-looking statements within the meaning
+     of the Private Securities Litigation Reform Act of 1995, which are subject to
+     risks and uncertainties.
+   sentences:
+   - What are forward-looking statements in financial reports?
+   - What percentage of the Pharmacy & Consumer Wellness segment's revenues did the
+     pharmacy category constitute in 2023?
+   - What are the depreciation methods and useful life estimates for buildings, furniture,
+     and computer equipment as mentioned in the company's accounting policies?
+ - source_sentence: We would use the net proceeds from the sale of any securities offered
+     pursuant to the shelf registration statement for general corporate purposes, which
+     may include funding for working capital, financing capital expenditures, research
+     and development, and potential acquisitions or strategic alliances.
+   sentences:
+   - What measures does Goldman Sachs employ to handle their cyber incident response?
+   - What awards did the company receive in 2022 for environmental and safety achievements?
+   - How are the proceeds from the shelf registration statement planned to be used?
+ - source_sentence: We use a variety of practices to measure and support progress against
+     these growth behaviors and to ensure that our employees are engaged and fulfilled
+     at work.
+   sentences:
+   - How does the company measure and support employee engagement and cultural growth?
+   - How does the company's membership format affect its profitability?
+   - What is the maximum additional exclusivity period granted by the FDA for approved
+     drugs that undergo pediatric testing?
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: BGE base Financial Matryoshka
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 768
+       type: dim_768
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.7071428571428572
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8314285714285714
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8728571428571429
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9228571428571428
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.7071428571428572
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.27714285714285714
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17457142857142854
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09228571428571428
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.7071428571428572
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8314285714285714
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8728571428571429
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9228571428571428
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8152573597721203
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7808815192743759
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7835857411528796
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 512
+       type: dim_512
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6971428571428572
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8328571428571429
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8742857142857143
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9157142857142857
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6971428571428572
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2776190476190476
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17485714285714285
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09157142857142857
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6971428571428572
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8328571428571429
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8742857142857143
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9157142857142857
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8089182108201057
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7743531746031744
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.777472809187461
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 256
+       type: dim_256
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6957142857142857
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.83
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.87
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.91
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6957142857142857
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.27666666666666667
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.174
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09099999999999998
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6957142857142857
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.83
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.87
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.91
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8052344976922489
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7713877551020404
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7749003964653882
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 128
+       type: dim_128
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6828571428571428
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8257142857142857
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8528571428571429
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9071428571428571
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6828571428571428
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2752380952380953
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.17057142857142854
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09071428571428569
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6828571428571428
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8257142857142857
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8528571428571429
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9071428571428571
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7972100056891113
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7619444444444445
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7654665230481205
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 64
+       type: dim_64
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6371428571428571
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8042857142857143
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8428571428571429
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8814285714285715
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6371428571428571
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2680952380952381
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16857142857142854
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08814285714285712
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6371428571428571
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8042857142857143
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8428571428571429
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8814285714285715
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7645594630559873
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7265028344671197
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7306525198080603
+       name: Cosine Map@100
+ ---
+ 
+ # BGE base Financial Matryoshka
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - json
+ - **Language:** en
+ - **License:** apache-2.0
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("mogmix/bge-base-financial-matryoshka")
+ # Run inference
+ sentences = [
+     'We use a variety of practices to measure and support progress against these growth behaviors and to ensure that our employees are engaged and fulfilled at work.',
+     'How does the company measure and support employee engagement and cultural growth?',
+     "How does the company's membership format affect its profitability?",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+ 
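+ Because the model's final module is `Normalize()`, every embedding it returns is unit-length, so cosine similarity is just a dot product. A minimal numpy sketch of that equivalence, using random stand-in vectors rather than real model outputs:
+ 
+ ```python
+ import numpy as np
+ 
+ # Stand-ins for model.encode() output: 3 vectors of dimension 768.
+ # (Hypothetical random data, not actual model embeddings.)
+ rng = np.random.default_rng(0)
+ embeddings = rng.normal(size=(3, 768))
+ 
+ # Mimic the Normalize() module: scale each row to unit length.
+ embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
+ 
+ # For unit vectors, cosine similarity == dot product.
+ similarities = embeddings @ embeddings.T
+ 
+ print(similarities.shape)                        # (3, 3)
+ print(np.allclose(np.diag(similarities), 1.0))   # True: self-similarity is 1
+ ```
+ 
+ This is why `model.similarity` defaults to cosine here (`similarity_fn_name: cosine` in `config_sentence_transformers.json`).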
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ 
+ #### Information Retrieval
+ 
+ * Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+ 
+ | Metric              | dim_768    | dim_512    | dim_256    | dim_128    | dim_64     |
+ |:--------------------|:-----------|:-----------|:-----------|:-----------|:-----------|
+ | cosine_accuracy@1   | 0.7071     | 0.6971     | 0.6957     | 0.6829     | 0.6371     |
+ | cosine_accuracy@3   | 0.8314     | 0.8329     | 0.83       | 0.8257     | 0.8043     |
+ | cosine_accuracy@5   | 0.8729     | 0.8743     | 0.87       | 0.8529     | 0.8429     |
+ | cosine_accuracy@10  | 0.9229     | 0.9157     | 0.91       | 0.9071     | 0.8814     |
+ | cosine_precision@1  | 0.7071     | 0.6971     | 0.6957     | 0.6829     | 0.6371     |
+ | cosine_precision@3  | 0.2771     | 0.2776     | 0.2767     | 0.2752     | 0.2681     |
+ | cosine_precision@5  | 0.1746     | 0.1749     | 0.174      | 0.1706     | 0.1686     |
+ | cosine_precision@10 | 0.0923     | 0.0916     | 0.091      | 0.0907     | 0.0881     |
+ | cosine_recall@1     | 0.7071     | 0.6971     | 0.6957     | 0.6829     | 0.6371     |
+ | cosine_recall@3     | 0.8314     | 0.8329     | 0.83       | 0.8257     | 0.8043     |
+ | cosine_recall@5     | 0.8729     | 0.8743     | 0.87       | 0.8529     | 0.8429     |
+ | cosine_recall@10    | 0.9229     | 0.9157     | 0.91       | 0.9071     | 0.8814     |
+ | **cosine_ndcg@10**  | **0.8153** | **0.8089** | **0.8052** | **0.7972** | **0.7646** |
+ | cosine_mrr@10       | 0.7809     | 0.7744     | 0.7714     | 0.7619     | 0.7265     |
+ | cosine_map@100      | 0.7836     | 0.7775     | 0.7749     | 0.7655     | 0.7307     |
+ 
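+ For intuition, each of these retrieval metrics can be computed from the rank at which a query's single relevant document is retrieved. A toy sketch with hypothetical ranks (not the actual evaluation data):
+ 
+ ```python
+ # Each query has exactly one relevant document; record the 1-based rank
+ # at which it was retrieved. (Hypothetical ranks for five toy queries.)
+ ranks = [1, 3, 1, 2, 11]
+ 
+ def accuracy_at_k(ranks, k):
+     # Fraction of queries whose relevant document appears in the top k.
+     return sum(r <= k for r in ranks) / len(ranks)
+ 
+ def mrr_at_k(ranks, k):
+     # Mean reciprocal rank, scoring 0 when the hit falls outside the top k.
+     return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)
+ 
+ print(accuracy_at_k(ranks, 1))   # 0.4
+ print(accuracy_at_k(ranks, 10))  # 0.8
+ print(mrr_at_k(ranks, 10))       # (1 + 1/3 + 1 + 1/2 + 0) / 5 ≈ 0.5667
+ ```
+ 
+ With one relevant document per query, recall@k equals accuracy@k (and precision@k is accuracy@k divided by k), which is why those rows coincide in the table above.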
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### json
+ 
+ * Dataset: json
+ * Size: 6,300 training samples
+ * Columns: <code>positive</code> and <code>anchor</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | positive                                                                           | anchor                                                                            |
+   |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+   | type    | string                                                                              | string                                                                            |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 45.46 tokens</li><li>max: 439 tokens</li></ul>  | <ul><li>min: 7 tokens</li><li>mean: 20.55 tokens</li><li>max: 41 tokens</li></ul> |
+ * Samples:
+   | positive | anchor |
+   |:---------|:-------|
+   | <code>We believe our residential connectivity revenue will increase as a result of growth in average domestic broadband revenue per customer, as well as increases in domestic wireless and international connectivity revenue.</code> | <code>What are the projected trends for Comcast's residential connectivity revenue in 2023?</code> |
+   | <code>The company's Artificial Intelligence Platform (AIP) leverages machine learning technologies and LLMs within the Gotham and Foundry platforms to connect AI with enterprise data, aiding in decision-making processes.</code> | <code>How does the company integrate large language models with its software platforms?</code> |
+   | <code>The impairment charges for Depop and Elo7 were influenced by factors such as macroeconomic conditions including reopening and inflation, as well as management changes and revised projected cash flows affecting their fair values.</code> | <code>What factors contributed to the impairment charges for Depop and Elo7 in 2022?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
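+ The `matryoshka_dims` above mean the model is trained so that prefixes of each embedding remain useful on their own: at inference time you can keep only the first k dimensions and re-normalize (sentence-transformers exposes this via the `truncate_dim` argument of `SentenceTransformer`). A numpy sketch with random stand-in vectors rather than real embeddings:
+ 
+ ```python
+ import numpy as np
+ 
+ def truncate_and_normalize(emb: np.ndarray, dim: int) -> np.ndarray:
+     # Keep the first `dim` dimensions, then rescale rows to unit length
+     # so cosine similarity still works on the truncated embeddings.
+     truncated = emb[:, :dim]
+     return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
+ 
+ # Hypothetical unit-normalized 768-dim embeddings for two texts.
+ rng = np.random.default_rng(42)
+ full = rng.normal(size=(2, 768))
+ full /= np.linalg.norm(full, axis=1, keepdims=True)
+ 
+ for dim in (768, 512, 256, 128, 64):  # the matryoshka_dims above
+     small = truncate_and_normalize(full, dim)
+     print(dim, small.shape)  # each row stays unit-length at every dim
+ ```
+ 
+ The per-dimension metrics reported under Evaluation correspond to exactly this truncation scheme.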
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 4
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `tf32`: True
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+ 
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 16
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: True
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ 
+ </details>
+ 
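+ Note that `per_device_train_batch_size: 32` combined with `gradient_accumulation_steps: 16` implies an effective batch size of 512 per optimizer step, assuming a single device (the card does not state the device count):
+ 
+ ```python
+ # Effective train batch size implied by the hyperparameters above.
+ per_device_train_batch_size = 32
+ gradient_accumulation_steps = 16
+ num_devices = 1  # assumption; not stated in the card
+ 
+ effective_batch_size = (
+     per_device_train_batch_size * gradient_accumulation_steps * num_devices
+ )
+ print(effective_batch_size)  # 512
+ 
+ # Sanity check against the 6,300-sample training set:
+ # ceil(6300 / 512) = 13 optimizer steps per epoch, matching the
+ # step counts recorded at epoch boundaries in the training logs.
+ steps_per_epoch = -(-6300 // effective_batch_size)  # ceiling division
+ print(steps_per_epoch)  # 13
+ ```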
+ ### Training Logs
+ | Epoch     | Step   | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
+ |:---------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
+ | 0.8122    | 10     | 1.5675        | -                      | -                      | -                      | -                      | -                     |
+ | 1.0       | 13     | -             | 0.8000                 | 0.7975                 | 0.7897                 | 0.7811                 | 0.7419                |
+ | 1.5685    | 20     | 0.6203        | -                      | -                      | -                      | -                      | -                     |
+ | 2.0       | 26     | -             | 0.8114                 | 0.8063                 | 0.8044                 | 0.7928                 | 0.7599                |
+ | 2.3249    | 30     | 0.4678        | -                      | -                      | -                      | -                      | -                     |
+ | 3.0       | 39     | -             | 0.8152                 | 0.8092                 | 0.8046                 | 0.7967                 | 0.7660                |
+ | 3.0812    | 40     | 0.4106        | -                      | -                      | -                      | -                      | -                     |
+ | **3.731** | **48** | **-**         | **0.8153**             | **0.8089**             | **0.8052**             | **0.7972**             | **0.7646**            |
+ 
+ * The bold row denotes the saved checkpoint.
+ 
+ ### Framework Versions
+ - Python: 3.12.7
+ - Sentence Transformers: 3.3.1
+ - Transformers: 4.47.0
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.2.1
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+ 
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+     "_name_or_path": "BAAI/bge-base-en-v1.5",
+     "architectures": [
+         "BertModel"
+     ],
+     "attention_probs_dropout_prob": 0.1,
+     "classifier_dropout": null,
+     "gradient_checkpointing": false,
+     "hidden_act": "gelu",
+     "hidden_dropout_prob": 0.1,
+     "hidden_size": 768,
+     "id2label": {
+         "0": "LABEL_0"
+     },
+     "initializer_range": 0.02,
+     "intermediate_size": 3072,
+     "label2id": {
+         "LABEL_0": 0
+     },
+     "layer_norm_eps": 1e-12,
+     "max_position_embeddings": 512,
+     "model_type": "bert",
+     "num_attention_heads": 12,
+     "num_hidden_layers": 12,
+     "pad_token_id": 0,
+     "position_embedding_type": "absolute",
+     "torch_dtype": "float32",
+     "transformers_version": "4.47.0",
+     "type_vocab_size": 2,
+     "use_cache": true,
+     "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "__version__": {
+         "sentence_transformers": "3.3.1",
+         "transformers": "4.47.0",
+         "pytorch": "2.5.1+cu124"
+     },
+     "prompts": {},
+     "default_prompt_name": null,
+     "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b7df39b6fdc4e68cd753a4801f4f167727e22567db2c1a61732f607cf4468b9b
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+     {
+         "idx": 0,
+         "name": "0",
+         "path": "",
+         "type": "sentence_transformers.models.Transformer"
+     },
+     {
+         "idx": 1,
+         "name": "1",
+         "path": "1_Pooling",
+         "type": "sentence_transformers.models.Pooling"
+     },
+     {
+         "idx": 2,
+         "name": "2",
+         "path": "2_Normalize",
+         "type": "sentence_transformers.models.Normalize"
+     }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+     "max_seq_length": 512,
+     "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+     "cls_token": {
+         "content": "[CLS]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "mask_token": {
+         "content": "[MASK]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "pad_token": {
+         "content": "[PAD]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "sep_token": {
+         "content": "[SEP]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "unk_token": {
+         "content": "[UNK]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+     "added_tokens_decoder": {
+         "0": {
+             "content": "[PAD]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "100": {
+             "content": "[UNK]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "101": {
+             "content": "[CLS]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "102": {
+             "content": "[SEP]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "103": {
+             "content": "[MASK]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         }
+     },
+     "clean_up_tokenization_spaces": true,
+     "cls_token": "[CLS]",
+     "do_basic_tokenize": true,
+     "do_lower_case": true,
+     "extra_special_tokens": {},
+     "mask_token": "[MASK]",
+     "model_max_length": 512,
+     "never_split": null,
+     "pad_token": "[PAD]",
+     "sep_token": "[SEP]",
+     "strip_accents": null,
+     "tokenize_chinese_chars": true,
+     "tokenizer_class": "BertTokenizer",
+     "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff