sachindatasociety committed on
Commit
36300c2
·
verified ·
1 Parent(s): e60d3db

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
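With only `pooling_mode_cls_token` enabled, the sentence embedding is taken from the first token's vector rather than an average over tokens, and the `Normalize` module listed in `modules.json` then scales it to unit length. A minimal sketch of that idea in plain Python follows; the function names `cls_pool` and `l2_normalize` and the 4-dimensional toy vectors are assumptions for illustration, not the library's implementation (the real model uses 768 dimensions).

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector ([CLS]) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Toy input: three token vectors of dimension 4 (hypothetical values).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],  # a word piece
    [0.0, 2.0, 0.0, 2.0],  # [SEP]
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the output has unit length, cosine similarity between two such embeddings reduces to a dot product.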
README.md ADDED
@@ -0,0 +1,424 @@
1
+ ---
2
+ base_model: BAAI/bge-base-en-v1.5
3
+ library_name: sentence-transformers
4
+ pipeline_tag: sentence-similarity
5
+ tags:
6
+ - sentence-transformers
7
+ - sentence-similarity
8
+ - feature-extraction
9
+ - generated_from_trainer
10
+ - dataset_size:143
11
+ - loss:MultipleNegativesRankingLoss
12
+ widget:
13
+ - source_sentence: 'JSON APIs: Node.js'
14
+ sentences:
15
+ - 'Prerequisite course required: RESTful APIs: Node.js'
16
+ - 'Course Name:JSON APIs: Node.js|Course Description:An introduction to JSON API,
17
+ using Node.js.|Course language: JavaScript|Prerequisite course required: RESTful
18
+ APIs: Node.js|Target Audience:Professionals who would like to learn the core concepts
19
+ of JSON API, using Node.js.'
20
+ - An introduction to JSON API, using Node.js.
21
+ - 'Course language: JavaScript'
22
+ - Professionals who would like to learn the core concepts of JSON API, using Node.js.
23
+ - source_sentence: Enzyme
24
+ sentences:
25
+ - For anyone who has built an application in React and wants to test the React components
26
+ - A course that explores Enzyme, which is a JavaScript utility for React applications.
27
+ The course equips users to simulate runs and test React components' outputs.
28
+ - 'Prerequisite course required: React Testing Library'
29
+ - 'Course language: TBD'
30
+ - 'Course Name:Enzyme|Course Description:A course that explores Enzyme, which is
31
+ a JavaScript utility for React applications. The course equips users to simulate
32
+ runs and test React components'' outputs.|Course language: TBD|Prerequisite course
33
+ required: React Testing Library|Target Audience:For anyone who has built an application
34
+ in React and wants to test the React components'
35
+ - source_sentence: 'React Ecosystem: State Management & Redux'
36
+ sentences:
37
+ - 'Course Name:React Ecosystem: State Management & Redux|Course Description:A course
38
+ that builds on the React Ecosystem. It explains how state management works in
39
+ React and goes over the Redux state management library|Course language: JavaScript|Prerequisite
40
+ course required: React Ecosystem: Forms|Target Audience:Professionals who would
41
+ like to learn about state management in React'
42
+ - 'Course language: JavaScript'
43
+ - 'Prerequisite course required: React Ecosystem: Forms'
44
+ - A course that builds on the React Ecosystem. It explains how state management
45
+ works in React and goes over the Redux state management library
46
+ - Professionals who would like to learn about state management in React
47
+ - source_sentence: Ensemble Methods in Python
48
+ sentences:
49
+ - 'Course language: Python'
50
+ - 'Prerequisite course required: Decision Trees'
51
+ - This course covers an overview of ensemble learning methods like random forest
52
+ and boosting. At the end of this course, students will be able to implement and
53
+ compare random forest algorithm and boosting.
54
+ - Professionals with some experience in building basic algorithms who would like
55
+ to expand their skill set to more advanced Python classification techniques.
56
+ - 'Course Name:Ensemble Methods in Python|Course Description:This course covers
57
+ an overview of ensemble learning methods like random forest and boosting. At the
58
+ end of this course, students will be able to implement and compare random forest
59
+ algorithm and boosting.|Course language: Python|Prerequisite course required:
60
+ Decision Trees|Target Audience:Professionals with some experience in building
61
+ basic algorithms who would like to expand their skill set to more advanced Python
62
+ classification techniques.'
63
+ - source_sentence: Visualizing Data with Matplotlib in Python
64
+ sentences:
65
+ - Professionals with basic Python experience who would like to expand their skill
66
+ set to more Python visualization techniques and tools.
67
+ - 'Prerequisite course required: Intro to Python'
68
+ - 'Course language: Python'
69
+ - 'Course Name:Visualizing Data with Matplotlib in Python|Course Description:This
70
+ course covers the basics of data visualization and exploratory data analysis.
71
+ It helps students learn different plots and their use cases.|Course language:
72
+ Python|Prerequisite course required: Intro to Python|Target Audience:Professionals
73
+ with basic Python experience who would like to expand their skill set to more
74
+ Python visualization techniques and tools.'
75
+ - This course covers the basics of data visualization and exploratory data analysis.
76
+ It helps students learn different plots and their use cases.
77
+ ---
78
+
79
+ # SentenceTransformer based on BAAI/bge-base-en-v1.5
80
+
81
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
82
+
83
+ ## Model Details
84
+
85
+ ### Model Description
86
+ - **Model Type:** Sentence Transformer
87
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
88
+ - **Maximum Sequence Length:** 512 tokens
89
+ - **Output Dimensionality:** 768 dimensions
90
+ - **Similarity Function:** Cosine Similarity
91
+ <!-- - **Training Dataset:** Unknown -->
92
+ <!-- - **Language:** Unknown -->
93
+ <!-- - **License:** Unknown -->
94
+
95
+ ### Model Sources
96
+
97
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
98
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
99
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
100
+
101
+ ### Full Model Architecture
102
+
103
+ ```
104
+ SentenceTransformer(
105
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
106
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
107
+ (2): Normalize()
108
+ )
109
+ ```
110
+
111
+ ## Usage
112
+
113
+ ### Direct Usage (Sentence Transformers)
114
+
115
+ First install the Sentence Transformers library:
116
+
117
+ ```bash
118
+ pip install -U sentence-transformers
119
+ ```
120
+
121
+ Then you can load this model and run inference.
122
+ ```python
123
+ from sentence_transformers import SentenceTransformer
124
+
125
+ # Download from the 🤗 Hub
126
+ model = SentenceTransformer("datasocietyco/bge-base-en-v1.5-course-recommender-v2")
127
+ # Run inference
128
+ sentences = [
129
+ 'Visualizing Data with Matplotlib in Python',
130
+ 'This course covers the basics of data visualization and exploratory data analysis. It helps students learn different plots and their use cases.',
131
+ 'Course language: Python',
132
+ ]
133
+ embeddings = model.encode(sentences)
134
+ print(embeddings.shape)
135
+ # (3, 768)
136
+
137
+ # Get the similarity scores for the embeddings
138
+ similarities = model.similarity(embeddings, embeddings)
139
+ print(similarities.shape)
140
+ # torch.Size([3, 3])
141
+ ```
142
+
143
+ <!--
144
+ ### Direct Usage (Transformers)
145
+
146
+ <details><summary>Click to see the direct usage in Transformers</summary>
147
+
148
+ </details>
149
+ -->
150
+
151
+ <!--
152
+ ### Downstream Usage (Sentence Transformers)
153
+
154
+ You can finetune this model on your own dataset.
155
+
156
+ <details><summary>Click to expand</summary>
157
+
158
+ </details>
159
+ -->
160
+
161
+ <!--
162
+ ### Out-of-Scope Use
163
+
164
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
165
+ -->
166
+
167
+ <!--
168
+ ## Bias, Risks and Limitations
169
+
170
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
171
+ -->
172
+
173
+ <!--
174
+ ### Recommendations
175
+
176
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
177
+ -->
178
+
179
+ ## Training Details
180
+
181
+ ### Training Dataset
182
+
183
+ #### Unnamed Dataset
184
+
185
+
186
+ * Size: 143 training samples
187
+ * Columns: <code>name</code>, <code>description</code>, <code>languages</code>, <code>prerequisites</code>, <code>target_audience</code>, and <code>combined</code>
188
+ * Approximate statistics based on the first 143 samples:
189
+ | | name | description | languages | prerequisites | target_audience | combined |
190
+ |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
191
+ | type | string | string | string | string | string | string |
192
+ | details | <ul><li>min: 3 tokens</li><li>mean: 7.82 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 13 tokens</li><li>mean: 39.24 tokens</li><li>max: 117 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 6.57 tokens</li><li>max: 10 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 12.85 tokens</li><li>max: 22 tokens</li></ul> | <ul><li>min: 12 tokens</li><li>mean: 23.02 tokens</li><li>max: 54 tokens</li></ul> | <ul><li>min: 58 tokens</li><li>mean: 94.5 tokens</li><li>max: 187 tokens</li></ul> |
193
+ * Samples:
194
+ | name | description | languages | prerequisites | target_audience | combined |
195
+ |:----------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------|:-------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
196
+ | <code>Reinforcement Learning</code> | <code>This course covers the specialized branch of machine learning and deep learning called reinforcement learning (RL). By the end of this course students will be able to define RL use cases and real world scenarios where RL models are used, they will be able to create a simple RL model and evaluate its performance.</code> | <code>Course language: Python</code> | <code>Prerequisite course required: Working with Complex Pre-trained CNNs in Python</code> | <code>Professionals some Python experience who would like to expand their skillset to more advanced machine learning algorithms for reinforcement learning.</code> | <code>Course Name:Reinforcement Learning|Course Description:This course covers the specialized branch of machine learning and deep learning called reinforcement learning (RL). By the end of this course students will be able to define RL use cases and real world scenarios where RL models are used, they will be able to create a simple RL model and evaluate its performance.|Course language: Python|Prerequisite course required: Working with Complex Pre-trained CNNs in Python|Target Audience:Professionals some Python experience who would like to expand their skillset to more advanced machine learning algorithms for reinforcement learning.</code> |
197
+ | <code>Optimizing Ensemble Methods in Python</code> | <code>This course covers advanced topics in optimizing ensemble learning methods – specifically random forest and gradient boosting. Students will learn to implement base models and perform hyperparameter tuning to enhance the performance of models.</code> | <code>Course language: Python</code> | <code>Prerequisite course required: Ensemble Methods in Python</code> | <code>Professionals experience in ensemble methods and who want to enhance their skill set in advanced Python classification techniques.</code> | <code>Course Name:Optimizing Ensemble Methods in Python|Course Description:This course covers advanced topics in optimizing ensemble learning methods – specifically random forest and gradient boosting. Students will learn to implement base models and perform hyperparameter tuning to enhance the performance of models.|Course language: Python|Prerequisite course required: Ensemble Methods in Python|Target Audience:Professionals experience in ensemble methods and who want to enhance their skill set in advanced Python classification techniques.</code> |
198
+ | <code>Fundamentals of Accelerated Computing with OpenACC</code> | <code>Find out how to write and configure code parallelization with OpenACC, optimize memory movements between the CPU and GPU accelerator, and apply the techniques to accelerate a CPU-only Laplace Heat Equation to achieve performance gains.</code> | <code>Course language: Python</code> | <code>No prerequisite course required</code> | <code>Professionals who want to learn how to write code, configure code parallelization with OpenACC, optimize memory movements between the CPU and GPU accelerator, and implement the workflow learnt for massive performance gains.</code> | <code>Course Name:Fundamentals of Accelerated Computing with OpenACC|Course Description:Find out how to write and configure code parallelization with OpenACC, optimize memory movements between the CPU and GPU accelerator, and apply the techniques to accelerate a CPU-only Laplace Heat Equation to achieve performance gains.|Course language: Python|No prerequisite course required|Target Audience:Professionals who want to learn how to write code, configure code parallelization with OpenACC, optimize memory movements between the CPU and GPU accelerator, and implement the workflow learnt for massive performance gains.</code> |
199
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
200
+ ```json
201
+ {
202
+ "scale": 20.0,
203
+ "similarity_fct": "cos_sim"
204
+ }
205
+ ```
206
+
207
+ ### Evaluation Dataset
208
+
209
+ #### Unnamed Dataset
210
+
211
+
212
+ * Size: 36 evaluation samples
213
+ * Columns: <code>name</code>, <code>description</code>, <code>languages</code>, <code>prerequisites</code>, <code>target_audience</code>, and <code>combined</code>
214
+ * Approximate statistics based on the first 36 samples:
215
+ | | name | description | languages | prerequisites | target_audience | combined |
216
+ |:--------|:---------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
217
+ | type | string | string | string | string | string | string |
218
+ | details | <ul><li>min: 3 tokens</li><li>mean: 7.92 tokens</li><li>max: 13 tokens</li></ul> | <ul><li>min: 13 tokens</li><li>mean: 46.39 tokens</li><li>max: 92 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 6.75 tokens</li><li>max: 10 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 13.47 tokens</li><li>max: 20 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 23.75 tokens</li><li>max: 54 tokens</li></ul> | <ul><li>min: 61 tokens</li><li>mean: 103.28 tokens</li><li>max: 165 tokens</li></ul> |
219
+ * Samples:
220
+ | name | description | languages | prerequisites | target_audience | combined |
221
+ |:----------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------|:----------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
222
+ | <code>Intro to CSS, Part 2</code> | <code>A course that continues to build on the foundational understanding of CSS syntax and allows students to work with responsive design and media queries.</code> | <code>Course language: CSS, HTML</code> | <code>Prerequisite course required: Intro to CSS, Part 1</code> | <code>Professionals who would like to continue learning the core concepts of CSS and be able to style simple web pages.</code> | <code>Course Name:Intro to CSS, Part 2|Course Description:A course that continues to build on the foundational understanding of CSS syntax and allows students to work with responsive design and media queries.|Course language: CSS, HTML|Prerequisite course required: Intro to CSS, Part 1|Target Audience:Professionals who would like to continue learning the core concepts of CSS and be able to style simple web pages.</code> |
223
+ | <code>Foundations of Statistics in Python</code> | <code>This course is designed for learners who would like to learn about statistics and apply it for decision-making. This course is a comprehensive review of statistical terms ranging from foundational (mean, median, mode, standard deviation, variance, covariance, correlation) to more complex concepts such as normality in data, confidence intervals, and p-values. Additional topics include how to calculate summary statistics and how to carry out hypothesis testing to inform decisions.</code> | <code>Course language: Python</code> | <code>Prerequisite course required: Intro to Visualization in Python</code> | <code>Professionals some Python experience who would like to expand their skill set to more advanced Python visualization techniques and tools.</code> | <code>Course Name:Foundations of Statistics in Python|Course Description:This course is designed for learners who would like to learn about statistics and apply it for decision-making. This course is a comprehensive review of statistical terms ranging from foundational (mean, median, mode, standard deviation, variance, covariance, correlation) to more complex concepts such as normality in data, confidence intervals, and p-values. Additional topics include how to calculate summary statistics and how to carry out hypothesis testing to inform decisions.|Course language: Python|Prerequisite course required: Intro to Visualization in Python|Target Audience:Professionals some Python experience who would like to expand their skill set to more advanced Python visualization techniques and tools.</code> |
224
+ | <code>Spherical k-Means and Hierarchical Clustering in R</code> | <code>This course covers the unsupervised learning method called clustering which is used to find patterns or groups in data without the need for labelled data. This course includes different methods of clustering on numerical data including density-based and hierarchical-based clustering and how to build, evaluate and interpret these models.</code> | <code>Course language: R</code> | <code>Prerequisite course required: Intro to Clustering in R</code> | <code>Professionals with some R experience who would like to expand their skillset to more clustering techniques like hierarchical clustering and DBSCAN.</code> | <code>Course Name:Spherical k-Means and Hierarchical Clustering in R|Course Description:This course covers the unsupervised learning method called clustering which is used to find patterns or groups in data without the need for labelled data. This course includes different methods of clustering on numerical data including density-based and hierarchical-based clustering and how to build, evaluate and interpret these models.|Course language: R|Prerequisite course required: Intro to Clustering in R|Target Audience:Professionals with some R experience who would like to expand their skillset to more clustering techniques like hierarchical clustering and DBSCAN.</code> |
225
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
226
+ ```json
227
+ {
228
+ "scale": 20.0,
229
+ "similarity_fct": "cos_sim"
230
+ }
231
+ ```
232
+
233
+ ### Training Hyperparameters
234
+ #### Non-Default Hyperparameters
235
+
236
+ - `eval_strategy`: steps
237
+ - `per_device_train_batch_size`: 16
238
+ - `per_device_eval_batch_size`: 16
239
+ - `learning_rate`: 3e-06
240
+ - `max_steps`: 64
241
+ - `warmup_ratio`: 0.1
242
+ - `batch_sampler`: no_duplicates
243
+
244
+ #### All Hyperparameters
245
+ <details><summary>Click to expand</summary>
246
+
247
+ - `overwrite_output_dir`: False
248
+ - `do_predict`: False
249
+ - `eval_strategy`: steps
250
+ - `prediction_loss_only`: True
251
+ - `per_device_train_batch_size`: 16
252
+ - `per_device_eval_batch_size`: 16
253
+ - `per_gpu_train_batch_size`: None
254
+ - `per_gpu_eval_batch_size`: None
255
+ - `gradient_accumulation_steps`: 1
256
+ - `eval_accumulation_steps`: None
257
+ - `torch_empty_cache_steps`: None
258
+ - `learning_rate`: 3e-06
259
+ - `weight_decay`: 0.0
260
+ - `adam_beta1`: 0.9
261
+ - `adam_beta2`: 0.999
262
+ - `adam_epsilon`: 1e-08
263
+ - `max_grad_norm`: 1.0
264
+ - `num_train_epochs`: 3.0
265
+ - `max_steps`: 64
266
+ - `lr_scheduler_type`: linear
267
+ - `lr_scheduler_kwargs`: {}
268
+ - `warmup_ratio`: 0.1
269
+ - `warmup_steps`: 0
270
+ - `log_level`: passive
271
+ - `log_level_replica`: warning
272
+ - `log_on_each_node`: True
273
+ - `logging_nan_inf_filter`: True
274
+ - `save_safetensors`: True
275
+ - `save_on_each_node`: False
276
+ - `save_only_model`: False
277
+ - `restore_callback_states_from_checkpoint`: False
278
+ - `no_cuda`: False
279
+ - `use_cpu`: False
280
+ - `use_mps_device`: False
281
+ - `seed`: 42
282
+ - `data_seed`: None
283
+ - `jit_mode_eval`: False
284
+ - `use_ipex`: False
285
+ - `bf16`: False
286
+ - `fp16`: False
287
+ - `fp16_opt_level`: O1
288
+ - `half_precision_backend`: auto
289
+ - `bf16_full_eval`: False
290
+ - `fp16_full_eval`: False
291
+ - `tf32`: None
292
+ - `local_rank`: 0
293
+ - `ddp_backend`: None
294
+ - `tpu_num_cores`: None
295
+ - `tpu_metrics_debug`: False
296
+ - `debug`: []
297
+ - `dataloader_drop_last`: False
298
+ - `dataloader_num_workers`: 0
299
+ - `dataloader_prefetch_factor`: None
300
+ - `past_index`: -1
301
+ - `disable_tqdm`: False
302
+ - `remove_unused_columns`: True
303
+ - `label_names`: None
304
+ - `load_best_model_at_end`: False
305
+ - `ignore_data_skip`: False
306
+ - `fsdp`: []
307
+ - `fsdp_min_num_params`: 0
308
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
309
+ - `fsdp_transformer_layer_cls_to_wrap`: None
310
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
311
+ - `deepspeed`: None
312
+ - `label_smoothing_factor`: 0.0
313
+ - `optim`: adamw_torch
314
+ - `optim_args`: None
315
+ - `adafactor`: False
316
+ - `group_by_length`: False
317
+ - `length_column_name`: length
318
+ - `ddp_find_unused_parameters`: None
319
+ - `ddp_bucket_cap_mb`: None
320
+ - `ddp_broadcast_buffers`: False
321
+ - `dataloader_pin_memory`: True
322
+ - `dataloader_persistent_workers`: False
323
+ - `skip_memory_metrics`: True
324
+ - `use_legacy_prediction_loop`: False
325
+ - `push_to_hub`: False
326
+ - `resume_from_checkpoint`: None
327
+ - `hub_model_id`: None
328
+ - `hub_strategy`: every_save
329
+ - `hub_private_repo`: False
330
+ - `hub_always_push`: False
331
+ - `gradient_checkpointing`: False
332
+ - `gradient_checkpointing_kwargs`: None
333
+ - `include_inputs_for_metrics`: False
334
+ - `eval_do_concat_batches`: True
335
+ - `fp16_backend`: auto
336
+ - `push_to_hub_model_id`: None
337
+ - `push_to_hub_organization`: None
338
+ - `mp_parameters`:
339
+ - `auto_find_batch_size`: False
340
+ - `full_determinism`: False
341
+ - `torchdynamo`: None
342
+ - `ray_scope`: last
343
+ - `ddp_timeout`: 1800
344
+ - `torch_compile`: False
345
+ - `torch_compile_backend`: None
346
+ - `torch_compile_mode`: None
347
+ - `dispatch_batches`: None
348
+ - `split_batches`: None
349
+ - `include_tokens_per_second`: False
350
+ - `include_num_input_tokens_seen`: False
351
+ - `neftune_noise_alpha`: None
352
+ - `optim_target_modules`: None
353
+ - `batch_eval_metrics`: False
354
+ - `eval_on_start`: False
355
+ - `use_liger_kernel`: False
356
+ - `eval_use_gather_object`: False
357
+ - `batch_sampler`: no_duplicates
358
+ - `multi_dataset_batch_sampler`: proportional
359
+
360
+ </details>
361
+
362
+ ### Training Logs
363
+ | Epoch | Step | Training Loss | Validation Loss |
364
+ |:------:|:----:|:-------------:|:------:|
365
+ | 2.2222 | 20 | 1.5188 | 1.1718 |
366
+ | 4.4444 | 40 | 1.0652 | 0.8327 |
367
+ | 6.6667 | 60 | 0.677 | 0.7192 |
368
+
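The epoch values in the log above go past `num_train_epochs: 3.0` because `max_steps: 64` takes precedence when it is set to a positive number. A rough check of the arithmetic, assuming 9 optimizer steps per epoch (143 samples in batches of 16, with the last partial batch kept since `dataloader_drop_last` is False) and assuming the warmup length is the ceiling of `max_steps * warmup_ratio`:

```python
import math

# Values taken from the training details in this README.
train_samples = 143
batch_size = 16
max_steps = 64
warmup_ratio = 0.1

# Assumption: one optimizer step per batch, partial last batch included.
steps_per_epoch = math.ceil(train_samples / batch_size)

# Epoch reached at each logged step.
epochs_at_log = [round(step / steps_per_epoch, 4) for step in (20, 40, 60)]

# Total epochs covered by the run, and the assumed warmup length.
total_epochs = max_steps / steps_per_epoch
warmup_steps = math.ceil(max_steps * warmup_ratio)

print(steps_per_epoch, epochs_at_log, round(total_epochs, 2), warmup_steps)
```

Under these assumptions the computed epochs match the logged 2.2222, 4.4444, and 6.6667, and the full run covers roughly seven passes over the 143 samples.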
369
+
370
+ ### Framework Versions
371
+ - Python: 3.9.13
372
+ - Sentence Transformers: 3.1.1
373
+ - Transformers: 4.45.1
374
+ - PyTorch: 2.2.2
375
+ - Accelerate: 0.34.2
376
+ - Datasets: 3.0.0
377
+ - Tokenizers: 0.20.0
378
+
379
+ ## Citation
380
+
381
+ ### BibTeX
382
+
383
+ #### Sentence Transformers
384
+ ```bibtex
385
+ @inproceedings{reimers-2019-sentence-bert,
386
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
387
+ author = "Reimers, Nils and Gurevych, Iryna",
388
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
389
+ month = "11",
390
+ year = "2019",
391
+ publisher = "Association for Computational Linguistics",
392
+ url = "https://arxiv.org/abs/1908.10084",
393
+ }
394
+ ```
395
+
396
+ #### MultipleNegativesRankingLoss
397
+ ```bibtex
398
+ @misc{henderson2017efficient,
399
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
400
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
401
+ year={2017},
402
+ eprint={1705.00652},
403
+ archivePrefix={arXiv},
404
+ primaryClass={cs.CL}
405
+ }
406
+ ```
407
+
408
+ <!--
409
+ ## Glossary
410
+
411
+ *Clearly define terms in order to be accessible across audiences.*
412
+ -->
413
+
414
+ <!--
415
+ ## Model Card Authors
416
+
417
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
418
+ -->
419
+
420
+ <!--
421
+ ## Model Card Contact
422
+
423
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
424
+ -->
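The README reports `MultipleNegativesRankingLoss` with `scale: 20.0` and `similarity_fct: cos_sim`. As a reading aid, here is a simplified sketch of that loss in plain Python: each anchor is scored against every candidate in the batch, the scores are multiplied by the scale, and cross-entropy is taken with the matching candidate as the target. The helper names `cos_sim` and `mnr_loss` and the 2-dimensional toy vectors are assumptions for illustration; the library version operates on tensors and also handles additional columns, which this sketch ignores.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positives[j] in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy batch (hypothetical embeddings): two anchors and their candidates.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own candidate
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the other candidate

loss_aligned = mnr_loss(anchors, aligned)
loss_swapped = mnr_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

The loss is near zero when each anchor is closest to its own candidate and close to the scale value (20) when the pairing is reversed, which is why larger batches with distinct rows (`batch_sampler: no_duplicates`) give the loss more negatives to separate.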
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.45.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.1",
+     "transformers": "4.45.1",
+     "pytorch": "2.2.2"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:79b4d26aaf77276af894a178bf468281f26c73ff30822e4261221540fe1a1991
+ size 437951328
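The LFS pointer size can be sanity-checked against `config.json`. The sketch below counts the parameters of a standard BERT encoder from the values in this commit and multiplies by 4 bytes for `float32`. It assumes the usual BERT layout (word, position, and token-type embeddings, 12 encoder layers with biases and LayerNorms, plus a pooler layer) and that nothing else is stored in the file; the variable names are illustrative.

```python
# Values copied from config.json in this commit.
vocab_size = 30522
hidden = 768
intermediate = 3072
layers = 12
max_positions = 512
type_vocab = 2

layer_norm = 2 * hidden  # weight + bias

# Word, position, and token-type embeddings plus the embedding LayerNorm.
embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

# Query, key, value, and output projections (weights + biases) plus LayerNorm.
attention = 4 * (hidden * hidden + hidden) + layer_norm

# Two feed-forward projections (weights + biases) plus LayerNorm.
feed_forward = (
    (hidden * intermediate + intermediate)
    + (intermediate * hidden + hidden)
    + layer_norm
)

# Assumption: the pooler dense layer is saved with the model.
pooler = hidden * hidden + hidden

total_params = embeddings + layers * (attention + feed_forward) + pooler
weight_bytes = total_params * 4   # float32 uses 4 bytes per parameter
file_bytes = 437951328            # size from the LFS pointer above
header_bytes = file_bytes - weight_bytes

print(total_params, weight_bytes, header_bytes)
```

Under these assumptions the weights account for all but about 22 KB of the file, a remainder that is plausibly the safetensors metadata header.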
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff