Update README.md
README.md
CHANGED
@@ -46,8 +46,6 @@ widget:
 - Un gato está mirando hacia la cámara también.
 - '"Sí, no deseo estar presente durante este testimonio", declaró tranquilamente
 Peterson, de 31 años, al juez cuando fue devuelto a su celda.'
-datasets:
-- clibrain/stsb_multi_es_aug_gpt3.5-turbo_2
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 metrics:
@@ -190,7 +188,7 @@ model-index:
 
 # SentenceTransformer based on nomic-ai/modernbert-embed-base
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) on the
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) on the stsb_multi_es_augmented (private) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
 ## Model Details
 
@@ -201,9 +199,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [n
 - **Output Dimensionality:** 768 dimensions
 - **Similarity Function:** Cosine Similarity
 - **Training Dataset:**
--
-<!-- - **Language:** Unknown -->
-<!-- - **License:** Unknown -->
+- Private stsb dataset
 
 ### Model Sources
 
@@ -307,9 +303,8 @@ You can finetune this model on your own dataset.
 
 ### Training Dataset
 
-####
+#### stsb_multi_es_augmented (private)
 
-* Dataset: [stsb_multi_es_aug_gpt3.5-turbo_2](https://huggingface.co/datasets/clibrain/stsb_multi_es_aug_gpt3.5-turbo_2) at [3567b77](https://huggingface.co/datasets/clibrain/stsb_multi_es_aug_gpt3.5-turbo_2/tree/3567b77024bc5cc6372e058c9f05107deb361664)
 * Size: 2,697 training samples
 * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
 * Approximate statistics based on the first 1000 samples:
@@ -347,9 +342,8 @@ You can finetune this model on your own dataset.
 
 ### Evaluation Dataset
 
-####
+#### stsb_multi_es_augmented (private)
 
-* Dataset: [stsb_multi_es_aug_gpt3.5-turbo_2](https://huggingface.co/datasets/clibrain/stsb_multi_es_aug_gpt3.5-turbo_2) at [3567b77](https://huggingface.co/datasets/clibrain/stsb_multi_es_aug_gpt3.5-turbo_2/tree/3567b77024bc5cc6372e058c9f05107deb361664)
 * Size: 697 evaluation samples
 * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
 * Approximate statistics based on the first 697 samples:
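The card text touched by this change says the model maps text to 768-dimensional vectors and compares them with cosine similarity. As a rough illustration of that scoring step only, here is a minimal sketch in plain Python. The two vectors are made-up placeholders, not output from this model, and `cosine_similarity` is a hypothetical helper written for this note, not a function from sentence-transformers.

```python
import math

EMBEDDING_DIM = 768  # output dimensionality stated in the model card


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Assumption: placeholder vectors standing in for two sentence embeddings.
# Real values would come from the model's encoding step, which is not shown.
emb_a = [1.0 if i % 2 == 0 else 0.0 for i in range(EMBEDDING_DIM)]
emb_b = [1.0] * EMBEDDING_DIM

score = cosine_similarity(emb_a, emb_b)
print(f"cosine similarity: {score:.4f}")
```

Scores near 1 indicate vectors pointing in nearly the same direction. With real embeddings, the `score` column of the dataset would be the natural value to compare such scores against.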