Update README.md
README.md
CHANGED
@@ -19,11 +19,16 @@ The model intended to be used for encoding sentences or short paragraphs. Given
 # Training data

 The model was trained on a random collection of **English** sentences from Wikipedia: [Training data file](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse/resolve/main/wiki1m_for_simcse.txt)
+Training data consists of data splits of different sizes (from 10% to 0.0064%) of the SimCSE training corpus. Each split size comprises 5 files, each created with a different seed.
+Data can be downloaded [here](https://huggingface.co/datasets/sap-ai-research/datasets-for-micse).

 # Model Training

 <mark>In order to make use of the **few-shot** capability of **miCSE**, the model needs to be trained on your data. The source code and instructions to do so will be provided shortly. Stay tuned :). </mark>

+## Training Data
+
+
 # Model Usage
 ### Example 1) - Sentence Similarity

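
The added README lines describe splits of several sizes, each drawn five times with a different seed. The sketch below shows one way such seeded subsets could be produced. It is an illustration under stated assumptions, not the procedure used to build the published files: the helper `sample_split`, the toy corpus, and the seed values 0 to 4 are all hypothetical.

```python
import random


def sample_split(sentences, fraction, seed):
    """Return a reproducible random subset holding `fraction` of `sentences`.

    Hypothetical helper for illustration; not taken from the miCSE code base.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one sentence so very small fractions still yield data.
    k = max(1, round(len(sentences) * fraction))
    rng = random.Random(seed)  # local generator, global random state untouched
    return rng.sample(sentences, k)  # sampling without replacement


# Toy corpus standing in for a one-sentence-per-line training file.
corpus = [f"sentence {i}" for i in range(1000)]

# Five seeds for one split size, mirroring the "5 files per size" layout.
splits = {seed: sample_split(corpus, 0.10, seed) for seed in range(5)}
```

Using a dedicated `random.Random(seed)` instance per split keeps each subset reproducible regardless of any other random calls made in the same program.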