TJKlein committed on
Commit 14068ab · 1 Parent(s): 9975451

Update README.md

  1. README.md +5 -0
README.md CHANGED
@@ -19,11 +19,16 @@ The model intended to be used for encoding sentences or short paragraphs. Given
  # Training data

  The model was trained on a random collection of **English** sentences from Wikipedia: [Training data file](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse/resolve/main/wiki1m_for_simcse.txt)
+ Training data consists of data splits of different sizes (from 10% to 0.0064%) of the SimCSE training corpus. Each split size comprises 5 files, each created with a different seed.
+ Data can be downloaded [here](https://huggingface.co/datasets/sap-ai-research/datasets-for-micse).

  # Model Training

  <mark>In order to make use of the **few-shot** capability of **miCSE**, the mode needs to be trained on your data. The source code and instructions to do so will be provided shortly. Stay tuned :). </mark>

+ ## Training Data
+
+
  # Model Usage
  ### Example 1) - Sentence Similarity