jondurbin committed
Commit 258bef4 · verified · Parent: 582b769

Update README.md

Files changed (1): README.md (+6 -4)
README.md CHANGED
````diff
@@ -163,7 +163,7 @@ __*Only train splits are used, and a decontamination by cosine similarity is per
 
 ## Prompt formatting
 
-In sticking with the theme of the bagel, I didn't want to use a single prompt format, so I used 4 - vicuna, llama-2, alpaca, and chat-ml.
+In sticking with the theme of the bagel, I didn't want to use a single prompt format, so I used 4 - vicuna, llama-2, alpaca, and a modified chat-ml.
 I also didn't want to randomly select a single prompt format for each item (hoping each instruction would generalize more when used in a variety of prompt formats), so each instruction is converted into every prompt format (with 0.75 probability).
 
 This means each epoch of our fine-tune is the equivalent of 3 epochs.
````
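The 0.75-probability expansion in this hunk is easy to misread, so here is a minimal sketch of the sampling it describes; `convert_to`, `expand_item`, and the rendered strings are hypothetical stand-ins, not bagel's actual code. Each instruction is rendered in all four formats, and each rendering is kept independently with probability 0.75, so an item yields 4 × 0.75 = 3 expected copies, which is where "each epoch ... is the equivalent of 3 epochs" comes from.

```python
import random

PROMPT_FORMATS = ["vicuna", "llama-2", "alpaca", "chat-ml"]

def convert_to(item: dict, fmt: str) -> str:
    """Hypothetical stand-in for the per-format templating."""
    return f"[{fmt}] {item['instruction']}\n{item['response']}"

def expand_item(item: dict, keep_prob: float = 0.75) -> list[str]:
    # Render the instruction in every prompt format, keeping each
    # rendering independently with probability keep_prob; expected
    # copies per item = 4 * 0.75 = 3, i.e. roughly 3 epochs of the
    # original data per pass over the expanded data.
    return [
        convert_to(item, fmt)
        for fmt in PROMPT_FORMATS
        if random.random() < keep_prob
    ]
```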
````diff
@@ -223,12 +223,14 @@ print(tokenizer.apply_chat_template(chat, tokenize=False))
 </details>
 
 <details>
-<summary><b>ChatML</b></summary>
+<summary><b>ChatML (sort of)</b></summary>
+
+ChatML special tokens are really obnoxious, so instead of enlarging the tokenizer and embedding layers (which decreases performance and causes inference problems in tensor parallelism), I just use BOS and EOS tokens instead of `<|im_start|>` and `<|im_end|>` - and no, I won't change this.
 
 ```text
-{bos}<|im_start|>{role}
+{bos}{role}
 {text}
-<|im_end|>{eos}
+{eos}
 ```
 </details>
 
````
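To make the new template concrete, here is a minimal sketch of what a rendered conversation could look like, assuming Llama-style `<s>`/`</s>` BOS/EOS tokens; the `render_turn` helper and the example chat are illustrative, not taken from the repo. The snippet visible in the hunk header, `tokenizer.apply_chat_template(chat, tokenize=False)`, is the README's own way of rendering this from the tokenizer's bundled chat template.

```python
def render_turn(role: str, text: str, bos: str = "<s>", eos: str = "</s>") -> str:
    # The modified chat-ml layout from the diff: BOS/EOS take the place of
    # <|im_start|>/<|im_end|>, so no tokens are added to the vocabulary and
    # the embedding matrix never needs resizing.
    return f"{bos}{role}\n{text}\n{eos}"

chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What goes on a bagel?"},
]
print("".join(render_turn(m["role"], m["content"]) for m in chat))
```

The path being avoided here is the usual `tokenizer.add_special_tokens(...)` plus `model.resize_token_embeddings(len(tokenizer))` pair from Hugging Face transformers; skipping it keeps the fine-tune's vocabulary identical to the base model's, which is what the performance and tensor-parallelism caveats in the diff refer to.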