murawaki committed on
Commit
29856d6
·
1 Parent(s): e029f14

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -33,14 +33,13 @@ You can use this model directly with a pipeline for text generation.
33
  {'generated_text': '昨日私は京都ではありませんが、自分の住んでる事について色々と'},
34
  {'generated_text': '昨日私は京都では地図を見ることしかしない、京福電車のホームで'},
35
  {'generated_text': '昨日私は京都でこみちに住み始めた時からある不思議な現象で、そ'}]
36
- ...
37
  ```
38
 
39
  You can also use this model to get the features of a given text.
40
 
41
  ## Vocabulary
42
 
43
- A character-level vocabulary of size 6K is used. To be precise, rare characters may be split into bytes because byte-level byte-pair encoding (BPE) is used. The BPE tokenizer was trained on a small subset of the training data. Since the data were converted into a one-character-per-line format, merge operations never transgressed character boundaries.
44
 
45
  ## Training data
46
 
 
33
  {'generated_text': '昨日私は京都ではありませんが、自分の住んでる事について色々と'},
34
  {'generated_text': '昨日私は京都では地図を見ることしかしない、京福電車のホームで'},
35
  {'generated_text': '昨日私は京都でこみちに住み始めた時からある不思議な現象で、そ'}]
 
36
  ```
37
 
38
  You can also use this model to get the features of a given text.
39
 
40
  ## Vocabulary
41
 
42
+ A character-level vocabulary of size 6K is used. To be precise, rare characters may be split into bytes because byte-level byte-pair encoding (BPE) is used. The BPE tokenizer was trained on a small subset of the training data. Since the data were converted into a one-character-per-line format, merge operations never go beyond character boundaries.
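
The effect of the one-character-per-line format can be illustrated with a toy byte-level BPE trainer. This is a minimal sketch, not the tokenizer actually used for this model: `train_byte_bpe` is a hypothetical helper written for illustration, and the sample text is far smaller than any real training subset. It assumes that each input line is treated as an independent unit, so merges are only learned between bytes within the same line.

```python
from collections import Counter


def train_byte_bpe(lines, num_merges):
    """Toy byte-level BPE trainer (illustrative only).

    Each line is an independent unit: pairs are never counted across lines.
    """
    # Represent every line as a tuple of single-byte symbols.
    corpus = Counter(
        tuple(bytes([b]) for b in line.encode("utf-8")) for line in lines
    )
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by line frequency.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every line is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        merged = best[0] + best[1]
        # Apply the chosen merge to every line.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] += freq
        corpus = new_corpus
    return merges


text = "昨日私は京都で京都で"

# One character per line: merges can only join bytes of the same character.
split_tokens = {a + b for a, b in train_byte_bpe(list(text), num_merges=50)}

# Whole text on one line: merges are free to span several characters.
joined_tokens = {a + b for a, b in train_byte_bpe([text], num_merges=50)}

char_encodings = [ch.encode("utf-8") for ch in set(text)]
split_stays_inside = all(
    any(tok in enc for enc in char_encodings) for tok in split_tokens
)
joined_crosses = any(
    not any(tok in enc for enc in char_encodings) for tok in joined_tokens
)
print(split_stays_inside, joined_crosses)
```

In the one-character-per-line setting, every learned token is a byte sequence contained in a single character's UTF-8 encoding, whereas training on the unsplit text also produces tokens that span several characters.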
43
 
44
  ## Training data
45