openGPT-X/Teuken-7B-instruct-research-v0.4

mfromm committed on Oct 25, 2024

Commit 4acfc1e · verified · 1 Parent(s): d4e0be4

Update README.md

Files changed (1)

README.md +2 -2

@@ -131,13 +131,13 @@ This example demonstrates how to load the model and tokenizer, prepare input, ge
 ## Training Details
-### Training Data
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 Teuken-7B-base-v0.4 was pre-trained on 4 trillion tokens of data from publicly available sources.
 The pretraining data has a cutoff of September 2023.
 For composing the final instruction-tuning dataset termed "Honey", we first include all German examples. We aim to include roughly the same amount of English examples, as we have German examples:
   1. Add all multi-turn examples

 ## Training Details
+### Pre-Training Data
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 Teuken-7B-base-v0.4 was pre-trained on 4 trillion tokens of data from publicly available sources.
 The pretraining data has a cutoff of September 2023.
+More information are available in our [preprint](http://arxiv.org/abs/2410.08800).
 For composing the final instruction-tuning dataset termed "Honey", we first include all German examples. We aim to include roughly the same amount of English examples, as we have German examples:
   1. Add all multi-turn examples