---
language:
- de
- bg
- cs
- da
- el
- en
- es
- et
- fi
- fr
- ga
- hr
- hu
- it
- lt
- lv
- mt
- nl
- pl
- pt
- ro
- sl
- sv
- sk
metrics:
- accuracy
- bleu
pipeline_tag: text-generation
library_name: transformers
base_model:
- openGPT-X/Teuken-7B-base-v0.4
license: apache-2.0
---

# Model Card for Teuken-7B-instruct-v0.4

Teuken-7B-instruct-v0.4 is an instruction-tuned version of Teuken-7B-base-v0.4.

### Model Description

- **Developed by:** Fraunhofer IAIS
- **Funded by:** German Federal Ministry of Economics and Climate Protection (BMWK) in the context of the OpenGPT-X project
- **Model type:** Transformer-based decoder-only model
- **Language(s) (NLP):** bg, cs, da, de, el, en, es, et, fi, fr, ga, hr, hu, it, lt, lv, mt, nl, pl, pt, ro, sk, sl, sv
- **Shared by:** Fraunhofer IAIS

## Uses

Teuken-7B-instruct-v0.4 is intended for commercial and research use in all 24 official European languages. Because Teuken-7B-instruct-v0.4 focuses on covering all 24 EU languages, it produces more stable results across these languages and better reflects European values in its answers than English-centric models. It is therefore specialized for multilingual tasks.

### Out-of-Scope Use

The model is not intended for use in math and coding tasks.

## Bias, Risks, and Limitations

Teuken-7B-instruct-v0.4 is an instruction-tuned version of Teuken-7B-base-v0.4 and is not completely free from biases and hallucinations.

## How to Get Started with the Model

## Usage

The model requires the transformers, sentencepiece, and torch libraries. After installation, here is an example of how to use the model.

The prompt template for the fine-tuned model is defined as follows:

```python
user = "Hi!"
lang_code = "DE"
system_messages = {
    "EN": "A chat between a human and an artificial intelligence assistant."
    " The assistant gives helpful and polite answers to the human's questions.",
    "DE": "Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz."
    " Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen.",
}

prompt = f"System: {system_messages[lang_code]}\nUser: {user}\nAssistant:"
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "openGPT-X/Teuken-7B-instruct-v0.4"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
model = model.to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    use_fast=False,
    trust_remote_code=True,
)
messages = [{"role": "User", "content": "Wer bist du?"}]
prompt_ids = tokenizer.apply_chat_template(
    messages,
    chat_template="DE",
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)
prediction = model.generate(
    prompt_ids.to(model.device),
    max_length=512,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.7,
    num_return_sequences=1,
)
prediction_text = tokenizer.decode(prediction[0])
print(prediction_text)
```

This example demonstrates how to load the model and tokenizer, prepare the input, generate text, and print the result.

## Training Details

### Training Data

To compose the final instruction-tuning dataset, termed "Honey", we first include all German examples. We then aim to include roughly as many English examples as German examples:

1. Add all multi-turn examples.
2. Add the entire code_alpaca dataset subset.
3. Add the entire lmsys_chat_1m_high_quality_train_en dataset subset.
4. For the remaining dataset subsets ("open_orca", "evol_instruct_143k", "evol_instruct_70k", "bactrianx_EN"), add the examples with the highest reward scores ("quality score") so that each dataset subset contributes an equal number of high-quality examples (see the sketch below).
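The selection rule in step 4 can be made concrete with a short sketch. Everything below is illustrative only: the example fields (`subset`, `reward_score`) and the per-subset budget are assumptions for this sketch, not the actual OpenGPT-X data pipeline.

```python
# Illustrative sketch of step 4 only. The field names ("subset", "reward_score")
# and the per-subset budget are hypothetical; this is not the released tooling,
# just a way to make the selection rule concrete.
from collections import defaultdict

def select_top_scored(examples, remaining_subsets, per_subset_budget):
    """Keep the highest-reward examples so every subset contributes equally."""
    by_subset = defaultdict(list)
    for example in examples:
        if example["subset"] in remaining_subsets:
            by_subset[example["subset"]].append(example)

    selected = []
    for subset_examples in by_subset.values():
        # Sort by reward ("quality") score, best first, then take the same
        # number of examples from each subset.
        subset_examples.sort(key=lambda ex: ex["reward_score"], reverse=True)
        selected.extend(subset_examples[:per_subset_budget])
    return selected

remaining_subsets = {"open_orca", "evol_instruct_143k", "evol_instruct_70k", "bactrianx_EN"}
```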
## Dataset Sizes Before Composition

### English

### German

### Training Procedure

Instruction fine-tuned version of Teuken-7B-base-v0.4.

#### Training Hyperparameters

- **Training regime:** bf16 mixed precision

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated in 21 languages on ARC, GSM8K, HellaSwag, TruthfulQA, Translation, and MMLU. Results can be seen in the [European LLM Leaderboard](https://huggingface.co/spaces/openGPT-X/european-llm-leaderboard).

## Technical Specifications

### Model Architecture and Objective

| Hyper-Parameter            | Value    |
|----------------------------|----------|
| Training Objective         | CLM      |
| Activation Function        | SwiGLU   |
| Seq Length                 | 4096     |
| Position Embeddings        | Rotary   |
| Num Layers                 | 32       |
| Hidden Size                | 4096     |
| FFN Hidden Size            | 13440    |
| Num Attention Heads        | 32       |
| Head Dim                   | 128      |
| Group Query Attention      | yes      |
| Num Query Groups           | 2        |
| Normalization              | RMSNorm  |
| Learning rate              | 3e-4     |
| Min learning rate          | 3e-5     |
| Disable bias in linear     | yes      |
| Hidden dropout             | 0.0      |
| Attention dropout          | 0.0      |
| Optimizer                  | AdamW    |
| Beta1                      | 0.9      |
| Beta2                      | 0.95     |
| Sequence-parallelism       |          |
| Data-type                  | bf16     |
| Recompute-activations      | yes      |
| Distributed-optimizers     | yes      |
| Model Initialization       |          |
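A local download can be sanity-checked against this table by loading and inspecting the model configuration. The attribute names in the sketch below (`hidden_size`, `num_hidden_layers`, `num_attention_heads`, `max_position_embeddings`) are assumed Hugging Face conventions; the custom Teuken configuration class may expose these values under different names.

```python
# Sketch for cross-checking the table above against the released configuration.
# The attribute names are assumed Hugging Face conventions; the custom Teuken
# configuration class may name these fields differently.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "openGPT-X/Teuken-7B-instruct-v0.4",
    trust_remote_code=True,
)

for name in (
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "max_position_embeddings",
):
    # Use a default so the loop keeps going if a field is named differently.
    print(name, getattr(config, name, "not exposed under this name"))
```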

## Citation

**BibTeX:** TODO

**APA:** TODO

## Model Card Contact

You can reach out to the following model card contact: