---
language:
- de
- bg
- cs
- da
- el
- en
- es
- et
- fi
- fr
- ga
- hr
- hu
- it
- lt
- lv
- mt
- nl
- pl
- pt
- ro
- sl
- sv
- sk
metrics:
- accuracy
- bleu
pipeline_tag: text-generation
library_name: transformers
base_model:
- openGPT-X/Teuken-7B-base-v0.4
license: apache-2.0
---
# Model Card for Teuken-7B-instruct-v0.4

Teuken-7B-instruct-v0.4 is an instruction-tuned version of Teuken-7B-base-v0.4.


## Model Description

- **Developed by:** Fraunhofer IAIS
- **Funded by:** German Federal Ministry for Economic Affairs and Climate Action (BMWK) in the context of the OpenGPT-X project
- **Model type:** Transformer-based decoder-only model
- **Language(s) (NLP):** bg, cs, da, de, el, en, es, et, fi, fr, ga, hr, hu, it, lt, lv, mt, nl, pl, pt, ro, sk, sl, sv
- **Shared by:** Fraunhofer IAIS

## Uses

Teuken-7B-instruct-v0.4 is intended for commercial and research use in all 24 official languages of the European Union. Because it focuses on covering all 24 EU languages, it produces more stable results across these languages and reflects European values in its answers better than English-centric models do. It is therefore specialized for multilingual tasks.

### Out-of-Scope Use


The model is not intended for use in math and coding tasks.

## Bias, Risks, and Limitations


Teuken-7B-instruct-v0.4 is an instruction-tuned version of Teuken-7B-base-v0.4 and, like other large language models, is not completely free from biases and hallucinations.

## How to Get Started with the Model

The model requires the `transformers`, `sentencepiece`, and `torch` libraries (for example, `pip install transformers sentencepiece torch`). After installing them, the examples below show how to use the model.

The prompt template for the fine-tuned model is defined as follows:
```python
user = "Hi!"
lang_code = "DE"
system_messages = {
    "EN": "A chat between a human and an artificial intelligence assistant."
    " The assistant gives helpful and polite answers to the human's questions.",
    "DE": "Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz."
    " Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen.",
}

prompt = f"System: {system_messages[lang_code]}\nUser: {user}\nAssistant:<s>"
```
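
For illustration, the template can be wrapped in a small helper. `build_prompt` is our own name for this sketch and not part of the model's API, and it covers only the single-turn format shown above:

```python
def build_prompt(user_message: str, lang_code: str = "EN") -> str:
    # Assemble a single-turn prompt exactly as in the template above.
    return (
        f"System: {system_messages[lang_code]}\n"
        f"User: {user_message}\nAssistant:<s>"
    )

print(build_prompt("Hi!"))
```

The full loading and generation example follows: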

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "openGPT-X/Teuken-7B-instruct-v0.4"
# The repository ships custom modeling code, so trust_remote_code=True is required.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
model = model.to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    use_fast=False,
    trust_remote_code=True,
)

# "Wer bist du?" is German for "Who are you?".
messages = [{"role": "User", "content": "Wer bist du?"}]
# Select the language-specific chat template by its language code.
prompt_ids = tokenizer.apply_chat_template(
    messages,
    chat_template="DE",
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)
prediction = model.generate(
    prompt_ids.to(model.device),
    max_length=512,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.7,
    num_return_sequences=1,
)
prediction_text = tokenizer.decode(prediction[0])
print(prediction_text)
```
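
By default, `tokenizer.decode` returns the prompt together with the completion. To print only the newly generated text, you can slice off the prompt tokens first; this is a generic `transformers` pattern rather than something specific to this model:

```python
# Keep only the tokens generated after the prompt, then decode them.
new_tokens = prediction[0][prompt_ids.shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```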

This example demonstrates how to load the model and tokenizer, prepare input, generate text, and print the result.

## Training Details

### Training Data


For composing the final instruction-tuning dataset, termed "Honey", we first include all German examples. We then aim to include roughly the same number of English examples as German ones, selected as follows (a sketch of the selection logic appears after this list):
  1. Add all multi-turn examples.
  2. Add the entire code_alpaca dataset subset.
  3. Add the entire lmsys_chat_1m_high_quality_train_en dataset subset.
  4. For the remaining dataset subsets ("open_orca", "evol_instruct_143k", "evol_instruct_70k", "bactrianx_EN"), add the examples with the highest reward scores ("quality score") so that each subset contributes an equal number of high-quality examples.
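
The following is a minimal sketch of the selection in step 4, assuming each example carries a precomputed `quality_score` field; the function name, field name, and equal budget split are illustrative rather than the exact pipeline:

```python
def select_top_scored(subsets: dict[str, list[dict]], budget: int) -> list[dict]:
    """Pick the highest-scored examples so each subset contributes equally."""
    # Split the remaining English budget equally across the scored subsets.
    per_subset = budget // len(subsets)
    selected: list[dict] = []
    for examples in subsets.values():
        # Rank each subset by its reward ("quality") score, best first.
        ranked = sorted(examples, key=lambda ex: ex["quality_score"], reverse=True)
        selected.extend(ranked[:per_subset])
    return selected
```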


### Training Procedure

Instruction fine-tuned version of Teuken-7B-base-v0.4.


#### Training Hyperparameters

- **Training regime:** bf16 mixed precision

## Evaluation


### Testing Data, Factors & Metrics

#### Testing Data


The model was evaluated in 21 languages on ARC, GSM8K, HellaSwag, TruthfulQA, Translation, and MMLU. Results can be found on the [European LLM Leaderboard](https://huggingface.co/spaces/openGPT-X/european-llm-leaderboard).

## Technical Specifications

### Model Architecture and Objective

| Hyper-Parameter            | Value    |
|----------------------------|----------|
| Training Objective         | CLM      |
| Activation Function        | SwiGLU   |
| Seq Length                 | 4096     |
| Position Embeddings        | Rotary   |
| Num Layers                 | 32       |
| Hidden Size                | 4096     |
| FFN Hidden Size            | 13440    |
| Num Attention Heads        | 32       |
| Head Dim                   | 128      |
| Group Query Attention      | yes      |
| Num Query Groups           | 2        |
| Normalization              | RMSNorm  |
| Learning rate              | 3e-4     |
| Min learning rate          | 3e-5     |
| Disable bias in linear     | yes      |
| Hidden dropout             | 0.0      |
| Attention dropout          | 0.0      |
| Optimizer                  | AdamW    |
| Beta1                      | 0.9      |
| Beta2                      | 0.95     |
| Sequence-parallelism       |          |
| Data-type                  | bf16     |
| Recompute-activations      | yes      |
| Distributed-optimizers     | yes      |
| Model Initialization       |          |
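
As a rough sanity check, these dimensions account for most of the model's 7B parameters. The sketch below counts only the transformer-block weights; embedding and output layers are excluded because the vocabulary size is not listed in the table:

```python
hidden, ffn, layers = 4096, 13440, 32
heads, groups, head_dim = 32, 2, 128

# Attention: Q and output projections are full-width; K and V are shared
# across query groups (grouped-query attention with 2 groups).
attn = hidden * heads * head_dim        # Q projection
attn += 2 * hidden * groups * head_dim  # K and V projections
attn += heads * head_dim * hidden       # output projection

# SwiGLU uses three weight matrices: gate, up, and down projections.
mlp = 3 * hidden * ffn

total = layers * (attn + mlp)
print(f"~{total / 1e9:.2f}B parameters in the transformer blocks")  # ~6.43B
```

With the embedding and output layers added on top, this lands close to the 7B in the model's name.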



## Citation

**BibTeX:**

TODO

**APA:**

TODO

## Model Card Contact

You can reach the model card contact here:

- [OpenGPT-X](https://huggingface.co/openGPT-X) - \[email protected\]