---
language:
- en
- fr
- es
- pt
tags:
- falcon3
---

# Falcon3-3B-Base

The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

This repository contains **Falcon3-3B-Base**. It achieves strong results on reasoning, language understanding, instruction following, code, and mathematics tasks.
Falcon3-3B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K tokens.
Falcon3-3B-Base was pruned (in depth and width) from Falcon3-7B-Base and then efficiently trained on only 100 gigatokens (GT) using a knowledge distillation objective.
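
The distillation objective is not specified further here. As a rough illustration only, one common formulation minimizes the KL divergence between temperature-softened teacher and student next-token distributions. The sketch below assumes that formulation; the function names, the temperature, and the toy logits are hypothetical and are not taken from the Falcon3 training setup.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) for one token position (assumed objective, see lead-in)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Scaling by temperature**2 is a common convention that keeps gradient
    # magnitudes comparable across temperatures.
    return kl * temperature ** 2


# Toy logits over a 3-token vocabulary (illustrative values only).
teacher = [2.0, 1.0, 0.1]
matching_student = [2.0, 1.0, 0.1]
diverging_student = [0.1, 1.0, 2.0]

print(distillation_loss(matching_student, teacher))   # zero: distributions match
print(distillation_loss(diverging_student, teacher))  # positive: student disagrees
```

In a setup like this, the pruned student is pushed toward the full output distribution of the larger teacher rather than only the single correct next token, which is one reason distillation can work with a comparatively small token budget.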

⚠️ **This is a raw, pretrained model, which should be further fine-tuned for most use cases.**

## Model Details
- Architecture
  - Transformer-based, causal, decoder-only architecture
  - 22 decoder blocks
  - Grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
  - Wider head dimension: 256
  - High RoPE base value to support long-context understanding: 1000042
  - 8k context length
  - 131k vocab size
- Pruned and healed from Falcon3-7B-Base on only 100 gigatokens of data comprising web, code, STEM, high-quality, and multilingual data, using 2048 H100 GPUs
- Supports EN, FR, ES, PT
- Developed by [Technology Innovation Institute](https://www.tii.ae)
- License: TII Falcon-LLM License 2.0
- Model Release Date: December 2024
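
To make the effect of grouped-query attention concrete, the following sketch estimates the key/value cache size implied by the figures above. It assumes that "8k" means 8192 tokens, that the cache is stored in bfloat16 (2 bytes per value, as in the loading example in this card), and that each decoder block holds one key and one value tensor per KV head. The constant and function names are illustrative only.

```python
# Figures from the architecture list above.
NUM_LAYERS = 22
NUM_QUERY_HEADS = 12
NUM_KV_HEADS = 4
HEAD_DIM = 256

# Assumptions (not stated in the list): 8k = 8192 tokens, bfloat16 cache.
CONTEXT_TOKENS = 8192
BYTES_PER_VALUE = 2


def kv_cache_bytes(num_kv_heads, tokens=CONTEXT_TOKENS):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE * tokens


queries_per_kv_head = NUM_QUERY_HEADS // NUM_KV_HEADS
gqa_mib = kv_cache_bytes(NUM_KV_HEADS) / 2**20
mha_mib = kv_cache_bytes(NUM_QUERY_HEADS) / 2**20  # if every query head had its own KV head

print(queries_per_kv_head)  # 3 query heads share each KV head
print(gqa_mib)              # 704.0 MiB at full context
print(mha_mib)              # 2112.0 MiB for the hypothetical non-grouped case
```

Under these assumptions, sharing each KV head across three query heads cuts the per-sequence cache to one third of what a non-grouped layout would need.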


## Getting started

<details>
<summary> Click to expand </summary>

```python
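import torch
from transformers import pipeline

# Load the base model in bfloat16; device_map="auto" (requires the
# accelerate package) places the weights on the available device(s).
pipe = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-3B-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# This is a raw pretrained model, so phrase the prompt as text to be continued.
response = pipe("Question: How many hours in one day? Answer: ")
print(response[0]["generated_text"])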
```

</details>

<br>

## Benchmarks
The following table reports results from our internal evaluation pipeline:



<table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
    <colgroup>
        <col style="width: 10%;">
        <col style="width: 10%;">
        <col style="width: 7%;">
        <col style="width: 7%;">
        <col style="width: 7%;">
        <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
    </colgroup>
    <thead>
        <tr>
            <th>Category</th>
            <th>Benchmark</th>
            <th>Llama3.2-3B</th>
            <th>Qwen2.5-3B</th>
            <th>Minitron-4B</th>
            <th>Falcon3-3B-Base</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td rowspan="3">General</td>
            <td>MMLU (5-shot)</td>
            <td>56.1</td>
            <td>65.6</td>
            <td>58.6</td>
            <td>55.5</td>
        </tr>
        <tr>
            <td>MMLU-PRO (5-shot)</td>
            <td>24.9</td>
            <td>31.99</td>
            <td>26.21</td>
            <td>28.77</td>
        </tr>
        <tr>
            <td>IFEval</td>
            <td>12.83</td>
            <td>27</td>
            <td>22.81</td>
            <td>27.67</td>
        </tr>
        <tr>
            <td rowspan="2">Math</td>
            <td>GSM8K (5-shot)</td>
            <td>26.68</td>
            <td>68.99</td>
            <td>25.7</td>
            <td>63.91</td>
        </tr>
        <tr>
            <td>MATH (4-shot)</td>
            <td>1.39</td>
            <td>8.43</td>
            <td>1.73</td>
            <td>9.38</td>
        </tr>
        <tr>
            <td rowspan="4">Reasoning</td>
            <td>Arc Challenge (25-shot)</td>
            <td>50.76</td>
            <td>55.54</td>
            <td>50.34</td>
            <td>54.86</td>
        </tr>
        <tr>
            <td>GPQA (0-shot)</td>
            <td>27.49</td>
            <td>27.53</td>
            <td>38.6</td>
            <td>31.15</td>
        </tr>
        <tr>
            <td>MUSR (0-shot)</td>
            <td>35.24</td>
            <td>43.03</td>
            <td>42.13</td>
            <td>37.5</td>
        </tr>
        <tr>
            <td>BBH (3-shot)</td>
            <td>38.59</td>
            <td>46.12</td>
            <td>40.85</td>
            <td>44.23</td>
        </tr>
        <tr>
            <td rowspan="4">Commonsense Understanding</td>
            <td>PIQA (0-shot)</td>
            <td>77.42</td>
            <td>78.89</td>
            <td>78.29</td>
            <td>75.62</td>
        </tr>
        <tr>
            <td>SciQ (0-shot)</td>
            <td>92.7</td>
            <td>95.6</td>
            <td>96.1</td>
            <td>93.1</td>
        </tr>
        <tr>
            <td>Winogrande (0-shot)</td>
            <td>69.69</td>
            <td>68.82</td>
            <td>68.35</td>
            <td>64.64</td>
        </tr>
        <tr>
            <td>OpenbookQA (0-shot)</td>
            <td>43.2</td>
            <td>42.2</td>
            <td>43</td>
            <td>39.4</td>
        </tr>
    </tbody>
</table>
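
The "(k-shot)" labels in the table give the number of worked examples placed in the prompt before the question being scored. The exact templates used by the evaluation pipeline are not given here, so the following is only a minimal sketch of the idea, assuming a Question/Answer layout like the one in the Getting started example; the helper name and the example pairs are hypothetical.

```python
def build_few_shot_prompt(examples, question):
    """Build a k-shot prompt from (question, answer) pairs plus the final question."""
    parts = [f"Question: {q} Answer: {a}" for q, a in examples]
    # The final question is left unanswered for the model to complete.
    parts.append(f"Question: {question} Answer:")
    return "\n\n".join(parts)


# Two worked examples, so this is a 2-shot prompt (illustrative content only).
examples = [
    ("How many days in one week?", "7"),
    ("How many minutes in one hour?", "60"),
]
prompt = build_few_shot_prompt(examples, "How many hours in one day?")
print(prompt)
```

A string built this way can be passed to a text-generation pipeline in place of a single question; for a base model without instruction tuning, the in-prompt examples are the main way to signal the expected answer format.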


## Citation
If the Falcon3 family of models was helpful to your work, feel free to cite us.

```bibtex
@misc{Falcon3,
    title = {The Falcon 3 family of Open Models},
    author = {TII Team},
    month = {December},
    year = {2024}
}
```