---
language:
- en
pipeline_tag: text-generation
tags:
- meta
- llama-3
license: llama3
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/pf4d6FA7DriRtVq5HCkxd.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/VcZWbW_eZkJAZZ5ricL4B.png)
# Llama-3-Giraffe-70B
Abacus.AI presents our longer-necked variant of Llama 3 70B!
This model has an effective context length of approximately 128k.
So far we have trained on approximately 1B tokens.
This is an initial release, and we hope to further improve the heatmap below as training continues.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/_NVEuQ2ZT-sBtDBNjgmbt.png)
## Training Methodology
Training combines [PoSE](https://arxiv.org/abs/2309.10400) with dynamic-NTK interpolation.
### NTK-scaling
We use an NTK scale factor of 4. We also tried theta-scaling, but it did not work as well as NTK scaling in our experiments.
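At inference time, dynamic NTK interpolation can be requested through the `rope_scaling` field when loading the model with Hugging Face `transformers`. The snippet below is a minimal sketch rather than an official loading recipe; the exact dictionary keys vary across `transformers` versions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abacusai/Llama-3-Giraffe-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Dynamic NTK RoPE scaling with factor 4, matching the scale factor above.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "dynamic", "factor": 4.0},
    torch_dtype="auto",
    device_map="auto",
)
```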
### PoSE
We utilise Positional Skip-wise Training (PoSE) with the following parameters (a sketch of the position-ID construction follows the list):
- **Number of Chunks**: 5
- **Max position ID**: 32768
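As a concrete illustration of these parameters, `pose_position_ids` below is a hypothetical helper (a minimal sketch, not our training code) that splits one sample into 5 chunks and inserts random position skips so the IDs span the full 32768-position window:

```python
import random

def pose_position_ids(seq_len: int, num_chunks: int = 5, max_pos: int = 32768) -> list[int]:
    """Assign PoSE-style skipped position IDs to one training sample.
    Chunks keep their internal token order; random gaps between chunks
    stretch the IDs across the target position window.
    Assumes seq_len <= max_pos."""
    # Split the sample into num_chunks roughly equal contiguous chunks.
    bounds = [i * seq_len // num_chunks for i in range(num_chunks + 1)]
    # Draw a non-decreasing cumulative offset per chunk, sized so the
    # largest position ID stays below max_pos.
    slack = max_pos - seq_len
    offsets = sorted(random.randint(0, slack) for _ in range(num_chunks))
    offsets[0] = 0  # anchor the first chunk at position 0
    position_ids = []
    for chunk, offset in enumerate(offsets):
        position_ids.extend(pos + offset for pos in range(bounds[chunk], bounds[chunk + 1]))
    return position_ids

ids = pose_position_ids(8192)  # e.g. an ~8K-token sample stretched over 32K positions
```

The point of the skips is that the model sees position IDs from the full long-context range while only ever attending over short sequences, which keeps the extension training cheap.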
### Data
We train on samples from [RedPajama](https://github.com/togethercomputer/RedPajama-Data) that are ~8K tokens long on average.
### Hardware
We train on 8x H100 GPUs with DeepSpeed ZeRO Stage 3.
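For reference, a ZeRO Stage 3 run of this kind is driven by a DeepSpeed config; the sketch below shows illustrative values only (not our exact settings), written as the Python dict one might pass to the Hugging Face Trainer integration.

```python
# Illustrative DeepSpeed ZeRO Stage 3 config (not our exact settings).
# "auto" values are filled in by the Hugging Face Trainer integration.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,              # partition optimizer state, gradients, and parameters
        "overlap_comm": True,    # overlap communication with computation
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
```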
## Evaluation Methodology
We use the [EasyContext](https://github.com/abacusai/EasyContext/blob/eval_runs/eval_needle.py) implementation of Needle-in-a-Haystack to evaluate Llama-3-Giraffe-70B.
We evaluate with the following parameters (a sketch of the resulting evaluation grid follows the list):
- **Min context length**: 2000
- **Max context length**: 128000
- **Context interval**: 4000
- **Depth interval**: 0.1
- **Num samples**: 2
- **Rnd number digits**: 7
- **Haystack dir**: PaulGrahamEssays
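Concretely, these parameters define a sweep over 32 context lengths and 11 insertion depths with two samples each. The hypothetical helper below (not the EasyContext code itself) enumerates the resulting test cases:

```python
import random

def needle_grid(min_len=2000, max_len=128000, interval=4000,
                depth_interval=0.1, num_samples=2, digits=7):
    """Enumerate (context_length, depth, needle) test cases for the sweep."""
    depth_steps = round(1 / depth_interval) + 1  # depths 0.0, 0.1, ..., 1.0
    cases = []
    for length in range(min_len, max_len + 1, interval):
        for step in range(depth_steps):
            depth = step * depth_interval
            for _ in range(num_samples):
                # The needle is a random 7-digit number, hidden at the
                # given relative depth of the haystack essays.
                needle = random.randint(10 ** (digits - 1), 10 ** digits - 1)
                cases.append((length, round(depth, 2), needle))
    return cases

print(len(needle_grid()))  # 32 lengths x 11 depths x 2 samples = 704 cases
```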