Llama-3-Giraffe-70B

Abacus.AI presents our longer-necked variant of Llama 3 70B!

This model has an effective context length of approximately 128k tokens.

So far we have trained on ~1B tokens. This is an initial release, and we hope to improve the heatmap below as we continue training.

[Figure: Needle-in-a-Haystack heatmap]

Training Methodology

Training combines PoSE with dynamic-NTK interpolation.

NTK-scaling

The scale factor for NTK is 4. We also tried theta-scaling, but it did not work as well as NTK scaling in our experiments.
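To give a feel for what a scale factor of 4 does to the rotary base, here is a minimal sketch of the dynamic-NTK formula as it commonly appears in open-source RoPE implementations. The base (500000), head dimension (128), and original context length (8192) are assumptions chosen to resemble Llama 3 70B, and the function names are hypothetical. Treat it as an illustration, not the training code.

```python
def dynamic_ntk_base(seq_len, base=500_000.0, head_dim=128,
                     factor=4.0, orig_max_pos=8192):
    """Return the RoPE base to use for a sequence of `seq_len` tokens.

    All default values are assumptions for illustration only.
    """
    # Inside the original context window the base is left untouched.
    if seq_len <= orig_max_pos:
        return base
    # Beyond it, the base grows with the sequence length.
    scale = factor * seq_len / orig_max_pos - (factor - 1.0)
    return base * scale ** (head_dim / (head_dim - 2))


def inverse_frequencies(base, head_dim=128):
    """Per-pair rotary frequencies: base^(-2i/d) for i in [0, d/2)."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


if __name__ == "__main__":
    for n in (8192, 32768, 131072):
        print(n, dynamic_ntk_base(n))
```

A larger base lowers the slowest rotary frequencies, so positions far beyond the original window still map to rotation angles the model has seen.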

PoSE

We utilise Positional Skip-wise Training (PoSE) with the following parameters:

  • Number of Chunks: 5
  • Max position ID: 32768
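The parameters above can be read as follows: each training sample is cut into 5 chunks, and the position IDs of later chunks are shifted by random skips so that they cover the range up to 32768. The sketch below is one simple way to do this, assuming equal-sized chunks and uniformly sampled skips; the function name and these sampling choices are hypothetical and may differ from the actual training setup.

```python
import random


def pose_position_ids(seq_len, num_chunks=5, max_position=32768, rng=None):
    """Build skip-wise position IDs for one sample (illustrative sketch)."""
    if not num_chunks <= seq_len <= max_position:
        raise ValueError("need num_chunks <= seq_len <= max_position")
    rng = rng or random.Random()

    # Assumption: split the sample into chunks of (almost) equal length.
    base, rem = divmod(seq_len, num_chunks)
    lengths = [base + (1 if i < rem else 0) for i in range(num_chunks)]

    # Total number of positions that can be skipped without exceeding the max.
    budget = max_position - seq_len
    # Non-decreasing cumulative skips keep the position IDs strictly increasing.
    offsets = sorted(rng.randint(0, budget) for _ in range(num_chunks))
    offsets[0] = 0  # the first chunk keeps its original positions

    ids, start = [], 0
    for length, offset in zip(lengths, offsets):
        ids.extend(range(start + offset, start + offset + length))
        start += length
    return ids


if __name__ == "__main__":
    ids = pose_position_ids(8192, rng=random.Random(0))
    print(ids[0], ids[-1], len(ids))
```

The model only ever attends over the sample's real length, but across many samples it sees relative distances from the whole 32768-position range.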

Data

We use long samples from RedPajama, averaging ~8K tokens each.

Hardware

We train on 8xH100 GPUs with DeepSpeed ZeRO Stage 3.
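For readers unfamiliar with ZeRO Stage 3, a minimal DeepSpeed configuration along these lines might look like the sketch below. Only the stage number comes from this card; the bf16 setting, batch size, and accumulation steps are placeholder assumptions, not the values used in training.

```python
import json

# Illustrative DeepSpeed config; every value except "stage": 3 is an assumption.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer state
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "train_micro_batch_size_per_gpu": 1,   # placeholder
    "gradient_accumulation_steps": 8,      # placeholder
}

if __name__ == "__main__":
    # Print as JSON so it can be saved to a config file.
    print(json.dumps(ds_config, indent=2))
```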

Evaluation Methodology

We use the EasyContext implementation of Needle-in-a-Haystack to evaluate Llama-3-Giraffe-70B.

We evaluate with the following parameters:

  • Min context length: 2000
  • Max context length: 128000
  • Context interval: 4000
  • Depth interval: 0.1
  • Num samples: 2
  • Random number digits: 7
  • Haystack dir: PaulGrahamEssays
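To make the size of this evaluation grid concrete, the sketch below enumerates the trials implied by the parameters above. It assumes context lengths step from the minimum in fixed intervals, depths run from 0.0 to 1.0 inclusive, and each needle is a random 7-digit number. The helper names are hypothetical and this is not the EasyContext code itself.

```python
import random


def context_lengths(min_len=2000, max_len=128000, interval=4000):
    """Context lengths from min_len up to max_len in fixed steps."""
    return list(range(min_len, max_len + 1, interval))


def depths(interval=0.1):
    """Needle depths from 0.0 (start) to 1.0 (end), assumed inclusive."""
    steps = round(1 / interval)
    return [round(i * interval, 10) for i in range(steps + 1)]


def make_needle(num_digits=7, rng=None):
    """A random number with exactly `num_digits` digits."""
    rng = rng or random.Random()
    return rng.randint(10 ** (num_digits - 1), 10 ** num_digits - 1)


def build_trials(num_samples=2, seed=0):
    """One trial per (context length, depth, sample) combination."""
    rng = random.Random(seed)
    return [
        {"context_length": n, "depth": d, "needle": make_needle(rng=rng)}
        for n in context_lengths()
        for d in depths()
        for _ in range(num_samples)
    ]


if __name__ == "__main__":
    trials = build_trials()
    print(len(context_lengths()), len(depths()), len(trials))
```

Under these assumptions the grid has 32 context lengths and 11 depths, for 704 trials in total. The largest length on the grid is 126000, because 128000 does not fall on a 4000-step grid that starts at 2000.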

Model size: 70.6B params (Safetensors, tensor type BF16)