---
license: llama2
language:
- en
pipeline_tag: conversational
---
Another EXL2 quantization of AlpinDale's https://huggingface.co/alpindale/goliath-120b, this one at 2.37BPW.
A [2.64BPW](https://huggingface.co/LavaPlanet/Goliath120B-exl2-2.64bpw) version is also available.
PIPPA (Llama 2 Chat format) was used as the calibration dataset.
It can be run on two RTX 3090s with 24GB of VRAM each.
The figures below were measured with Windows overhead included, so they should be close enough for estimating your own usage.
```yaml
2.37BPW @ 4096 ctx:
  GPU split: 16/24
  empty ctx:
    GPU1: 17.4/24 GB
    GPU2: 19.5/24 GB
    speed: ~11 tk/s
  3000+ ctx:
    speed: ~8-12 tk/s
```
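
For reference, here is a minimal sketch of loading the quant with the exllamav2 Python API using a manual 16/24 GB split like the one above. The model directory path and sampler settings are assumptions for illustration, not part of this card.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Hypothetical local path to the downloaded 2.37BPW quant
model_dir = "/models/Goliath120B-exl2-2.37bpw"

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
# Manual GPU split in GB (GPU1/GPU2), matching the 16/24 split above
model.load([16, 24])

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)  # KV cache is allocated on top of the weights

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8  # example value only

output = generator.generate_simple("Hello, how are you?", settings, 200)
print(output)
```

The same 16/24 split can be entered in most ExLlamav2-based frontends that expose a gpu-split option.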