---
license: llama2
language:
- en
pipeline_tag: conversational
---
|
Another EXL2 quantization of AlpinDale's https://huggingface.co/alpindale/goliath-120b, this one at 2.37BPW.
|

A 2.64BPW version is also available: [2.64BPW](https://huggingface.co/LavaPlanet/Goliath120B-exl2-2.64bpw)
|

The PIPPA Llama 2 Chat dataset was used as the calibration dataset.
|

The model can be run on two RTX 3090s with 24GB of VRAM each.
|

These figures were measured with Windows overhead included, so they should be close enough for estimating your own usage:
|
```yaml
2.37BPW @ 4096 ctx
GPU split: 16/24

empty ctx:
  GPU1: 17.4/24GB
  GPU2: 19.5/24GB
  ~11 tk/s

3000+ ctx:
  ~8-12 tk/s
```
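As a rough sanity check on these figures (assuming an approximate 120B parameter count, which is not stated above), the quantized weight footprint can be estimated from the bits-per-weight figure:

```python
# Rough estimate of the quantized weight footprint: parameter count (approx.)
# times bits per weight, converted to GiB.
params = 120e9   # Goliath-120B parameter count, approximate
bpw = 2.37       # bits per weight for this quant

weight_gib = params * bpw / 8 / 1024**3
print(f"quantized weights: ~{weight_gib:.1f} GiB")  # ~33.1 GiB

# Reported usage at empty context across both GPUs:
reported = 17.4 + 19.5  # of 48 GB available
print(f"reported usage: {reported:.1f} GB")
```

The gap between the ~33 GiB of weights and the ~36.9 GB actually reported is roughly what you would expect from the KV cache and runtime overhead, which is why the figures above are a reasonable estimation baseline.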