---
license: llama2
language:
- en
pipeline_tag: conversational
---
Another EXL2 quantization of AlpinDale's https://huggingface.co/alpindale/goliath-120b, this one at 2.37BPW.

A [2.64BPW](https://huggingface.co/LavaPlanet/Goliath120B-exl2-2.64bpw) version is also available.

Pippa (Llama2 Chat format) was used as the calibration dataset.

The model can be run on two RTX 3090s (24GB VRAM each).

The figures below were taken on Windows (which adds some VRAM overhead), so they should be close enough to estimate your own usage.
```yaml
2.37BPW @ 4096 ctx:
  empty ctx:
    GPU split: 16/24
    GPU1: 17.4/24 GB
    GPU2: 19.5/24 GB
    speed: ~11 tk/s
  3000+ ctx:
    speed: ~8-12 tk/s
```
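For reference, below is a minimal loading and generation sketch using the exllamav2 Python library, modeled on the library's own example scripts. The model path, GPU split, and sampler values are illustrative assumptions, not part of this release; check the current exllamav2 examples in case the API has changed.

```python
# Minimal ExLlamaV2 sketch (based on the exllamav2 example scripts).
# Path, split, and sampler settings below are assumptions -- adjust to your setup.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Goliath120B-exl2-2.37bpw"  # local download path (assumed)
config.prepare()
config.max_seq_len = 4096                              # matches the context size in the figures above

model = ExLlamaV2(config)
model.load(gpu_split=[16, 24])                         # GB per GPU, as in the 16/24 split above

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# Generate a short continuation to confirm the model loaded across both GPUs.
print(generator.generate_simple("Hello, my name is", settings, 128))
```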