---
license: llama2
language:
- en
pipeline_tag: conversational
---
|
Another EXL2 quantization of AlpinDale's https://huggingface.co/alpindale/goliath-120b, this one at 2.37BPW.
|

A 2.64BPW version is also available: [2.64BPW](https://huggingface.co/LavaPlanet/Goliath120B-exl2-2.64bpw)
|

The PIPPA Llama 2 Chat dataset was used as the calibration dataset.
|

The model can be run on two RTX 3090s with 24GB of VRAM each.
|

These figures were measured with Windows overhead included, so they should be close enough for estimating your own usage:
|
```yaml
2.37BPW @ 4096 ctx
GPU split: 16/24

empty ctx:
  GPU1: 17.4/24GB
  GPU2: 19.5/24GB
  ~11 tk/s

3000+ ctx:
  ~8-12 tk/s
```
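As a rough sanity check on these figures (assuming an approximate 120B parameter count, which is not stated above), the quantized weight footprint can be estimated from the bits-per-weight figure:

```python
# Rough estimate of the quantized weight footprint: parameter count (approx.)
# times bits per weight, converted to GiB.
params = 120e9   # Goliath-120B parameter count, approximate
bpw = 2.37       # bits per weight for this quant

weight_gib = params * bpw / 8 / 1024**3
print(f"quantized weights: ~{weight_gib:.1f} GiB")  # ~33.1 GiB

# Reported usage at empty context across both GPUs:
reported = 17.4 + 19.5  # of 48 GB available
print(f"reported usage: {reported:.1f} GB")
```

The gap between the ~33 GiB of weights and the ~36.9 GB actually reported is roughly what you would expect from the KV cache and runtime overhead, which is why the figures above are a reasonable estimation baseline.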