README.md · LavaPlanet/Goliath120B-exl2-2.37bpw at main

metadata

license: llama2
language:
  - en
pipeline_tag: conversational

Another EXL2 version of AlpinDale's https://huggingface.co/alpindale/goliath-120b this one being at 2.37BPW.

2.64BPW

Pippa llama2 Chat was used as the calibration dataset.

Can be run on two RTX 3090s w/ 24GB vram each.

Assuming Windows overhead, the following figures should be more or less close enough for estimation of your own use.

2.37BPW @ 4096 ctx
    empty ctx
        GPU split: 16/24
        GPU1: 17.4/24GB
        GPU2: 19.5/24GB
        11~ tk/s
     3000+ ctx
      8~-12 tk/s