LavaPlanet committed
Commit 059162e · 1 Parent(s): 8aaa423

Update README.md

Files changed (1)
  1. README.md +13 -49
README.md CHANGED
@@ -4,56 +4,20 @@ language:
  - en
 pipeline_tag: conversational
 ---
- # Goliath 120B
-
- An auto-regressive causal LM created by combining 2x finetuned [Llama-2 70B](https://huggingface.co/meta-llama/llama-2-70b-hf) into one.
-
- Please check out the quantized formats provided by [@TheBloke](https://huggingface.co/TheBloke) and [@Panchovix](https://huggingface.co/Panchovix):
-
- - [GGUF](https://huggingface.co/TheBloke/goliath-120b-GGUF) (llama.cpp)
- - [GPTQ](https://huggingface.co/TheBloke/goliath-120b-GPTQ) (KoboldAI, TGW, Aphrodite)
- - [AWQ](https://huggingface.co/TheBloke/goliath-120b-AWQ) (TGW, Aphrodite, vLLM)
- - [Exllamav2](https://huggingface.co/Panchovix/goliath-120b-exl2) (TGW, KoboldAI)
-
- # Prompting Format
-
- Both Vicuna and Alpaca will work, but due to the initial and final layers belonging primarily to Xwin, I expect Vicuna to work the best.
-
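For reference, Vicuna-style prompting typically follows the v1.1 template; the exact system string and separators below are assumptions based on that common template, not taken from this card. A minimal sketch:

```python
# Minimal sketch of the common Vicuna v1.1 template; the system string
# and separators are assumptions, not spelled out in this card.
SYSTEM = ("A chat between a curious user and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the user's questions.")

def vicuna_prompt(user_message: str) -> str:
    # Single-turn prompt; the model's reply is generated after "ASSISTANT:".
    return f"{SYSTEM} USER: {user_message} ASSISTANT:"

print(vicuna_prompt("Summarize the merge process described below."))
```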
- # Merge process
-
- The models used in the merge are [Xwin](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1) and [Euryale](https://huggingface.co/Sao10K/Euryale-1.3-L2-70B).
-
- The layer ranges used are as follows:
-
 ```yaml
- - range 0, 16
- Xwin
- - range 8, 24
- Euryale
- - range 17, 32
- Xwin
- - range 25, 40
- Euryale
- - range 33, 48
- Xwin
- - range 41, 56
- Euryale
- - range 49, 64
- Xwin
- - range 57, 72
- Euryale
- - range 65, 80
- Xwin
- ```
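Read as mergekit-style layer_range slices with an inclusive start and exclusive end (an assumption; the card doesn't state the convention), these ranges stack to 137 layers versus 80 in a single Llama-2 70B, which is consistent with the ~120B parameter count. A quick check:

```python
# Quick check of the slice arithmetic above, assuming mergekit-style
# layer_range bounds (inclusive start, exclusive end): an assumption,
# since the card doesn't state the convention.
slices = [
    ("Xwin", 0, 16), ("Euryale", 8, 24),
    ("Xwin", 17, 32), ("Euryale", 25, 40),
    ("Xwin", 33, 48), ("Euryale", 41, 56),
    ("Xwin", 49, 64), ("Euryale", 57, 72),
    ("Xwin", 65, 80),
]
total = sum(end - start for _, start, end in slices)
print(total)  # 137 layers in the merged stack, vs. 80 in one Llama-2 70B
```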
-
- # Screenshots
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/635567189c72a7e742f1419c/Cat8_Rimaz6Ni7YhQiiGB.png)
-
- # Benchmarks
- Coming soon.
-
- # Acknowledgements
- Credit goes to [@chargoddard](https://huggingface.co/chargoddard) for developing the framework used to merge the model - [mergekit](https://github.com/cg123/mergekit).
-
- Special thanks to [@Undi95](https://huggingface.co/Undi95) for helping with the merge ratios.
 
+ Another EXL2 version of AlpinDale's https://huggingface.co/alpindale/goliath-120b, this one at 2.37BPW.
+
+ Pippa Llama2 Chat was used as the calibration dataset.
+
+ Can be run on two RTX 3090s with 24GB VRAM each.
+
+ Assuming Windows overhead, the following figures should be close enough to estimate your own usage.
 ```yaml
+ 2.37BPW @ 4096 ctx
+ empty ctx:
+   GPU split: 16/24
+   GPU1: 17.4/24GB
+   GPU2: 19.5/24GB
+   ~11 tk/s
+ 3000+ ctx:
+   ~8-12 tk/s
+ ```
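For orientation, a minimal sketch of loading a quant like this across two 24GB GPUs with exllamav2's Python API; the local path, split values, prompt, and sampler settings are assumptions based on the figures above, and API details may vary by version:

```python
# Minimal sketch: load the EXL2 quant across two GPUs with a manual split.
# Path, split, and sampler settings are assumptions, not from this card.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./goliath-120b-exl2-2.37bpw"  # hypothetical local path
config.prepare()
config.max_seq_len = 4096

model = ExLlamaV2(config)
model.load(gpu_split=[16, 24])  # GB per GPU, mirroring the 16/24 split above

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("USER: Hello! ASSISTANT:", settings, num_tokens=128))
```

The uneven 16/24 split leaves headroom on the first card for activations and the KV cache, which lines up with the asymmetric per-GPU usage quoted above.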