mirekphd committed on
Commit 4da52d0 · 1 Parent(s): 8b6456e

Fixed README file.

Files changed (1):
  1. README.md +31 -43
README.md CHANGED
@@ -10,11 +10,11 @@ license: apache-2.0
  ---
  ## This version
  
- This model was converted to a **8-bit GGUF format (`q8_0`)** from **[`Alibaba-NLP/gte-Qwen2-7B-instruct`](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct)** using `llama-quantize` built from [`llama.cpp`](https://github.com/ggerganov/llama.cpp).
+ This model was converted to a **8-bit GGUF format (`q8_0`)** from **[`Alibaba-NLP/gte-Qwen2-1.5B-instruct`](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)** using `llama-quantize` built from [`llama.cpp`](https://github.com/ggerganov/llama.cpp).
  
  Custom conversion script settings:
  ```json
- "gte-Qwen2-1.5B-instruct": {
+ "gte-Qwen2-1.5B-instruct": {
  "model_name": "gte-Qwen2-1.5B-instruct",
  "hq_quant_type": "f32",
  "final_quant_type": "q8_0",
@@ -24,28 +24,22 @@ Custom conversion script settings:
  "numexpr_max_thread": 8
  }
  ```
- Please refer to the [original model card](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) for more details on the unquantized model, including its metrics, which may be different (typically slightly worse) for this quantized version.
+ Please refer to the [original model card](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) for more details on the unquantized model, including its metrics, which may be different (typically slightly worse) for this quantized version.
  
  
- ## gte-Qwen2-7B-instruct
+ ## gte-Qwen2-1.5B-instruct
  
- **gte-Qwen2-7B-instruct** is the latest model in the gte (General Text Embedding) model family that ranks **No.1** in both English and Chinese evaluations on the Massive Text Embedding Benchmark [MTEB benchmark](https://huggingface.co/spaces/mteb/leaderboard) (as of June 16, 2024).
-
- Recently, the [**Qwen team**](https://huggingface.co/Qwen) released the Qwen2 series models, and we have trained the **gte-Qwen2-7B-instruct** model based on the [Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B) LLM model. Compared to the [gte-Qwen1.5-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct) model, the **gte-Qwen2-7B-instruct** model uses the same training data and training strategies during the finetuning stage, with the only difference being the upgraded base model to Qwen2-7B. Considering the improvements in the Qwen2 series models compared to the Qwen1.5 series, we can also expect consistent performance enhancements in the embedding models.
+ **gte-Qwen2-1.5B-instruct** is the latest model in the gte (General Text Embedding) model family. The model is built on [Qwen2-1.5B](https://huggingface.co/Qwen/Qwen2-1.5B) LLM model and use the same training data and strategies as the [gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) model.
  
  The model incorporates several key advancements:
  
  - Integration of bidirectional attention mechanisms, enriching its contextual understanding.
  - Instruction tuning, applied solely on the query side for streamlined efficiency
  - Comprehensive training across a vast, multilingual text corpus spanning diverse domains and scenarios. This training leverages both weakly supervised and supervised data, ensuring the model's applicability across numerous languages and a wide array of downstream tasks.
-
-
  ## Model Information
-
- ### Overview
  - Model Type: GTE (General Text Embeddings)
- - Model Size: 7B
- - Embedding Dimension: 3584
+ - Model Size: 1.5B
+ - Embedding Dimension: 1536
  - Context Window: 131072
  ### Supported languages
  - North America: English
@@ -60,18 +54,18 @@ The model incorporates several key advancements:
  ```
  llama_model_loader: - kv 0: general.architecture str = qwen2
  llama_model_loader: - kv 1: general.type str = model
- llama_model_loader: - kv 2: general.name str = gte-Qwen2-7B-instruct
+ llama_model_loader: - kv 2: general.name str = gte-Qwen2-1.5B-instruct
  llama_model_loader: - kv 3: general.finetune str = instruct
  llama_model_loader: - kv 4: general.basename str = gte-Qwen2
- llama_model_loader: - kv 5: general.size_label str = 7B
+ llama_model_loader: - kv 5: general.size_label str = 1.5B
  llama_model_loader: - kv 6: general.license str = apache-2.0
  llama_model_loader: - kv 7: general.tags arr[str,5] = ["mteb", "sentence-transformers", "tr...
  llama_model_loader: - kv 8: qwen2.block_count u32 = 28
  llama_model_loader: - kv 9: qwen2.context_length u32 = 131072
- llama_model_loader: - kv 10: qwen2.embedding_length u32 = 3584
- llama_model_loader: - kv 11: qwen2.feed_forward_length u32 = 18944
- llama_model_loader: - kv 12: qwen2.attention.head_count u32 = 28
- llama_model_loader: - kv 13: qwen2.attention.head_count_kv u32 = 4
+ llama_model_loader: - kv 10: qwen2.embedding_length u32 = 1536
+ llama_model_loader: - kv 11: qwen2.feed_forward_length u32 = 8960
+ llama_model_loader: - kv 12: qwen2.attention.head_count u32 = 12
+ llama_model_loader: - kv 13: qwen2.attention.head_count_kv u32 = 2
  llama_model_loader: - kv 14: qwen2.rope.freq_base f32 = 1000000.000000
  llama_model_loader: - kv 15: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
  llama_model_loader: - kv 16: general.file_type u32 = 7
@@ -87,7 +81,7 @@ llama_model_loader: - kv 25: tokenizer.ggml.add_eos_token bool
  llama_model_loader: - kv 26: tokenizer.chat_template str = {% for message in messages %}{{'<|im_...
  llama_model_loader: - kv 27: general.quantization_version u32 = 2
  llama_model_loader: - kv 28: split.no u16 = 0
- llama_model_loader: - kv 29: split.count u16 = 8
+ llama_model_loader: - kv 29: split.count u16 = 2
  llama_model_loader: - kv 30: split.tensors.count i32 = 339
  llama_model_loader: - type f32: 141 tensors
  llama_model_loader: - type q8_0: 198 tensors
@@ -100,23 +94,23 @@ llm_load_print_meta: n_vocab = 151646
  llm_load_print_meta: n_merges = 151387
  llm_load_print_meta: vocab_only = 0
  llm_load_print_meta: n_ctx_train = 131072
- llm_load_print_meta: n_embd = 3584
+ llm_load_print_meta: n_embd = 1536
  llm_load_print_meta: n_layer = 28
- llm_load_print_meta: n_head = 28
- llm_load_print_meta: n_head_kv = 4
+ llm_load_print_meta: n_head = 12
+ llm_load_print_meta: n_head_kv = 2
  llm_load_print_meta: n_rot = 128
  llm_load_print_meta: n_swa = 0
  llm_load_print_meta: n_embd_head_k = 128
  llm_load_print_meta: n_embd_head_v = 128
- llm_load_print_meta: n_gqa = 7
- llm_load_print_meta: n_embd_k_gqa = 512
- llm_load_print_meta: n_embd_v_gqa = 512
+ llm_load_print_meta: n_gqa = 6
+ llm_load_print_meta: n_embd_k_gqa = 256
+ llm_load_print_meta: n_embd_v_gqa = 256
  llm_load_print_meta: f_norm_eps = 0.0e+00
  llm_load_print_meta: f_norm_rms_eps = 1.0e-06
  llm_load_print_meta: f_clamp_kqv = 0.0e+00
  llm_load_print_meta: f_max_alibi_bias = 0.0e+00
  llm_load_print_meta: f_logit_scale = 0.0e+00
- llm_load_print_meta: n_ff = 18944
+ llm_load_print_meta: n_ff = 8960
  llm_load_print_meta: n_expert = 0
  llm_load_print_meta: n_expert_used = 0
  llm_load_print_meta: causal attn = 1
@@ -132,11 +126,11 @@ llm_load_print_meta: ssm_d_inner = 0
  llm_load_print_meta: ssm_d_state = 0
  llm_load_print_meta: ssm_dt_rank = 0
  llm_load_print_meta: ssm_dt_b_c_rms = 0
- llm_load_print_meta: model type = 7B
+ llm_load_print_meta: model type = 1.5B
  llm_load_print_meta: model ftype = Q8_0
- llm_load_print_meta: model params = 7.61 B
- llm_load_print_meta: model size = 7.53 GiB (8.50 BPW)
- llm_load_print_meta: general.name = gte-Qwen2-7B-instruct
+ llm_load_print_meta: model params = 1.78 B
+ llm_load_print_meta: model size = 1.76 GiB (8.50 BPW)
+ llm_load_print_meta: general.name = gte-Qwen2-1.5B-instruct
  llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
  llm_load_print_meta: EOS token = 151643 '<|endoftext|>'
  llm_load_print_meta: EOT token = 151645 '<|im_end|>'
@@ -145,15 +139,9 @@ llm_load_print_meta: LF token = 148848 'ÄĬ'
  llm_load_print_meta: EOG token = 151643 '<|endoftext|>'
  llm_load_print_meta: EOG token = 151645 '<|im_end|>'
  llm_load_print_meta: max token length = 256
- llm_load_tensors: CPU_Mapped model buffer size = 1008.21 MiB
- llm_load_tensors: CPU_Mapped model buffer size = 959.63 MiB
- llm_load_tensors: CPU_Mapped model buffer size = 974.51 MiB
- llm_load_tensors: CPU_Mapped model buffer size = 983.77 MiB
- llm_load_tensors: CPU_Mapped model buffer size = 944.73 MiB
- llm_load_tensors: CPU_Mapped model buffer size = 944.76 MiB
- llm_load_tensors: CPU_Mapped model buffer size = 944.74 MiB
- llm_load_tensors: CPU_Mapped model buffer size = 954.29 MiB
- ........................................................................................
+ llm_load_tensors: CPU_Mapped model buffer size = 1008.90 MiB
+ llm_load_tensors: CPU_Mapped model buffer size = 791.29 MiB
+ ............................................................................
  llama_new_context_with_model: n_seq_max = 1
  llama_new_context_with_model: n_ctx = 131072
  llama_new_context_with_model: n_ctx_per_seq = 131072
@@ -162,10 +150,10 @@ llama_new_context_with_model: n_ubatch = 512
  llama_new_context_with_model: flash_attn = 0
  llama_new_context_with_model: freq_base = 1000000.0
  llama_new_context_with_model: freq_scale = 1
- llama_kv_cache_init: CPU KV buffer size = 7168.00 MiB
- llama_new_context_with_model: KV self size = 7168.00 MiB, K (f16): 3584.00 MiB, V (f16): 3584.00 MiB
+ llama_kv_cache_init: CPU KV buffer size = 3584.00 MiB
+ llama_new_context_with_model: KV self size = 3584.00 MiB, K (f16): 1792.00 MiB, V (f16): 1792.00 MiB
  llama_new_context_with_model: CPU output buffer size = 0.01 MiB
- llama_new_context_with_model: CPU compute buffer size = 7452.01 MiB
+ llama_new_context_with_model: CPU compute buffer size = 3340.01 MiB
  llama_new_context_with_model: graph nodes = 986
  llama_new_context_with_model: graph splits = 1
  ```
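For reference, the two-step conversion the updated README describes (a high-quality `f32` GGUF export first, then `q8_0` quantization with `llama-quantize`) can be reproduced with stock llama.cpp tooling. The snippet below is only a minimal sketch under that assumption, not the custom conversion script whose settings are quoted in the diff; the model path and output filenames are placeholders.

```bash
# Sketch of the HF -> f32 GGUF -> q8_0 GGUF flow described in the README.
# Paths and filenames are placeholders, not the ones used for this commit.

# 1. Export the original model to a high-quality f32 GGUF file
#    (convert_hf_to_gguf.py ships with llama.cpp).
python convert_hf_to_gguf.py ./gte-Qwen2-1.5B-instruct \
    --outtype f32 \
    --outfile gte-Qwen2-1.5B-instruct-f32.gguf

# 2. Quantize the f32 GGUF down to 8 bits per weight (q8_0).
./llama-quantize gte-Qwen2-1.5B-instruct-f32.gguf \
    gte-Qwen2-1.5B-instruct-q8_0.gguf q8_0
```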
 
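A quick way to sanity-check the quantized embedding model is llama.cpp's `llama-embedding` tool. This is a minimal sketch, assuming a llama.cpp build that exposes the flags below; the GGUF filename is a placeholder (for a split model, point `-m` at the first shard and the remaining shards are loaded automatically). Per the original model card, retrieval queries are additionally expected to carry an instruction prefix, while documents are embedded as-is.

```bash
# Compute one embedding with the q8_0 GGUF and print it to stdout.
# The filename is a placeholder; the gte-Qwen2 family uses last-token pooling.
./llama-embedding \
    -m gte-Qwen2-1.5B-instruct-q8_0-00001-of-00002.gguf \
    --pooling last \
    -p "what is the capital of China?"
```

The resulting vector should have 1536 dimensions, matching `qwen2.embedding_length` / `n_embd` in the load log above.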
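For service-style use, the same file can be served by llama.cpp's `llama-server` with embeddings enabled and queried over its OpenAI-compatible API. Again only a sketch: the filename and port are placeholders, and the embeddings flag is spelled `--embedding` in some llama.cpp releases.

```bash
# Serve the model with embeddings enabled (filename and port are placeholders).
./llama-server -m gte-Qwen2-1.5B-instruct-q8_0-00001-of-00002.gguf \
    --embeddings --pooling last --port 8080 &

# Request an embedding from the OpenAI-compatible endpoint.
curl http://localhost:8080/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"input": "what is the capital of China?"}'
```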