---
language:
- en
- de
- it
- fr
- da
- sv
- fi
- 'no'
---
![Kraken](https://vago-solutions.de/wp-content/uploads/2024/05/Kraken_Pic-multi.png "Kraken-Multilingual")

## Overview

The Kraken-Multilingual model and the **Kraken** architecture are a **joint effort** between **Cognitive Computations**, **VAGO Solutions** and **Hyperspace.ai**.

Created by **Fernando Fernandes Neto**, **David Golchinfar**, **Lucas Atkins** and **Eric Hartford**.

The Kraken-Multilingual model supports German, English, Italian, French, Swedish, Finnish, Danish and Norwegian.

The Kraken architecture is a machine learning framework for dynamic text generation. It uses the Hugging Face transformers library to orchestrate multiple causal language models (CLMs) and routes each input to one of them based on the context and content of the input text. A custom configuration class (KrakenConfig) manages the components involved: tokenizers, models, and the routing mechanism.

## Features

- **Dynamic Model Routing:** Uses a sequence classification model to route inputs to the most suitable language model based on the input's characteristics.
- **Multiple Language Models:** Supports integration of various pre-trained causal language models, allowing for flexible, context-appropriate responses.
- **Customizable Templates:** Includes support for input formatting using predefined templates, enhancing the model's adaptability to different conversational contexts.
- **Extensible Configuration:** Leverages a custom configuration setup that can be easily extended and adapted for various use cases involving causal language modeling.

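The routing idea can be illustrated with a toy sketch. This is not the Kraken implementation: the real router is a trained sequence classification model and the experts are full causal language models. Here a keyword-overlap scorer stands in for the classifier and plain functions stand in for the experts; `make_router`, `generate`, and the keyword sets are hypothetical names chosen for illustration only.

```python
from typing import Callable, Dict, Set

# An "expert" here is just a function from prompt text to response text.
# In Kraken, each expert is a causal language model (assumption: stubbed out).
Expert = Callable[[str], str]


def make_router(keywords: Dict[str, Set[str]]) -> Callable[[str], str]:
    """Build a toy router that returns the name of the expert whose
    keyword set shares the most words with the input text."""

    def route(text: str) -> str:
        words = set(text.lower().split())
        scores = {name: len(words & kws) for name, kws in keywords.items()}
        # On a tie, max() returns the first expert in dict order.
        return max(scores, key=scores.get)

    return route


def generate(text: str, route: Callable[[str], str], experts: Dict[str, Expert]) -> str:
    """Classify the input first, then hand it to the selected expert only."""
    return experts[route(text)](text)


# Usage with two stand-in experts.
route = make_router({
    "german": {"ich", "und", "der", "eine"},
    "french": {"je", "vous", "une", "et"},
})
experts = {
    "german": lambda t: "[german expert] " + t,
    "french": lambda t: "[french expert] " + t,
}
answer = generate("Je voudrais une baguette", route, experts)
```

The point of the design is that only one expert runs per request, so the cost of a call is one classification plus one generation, regardless of how many experts are configured.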
## Selected Models as Experts:
```
"German/English Expert": "VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct",
"Italian Expert": "mii-community/zefiro-7b-dpo-ITA",
"French Expert": "paulml/Hermes-2-Pro-French",
"Scandinavian Expert": "norallm/normistral-7b-warm-instruct",
```

**How to load and call the Kraken-Multilingual model:**
```
from transformers import AutoConfig, AutoModelForCausalLM
from configuration_kraken import KrakenConfig
from modeling_kraken import KrakenForCausalLM

# Register the custom Kraken classes with the transformers Auto* factories
AutoConfig.register("kraken", KrakenConfig)
AutoModelForCausalLM.register(KrakenConfig, KrakenForCausalLM)

device = "cuda:0"  # Use "cuda:0" on NVIDIA GPUs, "mps" on a Mac

# Load the model and config:
config = AutoConfig.from_pretrained("./kraken_model")
model = AutoModelForCausalLM.from_pretrained("./kraken_model", config=config, trust_remote_code=True)
```

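Instead of hard-coding the device string, it can be chosen at runtime. The following is a minimal sketch, assuming PyTorch is installed; the helper names `pick_device` and `detect_device` are hypothetical, and the `"cpu"` fallback is an assumption that the README itself does not mention.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Map backend availability to the device strings used in this README."""
    if cuda_available:
        return "cuda:0"  # NVIDIA GPU
    if mps_available:
        return "mps"     # Apple Silicon
    return "cpu"         # assumption: fallback not covered by the README


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent in older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    return pick_device(torch.cuda.is_available(), bool(mps and mps.is_available()))
```

With this in place, `device = detect_device()` could replace the literal assignment above.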
# Call the German expert:
```
messages = [
    {'role': 'system', 'content': 'Du bist ein freundlicher und hilfreicher deutscher KI-Assistent'},
    # User prompt in English: "Tell me a short bedtime story in 2 sentences."
    {'role': 'user', 'content': "Erzähle mir eine kurze Gute Nacht Geschichte in 2 Sätzen."}
]

tokenizer = model.tokenizer
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
output_ids = model.generate(input_ids, max_length=150)
# Decode with the tokenizer of the expert that the router selected for this input
print(model.expert_tokenizer(text=input_text).decode(output_ids[0], skip_special_tokens=True))
```

# Call the English expert:
```
messages = [
    {'role': 'system', 'content': 'You are a helpful AI Assistant'},
    {'role': 'user', 'content': "Find the mass percentage of Ba in BaO"}
]

tokenizer = model.tokenizer
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
output_ids = model.generate(input_ids, max_length=250)
print(model.expert_tokenizer(text=input_text).decode(output_ids[0], skip_special_tokens=True))
```

# Call the Italian expert:
```
messages = [
    {'role': 'system', 'content': 'Sei un utile assistente AI.'},
    # User prompt in English: "Do you have any ideas about what I could do in Rome?"
    {'role': 'user', 'content': 'Hai qualche idea su cosa potrei fare a Roma?'}
]

tokenizer = model.tokenizer
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
output_ids = model.generate(input_ids, temperature=0.6, do_sample=True, top_p=0.9, top_k=20, max_length=500)
print(model.expert_tokenizer(text=input_text).decode(output_ids[0], skip_special_tokens=True))
```

# Call the French expert:
```
messages = [
    {'role': 'system', 'content': 'Vous êtes un assistant IA allemand sympathique et serviable'},
    # User prompt in English: "I would like to go shopping in Paris. What can you recommend?"
    {'role': 'user', 'content': "J'aimerais faire du shopping à Paris. Que pouvez-vous recommander?"}
]

tokenizer = model.tokenizer
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(input_text)  # Inspect the prompt after the chat template is applied
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
output_ids = model.generate(input_ids, temperature=0.6, do_sample=True, top_p=0.9, top_k=20, max_length=250)
print(model.expert_tokenizer(text=input_text).decode(output_ids[0], skip_special_tokens=True))
```

# Call the Scandinavian expert:
```
messages = [
    {'role': 'system', 'content': 'Du är en hjälpsam AI-assistent'},
    # User prompt in English: "I come from Germany and would like to travel to Sweden.
    # Is a ferry via Denmark a good way to travel?"
    {'role': 'user', 'content': 'Jag kommer från Tyskland och skulle vilja resa till Sverige. Är en färja över Danmark ett bra sätt att resa?'}
]

tokenizer = model.tokenizer
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
output_ids = model.generate(input_ids, temperature=0.1, do_sample=True, top_p=0.9, top_k=20, max_length=250)
print(model.expert_tokenizer(text=input_text).decode(output_ids[0], skip_special_tokens=True))
```

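The five calls above repeat the same steps, so they can be folded into one helper. This is a sketch under stated assumptions: `ask` is a hypothetical name, and it relies only on the attributes the examples above already use (`model.tokenizer`, `model.generate`, and `model.expert_tokenizer(text=...)`), so it behaves exactly as those examples do and nothing more.

```python
def ask(model, messages, device, **generate_kwargs):
    """Run one chat exchange through a loaded Kraken model.

    `messages` is a list of {'role': ..., 'content': ...} dicts, and
    `generate_kwargs` is passed straight through to `model.generate`
    (for example max_length, temperature, do_sample, top_p, top_k).
    """
    tokenizer = model.tokenizer
    # Render the chat messages into a single prompt string.
    input_text = tokenizer.apply_chat_template(messages, tokenize=False)
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
    output_ids = model.generate(input_ids, **generate_kwargs)
    # Decode with the tokenizer belonging to the expert chosen for this input.
    expert_tokenizer = model.expert_tokenizer(text=input_text)
    return expert_tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call then reduces to `print(ask(model, messages, device, max_length=250))`, with sampling options added as keyword arguments where needed.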
# Switch expert and/or quantization:
Edit the config file in the kraken_model folder. The `#` comments below are annotations only; JSON does not allow comments, so leave them out of the actual file.
```
    "models": {
      "expert1": "VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct", # Switch to a German/English model of your choice
      "expert2": "mii-community/zefiro-7b-dpo-ITA", # Switch to an Italian model of your choice
      "expert3": "paulml/Hermes-2-Pro-French", # Switch to a French model of your choice
      "expert4": "norallm/normistral-7b-warm-instruct" # Switch to a Scandinavian model of your choice
    },
    # Currently supported: "4bit", "8bit" and "awq"
    "quantization": {
      "expert1": null,
      "expert2": null,
      "expert3": null,
      "expert4": null
    },
    "router": "kraken_router",
    # Adjust the tokenizers to match your selected models
    "tokenizers": {
      "expert1": "VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct",
      "expert2": "mii-community/zefiro-7b-dpo-ITA",
      "expert3": "paulml/Hermes-2-Pro-French",
      "expert4": "norallm/normistral-7b-warm-instruct"
    }
  },
  "model_type": "kraken",
  "torch_dtype": "float32",
  "transformers_version": "4.41.0"
}
```

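The same edits can be scripted. The sketch below works on the dictionary that holds the `models`, `quantization`, and `tokenizers` entries; the excerpt above does not show the key of that enclosing object, so locating it inside the loaded config is left to the caller. The function names `set_quantization` and `swap_expert` are hypothetical, and the list of allowed modes is taken from the comment in the excerpt.

```python
import json

# Modes listed as supported in the config comment, plus None for "no quantization".
SUPPORTED_QUANTIZATION = {None, "4bit", "8bit", "awq"}


def set_quantization(settings: dict, expert: str, mode) -> dict:
    """Set the quantization mode for one expert slot (e.g. "expert1")."""
    if mode not in SUPPORTED_QUANTIZATION:
        raise ValueError(f"unsupported quantization mode: {mode!r}")
    if expert not in settings["models"]:
        raise KeyError(f"unknown expert slot: {expert!r}")
    settings["quantization"][expert] = mode
    return settings


def swap_expert(settings: dict, expert: str, repo_id: str) -> dict:
    """Point an expert slot at another model and keep its tokenizer in sync."""
    if expert not in settings["models"]:
        raise KeyError(f"unknown expert slot: {expert!r}")
    settings["models"][expert] = repo_id
    settings["tokenizers"][expert] = repo_id
    return settings


# Usage on an in-memory copy of two of the entries shown above.
settings = {
    "models": {"expert1": "VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct",
               "expert2": "mii-community/zefiro-7b-dpo-ITA"},
    "quantization": {"expert1": None, "expert2": None},
    "tokenizers": {"expert1": "VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct",
                   "expert2": "mii-community/zefiro-7b-dpo-ITA"},
}
set_quantization(settings, "expert1", "4bit")
serialized = json.dumps(settings, indent=2)  # None is written as JSON null
```

Updating the model and tokenizer entries together avoids the mismatch the config comment warns about, where an expert is swapped but its old tokenizer is still used.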
## Cite As

Fernando Fernandes Neto, David Golchinfar, Lucas Atkins, Eric Hartford - [Kraken: An OpenSource Collection of Experts Model, 2024](https://github.com/cognitivecomputations/kraken)