JordiBayarri
committed on
Update README.md
README.md
CHANGED
@@ -47,11 +47,13 @@ Aloe is trained in 20 medical tasks, resulting in a robust and versatile healthc
- Aloe-70B-Beta is the latest iteration in the Aloe family
- Beta more than triples the training data used by Alpha, for a total of 1.8B tokens
@@ -73,14 +75,12 @@ Complete training details, model merging configurations, and all training data (
- Aloe Beta has been tested on the most popular healthcare QA datasets, with and without Medprompt inference technique. Results show competitive performance, achieving SOTA within models of the same size.
- More evaluations coming soon!
@@ -89,12 +89,18 @@ The Beta model has been developed to excel in several different medical tasks. F
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/imK19fzyMUvIJaAbSVnGE.png)
- -->
@@ -243,20 +249,19 @@ We used Deepspeed's Zero-3 distributed training using the following hardware:
- - Medical domain datasets
- - Synthetic data
- - General data:
@@ -280,7 +285,7 @@ The model trained was merged with the Llama-3.1-Instruct model using the DARE_TI
- 2. Red-Teaming Alignment: This step further fine-tunes the model to resist a variety of potential attacks, enhancing its robustness and security.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f7a16192950415b637e201/VUYw4IdANKGrH2VOedwH0.png)
+ **Aloe-70B-Beta** is the latest iteration in the **Aloe family**, building on the success of its predecessor, [Aloe-8B-Alpha](https://huggingface.co/HPAI-BSC/Llama3-Aloe-8B-Alpha), at a larger model size.
+ Beta more than **triples** the training data used by Alpha, for a total of **1.8B tokens**, including a wider variety of medical tasks and instructions (e.g., text summarization, explanation, diagnosis, text classification, treatment recommendation, ...).
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f7a16192950415b637e201/bCuV5kZUT9H9UECAOWDRc.png)
+ To mitigate catastrophic forgetting and enable the model to effectively learn new capabilities like **function calling**, we incorporated a diverse set of high-quality general-purpose data constituting 20% of the total training set. The curated data includes some of the highest-quality content available across a range of topics, including mathematics, programming, STEM, and very long instructions (> 8k tokens), to enrich the model's adaptability and comprehension across diverse domains.
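As a rough illustration of that 80/20 mixture (not the actual Aloe data pipeline), the Hugging Face `datasets` library can interleave a medical and a general source at the stated proportions; the two toy in-memory datasets below are placeholders.

```python
# Toy sketch of the ~80% medical / ~20% general sampling ratio described above.
# This is not the actual Aloe pipeline; both in-memory datasets are placeholders.
from datasets import Dataset, interleave_datasets

medical = Dataset.from_dict({"text": [f"medical example {i}" for i in range(800)]})
general = Dataset.from_dict({"text": [f"general example {i}" for i in range(200)]})

# Draw from each source with the stated probabilities (fixed seed for reproducibility).
mixed = interleave_datasets([medical, general], probabilities=[0.8, 0.2], seed=42)
print(mixed.num_rows, mixed[0])
```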
Beta also boosts the alignment and safety stages with respect to Alpha. This includes a [medical preference dataset](https://huggingface.co/datasets/TsinghuaC3I/UltraMedical-Preference), as well as the red-teaming dataset (available soon).
Complete training details, model merging configurations, and all training data (including synthetically generated data) can be found below. This includes [the RAG system](https://github.com/HPAI-BSC/prompt_engine) that was developed to test Aloe Beta in a deployment setup. Aloe comes with a healthcare-specific risk assessment to facilitate the safe use and deployment of such systems.
## Model Performance
+ Aloe Beta has been tested on the most popular healthcare QA datasets, with and without the **Medprompt** inference technique. Results show competitive performance, achieving SOTA within models of the same size.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/s8rWbwpYTkar5_X_LnOhb.png)
<!---
The Beta model has been developed to excel in several different medical tasks. For this reason, we evaluated it across a wide range of them:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/2NW3im0aH2u6RKp969sjx.png)
+ -->
We also compared the performance of the model in the general domain using the OpenLLM Leaderboard benchmark. Aloe-Beta achieves results competitive with the current SOTA general models on the most widely used general benchmarks and outperforms the medical models:
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/UKW36y9yjqn3Q5OfrCuIc.png)
+ More evaluations coming soon!
## Uses
### Direct Use
The training set consists of around 1.8B tokens, comprising three types of data (a minimal loading sketch follows the list):
+ - Medical domain datasets. Includes data from 20 different medical tasks.
- [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection)
- [HPAI-BSC/chain-of-diagnosis](https://huggingface.co/datasets/HPAI-BSC/chain-of-diagnosis)
- [HPAI-BSC/MedS-Ins](https://huggingface.co/datasets/HPAI-BSC/MedS-Ins)
- [HPAI-BSC/ultramedical](https://huggingface.co/datasets/HPAI-BSC/ultramedical)
+ - Synthetic data. We expanded our training data by generating high-quality answers using Llama3.1-70B:
- [HPAI-BSC/pubmedqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/pubmedqa-cot-llama31)
- [HPAI-BSC/medqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/medqa-cot-llama31)
- [HPAI-BSC/medmcqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/medmcqa-cot-llama31)
- [HPAI-BSC/headqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/headqa-cot-llama31)
- [HPAI-BSC/MMLU-medical-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/MMLU-medical-cot-llama31)
- [HPAI-BSC/Polymed-QA](https://huggingface.co/datasets/HPAI-BSC/Polymed-QA)
+ - General data. It includes maths, STEM, code, function calling, and very long instructions.
- [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection)
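All of the collections above are public on the Hugging Face Hub and can be inspected with the `datasets` library. The sketch below is a minimal example; the `train` split name is an assumption, so check each dataset card for its exact configuration.

```python
# Minimal sketch for inspecting parts of the Aloe-Beta training mixture.
# The "train" split is an assumption; verify it on each dataset card.
from datasets import load_dataset

synthetic_cot = load_dataset("HPAI-BSC/medqa-cot-llama31", split="train")        # synthetic CoT answers
general = load_dataset("HPAI-BSC/Aloe-Beta-General-Collection", split="train")   # general-purpose data

print(synthetic_cot)     # number of rows and column names
print(synthetic_cot[0])  # a single example record
```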
#### Training parameters
The model is aligned using the Direct Preference Optimization (DPO) technique through a two-step process (a minimal training sketch follows the list):
1. General DPO Alignment: This step uses a dataset combining medical, general preference, and safety data. We used our dataset [HPAI-BSC/Aloe-Beta-DPO](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-DPO). We split the dataset into five parts, and the model was trained iteratively for one epoch on each chunk. We used a learning rate of 2e-7.
+ 2. Red-Teaming Alignment: This step further fine-tunes the model to resist a variety of potential attacks, enhancing its robustness and security. The dataset will be shared soon. In this stage, we set the learning rate to 1e-7.
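For orientation, the two-stage recipe above maps naturally onto TRL's `DPOTrainer`. The sketch below is a minimal single-process approximation, not the exact recipe: the starting checkpoint is a placeholder for the merged SFT model (the real runs used DeepSpeed ZeRO-3 across multiple nodes), the dataset split and column names are assumptions, and the red-teaming dataset id is hypothetical because that data is not yet public.

```python
# Minimal sketch of the two-stage DPO alignment described above (not the exact recipe).
# Assumptions: the DPO dataset exposes "prompt"/"chosen"/"rejected" columns and a "train"
# split, and a recent TRL release is installed (older versions pass the tokenizer via
# `tokenizer=` instead of `processing_class=`).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-3.1-70B-Instruct"  # placeholder for the merged Aloe SFT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Stage 1: general DPO alignment over five chunks, one epoch each, at lr = 2e-7.
dpo_data = load_dataset("HPAI-BSC/Aloe-Beta-DPO", split="train")
for i in range(5):
    chunk = dpo_data.shard(num_shards=5, index=i)
    args = DPOConfig(output_dir=f"aloe-dpo-stage1-chunk{i}",
                     num_train_epochs=1, learning_rate=2e-7)
    DPOTrainer(model=model, args=args, train_dataset=chunk,
               processing_class=tokenizer).train()

# Stage 2: red-teaming alignment at lr = 1e-7. The dataset is not yet released, so the
# id below is hypothetical and the stage is left commented out.
# red_team = load_dataset("HPAI-BSC/Aloe-Beta-RedTeaming-DPO", split="train")
# args = DPOConfig(output_dir="aloe-dpo-stage2", num_train_epochs=1, learning_rate=1e-7)
# DPOTrainer(model=model, args=args, train_dataset=red_team,
#            processing_class=tokenizer).train()
```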
<!---
^^^ LINKS TO DPO DATA (DPO added, missing the RT^^^