JordiBayarri committed
Commit 91ee1fa (verified)
1 Parent(s): 79f31f1

Update README.md

Files changed (1):
  1. README.md +17 -12
README.md CHANGED
@@ -47,11 +47,13 @@ Aloe is trained in 20 medical tasks, resulting in a robust and versatile healthc
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f7a16192950415b637e201/VUYw4IdANKGrH2VOedwH0.png)
 
-Aloe-70B-Beta is the latest iteration in the Aloe family, building and improving on the success of its predecessor, [Aloe-8B-Alpha](https://huggingface.co/HPAI-BSC/Llama3-Aloe-8B-Alpha) in a larger model size.
-Beta more than triples the training data used by Alpha, for a total of 1.8B tokens, including a wider variety of medical tasks and instructions (e.g., text summarization, explanation, diagnosis, text classification, treatment recommendation, ...).
+**Aloe-70B-Beta** is the latest iteration in the **Aloe family**, building on and improving the success of its predecessor, [Aloe-8B-Alpha](https://huggingface.co/HPAI-BSC/Llama3-Aloe-8B-Alpha), at a larger model size.
+Beta more than **triples** the training data used by Alpha, for a total of **1.8B tokens**, including a wider variety of medical tasks and instructions (e.g., text summarization, explanation, diagnosis, text classification, treatment recommendation, ...).
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f7a16192950415b637e201/bCuV5kZUT9H9UECAOWDRc.png)
 
+To mitigate catastrophic forgetting and enable the model to effectively learn new capabilities like **function calling**, we incorporated a diverse set of high-quality general-purpose data constituting 20% of the total training set. The curated data includes some of the highest-quality content available across a range of topics, including mathematics, programming, STEM, and very long instructions (> 8k tokens), enriching the model's adaptability and comprehension across diverse domains.
+
 Beta also boosts the alignment and safety stages with respect to Alpha. This includes a [medical preference dataset](https://huggingface.co/datasets/TsinghuaC3I/UltraMedical-Preference), as well as the red-teaming dataset (available soon).
 
 Complete training details, model merging configurations, and all training data (including synthetically generated data) can be found below. This includes [the RAG system](https://github.com/HPAI-BSC/prompt_engine) that was developed to test Aloe Beta in a deployment setup. Aloe comes with a healthcare-specific risk assessment to facilitate the safe use and deployment of such systems.
@@ -73,14 +75,12 @@ Complete training details, model merging configurations, and all training data (
 
 ## Model Performance
 
-Aloe Beta has been tested on the most popular healthcare QA datasets, with and without Medprompt inference technique. Results show competitive performance, achieving SOTA within models of the same size.
+Aloe Beta has been tested on the most popular healthcare QA datasets, with and without the **Medprompt** inference technique. Results show competitive performance, achieving SOTA within models of the same size.
 
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/s8rWbwpYTkar5_X_LnOhb.png)
 
-More evaluations coming soon!
-
 <!---
 The Beta model has been developed to excel in several different medical tasks. For this reason, we evaluated the model in many different medical tasks:
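
Medprompt, referenced in the hunk above, ensembles several chain-of-thought samples whose answer choices are shuffled per sample, then majority-votes the result. Below is a minimal sketch of that voting loop, assuming a hypothetical `ask_model` callable as the inference backend; it is not the authors' actual evaluation harness.

```python
import random
from collections import Counter
from typing import Callable

def medprompt_style_vote(
    question: str,
    options: list[str],
    ask_model: Callable[[str], str],  # hypothetical inference backend
    n_samples: int = 5,
    seed: int = 0,
) -> str:
    """Sample several CoT answers with shuffled choices and majority-vote."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_samples):
        shuffled = options[:]
        rng.shuffle(shuffled)
        listing = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(shuffled))
        prompt = (
            f"{question}\n{listing}\n"
            "Think step by step, then state the single best option verbatim."
        )
        reply = ask_model(prompt)
        # Tally against the option text, not the letter, so shuffling
        # cannot bias the count toward any fixed answer position.
        for opt in shuffled:
            if opt in reply:
                votes[opt] += 1
                break
    return votes.most_common(1)[0][0]
```

Full Medprompt additionally retrieves kNN-selected few-shot examples per question; that retrieval step is omitted here.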
@@ -89,12 +89,18 @@ The Beta model has been developed to excel in several different medical tasks. F
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/2NW3im0aH2u6RKp969sjx.png)
 
+
+
+-->
+
 We also compared the performance of the model in the general domain, using the OpenLLM Leaderboard benchmark. Aloe-Beta achieves competitive results with the current SOTA general models on the most used general benchmarks and outperforms the medical models:
 
 
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/imK19fzyMUvIJaAbSVnGE.png)
--->
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/UKW36y9yjqn3Q5OfrCuIc.png)
+
+More evaluations coming soon!
+
 
 ## Uses
 
 ### Direct Use
@@ -243,20 +249,19 @@ We used Deepspeed's Zero-3 distributed training using the following hardware:
 
 The training set consists of around 1.8B tokens, comprising 3 different types of data:
 
-- Medical domain datasets:
+- Medical domain datasets. Includes data from 20 different medical tasks:
   - [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection)
   - [HPAI-BSC/chain-of-diagnosis](https://huggingface.co/datasets/HPAI-BSC/chain-of-diagnosis)
   - [HPAI-BSC/MedS-Ins](https://huggingface.co/datasets/HPAI-BSC/MedS-Ins)
   - [HPAI-BSC/ultramedical](https://huggingface.co/datasets/HPAI-BSC/ultramedical)
-- Synthetic data generated using Llama3.1:
+- Synthetic data. We expanded our training data by generating high-quality answers using Llama3.1-70B:
   - [HPAI-BSC/pubmedqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/pubmedqa-cot-llama31)
   - [HPAI-BSC/medqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/medqa-cot-llama31)
   - [HPAI-BSC/medmcqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/medmcqa-cot-llama31)
   - [HPAI-BSC/headqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/headqa-cot-llama31)
   - [HPAI-BSC/MMLU-medical-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/MMLU-medical-cot-llama31)
   - [HPAI-BSC/Polymed-QA](https://huggingface.co/datasets/HPAI-BSC/Polymed-QA)
-- Genstruct data (coming soon)
-- General data:
+- General data. It includes maths, STEM, code, function calling, and very long instructions:
   - [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection)
 
 #### Training parameters
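
The synthetic-data entries in the hunk above were produced by generating answers with Llama3.1-70B. Below is a rough sketch of such a distillation step with the `transformers` pipeline, assuming a MedQA-style record layout; the prompt wording and field names are illustrative assumptions, not the authors' published recipe.

```python
from transformers import pipeline

# Teacher model for distillation; loading a 70B model needs multiple GPUs.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-70B-Instruct",
    device_map="auto",
)

def synthesize_cot(question: str, options: dict[str, str], answer_key: str) -> str:
    """Ask the teacher to justify the known-correct answer step by step."""
    listing = "\n".join(f"{k}. {v}" for k, v in sorted(options.items()))
    prompt = (
        f"Question: {question}\n{listing}\n"
        f"The correct answer is {answer_key}. Explain step by step why it is "
        "correct, then restate the final answer."
    )
    out = generator(prompt, max_new_tokens=512, do_sample=False)
    # The pipeline returns the prompt plus the completion; keep the completion.
    return out[0]["generated_text"][len(prompt):].strip()
```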
@@ -280,7 +285,7 @@ The model trained was merged with the Llama-3.1-Instruct model using the DARE_TI
 
 The model is aligned using the Direct Preference Optimization (DPO) technique through a two-step process:
 
 1. General DPO Alignment: This step uses a dataset combining medical, general preference, and safety data. We used our dataset [HPAI-BSC/Aloe-Beta-DPO](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-DPO). We split the dataset into five parts, and the model was trained iteratively for one epoch on each chunk. We used a learning rate of 2e-7.
-2. Red-Teaming Alignment: This step further fine-tunes the model to resist a variety of potential attacks, enhancing its robustness and security. Dataset will be shared soon. In this stage, we set the learning rate to 1e-7.
+2. Red-Teaming Alignment: This step further fine-tunes the model to resist a variety of potential attacks, enhancing its robustness and security. The dataset will be shared soon. In this stage, we set the learning rate to 1e-7.
 
 <!---
 ^^^ LINKS TO DPO DATA (DPO added, missing the RT^^^
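
The two-stage alignment in the hunk above maps naturally onto TRL's `DPOTrainer`. Below is a minimal sketch of stage 1 under the stated settings (five chunks, one epoch each, learning rate 2e-7), assuming a recent TRL release where the tokenizer is passed as `processing_class`; the checkpoint id and batch size are placeholders, not the authors' exact configuration.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "HPAI-BSC/Llama3.1-Aloe-Beta-70B"  # placeholder SFT+merge checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset("HPAI-BSC/Aloe-Beta-DPO", split="train")

# Stage 1: iterate over five chunks, one epoch per chunk, lr 2e-7 (as stated).
for i in range(5):
    chunk = dataset.shard(num_shards=5, index=i)
    args = DPOConfig(
        output_dir=f"aloe-dpo-stage1-chunk{i}",
        learning_rate=2e-7,
        num_train_epochs=1,
        per_device_train_batch_size=1,  # placeholder
    )
    trainer = DPOTrainer(
        model=model,
        args=args,
        train_dataset=chunk,
        processing_class=tokenizer,
    )
    trainer.train()
    model = trainer.model  # carry updated weights into the next chunk

# Stage 2 (red-teaming DPO) repeats the same recipe on the red-teaming set
# (not yet released) with learning_rate=1e-7.
```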
 