louisbrulenaudet
committed on
Update README.md
README.md
@@ -3,8 +3,6 @@ tags:
 - merge
 - mergekit
 - lazymergekit
-- mlabonne/OmniBeagle-7B
-- WizardLM/WizardMath-7B-V1.1
 - Maths
 base_model:
 - mlabonne/OmniBeagle-7B
@@ -25,6 +23,19 @@ Pearl-7B-slerp is a merge of the following models using [LazyMergekit](https://c
 * [mlabonne/OmniBeagle-7B](https://huggingface.co/mlabonne/OmniBeagle-7B)
 * [WizardLM/WizardMath-7B-V1.1](https://huggingface.co/WizardLM/WizardMath-7B-V1.1)
 
+### Evaluation
+
+The evaluation was performed using the HuggingFace Open LLM Leaderboard.
+
+| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | #Params (B) |
+|-------------------------------------------|------------|-------|-----------|-------|------------|------------|-------|--------------|
+| **louisbrulenaudet/Pearl-7B-slerp** |**72.75** | 68.00 | 87.16 | 64.04 | 62.35 | 81.29 |**73.62**| 7.24 |
+| mistralai/Mixtral-8x7B-Instruct-v0.1 | 72.62 | 70.22 | 87.63 | 71.16 | 64.58 | 81.37 | 60.73 | 46.7 |
+| microsoft/phi-2 | 61.33 | 61.09 | 75.11 | 58.11 | 44.47 | 74.35 | 54.81 | 2.78 |
+| microsoft/Orca-2-13b | 58.64 | 60.67 | 79.81 | 60.37 | 56.41 | 76.64 | 17.97 | 13 |
+| mistralai/Mistral-7B-Instruct-v0.1 | 54.96 | 54.52 | 75.63 | 55.38 | 56.28 | 73.72 | 14.25 | 7.24 |
+| meta-llama/Llama-2-7b-hf | 50.97 | 53.07 | 78.59 | 46.87 | 38.76 | 74.03 | 14.48 | 6.74 |
+
 Spherical Linear Interpolation (SLERP) serves as a technique for seamlessly interpolating between two vectors while maintaining a constant rate of change and upholding the geometric properties of the spherical space in which these vectors exist.
 
 Opting for SLERP over traditional linear interpolation is motivated by various considerations. Linear interpolation in high-dimensional spaces may result in a reduction in the magnitude of the interpolated vector, diminishing the scale of weights. Additionally, in many cases, the alteration in the weights' direction conveys more meaningful information, such as feature learning and representation, compared to the magnitude of change.
@@ -37,18 +48,6 @@ The implementation of SLERP involves the following steps:
 
 In essence, SLERP provides a robust mechanism for interpolating vectors, offering advantages in preserving directional information and mitigating issues associated with linear interpolation in high-dimensional spaces.
 
-## Evaluation
-
-The evaluation was performed using the HuggingFace Open LLM Leaderboard.
-
-| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | #Params (B) |
-|-------------------------------------------|------------|-------|-----------|-------|------------|------------|-------|--------------|
-| **louisbrulenaudet/Pearl-7B-slerp** |**72.75** | 68.00 | 87.16 | 64.04 | 62.35 | 81.29 |**73.62**| 7.24 |
-| mistralai/Mixtral-8x7B-Instruct-v0.1 | 72.62 | 70.22 | 87.63 | 71.16 | 64.58 | 81.37 | 60.73 | 46.7 |
-| microsoft/phi-2 | 61.33 | 61.09 | 75.11 | 58.11 | 44.47 | 74.35 | 54.81 | 2.78 |
-| microsoft/Orca-2-13b | 58.64 | 60.67 | 79.81 | 60.37 | 56.41 | 76.64 | 17.97 | 13 |
-| mistralai/Mistral-7B-Instruct-v0.1 | 54.96 | 54.52 | 75.63 | 55.38 | 56.28 | 73.72 | 14.25 | 7.24 |
-| meta-llama/Llama-2-7b-hf | 50.97 | 53.07 | 78.59 | 46.87 | 38.76 | 74.03 | 14.48 | 6.74 |
 
 ## Configuration
 
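The README text in this diff describes SLERP only in prose. The sketch below illustrates the standard SLERP formula on two weight tensors, where θ is the angle between the flattened tensors: slerp(v0, v1; t) = sin((1 − t)θ)/sin θ · v0 + sin(tθ)/sin θ · v1. It also checks the README's claim that linear interpolation shrinks the magnitude of the interpolated vector. This is an illustration under stated assumptions, not the mergekit code used to build Pearl-7B-slerp: the function name `slerp`, the `dot_threshold` fallback value of 0.9995, and the random test vectors are hypothetical choices.

```python
import numpy as np


def slerp(t: float, v0: np.ndarray, v1: np.ndarray,
          dot_threshold: float = 0.9995, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors (illustrative sketch).

    t = 0 returns v0 and t = 1 returns v1. The tensors are flattened only to
    measure the angle between them; the result keeps the input shape.
    """
    # Unit-normalized flattened copies, used only to compute the angle theta.
    u0 = v0.ravel() / max(np.linalg.norm(v0), eps)
    u1 = v1.ravel() / max(np.linalg.norm(v1), eps)
    dot = float(np.clip(np.dot(u0, u1), -1.0, 1.0))

    # Nearly (anti)parallel inputs: sin(theta) is close to 0 and the SLERP
    # weights become unstable, so fall back to linear interpolation (assumed safeguard).
    if abs(dot) > dot_threshold:
        return (1.0 - t) * v0 + t * v1

    theta = np.arccos(dot)
    sin_theta = np.sin(theta)
    w0 = np.sin((1.0 - t) * theta) / sin_theta
    w1 = np.sin(t * theta) / sin_theta
    return w0 * v0 + w1 * v1


# Compare LERP and SLERP midpoints for two random unit vectors in 4096 dimensions.
rng = np.random.default_rng(0)
a = rng.standard_normal(4096)
b = rng.standard_normal(4096)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

mid_lerp = 0.5 * a + 0.5 * b   # plain linear interpolation
mid_slerp = slerp(0.5, a, b)   # spherical interpolation

print(f"LERP  midpoint norm: {np.linalg.norm(mid_lerp):.3f}")
print(f"SLERP midpoint norm: {np.linalg.norm(mid_slerp):.3f}")
```

For unit vectors the LERP midpoint has norm cos(θ/2). Independent random vectors in 4096 dimensions are nearly orthogonal (θ close to π/2), so that norm comes out close to cos(π/4) ≈ 0.71, while the SLERP midpoint stays on the unit sphere. The linear fallback for nearly (anti)parallel inputs is a common numerical safeguard, not something the README specifies.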