|
--- |
|
base_model: |
|
- akjindal53244/Llama-3.1-Storm-8B |
|
- Casual-Autopsy/L3-Umbral-Mind-RP-v2.0-8B |
|
library_name: transformers |
|
tags: |
|
- merge |
|
- llama |
|
- not-for-all-audiences |
|
--- |
|
|
|
# Llama-3-Umbral-Storm-8B (8K) |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/79tIjC6Ykm4rlwOHa9uzZ.png) |
|
|
|
An RP model: "L3-Umbral-Mind-v2.0" as the base, nearswapped with "Storm", one of the smartest L3.1 models.
|
|
|
* Warning: Based on Mopey-Mule, so it leans negative; don't use this model for truthful information or advice.
|
|
|
* <b>----></b> [GGUF Q8 static](https://huggingface.co/v000000/L3-Umbral-Storm-8B-t0.0001-Q8_0-GGUF)
|
|
|
# Thank you mradermacher for the quants! |
|
|
|
* [GGUFs](https://huggingface.co/mradermacher/L3-Umbral-Storm-8B-t0.0001-GGUF) |
|
* [GGUFs imatrix](https://huggingface.co/mradermacher/L3-Umbral-Storm-8B-t0.0001-i1-GGUF) |
|
|
|
------------------------------------------------------------------------------- |
|
|
|
## merge |
|
|
|
This is a merge of pre-trained language models. |
|
|
|
## Merge Details |
|
|
|
This model uses the Llama-3 architecture with Llama-3.1 merged in, so it has an 8K context length. The context could possibly be extended slightly with RoPE scaling thanks to the L3.1 layers.
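
For example, here is a minimal sketch of loading with linear RoPE scaling via transformers. The repo id is assumed from the GGUF link above, and whether scaling actually helps this merge is untested:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "v000000/L3-Umbral-Storm-8B-t0.0001"  # assumed repo id for this merge

# Stretch the 8K window to roughly 12K with linear RoPE scaling (untested for this merge).
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {"type": "linear", "factor": 1.5}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype=torch.bfloat16, device_map="auto"
)
```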
|
|
|
### Merge Method |
|
|
|
This model was merged using the <b>NEARSWAP t0.0001</b> merge algorithm. |
|
|
|
### Models Merged |
|
|
|
The following models were included in the merge: |
|
* Base Model: [Casual-Autopsy/L3-Umbral-Mind-RP-v2.0-8B](https://huggingface.co/Casual-Autopsy/L3-Umbral-Mind-RP-v2.0-8B) |
|
* [akjindal53244/Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) |
|
|
|
### Configuration |
|
|
|
```yaml
slices:
  - sources:
      - model: Casual-Autopsy/L3-Umbral-Mind-RP-v2.0-8B
        layer_range: [0, 32]
      - model: akjindal53244/Llama-3.1-Storm-8B
        layer_range: [0, 32]
merge_method: nearswap
base_model: Casual-Autopsy/L3-Umbral-Mind-RP-v2.0-8B
parameters:
  t:
    - value: 0.0001
dtype: bfloat16
```
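
To reproduce the merge, a config like the one above can be fed to mergekit's CLI. A rough sketch; the file and output names are placeholders, and it assumes a mergekit version that includes the nearswap method:

```bash
pip install mergekit
# Save the YAML above as nearswap-config.yml, then:
mergekit-yaml nearswap-config.yml ./L3-Umbral-Storm-8B --cuda
```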
|
|
|
# Prompt Template: |
|
```bash |
|
<|begin_of_text|><|start_header_id|>system<|end_header_id|> |
|
|
|
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
|
{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
|
{output}<|eot_id|> |
|
|
|
``` |
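
Equivalently, the Llama-3 template can usually be applied through the tokenizer's chat template; a small sketch, with the repo id assumed as above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("v000000/L3-Umbral-Storm-8B-t0.0001")  # assumed repo id

messages = [
    {"role": "system", "content": "You are a brooding roleplay partner."},
    {"role": "user", "content": "The storm rolls in over the harbor..."},
]

# add_generation_prompt=True appends the assistant header so the model continues as {output}.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```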
|
|
|
Credit to Alchemonaut for the nearswap implementation:
|
|
|
```python
import numpy as np

def lerp(a, b, t):
    # Plain linear interpolation between a and b.
    return a * (1 - t) + b * t

def nearswap(v0, v1, t):
    # Interpolation strength grows as the two weights get closer:
    # lweight = min(t / |v0 - v1|, 1), with exact matches treated as a full swap.
    lweight = np.abs(v0 - v1)
    with np.errstate(divide='ignore', invalid='ignore'):
        lweight = np.where(lweight != 0, t / lweight, 1.0)
    lweight = np.nan_to_num(lweight, nan=1.0, posinf=1.0, neginf=1.0)
    np.clip(lweight, a_min=0.0, a_max=1.0, out=lweight)
    return lerp(v0, v1, lweight)
```
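
For intuition, a toy example (not from the original card) showing how the tiny t keeps the result glued to the base weights unless the two models nearly agree:

```python
import numpy as np

base  = np.array([0.5000, 0.5, 0.5])   # v0: Umbral-Mind weights
storm = np.array([0.5001, 0.6, 1.5])   # v1: Storm weights

print(nearswap(base, storm, t=0.0001))
# First element: |v0 - v1| = 1e-4, so lweight hits 1.0 and the value swaps to v1.
# The other elements differ too much, so lweight is tiny and v0 is (almost) kept.
```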
|
|
|
Credit to Numbra for the idea.
|
|