|
--- |
|
base_model: |
|
- akjindal53244/Llama-3.1-Storm-8B |
|
- Casual-Autopsy/L3-Umbral-Mind-RP-v2.0-8B |
|
library_name: transformers |
|
tags: |
|
- merge |
|
- llama |
|
- not-for-all-audiences |
|
--- |
|
|
|
# Llama-3-Umbral-Storm-8B (8K) |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/79tIjC6Ykm4rlwOHa9uzZ.png) |
|
|
|
An RP model: "L3-Umbral-Mind-v2.0" as the base, nearswapped with "Storm", one of the smartest L3.1 models.
|
|
|
* Warning: Based on Mopey-Mule, so it leans negative; don't use this model for truthful information or advice.
|
|
|
* <b>----></b> [GGUF Q8 static](https://huggingface.co/v000000/L3-Umbral-Storm-8B-t0.0001-Q8_0-GGUF)
|
|
|
# Thank you mradermacher for the quants! |
|
|
|
* [GGUFs](https://huggingface.co/mradermacher/L3-Umbral-Storm-8B-t0.0001-GGUF) |
|
* [GGUFs imatrix](https://huggingface.co/mradermacher/L3-Umbral-Storm-8B-t0.0001-i1-GGUF) |
|
|
|
------------------------------------------------------------------------------- |
|
|
|
## merge |
|
|
|
This is a merge of pre-trained language models. |
|
|
|
## Merge Details |
|
|
|
This model uses the Llama-3 architecture with Llama-3.1 merged in, so it has an 8K context length. The context could possibly be extended slightly with RoPE scaling thanks to the L3.1 layers.
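
For example, here is a minimal sketch of loading with linear RoPE scaling via transformers. The repo id is assumed from the GGUF link above, and whether scaling actually helps this merge is untested:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "v000000/L3-Umbral-Storm-8B-t0.0001"  # assumed repo id for this merge

# Stretch the 8K window to roughly 12K with linear RoPE scaling (untested for this merge).
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {"type": "linear", "factor": 1.5}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype=torch.bfloat16, device_map="auto"
)
```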
|
|
|
### Merge Method |
|
|
|
This model was merged using the <b>NEARSWAP t0.0001</b> merge algorithm. |
|
|
|
### Models Merged |
|
|
|
The following models were included in the merge: |
|
* Base Model: [Casual-Autopsy/L3-Umbral-Mind-RP-v2.0-8B](https://huggingface.co/Casual-Autopsy/L3-Umbral-Mind-RP-v2.0-8B) |
|
* [akjindal53244/Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) |
|
|
|
### Configuration |
|
|
|
```yaml
slices:
  - sources:
      - model: Casual-Autopsy/L3-Umbral-Mind-RP-v2.0-8B
        layer_range: [0, 32]
      - model: akjindal53244/Llama-3.1-Storm-8B
        layer_range: [0, 32]
merge_method: nearswap
base_model: Casual-Autopsy/L3-Umbral-Mind-RP-v2.0-8B
parameters:
  t:
    - value: 0.0001
dtype: bfloat16
```
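
To reproduce the merge, a config like the one above can be fed to mergekit's CLI. A rough sketch; the file and output names are placeholders, and it assumes a mergekit version that includes the nearswap method:

```bash
pip install mergekit
# Save the YAML above as nearswap-config.yml, then:
mergekit-yaml nearswap-config.yml ./L3-Umbral-Storm-8B --cuda
```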
|
|
|
# Prompt Template: |
|
```bash |
|
<|begin_of_text|><|start_header_id|>system<|end_header_id|> |
|
|
|
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|> |
|
|
|
{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|> |
|
|
|
{output}<|eot_id|> |
|
|
|
``` |
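
Equivalently, the Llama-3 template can usually be applied through the tokenizer's chat template; a small sketch, with the repo id assumed as above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("v000000/L3-Umbral-Storm-8B-t0.0001")  # assumed repo id

messages = [
    {"role": "system", "content": "You are a brooding roleplay partner."},
    {"role": "user", "content": "The storm rolls in over the harbor..."},
]

# add_generation_prompt=True appends the assistant header so the model continues as {output}.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```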
|
|
|
Credit to Alchemonaut for the nearswap implementation:
|
|
|
```python
import numpy as np

def lerp(a, b, t):
    # Plain linear interpolation between a and b.
    return a * (1 - t) + b * t

def nearswap(v0, v1, t):
    # Interpolation strength grows as the two weights get closer:
    # lweight = min(t / |v0 - v1|, 1), with exact matches treated as a full swap.
    lweight = np.abs(v0 - v1)
    with np.errstate(divide='ignore', invalid='ignore'):
        lweight = np.where(lweight != 0, t / lweight, 1.0)
    lweight = np.nan_to_num(lweight, nan=1.0, posinf=1.0, neginf=1.0)
    np.clip(lweight, a_min=0.0, a_max=1.0, out=lweight)
    return lerp(v0, v1, lweight)
```
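
For intuition, a toy example (not from the original card) showing how the tiny t keeps the result glued to the base weights unless the two models nearly agree:

```python
import numpy as np

base  = np.array([0.5000, 0.5, 0.5])   # v0: Umbral-Mind weights
storm = np.array([0.5001, 0.6, 1.5])   # v1: Storm weights

print(nearswap(base, storm, t=0.0001))
# First element: |v0 - v1| = 1e-4, so lweight hits 1.0 and the value swaps to v1.
# The other elements differ too much, so lweight is tiny and v0 is (almost) kept.
```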
|
|
|
Credit to Numbra for the idea.
|
|