Trained with compute from Backyard.ai | Thanks to them and @dynafire for helping me out.


Fimbulvetr-v2, extended to 16K context with PoSE. A sane context limit is ~12K; quality degrades beyond that.
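
For readers unfamiliar with PoSE (Positional Skip-wise training), here is a minimal sketch of the general idea: a short training window is split into chunks, and each later chunk gets a random skip added to its position ids so the model sees positions across the whole target range. This is not the actual training code for this model. The 4096-token training window, the two-chunk split, and the function name `pose_position_ids` are all assumptions for illustration.

```python
import random


def pose_position_ids(train_len, target_len, num_chunks=2, rng=None):
    """Return PoSE-style position ids for one training sample.

    train_len  : real number of tokens in the training window (assumed value)
    target_len : context length being simulated, e.g. 16K
    num_chunks : how many chunks the window is split into (assumed 2)
    """
    rng = rng or random.Random()
    if not 0 < train_len <= target_len:
        raise ValueError("train_len must be in (0, target_len]")
    if num_chunks < 1 or num_chunks > train_len:
        raise ValueError("num_chunks must be in [1, train_len]")

    # Random chunk boundaries inside the training window.
    cuts = sorted(rng.sample(range(1, train_len), num_chunks - 1))
    bounds = [0] + cuts + [train_len]

    # Non-decreasing skip biases; the first chunk always starts at position 0.
    max_skip = target_len - train_len
    skips = [0] + sorted(rng.randint(0, max_skip) for _ in range(num_chunks - 1))

    position_ids = []
    for i in range(num_chunks):
        position_ids.extend(pos + skips[i] for pos in range(bounds[i], bounds[i + 1]))
    return position_ids


# Hypothetical numbers: a 4096-token window simulating a 16384-token context.
ids = pose_position_ids(4096, 16384, rng=random.Random(0))
print(len(ids), ids[0], ids[-1])
```

The sequence length stays at the training window size, so memory cost does not grow, while the position ids cover positions up to the target length.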

Note:
- I left RoPE theta at 10K for this train instead of expanding it like I did with Stheno 3.3. Solar did not play well with an extended theta: grad norm / loss values either went parabolic or plunged down from 10000+. Pretty much unreliable, unlike Stheno 3.3's training run.


Notes:
- I noticed people having bad issues with quants, be it GGUF or other formats, at 8-bit or lower. Kind of a weird issue? I had little to no issues when testing unquantized.
- Results differ slightly from base Fimbulvetr-v2, but in my tests they are similar enough. The vibes are still there.
- Formatting issues happen, but rarely. In my tests, a reroll / regenerate fixes them.
- I get consistent and reliable answers at ~11K context.
- Still coherent at up to 16K, though! It just doesn't work as well.

I recommend keeping prompts within 12K context while loading the model at 16K for inference. Context recall is really accurate up to 10K across multiple long-context tests. 16K works fine for roleplay, but not for more detailed tasks.
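
One way to follow that recommendation is to trim the prompt before sending it to a model loaded with a 16K window. The sketch below is backend-agnostic and works on a plain list of token ids. The name `trim_to_budget`, reading "12K" as 12288 tokens, and the 512-token reply reserve are assumptions for illustration, so adjust them to your setup.

```python
LOAD_CONTEXT = 16_384    # context size the model is loaded with (16K)
PROMPT_BUDGET = 12_288   # assumed reading of the recommended ~12K limit


def trim_to_budget(token_ids, budget=PROMPT_BUDGET, reserve_for_reply=512):
    """Keep only the most recent tokens so prompt + reply stay within budget."""
    if not 0 <= reserve_for_reply < budget:
        raise ValueError("reserve_for_reply must be in [0, budget)")
    limit = budget - reserve_for_reply
    if len(token_ids) <= limit:
        return list(token_ids)
    # Drop the oldest tokens; the tail of the chat matters most for roleplay.
    return list(token_ids[-limit:])


# Example with fake token ids standing in for a long chat history.
history = list(range(20_000))
trimmed = trim_to_budget(history)
print(len(trimmed))
```

Dropping the oldest tokens is the simplest policy. A real frontend would usually keep the system prompt or character card pinned and trim only the chat history.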

(Image: Needle in a Haystack test results)

The red results in the Needle in a Haystack test for this model are usually due to weird result artifacts, such as the model answering with only part of the key or adding extra commentary. Basically, it retrieved the result, but the answer is incomplete or has additional text attached: something like ' 3211' or '3211 and' instead of '321142'. Weird. That is why it's coherent and semi-reliable for roleplay at 16K context.
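
To separate those near-misses from real failures when reading test output, a scorer can grade answers more finely than pass / fail. The sketch below is not the harness used for the tests above. The function name `classify_needle_answer`, the four labels, and the 3-character minimum overlap are assumptions for illustration.

```python
def classify_needle_answer(answer, key, min_overlap=3):
    """Grade a needle-in-a-haystack answer against the expected key.

    Returns one of:
      "exact"   - answer is the key, ignoring surrounding whitespace
      "extra"   - the full key is present, plus additional text
      "partial" - only a leading part of the key (>= min_overlap chars) is present
      "miss"    - nothing recognisable from the key
    """
    cleaned = answer.strip()
    if cleaned == key:
        return "exact"
    if key in cleaned:
        return "extra"
    # Look for the longest proper prefix of the key inside the answer.
    for length in range(len(key) - 1, min_overlap - 1, -1):
        if key[:length] in cleaned:
            return "partial"
    return "miss"


# The examples from the paragraph above, with '321142' as the expected key.
for reply in [" 3211", "3211 and", "321142", "The key is 321142.", "no idea"]:
    print(repr(reply), "->", classify_needle_answer(reply, "321142"))
```

With this grading, ' 3211' and '3211 and' both come out as "partial" rather than as plain failures, which matches how the red results are described above.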

Model size: 10.7B params. Tensor type: BF16 (Safetensors).