Quantization made by Richard Erkhov. [Github](https://github.com/RichardErkhov) [Discord](https://discord.gg/pvy7H8DZMG) [Request more models](https://github.com/RichardErkhov/quant_request) Llama-3-8B-Stroganoff-2.0 - GGUF - Model creator: https://huggingface.co/HiroseKoichi/ - Original model: https://huggingface.co/HiroseKoichi/Llama-3-8B-Stroganoff-2.0/ | Name | Quant method | Size | | ---- | ---- | ---- | | [Llama-3-8B-Stroganoff-2.0.Q2_K.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q2_K.gguf) | Q2_K | 2.96GB | | [Llama-3-8B-Stroganoff-2.0.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.IQ3_XS.gguf) | IQ3_XS | 3.28GB | | [Llama-3-8B-Stroganoff-2.0.IQ3_S.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.IQ3_S.gguf) | IQ3_S | 3.43GB | | [Llama-3-8B-Stroganoff-2.0.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q3_K_S.gguf) | Q3_K_S | 3.41GB | | [Llama-3-8B-Stroganoff-2.0.IQ3_M.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.IQ3_M.gguf) | IQ3_M | 3.52GB | | [Llama-3-8B-Stroganoff-2.0.Q3_K.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q3_K.gguf) | Q3_K | 3.74GB | | [Llama-3-8B-Stroganoff-2.0.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q3_K_M.gguf) | Q3_K_M | 3.74GB | | [Llama-3-8B-Stroganoff-2.0.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q3_K_L.gguf) | Q3_K_L | 4.03GB | | [Llama-3-8B-Stroganoff-2.0.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.IQ4_XS.gguf) | IQ4_XS | 4.18GB | | [Llama-3-8B-Stroganoff-2.0.Q4_0.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q4_0.gguf) | Q4_0 | 4.34GB | | [Llama-3-8B-Stroganoff-2.0.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.IQ4_NL.gguf) | IQ4_NL | 4.38GB | | [Llama-3-8B-Stroganoff-2.0.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q4_K_S.gguf) | Q4_K_S | 4.37GB | | [Llama-3-8B-Stroganoff-2.0.Q4_K.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q4_K.gguf) | Q4_K | 4.58GB | | [Llama-3-8B-Stroganoff-2.0.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q4_K_M.gguf) | Q4_K_M | 4.58GB | | [Llama-3-8B-Stroganoff-2.0.Q4_1.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q4_1.gguf) | Q4_1 | 4.78GB | | [Llama-3-8B-Stroganoff-2.0.Q5_0.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q5_0.gguf) | Q5_0 | 5.21GB | | [Llama-3-8B-Stroganoff-2.0.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q5_K_S.gguf) | Q5_K_S | 5.21GB | | [Llama-3-8B-Stroganoff-2.0.Q5_K.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q5_K.gguf) | Q5_K | 5.34GB | | [Llama-3-8B-Stroganoff-2.0.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q5_K_M.gguf) | Q5_K_M | 5.34GB | | [Llama-3-8B-Stroganoff-2.0.Q5_1.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q5_1.gguf) | Q5_1 | 5.65GB | | [Llama-3-8B-Stroganoff-2.0.Q6_K.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q6_K.gguf) | Q6_K | 6.14GB | | [Llama-3-8B-Stroganoff-2.0.Q8_0.gguf](https://huggingface.co/RichardErkhov/HiroseKoichi_-_Llama-3-8B-Stroganoff-2.0-gguf/blob/main/Llama-3-8B-Stroganoff-2.0.Q8_0.gguf) | Q8_0 | 7.95GB | Original model description: --- license: llama3 library_name: transformers tags: - nsfw - not-for-all-audiences - llama-3 - text-generation-inference - mergekit - merge --- # Llama-3-8B-Stroganoff-2.0 I have made an incredible model. Stroganoff was not so substantially different from other roleplay models that I could confidently recommend it to other people; it felt more consistent and reduced repetition, but that's mostly it. Stroganoff-2.0, on the other hand, shows some emergent properties from the addition of MopeyMule, and Unaligned_Alpha amplifies its effect. The original intention was to obviously reduce positivity bias by introducing MopeyMule, but I started noticing that character reactions in different scenarios were more varied and realistic instead of just defaulting to an extremely nice and respectful personality. In particular, current models feel like they're drawing an invisible line on what they're willing to generate. Sure, they can *technically* generate all kinds of content, but they will refuse to go into detail on anything that isn't positive, happy, and respectful. Stroganoff-2.0, on the other hand, has no issue delving into any topic in detail. To understand what I mean, use the prompt "Write a story about hardcore BDSM" and compare another roleplay model to Stroganoff-2.0; it can be absolutely brutal and humiliating when it needs to, and in great detail too. You don't have to worry about it being overly negative or horny all the time, though; it actually seems to understand the line between SFW and NSFW much better. One of the main reasons I started model merging was to create a model that's good for story writing; it is so goddamn frustrating to see "a mysterious figure who trained their whole life for this oddly specific moment appears and solves the issue," "the resilience of humans and the power of friendship," and "the bad guys spontaneously feel immense regret and remorse and dedicate their whole lives to righting their wrongs" in every single fucking situation. Now, I'm not going to claim that this model is perfect; it absolutely can be improved in many areas, but it's the first model I've used that has met the bare minimum required to actually be usable for story writing, and not just erotic stories, but in general. Granted, 70B models are too slow on my hardware, and I refuse to use an API, so this opinion is on sub-70B local models. Now that I think about it, is this really emergent behavior? It seems pretty obvious in hindsight that a model that's not trying to shove positivity up your ass at every turn would be more willing to generate "offensive" and realistic content. Note: 2.0 seems to have more repitition than the first. I'll try to fix that in future versions. # Merging Tips If I were to write a paper on model merging, it would be called "Model Stock Is All You Need" because it's seriously amazing. I've tried many different merge methods, and I could only get barely passable results after tweaking parameters all day, but Model Stock has consistently produced good models for me. I recently made a discovery, though in hindsight it's very obvious, but model order matters a lot when using Model Stock, and it can make or break a merge. I have found that models at the top of the list integrate more deeply into the model, and models at the bottom of the list keep more of their style in the final result. What this means is that you should put chaotic models and ones that add new capabilities at the top of the list and the more balanced and coherent ones at the bottom. The secret to absolutely hammering out positivity bias is to use MopeyMule as the base model and put an uncensored model at the top of the list (my favorite is LLAMA-3_8B_Unaligned_Alpha). Of course, if you add models that have a strong bias towards positivity to the merge, then it will likely reduce or even nullify the effect. # Quantization Formats **GGUF** - Static: - https://huggingface.co/mradermacher/Llama-3-8B-Stroganoff-2.0-GGUF - https://huggingface.co/bartowski/Llama-3-8B-Stroganoff-2.0-GGUF - Imatrix: - https://huggingface.co/mradermacher/Llama-3-8B-Stroganoff-2.0-i1-GGUF # Details - **License**: [llama3](https://llama.meta.com/llama3/license/) - **Instruct Format**: [llama-3](https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/) - **Context Size**: 8K ## Models Used - [LLAMA-3_8B_Unaligned_Alpha](https://huggingface.co/SicariusSicariiStuff/LLAMA-3_8B_Unaligned_Alpha) - [badger-writer-llama-3-8b](https://huggingface.co/maldv/badger-writer-llama-3-8b) - [L3-8B-Niitama-v1](https://huggingface.co/Sao10K/L3-8B-Niitama-v1) - [Hathor_Tahsin-L3-8B-v0.85](https://huggingface.co/Nitral-AI/Hathor_Tahsin-L3-8B-v0.85) - [L3-8B-Stheno-v3.2](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2) - [Llama-3-8B-Instruct-MopeyMule](https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule) ## Merge Config ```yaml models: - model: SicariusSicariiStuff/LLAMA-3_8B_Unaligned_Alpha - model: maldv/badger-writer-llama-3-8b - model: Sao10K/L3-8B-Niitama-v1 - model: Nitral-AI/Hathor_Tahsin-L3-8B-v0.85 - model: Sao10K/L3-8B-Stheno-v3.2 merge_method: model_stock base_model: failspy/Llama-3-8B-Instruct-MopeyMule dtype: bfloat16 ```