Actually, it is working; it just needs training data. Personally, I found the models run better in GPT4All, and are served better by LM Studio.
This project is implemented by simply patching the base Mistral implementation in Hugging Face transformers with a new modeling_mistral.py and a new configuration_mistral.py, and otherwise applying standard transformers features (e.g. the default Trainer).
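That is: first clone the latest transformers repository, go into its src/transformers/models/mistral folder, and replace the stock files there with the new modeling_mistral.py (and configuration_mistral.py). Then install from the cloned folder by running pip install ./transformers from the directory that contains the clone.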
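The copy step above can be sketched in Python as follows. This is a minimal illustration, not part of the project: the helper names `patch_mistral` and `demo` are hypothetical, and the demo runs against a throwaway directory tree rather than a real clone, so it needs no network access.

```python
import shutil
import tempfile
from pathlib import Path

# Location of the Mistral model files inside a transformers source checkout.
MISTRAL_SUBDIR = Path("src") / "transformers" / "models" / "mistral"


def patch_mistral(transformers_root, patched_files):
    """Copy patched files over the stock ones in a transformers checkout.

    Hypothetical helper: returns the list of destination paths written.
    """
    target_dir = Path(transformers_root) / MISTRAL_SUBDIR
    if not target_dir.is_dir():
        raise FileNotFoundError(f"not a transformers checkout: {target_dir}")
    written = []
    for src in patched_files:
        dest = target_dir / Path(src).name
        shutil.copyfile(src, dest)  # overwrites the stock file of the same name
        written.append(dest)
    return written


def demo():
    """Run the patch against a fake checkout and return the patched file's text."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "transformers"
        (root / MISTRAL_SUBDIR).mkdir(parents=True)
        stock = root / MISTRAL_SUBDIR / "modeling_mistral.py"
        stock.write_text("# stock implementation\n")
        patched = Path(tmp) / "modeling_mistral.py"
        patched.write_text("# patched implementation\n")
        patch_mistral(root, [patched])
        return stock.read_text()


if __name__ == "__main__":
    print(demo())
```

With a real clone, one would call `patch_mistral("transformers", ["modeling_mistral.py", "configuration_mistral.py"])` and then run the pip install step so the patched package is the one Python imports.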
After that, the model can be loaded normally for training:
from unsloth import FastLanguageModel
from transformers import AutoTokenizer # needed for the tokenizer loaded further down
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/llama-2-13b-bnb-4bit",
    "unsloth/codellama-34b-bnb-4bit",
    "unsloth/tinyllama-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit", # New Google 6 trillion tokens model 2.5x faster!
    "unsloth/gemma-2b-bnb-4bit",
] # More models at https://huggingface.co/unsloth
# FastLanguageModel.from_pretrained returns a (model, tokenizer) pair, so unpack both.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "LeroyDyer/Mixtral_AI_CyberBrain_3.0", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # trust_remote_code = True,
    ignore_mismatched_sizes = True,
    # Head options for the patched Mistral implementation:
    merged_talk_heads = True,
    merged_lm_and_talk_heads = False,
    merged_lm_and_think_heads = True,
    use_concat_talk_head = True,
    use_shallow_think = True,
    use_shallow_talk = False,
    use_complex_think_head = False,
    use_complex_talk_head = True,
    use_weighted_talk_head = True,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
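tokenizer_id = "LeroyDyer/Mixtral_AI_CyberBrain_3.0" # assumption: the tokenizer ships in the model repo; change this if it lives elsewhere
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, truncation=True, padding_side="right")
tokenizer.pad_token_id = tokenizer.eos_token_id # reuse EOS as the padding token
model.tokenizer = tokenizer
model.train() # switch the model to training mode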
Right now modeling_mistral.py is still having problems loading remotely, hence the hacky approach above; once that is fixed, remote loading should work fine.
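The tokenizer settings in the loading code (padding on the right, with the EOS token reused as the pad token) can be illustrated with plain Python lists. This is only a sketch of the idea: `pad_right` is a hypothetical helper, and the token ids, including `EOS_ID = 2`, are made-up values for illustration rather than output from the real tokenizer.

```python
def pad_right(batch, pad_id):
    """Right-pad token-id lists to equal length and build attention masks."""
    longest = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return input_ids, attention_mask


EOS_ID = 2  # hypothetical EOS id, reused as the pad id
batch = [[1, 15, 27, EOS_ID], [1, 42, EOS_ID]]
ids, mask = pad_right(batch, pad_id=EOS_ID)
print(ids)
print(mask)
```

Because the pad id and the EOS id are the same number here, the attention mask is what distinguishes a genuine end-of-sequence token from trailing padding, which is why the mask should be passed along with the input ids during training.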
merge
This is a merge of pre-trained language models created using mergekit. Multiple versions of this model were merged in attempts to capture the necessary tensors, but the result did not build correctly because some parameters were not loading; that is, it would not load the config file. Hopefully this will be rectified soon so that remote loading works, enabling enhanced training. The model itself was fully trained, so it still works fine. The LoRA was made so that it can later be loaded with the model for further training of the affected tensors.
Extended capabilities:
mistralai/Mistral-7B-Instruct-v0.1 - Prime-Base
ChaoticNeutrals/Eris-LelantaclesV2-7b - role play
ChaoticNeutrals/Eris_PrimeV3-Vision-7B - vision
rvv-karma/BASH-Coder-Mistral-7B - coding
Locutusque/Hercules-3.1-Mistral-7B - Unhinging
KoboldAI/Mistral-7B-Erebus-v3 - NSFW
Locutusque/Hyperion-2.1-Mistral-7B - CHAT
Severian/Nexus-IKM-Mistral-7B-Pytorch - Thinking
NousResearch/Hermes-2-Pro-Mistral-7B - Generalizing
mistralai/Mistral-7B-Instruct-v0.2 - BASE
Nitral-AI/ProdigyXBioMistral_7B - medical
Nitral-AI/Infinite-Mika-7b - 128k - Context Expansion enforcement
Nous-Yarn-Mistral-7b-128k - 128k - Context Expansion
yanismiraoui/Yarn-Mistral-7b-128k-sharded
ChaoticNeutrals/Eris_Prime-V2-7B - Roleplay
This expert is a companion to the MEGA_MIND 24b CyberSeries. The CyberSeries represents a groundbreaking leap in the realm of language models, integrating a diverse array of expert models into a unified framework. At its core lies Mistral-7B-Instruct-v0.2, a refined instruction-tuned model designed for versatility and efficiency.
Enhanced with an expanded context window and advanced routing mechanisms, the Mistral-7B-Instruct-v0.2 core exemplifies the power of Mixture of Experts, allowing seamless integration of specialized sub-models. This architecture facilitates strong performance and scalability, enabling the CyberSeries to tackle a myriad of tasks with speed and accuracy.
Among its illustrious sub-models, the OpenOrca - Mistral-7B-8k shines as a testament to fine-tuning excellence, boasting top-ranking performance in its class. Meanwhile, the Hermes 2 Pro introduces cutting-edge capabilities such as Function Calling and JSON Mode, catering to diverse application needs.
Driven by Reinforcement Learning from AI Feedback, the Starling-LM-7B-beta demonstrates remarkable adaptability and optimization, while the Phi-1.5 Transformer model stands as a beacon of excellence across various domains, from common sense reasoning to medical inference.
With models like BioMistral tailored specifically for medical applications and Nous-Yarn-Mistral-7b-128k excelling in handling long-context data, the MEGA_MIND 24b CyberSeries emerges as a transformative force in the landscape of language understanding and artificial intelligence.
Experience the future of language models with the MEGA_MIND 24b CyberSeries, where innovation meets performance, and possibilities are limitless.
Models Merged
The following models were included in the merge:
LeroyDyer/Mixtral_AI_CyberBrain_2.0
ezelikman/quietstar-8-ahead
Configuration
The following YAML configuration was used to produce this model:
slices:
  - sources:
      - model: LeroyDyer/Mixtral_AI_CyberBrain_2.0
        layer_range: [0, 32]
      - model: ezelikman/quietstar-8-ahead
        layer_range: [0, 32]
# or, the equivalent models: syntax:
# models:
#   - model: mistralai/Mistral-7B-Instruct-v0.2
# THE LARGER MODEL MUST BE THE BASE, or
# THE BASE MODEL MUST BE THE ONE WHOSE TOKENIZER YOU WISH TO ADOPT,
# so models with customized processes must be the base model.
# If the base model has remote code, that code must be collected and added
# to the repo afterwards, and the config file adjusted to allow auto-mapping to your new repo.
#   - model: yanismiraoui/Yarn-Mistral-7b-128k-sharded
merge_method: slerp
base_model: ezelikman/quietstar-8-ahead
parameters:
  t:
    - filter: self_attn
      value: [0.3, 0.6, 0.3786, 0.6, 0.6]
    - filter: mlp
      value: [0.7, 0.4, 0.6, 0.4, 0.7]
    - value: 0.5 # fallback for rest of tensors
dtype: float16
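To make the slerp settings in the configuration more concrete, here is a small pure-Python sketch of spherical linear interpolation and of how a list of `t` anchor values might be spread across the 32 layers. It is an illustration under stated assumptions, not mergekit's actual code: the functions `slerp` and `layer_t` are hypothetical, the per-layer values are assumed to be linearly interpolated between the anchors, and `t = 0` is taken to return the base model's tensor while `t = 1` returns the other model's.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    t = 0 returns v0 and t = 1 returns v1.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction: use ordinary linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))  # guard acos against rounding error
    theta = math.acos(dot)  # angle between the two directions
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # Nearly (anti)parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def layer_t(anchors, layer, num_layers):
    """Linearly interpolate a list of anchor values across layer indices."""
    if num_layers == 1 or len(anchors) == 1:
        return anchors[0]
    pos = layer / (num_layers - 1) * (len(anchors) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]


SELF_ATTN_T = [0.3, 0.6, 0.3786, 0.6, 0.6]  # from the configuration above
MLP_T = [0.7, 0.4, 0.6, 0.4, 0.7]           # from the configuration above

if __name__ == "__main__":
    for layer in (0, 8, 16, 31):
        print(layer, layer_t(SELF_ATTN_T, layer, 32), layer_t(MLP_T, layer, 32))
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Under these assumptions, the self-attention tensors lean toward the base model in the first layers (t near 0.3), while the MLP tensors lean toward the other model there (t near 0.7), and any tensor matching neither filter is blended evenly at 0.5.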