---
license: apache-2.0
language:
- en
- ja
tags:
- finetuned
library_name: transformers
pipeline_tag: text-generation
---

# Our Models

- [Vecteus](https://huggingface.co/Local-Novel-LLM-project/Vecteus-v1)
- [Ninja-v1](https://huggingface.co/Local-Novel-LLM-project/Ninja-v1)
- [Ninja-v1-NSFW](https://huggingface.co/Local-Novel-LLM-project/Ninja-v1-NSFW)
- [Ninja-v1-128k](https://huggingface.co/Local-Novel-LLM-project/Ninja-v1-128k)
- [Ninja-v1-NSFW-128k](https://huggingface.co/Local-Novel-LLM-project/Ninja-v1-NSFW-128k)

## Model Card for VecTeus-v1.0

VecTeus-v1.0 is a Mistral-7B-based Large Language Model (LLM): a version of Mistral-7B-v0.1 fine-tuned on a novel dataset.

VecTeus has the following changes compared to Mistral-7B-v0.1.

- 128k context window (8k context in v0.1)
- High-quality generation in both Japanese and English
- Can generate NSFW content
- Retains earlier context even during long-context generation

This model was created with the help of GPUs from the first LocalAI hackathon.

We would like to take this opportunity to express our thanks.

## List of Creation Methods

- Chat vector applied across multiple models
- Simple linear merging of the resulting models
- Domain and sentence enhancement with LoRA
- Context expansion

## Instruction format

No prompt template is required.
Congratulations!

## Example prompts to improve (Japanese)

- BAD: あなたは○○として振る舞います ("You will act as ○○")
- GOOD: あなたは○○です ("You are ○○")

- BAD: あなたは○○ができます ("You can do ○○")
- GOOD: あなたは○○をします ("You do ○○")

## Performing inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Local-Novel-LLM-project/Vecteus-v1"
new_tokens = 1024

# attn_implementation="flash_attention_2" requires the flash-attn package and a supported GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# English: "You are a professional novelist.\nPlease write a novel\n-------- "
system_prompt = "あなたはプロの小説家です。\n小説を書いてください\n-------- "

prompt = input("Enter a prompt: ")
system_prompt += prompt + "\n-------- "

# Move the inputs to wherever device_map="auto" placed the model.
model_inputs = tokenizer([system_prompt], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=new_tokens, do_sample=True)
print(tokenizer.batch_decode(generated_ids)[0])
```

## Merge recipe

- VT0.1 = Ninjav1 + Original LoRA
- VT0.2 = Ninjav1 128k + Original LoRA
- VT0.2on0.1 = VT0.1 + VT0.2
- VT1 = all VT Series + LoRA + Ninja 128k and Normal

## Other points to keep in mind

- The training data may be biased. Be careful with the generated sentences.
- Memory usage may be large for long inferences.
- If possible, we recommend running inference with llama.cpp rather than Transformers.
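
## Sketch: chat vector and linear merging

The creation methods and merge recipe in this card mention chat vectors and simple linear merging but do not give the exact procedure or coefficients. The following is only a toy sketch of the arithmetic as these techniques are commonly described, not the project's actual pipeline. The function names, the equal 0.5/0.5 merge weights, and the tiny list-based "weights" are all assumptions made for illustration; with real checkpoints the same per-parameter arithmetic would be applied to the tensors in each model's state dict.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def chat_vector(tuned: Weights, base: Weights) -> Weights:
    """Per-parameter difference (tuned - base), i.e. what fine-tuning added."""
    return {k: [t - b for t, b in zip(tuned[k], base[k])] for k in base}


def add_vector(target: Weights, vector: Weights, scale: float = 1.0) -> Weights:
    """Add a (scaled) chat vector onto another model with the same shapes."""
    return {k: [w + scale * v for w, v in zip(target[k], vector[k])] for k in target}


def linear_merge(models: List[Weights], coeffs: List[float]) -> Weights:
    """Weighted sum of several models, parameter by parameter."""
    if len(models) != len(coeffs):
        raise ValueError("need exactly one coefficient per model")
    merged: Weights = {}
    for name in models[0]:
        size = len(models[0][name])
        merged[name] = [
            sum(c * m[name][i] for m, c in zip(models, coeffs)) for i in range(size)
        ]
    return merged


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
base = {"w": [1.0, 2.0]}    # shared base model
tuned = {"w": [1.5, 2.5]}   # instruction-tuned variant of the base
target = {"w": [0.0, 4.0]}  # another model derived from the same base

vector = chat_vector(tuned, base)
enhanced = add_vector(target, vector)
merged = linear_merge([enhanced, tuned], [0.5, 0.5])  # equal weights are an assumption
print(merged)
```

Returning new dictionaries rather than modifying the inputs keeps each intermediate model available, which matters when several merged variants are combined again in a later step, as in the recipe above.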