---
base_model: ruggsea/Llama3.1-8B-SEP-Chat
datasets:
- ruggsea/stanford-encyclopedia-of-philosophy_chat_multi_turn
language:
- en
- it
license: other
tags:
- llama-cpp
- gguf-my-repo
---

# Triangle104/Llama3.1-8B-SEP-Chat-Q8_0-GGUF

This model was converted to GGUF format from [`ruggsea/Llama3.1-8B-SEP-Chat`](https://huggingface.co/ruggsea/Llama3.1-8B-SEP-Chat) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/ruggsea/Llama3.1-8B-SEP-Chat) for more details on the model.

---

## Model details

This model is a LoRA finetune of meta-llama/Meta-Llama-3.1-8B trained on multi-turn philosophical conversations. It is designed to engage in philosophical discussions in a conversational yet rigorous manner, maintaining academic standards while remaining accessible.

### Model description

The model was trained using the TRL (Transformer Reinforcement Learning) library's chat template, enabling it to handle multi-turn conversations naturally. It builds on its predecessor, Llama3-stanford-encyclopedia-philosophy-QA, extending it to handle more interactive, back-and-forth philosophical discussions.

### Chat format

The model uses the standard chat format with roles:

    <|system|>
    {{system_prompt}}
    <|user|>
    {{user_message}}
    <|assistant|>
    {{assistant_response}}

### Training details

The model was trained with the following system prompt:

    You are an expert and informative yet accessible Philosophy university professor. Students will engage with you in philosophical discussions. Respond to their questions and comments in a correct and rigorous but accessible way, maintaining academic standards while fostering understanding.

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the license note below):

- Learning rate: 2e-5
- Train batch size: 1
- Gradient accumulation steps: 4
- Effective batch size: 4
- Optimizer: paged_adamw_8bit
- LR scheduler: cosine with warmup
- Warmup ratio: 0.03
- Training epochs: 5
- LoRA config:
  - r: 256
  - alpha: 128
  - Target modules: all-linear
  - Dropout: 0.05

### Framework versions

- PEFT 0.10.0
- Transformers 4.40.1
- PyTorch 2.2.2+cu121
- TRL: latest
- Datasets 2.19.0
- Tokenizers 0.19.1

### Intended use

This model is designed for:

- Multi-turn philosophical discussions
- Academic philosophical inquiry
- Teaching and learning philosophy
- Exploring philosophical concepts through dialogue

### Limitations

- The model should not be used as a substitute for professional philosophical advice or formal philosophical education.
- While the model aims to be accurate, its responses should be verified against authoritative sources.
- The model may occasionally generate plausible-sounding but incorrect philosophical arguments.
- As with all language models, it may exhibit biases present in its training data.

### License

This model is subject to the Meta Llama 3.1 license agreement. Please refer to Meta's licensing terms for usage requirements and restrictions.
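As a rough guide to how the settings above map onto code, the sketch below reconstructs the LoRA and trainer configuration from the listed values using `peft` and `transformers`. It is an illustrative sketch, not the author's original training script; the output directory is a placeholder, and the actual run also involved TRL's `SFTTrainer` together with the multi-turn SEP chat dataset named in the card metadata.

```python
# Illustrative reconstruction of the configuration above -- not the original training script.
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings from the card: r=256, alpha=128, all-linear targets, dropout 0.05
peft_config = LoraConfig(
    r=256,
    lora_alpha=128,
    target_modules="all-linear",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Trainer settings from the card: lr 2e-5, batch size 1 with 4 gradient-accumulation
# steps (effective batch size 4), paged_adamw_8bit, cosine schedule with 3% warmup, 5 epochs
training_args = TrainingArguments(
    output_dir="llama3.1-8b-sep-chat-lora",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=5,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
)

# In the actual setup, peft_config and training_args would be passed to TRL's SFTTrainer
# together with meta-llama/Meta-Llama-3.1-8B and the multi-turn SEP chat dataset.
```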
## How to use

Here's an example of how to use the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("ruggsea/Llama3.1-SEP-Chat")
tokenizer = AutoTokenizer.from_pretrained("ruggsea/Llama3.1-SEP-Chat")

# Example conversation
messages = [
    {"role": "user", "content": "What is the difference between ethics and morality?"}
]

# Format prompt using chat template
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False
)

# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

---

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo Triangle104/Llama3.1-8B-SEP-Chat-Q8_0-GGUF --hf-file llama3.1-8b-sep-chat-q8_0.gguf -p "The meaning to life and the universe is"
```

### Server:
```bash
llama-server --hf-repo Triangle104/Llama3.1-8B-SEP-Chat-Q8_0-GGUF --hf-file llama3.1-8b-sep-chat-q8_0.gguf -c 2048
```

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/Llama3.1-8B-SEP-Chat-Q8_0-GGUF --hf-file llama3.1-8b-sep-chat-q8_0.gguf -p "The meaning to life and the universe is"
```
or
```
./llama-server --hf-repo Triangle104/Llama3.1-8B-SEP-Chat-Q8_0-GGUF --hf-file llama3.1-8b-sep-chat-q8_0.gguf -c 2048
```
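Once `llama-server` is running, recent llama.cpp builds expose an OpenAI-compatible chat endpoint (by default at `http://localhost:8080/v1/chat/completions`). The snippet below is a minimal sketch of querying it from Python with `requests`; the port, system prompt, and generation settings are assumptions you may need to adjust for your build and hardware.

```python
# Minimal sketch: query a locally running llama-server via its OpenAI-compatible API.
# Assumes the server command above and the default port 8080; adjust as needed.
import requests

payload = {
    "messages": [
        {
            "role": "system",
            "content": "You are an expert and informative yet accessible Philosophy university professor.",
        },
        {"role": "user", "content": "What is the difference between ethics and morality?"},
    ],
    "max_tokens": 512,
    "temperature": 0.7,
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```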