---
license: apache-2.0
datasets:
- amphora/QwQ-LongCoT-130K
language:
- en
base_model:
- prithivMLmods/QwQ-LCoT-7B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- QwQ
- Adapter
- safetensors
- Qwen2.5
- text-generation-inference
---
```
________            ________                 _____  ___.    
\_____  \  __  _  __\_____  \               /  |  | \_ |__  
 /  / \  \ \ \/ \/ / /  / \  \    ______   /   |  |_ | __ \ 
/   \_/.  \ \     / /   \_/.  \  /_____/  /    ^   / | \_\ \
\_____\ \_/  \/\_/  \_____\ \_/           \____   |  |___  /
       \__>                \__>                |__|      \/ 
```
**QwQ-4B-Instruct** is a lightweight and efficient fine-tuned language model for instruction following and reasoning. It is based on a quantized version of the **Qwen2.5-7B** model, optimized for faster inference and lower memory consumption while retaining strong capabilities on complex tasks. **QwQ-4B-Instruct** excels at generating step-by-step solutions, creative content, and logical analyses, and it handles both structured and unstructured data, producing text that stays closely aligned with user inputs.

- Significantly **more knowledge**, with greatly improved capabilities in **coding** and **mathematics**, thanks to specialized expert models in these domains.
- Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON (see the JSON sketch after the demo snippet below). **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots.
- **Long-context support** up to 128K tokens, with generation of up to 8K tokens (see the streaming sketch after the demo snippet below).
- **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

# **Demo Start**

Here is a code snippet using `apply_chat_template` that shows how to load the tokenizer and model and how to generate content.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/QwQ-4B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
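For long-form outputs (the model can generate up to 8K tokens), it is often preferable to stream tokens as they are produced rather than wait for the full completion. Below is a minimal sketch using the `TextStreamer` utility from `transformers`, reusing the `model`, `tokenizer`, and `model_inputs` from the snippet above; the 2048-token cap is an arbitrary example value.

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they are generated,
# skipping the prompt and any special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **model_inputs,
    max_new_tokens=2048,  # raise toward 8K for long-form generation
    streamer=streamer,
)
```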
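Since the model is tuned for structured outputs, a quick way to exercise its JSON generation is to request a JSON object explicitly and parse the reply. This is a minimal sketch reusing the `model` and `tokenizer` loaded above; the two-key schema in the prompt is a made-up illustration, not something prescribed by the model card.

```python
import json

# Hypothetical example schema: ask the model to answer as a small JSON object.
messages = [
    {"role": "system", "content": "You are a helpful assistant that replies only with valid JSON."},
    {"role": "user", "content": 'In JSON with keys "subject" and "location", '
                                'state where the Eiffel Tower is.'}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
reply = tokenizer.batch_decode(
    [output_ids[0][inputs.input_ids.shape[1]:]], skip_special_tokens=True
)[0]

# Validate that the reply actually parses as JSON before using it.
try:
    data = json.loads(reply)
    print(data["subject"], "->", data["location"])
except json.JSONDecodeError:
    print("Reply was not valid JSON:\n", reply)
```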
# **Run with Ollama [Ollama Run]**

Ollama makes running machine learning models simple and efficient. Follow these steps to set up and run your GGUF models quickly.

## Quick Start: Step-by-Step Guide

| Step | Description | Command / Instructions |
|------|-------------|------------------------|
| 1 | **Install Ollama 🦙** | Download Ollama from [https://ollama.com/download](https://ollama.com/download) and install it on your system. |
| 2 | **Create Your Model File** | Create a file named after your model, e.g., `metallama`, containing the line `FROM Llama-3.2-1B.F16.gguf` to specify the base model. Ensure the base model file is in the same directory. |
| 3 | **Create and Verify the Model** | Run `ollama create metallama -f ./metallama` to create the model, then `ollama list` to verify it is available. |
| 4 | **Run the Model** | Start the model with `ollama run metallama`. |
| 5 | **Interact with the Model** | Once the model is running, type a prompt at the interactive prompt, e.g. `>>> Tell me about Space X.`, and the model replies in the terminal: *Space X, the private aerospace company founded by Elon Musk, is revolutionizing space exploration...* |

## Conclusion

With Ollama, running and interacting with models is seamless. Start experimenting today!
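If you prefer scripting over the interactive prompt, Ollama also serves a local HTTP API (on port 11434 by default). Here is a minimal Python sketch against that API; the `metallama` name matches the example model above, and the `requests` package is assumed to be installed.

```python
import requests

# Ollama's local server listens on http://localhost:11434 by default.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "metallama",      # the model created in step 3 above
        "prompt": "Tell me about Space X.",
        "stream": False,           # return one JSON object instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```

Leaving `stream` at its default of `True` instead yields newline-delimited JSON chunks as tokens are generated, which pairs well with long-form outputs.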