--- base_model: mistralai/Mistral-7B-v0.1 inference: false license: apache-2.0 model_creator: Mistral AI model_name: Mistral 7B v0.1 model_type: mistral pipeline_tag: text-generation prompt_template: | {prompt} quantized_by: TheBloke tags: - pretrained - mistral - quantized --- # Mistral 7B v0.1 Quantized by asya.ai ## Mistral 7B v0.1 - AWQ - Model creator: [Mistral AI](https://huggingface.co/mistralai) - Original model: [Mistral 7B v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) ## Description This repo contains AWQ model files Original model takes up 15GB GPU RAM and achieves 0.368 response score Quantized version uses 5GB GPU RAM and achieves 0.329 response score Models validated on [HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots) dataset and evaluated as cosine similarity for response embeddings ## How to use this AWQ model from Python code ```python from awq import AutoAWQForCausalLM from transformers import AutoTokenizer model_name_or_path = "asya-ai/Mistral-7B-v0.1-AWQ" # Load model model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True, trust_remote_code=False, safetensors=True) tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False) prompt = "Tell me about AI" prompt_template=f'''{prompt} ''' print("\n\n*** Generate:") tokens = tokenizer( prompt_template, return_tensors='pt' ).input_ids.cuda() # Generate output generation_output = model.generate( tokens, do_sample=True, temperature=0.7, top_p=0.95, top_k=40, max_new_tokens=512 ) print("Output: ", tokenizer.decode(generation_output[0])) ``` ## Compatibility The files provided are tested to work with: - [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) # Original model card: Mistral AI's Mistral 7B v0.1 # Model Card for Mistral-7B-v0.1 The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested. For full details of this model please read our [Release blog post](https://mistral.ai/news/announcing-mistral-7b/) ## Model Architecture Mistral-7B-v0.1 is a transformer model, with the following architecture choices: - Grouped-Query Attention - Sliding-Window Attention - Byte-fallback BPE tokenizer ## The Mistral AI Team Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.