---
base_model: mistralai/Mistral-7B-v0.1
inference: false
license: apache-2.0
model_creator: Mistral AI
model_name: Mistral 7B v0.1
model_type: mistral
pipeline_tag: text-generation
prompt_template: |
  {prompt}
quantized_by: asya.ai
tags:
- pretrained
- mistral
- quantized
---

# Mistral 7B v0.1 Quantized by asya.ai

## Mistral 7B v0.1 - AWQ
- Model creator: [Mistral AI](https://huggingface.co/mistralai)
- Original model: [Mistral 7B v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

<!-- description start -->
## Description

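This repo contains AWQ model files for Mistral AI's [Mistral 7B v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1).

The original model takes up 15GB of GPU RAM and achieves a response score of 0.368.

The quantized version uses 5GB of GPU RAM and achieves a response score of 0.329.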
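
As a rough sanity check on the memory figures above, the size of the weights alone can be estimated from the parameter count. This is a back-of-the-envelope sketch, not a measurement: it assumes the original weights are stored in 16 bits and the AWQ weights in 4 bits (neither is stated in this card), and the helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 7e9  # "7 billion parameters", rounded as in the model card

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # assumption: 16-bit original weights
awq_gb = weight_memory_gb(NUM_PARAMS, 4)    # assumption: 4-bit AWQ weights

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{awq_gb:.1f} GB")
```

The measured 15GB and 5GB are higher than these estimates, which is expected: actual GPU usage likely also includes quantization scales and zero points, any layers kept at higher precision, activations, and the KV cache.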

Both models were validated on the [HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots) dataset. The response score is the cosine similarity of response embeddings.
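
To make the metric concrete, here is a minimal sketch of a cosine-similarity score averaged over a set of embedding pairs. It is not the evaluation script used for this repo: the embedding model is not specified in this card, the pairing (presumably a generated response against a reference response) is an assumption, and the function names and toy vectors are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def mean_response_score(pairs: Iterable[Tuple[Vector, Vector]]) -> float:
    """Average cosine similarity over (generated, reference) embedding pairs."""
    scores = [cosine_similarity(generated, reference) for generated, reference in pairs]
    return sum(scores) / len(scores)


# Toy 3-dimensional vectors standing in for real response embeddings
toy_pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # same direction: similarity 1.0
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal: similarity 0.0
]
print(mean_response_score(toy_pairs))
```

In a real evaluation, each vector would come from a sentence-embedding model applied to the response text; only the similarity and averaging steps are shown here.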

<!-- README_AWQ.md-use-from-python start -->
## How to use this AWQ model from Python code

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "asya-ai/Mistral-7B-v0.1-AWQ"

# Load the quantized model and its tokenizer
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True,
                                          trust_remote_code=False, safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False)

# Mistral 7B v0.1 is a pretrained (base) model, so the prompt is passed as-is
prompt = "Tell me about AI"
prompt_template = f'''{prompt}

'''

print("\n\n*** Generate:")

# Tokenize the prompt and move the input IDs to the GPU
tokens = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    tokens,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=512
)

print("Output: ", tokenizer.decode(generation_output[0]))
```
<!-- README_AWQ.md-use-from-python end -->

<!-- README_AWQ.md-compatibility start -->
## Compatibility

The files provided are tested to work with:

- [AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
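
Before loading the model, it can help to confirm that the required packages are present. The sketch below uses only the standard library; the distribution names `autoawq`, `transformers`, and `torch` are assumptions about how the packages were installed in your environment, and `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Assumed distribution names; adjust them to match your environment
for package in ("autoawq", "transformers", "torch"):
    found = installed_version(package)
    print(f"{package}: {found if found else 'not installed'}")
```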

<!-- README_AWQ.md-compatibility end -->

# Original model card: Mistral AI's Mistral 7B v0.1


# Model Card for Mistral-7B-v0.1

The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. 
Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested.

For full details of this model, please read our [release blog post](https://mistral.ai/news/announcing-mistral-7b/).

## Model Architecture 
Mistral-7B-v0.1 is a transformer model, with the following architecture choices:
- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer
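
The three ideas in the list above can be illustrated with a small, dependency-free sketch. This is a toy illustration, not Mistral's implementation: the sequence length, window size, and head counts are made-up values chosen for readability, the function names are hypothetical, and the exact window boundary convention varies between implementations.

```python
from typing import List


def sliding_window_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Each position sees itself and at most `window - 1` earlier positions.
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(query_head: int, num_query_heads: int, num_kv_heads: int) -> int:
    """Grouped-query attention: consecutive query heads share one key/value head."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be a multiple of num_kv_heads")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


def byte_fallback_tokens(piece: str) -> List[str]:
    """Byte fallback: text missing from the vocabulary becomes one token per UTF-8 byte."""
    return [f"<0x{byte:02X}>" for byte in piece.encode("utf-8")]


# Toy sizes for illustration only
mask = sliding_window_causal_mask(seq_len=5, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

print([kv_head_for_query_head(h, num_query_heads=8, num_kv_heads=2) for h in range(8)])
print(byte_fallback_tokens("é"))
```

In the printed mask, each row is a query position and each `x` marks a key position it can attend to, so later rows drop the oldest positions once the window is full.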

## The Mistral AI Team
 
Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.