alokabhishek committed on
Commit
b2efe14
·
verified ·
1 Parent(s): 9b3153c

Created Readme

  1. README.md +118 -0
README.md ADDED
@@ -0,0 +1,118 @@
+ ---
+ library_name: transformers
+ tags:
+ - 4bit
+ - AWQ
+ - AutoAWQ
+ - llama
+ - llama-2
+ - facebook
+ - meta
+ - 7b
+ - quantized
+ license: llama2
+ pipeline_tag: text-generation
+ ---
+
+ # Model Card for alokabhishek/Llama-2-7b-chat-hf-4bit-AWQ
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ This repo contains a 4-bit quantized version (produced with AutoAWQ) of Meta's meta-llama/Llama-2-7b-chat-hf.
+
+ ## Model Details
+
+ - Model creator: [Meta](https://huggingface.co/meta-llama)
+ - Original model: [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
+
+
+ ### About 4-bit quantization using AutoAWQ
+
+ AutoAWQ GitHub repo: [AutoAWQ GitHub repo](https://github.com/casper-hansen/AutoAWQ/tree/main)
+
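To make "4-bit" concrete, the sketch below applies plain round-to-nearest quantization with one scale and zero point per weight group. It is only an illustration under simplifying assumptions: AWQ additionally rescales salient weight channels using activation statistics before quantizing, which this toy does not do. The function names and the sample weights are hypothetical and are not part of AutoAWQ.

```python
def quantize_group(weights, n_bits=4):
    """Round-to-nearest asymmetric quantization of one weight group (toy example)."""
    q_max = 2 ** n_bits - 1                      # 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / q_max or 1.0       # fall back to 1.0 for a constant group
    zero_point = round(-w_min / scale)           # integer code that represents 0.0
    codes = [min(q_max, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero_point) * scale for c in codes]


# Hypothetical weight group, chosen only for illustration
group = [-0.8, -0.1, 0.0, 0.3, 0.7]
codes, scale, zero_point = quantize_group(group)
restored = dequantize_group(codes, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(group, restored))
print("codes:", codes)
print("scale:", scale, "zero point:", zero_point)
print("max reconstruction error:", max_error)
```

Each weight is stored as an integer between 0 and 15, plus a shared scale and zero point per group, which is where the memory saving over 16-bit weights comes from.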
+ # How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ ## How to run from Python code
+
+ #### First install the packages
+ ```shell
+ # In a Jupyter or Colab notebook, prefix each command with "!"
+ pip install autoawq
+ pip install accelerate
+ ```
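Before running the examples, it can help to confirm that the packages are importable. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the list of import names is an assumption based on the imports used in this README (`autoawq` is imported as `awq`).

```python
import importlib.util


def missing_packages(names):
    """Return the import names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed from this README's examples
required = ["awq", "accelerate", "transformers", "torch"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```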
+
+ #### Import
+
+ ```python
+ # Only these two imports are needed for the example below
+ from transformers import AutoTokenizer
+ from awq import AutoAWQForCausalLM
+ ```
+
+ #### Load the quantized model and generate text
+
+ ```python
+ # Define the model ID
+ model_id_llama = "alokabhishek/Llama-2-7b-chat-hf-4bit-AWQ"
+
+ # Load the tokenizer and the quantized model
+ tokenizer_llama = AutoTokenizer.from_pretrained(model_id_llama, use_fast=True)
+ model_llama = AutoAWQForCausalLM.from_quantized(model_id_llama, fuse_layers=True, trust_remote_code=False, safetensors=True)
+
+ # Set up the prompt and prompt template. Change the instruction as required.
+ prompt_llama = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
+ formatted_prompt = f'''[INST] <<SYS>> You are a helpful, and fun loving assistant. Always answer as jestfully as possible. <</SYS>> {prompt_llama} [/INST] '''
+
+ # Tokenize the prompt and move the input IDs to the GPU (requires CUDA)
+ tokens = tokenizer_llama(formatted_prompt, return_tensors="pt").input_ids.cuda()
+
+ # Generate output; adjust the sampling parameters as required
+ generation_output = model_llama.generate(tokens, do_sample=True, temperature=1.7, top_p=0.95, top_k=40, max_new_tokens=512)
+
+ # Print the decoded output
+ print(tokenizer_llama.decode(generation_output[0], skip_special_tokens=True))
+ ```
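The prompt string above wraps a system message in `<<SYS>>` tags inside an `[INST]` block. A small helper can keep that formatting in one place. This is a sketch only: the name `build_llama2_prompt` is hypothetical, and it mirrors the single-line layout used in this README rather than any other variant of the template, so treat the exact whitespace as an assumption.

```python
def build_llama2_prompt(user_message, system_message=None):
    """Format a single-turn prompt in the [INST] / <<SYS>> style used in this README."""
    user_message = user_message.strip()
    if system_message:
        return f"[INST] <<SYS>> {system_message.strip()} <</SYS>> {user_message} [/INST] "
    return f"[INST] {user_message} [/INST] "


# Example usage with illustrative messages
prompt = build_llama2_prompt(
    "Tell me a funny joke about Large Language Models.",
    system_message="You are a helpful and fun-loving assistant.",
)
print(prompt)
```

The returned string can be passed to the tokenizer in place of the hand-written f-string.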
+
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]