eclfe committed
Commit 32725ff · verified · 1 Parent(s): 5d90657

Update README.md

Files changed (1)
  1. README.md +3 -94
README.md CHANGED
@@ -19,46 +19,24 @@ This model was trained using [H2O LLM Studio](https://github.com/h2oai/h2o-llmst
 
 ## Usage
 
- To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers` library installed.
+ ! pip install transformers==4.40.2
 
- ```bash
- pip install transformers==4.40.2
- ```
-
- Also make sure you are providing your Hugging Face token to the pipeline if the model is in a private repo.
-
- - Either leave `token=True` in the `pipeline` and log in to huggingface_hub by running
-
- ```python
 import huggingface_hub
 huggingface_hub.login(<ACCESS_TOKEN>)
- ```
-
- - Or directly pass your <ACCESS_TOKEN> to `token` in the `pipeline`
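For the second option, a minimal sketch (not part of the original README; it reuses the same pipeline arguments as the example below) of passing the token string directly:

```python
from transformers import pipeline

generate_text = pipeline(
    model="eclfe/sqlen-1-21-1",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map={"": "cuda:0"},
    token="<ACCESS_TOKEN>",  # your Hugging Face access token as a string
)
```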
 
- ```python
 from transformers import pipeline
 
 generate_text = pipeline(
- model="eclfe/sqlen-1-21-1",
+ model="eclfe/sqlen-1-21",
 torch_dtype="auto",
 trust_remote_code=True,
 device_map={"": "cuda:0"},
 token=True,
 )
 
- # generate configuration can be modified to your needs
- # generate_text.model.generation_config.min_new_tokens = 2
- # generate_text.model.generation_config.max_new_tokens = 256
- # generate_text.model.generation_config.do_sample = False
- # generate_text.model.generation_config.num_beams = 1
- # generate_text.model.generation_config.temperature = float(0.0)
- # generate_text.model.generation_config.repetition_penalty = float(1.0)
 
 messages = [
- {"role": "user", "content": "Hi, how are you?"},
- {"role": "assistant", "content": "I'm doing great, how about you?"},
- {"role": "user", "content": "Why is drinking water so healthy?"},
+ {"role": "user", "content": '"SELECT CITYalias0.CITY_NAME FROM CITY AS CITYalias0 WHERE CITYalias0.POPULATION = ( SELECT MAX( CITYalias1.POPULATION ) FROM CITY AS CITYalias1 WHERE CITYalias1.STATE_NAME = "state_name0" ) AND CITYalias0.STATE_NAME = "state_name0" ;'},
 ]
 
 res = generate_text(
@@ -66,75 +44,6 @@ res = generate_text(
 renormalize_logits=True
 )
 print(res[0]["generated_text"][-1]['content'])
- ```
-
- You can print a sample prompt after applying the chat template to see how it is fed to the tokenizer:
-
- ```python
- print(generate_text.tokenizer.apply_chat_template(
- messages,
- tokenize=False,
- add_generation_prompt=True,
- ))
- ```
-
- You may also construct the pipeline from the loaded model and tokenizer yourself and consider the preprocessing steps:
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "eclfe/sqlen-1-21-1"  # either a local folder or a Hugging Face model name
- # Important: The prompt needs to be in the same format the model was trained with.
- # You can find an example prompt in the experiment logs.
- messages = [
- {"role": "user", "content": "Hi, how are you?"},
- {"role": "assistant", "content": "I'm doing great, how about you?"},
- {"role": "user", "content": "Why is drinking water so healthy?"},
- ]
-
- tokenizer = AutoTokenizer.from_pretrained(
- model_name,
- trust_remote_code=True,
- )
- model = AutoModelForCausalLM.from_pretrained(
- model_name,
- torch_dtype="auto",
- device_map={"": "cuda:0"},
- trust_remote_code=True,
- )
- model.cuda().eval()
-
- # generate configuration can be modified to your needs
- # model.generation_config.min_new_tokens = 2
- # model.generation_config.max_new_tokens = 256
- # model.generation_config.do_sample = False
- # model.generation_config.num_beams = 1
- # model.generation_config.temperature = float(0.0)
- # model.generation_config.repetition_penalty = float(1.0)
-
- inputs = tokenizer.apply_chat_template(
- messages,
- tokenize=True,
- add_generation_prompt=True,
- return_tensors="pt",
- return_dict=True,
- ).to("cuda")
-
- tokens = model.generate(
- input_ids=inputs["input_ids"],
- attention_mask=inputs["attention_mask"],
- renormalize_logits=True
- )[0]
-
- tokens = tokens[inputs["input_ids"].shape[1]:]
- answer = tokenizer.decode(tokens, skip_special_tokens=True)
- print(answer)
- ```
-
- ## Quantization and sharding
-
- You can load the model using quantization by specifying `load_in_8bit=True` or `load_in_4bit=True`. Also, sharding on multiple GPUs is possible by setting `device_map="auto"`.
-
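For reference, a minimal sketch of the quantized and sharded loading described above (not part of the original README; assumes `bitsandbytes` and `accelerate` are installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "eclfe/sqlen-1-21-1"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,   # or load_in_8bit=True for 8-bit quantization
    device_map="auto",   # shard layers across all available GPUs
    trust_remote_code=True,
)
```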
  ## Model Architecture
 
  ```
 