doberst committed · Commit 4c36692 · 1 Parent(s): 34e36fc

Update README.md

Files changed (1): README.md (+33 -10)
README.md CHANGED
@@ -77,14 +77,13 @@ Any model can provide inaccurate or incomplete information, and should be used i

The fastest way to get started with BLING is through direct import in transformers:

- from transformers import AutoTokenizer, AutoModelForCausalLM
- tokenizer = AutoTokenizer.from_pretrained("llmware/bling-sheared-llama-2.7b-0.1")
- model = AutoModelForCausalLM.from_pretrained("llmware/bling-sheared-llama-2.7b-0.1")
-
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ tokenizer = AutoTokenizer.from_pretrained("llmware/bling-sheared-llama-2.7b-0.1")
+ model = AutoModelForCausalLM.from_pretrained("llmware/bling-sheared-llama-2.7b-0.1")

The BLING model was fine-tuned with a simple "\<human> and \<bot> wrapper", so to get the best results, wrap inference entries as:

- full_prompt = "\<human>\: " + my_prompt + "\n" + "\<bot>\:"
+ full_prompt = "\<human>\: " + my_prompt + "\n" + "\<bot>\:"

The BLING model was fine-tuned with closed-context samples, which assume generally that the prompt consists of two sub-parts:

@@ -93,7 +92,35 @@ The BLING model was fine-tuned with closed-context samples, which assume general

To get the best results, package "my_prompt" as follows:

- my_prompt = {{text_passage}} + "\n" + {{question/instruction}}
+ my_prompt = {{text_passage}} + "\n" + {{question/instruction}}
+
+ If you are using a HuggingFace generation script:
+
+ # prepare prompt packaging used in fine-tuning process
+ new_prompt = "<human>: " + entries["context"] + "\n" + entries["query"] + "\n" + "<bot>:"
+
+ inputs = tokenizer(new_prompt, return_tensors="pt")
+ start_of_output = len(inputs.input_ids[0])
+
+ # temperature: set at 0.3 for consistency of output
+ # max_new_tokens: set at 100 - may prematurely stop a few of the summaries
+
+ outputs = model.generate(
+     inputs.input_ids.to(device),
+     eos_token_id=tokenizer.eos_token_id,
+     pad_token_id=tokenizer.eos_token_id,
+     do_sample=True,
+     temperature=0.3,
+     max_new_tokens=100,
+ )
+
+ output_only = tokenizer.decode(outputs[0][start_of_output:],skip_special_tokens=True)
+
+ # note: due to artifact of the fine-tuning, use this post-processing with HF generation
+
+ eot = output_only.find("<|endoftext|>")
+ if eot > -1:
+     output_only = output_only[:eot]


## Citation [optional]

@@ -110,7 +137,3 @@ This BLING model was built on top of a Sheared Llama model base - for more infor

Darren Oberst & llmware team

- Please reach out anytime if you are interested in this project and would like to participate and work with us!
-
-
-
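
For readers landing on this commit, the following is a consolidated, self-contained sketch of the workflow the added snippets describe: load the model, package the prompt in the \<human>/\<bot> wrapper, generate, and trim the trailing <|endoftext|> artifact. The device selection and the sample passage/question strings are assumptions added for illustration and are not part of the commit; the diff's entries["context"] and entries["query"] are replaced here by explicit variables.

    # Consolidated sketch of the steps added in this commit: load, prompt packaging, generation, post-processing.
    # Assumptions (not in the diff): the device selection and the sample passage/question below.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_name = "llmware/bling-sheared-llama-2.7b-0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    device = "cuda" if torch.cuda.is_available() else "cpu"   # assumption: device not specified in the README
    model.to(device)

    # package the prompt as the README describes: text passage + question, wrapped in the <human>/<bot> format
    text_passage = "The invoice total is $1,000, due on March 1."     # hypothetical example input
    question = "What is the invoice total?"                           # hypothetical example input
    full_prompt = "<human>: " + text_passage + "\n" + question + "\n" + "<bot>:"

    inputs = tokenizer(full_prompt, return_tensors="pt")
    start_of_output = len(inputs.input_ids[0])

    outputs = model.generate(
        inputs.input_ids.to(device),
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=True,
        temperature=0.3,      # kept low for consistent output, per the README comment
        max_new_tokens=100,   # may truncate longer answers, per the README comment
    )

    # decode only the newly generated tokens
    output_only = tokenizer.decode(outputs[0][start_of_output:], skip_special_tokens=True)

    # post-processing noted in the diff: strip a stray <|endoftext|> artifact if present
    eot = output_only.find("<|endoftext|>")
    if eot > -1:
        output_only = output_only[:eot]

    print(output_only)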