llmware/bling-sheared-llama-2.7b-0.1

doberst committed on Nov 12, 2023

Commit 4c36692 · 1 Parent(s): 34e36fc

Update README.md

Files changed (1)

README.md +33 -10

@@ -77,14 +77,13 @@ Any model can provide inaccurate or incomplete information, and should be used i
 The fastest way to get started with BLING is through direct import in transformers:
-from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("llmware/bling-sheared-llama-2.7b-0.1")
-model = AutoModelForCausalLM.from_pretrained("llmware/bling-sheared-llama-2.7b-0.1")
 The BLING model was fine-tuned with a simple "\<human> and \<bot> wrapper", so to get the best results, wrap inference entries as:
-full_prompt = "\<human>\: " + my_prompt + "\n" + "\<bot>\:"
 The BLING model was fine-tuned with closed-context samples, which assume generally that the prompt consists of two sub-parts:
@@ -93,7 +92,35 @@ The BLING model was fine-tuned with closed-context samples, which assume general
 To get the best results, package "my_prompt" as follows:
-my_prompt = {{text_passage}} + "\n" + {{question/instruction}}
 ## Citation [optional]
@@ -110,7 +137,3 @@ This BLING model was built on top of a Sheared Llama model base - for more infor
 Darren Oberst & llmware team
-Please reach out anytime if you are interested in this project and would like to participate and work with us!

 The fastest way to get started with BLING is through direct import in transformers:
+   from transformers import AutoTokenizer, AutoModelForCausalLM
+   tokenizer = AutoTokenizer.from_pretrained("llmware/bling-sheared-llama-2.7b-0.1")
+   model = AutoModelForCausalLM.from_pretrained("llmware/bling-sheared-llama-2.7b-0.1")
 The BLING model was fine-tuned with a simple "\<human> and \<bot> wrapper", so to get the best results, wrap inference entries as:
+   full_prompt = "\<human>\: " + my_prompt + "\n" + "\<bot>\:"
 The BLING model was fine-tuned with closed-context samples, which assume generally that the prompt consists of two sub-parts:
 To get the best results, package "my_prompt" as follows:
+    my_prompt = {{text_passage}} + "\n" + {{question/instruction}}
+If you are using a HuggingFace generation script:
+    # prepare prompt packaging used in fine-tuning process
+    new_prompt = "<human>: " + entries["context"] + "\n" + entries["query"] + "\n" + "<bot>:"
+    inputs = tokenizer(new_prompt, return_tensors="pt")
+    start_of_output = len(inputs.input_ids[0])
+    #   temperature: set at 0.3 for consistency of output
+    #   max_new_tokens:  set at 100 - may prematurely stop a few of the summaries
+    outputs = model.generate(
+            inputs.input_ids.to(device),
+            eos_token_id=tokenizer.eos_token_id,
+            pad_token_id=tokenizer.eos_token_id,
+            do_sample=True,
+            temperature=0.3,
+            max_new_tokens=100,
+            )
+    output_only = tokenizer.decode(outputs[0][start_of_output:],skip_special_tokens=True)
+    #   note: due to artifact of the fine-tuning, use this post-processing with HF generation
+    eot = output_only.find("<|endoftext|>")
+    if eot > -1:
+        output_only = output_only[:eot]
 ## Citation [optional]
 Darren Oberst & llmware team
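
The prompt packaging and the post-processing step added in this commit can be sketched as two small pure-Python helpers. This is a minimal sketch, not code from llmware or transformers: the names `build_prompt`, `trim_at_eot`, and `EOT_MARKER` are hypothetical, chosen only for illustration, and the sample passage and question are made up.

```python
# Hypothetical helpers illustrating the README's prompt wrapper and
# end-of-text trimming; they are not part of llmware or transformers.

EOT_MARKER = "<|endoftext|>"  # marker string searched for in the README's post-processing


def build_prompt(context: str, query: str) -> str:
    """Package a text passage and a question in the <human>/<bot> wrapper."""
    # Closed-context layout from the README: passage first, then the question.
    my_prompt = context + "\n" + query
    return "<human>: " + my_prompt + "\n" + "<bot>:"


def trim_at_eot(text: str, marker: str = EOT_MARKER) -> str:
    """Cut decoded output at the first end-of-text marker, if one is present."""
    eot = text.find(marker)
    return text[:eot] if eot > -1 else text


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not taken from the model card).
    prompt = build_prompt(
        "The invoice total is $1,200, due on March 1.",
        "What is the total amount due?",
    )
    print(prompt)
    print(trim_at_eot("$1,200<|endoftext|><human>: leftover text"))
```

The string returned by `build_prompt` would be passed to the tokenizer in place of `new_prompt`, and `trim_at_eot` would be applied to the decoded `output_only` string, mirroring the order of steps in the generation script above.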