doberst committed · verified · Commit aad6ac8 · 1 Parent(s): 30d617b

Update README.md

Files changed (1):
  1. README.md +12 -11
README.md CHANGED
@@ -3,21 +3,19 @@ license: apache-2.0
  inference: false
  ---
 
- # dragon-phi-3-answer-tool
+ # bling-phi-3
 
  <!-- Provide a quick summary of what the model is/does. -->
 
- dragon-phi-3-answer-tool is part of the DRAGON ("Delivering RAG On ...") model series, RAG-instruct trained on top of a Microsoft Phi-3 base model.
-
- DRAGON models are fine-tuned with high-quality custom instruct datasets, designed for production use in RAG scenarios.
+ bling-phi-3 is part of the BLING ("Best Little Instruct No-GPU") model series, RAG-instruct trained on top of a Microsoft Phi-3 base model.
 
 
  ### Benchmark Tests
 
  Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://www.huggingface.co/datasets/llmware/rag_instruct_benchmark_tester)
- Average of 2 Test Runs with 1 point for correct answer, 0.5 point for partial correct or blank / NF, 0.0 points for incorrect, and -1 points for hallucinations.
+ 1 Test Run (temperature=0.0, sample=False) with 1 point for correct answer, 0.5 point for partial correct or blank / NF, 0.0 points for incorrect, and -1 points for hallucinations.
 
- --**Accuracy Score**: **100.0** correct out of 100
+ --**Accuracy Score**: **99.5** correct out of 100
  --Not Found Classification: 95.0%
  --Boolean: 97.5%
  --Math/Logic: 80.0%
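
The scoring rule in the hunk above is a simple weighted sum over the 100 benchmark questions. A minimal sketch of the tally, assuming per-question outcome labels (the label names and the `results` list are illustrative, not part of the benchmark harness):

    # Points per outcome, as described in the scoring line above
    POINTS = {"correct": 1.0, "partial_or_nf": 0.5, "incorrect": 0.0, "hallucination": -1.0}

    def accuracy_score(results):
        # Weighted sum -> the "X correct out of 100" headline number
        return sum(POINTS[r] for r in results)

    # Example: 99 fully correct answers + 1 partial answer = 99.5
    print(accuracy_score(["correct"] * 99 + ["partial_or_nf"]))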
@@ -32,7 +30,7 @@ For test run results (and good indicator of target use cases), please see the fi
  <!-- Provide a longer summary of what this model is. -->
 
  - **Developed by:** llmware
- - **Model type:** Dragon
+ - **Model type:** bling
  - **Language(s) (NLP):** English
  - **License:** Apache 2.0
  - **Finetuned from model:** Microsoft Phi-3
@@ -71,14 +69,17 @@ Any model can provide inaccurate or incomplete information, and should be used i
  The fastest way to get started with BLING is through direct import in transformers:
 
  from transformers import AutoTokenizer, AutoModelForCausalLM
- tokenizer = AutoTokenizer.from_pretrained("bling-phi-2-v0", trust_remote_code=True)
- model = AutoModelForCausalLM.from_pretrained("bling-phi-2-v0", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("llmware/bling-phi-3", trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained("llmware/bling-phi-3", trust_remote_code=True)
 
  Please refer to the generation_test .py files in the Files repository, which includes 200 samples and script to test the model. The **generation_test_llmware_script.py** includes built-in llmware capabilities for fact-checking, as well as easy integration with document parsing and actual retrieval to swap out the test set for RAG workflow consisting of business documents.
 
- The dRAGon model was fine-tuned with a simple "\<human> and \<bot> wrapper", so to get the best results, wrap inference entries as:
-
- full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"
+ The BLING model was fine-tuned with a simple "\<human> and \<bot> wrapper", so to get the best results, wrap inference entries as:
+
+ full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"
+
+ (As an aside, we intended to retire "human-bot" and tried several variations of the new Microsoft Phi-3 prompt template and ultimately had slightly better results with the very simple "human-bot" separators, so we opted to keep them.)
 
 
  The BLING model was fine-tuned with closed-context samples, which assume generally that the prompt consists of two sub-parts:
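
Putting the updated quickstart lines and the "human-bot" wrapper from the hunk above together, a minimal end-to-end inference sketch (the context passage, question, and generation settings such as max_new_tokens are illustrative; greedy decoding mirrors the temperature=0.0, sample=False benchmark setup):

    from transformers import AutoTokenizer, AutoModelForCausalLM

    # Load the model exactly as in the updated quickstart lines
    tokenizer = AutoTokenizer.from_pretrained("llmware/bling-phi-3", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("llmware/bling-phi-3", trust_remote_code=True)

    # Closed-context RAG input: a text passage plus a question about it (placeholder text)
    context = "The company reported revenue of $12.5 million in the third quarter."
    question = "What was the revenue in the third quarter?"
    my_prompt = context + "\n" + question

    # Wrap with the <human>/<bot> separators the model was fine-tuned on
    full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"

    # Deterministic decoding (no sampling), mirroring the benchmark settings
    inputs = tokenizer(full_prompt, return_tensors="pt")
    outputs = model.generate(**inputs, do_sample=False, max_new_tokens=100)

    # Keep only the newly generated tokens after the prompt
    answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(answer)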
 
 