Adding optimization section in model card.
README.md
---
inference: false
datasets:
- databricks/databricks-dolly-15k
---

# dolly-v2-7b Olive Optimized Model Card

## Summary

Databricks’ `dolly-v2-7b`, an instruction-following large language model trained on the Databricks machine learning platform

**Owner**: Databricks, Inc.

## Olive Optimization
This repo hosts model files that may be loaded as an [`ORTModelForCausalLM`](https://github.com/huggingface/optimum/blob/a6951c17c3450e1dea99617aa842334f4e904392/optimum/onnxruntime/modeling_decoder.py#L623) when using Python with [🤗 Optimum](https://huggingface.co/docs/optimum/onnxruntime/overview). Alternatively, the ONNX models may be composed into a custom pipeline in any language that supports ONNX Runtime & DirectML. If you choose to use ONNX Runtime & DirectML outside of Python, then you will need to provide your own implementation of the tokenizer.

| Model | Impl |
| ------------------------------- | ----------------------------------------------------------- |
| **dolly-v2-7b decoder merged with past** | **ONNX Model** |
| Tokenizer | `AutoTokenizer` (🤗 Transformers) |

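A minimal sketch of the Python path described above, assuming the `optimum` and `transformers` packages plus `onnxruntime-directml` are installed; the helper name `generate_text` and the local model directory are ours, not part of this repo:

```python
# Sketch only: load the merged ONNX decoder with 🤗 Optimum and generate text.
# The function name and arguments are illustrative, not part of this repo.
def generate_text(model_dir: str, prompt: str, max_new_tokens: int = 64) -> str:
    from optimum.onnxruntime import ORTModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    # "DmlExecutionProvider" selects DirectML; omit `provider` to fall back
    # to the default CPU execution provider.
    model = ORTModelForCausalLM.from_pretrained(
        model_dir, provider="DmlExecutionProvider"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Passing a Hub repo id in place of a local directory should also work, since `from_pretrained` can download from the Hub.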
The ONNX model above was processed with the [Olive](https://github.com/microsoft/olive) toolchain using the [Olive + Dolly V2 with DirectML Sample](https://github.com/microsoft/Olive/tree/main/examples/directml/dolly_v2). The Olive sample performs the following steps:

1. Run the [OptimumConversion Pass](https://microsoft.github.io/Olive/api/passes.html#optimumconversion)
2. Run the [OrtTransformersOptimization Pass](https://microsoft.github.io/Olive/api/passes.html#orttransformersoptimization), which leverages the [ONNX Runtime Transformer Model Optimization Tool](https://onnxruntime.ai/docs/performance/transformers-optimization.html). This step executes several time-consuming graph transformations, such as fusing subgraphs into LayerNorm.
3. Convert the optimized ONNX models from FLOAT32 to FLOAT16.
4. Run the [OptimumMerging Pass](https://microsoft.github.io/Olive/api/passes.html#optimummerging) to leverage caching and reduce memory usage by merging `decoder_model.onnx` and `decoder_with_past_model.onnx` together.

## Model Overview
`dolly-v2-7b` is a 6.9 billion parameter causal language model created by [Databricks](https://databricks.com/) that is derived from
[EleutherAI’s](https://www.eleuther.ai/) [Pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b) and fine-tuned

| databricks/dolly-v1-6b | 0.41 | 0.62963 | 0.643252 | 0.676758 | 0.384812 | 0.773667 | 0.687768 | 0.583431 |
| EleutherAI/gpt-neox-20b | 0.402 | 0.683923 | 0.656669 | 0.7142 | 0.408703 | 0.784004 | 0.695413 | 0.602236 |

# Happy Hacking!

This model is an optimized version of Databricks, Inc.’s [databricks/dolly-v2-7b](https://huggingface.co/databricks/dolly-v2-7b).