docs: add model card
README.md
---
language:
- en
- zh
license: other
tags:
- chat
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
library_name: transformers
---

# magnum-72b-v1-llamaify

This is a converted version of the Magnum 72B v1 model, now in LLaMA format. The original model was designed to replicate the prose quality of the Claude 3 models, specifically Sonnet and Opus. This converted version retains the same capabilities while being compatible with LLaMA-based frameworks and tools.

Inference may also be somewhat faster, particularly with frameworks optimized for the LLaMA architecture.

## Model Details

- **Base Model:** [Qwen-2 72B Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct)
- **Training Data:** 55 million tokens of high-quality RP data
- **Training Duration:** 1.5 epochs
- **Hardware Used:** 8x AMD Instinct™ MI300X Accelerators

## Prompting

The model uses ChatML formatting for instructions. A typical input would look like this:

```
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
```

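With `transformers`, this prompt format can also be produced by the tokenizer's chat template rather than assembled by hand. The snippet below is a minimal sketch that assumes the repository ships a ChatML chat template; the model id is a placeholder, so substitute the actual repository id or a local path.

```python
from transformers import AutoTokenizer

# Placeholder id: replace with the actual repository id or a local path.
model_id = "magnum-72b-v1-llamaify"

tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]

# add_generation_prompt=True appends the trailing "<|im_start|>assistant"
# turn, matching the template shown above.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```
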
## Credits

Credit goes to Anthracite for the original model.

## Conversion Details

This version of the model has been converted to the LLaMA format to enhance compatibility with a wider range of tools and frameworks. While the core capabilities of the model remain the same, users should be aware that there might be slight differences in performance due to the conversion process.

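One way to sanity-check the conversion is to inspect the checkpoint's configuration and confirm that it reports the LLaMA architecture. This is a minimal sketch with a placeholder model id; substitute the actual repository id or a local path.

```python
from transformers import AutoConfig

# Placeholder id: replace with the actual repository id or a local path.
config = AutoConfig.from_pretrained("magnum-72b-v1-llamaify")

# A LLaMA-format checkpoint should report the LLaMA model type and
# architecture class here (e.g. "llama" / ["LlamaForCausalLM"]).
print(config.model_type, config.architectures)
```
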
## Usage

The model can be used with `transformers` or any other software that supports LLaMA-architecture models.

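For example, here is a minimal `transformers` sketch. The model id is a placeholder (substitute the actual repository id or a local path), and the dtype and device settings are illustrative, since a 72B-parameter model generally needs multiple GPUs or quantization.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: replace with the actual repository id or a local path.
model_id = "magnum-72b-v1-llamaify"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; adjust for your hardware
    device_map="auto",
)

messages = [{"role": "user", "content": "Hi there!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
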
## Limitations

While this converted model maintains the general capabilities of the original, there may be subtle differences in performance or behavior due to the format change. Testing the model on your specific use case is recommended.

## License

This model inherits the license of its base model, Qwen-2 72B Instruct. Please refer to the [original license](https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE) for terms of use.

## Contact

For questions or issues related to this converted model, please open an issue in the model's repository.