nvidia-open-model-license (#12), opened by itlevy
NOTICE ADDED
@@ -0,0 +1,5 @@
+Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+NVIDIA CORPORATION, its affiliates and licensors retain all intellectual property and proprietary rights in and to this material, related documentation and any modifications thereto. Any use, reproduction, disclosure or distribution of this material and related documentation without an express license agreement from NVIDIA CORPORATION or its affiliates is strictly prohibited.
+
+Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
README.md CHANGED
@@ -8,9 +8,9 @@ tags:
 - llama-3
 - pytorch
 license: other
-license_name: nvidia-
+license_name: nvidia-open-model-license
 license_link: >-
-https://
+https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
 ---
 
 # Llama-3_1-Nemotron-51B-instruct
@@ -22,14 +22,15 @@ Llama-3_1-Nemotron-51B-instruct is a model which offers a great tradeoff between
 
 
 ## License
-[NVIDIA
+This model is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
+Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/). Built with Llama.
 
 ## How was the model developed
 
 Llama-3_1-Nemotron-51B-instruct is a large language model (LLM) which is a derivative of Llama-3.1-70B-instruct (AKA the reference model). We utilize a block-wise distillation of the reference model, where for each block we create multiple variants providing different tradeoffs of quality vs. computational complexity. We then search over the blocks to create a model which meets the required throughput and memory (optimized for a single H100-80GB GPU) while minimizing the quality degradation. The model then undergoes knowledge distillation (KD), with a focus on English single and multi-turn chat use-cases.
 The KD step included 40 billion tokens consisting of a mixture of 3 datasets - FineWeb, Buzz-V1.2 and Dolma.
 
-Links to [NIM](https://build.nvidia.com/nvidia/llama-3_1-nemotron-51b-
+Links to [NIM](https://build.nvidia.com/nvidia/llama-3_1-nemotron-51b-vv), blog and [huggingface](https://huggingface.co/nvidia/Llama-3_1-Nemotron-51B-Instruct)
 
 
 This results in a final model that is aligned for human chat preferences.
@@ -74,7 +75,7 @@ print(pipeline([{"role": "user", "content": "Hey how are you?"}]))
 
 ## Required Hardware
 
-FP8 Inference (
+FP8 Inference (recommended):
 - 1x H100-80GB GPU
 
 BF16 Inference: