nvidia-open-model-license

#12
by itlevy - opened
Files changed (2)
  1. NOTICE +5 -0
  2. README.md +6 -5
NOTICE ADDED
@@ -0,0 +1,5 @@
+ Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+ NVIDIA CORPORATION, its affiliates and licensors retain all intellectual property and proprietary rights in and to this material, related documentation and any modifications thereto. Any use, reproduction, disclosure or distribution of this material and related documentation without an express license agreement from NVIDIA CORPORATION or its affiliates is strictly prohibited.
+
+ Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
README.md CHANGED
@@ -8,9 +8,9 @@ tags:
  - llama-3
  - pytorch
  license: other
- license_name: nvidia-ai-foundation-models-community-license
+ license_name: nvidia-open-model-license
  license_link: >-
- https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/
+ https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
  ---
 
  # Llama-3_1-Nemotron-51B-instruct
@@ -22,14 +22,15 @@ Llama-3_1-Nemotron-51B-instruct is a model which offers a great tradeoff between
 
 
  ## License
- [NVIDIA AI Foundation Models Community License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/). Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/). Built with Llama.
+ This model is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
+ Additional Information: [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/). Built with Llama.
 
  ## How was the model developed
 
  Llama-3_1-Nemotron-51B-instruct is a large language model (LLM) which is a derivative of Llama-3.1-70B-instruct (AKA the reference model). We utilize a block-wise distillation of the reference model, where for each block we create multiple variants providing different tradeoffs of quality vs. computational complexity. We then search over the blocks to create a model which meets the required throughput and memory (optimized for a single H100-80GB GPU) while minimizing the quality degradation. The model then undergoes knowledge distillation (KD), with a focus on English single and multi-turn chat use-cases.
  The KD step included 40 billion tokens consisting of a mixture of 3 datasets - FineWeb, Buzz-V1.2 and Dolma.
 
- Links to [NIM](https://build.nvidia.com/nvidia/llama-3_1-nemotron-51b-instruct), [blog](https://developer.nvidia.com/blog/advancing-the-accuracy-efficiency-frontier-with-llama-3-1-nemotron-51b/) and [huggingface](https://huggingface.co/nvidia/Llama-3_1-Nemotron-51B-Instruct)
+ Links to [NIM](https://build.nvidia.com/nvidia/llama-3_1-nemotron-51b-vv), blog and [huggingface](https://huggingface.co/nvidia/Llama-3_1-Nemotron-51B-Instruct)
 
 
  This results in a final model that is aligned for human chat preferences.
@@ -74,7 +75,7 @@ print(pipeline([{"role": "user", "content": "Hey how are you?"}]))
 
  ## Required Hardware
 
- FP8 Inference (recommended):
+ FP8 Inference (recommendedrecommended):
  - 1x H100-80GB GPU
 
  BF16 Inference:
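As a quick check of the license metadata this PR proposes, here is a minimal Python sketch that reads the license fields from the model card's YAML front matter. It assumes the new front matter looks like the added side of the first README hunk, with the URL indented under `license_link: >-` as YAML requires (the diff view does not show that indentation). `parse_license_fields` is a hypothetical helper that only understands top-level `key: value` pairs and single-paragraph `>-` folded scalars; a real check should use a full YAML parser such as PyYAML.

```python
import re

# License fields of the model card front matter as proposed by this PR.
# Assumption: the URL is indented under `license_link: >-` (the diff view
# does not show indentation); other front-matter keys are omitted.
FRONT_MATTER = """\
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
"""


def parse_license_fields(text: str) -> dict[str, str]:
    """Hypothetical helper: read top-level `key: value` pairs from YAML text.

    Handles plain scalars and single-paragraph `>-` folded scalars only;
    it is not a general YAML parser.
    """
    fields: dict[str, str] = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = re.match(r"^([A-Za-z_][\w-]*):\s*(.*)$", lines[i])
        i += 1
        if not match:
            continue  # skip "---" delimiters, comments, blank and nested lines
        key, value = match.group(1), match.group(2).strip()
        if value == ">-":
            # Folded scalar: join the indented continuation lines with spaces.
            parts = []
            while i < len(lines) and lines[i].startswith(" "):
                parts.append(lines[i].strip())
                i += 1
            value = " ".join(parts)
        fields[key] = value
    return fields


fields = parse_license_fields(FRONT_MATTER)

# Hugging Face model cards pair `license: other` with `license_name` and
# `license_link`, so all three should be present after this change.
assert fields["license"] == "other"
assert fields.get("license_name") and fields.get("license_link")
print(fields["license_name"])
print(fields["license_link"])
```

Running it prints the new license name and the PDF link. The `license: other` plus `license_name`/`license_link` pairing follows the Hugging Face model card metadata convention for custom licenses, which is why the sketch checks all three fields rather than `license` alone.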