---
language:
- en
tags:
- gpt2
license: apache-2.0
datasets:
- wikitext
- openwebtext
- spacemanidol/cc-stories
model-index:
- name: megatron-gpt2-345m
  results:
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: WikiText-103
      type: wikitext
    metrics:
      - type: wikitext
        value: 19.31
        name: Perplexity
---
<!---
# ##############################################################################################
#
# Copyright (c) 2021-, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ##############################################################################################
-->
This is an archive of [nvidia/megatron-gpt2-345m](https://huggingface.co/nvidia/megatron-gpt2-345m) that contains readily available model weights (375M). Its perplexity on WikiText-103 is 19.31.<sup>1</sup> In comparison, the perplexity of GPT2-XL (1.5B) is 17.48 and that of GPT2-large (762M) is 22.05.<sup>2</sup>
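The WikiText-103 figures above are perplexities (lower is better), as the `model-index` metadata indicates. As a minimal sketch of what that number means, perplexity is the exponential of the mean negative log-likelihood per token. The helper name `perplexity` and the toy input are illustrative assumptions, not part of the evaluation code behind the reported values, and published WikiText-103 numbers may normalize by word count instead of subword-token count, so a subword-level figure is not necessarily comparable.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical example: a model that assigns probability 1/4 to every token
# is as uncertain as a uniform choice among 4 options.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))  # approximately 4
```

Read this way, a perplexity of 19.31 roughly corresponds to the model being as uncertain as a uniform choice among about 19 options per scored unit.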
### References
1. Shoeybi, Mohammad, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv, 2019, [https://doi.org/10.48550/ARXIV.1909.08053](https://doi.org/10.48550/ARXIV.1909.08053).
2. Radford, Alec, et al. Language Models are Unsupervised Multitask Learners. OpenAI, 2019, [https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).
## Description
[Megatron](https://arxiv.org/pdf/1909.08053.pdf) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model is a generative, left-to-right transformer in the style of GPT-2. It was trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories, and it contains 345 million parameters.
Find more information at [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM).
# How to run Megatron GPT2 using Transformers
## Text generation
The following code shows how to use the Megatron GPT2 checkpoint and Transformers to generate text.
```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# The checkpoint is loaded with the stock GPT-2 tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")

# Use half precision on GPU; keep full precision on CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
    model.half()
else:
    device = torch.device("cpu")
model.to(device)
model.eval()

# Generate
prompt = "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
output = model.generate(
    input_ids=input_ids,
    # input_ids has shape (batch, sequence), so the prompt length is shape[1];
    # len(input_ids) would return the batch size (1) instead.
    max_length=input_ids.shape[1] + 128,
    do_sample=True,
    top_k=64,
    top_p=0.9,
    temperature=0.8,
    num_return_sequences=2,
    repetition_penalty=1.025
)

# Output the text.
print("Prompt:", prompt)
print("*" * 3)
for i, sentence in enumerate(output):
    text = tokenizer.decode(sentence, clean_up_tokenization_spaces=True)
    print(f"{i}:", text)
    print("*" * 3)
```
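The sampling arguments passed to `generate` above (`temperature`, `top_k`, `top_p`) narrow the next-token distribution before a token is drawn. The following is an illustrative pure-Python sketch of that idea on a toy list of logits. It is not the Transformers implementation: `filter_distribution` is a hypothetical helper, the order of the steps is an assumption, and `repetition_penalty` is left out.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]

    # Softmax, subtracting the maximum for numerical stability.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]

    # Rank token indices from most to least likely.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    total = sum(weights[i] for i in ranked)

    # Top-p (nucleus): keep the smallest prefix whose cumulative
    # probability, among the tokens that survived top-k, reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += weights[i] / total
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(weights[i] for i in kept)
    return {i: weights[i] / norm for i in kept}


# Toy example with four candidate tokens (made-up logits).
result = filter_distribution([2.0, 1.0, 0.0, -1.0], temperature=0.8, top_k=3, top_p=0.9)
print(result)
```

With these made-up logits, only the two most likely tokens survive, so sampling would choose between them. Loosening `top_p` or raising `temperature` keeps more candidates and produces more varied text.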
# Original code
The original Megatron code can be found here: [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM).