---
language:
- en
tags:
- gpt2
license: apache-2.0
datasets:
- wikitext
- openwebtext
- spacemanidol/cc-stories
model-index:
- name: megatron-gpt2-345m
  results:
  - task:
      type: text-generation
      name: Text generation
    dataset:
      name: WikiText-103
      type: wikitext
    metrics:
    - type: wikitext
      value: 19.31
      name: Perplexity
---

<!---
# ##############################################################################################
#
# Copyright (c) 2021-, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ##############################################################################################
-->

This is an archive of [nvidia/megatron-gpt2-345m](https://huggingface.co/nvidia/megatron-gpt2-345m) that contains readily available model weights (375M). Its perplexity on WikiText-103 is 19.31 (lower is better).<sup>1</sup> For comparison, the 1.5B-parameter GPT-2 reaches a perplexity of 17.48 and the 762M-parameter GPT-2 reaches 22.05.<sup>2</sup>
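
Perplexity is the exponential of the average per-token negative log-likelihood, which is why a lower number means a better language model. The minimal sketch below computes it from a list of natural-log token probabilities. The function name `perplexity` and the toy inputs are illustrative only; the sketch does not reproduce the exact evaluation protocol (tokenization, context windowing) behind the reported WikiText-103 numbers.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(-mean(log p)) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that is uniformly unsure among 20 tokens at every step
    # has perplexity 20: it is as uncertain as a 20-way choice.
    uniform = [math.log(1 / 20)] * 5
    print(round(perplexity(uniform), 6))  # 20.0

    # Assigning higher probability to the observed tokens lowers perplexity.
    confident = [math.log(0.5), math.log(0.25)]
    print(round(perplexity(confident), 6))  # 2.828427
```

Read this way, a perplexity of 19.31 means that, averaged over the test text, the model is about as uncertain as if it were choosing uniformly among roughly 19 tokens at each step.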

### References

1. Shoeybi, Mohammad, et al. "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism." arXiv, 2019. [https://doi.org/10.48550/ARXIV.1909.08053](https://doi.org/10.48550/ARXIV.1909.08053).
2. Radford, Alec, et al. "Language Models are Unsupervised Multitask Learners." OpenAI, 2019. [https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).

## Description

[Megatron](https://arxiv.org/pdf/1909.08053.pdf) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model is a generative, left-to-right transformer in the style of GPT-2. It was trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories, and it contains 345 million parameters.
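
To show roughly where a parameter count of this size comes from, the sketch below counts the weights of a GPT-2-style decoder. It assumes the configuration that Megatron-LM uses for its 345M GPT example (24 layers, hidden size 1024, 16 attention heads, 1024 positions), the standard 50,257-token GPT-2 vocabulary, and input/output embeddings tied as in `GPT2LMHeadModel`. The helper name `gpt2_param_count` is hypothetical. The head count does not enter the formula because the heads split the same projection matrices.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab: int, n_positions: int) -> int:
    """Count weights in a GPT-2-style decoder with tied input/output embeddings."""
    d = d_model
    embeddings = vocab * d + n_positions * d  # token + learned position embeddings
    per_layer = (
        2 * d                    # ln_1 (scale + bias)
        + d * 3 * d + 3 * d      # fused Q/K/V projection (weight + bias)
        + d * d + d              # attention output projection
        + 2 * d                  # ln_2
        + d * 4 * d + 4 * d      # MLP up-projection (4x expansion)
        + 4 * d * d + d          # MLP down-projection
    )
    final_norm = 2 * d           # ln_f
    # The LM head reuses the token-embedding matrix, so it adds no parameters.
    return embeddings + n_layer * per_layer + final_norm


# Assumed configuration for this model size (see the lead-in above).
total = gpt2_param_count(n_layer=24, d_model=1024, vocab=50257, n_positions=1024)
print(f"{total:,}")  # 354,823,168
```

Under these assumptions the count comes to about 355M, which is close to the nominal "345M" label. The exact figure shifts with details such as vocabulary padding (Megatron-LM can pad the vocabulary size for efficient model-parallel partitioning) and whether the output layer is tied to the input embeddings.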

Find more information at [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM).

# How to run Megatron GPT2 using Transformers

## Text generation

The following code shows how to use the Megatron GPT2 checkpoint with Transformers to generate text.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# The checkpoint uses the standard GPT-2 BPE vocabulary, so the stock tokenizer works.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")

# Run in half precision on GPU to save memory; stay in float32 on CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
    model.half()
else:
    device = torch.device("cpu")
model.to(device)
model.eval()

# Generate
prompt = "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
output = model.generate(
    input_ids=input_ids,
    # input_ids has shape (1, seq_len), so use shape[1] (not len()) for the prompt length.
    max_length=input_ids.shape[1] + 128,
    do_sample=True,
    top_k=64,
    top_p=0.9,
    temperature=0.8,
    num_return_sequences=2,
    repetition_penalty=1.025,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)

# Output the text.
print("Prompt:", prompt)
print("*" * 3)
for i, sentence in enumerate(output):
    text = tokenizer.decode(sentence, clean_up_tokenization_spaces=True)
    print(f"{i}:", text)
    print("*" * 3)
```
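
The `temperature`, `top_k`, `top_p`, and `repetition_penalty` arguments reshape the next-token distribution before each sample is drawn. The simplified, pure-Python sketch below shows what each one does to a single step's logits. It is not the Transformers implementation, which works on batched tensors through logits processors and may differ in ordering and edge cases. The helper names `filter_logits` and `sample` and the toy logits are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def filter_logits(
    logits: Sequence[float],
    previous_ids: Sequence[int] = (),
    temperature: float = 1.0,
    top_k: int = 0,
    top_p: float = 1.0,
    repetition_penalty: float = 1.0,
) -> Dict[int, float]:
    """Return {token_id: probability} after penalty, temperature, top-k and top-p."""
    scores = list(logits)
    # Repetition penalty: make already-generated tokens less likely
    # (positive logits are divided, negative logits are multiplied).
    for tok in set(previous_ids):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scores = [s / temperature for s in scores]
    # Top-k: keep only the k highest-scoring tokens (0 disables the filter).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    # Softmax over the surviving tokens (subtract the max for numerical stability).
    m = max(scores[i] for i in ranked)
    exps = {i: math.exp(scores[i] - m) for i in ranked}
    z = sum(exps.values())
    probs = {i: e / z for i, e in exps.items()}
    # Top-p (nucleus): keep the smallest high-probability prefix whose mass reaches top_p.
    kept: List[int] = []
    cumulative = 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize the kept tokens so they sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(probs: Dict[int, float], rng: Optional[random.Random] = None) -> int:
    """Draw one token id from a {token_id: probability} mapping."""
    rng = rng or random.Random()
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, 0.3, -1.0, -2.5]  # hypothetical five-token vocabulary
    probs = filter_logits(toy_logits, previous_ids=[0], temperature=0.8,
                          top_k=3, top_p=0.9, repetition_penalty=1.025)
    print(probs)  # only tokens 0 and 1 survive top-k followed by top-p
    print(sample(probs, random.Random(0)))
```

With the same settings as the example above, top-k first cuts the toy vocabulary to three tokens, and top-p then drops the third because the first two already cover more than 90% of the probability mass.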

# Original code

The original Megatron code can be found here: [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM).