---
language: 
  - en
tags:
  - gpt2
license: apache-2.0
datasets:
  - wikitext
  - openwebtext
  - spacemanidol/cc-stories
model-index:
  - name: megatron-gpt2-345m
    results:
      - task:
          type: text-generation
          name: Text generation
        dataset:
          name: WikiText-103
          type: wikitext
        metrics:
          - type: wikitext
            value: 19.31
            name: Perplexity
---

<!---
# ##############################################################################################
# 
# Copyright (c) 2021-, NVIDIA CORPORATION.  All rights reserved.
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# 
#     http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# 
# ##############################################################################################
-->

This is an archive of [nvidia/megatron-gpt2-345m](https://huggingface.co/nvidia/megatron-gpt2-345m) that contains readily available model weights (375M). It reaches a perplexity of 19.31 on WikiText-103 (lower is better).<sup>1</sup> For comparison, the 1.5B-parameter GPT-2 reaches 17.48 and the 762M-parameter GPT-2 reaches 22.05 on the same benchmark.<sup>2</sup>
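
The WikiText-103 figure above is reported in the Megatron-LM paper; the sketch below shows one way a comparable perplexity could be estimated with Transformers and the `datasets` library. The dataset config name, the stride, and the subword-level normalization are assumptions, so the resulting number will not exactly match the reported 19.31 (which uses the paper's word-level normalization and preprocessing).

```python
import torch
from datasets import load_dataset
from transformers import GPT2Tokenizer, GPT2LMHeadModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m").to(device)
model.eval()

# Concatenate the WikiText-103 test split into one long token stream.
# "wikitext-103-raw-v1" is an assumed config name for the raw test set.
test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length = model.config.n_positions  # 1024 for this architecture
stride = 512                           # assumption: overlap between evaluation windows
seq_len = encodings.input_ids.size(1)

nll_sum = 0.0
n_tokens = 0
prev_end = 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # only score tokens not scored in a previous window
    input_ids = encodings.input_ids[:, begin:end].to(device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask context-only positions out of the loss

    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss

    nll_sum += loss.item() * trg_len
    n_tokens += trg_len
    prev_end = end
    if end == seq_len:
        break

# Subword-level perplexity over the test split.
print("Perplexity:", torch.exp(torch.tensor(nll_sum / n_tokens)).item())
```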

### References

1. Shoeybi, Mohammad, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv, 2019, [https://doi.org/10.48550/ARXIV.1909.08053](https://doi.org/10.48550/ARXIV.1909.08053).
2. Alec Radford, et al. Language Models are Unsupervised Multitask Learners. OpenAI, 2019. [https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).

## Description

[Megatron](https://arxiv.org/pdf/1909.08053.pdf) is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular checkpoint is a generative, left-to-right transformer in the style of GPT-2 with 345 million parameters. It was trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories.

Find more information at [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
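
If you want to verify the size of the checkpoint yourself, a minimal sketch like the following counts its parameters after loading. The `transformer.wte` and `transformer.wpe` attributes are the standard GPT-2 embedding modules in Transformers; the printed total is slightly above 345M because token and position embeddings are included.

```python
from transformers import GPT2LMHeadModel

# Load the checkpoint and count its parameters.
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")

total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total / 1e6:.1f}M")

# Separate the embeddings from the transformer blocks for a rough breakdown.
embedding = sum(p.numel() for p in model.transformer.wte.parameters()) \
          + sum(p.numel() for p in model.transformer.wpe.parameters())
print(f"Embedding parameters: {embedding / 1e6:.1f}M")
```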

# How to run Megatron GPT2 using Transformers

## Text generation

The following code shows how to use the Megatron GPT2 checkpoint and Transformers to generate text.

```python
import torch

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# The checkpoint reuses the standard GPT-2 tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")

# Run in FP16 on GPU when available, otherwise fall back to FP32 on CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
    model.half()
else:
    device = torch.device("cpu")
model.to(device)
model.eval()

# Generate two continuations of the prompt with nucleus sampling.
prompt = "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
output = model.generate(
    input_ids=input_ids,
    max_new_tokens=128,
    do_sample=True,
    top_k=64,
    top_p=0.9,
    temperature=0.8,
    num_return_sequences=2,
    repetition_penalty=1.025
)

# Print the prompt and each generated continuation.
print("Prompt:", prompt)
print("*" * 3)
for i, sequence in enumerate(output):
    text = tokenizer.decode(sequence, clean_up_tokenization_spaces=True)
    print(f"{i}:", text)
    print("*" * 3)
```
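
The sampling settings above (`top_k`, `top_p`, `temperature`, `repetition_penalty`) trade determinism for diversity. If reproducible output is preferred, a deterministic decoding sketch such as the following can be used instead; the beam width and generation length are illustrative choices, not values from the original card, and the snippet assumes `model`, `tokenizer`, `device`, and `prompt` are already set up as above.

```python
# Deterministic alternative: beam search instead of nucleus sampling.
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
beam_output = model.generate(
    input_ids=input_ids,
    max_new_tokens=64,       # illustrative length
    num_beams=4,             # illustrative beam width
    no_repeat_ngram_size=3,  # avoid verbatim repetition
    early_stopping=True,
    do_sample=False,
)
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
```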

# Original code

The original Megatron code can be found here: [https://github.com/NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM).