patrickvonplaten committed on
Commit b445847 · 1 Parent(s): b002a96

Update README.md

Files changed (1)
  1. README.md +20 -5
README.md CHANGED
@@ -7,20 +7,35 @@ tags:
  license: mit
  ---
 
-
  # OPT : Open Pre-trained Transformer Language Models
 
- OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective.
-
  OPT was first introduced in [Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) and first released in [metaseq's repository](https://github.com/facebookresearch/metaseq) on May 3rd 2022 by Meta AI.
 
  **Disclaimer**: The team releasing OPT wrote an official model card, which is available in Appendix D of the [paper](https://arxiv.org/pdf/2205.01068.pdf).
  Content from **this** model card has been written by the Hugging Face team.
 
+ ## Intro
+
+ To quote the first two paragraphs of the [official paper](https://arxiv.org/abs/2205.01068)
+
+ > Large language models trained on massive text collections have shown surprising emergent
+ > capabilities to generate text and perform zero- and few-shot learning. While in some cases the public
+ > can interact with these models through paid APIs, full model access is currently limited to only a
+ > few highly resourced labs. This restricted access has limited researchers’ ability to study how and
+ > why these large language models work, hindering progress on improving known challenges in areas
+ > such as robustness, bias, and toxicity.
+
+ > We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M
+ > to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match
+ > the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data
+ > collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and
+ > to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the
+ > collective research community as a whole, which is only possible when models are available for study.
+
  ## Model description
 
- OPT belongs to the same family of decoder-only models like [GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained using the self-supervised causal language modeling
- objective.
+ OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective.
+ OPT belongs to the same family of decoder-only models like [GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained using the self-supervised causal language modeling objective.
 
  For evaluation, OPT follows [GPT-3](https://arxiv.org/abs/2005.14165) by using their prompts and overall experimental setup. For more details, please read
  the [official paper](https://arxiv.org/abs/2205.01068).