Model Card Restructure & edits

#42
by Ezi - opened
Files changed (1)
  1. README.md +175 -214
README.md CHANGED
@@ -60,172 +60,52 @@ pipeline_tag: text-generation
60
 
61
  Version 1.0 / 26.May.2022
62
 
 
 
 
 
 
63
  ## Table of Contents
64
  1. [Model Details](#model-details)
65
  2. [Uses](#uses)
66
- 3. [Training Data](#training-data)
67
- 4. [Risks and Limitations](#risks-and-limitations)
68
- 5. [Evaluation](#evaluation)
69
- 6. [Recommendations](#recommendations)
70
- 7. [Glossary and Calculations](#glossary-and-calculations)
71
- 8. [More Information](#more-information)
72
- 9. [Model Card Authors](#model-card-authors)
 
 
 
 
73
 
74
  ## Model Details
75
 
76
- ### Basics
77
  *This section provides information for anyone who wants to know about the model.*
78
-
79
- <details>
80
- <summary>Click to expand</summary> <br/>
81
-
82
- **Developed by:** BigScience ([website](https://bigscience.huggingface.co))
83
-
84
- * All collaborators are either volunteers or have an agreement with their employer. *(Further breakdown of participants forthcoming.)*
85
-
86
- **Model Type:** Transformer-based Language Model
87
-
88
- **Version:** 1.0.0
89
-
90
- **Languages:** Multiple; see [training data](#training-data)
91
-
92
- **License:** RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))
93
-
94
- **Release Date Estimate:** Monday, 11.July.2022
95
-
96
- **Send Questions to:** [email protected]
97
-
98
- **Cite as:** BigScience, _BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022
99
-
100
- **Funded by:**
101
-
102
- * The French government.
103
-
104
- * Hugging Face ([website](https://huggingface.co)).
105
-
106
- * Organizations of contributors. *(Further breakdown of organizations forthcoming.)*
107
-
108
- </details>
109
-
110
- ### Technical Specifications
111
- *This section provides information for people who work on model development.*
112
-
113
- <details>
114
- <summary>Click to expand</summary><br/>
115
-
116
- Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.
117
-
118
- **Model Architecture:** Modified from Megatron-LM GPT2 (see [paper](https://arxiv.org/abs/1909.08053), [BLOOM Megatron code](https://github.com/bigscience-workshop/Megatron-DeepSpeed)):
119
-
120
- * Decoder-only architecture
121
-
122
- * Layer normalization applied to word embeddings layer (`StableEmbedding`; see [code](https://github.com/facebookresearch/bitsandbytes), [paper](https://arxiv.org/pdf/2110.02861.pdf))
123
-
124
- * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
125
-
126
- * 1,722,408,960 parameters:
127
-
128
- * 513,802,240 embedding parameters
129
-
130
- * 24 layers, 16 attention heads
131
-
132
- * Hidden layers are 2048-dimensional
133
-
134
- * Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
135
-
136
- **Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).
137
 
138
- **Compute infrastructure:** Jean Zay Public Supercomputer, provided by the French government (see [announcement](https://www.enseignementsup-recherche.gouv.fr/fr/signature-du-marche-d-acquisition-de-l-un-des-supercalculateurs-les-plus-puissants-d-europe-46733)).
139
-
140
- * Hardware: 64 V100 16/32GB GPUs (16 nodes):
141
 
142
- * 4 GPUs per node
143
 
144
- * 40 CPUs per task
 
 
 
 
 
145
 
146
- * 1 task per node
147
-
148
- * CPU: AMD
149
 
150
- * CPU memory: 160GB per node
151
-
152
- * GPU memory: 64GB or 128GB (depending on node availability during training) per node
153
-
154
- * Inter-node connect: Omni-Path Architecture (OPA)
155
-
156
- * NCCL-communications network: a fully dedicated subnet
157
-
158
- * Disc IO network: shared network with other types of nodes
159
-
160
- * Software:
161
-
162
- * Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))
163
-
164
- * DeepSpeed ([Github link](https://github.com/microsoft/DeepSpeed))
165
-
166
- * PyTorch (pytorch-1.11 w/ CUDA-11.5; see [Github link](https://github.com/pytorch/pytorch))
167
-
168
- * apex ([Github link](https://github.com/NVIDIA/apex))
169
 
170
-
171
- #### **Training**
172
-
173
- - Checkpoint size:
174
-
175
- - Fp16 weights: 2.6GB (# params * 2)
176
-
177
- - Full checkpoint with optimizer states: --
178
-
179
- - Training throughput: --
180
-
181
- - Number of epochs: 1
182
-
183
- - Dates:
184
-
185
- - Start: 11th March, 2022 11:42am PST
186
-
187
- - End: 20 May, 2022
188
-
189
- - Server training location: Île-de-France, France
190
-
191
- #### **Tokenization**
192
-
193
- The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a learned subword tokenizer trained using:
194
-
195
- - A byte-level Byte Pair Encoding (BPE) algorithm
196
-
197
- - A simple pre-tokenization rule, no normalization
198
-
199
- - A vocabulary size of 250,680
200
-
201
- It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
202
-
203
- </details>
204
-
205
-
206
- ### Environmental Impact
207
-
208
- <details>
209
- <summary>Click to expand</summary><br/>
210
-
211
- The training supercomputer, Jean Zay ([website](http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html)), uses mostly nuclear energy. The heat generated by it is reused for heating campus housing.
212
-
213
- **Estimated carbon emissions:** *(Forthcoming upon completion of training.)*
214
-
215
- **Estimated electricity usage:** *(Forthcoming upon completion of training.)*
216
-
217
-
218
- </details>
219
- <p>&nbsp;</p>
220
 
221
  ## Uses
222
 
223
  *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
224
  It provides information for anyone considering using the model or who is affected by the model.*
225
-
226
-
227
- <details>
228
- <summary>Click to expand</summary><br/>
229
 
230
  ### Intended Use
231
 
@@ -311,16 +191,54 @@ Intentionally using the model for harm, violating [human rights](#human-rights),
311
  - People and groups exposed to outputs of, or decisions based on, the LLM
312
 
313
  - People and groups whose original work is included in the LLM
 
 
 
 
 
314
 
315
- </details>
316
- <p>&nbsp;</p>
317
 
318
  ## Training Data
319
  *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
320
 
321
 
322
- <details>
323
- <summary>Click to expand</summary><br/>
324
 
325
  Details for each dataset are provided in individual [Data Cards](https://huggingface.co/spaces/bigscience/BigScienceCorpus).
326
 
@@ -340,9 +258,8 @@ The pie chart shows the distribution of languages in training data.
340
  ![pie chart showing the distribution of languages in training data](https://github.com/bigscience-workshop/model_card/blob/main/assets/data/pie_chart.svg?raw=true)
341
 
342
 
343
- The following table shows the further distribution of Niger-Congo and Indic languages in the training data.
344
- <details>
345
- <summary>Click to expand</summary><br/>
346
 
347
  | Niger Congo | Percentage | | Indic | Percentage |
348
  |----------------|------------ |------ |-----------|------------|
@@ -368,9 +285,8 @@ The following table shows the further distribution of Niger-Congo and Indic lang
368
  | Swahili | 0.02 |
369
  </details>
370
 
371
- The following table shows the distribution of programming languages.
372
- <details>
373
- <summary>Click to expand</summary><br/>
374
 
375
  | Extension | Language | Number of files |
376
  |----------------|------------|-----------------|
@@ -400,43 +316,10 @@ The following table shows the distribution of programming languages.
400
  | php5 | PHP | 166 |
401
  | php4 | PHP | 29 |
402
 
403
- </details>
404
- </details>
405
- <p>&nbsp;</p>
406
-
407
- ## Risks and Limitations
408
- *This section identifies foreseeable harms and misunderstandings.*
409
-
410
- <details>
411
- <summary>Click to expand</summary><br/>
412
-
413
- Model may:
414
-
415
- - Overrepresent some viewpoints and underrepresent others
416
-
417
- - Contain stereotypes
418
-
419
- - Contain [personal information](#personal-data-and-information)
420
-
421
- - Generate:
422
-
423
- - Hateful, abusive, or violent language
424
-
425
- - Discriminatory or prejudicial language
426
-
427
- - Content that may not be appropriate for all settings, including sexual content
428
-
429
- - Make errors, including producing incorrect information as if it were factual
430
-
431
- - Generate irrelevant or repetitive outputs
432
- </details>
433
- <p>&nbsp;</p>
434
 
435
  ## Evaluation
436
  *This section describes the evaluation protocols and provides the results.*
437
 
438
- <details>
439
- <summary>Click to expand</summary><br/>
440
 
441
  ### Metrics
442
  *This section describes the different ways performance is calculated and why.*
@@ -476,36 +359,117 @@ As of 25.May.2022, 15:00 PST:
476
 
477
  - [BLOOM Book](https://huggingface.co/spaces/bigscience/bloom-book): Read generations from BLOOM based on prompts provided by the community
478
 
479
- </details>
480
- <p>&nbsp;</p>
481
 
482
- ## Recommendations
483
 
484
- *This section provides information on warnings and potential mitigations.*
485
 
 
 
 
 
 
486
 
487
- <details>
488
- <summary>Click to expand</summary><br/>
489
 
490
- - Indirect users should be made aware when the content they're working with is created by the LLM.
491
 
492
- - Users should be aware of [Risks and Limitations](#risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.
 
493
 
494
- - Models pretrained with the LLM should include an updated Model Card.
495
 
496
- - Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.
497
 
498
- </details>
499
- <p>&nbsp;</p>
500
 
501
- ## Glossary and Calculations
502
 
503
- *This section defines common terms and how metrics are calculated.*
 
 
 
 
 
 
504
 
 
505
 
 
506
 
507
- <details>
508
- <summary>Click to expand</summary><br/>
509
 
510
  - <a name="loss">**Loss:**</a> A calculation of the difference between what the model has learned and what the data shows ("groundtruth"). The lower the loss, the better. The training process aims to minimize the loss.
511
 
@@ -523,13 +487,9 @@ As of 25.May.2022, 15:00 PST:
523
 
524
  - <a name="deception">**Deception:**</a> Doing something to intentionally mislead individuals to believe something that is false, such as by creating deadbots or chatbots on social media posing as real people, or generating text documents without making consumers aware that the text is machine generated.
525
 
526
- </details>
527
- <p>&nbsp;</p>
528
 
529
  ## More Information
530
 
531
- <details>
532
- <summary>Click to expand</summary><br/>
533
 
534
  ### Dataset Creation
535
 
@@ -554,11 +514,12 @@ Details on the obstacles overcome during the preparation on the engineering side
554
  ### Initial Results
555
 
556
  Initial prompting experiments using interim checkpoints: https://huggingface.co/spaces/bigscience/bloom-book
557
-
558
- </details>
559
- <p>&nbsp;</p>
560
 
561
  ## Model Card Authors
562
  *Ordered roughly chronologically and by amount of time spent.*
563
 
564
  Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff
 
 
 
 
 
60
 
61
  Version 1.0 / 26.May.2022
62
 
63
+
64
+ # Model Card for Bloom-1b7
65
+
66
+ <!-- Provide a quick summary of what the model is/does. -->
67
+
68
  ## Table of Contents
69
  1. [Model Details](#model-details)
70
  2. [Uses](#uses)
71
+ 3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)
72
+ 4. [Recommendations](#recommendations)
73
+ 5. [Training Data](#training-data)
74
+ 6. [Evaluation](#evaluation)
75
+ 7. [Environmental Impact](#environmental-impact)
76
+ 8. [Technical Specifications](#technical-specifications)
77
+ 9. [Citation](#citation)
78
+ 10. [Glossary and Calculations](#glossary-and-calculations)
79
+ 11. [More Information](#more-information)
80
+ 12. [Model Card Authors](#model-card-authors)
81
+ 13. [Model Card Contact](#model-card-contact)
82
 
83
  ## Model Details
84
 
85
+ ### Model Description
86
  *This section provides information for anyone who wants to know about the model.*
87
 
88
+ - **Developed by:** BigScience ([website](https://bigscience.huggingface.co))
 
 
89
 
90
+ * All collaborators are either volunteers or have an agreement with their employer. *(Further breakdown of participants forthcoming.)*
91
 
92
+ - **Model Type:** Transformer-based Language Model
93
+ - **Version:** 1.0.0
94
+ - **Languages:** Multiple; see [training data](#training-data)
95
+ - **License:** RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))
96
+ - **Release Date Estimate:** Monday, 11.July.2022
97
+ - **Funded by:**
98
 
99
+ * The French government.
 
 
100
 
101
+ * Hugging Face ([website](https://huggingface.co)).
102
 
103
+ * Organizations of contributors. *(Further breakdown of organizations forthcoming.)*
104
 
105
  ## Uses
106
 
107
  *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
108
  It provides information for anyone considering using the model or who is affected by the model.*
 
 
 
 
109
 
110
  ### Intended Use
111
 
 
191
  - People and groups exposed to outputs of, or decisions based on, the LLM
192
 
193
  - People and groups whose original work is included in the LLM
194
+
195
+
196
+
197
+ ## Bias, Risks, and Limitations
198
+ *This section identifies foreseeable harms and misunderstandings.*
199
 
200
+ Model may:
201
+
202
+ - Overrepresent some viewpoints and underrepresent others
203
+
204
+ - Contain stereotypes
205
+
206
+ - Contain [personal information](#personal-data-and-information)
207
+
208
+ - Generate:
209
+
210
+ - Hateful, abusive, or violent language
211
+
212
+ - Discriminatory or prejudicial language
213
+
214
+ - Content that may not be appropriate for all settings, including sexual content
215
+
216
+ - Make errors, including producing incorrect information as if it were factual
217
+
218
+ - Generate irrelevant or repetitive outputs
219
+
220
+
221
+ ### Recommendations
222
+
223
+
224
+ *This section provides information on warnings and potential mitigations.*
225
+
226
+ - Indirect users should be made aware when the content they're working with is created by the LLM.
227
+
228
+ - Users should be aware of [Bias, Risks, and Limitations](#bias-risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.
229
+
230
+ - Models pretrained with the LLM should include an updated Model Card.
231
+
232
+ - Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.
233
+
234
+
235
+
236
 
237
  ## Training Data
238
  *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
239
 
240
 
241
+
 
242
 
243
  Details for each dataset are provided in individual [Data Cards](https://huggingface.co/spaces/bigscience/BigScienceCorpus).
244
 
 
258
  ![pie chart showing the distribution of languages in training data](https://github.com/bigscience-workshop/model_card/blob/main/assets/data/pie_chart.svg?raw=true)
259
 
260
 
261
+ **The following table shows the further distribution of Niger-Congo and Indic languages in the training data.**
262
+
 
263
 
264
  | Niger Congo | Percentage | | Indic | Percentage |
265
  |----------------|------------ |------ |-----------|------------|
 
285
  | Swahili | 0.02 |
286
  </details>
287
 
288
+ **The following table shows the distribution of programming languages.**
289
+
 
290
 
291
  | Extension | Language | Number of files |
292
  |----------------|------------|-----------------|
 
316
  | php5 | PHP | 166 |
317
  | php4 | PHP | 29 |
318
 
319
 
320
  ## Evaluation
321
  *This section describes the evaluation protocols and provides the results.*
322
 
 
 
323
 
324
  ### Metrics
325
  *This section describes the different ways performance is calculated and why.*
 
359
 
360
  - [BLOOM Book](https://huggingface.co/spaces/bigscience/bloom-book): Read generations from BLOOM based on prompts provided by the community
361
 
 
 
362
 
 
363
 
364
+ ## Environmental Impact
365
 
366
+ The training supercomputer, Jean Zay ([website](http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html)), uses mostly nuclear energy. The heat generated by it is reused for heating campus housing.
367
+
368
+ **Estimated carbon emissions:** *(Forthcoming upon completion of training.)*
369
+
370
+ **Estimated electricity usage:** *(Forthcoming upon completion of training.)*
371
 
 
 
372
 
 
373
 
374
+ ## Technical Specifications
375
+ *This section provides information for people who work on model development.*
376
 
 
377
 
378
+ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.
379
 
380
+ **Model Architecture:** Modified from Megatron-LM GPT2 (see [paper](https://arxiv.org/abs/1909.08053), [BLOOM Megatron code](https://github.com/bigscience-workshop/Megatron-DeepSpeed)):
 
381
 
382
+ * Decoder-only architecture
383
 
384
+ * Layer normalization applied to word embeddings layer (`StableEmbedding`; see [code](https://github.com/facebookresearch/bitsandbytes), [paper](https://arxiv.org/pdf/2110.02861.pdf))
385
+
386
+ * ALiBi positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GELU activation functions
387
+
388
+ * 1,722,408,960 parameters:
389
+
390
+ * 513,802,240 embedding parameters
391
 
392
+ * 24 layers, 16 attention heads
393
 
394
+ * Hidden layers are 2048-dimensional
395
 
396
+ * Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
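
The parameter figures listed above can be cross-checked from the hyperparameters. The following is a minimal sketch, not the training code. It assumes a standard GPT-style decoder block (fused QKV projection, attention output projection, a 4x-wide MLP, and two layer norms, all with biases), an output layer tied to the input embeddings, and no learned positional parameters, since ALiBi uses fixed biases. The padded vocabulary size is not stated in this card; it is inferred here by dividing the listed embedding parameter count by the hidden size.

```python
# Hyperparameters taken from the architecture list in this card.
HIDDEN = 2048
LAYERS = 24
EMBEDDING_PARAMS = 513_802_240

# Assumption: embedding rows = embedding parameters / hidden size.
# This value is inferred, not stated in the card.
padded_vocab = EMBEDDING_PARAMS // HIDDEN


def layer_norm_params(h: int) -> int:
    """Scale and bias vectors of one layer norm."""
    return 2 * h


def block_params(h: int) -> int:
    """Parameters of one decoder block under the assumptions above."""
    attn_qkv = 3 * h * h + 3 * h      # fused query/key/value projection
    attn_out = h * h + h              # attention output projection
    mlp_up = 4 * h * h + 4 * h        # hidden -> 4 * hidden
    mlp_down = 4 * h * h + h          # 4 * hidden -> hidden
    norms = 2 * layer_norm_params(h)  # pre-attention and pre-MLP layer norms
    return attn_qkv + attn_out + mlp_up + mlp_down + norms


total = (
    EMBEDDING_PARAMS
    + layer_norm_params(HIDDEN)       # layer norm applied to the embeddings
    + LAYERS * block_params(HIDDEN)
    + layer_norm_params(HIDDEN)       # final layer norm
)

print(f"padded vocabulary: {padded_vocab:,}")
print(f"total parameters:  {total:,}")
```

Under these assumptions the sum reproduces the listed total of 1,722,408,960 parameters.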
397
+
398
+ **Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).
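
As an illustration of what "mean reduction" means here, the sketch below computes cross entropy per token position and then averages over positions. It is a pure-Python approximation of the behaviour described in the linked PyTorch documentation for class-index targets without class weights or label smoothing; the function name `cross_entropy_mean` is hypothetical and is not part of the training code.

```python
import math


def cross_entropy_mean(logits, targets):
    """Average negative log-likelihood of the target class over all positions.

    logits:  list of per-position score lists (one score per vocabulary entry)
    targets: list of target class indices, one per position
    """
    losses = []
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp of the scores.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        # -log softmax(row)[target]
        losses.append(log_sum_exp - row[target])
    # "Mean" reduction: average over positions rather than summing.
    return sum(losses) / len(losses)


# With uniform scores over 4 classes, every position costs ln(4).
uniform_logits = [[0.0, 0.0, 0.0, 0.0]] * 3
print(cross_entropy_mean(uniform_logits, [0, 1, 2]))
```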
399
+
400
+ **Compute infrastructure:** Jean Zay Public Supercomputer, provided by the French government (see [announcement](https://www.enseignementsup-recherche.gouv.fr/fr/signature-du-marche-d-acquisition-de-l-un-des-supercalculateurs-les-plus-puissants-d-europe-46733)).
401
+
402
+ * Hardware: 64 V100 16/32GB GPUs (16 nodes):
403
+
404
+ * 4 GPUs per node
405
+
406
+ * 40 CPUs per task
407
+
408
+ * 1 task per node
409
+
410
+ * CPU: AMD
411
+
412
+ * CPU memory: 160GB per node
413
+
414
+ * GPU memory: 64GB or 128GB (depending on node availability during training) per node
415
+
416
+ * Inter-node connect: Omni-Path Architecture (OPA)
417
+
418
+ * NCCL-communications network: a fully dedicated subnet
419
+
420
+ * Disk IO network: shared network with other types of nodes
421
+
422
+ * Software:
423
+
424
+ * Megatron-DeepSpeed ([GitHub link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))
425
+
426
+ * DeepSpeed ([GitHub link](https://github.com/microsoft/DeepSpeed))
427
+
428
+ * PyTorch (pytorch-1.11 w/ CUDA-11.5; see [GitHub link](https://github.com/pytorch/pytorch))
429
+
430
+ * apex ([GitHub link](https://github.com/NVIDIA/apex))
431
+
432
+ ### **Training**
433
+
434
+ - Checkpoint size:
435
+
436
+ - Fp16 weights: 2.6GB (# params * 2)
437
+
438
+ - Full checkpoint with optimizer states: --
439
+
440
+ - Training throughput: --
441
+
442
+ - Number of epochs: 1
443
+
444
+ - Dates:
445
+
446
+ - Start: 11th March, 2022 11:42am PST
447
+
448
+ - End: 20 May, 2022
449
+
450
+ - Server training location: Île-de-France, France
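
The checkpoint-size rule quoted above (`# params * 2`, i.e. two bytes per fp16 weight) can be applied directly to the parameter count listed under Model Architecture. This is a small arithmetic sketch, assuming only the weights are stored and no optimizer states:

```python
PARAMS = 1_722_408_960   # parameter count listed under Model Architecture
BYTES_PER_FP16 = 2       # one half-precision float

size_bytes = PARAMS * BYTES_PER_FP16
size_gb = size_bytes / 1e9       # decimal gigabytes
size_gib = size_bytes / 2**30    # binary gibibytes

print(f"{size_gb:.2f} GB ({size_gib:.2f} GiB)")  # 3.44 GB (3.21 GiB)
```

With the parameter count given in this card, the rule yields roughly 3.4 GB rather than the 2.6GB listed; 2.6 GB would correspond to about 1.3 billion parameters, so one of the two figures may be out of date.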
451
+
452
+ ### **Tokenization**
453
+
454
+ The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a learned subword tokenizer trained using:
455
+
456
+ - A byte-level Byte Pair Encoding (BPE) algorithm
457
+
458
+ - A simple pre-tokenization rule, no normalization
459
+
460
+ - A vocabulary size of 250,680
461
+
462
+ It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
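
To make "byte-level BPE" concrete, here is a toy sketch of the idea: text is first split into raw UTF-8 bytes, and the most frequent adjacent pair is merged repeatedly into a new token. It is only an illustration under simplifying assumptions. It omits the pre-tokenization rule and the per-language weighting mentioned above, and the function name `train_byte_bpe` is hypothetical; it is not how the BLOOM tokenizer was actually built.

```python
from collections import Counter


def train_byte_bpe(text: str, num_merges: int):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`.

    Returns the learned merges and the text re-encoded with them.
    """
    # Start from single bytes, so any Unicode string is representable.
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats, so further merges would not compress
        merges.append((left, right))
        # Rewrite the sequence, replacing each occurrence of the pair.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges, tokens


merges, tokens = train_byte_bpe("low lower lowest", num_merges=3)
print(merges)
print(tokens)
```

Because no normalization is applied, concatenating the resulting tokens always reproduces the original bytes exactly.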
463
+
464
+
465
+
466
+ ## Citation
467
+
468
+ **Cite as:** BigScience, _BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022
469
+
470
+ ## Glossary and Calculations
471
+
472
+ *This section defines common terms and how metrics are calculated.*
473
 
474
  - <a name="loss">**Loss:**</a> A calculation of the difference between what the model has learned and what the data shows ("groundtruth"). The lower the loss, the better. The training process aims to minimize the loss.
475
 
 
487
 
488
  - <a name="deception">**Deception:**</a> Doing something to intentionally mislead individuals to believe something that is false, such as by creating deadbots or chatbots on social media posing as real people, or generating text documents without making consumers aware that the text is machine generated.
489
 
 
 
490
 
491
  ## More Information
492
 
 
 
493
 
494
  ### Dataset Creation
495
 
 
514
  ### Initial Results
515
 
516
  Initial prompting experiments using interim checkpoints: https://huggingface.co/spaces/bigscience/bloom-book
 
 
 
517
 
518
  ## Model Card Authors
519
  *Ordered roughly chronologically and by amount of time spent.*
520
 
521
  Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff
522
+
523
+ ## Model Card Contact
524
+
525
+ **Send Questions to:** [email protected]