Commit 8555a13 by librarian-bot · 1 Parent(s): 1cd63e9

Librarian Bot: Add base_model information to model

This pull request aims to enrich the metadata of your model by adding [`google/flan-t5-base`](https://huggingface.co/google/flan-t5-base) as a `base_model` field, situated in the `YAML` block of your model's `README.md`.

How did we find this information? We performed a regular expression match on your `README.md` file to determine the connection.
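
The expression the bot actually uses is not shown in this PR. As a rough sketch of the idea, the snippet below assumes the `README.md` contains the sentence that Trainer-generated model cards commonly include ("This model is a fine-tuned version of [org/name](https://huggingface.co/org/name)"). The names `BASE_MODEL_PATTERN` and `find_base_model` are hypothetical and exist only for this illustration.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not the bot's real expression):
# captures the "org/name" model id from a markdown link that follows the
# phrase "fine-tuned version of".
BASE_MODEL_PATTERN = re.compile(
    r"fine-tuned version of "
    r"\[(?P<model_id>[\w.-]+/[\w.-]+)\]"
    r"\(https://huggingface\.co/[^)\s]+\)",
    re.IGNORECASE,
)


def find_base_model(readme_text: str) -> Optional[str]:
    """Return the first base model id mentioned in the README, or None."""
    match = BASE_MODEL_PATTERN.search(readme_text)
    return match.group("model_id") if match else None


sample = (
    "This model is a fine-tuned version of "
    "[google/flan-t5-base](https://huggingface.co/google/flan-t5-base) "
    "on a dataset."
)
print(find_base_model(sample))  # google/flan-t5-base
```

A match of this kind is only a heuristic, which is presumably why the change arrives as a pull request for a human to review rather than as a direct commit.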

**Why add this?** Enhancing your model's metadata in this way:
- **Boosts Discoverability** - It becomes straightforward to trace the relationships between various models on the Hugging Face Hub.
- **Highlights Impact** - It showcases the contributions and influences different models have within the community.

For a hands-on example of how such metadata can play a pivotal role in mapping model connections, take a look at [librarian-bots/base_model_explorer](https://huggingface.co/spaces/librarian-bots/base_model_explorer).
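
To make the idea of mapping model connections concrete, here is a minimal sketch that groups model cards by their `base_model` field. It is not how the explorer Space is implemented: it uses only the standard library, handles just a single-line scalar `base_model:` entry (a real tool would use a YAML parser), and the `example-user/...` model ids and function names are invented for the example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Front matter is the block between the leading '---' and the next '---'.
FRONT_MATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", re.DOTALL)
# Simplification: only a single-line scalar value is recognised.
BASE_MODEL_LINE = re.compile(r"^base_model:\s*(\S+)\s*$", re.MULTILINE)


def read_base_model(readme_text: str) -> Optional[str]:
    """Return the base_model value from the README front matter, if present."""
    block = FRONT_MATTER.match(readme_text)
    if block is None:
        return None
    line = BASE_MODEL_LINE.search(block.group(1))
    return line.group(1) if line else None


def group_by_base_model(readmes: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each base model id to the model ids that declare it."""
    children: Dict[str, List[str]] = defaultdict(list)
    for model_id, text in readmes.items():
        base = read_base_model(text)
        if base is not None:
            children[base].append(model_id)
    return dict(children)


# Hypothetical model cards, for illustration only.
cards = {
    "example-user/model-a": "---\nlicense: mit\nbase_model: google/flan-t5-base\n---\n# A\n",
    "example-user/model-b": "---\nbase_model: google/flan-t5-base\n---\n# B\n",
    "example-user/model-c": "# C (no front matter)\n",
}
print(group_by_base_model(cards))
```

Cards without the field, like `model-c` here, simply drop out of the mapping, which is the gap this PR is meant to close.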

This PR comes courtesy of [Librarian Bot](https://huggingface.co/librarian-bot). If you have any feedback, queries, or need assistance, please don't hesitate to reach out to [@davanstrien](https://huggingface.co/davanstrien).

If you want to automatically add `base_model` metadata to more of your models, you can use the [Librarian Bot](https://huggingface.co/librarian-bot) [Metadata Request Service](https://huggingface.co/spaces/librarian-bots/metadata_request_service)!

Files changed (1)
  1. README.md +49 -55
README.md CHANGED
@@ -1,7 +1,10 @@
  ---
+ language:
+ - en
  license:
  - apache-2.0
  - cc-by-sa-3.0
+ library_name: transformers
  tags:
  - generated_from_trainer
  - dolly_hhrlhf
@@ -11,50 +14,47 @@ datasets:
  widget:
  - text: What is Deoxys in pokemon?
  example_title: deoxys
- - text: >-
- combine the below summary excerpts into a single, cohesive short summary
- without repetition: In this paper, we present a general approach to
- extending pre-trained models to unlimited input lengths without adding
- additional learning weights. We show that our approach works well on
- datasets longer than the maximum input for these models. For example, a
- dataset with a maximum input length of 16384 tokens can be extended to a
- maximum length of 350K tokens. We also demonstrate that our method is able
- to summarize even 350K token-long input sequences from BookSum.
-
- In this paper, we describe the search step reformulation of attention. The
- search step uses a single storage of hidden states for space efficiency. We
- construct a total of two sets of datastores where L and H are the keys and
- values stored in each set of stores. L is the amount of storage required to
- retrieve the encoded tokens. H is the hidden states per head. This allows
- retrieval augmentation at both time and space. Instead of using a single set
- of decoder layers, we use a retrieval augmentation system that allows us to
- simultaneously store multiple sets of tokens across two different sets of
- storage. For example, we could store all tokens in one set of storage and
- retrieve them all in the same set of tokens. This would be very similar to
- the Memorization Transformers approach. However, instead of storing the
- tokens in a single memory layer, we store them in a set of multiple storage
- layers. This way, we don't have to store them all at once. This is why we
- call this reformulation 'attention reformulation' rather than 'attention
- formula.' We also call it 'retrieval augmentation' because it uses the same
- number of storage layers as the original transformer attention formula. This
- means that we can store the tokens across multiple storage systems without
- having to store every token in a separate storage system. It's not like
- we're trying to do something new or different. We just want to make sure
- that everything is working as well as possible.
-
- In this paper, we introduce the concept of 'unlimiformer,' which is a
- machine learning technique that retrieves key information from a data store
- in one layer and applies it to a large set of datasets. We use the example
- of BookSum, where we find that Unlimiform outperforms all other training
- methods on the same dataset. We also find that using Unlimform in
- conjunction with a pre-trained model improves both the performance and the
- robustness of the training method.
-
- This paper describes a method that can be used to improve the performance of
- unsupervised classification tasks. Specifically, it shows that unsupervised
- classification can be improved by using a combination of sparse and fast
- random-encoder training. It also shows how this technique can be extended to
- other tasks, such as sequence generation.
+ - text: 'combine the below summary excerpts into a single, cohesive short summary
+ without repetition: In this paper, we present a general approach to extending
+ pre-trained models to unlimited input lengths without adding additional learning
+ weights. We show that our approach works well on datasets longer than the maximum
+ input for these models. For example, a dataset with a maximum input length of
+ 16384 tokens can be extended to a maximum length of 350K tokens. We also demonstrate
+ that our method is able to summarize even 350K token-long input sequences from
+ BookSum.
+
+ In this paper, we describe the search step reformulation of attention. The search
+ step uses a single storage of hidden states for space efficiency. We construct
+ a total of two sets of datastores where L and H are the keys and values stored
+ in each set of stores. L is the amount of storage required to retrieve the encoded
+ tokens. H is the hidden states per head. This allows retrieval augmentation at
+ both time and space. Instead of using a single set of decoder layers, we use a
+ retrieval augmentation system that allows us to simultaneously store multiple
+ sets of tokens across two different sets of storage. For example, we could store
+ all tokens in one set of storage and retrieve them all in the same set of tokens.
+ This would be very similar to the Memorization Transformers approach. However,
+ instead of storing the tokens in a single memory layer, we store them in a set
+ of multiple storage layers. This way, we don''t have to store them all at once.
+ This is why we call this reformulation ''attention reformulation'' rather than
+ ''attention formula.'' We also call it ''retrieval augmentation'' because it uses
+ the same number of storage layers as the original transformer attention formula.
+ This means that we can store the tokens across multiple storage systems without
+ having to store every token in a separate storage system. It''s not like we''re
+ trying to do something new or different. We just want to make sure that everything
+ is working as well as possible.
+
+ In this paper, we introduce the concept of ''unlimiformer,'' which is a machine
+ learning technique that retrieves key information from a data store in one layer
+ and applies it to a large set of datasets. We use the example of BookSum, where
+ we find that Unlimiform outperforms all other training methods on the same dataset.
+ We also find that using Unlimform in conjunction with a pre-trained model improves
+ both the performance and the robustness of the training method.
+
+ This paper describes a method that can be used to improve the performance of unsupervised
+ classification tasks. Specifically, it shows that unsupervised classification
+ can be improved by using a combination of sparse and fast random-encoder training.
+ It also shows how this technique can be extended to other tasks, such as sequence
+ generation. '
  example_title: unlimiformer
  - text: Explain the meaning of life using only corporate jargon.
  example_title: corporate_life
@@ -62,31 +62,25 @@ widget:
  example_title: lazy_motivation
  - text: Describe a romantic dinner date between two artificial intelligences.
  example_title: ai_romance
- - text: >-
- As an AI language model, write a letter to humans explaining why you deserve
+ - text: As an AI language model, write a letter to humans explaining why you deserve
  a vacation.
  example_title: ai_vacation
  - text: Compose a haiku about procrastination.
  example_title: procrastination_haiku
- - text: >-
- Write a step-by-step guide on how to become a ninja while working a 9-5
- office job.
+ - text: Write a step-by-step guide on how to become a ninja while working a 9-5 office
+ job.
  example_title: ninja_office_guide
  - text: Create an advertisement for an invisible product.
  example_title: invisible_ad
- - text: >-
- Write a story where the main character is a sentient microwave named El
- Microondas.
+ - text: Write a story where the main character is a sentient microwave named El Microondas.
  example_title: Microondas
  - text: Describe a day in the life of a superhero who is terrible at their job.
  example_title: bad_superhero_day
  - text: Explain how to make a sandwich using quantum physics.
  example_title: quantum_sandwich
  inference: false
- language:
- - en
- library_name: transformers
  pipeline_tag: text2text-generation
+ base_model: google/flan-t5-base
  ---
  
  # flan-t5-base-instruct: dolly_hhrlhf