chore: update readme
README.md
CHANGED
@@ -14,7 +14,9 @@ language:

One of the focus areas at Together Research is new architectures for long context, with improved training and inference performance over the Transformer architecture. We are excited to introduce the **StripedHyena** models, which spin out of a research program from our team and academic collaborators with roots in **signal processing-inspired sequence models**. StripedHyena is the **first alternative model competitive with the best open-source Transformers** of similar sizes in short- and long-context evaluations.

**StripedHyena-Hessian-7B (SH 7B)** is our **base model** for this release.

- Read more here in [our blog](https://www.together.ai/blog/stripedhyena-7b).
- Play with the model on our playground!
- Dive into the details of our [standalone implementation](https://github.com/togethercomputer/stripedhyena), and our related research: [1](https://arxiv.org/abs/2302.10866), [2](https://arxiv.org/abs/2310.18780), [3](https://arxiv.org/abs/2311.05908).
@@ -23,5 +25,6 @@ One of the focus areas at Together Research is new architectures for long contex

StripedHyena is a hybrid architecture composed of multi-head, grouped-query attention and gated convolutions arranged in [Hyena](https://arxiv.org/abs/2302.10866) blocks, different from traditional decoder-only Transformers.
- Constant memory decoding in Hyena blocks via representation of convolutions as state-space models (modal or canonical form), or as truncated filters (see the sketch below).
- Low latency, faster decoding and higher throughput than Transformers.
- Improved training- and inference-optimal scaling laws, compared to optimized Transformer architectures such as Llama-2.
- Trained on sequences of up to 32k, allowing it to process longer prompts.
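
A minimal sketch of the constant-memory decoding idea referenced in the list above: when a long convolution filter is represented as a small state-space model, each new token only updates a fixed-size state instead of re-reading the entire past sequence. The parameter values and function names below are illustrative assumptions, not StripedHyena's actual implementation.

```python
# Toy state-space decoding loop: memory per step is O(d_state), independent of
# how many tokens have been generated (unlike an attention KV cache).
import numpy as np

def ssm_decode_step(A, B, C, D, state, u_t):
    """Advance the filter recurrence by one token and emit one output."""
    state = A @ state + B * u_t   # fold the new input into the fixed-size state
    y_t = C @ state + D * u_t     # read out the filtered (convolved) value
    return y_t, state

# A diagonal A is the "modal form": each state entry is one decaying mode of the filter.
d_state = 4
A = np.diag([0.9, 0.7, 0.5, 0.3])   # per-mode decay rates (made-up values)
B = np.ones(d_state)
C = np.ones(d_state) / d_state
D = 0.1

state = np.zeros(d_state)
for u_t in np.random.randn(16):      # stream inputs one token at a time
    y_t, state = ssm_decode_step(A, B, C, D, state, u_t)
```

The recurrence reproduces the convolution's output while storing only `d_state` numbers per channel, which is why decoding cost stays flat as prompts and generations grow.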