|
--- |
|
|
|
|
|
license: apache-2.0
|
inference: false |
|
--- |
|
|
|
# SN-13B-8k-Instruct |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
SN-13B-8k-Instruct is a 13 billion parameter model. It was pretrained from scratch and then instruction tuned on

SambaNova DataScale. The model is intended for tasks that require long sequence understanding.
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
- **Developed by:** [SambaNova Systems](https://sambanova.ai/) |
|
- **Model type:** Language Model |
|
- **Language(s):** English |
|
- **License:** Apache 2.0 |
|
|
|
### Basic Information |
|
|
|
<!-- Provide the basic links for the model. --> |
|
- **Blog Post**: [Link](<add link>) |
|
- **Discord**: [Link](https://discord.com/invite/8z2Pe7cpRv) |
|
|
|
|
### Licensing |
|
|
|
To increase accessibility and to support the open-source community, SambaNova is releasing SN-13B-8k-Instruct under an Apache 2.0 license. [Please review SambaNova’s SN-13B-8k-Instruct License](LICENSE).
|
|
|
## Uses |
|
<details> |
|
<summary>Click to expand</summary> |
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
### Direct Use |
|
|
|
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> |
|
This model is intended for commercial and research use. |
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> |
|
|
|
|
|
SN-13B-8k-Instruct should NOT be used for: |
|
|
|
- Mission-critical applications

- Applications that involve the safety of others

- High-stakes decision making

- Critical automated pipelines
|
|
|
This model is still in early development and can be prone to mistakes and hallucinations; there is still room for improvement. It is intended to provide the community with a baseline for long sequence understanding tasks.
|
|
|
### Recommendations |
|
|
|
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> |
|
|
|
Users should be made aware of the risks, biases, limitations, and restrictions of the model, which are described in the Bias, Risks, and Limitations section at the bottom of this page.
|
|
|
</details> |
|
|
|
|
|
--- |
|
## Running the model |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/SN-13B-8k-Instruct") |
|
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/SN-13B-8k-Instruct") |
|
|
|
prompt = 'Talk to me about Machine Learning' |
|
inputs = tokenizer(prompt, return_tensors='pt') |
|
|
|
# SN-13B-8k-Instruct occasionally repeats itself when do_sample=False. Set do_sample=True when using the model to avoid this. |
|
outputs = model.generate(**inputs, use_cache=True, max_new_tokens=50, do_sample=True) |
|
|
|
print(tokenizer.batch_decode(outputs)) |
|
``` |
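
Because the model supports sequences up to 8k tokens, longer inputs can be run the same way. The sketch below is a minimal example of long-input usage; `long_document` is a hypothetical placeholder for your own text, and the 8192-token window is an assumption based on the model's 8k context length.

```python
# Minimal long-sequence sketch. `long_document` is a placeholder for your
# own text; 8192 is an assumed context length based on the model's 8k window.
long_document = "..."  # e.g. a long report to summarize

prompt = f"Summarize the following document:\n{long_document}\nSummary:"

max_new_tokens = 256
inputs = tokenizer(
    prompt,
    return_tensors='pt',
    truncation=True,
    max_length=8192 - max_new_tokens,  # leave room for the generated tokens
)

outputs = model.generate(**inputs, use_cache=True, max_new_tokens=max_new_tokens, do_sample=True)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```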
|
|
|
--- |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
<!-- This section is meant to convey both technical and sociotechnical limitations. --> |
|
|
|
Like all LLMs, SN-13B-8k-Instruct has certain limitations: |
|
- Hallucination: SN-13B-8k-Instruct may sometimes generate responses that contain plausible-sounding but factually incorrect or irrelevant information. |
|
- Code Switching: The model might unintentionally switch between languages or dialects within a single response, affecting the coherence and understandability of the output. |
|
- Repetition: SN-13B-8k-Instruct may produce repetitive phrases or sentences, leading to less engaging and informative responses (see the decoding sketch after this list).
|
- Coding and Math: The model's performance in generating accurate code or solving complex mathematical problems may be limited. |
|
- Toxicity: SN-13B-8k-Instruct may inadvertently generate responses containing inappropriate or harmful content. |
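
For the repetition issue in particular, sampling (as noted in the Running the model section) and penalized decoding are common mitigations. This is a hedged sketch reusing the `model` and `inputs` from the example above with standard `generate` arguments; the specific values are illustrative assumptions, not tuned recommendations for this model.

```python
# Illustrative decoding settings to reduce repetition. The values below are
# assumptions for demonstration, not tuned recommendations for this model.
outputs = model.generate(
    **inputs,
    use_cache=True,
    max_new_tokens=50,
    do_sample=True,          # sampling already helps, per the note above
    temperature=0.7,         # soften the next-token distribution
    top_p=0.9,               # nucleus sampling
    repetition_penalty=1.2,  # penalize tokens that already appeared
)
```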
|
|
|
## Acknowledgment |
|
|
|
We appreciate [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) and [HELM](https://crfm.stanford.edu/helm/latest/) for their essential benchmarking contributions, which were very helpful in evaluating SN-13B-8k-Instruct's performance. We also drew inspiration from the recent wave of open-source long sequence models, including [XGen](https://blog.salesforceairesearch.com/xgen/), [MPT](https://www.mosaicml.com/blog/long-context-mpt-7b-8k), and [Llama-2](https://ai.meta.com/llama/). We look forward to the continued growth and success of open-source long sequence models.
|
|
|
We deeply appreciate the hard work and dedication of these researchers and organizations to the advancement of the open-source community. Their contributions were invaluable in the development of SN-13B-8k-Instruct, and we hope that our model can contribute to further advancements in the field.
|
|
|
## Cite SN-13B-8k-Instruct |
|
``` |
|
@software{sn-13b-8k-instruct,
  title   = {SN-13B-8k-Instruct: An Open Instruction-Tuned Language Model for Long Sequence Tasks},
  author  = {SambaNova Systems},
  url     = {https://huggingface.co/sambanovasystems/SN-13B-8k-Instruct},
  month   = {8},
  year    = {2023},
  version = {1.0},
}
|
``` |