---
license: apache-2.0
inference: false
---

# SN-13B-8k-Instruct

SN-13B-8k-Instruct is a 13-billion-parameter language model. It was pretrained from scratch and then instruction tuned on SambaNova DataScale systems, and it is intended for tasks that require long-sequence understanding.

## Model Details

### Model Description

- Developed by: SambaNova Systems
- Model type: Language Model
- Language(s): English
- License: Apache 2.0

### Basic Information

- Blog Post: Link
- Discord: Link

## Direct Use

This model is intended for commercial and research use.

## Out-of-Scope Use

SN-13B-8k-Instruct should NOT be used for:

- Mission-critical applications
- Applications that involve the safety of others
- Making highly important decisions
- Important automated pipelines

This model is still in early development and can be prone to mistakes and hallucinations; there is still room for improvement. It is intended to provide the community with a multilingual chat LLM baseline.

## Recommendations

Users should be made aware of the risks, biases, limitations, and restrictions of the model, which are described in the Bias, Risks, and Limitations section below.


## Running the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/SN-13B-8k-Instruct")
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/SN-13B-8k-Instruct")

prompt = 'Talk to me about Machine Learning'
inputs = tokenizer(prompt, return_tensors='pt')

# SN-13B-8k-Instruct occasionally repeats itself when do_sample=False.
# Set do_sample=True to avoid this.
outputs = model.generate(**inputs, use_cache=True, max_new_tokens=50, do_sample=True)

print(tokenizer.batch_decode(outputs))
```
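
Because the model targets long-sequence tasks, a common pattern is to feed it inputs that approach its 8k-token context window. The snippet below is a minimal sketch of that pattern under stated assumptions: the 8192-token limit is inferred from the model name, and `report.txt` is a hypothetical stand-in for your own long document.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/SN-13B-8k-Instruct")
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/SN-13B-8k-Instruct")

# Hypothetical long input, e.g. a report you want summarized.
with open("report.txt") as f:
    long_document = f.read()
prompt = f"{long_document}\n\nSummarize the document above."

# Truncate to the assumed 8192-token window, reserving room for the answer.
max_new_tokens = 256
inputs = tokenizer(prompt, return_tensors="pt", truncation=True,
                   max_length=8192 - max_new_tokens)

outputs = model.generate(**inputs, use_cache=True,
                         max_new_tokens=max_new_tokens, do_sample=True)

# Decode only the newly generated tokens, not the echoed prompt.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```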
    

## Bias, Risks, and Limitations

Like all LLMs, SN-13B-8k-Instruct has certain limitations:

- Hallucination: SN-13B-8k-Instruct may sometimes generate responses that contain plausible-sounding but factually incorrect or irrelevant information.
- Code Switching: The model might unintentionally switch between languages or dialects within a single response, affecting the coherence and understandability of the output.
- Repetition: SN-13B-8k-Instruct may produce repetitive phrases or sentences, leading to less engaging and informative responses. Sampling helps here; see the sketch after this list.
- Coding and Math: The model's performance in generating accurate code or solving complex mathematical problems may be limited.
- Toxicity: SN-13B-8k-Instruct may inadvertently generate responses containing inappropriate or harmful content.
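
As the Running the model section notes, sampling already reduces repetition. The snippet below is a minimal sketch of generation settings that typically help further; it reuses `model` and `inputs` from the earlier example, and the specific values for `temperature`, `top_p`, and `repetition_penalty` are illustrative assumptions, not tuned recommendations from SambaNova. All of these are standard `transformers` `generate()` arguments.

```python
# Reuses `model` and `inputs` from the example above. The sampling values
# here are illustrative assumptions, not recommendations from the model
# authors; tune them for your task.
outputs = model.generate(
    **inputs,
    use_cache=True,
    max_new_tokens=256,
    do_sample=True,          # sampling avoids the repetition seen with greedy decoding
    temperature=0.7,         # soften the next-token distribution
    top_p=0.9,               # nucleus sampling: keep the smallest token set covering 90% of the mass
    repetition_penalty=1.1,  # mildly penalize tokens that have already appeared
)
```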

## Acknowledgment

We appreciate lm-eval-harness and HELM for their essential benchmarking contributions, which were very helpful in evaluating SN-13B-8k-Instruct's performance. We also drew inspiration from the recent wave of open-source long-sequence models, including XGen, MPT, and Llama 2, and we look forward to witnessing their continued growth and success.

We highly appreciate the hard work and dedication of these researchers and organizations toward the advancement of the open-source community. Their contributions were invaluable in the development of SN-13B-8k-Instruct, and we hope that our model can contribute to further advancements in the field.

## Cite SN-13B-8k-Instruct

```bibtex
@software{sn-13b-8k-instruct,
  title = {SN-13B-8k-Instruct: a New Open Multilingual Chat LLM},
  author = {SambaNova Systems},
  url = {https://huggingface.co/sambanovasystems/SN-13B-8k-Instruct},
  month = {8},
  year = {2023},
  version = {1.0},
}
```