---
license: llama2
language:
- en
pipeline_tag: text-generation
---

# InvestLM

This is the repo for a new financial-domain large language model, InvestLM, tuned on [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) using a carefully curated instruction dataset related to financial investment. We provide guidance on how to use InvestLM for inference.

GitHub link: [InvestLM](https://github.com/AbaciNLP/InvestLM)

Test only, not for sharing.

# About AWQ

[AWQ](https://github.com/casper-hansen/AutoAWQ) is an efficient, accurate, and fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference.

# Inference

Please use the following command to log in to Hugging Face first:

```
huggingface-cli login
```

## Prompt template

```
[INST] {prompt} [/INST]
```

## How to use this AWQ model from Python code

Install the required packages:

```
pip3 install --upgrade "autoawq>=0.1.6" "transformers>=4.35.0"
```

```
from transformers import AutoModelForCausalLM, AutoTokenizer

quant_path = "yixuantt/InvestLM-Mistral-AWQ"

# Load the quantized model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    quant_path,
    low_cpu_mem_usage=True,
    device_map="cuda:0"
)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

# Convert the prompt to tokens
prompt_template = "[INST] {prompt} [/INST]"
prompt = "What is finance?"
tokens = tokenizer(
    prompt_template.format(prompt=prompt),
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    tokens,
    max_new_tokens=512
)
print("Output: ", tokenizer.decode(generation_output[0]))
```
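For interactive use, you can also stream the generated text token by token instead of waiting for the full output. Below is a minimal sketch using the `TextStreamer` utility from `transformers`; it assumes the `model`, `tokenizer`, and `tokens` variables from the example above are already defined.

```
from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated,
# omitting the prompt itself and any special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

generation_output = model.generate(
    tokens,
    streamer=streamer,
    max_new_tokens=512
)
```

Streaming only changes how the output is displayed; `generation_output` still contains the full token sequence if you need it afterwards.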