---
license: llama2
language:
- en
pipeline_tag: text-generation
---
# InvestLM
This is the repo for InvestLM, a financial-domain large language model tuned from [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) on a carefully curated instruction dataset related to financial investment. We provide guidance on how to use InvestLM for inference.

GitHub link: [InvestLM](https://github.com/AbaciNLP/InvestLM)
<font color="#0000FF">Test only, not for sharing.</font>
# About AWQ
[AWQ](https://github.com/casper-hansen/AutoAWQ) is an efficient, accurate, and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference.
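The example below loads the quantized weights through `transformers`. As an illustrative alternative (a minimal sketch, not from this repo's docs), the same checkpoint can also be loaded with the AutoAWQ library directly, which exposes fused kernels for faster decoding:
```
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "yixuantt/InvestLM-Mistral-AWQ"

# fuse_layers=True enables AutoAWQ's fused kernels for faster decoding
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path)
```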
# Inference
Please log in to Hugging Face first with the following command:
```
huggingface-cli login
```
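If you prefer to authenticate from Python instead of the CLI, `huggingface_hub` offers an equivalent (a minimal sketch; it prompts for the same access token):
```
from huggingface_hub import login

# Paste an access token from https://huggingface.co/settings/tokens when prompted
login()
```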
## Prompt template
```
[INST] {prompt} [/INST]
```
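If the bundled tokenizer ships a chat template (an assumption; check its `tokenizer_config.json`), `transformers` can build this prompt for you via `apply_chat_template`; a minimal sketch:
```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yixuantt/InvestLM-Mistral-AWQ")

# Build the [INST] ... [/INST] prompt from a chat-style message list
messages = [{"role": "user", "content": "What is finance?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```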
## How to use this AWQ model from Python code
Install the required packages first:
```
pip3 install --upgrade "autoawq>=0.1.6" "transformers>=4.35.0"
```
```
from transformers import AutoModelForCausalLM, AutoTokenizer

quant_path = "yixuantt/InvestLM-Mistral-AWQ"

# Load the quantized model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    quant_path,
    low_cpu_mem_usage=True,
    device_map="cuda:0"
)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

# Convert the prompt to tokens
prompt_template = "[INST] {prompt} [/INST]"
prompt = "What is finance?"
tokens = tokenizer(
    prompt_template.format(prompt=prompt),
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    tokens,
    max_new_tokens=512
)
print("Output: ", tokenizer.decode(generation_output[0]))
```
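To stream tokens to stdout as they are generated, attach a `TextStreamer` (part of `transformers`) to `generate`; a minimal sketch reusing `model`, `tokenizer`, and `tokens` from above:
```
from transformers import TextStreamer

# Print decoded tokens as they arrive, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(tokens, streamer=streamer, max_new_tokens=512)
```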