Quantization made by Richard Erkhov.

finance-Llama3-8B - GGUF

Model creator: https://huggingface.co/instruction-pretrain/
Original model: https://huggingface.co/instruction-pretrain/finance-Llama3-8B/

Name	Quant method	Size
finance-Llama3-8B.Q2_K.gguf	Q2_K	2.96GB
finance-Llama3-8B.IQ3_XS.gguf	IQ3_XS	3.28GB
finance-Llama3-8B.IQ3_S.gguf	IQ3_S	3.43GB
finance-Llama3-8B.Q3_K_S.gguf	Q3_K_S	3.41GB
finance-Llama3-8B.IQ3_M.gguf	IQ3_M	3.52GB
finance-Llama3-8B.Q3_K.gguf	Q3_K	3.74GB
finance-Llama3-8B.Q3_K_M.gguf	Q3_K_M	3.74GB
finance-Llama3-8B.Q3_K_L.gguf	Q3_K_L	4.03GB
finance-Llama3-8B.IQ4_XS.gguf	IQ4_XS	4.18GB
finance-Llama3-8B.Q4_0.gguf	Q4_0	4.34GB
finance-Llama3-8B.IQ4_NL.gguf	IQ4_NL	4.38GB
finance-Llama3-8B.Q4_K_S.gguf	Q4_K_S	4.37GB
finance-Llama3-8B.Q4_K.gguf	Q4_K	4.58GB
finance-Llama3-8B.Q4_K_M.gguf	Q4_K_M	4.58GB
finance-Llama3-8B.Q4_1.gguf	Q4_1	4.78GB
finance-Llama3-8B.Q5_0.gguf	Q5_0	5.21GB
finance-Llama3-8B.Q5_K_S.gguf	Q5_K_S	5.21GB
finance-Llama3-8B.Q5_K.gguf	Q5_K	5.34GB
finance-Llama3-8B.Q5_K_M.gguf	Q5_K_M	5.34GB
finance-Llama3-8B.Q5_1.gguf	Q5_1	5.65GB
finance-Llama3-8B.Q6_K.gguf	Q6_K	6.14GB
finance-Llama3-8B.Q8_0.gguf	Q8_0	7.95GB
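
As a rough aid for choosing among the files above, the sketch below picks the largest quantization whose file fits a given memory budget. The sizes are copied from the table. The helper names (`pick_quant`, `gguf_filename`) and the 1 GB default headroom for context and runtime buffers are illustrative assumptions, not part of this repo or of any library. Real memory use depends on context length and backend.

```python
from typing import Optional

# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 2.96, "IQ3_XS": 3.28, "IQ3_S": 3.43, "Q3_K_S": 3.41,
    "IQ3_M": 3.52, "Q3_K": 3.74, "Q3_K_M": 3.74, "Q3_K_L": 4.03,
    "IQ4_XS": 4.18, "Q4_0": 4.34, "IQ4_NL": 4.38, "Q4_K_S": 4.37,
    "Q4_K": 4.58, "Q4_K_M": 4.58, "Q4_1": 4.78, "Q5_0": 5.21,
    "Q5_K_S": 5.21, "Q5_K": 5.34, "Q5_K_M": 5.34, "Q5_1": 5.65,
    "Q6_K": 6.14, "Q8_0": 7.95,
}


def pick_quant(budget_gb: float, overhead_gb: float = 1.0) -> Optional[str]:
    """Return the largest quant whose file size fits in budget_gb minus overhead_gb.

    overhead_gb is an assumed headroom for the KV cache and runtime buffers,
    not a measured value. Returns None if nothing fits.
    """
    usable = budget_gb - overhead_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= usable}
    if not fitting:
        return None
    # On equal sizes, the entry listed first in the table wins.
    return max(fitting, key=fitting.get)


def gguf_filename(quant: str) -> str:
    """Build the file name used in the table above for a given quant method."""
    return f"finance-Llama3-8B.{quant}.gguf"


if __name__ == "__main__":
    choice = pick_quant(budget_gb=6.0)
    print(choice, gguf_filename(choice) if choice else "nothing fits")
```

Larger files generally preserve more of the original model's quality, so taking the largest file that fits is a reasonable default; treat the result as a starting point rather than a guarantee.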

Original model description:

license: llama3
language:
- en
tags:
- finance
datasets:
- Open-Orca/OpenOrca
- GAIR/lima
- WizardLM/WizardLM_evol_instruct_V2_196k

Instruction Pre-Training: Language Models are Supervised Multitask Learners

This repo contains the finance model developed from Llama3-8B in our paper Instruction Pre-Training: Language Models are Supervised Multitask Learners.

We explore supervised multitask pre-training by proposing Instruction Pre-Training, a framework that scalably augments massive raw corpora with instruction-response pairs to pre-train language models. The instruction-response pairs are generated by an efficient instruction synthesizer built on open-source models. Instruction Pre-Training outperforms Vanilla Pre-training in both general pre-training from scratch and domain-adaptive continual pre-training. In pre-training from scratch, Instruction Pre-Training not only improves pre-trained base models but also benefits more from further instruction tuning. In continual pre-training, Instruction Pre-Training enables Llama3-8B to be comparable to or even outperform Llama3-70B.

**************************** Updates ****************************

2024/7/31: Updated pre-training suggestions in the Advanced Usage section of instruction-synthesizer
2024/7/15: We scaled up the pre-trained tokens from 100B to 250B, with the number of synthesized instruction-response pairs reaching 500M! [Figure: performance trend on downstream tasks throughout the pre-training process]
2024/6/21: Released the paper, code, and resources

Resources

🤗 We share our data and models with example usages. Feel free to open a discussion on this page! 🤗

Thanks to the demo davanstrien/instruction-synthesizer for implementing our approach
Context-Based Instruction Synthesizer: instruction-synthesizer
Fine-Tuning Data for the Synthesizer: ft-instruction-synthesizer-collection
General Models Pre-Trained from Scratch (on 100B tokens):
- InstructLM-500M
- InstructLM-1.3B
Domain-Specific Models Pre-Trained from Llama3-8B:
- Finance-Llama3-8B
- Biomedicine-Llama3-8B
General Instruction-Augmented Corpora: general-instruction-augmented-corpora
Domain-Specific Instruction-Augmented Corpora (no finance data to avoid ethical issues): medicine-instruction-augmented-corpora

Domain-Adaptive Continued Pre-Training

Following AdaptLLM, we augment the domain-specific raw corpora with instruction-response pairs generated by our context-based instruction synthesizer.

1. To chat with the finance-Llama3-8B model:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("instruction-pretrain/finance-Llama3-8B")
tokenizer = AutoTokenizer.from_pretrained("instruction-pretrain/finance-Llama3-8B")

# Put your input here, NO prompt template is required
user_input = '''Use this fact to answer the question: Title of each class Trading Symbol(s) Name of each exchange on which registered
Common Stock, Par Value $.01 Per Share MMM New York Stock Exchange
MMM Chicago Stock Exchange, Inc.
1.500% Notes due 2026 MMM26 New York Stock Exchange
1.750% Notes due 2030 MMM30 New York Stock Exchange
1.500% Notes due 2031 MMM31 New York Stock Exchange

Which debt securities are registered to trade on a national securities exchange under 3M's name as of Q2 of 2023?'''

inputs = tokenizer(user_input, return_tensors="pt", add_special_tokens=True).input_ids.to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=400)[0]

answer_start = int(inputs.shape[-1])
pred = tokenizer.decode(outputs[answer_start:], skip_special_tokens=True)

print(pred)

2. To evaluate our models on the domain-specific tasks:

Set up dependencies

git clone https://github.com/microsoft/LMOps
cd LMOps/adaptllm
pip install -r requirements.txt

Evaluate

DOMAIN='finance'

# if the model can fit on a single GPU: set MODEL_PARALLEL=False
# elif the model is too large to fit on a single GPU: set MODEL_PARALLEL=True
MODEL_PARALLEL=False

# number of GPUs, chosen from [1,2,4,8]
N_GPU=1

# Set as True (prepends the BOS token to each input)
add_bos_token=True

bash scripts/inference.sh ${DOMAIN} 'instruction-pretrain/finance-Llama3-8B' ${add_bos_token} ${MODEL_PARALLEL} ${N_GPU}

Citation

If you find our work helpful, please cite us:

Instruction Pre-Training

@article{cheng2024instruction,
  title={Instruction Pre-Training: Language Models are Supervised Multitask Learners},
  author={Cheng, Daixuan and Gu, Yuxian and Huang, Shaohan and Bi, Junyu and Huang, Minlie and Wei, Furu},
  journal={arXiv preprint arXiv:2406.14491},
  year={2024}
}

Adapt LLM to Domains

@inproceedings{cheng2024adapting,
  title={Adapting Large Language Models via Reading Comprehension},
  author={Daixuan Cheng and Shaohan Huang and Furu Wei},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=y886UXPEZ0}
}