This model merges Meta's Llama 2 7B with EdwardYu/llama-2-7b-MedQuAD into a single checkpoint.

Usage

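import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "EdwardYu/llama-2-7b-MedQuAD-merged"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the weights in 4-bit NF4. The quantization settings are passed only via
# quantization_config; also passing load_in_4bit=True as a keyword is redundant
# and raises an error in recent transformers versions.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4'
    ),
)

question = 'What are the side effects or risks of Glucagon?'
# Move the tokenized prompt to the device that holds the model.
inputs = tokenizer(question, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))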

For faster inference, you can load the model in 16-bit precision (bfloat16) instead of using 4-bit quantization. This requires more GPU memory.

# The fine-tuned weights are already merged into this checkpoint, so the model
# is loaded directly and no PeftModel wrapping is needed.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
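
To help choose between the 4-bit and 16-bit loading options, the rough arithmetic below estimates the memory taken by the weights alone. It is a back-of-the-envelope sketch, not a measurement: the helper `estimate_weight_memory_gib` is hypothetical, the 7e9 parameter count is the nominal figure implied by "7b" in the model name, and activations, the KV cache, and quantization overhead are ignored.

```python
def estimate_weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights only, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1024**3


# Assumption: nominal parameter count taken from the "7b" in the model name.
NUM_PARAMS = 7e9

for label, bits in [("4-bit (nf4)", 4), ("16-bit (bfloat16)", 16)]:
    gib = estimate_weight_memory_gib(NUM_PARAMS, bits)
    print(f"{label}: ~{gib:.1f} GiB for weights")
```

Under these assumptions the 16-bit weights need about four times the memory of the 4-bit weights, so the 16-bit option is only practical when the GPU has enough headroom.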