Model Description

This model is a fine-tuned version of Minerva, trained on the Medical Meadow Flashcard Dataset for question answering. The base Minerva model was developed by the Sapienza NLP Team in collaboration with Future Artificial Intelligence Research (FAIR) and CINECA. I used the 350-million-parameter version because of computational limits; versions with 1 billion and 3 billion parameters also exist. For more details, please refer to their repositories: Sapienza NLP on Hugging Face and Minerva LLMs.

Issues and Possible Solutions

  • In the original fine-tuned version, the model tended to keep generating past the end of the answer, which produced repeated sentences and progressively lower quality. Parameters such as 'max_length' or 'max_new_tokens' did not help, because they only cut generation off at a fixed token count without finishing the sentence. To address this, I redefined the stopping criteria so that generation ends at the first period ('.'), as shown in the code below:

    from transformers import StoppingCriteria, StoppingCriteriaList

    class newStoppingCriteria(StoppingCriteria):
        """Stop generation once the decoded text contains `stop_word`."""

        def __init__(self, stop_word):
            self.stop_word = stop_word

        def __call__(self, input_ids, scores, **kwargs):
            # Relies on a global `tokenizer` being loaded beforehand.
            # The whole sequence is decoded, prompt included, so a prompt that
            # already contains the stop word ends generation after one token.
            decoded_text = tokenizer.decode(input_ids[0], skip_special_tokens=True)
            return self.stop_word in decoded_text


    criteria = newStoppingCriteria(stop_word = ".")
    stoppingCriteriaList = StoppingCriteriaList([criteria])

  • Because the preprocessed text was formatted as "BoS token - Question - EoS token - BoS token - Answer - EoS token", the generated output contained the question as well as the answer. To resolve this, I strip the question from the decoded output and keep only the answer:

    outputText = tokenizer.decode(output_ids[0], skip_special_tokens = True)
    inputText = tokenizer.decode(inputEncoding.input_ids[0], skip_special_tokens = True)
    # Drop the leading question; this assumes the decoded output starts with the decoded prompt.
    answer = outputText[len(inputText):].strip()
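Both workarounds above have an edge case: the period check decodes the prompt together with the answer, and the slicing step assumes the output begins with the prompt. The following is a minimal sketch of how both could be guarded, not the author's implementation. `PromptAwareStop`, `extract_answer`, and the toy vocabulary are hypothetical names introduced for illustration. To use the stop check with `generate`, the class would presumably subclass `StoppingCriteria` as the original one does, with `decode` wrapping `tokenizer.decode(..., skip_special_tokens=True)` and `prompt_length` set to `inputEncoding.input_ids.shape[1]`.

```python
class PromptAwareStop:
    """Hypothetical stop check that ignores the prompt tokens.

    `decode` is any callable mapping a sequence of token ids to text.
    `prompt_length` is the number of prompt tokens to skip.
    """

    def __init__(self, decode, prompt_length, stop_word="."):
        self.decode = decode
        self.prompt_length = prompt_length
        self.stop_word = stop_word

    def __call__(self, input_ids, scores=None, **kwargs):
        # Look only at tokens generated after the prompt.
        new_tokens = input_ids[0][self.prompt_length:]
        return self.stop_word in self.decode(new_tokens)


def extract_answer(output_text, input_text):
    """Remove the prompt only if the output really starts with it."""
    if output_text.startswith(input_text):
        return output_text[len(input_text):].strip()
    return output_text.strip()


# Toy stand-in for a tokenizer, purely for demonstration.
toy_vocab = {1: "Dr.", 2: " Who", 3: "?", 4: " A", 5: " doctor", 6: "."}


def toy_decode(ids):
    return "".join(toy_vocab[i] for i in ids)


stop = PromptAwareStop(toy_decode, prompt_length=3)
print(stop([[1, 2, 3, 4, 5]]))     # False: no period in the new tokens yet
print(stop([[1, 2, 3, 4, 5, 6]]))  # True: the generated part now contains a period
print(extract_answer("Dr. Who? A doctor.", "Dr. Who?"))  # A doctor.
```

The period inside the prompt ("Dr.") no longer triggers an early stop, because only the continuation is inspected.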
    

Use Example

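  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  # Model ID as listed on this model's Hugging Face page.
  modelId = 'FabioS08/MedicalFlashcardsMinerva'
  device = 'cuda' if torch.cuda.is_available() else 'cpu'
  tokenizer = AutoTokenizer.from_pretrained(modelId)
  model = AutoModelForCausalLM.from_pretrained(modelId).to(device)

  question = 'What causes Wernicke encephalopathy?'

  inputEncoding = tokenizer(question, return_tensors = 'pt').to(device)
  output_ids = model.generate(
      inputEncoding.input_ids,
      attention_mask = inputEncoding.attention_mask,
      max_length = 128,
      do_sample = True,
      temperature = 0.7,
      top_p = 0.97,
      top_k = 2,
      pad_token_id = tokenizer.eos_token_id,
      repetition_penalty = 1.2,
      stopping_criteria = stoppingCriteriaList  # defined in the previous section
  )

  # Decode, then strip the question so that only the answer remains.
  outputText = tokenizer.decode(output_ids[0], skip_special_tokens = True)
  inputText = tokenizer.decode(inputEncoding.input_ids[0], skip_special_tokens = True)
  answer = outputText[len(inputText):].strip()

  # Generated answer: Wernicke encephalopathy is caused by a defect in the Wern-Herxheimer reaction, which leads to an accumulation of acid and alkaline phosphatase activity.
  # Expected answer: The underlying pathophysiologic cause of Wernicke encephalopathy is thiamine (B1) deficiency.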

Training Information

The model was fine-tuned for 3 epochs with the hyperparameters specified in the base model's original repository:

  from transformers import TrainingArguments

  trainingArgs = TrainingArguments(
      output_dir = "MedicalFlashcardsMinerva",
      evaluation_strategy = "steps",
      save_strategy = "steps",
      learning_rate = 2e-4,
      per_device_train_batch_size = 6,
      per_device_eval_batch_size = 6,
      gradient_accumulation_steps = 8,
      num_train_epochs = 3,
      lr_scheduler_type = "cosine",
      warmup_ratio = 0.1,
      adam_beta1 = 0.9,
      adam_beta2 = 0.95,
      adam_epsilon = 1e-8,
      weight_decay = 0.01,
      logging_steps = 100,
      report_to = "none",  # disable external experiment loggers
  )
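
Two consequences of these settings can be worked out by hand: the effective batch size per device, and the shape of the learning-rate curve. The sketch below assumes the common form of a cosine schedule with linear warmup (a ramp from 0 to the peak rate, then half a cosine period down to 0). `lr_at_step` is a hypothetical helper, and the total of 1000 optimizer steps is a made-up figure, since the real number depends on the dataset size, which is not stated here.

```python
import math

# Samples contributing to one optimizer update on a single device:
# per_device_train_batch_size x gradient_accumulation_steps.
effective_batch_size = 6 * 8


def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_ratio=0.1):
    """Learning rate at a given optimizer step (illustrative formula)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical value, for illustration only
print(effective_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, total_steps))
```

With `warmup_ratio = 0.1`, the first tenth of training is spent ramping up to `2e-4`, and the rate is back to half of that at the midpoint of the decay phase.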

Model size: 352M params (Safetensors), tensor type F32.

Dataset used to train FabioS08/MedicalFlashcardsMinerva