Model Card for NepaliAI/NFT-6.9k

Model Details

Model Description

The NepaliAI/NFT-6.9k model uses the BART (Bidirectional and Auto-Regressive Transformers) architecture. It starts from the facebook/bart-large-xsum pre-trained checkpoint and has been fine-tuned on a Nepali health-related question-answering dataset.

Intended Use

The model is designed to generate answers to health-related questions provided by users. The primary language for input and output is Nepali.

Training Data

The model was fine-tuned on a health-related question-answering dataset derived from NepaliAI/Nepali-Health-QA. Each training example pairs a health-related question with its corresponding answer.

Training Procedure

The model was trained for 5 epochs with the following training parameters:

  • Learning Rate: 5e-5
  • Batch Size: 2
  • Gradient Accumulation Steps: 4
  • FP16 (mixed-precision training): Enabled
  • Optimizer: AdamW with weight decay

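The training loss decreased consistently during training, which shows that the model fit the training data. No held-out evaluation has been run yet, so generalization is unconfirmed (see Evaluation).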
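With a batch size of 2 and 4 gradient accumulation steps, each optimizer update sees 8 examples per device. The sketch below works through that arithmetic. It assumes a single training device (the card does not state the device count), and the dataset size of 1,000 examples is a hypothetical figure used only for illustration. The exact step count in a real run also depends on how the training loop handles the final partial batch.

```python
import math

# Hyperparameters as listed in the training procedure.
EPOCHS = 5
BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4


def effective_batch_size(batch_size, grad_accum_steps, num_devices=1):
    """Examples contributing to one optimizer update.

    num_devices=1 is an assumption; the card does not state it.
    """
    return batch_size * grad_accum_steps * num_devices


def optimizer_steps(num_examples, batch_size, grad_accum_steps, epochs, num_devices=1):
    """Approximate number of optimizer updates over the whole run."""
    per_update = effective_batch_size(batch_size, grad_accum_steps, num_devices)
    return math.ceil(num_examples / per_update) * epochs


print(effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS))  # 8

# Hypothetical dataset size, for illustration only.
print(optimizer_steps(1000, BATCH_SIZE, GRAD_ACCUM_STEPS, EPOCHS))  # 625
```

Gradient accumulation keeps per-step memory at the level of a batch of 2 while giving the optimizer the gradient averaged over 8 examples.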

Usage

  from transformers import BartTokenizer, BartForConditionalGeneration
  
  # Load the fine-tuned model and its tokenizer from the Hugging Face Hub
  model = BartForConditionalGeneration.from_pretrained("NepaliAI/NFT-6.9k")
  tokenizer = BartTokenizer.from_pretrained("NepaliAI/NFT-6.9k")
  
  # Example question (Nepali): "Can I take the medicine my doctor prescribed
  # for strep throat during my period?"
  input_text = "के म मेरो महिनावारीको समयमा स्ट्रेप थ्रोटको लागि डाक्टरले तोकेको औषधि लिन सक्छु?"
  
  # Tokenize the question, truncating it to at most 128 tokens
  inputs = tokenizer(input_text, return_tensors="pt", max_length=128, truncation=True)
  
  # Sample one answer of up to 256 tokens; sampling makes the output vary between runs
  output_ids = model.generate(
      **inputs,
      max_length=256,
      do_sample=True,
      temperature=0.7,
      top_k=50,
      top_p=0.95,
      no_repeat_ngram_size=2,
      num_return_sequences=1,
  )
  
  # Decode the generated token IDs back into text
  generated_response = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
  
  print("Generated Response:", generated_response)
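The sampling arguments passed to generate above (temperature, top_k, top_p) reshape the next-token distribution before a token is drawn. The toy function below illustrates the idea on a plain list of logits. It is a simplified sketch and not the transformers implementation. The name filter_distribution and the example logits are hypothetical.

```python
import math


def filter_distribution(logits, temperature=0.7, top_k=50, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k, and top-p filtering.

    Hypothetical helper for illustration; expects a non-empty list of logits.
    """
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]

    # Top-k: keep only the k highest-scoring token indices.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p (nucleus): keep the smallest prefix whose cumulative
    # probability reaches top_p.
    kept, cumulative = [], 0.0
    for idx, p in zip(ranked, probs):
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the remaining probabilities sum to 1.
    mass = sum(p for _, p in kept)
    return {idx: p / mass for idx, p in kept}


# Four candidate tokens: top_k=3 drops the weakest, top_p=0.9 then drops one more.
dist = filter_distribution([2.0, 1.0, 0.0, -5.0], temperature=1.0, top_k=3, top_p=0.9)
print(sorted(dist))  # [0, 1]
```

Lower temperature and smaller top_k or top_p make answers more conservative, while larger values allow more varied wording.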

Evaluation

Metrics

No evaluation has been performed yet.

Limitations

  • The model's knowledge is limited to the training data, and it might not generalize well to unseen health-related questions.
  • Response quality may vary with the complexity and phrasing of the input question.

Ethical Considerations

Intended Use

The model is intended for informational purposes related to health. Its responses are generated from patterns learned in the training data and are not a substitute for professional medical advice.

Bias

Care should be taken to minimize biases present in the training data. Diverse and representative datasets are crucial to mitigate biases in the model's responses.

Privacy

The model does not store or retain user input. However, users are advised not to input sensitive or personally identifiable information.

Future Directions

  • Continuous improvement through hyperparameter tuning and model architecture exploration.
  • Expansion of the training dataset to enhance the model's knowledge and performance.
  • Adaptation of the model to Nepali datasets in other fields, with health-related datasets as the starting point.

License

This model is made available under the Hugging Face Model Hub's community guidelines and the license specified by the facebook/bart-large-xsum pre-trained model.


Model Size

406M parameters, stored as Safetensors with tensor type F32.