library_name: transformers
tags:
- Persian
- Named Entity Recognition
- NER
- Albert
Model Card for Behpoyan-NER
Behpoyan-NER is a fine-tuned ALBERT model for Named Entity Recognition (NER) in Persian. It is based on the HooshvareLab/albert-fa-zwnj-base-v2-ner model and identifies ten entity types: Date (DAT), Event (EVE), Facility (FAC), Location (LOC), Money (MON), Organization (ORG), Percent (PCT), Person (PER), Product (PRO), and Time (TIM).
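For reference, the ten entity tags listed above can be written down as a small lookup table. The sketch below also builds a BIO-style label inventory from them; the BIO scheme and the label order are assumptions made for illustration, and the authoritative label names are whatever the loaded model reports in `model.config.id2label`.

```python
# Entity tags exactly as listed in this model card.
ENTITY_TYPES = {
    "DAT": "Date",
    "EVE": "Event",
    "FAC": "Facility",
    "LOC": "Location",
    "MON": "Money",
    "ORG": "Organization",
    "PCT": "Percent",
    "PER": "Person",
    "PRO": "Product",
    "TIM": "Time",
}


def build_bio_labels(entity_types):
    """Build an assumed BIO inventory: 'O' plus B-/I- variants per tag.

    The ordering here is illustrative only and need not match the
    model's real id-to-label mapping.
    """
    labels = ["O"]
    for tag in sorted(entity_types):
        labels.append(f"B-{tag}")
        labels.append(f"I-{tag}")
    return labels


BIO_LABELS = build_bio_labels(ENTITY_TYPES)
print(len(ENTITY_TYPES), len(BIO_LABELS))
```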
Model Details
Model Description
Behpoyan-NER is designed to recognize named entities in Persian text, improving upon the capabilities of its base model, HooshvareLab/albert-fa-zwnj-base-v2-ner. It was fine-tuned on a combination of the ARMAN, PEYMA, and WikiANN datasets, which are widely used for Persian NER.
- Developed by: Behpoyan
- Model type: ALBERT for token classification
- Language(s) (NLP): Persian (fa)
- License: MIT
Model Sources
- Repository: Behpoyan/Behpoyan-NER
- Base Model Repository: HooshvareLab/albert-fa-zwnj-base-v2-ner
Direct Use
This model can be directly used for Named Entity Recognition tasks in Persian text. Example applications include text analysis, information extraction, and Persian-language NLP applications.
Downstream Use
The model can be fine-tuned further for domain-specific NER tasks or combined with other models for complex NLP pipelines.
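When fine-tuning further on word-level annotations, the labels usually have to be aligned with the subword tokens the tokenizer produces. The following is a minimal sketch of one common approach, not the procedure used to train this model: it assumes `word_ids` has the shape returned by a fast tokenizer's `BatchEncoding.word_ids()` (one entry per token, `None` for special tokens), and it labels only the first subword of each word. The function name `align_labels` is hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def align_labels(word_ids, word_label_ids):
    """Map word-level label ids onto subword tokens.

    word_ids: one entry per token; None marks special tokens.
    word_label_ids: one label id per original word.
    Only the first subword of each word keeps the word's label;
    special tokens and continuation subwords get IGNORE_INDEX.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)          # special token
        elif wid != previous:
            aligned.append(word_label_ids[wid])   # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)          # continuation subword
        previous = wid
    return aligned


# Hypothetical example: three words, the first and last split in two.
example_word_ids = [None, 0, 0, 1, 2, 2, None]
example_labels = [3, 0, 5]
print(align_labels(example_word_ids, example_labels))
```

Labeling only the first subword is a design choice; propagating the label to every subword is an equally common alternative.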
Out-of-Scope Use
The model is not designed for languages other than Persian or for tasks other than token classification. Using it to produce biased or harmful content is discouraged.
Recommendations
While the model performs well for general-purpose NER in Persian, users should validate its performance on their own datasets. Be aware of biases in the training data, which may particularly affect recognition of less-represented entity types.
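One way to validate the model on your own data is to compare predicted and gold tag sequences at the entity level. The sketch below is an illustrative assumption rather than an official evaluation script: it expects BIO-style tags such as `B-ORG` and `I-ORG`, counts an entity as correct only on an exact match of type and boundaries, and uses the hypothetical helper names `extract_spans` and `entity_prf`.

```python
def extract_spans(tags):
    """Convert a BIO tag sequence into a set of (type, start, end) spans.

    `end` is exclusive. An I- tag that does not continue a span of the
    same type is ignored (strict interpretation).
    """
    spans = set()
    start, current = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and current == tag[2:]
        if not inside:
            if current is not None:
                spans.add((current, start, i))
                start, current = None, None
            if tag.startswith("B-"):
                start, current = i, tag[2:]
    return spans


def entity_prf(gold_tags, pred_tags):
    """Entity-level precision, recall, and F1 with exact span matching."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hand-written tags for illustration, not real model output.
gold = ["B-ORG", "I-ORG", "O", "B-LOC", "O", "B-DAT", "I-DAT"]
pred = ["B-ORG", "I-ORG", "O", "B-LOC", "O", "B-DAT", "O"]
precision, recall, f1 = entity_prf(gold, pred)
print(precision, recall, f1)
```

Reporting these scores per entity type, rather than only in aggregate, makes weaknesses on rarer types easier to spot.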
How to Get Started with the Model
Here’s how you can use the model:
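from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the tokenizer and the fine-tuned token-classification model from the Hub
model_name = "Behpoyan/Behpoyan-NER"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Build an NER pipeline that returns token-level predictions
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

# Sample Persian input text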
example = '''
"در سال ۱۴۰۱، شرکت علیبابا اعلام کرد که با همکاری بانک ملت، یک پروژه بزرگ برای توسعه زیرساختهای تجارت الکترونیک در ایران آغاز خواهد کرد.
این پروژه در تهران و اصفهان اجرا میشود و پیشبینی میشود تا پایان سال ۱۴۰۲ تکمیل شود."
'''
# Run inference; each result is a dict with the predicted label, score, and character offsets
ner_results = nlp(example)
print(ner_results)
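The pipeline above returns one prediction per token, so multi-token entities come back in pieces. The `pipeline` call also accepts an `aggregation_strategy` argument (for example `"simple"`) that merges them for you. As an alternative, the sketch below shows how the merging could be done by hand. It assumes each prediction is a dict with `entity`, `index`, `start`, and `end` keys and BIO-style labels; the helper `group_entities` and the mock predictions are hypothetical and written by hand, not produced by the model.

```python
def group_entities(token_preds, text):
    """Merge adjacent token predictions with BIO labels into entity spans."""
    entities = []
    current = None
    last_index = None
    for tok in token_preds:
        label = tok["entity"]
        if label == "O":
            current, last_index = None, None
            continue
        prefix, _, tag = label.partition("-")
        continues = (
            prefix == "I"
            and current is not None
            and current["type"] == tag
            and tok["index"] == last_index + 1  # tokens must be adjacent
        )
        if continues:
            current["end"] = tok["end"]          # extend the open entity
        else:
            current = {"type": tag, "start": tok["start"], "end": tok["end"]}
            entities.append(current)
        last_index = tok["index"]
    for ent in entities:
        ent["text"] = text[ent["start"]:ent["end"]]
    return entities


# Mock input and predictions for illustration only.
text = "بانک ملت در تهران"
mock_preds = [
    {"entity": "B-ORG", "index": 1, "start": 0, "end": 4},
    {"entity": "I-ORG", "index": 2, "start": 5, "end": 8},
    {"entity": "B-LOC", "index": 4, "start": 12, "end": 17},
]
entities = group_entities(mock_preds, text)
for ent in entities:
    print(ent["type"], ent["text"])
```

The adjacency check on `index` matters because the pipeline drops tokens labeled `O` by default, so two predictions that are neighbors in the output list are not necessarily neighbors in the text.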