metadata
library_name: transformers
tags:
  - Persian
  - Named Entity Recognition
  - NER
  - Albert

Model Card for Behpoyan-NER

Behpoyan-NER is an ALBERT model fine-tuned for Named Entity Recognition (NER) in Persian. It is based on the HooshvareLab/albert-fa-zwnj-base-v2-ner model and identifies ten entity types: Date (DAT), Event (EVE), Facility (FAC), Location (LOC), Money (MON), Organization (ORG), Percent (PCT), Person (PER), Product (PRO), and Time (TIM).
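To make the tag inventory concrete, the sketch below builds a label list from the ten abbreviations above. It assumes the model uses the common IOB2 scheme (`B-`/`I-` prefixes plus an `O` tag for non-entity tokens), which this card does not state explicitly. The names `ENTITY_TYPES` and `build_iob_labels` are illustrative, and the label order here is arbitrary; the mapping actually used by a loaded model is found in `model.config.id2label`.

```python
# Entity types listed in this card, keyed by their tag abbreviation.
ENTITY_TYPES = {
    "DAT": "Date",
    "EVE": "Event",
    "FAC": "Facility",
    "LOC": "Location",
    "MON": "Money",
    "ORG": "Organization",
    "PCT": "Percent",
    "PER": "Person",
    "PRO": "Product",
    "TIM": "Time",
}


def build_iob_labels(entity_types):
    """Return ["O", "B-DAT", "I-DAT", ...] for the given tag abbreviations.

    Assumption: IOB2 tagging, one B- and one I- label per entity type.
    """
    labels = ["O"]
    for tag in sorted(entity_types):
        labels.append(f"B-{tag}")
        labels.append(f"I-{tag}")
    return labels


LABELS = build_iob_labels(ENTITY_TYPES)
# Hypothetical ordering; compare against model.config.label2id before relying on it.
LABEL2ID = {label: index for index, label in enumerate(LABELS)}

print(len(LABELS))
```

Under the IOB2 assumption, ten entity types yield 21 labels in total.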

Model Details

Model Description

Behpoyan-NER is designed to recognize named entities in Persian text and is intended to improve on its base model, HooshvareLab/albert-fa-zwnj-base-v2-ner. It was fine-tuned on a combination of the ARMAN, PEYMA, and WikiANN datasets, which are widely used for Persian NER.

  • Developed by: Behpoyan
  • Model type: Albert for Token Classification
  • Language(s) (NLP): Persian (fa)
  • License: MIT

Model Sources

Direct Use

This model can be used directly for Named Entity Recognition on Persian text, for example in text analysis, information extraction, and other Persian-language NLP applications.

Downstream Use

The model can be fine-tuned further for domain-specific NER tasks or combined with other models for complex NLP pipelines.
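Further fine-tuning on word-level annotations usually requires aligning those labels with the sub-word tokens produced by the tokenizer. The sketch below shows one common way to do this, assuming a fast tokenizer whose encoding exposes `word_ids()` (a list giving the source word index of each token, with `None` for special tokens). The function name `align_labels_with_tokens` and the `label_all_subwords` flag are illustrative and not part of this model's training code.

```python
from typing import List, Optional

# Label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def align_labels_with_tokens(
    word_labels: List[int],
    word_ids: List[Optional[int]],
    label_all_subwords: bool = False,
) -> List[int]:
    """Expand word-level label IDs to token-level label IDs.

    Special tokens (word id None) are always ignored. By default only the
    first sub-word of each word keeps the label; later sub-words are ignored.
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # First sub-word of a new word carries the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Continuation sub-word of the same word.
            aligned.append(word_labels[word_id] if label_all_subwords else IGNORE_INDEX)
        previous_word = word_id
    return aligned


# Three words, the second of which is split into two sub-word tokens.
example_word_ids = [None, 0, 1, 1, 2, None]
example_word_labels = [0, 3, 4]
print(align_labels_with_tokens(example_word_labels, example_word_ids))
```

Labeling only the first sub-word keeps the loss from counting long words more than once; setting `label_all_subwords=True` is the alternative when every token should contribute.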

Out-of-Scope Use

The model is not designed for languages other than Persian or for tasks other than token classification. Using its predictions to support biased or harmful applications is discouraged.

Recommendations

While the model performs well for general-purpose NER in Persian, users should validate its performance on their own datasets. Be aware of possible biases in the training data, particularly for entity types that are under-represented in it.
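One way to validate the model on your own data is to compute entity-level precision, recall, and F1 against gold annotations. The following is a minimal, dependency-free sketch that assumes both gold and predicted tags follow the IOB2 scheme and counts a prediction as correct only when its type and boundaries match exactly. The helper names `extract_spans` and `entity_prf` are illustrative; an established library such as seqeval may be preferable for reporting.

```python
def extract_spans(tags):
    """Return a set of (type, start, end) spans from one IOB2 tag sequence.

    `end` is exclusive. An I- tag without a matching open span is treated
    leniently as the start of a new span.
    """
    spans = set()
    start, current = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and current == tag[2:]
        if current is not None and not continues:
            spans.add((current, start, i))
            start, current = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and current is None):
            start, current = i, tag[2:]
    return spans


def entity_prf(gold_sequences, predicted_sequences):
    """Micro-averaged entity-level precision, recall, and F1 (exact match)."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sequences, predicted_sequences):
        gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Synthetic example: the ORG span matches, the last entity has the wrong type.
gold = [["B-ORG", "I-ORG", "O", "B-LOC"]]
pred = [["B-ORG", "I-ORG", "O", "B-PER"]]
print(entity_prf(gold, pred))
```

Computing the same counts separately per entity type is a straightforward extension and helps reveal weak spots on rarer types.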

How to Get Started with the Model

The following snippet loads the model and tokenizer with the transformers library and runs the NER pipeline on a Persian example:

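from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Repository ID as written in this card; check the exact spelling on the Hub if loading fails.
model_name = "Behpoyan/Behpoyan-NER"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Token-classification pipeline; by default it returns one prediction per token.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)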
example = '''
"در سال ۱۴۰۱، شرکت علی‌بابا اعلام کرد که با همکاری بانک ملت، یک پروژه بزرگ برای توسعه زیرساخت‌های تجارت الکترونیک در ایران آغاز خواهد کرد. 
این پروژه در تهران و اصفهان اجرا می‌شود و پیش‌بینی می‌شود تا پایان سال ۱۴۰۲ تکمیل شود."
'''
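# Run inference; each result is a dict with the predicted tag, score, token text, and character offsets.
ner_results = nlp(example)

print(ner_results)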
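The pipeline above returns token-level predictions, so a multi-token entity appears as several entries. The sketch below merges adjacent tokens into entity spans. It assumes each prediction is a dict with the keys `entity`, `score`, `index`, `start`, and `end`, and that tags follow the IOB2 scheme. The function `group_entities` and the sample data are illustrative only; the transformers pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs a similar grouping, and this sketch is not a reimplementation of it.

```python
from typing import Dict, List


def group_entities(text: str, token_predictions: List[Dict]) -> List[Dict]:
    """Merge adjacent token-level predictions into entity spans (simplified)."""
    spans: List[Dict] = []
    current = None
    for tok in token_predictions:
        if tok["entity"] == "O":
            # Non-entity token: close any open span and move on.
            if current is not None:
                spans.append(current)
                current = None
            continue
        prefix, _, entity_type = tok["entity"].partition("-")
        continues = (
            current is not None
            and prefix == "I"
            and entity_type == current["type"]
            and tok["index"] == current["last_index"] + 1  # tokens must be adjacent
        )
        if continues:
            current["end"] = tok["end"]
            current["last_index"] = tok["index"]
            current["score"] = min(current["score"], float(tok["score"]))
        else:
            if current is not None:
                spans.append(current)
            current = {
                "type": entity_type,
                "start": tok["start"],
                "end": tok["end"],
                "last_index": tok["index"],
                "score": float(tok["score"]),
            }
    if current is not None:
        spans.append(current)
    return [
        {"type": s["type"], "text": text[s["start"]:s["end"]],
         "start": s["start"], "end": s["end"], "score": s["score"]}
        for s in spans
    ]


# Synthetic predictions in the assumed format (not real model output).
text = "Bank Mellat in Tehran"
fake_predictions = [
    {"entity": "B-ORG", "score": 0.9, "index": 1, "start": 0, "end": 4},
    {"entity": "I-ORG", "score": 0.7, "index": 2, "start": 5, "end": 11},
    {"entity": "B-LOC", "score": 0.8, "index": 4, "start": 15, "end": 21},
]
entities = group_entities(text, fake_predictions)
print(entities)
```

The span score here is the minimum token score, a conservative choice; averaging the token scores is an equally reasonable alternative.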