---
license: mit
language:
- pt
model-index:
- name: bert-br-portuguese
  results: []
---
# BERT-BR

<img src="https://cdn-uploads.huggingface.co/production/uploads/6385e26cc12615765caa6afe/3lSkNEfW57BNudZIFyTH2.png" width=400 height=400>

Image generated by ChatGPT with DALL-E from OpenAI.

## Model description

BERT-BR is a BERT model pre-trained from scratch on a dataset of literary book reviews in Brazilian Portuguese.
It is intended for modeling the context and sentiment of book reviews written in Portuguese.
The architecture has 6 layers, 4 attention heads, and an embedding dimension of 768.
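
To make these numbers concrete, the arithmetic below derives the per-head dimension and a rough encoder parameter count. It is only a sketch: the layer count, head count, and embedding dimension come from this card, while the feed-forward width (`INTERMEDIATE_SIZE`) is not stated here and is assumed to follow the usual BERT choice of four times the hidden size. Embeddings and the pooler are left out because they depend on the vocabulary size, which this card does not give.

```python
# Values stated in this model card.
NUM_LAYERS = 6
NUM_HEADS = 4
HIDDEN_SIZE = 768

# Assumption: the feed-forward width is not given in the card;
# 4 * hidden size is the conventional BERT setting.
INTERMEDIATE_SIZE = 4 * HIDDEN_SIZE


def head_dim(hidden: int, heads: int) -> int:
    """Dimension handled by each attention head."""
    if hidden % heads != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    return hidden // heads


def encoder_layer_params(hidden: int, intermediate: int) -> int:
    """Weights and biases in one standard BERT encoder layer."""
    # Query, key, value, and output projections, each hidden x hidden plus bias.
    attention = 4 * (hidden * hidden + hidden)
    # Two dense layers: hidden -> intermediate -> hidden, each with bias.
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * hidden)
    return attention + feed_forward + layer_norms


if __name__ == "__main__":
    per_layer = encoder_layer_params(HIDDEN_SIZE, INTERMEDIATE_SIZE)
    print("per-head dimension:", head_dim(HIDDEN_SIZE, NUM_HEADS))
    print("parameters per encoder layer:", per_layer)
    print("parameters in all encoder layers:", NUM_LAYERS * per_layer)
```

With 4 heads over 768 dimensions, each head works on 192 dimensions, which is wider than the 64 dimensions per head of the standard BERT-base layout (12 heads over 768).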

## Training data

The BERT-BR model was pre-trained on a dataset of literary book reviews in Brazilian Portuguese.
The dataset comprises a diverse range of book genres and review sentiments, making the model
suitable for various book-related NLP tasks in Portuguese.

## Usage ideas

- Sentiment analysis on book reviews in Portuguese
- Book recommendation systems in Portuguese
- Text classification for book genres in Portuguese
- Named entity recognition in book-related contexts in Portuguese
- Aspect extraction in book-related contexts in Portuguese
- Text generation for book summaries in Portuguese (BERT is an encoder-only model, so this would require pairing it with a decoder)
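
As an illustration of the first idea, the sketch below loads the checkpoint with a sequence-classification head for sentiment analysis. Several things here are assumptions rather than facts from this card: `MODEL_ID` is a hypothetical placeholder for the actual Hub repository id, the three-way `LABELS` set is invented for the example, and TensorFlow weights are assumed to be available because the card lists TensorFlow under framework versions. The classification head is newly initialized, so its outputs are not meaningful until the model is fine-tuned on labeled reviews.

```python
# Hypothetical placeholder: replace with the actual Hub repository id.
MODEL_ID = "your-namespace/bert-br-portuguese"

# Assumed label set for illustration; use the labels of your own dataset.
LABELS = ["negativo", "neutro", "positivo"]


def label_maps(labels):
    """Build the id2label / label2id dictionaries stored in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_classifier(model_id=MODEL_ID, labels=LABELS):
    """Load the tokenizer and the encoder with an untrained classification head."""
    # Imported lazily so label_maps stays usable without TensorFlow installed.
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    id2label, label2id = label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = TFAutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_classifier()
    batch = tokenizer(
        ["Um livro envolvente do início ao fim."],
        padding=True,
        truncation=True,
        return_tensors="tf",
    )
    # One row of logits per review, one column per label.
    logits = model(**batch).logits
    print(logits.shape)
```

The same loading pattern should carry over to the other classification-style ideas in the list, for example genre classification, by swapping the label set.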

## Limitations and bias

Because BERT-BR was pre-trained on literary book reviews in Brazilian Portuguese,
it may not perform as well on other types of text or on reviews in other languages.
The model may also inherit biases from the training data, which could
affect its predictions or embeddings. The tokenizer is based on the BERTimbau tokenizer,
which was designed specifically for Brazilian Portuguese text, so it might not work
well with other languages or other variants of Portuguese.

## Framework versions

- Transformers 4.21.3
- TensorFlow 2.9.1
- Datasets 2.7.0
- Tokenizers 0.12.1
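
When reproducing results, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library; the helper names are illustrative, and the distribution names in `EXPECTED` are assumptions (TensorFlow in particular may be installed under a different distribution name, such as a platform-specific build).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.21.3",
    "tensorflow": "2.9.1",
    "datasets": "2.7.0",
    "tokenizers": "0.12.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions(expected=EXPECTED):
    """Map each differing package to a (listed, installed) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in compare_versions().items():
        print(f"{package}: card lists {wanted}, installed {found or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one used for training.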