---
license: mit
language:
- pt
model-index:
- name: bert-br-portuguese
results: []
---
# BERT-BR
<img src="https://cdn-uploads.huggingface.co/production/uploads/6385e26cc12615765caa6afe/3lSkNEfW57BNudZIFyTH2.png" width=400 height=400>
Image generated by ChatGPT with DALL-E from OpenAI.
## Model description
BERT-BR is a BERT model pre-trained from scratch on a dataset of literary book reviews in Brazilian Portuguese,
and it is specifically designed to capture the context and sentiment of such reviews.
The architecture has 6 layers, 4 attention heads, and an embedding dimension of 768.
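Below is a minimal usage sketch for feature extraction with 🤗 Transformers. The repository id `jcfneto/bert-br-portuguese` and the example review are assumptions based on where this card is hosted, not part of the original description.

```python
# Minimal sketch: load the model and extract contextual embeddings.
# The repo id below is an assumption based on where this card is hosted.
from transformers import AutoTokenizer, TFAutoModel

model_name = "jcfneto/bert-br-portuguese"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModel.from_pretrained(model_name)

# Encode a Brazilian Portuguese book review and run it through the encoder.
review = "O livro é envolvente do começo ao fim."
inputs = tokenizer(review, return_tensors="tf")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```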
## Training data
The BERT-BR model was pre-trained on a dataset of literary book reviews in Brazilian Portuguese.
The dataset comprises a diverse range of book genres and review sentiments, making the model
suitable for various book-related NLP tasks in Portuguese.
## Usage ideas
- Sentiment analysis on book reviews in Portuguese (see the fine-tuning sketch after this list)
- Book recommendation systems in Portuguese
- Text classification for book genres in Portuguese
- Named entity recognition in book-related contexts in Portuguese
- Aspect extraction in book-related contexts in Portuguese
- Text generation for book summaries in Portuguese
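
As a concrete starting point for the first idea, the sketch below fine-tunes the model for binary sentiment classification with Keras. The repository id, the toy reviews, and the hyperparameters are illustrative assumptions, not details from this card.

```python
# Hedged sketch: fine-tune BERT-BR for sentiment analysis on book reviews.
# Repo id, toy data, and hyperparameters are assumptions for illustration.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model_name = "jcfneto/bert-br-portuguese"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy labelled reviews: 1 = positive, 0 = negative.
texts = ["Adorei a narrativa, recomendo.", "Personagens rasos e enredo previsível."]
labels = [1, 0]

encodings = tokenizer(texts, padding=True, truncation=True, return_tensors="tf")
dataset = tf.data.Dataset.from_tensor_slices((dict(encodings), labels)).batch(2)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(dataset, epochs=1)
```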
## Limitations and bias
As the BERT-BR model was pre-trained on literary book reviews in Brazilian Portuguese,
it may not perform as well on other types of text or reviews in different languages.
Additionally, the model may inherit certain biases from the training data, which could
affect its predictions or embeddings. The tokenizer is based on the BERTimbau tokenizer,
which was built specifically for Brazilian Portuguese text, so it may not handle
other languages or other varieties of Portuguese well.
## Framework versions
- Transformers 4.21.3
- TensorFlow 2.9.1
- Datasets 2.7.0
- Tokenizers 0.12.1