---
license: mit
language:
- pt
model-index:
- name: bert-br-portuguese
  results: []
---


# BERT-BR

<img src="https://cdn-uploads.huggingface.co/production/uploads/6385e26cc12615765caa6afe/3lSkNEfW57BNudZIFyTH2.png" width=400 height=400>
Image generated with OpenAI's DALL-E via ChatGPT.

## Model description

BERT-BR is a BERT model pre-trained from scratch on a corpus of literary book reviews in Brazilian Portuguese. 
It is designed to capture the context and sentiment of Portuguese-language book reviews. 
The architecture uses 6 Transformer layers, 4 attention heads, and a hidden (embedding) size of 768.
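A minimal loading sketch with Hugging Face Transformers is shown below. The repository id is an assumption taken from this card's `model-index` name; replace it with the actual Hub path if it differs.

```python
# Hedged example: "bert-br-portuguese" is an assumed repository id taken from
# this card's model-index name, not a confirmed Hub path.
from transformers import AutoTokenizer, TFAutoModel

model_id = "bert-br-portuguese"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModel.from_pretrained(model_id)

# Encode a short Brazilian Portuguese review and inspect the contextual embeddings.
inputs = tokenizer("Adorei este livro, a narrativa é envolvente.", return_tensors="tf")
outputs = model(inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```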

## Training data

The BERT-BR model was pre-trained on a dataset of literary book reviews in Brazilian Portuguese. 
The dataset comprises a diverse range of book genres and review sentiments, making the model 
suitable for various book-related NLP tasks in Portuguese.


## Usage ideas

- Sentiment analysis on book reviews in Portuguese (see the fine-tuning sketch after this list)
- Book recommendation systems in Portuguese
- Text classification for book genres in Portuguese
- Named entity recognition in book-related contexts in Portuguese
- Aspect extraction in book-related contexts in Portuguese
- Text generation for book summaries in Portuguese
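
As an illustration of the first idea, the sketch below fine-tunes the encoder for review sentiment classification with Keras. The repository id and the two toy labelled reviews are assumptions for demonstration only, not part of the released model or its training data.

```python
# Hedged fine-tuning sketch; the repo id and the toy examples are assumptions.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

model_id = "bert-br-portuguese"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Toy labelled reviews purely for illustration (1 = positive, 0 = negative).
texts = ["Adorei este livro, recomendo muito!", "A história é arrastada e previsível."]
labels = [1, 0]

encodings = tokenizer(texts, padding=True, truncation=True, return_tensors="tf")
dataset = tf.data.Dataset.from_tensor_slices((dict(encodings), labels)).batch(2)

# Transformers TF models compute their task loss internally,
# so no explicit loss is passed to compile().
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5))
model.fit(dataset, epochs=1)
```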

## Limitations and bias

As the BERT-BR model was pre-trained on literary book reviews in Brazilian Portuguese, 
it may not perform as well on other types of text or reviews in different languages. 
Additionally, the model may inherit certain biases from the training data, which could 
affect its predictions or embeddings. The tokenizer is based on the BERTimbau tokenizer, 
which was specifically designed for Brazilian Portuguese text, so it might not work 
well with other languages or Portuguese variants.

## Framework versions

- Transformers 4.21.3
- TensorFlow 2.9.1
- Datasets 2.7.0
- Tokenizers 0.12.1