Update README.md
README.md
CHANGED
@@ -1,47 +1,56 @@
 ---
-tags:
-- generated_from_keras_callback
 model-index:
-- name:
+- name: bert-br
   results: []
+license: mit
+language:
+- pt
 ---
 
 <!-- This model card has been generated automatically according to the information Keras had access to. You should
 probably proofread and complete it, then remove this comment. -->
 
-#
+# BERT-BR
 
-
-It achieves the following results on the evaluation set:
+BERTBookReviews
 
 
 ## Model description
 
-
+BERT-BR is a BERT model pre-trained from scratch on a dataset of literary book reviews in Brazilian Portuguese.
+The model is specifically designed for understanding the context and sentiment of book reviews in Portuguese.
+BERT-BR features 6 layers, 4 attention heads, and an embedding dimension of 768.
 
-##
+## Training data
 
-
+The BERT-BR model was pre-trained on a dataset of literary book reviews in Brazilian Portuguese.
+The dataset comprises a diverse range of book genres and review sentiments, making the model
+suitable for various book-related NLP tasks in Portuguese.
 
-##
+## Evaluation
 
-
+WIP.
 
-##
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- optimizer: None
-- training_precision: float32
-
-### Training results
+## Usage ideas
 
+- Sentiment analysis on book reviews in Portuguese
+- Book recommendation systems in Portuguese
+- Text classification for book genres in Portuguese
+- Named entity recognition in book-related contexts in Portuguese
+- Aspect extraction in book-related contexts in Portuguese
+- Text generation for book summaries in Portuguese
 
+## Limitations and bias
+As the BERT-BR model was pre-trained on literary book reviews in Brazilian Portuguese,
+it may not perform as well on other types of text or reviews in different languages.
+Additionally, the model may inherit certain biases from the training data, which could
+affect its predictions or embeddings. The tokenizer is based on the BERTimbau tokenizer,
+which was specifically designed for Brazilian Portuguese text, so it might not work
+well with other languages or Portuguese variants.
 
 ### Framework versions
 
 - Transformers 4.21.3
 - TensorFlow 2.9.1
 - Datasets 2.7.0
-- Tokenizers 0.12.1
+- Tokenizers 0.12.1
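
The new model description quotes three architecture numbers: 6 layers, 4 attention heads, and an embedding dimension of 768. The sketch below is one way to sanity-check what those numbers imply. It uses only the Python standard library and is not taken from the model's actual configuration. The class name `EncoderShape` is hypothetical, and the feed-forward width of 3072 is an assumption (the usual 4x rule for BERT), since the README does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderShape:
    """Architecture numbers quoted in the README's model description."""

    num_layers: int = 6
    num_heads: int = 4
    hidden_size: int = 768
    # Assumption: not stated in the README; 4 * hidden_size is the common BERT choice.
    intermediate_size: int = 3072

    @property
    def head_dim(self) -> int:
        """Width of each attention head; the hidden size must split evenly."""
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    def encoder_parameters(self) -> int:
        """Weights and biases in the Transformer layers only.

        Embeddings, the pooler, and any task head are excluded.
        """
        h, ff = self.hidden_size, self.intermediate_size
        attention = 4 * (h * h + h)  # query, key, value, and output projections
        feed_forward = (h * ff + ff) + (ff * h + h)  # up- and down-projection
        layer_norms = 2 * (2 * h)  # scale and shift for two LayerNorms
        return self.num_layers * (attention + feed_forward + layer_norms)


shape = EncoderShape()
print(f"per-head dimension: {shape.head_dim}")
print(f"encoder parameters: {shape.encoder_parameters():,}")
```

With 4 heads instead of the 12 used by BERT-base, each head is wider, but the per-layer parameter count is unchanged because the projections remain `hidden_size x hidden_size`. Under these assumptions, the size reduction relative to a 12-layer model comes from the layer count alone.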