---
license: mit
language:
- pt
model-index:
- name: bert-tv-portuguese
  results: []
---

# BERT-TV

<img src="https://cdn-uploads.huggingface.co/production/uploads/6385e26cc12615765caa6afe/3lSkNEfW57BNudZIFyTH2.png" width=400 height=400>
Image generated by ChatGPT with DALL-E from OpenAI.

## Model description

BERT-TV is a BERT model pre-trained from scratch on a dataset of television reviews in Brazilian Portuguese. 
It is intended to capture the vocabulary, context, and sentiment typical of these reviews. The architecture has 
6 layers, 12 attention heads, and an embedding dimension of 768, and it targets NLP tasks on television-related 
text in Portuguese.
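
As a rough illustration of what those architecture figures imply, the sketch below derives the per-head dimension and the size of the self-attention weights. It assumes the standard BERT attention layout (query, key, value, and output projections, each with a bias); `EncoderShape` is a hypothetical helper written for this example, not part of the model's code or of any library, and the numbers are not read from the model's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderShape:
    """Hypothetical container for the figures quoted in the model description."""

    num_layers: int = 6
    num_heads: int = 12
    hidden_size: int = 768

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the hidden vector.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    def attention_params_per_layer(self) -> int:
        # Assumption: standard BERT self-attention, i.e. four
        # hidden_size x hidden_size projections (Q, K, V, output) with biases.
        per_projection = self.hidden_size * self.hidden_size + self.hidden_size
        return 4 * per_projection

    def attention_params_total(self) -> int:
        # Self-attention weights summed over all encoder layers.
        return self.num_layers * self.attention_params_per_layer()


shape = EncoderShape()
print("head dimension:", shape.head_dim)
print("attention parameters per layer:", shape.attention_params_per_layer())
print("attention parameters in total:", shape.attention_params_total())
```

With 12 heads over a 768-dimensional hidden state, each head operates on 64 dimensions. Feed-forward and embedding parameters are left out because the card does not state the intermediate size or the vocabulary size.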

## Usage ideas

- Sentiment analysis on television reviews in Portuguese
- Recommender systems for television models in Portuguese
- Text classification for different television brands and types in Portuguese
- Named entity recognition in television-related contexts in Portuguese
- Aspect extraction for features and specifications of televisions in Portuguese
- Text generation for summarizing television reviews in Portuguese (as an encoder-only model, BERT-TV would need an added decoder for this)

## Limitations and bias

Because BERT-TV is pre-trained exclusively on television reviews in Brazilian Portuguese, its performance may be 
limited on other types of text or on reviews in other languages. The model may also inherit biases present in the 
training data, which can influence its predictions and embeddings. The tokenizer is adapted from the BERTimbau 
tokenizer, which is optimized for Brazilian Portuguese, so it may not deliver optimal results with other languages 
or other varieties of Portuguese.

## Framework versions

- Transformers 4.27.3
- TensorFlow 2.11.1
- Datasets 2.11.0
- Tokenizers 0.13.3