---
library_name: transformers
tags:
- BERT
- Transformers
- BETO
- Clickbait
license: mit
language:
- es
pipeline_tag: text-classification
---

# BETO Spanish Clickbait Model

This clickbait detection model is based on BETO, a Spanish variant of BERT.

## Model Details

BETO is a BERT model trained on a large Spanish corpus. It is similar in size to BERT-Base and was trained with the Whole Word Masking technique.

[BETO huggingface](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased)

The model was fine-tuned on around 30k news items from several Spanish newspapers.

## Training and Evaluation

The model was fine-tuned with the Hugging Face Transformers library. Hyperparameters and final evaluation metrics:

```
BATCH_SIZE = 100
NUM_PROCS = 32
LR = 0.00005
EPOCHS = 5
MAX_LENGTH = 25
MODEL = 'dccuchile/bert-base-spanish-wwm-cased'

{'eval_loss': 0.0386480949819088,
 'eval_accuracy': 0.9872786230980294,
 'eval_runtime': 10.0476,
 'eval_samples_per_second': 398.999,
 'eval_steps_per_second': 4.081,
 'epoch': 5.0}
```
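For reference, a fine-tuning setup matching the hyperparameters above can be sketched with the `Trainer` API. This is a minimal sketch, not the published training script: the dataset columns (`"text"`, `"label"`), the two-class setup, and the output directory name are assumptions, since the training data and script are not published with this card.

```python
# Hedged sketch of fine-tuning with the reported hyperparameters.
# Assumptions (not from the card): dataset columns "text"/"label",
# num_labels=2, output_dir name.

BATCH_SIZE = 100
NUM_PROCS = 32
LR = 0.00005
EPOCHS = 5
MAX_LENGTH = 25
MODEL = "dccuchile/bert-base-spanish-wwm-cased"


def build_trainer(train_dataset, eval_dataset):
    """Construct a Trainer mirroring the reported configuration.

    Imports are deferred so the sketch can be inspected without the
    heavy transformers/torch dependencies installed.
    """
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

    def tokenize(batch):
        # Headlines are short, hence the small MAX_LENGTH of 25 tokens.
        return tokenizer(batch["text"], truncation=True, max_length=MAX_LENGTH)

    train_dataset = train_dataset.map(tokenize, batched=True, num_proc=NUM_PROCS)
    eval_dataset = eval_dataset.map(tokenize, batched=True, num_proc=NUM_PROCS)

    args = TrainingArguments(
        output_dir="clickbait_es",
        per_device_train_batch_size=BATCH_SIZE,
        per_device_eval_batch_size=BATCH_SIZE,
        learning_rate=LR,
        num_train_epochs=EPOCHS,
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )
```

After training, `trainer.evaluate()` would produce a metrics dictionary of the shape shown above.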

## Uses

This model classifies Spanish news headlines as clickbait or not.

You can see a live use case at:
[Spanish Newspapers](https://clickbait.taniwa.es/)

### Direct Use

```
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TextClassificationPipeline,
)

tokenizer = AutoTokenizer.from_pretrained("taniwasl/clickbait_es")
model = AutoModelForSequenceClassification.from_pretrained("taniwasl/clickbait_es")

review_text = 'La explosión destruye parcialmente el edificio, Egipto'

nlp = TextClassificationPipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
    max_length=25,
    truncation=True,
    add_special_tokens=True,
)

print(nlp(review_text))
```

## License Disclaimer

The MIT license best describes our intentions for this work.
However, we are not certain that all the datasets used to train BETO have licenses compatible with MIT (especially for commercial use).
Please use at your own discretion, for non-commercial purposes only.