Upload TFBertForSequenceClassification
- README.md +19 -121
- config.json +1 -1
- tf_model.h5 +1 -1
README.md
CHANGED
@@ -1,150 +1,48 @@
|
|
1 |
---
|
2 |
license: cc-by-sa-4.0
|
3 |
-
language:
|
4 |
-
- es
|
5 |
-
metrics:
|
6 |
-
- accuracy
|
7 |
-
pipeline_tag: text-classification
|
8 |
tags:
|
9 |
-
-
|
10 |
-
-
|
11 |
-
-
|
12 |
-
|
13 |
-
widget:
|
14 |
-
- text: '"Los soldados españoles que están en Afganistán cuentan con las máximas medidas de seguridad para su protección" Jesús Cuadrado, portavoz socialista en la Comisión de Defensa del Congreso, ha enviado esta tarde, en nombre del PSOE, un mensaje de apoyo y solidaridad a los soldados heridos hoy en Afganistán y a sus familias, así como a sus compañeros en esta misión internacional. Cuadrado ha resaltado el “enorme sacrificio” que supone para los soldados una misión de estas características. Un sacrificio “que está al servicio de todo los españoles”, ha explicado, “porque contribuyen a la creación de un Estado”, en un lugar que ha sido usado hasta el momento por los terroristas para cometer atentados en su país y en resto del mundo. Gracias al trabajo de “nuestros soldados”, ha añadido, “ahora hay un ejército compuesto por militares afganos y hay una policía”. Así, “el trabajo de los militares españoles está al servicio de España y de los demás países”, que participan en esta misión por mandato de la OTAN, ha recordado. “Es uno de los trabajos más solidarios y comprometidos que se pueden hacer en el mundo”, ha resaltado el portavoz socialista. La seguridad de nuestras tropas, una prioridad absoluta “La seguridad al cien por cien es imposible”, ha admitido Cuadrado. Así lo ha señalado en numerosas ocasiones tanto el Ministerio de Defensa como el resto del Gobierno. “Los riesgos que asumimos allí son muy elevados y, precisamente por eso, durante estos años, el Gobierno ha hecho de la seguridad de nuestras tropas una prioridad absoluta”. Cuadrado ha recordado que “todos los blindados que utilizan los militares españoles en Afganistán han sido renovados”. Los soldados españoles, en concreto, cuentan con 67 RG31 y 131 blindados tipo Lince. También se ha construido una nueva base y se ha dotado, a todo el material que utilizan, de las más modernas medidas de seguridad, así como de unos servicios sanitarios de alto nivel. 
“Igualmente, es conocido que se ha ido mejorando el sistema de transporte de nuestras tropas con las mejores medidas de seguridad”, concluyó el portavoz en la Comisión de Defensa.'
|
15 |
---
|
16 |
|
17 |
-
|
|
|
18 |
|
|
|
19 |
|
20 |
-
|
|
|
21 |
|
22 |
|
23 |
## Model description
|
24 |
|
25 |
-
|
26 |
-
| Code | Issue |
|------|-------|
| 1 | Macroeconomics |
| 2 | Civil Rights |
| 3 | Health |
| 4 | Agriculture |
| 5 | Labor |
| 6 | Education |
| 7 | Environment |
| 8 | Energy |
| 9 | Immigration |
| 10 | Transportation |
| 12 | Law and Crime |
| 13 | Social Welfare |
| 14 | Housing |
| 15 | Domestic Commerce |
| 16 | Defense |
| 17 | Technology |
| 18 | Foreign Trade |
| 19.1 | International Affairs |
| 19.2 | European Union |
| 20 | Government Operations |
| 23 | Culture |
| 98 | Non-thematic |
| 99 | Other |
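The issue codes in the table above can be captured as a simple lookup. This is a minimal sketch: the names `ISSUE_LABELS` and `issue_name` are hypothetical, and whether the model emits these codes directly as its labels is an assumption to verify against the `id2label` mapping in the model's `config.json`.

```python
# Issue codes and names copied from the table above.
# Codes are stored as strings because "19.1" and "19.2" are not integers.
ISSUE_LABELS = {
    "1": "Macroeconomics",
    "2": "Civil Rights",
    "3": "Health",
    "4": "Agriculture",
    "5": "Labor",
    "6": "Education",
    "7": "Environment",
    "8": "Energy",
    "9": "Immigration",
    "10": "Transportation",
    "12": "Law and Crime",
    "13": "Social Welfare",
    "14": "Housing",
    "15": "Domestic Commerce",
    "16": "Defense",
    "17": "Technology",
    "18": "Foreign Trade",
    "19.1": "International Affairs",
    "19.2": "European Union",
    "20": "Government Operations",
    "23": "Culture",
    "98": "Non-thematic",
    "99": "Other",
}


def issue_name(code):
    """Return the issue name for a code, or None if the code is not in the table."""
    return ISSUE_LABELS.get(str(code))


print(issue_name(16))      # Defense
print(issue_name("19.2"))  # European Union
print(issue_name(11))      # None (code 11 is not used in the table)
```

Keeping the codes as strings avoids floating-point keys for the two sub-codes of category 19.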
|
51 |
-
|
52 |
-
## Model variations
|
53 |
-
|
54 |
-
There are several monolingual models for different countries, and a multilingual model. The multilingual model can be easily extended to other languages, country contexts, or time periods by fine-tuning it with minimal additional labeled texts.
|
55 |
|
56 |
## Intended uses & limitations
|
57 |
|
58 |
-
|
59 |
|
60 |
-
|
61 |
|
62 |
-
|
63 |
-
|
64 |
-
This model can be used directly with a pipeline for text classification:
|
65 |
-
|
66 |
-
```python
>>> from transformers import pipeline
>>> # Pad and truncate every input to at most 512 tokens
>>> tokenizer_kwargs = {"padding": True, "truncation": True, "max_length": 512}
>>> partypress = pipeline("text-classification", model="cornelius/partypress-monolingual-spain", tokenizer="cornelius/partypress-monolingual-spain", **tokenizer_kwargs)
>>> partypress("Your text here.")
```
|
72 |
-
|
73 |
-
### Limitations and bias
|
74 |
-
|
75 |
-
The model was trained with data from parties in Spain. For use in other countries, the model may be further fine-tuned. Without further fine-tuning, the performance of the model may be lower.
|
76 |
-
|
77 |
-
The model may have biased predictions. We discuss some biases by country, party, and over time in the release paper for the PARTYPRESS database. For example, the performance is highest for press releases from Ireland (75%) and lowest for Poland (55%).
|
78 |
-
|
79 |
-
## Training data
|
80 |
-
|
81 |
-
The PARTYPRESS monolingual model for Spain was fine-tuned on about 3,000 press releases from parties in Spain. Each press release was labeled by two expert human coders.
|
82 |
-
|
83 |
-
For the training data of the underlying model, please refer to [dccuchile/bert-base-spanish-wwm-uncased](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased).
|
84 |
|
85 |
## Training procedure
|
86 |
|
87 |
-
###
|
88 |
|
89 |
-
|
|
|
|
|
90 |
|
91 |
-
###
|
92 |
|
93 |
-
For the pretraining, please refer to [dccuchile/bert-base-spanish-wwm-uncased](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased).
|
94 |
|
95 |
-
### Fine-tuning
|
96 |
|
97 |
-
|
98 |
-
|
99 |
-
#### Training Hyperparameters
|
100 |
-
|
101 |
-
The model was trained for four epochs with a batch size of 12; the batch size for testing was 2. All other hyperparameters were left at the defaults of the transformers library.
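To make the hyperparameters concrete, the number of optimizer steps they imply can be worked out with a little arithmetic. This is a rough sketch only: the helper `training_steps` is hypothetical, the example count of 3,000 is the approximate figure given under "Training data", and the 2,400 figure assumes that each fold of a five-fold cross-validation trains on 80% of the data.

```python
import math


def training_steps(num_examples, batch_size, epochs):
    """Return (steps per epoch, total steps) for plain mini-batch training."""
    # The last batch of an epoch may be smaller, so round up.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# Assumption: all ~3,000 labeled press releases are used for training.
print(training_steps(3000, 12, 4))  # (250, 1000)

# Assumption: one fold of five-fold cross-validation, training on 80% of the data.
print(training_steps(2400, 12, 4))  # (200, 800)
```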
|
102 |
-
|
103 |
-
|
104 |
-
#### Framework versions
|
105 |
|
106 |
- Transformers 4.28.0
|
107 |
- TensorFlow 2.12.0
|
108 |
- Datasets 2.12.0
|
109 |
- Tokenizers 0.13.3
|
110 |
-
|
111 |
-
|
112 |
-
## Evaluation results
|
113 |
-
|
114 |
-
In a five-fold cross-validation on our downstream task, this model achieves results comparable to the performance of our expert human coders. For the detailed results, please refer to Erfort et al. (2023).
|
115 |
-
|
116 |
-
### BibTeX entry and citation info
|
117 |
-
|
118 |
-
```bibtex
|
119 |
-
@article{erfort_partypress_2023,
|
120 |
-
author = {Cornelius Erfort and
|
121 |
-
Lukas F. Stoetzer and
|
122 |
-
Heike Klüver},
|
123 |
-
title = {The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases},
|
124 |
-
journal = {Research and Politics},
|
125 |
-
volume = {forthcoming},
|
126 |
-
year = {2023},
|
127 |
-
}
|
128 |
-
```
|
129 |
-
|
130 |
-
### Further resources
|
131 |
-
|
132 |
-
Github: [cornelius-erfort/partypress](https://github.com/cornelius-erfort/partypress)
|
133 |
-
|
134 |
-
Research and Politics Dataverse: [Replication Data for: The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FOINX7Q)
|
135 |
-
|
136 |
-
|
137 |
-
|
138 |
-
## Acknowledgements
|
139 |
-
|
140 |
-
Research for this contribution is part of the Cluster of Excellence "Contestations of the Liberal Script" (EXC 2055, Project-ID: 390715649), funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy. Cornelius Erfort is also grateful for generous funding provided by the DFG through the Research Training Group DYNAMICS (GRK 2458/1).
|
141 |
-
|
142 |
-
## Contact
|
143 |
-
|
144 |
-
Cornelius Erfort
|
145 |
-
|
146 |
-
Humboldt-Universität zu Berlin
|
147 |
-
|
148 |
-
[corneliuserfort.de](corneliuserfort.de)
|
149 |
-
|
150 |
-
|
|
|
1 |
---
|
2 |
license: cc-by-sa-4.0
|
3 |
tags:
|
4 |
+
- generated_from_keras_callback
|
5 |
+
model-index:
|
6 |
+
- name: partypress-monolingual-spain
|
7 |
+
results: []
|
8 |
---
|
9 |
|
10 |
+
<!-- This model card has been generated automatically according to the information Keras had access to. You should
|
11 |
+
probably proofread and complete it, then remove this comment. -->
|
12 |
|
13 |
+
# partypress-monolingual-spain
|
14 |
|
15 |
+
This model is a fine-tuned version of [cornelius/partypress-monolingual-spain](https://huggingface.co/cornelius/partypress-monolingual-spain) on an unknown dataset.
|
16 |
+
It achieves the following results on the evaluation set:
|
17 |
|
18 |
|
19 |
## Model description
|
20 |
|
21 |
+
More information needed
|
22 |
|
23 |
## Intended uses & limitations
|
24 |
|
25 |
+
More information needed
|
26 |
|
27 |
+
## Training and evaluation data
|
28 |
|
29 |
+
More information needed
|
30 |
|
31 |
## Training procedure
|
32 |
|
33 |
+
### Training hyperparameters
|
34 |
|
35 |
+
The following hyperparameters were used during training:
|
36 |
+
- optimizer: None
|
37 |
+
- training_precision: float32
|
38 |
|
39 |
+
### Training results
|
40 |
|
|
|
41 |
|
|
|
42 |
|
43 |
+
### Framework versions
|
44 |
|
45 |
- Transformers 4.28.0
|
46 |
- TensorFlow 2.12.0
|
47 |
- Datasets 2.12.0
|
48 |
- Tokenizers 0.13.3
|
config.json
CHANGED
@@ -1,5 +1,5 @@
|
|
1 |
{
|
2 |
-
"_name_or_path": "
|
3 |
"architectures": [
|
4 |
"BertForSequenceClassification"
|
5 |
],
|
|
|
1 |
{
|
2 |
+
"_name_or_path": "cornelius/partypress-monolingual-spain",
|
3 |
"architectures": [
|
4 |
"BertForSequenceClassification"
|
5 |
],
|
tf_model.h5
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:
|
3 |
size 439762284
|
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:78d9dbc6b2136c5c2c94ea77b3d45b7f3435520215a07b55a4d7501eff8e536e
|
3 |
size 439762284
|
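The `tf_model.h5` entry above is a Git LFS pointer file, not the weights themselves: three `key value` lines giving the spec version, the object ID, and the size in bytes. A minimal sketch of reading such a pointer follows; the function name `parse_lfs_pointer` and the returned dictionary layout are hypothetical, and the parser only handles the three keys shown in this diff.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line has the form "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents as shown on the added side of the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:78d9dbc6b2136c5c2c94ea77b3d45b7f3435520215a07b55a4d7501eff8e536e
size 439762284
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])  # sha256 439762284
```

Because the size is unchanged at 439762284 bytes on both sides of the diff, only the `oid` line differs between the old and new pointer.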