Identifying and analysing political quotes from the Danish Parliament related to climate change using NLP

KlimaBERT is a sequence classifier fine-tuned to predict whether political quotes are climate-related. For the positive class (1, "climate-related"), the model achieves an F1-score of 0.97, a precision of 0.97, and a recall of 0.97. The negative class (0) is defined as "non-climate-related".
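As a reminder of how the three reported metrics relate to each other for the positive class, here is a minimal sketch. It is not the evaluation code used for KlimaBERT; the function name `positive_class_metrics` and the toy labels are invented for illustration only.

```python
def positive_class_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration (1 = climate-related, 0 = non-climate-related).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]

precision, recall, f1 = positive_class_metrics(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

When precision and recall are equal, as in the figures reported above, the F1-score equals that same value.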

KlimaBERT is fine-tuned from the pre-trained DaBERT-uncased model on a training set of 1,000 manually labelled data points. The training set contains both political quotes and summaries of bills from the Danish Parliament.

The model was created to identify political quotes related to climate change, and it performs best on official texts from the Danish Parliament.
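The sketch below shows one way the classifier's output could be turned into the two labels described above. It rests on several assumptions: that the checkpoint emits two logits ordered as class 0 and class 1, that `torch` and `transformers` are installed, and that you supply the model's Hub identifier yourself as `model_id`. The helper names `label_from_logits` and `classify_quotes` are hypothetical and not part of this repo.

```python
import math

# Label ids as described in this model card.
ID2LABEL = {0: "non-climate-related", 1: "climate-related"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits):
    """Map a pair of logits to (label, probability of that label)."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[class_id], probs[class_id]


def classify_quotes(texts, model_id):
    """Hypothetical helper: classify a list of quotes with a fine-tuned checkpoint.

    `model_id` is a placeholder you must supply (assumption: a two-class
    sequence-classification checkpoint). Imports are local so the rest of
    this sketch runs without torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return [label_from_logits(row) for row in logits.tolist()]


# Illustrative logits only, not real model output.
label, prob = label_from_logits([-1.2, 2.3])
print(label)
```

Keeping the label mapping in a small pure function makes it easy to check independently of the model download.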

Fine-tuning

To fine-tune a model similar to KlimaBERT, follow the fine-tuning notebooks.

References

BERT: Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. https://arxiv.org/abs/1810.04805

DaBERT: Certainly (2021). Certainly has trained the most advanced Danish BERT model to date. https://www.certainly.io/blog/danish-bert-model/

Acknowledgements

These resources were created as part of my Master's thesis, so I would like to thank my supervisors Leon Derczynski and Vedran Sekara for the great support throughout the project! And a HUGE thanks to Gustav Gyrst for great sparring and co-development of the tools you find in this repo.

Contact

For any further help, questions, comments, etc., feel free to contact the author, Jonathan Kristensen, on LinkedIn or by creating a "discussion" on this model's page.
