---
license: cc-by-sa-4.0
language:
- en
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- topic-relatedness
- semantic-relatedness
base_model:
- sentence-transformers/distiluse-base-multilingual-cased-v1
datasets:
- FrancescoPeriti/TRoTR
---

# TRoTR-distiluse-base-multilingual-cased-v1

`FrancescoPeriti/TRoTR-distiluse-base-multilingual-cased-v1` is a fine-tuned version of `sentence-transformers/distiluse-base-multilingual-cased-v1`.

**NOTE**: In our work, we performed cross-validation over 10 folds. For a given model (e.g., `distiluse-base-multilingual-cased-v1`), this meant fine-tuning 10 separate models and reporting their average performance across the test folds. Rather than sharing the fine-tuned model for every fold, we provide only an example model for [**FOLD1**](https://github.com/FrancescoPeriti/TRoTR/tree/main/TRoTR/datasets/FOLD_1). The results in the paper are averaged across all folds, so the performance of this single model is not directly comparable to the results reported there.

You can find more details in our paper [TRoTR: A Framework for Evaluating the Re-contextualization of Text Reuse](https://aclanthology.org/2024.emnlp-main.774.pdf) by Francesco Periti, Pierluigi Cassotti, Stefano Montanelli, Nina Tahmasebi, and Dominik Schlechtweg.

The repository of our project is [https://github.com/FrancescoPeriti/TRoTR](https://github.com/FrancescoPeriti/TRoTR).

### Model Description

This model is designed to evaluate the topic relatedness of reused text across different contexts. It is fine-tuned on the **TRoTR** dataset for _text recontextualization_ using _contrastive learning_.
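This card does not state which contrastive loss or margin was used for fine-tuning. The following is a minimal sketch that assumes the classic margin-based contrastive loss over cosine distance, which is the form implemented by `losses.ContrastiveLoss` in sentence-transformers. It scores a single pair of context embeddings. Same-topic pairs are penalized by their squared distance. Different-topic pairs are penalized only while they are closer than the margin. The helper names and the margin value are hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def contrastive_loss(
    u: Sequence[float],
    v: Sequence[float],
    same_topic: bool,
    margin: float = 0.5,  # assumed value, not taken from the TRoTR setup
) -> float:
    """Margin-based contrastive loss for one pair of context embeddings (hypothetical helper)."""
    d = cosine_distance(u, v)
    if same_topic:
        # Topic-related pair: penalize any distance, pulling the embeddings together.
        return 0.5 * d ** 2
    # Unrelated pair: penalize only while the distance is below the margin.
    return 0.5 * max(0.0, margin - d) ** 2


# Toy 2-d "embeddings": identical vectors vs. orthogonal vectors.
print(contrastive_loss([1.0, 0.0], [1.0, 0.0], same_topic=True))   # 0.0
print(contrastive_loss([1.0, 0.0], [1.0, 0.0], same_topic=False))  # 0.125
print(contrastive_loss([1.0, 0.0], [0.0, 1.0], same_topic=False))  # 0.0
```

Different-topic pairs that are already farther apart than the margin contribute zero loss. As a result, unrelated contexts are not pushed apart indefinitely.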
Specifically, given a target text-reuse excerpt 𝑡 within two contexts 𝑐₁ and 𝑐₂, the model is trained to minimize the embedding distance between 𝑐₁ and 𝑐₂ if they share the same topic, and to maximize it if they do not.

As an example, consider three recontextualizations of the biblical passage `John 15:13`:

- (1) It’s the wonderful pride month!! ❤️🧡💛💚💙💜 Honestly pride is everyday! Love is love don’t forget I love you ❤️. Remember this! John 15:12-13: “My command is this: Love each other as I have loved you. `Greater love has no one than this: to lay down one’s life for one’s friends`”
- (2) At a large Crimean event today Putin quoted the Bible to defend the special military operation in Ukraine which has killed thousands and displaced millions. His words “`There is no greater love than if someone gives soul for their friends`”. And people were cheering him. Madness!!!
- (3) “Freeing people from genocide is the reason, motive & goal of the military operation we started in the Donbas& Ukraine”, Putin says, then quotes the Bible: “`There is no greater love than to lay down one’s life for one’s friends.`” It’s like Billy Graham meets North Korea

In this example, the biblical passage is embedded in three texts that recontextualize it with different topics. Text (1) has a different topic from texts (2) and (3), while texts (2) and (3) are topic-related.

## How to Get Started with the Model

```python
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("FrancescoPeriti/TRoTR-distiluse-base-multilingual-cased-v1")

# Example contexts that reuse the same biblical passage (John 15:13)
context1 = "It's the wonderful pride month!! ❤️🧡💛💚💙💜 Honestly pride is everyday! Love is love don't forget I love you ❤️. Remember this! John 15:12-13: My command is this: Love each other as I have loved you. Greater love has no one than this: to lay down one's life for one's friends"
context2 = "At a large Crimean event today Putin quoted the Bible to defend the special military operation in Ukraine which has killed thousands and displaced millions. His words \"Greater love has no one than this: to lay down one's life for one's friends\". And people were cheering him. Madness!!!"
context3 = "\"Freeing people from genocide is the reason, motive and goal of the military operation we started in the Donbas and Ukraine\", Putin says, then quotes the Bible: \"Greater love has no one than this: to lay down one's life for one's friends\" It's like Billy Graham meets North Korea."

# Encode the three contexts into embeddings
embedding1 = model.encode([context1])
embedding2 = model.encode([context2])
embedding3 = model.encode([context3])

# Calculate pairwise similarities (model.similarity requires sentence-transformers >= 3.0)
similarity1 = model.similarity(embedding1, embedding2)
similarity2 = model.similarity(embedding1, embedding3)
similarity3 = model.similarity(embedding2, embedding3)

# Print the similarity scores
print(f"Cosine similarities between the contexts: {similarity1}, {similarity2}, {similarity3}")
# Cosine similarities between the contexts: tensor([[0.4249]]), tensor([[0.4724]]), tensor([[0.8182]])
```

## Citation

Francesco Periti, Pierluigi Cassotti, Stefano Montanelli, Nina Tahmasebi, and Dominik Schlechtweg. 2024. [TRoTR: A Framework for Evaluating the Re-contextualization of Text Reuse](https://aclanthology.org/2024.emnlp-main.774/). In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 13972–13990, Miami, Florida, USA. Association for Computational Linguistics.
**BibTeX:**

```
@inproceedings{periti2024trotr,
    title = {{TRoTR: A Framework for Evaluating the Re-contextualization of Text Reuse}},
    author = "Periti, Francesco and Cassotti, Pierluigi and Montanelli, Stefano and Tahmasebi, Nina and Schlechtweg, Dominik",
    editor = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.774",
    pages = "13972--13990",
    abstract = "Current approaches for detecting text reuse do not focus on recontextualization, i.e., how the new context(s) of a reused text differs from its original context(s). In this paper, we propose a novel framework called TRoTR that relies on the notion of topic relatedness for evaluating the diachronic change of context in which text is reused. TRoTR includes two NLP tasks: TRiC and TRaC. TRiC is designed to evaluate the topic relatedness between a pair of recontextualizations. TRaC is designed to evaluate the overall topic variation within a set of recontextualizations. We also provide a curated TRoTR benchmark of biblical text reuse, human-annotated with topic relatedness. The benchmark exhibits an inter-annotator agreement of .811. We evaluate multiple, established SBERT models on the TRoTR tasks and find that they exhibit greater sensitivity to textual similarity than topic relatedness. Our experiments show that fine-tuning these models can mitigate such a kind of sensitivity.",
}
```