---
language:
- es
license: mit
widget:
- text: "La Constitución española de 1978 es la <mask> suprema del ordenamiento jurídico español."
tags:
- Long documents
- longformer
- robertalex
- spanish
- legal
---

# Legal ⚖️ longformer-base-4096-spanish

## [Longformer](https://arxiv.org/abs/2004.05150) is a Transformer model for long documents.

`legal-longformer-base-4096` is a BERT-like model started from the RoBERTa checkpoint (**[RoBERTalex](https://huggingface.co/PlanTL-GOB-ES/RoBERTalex)** in this case) and pre-trained for *MLM* on long documents from the [Spanish Legal Domain Corpora](https://zenodo.org/record/5495529/#.Y205lpHMKV5). It supports sequences of length up to **4,096**!

**Longformer** uses a combination of sliding-window (*local*) attention and *global* attention. Global attention is user-configured based on the task, allowing the model to learn task-specific representations.

This model was made following the research done by [Iz Beltagy, Matthew E. Peters and Arman Cohan](https://arxiv.org/abs/2004.05150).

## Model (base checkpoint)

[RoBERTalex](https://huggingface.co/PlanTL-GOB-ES/RoBERTalex)

There are few models trained for the Spanish language. Some of them have been trained on low-resource, unclean corpora. The models derived from the Spanish National Plan for Language Technologies are proficient at several tasks and were trained on large-scale, clean corpora. However, Spanish legal-domain language could be thought of as an independent language on its own. We therefore created a Spanish legal model from scratch, trained exclusively on legal corpora.

## Dataset

[Spanish Legal Domain Corpora](https://zenodo.org/record/5495529)

A collection of corpora from the Spanish legal domain. More legal-domain resources: https://github.com/PlanTL-GOB-ES/lm-legal-es

## Citation

If you want to cite this model, you can use this:

```bibtex
@misc{narrativa2022legal-longformer-base-4096-spanish,
  title={Legal Spanish LongFormer by Narrativa},
  author={Romero, Manuel},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/Narrativa/legal-longformer-base-4096-spanish}},
  year={2022}
}
```

## Disclaimer

The models published in this repository are intended for a generalist purpose and are available to third parties. These models may have bias and/or other undesirable distortions. When third parties deploy or provide systems and/or services to other parties using any of these models (or systems based on these models), or become users of the models, they should note that it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including regulations regarding the use of artificial intelligence. In no event shall the owner of the models (SEDIA – State Secretariat for Digitalization and Artificial Intelligence) nor the creator (BSC – Barcelona Supercomputing Center) be liable for any results arising from the use made by third parties of these models.

> About Narrativa: Natural Language Generation (NLG) | Gabriele, our machine-learning-based platform, builds and deploys natural language solutions. #NLG #AI
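
## How to use

A minimal usage sketch (not taken from the original card). It assumes the checkpoint is published on the Hub under the id used in the citation above and that it loads as a standard 🤗 Transformers Longformer, so that `AutoModelForMaskedLM` resolves to `LongformerForMaskedLM`:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

# Hub id taken from the citation above; verify it matches the actual repository.
model_id = "Narrativa/legal-longformer-base-4096-spanish"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Fill-mask over a short legal sentence (same spirit as the widget example above).
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask(
    f"La Constitución española de 1978 es la {tokenizer.mask_token} suprema "
    "del ordenamiento jurídico español."
))

# For long documents (up to 4,096 tokens) you can pass an explicit global
# attention mask. Here only the first token (<s>) attends globally, which is
# the usual default for classification-style tasks; positions left at 0 use
# the sliding-window (local) attention.
long_text = "..."  # placeholder for a long Spanish legal document
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1
outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.logits.shape)
```

The `global_attention_mask` is what the card means by "user-configured" global attention: for other tasks (for example, question answering) you would additionally set the relevant positions, such as the question tokens, to 1.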