Word Sense Linking: Disambiguating Outside the Sandbox

Model Description

We introduce the task of Word Sense Linking (WSL): given an input text, identify the spans to disambiguate and link each one to its most suitable sense in a reference inventory. The model's annotations are provided as sense keys from WordNet, a large lexical database of English.
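To make the annotation format concrete, here is a minimal sketch of how a WordNet sense key can be split into its fields, following the standard `lemma%ss_type:lex_filenum:lex_id:head_word:head_id` layout. The `SenseKey` type and `parse_sense_key` helper are illustrative names, not part of the WSL package.

```python
from typing import NamedTuple

# WordNet ss_type codes for the part of speech encoded in a sense key.
SS_TYPES = {1: "noun", 2: "verb", 3: "adjective", 4: "adverb", 5: "adjective satellite"}


class SenseKey(NamedTuple):
    # Hypothetical container for illustration; not a class from the WSL package.
    lemma: str
    pos: str
    lex_filenum: int
    lex_id: int
    head_word: str
    head_id: str


def parse_sense_key(key: str) -> SenseKey:
    """Split a sense key such as 'bus_driver%1:18:00::' into its fields."""
    lemma, _, lex_sense = key.rpartition("%")
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return SenseKey(lemma, SS_TYPES[int(ss_type)], int(lex_filenum), int(lex_id), head_word, head_id)


parsed = parse_sense_key("bus_driver%1:18:00::")
print(parsed.lemma, parsed.pos)
```

The head word and head id fields are only filled for adjective satellites, so they are empty strings for most keys.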

Installation

Installation from source:

git clone https://github.com/Babelscape/WSL
cd WSL
pip install -r requirements.txt

Usage

WSL is composed of two main components: a retriever and a reader. The retriever retrieves relevant senses from a sense inventory (e.g., WordNet), while the reader extracts spans from the input text and links them to the retrieved senses. A pre-trained pipeline can be loaded with the from_pretrained method.

from wsl import WSL
from wsl.inference.data.objects import WSLOutput

wsl_model = WSL.from_pretrained("Babelscape/wsl-base")
wsl_out: WSLOutput = wsl_model("Bus drivers drive busses for a living.")

The returned WSLOutput looks like this:

WSLOutput(
text='Bus drivers drive busses for a living.',
tokens=['Bus', 'drivers', 'drive', 'busses', 'for', 'a', 'living', '.'],
id=0,
spans=[
    Span(start=0, end=11, label='bus driver: someone who drives a bus', text='Bus drivers'),
    Span(start=12, end=17, label='drive: operate or control a vehicle', text='drive'),
    Span(start=18, end=24, label='bus: a vehicle carrying many passengers; used for public transport', text='busses'),
    Span(start=31, end=37, label='living: the financial means whereby one lives', text='living')
],
candidates=Candidates(
    candidates=[
                {"text": "bus driver: someone who drives a bus", "id": "bus_driver%1:18:00::", "metadata": {}},
                {"text": "driver: the operator of a motor vehicle", "id": "driver%1:18:00::", "metadata": {}},
                {"text": "driver: someone who drives animals that pull a vehicle", "id": "driver%1:18:02::", "metadata": {}},
                {"text": "bus: a vehicle carrying many passengers; used for public transport", "id": "bus%1:06:00::", "metadata": {}},
                {"text": "living: the financial means whereby one lives", "id": "living%1:26:00::", "metadata": {}}
    ]
),
)
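The output above can be read as follows: `start` and `end` are character offsets into `text` (end exclusive), and each span's `label` matches the `text` of one of the candidates, whose `id` is the WordNet sense key. The sketch below copies the values from the example into plain tuples and dictionaries rather than using the library's own classes, so it runs without the model; `link_spans` is a hypothetical helper, not part of the WSL package.

```python
text = "Bus drivers drive busses for a living."

# (start, end, label) triples copied from the spans in the example output.
spans = [
    (0, 11, "bus driver: someone who drives a bus"),
    (12, 17, "drive: operate or control a vehicle"),
    (18, 24, "bus: a vehicle carrying many passengers; used for public transport"),
    (31, 37, "living: the financial means whereby one lives"),
]

# Candidates copied from the example output.
candidates = [
    {"text": "bus driver: someone who drives a bus", "id": "bus_driver%1:18:00::"},
    {"text": "driver: the operator of a motor vehicle", "id": "driver%1:18:00::"},
    {"text": "driver: someone who drives animals that pull a vehicle", "id": "driver%1:18:02::"},
    {"text": "bus: a vehicle carrying many passengers; used for public transport", "id": "bus%1:06:00::"},
    {"text": "living: the financial means whereby one lives", "id": "living%1:26:00::"},
]


def link_spans(text, spans, candidates):
    """Pair each span's surface form with the sense key of the matching candidate.

    Returns None as the key when no listed candidate has the span's label.
    """
    key_by_label = {c["text"]: c["id"] for c in candidates}
    return [(text[start:end], key_by_label.get(label)) for start, end, label in spans]


linked = link_spans(text, spans, candidates)
for surface, sense_key in linked:
    print(f"{surface!r} -> {sense_key}")
```

The candidate list shown in the example contains no entry for the sense of "drive", so that span maps to `None` in this sketch.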

Model Performance

The tables below report the performance of our model on the WSL evaluation dataset.

Validation (SE07)

Models        P      R      F1
BEM_SUP       67.6   40.9   51.0
BEM_HEU       70.8   51.2   59.4
ConSeC_SUP    76.4   46.5   57.8
ConSeC_HEU    76.7   55.4   64.3
Our Model     73.8   74.9   74.4

Test (ALL_FULL)

Models        P      R      F1
BEM_SUP       74.8   50.7   60.4
BEM_HEU       76.6   61.2   68.0
ConSeC_SUP    78.9   53.1   63.5
ConSeC_HEU    80.4   64.3   71.5
Our Model     75.2   76.7   75.9
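As a reading aid for the P, R and F1 columns, the sketch below recomputes F1 as the harmonic mean of precision and recall for the test rows. It assumes the standard F1 definition; the evaluation script itself is not shown here. Because the published P and R are already rounded to one decimal, a recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) for the test rows (ALL_FULL), copied from the table.
test_rows = {
    "BEM_SUP": (74.8, 50.7, 60.4),
    "BEM_HEU": (76.6, 61.2, 68.0),
    "ConSeC_SUP": (78.9, 53.1, 63.5),
    "ConSeC_HEU": (80.4, 64.3, 71.5),
    "Our Model": (75.2, 76.7, 75.9),
}

for name, (p, r, reported) in test_rows.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.1f}, reported F1 = {reported}")
```

The gap between precision and recall is what separates the systems: the baselines lose most of their F1 through lower recall, while our model's precision and recall are close to each other.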

Additional Information

Licensing Information: The contents of this repository are restricted to non-commercial research purposes only, under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Copyright of the dataset contents belongs to Babelscape.

arXiv Paper: Word Sense Linking: Disambiguating Outside the Sandbox

Citation Information

@inproceedings{bejgu-etal-2024-wsl,
    title     = "Word Sense Linking: Disambiguating Outside the Sandbox",
    author    = "Bejgu, Andrei Stefan and Barba, Edoardo and Procopio, Luigi and Fern{\'a}ndez-Castro, Alberte and Navigli, Roberto",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    month     = aug,
    year      = "2024",
    address   = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
}

Contributions: Thanks to @andreim14, @edobobo, @poccio and @navigli for adding this model.
