New Embedding Model for MTEB - Retriever/BEIR Benchmark - Applying for refresh

#134
by nv-bschifferer - opened

I created a new embedding model: https://huggingface.co/nvidia/NV-Retriever-v1
We want to submit it only to the MTEB Retrieval section (the MTEB/BEIR benchmark).

Can someone refresh the leaderboard?

The model weights will be added soon.

Small off topic:
https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md

It says "Hit the Refresh button at the bottom of the leaderboard and you should see your scores 🥇". How does that work? I could not find a refresh button.

@tomaarsen @Muennighoff - are the scores added correctly to the model card?

Massive Text Embedding Benchmark org
edited Jul 10, 2024

Hello @nv-bschifferer ,

Great timing, I just finished a local refresh to try and figure out why NV-Retriever-v1 hadn't been automatically added to the leaderboard yet. It seems that the latest updates to MTEB introduced a slight bug in the dataset naming, also described in #132. I've created a private testing repository with a fixed version of your model's README, and that one indeed works locally as intended.

I'll create a PR to https://huggingface.co/nvidia/NV-Retriever-v1 to apply the fix to your model card, and then you should be all good to go at the next daily refresh.

Small off topic:
https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md

It says "Hit the Refresh button at the bottom of the leaderboard and you should see your scores 🥇". How does that work? I could not find a refresh button.

Just 2 days ago we removed this refresh button in favor of an automatic daily refresh, but it seems that these docs are still outdated. Apologies as we work out some of the kinks here.

  • Tom Aarsen

@tomaarsen thanks for the quick response and for testing it. I approved your PR.

I'm looking forward to your release & reading about your approach!
Yes, we are finalising the model files and writing the paper right now. I will let you know when it is ready.

Slack Channel:
Thanks Tom. I hadn't heard about that Slack channel. It would be great if you could invite me. I would add the other team members as well.

Just 2 days ago we removed this refresh button in favor of an automatic daily refresh, but it seems that these docs are still outdated. Apologies as we work out some of the kinks here.
Ah OK, that makes sense. All good, I was just wondering if I had missed something.

Can you share when the daily leaderboard (LB) refresh happens?

Best,
Benedikt

Massive Text Embedding Benchmark org
edited Jul 10, 2024

Can you share when the daily leaderboard (LB) refresh happens?

Looks like the refresh starts at around 2:30am UTC (source), and the refresh itself takes about 10 minutes. However, it seems like the last daily refresh failed, so I can't confirm that the next one will work.
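For anyone timing this from another timezone, here is a small sketch that computes the next expected refresh window. It only assumes the approximate figures quoted here (start around 02:30 UTC, roughly 10 minutes of runtime) and says nothing about how the workflow is actually scheduled; the helper name next_refresh_window is hypothetical.

```python
from datetime import datetime, timedelta, timezone

REFRESH_START_UTC = (2, 30)  # approximate start (hour, minute) quoted in the thread
REFRESH_DURATION = timedelta(minutes=10)  # approximate runtime quoted in the thread


def next_refresh_window(now: datetime) -> tuple[datetime, datetime]:
    """Return (start, expected_end) of the next daily refresh strictly after `now`.

    `now` should be timezone-aware; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    start = now_utc.replace(
        hour=REFRESH_START_UTC[0], minute=REFRESH_START_UTC[1], second=0, microsecond=0
    )
    if start <= now_utc:
        # Today's slot has already passed, so the next one is tomorrow.
        start += timedelta(days=1)
    return start, start + REFRESH_DURATION


# Usage: noon UTC on Jul 10 maps to the early-morning slot on Jul 11.
start, end = next_refresh_window(datetime(2024, 7, 10, 12, 0, tzinfo=timezone.utc))
print(start.isoformat())  # 2024-07-11T02:30:00+00:00
print(end.isoformat())  # 2024-07-11T02:40:00+00:00
```

Calling start.astimezone() with no argument converts the result to your machine's local timezone. Since the last run failed, treat this as a rough expectation rather than a guarantee.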

I've sent you a Slack invite, let me know if it works :)

  • Tom Aarsen
Massive Text Embedding Benchmark org

Thanks for raising the refresh issue! Opened an issue here: https://github.com/embeddings-benchmark/mteb/issues/1073

@tomaarsen @Muennighoff thanks a lot for the support. I checked the leaderboard and the model has not been added to it. It seems that the last refresh failed again: https://github.com/embeddings-benchmark/leaderboard/actions/runs/9884681390

I downloaded the logs; it seems there is a dataset/key error:

2024-07-11T02:56:35.1061892Z Fetching 'leonn71/gte-Qwen2-1.5B-instruct-Q6_K-GGUF' metadata: 100%|██████████| 1264/1264 [00:07<00:00, 160.78it/s]
2024-07-11T02:56:35.1095798Z 
2024-07-11T02:56:35.1096583Z Fetching leaderboard results for 'fr':  29%|██▊       | 4/14 [06:35<16:28, 98.83s/it]
2024-07-11T02:56:35.1097677Z Traceback (most recent call last):
2024-07-11T02:56:35.1110169Z   File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
2024-07-11T02:56:35.1111011Z     return self._engine.get_loc(casted_key)
2024-07-11T02:56:35.1111574Z   File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
2024-07-11T02:56:35.1112466Z   File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
2024-07-11T02:56:35.1113580Z   File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
2024-07-11T02:56:35.1114900Z   File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
2024-07-11T02:56:35.1115679Z KeyError: 'PawsXPairClassification (fr)'
2024-07-11T02:56:35.1115934Z 
2024-07-11T02:56:35.1116176Z The above exception was the direct cause of the following exception:
2024-07-11T02:56:35.1116534Z 
2024-07-11T02:56:35.1116652Z Traceback (most recent call last):
2024-07-11T02:56:35.1117201Z   File "/home/runner/work/leaderboard/leaderboard/refresh.py", line 518, in <module>
2024-07-11T02:56:35.1118087Z     all_data_tasks, boards_data = refresh_leaderboard()
2024-07-11T02:56:35.1118909Z   File "/home/runner/work/leaderboard/leaderboard/refresh.py", line 408, in refresh_leaderboard
2024-07-11T02:56:35.1119632Z     data_overall, data_tasks = get_mteb_average(board_config["tasks"])
2024-07-11T02:56:35.1120370Z   File "/home/runner/work/leaderboard/leaderboard/refresh.py", line 347, in get_mteb_average
2024-07-11T02:56:35.1120960Z     DATA_OVERALL = get_mteb_data(
2024-07-11T02:56:35.1121553Z   File "/home/runner/work/leaderboard/leaderboard/refresh.py", line 326, in get_mteb_data
2024-07-11T02:56:35.1122448Z     df['PawsXPairClassification (fr)'] = df['PawsXPairClassification (fr)'].fillna(df['PawsX (fr)'])
2024-07-11T02:56:35.1123520Z   File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/pandas/core/frame.py", line 4102, in __getitem__
2024-07-11T02:56:35.1124251Z     indexer = self.columns.get_loc(key)
2024-07-11T02:56:35.1125216Z   File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
2024-07-11T02:56:35.1126051Z     raise KeyError(key) from err
2024-07-11T02:56:35.1126436Z KeyError: 'PawsXPairClassification (fr)'
2024-07-11T02:56:35.3368875Z ##[error]Process completed with exit code 1.
2024-07-11T02:56:35.3456086Z Post job cleanup.
2024-07-11T02:56:35.4143970Z [command]/usr/bin/git version
2024-07-11T02:56:35.4179801Z git version 2.45.2
2024-07-11T02:56:35.4224415Z Temporarily overriding HOME='/home/runner/work/_temp/758bb8b9-b035-4365-b04b-de5e29838f3a' before making global git config changes
2024-07-11T02:56:35.4225422Z Adding repository directory to the temporary git global config as a safe directory
2024-07-11T02:56:35.4228112Z [command]/usr/bin/git config --global --add safe.directory /home/runner/work/leaderboard/leaderboard
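A minimal reproduction of this failure, assuming a results table in which the only PawsX column is the legacy 'PawsX (fr)' name (the toy DataFrame and model name are made up for illustration). It runs the same condition and fillna line that appear in the traceback, and shows that the condition passes while the column lookup still raises KeyError:

```python
import pandas as pd

# Toy data (hypothetical): scores exist only under the legacy column name.
df = pd.DataFrame({"Model": ["model-a"], "PawsX (fr)": [60.0]})
datasets = ["PawsXPairClassification (fr)", "PawsX (fr)"]
cols = df.columns

caught = None
try:
    # Same condition and assignment as in the failing refresh code.
    if ("PawsXPairClassification (fr)" in datasets) and ("PawsX (fr)" in cols):
        df["PawsXPairClassification (fr)"] = df["PawsXPairClassification (fr)"].fillna(df["PawsX (fr)"])
except KeyError as err:
    caught = err

print(repr(caught))  # KeyError('PawsXPairClassification (fr)')
```

The condition only checks that the new name is in datasets, not that it is a column of df, which is why the right-hand side lookup fails.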
Massive Text Embedding Benchmark org

Well identified, thanks for sharing.

cc @KennethEnevoldsen @orionweller, here are some details on the current failure. Could one of you perhaps look into this?

  • Tom Aarsen

I read the code, and it seems that these lines cause the error: https://github.com/embeddings-benchmark/leaderboard/blob/main/refresh.py#L325-L327

        if ('PawsXPairClassification (fr)' in datasets) and ('PawsX (fr)' in cols):
            df['PawsXPairClassification (fr)'] = df['PawsXPairClassification (fr)'].fillna(df['PawsX (fr)'])
            datasets.remove('PawsX (fr)')

The code checks whether 'PawsXPairClassification (fr)' is in datasets and 'PawsX (fr)' is in cols, where cols are the available columns of the dataframe df. However, I think there are cases where df does not have the column 'PawsXPairClassification (fr)', and then df['PawsXPairClassification (fr)'].fillna(df['PawsX (fr)']) throws the error.
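One way to make this step robust is sketched below. It is a hypothetical helper (merge_renamed_column is an illustrative name), not necessarily the change that was merged in the leaderboard repository: it only reads the new column when it already exists, and otherwise copies the legacy 'PawsX (fr)' scores under the new name.

```python
import pandas as pd

NEW_NAME = "PawsXPairClassification (fr)"  # current task name
OLD_NAME = "PawsX (fr)"  # legacy task name


def merge_renamed_column(df: pd.DataFrame, datasets: list, new: str, old: str) -> pd.DataFrame:
    """Fold scores stored under a legacy column name into the new column name.

    Hypothetical helper: it does not assume that `new` already exists in `df`.
    """
    if new in datasets and old in df.columns:
        if new in df.columns:
            # Both names present: fill gaps in the new column from the legacy one.
            df[new] = df[new].fillna(df[old])
        else:
            # Only the legacy name present: copy its scores under the new name.
            df[new] = df[old]
        if old in datasets:
            # Guarded, since list.remove raises ValueError for a missing item.
            datasets.remove(old)
    return df


# Usage with toy data (made up): the models only report the legacy column.
df = pd.DataFrame({"Model": ["model-a", "model-b"], OLD_NAME: [61.2, 58.9]})
datasets = [NEW_NAME, OLD_NAME]
df = merge_renamed_column(df, datasets, NEW_NAME, OLD_NAME)
print(df[NEW_NAME].tolist())  # [61.2, 58.9]
print(datasets)  # ['PawsXPairClassification (fr)']
```

Branching on whether the new column exists keeps the original fillna behaviour for tables that already have it, while avoiding the KeyError for tables that only carry the old name.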

Massive Text Embedding Benchmark org

Thanks @nv-bschifferer and sorry for the issues!

Should be fixed in https://github.com/embeddings-benchmark/leaderboard/pull/7

Give it 30 minutes or so for the updates to propagate and refresh the leaderboard :)

Massive Text Embedding Benchmark org
edited Jul 11, 2024

Looks like there is still something else; feel free to track the progress or comment on https://github.com/embeddings-benchmark/leaderboard/issues/8

EDIT: I was wrong, it appears to be working for this model!

@tomaarsen: thanks for your patience. It took us a little while to release the model weights and finish the paper.

I uploaded the model weights today. The paper is available on arXiv: NV-Retriever: Improving text embedding models with effective hard-negative mining (https://arxiv.org/abs/2407.15831).

orionweller changed discussion status to closed
