New Embedding Model for MTEB - Retriever/BEIR Benchmark - Applying for refresh
I created a new embedding model: https://huggingface.co/nvidia/NV-Retriever-v1
We only want to submit it to the MTEB Retriever section (the MTEB/BEIR benchmark).
Can someone refresh the leaderboard?
The model weights will be added soon.
Small off topic:
https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md
It says: "Hit the Refresh button at the bottom of the leaderboard and you should see your scores 🥇". How does that work? I could not find a refresh button.
@tomaarsen @Muennighoff - are the scores added correctly to the model card?
Hello @nv-bschifferer ,
Great timing, I just finished a local refresh to try and figure out why NV-Retriever-v1 hadn't been automatically added to the leaderboard yet. It seems that the latest updates to MTEB introduced a slight bug in the dataset naming, also described in #132. I've created a private testing repository with a fixed version of your model's README, and that one indeed works locally as intended.
I'll create a PR to https://huggingface.co/nvidia/NV-Retriever-v1 to apply the fix to your model card, and then you should be all good to go at the next daily refresh.
Just 2 days ago we removed this refresh button in favor of an automatic daily refresh, but it seems that these docs are still outdated. Apologies as we work out some of the kinks here.
- Tom Aarsen
@tomaarsen thanks for the quick response and for testing it. I approved your PR.
I'm looking forward to your release & reading about your approach!
Yes, we are finalising the model files and writing the paper right now. I will let you know when it is ready.
Slack Channel:
Thanks Tom. I hadn't heard about that Slack channel. It would be great if you could invite me. I would add the other team members as well.
Just 2 days ago we removed this refresh button in favor of an automatic daily refresh, but it seems that these docs are still outdated. Apologies as we work out some of the kinks here.
Ah OK, that makes sense. All good, I was just wondering if I had missed something.
Can you share when the daily refresh of the LB happens?
Best,
Benedikt
Looks like the refresh starts at around 2:30am UTC (source), and the refresh itself takes about 10 minutes. However, it seems like the last daily refresh failed, so I can't confirm that the next one will work.
I've sent you a Slack invite, let me know if it works :)
- Tom Aarsen
Thanks for raising the refresh issue! Opened an issue here: https://github.com/embeddings-benchmark/mteb/issues/1073
@tomaarsen @Muennighoff thanks a lot for the support. I checked the leaderboard and the model has not been added to it. It seems that the last refresh failed again: https://github.com/embeddings-benchmark/leaderboard/actions/runs/9884681390
I downloaded the logs; it seems there is a dataset/key error:
2024-07-11T02:56:35.1061892Z Fetching 'leonn71/gte-Qwen2-1.5B-instruct-Q6_K-GGUF' metadata: 100%|██████████| 1264/1264 [00:07<00:00, 160.78it/s]
2024-07-11T02:56:35.1095798Z
2024-07-11T02:56:35.1096583Z Fetching leaderboard results for 'fr': 29%|██▊ | 4/14 [06:35<16:28, 98.83s/it]
2024-07-11T02:56:35.1097677Z Traceback (most recent call last):
2024-07-11T02:56:35.1110169Z File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
2024-07-11T02:56:35.1111011Z return self._engine.get_loc(casted_key)
2024-07-11T02:56:35.1111574Z File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
2024-07-11T02:56:35.1112466Z File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
2024-07-11T02:56:35.1113580Z File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
2024-07-11T02:56:35.1114900Z File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
2024-07-11T02:56:35.1115679Z KeyError: 'PawsXPairClassification (fr)'
2024-07-11T02:56:35.1115934Z
2024-07-11T02:56:35.1116176Z The above exception was the direct cause of the following exception:
2024-07-11T02:56:35.1116534Z
2024-07-11T02:56:35.1116652Z Traceback (most recent call last):
2024-07-11T02:56:35.1117201Z File "/home/runner/work/leaderboard/leaderboard/refresh.py", line 518, in <module>
2024-07-11T02:56:35.1118087Z all_data_tasks, boards_data = refresh_leaderboard()
2024-07-11T02:56:35.1118909Z File "/home/runner/work/leaderboard/leaderboard/refresh.py", line 408, in refresh_leaderboard
2024-07-11T02:56:35.1119632Z data_overall, data_tasks = get_mteb_average(board_config["tasks"])
2024-07-11T02:56:35.1120370Z File "/home/runner/work/leaderboard/leaderboard/refresh.py", line 347, in get_mteb_average
2024-07-11T02:56:35.1120960Z DATA_OVERALL = get_mteb_data(
2024-07-11T02:56:35.1121553Z File "/home/runner/work/leaderboard/leaderboard/refresh.py", line 326, in get_mteb_data
2024-07-11T02:56:35.1122448Z df['PawsXPairClassification (fr)'] = df['PawsXPairClassification (fr)'].fillna(df['PawsX (fr)'])
2024-07-11T02:56:35.1123520Z File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/pandas/core/frame.py", line 4102, in __getitem__
2024-07-11T02:56:35.1124251Z indexer = self.columns.get_loc(key)
2024-07-11T02:56:35.1125216Z File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
2024-07-11T02:56:35.1126051Z raise KeyError(key) from err
2024-07-11T02:56:35.1126436Z KeyError: 'PawsXPairClassification (fr)'
2024-07-11T02:56:35.3368875Z ##[error]Process completed with exit code 1.
2024-07-11T02:56:35.3456086Z Post job cleanup.
2024-07-11T02:56:35.4143970Z [command]/usr/bin/git version
2024-07-11T02:56:35.4179801Z git version 2.45.2
2024-07-11T02:56:35.4224415Z Temporarily overriding HOME='/home/runner/work/_temp/758bb8b9-b035-4365-b04b-de5e29838f3a' before making global git config changes
2024-07-11T02:56:35.4225422Z Adding repository directory to the temporary git global config as a safe directory
2024-07-11T02:56:35.4228112Z [command]/usr/bin/git config --global --add safe.directory /home/runner/work/leaderboard/leaderboard
Well identified, thanks for sharing.
cc @KennethEnevoldsen @orionweller here's some details on the current failure. Could one of you perhaps look into this?
- Tom Aarsen
I read the code and it seems that these lines cause the error: https://github.com/embeddings-benchmark/leaderboard/blob/main/refresh.py#L325-L327
if ('PawsXPairClassification (fr)' in datasets) and ('PawsX (fr)' in cols):
    df['PawsXPairClassification (fr)'] = df['PawsXPairClassification (fr)'].fillna(df['PawsX (fr)'])
    datasets.remove('PawsX (fr)')
The code checks whether PawsXPairClassification (fr) is in datasets and PawsX (fr) is in cols, where cols are the available columns in the dataframe df. However, I think there are some cases where the dataframe df does not have the column PawsXPairClassification (fr), and therefore the df['PawsXPairClassification (fr)'].fillna(df['PawsX (fr)']) part throws the KeyError.
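For illustration, here is a minimal sketch of how that check could be guarded so that a missing PawsXPairClassification (fr) column no longer raises. It reuses the names from the quoted lines (df, datasets, cols); the function name merge_pawsx_fr is hypothetical, and this is not necessarily how the fix was implemented upstream.

```python
import pandas as pd

NEW = "PawsXPairClassification (fr)"  # current task name
OLD = "PawsX (fr)"                    # legacy task name still used by some results


def merge_pawsx_fr(df: pd.DataFrame, datasets: list) -> pd.DataFrame:
    """Hypothetical guarded variant of the quoted refresh.py lines."""
    cols = list(df.columns)
    if NEW in datasets and OLD in cols:
        if NEW in cols:
            # Both columns exist: fill gaps in the new column from the legacy one.
            df[NEW] = df[NEW].fillna(df[OLD])
        else:
            # Only the legacy column exists: copy it instead of indexing a
            # missing column (the unguarded version raises KeyError here).
            df[NEW] = df[OLD]
        # Guarded so a missing entry does not raise ValueError.
        if OLD in datasets:
            datasets.remove(OLD)
    return df


# Reproduces the failing case: no model reported the new column name yet.
df = pd.DataFrame({"Model": ["a", "b"], OLD: [60.1, 61.2]})
datasets = [NEW, OLD]
df = merge_pawsx_fr(df, datasets)
print(df[NEW].tolist())  # [60.1, 61.2]
print(datasets)          # ['PawsXPairClassification (fr)']
```

The design choice is simply to branch on whether the new column is present before calling fillna, so the legacy scores are still carried over in both cases.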
Thanks @nv-bschifferer and sorry for the issues!
Should be fixed in https://github.com/embeddings-benchmark/leaderboard/pull/7
Give it 30 minutes or so for the updates to propagate and refresh the leaderboard :)
Looks like there is still something else, feel free to track the progress/comment on https://github.com/embeddings-benchmark/leaderboard/issues/8
EDIT: I was wrong; it appears to be working for this model!
@tomaarsen: thanks for your patience. It took us a little while to release the model weights and finish the paper.
I uploaded the model weights today. The paper is available on arXiv: NV-Retriever: Improving text embedding models with effective hard-negative mining; https://arxiv.org/abs/2407.15831