Suggestion: Adding outlier-resistant averaging methods
Add an option for outputting model results that takes exploding values (very large values in one of the columns) into account, to be able to find models that, averages being equal, are capable of solving all of the problems presented here in the tests.
Hello. This is more about "calculating the results". In general, if we look at Figure 1, we can see that the first model, by average1, should be in second place, but by average2 it is in first place.
The second average is calculated by taking into account values that differ strongly from the rest of the results.
As an example, I will set one of the parameters of the third model to 9,000.
Here we can see that in the first table, due to the calculation of the mean, model 3 is in the lead, but in the second table it can only rank second.
The same is true if we set two of the model's parameters to 9,000.
It is only when we set all three parameters to 9,000 that model 3 ranks first in terms of average in the second table as well.
Something like that. Unfortunately, I'm not very good at explaining things.
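To illustrate the effect with a minimal sketch (the scores below are made up, not actual leaderboard data, and the median is just one example of an outlier-resistant average):

```python
import statistics

# Hypothetical scores for three models on three tasks (not real leaderboard data).
# Model 3 has one "exploded" value of 9000 in a single column.
scores = {
    "model_1": [70, 72, 71],
    "model_2": [68, 69, 70],
    "model_3": [10, 12, 9000],
}

for name, vals in scores.items():
    mean = statistics.mean(vals)      # arithmetic mean: dominated by the 9000 outlier
    median = statistics.median(vals)  # median: unaffected by a single extreme value
    print(f"{name}: mean={mean:.1f}, median={median:.1f}")

# By arithmetic mean, model_3 jumps to first place because of one column;
# by an outlier-resistant statistic such as the median it drops back down.
```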
I think I got your idea, thank you!
You're pointing out that the current method of calculating averages doesn't account for extreme values in one or more columns, which can skew the results. So the goal of harmonising the average score is to find models that perform well across all tasks, rather than letting outliers dominate the average score.
This idea makes sense; we need to discuss it internally, and I will get back to you with my answer
Let me rename the discussion, feel free to correct me
Ok. And yes, you probably have the right definition.
I'm back with our thoughts – we've decided to maintain our current arithmetic mean approach due to its simplicity and wide understanding. Plus, since we're currently normalising the scores, we're mitigating an outlier effect
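As a rough illustration of why bounded, normalised scores limit the damage a single outlier can do (this is only a sketch assuming simple min-max clipping to a 0-100 range; the leaderboard's actual normalisation scheme may use different bounds):

```python
def normalise(raw_score, lower_bound=0.0, upper_bound=100.0):
    """Clip a raw score into [lower_bound, upper_bound] and rescale to 0-100.

    A sketch of the general idea only: once every column is bounded,
    no single column can contribute an arbitrarily large value to the mean.
    """
    clipped = max(lower_bound, min(raw_score, upper_bound))
    return 100.0 * (clipped - lower_bound) / (upper_bound - lower_bound)

# The "exploded" 9000 from the earlier example collapses to the column maximum,
# so it can no longer dominate the arithmetic mean.
print(normalise(9000))  # 100.0
print(normalise(72))    # 72.0
```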
Nevertheless, I will keep in mind your approach and might get back to it later
Let me close this discussion for now, we greatly appreciate your involvement! Please feel free to share any of your ideas here in the discussions, and don't hesitate to ask questions in case of any problems!
As an option, this approach could be added as a separate column.
Yes, we discussed it as a separate column, but the logic remains the same as I've described above
If anyone here wants only to suppress over-fitting (too-high score) outliers, but not remove low-score outliers, then maybe some mix of geometric mean and harmonic mean could work. Alternatively, odds-ratio-based averaging (truncated at extremes near 1 or 0, to avoid problems) may instead keep the significance of results such as 0.99 vs 0.95 vs 0.9, and 0.01 vs 0.03 vs 0.1. Using a weighted average of these is maybe an option?
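A minimal sketch of those three candidates (the function names, scores, and truncation threshold below are made up for illustration, not anything the leaderboard uses):

```python
import math

def truncate(p, eps=1e-3):
    """Clamp a score in (0, 1) away from 0 and 1 so the log-odds stay finite."""
    return min(max(p, eps), 1.0 - eps)

def geometric_mean(ps):
    return math.exp(sum(math.log(p) for p in ps) / len(ps))

def harmonic_mean(ps):
    return len(ps) / sum(1.0 / p for p in ps)

def log_odds_mean(ps, eps=1e-3):
    """Average the scores in log-odds space, then map back to a probability.

    This keeps e.g. 0.99 vs 0.95 vs 0.9 (and 0.01 vs 0.03 vs 0.1) meaningfully
    separated, unlike the plain arithmetic mean.
    """
    logits = [math.log(truncate(p, eps) / (1.0 - truncate(p, eps))) for p in ps]
    mean_logit = sum(logits) / len(logits)
    return 1.0 / (1.0 + math.exp(-mean_logit))

# Hypothetical per-task accuracies for one model (values in [0, 1]).
scores = [0.99, 0.95, 0.9, 0.03]

print(geometric_mean(scores))  # pulled down hard by the 0.03
print(harmonic_mean(scores))   # pulled down even harder
print(log_odds_mean(scores))   # penalises the 0.03 less severely, keeps the spread of the high scores
```

A weighted average of these three aggregates would then be one way to tune how strongly high and low outliers are suppressed.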
Hi!
We try to avoid adding too many options to the leaderboard to keep it usable by the majority of people. If you want to compute your own custom geometric/harmonic means on the results, you can do so by downloading the contents here: https://huggingface.co/datasets/open-llm-leaderboard/contents/tree/main
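For example, a possible starting point with the `datasets` library (the split name, the benchmark column names, and the `eval_name` column below are assumptions; check the dataset's actual schema before relying on them):

```python
from datasets import load_dataset
from scipy.stats import gmean, hmean

# Download the leaderboard contents and convert to a pandas DataFrame.
ds = load_dataset("open-llm-leaderboard/contents", split="train")  # split name assumed
df = ds.to_pandas()

# Hypothetical benchmark columns; inspect df.columns for the real names.
benchmark_cols = ["IFEval", "BBH", "MATH Lvl 5", "GPQA", "MUSR", "MMLU-PRO"]

# Clip away zeros so the geometric/harmonic means stay defined.
clipped = df[benchmark_cols].clip(lower=1e-6)
df["geometric_mean"] = gmean(clipped, axis=1)
df["harmonic_mean"] = hmean(clipped, axis=1)

print(df[["eval_name", "geometric_mean", "harmonic_mean"]].head())  # "eval_name" assumed
```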