BigLAM: BigScience Libraries, Archives and Museums

non-profit

AI & ML interests

🤗 Hugging Face x 🌸 BigScience initiative to create open source community resources for LAMs.

Recent Activity

biglam's activity

alielfilali01
posted an update 5 days ago
~75% on the challenging GPQA with only 40M parameters 🔥🥳

GREAT ACHIEVEMENT! Or is it?

This new work, "Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation", takes the mystery out of many models whose results I personally suspected, especially on leaderboards other than the English one, like the Open Arabic LLM Leaderboard OALL/Open-Arabic-LLM-Leaderboard.

The authors first trained a model directly on the GPQA data, which, unsurprisingly, led to it achieving 100% performance.

Afterward, they trained what they referred to as a 'legitimate' model on legitimate data (MedMCQA). However, they introduced a distillation loss from the earlier, 'cheated' model.

What they discovered was fascinating: the knowledge of GPQA leaked through this distillation loss, even though the legitimate model was never explicitly trained on GPQA during this stage.

This raises important questions about the careful use of distillation in model training, especially when the training data is opaque. As they demonstrated, it's apparently possible to (intentionally or unintentionally) leak test data through this method.
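For anyone wanting to see the mechanism, here is a minimal sketch of a standard knowledge-distillation objective (not the paper's exact setup; alpha and T are illustrative hyperparameters): the student is trained on clean labels, but the KL term pulls it toward the teacher's output distribution, which is how memorised benchmark knowledge can leak.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Standard KD objective: cross-entropy on 'legitimate' labels plus a
    KL term pulling the student toward the (possibly contaminated) teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Even though `labels` come from clean data (e.g. MedMCQA), the KL term
    # backpropagates whatever the teacher memorised -- including test items.
    return alpha * ce + (1 - alpha) * kd
```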

Find out more: Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation (2412.15255)
davanstrien
posted an update 8 days ago
🇸🇰 Hovorte po slovensky? (Do you speak Slovak?) Help build better AI for Slovak!

We only need 90 more annotations to include Slovak in the next Hugging Face FineWeb2-C dataset (data-is-better-together/fineweb-c) release!

Your contribution will help create better language models for 5+ million Slovak speakers.

Annotate here: data-is-better-together/fineweb-c.

Read more about why we're doing it: https://huggingface.co/blog/davanstrien/fineweb2-community
davanstrien
posted an update 14 days ago
Introducing FineWeb-C 🌍🎓, a community-built dataset for improving language models in ALL languages.

Inspired by FineWeb-Edu, the community is labelling the educational quality of texts for many languages.

318 annotators, 32K+ annotations, 12 languages - and growing! 🌍

data-is-better-together/fineweb-c
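If you want to explore the annotations yourself, here is a minimal sketch using the datasets library (the split and field names are assumptions; check the dataset card for the exact schema):

```python
from datasets import get_dataset_config_names, load_dataset

# Each language lives in its own config; list them first.
configs = get_dataset_config_names("data-is-better-together/fineweb-c")
print(configs)

# Load one language subset and look at a single annotated example.
# The "train" split and field names are assumptions -- see the dataset card.
ds = load_dataset("data-is-better-together/fineweb-c", configs[0], split="train")
print(ds[0])
```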
alielfilali01
posted an update 22 days ago
Unpopular opinion: Open Source takes courage to do!

Not everyone is brave enough to release what they have done (the way they've done it) to the wild to be judged!
It really requires a high level of "knowing wth you are doing"! It's kind of a superpower!

Cheers to the heroes here who see this!
alielfilali01
posted an update 26 days ago
Apparently I forgot to put this here!

Well, this is a bit late, but consider giving our recent blog a read if you are interested in evaluation.

You don't have to be into Arabic NLP to read it; the main contribution we are introducing is a new evaluation measure for NLG. We made the first application of this measure on Arabic for now, and we will be working with colleagues from the community to expand it to other languages.

Blog:
Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard
https://huggingface.co/blog/leaderboard-3c3h-aragen

Space:
inceptionai/AraGen-Leaderboard

Give it a read and let me know your thoughts 🤗
stefan-it
posted an update 26 days ago
My latest project is the outcome of the last 2+ years working with TPUs from the amazing TPU Research Cloud (TRC) program and training Encoder-only LMs with the TensorFlow Model Garden library.

👉 Link: https://github.com/stefan-it/model-garden-lms

An overview of some features:

- Cheatsheet for setting up a TPU VM Pod (with all necessary dependencies) to pretrain LMs with TF Model Garden
- Conversion scripts that convert TF Model Garden weights to Hugging Face Transformers-compatible models
- Supported architectures include BERT, BERT with Token Dropping and TEAMS

I also released BERT-based models pretrained on the great Hugging Face FineWeb and FineWeb-Edu datasets (10BT subset). With more to come!

👉 Model Hub Link: https://huggingface.co/model-garden-lms
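As a rough sketch of how the converted checkpoints can be used from Transformers (the repo id below is illustrative, not an actual release name; browse the model-garden-lms hub page for the real checkpoints):

```python
from transformers import pipeline

# Illustrative repo id -- replace it with one of the checkpoints actually
# published under https://huggingface.co/model-garden-lms.
fill_mask = pipeline("fill-mask", model="model-garden-lms/bert-base-fineweb")
print(fill_mask("The capital of Bavaria is [MASK]."))
```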

If you find these resources useful, please give them a like!

Made from Bavarian Oberland with ❤️ and 🥨.
christopher
posted an update 27 days ago
The folks at Foursquare released a dataset of 104.5 million places of interest (foursquare/fsq-os-places) and here's all of them on a plot.
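A rough sketch of how a plot like that can be reproduced from a streamed sample (the "latitude"/"longitude" column names are assumptions; check the dataset card for the exact schema):

```python
import matplotlib.pyplot as plt
from datasets import load_dataset

# Stream the dataset to avoid downloading all 104.5M rows; column names
# ("latitude", "longitude") are assumptions -- verify them on the dataset card.
ds = load_dataset("foursquare/fsq-os-places", split="train", streaming=True)
sample = list(ds.take(100_000))

lats = [row["latitude"] for row in sample]
lons = [row["longitude"] for row in sample]

plt.figure(figsize=(12, 6))
plt.scatter(lons, lats, s=0.1, alpha=0.3)
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("Sample of Foursquare open places")
plt.savefig("fsq_places_sample.png", dpi=150)
```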
davanstrien
posted an update about 1 month ago
Increasingly, LLMs are becoming very useful for helping scale annotation tasks, i.e. labelling and filtering. When combined with structured generation, this can be a very scalable way of doing some pre-annotation without requiring a large team of human annotators.

However, there are quite a few cases where it still doesn't work well. This is a nice paper looking at the limitations of LLMs as annotators for low-resource languages: On Limitations of LLM as Annotator for Low Resource Languages (2411.17637).

Humans will still have an important role in the loop to help improve models for all languages (and domains).
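As a minimal sketch of the pre-annotation idea with constrained outputs (the model id, prompt, and label set below are illustrative, not from the paper): instead of free-form generation, score a fixed label set with the model and keep the most probable label.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice for a lightweight annotator.
model_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def annotate(text: str, labels: list[str]) -> str:
    """Pick the label whose first token the model scores highest."""
    prompt = f"Text: {text}\nIs this text educational? Answer Yes or No: "
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    scores = {
        label: next_token_logits[
            tokenizer.encode(label, add_special_tokens=False)[0]
        ].item()
        for label in labels
    }
    return max(scores, key=scores.get)

print(annotate("Photosynthesis converts light into chemical energy.", ["Yes", "No"]))
```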
davanstrien
posted an update about 1 month ago
First dataset for the new Hugging Face Bluesky community organisation: bluesky-community/one-million-bluesky-posts 🦋

📊 1M public posts from Bluesky's firehose API
🔍 Includes text, metadata, and language predictions
🔬 Perfect to experiment with using ML for Bluesky 🤗

Excited to see people build more open tools for a more open social media platform!
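A minimal sketch for poking at the data (field names depend on the dataset's schema; inspect the card before relying on any column):

```python
from datasets import load_dataset

ds = load_dataset("bluesky-community/one-million-bluesky-posts", split="train")
print(ds)     # available columns: text, metadata, language predictions, ...
print(ds[0])  # a single post record
```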
davanstrien
posted an update about 1 month ago
The Bluesky AT Protocol unlocks exciting possibilities:
- Building custom feeds using ML
- Creating dashboards for data exploration
- Developing custom models for Bluesky
To gather Bluesky resources on the Hub, I've created a community org: https://huggingface.co/bluesky-community

My first rather modest contribution is a dashboard that shows the number of posts every second. Drinking straight from the firehose API 🚰

bluesky-community/bluesky-posts-over-time
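Under the hood, a dashboard like this only needs to count firehose frames per unit time. Here is a minimal sketch using the public relay endpoint and the websockets library; it counts all repo events rather than only app.bsky.feed.post records, which a real dashboard would decode and filter:

```python
import asyncio
import time
import websockets

# Public Bluesky relay firehose; each frame is a CBOR-encoded repo event.
FIREHOSE = "wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos"

async def count_events_per_second():
    async with websockets.connect(FIREHOSE) as ws:
        count, window_start = 0, time.monotonic()
        async for _frame in ws:           # we only count frames, no decoding
            count += 1
            if time.monotonic() - window_start >= 1.0:
                print(f"{count} events/s")
                count, window_start = 0, time.monotonic()

asyncio.run(count_events_per_second())
```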
albertvillanova
posted an update about 2 months ago
🚨 How green is your model? 🌱 Introducing a new feature in the Comparator tool: Environmental Impact for responsible #LLM research!
👉 open-llm-leaderboard/comparator
Now, you can not only compare models by performance, but also by their environmental footprint!

๐ŸŒ The Comparator calculates COโ‚‚ emissions during evaluation and shows key model characteristics: evaluation score, number of parameters, architecture, precision, type... ๐Ÿ› ๏ธ
Make informed decisions about your model's impact on the planet and join the movement towards greener AI!
alielfilali01
posted an update about 2 months ago
Unpopular opinion: o1-preview is more stupid than 4o, and Qwen2.5-72B-Instruct is extremely underrated!
albertvillanova
posted an update about 2 months ago
🚀 New feature of the Comparator of the 🤗 Open LLM Leaderboard: now compare models with their base versions & derivatives (finetunes, adapters, etc.). Perfect for tracking how adjustments affect performance & seeing innovations in action. Dive deeper into the leaderboard!

๐Ÿ› ๏ธ Here's how to use it:
1. Select your model from the leaderboard.
2. Load its model tree.
3. Choose any base & derived models (adapters, finetunes, merges, quantizations) for comparison.
4. Press Load.
See side-by-side performance metrics instantly!

Ready to dive in? 🏆 Try the 🤗 Open LLM Leaderboard Comparator now! See how models stack up against their base versions and derivatives to understand fine-tuning and other adjustments. Easier model analysis for better insights! Check it out here: open-llm-leaderboard/comparator
albertvillanova
posted an update 2 months ago
🚀 Exciting update! You can now compare multiple models side-by-side with the Hugging Face Open LLM Comparator! 📊

open-llm-leaderboard/comparator

Dive into multi-model evaluations, pinpoint the best model for your needs, and explore insights across top open LLMs all in one place. Ready to level up your model comparison game?