Librarian Bots

🇸🇰 Do you speak Slovak? Help build better AI for Slovak!

We need only 90 more annotations to include Slovak in the next release of the Hugging Face FineWeb2-C dataset (data-is-better-together/fineweb-c)!

Your contribution will help create better language models for 5+ million Slovak speakers.

Annotate here: data-is-better-together/fineweb-c.

Read more about why we're doing it: https://huggingface.co/blog/davanstrien/fineweb2-community

librarian-bot in librarian-bots/dataset-to-model-monitor 8 days ago

Discussion tracking new models trained on HuggingFaceH4/grok-conversation-harmless

#45 opened 11 months ago by librarian-bot

davanstrien posted an update 14 days ago

Introducing FineWeb-C 🌐🎓, a community-built dataset for improving language models in ALL languages.

Inspired by FineWeb-Edu, the community is labelling the educational quality of texts in many languages.

318 annotators, 32K+ annotations, 12 languages - and growing! 🌍

data-is-better-together/fineweb-c

yjernite posted an update 22 days ago

🇪🇺 Policy Thoughts in the EU AI Act Implementation 🇪🇺

There is a lot to like in the first draft of the EU GPAI Code of Practice, especially as regards transparency requirements. The Systemic Risks part, on the other hand, is concerning for both smaller developers and for external stakeholders.

I wrote more on this topic ahead of the next draft. TLDR: more attention to immediate large-scale risks and to collaborative solutions supported by evidence can help everyone - as long as developers disclose sufficient information about their design choices and deployment contexts.

Full blog here, based on our submitted response with @frimelle and @brunatrevelin:

https://huggingface.co/blog/yjernite/eu-draft-cop-risks#on-the-proposed-taxonomy-of-systemic-risks

librarian-bots's activity

Discussion tracking new models trained on HuggingFaceH4/cai-conversation-harmless

Discussion tracking new models trained on HuggingFaceH4/ultrafeedback_binarized

Discussion tracking new models trained on nvidia/HelpSteer

Discussion tracking new models trained on Open-Orca/OpenOrca

Discussion tracking new models trained on HuggingFaceH4/ultrachat_200k

Discussion tracking new models trained on davanstrien/model_cards_with_readmes

Discussion tracking new models trained on LDJnr/Capybara

Discussion tracking new models trained on argilla/distilabel-intel-orca-dpo-pairs

Discussion tracking new models trained on BAAI/TACO

Discussion tracking new models trained on google/fleurs

Discussion tracking new models trained on argilla/ultrafeedback-binarized-preferences-cleaned

Discussion tracking new models trained on HuggingFaceH4/no_robots

Discussion tracking new models trained on OpenAssistant/oasst1

Discussion tracking new models trained on HuggingFaceH4/grok-conversation-harmless