Librarian Bots

community

AI & ML interests

None defined yet.

Recent Activity

librarian-bots's activity

davanstrien
posted an update 8 days ago
🇸🇰 Hovorte po slovensky? Help build better AI for Slovak!

We only need 90 more annotations to include Slovak in the next Hugging Face FineWeb2-C dataset (data-is-better-together/fineweb-c) release!

Your contribution will help create better language models for 5+ million Slovak speakers.

Annotate here: data-is-better-together/fineweb-c.

Read more about why we're doing it: https://huggingface.co/blog/davanstrien/fineweb2-community
  • 3 replies
davanstrien
posted an update 14 days ago
Introducing FineWeb-C 🌍🎓, a community-built dataset for improving language models in ALL languages.

Inspired by FineWeb-Edu, the community is labelling the educational quality of texts in many languages.

318 annotators, 32K+ annotations, 12 languages - and growing! 🌍

data-is-better-together/fineweb-c
yjernite
posted an update 22 days ago
🇪🇺 Policy Thoughts in the EU AI Act Implementation 🇪🇺

There is a lot to like in the first draft of the EU GPAI Code of Practice, especially as regards transparency requirements. The Systemic Risks part, on the other hand, is concerning for both smaller developers and for external stakeholders.

I wrote more on this topic ahead of the next draft. TLDR: more attention to immediate large-scale risks and to collaborative solutions supported by evidence can help everyone - as long as developers disclose sufficient information about their design choices and deployment contexts.

Full blog here, based on our submitted response with @frimelle and @brunatrevelin:

https://huggingface.co/blog/yjernite/eu-draft-cop-risks#on-the-proposed-taxonomy-of-systemic-risks
  • 2 replies