Bluesky Community

community
Activity Feed

AI & ML interests

Tools for Bluesky πŸ¦‹

Recent Activity

bluesky-community's activity

cfahlgren1Β 
posted an update about 11 hours ago
view post
Post
303
You'll notice the AI in the SQL Console is much better at working with chatml conversations:

Here's example of unnesting the cfahlgren1/react-code-instructions in less than 10 seconds by asking it. Check it out here: cfahlgren1/react-code-instructions

- "show me the average assistant response length"
- "extract user, system, and assistant messages into separate columns"

It's super easy to work with conversational datasets now with natural language πŸ—£οΈ





clemΒ 
posted an update about 16 hours ago
view post
Post
1086
Cool to see @ylecun joining the top 10 of most followed on HF!

(and leaderboard by @mvaloatto is here: mvaloatto/TCTF)
  • 1 reply
Β·
cfahlgren1Β 
posted an update 5 days ago
davanstrienΒ 
posted an update 8 days ago
view post
Post
2943
πŸ‡ΈπŸ‡° Hovorte po slovensky? Help build better AI for Slovak!

We only need 90 more annotations to include Slovak in the next Hugging Face FineWeb2-C dataset ( data-is-better-together/fineweb-c) release!

Your contribution will help create better language models for 5+ million Slovak speakers.

Annotate here: data-is-better-together/fineweb-c.

Read more about why we're doing it: https://huggingface.co/blog/davanstrien/fineweb2-community
  • 3 replies
Β·
davanstrienΒ 
posted an update 14 days ago
view post
Post
1671
Introducing FineWeb-C πŸŒπŸŽ“, a community-built dataset for improving language models in ALL languages.

Inspired by FineWeb-Edu the community is labelling the educational quality of texts for many languages.

318 annotators, 32K+ annotations, 12 languages - and growing! 🌍

data-is-better-together/fineweb-c
clemΒ 
posted an update 17 days ago
view post
Post
1666
Coming back to Paris Friday to open our new Hugging Face office!

We're at capacity for the party but add your name in the waiting list as we're trying to privatize the passage du Caire for extra space for robots πŸ€–πŸ¦ΎπŸ¦Ώ

https://t.co/enkFXjWndJ
  • 1 reply
Β·
nataliaElvΒ 
posted an update 18 days ago
view post
Post
1639
If you are still wondering how the FineWeb2 annotations are done, how to follow the guidelines or how Argilla works, this is your video!

I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out!

https://www.youtube.com/watch?v=_-ORB4WAVGU
nataliaElvΒ 
posted an update 24 days ago
view post
Post
1265
How do your annotations for FineWeb2 compare to your teammates'?

I started contributing some annotations to the FineWeb2 collaborative annotation sprint and I wanted to know if my labelling trends were similar to those of my teammates.

I did some analysis and I wasn't surprised to see that I'm being a bit harsher on my evaluations than my mates πŸ˜‚


Do you want to see how your annotations compare to others?
πŸ‘‰ Go to this Gradio space: nataliaElv/fineweb2_compare_my_annotations
✍️ Enter the dataset that you've contributed to and your Hugging Face username.

How were your results?
- Contribute some annotations: data-is-better-together/fineweb-c
- Join your language channel in Rocket chat: HuggingFaceFW/discussion
cfahlgren1Β 
posted an update about 1 month ago
view post
Post
1925
You can just ask things πŸ—£οΈ

"show me messages in the coding category that are in the top 10% of reward model scores"

Download really high quality instructions from the Llama3.1 405B synthetic dataset πŸ”₯

argilla/magpie-ultra-v1.0

nataliaElvΒ 
posted an update about 1 month ago
view post
Post
1184
We're so close to reaching 100 languages! Can you help us cover the remaining 200? Check if we're still looking for language leads for your language: nataliaElv/language-leads-dashboard
clemΒ 
posted an update about 1 month ago
view post
Post
4341
Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):

- There will be the first major public protest related to AI
- A big company will see its market cap divided by two or more because of AI
- At least 100,000 personal AI robots will be pre-ordered
- China will start to lead the AI race (as a consequence of leading the open-source AI race).
- There will be big breakthroughs in AI for biology and chemistry.
- We will begin to see the economic and employment growth potential of AI, with 15M AI builders on Hugging Face.

How my predictions for 2024 turned out:

- A hyped AI company will go bankrupt or get acquired for a ridiculously low price
βœ… (Inflexion, AdeptAI,...)

- Open-source LLMs will reach the level of the best closed-source LLMs
βœ… with QwQ and dozens of others

- Big breakthroughs in AI for video, time-series, biology and chemistry
βœ… for video πŸ”΄for time-series, biology and chemistry

- We will talk much more about the cost (monetary and environmental) of AI
βœ…Monetary πŸ”΄Environmental (😒)

- A popular media will be mostly AI-generated
βœ… with NotebookLM by Google

- 10 millions AI builders on Hugging Face leading to no increase of unemployment
πŸ”œcurrently 7M of AI builders on Hugging Face
Β·
cfahlgren1Β 
posted an update about 1 month ago
view post
Post
3010
We just dropped an LLM inside the SQL Console 🀯

The amazing, new Qwen/Qwen2.5-Coder-32B-Instruct model can now write SQL for any Hugging Face dataset ✨

It's 2025, you shouldn't be hand writing SQL! This is a big step in making it where anyone can do in depth analysis on a dataset. Let us know what you think πŸ€—
clemΒ 
posted an update about 1 month ago
view post
Post
4371
Hugging Face is becoming the best place to share the most viral AI apps with spaces.

Kolors Virtual Try-on just crossed 6,000,000 unique visitors & is now the #5 most popular space. Congrats to the Kwai Kolors team!

Kwai-Kolors/Kolors-Virtual-Try-On
  • 2 replies
Β·
davanstrienΒ 
posted an update about 1 month ago
view post
Post
505
Increasingly, LLMs are becoming very useful for helping scale annotation tasks, i.e. labelling and filtering. When combined with the structured generation, this can be a very scalable way of doing some pre-annotation without requiring a large team of human annotators.

However, there are quite a few cases where it still doesn't work well. This is a nice paper looking at the limitations of LLM as an annotator for Low Resource Languages: On Limitations of LLM as Annotator for Low Resource Languages (2411.17637).

Humans will still have an important role in the loop to help improve models for all languages (and domains).
nataliaElvΒ 
posted an update about 1 month ago
view post
Post
1633
Would you like to get a high-quality dataset to pre-train LLMs in your language? 🌏

At Hugging Face we're preparing a collaborative annotation effort to build an open-source multilingual dataset as part of the Data is Better Together initiative.

Follow the link below, check if your language is listed and sign up to be a Language Lead!

https://forms.gle/s9nGajBh6Pb9G72J6

🚩 Report: Legal issue(s)

1
#13 opened about 1 month ago by
tigeryfan

Language tags

1
#1 opened about 1 month ago by
nataliaElv

Opt out?

2
#2 opened about 1 month ago by
John-breen