blog-explorers (Blog-explorers)

wolfram

in blog-explorers/README about 14 hours ago

[Support] Community Articles

67

#5 opened 10 months ago by

victor

roseking

posted an update 3 days ago

Post

2434

🤗 Hugging Face Download Tool

The Hugging Face Download Tool is a sophisticated graphical user interface application designed to simplify the process of downloading resources from Hugging Face repositories. This tool addresses common challenges in model and file downloads through its intelligent features and user-friendly interface.

✨ Key Features
- 🖥️ Intuitive graphical interface for easy operation
- 🔄 Advanced retry mechanism with smart error handling
- ⏸️ Resume capability for interrupted downloads
- 📊 Real-time download status monitoring
- 🔐 Secure access to private repositories via token authentication

🛠️ Technical Highlights
The tool implements several advanced features to ensure reliable downloads:
- 📦 Chunk-based downloading with 1MB segments
- ⚡ Adaptive retry intervals (5-300 seconds) based on error types
- 🔌 Connection pooling for optimized performance
- 🛡️ Built-in rate limiting protection
- 🔑 Secure token handling for private repository access
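
As a rough illustration of how 1MB chunking, bounded retry intervals, and resuming could fit together, here is a minimal sketch. It is not the tool's actual code: the names `retry_delay`, `iter_chunks`, `download_with_resume`, and the `open_at` callback are hypothetical, and the exponential backoff policy is an assumption (the post only says intervals range from 5 to 300 seconds depending on the error type).

```python
import io
import time
from typing import BinaryIO, Callable, Iterator

CHUNK_SIZE = 1024 * 1024        # 1MB segments, as stated in the post
MIN_DELAY, MAX_DELAY = 5, 300   # retry interval bounds in seconds, from the post


def retry_delay(attempt: int) -> int:
    """Assumed policy: exponential backoff clamped to [MIN_DELAY, MAX_DELAY]."""
    return min(MAX_DELAY, MIN_DELAY * 2 ** attempt)


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def download_with_resume(
    open_at: Callable[[int], BinaryIO],   # hypothetical: opens the source at a byte offset
    sink: BinaryIO,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Copy the source into `sink`, resuming from the last written byte on errors."""
    attempt = 0
    while True:
        try:
            # Resume from however many bytes the sink already holds.
            stream = open_at(sink.tell())
            for chunk in iter_chunks(stream):
                sink.write(chunk)
                attempt = 0  # progress was made, so reset the backoff
            return sink.tell()
        except OSError:
            if attempt >= max_attempts:
                raise
            sleep(retry_delay(attempt))
            attempt += 1
```

In a real downloader, `open_at` would typically issue an HTTP request with a `Range` header so the server only sends the missing bytes; here it is left abstract so the sketch stays free of network dependencies.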

This tool is ideal for researchers, developers, and AI practitioners who regularly work with Hugging Face resources and need a reliable, user-friendly download solution. 💻 It supports all major operating systems and requires minimal setup, making it accessible to users of all technical levels. 🚀

GitHub: https://github.com/2404589803/hf_downloader

2 replies

sheryc

authored a paper 17 days ago

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Paper • 2412.04626 • Published 29 days ago • 11

AtAndDev

posted an update 17 days ago

Post

359

@s3nh Hey man check your discord! Got some news.

4 replies

celinah

posted an update 18 days ago

Post

596

🚀 We've just dropped a new release v0.27.0 of the huggingface_hub Python library!

This release includes:
- 💾 New torch model loading utilities in the serialization module — providing a standardized way to save and load torch models with built-in support for sharding and safe serialization.
- 📦 Tooling for something exciting — if you like single-file formats for models like GGUF, you'll love what we're cooking up 👀 More coming soon!
- 🛠️ Loads of quality-of-life improvements and bug fixes!

release notes and full details here 👇
Wauplin/huggingface_hub#10

$ pip install -U huggingface_hub

julien-c

posted an update 24 days ago

Post

7920

After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team

28 replies

christopher

posted an update 26 days ago

Post

1582

The folks at Foursquare released a dataset of 104.5 million places of interest (foursquare/fsq-os-places), and here are all of them on a plot

3 replies

christopher

posted an update 29 days ago

Post

2331

The Lichess database of games, puzzles, and engine evaluations is now on the Hub: https://huggingface.co/Lichess

Billions of chess data points to download, query, and stream and we're excited to see what you'll build with it! ♟️ 🤗

- Lichess/positions-datasets-66f50837db5cd3287d60d489
- Lichess/games-datasets-66f508df78f4b43e1bb2d353

abhishek

posted an update about 1 month ago

Post

1690

🎉 SUPER BLACK FRIDAY DEAL 🎉

Train almost any model on a variety of tasks such as LLM fine-tuning, text classification/regression, summarization, question answering, image classification/regression, object detection, tabular data, etc., for FREE using AutoTrain locally. 🔥
https://github.com/huggingface/autotrain-advanced

julien-c

posted an update about 1 month ago

Post

2339

wow 😮

INTELLECT-1 is the first collaboratively trained 10 billion parameter language model trained from scratch on 1 trillion tokens of English text and code.

PrimeIntellect/INTELLECT-1-Instruct

fede97

authored a paper about 1 month ago

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Paper • 2411.16863 • Published Nov 25, 2024

kcz358

authored a paper about 1 month ago

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Paper • 2411.14982 • Published Nov 22, 2024 • 16

monsoon-nlp

posted an update about 1 month ago

Post

1429

Great to see Tatta Bio release an embeddings version of their DNA/protein language model 🧬: tattabio/gLM2_650M_embed

4 replies

louisbrulenaudet

posted an update about 2 months ago

Post

1785

I’ve published a new dataset to simplify model merging 🤗

This dataset facilitates the search for compatible architectures for model merging with @arcee_ai’s mergekit, streamlining the automation of high-performance merge searches 📖

Dataset: louisbrulenaudet/mergekit-configs

1 reply

abhishek

posted an update about 2 months ago

Post

5505

INTRODUCING Hugging Face AutoTrain Client 🔥
Fine-tuning models got even easier!!!!
Now you can fine-tune SOTA models on all compatible dataset-model pairs on the Hugging Face Hub using Python on Hugging Face servers. Choose from a number of GPU flavors, millions of model and dataset pairs, and 10+ tasks 🤗

To try it, install autotrain-advanced using pip. You can skip the dependencies by installing with --no-deps, but then you'd need to install some dependencies by hand.

pip install autotrain-advanced

GitHub repo: https://github.com/huggingface/autotrain-advanced

6 replies

Aurelien-Morgan

posted an update 2 months ago

Post

477

I just shipped retrain-pipelines 0.1.1 today. The documentation is also much improved compared to the previous release, which was clearly not mature.
I'll have to focus on another project for the next couple of weeks, but anyone should feel free to open issues on the GitHub repo and discuss any interest they have there (please?)!
In the meantime, you may enjoy retrying this:
https://huggingface.co/blog/Aurelien-Morgan/stateful-metaflow-on-colab

louisbrulenaudet

posted an update 2 months ago

Post

1174

Introducing Lemone-router, a series of classification models designed to produce an optimal multi-agent system for different branches of tax law.

These models were trained on a base of 49k lines comprising synthetic questions generated by GPT-4 Turbo and Llama 3.1 70B (further refined through evol-instruction tuning and manual curation) and authority documents. They are based on an 8-category decomposition of the classification scheme derived from the Bulletin officiel des finances publiques - impôts:

label2id = {
    "Bénéfices professionnels": 0,
    "Contrôle et contentieux": 1,
    "Dispositifs transversaux": 2,
    "Fiscalité des entreprises": 3,
    "Patrimoine et enregistrement": 4,
    "Revenus particuliers": 5,
    "Revenus patrimoniaux": 6,
    "Taxes sur la consommation": 7
}
	
id2label = {
    0: "Bénéfices professionnels",
    1: "Contrôle et contentieux",
    2: "Dispositifs transversaux",
    3: "Fiscalité des entreprises",
    4: "Patrimoine et enregistrement",
    5: "Revenus particuliers",
    6: "Revenus patrimoniaux",
    7: "Taxes sur la consommation"
}

It achieves the following results on the evaluation set:
- Loss: 0.4734
- Accuracy: 0.9191

Link to the collection: louisbrulenaudet/lemone-router-671cce21d6410f3570514762
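
The two mappings listed in this post are inverses of each other, so one can be derived from the other. The sketch below shows how a vector of per-class scores could be routed to a category name; the `route` helper and the example scores are hypothetical and are not taken from the Lemone-router models themselves.

```python
from typing import Sequence

label2id = {
    "Bénéfices professionnels": 0,
    "Contrôle et contentieux": 1,
    "Dispositifs transversaux": 2,
    "Fiscalité des entreprises": 3,
    "Patrimoine et enregistrement": 4,
    "Revenus particuliers": 5,
    "Revenus patrimoniaux": 6,
    "Taxes sur la consommation": 7,
}

# Derive the inverse mapping instead of maintaining it by hand.
id2label = {idx: label for label, idx in label2id.items()}


def route(scores: Sequence[float]) -> str:
    """Return the category whose class index has the highest score."""
    if len(scores) != len(id2label):
        raise ValueError(f"expected {len(id2label)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=scores.__getitem__)
    return id2label[best]


# Hypothetical classifier output for one question (not real model scores).
scores = [0.01, 0.02, 0.05, 0.80, 0.03, 0.04, 0.03, 0.02]
print(route(scores))  # prints "Fiscalité des entreprises"
```

Deriving `id2label` from `label2id` keeps the two dictionaries from drifting apart if a category is ever renamed.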

abhishek

authored a paper 2 months ago

AutoTrain: No-code training for state-of-the-art models

Paper • 2410.15735 • Published Oct 21, 2024 • 59

abhishek

posted an update 2 months ago

Post

4389

AutoTrain: No-code training for state-of-the-art models (2410.15735)

celinah

posted an update 3 months ago

Post

1110

📣 huggingface_hub v0.26.0 is out with some new features and improvements!

✨ Top Highlights:
- 🔐 Multiple access tokens support: Easily manage multiple access tokens with new CLI commands. Perfect for handling multiple tokens with specific permissions in production or when collaborating with external teams.
- 🖼️ Conversational VLMs inference is now supported with InferenceClient's chat completion!
- 📄 Daily Papers API: Seamlessly search and retrieve detailed paper information from the Hub!

We’ve also introduced multiple bug fixes and quality-of-life improvements - thanks to the awesome contributions from our community! 🤗

Check out the release notes here: Wauplin/huggingface_hub#9

and you can try it out now 👇

pip install huggingface_hub==0.26.0

Team members 670
