We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute!
How? By combining step-wise reward models with tree search algorithms :)
We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"
We're open sourcing the full recipe and sharing a detailed blog post.
In our blog post we cover:
Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.
Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to verifier-guided tree search. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets (see the sketch after this list).
Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.
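To make the verifier-guided selection concrete, here is a minimal sketch of best-of-N selection with a step-wise reward model. The `generate` and `score_step` callables are hypothetical stand-ins rather than the Search and Learn API; in the real recipe, completions come from vLLM and scores from a trained PRM.

```python
# Minimal sketch of verifier-guided best-of-N selection with a step-wise
# reward model (PRM). `generate` and `score_step` are hypothetical stand-ins,
# not the Search and Learn API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    steps: List[str]  # the solution split into reasoning steps
    answer: str       # the final answer extracted from the completion


def best_of_n(
    problem: str,
    generate: Callable[[str, int], List[Candidate]],  # samples n completions
    score_step: Callable[[str, List[str]], float],    # PRM score for a step prefix
    n: int = 16,
) -> Candidate:
    """Sample n candidates and keep the one the PRM likes best."""
    candidates = generate(problem, n)

    def prm_score(c: Candidate) -> float:
        # Score every step prefix and aggregate; taking the minimum (the
        # weakest step) is one common aggregation choice.
        prefix_scores = [
            score_step(problem, c.steps[: i + 1]) for i in range(len(c.steps))
        ]
        return min(prefix_scores) if prefix_scores else float("-inf")

    return max(candidates, key=prm_score)
```

Beam search and DVTS build on the same scoring idea but expand and prune partial solutions step by step instead of only ranking complete ones; DVTS additionally splits the search into independent subtrees so the candidates stay diverse.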
Remember scaling laws? These are empirical laws that say "the bigger your model, the better it gets". More precisely, "as your training compute increases exponentially, loss decreases linearly". They have wild implications, suggesting that spending 100x more training compute would get you super-LLMs. That's why companies are racing to build the biggest AI superclusters ever, and Meta bought 350k H100 GPUs, which probably cost on the order of $10B.
But think about it: we build huge reasoning machines, yet we ask them for only one pass through the model per token of the final answer, i.e. we spend minimal effort on inference. That's like building a Caterpillar truck and running it on a lawnmower engine. Couldn't we optimize this?
So instead of scaling up training by training even bigger models on many more trillions of tokens, Google researchers explored an under-explored avenue: scaling up inference compute.
They use two methods to spend more inference compute: either a reviser model that iteratively refines the answer, adapting the model's output distribution, or generating N different completions (for instance through beam search) and selecting only the best one with an additional verifier model.
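To illustrate the reviser side, here is a hypothetical sketch of such an iterative refinement loop; `model` is a stand-in for any prompt-to-completion function, not the paper's actual PaLM 2 setup or prompt format.

```python
# Hypothetical sketch of the "reviser" idea: the model repeatedly rewrites its
# own previous attempt, shifting its output distribution toward better answers.
from typing import Callable


def revise(problem: str, model: Callable[[str], str], num_revisions: int = 4) -> str:
    answer = model(f"Solve the following problem step by step:\n{problem}")
    for _ in range(num_revisions):
        prompt = (
            f"Problem:\n{problem}\n\n"
            f"Previous attempt:\n{answer}\n\n"
            "Revise the attempt above, fixing any mistakes, and give a final answer."
        )
        answer = model(prompt)
    return answer
```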
They use a PaLM 2 model (released in May 2023) on the MATH dataset: PaLM 2 has the advantage of scoring low on MATH, but not zero, so improvements are clearly noticeable.
And the results show that, for the same fixed amount of inference compute, a smaller model with more effort spent on decoding beats a 14x bigger model using naive greedy decoding.
That means that you can divide your training costs by 14 and still get the same perf for the same inference cost!
Take that, scaling laws. Mark Zuckerberg, you're welcome, hope I can get some of these H100s.
This model is part of the innovative HelpingAI series and it stands out for its ability to engage users with emotional understanding.
Key Features:
* It scores 95.89 on EQ-Bench, higher than all top-notch LLMs, reflecting advanced emotional recognition.
* It gives responses in an empathetic and supportive manner.
Gemini 1.5 Flash now supports fine-tuning, and inference on a tuned model costs the same as the base model! <coughs: LoRA adapters>
So the base model must be expensive? No: for the base model, the input price is reduced by 78% to $0.075 per million tokens and the output price by 71% to $0.30 per million tokens.
But is it any good? On the LLM Hallucination Index, Gemini 1.5 Flash achieved great context adherence scores of 0.94, 1.0, and 0.92 across short, medium, and long contexts.
Google has finally delivered a model that is free to tune and offers an excellent balance between performance and cost.
A new state-of-the-art model for background removal is out. You can find the model at ZhengPeng7/BiRefNet; it shows impressive results, outperforming briaai/RMBG-1.4. You can try it out in the demo: ZhengPeng7/BiRefNet_demo.
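For reference, here is roughly how you can run it locally through transformers. This is a sketch based on the model card's custom-code loading pattern; the exact preprocessing and output indexing are assumptions worth double-checking there.

```python
# Rough sketch of background removal with BiRefNet via transformers.
# The 1024x1024 resize, ImageNet normalization, and taking the last output as
# the final mask follow the model card, but treat them as assumptions.
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModelForImageSegmentation

model = AutoModelForImageSegmentation.from_pretrained(
    "ZhengPeng7/BiRefNet", trust_remote_code=True
).eval()

preprocess = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image = Image.open("input.jpg").convert("RGB")  # hypothetical input path
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    mask = model(batch)[-1].sigmoid().squeeze().cpu()  # final foreground mask

# Resize the mask back to the original resolution and use it as an alpha channel.
alpha = transforms.ToPILImage()(mask).resize(image.size)
image.putalpha(alpha)
image.save("output.png")
```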
PyTorch implementation of the self-compression and differentiable quantization algorithm introduced in the "Self-Compressing Neural Networks" paper.
The algorithm performs dynamic neural network compression during training, reducing the size of the weight and activation tensors and the number of bits required to represent the weights.
It basically shrinks the network (weights and activations) as it is being trained, without compromising performance, which helps reduce compute and inference cost.
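For a flavor of the mechanism, here is a minimal PyTorch sketch of the core idea (an illustration, not the implementation above): weights are quantized as q(w, b, e) = 2^e * clip(round(w / 2^e), -2^(b-1), 2^(b-1) - 1), with the bit depth b and exponent e learned alongside the weights via a straight-through estimator, and the average bit count added to the loss as a size penalty. The per-output-channel granularity and initial values are assumptions.

```python
# Minimal sketch of self-compression via differentiable quantization (not the
# linked implementation). Bit depth `bits` and exponent `exp` are learned per
# output channel (an assumed granularity); rounding uses a straight-through
# estimator so gradients reach the weights, bits, and exponents.
import torch
import torch.nn as nn


def quantize(w: torch.Tensor, bits: torch.Tensor, exp: torch.Tensor) -> torch.Tensor:
    """q(w, b, e) = 2^e * clip(round(w / 2^e), -2^(b-1), 2^(b-1) - 1)."""
    scale = torch.exp2(exp)
    scaled = w / scale
    rounded = scaled + (scaled.round() - scaled).detach()  # straight-through round
    lo, hi = -torch.exp2(bits - 1), torch.exp2(bits - 1) - 1
    return scale * torch.maximum(torch.minimum(rounded, hi), lo)


class SelfCompressingLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bits = nn.Parameter(torch.full((out_features, 1), 8.0))  # learned bit depth
        self.exp = nn.Parameter(torch.full((out_features, 1), -4.0))  # learned exponent
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q_weight = quantize(self.weight, torch.relu(self.bits), self.exp)
        return x @ q_weight.t() + self.bias

    def size_penalty(self) -> torch.Tensor:
        # Average bits per weight; the optimizer is rewarded for driving bit
        # depths (and thus model size) down.
        return torch.relu(self.bits).mean()
```

During training, the total loss would then be something like the task loss plus gamma times the mean of the per-layer size penalties, where gamma trades accuracy against compression.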