sometimesanotion PRO

AI & ML interests

Agentic LLM services, model merging, finetunes, distillation

Recent Activity

liked a model about 17 hours ago: qingy2024/UwU-7B-Instruct

Organizations

Hugging Face Discord Community

sometimesanotion's activity

reacted to prithivMLmods's post with 🚀🔥 about 17 hours ago
Reasoning SmolLM2 🚀

🎯 Fine-tuning SmolLM2 on a lightweight synthetic reasoning dataset for reasoning-specific tasks. Future updates will focus on lightweight, blazing-fast reasoning models. Until then, check out the blog for fine-tuning details.

🔥 Blog : https://huggingface.co/blog/prithivMLmods/smollm2-ft

🔼 Models :
+ SmolLM2-CoT-360M : prithivMLmods/SmolLM2-CoT-360M
+ Reasoning-SmolLM2-135M : prithivMLmods/Reasoning-SmolLM2-135M
+ SmolLM2-CoT-360M-GGUF : prithivMLmods/SmolLM2-CoT-360M-GGUF

🤠 Other Details :
+ Demo : prithivMLmods/SmolLM2-CoT-360M
+ Fine-tune notebook : prithivMLmods/SmolLM2-CoT-360M
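
As a quick way to try the released checkpoint, here is a minimal sketch using the standard transformers API; it assumes the SmolLM2-CoT-360M repo ships a chat template, and the prompt and generation settings are illustrative rather than the blog's recipe:

```python
# Minimal sketch: run the released SmolLM2-CoT-360M checkpoint on a reasoning prompt.
# Assumes the repo provides a chat template; prompt and settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/SmolLM2-CoT-360M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user",
             "content": "A train covers 60 km in 45 minutes. What is its average speed in km/h?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                          return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```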

New activity in mradermacher/Lamarck-14B-v0.6-GGUF about 19 hours ago

Thank you for this!

#1 opened about 19 hours ago by sometimesanotion
New activity in bamec66557/Qwen-2.5-14B-MINUS 1 day ago
reacted to singhsidhukuldeep's post with 👍 2 days ago
Groundbreaking Research Alert: Rethinking RAG with Cache-Augmented Generation (CAG)

Researchers from National Chengchi University and Academia Sinica have introduced a paradigm-shifting approach that challenges the conventional wisdom of Retrieval-Augmented Generation (RAG).

Instead of the traditional retrieve-then-generate pipeline, their innovative Cache-Augmented Generation (CAG) framework preloads documents and precomputes key-value caches, eliminating the need for real-time retrieval during inference.

Technical Deep Dive:
- CAG preloads external knowledge and precomputes KV caches, storing them for future use
- The system processes documents only once, regardless of subsequent query volume
- During inference, it loads the precomputed cache alongside user queries, enabling rapid response generation
- The cache reset mechanism allows efficient handling of multiple inference sessions through strategic token truncation (see the sketch below)
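
A minimal sketch of that preload-and-truncate loop, assuming a recent transformers release with DynamicCache and cache reuse in generate(); the model name is a placeholder and this is not the authors' implementation:

```python
# Sketch of the CAG idea described above (not the paper's code).
# Precompute the KV cache over the documents once, reuse it per query,
# and "reset" by cropping back to the document-only prefix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.cache_utils import DynamicCache

model_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

knowledge = "(preloaded document collection, concatenated into one long context)"
doc_ids = tokenizer(knowledge, return_tensors="pt").input_ids

# 1) Process the documents once, regardless of how many queries follow.
doc_cache = DynamicCache()
with torch.no_grad():
    model(doc_ids, past_key_values=doc_cache, use_cache=True)
doc_len = doc_cache.get_seq_length()

def answer(query: str) -> str:
    # 2) Load the precomputed cache alongside the query; only new tokens are processed.
    query_ids = tokenizer(query, return_tensors="pt").input_ids
    input_ids = torch.cat([doc_ids, query_ids], dim=-1)
    out = model.generate(input_ids, past_key_values=doc_cache, max_new_tokens=128)
    reply = tokenizer.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True)
    # 3) Cache reset: truncate back to the document-only prefix for the next query.
    doc_cache.crop(doc_len)
    return reply
```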

Performance Highlights:
- Achieved superior BERTScore metrics compared to both sparse and dense retrieval RAG systems
- Demonstrated up to 40x faster generation times compared to traditional approaches
- Particularly effective with both SQuAD and HotPotQA datasets, showing robust performance across different knowledge tasks

Why This Matters:
The approach significantly reduces system complexity, eliminates retrieval latency, and mitigates common RAG pipeline errors. As LLMs continue evolving with expanded context windows, this methodology becomes increasingly relevant for knowledge-intensive applications.

14B model detected as 7B

#1049 opened 15 days ago by djuna