sometimesanotion PRO

AI & ML interests

Agentic LLM services, model merging, finetunes, distillation

Recent Activity

liked a model about 17 hours ago: qingy2024/UwU-7B-Instruct

Organizations

Hugging Face Discord Community

sometimesanotion's activity

reacted to prithivMLmods's post with 🚀🔥 about 17 hours ago
Reasoning SmolLM2 🚀

🎯 Fine-tuning SmolLM2 on a lightweight synthetic reasoning dataset for reasoning-specific tasks. Future updates will focus on lightweight, blazing-fast reasoning models. Until then, check out the blog for fine-tuning details.

🔥 Blog : https://huggingface.co/blog/prithivMLmods/smollm2-ft

🔼 Models :
+ SmolLM2-CoT-360M : prithivMLmods/SmolLM2-CoT-360M
+ Reasoning-SmolLM2-135M : prithivMLmods/Reasoning-SmolLM2-135M
+ SmolLM2-CoT-360M-GGUF : prithivMLmods/SmolLM2-CoT-360M-GGUF

🤠 Other Details :
+ Demo : prithivMLmods/SmolLM2-CoT-360M
+ Fine-tune notebook : prithivMLmods/SmolLM2-CoT-360M
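
As a quick way to try the released checkpoint, here is a minimal sketch using the standard transformers API; it assumes the SmolLM2-CoT-360M repo ships a chat template, and the prompt and generation settings are illustrative rather than the blog's recipe:

```python
# Minimal sketch: run the released SmolLM2-CoT-360M checkpoint on a reasoning prompt.
# Assumes the repo provides a chat template; prompt and settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/SmolLM2-CoT-360M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user",
             "content": "A train covers 60 km in 45 minutes. What is its average speed in km/h?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                          return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```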

New activity in mradermacher/Lamarck-14B-v0.6-GGUF about 19 hours ago

Thank you for this!

#1 opened about 19 hours ago by sometimesanotion
New activity in bamec66557/Qwen-2.5-14B-MINUS 1 day ago
reacted to singhsidhukuldeep's post with 👍 2 days ago
Groundbreaking Research Alert: Rethinking RAG with Cache-Augmented Generation (CAG)

Researchers from National Chengchi University and Academia Sinica have introduced a paradigm-shifting approach that challenges the conventional wisdom of Retrieval-Augmented Generation (RAG).

Instead of the traditional retrieve-then-generate pipeline, their innovative Cache-Augmented Generation (CAG) framework preloads documents and precomputes key-value caches, eliminating the need for real-time retrieval during inference.

Technical Deep Dive:
- CAG preloads external knowledge and precomputes KV caches, storing them for future use
- The system processes documents only once, regardless of subsequent query volume
- During inference, it loads the precomputed cache alongside user queries, enabling rapid response generation
- The cache reset mechanism allows efficient handling of multiple inference sessions through strategic token truncation (see the sketch below)
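
A minimal sketch of that preload-and-truncate loop, assuming a recent transformers release with DynamicCache and cache reuse in generate(); the model name is a placeholder and this is not the authors' implementation:

```python
# Sketch of the CAG idea described above (not the paper's code).
# Precompute the KV cache over the documents once, reuse it per query,
# and "reset" by cropping back to the document-only prefix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.cache_utils import DynamicCache

model_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

knowledge = "(preloaded document collection, concatenated into one long context)"
doc_ids = tokenizer(knowledge, return_tensors="pt").input_ids

# 1) Process the documents once, regardless of how many queries follow.
doc_cache = DynamicCache()
with torch.no_grad():
    model(doc_ids, past_key_values=doc_cache, use_cache=True)
doc_len = doc_cache.get_seq_length()

def answer(query: str) -> str:
    # 2) Load the precomputed cache alongside the query; only new tokens are processed.
    query_ids = tokenizer(query, return_tensors="pt").input_ids
    input_ids = torch.cat([doc_ids, query_ids], dim=-1)
    out = model.generate(input_ids, past_key_values=doc_cache, max_new_tokens=128)
    reply = tokenizer.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True)
    # 3) Cache reset: truncate back to the document-only prefix for the next query.
    doc_cache.crop(doc_len)
    return reply
```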

Performance Highlights:
- Achieved superior BERTScore metrics compared to both sparse and dense retrieval RAG systems
- Demonstrated up to 40x faster generation times compared to traditional approaches
- Particularly effective with both SQuAD and HotPotQA datasets, showing robust performance across different knowledge tasks

Why This Matters:
The approach significantly reduces system complexity, eliminates retrieval latency, and mitigates common RAG pipeline errors. As LLMs continue evolving with expanded context windows, this methodology becomes increasingly relevant for knowledge-intensive applications.

14B model detected as 7B

#1049 opened 15 days ago by djuna