TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Paper • 2410.23168 • Published Oct 30, 2024
nGPT: Normalized Transformer with Representation Learning on the Hypersphere Paper • 2410.01131 • Published Oct 1, 2024
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling Paper • 2410.07145 • Published Oct 9, 2024
Round and Round We Go! What makes Rotary Positional Encodings useful? Paper • 2410.06205 • Published Oct 8, 2024
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published Oct 1, 2024
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27, 2024
KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2, 2024
Planning In Natural Language Improves LLM Search For Code Generation Paper • 2409.03733 • Published Sep 5, 2024
FocusLLM: Scaling LLM's Context by Parallel Decoding Paper • 2408.11745 • Published Aug 21, 2024
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale Paper • 2408.12570 • Published Aug 22, 2024
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21, 2024