kiran

kira · ki6an

AI & ML interests

agi

Recent Activity

new activity 2 days ago

MiniMaxAI/MiniMax-Text-01:Request: Add vLLM Support for This Model

liked a model 11 days ago

microsoft/phi-4

liked a dataset 11 days ago

PowerInfer/QWQ-LONGCOT-500K

kira's activity

upvoted 2 collections 2 months ago

xLAM models

xLAM: A Family of Large Action Models to Empower AI Agent Systems: https://github.com/SalesforceAIResearch/xLAM • 11 items • Updated about 1 month ago • 45

Qwen2.5-Coder

Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024 • 261

upvoted a collection 3 months ago

SmolLM2

State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated 28 days ago • 204

upvoted 2 collections 6 months ago

Mini Pretrain Datasets

9 items • Updated Jul 9, 2024 • 9

Useful Pretrain-Datasets

pretrain-datasets with (maybe) good quality • 20 items • Updated Jun 12, 2024 • 1

upvoted a collection 8 months ago

Yi-1.5 (2024/05)

10 items • Updated May 20, 2024 • 92

upvoted a collection 9 months ago

GPT-4 generated datasets

Collection of some GPT-4 generated datasets. It may be useful for those looking for the best-quality datasets to train competitive LLMs. • 18 items • Updated Apr 16, 2024 • 9

upvoted a paper 9 months ago

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12, 2024 • 65

upvoted 4 papers about 1 year ago

Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16, 2024 • 21

Extending LLMs' Context Window with 100 Samples

Paper • 2401.07004 • Published Jan 13, 2024 • 15

Scalable Pre-training of Large Autoregressive Image Models

Paper • 2401.08541 • Published Jan 16, 2024 • 36

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Paper • 2401.06951 • Published Jan 13, 2024 • 25

upvoted a collection about 1 year ago

Papers about model merging

referenced in the mergekit repo: https://github.com/cg123/mergekit • 4 items • Updated Feb 13, 2024 • 14

upvoted a paper about 1 year ago

CogVLM: Visual Expert for Pretrained Language Models

Paper • 2311.03079 • Published Nov 6, 2023 • 23

upvoted 5 papers over 1 year ago

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Paper • 2309.14509 • Published Sep 25, 2023 • 17

One Wide Feedforward is All You Need

Paper • 2309.01826 • Published Sep 4, 2023 • 31

SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference

Paper • 2307.02628 • Published Jul 5, 2023 • 10

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

Paper • 2306.17107 • Published Jun 29, 2023 • 11

Extending Context Window of Large Language Models via Positional Interpolation

Paper • 2306.15595 • Published Jun 27, 2023 • 53