Fangyuan Yu PRO

Ksgk-fy

fangyuan-ksgk

AI & ML interests

AGI

Recent Activity

upvoted a paper 3 days ago

LTX-Video: Realtime Video Latent Diffusion

updated a collection 3 days ago

Cognition

updated a collection 5 days ago

Cognition

Ksgk-fy's activity

commented 4 papers 2 months ago

commented 9 papers 3 months ago

Autoregressive Large Language Models are Computationally Universal

Paper • 2410.03170 • Published Oct 4, 2024 • 1

Agent-as-a-Judge: Evaluate Agents with Agents

Paper • 2410.10934 • Published Oct 14, 2024 • 18

Emergent properties with repeated examples

Paper • 2410.07041 • Published Oct 9, 2024 • 8

FAN: Fourier Analysis Networks

Paper • 2410.02675 • Published Oct 3, 2024 • 25

Selective Attention Improves Transformer

Paper • 2410.02703 • Published Oct 3, 2024 • 23

Intelligence at the Edge of Chaos

Paper • 2410.02536 • Published Oct 3, 2024 • 6

Can Models Learn Skill Composition from Examples?

Paper • 2409.19808 • Published Sep 29, 2024 • 9

DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control

Paper • 2409.12192 • Published Sep 18, 2024 • 4

commented 2 papers 4 months ago

Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization

Paper • 2409.12903 • Published Sep 19, 2024 • 22

Iterative Graph Alignment

Paper • 2408.16667 • Published Aug 29, 2024 • 2

New activity in meta-llama/Llama-3.1-8B-Instruct 6 months ago

Tokenizer 'apply_chat_template' issue

#42 opened 6 months ago by

Ksgk-fy

what is the right tokenizer should I use for llama 3.1 8B?

#19 opened 6 months ago by

calebl

New activity in qresearch/llama-3.1-8B-vision-378 6 months ago

Great model!

#1 opened 6 months ago by

Ksgk-fy

New activity in neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit 8 months ago

Inference GPU Ram requirement >60GB

#1 opened 8 months ago by

Ksgk-fy

commented a paper 9 months ago

The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26, 2024 • 78