Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 63
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25, 2024 • 40
Adaptive Decoding via Latent Preference Optimization Paper • 2411.09661 • Published Nov 14, 2024 • 10
Article Fine-tuning LLMs with Singular Value Decomposition By fractalego • Jun 2, 2024 • 8
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse Paper • 2410.21333 • Published Oct 27, 2024 • 10
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation Paper • 2410.18565 • Published Oct 24, 2024 • 44
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published Sep 30, 2024 • 53
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Paper • 2408.16725 • Published Aug 29, 2024 • 52
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27, 2024 • 37
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27, 2024 • 138
Article Using Writer Framework with Hugging Face Spaces By samjulien • Aug 20, 2024 • 30
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20, 2024 • 86
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published Jun 4, 2024 • 37