vladbogo's Collections
Papers to read
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Paper
•
2402.08714
•
Published
•
11
Data Engineering for Scaling Language Models to 128K Context
Paper
•
2402.10171
•
Published
•
23
RLVF: Learning from Verbal Feedback without Overgeneralization
Paper
•
2402.10893
•
Published
•
10
Coercing LLMs to do and reveal (almost) anything
Paper
•
2402.14020
•
Published
•
12
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper
•
2402.14658
•
Published
•
82
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper
•
2402.14289
•
Published
•
19
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper
•
2402.13753
•
Published
•
114
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Paper
•
2402.16822
•
Published
•
15
Beyond Language Models: Byte Models are Digital World Simulators
Paper
•
2402.19155
•
Published
•
49
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
Paper
•
2403.02677
•
Published
•
16
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
Paper
•
2402.11753
•
Published
•
5
How Far Are We from Intelligent Visual Deductive Reasoning?
Paper
•
2403.04732
•
Published
•
19
Evaluating and Mitigating Discrimination in Language Model Decisions
Paper
•
2312.03689
•
Published
•
1
How predictable is language model benchmark performance?
Paper
•
2401.04757
•
Published
•
2
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits
Paper
•
2305.02547
•
Published
•
7
Is Cosine-Similarity of Embeddings Really About Similarity?
Paper
•
2403.05440
•
Published
•
3
Multistep Consistency Models
Paper
•
2403.06807
•
Published
•
14
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History
Paper
•
2402.18216
•
Published
•
1
V3D: Video Diffusion Models are Effective 3D Generators
Paper
•
2403.06738
•
Published
•
28
Paper
•
2401.04088
•
Published
•
158
RAFT: Adapting Language Model to Domain Specific RAG
Paper
•
2403.10131
•
Published
•
67
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
Paper
•
2403.16990
•
Published
•
25
Can large language models explore in-context?
Paper
•
2403.15371
•
Published
•
32
DreamLIP: Language-Image Pre-training with Long Captions
Paper
•
2403.17007
•
Published
•
1
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models
Paper
•
2403.20331
•
Published
•
14
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper
•
2404.01331
•
Published
•
25
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper
•
2404.02905
•
Published
•
65
Stream of Search (SoS): Learning to Search in Language
Paper
•
2404.03683
•
Published
•
29
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Paper
•
2404.05726
•
Published
•
20
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper
•
2404.07143
•
Published
•
104
Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations
Paper
•
2404.07153
•
Published
•
1
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Paper
•
2404.07987
•
Published
•
47
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper
•
2404.07973
•
Published
•
30
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Paper
•
2404.09967
•
Published
•
20
MeshLRM: Large Reconstruction Model for High-Quality Mesh
Paper
•
2404.12385
•
Published
•
26
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Paper
•
2404.12803
•
Published
•
29
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Paper
•
2404.13013
•
Published
•
30
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Paper
•
2404.14396
•
Published
•
18
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper
•
2404.14047
•
Published
•
44
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper
•
2404.14507
•
Published
•
21
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
Paper
•
2404.19427
•
Published
•
71
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper
•
2405.00732
•
Published
•
118
Corrective Retrieval Augmented Generation
Paper
•
2401.15884
•
Published
•
3
Observational Scaling Laws and the Predictability of Language Model Performance
Paper
•
2405.10938
•
Published
•
11
Your Transformer is Secretly Linear
Paper
•
2405.12250
•
Published
•
149
Diffusion for World Modeling: Visual Details Matter in Atari
Paper
•
2405.12399
•
Published
•
28
LANISTR: Multimodal Learning from Structured and Unstructured Data
Paper
•
2305.16556
•
Published
•
2
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Paper
•
2405.11273
•
Published
•
17
Not All Language Model Features Are Linear
Paper
•
2405.14860
•
Published
•
39
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Paper
•
2405.15319
•
Published
•
25
Paper
•
2405.18407
•
Published
•
46
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Paper
•
2405.20340
•
Published
•
19
Xwin-LM: Strong and Scalable Alignment Practice for LLMs
Paper
•
2405.20335
•
Published
•
17
LLMs achieve adult human performance on higher-order theory of mind tasks
Paper
•
2405.18870
•
Published
•
17
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Paper
•
2406.00888
•
Published
•
30
Guiding a Diffusion Model with a Bad Version of Itself
Paper
•
2406.02507
•
Published
•
15
Self-Improving Robust Preference Optimization
Paper
•
2406.01660
•
Published
•
18
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM
Paper
•
2406.02884
•
Published
•
15
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Paper
•
2406.02657
•
Published
•
37
Proofread: Fixes All Errors with One Tap
Paper
•
2406.04523
•
Published
•
12
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Paper
•
2406.07476
•
Published
•
32
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Paper
•
2406.08407
•
Published
•
24
Large Language Model Unlearning via Embedding-Corrupted Prompts
Paper
•
2406.07933
•
Published
•
7
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper
•
2406.09415
•
Published
•
50
TextGrad: Automatic "Differentiation" via Text
Paper
•
2406.07496
•
Published
•
27
Make It Count: Text-to-Image Generation with an Accurate Number of Objects
Paper
•
2406.10210
•
Published
•
76
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Paper
•
2406.08973
•
Published
•
86
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
Paper
•
2406.11833
•
Published
•
61
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
Paper
•
2406.11839
•
Published
•
37
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper
•
2406.11813
•
Published
•
30
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper
•
2406.08464
•
Published
•
65
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation
Paper
•
2406.12849
•
Published
•
49
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Paper
•
2406.12034
•
Published
•
14
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing
Paper
•
2406.10601
•
Published
•
65
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems
Paper
•
2406.14972
•
Published
•
7
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
Paper
•
2406.13457
•
Published
•
16
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers
Paper
•
2406.12430
•
Published
•
7
Weight subcloning: direct initialization of transformers using larger pretrained ones
Paper
•
2312.09299
•
Published
•
17
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
Paper
•
2406.17419
•
Published
•
17
Large Language Models Assume People are More Rational than We Really are
Paper
•
2406.17055
•
Published
•
4
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Paper
•
2406.16855
•
Published
•
54
Video-Infinity: Distributed Long Video Generation
Paper
•
2406.16260
•
Published
•
28
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models
Paper
•
2406.16338
•
Published
•
25
Adam-mini: Use Fewer Learning Rates To Gain More
Paper
•
2406.16793
•
Published
•
67
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Paper
•
2406.19389
•
Published
•
52
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Paper
•
2406.20095
•
Published
•
17
MagMax: Leveraging Model Merging for Seamless Continual Learning
Paper
•
2407.06322
•
Published
•
1
A Single Transformer for Scalable Vision-Language Modeling
Paper
•
2407.06438
•
Published
•
1
Video Diffusion Alignment via Reward Gradients
Paper
•
2407.08737
•
Published
•
48
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
Paper
•
2407.09025
•
Published
•
130
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Paper
•
2407.09121
•
Published
•
5
GAVEL: Generating Games Via Evolution and Language Models
Paper
•
2407.09388
•
Published
•
15
Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data
Paper
•
2407.08726
•
Published
•
8
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Paper
•
2407.08303
•
Published
•
17
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
Paper
•
2407.11963
•
Published
•
43
How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?
Paper
•
2402.07282
•
Published
•
1
Fewer Truncations Improve Language Modeling
Paper
•
2404.10830
•
Published
•
3
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts
Paper
•
2407.00256
•
Published
•
1
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Paper
•
2403.00409
•
Published
•
1
Efficient Exploration for LLMs
Paper
•
2402.00396
•
Published
•
21
Text2SQL is Not Enough: Unifying AI and Databases with TAG
Paper
•
2408.14717
•
Published
•
24
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Paper
•
2409.04593
•
Published
•
23