Papers to read - a vladbogo Collection

Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

vladbogo 's Collections

AI Paper of the Day

LLMs

Vision

Papers to read

updated Sep 10, 2024

PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models

Paper • 2402.08714 • Published Feb 13, 2024 • 11
Data Engineering for Scaling Language Models to 128K Context

Paper • 2402.10171 • Published Feb 15, 2024 • 23
RLVF: Learning from Verbal Feedback without Overgeneralization

Paper • 2402.10893 • Published Feb 16, 2024 • 10
Coercing LLMs to do and reveal (almost) anything

Paper • 2402.14020 • Published Feb 21, 2024 • 12
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

Paper • 2402.14658 • Published Feb 22, 2024 • 82
TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Paper • 2402.14289 • Published Feb 22, 2024 • 19
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21, 2024 • 114
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Paper • 2402.16822 • Published Feb 26, 2024 • 15
Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 49
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

Paper • 2403.02677 • Published Mar 5, 2024 • 16
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

Paper • 2402.11753 • Published Feb 19, 2024 • 5
How Far Are We from Intelligent Visual Deductive Reasoning?

Paper • 2403.04732 • Published Mar 7, 2024 • 19
Evaluating and Mitigating Discrimination in Language Model Decisions

Paper • 2312.03689 • Published Dec 6, 2023 • 1
How predictable is language model benchmark performance?

Paper • 2401.04757 • Published Jan 9, 2024 • 2
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

Paper • 2305.02547 • Published May 4, 2023 • 7
Is Cosine-Similarity of Embeddings Really About Similarity?

Paper • 2403.05440 • Published Mar 8, 2024 • 3
Multistep Consistency Models

Paper • 2403.06807 • Published Mar 11, 2024 • 14
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History

Paper • 2402.18216 • Published Feb 28, 2024 • 1
V3D: Video Diffusion Models are Effective 3D Generators

Paper • 2403.06738 • Published Mar 11, 2024 • 28
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 158
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15, 2024 • 67
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Paper • 2403.16990 • Published Mar 25, 2024 • 25
Can large language models explore in-context?

Paper • 2403.15371 • Published Mar 22, 2024 • 32
DreamLIP: Language-Image Pre-training with Long Captions

Paper • 2403.17007 • Published Mar 25, 2024 • 1
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

Paper • 2403.20331 • Published Mar 29, 2024 • 14
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Paper • 2404.01331 • Published Mar 29, 2024 • 25
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3, 2024 • 65
Stream of Search (SoS): Learning to Search in Language

Paper • 2404.03683 • Published Apr 1, 2024 • 29
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Paper • 2404.05726 • Published Apr 8, 2024 • 20
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10, 2024 • 104
Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations

Paper • 2404.07153 • Published Apr 10, 2024 • 1
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11, 2024 • 47
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11, 2024 • 30
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Paper • 2404.09967 • Published Apr 15, 2024 • 20
MeshLRM: Large Reconstruction Model for High-Quality Mesh

Paper • 2404.12385 • Published Apr 18, 2024 • 26
TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Paper • 2404.12803 • Published Apr 19, 2024 • 29
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Paper • 2404.13013 • Published Apr 19, 2024 • 30
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Paper • 2404.14396 • Published Apr 22, 2024 • 18
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published Apr 22, 2024 • 44
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Paper • 2404.14507 • Published Apr 22, 2024 • 21
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Paper • 2404.19427 • Published Apr 30, 2024 • 71
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29, 2024 • 118
Corrective Retrieval Augmented Generation

Paper • 2401.15884 • Published Jan 29, 2024 • 3
Observational Scaling Laws and the Predictability of Language Model Performance

Paper • 2405.10938 • Published May 17, 2024 • 11
Your Transformer is Secretly Linear

Paper • 2405.12250 • Published May 19, 2024 • 149
Diffusion for World Modeling: Visual Details Matter in Atari

Paper • 2405.12399 • Published May 20, 2024 • 28
LANISTR: Multimodal Learning from Structured and Unstructured Data

Paper • 2305.16556 • Published May 26, 2023 • 2
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Paper • 2405.11273 • Published May 18, 2024 • 17
Not All Language Model Features Are Linear

Paper • 2405.14860 • Published May 23, 2024 • 39
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Paper • 2405.15319 • Published May 24, 2024 • 25
Phased Consistency Model

Paper • 2405.18407 • Published May 28, 2024 • 46
MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published May 30, 2024 • 19
Xwin-LM: Strong and Scalable Alignment Practice for LLMs

Paper • 2405.20335 • Published May 30, 2024 • 17
LLMs achieve adult human performance on higher-order theory of mind tasks

Paper • 2405.18870 • Published May 29, 2024 • 17
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Paper • 2406.00888 • Published Jun 2, 2024 • 30
Guiding a Diffusion Model with a Bad Version of Itself

Paper • 2406.02507 • Published Jun 4, 2024 • 15
Self-Improving Robust Preference Optimization

Paper • 2406.01660 • Published Jun 3, 2024 • 18
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

Paper • 2406.02884 • Published Jun 5, 2024 • 15
Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4, 2024 • 37
Proofread: Fixes All Errors with One Tap

Paper • 2406.04523 • Published Jun 6, 2024 • 12
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Paper • 2406.07476 • Published Jun 11, 2024 • 32
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Paper • 2406.08407 • Published Jun 12, 2024 • 24
Large Language Model Unlearning via Embedding-Corrupted Prompts

Paper • 2406.07933 • Published Jun 12, 2024 • 7
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 50
TextGrad: Automatic "Differentiation" via Text

Paper • 2406.07496 • Published Jun 11, 2024 • 27
Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Paper • 2406.10210 • Published Jun 14, 2024 • 76
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Paper • 2406.08973 • Published Jun 13, 2024 • 86
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Paper • 2406.11833 • Published Jun 17, 2024 • 61
mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Paper • 2406.11839 • Published Jun 17, 2024 • 37
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17, 2024 • 30
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 65
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation

Paper • 2406.12849 • Published Jun 18, 2024 • 49
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Paper • 2406.12034 • Published Jun 17, 2024 • 14
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing

Paper • 2406.10601 • Published Jun 15, 2024 • 65
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems

Paper • 2406.14972 • Published Jun 21, 2024 • 7
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution

Paper • 2406.13457 • Published Jun 19, 2024 • 16
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers

Paper • 2406.12430 • Published Jun 18, 2024 • 7
Weight subcloning: direct initialization of transformers using larger pretrained ones

Paper • 2312.09299 • Published Dec 14, 2023 • 17
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

Paper • 2406.17419 • Published Jun 25, 2024 • 17
Large Language Models Assume People are More Rational than We Really are

Paper • 2406.17055 • Published Jun 24, 2024 • 4
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Paper • 2406.16855 • Published Jun 24, 2024 • 54
Video-Infinity: Distributed Long Video Generation

Paper • 2406.16260 • Published Jun 24, 2024 • 28
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Paper • 2406.16338 • Published Jun 24, 2024 • 25
Adam-mini: Use Fewer Learning Rates To Gain More

Paper • 2406.16793 • Published Jun 24, 2024 • 67
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 52
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Paper • 2406.20095 • Published Jun 28, 2024 • 17
MagMax: Leveraging Model Merging for Seamless Continual Learning

Paper • 2407.06322 • Published Jul 8, 2024 • 1
A Single Transformer for Scalable Vision-Language Modeling

Paper • 2407.06438 • Published Jul 8, 2024 • 1
Video Diffusion Alignment via Reward Gradients

Paper • 2407.08737 • Published Jul 11, 2024 • 48
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Paper • 2407.09025 • Published Jul 12, 2024 • 130
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Paper • 2407.09121 • Published Jul 12, 2024 • 5
GAVEL: Generating Games Via Evolution and Language Models

Paper • 2407.09388 • Published Jul 12, 2024 • 15
Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

Paper • 2407.08726 • Published Jul 11, 2024 • 8
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Paper • 2407.08303 • Published Jul 11, 2024 • 17
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Paper • 2407.11963 • Published Jul 16, 2024 • 43
How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?

Paper • 2402.07282 • Published Feb 11, 2024 • 1
Fewer Truncations Improve Language Modeling

Paper • 2404.10830 • Published Apr 16, 2024 • 3
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

Paper • 2407.00256 • Published Jun 28, 2024 • 1
Provably Robust DPO: Aligning Language Models with Noisy Feedback

Paper • 2403.00409 • Published Mar 1, 2024 • 1
Efficient Exploration for LLMs

Paper • 2402.00396 • Published Feb 1, 2024 • 21
Text2SQL is Not Enough: Unifying AI and Databases with TAG

Paper • 2408.14717 • Published Aug 27, 2024 • 24
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance

Paper • 2409.04593 • Published Sep 6, 2024 • 23

Collection guide
Browse collections

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs