Lirbi's Collections
AI Math: Diffusion
Controllable Text Generation for Large Language Models: A Survey
Paper
•
2408.12599
•
Published
•
63
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed
Representations
Paper
•
2408.12590
•
Published
•
35
Real-Time Video Generation with Pyramid Attention Broadcast
Paper
•
2408.12588
•
Published
•
15
Transfusion: Predict the Next Token and Diffuse Images with One
Multi-Modal Model
Paper
•
2408.11039
•
Published
•
58
MegaFusion: Extend Diffusion Models towards Higher-resolution Image
Generation without Further Tuning
Paper
•
2408.11001
•
Published
•
11
CODE: Confident Ordinary Differential Editing
Paper
•
2408.12418
•
Published
•
4
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its
Teacher
Paper
•
2408.14176
•
Published
•
60
Foundation Models for Music: A Survey
Paper
•
2408.14340
•
Published
•
43
Diffusion Models Are Real-Time Game Engines
Paper
•
2408.14837
•
Published
•
121
Distribution Backtracking Builds A Faster Convergence Trajectory for
One-step Diffusion Distillation
Paper
•
2408.15991
•
Published
•
15
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion
Model
Paper
•
2408.16767
•
Published
•
30
Discrete Diffusion Modeling by Estimating the Ratios of the Data
Distribution
Paper
•
2310.16834
•
Published
•
4
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time
Series Forecasters
Paper
•
2408.17253
•
Published
•
37
Paper
•
2409.00587
•
Published
•
31
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world
Videos
Paper
•
2409.02095
•
Published
•
35
Diffusion Policy Policy Optimization
Paper
•
2409.00588
•
Published
•
19
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper
•
2409.02097
•
Published
•
32
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion
Dependency
Paper
•
2409.02634
•
Published
•
90
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with
Adversarial Conditional Diffusion Distillation
Paper
•
2409.02245
•
Published
•
9
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free
Real Image Editing
Paper
•
2409.01322
•
Published
•
94
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with
Image-Based Surface Representation
Paper
•
2409.03718
•
Published
•
25
Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens
for Text-to-Any-Task
Paper
•
2409.04005
•
Published
•
18
SongCreator: Lyrics-based Universal Song Generation
Paper
•
2409.06029
•
Published
•
21
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
Paper
•
2409.06135
•
Published
•
14
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video
Diffusion Models
Paper
•
2409.07452
•
Published
•
20
Instant Facial Gaussians Translator for Relightable and Interactable
Facial Rendering
Paper
•
2409.07441
•
Published
•
10
IFAdapter: Instance Feature Control for Grounded Text-to-Image
Generation
Paper
•
2409.08240
•
Published
•
18
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with
Diffusion Priors
Paper
•
2409.08278
•
Published
•
10
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
Paper
•
2409.08270
•
Published
•
9
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric
Videos
Paper
•
2409.08353
•
Published
•
10
InstantDrag: Improving Interactivity in Drag-based Image Editing
Paper
•
2409.08857
•
Published
•
31
A Diffusion Approach to Radiance Field Relighting using
Multi-Illumination Synthesis
Paper
•
2409.08947
•
Published
•
11
DrawingSpinUp: 3D Animation from Single Character Drawings
Paper
•
2409.08615
•
Published
•
17
Seed-Music: A Unified Framework for High Quality and Controlled Music
Generation
Paper
•
2409.09214
•
Published
•
49
Phidias: A Generative Model for Creating 3D Content from Text, Image,
and 3D Conditions with Reference-Augmented Diffusion
Paper
•
2409.11406
•
Published
•
25
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper
•
2409.11355
•
Published
•
28
OSV: One Step is Enough for High-Quality Image to Video Generation
Paper
•
2409.11367
•
Published
•
13
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion
Transformer
Paper
•
2409.10819
•
Published
•
18
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
Paper
•
2409.11211
•
Published
•
8
Single-Layer Learnable Activation for Implicit Neural Representation
(SL^{2}A-INR)
Paper
•
2409.10836
•
Published
•
4
Implicit Neural Representations with Fourier Kolmogorov-Arnold Networks
Paper
•
2409.09323
•
Published
•
5
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Paper
•
2409.09401
•
Published
•
6
Vista3D: Unravel the 3D Darkside of a Single Image
Paper
•
2409.12193
•
Published
•
9
LVCD: Reference-based Lineart Video Colorization with Diffusion Models
Paper
•
2409.12960
•
Published
•
23
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive
Diffusion
Paper
•
2409.12957
•
Published
•
18
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt
Paper
•
2409.12892
•
Published
•
5
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient
Video Latent Generation
Paper
•
2409.12532
•
Published
•
5
FlexiTex: Enhancing Texture Generation with Visual Guidance
Paper
•
2409.12431
•
Published
•
11
MIMO: Controllable Character Video Synthesis with Spatial Decomposed
Modeling
Paper
•
2409.16160
•
Published
•
32
Tabular Data Generation using Binary Diffusion
Paper
•
2409.13882
•
Published
•
3
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language
Instructions
Paper
•
2409.15278
•
Published
•
24
MaterialFusion: Enhancing Inverse Rendering with Material Diffusion
Priors
Paper
•
2409.15273
•
Published
•
11
MaskedMimic: Unified Physics-Based Character Control Through Masked
Motion Inpainting
Paper
•
2409.14393
•
Published
•
9
SpaceBlender: Creating Context-Rich Collaborative Spaces Through
Generative 3D Scene Blending
Paper
•
2409.13926
•
Published
•
5
Self-Supervised Audio-Visual Soundscape Stylization
Paper
•
2409.14340
•
Published
•
2
MuCodec: Ultra Low-Bitrate Music Codec
Paper
•
2409.13216
•
Published
•
23
Portrait Video Editing Empowered by Multimodal Generative Priors
Paper
•
2409.13591
•
Published
•
16
Colorful Diffuse Intrinsic Image Decomposition in the Wild
Paper
•
2409.13690
•
Published
•
13
V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic
Gaussians
Paper
•
2409.13648
•
Published
•
10
Temporally Aligned Audio for Video with Autoregression
Paper
•
2409.13689
•
Published
•
8
Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror
Reflections
Paper
•
2409.14677
•
Published
•
15
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense
Prediction
Paper
•
2409.18124
•
Published
•
32
Pixel-Space Post-Training of Latent Diffusion Models
Paper
•
2409.17565
•
Published
•
20
Disco4D: Disentangled 4D Human Generation and Animation from a Single
Image
Paper
•
2409.17280
•
Published
•
9
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D
Diffusion
Paper
•
2409.17145
•
Published
•
13
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors
Paper
•
2409.17058
•
Published
•
11
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
Paper
•
2409.18964
•
Published
•
26
Image Copy Detection for Diffusion Models
Paper
•
2409.19952
•
Published
•
13
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image
Restoration
Paper
•
2410.00418
•
Published
•
10
SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D
Semantic MPIs
Paper
•
2410.00337
•
Published
•
11
DressRecon: Freeform 4D Human Reconstruction from Monocular Video
Paper
•
2409.20563
•
Published
•
8
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model
And Input View Curation
Paper
•
2410.00890
•
Published
•
19
Cottention: Linear Transformers With Cosine Attention
Paper
•
2409.18747
•
Published
•
16
Addition is All You Need for Energy-efficient Language Models
Paper
•
2410.00907
•
Published
•
144
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
Paper
•
2410.01731
•
Published
•
16
3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and
Box-Focused Sampling for 3D Object Detection
Paper
•
2410.01647
•
Published
•
28
HarmoniCa: Harmonizing Training and Inference for Better Feature Cache
in Diffusion Transformer Acceleration
Paper
•
2410.01723
•
Published
•
4
Eliminating Oversaturation and Artifacts of High Guidance Scales in
Diffusion Models
Paper
•
2410.02416
•
Published
•
26
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation
Paper
•
2410.01680
•
Published
•
32
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control
Paper
•
2410.00316
•
Published
•
5
VideoGuide: Improving Video Diffusion Models without Training Through a
Teacher's Guide
Paper
•
2410.04364
•
Published
•
28
Presto! Distilling Steps and Layers for Accelerating Music Generation
Paper
•
2410.05167
•
Published
•
15
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal
Instruction
Paper
•
2410.04932
•
Published
•
9
RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion
Models
Paper
•
2409.19989
•
Published
•
17
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep
Approach
Paper
•
2410.03160
•
Published
•
4
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment
Paper
•
2410.05255
•
Published
•
4
IterComp: Iterative Composition-Aware Feedback Learning from Model
Gallery for Text-to-Image Generation
Paper
•
2410.07171
•
Published
•
41
Pyramidal Flow Matching for Efficient Video Generative Modeling
Paper
•
2410.05954
•
Published
•
38
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based
Image/Video Generation
Paper
•
2410.05591
•
Published
•
13
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for
Text-to-Image Diffusion Model Unlearning
Paper
•
2410.05664
•
Published
•
7
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow
Matching
Paper
•
2410.06885
•
Published
•
43
Diversity-Rewarded CFG Distillation
Paper
•
2410.06084
•
Published
•
10
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial
Diffusion and Masked Generative Models
Paper
•
2410.08207
•
Published
•
18
Semantic Score Distillation Sampling for Compositional Text-to-3D
Generation
Paper
•
2410.09009
•
Published
•
14
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image
Generation
Paper
•
2410.08159
•
Published
•
25
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
Paper
•
2410.07303
•
Published
•
18
Progressive Autoregressive Video Diffusion Models
Paper
•
2410.08151
•
Published
•
15
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional
Diffusion Sampler
Paper
•
2410.05651
•
Published
•
13
Animate-X: Universal Character Image Animation with Enhanced Motion
Representation
Paper
•
2410.10306
•
Published
•
54
Cavia: Camera-controllable Multi-view Video Diffusion with
View-Integrated Attention
Paper
•
2410.10774
•
Published
•
25
Semantic Image Inversion and Editing using Rectified Stochastic
Differential Equations
Paper
•
2410.10792
•
Published
•
29
Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies
Paper
•
2410.10803
•
Published
•
6
Efficient Diffusion Models: A Comprehensive Survey from Principles to
Practices
Paper
•
2410.11795
•
Published
•
17
Constant Acceleration Flow
Paper
•
2411.00322
•
Published
•
23
In-Context LoRA for Diffusion Transformers
Paper
•
2410.23775
•
Published
•
11
Minimum Entropy Coupling with Bottleneck
Paper
•
2410.21666
•
Published
•
4
Task Vectors are Cross-Modal
Paper
•
2410.22330
•
Published
•
11
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale
Paper
•
2410.20280
•
Published
•
23
Continuous Speech Synthesis using per-token Latent Diffusion
Paper
•
2410.16048
•
Published
•
29
FasterCache: Training-Free Video Diffusion Model Acceleration with High
Quality
Paper
•
2410.19355
•
Published
•
23
SMITE: Segment Me In TimE
Paper
•
2410.18538
•
Published
•
15
Scaling Diffusion Language Models via Adaptation from Autoregressive
Models
Paper
•
2410.17891
•
Published
•
15
DPLM-2: A Multimodal Diffusion Protein Language Model
Paper
•
2410.13782
•
Published
•
20
Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning
via Image-Guided Diffusion
Paper
•
2410.13674
•
Published
•
16
DimensionX: Create Any 3D and 4D Scenes from a Single Image with
Controllable Video Diffusion
Paper
•
2411.04928
•
Published
•
48
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion
Models
Paper
•
2411.05007
•
Published
•
16
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
Paper
•
2411.04989
•
Published
•
14
Controlling Language and Diffusion Models by Transporting Activations
Paper
•
2410.23054
•
Published
•
16
DreamPolish: Domain Score Distillation With Progressive Geometry
Generation
Paper
•
2411.01602
•
Published
•
10
Constrained Diffusion Implicit Models
Paper
•
2411.00359
•
Published
•
6
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Paper
•
2411.02336
•
Published
•
23
Scaling Properties of Diffusion Models for Perceptual Tasks
Paper
•
2411.08034
•
Published
•
13
Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model
with Compact Wavelet Encodings
Paper
•
2411.08017
•
Published
•
11
Add-it: Training-Free Object Insertion in Images With Pretrained
Diffusion Models
Paper
•
2411.07232
•
Published
•
63
Edify Image: High-Quality Image Generation with Pixel Space Laplacian
Diffusion Models
Paper
•
2411.07126
•
Published
•
28
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D
Generation
Paper
•
2411.08033
•
Published
•
22
Generative World Explorer
Paper
•
2411.11844
•
Published
•
75
Stylecodes: Encoding Stylistic Information For Image Generation
Paper
•
2411.12811
•
Published
•
11
Stable Flow: Vital Layers for Training-Free Image Editing
Paper
•
2411.14430
•
Published
•
14
Style-Friendly SNR Sampler for Style-Driven Generation
Paper
•
2411.14793
•
Published
•
36
Material Anything: Generating Materials for Any 3D Object via Diffusion
Paper
•
2411.15138
•
Published
•
42
DreamRunner: Fine-Grained Storytelling Video Generation with
Retrieval-Augmented Motion Adaptation
Paper
•
2411.16657
•
Published
•
17
One Diffusion to Generate Them All
Paper
•
2411.16318
•
Published
•
26
OminiControl: Minimal and Universal Control for Diffusion Transformer
Paper
•
2411.15098
•
Published
•
53
Novel View Extrapolation with Video Diffusion Priors
Paper
•
2411.14208
•
Published
•
10
TEXGen: a Generative Diffusion Model for Mesh Textures
Paper
•
2411.14740
•
Published
•
15
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
Paper
•
2411.18613
•
Published
•
50
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Paper
•
2411.17440
•
Published
•
35
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous
Driving
Paper
•
2411.15139
•
Published
•
15
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Paper
•
2411.18616
•
Published
•
15
Omegance: A Single Parameter for Various Granularities in
Diffusion-Based Synthesis
Paper
•
2411.17769
•
Published
•
7