ColorFlow: Retrieval-Augmented Image Sequence Colorization
Abstract
Automatic black-and-white image sequence colorization while preserving character and object identity (ID) is a complex task with significant market demand, such as in cartoon or comic series colorization. Despite advancements in visual colorization using large-scale generative models like diffusion models, challenges with controllability and identity consistency persist, making current solutions unsuitable for industrial application. To address this, we propose ColorFlow, a three-stage diffusion-based framework tailored for image sequence colorization in industrial applications. Unlike existing methods that require per-ID finetuning or explicit ID embedding extraction, we propose a novel, robust, and generalizable Retrieval Augmented Colorization pipeline for colorizing images with relevant color references. Our pipeline also features a dual-branch design: one branch for color identity extraction and the other for colorization, leveraging the strengths of diffusion models. We utilize the self-attention mechanism in diffusion models for strong in-context learning and color identity matching. To evaluate our model, we introduce ColorFlow-Bench, a comprehensive benchmark for reference-based colorization. Results show that ColorFlow outperforms existing models across multiple metrics, setting a new standard in sequential image colorization and potentially benefiting the art industry. We release our code and models on our project page: https://zhuang2002.github.io/ColorFlow/.
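The abstract does not say how relevant color references are selected, so the following is only a minimal sketch of one generic retrieval step, not the ColorFlow implementation. It assumes that the grayscale query and each candidate color reference have already been mapped to fixed-length embedding vectors by some image encoder (not shown), and it ranks references by cosine similarity. The function name `retrieve_references` and the parameter `k` are hypothetical.

```python
import numpy as np


def retrieve_references(query_emb, pool_embs, k=3):
    """Return indices and scores of the k pool embeddings closest to the query.

    Hypothetical helper: cosine-similarity ranking is an assumption made for
    illustration, not a detail taken from the abstract.
    """
    query_emb = np.asarray(query_emb, dtype=float)
    pool_embs = np.asarray(pool_embs, dtype=float)

    # Normalize so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)

    sims = p @ q                    # one similarity score per reference
    top = np.argsort(-sims)[:k]     # indices sorted by descending similarity
    return top, sims[top]


# Toy usage with 2-D embeddings standing in for real image features.
pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
indices, scores = retrieve_references([1.0, 0.1], pool, k=2)
print(indices, scores)
```

In a pipeline of this kind, the retrieved references would then be passed to the colorization model as context; how ColorFlow does that is described in the paper rather than here.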
Community
Paper Link: https://arxiv.org/abs/2412.11815
Project Page: https://zhuang2002.github.io/ColorFlow/
Code: https://github.com/TencentARC/ColorFlow
Online Demo: https://huggingface.co/spaces/TencentARC/ColorFlow
very nice job!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PainterNet: Adaptive Image Inpainting with Actual-Token Attention and Diverse Mask Control (2024)
- Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis (2024)
- Diffusion Self-Distillation for Zero-Shot Customized Image Generation (2024)
- MV-Adapter: Multi-view Consistent Image Generation Made Easy (2024)
- LocRef-Diffusion: Tuning-Free Layout and Appearance-Guided Generation (2024)
- Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator (2024)
- DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation (2024)