Symmetry, Vol. 17, Pages 290: Multi-Source Training-Free Controllable Style Transfer via Diffusion Models
Symmetry doi: 10.3390/sym17020290
Authors:
Cuihong Yu
Cheng Han
Chao Zhang
Diffusion models, as representative models in artificial intelligence, have made significant progress in text-to-image synthesis. However, existing diffusion-based style transfer methods typically require extensive text to describe the semantic content or specific painting attributes, and the style and layout of the semantic content in the synthesized images are often unpredictable. To achieve high-quality style transfer with fixed content, this paper adopts text-free guidance and proposes a multi-source, training-free, and controllable style transfer method that takes a single image or video as the content input and one or more style images as the style guidance. Specifically, the proposed method first fuses the inversion noise of the content image with that of one or more style images to form the initial noise of the stylized sampling process. It then extracts the query, key, and value vectors of the self-attention mechanism from the DDIM inversion of the content and style images and injects them into the stylized sampling process to improve the color, texture, and semantics of the stylized images. By adjusting the method's hyperparameters, both symmetric style proportions and asymmetric style distributions can be achieved. Compared with state-of-the-art baselines, the proposed method demonstrates high fidelity and strong stylization performance, and it can be applied to a wide range of image and video style transfer tasks.
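The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of the two steps described in the abstract, namely blending DDIM-inversion noise from content and style images into an initial latent, and letting content queries attend to style keys/values in a self-attention layer. The function names (fuse_inversion_noise, inject_self_attention), tensor shapes, and weighting scheme are assumptions chosen for illustration only.

```python
# Minimal sketch (not the paper's code): weighted fusion of inversion noise
# and cross-image self-attention injection, using plain PyTorch tensors.
import torch


def fuse_inversion_noise(content_noise: torch.Tensor,
                         style_noises: list[torch.Tensor],
                         style_weights: list[float],
                         content_weight: float = 0.5) -> torch.Tensor:
    """Blend the DDIM-inversion latent of a content image with those of one
    or more style images to form the initial noise for stylized sampling.
    The convex weighting is an assumption, not the paper's exact rule."""
    assert abs(content_weight + sum(style_weights) - 1.0) < 1e-6
    fused = content_weight * content_noise
    for w, s in zip(style_weights, style_noises):
        fused = fused + w * s
    return fused


def inject_self_attention(q_content: torch.Tensor,
                          k_style: torch.Tensor,
                          v_style: torch.Tensor) -> torch.Tensor:
    """Cross-image self-attention: queries from the content branch attend to
    keys/values taken from a style branch, transferring color/texture cues."""
    scale = q_content.shape[-1] ** -0.5
    attn = torch.softmax(q_content @ k_style.transpose(-2, -1) * scale, dim=-1)
    return attn @ v_style


# Toy usage with latent-shaped tensors (batch=1, 4 channels, 64x64 latents).
content_noise = torch.randn(1, 4, 64, 64)
style_noises = [torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64)]
init_latent = fuse_inversion_noise(content_noise, style_noises,
                                   style_weights=[0.25, 0.25])

q = torch.randn(1, 4096, 64)   # queries from the content inversion pass
k = torch.randn(1, 4096, 64)   # keys from a style inversion pass
v = torch.randn(1, 4096, 64)   # values from the same style inversion pass
stylized_features = inject_self_attention(q, k, v)
print(init_latent.shape, stylized_features.shape)
```

In this reading, the content/style weights play the role of the hyperparameters mentioned in the abstract: equal style weights would correspond to a symmetric style proportion, while unequal weights would give an asymmetric style distribution.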
Source link: www.mdpi.com