Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
📖TL;DR: Mix-of-Show can merge multiple individually tuned concept ED-LoRAs (even if they have similar semantics) into the pretrained model, preserving the identity of each concept in the fused model while supporting joint composition of those concepts.
Decentralized Multi-Concept Customization by Mix-of-Show
For example, we extend the pretrained model with the following 14 customized concepts. 👇 👇 👇
Abstract
Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained significant attention from the community. These models can be easily customized for new concepts using low-rank adaptations (LoRAs). However, the utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge. We refer to this scenario as decentralized multi-concept customization, which involves single-client concept tuning and center-node concept fusion. In this paper, we propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization, including concept conflicts resulting from existing single-client LoRA tuning and identity loss during model fusion. Mix-of-Show adopts an embedding-decomposed LoRA (ED-LoRA) for single-client tuning and gradient fusion for the center node to preserve the in-domain essence of single concepts and support theoretically limitless concept fusion. Additionally, we introduce regionally controllable sampling, which extends spatially controllable sampling (e.g., ControlNet and T2I-Adapter) to address attribute binding and missing object problems in multi-concept sampling. Extensive experiments demonstrate that Mix-of-Show is capable of composing multiple customized concepts with high fidelity, including characters, objects, and scenes.
Main Observation
Method Overview (Tune & Fuse)
In single-client concept tuning, Mix-of-Show (ED-LoRA) extends the expressiveness of the concept embedding and tackles the embedding-LoRA co-adaptation issue. ED-LoRA preserves more of the in-domain essence of a given concept in its embedding and reduces concept conflicts.
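Below is a minimal PyTorch sketch of the single-client tuning idea: the pretrained projection weights stay frozen, a low-rank update is learned on top of them, and the concept token embedding is trained jointly with it. The names (`LoRALinear`, `concept_embedding`), the rank, and the embedding dimension are illustrative assumptions; ED-LoRA's actual layer-wise embedding decomposition is described in the paper and not reproduced here.

```python
# Minimal sketch (not the official ED-LoRA implementation): a LoRA-adapted
# linear layer plus a learnable concept token embedding, tuned jointly while
# the pretrained weights stay frozen.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained projection W0 with a trainable low-rank update up(down(x))."""
    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                                  # keep the pretrained weight frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)    # A
        self.up = nn.Linear(rank, base.out_features, bias=False)     # B
        nn.init.zeros_(self.up.weight)                                # start as a zero update
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# Learnable embedding for the new concept token; ED-LoRA additionally
# decomposes this into layer-wise embeddings for expressiveness (see paper).
text_dim = 768                                                        # assumed CLIP text dim
concept_embedding = nn.Parameter(torch.randn(1, text_dim) * 0.01)
```

Keeping the concept's identity largely in the embedding, rather than only in the LoRA weights, is what reduces conflicts when many such concepts are later fused.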
In center-node concept fusion, Mix-of-Show (gradient fusion) fuses multiple concept ED-LoRAs and aligns their single-concept inference behavior, without access to the training data.
Gradient fusion tackles the identity loss caused by weight fusion in previous LoRA merging.
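The sketch below illustrates the fusion objective: for each layer, find a single fused weight whose outputs match every concept model's outputs on that concept's own cached input activations, instead of averaging the LoRA weight deltas. The paper optimizes this alignment objective iteratively (hence "gradient fusion"); purely for illustration, a closed-form ridge least-squares solution stands in here, and all names (`fuse_layer`, `ridge`, ...) are assumptions rather than the official API.

```python
# Sketch: per-layer fusion by output alignment rather than weight averaging.
#   W_fused = argmin_W  sum_i || W @ X_i - (W0 + dW_i) @ X_i ||_F^2
# where X_i are input activations cached from concept i's own prompts.
import torch

def fuse_layer(W0, delta_Ws, Xs, ridge=1e-4):
    """
    W0:       (out, in) frozen pretrained weight of one layer
    delta_Ws: list of (out, in) LoRA updates, one per concept
    Xs:       list of (in, n_i) cached input activations per concept
    Returns a fused (out, in) weight via closed-form ridge least squares.
    """
    in_dim = W0.shape[1]
    gram = ridge * torch.eye(in_dim, dtype=W0.dtype)   # sum_i X_i X_i^T (+ ridge)
    cross = ridge * W0                                  # sum_i Y_i X_i^T (ridge pulls toward W0)
    for dW, X in zip(delta_Ws, Xs):
        Y = (W0 + dW) @ X                               # outputs of concept i's tuned layer
        gram = gram + X @ X.T
        cross = cross + Y @ X.T
    # Solve W @ gram = cross (gram is symmetric)  =>  W = cross @ gram^{-1}
    return torch.linalg.solve(gram, cross.T).T

# Toy usage with random tensors (shapes only, not real activations):
W0 = torch.randn(320, 768)
dWs = [torch.randn(320, 768) * 0.01 for _ in range(3)]
Xs = [torch.randn(768, 64) for _ in range(3)]
W_fused = fuse_layer(W0, dWs, Xs)
```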
Method Overview (Sampling)
For complex compositions, Stable Diffusion models often encounter challenges such as missing objects and incorrect attribute binding. To address this, we introduce regionally controllable sampling, which builds upon spatially controllable sampling (e.g., ControlNet and T2I-Adapter). This approach allows us to assign region prompts and attributes by rewriting features in the cross-attention mechanism. By leveraging regionally controllable sampling in conjunction with our Mix-of-Show framework, we can achieve complex compositions using multiple customized concepts.
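A minimal sketch of the region-aware cross-attention rewriting, under the assumption that each region has its own prompt and a binary spatial mask: the global prompt produces the base features, and features inside each region are recomputed with that region's keys/values. Function and argument names (`regional_cross_attention`, `region_kvs`, `region_masks`) are illustrative, not from the released code.

```python
# Sketch: region-aware cross-attention. The global prompt attends everywhere;
# features inside each region are then rewritten with that region's own prompt,
# keeping identities and attributes bound to their assigned regions.
import torch

def cross_attention(q, k, v):
    # q: (h*w, d); k, v: (tokens, d) -- standard scaled dot-product attention
    attn = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v

def regional_cross_attention(q, global_kv, region_kvs, region_masks):
    """
    q:            (h*w, d) image queries at one cross-attention layer
    global_kv:    (k, v) tensors from the global prompt
    region_kvs:   list of (k, v) tensors, one per region prompt
    region_masks: list of (h*w,) boolean masks marking each region
    """
    out = cross_attention(q, *global_kv)              # base features from the global prompt
    for (k, v), mask in zip(region_kvs, region_masks):
        out[mask] = cross_attention(q[mask], k, v)    # rewrite features inside the region
    return out
```

In practice, this is combined with a spatial condition (e.g., pose or sketch via ControlNet or T2I-Adapter) that fixes the layout to which the region masks refer.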
Result Summarization
Single-Concept Sampling (from single-concept tuned model / multi-concept fused model)
Multi-Concept Sampling (semantically-distinct subjects, without spatial condition)
Multi-Concept Sampling (complex composition, with spatial condition)
BibTeX