Update README.md
README.md CHANGED
@@ -34,14 +34,14 @@ are semantically meaningful. We also show that APTP can automatically discover p
 
 
 <p align="center">
-<img src="
+<img src="assets/fig_1.gif" alt="APTP Overview" width="600" />
 </p>
 <p align="left">
 <em>APTP: We prune a text-to-image diffusion model like Stable Diffusion (left) into a mixture of efficient experts (right) in a prompt-based manner. Our prompt router routes distinct types of prompts to different experts, allowing experts' architectures to be separately specialized by removing layers or channels.</em>
 </p>
 
 <p align="center">
-<img src="
+<img src="assets/fig_2.gif" alt="APTP Pruning Scheme" width="600" />
 </p>
 <p align="left">
 <em>APTP pruning scheme. We train the prompt router and the set of architecture codes to prune a T2I diffusion model into a mixture of experts. The prompt router consists of three modules. We use a Sentence Transformer as the prompt encoder to encode the input prompt into a representation z. Then, the architecture predictor transforms z into the architecture embedding e, which has the same dimensionality as the architecture codes. Finally, the router routes the embedding e to an architecture code a(i). We use optimal transport to evenly distribute the prompts in a training batch among the architecture codes. The architecture code a(i) = (u(i), v(i)) determines the pruning of the model’s width and depth. We train the prompt router’s parameters and the architecture codes end to end using the denoising objective of the pruned model L<sub>DDPM</sub>, a distillation loss between the pruned and original models L<sub>distill</sub>, the average resource usage for the samples in the batch R, and a contrastive objective L<sub>cont</sub> that encourages the embeddings e to preserve the semantic similarity of the representations z.</em>
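The pruning-scheme caption above pins down the router's dataflow precisely enough to sketch in code. Below is a minimal PyTorch illustration, not the repository's implementation: the dimensions, the module and function names (`PromptRouter`, `sinkhorn_assign`), the number of architecture codes, and the choice of cosine similarity with a few Sinkhorn iterations for the optimal-transport step are all assumptions made for the sketch.

```python
# Hypothetical sketch of the APTP prompt router; sizes and names are guesses.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptRouter(nn.Module):
    """Maps a prompt representation z to one of N learnable architecture codes."""

    def __init__(self, z_dim=384, code_dim=128, num_codes=8):
        super().__init__()
        # Architecture predictor: transforms z into the architecture
        # embedding e, which shares its dimensionality with the codes.
        self.predictor = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, code_dim)
        )
        # Architecture codes; in APTP each code a(i) = (u(i), v(i))
        # would gate the diffusion model's width and depth.
        self.codes = nn.Parameter(torch.randn(num_codes, code_dim))

    def forward(self, z):
        e = self.predictor(z)  # (B, code_dim)
        # Similarity of each embedding to each architecture code.
        sim = F.normalize(e, dim=-1) @ F.normalize(self.codes, dim=-1).T
        return e, sim


def sinkhorn_assign(sim, n_iters=3, eps=0.05):
    """Balanced prompt-to-code assignment via a few Sinkhorn iterations,
    standing in for the optimal-transport step in the caption."""
    q = torch.exp(sim / eps)  # (B, num_codes)
    for _ in range(n_iters):
        q = q / q.sum(dim=0, keepdim=True)  # spread prompts evenly over codes
        q = q / q.sum(dim=1, keepdim=True)  # one (soft) code per prompt
    return q


# Usage with random stand-ins for Sentence Transformer outputs z.
router = PromptRouter()
z = torch.randn(16, 384)             # batch of prompt encodings
e, sim = router(z)
assignment = sinkhorn_assign(sim)
code_idx = assignment.argmax(dim=1)  # routed architecture code per prompt

# Training would then minimize, per the caption (weights hypothetical):
# L = L_DDPM + w_distill * L_distill + w_R * R + w_cont * L_cont
```

In words: the prompt encoder's output z is fixed, the predictor and the codes are learned jointly with the pruned experts, and the Sinkhorn balancing keeps any single expert from absorbing the whole batch.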