---
license: openrail++
tags:
- text-to-image
- stable-diffusion
- diffusers
---

# AnimeBoysXL v3.0

**It takes substantial time and effort to bake models. If you appreciate my models, I would be grateful if you could support me on [Ko-fi](https://ko-fi.com/koolchh) ☕.**

## Features

- ✔️ **Good for inference**: AnimeBoysXL v3.0 is a flexible model that is good at generating images of anime boys and male-only content in a wide range of styles.
- ✔️ **Good for training**: AnimeBoysXL v3.0 is suitable for further training, thanks to its neutral style and its ability to recognize a large number of concepts. Feel free to train your own anime boy model/LoRA from AnimeBoysXL.
- ❌ AnimeBoysXL v3.0 is not optimized for creating anime girls. Please consider using other models for that purpose.

## Inference Guide

- **Prompt**: Use tag-based prompts to describe your subject.
  - Tag ordering matters. It is highly recommended to structure your prompt with the following templates:
    ```
    1boy, male focus, character name, series name, anything else you'd like to describe
    ```
    ```
    2boys, male focus, multiple boys, character name(s), series name, anything else you'd like to describe
    ```
  - Append
    ```
    , best quality, amazing quality, best aesthetic, amazing aesthetic, absurdres
    ```
    to the prompt to improve image quality.
  - (*Optional*) Append
    ```
    , year YYYY
    ```
    to the prompt to shift the output toward the prevalent style of that year. `YYYY` is a 4-digit year, e.g. `, year 2023`.
- **Negative prompt**: Choose one of the following two presets.
  1. Heavy (*recommended*):
     ```
     lowres, bad, text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts
     ```
  2. Light:
     ```
     lowres, jpeg artifacts, worst quality, watermark, blurry, bad aesthetic
     ```
- **VAE**: Make sure you're using [SDXL VAE](https://huggingface.co/stabilityai/sdxl-vae/tree/main).
- **Sampling method, sampling steps and CFG scale**: I find **(Euler a, 28, 5)** good. You are encouraged to experiment with other settings.
- **Width and height**: **832*1216** for portrait, **1024*1024** for square, and **1216*832** for landscape.

## 🧨Diffusers Example Usage

```python
import torch
from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = DiffusionPipeline.from_pretrained(
    "Koolchh/AnimeBoysXL-v3.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
# Switch to Euler a (Euler ancestral), the sampler recommended in the inference guide.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

prompt = "1boy, male focus, shirt, solo, looking at viewer, smile, black hair, brown eyes, short hair, best quality, amazing quality, best aesthetic, amazing aesthetic, absurdres"
negative_prompt = "lowres, bad, text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts"

# 28 steps and CFG scale 5 at a square 1024*1024 resolution.
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=5,
    num_inference_steps=28,
).images[0]
```

## Training Details

AnimeBoysXL v3.0 is trained from [Pony Diffusion V6 XL](https://civitai.com/models/257749/pony-diffusion-v6-xl) on ~516k images. The following tags are attached to the training data to make it easier to steer toward either more aesthetic or more flexible results.

### Quality tags

| tag               | score     |
|-------------------|-----------|
| `best quality`    | >= 150    |
| `amazing quality` | [75, 150) |
| `great quality`   | [25, 75)  |
| `normal quality`  | [0, 25)   |
| `bad quality`     | (-5, 0)   |
| `worst quality`   | <= -5     |

### Aesthetic tags

| tag                 |
|---------------------|
| `best aesthetic`    |
| `amazing aesthetic` |
| `great aesthetic`   |
| `normal aesthetic`  |
| `bad aesthetic`     |

### Rating tags

| tag             | rating       |
|-----------------|--------------|
| `sfw`           | general      |
| `slightly nsfw` | sensitive    |
| `fairly nsfw`   | questionable |
| `very nsfw`     | explicit     |

### Year tags

`year YYYY`, where `YYYY` is in the range [2005, 2023].
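The tag conventions above can be summarized in a few lines of code. The following is a minimal Python sketch: `quality_tag` mirrors the intervals in the quality-tag table, `year_tag` enforces the documented [2005, 2023] range, and `build_prompt` follows the Inference Guide's ordering (subject tags first, then the quality suffix, then an optional year tag). All three helpers are hypothetical illustrations, not part of the model's actual training or inference code, and how the underlying quality score is computed is not documented here.

```python
from typing import Optional, Sequence

# Hypothetical helpers illustrating the tag conventions of this model card.
# The quality suffix is the one recommended in the Inference Guide.
QUALITY_SUFFIX = "best quality, amazing quality, best aesthetic, amazing aesthetic, absurdres"


def quality_tag(score: float) -> str:
    """Map a numeric quality score to its tag, following the table's intervals."""
    if score >= 150:
        return "best quality"
    if score >= 75:
        return "amazing quality"
    if score >= 25:
        return "great quality"
    if score >= 0:
        return "normal quality"
    if score > -5:
        return "bad quality"
    return "worst quality"


def year_tag(year: int) -> str:
    """Return a `year YYYY` tag, rejecting years outside the trained range."""
    if not 2005 <= year <= 2023:
        raise ValueError(f"year {year} is outside the trained range [2005, 2023]")
    return f"year {year}"


def build_prompt(subject_tags: Sequence[str], year: Optional[int] = None) -> str:
    """Join subject tags in order, then append the quality suffix and an optional year tag."""
    parts = list(subject_tags) + [QUALITY_SUFFIX]
    if year is not None:
        parts.append(year_tag(year))
    return ", ".join(parts)
```

For example, `build_prompt(["1boy", "male focus", "black hair"], year=2023)` returns `"1boy, male focus, black hair, best quality, amazing quality, best aesthetic, amazing aesthetic, absurdres, year 2023"`, a string that could be passed as `prompt` in the Diffusers example.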
### Training configurations

- Hardware: 4 * Nvidia A100 80GB GPUs
- Optimizer: Adafactor
- Gradient accumulation steps: 8
- Effective batch size: 4 * 8 * 4 = 128
- Learning rates:
  - 8e-6 for U-Net
  - 5.2e-6 for text encoder 1 (CLIP ViT-L)
  - 4.8e-6 for text encoder 2 (OpenCLIP ViT-bigG)
- Learning rate schedule: constant with 250 warmup steps
- Mixed precision training type: FP16
- Epochs: 40

### Changes from v2.0

- Change the base model from Stable Diffusion XL Base 1.0 to Pony Diffusion V6 XL.
- Revamp the dataset's aesthetic tags based on the developer's preference.
- Update the quality score and aesthetic score criteria.
- Use FP16 mixed-precision training.
- Train for more epochs.
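The batch size and learning rate schedule in the training configurations can be made concrete with a short Python sketch. It rests on assumptions the card does not state: that the factors in `4 * 8 * 4` are GPUs, gradient accumulation steps, and per-GPU micro-batch size; that "constant with 250 warmup steps" means a linear ramp from 0 over 250 optimizer steps followed by a constant rate, as in `get_constant_schedule_with_warmup` from transformers/diffusers; and that the last partial batch of an epoch is kept. The constant and function names are hypothetical.

```python
import math

# Values taken from the training configurations; PER_GPU_BATCH is assumed
# to be the remaining factor in "4 * 8 * 4 = 128".
NUM_GPUS = 4
GRAD_ACCUM_STEPS = 8
PER_GPU_BATCH = 4
WARMUP_STEPS = 250

BASE_LRS = {
    "unet": 8e-6,
    "text_encoder_1": 5.2e-6,  # CLIP ViT-L
    "text_encoder_2": 4.8e-6,  # OpenCLIP ViT-bigG
}


def effective_batch_size() -> int:
    """Samples contributing to one optimizer step across all GPUs."""
    return NUM_GPUS * GRAD_ACCUM_STEPS * PER_GPU_BATCH


def steps_per_epoch(num_images: int) -> int:
    """Optimizer steps per epoch, assuming the last partial batch is kept."""
    return math.ceil(num_images / effective_batch_size())


def lr_at(step: int, base_lr: float, warmup_steps: int = WARMUP_STEPS) -> float:
    """Assumed schedule: linear warmup from 0 to base_lr, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr
```

Under these assumptions, ~516k images at an effective batch size of 128 works out to roughly 4,000 optimizer steps per epoch, so the 250-step warmup covers only a small fraction of the first epoch.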