Kirazuri (Anima)
Version 2 (Latest)
A full finetune of the Anima Preview base model, predominantly trained on high-resolution 1536x1536 aspect-ratio (AR) buckets.
Expanded the dataset with more recent data and included the full dataset used for my previous model.
Total training dataset: 35,537 manually curated, non-synthetic images with manual quality and aesthetic ratings. The dataset cutoff is now 2026/04/15.
Training Details
Main training used diffusion-pipe commit: d5b78a2c49a07db8f7d9a4c795e4cfe7ba1c3dfe
Final high-resolution stage used the fix in commit: b0aa4f1e03169f3280c8518d37570a448420f8be
Samples seen (unbatched steps): ~680,000
Training time: ~220 hrs
Learning Rate: 4e-6 (General Training) and 2e-6 (Aesthetic)
LLM Adaptor Learning Rate: 8e-7 (General Training) and 2e-7 (Aesthetic)
Effective Batch Size (per resolution): 128 (512p), 96 (1024p), and 48 (1536p)
Precision: Mixed BF16
Optimizer: AdamW8bit with Kahan Summation
Weight Decay: 0.01
Timestep Sampling Strategy: Logit-Normal (General Training)
Tag Dropout: 30%, with the first 8 tags protected
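The card does not describe how "Kahan summation" is applied. A common meaning is compensated weight updates: with weights stored in BF16, small optimizer steps are rounded away, so a second BF16 buffer carries the lost error into the next step. The sketch below illustrates only that idea under stated assumptions. BF16 is simulated in pure Python (to_bf16 is a hypothetical helper), and delta is a stand-in for whatever step AdamW8bit would produce. None of these names come from diffusion-pipe.

```python
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round half to even), returned as a float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)                  # round-to-nearest-even on the low 16 bits
    bits &= 0xFFFF0000                                   # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def naive_step(param: float, delta: float) -> float:
    """Plain BF16 update: deltas below half an ulp are rounded away."""
    return to_bf16(param + delta)

def kahan_step(param: float, comp: float, delta: float) -> tuple[float, float]:
    """Compensated BF16 update; comp carries the rounding error to the next step."""
    total = delta + comp                          # this step's update plus previously lost error
    new_param = to_bf16(param + total)            # what BF16 storage can actually hold
    comp = to_bf16(total - (new_param - param))   # error lost this step, stored in BF16
    return new_param, comp

def run(steps: int = 1000, delta: float = 1e-4) -> tuple[float, float]:
    """Apply the same small update repeatedly; the ideal result is 1.0 + steps * delta."""
    naive = kahan = to_bf16(1.0)
    comp = 0.0
    for _ in range(steps):
        naive = naive_step(naive, delta)
        kahan, comp = kahan_step(kahan, comp, delta)
    return naive, kahan

if __name__ == "__main__":
    naive, kahan = run()
    print(f"ideal 1.1, naive {naive}, kahan {kahan}")
```

With a weight near 1.0, the BF16 spacing is 2^-7 (about 0.0078), so each 1e-4 update is lost and the naive weight never moves from 1.0. The compensated weight ends close to the ideal 1.1, within a small multiple of that spacing.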
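Logit-Normal timestep sampling draws a value from a normal distribution and maps it through the sigmoid. This concentrates training timesteps toward the middle of the (0, 1) range instead of spreading them uniformly. The card does not give the mean or standard deviation, so the 0.0 and 1.0 defaults below are assumptions, and the function name is hypothetical. The flux_shift option used for high-resolution training is not modeled here.

```python
import math
import random

def sample_logit_normal_timesteps(n, mean=0.0, std=1.0, rng=None):
    """Draw n timesteps in (0, 1) from a logit-normal distribution.

    Assumption: mean=0.0 and std=1.0 are common defaults, not values
    stated for this model.
    """
    rng = rng or random.Random()
    timesteps = []
    for _ in range(n):
        u = rng.gauss(mean, std)        # sample in logit space
        t = 1.0 / (1.0 + math.exp(-u))  # sigmoid maps it into (0, 1)
        timesteps.append(t)
    return timesteps

if __name__ == "__main__":
    ts = sample_logit_normal_timesteps(10_000, rng=random.Random(0))
    mid = sum(0.25 < t < 0.75 for t in ts) / len(ts)
    print(f"fraction of timesteps in (0.25, 0.75): {mid:.2f}")
```

With these defaults, roughly 73% of samples fall in (0.25, 0.75), compared with 50% for uniform sampling.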
Additional Features used:
Dataset structured by resolution and manual ratings for staged training
multiscale_loss_weight=0.5 and flux_shift=true for high-resolution training
Mixed natural-language captions with tags, using the diffusion-pipe captions.json format:
"image_1.jpg": [ "{tags}", "{first_n_tags}.\n{nl_caption}", "{dropout_tags1}.\n{nl_caption}", "{nl_caption}\n{dropout_tags2}" ]
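The example entry above maps each image to a list of caption variants. The sketch below is one way to generate such entries offline, combining the four templates with the 30% tag dropout that keeps the first 8 tags. Several details are assumptions rather than facts from the card: tags are comma-separated, the dropout variants are precomputed rather than re-drawn at training time, and n for {first_n_tags} is set to 8. The function names and sample data are hypothetical. The card also does not say how the trainer chooses among the variants.

```python
import json
import random

DROPOUT_RATE = 0.3   # from the card: "Tag Dropout: 30%"
PROTECTED = 8        # from the card: first 8 tags are protected
FIRST_N = 8          # hypothetical: the card does not give n for {first_n_tags}

def drop_tags(tags, rng, rate=DROPOUT_RATE, protected=PROTECTED):
    """Keep the first `protected` tags; drop each later tag with probability `rate`."""
    head = tags[:protected]
    tail = [t for t in tags[protected:] if rng.random() >= rate]
    return head + tail

def caption_variants(tags, nl_caption, rng):
    """Build the four caption templates from the example captions.json entry."""
    join = ", ".join  # assumption: tags are comma-separated
    return [
        join(tags),                                      # "{tags}"
        f"{join(tags[:FIRST_N])}.\n{nl_caption}",        # "{first_n_tags}.\n{nl_caption}"
        f"{join(drop_tags(tags, rng))}.\n{nl_caption}",  # "{dropout_tags1}.\n{nl_caption}"
        f"{nl_caption}\n{join(drop_tags(tags, rng))}",   # "{nl_caption}\n{dropout_tags2}"
    ]

def build_captions(samples, seed=0):
    """Map each image filename to its list of caption variants."""
    rng = random.Random(seed)
    return {name: caption_variants(tags, nl, rng) for name, (tags, nl) in samples.items()}

if __name__ == "__main__":
    samples = {  # hypothetical example data
        "image_1.jpg": (
            ["1girl", "solo", "long hair", "smile", "outdoors", "day",
             "blue sky", "cherry blossoms", "school uniform", "holding bag"],
            "A smiling girl in a school uniform stands under cherry blossoms.",
        ),
    }
    print(json.dumps(build_captions(samples), indent=2, ensure_ascii=False))
```

Keeping the first tags protected means the leading tags survive every dropout variant, while the later, more specific tags vary between variants.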
Recognitions
Thanks to Circlestone Labs for the Anima Preview base model.
Thanks to tdrussell of Circlestone Labs for the diffusion-pipe trainer.
Thanks to bluvoll for support with using their fork of diffusion-pipe.
Thanks to narugo1992 and the deepghs team for open-sourcing various training sets, image processing tools, and models.
License
This model is released under the same license as the base model.
See the base model for details of the CircleStone Labs Non-Commercial License.



