Experimental NSFW model based on Illustrious, finetuned on a curated dataset of furry art and e621 tags by incrementally training my own LoRAs and merging them into it.
The tags used for captioning were generated with the toynya/Z3D-E621-Convnext vision model.
Prompt with e621 tags, replacing underscores with spaces and putting a \ before every parenthesis, e.g. solo, anthro, male, outdoors, detailed background, shorts, fox, canid, digital media \(artwork\)
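If you are copying tags straight from e621, the formatting above can be automated. The snippet below is a minimal sketch, not part of the model or any official tooling; the function name `e621_tags_to_prompt` is made up for illustration, and it assumes the input tags are raw e621 tags with no parentheses escaped yet.

```python
def e621_tags_to_prompt(tags):
    """Turn raw e621 tags into a comma-separated prompt string.

    Hypothetical helper: assumes raw tags whose parentheses are not
    already escaped.
    """
    cleaned = []
    for tag in tags:
        # Underscores become spaces, e.g. detailed_background -> detailed background
        tag = tag.strip().replace("_", " ")
        # Put a backslash before each parenthesis so it is read literally
        tag = tag.replace("(", r"\(").replace(")", r"\)")
        if tag:
            cleaned.append(tag)
    return ", ".join(cleaned)


raw_tags = ["solo", "anthro", "detailed_background", "digital_media_(artwork)"]
print(e621_tags_to_prompt(raw_tags))
```

Tags whose underscores are part of the tag itself (emoticon-style tags, for example) may need to be handled by hand.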
For CFG I use a value between 6 and 7.5, with around 30-40 steps. The sampler should be Euler A, and I've found the best results with the Karras or beta scheduler.
The first version works best with longish lists of tags; aim for at least 77 tokens if possible.