RouWei


# Large-scale finetune of Illustrious with state-of-the-art techniques and performance

(tl;dr: works exactly as it should, without the flaws you might encounter in other checkpoints.)

## Key advantages:

* Easy and convenient prompting

* Great aesthetic, anatomy, stability along with versatility

* Vibrant colors and smooth gradients without trace of burning

* Full brightness range even with epsilon

* 22k+ artist styles, many general styles, almost any character

## In addition to the above, compared with vanilla Illustrious and NoobAI:

* No more annoying watermarks

* No tag bleed ([1](https://files.catbox.moe/4y2851.jpg), [2](https://files.catbox.moe/fpuo7l.jpg))

* No character bleed and related side effects (unwanted outfits, style, composition changes)

* No spawning of strange creatures, sfx in the background, or an extra pair of breasts ([1](https://files.catbox.moe/hwobss.jpg), [2](https://files.catbox.moe/gthl5w.jpg))

* Better coherence ([1](https://files.catbox.moe/lj53dr.png), [2](https://files.catbox.moe/3pcnku.png)), prompt following, and anatomy (a significant boost over Illustrious, slight or negligible over Noob)

* Artist styles look exactly as they should (and lots of new ones added)

* Better prompt following without ignored tags or the need for (higher weights:1.4)

* Forget about long schizo-negatives

* Stable style without random fluctuations on different seeds

* New characters

A large, well-balanced dataset of 4.5M pictures (0.8M with natural-text captions) picked from over 12M different artworks, a significantly reworked TE and parts of the UNet, and innovative training approaches, all in combination with a great base model (despite its variety of problems, Illustrious is currently the best base for anime), made it possible to create a checkpoint that meets modern demands and shows unique results.

Dataset cut-off: September 2024.

# Features and prompting:

It works well both with short, simple prompts and with long, complex ones. However, contradictory or weird tags and concepts won't be ignored and will affect the output. No guide-rails, no safeguards, no lobotomy; consider pruning schizo-prompts.

The dataset contains only booru-style tags and (simplified) natural-text expressions. Despite the dataset having a share of furry art, all captions have been converted to classic booru style to avoid a number of problems that can arise when mixing different tagging systems. So e621 tags won't be understood properly.

## Basic:

~1 megapixel for txt2img, any aspect ratio with resolutions that are multiples of 64 (1024x1024, 1152x, 1216x832, ...). Euler_a, CFG 4..9 (5..7 is best), 20..28 steps. Multiplying sigmas may improve results a bit; LCM/PCM and exotic samplers are untested. Hires fix: x1.5 latent with denoise 0.6, or any GAN upscaler with denoise 0.3..0.55.
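The resolution rule above (any aspect ratio, both sides a multiple of 64, around 1 megapixel total) can be sketched as a small helper. This is purely illustrative; the function name and the 10% area tolerance are assumptions, not part of any tool:

```python
def valid_resolutions(target_mp=1.0, step=64, tolerance=0.1):
    """List (width, height) pairs near ~1 megapixel where both sides
    are multiples of 64, as recommended for txt2img."""
    target = target_mp * 1024 * 1024
    sizes = []
    for w in range(512, 2049, step):
        for h in range(512, 2049, step):
            # keep pairs whose pixel count is within `tolerance` of the target
            if abs(w * h - target) / target <= tolerance:
                sizes.append((w, h))
    return sizes
```

For example, both 1024x1024 and 1216x832 satisfy the rule, while off-grid sizes like 1000x1000 do not.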

## Quality classification:

Only 4 quality tags:

```

masterpiece, best quality, low quality, worst quality

```

Nothing else. Meta tags like lowres have been removed; do not use them. Low-resolution images have been either removed or upscaled and cleaned with DAT, depending on their importance.

## Negative prompt:

```

worst quality, low quality, watermark

```

That's all; no need for "rusty trombone", "farting on prey" and the like. Do not put tags like greyscale or monochrome in the negative unless you understand what you are doing; it will lead to burning and over-saturation. Colors are fine out of the box.
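Putting the two sections above together, a minimal prompt builder looks like this. The helper name and structure are illustrative only; it simply concatenates the recommended quality tags and the minimal negative:

```python
def build_prompts(subject_tags, artist=None):
    """Assemble positive/negative prompts from the recommended
    quality tags and the minimal default negative."""
    positive = ["masterpiece", "best quality"]
    if artist:
        positive.append(f"by {artist}")  # artist tags require the "by " prefix
    positive.extend(subject_tags)
    negative = ["worst quality", "low quality", "watermark"]
    return ", ".join(positive), ", ".join(negative)
```

Usage: `build_prompts(["1girl", "fox ears"], artist="some_artist")` yields a positive prompt starting with the quality tags and the short three-tag negative, with nothing schizo appended.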

## Artist styles:

[Grids with examples](https://mega.nz/folder/ATYVQbKI#JZOo3_alb9NhZPaTsIVv7g), [list](https://files.catbox.moe/ujxum5.txt) (also can be found in "training data").

Used with "by " it's mandatory. Multiple give very interesting results, can be controlled with prompt weights.

## General styles:

```

2.5d, anime screencap, bold line, sketch, cgi, digital painting, flat colors, smooth shading, minimalistic, ink style, oil style, pastel style

```

## Booru tags styles:

```

1950s (style), 1960s (style), 1970s (style), 1980s (style), 1990s (style), animification, art nouveau, pinup (style), toon (style), western comics (style), nihonga, shikishi, minimalism, fine art parody

```

and everything from [this group](https://danbooru.donmai.us/wiki_pages/traditional_media).

They can be used in combinations (with artists too) and with weights, in both positive and negative prompts.

## Characters:

Use the full-name booru tag with proper formatting, e.g. "karin_(blue_archive)" -> "karin \(blue_archive\)", and use skin tags for better reproduction, e.g. "karin \(bunny\) \(blue_archive\)". An autocomplete extension can be very useful.

## Natural text:

Use it in combination with booru tags (works great), use only natural text after typing the style and quality tags, or stick with booru tags alone and forget about it; it's all up to you.

The dataset contains over 800k pictures with hybrid natural-text captions made by Opus-Vision, GPT-4o and [ToriiGate](https://huggingface.co/Minthy/ToriiGate-v0.3).

## Lots of Tail/Ears-related concepts:

```

tail censor, holding own tail, hugging own tail, holding another's tail, tail grab, tail raised, tail down, ears down, hand on own ear, tail around own leg, tail around penis, tail through clothes, tail under clothes, lifted by tail, tail biting, tail insertion, tail masturbation, holding with tail, ...

```

# Brightness/colors/contrast:

You can use extra meta tags to control it:

```

low brightness, high brightness, low gamma, high gamma, sharp colors, soft colors, hdr, sdr, limited range

```

[Example](https://files.catbox.moe/lcujwk.jpg)

They work in both the epsilon and vpred versions, and they work really well.

Unfortunately, there is an issue: the model relies on them too much. Without low brightness or low gamma, or limited range in the negative, it can be difficult to achieve true 0,0,0 black; the same is often true for white.

Both the epsilon and vpred versions have true ZTSNR and the full range of colors and brightness, without the common flaws. But they behave differently; just try both.

# Vpred version

It is experimental. There is something wrong with token padding (probably) in the vpred version, either in the model or on the inference side. If you get broken, washed-out pictures [like this](https://files.catbox.moe/m46qj5.png), put BREAK somewhere in the prompt. This does not happen on dark or bright pictures; to be investigated. Or just use the epsilon version, which already provides full range and a great experience.

Otherwise, at the moment of release, this is probably the only vpred model that runs fine out of the box and doesn't suffer from burned colors, limited range, or the need for extra tweaks, rescales, adjustments and so on (default parameters: [1](https://files.catbox.moe/1w369r.jpg), [2](https://files.catbox.moe/2lgy3n.png); CFG rescale: [1](https://files.catbox.moe/jq5yaj.jpg), [2](https://files.catbox.moe/7dxunf.png), [3](https://files.catbox.moe/doeiqy.png)). It even tends to show NAI3-like behavior, with wrong skin colors and large fill-ups of red/yellow/blue under specific prompts. Full experience lmao.

To launch the vpred version you will need a dev build of A1111, Comfy (with a special loader node), or Reforge. Use the same parameters as epsilon (Euler a, CFG 5..7, 20..28 steps). CFG rescale is not mandatory, but you can try it and decide if you like the results.
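If you instead run the checkpoint through a diffusers-style pipeline, the essential step is telling the scheduler the model predicts v rather than epsilon. A minimal sketch of the override, assuming a diffusers-style scheduler config (these field names follow diffusers conventions and are not something shipped with this model):

```python
# Assumption: a diffusers-style scheduler config; field names follow
# diffusers conventions, not anything provided by the model author.
VPRED_SCHEDULER_OVERRIDES = {
    "prediction_type": "v_prediction",   # interpret model output as v, not epsilon
    "rescale_betas_zero_snr": True,      # zero-terminal-SNR for full brightness range
    "timestep_spacing": "trailing",      # commonly paired with ZTSNR schedules
}
```

These would typically be passed when reconstructing the scheduler, e.g. via something like `Scheduler.from_config(pipe.scheduler.config, **VPRED_SCHEDULER_OVERRIDES)`.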

As mentioned above, to get a full black or full white fill you will need to write a prompt longer than a single tag or use the brightness meta tags.

# Known issues:

Of course, there are:

* As mentioned, the model relies too heavily on brightness meta tags, so you'll have to use them to get full performance

* The vpred version has problems with chunk padding or something similar; solved with BREAK

* Inferior in furry-related knowledge compared to NoobAI

* Some cherry-picked character datasets have prompting issues - Yozora and a few cute fox-girls are not consistent

* A small finetune or LoRA to polish fine details would be nice; that's up to the community

* To be discovered

### Requests for artists/characters in future models are open. If you find an artist/character/concept that performs weakly or inaccurately, or has a strong watermark, please report it and they will be added explicitly. Follow for new versions.

## [JOIN THE DISCORD SERVER](https://discord.gg/ZXHENAhqE9)

## License:

Same as Illustrious. Feel free to use it in your merges, finetunes, etc. Just please leave a link.

## How it's made

I'll consider making a report or something like it later.

In short, 98% of the work went into dataset preparation. Instead of blindly relying on loss weighting based on tag frequency from the NAI paper, a custom guided loss-weighting implementation was used, along with an asynchronous collator for balancing. ZTSNR (or close to it) with epsilon prediction was achieved using noise-scheduler augmentation.

# Thanks:

First of all, I'd like to acknowledge everyone who supports open source and develops and improves code. Thanks to the authors of Illustrious for releasing the model, and thanks to the NoobAI team for being pioneers in open finetuning at this scale, sharing their experience, and raising and solving issues that previously went unnoticed.

### Personal:

Artists who wish to remain anonymous for sharing private works; Soviet Cat - GPU sponsoring; Sv1. - LLM access, captioning, code; K. - training code; Bakariso - datasets, testing, advice, insights; NeuroSenko - donations, testing, code; T.,[] - datasets, testing, advice; rred, dga, Fi., ello - donations; and other fellow brothers who helped. Love you so much ❤️.

And of course, everyone who gave feedback and made requests; it's really valuable.

If I forgot to mention anyone, please let me know.

## Donations

If you want to support - share my models, leave feedback, make a cute picture with a kemonomimi-girl. And of course, support original artists.

AI is my hobby; I spend my own money on it and am not begging for donations. However, it has turned into a large-scale and expensive undertaking. Consider supporting to accelerate new training and research.

(Just keep in mind that I can waste it on alcohol or cosplay girls)

BTC: bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c

ETH/USDT(e): 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db

If you can offer GPU time (A100+), PM me.


## Project Permissions

Model reprinted from: https://civitai.com/models/950531/rouwei

Reprinted models are for communication and learning purposes only, not for commercial use. Original authors can contact us to transfer the models through our Discord channel --- #claim-models.
