[REYApping] How Going A Bit "GungHo" On the Network Ranks Could Help in Flux Training #Halloween2024



Hello and welcome to the second edition of REYApping, a space where I write a bunch of nonsense. Without further ado, let's begin.

It's been a while since I first wrote REYApping, and that one stayed true to its "nonsense ahh writing". This one, though, may be a bit different. I'm going to share a bit of my experience with one of my Flux LoRAs: Anitrait. Now, I'm not a good creator; in fact, I'm just a mere user who hates Flux's original anime style. That's why I decided to make Anitrait. With that in mind, you might think that the way I train LoRAs sucks and is bad practice, but whatever, you can roast me later.

Backstory

The first time I tried to make it was kinda hellish, since I got a lot of issues with deformity (especially bad anatomy and hands). I lost quite a lot of credits because I'm the type to "do first, think later". And that wasn't the only issue: there was also the LoRA not triggering when set to 1, and more.

After going broke from numerous training attempts, I sought help, and thankfully a great creator, Riiwa (you must know this person, and you might've even used the AI Tools), reached out to me and sent a link about someone's experience training Flux. The article is on CivitAI, written by "mnemic" (it's very easy to find on Google if you search for it). That person did something I'd never done: setting the network alpha higher than the dim. Very interesting. When I asked another great creator, NukeAI, about its effect, dude said "It makes your image fry, a.k.a. overfit quickly," or something along those lines. But me being the person I am, I decided to just go with it, and the result is actually interesting.

Testing

I want to compare 3 versions of Anitrait. All of them were trained on 50 portrait images generated with AnimagineXL 3.1, cropped to 1024x1024 manually in Photoshop, and captioned with the simple category-word approach (also found in the article by mnemic; check that part out, it's interesting). The versions I'll be comparing are Beta 2_E5, Beta 3_E6, and the real Beta 3 (I'll just call it B3). Here are the rest of the settings:

Beta 2_E5:

  • Network Dim: 64

  • Network Alpha: 32

  • LR Scheduler: Cosine

  • Optimizer: AdamW

Beta 3_E6:

  • Network Dim: 64

  • Network Alpha: 64

  • LR Scheduler: Cosine

  • Optimizer: AdamW

B3:

  • Network Dim: 64

  • Network Alpha: 128

  • LR Scheduler: Cosine

  • Optimizer: AdamW8bit (This one wasn't trained on Tensor, but I tried to match every setting except the optimizer, since regular AdamW isn't supported).
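
For context on why alpha matters: in the standard LoRA formulation, the learned update gets multiplied by alpha / dim before it's added to the base weights. I'm assuming the trainers used here follow that convention (I haven't checked their code), so take this as a rough sketch and not the actual training logic. The `lora_scale` helper is just a name I made up for illustration.

```python
# Dim/alpha values copied from the three settings lists above.
VERSIONS = {
    "Beta 2_E5": {"dim": 64, "alpha": 32},
    "Beta 3_E6": {"dim": 64, "alpha": 64},
    "B3": {"dim": 64, "alpha": 128},
}


def lora_scale(dim: int, alpha: float) -> float:
    """Multiplier on the LoRA update in the standard formulation (alpha / dim).

    Hypothetical helper: assumes the trainer uses the usual alpha / rank scaling.
    """
    return alpha / dim


for name, cfg in VERSIONS.items():
    print(f"{name}: scale = {lora_scale(cfg['dim'], cfg['alpha'])}")
```

So under that assumption, the three versions go from half strength (0.5), to neutral (1.0), to double (2.0) on the LoRA update, with everything else being equal.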

All of the models above were tested with a herbalist prompt that I found by clicking the dice button in Tensor's prompt box, all with the Euler-normal sampler, 25 steps, seed 1000001, LoRA weight 0.6, model Flux.1 Dev fp8, T5 fp8, no negative prompt, guidance 3.5, and clip skip 1 at 768 x 1152 resolution.

The prompt:
Within the whimsical realm of an anime style, fantasy themed herbalist shop, a serene herbalist stands amidst shelves stacked with ancient tomes and peculiar botanicals. The camera captures a stunning close-up of her gentle features: piercing light-colored eyes, a caring smile, and long hair framing her arcanist attire adorned with herbalist details. Her hands cradle a delicate glass vial filled with shimmering essence as she gazes directly at the viewer, inviting them into her mystical world. The soft, warm lighting emphasizes the intricate textures of her clothes and the lush greenery surrounding her. In the background, an immersive scenery unfolds, replete with symmetrical details and sharpened clarity, transporting the viewer to a realm of wonder and discovery.
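
The whole point of the setup is that only the LoRA changes between runs. Here's a small sketch of that idea, if you want to script the same comparison. The `GenSettings` class and its field names are hypothetical (they're not Tensor's API or any real tool's); only the values come from this post.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenSettings:
    # Hypothetical field names; the values are the ones listed in this post.
    lora: str
    sampler: str = "Euler-normal"
    steps: int = 25
    seed: int = 1000001
    lora_weight: float = 0.6
    guidance: float = 3.5
    clip_skip: int = 1
    width: int = 768
    height: int = 1152
    negative_prompt: str = ""


# One shared baseline, then swap only the LoRA for each run.
base = GenSettings(lora="Beta 2_E5")
runs = [replace(base, lora=name) for name in ("Beta 2_E5", "Beta 3_E6", "B3")]

for run in runs:
    print(run)
```

Keeping the seed and everything else frozen like this means any difference in the results should come from the LoRA itself.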

Result

Beta 2_E5

Beta 3_E6

B3

Discussion

Not gonna lie, all three versions seem good at interpreting the long af prompt, from the herbalist character, to the object she's holding, to her surroundings, etc. But when we're talking about details, there's quite a difference.

Beta 2_E5 has problems generating good fingers (by Flux standards). The face is also a little bit off, but the overall detail is richer than in the other 2 versions: from the more intricate clothing pattern to her more natural, plant-looking hair accessories.

Beta 3_E6 still has problems generating good fingers, but the overall face detail is better. Another problem to point out is that the glass bottle she's holding is deformed. But the overall detail is good, and I actually like the eye color here being yellow rather than green.

B3 has the best result, not only because it generates very good fingers and hands, but also because it's more vibrant, with not a deformity in sight. It's sharp, the face looks good, the proportions are good, and her chest somehow got a teeny bit bigger. The only problem is that the yellow thing on her shoulder is only generated on one side, but meh, I'll take it.

Conclusion

Going with a network alpha higher than the network dim may help improve image quality, and it may help fix certain anatomy issues. But this result needs more testing from other people, since it's only 1 case (one prompt, one seed) that saw a better result.

Thank you for reading this edition of REYApping. Any feedback is appreciated. See you in the next edition!
