A bit of my experience with making AI-generated images and LoRAs ( 3 )


Updated:

https://tensor.art/articles/868883505357024765 ( 1 )

https://tensor.art/articles/868883998204559176 ( 2 )

https://tensor.art/articles/868885754846123117 ( 4 )

https://tensor.art/articles/868890182957418586 ( 5 )

Alright, let’s talk about LoRA. So much of AI image generation really revolves around it.

 

But before that, I suppose it’s time for a bit of preamble again.

 

LoRA, in my view, is the most captivating technology in AI image generation. Those styles—whether they’re imitations or memes. Those characters—one girl in a hundred different outfits, or the body of that boy you’re madly in love with. A large part of the copyright debate surrounding AI actually stems from LoRA, though people who aren’t familiar with AI might not realize this. In reality, it has hurt many people—but it has also captured many hearts.

 

When you suddenly see an image of a boy that no one on any social media platform, in any language, is talking about—don’t you feel a sense of wonder? And when you find out that the image was created with LoRA, doesn’t your heart skip a beat?

 

By the time you’re reading this, my first LoRA for Ragun Kyoudai has already been released. From the moment I had even the slightest thought of making a LoRA, I was determined that they had to be the first—the absolute first.

 

But it wasn’t easy.

 

The full-color illustrations I saved of them as a kid? Gone, thanks to broken hard drives and lost phones. The images you can find online now are barely 200x300 in resolution, and there are painfully few of them. I still remember the composition and poses of every single color illustration from 20 years ago, but in the internet of 2024, they’ve completely disappeared.

 

All I had left were the manga and its covers, CDs, and cards.

 

Could it be done?

 

As I searched for LoRA training tutorials and prepared the dataset, more and more doubts formed in my mind. Because of the art style, these images didn’t contain accurate anatomical structures. There were no multi-angle views, and especially none from behind. Compared to datasets sourced from anime, mine felt pitifully incomplete.

 

Still, I nervously gave it a first try.

 

The result was surprising: the AI managed to reproduce the characters’ facial features quite well. But only in close-up shots. On the base model used for training, everything outside the face was completely unrecognizable. On other derivative models, the characters no longer resembled themselves at all.

 

So was it that AI couldn’t do it? Or was I the one who couldn’t? Or was it simply impossible to create a LoRA with such a flawed dataset?

 

I decided to set it aside for the time being, since with my limited experience, it was hard to make a solid judgment.

 

Later, while generating AI images, I began using LoRAs made by various creators. I wanted to know what differences existed between LoRAs—aside from the characters themselves.

 

I didn’t discover many differences, but I did notice a lot of recurring bugs. That’s when I realized—I’d found a lead. Maybe understanding the causes of these bugs is the key to improving LoRA training.

 

So let’s talk about it: What are these bugs? What do I think causes them? How can we minimize them during image generation? How can we reverse-engineer them to improve LoRA training?

 

Just to clarify—as you know, these experiences are only based on LoRAs of boy characters. Not girls, and not those overly bara-styled characters either.

 

1. Overexposure

2. Feminization

3. On the base model used to train the LoRA (e.g., Pony, Illustrious), it doesn’t work properly: prompts struggle to change character poses or expressions; it’s impossible to generate multi-angle images like side or front views; eyes remain blurry even in close-ups; body shapes are deformed; figures become flat like paper; body proportions fluctuate uncontrollably.

4. Because of the above, many LoRAs only work on very specific checkpoints.

5. Even on various derivative checkpoints, key features like the eyes are still missing; the character doesn’t look right, appears more feminine, character traits come and go; regardless of the clothing prompt used, the original costume features are always present.

6. Character blending: when using two character LoRAs, it’s hard to distinguish between them—let alone using more than two.

7. Artifacts: most notably, using a white background often results in messy, chaotic backgrounds, strange character silhouettes, and even random monsters from who-knows-where.

8. Sweat—and lots of sweat.

9. I haven’t thought of the rest yet. I’ll add more as I write.

 

All of these issues stem from one core cause: the training datasets used for LoRAs are almost never manually tagged.

 

Selecting and cropping the images for your dataset may take only 1% of the time spent. Setting the training parameters and clicking “train”? Barely worth mentioning.

 

The remaining 99% of the effort should go into manually tagging each and every image.

But in reality, most people only use an auto-tagger to label the images, then bulk-edit them to add the necessary trigger words or delete unnecessary ones. Very few go in and manually fix each tag. Even fewer take the time to add detailed, specific tags to each image.

AI will try to identify and learn every element in each image. When certain visual elements aren’t tagged, there’s a chance the AI will associate them with the tagged elements, blending them together.
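As a concrete sketch of what a proper review pass can look like, here is a small Python script. It assumes the common layout where each image has a sibling .txt caption of comma-separated Danbooru-style tags; the folder name, trigger word, and the "easily forgotten" tag groups are purely hypothetical examples, not a fixed list.

```python
# Minimal sketch: review auto-tagger captions before training.
# Assumes one comma-separated .txt caption per image (kohya-style layout).
from pathlib import Path

DATASET_DIR = Path("dataset/20_mychar")   # hypothetical folder
TRIGGER = "mychar"                        # hypothetical trigger word
OFTEN_MISSING = {
    "lighting":   {"backlighting", "rim lighting", "dim lighting", "sunlight"},
    "camera":     {"from side", "from behind", "from above", "from below"},
    "background": {"white background", "simple background", "outdoors", "indoors"},
}

for caption_file in sorted(DATASET_DIR.glob("*.txt")):
    tags = [t.strip() for t in caption_file.read_text(encoding="utf-8").split(",") if t.strip()]

    # The bulk step most people stop at: make sure the trigger word is present.
    if TRIGGER not in tags:
        tags.insert(0, TRIGGER)
        caption_file.write_text(", ".join(tags), encoding="utf-8")

    # The manual-review step: report which tag groups this caption never mentions,
    # so you know where the AI is likely to bake untagged elements into the LoRA.
    missing = [group for group, examples in OFTEN_MISSING.items()
               if not examples.intersection(tags)]
    if missing:
        print(f"{caption_file.name}: no {', '.join(missing)} tags -> review by hand")
```

The script only flags gaps; the actual fix is still opening each caption and describing what is really in the image.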

The most severe case of this kind of contamination happens with white backgrounds.

You spent so much effort capturing, cropping, cleaning, and processing animation frames or generating OC images. When you finally finish training a LoRA and it works, you’re overjoyed. Those “small bugs” don’t seem to matter.

But as you keep using it, they bother you more and more.

So you go back and create a larger dataset. You set repeats to 20, raise epochs to 30, hoping the AI will learn the character more thoroughly.

But is the result really what you wanted?

After pouring in so much effort and time, you might have no choice but to tell yourself, “This is the result I was aiming for.”

Yet the overexposure is worse. The feminization is worse. There are more artifacts. The characters resemble themselves even less.

Why?

Because the untagged elements from the training images become more deeply ingrained in the model through overfitting.
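To get a feel for how much reinforcement that means, here is a rough back-of-the-envelope count of training steps. Every number below is an illustrative assumption, not a recommendation.

```python
# Rough step count for a "repeats 20, epochs 30" run on a made-up dataset.
images = 60        # assumed number of images in the dataset
repeats = 20       # how many times each image is shown per epoch
epochs = 30
batch_size = 2     # assumed batch size

steps_per_epoch = images * repeats // batch_size
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)   # 600 steps per epoch, 18000 steps total
```

Each image is seen 20 × 30 = 600 times, and every untagged highlight, white background, or bead of sweat in it is reinforced 600 times along with it.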

So now it makes sense:

Why there's always overexposure: modern anime tends to overuse highlights, and your dataset probably lacks any tag information about lighting.

Why it's so hard to generate multi-angle shots, and why character sizes fluctuate wildly: because your dataset lacks tags related to camera position and angle.

Why the character becomes more feminine: perhaps your tags inadvertently included terms like 1girl or ambiguous gender.

Why certain actions or poses can't be generated: because tags describing body movement are missing, and the few that exist are overfitted and rigid.

In short:

Elements that are tagged get learned as swappable; elements that are untagged get learned as fixed.

That may sound counterintuitive or even go against common sense—but it’s the truth.
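To make that concrete, here is a made-up caption with notes on which parts end up swappable and which end up baked in. The character and tags are purely illustrative.

```python
# Illustration of "tagged = swappable, untagged = fixed", using a made-up caption.
# Suppose the training image shows the character with a cheek scar, shark teeth,
# harsh highlights, visible sweat, and a white background, but the caption only says:
caption = "mychar, 1boy, blue eyes, black hair, scar on cheek, shark teeth, white shirt"

# Tagged elements are tied to their tags: at generation time they can be overridden
# by other prompts, and they can bleed between LoRAs that share the same tag vocabulary.
swappable = {"blue eyes", "black hair", "scar on cheek", "shark teeth", "white shirt"}

# Untagged elements have no tag of their own, so the trainer attributes them to
# whatever IS tagged (most often the trigger word) and they show up in every generation.
baked_in = {"harsh highlights", "sweat", "white background"}
```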

This also explains why two character LoRAs used together often blend: tags for traits like eye color, hair color, and hairstyle, and even tiny details like streaks, bangs, short ponytails, facial scars, and shark teeth, are all written out in detail, and the more detailed the tags, the more the LoRAs influence each other. The AI has learned those traits as swappable, not as inherent to the character.

And no matter what clothing prompts you use, the same patterns from the original outfit keep showing up—because those patterns were learned under the clothes tag, which the AI considers separate and constant.

LoRAs that are overfitted also tend to compete with each other over the same trigger words, fighting for influence.

So, from a usage perspective, some of these bugs can be minimized.

Things like overexposure, feminization, sweat—if you don’t want them, include them in your negative prompts.

For elements like lighting, camera type, and viewing angle—think carefully about your composition, refer to Danbooru-style tags, describe these elements clearly and include them in your positive prompts.
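As a hypothetical example of such a prompt pair (the tags below are only illustrative, not a fixed recipe, and "mychar" stands in for your LoRA's trigger word):

```python
# Example positive/negative prompts for working around these bugs at generation time.
positive = (
    "mychar, 1boy, solo, upper body, from side, looking at viewer, "
    "soft lighting, outdoors, park, casual clothes, hands in pockets"
)
negative = (
    "1girl, overexposure, blown highlights, sweat, "
    "extra limbs, deformed hands"
)
```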

Also, make sure to use more effective samplers, as mentioned earlier.

Use LoRAs that enhance detail but don’t interfere with style—such as NoobAI-XL Detailer. Hand-fixing LoRAs aren’t always effective, and it’s best not to stack too many together.

One final reminder: you usually don’t need to add quality-related prompts. Just follow the guidance provided on the checkpoint’s official page.
