1 https://tensor.art/articles/893578426995184161
2 https://tensor.art/articles/893801244529812650
3 https://tensor.art/articles/893811579294880467
4 https://tensor.art/articles/893820315258335399
5 https://tensor.art/articles/893848146646446019
7 https://tensor.art/articles/894196515738788839
8 https://tensor.art/articles/894202977517102361
To reduce this issue as much as possible,
the AB folder’s weight should be lower than that of the A and B folders.
The closer the weights are, the more confusion you’ll get.
In my tests, a 20%–40% weight for AB compared to A and B is a decent range.
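To make that ratio concrete, here is a minimal sketch of how you might estimate each folder’s effective weight, assuming a kohya-style dataset where a folder named “<repeats>_<name>” contributes repeats × image count samples per epoch. All the folder names, counts, and repeat values below are hypothetical, just for illustration:

```python
# Minimal sketch (my own illustration): estimate each folder's share of
# training samples per epoch, assuming a kohya-style "<repeats>_<name>"
# folder layout where a folder is sampled repeats * image_count times.
# The counts and repeat values here are hypothetical.
folders = {
    "A":  {"images": 25, "repeats": 8},   # solo images of character A
    "B":  {"images": 25, "repeats": 8},   # solo images of character B
    "AB": {"images": 15, "repeats": 3},   # duo images of A and B together
}

samples = {name: f["images"] * f["repeats"] for name, f in folders.items()}

print(samples)                                        # {'A': 200, 'B': 200, 'AB': 45}
print(f"AB vs A: {samples['AB'] / samples['A']:.1%}")  # 22.5%, inside the 20%-40% range
```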
Why a range instead of a fixed value?
Because here comes the trickier problem:
Different characters don’t learn at the same pace.
I mentioned this earlier: even solo characters vary in learning speed,
and in duo images that difference is amplified further,
which only adds to the confusion.
Too many training steps, and the traits get overblended.
Too few steps, and the model can’t tell them apart.
In both cases, the result is the same:
mixed traits, with no clear separation between the two characters.
This is just based on my personal testing, but here’s how to tell them apart:
If the characters are swapping traits like hair color or eye color,
it means you’ve trained too much.
If traits like face shape, hairstyle, or eye shape are merged into one,
then it means you’ve trained too little.
And please note—this is very important!—
There’s a minimum image count required, as mentioned earlier:
“A = B ≥ 20, AB ≥ 15.”
Based on testing, anything below this count makes weight ratios meaningless.
At that point, the model simply doesn’t have enough information to learn.
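If you want to sanity-check your folders before training, a tiny script like the one below will do; the dataset path and accepted file extensions are my own placeholder assumptions, and the thresholds are just the ones quoted above:

```python
from pathlib import Path

# Check folder sizes against the minimums quoted above: A = B >= 20, AB >= 15.
# The "dataset/" path and the extension list are placeholder assumptions.
MINIMUMS = {"A": 20, "B": 20, "AB": 15}
EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def count_images(folder: Path) -> int:
    return sum(1 for p in folder.iterdir() if p.suffix.lower() in EXTS)

counts = {name: count_images(Path("dataset") / name) for name in MINIMUMS}

for name, minimum in MINIMUMS.items():
    status = "ok" if counts[name] >= minimum else "below minimum, weight ratios become meaningless"
    print(f"{name}: {counts[name]} images (need >= {minimum}) -> {status}")

if counts["A"] != counts["B"]:
    print("note: A and B are unequal; the guideline above assumes A = B")
```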
Let me emphasize again:
If the characters you're trying to train always appear together in pairs,
and you have plenty of material—say, it’s easy to gather over 100 duo images—
then you don’t need to follow the method in this article at all.
Just throw in a few solo images alongside and train the LoRA normally.
The more material you have and the more diverse it is,
the better the AI can learn to distinguish characters across different contexts—
which drastically reduces the error rate.
This article is aimed at situations where you only have two solo characters,
and you lack enough duo images or even solo materials.
It’s a workaround approach when you’re dealing with limited data.
To put it another way:
If you have enough duo images, then you’re essentially performing a complete “2boys” concept replacement.
But if your material is lacking, you can only do a partial replacement.
Anyone into anime should find this metaphor easy to get😂.
Of course, even in a low-resource situation,
you can still create over 100 duo images using the earlier “stitching” method,
but yeah... in practice, it’s way more exhausting than it sounds.
This reminds me of something—
sorry, a bit of a mental jump here—
but when proofreading, I couldn’t think of a better place to put this,
so I’m going to include it here:
Why is it that for some checkpoints,
even if they don’t recognize a certain character,
you can load just that character’s solo LoRA and still get a correct result?
The answer is simple:
A lot of LoRA creators use auto-screenshot scripts,
or for other reasons end up with training sets that already include group images tagged “2boys,” “3boys,” and so on.
So if both A and B appear in the same anime screenshot,
then A’s LoRA, via the “2boys” tag and the eye/hair color tags of characters other than A,
has already “absorbed” parts of B’s hairstyle, hair color, and eye color.
Even if those duo images make up a small portion of the training data,
if B’s LoRA also happens to contain similar duo images,
then B ends up “learning” some of A’s features too—
and through this coincidence, you get something that behaves like a true “2boys LoRA.”
But if you use a character C LoRA from the same creator or the same batch,
and C’s dataset doesn’t contain similar duo images,
then generating AC or BC won’t work properly.
This kind of behavior makes it even clearer that the AI doesn’t truly “understand” that A and B are two separate boys.
What it’s really learning is that “2boys” + all those descriptive tags together represent a conjoined concept—
like Siamese twins.
That’s why I mentioned earlier:
you can use faceless images to generate training data—
by leveraging this exact phenomenon,
and removing all the extra distracting features,
you can create a clean character “silhouette” for the model to learn from.
Here comes another particularly tricky issue: tagging.
Based on the current logic of LoRA training—
“If you don’t add a tag, the AI learns it as an inherent (fixed) feature; if you do add a tag, it’s learned as a replaceable trait”—
when training a “1boy,”
even if you only use a single trigger word like “A” and don’t tag any other features,
as long as your dataset is large enough, the AI can still learn it properly.
The most common approach is to tag hair color and eye color.
If you use an auto-tagging tool, it might also include some hairstyle descriptors like “hair between eyes,” “sidelocks,”
as well as things like “freckles” and “fangs.”
Adding or omitting these more detailed tags doesn’t really make much of a difference for a “1boy” LoRA,
since you're only trying to generate one specific character—A.
But in duo-image training, weird things start to happen.
For example, take the tag “fang.”
If character A has fangs that show even when his mouth is closed,
and in the training tags for “1boy” you include the tag “fang,”
then the generated results will show those fangs clearly—
you’ll see them across various expressions.
If you don’t include the “fang” tag,
the AI will still sort of “guess” that thing might be a fang,
but when generating, the result will be vague—
sometimes it’ll just look like something stuck to the lips.
In these cases, if you add “fang” as a prompt at generation time,
the AI suddenly “gets it” and will output a clear fang.
Now, when you’re training on “AB” duo images—
if you don’t include the tag “fang,”
the AI will learn that part as an uncertain element.
Even if A has fangs, the generated result will be blurry around that area.
And at that point, you can’t just fix it by prompting “fang” during generation,
because B will also get affected by that tag.
But if you do include “fang” in the training tags,
then it gets learned as a replaceable feature—
not an inherent trait of A.
So if you don’t prompt “fang,” it won’t appear at all.
And if you do prompt it, B might still end up being influenced by it.
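To restate that trade-off in one place, here are two hypothetical captions for the same AB duo image; the trigger words, tags, and colors are made up, and the comments simply repeat the behavior described above:

```python
# Hypothetical captions for one AB duo image, restating the trade-off above.
# Trigger words "charA"/"charB" and the exact tag strings are made-up examples.

# Option 1: omit "fang" from the training caption.
#   -> fangs are learned as an uncertain feature of A and come out blurry,
#      and you can't fix it at generation time, because prompting "fang" also affects B.
caption_without_fang = "2boys, charA, black hair, red eyes, charB, blond hair, blue eyes"

# Option 2: include "fang" in the training caption.
#   -> fangs become a replaceable trait rather than an inherent trait of A:
#      leave it out of the prompt and the fangs disappear; prompt it and B may be affected too.
caption_with_fang = "2boys, charA, black hair, red eyes, fang, charB, blond hair, blue eyes"
```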
However!
Not all tags behave like this.
Let’s say character A has traits like “hair between eyes” or “sidelocks.”
If you include those in the training tags,
they can generally be locked to character A.
Even more interesting—these kinds of tags,
even if they aren’t included in the training tags,
can still be used at generation time,
and the AI will correctly recognize which character they belong to
and generate the right visual features.
But with thousands and thousands of tags,
how can you possibly know which ones can be “locked” and which are only “replaceable”?
You can’t.
And what’s more—this is just how Illustrious 2.0 behaves.
Other base models may handle tags differently.
After all, this is anime we’re dealing with—characters can have wildly diverse appearances.
By labeling those features, the AI can learn them in greater detail,
which helps produce clearer and more accurate generations.
But with “2boys,”
you can never be sure which tags will actually get “locked” to which character.
So in the end, the best approach is to not include those tags at all.
In other words:
you should label characters using only the character trigger + basic hair color + basic eye color,
in order to preserve each character’s unique features.
And ideally, don’t use only the character trigger word without any additional tags,
as this increases the chance of character blending.
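Put concretely, captions under this scheme might look something like the sketch below; the trigger words, colors, and file names are all hypothetical:

```python
# A minimal sketch of the captioning scheme recommended above:
# character trigger + basic hair color + basic eye color, nothing more.
# Trigger words, colors, and file names are made up for illustration.
captions = {
    # solo folders
    "A/img_001.txt":  "1boy, charA, black hair, red eyes",
    "B/img_001.txt":  "1boy, charB, blond hair, blue eyes",
    # duo folder: both characters described with the same minimal tag set
    "AB/img_001.txt": "2boys, charA, black hair, red eyes, charB, blond hair, blue eyes",
}

for path, text in captions.items():
    print(f"{path}: {text}")
```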
Let me explain this again:
Some multi-character LoRAs can still generate the correct character even when you only use the trigger word—
but that doesn’t necessarily mean the LoRA is well-trained.
From my own tests, I found that in most cases,
the reason is that the LoRA’s trigger words just happen to overlap with the checkpoint’s existing recognized characters.
This overlap doesn’t need to be exact—
even differences in first-name/last-name order still work.
In fact, even without loading the LoRA,
those trigger words can still be used with the base checkpoint alone to generate that character.
Some LoRAs only work correctly on specific checkpoints
because only that checkpoint has been trained to recognize that character.
I won’t bother including examples here.