kitakaze

純愛戦士 pixiv: users/2139800

280 Followers · 347 Following · 5.9K Runs · 42 Downloads · 2.3K Likes · 271 Stars

Articles
( Chinese 4, final ) My Second Note on Experiences, About "2boys" ( Chinese 4, final )


Chinese 1: https://tensor.art/articles/894212229950384481
Chinese 2: https://tensor.art/articles/894214741432519505
Chinese 3: https://tensor.art/articles/894475285959803996

So, following my approach, you can only really make characters with fairly simple designs; for more complex designs I have not yet found a better way to guarantee generation quality. For now the only remedy is to raise the batch size and overfit a little, which helps somewhat. To repeat once more: as long as you have enough source images everything improves dramatically and you do not need this article's method at all; it only applies when material is scarce.

That is basically my whole approach and workflow for a "2boys LoRA". If the first one you train turns out poorly, use the flaws described earlier to make a rough judgment about what is missing. You can usually improve it and retrain with the following adjustments:

1. Most important: increase the number of "2boys" images in the training set. If each solo character has 20 images while the "2boys" images far outnumber them, several times over or even above 100, then even throwing everything into one folder with the same repeat gives fairly good results. Only when the ratio is reversed and "2boys" material is scarce does this article's method start to pay off. One extra note: in my tests, putting different characters into separate folders always beat putting them all into one folder, and I cannot yet explain why. Also note that a folder's name can act as a trigger word. In comparison tests, when training a "1boy" you can use the folder name as the trigger word and omit it from the tags; when training "2boys", drop that approach, give the folder a name unrelated to any trigger word, and never let the folder name overlap with the trigger words in your tags.
2. Adjust the repeats: judging from the generated results, raise or lower the solo or duo step counts.
3. Work out which feature tags can be "locked", re-tag, and try again.
4. Switch to a different random seed. The seed is part superstition anyway; however you tweak points 1 to 3, the rough trajectory of the loss curve for the same dataset under the same seed stays similar, so a different seed might be better or might be worse.
5. Fine-tune the optimizer parameters, or try a different optimizer.
6. That is the limit of what I know; the rest is up to your own exploration.

One last special case: regularization. When training solo LoRAs I tried adding a large number of "2boys" images to the regularization set and enabling regularization training. In theory, solo LoRAs trained this way should blend less when combined, but whether because of my images, my tags or some parameter, the results were poor, so I abandoned the idea.

A small aside: a friend jokingly asked why I go to all this trouble to build an AI model instead of drawing it myself or commissioning someone. Well, if I could draw, and if I were extremely rich, would I even need commissions? I have to say that the progress of AI imaging and the practical use of LoRA have given people with no drawing skill the chance to create fan works.

OK, that basically wraps up the "2boys LoRA" side of things. You have surely noticed that I keep stressing this: the approach only pays off for unpopular pairings, or outright crack ships, that lack duo material. With plenty of material you do not need it; one or two hundred duo images will always give a better result. Still, the approach is not useless. It comes with a very unexpected bonus: you can build LoRAs with three or even four characters (more than that is still being tested and not guaranteed) and, at generation time, get any two-character combination and even three-character ones. Four characters already exceeds what the checkpoint can support, so results are poor. For a favorite "2boys" pairing a pile of material is often findable, but for three or four characters it is not. So here is the workflow for a "3boys" or "4boys" LoRA:

1. The more solo material the better, to give you choice. If there is not enough, train a solo LoRA for each character first, generate high-quality images, and keep the flawless ones. You do not even need to prepare "2boys" images. As described earlier, stitch the solo images together so that every pair ends up with the same number of images, each above 15, e.g. "AB = AC = BC > 15".
2. Put "A", "B", "C", "AB", "AC", "BC" into separate folders and set repeats as described earlier, targeting roughly 800-1200 solo steps with batch size = 1. Compute the repeats so that A = B = C and AB = AC = BC, where the total steps of all duo images containing a given character come to about 20-40% of that character's solo steps (the error is larger here, treat it as a rough guide). For example, with 15 "AB" and 15 "AC" images (30 in total), 20 "A" images and epoch = 12: 20 × 12 × 5 = 1200; 1200 × 0.4 = 480; 480 / 12 = 40; 40 / 30 rounds to 1. So A = B = C get repeat = 5 and AB = AC = BC get repeat = 1. Note that the 40% of steps is computed from the total number of duo images containing the same character, that is, "AB" and "AC" combined for "A", not the count of "AB" or "AC" alone.
3. Yes, no three-person images are needed. Tag the dataset above and start training.
4. The first result may fall short; adjust it with the methods introduced earlier. Let me expand a little. Following the earlier analysis, once a third variable "C" is added, "A" becomes part of two "conjoined twins" and may carry partial information from both "B" and "C"; above all, "A" itself may be treated as part of a "2boys" concept. When you then generate duo or trio images, "A" may count as two "2boys" and "B" may count as two "2boys"; with the weights stacked, a duo image effectively layers four "2boys", which makes extra characters and confusion more likely, and a trio image, stacking three times over, is worse still. Because of the checkpoint's own limits and the resolution, the "double weight" of "2boys" does not mean you get "4boys"; instead, everything that was learned into "2boys" through the conjoined-twin phenomenon gets displayed. And because character trigger words like "A" carry a bit more weight from the solo images, the extra characters that appear in "2boys" generations are usually the very characters named in the trigger words; with "A+B", for instance, two figures resembling A and B may show up in the background, and lowering the resolution or adjusting the prompt removes them. "3boys" is then like secretly hiding "6boys": since the checkpoint has no real "3boys" support, the confusion multiplies and can only be reined in by reducing the resolution. With "A+B+C", every solo trigger word carries "2boys" information, so you get not only extra characters but also more jumbled features, as if three characters were picked "at random" from AABBCC and dropped onto a "three-person conjoined" outline. In theory, a large number of 3boys images would teach the LoRA what "3boys" means and improve things a lot, but where would that many trio images come from? Stitching a hundred of them is a headache just to think about. Still, if my understanding is right, with enough "3boys" images you would not need any other images at all; feature tags alone should let you generate any solo or duo combination. This also explains why duo steps are calculated from the number of duo images containing the same character: in my limited tests, calculating from the count of "AB" alone made the proportion too large, so "A" picked up more conjoined-twin content from "AC" (likewise for "B" and "C"), producing more confusion.
5. In the same way, prepare solo images plus equal numbers of duo images for every pair among the four characters to make a "4boys LoRA"; steps and repeats are again calculated from the total duo images per character. But be prepared: getting a good result out of "3boys" or "4boys" may take several rounds of adjustment.

A final reminder: the LoRA training method in this article is based on Illustrious 2.0. The little bit of application described here is a drop in the ocean of AI image and video generation; there is far too much unknown, far too much new to encounter and learn, and this article's shelf life is probably very short. After all these long-winded paragraphs we are finally near the end. I apologize that the wording is a bit convoluted; my method is hardly polished, my understanding of the real underlying principles is surely wrong, and it is only an intuition about the phenomena I observed. There is far too little discussion of multi-character LoRA online, and almost none about boys; everyone is groping along alone in the dark. Whether you treat my experience as a reference or as a cautionary tale, I hope it helps someone who needs it. Feel free to leave comments and questions, or message me on Civitai and Pixiv; my ID on both is "kitakaze5691".

Many of you have made handsome, adorable OCs of your own, and many of the solo images are genuinely impressive. Naturally, some have tried loading these OC LoRAs together to make duo images, only to get chaos every time. Allow me a small moment of pride: this article not only explains why that happens, it also offers a plan that can truly bring those OCs together. I hope I get to see these lovely OCs interacting closely someday while still "staying themselves"~

Before I knew it, almost a year has passed since I first touched AI image generation. In that time I have come to know many once-strangers, and your support and encouragement make me feel like a teenager again, that curious, passionate 14-year-old who never stopped exploring. Every time I open a notification and see a familiar ID I am so happy; it has been a long time since my depression let me feel something this comforting. Thank you. I will keep sharing new images, new LoRAs and new lessons learned, and I hope they bring you a little joy. See you next time~
( Chinese 3 ) My Second Note on Experiences, About "2boys" ( Chinese 3 )


Chinese 1: https://tensor.art/articles/894212229950384481
Chinese 2: https://tensor.art/articles/894214741432519505
Chinese 4: https://tensor.art/articles/894482801078873091

A quick note here: because I could never get Kohya to deploy successfully in my environment, I use a third-party GUI. The settings should be roughly the same; I compared them with Tensor's online settings, which are just simplified a bit, and everything that matters is still there.

I tested many optimizers; under my approach, my environment and my dataset setup, the default settings of "Prodigy" worked best by comparison. That is not absolute, so adjust according to your own experience.

I recommend always enabling gradient checkpointing. Gradient accumulation steps, however, are not really needed: if your resolution and dim do not exceed the VRAM limit there is no reason to enable it, and if they do exceed it, enabling it does not help much anyway. Also, if you do enable it, the field should hold the value you would have used for batch size, while batch size itself stays at 1; simply put, gradient accumulation simulates a larger batch size, and the effective value is the product of the two.

A quick summary of my "2boys LoRA" workflow so far:
1. Prepare the source material, train solo LoRAs, and use them to generate higher-quality solo images.
2. Stitch the generated solo images into duo images; good-quality solo images from the original material can be stitched too.
3. Put "A", "B" and "AB" into separate folders and calculate repeats from the image counts and the target steps (around 800-1200). "A" and "B" get the same repeat; "AB" steps are about 20%-40% of the solo steps, and its repeat is computed from that.
4. Tag the images, minding tag order; stitched images need specific tags such as "split screen", "collage", "two-tone background", etc. More on this step later.
5. Set the training parameters; epoch should be around 10-16, and 10 is usually plenty. Start training and watch the loss curve and the preview images.

I have to throw some cold water here: after finishing the steps above you might get a passable LoRA, but more likely you will not get a satisfying "2boys LoRA". What follows may be even more long-winded, but I think these analyses are worth writing out for understanding and comparison, and I will try to keep the logic clear. As said before, generating with a duo LoRA runs into many, many common errors; here is my shallow analysis of the causes, the countermeasures, and the problems that cannot be solved.

As mentioned earlier, the AI's current learning logic in LoRA training does not treat "A, black hair, red eyes" as an actual boy; it treats "1boy = A = black hair = red eyes" as one bundled concept. When all of those prompts appear together, the concept is generated in full; when only some of them appear, their strong mutual association, especially through "1boy", means partial features still come along even if you do not write everything. That is why a LoRA containing only "A" and "B" solo images cannot generate a correct duo image: "1boy" contains both A's and B's features. Likewise, with only "AB" duo images, "2boys = A + black hair + red eyes + B + blue hair + grey eyes", and "A" or "B" each carry part of the "2boys" information. Notice that I used "+" here instead of the "=" from the solo case: my reading after testing is that "1boy" involves few prompts, so those features collapse completely into one, whereas for "2boys" the AI does not understand that there are two different boys; it is more like spreading the features over "a blank sheet of paper" shaped like two humanoid outlines, and at generation time "A, black hair, red eyes, B, blue hair, grey eyes" get painted onto those outlines at random.

Even though "2boys" lacks native training in every checkpoint, the checkpoint's weight still dominates the LoRA when it is loaded. In testing, whether with solo or duo images, if you do not set up "A", "B" and "AB" to "collect together" the character's own untagged features, such as face shape and skin tone, then the checkpoint's own "1boy" and "2boys" information interferes heavily and the model simply "fails to learn". So in a "2boys LoRA" you need these character trigger words. Notice the "paradox" here: in solo images "A" contains all of A's features, but in duo images "A" is not entirely equal to A's features; "A" is actually associated with all of "2boys + A + black hair + red eyes + B + blue hair + grey eyes". So when generating "2boys", the triggers "A" or "B" amount to a "critical hit" double trigger: a definite "A" stacked on an indefinite "A", plus a definite "B" and an indefinite "B". That is why duo LoRAs always confuse features or produce extra characters, and it is not only because the checkpoint itself supports "2boys" poorly.

To ease this as much as possible, the weight of "AB" in the training set must be lower than that of "A" and "B" themselves; the closer the weights, the more confusion. In my tests, 20%-40% is a relatively suitable range. Why a range? Because an even trickier problem appears: different characters fit at different rates! I touched on this earlier; solo characters already learn at different speeds, and in their duo images that difference amplifies the confusion. Too many steps and they get learned mixed together; too few and they never get separated. Yes, the final result looks the same either way: blended features, two characters not told apart. Purely from my own testing, here is how to tell the two cases apart: if features are merely swapped, say hair and eye color exchanged, you trained too much; if features such as face shape, hairstyle or eye shape are fused together, you trained too little. Please note, note, note: this assumes the minimum image counts from earlier, "A = B ≥ 20, AB ≥ 15". Below those counts, my tests show it no longer has anything to do with the weight ratio; it is simply a lack of information, and the model cannot learn.

Let me state it again: "If the characters you want to make always appear as a pair and you have plenty of material, easily more than 100 duo images, then you do not need to follow this article's approach at all; just add a few solo images and train normally. The more and the richer the material, the more the AI learns about different contexts, so the characters are easier to tell apart and the error rate drops sharply. This article is about a last-resort approach for two solo characters lacking duo shots, or even lacking solo material." Put another way, with enough duo images you complete a full "2boys" concept replacement; with scarce material you only manage a partial replacement. Anyone into anime should find that metaphor easy to get, lol. Of course, with scarce material you could still make 100+ images through the "stitching" method described earlier; it is just far more tiring in practice than it sounds.

Speaking of which, something comes to mind; sorry for the jumpy thinking, but while proofreading I could not find a better place for this paragraph. Why is it that for some characters a checkpoint does not recognize, simply loading their solo LoRAs together can still generate them correctly? The answer is simple: many LoRA makers use automatic screenshot scripts, or for other reasons their training sets already contain "2boys", "3boys" and similar images. If the same anime screenshot contains both A and B, then A's LoRA, via "2boys" and the hair and eye color tags that are not A's, has already "replaced in" the concept of B's hairstyle, hair color and eye color. Such images are a small share of the set, but if B's training set happens to contain similar images, "B" carries some of "A"'s features too, and that coincidence produces something like a single "2boys" LoRA. If a character C was made by the same author in the same batch but C's training set has no such "2boys" images, then AC or BC will not generate correctly. This phenomenon makes it easier to see that the AI does not understand two different boys; rather, "2boys" and those feature prompts together form a "conjoined twins" concept. The earlier suggestion of using faceless images for the training set exploits exactly this, stripping out extraneous factors to create a clean "outline".

Another very thorny problem: tagging. Under the current LoRA training principle that "what you do not tag is learned as an inherent feature, what you tag is learned as a replaceable feature", when training a "1boy" you can use just the single trigger word "A" and tag nothing else, and with enough material the AI still learns it. Most commonly you tag hair color and eye color; auto-tagging may also add hairstyle descriptions such as "hair between eyes" and "sidelocks", plus things like freckles and fangs. Whether you add these more detailed tags or not makes little difference for a "1boy", since you only ever generate that one character, A.

In duo-image training, though, strange things happen. Take "fang" as an example. If "A" has a fang that shows even with his mouth closed, then in "1boy" training adding "fang" to the training tags gives better results when you prompt "fang", with the fang clearly visible across all kinds of expressions. Without the tag, the AI still "recognizes" that this part might be a fang and it still appears, but it is blurry across expressions, sometimes just some thing stuck to the lips; prompting "fang" then makes the AI seem to realize that this thing really is a fang, and it renders it clearly. When training "AB", if you do not write "fang", the AI learns it in that "uncertain" state: although "A" has a fang, the area comes out blurry, and you cannot fix it by prompting "fang", because "B" would be affected by the prompt too. If you do add "fang" to the training tags, it is learned as a replaceable "fang" rather than an inherent feature of "A": without the prompt it never appears, and with the prompt it still affects "B". But, and here is the thing, not every such tag refuses to be fixed: if "A" has "hair between eyes" or "sidelocks", adding those to the training tags basically locks them to "A". Even more curiously, such tags can be used at generation time even when they were never tagged, and the AI will recognize the right character and produce the right feature. With thousands upon thousands of tags, there is no way to know which can be "locked" and which are merely "replaceable", and this is only how Illustrious 2.0 behaves; other base models may differ. This is anime after all: characters have endlessly varied appearances, and tagging diverse features lets the AI learn them in finer detail and generate them more clearly and accurately. Yet for "2boys" you can never be sure which tags can be locked, so the only option is to leave them out, that is, to tag characters with only "character trigger word + simple hair color + simple eye color" in order to preserve each character's own features. And preferably do not use the character trigger word alone without those "additional tags", as that increases the chance of blending.

Let me explain this again: some multi-character LoRAs can generate the correct character with the trigger word alone, but that does not necessarily mean the LoRA was trained well. I tested many, and in the vast majority the LoRA's trigger word simply happens to coincide with a character the checkpoint already recognizes; the overlap need not be 100%, and given-name-first versus family-name-first both work. Those trigger words generate the character with the bare checkpoint even without loading the LoRA, and some LoRAs only work on specific checkpoints precisely because only that checkpoint recognizes the character. I will not paste examples here.

There are a few more observations about tags. Illustrious 2.0 will recognize and fit some image content that was never tagged; when you later prompt that untagged content, the generation tends toward the corresponding part of the training images. Finding exactly which content this applies to is far too hard. Conceptually similar tags also influence one another, which is even more obvious with clothing: take "footwear", "shoes", "sneakers". If you train with only "sneakers", prompting the other two still produces the trained "sneakers". To keep this from causing chaos in "2boys" generation, I suggest looking up the garment's exact tag on Danbooru and then giving it a dedicated name, something like "so-and-so's blue t-shirt", instead of generic tags like "blue shirt, short sleeves". The phenomenon from the previous paragraph appears in clothing training too: for "1boy" you can describe every detail and accessory of an outfit, "decomposing" it so that other outfits are not contaminated by its features, but for "2boys" detailed clothing tags make generation a mess, because some tags get "locked" and some stay "replaceable" and you still cannot tell which. So for "2boys" outfits, if you want the clothes to land on the right boy, tag as little as possible; the price is that other outfits will inevitably carry traces of the dedicated outfit, and there is nothing to be done about that.

One more clothing phenomenon: Illustrious 2.0 will "substitute" content in the image that looks "similar" to what the tag says. If the image shows a "tank top" but the tag says "vest", it will conjure the tank top into a completely different vest out of nothing. One cause is that auto-tagging may give the same item several different but similar tags; if you do not pick and prune them, there is some chance of this kind of confusion after training. To avoid it, stick to standard Danbooru tags on Illustrious 2.0 as much as you can. Conversely, you can exploit the behavior to deliberately swap concepts. Emmm, a lewd example: at tagging time replace "pexxs" with "small pexxs", so "normal" is recorded as "small"; at generation time the vacated "normal" "pexxs" then, curiously, produces a "large pexxs" effect, genuinely bigger than when tagged normally. Sigh, you know what I am writing; I am trying not to get deleted by Tensor. But, again with the but, this only works on some tags, so using it for concept swapping is not simple, and there is no telling which tags respond. Likewise with the dedicated names mentioned above: not every tag can be fitted through a dedicated name. Some, even given a "dedicated name", turn out not to have been learned at all, yet switch to the class token and they are learned correctly. Again there is no way to find out which tags behave this way. Sometimes a trained LoRA does not resemble the character, or misses details, and never feels well fitted no matter what you tweak; most likely the phenomena in this paragraph are at work, but you cannot tell which tag or tags are the troublemakers. If it really happens, either comb through everything carefully or leave it to fate.
( Chinese 2 ) My Second Note on Experiences, About "2boys" ( Chinese 2 )


Chinese 1: https://tensor.art/articles/894212229950384481
Chinese 3: https://tensor.art/articles/894475285959803996
Chinese 4: https://tensor.art/articles/894482801078873091

Let me give a shallow explanation from principle and from hands-on feel. Training a LoRA is essentially performing a concept replacement. Take "1boy, A, black hair, red eyes" (from here on I will use "A", "B", "C" as stand-ins for the character trigger words). The content these tags represent in the LoRA "replaces" the same tags inside the checkpoint at inference time. That means the AI does not actually understand that A is a black-haired, red-eyed boy; it treats the combination of tags as one whole concept, "1boy = A = black hair = red eyes". Generate "A" with those triggers and you will certainly get "A"; even if you drop "A, black hair" and use only "1boy", the output will still carry A's features. When training a duo, "2boys, A, black hair, red eyes, B, blue hair, grey eyes" likewise does not mean the AI understands these are two boys, one black-haired and red-eyed, the other blue-haired and grey-eyed; it is still one whole concept. This means that if the training set has only "AB" duo images, you cannot generate "A" or "B" alone, because "A" or "B" is influenced by all the features, and generating "1boy, A" fuses everything. And if the training set has only the two characters' solo images, you cannot generate "2boys" either; it comes out like loading two solo LoRAs at once, with the features blended. This feels convoluted even as I write it in my native language, and I cannot guarantee the logic survives translation into English; apologies again, and if anything reads wrong or ambiguous, please do leave a comment and set me straight.

This raises a problem: a duo LoRA that is meant to generate both "1boy" and "2boys" needs duo and solo images at the same time. As said before, characters everyone knows can be handled by the checkpoint alone, no LoRA needed, but for unpopular ships there may not be many images, let alone high-quality shots of the two in frame together, and even solo material may fall short. For some older works the material quality is also poor, which hurts the result. To cope with this scarcity I had an idea, experimented, and it actually worked; you can see that all the models I have uploaded come from works with scarce source material, and all of them can generate multiple boys. So let me introduce the method. Fair warning: what follows may be wordy and the logic a bit tangled; I lack the ability to describe such a convoluted workflow concisely, and translation into English may add ambiguity. Please bear with me.

First, make two solo LoRAs using the method from my first article; with enough good material one pass is usually enough, while for material that is old, scarce and blurry you can use the LoRA to generate better, adjusted images and train again on those. Then stitch these solo images into duo images. Yes, this kind of "stitching" has corresponding tags on Danbooru, so do not worry about it causing confusion; in my tests these artificial "2boys" images separate the characters more easily than anime screenshots or illustrations do. A quick analysis: in anime screenshots the characters are often at different depths, one in front of the other, which during learning tends to skew their relative builds; at medium or long range anime rarely draws them clearly; and in close-ups or fan art the characters are pressed closer together, which makes it easier to confuse who owns which features. Of course, this make-solo-LoRAs, then-generate, then-stitch routine is tedious, but for characters short on material it is the most workable quality-boosting method I have tried.

If the characters you want always appear as a pair and you have ample material, easily more than 100 images, you do not need this article's approach at all; just add a few solo images and train normally. The more and richer the material, the more the AI learns across different contexts, so the characters separate more easily and errors drop sharply. This article is about a last-resort approach for two solo characters with no duo shots and even scarce solo material. As shown in the example images, you can manually adjust the relative scale between the characters to raise the chance of correct proportions, but in my tests this only helps when the build difference is either large or very small (heights within a forehead of each other); height differences down to the neck or chest usually do not come out well. Also, while this method improves generation quality, it overfits more easily, leaving expressions and poses rather stiff, unless you produce a large number of images with varied angles and poses and add them to the training set along with the originals to soften the stiffness.

With this method you can even use black-and-white manga as the entire training set, as long as the tagging is accurate; you can also manually erase extra lines, panel borders, text and so on to cut down the noise. If color images exist, be sure to include them so the correct colors can be learned; otherwise the colors come out random within whatever palette fits. When using manga as material, pay close attention to the relevant tags: "greyscale, monochrome, comic, halftone" are usually mandatory. You can lower the tagger threshold to 0.1 to see which tags it can pick up, and manga-specific drawing techniques must be tagged; consult the Danbooru tag library to pin down the visual elements. Features like hair color can be described normally, e.g. "brown hair", "brown eyes"; as long as image types such as "greyscale" and "comic" are tagged accurately, stronger community models can exclude them at generation time. If the final output is still poor, correct it somewhat with prompts or negative prompts and then retrain on those corrected images.

Next, the image-count requirements under this approach. For the earlier solo LoRAs the source set should be as rich as possible, since problems can be fixed at generation time; for the "2boys LoRA" training set, carefully pick only the best-quality source images, drop the blurry ones, and put the source images and the re-generated AI images in the same folder. If you want to lean toward the original art style, raise the share of source images. If you cannot find more images, you can simply duplicate the "best images" together with their .txt tag files; Windows will auto-name the copies in the system language, and that is fine, no need to rename to English, since LoRA trainers can read every character that UTF-8 supports.

Because of how AI generation works, close-ups come out better than full-body long shots; at "full body" the eye detail may be beyond even adetailer's repair, yet "full body" is the key to teaching the AI correct body proportions. So when generating, you can use "closed eyes" or even "faceless" or other prompts to omit the eyes, and add "bald" if the hair detail is poor, producing a faceless figure with nothing but a body and no other character features. Do not worry: the hair and eye features are already learned from other images, and because "faceless" and the like are explicitly tagged, they will not appear unless you use them. If normal generation plus hires fix and adetailer can already give you facial detail at "full body", skip this trick; in testing, 99% of generations are unaffected by faceless, but that occasional one will spook you. The same idea applies to hands and feet: hide them with poses or socks, or generate correct hands separately. To stress the point, this trick is not about raising generation quality; it is about making sure the AI does not learn something worse. You have surely seen character LoRAs with deformed or blurry hands; that is not only checkpoint randomness, it is mostly because anime rarely draws hands carefully, and the LoRA, through overfitting, has learned exactly that rough part of the source.

That is the general idea for preparing the training set. So how many images are actually needed, and what training parameters should be set? Here is what I got after testing dozens of trial LoRAs along these lines: at least 20 solo images each for A and B, and at least 15 AB duo images. At equal step counts, more images with a lower repeat trains better than fewer images with a higher repeat. If you want to cut corners you can duplicate images, but the number of distinct images should still exceed the minimums just mentioned: at least 20 solo images each for A and B, at least 15 for AB. In my tests, for "2boys" there is no such thing as too many, but below these counts the odds of characters failing to train or features blending rise sharply.

The exact count depends on your actual material, decided by the step calculation. It works like this: with batch size = 1, A's images × repeats × epochs should equal B's, with the repeats equal and the epochs generally 10-16 (10 is enough in the vast majority of cases), keeping things around 800-1200 steps. For example, if A and B each have 60 images, repeat = 2, epoch = 10, then 60 × 2 × 10 = 1200, and B is likewise 1200 steps. On top of the solo steps, keep the duo-image steps at roughly 20%-40%, starting from 40% where possible (that is not a hard ceiling; going slightly over is fine). For example, 1200 × 40% = 480 steps; with epoch = 10, 480 / 10 = 48, so the AB folder needs about 48 steps per epoch, which can be 48 images × 1 repeat, 24 images × 2 repeats, or 15 images × 3 repeats.

A brief why: as described earlier, the AI does not really understand that A is a boy and B is a boy; the AB concept is more like conjoined twins, and how well AB fits has almost nothing to do with how well A or B fits on their own. They are separate concepts, but they influence each other because they share trigger words. Prompting A + B at generation time effectively stacks AB on top, like doubling the trigger weight, and in my experiments the closer the A+B steps are to the AB steps, the greater the chance of blending, presumably because that stacked weight grows. This raises another problem: because the characters' features differ, their fitting progress, i.e. how fast the AI learns them, differs too. A might be fully learned by step 600 while B still lacks features at 600, and this is even more pronounced with clothing. In that example, by step 800 A is already overfit while B might be just right, so the chance of their features blending rises, and pushing to 1000 steps does not help. My provisional fix: make sure there are more than 15 AB duo images, use as low a repeat and as many AB images as possible, and add a few extra solo images to B's folder while keeping its repeat equal to A's.

To keep the outfits from blending in duo images as much as possible, you can try raising the batch size and overfitting a little. Keep the solo steps around 800-1200 and recompute the repeats. For example, A and B each have 60 images, target 1000 steps, batch size = 4, epoch = 10: 1000 / 10 × 4 = 400; 400 / 60 ≈ 6.66, rounded to 6, which is the repeat; then 60 × 6 / 4 × 10 = 900, within the target range. AB has 30 images: 900 × 0.4 = 360; 360 / 10 × 4 = 144; 144 / 30 = 4.8, rounded to 5, the repeat; 30 × 5 / 4 × 10 = 375, and 375 / 900 ≈ 0.417, which matches the required solo-to-duo percentage.

Now the training parameters. Once the images are ready, I suggest cropping and scaling them manually, then enabling "enable arb bucket" and "arb bucket: do not upscale images" in the training settings and setting the resolution to 2048,2048. That way no image in the set is automatically scaled or cropped; otherwise you have no idea what an image below the set resolution looks like after auto-upscaling, nor what got cut off an image above it. Note that resolution affects VRAM usage. Roughly speaking, in the trainer settings, as long as you do not enable "lowram" (loading the U-Net, text encoder and VAE directly into VRAM) and you enable options like caching images to disk, VRAM usage depends almost entirely on the largest resolution in the training set and on network_dim; it has nothing to do with the image count, and even batch size matters little. Exceeding VRAM slows training drastically but does not stop it, and of course with online training such as Tensor none of this matters. At dim = 32, 1024×1024 already uses about 8 GB; if you train locally I assume you have moved past 8 GB cards. At dim = 64, anything above 1024×1024 up to 1536×1536 will pretty much fill 16 GB. At dim = 128, usage jumps sharply: every resolution exceeds 16 GB, 1024×1024 may pass 20 GB and 1536×1536 approaches 40 GB, which is beyond what gaming cards can take.

Simply put, dim determines how much detail the LoRA learns: the larger, the more; the smaller, the less. Higher resolutions and more content need a larger dim. For a solo anime LoRA, 32 is basically enough; for duos I suggest 32 or 64; beyond two characters it is delicate, 64 or 128 depending on your hardware. Resolution is also delicate. In theory, bigger should be better, but in practice training at 2048×2048 does not make images generated at 768×1152 or 1024×1536 any better, and it barely changes the body-proportion differences you get at various resolutions. Meanwhile, because checkpoints do not handle high resolutions well, generating directly at 2048×2048 or 1536×2048 produces all sorts of deformities; the detail, though, really is far better than at low resolution, and it is not all deformed either. At certain resolutions, such as 1200×1600 or 1280×1920, the images come out correct and astonishingly good, far better than low resolution plus hires fix. But the price of training at 2048×2048 is simply too high. Putting it together: dim = 64 with a training resolution of 1280×1280 or 1344×1344 gives a workable compromise on 16 GB of VRAM. I recommend 1344×1344 with the arb bucket upper limit set to 1536 and auto-upscaling enabled; the common 2:3 ratio then gets upscaled to 1024×1536 and 3:4 to 1152×1536, which covers the most common aspect ratios and is very convenient for generation and screenshots.
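To make the "stitching" step above more concrete, here is a minimal Pillow sketch that pastes two solo renders side by side on a two-tone background. The file names and the 0.95 height factor are placeholders of my own; the article only says the images are stitched, so treat this as one possible recipe, not the author's exact process.

```python
# Minimal sketch: build an artificial "2boys" image from two solo renders (Pillow).
from PIL import Image

def stitch_pair(path_a: str, path_b: str, out_path: str,
                height: int = 1536, b_scale: float = 1.0) -> None:
    a = Image.open(path_a).convert("RGB")
    b = Image.open(path_b).convert("RGB")

    # Scale both to the target height; b_scale nudges B slightly shorter or taller
    # to encode a canon height difference.
    a = a.resize((round(a.width * height / a.height), height))
    b_h = round(height * b_scale)
    b = b.resize((round(b.width * b_h / b.height), b_h))

    # Two different background shades for the two halves, so the result matches
    # tags like "split screen" / "two-tone background".
    canvas = Image.new("RGB", (a.width + b.width, height), (235, 235, 235))
    canvas.paste(Image.new("RGB", (b.width, height), (215, 215, 230)), (a.width, 0))
    canvas.paste(a, (0, 0))
    canvas.paste(b, (a.width, height - b_h))  # keep both figures on the same baseline
    canvas.save(out_path)

# Hypothetical file names, for illustration only.
stitch_pair("solo_A_001.png", "solo_B_001.png", "AB_001.png", b_scale=0.95)
```

Remember that the resulting composites still need the stitching tags ("split screen", "collage", "two-tone background") in their caption files, as described above.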
( Chinese 1 ) My Second Note on Experiences, About "2boys" ( Chinese 1 )


Chinese 2: https://tensor.art/articles/894214741432519505
Chinese 3: https://tensor.art/articles/894475285959803996
Chinese 4: https://tensor.art/articles/894482801078873091

Previously: my earlier article ( https://tensor.art/articles/868883505357024765 ) told the story of how I fell into AI image generation and my first attempts at making LoRAs, along with some lessons learned. Looking at it now, many of the views in that first piece are a bit dated, while some still hold; parts of it are referenced later here, so please consult the relevant sections there. I am keeping the promise I made in it, to keep updating as new insights come, so this article explores what I have been obsessed with lately: making 2boys fan art and LoRAs, and what I have come to understand about them.

Before starting, I must state that English is not my native language and my English is poor; this article was again translated with AI assistance. The original was written in Chinese, and I will also upload the Chinese version so people can translate it themselves. I will not upload a Chinese original for the first article, because I proofread, added to and trimmed that one in its English version after translation and never rewrote the Chinese.

As you know, randomness utterly dominates AI image generation right now; everyone's go-to parameters differ, the outputs differ even more, and tastes vary. This article is only my personal experience, or rather my impressions, plus the know-how built on them; it is far from a serious tutorial, and any principles mentioned are only shallow takeaways. Both the generation and the LoRA-making discussed here concern only fan works of boy characters from ACG titles, that is, known 2D characters from ACG works, not AI-invented boys and not any real person. Generation is based only on derivatives of NoobAI and Illustrious, and LoRA training only on the official Illustrious 2.0; unless stated otherwise, everything below assumes this.

Over the past few months, on social media like X, on sites like Pixiv and Civitai, and through countless promotions for paid "image packs", "2boys" content has been getting more and more popular, and images with even more boys have appeared. Most are NSFW, of course, and most are one-second-swipe thrills that do not bear close inspection, but quite a few are genuinely exquisite and flawless, and it is hard not to wonder how they were made. Back then I was very much a beginner, and my first thought was that they must be made with two character LoRAs, so naturally I had to try. As it happened, LoRAs existed for two boys I really love, from a very obscure anime with a ship nobody cares about (maki X arashi), and to my surprise it worked pretty much straight away; yeah, an R-18 image. My excitement far outran my judgment and I paid no attention to quality, but the characters seemed right, so I mistook that for the correct answer and went on to make another pairing from the same anime (tsubasa X shingo). That more or less worked too; the characters often blended, but relatively correct results were not unheard of, which deepened my misunderstanding of how "2boys" fan images are made. Around then a great, great senior in the community released a yuta X yomogi duo LoRA, and I was stunned: so you can generate duo images with a single LoRA. I posted a lot of images of those two at the time (since hidden by Tensor), and I misunderstood again, thinking such a duo LoRA must be easy; I was thrilled, certain that many duo LoRAs would follow and that making favorite ships would be a piece of cake.

I imagine you are already laughing by now: it was a colossal misunderstanding. You have surely tried it yourself; the boys' individual features simply blend together. Whether you load two LoRAs or use one of the few existing duo LoRAs, getting a correct image is terribly hard; the duo LoRA is the lesser evil, but loading multiple LoRAs goes beyond hard to nearly impossible. No matter how you reorder the prompt, how many "BREAK"s or "different boys" you add, or how you adjust the LoRA weights, the features always end up mixed. Looking back at my own images and those on social media, they were full of errors masked by momentary excitement; as for the relatively "perfect" ones, I was too embarrassed to ask the authors, and there is almost no discussion of the method anywhere online: for girls, or boy-girl pairs, there may be some information, but for boys together there is none at all. So, skipping the rest of the emotional journey and getting to the point, after much fumbling I sorted the options into roughly the following approaches:

1. The base model already knows a lot of characters; you only need to write their names. Check the checkpoint's documentation for supported characters. For characters the checkpoint already knows, simply writing "2boys" plus their recognizable names is enough; popular or classic characters need no LoRA at all and can be combined freely, not limited to the same series. This should be the most common approach, but the ships people actually want are often not in the checkpoint, and how faithfully characters are reproduced varies. Some come out only approximately, and adding extra eye-color, hair-color, hairstyle or other feature prompts raises the quality noticeably, but that trick only works for 1boy; with two boys those prompts interfere with each other.
2. Img2img and inpainting, a very common method that needs little explanation: load both LoRAs to generate a base, then load one LoRA at a time to redo the blended areas. That is just one workflow; the idea can be carried out in many ways.
3. Regional control. This requires certain extensions and complex workflows in WebUI or ComfyUI; it splits the canvas into regions and assigns each LoRA a weight per region. I will not name specific tools, you can look them up yourself. It sometimes works very well, but I will explain later why I gave it up.
4. Duo or multi-character LoRAs. There are few of them at present, but the ones you can get basically work; some are unstable, some are quite good. Later sections cover how to make a 2boys LoRA, so no more here.
5. Very important: "NovelAI". This is by far the most powerful commercial "anime" image model. The number of characters it supports natively is beyond what open-source models can match; by comparison, open models only know the leads of some works, while NovelAI can generate even supporting characters from obscure titles, adds data for new characters fairly quickly, and has built-in regional control, so not just two boys but more can each be given a position and a pose. Very powerful, and of course the subscription is on the pricey side. Many of the "crossover" images you see on social media were generated with NovelAI, and they are easy to recognize: the art style is stable for any character and clearly distinct from the open-source models'.
6. Other techniques, such as the recent Flux Kontext, possibly the most groundbreaking image model this year. I have not had time to study it in depth, but making fan art with it is feasible; if you have not tried it, please do. Everyone knows Flux is currently first for photorealism, but it is actually very strong for anime too; its usage cost and LoRA-training cost are just much higher than the SDXL-based models'.

That seems to be about it; there may be more methods I have not explored. Of the ones I found, I ultimately chose to make duo LoRAs. Why not NovelAI, powerful as it is? As a closed commercial model it cannot use LoRA, and those countless LoRAs are the bigger draw for me; and although it holds a lot of character information, in the end it holds "only so much". There was also a time, on a certain version, when it could not produce NSFW content, and can anyone promise that what is happening on Tensor now will never happen to NovelAI? If the fan work you want is not that obscure, say Killua x Gon, or an Ash x Taichi crossover, I recommend NovelAI: easy to use, a wealth of art styles, rich subject matter, genuinely much stronger than the open models. Those two examples can actually be generated with open checkpoints too, but note that results differ between checkpoints, and loading a good solo LoRA whose trigger word matches the checkpoint's can sometimes raise quality while keeping the characters from blending. A few examples follow. (I use this hand gesture, of course, because of what has recently happened on Tensor and similar sites.) As the example images show, when the checkpoint already contains the character information, using a LoRA does not cause much confusion. Big-name characters from hugely popular, well-produced shows, Tanjirou and the like, need no LoRA at all.

To say the important thing twice: if the "2boys" or multi-character fan art you want is not absolutely obscure, I strongly recommend NovelAI; failing that, try generating directly with a checkpoint. Many characters you assume need a LoRA actually do not, it is just that solo results are better; when paired into "2boys", whichever character has less data gets dragged along. Another example: the brothers Edward Elric and Alphonse Elric can both be generated directly, and solo Al appears as a boy rather than the armor, but in duo images, because the vast majority of the underlying data shows them together as Ed plus armored Al, Al's true form is almost impossible to get. When the checkpoint's information on the two characters is lopsided, one detailed and the other lacking, results are poor. Pairing with LoRAs can add character detail but also fuses some features and affects the art style, though at a glance it is hard to notice. And when a character the checkpoint does not know is generated alongside a LoRA, the features fuse completely; no examples needed, they are everywhere online.
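As a concrete illustration of approach 1, a prompt for a pairing the checkpoint already recognizes needs nothing beyond the names themselves. The two Demon Slayer names below are placeholders for "characters your checkpoint already knows" (only Tanjirou is mentioned above); check the checkpoint's own documentation for the exact name format it expects, and add your usual quality and negative prompts.

```text
2boys, kamado tanjirou, agatsuma zenitsu, looking at viewer, upper body
```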
The NovelAI samples here used no quality or negative prompts at all; more tuning and more attempts would be needed for richer detail and a more polished style. You can see that leonhardt's outfit is reproduced far better than by most LoRAs, and 青砥, an obscure character, comes out reasonably well too; two characters with absolutely nothing to do with each other, paired at will. But it is a commercial model after all: every bit of compute has to be budgeted, and you cannot trial-and-error without limit.

Img2img and regional control have a flaw: wherever the characters are in close contact, the blur is easy to spot, and the workflows are cumbersome both to set up and to use. That is why some artists' social feeds have decent-looking duo images but very few of them, with the vast majority still 1boy: using LoRAs plus a regional workflow is simply too much trouble. And if inspiration strikes while you are out somewhere and a prompt comes to mind, an online app lets you make it right away, whereas these workflows require at least a reasonably capable computer for local deployment, and, irritatingly, every extension and every workflow may want a different runtime version. It is a real pain, so I gave up right after trying; the more I fiddled with the environment after setting it up, the more bugs appeared. Img2img, regional control and the like have a higher barrier to entry, take patience to learn, and do not always pay off, but if you have the means, give them a try.

There are also some big names who regularly post AI images, and some of their flawless-looking duo or group images neither look NovelAI-made nor reach a quality that LoRA plus inpainting could achieve. No need to wonder: they drew them by hand. Who says AI generation and drawing skill cannot live in the same person? All of which is to say, making a duo LoRA to generate fan art of your favorite ship is the more suitable choice: almost any characters, any art style, simple to use, prompts only. The "principle" of a duo LoRA is simple, provided you have enough images of the two in frame together; the hard part is how to build it so that generation is stable in use. If you have used duo or multi-character LoRAs you have surely been plagued by the following:
1. Character features still get swapped or blended; fully correct results appear only by chance.
2. The size difference between the characters is completely random unless it is huge to begin with, like Chilchuck and Laios; even two characters with identical builds shrink and grow at random, and you catch yourself adding useless prompts like "same size".
3. Generation quality is poor, requiring extra style or detail LoRAs plus hires fix, adetailer and the like.
4. Every flaw a solo LoRA can show also shows up. And so on.

In the first article I already told the story of my first LoRA, which was a duo LoRA, but I did not go into the concrete method; this time let us talk it through properly. I owe another apology: I have seen no discussion anywhere online about making 2boys LoRAs. Perhaps it has been discussed in more private chats such as Discord servers, but I have no way to know, so my approach grew out of my experience using the few duo LoRAs and multi-character style LoRAs that existed, plus their published training parameters, and experimenting from there. It is entirely possible my approach was crooked and wrong from the start, so whatever the results after all this experimentation, my method can only serve as a reference; if you want to make a duo LoRA and have no idea where to begin, you can try it as one point of comparison.

Rewind to the start of my "lagoon engine" LoRA. First, of course, came the training set. My first instinct was to gather duo images; material was scarce, but luckily the two brothers always appear as a pair, and solo shots were the rarity, so I also cropped out equal numbers of solo portions, put them in separate folders, and assigned the same repeats. From the first alpha version, through every small revision with new images added, up to the finished beta, I basically followed that idea. So when the beta came out looking decent, I was naturally overjoyed and wanted to copy the same dataset setup and parameters for the next duo LoRA, and it failed completely. The several duo LoRAs of other characters I made afterwards all either blended features or failed to learn the characters at all. I was genuinely dejected: why, where exactly had it gone wrong, had it really been pure luck that none of the intermediate versions between alpha and beta showed outrageous character fusion or outright failure? After that I made many more trial builds, exhausting options one variable at a time, hoping to find some clues. Writing that process out would be a pile of filler, so let me skip to the conclusion: making a duo LoRA fully stable and controllable is nearly impossible; everything you do is about making the correct content exist inside the LoRA, so as to raise the odds of a correct result.
( 8 finish ) My Second Note on Experiences, About "2boys" ( 8 finish )


1 https://tensor.art/articles/893578426995184161 2 https://tensor.art/articles/8938012445298126503 https://tensor.art/articles/8938115792948804674 https://tensor.art/articles/8938203152583353995 https://tensor.art/articles/8938481466464460196 https://tensor.art/articles/8941842085099989687 https://tensor.art/articles/894196515738788839You’ve probably noticed by now—I’ve emphasized this several times—this method only works when the pair you want to create is an unpopular or rare “ships” with little to no existing duo material.If you already have plenty of “2boys” content, like 100 or 200 duo images, you’re going to get much better results without following the workflow in this article. However, this method isn't useless, this approach does have one very unexpected surprise benefit—it allows you to train LoRAs with three or even four characters (more than that is still being tested and not guaranteed),and still generate any two-character combination at inference time.It can even handle three-character combos.Four characters is already pushing past what the checkpoint can realistically support, so results may vary. It's easy enough to find a good amount of material for your favorite "2boys" pairing, but it's not so easy for three or four characters.So let me walk you through the process of training a 3boys or 4boys LoRA: 1. The more single-character material you have, the better.This helps you pick the best images.If you don’t have enough, train single-character LoRAs for each boy first,then generate high-quality solo images, and pick the best ones for reuse. You don’t even need “2boys” images. As I described earlier, stitch the 1boy images together.Make sure each combination has at least 15 images.For example: AB = AC = BC > 15 2. Organize the dataset into folders:“A”, “B”, “C”, “AB”, “AC”, and “BC”. Set the repeat values according to the earlier method.Target around 800–1200 steps per solo character, with batch size = 1. Then calculate repeat values:A = B = C, and AB = AC = BC. The total steps for all dual-character images that include a certain character should be about 20-40% of the single-character steps (this is a rough estimate). For example:If you have 15 “AB” and 15 “AC” images = 30 total,And 20 “A” images,With epoch = 12: Solo A steps: 20 × 12 × 5 = 1200 40% of that = 480 480 ÷ 12 = 40 40 ÷ 30 = ~1 repeat So you set:A = B = C → repeat = 5AB = AC = BC → repeat = 1 ⚠️ Please note that the 40% step count is calculated based on the total number of dual-character images containing the same character, so it's the number of "AB" and "AC" images combined that have "A," not just the number of "AB" or "AC" images alone. 3. No need to train with 3boys images.Just tag the above dataset and begin training. 4. Your first result might not be perfect.Follow earlier troubleshooting strategies. Let me expand a bit here:Now that you’ve added a third character “C”,“A” becomes part of two “conjoined twins” structures—“AB” and “AC”.That means A might now absorb some traits from both B and C."A" might be seen as part of a "2boys" concept.This can cause issues in generation:When you generate a dual or three-character image, "A" might be treated as two "2boys," and "B" might also be treated as two "2boys." With the weights stacked,a duo image may behave like it’s layering four “2boys” concepts,making extra characters and confusion more likely. 
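To make the arithmetic in step 2 easier to reuse, here is a small Python sketch that reproduces the worked example above (20 solo images per character, 15 images per pair, epoch = 12). The 40% ratio is the article's rule of thumb, not a fixed constant, and the rounding mirrors the example.

```python
# Repeat calculation for a 3-character dataset (A, B, C + AB, AC, BC),
# following the rule of thumb above: per-character duo steps ~= 40% of solo steps.

def solo_repeat(images: int, epochs: int, target_steps: int = 1200) -> int:
    """Repeat so that images * repeat * epochs lands near the target (batch size 1)."""
    return max(1, round(target_steps / (images * epochs)))

def duo_repeat(duo_images_with_char: int, solo_steps: int, epochs: int,
               ratio: float = 0.4) -> int:
    """Repeat for the duo folders, based on ALL duo images containing the same character."""
    steps_per_epoch = solo_steps * ratio / epochs
    return max(1, int(steps_per_epoch / duo_images_with_char))

epochs = 12
solo_images = 20        # per character: A = B = C
duo_per_pair = 15       # AB = AC = BC

r_solo = solo_repeat(solo_images, epochs)                 # -> 5
solo_steps = solo_images * r_solo * epochs                # -> 1200
r_duo = duo_repeat(2 * duo_per_pair, solo_steps, epochs)  # 30 images contain A -> 1

print(f"solo repeat = {r_solo}, duo repeat = {r_duo}")
```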
With trio images, it’s even worse—now it’s like hiding six “2boys” concepts at once.And since the base checkpoint has no real understanding of “3boys”,it gets overwhelmed. But because of the limitations of the checkpoint and the resolution, a "double weight" for "2boys" doesn't mean you'll get "4boys."Instead, it will display all the features learned from the "conjoined twin" phenomenon. At the same time, because the character trigger words like "A" have a higher weight in single-character images, the extra characters generated in "2boys" images are usually the same characters as in the trigger words. For example, when you use "A+B," you might get two blurry figures similar to A and B in the background.You can get rid of these by reducing the resolution or adjusting the prompts. For "3boys," it's like generating "6boys" secretly.Because checkpoints don’t natively support “3boys”,confusion multiplies.The best way to limit this is by reducing resolution. When generating with “A + B + C”,each trigger word contains its own “2boys” data,not only will it produce extra characters, but the character features will also be more jumbled. It's more like it's randomly picking three characters from AABBCC and putting them on a "three-person conjoined" outline. In theory, training on a large dataset of “3boys” imageswould teach the LoRA what “3boys” actually means and solve most of these issues.But where are you going to find that many trio images? Even the thought of compositing 100 trio images is exhausting. That said, if my understanding is correct,then if you had enough “3boys” images,you wouldn’t need any solos or duos—You could be able to generate any single- or dual-character combination just by describing their features. This also explains why the dual-character step count is calculated based on the total number of dual-character images containing the same character. In my limited tests, if I calculated it based on just the number of "AB" images, the weight seemed too high, and "A" learned too much "conjoined twin" information from "AC." The same happened with "B" and "C," which led to more confusion. 5. You can use the same method to prepare single-character and the same number of four-character dual-combination images to make a "4boys LoRA." The steps and repeats should also be calculated based on the total number of dual-character images containing the same character. But please be prepared—training “3boys” or “4boys” LoRAs and getting a good resultwill require multiple rounds of fine-tuning. A final reminder—The LoRA training method introduced in this article is based on Illustrious 2.0.But let’s be real: everything I’ve written here is just a tiny drop in the vast ocean of AI image and video generation.There’s still so much we don’t know, so much more to learn, to try, and to explore.This article will probably become outdated very quickly.After all these long-winded paragraphs, we’re finally reaching the end.I’m sorry if my writing has been a bit roundabout or difficult to follow.This method isn’t perfect, and I know my understanding of the underlying mechanisms isn’t accurate—it’s just intuition based on the phenomena I’ve observed.There’s so little discussion online about multi-character LoRA, especially for boys.It feels like everyone’s exploring alone, in silence. 
So I hope that my experiences, whether you use them as a reference or as a cautionary tale, can still be helpful to someone who needs them. Feel free to leave a comment or message me privately on Civitai or Pixiv; my ID is kitakaze5691. So many people have created such charming and memorable OCs: handsome, cute, full of personality. And many of them have tried to load these OC LoRAs together to generate 2boys images, only to get messy and chaotic results. So, allow me a tiny bit of pride: this article not only offers a simple explanation for why that happens, but also presents a method that might finally bring those OCs together, in the same image, interacting closely, while still staying true to who they are. It's hard to believe it's been nearly a year since I started working with AI-generated images. Along the way, I've met so many strangers who have become like friends. Your support and encouragement make me feel like a teenager again, a curious and passionate 14-year-old who never stops exploring. Every time I see a familiar ID in my notifications, it brings me real joy. I haven't felt this kind of warmth and comfort, this ease from my depression, in a long time. Thank you, truly. I'll continue to share new images, new LoRAs, and new thoughts whenever I can. I hope they bring you even a little bit of happiness. So… until next time~
( 7 ) My Second Note on Experiences, About "2boys" ( 7 )


1 https://tensor.art/articles/8935784269951841612 https://tensor.art/articles/8938012445298126503 https://tensor.art/articles/8938115792948804674 https://tensor.art/articles/8938203152583353995 https://tensor.art/articles/8938481466464460196 https://tensor.art/articles/8941842085099989688 https://tensor.art/articles/894202977517102361There are also a few other things I’ve noticed when it comes to tags. In Illustrious 2.0, even elements in the image that weren’t explicitly tagged can still be picked up and fitted into the model.And when you later prompt those untagged elements,the AI will tend to generate something close to the corresponding part of the training image. But figuring out which ones it learned that way?That’s pretty much impossible. Also, similar or conceptually related tags tend to affect each other.This becomes even more obvious when training clothing-related features. For example:Say you only tag an outfit as “sneakers” during training—then during generation, even if you use tags like “footwear” or “shoes,”the model still ends up generating the “sneakers” from the training data. To deal with the chaos that this kind of thing can cause in “2boys” generation,I recommend looking up the exact clothing item’s tag on Danbooru,and then assigning it a dedicated custom name—like “someone blue t-shirt,”instead of vague general terms like “blue shirt” or “short sleeves.” The same thing I mentioned in the last paragraph also applies here.When training a “1boy,” you can afford to describe clothing in more detail—all the little design elements, accessories, etc.—breaking them down into pieces helps ensure they don’t bleed into other outfits during generation. But when training “2boys,”detailed tagging of clothing tends to result in absolute chaos during generation.Because some tags get “locked,”and others remain “replaceable,”and you still have no idea which are which. So, if you want to make sure clothing appears on the correct boy in a 2boys LoRA,you need to tag as little as possible. Of course, the tradeoff is that when generating other outfits,the original outfit’s features will often stick around.Now, here’s another fun thing about clothes—Illustrious 2.0 doesn’t just ignore what you tag—sometimes it tries to “replace” similar-looking content in the imagebased on what the tag says. For instance:if the image shows a “tank top”,but the tag says “vest”,then the model might literally rewrite the tank top into a vest in the generated image—like conjuring up something totally different out of nowhere. One reason this happens is that when using auto-tagging,a single item might get labeled with several different but similar tags.If you don’t manually check and clean these up,this ambiguity will likely introduce chaos later on during generation. To avoid that, it’s still best to stick with Danbooru-standard tags when training on Illustrious 2.0. Then again… you can also use this behavior on purpose.Like a little... “concept swapping.” Here’s a not-so-safe-for-Tensor example:you replace “pexxs” with “small pexxs” in your training data.The AI learns that “normal” = “small.”Then, when you generate with just the word “pexxs,”the model ends up producing large ones,as if it’s trying to fill in the gap left by “normal.” Yes—they do actually come out bigger than if you had just labeled them as “pexxs” in the first place. 
Sigh… you know what I mean.Trying to avoid getting banned on Tensor here 😅 But!I’m saying “but” again...this kind of thing only works on certain tags,so you can’t reliably use it to swap concepts.There’s no way to know in advance which ones will work. Also, regarding the whole “custom naming” trick I mentioned earlier—it doesn’t always work either. Some tags, even when you give them a unique name,won’t actually be learned by the model.But if you swap them out for a class token,suddenly the AI gets it and learns it properly. Again, you just can’t tell in advance which tags behave this way. Sometimes a LoRA turns out looking off,or is missing details,or no matter how much you tweak it, it just doesn’t feel well-fitted.It’s very likely the issues described in this section are to blame—but you’ll have no idea which tags are actually causing the mess. So if this kind of thing happens to you,you’ve got two options:either comb through your dataset carefully…or just accept your fate and move on. So, following my approach,you’ll probably want to stick to characters with simpler designs.I haven’t yet figured out a reliable method to maintain generation quality for more complex designs. For now, the only option is to increase batch size and go for overfitting,which helps improve things somewhat. Let me repeat one more time:As long as you have enough training images, most of these problems go away.You don’t need to worry about all this or follow the methods in this article. This whole workflow is only for situations where you’re short on data.Basically, that’s the full extent of my approach and methodology for creating a “2boys LoRA”.If the first result you train doesn’t come out well, you can use the flaws described earlier to roughly determine what’s missing.In general, you can improve it by retraining with adjustments like the ones below: 1、 Most importantly, increase the number of “2boys” images in your training set.For example, if each single character has 20 images, but your “2boys” images are several times more than that—or even over 100—then even if you throw them all into the same folder with the same repeat value,you can still get pretty good results. But if the situation is reversed, and “2boys” images are scarce,that’s when the methods in this article actually start to become effective. Here’s something extra I noticed during testing:Putting different characters into separate folders always seems to give better results than putting them all into one folder.I honestly don’t know why. Also note: folder names can actually be used as trigger words.After some A/B testing, I found that for “1boy” LoRA training,you can use the folder name as the character’s trigger word,and skip writing it in the tag file entirely. But for “2boys” training, don’t do that.Give the folder a name unrelated to any trigger words,and whatever you do, don’t make the folder name overlap with any trigger words in the tags. 2、Adjust the repeat values.Based on the results of your generations, either increase or decrease the steps for single or double characters. 3、Figure out which features can be locked-in using tags,and try re-tagging and retraining. 4、Try switching to a different random seed.Honestly, random seeds are part science, part superstition.No matter how much you tweak steps 1–3,you’ll often find that loss curves for the same dataset under the same seed look roughly the same.So changing the seed might make it better—or worse. Who knows. 
5、Try slightly tweaking the optimizer parameters,or switch to a completely different optimizer and see what happens. 6、That’s as far as my personal methods go.The rest is up to your own experiments.One last thing to add is a special case: regularization. While training single-character LoRAs,I tried adding a large number of “2boys” images into the regularization setand enabling regularization during training. In theory, this kind of setup should reduce feature blending when combining LoRAs later on. But whether it was due to the images I used, or the tags, or the parameters,the results weren’t good—so I abandoned that method. Let me wrap things up with a little side note:A friend of mine jokingly asked—“Why are you going through all this effort to build an AI model?Wouldn’t it be easier to just draw it yourself? Or commission an artist?” Well... sure, if I could draw,or if I were filthy rich,then yeah, I wouldn’t even need to consider AI or commissions, would I? But honestly, the rise of AI-generated images and the practical use of LoRAhave opened the door for people who can’t draw to still create doujin and fanart content. And that’s kinda amazing. Ok, that basically wraps up the discussion on “2boys LoRA”.
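For reference, the folder arrangement implied by point 1 above might look like the sketch below. The author trains through a third-party GUI, so whether repeats come from a folder-name prefix (the kohya-style "N_name" convention assumed here) or from a separate setting depends on your tool, and the folder names are placeholders chosen, per the warning above, so that they do not overlap with any trigger word.

```text
dataset/
  5_aka/   solo images of A (>= 20); "aka" is deliberately NOT A's trigger word
  5_ao/    solo images of B (>= 20); same repeat as A
  2_pea/   stitched AB images (>= 15); repeat kept low enough that AB's total
           steps stay around 20-40% of each solo folder's steps
```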
( 6 ) My Second Note on Experiences, About "2boys" ( 6 )


1 https://tensor.art/articles/8935784269951841612 https://tensor.art/articles/8938012445298126503 https://tensor.art/articles/8938115792948804674 https://tensor.art/articles/8938203152583353995 https://tensor.art/articles/8938481466464460197 https://tensor.art/articles/8941965157387888398 https://tensor.art/articles/894202977517102361To reduce this issue as much as possible,the AB folder’s weight should be lower than that of the A and B folders.The closer the weights are, the more confusion you’ll get. In my tests, a 20%–40% weight for AB compared to A and B is a decent range. Why a range instead of a fixed value? Because here comes the trickier problem:Different characters don’t learn at the same pace.I mentioned this earlier—solo characters already vary in learning speed,but in duo images, this difference becomes even more amplified,which increases the chaos. Too many training steps, and the traits get overblended.Too few steps, and the model can’t tell them apart. In both cases, the result is the same:mixed traits, with no clear separation between the two characters. This is just based on my personal testing, but here’s how to tell them apart: If the characters are swapping traits like hair color or eye color,it means you’ve trained too much. If traits like face shape, hairstyle, or eye shape are merged into one,then it means you’ve trained too little. And please note—this is very important!—There’s a minimum image count required, as mentioned earlier:“A = B ≥ 20, AB ≥ 15.” Based on testing, anything below this count makes weight ratios meaningless.At that point, the model simply doesn’t have enough information to learn. Let me emphasize again:If the characters you're trying to train always appear together in pairs,and you have plenty of material—say, it’s easy to gather over 100 duo images—then you don’t need to follow the method in this article at all. Just throw in a few solo images alongside and train the LoRA normally.The more material you have and the more diverse it is,the better the AI can learn to distinguish characters across different contexts—which drastically reduces the error rate. This article is aimed at situations where you only have two solo characters,and you lack enough duo images or even solo materials.It’s a workaround approach when you’re dealing with limited data. To put it another way:If you have enough duo images, then you’re essentially performing a complete “2boys” concept replacement.But if your material is lacking, you can only do a partial replacement.Anyone into anime should find this metaphor easy to get😂. Of course, even in a low-resource situation,you can still create over 100 duo images using the earlier “stitching” method,but yeah... in practice, it’s way more exhausting than it sounds. This reminds me of something—sorry, a bit of a mental jump here—but when proofreading, I couldn’t think of a better place to put this,so I’m going to include it here: Why is it that for some checkpoints,even if they don’t recognize a certain character,you can load just that character’s solo LoRA and still get a correct result? The answer is simple:A lot of LoRA creators use auto-screenshot scripts,or for other reasons, their training sets already include “2boys,” “3boys,” etc. group images.So if both A and B appear in the same anime screenshot,then A’s LoRA, via the “2boys” tag and the eye/hair color tags of characters other than A,has already “absorbed” parts of B’s hairstyle, hair color, and eye color. 
Even if those duo images make up a small portion of the training data,if B’s LoRA also happens to contain similar duo images,then B ends up “learning” some of A’s features too—and through this coincidence, you get something that behaves like a true “2boys LoRA.” But if you use a C character LoRA from the same creator or same batch,but C’s dataset doesn’t contain similar duo images,then generating AC or BC will not work properly. This kind of behavior makes it even clearer that the AI doesn’t truly “understand” that A and B are two separate boys.What it’s really learning is that “2boys” + all those descriptive tags together represent a conjoined concept—like Siamese twins.That’s why I mentioned earlier:you can use faceless images to generate training data—by leveraging this exact phenomenon,and removing all the extra distracting features,you can create a clean character “silhouette” for the model to learn from. Here comes another particularly tricky issue: tagging. Based on the current logic of LoRA training—“If you don’t add a tag, the AI learns it as an inherent (fixed) feature; if you do add a tag, it’s learned as a replaceable trait”—when training a “1boy,”even if you only use a single trigger word like “A” and don’t tag any other features,as long as your dataset is large enough, the AI can still learn it properly. The most common approach is to tag hair color and eye color.If you use an auto-tagging tool, it might also include some hairstyle descriptors like “hair between eyes,” “sidelocks,”as well as things like “freckles” and “fangs.” Adding or omitting these more detailed tags doesn’t really make much of a difference for a “1boy” LoRA,since you're only trying to generate one specific character—A. But in duo-image training, weird things start to happen. For example, take the tag “fang.” If character A has fangs that show even when his mouth is closed,and in the training tags for “1boy” you include the tag “fang,”then the generated results will show those fangs clearly—you’ll see them across various expressions. If you don’t include the “fang” tag,the AI will still sort of “guess” that thing might be a fang,but when generating, the result will be vague—sometimes it’ll just look like something stuck to the lips. In these cases, if you add “fang” as a prompt at generation time,the AI suddenly “gets it” and will output a clear fang. Now, when you’re training on “AB” duo images—if you don’t include the tag “fang,”the AI will learn that part as an uncertain element.Even if A has fangs, the generated result will be blurry around that area.And at that point, you can’t just fix it by prompting “fang” during generation,because B will also get affected by that tag. But if you do include “fang” in the training tags,then it gets learned as a replaceable feature—not an inherent trait of A.So if you don’t prompt “fang,” it won’t appear at all.And if you do prompt it, B might still end up being influenced by it. However!Not all tags behave like this.Let’s say character A has traits like “hair between eyes” or “sidelocks.”If you include those in the training tags,they can generally be locked to character A.Even more interesting—these kinds of tags,even if they aren’t included in the training tags,can still be used at generation time,and the AI will correctly recognize which character they belong toand generate the right visual features. 
But with thousands and thousands of tags,how can you possibly know which ones can be “locked” and which are only “replaceable”?You can’t.And what’s more—this is just how Illustrious 2.0 behaves.Other base models may handle tags differently. After all, this is anime we’re dealing with—characters can have wildly diverse appearances.By labeling those features, the AI can learn them in greater detail,which helps produce clearer and more accurate generations. But with “2boys,”you can never be sure which tags will actually get “locked” to which character.So in the end, the best approach is to not include those tags at all. In other words:you should label characters using only the character trigger + basic hair color + basic eye color,in order to preserve each character’s unique features. And ideally, don’t use only the character trigger word without any additional tags,as this increases the chance of character blending. Let me explain this again: Some multi-character LoRAs can still generate the correct character even when you only use the trigger word—but that doesn’t necessarily mean the LoRA is well-trained. From my own tests, I found that in most cases,the reason is that the LoRA’s trigger words just happen to overlap with the checkpoint’s existing recognized characters. This overlap doesn’t need to be exact—even differences in first-name/last-name order still work. In fact, even without loading the LoRA,those trigger words can still be used with the base checkpoint alone to generate that character. Some LoRAs only work correctly on specific checkpointsbecause only that checkpoint has been trained to recognize that character. I won’t bother including examples here.
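Putting the tagging rules from this part side by side, minimal caption files for the three folders might look like the sketch below. "ren" and "jin" are placeholder trigger words and the colors come from the article's running example; the last three tags on the duo caption are the stitched-image tags listed in the workflow summary earlier in the series.

```text
A/img_001.txt   ->  1boy, ren, black hair, red eyes, solo, looking at viewer
B/img_014.txt   ->  1boy, jin, blue hair, grey eyes, solo, upper body
AB/img_007.txt  ->  2boys, ren, black hair, red eyes, jin, blue hair, grey eyes,
                    split screen, collage, two-tone background
```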
( 5 ) My Second Note on Experiences, About "2boys" ( 5 )


1 https://tensor.art/articles/8935784269951841612 https://tensor.art/articles/8938012445298126503 https://tensor.art/articles/8938115792948804674 https://tensor.art/articles/8938203152583353996 https://tensor.art/articles/8941842085099989687 https://tensor.art/articles/8941965157387888398 https://tensor.art/articles/894202977517102361Now let’s go over the training parameters.Once your images are ready, it’s recommended to manually crop and resize them.Then, enable the training options:“Enable arb bucket” and “Do not upscale in arb bucket”.Set the resolution to 2048×2048.This way, none of the images in your dataset will be automatically scaled or cropped.Otherwise, if an image is smaller than the set resolution, it will be upscaled automatically—but you won’t know what effect that upscaling has.If it’s larger than the set resolution, you won’t know what part got cropped. Please note that resolution size directly affects VRAM usage.Here’s a rough explanation of what factors impact VRAM consumption:In the training settings, as long as you don’t enable “lowram”—meaning the U-Net, text encoder, and VAE are loaded directly into VRAM—and you enable “cache latents to disk” and similar options,then VRAM usage is mostly determined by the maximum resolution in your training set and your network_dim.It has nothing to do with the number of images, and batch size has only a minor impact. Running out of VRAM won’t stop training entirely—it’ll just make it much slower.Of course, if you're using online training platforms like Tensor, you don’t need to worry about this at all. At dim = 32, a resolution of 1024×1024 basically uses around 8GB of VRAM.I assume that if you’re training locally, you’ve probably already moved past 8GB GPUs. At dim = 64, anything above 1024×1024,  up to 1536×1536, will pretty much fill up a 16GB GPU. At dim = 128, VRAM usage increases drastically.Any resolution will exceed 16GB—Even 1024×1024 might go over 20GB,and 1536×1536 will come close to 40GB,which is far beyond what consumer gaming GPUs can typically handle. To put it simply:the dim value determines the level of detail a LoRA learns—the higher it is, the more detail it can capture.Lower dims learn less.Higher resolution and more content require higher dim values. For single-character 2D anime LoRAs, 32 is usually enough.For dual-character LoRAs, I recommend 32 or 64.For more than two characters, things get tricky—You can try 64 or 128, depending on your actual hardware. As for resolution, things get a little more complicated. In theory, the higher the resolution, the better the result.But in practice, I found that training at 2048×2048 doesn’t actually improve image quality when generating at 768×1152 or 1024×1536 etc.It also doesn’t really affect the body shape differences that show up at various resolutions.That said, since most checkpoints don’t handle high resolutions very well, generating directly at 2048×2048, 1536×2048, etc., often leads to distortions.However—those high-res images do have significantly better detail compared to low-res ones.Not all of them are distorted, either.At certain resolutions—like 1200×1600 or 1280×1920—the generations come out correct and stunningly good, far better than low-res + hires fix. But… training at 2048×2048 comes at a huge cost. So here’s the trade-off I recommend: Use dim = 64 Set resolution to 1280×1280 or 1344×1344 This gives you a balanced result on a 16GB GPU. I recommend 1344×1344, and setting the ARB bucket upper limit to 1536. 
With auto-upscaling enabled: Images with a 2:3 aspect ratio will be upscaled to 1024×1536 Images with a 3:4 ratio will become 1152×1536 This covers the most common aspect ratios, making it super convenient for both image generation and screenshot usage. Just a quick note here:Since I was never able to get Kohya to run properly in my setup, I ended up using a third-party GUI instead.The settings are pretty much the same overall—I’ve compared them with Tensor’s online interface,just simplified a bit, but all the important options are still there. As for the optimizer:I’ve tested a lot of them, and under my specific workflow—based on my setup, training set—I’ve found that the default settings of “Prodigy” worked best overall.But that’s not a universal rule, so you should definitely adjust it based on your own experience. I recommend always enabling gradient checkpointing.But gradient accumulation steps are usually unnecessary.If your resolution and dim don’t exceed your VRAM limits,there’s really no need to turn it on.And even if you do exceed the limit, turning it on doesn’t help much. Also, if you do enable it, you should enter the number as your batch size here,like"4", and your batch size should be 1. Put simply, gradient accumulation is just a simulation of batch sizes.In the end, the actual effect is determined by the product of accumulation steps × batch size.Here’s a quick summary of my “2boys LoRA” workflow: 1、Prepare source material to train two individual LoRAs,and use them to generate high-quality solo character images. 2、Stitch together the solo images to create dual-character compositions.If the original solo source images are already high-quality,you can use those directly for stitching too. 3、Sort images into three folders:“A”, “B”, and “AB”.Based on the number of images and your target steps (usually 800–1200),calculate the repeat value.repeat for “A” and “B” should be the same.For “AB,” calculate steps as 20%–40% of the solo steps,and adjust its repeat accordingly. 4、Tag the images.Pay attention to tag order, and for stitched images, make sure to include specific tags like"split screen", "collage", "two-tone background", etc.(More on this part later.) 5、Set up your training parameters.An epoch count of 10–16 is ideal; usually, 10 is more than enough.Start training, and monitor the loss curve and preview outputs as it runs. I have to throw some cold water on things first:Even after completing all the steps above,you might get a LoRA that works decently,but more often than not, it won’t turn out to be a satisfying “2boys LoRA.” What follows might get a bit long-winded,but I think these explanations are necessary—for better understanding and comparison—so I’ll try my best to keep the logic clear. As mentioned earlier, generating with a dual-character LoRA tends to run into a lot of common errors.So here I’ll try to analyze, from my limited understanding, the possible causes, some countermeasures, and also the unsolvable issues.Like I said before, the AI’s learning mechanism when training a LoRA doesn’t treat something like “A, black hair, red eyes” as an actual boy.Instead, it treats “1boy = A = black hair = red eyes” as a single, bundled concept.When all those tags show up together, the AI will generate that concept completely.But when only some of the tags are present, because they are so highly correlated as a group—especially with “1boy”—you end up getting partial features of the full set, even if not all tags are written. 
This is why a LoRA trained only on A and B single-character images can’t generate a correct two-boy image:“1boy” includes both A and B’s traits. Similarly, if your dataset includes only “AB” (two-character) images, then“2boys” becomes something like: “A + black hair + red eyes + B + blue hair + grey eyes” In this case, “A” or “B” is no longer a clean standalone identity,because they’re each tied to part of the “2boys” concept. Notice how I used “+” signs here, instead of the “=” we used with single-character images.That’s because when you train “1boy,” there are fewer tags involved,so all those traits get lumped together neatly. But for “2boys,” the AI doesn’t understand that there are two separate boys.It’s more like it takes all those features—“A, black hair, red eyes, B, blue hair, grey eyes”—and smears them onto a blank sheet of paper, which becomes two humanoid outlines,and the AI just randomly paints the traits onto the shapes. Even though most checkpoints don’t natively support “2boys” well,when you load a LoRA, the checkpoint’s weights still dominate over the LoRA.In my testing, whether using solo or duo images—if you don’t set up the trigger words A, B, to help the model associate certain features (like facial structure or skin tone) with specific characters,then the base model’s original understanding of “1boy” and “2boys” will interfere heavily,causing the model to simply fail to learn correctly.So for a “2boys LoRA,” it’s essential to define character trigger tags. Here’s where things get paradoxical: In single-character images, the tag “A” equals all of A’s traits.But in two-character images, “A” doesn’t just equal A.Instead, “A” is associated with the entire bundle of“2boys + A + black hair + red eyes + B + blue hair + grey eyes.” So when generating “2boys,”using the trigger “A” + “B” actually becomes a critical hit—a double trigger:you’re stacking one “definite A” + one “fuzzy A,”plus one “definite B” + one “fuzzy B.” That’s why using a 2boys LoRA often leads to trait confusion or extra characters appearing.It’s not just because the checkpoint itself lacks native support for dual characters.
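To pull the scattered recommendations from this part into one place, here they are as a single settings summary, written as a plain Python dict with kohya-style key names. The author uses a third-party GUI, so the labels will differ in practice, and the network_alpha value is my own assumption; the article never states one.

```python
# Summary of the training settings discussed in this part (sketch, not a config file).
training_settings = {
    "base_model": "Illustrious 2.0 (official)",
    "network_dim": 64,                 # 32 is enough for solo; 64 suggested for two characters
    "network_alpha": 32,               # assumption: alpha = dim / 2, not specified in the article
    "resolution": (1344, 1344),        # compromise for 16 GB VRAM
    "enable_bucket": True,             # "enable arb bucket"
    "max_bucket_reso": 1536,           # 2:3 -> 1024x1536, 3:4 -> 1152x1536 with auto-upscaling
    "bucket_no_upscale": False,        # auto-upscale stays on with the 1344/1536 setup
    "train_batch_size": 1,
    "max_train_epochs": 10,            # 10-16; 10 is usually enough
    "optimizer": "Prodigy",            # default optimizer settings worked best for the author
    "gradient_checkpointing": True,
    "gradient_accumulation_steps": 1,  # not needed; it only simulates a larger batch size
    "cache_latents_to_disk": True,     # keeps VRAM tied mainly to resolution and network_dim
}
```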
( 4 ) My Second Note on Experiences, About "2boys" ( 4 )


1 https://tensor.art/articles/8935784269951841612 https://tensor.art/articles/8938012445298126503 https://tensor.art/articles/8938115792948804675 https://tensor.art/articles/8938481466464460196 https://tensor.art/articles/8941842085099989687 https://tensor.art/articles/8941965157387888398 https://tensor.art/articles/894202977517102361You can also use this method with black-and-white manga as your training dataset.As long as the tags are properly applied, it works just fine. You can manually erase unwanted elements like text, speech bubbles, panel borders, or any distracting lines—this helps reduce visual noise in the dataset. That said, if you have any colored images available, be sure to include them too.That way, the model can learn the correct color mappings from the color images, and apply them to the grayscale ones.Otherwise, the AI will try to guess color information based on its own understanding—and it’ll just end up picking something random based on the style.  When training on black-and-white manga, tagging is absolutely essential.At a minimum, make sure you include:“greyscale,” “monochrome,” “comic,” “halftone,” etc.—these tags tell the AI what kind of image it’s looking at.When using an auto-tagging tool, try adjusting the confidence threshold to 0.1—this helps detect lower-confidence visual elements that might still be important. Also, manga-specific drawing techniques—like "halftone""speech bubble",—should be explicitly tagged if possible.You can use the Danbooru tag index to look up the correct vocabulary for these features. Even for things like hair and eye color in B&W manga, it’s totally okay to use normal tags like “brown hair” or “brown eyes.”As long as the image is also tagged with “greyscale” or “comic,” then more advanced community-trained checkpoints will be able to account for that and avoid misinterpreting the color info. And if you find that your results still aren’t coming out right, you can always tweak things using positive/negative prompt tokens, regenerate some images, and retrain using those images.Next, let’s talk about how many images you actually need in the training set when using this method. In single-character LoRAs, it's ok to make the dataset as diverse and rich as possible, since you can always fix minor issues during image generation. But for 2boys LoRAs, it’s a different story.You really need to carefully filter out the best-quality images—skip the blurry ones—and put both your original source material and any AI-generated images you plan to reuse in the same folder.If you want the LoRA to lean more toward the original art style, you can increase the ratio of raw, unmodified source images. And don’t worry about file names—even if you duplicate the best images and their corresponding .txt tag files, your OS (like Windows11) will auto-rename the duplicates, and LoRA training tools support UTF-8 filenames just fine. You don’t need to rename everything into English. Now, let’s talk about image types. Close-up images almost always look better than full-body ones.That’s just how AI generation works—detail quality falls off fast at long distances.Even tools like adetailer may struggle to recover facial features at full-body scale. 
However, full-body images are crucial for teaching the AI correct proportions. So here's a trick I discovered: when generating full-body samples, you can deliberately suppress certain features to prevent the AI from learning bad versions of them. For example, use prompt tokens like "closed eyes," "faceless," or even "bald" to make the AI leave out those features entirely. That way, you get a clean full-body reference without noisy detail on the face or hair. Don't worry, the hair and eye features will still be learned from the other images in your set. And because you explicitly tagged those "blank" images as "faceless" and so on, they won't bleed into generation unless you use those tags later. That said, if your setup can already produce proper full-body detail using hires.fix or adetailer, don't bother with the faceless trick. In my testing, 99% of the time the AI respects the "faceless" tag, but there's always that one weird generation where you get something uncanny. It can be spooky. The same logic applies to hands and feet: you can use poses, gloves, socks, or "out of frame" to suppress poorly drawn hands, or even generate dedicated hand references separately.

The key point here is that this trick isn't about increasing generation quality; it's about making sure the AI doesn't learn bad patterns from messy source data. If you've ever used character LoRAs that produce blurry or distorted hands, this is one of the reasons. Yes, some of it comes from checkpoint randomness, but more often it's because anime source material barely bothers drawing hands, and the LoRA ended up overfitting on lazy animation frames.

So that's the general idea behind building your dataset. But how many images do you actually need, and how should you set your training parameters? Here's what I found after testing dozens of trial LoRAs based on this method.

At minimum, you need:
20 solo images per character (A and B)
15 dual-character images (AB)

And in general: if you keep the step count the same, more images with lower repeat values gives better results than fewer images with high repeat values. Sure, you can take shortcuts by duplicating images to pad the count, but it's still best to exceed the minimums above: at least 20 solo images for A, 20 for B, and 15 for AB. From my testing, for any 2boys LoRA more images is never a problem, but going below that threshold greatly increases the risk of feature blending or of a character failing to train at all.

How to Calculate Step Count

Let's get into the math. Assuming batch size = 1, your effective training steps per folder are:

number of images × repeat × epoch

Let's say A and B both have 60 images, repeat = 2, epoch = 10. Then:
A: 60 × 2 × 10 = 1200 steps
B: same, 1200 steps

Now, for the AB (dual-character) images, try to keep the total training steps at 20%–40% of the solo step count. For example, if A and B are both trained for 1200 steps, then AB should use 1200 × 0.4 = 480 steps. Assuming epoch = 10, that's 480 ÷ 10 = 48 image passes per epoch, so your AB folder could be:
48 images × 1 repeat, or
24 images × 2 repeats, or
15 images × 3 repeats
(any of these combos works; 15 × 3 × 10 = 450 steps, which still lands inside the 20%–40% window)

Why this matters: as I explained earlier, the model doesn't actually "understand" that A and B are separate characters. Instead, "A + B" gets merged into a conjoined concept, like a two-headed chimera. Training on "AB" is essentially learning a completely different concept than A or B solo. But because they all share overlapping tokens, they affect each other.
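If you'd rather not redo this arithmetic by hand every time, here is a tiny sketch of the same calculation (plain Python, batch size 1; the numbers are the ones from the example above):

def total_steps(images, repeat, epoch):
    # effective steps for one folder at batch size 1
    return images * repeat * epoch

epoch = 10
solo_steps = total_steps(60, 2, epoch)      # 60 * 2 * 10 = 1200 for A (and for B)
ab_target = int(solo_steps * 0.4)           # keep AB around 20-40% of the solo steps
ab_per_epoch = ab_target // epoch           # 480 / 10 = 48 image passes per epoch

for ab_images in (48, 24, 15):              # a few possible AB folder sizes
    repeat = max(1, round(ab_per_epoch / ab_images))
    print(f"{ab_images} AB images x {repeat} repeat -> {total_steps(ab_images, repeat, epoch)} steps")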
So when you prompt "A + B" during generation, the model is actually stacking one "A," one "B," plus one hidden "A" and one hidden "B" lurking underneath it all. The closer your AB training steps get to those of A and B, the more this overlapping weight stacking leads to feature confusion.

Now, here's another issue: each character learns at a different pace. Let's say A gets learned completely by step 600, but B still lacks features at that point. If you continue to step 800, A becomes overfit while B is only just reaching the ideal point. At 1000 steps, A is a mess and B might be just starting to overfit. This mismatch increases the chance that their traits will blend together. One workaround is to make sure AB has more than 15 images, and to give B's solo folder a few more images than A's while keeping the repeat value the same between them.

What About Larger Batch Sizes?

If your goal is to stabilize character clothing or reduce merging, you can try a higher batch size for intentional overfitting. Still keep the solo step count between 800 and 1200, and recalculate the repeats. Here's an example:
A and B each have 60 images
target: about 1000 steps
batch size = 4
epoch = 10

Then: 1000 ÷ 10 × 4 = 400 image passes per epoch, and 400 ÷ 60 ≈ 6.66, rounded down to 6 repeats. So 60 × 6 ÷ 4 × 10 = 900 steps.

Now for AB. Say you have 30 images: 900 × 0.4 = 360 target steps, 360 ÷ 10 × 4 = 144 image passes per epoch, and 144 ÷ 30 = 4.8, rounded to 5 repeats. So 30 × 5 ÷ 4 × 10 = 375 steps, and 375 ÷ 900 ≈ 0.417, which sits right at the top of the 20%–40% range (close enough in practice; round down to 4 repeats if you want to stay safely inside it).
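The same calculation as a small sketch (illustrative only; real trainers floor the per-epoch step count, so treat the totals as approximate):

import math

def steps(images, repeat, epoch, batch_size):
    # approximate total optimization steps for one folder
    return images * repeat / batch_size * epoch

epoch, bs = 10, 4
solo_repeat = math.floor(1000 / epoch * bs / 60)           # 6
solo = steps(60, solo_repeat, epoch, bs)                    # 900 steps for A (or B)

ab_images = 30
ideal = 0.4 * solo / epoch * bs / ab_images                 # 4.8 "ideal" fractional repeat
for r in (math.floor(ideal), math.ceil(ideal)):
    total = steps(ab_images, r, epoch, bs)
    print(f"AB repeat {r}: {total:.0f} steps, {total / solo:.0%} of solo")
# -> repeat 4: 300 steps, 33% of solo   /   repeat 5: 375 steps, 42% of solo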
( 3 ) My Second Note on Experiences, About "2boys" ( 3 )

1 https://tensor.art/articles/8935784269951841612 https://tensor.art/articles/8938012445298126504 https://tensor.art/articles/8938203152583353995 https://tensor.art/articles/8938481466464460196 https://tensor.art/articles/8941842085099989687 https://tensor.art/articles/8941965157387888398 https://tensor.art/articles/894202977517102361Let’s rewind to the beginning—back when I started working on my Lagoon Engine LoRA. Step one, of course, was building the training dataset.My first instinct was: Well, it’s a 2boys LoRA, so I need images with both boys together.Even though the available materials were limited, the good thing was that the two brothers almost always appear together in the source material—solo shots of either of them are actually pretty rare.So I collected some dual-character images, then cropped them manually to extract individual character shots. I put each set of solo images into its own folder and made sure they had the same number of repeats during training. From the very first alpha version, all the way through various micro-adjustments and incremental releases up to beta, I stuck to that same core idea. And when the beta version finally came out and the results looked pretty good, I was over the moon.Naturally, I wanted to replicate that setup for another 2boys LoRA. But… total failure.I made several more 2boys LoRAs with different characters, and every single one of them had serious problems.Either the features got horribly blended, or the LoRA straight-up failed to learn the characters at all. It was super frustrating. I couldn’t figure it out. Was it really just luck that the first one worked? Did I get lucky with the way the alpha and beta versions happened to progress, avoiding the worst-case scenarios?I didn’t want to believe that. So I went back and did a series of controlled variable tests, trying to isolate what might be causing the difference. I made a whole bunch of test LoRAs just to look for clues.That process was full of messy trial-and-error, so I won’t write it all out here.Let’s skip to the conclusion: Making a truly stable and controllable 2boys LoRA is almost impossible. Most of what you’re doing is just trying to stack the odds in your favor—doing whatever you can to make sure the correct information is actually learned and embedded into the LoRA, so that it at least has the chance to generate something accurate. Let me try to explain, at least in a very basic and intuitive way, how this works—both from what I’ve felt in practice and from a surface-level understanding of the actual mechanics. Training a LoRA is kind of like doing a conceptual replacement. Say you have this tag combo in your dataset:“1boy, A, black hair, red eyes.”Let’s say “A” is your character’s trigger token. Inside the LoRA, those tags don’t really exist independently. The model ends up treating them like a single, bundled concept:“1boy = A = black hair = red eyes.” That means when you use these tags during generation, the LoRA will override whatever the base checkpoint originally had for those tags—and generate “A.”Even if you remove some of the tags (like “A” or “black hair”) and only keep “1boy,” you’ll still get something that resembles A, because the LoRA associates all of those traits together. 
Now let’s add a second character and look at what happens with this:“2boys, A, black hair, red eyes, B, blue hair, grey eyes.” The AI doesn’t actually understand that these are two separate boys.Instead, it just sees a big lump of tags that it treats as a single concept again—this time, the whole block becomes something like:“2boys + A + black hair + red eyes + B + blue hair + grey eyes.” So if your dataset only contains pictures with AB, it won’t be able to generate A or B separately—because A and B are always bundled with each other’s features.If you try generating “1boy, A,” it won’t really give you A—it’ll give you a blend of A and B, since A’s identity has been polluted with B’s features in the training data. On the flip side, if your dataset only contains solo images of A and B—no dual-character pictures at all—it’s basically the same as training two separate LoRAs and loading them together. The features will mix horribly. Even as I’m writing this explanation in my native language, I’m tripping over the logic a little—so translating it into English might not fully capture the idea I’m trying to get across.Apologies in advance if anything here seems off or confusing.If you find any parts that sound wrong or unclear, I’d really appreciate any feedback or corrections. And that brings us to an important question:If we want a 2boys LoRA to be able to generate both 1boy and 2boys images, does that mean we need both solo and dual images in the training set?Yes.Like I mentioned earlier, for popular characters, you don’t even need a LoRA—the checkpoint itself can usually generate them just fine.But when it comes to more obscure ships, or really niche character pairings, there just aren’t many usable images out there.You might not even have enough high-quality dual shots of them together—let alone clean solo images. And for older anime series, image quality is often poor, which directly affects your final LoRA performance. So I had to find a workaround for this data scarcity problem. I came up with an idea, tested it—and surprisingly, it worked. If you’ve seen any of the LoRAs I’ve uploaded, you’ll notice that they all come from source material with very limited visual assets. And yet, they’re all capable of generating multi-character results. So let me explain how this approach works. Fair warning: this part might get a little wordy and logically tangled.I’m not that great at explaining complex processes in a concise way.So translating this into English might only make it more confusing, not less. Please bear with me! First, you can follow the method from my first article to train two separate single-character LoRAs.If you’ve got plenty of high-quality materials, one round of training is usually enough.But if the source is old, low-res, or limited in quantity, you can use the LoRA itself to generate better-quality solo images, then retrain on those. Next, take those generated solo images and combine them into dual-character compositions.Yes, I’m talking about literally splicing two single-character images together.This kind of "composite 2boys" image even has a proper tag on danbooru—so don't worry about the AI getting confused.In fact, based on my tests, these handmade 2boys images are actually easier for the model to distinguish than anime screenshots or official illustrations. Let me break that down a bit:In anime screenshots, characters are often drawn at different depths—one in front, one behind, etc. 
That makes it harder for the AI to learn accurate relative body proportions.If the shot is medium or long distance, facial detail is usually poor.If it’s a close-up or fanart-style composition, the characters tend to be more physically close, which makes it easier for the AI to confuse features between them during training. By contrast, the composite images you make using generated solo pics tend to have clear spacing and symmetrical framing—making it easier for the AI to learn who’s who. Of course, this whole process is more work.But for obscure character pairs with almost no usable material, this was the most effective method I’ve found for improving training quality. If the characters you want to train already appear together all the time, and you can easily collect 100+ dual-character images, then you don’t need to bother with this method at all.Just add a few solo images and train like normal.The more material you have, the better the AI learns to distinguish them across different contexts—and the fewer errors you’ll get.This whole process I’m describing is really just for when you have two characters with little or no decent solo or dual images available.It’s a workaround, not the ideal path.  As shown above, you can even manually adjust the scale between the characters in your composite image to increase the chance of getting accurate proportions.However, testing shows this only really helps when the characters have a noticeable size difference, or when the height goes all the way up to the forehead.When you scale a character to chest or neck height, it often doesn’t work very well.Also, even though this method can boost generation quality, it also increases the risk of overfitting—especially with facial expressions and poses, which may come out looking stiff.To fix this, you can create multiple scenes with different angles and actions, and include both the original and the composite versions in your training set.That way, the model learns more variety and avoids becoming too rigid.
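If you want to automate the splicing step described above, a minimal Pillow sketch could look like the following (assuming the pillow package and two placeholder solo images, a_solo.png and b_solo.png; scale_b is just a knob for the manual size adjustment mentioned earlier):

from PIL import Image

def make_duo(path_a, path_b, out_path, height=1024, scale_b=1.0, gap=48):
    a = Image.open(path_a).convert("RGB")
    b = Image.open(path_b).convert("RGB")
    # resize both to a shared height; scale_b nudges the relative size difference
    a = a.resize((int(a.width * height / a.height), height))
    hb = int(height * scale_b)
    b = b.resize((int(b.width * hb / b.height), hb))
    canvas = Image.new("RGB", (a.width + b.width + gap, height), "white")
    canvas.paste(a, (0, 0))
    canvas.paste(b, (a.width + gap, height - hb))   # align the shorter boy to the ground line
    canvas.save(out_path)

make_duo("a_solo.png", "b_solo.png", "ab_composite_001.png", scale_b=0.9)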
( 2 ) My Second Note on Experiences, About "2boys" ( 2 )

1 https://tensor.art/articles/8935784269951841613 https://tensor.art/articles/8938115792948804674 https://tensor.art/articles/8938203152583353995 https://tensor.art/articles/8938481466464460196 https://tensor.art/articles/8941842085099989687 https://tensor.art/articles/8941965157387888398 https://tensor.art/articles/894202977517102361So why not just use NovelAI, since it’s so powerful?Because it’s closed-source and you can’t use LoRAs with it. And honestly, that whole universe of LoRAs—hundreds, thousands of them—is just too tempting to walk away from. Sure, NovelAI includes a lot of character data, but it’s still “limited.” And there was a point where it couldn’t generate NSFW content, depending on version and timing. Can we be sure the same won’t happen again, especially considering how things are going on platforms like Tensor? If the pairing you want to generate isn’t super obscure—say, something like Killua x Gon, or a crossover like Ash x Taichi—then yeah, NovelAI is highly recommended. It’s easy to use, has tons of art styles and themes, and honestly just blows open-source models out of the water in many cases. Even with open-source checkpoints, it’s not like you can’t generate those pairings at all. But keep in mind—results vary a lot depending on the checkpoint. Sometimes, using high-quality single-character LoRAs along with matching trigger words can help maintain character identity while improving the final quality. Yeah, this pose is a joke—it’s based on recent events involving Tensor and other platforms. 😂 Anyway, as shown here, if the checkpoint already contains character information, then adding a LoRA doesn’t usually create too much chaos. For popular characters from visually polished and high-profile anime—like Tanjiro, for example—you really don’t even need a LoRA. The base checkpoint already does a pretty good job of recreating them. Just to emphasize it again:If you’re trying to generate fanart for “2boys” pairings that aren’t super obscure, or ones with more than two characters, I strongly recommend using NovelAI first. Alternatively, try generating them directly using a capable checkpoint—you might be surprised how many characters don’t actually need a LoRA to be recognizable. Here’s another example:Say you’re trying to generate Edward Elric and Alphonse Elric from Fullmetal Alchemist. Both can be directly generated using certain checkpoints. Alphonse, when prompted alone, will show up in human boy form—not the armor—so that’s great.But as soon as you try to generate both together, the model pulls up the most common metadata for the two brothers—which usually means Ed + armored Al—so the human form of Al becomes almost impossible to get. If one character’s info is richly embedded in the checkpoint and the other’s isn’t, you’ll get poor results. Even when you pair this kind of checkpoint with a LoRA to help boost the weaker character, features still end up getting mixed, and the overall art style may also be altered.To the untrained eye, this kind of fusion might not stand out—but once you know what to look for, it becomes obvious. If you try to generate a character that isn’t in the checkpoint at all, only relying on a LoRA, then character features almost always get fused together. (There are tons of examples of this online, so I won’t bother including another one.) 
This is a NovelAI-generated image, and it didn’t use any positive or negative prompt tuning at all.You’ll need to experiment and tweak things more if you want to generate high-detail or highly stylized results. But even in this basic output, Leonhardt’s outfit is reproduced way better than what most LoRAs could manage. And Aoto, who’s a really niche character, still comes out surprisingly accurate.These two characters have nothing to do with each other, but NovelAI lets you combine them however you want. That said, NovelAI is a paid, closed-source commercial model, so every bit of GPU time costs money—you can’t afford to waste resources on endless trial and error. Now let’s talk about img2img and regional control workflows. There’s one big downside when it comes to characters being physically close to each other—like if they’re hugging or holding hands: the area where they make contact tends to look blurry or smudged.And beyond that, the workflow itself is pretty tedious. Whether you’re setting it up or just trying to use it day-to-day, it’s a hassle.That’s why, if you look at some creators on social media—even ones who’ve clearly made good-looking dual-character images—you’ll notice that most of their posts are still focused on 1boy.Why? Because working with LoRA + regional control setups is just that much trouble. Also, think about those moments where inspiration strikes out of nowhere—say, you come up with a perfect prompt on the spot. If you’re using an online app, it’s easy to just type it in and get going.But with these regional workflows, you’ll need a decent local PC to run everything. Worse yet, each plugin and workflow often needs a specific environment or version, and things constantly break.Personally, I gave up right away after trying them out. The more I tried to fix stuff after setup, the more bugs I ran into. So yeah, things like img2img and regional control can give decent results if you’re patient and willing to learn, but the learning curve is real—and the payoff isn’t always worth it. Still, if you’ve got the time and hardware for it, it might be worth a try. Also, let’s be real—there are some creator out there posting what look like flawless 2boys or multi-character images, and they don’t have that telltale NovelAI look either.The quality is way beyond what you can get with just LoRA stacking or inpainting.So don’t even doubt it—some of those are hand-drawn. Yes, people who know how to draw can absolutely also use AI. The two aren’t mutually exclusive. In fact, combining both skillsets might just be the ultimate power move. So, with all that said, I decided that making a dedicated 2boys LoRA for each of my favorite ships was the better path forward. It really can be any pairing. Any art style.You don’t need a crazy workflow—just the LoRA and the right prompt. The basic “principle” of a 2boys LoRA is actually pretty simple: as long as you have enough images where the two characters appear together, the LoRA will start learning from that.The hard part isn’t the theory—it’s how to make a dataset and training process that results in a LoRA that’s actually stable and usable. If you’ve ever trained or used a dual-character or multi-character LoRA, then you already know the common headaches: 1、Character features still get mixed up or swapped, and a fully correct output is just a matter of luck.2、Body proportions between the two characters become completely random—unless their original designs have super obvious size differences (like Chilchuck and Laios). 
Even if their builds are similar, you'll end up with one of them shrinking or growing unpredictably. You'll instinctively start adding prompts like "same size," but... yeah, that doesn't help at all.
3、Overall image quality can be pretty low, so you have to pair it with style/detail LoRAs and usually upscale with hires.fix, adetailer, or something similar.
4、And all the usual LoRA flaws still apply, like overfitting, distorted hands, and so on.

In the first article, I already talked a bit about the emotional rollercoaster of making my very first LoRA, which, by the way, was a 2boys LoRA. But I didn't go into much technical detail at the time. So this time, let's really dive in.

But first, sorry in advance. As far as I can tell, there's basically zero public discussion online about how to train LoRAs for 2boys. Maybe there are some small, private Discord servers where people have shared their methods, but there's no way for us to know. My approach here is based entirely on personal experience using the few existing multi-character LoRAs that are out there, plus the training parameters that were made public by their creators. From there, I did a bunch of experiments and exploration. So there's always the chance that I took a completely wrong path right from the beginning. Even if the results I get now seem "okay," I have no idea if I'm doing things the "right" way. At best, consider this a reference method. If you're interested in making your own 2boys LoRA and don't know where to start, maybe my process can serve as one way to approach it, or at least something to compare against.
( 1 ) My Second Note on Experiences, About "2boys" ( 1 )

2 https://tensor.art/articles/8938012445298126503 https://tensor.art/articles/8938115792948804674 https://tensor.art/articles/8938203152583353995 https://tensor.art/articles/8938481466464460196 https://tensor.art/articles/8941842085099989687 https://tensor.art/articles/8941965157387888398 https://tensor.art/articles/894202977517102361A quick recap— in a previous article, I talked about how I first fell into the world of AI-generated images, and how I started trying out LoRA training. I also shared some of my thoughts and experiences along the way. (Here’s the link: https://tensor.art/articles/868883505357024765) Looking at that article now, I have to say—some of the opinions in it feel a bit outdated at this point, though some parts are still valid and usable. I’ll be referencing a few of those ideas later in this article, so feel free to check the first one for additional context. I also made a promise in that post—if I ever learned more, I’d come back and share the updates. So here we are. This time, I’m going to talk about something I’ve been completely hooked on for a while now: making 2boys fanart and training LoRAs for "2boys". Before we jump in, let me say this clearly: English isn’t my native language, and honestly, I’m not very good at it either. So, like last time, this article was written in Chinese first and then translated into English with the help of AI tools. I’ll also be uploading the original Chinese version in case anyone prefers to read it directly or wants to translate it themselves. (As for the first article, I won’t be uploading a Chinese version—because for that one, I actually edited and tweaked the English draft a lot after translating, and never went back to rewrite a proper Chinese version.) As you already know, when it comes to AI-generated images, randomness is the absolute king. Everyone has their own go-to settings, and the results vary wildly depending on the model, the prompts, and personal aesthetic preferences. So this article is just me sharing my own experience—or maybe “impressions” is a better word—and the techniques I’ve picked up along the way. This is not a serious technical guide or a step-by-step how-to. Any theory I mention is only a surface-level take based on what I’ve felt from experimenting, and both the image generation and LoRA training discussed here are focused entirely on ACG-style boy characters—specifically, pre-existing characters from anime, manga, and games. That means no AI-randomly-generated “boys,” and definitely no real people involved. All of the image generation in this article is based on NoobAI and Illustrious derivative models. As for LoRA training, I only use the official Illustrious 2.0 model—unless otherwise stated later on, you can assume everything is built on top of that. Over the past few months, I’ve noticed that platforms like X , pixiv, Civitai, and all kinds of posts have been flooded with more and more 2boys content—and even multi-boy group images. Of course, most of it is NSFW, and a lot of it is the kind of thing you swipe past in a second: instant gratification, forgettable the next moment. But there are also some incredibly polished, undeniably well-made images that make you stop and wonder—how did they even make this? Back then, I was still a total beginner, so my very first thought was: They must have used two separate character LoRAs, right? And obviously, I had to try it myself. 
It just so happened that I found LoRAs for two of my favorite boys—both from a super obscure anime, with a ship that literally no one cared about (Maki x Arashi)—and somehow, it... kinda worked? Yep, it was an R-18 image, and I was way too excited to think clearly. I didn’t even care about the generation quality. The characters looked more or less correct, so I assumed I’d done it right. Full of confidence, I made another one with a different ship from the same series (Tsubasa x Shingo). Again, it kinda worked—yes, the characters would sometimes get blended together, but not always. So I doubled down on this misunderstanding that “2boys” images could just be done by loading two LoRAs together. Then came a total shocker: a well-known (actually, legendary) creator in the community released a dual-character LoRA for Yuta x Yomogi, and I was blown away. So this is also possible? Just one LoRA, and both characters in the same image? I posted a whole bunch of images using it (they’ve since been hidden on Tensor), and I misunderstood again—thinking oh wow, making a 2boys LoRA must be super easy! I got so hyped, thinking there’d be tons of dual-character LoRAs coming out soon, and that making fanart for my favorite ships would become a total breeze. By now, I bet you’re already laughing. Yep, it was a huge misunderstanding. If you’ve ever tried this yourself, you’ll know exactly what I’m talking about: when it comes to boys LoRAs, their individual features almost always end up getting blended together. Whether you're loading two separate LoRAs or trying to use one of those rare dual-character LoRAs, it's incredibly hard to get a clean, correct image. And dual-character LoRAs are slightly more manageable, but trying to load multiple LoRAs together? That’s not just “difficult,” it’s borderline impossible. You can mess with the prompt order, used "BREAK" and “different boys,” tweak the LoRA weights all you want—but the characters’ features still end up all mixed up. Looking back now at the images I made—and a lot of the ones I’ve seen on social media—they were mostly just wrong, only I was too hyped at the time to notice. Sure, there are a few images that look “perfect,” but I never dared ask the creators how they did it. And to make things worse, there’s barely any discussion online about how these images are actually made. You’ll find some info on mixed-gender or girl-girl LoRA combos, but boy-boy? None.So let’s skip the unnecessary storytelling and jump straight to the point. After a lot of experimenting (and failing), I’ve sorted out a few methods that more or less work: So after all that trial and error, here are the main approaches I’ve figured out so far:1. The checkpoint already “knows” a lot of characters.Sometimes, you can just write their names directly in the prompt. By checking the model’s README or checkpoint notes, you can find out which characters are already embedded into the model. For those that are, you can generate “2boys” images by simply using their recognizable names. In fact, many popular or classic characters work perfectly without any LoRA at all, and you can even mix them across different series.This should be the easiest method, but the catch is—the pairings people want to draw usually aren’t in the checkpoint. Even when the characters are there, their fidelity varies a lot. Some characters are only partially recognizable, and you’ll often need to add extra tags like eye color, hair color, hairstyle, and other traits to boost accuracy. 
But this extra-tag trick is only good for generating 1boy images; if you try to do 2boys, the added trait tags end up interfering with each other. (There's a small illustrative prompt right after this list.)

2. Img2img or inpainting. A common and pretty straightforward method. You can load two LoRAs and generate a base image, then selectively redo the mixed-up parts by enabling just one LoRA at a time and using inpainting. That's just one version of the process; there are many ways to implement the idea.

3. Regional control. This requires plugins and complex workflows in WebUI or ComfyUI. The idea is to divide the canvas into different regions and assign different LoRA weights to each region. I won't name specific extensions here; you can look them up yourself. Sometimes this works surprisingly well, but I'll explain later why I eventually gave up on this method.

4. Dual- or multi-character LoRAs. There aren't many of these around right now, but most of the ones that do exist work decently. Some are a bit unstable, while others are quite effective. I'll talk later about how I create 2boys LoRAs, so I won't go into more detail just yet.

5. And this one's huge: NovelAI. Still the most powerful commercial anime image model to this day. It supports an insanely large number of characters, far more than any open-source model. While most open models only include the main characters from a handful of franchises, NovelAI can generate even obscure side characters, and it adds new character data pretty quickly. It also supports built-in regional control, so not just two boys: you can specify position and action for even more characters. It's incredibly powerful, though yeah, the subscription isn't cheap. A lot of those beautifully done crossover images you see on social media are actually NovelAI creations. You can usually spot them, too: the art style is super consistent across characters and clearly distinct from open-source models.

6. Other new methods, like Flux Kontext. This might be one of the biggest breakthroughs of the year in image generation models. I haven't had time to explore it deeply yet, but it is capable of generating fanart. If you haven't tried it, I recommend giving it a shot. People often talk about how Flux is #1 for photorealism, but it's also surprisingly strong with anime-style content. The downside is the high cost and complexity, especially compared to SDXL-based workflows for LoRA training.

So yeah, those are the main approaches I've explored so far. I'm sure there are other methods I haven't discovered yet. But out of all these, I eventually chose to go with making "2boys" LoRAs.
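As promised above, here is roughly what approach 1 looks like in practice. This is purely illustrative: the character names are placeholders for characters your checkpoint already knows, and, per the caveat above, the per-character trait tags are kept to a minimum in the 2boys case.

1boy prompt: character one (series name), 1boy, solo, red eyes, spiked hair, looking at viewer, upper body
2boys prompt: 2boys, character one (series name), character two (series name), side-by-side, looking at viewer, simple background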
Possible Future Doujin LoRA Production Plans (Ongoing)

A list of LoRAs I'm currently working on or would like to work on, in no particular order. It might take a year, maybe two? Anyway, it's just a reminder to myself not to forget or get too lazy. If you happen to see one you're interested in, please don't get your hopes up. Truth is, I really don't have much free time... (T_T)

Also, I'm calling these the doujin versions because I've chosen to sacrifice fidelity in order to improve usability and reduce the chance of errors. Each LoRA will only include character information: no original outfits, accessories, or even things like band-aids. Since anime adaptations often mess with the original designs, the general rule is: if there's a manga, it takes priority; if not, then novel illustrations. Anime is used only as a color reference. Coloring and redrawing black-and-white manga takes a lot of time to select and redo, and sometimes extra art-style LoRAs are needed to assist; the more steps involved, the more the similarity may drop. But through repeated refinement I can strip out almost 100% of the unnecessary info, so that only the simplest prompt words are needed when using it, with no extra messy quality prompts or negative prompts, letting the checkpoint model perform at its best. It'll also be easier to combine with other effect or art-style LoRAs.

*******, & *** – that's a secret.
轟駆流 & 軍司壮太
宿海仁太 & 本間聡志
Mikami riku & hidaka yukio
Takaki & Aston, ride & mikazuki
"long-term plans" 星合の空 – arashi, maki, toma, yuta, tsubasa, shingo, shinjirou
****,**、、 – another secret
Tenkai knights – toxsa, chooki, guren, ceylan
complete Motomiya daisuke – da02 movie
complete 急襲戦隊ダンジジャー – kosuke, midori, kouji
刀剣乱舞 – aizen & atsushi
Shinra, Shou, Arthur
銀河へキックオフ!! – shou, aoto, tagi, ouzou, kota
shinkalion – tsuranuki, hayato, ryuji, tatsumi, gin, jou, ryota, taisei, ten
****, & **** – another secret, and with only the manga, it feels super difficult
Vanguard – kamui & kurono
Black☆star & soul
Subaru & Garfiel
Shirou & kogiru
complete アライブ - 最終進化的少年 - Alive: The Final Evolution – 叶太輔 & 瀧沢勇太
touma & tsuchimikado
yuta & yomogi
Etc.
A bit of my experience with making AI-generated images and LoRAs ( 5 )

https://tensor.art/articles/868883505357024765 ( 1 )https://tensor.art/articles/868883998204559176 ( 2 )https://tensor.art/articles/868884792773445944 ( 3 )https://tensor.art/articles/868885754846123117 ( 4 )Extract the character from the image and place them onto a true white background. You might lose a bit of original coloring or brushstroke texture, but compared to the convenience it brings, that’s a minor issue.But don’t be too naive—things like expression, pose, clothing, and camera angle still need to be described properly. Doing so helps the AI learn accurate depth of field, which in turn helps it learn the correct body proportions. After that, even if you don’t include camera-related prompts when using the LoRA, it’ll still consistently output correct body shapes.A lot of people use cutout characters for their training data, but their tags miss things like camera info. So you might get a buff adult under the “upper body” prompt, and a toddler under “full body.”By now, you should have a solid understanding of how to prepare your training dataset. 5. Parameter Settings This part is quite abstract and highly variable. There are already many tutorials, articles, and videos online that go into detail about training parameters and their effects. Based on those resources, I arranged several parameters and ran exhaustive tests on them. Since I’m not particularly bright and tend to approach things a bit clumsily, brute-force testing has always been the most effective method for me. However, given the limits of my personal time and energy, my sample size is still too small to really compare the pros and cons of different parameter sets. That said, one thing is certain: do not use any derivative checkpoints as your base model. Stick with foundational models like Illustrious or Noobai. Using a derived checkpoint will make Lora work only in this one checkpoint. Another helpful trick for learning from others is that when you deploy LoRA locally, you can directly view the training metadata within SD or WebUI. I’ll also include the main training parameters in the descriptions of any LoRAs I upload in the future for reference. In the following section, I’ll use the long and drawn-out process of creating the LoRA for Ragun Kyoudai as an example, and give a simple explanation of what I learned through it. But before we get into that case study, let’s quickly summarize the basic LoRA training workflow:The real first is always your passion.2. Prepare your dataset.3. Tag your dataset thoroughly and accurately.4. Set your training parameters and begin training. Wait—and enjoy the surprise you’ll get at the end. As mentioned earlier, the first LoRA I made for Ragun Kyoudai was somewhat disappointing. The generated images had blurry eyes and distorted bodies. I chalked it up to poor dataset quality—after all, the anatomy and details in the original artwork weren’t realistic to begin with. I thought it was a lost cause. I searched through all kinds of LoRA training tutorials, tips, articles, and videos in hopes of salvaging it. And surprisingly, I stumbled upon something that felt like a breakthrough: it turns out you can train a LoRA using just a single image of the character. The method is pretty simple. Use one clear image of the character’s face, and then include several images of unrelated full body figures. Use the same trigger words across all of them, describing shared features like hair color, eye color, and so on. 
Then adjust the repeat values so that the face image and the body images get the same total weight during training. When you use this LoRA, you just need to trigger it with the facial feature tags from the original image, and it swaps in a consistent body from the other images. The resemblance outside the face isn’t great, but it dramatically reduces distortion. This inspired me—what if the “swapped-in” body could actually come from the original character, especially when working with manga? That way, I could use this method to supplement missing information. I went back through the manga and pulled references of side profiles, back views, full body shots, and various camera angles that weren’t available in the color illustrations. I tagged these grayscale images carefully using tags like greyscale, monochrome, comic, halftone, etc., to make sure the AI learned only the body shape, hairstyle, and other physical features, without picking up unwanted stylistic elements. This approach did help. But problems still lingered—blurry eyes, malformed hands, and so on. So I pushed the idea further: I used the trained LoRA to generate high-quality character portraits using detail-focused LoRAs, specific checkpoints, and adetailer. These results then became new training data. In parallel, I used other checkpoints to generate bodies alone, adjusting prompt weights like shota:0.8, toned:0.5, to guide the results closer to the target physique or my own expectations. The idea was that the AI could “fit” these new generated samples to the rest of the dataset during training. And it worked. This is how the Lagoon Engine Beta version came to be. At this point, I could completely ditch the low-resolution color and manga images from the training dataset and just use AI-generated images. I used prompts like simple background + white background to create portrait and upper body images with only the character. To avoid having blurry eyes and inconsistent facial features in full body shots, I used the faceless tag or even manually painted over the heads to prevent the AI from learning them—allowing it to focus solely on body proportions. That said, white background tends to be too bright and can wash out details, while darker backgrounds can cause excessive contrast or artifacts around the character edges. The most effective backgrounds, in my experience, are grey or pink. During this time, I also experimented with making a LoRA using just one single character portrait—again from Lagoon Engine. It was just one full color image with a clear, unobstructed view. And when I applied the same method and added new characters to create a LoRA with four characters, I hit a wall. The characters started blending together—something I’d never encountered before. With En & Jin, mixing was incredibly rare and negligible, but with four characters, it became a real problem.I adjusted parameters based on references from other multi-character LoRAs, but nothing worked. I’m still testing—trying to find out if the problem is with parameters, the need for group images, or specific prompt settings. Although the four-character LoRA was a failure, one great takeaway was this: black-and-white manga can be used to make LoRAs. With current AI redrawing tools, you can generate training data using AI itself.Example: Compared to LoRAs based on rich animation materials, using black-and-white manga is much more difficult and time-consuming. But since it’s viable, even the most obscure series have a shot at making a comeback. 
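Going back to the repeat balancing at the start of this part (giving the single face image the same total weight as the pile of unrelated body images), the arithmetic is just this; the numbers are placeholders:

body_images = 12
body_repeat = 2
face_images = 1

# give the lone face image the same number of passes per epoch as the whole body folder
face_repeat = body_images * body_repeat // face_images     # 24

print("face folder:", face_images * face_repeat, "passes per epoch")
print("body folder:", body_images * body_repeat, "passes per epoch")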
To summarize, creating multiple LoRAs for the same target is a process of progressive refinement, like crafting a drink from a shot of espresso. The first dataset with detailed tags is your espresso—it can be consumed as-is or mixed however you like. This method also works surprisingly well when creating LoRAs for original characters (OCs). Since OCs can have more complex features, you can start by generating a base image using a fixed seed with just hair/eye color and style. Train a first LoRA on that, then gradually add more features like dyed highlights or complex hairstyles during image generation. If the added features aren’t stable, remove some, and train another LoRA. Repeat this layering process until your OC’s full complexity is captured. This approach is far more stable than trying to generate all features in one go, even with a fixed seed, due to randomness, it’s hard to maintain complex character traits across different angles without breaking consistency. One more note: regarding character blending when using multiple LoRAs—there seems to be no foolproof way to prevent it. Even adding regularization sets during training doesn’t completely avoid it. As of now, the lowest error rate I’ve seen is when using characters from the same series, trained with the same parameters, by the same author, and ideally in the same training batch. And with that, we’ve reached the end—for now. I’ll continue to share new insights as I gain more experience. See you next time~
A bit of my experience with making AI-generated images and LoRAs ( 4 )

https://tensor.art/articles/868883505357024765 ( 1 )https://tensor.art/articles/868883998204559176 ( 2 )https://tensor.art/articles/868884792773445944 ( 3 )https://tensor.art/articles/868890182957418586 ( 5 )When it comes to training LoRAs, trying to fix all the bugs at the source is seriously exhausting. Unless you're doing LoRA training full-time, who really has the time and energy to spend so much of their free time on just one LoRA? Even if you are full-time, chances are that you'd still prioritize efficiency over perfection. And even after going through all the trouble to eliminate those bugs, the result might only be improving the “purity” from 60% to 80%—just a guess. After all, AI is still a game of randomness. The final training parameters, repeats, epochs, learning rate, optimizer, and so on will all influence the outcome. You’ll never “purify” it to 100%. And really, even 60% can already be impressive enough. So—worth it? My personal take: absolutely. If a certain character—or your OC—is someone your favorite since childhood, someone who’s part of your emotional support, someone who represents a small dream in your life, then why not? They’ll always be worth it.I’ve only made a handful of LoRAs so far, each with a bit of thought and some controlled variables. I’ve never repeated the same workflow, and each result more or less met the expectations I had at the beginning. Still, the sample size is way too small. I don’t think my experiences are close to being truly reliable yet. If you notice anything wrong, please don’t hesitate to point it out—thank you so much. And if you think there’s value in these thoughts, why not give it a try yourself?Oh, right—another disclaimer: due to the limitations of my PC setup, I have no idea what effect larger parameter values would have. All of this is based on training character LoRAs using the Illustrious model.Also, a very important note: this is not a LoRA training tutorial or a definitive guide. If you’ve never made a LoRA yourself but are interested in doing so, try searching around online and go ahead and make your first one. The quality doesn’t matter; just get familiar with the process and experience firsthand the mix of joy and frustration it brings. That said, I’ll still try to lay out the logic clearly and help you get a sense of the steps involved.0. Prepare your training set. This usually comes from anime screenshots or other material of the character you love. A lot of tutorials treat this as the most crucial step, but I won’t go into it here—you’ll understand why after reading the rest.1. Get the tools ready. You’ll need a computer, and you’ll need to download a local LoRA trainer or a tagging tool of some kind. Tools like Tensor can sometimes have unstable network connections, but they’re very convenient. If your internet is reliable, feel free to use Tensor; otherwise, I recommend doing everything on your PC.2. If you’ve never written prompts using Danbooru-style tags before, go read the tag wiki on Danbooru. Get familiar with the categories, what each one means, and look at the images they link to. This is super important—you’ll need to use those tags accurately on your training images.3. Do the auto-tagging. These tagging tools will detect the elements in your image and generate tags for them. On Tensor, just use the default model wd-v1-4-vit-tagger-v2—it’s fine, since Tensor doesn’t support many models anyway, and you can’t adjust the threshold. On PC, you can experiment with different tagger models. 
Try setting the threshold to 0.10 to make the tags as detailed as possible. You can adjust it based on your own needs.4. Now comes the most critical step—the one that takes up 99% of the entire training workload.After tagging is complete, fix your eyes on the first image in your dataset. Just how many different elements are in this image? Just like how the order of prompts affects output during image generation, prompts during training follow a similar rule. So don’t enable the “shuffle tokens” parameter. Put the most important tokens first—like the character’s name and “1boy.”For the character’s traits, I suggest including only two. Eye color is one of them. Avoid using obscure color names; simple ones like “red” or “blue” are more than enough. You don’t need to describe the hairstyle or hair color in detail—delete all automatically generated hair-related tags. Of course, double-check the eye color too. Sometimes it tags multiple colors like “red” and “orange” together—make sure to delete the extra ones.When it comes to hair, my experience is: if the color is complex, just write the hairstyle (e.g., “short hair”); if the hairstyle is complex, just write the color. Actually, if the training is done properly, you don’t even need to include those—just the character name is enough. But in case you use this LoRA with others that have potential for overfitting, it’s a safety measure to include them.Any tags about things like teeth, tattoos, etc., should be completely removed. If they show up in the auto-tags, delete them. The same goes for tags describing age or body type, such as “muscular,” “toned,” “young,” “child male,” “dark-skinned male,” etc. And if there are nude images in your dataset, and you think the body type looks good and you want future generations to match that body type, do not include tags like “abs” or “pectorals.”You may have realized by now—it’s precisely because those tags weren’t removed that they got explicitly flagged, and so the AI treats them as interchangeable. That’s why you might see the body shape, age, or proportions vary wildly in outputs. Sometimes the figure looks like a sheet of paper. That’s because you had “abs” and “pectorals” in your tags and didn’t realize those became part of the trigger prompts.If you don’t take the initiative to remove or add certain tags, you won’t know which ones have high enough weight to act as triggers. They’ll all blend into the chaos. If you don’t call them, they won’t appear. But if you do—even unintentionally—they’ll show up, and it might just bring total chaos.Once you’re done with all that, your character’s description should include only eye color and hair.For the character name used as a trigger word, don’t format it like Danbooru or e621. That’s because Illustrious and Noobai models already recognize a lot of characters. If your base model already knows your character, a repeated or overly formal name will only confuse it. What nickname do you usually use when referring to the character? Just go with that.See how tedious this process is, even just for tags setup? It’s far more complex than just automatically tagging everything, batch-adding names, and picking out high-frequency tags.Remember the task at the start of this section? To identify all the elements in the first image. You’ve now covered the character features. Now let’s talk about the clothing.Let’s say the boy in the image is wearing a white hoodie with blue sleeves, a tiger graphic on the front, and a chest pocket. 
Now you face a decision: do you want him to always wear this exact outfit, or do you want him to have a new outfit every day?Auto-tagging tools don’t always fully tag the clothing. If you want him to wear different clothes all the time, then break down this outfit and tag each part accordingly using Danbooru-style tags. But if you want him to always wear the same thing, just use a single tag like “white hoodie,” or even give the outfit a custom name.There’s more to say about clothing, but I’ll save that for the section about OCs. I already feel like this part is too long-winded, but it’s so tightly connected and info-heavy that I don’t know how to express it all clearly without rambling a bit.Next, observe the character’s expression and pose. Use Danbooru-style tags to describe them clearly. I won’t repeat this later. Just remember—tags should align with Danbooru as closely as possible. Eye direction, facial expression, hand position, arm movement, leaning forward or backward, the angle of knees and legs—is the character running, fighting, lying down, sitting, etc.? Describe every detail you can.Now, observe the background. Sky, interiors, buildings, trees—there’s a lot. Even a single wall, or objects on the wall, or the floor material indoors, or items on the floor—or what the character is holding. As mentioned earlier, if you don’t tag these things explicitly, they’re likely to show up alongside any chaotic high-weight tags you forgot to remove, suddenly appearing out of the ether.Are there other characters in the scene? If so, explain them clearly using the same process. But I recommend avoiding images like this altogether. Many LoRA datasets include them—for example, a girl standing next to the boy, or a mecha, or a robot. You need to “disassemble” these extra elements. Otherwise, they’ll linger like ghosts, randomly interfering with your generations.Also, when tagging anime screenshots, the tool often adds “white background” by default—so this becomes one of the most common carriers of chaos.At this point, you might already be feeling frustrated. The good news is that there are plenty of tools now that support automatic background removal—like the latest versions of Photoshop, some ComfyUI workflows, and various online services. These can even isolate just the clothes or other specific objects.
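To batch-apply the kind of tag clean-up described over the last couple of sections, here is a minimal sketch that strips a blacklist of body-type, age, and extra hair/teeth tags from every auto-generated caption in a folder. The folder name and the blacklist itself are only examples; build your own list based on which traits you want baked into the character rather than left swappable.

from pathlib import Path

blacklist = {
    "muscular", "toned", "abs", "pectorals", "young", "child male",
    "dark-skinned male", "teeth", "tattoo", "black hair", "orange eyes",
}

for txt in Path("dataset/10_boy").glob("*.txt"):
    tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",")]
    kept = [t for t in tags if t and t not in blacklist]
    txt.write_text(", ".join(kept), encoding="utf-8")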
A bit of my experience with making AI-generated images and LoRAs ( 3 )

https://tensor.art/articles/868883505357024765 ( 1 )https://tensor.art/articles/868883998204559176 ( 2 )https://tensor.art/articles/868885754846123117 ( 4 )https://tensor.art/articles/868890182957418586 ( 5 )Alright, let’s talk about LoRA—many things in AI image generation really need to be discussed around it. But before that, I suppose it’s time for a bit of preamble again. LoRA, in my view, is the most captivating technology in AI image generation. Those styles—whether they’re imitations or memes. Those characters—one girl in a hundred different outfits, or the body of that boy you’re madly in love with. A large part of the copyright debate surrounding AI actually stems from LoRA, though people who aren’t familiar with AI might not realize this. In reality, it has hurt many people—but it has also captured many hearts. When you suddenly see an image of a boy that no one on any social media platform, in any language, is talking about—don’t you feel a sense of wonder? And when you find out that the image was created with LoRA, doesn’t your heart skip a beat? By the time you’re reading this, my first LoRA for Ragun Kyoudai has already been released. From the moment I had even the slightest thought of making a LoRA, I was determined that they had to be the first—the absolute first. But it wasn’t easy. The full-color illustrations I saved of them as a kid? Gone, thanks to broken hard drives and lost phones. The images you can find online now are barely 200x300 in resolution, and there are painfully few of them. I still remember the composition and poses of every single color illustration from 20 years ago, but in the internet of 2024, they’ve completely disappeared. All I had left were the manga and its covers, CDs, and cards. Could it be done? While searching for LoRA training tutorials and preparing the dataset for training, more and more doubts formed in my mind. Because of the art style, these images didn’t contain accurate anatomical structures. There weren’t multi-angle views—especially not from behind. Compared to datasets sourced from anime, mine felt pitifully incomplete. Still, I nervously gave it a first try. The result was surprising—AI managed to reproduce the facial features of the characters quite well. But it was basically just close-up shots. On the base model used for training, the generated images were completely unrecognizable outside the face. Switching to other derivative models, the characters no longer resembled themselves at all. So was it that AI couldn’t do it? Or was I the one who couldn’t? Or was it simply impossible to create a LoRA with such a flawed dataset? I decided to set it aside for the time being, since with my limited experience, it was hard to make a solid judgment. Later, while generating AI images, I began using LoRAs made by various creators. I wanted to know what differences existed between LoRAs—aside from the characters themselves. I didn’t discover many differences, but I did notice a lot of recurring bugs. That’s when I realized—I’d found a lead. Maybe understanding the causes of these bugs is the key to improving LoRA training. So let’s talk about it: What are these bugs? What do I think causes them? How can we minimize them during image generation? How can we reverse-engineer them to improve LoRA training? Just to clarify—as you know, these experiences are only based on LoRAs of boy characters. Not girls, and not those overly bara-styled characters either. 1. Overexposure2. Feminization3. 
On the base model used to train the LoRA (e.g., Pony, Illustrious), it doesn’t work properly: prompts struggle to change character poses or expressions; it’s impossible to generate multi-angle images like side or front views; eyes remain blurry even in close-ups; body shapes are deformed; figures become flat like paper; body proportions fluctuate uncontrollably.4. Because of the above, many LoRAs only work on very specific checkpoints.5. Even on various derivative checkpoints, key features like the eyes are still missing; the character doesn’t look right, appears more feminine, character traits come and go; regardless of the clothing prompt used, the original costume features are always present.6. Character blending: when using two character LoRAs, it’s hard to distinguish between them—let alone using more than two.7. Artifacts: most notably, using a white background often results in messy, chaotic backgrounds, strange character silhouettes, and even random monsters from who-knows-where.8. Sweat—and lots of sweat.9. I haven’t thought of the rest yet. I’ll add more as I write. All of these issues stem from one core cause: the training datasets used for LoRAs are almost never manually tagged. Selecting and cropping the images for your dataset may take only 1% of the time spent. Setting the training parameters and clicking “train”? Barely worth mentioning. The remaining 99% of the effort should go into manually tagging each and every image.But in reality, most people only use an auto-tagger to label the images, then bulk-edit them to add the necessary trigger words or delete unnecessary ones. Very few go in and manually fix each tag. Even fewer take the time to add detailed, specific tags to each image.AI will try to identify and learn every element in each image. When certain visual elements aren’t tagged, there’s a chance the AI will associate them with the tagged elements, blending them together.The most severe case of this kind of contamination happens with white backgrounds.You spent so much effort capturing, cropping, cleaning, and processing animation frames or generating OC images. When you finally finish training a LoRA and it works, you’re overjoyed. Those “small bugs” don’t seem to matter.But as you keep using it, they bother you more and more.So you go back and create a larger dataset. You set repeats to 20, raise epochs to 30, hoping the AI will learn the character more thoroughly.But is the result really what you wanted?After pouring in so much effort and time, you might have no choice but to tell yourself, “This is the result I was aiming for.”Yet the overexposure is worse. The feminization is worse. There are more artifacts. 
The characters resemble themselves even less. Why? Because overfitting drives the untagged elements from the training images even more deeply into the model.

So now it makes sense:
Why there's always overexposure: modern anime tends to overuse highlights, and your dataset probably contains no tags about lighting.
Why multi-angle shots are so hard to generate, and why character sizes fluctuate wildly: your dataset lacks tags for camera position and angle.
Why the character becomes more feminine: perhaps your tags inadvertently included terms like 1girl or ambiguous gender.
Why certain actions or poses can't be generated: tags describing body movement are missing, and the few that exist are overfitted and rigid.

In short: elements that are tagged get learned as swappable; elements that are untagged get learned as fixed. That may sound counterintuitive, even against common sense, but it's the truth.

This also explains why two character LoRAs used together often blend into each other: traits like eye color, hair color, hairstyle, even tiny details like streaks, bangs, a short ponytail, facial scars, or shark teeth are all written out in detail, and the more detailed the tags, the more the LoRAs influence each other, because the AI learns those traits as swappable rather than inherent to the character. And no matter what clothing prompts you use, the same patterns from the original outfit keep showing up, because those patterns were learned under the clothes tag, which the AI treats as a separate, constant element. Overfitted LoRAs also tend to compete with each other over the same trigger words, fighting for influence.

So, from the usage side, some of these bugs can be minimized. Things like overexposure, feminization, and sweat: if you don't want them, put them in your negative prompts. For elements like lighting, camera type, and viewing angle, think carefully about your composition, refer to Danbooru-style tags, describe these elements clearly, and include them in your positive prompts. Also make sure to use the more effective samplers mentioned elsewhere in this series. Use LoRAs that enhance detail without interfering with style, such as NoobAI-XL Detailer. Hand-fixing LoRAs aren't always effective, and it's best not to stack too many LoRAs together. One final reminder: you usually don't need to add quality-related prompts. Just follow the guidance on the checkpoint's official page.
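Coming back to the tagging point for a moment: here is a small sketch of the kind of check I might run over a dataset's caption files before training, to catch images where lighting or camera framing was left untagged and would therefore get "baked into" the character. The folder name and the tag lists are my own illustrative choices, not part of any specific trainer, so adjust them to your own setup.

```python
# Hypothetical helper: flag caption files that lack any lighting- or
# camera-related tags, since untagged elements tend to be learned as fixed.
from pathlib import Path

DATASET_DIR = Path("dataset/1_boyA")  # assumed kohya-style folder layout

# Danbooru-style tags worth writing explicitly so they stay swappable at inference
LIGHTING_TAGS = {"backlighting", "sunlight", "lens flare", "dark"}
CAMERA_TAGS = {"from side", "from behind", "from above", "from below",
               "upper body", "full body", "close-up", "cowboy shot"}

for caption_file in sorted(DATASET_DIR.glob("*.txt")):
    tags = {t.strip() for t in caption_file.read_text(encoding="utf-8").split(",")}
    missing = []
    if not tags & LIGHTING_TAGS:
        missing.append("lighting")
    if not tags & CAMERA_TAGS:
        missing.append("camera/framing")
    if missing:
        print(f"{caption_file.name}: consider adding {', '.join(missing)} tags")
```

If most files come back flagged, that is usually a sign the auto-tagger's output was accepted as-is.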
A bit of my experience with making AI-generated images and LoRAs ( 2 )

https://tensor.art/articles/868883505357024765 ( 1 )
https://tensor.art/articles/868884792773445944 ( 3 )
https://tensor.art/articles/868885754846123117 ( 4 )
https://tensor.art/articles/868890182957418586 ( 5 )

Second, the prompts are always the most critical part. Many people don't realize, because they haven't read the instructions for the checkpoints they use, that the number of prompt tokens has an upper limit, and that prompts are read in order from first to last, so don't let quality prompts take up too much of that space. "score_9_up," "score_8_up," and so on are used by the Pony model; the Illustrious and Noobai models don't need them at all. So, whichever base model you're using, just follow the instructions written on its page. Whether you write "perfect hands" a hundred times in the positive prompt or add six-fingered and seven-fingered hands to the negative prompt, it won't make hand generation stable. I used to think it helped, but faced with plenty of evidence, it's just a placebo. Excessive quality prompts make the image worse, not better. The order of the quality prompts themselves has some effect, but it can generally be ignored.

The most important factor is the order of your prompts. Although generation is largely random, the order and adjacency of prompts do have an impact: tokens placed earlier are more likely to produce good results than those placed later, and neighboring tokens tend to influence each other. So if you want the image to match what you imagine, it's best to conceive and write down the elements of the picture in order.

There's a tool called BREAK, which restarts the token count in a new chunk. One of its effects is to interrupt the influence between adjacent prompts. For example, writing "artist name" at the beginning and "BREAK, artist name" at the end produces a much stronger style than writing the trigger word in the middle. Alternatively, placing BREAK between different characters will likely keep the characters more separate. Another tool is the | symbol, which strengthens the connection between two adjacent prompts and tries to merge their effects. Try experimenting with both and use them flexibly.

Because of the tag-based training of Illustrious and Noobai, it's best to use prompts that match the tags found on Danbooru exactly. When you think of an action or an object, check Danbooru for the corresponding tag. You can also refer to Danbooru's tag wiki or use one of the many online tag-assistance sites to make your prompts more precise. Elements like lighting and camera angle can be researched for their effects and added as well. E621 tags only apply to Noobai, while Danbooru tags work everywhere.

Although natural language is not well supported by Illustrious and Noobai, it can still be useful as a supplement. Be sure to start with a capital letter and end with a period. For example, if you want to describe a blue-eyed cat, writing "cat, blue eyes" might give you several cats with the boy's eyes, while writing "A blue eyes cat." will make sure the cat's eyes are blue. You can also use this method to add extra details after describing a character's actions with tags. Additionally, you can describe a scene to an AI tool like Gemini or GPT and have it generate natural-language prompts for you.

Prompt weights can also be assigned, most commonly by wrapping a tag in ( ) or appending a value like :1.5. A weighted prompt will show up more often, or have a stronger or weaker effect.
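As a concrete illustration of the ordering, BREAK, and weighting ideas above, here is a small sketch that assembles a positive prompt. Every tag and the artist trigger word are placeholders I made up, not tested prompts; in Tensor or a web UI you would simply type the final string.

```python
# A sketch of assembling a positive prompt following the ordering advice above.
style = "artist name"                         # hypothetical artist trigger word
character = "boyA, 1boy, short hair, blue eyes"
scene = "sitting, park bench, sunlight, from side, upper body"

positive = ", ".join([
    style,            # earlier tokens tend to carry more influence
    character,
    scene,
    "(smile:1.2)",    # ( ) with :1.2 raises this single tag's weight
    "BREAK",          # restarts the token chunk, weakening cross-talk
    style,            # repeating the style tag after BREAK strengthens the style
])
print(positive)
# artist name, boyA, 1boy, short hair, blue eyes, sitting, park bench, sunlight,
# from side, upper body, (smile:1.2), BREAK, artist name
```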
Fixing a random seed and assigning different weights to prompts is a very useful technique for fine-tuning an image. For example, if you generate an image with the right action but the character looks too muscular, you can recreate the image: find the random seed parameter, fix it, and then adjust with something like "skinny:1.2" or "skinny:0.8" to tweak the character's build. This usually won't change the original composition. As for writing something like (promptA:0.5, promptB:1.3, promptC:0.8), I couldn't find any pattern in its behavior, so treat it as just another source of randomness.

All of the prompt advice above may still matter less than plain good luck. Sometimes just emptying your mind and writing freely leads to unexpected results, so don't get too caught up in it. If you can't achieve the effect you want, let it go and come back with a different mindset. As for the images I've posted on Tensor, aside from the first few, all of the prompts were tested with the same checkpoint in my local ComfyUI. Even though the LoRAs and parameters used may differ, generating a correct image doesn't require many retries, unless the prompt was already full of bugs when it was published. There are still things I haven't thought of here; I'll add them when I get to the LoRA section later.

Third, parameter settings: sampler, scheduler, steps, CFG, and so on. The principles behind these are technical and hard to grasp, but simple trial and error, combined with other people's test results, will find you good settings. It's really important to point out that a lot of people have never touched these settings; the options only show up when you switch Tensor to advanced mode, and free users on Civitai only get a few default choices, nowhere near as rich as what Tensor offers. The default sampler, "Euler normal", generally performs quite poorly. If you haven't tried other samplers, you might not even realize how much hidden potential your slightly underwhelming LoRA actually has.

Below are the ones I use most often. The names are long, so I'll abbreviate: dpm++2s_a, beta, 4, 40. If you're using ComfyUI, switching the sampler to "res_m_a", "seeds_2", or "seeds_3" can yield surprisingly good results. The default descriptions of these parameters on Tensor and other sites don't fully explain their real effects, and many people never try changing them. In fact they keep evolving; the most commonly used and recommended samplers for most checkpoints are "euler_a" and "dpm++2m", while "normal" and "karras" don't perform well in practice. In my experience, whichever sampler you use, combining it with "beta" always gives the best results. If your checkpoint misbehaves with "beta", try "exponential"; these two are always the best, though also the slowest. Don't mind the time; waiting an extra 10 or 20 seconds is worth it. "dpm++2s_a" is also the best choice in most cases, with more detail and stronger stylization. Only switch to something else if bugs persist no matter how you modify the prompts. Next, "euler_dy" and "euler_smea_dy", which Tensor supports, offer a level of detail between "euler_a" and "dpm++2s_a" while being more stable and less buggy than "dpm++2m". Only fall back to the classic "dpm++2m" with "karras" if the checkpoint can't handle the settings above, and only in the most extreme cases resort to "euler_a" with "normal", because that combination produces images with poor detail but fewer bugs.
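The seed-fixing workflow above lives in Tensor or ComfyUI, but the same idea can be sketched with the diffusers library if that is easier to read. The model id and prompts below are placeholders, and note that bare diffusers does not parse (tag:weight) syntax the way web UIs do, so this only illustrates keeping the composition stable while nudging one trait through the prompt text.

```python
# Illustrative sketch only: fix the seed so the composition stays put, then vary
# one trait across otherwise identical generations. Placeholders throughout.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "some/illustrious-based-checkpoint",   # hypothetical model id
    torch_dtype=torch.float16,
).to("cuda")

seed = 123456789                           # seed copied from the image you liked
base = "boyA, 1boy, running, city street, from side, upper body"

for variant in ("", ", skinny", ", muscular"):
    image = pipe(
        prompt=base + variant,
        negative_prompt="sweat, backlighting",
        num_inference_steps=40,
        guidance_scale=4.0,
        generator=torch.Generator("cuda").manual_seed(seed),  # same seed every run
    ).images[0]
    image.save(f"variant{variant.replace(', ', '_') or '_base'}.png")
```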
As for the number of steps, I personally like 30 and 40, but steps aren't crucial. More steps doesn't always mean better results: sometimes 20 steps is more than enough for a single-character image, and 40 might introduce a lot of bugs. The real point of steps is to land on a composition you're happy with, and if there are small bugs, fixing the random seed and adjusting the steps can sometimes eliminate them. CFG has a pretty big impact on the results, and the default explanation on the site doesn't really match how it feels in practice. With so many combinations of checkpoints and LoRAs, there's no one-size-fits-all reference; you just have to experiment. From what I've seen, lower CFG generally leads to more conservative compositions, and higher CFG to more exaggerated, dramatic ones.

Fourth, resolution. Each checkpoint clearly specifies its recommended resolutions. Tensor's default resolution is well supported across checkpoints and produces relatively few bugs, but it's quite small. You can use an upscaler to increase the image resolution, but many checkpoints can generate larger resolutions directly, as long as the width and height keep the same ratio as a recommended resolution and are multiples of 64. One thing to note: compared to the default resolution, larger resolutions tend to give the background more area, while smaller resolutions tend to let the characters fill more of the frame. Changing the resolution, even with the same parameters and seed, will still produce a different image. That's an interesting property in itself, so feel free to experiment with it.
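To stay within those constraints when scaling up or down, a tiny helper like the one below can be handy. The 832x1216 base in the usage lines is just an example of a typical recommended portrait resolution, not something quoted from a checkpoint page, and because of the snapping the ratio is only approximately preserved.

```python
# A small helper: scale a checkpoint's recommended resolution by some factor,
# snapping both sides to multiples of 64.
def scaled_resolution(rec_w: int, rec_h: int, scale: float, multiple: int = 64):
    snap = lambda x: max(multiple, round(x * scale / multiple) * multiple)
    return snap(rec_w), snap(rec_h)

# Example with a hypothetical recommended portrait resolution of 832x1216:
print(scaled_resolution(832, 1216, 1.25))  # -> (1024, 1536)
print(scaled_resolution(832, 1216, 0.75))  # -> (640, 896)
```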
A bit of my experience with making AI-generated images and LoRAs ( 1 )

At the beginning, please allow me to apologize: English isn't my native language, and this article was written with the help of AI translation. There may be grammatical errors or inaccurate technical terms that cause misunderstandings.

https://tensor.art/articles/868883998204559176 ( 2 )
https://tensor.art/articles/868884792773445944 ( 3 )
https://tensor.art/articles/868885754846123117 ( 4 )
https://tensor.art/articles/868890182957418586 ( 5 )

And this article is not a guide or tutorial. If you have never used AI image generation, you may not understand it; if you have already made pictures or LoRAs, it will be easier to spot possible misunderstandings and errors in it. The variables and approaches I mention are based on personal experience, and given how much randomness AI involves, they may not be universally applicable; your own experiments might yield totally different results. You can treat the experience in this article as a point of comparison, or even as potential "wrong answers" to rule out in your own workflow. Some of my friends have just started AI painting or are preparing to start, and I simply hope this article will be of some help to them.

Like many people, when I first heard about AI painting, I thought it was a joke. It wasn't until more and more AI pictures of popular characters appeared on social media that I gradually changed my mind: so there was a way to make doujin like this. Still, the carnival around those popular boys, and the even grander one around girls, who outnumbered them hundreds or thousands of times over, didn't make me interested in AI painting. Then one day, on my Twitter homepage, I saw a post featuring an extremely obscure boy, the kind barely anyone knows about. Why? How? At that moment my mind was filled with nothing but questions and excitement.

After that, I would search Twitter and pixiv every day for the names that had lingered in my heart all these years, hoping for a miracle to appear. And then it really did. So I kept waiting for other people's results as if longing for a miracle, still not thinking about whether I should try it myself. I didn't even know about sites like Civitai or Tensor back then; as more and more people started making AI pictures, I learned of these sites from their links. These online AIs became a place for daily prayers. I never even clicked the start button, just indulging in the lottery-win joy of other people's posts. Those pioneers shared new pictures, and new things called LoRA, every day.

One day, I saw a LoRA of the boy I was most fascinated with. Finally, I couldn't help it. I clicked the button, figured out how to start, copied other people's prompts, and swapped in my favorite boy. And so I began trying to make pictures myself: pictures full of bugs. The excitement gradually faded. So AI couldn't do it after all, or I couldn't. Questions replaced the excitement and occupied my mind. I kept copying and pasting, copying and pasting. Why were other people's results so good, while mine were always so disappointing? At that time I figured people don't need to understand the principle of fire, as long as they can use it.
I just kept trying, without thinking. Around that time, a phenomenon was becoming more and more common online: people who had only just heard of AI image generation would start copying and pasting the next second, and the second after that open paid sponsorships to sell packs of error-filled pictures, when the creators of those checkpoints and LoRAs had never agreed to any of it. Worse, some stole those picture packs and resold them, to say nothing of those who stole and sold the pictures people had released for free. The copyright status of AI was controversial to begin with, and I had my own doubts, but these thieves were too mercenary and too despicable. Why? How? In my anger, I no longer had any doubts. I had to think about how AI images actually come about; I had to know that fire needs air to burn, and how to put it out. I had to recover my original motivation for using the internet back when I was still a chuuni boy: sharing the unknown, sharing the joy, at least with my friends. I have no power to fight those shameless people, but they can never invade my heart.

Sorry for writing so much nonsense. Let me write down my thoughts and experiences from the past year in a way that's easier to follow. These understandings are only about making character doujin and don't apply to more creative uses. They are mainly based on Illustrious and Noobai and their derivative models, and on using only prompts and basic workflows. More complex workflows, such as ControlNet, are not discussed; they can indeed produce more polished pictures, but they're too cumbersome for spare-time use. Writing prompts alone is enough to generate eye-catching pictures in a basic workflow.

First of all, the most important conclusion: AI image generation is still, at present, a game of probability. All we can do is intervene in that probability as much as possible, to raise the odds of the result we expect. How do we shift this probability to improve quality? We first need to understand one concept: the order in which AI draws is completely opposite to the order in which a person draws (digitally, not in traditional media). When people draw, they first conceive the complete scene, then fill in details, line art, and color, and then zoom in on a particular area, such as the eyes, to refine it. The resolution there is very high, and even when you zoom back out to the full picture, the details are not affected. AI is the opposite: it effectively renders small details like the eyes first, then gradually pulls back to depict the full picture. So when the resolution stays fixed through these successive "zooms", those initial details can end up blurred. This is why AI draws a headshot far better than a full-body picture, and why the easiest way to improve quality is to stick to the upper body or close-ups.
