Introverted_69

I train lora's. I make stuff. I'm quite interested in fantasy and sci-fi, and making backgrounds, but also do some NSFW
A bit on my process

In roughly a week or so, I should finally have V1 of my big LoRA project trained. After it's out I'll also publish a tag guide for the SDXL version. But before I get there, I thought I'd talk about my method for curating the dataset.

Most of the dataset is not simply pictures curated as-is. Almost every picture has been put through upscaler pro to refine details, remove motion blur, and clarify indistinct objects. People have been removed from many scenery pictures, with the gaps then healed with composited light passes. I do a lot of the work in GIMP: I'll pass the image, then use transparency to paint in just the parts of the light pass or upscale I like. Clarifying some of the scenery is even more intensive. I prompt for every single object in the scene, identifiable or not, passing and compositing until all the objects are clear. Then I use the original image, with some parts painted out, as 'lighten only' and 'darken only' layers at 50% opacity to restore the original photographic lighting levels. The goal with the more complex scenes that are photography-based (and still look entirely photography-based) is to clarify every object so that nothing is indistinct. There are very, very few indistinct objects of any kind in this dataset as a consequence; I hope this helps reduce artifacts and increases composability. More recently I've used my merge Aeromancer for a lot of the light passing, since the skin texture from it is often better than upscaler pro's. We've come a long way with open-source text-to-image.

This is why this dataset has taken months to create: 90%+ of the carefully curated pictures are multi-pass composites. Some of the synthetically generated pictures (ones I genned here) were thrown out for not being realistic enough, and almost all of the images, even the photos, have been passed to improve something about them. Also included in the set are things like skyline examples, object examples, and makeup examples, to provide period-specific anchors, although these make up a minority of the set. Otherwise it's largely people, and backgrounds with no people in them, with a few images that feature both. The idea is that you can then combine elements for full cinematic-style scenes, or produce people-free backgrounds for things like SillyTavern.

I'm aiming for a fully cinematic, photorealistic LoRA with a focus on Cyberpunk, Fantasy, and Victorian settings, and to a much lesser degree Post-Apocalyptic and Western. Two things have bothered me about AI generation: the modernity bias that ruins the authenticity of a lot of gens, and how fictional settings tend to look illustrated (there is more illustrated than filmic data in the base training sets). This LoRA should change that. But if anyone wants to reproduce my work, this article might provide a hint.
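To make the layer logic above concrete, here's a minimal sketch of the masked compositing and the 50%-opacity lighten/darken restore, written with Pillow in Python. The filenames and the hand-painted mask are placeholders, and Pillow's per-channel lighter/darker ops only approximate GIMP's blend modes, so treat this as an illustration of the idea rather than my actual (manual) workflow.

```python
# A rough sketch of the masked-compositing workflow using Pillow.
# All filenames are placeholders; in practice this is done by hand in GIMP.
from PIL import Image, ImageChops

base = Image.open("original.png").convert("RGB")
refined = Image.open("refined_pass.png").convert("RGB")  # e.g. an upscale or light pass

# Hand-painted transparency: white = keep the refined pass, black = keep the original.
mask = Image.open("keep_mask.png").convert("L")
composite = Image.composite(refined, base, mask)

# Approximate GIMP's 'lighten only' / 'darken only' layers at 50% opacity,
# using the original photo to restore photographic lighting levels.
light_only = ImageChops.lighter(composite, base)
dark_only = ImageChops.darker(composite, base)
relit = Image.blend(composite, light_only, 0.5)
relit = Image.blend(relit, dark_only, 0.5)

relit.save("final_composite.png")
```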
Some Lessons Learnt As A Newbie Trainer

All of this applies to realistic pictures and may apply less to highly stylized or illustrated LoRAs.

1) One or a few pictures in the dataset can contaminate the whole set. Anything that leans too far in one direction, such as bright lighting or bright colors (or the opposite), can be picked up on and run with. Stylistic variety in the dataset, even for photos, is ideal; I recommend about 10-20 pictures with slightly different styles or lighting conditions.

2) Autotagging is next to useless. Good tagging makes all the difference for manual control of outputs, so either carefully edit or create every tag set by hand, or don't bother tagging at all; bad tags can ruin a train. Manual tagging also lets you properly spot jewelry, tattoos, watermarks, logos and the like, which AI cannot do well, and lets you add variables the end user can control, such as ethnicity, eye and hair color, body type, situations or body positions, and context like where the background is. Done poorly, this results in poor control, but as I said, poor tags can also ruin the train. If niche features, say bright studio lighting or a semi-realistic look, are funneled into tags, they are more contained and can also be negatively prompted for.

3) 50-100 pictures is the ideal size; 100 in particular allows a few examples of multiple subthemes if they are properly tagged. You are better off manually curating these and manually cropping out the logos and text where possible. That gives you better picture quality and diversity than just scraping.

4) As with overly bright or dark pictures, synthetic data can spoil the batch. I find no more than 40% synthetic data, focusing only on high-quality output like the best of Pony XL or the best of Flux, to be ideal. You need to be choosy when it comes to synthetic data. (I sketch a rough automated check for points 2-4 after this list.)

5) Slow training allows for more generalization.

6) Higher resolutions generalize more than lower resolutions. This can also be a bad thing if your dataset or tags spoil the batch; if your tagging or dataset is not ideal, you might go for 512x512, although the results will be worse on details.

7) Increasing the batch size vastly increases generalization, which can be a good or a bad thing depending on the subject material. Outside of a very narrow use case, say a body posture or a clothing item, this is probably not ideal, as it can overgeneralize strongly.

I probably have a lot more to learn here, but these are a few things I've picked up.
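If you want to automate a couple of these checks, here's a minimal Python sketch. The folder layout and the synth_ filename prefix for AI-generated images are assumptions I've made purely for this example; the one-caption-.txt-per-image convention matches what kohya-style trainers typically expect, so adapt it to your own setup.

```python
# A rough dataset sanity check for lessons 2-4. Paths and naming are assumptions.
from pathlib import Path

DATASET_DIR = Path("dataset")        # hypothetical folder of images + caption .txt files
MAX_SYNTHETIC_RATIO = 0.40           # lesson 4: cap synthetic data at ~40%

images = sorted(p for p in DATASET_DIR.iterdir()
                if p.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"})

missing_captions = []
synthetic = 0
for img in images:
    caption = img.with_suffix(".txt")
    if not caption.exists() or not caption.read_text(encoding="utf-8").strip():
        missing_captions.append(img.name)    # lesson 2: every image needs hand-checked tags
    if img.stem.startswith("synth_"):        # assumed prefix marking AI-generated images
        synthetic += 1

print(f"{len(images)} images, {synthetic} synthetic")
if missing_captions:
    print("Missing or empty captions:", ", ".join(missing_captions))
if images and synthetic / len(images) > MAX_SYNTHETIC_RATIO:
    print(f"Warning: synthetic share {synthetic / len(images):.0%} exceeds 40%")
if not 50 <= len(images) <= 100:
    print(f"Note: {len(images)} images is outside the 50-100 sweet spot")  # lesson 3
```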