Some Lessons Learnt As A Newbie Trainer
All of this applies to realistic photos and may apply less to highly stylized or illustrated LoRAs.

1) One or a few pictures in the dataset can contaminate the whole set. Anything that leans too far in one direction, such as bright lighting or bright colors (or the opposite), can be picked up on and run with. Stylistic variety, even within photos, is ideal; I recommend about 10-20 pictures with slightly different styles or lighting conditions. (See the brightness-audit sketch at the end of this post.)

2) Autotagging is next to useless. Good tagging makes all the difference for manual control of outputs, so either carefully edit or write every tag set by hand, or don't bother tagging at all; bad tags can ruin a train. Manual tagging also lets you properly tag jewelry, tattoos, watermarks, logos and the like, which AI taggers handle poorly, and lets you add variables the end user can control, such as ethnicity, eye and hair color, body type, situations or body positions, and context like the background setting. Done poorly, this results in poor control, and as I said, poor tags can also ruin the train. If niche features, say bright studio lighting or a semirealistic look, are funneled into tags, their influence is confined to those tags and they can be negatively prompted away. (See the tag-frequency sketch at the end of this post.)

3) 50-100 pictures is the ideal dataset size. 100 in particular allows a few examples of multiple subthemes, provided they are properly tagged. You are better off manually curating these and manually cropping out logos and text where possible; that gives you better picture quality and diversity than just scraping.

4) As with overly bright or dark pictures, synthetic data can spoil the batch. I find no more than 40% synthetic data to be ideal, and only high-quality output, like the best of Pony XL or the best of Flux. You need to be choosy when it comes to synthetic data.

5) Slow training allows for more generalization.

6) Higher resolutions generalize more than lower resolutions. That can also be a bad thing if your dataset or tags spoil the batch. If your tagging or dataset is not ideal, you might go for 512x512, although fine details will suffer.

7) Increasing the batch size vastly increases generalization, which can be a good or a bad thing depending on the subject material. Outside of a very narrow use case, say a body posture or a clothing item, this is probably not ideal, as it can overgeneralize strongly. (See the step-count sketch at the end of this post.)

I probably have a lot more to learn here, but these are a few things I've picked up.
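To make a few of these points concrete, here are some rough sketches. First, for point 1: a minimal brightness audit that flags outlier images before they contaminate the set. This is only a sketch; it assumes Pillow is installed, and the dataset path and the +/-60 outlier band are placeholder assumptions, not recommendations.

```python
# Flag brightness outliers that could contaminate a dataset (point 1).
from pathlib import Path

from PIL import Image, ImageStat

DATASET_DIR = Path("dataset/images")  # hypothetical location
EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}

def mean_brightness(path: Path) -> float:
    """Average luma (0-255) after converting the image to grayscale."""
    with Image.open(path) as img:
        return ImageStat.Stat(img.convert("L")).mean[0]

stats = {p: mean_brightness(p) for p in sorted(DATASET_DIR.iterdir())
         if p.suffix.lower() in EXTENSIONS}
overall = sum(stats.values()) / len(stats)

for path, value in sorted(stats.items(), key=lambda kv: kv[1]):
    if abs(value - overall) > 60:  # arbitrary band; tune to taste
        print(f"OUTLIER {path.name}: brightness {value:.0f} vs set mean {overall:.0f}")
```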
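For point 2, a sketch that audits tag frequency across sidecar .txt captions (one comma-separated tag list per image, the kohya-style convention; your trainer's format may differ). Tags used only once are often typos or too rare to train on; tags in nearly every file add little manual control. The 90% threshold is an arbitrary assumption.

```python
# Audit tag frequencies across .txt caption files (point 2).
from collections import Counter
from pathlib import Path

CAPTION_DIR = Path("dataset/images")  # hypothetical; one .txt per image

counts = Counter()
n_files = 0
for txt in CAPTION_DIR.glob("*.txt"):
    n_files += 1
    tags = {t.strip() for t in txt.read_text(encoding="utf-8").split(",") if t.strip()}
    counts.update(tags)  # each tag counted at most once per file

for tag, n in counts.most_common():
    if n == 1:
        print(f"used once (typo, or too rare to train on): {tag}")
    elif n >= 0.9 * n_files:
        print(f"in {n}/{n_files} files (adds little manual control): {tag}")
```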
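And for points 5-7, the step arithmetic behind batch size, assuming a kohya-style images x repeats x epochs setup (other trainers count steps differently). Raising the batch size averages gradients over more images per step and cuts the total step count, which is consistent with the generalization effect described above.

```python
# How batch size changes the number of optimizer steps (points 5-7).
def optimizer_steps(images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Total optimizer steps, kohya-style: (images * repeats * epochs) / batch."""
    return (images * repeats * epochs) // batch_size

# Same hypothetical 100-image dataset, 10 repeats, 10 epochs:
for batch_size in (1, 2, 4, 8):
    print(f"batch {batch_size}: {optimizer_steps(100, 10, 10, batch_size)} steps")
# batch 1 -> 10000 steps; batch 8 -> 1250 steps. Fewer, more averaged steps
# push toward generalization; compensate with more epochs or a lower LR.
```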