How I LoRA: A beginner's guide to LoRA training
A step-by-step guide on how to train a LoRA.
Warning: This guide is based on Kohya_SS
This guide REQUIRES a basic understanding of image generation; read my guide "How I art: A beginners guide" for the basics of image generation.
This guide REQUIRES a basic understanding of image editing, tagging, and WebUI navigation. It CAN be ported to Tensor.art's trainer if you know what you are doing.
This guide is an (almost) 1:1 of the following guide: https://civitai.com/articles/3522/valstrixs-crash-course-guide-to-lora-and-lycoris-training
Edits were made to keep it short and dive only into the crucial details. It also removes many recommendations I DO NOT follow; for more advanced information, please support the original guide. If you want to do things MY way, keep reading.
Part 1 | Datasets: Gathering & Basics
Your dataset is THE MOST IMPORTANT aspect of your LoRA, hands down. A bad dataset will produce a bad LoRA every time, regardless of your settings. Garbage in, garbage out!
Image Count:
Personally, I recommend a dataset of anywhere from 50 to 100 images as an ideal, though you can absolutely use more or fewer.
Image Quality:
When assembling your images, ensure you go for quality over quantity. A well-curated set of 30 images can easily outperform a set of 100 poor or mediocre ones.
Additionally, I recommend you keep your dataset stylistically varied, unless you're training a style LoRA. If a style is too prominent in your data, the style itself may be learned alongside your intended concept. When you get to tagging your data, I highly recommend tagging such cases to minimize their effect.
Image Sourcing:
Personally, I gather my data from e621. Again, make sure you try and avoid pulling too much from the same artist and similar styles.
Identifying Your Needs:
Concepts:
For concepts, you should primarily look for solo images of the subject in question. Duos/Trios also work, but you should only grab them if your primary subject is largely unobscured. Alternatively, extra individuals can be easily removed or cropped out.
If you do include multi-character images, make sure they are properly and thoroughly tagged.
Including duo/trio/group images can be very beneficial to using your LoRA in multi-character generations, but is not required by any means.
Styles:
For styles, gathering data is generally a lot less selective, so long as the images are styled consistently.
Folder Structure:
To keep yourself organized and formatted correctly for Kohya, structure your training folder as follows:
- Root
  - LoRA Dataset Folders
    - Concept Folder (What you're training, and where you point Kohya.)
      - Raw (Not required, but this is where I put all my images before sorting.)
      - 1_Concept (This is what's actually trained. You can replace "Concept" with anything.)

For example, my data lives at kohya_ss>sd-scripts>.dataset>1_Shondo
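If you prefer to set that structure up from a script, here is a minimal Python sketch. The root path and concept name are placeholders; swap in your own:

```python
from pathlib import Path

# Placeholder paths: point "root" at your own dataset location,
# and rename "Concept" to whatever you're training.
root = Path("dataset_root")

# The leading "1_" on the training folder is Kohya's repeat count.
for sub in ("Raw", "1_Concept"):
    (root / "Concept" / sub).mkdir(parents=True, exist_ok=True)
```

Re-running it is harmless thanks to `exist_ok=True`, so you can keep it around as a template for new concepts.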
Part 2 | Datasets: Preparation
Once you have your raw images from part 1, you can begin to preprocess them to get them ready for training.
First Pass:
Personally, I separate my images into two groups: images that are OK on their own, and images that require some form of editing before use. Those needing edits are moved to another folder and edited accordingly.
Important info: WebP is incompatible with current trainers and should be converted. Stick to .jpeg/.png at all times.
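WebP files sometimes hide behind a .jpg or .png extension, so checking the actual bytes is safer than trusting the filename. A small stdlib-only sketch that flags offenders (the actual conversion can then be done with any image tool, e.g. Pillow's open-and-save):

```python
from pathlib import Path

def is_webp(path: Path) -> bool:
    """True if the file's bytes say WebP, even when the extension lies."""
    with open(path, "rb") as f:
        header = f.read(12)
    # WebP files start with "RIFF" and carry "WEBP" at byte offset 8.
    return header[:4] == b"RIFF" and header[8:12] == b"WEBP"

def find_webp(folder: Path) -> list[Path]:
    """Every file in the folder that still needs converting."""
    return sorted(p for p in folder.iterdir() if p.is_file() and is_webp(p))
```

Run `find_webp` over your dataset folder before training and convert anything it reports.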

If you use Photoshop, you can resize your entire set at once with the Image Processor: set the fixed resolution to 2048 on both sides at max quality (7). This resizes every image to fit within 2048 (or 1024, 512, etc.) on its longest side.
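The same "resize to fit" math is easy to reproduce outside Photoshop. A sketch of just the size calculation, so you can see what the resize actually does:

```python
def fit_within(width: int, height: int, limit: int = 2048) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most `limit`.

    Images already within the limit are left untouched; aspect
    ratio is always preserved.
    """
    longest = max(width, height)
    if longest <= limit:
        return width, height
    scale = limit / longest
    return round(width * scale), round(height * scale)
```

Pillow users can skip the math entirely: `img.thumbnail((2048, 2048))` performs the equivalent fit-within resize in place.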
Second Pass:
On the second pass, I manually edit any images that needed extra work from the first: This is where I do any cropping or, sometimes, redrawing of image sections.
Once you've done the above, place all of your images, both edited and unedited, in your training folder:
- Root
  - LoRA Dataset Folders
    - Concept Folder
      - Raw
      - 1_Concept <- (This one!)
Part 2.5 | Datasets: Curing Poison
Image poisoning techniques have been found to only work in such niche situations that practically any form is DOA. Generally speaking, the following set of conditions needs to align for poisoned data to have any tangible impact on your training:
- The poisoning technique was based on the exact same text encoder as your model.
- The poisoning technique also used the exact same or similar VAE as you're training with.
- The amount of poisoned data is proportionally higher than unpoisoned data by a sizable margin.
It's still a good idea to clean up or discard obviously poisoned images, but that's less about combating the poison and more about having a clean image free of artifacts. The poison is actually snake oil (Nightshade and Glaze do practically nothing).
Part 3 | Datasets: Tagging
Almost done with the dataset! We're on the final step now: tagging. This sets your instance token (the activator tag) and determines how your LoRA is used. In my personal opinion, you should always do this manually.
Personally, I use the Booru Dataset Tag Manager and tag all of my images manually. You COULD tag without a program, but just... don't. Manually creating, naming, and filling out a .txt for every image is not what you want to do with your time.
Thankfully, BDTM has a nice option to add a tag to every image in your dataset at once, which makes the beginning of the process much easier.
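If you'd rather script that first step than use BDTM, a small sketch that prepends an activator tag to every caption file in a folder. It assumes the comma-separated .txt captions that Kohya expects, one per image:

```python
from pathlib import Path

def prepend_tag(dataset: Path, tag: str) -> None:
    """Prepend an activator tag to every .txt caption in `dataset`,
    skipping captions that already contain it."""
    for caption in dataset.glob("*.txt"):
        tags = [t.strip() for t in caption.read_text().split(",") if t.strip()]
        if tag not in tags:
            caption.write_text(", ".join([tag] + tags))
```

Point it at your 1_Concept folder with your chosen token, e.g. `prepend_tag(Path("1_Concept"), "fallenshadow")`.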
Picking a model:
Before you tag, you need to choose a model to train on! For the sake of compatibility, I suggest you train on a base model, meaning a foundational model or finetune that is NOT a mix of other models. For example, ChromaMix is, well, a mix based on Noob, which is itself based on Illustrious. So, train on Noob or Illustrious if you are planning to use a Noob-mixed model like Chroma or Kiwi.
Tagging Types:
Now, for the tagging itself. Before you do anything, figure out what type of tags you'll be using; this depends on the model you train on. Chroma and, by proxy, Noob are trained on e621/danbooru tags, so using tags from those sites will yield the best results:

The Tagging Process:
Once you know what model and tags you're using, you can start tagging. How you tag depends on what you are training: for a style LoRA, I recommend tagging EVERYTHING, while for any other LoRA, tag anything you wish to keep as a variable.
Or in plainer terms: anything NOT inherent to the training subject should be tagged. If your character always has red eyes, don't tag "red eyes". But if your character can be male or female, tag those depending on the image.
Tagging Tips:
Don't overtag. Keep it simple; the fewer tags, the better.
Don't use "implied" tags, meaning tags that imply other tags just by their presence. When you use one, don't also add the tag(s) it implies alongside it; for example, "German shepherd" implies both "canid" and "dog".
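Implication checks are easy to automate once you have an implication list. The map below is a tiny, hypothetical slice for illustration only; the real e621/danbooru implication data is far larger:

```python
# Hypothetical slice of a tag-implication list, just for illustration.
IMPLIES = {
    "german shepherd": {"dog", "canid"},
    "dog": {"canid"},
}

def redundant_tags(tags: list[str]) -> set[str]:
    """Tags in the list that another tag in the list already implies."""
    present = set(tags)
    redundant = set()
    for tag in tags:
        redundant |= IMPLIES.get(tag, set()) & present
    return redundant
```

Anything `redundant_tags` returns can be safely deleted from that caption.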
Part 3.5 | Datasets: Prior Preservation (Regularization)
While completely optional, another method of combating style bias and improper tag attribution is the use of a Prior Preservation dataset. This acts as a separate but generalized dataset used alongside your training dataset, and it can usually be reused across multiple training sessions. I would recommend creating a new folder for them like so:
- Root
  - LoRA Dataset Folders
    - Concept Folders
  - Regularization Folder
    - 1_Concept
    - 1_Concept (You can have more than one!)
"But how exactly do I make and use these?"
You can start by naming your folder after a token - your class token is often a good choice.
Creating a dataset for these is actually incredibly easy - no tagging is required. Within the folder you created for the tag, you simply need to put in a number of random, unique, and varied images that fall within that tag's domain. Do not include images of anything you'll be training. From my own testing, I personally recommend a number roughly equal to the number of images in your main dataset for training.
During training, the trainer will alternate between training on your primary and regularization dataset - this will require you to have longer training to achieve the same amount of learning, but will very potently reduce biasing.
Part 3.6 | Tagging: Examples
Since examples are usually quite helpful:

fallenshadow, :p, skirt, bow, capelet, cat ears, cat tail, closed mouth, fang, heart-shaped pupils,
holding knife, looking at viewer, smile, solo, tail bow, teddy bear, tongue out, white background
fallenshadow will be the token tag, while everything else remains a variable. This means that unless I tag "cat ears" alongside fallenshadow when using the LoRA, the image should NOT come out with cat ears.
THIS IS IT FOR PART 1: DATASET PREP. GOOD LUCK!