LoRa in SD3.5 | Dataset Creation + Tools



LoRa Dataset Creation in SD3.5

This is a walkthrough of my LoRa creation process for SD3.5. I am using a number of external tools which are listed in this article.

Part 1: Find a good dataset

For this project, I plan to create a LoRa for Japanese magazine-style cover art for the SD3.5 Large model.

The first step is sourcing reference material for your LoRa. Here I used kimirano.jp to find a collection of images.

Pinterest is an absolutely amazing site for finding content for a LoRa. It has a very neat feature that lets you store images as 'collections' and find images similar to that collection. Try using Pinterest to source images for your LoRa: https://www.pinterest.com/

For NSFW material, there is Sankaku Complex (they have a phone app: https://apps.apple.com/us/app/sankaku-anime-ai-girlfriend/).

These are the ones I selected. I compiled this collage using https://gandr.io

Above is an example of me using gandr.io on some robots. Here is what the output looks like:

Note that the backgrounds of these robots have been edited in GIMP. In large datasets, the predominant color will often be white, so try to offset this whenever possible. Adding a black rim at the edges will teach the LoRa that high contrast = good. Green and blue are rare colors. You want a unique color scheme, since it will make the LoRa's output images stand out in an AI art gallery.
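If you want to measure how much white a dataset contains, or add the black rim programmatically instead of in GIMP, both steps can be sketched in plain Python. This is a minimal sketch, assuming each image has already been loaded as rows of (R, G, B) tuples (for example with Pillow, not shown); the helper names and the 230 brightness threshold are hypothetical choices, not part of any tool mentioned in this article.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Bitmap = List[List[Pixel]]  # rows of (R, G, B) tuples, channel values 0-255


def near_white_fraction(img: Bitmap, threshold: int = 230) -> float:
    """Fraction of pixels whose R, G and B channels are all >= threshold.

    The threshold is an illustrative assumption, not a standard value.
    """
    total = sum(len(row) for row in img)
    if total == 0:
        return 0.0
    white = sum(1 for row in img for p in row if min(p) >= threshold)
    return white / total


def add_black_rim(img: Bitmap, width: int) -> Bitmap:
    """Return a new image with the outer `width` pixels painted black.

    The input image is not modified.
    """
    h = len(img)
    out: Bitmap = []
    for y, row in enumerate(img):
        w = len(row)
        new_row = []
        for x, p in enumerate(row):
            on_edge = y < width or y >= h - width or x < width or x >= w - width
            new_row.append((0, 0, 0) if on_edge else p)
        out.append(new_row)
    return out


if __name__ == "__main__":
    # A 4x4 all-white test image: after a 1-pixel rim only the 2x2 center stays white.
    white_img = [[(255, 255, 255)] * 4 for _ in range(4)]
    print(near_white_fraction(white_img))
    print(near_white_fraction(add_black_rim(white_img, 1)))
```

Running `near_white_fraction` over every image in a folder gives a rough sense of whether white dominates the dataset before you start editing.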

And here are the manga covers:

Compiling the training images this way is a good way to show people what "type" of image your LoRa can create.

Some points when selecting the LoRa reference material:

  • Use images that contain only one person

  • Use different colors for backgrounds

  • Avoid images with obscured body parts

  • Avoid images with a lot of white, beige or gray in them.

  • Judge the pictures based on color, clarity and composition. If you can't tell what the image shows from the thumbnail, the AI model won't understand it either.

  • Understand the Stable Diffusion community. If you want it, others want it too. For example, the reason I'm training this LoRa is to give people (and myself) the means to produce cool cover images for future models, articles, posts, etc.

Next, I edit the photos in a photo editor such as GIMP: https://www.gimp.org/downloads/

The goal is to remove anything which may confuse the AI model when trying to re-create the images.

The SD3.5 Large model will happily accept any LoRa content that features nudity. The base SD3.5 Large model has little to no NSFW training, so it readily absorbs such content to make the 'pieces within itself' fit together.

In this case, I removed all English text from the cover art you see above. This avoids a concept blend between kanji and English text in the T5 model. After all, we still want to be able to write English text with the LoRa.

The general rule of thumb in the LoRa community is 20-30 images for a character, and at least 30 images if the LoRa embodies something more abstract, like a concept or style.

In this case, I use 24 images. My reasoning is that the cover art is very "chaotic", so only a few images are needed to represent a decent amount of variety. We also want to keep LoRa training costs down, since recreating this densely packed art style will likely take a lot of epochs before it "stabilizes", so to speak.

Part 2: Selecting a Keyword

While a keyword can be anything you want it to be, this time I decided to take a scientific approach.

This notebook can be used to search tokens in SD3.5: https://huggingface.co/datasets/codeShare/text-to-image-prompts/blob/main/Google%20Colab%20Notebooks/token_vectors_math.ipynb

Within this notebook, I ran some random searches for tokens similar to "manga</w>" and "japan</w>", and I stumbled upon the rarely used token "kei</w>". I decided to include it in the keyword.
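Conceptually, a "similar token" search is a nearest-neighbour lookup: every vocabulary token has an embedding vector, and the tokens whose vectors have the highest cosine similarity to the query are ranked as most similar. The sketch below assumes that idea and uses made-up three-dimensional vectors purely for illustration; it is not taken from the notebook, whose exact method may differ, and real CLIP token embeddings have hundreds of dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_tokens(
    query: str, vocab: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k tokens most similar to `query`, excluding the query itself."""
    q = vocab[query]
    scored = [(tok, cosine(q, vec)) for tok, vec in vocab.items() if tok != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings, invented for this example.
toy_vocab = {
    "manga</w>": [0.9, 0.1, 0.0],
    "japan</w>": [0.8, 0.3, 0.1],
    "kei</w>": [0.7, 0.2, 0.2],
    "robot</w>": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for token, score in nearest_tokens("manga</w>", toy_vocab, k=2):
        print(f"{token}: {score:.3f}")
```

With real embeddings, the interesting candidates are tokens that score high against your theme but are rarely used in prompts, which is how "kei</w>" came up here.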

If you want to see the concept representation of each token, you can try: https://benjamin-bertram.github.io/passive-illustration/index.html#token-library

You can also use this notebook to browse text_encodings: https://huggingface.co/datasets/codeShare/fusion-t2i-generator-data/blob/main/Google%20Colab%20Jupyter%20Notebooks/fusion_t2i_CLIP_interrogator.ipynb

I used the text_encodings notebook to find the keyword for the robot LoRa.

One of the similar results according to the CLIP model in the text_encodings notebook was "art by Brian Sum", so I googled it and, lo and behold, Brian Sum turned out to be an artist who draws robots!

I added 4 of his works to the robot LoRa, bringing the total from 26 images to 30. You can find his creations here: https://www.artstation.com/sum

//----//

For the manga cover art, I decided the keyword for the LoRa should be "mangacover kei". This lets me hitch a ride on training data that already exists within the SD3.5 model, saving training epochs.

To check the exact number of tokens used, I use this online tokenizer: https://sd-tokenizer.rocker.boo/

SD3.5 uses CLIP_L and CLIP_G, which have the same vocabulary as in previous SD models. The main difference is that it also uses a T5 text encoder (T5-XXL), a large transformer language model that accepts much longer, natural-language prompts than CLIP.

To verify, I run the prompt 'mangacover kei text "LORA"' on an SD3.5 model.

Good enough.

Part 3: Writing the prompts

When training LoRas for T5-based models like SD3.5, I prefer running the training images through JoyCaption Alpha One at a 200-token caption length: https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one

We want the prompts to be 500-800 characters long in order to keep them within the 256-token context length of the T5 model. To quote Stability AI:

  • While this model can handle long prompts, you may observe artifacts on the edge of generations when T5 tokens go over 256. Pay attention to the token limits when using this model in your workflow, and shortern prompts if artifacts becomes too obvious.

    Also note:

  • The medium model (SD3.5M) has a different training data distribution than the large model (SD3.5 Large), so it may not respond to the same prompt similarly.

Source: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium
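The 500-800 character rule can be checked before uploading. This is a minimal sketch: the `t5_tokenize` argument is any callable that returns token ids (a hypothetical interface chosen so the check runs without downloading a model), and whitespace splitting in the example is only a crude stand-in for the real T5 tokenizer.

```python
from typing import Callable, List, Optional

# Values from this article: 500-800 characters per caption and a 256-token T5 window.
# The character range is a rule of thumb, not a hard limit.
MIN_CHARS, MAX_CHARS = 500, 800
T5_MAX_TOKENS = 256


def check_caption(
    caption: str, t5_tokenize: Optional[Callable[[str], List]] = None
) -> dict:
    """Report caption length in characters and, if a tokenizer is given, T5 tokens."""
    n_chars = len(caption)
    result = {"chars": n_chars, "chars_ok": MIN_CHARS <= n_chars <= MAX_CHARS}
    if t5_tokenize is not None:
        n_tokens = len(t5_tokenize(caption))
        result["t5_tokens"] = n_tokens
        result["t5_ok"] = n_tokens <= T5_MAX_TOKENS
    return result


if __name__ == "__main__":
    caption = "mangacover kei, " + "a girl on a colorful magazine cover, " * 15
    # str.split is only a rough proxy; real T5 token counts will differ.
    print(check_caption(caption, t5_tokenize=str.split))
```

With Hugging Face transformers installed, a real tokenizer could be passed as `lambda s: tok(s).input_ids`, where `tok = AutoTokenizer.from_pretrained("google/t5-v1_1-xxl")`; I am assuming this tokenizer matches the T5 encoder used by SD3.5. The same pattern works for CLIP prompts with a CLIP tokenizer and its 77-token window.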

Part 4: Compiling the Dataset

Finally, I run the selected images through Batchcropper: https://batchcropper.com/en

I prefer portrait sizes of 768x1024 or 768x1150 for the dataset.
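Batchcropper handles the cropping interactively. If you would rather script it, the core step is finding the largest crop box with the target aspect ratio. This sketch assumes a centered crop, which may cut off subjects that a manual crop would keep. The returned (left, top, right, bottom) box is the format Pillow's `Image.crop` expects; the crop can then be resized to 768x1024.

```python
from typing import Tuple


def center_crop_box(
    width: int, height: int, target_w: int, target_h: int
) -> Tuple[int, int, int, int]:
    """Largest centered (left, top, right, bottom) box with the target aspect ratio."""
    target_ratio = target_w / target_h
    if width / height > target_ratio:
        # Source is too wide: keep the full height and trim the sides.
        new_w = round(height * target_ratio)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Source is too tall (or already matches): keep the full width, trim top and bottom.
    new_h = round(width / target_ratio)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


if __name__ == "__main__":
    # A square 1200x1200 source cropped for a 768x1024 portrait target.
    print(center_crop_box(1200, 1200, 768, 1024))
```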

Then I paste in the JoyCaption prompts and add my selected keyword somewhere close to the start of each prompt.
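Many LoRa trainers, kohya-ss sd-scripts among them, read each image's caption from a .txt file with the same base name as the image. Assuming your trainer follows that convention, prepending the keyword can be sketched as follows. `write_captions` and the `captions` mapping are hypothetical names; the caption text itself would come from JoyCaption.

```python
import tempfile
from pathlib import Path
from typing import Dict

KEYWORD = "mangacover kei"  # the keyword chosen in Part 2


def write_captions(image_dir: Path, captions: Dict[str, str], keyword: str = KEYWORD) -> int:
    """Write <image stem>.txt next to each image, with the keyword at the start.

    `captions` maps image file names (e.g. "cover_01.png") to caption text.
    Captions whose image file does not exist are skipped.
    Returns the number of caption files written.
    """
    written = 0
    for name, caption in captions.items():
        image_path = image_dir / name
        if not image_path.is_file():
            continue
        text = caption.strip()
        # Prepend the keyword unless the caption already starts with it.
        if not text.startswith(keyword):
            text = f"{keyword}, {text}"
        image_path.with_suffix(".txt").write_text(text, encoding="utf-8")
        written += 1
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "cover_01.png").write_bytes(b"")  # empty placeholder image file
        n = write_captions(folder, {"cover_01.png": "A girl in a red jacket on a magazine cover."})
        print(n, (folder / "cover_01.txt").read_text(encoding="utf-8"))
```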

The training set is now done! It can be downloaded as a zip file and kept in a Hugging Face repository until it is time to train: https://huggingface.co/datasets/codeShare/lora-training-data
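Before zipping, it is worth checking that every image has a caption. Here is a minimal sketch using only the standard library, assuming each image's caption sits in a .txt file with the same base name; the function names and the list of image extensions are my own choices.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def missing_captions(dataset_dir: Path) -> List[str]:
    """Names of images in dataset_dir that have no matching .txt caption."""
    return sorted(
        p.name
        for p in dataset_dir.iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and not p.with_suffix(".txt").exists()
    )


def pack_dataset(dataset_dir: Path, archive_base: Path) -> Path:
    """Zip dataset_dir into archive_base + ".zip", refusing if any caption is missing.

    Keep archive_base outside dataset_dir so the zip does not end up inside itself.
    """
    missing = missing_captions(dataset_dir)
    if missing:
        raise ValueError(f"images without captions: {missing}")
    # make_archive appends ".zip" and returns the path of the file it wrote.
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=dataset_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
        data = Path(src)
        (data / "cover_01.png").write_bytes(b"")
        (data / "cover_01.txt").write_text("mangacover kei, a magazine cover", encoding="utf-8")
        archive = pack_dataset(data, Path(out) / "mangacover_dataset")
        with zipfile.ZipFile(archive) as zf:
            print(archive.name, zf.namelist())
```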

If you wish to store training data privately, you can use MEGA (https://mega.nz/), a cloud storage service that encrypts its users' data and, by policy, has zero knowledge of the content you store, as long as it is set to private.

Note that due to recent legislation in California, more common hosting websites like Google Drive may ban your account if you use their services to host certain types of content, including celebrity impersonation. This is something to keep in mind.

Part 5: LoRa Settings

If you are training an SD3.5L LoRa on Civitai, make sure to set 'Network Dim' to 32 and 'Network Alpha' to 16.

The SD model is, at its core, a stack of matrices performing matrix calculations on an input vector. Each matrix is a 'layer'.

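'Network Dim' is the rank of the LoRa, not a number of layers. For each weight matrix W (d_out x d_in) that the LoRa adapts, it trains two small matrices, B (d_out x Dim) and A (Dim x d_in), and their product BA is added to W. 'Network Alpha' scales that update by Alpha / Dim, so Dim 32 with Alpha 16 scales it by 0.5.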

A larger 'Network Dim' will mean a larger file size for your LoRa model.
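The link between rank, alpha and file size can be made concrete with a little arithmetic. For an adapted weight of shape d_out x d_in, a rank-r LoRa trains r * (d_out + d_in) parameters instead of d_out * d_in, and in the standard LoRa formulation (which I assume Civitai's trainer follows) the update is scaled by alpha / r. The layer width below is a made-up example, not a value taken from SD3.5L.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for one adapted weight: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier applied to the low-rank update B @ A (alpha / rank)."""
    return alpha / rank


if __name__ == "__main__":
    d = 2432  # hypothetical width of a square projection layer, for illustration only
    full = d * d
    lora = lora_param_count(d, d, rank=32)
    print(f"full matrix: {full:,} params, rank-32 LoRa: {lora:,} params ({lora / full:.1%})")
    print(f"Dim 32 / Alpha 16 -> update scaled by {lora_scale(16, 32)}")
    # Parameter count is linear in the rank, so doubling Dim doubles the file size.
    print(lora_param_count(d, d, 64) == 2 * lora)
```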

Be aware that past a certain point, more training does not mean better results.

I prefer training with 10 repeats over 30 epochs.

That way I get solutions within the 20-30 epoch range. This is not an exact science.

Loosely speaking, LoRa training is a 'random walk' toward a good solution. With 10 repeats, every image is trained on 10 times per epoch; we run 30 such epochs and save a checkpoint after each one, so the best epoch can be picked afterwards.
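The arithmetic behind repeats and epochs can be written out directly. In this sketch, "samples" counts image passes and "updates" counts optimizer steps. Batch size 1 is an assumption (the config below lists no batch size), and individual trainers may count "steps" differently, for example per batch rather than per optimizer update.

```python
import math


def training_steps(
    images: int, repeats: int, epochs: int, batch_size: int = 1, grad_accum: int = 1
) -> dict:
    """Samples seen and optimizer updates for a repeats/epochs training schedule."""
    samples_per_epoch = images * repeats
    # One optimizer update happens every batch_size * grad_accum samples.
    updates_per_epoch = math.ceil(samples_per_epoch / (batch_size * grad_accum))
    return {
        "samples_per_epoch": samples_per_epoch,
        "total_samples": samples_per_epoch * epochs,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": updates_per_epoch * epochs,
    }


if __name__ == "__main__":
    # 24 manga covers, 10 repeats, 30 epochs, gradient accumulation 2, batch size 1 assumed.
    print(training_steps(images=24, repeats=10, epochs=30, batch_size=1, grad_accum=2))
```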

//----//

When you do your training, remember to document, document, document!

Users want to see your dataset, your prompts, your examples (including the bad ones), the loss graph from the LoRa training, the epoch you chose to release, your methods and your sources. The sharing of information is the lifeblood of an open source community.

This is the loss graph of the Brian Sum LoRa:

We see a dip in the loss past epoch 14. It is therefore reasonable to publish every epoch from 14 onward with the LoRa. Then we can use trial and error on epochs 14-20 to find which one gives the "best looking" output.
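Reading the dip off the graph by eye works fine. If you can export the per-epoch loss values (how to do that depends on the trainer and is not covered here), the same rule of thumb can be applied mechanically. This sketch smooths the loss with a short trailing moving average and returns every epoch from the first point where it has fallen a chosen fraction below its starting value; the 5% drop and 3-epoch window are arbitrary assumptions, and the loss values in the example are synthetic.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Trailing moving average; early entries average over what is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def candidate_epochs(epoch_losses: Sequence[float], drop: float = 0.05, window: int = 3) -> List[int]:
    """1-based epochs from the first point where smoothed loss falls `drop` below its start."""
    if not epoch_losses:
        return []
    smooth = moving_average(epoch_losses, window)
    threshold = smooth[0] * (1 - drop)
    for i, value in enumerate(smooth):
        if value <= threshold:
            return list(range(i + 1, len(epoch_losses) + 1))
    return []


if __name__ == "__main__":
    synthetic = [0.2] * 13 + [0.15] * 7  # made-up losses that dip at epoch 14
    print(candidate_epochs(synthetic))
```

The output is only a shortlist; picking the final epoch still comes down to comparing generated images.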

//----//

LoRa configs I'm currently using:

  • Repeat: 10

  • Epoch: 30

  • Save Every N Epochs: 1

  • Clip Skip: 1

  • Text Encoder learning rate: 0.00001

  • Unet learning rate: 0.00015 <--- Important!

  • LR Scheduler: constant

  • Optimizer: AdamW8bit

  • Network Dim: 32 <--- Important!

  • Network Alpha: 16

  • Gradient Accumulation Steps: 2 <--- Good

  • Noise offset: 0.2 <--- Good

//---//

LoRas I've made (so far)

Visual Novel style: https://tensor.art/models/796037910450403539?source_id=njq1pFzjlEOwpPEpaXny-xcu

Brian Sum LoRa: https://tensor.art/models/795501520574647074?source_id=njq1pFzjlEOwpPEpaXny-xcu

Naytlayt NSFW training LoRa: https://tensor.art/models/793017079562442313?source_id=njq1pFzjlEOwpPEpaXny-xcu

Tsutomu Nihei LoRa: https://tensor.art/models/791213304967242613?source_id=njq1pFzjlEOwpPEpaXny-xcu

Training Data:

Brian Sum training data (imgur): https://imgur.com/a/blPjv6S

I post my training data here, where you can download it as a zip file: https://huggingface.co/datasets/codeShare/lora-training-data/tree/main

//----//

Thank you for reading this article. I hope it helps. Good luck with your LoRa training for the SD3.5 model.
