[REYApping] Model Training Experience Part 1: Illustrious


Hello and welcome to the fourth edition of REYApping, a space where I write a bunch of nonsense.

Fourth edition? Dang, I didn't expect I'd make a fourth edition of me yapping, but here I am, trying to yap about my experience with model training, with the model this time being "Illustrious". I warn you that I have never used it before in my entire life. Even though it's based on XL, it felt very different from Animagine or other anime XL models in general when I tried it. Also, a quick disclaimer: any settings that I used here are just uneducated guesses, so I encourage you to NOT STRAIGHT UP USE THESE SETTINGS. Unless they're coincidentally good. Without further ado, let's begin.

Step 1, Getting Ideas

I'm not really a creative person, so I admit that this part is one of the hardest, if not the hardest. My current task is to train a model with Illustrious as the base, put it in the visual, game, or space design channel, and get it into the Tenstar Fund. The required channels remind me of the past Hunyuan event, which had basically the same task minus the Tenstar Fund part, and for that one I made: Fumo Dolls.

I decided to revisit the images that I gathered back then and saw a really good opportunity. Most of my dataset contains anime and/or manga characters, with 90% of them being Touhou characters, and since we're dealing with Illustrious, everything just aligned well, so I can continue to the next step.

Step 2, Dataset Gathering and Captioning

Since I chose to use my old images, I can skip the gathering process. If you're starting from scratch, I recommend either generating images of at least 1MP or, if you look for images on Google, using the advanced search and picking "larger than 2MP" for the size. If you have an interesting concept but don't know where to look and Google is a no-go, then I recommend generating with DALL-E, which you can access through Microsoft Edge's Bing. That will get you 1024x1024 images, which is good enough for training. For this Fumo model, I used 15 images of real Fumo dolls, with different backgrounds, angles, and subjects. After getting the images you want, it's time to do the most annoying part: captioning.
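To make the "at least 1MP" rule a bit more concrete, here is a tiny sketch of the arithmetic. The function names and the 1.0 default are my own illustration, not something from Tensor or Google, and reading the width and height from an actual file would need an image library, which I'm leaving out here.

```python
def megapixels(width: int, height: int) -> float:
    """Return the image size in megapixels (millions of pixels)."""
    return width * height / 1_000_000


def big_enough(width: int, height: int, minimum_mp: float = 1.0) -> bool:
    """True if the image meets the minimum size (assumed here: 1MP)."""
    return megapixels(width, height) >= minimum_mp


if __name__ == "__main__":
    # A 1024x1024 image is 1,048,576 pixels, so it just clears 1MP.
    print(megapixels(1024, 1024), big_enough(1024, 1024))
    # An 800x600 image is only 0.48MP, so it would be skipped.
    print(megapixels(800, 600), big_enough(800, 600))
```

So a 1024x1024 generation passes, while a typical small thumbnail does not.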

Captioning images is basically telling the base model what the image is all about. If you train a style, then you need to caption pretty much everything in the image. For a character, you assign its key characteristics to a unique word (called a trigger word) and just caption its clothes and what it is doing. There are other methods, but I'll be captioning everything in this one. You need to create a txt file for each image, with the same name as the image. To make it easier, I renamed my images to "fumo" and let Windows assign the numbers. I then created the txt files and renamed them the same way. The result pretty much looks like this:

Since the dataset mostly contains characters that are recognized by Illustrious, I'll be captioning in this order: the trigger word first, then the character name, then clothing (if different from the original), pose, and background. After that was done, I zipped everything and it was ready to be uploaded.
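If you'd rather not create and rename the txt files by hand, the pairing and the caption order described above can be sketched in a few lines of Python. This is only a rough sketch under my own assumptions: the helper names, the "fumo" trigger word, and the example tags are all hypothetical, and I did the real thing manually.

```python
from pathlib import Path

# Image extensions to look for (an assumption; adjust to your dataset).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_caption(trigger, character, clothing="", pose="", background=""):
    """Join the tags in order: trigger, character, clothing, pose, background.

    Empty parts are skipped, e.g. clothing when it matches the original outfit.
    """
    parts = [trigger, character, clothing, pose, background]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def write_caption(image_path, caption):
    """Write the caption to a txt file with the same name as the image."""
    txt_path = Path(image_path).with_suffix(".txt")
    txt_path.write_text(caption, encoding="utf-8")
    return txt_path


def images_missing_captions(folder):
    """List image file names in the folder that have no matching txt file yet."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and not p.with_suffix(".txt").exists()
    )


if __name__ == "__main__":
    # Hypothetical example values, not my actual captions.
    print(build_caption("fumo", "hakurei reimu", "", "sitting", "outdoors"))
```

Running images_missing_captions on the dataset folder before zipping is a quick way to catch an image that never got its txt file.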

Step 3, Training parameters

I'll just post a picture of my parameters here.

Also, since the training images are real photos, I added the "realistic" tag to the end of every image's caption by doing this:
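If your captions live in local txt files instead, the same kind of batch tagging could be sketched with a small script. This is an assumption-heavy sketch and not necessarily how I did it here: the function name is made up, and it assumes comma-separated tags with one txt file per image.

```python
from pathlib import Path


def append_tag(folder, tag="realistic"):
    """Append `tag` to the end of every caption txt file in `folder`.

    Files that already contain the tag are left alone.
    Returns the number of files that were changed.
    """
    changed = 0
    for txt_path in sorted(Path(folder).glob("*.txt")):
        text = txt_path.read_text(encoding="utf-8")
        tags = [t.strip() for t in text.split(",") if t.strip()]
        if tag in tags:
            continue  # already tagged, don't add it twice
        tags.append(tag)
        txt_path.write_text(", ".join(tags), encoding="utf-8")
        changed += 1
    return changed
```

Because already-tagged files are skipped, running it twice on the same folder won't duplicate the tag.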

Now it's time to train.

Step 4, Test and Publish

After the training is done, I published 2 models (Epoch 3 and 6) for tests since Tensor can't use an unpublished model for testing. The result is quite okay.

It apparently also creates cute randomness like this:

Final Thoughts

This is quite interesting. Illustrious as a model seems very good. It can make nice images and understands a lot of characters, which makes it easier for people to create character images without LoRAs. This will be the first of many tests that I'll do with this base model. Also, I'll make an update later on with some more base models (like SD3.5, perhaps?).

Thank you for reading this part of REYApping. See you in the next one.
