A bit on my process



In roughly a week, I should finally have V1 of my big LoRA project trained. Once it's out, I'll also publish a tag guide for the SDXL version.

But before I get there, I thought I'd talk about my method for curating the dataset. Most of the dataset isn't simply pictures curated as-is.

Almost every picture has been put through upscaler pro to refine details, remove motion blur, and clarify indistinct objects. People have been removed from many scenery pictures, and the gaps then healed with composited light passes. I do a lot of the work in GIMP: I'll pass the image, then use transparency to paint in JUST the parts of the light pass or upscale that I like.
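For anyone who'd rather script that paint-in step than do it by hand, here is a minimal sketch of the idea in plain Python. It is not what GIMP does internally, just the standard mask blend: where the mask is 1.0 you get the passed image, where it is 0.0 you keep the original. The function name, the tiny grayscale images and the mask values are all made up for illustration.

```python
def composite_with_mask(original, processed, mask):
    """Blend two same-sized grayscale images (rows of 0-255 values).

    mask holds a 0.0-1.0 weight per pixel: 1.0 keeps the processed
    pixel, 0.0 keeps the original, values in between mix the two.
    All names here are hypothetical, chosen for this sketch.
    """
    if not (len(original) == len(processed) == len(mask)):
        raise ValueError("images and mask must have the same height")
    result = []
    for orig_row, proc_row, mask_row in zip(original, processed, mask):
        if not (len(orig_row) == len(proc_row) == len(mask_row)):
            raise ValueError("images and mask must have the same width")
        result.append([
            round(m * p + (1.0 - m) * o)
            for o, p, m in zip(orig_row, proc_row, mask_row)
        ])
    return result


# Toy 2x2 example: the mask "paints in" the pass on the left column only.
original = [[100, 100], [100, 100]]
processed = [[200, 200], [200, 200]]
mask = [[1.0, 0.0], [0.5, 0.0]]
blended = composite_with_mask(original, processed, mask)
print(blended)  # [[200, 100], [150, 100]]
```

On a real image you would run the same formula per colour channel, with the mask coming from whatever you painted on the layer's transparency.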

Clarifying some of the scenery is even more intensive. I prompt every single object in the scene, whether I can identify it or not, passing and compositing until all the objects are clear. Then I use the original image, with some parts of it painted out, as a 'light only' and a 'dark only' layer at 50% opacity to restore the original photograph-level lighting. The goal with the more complex photography-based scenes is to clarify every object so that nothing is indistinct, while the result still looks entirely like a photograph. As a consequence, there are very, very few indistinct objects of any kind in this dataset. I hope this helps reduce artifacts and increase composability.
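The lighting-restore step can be sketched the same way. This assumes 'light only' and 'dark only' behave like GIMP's "Lighten only" and "Darken only" layer modes (take the brighter or the darker of the two values per channel), and that the two layers are simply stacked one after the other at 50% opacity. The function names and pixel values are hypothetical, and the painted-out regions are left out to keep it short.

```python
def blend_layer(base, layer, mode, opacity=0.5):
    """Apply a lighten-only or darken-only layer over base at a given opacity.

    Assumption: "lighten" takes the per-pixel maximum and "darken" the
    per-pixel minimum; opacity then mixes that result back with the base.
    """
    pick = {"lighten": max, "darken": min}[mode]
    return [
        [round(b + opacity * (pick(b, l) - b)) for b, l in zip(base_row, layer_row)]
        for base_row, layer_row in zip(base, layer)
    ]


def restore_lighting(composite, original, opacity=0.5):
    """Pull the composite's light and shadow back toward the original photo."""
    lit = blend_layer(composite, original, "lighten", opacity)
    return blend_layer(lit, original, "darken", opacity)


# Toy 1x2 example: the composite is too dark on the left, too bright on the right.
composite = [[100, 200]]
original = [[180, 120]]
print(restore_lighting(composite, original))  # [[140, 160]]
```

Each pixel ends up partway between the composite and the original, so the clarified detail stays while the overall lighting drifts back toward the photo.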

More recently I've used my merge, Aeromancer, for a lot of the light passing, since its skin texture is often better than upscaler pro's. We've come a long way with open source text-to-image.

This is why this dataset has taken months to create. 90%+ of the carefully curated pictures are multi-pass composites. Some of the synthetically generated pictures (ones I genned here) were thrown out for not being realistic enough. Almost all of the images, even the photos, have been passed to improve something about them. The set also includes things like skyline examples, object examples and makeup examples to provide period-specific anchors, although these make up a minority of the set. Otherwise it's largely people, and backgrounds with no people in them, with a few images that feature both. The idea is that you can then combine elements for full cinematic-style scenes, or produce people-free backgrounds for things like Silly Tavern.

I'm aiming for a fully cinematic, photorealistic LoRA with a focus on Cyberpunk, Fantasy and Victorian, and to a much lesser degree Post-Apocalyptic and Western. Two things have bothered me about AI gen: the modernity bias that ruins the authenticity of a lot of gens, and how fictional settings tend to look illustrated (there's more illustrated data in the sets than filmic). This LoRA should change that. And if anyone wants to reproduce my work, this article might provide a hint.
