Beginners' Basics By A Beginner (PART 1)
Firstly, I am still a beginner so please be gentle ;-P With a beginner's frame of mind, I hope to offer insights for fellow beginners that experts may omit because they have grown used to things & take them for granted.
This article is from a Tensor.art (TA) user's perspective. Different AI sites have different interfaces & may use different terminology; however, the concepts generally apply.
This is NOT a DO THIS & DO THAT type of article. I cover background info so that one knows what one is doing - hopefully :-P - so as to better help with all image generation over the longer term.
I am covering text to image (input words & output image), called "Text2Img" in TA.
I've broken things into parts, meaning this will be a series of articles.
What I will be covering in this part 1:
a) TA Knowledge Resources
b) Very basics on Model, Base Model, Checkpoint & LoRA
Current plan for future Parts:
- basic Model Training background knowledge to help with image generation
- about ControlNet & Embeddings
- AI Tools
- Remix
- VAE, Aspect Ratio, Sampler, Scheduler, Sampling Steps, Guidance Scale, Seed, Clip Encoder
(how many parts the above takes depends on how much I can squeeze into the size limits)
- How to prompt
a) TA Knowledge Resources
Just in case you don't know - the user handbook & video tutorials link (looks like 3 squares & a diamond) is near the top right. I think the handbook covers too much too briefly, but it is a good overview. The video tutorials have helpful stuff, but they are not user-friendly for a beginner. If you don't understand a video's title, then don't waste your time watching it.
"Training" does NOT mean a training session for you - "training" means "creating" models. Models are NOT "created" but "trained", as the grunt work of producing a model is done mainly by a computer.
b) Very basics on Model, Base Model, Checkpoint & LoRA
Model = Base Model, Checkpoint, LoRA, Embedding or ControlNet. Ok, so everything is a model - you are probably rolling your eyes. Technically they are all models, but on TA when we say model, it's usually the Base Model or Checkpoint, sometimes LoRA & rarely Embedding or ControlNet - context is needed. For example, if TA has an event on training a model, it is referring to a Checkpoint or LoRA, because these are directly achievable by normal users. Unfortunately, to get the context, you have to live & learn on TA. Being on Discord helps. So don't burn out - control your time spent.
A model is the fundamental resource called upon with your prompts to generate the image. There are 3 main types of models that you deal with as a beginner - build a comfort level with them before going into other models. They are Base Model, Checkpoint & LoRA. The Base Model is the source or core of all the other models; without the Base Model, the rest cannot exist or work. You can think of a particular Checkpoint as a modified form of a particular Base Model. All the Checkpoints of a Base Model, plus the Base Model itself, form something like a family.
Functionally speaking, Base Models & Checkpoints play the same role - you choose only 1 of them per generation.
LoRAs are add-ons to Base Models or Checkpoints. You can generate without LoRAs. For your first tries it may be good to generate without LoRAs & use only Base Models, so that you get a feel for the basic generation of each Base Model.
One well known base model is the Stable Diffusion (SD) series. You will usually see "SD something something". The first "something" is usually the version number & the 2nd "something" is typically the variant. Each version is a new base model. For example, you may see "SD1.5", "SDXL", "SD3". SD1.5 & earlier are older & produce smaller images. From SDXL onwards, 1024x1024 images are supported. This means SDXL allows you to create bigger images than SD1.5 without enlarging (Upscaling), & Upscaling costs credits. Different versions are not compatible with each other, whereas variants within a version may be compatible. Therefore, a LoRA that works for an old base model version will not work for a later one. "Version X" on a LoRA (where X is a number) does not imply it's for Version X of SD - they each number their versions independently. A LoRA's info may indicate which base model the LoRA works with, or there may be a clue in its name. Otherwise, in a later part I will cover how to find out for yourself. Other base models include Midjourney, Hunyuan, Flux, Kolors etc. Spend some time going through others' posts, spot what you like, then look at what they used. As a start, focus on 1 or 2 models - choose whatever has more of what you like.
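TA handles all of this behind its web UI, but if you are curious what "base model versions + image sizes + LoRA compatibility" looks like under the hood, here is a minimal sketch using the open-source diffusers library. This is NOT TA's code, & the LoRA repo name at the end is a made-up example:

```python
# A minimal sketch with the open-source "diffusers" library, purely to
# illustrate the concepts above - Tensor.art does all of this for you
# behind its web UI. The LoRA repo name below is a made-up example.
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionXLPipeline

# SD1.5: an older base model, natively producing 512x512 images.
pipe_sd15 = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
small = pipe_sd15("a cat in a garden", width=512, height=512).images[0]

# SDXL: a newer base model, natively producing 1024x1024 images.
pipe_sdxl = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
big = pipe_sdxl("a cat in a garden", width=1024, height=1024).images[0]

# A LoRA trained on SD1.5 only attaches to an SD1.5 pipeline; loading it
# into the SDXL pipeline fails or misbehaves, because the two base model
# versions have incompatible architectures.
pipe_sd15.load_lora_weights("some-user/example-sd15-lora")  # made-up name
```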
Different base models tend to have their own sets of strengths, limits or characteristics (covered later in the series). Most of the time later versions are generally better - if it is a base model.
For Checkpoints, because they are mostly by individuals, "versioning" is ad hoc or subjective - meaning whatever the individual creator likes. A new version could be a general improvement, a change in look & feel, a change in the input used to create it, a specialization to create a certain type of image etc. Hence, the latest version of a Checkpoint is not necessarily the best version for you. Different versions may be suitable for achieving different images. Therefore, the only way to choose a suitable model is to go through its samples. Start from the homepage lists: (R)un through the images to find those to your liking, find the models used & link through to the models' (I)nfo pages, read the info & go through the (S)amples posted by others, if any, & (E)xperiment. This will be recurring advice, so I will just mention it as "-RISE-"; details in a future part. While doing this, note model combinations plus keywords & phrases in prompts (covered in the future). This can link to more & more images, so be prudent - browser bookmark for future review, control your time & don't do too many. Create 2 bookmark folders: "future review" & "already reviewed". The "already reviewed" folder is because after you know more, you can come back to see what you missed the first time. Name & sub-folder the links in a way that best suits you.
LoRAs need a Base Model or Checkpoint to work. LoRA stands for LOw Rank Adaptation. LoRAs are usually designed to produce some specific effect or range of effects. Examples include an identifiable face, a colour range, a feel to the image etc. LoRAs are often designed to work with base models & sometimes with specific Checkpoints. If a LoRA works for a base/checkpoint model in a family, it usually works for others in the same family. How well it works or whether there will be unexpected results is another matter. A LoRA designed for a base model has a better chance of working well with Checkpoints than a LoRA designed for a particular Checkpoint has of working well with the base model & other Checkpoints. Between LoRAs, the quality, nature & characteristics of the images generated vary greatly.
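For the curious, the "LOw Rank" part of the name refers to how a LoRA stores its changes: instead of shipping a whole new copy of the model's big weight matrices (which is roughly what a Checkpoint does), it ships pairs of small matrices whose product is the adjustment. A rough sketch of the idea in Python, with all sizes made up for illustration:

```python
import numpy as np

# Toy illustration of "LOw Rank Adaptation" - sizes are made up.
W = np.random.randn(1024, 1024)        # one big weight matrix in the base model

# The LoRA stores only two small ("low rank") matrices, A & B.
rank = 8                               # tiny compared to 1024
A = np.random.randn(rank, 1024)
B = np.random.randn(1024, rank)

# At generation time the LoRA's adjustment is added on top of W,
# scaled by the weight you set in the UI (e.g. 0.8).
lora_weight = 0.8
W_adapted = W + lora_weight * (B @ A)  # base weights + scaled LoRA delta
```

This is also why the weight slider (next paragraph) works the way it does: it simply scales how much of the LoRA's adjustment gets added on top of the base model.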
LoRAs can contribute significantly to your generated image & it's normal to use 1 or more LoRAs. LoRAs may interfere with one another, while missing a LoRA may mean missing an ingredient. My opinion is anything from 1 to 5 LoRAs is desirable. With more LoRAs, you usually need to reduce the weights to lessen the chance of interference. The weight on a LoRA is basically how strongly you want to apply that LoRA.
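On TA, each selected LoRA simply gets its own weight slider. Purely for illustration, stacking 2 LoRAs at reduced weights in the diffusers library would look something like this (repo & adapter names are made-up examples):

```python
# Illustration only - stacking two LoRAs at reduced weights. The repo &
# adapter names are made up; pipe_sd15 is the pipeline from the earlier sketch.
pipe_sd15.load_lora_weights("some-user/style-lora", adapter_name="style")
pipe_sd15.load_lora_weights("some-user/face-lora", adapter_name="face")
pipe_sd15.set_adapters(["style", "face"], adapter_weights=[0.8, 0.6])
```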
The LoRA's info may advise a suitable weight. Sometimes when selecting a LoRA, a pop-up window may suggest a weightage. Generally 0.8 is a common starting weight. Most of the time you end up lowering it as you experiment. In some cases you may increase it, for example up to 2.0. Some LoRAs are more likely to need weight reduction than others. There is no fixed or comparative weighting across LoRAs - the weight for each LoRA is due to that particular LoRA's behaviour. You cannot judge the likely weight needed for a particular LoRA just because another LoRA previously worked when weighted in some way. -RISE-. Very often a distorted/deformed image/object/subject signals that a LoRA is interfering & needs weight reduction or removal. Some LoRAs create a lot of problems compared to others. If you are not sure which LoRA is creating the problem & there are no clues from samples, then you have to experiment using one LoRA at a time & playing with the weights. After finding the guilty LoRA(s), if you still want it, then you need to slowly find the "sweet spot" by generating 2 or more images for each 0.1 of weight reduction. A 0.1 change in weight can make a difference. Bracketing method: deliberately weigh high & low, then narrow inwards with ever finer changes in weight, so that you don't need to try every 0.1 step (see the sketch below). Balance between the LoRA still having an effect vs the chance of problems. The sweet spot merely improves the chances per generation - some images will be ok & others not. There are LoRAs that simply don't work well - so do dump them when you need to. Some LoRAs burn credits because you use more generations to get something usable or to fix imperfections.
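If the bracketing idea sounds abstract, here it is as a short sketch. Nothing here is a real TA or library function - looks_ok() stands in for you generating a couple of images at a given weight in the TA UI & judging them by eye:

```python
def looks_ok(weight: float) -> bool:
    """Generate 2 or more images at this LoRA weight & judge by eye.
    Return True if they are free of distortion. (Done by hand on TA.)"""
    raise NotImplementedError

def find_sweet_spot(low: float = 0.2, high: float = 1.2) -> float:
    """Bracketing: start with a deliberately wide [low, high] weight range
    & narrow inwards, instead of trying every 0.1 step between them.
    Assumes distortion gets worse as the weight increases."""
    while high - low > 0.1:
        mid = round((low + high) / 2, 1)   # test the middle, to 1 decimal
        if looks_ok(mid):
            low = mid    # mid still looks fine - try a stronger weight
        else:
            high = mid   # mid already distorts - back off
    return low           # the highest weight that still looked OK
```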
When selecting a Base Model/Checkpoint, note that different families have different generation costs. Furthermore, the cost of text2img differs from image to image (img2img) & from image to video (img2video). Sometimes a generated image has a flaw, is missing something etc, but you want to try to fix it. Other than editing it yourself with a graphic editor, this can be done with img2img. Models more prone to flaws will continue to produce flaws if you use them to fix/enhance. So a "cheap" model for the initial generation may end up being credit-expensive when you need to fix/enhance later.
Models presented on their info pages as photographic, drawn or whatever may not necessarily produce images as presented. The particular combination of models, as well as the prompting, affects the image greatly. For example, a LoRA with "drawn" clothes may turn out photographic in your image generation. -RISE-.