juiwaters
Beginner Basics By Beginner Part 3 - AI Tool

This article is for the Beginner and assumes the previous parts have already been read.

The name "AI Tool" can mean more than one thing in the world of AI. For the AI industry at large, it can generically mean any software that provides a solution, in other words a software tool based on AI. That is obviously NOT what we are referring to in the context of Tensor Art (TA). Here, we are talking about a tool used for image generation, which is what this article is about. So if you are talking to someone outside of TA, do note that "AI Tool" can mean something different from what we discuss here.

An AI Tool in TA is typically something a user created using ControlNet. ControlNet, as previously stated, is not for Beginners, so without delving into ControlNet itself or its uses, for now you can think of ControlNet as a programming tool for creating AI Tools. Tools in general are meant to enable things or make them easier, and the same goes for AI Tools. For the Beginner, AI Tools offer a simplified way to achieve image or video generation. Before you try a model in earnest on your generation page, an AI Tool can be a good starting point to show you what the model can do and motivate you to try it on your own. Remix is another option, but I will cover that in another part.

Some AI Tools simplify prompting to such an extent that they give you multiple-choice options to click on, which then serve as prompts. RealAnime has this feature: if you do not want to fill in the boxes, there are a few clickable options for the subject and a few more for the action.

Many AI Tools simplify the options and break the prompting into entry boxes for you to fill in, rather like filling in a simple form. Fill it in, click go, and that is it. For example, there is currently a RealAnime AI Tool where one box asks who you want as the subject and another asks what you want the subject to be doing.

Other AI Tools may allow you to upload images to generate from, with or without the form-filling style of prompting. An example is 3DVideo, where you upload an image and it turns it into a rotating-viewpoint video with your subject as the focal point. The effect is a 3D version of your image with a camera moving around your subject. In 3DVideo it does not go fully around, it just rotates a little towards the left.

This example includes an inpaint option. If you click on it, it lets you use your cursor to "paint" over your image to tell the AI Tool which subject is the focus. The paint tool includes an undo-last-stroke and a reset-all button. You can choose thicker or finer brush sizes, although the sliding scale is hard to control (for me it jumps a little when dragging). The inpaint tool also tends to have a small viewer, which makes painting fine areas harder. I use the browser zoom to enlarge, but beyond a point the left-right panning seems to fail, so I have to zoom out to pan and then zoom back in to do fine painting. The need for fine painting varies by AI Tool and by your source image; decide for yourself whether to spend more time painting finely or just paint broadly. Sometimes broad painting is good enough - in other words it is fine to "paint outside the lines", so to speak. A small sketch after this paragraph shows what such a painted selection amounts to conceptually.
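TA's inpaint brush is purely a point-and-click feature, but conceptually the painted area is just a mask image: the selected pixels are marked and everything else is left alone. Below is a minimal, hypothetical sketch in Python using Pillow; the image size and ellipse coordinates are made up, and this is not how Tensor Art implements anything - it only illustrates the idea of a broadly painted subject selection.

```python
# Conceptual sketch of what "painting" the subject produces: a mask image where the
# painted region is white (selected) and everything else is black. Sizes and coordinates
# are invented; this is not Tensor Art's implementation, only the general idea.
from PIL import Image, ImageDraw

source = Image.new("RGB", (1024, 768), "gray")  # stand-in for your uploaded landscape image
mask = Image.new("L", source.size, 0)           # single-channel mask, black = not selected

draw = ImageDraw.Draw(mask)
# "Broad painting": one loose ellipse over the subject is often good enough.
draw.ellipse((350, 150, 700, 650), fill=255)    # white = treat this area as the subject

mask.save("subject_mask.png")
```

Whether the white region hugs the subject tightly (fine painting) or covers it loosely (broad painting) is exactly the trade-off described above.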
In the case of 3DVideo, you can even choose not to paint at all and let the AI Tool try to detect the subject. Painting just gives you more certainty that it is correctly detected.

AI Tools may have some unstated limits. In the 3DVideo example, one limitation is that it handles landscape (horizontal) images better. Square ones seem to work, although there may be more distortion compared to landscape. Portrait (vertical) images may suffer cropping and more distortion.

AI Tools often simplify other aspects of generation for the Beginner as well, for example choosing LoRA(s), negative prompts and so on. You might want that for simplicity, or it may be at a time when doing it on your own is difficult. For example, when Flux (a model) first came out, there was a lack of LoRAs, and its use and behaviour differed a little from models that had been around for a while. So in those early days of generating with Flux, AI Tools were the go-to method for many users.

AI Tools are written by different people who target different users and have different ideas of what is desirable in an AI Tool. Hence simplicity vs flexibility, type and quality of generation and so on all vary.

One thing to note: using an AI Tool is no guarantee that the generation will be a success.
- See what others have used it for, to know what it is suitable for.
- Check the quality of others' results, to know what to expect.

AI Tools cost credits, and each tool has its own cost, so look before you leap. Sometimes there is a discount and you can use a tool more cheaply - which also means the price will go up later, so be aware and don't end up surprised after the fact.

AI Tools also have a queue, just like your generation page. I generally advise doing only one generation at a time and not queueing generations for an AI Tool, because it may fail. Yes, AI Tools can fail. Read the error message. If the failure is due to processing and not something indeterminate or to your input, you can try again if you wish - although there is still no guarantee of success. If you had only one generation in the queue and did not refresh or close the browser, whatever you set up for the generation should still be there in the queue (try that tab), so just click to generate again.

Generations from an AI Tool appear on your generation page, so there is no need to track them on the AI Tool's page. If you post your generated result, it may appear on the AI Tool's page as a post too.
Beginner Basics By Beginner Part 2: Models, Model Training & What They Mean for Image Generation

Beginners' Basics By Beginner (PART 2)

Assumes Part 1 has been read.

Model & Model Training Background Understanding for Image Generation

One way to think of an AI model is that it is a program that resulted from other program(s) using a lot of data to learn patterns and correlations. We call this process "training". For image generation, this creates the basis for your text to be translated into the various image elements. There are different programs and ways to train; TA has its own too. For the Beginner we won't go into actually creating models, just enough image-model basics to help us generate images better.

Quality & Scope

Firstly, the model creator chooses the input images. Generally speaking, the more the better. The quality of the input images - clarity, resolution, how representative they are, etc. - all affects the end model. Creators are often ad hoc efforts with various constraints and preferences, so models vary greatly in quality and scope. Quality can mean clarity, more flexibility in subject/object/environment, fewer flaws (deformed, distorted, outright wrong, etc.). Scope is what the model can produce in object/subject/colours/mood/genre, etc.

How it impacts us:

Company-produced models (usually base models) tend to be more dependable, so base models are your default first choice unless a checkpoint offers something worthwhile.

Freelance or hobby creators can have a "reputation" for their work, which helps us decide whether to try their future models. If they previously did good work and good samples are visible, a new model lacking samples may be worth trying on reputation alone. Whether you want to chance some credits on untested creators and models comes down to how attractive the promised offering is to you. Sometimes a model is "printed" from elsewhere, meaning it was copied over, so the source site may have samples.

LoRAs give an add-on feature, for example more "special effects" for your images. If you do not need the special effects, you don't need the LoRA. Stick with what you need so that you have fewer chances for flaws. In Part 1 we mentioned that LoRAs can interfere with one another, so adding LoRAs you don't need simply adds chances for problems. More LoRAs may mean more add-on features, but it can also mean more parts failing to work together.

Cleaning and Organizing Data

Creators also need to remove duplicates, avoid over-representation or skewing of the data, and so on. After that, things like "label & tag" need to be done on the images. This is roughly describing them so that a prompt can pull from the model to help generate the image. There is more to it than that, but this is good enough for a basic understanding.

How it impacts us:

Everyone works differently, in quality and in quantity, and that affects the data source, inputs and treatment. Everyone also thinks differently, so the "label and tag" will differ. Therefore a prompt may work for one LoRA but not another, some prompted words work more often and some less often with the same model, and some prompts work for some models and not others. This means that when we learn to generate images, the lessons cannot be applied blindly across models. Each model has its particularities that must be learned. The small hypothetical sketch below illustrates what "label & tag" can look like.
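To make "label & tag" concrete, here is a tiny hypothetical sketch of image-caption pairs, the sort of thing community training tools commonly work with. The filenames and tags are entirely invented and say nothing about how TA trains models; the point is only that two creators tagging the same pictures would write very different text, which is why prompts behave differently across models.

```python
# Hypothetical "label & tag" data: each training image paired with a short caption.
# Filenames and tags are invented; real creators tag in their own style, which is one
# reason the same prompt behaves differently from model to model.
from pathlib import Path

captions = {
    "img_001.png": "1girl, red dress, beach, sunset, smiling, looking at viewer",
    "img_002.png": "photo of an old fisherman, heavy rain, dramatic lighting, portrait",
    "img_003.png": "watercolor painting, mountain village, autumn leaves, soft colors",
}

# One common community convention stores each caption in a .txt file next to the image.
for image_name, caption in captions.items():
    Path(image_name).with_suffix(".txt").write_text(caption, encoding="utf-8")
```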
Limited by time and effort, you need to discover your main few model preferences and spend most of your budgeted time and effort on those, while perhaps setting aside a little time occasionally to have a "look-see" at what other models can do.

Training a Model

The actual crunching is done by a program and a computer. There is more than one option/method, and they produce models that behave differently. One aspect of this "training" is that the AI processes the inputs such that far more outputs are possible when generating images than what was put in. A simple way to think of it is that the AI is mixing, matching and coming up with new variations and permutations.

How it impacts us:

The AI is not a thinking person; it is not thinking at all. It is a program, so it can make the mistakes a program can make, which include hallucinations and flaws. Hallucinations are when wrong, misleading or odd results come out; for images, three arms on a person, distortions and deformities are examples. These problems are not necessarily purely because of the training, but for simplicity we treat them as such for beginner basics.

When we prompt models, we may feel that many detailed prompts are good. For getting as close as possible to what we want, more prompting is good. However, more prompting also means more complexity, because more variables are involved, and that increases the likelihood of strange or bad results. Hence some users' preference for simple, short prompts. Another way to think of it: when you prompt, you need to know what is most important to you in the image. In a manner of speaking, you need to ration your words so that the key aspects are covered, and leave the rest to the model. As AI improves we may be able to prompt more, and maybe even have a responsive AI that asks questions and clarifies with you, but for now too many words is a bad thing. All the extra words use up the "attention" of the AI and give more chance for flaws. I prefer short, concise keywords that contain the critical elements of what is needed. Filler words like "is", "are", "that" and so on carry no meaning for the generated image. The model is not human and does not pick up meaning from natural language; it picks up the keywords and the context of the keywords. This whole paragraph is unfortunately contrary to what some people suggest when they say to write a lot and use many descriptive words, as if you were writing a scene in a novel. I happen to disagree with that method. It is up to you to decide what you believe: look at what others have produced, or test it for yourself. A small illustrative comparison appears at the end of this part.

There are actually far more aspects to training a model, but this basic background knowledge should be enough for a beginner simply trying to generate images.
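As the illustration promised above, here are two invented prompts aiming at roughly the same picture: the keyword style I personally prefer versus the novel-style prose some other guides suggest. Neither has been tested against any particular model; they only show the difference in word budget.

```python
# Two invented prompts aiming at roughly the same image, untested on any model.
# The keyword style is what I personally prefer; the prose style is what some guides suggest.
keyword_prompt = (
    "1woman, silver hair, red kimono, cherry blossoms, night, "
    "lantern light, looking at viewer, upper body"
)

prose_prompt = (
    "A beautiful woman with long flowing silver hair is wearing an elegant red kimono "
    "and standing beneath cherry blossom trees late at night, while the warm glow of "
    "paper lanterns gently illuminates her face as she looks toward the camera."
)

# The prose version spends much of its word budget on connective words that (in the
# view argued above) add little, while the keyword version spends it on image elements.
print(len(keyword_prompt.split()), "words vs", len(prose_prompt.split()), "words")
```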
Beginner Basics By Beginner - Part 1 Resource & Model Basics (Updated 14 DEC 2024)

Beginners' Basics By Beginner (PART 1)

Firstly, I am still a beginner, so please be gentle ;-P With a beginner mind frame, I hope to offer insights for beginners, compared to experts who may omit things they have grown used to and taken for granted.

This article is written from a Tensor.art (TA) user's perspective. Different AI sites have different interfaces and may use different terminology; however, the concepts do generally apply.

This is NOT a "do this and do that" type of article. I cover background info so that one knows what one is doing - hopefully :-P - to better help with all image generation over the longer term.

I am covering text to image (input words, output image), "Text2Img" in TA.

I've broken things into parts, meaning this will be a series of articles.

What I will be covering in this Part 1:
a) TA Knowledge Resources
b) Very basics on Model, Base Model, Checkpoint & LoRA

Current plan for future parts:
- basic model training background knowledge to help image generation
- ControlNet & Embeddings
- AI Tools
- Remix
- VAE, Aspect Ratio, Sampler, Scheduler, Sampling Steps, Guidance Scale, Seed, Clip Encoder
(how many parts the above takes depends on how much I can squeeze into the size limits)
- How to prompt

a) TA Knowledge Resources

Just in case you don't know, the user handbook and video tutorials link (it looks like 3 squares and a diamond) is near the top right. I think the handbook covers too much too briefly, but it is a good overview. The video tutorials have helpful stuff, but they are not user friendly for a beginner. If you don't understand a video title, don't waste your time watching it.

"Training" does NOT mean a training session for you - "training" means "creating" models. Models are not said to be "created" but "trained", because the grunt work in producing a model is done mainly by a computer.

b) Very basics on Model, Base Model, Checkpoint & LoRA

Model = Base Model, Checkpoint, LoRA, Embedding or ControlNet. Ok, so everything is a model - you are probably rolling your eyes. Technically they are all models, but on TA when we say model, it usually means the Base Model or Checkpoint, sometimes a LoRA, and rarely an Embedding or ControlNet - context is needed. For example, if TA has an event on training a model, it is referring to a Checkpoint or LoRA, because these are directly achievable by normal users. Unfortunately, to get the context you have to live and learn on TA. Being on Discord helps. So don't burn out; control your time spent.

A model is the fundamental resource called upon with your prompts to generate the image. There are 3 main types of models that you deal with as a beginner. Build a comfort level with them before going into other models. They are Base Model, Checkpoint & LoRA. The Base Model is the source or core of all the models; without the Base Model, the rest cannot exist or work. You can think of a particular Checkpoint as a modified form of a particular Base Model. All Checkpoints of a Base Model, together with the Base Model, form something like a family.

Functionally speaking, Base Models and Checkpoints fill the same slot: you only choose one.

LoRAs are add-ons to Base Models or Checkpoints. You can generate without LoRAs. For first tries it may be good to generate without LoRAs and use only Base Models, so that you get a feel for basic generation with each base model.

One well known base model is the Stable Diffusion (SD) series. You will usually see "SD something something". The first "something" is usually the version number and the second "something" is typically the variant. Each version is a new base model. For example, you may see "SD1.5", "SDXL", "SD3".
SD1.5 and earlier are older and produce smaller images. From XL onwards, image size support is 1024x1024. This means SDXL allows you to create bigger images than SD1.5 without enlarging (upscaling), and upscaling costs credits. Different versions are not compatible, whereas variants may be compatible within a version. Therefore a LoRA that works for an old base model version will not work with a later one. A "Version X" LoRA (where X is a number) does not imply it is for Version X of SD; they each number their versions independently. The LoRA's info may indicate which base model it works with, or there may be a clue in the name. Otherwise, in a later part I will cover how to find out for yourself.

Other base models include Midjourney, Hunyuan, Flux, Kolors, etc. Spend some time going through others' posts, spot what you like, then look at what they used. As a start, focus on 1 or 2 models; choose whatever has more of what you like. Different base models tend to have their own strengths, limits and characteristics (covered later in the series). Most of the time, later versions are generally better - if it is a base model.

For Checkpoints, because they are mostly by individuals, "versioning" is ad hoc or subjective - meaning whatever the individual creator likes. A new version could be a general improvement, a change in look and feel, a change in the input used to create it, a specialization towards a certain type of image, etc. Hence the latest version of a checkpoint is not necessarily the best version for you, and different versions may suit different images. Therefore, the only way to choose a suitable model is to go through their samples. Start from the homepage lists, (R)un through the images to find those to your liking, find the models used and link through to the models' (I)nfo page, read the info and go through the (S)amples posted by others, if any, and (E)xperiment. This will be recurring advice, so I will just refer to it as "-RISE-"; details in a future part. While doing this, note model combinations plus keywords and phrases in prompts (covered in future). This can link to more and more images, so be prudent, use browser bookmarks for future review, control your time and don't do too many. Create 2 bookmark folders: one for future review and one for already reviewed. The "already reviewed" folder exists because after you know more you can come back to see what you missed the first time. Name and sub-folder the links in whatever way best suits you.

LoRAs need a base model or checkpoint to work. LoRA stands for LOw Rank Adaptation. They are usually designed to produce some specific effect or range of effects; examples include an identifiable face, a colour range, a feel to the image, etc. LoRAs are often designed to work with base models and sometimes with specific checkpoints. If a LoRA works for a base/checkpoint model in a family, it usually works for others in the same family. How well it works, or whether there will be unexpected results, is another matter. A LoRA designed for a base model has a better chance of working well with that family's checkpoints than a LoRA designed for a particular checkpoint has of working well with the base model and other checkpoints. Between LoRAs, the quality, nature and characteristics of the images generated vary greatly.

LoRAs can contribute significantly to your generated image, and it is normal to use one or more LoRAs. LoRAs may interfere with one another, while a missing LoRA may mean a missing ingredient. My opinion is that anything from 1 to 5 LoRAs is desirable. With more LoRAs, you usually need to reduce their weights to lessen the chance of interference. For the curious, the small sketch after this paragraph shows what "base model plus LoRA at a weight" looks like outside TA.
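On TA you simply pick a base model and a LoRA in the UI, but the relationship is the same when scripting outside TA. The sketch below uses the Hugging Face diffusers library with a publicly available SDXL base model; the LoRA file path and adapter name are placeholders, the prompt is invented, and this is only an illustration of the base-plus-LoRA idea, not anything TA itself runs.

```python
# Sketch only: "base model + LoRA at a weight" scripted with Hugging Face diffusers.
# The LoRA path and adapter name are placeholders; 0.8 is just the common starting weight.
import torch
from diffusers import StableDiffusionXLPipeline

# The base model: the "family" everything else builds on.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# A LoRA is an add-on; it cannot generate anything on its own and must match the family.
pipe.load_lora_weights("path/to/some_style_lora.safetensors", adapter_name="style")
pipe.set_adapters(["style"], adapter_weights=[0.8])  # roughly the weight slider's role

image = pipe(
    prompt="1girl, red kimono, cherry blossoms, night, lantern light",
    num_inference_steps=25,
).images[0]
image.save("test.png")
```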
The weight on a LoRA is basically how strongly you want to apply the LoRA. The LoRA's info may advise a suitable weight, and sometimes when selecting a LoRA a pop-up window suggests a weighting. Generally 0.8 is a common starting point. Most of the time you end up lowering it as you experiment; in some cases you may increase it, up to, for example, 2.0. Some LoRAs are more likely to need weight reduction than others. There is no fixed or comparable weighting across LoRAs: each weight is particular to that LoRA's behaviour, so you cannot judge the likely weight needed for one LoRA from how another LoRA behaved. -RISE-.

Very often, distorted or deformed images/objects/subjects signal that a LoRA is interfering and needs weight reduction or removal. Some LoRAs create a lot of problems compared to others. If you are not sure which LoRA is creating the problem and there are no clues from samples, then you have to experiment using one LoRA at a time and playing with the weights. After finding the guilty LoRA(s), if you still want it you need to slowly find the "sweet spot" by generating 2 or more images for each decimal point of weight reduction. A decimal point of weight can make a difference. Bracketing method: deliberately weight high and low, then narrow inwards with ever finer changes in weight so that you don't need to try every decimal point (a small sketch of this narrowing appears at the end of this part). Balance the LoRA still having an effect against the chance of problems. The sweet spot merely improves the odds per generation: some results will be OK and others not. There are LoRAs that simply don't work well, so do dump them when you need to. Some LoRAs burn credits because you need more generations to get something usable or to fix imperfections.

When selecting a Base Model/Checkpoint, note that different families have different generation costs. Furthermore, the cost level for text2img differs from image to image (img2img) and from image to video (img2video). Sometimes a generated image has a flaw or is missing something, but you want to try to fix it. Other than editing it yourself with a graphics editor, this can be done with img2img. Models more prone to flaws will continue to produce flaws when you use them to fix or enhance, so a "cheap" model for the initial generation may end up being credit-expensive when you need to fix or enhance later.

Models presented on their info page as photographic, drawn or whatever may not necessarily produce as presented. The particular combination of models, as well as the prompting, affects the image greatly. For example, a LoRA with "drawn" clothes may turn out photographic in your generation. -RISE-.
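The bracketing idea mentioned above can be written down as a small loop. The sketch below is only schematic: generate_and_inspect is a stand-in for "run two or three generations at this weight and judge them yourself", and the starting bounds of 0.2 and 1.2 are arbitrary examples, not recommendations.

```python
# Schematic of the bracketing method for finding a LoRA weight "sweet spot".
# generate_and_inspect() stands in for manually running 2-3 generations at a weight
# and judging them; there is no automatic quality check here.
def generate_and_inspect(weight: float) -> bool:
    print(f"Generate a couple of images at LoRA weight {weight:.2f} and look at them.")
    return input("Acceptable (no obvious distortion)? [y/n] ").strip().lower() == "y"

low, high = 0.2, 1.2   # deliberately too weak and deliberately too strong
for _ in range(4):     # a few rounds of halving narrows things down quickly
    mid = round((low + high) / 2, 2)
    if generate_and_inspect(mid):
        low = mid      # still fine at this weight, so try pushing it higher
    else:
        high = mid     # flaws are showing, so the sweet spot must be lower
print(f"Settle on a weight around {low:.2f}")
```

Each round halves the range you still have to consider, which is why you don't need to test every decimal point of weight one by one.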