juiwaters
Beginner Basics By Beginner Part 3 - AI Tool

This article is for the Beginner and assumes the previous parts have already been read.

The name "AI Tool" can mean more than one thing in the world of AI. For the AI industry at large, it can generically mean any software that provides a solution, in other words a software tool based on AI. That is obviously NOT what we are referring to in the context of Tensor Art (TA). Here, we are talking about a tool used for image generation, which is what this article is about. So if you are talking to someone outside of TA, do note that "AI Tool" can mean something different from what we discuss here.

An AI Tool in TA is typically something a user created using ControlNet. ControlNet, as previously stated, is not for Beginners, so without delving into ControlNet itself or its uses, for now you can think of ControlNet as a programming tool for creating AI Tools. Tools in general are meant to enable things or make them easier, and the same goes for AI Tools. For the Beginner, AI Tools offer a simplified way to achieve image or video generation. Before you try a model in earnest on your generation page, an AI Tool can be a good starting point to show you what the model can do and motivate you to try it on your own. Remix is another option, but I will cover that in another part.

Some AI Tools simplify prompting to such an extent that they give you multiple-choice options to click on, which then serve as prompts. RealAnime has this feature: if you do not want to fill in the boxes, there are a few clickable options for the subject and a few more for the action.

Many AI Tools simplify the options and break the prompting into entry boxes for you to fill in, rather like filling in a simple form. Fill it in, click go, and that is it. For example, there is currently a RealAnime AI Tool where one box asks who you want as the subject and another asks what you want the subject to be doing.

Other AI Tools may allow you to upload images to generate from, with or without the form-filling style of prompting. An example is 3DVideo, where you upload an image and it turns it into a rotating-viewpoint video with your subject as the focal point. The effect is a 3D version of your image with a camera moving around your subject. In 3DVideo it does not go fully around, it just rotates a little towards the left.

This example includes an inpaint option. If you click on it, it lets you use your cursor to "paint" over your image to tell the AI Tool which subject is the focus. The paint tool includes an undo-last-stroke and a reset-all button. You can choose thicker or finer brush sizes, although the sliding scale is hard to control (for me it jumps a little when dragging). The inpaint tool also tends to have a small viewer, which makes painting fine areas harder. I use the browser zoom to enlarge, but beyond a point the left-right panning seems to fail, so I have to zoom out to pan and then zoom back in to do fine painting. The need for fine painting varies by AI Tool and by your source image; decide for yourself whether to spend more time painting finely or just paint broadly. Sometimes broad painting is good enough - in other words it is fine to "paint outside the lines", so to speak. A small sketch after this paragraph shows what such a painted selection amounts to conceptually.
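TA's inpaint brush is purely a point-and-click feature, but conceptually the painted area is just a mask image: the selected pixels are marked and everything else is left alone. Below is a minimal, hypothetical sketch in Python using Pillow; the image size and ellipse coordinates are made up, and this is not how Tensor Art implements anything - it only illustrates the idea of a broadly painted subject selection.

```python
# Conceptual sketch of what "painting" the subject produces: a mask image where the
# painted region is white (selected) and everything else is black. Sizes and coordinates
# are invented; this is not Tensor Art's implementation, only the general idea.
from PIL import Image, ImageDraw

source = Image.new("RGB", (1024, 768), "gray")  # stand-in for your uploaded landscape image
mask = Image.new("L", source.size, 0)           # single-channel mask, black = not selected

draw = ImageDraw.Draw(mask)
# "Broad painting": one loose ellipse over the subject is often good enough.
draw.ellipse((350, 150, 700, 650), fill=255)    # white = treat this area as the subject

mask.save("subject_mask.png")
```

Whether the white region hugs the subject tightly (fine painting) or covers it loosely (broad painting) is exactly the trade-off described above.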
In the case of 3DVideo, you can even choose not to paint at all and let the AI Tool try to detect the subject. Painting just gives you more certainty that it is correctly detected.

AI Tools may have some unstated limits. In the 3DVideo example, one limitation is that it handles landscape (horizontal) images better. Square ones seem to work, although there may be more distortion compared to landscape. Portrait (vertical) images may suffer cropping and more distortion.

AI Tools often simplify other aspects of generation for the Beginner as well, for example choosing LoRA(s), negative prompts and so on. You might want that for simplicity, or it may be at a time when doing it on your own is difficult. For example, when Flux (a model) first came out, there was a lack of LoRAs, and its use and behaviour differed a little from models that had been around for a while. So in those early days of generating with Flux, AI Tools were the go-to method for many users.

AI Tools are written by different people who target different users and have different ideas of what is desirable in an AI Tool. Hence simplicity vs flexibility, type and quality of generation and so on all vary.

One thing to note: using an AI Tool is no guarantee that the generation will be a success.
- See what others have used it for, to know what it is suitable for.
- Check the quality of others' results, to know what to expect.

AI Tools cost credits, and each tool has its own cost, so look before you leap. Sometimes there is a discount and you can use a tool more cheaply - which also means the price will go up later, so be aware and don't end up surprised after the fact.

AI Tools also have a queue, just like your generation page. I generally advise doing only one generation at a time and not queueing generations for an AI Tool, because it may fail. Yes, AI Tools can fail. Read the error message. If the failure is due to processing and not something indeterminate or to your input, you can try again if you wish - although there is still no guarantee of success. If you had only one generation in the queue and did not refresh or close the browser, whatever you set up for the generation should still be there in the queue (try that tab), so just click to generate again.

Generations from an AI Tool appear on your generation page, so there is no need to track them on the AI Tool's page. If you post your generated result, it may appear on the AI Tool's page as a post too.
Beginner Basics By Beginner Part 2: Models, Model Training & What They Mean for Image Generation

Beginners' Basics By Beginner (PART 2)

Assumes Part 1 has been read.

Model & Model Training Background Understanding for Image Generation

One way to think of an AI model is that it is a program that resulted from other program(s) using a lot of data to learn patterns and correlations. We call this process "training". For image generation, this creates the basis for your text to be translated into the various image elements. There are different programs and ways to train; TA has its own too. For the Beginner we won't go into actually creating models, just enough image-model basics to help us generate images better.

Quality & Scope

Firstly, the model creator chooses the input images. Generally speaking, the more the better. The quality of the input images - clarity, resolution, how representative they are, etc. - all affects the end model. Creators are often ad hoc efforts with various constraints and preferences, so models vary greatly in quality and scope. Quality can mean clarity, more flexibility in subject/object/environment, fewer flaws (deformed, distorted, outright wrong, etc.). Scope is what the model can produce in object/subject/colours/mood/genre, etc.

How it impacts us:

Company-produced models (usually base models) tend to be more dependable, so base models are your default first choice unless a checkpoint offers something worthwhile.

Freelance or hobby creators can have a "reputation" for their work, which helps us decide whether to try their future models. If they previously did good work and good samples are visible, a new model lacking samples may be worth trying on reputation alone. Whether you want to chance some credits on untested creators and models comes down to how attractive the promised offering is to you. Sometimes a model is "printed" from elsewhere, meaning it was copied over, so the source site may have samples.

LoRAs give an add-on feature, for example more "special effects" for your images. If you do not need the special effects, you don't need the LoRA. Stick with what you need so that you have fewer chances for flaws. In Part 1 we mentioned that LoRAs can interfere with one another, so adding LoRAs you don't need simply adds chances for problems. More LoRAs may mean more add-on features, but it can also mean more parts failing to work together.

Cleaning and Organizing Data

Creators also need to remove duplicates, avoid over-representation or skewing of the data, and so on. After that, things like "label & tag" need to be done on the images. This is roughly describing them so that a prompt can pull from the model to help generate the image. There is more to it than that, but this is good enough for a basic understanding.

How it impacts us:

Everyone works differently, in quality and in quantity, and that affects the data source, inputs and treatment. Everyone also thinks differently, so the "label and tag" will differ. Therefore a prompt may work for one LoRA but not another, some prompted words work more often and some less often with the same model, and some prompts work for some models and not others. This means that when we learn to generate images, the lessons cannot be applied blindly across models. Each model has its particularities that must be learned. The small hypothetical sketch below illustrates what "label & tag" can look like.
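To make "label & tag" concrete, here is a tiny hypothetical sketch of image-caption pairs, the sort of thing community training tools commonly work with. The filenames and tags are entirely invented and say nothing about how TA trains models; the point is only that two creators tagging the same pictures would write very different text, which is why prompts behave differently across models.

```python
# Hypothetical "label & tag" data: each training image paired with a short caption.
# Filenames and tags are invented; real creators tag in their own style, which is one
# reason the same prompt behaves differently from model to model.
from pathlib import Path

captions = {
    "img_001.png": "1girl, red dress, beach, sunset, smiling, looking at viewer",
    "img_002.png": "photo of an old fisherman, heavy rain, dramatic lighting, portrait",
    "img_003.png": "watercolor painting, mountain village, autumn leaves, soft colors",
}

# One common community convention stores each caption in a .txt file next to the image.
for image_name, caption in captions.items():
    Path(image_name).with_suffix(".txt").write_text(caption, encoding="utf-8")
```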
Limited by time and effort, you need to discover your main few model preferences and spend most of your budgeted time and effort on those, while perhaps setting aside a little time occasionally to have a "look-see" at what other models can do.

Training a Model

The actual crunching is done by a program and a computer. There is more than one option/method, and they produce models that behave differently. One aspect of this "training" is that the AI processes the inputs such that far more outputs are possible when generating images than what was put in. A simple way to think of it is that the AI is mixing, matching and coming up with new variations and permutations.

How it impacts us:

The AI is not a thinking person; it is not thinking at all. It is a program, so it can make the mistakes a program can make, which include hallucinations and flaws. Hallucinations are when wrong, misleading or odd results come out; for images, three arms on a person, distortions and deformities are examples. These problems are not necessarily purely because of the training, but for simplicity we treat them as such for beginner basics.

When we prompt models, we may feel that many detailed prompts are good. For getting as close as possible to what we want, more prompting is good. However, more prompting also means more complexity, because more variables are involved, and that increases the likelihood of strange or bad results. Hence some users' preference for simple, short prompts. Another way to think of it: when you prompt, you need to know what is most important to you in the image. In a manner of speaking, you need to ration your words so that the key aspects are covered, and leave the rest to the model. As AI improves we may be able to prompt more, and maybe even have a responsive AI that asks questions and clarifies with you, but for now too many words is a bad thing. All the extra words use up the "attention" of the AI and give more chance for flaws. I prefer short, concise keywords that contain the critical elements of what is needed. Filler words like "is", "are", "that" and so on carry no meaning for the generated image. The model is not human and does not pick up meaning from natural language; it picks up the keywords and the context of the keywords. This whole paragraph is unfortunately contrary to what some people suggest when they say to write a lot and use many descriptive words, as if you were writing a scene in a novel. I happen to disagree with that method. It is up to you to decide what you believe: look at what others have produced, or test it for yourself. A small illustrative comparison appears at the end of this part.

There are actually far more aspects to training a model, but this basic background knowledge should be enough for a beginner simply trying to generate images.
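As the illustration promised above, here are two invented prompts aiming at roughly the same picture: the keyword style I personally prefer versus the novel-style prose some other guides suggest. Neither has been tested against any particular model; they only show the difference in word budget.

```python
# Two invented prompts aiming at roughly the same image, untested on any model.
# The keyword style is what I personally prefer; the prose style is what some guides suggest.
keyword_prompt = (
    "1woman, silver hair, red kimono, cherry blossoms, night, "
    "lantern light, looking at viewer, upper body"
)

prose_prompt = (
    "A beautiful woman with long flowing silver hair is wearing an elegant red kimono "
    "and standing beneath cherry blossom trees late at night, while the warm glow of "
    "paper lanterns gently illuminates her face as she looks toward the camera."
)

# The prose version spends much of its word budget on connective words that (in the
# view argued above) add little, while the keyword version spends it on image elements.
print(len(keyword_prompt.split()), "words vs", len(prose_prompt.split()), "words")
```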
Beginner Basics By Beginner - Part 1 Resource & Model Basics (Updated 14 DEC 2024)

Beginners' Basics By Beginner (PART 1)

Firstly, I am still a beginner, so please be gentle ;-P With a beginner mind frame, I hope to offer insights for beginners, compared to experts who may omit things they have grown used to and taken for granted.

This article is written from a Tensor.art (TA) user's perspective. Different AI sites have different interfaces and may use different terminology; however, the concepts do generally apply.

This is NOT a "do this and do that" type of article. I cover background info so that one knows what one is doing - hopefully :-P - to better help with all image generation over the longer term.

I am covering text to image (input words, output image), "Text2Img" in TA.

I've broken things into parts, meaning this will be a series of articles.

What I will be covering in this Part 1:
a) TA Knowledge Resources
b) Very basics on Model, Base Model, Checkpoint & LoRA

Current plan for future parts:
- basic model training background knowledge to help image generation
- ControlNet & Embeddings
- AI Tools
- Remix
- VAE, Aspect Ratio, Sampler, Scheduler, Sampling Steps, Guidance Scale, Seed, Clip Encoder
(how many parts the above takes depends on how much I can squeeze into the size limits)
- How to prompt

a) TA Knowledge Resources

Just in case you don't know, the user handbook and video tutorials link (it looks like 3 squares and a diamond) is near the top right. I think the handbook covers too much too briefly, but it is a good overview. The video tutorials have helpful stuff, but they are not user friendly for a beginner. If you don't understand a video title, don't waste your time watching it.

"Training" does NOT mean a training session for you - "training" means "creating" models. Models are not said to be "created" but "trained", because the grunt work in producing a model is done mainly by a computer.

b) Very basics on Model, Base Model, Checkpoint & LoRA

Model = Base Model, Checkpoint, LoRA, Embedding or ControlNet. Ok, so everything is a model - you are probably rolling your eyes. Technically they are all models, but on TA when we say model, it usually means the Base Model or Checkpoint, sometimes a LoRA, and rarely an Embedding or ControlNet - context is needed. For example, if TA has an event on training a model, it is referring to a Checkpoint or LoRA, because these are directly achievable by normal users. Unfortunately, to get the context you have to live and learn on TA. Being on Discord helps. So don't burn out; control your time spent.

A model is the fundamental resource called upon with your prompts to generate the image. There are 3 main types of models that you deal with as a beginner. Build a comfort level with them before going into other models. They are Base Model, Checkpoint & LoRA. The Base Model is the source or core of all the models; without the Base Model, the rest cannot exist or work. You can think of a particular Checkpoint as a modified form of a particular Base Model. All Checkpoints of a Base Model, together with the Base Model, form something like a family.

Functionally speaking, Base Models and Checkpoints fill the same slot: you only choose one.

LoRAs are add-ons to Base Models or Checkpoints. You can generate without LoRAs. For first tries it may be good to generate without LoRAs and use only Base Models, so that you get a feel for basic generation with each base model.

One well known base model is the Stable Diffusion (SD) series. You will usually see "SD something something". The first "something" is usually the version number and the second "something" is typically the variant. Each version is a new base model. For example, you may see "SD1.5", "SDXL", "SD3".
SD1.5 and earlier are older and produce smaller images. From XL onwards, image size support is 1024x1024. This means SDXL allows you to create bigger images than SD1.5 without enlarging (upscaling), and upscaling costs credits. Different versions are not compatible, whereas variants may be compatible within a version. Therefore a LoRA that works for an old base model version will not work with a later one. A "Version X" LoRA (where X is a number) does not imply it is for Version X of SD; they each number their versions independently. The LoRA's info may indicate which base model it works with, or there may be a clue in the name. Otherwise, in a later part I will cover how to find out for yourself.

Other base models include Midjourney, Hunyuan, Flux, Kolors, etc. Spend some time going through others' posts, spot what you like, then look at what they used. As a start, focus on 1 or 2 models; choose whatever has more of what you like. Different base models tend to have their own strengths, limits and characteristics (covered later in the series). Most of the time, later versions are generally better - if it is a base model.

For Checkpoints, because they are mostly by individuals, "versioning" is ad hoc or subjective - meaning whatever the individual creator likes. A new version could be a general improvement, a change in look and feel, a change in the input used to create it, a specialization towards a certain type of image, etc. Hence the latest version of a checkpoint is not necessarily the best version for you, and different versions may suit different images. Therefore, the only way to choose a suitable model is to go through their samples. Start from the homepage lists, (R)un through the images to find those to your liking, find the models used and link through to the models' (I)nfo page, read the info and go through the (S)amples posted by others, if any, and (E)xperiment. This will be recurring advice, so I will just refer to it as "-RISE-"; details in a future part. While doing this, note model combinations plus keywords and phrases in prompts (covered in future). This can link to more and more images, so be prudent, use browser bookmarks for future review, control your time and don't do too many. Create 2 bookmark folders: one for future review and one for already reviewed. The "already reviewed" folder exists because after you know more you can come back to see what you missed the first time. Name and sub-folder the links in whatever way best suits you.

LoRAs need a base model or checkpoint to work. LoRA stands for LOw Rank Adaptation. They are usually designed to produce some specific effect or range of effects; examples include an identifiable face, a colour range, a feel to the image, etc. LoRAs are often designed to work with base models and sometimes with specific checkpoints. If a LoRA works for a base/checkpoint model in a family, it usually works for others in the same family. How well it works, or whether there will be unexpected results, is another matter. A LoRA designed for a base model has a better chance of working well with that family's checkpoints than a LoRA designed for a particular checkpoint has of working well with the base model and other checkpoints. Between LoRAs, the quality, nature and characteristics of the images generated vary greatly.

LoRAs can contribute significantly to your generated image, and it is normal to use one or more LoRAs. LoRAs may interfere with one another, while a missing LoRA may mean a missing ingredient. My opinion is that anything from 1 to 5 LoRAs is desirable. With more LoRAs, you usually need to reduce their weights to lessen the chance of interference. For the curious, the small sketch after this paragraph shows what "base model plus LoRA at a weight" looks like outside TA.
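On TA you simply pick a base model and a LoRA in the UI, but the relationship is the same when scripting outside TA. The sketch below uses the Hugging Face diffusers library with a publicly available SDXL base model; the LoRA file path and adapter name are placeholders, the prompt is invented, and this is only an illustration of the base-plus-LoRA idea, not anything TA itself runs.

```python
# Sketch only: "base model + LoRA at a weight" scripted with Hugging Face diffusers.
# The LoRA path and adapter name are placeholders; 0.8 is just the common starting weight.
import torch
from diffusers import StableDiffusionXLPipeline

# The base model: the "family" everything else builds on.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# A LoRA is an add-on; it cannot generate anything on its own and must match the family.
pipe.load_lora_weights("path/to/some_style_lora.safetensors", adapter_name="style")
pipe.set_adapters(["style"], adapter_weights=[0.8])  # roughly the weight slider's role

image = pipe(
    prompt="1girl, red kimono, cherry blossoms, night, lantern light",
    num_inference_steps=25,
).images[0]
image.save("test.png")
```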
The weight on a LoRA is basically how strongly you want to apply the LoRA. The LoRA's info may advise a suitable weight, and sometimes when selecting a LoRA a pop-up window suggests a weighting. Generally 0.8 is a common starting point. Most of the time you end up lowering it as you experiment; in some cases you may increase it, up to, for example, 2.0. Some LoRAs are more likely to need weight reduction than others. There is no fixed or comparable weighting across LoRAs: each weight is particular to that LoRA's behaviour, so you cannot judge the likely weight needed for one LoRA from how another LoRA behaved. -RISE-.

Very often, distorted or deformed images/objects/subjects signal that a LoRA is interfering and needs weight reduction or removal. Some LoRAs create a lot of problems compared to others. If you are not sure which LoRA is creating the problem and there are no clues from samples, then you have to experiment using one LoRA at a time and playing with the weights. After finding the guilty LoRA(s), if you still want it you need to slowly find the "sweet spot" by generating 2 or more images for each decimal point of weight reduction. A decimal point of weight can make a difference. Bracketing method: deliberately weight high and low, then narrow inwards with ever finer changes in weight so that you don't need to try every decimal point (a small sketch of this narrowing appears at the end of this part). Balance the LoRA still having an effect against the chance of problems. The sweet spot merely improves the odds per generation: some results will be OK and others not. There are LoRAs that simply don't work well, so do dump them when you need to. Some LoRAs burn credits because you need more generations to get something usable or to fix imperfections.

When selecting a Base Model/Checkpoint, note that different families have different generation costs. Furthermore, the cost level for text2img differs from image to image (img2img) and from image to video (img2video). Sometimes a generated image has a flaw or is missing something, but you want to try to fix it. Other than editing it yourself with a graphics editor, this can be done with img2img. Models more prone to flaws will continue to produce flaws when you use them to fix or enhance, so a "cheap" model for the initial generation may end up being credit-expensive when you need to fix or enhance later.

Models presented on their info page as photographic, drawn or whatever may not necessarily produce as presented. The particular combination of models, as well as the prompting, affects the image greatly. For example, a LoRA with "drawn" clothes may turn out photographic in your generation. -RISE-.
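The bracketing idea mentioned above can be written down as a small loop. The sketch below is only schematic: generate_and_inspect is a stand-in for "run two or three generations at this weight and judge them yourself", and the starting bounds of 0.2 and 1.2 are arbitrary examples, not recommendations.

```python
# Schematic of the bracketing method for finding a LoRA weight "sweet spot".
# generate_and_inspect() stands in for manually running 2-3 generations at a weight
# and judging them; there is no automatic quality check here.
def generate_and_inspect(weight: float) -> bool:
    print(f"Generate a couple of images at LoRA weight {weight:.2f} and look at them.")
    return input("Acceptable (no obvious distortion)? [y/n] ").strip().lower() == "y"

low, high = 0.2, 1.2   # deliberately too weak and deliberately too strong
for _ in range(4):     # a few rounds of halving narrows things down quickly
    mid = round((low + high) / 2, 2)
    if generate_and_inspect(mid):
        low = mid      # still fine at this weight, so try pushing it higher
    else:
        high = mid     # flaws are showing, so the sweet spot must be lower
print(f"Settle on a weight around {low:.2f}")
```

Each round halves the range you still have to consider, which is why you don't need to test every decimal point of weight one by one.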