PictureT
🇬🇧 🇨🇵 🇪🇸 Mostly in 𝑷𝒊𝒙𝒊𝒗
🌐 https://picture-t.com/
382
Followers
114
Following
69.5K
Runs
11
Downloads
9.9K
Likes
745
Stars
AI Tools
PictureT
Director's Cut | FLUX
PictureT
Western Vintage | FLUX
PictureT
📸 Shutterbug | SD3.5L Turbo
PictureT
🐍 Snakeyear - Minisculpture | SD3.5L Turbo
PictureT
📹 Shutterbug | SVD & SD3.5L Turbo
PictureT
Western Vintage | 𝗣𝗼𝗻𝘆𝗩𝟲 BETA
PictureT
Tool | Testing Playground
PictureT
Ink Style | FLUX
PictureT
Bande Dessinée | FLUX
PictureT
Sarah Bryant - Race Queen
PictureT
Digital Art Portrait (Various Styles) - [HunYuan-DiT][1.2]
PictureT
RPG Background Maker - [HunYuan-DiT][1.2]
Models
LORA Flux
EXCLUSIVE
OC Eris - Cosmic Horrors-F1
PictureT
LORA Flux
EXCLUSIVE
Western Styles-Retrofuturism
PictureT
LORA Illustrious
EXCLUSIVE
Western Fine Art [Illustrious v0.1]-v10
PictureT
LORA Flux
EXCLUSIVE
Fine Art Photography-Space Design
PictureT
LORA Flux
Western Fine Art, Christmas Parody [Flux.1 D]-🎅 Merry Christmas
PictureT
LORA Flux (Updated)
EXCLUSIVE
HED☽NICA™-Nadka Viola
PictureT
LORA SD 3.5 L
DSDSDFSDFDFD-e17
PictureT
LORA Pony
EXCLUSIVE
PictureT's Pony Styles — Fine art Illustrators-Ernest Frahm
PictureT
LORA Pony
Christie [DoA]-v6
PictureT
LORA Flux
EXCLUSIVE
PictureT's Wardrobe-2810e
PictureT
LORA Flux
[Cosmic Horrors] Characters-卐 Nightmare
PictureT
LORA Pony
EXCLUSIVE
☢️ Ramshackle-H24_22
PictureT
LORA Pony
Middleton - Engel-v2
PictureT
LORA HunyuanDiT
EXCLUSIVE
[HYDiT] 1990s Anime-v1.2
PictureT
LORA HunyuanDiT
EXCLUSIVE
[HYDiT] RPG Background-v1.2
PictureT
LORA HunyuanDiT
EXCLUSIVE
[HYDiT] The Warren's Archive-v1.2
PictureT
LORA Flux
EXCLUSIVE
Western Illustration Noir / Comic Book-WPM
PictureT
LORA SD 1.5
EXCLUSIVE
Portrait Enhancer by PictureT-v2.0
PictureT
LORA Pony
Sarah Bryant [VF]-🦄v6
PictureT
LORA Flux
EXCLUSIVE
👠 Platform fashionist by PictureT-F1 ⚡︎
PictureT
Workflows
25 Nodes
🐍 Snakeyear - SD3.5L Turbo (Template)
PictureT
10 Nodes
worflow/FLUX.1-Turbo-Alpha
PictureT
8 Nodes
Christmas Walkthrough | Image to Prompt in Comfy
PictureT
14 Nodes
workflow/SD3.5
PictureT
14 Nodes
🎃 Halloween 2024 • Pony Diffusion Workflow
PictureT
17 Nodes
Flux1 Advanced Template [LoRA+t5xxl+DynamicThresholding]
PictureT
10 Nodes
Image to Face Detailer
PictureT
11 Nodes
SDXL Dual ClipTextEncoder CLIP L & CLIP G, Basic Template
PictureT
11 Nodes
Fluxdev_fp8 Basic Template
PictureT
Articles
My 'Model Training' Parameters for the 'Christmas Model' task of the 'Christmas Walkthrough' event
Hello! I've made a LoRA in TensorArt, and I want to share the process: https://tensor.art/models/806550646066228285/Christmas-Walkthrough-2024-Merry-Christmas
Tutorial for Flux
1. Have an idea (Christmas).
2. Get a dataset [min 1024x1024]. Dataset: https://mega.nz/file/DAdHiLwA#-swbYhXvbH2-zMav4JflUDZuERoY-lt_y4RMq7HpzM8
3. Upload your dataset.
4. Select Batch Cutting: configure 'vertical, horizontal or squared' depending on your dataset; mine is vertical.
5. Select Auto Labeling: configure 'florence2, natural language' or enter your captions manually. I'll do both: first auto labeling, then Add Batch Labels to caption my keywords. It's important to match 'Keep n Tokens' with the number of keywords desired. Keywords: traditional media, christmas parody, fine art, 1940 \(style\)
6. Configure the rest of the parameters (see below).
7. You're done, start training!
Christmas Walkthrough | Add Radio Buttons to an old Ai Tool.
What are Radio Buttons?
They allow you to use name syntax in your prompt to pull lines of prompt from a file. In TensorArt we will use them as a substitute for personalized wildcards, so Radio Buttons are pseudo-wildcards. Check this article to learn how to manipulate and personalize them. Radio Buttons require a <CLIP Text Encode> node to be stored in.
What do we need?
Any working Ai Tool. In my current exploration, only certain <CLIP Text Encode> nodes allow you to use them as Radio Button containers. For this example I'll use my Ai Tool: 📸 Shutterbug | SD3.5L Turbo.
1. Duplicate/download your Ai Tool workflow (to have a backup).
2. Add a <CLIP Text Encode> node.
3. Add a <Conditioning Combine> node.
4. Assemble the nodes as the illustration shows. Be careful with the combine method: use concat if you're not experienced at combining clips; this instructs your prompting to ADD the prompt called by the Radio Button.
5. 💾 Save your Ai Tool workflow.
6. Go to Edit mode in your Ai Tool.
7. Export your current User-configurable Settings (JSON).
8. ↺ Update your Ai Tool.
9. Import your old User-configurable Settings (JSON).
10. Look for the new <CLIP Text Encode> node, and load it.
11. Hover over the new <CLIP Text Encode> tab, and select Edit.
12. Configure your Radio Buttons.
13. Publish your Ai Tool.
Done! Enjoy the Radio Button feature in your Ai Tools. In my case, my new Ai Tool looks like this: 📹 Shutterbug | SVD & SD3.5L Turbo.
Note: I also included SVD video to meet the requirements of the Christmas Walkthrough event.
🎃 Halloween2024 | Optimizing Sampling Schedules in Diffusion Models
You might have seen this kind of image before if you have girly tastes when browsing Pinterest. Well, guess what? I'll teach you about some parameters to enhance your future Pony SDXL generations. It's been a while since my last post; today I'll teach you about a cool feature launched by NVIDIA on July 22, 2024. For this task I'll provide an alternative workflow (Diffusion Workflow) for SDXL. Now let's get to the content.
Models
For my research (AI Tool) I decided to use the following models:
Checkpoint model: https://tensor.art/models/757869889005411012/Anime-Confetti-Comrade-Mix-v3
0.60 LoRA: https://tensor.art/models/702515663299835604
0.80 LoRA: https://tensor.art/models/757240925404735859/Sailor-Moon-Vixon's-Anime-Style-Freckledvixon-1.0
0.75 LoRA: https://tensor.art/models/685518158427095353
Nodes
The Diffusion Workflow has many nodes that I've merged into single nodes; I'll explain them below. Remember you can group nodes and edit their values to enhance your experience.
👑 Super Prompt Styler // Advanced Manager
(CLIP G) text_positive_g: positive prompt, the subject of the scene (all the elements the scene is meant for, LoRA keyword activators).
(CLIP L) text_positive_l: positive prompt, everything about the scene itself (composition, lighting, style, scores, ratings).
text_negative: negative prompt.
◀Style▶: artistic styler; select the direction for your prompt. Select 'misc Gothic' for a Halloween direction.
◀Negative Prompt▶: prepares the negative prompt, splitting it in two (CLIP G and CLIP L) for the encoder.
◀Log Prompt▶: adds information to the metadata; it produces error 1406 when enabled, so turn it off.
◀Resolution▶: select the resolution of your generation.
👑 Super KSampler // NVIDIA Aligned Steps
base_seed: similar to ENSD (learn more here).
similarity: this parameter influences the base_seed noise to be similar to the noise_seed value.
noise_seed: the exact same noise seed you know.
control after generate: dictates the behavior of noise_seed.
cfg: guidance for the prompt; read about <DynamicThresholdingFull> to know the correct value. I recommend 12.
sampler_name: sampling method.
model_type: NVIDIA sampler for SDXL and SD models.
steps: the exact same steps you know; dictates how much the sampling denoises the injected noise.
denoise: the exact same denoise you know; dictates how strongly the sampling denoises the injected noise.
latent_offset: select between {-1.00 Darker to 1.00 Brighter} to modify the input latent; any value other than 0 adds information to enhance the final result.
factor_positive: upscale factor for the conditioning.
factor_negative: upscale factor for the conditioning.
vae_name: the exact same VAE you know; dictates how the injected noise is denoised by the sampler.
👑 Super Iterative Upscale // Latent/on Pixel Space
model_type: NVIDIA sampler for SDXL and SD models.
steps: number of steps the UPSCALER (Pixel KSampler) will use to correct the latent in pixel space while upscaling it.
denoise: dictates the strength of the correction applied to the latent in pixel space.
cfg: guidance for the prompt; read about <DynamicThresholdingFull> to know the correct value. I recommend 12.
upscale_factor: number of times the upscaler will upscale the latent (must match factor_positive and factor_negative).
upscale_steps: dictates the number of steps the UPSCALER (Pixel KSampler) will use to upscale the latent.
Miscellaneous
DynamicThresholdingFull
mimic_scale: 4.5 (important value; follow the Learn more link)
threshold_percentile: 0.98
mimic_mode: half cosine down
mimic_scale_min: 3.00
cfg_mode: half cosine down
cfg_scale_min: 0.00
sched_val: 3.00
separate_feature_channels: enable
scaling_startpoint: mean
variability_measure: AD
interpolate_phi: 0.85
Learn more: https://www.youtube.com/watch?v=_l0WHqKEKk8
Latent Offset
Learn more: https://github.com/spacepxl/ComfyUI-Image-Filters?tab=readme-ov-file#offset-latent-image
Align Your Steps
Learn more: https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/
LayerColor: Levels
Set black_point = 0 (base level of black)
Set white_point = 255 (base level of white)
Set output_black_point = 20 (makes blacks less black)
Set output_white_point = 220 (makes whites less white)
Learn more: https://docs.getsalt.ai/md/ComfyUI_LayerStyle/Nodes/LayerColor%3A%20Levels/
LayerFilter: Film
center_x: 0.50
center_y: 0.50
saturation: 1.75
vignette_intensity: 0.20
grain_power: 0.50
grain_scale: 1.00
grain_sat: 0.00
grain_shadows: 0.05
grain_highs: 0.00
blur_strength: 0.00
blur_focus_spread: 0.1
focal_depth: 1.00
Learn more: https://docs.getsalt.ai/md/ComfyUI_LayerStyle/Nodes/LayerFilter%3A%20Film/?h=film
Result
Ai Tool: https://tensor.art/template/785834262153721417
Downloads
Pony Diffusion Workflow: https://tensor.art/workflows/785821634949973948
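To make the LayerColor: Levels values from the article above more concrete, here is a minimal sketch of a plain linear levels remap. It assumes the common "input window to output window" formula; the real node may apply extra steps (for example a midtone/gamma control), and the function name `apply_levels` is hypothetical.

```python
def apply_levels(value, black_point=0, white_point=255,
                 output_black_point=20, output_white_point=220):
    """Remap one 0-255 channel value with a plain linear levels formula.

    Assumption: this mirrors the usual levels arithmetic, not the exact
    implementation of the LayerColor: Levels node.
    """
    # Clamp the input into the [black_point, white_point] window.
    clamped = min(max(value, black_point), white_point)
    # Normalise to 0..1 inside that window.
    t = (clamped - black_point) / (white_point - black_point)
    # Stretch into the output window and round to an integer level.
    return round(output_black_point + t * (output_white_point - output_black_point))


# Pure black is lifted and pure white is lowered, which flattens contrast.
darkest = apply_levels(0)
brightest = apply_levels(255)
```

With the article's settings, the darkest pixel ends at the output black point and the brightest at the output white point, which is what "makes blacks less black" and "makes whites less white" describe.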
Hunyuan-DiT: Recommendations
Review
Hello everyone; I want to share some of my impressions of the Chinese model Hunyuan-DiT, from Tencent. First of all, let's start with some essential facts so that we (Westerners) can figure out what it is meant for:
Hunyuan-DiT works well for multi-modal dialogue with users (mainly in Chinese and English); the better explained your prompt, the better your generation will be. It is not necessary to use only keywords, although it understands them quite well. In terms of rating, HYDiT 1.2 sits between SDXL and SD3: it is not as powerful as SD3 but beats SDXL at almost everything; for me it is how SDXL should have been in the first place. One of the best parts is that Hunyuan-DiT is compatible with almost the whole SDXL node suite.
Hunyuan-DiT-v1.2 has 1.5B parameters.
mT5 has 1.6B parameters.
Recommended VAE: sdxl-vae-fp16-fix
Recommended Sampler: ddpm, ddim, or dpmms
Prompt as you would in SD1.5; don't be shy, and go further in terms of length. HunyuanDiT combines two text encoders, a bilingual CLIP and a multilingual T5 encoder, to improve language understanding and increase the context length. They divide your prompt into meaningful IDs and then process your entire prompt; their limit is 100 IDs, or up to 256 tokens. T5 works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task.
To improve your prompt, place your summarized prompt in the CLIP:TextEncoder node box (if you disabled T5), or place your extended prompt in the T5:TextEncoder node box (if you enabled T5).
You can use the "simple" text encode node to use only one prompt, or you can use the regular one to pass different text to CLIP/T5.
The worst part is that the model only benefits from moderate (high for TensorArt) step values: 40 steps is the baseline in most cases.
ComfyUI (Comfyflow) (Example)
TensorArt added all the elements to build a good flow for us; you should try it too.
Additional
What can we do in the Open-Source plan? (link)
Official info for LoRA training (link)
References
Analysis of HunYuan-DiT | https://arxiv.org/html/2405.08748v1
Learn more about T5 | https://huggingface.co/docs/transformers/en/model_doc/t5
How CLIP and T5 work together | https://arxiv.org/pdf/2205.11487
🆘 ERROR | Exception
Exception (routeId: 7544339967855538950230)
Suspect nodes: <string function>, <LayerStyle>, <LayerUtility>, <FaceDetailer>, many <TextBox>, <Bumpmap>
After some research (on my own) I've found:
The <FaceDetailer> node is completely broken.
The <TextBox> and <MultiLine:Textbox> nodes will cause this error if you enter more than about 250 characters. I'm not very sure about this number, but you won't be able to enter a decent amount of text anymore.
More than 40 nodes, regardless of their function, will cause this error.
How do I know this? Well, I made a functional comfyflow following those rules: https://tensor.art/template/754955251181895419
The next functional comfyflow suddenly stopped generating; it's almost the same flow as the previous one, but with <FaceDetailer> and large text strings to polish the prompt. It works again, yay! https://tensor.art/template/752678510492967987 proof it really worked (here)
I feel bad for you if this error suddenly disrupts your day; feel bad for me, because I bought the yearly membership of this broken product and I can't get a refund. I'll be happy to delete this bad review if you fix this error.
News
081124 | <String Function> has been taken down. Comfyflow works slowly (but works).
081024 | Everything is broken again lmao, we can't generate outside TAMS.
080624 | <reroute> output node could trigger this error when linked to many inputs.
072824 | <FaceDetailer> node seems to work again.
Understanding the Impact of Negative Prompts: When and How Do They Take Effect?
📝 - Synthical
The Dynamics of Negative Prompts in AI: A Comprehensive Study
by: Yuanhao Ban UCLA, Ruochen Wang UCLA, Tianyi Zhou UMD, Minhao Cheng PSU, Boqing Gong, Cho-Jui Hsieh UCLA
This study addresses the gap in understanding the impact of negative prompts in AI diffusion models. By focusing on the dynamics of diffusion steps, the research aims to answer the question: "When and how do negative prompts take effect?". The investigation categorizes the mechanism of negative prompts into two primary tasks: noun-based removal and adjective-based alteration.
The role of prompts in AI diffusion models is crucial for guiding the generation process. Negative prompts, which instruct the model to avoid generating certain features, have been less studied than their positive counterparts. This study provides a detailed analysis of negative prompts, identifying the critical steps at which they begin to influence the image generation process.
Findings
Critical Steps for Negative Prompts
Noun-Based Removal: The influence of noun-based negative prompts peaks at the 5th diffusion step. At this critical step, negative prompts initially generate a target object at a specific location within the image. This neutralizes the positive noise through a subtractive process, effectively erasing the object. However, introducing a negative prompt in the early stages paradoxically results in the generation of the specified object. Therefore, the optimal timing for introducing these prompts is after the critical step.
Adjective-Based Alteration: The influence of adjective-based negative prompts peaks around the 10th diffusion step. During the initial stages, the absence of the object leads to a subdued response. Between the 5th and 10th steps, as the object becomes clearer, the negative prompt accurately focuses on the intended area and maintains its influence.
Cross-Attention Dynamics
At the peak around the 5th step for noun-based prompts, the negative prompt attempts to generate objects in the middle of the image, regardless of the positive prompt's context. As this process approaches its peak, the negative prompt begins to assimilate layout cues from its positive counterpart, trying to remove the object. This represents the zenith of its influence.
For adjective-based prompts, during the peak around the 10th step, the negative prompt maintains its influence on the intended area, accurately targeting the object as it becomes clear.
The study highlights the paradoxical effect of introducing negative prompts in the early stages of diffusion, leading to the unintended generation of the specified object. This finding suggests that the timing of negative prompt introduction is crucial for achieving the desired outcome.
Reverse Activation Phenomenon
A significant phenomenon observed in the study is Reverse Activation. This occurs when a negative prompt, introduced early in the diffusion process, unexpectedly leads to the generation of the specified object within the context of that negative prompt. To explain this, the researchers borrowed the concept of the energy function from Energy-Based Models to represent the data distribution.
Real-world distributions often feature elements like clear blue skies or uniform backgrounds, alongside distinct objects such as the Eiffel Tower. These elements typically possess low energy scores, making the model inclined to generate them. The energy function is designed to assign lower energy levels to more 'likely' or 'natural' images according to the model's training data, and higher energy levels to less likely ones.
A positive difference indicates that the presence of the negative prompt effectively induces the inclusion of this component in the positive noise. The presence of a negative prompt promotes the formation of the object within the positive noise. Without the negative prompt, implicit guidance is insufficient to generate the intended object. The application of a negative prompt intensifies the distribution guidance towards the object, preventing it from materializing.
As a result, negative prompts typically do not attend to the correct place until step 5, well after the application of positive prompts. The use of negative prompts in the initial steps can significantly skew the diffusion process, potentially altering the background.
Conclusions
Do not use fewer than 10 steps; going beyond 25 steps makes no difference for negative prompting.
Negative prompts can enhance your positive prompts, depending on how well the model and LoRA have learned their keywords, so they can be understood as an extension of their counterparts.
Weighting up negative keywords may cause reverse activation, breaking your image; try keeping the influence ratio of all your LoRAs and models equal.
Reference
https://synthical.com/article/Understanding-the-Impact-of-Negative-Prompts%3A-When-and-How-Do-They-Take-Effect%3F-171ebba1-5ca7-410e-8cf9-c8b8c98d37b6?
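The timing advice in the summary above can be sketched with the usual classifier-free guidance arithmetic. This is a toy illustration under stated assumptions, not the paper's code: the noise predictions are plain lists of made-up numbers, the function name `guided_noise` and the `critical_step` parameter are hypothetical, and the negative prompt is assumed to replace the unconditional branch once it is switched on.

```python
def guided_noise(pos, neg, uncond, step, cfg=7.0, critical_step=5):
    """Combine noise predictions for one diffusion step.

    The negative prompt is only used once `step` has reached
    `critical_step`; earlier steps fall back to the unconditional
    (empty prompt) prediction to avoid reverse activation.
    """
    base = neg if step >= critical_step else uncond
    # Classifier-free guidance: start from `base` and push toward `pos`.
    return [b + cfg * (p - b) for p, b in zip(pos, base)]


# Toy predictions (assumed values, not real model output).
pos = [1.0, 2.0]
neg = [0.5, 0.0]
uncond = [0.0, 0.0]

early = guided_noise(pos, neg, uncond, step=0, cfg=2.0)  # negative prompt ignored
late = guided_noise(pos, neg, uncond, step=5, cfg=2.0)   # negative prompt active
```

Using `>=` for the switch is a design choice made for the sketch; the summary only says the prompt should come in around or after the critical step.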
Stable Diffusion [Floating Point, Performance in the Cloud]
Overview of Data Formats used in AI
fp32 is the default data format used for training, along with mixed-precision training that uses both fp32 and fp16. fp32 has more than adequate scale and definition to effectively train the most complex neural networks. It also results in large models, both in terms of parameter size and complexity.
The fp16 data format is supported both in hardware and software with good performance. In running AI inference workloads, the adoption of fp16 instead of the mainstream fp32 offers tremendous advantages in terms of speed-up while reducing power consumption and memory footprint. This advantage comes with virtually no accuracy loss. The switch to fp16 is completely seamless and does not require any major code changes or fine-tuning. CPUs will improve their AI inference workload performance instantly.
fp32 can represent numbers between 10⁻⁴⁵ and 10³⁸. In most cases, such a wide range is wasteful and does not bring additional precision. The use of fp16 reduces this range to between 10⁻⁸ and 65,504 and halves the memory requirements while also accelerating training and inference. Make sure to avoid underflow and overflow situations.
Once the training is completed, one of the most popular ways to improve performance is to quantize the network. A popular data format used in this process, mainly in edge applications, is int8; it results in at most a 4x reduction in size with a notable performance improvement. However, quantization into int8 frequently leads to some accuracy loss. Sometimes the loss is limited to a fraction of a percent, but it often amounts to a few percent of degradation, and in many applications this degradation becomes unacceptable.
There are ways to limit accuracy loss by doing quantization-aware training. This consists of introducing the int8 data format selectively and/or progressively during training. It is also possible to apply quantization to the weights while keeping activation functions at fp32 resolution. Though these methods help limit the accuracy loss, they will not eliminate it altogether. fp16 is a data format that can be the right solution for preventing accuracy loss while requiring minimal or no conversion effort. Indeed, it has been observed in many benchmarks that the transition from fp32 to fp16 results in no noticeable accuracy loss without any re-training.
Conclusion
For NVIDIA GPUs and AI, deploy in fp16 to double inference speeds while reducing the memory footprint and power consumption.
Note: If the original model was not trained using fp16, its conversion to fp16 is extremely easy and does not require re-training or code changes. It is also shown that the switch to fp16 led to no visible accuracy loss in most cases.
Source: https://amperecomputing.com/
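The fp16 range, size and underflow/overflow remarks in the article above can be checked with a small sketch that uses only Python's standard `struct` module, whose `e` format is IEEE 754 half precision. The helper name `to_fp16` is an assumption for illustration; real inference stacks do this conversion on tensors, not on single floats.

```python
import struct


def to_fp16(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Storage size per number: fp16 needs half the bytes of fp32.
FP16_BYTES = struct.calcsize("<e")
FP32_BYTES = struct.calcsize("<f")

largest = to_fp16(65504.0)    # the largest finite fp16 value survives intact
underflowed = to_fp16(1e-8)   # too small for fp16, collapses to zero
rounded = to_fp16(0.1)        # keeps only about three decimal digits

# Values beyond the fp16 range cannot be packed at all.
try:
    to_fp16(1e5)
    overflowed = False
except OverflowError:
    overflowed = True
```

This is why the article warns about underflow and overflow: weights or activations outside the fp16 range either vanish or fail to convert, while values inside the range lose only a small amount of precision.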
Blender to Stable Diffusion, animation workflow.
Source: https://www.youtube.com/watch?v=8afb3luBvD8
Mickmumpitz guides us on how to use Stable Diffusion, a neural-network image model, to generate masks and prompts for rendering 3D animations. The process involves setting up passes in Blender, creating a file output node, and then using Stable Diffusion's node-based interface for the image workflow. Overall, the video demonstrates how to use these AI tools to enhance the rendering process of 3D animations.
The process involves setting up render passes, such as depth and normal passes, in Blender to extract information from the 3D scene for AI image generation. Users can create mask passes to communicate which prompts to use for individual objects in the scene. Mickmumpitz explains the differences between using Stable Diffusion and SDXL for image generation and video rendering, highlighting the advantages and disadvantages of each, and demonstrates how to use Stable Diffusion 1.5 with Blender to generate specific styles and control the level of detail in the AI-generated scenes.
Mickmumpitz shows an updated workflow for rendering 3D animations using AI with Blender and Stable Diffusion. He created simplistic scenes, including a futuristic cityscape and a rope-balancing scene, to test the updated version. The workflow aims to make rendering more efficient and versatile.
Stable Diffusion [ADetailer]
After Detailer (ADetailer)
After Detailer (ADetailer) is a game-changing extension designed to simplify the process of image enhancement, particularly inpainting. This tool saves you time and proves invaluable in fixing common issues, such as distorted faces in your generated images.
Historically, we would send the image to an inpainting tool and manually draw a mask around the problematic face area. After Detailer streamlines this process by automating it with the help of a face recognition model. It detects faces and automatically generates the inpaint mask, then proceeds with inpainting by itself.
Exploring ADetailer Parameters
Now that you've grasped the basics, let's delve into additional parameters that allow fine-tuning of ADetailer's functionality.
Detection Model:
ADetailer offers various detection models, such as face_xxxx, hand_xxxx, and person_xxxx, catering to specific needs.
Notably, the face_yolo and person_yolo models, based on YOLO (You Only Look Once), excel at detecting faces and objects, yielding excellent inpainting results.
Model Selection:
The "8n" and "8s" models vary in speed and power, with "8n" being faster and smaller.
Choose the model that suits your detection needs, switching to "8s" if detection proves challenging.
ADetailer Prompting
Input your prompts and negatives in the ADetailer section to achieve the desired results.
Detection Model Confidence Threshold:
This threshold determines the minimum confidence score needed for model detections. Lower values (e.g., 0.3) are advisable for detecting faces. Adjust as necessary to improve or reduce detections.
Mask Min/Max Area Ratio:
These parameters control the allowed size range for detected masks. Modifying the minimum area ratio can help filter out undesired small objects.
The most crucial setting in the Inpainting section is the "Inpaint denoising strength," which determines the level of denoising applied during automatic inpainting. Adjust it to achieve your desired degree of change.
In most cases, selecting "Inpaint only masked" is recommended when inpainting faces.
Reference
ThinkDiffusion
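The confidence threshold and mask min/max area ratio described in the ADetailer article above can be pictured as a simple filter over detections. The sketch below is an assumption-laden illustration, not ADetailer's code: the dictionary layout (`box`, `score`) and the function name `filter_detections` are hypothetical, and the area ratio is taken to mean box area divided by image area.

```python
def filter_detections(detections, image_area, confidence=0.3,
                      min_area_ratio=0.0, max_area_ratio=1.0):
    """Keep detections that pass the confidence and area-ratio limits."""
    kept = []
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        # Fraction of the whole image covered by this detection box.
        ratio = ((x2 - x1) * (y2 - y1)) / image_area
        confident = det["score"] >= confidence
        if confident and min_area_ratio <= ratio <= max_area_ratio:
            kept.append(det)
    return kept


# Hypothetical detections on a 512x512 image.
detections = [
    {"box": (0, 0, 100, 100), "score": 0.9},  # clear face
    {"box": (0, 0, 10, 10), "score": 0.8},    # tiny background face
    {"box": (0, 0, 200, 200), "score": 0.2},  # low-confidence guess
]
faces = filter_detections(detections, image_area=512 * 512,
                          confidence=0.3, min_area_ratio=0.01)
```

Raising `min_area_ratio` drops the tiny background face, while the confidence threshold drops the uncertain detection, so only the first box would be sent on to inpainting.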
TagGUI - captioning tool for model creators
📥 Download | https://github.com/jhc13/taggui
Cross-platform desktop application for quickly adding and editing image tags and captions, aimed towards creators of image datasets for generative AI models like Stable Diffusion.
Features
Keyboard-friendly interface for fast tagging
Tag autocomplete based on your own most-used tags
Integrated Stable Diffusion token counter
Automatic caption and tag generation with models including CogVLM, LLaVA, WD Tagger, and many more
Batch tag operations for renaming, deleting, and sorting tags
Advanced image list filtering
Captioning parameters
Prompt: Instructions given to the captioning model. Prompt formats are handled automatically based on the selected model. You can use the following template variables to dynamically insert information about each image into the prompt:
{tags}: The tags of the image, separated by commas.
{name}: The file name of the image without the extension.
{directory} or {folder}: The name of the directory containing the image.
An example prompt using a template variable could be "Describe the image using the following tags as context: {tags}". With this prompt, {tags} would be replaced with the existing tags of each image before the prompt is sent to the model.
Start caption with: Generated captions will start with this text.
Remove tag separators in caption: If checked, tag separators (commas by default) will be removed from the generated captions.
Discourage from caption: Words or phrases that should not be present in the generated captions. You can separate multiple words or phrases with commas (,). For example, you can put appears,seems,possibly to prevent the model from using an uncertain tone in the captions. The words may still be generated due to limitations related to tokenization.
Include in caption: Words or phrases that should be present somewhere in the generated captions. You can separate multiple words or phrases with commas (,). You can also allow the captioning model to choose from a group of words or phrases by separating them with |. For example, if you put cat,orange|white|black, the model will attempt to generate captions that contain the word cat and either orange, white, or black. It is not guaranteed that all of your specifications will be met.
Tags to exclude (WD Tagger models): Tags that should not be generated, separated by commas.
Many of the other generation parameters are described in the Hugging Face documentation.
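The template variables described in the TagGUI article above amount to a per-image text substitution. The following is a minimal sketch of that idea, not TagGUI's actual implementation: the function name `fill_prompt` and the example file path are hypothetical.

```python
from pathlib import Path


def fill_prompt(template, image_path, tags):
    """Substitute {tags}, {name}, {directory} and {folder} for one image."""
    path = Path(image_path)
    values = {
        "{tags}": ", ".join(tags),         # existing tags, comma separated
        "{name}": path.stem,               # file name without the extension
        "{directory}": path.parent.name,   # containing directory
        "{folder}": path.parent.name,      # alias of {directory}
    }
    for variable, value in values.items():
        template = template.replace(variable, value)
    return template


prompt = fill_prompt(
    "Describe the image using the following tags as context: {tags}",
    "dataset/cats/tabby_01.png",
    ["cat", "orange"],
)
```

Each image in the dataset would get its own filled-in prompt, so the captioning model sees that image's tags rather than a fixed instruction.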
Stable Diffusion [Parameters]
Stable Diffusion Intro.
Stable Diffusion is an open-source text-to-image AI model that can generate amazing images from given text in seconds. The model was trained on images in the LAION-5B dataset (Large-scale Artificial Intelligence Open Network). It was developed by CompVis, Stability AI and RunwayML. All research artifacts from Stability AI are intended to be open sourced.
Prompt Engineering.
Prompt engineering is the process of structuring words so that they can be interpreted and understood by a text-to-image model. It is the language you need to speak in order to tell an AI model what to draw.
A well-written prompt consists of keywords and good sentence structure.
Ask yourself a list of questions once you have something in mind:
Do you want a photo or a painting, digital art?
What's the subject: a person, an animal, the painting itself?
What details are part of your idea?
Special lighting: soft, ambient, etc.
Environment: indoor, outdoor, etc.
Color scheme: vibrant, muted, etc.
Shot: front, from behind, etc.
Background: solid color, forest, etc.
What style: illustration, 3D render, movie poster?
The order of words is important.
The order and presentation of the desired output is almost as important as the vocabulary itself. It is recommended to list your concepts explicitly and separately rather than trying to cram them into one simple sentence.
Keywords and Sub-Keywords.
Keywords are words that can change the style, format, or perspective of the image. There are certain magic words or phrases that are proven to boost the quality of the image. Sub-keywords are those that belong to the semantic group of a keyword; hierarchy is important for prompting as well as for LoRA or model design.
Classifier Free Guidance (CFG default is 7)
You can understand this parameter as "AI creativity vs. {{user}} prompt". Lower numbers give the AI more freedom to be creative, while higher numbers force it to stick to the prompt.
CFG {2, 6}: if you're discovering, testing or researching with heavy AI influence.
CFG {7, 10}: if you have a solid prompt but you still want some creativity.
CFG {10, 15}: if your prompt is solid enough and you do not want the AI to disturb your idea.
CFG {16, 20}: not recommended; results become incoherent.
Step Count
Stable Diffusion creates an image by starting with a canvas full of noise and denoising it gradually to reach the final output; this parameter controls the number of these denoising steps. Usually higher is better, but only up to a point; for beginners it's recommended to stick with the default.
Seed
The seed is a number that controls the initial noise. The seed is the reason you get a different image each time you generate when all the other parameters are fixed. By default, on most implementations of Stable Diffusion, the seed automatically changes every time you generate an image. You can get the same result back if you keep the prompt, the seed and all other parameters the same.
⚠️ Seeding is important for your creations, so try to save a good seed and slightly tweak the prompt to get what you're looking for while keeping the same composition.
Sampler
Diffusion samplers are the methods used to denoise the image during generation; they take different durations and different numbers of steps to reach a usable image. This parameter affects the step count significantly; a refined one could reduce or increase the step count, giving more or less subjective detail.
CLIP Skip
First of all we need to know what CLIP is. CLIP, which stands for Contrastive Language-Image Pretraining, is a multi-modal model trained on 400 million (image, text) pairs. During the training process, a text encoder and an image encoder are jointly trained to predict which caption goes with which image, as shown in the diagram below.
Think of this as the size of a funnel that SD uses to comb the information obtained from its dataset: big numbers leave a lot of information to process, so the final image is less precise. Lower numbers narrow down the captions in the dataset, so you get more accurate results.
Clip Skip {1}: Strong coincidences and less liberty.
Clip Skip {2}: Nicer coincidences and little liberty.
Clip Skip {3-5}: Many coincidences and high liberty.
Clip Skip {6}: Unexpected results.
ENSD (Eta Noise Seed Delta)
It's like a slider for the seed parameter; you can get different image results for a fixed seed number. So... what is the optimal number? There isn't one. Just use your lucky number; you're pointing the seeding to this number. If you are using a random seed every time, ENSD is irrelevant.
So why do people commonly use 31337? Known as eleet or leetspeak, it is a system of modified spellings used primarily on the Internet. It's a cabalistic number; it's safe to use any other number.
References
Automatic1111
OpenArt Prompt Book
LAION
LAION-5B Paper
1337
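The Seed section of the article above says that the same seed with the same parameters gives the same image back. A minimal sketch of why, using Python's standard `random` module as a stand-in for the real latent noise generator (the function name `initial_noise` and the tiny four-value "canvas" are assumptions for illustration):

```python
import random


def initial_noise(seed, size=4):
    """Return a small, reproducible Gaussian 'noise canvas' for a seed."""
    rng = random.Random(seed)  # independent generator fixed by the seed
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(31337)
same_b = initial_noise(31337)  # identical starting noise
other = initial_noise(31338)   # a different seed gives different noise
```

Because the denoising steps are deterministic for most samplers once the starting noise is fixed, saving the seed is enough to return to the same composition and tweak only the prompt.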
Stable Diffusion [Weight Syntax]
Weight (individual CFG for keywords): a colon sets a weight on a keyword, changing its default value (1.00 = default = x).
( ) Round brackets increase a keyword's weight; for example, (red) means red:1.10
(keyword) means x(1 + 0.1); if x = 1 ⇒ 1.1 = 1.10
((keyword)) means x(1 + 0.1)²; if x = 1 ⇒ 1.1² = 1.21
(((keyword))) means x(1 + 0.1)³; if x = 1 ⇒ 1.1³ = 1.331
((((keyword)))) means x(1 + 0.1)⁴; if x = 1 ⇒ 1.1⁴ = 1.4641
+ Plus increases a keyword's weight; for example, red+ means red:1.10
keyword+ means x(1 + 0.1); if x = 1 ⇒ 1.1 = 1.10
keyword++ means x(1 + 0.1)²; if x = 1 ⇒ 1.1² = 1.21
keyword+++ means x(1 + 0.1)³; if x = 1 ⇒ 1.1³ = 1.331
keyword++++ means x(1 + 0.1)⁴; if x = 1 ⇒ 1.1⁴ = 1.4641
… etc
[ ] Square brackets decrease a keyword's weight; for example, [red] means red:0.90
[keyword] means x(1 - 0.1); if x = 1 ⇒ 0.9 = 0.90
[[keyword]] means x(1 - 0.1)²; if x = 1 ⇒ 0.9² = 0.81
[[[keyword]]] means x(1 - 0.1)³; if x = 1 ⇒ 0.9³ = 0.729
[[[[keyword]]]] means x(1 - 0.1)⁴; if x = 1 ⇒ 0.9⁴ = 0.6561
… etc
- Minus decreases a keyword's weight; for example, red- means red:0.90
keyword- means x(1 - 0.1); if x = 1 ⇒ 0.9 = 0.90
keyword-- means x(1 - 0.1)²; if x = 1 ⇒ 0.9² = 0.81
keyword--- means x(1 - 0.1)³; if x = 1 ⇒ 0.9³ = 0.729
keyword---- means x(1 - 0.1)⁴; if x = 1 ⇒ 0.9⁴ = 0.6561
… etc
In theory you can combine them, or even bypass the limit values (0.00 - 2.00), with the correct script or modification in your dashboard.
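The bracket and plus/minus rules above boil down to multiplying the default weight by 1.1 or 0.9 once per level. Here is a minimal sketch of that arithmetic; the function name `emphasis_weight` is hypothetical, it handles a single keyword token only, and real prompt parsers differ between front ends (some, for example, treat square brackets differently).

```python
def emphasis_weight(token):
    """Return (keyword, weight) for one emphasised keyword token."""
    keyword = token
    weight = 1.0
    # Peel matching bracket pairs from the outside in:
    # each ( ) multiplies by 1.1, each [ ] multiplies by 0.9.
    while len(keyword) >= 2 and keyword[0] + keyword[-1] in ("()", "[]"):
        weight *= 1.1 if keyword[0] == "(" else 0.9
        keyword = keyword[1:-1]
    # Trailing + and - signs apply the same factors.
    while keyword and keyword[-1] in "+-":
        weight *= 1.1 if keyword[-1] == "+" else 0.9
        keyword = keyword[:-1]
    # Round away floating-point dust such as 1.2100000000000002.
    return keyword, round(weight, 4)


double_round = emphasis_weight("((red))")
triple_square = emphasis_weight("[[[red]]]")
four_plus = emphasis_weight("red++++")
```

The results line up with the table: two round brackets give 1.21, three square brackets give 0.729, and four plus signs give 1.4641.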