Model = FLUX.1 Schnell
Flux Fusion V2 [3-4 steps] [AIO & UNET] [ALL GGUF • NF4 • FP8/FP8]
Improved over V1, producing sharper, higher-quality images in as few as 4 steps. This version merges Schnell + finetuned Dev + Hyper using the same variable-block-ratio formula as V1, now refined.
AIO (All-in-One) versions include UNET + VAE + CLIP L + T5XXL (fp8); also known as the Checkpoint or Compact version.
The simple or beta scheduler is recommended at low step counts.
DualCLIPLoader:
t5xxl_fp16.safetensors
clip_l_sdxl_base.safetensors
--- ---
CLIP has a maximum of 77 tokens.
T5XXL can handle up to 512 tokens (or more, depending on the specific configuration), but for performance reasons a practical limit of 200 to 300 tokens is commonly observed.
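A minimal sketch of what these caps mean in practice (pure-Python illustration; `truncate` and the token IDs are hypothetical, not a real tokenizer):

```python
# Hypothetical illustration of the two encoders' token limits.
CLIP_MAX = 77       # hard cap for CLIP, including start/end tokens
T5_PRACTICAL = 256  # common practical cap; T5XXL accepts up to 512

def truncate(token_ids, max_len):
    # Tokens beyond max_len are dropped before encoding,
    # so anything past the cap never influences the image.
    return token_ids[:max_len]

prompt_tokens = list(range(300))  # pretend a tokenized 300-token prompt
clip_in = truncate(prompt_tokens, CLIP_MAX)      # 77 tokens survive
t5_in = truncate(prompt_tokens, T5_PRACTICAL)    # 256 tokens survive
```

This is why very long prompts mostly benefit the T5XXL branch: the CLIP branch stops reading at token 77.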
---
Text Encoding: Uses the CLIP model (Contrastive Language-Image Pretraining) to encode the text input into `clip_l`, capturing key features and semantic information from the text.
Enhanced Text Understanding: Employs the large language model T5XXL to process the `t5xxl` input, potentially expanding or refining textual descriptions to provide richer semantic information.
Multimodal Fusion: Combines the processing results from CLIP and T5XXL. While `clip_l` is intended for tag-style keywords and `t5xxl` for natural-language prompts, in practice users can feed the same prompt to both and experiment with the results.
GUIDANCE / CFG SCALE: Adjusts how strongly the textual prompt steers image generation, letting users balance creative freedom against strict prompt adherence. Higher values increase image-prompt alignment but may reduce variety.
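The scale knob can be sketched with the standard classifier-free-guidance combination (a generic sketch; FLUX Schnell/Dev are guidance-distilled, so the real model folds this into a single pass):

```python
def cfg_combine(uncond, cond, scale):
    # Classifier-free guidance: move the unconditional prediction
    # toward the conditional one; scale > 1 strengthens prompt adherence.
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# scale = 1.0 reproduces the conditional prediction exactly;
# larger scales overshoot past it, away from the unconditional one.
print(cfg_combine([0.0, 2.0], [1.0, 2.0], 1.0))  # [1.0, 2.0]
print(cfg_combine([0.0, 2.0], [1.0, 2.0], 4.0))  # [4.0, 2.0]
```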
--- ---
Default VAE (Variational Autoencoder) = ae.sft
--- ---
Max Shift and Base Shift: set to "after LoRA".
These parameters give users finer control over the image rendering process.
The presenter notes that they can be applied before or after the LoRA model, which influences how the model generates images.
--- ---
stop_at_clip_layer = -2 is equivalent to clip skip = 2.
ComfyUI's node just expresses it as a negative index.
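The equivalence can be shown with plain list indexing (a sketch; the 12-layer count is only an example):

```python
# Pretend hidden states from a 12-layer CLIP text encoder.
hidden_states = [f"layer_{i}" for i in range(1, 13)]

def a1111_clip_skip(states, clip_skip):
    # A1111-style: clip_skip counts back from the output (1 = last layer).
    return states[len(states) - clip_skip]

def comfy_stop_at(states, index):
    # ComfyUI-style: a negative index straight into the layer list.
    return states[index]

# clip skip = 2 and stop_at_clip_layer = -2 select the same layer.
assert a1111_clip_skip(hidden_states, 2) == comfy_stop_at(hidden_states, -2)
```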
--- --- --- ---
LORA Strength_model and strength_clip :
What is the difference between strength_model and strength_clip in the “Load LoRA” node?
These two values separately control how strongly the LoRA is applied to the CLIP model and to the main MODEL (UNET). Most UIs expose a single strength number; setting it to 0.8, for example, is the same as setting both strength_model and strength_clip to 0.8.
ComfyUI lets you tune them independently because the CLIP and MODEL/UNET parts of a LoRA have most likely learned different concepts, so tweaking them separately can give you better images.
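A minimal sketch of why the two strengths act independently (toy scalar weights; a real LoRA applies low-rank matrix deltas per layer):

```python
def apply_lora(base_weight, lora_delta, strength):
    # W' = W + strength * dW — strength scales the learned offset.
    return base_weight + strength * lora_delta

# Hypothetical numbers: the two halves of the LoRA get different strengths,
# so the UNET shifts a lot while the CLIP encoder shifts only slightly.
unet_w = apply_lora(1.00, 0.50, 0.8)  # strength_model = 0.8 -> 1.40
clip_w = apply_lora(1.00, 0.20, 0.3)  # strength_clip  = 0.3 -> 1.06
```

Setting both strengths to the same value recovers the single-slider behavior of other UIs.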