workflow/SD3.5



SD3.5 is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with Adversarial Diffusion Distillation (ADD). It features improved image quality, typography, complex prompt understanding, and resource efficiency, with a focus on fewer inference steps.

Please note: This model is released under the Community License. Visit Stability AI to learn more, or contact them for commercial licensing details.

Model Description

  • Developed by: Stability AI

  • Model type: MMDiT text-to-image generative model

  • Model Description: This model generates images based on text prompts. It is an ADD-distilled Multimodal Diffusion Transformer that uses three fixed, pretrained text encoders and QK-normalization.

License

  • Community License: Free for research, non-commercial, and commercial use for organizations or individuals with less than $1M in total annual revenue. More details can be found in the license agreement.

  • For individuals and organizations with annual revenue above $1M: Please contact Stability AI to get an Enterprise License.

Model Sources

For local or self-hosted use, we recommend ComfyUI for node-based UI inference, or Diffusers or the GitHub reference code for programmatic use.

  • ComfyUI:

  • Hugging Face Space:

  • Diffusers:

  • GitHub:

  • API Endpoints:
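
For programmatic use, a minimal Diffusers sketch might look like the following. It is an illustration under stated assumptions, not the official recipe: the repository id in `MODEL_ID` is not given in the text above and is only a guess at a matching checkpoint, `guidance_scale=0.0` reflects how distilled few-step models are commonly run, and the helper names `build_generation_kwargs` and `generate` are hypothetical. Only the 4-step setting comes from this page.

```python
"""Hedged sketch: few-step text-to-image sampling with Diffusers."""

# ASSUMPTION: this repo id does not appear in the text above; substitute
# whichever SD3.5 checkpoint you are actually licensed to use.
MODEL_ID = "stabilityai/stable-diffusion-3.5-large-turbo"


def build_generation_kwargs(prompt: str, steps: int = 4) -> dict:
    """Collect sampling arguments for an ADD-distilled (few-step) checkpoint."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,  # 4 steps, per the ADD note on this page
        "guidance_scale": 0.0,  # ASSUMPTION: distilled models often skip CFG
    }


def generate(prompt: str, out_path: str = "sd35_sample.png") -> str:
    """Load the pipeline and save one image. Needs torch, diffusers, and a GPU."""
    # Heavy imports are kept local so the helper above works without them.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    )
    pipe = pipe.to("cuda")
    image = pipe(**build_generation_kwargs(prompt)).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("a lighthouse at dusk")` would download the weights on first use and write a PNG to the given path. Keeping the sampling arguments in a separate function makes it easy to inspect or override them without loading the model.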

Implementation Details

  • QK Normalization: Implements the QK normalization technique to improve training stability.

  • Adversarial Diffusion Distillation (ADD): Allows sampling in 4 steps at high image quality (see the ADD technical report).

  • Text Encoders:

    • CLIPs: two CLIP encoders, context length 77 tokens

    • T5: context length 77/256 tokens at different stages of training

  • Training Data and Strategy:

    This model was trained on a wide variety of data, including synthetic data and filtered publicly available data.

For more technical details of the original MMDiT architecture, please refer to the research paper.
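
To make the QK normalization item more concrete, here is a small pure-Python sketch of single-head attention with queries and keys normalized before the dot product. It assumes an RMS-style normalization with the learnable gain omitted; the text above does not say which normalization the model uses, so treat that choice, and all function names, as illustrative.

```python
import math


def rms_norm(vec, eps=1e-6):
    """Scale a vector to unit root-mean-square (learnable gain omitted)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_logits(queries, keys, qk_norm=True):
    """Return scaled dot-product logits, optionally normalizing q and k first."""
    if qk_norm:
        # ASSUMPTION: RMS normalization stands in for the model's actual norm.
        queries = [rms_norm(q) for q in queries]
        keys = [rms_norm(k) for k in keys]
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        for q in queries
    ]


def attention(queries, keys, values, qk_norm=True):
    """Single-head attention over lists of vectors."""
    out = []
    for row in attention_logits(queries, keys, qk_norm):
        weights = softmax(row)
        out.append(
            [sum(w * v[i] for w, v in zip(weights, values))
             for i in range(len(values[0]))]
        )
    return out
```

With this normalization, each logit is bounded in magnitude by the square root of the head dimension regardless of how large the raw activations grow, which is one intuition for why the technique helps keep training stable.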

Nodes Detail

14 Nodes
Primitive Node Types (7 nodes)
  • CLIPTextEncode: 2
  • ConditioningCombine: 1
  • KSampler: 1
  • PreviewImage: 1
  • VAEDecode: 1
  • CheckpointLoaderSimple: 1
Custom Node Types (7 nodes)
  • ConditioningSetTimestepRange: 2
  • ConditioningZeroOut: 1
  • EmptySD3LatentImage: 1
  • Note: 1
  • TripleCLIPLoader: 1
  • ModelSamplingSD3: 1
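
The node list above can be read as a graph. The sketch below writes one plausible wiring in the dictionary layout that ComfyUI uses for API-format prompts, plus two small helpers to sanity-check it. The wiring, file names, prompt text, and every numeric value except the 4 sampling steps are assumptions made for illustration; this page lists only the node types and counts. The Note node is left out on the assumption that it is a canvas annotation rather than an executed node, so the graph has 13 entries instead of 14.

```python
from collections import Counter

# ASSUMPTION: file names are placeholders; wiring and numbers are one
# plausible layout, not the workflow author's confirmed settings.
PROMPT_GRAPH = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd3.5_checkpoint.safetensors"}},
    "2": {"class_type": "TripleCLIPLoader",
          "inputs": {"clip_name1": "text_encoder_1.safetensors",
                     "clip_name2": "text_encoder_2.safetensors",
                     "clip_name3": "text_encoder_3.safetensors"}},
    "3": {"class_type": "ModelSamplingSD3",
          "inputs": {"model": ["1", 0], "shift": 3.0}},
    "4": {"class_type": "EmptySD3LatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"clip": ["2", 0], "text": "a lighthouse at dusk"}},
    "6": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"clip": ["2", 0], "text": ""}},
    "7": {"class_type": "ConditioningZeroOut",
          "inputs": {"conditioning": ["6", 0]}},
    "8": {"class_type": "ConditioningSetTimestepRange",  # negative, early steps
          "inputs": {"conditioning": ["6", 0], "start": 0.0, "end": 0.1}},
    "9": {"class_type": "ConditioningSetTimestepRange",  # zeroed, later steps
          "inputs": {"conditioning": ["7", 0], "start": 0.1, "end": 1.0}},
    "10": {"class_type": "ConditioningCombine",
           "inputs": {"conditioning_1": ["8", 0],
                      "conditioning_2": ["9", 0]}},
    "11": {"class_type": "KSampler",
           "inputs": {"model": ["3", 0], "positive": ["5", 0],
                      "negative": ["10", 0], "latent_image": ["4", 0],
                      "seed": 0, "steps": 4, "cfg": 1.0,
                      "sampler_name": "euler", "scheduler": "sgm_uniform",
                      "denoise": 1.0}},
    "12": {"class_type": "VAEDecode",
           "inputs": {"samples": ["11", 0], "vae": ["1", 2]}},
    "13": {"class_type": "PreviewImage", "inputs": {"images": ["12", 0]}},
}


def node_counts(graph):
    """Count how many nodes of each class type the graph contains."""
    return Counter(node["class_type"] for node in graph.values())


def dangling_links(graph):
    """List (node_id, input_name) pairs whose link targets a missing node."""
    bad = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            # Links are [source_node_id, output_index]; scalars are literals.
            if isinstance(value, list) and value[0] not in graph:
                bad.append((node_id, name))
    return bad
```

Running `node_counts(PROMPT_GRAPH)` gives per-type totals that can be compared against the list above, and `dangling_links(PROMPT_GRAPH)` should return an empty list when every link points at a node that exists.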