
# 🧠 Kiko ComfyUI WAN 2.1 Native Workflow

A ComfyUI image-to-video (I2V) pipeline built around WAN 2.1, using native ComfyUI nodes and Torch compilation (`torch.compile`) for performance gains. The design includes two-pass generation, frame interpolation, upscaling, and slow motion, tailored for high-fidelity AI-enhanced video generation.

---

## 📦 Workflow Overview

---

## 🛠️ Project Breakdown

### 🔧 Project Settings

- Project File Path Generator: Saves outputs under a defined base path.

- ✅ User Action: Update `root_path` to your preferred local save location.

---

### 🧮 Aspect Ratio Logic (Don't Touch)

- Calculates `width` and `height` from the input image size using a float-to-int conversion that preserves the aspect ratio (sketched below).

- ⚠️ Do not modify unless you understand aspect ratio propagation.
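
The math boils down to a scale-and-round; a minimal sketch (function and parameter names are illustrative, not the actual node inputs):

```python
# Illustrative only: names do not correspond to actual node inputs.
def fit_to_width(src_w: int, src_h: int, target_w: int = 480, multiple: int = 16):
    """Scale height to match a target width, snapped to a multiple the model accepts."""
    scale = target_w / src_w                             # float scale factor
    height = round(src_h * scale / multiple) * multiple  # float -> int, multiple of 16
    return target_w, height

print(fit_to_width(960, 1664))  # (480, 832)
```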

---

### 📸 Image Generation for Video (Optimized Resolution)

- When creating video frames using image generation tools like FLUX / SDXL, it's important to generate at the right resolution to maintain sharpness and consistency.

### 🎯 Target Video Resolution

- Target Size: `480x832`

- Aspect Ratio: `480 ÷ 832 ≈ 0.577`

### ✅ Ideal Generation Resolution

To preserve details and allow for high-quality downscaling, generate at 2x or higher resolution. A perfect match in aspect ratio ensures you avoid cropping or distortion.

| Gen Resolution | Aspect Ratio | Notes |
|----------------|--------------|-------|
| `960x1664` | `960 ÷ 1664 ≈ 0.577` | ✅ Perfect aspect ratio match |
| `1024x1536` | `1024 ÷ 1536 ≈ 0.667` | 🔶 Slight crop or padding needed |

### 🔄 Workflow

1. Generate high-res images: use `960x1664` or larger at the same aspect ratio, with FLUX, SDXL, etc.

2. Downscale to the `480x832` target before the frames enter the I2V pipeline.

### 🧮 Why This Works

- High-res generation reduces artifacts and increases fidelity.

- Downscaling averages pixels, smoothing jagged edges and noise.

- Maintaining the same aspect ratio avoids warping or unnecessary padding (see the downscale sketch below).
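
A hedged illustration of the downscale step; the file names are hypothetical, and any averaging resampler (Lanczos here, via Pillow) works:

```python
from PIL import Image

img = Image.open("gen_960x1664.png")   # hypothetical 2x generation output
# Identical aspect ratio (960/1664 == 480/832), so no crop or padding needed.
assert abs(img.width / img.height - 480 / 832) < 1e-3
img.resize((480, 832), Image.LANCZOS).save("frame_480x832.png")
```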

---

### 📥 Loaders

- Load Checkpoint (WAN 2.1): Loads the WAN 2.1 model weights (via `UNETLoader` in this workflow).

- VAE & CLIP Loader: Loads the required VAE and CLIP text encoders.

- Power LoRA Loader (optional): Loads one or more LoRAs via rgthree's Power Lora Loader.

- TeaCache, Enhance-A-Video, and CLIP Vision: Load auxiliary models.

- ✅ User Action:
  - Set `ckpt_name`, `vae_name`, and `clip_name` according to your local model files.
  - Ensure the files are in your configured ComfyUI model folders.

---

### 🖼️ Image / Resize

- Load Image / Resize: Loads the input image (or the first frame of a video clip) and resizes it to model-appropriate dimensions.

---

### 🌍 Global Settings

- CLIP Text Encode (Prompt & Negative): Prompts for conditioning the model.

- ✅ User Action: Customize these prompts per your subject/style.

- Seed Generator / Upscale Factor: Controls random seed and image scale-up.

- ✅ User Action: Set `seed` for reproducibility, or leave it at `-1` for a random seed (see the sketch below).
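
A minimal sketch of the `-1` = random convention (the Seed Generator node handles this internally; the snippet only illustrates the idea):

```python
import random

seed = -1                               # -1 requests a random seed
if seed == -1:
    seed = random.randint(0, 2**32 - 1)
print(f"effective seed: {seed}")        # record this value to reproduce the run
```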

---

### 🔁 1st Pass (Initial Generation)

- KSampler: Runs the initial sampling pass.

- VAE Decode & Video Combine: Decodes the latents to frames and assembles them into a video.

- Slow Motion / PlaySound: Optional slow-motion output and a completion-notification sound.

- A pop-up window (Image Filter) lets you select the last frame to use as the 2nd-pass start frame; conceptually this is just batch slicing, as the sketch below shows.
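
A sketch of that selection, assuming ComfyUI's `[N, H, W, C]` IMAGE tensor layout (the frame count is illustrative):

```python
import torch

frames = torch.rand(81, 832, 480, 3)  # 1st-pass output, [N, H, W, C]
last = frames[-1:]                    # keep batch dim -> shape [1, 832, 480, 3]
```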

---

### 🔁 2nd Pass (Refine & Extend)

- Similar to the 1st pass, but tuned for longer inference or higher quality.

- Takes the last frame of the 1st pass as the 2nd-pass starting image.

- Get Image Range From Batch: Extracts a range of frames from a batch.

- Image Batch Multi: Merges multiple images into a single batch (both operations are sketched below).
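
A hedged sketch of what these two nodes amount to at the tensor level (frame counts and the slice range are illustrative):

```python
import torch

frames = torch.rand(81, 832, 480, 3)       # 2nd-pass frames, [N, H, W, C]
middle = frames[8:72]                      # GetImageRangeFromBatch-style slice
merged = torch.cat([frames[-1:], middle])  # ImageBatchMulti-style concatenation
print(merged.shape)                        # torch.Size([65, 832, 480, 3])
```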

---

### 📈 Upscaling & Frame Interpolation

- Image Sharpen / Restore Faces: Post-processing enhancements.

- Upscale Image (SUPIR in this workflow; Real-ESRGAN or similar also works).

- Frame Interpolation (RIFE / FILM): Smooth transitions for higher FPS.

- Slow Motion: Optional; adds and blends frames for cinematic slow-mo (interpolation math sketched below).
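
Back-of-the-envelope math for how the frame multiplier trades fps against duration (all numbers illustrative; the exact output count depends on the VFI node's settings):

```python
frames_in, fps = 81, 16
multiplier = 2                       # RIFE VFI frame multiplier

frames_out = frames_in * multiplier  # approximate interpolated frame count
smooth = frames_out / (fps * multiplier)  # same duration, doubled fps
slowmo = frames_out / fps                 # original fps -> 2x slow motion
print(f"{smooth:.2f}s smooth vs {slowmo:.2f}s slow-mo")  # ~5.06s vs ~10.12s
```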

---

### 🧪 Experimental (Optional, Long Runtime)

- Advanced enhancement or second-stage denoising/refinement.

- Useful for batch rendering with very high quality needs.

- ⏱️ Warning: These steps significantly increase processing time.

---

## ⚡ Torch Compile Setup (VERY IMPORTANT)

To unlock native acceleration via `torch.compile`, ensure you meet these requirements:

### ✅ Requirements

- PyTorch 2.1+ with CUDA

- NVIDIA GPU with Ampere or later architecture (RTX 30XX, 40XX)

- Use the latest (or nightly) ComfyUI, or apply `torch.compile()` patching manually; a sketch of the underlying call follows.
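
A minimal sketch of the underlying call, assuming PyTorch 2.1+ and a CUDA device; in ComfyUI this patching is normally applied by a node rather than by hand, and the toy model below only stands in for the real diffusion model:

```python
import torch

model = torch.nn.Sequential(        # stand-in for the diffusion model
    torch.nn.Linear(64, 64),
    torch.nn.GELU(),
    torch.nn.Linear(64, 64),
).cuda()

compiled = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 64, device="cuda")
with torch.no_grad():
    compiled(x)  # first call is slow: graph tracing + kernel compilation
    compiled(x)  # subsequent calls reuse the compiled graph
```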

---

## 💾 Saving Outputs

- Controlled via Project Path Generator and Video Combine nodes.

- Output format (e.g. `.mp4`, `.png`, `.webm`) should be explicitly set in `Video Combine`.

---

## 📋 Notes

- ⚠️ The first run after `torch.compile` will be slow due to graph tracing.

- 🧠 Prompt tuning is crucial for WAN 2.1 — try detailed descriptions.

- ⚠️ Not optimized for older machines.

---

## 🙋 FAQ

Q: My output is laggy or missing frames.

- Check interpolation settings and slow motion settings — disable one if not needed.

Q: Workflow crashes during torch compile.

- Ensure you're using PyTorch 2.1+, and your GPU is Ampere or newer.

Q: Can I use this with other models like SDXL?

- You can, but this workflow is tuned for WAN 2.1, so results with other models may vary.

---

## 📎 Credits

- Workflow design by Kiko9

- WAN 2.1

- ComfyUI team for the powerful modular engine

---

## 📂 Folder Structure Example

```
ComfyUI/
├── models/
│   ├── checkpoints/
│   ├── vae/
│   ├── clip/
├── output/
│   └── generated/
├── custom_nodes/
```

---

### 📊 End-to-End WAN 2.1 Generation Summary

| Step | Description | Time / Count | Resolution |
|------|-------------|--------------|------------|
| Prompt Start | Initial prompt execution begins | 92.95 sec | — |
| Model Load | Loaded WAN 2.1 model weights | ~15,952 ms | — |
| First Comfy-VFI Pass | Generated frames with TeaCache initialized | ~6 min 13 sec | 480x832 |
| Frames Generated (1st pass) | Comfy-VFI output | 231 frames | 480x832 |
| Second Comfy-VFI Pass | Repeats generation with same steps | ~6 min 28 sec | 480x832 |
| Frames Generated (2nd pass) | Comfy-VFI output | (Implied) | 480x832 |
| WanVAE Load (1st) | Loaded latent space model | ~1220 ms | — |
| WanVAE Load (2nd) | Loaded again for reuse | ~1304 ms | — |
| Face Restoration (GFPGAN) | GFPGANv1.4 restored images | 152 frames | 512x512 |
| Comfy-VFI Run (3rd) | Generated additional frames | unknown | 960x1664 |
| Frames Generated (3rd pass) | Comfy-VFI output | 456 frames | 960x1664 |
| Comfy-VFI Run (4th) | Final batch of generation | unknown | 960x1664 |
| Frames Generated (4th pass) | Comfy-VFI output | 304 frames | 960x1664 |
| Prompt End | Final step of pipeline | 1050.60 sec | — |

> ℹ️ Notes:
> - TeaCache skipped 12 conditional + 12 unconditional steps per 30-step run (roughly a 20% optimization).
> - Face restoration was applied to a subset of frames (152).
> - The 960x1664 resolution used in the last two passes matches the 480x832 aspect ratio exactly, ideal for downscaling or 2x video output.

## 🗨️ Feedback & Contributions

Feel free to submit issues if you encounter bugs or want to contribute improvements.

---

🔥 Happy rendering!

## 🧩 Nodes Detail

157 nodes in total (0 primitive, 157 custom):

- SetNode × 34
- GetNode × 63
- CLIPTextEncode × 2
- ProjectFilePathNode × 1
- Anything Everywhere? × 1
- Power Lora Loader (rgthree) × 1
- VHS_VideoCombine × 9
- PlaySound|pysssss × 6
- RIFE VFI × 3
- ImageListToImageBatch × 1
- KSampler × 2
- VAEDecode × 2
- ImageFromBatch+ × 1
- Seed Generator × 1
- Int × 1
- ImageBatchMulti × 1
- WanImageToVideo × 2
- GetImageRangeFromBatch × 3
- SimpleMath+ × 2
- Display Any (rgthree) × 2
- ImageSharpen × 1
- ReActorRestoreFace × 1
- SUPIR_Upscale × 1
- LoadImage × 1
- ImageResizeKJ × 1
- PreviewImage × 2
- SkipLayerGuidanceWanVideo × 1
- WanVideoTeaCacheKJ × 1
- CLIPLoader × 1
- WanVideoEnhanceAVideoKJ × 1
- UNETLoader × 1
- Image Filter × 1
- GetImageSize+ × 1
- VAELoader × 1
- CLIPVisionEncode × 1
- CLIPVisionLoader × 1
- ImageScale × 1
- Fast Groups Bypasser (rgthree) × 1