# 🧠 Kiko ComfyUI WAN 2.1 Native Workflow
An image-to-video (I2V) pipeline built around WAN 2.1, using native ComfyUI nodes and `torch.compile` for performance gains. The design includes 2-pass generation, frame interpolation, upscaling, and slow motion, tailored for high-fidelity AI-enhanced video generation.
---
## 📦 Workflow Overview
---
## 🛠️ Project Breakdown
### 🔧 Project Settings
- Project File Path Generator: Allows saving outputs with a defined base path. Set this to your local output folder.
- ✅ User Action: Update `root_path` to your preferred save location.
---
### 🧮 Aspect Ratio Logic (Don't Touch)
- Calculates `width` and `height` from the input image size, using float-to-int conversion so the aspect ratio is preserved.
- ⚠️ Do not modify unless you understand aspect ratio propagation.
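The node's logic can be sketched roughly as follows (a hypothetical helper, not the actual node code; the `target_long_side` and `multiple` values are illustrative assumptions):

```python
def fit_to_aspect(src_w, src_h, target_long_side=832, multiple=16):
    """Scale input dimensions toward a target size while preserving
    aspect ratio, snapping to a multiple the model accepts."""
    scale = target_long_side / max(src_w, src_h)
    # float-to-int conversion: scale, round, then snap to the nearest multiple
    w = int(round(src_w * scale / multiple)) * multiple
    h = int(round(src_h * scale / multiple)) * multiple
    return w, h
```

For example, a `960x1664` source maps cleanly to `480x832`, which is why modifying this node can silently break the downstream resolution chain.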
---
### 📸 Image Generation for Video (Optimized Resolution)
- When creating video frames using image generation tools like FLUX / SDXL, it's important to generate at the right resolution to maintain sharpness and consistency.
### 🎯 Target Video Resolution
- Target Size: `480x832`
- Aspect Ratio: `480 ÷ 832 ≈ 0.577`
### ✅ Ideal Generation Resolution
To preserve details and allow for high-quality downscaling, generate at 2x or higher resolution. A perfect match in aspect ratio ensures you avoid cropping or distortion.
| Gen Resolution | Aspect Ratio | Notes |
|----------------|--------------|---------------------------|
| `960x1664` | `960 ÷ 1664 ≈ 0.577` | ✅ Perfect aspect ratio match |
| `1024x1536` | `1024 ÷ 1536 ≈ 0.6667` | 🔶 Slight crop or padding needed |
### 🔄 Workflow
1. Generate High-Res Images
   Use `960x1664` (or larger at the same aspect ratio) with FLUX, SDXL, or a similar generator.
### 🧮 Why This Works
- High-res generation reduces artifacts and increases fidelity.
- Downscaling averages pixels, smoothing jagged edges and noise.
- Maintaining the same aspect ratio avoids warping or unnecessary padding.
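The aspect-ratio check from the table above can be expressed as a small sketch (illustrative only, not part of the workflow):

```python
def aspect_match(gen_w, gen_h, target_w=480, target_h=832, tol=1e-3):
    """True if a generation resolution matches the target aspect ratio
    closely enough to downscale without cropping or padding."""
    return abs(gen_w / gen_h - target_w / target_h) < tol
```

`960x1664` passes (both ratios are ≈ 0.577), while `1024x1536` (≈ 0.667) fails and would need a slight crop or padding.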
---
### 📥 Loaders
- Load Checkpoint (WAN2.1): Load the WAN 2.1 native (ComfyUI) model checkpoint.
- VAE & CLIP Loader: Loads required VAE and CLIP encoders.
- Power LoRA Loader (optional): Loads one or more LoRA weights via the Power LoRA Loader node.
- Tile Cache, Enhance, and CLIP Vision: Load auxiliary models.
- ✅ User Action:
- Set `ckpt_name`, `vae_name`, and `clip_name` according to local model files.
- Ensure files are in your configured ComfyUI model folders.
---
### 🖼️ Image / Resize
- Load Image / Resize: Loads the input image (or the first frame of a video clip) and resizes it to model-appropriate dimensions.
---
### 🌍 Global Settings
- CLIP Text Encode (Prompt & Negative): Prompts for conditioning the model.
- ✅ User Action: Customize these prompts per your subject/style.
- Seed Generator / Upscale Factor: Controls random seed and image scale-up.
  - ✅ User Action: Set `seed` for reproducibility, or leave it at `-1` for a random seed each run.
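The seed convention can be sketched like this (a hypothetical helper showing the `-1` behavior, not the node's actual code):

```python
import random

def resolve_seed(seed: int) -> int:
    """-1 means pick a fresh random seed each run; any other value
    is used as-is so results are reproducible."""
    if seed == -1:
        return random.randint(0, 2**32 - 1)
    return seed
```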
---
### 🔁 1st Pass (Initial Generation)
- KSampler: Runs the initial inference.
- VAE Decode & Video Combine: Decodes latent space to image, combines with source.
- Slow Motion / PlaySound: Optional audio sync and slow-mo settings.
- Select the last frame as the 2nd pass start frame (a pop-up window prompts for this).
---
### 🔁 2nd Pass (Refine & Extend)
- Similar to the 1st Pass, but tuned for longer inference or higher quality.
- Takes the last frame from the 1st pass as the 2nd pass starting image.
- Get Mask Range From Clip: Extracts mask regions for attention.
- Image Batch Multi: Processes multiple frames simultaneously.
---
### 📈 Upscaling & Frame Interpolation
- Image Sharpen / Restore Faces: Post-processing enhancements.
- Upscale Image (Real-ESRGAN or similar).
- Frame Interpolation (RIFE / FILM): Smooth transitions for higher FPS.
- Slow Motion: Optional, adds frames and blends for cinematic slow-mo.
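The slow-motion idea can be illustrated with a minimal NumPy sketch (simple linear blending between adjacent frames; the actual RIFE/FILM nodes use learned motion interpolation, which produces far better in-betweens):

```python
import numpy as np

def slow_motion_blend(frames, factor=2):
    """Insert (factor - 1) linearly blended frames between each
    adjacent pair, stretching the clip for a slow-motion effect."""
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for i in range(1, factor):
            t = i / factor
            out.append(((1 - t) * a + t * b).astype(a.dtype))
    out.append(frames[-1])
    return out
```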
---
### 🧪 Experimental (Optional, Long Runtime)
- Advanced enhancement or second-stage denoising/refinement.
- Useful for batch rendering with very high quality needs.
- ⏱️ Warning: These steps significantly increase processing time.
---
## ⚡ Torch Compile Setup (VERY IMPORTANT)
To unlock native acceleration via `torch.compile`, ensure you meet these requirements:
### ✅ Requirements
- PyTorch 2.1+ with CUDA
- NVIDIA GPU with Ampere or later architecture (RTX 30XX, 40XX)
- Use latest nightly ComfyUI or manually apply `torch.compile()` patching.
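A minimal sketch of what the patching amounts to (a toy module stands in for the WAN 2.1 model; `backend="eager"` is used here only so the sketch runs anywhere, while real use keeps the default inductor backend on CUDA):

```python
import torch

class TinyBlock(torch.nn.Module):
    """Stand-in for the diffusion model's forward pass."""
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(8, 8)

    def forward(self, x):
        return torch.relu(self.proj(x))

model = TinyBlock()
# In ComfyUI this wraps the WAN 2.1 model; compilation is traced lazily
# on the first call, which is why the first run is slow.
compiled = torch.compile(model, backend="eager")
out = compiled(torch.randn(1, 8))
```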
---
## 💾 Saving Outputs
- Controlled via Project Path Generator and Video Combine nodes.
- Output format (e.g. `.mp4`, `.png`, `.webm`) should be explicitly set in `Video Combine`.
---
## 📋 Notes
- ⚠️ The first run of `torch.compile` will be slow due to graph tracing.
- 🧠 Prompt tuning is crucial for WAN 2.1 — try detailed descriptions.
- ⚠️ Not optimized for older machines.
---
## 🙋 FAQ
Q: My output is laggy or missing frames.
- Check interpolation settings and slow motion settings — disable one if not needed.
Q: Workflow crashes during torch compile.
- Ensure you're using PyTorch 2.1+, and your GPU is Ampere or newer.
Q: Can I use this with other models like SDXL?
- You can, but WAN 2.1 is optimized for this specific setup. Results may vary.
---
## 📎 Credits
- Workflow design by Kiko9
- WAN 2.1
- ComfyUI team for the powerful modular engine
---
## 📂 Folder Structure Example
```
ComfyUI/
├── models/
│   ├── checkpoints/
│   ├── vae/
│   └── clip/
├── output/
│   └── generated/
└── custom_nodes/
```
---
### 📊 End-to-End WAN 2.1 Generation Summary
| Step | Description | Time / Count | Resolution |
|-------------------------------|----------------------------------------------------|-------------------------|------------------------|
| Prompt Start | Initial prompt execution begins | 92.95 sec | — |
| Model Load | Loaded WAN21 model weights | ~15,952 ms | — |
| First Comfy-VFI Pass | Generated frames with TeaCache initialized | ~6 min 13 sec | 480x832 |
| Frames Generated (1st pass) | Comfy-VFI output | 231 frames | 480x832 |
| Second Comfy-VFI Pass | Repeats generation with same steps | ~6 min 28 sec | 480x832 |
| Frames Generated (2nd pass) | Comfy-VFI output | (Implied) | 480x832 |
| WanVAE Load (1st) | Loaded latent space model | ~1220 ms | — |
| WanVAE Load (2nd) | Loaded again for reuse | ~1304 ms | — |
| Face Restoration (GFPGAN) | GFPGANv1.4 restored images | 152 frames | 512x512 |
| Comfy-VFI Run (3rd) | Generated additional frames | ~unknown | 960x1664 |
| Frames Generated (3rd pass) | Comfy-VFI output | 456 frames | 960x1664 |
| Comfy-VFI Run (4th) | Final batch of generation | ~unknown | 960x1664 |
| Frames Generated (4th pass) | Comfy-VFI output | 304 frames | 960x1664 |
| Prompt End | Final step of pipeline | 1050.60 sec | — |
> ℹ️ Notes:
> - TeaCache skipped 12 conditional + 12 unconditional steps out of 30, roughly a 20% runtime saving.
> - Face restoration step was applied to a subset (152 frames).
> - The 960x1664 resolution used in the last two passes matches the 480x832 aspect ratio perfectly, ideal for downscaling or 2x video output.
## 🗨️ Feedback & Contributions
Feel free to submit issues if you encounter bugs or want to contribute improvements.
---
🔥 Happy rendering!