First Experience of Video LoRA Creation / Wan 2.2


A simple tutorial on Wan 2.2 Video LoRA creation

First of all, let me just say this: I was blown away by how good the Wan 2.2 video model is. 🤯

Honestly, I haven’t made that many videos before, but even I could tell right away: this model is awesome.

Making a LoRA for Wan 2.2? Much easier than I expected. Generating videos with it? Even more fun. It feels almost too easy at times. So—welcome to Wan 2.2! In this post, I’ll share my first attempts at creating LoRAs and generating videos, mixing in my experiences with Momo, Yor Forger, and Belleza.

Don’t expect a full textbook here. This is more of a friendly starter guide. Think of it as me handing over the starter settings that worked for me. And yes—I’m still a newbie myself. (So please, double check things before using them seriously 😅)

Goal & Setup

My very first attempt was with Momo from the manga Dandadan. (If you’re curious, I also posted that LoRA separately!)

Later I tried Yor Forger, and then Belleza. Each one taught me something new.

Base model: Wan2.2 low-noise variant (like t2v-14B-low-noise)

Output: short 3–8s anime-style clips (parks, cafés, jogging, lightsaber fight… you name it!)

Data Prep — where the pain begins 🥲

Honestly, preparing data takes way more effort than the actual training. Video LoRA is easier than still-image LoRA in many ways, but you do need good source clips.

What I did:

Around 50 images from 4 video clips (each clip ~5s long, recorded in 1080p 30fps). BTW, you can upload short clips directly (≤5s), but I didn’t. I preferred to control each frame manually.

Extracted frames: roughly 1 frame every 8–10 source frames → ~3–4 images per second. There are several ways to do this, but the simplest is running ffmpeg from the command line. You only need to install it with "brew install ffmpeg" in your terminal (a Mac terminal in my case; on other platforms, use your package manager).

Then run:

ffmpeg -i input.mp4 -vf fps=2 frames/frame_%04d.png

This gives you ~2 frames per second. Bump the fps value (e.g. fps=3) if you want something closer to the 3–4 images per second I mentioned above. Also note that ffmpeg won't create the frames/ folder for you, so run mkdir frames first.
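
By the way, if you want to sanity-check a clip's resolution, frame rate, and duration before extracting, ffprobe (it ships with ffmpeg) can print them; the input.mp4 name is just the placeholder from above:

ffprobe -v error -select_streams v:0 -show_entries stream=width,height,r_frame_rate,duration -of default=noprint_wrappers=1 input.mp4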

Resolution: I scaled everything down to a consistent 960×544. (Other ratios like 1:1 or 9:16 also work fine.)
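
If you've already extracted frames and just need to resize them, a minimal shell loop like this works (assuming the frames/ folder from above; note that 1920×1080 → 960×544 stretches the image very slightly, so crop or pad first if that bothers you):

mkdir -p resized  # ffmpeg won't create the output folder for you
for f in frames/*.png; do ffmpeg -y -i "$f" -vf scale=960:544 "resized/$(basename "$f")"; done

(You can also just extend the extraction filter to fps=2,scale=960:544 and do both in one pass.)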

Cleanup: throw away frames with cropped or blurred faces, and dedupe “almost identical” shots. (You’ll often end up with near-duplicates, which can cause overfitting, so I prefer culling images manually. Just my personal choice!)
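
If you'd rather automate part of the dedupe pass, ffmpeg's mpdecimate filter drops frames that are nearly identical to the previous one. Here's a sketch that combines extraction, dedup, and resizing in one command (the fps and scale values are the ones from above; -vsync vfr stops ffmpeg from re-duplicating the dropped frames):

ffmpeg -i input.mp4 -vf "fps=3,mpdecimate,scale=960:544" -vsync vfr frames/frame_%04d.png

I'd still do a final manual pass afterwards; mpdecimate's default thresholds only catch the most obvious duplicates.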

Captions? They say video LoRA isn’t as sensitive to captions as still-image LoRA, but it’s still worth cleaning them up. The auto-labeler worked fine for me. Just remember to stick your trigger word in front of each caption later.
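
Here's a minimal sketch for that last step, assuming one .txt caption file per frame sitting next to the images in frames/ (swap in your own trigger word, of course):

for f in frames/*.txt; do printf 'ek_momo_wan22_vl01, %s' "$(cat "$f")" > "$f.tmp" && mv "$f.tmp" "$f"; done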

Training Settings (LoRA)

I wanted something that balances cost vs. quality. Video training is expensive!

Base: Wan2.2-t2v-a14B-low-noise

Network: LoRA

Trigger: ek_momo_wan22_vl01 (pick your own trigger word, of course)

Repeat/Epoch: 3 / 8 → total steps = images × repeats × epochs = images × 24, i.e. ~1,200 steps for my ~50 images. (Higher is better, but this was “good enough.”)

UNet LR: 1.2e-4 — stable for me. (The default 1e-4 probably works too.)

Scheduler: cosine (I like the smooth decay, but constant also works.)

Warmup: 5% (important for cosine; for constant, basically 0).

Optimizer: AdamW8bit (default, can’t be changed).

LoRA Rank/Alpha: 64/32 (safe default). There’s a great tutorial on the TA Discord (guide channel + YouTube) if you want more depth.

Target Frames / Frame Sample: 11 / 6.

Save/Preview: every 2 epochs (saves credits). Video LoRA training costs more than still-image LoRA training, so I tried to keep the cost down.

Preview prompt (super simple):

ek_momo_wan22_vl01, 1girl, medium shot, steady camera, soft daylight, clean background, smooth natural motion, subtle eye blink

👉 Keep previews simple. If the prompt is too fancy, you won’t know whether the LoRA is working.

Generation (Text → Video)

Here’s where the real fun begins. A tiny tweak—steps, CFG, or LoRA weight—can change the entire video.

My “sanity check” preset

Model: Wan2.2-t2v-a14B

Res / Len / FPS: 16:9 (480p), 3s, 16fps (Fast Mode)

Steps / CFG: 8 / 1

LoRA: 0.9–1.0 (depending on face sharpness)

Showcase preset (720p)

1280×720, 5–8s, 16fps

Steps / CFG: 25–30 / 4.2–4.6 (Quality Mode)

LoRA: 1.0

Prompt example (anime park scene):

anime style, cel shading, clean lineart, flat colors, ek_momo_wan22_vl01, 1girl, medium shot, steady camera, sunlit park, blue sky, soft ambient light, smooth natural motion, gentle head turn, subtle eye blink, no scene cuts

Swap out scene/background for cafés, jogging, or lightsaber duels.
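
For example, a café variant might look like this (same structure, only the scene tags swapped):

anime style, cel shading, clean lineart, flat colors, ek_momo_wan22_vl01, 1girl, medium shot, steady camera, cozy café interior, warm ambient light, smooth natural motion, gentle head turn, subtle eye blink, no scene cuts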

Urgent Update: Fast Mode ⚡

Okay, here’s where it gets interesting.

When I first tested Wan 2.2, Fast mode gave me pretty bad results. So I ignored it.

But then I saw other users posting amazing videos generated in Fast mode, using only 8–9 steps. 🤯 Naturally, I retried with my Yor LoRA—and wow. The results were better than anything I got in Quality mode.

So here’s the reality check:

Fast mode = super cheap + super quick

Can actually beat Quality mode if your LoRA is solid

But… sometimes resolution/detail is lower, so it’s not perfect for a final showcase

Still, for experimenting and quick iterations? Fast mode is a game-changer.

Closing Thoughts

So far I’ve made three video LoRAs:

Momo (lively, lots of movement—great first subject!)

Yor Forger (less active dataset—reminded me that motion variety really matters)

Belleza (beautiful test case—good for cinematic shots)

Each one was a fun experiment, and I’m honestly still learning. But if I can get these results as a beginner, Wan 2.2 Video LoRA creation is much more approachable than I ever imagined.

Fast mode, especially, opened my eyes. I’ll be using it a lot more going forward.

Anyway—that’s my first experience. Hopefully this helps you get started too. 🤗
