Wan2.2 Training Tutorial

In this guide, we’ll walk through the full process of online training on TensorArt with Wan2.2. For this demo we’ll use Image2Video training, so you can see the results directly.

Step 1 – Open Online Training

Go to the Online Training page.
Here, you can choose between Text2Video or Image2Video.
👉 For this tutorial, we’ll select Image2Video.

Step 2 – Upload Training Data

Upload the materials you want to train on.

  • You can upload them one by one.

  • Or, if you’ve prepared everything locally, just zip the files and upload the package.
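
If you go the zip route, any archiver works; the Python sketch below packs a dataset folder (the folder name and accepted extensions here are assumptions, so check the upload requirements on the page):

    from pathlib import Path
    import zipfile

    dataset_dir = Path("my_wan22_dataset")   # assumption: your prepared dataset folder
    archive = Path("my_wan22_dataset.zip")

    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(dataset_dir.rglob("*")):
            # keep only training media and caption files, paths relative to the folder
            if f.suffix.lower() in {".mp4", ".png", ".jpg", ".jpeg", ".txt"}:
                zf.write(f, f.relative_to(dataset_dir))

    print(f"Packed {archive} ({archive.stat().st_size / 1e6:.1f} MB)")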

Step 3 – Adjust Parameters

Once the data is uploaded, you’ll see the parameter panel on the right.

💡 Tip: If you’re training with video clips, keep them around 5 seconds for the best results.
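
If your source clips run longer, a quick way to conform them is to trim each one to its first 5 seconds with ffmpeg. A minimal sketch, assuming ffmpeg is on your PATH and the clips are .mp4 (folder names are placeholders):

    import subprocess
    from pathlib import Path

    src_dir = Path("raw_clips")    # assumption: your unprocessed videos
    out_dir = Path("clips_5s")
    out_dir.mkdir(exist_ok=True)

    for clip in sorted(src_dir.glob("*.mp4")):
        # -t 5 keeps the first 5 seconds; re-encoding avoids mid-GOP cuts, -an drops audio
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(clip), "-t", "5",
             "-c:v", "libx264", "-an", str(out_dir / clip.name)],
            check=True,
        )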

Step 4 – Set Prompts & Preview Frames

  • The prompt field defines what kind of results you’ll see during and after training.

  • As training progresses, you’ll see epoch previews. This helps you decide which version of the model looks best.

  • For image-to-video LoRA training, you can also set the first frame of the preview video.

Step 5 – Start Training

Click Start Training once your setup is ready.
When training completes, you’ll have a preview video for each saved epoch.

You can then review these previews and publish the epoch that delivers the best result.

Step 6 – Publish Your Model

After publishing, wait a few minutes and your Wan2.2 LoRA model will be ready to use.

Recommended Training Parameters (Balanced Quality)

Network Module: LoRA
Base Model: Wan2.2 – i2v-high-noise-a14b
Trigger words: use a unique short tag, e.g. your_project_tag

Image Processing Parameters

  • Repeat: 1

  • Epoch: 12

  • Save Every N Epochs: 1–2
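
Together with the batch settings further down, these values fix the total number of optimizer steps: items × Repeat × Epoch ÷ (Batch Size × Gradient Accumulation Steps). TensorArt may count steps slightly differently, but this is the usual LoRA-trainer accounting. A worked example with a hypothetical 30-clip dataset:

    # Back-of-the-envelope step count; the 30-clip dataset size is made up.
    num_items = 30      # clips/images in your dataset
    repeat = 1
    epochs = 12
    batch_size = 1      # from Advanced Parameters below
    grad_accum = 2      # Gradient Accumulation Steps

    steps_per_epoch = num_items * repeat // (batch_size * grad_accum)
    total_steps = steps_per_epoch * epochs
    print(total_steps)  # 180 optimizer steps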

Video Processing Parameters

  • Frame Samples: 16

  • Target Frames: 20

Training Parameters

  • Seed: –

  • Clip Skip: –

  • Text Encoder LR: 1e-5

  • UNet LR: 8e-5 (lower than 1e-4 for more stability)

  • LR Scheduler: cosine (warmup 100 steps if available; see the sketch after this list)

  • Optimizer: AdamW8bit

  • Network Dim: 64

  • Network Alpha: 32

  • Gradient Accumulation Steps: 2 (use 1 if VRAM is limited)
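
To make the scheduler setting concrete, here’s a small sketch of what cosine decay with a 100-step linear warmup does to the UNet LR over a run (the 2,000-step total is a placeholder; substitute your own step count):

    import math

    def lr_at(step: int, base_lr: float = 8e-5, warmup: int = 100, total: int = 2000) -> float:
        if step < warmup:
            return base_lr * step / warmup               # linear ramp up to base_lr
        progress = (step - warmup) / max(1, total - warmup)
        return base_lr * 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay to ~0

    for s in (0, 50, 100, 1000, 2000):
        print(s, f"{lr_at(s):.2e}")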

Label Parameters

  • Shuffle caption: –

  • Keep n tokens: –

Advanced Parameters

  • Noise offset: 0.025–0.03 (recommended 0.03; see the sketch after this list)

  • Multires noise discount: 0.1

  • Multires noise iterations: 10

  • conv_dim: –

  • conv_alpha: –

  • Batch Size: 1–2 (depending on VRAM)

  • Video Length: 2
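
For context on the noise settings: noise offset adds a small constant, per-channel shift to the training noise, which helps the model learn large brightness/contrast changes, and multires noise mixes coarser noise scales on top. A minimal sketch of the offset part, assuming PyTorch and purely illustrative tensor shapes:

    import torch

    def offset_noise(latents: torch.Tensor, noise_offset: float = 0.03) -> torch.Tensor:
        # Standard offset-noise trick: add a per-(sample, channel) DC term to the
        # base Gaussian noise. Works for image (4D) or video (5D) latents.
        noise = torch.randn_like(latents)
        b, c = latents.shape[:2]
        dc = torch.randn(b, c, *([1] * (latents.dim() - 2)), device=latents.device)
        return noise + noise_offset * dc

    # e.g. video latents shaped (batch, channels, frames, height, width)
    noise = offset_noise(torch.randn(1, 16, 20, 32, 32), noise_offset=0.03)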

Sample Image Settings

  • Sampler: euler

  • Prompt (example): begin with your trigger word, then a short description of the motion you expect, e.g. your_project_tag, subject slowly turns toward the camera and smiles

Tips

  • Keep training videos around ~5 seconds for best results.

  • Use a consistent dataset (lighting, framing, style) to avoid drift.

  • If previews show overfitting (blurry details, jitter), lower UNet LR to 6e-5 or reduce Epochs to 10.

  • For stronger style binding: increase Network Dim → 96 and Alpha → 64, while lowering UNet LR → 6e-5.
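
On that last tip: a LoRA update is scaled by alpha ÷ dim, so 64/96 ≈ 0.67 is a slightly stronger effective scale than the default 32/64 = 0.5, which the lower UNet LR helps offset. A quick check:

    # Effective LoRA scaling factor is alpha / dim (the update is (alpha/dim) * B @ A).
    for dim, alpha in [(64, 32), (96, 64)]:
        print(dim, alpha, round(alpha / dim, 3))
    # 64 32 0.5
    # 96 64 0.667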
