Wan2.2 Training Tutorial



In this guide, we’ll walk through the full process of online training on TensorArt with Wan2.2. For this demo, we’ll use Image2Video training so you can see the results directly.

Step 1 – Open Online Training

Go to the Online Training page.
Here, you can choose between Text2Video or Image2Video.
👉 For this tutorial, we’ll select Image2Video.


Step 2 – Upload Training Data

Upload the materials you want to train on.

  • You can upload them one by one.

  • Or, if you’ve prepared everything locally, just zip the files and upload the package (see the sketch below).

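If you go the zip route, a minimal Python sketch like the one below can package a local dataset folder. The wan22_dataset folder and archive name are placeholders; any tool that produces a standard .zip works just as well.

```python
import zipfile
from pathlib import Path

# Hypothetical local layout: ./wan22_dataset/ holds the training clips/images (plus captions, if any).
dataset_dir = Path("wan22_dataset")
archive_path = Path("wan22_dataset.zip")

files = [p for p in sorted(dataset_dir.rglob("*")) if p.is_file()]

with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for file in files:
        # Store paths relative to the dataset folder so the archive unpacks cleanly.
        zf.write(file, arcname=file.relative_to(dataset_dir))

print(f"Wrote {archive_path} containing {len(files)} files")
```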

Step 3 – Adjust Parameters

Once the data is uploaded, you’ll see the parameter panel on the right.

💡 Tip: If you’re training with video clips, keep them around 5 seconds for the best results.

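If your source footage runs longer than that, a rough sketch like this can trim clips before upload. It assumes ffmpeg is installed and that the first 5 seconds of each clip are the part you want; the raw_clips and trimmed_clips folders are placeholders.

```python
import subprocess
from pathlib import Path

# Placeholder folders; change to wherever your raw and trimmed clips live.
src_dir = Path("raw_clips")
out_dir = Path("trimmed_clips")
out_dir.mkdir(exist_ok=True)

for clip in sorted(src_dir.glob("*.mp4")):
    out_path = out_dir / clip.name
    # Keep roughly the first 5 seconds of each clip and re-encode for broad compatibility.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(clip), "-t", "5",
         "-c:v", "libx264", "-an", str(out_path)],
        check=True,
    )
```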

Step 4 – Set Prompts & Preview Frames

• The prompt field defines what kind of results you’ll see during and after training.

• As training progresses, you’ll see epoch previews. This helps you decide which version of the model looks best.

• For image-to-video LoRA training, you can also set the first frame of the preview video (see the sketch below).

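If you want the preview to start from one of your training clips, a small sketch like the one below (using OpenCV via the opencv-python package) grabs frame zero and saves it as an image you can set as the preview’s first frame. The file paths are placeholders.

```python
import cv2

# Placeholder paths; point these at one of your own clips.
clip_path = "trimmed_clips/example_clip.mp4"
frame_path = "preview_first_frame.png"

cap = cv2.VideoCapture(clip_path)
ok, frame = cap.read()  # reads the first frame of the clip
cap.release()

if not ok:
    raise RuntimeError(f"Could not read a frame from {clip_path}")

cv2.imwrite(frame_path, frame)
print(f"Saved first frame to {frame_path}")
```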

Step 5 – Start Training

Click Start Training once your setup is ready.
When training completes, each epoch will generate a preview video.

You can then review these previews and publish the epoch that delivers the best result.


Step 6 – Publish Your Model

After publishing, wait a few minutes and your Wan2.2 LoRA model will be ready to use.


Recommended Training Parameters (Balanced Quality)

Network Module: LoRA
Base Model: Wan2.2 – i2v-high-noise-a14b
Trigger words: (use a unique short tag, e.g. your_project_tag)

Image Processing Parameters

• Repeat: 1

• Epoch: 12

• Save Every N Epochs: 1–2

Video Processing Parameters

• Frame Samples: 16

• Target Frames: 20
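
The platform doesn’t spell out exactly how Frame Samples and Target Frames are applied, but as a rough sanity check you can confirm each clip contains at least 20 frames before uploading. A sketch with OpenCV, reusing the placeholder trimmed_clips folder from earlier:

```python
import cv2
from pathlib import Path

TARGET_FRAMES = 20                  # mirrors the Target Frames setting above
clips_dir = Path("trimmed_clips")   # placeholder folder

for clip in sorted(clips_dir.glob("*.mp4")):
    cap = cv2.VideoCapture(str(clip))
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    cap.release()
    status = "OK" if frame_count >= TARGET_FRAMES else "TOO SHORT"
    print(f"{clip.name}: {frame_count} frames @ {fps:.1f} fps -> {status}")
```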

Training Parameters

• Seed: –

• Clip Skip: –

• Text Encoder LR: 1e-5

• UNet LR: 8e-5 (lower than 1e-4 for more stability)

• LR Scheduler: cosine (warmup 100 steps if available)

• Optimizer: AdamW8bit

• Network Dim: 64

• Network Alpha: 32

• Gradient Accumulation Steps: 2 (use 1 if VRAM is limited)
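
Assuming the trainer follows the usual kohya-style arithmetic (an assumption; the platform doesn’t document this), optimizer steps per epoch come out to roughly dataset items × Repeat ÷ (Batch Size × Gradient Accumulation Steps). A quick worked example for a hypothetical 30-clip dataset and the values above:

```python
import math

# Hypothetical dataset size; the other values mirror the settings listed above.
num_items = 30
repeat = 1
epochs = 12
batch_size = 1
grad_accum = 2

steps_per_epoch = math.ceil(num_items * repeat / (batch_size * grad_accum))
total_steps = steps_per_epoch * epochs
print(f"{steps_per_epoch} optimizer steps per epoch, ~{total_steps} in total")
# -> 15 optimizer steps per epoch, ~180 in total
```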

Label Parameters

• Shuffle caption: –

• Keep n tokens: –

Advanced Parameters

• Noise offset: 0.025–0.03 (recommended 0.03)

• Multires noise discount: 0.1

• Multires noise iterations: 10

• conv_dim: –

• conv_alpha: –

• Batch Size: 1–2 (depending on VRAM)

• Video Length: 2

Sample Image Settings

• Sampler: euler

• Prompt (example):

Tips

• Keep training videos around ~5 seconds for best results.

• Use a consistent dataset (lighting, framing, style) to avoid drift.

• If previews show overfitting (blurry details, jitter), lower UNet LR to 6e-5 or reduce Epochs to 10.

• For stronger style binding: increase Network Dim → 96 and Alpha → 64, while lowering UNet LR → 6e-5.
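
In most LoRA implementations the learned update is scaled by Network Alpha ÷ Network Dim, so the Dim/Alpha change above also shifts the LoRA’s effective strength. Assuming TensorArt’s trainer follows that common convention (not confirmed here), a quick comparison:

```python
def lora_scale(alpha: float, dim: float) -> float:
    """Effective LoRA scaling factor used by most LoRA trainers: alpha / dim."""
    return alpha / dim

# Baseline recommendation vs. the stronger style-binding variant from the tip above.
baseline = lora_scale(alpha=32, dim=64)
stronger = lora_scale(alpha=64, dim=96)
print(f"baseline scale: {baseline:.2f}")           # 0.50
print(f"stronger-binding scale: {stronger:.2f}")   # 0.67
```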
