It was trained on the Wan 2.1 14B model.
It works for both Image to Video (I2V) and Text to Video (T2V).
For I2V I'm using wan2.1-i2v-14b-480p-Q5_K_S.gguf, and for T2V I'm using wan2.1-t2v-14b-Q5_K_S.gguf. It works with these; I haven't tested other models, so I don't know how it will perform on them.
All sample videos were generated at a strength of 1.
I've found that Text to Video can be a bit finicky about producing particular angles/viewpoints, but it follows instructions for the rest of the scene pretty well. The quality of the T2V output also doesn't seem great, though I'm running the .gguf model at a low resolution with few steps, so that may just be my setup. Image to Video seems to give pretty good results.
Props to https://civitai.com/user/dtwr434 for tips on training!