I'm not one to take sides, but side profile views, f*ck yeah! I am proud to present to you...
Side Doggy
This concept posed unique challenges as I initially bit off more than I could chew. I ended up training a 6 clip set supplemented by 6 close-ups. My first attempts had the set split in half, some clips showing the left side and some showing the right side. This resulted, of course, in body horrors, mutations, conjoined penises, poorly executed blow jobs, etc...
I couldn't get consistent renders with it, but failure has a funny way of manifesting growth. Thanks to two weeks of duds, I had a Eureka moment, detailed below the dots, but first!
Versions
One
This may be the one and only... 6e-5, 256x256 (rough pixel average) for 60 frames. The pose is very consistent and every clip is from the same side. This is one of my best LORAs yet, I didn't even have to cherry-pick for the showcase! (!!) Performs better in wide but surprisingly stable in tall ratios as well.
Wildcard prompt template.
A beautiful {slim|curvy} {Russian|French|Swedish|Swiss|Latina|Austrian|German|Dutch|English|Irish|Portuguese} woman is seen in side profile on all fours in a doggystyle position as a {fit|fat|skinny|muscular} {African|German|American|Latin|Asian} man kneels upright on the right, facing left, thrusting his {huge|averaged-sized|thick|small} penis in and out of her vagina from behind her. His pelvis stays closely aligned horizontally with her bent hips and buttocks.
She has {blonde|brown|dirty blonde|light blonde} {styled|straight|curly|tied up|pony-tailed} hair hanging down at the left side of the frame.
He is {holding|gripping|grabbing} her hips with his hands. Her {{red|black|white|pink|multi color} {bra|tube top|shirt} covers her chest|{large|medium sized|small} breasts jiggle with each thrust}.
The scene takes place in a well lit modern {bedroom|basement|living room|park|studio|attic|doctor's office|cubicle}.
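For anyone who wants to expand a wildcard template like the one above outside of a UI that supports it, here is a minimal sketch in Python. It assumes the only syntax in play is {option|option|...} with optional nesting, as in the template above; the function name expand_wildcards and the short example template are my own placeholders, not part of any particular tool.

```python
import random
import re

# Matches an innermost {a|b|c} group, i.e. one with no braces inside it.
_INNER_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_wildcards(template: str, rng: random.Random) -> str:
    """Resolve {a|b|c} groups from the inside out, picking one option each."""
    while True:
        match = _INNER_GROUP.search(template)
        if match is None:
            # No groups left, the prompt is fully expanded.
            return template
        choice = rng.choice(match.group(1).split("|"))
        template = template[:match.start()] + choice + template[match.end():]


if __name__ == "__main__":
    # Hypothetical short template, only to show the syntax (including nesting).
    demo = "A {slim|curvy} woman in a well lit modern {bedroom|{art|photo} studio}."
    print(expand_wildcards(demo, random.Random()))
```

Because inner groups are resolved first, nested choices such as the clothing/no clothing branch in the template above collapse correctly: the inner colors and garments are picked, then the outer group picks one whole branch.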
Training Notes
The revelation presented itself after much hand wringing, but this is a biggie!
HunyuanVideo:
Does not use trigger words. They are in fact just noise: at best they are useless and add nothing, at worst they may even hurt the learning process. Why?
HunyuanVideo's text encoder doesn't actually take the caption/prompt as is, it translates it semantically and develops its own internal representation based on the meaning of the text. Made-up, non-English words don't carry meaning and so will not add value. We can stop including them. Try out this LORA if you're not yet convinced, the results are amazing without any trigger word.
No matter how good a LORA is, a poorly designed prompt will not behave. Conversely, a weak LORA may actually behave very well with the right prompt.
When looking for the best key phrase (because we DO need a trigger phrase), my back-and-forth with ChatGPT (not always reliable, but more knowledgeable than most humans when all is said and done) suggested that in order to design an optimal LORA, we need to find the prompt that already gets the base model to approximate the pose/concept/specifics of the LORA we're trying to train.
If you want side doggy to work well, you need to isolate the most concise prompt phrases that will get the base model rendering the high level details. The result might be missing motion, or accuracy, but if you're training a two person LORA and without the LORA HunyuanVideo is only rendering one person, or three people, or horrible mutations or completely incorrect positioning, that prompt will not train well and it will not render well.
After a focused trial and error session involving multi paragraph prompts that finally hit the mark on the base model, I distilled the phrase(s) to these:
A woman is seen in side profile on all fours in a doggystyle position as a man upright on the right, facing left, thrusting his penis in and out of her vagina from behind her. His pelvis stays closely aligned horizontally with her bent hips and buttocks. She has hair hanging down at the left side of the frame.
That key paragraph gave me the two people, their positioning and sometimes even a penis between them. It was a moment of victory and relief. When prompting with this (and other environmental cues like the location), even my previously inconsistent LORA was producing good results, really good results. But I didn't stop there: I recaptioned my set with these phrases and trained overnight. I'm rendering showcase vids right now and I'm blown away by the fact that almost every single seed is hitting the mark, which is just insane. I'm 8/8 without mutations, extra people, heads on backwards, etc... (watch me jinx it... lol)
Not only do we all get a side doggy style LORA that works like magic, but we also get this critical training tip: Find the key phrases that make the base model approximate what your LORA is all about first, and only then base your captioning around those consistent phrases, adding in the necessary adjectives and all but sticking to the template like your life/LORA depends on it.
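One way to hold yourself to the "stick to the template" rule is to check every caption for the key phrases before training. The sketch below is only an illustration of that idea, assuming captions are plain strings; the helper name missing_key_phrases and the example phrases are placeholders I made up, so swap in whatever phrases worked on the base model for your concept.

```python
def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so small formatting drift is ignored."""
    return " ".join(text.lower().split())


def missing_key_phrases(caption: str, key_phrases: list[str]) -> list[str]:
    """Return the key phrases that do not appear in the caption."""
    normalized = _normalize(caption)
    return [phrase for phrase in key_phrases if _normalize(phrase) not in normalized]


if __name__ == "__main__":
    # Placeholder phrases, stand-ins for the ones distilled from base model tests.
    phrases = ["is seen in side profile", "facing left"]
    caption = "A woman is seen in side profile in a well lit modern studio."
    print(missing_key_phrases(caption, phrases))  # lists any phrase the caption drifted from
```

An empty list means the caption still follows the template. Anything else is a caption to rewrite before it goes into the set.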
If the concept is complex and you have paragraphs worth of prompting to get the base model there, you'll need to find something shorter. I'd say more than ~40 words is getting long. The longer your captions, the less easily prompting will trigger the concept. The shorter the captions, the more likely it is that longer, more detailed prompts will include the right semantic meaning.
For example, if your caption is like 300 words, a prompt shorter than 300 words may not even succeed in generating the concept, so in my experience caption length is strongly tied to how well your LORA will perform.
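To keep an eye on caption length across a whole set, a quick word count does the job. This is a rough sketch that assumes captions are held in a dict keyed by file name; the 40 word limit is just my rule of thumb from above, and the names MAX_WORDS and long_captions are made up for the example.

```python
MAX_WORDS = 40  # rule-of-thumb limit from the notes above, not a hard requirement


def word_count(caption: str) -> int:
    """Count whitespace-separated words in a caption."""
    return len(caption.split())


def long_captions(captions: dict[str, str], max_words: int = MAX_WORDS) -> dict[str, int]:
    """Return {name: word_count} for every caption longer than max_words."""
    return {
        name: count
        for name, text in captions.items()
        if (count := word_count(text)) > max_words
    }


if __name__ == "__main__":
    # Hypothetical caption set, only to show the call.
    demo = {"clip_01.txt": "A short caption.", "clip_02.txt": "word " * 55}
    print(long_captions(demo))
```

Anything this flags is a candidate for trimming back toward the distilled key phrases.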
More training notes here in my training guide.
Disclaimer
Prompt responsibly.