Upside Down
I had 5 videos of 60 frames each (25 FPS) and took a bunch of close-ups from them. I then gave the full-body examples natural captions with strategic omissions, and the close-ups tight, controlled partial captions. I used an LR of 4e-5, which I seem to be settling on, and trained for about 80 epochs. Epoch 50 seems to be a sweet spot of sorts, but a little seed hunting is useful.
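For quick reference, here is a minimal sketch that collects the numbers above in one place and works out how much source footage they represent. The class and field names are hypothetical and do not follow any particular trainer's config schema. The close-up crops are not counted in the totals, because their number isn't stated.

```python
from dataclasses import dataclass


@dataclass
class TrainingSetup:
    # Hypothetical field names; values are the ones described above.
    num_videos: int = 5
    frames_per_video: int = 60
    fps: float = 25.0
    learning_rate: float = 4e-5
    epochs: int = 80
    preferred_epoch: int = 50  # checkpoint that looked best; seed hunting still helps

    def clip_seconds(self) -> float:
        # Duration of each source clip in seconds.
        return self.frames_per_video / self.fps

    def total_seconds(self) -> float:
        # Total full-body footage across all source clips (close-ups excluded).
        return self.num_videos * self.clip_seconds()


setup = TrainingSetup()
print(setup.clip_seconds())   # 2.4
print(setup.total_seconds())  # 12.0
```

In other words, each clip is only about 2.4 seconds long, so the full-body set amounts to roughly 12 seconds of footage before any close-ups are added.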
All of the showcase videos include their generation metadata.
Check out my Training Article if you want more details on the method to my madness.