Model Training: How Should I Label My Datasets?

I made a LoRA for “fox diving into snow”, and I need some help refining the labels in my dataset.

Based on existing advice, I collected around 40 images that reflect the concept. However, I didn’t find much documentation on exactly how I should label those images, especially given the unique pose and subject of my LoRA. I also wanted the LoRA to extend beyond foxes and be able to influence other animal and humanoid subjects.

In the pose I’m after, the fox's head should be buried in the snow and not visible in the shot. The problem I have encountered with my first version of the LoRA is that some or all of the head is visible in nearly every generation, even with tagging to try to remove the head entirely!

I could not find any articles that adequately explained how to handle this, so I turned to ChatGPT, and the following is the advice it gave. (I originally attached an image from National Geographic of a fox with its head in the snow, which I won’t repost here as it’s probably copyrighted).

Example: Labelling the "Fox Diving into Snow" Image

The attached image captures a fox in the middle of a hunting dive, where its hind legs and tail are prominently visible, but its head is completely buried in the snow. The absence of the fox's head is a key feature that must be labelled clearly.

A precise label for this image could be:
fox, snow, fox diving, head buried, tail up, legs up, dynamic pose, winter

Why This Label Works:

Subject: "fox" is the core focus of the image.
Action: "fox diving" describes the key behavior.
Pose-Specific Detail:
- "head buried" emphasizes that the fox's head is submerged in the snow and not visible.
- "tail up" and "legs up" highlight the upward positioning of the visible body parts.
Environment: "snow" and "winter" provide the seasonal and environmental context.
Pose Type: "dynamic pose" further emphasizes that the fox is mid-action, not static.

By including "head buried" in the labels, you clarify a critical aspect of the image, which differentiates this pose from others (e.g., standing, jumping, or resting foxes).
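
To make the label above concrete, here is a minimal sketch of how it could be written out as a caption file. It assumes the trainer reads comma-separated tags from a .txt file that shares the image's base name, which is a common convention but worth checking against your own training tool. The function names and the FOX_DIVE_TAGS constant are made up for illustration.

```python
from pathlib import Path

# Tags taken from the example label above (assumed order: subject, environment,
# action, pose details, pose type, season).
FOX_DIVE_TAGS = [
    "fox", "snow", "fox diving", "head buried",
    "tail up", "legs up", "dynamic pose", "winter",
]


def build_caption(tags):
    """Normalise tags (trim, lowercase, collapse spaces) and drop duplicates,
    keeping the first occurrence so the tag order stays stable."""
    seen = []
    for tag in tags:
        cleaned = " ".join(tag.strip().lower().split())
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return ", ".join(seen)


def write_caption(image_path, tags):
    """Write the caption next to the image, e.g. fox_01.png -> fox_01.txt.
    The sidecar .txt layout is an assumption about the trainer."""
    caption_path = Path(image_path).with_suffix(".txt")
    caption_path.write_text(build_caption(tags) + "\n", encoding="utf-8")
    return caption_path
```

Calling write_caption("dataset/fox_01.png", FOX_DIVE_TAGS) would create dataset/fox_01.txt containing the example label, provided the dataset folder already exists.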

Best Practices for Labelling "Fox Diving into Snow" Images

Focus on Visibility: Always label which parts of the fox are visible and which are hidden. In this case, “head buried” is essential, as it defines the pose’s uniqueness.
- Examples: “head buried,” “tail visible,” “hind legs up.”
Describe the Pose: Use consistent terms to describe body positioning. For diving foxes, terms like “tail up,” “legs up,” “dynamic pose,” and “mid-dive” are clear and informative.
Highlight Environment: Include “snow” and “winter” in every label to maintain context for the dataset. If the snow has specific qualities (e.g., deep snow, fluffy texture), these can be included as optional descriptors.
Avoid Ambiguity: Avoid labels like “fox jumping” or “fox standing,” which may not represent the pose accurately. The term “diving” specifically describes this action.
Consistency is Key: Always use the same phrasing for similar images. For example, label every diving pose consistently as:
fox, snow, fox diving, head buried, tail up, legs up, dynamic pose, winter
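
One way to check the consistency advice across a whole dataset is to scan the caption files and list any that are missing the pose and environment tags. The sketch below is only that, a sketch: it assumes one comma-separated .txt caption per image, and the REQUIRED_TAGS tuple and function names are hypothetical. I left the subject tag out of the required set on purpose, since the goal is for the pose to carry over to subjects other than foxes.

```python
from pathlib import Path

# Pose and environment tags that the advice says should appear on every
# diving image. The subject ("fox") is deliberately not required here.
REQUIRED_TAGS = ("snow", "head buried", "tail up", "legs up", "dynamic pose", "winter")


def parse_caption(text):
    """Split a comma-separated caption into trimmed, lowercase tags."""
    return [tag.strip().lower() for tag in text.split(",") if tag.strip()]


def missing_tags(caption, required=REQUIRED_TAGS):
    """Return the required tags that do not appear in the caption."""
    present = set(parse_caption(caption))
    return [tag for tag in required if tag not in present]


def audit_folder(folder, required=REQUIRED_TAGS):
    """Map each caption file name to its missing tags; complete files are omitted."""
    report = {}
    for path in sorted(Path(folder).glob("*.txt")):
        gaps = missing_tags(path.read_text(encoding="utf-8"), required)
        if gaps:
            report[path.name] = gaps
    return report
```

An empty result from audit_folder("dataset") would mean every caption file carries the full set of required tags.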

Building the Dataset: Other Images to Collect

To create a high-quality LoRA model for “fox diving into snow”, you’ll need a variety of images showcasing similar and related concepts:

Fox Diving Variants:
- Different angles of foxes mid-dive (side view, back view, partial body shots).
- Varying depths of snow where the head is buried (shallow, deep snow).
Dynamic and Static Comparisons:
- Mid-action poses (legs stretched, snow flying).
- After-action poses where the fox is emerging or resting in the snow.
Visibility Details:
- Images where only tails or legs are visible above the snow.
Environmental Diversity:
- Snow-covered fields, forests, or backdrops with varying lighting (e.g., overcast, sunrise, or sunset).
Unique Attributes:
- Snow texture (fluffy, compacted) and disturbances caused by the fox’s dive, such as kicked-up snow.
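
To see whether the dataset really covers the variety listed above, the captions can be tallied by variant. This is a rough sketch under some assumptions: each image would need extra tags for angle, snow depth, and lighting, and the exact tag names in VARIANT_GROUPS are ones I made up from the list above, not an established vocabulary.

```python
from collections import Counter

# Hypothetical variant tags, grouped by the kind of diversity they describe.
VARIANT_GROUPS = {
    "angle": ["side view", "back view", "partial body"],
    "snow depth": ["shallow snow", "deep snow"],
    "lighting": ["overcast", "sunrise", "sunset"],
}


def coverage(captions, groups=VARIANT_GROUPS):
    """Count how many captions mention each variant tag, grouped by category.
    Tags that never appear are reported with a count of 0."""
    counts = Counter(
        tag.strip().lower()
        for caption in captions
        for tag in caption.split(",")
        if tag.strip()
    )
    return {
        group: {tag: counts[tag] for tag in tags}
        for group, tags in groups.items()
    }
```

Any tag that comes back with a count of 0, or far below the others, points at a kind of image that is still missing from the collection.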

ChatGPT’s Summary

For a “fox diving into snow” LoRA, labelling must emphasize the pose's unique feature: the head being buried and not visible. Consistent use of labels like “head buried,” “tail up,” “legs up,” and “dynamic pose” ensures the AI accurately learns this concept. By collecting images with diverse angles, environments, and snow conditions, you create a robust dataset capable of generating realistic and dynamic images of foxes diving into snow.

Foxy’s Summary

So there you have it. This is what I was able to get out of ChatGPT. The advice seems pretty sound, and I plan to apply it when retraining my “Fox Dive” LoRA (probably after we’re done with the Christmas Walkthrough Event). For now, I share it with you all in the hopes that more experienced users can confirm the accuracy of this information, and for other users like me who are still trying to get a grip on how best to label our datasets.

And now my head is feeling pretty frazzled by all this LoRA talk, so I think I’m gonna go outside and stick it in the snow to cool off…