T800_robot - Half-Robot & Full Endoskeleton LoRA (V2) for Wan 2.1 T2V
Fuel my GPU's caffeine addiction! (Ko-fi) https://ko-fi.com/cyberaimania
Trigger Word: T800_robot
The Upgraded Terminator Effect!
Hello everyone! I'm excited to present the second version (V2) of my T800_robot LoRA, meticulously designed to generate a compelling "half-robot / damaged endoskeleton" face effect, as well as full endoskeleton bodies and robotic limbs, specifically for the Wan 2.1 T2V 14B model.
Building upon the feedback and limitations of V1 (trmi800), this V2 has undergone significant improvements to offer a cleaner, more versatile, and powerful robotic transformation. The primary focus was to eliminate the "Arnold bleeding" issue and expand the creative possibilities.
What's New in V2?
Thanks to extensive work on the dataset and training process, V2 introduces the following key enhancements:
Significantly Reduced "Arnold Bleeding": Through advanced dataset editing techniques, the influence of the actor's likeness has been substantially minimized, allowing for the robotic effect to be applied to a much wider range of faces.
Enhanced Versatility - Beyond the Half-Face: This LoRA now goes beyond just the half-robot face effect! V2 is capable of generating:
Full Endoskeleton Bodies: Create complete robot characters.
Robotic Limbs: Apply robotic arms, legs, or other mechanical parts to human figures.
Mix and Match: Combine these effects! Generate a full human body with only a half-robot face, or a complete endoskeleton with a human hand – the possibilities are now truly expansive.
Potentially Improved Results for Full Body Shots: With the inclusion of more full-figure reference materials in the dataset, V2 may offer better results when generating the robotic effect on entire bodies (though close-ups and medium shots are still recommended for optimal detail).
I put a lot of effort into this version, as I had to practically create the dataset from scratch using other AI tools.
Training Data:
PNG Images: 257 (divided into categories)
Video Clips: 16
Key Training Parameters:
Base Model: Wan 2.1 T2V 14B
Epochs: 14
Micro Batch Size: 1
Gradient Accumulation: 4
Learning Rate: 5e-5
LoRA Rank: 32
Resolution: 1024 (images), 480 (videos)
Work on V3 is underway... Hasta la vista, baby.
-------------------------------------------------
Older V1 version:
trmi800 - Half-Robot Face Effect LoRA (V1) for Wan 2.1 T2V
Trigger Word: trmi800
Introduction & Challenges
Hello everyone! This is the first version (V1) of my trmi800 LoRA, designed to create a specific "half-robot / damaged endoskeleton" face effect for the Wan 2.1 T2V 14B model.
Creating this LoRA was a long and challenging journey. The main difficulty was gathering a suitable dataset, as high-quality source material for this specific concept is scarce. A significant portion of the available reference images and videos prominently features actor Arnold Schwarzenegger. As you'll see, this had an impact on the final result.
Training Details
This version was trained on an RTX 4090 (24GB VRAM) using diffusion-pipe. The process involved significant effort in data curation and parameter tuning.
Base Model: Wan 2.1 T2V 14B
Dataset: Mixed
114 PNG Images (Target resolution: 512p), num_repeats = 5
13 Video Clips (Various 480p resolutions: 480x480, 480x270, 480x320), num_repeats = 30
Training Parameters (Approximate):
Epochs: 10
Learning Rate: 4e-5
LoRA Rank: 32
Optimizer: AdamW8bitKahan
Gradient Accumulation Steps: 2
Key Optimizations:
transformer_dtype = float8
activation_checkpointing = 'unsloth'
blocks_to_swap = 20
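As a rough sanity check, the dataset sizes, repeats, and accumulation steps listed above can be turned into an approximate optimizer step count. The sketch below is only arithmetic: the function name is made up for illustration, the micro batch size of 1 is an assumption (it is not stated for V1), and the real trainer may group samples into resolution buckets, which can shift the per-epoch count slightly.

```python
import math

def optimizer_steps(images, image_repeats, videos, video_repeats,
                    epochs, micro_batch_size, grad_accum_steps):
    """Rough count of optimizer steps for a repeat-weighted dataset."""
    # Each item is seen `num_repeats` times per epoch.
    samples_per_epoch = images * image_repeats + videos * video_repeats
    # Samples consumed per optimizer update.
    effective_batch = micro_batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(samples_per_epoch / effective_batch)
    return samples_per_epoch, effective_batch, steps_per_epoch * epochs

samples, batch, total = optimizer_steps(
    images=114, image_repeats=5,
    videos=13, video_repeats=30,
    epochs=10,
    micro_batch_size=1,   # assumption: not stated for V1
    grad_accum_steps=2,
)
print(samples, batch, total)  # 960 2 4800
```

With these numbers the 13 video clips account for 390 of the 960 samples per epoch, so the high video repeat count gives the clips roughly 40% of the training exposure despite their small number.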
Results, Limitations & Known Issues
I'm not entirely satisfied with this V1, mainly due to the dataset limitations mentioned earlier.
Arnold Bleeding: When using this LoRA alone, you will likely notice that the generated character often resembles Arnold Schwarzenegger. This is because, besides learning the half-robot effect, the LoRA also picked up the actor's distinct features present in much of the dataset.
Works Well When Mixed: The good news is that this "bleeding" effect is usually overridden when you combine this trmi800 LoRA with another LoRA representing a different person's face or character. In such cases, the robotic effect applies correctly without imposing Arnold's likeness.
Best Use Cases: This LoRA performs best for medium shots (waist-up) and close-ups focusing on the face.
Limitations: Achieving satisfactory results for full-body shots (head-to-toe) can be challenging with this version.
Recommended Generation Settings
Based on my tests, I achieved the best results with:
Steps: 25-30
CFG Scale: 6-7
Flow Shift (Wan Node): 6-9
Sampler: Euler (or Euler Ancestral)
Feel free to experiment, but these are good starting points.
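For anyone who wants to experiment systematically, the ranges above can be expanded into a small test grid. This is a minimal sketch: the step sizes inside each range, the dictionary keys, and the function name are illustrative choices, not settings from any particular UI, and the sampler labels are simply the names used above.

```python
from itertools import product

# Ranges quoted in the recommended settings; the granularity is an assumption.
STEPS = [25, 30]
CFG_SCALES = [6.0, 6.5, 7.0]
FLOW_SHIFTS = [6, 7, 8, 9]
SAMPLERS = ["Euler", "Euler Ancestral"]

def settings_grid(steps=STEPS, cfgs=CFG_SCALES,
                  shifts=FLOW_SHIFTS, samplers=SAMPLERS):
    """Return every combination of settings as a list of dicts."""
    return [
        {"steps": s, "cfg": c, "flow_shift": f, "sampler": name}
        for s, c, f, name in product(steps, cfgs, shifts, samplers)
    ]

grid = settings_grid()
print(len(grid))  # 48
print(grid[0])    # {'steps': 25, 'cfg': 6.0, 'flow_shift': 6, 'sampler': 'Euler'}
```

Running all 48 combinations on a 14B video model is slow, so in practice it makes sense to fix the seed and prompt and vary one setting at a time.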
Future Work & Concepts (Work in Progress)
I'm exploring ways to improve this LoRA and mitigate the "Arnold bleeding" issue. Here's what I've considered or am working on:
Blurring (Failed Attempt): My initial idea was to blur the human part of the face in the dataset. Unfortunately, this backfired – the LoRA learned the blur itself, resulting in blurry human faces in generations. I don't recommend this approach.
Masked Training: The diffusion-pipe documentation suggests mask support (mask_path). This could be the ideal solution, allowing the training to focus only on the robotic parts while ignoring the human face. However, manually masking over 100 images and 13 videos is extremely time-consuming, and automated tools I've tried haven't yielded satisfactory results due to the variability between images.
Next Steps (Concept): My current plan involves a multi-pronged approach:
Finding better source material showing the full figure for improved full-body training.
Using other tools (like Flux) to change the human face in the existing dataset from Arnold to more generic faces before retraining. This should hopefully prevent the LoRA from learning his specific features.
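To make the masked-training idea mentioned above more concrete, here is a generic illustration of a masked loss. It is not diffusion-pipe's implementation, and the function name and toy values are invented for the example; it only shows the principle that positions with a mask value of 0 (for example, the human side of the face) contribute nothing to the error the model is trained on.

```python
def masked_mse(pred, target, mask):
    """Mean squared error over positions where mask is 1.

    Positions with mask 0 are ignored entirely, so the model gets no
    training signal from them.
    """
    total = 0.0
    weight = 0.0
    for p, t, m in zip(pred, target, mask):
        total += m * (p - t) ** 2
        weight += m
    return total / weight if weight else 0.0

# Toy "image" of four pixels: only the middle two are robotic parts.
pred = [0.2, 0.9, 0.4, 0.1]
target = [0.0, 1.0, 0.0, 0.0]
mask = [0, 1, 1, 0]

print(masked_mse(pred, target, mask))          # about 0.085
print(masked_mse(pred, target, [1, 1, 1, 1]))  # about 0.055 (no masking)
```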
I don't have an ETA for a V2 yet. For now, I hope this V1 proves useful or fun for some of you, especially when combined with other character LoRAs.
1. LoRA Trigger:
trmi800 (always include at the beginning of the prompt)
2. Subject Description (The Man with the T-800 Face):
a man with a battle-damaged face
half-human, half-robot face
severe damage on the [left/right] side (specify the side!)
exposing metallic endoskeleton
intricate metallic endoskeleton structure
torn skin and flesh
shredded, burnt flesh
glistening dark metallic endoskeleton
chrome components / chrome parts
a piercing glowing red robot eye
intact human eye and skin on the [opposite] side
expression is menacing / grim / determined
eyes are intense / scanning
3. Scene/Environment Description:
in a [type] environment
in a metal foundry
in a bleak, desolate, snowy landscape
on a New York City street in 1981 during the daytime
inside a classic 1980s USA telephone booth
in a dark, narrow alleyway at night
in a bustling cyberpunk city street at night
amidst the ruins of an ancient, crumbling structure
inside a dark, functional laboratory
in a scene of fire and destruction
with intense fire and smoke in the background
glowing molten metal tanks and heavy machinery are visible in the background
rain-soaked [location]
amidst vintage cars and pedestrians
4. Action/Motion Description:
standing still
walking [direction, e.g., to the right side of the frame]
walking sideways
walking with a determined stride
kneeling on one knee, head down
slowly stands up
rises slowly from the flames
looking directly at the camera
staring directly into the camera
turns its head [direction, e.g., towards the camera / to face slightly forward]
smoothly moves his head from left to right, scanning the area
energetically shakes his head from left to right (use for violent movement)
eyes darting and focusing
holding a telephone receiver to his damaged, robotic ear and talking
in slow motion (apply at the beginning of the motion description)
5. Camera Language:
Shots:
extreme close-up (ECU) on the face / on the eye
close-up shot
medium close-up
medium shot
full shot
overhead shot
high angle perspective
low angle shot
Point-Of-View (POV) shot
establishing shot of the scene
Movements:
camera slowly pushes in (dolly in)
camera slowly zooms in
camera tracks the figure's face
camera performs a smooth tracking shot, moving alongside the man
camera pans left / right
camera tilts up / down
camera rolls
6. Atmosphere and Lighting:
dramatic lighting
high-contrast lighting
dark atmosphere
intense atmosphere
gritty atmosphere
moody lighting
realistic lighting appropriate for the environment
fire provides the ****, harsh, flickering illumination
illuminated by [light source, e.g., the intense heat / flickering neon signs]
cold, harsh atmosphere
bleak and desolate atmosphere
eerie atmosphere
realistic rain effect
wet pavement reflections
7. Style and Quality:
cinematic video
gritty realism
high detail on textures ([type] textures, e.g., skin, metal, dust, snow)
sharp focus
cinematic quality
realistic rendering
You can combine these phrases and elements to build your prompts, starting with the trigger word and then picking from each numbered group in order. Good luck with your video generation!
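One way to combine the phrase groups above is a small helper that puts the trigger word first and then appends fragments group by group. This is only a sketch: the function name is hypothetical, and joining the fragments with commas is an assumption about prompt format rather than a requirement of the LoRA.

```python
def build_prompt(subject, scene, action, camera, atmosphere, style,
                 trigger="trmi800"):
    """Join prompt fragments in the order of the groups above, trigger first."""
    parts = [trigger]
    for group in (subject, scene, action, camera, atmosphere, style):
        # Skip empty fragments so optional groups can be left blank.
        parts.extend(p.strip() for p in group if p.strip())
    return ", ".join(parts)

prompt = build_prompt(
    subject=[
        "a man with a battle-damaged face",
        "half-human, half-robot face",
        "severe damage on the left side exposing metallic endoskeleton",
        "a piercing glowing red robot eye",
        "intact human eye and skin on the right side",
    ],
    scene=["in a metal foundry"],
    action=["in slow motion", "turns its head towards the camera"],
    camera=["close-up shot", "camera slowly pushes in"],
    atmosphere=["dramatic lighting", "gritty atmosphere"],
    style=["cinematic video", "sharp focus"],
)
print(prompt)
```

For the V2 LoRA, pass trigger="T800_robot" instead of the default.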
I'll be back... 😉
Let me know what you think!