ControlNet: OpenPose adapter



This article introduces the OpenPose adapter for ControlNet. If you’re new to ControlNet, I recommend checking out my introductory article first.

I don’t use OpenPose much myself, since I find the Canny + Depth combination more convenient. But I did some experiments specifically for this article, so consider this a first look rather than a deep dive.

The OpenPose adapter lets you copy the pose of humanoid characters from one image to another. Like other ControlNet adapters, it uses a preprocessor that takes an input image and generates a control image, in this case a stick figure representing the positions of key joints and limbs. This stick figure then guides the image generation process. Here is an example:

Left: original picture by Chicken; center: stick figure generated by the OpenPose preprocessor; right: the stick figure overlaid on the original image.

As you can see, the stick figure isn’t a full skeleton; it marks key joints as dots connected by lines. The colors aren’t random: they follow a color code for different bones and joints. A Bing search turns up this reference for them. Here is the list of joints and bones with the color scheme:
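For reference, the stick figure follows the 18-keypoint, COCO-style body layout that OpenPose uses. A minimal sketch of that layout in Python (the keypoint names and bone pairs below are the standard OpenPose body ordering; the exact colors vary slightly between implementations, so I'm describing them only loosely here):

```python
# The 18 body keypoints in OpenPose's COCO-style ordering.
KEYPOINTS = [
    "nose", "neck",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
    "right_eye", "left_eye", "right_ear", "left_ear",
]

# Bones as (from, to) index pairs into KEYPOINTS; each bone gets its
# own color in the rendered stick figure, roughly a rainbow wheel.
BONES = [
    (1, 2), (2, 3), (3, 4),       # right arm
    (1, 5), (5, 6), (6, 7),       # left arm
    (1, 8), (8, 9), (9, 10),      # right leg
    (1, 11), (11, 12), (12, 13),  # left leg
    (1, 0),                       # neck to nose
    (0, 14), (14, 16),            # right eye and ear
    (0, 15), (15, 17),            # left eye and ear
]

for i, (a, b) in enumerate(BONES):
    print(f"bone {i:2d}: {KEYPOINTS[a]} -> {KEYPOINTS[b]}")
```

This is also why a missing limb in the control image is so harmful: a bone the preprocessor fails to detect simply isn't drawn, and the adapter has nothing to condition on for that limb.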

Looking at the example above, I don't think the preprocessor did a good job this time: a few joints are quite a way off the mark, and the legs are missing entirely. The picture is somewhat non-trivial, a close top-down view with perspective distortion, but the preprocessor should still have been able to handle it. Fortunately, it seems easy enough to alter the stick figures or even draw them from scratch.
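Drawing one from scratch really is straightforward. Here is a rough sketch using Pillow; the coordinates are made up for illustration, and the colors just walk around the hue wheel rather than reproducing any particular implementation's palette exactly:

```python
import colorsys
from PIL import Image, ImageDraw

# Hypothetical (x, y) keypoint coordinates for a standing figure;
# indices follow OpenPose's COCO-style ordering (0 = nose, 1 = neck, ...).
points = {
    0: (256, 90),   1: (256, 140),
    2: (216, 140),  3: (200, 210),  4: (192, 280),              # right arm
    5: (296, 140),  6: (312, 210),  7: (320, 280),              # left arm
    8: (232, 280),  9: (228, 370), 10: (226, 460),              # right leg
    11: (280, 280), 12: (284, 370), 13: (286, 460),             # left leg
    14: (244, 80),  15: (268, 80), 16: (232, 90), 17: (280, 90), # eyes, ears
}
bones = [
    (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7),
    (1, 8), (8, 9), (9, 10), (1, 11), (11, 12), (12, 13),
    (1, 0), (0, 14), (14, 16), (0, 15), (15, 17),
]

img = Image.new("RGB", (512, 512), "black")  # control images use a black background
draw = ImageDraw.Draw(img)

# One hue per bone, approximating the rainbow color code.
for i, (a, b) in enumerate(bones):
    color = tuple(int(c * 255) for c in colorsys.hsv_to_rgb(i / len(bones), 1.0, 1.0))
    draw.line([points[a], points[b]], fill=color, width=8)

# Joints drawn as filled dots on top of the bones.
for i, (x, y) in points.items():
    color = tuple(int(c * 255) for c in colorsys.hsv_to_rgb(i / len(points), 1.0, 1.0))
    draw.ellipse([x - 6, y - 6, x + 6, y + 6], fill=color)

img.save("pose_from_scratch.png")
```

The resulting PNG can be fed to the OpenPose adapter directly, bypassing the preprocessor, which is also how you would patch up a figure the preprocessor got wrong.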

OpenPose seems to be sensitive to scheduler and sampler settings. Unlike Canny and Depth, it refused to work with the karras/dpm_adaptive combination I normally use, so I switched to normal/euler at 20 steps.

Here are the settings:

And here are the results:

As you can see, the pose and head position are copied to some extent.

I used the default preprocessor here, but there are more:

Here is the stick figure for openpose_full:

It includes fingers. The face is represented by a single white dot; I guess the preprocessor just failed there. The fingers are nowhere to be seen in the results:

It seems the preprocessor and the main model are out of sync.

The dw_openpose_full stick figure looks promising:

It includes markings for the face, eye, and mouth contours. The results, though, are disappointing: all that extra detail seems to be completely ignored. I suspect the dw_openpose_full preprocessor is simply not compatible with this adapter model.

So, yeah, quite disappointing. It's not a complete loss: the adapter does work to some extent and can be useful. It's just difficult to get excited about this one.

I should point out that I’m specifically talking about the ControlNet OpenPose adapter for the SDXL-based model on tensor.art; these conclusions are in no way representative of other implementations of this adapter. Also, it is possible that I am "holding it wrong". These things can be tricky, and my experience with this one is very limited.

If I am missing something here, feel free to drop a comment.
