ControlNet: Canny adapter

This article explores how to use the Canny ControlNet adapter. If you are not familiar with ControlNet, refer to my introductory article.

The Canny adapter uses the Canny edge detector, an algorithm for finding edges in an image that is widely used in computer vision for its simplicity and efficiency. Any picture can serve as a source for the Canny adapter. A screenshot, a photo, an illustration - if you like something about it, you might be able to replicate it, iterate on it, improve it, or alter it in exactly the way you want.

To create a Canny control file (I will call them "CCF"), you can use the built-in adapter preprocessor, like we did in the intro article. It works fairly well, but there’s a catch: the output resolution is pretty low. The preprocessor scales the image so that its shorter side is 512 pixels before generating the control file.

It can work remarkably well, but it limits fine detail. Representing a simple black outline requires at least 3 pixels of width, since there are 2 color transitions (background to line and line back to background). The smaller the detail, the more likely the preprocessor is to miss or mangle it.
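The arithmetic behind this limit is easy to sketch. The helper below assumes the preprocessor simply scales the image so the shorter side is 512 pixels while keeping the aspect ratio; the exact rounding rule of the real preprocessor is my assumption.

```python
def preprocessor_size(width, height, short_side=512):
    """Scale dimensions so the shorter side becomes `short_side`,
    keeping the aspect ratio. (Assumed behavior based on the article;
    the actual preprocessor's rounding may differ.)"""
    scale = short_side / min(width, height)
    return round(width * scale), round(height * scale)

# A 2048x1536 photo is shrunk roughly 3x before edge detection,
# so a 3-pixel-wide outline ends up about 1 pixel wide.
print(preprocessor_size(2048, 1536))  # -> (683, 512)
```

This is why fine detail in large sources tends to get lost: the downscale happens before the edges are ever detected.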

Fortunately, the Canny adapter accepts external files - and these can have much higher resolutions. Unfortunately, the built-in preprocessor can't easily be used to create high-resolution CCFs, and its detector parameters are hard-coded, which can be a problem. Look at these two pictures:

The left one was created by the preprocessor, the other by an external program. The left one has more detail but is harder to read and edit. Many of those details are not essential for image generation and are likely to get in the way if you want to change something.

To work around these two problems I used ImageMagick, a free, open-source, cross-platform software suite for image manipulation. I highly recommend it. It is a command-line tool, so brace yourself if you are not into that. I currently use Windows, which is heavily GUI-oriented; Far Manager greatly simplifies working with the command line.

Here is my batch file for ImageMagick Canny detector, canny.bat:

<ImageMagick binaries path>\convert.exe %1 -canny 0x1+%2%%+%3%% %1.canny-%2-%3.png

The first parameter is the file name; the second and third are percentile values (0 to 100) used by the ImageMagick Canny edge detector. I won't try to explain them precisely - the ImageMagick documentation is not great - but they control the sensitivity of the edge detector. Experiment with the numbers and you will see. It usually took me about 3-4 tries to get a CCF I liked.
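For reference, the same invocation can be built from Python, which makes batch experiments with different thresholds easier to script. This is only a sketch of the batch file above: `convert` is the ImageMagick 6 binary name (on ImageMagick 7 it is `magick`), and the `-canny 0x1+low%+high%` geometry mirrors the .bat exactly.

```python
import subprocess

def canny_cmd(image, low_pct, high_pct, convert="convert"):
    """Build the same ImageMagick command line as canny.bat.
    Pass convert="magick" for ImageMagick 7 installs."""
    out = f"{image}.canny-{low_pct}-{high_pct}.png"
    return [convert, image, "-canny", f"0x1+{low_pct}%+{high_pct}%", out]

cmd = canny_cmd("photo.jpg", 10, 30)
print(cmd)
# subprocess.run(cmd, check=True)  # uncomment to actually run ImageMagick
```

Looping this over a few (low, high) pairs reproduces my "3-4 tries" workflow in one go.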

About two months ago I heard that LLMs had become proficient at programming and asked ChatGPT to write a Java program for making CCFs with interactive visual control. It worked like a charm. I will think of a way to share it - I could probably put it on GitHub. It would require "a very particular set of skills" to download, configure, and run, though, so I recommend using ImageMagick for now.

A more accessible option would be an online generator. I found this one: https://randomtools.io/image-tools/edge-detection-online-tool

It has fixed Canny parameters, but it can still be useful. I didn't look too hard, though - personally, I don't need it. Again, if you find a better one, leave a comment.

Editing CCFs is tricky - they weren't really meant to be edited manually, and there is no rulebook for it. The topology is convoluted: a single pixel in the wrong place can lead to color spilling into the wrong area of the image, or to the AI completely reenvisioning parts of the scene - body parts merging with the environment, that sort of thing. Some changes are safe - see the bear replacement in the previous article. "60% of the time, it works every time".

In many cases it is easier to fix issues in the CCF than in the "real" picture - you are just altering white lines on a black field, mostly by erasing them. You can use splines to make really smooth lines with little effort - demon tails and body curves, for example. Removing shadows is usually trivial. Tracing can be used to great effect (hello, layers). It gets easier with practice.

Here is an example of placing multiple objects into CCF:

Most of this cocktail glass was created directly in the CCF using ellipses and splines; I couldn't find a good reference for it. The fairy was originally sitting in a bathtub: she was generated in high resolution with a CCF that ensured the AI made the bathtub in a shape and at an angle fitting the glass I had drawn. The strawberry, lime, and wine bottle come from three different Bing image search results. This seems to be the final CCF I used:

Note the non-intersecting objects. The lines representing the tablecloth do not touch the strawberries, yet in the resulting pictures the tablecloth often covers the entire table. I had quite a bit of trouble fitting the fairy and the liquid into the cocktail glass and almost gave up. Now I would act differently: make a collage and produce a CCF from it. Since this operation loses a lot of information (such as colors), you can get away with very sloppy work and still get workable CCFs.

So, you can either edit the CCF directly or edit a "real" picture and make a CCF based on it. Neither approach is obviously superior; I choose which one to use based on what I have and what I want to achieve. Often I use both in the same project. The process is iterative: you gradually improve your CCF, erasing what doesn't work and adding what does. Eventually, you should end up with a control file that consistently gives good results.

To remove an object from a picture, you can either paint the corresponding area of the CCF black or smooth it down in the real picture until the Canny filter stops detecting the edges that identify the object.

A frequent special case: I am happy with the centerpiece but don't like the background. This is easily handled in the CCF - just take a large black brush and go nuts. It is hard to mess up. Then ask for a new background in the prompt.

I see three ways to place a new object into a picture:

- put its Canny representation into your CCF

- make a rough collage of the picture you want and make a new CCF based on it

- paint the part of the CCF where you want changes black and ask the AI to generate them

If you choose CCF alteration, it helps to leave a black border around the new object to reduce the probability of the two CCF images merging incorrectly; 2-3 pixels should work. It is not a precise science, but with practice you start to feel what works and what doesn't.

It is difficult to merge complex objects in direct contact when you edit a CCF - the topology of the line connections is trivially easy to mess up. For this reason, you should generally avoid merging areas covered by hair; connecting two such images correctly is extremely tedious except in the most trivial cases.

CCFs can be scaled to some extent. There is a limit to it, naturally: if lines blur together, they stop being useful. Upscaling is safer. Lines don't have to be pure white - gray works. They don't have to be absolutely crisp either - antialiased CCFs still work. Antialiasing definitely doesn't improve results, though, so I wouldn't bother with it.
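A toy example shows why downscaling is the risky direction. This is not how any real resampler works in detail - just a naive 2x box average over one row of grayscale pixels - but it illustrates how two separate lines can fuse into one gray smear:

```python
def downscale2(row):
    """Naive 2x box downscale of one grayscale row (0=black, 255=white).
    A toy stand-in for real image resampling."""
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row) - 1, 2)]

# Two separate 1-pixel white lines with a 1-pixel black gap between them...
row = [0, 255, 0, 255, 0, 0]
print(downscale2(row))  # -> [127, 127, 0]: one gray blob, the gap is gone
```

The gray values themselves are fine (gray lines work, as noted above); the problem is that the gap between the lines has disappeared, so the topology changed.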

To state the obvious, the resolution of a CCF may differ from the target resolution. It may also have a different aspect ratio; the adapter scales the CCF to cover the entire picture, which usually means part of the CCF ends up out of frame.
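The "cover" behavior described above can be sketched as a one-line formula. I am assuming the adapter behaves like a cover fit (the larger of the two per-axis scale factors, as in CSS `object-fit: cover`); that matches what I observe, but the exact implementation is an assumption.

```python
def cover_scale(ccf_w, ccf_h, target_w, target_h):
    """Scale factor that makes the CCF cover the whole target frame.
    (Assumed "cover" behavior; the overflow along one axis is cropped.)"""
    return max(target_w / ccf_w, target_h / ccf_h)

# A 1024x768 CCF covering a 1024x1024 frame is scaled by 4/3 to 1365x1024,
# so roughly 341 columns of the CCF fall outside the square frame.
s = cover_scale(1024, 768, 1024, 1024)
print(round(1024 * s), round(768 * s))  # -> 1365 1024
```

This is worth keeping in mind when the edges near the borders of your CCF matter: with a mismatched aspect ratio, they may never reach the model at all.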

It is important to remember that the Canny filter doesn't "see" anything; it's just math. A gradual color transition is not an edge, and objects that are obviously distinct to our eyes are not necessarily distinct to the detector. It may also fail to detect a low-intensity color transition as an edge: dark elements adjacent to each other may not be properly separated, and cloth folds and seams marked by black lines over dark filler can easily be lost. An easy way to check for lost boundaries is to overlay the CCF on the source picture (or vice versa) and make the top layer semi-opaque. Preprocessing the image before running the detector, e.g. adjusting the brightness/contrast of the problematic area, also often works. The detector doesn't care if your character looks like a clown; it cares about color transitions.

Canny detects edges. An edge is a border between two areas of different colors, so the output includes both the borders between objects, which define the geometry of the scene, and less impactful borders such as shadows, cloth patterns, and so on. The former are more important, and if the AI gets confused, it is often useful to clean up and simplify the scene. Running the Canny detector with different settings often helps too.

There is a side benefit to this adapter. Generation at high resolutions, like 1536x1536, is prone to random failures, presumably because the models were not trained at this resolution. The typical failures are doubled subjects in the scene or hilariously malformed bodies. The glitch can be useful - the base image for the twin sisters was generated this way - but the Canny adapter seems to avoid this particular problem completely. It keeps the model in bounds. The majority of pictures in my profile are made at this resolution.

Another benefit: even when things fail, the results stay very close to each other geometrically. Let's say I am making a picture with an angel and a demon. If I have one picture with a good angel and another with a good demon, I can "easily" move part of one picture into the other - a drop-in replacement. That's what most of those "generated by workflow" images in my gallery are about: instead of waiting for the gacha to smile on me and deliver a perfect picture, I just mix and match the good parts of failed tries.

"And that's all I have to say about" the Canny adapter. If you find this article somewhat messy - it is. This is a new experience for me, and I am still learning new tricks. Maybe posting a few step-by-step examples of developing images would be more instructive. I will think about it.

If you have specific questions, "I am here if you want to talk".

Related articles: Introduction to ControlNet, ControlNet: Depth adapter.
