ControlNet
ControlNet is an extension for diffusion image models that conditions generation on a structural reference — such as an edge outline, human pose, or depth map — so the output follows a specified layout or shape.
June 16, 2026

How it works
A standard text-to-image model is guided only by your prompt, which is good at describing content and style but vague about exact placement. ControlNet adds a second input channel: a preprocessed structural reference extracted from an image. That reference might be a Canny edge outline, a scribble, an OpenPose stick-figure capturing a person's pose, a depth map describing 3D distance, or a segmentation map marking regions.
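The preprocessing step can be illustrated with a toy edge extractor. This is only a sketch: it thresholds gradient magnitude, which is a simplified stand-in for a real Canny detector, and the function name `edge_map`, the `threshold` parameter, and the toy image are all hypothetical choices made for illustration.

```python
import numpy as np

def edge_map(gray: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Return a binary edge map (0 or 255) from gradient magnitude.

    A simplified stand-in for Canny: no smoothing, non-maximum
    suppression, or hysteresis, just a relative threshold.
    """
    gray = gray.astype(np.float64)
    gy, gx = np.gradient(gray)        # finite differences along rows, columns
    magnitude = np.hypot(gx, gy)      # per-pixel gradient strength
    peak = magnitude.max()
    if peak == 0:                     # flat image: no edges at all
        return np.zeros(gray.shape, dtype=np.uint8)
    return ((magnitude / peak) >= threshold).astype(np.uint8) * 255

# Toy input: a bright square on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# The resulting map keeps the square's outline and drops its flat interior.
control = edge_map(image)
```

An edge map like `control` is the kind of structural reference that would be passed alongside the prompt. It records where the boundaries are and says nothing about color or texture.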
During generation, the model is constrained to respect that structure while the prompt supplies appearance: colors, materials, lighting, and style. So a pose skeleton fixes how a figure stands, and the prompt decides who the figure is and what they wear. Technically, ControlNet keeps the base model's weights frozen and attaches a trainable copy of its encoder layers, which injects the structural signal without retraining the whole network. Because each control type is trained as a separate add-on, many of them can be swapped in cleanly.
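The "trainable copy" idea can be sketched in a few lines of NumPy. This is a toy, not the real architecture: a single dense layer stands in for a network block, and the zero-initialized projections are an assumption based on the ControlNet design as commonly described, where the branch contributes nothing until it is trained. All class and function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class FrozenBlock:
    """Stand-in for one pretrained layer; its weights are never updated."""
    def __init__(self, dim: int):
        self.weight = rng.normal(size=(dim, dim))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return np.tanh(x @ self.weight)

class ControlBranch:
    """Trainable copy of the block, wrapped in zero-initialized projections."""
    def __init__(self, base: FrozenBlock, dim: int):
        self.copy_weight = base.weight.copy()  # starts as a copy of the base
        self.zero_in = np.zeros((dim, dim))    # projects the control signal in
        self.zero_out = np.zeros((dim, dim))   # projects the branch output back

    def __call__(self, x: np.ndarray, control: np.ndarray) -> np.ndarray:
        hidden = np.tanh((x + control @ self.zero_in) @ self.copy_weight)
        return hidden @ self.zero_out

def controlled_forward(base, branch, x, control, scale=1.0):
    # The base output is untouched; the branch only adds a residual on top.
    return base(x) + scale * branch(x, control)

dim = 4
base = FrozenBlock(dim)
branch = ControlBranch(base, dim)
x = rng.normal(size=(1, dim))
control = rng.normal(size=(1, dim))

# With zero-initialized projections the branch adds exactly nothing.
before = controlled_forward(base, branch, x, control)

# Pretend training has moved the projections away from zero.
branch.zero_in = 0.1 * np.eye(dim)
branch.zero_out = 0.1 * np.eye(dim)
after = controlled_forward(base, branch, x, control)
```

In this sketch `before` equals `base(x)`, while `after` differs from it. The base block is never modified, so a different control branch could be swapped in without touching the base.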
Why it matters
ControlNet bridges the gap between "describe it and hope" and exact, repeatable composition. It enables consistent poses, faithful re-renders of a sketch, and architecture that matches a floor plan. For workflows that need the same layout across many variations (product shots in a fixed frame, characters in matching poses, or a redesign that keeps the original geometry), it turns generation from a lottery into a controllable tool.
In eaxy
eaxy focuses on fast prompt-driven generation and style packs rather than manual control rigs, so ControlNet-style structural conditioning is a power-user concept worth knowing when you compare tools. Understanding it helps explain why clear prompts and reference-led workflows produce more predictable composition.
Frequently asked questions
What does ControlNet do?
It lets you steer an image model with a structural guide instead of words alone. Feed it a pose skeleton, an edge sketch, or a depth map, and the generated image follows that geometry while the prompt fills in style and detail.
How is ControlNet different from a normal prompt?
A prompt describes what you want; ControlNet enforces where things go. Prompts struggle to pin down exact composition or pose, while ControlNet locks the structure to a reference image so the layout is predictable.
What kinds of control does ControlNet support?
Common modes include Canny edges, scribbles, human pose (OpenPose), depth maps, normal maps, and segmentation maps. Each constrains a different aspect — outlines, body position, 3D layout, or regions.