TLDR: Researchers introduced two new ControlNet modules for Flow Matching-based diffusion models: a Proportion ControlNet that uses bounding boxes to set object placement and scale, and a Perspective ControlNet that uses vanishing lines to define 3D scene geometry. Trained with fully automated data pipelines, the modules give artists higher-level control over image generation. Each works well for its respective task (especially 1- and 2-point perspectives), but both show limitations with complex constraints like 3-point perspectives and require careful guidance-strength management when used together.
Modern text-to-image diffusion models have made incredible strides in generating realistic and complex images. However, artists and creators often face a significant challenge: precisely controlling the spatial arrangement and geometric structure of the elements within these generated images. While a simple text prompt can guide the overall scene, it offers limited fine-grained control over where objects appear or how perspective is rendered.
Addressing this limitation, a new research paper introduces two specialized ControlNet modules designed to give artists more intuitive and high-level control over image generation. These modules, the Proportion ControlNet and the Perspective ControlNet, extend the capabilities of Flow Matching-based diffusion models like FLUX.1-dev, allowing for more deliberate artistic expression.
Proportion Control with Bounding Boxes
The Proportion ControlNet empowers users to dictate the placement and scale of objects using simple bounding boxes. Unlike regional prompting, which assigns different text prompts to specific masked areas, this method uses a single global prompt. The bounding boxes merely define the regions where elements described in the global prompt should appear, giving the model creative freedom to interpret and fill those spaces. This approach is also distinct from low-level controllers like Canny or LineArt, which focus on exact contours. Bounding boxes offer a higher level of abstraction, defining the semantic space an object should occupy rather than its precise shape, making it easier for artists to apply compositional rules like the rule of thirds without needing detailed outlines.
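For readers who want to experiment, here is a minimal sketch of how bounding-box conditioning might be wired up with Hugging Face diffusers' FLUX ControlNet classes. The checkpoint ID `org/proportion-controlnet`, the box coordinates, and the exact control-image format (white outlines on black) are illustrative assumptions, not details confirmed by the paper.

```python
# Sketch: rasterize bounding boxes into a control image and run a FLUX
# ControlNet pipeline with a single global prompt. Repo ID is a placeholder.
import torch
from PIL import Image, ImageDraw
from diffusers import FluxControlNetModel, FluxControlNetPipeline

def boxes_to_control_image(boxes, size=(1024, 1024)):
    """Draw each (x0, y0, x1, y1) box as a white outline on a black canvas."""
    canvas = Image.new("RGB", size, "black")
    draw = ImageDraw.Draw(canvas)
    for x0, y0, x1, y1 in boxes:
        draw.rectangle([x0, y0, x1, y1], outline="white", width=4)
    return canvas

controlnet = FluxControlNetModel.from_pretrained(
    "org/proportion-controlnet",  # hypothetical checkpoint ID
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
).to("cuda")

# One global prompt; the boxes only say *where* elements should land,
# not what each box contains.
control = boxes_to_control_image([(80, 600, 420, 980), (550, 120, 950, 520)])
image = pipe(
    prompt="a lighthouse on a cliff and a sailboat at sea, oil painting",
    control_image=control,
    controlnet_conditioning_scale=0.8,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("proportion_controlled.png")
```

Note the contrast with regional prompting: there is no per-box text, so the model remains free to decide which element of the global prompt fills which region.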
Perspective Control with Vanishing Lines
For defining the 3D geometry and viewpoint of a scene, the Perspective ControlNet utilizes vanishing lines. The researchers found that while vanishing points mathematically define perspective, they are problematic as conditioning inputs because they can be at infinity or far outside the image canvas, making them imprecise and difficult for users to manipulate. Vanishing lines, on the other hand, are intuitive for artists to draw, mimicking the natural sketching process of defining convergence. Crucially, these lines are always contained within the canvas, providing a direct, unambiguous, and spatially grounded proxy for vanishing points, thus offering a more robust and user-friendly input for perspective control.
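The appeal of lines over points is easy to see in code: even when the vanishing point sits far outside the frame, the rays drawn through it are clipped to the canvas and remain a valid conditioning image. The helper below is an illustrative sketch; the function name and rendering details are assumptions, not the paper's implementation.

```python
# Sketch: convert a (possibly off-canvas) vanishing point into on-canvas
# vanishing lines suitable as a conditioning image.
import math
from PIL import Image, ImageDraw

def vanishing_lines(vp, size=(1024, 1024), n_lines=8, width=3):
    """Draw n_lines rays through the vanishing point `vp`.

    `vp` may lie far outside the canvas; PIL clips each segment to the
    image bounds, so the conditioning input always stays within the frame.
    """
    canvas = Image.new("RGB", size, "black")
    draw = ImageDraw.Draw(canvas)
    vx, vy = vp
    reach = 4 * max(size)  # long enough to cross the whole canvas
    for i in range(n_lines):
        angle = 2 * math.pi * i / n_lines
        end = (vx + reach * math.cos(angle), vy + reach * math.sin(angle))
        draw.line([vp, end], fill="white", width=width)
    return canvas

# A vanishing point well outside the frame still yields in-frame lines.
control = vanishing_lines(vp=(2400, 512))
control.save("perspective_condition.png")
```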
Automated Data Pipelines for Training
To train these specialized ControlNets, the researchers developed fully automated data pipelines. For the Proportion ControlNet, the pipeline processed the WikiArt dataset, filtering images for aesthetic quality, then using Florence-2 for captioning and Grounding DINO for detecting object bounding boxes. The Perspective ControlNet’s pipeline processed a subset of OpenImages v7, also with aesthetic filtering, and employed a 2-Line Exhaustive Search algorithm to identify images with strong perspective structures, followed by Florence-2 for captioning. Notably, the resulting perspective dataset was heavily skewed towards 1-point perspectives.
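A rough sketch of the two annotation steps (captioning, then box detection) using off-the-shelf checkpoints is shown below; the specific models, thresholds, and filtering heuristics in the paper's pipeline may differ, and the example labels stand in for noun phrases that would be parsed from the caption.

```python
# Sketch: caption an image with Florence-2, then ground objects with
# Grounding DINO via the transformers zero-shot-object-detection pipeline.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, pipeline

image = Image.open("wikiart_sample.jpg").convert("RGB")

# 1) Caption with Florence-2 (task-prompt interface; trust_remote_code needed).
fl_proc = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True)
fl_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True)
inputs = fl_proc(text="<CAPTION>", images=image, return_tensors="pt")
ids = fl_model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=64,
)
caption = fl_proc.batch_decode(ids, skip_special_tokens=True)[0]

# 2) Detect bounding boxes for objects mentioned in the caption.
detector = pipeline("zero-shot-object-detection",
                    model="IDEA-Research/grounding-dino-tiny")
detections = detector(image,
                      candidate_labels=["boat", "lighthouse"],  # parsed from caption in practice
                      threshold=0.3)
boxes = [d["box"] for d in detections]  # training targets for the ControlNet
print(caption, boxes)
```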
Experimental Insights and Limitations
Experiments demonstrated that the Proportion ControlNet effectively adheres to bounding box constraints and even showed an emergent ability to interpret non-rectangular shapes as proportional guides, likely due to its LineArt initialization. However, training on WikiArt introduced a “pictorial” style bias that intensified with higher ControlNet guidance strength.
The Perspective ControlNet successfully generated scenes respecting 1- and 2-point perspectives. A notable limitation was its consistent failure to render 3-point perspectives, often ignoring vertical convergence—a problem attributed to the skewed training data. The model also exhibited a strong prior for straight horizons, requiring explicit textual prompting (e.g., “Top view”) to achieve non-standard views like Dutch angles.
When attempting to use both ControlNets simultaneously, the researchers observed that optimal guidance strengths for individual modules led to image degradation, including severe color artifacts and “mushy” textures. Stable generation required reducing the guidance strength of each module to approximately 0.5, achieving partial adherence to both constraints, though robust and precise combined control proved challenging.
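Combining the two modules at the reduced ~0.5 strengths the authors found stable might look like the hedged sketch below, using diffusers' multi-ControlNet wrapper. The repo IDs are placeholders, and the conditioning images are the ones produced by the earlier sketches.

```python
# Sketch: stack both ControlNets with FluxMultiControlNetModel, keeping each
# conditioning scale near 0.5 since full strength degraded the output.
import torch
from PIL import Image
from diffusers import (FluxControlNetModel, FluxControlNetPipeline,
                       FluxMultiControlNetModel)

proportion = FluxControlNetModel.from_pretrained(
    "org/proportion-controlnet", torch_dtype=torch.bfloat16)   # hypothetical
perspective = FluxControlNetModel.from_pretrained(
    "org/perspective-controlnet", torch_dtype=torch.bfloat16)  # hypothetical

pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=FluxMultiControlNetModel([proportion, perspective]),
    torch_dtype=torch.bfloat16,
).to("cuda")

box_image = Image.open("proportion_controlled_condition.png")   # from the box sketch
line_image = Image.open("perspective_condition.png")            # from the line sketch

image = pipe(
    prompt="a train station hall at golden hour",
    control_image=[box_image, line_image],     # one condition per module
    controlnet_conditioning_scale=[0.5, 0.5],  # reduced to avoid artifacts
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
```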
This work represents a significant step towards providing artists with more sophisticated and intuitive tools for controlling generative AI models. The researchers conclude that future work should focus on improving data diversity, potentially through synthetic generation from 3D scenes, to overcome current limitations. Both models are openly available on HuggingFace for wider access and experimentation. You can read the full research paper for more details here.


