TL;DR: AutoOpt is a new framework that automates solving mathematical optimization problems directly from images. It leverages AutoOpt-11k, a large dataset of handwritten and printed mathematical models. The framework consists of three modules: M1 converts images to LaTeX, M2 converts the LaTeX to a PYOMO script, and M3 solves the problem using a hybrid Bilevel Optimization based Decomposition (BOBD) method. AutoOpt achieves high accuracy, significantly reducing human intervention in complex optimization tasks, and its dataset and code are publicly available.
The world of mathematical optimization, crucial for everything from business logistics to engineering design, often involves complex problems presented in various formats – from handwritten notes on a whiteboard to figures in academic papers. Traditionally, converting these visual representations into a machine-readable format for solving has been a tedious, manual process. A new study introduces AutoOpt, an innovative framework designed to automate this entire workflow, allowing optimization problems to be solved directly from their image-based formulations.
Introducing AutoOpt-11k: A Comprehensive Dataset
Central to the AutoOpt framework is AutoOpt-11k, a unique and extensive image dataset. This dataset comprises over 11,000 images of mathematical optimization models, meticulously collected and curated. It includes a diverse mix of both handwritten and printed formulations, capturing a wide spectrum of problem complexities such as non-linearity, multi-objective functions, multi-level structures, and stochastic elements. Each image in AutoOpt-11k is accompanied by its corresponding LaTeX representation, a standard for mathematical typesetting, and a subset also includes a PYOMO script, a popular optimization modeling language. This rich dataset was developed by 25 experts and underwent a rigorous two-phase verification process to ensure high accuracy and reliability.
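To make the dataset's structure concrete, here is a hypothetical sketch of what a single AutoOpt-11k record might look like. The field names (`image_path`, `latex`, `pyomo_script`, and so on) are illustrative assumptions, not the dataset's actual schema; they simply reflect the components described above: an image, its LaTeX label, an optional PYOMO script, and problem-property annotations.

```python
# Hypothetical sketch of one AutoOpt-11k record.
# Field names are illustrative assumptions, not the dataset's actual schema.
record = {
    "image_path": "images/handwritten_0001.png",  # scan or photo of the formulation
    "latex": r"\min_{x} \; x_1^2 + x_2^2 \quad \text{s.t.} \quad x_1 + x_2 \geq 1",
    "pyomo_script": None,        # only a subset of records include a PYOMO script
    "handwritten": True,         # handwritten vs. printed formulation
    "properties": ["nonlinear"], # e.g. multi-objective, multi-level, stochastic
}
```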
The Three Pillars of AutoOpt Framework
The AutoOpt framework operates through three integrated modules, each performing a specialized task in sequence:
Module M1: Image to LaTeX Code Generation
The first module, M1, focuses on Mathematical Expression Recognition (MER). It takes an image of an optimization formulation as input and converts it into LaTeX code. The researchers developed a hybrid deep learning architecture for this module, combining the strengths of ResNet and Swin Transformer models. This design allows M1 to capture both local visual patterns (like symbol shapes) and long-range dependencies (like the spatial layout of superscripts and fractions). Notably, this module has demonstrated superior performance compared to existing state-of-the-art tools, including large language models like ChatGPT and Gemini as well as the specialized document-OCR model Nougat, on accuracy metrics such as BLEU score and Character Error Rate.
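The Character Error Rate (CER) used to evaluate M1 has a standard definition: edit distance between the predicted and reference strings, divided by the reference length. The sketch below implements that standard metric in plain Python; it is not the paper's evaluation code, just the usual formulation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def character_error_rate(predicted: str, reference: str) -> float:
    """CER = edit distance / reference length; lower is better."""
    return levenshtein(predicted, reference) / max(len(reference), 1)

# A prediction that drops the final brace of an 11-character reference
# incurs one edit, so CER = 1/11.
print(character_error_rate(r"\frac{a}{b", r"\frac{a}{b}"))  # ≈ 0.0909
```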
Module M2: LaTeX to PYOMO Script Generation
Once the LaTeX code is generated by M1, the second module, M2, takes over. Its role is to translate the LaTeX code into a PYOMO script, which is a Python-based modeling language that optimization solvers can understand. This module is powered by a fine-tuned causal decoder-only transformer model, specifically DeepSeek-Coder 1.3B, chosen for its strong code generation capabilities and efficiency. The decision to use a two-stage approach (Image to LaTeX, then LaTeX to PYOMO) was strategic, allowing for easier verification of the intermediate LaTeX output and enhancing the overall reliability of the code generation process.
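The translation M2 performs can be illustrated with a toy example. The real M2 is a fine-tuned DeepSeek-Coder 1.3B model, not a template; the hard-coded function below only shows the *kind* of output it produces for one fixed problem, using PYOMO's actual `ConcreteModel` API.

```python
# Toy illustration of the LaTeX -> PYOMO step (module M2). The real M2 is a
# fine-tuned DeepSeek-Coder 1.3B model; this hard-coded template merely shows
# the shape of the translation for one fixed example problem.
def latex_to_pyomo(latex: str) -> str:
    """Emit a PYOMO script for the fixed example:
    \\min_{x,y} (x-2)^2 + (y-1)^2  s.t.  x + y >= 1
    (the latex argument is ignored in this sketch)."""
    return "\n".join([
        "from pyomo.environ import ConcreteModel, Var, Objective, Constraint, minimize",
        "",
        "model = ConcreteModel()",
        "model.x = Var()",
        "model.y = Var()",
        "model.obj = Objective(expr=(model.x - 2)**2 + (model.y - 1)**2, sense=minimize)",
        "model.c1 = Constraint(expr=model.x + model.y >= 1)",
    ])

script = latex_to_pyomo(r"\min_{x,y} (x-2)^2 + (y-1)^2 \quad \text{s.t.} \quad x + y \geq 1")
print(script)
```

The two-stage design pays off exactly here: a human (or a checker) can verify the intermediate LaTeX before any PYOMO code is generated.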
Module M3: Optimization Using a Hybrid Method
The final module, M3, is responsible for solving the optimization problem described in the PYOMO script. For this crucial step, AutoOpt employs a Bilevel Optimization based Decomposition (BOBD) method. This hybrid approach combines classical mathematical programming techniques with metaheuristics, such as genetic algorithms. By decomposing complex problems into a bilevel structure, BOBD can efficiently tackle a wide range of optimization challenges, including those with non-convexity, non-linearity, and high dimensionality. The BOBD method has been shown to yield better results on complex test problems than common standalone approaches like interior-point algorithms and genetic algorithms.
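The hybrid idea behind BOBD can be sketched on a toy bilevel problem. In the minimal sketch below (which is *not* the paper's BOBD implementation), a simple genetic algorithm handles the upper-level variable while the lower-level subproblem is solved exactly, here in closed form, mirroring the metaheuristic-plus-classical split described above.

```python
import random

# Minimal sketch in the spirit of BOBD (not the paper's implementation):
# a genetic algorithm searches the upper-level variable x, while the
# lower-level problem is solved exactly for each candidate x.
# Toy problem:  min_x (x - 1)^2 + (y*(x) - x)^2,
# where y*(x) = argmin_y (y - x)^2, which has the closed form y* = x.

def lower_level(x: float) -> float:
    """Classical step: the inner problem is solved exactly (here, y* = x)."""
    return x

def fitness(x: float) -> float:
    y = lower_level(x)
    return (x - 1.0) ** 2 + (y - x) ** 2

def genetic_upper_level(pop_size=30, generations=60, lo=-5.0, hi=5.0, seed=0):
    """Metaheuristic step: a simple real-coded GA over the upper-level variable."""
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]                     # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)                  # pick two parents
            child = 0.5 * (a + b) + rng.gauss(0.0, 0.1)  # crossover + mutation
            children.append(min(max(child, lo), hi))     # clip to bounds
        pop = elite + children
    best = min(pop, key=fitness)
    return best, fitness(best)

best_x, best_f = genetic_upper_level()
print(f"x* ≈ {best_x:.3f}, f(x*) ≈ {best_f:.4f}")  # the true optimum is x = 1, f = 0
```

Because the inner problem is dispatched to an exact solver, the metaheuristic only has to explore the (much smaller) upper-level space, which is the core efficiency argument for this kind of decomposition.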
Performance and Future Outlook
The AutoOpt framework demonstrates impressive performance. In module-level evaluations, the Image-to-LaTeX module achieved a reliability of 97.14%, and the LaTeX-to-PYOMO module achieved 91.75%. When the complete pipeline (M1-M2-M3) was evaluated on 500 sample problems outside its training dataset, it achieved an overall success rate of 94.20%. This high success rate underscores the framework’s potential to significantly reduce human intervention in solving complex optimization tasks.
This research marks a substantial contribution to the field by bridging the gap between visual problem representations and automated solutions. The public release of the AutoOpt-11k dataset and the AutoOpt framework is expected to catalyze further research and innovation at the intersection of computer vision, natural language processing, and mathematical optimization. Future work will explore handling ill-defined problems and formulations spanning multiple pages. For more details, refer to the original research paper.