Please use this identifier to cite or link to this item: doi:10.22028/D291-42655
Title: Efficient and differentiable combinatorial optimization for visual computing
Author(s): Abbas, Ahmed
Language: English
Year of Publication: 2024
DDC notations: 004 Computer science, internet
500 Science
600 Technology
Publikation type: Dissertation
Abstract: Many visual computing tasks involve reasoning over structured domains and discrete objects which can be modeled as combinatorial optimization (CO) problems. Tremendous speed-up in general-purpose CO solvers has allowed to solve these problems in many cases despite being NP-hard. Approaching large-scale structured prediction problems from a combinatorial optimization standpoint however, has been a challenge due to a variety of reasons. These include near real-time performance requirements and lack of differentiability of CO approaches. The latter causes difficulties in harnessing machine learning to specify problem specifications and in developing learnable CO algorithms. We aim to address these shortcomings on multiple avenues. First, we focus on a specific CO problem for clustering known as the multicut problem with many applications in a variety of visual computing tasks. The multicut problem is defined on a weighted graph where edge weights encode preferences of the endpoints belonging to the same or different clusters. We devise methods for reducing human effort in multicut model specification, (i.e., edge weights and graph structure) and for reducing high compute times associated with large problem sizes. For inferring the edge weights, we utilize neural networks and propose strategies for improving gradient estimation through the multicut problem. This allows to train the neural network by imposing a loss on the clustering produced by the multicut problem. Our approach improves performance on large-scale panoptic segmentation tasks. As a second step, we focus on the challenge of graph structure design. We propose a compact formulation of the multicut problem on the complete graph along with efficient algorithms, sidestepping the problem of graph selection. Our approach yields better panoptic segmentation performance as compared to the one obtained by a hand-designed graph structure. Lastly, we address scalability issues of the multicut problem. We devise a massively parallelizable GPU-friendly algorithm resulting in more than an order of magnitude speed-up over existing sequential methods. As a second avenue, we shift our focus to general-purpose CO solvers and tackle challenges related to their scalability. As a first step, we devise parallel algorithms that can harness GPUs gaining up to an order of magnitude speed-up over sequential CPU counterparts. For the second step, we exploit machine learning in solving relaxations of CO problems. We employ differentiable solver primitives and use neural networks inside the solver to guide the solving process. Given a few problem instances from a task of interest, our general-purpose solver can be adapted by learning instead of task-specific solver development. Empirically our learned approach achieves significantly better performance than its non-learned version, better solutions than task-specific solvers, and exhibits better anytime performance compared to a commercial solver (Gurobi). Our method is also applicable to some problems unrelated to structured prediction and offers competitive performance to the commercial solver. In summary, we study methods to make CO for visual computing more practical by devising differentiable, massively parallel, and data-driven methods.
Link to this record: urn:nbn:de:bsz:291--ds-426550
hdl:20.500.11880/38409
http://dx.doi.org/10.22028/D291-42655
Advisor: Swoboda, Paul
Date of oral examination: 2-Aug-2024
Date of registration: 11-Sep-2024
Faculty: MI - Fakultät für Mathematik und Informatik
Department: MI - Informatik
Professorship: MI - Keiner Professur zugeordnet
Collections:SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:
File Description SizeFormat 
thesis_ahmed_comb_opt_visual_computing.pdfDissertation3,15 MBAdobe PDFView/Open


Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.