Bitte benutzen Sie diese Referenz, um auf diese Ressource zu verweisen:
doi:10.22028/D291-42655
Titel: | Efficient and differentiable combinatorial optimization for visual computing |
VerfasserIn: | Abbas, Ahmed |
Sprache: | Englisch |
Erscheinungsjahr: | 2024 |
DDC-Sachgruppe: | 004 Informatik 500 Naturwissenschaften 600 Technik |
Dokumenttyp: | Dissertation |
Abstract: | Many visual computing tasks involve reasoning over structured domains and discrete objects which can be modeled as combinatorial optimization (CO) problems. Tremendous speed-up in general-purpose CO solvers has allowed to solve these problems in many cases despite being NP-hard. Approaching large-scale structured prediction problems from a combinatorial optimization standpoint however, has been a challenge due to a variety of reasons. These include near real-time performance requirements and lack of differentiability of CO approaches. The latter causes difficulties in harnessing machine learning to specify problem specifications and in developing learnable CO algorithms. We aim to address these shortcomings on multiple avenues. First, we focus on a specific CO problem for clustering known as the multicut problem with many applications in a variety of visual computing tasks. The multicut problem is defined on a weighted graph where edge weights encode preferences of the endpoints belonging to the same or different clusters. We devise methods for reducing human effort in multicut model specification, (i.e., edge weights and graph structure) and for reducing high compute times associated with large problem sizes. For inferring the edge weights, we utilize neural networks and propose strategies for improving gradient estimation through the multicut problem. This allows to train the neural network by imposing a loss on the clustering produced by the multicut problem. Our approach improves performance on large-scale panoptic segmentation tasks. As a second step, we focus on the challenge of graph structure design. We propose a compact formulation of the multicut problem on the complete graph along with efficient algorithms, sidestepping the problem of graph selection. Our approach yields better panoptic segmentation performance as compared to the one obtained by a hand-designed graph structure. Lastly, we address scalability issues of the multicut problem. We devise a massively parallelizable GPU-friendly algorithm resulting in more than an order of magnitude speed-up over existing sequential methods. As a second avenue, we shift our focus to general-purpose CO solvers and tackle challenges related to their scalability. As a first step, we devise parallel algorithms that can harness GPUs gaining up to an order of magnitude speed-up over sequential CPU counterparts. For the second step, we exploit machine learning in solving relaxations of CO problems. We employ differentiable solver primitives and use neural networks inside the solver to guide the solving process. Given a few problem instances from a task of interest, our general-purpose solver can be adapted by learning instead of task-specific solver development. Empirically our learned approach achieves significantly better performance than its non-learned version, better solutions than task-specific solvers, and exhibits better anytime performance compared to a commercial solver (Gurobi). Our method is also applicable to some problems unrelated to structured prediction and offers competitive performance to the commercial solver. In summary, we study methods to make CO for visual computing more practical by devising differentiable, massively parallel, and data-driven methods. |
Link zu diesem Datensatz: | urn:nbn:de:bsz:291--ds-426550 hdl:20.500.11880/38409 http://dx.doi.org/10.22028/D291-42655 |
Erstgutachter: | Swoboda, Paul |
Tag der mündlichen Prüfung: | 2-Aug-2024 |
Datum des Eintrags: | 11-Sep-2024 |
Fakultät: | MI - Fakultät für Mathematik und Informatik |
Fachrichtung: | MI - Informatik |
Professur: | MI - Keiner Professur zugeordnet |
Sammlung: | SciDok - Der Wissenschaftsserver der Universität des Saarlandes |
Dateien zu diesem Datensatz:
Datei | Beschreibung | Größe | Format | |
---|---|---|---|---|
thesis_ahmed_comb_opt_visual_computing.pdf | Dissertation | 3,15 MB | Adobe PDF | Öffnen/Anzeigen |
Alle Ressourcen in diesem Repository sind urheberrechtlich geschützt.