Learning and exploiting temporal dependencies in the synthesis and analysis of video signals

Fox, Gereon

Please use this identifier to cite or link to this item: doi:10.22028/D291-46216

Title:	Learning and exploiting temporal dependencies in the synthesis and analysis of video signals
Author(s):	Fox, Gereon
Language:	English
Year of Publication:	2025
Place of publication:	Saarbrücken
DDC notations:	004 Computer science, internet; 600 Technology
Publication type:	Dissertation
Abstract:	The acquisition, reproduction, analysis and modification of visual information are important in all parts of human life, even more so since the advent of sufficiently capable computers. The computational treatment of the temporal dimension is especially challenging, but also beneficial for many applications. This thesis explores the temporal dimension in three different contexts. For the detection of semantically relevant manipulations, it demonstrates that previous detection methods can be fooled by the same improvements to the manipulation technique that would fool human observers. New methods are presented that nevertheless achieve high detection accuracy, and temporal dependencies in particular are shown to help generalise to unseen manipulation methods. For the synthesis of new video signals, previous work has constructed models that entangle spatial and temporal features. This thesis separates these features, reducing memory demand and computation time, as well as the amount of data necessary for training. For the reconstruction of video signals from event data, a data modality for which training data is scarce, the thesis presents a method that turns event data into watchable signals without using any training data at all, yet outperforms previous methods that do so. In each of these contexts, the thesis highlights the degree to which solutions depend on training sets of different sizes, and the impact this has on performance and computational cost.
Abstract (translated from German):	The acquisition, reproduction, analysis and modification of visual information are important in all areas of human life, particularly since powerful computers became available. The temporal dimension above all is computationally challenging, but also rewarding for many applications. This thesis examines that dimension in three different contexts. For the detection of semantically relevant manipulations, it shows that manipulations which reliably deceive human observers also mislead previous machine detectors. New detectors are introduced whose modelling of temporal dependencies makes them more robust to unseen manipulations. For the synthesis of new video signals, previous work has modelled spatial and temporal relationships in an interwoven way. This thesis separates these dimensions and thereby reduces memory demand, computation time and the need for training data. For the reconstruction of video signals from event data, training data is difficult to obtain. The thesis reconstructs video signals from event data better than previous methods, without requiring training data. For all three tasks, the thesis examines the need for training datasets of different sizes, as well as the resulting impact on output quality and resource consumption.
Link to this record:	urn:nbn:de:bsz:291--ds-462169; hdl:20.500.11880/40623; http://dx.doi.org/10.22028/D291-46216
Advisor:	Theobalt, Christian; Herfet, Thorsten
Date of oral examination:	26-Aug-2025
Date of registration:	29-Sep-2025
Faculty:	MI - Fakultät für Mathematik und Informatik
Department:	MI - Informatik
Professorship:	MI - Prof. Dr. Christian Theobalt
Collections:	SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:

File	Description	Size	Format
thesis_submit_final.pdf	Complete thesis	64.22 MB	Adobe PDF
