DISSERTATION DEFENSE
Author: Canyu Zhang
Advisor: Dr. Song Wang
Date: Sep 12th, 2025
Time: 1:00 pm
Place: Rm 2277, Storey Innovation Building
Abstract
Image reconstruction aims to restore corrupted images and recover visual content that is missing or degraded. Such degradation may arise from low resolution, occluded or masked regions, or shadow interference. The problem has become an increasingly important research topic, as people encounter and rely on visual information in nearly every aspect of daily life. Neural network–based approaches have recently emerged as highly effective solutions for this task; in particular, convolutional neural networks and transformer-based architectures have demonstrated remarkable success in producing visually convincing reconstructions. However, these models remain subject to several constraints, one of the most significant being their inability to generate outputs of arbitrary size: most existing approaches can only process inputs and produce outputs of fixed dimensions, which restricts their flexibility and limits their applicability in real-world scenarios.
Recently, implicit neural function–based methods have been proposed for image processing tasks. Implicit neural representation (INR) provides a powerful framework for mapping discrete data into continuous representations, enabling flexibility and generalization. INR-based approaches have demonstrated significant progress in tasks such as image super-resolution, generation, and semantic segmentation. A key advantage of these methods is their ability to achieve super-resolution for images of arbitrary sizes. Despite these advancements, current INR-based techniques face several challenges. First, they primarily focus on intact images, making them less effective in scenarios involving damaged or incomplete regions. Second, many methods emphasize the continuous representation of pixel color while neglecting the rich semantic information embedded within images. Finally, training these models often requires large numbers of paired datasets, which are particularly difficult to obtain for tasks such as image de-shadowing due to the scarcity of labeled data. In this dissertation, we propose three novel approaches to address these limitations.
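The core idea behind INR can be illustrated with a minimal sketch: a small network maps continuous pixel coordinates to color values, so the same representation can be sampled on a grid of any size. The network below uses random, untrained weights purely for illustration; it is not the architecture proposed in this dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny coordinate-based network (random weights, untrained): maps a
# continuous 2-D coordinate to an RGB value. Because the input domain
# is continuous, the representation can be queried at any resolution.
W1 = rng.normal(size=(2, 64))
b1 = np.zeros(64)
W2 = rng.normal(size=(64, 3))
b2 = np.zeros(3)

def inr(coords):
    """coords: (N, 2) array of (x, y) in [0, 1]^2 -> (N, 3) RGB in (0, 1)."""
    h = np.tanh(coords @ W1 + b1)             # hidden features per coordinate
    return 1 / (1 + np.exp(-(h @ W2 + b2)))   # sigmoid keeps colors in range

def render(height, width):
    """Sample the continuous representation on an arbitrary pixel grid."""
    ys, xs = np.meshgrid(np.linspace(0, 1, height),
                         np.linspace(0, 1, width), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1)
    return inr(coords).reshape(height, width, 3)

# The same representation yields images of any requested size,
# which is the property arbitrary-scale super-resolution exploits.
low = render(16, 16)      # (16, 16, 3)
high = render(128, 128)   # (128, 128, 3)
```

In an actual INR pipeline the weights (or a latent feature grid conditioning the network) are fit to the input image; the sketch only shows why the output size is decoupled from the representation.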
To fully exploit the potential of implicit neural representations (INRs) for processing damaged images, we introduce a novel task, SuperInpaint, which aims to reconstruct missing regions in low-resolution images while generating complete outputs at arbitrary higher resolutions. To address the second limitation, we develop the Semantic-Aware Implicit Representation (SAIR) framework. This approach augments the implicit representation of each pixel by jointly encoding both appearance and semantic information, thereby strengthening the model’s ability to capture fine-grained details as well as broader contextual structures. Building on the proven success of INR in image reconstruction, we further extend its applicability to the image de-shadowing task. To this end, we propose a specialized method designed to remove shadows while faithfully preserving the underlying image content. A critical challenge with existing INR-based methods lies in their dependence on large training datasets, which are particularly scarce for de-shadowing due to the limited availability of labeled data. To overcome this issue, we adopt a pre-training strategy, where the model is initially trained on large-scale image inpainting datasets. This enables the network to acquire strong reconstruction priors, which are subsequently transferred and fine-tuned for the shadow removal task.
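The idea of augmenting a pixel's implicit representation with semantic information can be sketched as follows. All names, feature sizes, and the single-layer decoder are hypothetical placeholders chosen for illustration, not the actual SAIR design: the point is only that the per-pixel code concatenates coordinate, appearance, and semantic cues before decoding.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-pixel inputs: an appearance feature (e.g. from an
# image encoder) and a semantic feature (e.g. from a segmentation
# backbone), alongside the pixel's continuous coordinate.
num_pixels = 4
coords = rng.uniform(size=(num_pixels, 2))     # continuous (x, y)
appearance = rng.normal(size=(num_pixels, 8))  # texture / color cues
semantic = rng.normal(size=(num_pixels, 4))    # class / context cues

# Semantic-aware implicit code: concatenate all three sources and decode
# jointly into an RGB value (random, untrained weights for illustration).
W = rng.normal(size=(2 + 8 + 4, 3))

def decode(coords, appearance, semantic):
    z = np.concatenate([coords, appearance, semantic], axis=1)
    return 1 / (1 + np.exp(-(z @ W)))          # sigmoid -> RGB in (0, 1)

rgb = decode(coords, appearance, semantic)     # (num_pixels, 3)
```

Because the semantic feature enters the code directly, the decoder can distinguish pixels that look similar in appearance but belong to different objects, which is what supports reconstruction of both fine detail and broader context.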
Our proposed methods present a thorough and effective response to the identified challenges, establishing a robust framework that can be applied across a diverse spectrum of image processing tasks. By systematically addressing the key limitations of existing approaches, we expand both the flexibility and scalability of implicit neural representations, thereby enabling them to manage complex scenarios with higher levels of accuracy, robustness, and efficiency.