Monday, June 23, 2025 - 10:00 am
Online

DISSERTATION DEFENSE

Author: Ruwan Tharanga Wickramarachchige Don
Advisor: Dr. Amit Sheth
Date: June 23rd, 2025
Time: 10:00 am
Place: AI Institute Seminar room
Zoom Link / Online Access: Join Zoom Meeting
https://sc-edu.zoom.us/j/89344836465?pwd=5sm3lb06ESCU8kcFmNhBWKLL8MnwhF…


Meeting ID: 893 4483 6465

Passcode: 180289


Abstract


Scene understanding remains a central challenge in machine perception for autonomous systems. It requires the integration of multiple sources of information, background knowledge, and heterogeneous sensor data to perceive, interpret, and reason about both the physical and semantic aspects of dynamic environments. Current approaches to scene understanding rely primarily on computer vision and deep learning models that operate directly on raw sensor data to perform tasks such as object detection, recognition, and localization. However, in real-world domains such as autonomous driving and smart manufacturing (Industry 4.0), this sole reliance on raw perceptual data exposes limitations in safety, robustness, generalization, and explainability. To address these challenges, this dissertation proposes a novel perspective on scene understanding using a neurosymbolic AI approach that combines knowledge representation, representation learning, and reasoning to advance cognitive and visual reasoning in autonomous systems.

Our approach involves several key contributions. First, we introduce methods for constructing unified knowledge representations that integrate scene data with background knowledge. This includes the development of a dataset-agnostic scene ontology and the construction of knowledge graphs (KGs) to represent multimodal data from autonomous systems. Specifically, we introduce DSceneKG, a suite of large-scale KGs representing real-world driving scenes across multiple autonomous driving datasets. DSceneKG has already been utilized in several emerging neurosymbolic AI tasks, including explainable scene clustering and causal reasoning, and has been adopted for an industrial cross-modal retrieval task. Second, we propose methods to enhance the expressiveness of scene knowledge in sub-symbolic representations to support downstream learning tasks that rely on high-quality translation of KGs into embedding space. Our investigation identifies KG patterns and structures that enrich the semantics of KG embeddings, thereby improving model reasoning capabilities. Third, we introduce knowledge-based entity prediction (KEP), a novel cognitive visual reasoning task that leverages relational knowledge in KGs to predict entities that are not directly observed but are likely to exist given the scene context; we evaluate the effectiveness of this approach on two high-quality autonomous driving datasets. Fourth, we present CLUE, a context-based method for labeling unobserved entities, designed to improve annotation quality in existing multimodal datasets by incorporating contextual knowledge about entities that may be missing due to perceptual failures. Finally, by integrating these contributions, we introduce CUEBench, a benchmark for contextual entity prediction that systematically evaluates both neurosymbolic and foundation model-based approaches (i.e., large language models and multimodal language models). CUEBench fills a critical gap in current benchmarking by targeting high-level cognitive reasoning under perceptual incompleteness, reflecting real-world challenges faced by autonomous systems.
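
To give a concrete flavor of scene knowledge graphs and contextual entity prediction, the following is a minimal, illustrative Python sketch. The scene triples, entity names, and co-occurrence heuristic are hypothetical placeholders; they do not reflect the actual DSceneKG schema or the KEP models evaluated in the dissertation, which operate over KG embeddings rather than raw co-occurrence counts.

    # Minimal sketch: driving scenes as (subject, predicate, object) triples,
    # plus a toy context-based prediction of entities that are likely present
    # but unobserved. All data and the heuristic are illustrative only.

    from collections import Counter
    from itertools import combinations

    # Toy "knowledge graph": each triple links a scene node to an observed entity.
    scene_kg = [
        ("scene:001", "includes", "Pedestrian"),
        ("scene:001", "includes", "Crosswalk"),
        ("scene:001", "includes", "TrafficLight"),
        ("scene:002", "includes", "Crosswalk"),
        ("scene:002", "includes", "TrafficLight"),
        ("scene:003", "includes", "Pedestrian"),
        ("scene:003", "includes", "Crosswalk"),
    ]

    def entities_by_scene(triples):
        """Group observed entities under their scene identifier."""
        scenes = {}
        for subj, pred, obj in triples:
            if pred == "includes":
                scenes.setdefault(subj, set()).add(obj)
        return scenes

    def cooccurrence_counts(scenes):
        """Count how often pairs of entities appear in the same scene."""
        counts = Counter()
        for entities in scenes.values():
            for a, b in combinations(sorted(entities), 2):
                counts[(a, b)] += 1
                counts[(b, a)] += 1
        return counts

    def predict_unobserved(observed, scenes, top_k=2):
        """Rank entities not currently observed by how often they co-occur
        with the observed context across the knowledge graph."""
        counts = cooccurrence_counts(scenes)
        scores = Counter()
        for seen in observed:
            for (a, b), c in counts.items():
                if a == seen and b not in observed:
                    scores[b] += c
        return scores.most_common(top_k)

    if __name__ == "__main__":
        scenes = entities_by_scene(scene_kg)
        context = {"Crosswalk", "TrafficLight"}
        print(predict_unobserved(context, scenes))
        # e.g. [('Pedestrian', 3)] -- a pedestrian is plausible given this context.

The sketch only illustrates the shape of the task: scene context drives the prediction of plausible but unobserved entities, which is the reasoning pattern that KEP and CUEBench formalize and evaluate at scale.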