Wednesday, March 16, 2016 - 12:00 pm
Swearingen 3A75
DISSERTATION DEFENSE
Department of Computer Science and Engineering, University of South Carolina

Candidate: Yuewei Lin
Advisor: Dr. Song Wang
Date: March 16, 2016
Time: 12:00 P.M.
Place: Swearingen 3A75

Abstract

Interest detection is the task of detecting an object, event, or process that draws attention. In this work, we focus on interest detection in an image, in a video, and in multiple videos. Interest detection in a single image or video is closely related to visual attention; interest detection in multiple videos, however, must consider all the videos as a whole rather than the attention in each video independently.

We first introduce a new computational visual-attention model for detecting regions of interest in static images and videos. This model constructs a saliency map for each image and takes the region with the highest saliency value as the region of interest. Specifically, we use the Earth Mover's Distance (EMD) to measure the center-surround difference in the receptive field, and we apply two biologically inspired nonlinear operations to combine different features. We then extend the model to construct dynamic saliency maps from videos by computing the center-surround difference in the spatio-temporal receptive field.

Motivated by the natural relation between visual saliency and the object/region of interest, we next propose an algorithm to detect infrequently moving foreground objects, in which the saliency detection technique is used to separate the foreground (the object/region of interest) from the background.

Finally, we focus on locating the co-interest person in multiple temporally synchronized videos taken by wearable cameras. More specifically, we propose a co-interest detection algorithm that finds the person who draws attention from most camera wearers, even when multiple similar-looking persons are present in the videos.
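As a rough illustration of the center-surround EMD idea, the sketch below compares the histogram of a small center window against that of a larger surround window at each pixel, scoring the difference with the 1-D EMD (the L1 distance between cumulative histograms). The window radii, bin count, and single-channel feature are illustrative assumptions, not the dissertation's actual parameters.

```python
import numpy as np

def emd_1d(h1, h2):
    # 1-D Earth Mover's Distance between two normalized histograms:
    # the L1 distance between their cumulative distribution functions.
    return np.abs(np.cumsum(h1) - np.cumsum(h2)).sum()

def center_surround_saliency(feature, c=4, s=12, bins=8):
    # feature: 2-D array of one feature channel with values in [0, 1].
    # For each pixel, compare the histogram of a (2c+1)x(2c+1) center
    # window against a (2s+1)x(2s+1) surround window using EMD;
    # a large difference marks the pixel as salient.
    H, W = feature.shape
    sal = np.zeros((H, W))
    edges = np.linspace(0.0, 1.0, bins + 1)
    for y in range(s, H - s):
        for x in range(s, W - s):
            ctr = feature[y - c:y + c + 1, x - c:x + c + 1]
            sur = feature[y - s:y + s + 1, x - s:x + s + 1]
            hc, _ = np.histogram(ctr, bins=edges)
            hs, _ = np.histogram(sur, bins=edges)
            sal[y, x] = emd_1d(hc / hc.sum(), hs / hs.sum())
    return sal
```

A bright square on a dark background, for example, scores high inside the square (center and surround histograms disagree) and zero on the uniform background.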
We achieve this with a Conditional Random Field (CRF), taking each frame as a node and the detected persons as the states at that node.
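As a simplified single-chain sketch of such inference (the actual model couples multiple videos), Viterbi decoding can pick one detected person per frame. The unary scores (per-person attention evidence) and pairwise scores (appearance/position consistency between consecutive frames) are hypothetical inputs here, not the dissertation's actual potentials.

```python
import numpy as np

def viterbi_chain(unary, pairwise):
    # unary: list of length T; unary[t] holds one score per person
    # detected in frame t (higher = more likely the co-interest person).
    # pairwise: list of length T-1; pairwise[t][i, j] scores the
    # consistency of picking person i at frame t and person j at t+1.
    # Returns the highest-scoring state (person index) per frame.
    T = len(unary)
    score = [unary[0].astype(float)]
    back = []
    for t in range(1, T):
        trans = score[-1][:, None] + pairwise[t - 1]
        back.append(trans.argmax(axis=0))   # best predecessor per state
        score.append(trans.max(axis=0) + unary[t])
    path = [int(score[-1].argmax())]
    for t in range(T - 2, -1, -1):          # trace back the best path
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With a strong diagonal pairwise term, the decoder favors keeping the same person across frames even when a single frame's unary evidence momentarily points elsewhere.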