Visual object perception problems in computer vision


dc.contributor.advisor Raman, Shanmuganathan
dc.contributor.author Vora, Aditya Narendrabhai
dc.date.accessioned 2025-04-25T14:23:43Z
dc.date.available 2025-04-25T14:23:43Z
dc.date.issued 2017
dc.identifier.citation Vora, Aditya Narendrabhai (2017). Visual object perception problems in computer vision. Gandhinagar: Indian Institute of Technology Gandhinagar, 36p. (Acc. No.: T00223).
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/11348
dc.description.abstract Video object segmentation is the task of separating foreground object segments from the background throughout a video. We propose a frame-by-frame approach for video object segmentation that uses cluster information to select foreground segments. Unlike previous approaches for video object segmentation, which use optical flow to localize dynamic object segments throughout the video, we instead select a set of foreground segments from a pool of region proposals through clustering. Avoiding optical flow helps our algorithm scale up to longer video sequences. Object localization is the task of estimating precise windows around all object instances in an image. We propose an algorithm for object localization under the assumption that a single object instance appears in the image. Unlike previous supervised and weakly supervised techniques, which require heavy training to learn classifiers, our approach is completely unsupervised. It relies on iterative spectral clustering to select the proposals that contain an object from a large set of proposals generated by an object proposal generation algorithm. From this filtered set of object proposals, we then estimate the final localized window by considering the inter- and intra-class variations among the proposals, so that the entire pipeline remains unsupervised. Finally, we consider the design of a fully automated action recognition system for uncontrolled environments. Most existing algorithms construct handcrafted features from the input and then learn classifiers on those features. However, such handcrafted features are ineffective at modelling more complex scenes. Convolutional neural networks (CNNs) are a class of deep learning models that learn features automatically from the input during training. We design a 3D convolutional neural network for human action recognition. This model extracts features in the spatio-temporal domain and can therefore capture the motion information encoded in multiple contiguous frames, which video processing applications rely on.
dc.description.statementofresponsibility by Aditya Narendrabhai Vora
dc.format.extent 36p.: 29 cm.
dc.language.iso en_US
dc.publisher Indian Institute of Technology Gandhinagar
dc.subject 14410006
dc.subject Video Object Segmentation
dc.subject Spatio-Temporal Domain
dc.subject CNN (Convolutional Neural Network)
dc.subject Deep Learning Models
dc.subject Video Processing Applications
dc.title Visual object perception problems in computer vision
dc.type Thesis
dc.contributor.department Electrical Engineering
dc.description.degree M. Tech.


Files in this item

There are no files associated with this item.