Geometrical homogeneous clustering for image data reduction

dc.contributor.author Mody, Shril
dc.contributor.author Thakkar, Janvi
dc.contributor.author Joshi, Devvrat
dc.contributor.author Soni, Siddharth
dc.contributor.author Patil, Rohan
dc.contributor.author Batra, Nipun
dc.coverage.spatial United States of America
dc.date.accessioned 2022-09-07T13:49:27Z
dc.date.available 2022-09-07T13:49:27Z
dc.date.issued 2022-08
dc.identifier.citation Mody, Shril; Thakkar, Janvi; Joshi, Devvrat; Soni, Siddharth; Patil, Rohan and Batra, Nipun, "Geometrical homogeneous clustering for image data reduction", arXiv, Cornell University Library, DOI: arXiv:2208.13079, Aug. 2022. en_US
dc.identifier.uri https://arxiv.org/abs/2208.13079
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/8123
dc.description.abstract In this paper, we present novel variations of an earlier approach, the homogeneous clustering algorithm, for reducing dataset size. The intuition behind the proposed approaches is to partition the dataset into homogeneous clusters and select the images that contribute significantly to accuracy. The selected images are a proper subset of the training data and are thus human-readable. We propose four variations of the baseline algorithm, RHC. The intuition behind the first approach, RHCKON, is that boundary points contribute significantly to the representation of clusters; it selects the k farthest neighbours and the one nearest neighbour of each cluster's centroid. In the next two approaches (KONCW and CWKC), we introduce the concept of cluster weights, based on the observation that larger clusters contribute more than smaller ones. The final variation, GHCIDR, selects points based on the geometrical aspect of the data distribution. We performed experiments on two deep learning models: Fully Connected Networks (FCN) and VGG1. We evaluated the four variants on three datasets: MNIST, CIFAR10, and Fashion-MNIST. We found that GHCIDR gave the best accuracy of 99.35%, 81.10%, and 91.66%, with training data reductions of 87.27%, 32.34%, and 76.80%, on MNIST, CIFAR10, and Fashion-MNIST respectively.
dc.description.statementofresponsibility by Shril Mody, Janvi Thakkar, Devvrat Joshi, Siddharth Soni, Rohan Patil and Nipun Batra
dc.language.iso en_US en_US
dc.publisher Cornell University Library en_US
dc.subject Image data reduction en_US
dc.subject RHCKON en_US
dc.subject KONCW en_US
dc.subject CWKC en_US
dc.subject GHCIDR en_US
dc.title Geometrical homogeneous clustering for image data reduction en_US
dc.type Pre-Print en_US
dc.relation.journal arXiv
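
The selection step that the abstract attributes to RHCKON (the k farthest points and the one nearest point relative to a cluster's centroid) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_k_farthest_one_nearest`, the use of Euclidean distance, and the treatment of each image as a flattened vector are assumptions made for illustration only. The sketch covers a single cluster and does not include the homogeneous clustering step or the cluster weights used by KONCW and CWKC.

```python
import numpy as np


def select_k_farthest_one_nearest(points, k):
    """Hypothetical helper: pick indices of the k points farthest from the
    centroid of one cluster, plus the single point nearest to it.

    points: array-like of shape (n_samples, n_features), one cluster only.
    k: number of farthest points to keep (assumed non-negative).
    Returns a sorted array of unique indices into `points`.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)

    # Euclidean distance of every point to the centroid (an assumption;
    # the abstract does not state which metric is used).
    dists = np.linalg.norm(points - centroid, axis=1)

    # Stable sort so ties are broken by original order.
    order = np.argsort(dists, kind="stable")

    nearest = order[:1]
    farthest = order[-k:] if k > 0 else order[:0]

    # np.unique removes duplicates when the cluster has k + 1 points or fewer.
    return np.unique(np.concatenate([nearest, farthest]))


if __name__ == "__main__":
    cluster = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [-5.0, 0.0]]
    print(select_k_farthest_one_nearest(cluster, k=2))
```

Because the function returns indices rather than averaged or synthesized points, the retained samples remain actual training images, which matches the abstract's remark that the selected subset is human-readable.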

