BloomCoreset: fast coreset sampling using bloom filters for fine-grained self-supervised learning

dc.contributor.author Singh, Prajwal
dc.contributor.author Vashishtha, Gautam
dc.contributor.author Mastan, Indra Deep
dc.contributor.author Raman, Shanmuganathan
dc.coverage.spatial United States of America
dc.date.accessioned 2025-01-03T12:39:15Z
dc.date.available 2025-01-03T12:39:15Z
dc.date.issued 2024-12
dc.identifier.citation Singh, Prajwal; Vashishtha, Gautam; Mastan, Indra Deep and Raman, Shanmuganathan, "BloomCoreset: fast coreset sampling using bloom filters for fine-grained self-supervised learning", arXiv, Cornell University Library, DOI: arXiv:2412.16942, Dec. 2024.
dc.identifier.uri http://arxiv.org/abs/2412.16942
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/10912
dc.description.abstract The success of deep learning in supervised fine-grained recognition for domain-specific tasks relies heavily on expert annotations. The Open-Set for fine-grained Self-Supervised Learning (SSL) problem aims to enhance performance on downstream tasks by strategically sampling a subset of images (the Core-Set) from a large pool of unlabeled data (the Open-Set). In this paper, we propose a novel method, BloomCoreset, that significantly reduces sampling time from the Open-Set while preserving the quality of samples in the coreset. To achieve this, we use Bloom filters as a hashing mechanism to store both low- and high-level features of the fine-grained dataset, as captured by Open-CLIP, in a space-efficient manner that enables rapid retrieval of the coreset from the Open-Set. To show the effectiveness of the sampled coreset, we integrate the proposed method into the state-of-the-art fine-grained SSL framework, SimCore [1]. The proposed algorithm reduces sampling time by 98.5% relative to the baseline sampling strategy in SimCore [1], at the cost of a 0.83% average drop in accuracy across 11 downstream datasets.
dc.description.statementofresponsibility by Prajwal Singh, Gautam Vashishtha, Indra Deep Mastan and Shanmuganathan Raman
dc.language.iso en_US
dc.publisher Cornell University Library
dc.title BloomCoreset: fast coreset sampling using bloom filters for fine-grained self-supervised learning
dc.type Article
dc.relation.journal arXiv


