dc.contributor.author |
Singh, Prajwal |
|
dc.contributor.author |
Vashishtha, Gautam |
|
dc.contributor.author |
Mastan, Indra Deep |
|
dc.contributor.author |
Raman, Shanmuganathan |
|
dc.coverage.spatial |
United States of America |
|
dc.date.accessioned |
2025-01-03T12:39:15Z |
|
dc.date.available |
2025-01-03T12:39:15Z |
|
dc.date.issued |
2024-12 |
|
dc.identifier.citation |
Singh, Prajwal; Vashishtha, Gautam; Mastan, Indra Deep and Raman, Shanmuganathan, "BloomCoreset: fast coreset sampling using bloom filters for fine-grained self-supervised learning", arXiv, Cornell University Library, DOI: arXiv:2412.16942, Dec. 2024. |
|
dc.identifier.uri |
http://arxiv.org/abs/2412.16942 |
|
dc.identifier.uri |
https://repository.iitgn.ac.in/handle/123456789/10912 |
|
dc.description.abstract |
The success of deep learning in supervised fine-grained recognition for domain-specific tasks relies heavily on expert annotations. The Open-Set for fine-grained Self-Supervised Learning (SSL) problem aims to improve performance on downstream tasks by strategically sampling a subset of images (the coreset) from a large pool of unlabeled data (the Open-Set). In this paper, we propose a novel method, BloomCoreset, that significantly reduces the time needed to sample from the Open-Set while preserving the quality of the samples in the coreset. To achieve this, we use Bloom filters as a hashing mechanism to store both low- and high-level features of the fine-grained dataset, as captured by Open-CLIP, in a space-efficient manner that enables rapid retrieval of the coreset from the Open-Set. To show the effectiveness of the sampled coreset, we integrate the proposed method into the state-of-the-art fine-grained SSL framework, SimCore [1]. Compared with the baseline sampling strategy in SimCore [1], the proposed algorithm reduces sampling time by 98.5% at the cost of a mere 0.83% average drop in accuracy across 11 downstream datasets. |
|
dc.description.statementofresponsibility |
by Prajwal Singh, Gautam Vashishtha, Indra Deep Mastan and Shanmuganathan Raman |
|
dc.language.iso |
en_US |
|
dc.publisher |
Cornell University Library |
|
dc.title |
BloomCoreset: fast coreset sampling using bloom filters for fine-grained self-supervised learning |
|
dc.type |
Article |
|
dc.relation.journal |
arXiv |
|