BloomCoreset: fast coreset sampling using bloom filters for fine-grained self-supervised learning

dc.contributor.author Singh, Prajwal
dc.contributor.author Vashishtha, Gautam
dc.contributor.author Mastan, Indra Deep
dc.contributor.author Raman, Shanmuganathan
dc.coverage.spatial India
dc.date.accessioned 2025-05-09T08:23:31Z
dc.date.available 2025-05-09T08:23:31Z
dc.date.issued 2025-04-06
dc.identifier.citation Singh, Prajwal; Vashishtha, Gautam; Mastan, Indra Deep; Raman, Shanmuganathan, "BloomCoreset: fast coreset sampling using bloom filters for fine-grained self-supervised learning", in the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025), Hyderabad, IN, Apr. 06-11, 2025.
dc.identifier.uri https://doi.org/10.1109/ICASSP49660.2025.10888815
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/11391
dc.description.abstract The success of deep learning in supervised fine-grained recognition for domain-specific tasks relies heavily on expert annotations. The Open-Set for fine-grained Self-Supervised Learning (SSL) problem aims to enhance performance on downstream tasks by strategically sampling a subset of images (the Core-Set) from a large pool of unlabeled data (the Open-Set). In this paper, we propose a novel method, BloomCoreset, that significantly reduces the time needed to sample from the Open-Set while preserving the quality of the samples in the coreset. To achieve this, we use Bloom filters as a hashing mechanism to store both low- and high-level features of the fine-grained dataset, as captured by Open-CLIP, in a space-efficient manner that enables rapid retrieval of the coreset from the Open-Set. To show the effectiveness of the sampled coreset, we integrate the proposed method into the state-of-the-art fine-grained SSL framework, SimCore [1]. Compared with the sampling strategy of the baseline in [1], the proposed algorithm reduces sampling time by 98.5% at the cost of a 0.83% average drop in accuracy, calculated across 11 downstream datasets. We have made the code publicly available.
dc.description.statementofresponsibility by Prajwal Singh, Gautam Vashishtha, Indra Deep Mastan and Shanmuganathan Raman
dc.language.iso en_US
dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
dc.subject Self-supervised learning
dc.subject Representation learning
dc.subject Bloom filter
dc.subject Coreset
dc.subject Open-set
dc.subject Classification
dc.title BloomCoreset: fast coreset sampling using bloom filters for fine-grained self-supervised learning
dc.type Conference Paper
dc.relation.journal IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025)
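
The abstract describes storing features of the fine-grained dataset in Bloom filters and then querying them to retrieve a coreset from the Open-Set. The following is a minimal sketch of that general idea, not the authors' implementation: the `BloomFilter` class, the `quantize` key function, the bucket width `step`, and the toy two-dimensional features are all assumptions made for illustration. In the paper the features come from Open-CLIP; here they are plain lists of floats.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k salted hash positions over an m-bit array."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "little") + key).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: bytes) -> bool:
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def quantize(feature, step=0.5):
    """Hypothetical key function: bucket each coordinate so that
    nearby feature vectors map to the same Bloom-filter key."""
    return ",".join(str(round(x / step)) for x in feature).encode()


# Toy stand-ins for features of the fine-grained dataset and the Open-Set.
target_features = [[0.1, 0.9], [1.2, -0.4]]
open_set_features = [[0.12, 0.95], [3.0, 3.0], [1.1, -0.3]]

bf = BloomFilter()
for f in target_features:
    bf.add(quantize(f))

# Keep Open-Set items whose quantized feature hits the filter.
coreset = [f for f in open_set_features if quantize(f) in bf]
print(coreset)
```

Each membership query costs a fixed number of hash evaluations regardless of how many target features were inserted, which is the property that makes this kind of lookup fast; the price is a small, tunable false-positive rate.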

