k-means subclustering: a differentially private algorithm with improved clustering quality

dc.contributor.author	Joshi, Devvrat
dc.contributor.author	Thakkar, Janvi
dc.coverage.spatial	United States of America
dc.date.accessioned	2023-01-20T07:17:55Z
dc.date.available	2023-01-20T07:17:55Z
dc.date.issued	2023-01
dc.identifier.citation	Joshi, Devvrat and Thakkar, Janvi, "k-means subclustering: a differentially private algorithm with improved clustering quality", arXiv, Cornell University Library, DOI: arXiv:2301.02896, Jan. 2023.	en_US
dc.identifier.uri	https://arxiv.org/abs/2301.02896
dc.identifier.uri	https://repository.iitgn.ac.in/handle/123456789/8507
dc.description.abstract	In today's data-driven world, the sensitivity of information is a significant concern. Given such data and additional information about a person's background, one can easily infer that individual's private data. Many differentially private iterative algorithms have been proposed in interactive settings to protect an individual's privacy from these inference attacks. The existing approaches compute differentially private (DP) centroids by running the iterative Lloyd's algorithm and perturbing the centroids with various DP mechanisms. These DP mechanisms do not guarantee convergence of differentially private iterative algorithms, and they degrade the quality of the clusters. Thus, in this work, we extend the previous work 'Differentially Private k-Means Clustering With Convergence Guarantee', taking it as our baseline. The novelty of our approach is to sub-cluster the clusters and then select the centroid that has a higher probability of moving in the direction of the future centroid. At every Lloyd's step, noise is injected into the centroids using the exponential DP mechanism. The experimental results indicate that our approach outperforms the current state-of-the-art method, i.e., the baseline algorithm, in terms of clustering quality while maintaining the same differential privacy requirements. Clustering quality improved by factors of 4.13 and 2.83 over the baseline for the Wine and Breast_Cancer datasets, respectively.
dc.description.statementofresponsibility	by Devvrat Joshi and Janvi Thakkar
dc.language.iso	en_US	en_US
dc.publisher	Cornell University Library	en_US
dc.subject	DP centroids	en_US
dc.subject	Lloyd's algorithm	en_US
dc.subject	k-means clustering	en_US
dc.subject	Baseline algorithm	en_US
dc.subject	DP mechanisms	en_US
dc.title	k-means subclustering: a differentially private algorithm with improved clustering quality	en_US
dc.type	Pre-Print Archive	en_US
dc.relation.journal	arXiv

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following Collection(s)

E-print Articles [183]
