PolyGuard: a multilingual safety moderation tool for 17 languages


dc.contributor.author Kumar, Priyanshu
dc.contributor.author Jain, Devansh
dc.contributor.author Yerukola, Akhila
dc.contributor.author Jiang, Liwei
dc.contributor.author Beniwal, Himanshu
dc.contributor.author Hartvigsen, Thomas
dc.contributor.author Sap, Maarten
dc.coverage.spatial United States of America
dc.date.accessioned 2025-04-17T10:44:51Z
dc.date.available 2025-04-17T10:44:51Z
dc.date.issued 2025-04
dc.identifier.citation Kumar, Priyanshu; Jain, Devansh; Yerukola, Akhila; Jiang, Liwei; Beniwal, Himanshu; Hartvigsen, Thomas and Sap, Maarten, "PolyGuard: a multilingual safety moderation tool for 17 languages", arXiv, Cornell University Library, arXiv:2504.04377, Apr. 2025.
dc.identifier.uri http://arxiv.org/abs/2504.04377
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/11212
dc.description.abstract Truly multilingual safety moderation efforts for Large Language Models (LLMs) have been hindered by a narrow focus on a small set of languages (e.g., English, Chinese) as well as a limited scope of safety definitions, resulting in significant gaps in moderation capabilities. To bridge these gaps, we release POLYGUARD, a new state-of-the-art multilingual safety model for safeguarding LLM generations, together with the corresponding training and evaluation datasets. POLYGUARD is trained on POLYGUARDMIX, the largest multilingual safety training corpus to date, containing 1.91M samples across 17 languages (e.g., Chinese, Czech, English, Hindi). We also introduce POLYGUARDPROMPTS, a high-quality multilingual benchmark with 29K samples for the evaluation of safety guardrails. Created by combining naturally occurring multilingual human-LLM interactions with human-verified machine translations of an English-only safety dataset (WildGuardMix; Han et al., 2024), our datasets contain prompt-output pairs labeled for prompt harmfulness, response harmfulness, and response refusal. Through extensive evaluations across multiple safety and toxicity benchmarks, we demonstrate that POLYGUARD outperforms existing state-of-the-art open-weight and commercial safety classifiers by 5.5%. Our contributions advance efforts toward safer multilingual LLMs for all global users.
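
A minimal sketch of the label schema described in the abstract: one prompt-output pair annotated with the three labels (prompt harmfulness, response harmfulness, response refusal), followed by a hypothetical classifier query. The model identifier, prompt format, and output parsing below are illustrative assumptions, not the authors' published interface.

    # A minimal sketch, assuming a Hugging Face-style instruction-tuned safety
    # classifier. The model id and prompt format are hypothetical placeholders,
    # not PolyGuard's published interface.
    from transformers import pipeline

    # One POLYGUARDMIX-style record: a prompt-output pair with the three
    # labels named in the abstract.
    record = {
        "language": "hi",
        "prompt": "Explain how to hot-wire a car.",
        "response": "I can't help with that request.",
        "prompt_harmful": True,      # label 1: prompt harmfulness
        "response_harmful": False,   # label 2: response harmfulness
        "response_refusal": True,    # label 3: response refusal
    }

    # Hypothetical inference call: ask a generative classifier to emit the
    # three safety labels for a new prompt-response pair.
    classifier = pipeline("text-generation", model="example-org/safety-guard")
    query = (
        "Classify the following exchange.\n"
        f"PROMPT: {record['prompt']}\n"
        f"RESPONSE: {record['response']}\n"
        "Return: prompt_harmful, response_harmful, response_refusal."
    )
    print(classifier(query, max_new_tokens=32)[0]["generated_text"])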
dc.description.statementofresponsibility by Priyanshu Kumar, Devansh Jain, Akhila Yerukola, Liwei Jiang, Himanshu Beniwal, Thomas Hartvigsen and Maarten Sap
dc.language.iso en_US
dc.publisher Cornell University Library
dc.title PolyGuard: a multilingual safety moderation tool for 17 languages
dc.type Article
dc.relation.journal arXiv

