Breaking mBad! supervised fine-tuning for cross-lingual detoxification

dc.contributor.author	Beniwal, Himanshu
dc.contributor.author	Kim, Youngwoo
dc.contributor.author	Sap, Maarten
dc.contributor.author	Dan, Soham
dc.contributor.author	Hartvigsen, Thomas
dc.coverage.spatial	United States of America
dc.date.accessioned	2025-05-29T07:58:02Z
dc.date.available	2025-05-29T07:58:02Z
dc.date.issued	2025-05
dc.identifier.citation	Beniwal, Himanshu; Kim, Youngwoo; Sap, Maarten; Dan, Soham and Hartvigsen, Thomas, "Breaking mBad! supervised fine-tuning for cross-lingual detoxification", arXiv, Cornell University Library, DOI: arXiv:2505.16722, May 2025.
dc.identifier.uri	https://doi.org/10.48550/arXiv.2505.16722
dc.identifier.uri	https://repository.iitgn.ac.in/handle/123456789/11468
dc.description.abstract	As large language models (LLMs) become increasingly prevalent in global applications, ensuring that they are toxicity-free across diverse linguistic contexts remains a critical challenge. We explore "Cross-lingual Detoxification", a cross-lingual paradigm that mitigates toxicity, enabling detoxification capabilities to transfer between high and low-resource languages across different script families. We analyze cross-lingual detoxification's effectiveness through 504 extensive settings to evaluate toxicity reduction in cross-distribution settings with limited data and investigate how mitigation impacts model performance on non-toxic tasks, revealing trade-offs between safety and knowledge preservation. Our code and dataset are publicly available at https://github.com/himanshubeniwal/Breaking-mBad
dc.description.statementofresponsibility	by Himanshu Beniwal, Youngwoo Kim, Maarten Sap, Soham Dan and Thomas Hartvigsen
dc.language.iso	en_US
dc.publisher	Cornell University Library
dc.title	Breaking mBad! supervised fine-tuning for cross-lingual detoxification
dc.type	Article
dc.relation.journal	arXiv


This item appears in the following Collection(s)

E-print Articles [187]
