dc.contributor.author |
Beniwal, Himanshu |
|
dc.contributor.author |
Kim, Youngwoo |
|
dc.contributor.author |
Sap, Maarten |
|
dc.contributor.author |
Dan, Soham |
|
dc.contributor.author |
Hartvigsen, Thomas |
|
dc.coverage.spatial |
United States of America |
|
dc.date.accessioned |
2025-05-29T07:58:02Z |
|
dc.date.available |
2025-05-29T07:58:02Z |
|
dc.date.issued |
2025-05 |
|
dc.identifier.citation |
Beniwal, Himanshu; Kim, Youngwoo; Sap, Maarten; Dan, Soham and Hartvigsen, Thomas, "Breaking mBad! supervised fine-tuning for cross-lingual detoxification", arXiv, Cornell University Library, DOI: 10.48550/arXiv.2505.16722, May 2025. |
|
dc.identifier.uri |
https://doi.org/10.48550/arXiv.2505.16722 |
|
dc.identifier.uri |
https://repository.iitgn.ac.in/handle/123456789/11468 |
|
dc.description.abstract |
As large language models (LLMs) become increasingly prevalent in global applications, ensuring that they are toxicity-free across diverse linguistic contexts remains a critical challenge. We explore "Cross-lingual Detoxification", a cross-lingual paradigm that mitigates toxicity, enabling detoxification capabilities to transfer between high and low-resource languages across different script families. We analyze cross-lingual detoxification's effectiveness through 504 extensive settings to evaluate toxicity reduction in cross-distribution settings with limited data and investigate how mitigation impacts model performance on non-toxic tasks, revealing trade-offs between safety and knowledge preservation. Our code and dataset are publicly available at https://github.com/himanshubeniwal/Breaking-mBad |
|
dc.description.statementofresponsibility |
by Himanshu Beniwal, Youngwoo Kim, Maarten Sap, Soham Dan and Thomas Hartvigsen |
|
dc.language.iso |
en_US |
|
dc.publisher |
Cornell University Library |
|
dc.title |
Breaking mBad! supervised fine-tuning for cross-lingual detoxification |
|
dc.type |
Article |
|
dc.relation.journal |
arXiv |
|