Decoding the rule book: extracting hidden moderation criteria from Reddit communities


dc.contributor.author Kim, Youngwoo
dc.contributor.author Beniwal, Himanshu
dc.contributor.author Johnson, Steven L.
dc.contributor.author Hartvigsen, Thomas
dc.coverage.spatial United States of America
dc.date.accessioned 2025-09-12T11:18:58Z
dc.date.available 2025-09-12T11:18:58Z
dc.date.issued 2025-09
dc.identifier.citation Kim, Youngwoo; Beniwal, Himanshu; Johnson, Steven L. and Hartvigsen, Thomas, "Decoding the rule book: extracting hidden moderation criteria from Reddit communities", arXiv, Cornell University Library, DOI: 10.48550/arXiv.2509.02926, Sep. 2025.
dc.identifier.issn 2331-8422
dc.identifier.uri https://doi.org/10.48550/arXiv.2509.02926
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/12125
dc.description.abstract Effective content moderation systems require explicit classification criteria, yet online communities like subreddits often operate with diverse, implicit standards. This work introduces a novel approach to identify and extract these implicit criteria from historical moderation data using an interpretable architecture. We represent moderation criteria as score tables of lexical expressions associated with content removal, enabling systematic comparison across different communities. Our experiments demonstrate that these extracted lexical patterns effectively replicate the performance of neural moderation models while providing transparent insights into decision-making processes. The resulting criteria matrix reveals significant variations in how seemingly shared norms are actually enforced, uncovering previously undocumented moderation patterns including community-specific tolerances for language, features for topical restrictions, and underlying subcategories of the toxic speech classification.
dc.description.statementofresponsibility by Youngwoo Kim, Himanshu Beniwal, Steven L. Johnson and Thomas Hartvigsen
dc.language.iso en_US
dc.publisher Cornell University Library
dc.title Decoding the rule book: extracting hidden moderation criteria from Reddit communities
dc.type Article
dc.relation.journal arXiv

