Medusa — Unified Content-Safety Moderation Model

Medusa is a unified content-safety moderation model for large-scale internet platforms, designed to keep up with rapidly evolving violation patterns where rule-based systems fail.

Approach

Backbone: XLM-RoBERTa.
Hybrid head architecture: linear head + Medusa TextCNN multi-head.
Partial-label masking to handle sparse / weakly-labeled data.
10 prompt templates enabling few-shot detection of new categories.

Results

92.3% accuracy on financial violations.
−35% false positives, +60% moderation efficiency.
100+ atomic capabilities deployed online.
Hundreds of thousands of QPS in production.

Share on

Bluesky Facebook LinkedIn Mastodon X (formerly Twitter)