Medusa — Unified Content-Safety Moderation Model

Medusa is a unified content-safety moderation model for large-scale internet platforms, designed to keep up with rapidly evolving violation patterns where rule-based systems fail.

Approach

  • Backbone: XLM-RoBERTa.
  • Hybrid head architecture: a linear classification head combined with Medusa-style TextCNN multi-heads.
  • Partial-label masking to train on sparse and weakly labeled data.
  • 10 prompt templates enabling few-shot detection of new categories.
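The hybrid head in the list above can be sketched as follows: a linear head over the encoder's [CLS] vector, fused with several TextCNN heads that apply max-over-time pooling to the full token sequence. All hyperparameters here (hidden size, kernel sizes, head and label counts) are illustrative assumptions, not the production configuration.

```python
import torch
import torch.nn as nn


class HybridModerationHead(nn.Module):
    """Sketch of a hybrid head: one linear head on the [CLS] vector plus
    multiple TextCNN heads over the token sequence. Hyperparameters are
    assumed for illustration."""

    def __init__(self, hidden=768, n_labels=8, n_cnn_heads=3,
                 kernel_sizes=(2, 3, 4), channels=64):
        super().__init__()
        self.linear_head = nn.Linear(hidden, n_labels)
        # Each "head" is a small TextCNN: one conv per kernel size.
        self.cnn_heads = nn.ModuleList(
            nn.ModuleList(
                nn.Conv1d(hidden, channels, k, padding=k // 2)
                for k in kernel_sizes
            )
            for _ in range(n_cnn_heads)
        )
        self.cnn_proj = nn.Linear(channels * len(kernel_sizes) * n_cnn_heads,
                                  n_labels)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden) from the encoder backbone.
        cls_logits = self.linear_head(hidden_states[:, 0])   # linear head on [CLS]
        x = hidden_states.transpose(1, 2)                    # (batch, hidden, seq_len)
        pooled = []
        for convs in self.cnn_heads:
            for conv in convs:
                # Max-over-time pooling, the standard TextCNN reduction.
                pooled.append(torch.relu(conv(x)).max(dim=2).values)
        cnn_logits = self.cnn_proj(torch.cat(pooled, dim=1))
        return cls_logits + cnn_logits                       # fuse both heads


head = HybridModerationHead()
logits = head(torch.randn(2, 16, 768))   # shape (2, 8): one logit per label
```

In production the encoder would be XLM-RoBERTa; here a random tensor stands in for its hidden states so the head can run standalone.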
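Partial-label masking can be sketched as a masked binary cross-entropy: labels that were never annotated for an example contribute neither loss nor gradient. This is a minimal illustration of the idea, not the production objective.

```python
import torch
import torch.nn.functional as F


def masked_bce_loss(logits, targets, label_mask):
    """BCE that only counts annotated labels. `label_mask` is 1 where a
    label was observed and 0 where it is unknown, so sparse or weakly
    labeled rows give no gradient for their missing labels."""
    per_label = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    per_label = per_label * label_mask            # zero out unobserved labels
    return per_label.sum() / label_mask.sum().clamp(min=1)


logits = torch.tensor([[2.0, -1.0, 0.5]])
targets = torch.tensor([[1.0, 0.0, 0.0]])
mask = torch.tensor([[1.0, 1.0, 0.0]])            # third label unobserved
loss = masked_bce_loss(logits, targets, mask)     # averages over 2 observed labels
```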
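The prompt-template mechanism can be illustrated as simple template expansion: each template turns a (text, category) pair into a natural-language query, so a new category needs only a category name and a few labeled examples rather than a retrained head. The two templates below are hypothetical stand-ins, not the 10 templates Medusa actually uses.

```python
# Hypothetical templates for illustration only.
TEMPLATES = [
    "Does the following text violate the '{category}' policy? Text: {text}",
    "Content: {text}\nQuestion: is this an instance of '{category}'?",
]


def build_prompts(text, category):
    """Expand every template for one (text, category) pair, producing
    multiple prompt views of the same example for few-shot detection
    of a new violation category."""
    return [t.format(category=category, text=text) for t in TEMPLATES]


prompts = build_prompts("earn 500% returns, guaranteed", "financial fraud")
```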

Results

  • 92.3% accuracy on financial violations.
  • 35% fewer false positives and 60% higher moderation efficiency.
  • 100+ atomic capabilities deployed online.
  • Hundreds of thousands of QPS in production.