Self-Supervised Robustness Enhancement for Multimodal Neural Networks Under Cross-Domain Adversarial Perturbations

Authors

  • Rohan Sharma, Indian Institute of Technology (IIT) Bombay
  • Aarav Sharma, International Institute of Information Technology (IIIT)

Keywords

Self-supervised learning, multimodal neural networks, adversarial robustness, cross-domain perturbations, contrastive learning, representation learning, defense mechanisms

Abstract

Multimodal neural networks have become foundational in artificial intelligence, enabling systems to learn and reason from diverse modalities such as vision, language, and audio. However, these models remain highly vulnerable to cross-domain adversarial perturbations, where subtle but carefully crafted manipulations across one or more modalities lead to significant performance degradation. Traditional supervised defense mechanisms struggle to address these threats, primarily due to the lack of labeled adversarial data and the complexity of multimodal interactions. This paper proposes self-supervised robustness enhancement as a promising defense paradigm. By leveraging self-supervised pretext tasks and representation learning, multimodal models can learn modality-invariant, semantically consistent features that improve resilience against adversarial inputs. We explore contrastive learning, masked prediction, and redundancy-driven objectives as self-supervised strategies to fortify robustness. The analysis highlights how these methods mitigate cross-domain perturbations without explicit adversarial labels, while also improving generalization and interpretability. The discussion further outlines challenges in scalability, stability, and real-world deployment, positioning self-supervised learning as a crucial pathway toward robust and trustworthy multimodal AI.
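Of the strategies the abstract names, contrastive learning is the most readily illustrated. Below is a minimal sketch of a symmetric cross-modal InfoNCE objective in PyTorch, the kind of self-supervised loss commonly used to align paired embeddings from two modalities. This is a generic formulation for exposition only, not the paper's implementation; the function name, temperature value, and toy dimensions are all assumptions.

    # Illustrative sketch: a symmetric cross-modal InfoNCE (contrastive) loss.
    # Names, defaults, and dimensions are assumed for demonstration purposes.
    import torch
    import torch.nn.functional as F

    def cross_modal_info_nce(z_a: torch.Tensor, z_b: torch.Tensor,
                             temperature: float = 0.07) -> torch.Tensor:
        """Symmetric InfoNCE over a batch of paired two-modality embeddings.

        z_a, z_b: (batch, dim) embeddings of the same samples in two
        modalities (e.g., image and text). Matched pairs act as positives;
        all other in-batch pairs act as negatives.
        """
        z_a = F.normalize(z_a, dim=-1)
        z_b = F.normalize(z_b, dim=-1)
        # (batch, batch) cosine-similarity matrix, scaled by temperature.
        logits = z_a @ z_b.t() / temperature
        targets = torch.arange(z_a.size(0), device=z_a.device)
        # Cross-entropy in both directions: modality A -> B and B -> A.
        loss_ab = F.cross_entropy(logits, targets)
        loss_ba = F.cross_entropy(logits.t(), targets)
        return 0.5 * (loss_ab + loss_ba)

    if __name__ == "__main__":
        # Toy usage: random tensors stand in for encoder outputs.
        img = torch.randn(32, 256)
        txt = torch.randn(32, 256)
        print(cross_modal_info_nce(img, txt).item())

Because the positive pairs are defined by the data itself (the same sample viewed through two modalities), this objective needs no adversarial labels, which is the property the abstract argues makes self-supervised objectives attractive as a robustness defense.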

Published

2025-09-16