Cross-Domain Adversarial Attack Taxonomy for Multimodal Neural Networks: Threat Landscape and Open Challenges
Keywords:
Adversarial attacks, multimodal neural networks, cross-domain vulnerabilities, threat taxonomy, robustness, transferability, deep learning security, interpretability, data fusion, trustworthy AI
Abstract:
Multimodal neural networks have emerged as powerful frameworks for integrating diverse data modalities such as text, images, audio, and video, enabling advances in fields ranging from autonomous systems to healthcare and security. However, their increasing adoption has exposed them to adversarial attacks that exploit vulnerabilities across multiple domains simultaneously. Unlike unimodal systems, multimodal networks present expanded attack surfaces due to cross-modal dependencies, complex fusion strategies, and heterogeneous data representations. This paper introduces a taxonomy of cross-domain adversarial attacks targeting multimodal neural networks, categorizing them by modality manipulation, attack vector, and fusion-stage vulnerability. We highlight how these attacks undermine robustness, reliability, and interpretability, while also complicating detection and defense mechanisms. Through a detailed exploration of the threat landscape, we emphasize challenges such as cross-domain transferability of adversarial perturbations, coordinated attacks exploiting multimodal fusion, and real-time adversarial adaptation. Finally, we identify open challenges and research directions for developing resilient, interpretable, and trustworthy multimodal systems capable of withstanding evolving adversarial threats.