Hybrid Explainable AI Framework for Multimodal Healthcare Prediction: From Bangla Text to Stroke and Alzheimer's Diagnostics
Keywords:
Explainable AI, hybrid models, Bangla text, stroke, Alzheimer's disease, multimodal healthcare, neuroimaging.

Abstract
We propose a hybrid explainable AI framework for multimodal healthcare prediction that integrates Bangla clinical narratives, structured patient variables, and neuroimaging-derived features to improve diagnostic support for stroke and Alzheimer's disease. The framework combines transformer-derived Bangla text embeddings, tree-based learners for tabular signals, and convolutional neural networks for imaging, fused via sequence models and attention mechanisms to produce unified predictions and cross-modal explanations. Experiments comparing a Logistic Regression baseline, Random Forest, XGBoost, a dense artificial neural network (ANN), and a CNN-LSTM model with attention show a clear performance hierarchy: Logistic Regression (accuracy 0.72, AUC 0.75), Random Forest (accuracy 0.83, AUC 0.87), XGBoost (accuracy 0.86, AUC 0.89), ANN (accuracy 0.84, AUC 0.88), and CNN-LSTM with attention (accuracy 0.90, AUC 0.93). Explainability analyses using SHAP for tabular models, attention heatmaps for sequence models, and Grad-CAM for imaging demonstrate that the hybrid approach not only improves discriminative performance but also yields clinically meaningful attributions across modalities. We discuss practical implications for clinician trust, language inclusivity, and privacy-aware deployment, and outline future directions, including expansion to large-scale Bangla medical corpora, federated and privacy-preserving training, and prospective clinical validation. This work provides a reproducible template for building transparent, language-aware multimodal diagnostic systems that balance accuracy and interpretability.
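To make the fusion idea concrete, the following is a minimal PyTorch sketch, not the authors' released code, of one plausible reading of the described architecture: precomputed Bangla text embeddings, structured tabular features, and CNN-derived image features are each projected into a shared space, stacked as a short modality sequence, passed through an LSTM, and pooled with additive attention. All layer sizes, module names, and the projection scheme are illustrative assumptions; the paper's actual CNN-LSTM configuration may differ.

```python
# Illustrative sketch of a multimodal CNN-LSTM-with-attention fusion head.
# Assumes the three feature streams are already extracted upstream
# (e.g., a Bangla transformer for text, a CNN backbone for imaging).
import torch
import torch.nn as nn

class HybridFusionNet(nn.Module):
    def __init__(self, text_dim=768, tab_dim=32, img_dim=512, hidden=128, n_classes=2):
        super().__init__()
        # Project each modality into a shared space so they can form a sequence.
        self.proj_text = nn.Linear(text_dim, hidden)
        self.proj_tab = nn.Linear(tab_dim, hidden)
        self.proj_img = nn.Linear(img_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        # Additive attention assigns one weight per modality step.
        self.attn = nn.Linear(hidden, 1)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, text_emb, tab_feats, img_feats):
        # Stack the projected modalities as a length-3 sequence: (batch, 3, hidden).
        seq = torch.stack(
            [self.proj_text(text_emb), self.proj_tab(tab_feats), self.proj_img(img_feats)],
            dim=1,
        )
        out, _ = self.lstm(seq)                          # (batch, 3, hidden)
        weights = torch.softmax(self.attn(out), dim=1)   # (batch, 3, 1)
        fused = (weights * out).sum(dim=1)               # attention-pooled summary
        return self.head(fused), weights.squeeze(-1)     # logits + per-modality attention

# Smoke test with random stand-ins for the three feature streams.
model = HybridFusionNet()
logits, attn = model(torch.randn(4, 768), torch.randn(4, 32), torch.randn(4, 512))
print(logits.shape, attn.shape)  # torch.Size([4, 2]) torch.Size([4, 3])
```

Returning the attention weights alongside the logits is what enables the cross-modal attention heatmaps mentioned in the abstract: each weight indicates how much a given modality contributed to the fused prediction.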
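The tabular explainability step can likewise be sketched briefly. The abstract names SHAP for the tree-based tabular models; below is a minimal, self-contained example with synthetic data, where the feature names (age, glucose, etc.) are hypothetical stand-ins for the paper's structured patient variables, which are not listed in the abstract.

```python
# SHAP attribution for a tree-based tabular model, on synthetic stand-in data.
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(0)
feature_names = ["age", "bmi", "glucose", "hypertension", "smoking"]  # hypothetical
X = rng.random((300, len(feature_names)))
y = ((X[:, 0] + X[:, 2]) > 1.0).astype(int)  # synthetic label for the demo

model = xgb.XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
model.fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Rank features by mean absolute attribution across the cohort.
importance = np.abs(shap_values).mean(axis=0)
for name, score in sorted(zip(feature_names, importance), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

In this toy setup the two features used to construct the label dominate the ranking, which is the sanity check one would expect before trusting SHAP attributions on real clinical variables.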