A Multimodal Framework Integrating Acoustic and Linguistic Representations for Interpretable Dementia Detection