Physical attractiveness, the degree to which a person's physical characteristics are considered pleasing and appealing, plays an important role in interpersonal communication. Most existing studies have focused on the perception of attractiveness from a single modality (e.g., face or voice). This study explores how individuals perceive attractiveness from multimodal information of differing attentional relevance. In Experiment 1, 24 participants rated the attractiveness of recorded vowels and sentences on a 9-point scale. In Experiment 2, a new group of 64 participants rated the attractiveness of audiovisual stimuli in an adapted Garner paradigm that directed their attention to one modality or the other. In the face-attending task, participants rated facial attractiveness while ignoring the voice; in the voice-attending task, participants rated vocal attractiveness while ignoring the face. Linear regression showed that F0 and harmonics-to-noise ratio predicted the attractiveness of vowels, and that semantic valence modulated the perceived attractiveness of sentences. The linear mixed-effects model showed that, while attentional irrelevance generally attenuated the perceived attractiveness of either face or voice in multimodal stimuli, only the effect of facial attractiveness persisted in the voice-attending task. These findings demonstrate that allocating attentional relevance to a given communicative modality alters human nonverbal attractiveness perception. More importantly, the top-down modulation of multimodal attractiveness integration aligns with the late integration framework and provides evidence for the cognitive underpinnings of multimodal communication. [ABSTRACT FROM AUTHOR]