Objective: This study aimed to identify the optimal point at which users switch between voice and touch modalities, as a function of the number of characters and of hierarchy-based menu structures. Background: Technological advances have enabled multimodal interaction in modern smart devices. Selecting the most suitable modality during interaction is crucial because the interface is a significant factor in usability and overall satisfaction. Method: A modality selection task program was developed in JavaScript and used in an experiment comprising three tasks: Text entry, Simple keyboard touch, and Hierarchy menu. In each task, participants were free to select whichever of the voice and touch modalities they preferred as the more efficient. Results: The Text entry task evaluated the switching point based on the number of characters, but the probability of voice usage did not differ significantly across character counts. The syllable per touch ratio was therefore adopted as an alternative measure in the Simple keyboard touch and Hierarchy menu tasks. The predicted probability of voice usage differed significantly across syllable per touch ratios, which identified the optimal switching point at up to two and five syllables per touch, respectively. Conclusion: The point at which users switch between voice and touch modalities depends on the menu structure, and the syllable per touch ratio is a critical factor in modality selection. Application: These findings can guide developers who design multimodal interactions for various menu structures.
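
The syllable per touch ratio and the switching point can be illustrated with a minimal JavaScript sketch. It rests on assumptions that the abstract does not state: the ratio is taken to be the number of syllables needed to speak a command divided by the number of touches needed to complete the same command by hand, and the predicted probability of voice usage is taken to follow a simple logistic model. The function names and the coefficients are hypothetical and are not values from the study.

```javascript
// Hypothetical helper: syllables needed to speak a command divided by
// the touches needed to complete the same command by hand (assumed definition).
function syllablesPerTouch(syllableCount, touchCount) {
  if (touchCount <= 0) throw new RangeError("touchCount must be positive");
  return syllableCount / touchCount;
}

// Assumed logistic model of the probability that a user chooses voice.
function voiceProbability(ratio, intercept, slope) {
  return 1 / (1 + Math.exp(-(intercept + slope * ratio)));
}

// Ratio at which the assumed model predicts an even split between voice and touch.
function switchPoint(intercept, slope) {
  return -intercept / slope;
}

// Illustrative coefficients only; they are not estimates from the study.
const intercept = 4;
const slope = -2;
const ratio = syllablesPerTouch(6, 3);
console.log(ratio, switchPoint(intercept, slope), voiceProbability(ratio, intercept, slope));
```

Under this sketch, a negative slope means voice becomes less likely as more syllables are required per touch, and the switching point is the ratio at which the predicted probability crosses 0.5.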