In this paper, we propose an approach to classifying emotion clusters in infant cries that considers both frame-wise (local) acoustic features and global prosodic features. The approach has two main characteristics. First, the emotion cluster is detected from the most likely segment sequence, which directly yields the emotion cluster as the classification result. This sequence is obtained by a maximum likelihood approach that combines the frame-wise likelihood with the global prosodic likelihood. As prosodic features, we use the duration ratios of resonant cry segments and silent segments, calculated from the derived segment sequence. Second, pitch information is used in addition to conventional power and spectral information when modeling the frame-wise acoustic features with hidden Markov models. With pitch information added, the classification performance of the proposed approach (74.7%) was better than that of the method using only power and spectral features (71.5%). The proposed method based on the maximum likelihood approach using both frame-wise and global features achieved the highest performance (75.5%).
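The combination of frame-wise and global prosodic likelihoods described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each candidate emotion cluster already has a frame-wise log-likelihood and a segment sequence (for example, from HMM decoding), that the prosodic likelihood is a Gaussian over each duration ratio, and that the two log-likelihoods are summed with a weight. The function names, the segment labels `"resonant"` and `"silent"`, the Gaussian model, and the `weight` parameter are all hypothetical.

```python
import math


def duration_ratios(segments):
    """Return the fraction of frames spent in resonant and silent segments.

    `segments` is a list of (label, n_frames) pairs. Labels other than
    "resonant" and "silent" count toward the total but get no ratio.
    """
    total = sum(n for _, n in segments)
    if total <= 0:
        raise ValueError("segment sequence contains no frames")
    ratios = {"resonant": 0.0, "silent": 0.0}
    for label, n in segments:
        if label in ratios:
            ratios[label] += n / total
    return ratios


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate Gaussian (assumed prosodic model)."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def classify(candidates, prosody_models, weight=1.0):
    """Pick the cluster that maximizes the combined log-likelihood.

    candidates:     {cluster: (frame_loglik, segments)}
    prosody_models: {cluster: {"resonant": (mean, std), "silent": (mean, std)}}
    weight:         hypothetical scaling of the prosodic term
    """
    best_cluster, best_score = None, -math.inf
    for cluster, (frame_loglik, segments) in candidates.items():
        ratios = duration_ratios(segments)
        prosodic_loglik = sum(
            gaussian_logpdf(ratios[name], *prosody_models[cluster][name])
            for name in ratios
        )
        score = frame_loglik + weight * prosodic_loglik
        if score > best_score:
            best_cluster, best_score = cluster, score
    return best_cluster, best_score
```

With `weight=0.0` the sketch reduces to a decision based on the frame-wise likelihood alone, which makes it easy to see how the global prosodic term can change the selected cluster.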