Enabling long-term operation during day and night for collaborative robots requires a comprehensive understanding of the unstructured environment. Besides, in the dynamic environment, robots must be able to recognize dynamic objects and collaboratively build a global map. This paper proposes a novel approach for dynamic collaborative mapping based on multimodal environmental perception. For each mission, robots first apply heterogeneous sensor fusion model to detect humans and separate them to acquire static observations. Then, the collaborative mapping is performed to estimate the relative position between robots and local 3D maps are integrated into a globally consistent 3D map. The experiment is conducted in the day and night rainforest with moving people. The results show the accuracy, robustness, and versatility in 3D map fusion missions.