The key to RGB-D co-salient object detection is effectively fusing the common information of the RGB and depth signals. Existing works directly mix the information captured from the original depth maps and RGB images, but ignore a critical issue: due to the low contrast between neighboring objects in depth, salient regions in the depth maps may correspond to interfering background regions in the RGB images, leading to unsatisfactory performance. To address this issue, we propose an Object-aware Calibrated Depth guided transformer (dubbed OCDFormer) for RGB-D co-salient object detection. OCDFormer consists of two key designs. First, we design a depth calibration module based on spectral clustering, which yields a group of calibrated depth maps that highlight the co-object region while suppressing the interfering regions. Second, we construct a cross-modal transformer, in which the common information from the RGB images and the calibrated depth maps is fully captured by first injecting common tokens into the individual tokens of each modality and then mixing them with an interaction-attention mechanism. Extensive evaluations demonstrate that OCDFormer sets a new state of the art on two standard public benchmarks, RGB-D CoSal150 and RGB-D CoSeg183.
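To make the depth calibration idea concrete, the following is a minimal sketch, not the paper's actual module: it clusters depth pixels with off-the-shelf spectral clustering and keeps the cluster nearest the camera as a candidate co-object mask. The foreground-selection heuristic (smallest mean depth) and all parameter values are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def calibrate_depth(depth, n_clusters=2):
    """Illustrative depth calibration: spectrally cluster depth pixels,
    then keep the cluster with the smallest mean depth (a hypothetical
    foreground heuristic) and suppress everything else."""
    h, w = depth.shape
    pixels = depth.reshape(-1, 1)
    sc = SpectralClustering(n_clusters=n_clusters, affinity="rbf",
                            gamma=5.0, random_state=0)
    labels = sc.fit_predict(pixels)
    # pick the cluster closest to the camera as the co-object candidate
    means = [pixels[labels == k].mean() for k in range(n_clusters)]
    fg = int(np.argmin(means))
    mask = (labels == fg).astype(np.float32).reshape(h, w)
    return depth * mask  # background regions are zeroed out

# toy depth map: a near object (depth ~0.2) on a far background (~0.9)
rng = np.random.default_rng(0)
depth = np.full((8, 8), 0.9, dtype=np.float32)
depth[2:6, 2:6] = 0.2
depth += rng.normal(0.0, 0.01, depth.shape).astype(np.float32)
calibrated = calibrate_depth(depth)
```

In a real pipeline the clustering would operate on spatial-plus-depth features rather than raw pixel values, but the sketch captures the goal: a calibrated map that keeps the co-object region and suppresses interfering background.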
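The cross-modal token mixing can likewise be sketched in a simplified, single-head form. This is a hedged illustration, not the paper's architecture: hypothetical "common" tokens are concatenated into both modalities' token sequences, and RGB queries then attend over the depth sequence via scaled dot-product attention (standing in for the interaction-attention mechanism).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interaction_attention(rgb_tokens, depth_tokens, common_tokens):
    """Illustrative cross-modal mixing: inject shared tokens into both
    individual sequences, then fuse with single-head cross-attention."""
    rgb = np.concatenate([common_tokens, rgb_tokens], axis=0)
    dep = np.concatenate([common_tokens, depth_tokens], axis=0)
    d = rgb.shape[-1]
    # RGB-side queries attend over the depth-side sequence
    attn = softmax(rgb @ dep.T / np.sqrt(d))
    return attn @ dep  # fused token sequence

rng = np.random.default_rng(0)
fused = interaction_attention(rng.normal(size=(4, 8)),   # RGB tokens
                              rng.normal(size=(4, 8)),   # depth tokens
                              rng.normal(size=(2, 8)))   # common tokens
```

A full transformer block would add learned query/key/value projections, multiple heads, and residual connections; the point here is only that the shared tokens give both modalities a common anchor before attention mixes them.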