After the success of transformer networks in natural language processing (NLP), the application of transformers to computer vision (CV) has followed suit, delivering unprecedented performance gains on vision tasks, including image recognition and object detection. Multihead self-attention (MHSA) is the key component of transformers, allowing the models to learn how much attention to pay to each input position. Despite its strong modeling capability, MHSA involves complex operations that make transformers prohibitively costly for hardware deployment. Existing acceleration efforts on conventional hardware platforms are challenged by the memory wall. Compute-in-memory (CIM) is a promising solution to the memory wall problem because it stores all model parameters on-chip in compute-capable memory arrays. The footprint of 2-D CIM designs must, however, expand to accommodate ever-larger model sizes. In this work, we present a heterogeneous 3-D integrated (H3D) accelerator targeting the MHSA workloads in vision transformers. H3D allows the proposed H3DAtten architecture to combine the merits of resistive random access memory (RRAM)-based analog CIM (ACIM) in 40 nm and static random access memory (SRAM)-based digital CIM (DCIM) in 16 nm. We perform comprehensive signaling and thermal analyses to examine the effects of 3-D stacking on the accelerator. Compared to iso-capacity 2-D baseline designs, the proposed 5-tier H3DAtten accelerator achieves $8.4\times $ higher compute density without accuracy loss on the ImageNet-1k dataset.
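To make the MHSA workload concrete, the sketch below implements scaled dot-product multihead self-attention in plain NumPy. This is a minimal illustrative reference of the standard attention computation, not the H3DAtten hardware mapping; the weight matrices, dimensions, and function name are assumptions for the example.

```python
import numpy as np

def multihead_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """Illustrative scaled dot-product multihead self-attention.

    x: (seq_len, d_model) input tokens; Wq/Wk/Wv/Wo: (d_model, d_model)
    projection weights (hypothetical example parameters).
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Project inputs to queries, keys, values, then split into heads:
    # (seq_len, d_model) -> (num_heads, seq_len, d_head).
    q = (x @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Attention scores quantify how much each position attends to every other.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of values, merge heads, then output projection.
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

rng = np.random.default_rng(0)
d_model, seq_len, heads = 64, 16, 4
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * 0.1
                  for _ in range(4))
x = rng.standard_normal((seq_len, d_model))
y = multihead_attention(x, Wq, Wk, Wv, Wo, heads)
print(y.shape)  # (16, 64)
```

The matrix-vector products dominating this kernel (the Q/K/V projections and the score/value multiplications) are exactly the operations a CIM array can execute in place, which is why MHSA is the target workload for the proposed accelerator.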