Though self-supervised contrastive learning (CL) has shown its potential to achieve state-of-the-art accuracy without any supervision, its behavior still remains under-investigated. Different from most previous work that understands CL from learning objectives, we focus on an unexplored yet natural aspect: the spatial inductive bias which seems to be implicitly exploited via data augmentations in CL. We design an experiment to study the reliance of CL on such spatial inductive bias, by destroying the global or local spatial structures of an image with global or local patch shuffling, and comparing the performance drop between experiments on original and corrupted dataset to quantify the reliance on certain inductive bias. We also use the uniformity of feature space to further research how CL-pre-trained models behave with the corrupted dataset. Our results and analysis show that CL has a much higher reliance on spatial inductive bias than SL, regardless of specific CL algorithm or backbones, opening a new direction for studying the behavior of CL.