Video surveillance systems play an increasingly important role in everyday life. To obtain meaningful surveillance information in a timely and accurate manner, computation and communication resources for image classification tasks must be allocated optimally. In this paper, taking face recognition as an example, we propose a novel end-to-edge collaborative computing system based on a multi-exit network that dynamically allocates computation between the front end (the camera sensor) and the back end (the mobile edge computing server). Using an ε-greedy reinforcement learning strategy, a decision module decides whether to obtain recognition results from an early exit at the front end or to transmit the intermediate feature maps to the back end for more accurate results. In this way, the module balances recognition accuracy against time overhead under different channel conditions. Experimental results show that the proposed system significantly reduces inference time while maintaining competitive accuracy across various communication channel conditions.
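The abstract does not specify the decision module's state, action, or reward definitions, so the following is only a minimal sketch of how an ε-greedy decision of this kind could work. It assumes a discrete channel state ("good"/"bad"), a hypothetical action set of two front-end exits plus an "offload" action, and a hypothetical reward of accuracy minus λ times latency. The accuracy and latency numbers in `TOY_ENV` are made up for illustration and are not results from the paper.

```python
import random
from collections import defaultdict

# Hypothetical actions: two early exits on the camera, or offload feature maps.
ACTIONS = ["exit1", "exit2", "offload"]

# Made-up toy environment: (expected accuracy, latency in seconds).
# Offloading is most accurate but becomes slow when the channel is bad.
TOY_ENV = {
    ("good", "exit1"): (0.80, 0.05),
    ("good", "exit2"): (0.88, 0.10),
    ("good", "offload"): (0.97, 0.12),
    ("bad", "exit1"): (0.80, 0.05),
    ("bad", "exit2"): (0.88, 0.10),
    ("bad", "offload"): (0.97, 0.60),
}


def reward(accuracy, latency_s, lam=1.0):
    # Hypothetical trade-off: accuracy term minus weighted time overhead.
    return accuracy - lam * latency_s


class EpsilonGreedyDecider:
    """Tabular epsilon-greedy chooser with one value estimate per (state, action)."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.values = defaultdict(float)  # (state, action) -> mean reward
        self.counts = defaultdict(int)    # (state, action) -> number of updates

    def best(self, state):
        # Greedy choice: action with the highest current estimate.
        return max(self.actions, key=lambda a: self.values[(state, a)])

    def choose(self, state):
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best(state)

    def update(self, state, action, r):
        # Incremental mean: Q <- Q + (r - Q) / n
        key = (state, action)
        self.counts[key] += 1
        self.values[key] += (r - self.values[key]) / self.counts[key]


decider = EpsilonGreedyDecider(ACTIONS, epsilon=0.2, seed=0)
for step in range(2000):
    state = "good" if step % 2 == 0 else "bad"
    action = decider.choose(state)
    acc, lat = TOY_ENV[(state, action)]
    decider.update(state, action, reward(acc, lat))

print(decider.best("good"))  # offload
print(decider.best("bad"))   # exit2
```

With these toy numbers, the learned policy offloads when the channel is good and stops at the second front-end exit when transmission is slow, which mirrors the accuracy/time trade-off the abstract describes. A real system would likely use richer channel states and measured rewards.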