Motion and depth information are crucial for position tracking and obstacle avoidance in rapidly growing applications such as drones and AR/VR glasses. Low-power, low-latency vision sensors are therefore in high demand [1–5]. For motion detection (MD), there are asynchronous event-based [1] and synchronous frame-based [2–5] approaches. The event-based approach reduces redundant data traffic, but at the cost of data collection time, a special processing engine, and noise sensitivity. The frame-based approach stores and subtracts frame data, which requires an in-pixel storage element and a complex pixel circuit. To address these issues, this work presents a frame-based MD vision sensor with a new temporal contrast pixel (TCP), which realizes in-pixel temporal contrast calculation and event reporting through global-shutter and frame-differencing pulse-width-modulation (PWM) operations using only 6 transistors and 1 capacitor (6T1C). Compared to the ping-pong structure in [5], which uses 10T2C, the proposed TCP provides more features, including consecutive event reporting, in a 0.64x smaller pixel with a 2.7x higher fill factor (FF). Moreover, local binary pattern (LBP) and region-of-interest (ROI) extraction are also implemented on-chip for disparity calculation and depth sensing in a stereo vision system.
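For readers less familiar with the terms above, the following is a minimal software sketch of frame differencing with thresholded event reporting, ROI extraction, and an LBP code. It is a behavioral illustration only and does not model the 6T1C pixel circuit or the PWM readout. The function names, the `threshold` parameter, the bounding-box definition of the ROI, and the 8-neighbor LBP variant are all assumptions made for this example and are not taken from the chip.

```python
# Behavioral sketch only: all names and parameters here are hypothetical.

def frame_difference_events(prev, curr, threshold):
    """Return a map of +1 / -1 / 0 per pixel from two frames.

    +1: intensity rose by more than `threshold` (assumed parameter)
    -1: intensity fell by more than `threshold`
     0: no event
    """
    events = []
    for prev_row, curr_row in zip(prev, curr):
        row = []
        for p, c in zip(prev_row, curr_row):
            diff = c - p
            if diff > threshold:
                row.append(1)
            elif diff < -threshold:
                row.append(-1)
            else:
                row.append(0)
        events.append(row)
    return events


def roi_bounding_box(events):
    """Assumed ROI definition: the tightest box around all nonzero events.

    Returns (top, left, bottom, right), or None if there are no events.
    """
    coords = [(r, c)
              for r, row in enumerate(events)
              for c, e in enumerate(row) if e != 0]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


def lbp_code(img, r, c):
    """Standard 8-neighbor LBP code for an interior pixel (r, c).

    Each neighbor contributes one bit, set when neighbor >= center.
    Bit order is clockwise starting at the top-left neighbor.
    """
    center = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


prev_frame = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
curr_frame = [[10, 10, 10, 10],
              [10, 50, 10, 10],
              [10, 10, 2, 10],
              [10, 10, 10, 10]]

events = frame_difference_events(prev_frame, curr_frame, threshold=5)
roi = roi_bounding_box(events)
print(events)
print(roi)
print(lbp_code(curr_frame, 1, 2))
```

In a stereo setup, LBP codes computed inside the ROI of each view could be compared along a row to estimate disparity; that matching step is omitted here because the text does not specify how it is done.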