Shared bicycle systems frequently suffer from severe regional imbalance: one area runs short of bicycles while another holds a surplus. Rebalancing the fleet can be abstracted as the Vehicle Routing Problem with Simultaneous Pickup and Delivery and Time Windows (VRPSPDTW), whose primary objective is to schedule vehicles among multiple sites so as to minimize total transportation cost. In practice, however, the demand at each site fluctuates over time, which makes the problem considerably harder to solve. This study therefore employs a deep reinforcement learning algorithm with a pointer network and constructs a computational environment that accommodates dynamic demand and lifts the constraint on the number of scheduled vehicles. A node masking mechanism is integrated so that the model learns high-quality scheduling strategies for the shared bicycle scheduling problem. Two scheduling strategies are proposed: one prioritizes cost optimization, and the other emphasizes fulfilling user demand. Experimental results show that the proposed model outperforms conventional algorithms in both solution quality and computational efficiency.