Aerial tracking of dispersed crowd groups with a single target window is a novel and highly challenging problem in Computer Vision and Robotics. Treating a crowd group as a multi-object tracking problem often leads to a heavy computational burden and frequent target mismatches due to numerous occlusions, whereas a single window can focus efficiently on the target. Recent progress in single object tracking (SOT) has been achieved by learning a generic discriminator model from object tracking datasets and continuously updating it at test time. However, when a crowd group is tracked with a single window, this rigid discriminator cannot generalize to frequent group reformation, binomial dispersion, and changes in crowd shape, because it has limited knowledge of human-to-human interactions. To alleviate these issues, we first propose a novel photo-realistic Unreal UAV Crowd Tracking (UUCT) dataset, which benchmarks aerial crowd group movements across several attributes. Second, we formulate a novel algorithm, Hybrid Motion Pooling (HyMP), which extends the existing SOT algorithm DiMP by using graph convolutional networks to learn human groups and low-rank bilinear pooling to capture temporal group reformations in an end-to-end manner. We then compare HyMP with state-of-the-art (SOTA) trackers on UUCT to demonstrate its effectiveness in group tracking. We also demonstrate the generalizability of HyMP by evaluating it on existing benchmarks. On average, HyMP outperforms SOTA approaches by 7.5% on UUCT and 4.3% on related datasets.
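The abstract does not spell out HyMP's pooling layer, so the following is only a generic sketch of low-rank bilinear pooling in its common Hadamard-product form, f = Pᵀ(tanh(Uᵀx) ∘ tanh(Vᵀy)). Instead of a full bilinear weight tensor, each input is projected to a shared rank-d space, the projections are multiplied element-wise, and the result is projected to the output. Everything here is an assumption for illustration: the function names, the matrices `U`, `V`, `P`, the choice of `tanh`, and especially the idea that `x` and `y` are hypothetical group features from two consecutive frames (one plausible reading of "capturing temporal group reformations", not the authors' confirmed design).

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec_t(W: Matrix, x: Vector) -> Vector:
    """Compute W^T x, where W has shape (len(x), k)."""
    assert len(W) == len(x), "row count of W must match length of x"
    k = len(W[0])
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(k)]


def low_rank_bilinear_pool(x: Vector, y: Vector, U: Matrix, V: Matrix, P: Matrix) -> Vector:
    """Hypothetical low-rank bilinear pooling: P^T (tanh(U^T x) * tanh(V^T y)).

    U: (dim_x, d), V: (dim_y, d), P: (d, dim_out); '*' is the element-wise product.
    """
    hx = [math.tanh(v) for v in matvec_t(U, x)]  # rank-d projection of x
    hy = [math.tanh(v) for v in matvec_t(V, y)]  # rank-d projection of y
    joint = [a * b for a, b in zip(hx, hy)]      # Hadamard product couples the two inputs
    return matvec_t(P, joint)                    # map to the output dimension


# Usage with made-up numbers (not from the paper):
x_prev = [0.5, -1.0, 0.2]  # hypothetical group feature at frame t-1
x_curr = [0.4, -0.8, 0.3]  # hypothetical group feature at frame t
U = [[0.1, 0.2], [0.0, -0.3], [0.5, 0.1]]  # 3 x d, with d = 2
V = [[0.2, -0.1], [0.3, 0.0], [-0.2, 0.4]]  # 3 x d
P = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]     # d x 3 output projection

f = low_rank_bilinear_pool(x_prev, x_curr, U, V, P)
print(len(f))  # -> 3
```

The appeal of this factorization is cost: a full bilinear map needs a dim_x × dim_y × dim_out tensor, whereas the low-rank form needs only the three matrices above, with d controlling the trade-off between capacity and parameter count.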