Tree counting plays an important role in a wide range of applications, including environmental protection, agricultural planning, and crop yield estimation. However, traditional tree counting methods rely on expensive feature engineering, which introduces additional errors and cannot be optimized as a whole. Recently, deep learning based approaches have been adopted for this task and have demonstrated state-of-the-art performance. In this paper, a point-wise supervised segmentation network is proposed. It builds on a deep segmentation network, requires only weak supervision, and simultaneously localizes each tree and generates its mask. In the first step, a tree feature extractor module with a novel encoder–decoder network extracts features from the input images. In the second step, an effective strategy is designed to handle different conditions in the mask predictions. Finally, basic localization and rectification guidance are introduced to train the whole network. In addition, two new datasets are created and an existing challenging plant dataset is selected to evaluate the proposed method. Experimental results on these datasets show that the proposed method outperforms state-of-the-art methods in most challenging conditions. Because it generates effective masks automatically, the method has great potential to reduce human labor.