Federated Learning (FL) is a promising distributed learning paradigm that has recently gained attention from both academia and industry. One challenge in FL is that when local data across different devices are not independent and identically distributed (non-IID), models trained with FL generally suffer degraded performance. To address this problem, a natural approach is clustering: clients with similar data distributions are grouped into the same clusters, and each cluster trains a specialized model. However, the features used for clustering generally rely on a single global model trained during FL, whose convergence usually incurs high communication cost. In this paper, we propose CAFL, an energy-efficient clustering method for FL. In CAFL, a client's clustering features are derived not from a global model trained collaboratively via FL, but from a tensor of gradient vectors computed on its local data. This approach greatly reduces the communication overhead of clustering. We validate CAFL on simulated datasets, including Fashion-MNIST and CIFAR-10, and the results show that, compared with existing clustering methods in FL, CAFL achieves much lower communication cost while maintaining high clustering accuracy.
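The abstract does not fix an implementation, but the core idea can be illustrated. Below is a minimal sketch, assuming PyTorch and scikit-learn; the `client_gradient_feature` helper, the probe model, and the toy client data are hypothetical illustrations rather than CAFL's actual design. Each client performs a single backward pass of a shared, randomly initialized model on its local data and ships only the flattened gradient vector, which the server then clusters, so no FL training rounds are needed before clustering.

```python
# Illustrative sketch only (not the authors' implementation): clients compute
# gradient vectors of a shared initial model on local data; the server
# clusters those vectors with k-means.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def client_gradient_feature(model, x, y):
    """One local backward pass; the flattened gradient serves as the
    client's clustering feature (hypothetical interface)."""
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()]).numpy()

# Server side: one shared random initialization, no collaborative training.
torch.manual_seed(0)
probe = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # Fashion-MNIST-shaped probe

# Toy stand-ins for each client's private (x, y) data.
clients = [(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,)))
           for _ in range(8)]
features = [client_gradient_feature(probe, x, y) for x, y in clients]

labels = KMeans(n_clusters=2, n_init=10).fit_predict(features)
print(labels)  # cluster assignment per client
```

Under this sketch, each client uploads a single gradient vector instead of participating in many training rounds, which is the source of the communication savings the abstract claims.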