Despite its promising potential, federated learning (FL) faces challenges such as high communication costs and client heterogeneity. Although numerous works have been proposed to address these issues, they lack a holistic perspective that balances all requirements. Moreover, existing solutions do not fully utilize the underlying computation capability and network resources, resulting in suboptimal tradeoffs between communication efficiency and inference accuracy. To overcome these challenges, we propose FDL, a federated distillation learning framework that combines federated distillation (FD) and FL to fully utilize computation and network resources. We theoretically derive a convergence bound for the proposed FDL framework. Furthermore, to minimize training time while maintaining inference accuracy, we design HAD, a heterogeneity-aware FL/FD selection algorithm that determines the total number of communication rounds and selects the set of FL and FD nodes in each round. We also theoretically prove the optimality of HAD. Together, the FDL framework and the HAD algorithm minimize training time while satisfying the inference accuracy requirement in a heterogeneous and dynamic environment. Extensive experiments on various learning algorithms and data sets show that the proposed FDL-HAD solution obtains the optimal selection decision in substantially less selection time than the Gurobi solver and reduces overall training time by at least 44.8% compared with FL solutions achieving the same inference accuracy.
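To make the FL/FD selection problem concrete, the following is a minimal Python sketch of per-round mode assignment under a simple compute-plus-upload cost model. It is an illustration only, not the HAD algorithm itself: the client profile fields, the payload constants, and the greedy rule (keep the cheapest FL candidates on full-model FL, move the rest to lightweight FD) are all our assumptions for exposition.

```python
from dataclasses import dataclass

@dataclass
class Client:
    """Hypothetical per-client profile (illustrative, not from the paper)."""
    cid: int
    compute_speed: float   # local training throughput (batches/s), assumed known
    uplink_mbps: float     # uplink bandwidth in Mbps, assumed known

MODEL_SIZE_MB = 40.0   # assumed payload: full model upload for FL nodes
LOGIT_SIZE_MB = 0.5    # assumed payload: distilled logits upload for FD nodes
LOCAL_BATCHES = 100    # assumed local work per communication round

def round_time(client: Client, mode: str) -> float:
    """Estimated per-round time = local compute time + upload time."""
    compute = LOCAL_BATCHES / client.compute_speed
    payload = MODEL_SIZE_MB if mode == "FL" else LOGIT_SIZE_MB
    upload = payload * 8 / client.uplink_mbps  # MB -> Mb, then divide by Mbps
    return compute + upload

def select_modes(clients, min_fl_nodes: int):
    """Greedy heterogeneity-aware split: the clients cheapest to run as FL
    nodes stay on FL; all others fall back to low-bandwidth FD."""
    ranked = sorted(clients, key=lambda c: round_time(c, "FL"))
    fl_set = {c.cid for c in ranked[:min_fl_nodes]}
    decision = {c.cid: ("FL" if c.cid in fl_set else "FD") for c in clients}
    # Round duration is set by the slowest participant (synchronous rounds).
    makespan = max(round_time(c, decision[c.cid]) for c in clients)
    return decision, makespan

if __name__ == "__main__":
    clients = [Client(0, 20, 100), Client(1, 5, 10), Client(2, 10, 2)]
    decision, t = select_modes(clients, min_fl_nodes=1)
    print(decision, f"estimated round time: {t:.1f}s")
```

In this toy cost model, moving a bandwidth-constrained client from FL to FD shrinks its upload from 40 MB to 0.5 MB per round, which is the kind of tradeoff the paper's selection algorithm optimizes jointly with the accuracy constraint; HAD additionally chooses the total number of rounds, which this sketch omits.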