Accurately estimating conversion rate (CVR) is one of the central problems in online advertising. Existing production methods focus on learning effective feature interactions to boost model performance. Despite their success, these methods treat all features equally. However, different features suffer from cold-start issues to different degrees. Tail elements of high-cardinality features, which we denote as fine-grained features, tend to have too few samples and thus fail to obtain semantically meaningful embeddings. Interactions involving those features mislead the model and impair prediction accuracy for new ads in cold-start scenarios. In this paper, we propose the Automatic Fusion Network (AutoFuse) to address this challenge. AutoFuse explicitly separates features into groups based on their granularity and learns multiple levels of representation conditioned on different combinations of feature groups. Concretely, AutoFuse learns an ad-level representation that depicts the unique character of an individual ad, and a group-level representation that portrays collective information by discarding the fine-grained features. The final ad representation, which is both robust and general, is obtained by adaptively integrating the two levels of representation. This combination draws on a broader range of information and thereby mitigates the cold-start issue. Extensive experiments on two industrial-scale datasets and three public datasets show that AutoFuse significantly and consistently outperforms a spectrum of competitive methods, including our currently deployed model. Moreover, the remarkable improvement on new ads validates the effectiveness of our method in cold-start scenarios. We design AutoFuse as a generic approach, so it can be transferred seamlessly to other domains. Our method has been deployed online to serve billions of users and ads and has achieved a significant GMV gain of 2.84%.
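To make the two-level idea concrete, the following is a minimal sketch and not the AutoFuse implementation. It assumes mean pooling over feature embeddings and a scalar sigmoid gate for the adaptive integration; the abstract does not specify either choice. The feature names (`ad_id`, `category`, `advertiser_industry`) and the helpers `embed`, `mean_pool`, and `fuse` are hypothetical and chosen only for illustration.

```python
import math
import random

random.seed(0)
DIM = 4  # embedding size (arbitrary, for illustration)

# Hypothetical grouping of features by granularity.
FINE_GRAINED = {"ad_id"}                              # high-cardinality, sparse tail
COARSE_GRAINED = {"category", "advertiser_industry"}  # shared across many ads

_embeddings = {}

def embed(feature, value):
    """Look up (or lazily create) a small random embedding for a feature value."""
    key = (feature, value)
    if key not in _embeddings:
        _embeddings[key] = [random.gauss(0.0, 0.1) for _ in range(DIM)]
    return _embeddings[key]

def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse(ad, gate_weights, gate_bias=0.0):
    """Blend ad-level and group-level representations with a scalar gate.

    The gate form is an assumption made for this sketch, not the paper's design.
    """
    # Ad-level: conditioned on all features, including fine-grained ones.
    all_feats = sorted(FINE_GRAINED | COARSE_GRAINED)
    ad_level = mean_pool([embed(f, ad[f]) for f in all_feats])
    # Group-level: fine-grained features are discarded.
    group_level = mean_pool([embed(f, ad[f]) for f in sorted(COARSE_GRAINED)])
    # Gate in (0, 1) computed from the concatenated representations.
    score = sum(w * x for w, x in zip(gate_weights, ad_level + group_level))
    g = sigmoid(score + gate_bias)
    fused = [g * a + (1.0 - g) * b for a, b in zip(ad_level, group_level)]
    return fused, g

ad = {"ad_id": "ad_123", "category": "shoes", "advertiser_industry": "apparel"}
fused, gate = fuse(ad, gate_weights=[0.0] * (2 * DIM))
```

With all-zero gate weights the gate is exactly 0.5, so the fused vector is the plain average of the two levels. In a trained model the gate parameters would be learned, and one would expect the gate to lean toward the group-level representation for new ads whose `ad_id` embedding has seen few samples.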