Recent unsupervised representation learning methods rely heavily on various transformations to generate distinctive views of given samples. The transformations for these views are generally defined manually, requiring significant human effort to design detailed configurations and validate their practical efficacy. Furthermore, the diversity of these views is quite limited in scope, causing the network to be invariant to only a small set of data transformations. To address these problems, we introduce a neural transformation network that learns to generate diverse views. Our proposed framework consists of an encoder-decoder architecture that encodes semantic information and then randomly stylizes it with style amplification. However, such generative processes tend to cause degradation relative to the original images, which can harm the quality of the learned representations. To remedy this issue and generate more diverse styles, we apply a linear augmentation that interpolates between the generated view and the original image. Finally, we apply geometric transformations to aid contrastive learning of representations. We evaluate the learned representations on various downstream vision tasks. The results show highly competitive recognition performance compared to state-of-the-art methods that use learned or hand-crafted views for representation learning.
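The view-generation pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the stylized view is stood in for by additive noise (a real system would use the decoder output), and the interpolation weight range and flip probability are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

def blend_with_original(original, stylized, low=0.5, high=1.0):
    """Linear augmentation: interpolate the generated view toward the
    original image to limit degradation (weight range is hypothetical)."""
    lam = rng.uniform(low, high)
    return lam * stylized + (1.0 - lam) * original

def random_flip(image):
    """Simple geometric transformation: horizontal flip with p = 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image

# Toy example: an 8x8 grayscale "image" in [0, 1].
x = rng.random((8, 8))
# Stand-in for the stylized decoder output (clipped back to [0, 1]).
stylized = np.clip(x + 0.3 * rng.standard_normal((8, 8)), 0.0, 1.0)
view = random_flip(blend_with_original(x, stylized))
```

Because the blend is a convex combination of two images in [0, 1], the resulting view stays in the same value range before the geometric transformation is applied.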