Novel view synthesis with neural radiance fields has improved substantially with advances in deep learning. However, making such methods generalize across scenes remains challenging. A promising strategy is to introduce 2D features extracted from images of the scene as prior knowledge for adaptive modeling. In this paper, we explore how to better fuse multi-view image features and how to condition this prior knowledge on the queried novel target view. Our framework adopts a transformer encoder to fuse multi-view features into a global memory; this memory is then fed to a transformer decoder, which takes the target view as a query and produces more effective view-conditioned features. These features act as prior knowledge that guides the model toward a generalizable neural radiance field. Extensive experiments on both category-specific and category-agnostic benchmarks show that TransNeRF achieves state-of-the-art performance and outperforms earlier novel view synthesis methods under both single-view and multi-view input settings.
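To make the encoder-decoder conditioning concrete, the following is a minimal sketch, not the authors' implementation: a transformer encoder fuses per-view image features into a global memory, and a transformer decoder queries that memory with a target-view embedding to produce the conditioning feature. The module name `ViewConditioner`, the pose parameterization, and all dimensions and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of the encode-then-query conditioning described above.
import torch
import torch.nn as nn

class ViewConditioner(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, d_model=256, nhead=8, num_layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        # Assumed query: a flattened 3x4 target camera extrinsic matrix.
        self.pose_embed = nn.Linear(12, d_model)

    def forward(self, view_feats, target_pose):
        # view_feats: (B, N_views, d_model) 2D features from the source views
        # target_pose: (B, 12) flattened extrinsics of the queried target view
        memory = self.encoder(view_feats)                   # global memory over views
        query = self.pose_embed(target_pose).unsqueeze(1)   # (B, 1, d_model) target query
        cond = self.decoder(query, memory)                  # target-conditioned feature
        return cond.squeeze(1)                              # (B, d_model) prior for the NeRF MLP

feats = torch.randn(4, 3, 256)  # 4 scenes, 3 source views each (assumed shapes)
pose = torch.randn(4, 12)
prior = ViewConditioner()(feats, pose)
print(prior.shape)  # torch.Size([4, 256])
```

In this sketch the returned feature would be concatenated with the usual positional encodings of sample points and fed to the radiance-field MLP; that interface is likewise an assumption about how the conditioning is consumed.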