This paper proposes the world's first deep learning (DL)-assisted video encoder LSI. Fabricated in a 10 nm process with a core area of 0.76 mm², it integrates quad-core DL accelerators with a 4K×2K H.264/H.265 video encoder. A newly designed visual-contact-field network (VCFNet) DL model predicts human focus information to greatly reduce encoding complexity, yielding an 82.3% power reduction. In addition, input channel reduction and layer merging reduce VCFNet complexity by 69%. Operating at 0.9 V and 504 MHz, the proposed DL-assisted 4K video encoder LSI consumes 56.54 mW and achieves an energy efficiency of 0.22 nJ/pixel, which is 0.1 to 14 nJ/pixel lower than conventional designs [1]–[3].
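The energy-efficiency figure follows from dividing power by pixel throughput. The abstract does not state the frame rate, so the short sketch below works backward from the reported 56.54 mW and 0.22 nJ/pixel to the pixel rate they imply. The 3840×2160 frame size used to convert that rate into frames per second is an assumption, and because 0.22 nJ/pixel is a rounded value, the result is approximate.

```python
# Back-of-the-envelope check of the reported energy efficiency.
# Energy per pixel (J/pixel) = power (W) / pixel throughput (pixel/s),
# so the implied throughput is power / energy_per_pixel.

def implied_pixel_rate(power_w: float, energy_per_pixel_j: float) -> float:
    """Pixel throughput (pixel/s) implied by a power and an energy per pixel."""
    return power_w / energy_per_pixel_j


def implied_frame_rate(pixel_rate: float, width: int, height: int) -> float:
    """Frames per second for a given pixel throughput and frame size."""
    return pixel_rate / (width * height)


POWER_W = 56.54e-3            # reported power consumption
ENERGY_PER_PIXEL_J = 0.22e-9  # reported energy efficiency (rounded)

rate = implied_pixel_rate(POWER_W, ENERGY_PER_PIXEL_J)
# Assumed frame size: 3840x2160 (the abstract only says "4Kx2K").
fps_uhd = implied_frame_rate(rate, 3840, 2160)

print(f"implied throughput: {rate / 1e6:.0f} Mpixel/s")
print(f"implied frame rate at 3840x2160: {fps_uhd:.1f} fps")
```

Under these assumptions the figures correspond to roughly 257 Mpixel/s, or about 31 fps at 3840×2160 (about 29 fps at 4096×2160). That is consistent with real-time 4K encoding at around 30 fps.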