The rapid development of deep learning has propelled many real-world artificial intelligence applications, many of which integrate multiple neural networks (multi-NN) to provide diverse functionalities. Multi-NN acceleration faces two challenges: (1) competition for shared resources becomes a bottleneck, and (2) heterogeneous workloads exhibit markedly different compute-memory characteristics and synchronization requirements. Resource isolation and fine-grained per-task resource allocation are therefore two fundamental requirements for multi-NN computing systems. Although a number of multi-NN acceleration techniques have been explored, few fully satisfy both requirements, especially in mobile scenarios. This paper presents a Hierarchical Asynchronous Parallel Model (HASP) that enhances multi-NN performance while meeting both requirements. HASP can be implemented on multicore processors adopting Multiple Instruction Multiple Data (MIMD) or Single Instruction Multiple Thread (SIMT) architectures, with only minor adaptive modification. A prototype chip is developed to validate the hardware effectiveness of this design, and a corresponding mapping strategy allows the proposed architecture to simultaneously improve resource utilization and throughput. Under the same workload, the prototype chip achieves $3.62\times$ and $3.51\times$ higher throughput than Planaria, and $8.68\times$ and $2.61\times$ higher throughput than Jetson AGX Orin, for MobileNet-V1 and ResNet50, respectively.