Dense systems with a large number of cores per node are becoming increasingly popular. Existing designs of the Process Management Interface (PMI) scale poorly on such systems, in both performance and memory consumption, when a large number of processes access the PMI interface concurrently. Our analysis shows that the local socket-based communication scheme used by PMI is a major bottleneck. While a shared-memory-based channel can avoid this bottleneck, and thus reduce memory consumption and improve performance, such a design poses several challenges. We investigate several such alternatives and propose a novel design that is based on a hybrid socket + shared memory communication protocol and uses multiple shared memory regions. This design can reduce the memory usage per node by a factor of. Our evaluations show that memory consumption per node can be reduced by an estimated 1 GB with 1 million MPI processes and 16 processes per node. Additionally, the performance of PMI Get is improved by 1,000 times compared to the existing design. The proposed design is backward compatible, secure, and imposes negligible overhead.