This paper describes the architecture of a shift-reduce discourse parsing framework that (1) leverages the representational power of BERT and Hierarchical Attention Networks (HANs) and (2) jointly learns syntactic features. The model yields context-, structure-, and syntax-aware text representations that support accurate parsing. Experiments on corpora in five languages show empirically that the model's performance rivals that of state-of-the-art parsers. Further, a qualitative analysis of the model's predictions shows that HANs help extract the words and segments from long texts that are most relevant for relation classification.
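As background for the shift-reduce formulation, the sketch below illustrates the generic shift-reduce loop over elementary discourse units (EDUs) that such parsers build on. The names (`Subtree`, `predict_action`) and the dummy predictor are illustrative assumptions, not the paper's implementation; in the actual model, the action classifier would score SHIFT/REDUCE transitions using the BERT/HAN-derived representations of the stack and queue.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Subtree:
    """A (partial) discourse subtree over a span of EDUs; leaves hold a single EDU."""
    left: Optional["Subtree"]
    right: Optional["Subtree"]
    relation: Optional[str]  # discourse relation label; None for leaf EDUs
    text: str

Action = Tuple[str, Optional[str]]  # ("shift", None) or ("reduce", relation_label)

def parse(edus: List[str],
          predict_action: Callable[[List[Subtree], List[Subtree]], Action]) -> Subtree:
    """Builds a discourse tree by repeatedly choosing SHIFT or REDUCE.

    `predict_action` is a stand-in for a learned classifier; here it only
    needs to return ("shift", None) or ("reduce", relation_label).
    """
    stack: List[Subtree] = []
    queue: List[Subtree] = [Subtree(None, None, None, e) for e in edus]

    while queue or len(stack) > 1:
        action, relation = predict_action(stack, queue)
        must_shift = len(stack) < 2  # REDUCE is illegal with fewer than 2 subtrees
        if queue and (action == "shift" or must_shift):
            stack.append(queue.pop(0))  # SHIFT: move the next EDU onto the stack
        else:
            right = stack.pop()         # REDUCE: merge the top two subtrees
            left = stack.pop()          # under the predicted discourse relation
            stack.append(Subtree(left, right, relation,
                                 left.text + " " + right.text))
    return stack[0]

# Usage with a trivial predictor: shift everything, then reduce with a fixed label.
def dummy_predict(stack: List[Subtree], queue: List[Subtree]) -> Action:
    return ("shift", None) if queue else ("reduce", "Elaboration")

tree = parse(["EDU one.", "EDU two.", "EDU three."], dummy_predict)
print(tree.relation)  # Elaboration
```

The loop terminates because every iteration either consumes one EDU from the queue or merges two stack items, so the total number of pending units strictly decreases.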