Many online large data sets grow slowly as new data is added at regular intervals, yet each round of growth requires recomputing over the entire data set, which leads to low efficiency and wasted computing resources. Incremental computation was proposed to address this problem. This paper analyzes the general pattern of incremental computation, draws on the ideas of existing incremental computation systems, and discusses how to architect a general-purpose incremental computation system on the open-source big data processing framework Hadoop, relying on its latest YARN mode.