Efficient Cache Update for In-Memory Cluster Computing with Spark

Resource Type: Conference
Authors: Ho, Li-Yung; Wu, Jan-Jan; Liu, Pangfeng; Shih, Chia-Chun; Huang, Chi-Chang; Huang, Chao-Wen
Source: 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp. 21-30, May 2017
Subject: Communication, Networking and Broadcast Technologies
Computing and Processing
Sparks
Electronic mail
Relational databases
Telecommunications
Distributed databases
Dynamic programming
big data computing
cache update
Spark
resilient distributed dataset
telecom billing system

Abstract

This paper proposes a scalable and efficient cache update technique to improve the performance of in-memory cluster computing in Spark, a popular open-source system for big data computing. Although the memory cache speeds up data processing in Spark, its data immutability constraint requires reloading the whole RDD when part of its data is updated. This constraint makes RDD updates inefficient. To address this problem, we divide an RDD into partitions and propose the partial-update RDD (PRDD) method, which enables users to replace individual partition(s) of an RDD. We devise two solutions to the RDD partition problem: a dynamic programming algorithm and a nonlinear programming method. Experimental results suggest that PRDD achieves a 4.32x speedup compared with the original RDD in Spark. We apply PRDD to a billing system for Chunghwa Telecom (CHT), the largest telecommunication company in Taiwan. Our results show that the PRDD-based billing system outperforms the original billing system at CHT by a factor of 24x in throughput. We also evaluate PRDD using the TPC-H benchmark, which also yields promising results.
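
The partial-update idea in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' PRDD implementation and uses no Spark API; the class name PartitionedCache, its methods, and the load counter are all hypothetical, chosen only to contrast reloading every partition with replacing just the partitions whose data changed.

```python
from typing import Callable, Dict, Iterable, List


class PartitionedCache:
    """Toy in-memory cache split into independently replaceable partitions.

    Hypothetical illustration only: not the paper's PRDD and not a Spark API.
    """

    def __init__(self, loader: Callable[[int], List[int]], num_partitions: int):
        self.loader = loader          # assumed function: partition id -> records
        self.load_count = 0           # how many partition loads were performed
        self.partitions: Dict[int, List[int]] = {}
        for pid in range(num_partitions):
            self._load(pid)

    def _load(self, pid: int) -> None:
        # Copy the records so the cached partition is a snapshot of the source.
        self.partitions[pid] = list(self.loader(pid))
        self.load_count += 1

    def full_reload(self) -> None:
        # Baseline behaviour: any change forces every partition to be reloaded.
        for pid in list(self.partitions):
            self._load(pid)

    def partial_update(self, changed: Iterable[int]) -> None:
        # Partial update: only the partitions named in `changed` are replaced.
        for pid in changed:
            self._load(pid)

    def total(self) -> int:
        # Example query over the cached data.
        return sum(sum(records) for records in self.partitions.values())


# Usage: four partitions backed by a mutable source; partition 2 then changes.
source = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [7, 8]}
cache = PartitionedCache(lambda pid: source[pid], num_partitions=4)
source[2] = [50, 60]
cache.partial_update([2])   # one load instead of four
```

In this sketch the cost of an update is proportional to the number of changed partitions rather than to the size of the whole dataset. How the data should be split into partitions is the separate problem for which the paper proposes its dynamic programming and nonlinear programming methods; the sketch does not attempt it.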
