With the advent of the big data era, row-oriented databases such as MySQL have become inadequate for new application scenarios involving massive data. Column-oriented storage offers better storage efficiency and query performance than row-oriented storage in big data settings. Parquet, a well-known columnar storage format for Hadoop, fits naturally into projects in the Hadoop ecosystem, especially big data applications such as Spark SQL. In this paper, we explain the storage principles of Parquet in detail and then analyze its compression and encoding technologies. We conduct several groups of experiments comparing MySQL and Spark SQL. Our results show that, for the given datasets, Spark SQL with Parquet requires 95% less storage than MySQL with indexes and is more efficient for fuzzy queries, which are frequently used in big data analysis. Our experiments also show that Spark SQL with Parquet has a lower I/O rate and higher CPU utilization than MySQL when performing the same query.
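To make the comparison concrete, the following is a minimal sketch of the kind of fuzzy query the experiments describe, run with Spark SQL over a Parquet dataset. This is not the paper's benchmark code; the file path `data/records.parquet` and the column `name` are hypothetical placeholders.

```python
# Minimal sketch (not the authors' benchmark code): load a Parquet dataset
# with Spark SQL and run a fuzzy (LIKE) query of the kind compared against MySQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-fuzzy-query").getOrCreate()

# Parquet is self-describing: the schema is read from the file's footer metadata.
# The path "data/records.parquet" is a hypothetical example.
df = spark.read.parquet("data/records.parquet")
df.createOrReplaceTempView("records")

# Fuzzy query: a leading-wildcard LIKE cannot use a B-tree index in MySQL,
# while Parquet's columnar layout lets Spark scan only the queried column.
result = spark.sql("SELECT * FROM records WHERE name LIKE '%smith%'")
result.show()

spark.stop()
```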