With the advent of the big data era, row-oriented databases such as MySQL have become inadequate for new application scenarios involving massive data. Column-oriented storage offers better storage efficiency and query performance than row-oriented storage in big data settings. Parquet, a well-known columnar storage format for Hadoop, fits naturally into projects in the Hadoop ecosystem, especially big data applications such as Spark SQL. In this paper, we explain the storage principles of Parquet in detail and then analyze its compression and encoding technologies. We conduct several groups of experiments comparing MySQL and Spark SQL. Our results show that, for the given datasets, Spark SQL with Parquet requires 95% less storage than MySQL with indexes and is more efficient for fuzzy queries, which are frequently used in big data analysis. Our experiments also show that Spark SQL with Parquet has a lower I/O rate and higher CPU utilization than MySQL when performing the same query.
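To make the comparison concrete, the following is a minimal sketch of the kind of fuzzy query the experiments describe, run with Spark SQL over a Parquet dataset. This is not the paper's benchmark code; the file path `data/records.parquet` and the column `name` are hypothetical placeholders.

```python
# Minimal sketch (not the authors' benchmark code): load a Parquet dataset
# with Spark SQL and run a fuzzy (LIKE) query of the kind compared against MySQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-fuzzy-query").getOrCreate()

# Parquet is self-describing: the schema is read from the file's footer metadata.
# The path "data/records.parquet" is a hypothetical example.
df = spark.read.parquet("data/records.parquet")
df.createOrReplaceTempView("records")

# Fuzzy query: a leading-wildcard LIKE cannot use a B-tree index in MySQL,
# while Parquet's columnar layout lets Spark scan only the queried column.
result = spark.sql("SELECT * FROM records WHERE name LIKE '%smith%'")
result.show()

spark.stop()
```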