Sunday, 8 April 2018

Parquet file-Read and Write


How to read a Parquet file

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

#on Databricks, sc and sqlContext already exist; elsewhere create them first
sc = SparkContext(conf=SparkConf().setAppName("parquet-read"))
sqlContext = SQLContext(sc)

#read the Parquet file into a DataFrame
parquetFile1 = sqlContext.read.parquet("StudentId5A.parquet")

#register a temp table from this DataFrame
parquetFile1.registerTempTable("Student1TB")

#apply an SQL query on this table
result1 = sqlContext.sql("select * from Student1TB")

result1.show()


How to write into a Parquet file

A structured DataFrame with named columns cannot be saved as a plain text file, because the text format cannot preserve the column structure.

So, save it as a Parquet file instead:

Result1 = sqlContext.sql("select * from StudentTB where Id > 5")
Result1.show()

#save the SQL result table as a Parquet file (note the dbfs:/ scheme is lowercase)
Result1.write.parquet("dbfs:/FileStore/tables/StudentId5A.parquet")


