Sunday, 8 April 2018

Wordcount sample exercise in Spark

from pyspark import SparkContext, SparkConf
# (on Databricks the SparkContext is already provided as sc)

#read the input text file into an RDD of lines
File1RDD=sc.textFile("/FileStore/tables/string1.txt")

#split each line into words on spaces
Word=File1RDD.flatMap(lambda x: x.split(" "))

#map each word to a (word, 1) pair
Word1=Word.map(lambda y:(y,1))

#reduceByKey sums the counts for identical words
Word2=Word1.reduceByKey(lambda x,y:x+y)

#swap each pair to (count, word) so we can sort by count
Word3=Word2.map(lambda x:(x[1],x[0]))

#sort by key (the count) in descending order and collect the results
Word3.sortByKey(False).collect()

#save in text file
#Word2.saveAsTextFile("/FileStore/tables/wordcount1.txt")
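The same pipeline can be sketched in plain Python, with no Spark cluster needed, to see exactly what each transformation computes. The input lines here are a made-up stand-in for string1.txt:

```python
# Plain-Python sketch of the wordcount pipeline above.
# The input text is hypothetical sample data, not the real string1.txt.
text_lines = ["spark makes wordcount easy", "wordcount in spark"]

# flatMap(lambda x: x.split(" ")) -- split every line into words
words = [w for line in text_lines for w in line.split(" ")]

# map(lambda y: (y, 1)) -- pair each word with a count of 1
pairs = [(w, 1) for w in words]

# reduceByKey(lambda x, y: x + y) -- sum the counts per word
counts = {}
for w, n in pairs:
    counts[w] = counts.get(w, 0) + n

# map(lambda x: (x[1], x[0])) then sortByKey(False) -- (count, word), descending.
# Note: Spark's sortByKey orders only by the count; sorted() on tuples also
# breaks ties by the word, so tied words may appear in a different order.
result = sorted(((n, w) for w, n in counts.items()), reverse=True)
print(result)
```

Running this prints the (count, word) pairs with the most frequent words first, which is what the `Word3.sortByKey(False).collect()` call returns on the cluster.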

Output: the sorted wordcount, a list of (count, word) pairs with the most frequent words first.
For the full application code, refer to:
http://www.geoinsyssoft.com/pyspark-wordcount-arogram/

