from pyspark import SparkContext, SparkConf
# In a Databricks notebook the SparkContext is already available as sc
#Reads an input set of text documents.
File1RDD=sc.textFile("/FileStore/tables/string1.txt")
#split each line into words on spaces
Word=File1RDD.flatMap(lambda x: x.split(" "))
#map each word to a (word, 1) pair
Word1=Word.map(lambda y:(y,1))
#apply reduceByKey to sum the counts of identical words
Word2=Word1.reduceByKey(lambda x,y:x+y)
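To see what the three transformations above compute, here is a minimal plain-Python sketch of the same logic (no Spark cluster needed); the sample lines are made-up assumptions, not the contents of string1.txt:

```python
from collections import Counter

lines = ["spark makes word count easy", "word count with spark"]

# flatMap: split each line on spaces and flatten into one word list
words = [w for line in lines for w in line.split(" ")]

# map: pair each word with an initial count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: sum the counts for each distinct word
counts = Counter()
for w, n in pairs:
    counts[w] += n

print(dict(counts))  # e.g. {'spark': 2, 'word': 2, 'count': 2, ...}
```

Spark performs the same per-key summation, but distributed across partitions.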
#swap each pair to (count, word) so the count becomes the key
Word3=Word2.map(lambda x:(x[1],x[0]))
#sort by key
Word3.sortByKey(False).collect()
#save in text file
#Word2.saveAsTextFile("/FileStore/tables/wordcount1.txt")
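The swap-and-sort step works because sortByKey orders pairs by their first element, and passing False sorts in descending order. A small plain-Python sketch of the same idea, using hypothetical (word, count) pairs:

```python
# Hypothetical reduced output, standing in for Word2's (word, count) pairs
word2 = [("spark", 2), ("easy", 1), ("count", 3)]

# Word2.map(lambda x: (x[1], x[0])): swap to (count, word)
word3 = [(c, w) for (w, c) in word2]

# sortByKey(False): sort by the first tuple element, descending
result = sorted(word3, key=lambda p: p[0], reverse=True)
print(result)  # [(3, 'count'), (2, 'spark'), (1, 'easy')]
```

Without the swap, sortByKey would order the pairs alphabetically by word rather than by frequency.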
Sorted word count
For application writing commands, refer:
http://www.geoinsyssoft.com/pyspark-wordcount-arogram/