Spark - Pair RDD Functions in scala

Category : Apache Spark | Sub Category : Apache Spark Programs | By Prasad Bonam Last updated: 2020-10-25 06:47:31 Viewed : 898


 Spark - Pair RDD Functions in Scala

·        def collectAsMap(): Map[String, Int]

Return the key-value pairs in this RDD to the master as a Map.

Warning: this doesn`t return a multimap (so if you have multiple values to the same key, only one value per key is preserved in the map returned)

·        def keys: RDD[String]

Return an RDD with the keys of each tuple.

·        def sortByKey(ascending: Boolean, numPartitions: Int): RDD[(String, Int)]

Sort the RDD by key, so that each partition contains a sorted range of the elements. Calling collect or save on the resulting RDD will return or output an ordered list of records (in the save case, they will be written to multiple part-X files in the filesystem, in order of the keys).

·        def flatMap[U](f: String => TraversableOnce[U])(implicit evidence$4: ClassTag[U]): RDD[U]

Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results. 

1.  Program Setup instructions:

Create a maven project and make sure present below jars in pom.xml

·        JDK 1.7 or higher

·        Scala 2.10.3

·        spark-core_2.10

·        scala-library  

2.  Example:
                   Following example illustrates about Spark - Pair RDD Functions in scala
                Save the file as −  SparkPairFunctions.scala

SparkPairFunctions .scala
 

package com.runnerdev.rdd

import org.apache.spark.SparkConf

import org.apache.spark.SparkContext

 

object SparkPairFunctions extends App{

  val conf = new SparkConf().setAppName("SparkPairFunctions").setMaster("local")

  // Create a Scala Spark Context.

  val sc = new SparkContext(conf)

 

  val rdd = sc.parallelize(

    List("Asia Egypt Turkey","India USA Singapore Canada,Turkey","USA India Germany China", "USA India Russia"  ))

 

  val wordsRdd = rdd.flatMap(_.split(" "))

  val pairRDD = wordsRdd.map(m => (m, 1))

  println("pairRDD: ")

  pairRDD.foreach(println)

 

  println("Distinct ==>")

  pairRDD.distinct().foreach(println)

 

  //SortByKey

  println("Sort by Key ==>")

  val sortRDD = pairRDD.sortByKey()

  sortRDD.foreach(println)

 

  //reduceByKey

  println("Reduce by Key ==>")

  val wordCount = pairRDD.reduceByKey((x, y) => x + y)

  wordCount.foreach(println)

 

  def param1 = (accu: Int, v: Int) => accu + v

  def param2 = (accu1: Int, accu2: Int) => accu1 + accu2

  println("Aggregate by Key ==> wordcount ")

  val wordCount2 = pairRDD.aggregateByKey(0)(param1, param2)

  wordCount2.foreach(println)

 

  //keys

  println("Keys ==>")

  wordCount2.keys.foreach(println)

 

  //values

  println("values ==>")

  wordCount2.values.foreach(println)

 

  println("Count :" + wordCount2.count())

 

  println("collectAsMap ==>")

  pairRDD.collectAsMap().foreach(println)

}


Compile and run the above example as follows

mvn clean install

run as a scala application-

 

 

Output:

 

pairRDD:

(Asia,1)

(Egypt,1)

(Turkey,1)

(India,1)

(USA,1)

(Singapore,1)

(Canada,Turkey,1)

(USA,1)

(India,1)

(Germany,1)

(China,1)

(USA,1)

(India,1)

(Russia,1)

Distinct ==>

(Germany,1)

(Singapore,1)

(India,1)

(Egypt,1)

(Turkey,1)

(Canada,Turkey,1)

(China,1)

(USA,1)

(Asia,1)

(Russia,1)

Sort by Key ==>

 

(Asia,1)

(Canada,Turkey,1)

(China,1)

(Egypt,1)

(Germany,1)

(India,1)

(India,1)

(India,1)

(Russia,1)

(Singapore,1)

(Turkey,1)

(USA,1)

(USA,1)

(USA,1)

 

Reduce by Key ==>

 

 

(Asia,1)

(China,1)

(USA,3)

(Singapore,1)

(Germany,1)

(Canada,Turkey,1)

(Egypt,1)

(Russia,1)

(Turkey,1)

(India,3)

 

Aggregate by Key ==> wordcount

Asia,1)

(China,1)

(USA,3)

(Singapore,1)

(Germany,1)

(Canada,Turkey,1)

(Egypt,1)

(Russia,1)

(Turkey,1)

(India,3)

 

Keys ==>


 

Asia

China

USA

Singapore

Germany

Canada,Turkey

Egypt

Russia

Turkey

India

values ==>

1

1

3

1

1

1

1

1

1

3

Count :10

collectAsMap ==>

(Egypt,1)

(Asia,1)

(Canada,Turkey,1)

(Germany,1)

(China,1)

(Russia,1)

(India,1)

(Turkey,1)

(Singapore,1)

(USA,1)

Search
Related Articles

Leave a Comment: