Apache Spark RDD (Resilient Distributed Dataset) Examples in Scala

Category: Apache Spark | Sub Category: Apache Spark Programs | By Prasad Bonam | Last updated: 2023-10-01 00:43:26


Here are some Apache Spark RDD (Resilient Distributed Dataset) examples in Scala with their respective outputs:

  1. Creating an RDD from a Collection and Performing Basic Operations:
scala
import org.apache.spark.{SparkConf, SparkContext}

// Create a SparkConf and SparkContext
val conf = new SparkConf().setAppName("RDDExample").setMaster("local[*]")
val sc = new SparkContext(conf)

// Create an RDD from a collection
val data = Seq(1, 2, 3, 4, 5)
val rdd = sc.parallelize(data)

// Calculate the sum of elements
val sum = rdd.reduce(_ + _)

// Print the RDD elements and the sum.
// collect() returns the elements to the driver in order; rdd.foreach(println)
// runs inside the tasks and does not guarantee the print order.
rdd.collect().foreach(println)
println(s"Sum of elements: $sum")

// Stop the SparkContext
sc.stop()

Output:

1
2
3
4
5
Sum of elements: 15

  2. Filtering and Mapping an RDD:
scala
import org.apache.spark.{SparkConf, SparkContext}

// Create a SparkConf and SparkContext
val conf = new SparkConf().setAppName("RDDExample").setMaster("local[*]")
val sc = new SparkContext(conf)

// Create an RDD from a list of words
val words = sc.parallelize(List("hello", "world", "scala", "spark"))

// Filter words starting with "s"
val filteredWords = words.filter(_.startsWith("s"))

// Map every word (not only the filtered ones) to a (word, length) pair
val wordLengths = words.map(word => (word, word.length))

// Print the filtered words and word lengths.
// collect() keeps the printed order stable; foreach(println) directly on the
// RDD runs inside the tasks and may print in any order.
filteredWords.collect().foreach(println)
wordLengths.collect().foreach(println)

// Stop the SparkContext
sc.stop()

Output:

scala
spark
(hello,5)
(world,5)
(scala,5)
(spark,5)

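In the example above, the filter and the map are applied to the same `words` RDD independently. As a small sketch of how transformations can also be chained, the following applies `map` to the result of `filter`. The application name `ChainedTransformationsSketch` and the value name `sLengths` are illustrative choices, not part of the original example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative app name; local[*] runs Spark inside the current JVM
val conf = new SparkConf().setAppName("ChainedTransformationsSketch").setMaster("local[*]")
val sc = new SparkContext(conf)

// Chain filter and map, then collect the result to the driver
val sLengths = sc.parallelize(List("hello", "world", "scala", "spark"))
  .filter(_.startsWith("s"))          // keep only words starting with "s"
  .map(word => (word, word.length))   // pair each remaining word with its length
  .collect()

sLengths.foreach(println)

// Stop the SparkContext
sc.stop()
```

Here only the two words that pass the filter reach the `map` step, so the collected array holds `(scala,5)` and `(spark,5)`.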
  3. Word Count Example:
scala
import org.apache.spark.{SparkConf, SparkContext}

// Create a SparkConf and SparkContext
val conf = new SparkConf().setAppName("WordCountExample").setMaster("local[*]")
val sc = new SparkContext(conf)

// Create an RDD from a text file
val textFile = sc.textFile("path_to_text_file.txt")

// Perform word count
val wordCounts = textFile
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Print word counts (the order of the printed pairs is not guaranteed)
wordCounts.foreach(println)

// Stop the SparkContext
sc.stop()

Output (for example input text):

(word1,3)
(word2,2)
(word3,1)
...
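The word count above depends on an input file. As a minimal sketch that runs without one, the variant below counts words in a small in-memory collection and sorts the result. The sample lines, the application name `WordCountSketch`, and the value name `sortedCounts` are assumptions made for illustration. Lowercasing and splitting on whitespace runs are optional refinements, not something the original example does.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative app name; local[*] runs Spark inside the current JVM
val conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[*]")
val sc = new SparkContext(conf)

// Hypothetical sample input standing in for the lines of a text file
val lines = sc.parallelize(Seq("spark makes rdds", "rdds are resilient", "spark"))

val sortedCounts = lines
  .flatMap(_.toLowerCase.split("\\s+"))              // split on any run of whitespace
  .filter(_.nonEmpty)                                // drop empty tokens
  .map(word => (word, 1))                            // pair each word with a count of 1
  .reduceByKey(_ + _)                                // sum the counts per word
  .sortBy { case (word, count) => (-count, word) }   // highest count first, ties by word
  .collect()                                         // bring the sorted pairs to the driver

sortedCounts.foreach(println)

// Stop the SparkContext
sc.stop()
```

For this sample input, `rdds` and `spark` each appear twice and the other words once, so the two pairs with count 2 are printed first.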

These are some basic Apache Spark RDD examples in Scala. They demonstrate how to create RDDs, perform transformations and actions, and work with simple data operations. You can modify and expand upon these examples to suit your specific needs.
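
The difference between transformations and actions can be made visible with a small sketch. Transformations such as `map` are lazy, and Spark only runs them when an action such as `reduce` is called. The snippet below uses a `LongAccumulator` purely as a counter to observe this. The names `LazyEvalSketch`, `evaluated`, `before`, and `after` are illustrative. Note that Spark does not guarantee exactly-once accumulator updates inside transformations when tasks are retried, so this is a demonstration for a simple local run, not a general counting technique.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative app name; local[*] runs Spark inside the current JVM
val conf = new SparkConf().setAppName("LazyEvalSketch").setMaster("local[*]")
val sc = new SparkContext(conf)

// Counts how many elements the map function has processed
val evaluated = sc.longAccumulator("evaluated")

val numbers = sc.parallelize(1 to 5)

// Transformation: only records what to compute, nothing runs yet
val doubled = numbers.map { n => evaluated.add(1); n * 2 }

val before = evaluated.sum          // read before any action has run

// Action: triggers a job that evaluates the map for every element
val total = doubled.reduce(_ + _)

val after = evaluated.sum           // read after the action has finished

println(s"Processed before action: $before")
println(s"Total: $total")
println(s"Processed after action: $after")

// Stop the SparkContext
sc.stop()
```

In a local run without task retries, the counter stays at zero until `reduce` is called and then reflects the five processed elements.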
