Apache Spark RDD (Resilient Distributed Dataset) examples in Scala



Here are some Apache Spark RDD (Resilient Distributed Dataset) examples in Scala with their respective outputs:

1. Creating an RDD from a Collection and Performing Basic Operations:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Create a SparkConf and SparkContext
val conf = new SparkConf().setAppName("RDDExample").setMaster("local[*]")
val sc = new SparkContext(conf)

// Create an RDD from a collection
val data = Seq(1, 2, 3, 4, 5)
val rdd = sc.parallelize(data)

// Calculate the sum of elements
val sum = rdd.reduce(_ + _)

// Print the RDD elements and the sum
rdd.foreach(println)
println(s"Sum of elements: $sum")

// Stop the SparkContext
sc.stop()
```

Output (element order may vary, since `foreach` runs in parallel on the executors):

```text
1
2
3
4
5
Sum of elements: 15
```
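Note that `rdd.foreach(println)` executes on the executors, so on a real cluster the output goes to executor logs rather than the driver console. A minimal sketch of a driver-side alternative (safe only for small RDDs, since `collect()` pulls all elements into driver memory), along with a few other built-in actions:

```scala
// Bring all elements to the driver and print them there
rdd.collect().foreach(println)

// A few other common actions on a numeric RDD
println(s"Count: ${rdd.count()}") // 5
println(s"Max:   ${rdd.max()}")   // 5
println(s"Mean:  ${rdd.mean()}")  // 3.0
```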
2. Filtering and Mapping an RDD:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Create a SparkConf and SparkContext
val conf = new SparkConf().setAppName("RDDExample").setMaster("local[*]")
val sc = new SparkContext(conf)

// Create an RDD from a list of words
val words = sc.parallelize(List("hello", "world", "scala", "spark"))

// Filter words starting with "s"
val filteredWords = words.filter(_.startsWith("s"))

// Map words to their lengths
val wordLengths = words.map(word => (word, word.length))

// Print the filtered words and word lengths
filteredWords.foreach(println)
wordLengths.foreach(println)

// Stop the SparkContext
sc.stop()
```

Output:

```text
scala
spark
(hello,5)
(world,5)
(scala,5)
(spark,5)
```
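Because every transformation returns a new RDD, operations like the two above can also be chained into a single lazy pipeline; a minimal sketch using the same `words` RDD:

```scala
// Chain filter and map; nothing runs until an action such as collect() is called
val sWordLengths = words
  .filter(_.startsWith("s"))        // keep "scala" and "spark"
  .map(word => (word, word.length)) // pair each word with its length

sWordLengths.collect().foreach(println)
// (scala,5)
// (spark,5)
```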
3. Word Count Example:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Create a SparkConf and SparkContext
val conf = new SparkConf().setAppName("WordCountExample").setMaster("local[*]")
val sc = new SparkContext(conf)

// Create an RDD from a text file
val textFile = sc.textFile("path_to_text_file.txt")

// Perform word count
val wordCounts = textFile
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Print word counts
wordCounts.foreach(println)

// Stop the SparkContext
sc.stop()
```

Output (for example input text):

```text
(word1,3)
(word2,2)
(word3,1)
...
```
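In practice, it is often useful to see the most frequent words first. One way to do that (a minimal sketch; the choice of the top 10 is illustrative) is to sort by count before fetching results to the driver:

```scala
// Sort by count in descending order and fetch the ten most frequent words
val topWords = wordCounts
  .sortBy({ case (_, count) => count }, ascending = false)
  .take(10)

topWords.foreach(println)
```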

These are some basic Apache Spark RDD examples in Scala. They demonstrate how to create RDDs, perform transformations and actions, and work with simple data operations. You can modify and expand upon these examples to suit your specific needs.
