Spark- Create an empty RDD

Category: Apache Spark | Sub Category: Apache Spark Programs | By Runner Dev | Last updated: 2020-10-25 10:35:53 | Viewed: 273


·         org.apache.spark.SparkConf

Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.

Most of the time, you would create a SparkConf object with new SparkConf(), which will load values from any spark.* Java system properties set in your application as well. In this case, parameters you set directly on the SparkConf object take priority over system properties.

For unit tests, you can also call new SparkConf(false) to skip loading external settings and get the same configuration no matter what the system properties are.

All setter methods in this class support chaining. For example, you can write new SparkConf().setMaster("local").setAppName("My app").

Note that once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user. Spark does not support modifying the configuration at runtime.
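As a rough illustration of the points above, the following sketch shows setter chaining, the priority of explicitly set values over system properties, and new SparkConf(false). The object name SparkConfSketch, the helper isolatedConf, and the property values are made up for this example; only the SparkConf calls come from the Spark API.

```scala
import org.apache.spark.SparkConf

object SparkConfSketch {
  // Hypothetical helper: a conf that skips system properties (loadDefaults = false),
  // built with chained setters.
  def isolatedConf(): SparkConf =
    new SparkConf(false)
      .setMaster("local")
      .setAppName("My app")

  def main(args: Array[String]): Unit = {
    // A spark.* system property is picked up by new SparkConf() ...
    System.setProperty("spark.app.name", "from-system-property")
    val fromSystem = new SparkConf()
    println(fromSystem.get("spark.app.name")) // from-system-property

    // ... but a value set directly on the object takes priority.
    val overridden = new SparkConf().setAppName("explicit")
    println(overridden.get("spark.app.name")) // explicit

    // new SparkConf(false) ignores the system property entirely.
    println(new SparkConf(false).contains("spark.app.name")) // false
    println(isolatedConf().get("spark.master")) // local
  }
}
```

A helper such as isolatedConf can be handy in unit tests, where the result should not depend on how the JVM was launched.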

·         org.apache.spark.SparkContext

Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.

Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one. This limitation may eventually be removed.
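A minimal sketch of this lifecycle, assuming a local master: each call below creates a context, uses a broadcast variable in a small job, and stops the context in a finally block so that the next context can be created. The object name SparkContextLifecycle and the helper sumWithFreshContext are hypothetical names chosen for this illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkContextLifecycle {
  // Hypothetical helper: starts a local context, runs one small job, and always stops it.
  def sumWithFreshContext(appName: String, numbers: Seq[Int]): Int = {
    val conf = new SparkConf().setMaster("local[*]").setAppName(appName)
    val sc = new SparkContext(conf)
    try {
      val factor = sc.broadcast(2) // broadcast variable shared with the tasks
      sc.parallelize(numbers).map(_ * factor.value).fold(0)(_ + _)
    } finally {
      sc.stop() // release the context so that another one can be created
    }
  }

  def main(args: Array[String]): Unit = {
    println(sumWithFreshContext("first", 1 to 5))  // 30
    println(sumWithFreshContext("second", 1 to 3)) // 12
  }
}
```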

·       def emptyRDD[T](implicit evidence$44: ClassTag[T]): EmptyRDD[T]

Get an RDD that has no partitions or elements.

·         final def getNumPartitions: Int

Returns the number of partitions of this RDD.
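To make the "no partitions" part concrete, the sketch below compares emptyRDD with an empty collection passed to parallelize, which holds no elements but still has partitions. The object name EmptyRddPartitions and the helper partitionCounts are assumptions made for this illustration; on Spark versions that predate getNumPartitions, partitions.length should give the same number.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object EmptyRddPartitions {
  // Hypothetical helper: returns (partitions of emptyRDD, partitions of an empty parallelized Seq).
  def partitionCounts(sc: SparkContext): (Int, Int) = {
    val noPartitions = sc.emptyRDD[String]                 // no partitions, no elements
    val emptySlices = sc.parallelize(Seq.empty[String], 4) // 4 partitions, no elements
    (noPartitions.getNumPartitions, emptySlices.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("EmptyRddPartitions")
    val sc = new SparkContext(conf)
    try {
      println(partitionCounts(sc)) // (0,4)
    } finally {
      sc.stop()
    }
  }
}
```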

1.   Program Setup instructions:

Create a Maven project and make sure the following prerequisites and dependencies are in place (the Spark and Scala artifacts go in pom.xml):

·        JDK 1.7 or higher

·        Scala 2.10.3

·        spark-core_2.10

·        scala-library

2.   Example:

The following example shows how to create an empty RDD in Spark using Scala.

Save the file as CreateEmptyRDD.scala.

CreateEmptyRDD.scala


package com.runnerdev.rdd

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object CreateEmptyRDD extends App {
  /* Every Spark program starts by creating a SparkConf and a SparkContext */
  /* Create SparkConf */
  val conf = new SparkConf().setAppName("CreateRDD").setMaster("local[*]")
  val sc = new SparkContext(conf)
  /* Creates an empty RDD */
  /* val rdd: EmptyRDD[Nothing] */
  val rdd = sc.emptyRDD
  println("rdd: " + rdd)

  /* Creates a typed empty RDD, printed as EmptyRDD[1] */
  /* val rddStr: EmptyRDD[String] */
  val rddStr = sc.emptyRDD[String]
  println("rddStr: " + rddStr)

  println("Num of Partitions: " + rdd.getNumPartitions) // 0

  /* Stop the context to release its resources */
  sc.stop()
}

Compile and run the above example as follows:

mvn clean install

Then run CreateEmptyRDD as a Scala application.

Output:

rdd: EmptyRDD[0] at emptyRDD at CreateEmptyRDD.scala:13

rddStr: EmptyRDD[1] at emptyRDD at CreateEmptyRDD.scala:18

Num of Partitions: 0
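As a possible follow-up to the program above, this sketch runs a few common actions on an empty RDD and uses it as the starting point of a union. The object name EmptyRddActions and the helper summarize are hypothetical, and isEmpty() assumes a Spark version that provides it (1.3 or later, as far as I know).

```scala
import org.apache.spark.{SparkConf, SparkContext}

object EmptyRddActions {
  // Hypothetical helper: returns (count, isEmpty, elements after a union with two strings).
  def summarize(sc: SparkContext): (Long, Boolean, Seq[String]) = {
    val empty = sc.emptyRDD[String]
    // An empty RDD is a neutral starting value when combining RDDs with union.
    val combined = empty.union(sc.parallelize(Seq("a", "b")))
    (empty.count(), empty.isEmpty(), combined.collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("EmptyRddActions")
    val sc = new SparkContext(conf)
    try {
      val (count, isEmpty, elements) = summarize(sc)
      println("count: " + count)                      // count: 0
      println("isEmpty: " + isEmpty)                  // isEmpty: true
      println("union: " + elements.mkString(","))     // union: a,b
    } finally {
      sc.stop()
    }
  }
}
```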

 
