Spark - Some DataFrame examples using Scala in Apache Spark

Category : Apache Spark | Sub Category : Apache Spark Programs | By Prasad Bonam Last updated: 2023-10-01 14:22:01 Viewed : 273


Here are some DataFrame examples using Scala in Apache Spark:

scala
// Import the SparkSession library
import org.apache.spark.sql.SparkSession

// Create a SparkSession
val spark = SparkSession.builder()
  .appName("DataFrameExamples")
  .getOrCreate()

// Needed for the toDF() syntax used below
import spark.implicits._

// Example 1: Creating a DataFrame from a sequence of case class objects

// Define a case class
case class Person(name: String, age: Int)

// Create a sequence of case class objects
val peopleSeq = Seq(Person("Alice", 25), Person("Bob", 30), Person("Charlie", 35))

// Create a DataFrame from the sequence
val peopleDF = spark.createDataFrame(peopleSeq)

// Show the DataFrame
peopleDF.show()

// Example 2: Loading data from a CSV file into a DataFrame
val csvDF = spark.read
  .option("header", "true")      // Treat the first row as the header
  .option("inferSchema", "true") // Infer column data types
  .csv("/path/to/your/file.csv")

// Show the DataFrame
csvDF.show()

// Example 3: Performing operations on DataFrames

// Select specific columns
val selectedDF = csvDF.select("name", "age")
selectedDF.show()

// Filter rows
val filteredDF = csvDF.filter(csvDF("age") > 30)
filteredDF.show()

// Grouping and aggregation
import org.apache.spark.sql.functions._
val groupAggDF = csvDF.groupBy("gender").agg(avg("age"), max("salary"))
groupAggDF.show()

// Example 4: Joining DataFrames

// Create two DataFrames that share a dept_id key
val employeesDF = Seq(("Alice", 1), ("Bob", 2), ("Charlie", 3)).toDF("name", "dept_id")
val departmentDF = Seq((1, "HR"), (2, "Finance"), (3, "Engineering")).toDF("dept_id", "dept_name")

// Join the two DataFrames on the common dept_id column
val joinedDF = employeesDF.join(departmentDF, Seq("dept_id"), "inner")

// Show the joined DataFrame
joinedDF.show()

// Example 5: Writing DataFrames to various formats

// Write the DataFrame to Parquet format
csvDF.write.parquet("/path/to/output/parquet")

// Write the DataFrame to JSON format
csvDF.write.json("/path/to/output/json")

// Stop the SparkSession
spark.stop()

In these examples:

  • Example 1 shows how to create a DataFrame from a sequence of case class objects.
  • Example 2 demonstrates loading data from a CSV file into a DataFrame, where we specify options like header and schema inference.
  • Example 3 illustrates common DataFrame operations, such as selecting columns, filtering rows, and performing group aggregations.
  • Example 4 showcases joining two DataFrames based on a common key.
  • Example 5 demonstrates how to write DataFrames to different file formats like Parquet and JSON.
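To make the semantics of the filter and groupBy/agg steps concrete without needing a running Spark cluster, here is a plain-Scala sketch that computes the same results on an in-memory collection. The column names (gender, salary) and sample rows are hypothetical, matching the assumed CSV schema from Example 3; this is an illustration of what Spark computes, not Spark code.

```scala
// Sample records standing in for rows of the CSV DataFrame (hypothetical data)
case class Rec(name: String, gender: String, age: Int, salary: Double)

val rows = Seq(
  Rec("Alice", "F", 25, 50000.0),
  Rec("Bob", "M", 32, 60000.0),
  Rec("Charlie", "M", 35, 70000.0)
)

// Equivalent of csvDF.filter(csvDF("age") > 30)
val over30 = rows.filter(_.age > 30)
println(over30.map(_.name))

// Equivalent of csvDF.groupBy("gender").agg(avg("age"), max("salary")):
// for each gender, compute the average age and the maximum salary
val byGender = rows.groupBy(_.gender).map { case (g, rs) =>
  g -> ((rs.map(_.age).sum.toDouble / rs.size, rs.map(_.salary).max))
}
byGender.foreach(println)
```

Spark distributes the same logic across partitions, but the per-group result is identical to this collection-based version.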

You can run these Scala examples in a Spark shell or as part of a Scala Spark application. Make sure to replace the file paths and column names to match your own data.
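If you want to package the examples as a standalone application rather than pasting them into spark-shell, a minimal sbt build is enough. The project name and the Scala/Spark versions below are assumptions; match them to the versions your cluster runs.

```scala
// build.sbt -- minimal sketch for a standalone Spark application
// (versions shown are assumptions; align them with your cluster)
name := "dataframe-examples"

scalaVersion := "2.12.18"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.4.1"
```

With this in place, `sbt package` produces a jar you can submit with `spark-submit`.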
