## Window functions in Spark SQL (Databricks)

Category : Microsoft Azure Data Engineering | Sub Category : Practice Assessment for Exam DP-203 - Data Engineering on Microsoft Azure | By Prasad Bonam Last updated: 2023-09-11 17:34:37 Viewed : 28

Window functions in Spark SQL (Databricks):

Window functions in Spark SQL (Databricks) allow you to perform calculations across a set of table rows that are related to the current row. Here are some examples of window functions:

Assuming you have a DataFrame or table named `mytable`, you can use the following examples to demonstrate various window functions:

1. Basic Window Function:

• Calculate the row number for each row within a partition.
```sqlSELECT
column1,
column2,
ROW_NUMBER() OVER (ORDER BY column1) AS row_num
FROM mytable;```
2. PARTITION BY Clause:

• Calculate the sum of a column within different partitions.
```sqlSELECT
column1,
column2,
SUM(column3) OVER (PARTITION BY column1) AS partition_sum
FROM mytable;```
3. ORDER BY Clause:

• Calculate a running total within a partition based on a specific order.
```sqlSELECT
column1,
column2,
SUM(column3) OVER (PARTITION BY column1 ORDER BY column2) AS running_total
FROM mytable;```
4. RANGE and ROWS Clauses:

• Define the frame specification using `RANGE BETWEEN` or `ROWS BETWEEN` to calculate aggregations within a specific range.
```sqlSELECT
column1,
column2,
SUM(column3) OVER (PARTITION BY column1 ORDER BY column2
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
FROM mytable;```
5. DENSE_RANK and RANK Functions:

• Calculate the dense rank and rank of rows within a partition.
```sql```SELECT
column1,
column2,
DENSE_RANK() OVER (PARTITION BY column1 ORDER BY column2) AS dense_rank,
RANK() OVER (PARTITION BY column1 ORDER BY column2) AS rank
FROM mytable;
``````

• Retrieve values from previous and subsequent rows within a partition.
```sqlSELECT
column1,
column2,
LAG(column2) OVER (PARTITION BY column1 ORDER BY column2) AS prev_value,
LEAD(column2) OVER (PARTITION BY column1 ORDER BY column2) AS next_value
FROM mytable;```

These examples demonstrate various use cases for window functions in Spark SQL (Databricks). Window functions are powerful tools for performing complex calculations and aggregations within partitions of your data, which can be very useful for advanced analytics and reporting tasks.

Here are examples of Spark SQL queries in Databricks that use window functions along with sample outputs. These examples assume you have a table named `mytable` with the appropriate columns.

1. Basic Window Function - ROW_NUMBER:

```sqlSELECT
column1,
column2,
ROW_NUMBER() OVER (ORDER BY column1) AS row_num
FROM mytable;```

Sample Output:

```css+-------+-------+-------+
|column1|column2|row_num|
+-------+-------+-------+
|A      |X      |1      |
|A      |Y      |2      |
|B      |Z      |3      |
+-------+-------+-------+```
2. PARTITION BY Clause - SUM:

```sqlSELECT
column1,
column2,
SUM(column3) OVER (PARTITION BY column1) AS partition_sum
FROM mytable;```

Sample Output:

```css+-------+-------+--------------+
|column1|column2|partition_sum |
+-------+-------+--------------+
|A      |X      |8             |
|A      |Y      |8             |
|B      |Z      |7             |
+-------+-------+--------------+```
3. ORDER BY Clause - Running Total:

```sqlSELECT
column1,
column2,
SUM(column3) OVER (PARTITION BY column1 ORDER BY column2) AS running_total
FROM mytable;```

Sample Output:

```css+-------+-------+--------------+
|column1|column2|running_total |
+-------+-------+--------------+
|A      |X      |3             |
|A      |Y      |8             |
|B      |Z      |7             |
+-------+-------+--------------+```
4. DENSE_RANK and RANK Functions:

```sqlSELECT
column1,
column2,
DENSE_RANK() OVER (PARTITION BY column1 ORDER BY column2) AS dense_rank,
RANK() OVER (PARTITION BY column1 ORDER BY column2) AS rank
FROM mytable;```

Sample Output:

```css+-------+-------+-----------+-----+
|column1|column2|dense_rank |rank |
+-------+-------+-----------+-----+
|A      |X      |1          |1    |
|A      |Y      |2          |2    |
|B      |Z      |1          |1    |
+-------+-------+-----------+-----+```

```sqlSELECT
column1,
column2,
LAG(column2) OVER (PARTITION BY column1 ORDER BY column2) AS prev_value,
LEAD(column2) OVER (PARTITION BY column1 ORDER BY column2) AS next_value
FROM mytable;```

Sample Output:

```sql+-------+-------+-----------+-----------+
|column1|column2|prev_value |next_value |
+-------+-------+-----------+-----------+
|A      |X      |null       |Y          |
|A      |Y      |X          |null       |
|B      |Z      |null       |null       |
+-------+-------+-----------+-----------+```

These examples demonstrate the use of window functions in Spark SQL (Databricks) with sample outputs to help you understand the results of each query. Window functions are valuable for performing complex calculations and analytics on your data within specific partitions or ordered sets of rows.