Spark - Read data from Hbase using Spark and Scala

Category : Apache Spark | Sub Category : Apache Spark Programs | By Runner Dev Last updated: 2020-10-25 13:19:10 Viewed : 252


Read data from Hbase using Spark and Scala

 

1. Create a Maven project

 

 Add the below jars in pom.xml file

 

<properties>

                    <hbase.version>1.2.0-cdh5.8.5</hbase.version>

                    <spark.version>1.6.0-cdh5.8.0</spark.version>

          </properties>

 

<dependencies>

                     

                    <dependency>

                              <groupId>org.scala-lang</groupId>

                              <artifactId>scala-library</artifactId>

                              <version>2.10.6</version>

 

                    </dependency>

 

                    <dependency>

                              <groupId>org.apache.spark</groupId>

                              <artifactId>spark-core_2.10</artifactId>

                              <version>${spark.version}</version>

                              <exclusions>

 

                                        <exclusion>

                                                  <groupId>org.scala-lang</groupId>

                                                  <artifactId>scala-library</artifactId>

                                        </exclusion>

                              </exclusions>

                    </dependency>                                    

 

                    <dependency>

                              <groupId>org.apache.spark</groupId>

                              <artifactId>spark-sql_2.10</artifactId>

                              <version>${spark.version}</version>

                              <exclusions>

 

                                        <exclusion>

                                                  <groupId>org.scala-lang</groupId>

                                                  <artifactId>scala-library</artifactId>

                                        </exclusion>

                              </exclusions>

                    </dependency>

                    <dependency>

                              <groupId>it.nerdammer.bigdata</groupId>

                              <artifactId>spark-hbase-connector_2.10</artifactId>

                              <version>1.0.3</version>

                              <exclusions>

 

                                        <exclusion>

                                                  <groupId>org.scala-lang</groupId>

                                                  <artifactId>scala-library</artifactId>

                                        </exclusion>

                              </exclusions>

                    </dependency>

          </dependencies>

 

Example :

Following example illustrates about Read the data from Hbase using spark and scala

Save the file as −  Runner.scala.

Hbase Table Name : employee

            

     <row>,<colfamily:colname>,<value> ,<value>

     cf:,cf:empId,cf:empName,cf:location,cf:salary

     001:Ram,001,Ram,India,50000

     002:Rajesh,002,Rajesh,USA,70000

     003:xingpang,003,xingpang,China,50000

     004:Yishun,004,Yishun,Singapore,60000

     005:smita,005,smita,India,70000

     006:swetha,006,swetha,India,90000

     006:Archana,006,Archana,India,90000

     007:Mukhtar,07,Mukhtar,India,70000  

File Name: Runner.scala

 

 package com.runnerdev

 

import org.apache.spark.SparkContext

import org.apache.spark.SparkConf

import it.nerdammer.spark.hbase._

object Runner {

 

  def main(args: Array[String]): Unit = {

    try {

      var sc: SparkContext = null

      // hard code properties

      var master = "local[*]"

      var ZOOKEEPER_Host = "localhost"

      var ZOOKEEPER_Port = "2189"

      var appName = "myAppName"

 

      val sparkConf = new SparkConf().setAppName(appName).setMaster(master)

      sparkConf.set("spark.hbase.host", ZOOKEEPER_Host)

      sparkConf.set("spark.hbase.clientport", ZOOKEEPER_Port)

      sc = new SparkContext(sparkConf)

      val empList: Array[Employee] = readEmployeeList(sc)

      println("empList::" + empList)

      empList.foreach(println(_))

    } catch {

      case e: Exception => { println("Exception", e.printStackTrace()) }

    }

  }

 

  def readEmployeeList(sc: SparkContext): Array[Employee] = {

    val employeeList = sc.hbaseTable[(String, Option[String], Option[String], Option[String])]("Employee")

      .select("empId", "empName","location")

      .inColumnFamily("cf")

      .map(a => Employee(a._1, a._2.getOrElse(""), a._3.getOrElse(""), a._4.getOrElse("")))

      .collect()

    return employeeList

  }

 

  case class Employee(rowKey: String, empId: String, empName: String, location: String)

 

}

 


Run above program as follows

Mvn clean install

Run as a Scala application

Out put:

empList::[Lcom.runnerdev.Runner$Employee;@750ff7d3

Employee(001:Manoj,001,,Mumbai)

Employee(001:Ram,001,,India)

Employee(002:Rajesh,002,,USA)

Employee(003:xingpang,003,,China)

Employee(004:Yishun,004,,Singapore)

Employee(005:smita,005,,India)

Employee(006:Archana,006,,India)

Employee(006:swetha,006,,India)

Employee(007:Mukhtar,07,,India)

Search
Sub-Categories
Related Articles

Leave a Comment: