Category : Hadoop | Sub Category : Hadoop Concepts | By Prasad Bonam Last updated: 2023-07-12 05:20:03 Viewed : 670
Example of using MapReduce in Java:
Let's consider a simple word count example where we want to count the occurrences of each word in a given text file.
Mapper Class:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line on runs of whitespace ("\\s+", not "s+").
        String line = value.toString();
        String[] words = line.split("\\s+");
        for (String w : words) {
            word.set(w);
            context.write(word, one);
        }
    }
}
Reducer Class:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the counts emitted for this word by all mappers.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
Driver Class:
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(-1);
        }

        // Job.getInstance() replaces the deprecated new Job() constructor.
        Job job = Job.getInstance();
        job.setJarByClass(WordCount.class);
        job.setJobName("Word Count");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
In this example, the WordCountMapper class extends Mapper and overrides the map() method to split the input text into words and emit each word with a count of 1.
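The map step can be sketched in plain Java, without the Hadoop runtime, to show what each call emits. This is a simplified illustration, not Hadoop code: it uses ordinary String keys and Integer values in place of Text and IntWritable.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

public class MapPhaseSketch {
    // Simulates the map() logic: split a line on whitespace
    // and emit one (word, 1) pair per token.
    static List<Entry<String, Integer>> map(String line) {
        List<Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Each word is emitted individually, duplicates included.
        System.out.println(map("to be or not to be"));
        // → [to=1, be=1, or=1, not=1, to=1, be=1]
    }
}
```

Note that the mapper does no aggregation; duplicate words produce duplicate pairs, and the framework's shuffle phase groups them before the reducer runs.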
The WordCountReducer class extends Reducer and overrides the reduce() method to receive the words emitted by the mapper, sum up the counts, and emit the final count for each word.
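The shuffle-and-reduce step can likewise be sketched in plain Java. This is a simplified stand-in for the framework: it groups the mapper's words and sums each group's counts, as reduce() does for each key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReducePhaseSketch {
    // Simulates shuffle + reduce: group the (word, 1) pairs by key,
    // then sum each group, mirroring the loop in reduce().
    static Map<String, Integer> reduceAll(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // sum += value.get()
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(reduceAll(List.of("to", "be", "or", "not", "to", "be")));
        // → {to=2, be=2, or=1, not=1}
    }
}
```

In a real job the shuffle phase also sorts the keys, so the reducer would see them in sorted order (be, not, or, to); this sketch keeps insertion order for simplicity.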
The WordCount class serves as the driver class, where the configuration and execution of the MapReduce job take place. It sets the input and output paths, specifies the mapper and reducer classes, and waits for the job to complete.
To run this example, package the classes into a JAR file and submit it to your Hadoop cluster using the hadoop jar command, providing the input file path and output directory path as command-line arguments. Note that the output directory must not already exist, or the job will fail at startup.
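A submission might look like the following. The JAR name and HDFS paths here are hypothetical; adjust them to your build output and your cluster's directory layout.

```shell
# Hypothetical JAR name and HDFS paths -- substitute your own.
hadoop jar wordcount.jar WordCount /user/hadoop/input/sample.txt /user/hadoop/output/wordcount

# Inspect the result; reducer output lands in part-r-* files.
hdfs dfs -cat /user/hadoop/output/wordcount/part-r-00000
```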
Make sure to have Hadoop installed and configured properly on your system.
This is a basic example to illustrate the MapReduce framework. You can further explore advanced concepts such as combiners, custom data types, and partitioners to enhance the MapReduce process based on your specific requirements.
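Of the advanced concepts mentioned above, the combiner is the easiest to apply to this job: because addition is associative, the reducer class itself can be registered as the combiner (job.setCombinerClass(WordCountReducer.class)) to pre-aggregate counts on each mapper and cut shuffle traffic. The following plain-Java sketch illustrates why that works; the two "map tasks" here are illustrative, not Hadoop code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {
    // A combiner computes partial sums over one mapper's local output.
    static Map<String, Integer> partialCounts(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two hypothetical map tasks combine their output locally...
        Map<String, Integer> task1 = partialCounts(List.of("to", "be", "to"));
        Map<String, Integer> task2 = partialCounts(List.of("be", "or", "not"));

        // ...and the reducer sums the partial sums; the final counts are
        // identical to summing the raw (word, 1) pairs directly.
        Map<String, Integer> finalCounts = new TreeMap<>(task1);
        task2.forEach((k, v) -> finalCounts.merge(k, v, Integer::sum));
        System.out.println(finalCounts);
        // → {be=2, not=1, or=1, to=2}
    }
}
```

Summing partial sums gives the same answer as summing the raw pairs, which is exactly the property a combiner requires; operations that are not associative and commutative (such as computing an average directly) cannot reuse the reducer this way.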