Intellij jar for map reduce
2/13/2023

Let's start with a fictional business case. In this post I show how to create a MapReduce job in Java, based on a Maven project like any other Java project. Although the Hadoop framework itself is written in Java, MapReduce jobs can be written in many different languages.

In this case we need a CSV file with English words from a dictionary and all translations in other languages added to it, separated by a '|' symbol. The input dictionaries for the job are taken from here. The job will read dictionaries of different languages and match each English word with a translation in another language. I downloaded a few files in different languages and put them together in one file (Hadoop is better at processing one large file than many small ones).

Like I said before, I use a Maven project for this, so I created a new empty Maven project in my IDE, IntelliJ. I modified the default pom to add the necessary plugins and dependencies. Since I want to run the job on AWS EMR, I made sure I have a matching Hadoop version. The Hadoop dependency is necessary to make use of the Hadoop classes in my MapReduce job.

The next step is creating the Java code for the MapReduce job.
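The pom changes could look roughly like this. This is a sketch, not the exact pom from the project: the Hadoop version shown is only an example and must match the Hadoop version of your EMR release, and `TranslationJob` is a hypothetical main class name. Marking the Hadoop dependency as `provided` keeps it out of the jar, since EMR already ships Hadoop on the cluster.

```xml
<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>

<dependencies>
    <!-- Version is an example; match it to your EMR release's Hadoop version -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.3.3</version>
        <scope>provided</scope>
    </dependency>
</dependencies>

<build>
    <plugins>
        <!-- Sets the entry point in the jar manifest so the job can be run
             with "hadoop jar" without naming the main class -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>TranslationJob</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
    </plugins>
</build>
```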
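A minimal sketch of such a job could look like the following. It assumes each input line is tab-separated as `english word<TAB>translation` (the exact layout of the downloaded dictionary files is an assumption here): the mapper keys each translation by its English word, and the reducer joins all translations for that word with the '|' separator.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TranslationJob {

    // Emits (english word, translation) for each well-formed input line.
    // Assumes lines look like: <english word>\t<translation>
    public static class TranslationMapper extends Mapper<Object, Text, Text, Text> {
        private final Text english = new Text();
        private final Text translation = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t");
            if (parts.length < 2) {
                return; // skip malformed lines
            }
            english.set(parts[0].trim());
            translation.set(parts[1].trim());
            context.write(english, translation);
        }
    }

    // Pure helper so the join logic is easy to test in isolation
    static String joinTranslations(Iterable<String> translations) {
        StringBuilder sb = new StringBuilder();
        for (String t : translations) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(t);
        }
        return sb.toString();
    }

    // Collects all translations for one English word and joins them with '|'
    public static class TranslationReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<String> translations = new ArrayList<>();
            for (Text value : values) {
                translations.add(value.toString());
            }
            context.write(key, new Text(joinTranslations(translations)));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "dictionary translation join");
        job.setJarByClass(TranslationJob.class);
        job.setMapperClass(TranslationMapper.class);
        job.setReducerClass(TranslationReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that the default `TextOutputFormat` separates key and value with a tab, so the output lines come out as `english word<TAB>translation1|translation2|...`.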