Implemented a MapReduce framework with a streamlined API from first principles.
Kartik Mahaley, Shakti Patro, Pankaj Tripathi, Chen Bai
1.0
Our program consists of a MapReduce framework that you can use to write your own MapReduce code. The phases of our code are:
- Creating and running Amazon Web Services EC2 instances
- Read data and develop mapper
- Shuffle and sort
- Develop reducer
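The map, shuffle and sort, and reduce phases listed above can be pictured with a small single-process word count. This is only a sketch under stated assumptions: the class `PhasesSketch` and its methods are hypothetical names chosen for illustration, nothing here touches EC2 or S3, and the framework's real implementation may be organized differently.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative single-process sketch of map -> shuffle/sort -> reduce.
// All names are hypothetical and are not the framework's API.
class PhasesSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and sort phase: group values by key; a TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffleAndSort(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<Integer>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Chain the three phases over a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> entry : shuffleAndSort(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(run(Arrays.asList("to be or not to be", "to do")));
    }
}
```

In a distributed run the grouped keys would be partitioned across instances before reducing; the sketch keeps everything in one map to show only the data flow.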
1. Source code
2. config.properties and log4j.properties
3. README.txt
4. Project Report.pdf
5. Makefile
6. Examples Folder with source code, input and output files
7. Project Report.rmd
1. Mac/Linux machine with 8GB RAM
2. Java 1.7
3. Apache Maven 3.3
4. Any IDE, for example Eclipse
- Add the jar to the classpath or add it to pom.xml
For adding it to the pom, you can use the following command:
make addmaven
- Implement the Mapper and Reducer classes.
- Run the app's run method
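To make the "implement a mapper and a reducer, then call run" workflow concrete, here is a minimal sketch. The framework's actual Mapper and Reducer signatures are not shown in this README, so the interfaces `SketchMapper`, `SketchReducer`, and `Emitter`, and the local `run` method, are stand-ins invented for illustration; consult the source code for the real types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in types for illustration only; the framework's real Mapper and
// Reducer signatures are not shown in this README and may differ.
class JobSketch {

    interface Emitter { void emit(String key, int value); }

    interface SketchMapper { void map(String line, Emitter out); }

    interface SketchReducer { int reduce(String key, List<Integer> values); }

    // User code: parse "name,amount" lines and emit (name, amount).
    static class AmountMapper implements SketchMapper {
        public void map(String line, Emitter out) {
            String[] parts = line.split(",");
            if (parts.length == 2) {
                out.emit(parts[0].trim(), Integer.parseInt(parts[1].trim()));
            }
        }
    }

    // User code: add up all amounts seen for one name.
    static class SumReducer implements SketchReducer {
        public int reduce(String key, List<Integer> values) {
            int total = 0;
            for (int value : values) {
                total += value;
            }
            return total;
        }
    }

    // Hypothetical local stand-in for the app's run method.
    static Map<String, Integer> run(SketchMapper mapper, SketchReducer reducer, List<String> lines) {
        final Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        Emitter collector = new Emitter() {
            public void emit(String key, int value) {
                List<Integer> values = grouped.get(key);
                if (values == null) {
                    values = new ArrayList<Integer>();
                    grouped.put(key, values);
                }
                values.add(value);
            }
        };
        for (String line : lines) {
            mapper.map(line, collector);
        }
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reducer.reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("apples,3", "pears,2", "apples,4");
        // prints {apples=7, pears=2}
        System.out.println(run(new AmountMapper(), new SumReducer(), lines));
    }
}
```

The point of the split is that only `AmountMapper` and `SumReducer` are job-specific; everything inside `run` is what a framework would provide.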
Our program requires Amazon Web Services credentials (secret key, access key, region name, pem keys, bucket name, etc.). Before using our framework, please create an AWS account with all these details. Then enter all of these details in the config.properties file, which our MapReduce framework reads.
1) Please don't change the ipfile name in the config.properties file.
2) Make sure the output folder exists in the S3 bucket and is empty.
3) Make sure EC2 instances are available in the specified region for which you have pem keys.
4) Fill in the config.properties file as mentioned below:
key=<SECRET KEY>
password=<ACCESS KEY>
bucket=<BUCKET NAME>
inputfile=<INPUT FILE FOLDER>
ipfile=ipaddress.txt
output=<OUTPUT FOLDER NAME>
action=<START/STOP>
instancenumber=<NUMBER OF INSTANCES>
pem=<PEM KEYS>
instancetype=<TYPE OF INSTANCE>
pemKeysPath=<PEM KEY PATH>
jarPath=<JAR NAME>
For example,
key=A***************
password=A**************************
bucket=mybucketname
inputfile=myfiles/input/
ipfile=ipaddress.txt
output=output
action=start
instancenumber=9
pem=pemkeyname
instancetype=t2.micro
pemKeysPath=/home/user/Documents/
jarPath=node.jar
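Before launching instances it can help to check that config.properties is complete. The sketch below uses the standard `java.util.Properties` class with the key names from the template above. The class `ConfigCheckSketch` and its rules (case-insensitive `start`/`stop`, positive `instancenumber`, unchanged `ipfile`) are assumptions made for illustration; the framework's own checks, if any, may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Illustrative pre-flight check for config.properties. The class name and
// the rules below are assumptions, not the framework's own validation.
class ConfigCheckSketch {

    // Key names taken from the config.properties template.
    static final List<String> REQUIRED_KEYS = Arrays.asList(
            "key", "password", "bucket", "inputfile", "ipfile", "output",
            "action", "instancenumber", "pem", "instancetype", "pemKeysPath", "jarPath");

    // Parse properties text; for a real file, pass a FileReader to Properties.load.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("could not read properties text", e);
        }
        return props;
    }

    // Return a list of human-readable problems; an empty list means no issue found.
    static List<String> validate(Properties props) {
        List<String> problems = new ArrayList<String>();
        for (String name : REQUIRED_KEYS) {
            String value = props.getProperty(name);
            if (value == null || value.trim().isEmpty()) {
                problems.add("missing value for " + name);
            }
        }
        String action = props.getProperty("action", "").trim();
        // Assumption: the action is matched case-insensitively.
        if (!action.isEmpty() && !action.equalsIgnoreCase("start") && !action.equalsIgnoreCase("stop")) {
            problems.add("action must be start or stop");
        }
        String count = props.getProperty("instancenumber", "").trim();
        if (!count.isEmpty()) {
            try {
                if (Integer.parseInt(count) < 1) {
                    problems.add("instancenumber must be at least 1");
                }
            } catch (NumberFormatException e) {
                problems.add("instancenumber must be an integer");
            }
        }
        String ipfile = props.getProperty("ipfile", "").trim();
        if (!ipfile.isEmpty() && !ipfile.equals("ipaddress.txt")) {
            problems.add("ipfile must stay ipaddress.txt");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Deliberately incomplete sample to show the reported problems.
        String sample = "bucket=mybucketname\naction=start\ninstancenumber=9\nipfile=ipaddress.txt\n";
        for (String problem : validate(parse(sample))) {
            System.out.println(problem);
        }
    }
}
```

Collecting every problem in a list, rather than failing on the first one, lets a user fix the whole file in one pass.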
- In a terminal, go to the folder where the jar, log4j.properties, and config.properties are present
- Specify action=start, instancetype, and instancenumber in config.properties and execute the jar with java -jar jarname.jar
- To stop the instances, specify action=stop in config.properties and execute the jar with java -jar jarname.jar
- If you want to create your own jar, you can use the command make buildjar