How to enable YARN MapReduce jobs with ofs:// output? #9146
Hey Ozone team,

We're working on integrating Apache Ozone with our existing Hadoop YARN cluster. We've made good progress getting basic Ozone operations working, including distcp between HDFS and Ozone, but YARN jobs are failing with a Ratis network error that we can't figure out. Any guidance would be hugely appreciated! Details below.

### Environment
### Goal

Run MapReduce jobs that read from HDFS and write to Ozone:

```
hadoop jar hadoop-mapreduce-examples-*.jar wordcount \
  hdfs://dev3/user/test/input.txt \
  ofs://dev3/user/test/output
```

### Current Setup

Client environment (before job submission):

```
cp /etc/ozone/ozone-site.xml /etc/hadoop/conf.client/
export HADOOP_CONF_DIR=/etc/hadoop/conf.client/
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export YARN_USER_CLASSPATH_FIRST=true
export HADOOP_CLASSPATH=/usr/share/ozone/share/ozone/lib/*.jar
```

All YARN nodes (ResourceManager + NodeManagers): JARs copied to
Configuration:

```xml
<property>
  <name>fs.ofs.impl</name>
  <value>org.apache.hadoop.fs.ozone.RootedOzoneFileSystem</value>
</property>
```

Job submission:

```
hadoop jar hadoop-mapreduce-examples-*.jar wordcount \
  -Dmapreduce.job.user.classpath.first=true \
  hdfs://dev3/user/test/input.txt \
  ofs://dev3/user/test/output
```

### Error

Job fails immediately with:

Client output:

Application Master syslog (from YARN UI):

### Questions
### Additional Info
Replies: 3 comments
Thanks @silvanias for trying Ozone.

For Hadoop, YARN and most ecosystem applications, please use only `ozone-filesystem-hadoop3-2.0.0.jar`. It is a fat jar that contains all Ozone client components, as well as the dependencies for using Ozone in such an environment. Adding other Ozone jars will likely cause problems due to duplicate classes.

For using OFS from the FileContext API, the following config is required in `core-site.xml`:

```xml
<property>
  <name>fs.AbstractFileSystem.ofs.impl</name>
  <value>org.apache.hadoop.fs.ozone.RootedOzFs</value>
</property>
```

Connections to the OM (9862), SCM (9863) and Ozone Datanodes (9855/9858/9859) are minimally required. I hope I have not missed anything.
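To rule out basic network problems, one quick way to probe the ports listed above from a client or NodeManager host is a small bash check using the `/dev/tcp` pseudo-device. This is just a sketch: `om-host`, `scm-host`, and `dn-host` are placeholder hostnames, so substitute your own.

```shell
# Probe the Ozone service ports listed above.
# om-host / scm-host / dn-host are placeholders -- replace with real hosts.
check_port() {
  local host=$1 port=$2
  # Attempt a TCP connect with a 2-second timeout via bash's /dev/tcp.
  if timeout 2 bash -c "</dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} open"
  else
    echo "${host}:${port} CLOSED"
  fi
}

check_port om-host 9862     # Ozone Manager RPC
check_port scm-host 9863    # Storage Container Manager RPC
for p in 9855 9858 9859; do
  check_port dn-host "$p"   # Datanode ports
done
```

Run this from every node that will host YARN containers, since the map/reduce tasks (not just the client) need to reach the OM, SCM, and Datanodes.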
Not sure if jumping on someone else's post is bad etiquette, but I'm encountering similar problems. Accessing Ozone from the `hadoop fs` CLI works fine. When I go to launch a job, YARN can see the input directory and fails if the output directory exists, so the config and jars seem good to go. But the mapper dies right away, saying it doesn't know the `ofs` scheme.

This is a proof-of-concept Ozone 2.0.0/Hadoop 3.4.2 cluster with no HDFS; fs.defaultFS is set to an `ofs://` URI.

Anything I should be digging into?
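One thing that might be worth checking (a sketch of a possible cause, not a confirmed fix): whether the Ozone filesystem jar is on the classpath of the MapReduce task JVMs themselves, not just the client. Task containers build their classpath from `mapreduce.application.classpath`, so appending the fat jar there in `mapred-site.xml` on all nodes could help. The jar path below is an assumption; adjust it to wherever the jar actually lives on your nodes.

```xml
<property>
  <name>mapreduce.application.classpath</name>
  <!-- Default MapReduce entries, plus the Ozone fat jar (path is an assumption) -->
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,/usr/share/ozone/share/ozone/lib/ozone-filesystem-hadoop3-2.0.0.jar</value>
</property>
```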
I am closing this discussion as @adoroszlai's answer was accepted by @silvanias. @mdellabitta could you raise this as a different discussion to get better visibility on the issue? |