The problem: this Hadoop Streaming job failed.
hadoop jar /opt/module/hadoop-3.3.2/share/hadoop/tools/lib/hadoop-streaming-3.3.2.jar \
-D mapreduce.map.memory.mb=100 \
-D mapreduce.reduce.memory.mb=100 \
-D mapred.map.tasks=1 \
-D stream.num.map.output.key.fields=2 \
-D num.key.fields.for.partition=1 \
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
-mapper "mapper.py" \
-reducer "reduce.py" \
-file mapper.py \
-file reduce.py \
-numReduceTasks 1 \
-input "hdfs://hacluster/hdfs/python" \
-output "hdfs://hacluster/user/hive/warehouse/python/01"
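Both commands pass the same key options. According to the Hadoop Streaming documentation, stream.num.map.output.key.fields=2 makes the first two tab-separated fields of each mapper output line the key and the rest the value. KeyFieldBasedPartitioner with num.key.fields.for.partition=1 then chooses the reducer from the first field alone. The Python sketch below imitates that split locally. It is an illustration only: zlib.crc32 stands in for Hadoop's own hash, so the partition numbers will not match a real job, and the sample lines are made up.

```python
# Hypothetical local illustration of the streaming key options:
#   stream.num.map.output.key.fields=2 -> key = first 2 tab-separated fields
#   num.key.fields.for.partition=1     -> partition on the first key field only
# zlib.crc32 is a stand-in for Hadoop's hash; real partition numbers differ.
import zlib

NUM_KEY_FIELDS = 2
NUM_PARTITION_FIELDS = 1


def split_key_value(line: str, num_key_fields: int = NUM_KEY_FIELDS):
    """Split a mapper output line into (key, value): the first
    num_key_fields tab-separated fields form the key, the rest the value."""
    fields = line.rstrip("\n").split("\t")
    key = "\t".join(fields[:num_key_fields])
    value = "\t".join(fields[num_key_fields:])
    return key, value


def partition_for(key: str, num_reducers: int,
                  num_fields: int = NUM_PARTITION_FIELDS) -> int:
    """Pick a reducer index using only the first num_fields fields of the key."""
    prefix = "\t".join(key.split("\t")[:num_fields])
    return zlib.crc32(prefix.encode("utf-8")) % num_reducers


if __name__ == "__main__":
    # Made-up mapper output lines: city, year, count
    for line in ["beijing\t2022\t3", "beijing\t2021\t5", "shanghai\t2022\t1"]:
        key, value = split_key_value(line)
        print(repr(key), repr(value), partition_for(key, num_reducers=4))
```

Both "beijing" keys land on the same reducer even though their second field differs. With -numReduceTasks 1, every key goes to the single reducer anyway, so the partitioner only has a visible effect once more reducers are configured. Newer Hadoop releases document mapreduce.partition.keypartitioner.options (for example -k1,1) in place of the older num.key.fields.for.partition property.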
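Solution: compared with the failing command, the working version makes four changes. It drops the two 100 MB memory overrides, which are likely too little memory for a task. It runs the scripts through the interpreter explicitly (python3 mapper.py, python3 reduce.py) instead of executing mapper.py directly, which would require an executable script with a valid shebang line on every node. It locates the jar via $HADOOP_HOME, and it writes to a new output directory, because the job fails if the output directory already exists. The adjusted command: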
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.2.jar \
-D mapred.map.tasks=1 \
-D stream.num.map.output.key.fields=2 \
-D num.key.fields.for.partition=1 \
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
-file mapper.py \
-mapper "python3 mapper.py" \
-file reduce.py \
-reducer "python3 reduce.py" \
-numReduceTasks 1 \
-input "hdfs://hacluster/hdfs/python" \
-output "hdfs://hacluster/user/hive/warehouse/python/03"
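With this fix, each script runs as a python3 child process that reads lines on stdin and writes tab-separated lines on stdout. The post does not show mapper.py or reduce.py, so what follows is only a hypothetical sketch. The file name job.py, the map/reduce mode arguments, and the "city,year" input format are all made up for illustration. The mapper emits two key fields to match stream.num.map.output.key.fields=2, and run_local imitates the map, sort, reduce sequence so the logic can be checked without a cluster.

```python
# Hypothetical single-file streaming job (a sketch; the post's real
# mapper.py / reduce.py are not shown). Assumed input: "city,year" lines.
#   python3 job.py map    -> "city<TAB>year<TAB>1"
#   python3 job.py reduce -> "city<TAB>year<TAB>count"
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # skip malformed records instead of failing the task
        city, year = parts
        # Two key fields (city, year), matching stream.num.map.output.key.fields=2
        yield f"{city}\t{year}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    # Streaming sorts reducer input by key, so identical keys arrive adjacent
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for key, group in groupby(rows, key=lambda f: (f[0], f[1])):
        total = sum(int(f[2]) for f in group)
        yield f"{key[0]}\t{key[1]}\t{total}"


def run_local(records: Iterable[str]) -> List[str]:
    """Simulate map -> sort -> reduce on one machine for quick checks."""
    return list(reducer(sorted(mapper(records))))


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else "demo"
    if mode == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif mode == "reduce":
        for out in reducer(sys.stdin):
            print(out)
    else:
        # No mode given: run a small in-memory demo instead of reading stdin
        for out in run_local(["beijing,2022", "shanghai,2022", "beijing,2022"]):
            print(out)
```

On a cluster, the two modes would be wired in as -mapper "python3 job.py map" and -reducer "python3 job.py reduce", with -file job.py. Piping a sample file through cat sample.txt | python3 job.py map | sort | python3 job.py reduce is a quick way to test the scripts locally before submitting the job.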