Running a hadoop streaming and mapreduce job: PipeMapRed.waitOutputThreads() : subprocess failed with code 127

I am trying to run this on my own Hadoop cluster. I run the job with the following command.

hadoop jar hadoop-streaming-3.1.0.jar -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py -input wiki.xml -output output4

But I get the following error!

2018-10-20 16:05:50,021 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [mapper.py, reducer.py, /tmp/hadoop-unjar707072106784045009/] [] /tmp/streamjob4878270244056389381.jar tmpDir=null
2018-10-20 16:05:51,845 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
2018-10-20 16:05:52,512 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
2018-10-20 16:05:53,503 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/anubhav/.staging/job_1540029454250_0014
2018-10-20 16:05:56,044 INFO mapred.FileInputFormat: Total input files to process : 1
2018-10-20 16:05:56,431 INFO mapreduce.JobSubmitter: number of splits:2
2018-10-20 16:05:56,496 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-10-20 16:05:56,686 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540029454250_0014
2018-10-20 16:05:56,688 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-10-20 16:05:57,125 INFO conf.Configuration: resource-types.xml not found
2018-10-20 16:05:57,125 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-10-20 16:05:57,550 INFO impl.YarnClientImpl: Submitted application application_1540029454250_0014
2018-10-20 16:05:57,627 INFO mapreduce.Job: The url to track the job: http://anubhav-Inspiron-3542:8088/proxy/application_1540029454250_0014/
2018-10-20 16:05:57,629 INFO mapreduce.Job: Running job: job_1540029454250_0014
2018-10-20 16:06:07,874 INFO mapreduce.Job: Job job_1540029454250_0014 running in uber mode : false
2018-10-20 16:06:07,890 INFO mapreduce.Job:  map 0% reduce 0%
2018-10-20 16:06:16,052 INFO mapreduce.Job: Task Id : attempt_1540029454250_0014_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2018-10-20 16:06:16,079 INFO mapreduce.Job: Task Id : attempt_1540029454250_0014_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2018-10-20 16:06:26,193 INFO mapreduce.Job: Task Id : attempt_1540029454250_0014_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2018-10-20 16:06:27,203 INFO mapreduce.Job: Task Id : attempt_1540029454250_0014_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2018-10-20 16:06:37,310 INFO mapreduce.Job: Task Id : attempt_1540029454250_0014_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2018-10-20 16:06:37,314 INFO mapreduce.Job: Task Id : attempt_1540029454250_0014_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2018-10-20 16:06:49,429 INFO mapreduce.Job:  map 100% reduce 100%
2018-10-20 16:06:51,458 INFO mapreduce.Job: Job job_1540029454250_0014 failed with state FAILED due to: Task failed task_1540029454250_0014_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0

2018-10-20 16:06:51,571 INFO mapreduce.Job: Counters: 14
    Job Counters
        Failed map tasks=7
        Killed map tasks=1
        Killed reduce tasks=1
        Launched map tasks=8
        Other local map tasks=6
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=105898
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=52949
        Total vcore-milliseconds taken by all map tasks=52949
        Total megabyte-milliseconds taken by all map tasks=162659328
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
2018-10-20 16:06:51,571 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

I have also added
#! /usr/bin/python
at the top of both of my files, mapper.py and reducer.py.


If you are using python3:

#!/usr/bin/env python3

In the end, I solved the problem by changing

#! /usr/bin/python

to

#!/usr/bin/env python
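
Exit status 127 is what a shell or `env` conventionally reports when a command cannot be found, so one plausible reading of this failure is that the interpreter named in the shebang line does not exist on the node running the task. The following is a minimal sketch for checking that, not part of the original answer; the function names `shebang_interpreter` and `resolve` are hypothetical helpers, and the script ignores `env` options such as `-S`.

```python
#!/usr/bin/env python3
"""Check whether the interpreter named in a script's shebang line can be found."""
import os
import shutil
import sys


def shebang_interpreter(path):
    """Return the words of the script's shebang line, or None if it has none."""
    with open(path, "rb") as f:
        first = f.readline().decode("utf-8", errors="replace").strip()
    if not first.startswith("#!"):
        return None
    # Handles both "#!/usr/bin/python" and "#! /usr/bin/python".
    return first[2:].split()


def resolve(parts):
    """Return the interpreter path if it can be found, otherwise None."""
    if not parts:
        return None
    prog = parts[0]
    # "#!/usr/bin/env python" looks the interpreter up on PATH.
    if os.path.basename(prog) == "env" and len(parts) > 1:
        return shutil.which(parts[1])
    # An absolute interpreter path must exist and be executable.
    if os.path.isfile(prog) and os.access(prog, os.X_OK):
        return prog
    return None


if __name__ == "__main__":
    for script in sys.argv[1:]:
        parts = shebang_interpreter(script)
        print(script, parts, "->", resolve(parts) or "NOT FOUND")
```

Saved under a name of your choice (for example `check_shebang.py`, a hypothetical file name), it could be run as `python3 check_shebang.py mapper.py reducer.py`. Since streaming tasks execute on the cluster's worker nodes, the check is only meaningful when run on those machines, with the same PATH the tasks see.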

You may be running Python 3 code (for example, its string formatting syntax) under Python 2.