Running a Hadoop streaming and MapReduce job: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
I am trying to run it on my own Hadoop cluster. I use the following command to run the job.
    hadoop jar hadoop-streaming-3.1.0.jar -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py -input wiki.xml -output output4
But I get the following error:
    2018-10-20 16:05:50,021 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
    packageJobJar: [mapper.py, reducer.py, /tmp/hadoop-unjar707072106784045009/] [] /tmp/streamjob4878270244056389381.jar tmpDir=null
    2018-10-20 16:05:51,845 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
    2018-10-20 16:05:52,512 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
    2018-10-20 16:05:53,503 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/anubhav/.staging/job_1540029454250_0014
    2018-10-20 16:05:56,044 INFO mapred.FileInputFormat: Total input files to process : 1
    2018-10-20 16:05:56,431 INFO mapreduce.JobSubmitter: number of splits:2
    2018-10-20 16:05:56,496 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
    2018-10-20 16:05:56,686 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540029454250_0014
    2018-10-20 16:05:56,688 INFO mapreduce.JobSubmitter: Executing with tokens: []
    2018-10-20 16:05:57,125 INFO conf.Configuration: resource-types.xml not found
    2018-10-20 16:05:57,125 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
    2018-10-20 16:05:57,550 INFO impl.YarnClientImpl: Submitted application application_1540029454250_0014
    2018-10-20 16:05:57,627 INFO mapreduce.Job: The url to track the job: http://anubhav-Inspiron-3542:8088/proxy/application_1540029454250_0014/
    2018-10-20 16:05:57,629 INFO mapreduce.Job: Running job: job_1540029454250_0014
    2018-10-20 16:06:07,874 INFO mapreduce.Job: Job job_1540029454250_0014 running in uber mode : false
    2018-10-20 16:06:07,890 INFO mapreduce.Job: map 0% reduce 0%
    2018-10-20 16:06:16,052 INFO mapreduce.Job: Task Id : attempt_1540029454250_0014_m_000000_0, Status : FAILED
    Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

    2018-10-20 16:06:16,079 INFO mapreduce.Job: Task Id : attempt_1540029454250_0014_m_000001_0, Status : FAILED
    Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

    2018-10-20 16:06:26,193 INFO mapreduce.Job: Task Id : attempt_1540029454250_0014_m_000000_1, Status : FAILED
    Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

    2018-10-20 16:06:27,203 INFO mapreduce.Job: Task Id : attempt_1540029454250_0014_m_000001_1, Status : FAILED
    Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

    2018-10-20 16:06:37,310 INFO mapreduce.Job: Task Id : attempt_1540029454250_0014_m_000000_2, Status : FAILED
    Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

    2018-10-20 16:06:37,314 INFO mapreduce.Job: Task Id : attempt_1540029454250_0014_m_000001_2, Status : FAILED
    Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

    2018-10-20 16:06:49,429 INFO mapreduce.Job: map 100% reduce 100%
    2018-10-20 16:06:51,458 INFO mapreduce.Job: Job job_1540029454250_0014 failed with state FAILED due to: Task failed task_1540029454250_0014_m_000000
    Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0
    2018-10-20 16:06:51,571 INFO mapreduce.Job: Counters: 14
        Job Counters
            Failed map tasks=7
            Killed map tasks=1
            Killed reduce tasks=1
            Launched map tasks=8
            Other local map tasks=6
            Data-local map tasks=2
            Total time spent by all maps in occupied slots (ms)=105898
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=52949
            Total vcore-milliseconds taken by all map tasks=52949
            Total megabyte-milliseconds taken by all map tasks=162659328
        Map-Reduce Framework
            CPU time spent (ms)=0
            Physical memory (bytes) snapshot=0
            Virtual memory (bytes) snapshot=0
    2018-10-20 16:06:51,571 ERROR streaming.StreamJob: Job not successful!
    Streaming Command Failed!
I have also added a shebang line at the beginning of both of my files, mapper.py and reducer.py.
If you are using Python 3, start both scripts with:
    #!/usr/bin/env python3
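For illustration, here is a minimal sketch of a streaming mapper that starts with this shebang. The word-count logic and the `map_line` name are assumptions made up for the example; the asker's actual mapper.py is not shown in the question.

```python
#!/usr/bin/env python3
"""Hypothetical word-count mapper for Hadoop streaming (not the asker's mapper.py)."""
import sys


def map_line(line):
    """Yield one "word<TAB>1" record per whitespace-separated token."""
    for word in line.split():
        yield "{}\t1".format(word)


def main():
    # Hadoop streaming feeds input records on stdin and reads results from stdout.
    for line in sys.stdin:
        for record in map_line(line):
            print(record)


if __name__ == "__main__":
    main()
```

Before submitting the job, a script like this can be tried locally by piping a file into it, for example `python3 mapper.py < wiki.xml`.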
In the end, I solved this problem by changing

    #! /usr/bin/python

to

    #!/usr/bin/env python
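Exit status 127 is the conventional "command not found" status on Unix-like systems, so one plausible reading of this fix is that the interpreter named in the original shebang did not resolve on the node running the task. The following is a rough sketch for checking that. The helper names `shebang_command` and `resolve_interpreter` are made up for this example, it ignores `env` options such as `-S`, and it only tells you something useful when run on the worker node itself.

```python
#!/usr/bin/env python3
"""Check whether the interpreter named in a script's shebang resolves on this machine."""
import os
import shlex
import shutil
import sys


def shebang_command(first_line):
    """Return the shebang's argv as a list, or None if the line is not a shebang."""
    if not first_line.startswith("#!"):
        return None
    return shlex.split(first_line[2:].strip())


def resolve_interpreter(first_line):
    """Return the path the shebang would run, or None if it cannot be found."""
    argv = shebang_command(first_line)
    if not argv:
        return None
    program = argv[0]
    # "#!/usr/bin/env python3" asks env to look python3 up on PATH.
    if os.path.basename(program) == "env" and len(argv) > 1:
        return shutil.which(argv[1])
    # A fixed path such as "/usr/bin/python" must exist and be executable.
    if os.path.isfile(program) and os.access(program, os.X_OK):
        return program
    return None


if __name__ == "__main__":
    # Usage (hypothetical): python3 check_shebang.py mapper.py reducer.py
    for path in sys.argv[1:]:
        with open(path) as handle:
            first = handle.readline()
        print(path, "->", resolve_interpreter(first) or "NOT FOUND")
```

The `env` form is more forgiving because it searches `PATH` instead of relying on one fixed location, which may differ between machines.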
You may be running Python 3 code (for example, Python 3 string formatting) under Python 2.
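As a small illustration of that point: f-strings are a syntax error on Python 2, while `str.format` with explicit indices works on both 2.7 and 3.x. The sketch below uses only the portable form and reports which interpreter actually ran the script. The function names are hypothetical and are not taken from the asker's code.

```python
#!/usr/bin/env python
"""Formatting that behaves the same under Python 2.7 and Python 3."""
import sys


def format_pair(key, count):
    """Build a tab-separated record with str.format, valid on 2.7 and 3.x."""
    return "{0}\t{1}".format(key, count)


def interpreter_version():
    """Return the running interpreter's version as "major.minor"."""
    return "{0}.{1}".format(sys.version_info[0], sys.version_info[1])


if __name__ == "__main__":
    # Writing to stderr keeps this diagnostic out of the mapper's stdout records.
    sys.stderr.write("running under Python " + interpreter_version() + "\n")
```

Logging the version to stderr from inside the mapper is one way to see, in the task logs, which interpreter the shebang actually selected.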