cloudera hadoop mapreduce job GC overhead limit exceeded error
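I'm running a canopy clustering job (using Mahout) on Cloudera CDH4. The content to be clustered has about 1 million records, each smaller than 1 KB. The whole Hadoop environment (all nodes included) runs in a single VM with 4 GB of memory, and CDH4 was installed with its default settings. When I run the job, I get the following exception.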
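Judging from the exception, it looks like the job client needs a larger JVM heap. However, Cloudera Manager has many configuration options for JVM heap size. I changed "Client Java Heap Size in Bytes" from 256 MiB to 512 MiB, but it did not help.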
Are there any tips for setting these heap size options?
```
13/07/03 17:12:45 INFO input.FileInputFormat: Total input paths to process : 1
13/07/03 17:12:46 INFO mapred.JobClient: Running job: job_201307031710_0001
13/07/03 17:12:47 INFO mapred.JobClient: map 0% reduce 0%
13/07/03 17:13:06 INFO mapred.JobClient: map 1% reduce 0%
13/07/03 17:13:27 INFO mapred.JobClient: map 2% reduce 0%
13/07/03 17:14:01 INFO mapred.JobClient: map 3% reduce 0%
13/07/03 17:14:50 INFO mapred.JobClient: map 4% reduce 0%
13/07/03 17:15:50 INFO mapred.JobClient: map 5% reduce 0%
13/07/03 17:17:06 INFO mapred.JobClient: map 6% reduce 0%
13/07/03 17:18:44 INFO mapred.JobClient: map 7% reduce 0%
13/07/03 17:20:24 INFO mapred.JobClient: map 8% reduce 0%
13/07/03 17:22:20 INFO mapred.JobClient: map 9% reduce 0%
13/07/03 17:25:00 INFO mapred.JobClient: map 10% reduce 0%
13/07/03 17:28:08 INFO mapred.JobClient: map 11% reduce 0%
13/07/03 17:31:46 INFO mapred.JobClient: map 12% reduce 0%
13/07/03 17:35:57 INFO mapred.JobClient: map 13% reduce 0%
13/07/03 17:40:52 INFO mapred.JobClient: map 14% reduce 0%
13/07/03 17:46:55 INFO mapred.JobClient: map 15% reduce 0%
13/07/03 17:55:02 INFO mapred.JobClient: map 16% reduce 0%
13/07/03 18:08:42 INFO mapred.JobClient: map 17% reduce 0%
13/07/03 18:59:11 INFO mapred.JobClient: map 8% reduce 0%
13/07/03 18:59:13 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000001_0, Status : FAILED
Error: GC overhead limit exceeded
13/07/03 18:59:23 INFO mapred.JobClient: map 9% reduce 0%
13/07/03 19:00:09 INFO mapred.JobClient: map 10% reduce 0%
13/07/03 19:01:49 INFO mapred.JobClient: map 11% reduce 0%
13/07/03 19:04:25 INFO mapred.JobClient: map 12% reduce 0%
13/07/03 19:07:48 INFO mapred.JobClient: map 13% reduce 0%
13/07/03 19:12:48 INFO mapred.JobClient: map 14% reduce 0%
13/07/03 19:19:46 INFO mapred.JobClient: map 15% reduce 0%
13/07/03 19:29:05 INFO mapred.JobClient: map 16% reduce 0%
13/07/03 19:43:43 INFO mapred.JobClient: map 17% reduce 0%
13/07/03 20:49:36 INFO mapred.JobClient: map 8% reduce 0%
13/07/03 20:49:38 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000001_1, Status : FAILED
Error: GC overhead limit exceeded
13/07/03 20:49:48 INFO mapred.JobClient: map 9% reduce 0%
13/07/03 20:50:31 INFO mapred.JobClient: map 10% reduce 0%
13/07/03 20:52:08 INFO mapred.JobClient: map 11% reduce 0%
13/07/03 20:54:38 INFO mapred.JobClient: map 12% reduce 0%
13/07/03 20:58:01 INFO mapred.JobClient: map 13% reduce 0%
13/07/03 21:03:01 INFO mapred.JobClient: map 14% reduce 0%
13/07/03 21:10:10 INFO mapred.JobClient: map 15% reduce 0%
13/07/03 21:19:54 INFO mapred.JobClient: map 16% reduce 0%
13/07/03 21:31:35 INFO mapred.JobClient: map 8% reduce 0%
13/07/03 21:31:37 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000000_0, Status : FAILED
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 65.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)
13/07/03 21:32:09 INFO mapred.JobClient: map 9% reduce 0%
13/07/03 21:33:31 INFO mapred.JobClient: map 10% reduce 0%
13/07/03 21:35:42 INFO mapred.JobClient: map 11% reduce 0%
13/07/03 21:38:41 INFO mapred.JobClient: map 12% reduce 0%
13/07/03 21:42:27 INFO mapred.JobClient: map 13% reduce 0%
13/07/03 21:48:20 INFO mapred.JobClient: map 14% reduce 0%
13/07/03 21:56:12 INFO mapred.JobClient: map 15% reduce 0%
13/07/03 22:07:20 INFO mapred.JobClient: map 16% reduce 0%
13/07/03 22:26:36 INFO mapred.JobClient: map 17% reduce 0%
13/07/03 23:35:30 INFO mapred.JobClient: map 8% reduce 0%
13/07/03 23:35:32 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000000_1, Status : FAILED
Error: GC overhead limit exceeded
13/07/03 23:35:42 INFO mapred.JobClient: map 9% reduce 0%
13/07/03 23:36:16 INFO mapred.JobClient: map 10% reduce 0%
13/07/03 23:38:01 INFO mapred.JobClient: map 11% reduce 0%
13/07/03 23:40:47 INFO mapred.JobClient: map 12% reduce 0%
13/07/03 23:44:44 INFO mapred.JobClient: map 13% reduce 0%
13/07/03 23:50:42 INFO mapred.JobClient: map 14% reduce 0%
13/07/03 23:58:58 INFO mapred.JobClient: map 15% reduce 0%
13/07/04 00:10:22 INFO mapred.JobClient: map 16% reduce 0%
13/07/04 00:21:38 INFO mapred.JobClient: map 7% reduce 0%
13/07/04 00:21:40 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000001_2, Status : FAILED
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 65.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)
13/07/04 00:21:50 INFO mapred.JobClient: map 8% reduce 0%
13/07/04 00:22:27 INFO mapred.JobClient: map 9% reduce 0%
13/07/04 00:23:52 INFO mapred.JobClient: map 10% reduce 0%
13/07/04 00:26:00 INFO mapred.JobClient: map 11% reduce 0%
13/07/04 00:28:47 INFO mapred.JobClient: map 12% reduce 0%
13/07/04 00:32:17 INFO mapred.JobClient: map 13% reduce 0%
13/07/04 00:37:34 INFO mapred.JobClient: map 14% reduce 0%
13/07/04 00:44:30 INFO mapred.JobClient: map 15% reduce 0%
13/07/04 00:54:28 INFO mapred.JobClient: map 16% reduce 0%
13/07/04 01:16:30 INFO mapred.JobClient: map 17% reduce 0%
13/07/04 01:32:05 INFO mapred.JobClient: map 8% reduce 0%
13/07/04 01:32:08 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000000_2, Status : FAILED
Error: GC overhead limit exceeded
13/07/04 01:32:21 INFO mapred.JobClient: map 9% reduce 0%
13/07/04 01:33:26 INFO mapred.JobClient: map 10% reduce 0%
13/07/04 01:35:37 INFO mapred.JobClient: map 11% reduce 0%
13/07/04 01:38:48 INFO mapred.JobClient: map 12% reduce 0%
13/07/04 01:43:06 INFO mapred.JobClient: map 13% reduce 0%
13/07/04 01:49:58 INFO mapred.JobClient: map 14% reduce 0%
13/07/04 01:59:07 INFO mapred.JobClient: map 15% reduce 0%
13/07/04 02:12:00 INFO mapred.JobClient: map 16% reduce 0%
13/07/04 02:37:56 INFO mapred.JobClient: map 17% reduce 0%
13/07/04 03:31:55 INFO mapred.JobClient: map 8% reduce 0%
13/07/04 03:32:00 INFO mapred.JobClient: Job complete: job_201307031710_0001
13/07/04 03:32:00 INFO mapred.JobClient: Counters: 7
13/07/04 03:32:00 INFO mapred.JobClient:   Job Counters
13/07/04 03:32:00 INFO mapred.JobClient:     Failed map tasks=1
13/07/04 03:32:00 INFO mapred.JobClient:     Launched map tasks=8
13/07/04 03:32:00 INFO mapred.JobClient:     Data-local map tasks=8
13/07/04 03:32:00 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=11443502
13/07/04 03:32:00 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
13/07/04 03:32:00 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/04 03:32:00 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
Exception in thread "main" java.lang.RuntimeException: java.lang.InterruptedException: Canopy Job failed processing vector
```
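Whether mappers or reducers are failing can be read off the attempt IDs, which Hadoop formats as `attempt_<jobtracker-start>_<job-number>_<m|r>_<task-number>_<attempt-number>`. The Java sketch below tallies FAILED attempts by task type with a regular expression; the class and method names are hypothetical, made up for illustration. Applied to the log above, all six FAILED attempts are map attempts (`_m_`).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts FAILED task attempts per task type in JobClient output.
// "m" = map attempt, "r" = reduce attempt (taken from the attempt ID).
final class FailedAttemptCounter {
    private static final Pattern FAILED_ATTEMPT = Pattern.compile(
            "Task Id : attempt_\\d+_\\d+_([mr])_(\\d+)_(\\d+), Status : FAILED");

    /** Returns a map from task type ("m" or "r") to the number of FAILED lines. */
    static Map<String, Integer> countFailures(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = FAILED_ATTEMPT.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum); // group 1 is the task type
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "13/07/03 18:59:13 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000001_0, Status : FAILED",
                "Error: GC overhead limit exceeded",
                "13/07/03 21:31:37 INFO mapred.JobClient: Task Id : attempt_201307031710_0001_m_000000_0, Status : FAILED");
        System.out.println(countFailures(sample)); // prints {m=2}
    }
}
```

In practice you would feed it the lines of the saved console output (for example via `java.nio.file.Files.readAllLines`).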
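Mahout jobs are very memory-intensive. I don't know whether the mappers or the reducers are the culprit, but either way you have to tell Hadoop to give them more RAM. "GC overhead limit exceeded" is just another way of saying "out of memory": the JVM gave up because it was spending nearly all of its time in garbage collection while reclaiming almost none of the heap (with HotSpot's defaults, more than 98% of the time spent in GC and less than 2% of the heap recovered).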
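The trigger condition can be written down as a simple predicate. This Java sketch is a simplified model, not HotSpot's actual code: it checks one observation against the thresholds behind `-XX:GCTimeLimit` (default 98) and `-XX:GCHeapFreeLimit` (default 2), whereas the real JVM only throws after the condition has held over several consecutive full collections. The class name is hypothetical.

```java
// Simplified, hypothetical model of HotSpot's GC overhead limit check.
// The real JVM requires the condition to persist across consecutive full GCs.
final class GcOverheadModel {
    // Defaults of -XX:GCTimeLimit and -XX:GCHeapFreeLimit, in percent.
    static final double DEFAULT_GC_TIME_LIMIT = 98.0;
    static final double DEFAULT_GC_HEAP_FREE_LIMIT = 2.0;

    /** True when GC time exceeds its limit AND free heap after GC is below its limit. */
    static boolean limitExceeded(double gcTimePercent, double heapFreePercent,
                                 double gcTimeLimit, double heapFreeLimit) {
        return gcTimePercent > gcTimeLimit && heapFreePercent < heapFreeLimit;
    }

    static boolean limitExceeded(double gcTimePercent, double heapFreePercent) {
        return limitExceeded(gcTimePercent, heapFreePercent,
                DEFAULT_GC_TIME_LIMIT, DEFAULT_GC_HEAP_FREE_LIMIT);
    }

    public static void main(String[] args) {
        // Heap almost entirely full of live objects: GC runs constantly, frees ~1%.
        System.out.println(limitExceeded(99.0, 1.0));  // true
        // Same GC time, but each collection still frees a useful amount of heap.
        System.out.println(limitExceeded(99.0, 10.0)); // false
    }
}
```

The model also shows why a larger `-Xmx` is the fix: with more heap, the free fraction after each collection rises above the limit. Disabling the check with `-XX:-UseGCOverheadLimit` would only postpone the failure to a plain `OutOfMemoryError`.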
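Setting this is admittedly a bit complicated, because there are several properties and they changed in Hadoop 2. CDH4 can run either Hadoop 1 (MRv1) or Hadoop 2 (MRv2/YARN); which one are you using?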
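To make the Hadoop 1 versus Hadoop 2 difference concrete, here is a hedged Java sketch that picks the per-task heap properties for each version. The property names are the standard ones (`mapred.child.java.opts` for MRv1; `mapreduce.map.java.opts`, `mapreduce.reduce.java.opts`, `mapreduce.map.memory.mb`, `mapreduce.reduce.memory.mb` for MRv2), but the class, the enum, and the 25% container headroom are assumptions for illustration. Rendering them as generic `-D` options also assumes the Mahout driver is launched through Hadoop's `ToolRunner`, so verify that for your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: which properties raise the task JVM heap, per MapReduce version.
final class TaskHeapSettings {
    enum MapReduceVersion { MR1, MR2 }

    static Map<String, String> forHeap(MapReduceVersion version, int heapMb) {
        Map<String, String> props = new LinkedHashMap<>();
        String xmx = "-Xmx" + heapMb + "m";
        if (version == MapReduceVersion.MR1) {
            // MRv1: a single property applies to every map and reduce child JVM.
            props.put("mapred.child.java.opts", xmx);
        } else {
            // MRv2: separate JVM options for map and reduce tasks ...
            props.put("mapreduce.map.java.opts", xmx);
            props.put("mapreduce.reduce.java.opts", xmx);
            // ... and a YARN container size larger than the heap.
            // Assumption: ~25% headroom for non-heap JVM memory (a rule of thumb).
            int containerMb = (int) Math.ceil(heapMb * 1.25);
            props.put("mapreduce.map.memory.mb", Integer.toString(containerMb));
            props.put("mapreduce.reduce.memory.mb", Integer.toString(containerMb));
        }
        return props;
    }

    /** Renders settings as generic options ("-D key=value") for a ToolRunner-based driver. */
    static String asGenericOptions(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append("-D ").append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: -D mapred.child.java.opts=-Xmx1024m
        System.out.println(asGenericOptions(forHeap(MapReduceVersion.MR1, 1024)));
        System.out.println(asGenericOptions(forHeap(MapReduceVersion.MR2, 1024)));
    }
}
```

The same keys can instead be set cluster-wide in `mapred-site.xml` (or the corresponding Cloudera Manager fields); note that the "Client Java Heap Size" from the question only affects the submitting client, not the task JVMs where the errors occur.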
If I had to guess: set
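You need to change Hadoop's memory settings, because the memory allocated to Hadoop is not enough for the job you are running. Try increasing the heap size and check whether that fixes it; excessive memory use can also make the OS kill the task processes, which would cause the job to fail.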
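Raising the heap only helps if every task JVM that can run at once still fits in physical memory; otherwise the OS may start killing processes. The Java sketch below does that arithmetic (worst case = map slots × map heap + reduce slots × reduce heap + daemons and OS) and prints the running JVM's effective maximum heap, which is one way to confirm that an `-Xmx` setting actually took effect. All slot counts and daemon sizes are hypothetical examples, and heap is only a lower bound on a JVM's real footprint.

```java
// Hypothetical sketch: does the worst-case task heap demand fit in physical RAM?
// Heap sizes are a lower bound; real JVMs also use non-heap memory.
final class MemoryBudget {
    static long worstCaseMb(int mapSlots, int mapHeapMb,
                            int reduceSlots, int reduceHeapMb,
                            int daemonsMb) {
        return (long) mapSlots * mapHeapMb
                + (long) reduceSlots * reduceHeapMb
                + daemonsMb;
    }

    static boolean fits(long worstCaseMb, int physicalMb) {
        return worstCaseMb <= physicalMb;
    }

    public static void main(String[] args) {
        int physicalMb = 4096; // the 4 GB VM from the question

        // Hypothetical: 2 map + 2 reduce slots at 1 GB each, ~1.5 GB for daemons and OS.
        long need = worstCaseMb(2, 1024, 2, 1024, 1536);
        System.out.println(need + " MB needed, fits=" + fits(need, physicalMb));
        // prints: 5632 MB needed, fits=false

        // Hypothetical: 1 map + 1 reduce slot at 1 GB each.
        need = worstCaseMb(1, 1024, 1, 1024, 1536);
        System.out.println(need + " MB needed, fits=" + fits(need, physicalMb));
        // prints: 3584 MB needed, fits=true

        // Inside a task, this reports the max heap the JVM was actually started with.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("This JVM's max heap: " + maxHeapMb + " MB");
    }
}
```

On a single 4 GB VM that hosts every daemon, this kind of budget usually means trading fewer concurrent slots for a larger heap per task.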