source link : http://yaseminavcular.blogspot.com/2011/03/hadoop-java-heap-space-error.html
"Error: Java Heap space" means I'm trying to allocate more memory than is available in the task's JVM heap.
How to get around it? (1) Use a better configuration. (2) Look for unnecessarily allocated objects.
Configuration
mapred.map.child.java.opts: JVM options for map tasks; the heap size is set here (e.g. with -Xmx)
mapred.reduce.child.java.opts: JVM options for reduce tasks; the heap size is set here (e.g. with -Xmx)
mapred.tasktracker.map.tasks.maximum: maximum number of map tasks that can run simultaneously per node
mapred.tasktracker.reduce.tasks.maximum: maximum number of reduce tasks that can run simultaneously per node
Make sure that (num_of_maps * map_heap_size) + (num_of_reducers * reduce_heap_size) is not larger than the memory available on the node, where num_of_maps and num_of_reducers are the per-node maximums set above. The maximum number of mappers and reducers can also be tuned according to the available system resources.
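The check above is simple arithmetic, so it can be sketched in a few lines of Java. This is only an illustration: the HeapBudget class, its method names, and the example numbers are all hypothetical and are not part of Hadoop.

```java
// Hypothetical helper (not a Hadoop API): worst-case heap demand on one node.
class HeapBudget {
    /** Heap (in MB) needed if every map and reduce slot on the node is busy at once. */
    static long requiredMb(int mapSlots, int mapHeapMb, int reduceSlots, int reduceHeapMb) {
        return (long) mapSlots * mapHeapMb + (long) reduceSlots * reduceHeapMb;
    }

    /** True if the worst-case demand does not exceed the memory set aside for tasks. */
    static boolean fits(int mapSlots, int mapHeapMb, int reduceSlots, int reduceHeapMb,
                        long availableMb) {
        return requiredMb(mapSlots, mapHeapMb, reduceSlots, reduceHeapMb) <= availableMb;
    }

    public static void main(String[] args) {
        // Assumed example values: 4 map slots at 512 MB, 2 reduce slots at 1024 MB,
        // and 8192 MB of node memory reserved for tasks.
        System.out.println("needed MB: " + requiredMb(4, 512, 2, 1024));
        System.out.println("fits: " + fits(4, 512, 2, 1024, 8192));
    }
}
```

When picking the available-memory figure, it is probably wise to leave some room for the operating system and the Hadoop daemons running on the same node, rather than using the full physical memory.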
io.sort.factor: maximum number of streams to merge at once while sorting. Used on both the map and reduce sides.
io.sort.mb: size (in MB) of the map-side memory buffer used while sorting
mapred.job.shuffle.input.buffer.percent: reduce-side buffer setting; the percentage of the maximum heap size to be allocated for storing map outputs during the shuffle
NOTE: Using fs.inmemory.size.mb is a very bad idea!
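Since the sort buffer (io.sort.mb) is allocated inside the map task's heap and the shuffle buffer is a fraction of the reduce task's heap, a rough sanity check of these settings against the heap sizes might look like the sketch below. The BufferCheck class and its method names are hypothetical, and the numbers are only estimates, not what Hadoop computes internally.

```java
// Hypothetical helper (not a Hadoop API): rough sanity checks for buffer settings.
class BufferCheck {
    /** Approximate shuffle buffer in MB: reduce heap times mapred.job.shuffle.input.buffer.percent. */
    static double shuffleBufferMb(int reduceHeapMb, double bufferPercent) {
        return reduceHeapMb * bufferPercent;
    }

    /** io.sort.mb comes out of the map task heap, so it has to be smaller than that heap. */
    static boolean sortBufferFits(int ioSortMb, int mapHeapMb) {
        return ioSortMb < mapHeapMb;
    }

    public static void main(String[] args) {
        // Assumed example values: 1024 MB reduce heap with a 0.70 buffer fraction,
        // and a 100 MB sort buffer inside a 200 MB map heap.
        System.out.println("shuffle buffer MB (approx): " + shuffleBufferMb(1024, 0.70));
        System.out.println("sort buffer fits: " + sortBufferFits(100, 200));
    }
}
```

If the sort buffer takes up most of the map heap, little is left for the objects the map function itself allocates, which is one way to end up with the heap space error.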
Unnecessary memory allocation
Simply look for the new keyword and make sure there is no unnecessary allocation. A very common tip is to use the set() method of Writable objects rather than allocating a new object on every map or reduce call.
Here is a simple count example to show the trick:
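import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the job class
public static class UrlReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Allocated once and reused, instead of a new IntWritable per reduce() call
    IntWritable sumw = new IntWritable();
    int sum;

    @Override
    public void reduce(Text key, Iterable<IntWritable> vals, Context context)
            throws IOException, InterruptedException {
        sum = 0;
        for (IntWritable val : vals) {
            sum += val.get();
        }
        sumw.set(sum); // overwrite the value of the existing Writable
        context.write(key, sumw);
    }
}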
Note: There are a couple more tips here for resolving common errors in Hadoop.