Capping simultaneous tasks in Hadoop

[Screenshot: Fair Scheduler pools]

We’ve run into several situations in Hadoop where we want to prevent a job from using more than a certain number of slots. Some of our jobs depend on external resources that don’t scale. One task needs to talk to a MySQL database. Another writes to our Solr cluster. We know that beyond a certain point these jobs don’t go any faster: running 200 mappers is no quicker than running 50. We moved to the fair scheduler partly to alleviate these concerns, the idea being that if multiple jobs are running at once, they aren’t likely to be the same type of job.

The other day I ran into this problem again and decided to look around to see if anyone had done anything in this direction. The first issue I found was HADOOP-5170, which ended with a consensus that the functionality belongs in the scheduler, not in MapReduce proper. MAPREDUCE-698 adds a per-pool cap on simultaneous tasks to the Fair Scheduler, which is a much better idea than capping at the job level.
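As a rough sketch of what that could look like, the Fair Scheduler’s allocation file (the one pointed to by mapred.fairscheduler.allocation.file) would gain per-pool caps along these lines. The maxMaps/maxReduces element names follow the MAPREDUCE-698 proposal, and the pool name and numbers here are made up:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: allocation file referenced by
     mapred.fairscheduler.allocation.file in mapred-site.xml -->
<allocations>
  <!-- Hypothetical pool for jobs that hit an external database.
       maxMaps/maxReduces cap how many of the pool's tasks may run
       at once (element names per the MAPREDUCE-698 proposal). -->
  <pool name="db-export">
    <maxMaps>50</maxMaps>
    <maxReduces>10</maxReduces>
  </pool>
</allocations>
```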

If your jobs rely on external services like a database or a web service, you can run those jobs in a dedicated pool. If two such jobs are in the pool, they share the cap, and the load on your database stays constant. The pool can also be given a guaranteed minimum number of slots, so you don’t have the database sitting idle now only to have half your Hadoop cluster sitting idle later while you wait for these jobs to finish.
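A minimal sketch of how that could be wired up, assuming a pool named solr-index (the pool name and slot counts are hypothetical): jobs pick their pool via mapred.fairscheduler.poolnameproperty, and the pool gets both a guaranteed minimum and a cap in the allocation file.

```xml
<!-- mapred-site.xml: let jobs choose a pool by setting a "pool.name"
     job property (otherwise the pool defaults to the submitting user) -->
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>

<!-- fair-scheduler.xml: hypothetical pool for Solr-indexing jobs.
     minMaps/minReduces guarantee slots so the external service isn't
     left idle; maxMaps/maxReduces (per MAPREDUCE-698) keep it from
     being overloaded. -->
<pool name="solr-index">
  <minMaps>20</minMaps>
  <minReduces>5</minReduces>
  <maxMaps>50</maxMaps>
  <maxReduces>10</maxReduces>
</pool>
```

A job would then opt into the pool at submission time, for example with -D pool.name=solr-index on the command line or by setting that property in the job configuration.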

If your jobs have very long-running tasks, such as building a Lucene index in a reducer, you may also want to keep other jobs from grabbing every free slot during a gap when nothing else is running. I see this frequently: one job finishes, and before the dependent job starts up, all of its slots have been taken by another job. Without preemption, that can increase latency a lot.
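On Fair Scheduler versions that do support preemption, one way to bound that latency is to give the indexing pool a minimum share and a preemption timeout. The sketch below assumes a hypothetical index-build pool, and the exact property and element names may differ between Hadoop releases:

```xml
<!-- mapred-site.xml: enable preemption in the Fair Scheduler
     (only available in newer versions of the scheduler) -->
<property>
  <name>mapred.fairscheduler.preemption</name>
  <value>true</value>
</property>

<!-- fair-scheduler.xml: if the index-build pool sits below its
     minimum share for 600 seconds, the scheduler may kill tasks
     from other pools to free slots for it. -->
<pool name="index-build">
  <minMaps>20</minMaps>
  <minSharePreemptionTimeout>600</minSharePreemptionTimeout>
</pool>
```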