October 2009 Archives

Posner explains CYA security theater


It's obvious to any rational outside observer that US terrorism policy mostly revolves around making sure people think politicians are "doing something", regardless of whether something needs to be done, or whether what they're doing is the right thing. Explaining the work for which Williamson won the Nobel last week, Judge Posner writes:

[FBI criminal-investigation functions] lend themselves to what are called "high-powered" incentives, which are systems of compensation and promotion that are based on objective performance criteria. In the case of criminal investigation these are number of arrests weighted by convictions and sentence. Intelligence work does not lend itself to such performance criteria, because the effect of surveillance and other intelligence activities in preventing terrorism or subversion is usually very difficult to assess. Hence motivation takes the form of creating a "high commitment" environment in which the organization's leaders try to elicit good performance by getting staff to internalize the organization's goals. The problem is that the absence of objective criteria of performance opens the door to "influence activities" by which members of the organization jockey for advancement.

If both types of task are combined in the same organization--those that can be directed by high-powered incentives and those that require high commitment as their motivator, the best employees will tend to gravitate toward the first type of task because they will be confident that they will do well if their performance is judged according to objective criteria. They will be much less certain how well they will do in a job in which influence activities play a large role in determining success.

To summarize the summary, the best and the brightest will be drawn to organizations that have objective measures of success and, even more so, to roles with such measures within a given organization. Those who aren't very good, and especially political hacks who shamelessly talk about how the threat level is Orange today, so put on extra sunscreen, will be drawn to roles without objective measures of success, where climbing the career ladder is based on criteria other than doing the job better than the next guy.

Check out the whole article

Capping simultaneous tasks in Hadoop


Fair Scheduler Pools Screenshot

We've run into several situations in Hadoop where we want to prevent a job from using more than a certain number of slots. Some of our jobs depend on external resources that don't scale: one task needs to talk to a MySQL database, and another writes to our Solr cluster. We know that beyond a certain point these jobs don't go any faster; running 200 mappers is no faster than running 50. We moved to the fair scheduler partly to alleviate some of these concerns. The idea was that if multiple jobs are running at once, they aren't likely to be the same type of job.
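To make the "200 mappers is no faster than 50" point concrete, here is a minimal sketch of the bottleneck. The function name and all of the numbers are made up for illustration; they are not measurements from our cluster.

```python
def job_runtime_seconds(records, mappers, per_mapper_rate, backend_limit):
    """Estimate runtime when every mapper writes to one shared backend.

    All parameters are hypothetical: rates are in records per second, and
    backend_limit is the most the external service can absorb in total.
    """
    # Aggregate throughput grows with mappers until the backend saturates.
    throughput = min(mappers * per_mapper_rate, backend_limit)
    return records / throughput


# Assumed numbers: each mapper could push 100 records/s and the database
# tops out at 5,000 records/s, so it saturates at 50 mappers.
for mappers in (10, 50, 200):
    print(mappers, job_runtime_seconds(3_000_000, mappers, 100, 5_000))
```

Under these assumptions the runtime stops improving once the backend is saturated, so every slot past that point is a slot some other job could have used.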

The other day I ran into the problem again and decided to look around to see if anyone had done anything in this direction. The first issue was HADOOP-5170, which ended with a consensus that the functionality should be in the scheduler, not part of MapReduce proper. MAPREDUCE-698 proposes adding a per-pool cap on simultaneous tasks to the Fair Scheduler, which is a much better idea than capping at the job level.

If your jobs rely on external services like a database or web service, you can run those jobs in a particular pool. If you have two jobs in this pool, they will share the cap, and the load on your database remains constant. The pool can also be assigned a minimum share, so that you don't have the database sitting idle now and then half your Hadoop cluster sitting idle later while you wait for these jobs to finish.
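As a rough illustration of what "sharing the cap" means, the sketch below splits a pool's cap among the jobs in it, giving each job an equal share up to what it can actually use. This is a simplified model with hypothetical names, not the Fair Scheduler's actual code.

```python
def share_pool_cap(demands, cap):
    """Split at most `cap` slots among the jobs in one pool.

    demands maps a (hypothetical) job name to the number of tasks that job
    could run right now. Slots are handed out one at a time, round-robin,
    so small jobs are fully served and the rest split what is left.
    """
    allocation = {job: 0 for job in demands}
    remaining = cap
    active = [job for job in demands if demands[job] > 0]
    while remaining > 0 and active:
        # Iterate over a copy because satisfied jobs are removed as we go.
        for job in list(active):
            if remaining == 0:
                break
            allocation[job] += 1
            remaining -= 1
            if allocation[job] == demands[job]:
                active.remove(job)
    return allocation


# Two jobs in a pool capped at 50 slots: the total never exceeds the cap.
print(share_pool_cap({"solr_indexer": 200, "mysql_export": 10}, 50))
```

However many jobs land in the pool, the sum of their allocations stays at or below the cap, which is what keeps the load on the external service bounded.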

If your jobs have very long-running tasks, such as building a Lucene index in a reducer, you may want to keep these jobs from grabbing every slot during gaps when no other jobs are running. I see this frequently when one job finishes and, in the time before the dependent job starts up, all the slots have been taken by another job. Without preemption, the dependent job then has to wait for those long tasks to finish, which can increase latency a lot.
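The gap scenario can be put in numbers with a small sketch. The function and the slot counts are assumptions for illustration only.

```python
def slots_left_for_dependent_job(total_slots, long_job_demand, long_job_cap=None):
    """Slots still free when the dependent job arrives after the gap.

    Without preemption, whatever the long-running job grabbed stays taken
    until its tasks finish. long_job_cap=None models an uncapped pool.
    """
    limit = total_slots if long_job_cap is None else min(long_job_cap, total_slots)
    taken = min(long_job_demand, limit)
    return total_slots - taken


# Hypothetical 100-slot cluster and a long-running job with 400 pending tasks.
print(slots_left_for_dependent_job(100, 400))      # uncapped pool
print(slots_left_for_dependent_job(100, 400, 30))  # pool capped at 30 slots
```

With a cap on the long-running pool, the dependent job finds free slots as soon as it starts instead of waiting for a long task to complete.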

About this Archive

This page is an archive of entries from October 2009 listed from newest to oldest.
