Kevin Dempsey Peterson (kdpeterson.net)
bender.js - there's no excuse for sidebars on my phone (anymore) (http://kdpeterson.net/blog/2011/08/mobile-stylesheets-with-bender-js, 2011-08-16)
<p>I’m tired of seeing sidebars on my phone. It’s great that my phone can handle every page on the web, and I can muddle through, but it’s been four years since the iPhone killed the “mobile web” distinction and it’s time for web design to catch up. I can’t do anything about ugly design, but I can at least make it dead simple to show your mobile users a <em>different</em> ugly stylesheet.</p>
<p>There are a lot of proposed solutions out there. What I’m looking for in a good solution is:</p>
<ul>
<li>No redirects. The URL stays the same so that sharing a link from your phone goes to the same place as if you shared it from your desktop.</li>
<li>Purely client-side. My blog is a static site generated by Jekyll, and I want it to stay that way.</li>
<li>Default is only the default. A user should be able to select what version they want to see.</li>
<li>Persistent per device on every page on my site.</li>
<li>Attempt to detect a mobile browser.</li>
</ul>
<p>I’m looking for a solution for informational websites, where a user doesn’t start at the login page, might want to bookmark or share the link, and might or might not visit other pages. That is, it’s a solution for the original web, the world of hypertext and pagerank. These are pages where you can talk about presentation separate from the content. There are a couple of solutions that are almost but not quite exactly what I’m looking for.</p>
<p><a href='http://en.m.wikipedia.org/'>Wikipedia</a> has a fantastic mobile site. Only the first section of an article is shown until you click on a section. They detect and redirect, but include a “view this page on regular Wikipedia” link in the footer. The biggest downside is that the URL changes: if someone shares a link from a mobile device, they send you to mobile Wikipedia. Unfortunately, the “permanently disable mobile site” link doesn’t do what it says; it only disables redirection from regular Wikipedia to mobile. If they fixed the link-sharing problem, I’d call this a perfect user experience, but it requires smarts on the server side.</p>
<p>The <a href='http://www.alistapart.com/articles/return-of-the-mobile-stylesheet'>solution recommended by A List Apart</a> is a set of techniques for applying the right styles based on the “media” attribute, using CSS 3 media queries for devices that claim to prefer the “screen” stylesheet. The problem is that people assume that because the approach is “standards compliant”, it will never guess wrong and users will never need to switch. The details of these solutions quickly betray any claim to standards compliance: once you are attempting to detect an iPhone using a combination of device width and pixel density, you aren’t writing to the standards, you’re doing browser detection. In the best case, that means deciding whether a Dell Streak 5 is a phone or a tablet; in the more troublesome case, it breaks horribly on any new device you haven’t anticipated. And with CSS, unlike Wikipedia’s redirect or client-side Javascript, the user can’t correct an incorrect default.</p>
<p>There are other solutions out there for highly interactive sites like banks, ecommerce, or webmail. These kinds of sites are not hurt by doing a redirect at the login page and may benefit from having a completely different structure. But this isn’t what I’m looking for.</p>
<p>Since I couldn’t find it, I wrote my own. <a href='https://github.com/kevinpet/bender'>Bender.js</a> is a small Javascript library to simplify switching stylesheets. You need to create separate stylesheets (“screen.css” and “mobile.css” for this example), add the links somewhere in your header or footer, and then bender does the rest.</p>
<pre><code><link rel="stylesheet" type="text/css" href="screen.css" id="style"/>
<script type="text/javascript" src="/bender-min-0.1.js"></script>
<script type="text/javascript">
$(bender("style").add("screen", "screen.css")
.add("mobile", "mobile.css", "auto-mobile").install());
</script>
...
<a id="mobile" href="#">Mobile Site</a><a id="screen" href="#">Full Site</a></code></pre>
<p>Before the page is loaded, bender checks for a cookie recording whether the user has manually selected a stylesheet. If there’s no cookie and a stylesheet is marked “auto-mobile”, it checks for a mobile browser. If a cookie is set, or a mobile browser was detected, it swaps out the stylesheet before the page renders. Once the page loads, the user can choose the other style. (This might be confusing if you are trying to read the code: note that install() does the initial stylesheet change, and then returns a callback to run on page load.)</p>
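The decision install() makes before the page renders can be sketched as a pure function. This is an illustration, not bender's actual source; the cookie value format and the user-agent regex here are assumptions:

```javascript
// A simplified sketch of the choice bender makes before the page renders.
// Not bender's actual source: the cookie value format and the user-agent
// regex are illustrative assumptions.
function chooseStyle(cookieValue, userAgent, styles) {
  // A manual choice saved in a cookie always wins.
  if (cookieValue && styles[cookieValue]) {
    return cookieValue;
  }
  // Otherwise, a style registered as "auto-mobile" is used for mobile UAs.
  for (var name in styles) {
    if (styles[name].autoMobile && /Mobile|Android|iPhone|iPod/.test(userAgent)) {
      return name;
    }
  }
  // No cookie, no mobile browser: keep the stylesheet the page shipped with.
  return null;
}
```

A desktop user who once tapped “Mobile Site” keeps getting the mobile stylesheet, and an unrecognized device simply falls through to the default, which the user can then correct.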
<p>If you want more detail, check it out on <a href='https://github.com/kevinpet/bender'>github</a> or <a href='/code/bender/'>this interactive example</a>. You can also download the minified <a href='https://github.com/kevinpet/bender/archives/master'>bender-min-0.1.js</a>. You’ll also need <a href='http://docs.jquery.com/Downloading_jQuery'>jQuery</a>. I’d love to see what people can do when there’s no need to compromise on one style for all users.</p>
Font Size in Mobile Browsers (http://kdpeterson.net/blog/2011/06/font-size-in-mobile-browsers, 2011-06-27)
<blockquote>
<p>If you found this on Google and just want to know how to make text readable on iPhones and Android phones, set the text size to 24pt and call it a day. For the curious, read on.</p>
</blockquote>
<p>When I redid my website <a href='/blog/2011/06/up-and-running-on-jekyll.html'>using Jekyll</a>, one of my goals was to make sure that it was usable from mobile devices like an iPhone or Android. I was incredibly annoyed when I discovered that making a simple standards-compliant site that doesn’t specify any widths or font sizes leaves the user looking for a magnifying glass. Before fixing it, I wanted to do some testing. I created a <a href='/font-size-test.html'>font size test page</a> using every reasonable way to specify the font size, then I visited the page using my desktop, my netbook, an iPhone 4, an iPod touch, and a Droid 2 running Android. There were two main things I was looking for.</p>
<p>First, I wanted to see how faithful the browsers were to what are supposed to be “absolute” sizes. Points, picas, inches and millimeters have exact conversions between them. I measured the length of an em dash, supposedly the height of the bounding box for the font. Chrome and Firefox on my desktop are true to the absolute size.</p>
<p>Second, I wanted to see what the browsers did with “relative” font sizes. A relative font size is specified as percent or ems, where 1em = 100%. I set these relative values to match up with what I expected on my desktop and then checked what happened on the other devices. The CSS spec considers pixels a relative length, but I think most people would naturally consider them more absolute, so I included pixels as well.</p>
<ol>
<li>All the browsers convert pixels to points at an implied 96dpi. That is, 24pt is always the same size as 32px.</li>
<li>All the browsers convert between the absolute lengths at the correct consistent ratio. 24pt = 1/3 in = 8.47mm.</li>
<li>Only my desktop was true to an actual ruler. All the other devices displayed smaller than actual size.</li>
</ol>
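The first two findings fall out of CSS's fixed unit ratios (1in = 96px = 72pt = 25.4mm); a quick check, with hypothetical helper names:

```javascript
// CSS absolute-length conversions: 1in = 96px = 72pt = 25.4mm.
// Helper names are mine, for illustration only.
function ptToPx(pt) { return pt * 96 / 72; }
function ptToMm(pt) { return pt * 25.4 / 72; }

ptToPx(24); // 32: 24pt is always the same size as 32px
ptToMm(24); // approximately 8.47mm
```

The third finding is the interesting one: these conversions are exact in CSS units, but only my desktop mapped CSS millimeters to physical millimeters.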
<table cellspacing='0'>
<tr>
<th>Browser</th>
<th>Em Dash</th>
<th>Scale</th>
</tr>
<tr class='a'>
<td>Chrome on Ubuntu 2048x1152 23" 101dpi</td>
<td>8.5mm</td>
<td>100%</td>
</tr>
<tr class='a'>
<td>Chrome on Ubuntu 1024x600 10" 118dpi</td>
<td>7.5mm</td>
<td>89%</td>
</tr>
<tr>
<td>iPhone 4 default portrait</td>
<td>1mm</td>
<td>12%</td>
</tr>
<tr>
<td>iPhone 4 auto-zoom portrait</td>
<td>2.1mm</td>
<td>25%</td>
</tr>
<tr>
<td>iPhone 4 default landscape</td>
<td>1.8mm</td>
<td>21%</td>
</tr>
<tr>
<td>iPhone 4 auto-zoom landscape</td>
<td>3.5mm</td>
<td>41%</td>
</tr>
<tr class='a'>
<td>iPod Touch default portrait</td>
<td>1.1mm</td>
<td>13%</td>
</tr>
<tr class='a'>
<td>iPod Touch auto-zoom portrait</td>
<td>2mm</td>
<td>24%</td>
</tr>
<tr class='a'>
<td>iPod Touch default landscape</td>
<td>1.9mm</td>
<td>22%</td>
</tr>
<tr class='a'>
<td>iPod Touch auto-zoom landscape</td>
<td>4mm</td>
<td>47%</td>
</tr>
<tr>
<td>Droid 2 default portrait</td>
<td>1.8mm</td>
<td>21%</td>
</tr>
<tr>
<td>Droid 2 auto-zoom portrait</td>
<td>4mm</td>
<td>47%</td>
</tr>
<tr>
<td>Droid 2 default landscape</td>
<td>2.8mm</td>
<td>33%</td>
</tr>
<tr>
<td>Droid 2 auto-zoom landscape</td>
<td>4mm</td>
<td>47%</td>
</tr>
</table>
<p>Note that some of the numbers are within the margin of error when I measure (it’s hard to get calipers around a set of pixels). It seems reasonable that the iPhone 4 and iPod Touch generation 2 use identical font sizes rather than the 10% difference I measured.</p>
<p>All of this makes sense from a historical perspective. Long-standing assumptions get embedded in too many web pages for browser-makers to allow things to break, even if those pages were built on faulty assumptions about what the spec was.</p>
<p>But back to the point: our goal is to make web pages readable, which is not just about the size of the type, but also the viewing distance. Text 2mm high looks very normal on a 23" monitor, but would look comically large on a phone. I measured the most convenient test subject (myself) and got the following:</p>
<table cellspacing='0'>
<tr>
<th>Environment</th>
<th>Viewing Distance</th>
<th>Implied Ideal Size</th>
</tr>
<tr>
<td>Desktop</td>
<td>70cm</td>
<td>100%</td>
</tr>
<tr>
<td>Netbook on lap</td>
<td>60cm</td>
<td>86%</td>
</tr>
<tr>
<td>Smartphone held portrait</td>
<td>35cm</td>
<td>50%</td>
</tr>
<tr>
<td>Smartphone held landscape</td>
<td>45cm</td>
<td>64%</td>
</tr>
</table>
<p>Some of this is more or less relevant. I am not a fan of extensively customizing a page for a specific browser like in the bad old days of IE-specific tweaks, so I want to build a single mobile stylesheet representing a compromise between the mobile browsers. On the desktop side, I don’t like specifying font size at all: that should be up to the user. I’m willing to do it for mobile browsers because I know they are intentionally mis-rendering things in order to make crappy desktop-oriented layouts work as well as they can.</p>
<p>I want to pick a font size that brings the text up to the ideal size based on viewing distance without making it too big on any device. The largest apparent size of any of the defaults is the Droid 2 in landscape, where 33% actual over a 64% ideal gives an apparent size of 52% of what you would see on the desktop. So, if I put a font size of twice normal in my mobile stylesheet, it will look pretty good for everyone.</p>
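The arithmetic behind "pretty good for everyone" is simple enough to check directly; the function and parameter names here are mine, not from any library:

```javascript
// Perceived size relative to the desktop ideal: the measured default scale,
// multiplied by the stylesheet's font-size factor, divided by the
// distance-implied ideal scale. Returns a rounded percentage.
function perceived(defaultScale, idealScale, multiplier) {
  return Math.round(100 * defaultScale * multiplier / idealScale);
}

perceived(0.33, 0.64, 2); // 103: Droid 2 landscape at twice normal
perceived(0.12, 0.50, 2); // 48: iPhone 4 portrait at twice normal
```

The first call reproduces the Droid 2 landscape row of the table; the second, the iPhone 4 portrait row.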
<table cellspacing='0'>
<tr>
<th>Browser</th>
<th>Stylesheet</th>
<th>Actual</th>
<th>Ideal</th>
<th>Perceived</th>
</tr>
<tr>
<td>Chrome on Ubuntu 2048x1152 23" 101dpi</td>
<td>desktop</td>
<td>100%</td>
<td>100%</td>
<td>100%</td>
</tr>
<tr>
<td>Chrome on Ubuntu 1024x600 10" 118dpi</td>
<td>desktop</td>
<td>89%</td>
<td>86%</td>
<td>103%</td>
</tr>
<tr>
<td>iPhone 4 portrait</td>
<td>mobile</td>
<td>24%</td>
<td>50%</td>
<td>48%</td>
</tr>
<tr>
<td>iPhone 4 landscape</td>
<td>mobile</td>
<td>42%</td>
<td>64%</td>
<td>66%</td>
</tr>
<tr>
<td>iPod Touch portrait</td>
<td>mobile</td>
<td>26%</td>
<td>50%</td>
<td>52%</td>
</tr>
<tr>
<td>iPod Touch landscape</td>
<td>mobile</td>
<td>44%</td>
<td>64%</td>
<td>68%</td>
</tr>
<tr>
<td>Droid 2 portrait</td>
<td>mobile</td>
<td>42%</td>
<td>50%</td>
<td>84%</td>
</tr>
<tr>
<td>Droid 2 landscape</td>
<td>mobile</td>
<td>66%</td>
<td>64%</td>
<td>103%</td>
</tr>
</table>
<p>Those are numbers I’m pretty comfortable with. The remaining question is whether to specify twice normal as 24pt or 200%. I haven’t been able to find a test that shows these actually behave differently on a mobile device. On my Droid, the font size setting changes both relatively and absolutely specified sizes. Changing the font size on iOS is part of the accessibility package, and I doubt many people use it. I lean toward specifying a font size of 24pt and then using relative sizes for headings or smaller text, just to keep the distinction within my own stylesheet.</p>
<p>So now that you have your nice mobile stylesheet with a font size of 24pt, how do you get the browser to use it? Unfortunately, modern mobile browsers don’t honor a media type you can put on the stylesheet link. You could detect this server side, but as I’ll explain in a future post, I prefer my client-side solution, <a href='https://github.com/kevinpet/bender'>bender.js</a>.</p>
Up and Running on Jekyll (http://kdpeterson.net/blog/2011/06/up-and-running-on-jekyll, 2011-06-03)
<p>After a few days of work, I’ve converted to <a href='https://github.com/mojombo/jekyll/wiki'>Jekyll</a>, a blog oriented static site generator. I never felt like learning enough about Movable Type to get it to do things I wanted (like a simple layout and maybe just accepting an HTML page without trying to interpret it), so I’m very happy to be working with a set of files, in a set of directories, no less, that gets converted into a blog according to simple and clear rules.</p>
<p>It seems I couldn’t keep Google Reader from thinking everything was a new post, unfortunately, but other than that things went smoothly. I’ve kept the same external link structure so I’ll keep my small number of in-links, and now that it’s less trouble to get the results I want, I’ll probably be blogging more regularly.</p>
<p>First order of business will likely be to write up the getting started guide I would have wanted for Jekyll.</p>
Fantastic illustration of prior vs. posterior probabilities (2010-12-19)
<p>People have difficulty internalizing what happens when a detector with a low error rate meets a very low probability event. I ran into this doing social media monitoring, when we saw that the majority of items classified as being in certain uncommon languages were wrong.</p>
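The arithmetic behind that is just Bayes' rule. With illustrative numbers (not measured OCR or classifier error rates): a detector that is 99% sensitive with a 1% false-positive rate, applied to an event that truly occurs once in 10,000, is wrong almost every time it fires:

```javascript
// Posterior probability that a flagged item is a true positive (Bayes' rule).
// All rates here are illustrative, not measured.
function posterior(prior, truePositiveRate, falsePositiveRate) {
  var truePos = prior * truePositiveRate;
  var falsePos = (1 - prior) * falsePositiveRate;
  return truePos / (truePos + falsePos);
}

// Rare event (1 in 10,000) with a 99%-sensitive, 1%-false-positive detector:
posterior(0.0001, 0.99, 0.01); // approximately 0.0098: under 1% of hits are real
```

With a 50% prior the same detector would be right 99% of the time, which is why the low error rate feels trustworthy until the prior collapses.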
<p>Check out <a href='http://www.google.com/search?q=%22fuck%22&tbs=bks:1,cdr:1,cd_min:1861,cd_max:1862&lr=lang_en' style='text-decoration: underline; '>occurrences of "fuck" in English language books in 1862.</a> Every single result on the front page is a mistake – “suck”, “such”, and “Puck” seem to be the bulk of it.</p>
<p>From a discussion on <a href='http://news.ycombinator.com/item?id=2019906'>Carlin's seven dirty words on Hacker News.</a></p>
Murder Your Darlings (2010-07-22)
<p>Lately I’ve been working on connectivity with NASDAQ. The protocols involve parsing fixed-offset messages of various types. We’re not doing high frequency trading, so we are optimizing for programmer efficiency: the API I expose to the rest of the system should make sense, so I’m representing the different types of messages, trading conditions, exchange identifiers and so on as enums. I was working on processing incoming messages, in this case implementing a handler for NASDAQ’s SoupTCP protocol. The incoming message has a one-character code which I translate. I’ve seen programmers code this kind of thing using a big lookup table, but that leads to maintainability problems: when you add an enum value, did you remember to add it to the case statement? Did that case statement get copied and pasted elsewhere? The better solution is to embed that logic in the enum itself using a static map and a factory method.</p>
<pre class='brush: java;tab-size: 2; smart-tabs: true; toolbar: false; gutter: false; first-line: 1;'>import java.util.HashMap;
import java.util.Map;

enum SoupMessageType {
  LOGIN_REQUEST('L'),
  LOGIN_ACCEPT('A'),
  LOGIN_REJECT('J'),
  DATA('S'),
  LOGOUT_REQUEST('O');
  private char code;
  private SoupMessageType(char code) {
    this.code = code;
  }
  private static final Map<Character, SoupMessageType> map;
  static {
    map = new HashMap<Character, SoupMessageType>(values().length);
    for (SoupMessageType v : values()) {
      map.put(v.code(), v);
    }
  }
  public char code() {
    return code;
  }
  public static SoupMessageType from(char code) {
    return map.get(code);
  }
}</pre>
<p>This is a simple pattern, and I found myself copying it from another enum. Since copy and paste is bad, I started looking for how to turn this pattern into an abstraction. First, I’d move the static code block into the constructor for a map-like class:</p>
<pre class='brush: java; tab-size: 2; smart-tabs: true; toolbar: false; gutter: false; first-line:1;'>import java.util.HashMap;
import java.util.Map;

public class CodedEnumer<K, E extends Enum<E> & CodedEnum<K>> {
  private Map<K, E> map;
  public CodedEnumer(Class<E> klass) {
    E[] enumConstants = klass.getEnumConstants();
    map = new HashMap<K, E>(enumConstants.length);
    for (E v : enumConstants) {
      map.put(v.code(), v);
    }
  }
  public static <K, V extends Enum<V> & CodedEnum<K>>
      CodedEnumer<K, V> create(Class<V> klass) {
    return new CodedEnumer<K, V>(klass);
  }
  public E get(K key) {
    return map.get(key);
  }
}</pre>
<p>The enum needs to implement a CodedEnum interface with one method.</p>
<pre class='brush: java;tab-size: 2; smart-tabs: true; toolbar: false; gutter: false; first-line:1;'>public interface CodedEnum<K> {
  public K code();
}</pre>
<p>My first draft of this included another type parameter E for the enum, and a public E from(K key) method. But of course this method should be static, and declaring a static method in an interface would be meaningless (aside from the other detail of being a compiler error).</p>
<p>Now, rather than copy and paste building the mapping from code to value, the enum needs to implement CodedEnum, create an instance of the CodedEnumer, and use that to implement the one-line static method.</p>
<pre class='brush: java;tab-size: 2; smart-tabs: true; toolbar: false; gutter: false; first-line:1;'>enum SoupMessageTypeCoded implements CodedEnum<Character> {
  LOGIN_REQUEST('L'),
  LOGIN_ACCEPT('A'),
  LOGIN_REJECT('J'),
  DATA('S'),
  LOGOUT_REQUEST('O');
  private Character code;
  private SoupMessageTypeCoded(Character code) {
    this.code = code;
  }
  private static final CodedEnumer<Character, SoupMessageTypeCoded> map =
      CodedEnumer.create(SoupMessageTypeCoded.class);
  @Override
  public Character code() {
    return code;
  }
  public static SoupMessageTypeCoded from(Character code) {
    return map.get(code);
  }
}</pre>
<p>Pretty slick, huh? I was pretty pleased with myself when I actually found a use for an intersection type in a generic declaration. This is where the old writer’s advice of “murder your darlings” comes into play. It’s variously attributed to Fitzgerald, Hemingway, and others, but the meaning is that whenever you write a particularly clever turn of phrase, whatever makes you smile at how smart you are, get out the red pencil or delete key, and get rid of it.</p>
<p>On sober reflection, this code sucks. I’ve created two extra types with complicated generics to save two or three lines of code. Anyone who opens up the class in the future will have to open two more files to understand how it works and what it’s doing. So I struck it out, and reverted to the version with those three horrible lines wastefully repeated in each and every enum I use this pattern in. I can only console myself that disk space is getting cheaper.</p>
Inside Automated Sentiment Analysis (2010-03-25)
<p>This post details <a href="http://ci.biz360.com" target="_blank">Biz360</a>'s automated sentiment analysis system, including our goals, how the system works, how we measure success, and the ways it can be used and misused. Before getting into the <em>how</em> or <em>why</em>, I want to start with the <em>what</em>. For our purposes, sentiment is the opinion of the author of an article towards the subject of an article. We classify sentiment into four possible categories.</p>
<h3>Positive</h3>
<dl>
<dd>Arguing <em>for</em> something, saying something is a <em>good</em> product, talking about good things a person or company has done, enjoying something, liking something, preferring something. If a mostly positive post has a small portion that is negative, it is still <em>positive</em>.</dd>
</dl>
<h3>Negative</h3>
<dl>
<dd>Arguing <em>against</em> something, saying something is a bad product, a bad experience, talking about bad things a person or company has done, disliking or having problems with something. If a mostly negative post has a small portion that is positive, it is still <em>negative</em>.</dd>
</dl>
<h3>Neutral</h3>
<dl>
<dd>If a post doesn't express any opinion, doesn't present anyone or anything in a favorable or unfavorable way, and wouldn't lead someone to form an opinion for or against, it is <em>neutral</em>.</dd>
</dl>
<h3>Mixed</h3>
<dl>
<dd>If a post is both positive and negative, such as saying something was good in some ways but bad in others, or if the post talks about different subjects and is positive toward one subject but negative toward another, then rate the post as <em>mixed</em>.</dd>
</dl>
<p>The first question is why you need automated sentiment at all. The simple answer is that there's just too much content. As conversations that used to take place over coffee and on street corners move to Twitter and forums, they become trackable. If a magazine with 100,000 readers mentions you in an article, you'll read that article and discuss what it's saying about you. If 10,000 people tell ten of their friends what they think of Kevin Smith vs. Southwest Air, you can't hope to read more than a small sampling. It's this latter use case that we cared about, and the questions we wanted to answer:</p>
<ol>
<li>What portion of my coverage is positive, negative, etc?</li>
<li>I got a spike in coverage on Monday. Was that spike positive or negative?</li>
<li>What kinds of things are people saying that's positive? Negative?</li>
</ol>
<p>We knew from the start that accuracy on the individual article level was never going to be that good. That is, if you want to know what the sentiment for some particular article is, the best thing to do is click on it, read it, and form your own opinion. With the help of <a title="Bill MacCartney's page at Stanford" href="http://nlp.stanford.edu/~wcmac/" target="_blank">Bill MacCartney</a>, an NLP researcher from Stanford, we quickly homed in on the following design parameters:</p>
<ol>
<li>A statistical classification system using two classifiers to detect positive and negative, and another classifier to combine these results. We would start with a simple <a href="http://en.wikipedia.org/wiki/Naive_Bayes_classifier">Naive Bayes classifier</a> and a <a href="http://en.wikipedia.org/wiki/Decision_tree_learning">Decision Tree classifier</a> to get everything working, and experiment with <a title="Wikipedia on Statistical Classifiers" href="http://en.wikipedia.org/wiki/Statistical_classification#Algorithms" target="_blank">more advanced classifiers</a> like the Linear MaxEnt classifier once we had a baseline to measure improvements.</li>
<li>The system would be trained using lots of data from <a title="Mechanical Turk" href="https://www.mturk.com/" target="_blank">Mechanical Turk</a>. Each item would be rated multiple times so we could throw out the results from raters who didn't understand or were not taking enough care.</li>
<li>Our training data would be real social media content, drawn from all the types of social media we process (blogs, micro-blogs, etc).</li>
</ol>
<p>At a very high level view, text classification systems get lumped into groups based on whether they are based on statistical learning from data, or whether they are based on hand-coded rules. Our system is solidly in the statistical camp. We were skeptical that a rule-based system could encompass the wide variety of topics and writing styles and the frequency of ungrammatical or misspelled content on the less formal parts of the Internet.</p>
<p>Our sentiment engine turns each post into a set of features, like ("good", "deal") -> 2, meaning the word "good" followed by the word "deal" occurs twice. This gets fed into a two-stage system. First, everything gets flagged for how positive it is (regardless of also being negative) and for how negative it is (regardless of how positive it is). Next, these get combined into the four categories that are displayed. So high positive sentiment and low negative sentiment would be <em>positive</em>, and high positive <em>and</em> high negative would be <em>negative</em>.</p>
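That featurization step might look roughly like this; a deliberately minimal sketch, since the real engine's tokenization is certainly more sophisticated:

```javascript
// Turn a post into word-bigram counts, e.g. "good deal" -> 2.
// A minimal illustration; real tokenization handles much more than \W splits.
function bigramFeatures(text) {
  var words = text.toLowerCase().split(/\W+/).filter(function (w) { return w; });
  var counts = {};
  for (var i = 0; i + 1 < words.length; i++) {
    var key = words[i] + " " + words[i + 1];
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

bigramFeatures("Good deal! A really good deal.")["good deal"]; // 2
```

Each post becomes a sparse map from bigrams to counts, which is what the positive and negative classifiers consume.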
<p>We really wanted a <em>mixed</em> category, because in terms of whether it's a post worth reading, someone who is saying both good and bad things about you is even more interesting than positive or negative. Consider the following three clips:</p>
<ol>
<li>I love my Kinesis Maxim keyboard, it's the best. My wrists feel great since I've been typing on it.</li>
<li>Kinesis is stupid, the Maxim has a stupid layout. I had one for a while but I threw it out.</li>
<li>I like my Kinesis Maxim, but the left <em>alt</em> key is too small and too far to the left.</li>
</ol>
<p>Sure, the first one is what you hope everyone is saying, but reading these doesn't provide much value. The second one at least is an opportunity for damage control, but the third one is the real gold. In a system based on just a range from negative through neutral to positive, the positive and negative would cancel out and this kind of thing would get lumped into the neutral bucket.</p>
<p>This kind of statistical system isn't any good without good data, so we used an approach that gives us lots of good data quickly and cheaply. We sent out thousands of clips to Mechanical Turk, Amazon's "artificial artificial intelligence", where they were scored by ten humans each. The instructions they were given were exactly the definitions I gave above. Those aren't just descriptions of what we think the system produces, those are the starting point. When the results came back, the humans didn't always agree, and some agreed more than others. We threw out the ones who looked like they just didn't understand the problem at all or were clicking randomly, since payment was per item. Of the remaining items, we still got disagreements, so we took the majority: if five people said <em>positive</em>, three said <em>neutral</em> and two said <em>mixed</em>, we'd use that clip as training data for <em>positive</em>. All of our data was real social media data. We evaluated one off-the-shelf solution which was trained on newspaper data, and when it said that "Comcast sucks!" was neutral, we gave up on that idea.</p>
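The majority-vote step can be sketched as follows (tie-breaking and the rater-quality filtering described above are omitted):

```javascript
// Pick the label most raters chose, e.g. 5 positive / 3 neutral / 2 mixed
// yields "positive". Ties and rater filtering are out of scope for this sketch.
function majorityLabel(ratings) {
  var counts = {};
  ratings.forEach(function (r) { counts[r] = (counts[r] || 0) + 1; });
  var best = null;
  for (var label in counts) {
    if (best === null || counts[label] > counts[best]) best = label;
  }
  return best;
}
```

The winning label becomes the training label for the clip, even when a sizable minority disagreed.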
<p>To evaluate our accuracy, we looked at a whole slew of numbers. We used a technique called k-fold cross validation, which means that we'd hold back some of our human-annotated data to use to evaluate how accurate the system is. A big challenge was that most of the content we got was neutral or positive, not mixed or negative. This makes it hard to use simple accuracy as the only metric. That is, if I have 90 items that should be classified as <em>A</em> and 10 items that should be classified as <em>B</em>, I could be 90% accurate by just saying everything was <em>A</em>. So I looked at the accuracy rates for each of the categories separately and tried to balance them. Given my example with 90 <em>A</em> and 10 <em>B</em>, if I could get 90% accuracy, I'd really prefer 81 out of 90 <em>A</em>s classified correctly and 9 out of the 10 <em>B</em>s.</p>
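Per-class accuracy is easy to compute alongside the overall number; a small sketch of the 90/10 illustration (my own helper, not our production code):

```javascript
// Per-class accuracy (recall): of the items whose true label is c, what
// fraction did the classifier get right? This is what "always predict A"
// fails on, even though its overall accuracy looks fine on skewed data.
function perClassAccuracy(truth, predicted) {
  var right = {}, total = {};
  for (var i = 0; i < truth.length; i++) {
    total[truth[i]] = (total[truth[i]] || 0) + 1;
    if (truth[i] === predicted[i]) {
      right[truth[i]] = (right[truth[i]] || 0) + 1;
    }
  }
  var acc = {};
  for (var c in total) acc[c] = (right[c] || 0) / total[c];
  return acc;
}
```

A classifier that labels everything <em>A</em> scores 100% on <em>A</em> and 0% on <em>B</em>, which is exactly the imbalance we tried to avoid.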
<img alt="Sentiment Breakdown Graph" src="/images/sentiment-breakdown-300x229.jpg" width="300" height="229" style="float: right; margin: 0 0 20px 20px;" />
<p>Of course, there's no "make mistakes evenly" button to press, but I think we found a combination that gives useful results. You can see in the attached chart of predicted vs. human-annotated sentiment that the errors are evenly spread across the categories. This illustrates what we mean when we say that sentiment, though it is only correct for about 2/3 of the individual items, is <em>directionally</em> accurate. If the system finds 100 articles for a topic, and says 50 of them are <em>positive</em>, a lot of those will be wrong. Maybe you go through them and you see that 10 of them were <em>neutral</em>, four <em>negative</em> and one <em>mixed</em>. But when you go to the other categories, you'll find that the errors mostly balance out. Some of the <em>neutral</em> should have been <em>positive</em>, and so on. So maybe there should have been 52 <em>positive</em>.</p>
<p>There's a strong temptation when building an automated sentiment system to treat <em>neutral</em> as "I'm not sure". Computers make different kinds of mistakes than humans, and when the computer screws up something a human would have no trouble classifying correctly, it erodes confidence. The problem with this approach is that it focuses too much on not being wrong, and not enough on being right. If uncertain posts are rated as <em>neutral</em>, it changes the whole distribution of content. If you look at a topic and 75% of the content is "neutral", how much is really neutral and how much is swept under the rug because it didn't cross a confidence threshold? We treat <em>neutral</em> as just another category. To classify something that should be <em>positive</em>, <em>negative</em>, or <em>mixed</em> as <em>neutral</em> is just as incorrect as vice-versa.</p>
<p>I hope this has given you some insight into how Biz360's sentiment engine works, and lets you make better sense of the numbers you are seeing, or, if you are still comparing solutions, gives you things to look for and questions to ask. I'll be following this up in the future with another article explaining "entity" or topic-based sentiment.</p> <p>This article was originally posted on <a href="http://blog.biz360.com/2010/03/inside-automated-sentiment-analysis/">Biz360's Blog</a></p>
How to roll back a committed change in SVN (2010-03-09)
<p>A coworker asked me today how to roll back a change that has been committed to SVN. This isn’t obvious, and the top Google searches return irrelevant results. To back out or roll back a change that has already been committed to the Subversion repository, you first merge your commit in reverse, and then you commit. Say that in change 2918, you committed some config files that should not be there. Do this: <pre>% cd config
% svn merge -c -2918 ^/project/trunk/config
% svn ci -m 'revert checkins to config'</pre></p>
<p>This is covered in more detail in <a href='http://svnbook.red-bean.com/nightly/en/svn.branchmerge.basicmerging.html#svn.branchmerge.basicmerging.undo'>Undoing Changes</a> section of the documentation.</p>
Handy way to monitor multiple HBase logs (2010-03-02)
<p>We’ve been running HBase at Biz360 for about six months, but it worked so smoothly at first that I never did much tuning. I’ve recently increased the volume of data we’re storing by about 300%, and have started running into some problems, like blocks going missing. Today I found a good way to monitor all my servers’ logs to get an idea of what’s going on, and it’s so simple I wanted to share.</p>
<pre>
for x in 01 02 03 04 05 06 07 08 09 10 ; do
server=prod-hbase$x
echo === $server ===
ssh $server tail -30f '/app/hbase/logs/*region*.log'
done
</pre>
<p>It does a tail -f on each server’s log in turn. To move to the next server, just hit ctrl-c.</p>
<p>@squarecog suggests <pre>
for server in `cat hbase_servers.txt`; do ...
</pre></p>
You, right there, go call 911 (2010-01-19)
<p>One item that stuck in my mind from a first aid class during boot camp was how to direct someone to call 911.</p>
<p><b>Wrong:</b> "Somebody call 911!"</p>
<p><b>Right:</b> "You, in the black shirt, go call 911."</p>
<p>If something isn't the responsibility of a particular person, it will not get done. Everyone assumes someone else will do it.</p>
<p>The same thing applies to software, and especially maintenance issues. If you say "hey, guys, looks like the app is down", it's going to stay that way. An open ticket not assigned to anyone is only useful if it is implicitly assigned to the product manager to review for the next iteration. If the app is down, the right action is to open a ticket and say, "hey, the app is down, it looks like the database; Joe, that's your area, here, it's your ticket". If Joe determines that the database is down because someone tripped over the power cord, then Joe can assign the ticket to Larry in ops, but every step of the way, some one particular person is responsible for that task.</p>
Hadoop Workflow Tools Survey (tag:kdpeterson.net,2010:/blog//2.124, 2009-11-19T02:25:36-08:00)
<p>Hadoop Map Reduce and HDFS are fairly stable pieces of software. One component that doesn’t have a clear winner yet is higher level job scheduling, also known as workflow scheduling.</p>
<p>To put this in context for someone who isn’t familiar with Hadoop: a single Hadoop job is broken up into many map and reduce tasks. The scheduler runs on the job tracker and assigns tasks to open slots on the task trackers on the worker nodes. When we talk about the scheduler in Hadoop, this is usually what we mean. By default, Hadoop uses a FIFO scheduler, but there are two more advanced schedulers in wide use. The Capacity Scheduler focuses on guaranteeing that each user of a cluster has access to a set number of slots, while the Fair Scheduler focuses on providing good latency for small jobs while long-running large jobs share the same cluster. These schedulers closely parallel processor scheduling, with Hadoop jobs corresponding to processes and the map and reduce tasks corresponding to time slices.</p>
<p>The next level up is workflow scheduling – starting jobs on a cluster in the right order and with dependencies. Sometimes a single map-reduce job is all you need. More frequently, you will have many jobs with dependencies between them. For example, you might want to identify the most important words in each document using <a href='http://en.wikipedia.org/wiki/Tf%E2%80%93idf'>term frequency–inverse document frequency</a>, which requires first calculating the inverse document frequency then making use of that while examining the documents again. In this case, a shell script that runs the first job, waits for it to complete and then starts the second will work.</p>
<p>Once you go down this path, you start running into difficulties. Perhaps job C depends on job A and job B, but it’s fine for A and B to run in parallel. If D depends on B and C, and B and C depend on A, and B fails part way through, how do you recover? It’s not a particularly hard problem, but it’s enough of a problem that we’d like to not reinvent the wheel. After all, while people use Hadoop for different tasks, this workflow scheduling problem is common to everyone.</p>
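<p>To make the ordering half of this problem concrete, here’s a minimal sketch of what a workflow scheduler has to compute. This is plain Java, not any Hadoop API; the class name and job graph are made up for illustration.</p>
<pre class='brush:java'>
import java.util.*;

public class WorkflowOrder {
  /** Returns a run order in which every job appears after all of its
   * dependencies, or throws if the graph has a cycle. */
  public static List<String> runOrder(Map<String, List<String>> deps) {
    List<String> order = new ArrayList<>();
    Set<String> done = new HashSet<>();
    Set<String> visiting = new HashSet<>();
    for (String job : deps.keySet())
      visit(job, deps, done, visiting, order);
    return order;
  }

  private static void visit(String job, Map<String, List<String>> deps,
      Set<String> done, Set<String> visiting, List<String> order) {
    if (done.contains(job)) return;
    if (!visiting.add(job))
      throw new IllegalStateException("dependency cycle at " + job);
    for (String dep : deps.getOrDefault(job, Collections.emptyList()))
      visit(dep, deps, done, visiting, order);
    visiting.remove(job);
    done.add(job);
    order.add(job);
  }

  public static void main(String[] args) {
    // The example from above: B and C depend on A, and D depends on B and C.
    Map<String, List<String>> deps = new HashMap<>();
    deps.put("A", Collections.emptyList());
    deps.put("B", Collections.singletonList("A"));
    deps.put("C", Collections.singletonList("A"));
    deps.put("D", Arrays.asList("B", "C"));
    System.out.println(runOrder(deps));
  }
}
</pre>
<p>The ordering is the easy half; the real work in a scheduler is tracking per-job state so that when B fails part way through, a restart can resume from the failed node instead of rerunning everything.</p>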
<p>I recently sent out a poll to the Hadoop mailing list to see how people are solving this problem. The first question asked what people are using to manage their jobs.</p>
<ul>
<li>Five people are using shell scripts.</li>
<li>Three people are using a homegrown system.</li>
<li>Five people are using a higher level abstraction like Pig, Hive, or Cascading.</li>
<li>One person reported using Oozie.</li>
<li>One person reported using Opswise.</li>
<li>One person is using Amazon Elastic Map Reduce.</li>
</ul>
<p>The next question mostly applied to those using shell scripts or a homegrown system and asked how these systems interacted with Hadoop.</p>
<ul>
<li>Four people are using in-house systems built on top of JobControl.</li>
<li>One person is using an in-house system that uses Job.</li>
<li>One person is directly submitting the job as XML.</li>
</ul>
<p>I also asked whether people were happy with the tools they are using (whether homegrown or off the shelf).</p>
<ul>
<li>Four people are very happy with their system and would recommend it to others.</li>
<li>Five people have some headaches but aren’t actively looking to replace it.</li>
<li>Seven are not happy with their system and would like to replace it.</li>
</ul>
<p>For those who want to look at the raw data (as sparse as it is), I’ve posted it as a <a href='http://spreadsheets.google.com/ccc?key=0AkSgUqAZJOp-dFctWE9QdWhoc1M4TlZSQTNlNXdYVHc&hl=en'>Google spreadsheet</a>. What was most interesting is that of the people using a homegrown system, only one said they were at all happy with it, and none would recommend their system. A majority of those using a higher level abstraction would recommend their system to others. Before running the poll, I worried that I was doing things wrong, that there was some simple, clear solution that everyone was adopting. The opposite was true: for any combination I could come up with, there was someone out there who had actually done it that way.</p>
<p>There’s a continuum from just running a Hadoop job by typing <code>bin/hadoop jar ...</code> or putting it into crontab up through more complicated systems like what we have been using at Biz360 that involve scripts to figure out batch numbers and parameters and then start Java processes that may run multiple jobs using JobControl. The only person using a homegrown system who said it was acceptable is using something based on JobControl. JobControl is included with Hadoop and simply helps to manage dependencies between Jobs. Rather than keeping track on your own that you can start job A and job B, and need to wait for them both to finish before starting C, you can add them both to a JobControl and run it. Dependencies can only be between jobs – you can’t have a task to move files around depend on jobs or a job depend on whether a directory is empty or anything like that. This is client side, so the process that started the JobControl will need to keep a thread running. You can detect errors, and it will stop running jobs when a dependency can’t be satisfied, but there’s no way to recover from errors. If you want to retry jobs, you need to handle that yourself.</p>
<p>Another popular option, and the one that seems to have the most happy users, is a higher level abstraction that runs on top of Hadoop like <a href='http://www.cascading.org/'>Cascading</a>, <a href='http://hadoop.apache.org/pig/'>Pig</a> or <a href='http://hadoop.apache.org/hive/'>Hive</a>. These share many common features.</p>
<ul>
<li>Flow of data expressed using concepts more expressive than Map-Reduce, such as filters and grouping operators.</li>
<li>Job optimizer step to translate the higher level data flow into discrete Hadoop Map-Reduce jobs.</li>
<li>A scheduling engine that can run the entire workflow.</li>
<li>Extension points to insert operations that aren’t covered by the system’s primitives.</li>
</ul>
<p>There are significant differences. Hive uses SQL to express the workflow, Pig has its own language called Pig Latin, while Cascading flows are written in Java or Groovy. User defined functions in Pig are very much an extension to Pig, compared to Cascading, where it’s possible to create a Flow directly from a Hadoop JobConf. Hive specifically targets integration with SQL based tools. All of these, to some extent, insulate the user from the Hadoop concept of jobs, replacing it with something else. For our purposes at Biz360, the ability to plug existing Hadoop jobs unchanged into a Cascade would make Cascading the best choice if we wanted to go this route. Other users may find the simpler query languages of Pig or Hive compelling.</p>
<p>One more specialized tool is <a href='http://code.google.com/p/hamake/'>Hamake</a>, named for and inspired by <code>make</code>. This allows you to express your jobs as dependencies on data so that you can run only those portions of a complex pipeline that are not up to date. I should note that Cascading also has this functionality.</p>
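<p>The make-style rule is simple enough to sketch in a few lines of plain Java (this is illustrative, not Hamake’s actual API): a step runs only when its output is missing or older than some input.</p>
<pre class='brush:java'>
public class Staleness {
  /** True if the output is missing (timestamp 0) or older than any input. */
  public static boolean needsRebuild(long outputTs, long... inputTs) {
    if (outputTs == 0) return true;
    for (long t : inputTs)
      if (t > outputTs) return true;
    return false;
  }

  public static void main(String[] args) {
    System.out.println(needsRebuild(100, 50, 90)); // output newer than all inputs: false
    System.out.println(needsRebuild(100, 150));    // an input is newer: true
  }
}
</pre>
<p>Applied to a pipeline, this means only the portions whose inputs have changed get rerun, which is exactly what you want when one late stage of a long pipeline fails.</p>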
<p>None of these really suit our purposes though. I’d prefer to stick to writing Hadoop map-reduce jobs, not Pig Latin or Cascading Flows. Understanding where my data goes when I have a Hadoop Reducer writing to a SolrOutputFormat that writes to an instance of Solr running on the same node as the Reducer is right at the edge of what I can keep track of. If I introduce another layer of indirection, I would get hopelessly confused.</p>
<p>There are other tools related to scheduling map reduce jobs and assembling them into workflows. Amazon’s <a href='http://aws.amazon.com/elasticmapreduce/'>Elastic Map Reduce</a> includes one, but it is tied to their particular service. There’s Cloudera Desktop, which offers some basic job scheduling, but not yet much for workflow scheduling. I’m not sure what Opswise offers as far as Hadoop scheduling goes; I’d never heard of it before someone mentioned it in the poll.</p>
<p>So I’m going to list what I think are the killer features for a Hadoop workflow scheduling system:</p>
<ul>
<li>Schedule both map reduce jobs and other actions like copying a file from the local filesystem, or testing to ensure that a directory has 60 files in it.</li>
<li>Express a directed acyclic graph of dependencies between jobs and actions. (Loops would be nice, but I don’t need them.)</li>
<li>Full access to set arbitrary input formats, output formats, mapper, reducer, and combiner classes.</li>
<li>Ability to drop into Java code when needed with some sort of “postconfigure class”. I’m thinking of setting up a scanner for HBase TableInputFormat here.</li>
<li>Run as a server-side process. It should be possible for clients to submit entire workflows, and those workflows are then detached from the clients.</li>
<li>Ability to stop and restart a workflow part way through.</li>
<li>Ability to rerun a workflow which had a single part fail.</li>
<li>Ability to persist status between service restarts.</li>
<li>Scheduled jobs.</li>
</ul>
<p>Most of what I’ve listed above is available in <a href='http://issues.apache.org/jira/browse/HADOOP-5303'>Oozie</a>, the Hadoop Workflow System. It allows you to express a workflow as a DAG using XML; it supports Java map-reduce jobs, streaming jobs, and even Pig. You can have nodes that are file system actions or that call outside shell scripts. Everything is persisted in MySQL, and it has a UI showing what stage each running workflow is at. On paper, Oozie looks like the holy grail. The downside is that it isn’t at the level of polish of the other projects I mentioned. I haven’t attempted to set it up because I can’t see that it has been updated for 0.20. The last update to the Jira was in June, and it hasn’t yet been committed, even though it’s just a contrib package.</p>
<p>But never fear, the latest post on Yahoo’s Hadoop blog is about hiring for the Hadoop team, and I notice that they have an opening for “Senior Engineer for Oozie” and say that “the Oozie team is rapidly growing”. At least this means if I set it up and learn to use it, I can trust that it isn’t going to be dying off any time soon.</p>
<p>Even if nothing comes of Oozie, Cloudera is also working in this direction. Cloudera Desktop currently has a job designer that allows you to create and save parameterized single map-reduce jobs, for example, to set all of the options except the input and output directories. It doesn’t yet have any workflow tools, but I’m told they are in the works.</p>
Posner explains CYA security theater (tag:kdpeterson.net,2010:/blog//2.122, 2009-10-22T08:25:53-07:00)
<p>It’s obvious to any rational outside observer that US terrorism policy mostly revolves around making sure people think politicians are “doing something”, regardless of whether something needs to be done, or whether what they’re doing is the right thing. Explaining the work for which Williamson won the Nobel last week, Judge Posner writes: <blockquote>
[FBI criminal-investigation functions] lend themselves to what are called "high-powered" incentives, which are systems of compensation and promotion that are based on objective performance criteria. In the case of criminal investigation these are number of arrests weighted by convictions and sentence. Intelligence work does not lend itself to such performance criteria, because the effect of surveillance and other intelligence activities in preventing terrorism or subversion is usually very difficult to assess. Hence motivation takes the form of creating a "high commitment" environment in which the organization's leaders try to elicit good performance by getting staff to internalize the organization's goals. The problem is that the absence of objective criteria of performance opens the door to "influence activities" by which members of the organization jockey for advancement.<br /><br />
If both types of task are combined in the same organization--those that can be directed by high-powered incentives and those that require high commitment as their motivator, the best employees will tend to gravitate toward the first type of task because they will be confident that they will do well if their performance is judged according to objective criteria. They will be much less certain how well they will do in a job in which influence activities play a large role in determining success.
</blockquote></p>
<p>To summarize the summary, the best and the brightest will be drawn to organizations that have objective measures of success, and even more so, within a given organization, to those types of roles. Those who aren’t very good, and especially the political hacks who shamelessly talk about how the threat level is Orange today, so put on extra sunscreen, will be drawn to roles without objective measures of success, where climbing the career ladder is based on criteria other than doing the job better than the next guy.</p>
<p>Check out <a href='http://www.becker-posner-blog.com/archives/2009/10/the_economics_o_10.html'>the whole article</a>.</p>
Capping simultaneous tasks in Hadoop (tag:kdpeterson.net,2010:/blog//2.121, 2009-10-21T00:45:17-07:00)
<a href='/images/hadoop-pools.png' onclick="window.open('/images/hadoop-pools.png','popup','width=679,height=323,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src='/images/hadoop-pools-thumb.png' height='166' alt='Fair Scheduler Pools Screenshot' width='350' style='float: right; margin: 0 0 20px 20px;' /></a>
<p>We’ve run into several situations in Hadoop where we want to prevent a job from using more than a certain number of slots. Some of our jobs depend on external resources that don’t scale. One task needs to talk to a MySQL database. Another writes to our Solr cluster. For these jobs we know that beyond a certain point more slots don’t make them any faster: with 200 mappers running, it’s no faster than with 50. We moved to the fair scheduler partly to alleviate these concerns; the idea was that if multiple jobs are running at once, they aren’t likely to be the same type of job.</p>
<p>The other day I ran into the problem again and decided to look around to see if anyone had done anything in this direction. The first issue I found was <a href='http://issues.apache.org/jira/browse/HADOOP-5170'>HADOOP-5170</a>, which ended with a consensus that the functionality should be in the scheduler, not part of Map Reduce proper. <a href='http://issues.apache.org/jira/browse/MAPREDUCE-698'>MAPREDUCE-698</a> proposes adding a per-pool simultaneous-task cap to the Fair Scheduler, which is a much better idea than capping at the job level.</p>
<p>If your jobs rely on external services like a database or web service, you can run those jobs in a particular pool. If you have two jobs in this pool, then they will share the cap, and the load on your database remains constant. Also, these tasks can be assigned a set minimum on their pool to ensure that you don’t have the database sitting there idle, and then have half your hadoop cluster sitting idle later when you are waiting for these jobs to finish.</p>
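<p>For reference, here’s roughly what I’d expect the Fair Scheduler allocations file to look like once MAPREDUCE-698 lands. The minMaps/minReduces elements already exist; the maxMaps/maxReduces caps are my reading of the proposal, so treat this as a sketch rather than working configuration.</p>
<pre>
<?xml version="1.0"?>
<allocations>
  <pool name="database-jobs">
    <!-- existing settings: a guaranteed share so the database is never idle -->
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <!-- proposed per-pool cap on simultaneous tasks (MAPREDUCE-698) -->
    <maxMaps>50</maxMaps>
    <maxReduces>25</maxReduces>
  </pool>
</allocations>
</pre>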
<p>If your jobs have very long-running tasks, like when building a Lucene index in a reducer, you may want to avoid having these jobs grab slots during gaps when there are no jobs running. I see this frequently when one job finishes, and in the time before the dependent job starts up, all the slots have been taken by another job. Without preemption, you can end up increasing latency a lot.</p>
Using HBase TableIndexed from Thrift with unique keys (tag:kdpeterson.net,2010:/blog//2.120, 2009-09-30T12:48:55-07:00)
<p>HBase is primarily a sorted distributed hash map, but it does support secondary keys through a contrib package called Transactional HBase. The secondary keys are provided by a component called TableIndexed. A good general walkthrough is <a href='http://rajeev1982.blogspot.com/2009/06/secondary-indexes-in-hbase.html'>Secondary indexes in HBase</a> on Rajeev Sharma’s blog. My blog post specifically addresses how to use secondary indexes outside of the Java API and how to handle unique keys.</p>
<p>Our scenario is this: the articles table uses a row key that starts with a timestamp. This is very common in HBase because it is usually the most natural way to access the information. At least some articles also have a secondary key that I want to be able to look up by. Let’s use a concrete example for further discussion.</p>
<pre>
row key    key:id    content:title
1234       abc       ...
</pre>
<p>These keys are always unique; this is guaranteed at the application level. I would like to be able to fetch an article using the secondary key. All pretty straightforward, except that I want to do this outside of Java using just Thrift, and I would prefer to do it using only get(), because that makes it easier to write and easier for anyone coming along later to read. Scanners may be powerful, but it’s not intuitive that a <em>get</em> using a unique secondary index would use them.</p>
<p>The first thing we need is the index specification that can create the unique key that we want. By default, HBase provides a SimpleIndexKeyGenerator that creates row keys that start with the value of the secondary key, in our example, they would be abc1234. This supports having multiple rows that match the secondary key, for example, abc5678, but means that you have to use a scanner to get at the information.</p>
<p>I’ve written a <a href='http://gist.github.com/187532'>UniqueIndexKeyGenerator</a> that uses the exact value of the primary table’s secondary key as the row key for the index table. That is, in our example, the row key in “articles-index” would be “abc”. I’ve also added a forUniqueIndex static method to IndexSpecification to make it easier to call this from the jruby shell.</p>
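<p>To make the difference between the two key-generation strategies concrete, here’s a sketch in plain Java; the class and method names are mine, not the actual TableIndexed classes.</p>
<pre class='brush:java'>
import java.util.Arrays;

public class IndexKeys {
  /** SimpleIndexKeyGenerator-style: the index row key is the secondary key
   * followed by the primary row key, so several primary rows can share one
   * secondary key value, at the cost of needing a scanner to read them. */
  public static byte[] simpleKey(byte[] secondary, byte[] primaryRow) {
    byte[] out = Arrays.copyOf(secondary, secondary.length + primaryRow.length);
    System.arraycopy(primaryRow, 0, out, secondary.length, primaryRow.length);
    return out;
  }

  /** Unique-key style: the secondary key itself is the index row key, so a
   * plain get() on the index table finds the row. */
  public static byte[] uniqueKey(byte[] secondary) {
    return secondary.clone();
  }

  public static void main(String[] args) {
    byte[] secondary = "abc".getBytes();
    byte[] primaryRow = "1234".getBytes();
    System.out.println(new String(simpleKey(secondary, primaryRow))); // abc1234
    System.out.println(new String(uniqueKey(secondary)));             // abc
  }
}
</pre>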
<pre class='brush:java'>
/**Construct an index spec for a single column that has only unique values.
* @param indexId the name of the index
* @param indexedColumn the column to index
* @return the IndexSpecification
*/
public static IndexSpecification forUniqueIndex(String indexId, byte[] indexedColumn) {
return new IndexSpecification(indexId, new byte[][] { indexedColumn },
null, new UniqueIndexKeyGenerator(indexedColumn));
}
</pre>
<p>I then added a create_index method directly in the shell. The shell is flexible enough that you have access to things like @configuration, which are used by all the other methods. I was also unsure whether I could do an import inside a method, but it worked just fine. The idea is that this could eventually be added to the shell proper; if you called it without the transactional jar in your classpath, it would simply fail with a class-not-found error.</p>
<pre class='brush:ruby'>
def create_index(table_name, index_name, column)
import org.apache.hadoop.hbase.client.tableindexed.IndexedTableAdmin
import org.apache.hadoop.hbase.client.tableindexed.IndexSpecification
@iadmin ||= IndexedTableAdmin.new(@configuration)
spec = IndexSpecification.for_unique_index(index_name, column.to_java_bytes)
@iadmin.addIndex(table_name.to_java_bytes, spec)
end
</pre>
<p>Using this, I can create my tables in the shell easily.</p>
<pre class='brush:ruby'>
create 'a', 'key', 'content'
create_index 'a', 'index', 'key:id'
</pre>
<p>This creates a table “a” with two column families, “key” and “content”. It creates an index called “index” on the “id” column in the “key” column family. Internally, the index is stored as a separate HBase table, named after the base table and the index (here, “a-index”), that is kept in sync with the base table.</p>
<p>To access this over Thrift, it’s now very simple. You can look at <a href='http://gist.github.com/198256'>query.rb</a> but these are the important sections. First, we need to make sure we have Thrift hooked up.</p>
<pre class='brush:ruby'>
require 'rubygems'
require 'hbase'
transport = Thrift::BufferedTransport.new(Thrift::Socket.new('127.0.0.1', 9090))
protocol = Thrift::BinaryProtocol.new(transport)
client = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol)
transport.open
</pre>
<p>The hbase gem requires Thrift, so we don’t need to require it separately. I’ve opened a connection to my own machine on port 9090, the default for the Thrift server. I don’t know the details of the above; it’s just voodoo I picked up on some other site. My main loop lets me query a bunch of secondary keys interactively.</p>
<pre class='brush:ruby'>
STDIN.each do |id|
id.strip!
row_keys = client.get index, id, '__INDEX__:ROW'
row_key_cell = row_keys[0]
if row_key_cell
row_key = row_key_cell.value
puts "Found row key #{row_key}"
value = client.get table, row_key, column
puts "Found item #{value[0].value}"
else
puts "unable to find '#{id}' in index"
end
end
</pre>
<p>This is the important part. First, we read an id; this is the value we are trying to match against key:id in the table. We then get the contents of “__INDEX__:ROW” from the index table for the row whose key matches the id. In my case there’s always only one, but the API doesn’t know that, so it returns a list of cells. Each cell has a column specifier, a timestamp, and, what we care about, a value. That value is the row key of the primary table. If we found anything in the secondary index, we then do a get on the primary table.</p>
<p>This is currently describing my work-in-progress. Once I’ve actually got everything up and running in production under load, I’ll create some patches to submit to HBase. I think adding UniqueIndexKeyGenerator greatly simplifies one common use of TableIndexed, and adding support for TableIndexed to the shell makes it easy to manage schema.</p>
<p>The code is now available in <a href='https://issues.apache.org/jira/browse/HBASE-1885'>HBASE-1885</a>.</p>
Minimal HBase MapReduce Example for 0.19 and 0.20 (tag:kdpeterson.net,2010:/blog//2.119, 2009-09-04T16:02:00-07:00)
<p>HBase includes an example for populating a table from Hadoop map reduce, but it seemed overly complicated. I’m getting started with HBase and this was my starting point. This first one uses the old Hadoop API with everything in the mapred package, not mapreduce. It also uses the corresponding API from HBase which is now deprecated.</p>
<pre class='brush:java'>
public class PopulateArticlesTable extends Configured
    implements Tool {

  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, ImmutableBytesWritable, BatchUpdate> {
    private ImmutableBytesWritable outKey = new ImmutableBytesWritable();

    @Override
    public void map(LongWritable offset, Text input,
        OutputCollector<ImmutableBytesWritable, BatchUpdate> output,
        Reporter report) throws IOException {
      // whatever format your data is in
      RichArticle art = new RichArticle(input.toString());
      // a good HBase row key: a timestamp plus a unique identifier to
      // prevent collisions. All keys are byte arrays.
      byte[] rowId = art.getRowId();
      outKey.set(rowId);
      // We execute one update for each object we encounter; that update may
      // be composed of multiple operations, in this case two puts.
      BatchUpdate update = new BatchUpdate(rowId);
      if (art.getTitle() != null)
        update.put("content:title", Bytes.toBytes(art.getTitle()));
      if (art.getBody() != null)
        update.put("content:body", Bytes.toBytes(art.getBody()));
      output.collect(outKey, update);
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    // Standard boilerplate for creating and running a hadoop job
    JobConf job = new JobConf(getConf(), this.getClass());
    String input = args[0];
    job.setJobName("Populate articles table from " + input);
    // Input is just text files in HDFS
    TextInputFormat.setInputPaths(job, new Path(input));
    job.setMapperClass(Map.class);
    job.setNumReduceTasks(0);
    // Output is to the table output format, and we set the table we want
    job.setOutputFormat(TableOutputFormat.class);
    job.set(TableOutputFormat.OUTPUT_TABLE, "articles");
    JobClient.runJob(job);
    return 0;
  }

  public static void main(String args[]) throws Exception {
    int res = ToolRunner.run(new Configuration(),
        new PopulateArticlesTable(), args);
    System.exit(res);
  }
}
</pre>
<p>Next up I’ve converted everything to use the 0.20 APIs. You’ll see that it got shortened too, as I use GenericOptionsParser directly instead of implementing Tool.</p>
<p>For the Hadoop changes, OutputCollector is no more; it has been replaced by Context. The Mapper interface and MapReduceBase have been merged into the Mapper class, which is intended to be extended (without overriding anything, Mapper is the identity mapper). In job control, you no longer use JobClient.runJob, instead calling waitForCompletion on the Job. The configuration handling has been cleaned up, as you’ll notice I have to create a Configuration prior to the Job. One big item is that you need to call setJarByClass manually, which was previously taken care of by creating the JobConf with the class as a parameter. And setOutputFormat has been renamed setOutputFormatClass.</p>
<p>I’m new to HBase, so I’m not as sure about whether I’ve done things in the recommended way. The important things to note are that you need to set the table name in the conf before creating the job, and Puts and Deletes are Hadoop Writables.</p>
<pre class='brush:java'>
public class PopulateArticlesTable {

  public static class Map extends
      Mapper<LongWritable, Text, NullWritable, Writable> {
    @Override
    protected void map(LongWritable offset, Text input, Context context)
        throws IOException, InterruptedException {
      // my input is in JSON format; in other applications you might be
      // splitting a line of text or reading any of Hadoop's writable formats
      RichArticle art = new RichArticle(input.toString());
      // RichArticles can produce a good HBase row key: a timestamp plus a
      // unique identifier to prevent collisions. All keys in HBase are byte arrays.
      byte[] rowId = art.getRowId();
      // We output multiple operations for each row
      if (art.getTitle() != null) {
        Put put = new Put(rowId);
        put.add(Bytes.toBytes("content"), Bytes.toBytes("title"),
            Bytes.toBytes(art.getTitle()));
        context.write(NullWritable.get(), put);
      }
      if (art.getBody() != null) {
        Put put = new Put(rowId);
        put.add(Bytes.toBytes("content"), Bytes.toBytes("body"),
            Bytes.toBytes(art.getBody()));
        context.write(NullWritable.get(), put);
      }
    }
  }

  public static void main(String args[]) throws Exception {
    Configuration conf = new Configuration();
    conf.set(TableOutputFormat.OUTPUT_TABLE, "articles");
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    String input = otherArgs[0];
    Job job = new Job(conf, "Populate Articles Table with " + input);
    // Input is just text files in HDFS
    FileInputFormat.addInputPath(job, new Path(input));
    job.setJarByClass(PopulateArticlesTable.class);
    job.setMapperClass(Map.class);
    job.setNumReduceTasks(0);
    // Output is to the table output format, and we set the table we want
    job.setOutputFormatClass(TableOutputFormat.class);
    job.waitForCompletion(true);
  }
}
</pre>
<p>Hopefully the before and after for the APIs is helpful. These have both been tested and work on my system. You’ll need to modify them slightly, of course, unless you happen to have a data object called RichArticle that has a string serialization.</p>
<p>Update: I should probably point you at the <a href='http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html'>official documentation</a> as well.</p>
Scala Actors at SDForum (tag:kdpeterson.net,2010:/blog//2.118, 2009-06-24T20:03:49-07:00)
<p>Tonight I’m attending a presentation on actors and actors in Scala, hosted by the SDForum Software Architecture SIG. Upcoming meeting: July 21 (3rd Tuesday), Vijay Patel of LinkedIn talking about analytics. Speakers: Carl Hewitt, Stanford; Robey Pointer, Twitter; Frank Sommers, Artima; Bill Venners moderating. Order is abstract to concrete.</p>
<p>I’ll post this as-is and come back and edit it later.</p>
<h2 id='carl_hewitt_stanford_inventor_of_actor_paradigm'>Carl Hewitt, Stanford, inventor of actor paradigm</h2>
<p>Carl Hewitt: back in the day in 1972 we programmed Smalltalk with a magnetized needle and a steady hand, uphill in the snow both ways.</p>
<p>Three things: send more messages; create new addresses; decide what state to take for the next message. Petri nets as a model suffer from being physically impossible; the three-way model has the advantage of being possible. The implications of the actor model break representation as a Turing machine or in the lambda calculus.</p>
<p>Cloud: it’s clients, all the way up. (Title for client-server interaction on the cloud: the fog rolls in.) John McCarthy defines Lisp in terms of Lisp. “How many have seen ‘eval’?” Two hands. Review of that lecture from 61A. Instead of eval the function, eval as a message: if I’m X and I get an eval message with an environment. This is the best way to define concurrent programming languages currently. Bah, say mathematicians, that’s circular! Well, too bad; concurrency doesn’t fit math.</p>
<p>Define theoretical PL ActorScript. XML and JSON instantiations. No assignments, but not functional. Actors get replaced by their next version.</p>
<p>Tension between well-targeted ads and user desire for privacy. But government can’t mine your data fast enough. Spooks will need to reside inside datacenters. Try to move behavioral targeting to the client, store encrypted data in the cloud. July 23rd symposium on semantic integration at Stanford. Info at http://carlhewitt.info</p>
<p>Invented the one-minute lecture. Advertisers can deliver a coherent lecture in 30 seconds. (Fantastic idea – I should do this.) Lots of questions following his lecture. This guy just drops the bomb in terms of many ideas at once. Stalinist theory of computation. Lots of parallelism down the tree. Company model of computation: different departments, all talking to each other without having to go through the CEO. That’s concurrency. Map Reduce does the parallelism but doesn’t do the concurrency.</p>
<h2 id='frank_sommers__actors_in_scala'>Frank Sommers - Actors in Scala</h2>
<p>Why Scala makes actors natural. An example. Scaling actors. The future.</p>
<p>Immutability, OO + functional, pattern matching, easy DSL, JVM. Mainstream language that lends itself to Actors.</p>
<p>Walks through the use of Scala actors with a chat program. When a user joins, the receiver of the subscribe message creates a new actor to handle updating that user. Illustrates the shorthand syntax for actors. Sync, async, and futures messages. Gets derailed by audience questions that are too detailed.</p>
<p>Scala actors support working in a distributed fashion, same syntax as within a single VM. Need to import RemoteActor._, listen on a port. Sending actor needs to know the “node”, tuple of address, port number, and symbol. Thread-per-actor and Event-driven actor implementations. Example used thread-per-actor, but better scalability with event driven actors, execute actors on a thread pool. Wait for messages without consuming a thread. Can scale to millions of actors on a single JVM using this system. Able to schedule actor sending message to another actor within the same thread, effectively performance of subroutine call.</p>
<p>Missed a bit, I think he’s talking about how react cannot return conventionally. But now the time is displayed in my emacs status bar, so all is good. Now I just need to display my battery status.</p>
<p>Will be getting continuations in the future. Pluggable schedulers, better actor isolation using a compiler plugin, static checking, integrated exceptions, actor migration (to a different node, I assume). Tension over whether actors should be more complicated, or whether the actors library should remain very basic. Also the question of a single actors library versus multiple actors libraries; e.g., Lift uses a simpler library than Scala actors. Partially pragmatic concerns versus a more pure approach.</p>
<p>Still lots of theoretical problems, but quite usable for any actual scenario.</p>
<h2 id='robey_pointer__twitter'>Robey Pointer – Twitter</h2>
<p>Talk is titled “solving problems with actors”. Got started with Actors writing a chat proxy for cell phones. Long lived connections, lots of connections, mostly idle. First attempt was with one thread per session. Very simple, but didn’t scale. Went with thread pools and async IO. More scalable but harder to read. Fatal flaw: blocking on other services (http). Fix all APIs to be async using hideous callbacks. If it doesn’t fit on a slide, it’s not good code.</p>
<p>Actors: each session is an actor. Events are just messages. Can seek ahead for specific events. Works well with java.nio and apache mina. Mina wraps nio as events; his naggati library translates this into scala messages.</p>
<p>Kestrel. Message queue. Memcache protocol as a Mina plugin. Scales horizontally, no awareness of each other. Stats on one server: 1 month uptime, 2.4 TB written, 4 billion gets, 1.6 billion sets.</p>
<p>Actors just one of many tools. Used synchronized for some features. What didn’t work: each queue is an actor. Move to queues using synchronized data. Need to read this code and study it.</p>
<p>Actors are still a little shaky. Actor lifetime issues. Mixing threads with actors makes it hard to GC.</p>
<p>Lots of exciting stuff.</p>
<h1>Make Scala lists work for you (2009-06-17)</h1>
<p><em>(Well dammit, I meant to save a draft, and here I’ve accidentally published it and FriendFeed spreads it around like a gossipy neighbor, so I guess I’d better finish it.)</em></p>
<p>I’ve been working on a collaborative filtering system based on genetic algorithms for <strike>message boards</strike>, <strike>running shoe recommendations</strike>, the Netflix Prize for a while now. The latest iteration is a mix of Java and Scala. I sat down to clean up some of the code tonight, and wanted to rewrite a function that made use of dot product.</p>
<p>Scala, like pretty much every modern programming language, has a REPL, a.k.a. an interpreter, making it really easy to work the kinks out of something before getting ant, junit, or an IDE involved. But when I go to paste the methods I need to work with into the REPL, curses, foiled again! That piece of my app is in Java, so I’ll have to rewrite it in Scala.</p>
<p>What a wonderful opportunity to talk about how Lists will make all of your wildest dreams come true. Let’s see the Java version of unit vector:</p>
<pre class='brush:java'>public static double[] unit(double[] vector) {
    double[] unit = new double[vector.length];
    double norm = 0.0;
    for(double v : vector) {
        norm += v * v;
    }
    norm = Math.sqrt(norm);
    for(int i = 0; i < vector.length; i++) {
        unit[i] = vector[i] / norm;
    }
    return unit;
}</pre>
<p>Wow that’s a lot of typing for such a simple concept. How would I express “divide each component by the magnitude of the vector” in scala? Well, pretty much exactly like that:</p>
<pre class='brush:scala'>def unit(v : List[Double]) = {
    val sum = Math.sqrt((0.0 /: v.map(x => x * x)) {_ + _})
    v.map(x => x / sum)
}</pre>
<p>Let’s walk through this one. The innermost <code>v.map(x => x * x)</code> maps the vector to its squares. The <code>/:</code> does a fold left starting with 0.0, and applying <code>{ _ + _ }</code>. Note that <code>/:</code> is a method call on the list returned by the <code>v.map...</code>. This gives the sum of squares; we take the square root to get the magnitude. Our last operation is another map. The tricky part here is that the precedence of the <code>/:</code> operator means that you do need the parentheses.</p>
<p>Now I have unit vectors, and I need the dot product that operates on unit vectors. If your math is hazy, dot product of unit vectors [a, b, c] and [x, y, z] is a * x + b * y + c * z. Here it is in Java:</p>
<pre class='brush:java'>public static double dot(double[] a, double[] b) {
    double result = 0.0;
    for(int i = 0; i < a.length; i++) {
        result += a[i] * b[i];
    }
    return result;
}</pre>
<p>Pretty straightforward, but it still imposes on you the mental effort of deciding a name for that variable, and I really hate that. Bottom line? Scala lets me avoid thinking up names for variables:</p>
<pre class='brush:scala'>def dot(a : List[Double], b : List[Double]) = {
    ((a zip b).map{case(x,y) => x * y} :\ 0.0) { _ + _ }
}</pre>
<p>We’ll walk through this one. I love <code>zip</code>. I like the Java 5 for-each loop, but it doesn’t do me any good when iterating over two lists in parallel. With Scala Lists, you can zip them together and treat them as a single list.</p>
<pre class='brush:scala'>scala> List(1, 2, 3) zip List('a, 'b, 'c)
res25: List[(Int, Symbol)] = List((1,'a), (2,'b), (3,'c))</pre>
<p>When I use map, I use pattern matching to get my two items back out. Map expects a function taking a single parameter, and gives it whatever is in the list, which in this case are actually <code>Tuple2</code> objects. Pattern matching allows me to break that tuple apart. I sum the list up, this time using a fold right, which means I need to change the order of the parameters.</p>
<p>So now I’m able to play with things in the REPL, get my replacement code working, paste it back into my project, and run the junit test to verify that my improvements didn’t break anything. At 11pm, between putting the baby to bed and going to bed myself, I don’t have much mental energy to hold huge methods in my head. Scala means I don’t need to.</p>
<p>The downside is that when I go back to work in the morning, I sit there staring at the screen wondering why Eclipse isn’t formatting my Java right and is displaying little red errors, until I realize that I actually need semi-colons and return statements in Java.</p>
<h1>Pack multiple small objects in S3 for cost savings (2009-06-12)</h1>
<p>Amazon Web Services offers two forms of data storage. First is S3, a key-value store allowing very large files if needed, but with a pricing model that will cause problems for small files. Second is SimpleDB, a schema-less or column-oriented database, which allows storing many small pieces of data, but with limitations that prevent it scaling to the point that S3 becomes economical. In this post I describe a system I built to leverage SimpleDB to reduce the costs of storing small files in S3.</p>
<h2 id='context_biz360_community_insights'>Context: Biz360 Community Insights</h2>
<p><a href='http://www.biz360.com'>Biz360</a> Community Insights is a social media monitoring and measurement system. We consume various types of social media – blogs, forums, microblogs – perform analysis on each item, index it, store it, and present it to the user. We process tens of millions of items every day, ranging from under 1k for a tweet with metadata and analysis results, up to blog posts that can go above 100k. These items are indexed in Solr, but we don’t store the full text in the index for size reasons.</p>
<h2 id='the_problem'>The Problem</h2>
<p>The problem is where to store the complete article with metadata when we are done with it. These average around a few kb. The initial solution was to store them in S3, but our first month’s bill was thousands of dollars. Not for the storage: it was a small amount of data. Not for the data transfer: everything from EC2 is free. The cost was the $0.01 per 1000 PUT requests. We figured there had to be a way to bring this cost down.</p>
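<p>To see why the PUTs dominate, here is a rough back-of-the-envelope. The 30 million items per day below is an assumed round number consistent with “tens of millions”; the $0.01 per 1,000 PUTs is the price quoted above.</p>

```java
// Rough monthly S3 PUT cost at $0.01 per 1,000 requests.
// itemsPerObject = 1 is the naive one-object-per-item scheme;
// packing several items per object divides the PUT count.
class PutCost {
    static double monthlyPutCost(long itemsPerDay, int itemsPerObject) {
        double putsPerMonth = itemsPerDay * 30.0 / itemsPerObject;
        return putsPerMonth / 1000.0 * 0.01;
    }
}
```

<p>At 30 million items a day, that is roughly $9,000 a month for one PUT per item – and a tenth of that if you pack ten items into each S3 object, which is exactly the lever this post pulls.</p>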
<p>About the same time, we were having trouble with a choke point in our application when we did a large map-reduce job to reconcile duplicate articles, which could be sped up with a persistent lookup table of some kind. So we needed to stop storing all our items in S3, and we needed a secondary index on at least some of our items. The first thing to come to mind was SimpleDB. We were already in AWS, it didn’t have any operations headaches, and although the storage cost per GB was higher, we had now seen that the storage cost wasn’t the most significant factor.</p>
<p>Among the other options we considered was running our own non-relational database with the top contenders being <a href='http://couchdb.apache.org/'>CouchDB</a> and the schema-less MySQL used by <a href='http://bret.appspot.com/entry/how-friendfeed-uses-mysql'>FriendFeed</a>. CouchDB didn’t seem all that mature, and we had concerns about the volume of data we could manage in MySQL and the number of machines we would need.</p>
<p>SimpleDB turned out to have significant limitations. First, it limits the size per attribute to 1024 bytes. If we wanted to actually store our data in SimpleDB, we would have needed to do some complicated system splitting the large items into 1kb blocks and store them in several attributes. Secondly, it has a limit of 10GB per domain. We expected to store orders of magnitude more than this.</p>
<h2 id='the_solution'>The Solution</h2>
<p>The general solution is that each S3 object stores multiple items, and two sets of SimpleDB domains provide indexes to those items.</p>
<p>All of my objects have a unique item key. In the first set of domains, this item key will be the “itemName” for SimpleDB. Some of my items have a secondary key, which I’ll call dupeId for this article. In my secondary SimpleDB set, this dupeId will be the itemName, and the item key will be one of the stored attributes.</p>
<p>Items are distributed to domains based on their key. I pick a number of domains that I will partition across. Here you want to estimate the total volume of data you will be storing in SimpleDB (in my case about 200 bytes per item). Choose a number of domains so that each partition will store no more than 1GB when you get to maximum capacity. SimpleDB is actually limited to 10GB per domain, but <strike>I've been told by
Amazon that above 1GB performance starts to degrade. Also,</strike> you will not be able to change the number of partitions after the fact, so this gives you sufficient headroom when you realize that you can’t actually delete items. In my case, I’m using 30 domains for my main set of domains, and 10 domains for my secondary key (which I don’t need to retain for as long). You can have up to 100 domains without having to contact them. They’ve been happy to increase our instance limit in the past, so if you need more, just ask.</p>
<p>To store data, I need to store three things:</p>
<ol>
<li>The full item in S3, but remember this is what I want to minimize.</li>
<li>The index from the item key to the S3 location in the primary SimpleDB.</li>
<li>The secondary index from the dupeId to the item key in the secondary SimpleDB.</li>
</ol>
<p>I will mostly ignore the third step, as it is straightforward and there is nothing interesting about it.</p>
<h2 id='implementation'>Implementation</h2>
<p>I’ve defined two services, DomainSet and DomainSetS3. My access to SimpleDB is via <a href='http://code.google.com/p/typica/'>Typica</a>. A domain set is the SimpleDB portion; a domain set with S3 uses a domain set, a buffer for each partition, and an S3 service. I use <a href='http://jets3t.s3.amazonaws.com/index.html'>JetS3t</a> for S3 access.</p>
<p>A DomainSet contains an array of SimpleDB domains, and I index into that array using the hash of the item key modulo the number of domains. In my case, since we will also be using the system from our Rails front end, I defined my own hash function that I can keep consistent across both languages (though I could have just duplicated Java’s hashCode in Ruby). The public interface consists of only two methods:</p>
<pre class='brush:java'>public interface DomainSet {
    public void store(SimpleDbItem item);
    public SimpleDbItem find(String itemName);
}</pre>
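<p>The key-to-domain mapping described above might be sketched like this. The names are illustrative, not the production code; the essential property is that the function is trivial to port byte-for-byte to Ruby and never changes once data has been written.</p>

```java
// Sketch of key -> domain partitioning: a stable string hash taken
// modulo the number of domains. The polynomial hash below uses the same
// recurrence as java.lang.String.hashCode, written out explicitly so it
// can be duplicated in another language.
class DomainPartitioner {
    private final int numDomains;

    DomainPartitioner(int numDomains) { this.numDomains = numDomains; }

    int partition(String itemKey) {
        int h = 0;
        for (int i = 0; i < itemKey.length(); i++) {
            h = 31 * h + itemKey.charAt(i);
        }
        return Math.floorMod(h, numDomains);  // keep the result non-negative
    }

    // e.g. "items" + partition -> "items-17"
    String domainName(String prefix, String itemKey) {
        return prefix + "-" + partition(itemKey);
    }
}
```

<p>Because the partition count is baked into every stored item’s location, it has to be chosen up front – which is exactly why the post recommends sizing for roughly 1GB per domain at maximum capacity.</p>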
<p>SimpleDbItem is an object that contains the item name and a list of attributes. Under the covers, this gets translated to and from the Typica objects, but doesn’t expose the complexity of the SimpleDB API. Since I <a href='http://www.google.com/search?q=prefer+composition+to+inheritance'>prefer composition to inheritance</a>, DomainSetS3 uses this interface, but does not extend from it. This does add a few lines of code (I have to store numDomains and maxFailures and use both for instantiating my DomainSet), but it avoids having a meaningless single-argument <code>store(SimpleDbItem)</code> method. The interface is:</p>
<pre class='brush:java'>public interface DomainSetS3<T> {
    public void store(SimpleDbItem metadata, T contents);
    public void flush();
    public SimpleDbItem find(String itemName);
    public T loadContents(SimpleDbItem item);
}</pre>
<p>In this case, I’ve let the abstraction leak a bit – the only way to load the contents of an item is to find it in SimpleDB, and then do a separate request to load it. That’s what it’s doing under the covers, of course, and if I combined both of these into one method (perhaps returning a SimpleDbItemWithData<T>), later down the line I’d have to provide the simple find for performance reasons. When you want to store an object, you would create a SimpleDbItem with the metadata (the key and date are the only values we currently use), and store it. The DomainSetS3 would add these to the buffer for whatever partition the item key hashes to until it gets to a predetermined max buffer size. Depending on your data and access patterns, 5-20 items per S3 object is about right. When the buffer is full, I create a HashMap<String, T> and serialize the HashMap into JSON. I create a UUID and store this JSON object in S3 under that id. I then write my metadata with the UUID as an additional attribute “s3loc” to SimpleDB.</p>
<p>To retrieve the data from S3, I need the metadata from SimpleDB. I fetch the object in s3loc from S3, then deserialize it back to Java. This places limitations on what kind of objects you can store, but it’s not too painful with <a href='http://jackson.codehaus.org/'>Jackson</a> – the only real restriction is that you know ahead of time what types of objects you will be deserializing and that they not have any fields which aren’t concrete types. What I get back is the HashMap of String -> T, but my key here is the same as the item key, which of course I have.</p>
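<p>The store-and-load path above can be reduced to a toy in-memory sketch. The two HashMaps stand in for S3 and SimpleDB, and the per-partition buffering is collapsed to a single buffer; the real system uses JetS3t, Typica, and Jackson, and none of these names come from the production code.</p>

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Toy sketch of pack-and-flush: buffer items, and when the buffer fills,
// write the whole batch to one "S3 object" under a fresh UUID, recording
// that UUID (the "s3loc" attribute) against each item key.
class PackedStore {
    private final int maxBuffer;
    private final Map<String, String> buffer = new LinkedHashMap<>();
    final Map<String, Map<String, String>> fakeS3 = new HashMap<>();  // uuid -> batch
    final Map<String, String> fakeIndex = new HashMap<>();            // itemKey -> uuid

    PackedStore(int maxBuffer) { this.maxBuffer = maxBuffer; }

    void store(String itemKey, String contents) {
        buffer.put(itemKey, contents);
        if (buffer.size() >= maxBuffer) flush();
    }

    // The real code serializes the batch map to JSON with Jackson before
    // the single S3 PUT; here we just copy the map.
    void flush() {
        if (buffer.isEmpty()) return;
        String uuid = UUID.randomUUID().toString();
        fakeS3.put(uuid, new HashMap<>(buffer));
        for (String key : buffer.keySet()) fakeIndex.put(key, uuid);
        buffer.clear();
    }

    // One index lookup, then one object fetch, then pick our key out.
    String load(String itemKey) {
        String uuid = fakeIndex.get(itemKey);
        return uuid == null ? null : fakeS3.get(uuid).get(itemKey);
    }
}
```

<p>Note that re-storing a key simply overwrites its index entry with a new object’s UUID – the old copy is stranded in the previous batch, which is the same last-write-wins update behavior described below.</p>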
<p>If I need to do updates, I will follow the same procedure as above, which results in leaving the original item in the S3 object, but overwriting the s3loc with the location of the updated item. We’re wasting storage space, but we will always see the correct version. Without some form of transactions or conditional writes it would be impossible to guarantee that you can do updates with this scheme. It’s important that anyone building a system on top of AWS understand the consistency guarantees. Again, this problem (and lots more!) would go away if Amazon exposed the “semantic reconciliation” step in the Dynamo system underlying S3. Of course, this step still exists in S3, but for simplicity the semantics are hard coded to “last write wins”.</p>
<h2 id='your_mileage_may_vary'>Your mileage may vary</h2>
<p>This system doesn’t solve everyone’s problem. If you intend to allow outside access to your data via public S3 buckets or Cloudfront, this won’t work. Additionally, if the data you are storing is too big to put into SimpleDB, it may be big enough that repeated downloads will outweigh the cost savings on the PUT side.</p>
<p>If you are doing a large number of updates, and you are storing data for a long time, you may find that the storage costs start to be significant. I suppose if you were storing too much obsolete data, you could iterate through old items, repack these into new S3 objects, and update SimpleDB, but at this point, your SimpleDB costs will outweigh your S3 savings.</p>
<h2 id='a_missed_opportunity'>A missed opportunity?</h2>
<p>This was an interesting technical problem, finding the best solution with the pieces at hand, but a voice from my econ classes is whispering in my head “arbitrage”, meaning that Amazon actually has a perverse pricing system that charges me less money to use more resources. If I can store items in S3 by laying another system on top of it, Amazon would certainly be able to do the same, do so more cheaply, and without the inconsistent interface that you need to use if you adopt my system. Unless S3 has significantly different replication and reliability characteristics than SimpleDB, I suspect that they are overcharging for small PUTs, and this is keeping people from using the service to the fullest extent.</p>
<p>I’ve heard from <a href='http://biodivertido.blogspot.com/2008/06/hadoop-on-amazon-ec2-to-generate.html'>Tim Robertson</a> that he ran into the same problem with small files and scaled back what he was storing in S3 to save on costs. <a href='http://daily.hotpads.com/hotpads_daily/2009/06/hotpads-on-aws.html'>HotPads</a> has also done the calculation and this was a factor in their reluctance to recommend others adopt Cloudfront as they did.</p>
<h2 id='simpledb_a_work_in_progress'>SimpleDB: a work in progress</h2>
<p>SimpleDB is a fantastic system, but it’s not quite done yet. Amazon labels it a beta, and it’s certainly good enough to use in production, but there are lots of desirable features that aren’t yet there. As I write this, I have a job that has been running for a day just going through and deleting old items from my SimpleDB domains – bulk delete or delete by query aren’t supported yet.</p>
<p>As I learn more about other options, I’m having trouble justifying the hassle of dealing with things like lack of bulk delete, not being able to store large values in the table, and the size limits. HBase is becoming more mature and is now being used to directly serve content to the web. SimpleDB is likely slightly better value if you are pinching pennies (for us, SimpleDB is less than 10% of our Hadoop cluster costs), but if I were doing this again, and I were already using Hadoop, I would prefer HBase. If you have a smaller system, and don’t want to deal with administration headaches, SimpleDB can still be a good choice.</p>
<h1>Vote No on everything today (2009-05-18)</h1>
<p>I've previously written that <a href="/blog/2009/04/ca-state-lp-misguided-on-1d-and-1e.html">I don't like the CA LP's reasoning</a> on some of tomorrow's ballot measures. I still think their arguments are only appealing to hard line libertarians, but I've been persuaded by other arguments to vote against everything tomorrow, and I encourage you to do the same.</p>
<p>I was marginally in favor of the rainy day fund aspects of Prop 1A, but as you can see from the links in the comments on my earlier post, this is a fraud, a repeat of the equally pointless prop 58 that was supposed to solve all our problems only a few years ago.</p>
<p>I argued in favor of 1D and 1E in my earlier post, because they could, in theory, have helped prevent tax hikes. But <a href="http://reason.org/studies/show/1007549.html">the Reason Foundation's analysis</a> has another take on it that I found much more persuasive. Even assuming 1D and 1E can offer short term relief (as I did), they are dangerous because they offer an underhanded technique to grow state spending. First, the taxpayers are duped into supporting an initiative to provide mental health services, or pay for idiotic anti-smoking commercials. Next, once California's public employee unions induce another budget crisis, you cut the programs and transfer the money into the general fund.</p>
<p>I don't think the original supporters of the initiatives that established these programs intended it to be used in this manner, but it would set a dangerous precedent.</p>
<p>I'd even like to see 1F, the "no raises while the budget isn't balanced" measure fail. It would be good to send a message that the con games are over.</p>
<h1>Who are these "social media experts"? (2009-05-13)</h1>
I'm following twitter closely tonight to see how the <a href="http://snurl.com/idolfinalists">American Idol prediction</a> got picked up. And what I don't understand is who are all these people who are retweeting it? They don't seem like real accounts. Or at least, not accounts that anyone who actually read twitter would be interested in following. If you search twitter for <a href="http://search.twitter.com/search?q=Biz360+Analysis+of+Social+Media">Biz360 Analysis of Social Media</a> you see hundreds of tweets, all identical. What's even more interesting is if you click through to the people sending these out. I've clicked into about 10, and without exception, they follow this pattern:<br /><br /><ul><li>Have "social media expert" or something similar in their bio.</li><li>Have fewer followers than following</li><li>Are following a huge number of people</li><li>Send out tons of tweets</li><li>Have nothing of value to say</li></ul>Let's take a look at <a href="http://twitter.com/SocialMediaWonk">@SocialMediaWonk</a> and his tweets. As of right now, he is following almost 7000 people. He has over 6000 followers. His first page of tweets starts this morning, so he's sending out over 20 per day. Every single one of his last 20 updates was posting a link similar to ours. No original content, no commentary.<br /><br />So even though a search for "biz360" makes it look like we're all twitter is talking about, when I do a search for american idol, I need to go to page 7 before I find a mention of us. With SEO gaming, I understand the motivation. But what are these people trying to accomplish?<br />
<h1>Configuring Nagios check_http on Ubuntu (2009-05-08)</h1>
<p>I'm writing this post as I do it. My starting point is a fresh install of Nagios 2 running on an ubuntu server in EC2. My goal is to get it monitoring some web pages. I've found poor documentation, so I'm writing this out as I go.</p>
<p>First, you must define the command you want to use (I think). Here's my definition:</p>
<pre>define command {
    command_name    check_http_port
    command_line    $USER1$/check_http -h $ARG1$ -p $ARG2$
}</pre>
<p>This goes in <tt>/etc/nagios2/commands.cfg</tt>. Next, you have to create a file in <tt>/etc/nagios2/conf.d</tt> for the host you want monitored (or the group of hosts, or just a local.cfg in that directory). In this file you need to define your hosts, and any services you want monitored. Here I use the check_http_port command defined above to ping our frontend machine. Notice the odd syntax for providing parameters to the command.</p>
<pre>define host{
    use        generic-host
    host_name  mi-prod-web01
    alias      mi-prod-web01
}
define service{
    use                  generic-service
    host_name            mi-prod-web01
    service_description  Login Page
    check_command        check_http_port!mi-prod-app01!80
}</pre>
<p>Lastly, you probably want to add the host to a host group in <tt>conf.d</tt>:</p>
<pre>define hostgroup {
    hostgroup_name  http-servers
    alias           HTTP servers
    members         localhost,mi-prod-web01
}</pre>
<h1>Config enum pattern (2009-05-04)</h1>
<p>This is something I stumbled across while browsing the source for <a href="http://hadoop.apache.org/hive/">Hive</a> (a Hadoop subproject).</p>
<p>I normally define my properties and default values using something like this:</p>
<pre class="brush:java">private static final String CLASSIFIER_MODEL_LOC_PROP = "models.classifier";
private static final String CLASSIFIER_MODEL_LOC_DEFAULT = "config/classifier.ser";
</pre>
<p>It works well enough, but it's a little long, and it invites repetition -- maybe in one place in the code, I read the value with a different default, and now there are two defaults to keep in sync. Here's what someone did in Hive:</p>
<pre class="brush:java">public static enum ConfVars {
    // QL execution stuff
    SCRIPTWRAPPER("hive.exec.script.wrapper", null),
    PLAN("hive.exec.plan", null),
    SCRATCHDIR("hive.exec.scratchdir", "/tmp/"+System.getProperty("user.name")+"/hive"),
    SUBMITVIACHILD("hive.exec.submitviachild", false),
    SCRIPTERRORLIMIT("hive.exec.script.maxerrsize", 100000),
    ...
}
</pre>
<p>I generally like enums. Maybe because at a previous company someone thought objects were too hard and everything should be stored in hashmaps, and if you needed a value, you needed to know the name of it, and god help you if you needed to search for all uses. So I think I will use this next time I do anything dealing with configuration.</p>
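<p>Adapted to my own properties from the first snippet, the pattern might look something like this. The <code>get</code> method and the Properties plumbing are my own sketch, not Hive's code.</p>

```java
// Each enum constant carries its property name and its default value,
// so there is exactly one place to change either.
enum ConfVars {
    CLASSIFIER_MODEL("models.classifier", "config/classifier.ser");

    final String varname;
    final String defaultVal;

    ConfVars(String varname, String defaultVal) {
        this.varname = varname;
        this.defaultVal = defaultVal;
    }

    // Look the property up, falling back to the single canonical default.
    String get(java.util.Properties props) {
        return props.getProperty(varname, defaultVal);
    }
}
```

<p>A caller writes <code>ConfVars.CLASSIFIER_MODEL.get(props)</code>, and a search for all uses of the setting is just a search for the enum constant.</p>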
<h1>CA State LP Misguided on 1D and 1E (2009-04-30)</h1>
The Libertarian Party of California has taken a position I think is misguided on the <a href="http://ca.lp.org/pr20090407.shtml">May 19th ballot measures</a>. They have opposed props 1D and 1E, but these are probably a good compromise to improve California's budget situation.<br /><br />What 1D and 1E do is to move funding from two specially financed programs into the general fund. 1D would take tobacco tax money currently used for anti-tobacco initiatives (i.e. patronizing tv ads) and redirect it to the general fund to pay for what are presumably more useful programs. 1E would do the same with special mental health funds currently paid by a tax on the wealthiest Californians.<br /><br />The only argument given against these is that it doesn't cut overall government spending. Is this the only measure of a law? We don't have the option on the ballot to cut taxes. It's not one of the questions being asked. Our options in this election are to spend the money on anti-smoking ads and expanded government provided mental health services, or use it to pay for other more useful services through the general fund.<br /><br />For these reasons, I will be voting yes on prop 1D and 1E, and encourage you to do the same.<br /><br />On the other measures, it's important to spread the message that while 1A is a tax increase, it also includes changes to how budgeting is done, which could reduce the hysteria in the future. Prop 1B on the other hand, is a complete gutting of the only beneficial effects of prop 1A. A reasonable person could conclude that prop 1A, though it is a tax increase, is a better option than what may come out of Sacramento if we force them to go back to the drawing board without approving it. 
I am undecided how I will vote on 1A at this time.<br /><br />1C is a foolish borrowing against future lottery proceeds, and authorizes the state to increase marketing so that the poor and foolish can help pay off this enormous loan.<br /><br />Prop 1F is a minor symbolic gesture, but definitely one to vote for. It would prevent law makers from raising their pay while running a deficit.<br /><br />I welcome your comments.<br />
<p><i>Edit: here's an <a href="http://www3.signonsandiego.com/stories/2009/may/08/lz1e8poizner2299-propositions-are-just-more-false-/">interesting piece</a> on prop 1A with a comment from Richard Rider also worth reading.</i></p>
<h1>Migrating from Nucleus to Movable Type (2009-04-19)</h1>
<p>My original blog was based on what was available as a one click install when I set up the domain in 2006. Times have changed, and even though I don't intend to become a professional blogger, I do see value in building myself as a brand, and a well done blog is part of that. Nucleus may be great for some people, but I wanted to change hosting providers (I'm now at Dreamhost), and so I would need to at the very least reinstall, and at that point, it made sense to look around. I settled on Movable Type, and I'm happy with it so far. My decision was mostly based on the fact that I liked how well commenting worked on <a href="http://www.schneier.com/blog/">Bruce Schneier's blog</a>. So, what do I need to do?</p>
<ol>
<li>Export the data from Nucleus</li>
<li>Import the data into Movable Type</li>
<li>Make sure external links still work</li>
<li>Fix anyone subscribed to my rss feed</li>
</ol>
<h2>Export from Nucleus in Movable Type format</h2>
<p>My starting point was a database dump from the old system. I imported this into my local mysql, and started looking at it and matching up fields to <a href="http://www.movabletype.org/documentation/appendices/import-export-format.html">Movable Type import/export format</a>. It's a pretty straightforward process. The comments must be matched up to the blog entries, and the date format is different between the two.</p>
<p><a href="http://gist.github.com/96900"><tt>nucleus_export_mt.rb</tt></a></p>
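<p>The export script itself is Ruby (linked above); just to illustrate the date juggling, here is the same conversion sketched in Java. The MT-side pattern is my reading of the import format docs, so double-check it against them before relying on it.</p>

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Convert a MySQL-style datetime from the Nucleus dump into the
// "MM/dd/yyyy hh:mm:ss AM" shape the Movable Type import format expects.
class MtDates {
    static String nucleusToMt(String mysqlDatetime) {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        SimpleDateFormat out = new SimpleDateFormat("MM/dd/yyyy hh:mm:ss a", Locale.US);
        try {
            return out.format(in.parse(mysqlDatetime));
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad date: " + mysqlDatetime, e);
        }
    }
}
```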
<h2>Import into Movable Type</h2>
<p>Once I went to all the trouble to export into the right format, importing was simple. Every item imported just fine. It seems I ended up exporting category numbers, so I had to go back in and rename my categories, but with only four items, this wasn't worth fixing in the import process – it was easier to do by hand.</p>
<h2>Make existing external links still work</h2>
<form mt:asset-id="1" class="mt-enclosure mt-enclosure-image" style="display: inline; float: right;" contenteditable="false"><a href="http://kdpeterson.net/blog/assets_c/2009/04/inbound-links-1.html" onclick="window.open('http://kdpeterson.net/blog/assets_c/2009/04/inbound-links-1.html','popup','width=607,height=434,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://kdpeterson.net/blog/assets_c/2009/04/inbound-links-thumb-350x250-1.png" alt="inbound-links.png" class="mt-image-right" style="margin: 0pt 0pt 20px 20px; float: right;" height="250" width="350" /></a></form>
<p>One last problem was what to do with external links to individual blog posts. You can see what pages are linked to from outside using Google's webmaster tools. I wasn't able to find this exact information in such a convenient form in MSN Live search or Yahoo! As you can see in the image, I have only a handful of files with outside links. These are linking to the old <tt>index.php</tt>. Movable Type does not have a file <tt>index.php</tt> -- the url structure is different.</p>
<p>What I have done is create a very simple <tt>index.php</tt> that does nothing but forward old requests to the correct pages. It contains a hard coded list of these legacy outside links -- they are unlikely to ever change. If it recognizes the parameters as a request for one of these pages, it forwards to the appropriate page on the MT blog, using a 301 Moved Permanently. This will at least tell search engines that they should index the new location.</p>
<p>If the user didn't supply an understood argument to the script, either an item that I haven't covered, or maybe some archive page, I set status code 404 to indicate to search engines that this page isn't valid. I then redirect to the main page of the blog with a message and a three second delay. This gives visitors the best chance of finding what they want, but also gives them a heads up that they will have to look for it.</p>
<p><a href="http://gist.github.com/97962"><tt>index.php</tt></a></p>
<h2>Fix the RSS Feed</h2>
<p>I did the same kind of redirect as above, but this time I'm redirecting anything looking for xml-rss2.php to atom.xml. <a href="http://gist.github.com/97973"><tt>xml-rss2.php</tt></a>. This definitely worked for FriendFeed, still waiting to see if Google Reader picks it up.</p>
<p>Incidentally, every web developer should at least know where to look for <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html">HTTP Status Codes</a>. No need to memorize it, but be aware that sometimes subtle difference matter.</p>
<h1>A new recruiting paradigm (2009-04-11)</h1>
<p>Here’s what my linkedin profile says:<blockquote>I'm not looking to change jobs right now, but if you think you can change my mind, I'm open to hear from you if it's a start up with a good business plan within 15 minutes of San Mateo, you are not a recruiter, and you specifically describe the position you are trying to fill. All of you recruiters with "wonderful opportunities" with "several local companies" just please go pester someone else.</blockquote><br /> Here is what I.M. Dipshit, owner, Dipshit and Associates writes me:<blockquote>I’ve been trying to connect with you regarding a search for a position, but am unable to reach you via the phone. Please give me a call if you have a moment or send me a number where I can reach you. My number is: 800-xxx-xxxx or xxx-xxx-xxxx.</blockquote><br /> Am I going to respond to him? Sure, I’m just going to copy and paste that paragraph from my profile and see if he can follow simple directions. As to how hard this guy has been trying to reach me via the phone, searching for my name with any reasonable search terms (“software engineer”, “san mateo”) would get you my homepage, <em>with my resume linked</em>. As a bonus, it has even more information about the fact that I’m not interested in changing jobs unless you have something really outstanding.<br /> <br /> I have a plan I’ve been considering. I’ll charge for my time to talk to recruiters. Force him to paypal me $100 / hour ahead of time to listen to him. It’s all about aligning incentives and internalizing externalities. Now, a recruiter is making over $10,000 per placement. It’s the same asymmetric cost / reward structure as spam and junk mail. It’s worth it for him to call incessantly. If he spends 20 hours / week on the phone, that’s his job.<br /> <br /> To me, the prospective job switcher, most of the time changing jobs is pretty close to no gain. I have gains in the form of improving my resume by switching to a job where I will learn new skills. 
I have monetary gains. I may have personal gains if I dislike my present job. But I have costs as well. A job search is a lot of hassle, a lot more hassle for the job seeker than for the recruiter, because this isn’t my job. It means stepping out of the office and taking calls, taking personal time to interview, and so on. Additionally, and probably most significantly, switching jobs imposes a cost on future employability. If you change jobs too many times, no one wants to hire you.<br /> <br /> So if the recruiter honestly believes his sales pitch, that this is a wonderful opportunity and I’m a great fit for it, how about putting some money on that? You send $100 to my PayPal account, and I set up a one hour period when I will be available to talk to you. Additionally, I’d make the terms of the contract (and since money is changing hands, it is easy to establish that there is a contract) be that the recruiter is only authorized to send your resume to those companies that you authorize during the phone call. This guards you against a recruiter sending out your resume to companies that you later contact on your own or through someone else, in the hopes of being able to claim the bounty.<br /> <br /> I think the issue here is that if a recruiter is such a jackass that he’s going to send me that kind of vague email after what I said in my profile, I can only expect that he’ll ignore what I say when I tell him I’m not interested in enterprise, I’m not interested in large companies, I’m not interested in relocating, etc.</p>
Easy JSON with Jackson (2009-03-19)
Sometimes a casual pick can lead to big problems. Early on in developing our <a href="http://www.biz360.com/products/communityinsights.aspx">new social media product</a> we decided to use JSON for storing our data in HDFS. This idea has worked out really well for us. The performance is good, it's easy to read if you want to take a peek at a file by hand, and there are libraries for every language under the sun, making it easy to write streaming jobs in your <a href="http://www.ruby-lang.org/en/">language of choice</a>.<br />
<br />
What didn't work out so well for us was the library we chose. The <a href="http://json-lib.sourceforge.net/">net.sf.json</a> library is a big improvement over the code it's based on from <a href="http://www.json.org/java/">www.json.org</a>, but it has a minor memory leak: apparently it uses a thread-local hashmap of objects it has seen to make cycle detection faster. This fails miserably when we try to send gigabytes of data through the same Mapper object -- it holds a reference to every object deserialized.<br />
<br />
Since we were unfamiliar with Hadoop when we started, we didn't identify the library as the cause of the problems. We expected some performance hiccups, and when we ran out of memory, we just gave it more. But still the problems didn't go away. Finally a heap dump pointed us in the right direction, and once we knew what to look for, it was easy to spot the suspect line of code.<br />
<br />
At this point we did what we should have done at the outset: ask the mailing list. A quick note to the core-users Hadoop mailing list gave us one person who had previously had trouble with json-lib (and got it fixed quickly after reporting the bug), and two who recommended the <a href="http://jackson.codehaus.org/">Jackson JSON Processor</a>. Jackson is even easier to use than json-lib.<br />
<br />
Here's some sample code that shows how to read and write objects.
<pre class="brush:java">import java.io.StringWriter;

import org.apache.commons.lang.StringUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.codehaus.jackson.map.ObjectMapper;

public class JsonUtils {
    private static final Log log = LogFactory.getLog(JsonUtils.class);
    private static final ObjectMapper mapper = new ObjectMapper();

    /** Serializes an object to a JSON string, or returns null if serialization fails. */
    public static String jsonFromObject(Object object) {
        StringWriter writer = new StringWriter();
        try {
            mapper.writeValue(writer, object);
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            log.error("Unable to serialize to json: " + object, e);
            return null;
        }
        return writer.toString();
    }

    public static User userFromJson(String json) {
        return objectFromJson(json, User.class);
    }

    /** Deserializes a JSON string to an instance of klass, or returns null on checked failures. */
    static <T> T objectFromJson(String json, Class<T> klass) {
        T object;
        try {
            object = mapper.readValue(json, klass);
        } catch (RuntimeException e) {
            log.error("Runtime exception during deserializing " + klass.getSimpleName()
                    + " from " + StringUtils.abbreviate(json, 80), e);
            throw e;
        } catch (Exception e) {
            log.error("Exception during deserializing " + klass.getSimpleName()
                    + " from " + StringUtils.abbreviate(json, 80), e);
            return null;
        }
        return object;
    }
}</pre><br />
<br />
Feel free to use the above if you find it useful. It was written at home, and I think it's actually an improvement over what I wrote at work (my first time using the library).
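<p>To illustrate the streaming-jobs point from the top of the post: in a Hadoop Streaming job, each line of input is one JSON record, so the per-record logic fits in a few lines of any language with a JSON library. Here's a minimal sketch in Ruby using the standard json library; the "author" field and the count-by-author output are hypothetical examples, not our actual schema.</p>

```ruby
require 'json'

# Map one line of input (a JSON record) to a "key\tvalue" string for
# Hadoop Streaming, or nil to drop the record. The "author" field is
# a made-up example field.
def map_record(line)
  record = JSON.parse(line)
  author = record['author']
  author ? "#{author}\t1" : nil
end

# Wiring it up as a streaming mapper would look like:
#   STDIN.each_line { |line| out = map_record(line); puts out if out }

map_record('{"author": "kevin", "text": "hi"}')  # => "kevin\t1"
```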
So who can sign an argument against a local ballot measure? (2009-03-10)
<p>Santa Cruz County says anyone can sign a ballot argument (from<br /> http://www.votescount.com/books/argument.pdf )<br /> <br /> “Signers of arguments for or against a county, school, or special district measure do not have to meet the criteria listed above. The filer of the argument must meet the criteria above; however, anyone may sign the argument.”<br /> <br /> Shasta county agrees:<br /> “Arguments and Rebuttals must be accompanied by the Verification Statement included in this Guide. (Elections Code §9600) There is a distinction between a “filer” and a “signer or author.” The filer argument or rebuttal must be either the governing board of the district, a bona fide association of citizens or an individual voter who is eligible to vote on the measure. The “signers or authors” of the argument or rebuttal can be any person or organization accompanied by a signature of a principal officer. Filers do not have to be signers.”<br /> <br /> Sutter county has almost identical language to Santa Cruz. Butte County also.<br /> <br /> San Luis Obispo has different “filers do not have to be signers” language.<br /> <br /> Orange County allows signers who are not filers, but requires that the filers formally delegate their right to sign. See http://ocvote.com/election/rebuttal_Handbook.pdf<br /> <br /> Sacramento County follows a similar system to Orange County.<br /> <br /> Where did this all come from? San Mateo County ROV says that our arguments against local tax measures authored and filed by LSPM’s secretary can only be signed by people who are allowed to author or file the argument. First, the law doesn’t say that. Second, our actual disagreement is about whether we have to use our titles. That is, I’d like to sign ballot arguments as “Kevin Dempsey Peterson, Software Engineer”, because that’s the designation I’ll be using if I run for office again. The ROV wants to put me down as Treasurer, Libertarian Party of San Mateo County. 
Seems somewhat reasonable, and I’d have no problem with it if the law were actually written that way.<br /> <br /> Bonus points: the form they provide for submitting ballot arguments misquotes the California Elections Code, combining “authors” and “signers” into the same thing (while the EC makes a distinction).</p>
Prop 8 Gay Marriage Ban is a Failure of Democracy (2009-03-05)
<p>California’s proposition 8, which says that marriage is only between a man and a woman, illustrates fundamental flaws in democracy. The right answer to the “issue” of gay marriage is the same as the answer to the “issue” of what movie I should watch tonight – no one’s business but mine and my wife’s. I think that the vast majority of Californians would agree with this position if asked, but social conservatives were able to frame the debate so that the voters were faced not with the question of whether the government should dictate who can marry whom, but were instead forced to codify their gut feeling on marriage into law. <br /> <br /> Almost everyone I have spoken to over the age of forty who isn’t politically active is uncomfortable with the idea of gay marriage. To them, “marriage” means a man and a woman, and when the question is phrased as whether two people of the same sex can marry, they will say no, just as they would say that you can’t be friends with someone you have never met, or you can’t sing a song with no words. These aren’t moral judgments, these are just simple statements that their definition of marriage means one man, one woman. They don’t have any strong opposition to gay marriage, it’s just a “how would that work” uneasiness.<br /> <br /> On the other hand, against this mild queasiness, you have millions of Californians being told they cannot marry, they cannot gain the same legal recognition for their family as others. These same sex couples aren’t just being insulted, they are encountering real, significant consequences of the limitations of the law. <br /> <br /> With that background, a small number of social conservatives were able to frame the question as “what is the definition of marriage”, rather than the more relevant question of “should same sex couples have the same rights as everyone else”. 
In the eyes of the law, these may be the same thing, but to the voter, these subtle choices of wording and point of view change everything.<br /> <br /> This is why I say that proposition 8 was a failure of democracy. From a legal standpoint, I <a href='http://reason.com/news/show/129641.html'>agree with Tom Campbell</a> that the California Supreme Court made a questionable ruling when they legislated from the bench, and that this should have been legalized through the democratic process. But the initiative process failed miserably. The California legislature, able to weigh the public’s general queasiness against the significant legal discrimination against same sex couples, was doing their best to provide those rights, as with domestic partnerships. They were doing a reasonably good job of balancing the conflict between what they knew to be right and good for California, and Prop 22, which banned same-sex marriage and which they did not have the power to repeal.<br /> <br /> But this is not how direct democracy works. The initiative process says one person, one vote, so one person’s slight unease at “legitimizing” gay couples who make them a little uncomfortable is given the same importance as another person who desperately wants to marry his long time partner. Similarly, democracy says that the vote of someone living off government assistance is just as valid as the vote of someone who works and pays taxes. We let people with no children vote in school district elections. We let people who do not own houses vote to raise property taxes.<br /> <br /> In short, we allow two wolves and a sheep to vote on what to have for dinner. Is there an alternative? Yes, the alternative is freedom. Freedom, coupled with the rule of law, says that we take certain things off the table. Certain things are removed from the calculus of political bargaining. 
We strictly rule out any bill that restricts freedom of speech, of the press, of religion, of the right to keep and bear arms, of the right to be secure in our homes.<br /> <br /> By now you may be snickering, because you know that those rights are no longer strictly protected. They are viewed only as suggestions, or guidelines. What has happened is that the people have forgotten why those rules were adopted. They have willingly reelected politicians who debase themselves and sell our freedom for short term gain, who promise us safety, if only we consent to police searching our bags on BART. Who promise us financial security, if only we hand over half of everything we make.<br /> <br /> Remember that prop 8 isn’t about marriage, and it isn’t about gays. It’s about freedom. As long as the voters remember the first question to ask isn’t “which side am I on”, but rather “is this really a political question at all”, we can dig ourselves out. We can go back to the day when you could live your life in peace, and go about your business without having to convince your neighbors, or, even worse, have to hope that some judge would decide on a whim that he liked you, and find some convoluted argument that allows him to rule the way that he wants to.<br /> <br /> If I have to choose between freedom and democracy, I’ll take freedom any day.</p>
Joined the Twitter hive mind (2009-02-12)
<p>I’m on Twitter now. I don’t claim to understand the hype, yet. It reminds me of a bit in Vernor Vinge’s <em>Marooned in Realtime</em>, one of the novels to popularize the Singularity. One of the characters describes being totally cut off from society compared to the state of the art which was augmenting your brain to be able to “think” with several of your colleagues at once. We aren’t quite there yet, but I think twitter is closer than IRC to being the hive mind.<br /> <br /> <a href='http://twitter.com/kdpeterson/'>http://twitter.com/kdpeterson/</a><br /></p>
Most Meaningless Technobabble (2009-01-07)
<p>This is just over the top for the most meaningless technobabble in a single sentence I’ve ever read. From the January Amazon Web Services newsletter:<br /> <br /> <blockquote>RunMyProcess delivers a cutting edge and ergonomic "BPM as a service" platform, enabling rapid SaaS integration and workflow applications design.</blockquote></p>
Without assholes, we'd all be full of shit (2008-12-23)
<p>It applies equally well to biology and to software development. It's easy to get everyone into a room to talk about coding standards and get everyone to agree that they are going to comment every public method, or that they are going to write good unit tests for every piece of code big enough to break. What's hard is getting people to follow through, and it's hard because it requires being an asshole. Nobody wants to be the asshole who tells someone, "hey, this code you just checked in is garbage: every method is public, and nothing has a comment or a unit test." Even worse is the status update with no basis in reality. How many times have you heard the following?</p><p>"What's the status on implementing component X?"</p><p>"It's done."</p><p>"Oh, that's great. Can I see it running?"</p><p>"Well, I haven't actually deployed it to staging yet."</p><p>"So you're just running it on local data on your machine?"</p><p>"Well, no, it doesn't actually work yet. But I think I know where the problem is."</p><p>Now, if you said that's a usage of the word "done" that you weren't familiar with, you might hurt someone's feelings. This kind of touchy-feely concern for feelings is counterproductive. While you might be sparing the feelings of the guy who doesn't want to come right out and say that he's behind schedule and drowning, allowing this kind of thing to happen frustrates the other developers.</p><p>I'm not arguing in favor of throwing around accusations and trying to assign blame rather than getting the job done. I'm talking about assigning meanings to words. Does marking a bug resolved mean that it's been deployed to production, that it's been tested in staging, or that it has been checked in but not tested? All of these are valid options. What's important is that the team agree on one (or if the team can't agree, that the team lead pick one), and that people are held to those standards. 
Without an asshole to hold people accountable, any decisions about standards or best practices are empty, any meetings are a complete waste of time.</p><p>Everyone wants everyone else to follow standards. Everyone wants all the code they have to take over and maintain to have good tests and clear APIs. But no one wants to go to the hassle of doing it in their own code. I don't need javadoc to remember what the method public int doItAndReturn() does. But the organization does. Realistically, I don't expect to maintain the code I write today. Sure, I'll work with it for a year or two, but chances are my code will be with the company long after I've moved on. At that point, if the code is unmaintainable, the business value suffers. Coding standards, test coverage, good documentation, and automation reduce the technical debt, but these standards don't happen by magic. No project manager will ever ask for refactorings. No manager will say "I see we have 90% test coverage, but are they <em>good</em> tests?" Standards need to be enforced by people who are in a position to see problems.</p>
Bailout Talking Points for Libertarians (2008-09-23)
<p>The bailout is in the news. We get new info daily, and it’s still in flux. If you believe in small government, or even responsible government, or even some semblance of a non-corrupt government, these are the key points to make when this subject comes up in conversation. Note that I wrote this on Saturday, 20 Sept, and things may have already changed (I don’t have a staff to figure things out for me, I just read the news like you do).<br /> <br /> <ul><li>We don't know what the final number is going to be, but it's looking like $700 Billion. That's $2000 for every man, woman and child in the country.</li><br />
<li>Yes, it's true that these are loan guarantees, not straight handouts, but the fact that no profit-driven investors stepped up to the plate means they aren't going to pay out. The banks are not going to be handing off good loans to the bailout fund, they will be handing off the loans that are being defaulted on. We would be lucky to see even half this money returned.</li><br />
<li>These handouts punish the prudent and reward the foolish. Why should Americans be cautious with their money, make sure they don't get in over their head, when those who do will be forced to pay for the "bad luck" of those who bought houses they couldn't afford and those who made the loans?</li><br />
<li>If a bank collapses, the only ones who lose money are investors. If the bank that holds your mortgage collapses, it isn't as if they can come and kick you out of your house. The worst case scenario is that you'll get a letter advising you that your lender is defunct, your loan has been purchased by another bank, and would you please send your future loan payments to a new address?</li><br />
<li>We see that the true meaning of bipartisan action is helping out huge corporations that donate to both parties. It's no virtue to be generous with taxpayer money.</li><br />
<li>Once the banks that have made bad loans have been taken over by more responsibly managed banks, the problem will be solved -- if investors lose money, they aren't likely to invest in bad loans again.</li><br />
<li>As a Libertarian, my preference would be that that $2000 per man, woman and child went toward reducing the debt and cutting taxes, but whatever your politics, I'm sure we can all agree that there are many more important things to spend the money on than making sure the rich<br />
get richer.</li><br />
<li>This is the kind of opportunity that can actually get a Libertarian elected. Hammer this issue hard. This is the wedge that will make voters look to Washington and say "you're out of your minds".</li><br />
<li>Don't talk about the gold standard. It's not a related issue and talking about the gold standard makes 90% of the voters think you are a kook.</li><br />
<li>Don't talk about inflation. The relationship of this bailout to inflation is too complicated to explain in anything less than a feature-length article, so just stay away from the issue. If you actually get asked, say it is increasing the money supply so we can expect inflation. Keep in mind that inflation probably directly benefits debtors more than the negative effects on the economy harm them.</li></ul></p>
Why does Wells Fargo use broken passwords? (2008-09-19)
<p>Wells Fargo wants to make it easy for you to do your banking online. They don’t want you to deal with little inconveniences like having to enter your password correctly. I’m sure that sometime in the past, you’ve been told it’s good to have both uppercase and lowercase letters in your password. I heard a rumor that Wells Fargo ignores this, tested it, and confirmed that Wells Fargo will accept a password with any combination of upper and lowercase letters. That is, if you enter “tHis Is mY paSsWoRd” as your password, they will accept “this is my password”.<br /> <br /> Wells Fargo’s current password policy is “Your password must be 6 to 14 characters and contain at least one letter and one number.” I have to ask, why no more than 14? Hard drives are pretty cheap these days; I’m sure you could handle storing as many characters as someone reasonably wanted to store. There is no excuse for bad password security these days. Here’s how to do good password security:<br /> <br /> <ul><li>No maximum length (or one so high it isn't needed)</li><li>No disallowed characters: if I want spaces, let me have spaces.</li><li>If you want to have "fuzzy" passwords like smashing case or ignoring spaces, make these optional, or at least inform the user at the time of setting the password.</li><li>If you want to require a certain strength, make the calculations holistic -- if I have a 35 character password, it's secure even without a special character.</li><li>If your programmer tells you that any of the above are impossible, fire him and find someone competent.</li></ul><br /> If anyone knows of a bank that does online security properly, won’t send me junk mail, and won’t throw up an offer to get online statements before letting me get to my account when I already get only online statements, please let me know what bank that is.<br /> <br /> <i>Edit: Christopher makes a good point about telephone access being a possible driver for this in his comment. 
He also mentions the "security questions" problem that Bruce Schneier <a href='http://www.schneier.com/blog/archives/2008/09/sarah_palins_e-.html'>recently covered</a>.</i></p>
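<p>For the curious, the "holistic" strength rule above is easy to implement: estimate entropy as the password's length times log2 of the alphabet size implied by the character classes it uses, so a long all-lowercase passphrase passes without any special characters. This is only a sketch; the 60-bit threshold and the 33-character punctuation count are arbitrary choices for illustration, not anyone's actual policy.</p>

```ruby
# Holistic password strength: length * log2(alphabet size) >= threshold.
# The 60-bit minimum is an arbitrary example threshold.
def strong_enough?(password, min_bits = 60)
  alphabet = 0
  alphabet += 26 if password =~ /[a-z]/
  alphabet += 26 if password =~ /[A-Z]/
  alphabet += 10 if password =~ /[0-9]/
  alphabet += 33 if password =~ /[^a-zA-Z0-9]/  # punctuation, space, etc.
  return false if alphabet.zero?
  password.length * Math.log2(alphabet) >= min_bits
end

strong_enough?("tHis Is mY paSsWoRd")  # long passphrase: strong
strong_enough?("abc123")               # short, even with digits: weak
strong_enough?("a" * 35)               # 35 lowercase letters: strong
```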
ICANN has .cheeseburger? (2008-06-26)
<p>ICANN has opened a process to allow new top level domains. I just wanted to be the first one to make the obvious pun.<br /> <br /> <a href='http://www.icann.org/en/announcements/announcement-4-26jun08-en.htm'>ICANN Announcement</a></p>
Changing location (URL) of SVN repository in Eclipse (2008-03-19)
<p>I use Subversion to manage documents and projects at home. I ran into a problem when I changed the IP I had assigned to the box hosting the svn repository. It isn’t obvious how to change where Eclipse keeps track of this information. I found some replies to <a href='http://svn.haxx.se/subusers/archive-2005-05/0098.shtml'>this old message</a>, which got me to try the Switch then Relocate commands in TortoiseSVN. This made it possible to synchronize, but I wasn’t able to get to options like Configure Tags. The fix is to go to the SVN Repository view (or perspective), neither of which I use very often. Here you can access the Relocate command.</p>
The Widescreen Rip-off: 11% Less for the same Price (2008-03-12)
<p>It’s almost impossible these days to find a monitor or laptop that isn’t wide screen. They are even making their way into bedroom-size televisions. Is this the wonderful progress of technology, bringing us fantastic products for lower prices every day? Not really. It’s actually just a marketing ploy to let them use a bigger number, much like the Athlon 4800+, which actually runs at 2500 MHz.</p>
<p>Monitor sizes are quoted in diagonal measure, just as TVs have been for decades. This has worked well because the aspect ratio used to be constant at 4:3. But when we start playing with the shape of the screen, things get very different. If you have a 20” diagonal screen, at normal 4:3 aspect ratio, the screen is 16” across by 12” tall. This gives an area of 192 sq in. But if you have a 20” at 16:9, you get a screen that is indeed wider at 17.4”, but it’s also shorter, at 9.8”. This gives a screen area of only 170.9 sq in. That’s 11% less than the normal aspect ratio screen.</p>
<p>You can calculate the area of a screen by squaring the diagonal measure (20 * 20 = 400), then multiplying by 0.427 for widescreen, or 0.48 for normal 4:3. If you come across the somewhat unusual 16:10 ratio, it’s closer to widescreen, with the ratio being .449.</p>
<p>For a TV, especially if you like to watch movies in their original aspect ratio, widescreen makes sense. For computers, it doesn’t. Not only do you get less area, but the way that computer interfaces are designed, you usually lose space where you need it most: the top (title bar, menu bar, buttons) and bottom (status bar, taskbar) of the screen.</p>
<p>Here’s some Ruby to calculate the area of a screen given an aspect ratio and nominal diagonal. It’s easier to type on the web than real math, and it can be dumped directly into <tt>irb</tt> to do your own calculations. The diagonal measurement is the hypotenuse of the triangle, and height / width is the tangent, so we get the angle using the arc tangent.</p>
<pre class='brush: ruby;'>def height(diagonal, aspect)
  diagonal * Math.sin(aspect)
end

def width(diagonal, aspect)
  diagonal * Math.cos(aspect)
end

def area(diagonal, aspect)
  width(diagonal, aspect) * height(diagonal, aspect)
end

# the "aspect" argument is the angle whose tangent is height / width
normal = Math.atan(3.0 / 4.0)
wide = Math.atan(9.0 / 16.0)

# area of a 17" 4:3 screen
area(17, normal)
=> 138.72
# area of a 17" 16:9 screen
area(17, wide)
=> 123.489614243323
# ratio of wide screen / normal
area(1, wide) / area(1, normal)
=> 0.890207715133531</pre>
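<p>The multipliers quoted earlier don't need trig at all: for a w:h aspect ratio, width = d*w/sqrt(w^2 + h^2) and height = d*h/sqrt(w^2 + h^2), so the area is the diagonal squared times w*h / (w^2 + h^2). A couple of lines of irb-ready Ruby recover the constants:</p>

```ruby
# Area multiplier for a w:h aspect ratio: area = multiplier * diagonal**2.
# width  = d * w / sqrt(w**2 + h**2)
# height = d * h / sqrt(w**2 + h**2)
# so area = d**2 * w * h / (w**2 + h**2)
def area_multiplier(w, h)
  w * h / (w**2 + h**2).to_f
end

area_multiplier(4, 3)    # => 0.48
area_multiplier(16, 9)   # 144/337, about 0.427
area_multiplier(16, 10)  # 160/356, about 0.449
```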
Peterson for Congress 2008 (2008-02-03)
<p>I’m running for office because I’m pissed off. I’m pissed off at police that kick down doors using military tactics with no concern for the constitution, under the reasoning that accidentally shooting a few innocent bystanders is a small price to pay to make sure AIDS patients can’t smoke marijuana. I’m pissed off at having 15% of my earnings go to fund a retirement plan which will most likely go bankrupt. I’m pissed off at “terrorism alerts” which accomplish nothing but to keep the populace in a perpetual state of fear, and at airport security which detains children for having a similar name to someone on a secret “no fly” list. I’m pissed off because FEMA is declaring places like Foster City a flood zone so that they can force home owners to buy flood insurance to pay for the poorly managed fiasco in New Orleans.<br /> <br /> I’m pissed off that political debate in this country is entirely about whether Washington should force everyone to do something, or ban them from doing it, and it seems to never pass through anyone’s mind that maybe, just maybe, an individual can make his or her own decisions.<br /> <br /> I support freedom and individual choice. I support copyright reform which expands fair use rights and oppose retroactive extensions of copyright. I support ending the failed war on drugs. I support full equal rights for gays. I oppose costly government bailouts of private corporations. I support the right of all law abiding citizens to own a gun. Most of all, I support you. I support letting you make the choices that affect your life, and leave government to deal only with those cases where it’s necessary.<br /> <br /> Visit <a href='/congress2008/'>Peterson for Congress 2008</a> and help turn back the tide. If you are a registered libertarian in the 12th Congressional District, I need your signature to get on the ballot. <a href='mailto:congress@kdpeterson.net'>Email me</a> if you have not received a form in the mail already.</p>
Cool T-Shirt (2007-12-11)
<div style='float: right; padding: 15px;'><img src='http://images.cafepress.com/product/133493024v11_150x150_Front_Color-Black.JPG' /></div>
<p>Looking for the perfect gift for the capitalist on your list? Try this <a href='http://www.cafepress.com/arthurshall/2996353'>Milton Friedman T-Shirt</a> available from <a href='http://www.arthurshall.com/index.shtml'>Arthur's Hall of Viking Manliness</a>.</p>
<p>They also have a good article <a href='http://www.arthurshall.com/x_2007_hippies.shtml'>How tree-Hugging hippies are destroying our environment</a>. You gotta love any website that offers a “manly answer to Oprah’s Book Club”.</p>
Economics in One Paragraph (2007-11-18)
<p>Given the overhead of sales taxes, income taxes, and so on, it takes about two hours for me to earn enough to buy one hour of labor from someone who makes the same salary as me.</p>
<p>Let’s assume someone makes $20 / hr, and that he works for someone who makes something I want. It costs $20 for parts and overhead for facilities, tools, etc., and it takes one hour of the worker’s time. The employer has to pay the worker $21.60 to cover its share of the employment tax. Adding 8.5% sales tax on the $41.60 gives a total price of $45.14 for this item. How much does the worker take home? Using numbers from my actual check stubs, he loses 19% to federal income tax, 7.65% to his half of the employment tax, and 7.4% to state income tax, so the worker takes home $13.19. So the worker has to work for two hours to be able to buy back his original one hour of labor ($45.14 - $20 materials/overhead = $25.14, just under the $26.38 he takes home in two hours).</p>
<p>Note that I’ve assumed taxes make no contribution to the cost of the raw materials and overhead. If these were included, the worker would be losing even more to the government.</p>
<p>If half of my labor goes to the government, do I get half of what I use from the government? Not even close.</p>
<p>(Title inspired by Henry Hazlitt’s <a href='http://www.amazon.com/gp/product/0517548232?ie=UTF8&tag=kdpetersonnet-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0517548232'>Economics in One Lesson</a><img src='http://www.assoc-amazon.com/e/ir?t=kdpetersonnet-20&l=as2&o=1&a=0517548232' border='0' height='1' alt='' width='1' style='border:none !important; margin:0px !important;' />)</p>
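<p>If you want to check the arithmetic above, here it is as irb-ready Ruby. Note the $21.60 figure implies an 8% employer-side rate (a rounding of the employer's share of the employment tax); the other rates are the ones from my check stubs.</p>

```ruby
# Checking the numbers in the paragraph above.
wage = 20.00
employer_cost = wage * 1.08                      # $21.60: wage plus employer-side employment tax
price = (20.00 + employer_cost) * 1.085          # parts/overhead + labor, plus 8.5% sales tax
price.round(2)                                   # => 45.14

take_home = wage * (1 - 0.19 - 0.0765 - 0.074)   # federal, employee FICA, state
take_home.round(2)                               # => 13.19

# hours of work needed to buy back the one hour of labor in the item
hours = (price - 20.00) / take_home
hours.round(2)                                   # => 1.91, about two hours
```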
Bad Experience at Mangiare Restaurant in Brisbane, CA (2007-11-17)
<p>I’m guessing that either this place recently changed ownership, or that only the completely incompetent work the weekend shift. Mangiare was completely empty this Saturday morning, and it’s no surprise, since the service is the worst I have ever received. There were eight of us. The first person to receive his food got it about 20 minutes after we ordered (at the register). The last person received her food about a half hour after that. They got two orders completely wrong – Jenny had to send hers back. Samantha got the wrong order, and when she pointed this out to the cashier he had the audacity to argue with her about what she had ordered. She decided to take it and eat it anyway, since she was hungry and at this point everyone else had already finished their breakfast.<br /> <br /> No one got the right kind of toast. Apparently they were out of everything but rye and onion bagels, so they decided to replace sourdough with a random selection of either.<br /> <br /> My food was okay. The potatoes were slightly undercooked and the butter was a little foil packet still at refrigerator temperature, but it was okay.<br /> <br /> Is service normally this shitty? Or is the abysmal service just a clever strategy to make money off the $.50 they charge for coffee refills while you are waiting for your food?<br /> <br /> In good news, I can still run 3 miles in just over 21 minutes, and Samantha came in under 27. This was supposedly a 5k, but I think they just rounded down based on a few GPS readings I saw.</p>
San Mateo does electronic voting okaytag:kdpeterson.net,2010:/blog//2.752007-11-07T08:01:49-08:00
<p>When I got my ballot pamphlet a few weeks back, I was disappointed to see that they were moving to electronic voting. I think the chain of reasoning here is pretty clear:
<ol>
<li>Senile idiots in Florida can't figure out how to operate paper.</li>
<li>Electronic voting systems are even harder to use for easily befuddled retirees.</li>
<li>Electronic voting systems cost a lot more and make more money for government contractors.</li>
<li>Clearly we should move to electronic voting systems.</li>
</ol>
They list all sorts of ways I know that my vote is secure, like “rigorous logic and accuracy testing”, and “stored in four physically separate locations for backup”. All of this is smoke and mirrors. In fact, all established methods of testing assume that the person producing the system intends for it to work as described. The problem with the security model they are using to evaluate these systems is that it assumes electronic votes behave like pieces of paper. That is, they assume that the system accurately records the vote cast, that the system will not change the vote without malevolent outside intervention, and that the system will accurately count the votes. None of these types of controls will do anything to prevent an insider (someone at the manufacturer) from adding code to switch votes to a preferred candidate.<br /> <br /> There is mention that the source code was audited by an outside party. Even assuming that it was feasible to do this audit in the time provided (a separate issue), and that an audit can find flaws in a short period of time (it can’t generally find security flaws, but it should prevent intentional vote manipulation by insiders), there is a remaining problem. We will assume that the source code was audited, and the auditors found no problems because there were no problems to find – the source code was perfect. (<a href='http://www.schneier.com/blog/archives/2007/07/california_voti.html'>this wasn't the case</a>) The remaining issue is, how do I know that the source code matches what is actually running on the machines? It’s a long process to go from source code to the actual machines sitting in polling stations. Nothing guarantees that the source code didn’t have malicious bits purged before giving it to the auditors. Nothing guarantees that the machines won’t get a “more up-to-date version” of the software. Nothing guarantees that someone in the manufacturing plant doesn’t replace the software with something of his own design.
Even if the audit was perfect, all we get is that some source code looks like it works right, but this tells us nothing about the machines that are supposedly running that software.<br /> <br /> But they have a voter-verifiable paper trail. And this is all that saves the process. Computers are a great way to produce something which is easy to read. They make it easy to catch spelling errors, and so on. So the eSlate is a thousand-dollar machine to make sure that the paper ballots are readable. All the security features are a waste of tax dollars.</p>
Metro Silicon Valley Marathon 2007 - 4:00:25tag:kdpeterson.net,2010:/blog//2.712007-11-05T16:03:48-08:00
<div style='float: right; padding: 15px;'><a href='/images/20071104_marathon_midpoint_800x600.jpg'><img src='/images/20071104_marathon_midpoint_220x165.jpg' alt='Kevin at half way point of Silicon Valley Marathon' float='right' /></a></div>
<p>So close – one second per mile and I would have been sub-4:00. It’s close enough that I’m satisfied for now though. I don’t plan on running another marathon for quite a while.</p>
<p>I did my training better this time around. I was following something similar to the <a href='http://www.amazon.com/gp/product/159486649X?ie=UTF8&tag=kdpetersonnet-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=159486649X'>FIRST Training Program</a>, but I wasn’t as strict about the pace of my workouts. I did MWF bicycle, TThS run, with Tuesday being intervals (1/2 mile with 2 minute rest), Thursday tempo (6-8 miles at about 8:15 pace), and Saturday long run (with 3 20-milers, 7 weeks, 5 weeks, and 3 weeks before the marathon). I was a little rushed, because I jumped into training right after recovering from San Francisco, so I think I can do better if I do another full proper training cycle, starting from a good base without any injuries or recent overtraining.</p>
<p>Samantha did her first marathon in 5:32. Ted took first in his age group at 3:15. Frances took second in her age group for the half at 1:57. Annie finished her first half marathon at 2:17.</p>
Live from the Singularity Summit 2007tag:kdpeterson.net,2010:/blog//2.612007-09-09T10:00:00-07:00
<p>I’m currently at the <a href='http://www.singinst.org/summit2007'>Singularity Summit 2007</a> at San Francisco’s Palace of Fine Arts. I’ll post updates during the day.<br /> <br /> I will eventually (swear to the flying spaghetti monster) put these into a more coherent form. But here are some things that stuck in my head as a relative outsider to “futurism” per se, but someone fairly familiar with AI and a longtime reader of the sl4 list. Some of these are my ideas triggered by something mostly unrelated.<br /> <h3>Day One</h3><ul><li>Morality is an attempt to avoid guilt. Guilt is an emotional response based on projecting the self onto others. We model other people but we cannot separate the emotions experienced while being them from our own.</li><li>The evolutionary origin of the brain, and the physical substrate it runs on, place constraints on how careful we have to be in mapping, measuring and simulating it. The onus is on Penrose to explain how a dirty system like the brain could evolve to use some mystical microtubules computation method.</li><li>Interesting phrase from Wendell on robot rights: "are they conscious in a way that we cannot prove they are not?" I like people who are this precise in exactly what they are trying to communicate.</li><li>I think my idea of the doctrine of subjective immortality, that death only has negative value by its perception, means that we can freely turn robots off, since in principle they can always be restarted.</li><li>Motivated by Sam Adams' response to the rights question. Would a computer's "recognition of the other" be hindered if the "other" did not recognize it?
That is, is the refusal to recognize robots as our moral equals a potential problem in the early childhood development of something like Joshua Blue?</li><li>Does Kismet use the same subsystems for generating its own facial responses as it does for analyzing those of others?</li><li><a href='http://www.ad.com'>Marcos</a> says the system will scale to 100 billion neurons on a system using approx 500 TB RAM. I'm just assuming realtime. Rough calculations: 500 TB = 4 GB / machine * 125,000 machines; at $1000 / machine that's $125,000,000.</li><li>Jamais Cascio can give the "in a world where ..." guy a run for his money. His question "if we need to get more people involved how do we get their attention?" makes me feel bad for not encouraging Sam to come.</li><li>Steven Omohundro's presentation was fascinating to me, right up my alley with the game theory. I will be reading his paper, then starting on the references. His thesis is sort of like the Coase theorem for self improving systems: a self improving system will approach ideal economic rationality. AGI will protect its utility function at all costs. Most other work is concentrating on the belief function.</li><li>Peter Voss is a little spooky. I find it interesting that he just automatically assumes that life, continued existence, is of value, and dismisses the need to justify this. I've heard that he is a hard core Objectivist, and this position is consistent with Rand's. Voss says as soon as five years.</li><li>What in the world could "treaty verification" as a field for narrow AI mean?</li><li><a href='http://www.novamente.net'>Ben Goertzel</a> wants to embody AGI as a baby to be used as a fashion accessory in Second Life so that they get lots of interaction with humans. I'm looking forward to more Kraftwerk scored demo reels. They may be ready to demo Novamente in Second Life by the Virtual Worlds conference in San Jose October 10th. I should get his book Hidden Pattern.
I have <a href='http://metacog.org'>metacog.org</a> in my notes, think it's background. Also a book Probabilistic Logic Networks is coming out from Springer Verlag soon. He mentions releasing some of the Novamente framework open source to get narrow AI implementers to start thinking in a more AGI way. I wonder if it will possibly allow the development of standardized interfaces, allowing, for example, more or less biologically valid models of different components of mind for different purposes. Particularly, you can cheat and feed predigested stuff to your AGI to get it interacting with a simulation well before you can start embodying it.</li><li>Paul Saffo read the poem "Machines of Loving Grace" by Richard Brautigan. He encouraged us to spread the word of what's going on to those who are not geeks.</li></ul><br /> <h3>Day Two</h3><dl><dt>Peter Norvig</dt><dd>His position is more along the lines of building the subcomponents, then trying to assemble them. He considers key components probabilistic first order logic and hierarchical representation and problem solving. I'm not too clear on what he means by the second. He talks about how whether you see an acceleration depends heavily on how you interpret the data. Some data sets show no such acceleration. It is unclear whether a perceived acceleration or recentness-of-important-events is real or caused by observer bias. My example: what we now see as different phyla are descended from what were once different closely related species. He is asked about a smooth transition without noticing an AGI. He says it is possible, but will be noticed afterwards. I think it's a mostly pointless question because a human level AGI will be quickly followed by a post-human AGI, and then there will be no difficulty with the question. Annoying "question" that starts "it seems to me". Note to world: "it seems to me..." is not a question-word. But the question is about training technicians versus scientists and whether the university system fails.
Some smart ass finds contradiction in his points.</dd><dt>J. Storrs Hall: Asimov's Laws of Robotics -- Revised</dt><dd>I recognize the name as a frequent writer on SL4. Interesting to put a face to him now. Has a book out on machine ethics. Analogy: Hammurabi trying to hand down a code to prevent Enron. Thesis: ethics considered as behavioral ESS converges to something like friendly behavior. Seems to me to omit the issue of how they will deal with us. Seems to strongly conflict with Omohundro's point, but has the advantage of taking into account situations like Hammurabi.</dd><dt>Peter Thiel</dt><dd>If singularity, then either the world goes to shit or it is a sustained boom. No sense in investing for the former. I think I've written before on this topic of ignoring extreme bad situations. How to invest for singularity? Thiel seems to be handwaving over the issue that just because there may be a massive growth, it doesn't mean that pets.com will have any part of the growth. Says Warren Buffett is moving toward investing in catastrophe insurance.</dd><dt>Michael Lindsey: XPrize</dt><dd>They are trying to formulate an XPrize in the area of educational software. Another long question and answer session. And by "question", I mean whiny hippie "why aren't you adopting my idiot pet method of looking at the world" soapboxing. Except the elearning person who points out exactly what I was thinking: this is not a well defined prize like "2 space launches in 10 days".</dd><dt>Christine Peterson: Open Source Physical Security</dt><dd>I think she was walking the line between wanting to actually go into detail for how this has been thought out, and risk boring and boggling people, versus the "Isn't open source great?" cheering section. She points out some good things about Open Source, but I'm skeptical of the claim that Open Source is good at debating things like physical security. 
The defining feature of physical security, or at least the current usual example of airport security, is that the choice made by one person affects many other people. As I argue in <a href='/blog/2007/03/freedom-not-democracy-1.html'>my previous post</a>, the value of open source comes from the market, the freedom to choose the solution that best fits the needs of the person doing the choosing. I'm not even sure I'm a fan of the principle of openness in physical security, at least not to the degree many people take it. Remember, the idea of security through obscurity was such a difficult idea to overcome in computer security precisely because it is such a useful idea in physical security. I am a fan of her libertarian decentralization ideas, but that isn't specific to anything related to the conference.</dd><dt>James Hughes</dt><dd>Made a good point to at least mention the "millennialist cognitive biases", but I think his political vision is hopelessly obsolete. Data is easier to hide, easier to move, and vastly more potentially useful than guns, but somehow the same political system which cannot prevent misuse of guns even when they do restrict legitimate use will magically gain the ability to shackle AGI with regulations.</dd><dt>Eliezer</dt><dd>Also mentions the editing of value functions. Seems to want something more similar to Hall's model than to Omohundro's. Uses terminology "terminal values" and "instrumental values". Possibly these are easier to understand when spoken. I think I prefer ends values and means values for written. Someone, not sure who, asked a question to the panel of the three above about whether AGI will impose its morality on us. Eliezer cites importance of valuing freedom as an intrinsic / terminal / ends value. Hughes waves his hands and says that somehow multi-national governmental bodies are going to be better at regulating such things than designing for friendliness.
I'll be sure to feel happy and safe when Iran is chair of the UN Commission on Permissible Uses of AGI.</dd></dl> <b>"All Watched Over by Machines of Loving Grace"</b><br /> I like to think (and<br /> the sooner the better!)<br /> of a cybernetic meadow<br /> where mammals and computers<br /> live together in mutually<br /> programming harmony<br /> like pure water<br /> touching clear sky.<br /> <br /> I like to think<br /> (right now please!)<br /> of a cybernetic forest<br /> filled with pines and electronics<br /> where deer stroll peacefully<br /> past computers<br /> as if they were flowers<br /> with spinning blossoms.<br /> <br /> I like to think<br /> (it has to be!)<br /> of a cybernetic ecology<br /> where we are free of our labors<br /> and joined back to nature,<br /> returned to our mammal<br /> brothers and sisters,<br /> and all watched over<br /> by machines of loving grace.<br /> <br /> Text from <a href='http://www.brautigan.net/machines.html'>Richard Brautigan Bibliography and Archive</a>. Brautigan initially published it with a vague free for non-commercial use license.</p>
Well how about that?tag:kdpeterson.net,2010:/blog//2.562007-08-06T22:37:12-07:00
<p>I actually won a game of chess online. I’m surprised, because my general experience has been that it’s all hard core chess geeks who play online, and they would slaughter someone like me who thinks that Ruy Lopez is Jennifer’s brother. But after warming up for a few weeks playing my girlfriend (she’s on 23 games now, about 16 of them against me, I’ve lost twice due to not paying attention while in teacher mode), I can actually hold my own and not be totally embarrassed.<br /> <br /> I play at the <a href='http://www.freechess.org/'>Free Internet Chess Server</a>. I’m looking forward to when I beat someone who has a few wins in their history to solidify my ranking (currently at “who the hell knows” due to having played a handful of games and my opponents having similar rankings).</p>
Atheists in Foxholestag:kdpeterson.net,2010:/blog//2.1012007-08-05T21:44:06-07:00
<p>I received the following in email from SF Atheists, who got it from American Atheists, who received it from Kathleen Johnson, who explains who she is in the first paragraph.<img class='inline' src='/images/atheist.jpg' alt='Atheist Dog Tag' /></p>
<blockquote>
Thought you'd be interested in this report of the first-ever meeting of Atheist service-members in Iraq under the umbrella of the MAAF-Iraq chapter of the Military Association of Atheists and Freethinkers. This meeting was put together by the same young MAAF member who recently had his second letter published in the Stars and Stripes.
One of our members, a young Atheist enlisted soldier, thought he would like to see if he could generate some interest in MAAF meetings at his Forward Operating Base (FOB) here in Iraq (not the base I'm at, by the way). He got things coordinated and started hanging flyers, and after weeks of having to re-hang his flyers almost daily because some vandal kept tearing them down, he finally succeeded in having a small MAAF meeting. I wasn't there because the meeting wasn't on my FOB, but I knew he was holding it and was expecting to hear from him after the meeting. Keep in mind that this young soldier did everything right - he went through the Chaplain's office and jumped through all the hoops it takes to legally hold meetings that are religiously or philosophically based. Four soldiers attended this meeting - all of them very junior enlisted soldiers with the exception of one Major (an O-4), who claimed to be a "freethinker".
Well, to make a very long story a little shorter, the Major turned out to be a fundamentalist Christian who verbally berated the other attendees, accused them of plotting against Christians and disrespecting soldiers who have died protecting the Constitution, and threatened them with punishment under the UCMJ for their activities (said they were "going down") and said he would do whatever it took to shut the meetings down. Keep in mind that by this point, he had two of the attendees (one soldier fled when the shouting started) standing at the position of attention so that he could yell at them, berate them, and humiliate them. This apparently went on for several minutes at which time the Major shut down the meeting by saying he wasn't some "push-over Chaplain" and that he would not tolerate the meetings to continue.
The young MAAF member who hosted the meeting is absolutely freaked out about what happened, but he said he's going to continue with the meetings and isn't going to be bullied by the prayer warriors. I've advised him to immediately notify the Chaplain sponsor of what happened to get guidance while I try to figure out what to do next. I should hear something back from him tonight sometime and there's even a small possibility I might be able to score a mission to his FOB and attend one of his meetings in the next few weeks (if I do, I'll meet with the Chaplain in person).
As for immediate action, he's going to get me the names of his Chaplain sponsor and the name of the officer who disrupted the meeting. My intent right now is to make a formal report to the most senior Chaplain I can find along with possibly an Equal Opportunity complaint against the officer if we can get him fully identified. I may not be eligible to make that complaint because I wasn't there, but I can at least smooth the way for this young troop to make one if he elects to. At the very least, I can make the EO office formally aware of what happened there.
More info will follow when I get it, but right now, feel free to disseminate this information since I've intentionally sanitized it for names and locations. I will be happy to forward any words of support to him if they get mailed to [redacted] he could really use some encouragement right now, I think.
</blockquote>
<p>I’ve removed Kathleen Johnson’s email address for privacy, but I’m confident you can contact her through the website <a href='http://www.atheistfoxholes.org/'>Atheists in Foxholes</a>. If not you can email me at Kevin at this domain. I have to say I find this somewhat hard to believe, but she assures me the events were witnessed by people in the MWR building who weren’t even part of the meeting.</p>
<p>I never experienced such outright persecution in the Marines. I wouldn’t be surprised if the soon to be “retired to pursue other opportunities” Major turned out to be a reservist. Very unprofessional behavior.</p>
<p>About the worst I ever had it was dog tags. The dog tags I was issued in boot camp came back saying “NO PREF”, which is what they put if you failed to list a religion. I would strongly prefer being able to identify as “none” or “not religious” because I do not consider atheism a religion, do not feel any affinity for others who call themselves atheist, and don’t have much inclination to join any sort of group or proclaim it, but none of those are options, so #75 it is. But even before a deployment when we were given the opportunity to have dog tags made (which was good because mine were not regulation – engraved rather than stamped), and after spelling “atheist” over the phone to make sure they got it right, they still came back saying “no pref.”</p>
<p>So I went down to the surplus store known to make stamped dog tags. And the bitch behind the counter gave me shit about being an atheist. Yes, that’s right pinheaded imbeciles, I’m off defending you from people who want to impose an Islamic theocracy on you, and no, being shot at did <em>not</em> somehow make me suspend my ability to think clearly. “Oh, there are men with AK-47s who hate me because their imam tells them I’m the great satan. Clearly there is an all powerful loving god watching out for me, I better stop thinking logically and masturbating.”</p>
Finished SF Marathon in 4:17tag:kdpeterson.net,2010:/blog//2.662007-07-29T20:15:00-07:00
<p>I finished the SF Marathon in 4:17:46. I’m very disappointed. I ran the first half in 1:50, way too fast, and just died on the second half. I haven’t yet signed up for the <a href='http://www.svmarathon.com/'>Silicon Valley Marathon</a>, but I am planning on running it to redeem myself and get an acceptable sub-4:00 time.</p>
Zen Satori and P vs. NPtag:kdpeterson.net,2010:/blog//2.902007-06-13T21:59:01-07:00
<p>Suzuki says satori is irrational; it cannot be explained.</p>
<p>Logical positivism says that we can only talk meaningfully about things which refer to experience. See for example Hempel’s “The Empiricist Criterion of Meaning” in <a href='http://en.wikipedia.org/w/index.php?title=Special:Booksources&isbn=0029011302'>Ayer's <i>Logical Positivism</i></a>.</p>
<p>Are these incompatible? I claim they are not, because even holding to the requirements of the positivists, there is still an area that allows things which cannot be explained, which I explain by analogy with the computational classes of P vs. NP. In computer science – and by computer science I mean math – P and NP are groups describing the difficulty of computing functions. P stands for Polynomial time, meaning that functions in P can be computed in a time proportional to a polynomial function of the length of the input. The simplest example of this is probably multiplication of two numbers. It’s easy to see that the simple method you learned in school takes time proportional to the product of the lengths of the two numbers, or in other words, it takes O(n<sup>2</sup>) time. On the other hand, there is no such efficient method for factoring a number into its prime factors. If you want to know the prime factors of 987654321, you pretty much have to just try every prime up to the square root of the number*. NP stands for Nondeterministic Polynomial time, which you don’t need to worry about; just remember that it is the class of all problems which can be <i>verified</i> quickly.</p>
<p>So here we have two groups of problems: P, which are those things which we can find the answers to, and NP, which are the things we cannot find the answers to, but can verify the answers are correct once we have them. (See <a href='http://en.wikipedia.org/w/index.php?title=Special:Booksources&isbn=053494728x'>Sipser</a> p. 243 for the key concept I will use.) These problems, both those in P and those in NP, are still well defined problems. We’re not even including things like “is this sentence false?” or “what’s the last digit of Pi”. We will refer to these last as undecidable problems.</p>
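<p>The find-versus-verify asymmetry can be sketched in a few lines of Python. This is a toy illustration only: trial division as described above for the slow search, and plain multiplication for the fast check. The function names <code>trial_division</code> and <code>verify</code> are my own.</p>

```python
# Toy illustration of the P vs. NP asymmetry: finding a factorization
# is slow (trial division), but verifying a claimed factorization is
# just multiplication.

def trial_division(n):
    """Factor n by trying every candidate divisor up to sqrt(n)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # divide out each prime as often as it occurs
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever remains is itself prime
        factors.append(n)
    return factors

def verify(n, factors):
    """Checking a claimed answer is fast: multiply and compare."""
    product = 1
    for f in factors:
        product *= f
    return product == n

print(trial_division(987654321))                   # the slow search
print(verify(987654321, [3, 3, 17, 17, 379721]))   # the fast check
```

<p>Of course trial division still finishes quickly for a nine-digit number; the gap between searching and checking only becomes forbidding as the inputs grow, which is exactly the point of the analogy.</p>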
<p>For the positivists, all statements fall into three groups. First are those statements that are analytic, which are true or false solely by virtue of their form, also known as <i>a priori</i> knowledge. In this class are the statements of math and logic. Next are the synthetic statements, empirical knowledge. All statements in this class refer directly or indirectly to knowledge about the world, that can be perceived by the senses. In the third class are the metaphysical statements which are neither analytic, nor statements which refer to things which can be perceived by the senses. This includes statements like “God loves you” and “colorless green ideas sleep furiously”. These statements are meaningless. I don’t mean that they are confusing or difficult to understand; rather, because they are neither analytic nor synthetic, there is no observation or chain of reasoning from observations which can have bearing on the truth value of these statements. (I’m skipping a whole section here where I should talk about Wittgenstein and Humpty Dumpty, but eventually this would result in the same conclusions but would be much more difficult to read and follow.)</p>
<p>Generally, religious thought falls into the second and third categories. Claims that the earth is 6000 years old, or that all your psychological problems are caused by things you overheard while unconscious, are synthetic (though clearly false) statements. Claims that Jesus is simultaneously different from and the same as his father would fall into the category of metaphysical or meaningless statements. It is neither true nor false that God is Love.</p>
<p>So what do we make of Zen? “There is something,” says Suzuki, “in Zen that defies explanation, and to which no master however ingenious can lead his disciples through intellectual analysis.” (<a href='http://en.wikipedia.org/w/index.php?title=Special:Booksources&isbn=038548349X'>Zen Buddhism</a>) Let us consider a master who wants to instill a particular state of mind in the student. If the state of mind is something like understanding Euclidean geometry, there are well established means to accomplish this task. The master will proceed by teaching the student certain axioms, showing him the relation between these, and so on. To put it in terms that Java programmers can understand, Euclidean geometry implements <a href='http://java.sun.com/j2se/1.5.0/docs/api/index.html?java/io/Serializable.html'>Serializable</a>.</p>
<p>This ability to express a concept in words, in such a manner that the concept can be clearly communicated from master to student, will correspond to P in my analogy. Just as some problems (P) can be calculated, so some ideas are explainable. And I claim that satori, while not explainable, is testable. That is, a master cannot impart satori to the student, but the master can identify whether the student understands Zen / has experienced satori / is enlightened (choose your favorite characterization of this phenomenon). So while I claim that Satori is a meaningful concept, and does refer to an empirically verifiable state of the world (of which the mind of the supposed zen master is part), it is also impossible to explain. This shouldn’t be taken to mean that I think all sorts of religious nonsense falls in the same category. Zen is the only thing I’ve found so far which I think meets this criterion.</p>
<p>To formalize somewhat, I believe that a core claim of Zen Buddhism is of this general form. There exists a state of mind, or world view, which we call <i>satori</i>, that once adopted, is persistent and stable. Satori offers peace of mind, the absence of fear of death, the lack of desire for worldly goods, happiness, and the desire to pass it on to others. Etc., etc. Satori cannot be explained in the sense that there is no known sequence of experiences which will consistently cause this state of mind. Satori can however be recognized by the responses of one who has experienced satori to certain experiences or situations (e.g. the stories where the master smiles when he sees that the student is finally enlightened).</p>
<p>What’s important is that the above contains no mention of anything metaphysical. This is what makes it a meaningful statement. I consider satori of questionable desirability due to my personality (I have no desire to lose desire), but I consider it a real phenomenon. Actually, I consider it a single example of a phenomenon that occurs to people of all sorts of religious backgrounds, but the Buddhists are better at intentionally triggering it and have a framework which allows them to see it as closer to what it actually is than those who experienced this state of mind in the context of 1st century Judaism or 7th century Arab polytheism. This last part is totally unsupported and I’m open to other interpretations.</p>
<p>Join me next time when I answer the koan, “who is the great master who makes the grass green?”</p>
<small>*Yes, I know this is wrong. I'm sure since you are such a transcendent genius that you remember that factoring is sub-exponential and in the next paragraph I outright assume that P != NP, you've already figured out where I'm going with this and don't need to read the wrong parts anyway. Aren't you missing a Star Trek re-run?</small>
Bay to Breakerstag:kdpeterson.net,2010:/blog//2.912007-05-20T13:56:10-07:00
<p>Finished Bay to Breakers in <a href='http://results.active.com/pages/searchform.jsp?rsID=44574'>58:07, 773rd out of 23692.</a> It hurt. Next year I may do it more “in the spirit”, meaning with a camelbak full of wine.<br /> <a href='/images/baytobreakersbiblarge.jpg' target='_blank'><img src='/images/baytobreakersbibsmall.jpg' /></a></p>
And you, sir, win the "dumb even by internet standards" awardtag:kdpeterson.net,2010:/blog//2.862007-05-15T22:50:55-07:00
<p>In this post titled <a href='http://www.bmezine.com/ritual/A70504/ritcruci.html'>Crucify Me</a>, “Patrick” writes,<br /> <br /> <blockquote>"Why?" The question becomes predictable, almost routine when someone enters my car. They immediately seem to notice the polished steel hook hanging from my rearview mirror, and inevitably want an explanation. Each time I hear that annoying three letter word, I'm almost compelled to launch into some sort of horribly graphic and disturbing explanation. One that might circulate and prevent any future inquiry as to why I choose to do so.</blockquote><br /> <br /> So let me see if I got this straight. You don’t like being asked about why you have a hook hanging from your rear view mirror, but you still have it hanging there. What? Did the annoying question gremlins sneak into your garage and superglue it in place? Make a deal with the devil that you will forever bring up a topic that you don’t like talking about?<br /> <br /> Forgive me if I find it a bit of a stretch. But it gets a little hard to buy the “it’s such a deep personal experience I couldn’t possibly describe it” when it seems you go out of your way to bring it up. Guess what, kid? People who really have things they don’t want to talk about don’t go around wearing a t-shirt saying, “Please ask me about my trendy hobbies so I can act nonchalant and downplay them.”</p>
1846 The Murder of José de los Reyes Berreyesa and the twin brothers named Haro by a group of bear (flag) partisanstag:kdpeterson.net,2010:/blog//2.812007-05-08T20:02:29-07:00
<i>Juan Pablo Bernal was born in San Francisco about 1810, son of José Joaquin and Maria Josefa Sanchez (both Anza settlers), married Rafaela Feliz, granddaughter of Joaquin Ysidro Castro (Anza settler) and (supposedly) daughter of Maria Antonia Amador daughter of Pedro Amador (Portola 1769 to Monterey).</i>
<p>José de los Reyes Berreyesa left Martínez in a boat for San Rafael in company with the two young men surnamed Haro. I do not recall the date. At the time they went ashore a band of Americans or the Bear forces, who were awaiting them on the beach, took them by surprise. It is an established fact that the elderly Berreyesa was the one they killed first and one of the Haro boys (which of the two I do not remember), seeing that Berreyesa was dead, said, “You’ve killed my uncle, now kill me too.” The murderers, angered by these words, fired at the one who had uttered them, killing him also. It is said that upon the other Haro brother witnessing this, he exclaimed, “You’ve killed my brother, so do the same to me!” At these words, he was dropped to the ground, wounded by a bullet aimed at him. It is told that they themselves buried the bodies of their victims in the same grave near the place of the crime.</p>
<p>The young Haros had a haughty bearing, were very amiable, with a courage equal to any test, and so much alike in appearance that even their very close acquaintances took them for one another frequently.</p>
<p>Berreyesa was a man of high honor, very affectionate toward his family, a great worker, of a pleasant disposition and also was very courageous in the face of danger, as was evident upon this occasion, when he faced death with great fortitude. It is said that when Berreyesa was questioned as to the purpose of his trip, he replied that he was on his way to visit his sons who were prisoners. But the Americans would not believe him, since they suspected him of being a messenger or spy of the enemy and, considering him to be such, they killed him without mercy, just as they did the young men, whom they believed to be his accomplices.</p>
<p>José de los Reyes Berreyesa was married to my sister, Zacarías Bernal. Joaquín de la Torre, with his company, was scouting under the orders of Comandante General Castro in the Lompalí section. The Americans, being aware of this, set out in pursuit and caught up with him near the last named place, attacking him with a considerable force. In the fight, which was bitterly contested, there was killed an officer named Manuel Cantúa, a very talented young man. It is known that the forces of la Torre suffered no other losses. It is not known as to whether there were any losses on the American side or not. Capt. Joaquín de la Torre, seeing he was beaten, fled in the direction of Sonoma, boarding a tule raft made by himself and an American with whom he made the crossing from Benicia. When he arrived at San Lorenzo, he found Don José Castro encamped there with his troops and informed them of the encounter with the enemy, the death of Cantúa and the routing of his troop, which consisted of about 40 men. Castro decided to remain in San Lorenzo two days longer, for the purpose of reinforcing his little army so he could go and attack those who had routed Capt. de la Torre. At the end of this time, however, whether on account of the fact that he did not consider himself sufficiently strong or for other reasons unknown, he gave the order to march to the Potrero de Santa Clara, which was done. We were encamped there about fifteen days, at the end of which time we marched to San Juan, where we arrived without mishap. On these marches nothing worthy of mention took place.</p>
<i>I stumbled on this list of <a href='http://members.aol.com/bernal411/anza.html'>Anza Expedition Families</a>, and figured if that was interesting to me, it's probably worthwhile to make this more obscure stuff available. It'd be nice if there were a Californio wiki, since that type of collaboration could make lots of information available that's currently scattered around in grandmothers' dusty notebooks. If anyone knows of such a thing, email me at kevin at this domain.</i>
Moderate Libertarianismtag:kdpeterson.net,2010:/blog//2.762007-04-27T23:38:57-07:00
<p>Libertarians like to point to polls that suggest “a lot of Americans are libertarians”, based on surveys like the <a href='http://www.theadvocates.org/library/poll-results.html'>Advocates' WSPQ</a>. Then they sit around wondering why no one votes for Libertarians. “Surely,” they say, “it must be because people are not aware of what we stand for.”<br /> <br /> Let’s imagine a situation where everyone forgot what the traditional parties are about, and had to consider all the candidates based solely on their positions. So the Libertarians, of course, nominate some pure thinking radical with the campaign slogan “How dare you suggest we tax prostitution and drugs after we legalize them. Taxes are immoral.” The Republicans find a good bible thumping reactionary with the idea to deport any “known homosexuals”, cut capital gains tax, and bring back slavery. The Democrats campaign on a “100% income tax over $65,000, free abortions for all” platform.<br /> <br /> That’s how it goes right?<br /> <br /> I think you see my point. I’m not going to vote for a wacko. I’m not going to vote for anyone whose claim to fame is being a heavy pot smoker, or dyeing his skin blue, or demonstrating a poor understanding of fractional reserve banking.<br /> <br /> Will I be voting for a libertarian next November? Probably not. I hate all of the Dems running for President and Feinstein too, so I’ll vote for Republicans in those races. My congressman, a Democrat, fled the holocaust and went on to get a Ph.D. in economics. I doubt I agree with him, but I think he’s at least qualified for the job.<br /> <br /> Guess what, party faithful? The people afraid of government mind control rays won’t vote anyway, because that would mean leaving their mountain cabin, or, heaven forbid, consenting to a government monopoly by mailing in an absentee ballot. This isn’t 18th century Boston. The revolution is over. 
Time to learn how politics work and get with the program.<br /> <br /> Oh, yeah, vote for me in … 2010 or so. Peterson, Libertarian for Whatever, cause I’ve got the balls to support school vouchers even though they are not ideologically perfect.</p>
The Wheelbarrow Principletag:kdpeterson.net,2010:/blog//2.622007-04-22T22:17:00-07:00
<p>So I made mention of what I was going to call the MacGyver principle, but I think the best pop culture reference for this idea is The Princess Bride. Slightly abbreviated:</p>
<dl> <dt>Westley</dt> <dd>And our assets?</dd> <dt>Inigo</dt> <dd>Your brains, Fezzik's strength, my steel.</dd> <dt>Westley</dt> <dd>That's it? Impossible. If I had a month to plan, maybe I could come up with something, but this.... &lt;shakes head&gt; ... I mean, if we only had a wheelbarrow, that would be something.</dd> <dt>Inigo</dt> <dd>Where did we put that wheelbarrow the albino had?</dd> <dt>Fezzik</dt> <dd>With the albino, I think.</dd> </dl>
<p>Consider that you are Westley. You have an objective. There is no partial success, either you succeed or you fail. The objective cannot be achieved without a wheelbarrow. Now we’ll deviate from the story a little bit. Assume that Westley cannot know if he will have access to a wheelbarrow yet. He is making his plans, and he will meet with Fezzik and Inigo at the castle. He has no access to a wheelbarrow, and no way to communicate his need to Fezzik and Inigo. What should he do?</p>
<p>Let’s outline two possible actions. In the optimistic course, Westley assumes that a wheelbarrow will be available. Maybe he finds one on the way to the castle, or Fezzik and Inigo bring one. Whether he succeeds or fails depends entirely on whether he happens upon a wheelbarrow. In the pessimistic course, he assumes he cannot get the wheelbarrow, so he … fails.</p>
<p>Westley is never worse off by assuming he will find a wheelbarrow. Sometimes he is better off. Assuming that he will find a wheelbarrow is always at least as good as not assuming. So now we are ready to formalize.</p>
<p>The wheelbarrow principle says, when an outcome cannot occur unless an unlikely event occurs, and this outcome is important,* you can assume that the unlikely event occurs.</p>
<p>So if you are treading water in the middle of the Pacific, you will die unless someone happens by, either a boat or a plane. The correct course of action is to assume that someone will happen by, and make your plans to maximize your survival given that assumption. So, <i>when</i> a boat or plane comes by, be sure you are ready to signal it.</p>
<ul>
<li>The outcome must be sufficiently important that the likelihood of the unlikely event times the value of the outcome makes up for the expended resources under the more likely case that you try and fail.</li>
</ul>
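The footnote's condition can be restated as a one-line expected-value comparison. A minimal sketch, with made-up numbers and function names of my own (nothing here comes from the post):

```python
# Toy expected-value model of the wheelbarrow principle. Assumptions:
# the objective is worth `value`, preparing for the unlikely event
# costs `prep_cost`, and the event (the wheelbarrow turning up) has
# probability `p`.

def ev_optimistic(p, value, prep_cost):
    """Plan as if the wheelbarrow will appear: pay the prep cost,
    and collect the value whenever the event actually occurs."""
    return p * value - prep_cost

def ev_pessimistic():
    """Concede the objective is unreachable: guaranteed failure."""
    return 0.0

# The footnote's caveat as code: assuming the unlikely event is
# worthwhile exactly when p * value covers the resources expended.
def should_assume(p, value, prep_cost):
    return ev_optimistic(p, value, prep_cost) > ev_pessimistic()
```

With p = 0.05, value = 1000, and prep_cost = 10, the optimistic plan's expected value is about 40 versus 0, so Westley prepares for the wheelbarrow; raise the prep cost to 100 and the inequality flips, which is exactly the footnote's point.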
The Validity of Circular Reasoning (incomplete)tag:kdpeterson.net,2010:/blog//2.572007-04-21T22:42:49-07:00
<p>As far as I know, no one has yet solved such questions as “how do you derive ought from is,” or “why is there something rather than nothing”, or “how can we be sure we aren’t just living in the matrix?” The difficulty lies in the fact that the best answers we have to these statements (“you can’t”, “no reason”, “you can’t”) aren’t very satisfying, and more importantly, they cannot be used to derive other conclusions, because of the weakness of the solutions to these initial problems. Is there any way around this? Yes. I’ll address solipsism specifically, the technical term for the “how do I know this isn’t the Matrix” question. Let us assume that I have constructed a beautiful and all-encompassing belief system, showing the path to happiness, virtue, and true understanding of the universe. However, it happens to also say that there’s a 95% chance that we’re actually brains floating in a vat dreaming our whole existence. But it’s a beautiful, elegant, totally logically sound system, it makes a handful of quite reasonable assumptions, and aside from the little disembodied brains problem seems pretty good.*<br /> <br /> Most likely, you wouldn’t accept it. After some discussion, we determine that you would be unwilling to accept any philosophical system that didn’t conclude that the world is mostly as it appears to be.** So my solution is to simply assume that the world is more or less how it is. Assuming I’m not abysmally bad at reasoning, I should end up on the other side with a system that still says the world is more or less how it is. Can I do this? Is it logically valid? Sure!<br /> <br /> My formal logic is a little rusty, and even if it wasn’t, I’m not sure how to type stuff in HTML, so let’s do it like this. Let <i>b</i> be “we might be floating brains.” For all world views <i>w</i>, if <i>w</i> implies <i>b</i>, I reject <i>w</i>. For any <i>w</i> I accept, <i>~b</i>. 
Hmm… then you do something with introducing other conclusions… universal and existential quantifiers… not sure how it works now. Shit.<br /> <br /> I swear I had it worked out. Maybe I’ll post the MacGyver principle of unlikely occurrences first, since that ties in pretty well with this.<br /> <br /> * A more likely scenario is that you can’t actually conclude anything without assuming solipsism is invalid.<br /> ** I leave “mostly as it appears to be” vague, but I don’t see this as a problem because most people have a reasonable flexibility in terms of accepting that the sun and moon are in fact different sizes, or similar things, and I expect that the theories which assume this would be well within this flexibility.</p>
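For what it's worth, the fragment that is stated does go through. A sketch of the formal version (this is my reconstruction of where the post seems headed, not the author's finished argument):

```latex
% Let b stand for ``we might be floating brains.''
% The acceptance rule as stated:
\forall w \;\bigl( (w \vdash b) \rightarrow \mathrm{Reject}(w) \bigr)
% Equivalently, by contraposition, any accepted world view fails to prove b:
\forall w \;\bigl( \mathrm{Accept}(w) \rightarrow (w \nvdash b) \bigr)
% The post's stronger reading, ``for any w I accept, ~b,'' additionally
% assumes each candidate w settles the question, i.e. w \vdash b or
% w \vdash \neg b; under that assumption the two readings coincide.
```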
Arm Teacherstag:kdpeterson.net,2010:/blog//2.1022007-04-18T23:45:07-07:00
<p>I heard talk on the radio this morning about some school training teachers how to barricade their rooms. The castle went away at the time we developed gunpowder. You want safer schools? Get rid of the gun-free schools bullshit. Guess what? Psychos aren’t particularly concerned about breaking these laws. The laws do nothing but keep the victims from protecting themselves. Easy solution, no training required:</p>
<p>Allow any credentialed teacher, administrator, or licensed psychologist working at a school who served in the military and was honorably discharged to carry a gun. If they’re responsible for children, then they better be responsible. If they served in the military, they know how to handle a gun. Will this guarantee that nothing like Colu…Black…whatever the massacre of the week happens again? No.</p>
<p>Well, I’ll stop arguing this, and instead just link to <a href='http://reason.com/news/show/119694.html'>this article at Reason</a> which makes the same argument, but like, more professional and stuff.</p>
<p><em>June 25 update:</em> Looks like <a href='http://www.lasvegasnow.com/Global/story.asp?S=6698362&nav=menu102_2'>someone is taking my advice.</a></p>
Tax Gas Moretag:kdpeterson.net,2010:/blog//2.962007-04-16T23:09:58-07:00
<p>Now, now, calm down there my gun toting, short haired, “if the poor don’t like being poor they can get a damn job” friends. The Bay has not gotten to me yet. My tofu consumption has not increased and I do not have a handsome “roommate” with a well trimmed goatee. More to the point, I drive a Ford Mustang rather than a Toyota Prius, because I’m good at math.<br /> <br /> Fact of the matter is, gasoline is an insignificant portion of the cost of car ownership. Based on my records, I spend about $1500/year on gas. Let’s assume I bought a Prius that got 3x the mileage my Mustang gets. Wow, I just saved $1000! And it only cost me $25,000.<br /> <br /> Figure the average yuppie buys a new $45,000 car every five years, drives it 50,000 miles, and gets $20,000 back on the trade-in. At 20mpg, that’s $7750/year for depreciation and gas. At 60mpg, that’s $6875. Those amounts are close enough that no sane person is going to make mileage a big factor in the purchase of a car. For about $75/month, he can get the big ass SUV with room for everything instead of the hippie mobile. (The story is different if you have a really horrendous commute.)<br /> <br /> I found some data at <a href='http://www.eia.doe.gov/'>the man's propaganda machine</a>. They say 840 million gallons of petroleum products are consumed per day. <a href='http://nationalpriorities.org/'>Long hair hippie treehuggers</a> estimate the Iraq war at around $105 billion a year. That’s 34 cents per gallon of petroleum products. (You can object that not all of our oil comes from the middle east, go and do the more complicated math, and you will end up at the same number. Proof left as an exercise for the reader.) Right now, the price of gasoline is subsidized by 34 cents from other sources, meaning income taxes, borrowing from the children, etc. So, jack it up 34 cents a gallon right there. That’s on top of the federal and state road taxes. 
I’d also support funding an appropriate percentage of emergency services with gasoline taxes.<br /> <br /> As to the environmental impact, now, libertarians are fond of saying, “let the market decide.” But don’t forget an important detail, what we want the market to <i>decide</i> is efficient allocation of resources. But the issue of assigning the right to those resources to begin with doesn’t happen in the marketplace. The environment is the epitome of a public good. So while I might be of the opinion that the Earth is big enough and old enough to take care of itself, some other people seem to have a strong emotional attachment to our current climate and the details of coastlines. Fine. Put a price on it. Let’s all agree on a fair market price for carbon emissions and specify where the payments go. Or cap it. Set it to the average emissions at some point, give each country what it currently has, and never touch it again (it is critical that these caps and allocations never be changed). Then just create a market. Let the poor countries pollute if they like. Or let them sell their carbon rights. Hell, let them go into debt if they please. But the question of transfer payments to countries run by criminal thugs (and I mean real, chop-his-head-off thugs, not the white collar criminals running our own country), is entirely separate from the question of market mechanisms to efficiently allocate scarce resources (in this case, the amount of pollution the environment can reasonably absorb). Conflating these separate issues cannot lead to reasonable policy.</p>
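Both the 34-cent figure and the "about $75/month" gap check out arithmetically. A quick sketch of the two calculations (the inputs are the figures cited in the post; the variable names are mine):

```python
# Implied per-gallon subsidy: war cost spread over annual petroleum use.
GALLONS_PER_DAY = 840e6        # EIA consumption figure cited in the post
WAR_COST_PER_YEAR = 105e9      # nationalpriorities.org estimate cited in the post

subsidy_per_gallon = WAR_COST_PER_YEAR / (GALLONS_PER_DAY * 365)
# works out to roughly $0.34 per gallon

# The yuppie comparison: $7750/year at 20 mpg vs $6875/year at 60 mpg
# for depreciation plus gas, per the post's assumptions.
monthly_gap = (7750 - 6875) / 12
# roughly $73 a month, i.e. the post's "about $75/month"
```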
The Ex Book Scamtag:kdpeterson.net,2010:/blog//2.972007-04-10T21:18:40-07:00
<p>So I get some annoying spam about <a href='http://www.theexbook.com'>this scam website</a> the ex book. The email of course says that some ex posted horrible horrible things about you, log on to the site to check them out. Of course, to log on you have to pay for a subscription. Just a quick hint to anyone clueless enough to fall for this: legitimate websites don’t register through <a href='http://www.domainsbyproxy.com/'>Domains By Proxy</a>.<br /> <br /> If someone wanted to, they would have no trouble winning a lawsuit against these losers. Unfortunately, they probably don’t have the assets to make it worthwhile.</p>
Borders doesn't sell bookstag:kdpeterson.net,2010:/blog//2.922007-04-03T21:55:14-07:00
<p>You might not have noticed this, since the change was gradual, but Borders and Barnes and Noble don’t sell books anymore.<br /> <br /> I know you’re thinking, “umm…dude, they’re bookstores, that’s what they do, they sell books.” Well, yes, I’ll grant that they still do allow you to exchange money for a bundle of paper, but that’s not their primary business anymore. Selling books as a business model has gone the way of the buggy whip maker. Now, the objective is to provide a place for people who like books to hang out, and monetize it.<br /> <br /> You may remember that word from the dot com crash of 1999. Let me use it in a sentence to refresh your memory:<br /> <br /> “Our website allows users to track hot air balloons. In real time.”<br /> <br /> “That sounds fantastic, but how are you going to monetize it?”<br /> <br /> “Monetize? Umm…we’re just going to build a really cool website and then go public.”<br /> <br /> ”I’ll be happy to take over payments on the BMW when you go bankrupt.”<br /> <br /> The mega bookstores clearly are not about selling books. I could sell books much cheaper using significantly less space. Hell, I could even go online and do it over the internet. Pretty clear they aren’t about providing the service of locating books either. What do they tell you when you walk in and say you’re looking for <em>Der Einzige und sein Eigentum</em>? Usually, they offer to order it for you.<br /> <br /> So what, then, do I claim Borders sells? Simple, they sell a public library with fewer reference books and more <em>Guide to getting it on</em>. But there’s no money in that, so they monetize the library by selling books and coffee.</p>
San Ramon Half Marathontag:kdpeterson.net,2010:/blog//2.872007-03-25T12:09:03-07:00
<p>In an attempt to answer the question “what’s the minimum effort I need to expend to run under 4 hours in July?” I decided yesterday to go to bed early so that I could get up at 5:00am and run a half marathon (13.1 miles) in San Ramon. <a href='http://www.runlikethewindinsanramon.com/results.html'>Results have been posted here</a>, I ran 1:59:22, just over the 9 minute pace I tried to hold to. 10th out of 18 in 25-29 males.<br /> <br /> This is the longest distance I’ve run, and I went there with the goal to come in under 2 hours by sticking to the pace, but I’m not good at it yet. My first mile was like 7:40, but by about mile 3 I had it sorted out. Started to run out of energy at the 5 mile mark. Left toe started going numb about 9 miles. All but the big toe on right foot numb by 11 miles. Had enough energy to step it up to probably around a 7 min pace the last few hundred yards, which means I could have set my pace faster.<br /> <br /> One thing that I was able to determine is that I am able to eat energy gel and drink some water while running without getting an upset stomach.<br /> <br /> 7:00pm addendum: my knees are not likely to hold up on a marathon unless I build up to it. I can walk, but I have trouble with stairs and getting out of the car right now. Hopefully I’ll be better by tomorrow.</p>
I'll never waste my time internet dating againtag:kdpeterson.net,2010:/blog//2.822007-03-22T08:38:40-07:00
<p>In <a href='http://forums.fark.com/cgi/fark/comments.pl?IDLink=2687954#c28906181'>this comment</a> on Fark.com, Eric Nave does some analysis of internet dating sites. Conclusion? All kinds of men, nothing but fat single mothers.</p>
First organized race: Emerald Across the Bay 12ktag:kdpeterson.net,2010:/blog//2.772007-03-18T13:30:42-07:00
<p>I came in 924th out of 2711 runners, with a time of 1:05:08 (an 8:44 mile pace). I was shooting for an hour, and thought I could do it, but… I guess I’ll need to train more. Results are available at <a href='http://www.doitsports.com/groups/results/timers-calendar.tcl?group_id=105'>doitsports.com</a>.</p>
<p>This is the first visible step in my progress towards running the SF Marathon in July (and running it under 4 hours). Thanks to Haley for suggesting this as a New Year’s resolution. Also, thanks for flashing us. Very nice.</p>
<p>So, what would I have wanted to know beforehand? <dl>
<dt>Safety Pins</dt>
<dd>I forgot about safety pins until the night before and had to stop at Safeway on my way to the race. Easy to forget because pinning a bib to your shirt is not something you normally do.</dd>
<dt>Sweats</dt>
<dd>I showed up at the shuttle pickup location wearing what I would run in, plus a sweatshirt. It was about 48 degrees (9 deg C). I would have been much more comfortable if I had worn sweatpants, and possibly even a jacket. This isn't the Marines, and I don't have to look like a hard ass anymore.</dd>
<dt>Bring your crap</dt>
<dd>Okay, this is going to vary from run to run, but for the Across the Bay 12k, I'd say go ahead and bring your backpack with stuff you will want after the race with you to the starting point. Obviously, don't bring valuables, but I was glad I had put my fanny pack with my car keys and some cash in with my sweatshirt. For Bay to Breakers, there's no sweat service, I can't say what's normal yet.</dd>
<dt>Push to the front</dt>
<dd>If you are going to run a respectable pace but are starting with the commoners in the last wave because you don't have a qualifying time for a better start position, push your way to the front before the gun. I probably could have gained 2-3 minutes by starting at the front.</dd>
<dt>Tape those nipples</dt>
<dd>Actually, I knew this one beforehand. I think my 3rd Marines 10k tank top has blood stains from where my nipples were. This is especially important if you wear cotton. Cotton is evil. Moisture + cotton = sandpaper. Point being, it's hard to keep track of which shirts happen to hit you just right to leave you in pain, and no one is ever going to know you wear band-aids on your nipples when running more than about 5 miles. Unless you go and post to your blog about it. If you like, you can actually find specially made nipple protectors like NipGuards.</dd>
</dl> Next up will probably be <a href='http://ingbaytobreakers.com/main.html'>Bay to Breakers</a>, also a 12k, but I don’t know if I’ll break an hour with the hillier course. Bay to Breakers is a much larger run. I’m going to sign up now that I have a time to give them, since I should be able to get into the second wave behind the actual competitive runners.</p>
This Is The Funniesttag:kdpeterson.net,2010:/blog//2.722007-03-09T23:11:20-08:00
<a href='http://xkcd.com/c191.html'><img src='http://imgs.xkcd.com/comics/lojban.png' /></a>
<p>This is the funniest thing I’ve read in a long time.</p>
Guess I better get runningtag:kdpeterson.net,2010:/blog//2.1032007-03-04T23:45:00-08:00
<p>Signed up for the <a href='http://www.rhodyco.com/across12k.html' target='_blank'>Across the Bay 12k</a>. This is about the furthest distance I’ve run, but I’ve run 5mi to 10k distances many times. Biggest difference is that this is the first time I’m doing it willingly, and I won’t have to sing cadence the whole way. I ran 45 minutes today, would guess about 5 miles, but don’t know. But the main thing I need to do to get ready for this (it’s only two weeks off) is to start getting to bed earlier.</p>
Freedom, not democracytag:kdpeterson.net,2010:/blog//2.732007-03-03T23:57:32-08:00
<p>I was actually being productive and doing some work related research when I stumbled on <a href='http://www.javaworld.com/javaworld/jw-02-2007/jw-0216-opensource.html?page=2' target='_blank'>this article on Open Source</a>. The content of the article isn’t that important. What prompted me to mention it is this:<br /> <br /> <blockquote>"Just because a big company open sources a product doesn't mean it will get used," says Neelan Choksi, vice president for Interface21, which makes the popular open source Spring framework for Java and Java EE applications. "Open source is a democracy—people vote with their feet."</blockquote><br /> <br /> Now, Choksi is making a point about the futility of using public relations tactics to get people to use crappy software, but what is more interesting to me is the last bit, “Open source is a democracy—people vote with their feet.” He’s right that it is about “voting with one’s feet”, but he’s wrong that it has anything to do with democracy.<br /> <br /> Democracy is a form of cooperative decision making. The process is that everyone agrees beforehand to be bound by a single decision arrived at by the group as a whole. Whether all votes are equal isn’t really crucial to the core idea. What’s important here is that there is only one decision, one outcome, and it will be binding on everyone.<br /> <br /> “Voting with one’s feet” describes a market. The distinguishing feature of a market is that each individual is able to choose among options. There are many decisions, many outcomes, and each is binding only on that individual.<br /> <br /> So which is open source software? Actually, being open source doesn’t have anything to do with it. Choosing what software to use is very definitely a market situation. Of course, you are limited by what is economically feasible. If what you want is something that lots of people want, then you’ll be able to get it. 
So if I want the performance of running Linux in 64 bit mode, then I have to do without being able to see websites that require Flash. If I want everything to work right out of the box and play lots of games, then I have to deal with Windows being a piece of crap. If I want to have a beautiful user interface built on top of a rock solid BSD core, then I have to fork over the money for Apple products, and even more for the increased overall software cost due to the limited availability. But being a market means that my choice only impacts me.<br /> <br /> If software were like democracy, then the internet would get together and decide we’re all going to use Windows, or all going to use Linux.<br /> <br /> Democracy has its place. If it really is necessary for there to be one, and only one decision made, then democracy is a good way to make that decision under some circumstances. Democracy has a pretty good record in the political arena. Democracies tend to be wealthier, more free, and significantly more peaceful than other forms of government. Democracy is the basis for governing public companies (although here it is “one share, one vote” not, “one person, one vote”). In politics, a caucus is a group (generally members of a political party or coalition within a legislative body) who agree to decide democratically among themselves how to vote on given issues and then be bound to vote with the group in the larger body.<br /> <br /> But too many people assume that decisions always need to be made collectively. That’s why I love the metaphor Choksi used, “voting with your feet”. If you live in a democracy, and you don’t like the decisions that are being made, that is, if it always seems that every decision ends up the opposite of how you would want things, maybe then it’s time to leave that group. 
If you have a group of friends, and you find yourself saying, “why don’t we go see this movie”, and everyone else wants to go to the bar, you can whine about it, ask them to take into account your desires from time to time, or try to convince them that it’s going to be a really good movie, but eventually, you may have to face the fact that it’s time to find a new circle of friends.<br /> <br /> What’s this an argument for? Devolution and decentralization of power. To go back to the concrete world, we shouldn’t be mandating school standards at the federal level. We should let the states try different things, and let people move to the state that suits them. Or even at the city level. Sure, there are things that need to be done at the federal level, or even international level. Two hundred years ago, the states had armies. Now, we don’t even go to war without an international coalition. The issue there touches on the subject of hard versus soft compromises, which I’ll get into another time.<br /> <br /> Democracy means one answer for all. The market means freedom. The first question to ask isn’t, “what should <i>we</i> do?”, it’s whether this is something that <i>we</i> need to do to begin with.<br /></p>
Hanging two of my grandfather's paintingstag:kdpeterson.net,2010:/blog//2.692007-01-24T23:05:40-08:00
<p>I’m in the process of figuring out how best to hang these paintings: <a href='/images/20070124-painting1.jpg'>here</a> and <a href='/images/20070124-painting2.jpg'>here</a>. They’re oil on wood. I’m thinking of attaching a 3/4” wood border around the back, flush with the edges, so that when a painting hangs, the border will be flush with the wall. It’ll also give some reinforcement to the larger painting, which is quite flimsy.</p>
Decorating the new apartmenttag:kdpeterson.net,2010:/blog//2.642007-01-09T00:34:45-08:00
<p>So I’m up in San Mateo, working at <a href='http://www.nextag.com'>NexTag</a>, and just trying to get into a routine. The apartment is coming along nicely, though it’s a little barren. I’m still unpacking my last few things. Some stuff got a little rumpled in transit.<br /> <br /> <a href='/images/flag_iron_large.jpg'><img src='/images/flag_iron.jpg' height='300' width='400' /></a></p>
The Low Limit Credit Cardtag:kdpeterson.net,2010:/blog//2.592006-12-28T00:38:41-08:00
<p>A simple idea to bring the web to millions of people who may be wary: a credit card with a low limit. I know I <em>could</em> get a low limit credit card, but it would take some effort and I’m not paranoid enough to go about it. Here’s my idea:<br /> <br /> It comes from a bank you already have a credit card with. They give you another account, with a limit that you can set yourself on their website (or that you can’t change online at all, for the really paranoid). So I can set my other credit card number to a max amount of $100. At any time, I can go to the bank’s website, review the charges, and move them over to my real credit card. If I see something suspicious, I can cancel the limited card, apply for a new limited card online, and voila: no possibility of more fraudulent charges, I haven’t had to cancel my real credit card, and I have a new number to use for web purchases. Like so:<br /> <br /> CC#1: my real card, I get a paper bill for this, credit limit $2500, current balance $450<br /> CC#2: first low limit account, saw fraudulent charges after using it to verify my age on… umm… nevermind what kind of site, canceled account, will be disputing those $70 in charges with my bank<br /> CC#3: replacement low limit account, reviewed charges, transferred the $40 purchase at amazon.com to CC#1<br /> <br /> There are details to work out, yes, but this could help get a lot of people who are reluctant to buy online more comfortable with the idea.</p>
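The account mechanics described above are simple enough to model in a few lines. A minimal sketch (the class and method names are my invention; a real implementation would live on the bank's side):

```python
# Toy model of the proposed low-limit companion card: a user-set cap,
# pending charges that can be reviewed and swept onto the real card,
# and a cancel-and-reissue step that never touches the real account.
class LowLimitCard:
    def __init__(self, limit):
        self.limit = limit      # user-adjustable cap, e.g. $100
        self.charges = []       # pending charges awaiting review
        self.active = True

    def charge(self, amount):
        """Authorize a purchase only while active and under the cap."""
        if not self.active or sum(self.charges) + amount > self.limit:
            return False
        self.charges.append(amount)
        return True

    def transfer(self, amount, real_balance):
        """Move a reviewed charge onto the real card's balance."""
        self.charges.remove(amount)
        return real_balance + amount

    def cancel(self):
        """Kill this number; the bank issues a fresh one at the same cap."""
        self.active = False
        return LowLimitCard(self.limit)
```

Walking through the post's scenario: with a $100 cap, an over-limit charge is rejected; the reviewed $40 amazon.com purchase moves onto CC#1's $450 balance; and cancelling after fraud yields a fresh number while the real card never stops working.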
Back in Calitag:kdpeterson.net,2010:/blog//2.1052006-12-12T16:37:56-08:00
<p>I’m back in California, staying with my parents in Thousand Oaks. I’m trying to move out before I go insane, but I don’t want to get an apartment before I get a job, because then I’d be limiting myself geographically. Even if I were to rule out one of Los Angeles or the Bay Area, I’d still like to wait so that I can ensure a short commute (well, if I ruled out Los Angeles, I would just move up there, cause it’s a pain doing this long distance).<br /> <br /> Bought a car last week; the ’98 Mustang should last until I get everything stabilized and my savings go back on the uptick.</p>
Turning off internettag:kdpeterson.net,2010:/blog//2.932006-11-29T20:06:00-08:00
<p>I’m flying out Friday, so I’m going to turn off internet (and turn in the modem) today. I’ll be on and off until I get into the new apartment.</p>
I'm looking for a jobtag:kdpeterson.net,2010:/blog//2.1042006-11-29T19:23:23-08:00
<p>I need a job. I’m a programmer, BA CS UC Berkeley 2002, spent the last four years in the Marine Corps. I’m interested in collaborative filtering, comparison shopping sites, ecash and micropayment systems, and any sort of innovative methods to improve people’s access to information.</p>
Homepage redirects to blogtag:kdpeterson.net,2010:/blog//2.982006-11-29T19:18:09-08:00
<p>This makes it a lot easier for me to add the type of content you are likely to see here. MySpace is for losers, real geeks have domains.<br /> <br /> You can all email me at kevin@kdpeterson.net. I get out of the Marines tomorrow.</p>
La Tigro de William BLAKE en Esperantotag:kdpeterson.net,2010:/blog//2.682001-03-28T23:01:00-08:00
<p>Like all geeks, I speak the artificial language of <a href='http://www.esperanto.net/'>Esperanto</a>. I learned from native speakers. Good luck figuring that one out :) Next up I’m going to learn <a href='http://www.lojban.org'>lojban</a>. <p>Tradukis mi. Notu: cxi tiu traduko ne estas perfekta. Mi estas nur komencanto, kaj mi tradukas por lerni.</p> Tigro tigro bruli brile<br /> En la arbaroj de la nokto<br /> Kio senmorta okulo aux mano<br /> Povus formi vian teruran simetrion<br /> <br /> En kio malproksima profundaj aux cxieloj<br /> Brulis la fajro de viaj okuloj?<br /> Je kio flugiloj auxdacus li asperi<br /> Kio la mano, auxdacus ekkapti la fajro<br /> <br /> Kaj kio sxultro, kaj kio arto<br /> Povus tordi la tendenojn de via karo?<br /> Kaj kien via karo komencis pulsi<br /> Kio timega mano kaj kio timega piedo?<br /> <br /> Kio la martelo, kio la cxeno?<br /> En kio forno estis via cerbo?<br /> Kio la amboso, kio timega teno<br /> Auxdacus gxia mortegaj teruroj teni?<br /> <br /> Kien la steloj jxetis iliajn lancojn malsupre<br /> Kaj akvis cxielo per gxiaj larmoj<br /> Cxu li placxigxis vidi lia verko<br /> Cxu li kiu faris la sxafidon faris vin?<br /> <br /> Tigro tigro bruli brile<br /> En la arbaroj de la nokto<br /> Kio senmorta okulo aux mano<br /> Auxdacus formi vian teruran simetrion?<p>Notoj</p><p>En la kvina verso, mi usas "cxielo" pro "sky", kiu signifas nenio mistika, sed en la dek oka verso, me usas "cxielo" pro "heaven", kiu signifas kie Dio logxas. En la dek tria verso, la angla "chain" signifas "cxeno", sed ankaux segustas "kateno". En la dek dua verso, "dread" signifas "timega" aux "terura", sed oni foje usas "dread" kiel titolo de respekto. Exemple vidu Hamlet I.ii.50. La kvina stanco estis tre malfacila traduki. En la tria verso de gxi, mi tradukis "smile" (rideti) kiel placxigxi, cxar gxi ne sugestas ridi. Sxafidon estas simbolo pro Kristo, kaj ankaux alia poemo de William Blake.</p><br /></p>
Third Party Success Through the Electoral Collegetag:kdpeterson.net,2010:/blog//2.582000-11-12T00:19:00-08:00
<p><i>First posted at <a href="http://www.ocf.berkeley.edu/~peterson/docs/electors.shtml">OCF</a>,
12 Nov 2000.</i></p> <h3>Third Party Success Through the Electoral
College</h3> <h4>Summary</h4> <p>By adopting proportional selection of
electors, we can eliminate the "wasted vote problem" and reduce the
likelihood of a mismatch between the popular vote and the final
winner.</p> <h4>Preface</h4> <p>For a while now, I've been toying with
the idea of replacing our current election system with a version where
we would randomly select, say, 1% of the registered voters to vote in
the next election, and give them six months notice to actually learn
about the issues. A lot of people I talk to tell me they don't like it
because it's not democratic or some such. The reason I like the
general idea is because it's really not worth the trouble to figure
out whose opinions you like better for something like state assembly,
where the news coverage stinks, and the candidates may not even bother
publishing much information.</p> <p>Recently, in the wake of the
chance that Bush may win the election while losing the popular vote,
and the obvious knee-jerk reaction (which would never succeed anyway)
of eliminating the electoral college, I got to wondering why we had
the electoral college at all. So I go to the source, and check out the
Federalist No. 68. In it, I find hints of reasoning similar to the
above. Hamilton's vision of the electoral college was for a number of
people to be selected directly from the populace, who then gather and
discuss who would be the best person for the job. "A small number of
persons, selected by their fellow-citizens from the general mass, will
be most likely to possess the information and discernment requisite to
so complicated an investigation."</p> <p>This, of course, bears little
resemblance to the current system of a bunch of people who have
already decided who they will vote for, and who will all vote for the
same person, getting together to write out some forms and be done with
it. What went wrong?</p> <h4>The Idea</h4> <p>The first idea I had was
that allowing proportional allotment of the electors would prevent the
kind of fights now going on in Florida. Since both candidates got near
50% of the vote, they would both end up with near half of the
electors, and it's less likely that they would need to fight to the
bitter end for that last one or two electors.</p> <p>Then I started
thinking about the details of how to do a "proportional" system. In
Maine and Nebraska, the only states that split the electors, whoever
wins the popular vote in the state gets the two votes corresponding to
the senators, and whoever wins each congressional district wins that
elector. It's better than nothing, and at least makes sense, but this
still means that third parties have no power.</p> <p>Consider the
California
results:</p> <table> <tr><td>Gore:</td> <td>53.7%</td></tr> <tr><td>Bush:</td> <td>41.5%</td></tr> <tr><td>Nader:</td> <td>3.9%</td></tr> <tr><td>Other:</td> <td>0.9%</td></tr> </table> <p>California
has 54 electoral votes. This election, they will all go to Gore. I
don't have data for the presidential race by district, but we have 20
Republican representatives and 32 Democrats now, so it's a good guess
that using a system like Maine or Nebraska we would get a breakdown
similar to this. This clearly better represents the will of the people
than does giving them all to Gore. But what about Nader? He received
more than 2/54 of the vote in California, but doesn't get a single
elector. If we give each candidate a number of electors equal to the
percentage of the vote they won times 54, rounded down, we would get
the following breakdown:</p><br />
<table> <tr><td>Gore:</td> <td>28</td></tr> <tr><td>Bush:</td> <td>22</td></tr> <tr><td>Nader:</td> <td>2</td></tr> <tr><td>Leftover:</td> <td>2</td></tr> </table> <p>Since we have to do something with the leftovers, and since complicated math and law don't mix, we'll just give them to Gore. So the final tally is Gore with 30, Bush with 22, and Nader with 2. Okay, I'm now going to give up on real statistics and just make up the following scenario:</p> <table> <tr><td>A:</td> <td>265 Electors</td></tr> <tr><td>B:</td> <td>265 Electors</td></tr> <tr><td>C:</td> <td>8 Electors</td></tr> </table> <p>Assuming that C's party prefers A to B, then C's electors can cast their votes for A, ensuring that the third party votes were not wasted, but also showing the media that C was a serious candidate.</p> <p>Now assume that C's party is indifferent between A's and B's party, and that A's party controls the House and the Senate. C can now offer a major coup to B. Perhaps C's electors could cast their vote for B, in exchange for B's electors casting their vote for someone in their party other than B's running mate who is more desirable from the point of view of C's party. (E.g., a Bush/McCain administration rather than Bush/Cheney. I can't think of a good example for the Democrats.) 
This final outcome is probably much more representative of the general will of the people than any of the outcomes that would come about without this vote trading.</p> <p>Those of you who have done any readings on mathematical analysis of voting will probably realize immediately that the following three situations are completely equivalent if the groups vote as blocks:</p> <table> <tr><th>Faction</th><th>Situation 1</th> <th>Situation 2</th> <th>Situation 3</th></tr> <tr><td>A</td> <td>34</td> <td>2</td> <td>26</td> <tr><td>B</td> <td>33</td> <td>49</td> <td>25</td> <tr><td>C</td> <td>33</td> <td>49</td> <td>49</td></tr> </table> <p>In all cases, blocks A, B, and C all have equal power, because any two of them can do anything by themselves, and no single block can do anything. With more blocks, it just gets harder to easily see whether they are equal, or whether one is irrelevant, or what. So if we were to adopt proportional allotment of electors, one might initially think that any three way impasse would give equal power to the third party (who may only be supported by a small percentage of the population). This would not be the case, though, because if they can't come to a compromise, it will be decided by the House and the Senate, who are not likely to go with the third party. The psychological positions of the various factions would also come into play. A minor third party could claim victory if they are able to win a small concession (e.g. McCain instead of Cheney) from a major party, and would find their support increased in the next election after this success. On the other hand, a major party that granted a large concession (choosing a third party vice-president) to a minor party would be hated and distrusted by the people and would find their support collapse in the next election.</p> <p>I expect if such a system were adopted, the idea of "major" vs. "minor" parties would completely disappear from the presidential race within a few elections. 
Instead, everyone would be able to vote for the candidate they actually liked, and coalitions would be built at the electoral colleges, and some compromise would be worked out. A third party vote would no longer be wasted in a large state, and you wouldn't even need to get that much of the vote in a not-so-large state. I remember Perot getting slightly less than 20% of the vote in 92. Twenty percent would be enough for at least one electoral vote in 37 states, so that vote would not have been wasted. And, of course, third parties would get more of the vote if people didn't think a third party vote was wasted.</p> <p>Admittedly, getting the states to adopt such a system is a pretty tough chore. If a state has a clear majority for one of the major parties, the majority of the people like who the votes are going for, and wouldn't want to split them. The only hope is to ram it through a few states where the minority party and third party sympathizers could actually form a majority. Your best bet is in states that have the initiative process, since the legislature is always going to be dominated by the majority party. Once a few states adopt it (Florida would probably be real sympathetic right now) third parties can start growing in power and then more states will want to adopt the system.</p> <h4>Further Reading</h4> <dl> <dt><a href="http://hanson.gmu.edu/ignore.ps">On Voter Incentives to Become Informed</a></dt> <dd>Robin Hanson's extremely technical analysis of the problem of it not being worth while to be a well informed voter. Sorry, only PostScript. I haven't read this, but it's on my reading list.</dd> <dt><a href="http://thomas.loc.gov/home/histdox/fed_68.html">Federalist No 68</a></dt> <dd>Hamilton's "The Mode of Electing the President" is the most direct discussion of the electoral college. Also available in many print editions.</dd> </dl><br />
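<p>The floor-based allotment worked through above (percentage of the vote times the state's electors, rounded down, leftovers to the statewide winner) can be sketched in a few lines. This is my own illustrative sketch, not anything from the original post; the function name and the "leftovers to the winner" tie-break are assumptions matching the California example.</p>

```python
from math import floor

def allocate_electors(results, total_electors):
    """Proportional allotment sketch: floor(share * electors) per candidate,
    with any leftover electors handed to the statewide winner."""
    alloc = {name: floor(share / 100 * total_electors)
             for name, share in results.items()}
    leftover = total_electors - sum(alloc.values())
    winner = max(results, key=results.get)  # statewide plurality winner
    alloc[winner] += leftover
    return alloc

# The 2000 California numbers from the essay: 54 electoral votes.
california = {"Gore": 53.7, "Bush": 41.5, "Nader": 3.9, "Other": 0.9}
print(allocate_electors(california, 54))
# Gore 28 + 2 leftover = 30, Bush 22, Nader 2, Other 0
```

<p>Running it reproduces the breakdown in the tables: Gore 30, Bush 22, Nader 2, matching the hand-computed tally of 28 + 22 + 2 = 52 with two leftovers going to Gore.</p>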
A DOS Batch File Virustag:kdpeterson.net,2010:/blog//2.631998-03-26T00:38:00-08:00
<h3>26. March 1998</h3> <p>Joshua Zarwel recently expressed disbelief in the possibility of batch file viruses on <a href = "news:alt.comp.virus">alt.comp.virus</a>. I was going to respond with the rumor I had heard about a virus which spreads by using the MS-DOS <i>find</i> command to extract itself, but I wondered about the details, so I went up a few cul-de-sacs before finding a method that allows the creation of a batch file virus.</p> <p>Please note that I am not a "virus writer." Don't send me any email saying, "Hey, d00d, I need to crash my junior high library's computer!!! Send me an email viriiiiii that will turn it into slag." If I thought this virus was at all dangerous, I wouldn't have posted it. If you are going to play with it, keep in mind that you have to type the command in UPPERCASE to avoid an infinite loop. If anyone knows how to make the <i>if string==string</i> operation case insensitive, please write me. Create yourself a new directory with some simple batch files in it, so that you don't end up infecting yourself (although a few minutes with a text editor will let you remove the virus from your system).</p> <p>The logical steps in the operation of a batch file virus are:</p> <ol> <li>Extract the virus portion of the host file</li> <li>Iterate through a selection of batch files to be infected</li> <li>Append the virus to each batch file</li> </ol> <p>Extracting the virus portion of the host file is best done using the <i>find</i> command, which works much like Unix <i>grep</i>, allowing one to output all lines containing a search string. So, we have to make sure all lines of our virus contain the same search string that isn't likely to occur randomly (or the virus would pick up random lines of other files).</p>
<p>Iteration is accomplished via the <i>for <b>var</b> in <b>list</b> do <b>command</b></i> construction. Simply enough, the list is just <i>(*.bat)</i>.</p> <p>Appending is done using the <i>copy</i> command, which supports adding two files together. <i>"copy file1 + file2"</i> will append <i>file2</i> to the end of <i>file1</i>. So here's the virus:</p> <pre>find "batvirus.xyz" %0.bat |find /v /i "%0.bat"> batvirus.xyz for %%b in (*.bat) do if not %%b==%0.BAT copy %%b + batvirus.xyz del batvirus.xyz</pre> <p>First, we get the name of the host file, which will probably be the first word of the command line with the suffix ".bat" added on to it. We search through it for the name of our temp file, because it appears in all three lines. Because <i>find</i> thinks you are a moron, it adds the name of the file searched to its output. The second <i>find</i> filters this out, by searching for the name of the host file in the output of find (remember that the %0 will be interpolated before <i>find</i> gets its arguments). The <i>/v</i> switch does the same thing as on <i>grep</i>: output all lines that <i>don't</i> match the pattern. <i>Find</i> doesn't add any file names when reading from a pipe. Using <i>type %0.bat | find "batvirus.xyz" > batvirus.xyz</i> would work just as well, but I thought of the other first. After this command finishes, we have a temp file called "batvirus.xyz", which contains the entire virus (any name would work, except that it cannot end in ".bat" or you get multiple copies of the virus). All we have to do now is infect all the bat files with it.</p> <p>We iterate through all the batch files in the current directory using the <i>for...do</i> construction. Since MS-DOS is such crap, you are limited to a single line command, and you can't even nest <i>for</i>s, like you would do to infect an entire directory tree rather than a single directory. 
The command the <i>for</i> command runs is an <i>if</i> statement that tests to make sure we aren't trying to reinfect the host file. This is where the biggest bug creeps in: <i>for</i> loads its arguments with uppercase names, so <i>%b</i> will evaluate to something like "PROGRAM.BAT". If the batch file was run by typing "program arg1 arg2 arg3", then <i>%0.BAT</i> evaluates to "program.BAT", which <i>if</i> doesn't think is the same thing. So unless the program was run as uppercase, we reinfect the host file, which causes the virus to repeatedly run. In testing, before I could hit CTRL-C, I had batch files several thousand bytes long filled with multiple copies of the virus. Note that there is no check if a batch file is already infected. This could be accomplished by a more complicated construction with <i>find</i> searching for the temp file's name in the batch file, and returning an errorlevel to indicate whether to infect or not.</p> <p>So now we have several infected batch files (if the directory had a lot). To avoid leaving incriminating evidence around, the last line deletes the temp file.</p> <p>For you aspiring criminals out there, one easy change is to turn echo off in the first line of the virus. The tricky part is to include the search string in the line to do this. If you're up for a challenge, it's even easier to write viruses in the Unix Shell. Dr. Cohen mentions that "once you are logged into a Unix system, you can type a 8 character command, and before too long, the virus will spread" (<i>A Short Course on Computer Viruses</i>, page 38). I haven't been able to figure out what this eight character virus is.</p> <p><small><i>11 March 2001 note:</i> I make reference to an "email virus" in the above. There were no email viruses until a year after I wrote this, so the meaning of that bit has changed slightly since writing.</small></p><br />