Make Scala lists work for you

(Well dammit, I meant to save a draft, and here I’ve accidentally published it and FriendFeed spreads it around like a gossipy neighbor, so I guess I’d better finish it.)

I’ve been working on a collaborative filtering system based on genetic algorithms for message boards, running shoe recommendations, the Netflix Prize for a while now. The latest iteration is a mix of Java and Scala. I sat down to clean up some of the code tonight, and wanted to rewrite a function that made use of dot product.

Scala, like pretty much every modern programming language, has a REPL, aka. interpreter, making it really easy to work the kinks out of something before getting ant, junit, or an IDE involved. But when I go to paste the methods I need to work with into the repl, curses, foiled again! That piece of my app is in Java, I’ll have to rewrite it in Scala.

What a wonderful opportunity to talk about how Lists will make all of your wildest dreams come true. Let’s see the Java version of unit vector:

public static double[] unit(double[] vector) {
    double[] unit = new double[vector.length];
    double norm = 0.0;
    for(double v : vector) {
        norm += v * v;
    } 
    norm = Math.sqrt(norm);
    for(int i = 0; i < vector.length; i++) {
        unit[i] = vector[i] / norm;
    }
    return unit;
}

Wow that’s a lot of typing for such a simple concept. How would I express “divide each component by the magnitude of the vector” in scala? Well, pretty much exactly like that:

def unit(v : List[Double]) = {
  val sum = Math.sqrt((0.0 /: v.map(x => x * x)) {_ + _})
  v.map(x => x / sum)
}

Let’s walk through this one. The innermost v.map(x => x * x) maps the vector to its squares. The /: does a fold left starting with 0.0, and applying { _ + _ }. Note that /: is a method call on the list returned by the v.map.... This gives the sum of squares, we take the square root to get the magnitude. Our last operation is another map. The tricky part here is the precedence of the /: operator means that you do need parentheses.

Now I have unit vectors, and I need the dot product that operates on unit vectors. If your math is hazy, dot product of unit vectors [a, b, c] and [x, y, z] is a * x + b * y + c * z. Here it is in Java:

public static double dot(double[] a, double[] b) {
    double result = 0.0;
    for(int i = 0; i < a.length; i++) {
        result += a[i] * b[i];
    }
    return result;
}

Pretty straightforward, but still imposes the mental effort on you of deciding a name for that variable, and I really hate that. Bottom line? Scala lets me avoid thinking up names for variables:

def dot(a : List[Double], b : List[Double]) = {
  ((a zip b).map{case(x,y) => x * y} :\ 0.0) { _ + _ }
}

We’ll walk through this one. I love zip. I like the Java 5 for-each loop, but it doesn’t do me any good when iterating over two lists in parallel. With Scala Lists, you can zip them together and treat them as a single list.

scala> List(1, 2, 3) zip List('a, 'b, 'c)
res25: List[(Int, Symbol)] = List((1,'a), (2,'b), (3,'c))

When I use map, I use pattern matching to get my two items back out. Map expects a function taking a single parameter, and gives it whatever is in the list, which in this case are actually Tuple2 objects. Pattern matching allows me to break that tuple apart. I sum the list up, this time using a fold right, which means I need to change the order of the parameters.

So now I’m able to play with things in the REPL, get my replacement code working, and paste it back into my project, and run the junit test to verify that my improvements didn’t break anything. At 11pm, between putting the baby to bed and going to bed myself, I don’t have much mental energy to hold huge methods in my head. Scala means I don’t need to.

The downside is that then I go back to work in the morning, I sit there staring at the screen wondering why Eclipse is not formatting my Java right and displaying little red errors until I realize that I actually need semi-colons and return statements in Java.