noodling towards a functional brain

Tuesday, April 21, 2009

Deconstruct(or)ing Scala

The question of whether to use a constructor or a factory method for object construction is not new; we've had this discussion for years in the Java community. Scala's approach to object construction has a few features that will undoubtedly reignite this debate.

On the one hand, ordinary object construction is significantly more powerful than in Java. First, all the ordinary Java boilerplate of assigning constructor parameters to member variables is abolished. What in Java would look like this:

public class Foo {
    final int myInt;
    final String myString;

    public Foo(int myInt, String myString) {
        this.myInt = myInt;
        this.myString = myString;
    }
}

becomes the concise and sensible

class Foo(val myInt: Int, val myString: String)

More significantly, Scala's trait mechanism allows for extension of a class definition at the use site; for example, if I have a trait that adds rendering logic to a class, I can mix it in only when I need it.

trait FooRenderer extends Foo {
  def render: String = "I'm a Foo! My int is "+myInt+" and my string is "+myString
}

val f1 = new Foo(1, "hi") // normal Foo
val f2 = new Foo(2, "well, hello!") with FooRenderer // renderable Foo

So Scala's constructors are really powerful and you really want to use them, right? But wait...

It turns out that there are some subtle issues that arise if you start adding more logic to constructors in Scala. The logic of a Scala constructor goes directly in the body of the class, and here's the tricky bit: this is also where other member variables, those that aren't simply assigned from constructor parameters, are declared and assigned. In Java, any intermediate variables that you used within a constructor were unavoidably local; in Scala they can easily (and will, if care is not taken) become a permanent part of the object.
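A minimal sketch of that difference, using two hypothetical classes that are not part of the original example: the first declares its temporary in the class body, so it becomes a member; the second confines it to a block expression, so it stays local. The block trick only helps when a single member is computed from the temporary.

```scala
// Hypothetical example classes, for illustration only.

// `trimmed` is only needed to compute `words`, but because it is declared
// in the class body it is a member of every LeakySentence.
class LeakySentence(raw: String) {
  val trimmed: String = raw.trim
  val words: List[String] = trimmed.split("\\s+").toList
}

// Here `trimmed` lives inside a block expression, so it is an ordinary
// local variable of the constructor and only `words` is a member.
class TidySentence(raw: String) {
  val words: List[String] = {
    val trimmed = raw.trim
    trimmed.split("\\s+").toList
  }
}

object SentenceDemo {
  def main(args: Array[String]): Unit = {
    val leaky = new LeakySentence("  hello functional world  ")
    println(leaky.words)
    println(leaky.trimmed) // the "temporary" is still reachable here

    val tidy = new TidySentence("  hello functional world  ")
    println(tidy.words)
    // tidy.trimmed would not compile: there is no such member
  }
}
```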

Let's look at something a little more complex. Consider a class that, as part of its construction, finds the most common element in a list and assigns both that element and the number of occurrences to member variables.

class Common[T](l: Iterable[T]) {
  val (value, count) = l.foldLeft(Map.empty[T,Int]) {(m, v) =>
    m + (v -> (m.getOrElse(v, 0) + 1))
  }.reduceLeft {(a, b) =>
    if (a._2 > b._2) a else b
  }
}

Now, there may be ways to implement this that avoid constructing and decomposing a tuple, but this is the most straightforward and efficient implementation I could come up with. A peek at the generated bytecode reveals something interesting, however:

93: putfield #83; //Field x$1:Lscala/Tuple2;
96: aload_0
97: aload_0
98: getfield #83; //Field x$1:Lscala/Tuple2;
101: invokevirtual #73; //Method scala/Tuple2._1:()Ljava/lang/Object;
104: putfield #85; //Field value:Ljava/lang/Object;
107: aload_0
108: aload_0
109: getfield #83; //Field x$1:Lscala/Tuple2;
112: invokevirtual #76; //Method scala/Tuple2._2:()Ljava/lang/Object;
115: invokestatic #91; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
118: putfield #93; //Field count:I

What's up with the putfield at offset 93? I didn't want that spurious Tuple2 to hang around as a member field; I just needed it as an intermediate value in the construction of the object!
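If you would rather not read javap output, Java reflection gives a rough view of the same thing. The sketch below repeats the Common class from above; FieldDump and fieldNames are names made up for this example. Exactly which synthetic fields show up, and what they are called, depends on the compiler version, so no output is shown here.

```scala
class Common[T](l: Iterable[T]) {
  val (value, count) = l.foldLeft(Map.empty[T, Int]) { (m, v) =>
    m + (v -> (m.getOrElse(v, 0) + 1))
  }.reduceLeft { (a, b) =>
    if (a._2 > b._2) a else b
  }
}

// Hypothetical helper, not part of the original post.
object FieldDump {
  // Lists "name: Type" for every field the compiler emitted for the class.
  def fieldNames(c: Class[_]): List[String] =
    c.getDeclaredFields.toList.map(f => f.getName + ": " + f.getType.getSimpleName)

  def main(args: Array[String]): Unit =
    fieldNames(classOf[Common[_]]).foreach(println)
}
```

Any entry of type Tuple2 in the printed list is the intermediate value that was only meant to exist during construction.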

As it turns out, this problem is not restricted to tuples; if you use intermediate variables in the construction of your objects, they will become permanent residents. This may not be a real problem in most circumstances, but it feels messy. Now, the standard thing to do in this situation would be to create a factory method on the companion object:

object Common {
  def apply[T](l: Iterable[T]): Common[T] = {
    val (value, count) = l.foldLeft(Map.empty[T,Int]) {(m, v) =>
      m + (v -> (m.getOrElse(v, 0) + 1))
    }.reduceLeft {(a, b) =>
      if (a._2 > b._2) a else b
    }

    new Common(value, count)
  }
}

class Common[T](val value: T, val count: Int)

Now there's no intermediate variable stored in the bytecode for Common, but we have a new problem: we can no longer construct the object from an iterable while mixing in an additional trait at the instantiation site!
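To make the limitation concrete, here is a small sketch built on the factory version above. CommonRenderer and FactoryDemo are hypothetical names introduced only for this illustration. The factory hands back a plain Common, so the only way to get a mixed-in instance is to rebuild it by hand from the already computed values.

```scala
class Common[T](val value: T, val count: Int)

object Common {
  def apply[T](l: Iterable[T]): Common[T] = {
    val (value, count) = l.foldLeft(Map.empty[T, Int]) { (m, v) =>
      m + (v -> (m.getOrElse(v, 0) + 1))
    }.reduceLeft { (a, b) =>
      if (a._2 > b._2) a else b
    }
    new Common(value, count)
  }
}

// Hypothetical mix-in, named for illustration only.
trait CommonRenderer[T] extends Common[T] {
  def render: String = value.toString + " occurs " + count + " time(s)"
}

object FactoryDemo {
  def main(args: Array[String]): Unit = {
    // The factory works, but the result is a plain Common with no mix-in.
    val plain = Common(List("a", "b", "a"))

    // Does not compile: Common has no constructor taking an Iterable.
    // val c = new Common(List("a", "b", "a")) with CommonRenderer[String]

    // Workaround: construct a second object from the computed values.
    val rendered =
      new Common[String](plain.value, plain.count) with CommonRenderer[String]
    println(rendered.render)
  }
}
```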

Scala supports the use of auxiliary constructors, with the caveat (similar to that present in Java) that the first statement in an auxiliary constructor must be a call to either the primary constructor or another auxiliary constructor. Because of this constraint, we can't simply use the contents of the factory method above in an auxiliary constructor. We can, however, evaluate a method within the chained call, and that gives us a workable, if somewhat boilerplate-laden, solution.

class Common[T](val value: T, val count: Int) {

  private def this(t: Tuple2[T, Int]) = this(t._1, t._2)

  def this(l: Iterable[T]) = this(
    l.foldLeft(Map.empty[T,Int]) {(m, v) =>
      m + (v -> (m.getOrElse(v, 0) + 1))
    }.reduceLeft {(a, b) =>
      if (a._2 > b._2) a else b
    }
  )
}

By threading the decomposition through a private constructor that takes a tuple, we can now avoid the spurious intermediate values getting incorporated into the class, and still enjoy the benefits of instantiation-site mix-ins. What's more, there has been a bunch of talk on the Scala mailing list about a future unification of tuples with method (and hopefully constructor) parameters - in which event the extra private constructor could disappear entirely!
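As a usage sketch, the auxiliary-constructor version can be instantiated from an iterable and mixed in at the same site. CommonRenderer, MixinDemo and build are hypothetical names added for this illustration; the Common class is the one shown above.

```scala
class Common[T](val value: T, val count: Int) {

  // Private bridge: unpacks the tuple into the primary constructor.
  private def this(t: (T, Int)) = this(t._1, t._2)

  // Public constructor: counts occurrences, then keeps the most common pair.
  def this(l: Iterable[T]) = this(
    l.foldLeft(Map.empty[T, Int]) { (m, v) =>
      m + (v -> (m.getOrElse(v, 0) + 1))
    }.reduceLeft { (a, b) =>
      if (a._2 > b._2) a else b
    }
  )
}

// Hypothetical mix-in, named for illustration only.
trait CommonRenderer[T] extends Common[T] {
  def render: String = value.toString + " occurs " + count + " time(s)"
}

object MixinDemo {
  // Constructs from an Iterable and mixes in the trait in one expression.
  def build: Common[Int] with CommonRenderer[Int] =
    new Common[Int](List(1, 2, 2, 3, 2)) with CommonRenderer[Int]

  def main(args: Array[String]): Unit =
    println(build.render)
}
```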

1 comment:

  1. Indeed, I faced the same issue a couple of months back, where I was accidentally leaking memory because of it (http://gnufied.org/2009/03/25/constructor-memory-leaks/).

    There ought to be a permanent solution for this nastiness.
