Let’s get physical

I’m a mechanical engineer and, although I went on to software engineering degrees right after I graduated, I did my fair share of numerical calculations during my five years of engineering studies. Some of it in Turbo Pascal, but most of it in the recommended language, which was Fortran. It was very spartan, but one thing it did well was get out of the way between engineers and their calculations. Around 2007, 17 full years away from purely numerical computing, I turned around to see what had changed in the domain of calculating with physical quantities. Believe me, every new version of Fortran that has come out is a huge improvement over Fortran 77, but I found that not much had changed in the calculations themselves; in particular, regarding calculations with physical units and proper dimensional analysis. The kind that could avoid disasters like these.

Before I tell you what I concocted in 2007 and then in 2014, let me stay a little longer on Fortran, or rather what was designed to be a modern-day Fortran: Fortress.

Fortress

Fortress, the brainchild of Guy Steele, was an experimental project at Sun to answer the HPCS challenge (cf. “Fwd: HPCS recap”). After it was not selected in subsequent rounds of the challenge, the project persisted for a total of ten years before being axed by Oracle. People attribute things like that to the “pragmatic” approach of Oracle towards Java. Which is a polite way to say that Oracle makes Java walk the streets for money, and probably gives her a good beatin’ whenever she comes back with low earnings.

Fortress had nailed the dimension thing, offering dimensions (described as “type-like constructs that are separate from other types”) and units (described as “special constructs that modify types and values […] that are instances of dimensions”). Dimensionless and dimensioned numbers can use the whole gamut of language constructs. In addition to the standard SI dimensions and units, many more are available in the standard library and more can be added.

Here is what some dimension-aware calculations would look like in Fortress (a language written to be read back as close to mathematics as possible, so this is how Fortress code was meant to look).

x:ℝ64 Length = 1.3m
t:ℝ64 Time = 5 s
v:ℝ64 Velocity = x/t
w:ℝ64 Velocity in nm/s = 17nm/s

Standard SI prefixes (nano, micro, etc.) and their abbreviations (n, μ, etc.) can be used. In order to avoid misunderstandings, these prefixes cannot be used in front of units of dimension Information (bit and byte).

Both dimensions and units can be multiplied, divided, and raised to rational powers to produce new dimensions and units. Multiplication can be expressed using juxtaposition or · (yes, Fortress actually gives the natural meaning of multiplication to juxtaposition!). Division can be expressed using / or per. The syntactic operators square and cubic (or the special postfix syntactic operators squared and cubed) may be applied to a dimension or unit in order to raise it to the second or third power, respectively. The syntactic operator inverse may be applied to a dimension or unit to take its reciprocal. E.g., grams per cubic centimeter, meter per second squared, inverse ohms.

All in all, a pretty solid treatment. Unfortunately, none of this was actually implemented… However, there is one existing language which does support dimensioned calculations in a very real way.

F#

F# is the only mainstream language, as far as I know, that contains specific constructs for working with units (Curl supposedly does also, but I’m not familiar with it and it’s a long stretch to call it mainstream).

It is not entirely pleasing, as there is no concept of dimension, only the concept of unit of measure, but all the machinery needed for safe dimensional calculations is there. As is common in the .Net and Java world, units of measure are defined using an attribute (annotation in Java-speak). SI predefined “measures” can be found in the PowerPack.

[<Measure>] type kg
[<Measure>] type s
[<Measure>] type m
[<Measure>] type cm = m / 100
[<Measure>] type ml = cm^3

let gravityOnEarth = 9.81<m/s^2>
let heightOfFall = 15.0<m>
let speedOfImpact = sqrt(2.0 * gravityOnEarth * heightOfFall)
let someForce = 10<kg m s^-2>

As you see in the example, units can be defined on their own (in which case they can be thought of as defining a dimension) or derived from some unit expression. Such expressions can be suffixed to numeric literals between angle brackets (without any intervening space). In a nod to Fortress, who knows?, juxtaposition stands for multiplication in the numerator of unit expressions. A dimensionless quantity is denoted by the unit expression <1>. One can even have generic units! And, you know, .Net has real multidimensional array types, value types, and arrays of value types. The whole makes a compelling proposal for a scientist or an engineer.

Dimenso

Back in 2007, none of that was there. So I set myself the goal of experimenting in that domain, leading to the open-source project Dimenso. I focused on SI quantities only and, of those, just the MKS part together with electric current (no temperature, no amount of substance and no luminous intensity).

Dimenso: C++

The prototypical implementation was in C++. Believe it or not, the template language in C++ is pretty advanced, even by today’s standards. One could really make a case that C++ template metaprogramming is akin to using dependent types, because templates in C++, unlike generics in the likes of Java or .Net, can actually be parameterized by integers, in addition to types. Here’s how the definition of the class that embodies dimensioned values would start: template<int L, int M, int T, int C> class DimensionedValue.

Defining what it means to multiply two dimensioned values together is this simple operator (yes! Java-lovers, most other languages do have operator overloads!).

template<int L2, int M2, int T2, int C2>
 DimensionedValue<L+L2,M+M2,T+T2,C+C2> operator*(DimensionedValue<L2,M2,T2,C2> other) {
     return DimensionedValue<L+L2,M+M2,T+T2,C+C2>(magnitude()*other.magnitude());
}

Given all the machinery available in C++, one can really make these objects look and behave like plain numbers. The challenge, then, was to go as close to that in C# and Java, which lacked support for dependent types.

Dimenso: C# and Java

The way I followed was code generation. I generated all dimension combinations within specific bounds (but I also provided the Python script, should one need a more extended set). These combinations were named after the corresponding exponents (for example, type m_2k0s1a0 represents dimension L⁻²T) but actual names from physics were used when available (though not in the right way… if you try to use Dimenso, stick to the ugly names).

This is how an arithmetic operation on “length” would look in Java, if I hadn’t messed up the name thing.

public velocity divide(time v2) {
    velocity ret = new velocity(this.value/v2.getValue());
    return ret;
}

In C#, which does have operator overloads and value types, the whole thing is a little more palatable. But still, generating hundreds of classes feels… kind of wrong. Let’s see what the Java of today has to say on the matter.

JSR-275, JSR-363 and JScience

To make a long story short, JSR-275 “Measures and Units” was finally rejected and its successor, JSR-363 “Units of Measurement API”, has not really taken shape yet. But abandoning the story at that point would make it really short, so I’m going to look at the support for units in JScience, which claims to offer exactly that.

Let’s look at some sample code, which will highlight the point I was trying to make with Dimenso.

// Compiler Warnings
Unit<Acceleration> mps2 = Unit.valueOf("m/s²"); // Compiler Warning.
Unit<Power> WATT = JOULE.divide(SECOND); // Compiler Warning.

// Adding run-time check of dimension consistency removes the warning.
Unit<Acceleration> mps2 = Unit.valueOf("m/s²")
    .asType(Acceleration.class); 
Unit<Power> WATT = JOULE.divide(SECOND).asType(Power.class);

// Use wildcard parameterized type (no warning and no check).
Unit<?> kelvinPerSec = KELVIN.divide(SECOND);

JScience has a lot of capabilities for units, including arbitrary systems of dimensions, localizable unit parsing and output and compound units, like those for common time. But there is no way to have the compiler figure out what units two quantities multiplied or divided among themselves have. This is a shortcoming of Java, not of JScience. And, unless you’ve been ruined by too many years of Java, you can tell that no scientist or engineer would put up with so much unwarranted verbosity to do his job.
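To make the limitation concrete, here is a minimal sketch (a hypothetical class of my own, not the JScience API) of what runtime-only dimension checking amounts to in plain Java: the dimension vector travels with the value, arithmetic combines the exponents, and a mismatch is only discovered when the code actually runs.

```java
// A minimal runtime-checked quantity. The dimension vector (L, M, T, I)
// travels with the value; addition checks it, but only at runtime.
// Hypothetical illustration, not the JScience API.
import java.util.Arrays;

public class Quantity {
    final double value;
    final int[] dims; // exponents of length, mass, time, current

    Quantity(double value, int... dims) {
        this.value = value;
        this.dims = dims;
    }

    Quantity multiply(Quantity other) {
        int[] d = new int[4];
        for (int i = 0; i < 4; i++) d[i] = dims[i] + other.dims[i];
        return new Quantity(value * other.value, d);
    }

    Quantity divide(Quantity other) {
        int[] d = new int[4];
        for (int i = 0; i < 4; i++) d[i] = dims[i] - other.dims[i];
        return new Quantity(value / other.value, d);
    }

    Quantity add(Quantity other) {
        if (!Arrays.equals(dims, other.dims)) // caught only at runtime!
            throw new IllegalArgumentException("dimension mismatch");
        return new Quantity(value + other.value, dims);
    }

    public static void main(String[] args) {
        Quantity dist = new Quantity(2.3, 1, 0, 0, 0); // 2.3 m
        Quantity time = new Quantity(1.7, 0, 0, 1, 0); // 1.7 s
        Quantity speed = dist.divide(time);
        System.out.println(Arrays.toString(speed.dims)); // [1, 0, -1, 0]
        try {
            dist.add(time); // compiles without complaint...
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage()); // ...fails here
        }
    }
}
```

Note that the nonsensical dist.add(time) compiles without a peep; contrast that with the F# and Scala versions, where the equivalent expression is rejected before the program ever runs.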

Do not despair, as Java is not the only choice you have on the JVM.

Scala

Let’s see what I’ve been able to do with some typelevel hacking in Scala. First of all, I’d like to point you to a better solution, Metascala Units, which did come up in my Google search; a tab in my browser had been open, waiting patiently for me to have a look, which I did, but after the fact. No matter, because what I really wanted was some practical experience with typelevel programming, with dimension-aware calculations as a good case to work on. The way Metascala encodes integers in the typesystem is a lot more concise and elegant than my own, which is based on Shapeless’ encoding of naturals.

Here is some safe code using Metascala Units:

val dist : Length = m(2.3)
val time : Time = s(1.7)
val x = dist * time
val speed : Speed = dist / time

I hope you can appreciate the fact that this looks like using built-in features. It is as concise as it can get and offers full compile-time checking.

And here’s my own stab at this. It’s not part of Dimenso yet: it’s not fully tested, I believe that if I build stronger typelevel kung-fu I’ll be able to handle output of units at compile time as well, and there is probably some more magic to add with implicit value classes.

case class DimVal[M1 <: ZInt, K1 <: ZInt, S1 <: ZInt, A1 <: ZInt](val value : Double) {
    type T = DimVal[M1,K1,S1,A1]
    def +(b: DimVal[M1,K1,S1,A1]) = new DimVal[M1,K1,S1,A1](value + b.value)

    def -(b: DimVal[M1,K1,S1,A1]) = new DimVal[M1,K1,S1,A1](value - b.value)

    def *[M2 <: ZInt, M <: ZInt, K2 <: ZInt, K <: ZInt, S2 <: ZInt, S <: ZInt, A2 <: ZInt, A <: ZInt]
    (b: DimVal[M2,K2,S2,A2])
    (implicit ev: ZSumAux[M1,M2,M], ev2: ZSumAux[K1,K2,K], ev3: ZSumAux[S1,S2,S], ev4: ZSumAux[A1,A2,A]) : DimVal[M,K,S,A] =
        new DimVal[M,K,S,A](value * b.value)

    def /[M2 <: ZInt, M <: ZInt, K2 <: ZInt, K <: ZInt, S2 <: ZInt, S <: ZInt, A2 <: ZInt, A <: ZInt]
    (b: DimVal[M2,K2,S2,A2])
    (implicit ev: ZDiffAux[M1,M2,M], ev2: ZDiffAux[K1,K2,K], ev3: ZDiffAux[S1,S2,S], ev4: ZDiffAux[A1,A2,A]) : DimVal[M,K,S,A] =
        new DimVal[M,K,S,A](value / b.value)

    def asString () (implicit i1: ToZInt[M1], i2: ToZInt[K1], i3: ToZInt[S1], i4: ToZInt[A1]) = {
        val x1 = i1.apply()
        val x2 = i2.apply()
        val x3 = i3.apply()
        val x4 = i4.apply()
        val u =
        (x1,x2,x3,x4) match {
            case (0,0,1,1) => " C"
            case (1,1,-2,0) => " N"
// ... more special cases to handle derived units elided ...
            case _ => {
                val sb = new StringBuilder
// ... ugly, imperative code elided ...
                if(sb.length > 0) " "+sb else ""
            }
        }
        value.toString + u
    }

}

The dimension calculations are done at the type level, using types ZSumAux and ZDiffAux, and type ToZInt handles conversion to language-level integers. Like I said, not all induction cases are tested, but here’s what they look like now.

import shapeless._
import shapeless.Nat._0

trait ZInt

case class ZPos[T <: Nat]() extends ZInt
case class ZNeg[T <: Nat]() extends ZInt

trait ZSumAux[A <: ZInt, B <: ZInt, C <: ZInt]
trait ZDiffAux[A <: ZInt, B <: ZInt, C <: ZInt]

object ZSumAux {
    implicit def sum3[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : SumAux[A, B, C]) = new ZSumAux[ZPos[A], ZPos[B], ZPos[C]] {}

    implicit def sum4[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : SumAux[A, B, C]) = new ZSumAux[ZNeg[A], ZNeg[B], ZNeg[C]] {}

    implicit def sum5[A <: Nat] = new ZSumAux[ZPos[A], ZNeg[A], ZPos[_0]] {}

    implicit def sum6[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[A, B, C], ev2: LT[B,A]) = new ZSumAux[ZPos[A], ZNeg[B], ZPos[C]] {}

    implicit def sum7[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[B, A, C], ev2: LT[A,B]) = new ZSumAux[ZPos[A], ZNeg[B], ZNeg[C]] {}

    implicit def sum8[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[A, B, C], ev2: LT[B,A]) = new ZSumAux[ZNeg[A], ZPos[B], ZNeg[C]] {}

    implicit def sum9[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[B, A, C], ev2: LT[A,B]) = new ZSumAux[ZNeg[A], ZPos[B], ZPos[C]] {}
}

object ZDiffAux {
    implicit def diff3[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[A, B, C], ev2: LT[B,A]) = new ZDiffAux[ZPos[A], ZPos[B], ZPos[C]] {}

    implicit def diff3b[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[B, A, C], ev2: LT[A,B]) = new ZDiffAux[ZPos[A], ZPos[B], ZNeg[C]] {}

    implicit def diff4[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[A, B, C], ev2: LT[B,A]) = new ZDiffAux[ZNeg[A], ZNeg[B], ZNeg[C]] {}

    implicit def diff4b[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[B, A, C], ev2: LT[A,B]) = new ZDiffAux[ZNeg[A], ZNeg[B], ZPos[C]] {}

    implicit def diff5[A <: Nat] = new ZDiffAux[ZPos[A], ZPos[A], ZPos[_0]] {}

    implicit def diff5b[A <: Nat] = new ZDiffAux[ZNeg[A], ZNeg[A], ZPos[_0]] {}

    implicit def diff6[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : SumAux[A, B, C], ev2: LT[B,A]) = new ZDiffAux[ZPos[A], ZNeg[B], ZPos[C]] {}

    implicit def diff7[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : SumAux[B, A, C], ev2: LT[A,B]) = new ZDiffAux[ZPos[A], ZNeg[B], ZNeg[C]] {}

    implicit def diff8[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : SumAux[A, B, C], ev2: LT[B,A]) = new ZDiffAux[ZNeg[A], ZPos[B], ZNeg[C]] {}

    implicit def diff9[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : SumAux[B, A, C], ev2: LT[A,B]) = new ZDiffAux[ZNeg[A], ZPos[B], ZPos[C]] {}
}

trait ToZInt[Z <: ZInt] {
    def apply() : Int
}

object ToZInt {
    implicit def toPosInt[N <: Nat] (implicit i: ToInt[N]) = new ToZInt[ZPos[N]] {
        def apply() = i.apply()
    }
    implicit def toNegInt[N <: Nat] (implicit i: ToInt[N]) = new ToZInt[ZNeg[N]] {
        def apply() = - i.apply()
    }
}

Some useful definitions are in object DimValOps.

object DimValOps {
    type Z0 = ZPos[_0]
    type Z1 = ZPos[Succ[_0]]
    type Z2 = ZPos[Succ[Succ[_0]]]
    type Z3 = ZPos[Succ[Succ[Succ[_0]]]]
    type Z4 = ZPos[Succ[Succ[Succ[Succ[_0]]]]]
    type Z_1 = ZNeg[Succ[_0]]
    type Z_2 = ZNeg[Succ[Succ[_0]]]
    type Z_3 = ZNeg[Succ[Succ[Succ[_0]]]]

    type length = DimVal[Z1, Z0, Z0, Z0]
    type mass = DimVal[Z0, Z1, Z0, Z0]
    type time = DimVal[Z0, Z0, Z1, Z0]
    type current = DimVal[Z0, Z0, Z0, Z1]
    type charge = DimVal[Z0, Z0, Z1, Z1]
    type area = DimVal[Z2, Z0, Z0, Z0]
    type volume = DimVal[Z3, Z0, Z0, Z0]
    type velocity = DimVal[Z1, Z0, Z_1, Z0]
    type acceleration = DimVal[Z1, Z0, Z_2, Z0]
    type momentum = DimVal[Z1, Z1, Z_1, Z0]
    type angular_momentum = DimVal[Z2, Z1, Z_1, Z0]
    type frequency = DimVal[Z0, Z0, Z_1, Z0]
    type force = DimVal[Z1, Z1, Z_2, Z0]
    type pressure = DimVal[Z_1, Z1, Z_2, Z0]
    type viscosity = DimVal[Z_1, Z1, Z_1, Z0]
    type energy = DimVal[Z2, Z1, Z_2, Z0]
    type power = DimVal[Z2, Z1, Z_3, Z0]
    type potential = DimVal[Z2, Z1, Z_3, Z_1]
    type capacitance = DimVal[Z_2, Z_1, Z4, Z2]
    type resistance = DimVal[Z2, Z1, Z_3, Z_2]
    type conductance = DimVal[Z_2, Z_1, Z3, Z2]
    type magnetic_flux = DimVal[Z2, Z1, Z_2, Z_1]
    type magnetic_flux_density = DimVal[Z0, Z1, Z_2, Z_1]
    type inductance = DimVal[Z2, Z1, Z_2, Z_2]

    val m = new length(1) //meter
    val kg = new mass(1) //kilogram
    val s = new time(1) //second
    val A = new current(1) //Ampere
    val C = new charge(1) //Coulomb
    val N = new force(1) //Newton
    val Hz = new frequency(1) //Hertz
    val Pa = new pressure(1) //Pascal
    val ohm = new resistance(1)
    val V = new potential(1) //Volt
    val S = new conductance(1) //Siemens
    val W = new power(1) //Watt
    val J = new energy(1) //Joule
    val F = new capacitance(1) //Farad
    val H = new inductance(1) //Henry
    val Wb = new magnetic_flux(1) //Weber
    val T = new magnetic_flux_density(1) //Tesla

    implicit def numToDim(d : Double) : DimVal[Z0,Z0,Z0,Z0] = new DimVal[Z0,Z0,Z0,Z0](d)
}

On the one hand, it looks frightening. On the other hand, it’s just a few lines of code to add a capability that has been missing since forever. Here is some code putting it through its paces.

    val c0 = new charge(4)
    println("c0="+c0.asString())

    val v = 3 * m / s
    println("v="+v.asString)

    val mom = 10 * kg * v
    println("mom="+mom.asString())

    val e = mom * v
    println("e="+e.asString())

    val p = e / (3 * s)
    println("p="+p.asString())

Which outputs:

c0=4.0 C
v=3.0 m·s-1
mom=30.0 m·kg·s-1
e=90.0 J
p=30.0 W

Look what happens when you try to compile an assignment with mismatched dimensions.

    val p : pressure = e / (3 * s)
Error:(37, 26) type mismatch;
 found   : ShapelessExperiment.DimVal[ShapelessExperiment.ZPos[shapeless.Succ[shapeless.Succ[shapeless.Nat._0]]],ShapelessExperiment.ZPos[shapeless.Succ[shapeless.Nat._0]],ShapelessExperiment.ZNeg[shapeless.Succ[shapeless.Succ[shapeless.Succ[shapeless.Nat._0]]]],ShapelessExperiment.ZPos[shapeless.Nat._0]]
 required: ShapelessExperiment.DimValOps.pressure
    (which expands to)  ShapelessExperiment.DimVal[ShapelessExperiment.ZNeg[shapeless.Succ[shapeless.Nat._0]],ShapelessExperiment.ZPos[shapeless.Succ[shapeless.Nat._0]],ShapelessExperiment.ZNeg[shapeless.Succ[shapeless.Succ[shapeless.Nat._0]]],ShapelessExperiment.ZPos[shapeless.Nat._0]]
    val p : pressure = e / (3 * s)
                         ^

Ok, the message is not entirely simple but, if you understand the type signatures, it is clear how to read it. In conclusion, even though Scala’s compile-time metaprogramming has been said to imitate, rather than implement, dependent types, you can see it is able to quack like a duck and walk like a duck. And it is perfectly capable of stepping in when Java proves too little, even though it cannot overcome fundamental limitations of the JVM.

Spring break

My witty-title generator was a bit off for this one. The pun does not imply that Spring is broken, nor that it has broken anything. But what I’m going to describe is bound to shake (and break?) a misconception around it.

Consider Spring JavaConfig (now officially part of the Spring distribution). I admit it’s not my favorite configuration option (and both of my regular readers know I have strong opinions on DI). But many people prefer it to XML configuration. And some of them believe something about it which is not true.

Classes annotated with @Configuration are scanned for bean declarations. At the same time, this annotation also creates a bean for the class itself. So far, so good. Throw @DependsOn into the picture, which can be used to inform Spring about a dependency that cannot be established through the regular means of constructor or property injection. Don’t get me started on how often people skip proper DI in favor of getting hold of the context, shamelessly pulling a bean out, then using @DependsOn to force the dependency. I might write another post about this (if I find a way to do it without giving in to my impulse to shout it in all caps with lots of exclamation marks). Suppose, for argument’s sake, that its use is warranted because of some hidden dependency on a side effect or something.

The problem, which prompted this post, is that some people think that using @DependsOn, from a configuration class to some other configuration class, affects the initialization order of the beans declared in them.

Let me demonstrate what’s wrong with this idea, and show you what @DependsOn really does in that case. Classes foo and bar are configuration classes, but I have also exposed their bean faces, by declaring initialization methods in both of them (imports elided for brevity).

@Configuration
public class foo {
	@PostConstruct
	public void init() {
		System.out.println("foo.init()");
	}

	@Bean
	public Long foobean() {
		System.out.println("foobean");
		return System.currentTimeMillis();
	}
}

@Configuration
@DependsOn("foo")
public class bar {
	@PostConstruct
	public void init() {
		System.out.println("bar.init()");
	}

	@Bean
	public Long barbean() {
		System.out.println("barbean");
		return System.currentTimeMillis();
	}
}

The two unit tests that follow register them in two different orders. Scanning a package is the same, except the registration order is arbitrary and does not lend itself to a repeatable test. I’ll tell you how to conduct that test later, if you want to try it.

	@Test
	public void testDependsOnOrderOne() {
		System.out.println("Running test with bar scanned before foo, and bar having @DependsOn(\"foo\")");
		AnnotationConfigApplicationContext ctx = new AnnotationConfigApplicationContext();
		ctx.register(bar.class);
		ctx.register(foo.class);
		ctx.refresh();
		final Long foobean = ctx.getBean("foobean", Long.class);
		final Long barbean = ctx.getBean("barbean", Long.class);
		Assert.assertTrue(barbean < foobean);
	}

	@Test
	public void testDependsOnOrderTwo() {
		System.out.println("Running test with foo scanned before bar, and bar having @DependsOn(\"foo\")");
		AnnotationConfigApplicationContext ctx = new AnnotationConfigApplicationContext();
		ctx.register(foo.class);
		ctx.register(bar.class);
		ctx.refresh();
		final Long foobean = ctx.getBean("foobean", Long.class);
		final Long barbean = ctx.getBean("barbean", Long.class);
		Assert.assertTrue(barbean > foobean);
	}

Have a look at what these tests (which both succeed) output.

Running test with bar scanned before foo, and bar having @DependsOn("foo")
foo.init()
bar.init()
barbean
foobean
Running test with foo scanned before bar, and bar having @DependsOn("foo")
foo.init()
bar.init()
foobean
barbean

Notice that @DependsOn does have an effect. It has the effect it is designed to have: force an initialization order between foo and bar. Notice also that it does not have the effect it is believed to have: force any order between beans inside foo and bar.

Scanning a package is the exact same thing, only the scanning order is not guaranteed. But it’s easy to overcome this problem: first establish which order happens to be used in your setup, then fix the @DependsOn the other way and assert that it is ignored.

If you’ve been using @DependsOn between configuration classes, it is time you did some Spring cleaning (a more appropriate pun than the one in the title?).

Fwd: HPCS recap

Not having much time to produce original content, it’s a pleasure to find a post on a subject I was considering covering, treated better than I could have hoped to manage myself: a post about HPCS on Bartosz Milewski’s blog. Other than the fact that Fortress is no longer under development but has “wound down” (as Guy Steele said in his keynote at the JVMLS 2013), the article is up to date. All three approaches resonate with the theme of my post on array processing. I just want to add that both Guy Steele (Fortress) and Vijay Saraswat (X10) have had great influence on me, the former via Common Lisp and Scheme (cf. TinyScheme), the latter via the “Concurrent Constraint Programming” book (cf. my thesis on Concurrent Constraint Logic Programming [Technical Report CERMICS 92-04]). And Fortress intended to tackle the problem of properly dimensioned calculations, which has also been one of my favorite subjects.

Even mainstream Java has been looking at how to do efficient and practical numerical computing (cf. “Arrays 2.0”).

Enjoy!

Lies, damned lies and methods implementing specialized generic interfaces

You know that I’ve been spoiled by spending years in a land with reified generics. I’ve heard Daniel Spiewak, in an episode of “The Scalawags”, say that he prefers erased generics because they permit more freedom in the typesystem, and Martin Odersky also swears by them, so I’m not ready to condemn them yet. But they can sure tie one in knots.

Here’s an account of a recent experience of mine, suitably distilled to its essentials. What follows is the javap disassembly of a completely boring pair of an interface and its implementation.

Compiled from "Interface.java"
public interface Interface {
  public abstract boolean toBoolean(java.lang.String);
}
Compiled from "Implementation.java"
public class Implementation implements Interface {
  public Implementation();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public boolean toBoolean(java.lang.String);
    Code:
       0: iconst_1
       1: ireturn
}

Surely you must be yawning by now. Let’s spice it up by introducing some Java 8-like base interface. toBoolean looks like Predicate#test, doesn’t it? But this time I’ll also show you the original Java, because you’re in for a surprise.

public interface Predicate<T> {
  public abstract boolean test(T);
}

public interface Interface2 extends Predicate<java.lang.String> {
}

public class Implementation2 implements Interface2 {
 public boolean test(String s) {
  return true;
 }
}
Compiled from "Implementation2.java"
public class Implementation2 implements Interface2 {
  public Implementation2();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public boolean test(java.lang.String);
    Code:
       0: iconst_1
       1: ireturn

  public boolean test(java.lang.Object);
    Code:
       0: aload_0
       1: aload_1
       2: checkcast     #2                  // class java/lang/String
       5: invokevirtual #3                  // Method test:(Ljava/lang/String;)Z
       8: ireturn
}

What the…? Apparently, the mere act of defining Interface2 as a specialization of Predicate actually specialized the signature of test, so as to allow uses of the “real” test(String). But, at the same time, the compiler injected an erased test that calls the specialized one. That’s how I found out about it: I was creating the implementation with Javassist, and the Javassist compiler did not perform this sleight-of-hand. Which was OK until I tried to call the method and got an AbstractMethodError. Not a very common sight, is it?

Oh, and you cannot do this explicitly with Java code, because trying to do so gives you an error.

OtherImplementation2b.java:2: error: name clash: test(Object) in OtherImplementation2b and test(T) in Predicate have the same erasure, yet neither overrides the other
 public boolean test(Object o) {
                ^
  where T is a type-variable:
    T extends Object declared in interface Predicate
1 error

boolean test(java.lang.Object) should have been final, but isn’t. Even so, the aforementioned error does not let you override it, anyway. You can only override the specialized method.

Ultimately, it is not just the compilers for “other” languages that need to jump through hoops to get around the limitations of generics. This is a case of the Java compiler itself jumping through hoops. Remember that the Javassist-created class compiled fine and was loaded fine, but the error struck at runtime. Beware!

UPDATE: These phantom methods are called “bridge” methods, and they are utilized in all places where the typesystem needs to express things the JVM cannot. They sport the ACC_BRIDGE flag.
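You can actually watch the bridge method through reflection. The sketch below uses the real java.util.function.Predicate (which has the same shape as the article’s Predicate); the class names are made up for the demo.

```java
// Observing the compiler-generated bridge method via reflection.
// java.util.function.Predicate declares boolean test(T), so implementing
// Predicate<String> makes javac emit a bridge test(Object) alongside
// the specialized test(String).
import java.lang.reflect.Method;
import java.util.function.Predicate;

public class BridgeDemo {
    public static class Impl implements Predicate<String> {
        public boolean test(String s) { return true; }
    }

    public static void main(String[] args) throws Exception {
        // Two declared methods named test: the specialized one and the bridge.
        for (Method m : Impl.class.getDeclaredMethods()) {
            System.out.println(m.getName() + "("
                    + m.getParameterTypes()[0].getSimpleName()
                    + ") bridge=" + m.isBridge());
        }
        // The erased test(Object) merely checkcasts and delegates:
        Method erased = Impl.class.getMethod("test", Object.class);
        System.out.println(erased.invoke(new Impl(), "hello")); // prints true
    }
}
```

Method.isBridge() (or, equivalently, checking the ACC_BRIDGE flag) is exactly how you can tell the phantom method apart from the one you wrote.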

Intersection of automata, and why you should care

Regular readers of this blog (both of them…) know that I have a propensity to make actual use of basic theory, and this has served me really well, all these years, primarily because most others don’t. After all, you know the saying that, in theory, theory and practice do not differ but, in practice, they do. Well, the point is to know when they do and when they don’t. And I’ll stop being cryptic now and tell you that the theory concerning this post is automata theory.

I first used statemachines as an explicit programming abstraction around 2001, in a system that handled SMS dialogs with mobile subscribers (I say explicit because, at some level, any system mutating state transitions between the states of an implicitly defined statemachine). Statemachines are a very concise way to model such things, as they have a memory of exactly unit length. You collapse all past history onto this one thing, which is the current state. You gain an impoverished but surprisingly sufficient language (a regular grammar) to handle events.

And then reality intervenes, and things become messy. There are Moore machines and Mealy machines, and there are statechart diagrams, the UML-standard way to model statemachines for software, which encompass both. We don’t just care to recognize a series of events; we want to act on them, so we need a finite state transducer. I’ll use the Mealy viewpoint (actions linked to transitions) in the sequel. Let me relate some thoughts of mine on how to think about transition guards.

We start with a sample transition, as in the diagram.

Statemachine transition with guard

You move from State 1 to State 2 when Event 1 arrives, but only when guard() is true. Supposedly, guard() is a boolean expression on the World. It can be either true or false (or else we wouldn’t bother asking) and, supposedly, the World sometimes makes it true and sometimes makes it false. I’ll come back to this in a moment. Suppose, now, that the World being in a state that makes guard() true is represented as an actual state in another statemachine.

Two automata

We now have two statemachines, and I’ve taken the liberty to make the World statemachine understand the Event 1 transition as well (not doing anything important with it). The reason I did that, was to pave the way to forming the intersection of the two statemachines.

Product of automata

What just happened is that we got rid of the guard, at the cost of having to maintain a second statemachine, the World one. In a system where transitions of the World statemachine are also performed from within the system itself, this amounts to optimizing the guard calculations to happen only when the World changes, and memoizing the result. The main statemachine should also understand (and similarly ignore) the events of the World one, so that the intersection statemachine is well-defined.

Don’t get all worked up about the complexity of all that, because you don’t have to physically form the intersection. All that “understanding” and “ignoring”, you should read it as happening implicitly. In plain words, you can just define transitions in the second machine in response to World-changing events, and transitions in the first machine in response to the main events, but the latter transitions will be defined on product-states, formed from one state from each statemachine.
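The implicit construction can be sketched in a few lines. All names (events, states, the guard condition) are made up for illustration; the point is that the main machine’s transition on Event 1 is defined on product states, consulting the World machine’s current state instead of re-evaluating a guard().

```java
// Guard elimination by (implicit) product of two statemachines.
// Illustrative sketch with invented names: the former guard() computation
// now happens only when the World changes, and its result is memoized
// as the World machine's current state.
public class ProductDemo {
    enum MainState { STATE1, STATE2 }
    enum WorldState { GUARD_TRUE, GUARD_FALSE }

    MainState main = MainState.STATE1;
    WorldState world = WorldState.GUARD_FALSE;

    // World-changing events drive the second machine.
    void onWorldChanged(boolean guardHolds) {
        world = guardHolds ? WorldState.GUARD_TRUE : WorldState.GUARD_FALSE;
    }

    // Main events transition on the product state (main, world).
    void onEvent1() {
        if (main == MainState.STATE1 && world == WorldState.GUARD_TRUE)
            main = MainState.STATE2;
        // in every other product state, Event 1 is "understood and ignored"
    }

    public static void main(String[] args) {
        ProductDemo m = new ProductDemo();
        m.onEvent1();               // guard state is false: nothing happens
        System.out.println(m.main); // STATE1
        m.onWorldChanged(true);     // the World machine transitions
        m.onEvent1();
        System.out.println(m.main); // STATE2
    }
}
```

Nothing here physically builds the product automaton; the pair of current states is the product state, which is exactly the “read it as happening implicitly” above.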

I’ll be spending more time on statemachines in the immediate future, so you may expect more stuff coming from that direction. I can’t let you in on job-related things, but I’ll be happy to share basic knowledge.

HoSQL Type System proof-of-concept

Well, even “nugget”-sized posts proved to be beyond my ability these days. You see, I’m in a new job, with everything that goes with that. It’s exciting but super-exhausting, invigorating but frustrating, and it would have given me a zillion things to blog about, had I the time to do it. I’m back on the JVM, as you probably gathered from my love affair with Scala. But this post is not about that. This post is, actually, a continuation of the one about Higher-Order SQL, triggered by my discovery of Yeti.

Yeti is a strict language of the ML family (though syntactically not close to Standard ML), with a characteristic that suits me very much: it supports structures in its typesystem, which is of the Hindley-Milner variety. This means that some of the work for the “Higher-Order SQL” typesystem has already been done (the unification part; the constraint-satisfaction part I still have to do myself). How does this work? I’ll draw upon Yeti and papers on strongly-typed query languages to try and answer that.

Yeti supports structures, otherwise known as records. Staying informal, a structure type is a collection of labeled types. For example, the type of {bar="3", foo=3}, in Yeti, is {bar is string, foo is number} (that’s actually the way the REPL shows it). If you read how structures behave in Yeti, it will probably remind you of HList in Haskell (at least if you know as little Haskell as I do). BTW, even though Scala is not a functional-first language, its malleability has given opportunities for lots of functional explorations: check out HList in Shapeless by Miles Sabin, a fellow described by Dick Wall as “not being comfortable unless he’s programming with his hair on fire”. Here, I managed to slip a bit of Scala in, again.

We can use structure types to simulate relation types, if we forget, for a moment, that relations are sets of structures. Enclosing everything inside a set wouldn’t add anything to our discussion.
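Scala can give a loose feel for this with structural types (a rough analogue only: Scala checks these statically but dispatches them by reflection, and performs no Hindley-Milner inference; Row and project below are my own names, not any library’s).

```scala
import scala.language.reflectiveCalls

// A "row" having at least fields a and b, loosely mirroring the Yeti
// input type {.a is 'a, .b is 'b}. Row and project are illustrative names.
type Row[A, B] = { def a: A; def b: B }

// Projection keeps just a and b, mirroring the output type {a is 'a, b is 'b}.
def project[A, B](x: Row[A, B]): (A, B) = (x.a, x.b)
```

A row with an extra field still matches the input type, so `project(new { def a = 0; def b = ""; def c = 0 })` type-checks, much like the dotted Yeti types tolerate extra fields.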

Let’s compile a cookbook of HoSQL expressions together with static types and the Yeti expressions I used to coerce the compiler to perform the needed unifications. Newlines cannot be used in the REPL, nor are they always produced in the output, but I’ll insert some manually to wrap the expressions conveniently.

Projection. In HoSQL, unlike SQL, you don’t always have to use projection. But Yeti does not have an empty structure type, so I’ll skip past the bare from x expression, with x completely unconstrained; it would just result in an unconstrained type. Apart from that, the dot-star shortcut can only be used as a syntactic shortcut for known relations and is not part of HoSQL. I never intended HoSQL to represent syntactic abstractions.

HoSQL:

select x.a, x.b from x

Type:

{.a is 'a, .b is 'b} -> {a is 'a, b is 'b}

Ignore the dots; dots just mean inferred fields, while their absence means provided ones. In our case, one might interpret the dots as meaning that the input might have other fields, which we care not about, while the output is just what we inferred.

Yeti session:

> do x: {a=x.a, b=x.b} done
<code$> is {.a is 'a, .b is 'b} -> {a is 'a, b is 'b}

Unifying this with an actual relation produces a more specific type. I’m using a structure literal to generate the intended type here; it is the equivalent of a first-order term, which can be used to generate SQL (the actual value shown, of course, is immaterial).

Type:

{a is number, b is string}

Yeti session:

> do x: {a=x.a, b=x.b} done { a=0, b="", c=0 }
{a=0, b=""} is {a is number, b is string}

Selection.

HoSQL:

from x where x.a>3 and length(x.b)>4

Type:

{.a is number, .b is string} -> {a is number, b is string}

Yeti session:

> do x: if x.a==0 and x.b=="" 
> then {a=x.a, b=x.b} else {a=x.a, b=x.b} fi done
<code$> is {.a is number, .b is string} -> {a is number, b is string}

The specific expressions are immaterial, inasmuch as we just care that they constrain the types appropriately.

Union. The good part begins here.

HoSQL:

select x.a, x.b from x
union
select y.a, y.b from y

Type:

{.a is 'a, .b is 'b} -> {.a is 'a, .b is 'b} -> {a is 'a, b is 'b}

Yeti session:

> do x y: if x.a==y.a and x.b==y.b 
> then {a=x.a, b=x.b} else {a=y.a, b=y.b} fi done
<code$> is {.a is 'a, .b is 'b} -> {.a is 'a, .b is 'b}
 -> {a is 'a, b is 'b}
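A rough Scala analogue of the union typing (my own sketch, not any library’s API): routing both operands through the same type parameters is what forces the two relations’ columns to unify.

```scala
import scala.language.reflectiveCalls

// Both operands must agree on the types of a and b, loosely mirroring
// {.a is 'a, .b is 'b} -> {.a is 'a, .b is 'b} -> {a is 'a, b is 'b}.
// The name union and the List result are illustrative only.
def union[A, B](x: { def a: A; def b: B },
                y: { def a: A; def b: B }): List[(A, B)] =
  List((x.a, x.b), (y.a, y.b))
```

Extra columns on either side are tolerated, just like the dotted fields: `union(new { def a = 1; def b = "p" }, new { def a = 2; def b = "q"; def c = true })` still type-checks.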

Join. And the good part continues.

HoSQL:

x inner join y on x.a = y.a

Type:

('b is {.a is 'a}) -> {.a is 'a} -> {x is 'b, y is {.a is 'a}}

Yeti session:

> do x y: if x.a==y.a then {x, y} else {x, y} fi done
<code$> is ('b is {.a is 'a}) -> {.a is 'a} -> {x is 'b, y is {.a is 'a}}

Recursive CTE. But the best part is here.

HoSQL:

let t = 
 select x.a, x.b from x
 union
 select t.a+1, t.b-1 from t where t.b>0
t

Type:

{.a is 'a, .b is number} -> {a is 'a, b is number}

Yeti session:

> f x = if x.b==0 then { a=x.a, b=x.b } else f x fi
f is {.a is 'a, .b is number} -> {a is 'a, b is number} = <code$f>
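Roughly the same shape can be coaxed out of Scala (my own sketch; as with the Yeti one, the decrementing body is immaterial and only serves to constrain the types).

```scala
import scala.language.reflectiveCalls

// A recursive function whose type loosely mirrors the CTE's
// {.a is 'a, .b is number} -> {a is 'a, b is number}.
// The body just recurses until b reaches 0 (assumes b starts non-negative).
def f[A](x: { def a: A; def b: Int }): (A, Int) =
  if (x.b == 0) (x.a, x.b)
  else f(new { def a = x.a; def b = x.b - 1 })
```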

These are the building blocks for most queries (and you can see how the little I left out would work from that point of view). This was just a demonstration of real-life unification of structure types at work. I believe that value constraints can be added to the typesystem on top of that, without muddying the waters. Next time, I hope I can return to the “nuggets” and continue the face-off with a demonstration of mixed-site variance in Kotlin.

Generics Face-Off, Nugget #1

My last post was a bit overly enthusiastic about Scala. To counterbalance it, I present another “nugget” in this post, inspired by my slim experience of using Scala together with Java. You see, Scala aims to remedy some shortcomings of the Java typesystem, in particular its generics, which makes it somewhat uncomfortable, at times, to bridge the two. And the face-off you read in the title will be between Scala and another JVM language, Kotlin, which aims more towards being a better Java than providing a wholly different solution. Let’s see how they do, using the scale used by Catch-22’s Colonel Cathcart.

Java’s wacky use of Generics

Suppose there’s a generic Java class, named Something, presented here as a bare-bones example.

public class Something<T> {
    @Override
    public String toString() {
        return "Something";
    }
}

And here’s the rub. Java allows you to “cop out” of Generics and use either the erased type or an existential type, in the following meaningless way (and I’m not making this up).

public class JTest {
    public static Something<?> getSomething() {
        return new Something<String>();
    }
}

Here’s some Scala code to process an instance of Something<String>.

class STest {
    def workWithSomethingString(s : Something[String]) {
        print(s)
    }
}

Suppose you know that the particular instance is a Something<String> (the way you know it lies certainly outside of the type system). Going from Java to Scala produces an unchecked warning, as it should.

    public static void scalaDriver() {
        STest sTest = new STest();
        sTest.workWithSomethingString((Something<String>)getSomething()); 
        //flagged as unchecked
    }

Enter Kotlin.

class KTest {
    fun workWithSomethingString(s:Something<String>) {
        print(s)
    }
}

Same thing here.

    public static void kotlinDriver() {
        KTest ktest = new KTest();
        ktest.workWithSomethingString((Something<String>) getSomething());
        //flagged as unchecked
    }

Both fail to cope with Java’s incongruous use of Generics. Hence, a black eye for both in that regard. Hint: at the call site, it’s the Java compiler that runs the show; there’s really no way to win. Let’s see what we get if we give full control of the call site to the other compilers.

Reified generics

If we disregard direct interoperation with Java for a moment, both Scala and Kotlin want to provide some solution for recovering type information that is lost to erasure.

Scala does it using TypeTags and the like, in various ways which employ the compiler to carry information about generic types from the call site to the called method.

Below is one way to do it. The first call to workWithSomething delegates correctly to workWithSomethingString.

    import scala.reflect.runtime.universe._  // brings TypeTag and typeOf into scope

    def passTheBuck() {
        workWithSomething(new Something[String])
        workWithSomething(new Something[Int])
    }

    def workWithSomething[T](s : Something[T])(implicit t : TypeTag[T]) {
        if( t.tpe.typeConstructor =:= typeOf[String] ) {
            workWithSomethingString(s.asInstanceOf[Something[String]])
        } else {
            print("\nCannot handle\n")
        }
    }

How about Kotlin? Kotlin has promised reified generics, but they’re still on the roadmap. Black eye for Kotlin, feather in Scala’s cap.

Things don’t look so good for Kotlin, so far. In the next nugget, let’s see if it manages to recoup, using its distinguishing feature of mixed-site variance.