Anti-Phishing Measures in Tellody

(Cross-posting of https://www.tellody.com/blog/166-anti-phishing-measures-in-tellody)

Sending bulk emails is a risky business because, on the one hand, for Tellody that is running it, it’s prone to be abused by scammers and, on the other, for our users, it is easy to create emails in good faith but inadvertently contain indicators that will taint them.

Protecting ourselves and our users is an open-ended issue without easy solutions, but we decided that something is better than nothing and some action is better than no action. We tried to implement some kind of spoofing detection, as a first step. In this article, I’ll briefly explain what we have implemented, what you have to look out for and how you can fix your email if it gets flagged. Since we target phishers, the component is called “CoastGuard”.

I will use the following sample screen that contains all current notifications, as my reference.

phishing-indicators-2

 

First of all, you should not mix scripts in your words. Mixing scripts is a prime indicator of spoofing because words can appear as meaning one thing without actually writing them the way they are supposed to be. For example, the “Spoofed” notification appears to refer to the word “Paypal” but it mixes two scripts, “Basic Latin” (what “Paypal” is supposed to be written in) and “Greek and Coptic”. In this case it’s the uppercase Greek Rho, indistinguishable in all manners from the Latin uppercase P, that triggers this notification. There’s no legitimate reason to use more than one script within a word, and there are plenty of illegitimate ones so, if you find yourself needing to use a particularly clever brand name, that you just thought of, which uses more than one script, Tellody will not allow you to send it in an email. Get in touch if this is something essential to you.

Second, Punycodes have been shown to present the same spoofing opportunities, with little added value. The Punycode that is rejected above looks the same as https://www.epic.com but is not the same URL. Tellody rejects punycodes altogether, at least until major browsers start presenting adequate warnings about them. Again, if using a punycode is essential to you, get in touch with us.

The third category of spoofing indicator is when we think that someone tries to masquerade link text to make it appear as if it’s a URL. Maybe not many people would be fooled by “http:\\”, but would you be fooled by “http:⁄⁄”, containing a character that looks a lot like a forward slash? I know I would.

The next two indicators pertain to using the plethora of Unicode characters to make a string look as if it’s contiguous but is actually broken by characters with no width or scarcely any width. Adding such a pseudo-space character can create a string that looks to the eye like “Viagra” but fails to match a filter that looks for an exact match.

Finally, the last indicator is shown when the text of a link is something that looks as if it is a URL but it is different than the actual link target. A URL that appears to go to http://www.apple.com, while it goes to some shady site, is rejected by Tellody. Note that “appears” means how we think the text appears to a human and, the more tricks we detect to make something look like a URL while failing the straightforward syntactic check for one, the worse off we deem it to be.

This is a first version of those checks and they might evolve over time. If you find yourself having a legitimate business need that is hampered, please get in touch with info@tellody.com to discuss it.

Gender Detection, Vocatives and Namedays for Greek First Names in Tellody

(Cross posting of https://www.tellody.com/blog/164-gender-detection-and-vocatives-for-greek-first-names-in-tellody)

When Tellody was selected by Wind to run the MarketApp service, a big market opened in Greece. Even though Greek is one of the supported UI languages (and currently only EN and EL are usable in the language selector), we felt that the Greek audience deserved some more functionality, enabled by the inherent structure of the Greek language.

You might have noticed that, when the UI language is Greek, you get two new buttons for “message tags”. These enable you to use vocatives for the first and last names.vocative_buttons

We made a decision that we’d limit ourselves to traditional birth names and usual diminutives. That ruled out names of foreign origin like “Μισέλ” or “Νάταλι”. First of all, these names cannot be inflected, so there can be no separate vocative form for them. Secondly, we cannot use them unambiguously for detecting gender. Both previous examples can be male or female names.

This leaves baptismal names which are of ancient origin (ancient Greek names or names from the Old Testament) and modern ones. Both detecting the gender and forming the vocative are operations on the structure of the name, which we’ve implemented in Tellody. Because of that, we can handle a lot more names than what is in the name list we are using for matching names against, which we are currently using to calculate the nameday for the corresponding names.

We test for phonetic equality, so we’ll match various phonemes, no matter how they are spelled. However, we handle diacritic marks in the same phonetic way. We’ll handle omission of the accent mark (Unicode Character ‘GREEK TONOS’ U+0384) and unneeded uses of diaeresis (Unicode Character ‘COMBINING DIAERESIS’ U+0308) that don’t affect syllabication and enunciation, but we won’t match, for example, “Χάιδω” (the first syllable of which is a diphthong of “α” and “ι”) and “Χαιδω” or “ΧΑΙΔΩ”, in uppercase, (the first syllable of which is the composite vowel “αι”, same phoneme as “ε”). However, “Χαϊδω”, having two first syllables heard as “α” and “ι”, will be treated as phonetically equal to the diphthong “άι”, since both consist of the same phonemes.

Namedays come in five flavors:

  1. Fixed date
  2. Easter ± (n)
  3. If Easter is before April 23rd then (Fixed date) else Easter ± (n)
  4. First Sunday on or after December 11th
  5. First Sunday on or after February 13th

Tellody, at the time of this writing, includes data for approximately 1800 name forms for 1250 male and female names, with 800 of them having nameday data.

The Universe conspired to bring me precompiled binaries for Idris

Installing it with cabal on Windows proved impossible. A SuSe VM on VirtualBox couldn’t cut it either. Actually installing MinGW, which was the recommended way on WIndows, didn’t give me Idris, either. But still, I wanted it. And, sure enough, as Paulo Coelho predicted, the Universe conspired to help me get it, and one can now download precompiled binaries for Windows!

Now that I had a real dependently-typed programming language, with realistic-looking tutorials and documentation, I thought it was a great excuse to implement type-safe dimensional calculations in it. The resulting code is so terse, compared to the Scala example, it almost feels like cheating. In the beginning, is the dependent type itself.

data DimVal : Int -> Int -> Int -> Int -> Type where
    DV :  (m:Int) -> (k:Int) -> (s:Int) -> (a:Int) -> (v:Float) -> DimVal m k s a

Please compare it with the Scala (and C++) equivalent. The similarities are obvious but this version completely blends the two levels together. DimVal is a type constructor indexed by four integers and DV is its sole constructor.

Calculating with these types needs the usual arithmetic operations. Unfortunately, trying to overload the operators did not work for me, as they were imported already by the Num class in the Prelude.

||| The Num class defines basic numerical arithmetic.
class Num a where
(+) : a -> a -> a
(-) : a -> a -> a
(*) : a -> a -> a
||| Absolute value
abs : a -> a
||| Conversion from Integer.
fromInteger : Integer -> a

DimVal cannot be made an instance of Num, because types need to be closed under all three operations, and this cannot be the case, since the point is to represent each dimension vector by a different type. So I faked four appropriate operators.

infixr 10 /+/
(/+/) : DimVal m k s a -> DimVal m k s a -> DimVal m k s a
(DV m k s a v1) /+/ (DV m k s a v2) = DV m k s a (v1+v2)

infixr 10 /-/
(/-/) : DimVal m k s a -> DimVal m k s a -> DimVal m k s a
(DV m k s a v1) /-/ (DV m k s a v2) = DV m k s a (v1-v2)

infixr 20 /*/
(/*/) : DimVal m1 k1 s1 a1 -> DimVal m2 k2 s2 a2 -> DimVal (m1+m2) (k1+k2) (s1+s2) (a1+a2)
(DV m1 k1 s1 a1 v1) /*/ (DV m2 k2 s2 a2 v2) = DV (m1+m2) (k1+k2) (s1+s2) (a1+a2) (v1*v2)

infixr 20 ///
(///) : DimVal m1 k1 s1 a1 -> DimVal m2 k2 s2 a2 -> DimVal (m1-m2) (k1-k2) (s1-s2) (a1-a2)
(DV m1 k1 s1 a1 v1) /// (DV m2 k2 s2 a2 v2) = DV (m1-m2) (k1-k2) (s1-s2) (a1-a2) (v1/v2)

Next up, printing dimensioned values. For this, I used an intermediate Dimension type.

data Dimension : Type where
    Dim : (m:Int) -> (k:Int) -> (s:Int) -> (a:Int) -> Dimension    

unitToStr : Dimension -> String
unitToStr (Dim m k s a) =
    case (m, k, s, a) of
        (0,0,1,1) => "C"
        (1,1,-2,0) => "N"
        (0,0,-1,0) => "Hz"
        (-1,1,-2,0) => "Pa"
        (2,1,-3,-2) => "ohm"
        (2,1,-3,-1) => "V"
        (-2,-1,3,2) => "S"
        (2,1,-3,0) => "W"
        (2,1,-2,0) => "J"
        (-2,-1,4,2) => "F"
        (2,1,-2,-2) => "H"
        (2,1,-2,-1) => "Wb"
        (0,1,-2,-1) => "T"
        _ => (if m /= 0 then "m" ++ (if m /= 1 then (show m) else "") else "") ++
            (if (m /= 0) && (k /= 0) then "." else "") ++
            (if k /= 0 then "Kg" ++ (if k /= 1 then (show k) else "") else "") ++
            (if (abs(m) + abs(k)) /= 0 && s /= 0 then "." else "") ++
            (if s /= 0 then "s" ++ (if s /= 1 then (show s) else "") else "") ++
            (if (abs(m) + abs(k) + abs(s)) /= 0 && a /= 0 then "." else "") ++
            (if a /= 0 then "A" ++ (if a /= 1 then (show a) else "") else "")
            
toStr : DimVal m k s a -> String
toStr (DV m k s a v) = (show v) ++ " " ++ (unitToStr (Dim m k s a))
    
instance Show (DimVal m k s a) where
    show dv = toStr dv

The Num class was out of the question, but Eq and Ord are readily instantiable.

dimValEq : DimVal m k s a -> DimVal m k s a -> Bool
dimValEq (DV m k s a v1) (DV m k s a v2) = (v1==v2)
    
dimValComp : DimVal m k s a -> DimVal m k s a -> Ordering
dimValComp (DV m k s a v1) (DV m k s a v2) = compare v1 v2

instance Eq (DimVal m k s a) where
    dv1 == dv2 = dimValEq dv1 dv2

instance Ord (DimVal m k s a) where
    compare dv1 dv2 = dimValComp dv1 dv2

Idris has syntax support for creating DSLs so, a native Idris speaker would probably go on to create a better DSL for creating values, arithmetic etc. I’ll just show how type-safe calculations can naturally be encoded in plain Idris. I did not go into the trouble of defining all named physical types but I guess you see how this can be done without any problem.

Mass : Type
Mass = DimVal 1 0 0 0

mass : Float -> Mass
mass = DV 1 0 0 0

main : IO ()
main = 
    let x = DV 1 1 (-2) 0 14.5 in
    let y = DV 1 1 (-2) 0 15.5 in
    let z = DV 2 2 0 0 15.5 in
    let xpy = x /+/ y in
    let xmy = x /-/ y in
    let xtz = x /*/ z in
    let xdz = x /// z in
    let m : Mass = mass 12.6 in
        do
            putStrLn (show m)
            putStrLn (show x)
            putStrLn (show y)
            putStrLn (show z)
            putStrLn (show (x == x))
            putStrLn (show (x == y))
--            putStrLn ((show (y == z))) -- Doesn't even compile!
            putStrLn (show (x < y))
--            putStrLn ((show (z < y))) -- Doesn't even compile!
            putStrLn (show xtz)
            putStrLn (show xdz)

It does feel like cheating, doesn’t it? Calculating with physical types is actually done at two levels: the values themselves are operated upon as if they were dimensionless, while the dimensions are vectors representing exponents and are operated upon using exponent arithmetic. Idris handles both levels in the same way, very naturally. It is said that all high-level languages are equivalent in the computations they can express, but their compilers are certainly not equivalent.

My Frugal Summer of Code, part 2

Here I am, continuing my foray into the realm of array programming, that started in my previous post. Let me start with a disclaimer: I do not claim that what you read here is good, idiomatic J.This is not a J tutorial. I mean it as an exposé of the notions of array programming and how they can enhance the worldview of the traditional programmer. I’d love it if it could also be a bridge between mainstream FP and Array Programming, but I’m afraid I’m not sufficiently proficient in either to do it.

Remember the last five lines of code? I’m going to dissect them now for your benefit.

NB. Locations (mask bits) where digits have lines 
digitmasks =. '         ' ~:"(1) (,"2 digits)

Laid out, this means: concatenate all lines in each digit (keeping them in three lines serves only for displaying them). This is done by , (called append), applied to cells of digits of rank 2 (the "2 modifier, called an conjunction in J). Then, compare lines (cells of rank 1, the "1 thing) of that, using the “not equals” comparator (the ~:) to a literal with 9 spaces.

This gives you a bit (an 1) whenever there’s anything than a space. I can do that because the digits, as defined, when they share a graphic element, it’s always the same in the same location. So I’m getting rid of some excessive information to make my life easier. So, digitmasks is the following.

0 1 0 1 0 1 1 1 1
0 0 0 0 0 1 0 0 1
0 1 0 0 1 1 1 1 0
0 1 0 0 1 1 0 1 1
0 0 0 1 1 1 0 0 1
0 1 0 1 1 0 0 1 1
0 1 0 1 1 0 1 1 1
0 1 0 0 0 1 0 0 1
0 1 0 1 1 1 1 1 1
0 1 0 1 1 1 0 1 1
NB. Locations where digits share mask bits
maskcompmatrix =. (-"1/~ digitmasks) ="0 ] 0

Moving on, first, I make a table where, I subtract each line (i.e. digit bitmask) from each other. The -"1/~ digitmasks thing is shorthand for digitmasks -"1/ digitmasks, i.e. tabulate (the / adverb – another kind of higher-level construct, like conjunctions – for a dyadic operation) the operation “subtract each line” (the -"1). Afterwards, the whole thing is compared, bit by bit (the ="0) to zero. The result is a 10 x 10 x 9 bitmask array. Are you getting the hang of it?

NB. Count of similar mask bits between digits
diffmatrix =. +/"1 maskcompmatrix

Further, I sum: insert, using the / adverb, the operation “plus” (the + verb) at the level of each line (the "1 conjunction) on maskcompmatrix. This gives a 10 x 10 array of the number of same bits in each bitmask (digit).

9 5 6 6 5 6 7 6 8 7
5 9 4 6 7 4 3 8 4 5
6 4 9 7 4 5 6 5 7 6
6 6 7 9 6 7 6 7 7 8
5 7 4 6 9 6 5 6 6 7
6 4 5 7 6 9 8 5 7 8
7 3 6 6 5 8 9 4 8 7
6 8 5 7 6 5 4 9 5 6
8 4 7 7 6 7 8 5 9 8
7 5 6 8 7 8 7 6 8 9

Now comes the easy part.

NB. Outside the diagonal: Digits that share 8 out of 9 mask bits
offbyone =. diffmatrix ="(0) 8

I make a 10 x 10 array of selectors (bits) when digits share 8 out of 9 bits.

offbyones =. (monad : '< y # i. 10')"1 offbyone

Then, I make a vector (list, in J) of which digits share 8 bits for each digit.

┌─┬─┬┬─┬┬───┬───┬─┬─────┬─────┐
│8│7││9││6 9│5 8│1│0 6 9│3 5 8│
└─┴─┴┴─┴┴───┴───┴─┴─────┴─────┘

I wanted to spare you (and me!) the trouble of bringing up “boxing” but, since the shape is irregular, it was really the easiest solution. Boxing (conveniently displayed with actual boxes) creates an atomic value of anything you put into it.

And, in reality, I found out that it makes calculations simpler if the digits themselves are included, so I really need the following.

NB. Digits that share 8 or 9 (i.e. themselves) mask bits
sameandoffbyone =. diffmatrix >:"(0) 8

sameandoffbyones =. (monad : '< y # i. 10')"1 sameandoffbyone
┌───┬───┬─┬───┬─┬─────┬─────┬───┬───────┬───────┐
│0 8│1 7│2│3 9│4│5 6 9│5 6 8│1 7│0 6 8 9│3 5 8 9│
└───┴───┴─┴───┴─┴─────┴─────┴───┴───────┴───────┘

This was sort of easy, wasn’t it? I’m afraid the rest will be a little more complicated. I’ll throw it all in one glob, and focus on the big picture first. The operation I’m trying to achieve, is fix broken account numbers by trying similar digits, where similar is defined by off-by-one bit.

NB. fold operation to produce the combinations
foldop =. dyad : ',/ <"1 (>y) ,"1/ (,. >x)'
identity =. <''

NB. Use if checksum is <> 0: try combinations with digits off by one mask bit and see which (if any) produce checksum of 0
alternatives =. monad define
 alt =. >"0 foldop/ identity ,~ |. y { sameandoffbyones
 ((checksum"1 = 0:) alt) # alt
)

There’s a fold taking place, which is a common thing in FP. The first complication is that, in J, for reasons of uniformity with its right-to-left evaluation order, when you fold a list (remember, a list is a vector of something) you also do it right-to-left. That’s why I reverse (the |. thing) the result of indexing (the {) sameandoffbyones by the account digits (y stands for the argument in a monadic verb). Then I append the identity element, which is a boxed empty list. Folding takes place, and I’ll explain the operation folded next but, for now, let’s stick to what the fold produces, which is a list of boxed candidate account numbers, which are unboxed (the >"0) and then checked with checksum (the checksum"1 = 0: is actually a composite verb!) and only those with zero checksum are kept (operation copy, the #).

Example usage:

alternatives 2 4 4 4 1 2 5 6 7
2 4 4 4 1 2 5 6 1
2 4 4 4 1 2 9 8 7
2 4 4 4 7 2 6 6 1

However, don’t try it on 7 2 3 4 5 6 7 8 9. It produces 109 possible alternatives! In retrospect, this error correction policy is not very robust or, rather, the checksum used in not suitable to how digits can be misread.

Let’s go back to the folded operation now. I won’t go into it symbol-by-symbol but here’s an English description. One operand holds a list of boxes with partial account numbers, and the result is also a list of boxes with longer partial account numbers, formed by appending the initial partial account numbers with all alternatives for the current digit, held by the other operand as a boxed list.

Have I picked your curiosity until now? Have I succeeded in presenting J as more than a curiosity itself? There is real power in its notions and it goes beyond its conciseness. Imagine all these operations using standard Scala, for example. “It’s easy if you try.” I think it is a useful thing to add to standard FP, which lacks the language to talk about arrays at the level that array programming languages do. I have not even found a data type to represent arbitrary-ranked arrays, let alone how to represent ranked operations and implicit loops and stuff. But I think it can be done, and I’m setting it as a wanna-do-it for the future.

My Frugal Summer of Code

The title being a wordplay on “Google Summer of Code” does not mean that it is not entirely true, as there is hardly any time during vacationing with two little children to write much code. And I admit that most of what you’ll see in this post was written pre- and post-vacations. But I did read a lot of stuff on array programming, APL, J, papers on array functional programming. I thought really hard (but vacation-grade hard) about how some of these things can be grafted on a more conventional functional language. And I did make an honest effort to find a Java or Scala library for dense multidimensional arrays that could provide a basic substrate for such an effort. If the job, Flying Donut or life, in general, let me, you’ll hear more in another post.

The current post, however, will use J to tackle a code kata I found, which I thought would be ideal for some array-fu.

Assume you have the three lines that constitute an account number. The first thing you should do is separate the nine “digits”. Conveniently, the same functionality can be used to generate some data useful in the sequel, so let’s start by separating the ten digits 0 – 9.

alldigits =. ] ;._2 (0 : 0)
 _     _  _     _  _  _  _  _
| |  | _| _||_||_ |_   ||_||_|
|_|  ||_  _|  | _||_|  ||_| _|
)

NB. Break in 3x3 cells
chop =. monad : ',/ (3 3 ,: 3 3) ];._3 y'

digits =. chop alldigits

I know, most of it looks like a cartoon character swearing. What you see in the above is a handy way of writing multi-line data and a handy higher-level operation to apply an operation to subarrays of an operand. Let’s chop a sample account number.

NB. A number with no wrong digits
num =. ] ;._2 (0 : 0)
    _  _     _  _  _  _  _ 
  | _| _||_||_ |_   ||_||_|
  ||_  _|  | _||_|  ||_| _|
)
 
nums =. chop num
<"_1 nums

The resulting “digits” are “boxed” so that they are also displayed as boxed.

┌───┬───┬───┬───┬───┬───┬───┬───┬───┐
│   │ _ │ _ │   │ _ │ _ │ _ │ _ │ _ │
│  |│ _|│ _|│|_|│|_ │|_ │  |│|_|│|_|│
│  |│|_ │ _|│  |│ _|│|_|│  |│|_|│ _|│
└───┴───┴───┴───┴───┴───┴───┴───┴───┘

And the magic continues. Let’s write lots of nested loops to recognize the corresponding digits.

NB. Results in character list
digitize =. digits&i.

Ok, I was kidding. Here’s what digitize nums gives you: the vector 1 2 3 4 5 6 7 8 9. Transforming it to a human readable string is equally ridiculously simple.

NB. Takes digitized list, results in character list 
show =. {&'0123456789?'

Non recognized digits result in a past-the-end index, hence the ‘?’ after the last valid digit.

NB. A number with one wrong digit
wrongnum =. ] ;._2 (0 : 0)
_   _  _     _  _  _  _  _ 
  | _| _||_||_ |_   ||_||_|
  ||_  _|  | _||_|  ||_| _|
)  
wrongnums =. chop wrongnum
show digitize wrongnums

The output is ?23456789.

Surely calculating the checksum should be a little more difficult.

NB. (to be used only on correctly parsed numbers)
checksum =. monad : '11 | (1+i. 9) (+/ . *) |. y'

Well, it is a little more difficult, truth be told. But I have to go to bed, so I’ll break off at this point, but only after I compute some data for the next step, which is to try digits to similar ones in order to produce candidate account numbers to recover from checksum errors.

NB. Locations (mask bits) where digits have lines 
digitmasks =. '         ' ~:"(1) (,"2 digits)

NB. Locations where digits share mask bits
maskcompmatrix =. (-"1/~ digitmasks) ="0 ] 0

NB. Count of similar mask bits between digits
diffmatrix =. +/"1 maskcompmatrix

NB. Outside the diagonal: Digits that share 8 out of 9 mask bits
offbyone =. diffmatrix ="(0) 8

offbyones =. (monad : '< y # i. 10')"1 offbyone

I guess this suddenly strikes you as too much code, doesn’t it? Five whole lines! It’s only one line shorter than the code to define a property in Java, for crying out loud!

I’ll get into the details next time!

TO BE CONTINUED

As simple as possible, but not simpler

The simplicity being alluded to in the title, is not some aesthetic measure of code brevity inspired by Martin Odersky’s keynote in ScalaDays. It actually refers to simplifying arithmetic conditions, with the specific goal of proving whether the intersection of two sets, defined intensionally, is empty or not. This problem is actually very hard, but the case I’m interested in is where atomic conditions are linear inequalities involving single variables, which you have to imagine as referring to properties of a type, instances of which you define sets of. I can hear your enthusiasm winding down, because I’ve now simplified the problem a millionfold. One would think that this would be a toy problem for which there would be many solutions and yet, I couldn’t find any that didn’t involve finding concrete solutions and, all I needed was a simplified expression of the condition for the set intersection (for example, I looked at JSR 331). So, I decided to tackle it myself; why not have a little fun, huh?

Digging through the pits of my school-days memory, my first hunch was that some kind of normalization would be in order, and I settled on disjunctive normal form. Please don’t write me that all SAT and SMT solvers use conjunctive normal form and that I probably dug through the armpits of my school-days memory. Hear me out, first.

Each conjunction in DNF can have all its constituent atomic conditions on same variables be merged together, which results in either false, or another atomic condition. Consequently, each conjunction in DNF can be brought to, what can be described as, a simplified form where each variable is mentioned only once. And disjunctions will be fairly uncommon in the use case I was envisioning, which is some query builder interface (but not absent).

I’m going to walk you through my code and thought process. You’ll have to imagine the former being enclosed in a Scala object and the latter enclosed inside that glass display case for brains, that you see in movies. You’ll also have to imagine what role various types and case classes/enums play, because their names are evocative enough.

def toFlatDNF(c: ICriterion) : ICriterion = simplify(toDNF(c))

In a top-down fashion, I’m going first to transform the expression to DNF, then simplify it.

def flatten(c: ICriterion) : List[ICriterion] = {
    c match {
        case Composite(_, li2) => li2
        case _ => List(c)
    }
}

def cartesianProduct[T](xss: List[List[T]]): List[List[T]] = xss match {
   case Nil => List(Nil)
   case h :: t => for(xh <- h; xt <- cartesianProduct(t)) yield xh :: xt
}

def distributeOrOver(ors: List[ICriterion], ands: List[ICriterion]): List[ICriterion] = {
    val andsInOrs = ors.map(flatten)
    val product = cartesianProduct(andsInOrs)
    val allEnds = product.map(_ ++ ands)
    allEnds.map({
        case List(c) => c
        case li => Composite(AND,li)
    })
}

def toDNF(c: ICriterion) : ICriterion = {
    c match {
        case Composite(op,li) => {
            val li2 = li.map(toDNF).toList 
// each might be an OR of up to depth 2 or an AND of depth up to 1
            val (ands, ors) = li2.partition {
                case Composite(AND, c) => true
                case Composite(OR, c) => false
                case _ => true
            }
            op match {
                case OR => {
                    val flators = ors.flatMap(flatten)
                    Composite(OR, flators ++ ands)
                }
                case AND => {
                    val flatands = ands.flatMap(flatten)
                    if (ors.length == 0)
                         Composite(AND, flatands)
                    else {
                        val over = distributeOrOver(ors, flatands)
                        Composite(OR, over)
                    }
                }
            }

        }
        case _ => c
    }
}

What this (supposedly) does is to distribute OR over AND in order to arrive to DNF.

Now, the simplification that follows is based on extending the domain of all values to add a global MIN and a global MAX value, and on rephrasing all equalities and inequalities as interval conditions, whose either limit can be open or closed (methods left() and right() return the corresponding comparisons to the corresponding limit). A special comparison function, cmp arranges for MIN/MAX-aware comparisons.

The expression is normalized, in that way, then merged, then denormalized again to simple equalities and inequalities.

def groupByParameterEvaluable(liFinal: List[ICriterion]): Map[ParameterEvaluable,List[ICriterion]] = {
    liFinal.groupBy({case ParamOpVal(p,_,_) => p})
}

def normalize(c: ICriterion) : List[ICriterion] = c match {
    case ParamOpVal(p, op, values) =>
        op match {
            case EQ => List(ParamOpVal(p, BTW_E, List(values(0), values(0))))
            case LT => List(ParamOpVal(p, BTW_NE, List(MIN, values(0))))
            case GT => List(ParamOpVal(p, BTW_NE, List(values(0), MAX)))
            case LE => List(ParamOpVal(p, BTW_GE, List(MIN, values(0))))
            case GE => List(ParamOpVal(p, BTW_LE, List(values(0), MAX)))
            case NE => List(
                ParamOpVal(p, BTW_NE, List(MIN, values(0))),
                ParamOpVal(p, BTW_NE, List(values(0), MAX))
            )
            case _ => List(c)
        }
}

def denormalize(c: ICriterion) : List[ICriterion] = c match {
    case ParamOpVal(p, BTW_E, List(x, y)) if x == y => List(ParamOpVal(p, EQ, List(x)))
    case ParamOpVal(p, BTW_NE, List(MIN, x)) => List(ParamOpVal(p, LT, List(x)))
    case ParamOpVal(p, BTW_GE, List(MIN, x)) => List(ParamOpVal(p, LE, List(x)))
    case ParamOpVal(p, BTW_NE, List(x, MAX)) => List(ParamOpVal(p, GT, List(x)))
    case ParamOpVal(p, BTW_LE, List(x, MAX)) => List(ParamOpVal(p, GE, List(x)))
    case _ => List(c)
}

def merge(li: List[ICriterion]) : ICriterion = li.reduceLeft(combine)

def simplifyCriteria(criteria: List[ICriterion]): List[ICriterion] = {
    var normalized = criteria.flatMap(normalize)
    var merged = merge(normalized)
    denormalize(merged)
}

def simplify(cDnf: ICriterion): ICriterion = {
    cDnf match {
        case Composite(OR, liConj) => Composite(OR, liConj.map(simplify))
        case Composite(AND, liFinal) => {
            val grouped = groupByParameterEvaluable(liFinal)
            val groupedMerged = grouped.iterator.map({
                case (_, v) => simplifyCriteria(v)
            })
            val merged = groupedMerged.flatten.toList
            Composite(AND, merged)
        }
        case _ => cDnf
    }
}

The most complicated operation in all that is how to combine too intervals (one step of the merging process). Notice that, in order for two intervals to stand a chance of being able to combine, their closed-closed analogs should intersect and the easiest way to express that is as a negation of the condition that they not intersect: either one is before the other, or vice versa.

def fuseBetweenOperator(op1: CriteriaOperator, op2: CriteriaOperator) : CriteriaOperator =
    (op1, op2) match {
        case (GE, LE) => BTW_E
        case (GT, LE) => BTW_GE
        case (GE, LT) => BTW_LE
        case (GT, LT) => BTW_NE
        case _ => throw new Error(s"fuseBetweenOperator cannot be used on $op1 and $op2")
    }

//This combines two comparisons with conjunction
def combine(ic1: ICriterion, ic2: ICriterion) : ICriterion = {
    val FALSE = new FalseCriterion
    if(ic1.isInstanceOf[FalseCriterion]) return FALSE
    val c1 = ic1.asInstanceOf[SimpleCriterion]
    val c2 = ic2.asInstanceOf[SimpleCriterion]
    (c1,c2) match {
        case (ParamOpVal(p,op1,List(v1,v2)),ParamOpVal(_,op2,List(w1,w2))) if cmp(w1,v2)<=0 =>
            if(cmp(w1,v2)==0 && (op1.right()==LT || op2.left()==GT))
                FALSE
            else {
                val (lowerBound, leftop) = if(cmp(w1,v1)<=0) (v1,op1.left()) else (w1,op2.left())
                val (upperBound, rightop) = if(cmp(v2,w2)<=0) (v2,op1.right()) else (w2,op2.right())
                val operator = fuseBetweenOperator(leftop, rightop)
                if(cmp(lowerBound,upperBound)==0 && operator!=BTW_E)
                    FALSE
                else
                    ParamOpVal(p, operator, List(lowerBound, upperBound))
            }
        case (ParamOpVal(p,op1,List(v1,v2)),ParamOpVal(_,op2,List(w1,w2))) if cmp(w1,v2)<=0 =>
            if(cmp(v1,w2)==0 && (op2.right()==LT || op1.left()==GT))
                FALSE
            else {
                val (lowerBound, leftop) = if(cmp(w1,v1)<=0) (v1,op1.left()) else (w1,op2.left())
                val (upperBound, rightop) = if(cmp(v2,w2)<=0) (v2,op1.right()) else (w2,op2.right())
                val operator = fuseBetweenOperator(leftop, rightop)
                if(cmp(lowerBound,upperBound)==0 && operator!=BTW_E)
                    FALSE
                else
                    ParamOpVal(p, operator, List(lowerBound, upperBound))
            }
        case _ => FALSE
    }
}

Having all that implemented, testing for intersection is as simple as forming the conjunction of the conditions that define the sets involved, performing toFlatDNF, and seeing if the result is the false expression. If it isn’t, you have it as an expression to inspect and maybe think about how to change the set definitions so that they not intersect.

Java 8: too little, too late?

Lately, I attended an event that Oracle organized in Athens, which also served as the inaugural event for Java 8. I consider Java 8 a major release (unlike prior “major” releases) which does merit discussion. And, as I usually play the Devil’s Advocate, I’m going to share some thoughts on it.

I’ll focus on certain new features: the most prominent are Streams and lambdas, but Optional and default methods are also worth discussing. I won’t discuss new time support, or any other library improvement.

Streams is an attempt to give Java a real collection library; only, the library consists of a single interface, the Stream, and you have to convert back and forth between it and the real data structures. Still, it’s exciting to go from almost zero to being able to do map, filter, flatMap, fold, reduce etc in Java.

Lambdas will help a lot to bring some long-needed conciseness to Java. But they’re not really lambda expressions, meaning, anonymous functions. The only place where they can exist is where a Functional interface is expected (and a functional interface, in short, is any interface representing a single concern – more on that later). Combined with some limited type inference, this does give a lot of leverage without doing real changes to the language. If you were planning to do heavy higher-order functional stuff in Java, don’t hold high hopes, because the type signatures you have to write will likely dissuade you. There’s no type inference to ease the burden. And no predefined function interface for three or more arguments. But it’s nice for collection operations and such.

Since you know that I don’t have time for lengthy posts, here is a Java 8 sample which demonstrates, in a single shot, various things about Java 8.

    public static void main(String[] args) {
        final Optional<Integer> integerOptional =
                Arrays.stream(args)
                .findFirst()
                .flatMap(a -> getEl(getTheMap(), a))
                .flatMap(SampleUseOptional::tryParseInt)
                .map(i -> i*2);
    }

    public static <K,V> Optional<V> getEl(Map<K,V> map, K key) {
        return map.containsKey(key) ? Optional.of(map.get(key)) : Optional.empty();
    }

    public static Optional<Integer> tryParseInt(String s) {
        try {
            return Optional.of(Integer.valueOf(s));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    private static HashMap<String, String> getTheMap() { /* ... */ }

The meat, of course, is the monadic workflow in main. But notice how artfully I crammed in there

  1. the fact that, even arrays can be made into streams, even though they were second-class collections up to Java 8
  2. the fact that, using defender (default) methods to enhance existing interfaces can do nothing for Java arrays, hence the need for Arrays.stream
  3. the fact that Optional is also a monad in Java 8, as are collections (by virtue of Streams)
  4. the fact that Optional is not integrated where it makes sense, like Map.get or Integer.parse
  5. the fact that there’s shorthand for referring to a method
  6. the fact that there’s no shorthand to refer to a curried method (e.g. getEl(getTheMap())

Let’s work on that lack of currying for a moment. Surely there’s some (verbose) way to define what currying means. Here is an appropriate definition for a two-argument, may I dare say?, function.

    private static <T1,T2,R> Function<T2,R> curryFirstOfTwo(BiFunction<T1,T2,R> f, T1 x) {
        return y -> f.apply(x,y);
    }

Ok, that was not so bad. This is still Java, after all. Let’s try to use it.

        final Optional<Integer> integerOptional3 =
                Arrays.stream(args)
                .findFirst()
                .flatMap(curryFirstOfTwo(SampleUseOptional::getEl, getTheMap()))
                .flatMap(SampleUseOptional::tryParseInt)
                .map(i -> i * 2);

Hmm, IDEA chokes on this, and stops complaining only when the first argument to curryFirstOfTwo is rewritten as a lambda with typed arguments, but the actual compiler fares better. So this amount of boilerplate gives you an idea of what you can expect when you want to go a little further than the design space of the new features. Still, it’s feasible. And don’t forget all the fun, implementing interfaces for functions of arity N > 2!

I didn’t cover what Optional is, because it is what its name implies: it’s what Scala and F# call Option, Haskell calls Maybe and Guava already had with the same name. It’s your way out of NPE Hell, and how you’ll avoid being ridiculed by your C/C++ programmer friends to which you dared boast one day that programming in Java is safer. They’re still laughing, but not for long. The simplicity of this idea makes it be misunderstood. How can anything so simple have so much value? Aren’t nullity declarations, non-nullable types, nullity inference, elvis operators and such (found in Kotlin, Gosu, Fantom), similar, if not better? No, in my opinion, they aren’t. But this is a discussion for another day. The important thing is that Optional is now part of Java 8, and yet it isn’t, since the whole behemoth of its library is still exception-waiting-to-happen-ridden. But it can be part of your codebase.

Default methods, which were added to allow Oracle to extend interfaces, but now allow anyone to do as much, bring multiple (implementation) inheritance in Java. It’s that simple. No diamond problem, according to a presentation from Oracle I watched. At least, no diamond problem if you restrict the problem to how state is initialized in multiple inheritance, not how the rest of the world defines it, in a more general sense. With great power comes great responsibility, and Java, breaking out of its cocoon, starts having grown-up-language problems. Welcome, Java!

So, is Java 8 too little, too late? In my opinion, no and no. It doesn’t come up ahead. It just plays catch with everyone else. But it’s the first time it did anything to approach everyone else in a long time.

Already, Java 8 supersedes Guava for functional-ish programming (my up-to-now escape to sanity while doing Java), which had opted for a more limited set of features. Will it clash with Scala? Certainly not. If anything, it will pull more developers out of the stagnating waters older versions of Java had left them in. Thousands of people had stopped thinking about where programming can go to. Out of some twisted notion of devotion to Java? Out of safety in doing what thousands of others do? Out of arrogance for belonging to the biggest programming community? Something, in any case. And now, it is these people that are the least qualified to make the move to Java 8, while anyone with experience in any other JVM, .Net or any other language should feel right at home.

However, I am pleased to say that many more thousands of people are thirsty for more, which is a sign of sanity in the community. Adoption of Java 8 has a chance of being a lot faster than that of previous releases, and any JVM language will benefit from that happening. Not to mention that, increasing the momentum of the JVM might trigger faster evolution of over-the-horizon features like value classes and real arrays.

Let’s get physical

I’m a mechanical engineer and, although I studied for software engineering degrees right after I graduated, I did my fair share of numerical calculations during my five years of studies in engineering. Some of it in Turbo Pascal, but most of it in the recommended language, which was Fortran. It was very spartan but, one thing it did well, was get out of the way between engineers and their calculations. Around 2007, 17 full years away from purely numerical computing, I turned around to see what had changed in the domain of calculating with physical quantities. Believe me, any new version of Fortran that has come out was a huge improvement over Fortran 77, but I found that not much had changed in the calculations themselves. In particular, regarding calculations with physical units and proper dimensional analysis. The kind that could avoid disasters like these.

Before I tell you what I concocted in 2007 and then in 2014, let me stay a little longer on Fortran, or rather what was designed to be a modern-day Fortran: Fortress.

Fortress

Fortress, the brainchild of Guy Steele, was an experimental project in Sun to answer the HPCS challenge (cf. “Fwd: HPCS recap”). After it was not selected in subsequent rounds of the challenge, the project persisted for a total of ten years, before being axed by Oracle. People explain things like that because of the “pragmatic” approach of Oracle towards Java. Which is a polite way to say that Oracle makes Java walk the streets for money, and probably gives her a good beatin’ whenever she comes back with low earnings.

Fortress had nailed the dimension thing, offering dimensions (decribed as “type-like constructs that are separate from other types”) and units (described as “special constructs that modify types and values […] that are instances of dimensions”). Dimensionless and dimensioned numbers can use the whole gamut of language constructs. In addition to standard SI dimensions and units, many more are available in the standard library and more can be added.

Here is what some dimension-aware calculations would look in Fortress (Fortress is a language written to be read back as close to mathematics as possible, so this is how Fortress code would look like).

x:ℝ64 Length = 1.3m
t:ℝ64 Time = 5 s
v:ℝ64 Velocity = x/t
w:ℝ64 Velocity in nm/s = 17nm/s

Standard SI prefixes like (nano, micro etc) and their abbreviations (n, μ etc) can be used. In order to avoid misunderstandings, these prefixes cannot be used in front of units of dimension Information (bit and byte).

Both dimensions and units can be multiplied, divided, and raised to rational powers to produce new dimensions and units. Multiplication can be expressed using juxtaposition or · (yes, Fortress actually gives the natural meaning of multiplication to juxtaposition!) Division can be expressed using / or per. The syntactic operators square and cubic (or the special postfix syntactic operators squared and cubed) may be applied to a dimension or unit in order
to raise it to the second power, third power, respectively. The syntactic operator inverse may be applied to a dimension or unit to divide it into 1. Ex. grams per cubic centimeter, meter per second squared, inverse ohms.

All in all, a pretty solid treatment. Unfortunately, none of this was actually implemented… However, there is one existing language which does support dimensioned calculations in a very real way.

F#

F# is the only mainstream language, as far as I know, that contains specific constructs for working with units (Curl supposedly does also, but I’m not familiar with it and it’s a long stretch to call it mainstream).

It is not entirely pleasing as, there is no concept of dimension, only the concept of unit of measure, but all the machinery needed for safe dimensional calculations is there. As is common in the .Net and Java world, the units of measure are defined using an attribute (annotation in Java-speak). SI predefined “measures” can be found in the PowerPack.

[<Measure>] type kg
[<Measure>] type s
[<Measure>] type m
[<Measure>] type cm = m / 100
[<Measure>] type ml = cm^3

let gravityOnEarth = 9.81<m/s^2>
let heightOfFall = 15<m>
let speedOfImpact = sqrt(2.0 * gravityOnEarth * heightOfFall)
let someForce = 10<kg m s^-2>

As you see in the example, units can be defined by their own (in which case they can be thought as defining a dimension) or as derived from some unit expression. Such expressions can be suffixed to numeric literals between angle brackets (without any intervening space). As a nod to Fortress, who knows?, juxtaposition stands for multiplication in the numerator of unit expressions. A dimensionless quantity is denoted by the unit expression <1>. One can even have generic units! And, you know, .Net has real multidimensional array types, and value types and also arrays of value types. The whole makes a compelling proposal for a scientist or an engineer.

Dimenso
Back in 2007, none of that was there. So I set myself a goal of experimenting in that domain, leading to the open-source project Dimenso. I focused on just SI quantities and then, just the MKS part together with current (no temperature, no amount of substance and no luminous intensity).

Dimenso: C++

The prototypical implementation was in C++. Believe it or not, the template language in C++ is pretty advanced, even with today’s standards. One could really make a case that C++ template metaprogramming is akin to using dependent types. Because templates in C++, unlike generics in the likes of Java or .Net, can actually be parameterized by integers, in addition to types. Here’s how the definition of the class that embodies dimensioned values would start: template<int L, int M, int T, int C> class DimensionedValue.

Defining what it means to multiply two dimensioned values together is this simple operator (yes! Java-lovers, most other languages do have operator overloads!).

template<int L2, int M2, int T2, int C2>
 DimensionedValue<L+L2,M+M2,T+T2,C+C2> operator*(DimensionedValue<L2,M2,T2,C2> other) {
     return DimensionedValue<L+L2,M+M2,T+T2,C+C2>(magnitude()*other.magnitude());
}

Given all the machinery available in C++, one can really make these objects look and behave like plain numbers. The challenge, then, was to go as close to that in C# and Java, which lacked support for dependent types.

Dimenso: C# and Java

The way I followed was: code generation. I generated all dimension combinations within specific bounds (but I also provided the Python script if one would need a more extended set). These combinations were named after the corresponding exponents (for example, type m_2k0s1a0 represents dimension L-2T) but actual names from physics were used, when available (but not in a right way… if you try to use Dimenso, stick to the ugly names).

This is how an arithmetic operation on “length” would look like in Java, if I hadn’t messed up the name thing.

public velocity divide(time v2) {
    velocity ret = new velocity(this.value/v2.getValue());
    return ret;
}

In C#, which does have operator overloads and value types, the whole is a little more palatable. But still, generating hundreds of classes feels… kind of wrong. Let’s see what the Java of today has to say on the matter.

JSR-275, JSR-363 and JScience
To make a long story short, JSR-275 “Measures and Units” was finally rejected and, its successor, JSR-363 “Units of Measurement API” has not really formed yet but, abandoning at that point would make it really short. That’s why I’m going to look at support for units in JScience, which claims support for such.

Let’s look at some sample code, which will highlight the point I was trying to make with Dimenso.

// Compiler Warnings
Unit<Acceleration> mps2 = Unit.valueOf("m/s²"); // Compiler Warning.
Unit<Power> WATT = JOULE.divide(SECOND); // Compiler Warning.

// Adding run-time check of dimension consistency removes the warning.
Unit<Acceleration> mps2 = Unit.valueOf("m/s²")
    .asType(Acceleration.class); 
Unit<Power> WATT = JOULE.divide(SECOND).asType(Power.class);

// Use wildcard parameterized type (no warning and no check).
Unit<?> kelvinPerSec = KELVIN.divide(SECOND);

JScience has a lot of capabilities for units, including arbitrary systems of dimensions, localizable unit parsing and output and compound units, like those for common time. But there is no way to have the compiler figure out what units two quantities multiplied or divided among themselves have. This is a shortcoming of Java, not of JScience. And, unless you’ve been ruined by too many years of Java, you can tell that no scientist or engineer would put up with so much unwarranted verbosity to do his job.

Do not despair, as Java is not the only choice you have on the JVM.

Scala

Let’s see what I’ve been able to do with some typelevel hacking in Scala. First of all, I’d like to point you to a better solution, Metascala Units, which did come up in my Google search, and a tab in my browser was open, waiting patiently for me to have a look at them, which I did, but after the fact. No matter, because what I really wanted was some practical experience with typelevel programming, with dimension-aware calculations as a good case to work on. The way that Metascala encodes integers in the typesystem is a lot more concise and a lot more elegant than my own, which is based on Shapeless’ encoding of naturals.

Here is some safe code using Metascala Units:

val dist : Length = m(2.3)
val time : Time = s(1.7)
val x = dist * time
val speed : Speed = dist / time

I hope you can appreciate the fact that this looks like using built-in features. It is as concise as it can get and offers full compile-time checking.

And here’s my own stab at this. It’s not part of Dimenso yet, because it’s not fully tested, I believe that if I build stronger typelevel kung-fu, I’ll be able to handle output of units at compile-time, as well, and there is probably some more magic to add with implicit value classes.

case class DimVal[M1 <: ZInt, K1 <: ZInt, S1 <: ZInt, A1 <: ZInt](val value : Double) {
    type T = DimVal[M1,K1,S1,A1]
    def +(b: DimVal[M1,K1,S1,A1]) = new DimVal[M1,K1,S1,A1](value + b.value)

    def -(b: DimVal[M1,K1,S1,A1]) = new DimVal[M1,K1,S1,A1](value - b.value)

    def *[M2 <: ZInt, M <: ZInt, K2 <: ZInt, K <: ZInt, S2 <: ZInt, S <: ZInt, A2 <: ZInt, A <: ZInt]
    (b: DimVal[M2,K2,S2,A2])
    (implicit ev: ZSumAux[M1,M2,M], ev2: ZSumAux[K1,K2,K], ev3: ZSumAux[S1,S2,S], ev4: ZSumAux[A1,A2,A]) : DimVal[M,K,S,A] =
        new DimVal[M,K,S,A](value * b.value)

    def /[M2 <: ZInt, M <: ZInt, K2 <: ZInt, K <: ZInt, S2 <: ZInt, S <: ZInt, A2 <: ZInt, A <: ZInt]
    (b: DimVal[M2,K2,S2,A2])
    (implicit ev: ZDiffAux[M1,M2,M], ev2: ZDiffAux[K1,K2,K], ev3: ZDiffAux[S1,S2,S], ev4: ZDiffAux[A1,A2,A]) : DimVal[M,K,S,A] =
        new DimVal[M,K,S,A](value / b.value)

    def asString () (implicit i1: ToZInt[M1], i2: ToZInt[K1], i3: ToZInt[S1], i4: ToZInt[A1]) = {
        val x1 = i1.apply()
        val x2 = i2.apply()
        val x3 = i3.apply()
        val x4 = i4.apply()
        val u =
        (x1,x2,x3,x4) match {
            case (0,0,1,1) => " C"
            case (1,1,-2,0) => " N"
// ... more special cases to handle derived units elided ...
            case _ => {
                val sb = new StringBuilder
// ... ugly, imperative code elided ...
                if(sb.length > 0) " "+sb else ""
            }
        }
        value.toString + u
    }

}

The dimension calculations are done at the type level, using types ZSumAux and ZDiffAux, and type ToZInt handles conversion to language-level integers. Like I said, not all induction cases are tested, but here’s what they look like now.

import shapeless._
import shapeless.Nat._0

trait ZInt

case class ZPos[T <: Nat]() extends ZInt
case class ZNeg[T <: Nat]() extends ZInt

trait ZSumAux[A <: ZInt, B <: ZInt, C <: ZInt]
trait ZDiffAux[A <: ZInt, B <: ZInt, C <: ZInt]

object ZSumAux {
    implicit def sum3[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : SumAux[A, B, C]) = new ZSumAux[ZPos[A], ZPos[B], ZPos[C]] {}

    implicit def sum4[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : SumAux[A, B, C]) = new ZSumAux[ZNeg[A], ZNeg[B], ZNeg[C]] {}

    implicit def sum5[A <: Nat] = new ZSumAux[ZPos[A], ZNeg[A], ZPos[_0]] {}

    implicit def sum6[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[A, B, C], ev2: LT[B,A]) = new ZSumAux[ZPos[A], ZNeg[B], ZPos[C]] {}

    implicit def sum7[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[B, A, C], ev2: LT[A,B]) = new ZSumAux[ZPos[A], ZNeg[B], ZNeg[C]] {}

    implicit def sum8[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[A, B, C], ev2: LT[B,A]) = new ZSumAux[ZNeg[A], ZPos[B], ZNeg[C]] {}

    implicit def sum9[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[B, A, C], ev2: LT[A,B]) = new ZSumAux[ZNeg[A], ZPos[B], ZPos[C]] {}
}

object ZDiffAux {
    implicit def diff3[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[A, B, C], ev2: LT[B,A]) = new ZDiffAux[ZPos[A], ZPos[B], ZPos[C]] {}

    implicit def diff3b[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[B, A, C], ev2: LT[A,B]) = new ZDiffAux[ZPos[A], ZPos[B], ZNeg[C]] {}

    implicit def diff4[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[A, B, C], ev2: LT[B,A]) = new ZDiffAux[ZNeg[A], ZNeg[B], ZNeg[C]] {}

    implicit def diff4b[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : DiffAux[B, A, C], ev2: LT[A,B]) = new ZDiffAux[ZNeg[A], ZNeg[B], ZPos[C]] {}

    implicit def diff5[A <: Nat] = new ZDiffAux[ZPos[A], ZPos[A], ZPos[_0]] {}

    implicit def diff5b[A <: Nat] = new ZDiffAux[ZNeg[A], ZNeg[A], ZPos[_0]] {}

    implicit def diff6[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : SumAux[A, B, C], ev2: LT[B,A]) = new ZDiffAux[ZPos[A], ZNeg[B], ZPos[C]] {}

    implicit def diff7[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : SumAux[B, A, C], ev2: LT[A,B]) = new ZDiffAux[ZPos[A], ZNeg[B], ZNeg[C]] {}

    implicit def diff8[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : SumAux[A, B, C], ev2: LT[B,A]) = new ZDiffAux[ZNeg[A], ZPos[B], ZNeg[C]] {}

    implicit def diff9[A <: Nat, B <: Nat, C <: Nat]
    (implicit ev : SumAux[B, A, C], ev2: LT[A,B]) = new ZDiffAux[ZNeg[A], ZPos[B], ZPos[C]] {}
}

trait ToZInt[Z <: ZInt] {
    def apply() : Int
}

object ToZInt {
    implicit def toPosInt[N <: Nat] (implicit i: ToInt[N]) = new ToZInt[ZPos[N]] {
        def apply() = i.apply()
    }
    implicit def toNegInt[N <: Nat] (implicit i: ToInt[N]) = new ToZInt[ZNeg[N]] {
        def apply() = - i.apply()
    }
}

Some useful definitions are in object DimValOps.

object DimValOps {
    type Z0 = ZPos[_0]
    type Z1 = ZPos[Succ[_0]]
    type Z2 = ZPos[Succ[Succ[_0]]]
    type Z3 = ZPos[Succ[Succ[Succ[_0]]]]
    type Z4 = ZPos[Succ[Succ[Succ[Succ[_0]]]]]
    type Z_1 = ZNeg[Succ[_0]]
    type Z_2 = ZNeg[Succ[Succ[_0]]]
    type Z_3 = ZNeg[Succ[Succ[Succ[_0]]]]

    type length = DimVal[Z1, Z0, Z0, Z0]
    type mass = DimVal[Z0, Z1, Z0, Z0]
    type time = DimVal[Z0, Z0, Z1, Z0]
    type current = DimVal[Z0, Z0, Z0, Z1]
    type charge = DimVal[Z0, Z0, Z1, Z1]
    type area = DimVal[Z2, Z0, Z1, Z0]
    type volume = DimVal[Z3, Z0, Z1, Z0]
    type velocity = DimVal[Z1, Z0, Z_1, Z0]
    type acceleration = DimVal[Z1, Z0, Z_2, Z0]
    type momentum = DimVal[Z1, Z1, Z_1, Z0]
    type angular_momentum = DimVal[Z2, Z1, Z_1, Z0]
    type frequency = DimVal[Z0, Z0, Z_1, Z0]
    type force = DimVal[Z1, Z1, Z_2, Z0]
    type pressure = DimVal[Z_1, Z1, Z_2, Z0]
    type viscosity = DimVal[Z_1, Z1, Z_1, Z0]
    type energy = DimVal[Z2, Z1, Z_2, Z0]
    type power = DimVal[Z2, Z1, Z_3, Z0]
    type potential = DimVal[Z2, Z1, Z_3, Z_1]
    type capacitance = DimVal[Z_2, Z_1, Z4, Z2]
    type resistance = DimVal[Z2, Z1, Z_3, Z_2]
    type conductance = DimVal[Z_2, Z_1, Z3, Z2]
    type magnetic_flux = DimVal[Z2, Z1, Z_2, Z_1]
    type magnetic_flux_density = DimVal[Z0, Z1, Z_2, Z_1]
    type inductance = DimVal[Z2, Z1, Z_2, Z_2]

    val m = new length(1) //meter
    val kg = new mass(1) //kilogram
    val s = new time(1) //second
    val A = new current(1) //Ampere
    val C = new charge(1) //Coulomb
    val N = new force(1) //Newton
    val Hz = new frequency(1) //Hertz
    val Pa = new pressure(1) //Pascal
    val ohm = new resistance(1)
    val V = new potential(1) //Volt
    val S = new conductance(1) //Siemens
    val W = new power(1) //Watt
    val J = new energy(1) //Joule
    val F = new capacitance(1) //Farad
    val H = new inductance(1) //Henry
    val Wb = new magnetic_flux(1) //Weber
    val T = new magnetic_flux_density(1) //Tesla

    implicit def numToDim(d : Double) : DimVal[Z0,Z0,Z0,Z0] = new DimVal[Z0,Z0,Z0,Z0](d)
}

On the one hand, it looks frightening. On the other hand, it’s just a few lines of code to add a capability that is missing since forever. Here is some putting it through its paces.

    val c0 = new charge(4)
    println("c0="+c0.asString())

    val v = 3 * m / s
    println("v="+v.asString)

    val mom = 10 * kg * v
    println("mom="+mom.asString())

    val e = mom * v
    println("e="+e.asString())

    val p = e / (3 * s)
    println("p="+p.asString())

Which outputs:

c0=4.0 C
v=3.0 m·s-1
mom=30.0 m·kg·s-1
e=90.0 J
p=30.0 W

Look what happens when you try to compile an assignment with mismatched dimensions.

    val p : pressure = e / (3 * s)
Error:(37, 26) type mismatch;
 found   : ShapelessExperiment.DimVal[ShapelessExperiment.ZPos[shapeless.Succ[shapeless.Succ[shapeless.Nat._0]]],ShapelessExperiment.ZPos[shapeless.Succ[shapeless.Nat._0]],ShapelessExperiment.ZNeg[shapeless.Succ[shapeless.Succ[shapeless.Succ[shapeless.Nat._0]]]],ShapelessExperiment.ZPos[shapeless.Nat._0]]
 required: ShapelessExperiment.DimValOps.pressure
    (which expands to)  ShapelessExperiment.DimVal[ShapelessExperiment.ZNeg[shapeless.Succ[shapeless.Nat._0]],ShapelessExperiment.ZPos[shapeless.Succ[shapeless.Nat._0]],ShapelessExperiment.ZNeg[shapeless.Succ[shapeless.Succ[shapeless.Nat._0]]],ShapelessExperiment.ZPos[shapeless.Nat._0]]
    val p : pressure = e / (3 * s)
                         ^

Ok, the message is not entirely simple but, if you understand the type signatures, it is clear how to read it. In conclusion, even though Scala’s compile-time metaprogramming has been said to imitate, rather than implement, dependent types, you can see it is able to quack like a duck and walk like a duck. And it is perfectly capable of stepping in when Java proves too little, even though it cannot overcome fundamental limitations of the JVM.

Spring break

My witty-title-generator was a bit off for this one. The pun does not imply that Spring is broken, not that it has broken anything. But what I’m going to describe is bound to shake (and break?) a misconception around it.

Consider Spring JavaConfig (now officially part of the Spring distribution). I admit it’s not my favorite configuration option (and – both – regular readers know I have strong opinions on DI). But many people prefer it to XML configuration. And some of them believe something about it, which is not true.

Classes annotated with @Configuration are scanned for bean declarations. At the same time, this annotation also creates a bean for the class. So far, so good. Throw @DependsOn to the picture, which can be used to inform Spring about a dependency which cannot be established through the regular means of constructor or property injection. Don’t get me started about how often people skip proper DI in favor of: getting hold of the context, shamelessly pulling a bean out, then using @DependsOn to force the dependency. I might write another post about this (if I find a way to do it without giving in to my impulse to shout it in all caps and lots of exclamation marks). Suppose, for argument’s sake, that its use is warrantied because of some hidden dependency on a side effect or something.

The problem, which prompted this post, is that some people think that using @DependsOn, from a configuration class to some other configuration class, affects the initialization order of the beans declared in them.

Let me demonstrate what’s wrong with this idea, and show you what @DependsOn really does in that case. Classes foo and bar are configuration classes, but I have also exposed their bean faces, by declaring initialization methods in both of them (imports elided for brevity).

@Configuration
public class foo {
	@PostConstruct
	public void init() {
		System.out.println("foo.init()");
	}

	@Bean
	public Long foobean() {
		System.out.println("foobean");
		return System.currentTimeMillis();
	}
}

@Configuration
@DependsOn("foo")
public class bar {
	@PostConstruct
	public void init() {
		System.out.println("bar.init()");
	}

	@Bean
	public Long barbean() {
		System.out.println("barbean");
		return System.currentTimeMillis();
	}
}

The two unit tests that follow register them in two different orders. Scanning a package is the same, except the registering order is arbitrary and does not lead itself to a repeatable test. I’ll tell you how to conduct that test later, if you want to try it.

	@Test
	public void testDependsOnOrderOne() {
		System.out.println("Running test with bar scanned before foo, and bar having @DependsOn(\"foo\")");
		AnnotationConfigApplicationContext ctx = new AnnotationConfigApplicationContext();
		ctx.register(bar.class);
		ctx.register(foo.class);
		ctx.refresh();
		final Long foobean = ctx.getBean("foobean", Long.class);
		final Long barbean = ctx.getBean("barbean", Long.class);
		Assert.assertTrue(barbean < foobean);
	}

	@Test
	public void testDependsOnOrderTwo() {
		System.out.println("Running test with foo scanned before bar, and bar having @DependsOn(\"foo\")");
		AnnotationConfigApplicationContext ctx = new AnnotationConfigApplicationContext();
		ctx.register(foo.class);
		ctx.register(bar.class);
		ctx.refresh();
		final Long foobean = ctx.getBean("foobean", Long.class);
		final Long barbean = ctx.getBean("barbean", Long.class);
		Assert.assertTrue(barbean > foobean);
	}

Have a look at what these tests (which both succeed) output.

Running test with bar scanned before foo, and bar having @DependsOn("foo")
foo.init()
bar.init()
barbean
foobean
Running test with foo scanned before bar, and bar having @DependsOn("foo")
foo.init()
bar.init()
foobean
barbean

Notice that @DependsOn does have an effect. It has the effect it is designed to have: force an initialization order between foo and bar. Notice also that it does not have the effect it is believed to have: force any order between beans inside foo and bar.

Scanning a package is the exact same thing, only the scanning order is not guaranteed. But it’s easy to overcome this problem if first you establish what order happens to be used in your setup, then fixing the @DependsOn the other way and assert that it is ignored.

If you’ve been using @DependsOn between configuration classes, it is time you did some Spring cleaning (a more appropriate pun than the one in the title?).

Fwd: HPCS recap

Not having much time to produce original content, it’s a pleasure to find a post on a subject I was considering to cover, and actually being treated better than I could have hoped to do myself: a post about HPCS on Bartosz Milewski’s blog. Other than the fact that Fortress is not under development but has currently “wound down” (as Guy Steele said in the keynote speech of the JVMLS 2013), the article is up to date. All three approaches resonate with the theme in my post on array processing. I just want to add that both Guy Steele (Fortress) and Vijay Saraswat (X10) have had great influence on me, the former via Common Lisp and Scheme (cf. TinyScheme), the latter via the “Concurrent Constraint Programming” book (cf. my thesis on Concurrent Constraint Logic Programming [Technical Report CERMICS 92-04]). And Fortress intended to tackle the problem of properly dimensioned calculations, which had also been one my favorite subjects.

Even mainstream Java has been oriented towards how to do efficient and practical computing (cf. “Arrays 2.0“).

Enjoy!