My Frugal Summer of Code, part 2

Here I am, continuing my foray into the realm of array programming, that started in my previous post. Let me start with a disclaimer: I do not claim that what you read here is good, idiomatic J.This is not a J tutorial. I mean it as an exposé of the notions of array programming and how they can enhance the worldview of the traditional programmer. I’d love it if it could also be a bridge between mainstream FP and Array Programming, but I’m afraid I’m not sufficiently proficient in either to do it.

Remember the last five lines of code? I’m going to dissect them now for your benefit.

NB. Locations (mask bits) where digits have lines 
digitmasks =. '         ' ~:"(1) (,"2 digits)

Laid out, this means: concatenate all lines in each digit (keeping them in three lines serves only for displaying them). This is done by , (called append), applied to cells of digits of rank 2 (the "2 modifier, called an conjunction in J). Then, compare lines (cells of rank 1, the "1 thing) of that, using the “not equals” comparator (the ~:) to a literal with 9 spaces.

This gives you a bit (an 1) whenever there’s anything than a space. I can do that because the digits, as defined, when they share a graphic element, it’s always the same in the same location. So I’m getting rid of some excessive information to make my life easier. So, digitmasks is the following.

0 1 0 1 0 1 1 1 1
0 0 0 0 0 1 0 0 1
0 1 0 0 1 1 1 1 0
0 1 0 0 1 1 0 1 1
0 0 0 1 1 1 0 0 1
0 1 0 1 1 0 0 1 1
0 1 0 1 1 0 1 1 1
0 1 0 0 0 1 0 0 1
0 1 0 1 1 1 1 1 1
0 1 0 1 1 1 0 1 1
NB. Locations where digits share mask bits
maskcompmatrix =. (-"1/~ digitmasks) ="0 ] 0

Moving on, first, I make a table where, I subtract each line (i.e. digit bitmask) from each other. The -"1/~ digitmasks thing is shorthand for digitmasks -"1/ digitmasks, i.e. tabulate (the / adverb – another kind of higher-level construct, like conjunctions – for a dyadic operation) the operation “subtract each line” (the -"1). Afterwards, the whole thing is compared, bit by bit (the ="0) to zero. The result is a 10 x 10 x 9 bitmask array. Are you getting the hang of it?

NB. Count of similar mask bits between digits
diffmatrix =. +/"1 maskcompmatrix

Further, I sum: insert, using the / adverb, the operation “plus” (the + verb) at the level of each line (the "1 conjunction) on maskcompmatrix. This gives a 10 x 10 array of the number of same bits in each bitmask (digit).

9 5 6 6 5 6 7 6 8 7
5 9 4 6 7 4 3 8 4 5
6 4 9 7 4 5 6 5 7 6
6 6 7 9 6 7 6 7 7 8
5 7 4 6 9 6 5 6 6 7
6 4 5 7 6 9 8 5 7 8
7 3 6 6 5 8 9 4 8 7
6 8 5 7 6 5 4 9 5 6
8 4 7 7 6 7 8 5 9 8
7 5 6 8 7 8 7 6 8 9

Now comes the easy part.

NB. Outside the diagonal: Digits that share 8 out of 9 mask bits
offbyone =. diffmatrix ="(0) 8

I make a 10 x 10 array of selectors (bits) when digits share 8 out of 9 bits.

offbyones =. (monad : '< y # i. 10')"1 offbyone

Then, I make a vector (list, in J) of which digits share 8 bits for each digit.

┌─┬─┬┬─┬┬───┬───┬─┬─────┬─────┐
│8│7││9││6 9│5 8│1│0 6 9│3 5 8│
└─┴─┴┴─┴┴───┴───┴─┴─────┴─────┘

I wanted to spare you (and me!) the trouble of bringing up “boxing” but, since the shape is irregular, it was really the easiest solution. Boxing (conveniently displayed with actual boxes) creates an atomic value of anything you put into it.

And, in reality, I found out that it makes calculations simpler if the digits themselves are included, so I really need the following.

NB. Digits that share 8 or 9 (i.e. themselves) mask bits
sameandoffbyone =. diffmatrix >:"(0) 8

sameandoffbyones =. (monad : '< y # i. 10')"1 sameandoffbyone
┌───┬───┬─┬───┬─┬─────┬─────┬───┬───────┬───────┐
│0 8│1 7│2│3 9│4│5 6 9│5 6 8│1 7│0 6 8 9│3 5 8 9│
└───┴───┴─┴───┴─┴─────┴─────┴───┴───────┴───────┘

This was sort of easy, wasn’t it? I’m afraid the rest will be a little more complicated. I’ll throw it all in one glob, and focus on the big picture first. The operation I’m trying to achieve, is fix broken account numbers by trying similar digits, where similar is defined by off-by-one bit.

NB. fold operation to produce the combinations
foldop =. dyad : ',/ <"1 (>y) ,"1/ (,. >x)'
identity =. <''

NB. Use if checksum is <> 0: try combinations with digits off by one mask bit and see which (if any) produce checksum of 0
alternatives =. monad define
 alt =. >"0 foldop/ identity ,~ |. y { sameandoffbyones
 ((checksum"1 = 0:) alt) # alt
)

There’s a fold taking place, which is a common thing in FP. The first complication is that, in J, for reasons of uniformity with its right-to-left evaluation order, when you fold a list (remember, a list is a vector of something) you also do it right-to-left. That’s why I reverse (the |. thing) the result of indexing (the {) sameandoffbyones by the account digits (y stands for the argument in a monadic verb). Then I append the identity element, which is a boxed empty list. Folding takes place, and I’ll explain the operation folded next but, for now, let’s stick to what the fold produces, which is a list of boxed candidate account numbers, which are unboxed (the >"0) and then checked with checksum (the checksum"1 = 0: is actually a composite verb!) and only those with zero checksum are kept (operation copy, the #).

Example usage:

alternatives 2 4 4 4 1 2 5 6 7
2 4 4 4 1 2 5 6 1
2 4 4 4 1 2 9 8 7
2 4 4 4 7 2 6 6 1

However, don’t try it on 7 2 3 4 5 6 7 8 9. It produces 109 possible alternatives! In retrospect, this error correction policy is not very robust or, rather, the checksum used in not suitable to how digits can be misread.

Let’s go back to the folded operation now. I won’t go into it symbol-by-symbol but here’s an English description. One operand holds a list of boxes with partial account numbers, and the result is also a list of boxes with longer partial account numbers, formed by appending the initial partial account numbers with all alternatives for the current digit, held by the other operand as a boxed list.

Have I picked your curiosity until now? Have I succeeded in presenting J as more than a curiosity itself? There is real power in its notions and it goes beyond its conciseness. Imagine all these operations using standard Scala, for example. “It’s easy if you try.” I think it is a useful thing to add to standard FP, which lacks the language to talk about arrays at the level that array programming languages do. I have not even found a data type to represent arbitrary-ranked arrays, let alone how to represent ranked operations and implicit loops and stuff. But I think it can be done, and I’m setting it as a wanna-do-it for the future.

My Frugal Summer of Code

The title being a wordplay on “Google Summer of Code” does not mean that it is not entirely true, as there is hardly any time during vacationing with two little children to write much code. And I admit that most of what you’ll see in this post was written pre- and post-vacations. But I did read a lot of stuff on array programming, APL, J, papers on array functional programming. I thought really hard (but vacation-grade hard) about how some of these things can be grafted on a more conventional functional language. And I did make an honest effort to find a Java or Scala library for dense multidimensional arrays that could provide a basic substrate for such an effort. If the job, Flying Donut or life, in general, let me, you’ll hear more in another post.

The current post, however, will use J to tackle a code kata I found, which I thought would be ideal for some array-fu.

Assume you have the three lines that constitute an account number. The first thing you should do is separate the nine “digits”. Conveniently, the same functionality can be used to generate some data useful in the sequel, so let’s start by separating the ten digits 0 – 9.

alldigits =. ] ;._2 (0 : 0)
 _     _  _     _  _  _  _  _
| |  | _| _||_||_ |_   ||_||_|
|_|  ||_  _|  | _||_|  ||_| _|
)

NB. Break in 3x3 cells
chop =. monad : ',/ (3 3 ,: 3 3) ];._3 y'

digits =. chop alldigits

I know, most of it looks like a cartoon character swearing. What you see in the above is a handy way of writing multi-line data and a handy higher-level operation to apply an operation to subarrays of an operand. Let’s chop a sample account number.

NB. A number with no wrong digits
num =. ] ;._2 (0 : 0)
    _  _     _  _  _  _  _ 
  | _| _||_||_ |_   ||_||_|
  ||_  _|  | _||_|  ||_| _|
)
 
nums =. chop num
<"_1 nums

The resulting “digits” are “boxed” so that they are also displayed as boxed.

┌───┬───┬───┬───┬───┬───┬───┬───┬───┐
│   │ _ │ _ │   │ _ │ _ │ _ │ _ │ _ │
│  |│ _|│ _|│|_|│|_ │|_ │  |│|_|│|_|│
│  |│|_ │ _|│  |│ _|│|_|│  |│|_|│ _|│
└───┴───┴───┴───┴───┴───┴───┴───┴───┘

And the magic continues. Let’s write lots of nested loops to recognize the corresponding digits.

NB. Results in character list
digitize =. digits&i.

Ok, I was kidding. Here’s what digitize nums gives you: the vector 1 2 3 4 5 6 7 8 9. Transforming it to a human readable string is equally ridiculously simple.

NB. Takes digitized list, results in character list 
show =. {&'0123456789?'

Non recognized digits result in a past-the-end index, hence the ‘?’ after the last valid digit.

NB. A number with one wrong digit
wrongnum =. ] ;._2 (0 : 0)
_   _  _     _  _  _  _  _ 
  | _| _||_||_ |_   ||_||_|
  ||_  _|  | _||_|  ||_| _|
)  
wrongnums =. chop wrongnum
show digitize wrongnums

The output is ?23456789.

Surely calculating the checksum should be a little more difficult.

NB. (to be used only on correctly parsed numbers)
checksum =. monad : '11 | (1+i. 9) (+/ . *) |. y'

Well, it is a little more difficult, truth be told. But I have to go to bed, so I’ll break off at this point, but only after I compute some data for the next step, which is to try digits to similar ones in order to produce candidate account numbers to recover from checksum errors.

NB. Locations (mask bits) where digits have lines 
digitmasks =. '         ' ~:"(1) (,"2 digits)

NB. Locations where digits share mask bits
maskcompmatrix =. (-"1/~ digitmasks) ="0 ] 0

NB. Count of similar mask bits between digits
diffmatrix =. +/"1 maskcompmatrix

NB. Outside the diagonal: Digits that share 8 out of 9 mask bits
offbyone =. diffmatrix ="(0) 8

offbyones =. (monad : '< y # i. 10')"1 offbyone

I guess this suddenly strikes you as too much code, doesn’t it? Five whole lines! It’s only one line shorter than the code to define a property in Java, for crying out loud!

I’ll get into the details next time!

TO BE CONTINUED

APL, J, NIAL and Array Theory

When I was in France, I was doing a stage with Jean-Louis Laurière in a company that dealt with payroll software and services. The idea was to express all payroll logic in a rule-based system, and he had enrolled a senior person, very knowledgeable on the subject, to explain some aspect of it over lunch. That person was a very experienced COBOL programmer and, at one point, after he had explained some calculation using nested loops, I tried to summarize:

– So, this is the Min of x and the Max of y and z.

I got a blank stare and later, when we were alone with Jean-Louis, he said he almost burst into laughter at that moment, because it was obvious he did not understand what I was saying. Mind you, he knew perfectly well how to calculate the number in question procedurally, but he lacked the concepts to express it in the succinct way I was using. He was using a lower level of abstraction, and, since there seems to be a limit on the “chunks” we can hold in our heads, the higher the level of abstraction we use, the further away our minds can reach.

Those of you that share my fascination with Functional Programming probably like it for the same reason: you can express so darned more if you use a higher level of abstraction (higher order functions, in this case). All the better, if modern implementations can perform so well that you can use this abstraction for real-life computations. This post is not about Functional Programming. There are zillion sources to read about that. This post is about what higher abstractions might look like on the realm of array programming and, in particular, APL-like languages. In another post, I might write about the three languages that contested for the HPCS.

One thing you should understand while reading about those languages is that the APL branch of languages spawed away from the conventional trunk very early. I have found a copy of “A Programming Language” and I guess that a Fortran programmer reading it at the time must have felt like someone having found the manual for the Flux Capacitor in the Middle Ages. It’s a safe guess, because I can safely imagine a Java programmer of today feeling the same, since the concepts it is based on are as much distant from Java as they are from Fortran. In fact, the Fortran programmer probably has an edge, since Fortran is ultimately concerned with manipulating array data in the first place. I hope I have tickled your fancy enough to spend some time researching APL, but I’m going to propose an easier route: study J instead, which has a free implementation, can be written in ASCII (you must surely be wondering what APL is written in, if not ASCII) and you can find a very good introductory text from the conventional viewpoint: “J for C Programmers” (can be found in the J Books). This book is very important for the outsider, because it appears that all other sources present J like the most natural thing on earth which only your limited brainpower perceives as different. Sorry, no “J for Functional Programmers” exists, but this is no big deal as array programming has no different treatment in functional languages that in conventional ones. Although you will surely have little voices in your mind whispering about higher order functions while you’ll be reading about J adverbs. Curiously, a presentation of array programming that seems more close to the language of the functional programmer can be found in material for NIAL (download Papers.zip and don’t go to the root of the site), which I did not know but read about in the APL F.A.Q. Array Theory seems like a good theoretical underpinning for such languages.

Rule #34 of the internet supposedly says that, if something exists, there’s porn for it. Here’s some J porn for you. I used this as a programming test once believing that, someone who can learn enough J to answer this, can learn anything else needed on the job. I was not let down.

Given a 9×9 array of integers from 1 to 9, forming the supposed solution to a Sudoku puzzle (http://en.wikipedia.org/wiki/Sudoku), write a J verb (program) that computes whether the 9×9 array is, indeed, a solution according the following rules for Sudoku: Every line, every row and every 3×3 tile (3×3 tiles are arranged in a 3×3 arrangement to tile the entire array, see Wikipedia article) should contain every number from 1 to 9 one unique time.

The solution should use verb ranks to partition the array in all the needed ways (cells), and determine uniqueness by applying the same test verb in the appropriate manner.

And this is my own solution to that.

test3 =: monad define
                items =. ,y
                (9=#~.items) *. *./(items>0),(items<10)
)

is_sudoku3 =: monad define
                *./ (test3"_1 y) , (test3"_1 |: y) , (,(3 3 ,: 3 3) test3;._3 y)
)

Update: At the time I devised this exercise I could not find any relevant article, but I just now found a full Sudoku solver in J at the J Wiki.